# American Lawful Immigration 2018 - New LPRs by State of Residence

<hr>

Lawful Permanent Residents (LPRs) are non-citizens who are lawfully authorized to live permanently in the United States. Let's explore data from the U.S. Department of Homeland Security (DHS) to see which states the people who received a green card in 2018 settled in.

## Import required libraries

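```python
import pandas as pd
import plotly.express as px
import plotly.io as pio
from IPython.display import Javascript, display

# Point require.js at the Plotly CDN so figures render in the classic notebook.
# display() is needed: only the last expression of a cell is rendered automatically.
display(Javascript(
"""require.config({
 paths: { 
     plotly: 'https://cdn.plot.ly/plotly-latest.min'
 }
});"""
))

# Load plotly.js from the CDN instead of embedding it in every figure
pio.renderers.default = 'notebook_connected'
```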

## Data pre-processing

To map the number of new Lawful Permanent Residents by state of residence in 2018, we need to merge three datasets:

* Persons Obtaining Lawful Permanent Resident Status by State or Territory of Residence, our main dataset, which gives (among other columns) the number of new LPRs in 2018 for each state of residence;
* US State Abbreviations, which provides the two-letter code of each state, needed to place the states on the map;
* US Population 2018, which gives the number of inhabitants of each US state in 2018.
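All three files share the state-name column, so the joins can be expressed as a chain of merges on that key. A minimal sketch, assuming each table has already been cleaned into a DataFrame; the frames below are toy copies of the first rows shown later in this notebook, and chaining with `functools.reduce` is just one option (the notebook itself uses two explicit `pd.merge` calls).

```python
from functools import reduce

import pandas as pd

KEY = "State or territory of residence"

# Toy frames copied from the first rows displayed in this notebook
lprs = pd.DataFrame({KEY: ["Alabama", "Alaska"], "LPRs 2018": [3737, 1375]})
codes = pd.DataFrame({KEY: ["Alabama", "Alaska"], "Code": ["AL", "AK"]})
population = pd.DataFrame({KEY: ["Alabama", "Alaska"], "US Population 2018": [4887681, 735139]})

# Merge the frames left to right on the shared key (pd.merge defaults to an inner join)
combined = reduce(lambda left, right: pd.merge(left, right, on=KEY), [lprs, codes, population])
print(combined)
```

Adding a fourth table later only means appending it to the list.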

### LPRs 2018 by state of residence dataset

```python
# Main dataset: semicolon-separated, skipping the first 4 lines
df = pd.read_csv('Data/Persons_Obtaining_Lawful_Permanent_Resident_Status_by_State_or_Territory_of_Residence.csv', 
                 sep=';', 
                 skiprows=4)

# Cleaning: drop columns 1 to 9, keeping the state name and the 2018 figures
df = df.drop(columns=df.columns[1:10])
df.columns = ['State or territory of residence', 'LPRs 2018']
df = df.dropna()
# Remove the spaces used as thousands separators, then cast to integers
df['LPRs 2018'] = df['LPRs 2018'].str.replace(" ", "", regex=False).astype('int')

df.head()
```

| | State or territory of residence | LPRs 2018 |
|---|---|---|
| 0 | Alabama | 3737 |
| 1 | Alaska | 1375 |
| 2 | Arizona | 18335 |
| 3 | Arkansas | 3000 |
| 4 | California | 200897 |
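The cleaning step strips the spaces used as thousands separators before casting to integers. Below is a minimal sketch of the same idea as a reusable helper; the name `parse_spaced_int` is hypothetical, and it assumes the CSV might also contain non-breaking spaces, which a plain `" "` replacement would miss. Python's `\s` matches those too.

```python
import pandas as pd

def parse_spaced_int(series: pd.Series) -> pd.Series:
    """Convert strings such as '200 897' (spaces as thousands separators) to integers."""
    # \s also matches non-breaking spaces (\xa0) in Python's Unicode regex mode
    cleaned = series.astype(str).str.replace(r"\s+", "", regex=True)
    return cleaned.astype("int64")

raw = pd.Series(["3 737", "200\xa0897", "1375"])
print(parse_spaced_int(raw).tolist())  # [3737, 200897, 1375]
```

The same helper could then clean both the LPR column and the population column.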


### US State Abbreviations dataset

```python
df_state = pd.read_csv('Data/US_State_Abbreviations.csv', 
                       sep=';')

# Use the same key column name as the main dataset so the two can be merged
df_state.rename(columns = {'State':'State or territory of residence'}, inplace = True)

df_state.head()
```

| | State or territory of residence | Code |
|---|---|---|
| 0 | Alabama | AL |
| 1 | Alaska | AK |
| 2 | Arizona | AZ |
| 3 | Arkansas | AR |
| 4 | California | CA |


### US Population 2018 dataset

```python
# Population dataset: semicolon-separated, skipping the first 8 lines
df_pop = pd.read_csv('Data/US_Population_2018.csv', 
                     sep=';', 
                     skiprows=8)

# Cleaning: drop columns 1 to 10 and the last column, keeping the state name and the 2018 population
df_pop = df_pop.drop(columns=df_pop.columns[1:11])
df_pop = df_pop.iloc[:, :-1]
df_pop.columns = ['State or territory of residence', 'US Population 2018']
df_pop = df_pop.dropna()
# Remove the spaces used as thousands separators, then cast to integers
df_pop['US Population 2018'] = df_pop['US Population 2018'].str.replace(" ", "", regex=False).astype('int')

df_pop.head()
```

| | State or territory of residence | US Population 2018 |
|---|---|---|
| 0 | Alabama | 4887681 |
| 1 | Alaska | 735139 |
| 2 | Arizona | 7158024 |
| 3 | Arkansas | 3009733 |
| 4 | California | 39461588 |


## Merging

Now that our three datasets are clean, we can merge them. Along the way, we add two columns that make the data more meaningful:
* each state's share of all new LPRs in 2018, as a percentage;
* the number of new LPRs in 2018 per 1,000 inhabitants.
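Both columns are simple ratios: the share is a state's LPRs divided by the total over all rows, times 100, and the rate is a state's LPRs divided by its population, times 1,000. A minimal sketch as a hypothetical helper `add_derived_columns`, checked on two rows copied from the tables in this notebook. Note that with only two rows, the percentages are shares of this two-state total, not of the national total.

```python
import pandas as pd

def add_derived_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with each row's share of all LPRs (%) and LPRs per 1,000 inhabitants."""
    out = df.copy()
    out["Percentage of LPRs 2018"] = out["LPRs 2018"] / out["LPRs 2018"].sum() * 100
    out["LPRs 2018 per 1,000 population"] = out["LPRs 2018"] / out["US Population 2018"] * 1000
    return out

# Two rows copied from the merged table in this notebook
sample = pd.DataFrame({
    "State or territory of residence": ["California", "New York"],
    "LPRs 2018": [200897, 134839],
    "US Population 2018": [39461588, 19530351],
})
result = add_derived_columns(sample)
print(result.round(6))
```

For California, 200,897 / 39,461,588 × 1,000 ≈ 5.09, matching the value in the merged table.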

```python
# Merge LPRs 2018 by state of residence and US State Abbreviations datasets
df_merge = pd.merge(df, df_state, on='State or territory of residence')

# Compute and add Percentage of LPRs 2018 column
df_merge['Percentage of LPRs 2018'] = df_merge['LPRs 2018']/df_merge['LPRs 2018'].sum()*100

# Merge US Population 2018 dataset to the previous merge, giving us our final dataset
df_LPRs_2018 = pd.merge(df_merge, df_pop, on='State or territory of residence')

# Compute and add LPRs 2018 per 1,000 population column
df_LPRs_2018['LPRs 2018 per 1,000 population'] = df_LPRs_2018['LPRs 2018']/df_LPRs_2018['US Population 2018']*1000

df_LPRs_2018 = df_LPRs_2018.sort_values('LPRs 2018', ascending=False)
df_LPRs_2018
```

| | State or territory of residence | LPRs 2018 | Code | Percentage of LPRs 2018 | US Population 2018 | LPRs 2018 per 1,000 population |
|---|---|---|---|---|---|---|
| 4 | California | 200897 | CA | 18.350529 | 39461588 | 5.090951 |
| 32 | New York | 134839 | NY | 12.316595 | 19530351 | 6.904075 |
| 9 | Florida | 130405 | FL | 11.91158 | 21244317 | 6.138347 |
| 44 | Texas | 104515 | TX | 9.546711 | 28628666 | 3.650711 |
| 30 | New Jersey | 54424 | NJ | 4.97125 | 8886025 | 6.124673 |
| 13 | Illinois | 38287 | IL | 3.497248 | 12723071 | 3.009258 |
| 21 | Massachusetts | 33174 | MA | 3.030212 | 6882635 | 4.819956 |
| 47 | Virginia | 27426 | VA | 2.505172 | 8501286 | 3.2261 |
| 10 | Georgia | 26725 | GA | 2.441141 | 10511131 | 2.542543 |
| 38 | Pennsylvania | 26078 | PA | 2.382042 | 12800922 | 2.037197 |
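Because `pd.merge` performs an inner join by default, any state name that appears in only one file (a spelling difference, a row absent from the abbreviations list) is silently dropped, and the percentage column is computed only over the rows that survive the first merge. A quick check, sketched with a hypothetical helper `unmatched_states` and toy frames, uses a left join with `indicator=True` to list the names that would be lost:

```python
import pandas as pd

def unmatched_states(left: pd.DataFrame, right: pd.DataFrame,
                     key: str = "State or territory of residence") -> list:
    """Return the key values in `left` that have no match in `right`."""
    merged = left.merge(right, on=key, how="left", indicator=True)
    # indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
    return merged.loc[merged["_merge"] == "left_only", key].tolist()

# Toy frames: Alaska is deliberately missing from the abbreviations table
lprs = pd.DataFrame({"State or territory of residence": ["Alabama", "Alaska"],
                     "LPRs 2018": [3737, 1375]})
codes = pd.DataFrame({"State or territory of residence": ["Alabama"], "Code": ["AL"]})
print(unmatched_states(lprs, codes))  # ['Alaska']
```

Running it as `unmatched_states(df, df_state)` and `unmatched_states(df_merge, df_pop)` before the real merges would show exactly which rows are excluded from the maps.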


## Mapping

### New LPRs by state of residence in 2018

First, let's map the number of new LPRs by state of residence in 2018:

```python
# Choropleth map: two-letter state codes are matched to the built-in US states geometry
fig = px.choropleth(df_LPRs_2018, 
                    locations='Code',
                    color='LPRs 2018',
                    hover_name='State or territory of residence',
                    locationmode="USA-states",
                    scope='usa',
                    labels={'LPRs 2018':'New LPRs'},
                    color_continuous_scale=px.colors.sequential.dense,
                    title="<b>U.S. New LPRs by State of Residence in 2018</b><br>" + 
                    "<i>Source: U.S. Department of Homeland Security</i>"
                   )

# Style
fig.update_layout(
    font_family='Helvetica',
    font_color='grey',
    font_size=12,
    title_font_size=18
)

fig.show()
```

```python
# Treemap: rectangle area and color both encode the number of new LPRs
fig = px.treemap(df_LPRs_2018,
                 title="<b>U.S. New LPRs by State of Residence in 2018</b><br>" + 
                 "<i>Source: U.S. Department of Homeland Security</i>",
                 labels={'LPRs 2018':'New LPRs', 'State or territory of residence':''},
                 path=['State or territory of residence'], 
                 values='LPRs 2018',
                 color = 'LPRs 2018',
                 color_continuous_scale=px.colors.sequential.dense
                )

# Style
fig.update_layout(
    font_family='Helvetica',
    font_color='grey',
    font_size=12,
    title_font_size=18
)

fig.show()
```

### New LPRs by state of residence per 1,000 population in 2018

Now, let's map the number of new LPRs by state of residence per 1,000 inhabitants in 2018:

```python
df_LPRs_1000_population = df_LPRs_2018.sort_values('LPRs 2018 per 1,000 population', ascending=False)
df_LPRs_1000_population
```

| | State or territory of residence | LPRs 2018 | Code | Percentage of LPRs 2018 | US Population 2018 | LPRs 2018 per 1,000 population |
|---|---|---|---|---|---|---|
| 32 | New York | 134839 | NY | 12.316595 | 19530351 | 6.904075 |
| 9 | Florida | 130405 | FL | 11.91158 | 21244317 | 6.138347 |
| 30 | New Jersey | 54424 | NJ | 4.97125 | 8886025 | 6.124673 |
| 4 | California | 200897 | CA | 18.350529 | 39461588 | 5.090951 |
| 21 | Massachusetts | 33174 | MA | 3.030212 | 6882635 | 4.819956 |
| 40 | Rhode Island | 4336 | RI | 0.396063 | 1058287 | 4.097187 |
| 20 | Maryland | 24301 | MD | 2.219726 | 6035802 | 4.026143 |
| 8 | District of Columbia | 2775 | DC | 0.253477 | 701547 | 3.955544 |
| 11 | Hawaii | 5430 | HI | 0.495992 | 1420593 | 3.822347 |
| 44 | Texas | 104515 | TX | 9.546711 | 28628666 | 3.650711 |

```python
fig = px.choropleth(df_LPRs_1000_population, 
                    locations='Code',
                    color='LPRs 2018 per 1,000 population',
                    hover_name='State or territory of residence',
                    locationmode="USA-states",
                    scope='usa',
                    labels={'LPRs 2018 per 1,000 population':'New LPRs per<br>1,000 inhabitants'},
                    color_continuous_scale=px.colors.sequential.matter,
                    title="<b>U.S. New LPRs by State of Residence per 1,000 inhabitants in 2018</b><br>" + 
                    "<i>Source: U.S. Department of Homeland Security</i>"
                   )

# Style
fig.update_layout(
    font_family='Helvetica',
    font_color='grey',
    font_size=12,
    title_font_size=18
)

fig.show()
```

```python
fig = px.treemap(df_LPRs_1000_population,
                 title="<b>U.S. New LPRs by State of Residence per 1,000 inhabitants in 2018</b><br>" + 
                 "<i>Source: U.S. Department of Homeland Security</i>",
                 labels={'LPRs 2018 per 1,000 population':'New LPRs per<br>1,000 inhabitants', 'State or territory of residence':''},
                 path=['State or territory of residence'], 
                 values='LPRs 2018 per 1,000 population',
                 color = 'LPRs 2018 per 1,000 population',
                 color_continuous_scale=px.colors.sequential.matter
                )

# Style
fig.update_layout(
    font_family='Helvetica',
    font_color='grey',
    font_size=12,
    title_font_size=18
)

fig.show()
```

<hr>

## Sources

* [U.S. Department of Homeland Security](https://www.dhs.gov/immigration-statistics)
* [U.S. Census Bureau](https://www.census.gov/)