# Analysis on Employment and Wage Disparities among Immigrants in the United States 

## Introduction
Immigrants in the United States often face significant challenges in the job market. Despite efforts to create fairness, many immigrants deal with lower wages and difficulties in finding employment. Our project explores these issues by analyzing data to reveal the disparities in pay and job opportunities between immigrants and native-born workers.

We specifically examine how factors such as age and country of origin influence the annual salaries of immigrants. Through clear and detailed graphs, we highlight the extent of these disparities and the factors behind them. One of our goals is to provide a clear understanding of the employment situation for immigrants.

Additionally, we look into unemployment rates and other factors, such as education, to understand how they contribute to the observed wage gaps.

**Authors**

**·** Paul Stokreef

**·** Ramtin Baschardoust Vagh

**·** Reanu Visser

**·** Mohammed Aouaragh

In [1]:
import plotly.graph_objects as go
import plotly.io as pio
pio.renderers.default = 'notebook'  # or 'notebook_connected'
import pandas as pd

## Dataset and Preprocessing

Initially, our team decided that each member should identify at least one dataset of interest with sufficient data from which at least two perspectives could be derived. During the first team call, each dataset was discussed along with possible correlations. In the end, we chose multiple datasets related to US education, employment, and earnings spanning various years. These datasets were found to have valuable correlations that could be used for a potential topic. After brainstorming during team meetings, we decided to explore wage and employment disparities among immigrants in the United States, as the datasets contained sufficient variables to analyze these perspectives. We sourced our datasets from various organizations, including the OECD, and filtered them to include only information relevant to the United States. 
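
The filtering step described above can be sketched as follows. This is a minimal illustration rather than the exact preprocessing we ran: the small in-memory table and its values are placeholders, the helper name `keep_united_states` is hypothetical, and only the `Country`, `Year` and `Value` column names are taken from our variable list.

```python
import pandas as pd

# Placeholder rows standing in for a multi-country OECD export (values are made up)
oecd_sample = pd.DataFrame({
    "Country": ["United States", "Canada", "United States", "Germany"],
    "Year": [2019, 2019, 2020, 2020],
    "Value": [1.0, 2.0, 3.0, 4.0],
})


def keep_united_states(df: pd.DataFrame, country_column: str = "Country") -> pd.DataFrame:
    """Return only the rows whose country is the United States (hypothetical helper)."""
    return df[df[country_column] == "United States"].copy()


us_only = keep_united_states(oecd_sample)
print(us_only)
```

Returning a copy keeps later column assignments on the filtered table from triggering pandas' chained-assignment warning.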

### Variable Descriptions

In terms of variable type and measurement scale, the variables in the final dataset can be classified under several combinations:

    Continuous / Ratio variables:
        Median usual weekly earnings
        Value
        Mean earnings per household

    Discrete / Ordinal variables:
        Educational attainment
        Education level

    Discrete / Nominal variables:
        Country
        Place of birth
        Sex / Gender
        Country of birth
        Country of residence

    Discrete / Interval variables:
        Year

    Discrete / Ratio variables:
        Duration of stay
        Rate

Variables that are currently being used are: Country, Place of birth, Educational attainment, Education level, Sex/Gender, Country of birth, Country of residence, Year, Duration of stay, and Rate.
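
One way to make these measurement scales explicit in pandas is sketched below. The rows are placeholders, not values from our datasets, and the education level names (`Low`, `Medium`, `High`) are assumed for illustration. The point is only that ordinal variables can be stored as ordered categoricals and nominal ones as unordered categoricals.

```python
import pandas as pd

# Placeholder rows (made up) using column names from the list above
sample = pd.DataFrame({
    "Place of birth": ["Native-born", "Foreign-born", "Foreign-born"],
    "Education level": ["High", "Low", "Medium"],
    "Year": [2018, 2019, 2020],
    "Value": [70.5, 68.2, 64.9],
})

# Nominal: categories without an inherent order
sample["Place of birth"] = sample["Place of birth"].astype("category")

# Ordinal: categories with an explicit low-to-high order (level names are assumed)
education_order = ["Low", "Medium", "High"]
sample["Education level"] = pd.Categorical(
    sample["Education level"], categories=education_order, ordered=True
)

# Ordered categoricals sort by rank rather than alphabetically
sorted_sample = sample.sort_values("Education level")
print(sorted_sample)
```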

The relevant United States data were gathered from various datasets on https://stats.oecd.org/ (no direct shareable link is available; see the .csv files for the data).

## Visualisations

### Introduction visualisations

For our data story project, "Immigrants Face Many Challenges and Wage/Employment Disparities," we delve into the ongoing struggle immigrants face in achieving equal treatment and opportunities in the workforce. Despite the progress made over the years in many Western countries to enshrine equal rights into their constitutions, immigrants still encounter significant disparities in employment and wages.

The journey toward equality has been a slow march, with incremental developments across decades. While some forms of discrimination, such as those related to access to education or services, are easier to identify and address, wage and employment disparities remain persistent issues. Studies and reports indicate that, even today, immigrants often earn less and face higher unemployment rates compared to native-born citizens. Some argue that non-discriminatory factors such as education, work experience, and language proficiency account for these disparities. However, these explanations do not fully capture the complexity and breadth of the challenges immigrants face.

To gain deeper insights into these disparities, we analyzed data from the U.S. Bureau of Labor Statistics (BLS) report on foreign-born workers in the labor force. The report provides comprehensive data on employment status, occupation, and earnings among immigrants, highlighting the economic realities they navigate.

Firstly, we assessed the overall employment and wage landscape for immigrants. According to the BLS report, in 2022, the median weekly earnings for foreign-born full-time wage and salary workers were 881 dollars, compared to 1,012 dollars for their native-born counterparts. This represents a wage gap of approximately 13%, underscoring the economic challenges faced by immigrants.
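
The gap quoted above can be reproduced with a short calculation. The function name `wage_gap_percent` is our own, and the gap is expressed as a share of native-born earnings, which is an assumption about how the percentage is defined.

```python
def wage_gap_percent(foreign_born: float, native_born: float) -> float:
    """Earnings gap as a percentage of native-born earnings."""
    return (native_born - foreign_born) / native_born * 100


# Median weekly earnings in dollars, as quoted in the paragraph above
gap = wage_gap_percent(foreign_born=881, native_born=1012)
print(f"{gap:.1f}%")
```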

Moreover, the report reveals disparities in employment rates. The unemployment rate for foreign-born individuals stood at 5.6%, while it was 4.8% for native-born workers. This difference indicates that immigrants not only earn less but also struggle more with securing employment.

To understand the factors contributing to these disparities, we examined various elements such as educational attainment and citizenship. For instance, while some immigrants possess high levels of education and expertise, they often face difficulties in having their qualifications recognized, leading to underemployment or employment in lower-paying jobs.

This analysis highlights the complex interplay of factors influencing wage and employment disparities among immigrants. Addressing these issues requires targeted policies and interventions that consider the unique challenges immigrants face, from recognition of foreign credentials to providing support for education. By shedding light on these disparities, our data story project aims to contribute to a broader understanding and encourage efforts towards creating a more equitable labor market for all.



### Wage disparities


Immigrants in the United States often face significant wage disparities and employment challenges compared to their native-born counterparts. These disparities show up as lower median weekly earnings, a higher representation in low-wage jobs, and persistent gender wage gaps.

To illustrate these disparities, we analyzed the median weekly earnings of foreign-born and native-born individuals in the United States for the years 2022 and 2023. The data reveals a consistent pattern where foreign-born workers earn less than native-born workers across different demographic categories.

In [25]:
import pandas as pd
import plotly.express as px

# Load the CSV file
file_path = 'median_usual_weekly_earnings.csv'
data = pd.read_csv(file_path)

# Drop rows that are not relevant (rows with NaN in the earnings columns)
data = data.dropna(subset=['2022_Foreign_Born_Median_Weekly_Earnings', 
                           '2022_Native_Born_Median_Weekly_Earnings', 
                           '2023_Foreign_Born_Median_Weekly_Earnings', 
                           '2023_Native_Born_Median_Weekly_Earnings'])

# Create a long format dataframe suitable for Plotly
data_long = pd.melt(data, 
                    id_vars=['Characteristic'], 
                    value_vars=['2022_Foreign_Born_Median_Weekly_Earnings', 
                                '2022_Native_Born_Median_Weekly_Earnings', 
                                '2023_Foreign_Born_Median_Weekly_Earnings', 
                                '2023_Native_Born_Median_Weekly_Earnings'],
                    var_name='Year_Population', 
                    value_name='Median_Weekly_Earnings')

# Create an interactive line plot with custom color sequence
fig = px.line(data_long, 
              x='Characteristic', 
              y='Median_Weekly_Earnings', 
              color='Year_Population',
              color_discrete_sequence=['#6CABD4', '#80C088', '#00008B', '#006400'],
              labels={
                  "Median_Weekly_Earnings": "Median Weekly Earnings",
                  "Characteristic": "Characteristic"
              },
              title="Median Weekly Earnings in the US by Characteristic (2022 vs 2023)")

fig.update_layout(
    xaxis_title="Characteristic",
    yaxis_title="Median Weekly Earnings",
    legend_title="Year and Population",
    plot_bgcolor='rgba(255, 255, 0, 0.3)'
)

fig.show()


Considering the overall picture for individuals aged 16 years and over, the median weekly earnings of foreign-born workers in 2022 were 945 dollars, compared with 1,087 dollars for native-born workers. In 2023, the figures were 987 dollars and 1,140 dollars respectively. Expressed as a percentage of native-born earnings, foreign-born earnings were 86.9% in 2022 and 86.6% in 2023. These figures indicate a persistent wage gap, with foreign-born workers earning approximately 86-87% of what native-born workers earn.
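
These percentages follow directly from the four medians quoted above. The sketch below recomputes them with pandas. The table is typed in by hand from the text rather than read from `median_usual_weekly_earnings.csv`, and the column labels are our own.

```python
import pandas as pd

# Median weekly earnings (USD) for workers aged 16 and over, as quoted in the text
earnings = pd.DataFrame(
    {"Foreign-born": [945, 987], "Native-born": [1087, 1140]},
    index=pd.Index([2022, 2023], name="Year"),
)

# Foreign-born earnings as a percentage of native-born earnings
earnings["Ratio (%)"] = (earnings["Foreign-born"] / earnings["Native-born"] * 100).round(1)

# Absolute weekly gap in dollars
earnings["Gap (USD)"] = earnings["Native-born"] - earnings["Foreign-born"]

print(earnings)
```

The ratio changes only slightly between the two years, while the absolute gap grows from 142 to 153 dollars per week.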

The persistent wage disparities between foreign-born and native-born workers highlight the challenges immigrants face in achieving economic parity. These disparities are evident across genders and age groups, underscoring the need for targeted policies and interventions to support immigrant workers. Actions such as recognizing foreign qualifications and reducing discrimination can help bridge this gap and promote greater economic equality (European Commission, 2022).

### Employment rate

Examining the chances of employment is crucial in understanding the disparities between foreign-born and native-born individuals. Equal employment opportunities are essential for providing immigrants with fair chances in the U.S. economy. By analyzing employment rates across the different groups, we can observe the following trends:

In [18]:
import plotly.express as px
import pandas as pd

# Load the CSV file
file_path_employment = 'us_employmentunemployment.csv'
df_employment = pd.read_csv(file_path_employment)

# Filter data for employment rates (copy so that new columns can be added
# without triggering pandas' SettingWithCopyWarning)
employment_data = df_employment[df_employment['Rate'] == 'Employment rate'].copy()

# Define colors for each combination of Gender and Place of birth
color_map = {
    ('Men', 'Native-born'): 'lightblue',
    ('Men', 'Foreign-born'): 'darkblue',
    ('Women', 'Native-born'): 'lightpink',
    ('Women', 'Foreign-born'): 'darkred',
    ('Total', 'Native-born'): 'lightgreen',
    ('Total', 'Foreign-born'): 'darkgreen'
}

# Label each row with its Gender / Place of birth combination
employment_data['Group'] = employment_data['Gender'] + ', ' + employment_data['Place of birth']

# Key the colors by the same labels so that Plotly applies them to the markers
group_colors = {f'{gender}, {birth}': color for (gender, birth), color in color_map.items()}

# Create a scatter plot for employment rate by gender and place of birth
fig_scatter = px.scatter(employment_data, x='Year', y='Value', color='Group', symbol='Place of birth',
                         color_discrete_map=group_colors,
                         title='Employment Rate in the US by Gender and Place of Birth (2000-2020)',
                         labels={'Value': 'Employment Rate (%)', 'Year': 'Year', 'Group': 'Gender, Place of birth'},
                         trendline='ols',  # Adding trendline for Ordinary Least Squares (OLS) regression
                         )

fig_scatter.update_layout(
    plot_bgcolor='rgba(255, 255, 0, 0.3)'
)

fig_scatter.show()






The graph gives us a new insight into the topic. It shows that foreign-born men have a higher employment rate than native-born men, whereas native-born women have a higher employment rate than foreign-born women. Taken together, the employment rates of both groups are almost the same, with the rate for the foreign-born even slightly higher in the last couple of years. This suggests that the chance of finding a job is almost equal for both groups.
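
The comparison of the two totals can be made explicit by pivoting the long-format table and subtracting one rate from the other. The sketch below uses made-up placeholder values, not figures from `us_employmentunemployment.csv`; only the column names and category labels mirror the ones used in the code above.

```python
import pandas as pd

# Placeholder long-format rows (made-up values) mirroring the columns used above
rows = [
    (2019, "Total", "Native-born", 70.0),
    (2019, "Total", "Foreign-born", 71.0),
    (2020, "Total", "Native-born", 66.0),
    (2020, "Total", "Foreign-born", 65.5),
]
employment = pd.DataFrame(rows, columns=["Year", "Gender", "Place of birth", "Value"])

# Keep the totals, then reshape to one row per year and one column per place of birth
totals = employment[employment["Gender"] == "Total"]
by_year = totals.pivot(index="Year", columns="Place of birth", values="Value")

# Gap in percentage points: positive means the foreign-born rate is higher
by_year["Gap (pp)"] = by_year["Foreign-born"] - by_year["Native-born"]

print(by_year)
```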

However, despite the similar employment rates, there is still a significant wage gap between the two groups. Reasons for this gap may be the representation of the groups in lower-paid jobs or the level of education per group. There is also a big difference in employment rate between foreign-born men and women. This may be because the men are more likely to work while the women more often stay at home to take care of the household (Women in the States, 2015).

### Education

Higher educational attainment is generally associated with higher-paying jobs. Knowing that a wage gap exists while the chance of employment is about the same, we next look at the education of both groups. Education can influence how much an individual earns, and this may be a cause of the existing wage gap. Visualising the results showed us the following:

In [27]:
import pandas as pd
import plotly.graph_objects as go

# Load the CSV file
file_path = 'Final_Corrected_Educational_Attainment_Data_Correct_Categories.csv'
data = pd.read_csv(file_path)

# Clean the data to extract relevant columns and convert percentages to numeric values
data_cleaned = data[['Label (Grouping)', 'United States!!Native!!Estimate', 'United States!!Foreign born!!Estimate']].copy()
data_cleaned.columns = data_cleaned.columns.str.strip()
data_cleaned['Label (Grouping)'] = data_cleaned['Label (Grouping)'].str.strip()
data_cleaned['United States!!Native!!Estimate'] = data_cleaned['United States!!Native!!Estimate'].str.strip('%').astype(float)
data_cleaned['United States!!Foreign born!!Estimate'] = data_cleaned['United States!!Foreign born!!Estimate'].str.strip('%').astype(float)

# Calculate the median values for each educational level
median_values = data_cleaned.groupby('Label (Grouping)').median().reset_index()

# Ensure the order of educational levels is from low to high
order = [
    'Less than high school graduate', 
    'High school graduate (includes equivalency)',
    "Some college or associate's degree",
    "Bachelor's degree",
    'Graduate or professional degree'
]
median_values['Label (Grouping)'] = pd.Categorical(median_values['Label (Grouping)'], categories=order, ordered=True)
median_values = median_values.sort_values('Label (Grouping)')

# Create the bar chart using Plotly for interactivity
fig = go.Figure()

# Add traces for native and foreign-born individuals
fig.add_trace(go.Bar(
    y=median_values['Label (Grouping)'],
    x=median_values['United States!!Native!!Estimate'],
    name='Native',
    orientation='h',
    marker=dict(color='forestgreen'),
    textposition='auto'
))

fig.add_trace(go.Bar(
    y=median_values['Label (Grouping)'],
    x=median_values['United States!!Foreign born!!Estimate'],
    name='Foreign Born',
    orientation='h',
    marker=dict(color='royalblue'),
    textposition='auto'
))

# Update the layout for the desired visual style
fig.update_layout(
    barmode='group',
    bargap=0.2,
    bargroupgap=0.0,
    plot_bgcolor='rgba(0, 0, 0, 0)',
    paper_bgcolor='rgba(255, 255, 0, 0.3)',
    legend_bgcolor='rgba(0, 0, 0, 0)',
    font=dict(color='black'),
    legend_title_text='Nativity',
    xaxis_title='Percentage',
    yaxis_title='Level of education',
    title='Median Educational Attainment in the US: Foreign-Born VS Native-Born (2017-2022, Excluding 2020)',
    title_x=0.5,
    xaxis=dict(
        showgrid=True,
        zeroline=True,
        dtick=10,
        title=dict(
            text='Percentage',
            font=dict(size=18)  # Change the font size here
        ),
        tickfont=dict(size=14)  # Change the font size of x-axis ticks
    ),
    yaxis=dict(
        showgrid=True,
        zeroline=True,
        title=dict(
            text='Level of education',
            font=dict(size=18)  # Change the font size here
        ),
        tickfont=dict(size=14)  # Change the font size of y-axis ticks
    )
)

# Show the figure
fig.show()


The graph illustrates a clear disparity in educational attainment between foreign-born and native-born individuals in the United States. A higher percentage of foreign-born individuals did not complete high school, while native-born individuals more often have at least a high school diploma or some college education. At the higher education levels, the two groups are similar. The key difference therefore lies at the lower end of the distribution, and this educational gap is a potential contributing factor to the existing wage gap.

However, the foreign-born population can be divided into two groups: foreign-born naturalized citizens (referred to below as naturalized citizens) and foreign-born individuals who are not U.S. citizens (referred to below as non-U.S. citizens). A naturalized citizen was born outside the U.S. and has since obtained U.S. citizenship. To be eligible to apply, a person generally must have been a lawful permanent resident for the past 5 years and have been physically present in the U.S. for at least 30 months of those 5 years (CitizenPath, 2024). A non-U.S. citizen was born outside the U.S. and has not obtained U.S. citizenship; they may meet the requirements and be waiting for a decision on their application, or they may not yet meet the requirements and therefore cannot apply.

To see if being a naturalized citizen might correlate to a higher education than being a non-US citizen, we compared the level of education of both groups with each other. The results show the following:

In [29]:
import pandas as pd
import plotly.graph_objects as go

# Load the CSV file
file_path = 'Final_Corrected_Educational_Attainment_Data_Correct_Categories.csv'
data = pd.read_csv(file_path)

# Clean the data to extract relevant columns and convert percentages to numeric values
data_cleaned = data[['Label (Grouping)', 'United States!!Foreign born; Naturalized citizen!!Estimate', 'United States!!Foreign born; Not a U.S. citizen!!Estimate']].copy()
data_cleaned.columns = data_cleaned.columns.str.strip()
data_cleaned['Label (Grouping)'] = data_cleaned['Label (Grouping)'].str.strip()
data_cleaned['United States!!Foreign born; Naturalized citizen!!Estimate'] = data_cleaned['United States!!Foreign born; Naturalized citizen!!Estimate'].str.strip('%').astype(float)
data_cleaned['United States!!Foreign born; Not a U.S. citizen!!Estimate'] = data_cleaned['United States!!Foreign born; Not a U.S. citizen!!Estimate'].str.strip('%').astype(float)

# Calculate the median values for each educational level
median_values = data_cleaned.groupby('Label (Grouping)').median().reset_index()

# Ensure the order of educational levels is from low to high
order = [
    'Less than high school graduate', 
    'High school graduate (includes equivalency)',
    "Some college or associate's degree",
    "Bachelor's degree",
    'Graduate or professional degree'
]
median_values['Label (Grouping)'] = pd.Categorical(median_values['Label (Grouping)'], categories=order, ordered=True)
median_values = median_values.sort_values('Label (Grouping)')

# Create the bar chart using Plotly for interactivity
fig = go.Figure()

# Add traces for naturalized citizens and non-U.S. citizens
fig.add_trace(go.Bar(
    y=median_values['Label (Grouping)'],
    x=median_values['United States!!Foreign born; Naturalized citizen!!Estimate'],
    name='Naturalized Citizen',
    orientation='h',
    marker=dict(color='lightgreen'),
    textposition='auto'
))

fig.add_trace(go.Bar(
    y=median_values['Label (Grouping)'],
    x=median_values['United States!!Foreign born; Not a U.S. citizen!!Estimate'],
    name='Non-U.S. Citizen',
    orientation='h',
    marker=dict(color='lightcoral'),
    textposition='auto'
))

# Update the layout for the desired visual style
fig.update_layout(
    barmode='group',
    bargap=0.2,
    bargroupgap=0.0,
    plot_bgcolor='rgba(0, 0, 0, 0)',
    paper_bgcolor='rgba(255, 255, 0, 0.3)',
    legend_bgcolor='rgba(0, 0, 0, 0)',
    font=dict(color='black'),
    legend_title_text='Citizenship status',
    xaxis_title='Percentage',
    yaxis_title='Level of education',
    title='Median Educational Attainment in the US: Naturalized Citizen VS Non-U.S. Citizen (2017-2022, Excluding 2020)',
    title_x=0.5,
    xaxis=dict(
        showgrid=True,
        zeroline=True,
        dtick=10,
        title=dict(
            text='Percentage',
            font=dict(size=18)  # Change the font size here
        ),
        tickfont=dict(size=14)  # Change the font size of x-axis ticks
    ),
    yaxis=dict(
        showgrid=True,
        zeroline=True,
        title=dict(
            text='Level of education',
            font=dict(size=18)  # Change the font size here
        ),
        tickfont=dict(size=14)  # Change the font size of y-axis ticks
    )
)

# Show the figure
fig.show()


This shows us that there is a disparity in educational attainment between naturalized citizens and non-U.S. citizens: naturalized citizens have, on the whole, higher levels of education than non-U.S. citizens. Compared with the previous findings for foreign-born versus native-born individuals, this gives a more nuanced picture. While foreign-born individuals in general tend to have lower educational attainment than native-born individuals, breaking down the foreign-born group reveals that naturalized citizens are more likely than non-U.S. citizens to have completed high school or to hold a higher degree.

One possible explanation for this phenomenon is that naturalized citizens benefit from a more stable legal status, which provides them with access to educational resources and scholarships. Specifically, naturalized citizens can take advantage of in-state tuition rates at several public schools and universities, which offer discounted tuition fees. This support enables naturalized citizens to pursue higher education and improve their educational attainment, narrowing the gap with native-born individuals (Admin, 2024).

### Naturalized vs. Non-US Citizen

We now know that there is a disparity in education between native-born and foreign-born individuals. We also know that the foreign-born group can be separated into naturalized citizens and non-citizens, and that naturalized citizens have higher educational attainment than non-citizens. Keeping this in mind, let us compare the financial success of native citizens, naturalized citizens, and non-citizens by looking at their mean incomes per state.

In [28]:
import pandas as pd
import plotly.graph_objects as go

# Load the data
file_path = 'Row_2_without_Foreign_Born_Estimate.csv'
data = pd.read_csv(file_path)

# Extract the relevant columns for plotting
data_to_plot = data.iloc[0].to_frame().reset_index()
data_to_plot.columns = ['State_Earnings', 'Value']

# Split the 'State_Earnings' column to get the state and the type of earnings
data_to_plot[['State', 'Earnings_Type']] = data_to_plot['State_Earnings'].str.extract(r'([A-Za-z\s]+)!!Foreign born; (Naturalized citizen|Not a U.S. citizen)!!Estimate')

# Remove any rows that did not match the pattern
data_to_plot.dropna(subset=['State', 'Earnings_Type'], inplace=True)

# Pivot the dataframe to get a single row per state with two columns for earnings
data_to_plot_pivot = data_to_plot.pivot(index='State', columns='Earnings_Type', values='Value').reset_index()

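# Add Native earnings data, keeping only the "<State>!!Native!!Estimate" columns
# (assumes the same column naming as the other groups in this file)
native_earnings = data.iloc[0].to_frame().reset_index()
native_earnings.columns = ['State_Earnings', 'Native']
native_earnings = native_earnings[native_earnings['State_Earnings'].str.contains('!!Native!!', regex=False)].copy()
native_earnings['State'] = native_earnings['State_Earnings'].str.split('!!', n=1).str[0]
native_earnings = native_earnings[['State', 'Native']]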

# Merge with the pivot data
data_to_plot_pivot = pd.merge(data_to_plot_pivot, native_earnings, on='State')

# Convert earnings to numeric values
data_to_plot_pivot['Naturalized citizen'] = data_to_plot_pivot['Naturalized citizen'].str.replace(',', '').astype(float)
data_to_plot_pivot['Not a U.S. citizen'] = data_to_plot_pivot['Not a U.S. citizen'].str.replace(',', '').astype(float)
data_to_plot_pivot['Native'] = data_to_plot_pivot['Native'].str.replace(',', '').astype(float)

# Ensure no duplicate states
data_to_plot_pivot = data_to_plot_pivot.drop_duplicates(subset=['State'])

# Standardize state names to match Plotly's USA-states format
state_abbrev = {
    'Alabama': 'AL', 'Alaska': 'AK', 'Arizona': 'AZ', 'Arkansas': 'AR', 'California': 'CA', 'Colorado': 'CO',
    'Connecticut': 'CT', 'Delaware': 'DE', 'Florida': 'FL', 'Georgia': 'GA', 'Hawaii': 'HI', 'Idaho': 'ID',
    'Illinois': 'IL', 'Indiana': 'IN', 'Iowa': 'IA', 'Kansas': 'KS', 'Kentucky': 'KY', 'Louisiana': 'LA',
    'Maine': 'ME', 'Maryland': 'MD', 'Massachusetts': 'MA', 'Michigan': 'MI', 'Minnesota': 'MN', 'Mississippi': 'MS',
    'Missouri': 'MO', 'Montana': 'MT', 'Nebraska': 'NE', 'Nevada': 'NV', 'New Hampshire': 'NH', 'New Jersey': 'NJ',
    'New Mexico': 'NM', 'New York': 'NY', 'North Carolina': 'NC', 'North Dakota': 'ND', 'Ohio': 'OH', 'Oklahoma': 'OK',
    'Oregon': 'OR', 'Pennsylvania': 'PA', 'Rhode Island': 'RI', 'South Carolina': 'SC', 'South Dakota': 'SD',
    'Tennessee': 'TN', 'Texas': 'TX', 'Utah': 'UT', 'Vermont': 'VT', 'Virginia': 'VA', 'Washington': 'WA',
    'West Virginia': 'WV', 'Wisconsin': 'WI', 'Wyoming': 'WY', 'District of Columbia': 'DC', 'Puerto Rico': 'PR'
}

data_to_plot_pivot['State'] = data_to_plot_pivot['State'].map(state_abbrev)

# Calculate the differences
data_to_plot_pivot['Difference'] = data_to_plot_pivot['Naturalized citizen'] - data_to_plot_pivot['Native']
data_to_plot_pivot['Not US Citizen - Native'] = data_to_plot_pivot['Not a U.S. citizen'] - data_to_plot_pivot['Native']
data_to_plot_pivot['Not US Citizen - Naturalized'] = data_to_plot_pivot['Not a U.S. citizen'] - data_to_plot_pivot['Naturalized citizen']

# Determine the overall min and max earnings across all groups
min_earnings = data_to_plot_pivot[['Native', 'Naturalized citizen', 'Not a U.S. citizen']].min().min()
max_earnings = data_to_plot_pivot[['Native', 'Naturalized citizen', 'Not a U.S. citizen']].max().max()

# Determine a symmetric range for the differences so that 0 sits at the midpoint
# of the diverging color scale (zmid has no effect once zmin and zmax are set)
difference_columns = ['Difference', 'Not US Citizen - Native', 'Not US Citizen - Naturalized']
max_abs_difference = data_to_plot_pivot[difference_columns].abs().max().max()
min_difference = -max_abs_difference
max_difference = max_abs_difference

# Create the choropleth traces
fig = go.Figure()

# Create traces for each category
traces = [
    go.Choropleth(
        locations=data_to_plot_pivot['State'],
        z=data_to_plot_pivot['Native'],
        locationmode='USA-states',
        colorscale='Viridis',
        zmin=min_earnings,
        zmax=max_earnings,
        visible=True,
        name="Native",
        colorbar=dict(title="Mean Earnings", tickprefix="$"),
        hovertemplate='<b>%{location}</b><br>Native: $%{z:,.2f}<extra></extra>'
    ),
    go.Choropleth(
        locations=data_to_plot_pivot['State'],
        z=data_to_plot_pivot['Naturalized citizen'],
        locationmode='USA-states',
        colorscale='Viridis',
        zmin=min_earnings,
        zmax=max_earnings,
        visible=False,
        name="Naturalized citizen",
        hovertemplate='<b>%{location}</b><br>Naturalized: $%{z:,.2f}<extra></extra>'
    ),
    go.Choropleth(
        locations=data_to_plot_pivot['State'],
        z=data_to_plot_pivot['Not a U.S. citizen'],
        locationmode='USA-states',
        colorscale='Viridis',
        zmin=min_earnings,
        zmax=max_earnings,
        visible=False,
        name="Not a U.S. citizen",
        hovertemplate='<b>%{location}</b><br>Not a U.S. citizen: $%{z:,.2f}<extra></extra>'
    ),
    go.Choropleth(
        locations=data_to_plot_pivot['State'],
        z=data_to_plot_pivot['Difference'],
        locationmode='USA-states',
        colorscale='RdBu',
        zmid=0,
        zmin=min_difference,
        zmax=max_difference,
        visible=False,
        name="Naturalized - Native",
        colorbar=dict(title="Difference", tickprefix="$"),
        hovertemplate='<b>%{location}</b><br>Difference: $%{z:,.2f}<extra></extra>'
    ),
    go.Choropleth(
        locations=data_to_plot_pivot['State'],
        z=data_to_plot_pivot['Not US Citizen - Native'],
        locationmode='USA-states',
        colorscale='RdBu',
        zmid=0,
        zmin=min_difference,
        zmax=max_difference,
        visible=False,
        name="Not US Citizen - Native",
        colorbar=dict(title="Difference", tickprefix="$"),
        hovertemplate='<b>%{location}</b><br>Not US Citizen - Native: $%{z:,.2f}<extra></extra>'
    ),
    go.Choropleth(
        locations=data_to_plot_pivot['State'],
        z=data_to_plot_pivot['Not US Citizen - Naturalized'],
        locationmode='USA-states',
        colorscale='RdBu',
        zmid=0,
        zmin=min_difference,
        zmax=max_difference,
        visible=False,
        name="Not US Citizen - Naturalized",
        colorbar=dict(title="Difference", tickprefix="$"),
        hovertemplate='<b>%{location}</b><br>Not US Citizen - Naturalized: $%{z:,.2f}<extra></extra>'
    )
]

# Add traces to the figure
for trace in traces:
    fig.add_trace(trace)

# Update layout with buttons and annotations
fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(
                    args=[{'visible': [True, False, False, False, False, False]},
                          {'annotations': []}],
                    label="Native",
                    method="update"
                ),
                dict(
                    args=[{'visible': [False, True, False, False, False, False]},
                          {'annotations': []}],
                    label="Naturalized citizen",
                    method="update"
                ),
                dict(
                    args=[{'visible': [False, False, True, False, False, False]},
                          {'annotations': []}],
                    label="Not a U.S. citizen",
                    method="update"
                ),
                dict(
                    args=[{'visible': [False, False, False, True, False, False]},
                          {'annotations': [
                              dict(
                                  x=0.5,
                                  y=1.15,  # Adjust this value to move the text higher
                                  xref='paper',
                                  yref='paper',
                                  showarrow=False,
                                  text='Red: Native earns more, Blue: Naturalized earns more',
                                  font=dict(
                                      size=12
                                  ),
                                  align='center'
                              )
                          ]}],
                    label="Naturalized - Native",
                    method="update"
                ),
                dict(
                    args=[{'visible': [False, False, False, False, True, False]},
                          {'annotations': [
                              dict(
                                  x=0.5,
                                  y=1.15,  # Adjust this value to move the text higher
                                  xref='paper',
                                  yref='paper',
                                  showarrow=False,
                                  text='Red: Native earns more, Blue: Not a U.S. citizen earns more',
                                  font=dict(
                                      size=12
                                  ),
                                  align='center'
                              )
                          ]}],
                    label="Not US Citizen - Native",
                    method="update"
                ),
                dict(
                    args=[{'visible': [False, False, False, False, False, True]},
                          {'annotations': [
                              dict(
                                  x=0.5,
                                  y=1.15,  # Adjust this value to move the text higher
                                  xref='paper',
                                  yref='paper',
                                  showarrow=False,
                                  text='Red: Naturalized earns more, Blue: Not a U.S. citizen earns more',
                                  font=dict(
                                      size=12
                                  ),
                                  align='center'
                              )
                          ]}],
                    label="Not US Citizen - Naturalized",
                    method="update"
                )
            ]),
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.17,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

# Update layout for better visuals
fig.update_layout(
    title_text='Mean Earnings by State and Citizenship Status',
    geo=dict(
        scope='usa',
        projection_type='albers usa',
        showlakes=True,
        lakecolor='rgb(255, 255, 255)'
    )
)

fig.show()

This interactive graph shows the difference in mean incomes between native citizens, naturalized citizens and non-citizens. The first three options in the dropdown menu let you view each group's mean income per state separately. The color scale runs from the lowest mean income found across all three groups to the highest, so the same scale applies to every group. Because the colors follow this shared scale, the maps show clearly which group earns more across the states: the highest mean earnings appear in yellow/green and the lowest in blue/purple. Hovering over a state on any of these maps shows the mean income for that group in that state.
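The shared scale described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the DataFrame, its column names and all dollar values are placeholders.

```python
import pandas as pd

# Illustrative mean earnings per state. The values and column names are
# placeholders, not figures from the project's dataset.
earnings = pd.DataFrame(
    {
        "state": ["CA", "TX", "NY"],
        "native": [52000, 47000, 55000],
        "naturalized": [58000, 49000, 60000],
        "non_citizen": [39000, 35000, 42000],
    }
)

groups = ["native", "naturalized", "non_citizen"]

# One global minimum and maximum across all three groups, so that every
# map translates a given dollar amount into the same color.
shared_min = earnings[groups].min().min()
shared_max = earnings[groups].max().max()

print(f"Shared color range: {shared_min} to {shared_max}")
```

These two values could then be passed as `zmin` and `zmax` to each `go.Choropleth` trace so that all three maps use an identical color range.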

With this in mind, the native map sits mostly in the top half of the scale: it is largely blue-green, with some states fairly green. The naturalized-citizen map is even greener, which puts this group at the high end of the scale, largely outearning native citizens. The last of the individual groups, the non-citizens, produces a predominantly blue map, showing that non-citizens earn considerably less than the other two groups.

To visualize the differences in mean income between these groups more clearly, we made the last three maps. Again, the same scale is used across these three maps for cohesion and clarity. The map 'Naturalized - Native' subtracts the native citizens' mean incomes from the naturalized citizens' mean incomes; the color then shows which group earns more in each state. States on this map are either strongly blue, lightly blue or lightly red. This indicates that naturalized citizens either noticeably outearn native citizens or come close to their earnings.

Using the same scale, the map 'Not US Citizen - Native' shows the difference between non-citizens and native citizens. In keeping with the previous map, it subtracts the native citizens' mean incomes from the non-citizens' mean incomes. Here the map leans heavily toward red, with barely any state leaning toward blue. This clearly shows that non-citizens earn noticeably less than native citizens, in sharp contrast to naturalized citizens, who noticeably outearn them.

Lastly, the map 'Not US Citizen - Naturalized' presents the disparity between naturalized citizens and non-citizens by subtracting the naturalized citizens' mean incomes from the non-citizens' mean incomes. This map is entirely red, in many states heavily so. This shows that the largest difference in earnings is between naturalized citizens and non-citizens.
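The three difference maps rest on simple per-state subtractions. The sketch below shows one way to compute them along with a symmetric bound for a diverging color scale. The data and column names are placeholders, and the symmetric bound is an assumption about how such a scale could be centered on zero, not necessarily how the notebook does it.

```python
import pandas as pd

# Placeholder mean earnings per state (not the project's real figures).
earnings = pd.DataFrame(
    {
        "state": ["CA", "TX", "NY"],
        "native": [52000, 47000, 55000],
        "naturalized": [58000, 49000, 60000],
        "non_citizen": [39000, 35000, 42000],
    }
).set_index("state")

# Each column follows the map labels: first-named group minus second-named group.
differences = pd.DataFrame(
    {
        "Naturalized - Native": earnings["naturalized"] - earnings["native"],
        "Not US Citizen - Native": earnings["non_citizen"] - earnings["native"],
        "Not US Citizen - Naturalized": earnings["non_citizen"] - earnings["naturalized"],
    }
)

# Largest absolute difference over all three maps. Using -bound..+bound as the
# color range keeps zero in the middle of the scale on every map.
bound = differences.abs().max().max()

print(differences)
print(f"Symmetric color range: {-bound} to {bound}")
```

A positive value means the first-named group earns more in that state (blue on the maps), while a negative value means the second-named group earns more (red).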

Evaluating the results, naturalized citizens, who have stayed longer and had more time to adapt, on average outearn native citizens. This suggests that education plays a large role in the wage gap, because foreign-born workers with higher education earn more than even native citizens. The problem therefore seems to be that non-citizens fall behind in education, to which they gain easier access once they become naturalized citizens.

### Duration of stay

The duration of stay is a significant factor influencing employment rates among foreign-born individuals in the US. The initial years are often marked by adjustment challenges, but as the duration of stay increases, so do employment opportunities, access to education and stability. As was just shown, naturalized citizens are the best-performing group in terms of mean income. Despite this, foreign-born workers as a whole still earn less than native citizens, which suggests that the lower earnings of non-citizens pull the overall foreign-born average down. So if more non-citizens applied for citizenship, were accepted and became more educated, the data suggests the foreign-born would start outearning native citizens on average. Because citizenship generally requires a five-year stay in the US, let us see what percentage of the foreign-born have stayed five years or longer and whether this matches our hypothesis.
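The reasoning above is essentially a weighted average: the overall foreign-born mean depends on how the population is split between naturalized citizens and non-citizens. The sketch below illustrates this with purely hypothetical earnings figures; the function name and all numbers are assumptions chosen for illustration only.

```python
def overall_mean(share_naturalized, mean_naturalized, mean_non_citizen):
    """Mean earnings of all foreign-born workers as a weighted average."""
    share_non_citizen = 1.0 - share_naturalized
    return (share_naturalized * mean_naturalized
            + share_non_citizen * mean_non_citizen)


# Hypothetical annual mean earnings, chosen only to illustrate the mechanism.
mean_native = 50_000
mean_naturalized = 55_000
mean_non_citizen = 38_000

for share in (0.5, 0.7, 0.9):
    combined = overall_mean(share, mean_naturalized, mean_non_citizen)
    verdict = "above" if combined > mean_native else "below"
    print(f"{share:.0%} naturalized: overall mean {combined:,.0f} ({verdict} native mean)")
```

Under these assumed numbers, the foreign-born average only passes the native average once a large share of the group is naturalized, which is the mechanism the hypothesis relies on. It also assumes that newly naturalized citizens would reach the earnings of current naturalized citizens.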

In [23]:
import plotly.graph_objs as go
import plotly.offline as pyo
import pandas as pd
import plotly.express as px

# Load the CSV file
file_path = 'duration.csv'
df = pd.read_csv(file_path)

# Filter the data for North America and foreign-born individuals
north_america_data = df[(df['Country of birth'] == 'North America') & (df['Place of birth'] == 'Foreign-born')]

# Exclude the 'All durations of stay' category
north_america_data = north_america_data[north_america_data['Duration of stay'] != 'All durations of stay']

# Group by duration of stay and sum the values
duration_summary = north_america_data.groupby('Duration of stay')['Value'].sum().reset_index()

# Calculate the total number of people
total_people = duration_summary['Value'].sum()

# Data for the plot
durations = duration_summary['Duration of stay']
values = duration_summary['Value']

# Create the donut chart
trace = go.Pie(
    labels=durations,
    values=values,
    hole=0.8,  # This creates the hole in the middle of the pie chart to make it a donut chart
    marker=dict(colors=px.colors.qualitative.T10)  # Set the colors for the pie chart
)

layout = go.Layout(
    title='Duration of Stay of Foreign-born Individuals in North America',
    height=600,
    annotations=[dict(
        x=0.5,
        y=-0.2,
        showarrow=False,
        text=f'Total: {total_people:,}',
        xref='paper',
        yref='paper',
        font=dict(size=18)  # Set the font size for the annotation
    )]
)

fig = go.Figure(data=[trace], layout=layout)

fig.update_layout(showlegend=False)
fig.update_traces(textinfo='label+percent', textposition='outside')

# Display the plot
pyo.iplot(fig)


The donut chart shows the duration of stay of foreign-born individuals in the U.S., divided into four groups. Adding the 5-10 years group and the more-than-10-years group gives a total of 76.43% of foreign-born individuals who have lived in the US for five or more years. The true share could be even higher, since part of the group with an unknown duration of stay may also have stayed longer than five years.
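The combined share of the two long-stay groups can also be computed directly rather than read off the chart. The sketch below assumes a summary table shaped like `duration_summary` from the cell above; the category labels and counts are placeholders and may not match the labels in `duration.csv`.

```python
import pandas as pd

# Placeholder counts and category labels. The real labels in duration.csv
# may differ, so adjust `long_stay` accordingly.
duration_summary = pd.DataFrame(
    {
        "Duration of stay": [
            "Less than 5 years",
            "5 to 10 years",
            "More than 10 years",
            "Unknown",
        ],
        "Value": [150, 120, 680, 50],
    }
)

long_stay = ["5 to 10 years", "More than 10 years"]

total = duration_summary["Value"].sum()
is_long_stay = duration_summary["Duration of stay"].isin(long_stay)
long_total = duration_summary.loc[is_long_stay, "Value"].sum()

# Share of foreign-born individuals with a stay of five years or more.
share_pct = 100 * long_total / total
print(f"Stay of 5+ years: {share_pct:.2f}%")
```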

Although close to 80% of foreign-born individuals are eligible to apply for citizenship, the actual split between naturalized citizens and non-citizens is closer to 55/45 in favor of naturalized citizens (U.S. Census Bureau, 2022), in contrast to an expected split of closer to 70/30. This means that more than 20% of foreign-born individuals could still obtain citizenship, become more educated and on average outearn native citizens. This would diminish the disparity between the foreign-born as a whole and native citizens, reinforcing the claim that a lack of education plays a very large role, rather than discrimination. Discrimination can still play a role in the disparity, but that is a topic for further discussion.
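The "more than 20%" figure follows from a simple subtraction of the two percentages quoted above. The sketch assumes that all current naturalized citizens fall within the group that has stayed five years or longer.

```python
# Percentages quoted in the text above.
eligible_pct = 76.43     # foreign-born with a stay of 5+ years (donut chart)
naturalized_pct = 55.0   # approximate naturalized share (U.S. Census Bureau, 2022)

# Assumption: every naturalized citizen is counted in the 5+ years group,
# so the remainder is the share that could still apply for citizenship.
gap_pct = eligible_pct - naturalized_pct
print(f"Potentially eligible but not naturalized: {gap_pct:.2f} percentage points")
```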

## Summary


Immigrants in the United States face significant challenges in the job market, including lower wages and difficulties in securing employment compared to native-born workers. Our analysis reveals that these disparities persist despite efforts to ensure fairness. Factors such as age, country of origin, and educational attainment play crucial roles in influencing the annual salaries of immigrants.

Data shows a consistent wage gap between foreign-born and native-born workers. In 2022, the median weekly earnings for foreign-born workers were 945 dollars, while native-born workers earned 1,087 dollars. In 2023, the median weekly earnings for foreign-born workers increased to 987 dollars, while native-born workers earned 1,140 dollars. This results in foreign-born workers earning approximately 87% of what native-born workers earn.
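The earnings ratio can be checked directly from the median weekly earnings quoted above:

```python
# Median weekly earnings in dollars: (foreign-born, native-born), as quoted above.
weekly_earnings = {
    2022: (945, 1087),
    2023: (987, 1140),
}

# Foreign-born earnings as a fraction of native-born earnings, per year.
ratios = {
    year: foreign_born / native_born
    for year, (foreign_born, native_born) in weekly_earnings.items()
}

for year, ratio in ratios.items():
    print(f"{year}: {ratio:.1%}")
# 2022: 86.9%
# 2023: 86.6%
```

Both years round to about 87%, and the ratio is slightly lower in 2023 than in 2022.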

Employment rates also highlight disparities. While foreign-born men have a higher employment rate than native-born men, native-born women have a higher employment rate than foreign-born women. Overall, the employment rates of both groups are nearly equal, with a slight advantage for foreign-born individuals in recent years. However, the wage gap remains significant, suggesting that foreign-born workers are more likely to be employed in lower-paying jobs.

Educational attainment is a critical factor contributing to wage disparities. Foreign-born individuals generally have lower levels of education, with a higher percentage not completing high school compared to native-born individuals. However, when examining foreign-born individuals who are naturalized citizens versus those who are non-citizens, naturalized citizens tend to have higher levels of education and, consequently, higher earnings. Naturalized citizens often outperform native-born workers in terms of earnings, while non-citizens significantly underperform.

The duration of stay in the U.S. also influences employment outcomes. As the length of stay increases, so do employment opportunities and earnings. Naturalized citizens, who have typically resided in the U.S. for at least five years, tend to have better job prospects and higher incomes compared to non-citizens. The duration of stay also directly relates to the chances of obtaining citizenship and therefore of benefiting more from the country's resources.

In summary, while immigrants face numerous challenges in the U.S. labor market, targeted policies and interventions focusing on education, recognition of foreign qualifications and access to citizenship are essential to bridge the wage gap and promote greater economic equality.


## References

Bureau of Labor Statistics. (2024, May). Forbrn.pdf. \
https://www.bls.gov/news.release/pdf/forbrn.pdf

U.S. Census Bureau. (2022). ACS 5-Year Estimates Subject Tables: S0501 - Selected Characteristics of the Native and Foreign-Born Populations \
https://data.census.gov/table/ACSST5Y2022.S0501?q=S0501:%20SELECTED%20CHARACTERISTICS%20OF%20THE%20NATIVE%20AND%20FOREIGN-BORN%20POPULATIONS&moe=false

The migrant pay gap: Understanding wage differences between migrants and nationals. (2020, December). European Commission. \
https://migrant-integration.ec.europa.eu/library-document/migrant-pay-gap-understanding-wage-differences-between-migrants-and-nationals_en

Immigrant women - women in the states. (2015, April 15). Women in the States. \
https://statusofwomendata.org/immigrant-women/

CitizenPath. (2024, February 10). Citizenship requirements for 5-Year permanent resident. \
https://citizenpath.com/citizenship-requirements-5-year-permanent-resident/

Admin. (2024, May 13). Top 12 benefits of having a US citizenship | Passport Legacy. Passport Legacy. \
https://passportlegacy.com/benefits-of-having-a-us-citizenship/

## Reflection

**Peer Feedback**

On June 20th, 2024, our team, J8, met with group J1 for a peer feedback session. During this meeting, each group presented their work from the past three weeks. After the presentations, we provided each other with constructive feedback on what was done well and areas that could be improved.

Group J1 praised our data story for its engaging subject and clear visualizations. They particularly liked our use of line graphs to represent the data, which made it easier to understand the employment and wage disparities among immigrants. They also mentioned that our narrative was well-structured and effectively conveyed the key findings and arguments.

Here are the specific points of feedback from group J1:

**·** Visualization Variety: They praised our choice of different graph types, which made the data more engaging and accessible. However, they suggested exploring additional visualizations that could further highlight key trends and comparisons.

**·** Data Preprocessing: While the analysis was thorough, they recommended including more details on the data preprocessing steps to enhance transparency and reproducibility.

**·** Interactive Elements: Group J1 suggested incorporating more interactive elements into our visualizations to allow users to explore the data in greater depth.

Upon reviewing the feedback, our team agreed that these were valuable suggestions to work on. The day following the feedback session, we implemented the following changes:

**·** Enhanced Visualizations: We explored and added additional graph types to highlight key trends more effectively.

**·** Added Captions: We ensured all graphs had descriptive captions to provide context and aid interpretation.

**·** Detailed Preprocessing: We included a more detailed description of our data preprocessing steps in the notebook to improve transparency.

**·** Interactive Visuals: We incorporated interactive elements into our visualizations, allowing users to engage with the data more dynamically.

The feedback session with group J1 was invaluable, helping us to refine our data story and enhance the overall quality of our project. We appreciate their constructive suggestions and look forward to incorporating similar collaborative feedback in future projects.

**Self-Reflection**

If we had an extra one or two weeks for this project, our main goal would be to create an integrated model that looks at the various factors influencing employment and wage disparities among immigrants. Right now, we analyze factors like education and place of birth separately. While this gives us useful insights, a more advanced analysis would look at these factors together.

A multifactorial model would let us see how variables like education, gender, country of origin, and how long someone has lived in the U.S. interact and affect employment rates and wages. This approach would provide a clearer picture of how these factors work together and contribute to disparities in the job market. It would also help us determine if employment differences are mainly due to one factor or a combination of several.

Building this comprehensive model would involve using advanced statistical techniques and possibly machine learning algorithms to handle the complexity of multiple interacting variables. This would require careful data preparation and rigorous validation, making it a time-consuming task. However, the insights gained from this integrated analysis would be invaluable for creating targeted policies and interventions to address workforce inequalities.

Additionally, with more time, we would enhance the interactivity and user-friendliness of our visualizations. By adding more dynamic and interactive features, we could make the data more engaging and easier to explore. This would make our analysis more impactful, especially for stakeholders and policymakers who could use these insights to drive meaningful change.

In conclusion, having more time would allow us to develop a deeper understanding of the many factors influencing employment and wage disparities among immigrants. By creating a multifactorial model and improving our visualizations, we would provide richer and more actionable insights to address these complex issues.

## Work Distribution


Our team, consisting of Reanu, Ramtin, Mohammed, and Paul, worked closely together through meetings, workshops, and constant communication via messaging apps to ensure smooth collaboration. In our initial meetings, each member presented a dataset along with possible analysis approaches. After careful discussion, we chose the most relevant data and defined the project scope. We then moved on to data cleaning and preprocessing, followed by brainstorming sessions to decide on the best visualizations for our story. Throughout the project, we continually refined our visualizations and storyline, incorporating feedback from peer reviews to improve our final presentation.

Paul

Paul acted as the project coordinator, keeping track of overall progress and making sure we met our deadlines. He handled the initial data cleaning and integration, ensuring the datasets were ready for analysis. Paul also played a major role in designing and developing visualizations, particularly the sunburst chart for educational attainment. Additionally, he led the narrative development, writing several key sections of the report and facilitating team discussions to keep everyone on the same page.

Ramtin

Ramtin was in charge of detailed data preprocessing, making sure all data was accurately cleaned and formatted. He created interactive visualizations using Plotly, including the heatmap for employment rates by gender and place of birth. Ramtin also contributed significantly to the analytical sections of the report, offering insights and interpretations of the visual data. He worked on incorporating peer feedback to refine our visualizations and narrative.


Mohammed

Mohammed focused on creating and enhancing visualizations, particularly the treemap for native-born individuals' educational attainment. He played a crucial role in validating the data, ensuring accuracy and consistency. Mohammed also drafted the conclusion and policy recommendations, providing a strong finish to our narrative. He presented our project during peer feedback sessions and integrated the feedback received into the final version.

Reanu

Reanu's primary contribution was the development of the map visualization. He also focused on refining the visualizations and enhancing their clarity and impact, and he was instrumental in developing the narrative framework, ensuring that our story was coherent and compelling. Reanu conducted extensive research to support our arguments, integrating findings from relevant literature into our storyline. He also worked on the overall design and flow of the presentation, making sure it was polished and professional. Additionally, Reanu assisted in finalizing the documentation and presentation materials.