# Case study: The effect of net migration on the US and Romania

Imports and set magics:

In [8]:
# a. import packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
from pandas_datareader import wb

# b. autoreload modules when code is run
%load_ext autoreload
%autoreload 2

# c. user written modules
from plot_function import *




# Introduction

In this project we analyse how net migration relates to four variables: GDP, employment rate, labour force and wage. We chose this topic because migration is heavily debated in most rich countries, especially with the rise of the far right in Europe.

We chose to study the USA, since it is a net immigration country, and Romania, since it is a net emigration country. We use data from 1991, the first year for which all series are available, until 2019, so that the pandemic is excluded.

# Read and clean data

In [9]:
# set up the sample period
start_year = 1991
end_year = 2019

We download each series as a dataframe and rename the indicator column to a readable name.

In [10]:
# a. migration
df_migration = wb.download(indicator='SM.POP.NETM', country=[ 'USA', 'ROU'], start=start_year, end=end_year)
df_migration = df_migration.rename(columns = {'SM.POP.NETM':'Net Migration'})

# b. GDP
df_gdp = wb.download(indicator='NY.GDP.MKTP.CD', country=[ 'USA', 'ROU'], start=start_year, end=end_year)
df_gdp = df_gdp.rename(columns = {'NY.GDP.MKTP.CD':'GDP'})

# c. employment rate
df_employ = wb.download(indicator='SL.EMP.TOTL.SP.ZS', country=[ 'USA', 'ROU'], start=start_year, end=end_year)
df_employ = df_employ.rename(columns = {'SL.EMP.TOTL.SP.ZS':'Employment Rate'})

# d. labor force
df_labour = wb.download(indicator='SL.TLF.TOTL.IN', country=[ 'USA', 'ROU'], start=start_year, end=end_year)
df_labour = df_labour.rename(columns = {'SL.TLF.TOTL.IN':'Labor Force'})

# e. wage (SL.EMP.WORK.ZS is the share of wage and salaried workers in total employment, not a wage level)
df_wage = wb.download(indicator='SL.EMP.WORK.ZS', country=[ 'USA', 'ROU'], start=start_year, end=end_year)
df_wage = df_wage.rename(columns = {'SL.EMP.WORK.ZS':'Wage'})

# f. resetting indexes and column type
df_migration = df_migration.reset_index().astype({'year': int, 'country': 'string'})
df_gdp = df_gdp.reset_index().astype({'year': int, 'country': 'string'})
df_employ = df_employ.reset_index().astype({'year': int, 'country': 'string'})
df_labour = df_labour.reset_index().astype({'year': int, 'country': 'string'})
df_wage = df_wage.reset_index().astype({'year': int, 'country': 'string'})
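The rename, reset and cast steps are repeated for all five series, so they could be folded into one helper. The following is a minimal sketch, not the notebook's own code: `tidy_indicator` is a hypothetical name, and the small frame is a made-up stand-in for what `wb.download` returns, assuming a `(country, year)` index with the year stored as a string.

```python
import pandas as pd


def tidy_indicator(raw: pd.DataFrame, code: str, label: str) -> pd.DataFrame:
    """Rename one indicator column and flatten the (country, year) index."""
    out = raw.rename(columns={code: label}).reset_index()
    # Cast the year to int so it can be used as a numeric merge key and plot axis
    return out.astype({"year": int, "country": "string"})


# Made-up stand-in for a downloaded frame (values are not real data)
index = pd.MultiIndex.from_product(
    [["Romania", "United States"], ["2019", "2018"]], names=["country", "year"]
)
raw_demo = pd.DataFrame({"SM.POP.NETM": [1.0, 2.0, 3.0, 4.0]}, index=index)

tidy_demo = tidy_indicator(raw_demo, "SM.POP.NETM", "Net Migration")
print(tidy_demo)
```

With such a helper, each series would need one call instead of three separate statements, which also makes it harder to forget the cast for one of the frames.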

In [11]:
# a. create a list with the dataframes
df_list = [df_gdp, df_employ, df_labour, df_wage]


# b. merge all dataframes together
df = df_migration

for dtf in df_list:
    df = pd.merge(df, dtf, how='outer', on=['country', 'year'])
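The merge loop can also be written as a fold over the list of frames. The sketch below is one possible variant, with `merge_on_keys` as a hypothetical helper name and two made-up miniature frames. It adds `validate="one_to_one"`, which makes pandas raise an error if a `(country, year)` pair appears more than once in either frame.

```python
from functools import reduce

import pandas as pd


def merge_on_keys(frames, keys=("country", "year")):
    """Outer-merge a list of frames pairwise on the shared key columns."""
    return reduce(
        lambda left, right: pd.merge(
            left, right, how="outer", on=list(keys), validate="one_to_one"
        ),
        frames,
    )


# Made-up miniature frames, only to show the shape of the result
a = pd.DataFrame(
    {"country": ["Romania", "Romania"], "year": [2018, 2019], "A": [1.0, 2.0]}
)
b = pd.DataFrame({"country": ["Romania"], "year": [2019], "B": [5.0]})

merged_demo = merge_on_keys([a, b])
print(merged_demo)
```

Because the merge is an outer join, a year missing from one series is kept and filled with `NaN` rather than dropped, so it is worth checking the merged frame for missing values before computing correlations.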


## Explore each data set

In our first interactive graph we plot net migration alongside each of the other variables for a chosen country, to look for common trends.

In [12]:
# a. plot
plot(df,plot_func)

[Interactive plot, widget output not rendered. Country dropdown: Romania, United States]

As we can already see, some of the plotted series appear to have no correlation with net migration at all.

Next, we made a scatter plot of net migration against each of the other variables for a given country.

In [13]:
# a. plot
plot(df,scatter_func)

[Interactive scatter plot, widget output not rendered. Country dropdown: Romania, United States]

The graphs show stronger correlations in the US than in Romania. This could also suggest that migration is driven by different factors in the US than in Romania.

# Analysis

In [14]:
# a. separate US and Romania data
df_US = df[df["country"] == "United States"]
df_Rom = df[df["country"] == "Romania"]

# b. select the correct columns
var_list = df.columns.tolist()[3:]

# c. create a dictionary with all combinations of countries and measures
dict_var = {"country":["United States", "United States", "Romania", "Romania"], "measure":["corr","R^2 (%)","corr","R^2 (%)"]}

# d. adding the variables to the dictionary
for data in var_list:

    # i. calculating correlation between migration and data for each country 
    US_data = df_US["Net Migration"].corr(df_US[data])
    Rom_data = df_Rom["Net Migration"].corr(df_Rom[data])

    # ii. adding the correlation and coefficient of determination to the dictionary
    dict_var[data] = [f"{US_data:.3f}" , f"{100 * US_data**2:.0f}", f"{Rom_data:.3f}", f"{100 * Rom_data**2:.0f}"]

# e. create dataframe with data
corr_table = pd.DataFrame(dict_var).set_index(["country","measure"])

corr_table

| country | measure | GDP | Employment Rate | Labor Force | Wage |
|---|---|---|---|---|---|
| United States | corr | -0.67 | 0.376 | -0.748 | -0.69 |
| United States | R^2 (%) | 45.0 | 14.0 | 56.0 | 48.0 |
| Romania | corr | 0.251 | 0.06 | -0.105 | 0.378 |
| Romania | R^2 (%) | 6.0 | 0.0 | 1.0 | 14.0 |
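The cell above stores the results as formatted strings, which is fine for display but means the table cannot be sorted or reused numerically. As an alternative, here is a minimal sketch that keeps the values as floats and loops over countries with `groupby`. The function name `corr_with_migration` and the demo frame are assumptions for illustration. The demo values are made up and chosen to be perfectly correlated so the result is easy to check.

```python
import pandas as pd


def corr_with_migration(df, target="Net Migration", id_cols=("country", "year")):
    """Pearson correlation of `target` with every other column, per country."""
    value_cols = [c for c in df.columns if c not in (*id_cols, target)]
    rows = {}
    for country, group in df.groupby("country"):
        # Column-wise correlation of each variable with the target series
        r = group[value_cols].corrwith(group[target])
        rows[(country, "corr")] = r
        rows[(country, "R^2 (%)")] = 100 * r**2
    # Tuple keys become a (country, measure) MultiIndex after transposing
    return pd.DataFrame(rows).T


# Made-up demo frame: "GDP" rises with the target, "Wage" falls with it
demo = pd.DataFrame(
    {
        "country": ["X"] * 4,
        "year": [2000, 2001, 2002, 2003],
        "Net Migration": [1.0, 2.0, 3.0, 4.0],
        "GDP": [2.0, 4.0, 6.0, 8.0],
        "Wage": [4.0, 3.0, 2.0, 1.0],
    }
)

table_demo = corr_with_migration(demo)
print(table_demo.round(3))
```

Rounding is then applied only when the table is displayed, for example with `.round(3)`, instead of being baked into the stored values.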


The GDP results are inconclusive. In the US the correlation is negative (-0.67), meaning that higher net migration tends to coincide with lower GDP, while in Romania it is small but positive (0.251), pointing the other way. Romania's coefficient of determination is very small: variation in net migration accounts for only 6% of the variation in GDP. In the US, by contrast, it accounts for 45%.

The employment rate is positively correlated with net migration in both countries, so higher net migration tends to coincide with a higher employment rate. The relationship is weak, though: modest in the US (0.376) and close to zero in Romania (0.06). It accounts for only 14% of the variation in the US and 0% in Romania, so in Romania the two series are effectively uncorrelated.

The labor force is negatively correlated with net migration in both countries, meaning that higher net migration tends to coincide with a smaller labor force. This is an association, not evidence that migration reduces the labor force. In the US, variation in net migration accounts for 56% of the variation in the labor force, while in Romania it accounts for only 1%.

Finally, the wage results are also inconclusive. The United States shows a negative correlation (-0.69), meaning that higher net migration tends to coincide with a lower value of the wage variable, while the opposite holds in Romania (0.378). In the US, variation in net migration accounts for 48% of the variation in the wage variable, while in Romania it accounts for only 14%.
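The "accounts for X% of the variation" readings above rely on the fact that, for two variables, the squared Pearson correlation equals the R^2 of a straight-line least-squares fit. The sketch below checks this numerically. The numbers are made up for illustration and are not the World Bank series used in this notebook.

```python
import numpy as np

# Made-up numbers, not the World Bank series used above
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.3, 6.6])

# Pearson correlation coefficient
r = np.corrcoef(x, y)[0, 1]

# Least-squares line y = slope * x + intercept
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (slope * x + intercept)

# R^2 = 1 - SS_res / SS_tot
r_squared = 1 - np.sum(residuals**2) / np.sum((y - y.mean()) ** 2)

print(f"r = {r:.3f}, r^2 = {r**2:.3f}, R^2 = {r_squared:.3f}")
```

Since the correlation is symmetric in the two variables, the same percentage would result from regressing net migration on GDP instead of GDP on net migration. The figure therefore says nothing about the direction of any effect.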

# Conclusion

We can conclude that net migration does not explain much of the variation in GDP, employment, labor force and wage in the US and in Romania: at best it accounts for 56% (US labor force), and in Romania never more than 14%. It is therefore not a good predictor of these variables, and other variables might be more closely related to them.

The US labor force has the strongest correlation with net migration (-0.748, an R^2 of 56%), but even this does not support the claim that migration is bad, since correlation is not causation.