# Homework 3 - Immigration, the stock market, and GDP

The objective of this homework is to practice working with Pandas Dataframes. To successfully complete this homework, you may use any resources available to you. 

Answer the following question: What has a higher correlation with the GDP in the US: stock market returns or immigration?

You need to accomplish the following tasks:
1. Install the [wbdata](http://wbdata.readthedocs.io/en/latest/) package for API access to Worldbank data.
2. Explore the databases `Population estimates and projections`, `Global Financial Development`, and `World Development Indicators`.
3. Get the data on `GDP per capita growth (annual %)` as a dataframe.
4. Get the data on `Net immigration` as a dataframe (Make sure that you also have a percentage value for this). 
5. Get the data on `Stock market return (%, year-on-year)` as a dataframe.
6. Explore the data and note the issues.
7. Clean and combine the data.
8. What is the correlation between GDP growth, net immigration, and stock market returns?

In [2]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

1. Install the wbdata package for API access to Worldbank data

In [None]:
%pip install wbdata  # install into the environment of the running kernel

2. Explore the databases Population estimates and projections, Global Financial Development, and World Development Indicators

In [None]:
import wbdata
import pandas as pd
wbdata.get_source()  # list the available data sources and their IDs

In [None]:
wbdata.get_indicator(source=40) # database of Population estimates and projections

In [None]:
wbdata.get_indicator(source=32) # database of Global Financial Development

In [None]:
wbdata.get_indicator(source=2) # database of World Development Indicators

3. Get the data on GDP per capita growth (annual %) as a dataframe

In [None]:
wbdata.search_indicators("GDP per capita growth") # Search the ID for GDP per capita growth (annual %)
wbdata.search_countries("united") # Search the ID for USA

In [None]:
indicators = {'NY.GDP.PCAP.KD.ZG':''}  # GDP per capita growth (annual %); the '' value leaves the column unnamed
countries = ['USA']

# Grab the indicator above for the USA and load it into a dataframe
GDP = wbdata.get_dataframe(indicators, countries, data_date=None, convert_date=False, keep_levels=False)
GDP

4. Get the data on Net immigration as a dataframe (Make sure that you also have a percentage value for this)

In [None]:
wbdata.search_indicators('net migration') # Search the ID for net migration

In [None]:
indicators1 = {'SM.POP.NETM': 'Net Immigration'}   # Indicator of net migration
indicators2 = {'SP.POP.TOTL': 'Total Population'}  # Indicator of total population
countries = ['USA']

# Get net immigration as a dataframe (the dict value becomes the column name)
NI = wbdata.get_dataframe(indicators1, countries, data_date=None, convert_date=False, keep_levels=False)

# Get total population as a dataframe
TP = wbdata.get_dataframe(indicators2, countries, data_date=None, convert_date=False, keep_levels=False)

# Join NI and TP on their shared year index
R = NI.join(TP)

# Net immigration as a percentage of the total population.
# Assign the result back to the column; looping over the values and rebinding a variable would not modify R.
# Keep full precision for the correlation and round only for display, e.g. R['Ratio'].round(2)
R['Ratio'] = R['Net Immigration'] / R['Total Population'] * 100

# Display R without the helper column 'Total Population' (it is removed from the combined data later)
R.drop(columns=['Total Population'])
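Assigning to a variable inside a `for` loop over a column does not change the DataFrame, because the loop only rebinds a local name to each value. Below is a minimal sketch on made-up numbers (not World Bank data) that shows this. It also shows the vectorized assignment, which does update the column.

```python
import pandas as pd

# Toy stand-in for R; these numbers are made up, not World Bank data
R = pd.DataFrame(
    {"Net Immigration": [1_000_000, 1_200_000, 900_000],
     "Total Population": [300_000_000, 310_000_000, 320_000_000]},
    index=["2005", "2010", "2015"],
)
R["Ratio"] = R["Net Immigration"] / R["Total Population"]
before = R["Ratio"].copy()

# Rebinding the loop variable `values` never writes anything back into R
for values in R["Ratio"]:
    values = round(values * 100, 2)
print(R["Ratio"].equals(before))  # True: the column is unchanged

# Vectorized version: compute the percentage and assign it back to the column
R["Ratio"] = R["Net Immigration"] / R["Total Population"] * 100
print(R["Ratio"].round(2))  # round only for display
```

The vectorized form is also faster than a Python loop, and it keeps the unrounded values for later calculations.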


5. Get the data on Stock market return (%, year-on-year) as a dataframe

In [None]:
wbdata.search_indicators('Stock market return') # Search the ID for Stock market return

In [None]:
indicators = {'GFDD.OM.02':''}
countries = ['USA']
# Grab the indicator above for the USA and load it into a dataframe
SMR= wbdata.get_dataframe(indicators, countries, data_date=None, convert_date=False, keep_levels=False)
SMR

6. Explore the data and note the issues

In [None]:
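# Combine net immigration, GDP growth and stock market returns.
# lsuffix/rsuffix only rename column names present in both frames: the first join leaves GDP's
# unnamed ('') column as-is, and the second join renames the two '' columns to 'GDP' and 'SMR'.
C = R.join(GDP, lsuffix='Ratio', rsuffix='GDP')
C = C.join(SMR, lsuffix='GDP', rsuffix='SMR')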


# Inspecting the data shows that the rows for 1960 and 2017 are almost entirely NaN,
# so they carry no information and can be dropped
C = C.drop(['1960', '2017'])
C = C.drop(columns=['Total Population'])
C

# Count the missing values in each column of C
C.isnull().sum()
# The counts show that not every series has a value for every year; left in, these gaps would distort the correlation results
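The joins in this cell depend on how `DataFrame.join` treats `lsuffix`/`rsuffix`: the suffixes are added only to column names that appear in both frames. The sketch below reproduces this with toy frames whose single column is named `''`, which mimics the `{indicator: ''}` dictionaries. The values are made up.

```python
import pandas as pd

idx = ["2014", "2015", "2016"]
# Toy frames with made-up values; GDP and SMR each have one column named ''
R = pd.DataFrame({"Ratio": [0.1, 0.2, 0.3]}, index=idx)
GDP = pd.DataFrame({"": [1.0, 2.0, 3.0]}, index=idx)
SMR = pd.DataFrame({"": [10.0, 20.0, 30.0]}, index=idx)

# No column name is shared, so the suffixes are ignored and GDP's column stays ''
C = R.join(GDP, lsuffix="Ratio", rsuffix="GDP")
print(list(C.columns))  # ['Ratio', '']

# Both frames now have a '' column, so '' + 'GDP' and '' + 'SMR' become the new names
C = C.join(SMR, lsuffix="GDP", rsuffix="SMR")
print(list(C.columns))  # ['Ratio', 'GDP', 'SMR']
```

Naming the columns explicitly avoids relying on this behavior. You can do that with `GDP.rename(columns={'': 'GDP'})`, or by giving a non-empty value in the indicator dictionary, which `get_dataframe` uses as the column name.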

7. Clean and combine the data.

In [None]:
# Drop NaN values that will influence the correlation results
CF=C.dropna()
CF

# Test get_dummies() for fun
#CD=pd.get_dummies(CF, columns=['SMR'])
#CD
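`dropna()` performs listwise deletion: it keeps only the years in which every column has a value, so all three correlations use the same years. `DataFrame.corr()` instead excludes missing values pair by pair, so each pair of columns uses every year in which those two columns are present. The sketch below uses made-up values and compares the two approaches. The `min_periods=3` threshold is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

# Made-up values with gaps, standing in for the combined frame C
C = pd.DataFrame(
    {"Ratio": [0.30, np.nan, 0.28, 0.35, np.nan, 0.33],
     "GDP":   [1.0, 2.5, -0.5, 2.0, 1.5, 1.2],
     "SMR":   [5.0, 12.0, -10.0, np.nan, 8.0, 6.0]},
    index=["2010", "2011", "2012", "2013", "2014", "2015"],
)

# Listwise deletion: keep only the years where every column is present
CF = C.dropna()
print(len(C), "rows before dropna(),", len(CF), "rows after")

# Pairwise deletion: each pair of columns uses all years where both are present;
# min_periods sets the minimum number of overlapping years required for a result
pairwise = C.corr(min_periods=3)
listwise = CF.corr()
print(pairwise.loc["GDP", "SMR"], listwise.loc["GDP", "SMR"])
```

Listwise deletion keeps the three correlations comparable, but it can leave very few years. Pairwise deletion uses more data for each pair, but each correlation may then rest on a different set of years.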

8. What is the correlation between GDP growth, net immigration, and stock market returns?

In [None]:
# Calculate the correlation between net immigration percentage and stock market returns
CF['Ratio'].corr(CF['SMR'])

In [None]:
# Calculate the correlation between GDP and stock market returns
CF['GDP'].corr(CF['SMR'])

In [None]:
# Calculate the correlation between GDP and net immigration percentage
CF['GDP'].corr(CF['Ratio'])
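The three separate calls can also be done in one step with a correlation matrix. The homework question asks which series is more correlated with GDP, so the sketch below takes the `GDP` column of that matrix and compares absolute values, since a strong negative correlation is still a strong relationship. The numbers are made up and only stand in for `CF`.

```python
import pandas as pd

# Made-up numbers standing in for CF; they are not World Bank data
CF = pd.DataFrame(
    {"Ratio": [0.30, 0.28, 0.33, 0.31, 0.29],
     "GDP":   [1.0, -0.5, 1.2, 2.0, 0.4],
     "SMR":   [5.0, -10.0, 6.0, 12.0, -2.0]},
)

# One call gives every pairwise Pearson correlation
corr = CF.corr()

# Correlation of each candidate series with GDP growth
with_gdp = corr["GDP"].drop("GDP")
print(with_gdp)

# Compare by absolute value to pick the series more strongly related to GDP growth
stronger = with_gdp.abs().idxmax()
print("Higher correlation with GDP growth:", stronger)
```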

In [None]:
# Of the three correlations above, the one between the net immigration percentage (Ratio) and stock market returns (SMR) is the highest, at about 0.43
# That is a moderate correlation at best, so I don't think there is a strong connection between the net immigration percentage, GDP growth, and stock market returns
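Whether a correlation of about 0.43 could arise by chance depends on how many years remain after `dropna()`. A permutation test is one way to check this. It is a technique swapped in here, not part of the original analysis: shuffle one series many times, recompute r each time, and count how often the shuffled r is at least as extreme as the observed one. The function name `permutation_pvalue` and its defaults are hypothetical, and the two series below are made up.

```python
import numpy as np
import pandas as pd

def permutation_pvalue(x, y, n_perm=10_000, seed=0):
    """Hypothetical helper: two-sided permutation p-value for the Pearson r of x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = np.corrcoef(x, y)[0, 1]
    rng = np.random.default_rng(seed)
    # Shuffling y breaks any pairing with x; recompute r for each shuffle
    perm_r = np.array([np.corrcoef(x, rng.permutation(y))[0, 1] for _ in range(n_perm)])
    # Share of shuffles with |r| at least as large as observed (+1 so p is never 0)
    p = (np.sum(np.abs(perm_r) >= abs(observed)) + 1) / (n_perm + 1)
    return observed, p

# Made-up series standing in for CF['Ratio'] and CF['SMR']
ratio = pd.Series([0.30, 0.28, 0.33, 0.31, 0.29, 0.32, 0.27, 0.30])
smr = pd.Series([5.0, -10.0, 6.0, 12.0, -2.0, 3.0, -4.0, 8.0])
r, p = permutation_pvalue(ratio, smr)
print(f"r = {r:.2f}, permutation p-value = {p:.3f}")
```

With the real data, a large p-value would support the conclusion that the 0.43 correlation is not strong evidence of a connection.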