# Correlation between GDP and the S&P 500
In this exercise, you want to analyze stock returns from the S&P 500. You believe there may be a relationship between the returns of the S&P 500 and the GDP of the US. Merge the different datasets together to compute the correlation.

Two tables are provided, `sp500` and `gdp` (in this notebook, both are read from CSV files in the first cell). As always, `pandas` has been imported as `pd`.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Local folder holding the CSV files; adjust this path for your machine
path = r'/media/documentos/Cursos/Data Science/Python/Data_Science_Python/data_sets/'

# Keep only the columns needed for the merge
gdp = pd.read_csv(path + 'WorldBank_GDP.csv', usecols=['Country Code', 'Year', 'GDP'])
print('gdp \n', gdp.head(), '\n')

sp500 = pd.read_csv(path + 'S&P500.csv')
print('sp500 \n', sp500.head(), '\n')
```

```
gdp
   Country Code  Year           GDP
0          CHN  2010  6.087160e+12
1          DEU  2010  3.417090e+12
2          JPN  2010  5.700100e+12
3          USA  2010  1.499210e+13
4          CHN  2011  7.551500e+12

sp500
    Date  Returns
0  2008   -38.49
1  2009    23.45
2  2010    12.78
3  2011     0.00
4  2012    13.41
```

- Use `merge_ordered()` to merge `gdp` and `sp500` with a left join on `Year` (from `gdp`) and `Date` (from `sp500`). Save the result as `gdp_sp500`.
- Print `gdp_sp500` and look at the returns for the year 2018.

```python
# Use merge_ordered() to left-join gdp and sp500 on Year and Date
gdp_sp500 = pd.merge_ordered(gdp, sp500, left_on='Year', right_on='Date', how='left')

# Print gdp_sp500
print(gdp_sp500)
```

```
   Country Code  Year           GDP    Date  Returns
0           CHN  2010  6.087160e+12  2010.0    12.78
1           DEU  2010  3.417090e+12  2010.0    12.78
2           JPN  2010  5.700100e+12  2010.0    12.78
3           USA  2010  1.499210e+13  2010.0    12.78
4           CHN  2011  7.551500e+12  2011.0     0.00
5           DEU  2011  3.757700e+12  2011.0     0.00
6           JPN  2011  6.157460e+12  2011.0     0.00
7           USA  2011  1.554260e+13  2011.0     0.00
8           CHN  2012  8.532230e+12  2012.0    13.41
9           DEU  2012  3.543980e+12  2012.0    13.41
10          JPN  2012  6.203210e+12  2012.0    13.41
11          USA  2012  1.619700e+13  2012.0    13.41
12          CHN  2012  8.532230e+12  2012.0    13.41
13          DEU  2012  3.543980e+12  2012.0    13.41
14          JPN  2012  6.203210e+12  2012.0    13.41
15          USA  2012  1.619700e+13  2012.0    13.41
16          CHN  2013  9.570410e+12  2013.0    29.60
17          DEU  2013  3.752510e+12  2013.0
```
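
In the output above, `Date` is printed as a float (`2010.0`). That is what a left join typically produces: any `Year` with no matching `Date` (the bullet above points at 2018) gets `NaN` in the `sp500` columns, and a `NaN` forces an integer column to `float64`. The sketch below reproduces this on tiny stand-in tables, assuming only the rows printed earlier: China's GDP for 2010 to 2013 and the five rows of `sp500.head()`, which stop at 2012. The names `gdp_toy`, `sp500_toy`, and `merged` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins built only from rows printed earlier:
# China's GDP for 2010-2013 and the first five rows of sp500 (2008-2012).
gdp_toy = pd.DataFrame({
    'Country Code': ['CHN'] * 4,
    'Year': [2010, 2011, 2012, 2013],
    'GDP': [6.087160e12, 7.551500e12, 8.532230e12, 9.570410e12],
})
sp500_toy = pd.DataFrame({
    'Date': [2008, 2009, 2010, 2011, 2012],
    'Returns': [-38.49, 23.45, 12.78, 0.00, 13.41],
})

# Left join: every gdp_toy row is kept, sp500_toy rows only where Date == Year.
# 2008 and 2009 are dropped; 2013 has no match, so Date and Returns become NaN.
merged = pd.merge_ordered(gdp_toy, sp500_toy, left_on='Year', right_on='Date', how='left')
print(merged)
print(merged['Date'].dtype)  # float64, because of the NaN in the 2013 row
```

Because the join is a left join, the 2008 and 2009 returns simply disappear; an outer join (the `merge_ordered()` default) would keep them with `NaN` GDP instead.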

- Use `merge_ordered()` again to merge `gdp` and `sp500` as before, but this time use the function's `fill_method='ffill'` option to forward-fill the missing value for returns. Assign the result to `gdp_sp500`.

```python
# Use merge_ordered() to left-join gdp and sp500 on Year and Date,
# forward-filling rows that have no matching Date
gdp_sp500 = pd.merge_ordered(gdp, sp500, left_on='Year', right_on='Date',
                             how='left', fill_method='ffill')

# Print gdp_sp500
print(gdp_sp500)
```

```
   Country Code  Year           GDP  Date  Returns
0           CHN  2010  6.087160e+12  2010    12.78
1           DEU  2010  3.417090e+12  2010    12.78
2           JPN  2010  5.700100e+12  2010    12.78
3           USA  2010  1.499210e+13  2010    12.78
4           CHN  2011  7.551500e+12  2011     0.00
5           DEU  2011  3.757700e+12  2011     0.00
6           JPN  2011  6.157460e+12  2011     0.00
7           USA  2011  1.554260e+13  2011     0.00
8           CHN  2012  8.532230e+12  2012    13.41
9           DEU  2012  3.543980e+12  2012    13.41
10          JPN  2012  6.203210e+12  2012    13.41
11          USA  2012  1.619700e+13  2012    13.41
12          CHN  2012  8.532230e+12  2012    13.41
13          DEU  2012  3.543980e+12  2012    13.41
14          JPN  2012  6.203210e+12  2012    13.41
15          USA  2012  1.619700e+13  2012    13.41
16          CHN  2013  9.570410e+12  2013    29.60
17          DEU  2013  3.752510e+12  2013    29.60
18          JPN  2013  5.155720
```
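
With `fill_method='ffill'`, `merge_ordered()` fills a gap by repeating the previous matched row of the right-hand table, so both `Returns` and `Date` take the last available values. That is why `Date` is an integer column again in the output above, and it also means `Date` no longer shows which rows were filled. The sketch below assumes small hypothetical stand-in tables built from rows printed earlier (China's GDP for 2010 to 2013 and the `sp500.head()` rows, which stop at 2012); the `filled` flag column is an illustrative addition, not part of the exercise.

```python
import pandas as pd

# Hypothetical stand-ins built only from rows printed earlier:
# China's GDP for 2010-2013 and the first five rows of sp500 (2008-2012).
gdp_toy = pd.DataFrame({
    'Country Code': ['CHN'] * 4,
    'Year': [2010, 2011, 2012, 2013],
    'GDP': [6.087160e12, 7.551500e12, 8.532230e12, 9.570410e12],
})
sp500_toy = pd.DataFrame({
    'Date': [2008, 2009, 2010, 2011, 2012],
    'Returns': [-38.49, 23.45, 12.78, 0.00, 13.41],
})

# Forward fill: the unmatched 2013 row copies the previous sp500 row (2012),
# so Date and Returns are carried forward instead of becoming NaN.
ffilled = pd.merge_ordered(gdp_toy, sp500_toy, left_on='Year', right_on='Date',
                           how='left', fill_method='ffill')

# Hypothetical flag: True where the returns were carried over from an earlier year
ffilled['filled'] = ffilled['Year'] != ffilled['Date']
print(ffilled)
```

Comparing `Year` with `Date` after the merge is one simple way to keep track of which returns are real observations and which were copied forward.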

- Subset the `gdp_sp500` table, selecting the `GDP` and `Returns` columns, and save it as `gdp_returns`.
- Print the correlation matrix of the `gdp_returns` table.
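
Selecting with a list inside the brackets (`[['GDP', 'Returns']]`) returns a two-column DataFrame rather than a Series, which is what lets `.corr()` produce a matrix. Note also that the merged table repeats each year's return for every country, and the 2012 rows appear twice in the merged output above, so the correlation pools all four economies. If the question is specifically about US GDP, as the introduction states, one option is to keep only the `USA` rows and drop exact duplicates first. The sketch below assumes a hypothetical stand-in, `merged_toy`, built from CHN and USA rows printed earlier.

```python
import pandas as pd

# Hypothetical stand-in for gdp_sp500, using rows printed earlier
# (CHN and USA, 2010-2012, with the duplicated 2012 rows kept)
merged_toy = pd.DataFrame({
    'Country Code': ['CHN', 'USA', 'CHN', 'USA', 'CHN', 'USA', 'CHN', 'USA'],
    'Year': [2010, 2010, 2011, 2011, 2012, 2012, 2012, 2012],
    'GDP': [6.087160e12, 1.499210e13, 7.551500e12, 1.554260e13,
            8.532230e12, 1.619700e13, 8.532230e12, 1.619700e13],
    'Returns': [12.78, 12.78, 0.00, 0.00, 13.41, 13.41, 13.41, 13.41],
})

# Double brackets -> DataFrame with two columns; single brackets -> Series
all_countries = merged_toy[['GDP', 'Returns']]

# Keep only US rows and drop exact duplicate rows before correlating
usa = merged_toy[merged_toy['Country Code'] == 'USA'].drop_duplicates()
usa_returns = usa[['GDP', 'Returns']]

print(all_countries.corr())
print(usa_returns.corr())
```

With only a handful of years, either coefficient rests on very few points, so it is best read as a rough indication rather than a firm result.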

```python
# Use merge_ordered() to merge gdp and sp500, forward-filling the missing value
gdp_sp500 = pd.merge_ordered(gdp, sp500, left_on='Year', right_on='Date',
                             how='left', fill_method='ffill')

# Subset the GDP and Returns columns
gdp_returns = gdp_sp500[['GDP', 'Returns']]

# Print the correlation matrix of gdp_returns
print(gdp_returns.corr())
```

```
              GDP   Returns
GDP      1.000000  0.040669
Returns  0.040669  1.000000
```
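
`DataFrame.corr()` computes the Pearson coefficient by default (`method='pearson'`): the covariance of `GDP` and `Returns` divided by the product of their standard deviations. A value near zero, such as the 0.04 above, indicates almost no linear relationship in this sample. The sketch below checks the formula against `.corr()` and `np.corrcoef()` on a hypothetical four-row table built from China's rows printed earlier (GDP for 2010 to 2013 with the returns shown next to those years); it is a verification of the calculation, not a result for the full dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row table: China's GDP 2010-2013 and the returns
# printed next to those years in the merged output
toy = pd.DataFrame({
    'GDP': [6.087160e12, 7.551500e12, 8.532230e12, 9.570410e12],
    'Returns': [12.78, 0.00, 13.41, 29.60],
})

# Deviations from the column means
x = toy['GDP'] - toy['GDP'].mean()
y = toy['Returns'] - toy['Returns'].mean()

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_manual = (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())
r_pandas = toy.corr().loc['GDP', 'Returns']  # default method='pearson'

print(r_manual, r_pandas)
```

The coefficient is unitless, so the very different scales of GDP (trillions of dollars) and returns (percent) do not affect it.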
