**Central question: Is there a relation between a country's Gross Domestic Product (GDP) and its income inequality?**

The Gini coefficient is a measure of the inequality of the income distribution in a population. Higher values
indicate a higher level of inequality.
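
To make the measure concrete, here is a minimal sketch of how a Gini coefficient could be computed from a list of individual incomes, using the common rank-weighted formula. The function name `gini` and the income values are made up for illustration; the dataset used below already contains precomputed coefficients, and its publisher may have used a different estimation method.

```python
def gini(values):
    """Gini coefficient of a list of non-negative incomes (0 = perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a positive total")
    # Rank-weighted sum: each income is weighted by its 1-based rank.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Made-up incomes, for illustration only.
equal_incomes = [10, 10, 10, 10]
unequal_incomes = [0, 0, 0, 40]

print(gini(equal_incomes))    # 0.0
print(gini(unequal_incomes))  # 0.75
```

With only four people, the most unequal case gives 0.75 rather than 1, because this sample formula has a maximum of (n - 1) / n; the value approaches 1 as the population grows.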

In [30]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load the two datasets into DataFrames

gdp_path = '/content/drive/MyDrive/Datasets/gdp-per-capita-maddison-2020.csv'
df_gdp = pd.read_csv(gdp_path)

economic_inequality_path = '/content/drive/MyDrive/Datasets/economic-inequality-gini-index.csv'
df_economic_inequality = pd.read_csv(economic_inequality_path)



In [31]:
# GDP per capita measures the economic output of a nation per person.
# Drop the annotations column, which is not needed for this analysis.
df_gdp = df_gdp.drop(columns='417485-annotations')

In [32]:
# The Gini coefficient is usually a number between 0 and 1 and is sometimes expressed as a percentage.
# A value of 0 corresponds to perfect equality and 1 corresponds to perfect inequality.

df_economic_inequality

Unnamed: 0,Entity,Code,Year,Gini coefficient
0,Albania,ALB,1996,0.270103
1,Albania,ALB,2002,0.317390
2,Albania,ALB,2005,0.305957
3,Albania,ALB,2008,0.299847
4,Albania,ALB,2012,0.289605
...,...,...,...,...
2120,Zambia,ZMB,2010,0.556215
2121,Zambia,ZMB,2015,0.571361
2122,Zimbabwe,ZWE,2011,0.431536
2123,Zimbabwe,ZWE,2017,0.443371


To answer this question, we calculate the correlation coefficient between GDP per capita and the Gini coefficient. Before that is possible, the two datasets need to be aligned so that each observation pairs a GDP value with a Gini value for the same country and year.

In [39]:
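# Merge the tables on 'Entity' and 'Year' so each row pairs a GDP value with a Gini value
left_merged = pd.merge(df_gdp, df_economic_inequality, how="left", on=['Entity', 'Year'])
# Keep only the two columns of interest and drop rows where either value is missing
new_df = left_merged[['GDP per capita', 'Gini coefficient']].dropna()
new_df.head(5)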

Unnamed: 0,GDP per capita,Gini coefficient
121,3965.685303,0.270103
127,5608.962402,0.31739
130,6858.466797,0.305957
133,8522.129883,0.299847
137,9592.0,0.289605


In [40]:
#Calculate the correlation coefficient

new_df.corr()

Unnamed: 0,GDP per capita,Gini coefficient
GDP per capita,1.0,-0.432258
Gini coefficient,-0.432258,1.0


**Interpretation of the result:**
A Pearson correlation of -0.43 indicates a low to moderate negative correlation: countries with a higher GDP per capita tend to have a lower Gini coefficient. In other words, higher economic output per capita is associated with lower income inequality in the respective country.
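
As a rough illustration of what the Pearson coefficient measures, the sketch below computes it by hand from the deviations around each mean. The function name `pearson_r` and the toy values are assumptions made up for this example; they are not taken from the datasets, and in the notebook itself `new_df.corr()` does this work.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: co-deviation scaled by the spread of both variables."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Sum of co-deviations and the two sums of squared deviations
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    return cov / math.sqrt(var_x * var_y)


# Made-up values (not taken from the datasets above)
toy_gdp = [1000, 2000, 3000, 4000, 5000]
toy_gini = [0.50, 0.45, 0.47, 0.35, 0.30]

print(round(pearson_r(toy_gdp, toy_gini), 3))
```

A value near -1 or +1 means the points lie close to a straight line; a value near 0 means there is little linear association, although a non-linear relationship may still exist.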

In [41]:
#Do we have enough data to make this significant (p-value)?

from scipy.stats import pearsonr
pearsonr(new_df['GDP per capita'], new_df['Gini coefficient'])


(-0.4322580717577245, 2.705271601736767e-83)

The above shows the same correlation coefficient (-0.43) together with a p-value of 2.71e-83. A p-value below 0.05 is conventionally considered statistically significant. So in our case we can conclude that there is a significant linear association between the two variables.
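
To see why the amount of data matters for significance, the following sketch converts a correlation coefficient and a sample size into the usual t statistic with n - 2 degrees of freedom. For the tail probability it uses a normal approximation, which is an assumption that is only reasonable for large samples and is not how `pearsonr` computes its exact p-value. The function names and the sample sizes are hypothetical; the actual sample size in the notebook is `len(new_df)`.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing 'no linear correlation', with n - 2 degrees of freedom."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt((n - 2) / (1 - r * r))


def approx_two_sided_p(t):
    """Large-sample approximation: treat t as standard normal (assumption, not exact)."""
    return math.erfc(abs(t) / math.sqrt(2))


# Hypothetical sample sizes; the real n is len(new_df) in the notebook.
r = -0.43
for n in (10, 100, 1000):
    t = correlation_t_statistic(r, n)
    print(n, round(t, 2), approx_two_sided_p(t))
```

The same coefficient of -0.43 is not significant at the 0.05 level with only 10 observations under this approximation, but becomes overwhelmingly significant with hundreds of observations, which is consistent with the very small p-value reported above.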