# Project Goals

## Double Variable Association
Accounting for Year
- Trendline in Life Expectancy
- Trendline in GDP

Accounting for Country
- Correlation with Life Expectancy
- Correlation with GDP

## Triple Variable Association
Accounting for Year
- Correlation of Life Expectancy and GDP

Accounting for Country
- Correlation of Life Expectancy and GDP

# Load the Data

In [109]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import math
import numpy as np

%matplotlib notebook

In [2]:
df = pd.read_csv('all_data.csv')

In [3]:
df.head()

Unnamed: 0,Country,Year,Life expectancy at birth (years),GDP
0,Chile,2000,77.3,77860930000.0
1,Chile,2001,77.3,70979920000.0
2,Chile,2002,77.8,69736810000.0
3,Chile,2003,77.9,75643460000.0
4,Chile,2004,78.0,99210390000.0


# Explore and Explain Data


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 96 entries, 0 to 95
Data columns (total 4 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   Country                           96 non-null     object 
 1   Year                              96 non-null     int64  
 2   Life expectancy at birth (years)  96 non-null     float64
 3   GDP                               96 non-null     float64
dtypes: float64(2), int64(1), object(1)
memory usage: 3.1+ KB


In [8]:
for col in df.columns:
    print(df[col].describe(), '\n')

count        96
unique        6
top       Chile
freq         16
Name: Country, dtype: object 

count      96.000000
mean     2007.500000
std         4.633971
min      2000.000000
25%      2003.750000
50%      2007.500000
75%      2011.250000
max      2015.000000
Name: Year, dtype: float64 

count    96.000000
mean     72.789583
std      10.672882
min      44.300000
25%      74.475000
50%      76.750000
75%      78.900000
max      81.000000
Name: Life expectancy at birth (years), dtype: float64 

count    9.600000e+01
mean     3.880499e+12
std      5.197561e+12
min      4.415703e+09
25%      1.733018e+11
50%      1.280220e+12
75%      4.067510e+12
max      1.810000e+13
Name: GDP, dtype: float64 



In [9]:
df.Country.unique()

array(['Chile', 'China', 'Germany', 'Mexico', 'United States of America',
       'Zimbabwe'], dtype=object)

## Basic Findings
**Country**
- Unique:
    - Chile
    - China
    - Germany
    - Mexico
    - USA
    - Zimbabwe

**Year**
- Mean:&ensp; 2000
- Max:&emsp;&nbsp;2015

**Life Expectancy**
- Range:&emsp;44.3 - 81.0
- Mean:&emsp; 72.79
- Median:&ensp;76.7

**GDP**
- Range:&emsp;4.4E9 - 1.8E13
- Mean:&emsp;&nbsp;3.88E
- Median:&ensp;1.28E12

## Single Variable Visualisation

In [118]:
def get_basic_dist(outliers=True):
    data_cols = [df['Life expectancy at birth (years)'], df['GDP']]
    data_quantiles = [np.quantile(x, [0.25, 0.75]) for x in data_cols]
    data_within_quantiles = [data_cols[x][(data_cols[x] >= data_quantiles[x][0]) & (data_cols[x] <= data_quantiles[x][1])] for x in range(2)]
    dfs_to_plot = data_cols if outliers == True else data_within_quantiles
    fig, axes = plt.subplots(2, 2, figsize=(8,8))
    for index, col in enumerate(dfs_to_plot):
        sns.histplot(ax=axes[index,0], data=col)
        sns.boxplot(ax=axes[index,1], data=col)
    plt.show()

In [119]:
# With Outliers
get_basic_dist()

<IPython.core.display.Javascript object>

In [120]:
# Without Outliers
get_basic_dist(outliers=False)

<IPython.core.display.Javascript object>

KeyError: 0

## Single Variable Findings


## Double Variable Visualisation

In [101]:
def double_variable_vis(accounting_for, data):
    n_rows = 4 if accounting_for == 'Year' else 2
    n_cols = 4 if accounting_for == 'Year' else 3
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(n_cols*4, n_rows*4))
    for col_index, value in enumerate(df[accounting_for].unique()):
        row = math.floor(col_index / (df[accounting_for].nunique() / n_rows))
        col = col_index % n_cols
        sns.histplot(ax=axes[row,col], data=df[df[accounting_for] == value][data], bins=10)
        axes[row, col].set_title(value)
    for ax in axes.flat:
        ax.label_outer()
    plt.show()

In [102]:
df.groupby(['Country'])[['Life expectancy at birth (years)', 'GDP']].describe()

Unnamed: 0_level_0,Life expectancy at birth (years),Life expectancy at birth (years),Life expectancy at birth (years),Life expectancy at birth (years),Life expectancy at birth (years),Life expectancy at birth (years),Life expectancy at birth (years),Life expectancy at birth (years),GDP,GDP,GDP,GDP,GDP,GDP,GDP,GDP
Unnamed: 0_level_1,count,mean,std,min,25%,50%,75%,max,count,mean,std,min,25%,50%,75%,max
Country,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2
Chile,16.0,78.94375,1.058911,77.3,77.975,79.0,79.825,80.5,16.0,169788800000.0,76878840000.0,69736810000.0,93873030000.0,172997500000.0,244951500000.0,278384000000.0
China,16.0,74.2625,1.318016,71.7,73.4,74.45,75.25,76.1,16.0,4957714000000.0,3501096000000.0,1211350000000.0,1881585000000.0,4075195000000.0,7819550000000.0,11064700000000.0
Germany,16.0,79.65625,0.975,78.0,78.95,79.85,80.525,81.0,16.0,3094776000000.0,667486200000.0,1949950000000.0,2740870000000.0,3396350000000.0,3596078000000.0,3890610000000.0
Mexico,16.0,75.71875,0.620987,74.8,75.225,75.65,76.15,76.7,16.0,976650600000.0,209571600000.0,683648000000.0,763091000000.0,1004376000000.0,1156992000000.0,1298460000000.0
United States of America,16.0,78.0625,0.832566,76.8,77.425,78.15,78.725,79.3,16.0,14075000000000.0,2432694000000.0,10300000000000.0,12100000000000.0,14450000000000.0,15675000000000.0,18100000000000.0
Zimbabwe,16.0,50.09375,5.940311,44.3,45.175,47.4,55.325,60.7,16.0,9062580000.0,4298310000.0,4415703000.0,5748309000.0,6733671000.0,12634460000.0,16304670000.0


In [103]:
double_variable_vis('Country', 'GDP')

<IPython.core.display.Javascript object>

In [104]:
double_variable_vis('Country', 'Life expectancy at birth (years)')

<IPython.core.display.Javascript object>