# Introduction

**Data Source:** [United Nations Development Programme - Human Development Report](http://hdr.undp.org/en/data)

**Reader's Guide:** [Annex](http://hdr.undp.org/en/content/hdr-2016-readers-guide)

# Research Question & Rationale

**Research Question:** Which human development indicators (and corresponding aggregate dimensions) have the most impact on a country's index score?

**Rationale:** Given budgetary and resource constraints, how might multilateral development coalitions (composed of banks, governments, and non-governmental organizations) prioritize funding initiatives to maximize year-over-year improvements in global citizens' quality of life?

# Dataset Assembly & Cleaning

In [173]:
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

In [194]:
# Folder holding the per-country CSV exports (path kept from the original notebook).
data_dir = 'C:\\Users\\beri.e.ndifon\\Downloads\\'
countries = ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola', 'Antigua and Barbuda',
             'Argentina', 'Armenia', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain']

# Read each country file and transpose it so that every country becomes a single row.
country_frames = [pd.read_csv(data_dir + name + '.csv', encoding='latin-1').transpose()
                  for name in countries]

# Stack the 13 single-country rows into one feature table.
featureset = pd.concat(country_frames, axis=0)

In [254]:
featureset.head(13)

Unnamed: 0,Life expectancy at birth (years),"Adult mortality rate, female (per 1,000 people)","Adult mortality rate, male (per 1,000 people)","Deaths due to malaria (per 100,000 people)","Deaths due to tuberculosis (per 100,000 people)","HIV prevalence, adult (% ages 15-49), total","Infant mortality rate (per 1,000 live births)","Infants lacking immunization, DTP (% of one-year-olds)","Infants lacking immunization, measles (% of one-year-olds)",Public health expenditure (% of GDP),...,Renewable energy consumption (% of total final energy consumption),"Population, total (millions)","Dependency ratio, young age (0-14) (per 100 people ages 15-64)",Median age (years),"Dependency ratio, old age (65 and older) (per 100 people ages 15-64)","Population, ages 15-64 (millions)","Population, ages 65 and older (millions)","Population, under age 5 (millions)","Population, urban (%)",Sex ratio at birth (male to female births)
Afghanistan,60.7,238,281,0.2,44.0,0.1,66.3,18,34,2.9,...,10.8,32.5,82.3,17.5,4.6,17.4,0.8,5.0,26.7,1.06
Albania,78.0,50,85,n.a.,0.6,n.a.,12.5,1,2,2.9,...,38.2,2.9,26.9,34.3,18.0,2.0,0.4,0.2,57.4,1.08
Algeria,75.0,84,135,0.0,11.0,0.1,21.9,1,5,5.2,...,0.2,39.7,43.6,27.6,9.1,26.0,2.4,4.6,70.7,1.05
Andorra,81.5,n.a.,n.a.,n.a.,0.8,n.a.,2.1,1,4,6.3,...,n.a.,0.1,n.a.,n.a.,n.a.,n.a.,n.a.,n.a.,85.1,n.a.
Angola,52.7,321,369,100.9,52.0,2.2,96.0,1,15,2.1,...,57.2,25.0,95.2,16.1,4.6,12.5,0.6,4.7,44.1,1.03
Antigua and Barbuda,76.2,108,154,n.a.,3.8,n.a.,5.8,1,2,3.8,...,n.a.,0.1,35.2,30.9,10.4,0.1,0.0,0.0,23.8,1.03
Argentina,76.5,75,154,n.a.,1.4,0.4,11.1,2,5,2.7,...,8.8,43.4,39.4,30.8,17.1,27.7,4.7,3.7,91.8,1.04
Armenia,74.9,70,170,n.a.,4.7,0.2,12.6,3,3,1.9,...,6.6,3.0,26.0,34.6,15.3,2.1,0.3,0.2,62.7,1.13
Australia,82.5,n.a.,n.a.,n.a.,0.2,0.2,3.0,8,7,6.3,...,8.4,24.0,28.2,37.5,22.7,15.9,3.6,1.5,89.4,1.06
Austria,81.6,46,86,n.a.,0.7,n.a.,2.9,7,24,8.7,...,34.5,8.5,21.2,43.2,28.0,5.7,1.6,0.4,66.0,1.06


In [272]:
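# The column names sit on the second line of HDI.csv, hence header=1.
HDI = pd.read_csv('C:\\Users\\beri.e.ndifon\\Downloads\\HDI.csv',encoding='latin-1',header=1)
# Keep only the 2015 rank and the 2015 score.
HDI = HDI.filter(items=['HDI Rank (2015)','2015'],axis=1)
HDI.columns = ['HDI Rank (2015)','HDI Score (2015)']
# Positional alignment: this assumes the first 13 rows of HDI.csv list the same
# countries, in the same order, as featureset.
HDI = HDI.iloc[:13,:].set_index(featureset.index.values)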

In [273]:
HDI.shape

(13, 2)
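
The `set_index` call above pairs HDI rows with countries purely by position, which only works while both sources list the same countries in the same order. A minimal sketch of label-based alignment follows. It assumes the HDI file exposes a country-name column, here hypothetically called `Country` (the source does not confirm this), and it uses small in-memory frames whose values are copied from the previews in this notebook rather than the real files.

```python
import pandas as pd

# Toy stand-in for featureset; values copied from the preview in this notebook.
featureset_demo = pd.DataFrame(
    {"Life expectancy at birth (years)": [60.7, 78.0, 75.0]},
    index=["Afghanistan", "Albania", "Algeria"],
)

# Hypothetical layout: assumes a "Country" column exists in the HDI file.
# Rows are deliberately out of order and include one extra country.
hdi_demo = pd.DataFrame(
    {
        "Country": ["Algeria", "Afghanistan ", "Albania", "Angola"],
        "HDI Rank (2015)": [83.0, 169.0, 75.0, 150.0],
        "HDI Score (2015)": [0.745, 0.479, 0.764, 0.533],
    }
)

# Strip stray whitespace from the names, index by country, and join on labels.
hdi_by_name = hdi_demo.assign(Country=hdi_demo["Country"].str.strip()).set_index("Country")
aligned = featureset_demo.join(hdi_by_name, how="left")

# A country without a match shows up as NaN instead of silently taking a neighbor's score.
unmatched = aligned.index[aligned["HDI Score (2015)"].isna()].tolist()
print(aligned)
print("Unmatched countries:", unmatched)
```

With a left join, the row order of the feature table is preserved, and an empty `unmatched` list is a quick check that every country received a score.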

In [275]:
HDI.head(13)

Unnamed: 0,HDI Rank (2015),HDI Score (2015)
Afghanistan,169.0,0.479
Albania,75.0,0.764
Algeria,83.0,0.745
Andorra,32.0,0.858
Angola,150.0,0.533
Antigua and Barbuda,62.0,0.786
Argentina,45.0,0.827
Armenia,84.0,0.743
Australia,2.0,0.939
Austria,24.0,0.893


In [276]:
frame = pd.concat([featureset, HDI],axis=1)

In [278]:
frame.head(13)

Unnamed: 0,Life expectancy at birth (years),"Adult mortality rate, female (per 1,000 people)","Adult mortality rate, male (per 1,000 people)","Deaths due to malaria (per 100,000 people)","Deaths due to tuberculosis (per 100,000 people)","HIV prevalence, adult (% ages 15-49), total","Infant mortality rate (per 1,000 live births)","Infants lacking immunization, DTP (% of one-year-olds)","Infants lacking immunization, measles (% of one-year-olds)",Public health expenditure (% of GDP),...,"Dependency ratio, young age (0-14) (per 100 people ages 15-64)",Median age (years),"Dependency ratio, old age (65 and older) (per 100 people ages 15-64)","Population, ages 15-64 (millions)","Population, ages 65 and older (millions)","Population, under age 5 (millions)","Population, urban (%)",Sex ratio at birth (male to female births),HDI Rank (2015),HDI Score (2015)
Afghanistan,60.7,238,281,0.2,44.0,0.1,66.3,18,34,2.9,...,82.3,17.5,4.6,17.4,0.8,5.0,26.7,1.06,169.0,0.479
Albania,78.0,50,85,n.a.,0.6,n.a.,12.5,1,2,2.9,...,26.9,34.3,18.0,2.0,0.4,0.2,57.4,1.08,75.0,0.764
Algeria,75.0,84,135,0.0,11.0,0.1,21.9,1,5,5.2,...,43.6,27.6,9.1,26.0,2.4,4.6,70.7,1.05,83.0,0.745
Andorra,81.5,n.a.,n.a.,n.a.,0.8,n.a.,2.1,1,4,6.3,...,n.a.,n.a.,n.a.,n.a.,n.a.,n.a.,85.1,n.a.,32.0,0.858
Angola,52.7,321,369,100.9,52.0,2.2,96.0,1,15,2.1,...,95.2,16.1,4.6,12.5,0.6,4.7,44.1,1.03,150.0,0.533
Antigua and Barbuda,76.2,108,154,n.a.,3.8,n.a.,5.8,1,2,3.8,...,35.2,30.9,10.4,0.1,0.0,0.0,23.8,1.03,62.0,0.786
Argentina,76.5,75,154,n.a.,1.4,0.4,11.1,2,5,2.7,...,39.4,30.8,17.1,27.7,4.7,3.7,91.8,1.04,45.0,0.827
Armenia,74.9,70,170,n.a.,4.7,0.2,12.6,3,3,1.9,...,26.0,34.6,15.3,2.1,0.3,0.2,62.7,1.13,84.0,0.743
Australia,82.5,n.a.,n.a.,n.a.,0.2,0.2,3.0,8,7,6.3,...,28.2,37.5,22.7,15.9,3.6,1.5,89.4,1.06,2.0,0.939
Austria,81.6,46,86,n.a.,0.7,n.a.,2.9,7,24,8.7,...,21.2,43.2,28.0,5.7,1.6,0.4,66.0,1.06,24.0,0.893


In [280]:
frame.to_csv('GlobalData.csv', encoding='utf-8')
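
The previews above contain the placeholder string `n.a.` for missing values, so any column holding it is stored as text and cannot be used numerically. A minimal sketch of one way to handle this is shown below, assuming `n.a.` is the only non-numeric marker in the data. The frame `demo` is a hypothetical stand-in with values copied from the preview, not the real `frame`.

```python
import numpy as np
import pandas as pd

# Toy frame with values copied from the preview; every cell is a string,
# which is how a column looks once it contains the "n.a." placeholder.
demo = pd.DataFrame(
    {
        "Life expectancy at birth (years)": ["60.7", "78.0", "81.5"],
        "Deaths due to malaria (per 100,000 people)": ["0.2", "n.a.", "n.a."],
    },
    index=["Afghanistan", "Albania", "Andorra"],
)

# Replace the placeholder with NaN, then coerce every column to a numeric dtype.
cleaned = demo.replace("n.a.", np.nan).apply(pd.to_numeric, errors="coerce")

# Share of missing values per indicator, useful when deciding what to drop or impute.
missing_share = cleaned.isna().mean().sort_values(ascending=False)
print(cleaned.dtypes)
print(missing_share)
```

Note that `errors="coerce"` turns any unparseable cell into NaN, so it is worth comparing the missing-value counts before and after conversion to confirm that nothing other than `n.a.` was dropped.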

# Dataset Exploratory Analysis 

# Feature Engineering

# Model-Building & Evaluation