<a href="https://colab.research.google.com/github/DonnaVakalis/Livability/blob/master/Gapminder1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Project: Which other metrics track the GINI coefficient, using data from Gapminder?

## Table of Contents
<ul>
<li><a href="#intro">Introduction</a></li>
<li><a href="#wrangling">Data Wrangling</a></li>
<li><a href="#eda">Exploratory Data Analysis</a></li>
<li><a href="#conclusions">Conclusions</a></li>
</ul>

<a id='intro'></a>
## Introduction

Questions posed:
For the most recent year with data available, were other measures of "quality of life" (such as air quality, minimum wage etc.) correlated with economic equality as measured by the GINI metric? 

Datasets:  inidividual datasets for traffice deaths, cellphones, air quality and minimum wage downloaded from https://www.gapminder.org/data/



<a id='wrangling'></a>
## Data Wrangling


### LOAD DATA


In [1]:
# Install pycountry
!pip install pycountry

# Imports 
import pandas as pd
import matplotlib.pyplot as plt
import os
from google.colab import drive
import pycountry
from functools import reduce #for merging dataframes

# Settings
%matplotlib inline 
pd.options.display.float_format = '{:,.2f}'.format # display numbers with two decimal places



Collecting pycountry
[?25l  Downloading https://files.pythonhosted.org/packages/76/73/6f1a412f14f68c273feea29a6ea9b9f1e268177d32e0e69ad6790d306312/pycountry-20.7.3.tar.gz (10.1MB)
[K     |████████████████████████████████| 10.1MB 5.8MB/s 
[?25hBuilding wheels for collected packages: pycountry
  Building wheel for pycountry (setup.py) ... [?25l[?25hdone
  Created wheel for pycountry: filename=pycountry-20.7.3-py2.py3-none-any.whl size=10746863 sha256=9b449a51c32d84cdb8c02306af248e20d8a04d81027007b1434bc974c0ef39ae
  Stored in directory: /root/.cache/pip/wheels/33/4e/a6/be297e6b83567e537bed9df4a93f8590ec01c1acfbcd405348
Successfully built pycountry
Installing collected packages: pycountry
Successfully installed pycountry-20.7.3


In [2]:
# Mount Google Drive

drive.mount('/content/gdrive')
os.chdir("/content/gdrive/My Drive/")


Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly&response_type=code

Enter your authorization code:
··········
Mounted at /content/gdrive


In [3]:
# Load the data

base_dir = "/content/gdrive/My Drive/Colab Notebooks/project_gapminder/"

# 1. Read GINI --> CSV format
file = base_dir + 'gini.csv' # from https://www.gapminder.org/data/
df_gini = pd.read_csv(file)
df_gini.head()

Unnamed: 0,country,1800,1801,1802,1803,1804,1805,1806,1807,1808,1809,1810,1811,1812,1813,1814,1815,1816,1817,1818,1819,1820,1821,1822,1823,1824,1825,1826,1827,1828,1829,1830,1831,1832,1833,1834,1835,1836,1837,1838,...,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024,2025,2026,2027,2028,2029,2030,2031,2032,2033,2034,2035,2036,2037,2038,2039,2040
0,Afghanistan,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,...,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8,36.8
1,Albania,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,...,30.7,31.0,31.1,31.0,30.7,30.4,30.2,30.0,29.7,29.5,29.3,29.1,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0,29.0
2,Algeria,56.2,56.2,56.2,56.2,56.2,56.2,56.2,56.2,56.2,56.2,56.2,56.2,56.2,56.2,56.2,56.2,56.2,56.2,56.3,56.4,56.5,56.6,56.7,56.8,56.9,57.0,57.2,57.4,57.5,57.7,57.9,58.1,58.2,58.4,58.6,58.8,58.9,59.1,59.3,...,32.6,32.2,31.7,31.2,30.8,30.3,29.9,29.4,29.0,28.5,28.2,27.9,27.7,27.6,27.6,27.6,27.6,27.6,27.6,27.6,27.6,27.6,27.6,27.6,27.6,27.6,27.6,27.6,27.6,27.6,27.6,27.6,27.6,27.6,27.6,27.6,27.6,27.6,27.6,27.6
3,Andorra,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,...,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0,40.0
4,Angola,57.2,57.2,57.2,57.2,57.2,57.2,57.2,57.2,57.2,57.2,57.2,57.2,57.2,57.2,57.2,57.2,57.2,57.2,57.1,56.9,56.8,56.6,56.4,56.1,55.9,55.7,55.4,55.1,54.7,54.4,54.1,53.7,53.4,53.1,52.7,52.4,52.1,51.7,51.4,...,51.3,50.6,49.7,48.5,47.3,46.2,45.0,44.1,43.4,42.9,42.7,42.6,42.6,42.6,42.6,42.6,42.6,42.6,42.6,42.6,42.6,42.6,42.6,42.6,42.6,42.6,42.6,42.6,42.6,42.6,42.6,42.6,42.6,42.6,42.6,42.6,42.6,42.6,42.6,42.6


In [4]:
# 2. Read Income per Person --> CSV format
file = base_dir + 'income_per_person_gdppercapita_ppp_inflation_adjusted.csv'  # from https://www.gapminder.org/data/
df_incm = pd.read_csv(file)
df_incm.head()

Unnamed: 0,country,1800,1801,1802,1803,1804,1805,1806,1807,1808,1809,1810,1811,1812,1813,1814,1815,1816,1817,1818,1819,1820,1821,1822,1823,1824,1825,1826,1827,1828,1829,1830,1831,1832,1833,1834,1835,1836,1837,1838,...,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024,2025,2026,2027,2028,2029,2030,2031,2032,2033,2034,2035,2036,2037,2038,2039,2040
0,Afghanistan,603,603,603,603,603,603,603,603,603,603,604,604,604,604,604,604,604,604,604,604,604,607,609,611,613,615,617,619,621,623,625,627,630,632,634,636,638,640,643,...,646,1020,1060,1030,1100,1120,1250,1270,1500,1670,1630,1770,1810,1800,1770,1760,1760,1740,1760,1800,1850,1900,1970,2050,2140,2220,2290,2360,2430,2490,2550,2600,2660,2710,2770,2820,2880,2940,3000,3060
1,Albania,667,667,667,667,667,668,668,668,668,668,668,668,668,668,668,669,669,669,669,669,669,671,672,674,675,677,678,680,681,683,684,686,688,689,691,692,694,695,697,...,5950,6240,6610,7000,7430,7910,8450,9160,9530,9930,10200,10400,10500,10700,11000,11400,11800,12300,12700,13200,13800,14400,15000,15600,16200,16800,17400,18000,18500,18900,19400,19800,20200,20600,21000,21500,21900,22300,22800,23300
2,Algeria,715,716,717,718,719,720,721,722,723,724,725,726,727,728,729,730,731,732,733,734,735,743,751,759,767,775,784,792,801,810,819,828,837,846,855,864,874,883,893,...,10400,10900,11500,11800,12400,12400,12600,12700,12700,12900,13000,13200,13300,13500,13800,13900,13900,13900,14000,14000,14000,14000,13900,13800,13700,13700,13700,13800,13900,14100,14300,14600,14900,15200,15500,15800,16100,16500,16800,17100
3,Andorra,1200,1200,1200,1200,1210,1210,1210,1210,1220,1220,1220,1220,1220,1230,1230,1230,1230,1240,1240,1240,1240,1260,1270,1290,1300,1320,1330,1350,1370,1380,1400,1410,1430,1450,1470,1480,1500,1520,1540,...,31800,31900,34500,36300,39800,42700,43400,41400,41700,39000,42000,41900,43700,44900,46600,48200,49800,51500,53200,55000,56900,58700,60400,62100,63900,65600,67300,68900,70500,72100,73600,75100,76700,78300,79900,81500,83100,84800,86500,88300
4,Angola,618,620,623,626,628,631,634,637,640,642,645,648,651,654,657,660,662,665,668,671,674,677,680,683,686,689,692,695,698,701,704,708,711,714,717,720,723,726,730,...,3920,4320,4300,4610,5110,5500,6040,6470,6290,6360,6350,6650,6730,6810,6650,6260,6050,5730,5540,5440,5440,5460,5520,5560,5600,5660,5720,5800,5890,6000,6110,6230,6350,6480,6610,6750,6880,7020,7170,7310


In [5]:
# 3. Read CO2 emissions --> CSV format
file = base_dir + 'co2_emissions_tonnes_per_person.csv' # from https://www.gapminder.org/data/
df_crbn = pd.read_csv(file)
df_crbn.head()

Unnamed: 0,country,1800,1801,1802,1803,1804,1805,1806,1807,1808,1809,1810,1811,1812,1813,1814,1815,1816,1817,1818,1819,1820,1821,1822,1823,1824,1825,1826,1827,1828,1829,1830,1831,1832,1833,1834,1835,1836,1837,1838,...,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014
0,Afghanistan,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,0.17,0.15,0.18,0.16,0.17,0.13,0.15,0.16,0.2,0.23,0.29,0.27,0.27,0.25,0.23,0.21,0.18,0.1,0.09,0.08,0.07,0.06,0.06,0.05,0.04,0.04,0.04,0.05,0.05,0.04,0.05,0.06,0.08,0.15,0.24,0.29,0.41,0.34,0.31,0.29
1,Albania,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,1.91,2.01,2.27,2.52,2.89,1.93,2.68,2.62,2.68,2.69,2.65,2.65,2.38,2.29,2.76,1.68,1.31,0.78,0.73,0.61,0.67,0.65,0.5,0.56,0.96,0.97,1.03,1.2,1.38,1.34,1.38,1.27,1.29,1.46,1.47,1.56,1.79,1.68,1.74,1.97
2,Algeria,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,1.93,2.29,2.38,3.45,2.45,3.46,2.34,1.92,2.49,3.27,3.24,3.3,3.54,3.43,3.19,2.99,2.99,2.96,2.97,3.06,3.31,3.32,2.94,3.54,3.01,2.83,2.68,2.82,2.84,2.71,3.24,3.0,3.2,3.17,3.44,3.31,3.31,3.48,3.53,3.74
3,Andorra,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,7.47,7.18,6.91,6.74,6.49,6.66,7.06,7.24,7.66,7.98,8.02,7.79,7.59,7.32,7.36,7.3,6.75,6.52,6.43,6.12,6.12,5.87,5.92,5.9,5.83
4,Angola,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,0.63,0.45,0.47,0.69,0.68,0.64,0.61,0.52,0.55,0.52,0.47,0.45,0.54,0.46,0.44,0.43,0.42,0.41,0.44,0.29,0.79,0.73,0.5,0.48,0.58,0.58,0.57,0.72,0.5,1.0,0.99,1.11,1.2,1.19,1.23,1.24,1.25,1.33,1.25,1.29


In [6]:
# 4. Read Cellphone per 100 people --> CSV format 
file = base_dir + 'cell_phones_per_100_people.csv' # from https://www.gapminder.org/data/
df_phon= pd.read_csv(file)
df_phon.head()

Unnamed: 0,country,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018
0,Afghanistan,0.0,,,,,0.0,,,,,0.0,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.11,0.84,2.43,4.68,9.53,17.2,28.5,37.0,35.0,45.8,49.2,52.1,55.2,57.3,61.1,65.9,59.1
1,Albania,0.0,,,,,0.0,,,,,0.0,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.11,0.18,0.35,0.95,12.5,27.2,35.3,40.6,49.6,62.4,76.5,61.9,82.9,91.3,106.0,120.0,127.0,116.0,118.0,117.0,126.0,94.2
2,Algeria,0.0,,,,,0.0,,,,,0.0,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.02,0.0,0.02,0.04,0.06,0.06,0.23,0.28,0.32,1.41,4.48,14.9,41.2,62.4,80.7,77.8,92.6,91.1,97.1,100.0,104.0,111.0,109.0,116.0,111.0,112.0
3,Andorra,0.0,,,,,0.0,,,,,0.0,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.31,1.28,1.25,4.42,8.53,13.4,22.0,32.0,36.0,43.7,46.8,70.9,76.6,81.9,85.2,76.8,76.6,76.4,77.6,77.7,77.5,79.1,83.6,91.4,98.5,104.0,107.0
4,Angola,0.0,,,,,0.0,,,,,0.0,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.02,0.05,0.06,0.15,0.16,0.44,0.8,1.93,3.94,8.29,15.2,23.7,31.2,36.0,40.3,49.8,50.9,51.1,52.2,49.8,45.1,44.7,43.1


In [7]:
# types and look for shape, types and instances of missing or possibly errant data 
df_gini.info()
df_gini

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 195 entries, 0 to 194
Columns: 242 entries, country to 2040
dtypes: float64(241), object(1)
memory usage: 368.8+ KB


Unnamed: 0,country,1800,1801,1802,1803,1804,1805,1806,1807,1808,1809,1810,1811,1812,1813,1814,1815,1816,1817,1818,1819,1820,1821,1822,1823,1824,1825,1826,1827,1828,1829,1830,1831,1832,1833,1834,1835,1836,1837,1838,...,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024,2025,2026,2027,2028,2029,2030,2031,2032,2033,2034,2035,2036,2037,2038,2039,2040
0,Afghanistan,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,30.50,...,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80,36.80
1,Albania,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,38.90,...,30.70,31.00,31.10,31.00,30.70,30.40,30.20,30.00,29.70,29.50,29.30,29.10,29.00,29.00,29.00,29.00,29.00,29.00,29.00,29.00,29.00,29.00,29.00,29.00,29.00,29.00,29.00,29.00,29.00,29.00,29.00,29.00,29.00,29.00,29.00,29.00,29.00,29.00,29.00,29.00
2,Algeria,56.20,56.20,56.20,56.20,56.20,56.20,56.20,56.20,56.20,56.20,56.20,56.20,56.20,56.20,56.20,56.20,56.20,56.20,56.30,56.40,56.50,56.60,56.70,56.80,56.90,57.00,57.20,57.40,57.50,57.70,57.90,58.10,58.20,58.40,58.60,58.80,58.90,59.10,59.30,...,32.60,32.20,31.70,31.20,30.80,30.30,29.90,29.40,29.00,28.50,28.20,27.90,27.70,27.60,27.60,27.60,27.60,27.60,27.60,27.60,27.60,27.60,27.60,27.60,27.60,27.60,27.60,27.60,27.60,27.60,27.60,27.60,27.60,27.60,27.60,27.60,27.60,27.60,27.60,27.60
3,Andorra,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,...,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00,40.00
4,Angola,57.20,57.20,57.20,57.20,57.20,57.20,57.20,57.20,57.20,57.20,57.20,57.20,57.20,57.20,57.20,57.20,57.20,57.20,57.10,56.90,56.80,56.60,56.40,56.10,55.90,55.70,55.40,55.10,54.70,54.40,54.10,53.70,53.40,53.10,52.70,52.40,52.10,51.70,51.40,...,51.30,50.60,49.70,48.50,47.30,46.20,45.00,44.10,43.40,42.90,42.70,42.60,42.60,42.60,42.60,42.60,42.60,42.60,42.60,42.60,42.60,42.60,42.60,42.60,42.60,42.60,42.60,42.60,42.60,42.60,42.60,42.60,42.60,42.60,42.60,42.60,42.60,42.60,42.60,42.60
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
190,Venezuela,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,62.80,...,49.10,49.40,50.30,50.00,49.30,48.60,48.00,46.90,46.90,46.90,46.90,46.90,46.90,46.90,46.90,46.90,46.90,46.90,46.90,46.90,46.90,46.90,46.90,46.90,46.90,46.90,46.90,46.90,46.90,46.90,46.90,46.90,46.90,46.90,46.90,46.90,46.90,46.90,46.90,46.90
191,Vietnam,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,34.20,...,36.40,36.60,36.70,36.60,36.30,36.00,36.10,36.70,37.10,37.10,37.00,36.50,35.60,35.20,35.10,35.10,35.20,35.30,35.30,35.30,35.30,35.30,35.30,35.30,35.30,35.30,35.30,35.30,35.30,35.30,35.30,35.30,35.30,35.30,35.30,35.30,35.30,35.30,35.30,35.30
192,Yemen,50.10,50.10,50.10,50.10,50.10,50.10,50.10,50.10,50.10,50.10,50.10,50.10,50.10,50.10,50.10,50.10,50.10,50.10,50.10,50.20,50.30,50.40,50.50,50.60,50.70,50.80,51.00,51.10,51.30,51.40,51.60,51.70,51.90,52.00,52.20,52.40,52.50,52.70,52.80,...,34.90,34.90,34.80,34.80,34.80,34.90,35.00,35.20,35.50,35.70,36.00,36.20,36.40,36.60,36.70,36.70,36.70,36.70,36.70,36.70,36.70,36.70,36.70,36.70,36.70,36.70,36.70,36.70,36.70,36.70,36.70,36.70,36.70,36.70,36.70,36.70,36.70,36.70,36.70,36.70
193,Zambia,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,54.50,...,44.90,45.00,46.60,48.50,50.70,53.30,54.70,55.00,55.30,55.60,55.90,56.20,56.50,56.80,57.00,57.10,57.10,57.10,57.10,57.10,57.10,57.10,57.10,57.10,57.10,57.10,57.10,57.10,57.10,57.10,57.10,57.10,57.10,57.10,57.10,57.10,57.10,57.10,57.10,57.10


comments about df_gini:
- many more years than other datasets
- consider limiting scope of question to "last 20 years"

In [None]:
# types and look for shape, types and instances of missing or possibly errant data 
df_incm.info()
df_incm

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 193 entries, 0 to 192
Columns: 242 entries, country to 2040
dtypes: int64(241), object(1)
memory usage: 365.0+ KB


Unnamed: 0,country,1800,1801,1802,1803,1804,1805,1806,1807,1808,1809,1810,1811,1812,1813,1814,1815,1816,1817,1818,1819,1820,1821,1822,1823,1824,1825,1826,1827,1828,1829,1830,1831,1832,1833,1834,1835,1836,1837,1838,...,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022,2023,2024,2025,2026,2027,2028,2029,2030,2031,2032,2033,2034,2035,2036,2037,2038,2039,2040
0,Afghanistan,603,603,603,603,603,603,603,603,603,603,604,604,604,604,604,604,604,604,604,604,604,607,609,611,613,615,617,619,621,623,625,627,630,632,634,636,638,640,643,...,646,1020,1060,1030,1100,1120,1250,1270,1500,1670,1630,1770,1810,1800,1770,1760,1760,1740,1760,1800,1850,1900,1970,2050,2140,2220,2290,2360,2430,2490,2550,2600,2660,2710,2770,2820,2880,2940,3000,3060
1,Albania,667,667,667,667,667,668,668,668,668,668,668,668,668,668,668,669,669,669,669,669,669,671,672,674,675,677,678,680,681,683,684,686,688,689,691,692,694,695,697,...,5950,6240,6610,7000,7430,7910,8450,9160,9530,9930,10200,10400,10500,10700,11000,11400,11800,12300,12700,13200,13800,14400,15000,15600,16200,16800,17400,18000,18500,18900,19400,19800,20200,20600,21000,21500,21900,22300,22800,23300
2,Algeria,715,716,717,718,719,720,721,722,723,724,725,726,727,728,729,730,731,732,733,734,735,743,751,759,767,775,784,792,801,810,819,828,837,846,855,864,874,883,893,...,10400,10900,11500,11800,12400,12400,12600,12700,12700,12900,13000,13200,13300,13500,13800,13900,13900,13900,14000,14000,14000,14000,13900,13800,13700,13700,13700,13800,13900,14100,14300,14600,14900,15200,15500,15800,16100,16500,16800,17100
3,Andorra,1200,1200,1200,1200,1210,1210,1210,1210,1220,1220,1220,1220,1220,1230,1230,1230,1230,1240,1240,1240,1240,1260,1270,1290,1300,1320,1330,1350,1370,1380,1400,1410,1430,1450,1470,1480,1500,1520,1540,...,31800,31900,34500,36300,39800,42700,43400,41400,41700,39000,42000,41900,43700,44900,46600,48200,49800,51500,53200,55000,56900,58700,60400,62100,63900,65600,67300,68900,70500,72100,73600,75100,76700,78300,79900,81500,83100,84800,86500,88300
4,Angola,618,620,623,626,628,631,634,637,640,642,645,648,651,654,657,660,662,665,668,671,674,677,680,683,686,689,692,695,698,701,704,708,711,714,717,720,723,726,730,...,3920,4320,4300,4610,5110,5500,6040,6470,6290,6360,6350,6650,6730,6810,6650,6260,6050,5730,5540,5440,5440,5460,5520,5560,5600,5660,5720,5800,5890,6000,6110,6230,6350,6480,6610,6750,6880,7020,7170,7310
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
188,Venezuela,1210,1200,1200,1190,1190,1180,1170,1170,1160,1160,1150,1140,1140,1130,1130,1120,1120,1110,1100,1100,1090,1170,1250,1340,1440,1540,1650,1770,1890,2020,2170,2140,2420,2450,2480,2430,2260,2130,1910,...,14800,13300,12000,14000,15100,16400,17600,18200,17400,16900,17300,18000,18000,17100,15600,15200,14500,12500,9720,9050,8600,8430,8300,8180,8070,8000,7970,7990,8050,8150,8270,8420,8580,8760,8930,9110,9300,9490,9680,9880
189,Vietnam,778,778,778,778,778,778,778,778,778,778,778,778,778,778,778,778,778,778,778,778,778,777,777,776,775,775,774,773,773,772,771,771,770,770,769,768,768,767,766,...,2710,2850,3020,3210,3430,3630,3850,4030,4210,4430,4660,4860,5070,5310,5610,5900,6230,6610,6970,7350,7760,8190,8640,9130,9620,10100,10500,11000,11300,11600,11900,12200,12500,12700,13000,13300,13500,13800,14100,14400
190,Yemen,877,879,882,884,887,889,892,894,897,899,902,905,907,910,912,915,917,920,923,925,928,931,933,936,938,941,944,947,949,952,955,957,960,963,965,968,971,974,976,...,4030,4070,4100,4150,4250,4270,4290,4320,4360,4570,3880,3860,3940,3830,3110,2620,2400,2360,2340,2330,2580,2730,2840,2880,2920,2960,3010,3060,3110,3170,3230,3290,3360,3430,3500,3570,3640,3720,3790,3870
191,Zambia,663,665,667,668,670,671,673,675,676,678,680,681,683,684,686,688,689,691,693,694,696,698,700,701,703,705,706,708,710,711,713,715,717,718,720,722,724,725,727,...,2180,2220,2320,2420,2520,2650,2800,2930,3120,3340,3420,3570,3630,3690,3680,3700,3720,3740,3700,3650,3600,3560,3500,3450,3410,3380,3370,3380,3410,3450,3500,3560,3630,3700,3780,3860,3930,4010,4100,4180


comments about df_incm:
- same countries as above, making comparison easier

In [None]:
# types and look for shape, types and instances of missing or possibly errant data 
df_crbn.info()
df_crbn

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 192 entries, 0 to 191
Columns: 216 entries, country to 2014
dtypes: float64(215), object(1)
memory usage: 324.1+ KB


Unnamed: 0,country,1800,1801,1802,1803,1804,1805,1806,1807,1808,1809,1810,1811,1812,1813,1814,1815,1816,1817,1818,1819,1820,1821,1822,1823,1824,1825,1826,1827,1828,1829,1830,1831,1832,1833,1834,1835,1836,1837,1838,...,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014
0,Afghanistan,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,0.17,0.15,0.18,0.16,0.17,0.13,0.15,0.16,0.20,0.23,0.29,0.27,0.27,0.25,0.23,0.21,0.18,0.10,0.09,0.08,0.07,0.06,0.06,0.05,0.04,0.04,0.04,0.05,0.05,0.04,0.05,0.06,0.08,0.15,0.24,0.29,0.41,0.34,0.31,0.29
1,Albania,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,1.91,2.01,2.27,2.52,2.89,1.93,2.68,2.62,2.68,2.69,2.65,2.65,2.38,2.29,2.76,1.68,1.31,0.78,0.73,0.61,0.67,0.65,0.50,0.56,0.96,0.97,1.03,1.20,1.38,1.34,1.38,1.27,1.29,1.46,1.47,1.56,1.79,1.68,1.74,1.97
2,Algeria,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,1.93,2.29,2.38,3.45,2.45,3.46,2.34,1.92,2.49,3.27,3.24,3.30,3.54,3.43,3.19,2.99,2.99,2.96,2.97,3.06,3.31,3.32,2.94,3.54,3.01,2.83,2.68,2.82,2.84,2.71,3.24,3.00,3.20,3.17,3.44,3.31,3.31,3.48,3.53,3.74
3,Andorra,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,,,,,,,,,,,,,,,,7.47,7.18,6.91,6.74,6.49,6.66,7.06,7.24,7.66,7.98,8.02,7.79,7.59,7.32,7.36,7.30,6.75,6.52,6.43,6.12,6.12,5.87,5.92,5.90,5.83
4,Angola,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,0.63,0.45,0.47,0.69,0.68,0.64,0.61,0.52,0.55,0.52,0.47,0.45,0.54,0.46,0.44,0.43,0.42,0.41,0.44,0.29,0.79,0.73,0.50,0.48,0.58,0.58,0.57,0.72,0.50,1.00,0.99,1.11,1.20,1.19,1.23,1.24,1.25,1.33,1.25,1.29
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
187,Venezuela,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,4.84,4.23,4.56,4.78,5.18,5.98,5.90,5.83,5.66,5.52,5.85,6.16,6.09,6.21,5.68,6.22,5.74,5.15,5.92,6.05,6.08,5.49,5.87,7.19,7.29,6.30,7.00,7.70,7.52,5.84,6.25,6.31,5.90,6.48,6.41,6.65,6.12,6.77,6.18,6.17
188,Vietnam,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,0.45,0.28,0.30,0.30,0.31,0.31,0.32,0.32,0.33,0.29,0.35,0.37,0.41,0.36,0.26,0.32,0.31,0.30,0.32,0.36,0.39,0.46,0.58,0.61,0.60,0.67,0.76,0.87,0.96,1.09,1.17,1.21,1.23,1.37,1.48,1.62,1.71,1.58,1.62,1.82
189,Yemen,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,0.32,0.37,0.41,0.42,0.42,0.42,0.51,0.70,0.71,0.78,0.90,0.83,0.84,0.90,0.89,0.82,0.75,0.77,0.64,0.63,0.70,0.69,0.84,0.74,0.82,0.84,0.91,0.85,0.91,0.97,1.00,1.03,0.98,1.02,1.09,1.01,0.83,0.76,1.01,0.88
190,Zambia,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,0.83,0.79,0.71,0.64,0.64,0.60,0.56,0.56,0.51,0.42,0.40,0.40,0.37,0.41,0.33,0.30,0.29,0.29,0.29,0.27,0.24,0.20,0.25,0.23,0.18,0.17,0.18,0.18,0.19,0.19,0.19,0.19,0.15,0.17,0.19,0.20,0.21,0.25,0.27,0.29


comments about df_crbn:
- most recent year is 2014, so limit question to 2014 for all

In [None]:
# types and look for shape, types and instances of missing or possibly errant data 
df_phon.info()
df_phon

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 194 entries, 0 to 193
Data columns (total 60 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   country  194 non-null    object 
 1   1960     183 non-null    float64
 2   1961     0 non-null      float64
 3   1962     0 non-null      float64
 4   1963     0 non-null      float64
 5   1964     0 non-null      float64
 6   1965     183 non-null    float64
 7   1966     0 non-null      float64
 8   1967     0 non-null      float64
 9   1968     0 non-null      float64
 10  1969     0 non-null      float64
 11  1970     183 non-null    float64
 12  1971     0 non-null      float64
 13  1972     0 non-null      float64
 14  1973     0 non-null      float64
 15  1974     0 non-null      float64
 16  1975     183 non-null    float64
 17  1976     183 non-null    float64
 18  1977     183 non-null    float64
 19  1978     183 non-null    float64
 20  1979     183 non-null    float64
 21  1980     183 non

Unnamed: 0,country,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018
0,Afghanistan,0.00,,,,,0.00,,,,,0.00,,,,,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.11,0.84,2.43,4.68,9.53,17.20,28.50,37.00,35.00,45.80,49.20,52.10,55.20,57.30,61.10,65.90,59.10
1,Albania,0.00,,,,,0.00,,,,,0.00,,,,,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.07,0.11,0.18,0.35,0.95,12.50,27.20,35.30,40.60,49.60,62.40,76.50,61.90,82.90,91.30,106.00,120.00,127.00,116.00,118.00,117.00,126.00,94.20
2,Algeria,0.00,,,,,0.00,,,,,0.00,,,,,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.02,0.02,0.02,0.00,0.02,0.04,0.06,0.06,0.23,0.28,0.32,1.41,4.48,14.90,41.20,62.40,80.70,77.80,92.60,91.10,97.10,100.00,104.00,111.00,109.00,116.00,111.00,112.00
3,Andorra,0.00,,,,,0.00,,,,,0.00,,,,,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,1.31,1.28,1.25,4.42,8.53,13.40,22.00,32.00,36.00,43.70,46.80,70.90,76.60,81.90,85.20,76.80,76.60,76.40,77.60,77.70,77.50,79.10,83.60,91.40,98.50,104.00,107.00
4,Angola,0.00,,,,,0.00,,,,,0.00,,,,,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.01,0.01,0.01,0.02,0.05,0.06,0.15,0.16,0.44,0.80,1.93,3.94,8.29,15.20,23.70,31.20,36.00,40.30,49.80,50.90,51.10,52.20,49.80,45.10,44.70,43.10
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
189,Venezuela,0.00,,,,,0.00,,,,,0.00,,,,,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.01,0.02,0.04,0.08,0.38,0.87,1.49,1.84,2.60,4.69,8.63,15.90,22.50,26.30,26.10,27.50,32.40,47.30,70.00,87.40,99.20,100.00,98.00,99.60,104.00,104.00,102.00,96.70,92.50,83.30,71.80
190,Vietnam,0.00,,,,,0.00,,,,,0.00,,,,,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.01,0.02,0.03,0.09,0.21,0.28,0.42,0.99,1.55,2.33,3.33,5.97,11.40,22.30,52.70,86.80,113.00,127.00,143.00,147.00,136.00,148.00,130.00,129.00,127.00,147.00
191,Yemen,0.00,,,,,0.00,,,,,0.00,,,,,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.01,0.04,0.06,0.06,0.06,0.08,0.10,0.16,0.18,0.82,2.64,3.56,7.55,11.30,14.40,20.40,29.40,36.90,47.90,49.00,56.80,67.00,66.20,56.70,60.50,55.20,53.70
192,Zambia,0.00,,,,,0.00,,,,,0.00,,,,,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.02,0.03,0.05,0.08,0.28,0.95,1.13,1.27,2.14,4.02,8.01,13.70,21.10,27.50,33.30,40.00,58.20,72.80,69.60,65.70,72.80,73.40,79.70,89.20


comments about df_phon
- up to 2018

 
### Data Cleaning

Merging variables into one large dataframe by country, truncating to years of interest (1996-2014), Getting 3-letter country codes 

In [11]:
# Select a year of interest within all dataframes
data_frames = [df_gini, df_incm, df_crbn, df_phon]
cols_to_keep = ["country","2014"]
df_ = pd.DataFrame(columns=cols_to_keep)

for df in data_frames:
    df = df[cols_to_keep]
    df_= pd.merge(df, df, on=[cols_to_keep], how = 'outer')
   # df_.append(df, on=['country'], how='outer')
 



KeyError: ignored

In [9]:
df_

Unnamed: 0,country,2014


In [None]:
df_gini_copy = df_gini[cols_to_keep]
df_gini_copy.head()

Unnamed: 0,country,1996,2014
0,Afghanistan,36.8,36.8
1,Albania,27.5,29.0
2,Algeria,34.9,27.6
3,Andorra,40.0,40.0
4,Angola,52.4,42.6


In [None]:
# Truncate and Merge dataframes by country
df_merged = reduce(lambda  left,right: pd.merge(left,right,on=[cols_to_keep],
                                            how='outer'), data_frames)


In [None]:
df_merged

Unnamed: 0,country_x,1800_x,1801_x,1802_x,1803_x,1804_x,1805_x,1806_x,1807_x,1808_x,1809_x,1810_x,1811_x,1812_x,1813_x,1814_x,1815_x,1816_x,1817_x,1818_x,1819_x,1820_x,1821_x,1822_x,1823_x,1824_x,1825_x,1826_x,1827_x,1828_x,1829_x,1830_x,1831_x,1832_x,1833_x,1834_x,1835_x,1836_x,1837_x,1838_x,...,1978_y,1979_y,1980_y,1981_y,1982_y,1983_y,1984_y,1985_y,1986_y,1987_y,1988_y,1989_y,1990_y,1991_y,1992_y,1993_y,1994_y,1995_y,1996_y,1997_y,1998_y,1999_y,2001_y,2002_y,2003_y,2004_y,2005_y,2006_y,2007_y,2008_y,2009_y,2010_y,2011_y,2012_y,2013_y,2014_y,2015,2016,2017,2018
0,Afghanistan,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,30.5,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,Albania,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,38.9,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,Bosnia and Herzegovina,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,30.0,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,Montenegro,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,30.2,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,Algeria,56.2,56.2,56.2,56.2,56.2,56.2,56.2,56.2,56.2,56.2,56.2,56.2,56.2,56.2,56.2,56.2,56.2,56.2,56.3,56.4,56.5,56.6,56.7,56.8,56.9,57.0,57.2,57.4,57.5,57.7,57.9,58.1,58.2,58.4,58.6,58.8,58.9,59.1,59.3,...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
765,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0000,0.0000,0.0000,0.00000,0.00229,0.00404,0.0164,0.0410,0.0729,0.1120,0.165,0.510,0.735,1.25,2.09,2.72,9.44,20.9,44.8,58.5,73.5,87.8,68.8,71.8,71.1,70.4,74.0,75.9,71.5
766,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0000,0.0000,0.0000,0.00000,0.00000,0.03900,0.0720,0.0897,0.1180,0.1240,0.166,0.185,2.530,3.92,5.15,6.06,7.00,11.8,16.0,57.2,71.9,56.4,58.6,49.6,59.1,64.5,78.5,79.9,85.9
767,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00963,0.0192,0.0378,0.0826,0.38200,0.86900,1.49000,1.8400,2.6000,4.6900,8.6300,15.900,26.300,26.100,27.50,32.40,47.30,70.00,87.4,99.2,100.0,98.0,99.6,104.0,104.0,102.0,96.7,92.5,83.3,71.8
768,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0000,0.0000,0.0000,0.00113,0.00562,0.01700,0.0314,0.0906,0.2080,0.2850,0.416,1.550,2.330,3.33,5.97,11.40,22.30,52.7,86.8,113.0,127.0,143.0,147.0,136.0,148.0,130.0,129.0,127.0,147.0


In [None]:
# Get 3-letter country codes 

list_countries = df_mw['country_name'].unique().tolist()
# print(list_countries) # Uncomment to see list of countries
d_country_code = {}  # To hold the country names and their ISO
for country in list_countries:
    try:
        country_data = pycountry.countries.search_fuzzy(country)
        # country_data is a list of objects of class pycountry.db.Country
        # The first item  ie at index 0 of list is best fit
        # object of class Country have an alpha_3 attribute
        country_code = country_data[0].alpha_3
        d_country_code.update({country: country_code})
    except:
        print('could not add ISO 3 code for ->', country)
        # If could not find country, make ISO code ' '
        d_country_code.update({country: ' '})
        
 # create a new column iso_alpha in the df
# and fill it with appropriate iso 3 code
for k, v in d_country_code.items():
    df_mw.loc[(df_mw.country_name == k), 'iso_alpha'] = v



<a id='eda'></a>
## Exploratory Data Analysis

statistics and create visualizations with the goal of addressing the research questions that you posed in the Introduction section. It is recommended that you be systematic with your approach. Look at one variable at a time, and then follow it up by looking at relationships between variables.

### Does the Gini coefficient correlate with X? (in 2014 or in 2000...was it much different?)

In [None]:
# Use this, and more code cells, to explore your data. Don't forget to add
#   Markdown cells to document your observations and findings.


### Research Question 2  (Replace this header name!)

In [None]:
# Continue to explore the data to address your additional research
#   questions. Add more headers as needed if you have more questions to
#   investigate.


<a id='conclusions'></a>
## Conclusions

> **Tip**: Finally, summarize your findings and the results that have been performed. Make sure that you are clear with regards to the limitations of your exploration. If you haven't done any statistical tests, do not imply any statistical conclusions. And make sure you avoid implying causation from correlation!

> **Tip**: Once you are satisfied with your work here, check over your report to make sure that it is satisfies all the areas of the rubric (found on the project submission page at the end of the lesson). You should also probably remove all of the "Tips" like this one so that the presentation is as polished as possible.

## Submitting your Project 

> Before you submit your project, you need to create a .html or .pdf version of this notebook in the workspace here. To do that, run the code cell below. If it worked correctly, you should get a return code of 0, and you should see the generated .html file in the workspace directory (click on the orange Jupyter icon in the upper left).

> Alternatively, you can download this report as .html via the **File** > **Download as** submenu, and then manually upload it into the workspace directory by clicking on the orange Jupyter icon in the upper left, then using the Upload button.

> Once you've done this, you can submit your project by clicking on the "Submit Project" button in the lower right here. This will create and submit a zip file with this .ipynb doc and the .html or .pdf version you created. Congratulations!

In [None]:
from subprocess import call
call(['python', '-m', 'nbconvert', 'Investigate_a_Dataset.ipynb'])