# Initial Questions
1. What are the downsides of development? 

# Profile

* Where did the data set come from (provenance)? What's in it?
    * The data is sourced from multiple locations and aggregated by the World Bank. It seems to come mostly from large intergovernmental institutions, such as the United Nations. I did not investigate tertiary sources further down the hierarchy.
        * Environmental Center 
        * Food and Agriculture Organization
        * Internal Displacement Monitoring Centre
        * World Health Organization
    
* How big is the data set (how many rows? how many variables? file size?)
* What types of data variables are present? What are the dimensions/types?
* What is the overall perceived quality of the data? What's missing? What do you wish it included? Any noticeable outliers? Any other anomalous or curious things that jump out at you?


# Variables to consider
## Positive indicators
* GDP per capita (current US$)
* GNI per capita, Atlas method (current US$)
* Literacy rate, adult total (% of people ages 15 and above)
* Mortality rate, infant (per 1,000 live births)
* Current health expenditure (% of GDP)
* Access to electricity (% population)
* Industry (including construction), value added (% of GDP)

## Potentially negative indicators
* Rural population (% of total population)
* Urban population (% of total population)
* Total greenhouse gas emissions (kt)
* Forest area (% of land)
* Agriculture, forestry, fishing, value added (% of GDP)
* Level of water stress
* Livestock production index (2014-2016 = 100)
* Cause of death, by communicable diseases and maternal, prenatal and nutrition conditions (% of total)
* Cause of death, by non-communicable diseases (% of total)
* Droughts, floods, extreme temperatures (% of population, average 1990-2009)
* Death rate, crude (per 1,000 people)
* Suicide mortality rate (per 100,000 population)
* Mortality from CVD, cancer, diabetes or CRD between exact ages 30 and 70 (%)
* PM2.5 air pollution, population exposed to levels exceeding WHO guideline value (% of total)

## Dropped indicators
* Bird, fish, mammal, plant species (threatened)
    * Data is too sparse: there is a single entry for each per country. The number of threatened species is also not normalized by the total number of species in that country, so comparisons wouldn't make much sense.

I did some filtering on the World Bank website. I looked through the variables and decided which were relevant to my question. Domain knowledge would have been helpful here. I pulled data for all years (1960 to 2019) and each of the above variables. Where possible, I chose variables that had already been normalized per capita. I do not yet know how complete the data is.

In [5]:
import pandas as pd
import numpy as np
import requests

# Show every column when displaying a frame (canonical option name)
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 2)

df = pd.read_csv('data/world_indicators.csv', na_values='..')

# Standardize column names: replace spaces with underscores and upper-case with lower-case
df.columns = [c.lower().replace(' ', '_') for c in df.columns]
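
The profile questions above (row count, variable count, size, types, missing values) could be answered with a small helper like the one below. This is only a sketch and not part of the original analysis: `profile_frame` and the tiny `sample` frame are hypothetical names, the sample values are made up purely to exercise the function, and the year columns are assumed to start with four digits (as in `1960_[yr1960]`). In the notebook, the call would be `profile_frame(df)`.

```python
import numpy as np
import pandas as pd


def profile_frame(frame: pd.DataFrame) -> dict:
    """Summarize size, dtypes, and missingness of a wide indicator table."""
    # Assumption: year columns start with four digits, e.g. '1960_[yr1960]'.
    year_cols = [c for c in frame.columns if str(c)[:4].isdigit()]
    return {
        "n_rows": len(frame),
        "n_columns": frame.shape[1],
        "n_year_columns": len(year_cols),
        "memory_bytes": int(frame.memory_usage(deep=True).sum()),
        "dtype_counts": frame.dtypes.astype(str).value_counts().to_dict(),
        # Share of year cells that are empty, between 0.0 and 1.0.
        "missing_share": float(frame[year_cols].isna().to_numpy().mean()),
    }


# Made-up sample in the same wide layout as the export (not real data).
sample = pd.DataFrame({
    "country_name": ["A", "A"],
    "series_name": ["s1", "s2"],
    "1960_[yr1960]": [1.0, np.nan],
    "1961_[yr1961]": [np.nan, np.nan],
})
summary = profile_frame(sample)
print(summary)
```

The missing share is computed over the year columns only, since the identifier columns are expected to be fully populated.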

In [6]:
df.head()

# Transform data to be of this form:
# country_name, year, series_1, series_2, ..., series_n


Unnamed: 0,country_name,country_code,series_name,series_code,1960_[yr1960],1961_[yr1961],1962_[yr1962],1963_[yr1963],1964_[yr1964],1965_[yr1965],1966_[yr1966],1967_[yr1967],1968_[yr1968],1969_[yr1969],1970_[yr1970],1971_[yr1971],1972_[yr1972],1973_[yr1973],1974_[yr1974],1975_[yr1975],1976_[yr1976],1977_[yr1977],1978_[yr1978],1979_[yr1979],1980_[yr1980],1981_[yr1981],1982_[yr1982],1983_[yr1983],1984_[yr1984],1985_[yr1985],1986_[yr1986],1987_[yr1987],1988_[yr1988],1989_[yr1989],1990_[yr1990],1991_[yr1991],1992_[yr1992],1993_[yr1993],1994_[yr1994],1995_[yr1995],1996_[yr1996],1997_[yr1997],1998_[yr1998],1999_[yr1999],2000_[yr2000],2001_[yr2001],2002_[yr2002],2003_[yr2003],2004_[yr2004],2005_[yr2005],2006_[yr2006],2007_[yr2007],2008_[yr2008],2009_[yr2009],2010_[yr2010],2011_[yr2011],2012_[yr2012],2013_[yr2013],2014_[yr2014],2015_[yr2015],2016_[yr2016],2017_[yr2017],2018_[yr2018],2019_[yr2019],2020_[yr2020]
0,Afghanistan,AFG,Access to electricity (% of population),EG.ELC.ACCS.ZS,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,22.3,28.1,33.9,42.4,45.52,42.7,43.22,69.1,68.98,89.5,71.5,97.7,97.7,98.72,97.7,
1,Afghanistan,AFG,Agricultural land (% of land area),AG.LND.AGRI.ZS,,57.75,57.84,57.91,58.01,58.01,58.07,58.17,58.17,58.2,58.21,58.26,58.28,58.28,58.28,58.28,58.28,58.28,58.28,58.28,58.28,58.29,58.29,58.29,58.29,58.29,58.29,58.27,58.27,58.27,58.27,58.25,58.25,58.1,57.92,57.83,57.83,57.88,58.0,57.83,57.83,57.83,57.83,58.07,58.07,58.07,58.07,58.07,58.07,58.07,58.07,58.07,58.07,58.07,58.07,58.07,58.07,58.07,58.08,,
2,Afghanistan,AFG,Agricultural methane emissions (% of total),EN.ATM.METH.AG.ZS,,,,,,,,,,,77.07,76.65,72.76,74.02,73.87,74.8,75.5,74.42,74.37,74.11,71.8,71.75,71.42,69.56,65.94,63.82,59.27,63.6,63.81,66.48,59.75,61.85,63.07,64.34,66.18,68.38,70.9,72.74,74.05,75.74,72.8,68.59,73.19,73.43,72.77,72.84,58.09,48.62,44.2,,,,,,,,,,,,
3,Afghanistan,AFG,Agricultural methane emissions (thousand metri...,EN.ATM.METH.AG.KT.CE,,,,,,,,,,0.0,7863.01,7819.39,6672.29,6960.2,7378.03,7836.13,7954.05,7781.35,7719.37,7541.71,7450.5,7393.6,7327.15,6857.48,5970.79,5302.55,4394.66,4760.69,5054.15,4962.4,5360.0,5610.0,5670.0,5720.0,5930.0,6250.0,7040.0,7710.0,8220.0,8990.0,7710.0,6420.0,8190.0,8430.0,8340.0,8500.0,8650.0,8800.0,9800.0,9990.0,11510.0,11530.0,11380.0,11280.0,11480.0,10850.0,10630.0,10330.0,10450.0,,
4,Afghanistan,AFG,"Bird species, threatened",EN.BIR.THRD.NO,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,16.0,,
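
The target layout named in the cell comment (one row per country and year, one column per series) can be reached with a melt followed by a pivot. The following is a minimal sketch, assuming the year columns look like `1960_[yr1960]` and that each (country, series) pair appears only once. `to_country_year` and the `wide` sample frame are hypothetical, and the sample values are placeholders rather than real data.

```python
import numpy as np
import pandas as pd


def to_country_year(wide: pd.DataFrame) -> pd.DataFrame:
    """Reshape series-per-row data into one row per (country_name, year)."""
    # Assumption: year columns start with four digits, e.g. '1960_[yr1960]'.
    year_cols = [c for c in wide.columns if str(c)[:4].isdigit()]

    # Wide -> long: one row per (country, series, year).
    long = wide.melt(
        id_vars=["country_name", "series_name"],
        value_vars=year_cols,
        var_name="year",
        value_name="value",
    )
    # '1960_[yr1960]' -> 1960
    long["year"] = long["year"].str[:4].astype(int)

    # Long -> tidy: one column per series.
    # pivot raises ValueError if a (country, year, series) combination repeats.
    tidy = long.pivot(
        index=["country_name", "year"],
        columns="series_name",
        values="value",
    )
    tidy.columns.name = None
    return tidy.reset_index()


# Placeholder values in the same layout as the export (not real data).
wide = pd.DataFrame({
    "country_name": ["A", "A", "B", "B"],
    "series_name": ["s1", "s2", "s1", "s2"],
    "1960_[yr1960]": [1.0, 10.0, 2.0, np.nan],
    "1961_[yr1961]": [1.5, np.nan, 2.5, 20.0],
})
tidy = to_country_year(wide)
print(tidy)
```

`pivot` is used here instead of `pivot_table` so that country-year rows with no observations are kept as rows of missing values, which makes the gaps in coverage visible.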
