This notebook loads World Bank indicator data, downloaded in CSV format from the [World Bank Indicator portal](http://datatopics.worldbank.org/world-development-indicators/).

The data is reformatted into a pandas DataFrame in which the indicators are the columns and the country and year form a multi-level row index.

In [2]:
import pandas as pd
import json
from pandas_datareader import wb

In [13]:
data_dir = '.\\..\\..\\data\\'
#name of output pickle file
world_bank_file_out = "world_bank_bulk_data.pkl"
#name of input bulk CSV file from the World Bank website
world_bank_file_input = "WDIData.csv"
wb_data = pd.read_csv(data_dir + world_bank_file_input)

In [14]:
wb_data.shape

(422136, 64)

In [28]:
pd.set_option('display.max_columns', 5)

#### Format as received from the World Bank:

In [16]:
wb_data.head(3)

,Country Name,Country Code,Indicator Name,Indicator Code,1960,...,2015,2016,2017,2018,Unnamed: 63
0,Arab World,ARB,"2005 PPP conversion factor, GDP (LCU per inter...",PA.NUS.PPP.05,,...,,,,,
1,Arab World,ARB,"2005 PPP conversion factor, private consumptio...",PA.NUS.PRVT.PP.05,,...,,,,,
2,Arab World,ARB,Access to clean fuels and technologies for coo...,EG.CFT.ACCS.ZS,,...,84.23063,84.570425,,,


In [17]:
#filter out unnecessary columns (we will focus on years 1972 to 2018)
drop = ['Country Code','Indicator Name','1960','1961','1962','1963','1964',
        '1965','1966','1967','1968','1969', '1970', '1971', 'Unnamed: 63']
wb_data = wb_data.drop(drop, axis='columns')
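The list of early years can also be generated instead of typed out. The following is a minimal sketch, assuming the year columns are labelled with four-digit strings as in the header shown above; the toy frame, its values, and the `first_year` variable are illustrative and not part of the original notebook.

```python
import pandas as pd

# Toy frame mimicking the WDI column layout (values are made up).
toy = pd.DataFrame({
    'Country Name': ['Arab World'],
    'Country Code': ['ARB'],
    'Indicator Name': ['Example indicator'],
    'Indicator Code': ['EX.IND'],
    '1970': [1.0], '1971': [2.0], '1972': [3.0], '1973': [4.0],
    'Unnamed: 63': [float('nan')],
})

first_year = 1972  # hypothetical cut-off: keep this year and later

# Year columns are the labels made up entirely of digits.
early_years = [c for c in toy.columns if c.isdigit() and int(c) < first_year]
drop = ['Country Code', 'Indicator Name', 'Unnamed: 63'] + early_years

# errors='ignore' skips any label that is not present in the frame.
trimmed = toy.drop(columns=drop, errors='ignore')
```

Changing the cut-off then only requires editing `first_year`.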

In [67]:
#Switch around the data so that indicators are on columns and the country and year form a multi level row index
wb_data = wb_data.set_index(['Country Name', 'Indicator Code'])
wb_data = wb_data.stack()
wb_data = wb_data.unstack(['Indicator Code'])
wb_data = wb_data.sort_index()
#Name the two index levels
wb_data.index = wb_data.index.set_names(['Country', 'Year'])
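The reshape is easier to follow on a small frame. The sketch below applies the same `set_index`, `stack`, `unstack` sequence to made-up data; the frame `wide`, the country names `A` and `B`, and the indicator codes `X` and `Y` are placeholders, and the index levels are named with `MultiIndex.set_names`.

```python
import pandas as pd

# Toy wide-format frame: one row per (country, indicator), one column per year.
wide = pd.DataFrame({
    'Country Name': ['A', 'A', 'B', 'B'],
    'Indicator Code': ['X', 'Y', 'X', 'Y'],
    '1972': [1.0, 2.0, 3.0, 4.0],
    '1973': [5.0, 6.0, 7.0, 8.0],
})

tidy = wide.set_index(['Country Name', 'Indicator Code'])
tidy = tidy.stack()                    # years move from columns into the row index
tidy = tidy.unstack('Indicator Code')  # indicator codes move from rows to columns
tidy = tidy.sort_index()
tidy.index = tidy.index.set_names(['Country', 'Year'])

print(tidy)
```

Each row of `tidy` is now one (country, year) pair, and each column is one indicator.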

#### Output format:

In [30]:
wb_data.head(10)

Country,Year,EG.CFT.ACCS.ZS,EG.ELC.ACCS.ZS,...,NY.GSR.NFCY.KN,SH.STA.FGMS.ZS
Afghanistan,1972,,,...,,
Afghanistan,1973,,,...,,
Afghanistan,1974,,,...,,
Afghanistan,1975,,,...,,
Afghanistan,1976,,,...,,
Afghanistan,1977,,,...,,
Afghanistan,1978,,,...,,
Afghanistan,1979,,,...,,
Afghanistan,1980,,,...,,
Afghanistan,1981,,,...,,
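
Because the years started out as column labels in the CSV, the `Year` level of the index holds strings such as `'1972'` rather than integers. The following is a minimal sketch of selecting rows from a frame in this layout and, optionally, converting the level to integers. The frame `toy` and its values are made up for illustration.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [['Afghanistan', 'Albania'], ['1972', '1973']],
    names=['Country', 'Year'],
)
# Made-up values under a real indicator code, for illustration only.
toy = pd.DataFrame({'EG.ELC.ACCS.ZS': [1.0, 2.0, 3.0, 4.0]}, index=index)

# All years for one country.
afghanistan = toy.loc['Afghanistan']

# One (country, year) pair; the year must be given as a string here.
value = toy.loc[('Afghanistan', '1973'), 'EG.ELC.ACCS.ZS']

# Optionally convert the Year level to integers.
toy_int = toy.copy()
toy_int.index = toy_int.index.set_levels(
    toy_int.index.levels[1].astype(int), level='Year'
)
value_int = toy_int.loc[('Albania', 1972), 'EG.ELC.ACCS.ZS']
```

Whether to convert depends on how the years are used downstream; integer years make range comparisons simpler.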


#### Save to a pickle file:

In [69]:
#Write data to a pickle file
wb_data.to_pickle(data_dir + world_bank_file_out)
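
The pickle can later be loaded with `pd.read_pickle`, which restores the frame including its multi-level index. The following is a minimal round-trip sketch on a made-up frame written to a temporary directory; `toy` and the file name `toy.pkl` are placeholders.

```python
import os
import tempfile

import pandas as pd

toy = pd.DataFrame(
    {'X': [1.0, 2.0]},
    index=pd.MultiIndex.from_tuples(
        [('A', '1972'), ('A', '1973')], names=['Country', 'Year']
    ),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'toy.pkl')
    toy.to_pickle(path)               # write the frame to disk
    restored = pd.read_pickle(path)   # read it back while the file still exists
```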