## Model one policy variables

This notebook extracts the policy variables named in `indicator_list` from the IMF and World Bank (wb) data sources and writes them to a CSV file.

In [11]:
import warnings
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline

In [12]:
warnings.filterwarnings('ignore')
pd.options.display.float_format = '{:20,.2f}'.format

|  variable                 | origin            | source      |granularity|countries|   description                                               | composition                                                       |
| --------------------------|-------------------|-------------|-----------|---------|-------------------------------------------------------------|-------------------------------------------------------------------|
| total debt service        | -                 | wb econ     | yearly    | 217     | Total debt service (% of GNI)                               | -                                                     |
| interest payments         | -                 | wb econ     | yearly    | 217     | Interest payments on external debt (% of GNI)             | -                                                     |
| lending interest rate     | -                 | wb econ     | yearly    | 217     | Lending interest rate (%)                                   | -                                                     |
| firms using banks         | -                 | wb econ     | yearly    | 217     | Firms using banks to finance investment (% of firms)     | -                                                     |
| bank capital ratio        | -                 | wb econ     | yearly    | 217     | Bank capital to assets ratio (%)                             | -                                                     |
| tax revenue gdp share     | -                 | wb econ     | yearly    | 217     | Tax revenue (% of GDP)                                       | -                                                     |
| short term debt           | -                 | wb econ     | yearly    | 217     | Short-term debt (% of total external debt)              | -                                                     |
| inflation                 | -                 | wb econ     | yearly    | 217     | Inflation, GDP deflator (annual %)                          | -                                                     |
| GDP growth                | -                 | wb econ     | yearly    | 217     | GDP growth (annual %)                                       | -                                                     |
| real interest rate        | -                 | wb econ     | yearly    | 217     | Real interest rate (%)                                       | -                                                     |
| firm market cap           | -                 | wb econ     | yearly    | 217     | Market capitalization of listed domestic companies (% of GDP) | -                                                   |
| GDP per capita growth     | -                 | wb econ     | yearly    | 217     | GDP per capita growth (annual %)                             | -                                                     |
| GDP                       | -                 | wb econ     | yearly    | 217     | GDP (constant 2010 USD)                                     | -                                                     |
| GNI growth                | -                 | wb econ     | yearly    | 217     | GNI growth (annual %)                                       | -                                                     |
| interest payments (expense) | -               | wb econ     | yearly    | 217     | Interest payments (% of expense)                             | -                                                     |
| nonperforming bank loans  | -                 | wb econ     | yearly    | 217     | Bank nonperforming loans to total gross loans (%)       | -                                                     |
| savings                   | -                 | wb econ     | yearly    | 217     | Gross domestic savings (% of GDP)                        | -                                                     |
| gross savings             | -                 | wb econ     | yearly    | 217     | Gross savings (% of GNI)                                     | -                                                     |
| GNI per capita growth     | -                 | wb econ     | yearly    | 217     | GNI per capita growth (annual %)                             | -                                                     |
| employee compensation     | -                 | wb econ     | yearly    | 217     | Compensation of employees (% of expense)                    | -                                                     |
| reserves                  | -                 | wb econ     | yearly    | 217     | Total reserves (% of total external debt)              | -                                                     |
| broad money               | -                 | wb econ     | yearly    | 217     | Broad money (% of GDP)                                       | -                                                     |
| GNI                       | -                 | wb econ     | yearly    | 217     | GNI (constant 2010 USD)                                     | -                                                     |
| government debt           | -                 | wb econ     | yearly    | 217     | Central government debt, total (% of GDP)                  | -                                                     |

In [13]:
indicator_list = ['Total debt service (% of GNI)', 'Interest payments on external debt (% of GNI)',
                  'Lending interest rate (%)', 'Firms using banks to finance investment (% of firms)',
                  'Bank capital to assets ratio (%)', 'Tax revenue (% of GDP)', 'Short-term debt (% of total external debt)',
                  'Inflation, GDP deflator (annual %)', 'GDP growth (annual %)', 'Real interest rate (%)',
                  'Market capitalization of listed domestic companies (% of GDP)', 'GDP per capita growth (annual %)',
                  'GDP (constant 2010 US$)', 'GNI growth (annual %)', 'Interest payments (% of expense)',
                  'Bank nonperforming loans to total gross loans (%)', 'Gross domestic savings (% of GDP)',
                  'Gross savings (% of GNI)', 'GNI per capita growth (annual %)', 'Compensation of employees (% of expense)',
                  'Total reserves (% of total external debt)', 'Broad money (% of GDP)', 'GNI (constant 2010 US$)',
                  'Central government debt, total (% of GDP)']

In [14]:
len(indicator_list)

24
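The final check in this notebook compares the number of unique retrieved indicators with `len(indicator_list)`, so a name entered twice in the list would make that check fail even when every series was found. A minimal sketch of a duplicate check follows; `find_duplicates` and the toy list are hypothetical and not part of the original notebook.

```python
from collections import Counter

def find_duplicates(names):
    """Return the entries that occur more than once, in first-seen order."""
    counts = Counter(names)
    return [name for name, n in counts.items() if n > 1]

# Toy list standing in for indicator_list (illustrative only).
sample = ['GDP growth (annual %)', 'Tax revenue (% of GDP)', 'GDP growth (annual %)']
duplicates = find_duplicates(sample)
print(duplicates)
```

Calling `find_duplicates(indicator_list)` should return an empty list if every name is entered once.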

### Load IMF monthly data

In [15]:
%%bash
wc -l data/imf/*.csv

  365536 data/imf/BOP_11-25-2018 19-15-19-60_timeSeries.csv
      64 data/imf/COMMP_11-25-2018 19-13-52-15_timeSeries.csv
   14430 data/imf/CPI_11-25-2018 19-14-47-26_timeSeries.csv
    1693 data/imf/FDI_11-20-2018 21-39-31-89_timeSeries.csv
 1247714 data/imf/GFSR_11-25-2018 19-23-39-70_timeSeries.csv
   16732 data/imf/IRFCL_11-25-2018 19-13-18-05_timeSeries.csv
    7846 data/imf/ITS_11-14-2018 15-14-06-02_timeSeries.csv
    7425 data/imf/PPLT_11-25-2018 19-25-01-32_timeSeries.csv
 1661440 total


In [16]:
time_values = ['%sM%s' % (y, m) for y in range(1960, 2018) for m in range(1, 13)]
imf_columns = ['Country Name', 'Indicator Name'] + time_values

In [17]:
imf_country_aggregates = ['Euro Area']

In [18]:
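def load_imf_monthly(file_name, indicators, imf_columns, country_aggregates):
    """Load an IMF time-series CSV and return it in long format: country, indicator, date, value."""
    # Missing observations are filled with 0, so a 0 in the result may mean "no data".
    csv_df = pd.read_csv('data/imf/%s' % file_name).fillna(0)
    # Keep only the rows whose 'Attribute' is 'Value', then drop that column.
    base_df = csv_df.loc[csv_df['Attribute'] == 'Value'].drop(columns=['Attribute'])
    monthly_df = base_df.loc[base_df['Indicator Name'].isin(indicators)]
    imf_df = monthly_df[imf_columns].fillna(0)
    # Reshape from one column per month to one row per (country, indicator, month).
    df = pd.melt(imf_df, id_vars=['Country Name', 'Indicator Name'], var_name='date', value_name='value')
    df['date'] = pd.to_datetime(df['date'], format='%YM%m')
    df.columns = ['country', 'indicator', 'date', 'value']
    # Remove aggregate regions so that only individual countries remain.
    return df.loc[~df['country'].isin(country_aggregates)]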
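The reshaping step in the loader can be tried on a small frame. The sketch below uses made-up values and hypothetical country and indicator labels. It only illustrates how `pd.melt` turns one column per month into one row per observation and how labels such as `1960M1` are parsed into dates.

```python
import pandas as pd

# Toy wide table with the column layout the loader expects (values are made up).
wide = pd.DataFrame({
    'Country Name': ['A', 'B'],
    'Indicator Name': ['x', 'x'],
    '1960M1': [1.0, 2.0],
    '1960M12': [3.0, 4.0],
})

# One row per (country, indicator, month) instead of one column per month.
long_df = pd.melt(wide, id_vars=['Country Name', 'Indicator Name'],
                  var_name='date', value_name='value')
# '1960M1' -> 1960-01-01, '1960M12' -> 1960-12-01
long_df['date'] = pd.to_datetime(long_df['date'], format='%YM%m')
long_df.columns = ['country', 'indicator', 'date', 'value']
print(long_df)
```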

In [19]:
imf_pplt_df = load_imf_monthly('PPLT_11-25-2018 19-25-01-32_timeSeries.csv', indicator_list, imf_columns, imf_country_aggregates)

In [20]:
imf_cpi_df = load_imf_monthly('CPI_11-25-2018 19-14-47-26_timeSeries.csv', indicator_list, imf_columns, imf_country_aggregates)

In [21]:
imf_df = pd.concat([imf_cpi_df, imf_pplt_df], join='outer')

In [22]:
imf_df.size

0

In [23]:
imf_df.head(15)

Unnamed: 0,country,indicator,date,value


In [24]:
len(imf_df['country'].unique())

0
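An empty `imf_df` is what the loader returns when none of the names in `indicator_list` occur in the `Indicator Name` column of the IMF files. That is plausible here, since the list holds World Bank series names, but it has not been verified against the files. The sketch below shows one way to check which requested names a raw frame contains. `match_report` and the toy data are hypothetical.

```python
import pandas as pd

def match_report(df, requested, column):
    """Split the requested names into those present in df[column] and those absent."""
    available = set(df[column].dropna().unique())
    found = [name for name in requested if name in available]
    absent = [name for name in requested if name not in available]
    return found, absent

# Toy frame; the indicator names are invented for illustration.
raw = pd.DataFrame({'Indicator Name': ['Series A', 'Series B', None]})
found, absent = match_report(raw, ['Series A', 'GDP growth (annual %)'], 'Indicator Name')
print(found, absent)
```

Running the same helper on the frame returned by `pd.read_csv` for each IMF file, with `indicator_list` as `requested`, would show whether any series can match at all.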

In [25]:
imf_countries = sorted(list(imf_df['country'].unique()))

### Load World Bank yearly data

In [26]:
%%bash
wc -l data/world_bank/*.csv

   33534 data/world_bank/ECON.csv
    9589 data/world_bank/HNP.csv
      38 data/world_bank/HNP_indicator_definitions.csv
   36174 data/world_bank/POP.csv
   79335 total


In [27]:
wb_country_aggregates = ['nan', 'Lower middle income', 'Post-demographic dividend', 'High income',
                         'Pre-demographic dividend', 'East Asia & Pacific (IDA & IBRD countries)',
                         'Europe & Central Asia (excluding high income)', 'Heavily indebted poor countries (HIPC)',
                         'Caribbean small states', 'Pacific island small states', 'Middle income',
                         'Late-demographic dividend', 'OECD members', 'IDA & IBRD total', 'Not classified', 
                         'East Asia & Pacific (excluding high income)',
                         'Latin America & the Caribbean (IDA & IBRD countries)', 'Low income', 'Low & middle income',
                         'IDA blend', 'IBRD only', 'Sub-Saharan Africa (excluding high income)', 
                         'Fragile and conflict affected situations', 'Europe & Central Asia (IDA & IBRD countries)',
                         'Euro area', 'Other small states', 'Europe & Central Asia', 'Arab World',
                         'Latin America & Caribbean (excluding high income)', 
                         'Sub-Saharan Africa (IDA & IBRD countries)', 'Early-demographic dividend', 'IDA only',
                         'Small states', 'Middle East & North Africa (excluding high income)', 'East Asia & Pacific',
                         'South Asia', 'European Union', 'Least developed countries: UN classification',
                         'Middle East & North Africa (IDA & IBRD countries)', 'Upper middle income',
                         'South Asia (IDA & IBRD)', 'Central Europe and the Baltics', 'Sub-Saharan Africa', 
                         'Latin America & Caribbean', 'Middle East & North Africa', 'IDA total', 'North America',
                         'Last Updated: 11/14/2018', 'Data from database: World Development Indicators', 'World']

In [28]:
wb_cols = ['Country Name', 'Series Name'] + ['%s [YR%s]' % (y, y) for y in range(1960, 2018)]

In [29]:
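def load_wb_yearly(file_name, indicators, wb_columns, country_aggregates):
    """Load a World Bank CSV and return it in long format: country, indicator, date, value."""
    # Missing observations are filled with 0, so a 0 in the result may mean "no data".
    csv_df = pd.read_csv('data/world_bank/%s' % file_name).fillna(0)
    base_df = csv_df.loc[csv_df['Series Name'].isin(indicators)]
    wb_df = base_df[wb_columns].fillna(0)
    # Reshape from one column per year to one row per (country, series, year).
    df = pd.melt(wb_df, id_vars=['Country Name', 'Series Name'], var_name='date', value_name='value')
    # Column labels look like '1960 [YR1960]'; keep the leading year.
    df['date'] = pd.to_datetime(df['date'].map(lambda x: int(x.split(' ')[0])), format='%Y')
    df.columns = ['country', 'indicator', 'date', 'value']
    # Remove aggregate regions so that only individual countries remain.
    return df.loc[~df['country'].isin(country_aggregates)]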
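Because the loader fills missing observations with 0, a value of 0.0 in the result cannot be told apart from a year with no data. The sketch below, on made-up numbers, shows how that choice changes a simple average. Keeping `NaN` and dropping missing rows is shown only as an alternative to compare against, not as what this notebook does.

```python
import numpy as np
import pandas as pd

# Toy wide table; country A has no observation for 1960 (values are made up).
wide = pd.DataFrame({
    'Country Name': ['A', 'B'],
    'Series Name': ['x', 'x'],
    '1960 [YR1960]': [np.nan, 2.0],
    '1961 [YR1961]': [4.0, 6.0],
})

long_df = pd.melt(wide, id_vars=['Country Name', 'Series Name'],
                  var_name='date', value_name='value')

# Treating the gap as 0 pulls the average down.
mean_with_zeros = long_df['value'].fillna(0).mean()
# Series.mean() skips NaN by default, so only observed values are averaged.
mean_without_missing = long_df['value'].mean()
# Alternative: keep only the rows that carry an observation.
observed = long_df.dropna(subset=['value'])
print(mean_with_zeros, mean_without_missing, len(observed))
```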

In [30]:
wb_econ_df = load_wb_yearly('ECON.csv', indicator_list, wb_cols, wb_country_aggregates)

In [31]:
wb_hnp_df = load_wb_yearly('HNP.csv', indicator_list, wb_cols, wb_country_aggregates)

In [32]:
wb_pop_df = load_wb_yearly('POP.csv', indicator_list, wb_cols, wb_country_aggregates)

In [33]:
wb_df = pd.concat([wb_econ_df, wb_hnp_df, wb_pop_df], join='outer')

In [34]:
wb_df.size

1611008

In [35]:
wb_df.head(15)

Unnamed: 0,country,indicator,date,value
0,Afghanistan,Bank capital to assets ratio (%),1960-01-01,0.0
1,Afghanistan,Bank nonperforming loans to total gross loans (%),1960-01-01,0.0
2,Afghanistan,Broad money (% of GDP),1960-01-01,13.45
3,Afghanistan,"Central government debt, total (% of GDP)",1960-01-01,0.0
4,Afghanistan,Compensation of employees (% of expense),1960-01-01,0.0
5,Afghanistan,Firms using banks to finance investment (% of ...,1960-01-01,0.0
6,Afghanistan,GDP (constant 2010 US$),1960-01-01,0.0
7,Afghanistan,GDP growth (annual %),1960-01-01,0.0
8,Afghanistan,GDP per capita growth (annual %),1960-01-01,0.0
9,Afghanistan,GNI growth (annual %),1960-01-01,0.0


In [36]:
len(wb_df['country'].unique())

217

In [37]:
wb_countries = sorted(list(wb_df['country'].unique()))

### Combine the two datasets

In [38]:
imf_specific = [country for country in imf_countries if country not in wb_countries]

In [39]:
len(imf_specific)

0

In [40]:
imf_to_wb_country_map = {
    'Afghanistan, Islamic Republic of': 'Afghanistan',
    'Armenia, Republic of': 'Armenia',
    'Azerbaijan, Republic of': 'Azerbaijan',
    'Bahrain, Kingdom of': 'Bahrain',
    'China, P.R.: Hong Kong': 'Hong Kong SAR, China',
    'China, P.R.: Macao': 'Macao SAR, China',
    'China, P.R.: Mainland': 'China',
    'Congo, Democratic Republic of': 'Congo, Dem. Rep.',
    'Congo, Republic of': 'Congo, Rep.',
    'Egypt': 'Egypt, Arab Rep.',
    'French Territories: New Caledonia': 'New Caledonia',
    'Iran, Islamic Republic of': 'Iran',
    'Korea, Republic of': 'Korea, Rep.',
    'Kosovo, Republic of': 'Kosovo',
    "Lao People's Democratic Republic": 'Lao PDR',
    'Serbia, Republic of': 'Serbia',
    'Sint Maarten': 'Sint Maarten (Dutch part)',
    'Timor-Leste, Dem. Rep. of': 'Timor-Leste',
    'Venezuela, Republica Bolivariana de': 'Venezuela, RB',
    'Venezuela, República Bolivariana de': 'Venezuela, RB',
    'Yemen, Republic of': 'Yemen'
}

In [41]:
imf_df = imf_df.replace({'country': imf_to_wb_country_map})
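Passing a nested dictionary to `DataFrame.replace` restricts the renaming to the named column and matches whole cell values only. The sketch below demonstrates this on a toy frame. The two mappings are taken from `imf_to_wb_country_map`, and the remaining data is invented for illustration.

```python
import pandas as pd

name_map = {'Korea, Republic of': 'Korea, Rep.', 'Egypt': 'Egypt, Arab Rep.'}

# 'Egypt' also appears in the indicator column to show it is left untouched there.
toy = pd.DataFrame({
    'country': ['Korea, Republic of', 'Egypt', 'France'],
    'indicator': ['Egypt', 'x', 'x'],
})

# Only the 'country' column is rewritten; unmapped names such as 'France' pass through.
renamed = toy.replace({'country': name_map})
print(renamed)
```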

In [42]:
policy_df = pd.concat([wb_df, imf_df], join='outer')

In [43]:
policy_df.size

1611008

In [44]:
policy_df.head(15)

Unnamed: 0,country,indicator,date,value
0,Afghanistan,Bank capital to assets ratio (%),1960-01-01,0.0
1,Afghanistan,Bank nonperforming loans to total gross loans (%),1960-01-01,0.0
2,Afghanistan,Broad money (% of GDP),1960-01-01,13.45
3,Afghanistan,"Central government debt, total (% of GDP)",1960-01-01,0.0
4,Afghanistan,Compensation of employees (% of expense),1960-01-01,0.0
5,Afghanistan,Firms using banks to finance investment (% of ...,1960-01-01,0.0
6,Afghanistan,GDP (constant 2010 US$),1960-01-01,0.0
7,Afghanistan,GDP growth (annual %),1960-01-01,0.0
8,Afghanistan,GDP per capita growth (annual %),1960-01-01,0.0
9,Afghanistan,GNI growth (annual %),1960-01-01,0.0


In [45]:
indicators = sorted(list(policy_df['indicator'].unique()))

In [46]:
missing = [i for i in indicator_list if i not in indicators]
assert len(indicators) == len(indicator_list), (
    'The number of retrieved variables (%s) does not match the number of specified variables (%s).\n'
    'The following variables are missing:\n\n %s' % (len(indicators), len(indicator_list), missing)
)

In [47]:
policy_df.to_csv('model_one/policy.csv', sep=';', index=False)
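The file is written with a semicolon separator, so the same separator has to be passed when it is read back. The sketch below round-trips a toy frame through an in-memory buffer instead of `model_one/policy.csv`. The frame contents are made up.

```python
import io
import pandas as pd

# Toy frame with the same four columns as policy_df (values are made up).
policy = pd.DataFrame({
    'country': ['A', 'A'],
    'indicator': ['Central government debt, total (% of GDP)', 'GDP growth (annual %)'],
    'date': pd.to_datetime(['1960-01-01', '1961-01-01']),
    'value': [1.5, 2.5],
})

# Write to an in-memory buffer with the notebook's settings, then read it back.
buffer = io.StringIO()
policy.to_csv(buffer, sep=';', index=False)
buffer.seek(0)
restored = pd.read_csv(buffer, sep=';', parse_dates=['date'])
print(restored.dtypes)
```

Without `parse_dates`, the `date` column would come back as plain strings.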