# Trade network analysis
**Brian Dew (brianwdew@gmail.com)**

`04_imfdata.ipynb`

Builds a dataframe of relevant economic indicators for later use in hypothesis testing.

Required file:

* region_codes.csv - Mapping of countries with missing price data to areas with price data.

---

TODO:

1. Change from ISO2 to ISO3 country codes.
2. Deflate price of exports using index.
       

#### Import packages

The requests package and pandas are used to collect data from the IMF API. 

In [1]:
import requests                                             # For requesting json data from the url
import pandas as pd                                         # pandas dataframes used for convenience
import os                                                   # change current directory in next line
os.chdir('C:/Working/trade_network/data/')

#### settings for API request

First, several variables are defined that are later used to build the URL that requests the data of interest. These variables include the data request [method](http://datahelp.imf.org/knowledgebase/articles/667681-json-restful-web-service), the series (International Financial Statistics), the frequency (annual), the indicators (total exports, real effective exchange rate, import price index, and export price index), and the date range (2008 to 2015, as set in `date`).

In [10]:
webserv = 'http://dataservices.imf.org/REST/SDMX_JSON.svc/' # the main URL for the JSON rest API
methodCD = 'CompactData/'                                     # CompactData contains only the data 
methodDS = 'DataStructure/' # This method gives info on the country names, units, and indicator names
series = 'IFS'                                              # International Financial Statistics series
freq = 'A'                                                  # Annual
# Set of IFS indicators of interest 
inds = {'x': 'TXG_FOB_USD', 'q': 'EREER_IX', 'mp': 'TMG_D_USD_CIF_IX', 'xp': 'TXG_D_USD_FOB_IX'}  
date = '?startPeriod=2008&endPeriod=2015'                   # Date range of interest
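
As a minimal sketch of how these settings combine, the helper below assembles one request URL using the same concatenation order the notebook uses later. The function name `build_compact_url` and its keyword arguments are hypothetical, introduced only for illustration; no request is sent.

```python
def build_compact_url(indicator, series='IFS', freq='A',
                      start=2008, end=2015,
                      base='http://dataservices.imf.org/REST/SDMX_JSON.svc/'):
    """Assemble a CompactData URL with the notebook's concatenation order."""
    date = '?startPeriod={}&endPeriod={}'.format(start, end)
    # The area slot between the two dots is left empty, as in the notebook,
    # which requests every available area for the indicator.
    return base + 'CompactData/' + series + '/' + freq + '..' + indicator + '.' + date


url = build_compact_url('TXG_FOB_USD')
print(url)
```

Printing the URL before requesting it makes it easy to paste into a browser and inspect the raw JSON.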

#### Dictionaries of codes

Create dictionaries with country, indicator, and unit multiplier codes from the IMF's code list. Region codes are identified manually in a csv file. The source of the mapping is [here](https://www.imf.org/external/pubs/ft/weo/2016/02/weodata/groups.htm).

In [11]:
urlDS = webserv+methodDS+series   # url to access IMF datastructure method API
# Request data from IMF JSON RESTful API URL above. Navigate to the code list:
dataDS = requests.get(urlDS).json()['Structure']['CodeLists']['CodeList']
df = pd.DataFrame(dataDS[2]['Code']).set_index('@value')               # area names here
area_names = {c : df['Description'].loc[c]['#text'] for c in df.index.values}
df = pd.DataFrame(dataDS[3]['Code']).set_index('@value')               # indicator codes
ifs_inds = {i : df['Description'].loc[i]['#text'] for i in df.index.values}
df = pd.DataFrame(dataDS[0]['Code']).set_index('@value')               # unit codes
unit_codes = {m : df['Annotations'].loc[m]['Annotation'][2]['AnnotationText']['#text'] 
              for m in df.index.values}
# A csv file maps countries with missing price data to their regions, and this is read below:
region_codes = pd.read_csv('region_codes.csv', header=None, index_col=0).to_dict()[1] # id'd manually
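
The code-to-name lookups above can also be built without an intermediate dataframe. The sketch below assumes each code list entry has the `'@value'` and `'Description'`/`'#text'` keys that the cell above indexes; the helper name `codelist_to_dict` and the sample data are made up for illustration and are not real API output.

```python
def codelist_to_dict(code_list):
    """Map each '@value' code to the '#text' of its Description."""
    return {code['@value']: code['Description']['#text']
            for code in code_list['Code']}


# Made-up sample shaped like one element of the 'CodeList' array
sample = {'Code': [
    {'@value': 'DE', 'Description': {'#text': 'Germany'}},
    {'@value': 'FR', 'Description': {'#text': 'France'}},
]}

names = codelist_to_dict(sample)
print(names['DE'])
```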

#### Print out the full indicator names

The full name of the indicator tells the unit of measurement.

In [4]:
for k in inds.keys():                         # keys are the short names x, q, mp, and xp in inds
    print(inds[k]+': '+ifs_inds[inds[k]])     # print the indicator id and name

EREER_IX: Real Effective Exchange Rate, based on Consumer Price Index, Index
TXG_FOB_USD: Goods, Value of Exports, Free on board (FOB), US Dollars
TXG_D_USD_FOB_IX: Goods, Deflator/Unit Value of Exports, Index, US Dollars, Index
TMG_D_USD_CIF_IX: Goods, Deflator/Unit Value of Imports, US Dollars, Index


#### Loop with API request for each indicator

Using the IMF API, data on exports, prices, and exchange rates are collected for all available countries over the date range set above.

First, raw data are requested for each indicator.

Next, each value is scaled by its unit multiplier (for example, if the value is 24 and the unit multiplier is 6, the value is multiplied by 1,000,000, which effectively adds six zeros).
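
The scaling in that example can be written out as a small sketch. It assumes the multiplier is given as a power-of-ten exponent, as in the example, and that values arrive as strings; the helper name `scale_by_unit_mult` is hypothetical.

```python
def scale_by_unit_mult(value, exponent):
    """Scale a reported value by 10 raised to its unit multiplier."""
    return float(value) * 10 ** int(exponent)


scaled = scale_by_unit_mult('24', '6')
print(scaled)  # 24000000.0
```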

In [12]:
fd = {} # dictionary for saving each series from inds above
unit_mult = {} # dictionary for saving unit multipliers by country and indicator
for k, v in inds.items(): # k is the key and v is the value
    url = webserv+methodCD+series+'/'+freq+'..'+v+'.'+date # print url to see
    # Build a dataframe for each indicator with the raw data from the IMF API:
    df = pd.DataFrame(requests.get(url).json()
                      ['CompactData']['DataSet']['Series']).set_index('@REF_AREA')
    df['@UNIT_MULT'] = df['@UNIT_MULT'].map(unit_codes) # match unit codes with unit multipliers
    df = df[df['Obs'].apply(lambda x: isinstance(x, list))] # drops empties
    d = {} # temporary dict to save country by country dataframes
    for c in df.index.values: # index values are countries (@REF_AREA) as set above
        d[c] = pd.DataFrame(df.loc[c]['Obs']).rename(columns={'@TIME_PERIOD':'date'})
        # Multiply the values by the unit multiplier.
        d[c]['@OBS_VALUE'] = pd.to_numeric(d[c]['@OBS_VALUE']) * int(df['@UNIT_MULT'][c]) 
    # Concatenate all country rows into one dataframe for each indicator:
    fd[k] = pd.concat(d, axis=0).reset_index().set_index(['level_0','date']).drop('level_1', axis=1)

#### Find missing price data

In [14]:
#merged = pd.concat(fd, axis=1).reset_index() # combine all series to one merged dataframe
#merged['full_name'] = merged['level_0'].map(area_names)    # add column with full name of area
#merged = merged.set_index(['level_0','full_name','date'])  # set index to country and date
#merged = merged[merged['xp']['@BASE_YEAR'].isnull()]
#missing = merged[merged['q']['@OBS_VALUE'] > 0].reset_index()['level_0'].unique()
#for prod in missing:
#    print(prod)

AG
AM
AN
AT
BG
BH
BI
BS
BZ
CD
CF
CH
CI
CL
CM
CN
CO
CR
CY
CZ
DM
DO
DZ
FJ
FR
GA
GD
GE
GH
GM
GQ
GY
HR
IR
IS
KN
LC
LS
LU
LV
MA
MD
MK
MT
MW
MX
MY
NG
NI
PH
PL
PT
PY
RO
RU
SB
SK
SL
TG
TN
TT
UA
UG
UY
VC
VE
WS
ZA
ZM


#### Replace missing price data with regional data
Price data are missing for many countries; regional values are an acceptable substitute. The `region_codes` dictionary holds my manual mapping of each such country to the region (with full data) that best matches it. To replace missing values, I loop through the dictionary and set the country value equal to the region value.

The IMF area groupings are [here](https://www.imf.org/external/pubs/ft/weo/2016/02/weodata/groups.htm).

In [7]:
date_range = fd['xp'].loc['DE'].reset_index().date.values  # Germany selected (automate!)
p_subinds = ['@BASE_YEAR', '@OBS_VALUE']  # To save space below, these are the sub-indicators for prices
for k, v in region_codes.items():   # region_codes is a dictionary of {country code: region code}
    # The loop variable is named `year` so it does not overwrite the `date` string used to build the URL
    for year in date_range:  # repeat the replacement for each year in the data.
        fd['mp'].loc[(k, year), p_subinds] = fd['mp'].loc[(v, year), p_subinds]
        fd['xp'].loc[(k, year), p_subinds] = fd['xp'].loc[(v, year), p_subinds]

KeyError: ('F97', u'2007')
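
The `KeyError` above appears to arise when a region has no row for one of the years taken from Germany's series. One way to avoid depending on a reference country is to copy only the years each region actually has. The sketch below illustrates the idea with plain dictionaries rather than the notebook's multi-indexed dataframes; the helper name `fill_from_region`, the region code `R1`, and all values are hypothetical.

```python
def fill_from_region(prices, region_map):
    """Copy every available regional observation onto the mapped country.

    prices     -- dict keyed by (area_code, year) holding the price value
    region_map -- dict of {country_code: region_code}
    """
    filled = dict(prices)  # leave the input untouched
    for country, region in region_map.items():
        for (area, year), value in prices.items():
            if area == region:
                # Only years present for the region are copied,
                # so a missing year is skipped instead of raising.
                filled[(country, year)] = value
    return filled


# Hypothetical values and mapping, for illustration only
prices = {('R1', '2008'): 101.5, ('R1', '2009'): 98.2, ('DE', '2008'): 100.0}
filled = fill_from_region(prices, {'AT': 'R1'})
print(sorted(filled))
```

Years missing from the region simply stay missing for the country, which the final `dropna()` step would then remove.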

#### Merge and save
The `merged` dataframe combines all indicators, as well as the full name of the country or area, into one dataframe.

In [None]:
# The last step is to merge the various indicator values (prices, exports, etc) into one dataframe        
merged = pd.concat(fd, axis=1).reset_index() # combine all series to one merged dataframe
merged['full_name'] = merged['level_0'].map(area_names)    # add column with full name of area
merged = merged.set_index(['level_0','full_name','date'])  # set index to country and date
merged.dropna().to_csv('imf_data.csv')                      # drop missing and save as csv file
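
To show what the merge step produces, here is a small sketch with made-up data: passing a dictionary of dataframes to `pd.concat` with `axis=1` puts the dictionary keys on the top level of the columns, and `dropna()` keeps only rows where every indicator is present. The area code `AA` and all values are hypothetical.

```python
import pandas as pd

# Two tiny indicator frames sharing a (country, date) index
idx = pd.MultiIndex.from_tuples(
    [('AA', '2008'), ('AA', '2009')], names=['level_0', 'date'])
fd_demo = {
    'x': pd.DataFrame({'@OBS_VALUE': [10.0, 12.0]}, index=idx),
    'q': pd.DataFrame({'@OBS_VALUE': [95.0, None]}, index=idx),  # 2009 is missing
}

merged_demo = pd.concat(fd_demo, axis=1)  # columns become (indicator, field) pairs
complete = merged_demo.dropna()           # drops the row with the missing 'q' value

print(merged_demo.columns.tolist())
print(complete)
```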