# Machine Reading IMF Data: Data Retrieval with Python
## Brian Dew, brianwdew@gmail.com

## Introduction
The International Monetary Fund (IMF) Statistics Department (STA) provides API access to its economic time series. Well-known datasets such as International Financial Statistics (IFS) can be machine read through the API. This example uses Python to retrieve data from STA's JSON RESTful Web Service so that we can examine how Bulgaria's foreign direct investment assets have evolved over time.


The [IMF knowledge base](http://datahelp.imf.org/knowledgebase/articles/630877-data-services) provides more information on the three available API formats and IMF data services. For more information on the work of STA, see its annual report, [STA at a glance 2015 (PDF)](https://www.imf.org/external/np/sta/pdf/aglance.pdf).

## Gathering series and data dimension information
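First, we will need to import the json, urllib2, and requests libraries, as well as pandas and numpy. The first three allow us to parse JSON data, open URLs, and request information from the web; the latter two help us work with the resulting data. On Python 3, `urllib2` no longer exists and its `urlopen` function lives in `urllib.request`, so the import below falls back to that module.

```python
# Import libraries
import json
try:
    import urllib2  # Python 2
except ImportError:
    import urllib.request as urllib2  # Python 3: provides the same urlopen() function
import requests
import pandas as pd
import numpy as np
```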

Since we are using the JSON RESTful API, we start with the `Dataflow` endpoint to see which series are available and to find the series id of interest. The full output is long, so the loop below prints only the first four entries; the IMF has many more series than are shown here.

```python
# Find the series id and text name.
url = "http://dataservices.imf.org/REST/SDMX_JSON.svc/Dataflow/"
seriesids = json.load(urllib2.urlopen(url))
df = pd.DataFrame(seriesids['Structure']['KeyFamilies']['KeyFamily'])
# Print the id and English name of the first four series
for x in range(0, 4):
    items = (str(df['@id'][x]), str(df['Name'][x]['#text']))
    print(': '.join(items))
```

```
FSIRE: Financial Soundness Indicators (FSI), Reporting Entities
FAS: Financial Access Survey (FAS)
IFS: International Financial Statistics (IFS)
MCDREO: Middle East and Central Asia Regional Economic Outlook (MCDREO)
```
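Rather than printing a fixed range of rows, it can help to search the Dataflow response by name. The sketch below assumes only the nesting the cell above already relies on (`Structure` → `KeyFamilies` → `KeyFamily`, with `@id` and `Name['#text']` in each entry). `list_dataflows` is a hypothetical helper, and `sample` is hand-built to mirror that shape rather than copied from a live response.

```python
def list_dataflows(dataflow_json, keyword=None):
    """Return (id, name) pairs from a parsed Dataflow response.

    Assumes the nesting used in the notebook:
    Structure -> KeyFamilies -> KeyFamily -> {'@id', 'Name': {'#text'}}.
    """
    families = dataflow_json['Structure']['KeyFamilies']['KeyFamily']
    pairs = [(fam['@id'], fam['Name']['#text']) for fam in families]
    if keyword is not None:
        # Case-insensitive substring match on the series name
        pairs = [(i, n) for i, n in pairs if keyword.lower() in n.lower()]
    return pairs


# Hand-built sample mirroring the response shape (not real API output)
sample = {
    'Structure': {'KeyFamilies': {'KeyFamily': [
        {'@id': 'FAS', 'Name': {'#text': 'Financial Access Survey (FAS)'}},
        {'@id': 'IFS', 'Name': {'#text': 'International Financial Statistics (IFS)'}},
    ]}}
}

for series_id, name in list_dataflows(sample, keyword='financial statistics'):
    print(series_id + ': ' + name)
```

With a live response, pass `seriesids` from the cell above in place of `sample`.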


We found above that the id for International Financial Statistics is, unsurprisingly, IFS. We can use this id to read notes about the series. We will next need to identify the *dimensions* of the data. 

```python
# Annotations for the series
url = "http://dataservices.imf.org/REST/SDMX_JSON.svc/DataStructure/IFS"
ifsstruct = json.load(urllib2.urlopen(url))
df = pd.DataFrame(ifsstruct['Structure']['KeyFamilies']['KeyFamily']['Annotations'])
# Print the first three annotation titles with their text
for x in range(0, 3):
    items = (str(df['Annotation'][x]['AnnotationTitle']),
             str(df['Annotation'][x]['AnnotationText']['#text']))
    print(': '.join(items))
```

```
Latest Update Date: 03/25/2016
Name: International Financial Statistics (IFS)
Temporal Coverage: Data available starting in the 1948 for many IMF member countries. Varies by country.
```
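The annotations are easier to use as a dictionary keyed by title, for example to check how recent the data are before pulling them. This is a sketch with a hypothetical helper, `annotations_to_dict`, built on the same nesting as the cell above. The sample values are copied from the output shown; the `03/25/2016` value indicates a month/day/year date format.

```python
from datetime import datetime


def annotations_to_dict(structure_json):
    """Map each AnnotationTitle to its AnnotationText in a DataStructure response.

    Assumes the nesting used in the notebook:
    Structure -> KeyFamilies -> KeyFamily -> Annotations -> Annotation (a list).
    """
    annotations = (structure_json['Structure']['KeyFamilies']['KeyFamily']
                   ['Annotations']['Annotation'])
    return {a['AnnotationTitle']: a['AnnotationText']['#text'] for a in annotations}


# Hand-built sample mirroring the response shape (values copied from the output above)
sample = {'Structure': {'KeyFamilies': {'KeyFamily': {'Annotations': {'Annotation': [
    {'AnnotationTitle': 'Latest Update Date',
     'AnnotationText': {'#text': '03/25/2016'}},
    {'AnnotationTitle': 'Name',
     'AnnotationText': {'#text': 'International Financial Statistics (IFS)'}},
]}}}}}

notes = annotations_to_dict(sample)
# Parse the month/day/year update date into a date object
updated = datetime.strptime(notes['Latest Update Date'], '%m/%d/%Y').date()
print(updated)
```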


```python
# Look at the structure of the IFS data to find the dimensions for our data request
url = "http://dataservices.imf.org/REST/SDMX_JSON.svc/DataStructure/IFS"
ifsstruct = json.load(urllib2.urlopen(url))
df = pd.DataFrame(ifsstruct['Structure']['KeyFamilies']['KeyFamily']
                  ['Components']['Dimension'])
# Print each dimension's codelist id, in order
for x in range(0, 4):
    items = ("Dimension", str(x + 1), str(df['@codelist'][x]))
    print(': '.join(items))
```

```
Dimension: 1: CL_AREA|IFS
Dimension: 2: CL_INDICATOR|IFS
Dimension: 3: CL_FREQ|IFS
Dimension: 4: CL_UNIT_MULT|IFS
```


We can now pass each dimension's codelist id to the CodeList method to view its possible values. For example, we need the value of the first dimension, CL_AREA, for Bulgaria; its code is 918. To save space, the loop below prints only a short slice of the list, but you can replace `range(30, 35)` with `range(len(df))` to get the full list of country/area codes.

```python
# Obtain country codes (the first dimension's codelist is CL_AREA|IFS)
url = "http://dataservices.imf.org/REST/SDMX_JSON.svc/CodeList/CL_AREA|IFS"
country = json.load(urllib2.urlopen(url))
df = pd.DataFrame(country['Structure']['CodeLists']['CodeList']['Code'])
for x in range(30, 35):
    items = (str(df['@value'][x]), str(df['Description'][x]['#text']))
    print(': '.join(items))
```
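Instead of scanning a slice of the code list by eye, you can look a country up by its description. `find_code` below is a hypothetical helper that assumes the `Structure` → `CodeLists` → `CodeList` → `Code` nesting used above, with `@value` and `Description['#text']` in each entry. The `sample` is hand-built: 918 for Bulgaria comes from this walkthrough, and 111 is the IMF's code for the United States.

```python
def find_code(codelist_json, description):
    """Return every code whose description equals the given text (case-insensitive)."""
    codes = codelist_json['Structure']['CodeLists']['CodeList']['Code']
    target = description.lower()
    return [c['@value'] for c in codes
            if c['Description']['#text'].lower() == target]


# Hand-built sample mirroring the response shape (not real API output)
sample = {'Structure': {'CodeLists': {'CodeList': {'Code': [
    {'@value': '111', 'Description': {'#text': 'United States'}},
    {'@value': '918', 'Description': {'#text': 'Bulgaria'}},
]}}}}

print(find_code(sample, 'bulgaria'))
```

An empty list means no exact match, in which case a substring search over the descriptions is a reasonable fallback.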

The series ID is IFS and the country code of interest is 918. IFS contains thousands of individual indicators (data series); the output below lists a few of them. Our code of interest is IAD_BP6_USD: direct investment assets in US dollars, compiled under the methodology of the Balance of Payments and International Investment Position Manual, 6th edition (BPM6).

```python
# Obtain series info and ids
url = "http://dataservices.imf.org/REST/SDMX_JSON.svc/CodeList/CL_INDICATOR|IFS"
series = json.load(urllib2.urlopen(url))
df = pd.DataFrame(series['Structure']['CodeLists']['CodeList']['Code'])
for x in range(3, 8):
    items = (str(df['@value'][x]), str(df['Description'][x]['#text']))
    print(': '.join(items))
```

```
IADD_BP6_USD: Assets, Direct investment, Debt instruments, US Dollars
IADE_BP6_USD: Assets, Direct investment, Equity and investment fund shares , US Dollars
IAD_BP6_USD: Assets, Direct investment, US Dollars
IAD_USD: Assets, FDI Abroad (BPM5), US Dollars
IADF_BP6_USD: Assets, Financial derivatives (other than reserves) and employee stock options , US Dollars
```
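With thousands of indicators, filtering the code list is quicker than paging through it. The sketch below uses pandas string methods on a small hand-built frame whose rows are copied from the output above. The `Description` column holds dictionaries, as in the real response, so the text is extracted first. Treating a `_BP6_USD` suffix as "BPM6, US dollars" is an assumption based on the naming pattern visible in this output.

```python
import pandas as pd

# Hand-built rows copied from the output above; Description holds dicts, as in the API response
codes = pd.DataFrame({
    '@value': ['IADD_BP6_USD', 'IAD_BP6_USD', 'IAD_USD'],
    'Description': [
        {'#text': 'Assets, Direct investment, Debt instruments, US Dollars'},
        {'#text': 'Assets, Direct investment, US Dollars'},
        {'#text': 'Assets, FDI Abroad (BPM5), US Dollars'},
    ],
})

# Pull the plain-text description out of each dictionary
codes['text'] = codes['Description'].apply(lambda d: d['#text'])

# Keep BPM6, US-dollar indicators whose description mentions direct investment
mask = (codes['@value'].str.endswith('_BP6_USD')
        & codes['text'].str.contains('direct investment', case=False))
matches = codes.loc[mask, '@value'].tolist()
print(matches)
```

On the live data, the same two lines that build `text` and `mask` can be applied to the `df` from the cell above.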


## Retrieving Data
The guide to STA's API shows how we can combine information from the previous steps to request data. For our example, the dimensions and the values we need are:

* Dimension 1: CL_AREA (the primary country or area) - 918
* Dimension 2: CL_INDICATOR (the measure/series of interest) - IAD_BP6_USD
* Dimension 3: CL_FREQ (the frequency of the data; we want quarterly data) - Q
* Dimension 4: CL_UNIT_MULT (the unit multiplier of the data; we can leave this blank)

The JSON RESTful API method for requesting the data is the CompactData Method. The format for putting together dimension and time period information is shown on the Web Service knowledge base as:

    http://dataservices.imf.org/REST/SDMX_JSON.svc/CompactData/{database ID}/{item1 from dimension1}+{item2 from dimension1}+...+{item N from dimension1}.{item1 from dimension2}+{item2 from dimension2}+...+{item M from dimension2}?startPeriod={start date}&endPeriod={end date}
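The key portion of this template can be assembled from the dimension order reported by the DataStructure call. `build_key` is a hypothetical helper sketching that rule: one slot per dimension, slots joined by periods, several codes in one slot joined by `+`, and an empty slot for a dimension left blank. The dimension order in the example is copied from the IFS structure output earlier in this walkthrough.

```python
def build_key(codelists, selections):
    """Join selected codes into a CompactData key, one period-separated slot per dimension.

    codelists: dimension codelist ids in the order reported by DataStructure.
    selections: dict mapping a codelist id to a list of codes; a missing entry
    leaves that slot blank, and several codes in one slot are joined with '+'.
    """
    slots = ['+'.join(selections.get(codelist, [])) for codelist in codelists]
    return '.'.join(slots)


# Dimension order copied from the IFS DataStructure output
ifs_dims = ['CL_AREA|IFS', 'CL_INDICATOR|IFS', 'CL_FREQ|IFS', 'CL_UNIT_MULT|IFS']
key = build_key(ifs_dims, {
    'CL_AREA|IFS': ['918'],  # Bulgaria
    'CL_INDICATOR|IFS': ['IAD_BP6_USD'],
    'CL_FREQ|IFS': ['Q'],
})
print(key)  # 918.IAD_BP6_USD.Q.
```

Because CL_UNIT_MULT has no selection, its slot is empty, which is what produces the trailing period.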

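Putting all of this information together, the URL to retrieve a JSON dictionary for Bulgaria's direct investment assets is shown below. The dimension values are joined with periods in dimension order, and the trailing period leaves the last dimension (CL_UNIT_MULT) blank: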

http://dataservices.imf.org/REST/SDMX_JSON.svc/CompactData/IFS/918.IAD_BP6_USD.Q.?startPeriod=1998&endPeriod=2016
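Assembling the URL in code avoids typos when you change the key or the date range. `compact_data_url` is a hypothetical helper; it uses the standard library's `urllib.parse.urlencode` (Python 3) for the query string and reproduces the URL above.

```python
from urllib.parse import urlencode

BASE = 'http://dataservices.imf.org/REST/SDMX_JSON.svc'


def compact_data_url(database_id, key, start=None, end=None):
    """Assemble a CompactData request URL; the period parameters are optional."""
    params = {}
    if start is not None:
        params['startPeriod'] = start
    if end is not None:
        params['endPeriod'] = end
    url = BASE + '/CompactData/' + database_id + '/' + key
    if params:
        # urlencode keeps the dict's insertion order: startPeriod, then endPeriod
        url += '?' + urlencode(params)
    return url


url = compact_data_url('IFS', '918.IAD_BP6_USD.Q.', start=1998, end=2016)
print(url)
```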

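The Python code below retrieves the data, parses the JSON response into a dictionary, and loads the quarterly observations into a pandas DataFrame indexed by date:

```python
url = 'http://dataservices.imf.org/REST/SDMX_JSON.svc/CompactData/IFS/918.IAD_BP6_USD.Q.?startPeriod=1998&endPeriod=2016'
data = requests.get(url).json()
dataset = data['CompactData']['DataSet']
# If no data match the key and period range, 'Series' may be missing from the response
if 'Series' not in dataset:
    raise ValueError('No data returned; check the request key and period range')
# Keep the period and value columns by name (status flags such as @OBS_STATUS are dropped)
bgfdi = pd.DataFrame(dataset['Series']['Obs'])[['@TIME_PERIOD', '@OBS_VALUE']].copy()
bgfdi.columns = ['date', 'bgfdi']
bgfdi['bgfdi'] = bgfdi['bgfdi'].astype(float)
# Assumes quarterly periods look like '2007-Q1'; index each value by its quarter-start date
periods = bgfdi.pop('date').str.replace('-', '', regex=False)
bgfdi.index = pd.PeriodIndex(periods, freq='Q').to_timestamp()
bgfdi.index.name = 'date'
bgfdi.tail()
```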
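If you put several codes in one slot of the key (for example, several countries joined with `+`), the response would presumably contain more than one series. The sketch below, a hypothetical `series_to_frame` helper, turns each series into one DataFrame column. It assumes, without confirmation from the IMF documentation, that `Series` and `Obs` are lists when there are several entries but bare dictionaries when there is only one, that quarterly periods look like `2007-Q1`, and that each series names its area in a `@REF_AREA` attribute; check the keys of a real response before relying on it. The sample numbers are placeholders, not IMF data.

```python
import pandas as pd


def series_to_frame(dataset, label_attr='@REF_AREA'):
    """Turn a CompactData 'DataSet' dict into a DataFrame with one column per series.

    Assumptions (unverified): single entries may arrive as dicts rather than lists,
    quarterly periods look like '2007-Q1', and label_attr names each series.
    """
    series_list = dataset['Series']
    if isinstance(series_list, dict):  # a single series may not be wrapped in a list
        series_list = [series_list]
    columns = {}
    for series in series_list:
        obs = series.get('Obs', [])
        if isinstance(obs, dict):  # likewise for a single observation
            obs = [obs]
        periods = [o['@TIME_PERIOD'].replace('-', '') for o in obs]
        values = [float(o['@OBS_VALUE']) for o in obs]
        index = pd.PeriodIndex(periods, freq='Q').to_timestamp()
        columns[series[label_attr]] = pd.Series(values, index=index)
    # Columns are aligned on date; quarters missing from a series become NaN
    return pd.DataFrame(columns)


# Hand-built sample in the assumed shape; the numbers are placeholders
sample = {'Series': [
    {'@REF_AREA': '918', 'Obs': [
        {'@TIME_PERIOD': '2007-Q1', '@OBS_VALUE': '1.0'},
        {'@TIME_PERIOD': '2007-Q2', '@OBS_VALUE': '2.0'},
    ]},
    {'@REF_AREA': '111', 'Obs': {'@TIME_PERIOD': '2007-Q2', '@OBS_VALUE': '3.0'}},
]}
frame = series_to_frame(sample)
print(frame)
```

Passing `data['CompactData']['DataSet']` from the cell above would apply it to a live response.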

## Graphing the data
Let's use matplotlib to view the result of our work.

```python
import matplotlib.pyplot as plt
%matplotlib inline
txt = 'Source: International Monetary Fund'

# Plot Bulgaria Assets - Direct Investment
ax = bgfdi.bgfdi.plot(grid=True, figsize=(9, 5), color="blue", linewidth=2)
ax.set_ylabel('U.S. Dollars')
ax.set_xlabel('Year')
# Position the source note in axes coordinates (0 to 1) so it stays inside the plot
# whatever the date range or value scale
ax.text(0.01, 0.02, txt, transform=ax.transAxes)
ax.set_title('Bulgaria: Direct Investment Assets');
```

## Export dataset to .csv
Let's save the dataset in a portable format that can be read by most statistical software. My preference is a .csv file, which the following code creates:

```python
bgfdi.to_csv('bgfdi.csv')
```
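To load the saved file later with its dates intact, read the first column back in as a datetime index. The sketch below round-trips a small placeholder frame through a temporary file rather than touching `bgfdi.csv`; `index_col=0` works whether or not the index was given a name before saving.

```python
import os
import tempfile

import pandas as pd

# Placeholder frame shaped like bgfdi: one float column indexed by quarter-start dates
frame = pd.DataFrame({'bgfdi': [1.0, 2.0]},
                     index=pd.DatetimeIndex(['2007-01-01', '2007-04-01']))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'bgfdi.csv')
    frame.to_csv(path)
    # index_col=0 uses the first column as the index; parse_dates=True converts it to datetimes
    restored = pd.read_csv(path, index_col=0, parse_dates=True)

print(restored)
```

For the real file, `pd.read_csv('bgfdi.csv', index_col=0, parse_dates=True)` follows the same pattern.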