# Existing packages for data extraction from the IMF

At the time of writing, four Python packages on PyPI target IMF data:


1.   [**imfpy**](https://pypi.org/project/imfpy/) extracts Direction of Trade (DoTS) data and searches through the IMF datasets.
2.   [**PyIMF**](https://pypi.org/project/PyIMF/) searches through series and dimensions, and requests data given a series and an index code.
3.   [**datapungi_imf**](https://pypi.org/project/datapungi_imf/#description) is poorly documented, so it is unclear how to use it. It seems that the user must supply the index codes.
4.   [**imf**](https://pypi.org/project/imf/) is empty; it was created in May 2020.

We will go through each package in turn to understand what it does and how it is built.

## imf

In [1]:
!pip install imf

Collecting imf
  Downloading imf-0.0.1a0-py3-none-any.whl (2.0 kB)
Installing collected packages: imf
Successfully installed imf-0.0.1a0


In [2]:
import imf
dir(imf)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__']

Only special (dunder) attributes are present in the imf package, so it exposes no public functions or classes. The [GitHub](https://github.com/nathanbegbie/imf) repository currently contains only the setup files for the package.
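A quick way to confirm this programmatically is to filter the dunder names out of `dir()`. The helper below is a minimal sketch: `public_names` is our own hypothetical name, not part of any package reviewed here, and it is demonstrated on stand-in module objects so that it runs without `imf` installed.

```python
import types


def public_names(module):
    """Return the attribute names of `module` that are not dunder names."""
    return [
        name for name in dir(module)
        if not (name.startswith("__") and name.endswith("__"))
    ]


# Stand-in for an empty package: a freshly created module object.
empty = types.ModuleType("empty_pkg")
print(public_names(empty))

# A module with one public attribute, for contrast.
filled = types.ModuleType("filled_pkg")
filled.get_data = lambda: None
print(public_names(filled))
```

Given the `dir(imf)` output above, `public_names(imf)` should return an empty list.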

## datapungi_imf

In [3]:
!pip install datapungi_imf

Collecting datapungi_imf
  Downloading datapungi_imf-0.2.0-py2.py3-none-any.whl (55 kB)
Collecting xlrd
  Downloading xlrd-2.0.1-py2.py3-none-any.whl (96 kB)
Collecting pyperclip
  Downloading pyperclip-1.8.2.tar.gz (20 kB)
  Preparing metadata (setup.py) ... done
Collecting html5lib
  Downloading html5lib-1.1-py2.py3-none-any.whl (112 kB)
Collecting pytest-html
  Downloading pytest_html-3.2.0-py3-none-any.whl (16 kB)
Collecting pytest-metadata
  Downloading pytest_metadata-2.0.4-py3-none-any.whl (9.9 kB)
Building wheels for collected packages: pyperclip
  Building wheel for pyperclip (setup.py) ... done
  C

In [4]:
import datapungi_imf as dpi
dir(dpi)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 'api',
 'data',
 'driverCore',
 'drivers',
 'generalSettings',
 'pandas',
 'pd',
 'requests',
 'sys',
 'tests',
 'utils']

The PyPI page describes the package as follows:

```datapungi_imf``` *is a python package that provides a simplified way to extract data from the API of IMF (IMF). Overall it can read a saved API key (in json/yaml files or environment variables (default)) to avoid having a copy of it on a script.*
*Can automatically test:*
* *the connectivity to all BEA datasets,*
* *the quality of the cleaned up data, and*
* *if the provided requests code snippet returns the correct result.*

There is no documentation and the link to the GitHub page is broken, so it is unclear how the methods are supposed to work. We will run a few of them to see what they output.



In [5]:
data = dpi.data() #or data = dpi.data("API Key"), see setting up section   

data.list()

|     | id | description | language |
|-----|----|-------------|----------|
| 0 | BOP_2017M06 | Balance of Payments (BOP), 2017 M06 | en |
| 1 | BOP_2020M3 | Balance of Payments (BOP), 2020 M03 | en |
| 2 | BOP_2017M11 | Balance of Payments (BOP), 2017 M11 | en |
| 3 | DOT_2020Q1 | Direction of Trade Statistics (DOTS), 2020 Q1 | en |
| 4 | GFSMAB2016 | Government Finance Statistics Yearbook (GFSY 2... | en |
| ... | ... | ... | ... |
| 254 | MFS | Monetary and Financial Statistics (MFS) | en |
| 255 | PCPS | Primary Commodity Price System (PCPS) | en |
| 256 | FSI | Financial Soundness Indicators (FSIs) | en |
| 257 | FSIRE | Financial Soundness Indicators: Reporting enti... | en |


In [6]:
data.params('IFS')  # returns the dataset's concepts (dimensions), annotations and codes

{'concepts':             id                                        text
 0    OBS_VALUE                                       Value
 1    UNIT_MULT                                       Scale
 2  TIME_FORMAT                                 Time format
 3         FREQ                                   Frequency
 4     REF_AREA                              Reference Area
 5    INDICATOR                                   Indicator
 6    BASE_YEAR                                   Base Year
 7  TIME_PERIOD                                        Date
 8   OBS_STATUS  Observation Status (incl. Confidentiality),
 'annotations':                  title                                               text
 0   Latest Update Date                                         02/06/2023
 1                 Name           International Financial Statistics (IFS)
 2    Temporal Coverage  Data available starting in the 1948 for many I...
 3  Geographic Coverage  IFS covers 194 countries and areas.\n\nUnder t.

In [7]:
for key in data.params('IFS'):
  print(key)

concepts
annotations
codes


In [8]:
data.params('IFS')['concepts']

|   | id | text |
|---|----|------|
| 0 | OBS_VALUE | Value |
| 1 | UNIT_MULT | Scale |
| 2 | TIME_FORMAT | Time format |
| 3 | FREQ | Frequency |
| 4 | REF_AREA | Reference Area |
| 5 | INDICATOR | Indicator |
| 6 | BASE_YEAR | Base Year |
| 7 | TIME_PERIOD | Date |
| 8 | OBS_STATUS | Observation Status (incl. Confidentiality) |


In [9]:
data.params('IFS')['annotations']

|   | title | text |
|---|-------|------|
| 0 | Latest Update Date | 02/06/2023 |
| 1 | Name | International Financial Statistics (IFS) |
| 2 | Temporal Coverage | Data available starting in the 1948 for many I... |
| 3 | Geographic Coverage | IFS covers 194 countries and areas.\n\nUnder t... |
| 4 | Methodology | The International Financial Statistics is base... |
| 5 | Sectoral Coverage | National Accounts, Indicators of Economic Acti... |
| 6 | Definition | The International Financial Statistics databas... |
| 7 | Code | IFS |


In [None]:
data.params('IFS')['codes']

In [None]:
for key in data.params('IFS')['codes']:
  print(key) 

In [None]:
df = data.data('IFS/A.GB.LU_PE_NUM')
df.head(n=5)

In [None]:
data.meta('IFS/A.GB.LU_PE_NUM')

The package returns metadata in JSON form and observation values as a pandas DataFrame.

Next, we run the same calls on a different dataset, FSI.

In [None]:
data.params('FSI')['codes']['CL_INDICATOR_FSI']

In [None]:
data.meta('FSI/A.GB.FS_ODX_AFCDGN_XDC')

In [None]:
data.data('FSI/A.GB.FS_ODX_AFCDGN_XDC') # error

Data extraction seems to work for IFS only: every request we tried for 'FSI', 'GFS' and 'MFS' data resulted in an error.
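To make such a check repeatable, each request can be wrapped so that failures are recorded instead of stopping the notebook. This is only a sketch: `probe` is a hypothetical helper of ours, and `fake_fetch` is a stand-in that imitates the behaviour observed above so the example runs without network access or the package installed.

```python
def probe(fetch, keys):
    """Call fetch(key) for every key and record whether it raised."""
    results = {}
    for key in keys:
        try:
            fetch(key)
            results[key] = "ok"
        except Exception as exc:  # broad on purpose: we only log the failure type
            results[key] = f"error: {type(exc).__name__}"
    return results


# Stand-in for data.data: succeeds for IFS keys only (illustrative behaviour).
def fake_fetch(key):
    if not key.startswith("IFS/"):
        raise KeyError(key)
    return []


report = probe(fake_fetch, ["IFS/A.GB.LU_PE_NUM", "FSI/A.GB.FS_ODX_AFCDGN_XDC"])
for key, status in report.items():
    print(key, "->", status)
```

With the package installed, passing `data.data` in place of `fake_fetch` would run the same check against the real API.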

## PyIMF

In [None]:
!pip install PyIMF

In [None]:
import PyIMF

The package does not import.
Based on the code on its [GitHub](https://github.com/ceggersp/IMF_API) page, the package is supposed to
1. output a list of series given a search term:

```
find_series(search_term) 
```

2. output dimension names as a pandas DataFrame for the selected series:

```
find_dims(series)
```

3. return the list of codes and their names for a given dimension (e.g. 'CL_INDICATOR_IFS'):

```
find_codes(dim)
```

4. return data given the parameters looked up before:


```
request_data(dataset, parameters, countries = 'ALL', F='A', var_name=0, save_file=0, file_type='csv', sleep = False)
```
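The last step boils down to assembling a request path from the dataset, frequency, country codes and indicator. The sketch below mirrors the string concatenation in the `request_data` code reproduced later in this section; `build_key` is a hypothetical name of ours and is not part of PyIMF.

```python
def build_key(dataset, indicator, countries="ALL", freq="A"):
    """Assemble the CompactData path: dataset/frequency.countries.indicator."""
    # As in request_data, 'ALL' leaves the country slot empty;
    # several country codes are joined with '+'.
    area = "" if countries == "ALL" else "+".join(countries)
    return f"CompactData/{dataset}/{freq}.{area}.{indicator}"


print(build_key("IFS", "PMP_IX"))
print(build_key("IFS", "PMP_IX", countries=["GB", "US"], freq="Q"))
```

The resulting string is appended to the base URL of the service before the request is sent.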

When run directly from the GitHub page, the functions perform as expected, except in one case. The code is available in the next subsection if you are interested (on Colab: unhide to view).








### Code from the GitHub page (on Colab: unhide to view)

In [None]:
import pandas as pd
import numpy as np
import requests
import time
import os
import platform
url = 'http://dataservices.imf.org/REST/SDMX_JSON.svc/'


if platform.system() == 'Linux':
    clear_command = 'clear'
else:
    clear_command = 'cls'

def find_series(search_term):
    key = 'Dataflow'  # Method with series information
    search_series_list = pd.DataFrame(columns = ['series_name','series_ID'])
    full_series_list = requests.get(f'{url}{key}').json()\
                ['Structure']['Dataflows']['Dataflow']
    # Use dict keys to navigate through results:
    for series in full_series_list:
        if search_term in series['Name']['#text']:
            series_name = pd.DataFrame([series['Name']['#text']], columns = ['series_name'])
            series_ID = pd.DataFrame([series['KeyFamilyRef']['KeyFamilyID']], columns = ['series_ID'])
            search_series_list = pd.concat([search_series_list, pd.concat([series_name, series_ID], axis=1)], axis=0, ignore_index = True)
    return search_series_list

In [None]:
a = find_series('IFS')

In [None]:
def find_dims(series):
    key = 'DataStructure/'+series  # Method / series
    dimension_list = requests.get(f'{url}{key}').json()\
                ['Structure']['KeyFamilies']['KeyFamily']\
                ['Components']['Dimension']
    dims = pd.DataFrame(columns = ['Dimensions'])
    for n in range(0, len(dimension_list)):
        dim = pd.DataFrame([dimension_list[n]['@codelist']], columns = ['Dimensions'])
        dims = pd.concat([dims, dim], axis=0, ignore_index = True)
    
    return dims

In [None]:
series = 'IFS'
b  = find_dims(series)

In [None]:
b

In [None]:
def find_codes(dim):
    
    key = f"CodeList/{dim}"
    code_list = requests.get(f'{url}{key}').json()\
            ['Structure']['CodeLists']['CodeList']['Code']

    codes = pd.DataFrame(columns = ['Description','Code'])
    # Use dict keys to navigate through results:
    for c in code_list:
        code_desc = pd.DataFrame([c['Description']['#text']], columns = ['Description'])
        code = pd.DataFrame([c['@value']], columns = ['Code'])
        codes = pd.concat([codes, pd.concat([code_desc, code], axis=1)], axis=0, ignore_index = True)

    return codes

In [None]:
c = find_codes('CL_INDICATOR_IFS')

In [None]:
c

In [None]:
def request_data(dataset, parameters, countries = 'ALL', F='A', var_name=0, save_file=0, file_type='csv', sleep = False):

    countries_code_list = find_codes('CL_AREA_'+dataset)

    if var_name == 0:
        var_name = parameters

    if countries == 'ALL':
        countries_parameter = ''
        one_country = False
    else:
        if len(countries) == 1:
            one_country = True
        else:
            one_country = False
        for i in range(0, len(countries)):
            if i == 0:
                countries_parameter = countries[i]
            else:
                countries_parameter = countries_parameter+'+'+countries[i]

    key = 'CompactData/'+dataset+'/'+F+'.'+countries_parameter+'.'+parameters
    print(key)
    # Navigate to series in API-returned JSON data
    data = (requests.get(f'{url}{key}').json()
            ['CompactData']['DataSet']['Series'])

    if sleep == False:
        pass
    else:
        if sleep == True:
            time.sleep(1.005)
        else:
            print('Input "sleep" must be either "True" or "False"')

    # Create pandas dataframe from the observations
    PANEL = pd.DataFrame(columns=['period', var_name, 'country'])
    if one_country == True:
        data_list = [[obs.get('@TIME_PERIOD'), obs.get('@OBS_VALUE')]
                    for obs in data['Obs']]
        country_data = pd.DataFrame(data_list, columns=['period', var_name])
        country_data['country'] = data['@REF_AREA']
        PANEL = pd.concat([PANEL,country_data], axis = 0, ignore_index=True)
    else:
        for i in range(0,len(data)):
            data_list = [[obs.get('@TIME_PERIOD'), obs.get('@OBS_VALUE')]
                        for obs in data[i]['Obs']]
            country_data = pd.DataFrame(data_list, columns=['period', var_name])
            country_data['country'] = data[i]['@REF_AREA']
            PANEL = pd.concat([PANEL,country_data], axis = 0, ignore_index=True)

    PANEL['country_name'] = ['.' for i in range(0, len(PANEL))]
    for i in range(0, len(PANEL)):
        row = countries_code_list['Description'][countries_code_list['Code'] == PANEL['country'][i]].values.astype(str)
        PANEL['country_name'][i] = row[0]    

    PANEL['obs_code'] = PANEL['period'].astype(str)+PANEL['country']

    if save_file == 0:
        pass
    else:
        if file_type == 'csv':
            PANEL.to_csv(save_file+'.'+file_type, header=True, index=False)
        else:
            PANEL.to_excel(save_file+'.'+file_type, header=True, index=False)

    os.system(clear_command)
    print('Data retrieved succesfully')
    return PANEL


In [None]:
print(request_data('IFS', 'PMP_IX'))

In [None]:
print(request_data('IFS', 'PMP_IX', countries = ['GB','US'], F='Q', var_name=0, save_file=0, file_type='csv', sleep = False))

In [None]:
print(request_data('IFS', 'PMP_IX', countries = ['US'], F='Q', var_name=0, save_file=0, file_type='csv', sleep = False))

The request results in an error if only one country is specified.
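We have not diagnosed the cause of this error. One thing worth checking is that `request_data` chooses how to read the `Series` node from the number of countries requested rather than from the shape of the response, so the two can disagree. The sketch below shows a more defensive way to read that node. The helper names `as_list` and `flatten_series` are hypothetical, the sample payloads are hand-written rather than real IMF data, and the idea that a lone series or a lone observation may arrive as a dict instead of a list is our assumption.

```python
def as_list(node):
    """Wrap a lone dict in a list so callers can always iterate."""
    if node is None:
        return []
    return node if isinstance(node, list) else [node]


def flatten_series(series_node):
    """Turn a 'Series' node into (country, period, value) rows."""
    rows = []
    for series in as_list(series_node):
        for obs in as_list(series.get("Obs")):
            rows.append(
                (series.get("@REF_AREA"), obs.get("@TIME_PERIOD"), obs.get("@OBS_VALUE"))
            )
    return rows


# Hand-written sample payloads (not real IMF data).
one = {"@REF_AREA": "US", "Obs": {"@TIME_PERIOD": "2020-Q1", "@OBS_VALUE": "1.0"}}
many = [
    {"@REF_AREA": "GB", "Obs": [{"@TIME_PERIOD": "2020-Q1", "@OBS_VALUE": "2.0"}]},
    {"@REF_AREA": "US"},  # a series without observations
]
print(flatten_series(one))
print(flatten_series(many))
```

Because the shape is inspected at read time, the same function handles one country, several countries, and series that come back empty.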

## imfpy

In [None]:
!pip install imfpy

In [None]:
import imfpy

In [None]:
dir(imfpy)

The package has a good [user guide](https://imfpy.readthedocs.io/en/latest/example.html#module-imfpy-searches) and can be used to
1. extract and visualize the IMF's Direction of Trade (DoTS) data, and
2. search through the IMF datasets and the indicators available, and retrieve some metadata for the datasets.

Some examples are shown below. 

In [None]:
from imfpy.retrievals import dots

In [None]:
# dots("DE", ["US", "AU"], 2017, 2022, freq='M')
dots("DE", ["US", "AU"], 2017, 2022, freq='A')

In [None]:
#Example: plot Australia trade data
from imfpy.tools import dotsplot
d = dots('AU',['US','CN'], 2000, 2020, freq='A', form="long")
#d = dots('AU',['US','CN'], 2000, 2020, freq='Q', form="long")

In [None]:
d.head(n=3)

In [None]:
from imfpy import searches

In [None]:
searches.country_search("^B.*a$", regex=True)

In [None]:
searches.database_search("^Financial.*", regex=True)

In [None]:
searches.database_info('FSI')