![This is an image](Quant-Trading.jpg)

<font size="3">
Please visit our website <a href="https://www.quant-trading.co" target="_blank">quant-trading.co</a> for more tools on quantitative finance and data science.
</font>

In [1]:
# !pip install pandas-datareader

## **How to download data from the OECD database?**


<font size="3"> The OECD provides access to the datasets in its database catalogue through a RESTful application programming interface (API) based on the SDMX-ML standard. This lets a developer call the API programmatically with simple RESTful URLs. Fortunately, we can use the pandas_datareader library for that purpose, as we have shown previously. <br><br>

<font size="3">
The pandas_datareader library allows you to fetch data from different sources, including Yahoo Finance for financial market data, the World Bank for global development data, and the St. Louis Fed for economic data. In this notebook, we’ll show how you can load data from the OECD. Behind the scenes, pandas_datareader pulls the data you want from the web in real time and assembles it into a pandas DataFrame. Because web pages are structured very differently, each data source needs its own reader. Hence, pandas_datareader only supports a limited number of sources, mostly related to financial and economic time series. Below you can find an example of how this works.<br><br>
    
<font size="3">
We will also show how to download data directly without any particular Python library, and we will briefly explore the pandasdmx library, which can also be used with this API.

In [2]:
import warnings
warnings.filterwarnings('ignore')

import pandas_datareader.data as web  #Pandas Datareader
import datetime

## **Historical data - specific dates**


<font size="3">
In this example we show how to download data for a specific timeframe. We need to know the code of the specific OECD database. In this case we use the database "MEI_FIN", which contains financial variables. To find those codes, you can visit the OECD webpage https://stats.oecd.org/ and see what information is available.

In [3]:
#DATES
start_date = datetime.datetime(2013, 12, 31)
end_date = datetime.datetime(2023, 12, 31)

#SPECIFIC DATABASE WITHIN OECD API
database = 'MEI_FIN'

df1 = web.DataReader(database, 'oecd', start_date, end_date)
df1

Subject,"Share Prices, Index","Share Prices, Index","Share Prices, Index","Share Prices, Index","Share Prices, Index","Share Prices, Index","Share Prices, Index","Share Prices, Index","Share Prices, Index","Share Prices, Index",...,"Immediate interest rates, Call Money, Interbank Rate, Per cent per annum","Immediate interest rates, Call Money, Interbank Rate, Per cent per annum","Immediate interest rates, Call Money, Interbank Rate, Per cent per annum","Immediate interest rates, Call Money, Interbank Rate, Per cent per annum","Immediate interest rates, Call Money, Interbank Rate, Per cent per annum","Immediate interest rates, Call Money, Interbank Rate, Per cent per annum","Immediate interest rates, Call Money, Interbank Rate, Per cent per annum","Immediate interest rates, Call Money, Interbank Rate, Per cent per annum","Immediate interest rates, Call Money, Interbank Rate, Per cent per annum","Immediate interest rates, Call Money, Interbank Rate, Per cent per annum"
Country,Australia,Australia,Australia,Austria,Austria,Austria,Belgium,Belgium,Belgium,Canada,...,Saudi Arabia,Bulgaria,Bulgaria,Bulgaria,Romania,Romania,Romania,Croatia,Croatia,Croatia
Frequency,Annual,Quarterly,Monthly,Annual,Quarterly,Monthly,Annual,Quarterly,Monthly,Annual,...,Monthly,Annual,Quarterly,Monthly,Annual,Quarterly,Monthly,Annual,Quarterly,Monthly
Time,Unnamed: 1_level_3,Unnamed: 2_level_3,Unnamed: 3_level_3,Unnamed: 4_level_3,Unnamed: 5_level_3,Unnamed: 6_level_3,Unnamed: 7_level_3,Unnamed: 8_level_3,Unnamed: 9_level_3,Unnamed: 10_level_3,Unnamed: 11_level_3,Unnamed: 12_level_3,Unnamed: 13_level_3,Unnamed: 14_level_3,Unnamed: 15_level_3,Unnamed: 16_level_3,Unnamed: 17_level_3,Unnamed: 18_level_3,Unnamed: 19_level_3,Unnamed: 20_level_3,Unnamed: 21_level_3
2014-01-01,,,35.72023,,,48.10636,,,30.07202,,...,,,,87.4707,,,,,,9.5
2014-02-01,,,36.77441,,,48.84916,,,30.95100,,...,,,,89.2576,,,,,,8.5
2014-03-01,,,36.31924,,,48.28220,,,29.96547,,...,,,,104.2588,,,,,,8.5
2014-04-01,,,35.42709,,,45.98772,,,28.79349,,...,,,,107.6133,,,,,,8.5
2014-05-01,,,34.23272,,,44.73231,,,29.09980,,...,,,,107.9652,,,,,,8.5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2023-08-01,,,60.33166,,,74.67569,,,55.50713,,...,,,,1.6016,,,21.59100,,,4.5
2023-09-01,,,62.07179,,,76.49167,,,55.04503,,...,,,,2.0694,,,21.00348,,,4.5
2023-10-01,,,62.62614,,,80.71296,,,56.52643,,...,,,,1.8251,,,20.54190,,,4.5
2023-11-01,,,61.77567,,,77.79077,,,55.29437,,...,,,,3.0313,,,21.04428,,,4.5
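Many of the columns displayed above contain only NaN values. A minimal sketch of how such empty columns could be dropped is shown here on a small synthetic frame; the `toy` name and its numbers are made up for illustration and only stand in for `df1`:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df1: three column levels, as in the output above.
columns = pd.MultiIndex.from_tuples(
    [
        ("Share Prices, Index", "Australia", "Annual"),
        ("Share Prices, Index", "Australia", "Monthly"),
        ("Share Prices, Index", "Austria", "Monthly"),
    ],
    names=["Subject", "Country", "Frequency"],
)
index = pd.DatetimeIndex(["2014-01-01", "2014-02-01", "2014-03-01"], name="Time")
toy = pd.DataFrame(
    [[np.nan, 1.0, 2.0], [np.nan, 1.5, np.nan], [np.nan, 2.0, 2.5]],
    index=index,
    columns=columns,
)

# Keep only the columns that contain at least one observation.
toy_clean = toy.dropna(axis=1, how="all")
print(toy_clean.columns.tolist())
```

Applied to the real data, the same call would be `df1.dropna(axis=1, how="all")`.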


## **Get specific information from this DataFrame with Multilevel Index**


<font size="3">
The DataFrame we just downloaded has a multilevel index (MultiIndex) on its columns. We need to understand its structure to extract information from it easily. The following cells show one way to do that. The first step is to look at the column names, which can be done with the pandas attribute "columns". As you can see below, we get a list of all the columns in the DataFrame.

In [4]:
#Get the column names
df1.columns

MultiIndex([(                                                     'Share Prices, Index', ...),
            (                                                     'Share Prices, Index', ...),
            (                                                     'Share Prices, Index', ...),
            (                                                     'Share Prices, Index', ...),
            (                                                     'Share Prices, Index', ...),
            (                                                     'Share Prices, Index', ...),
            (                                                     'Share Prices, Index', ...),
            (                                                     'Share Prices, Index', ...),
            (                                                     'Share Prices, Index', ...),
            (                                                     'Share Prices, Index', ...),
            ...
            ('Immediate interest r
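The truncated output hides how the levels are organised. As a rough sketch, the number of levels and their names can be read from the same "columns" attribute. The frame below is synthetic (the `toy` name and its values are illustrative) and only mirrors the level names seen in the earlier output:

```python
import pandas as pd

# Synthetic columns with the same three levels as df1.
columns = pd.MultiIndex.from_product(
    [
        ["Share Prices, Index", "Short-term interest rates, Per cent per annum"],
        ["Australia", "Austria"],
        ["Annual", "Quarterly", "Monthly"],
    ],
    names=["Subject", "Country", "Frequency"],
)
toy = pd.DataFrame(
    0.0,
    index=pd.DatetimeIndex(["2014-01-01"], name="Time"),
    columns=columns,
)

print(toy.columns.nlevels)      # number of column levels
print(list(toy.columns.names))  # name of each level
```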

## **Transpose the DataFrame and further explore the names**


<font size="3">
It might be easier to work with rows instead of columns. That is because we can use the rows as an index, and then we can apply some methods that help us better understand the data structure. The first thing we do here is to transpose the DataFrame with the property "T".

In [5]:
df1.T

Unnamed: 0_level_0,Unnamed: 1_level_0,Time,2014-01-01,2014-02-01,2014-03-01,2014-04-01,2014-05-01,2014-06-01,2014-07-01,2014-08-01,2014-09-01,2014-10-01,...,2023-03-01,2023-04-01,2023-05-01,2023-06-01,2023-07-01,2023-08-01,2023-09-01,2023-10-01,2023-11-01,2023-12-01
Subject,Country,Frequency,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1
"Share Prices, Index",Australia,Annual,,,,,,,,,,,...,,,,,,,,,,
"Share Prices, Index",Australia,Quarterly,,,,,,,,,,,...,,,,,,,,,,
"Share Prices, Index",Australia,Monthly,35.72023,36.77441,36.31924,35.42709,34.23272,33.49716,33.10753,33.12756,33.8176,35.68745,...,58.29767,59.28762,58.36786,58.9150,60.12669,60.33166,62.07179,62.62614,61.77567,63.73055
"Share Prices, Index",Austria,Annual,,,,,,,,,,,...,,,,,,,,,,
"Share Prices, Index",Austria,Quarterly,,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
"Immediate interest rates, Call Money, Interbank Rate, Per cent per annum",Romania,Quarterly,,,,,,,,,,,...,,,,,,,,,,
"Immediate interest rates, Call Money, Interbank Rate, Per cent per annum",Romania,Monthly,,,,,,,,,,,...,19.31454,19.96565,21.28400,20.4995,21.05300,21.59100,21.00348,20.54190,21.04428,20.67182
"Immediate interest rates, Call Money, Interbank Rate, Per cent per annum",Croatia,Annual,,,,,,,,,,,...,,,,,,,,,,
"Immediate interest rates, Call Money, Interbank Rate, Per cent per annum",Croatia,Quarterly,,,,,,,,,,,...,,,,,,,,,,


<font size="3">
As you can see, this does not tell us much more than before. But we can try the "index" attribute and see what we get.

In [6]:
df1.T.index

MultiIndex([(                                                     'Share Prices, Index', ...),
            (                                                     'Share Prices, Index', ...),
            (                                                     'Share Prices, Index', ...),
            (                                                     'Share Prices, Index', ...),
            (                                                     'Share Prices, Index', ...),
            (                                                     'Share Prices, Index', ...),
            (                                                     'Share Prices, Index', ...),
            (                                                     'Share Prices, Index', ...),
            (                                                     'Share Prices, Index', ...),
            (                                                     'Share Prices, Index', ...),
            ...
            ('Immediate interest r

<font size="3">
Still not much information, but let's try the "get_level_values()" method on the first level.

In [7]:
df1.T.index.get_level_values(0)

Index(['Share Prices, Index', 'Share Prices, Index', 'Share Prices, Index',
       'Share Prices, Index', 'Share Prices, Index', 'Share Prices, Index',
       'Share Prices, Index', 'Share Prices, Index', 'Share Prices, Index',
       'Share Prices, Index',
       ...
       'Immediate interest rates, Call Money, Interbank Rate, Per cent per annum',
       'Immediate interest rates, Call Money, Interbank Rate, Per cent per annum',
       'Immediate interest rates, Call Money, Interbank Rate, Per cent per annum',
       'Immediate interest rates, Call Money, Interbank Rate, Per cent per annum',
       'Immediate interest rates, Call Money, Interbank Rate, Per cent per annum',
       'Immediate interest rates, Call Money, Interbank Rate, Per cent per annum',
       'Immediate interest rates, Call Money, Interbank Rate, Per cent per annum',
       'Immediate interest rates, Call Money, Interbank Rate, Per cent per annum',
       'Immediate interest rates, Call Money, Interbank Rate, Per c

<font size="3">
Not much information either. Let's add the "unique()" method.

In [8]:
df1.T.index.get_level_values(0).unique()

Index(['Share Prices, Index', 'Narrow Money (M1) Index, SA',
       'Broad Money (M3) Index, SA',
       'Long-term interest rates, Per cent per annum',
       'Currency exchange rates, monthly average',
       'Relative consumer price indices', 'Relative unit labour costs',
       'Short-term interest rates, Per cent per annum',
       'Immediate interest rates, Call Money, Interbank Rate, Per cent per annum'],
      dtype='object', name='Subject')

<font size="3">
This is finally what we were looking for: a short list of the fields in the first level of the DataFrame's columns. We can, for instance, use the indicator 'Short-term interest rates, Per cent per annum' and explore the DataFrame further by selecting this column.

In [9]:
indicator = 'Short-term interest rates, Per cent per annum'

df1_filtered1 = df1[indicator]
df1_filtered1

Country,Australia,Australia,Australia,Austria,Austria,Austria,Belgium,Belgium,Belgium,Canada,...,Saudi Arabia,Bulgaria,Bulgaria,Bulgaria,Romania,Romania,Romania,Croatia,Croatia,Croatia
Frequency,Annual,Quarterly,Monthly,Annual,Quarterly,Monthly,Annual,Quarterly,Monthly,Annual,...,Monthly,Annual,Quarterly,Monthly,Annual,Quarterly,Monthly,Annual,Quarterly,Monthly
Time,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2,Unnamed: 13_level_2,Unnamed: 14_level_2,Unnamed: 15_level_2,Unnamed: 16_level_2,Unnamed: 17_level_2,Unnamed: 18_level_2,Unnamed: 19_level_2,Unnamed: 20_level_2,Unnamed: 21_level_2
2014-01-01,,,5.41,,,4.8400,,,5.4600,,...,,,,,,,,,,
2014-02-01,,,5.60,,,4.8000,,,5.5000,,...,,,,,,,,,,
2014-03-01,,,5.93,,,4.8700,,,5.4000,,...,,,,,,,,,,
2014-04-01,,,6.45,,,5.0300,,,5.2200,,...,,,,,,,,,,
2014-05-01,,,7.16,,,5.0500,,,5.1400,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2023-08-01,,,5.57,,,2.0705,,,2.0705,,...,,,,2.9749,,,22.44150,,,7.7671
2023-09-01,,,5.51,,,2.0288,,,2.0288,,...,,,,3.0491,,,22.50000,,,8.4243
2023-10-01,,,5.54,,,2.0488,,,2.0488,,...,,,,3.2321,,,22.47190,,,7.4180
2023-11-01,,,5.51,,,2.0859,,,2.0859,,...,,,,3.5424,,,22.43143,,,5.9005


<font size="3">
Here we get a DataFrame with the short-term interest rates of all countries at different frequencies. Let's dig a little deeper and select one particular country, for instance "Australia".

In [10]:
country = 'Australia'
df1_filtered1[country]

Frequency,Annual,Quarterly,Monthly
Time,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
2014-01-01,,,5.41
2014-02-01,,,5.60
2014-03-01,,,5.93
2014-04-01,,,6.45
2014-05-01,,,7.16
...,...,...,...
2023-08-01,,,5.57
2023-09-01,,,5.51
2023-10-01,,,5.54
2023-11-01,,,5.51


<font size="3">
We can add an additional selection, using the "Monthly" frequency.

In [11]:
frequency = 'Monthly'
df1_filtered1[country][frequency]

Time
2014-01-01    5.41
2014-02-01    5.60
2014-03-01    5.93
2014-04-01    6.45
2014-05-01    7.16
              ... 
2023-08-01    5.57
2023-09-01    5.51
2023-10-01    5.54
2023-11-01    5.51
2023-12-01    5.49
Name: Monthly, Length: 120, dtype: float64
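The exploration and selection steps of this section can also be written more compactly. The sketch below uses a small synthetic frame (`toy` and its values are made up) with the same level names as `df1`. It reads the unique subjects straight from the columns and selects the final series with a single column key:

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_product(
    [
        ["Share Prices, Index", "Short-term interest rates, Per cent per annum"],
        ["Australia", "Austria"],
        ["Annual", "Monthly"],
    ],
    names=["Subject", "Country", "Frequency"],
)
index = pd.DatetimeIndex(["2014-01-01", "2014-02-01", "2014-03-01"], name="Time")
values = np.arange(len(index) * len(columns), dtype=float).reshape(len(index), len(columns))
toy = pd.DataFrame(values, index=index, columns=columns)

indicator = "Short-term interest rates, Per cent per annum"
country = "Australia"
frequency = "Monthly"

# Unique subjects, read directly from the columns (no transpose needed).
subjects = toy.columns.get_level_values("Subject").unique()

# Chained selection, as in the cells above.
chained = toy[indicator][country][frequency]

# The same series selected in one step with a full column key.
one_step = toy[(indicator, country, frequency)]

# Cross-section: one country at one frequency, across all subjects.
australia_monthly = toy.xs((country, frequency), axis=1, level=["Country", "Frequency"])
```

The single-key form avoids building two intermediate DataFrames, while `xs` is handy when the selection should cut across the first level instead of starting from it.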

## **Download the data using a query**


<font size="3">
The OECD documents how to download data directly at https://data.oecd.org/api/sdmx-ml-documentation/. With this information we can use the pandas "read_csv()" function to read the data. In this example we pass a specific query text as {sdmx_query}. The text is "MEI_FIN/IRLT.AUS.M/OECD", which means that we use the MEI_FIN database to download the long-term interest rate for Australia at a monthly frequency. This is shown below:

In [12]:
import pandas as pd

def get_from_oecd(sdmx_query):
    return pd.read_csv(f"https://stats.oecd.org/SDMX-JSON/data/{sdmx_query}?contentType=csv")


df2 = get_from_oecd("MEI_FIN/IRLT.AUS.M/OECD")
df2

Unnamed: 0,SUBJECT,Subject,LOCATION,Country,FREQUENCY,Frequency,TIME,Time,Unit Code,Unit,PowerCode Code,PowerCode,Reference Period Code,Reference Period,Value,Flag Codes,Flags
0,IRLT,"Long-term interest rates, Per cent per annum",AUS,Australia,M,Monthly,1969-07,Jul-1969,PC,Percentage,0,Units,,,5.800,,
1,IRLT,"Long-term interest rates, Per cent per annum",AUS,Australia,M,Monthly,1969-08,Aug-1969,PC,Percentage,0,Units,,,5.790,,
2,IRLT,"Long-term interest rates, Per cent per annum",AUS,Australia,M,Monthly,1969-09,Sep-1969,PC,Percentage,0,Units,,,5.810,,
3,IRLT,"Long-term interest rates, Per cent per annum",AUS,Australia,M,Monthly,1969-10,Oct-1969,PC,Percentage,0,Units,,,5.830,,
4,IRLT,"Long-term interest rates, Per cent per annum",AUS,Australia,M,Monthly,1969-11,Nov-1969,PC,Percentage,0,Units,,,5.850,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
649,IRLT,"Long-term interest rates, Per cent per annum",AUS,Australia,M,Monthly,2023-08,Aug-2023,PC,Percentage,0,Units,,,4.128,,
650,IRLT,"Long-term interest rates, Per cent per annum",AUS,Australia,M,Monthly,2023-09,Sep-2023,PC,Percentage,0,Units,,,4.211,,
651,IRLT,"Long-term interest rates, Per cent per annum",AUS,Australia,M,Monthly,2023-10,Oct-2023,PC,Percentage,0,Units,,,4.626,,
652,IRLT,"Long-term interest rates, Per cent per annum",AUS,Australia,M,Monthly,2023-11,Nov-2023,PC,Percentage,0,Units,,,4.578,,
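The CSV response is in long format, with one row per observation. A minimal sketch of turning it into a date-indexed series is shown below. To stay self-contained it reads three rows copied from the output above, reduced to a few columns, instead of calling the API; `sample` and `series` are illustrative names:

```python
import io

import pandas as pd

# A few rows copied from the output above, reduced to the columns used here.
csv_text = """SUBJECT,LOCATION,FREQUENCY,TIME,Value
IRLT,AUS,M,1969-07,5.800
IRLT,AUS,M,1969-08,5.790
IRLT,AUS,M,1969-09,5.810
"""
sample = pd.read_csv(io.StringIO(csv_text))

# Parse the "YYYY-MM" strings into timestamps and use them as the index.
sample["TIME"] = pd.to_datetime(sample["TIME"], format="%Y-%m")
series = sample.set_index("TIME")["Value"]
print(series)
```

The same two steps should carry over to `df2`, assuming its "TIME" and "Value" columns look like the ones displayed.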


<font size="3">
The query text can also be assembled from its individual parameters:

In [13]:
database = 'MEI_FIN'
indicator = 'IRLT'
country_code = 'AUS'
frequency = 'M'

query_text = f"{database}/{indicator}.{country_code}.{frequency}/OECD"

df2 = get_from_oecd(query_text)
df2

Unnamed: 0,SUBJECT,Subject,LOCATION,Country,FREQUENCY,Frequency,TIME,Time,Unit Code,Unit,PowerCode Code,PowerCode,Reference Period Code,Reference Period,Value,Flag Codes,Flags
0,IRLT,"Long-term interest rates, Per cent per annum",AUS,Australia,M,Monthly,1969-07,Jul-1969,PC,Percentage,0,Units,,,5.800,,
1,IRLT,"Long-term interest rates, Per cent per annum",AUS,Australia,M,Monthly,1969-08,Aug-1969,PC,Percentage,0,Units,,,5.790,,
2,IRLT,"Long-term interest rates, Per cent per annum",AUS,Australia,M,Monthly,1969-09,Sep-1969,PC,Percentage,0,Units,,,5.810,,
3,IRLT,"Long-term interest rates, Per cent per annum",AUS,Australia,M,Monthly,1969-10,Oct-1969,PC,Percentage,0,Units,,,5.830,,
4,IRLT,"Long-term interest rates, Per cent per annum",AUS,Australia,M,Monthly,1969-11,Nov-1969,PC,Percentage,0,Units,,,5.850,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
649,IRLT,"Long-term interest rates, Per cent per annum",AUS,Australia,M,Monthly,2023-08,Aug-2023,PC,Percentage,0,Units,,,4.128,,
650,IRLT,"Long-term interest rates, Per cent per annum",AUS,Australia,M,Monthly,2023-09,Sep-2023,PC,Percentage,0,Units,,,4.211,,
651,IRLT,"Long-term interest rates, Per cent per annum",AUS,Australia,M,Monthly,2023-10,Oct-2023,PC,Percentage,0,Units,,,4.626,,
652,IRLT,"Long-term interest rates, Per cent per annum",AUS,Australia,M,Monthly,2023-11,Nov-2023,PC,Percentage,0,Units,,,4.578,,
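The parameterised query can be wrapped in a small helper. The function below is a hypothetical sketch, not part of any library: `build_oecd_url` only assembles the URL string and makes no request. It assumes that several country codes can be joined with "+" and that the endpoint accepts a `startTime` parameter, as the pandasdmx key later in this notebook suggests:

```python
def build_oecd_url(database, indicator, countries, frequency, start_time=None):
    """Assemble the CSV download URL used by get_from_oecd (hypothetical helper)."""
    # Assumption: several country codes can be combined with "+".
    locations = "+".join(countries)
    query = f"{database}/{indicator}.{locations}.{frequency}/OECD"
    url = f"https://stats.oecd.org/SDMX-JSON/data/{query}?contentType=csv"
    if start_time is not None:
        # Assumption: the endpoint accepts a startTime query parameter.
        url += f"&startTime={start_time}"
    return url


url = build_oecd_url("MEI_FIN", "IRLT", ["AUS"], "M")
print(url)
```

The resulting string could then be passed to `pd.read_csv()` in the same way as inside `get_from_oecd`.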


## **Download the data using the pandasdmx library**

<font size="3">
As usual, the first thing that we need to do is to install and import the required libraries

In [14]:
# !pip install pandasdmx

In [15]:
import pandasdmx as pdmx

#Initialize with OECD data
oecd = pdmx.Request("OECD")
oecd

<class 'pandasdmx.api.Request'> instance, source:                OECD (Organisation for Economic Co-operation and Development)

<font size="3">
Once we have created an instance of this class, we can download the data by passing the resource_id and the key. <br><br>

In this case the resource_id is PDB_LV, which stands for the productivity resource. <br><br>
    
We want to download the indicator T_GDPEMP, which is GDP per employed worker.<br><br>
    
We can combine the countries for which we need the information using the "+" sign.<br><br>
    
At the end of the key we add the starting year of the time series. An example is shown below.

In [16]:
data = oecd.data(
    resource_id="PDB_LV",
    key="GBR+FRA+CAN+ITA+DEU+JPN+USA.T_GDPEMP./all?startTime=2010",
).to_pandas()

df3 = pd.DataFrame(data)
df3

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Unnamed: 3_level_0,value
LOCATION,SUBJECT,MEASURE,TIME_PERIOD,Unnamed: 4_level_1
CAN,T_GDPEMP,C,2010,96349.661702
CAN,T_GDPEMP,C,2011,100980.627090
CAN,T_GDPEMP,C,2012,102779.407460
CAN,T_GDPEMP,C,2013,105632.502160
CAN,T_GDPEMP,C,2014,110482.623300
...,...,...,...,...
USA,T_GDPEMP,PCTUS,2018,100.000000
USA,T_GDPEMP,PCTUS,2019,100.000000
USA,T_GDPEMP,PCTUS,2020,100.000000
USA,T_GDPEMP,PCTUS,2021,100.000000
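The result is in long format, with one row per country, measure and year. A minimal sketch of reshaping it into one column per country follows. It assumes a Series indexed like the output above, and the numbers in `toy` are placeholders, not OECD data:

```python
import pandas as pd

# Synthetic long Series with the same index levels as the output above.
index = pd.MultiIndex.from_product(
    [["CAN", "USA"], ["T_GDPEMP"], ["C", "PCTUS"], ["2010", "2011"]],
    names=["LOCATION", "SUBJECT", "MEASURE", "TIME_PERIOD"],
)
# Placeholder numbers, not OECD data.
toy = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], index=index, name="value")

# Keep one measure, drop the constant SUBJECT level, and move countries to columns.
wide = toy.xs("C", level="MEASURE").droplevel("SUBJECT").unstack("LOCATION")
print(wide)
```

For `df3`, which wraps the values in a "value" column, the same chain would start from `df3["value"]`.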

