# Tutorial

In this section we develop a brief example that shows how to use this package to obtain an ECI ranking, using data from [the Observatory of Economic Complexity](https://oec.world).

Our goal is to reproduce the ECI ranking using 2018 exports data, classified according to the Harmonized System (HS92) at a depth of 4 digits, for countries with a population of at least 1 million and exports of at least US\$1 billion, and products with world trade over US\$500 million.
For more details, see the [OEC methods page](https://oec.world/en/resources/methods).

Let us start by importing some packages, including the `economic_complexity` package (which we will alias as `ecplx`).

In [1]:
from urllib.parse import urlencode

import economic_complexity as ecplx
import numpy as np
import pandas as pd

We will use the following function to simplify the data request from the OEC's REST API.

In [3]:
def request_data(params):
    url = "https://dev.oec.world/olap-proxy/data.csv?{}".format(urlencode(params))
    return pd.read_csv(url)
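
To see what request this helper sends, the sketch below builds the same kind of URL without downloading anything. It reuses the base URL and the trade parameters from this tutorial; the names `BASE_URL` and `decoded` are only illustrative.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Same base URL as request_data, but nothing is downloaded here.
BASE_URL = "https://dev.oec.world/olap-proxy/data.csv"

params = {
    "cube": "trade_i_baci_a_92",
    "measures": "Trade Value",
    "drilldowns": "Exporter Country,HS4",
    "Year": "2016,2017,2018",
}

# urlencode turns spaces into "+" and percent-encodes commas as "%2C".
url = "{}?{}".format(BASE_URL, urlencode(params))
print(url)

# Decoding the query string recovers the original parameters.
decoded = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}
print(decoded == params)
```

Printing the URL this way can help when a request fails, since the same address can be opened in a browser to inspect the response.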

Now we fetch the trade data, summed over 2016 to 2018, and the 2018 population data.

In [4]:
# Sum of Exports by Country and HS92 Level 4 Code between 2016 and 2018
df_trade = request_data({
    "cube": "trade_i_baci_a_92",
    "measures": "Trade Value",
    "drilldowns": "Exporter Country,HS4",
    "Year": "2016,2017,2018",
})

# World Population by Country for Year 2018
df_wdi = request_data({
    "cube": "indicators_i_wdi_a",
    "measures": "Measure",
    "drilldowns": "Country",
    "Indicator": "SP.POP.TOTL",
    "Year": "2018",
})

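Next, filter the data so that only the countries and products meeting the thresholds above enter the calculation. Because the trade values are summed over three years, the trade thresholds are multiplied by 3.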

In [9]:
df = df_trade.copy()

# Countries with more than 1M inhabitants
df_population = df_wdi[df_wdi['Measure'] > 1000000]
# Products with more than $1.5B in global exports between 2016-2018
df_products = df.groupby('HS4 ID')['Trade Value'].sum().reset_index()
df_products = df_products[df_products['Trade Value'] > 3*500000000]
# Countries with more than $3B in global exports between 2016-2018
df_countries = df.groupby('Country ID')['Trade Value'].sum().reset_index()
df_countries = df_countries[df_countries['Trade Value'] > 3*1000000000]

df_filter = df[
    (df['Country ID'].isin(df_population['Country ID'])) &
    (df['Country ID'].isin(df_countries['Country ID'])) &
    (df['HS4 ID'].isin(df_products['HS4 ID']))
]
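
The filtering logic can be checked on a small table before running it on the full dataset. The following is a minimal sketch with made-up IDs, values, and thresholds (all hypothetical, and far smaller than the ones used in this tutorial). As in the cell above, totals are computed on the unfiltered table.

```python
import pandas as pd

# Hypothetical toy data; the IDs and values are invented for illustration.
toy = pd.DataFrame({
    "Country ID": ["aaa", "aaa", "bbb", "bbb", "ccc"],
    "HS4 ID": [101, 102, 101, 102, 101],
    "Trade Value": [50.0, 5.0, 40.0, 2.0, 1.0],
})

# Hypothetical thresholds, not the ones used in the tutorial.
MIN_PRODUCT_TRADE = 10.0
MIN_COUNTRY_TRADE = 20.0

# Total trade per product and per country, computed on the unfiltered table.
product_totals = toy.groupby("HS4 ID")["Trade Value"].sum()
country_totals = toy.groupby("Country ID")["Trade Value"].sum()

keep_products = product_totals[product_totals > MIN_PRODUCT_TRADE].index
keep_countries = country_totals[country_totals > MIN_COUNTRY_TRADE].index

# Keep only rows whose country and product both pass their threshold.
toy_filter = toy[
    toy["Country ID"].isin(keep_countries) & toy["HS4 ID"].isin(keep_products)
]
print(toy_filter)
```

Here product `102` and country `ccc` fall below their thresholds, so only the two rows for product `101` exported by `aaa` and `bbb` remain.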

Now let's pivot the data into a country-by-product matrix, the format needed to compute the RCA matrix and the ECI.

In [10]:
# Rows are countries, columns are products, cells hold the trade value.
df_pivot = pd.pivot_table(df_filter, index=['Country ID'],
                                     columns=['HS4 ID'],
                                     values='Trade Value')\
             .dropna(axis=1, how="all")\
             .fillna(0)\
             .astype(float)
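
To make the reshaping step concrete, here is a small sketch of the same pivot on a hypothetical three-row table (the IDs and values are invented). It shows how a country and product pair with no recorded trade ends up as `0` in the matrix.

```python
import pandas as pd

# Hypothetical long-format table: one row per (country, product) pair.
long_table = pd.DataFrame({
    "Country ID": ["aaa", "aaa", "bbb"],
    "HS4 ID": [101, 102, 101],
    "Trade Value": [50, 5, 40],
})

# Pivot to a country-by-product matrix; missing pairs become 0.
wide = (
    pd.pivot_table(long_table, index="Country ID", columns="HS4 ID", values="Trade Value")
    .fillna(0)
    .astype(float)
)
print(wide)
```

Country `bbb` has no row for product `102` in the long table, so that cell is filled with `0.0` in the wide matrix.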

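Compute the revealed comparative advantage (RCA) matrix, then the Economic Complexity Index (ECI) and the Product Complexity Index (PCI).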

In [11]:
rca = ecplx.rca(df_pivot)
ECI, PCI = ecplx.complexity(rca)
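
For intuition about what an RCA matrix contains, the sketch below computes the standard Balassa index by hand on a hypothetical 2x2 export matrix. This is an illustration of the textbook definition, not necessarily how the package implements `rca` internally; the helper name `balassa_rca` and the data are assumptions made for this example.

```python
import pandas as pd

# Hypothetical country-by-product export matrix (rows: countries, columns: products).
exports = pd.DataFrame(
    [[8.0, 2.0], [1.0, 9.0]],
    index=pd.Index(["aaa", "bbb"], name="Country ID"),
    columns=pd.Index([101, 102], name="HS4 ID"),
)


def balassa_rca(matrix):
    """Share of a product in a country's exports, divided by its share in world exports."""
    country_share = matrix.div(matrix.sum(axis=1), axis=0)
    world_share = matrix.sum(axis=0) / matrix.values.sum()
    return country_share.div(world_share, axis=1)


toy_rca = balassa_rca(exports)
print(toy_rca)

# Complexity calculations conventionally treat RCA >= 1 as "exports with advantage".
specialized = (toy_rca >= 1).astype(int)
print(specialized)
```

In this toy case, country `aaa` has an RCA above 1 only in product `101`, and `bbb` only in product `102`, so each country is specialized in exactly one product.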

Finally, sort the ECI values to create your ranking, and compare it with the ranking published on [oec.world](https://oec.world/en/rankings/eci/hs4/hs92).

In [12]:
ECI.sort_values(ascending=False)

Country ID
asjpn    2.383443
asxxb    2.219043
euche    2.084139
askor    1.998266
eudeu    1.977178
           ...   
afgin   -1.679709
ocpng   -1.769314
asirq   -1.947585
afssd   -2.208514
aftcd   -2.489747
Length: 147, dtype: float64
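
If explicit rank positions are useful for the comparison, the sorted values can be turned into a small table. The sketch below uses the first three ECI values from the output above, entered in shuffled order; the column name `Rank` is an arbitrary choice.

```python
import pandas as pd

# First three ECI values from the output above, deliberately shuffled.
eci = pd.Series(
    {"euche": 2.084139, "asjpn": 2.383443, "asxxb": 2.219043},
    name="ECI",
)
eci.index.name = "Country ID"

# Sort from highest to lowest ECI, then number the rows starting at 1.
ranking = eci.sort_values(ascending=False).to_frame()
ranking["Rank"] = range(1, len(ranking) + 1)
print(ranking)
```

The same two lines should work on the full `ECI` series, assuming it is a pandas Series indexed by country as in the output above.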