<a id="ID_top"></a>
## UNCOMTRADE API extractor

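This notebook downloads UN Comtrade trade data for each country on the BRI reference list. It first builds a reference table of UN numeric country codes, matches that table to the BRI countries by ISO3 code, then downloads, merges and exports the per-country data.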

#### load other scripts with 
`%load script_filepaths.py`

#### Notebook sections:
    
|| [0|Top](#ID_top) || [1|Part1](#ID_part1) || [2|Part2](#ID_part2) || [3|Part3](#ID_part3) || [4|Part4](#ID_part4) || [5|Part5](#ID_part5) ||

In [1]:
# %load script_un_comtrade_extract.py
# Packages just in case
#=== Packages
import os
import pandas as pd
import requests
import csv
import time

In [2]:
import script_un_comtrade_extract as un_ex
import script_filepaths as af_save

create_ref_doc = False

In [3]:
file_path_0_raw       = "./0_raw/"
file_path_1_backup    = "./1_raw_processed_backup/"
file_path_2_input     = "./2_raw_processed_input/"
file_path_3_generated = "./3_generated_inputs/"

<a id="ID_part1"></a>
### Set up fetching URL
|| [0|Top](#ID_top) || [1|Part1](#ID_part1) || [2|Part2](#ID_part2) || [3|Part3](#ID_part3) || [4|Part4](#ID_part4) || [5|Part5](#ID_part5) ||

Reference for the URL query parameters: https://comtrade.un.org/api/swagger/ui/index#!/Data/Data_GetData
<br>Reporter explanation: https://comtrade.un.org/data/doc/api/#reporters

Reporter IDs: https://comtrade.un.org/Data/cache/reporterAreas.json
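
As a rough illustration of how these parameters combine, the sketch below rebuilds one of the URLs that appears in this notebook's download log. `build_comtrade_url` is a hypothetical helper, not part of `script_un_comtrade_extract`, and the reading of the parameter names (`r` reporter, `p` partner, `freq` frequency, `ps` period, `cc` commodity code) is an assumption based on the links above.

```python
from urllib.parse import urlencode

# Base endpoint as it appears in this notebook's download log.
BASE_URL = "https://comtrade.un.org/api/get"


def build_comtrade_url(reporters, partners, years, freq="A", commodity="TOTAL"):
    """Hypothetical helper: assemble a query URL from lists of codes."""
    params = {
        "r": ",".join(reporters),   # reporter country codes
        "p": ",".join(partners),    # partner country codes, or "all"
        "freq": freq,               # "A" as seen in the logged URLs
        "ps": ",".join(years),      # period(s)
        "cc": commodity,            # commodity code, "TOTAL" in this notebook
    }
    # urlencode percent-encodes the joining commas as %2C
    return f"{BASE_URL}?{urlencode(params)}"


url = build_comtrade_url(["8"], ["all"], ["2011"])
print(url)
```

Passing several codes, for example `["8", "51"]`, yields `r=8%2C51`, the same `%2C` separator that the notebook stores in `url_comma`.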

In [4]:
if create_ref_doc:
    # "854", "300" || 842 is the United States, whose partner list should cover enough countries to fetch their UN numerical IDs
    un_extract = un_ex.f_un_comtrade_data(p_r_country = ["842"],p_p_country = ["all"])

    af_save.f_export(un_extract[0][0],"un_com_usa_ref")
else:
    print("Skipped")

Skipped


<a id="ID_part2"></a>
### Part 2 | Create reference document for UN IDs
|| [0|Top](#ID_top) || [1|Part1](#ID_part1) || [2|Part2](#ID_part2) || [3|Part3](#ID_part3) || [4|Part4](#ID_part4) || [5|Part5](#ID_part5) ||

In [5]:
if create_ref_doc:
    # list of all files
    filenames = os.listdir(file_path_2_input)
    print(filenames)

    # list of file names that can be read with same rule
    file_name = "input_un_com_usa_ref.csv.gzip"
    
    df_un_usa_ref = pd.read_csv(f"{file_path_2_input}{file_name}",compression= "gzip")
    
    useful_columns = ['pt3ISO','ptCode','ptTitle',
                      'pt3ISO2','ptCode2','ptTitle2']
    # a bare expression inside an if block displays nothing, so print explicitly
    print(df_un_usa_ref.columns)
else:
    print("Skipped")

Skipped


In [6]:
if create_ref_doc:
    # reference dataframe
    un_ref_data = df_un_usa_ref.loc[:,useful_columns].drop_duplicates()
    un_ref_data = un_ref_data.iloc[:,0:3].copy()

    af_save.f_export(un_ref_data,"un_codes_ref")
else:
    print("Skipped")

Skipped


<a id="ID_part3"></a>
### Match to BRI reference countries
|| [0|Top](#ID_top) || [1|Part1](#ID_part1) || [2|Part2](#ID_part2) || [3|Part3](#ID_part3) || [4|Part4](#ID_part4) || [5|Part5](#ID_part5) ||

In [7]:
# import UN country code reference list
un_ref_data = pd.read_csv(f"{file_path_2_input}input_un_codes_ref.csv.gzip", compression="gzip")

# drop the saved index column if it is present
un_ref_data = un_ref_data.drop(columns="Unnamed: 0", errors="ignore")

un_ref_data_clean = un_ref_data.reset_index(drop = True)
un_ref_data_clean.head()

Unnamed: 0,pt3ISO,ptCode,ptTitle
0,WLD,0,World
1,AFG,4,Afghanistan
2,DZA,12,Algeria
3,ATG,28,Antigua and Barbuda
4,AZE,31,Azerbaijan


In [8]:
# import BRI reference list
df_bri_list = pd.read_csv(f"{file_path_2_input}input_bri_countries_Dumor_Yao.csv.gzip", compression="gzip")
len(df_bri_list)

93

In [9]:
df_bri_matched = df_bri_list.merge(un_ref_data_clean,left_on = "iso_3",right_on = "pt3ISO")
df_bri_matched.head()

Unnamed: 0.1,Unnamed: 0,BRI_Country,iso_3,pt3ISO,ptCode,ptTitle
0,0,Albania,ALB,ALB,8,Albania
1,1,Armenia,ARM,ARM,51,Armenia
2,2,Austria,AUT,AUT,40,Austria
3,3,Azerbaijan,AZE,AZE,31,Azerbaijan
4,4,Bangladesh,BGD,BGD,50,Bangladesh
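
`merge` defaults to an inner join, so a BRI country whose `iso_3` code has no counterpart in the UN reference table would be dropped without any warning. A minimal sketch of one way to check for this is given below, using small toy frames in place of `df_bri_list` and `un_ref_data_clean`. The "Atlantis"/"XXX" row is made up purely to trigger a mismatch.

```python
import pandas as pd

# Toy stand-ins for df_bri_list and un_ref_data_clean.
# "Atlantis" / "XXX" is an invented row used only to show a failed match.
bri = pd.DataFrame({
    "BRI_Country": ["Albania", "Armenia", "Atlantis"],
    "iso_3": ["ALB", "ARM", "XXX"],
})
un_ref = pd.DataFrame({
    "pt3ISO": ["ALB", "ARM", "AUT"],
    "ptCode": [8, 51, 40],
    "ptTitle": ["Albania", "Armenia", "Austria"],
})

# A left join with indicator=True keeps every BRI row and labels its match status.
checked = bri.merge(un_ref, how="left", left_on="iso_3", right_on="pt3ISO", indicator=True)

# Rows marked "left_only" exist in the BRI list but not in the UN reference.
unmatched = checked.loc[checked["_merge"] == "left_only", "BRI_Country"].tolist()
print(unmatched)
```

An empty list would mean every BRI country found a UN code.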


<a id="ID_part4"></a>
### Loop through download
|| [0|Top](#ID_top) || [1|Part1](#ID_part1) || [2|Part2](#ID_part2) || [3|Part3](#ID_part3) || [4|Part4](#ID_part4) || [5|Part5](#ID_part5) ||

In [13]:
# URL settings
url_comma = "%2C"
url_add = "&"

extra_cc = "cc=TOTAL"

5.376344086021505

In [78]:
# for every BRI country download data

df_collection = []
length = len(df_bri_matched)
dl_year = "2011"

# pair each UN reporter code with its country name
country_pairs = zip(df_bri_matched.ptCode, df_bri_matched.BRI_Country)

for index, (entry, temp_entry_name) in enumerate(country_pairs):
    print(f"Working on | {temp_entry_name} | {index+1}/{length} ({round(((index+1)/length)*100)}%)")
    
    # run functions to extract
    un_extract = un_ex.f_un_comtrade_data(p_r_country = [str(entry)],p_p_country = ["all"],p_ps_years=[dl_year],p_extra = extra_cc)
    
    try:
        af_save.f_export(un_extract[0][0],f"un_com_{temp_entry_name}_{dl_year}_ref")
        df_collection.append(un_extract[0][0])
    except Exception:
        # keep the reporter code so a failed download can be retried later
        df_collection.append(("Missing",entry))

Working on | Albania | 1/93 (1)
WORKING ON | Country 8| URL https://comtrade.un.org/api/get?r=8&p=all&freq=A&ps=2011&cc=TOTAL
OBLIGATORY PAUSE
Working on | Armenia | 2/93 (2)
WORKING ON | Country 51| URL https://comtrade.un.org/api/get?r=51&p=all&freq=A&ps=2011&cc=TOTAL
OBLIGATORY PAUSE
Working on | Austria | 3/93 (3)
WORKING ON | Country 40| URL https://comtrade.un.org/api/get?r=40&p=all&freq=A&ps=2011&cc=TOTAL
DL ATTEMPT | Country 40| URL https://comtrade.un.org/api/get?r=40&p=all&freq=A&ps=2011&cc=TOTAL | not ok, data not processed further
OBLIGATORY PAUSE
Working on | Azerbaijan | 4/93 (4)
WORKING ON | Country 31| URL https://comtrade.un.org/api/get?r=31&p=all&freq=A&ps=2011&cc=TOTAL
OBLIGATORY PAUSE
Working on | Bangladesh | 5/93 (5)
WORKING ON | Country 50| URL https://comtrade.un.org/api/get?r=50&p=all&freq=A&ps=2011&cc=TOTAL
OBLIGATORY PAUSE


KeyboardInterrupt: 
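
The log above shows one request coming back "not ok" (Austria, code 40). One possible way to handle such cases is to retry with a pause between attempts. The sketch below is only an illustration: `fetch_with_retry` and `flaky_fetch` are hypothetical names, the real download logic lives in `script_un_comtrade_extract`, and the fetcher here is a fake that simulates two failures so that the example runs without network access.

```python
import time


def fetch_with_retry(fetch, url, attempts=3, pause_seconds=1.0, sleep=time.sleep):
    """Call fetch(url) up to `attempts` times, pausing after each failed try.

    Returns the fetch result, or None if every attempt raised an exception.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except Exception as error:
            print(f"Attempt {attempt}/{attempts} failed for {url}: {error}")
            if attempt < attempts:
                sleep(pause_seconds)
    return None


# Fake fetcher (assumption, for demonstration only): fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch(url):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated failure")
    return {"url": url, "status": "ok"}


result = fetch_with_retry(
    flaky_fetch,
    "https://comtrade.un.org/api/get?r=40&p=all&freq=A&ps=2011&cc=TOTAL",
    sleep=lambda seconds: None,  # skip real waiting in this demonstration
)
print(result)
```

Passing `sleep` as an argument keeps the pause replaceable, which makes the helper easy to try out without waiting.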

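In [21]:
# keep only successful downloads; failed ones are stored as ("Missing", code) tuples
df_un_com_master = pd.concat([df for df in df_collection if isinstance(df, pd.DataFrame)])

In [None]:
af_save.f_export(df_un_com_master,"un_com_master_ref")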

In [72]:
columns = [
    # Partner / reporter info (6)
    "rtCode","rt3ISO","rtTitle","ptCode","pt3ISO","ptTitle",
    # period and trade category and value information (3)
    "period","rgDesc","yr",
    # duplicate info? (6)
    "rgCode","cmdCode","TradeValue","periodDesc","pfCode","cmdDescE",
]

df_un_com_focused = df_un_com_master.loc[:,columns]
df_un_com_focused.head()

Unnamed: 0,rtCode,rt3ISO,rtTitle,ptCode,pt3ISO,ptTitle,period,rgDesc,yr,rgCode,cmdCode,TradeValue,periodDesc,pfCode,cmdDescE
0,8,ALB,Albania,0,WLD,World,2010,Import,2010,1,TOTAL,4602774967,2010,H3,All Commodities
1,8,ALB,Albania,0,WLD,World,2010,Export,2010,2,TOTAL,1549955724,2010,H3,All Commodities
2,8,ALB,Albania,0,WLD,World,2010,Re-Import,2010,4,TOTAL,26393,2010,H3,All Commodities
3,8,ALB,Albania,4,AFG,Afghanistan,2010,Import,2010,1,TOTAL,1862,2010,H3,All Commodities
4,8,ALB,Albania,4,AFG,Afghanistan,2010,Export,2010,2,TOTAL,1830,2010,H3,All Commodities
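
The "duplicate info?" question in the column list can be checked directly. The sketch below copies the relevant columns of the five rows shown above into a small frame, then tests whether `period` repeats `yr` and whether each `rgDesc` label maps to exactly one `rgCode`. A result on five rows says nothing certain about the full data set, so the same checks would need to be run on `df_un_com_focused` itself.

```python
import pandas as pd

# Values copied from the five rows displayed above.
sample = pd.DataFrame({
    "period": [2010, 2010, 2010, 2010, 2010],
    "yr": [2010, 2010, 2010, 2010, 2010],
    "rgCode": [1, 2, 4, 1, 2],
    "rgDesc": ["Import", "Export", "Re-Import", "Import", "Export"],
})

# True if the two period columns carry identical values in every row.
period_matches_year = bool((sample["period"] == sample["yr"]).all())

# Count distinct codes per description; all ones means a one-to-one mapping.
codes_per_desc = sample.groupby("rgDesc")["rgCode"].nunique()
one_to_one = bool((codes_per_desc == 1).all())

print(period_matches_year, one_to_one)
```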


In [76]:
# save year specific data frame
af_save.f_export(df_un_com_focused,f"un_com_{df_un_com_focused.yr.unique()[0]}_merged_ref")

<a id="ID_part5"></a>
### Part 5
|| [0|Top](#ID_top) || [1|Part1](#ID_part1) || [2|Part2](#ID_part2) || [3|Part3](#ID_part3) || [4|Part4](#ID_part4) || [5|Part5](#ID_part5) ||