<a id="ID_top"></a>
## Working country selection

This workflow processes the raw input data and outputs the list of countries to be used in the other analyses.

The three-letter ISO code (ISO 3166-1 alpha-3, e.g. "GBR") is the standard key for linking the datasets together.
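
As a minimal sketch of what linking on the ISO code looks like in pandas, the example below joins two small tables on a shared `iso3` column. The table contents and the column names `iso3`, `country` and `value` are made up for illustration; only the convention of keying on the three-letter code comes from this notebook.

```python
import pandas as pd

# Hypothetical tables: names and values are illustrative only
countries = pd.DataFrame({
    "iso3": ["GBR", "FRA", "ABW"],
    "country": ["United Kingdom", "France", "Aruba"],
})
indicators = pd.DataFrame({
    "iso3": ["GBR", "ABW"],
    "value": [1.0, 2.0],
})

# Left join on the ISO code; countries without an indicator get NaN
linked = countries.merge(indicators, on="iso3", how="left", validate="one_to_one")
print(linked)
```

Passing `validate="one_to_one"` makes the merge raise an error if a code appears more than once in either table, which is a cheap guard against duplicated country rows.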

#### Load other scripts with
`%load script_filepaths.py`

#### Notebook sections:
    
|| [0|Top](#ID_top) || [1|Part1](#ID_part1) || [2|Part2](#ID_part2) || [3|Part3](#ID_part3) || [4|Part4](#ID_part4) || [5|Part5](#ID_part5) ||

In [7]:
#=== Packages
import pandas as pd
import numpy  as np
import os

<a id="ID_part1"></a>
### Part 1 | Load data
|| [0|Top](#ID_top) || [1|Part1](#ID_part1) || [2|Part2](#ID_part2) || [3|Part3](#ID_part3) || [4|Part4](#ID_part4) || [5|Part5](#ID_part5) ||

In [9]:
# %load script_filepaths.py
# This script allows one to load and correct raw files before saving them again.
file_path_0_raw       = "./0_raw/"
file_path_1_backup    = "./1_raw_processed_backup/"
file_path_2_input     = "./2_raw_processed_input/"
file_path_3_generated = "./3_generated_inputs/"

In [10]:
# list of all files
filenames = os.listdir(file_path_2_input)
print(filenames)

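# list of file names that can be read with the same rule (gzip-compressed CSVs);
# selected by extension because os.listdir() does not guarantee any ordering
file_to_batch_read = [name for name in filenames if name.endswith(".csv.gzip")]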

['2_raw_explainer_doc.md', 'input_dynamic_gravity.csv.gzip']


In [11]:
# load data
df_05_16 = pd.read_csv(f"{file_path_2_input}{file_to_batch_read[0]}", compression="gzip")

In [13]:
# preview
print(df_05_16.columns)
df_05_16.head()

Index(['Unnamed: 0', 'year', 'country_d', 'iso3_d', 'dynamic_code_d',
       'landlocked_d', 'island_d', 'region_d', 'gdp_pwt_const_d', 'pop_d',
       'gdp_pwt_cur_d', 'capital_cur_d', 'capital_const_d', 'gdp_wdi_cur_d',
       'gdp_wdi_const_d', 'gdp_wdi_cap_cur_d', 'gdp_wdi_cap_const_d', 'lat_d',
       'lng_d', 'polity_d', 'polity_abs_d', 'country_o', 'iso3_o',
       'dynamic_code_o', 'landlocked_o', 'island_o', 'region_o',
       'gdp_pwt_const_o', 'pop_o', 'gdp_pwt_cur_o', 'capital_cur_o',
       'capital_const_o', 'gdp_wdi_cur_o', 'gdp_wdi_const_o',
       'gdp_wdi_cap_cur_o', 'gdp_wdi_cap_const_o', 'lat_o', 'lng_o',
       'polity_o', 'polity_abs_o', 'contiguity', 'agree_pta_goods',
       'agree_pta_services', 'agree_cu', 'agree_eia', 'agree_fta', 'agree_psa',
       'agree_pta', 'sanction_threat', 'sanction_threat_trade',
       'sanction_imposition', 'sanction_imposition_trade', 'member_eu_o',
       'member_wto_o', 'member_gatt_o', 'member_eu_d', 'member_wto_d',
       'me

| | Unnamed: 0 | year | country_d | iso3_d | dynamic_code_d | landlocked_d | island_d | region_d | gdp_pwt_const_d | pop_d | ... | hostility_level_o | hostility_level_d | distance | common_language | colony_of_destination_after45 | colony_of_destination_current | colony_of_destination_ever | colony_of_origin_after45 | colony_of_origin_current | colony_of_origin_ever |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2005 | Aruba | ABW | ABW | 0 | 1 | caribbean | 3906.5203 | 0.100031 | ... | 0 | 0 | 120.05867 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 2006 | Aruba | ABW | ABW | 0 | 1 | caribbean | 4118.1396 | 0.10083 | ... | 0 | 0 | 978.77728 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 2 | 2007 | Aruba | ABW | ABW | 0 | 1 | caribbean | 4196.4634 | 0.101218 | ... | 0 | 0 | 8563.6963 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 3 | 2008 | Aruba | ABW | ABW | 0 | 1 | caribbean | 4433.6772 | 0.101342 | ... | 0 | 0 | 7562.6733 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 4 | 2009 | Aruba | ABW | ABW | 0 | 1 | caribbean | 4183.0449 | 0.101416 | ... | 0 | 0 | 16904.596 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
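
The `Unnamed: 0` column in the preview is the name pandas gives to a first column with no header, which typically appears when a CSV was written together with its index and read back without `index_col`. Whether that is how this input file was produced is an assumption; the sketch below only reproduces the effect on a toy frame and shows one way to avoid it.

```python
import io
import pandas as pd

# Toy frame standing in for the real data (values are illustrative only)
toy = pd.DataFrame({"year": [2005, 2006], "iso3_d": ["ABW", "ABW"]})

# to_csv() writes the index by default, so the first header cell is empty
csv_text = toy.to_csv()

# Read back naively: the saved index shows up as an "Unnamed: 0" column
naive = pd.read_csv(io.StringIO(csv_text))

# Read back with index_col=0: the saved index becomes the index again
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(naive.columns))
print(list(clean.columns))
```

Writing with `index=False` in the first place is the other common way to keep the extra column out of the file.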


<a id="ID_part2"></a>
### Part 2 | Generate ref country table
|| [0|Top](#ID_top) || [1|Part1](#ID_part1) || [2|Part2](#ID_part2) || [3|Part3](#ID_part3) || [4|Part4](#ID_part4) || [5|Part5](#ID_part5) ||

In [25]:
# loop through columns and collect the positions of all destination ('_d') columns
col_index_keep = [1]  # 1 is the year column

for index, column in enumerate(df_05_16.columns):
    # keep the column if its name ends with the '_d' suffix
    if column.endswith("_d"):
        col_index_keep.append(index)

In [28]:
# create unique reference table
df_master_country = df_05_16.iloc[:,col_index_keep].drop_duplicates()
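
The same selection can also be written by column name rather than by position, which avoids relying on `year` sitting at index 1. The sketch below uses a small made-up frame (its values are illustrative) and is an alternative formulation under that assumption, not the notebook's own code.

```python
import pandas as pd

# Toy bilateral frame: one destination (ABW) paired with two origins
toy = pd.DataFrame({
    "year": [2005, 2005, 2006],
    "iso3_d": ["ABW", "ABW", "ABW"],
    "pop_d": [0.1, 0.1, 0.2],
    "iso3_o": ["GBR", "FRA", "GBR"],
    "distance": [1.0, 2.0, 3.0],
})

# Keep the year plus every destination-side column, selected by name
keep = ["year"] + [col for col in toy.columns if col.endswith("_d")]

# Dropping duplicates collapses the repeated destination rows
ref = toy[keep].drop_duplicates().reset_index(drop=True)

# Sanity check: at most one row per (year, destination country)
has_duplicate_keys = ref.duplicated(["year", "iso3_d"]).any()
print(ref)
print(has_duplicate_keys)
```

If the sanity check ever reports duplicates on the real data, some destination attribute differs between rows of the same country and year, and the reference table would need further cleaning before it is used as a lookup.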

<a id="ID_part3"></a>
### Part 3
|| [0|Top](#ID_top) || [1|Part1](#ID_part1) || [2|Part2](#ID_part2) || [3|Part3](#ID_part3) || [4|Part4](#ID_part4) || [5|Part5](#ID_part5) ||

<a id="ID_part4"></a>
### Part 4 | Export master ref table
|| [0|Top](#ID_top) || [1|Part1](#ID_part1) || [2|Part2](#ID_part2) || [3|Part3](#ID_part3) || [4|Part4](#ID_part4) || [5|Part5](#ID_part5) ||

In [None]:
compression_type = "gzip"

In [29]:
# Master list export
file_name = "generated_master_country_ref_list.csv"
df_master_country.to_csv(file_path_3_generated+file_name+"."+compression_type,compression = compression_type)

In [None]:
# Sample / working sample list export
# NOTE: no sampling step is applied yet, so this writes the full master table
file_name = "generated_sample_countries.csv"
df_master_country.to_csv(file_path_3_generated+file_name+"."+compression_type,compression = compression_type)
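
A quick way to check an export like the ones above is to write to a temporary folder and read the file straight back. The sketch below does this on a toy frame; the file name `toy_country_ref_list.csv` is hypothetical, and `index=False` is a deviation from the export cells above, which write the index. Because pandas infers compression from common suffixes such as `.gz` and may not recognise `.gzip`, the example passes `compression` explicitly on both write and read.

```python
import os
import tempfile
import pandas as pd

# Toy frame standing in for df_master_country (values are illustrative only)
toy = pd.DataFrame({"year": [2005, 2006], "iso3_d": ["ABW", "ABW"]})
compression_type = "gzip"

with tempfile.TemporaryDirectory() as tmp_dir:
    # Same naming pattern as the notebook: <name>.csv.<compression_type>
    out_path = os.path.join(tmp_dir, "toy_country_ref_list.csv." + compression_type)

    # index=False keeps the row index out of the file
    toy.to_csv(out_path, compression=compression_type, index=False)

    # Read the file back while the temporary folder still exists
    restored = pd.read_csv(out_path, compression=compression_type)

print(restored)
```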

<a id="ID_part5"></a>
### Part 5
|| [0|Top](#ID_top) || [1|Part1](#ID_part1) || [2|Part2](#ID_part2) || [3|Part3](#ID_part3) || [4|Part4](#ID_part4) || [5|Part5](#ID_part5) ||