<a id="ID_top"></a>
## Spatial Interaction / Gravity Model

Create a very simple sample SIM / gravity module using two packages:

[PySAL spint package](https://github.com/pysal/spint), which includes some notebook examples: `pip install spint==1.0.6`<br>
[GME package](https://www.usitc.gov/data/gravity/gme_docs/), with some guides and one tutorial [here](https://www.usitc.gov/data/gravity/gme_docs/estimation_tutorial/): `pip install gme`

#### Notebook sections:
    
|| [0|Top](#ID_top) || [1|Part1](#ID_part1) || [2|Part2](#ID_part2) || [3|Part3](#ID_part3) || [4|Part4](#ID_part4) || [5|Part5](#ID_part5) ||

#### Import all packages that could be required

In [1]:
# %load s_package_import.py
# package library, use to ensure consistency across notebooks, refresh periodically
# general packages
import os # use with os.listdir(_path_)
import requests
import csv
import time
from datetime import datetime
from shutil import copyfile

# data analysis packages
import pandas as pd
pd.options.display.max_columns = None # don't truncate columns
#pd.options.display.max_rows = 50

import numpy as np
import matplotlib.pyplot as plt

# custom scripts
import s_file_export
import s_filepaths
import s_un_comtrade_extract as s_un

#=== network analysis
import networkx as nx
#=== gravity modelling
import gme


#### Import module and declare path variables
`import s_filepaths`

In [2]:
# import ref file
import s_filepaths

# declare local variables to work with
path_raw = s_filepaths.path_raw
path_raw_dl = s_filepaths.path_raw_dl
path_store = s_filepaths.path_store
path_live = s_filepaths.path_live

<a id="ID_part1"></a>
### Part 1 | GME tutorial
|| [0|Top](#ID_top) || [1|Part1](#ID_part1) || [2|Part2](#ID_part2) || [3|Part3](#ID_part3) || [4|Part4](#ID_part4) || [5|Part5](#ID_part5) ||

#### Import data | TRADE FLOW

In [20]:
os.listdir(path_live)

['input_test.csv.gzip',
 'input_un_com_2013.csv.gzip',
 'input_un_com_2012.csv.gzip',
 'input_un_com_2006-2009.csv.gzip',
 '.DS_Store',
 'input_un_codes_ref.csv.gzip',
 'input_un_com_2016-2019.csv.gzip',
 'input_un_com_2014.csv.gzip',
 'input_un_com_2015.csv.gzip',
 'input_bri_countries_Dumor_Yao.csv.gzip',
 '2_raw_explainer_doc.md',
 'input_dynamic_gravity.csv.gzip',
 'input_un_com_2010_merged_ref.csv.gzip',
 'input_gme_data_joined.csv.gzip',
 'input_un_sample.csv.gzip']

In [21]:
# set files (years) to load and concatenate them into one dataframe
file_name_list = [
    "input_un_com_2010_merged_ref.csv.gzip",
    "input_un_com_2012.csv.gzip",
    "input_un_com_2013.csv.gzip",
    "input_un_com_2014.csv.gzip",
    "input_un_com_2015.csv.gzip",
    "input_un_com_2016-2019.csv.gzip",
    "input_un_com_2006-2009.csv.gzip",
]
flow_df = []

# loop through files
for entry in file_name_list:
    df_flow = pd.read_csv(f"{path_live}{entry}",compression="gzip")
    flow_df.append(df_flow)
    
# merge into single dataframe
flow_df = pd.concat(flow_df)
flow_df.reset_index(drop = True,inplace= True)

# clean any unwanted columns (no error if the column is absent)
flow_df = flow_df.drop(columns="Unnamed: 0", errors="ignore")

print(len(flow_df))
flow_df.head()

380108


Unnamed: 0,rtCode,rt3ISO,rtTitle,ptCode,pt3ISO,ptTitle,period,rgDesc,yr,rgCode,cmdCode,TradeValue,periodDesc,pfCode,cmdDescE
0,8,ALB,Albania,0,WLD,World,2010,Import,2010,1,TOTAL,4602774967,2010,H3,All Commodities
1,8,ALB,Albania,0,WLD,World,2010,Export,2010,2,TOTAL,1549955724,2010,H3,All Commodities
2,8,ALB,Albania,0,WLD,World,2010,Re-Import,2010,4,TOTAL,26393,2010,H3,All Commodities
3,8,ALB,Albania,4,AFG,Afghanistan,2010,Import,2010,1,TOTAL,1862,2010,H3,All Commodities
4,8,ALB,Albania,4,AFG,Afghanistan,2010,Export,2010,2,TOTAL,1830,2010,H3,All Commodities


In [22]:
# check if there are entries where a country is both reporter and partner || Exemplified by:
#flow_df[(flow_df.pt3ISO == "CHN") & (flow_df.rt3ISO == "CHN")]

# flag culprit rows with a vectorised comparison instead of a row-by-row loop
same_country = flow_df["pt3ISO"] == flow_df["rt3ISO"]

# drop the flagged rows and rebuild the index
flow_df_ready = flow_df[~same_country].reset_index(drop=True)

print(len(flow_df_ready))
flow_df_ready.head()

379719


Unnamed: 0,rtCode,rt3ISO,rtTitle,ptCode,pt3ISO,ptTitle,period,rgDesc,yr,rgCode,cmdCode,TradeValue,periodDesc,pfCode,cmdDescE
0,8,ALB,Albania,0,WLD,World,2010,Import,2010,1,TOTAL,4602774967,2010,H3,All Commodities
1,8,ALB,Albania,0,WLD,World,2010,Export,2010,2,TOTAL,1549955724,2010,H3,All Commodities
2,8,ALB,Albania,0,WLD,World,2010,Re-Import,2010,4,TOTAL,26393,2010,H3,All Commodities
3,8,ALB,Albania,4,AFG,Afghanistan,2010,Import,2010,1,TOTAL,1862,2010,H3,All Commodities
4,8,ALB,Albania,4,AFG,Afghanistan,2010,Export,2010,2,TOTAL,1830,2010,H3,All Commodities


Now the flow dataframe is ready to be filtered for import/export and used in the gravity model.

#### Import data | GRAVITY EXPLANATORY DATASET

In [23]:
#os.listdir(path_live)

In [24]:
# load gravity dataset
file_name = "input_dynamic_gravity.csv.gzip"
grav_df = pd.read_csv(f"{path_live}{file_name}",compression="gzip")
grav_df.head()

Unnamed: 0.1,Unnamed: 0,year,country_d,iso3_d,dynamic_code_d,landlocked_d,island_d,region_d,gdp_pwt_const_d,pop_d,gdp_pwt_cur_d,capital_cur_d,capital_const_d,gdp_wdi_cur_d,gdp_wdi_const_d,gdp_wdi_cap_cur_d,gdp_wdi_cap_const_d,lat_d,lng_d,polity_d,polity_abs_d,country_o,iso3_o,dynamic_code_o,landlocked_o,island_o,region_o,gdp_pwt_const_o,pop_o,gdp_pwt_cur_o,capital_cur_o,capital_const_o,gdp_wdi_cur_o,gdp_wdi_const_o,gdp_wdi_cap_cur_o,gdp_wdi_cap_const_o,lat_o,lng_o,polity_o,polity_abs_o,contiguity,agree_pta_goods,agree_pta_services,agree_cu,agree_eia,agree_fta,agree_psa,agree_pta,sanction_threat,sanction_threat_trade,sanction_imposition,sanction_imposition_trade,member_eu_o,member_wto_o,member_gatt_o,member_eu_d,member_wto_d,member_gatt_d,member_eu_joint,member_wto_joint,member_gatt_joint,hostility_level_o,hostility_level_d,distance,common_language,colony_of_destination_after45,colony_of_destination_current,colony_of_destination_ever,colony_of_origin_after45,colony_of_origin_current,colony_of_origin_ever
0,0,2005,Aruba,ABW,ABW,0,1,caribbean,3906.5203,0.100031,4093.2434,23531.377,24173.982,2331006000.0,,23302.831988,,12.530384,-70.028992,,,Netherlands Antilles,ANT,ANT.X,0,0,caribbean,,,,,,,,,,12.250778,-69.301224,,,0,1,0,0,0,1,0,1,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,120.05867,1,0,0,0,0,0,0
1,1,2006,Aruba,ABW,ABW,0,1,caribbean,4118.1396,0.10083,4217.0669,25757.818,25396.307,2421475000.0,,24015.420612,,12.530384,-70.028992,,,Anguilla,AIA,AIA,0,1,caribbean,348.7688,0.012903,365.93643,2471.682,2342.796,,,,,18.217348,-63.057232,,,0,1,0,0,0,1,0,1,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,978.77728,1,0,0,0,0,0,0
2,2,2007,Aruba,ABW,ABW,0,1,caribbean,4196.4634,0.101218,4248.4707,27375.447,26631.465,2623726000.0,,25921.538234,,12.530384,-70.028992,,,Sao Tome and Principe,STP,STP,0,1,africa,391.01483,0.160064,392.44177,1101.736,3205.526,145827400.0,167044600.0,911.057012,1043.611485,0.989202,7.072665,,,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,8563.6963,0,0,0,0,0,0,0
3,3,2008,Aruba,ABW,ABW,0,1,caribbean,4433.6772,0.101342,4441.8828,28639.586,27871.596,2791961000.0,,27549.889422,,12.530384,-70.028992,,,Andorra,AND,AND,1,0,europe,,,,,,4001201000.0,3675947000.0,46734.268282,42935.277871,42.5,1.516486,,,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,7562.6733,0,0,0,0,0,0,0
4,4,2009,Aruba,ABW,ABW,0,1,caribbean,4183.0449,0.101416,4304.9224,29400.539,29122.635,2498933000.0,,24640.421244,,12.530384,-70.028992,,,Philippines,PHL,PHL,0,1,south_east_asia,458079.81,91.641881,460142.72,1420047.0,1624159.0,168334600000.0,185437700000.0,1836.87412,2023.503659,11.817977,122.77502,8.0,8.0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0,1,1,0,0,0,0,0,0,0,0,16904.596,1,0,0,0,0,0,0


In [8]:
#grav_df.describe()

In [34]:
# only keep key columns
columns_to_keep = [
    'year','country_o', 'iso3_o','country_d', 'iso3_d','distance', 
    "gdp_wdi_const_o", "gdp_wdi_const_d",'common_language',"contiguity",
    "agree_pta_goods","agree_cu","sanction_imposition"
                  ]
grav_df_mini = grav_df.loc[:,columns_to_keep].copy()
grav_df_mini.head()

Unnamed: 0,year,country_o,iso3_o,country_d,iso3_d,distance,gdp_wdi_const_o,gdp_wdi_const_d,common_language,contiguity,agree_pta_goods,agree_cu,sanction_imposition
0,2005,Netherlands Antilles,ANT,Aruba,ABW,120.05867,,,1,0,1,0,0.0
1,2006,Anguilla,AIA,Aruba,ABW,978.77728,,,1,0,1,0,0.0
2,2007,Sao Tome and Principe,STP,Aruba,ABW,8563.6963,167044600.0,,0,0,0,0,0.0
3,2008,Andorra,AND,Aruba,ABW,7562.6733,3675947000.0,,0,0,0,0,0.0
4,2009,Philippines,PHL,Aruba,ABW,16904.596,185437700000.0,,1,0,0,0,0.0


In [35]:
# inner join on year / origin / destination
gme_data = grav_df_mini.merge(
    flow_df_ready,
    left_on=["year", "iso3_o", "iso3_d"],
    right_on=["yr", "rt3ISO", "pt3ISO"],
)
# same join, but keep every flow row and flag its source to inspect unmatched flows
flow_df_unmatched = grav_df_mini.merge(
    flow_df_ready,
    left_on=["year", "iso3_o", "iso3_d"],
    right_on=["yr", "rt3ISO", "pt3ISO"],
    how="right",
    indicator=True,
)

print(len(gme_data))
gme_data.head()

293135


Unnamed: 0,year,country_o,iso3_o,country_d,iso3_d,distance,gdp_wdi_const_o,gdp_wdi_const_d,common_language,contiguity,agree_pta_goods,agree_cu,sanction_imposition,rtCode,rt3ISO,rtTitle,ptCode,pt3ISO,ptTitle,period,rgDesc,yr,rgCode,cmdCode,TradeValue,periodDesc,pfCode,cmdDescE
0,2009,Philippines,PHL,Aruba,ABW,16904.596,185437700000.0,,1,0,0,0,0.0,608,PHL,Philippines,533,ABW,Aruba,2009,Import,2009,1,TOTAL,72162,2009,H2,ALL COMMODITIES
1,2009,Philippines,PHL,Aruba,ABW,16904.596,185437700000.0,,1,0,0,0,0.0,608,PHL,Philippines,533,ABW,Aruba,2009,Export,2009,2,TOTAL,149587,2009,H2,ALL COMMODITIES
2,2009,Romania,ROU,Afghanistan,AFG,1883.9504,169350300000.0,14697330000.0,0,0,0,0,0.0,642,ROU,Romania,4,AFG,Afghanistan,2009,Import,2009,1,TOTAL,1688,2009,H3,All Commodities
3,2009,Romania,ROU,Afghanistan,AFG,1883.9504,169350300000.0,14697330000.0,0,0,0,0,0.0,642,ROU,Romania,4,AFG,Afghanistan,2009,Export,2009,2,TOTAL,15843818,2009,H3,All Commodities
4,2010,Denmark,DNK,Afghanistan,AFG,4835.0132,321993900000.0,15936800000.0,0,0,0,0,0.0,208,DNK,Denmark,4,AFG,Afghanistan,2010,Import,2010,1,TOTAL,5267969,2010,H3,All Commodities


In [36]:
s_file_export.f_df_export(gme_data,"gme_data_joined")

Export | ../Data/1_raw_processed_backup/store_gme_data_joined_20200624_1812.csv | COMPLETE
COPY   | ../Data/2_raw_processed_input/input_gme_data_joined.csv.gzip | COMPLETE


The rows from the flow data that do not join typically have a partner such as 'World', 'SCG' or a missing value (`nan`).
The code below lets you check.

In [12]:
#print(len(flow_df_unmatched[flow_df_unmatched._merge== "right_only"]))
#flow_df_unmatched[flow_df_unmatched._merge== "right_only"].pt3ISO.unique()
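
As a self-contained illustration of this check, the sketch below repeats the right join with `indicator=True` on two small made-up dataframes and lists the partner codes that found no match in the gravity data. The column names follow the notebook; the rows and values are placeholders, not taken from the real inputs.

```python
import pandas as pd

# toy gravity data: one row per year / origin / destination (made-up distances)
grav = pd.DataFrame({
    "year": [2010, 2010],
    "iso3_o": ["ALB", "ALB"],
    "iso3_d": ["AFG", "DZA"],
    "distance": [4300.0, 1500.0],
})

# toy flow data: includes the aggregate partner "WLD", which has no gravity row
flows = pd.DataFrame({
    "yr": [2010, 2010, 2010],
    "rt3ISO": ["ALB", "ALB", "ALB"],
    "pt3ISO": ["AFG", "WLD", "DZA"],
    "TradeValue": [10, 999, 20],
})

# a right join keeps every flow row; _merge records where each row came from
merged = grav.merge(
    flows,
    left_on=["year", "iso3_o", "iso3_d"],
    right_on=["yr", "rt3ISO", "pt3ISO"],
    how="right",
    indicator=True,
)

# flow rows without a gravity match are labelled "right_only"
unmatched = merged[merged["_merge"] == "right_only"]
unmatched_partners = sorted(unmatched["pt3ISO"].unique())
print(unmatched_partners)
```

Rows labelled `both` are the ones the inner join above would keep.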

#### Create a GME dataset

In [37]:
# choose which flow to analyse
gme_data_analyse = gme_data[gme_data.rgDesc == "Export"].copy()

gme_data_analyse.head()

Unnamed: 0,year,country_o,iso3_o,country_d,iso3_d,distance,gdp_wdi_const_o,gdp_wdi_const_d,common_language,contiguity,agree_pta_goods,agree_cu,sanction_imposition,rtCode,rt3ISO,rtTitle,ptCode,pt3ISO,ptTitle,period,rgDesc,yr,rgCode,cmdCode,TradeValue,periodDesc,pfCode,cmdDescE
1,2009,Philippines,PHL,Aruba,ABW,16904.596,185437700000.0,,1,0,0,0,0.0,608,PHL,Philippines,533,ABW,Aruba,2009,Export,2009,2,TOTAL,149587,2009,H2,ALL COMMODITIES
3,2009,Romania,ROU,Afghanistan,AFG,1883.9504,169350300000.0,14697330000.0,0,0,0,0,0.0,642,ROU,Romania,4,AFG,Afghanistan,2009,Export,2009,2,TOTAL,15843818,2009,H3,All Commodities
5,2010,Denmark,DNK,Afghanistan,AFG,4835.0132,321993900000.0,15936800000.0,0,0,0,0,0.0,208,DNK,Denmark,4,AFG,Afghanistan,2010,Export,2010,2,TOTAL,14255143,2010,H3,All Commodities
8,2014,Belgium,BEL,Afghanistan,AFG,5309.2632,500752500000.0,19990320000.0,0,0,0,0,,56,BEL,Belgium,4,AFG,Afghanistan,2014,Export,2014,2,TOTAL,139360508,2014,H4,All Commodities
10,2015,Jordan,JOR,Afghanistan,AFG,2971.0706,30196250000.0,20158380000.0,0,0,0,0,,400,JOR,Jordan,4,AFG,Afghanistan,2015,Export,2015,2,TOTAL,1470790,2015,H4,All Commodities


In [38]:
gme_load = gme.EstimationData(
    data_frame = gme_data_analyse,
    # column with importer/exporter ID
    imp_var_name = "pt3ISO",
    exp_var_name= "rt3ISO",
    # column with trade volumes
    trade_var_name = "TradeValue",
    # year column
    year_var_name= "yr"
    # can also have sector and notes objects
    )

gme_load

number of countries: 235 
number of exporters: 88 
number of importers: 235 
number of years: 10 
number of sectors: not_applicable 
dimensions: (133296, 28)

#### Working with a GME dataset

**Note:**
Not all of these are written out here, but the GME data object has built-in functions for descriptive or exploratory use. For example, `.countries_each_year()` displays the countries by year, while `.columns` and `.dtypes()` work in a similar way to native pandas.

In [39]:
# Info for each column
gme_load.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 133296 entries, 1 to 293134
Data columns (total 28 columns):
year                   133296 non-null int64
country_o              133296 non-null object
iso3_o                 133296 non-null object
country_d              133296 non-null object
iso3_d                 133296 non-null object
distance               133296 non-null float64
gdp_wdi_const_o        119458 non-null float64
gdp_wdi_const_d        107736 non-null float64
common_language        133296 non-null int64
contiguity             133296 non-null int64
agree_pta_goods        133296 non-null int64
agree_cu               133296 non-null int64
sanction_imposition    92229 non-null float64
rtCode                 133296 non-null int64
rt3ISO                 133296 non-null object
rtTitle                133296 non-null object
ptCode                 133296 non-null int64
pt3ISO                 133296 non-null object
ptTitle                133296 non-null object
period             

In [16]:
# Use to call the values of a specific column
#gme_load.data_frame["TradeValue"]

#### Creating and estimating a model
There are two main steps: (1) defining a model and (2) estimating it.

In [45]:
# simple baseline model, where trade value is dependent on certain variables

model_baseline = gme.EstimationModel(
    # the data object created above
    estimation_data=gme_load,
    # dependent (left-hand side) variable in the regression equation
    lhs_var="TradeValue",
    # explanatory (right-hand side) variables; these need to come from
    # the gravity dataset, not UN Comtrade
    rhs_var=["distance", "common_language", "gdp_wdi_const_o", "gdp_wdi_const_d",
             "contiguity", "agree_pta_goods", "agree_cu", "sanction_imposition"],
)
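
The baseline above enters distance and GDP in levels. Gravity specifications more commonly use their natural logs, so one possible variation is to add logged columns to the joined dataframe before building the `gme.EstimationData` object. The sketch below shows that preparation step on toy values; the helper `add_log_columns` and the `log_` prefix are illustrative assumptions, not part of GME.

```python
import numpy as np
import pandas as pd

def add_log_columns(df, columns):
    """Return a copy of df with a log_<name> column for each listed column."""
    out = df.copy()
    for col in columns:
        # zero, negative and missing values have no usable logarithm: keep them as NaN
        out[f"log_{col}"] = np.log(out[col].where(out[col] > 0))
    return out

# toy rows with the same column names as the joined dataset (values are placeholders)
toy = pd.DataFrame({
    "distance": [120.05867, 8563.6963, 0.0],
    "gdp_wdi_const_o": [1.85e11, np.nan, 3.2e11],
})

toy_logged = add_log_columns(toy, ["distance", "gdp_wdi_const_o"])
print(toy_logged)
```

The new column names would then replace the level columns in `rhs_var`. Rows with missing logs would be dropped in the same way as other missing values during estimation.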

In [46]:
estimate = model_baseline.estimate()

select specification variables: ['distance', 'common_language', 'gdp_wdi_const_o', 'gdp_wdi_const_d', 'contiguity', 'agree_pta_goods', 'agree_cu', 'sanction_imposition', 'TradeValue', 'pt3ISO', 'rt3ISO', 'yr'], Observations excluded by user: {'rows': 0, 'columns': 16}
drop_intratrade: no, Observations excluded by user: {'rows': 0, 'columns': 0}
drop_imp: none, Observations excluded by user: {'rows': 0, 'columns': 0}
drop_exp: none, Observations excluded by user: {'rows': 0, 'columns': 0}
keep_imp: all available, Observations excluded by user: {'rows': 0, 'columns': 0}
keep_exp: all available, Observations excluded by user: {'rows': 0, 'columns': 0}
drop_years: none, Observations excluded by user: {'rows': 0, 'columns': 0}
keep_years: all available, Observations excluded by user: {'rows': 0, 'columns': 0}
drop_missing: yes, Observations excluded by user: {'rows': 49575, 'columns': 0}
Estimation began at 06:15 PM  on Jun 24, 2020
Omitted Columns: []
Estimation completed at 06:15 PM  on J

In [47]:
results = estimate["all"]
results.summary()

0,1,2,3
Dep. Variable:,TradeValue,No. Iterations:,1000.0
Model:,GLM,Df Residuals:,83718.0
Model Family:,Poisson,Df Model:,2.0
Link Function:,log,Scale:,1.0
Method:,IRLS,Log-Likelihood:,-412680000000000.0
Covariance Type:,HC1,Deviance:,825370000000000.0
No. Observations:,83721,Pearson chi2:,1.89e+21

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
distance,0.0003,3.46e-05,9.284,0.000,0.000,0.000
common_language,1.2879,0.143,9.037,0.000,1.009,1.567
gdp_wdi_const_o,1.64e-12,4.41e-14,37.170,0.000,1.55e-12,1.73e-12
gdp_wdi_const_d,7.362e-13,2.64e-14,27.857,0.000,6.84e-13,7.88e-13
contiguity,1.5526,0.137,11.348,0.000,1.284,1.821
agree_pta_goods,9.2694,0.406,22.825,0.000,8.473,10.065
agree_cu,7.8626,0.349,22.558,0.000,7.179,8.546
sanction_imposition,1.7893,0.518,3.453,0.001,0.774,2.805
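
With a log link, the coefficient on a 0/1 indicator translates into a proportional change in expected trade of exp(coef) - 1. The sketch below applies that conversion to some of the indicator coefficients reported in the table above; the helper name `pct_effect` is hypothetical.

```python
import math

def pct_effect(coef):
    """Percent change in expected trade when a 0/1 indicator switches from 0 to 1."""
    return (math.exp(coef) - 1) * 100

# coefficients copied from the summary table above
indicator_coefs = {
    "common_language": 1.2879,
    "contiguity": 1.5526,
    "agree_pta_goods": 9.2694,
    "agree_cu": 7.8626,
}

effects = {name: pct_effect(coef) for name, coef in indicator_coefs.items()}
for name, effect in effects.items():
    print(f"{name}: {effect:,.0f}%")
```

Very large implied effects, such as the one for `agree_pta_goods`, may be a sign that the specification deserves a second look rather than a literal estimate of the policy effect.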


<a id="ID_part2"></a>
### Part 2
|| [0|Top](#ID_top) || [1|Part1](#ID_part1) || [2|Part2](#ID_part2) || [3|Part3](#ID_part3) || [4|Part4](#ID_part4) || [5|Part5](#ID_part5) ||

<a id="ID_part3"></a>
### Part 3
|| [0|Top](#ID_top) || [1|Part1](#ID_part1) || [2|Part2](#ID_part2) || [3|Part3](#ID_part3) || [4|Part4](#ID_part4) || [5|Part5](#ID_part5) ||

<a id="ID_part4"></a>
### Part 4
|| [0|Top](#ID_top) || [1|Part1](#ID_part1) || [2|Part2](#ID_part2) || [3|Part3](#ID_part3) || [4|Part4](#ID_part4) || [5|Part5](#ID_part5) ||

<a id="ID_part5"></a>
### Part 5
|| [0|Top](#ID_top) || [1|Part1](#ID_part1) || [2|Part2](#ID_part2) || [3|Part3](#ID_part3) || [4|Part4](#ID_part4) || [5|Part5](#ID_part5) ||