<a id="ID_top"></a>
## Spatial Interaction / Gravity Model

Build a very simple sample spatial interaction model (SIM) / gravity model using two packages:

[PySAL `spint` package](https://github.com/pysal/spint), which includes some notebook examples: `pip install spint==1.0.6`<br>
[GME package](https://www.usitc.gov/data/gravity/gme_docs/), which has some guides and one tutorial [here](https://www.usitc.gov/data/gravity/gme_docs/estimation_tutorial/): `pip install gme`

#### Load other scripts with
`%load script_filepaths.py`

#### Notebook sections:
    
|| [0|Top](#ID_top) || [1|Part1](#ID_part1) || [2|Part2](#ID_part2) || [3|Part3](#ID_part3) || [4|Part4](#ID_part4) || [5|Part5](#ID_part5) ||

In [18]:
#=== Packages
import gme
import pandas as pd
import numpy as np

# some package settings
pd.options.display.max_columns = None # don't truncate columns

In [5]:
# %load script_filepaths.py
# This script allows one to load and correct raw files before saving them again.
file_path_0_raw       = "./0_raw/"
file_path_1_backup    = "./1_raw_processed_backup/"
file_path_2_input     = "./2_raw_processed_input/"
file_path_3_generated = "./3_generated_inputs/"

<a id="ID_part1"></a>
### Part 1 | GME tutorial
|| [0|Top](#ID_top) || [1|Part1](#ID_part1) || [2|Part2](#ID_part2) || [3|Part3](#ID_part3) || [4|Part4](#ID_part4) || [5|Part5](#ID_part5) ||

#### Import data | TRADE FLOW

In [115]:
# input file name
file_name = "input_un_sample.csv.gzip"

# load in the data
gravity_data = pd.read_csv(f"{file_path_2_input}{file_name}",compression="gzip")

# view data
print(gravity_data.columns)
gravity_data.head()

Index(['Unnamed: 0', 'Classification', 'Year', 'Period', 'Period Desc.',
       'Aggregate Level', 'Is Leaf Code', 'Trade Flow Code', 'Trade Flow',
       'Reporter Code', 'Reporter', 'Reporter ISO', 'Partner Code', 'Partner',
       'Partner ISO', '2nd Partner Code', '2nd Partner', '2nd Partner ISO',
       'Customs Proc. Code', 'Customs', 'Mode of Transport Code',
       'Mode of Transport', 'Commodity Code', 'Commodity', 'Qty Unit Code',
       'Qty Unit', 'Qty', 'Alt Qty Unit Code', 'Alt Qty Unit', 'Alt Qty',
       'Netweight (kg)', 'Gross weight (kg)', 'Trade Value (US$)',
       'CIF Trade Value (US$)', 'FOB Trade Value (US$)', 'Flag'],
      dtype='object')


Unnamed: 0.1,Unnamed: 0,Classification,Year,Period,Period Desc.,Aggregate Level,Is Leaf Code,Trade Flow Code,Trade Flow,Reporter Code,Reporter,Reporter ISO,Partner Code,Partner,Partner ISO,2nd Partner Code,2nd Partner,2nd Partner ISO,Customs Proc. Code,Customs,Mode of Transport Code,Mode of Transport,Commodity Code,Commodity,Qty Unit Code,Qty Unit,Qty,Alt Qty Unit Code,Alt Qty Unit,Alt Qty,Netweight (kg),Gross weight (kg),Trade Value (US$),CIF Trade Value (US$),FOB Trade Value (US$),Flag
0,0,H5,2018,2018,2018,0,0,1,Import,156,China,CHN,156,China,CHN,,,,,,,,TOTAL,All Commodities,1,No Quantity,0.0,,,,0.0,,146381811975,,,4
1,1,H5,2018,2018,2018,0,0,4,Re-Import,156,China,CHN,156,China,CHN,,,,,,,,TOTAL,All Commodities,1,No Quantity,0.0,,,,0.0,,146381811975,,,4
2,2,H5,2018,2018,2018,0,0,1,Import,156,China,CHN,276,Germany,DEU,,,,,,,,TOTAL,All Commodities,1,No Quantity,0.0,,,,0.0,,106257241330,,,4
3,3,H5,2018,2018,2018,0,0,2,Export,156,China,CHN,276,Germany,DEU,,,,,,,,TOTAL,All Commodities,1,No Quantity,0.0,,,,0.0,,77908711119,,,4
4,4,H5,2018,2018,2018,0,0,1,Import,156,China,CHN,392,Japan,JPN,,,,,,,,TOTAL,All Commodities,1,No Quantity,0.0,,,,0.0,,180401786146,,,4
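
An `Unnamed: 0` column like the one in the output above is what pandas typically produces when a CSV was saved together with its index and then read back without `index_col`. Whether that is what happened to this input file is an assumption. A minimal sketch on a made-up toy frame (the in-memory buffer stands in for a file on disk):

```python
import io
import pandas as pd

# hypothetical toy data, only the column names follow the trade file
toy = pd.DataFrame({"Reporter ISO": ["CHN", "DEU"], "Trade Value (US$)": [1, 2]})

# saving with the default index=True writes the index as an unnamed first column
buffer = io.StringIO()
toy.to_csv(buffer)

# a plain read turns that column into "Unnamed: 0"
buffer.seek(0)
plain = pd.read_csv(buffer)

# index_col=0 restores it as the index instead
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)

print(plain.columns.tolist())
print(restored.columns.tolist())
```

Saving with `index=False` in the first place would avoid the extra column altogether.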


In [116]:
# Drop rows where a country reports trade with itself (reporter == partner)
gravity_data = gravity_data[gravity_data["Reporter ISO"] != gravity_data["Partner ISO"]].copy()

In [117]:
# drop any columns that contain only NaN values
gravity_data.dropna(axis = "columns",how = "all",inplace = True)

# isolate imports only
gravity_data = gravity_data[gravity_data["Trade Flow"] == "Import"].copy()
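
The cleaning steps above (dropping self-reported rows, dropping all-NaN columns, keeping imports) can be checked on a toy frame. This is a minimal sketch: the `toy` DataFrame and its values are made up for illustration and only reuse column names from the trade data.

```python
import numpy as np
import pandas as pd

# hypothetical toy data that only mimics the layout of the trade file
toy = pd.DataFrame({
    "Reporter ISO": ["CHN", "CHN", "CHN", "DEU"],
    "Partner ISO":  ["CHN", "DEU", "DEU", "CHN"],
    "Trade Flow":   ["Import", "Import", "Export", "Import"],
    "Qty":          [np.nan, np.nan, np.nan, np.nan],  # entirely empty column
    "Trade Value (US$)": [10, 20, 30, 40],
})

# 1) drop rows where reporter and partner are the same country
toy_clean = toy[toy["Reporter ISO"] != toy["Partner ISO"]]
# 2) drop columns that contain only NaN values
toy_clean = toy_clean.dropna(axis="columns", how="all")
# 3) keep import flows only
toy_clean = toy_clean[toy_clean["Trade Flow"] == "Import"].copy()

print(toy_clean)
```

Boolean masks keep the original index labels, so the surviving rows can still be traced back to the raw file.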

#### Import data | GRAVITY EXPLANATORY DATASET

In [72]:
# input file name
file_name = "input_dynamic_gravity.csv.gzip"

# load in the data
meta_data = pd.read_csv(f"{file_path_2_input}{file_name}",compression="gzip")

# view data
print(meta_data.columns)
meta_data.head(5)

Index(['Unnamed: 0', 'year', 'country_d', 'iso3_d', 'dynamic_code_d',
       'landlocked_d', 'island_d', 'region_d', 'gdp_pwt_const_d', 'pop_d',
       'gdp_pwt_cur_d', 'capital_cur_d', 'capital_const_d', 'gdp_wdi_cur_d',
       'gdp_wdi_const_d', 'gdp_wdi_cap_cur_d', 'gdp_wdi_cap_const_d', 'lat_d',
       'lng_d', 'polity_d', 'polity_abs_d', 'country_o', 'iso3_o',
       'dynamic_code_o', 'landlocked_o', 'island_o', 'region_o',
       'gdp_pwt_const_o', 'pop_o', 'gdp_pwt_cur_o', 'capital_cur_o',
       'capital_const_o', 'gdp_wdi_cur_o', 'gdp_wdi_const_o',
       'gdp_wdi_cap_cur_o', 'gdp_wdi_cap_const_o', 'lat_o', 'lng_o',
       'polity_o', 'polity_abs_o', 'contiguity', 'agree_pta_goods',
       'agree_pta_services', 'agree_cu', 'agree_eia', 'agree_fta', 'agree_psa',
       'agree_pta', 'sanction_threat', 'sanction_threat_trade',
       'sanction_imposition', 'sanction_imposition_trade', 'member_eu_o',
       'member_wto_o', 'member_gatt_o', 'member_eu_d', 'member_wto_d',
       'me

Unnamed: 0.1,Unnamed: 0,year,country_d,iso3_d,dynamic_code_d,landlocked_d,island_d,region_d,gdp_pwt_const_d,pop_d,gdp_pwt_cur_d,capital_cur_d,capital_const_d,gdp_wdi_cur_d,gdp_wdi_const_d,gdp_wdi_cap_cur_d,gdp_wdi_cap_const_d,lat_d,lng_d,polity_d,polity_abs_d,country_o,iso3_o,dynamic_code_o,landlocked_o,island_o,region_o,gdp_pwt_const_o,pop_o,gdp_pwt_cur_o,capital_cur_o,capital_const_o,gdp_wdi_cur_o,gdp_wdi_const_o,gdp_wdi_cap_cur_o,gdp_wdi_cap_const_o,lat_o,lng_o,polity_o,polity_abs_o,contiguity,agree_pta_goods,agree_pta_services,agree_cu,agree_eia,agree_fta,agree_psa,agree_pta,sanction_threat,sanction_threat_trade,sanction_imposition,sanction_imposition_trade,member_eu_o,member_wto_o,member_gatt_o,member_eu_d,member_wto_d,member_gatt_d,member_eu_joint,member_wto_joint,member_gatt_joint,hostility_level_o,hostility_level_d,distance,common_language,colony_of_destination_after45,colony_of_destination_current,colony_of_destination_ever,colony_of_origin_after45,colony_of_origin_current,colony_of_origin_ever
0,0,2005,Aruba,ABW,ABW,0,1,caribbean,3906.5203,0.100031,4093.2434,23531.377,24173.982,2331006000.0,,23302.831988,,12.530384,-70.028992,,,Netherlands Antilles,ANT,ANT.X,0,0,caribbean,,,,,,,,,,12.250778,-69.301224,,,0,1,0,0,0,1,0,1,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,120.05867,1,0,0,0,0,0,0
1,1,2006,Aruba,ABW,ABW,0,1,caribbean,4118.1396,0.10083,4217.0669,25757.818,25396.307,2421475000.0,,24015.420612,,12.530384,-70.028992,,,Anguilla,AIA,AIA,0,1,caribbean,348.7688,0.012903,365.93643,2471.682,2342.796,,,,,18.217348,-63.057232,,,0,1,0,0,0,1,0,1,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,978.77728,1,0,0,0,0,0,0
2,2,2007,Aruba,ABW,ABW,0,1,caribbean,4196.4634,0.101218,4248.4707,27375.447,26631.465,2623726000.0,,25921.538234,,12.530384,-70.028992,,,Sao Tome and Principe,STP,STP,0,1,africa,391.01483,0.160064,392.44177,1101.736,3205.526,145827400.0,167044600.0,911.057012,1043.611485,0.989202,7.072665,,,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,8563.6963,0,0,0,0,0,0,0
3,3,2008,Aruba,ABW,ABW,0,1,caribbean,4433.6772,0.101342,4441.8828,28639.586,27871.596,2791961000.0,,27549.889422,,12.530384,-70.028992,,,Andorra,AND,AND,1,0,europe,,,,,,4001201000.0,3675947000.0,46734.268282,42935.277871,42.5,1.516486,,,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0,0,0,0,0,0,0,0,0,0,0,7562.6733,0,0,0,0,0,0,0
4,4,2009,Aruba,ABW,ABW,0,1,caribbean,4183.0449,0.101416,4304.9224,29400.539,29122.635,2498933000.0,,24640.421244,,12.530384,-70.028992,,,Philippines,PHL,PHL,0,1,south_east_asia,458079.81,91.641881,460142.72,1420047.0,1624159.0,168334600000.0,185437700000.0,1836.87412,2023.503659,11.817977,122.77502,8.0,8.0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,0,1,1,0,0,0,0,0,0,0,0,16904.596,1,0,0,0,0,0,0


In [87]:
# filter by year (two arbitrarily chosen years)
meta_data_filtered = meta_data[meta_data.year.isin([2015,2010])].copy()
# keep only destinations that appear as reporters in the trade data
meta_data_filtered = meta_data_filtered[meta_data_filtered.iso3_d.isin(gravity_data["Reporter ISO"].unique())].copy()
# keep only origins that appear as reporters in the trade data
meta_data_filtered = meta_data_filtered[meta_data_filtered.iso3_o.isin(gravity_data["Reporter ISO"].unique())].copy()

# eliminate self country links
meta_data_filtered = meta_data_filtered[~(meta_data_filtered.iso3_d == meta_data_filtered.iso3_o)]

# number of remaining country-pair rows
print(len(meta_data_filtered))

40
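
The row count can be sanity-checked with a little arithmetic: with n countries, each country pairs with n - 1 partners, giving n(n - 1) directed pairs per year. Assuming the five countries that appear in the outputs of this notebook and the two selected years, that is 5 × 4 × 2 = 40 rows. A minimal sketch (the `countries` list is typed in by hand from those outputs):

```python
from itertools import permutations

# ISO codes seen in the outputs of this notebook
countries = ["CHN", "DEU", "GBR", "JPN", "USA"]
years = [2010, 2015]

# every ordered (origin, destination) pair, excluding self links
pairs = list(permutations(countries, 2))
expected_rows = len(pairs) * len(years)

print(expected_rows)
```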

In [148]:
# slim column selection
columns_to_keep = [
    'year', 'country_d', 'iso3_d','distance', 'common_language',
    'country_o', 'iso3_o',"gdp_wdi_const_d","gdp_wdi_const_o"
                  ]

meta_data_filtered_slim = meta_data_filtered.loc[:,columns_to_keep].copy()
print(len(meta_data_filtered_slim))
meta_data_filtered_slim

40


Unnamed: 0,year,country_d,iso3_d,distance,common_language,country_o,iso3_o,gdp_wdi_const_d,gdp_wdi_const_o
132652,2010,China,CHN,11454.236,0,United States,USA,6100620000000.0,14964370000000.0
132674,2010,China,CHN,8634.542,0,United Kingdom,GBR,6100620000000.0,2429603000000.0
132713,2010,China,CHN,2236.3628,0,Japan,JPN,6100620000000.0,5700096000000.0
132750,2010,China,CHN,8159.2344,0,Germany,DEU,6100620000000.0,3417298000000.0
133890,2015,China,CHN,8159.2344,0,Germany,DEU,8909812000000.0,3696833000000.0
133983,2015,China,CHN,11454.236,0,United States,USA,8909812000000.0,16597450000000.0
134052,2015,China,CHN,8634.542,0,United Kingdom,GBR,8909812000000.0,2682177000000.0
134084,2015,China,CHN,2236.3628,0,Japan,JPN,8909812000000.0,5986138000000.0
179054,2010,Germany,DEU,7593.9678,0,United States,USA,3417298000000.0,14964370000000.0
179133,2010,Germany,DEU,782.87354,0,United Kingdom,GBR,3417298000000.0,2429603000000.0


#### Join data

In [153]:
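# [TEMP] hack years: the trade data has periods 2018 and 2015, while the gravity
# data above was filtered to 2015 and 2010, so relabel Year to make the join keys match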
temp_year_index_max = list(gravity_data[gravity_data.Period == 2018].index)
temp_year_index_min = list(gravity_data[gravity_data.Period == 2015].index)

gravity_data.loc[temp_year_index_max,"Year"] = 2015
gravity_data.loc[temp_year_index_min,"Year"] = 2010

columns_to_keep = ["Year","Period",'Reporter', 'Reporter ISO','Partner','Partner ISO','Trade Value (US$)']
gravity_data_slim = gravity_data.loc[:,columns_to_keep].copy()

print(len(gravity_data_slim))
gravity_data_slim.head()

40


Unnamed: 0,Year,Period,Reporter,Reporter ISO,Partner,Partner ISO,Trade Value (US$)
2,2015,2018,China,CHN,Germany,DEU,106257241330
4,2015,2018,China,CHN,Japan,JPN,180401786146
6,2015,2018,China,CHN,United Kingdom,GBR,23893335363
8,2015,2018,China,CHN,USA,USA,156004352076
10,2015,2018,Germany,DEU,China,CHN,126750945327
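
The relabelling above can also be written as a single lookup on `Period`, which keeps the mapping in one place. A minimal sketch on a toy frame (the `year_lookup` name and the toy values are illustrative assumptions, not part of the original notebook):

```python
import pandas as pd

# hypothetical toy rows with the two periods found in the trade data
toy = pd.DataFrame({"Period": [2018, 2015, 2018], "Year": [2018, 2015, 2018]})

# map each trade period onto the gravity-data year it should be joined with
year_lookup = {2018: 2015, 2015: 2010}
toy["Year"] = toy["Period"].map(year_lookup)

print(toy)
```

Any period missing from the lookup would come back as NaN, which makes an unexpected year easy to spot.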


In [154]:
# Join on year, iso3_o, iso3_d
df_grav_join = gravity_data_slim.merge(meta_data_filtered_slim,
                                       # year / origin / destination
                                       # note: the rows are imports, so the reporter (the importer) is matched to iso3_o here
                                      left_on  = ["Year","Reporter ISO","Partner ISO"],
                                      right_on = ["year","iso3_o","iso3_d"])

df_grav_join.head()

Unnamed: 0,Year,Period,Reporter,Reporter ISO,Partner,Partner ISO,Trade Value (US$),year,country_d,iso3_d,distance,common_language,country_o,iso3_o,gdp_wdi_const_d,gdp_wdi_const_o
0,2015,2018,China,CHN,Germany,DEU,106257241330,2015,Germany,DEU,8159.2344,0,China,CHN,3696833000000.0,8909812000000.0
1,2015,2018,China,CHN,Japan,JPN,180401786146,2015,Japan,JPN,2236.3628,0,China,CHN,5986138000000.0,8909812000000.0
2,2015,2018,China,CHN,United Kingdom,GBR,23893335363,2015,United Kingdom,GBR,8634.542,0,China,CHN,2682177000000.0,8909812000000.0
3,2015,2018,China,CHN,USA,USA,156004352076,2015,United States,USA,11454.236,0,China,CHN,16597450000000.0,8909812000000.0
4,2015,2018,Germany,DEU,China,CHN,126750945327,2015,China,CHN,8159.2344,0,Germany,DEU,8909812000000.0,3696833000000.0
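
An inner join silently drops rows that find no match, so it can be worth checking how many trade rows actually found gravity data. A minimal sketch using the `indicator` and `validate` options of `DataFrame.merge` on toy frames (the frames and values are made up; only the key column names follow the notebook):

```python
import pandas as pd

# hypothetical trade rows; the last one has no counterpart in the gravity frame
trade = pd.DataFrame({
    "Year": [2015, 2015, 2010],
    "Reporter ISO": ["CHN", "CHN", "DEU"],
    "Partner ISO": ["DEU", "JPN", "FRA"],
    "Trade Value (US$)": [100, 200, 300],
})
gravity = pd.DataFrame({
    "year": [2015, 2015],
    "iso3_o": ["CHN", "CHN"],
    "iso3_d": ["DEU", "JPN"],
    "distance": [8159.2344, 2236.3628],
})

# how="left" keeps every trade row; indicator adds a "_merge" column;
# validate raises if a key combination is duplicated on either side
joined = trade.merge(
    gravity,
    how="left",
    left_on=["Year", "Reporter ISO", "Partner ISO"],
    right_on=["year", "iso3_o", "iso3_d"],
    indicator=True,
    validate="one_to_one",
)

# rows flagged "left_only" had no gravity data for that year and pair
unmatched = joined[joined["_merge"] == "left_only"]
print(unmatched[["Year", "Reporter ISO", "Partner ISO"]])
```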


#### Create a GME dataset

In [155]:
gme_data = gme.EstimationData(
    data_frame = df_grav_join,
    # column with importer/exporter ID
    imp_var_name = "Reporter ISO",
    exp_var_name= "Partner ISO",
    # column with trade volumes
    trade_var_name = "Trade Value (US$)",
    # year column
    year_var_name= "Year"
    # can also have sector and notes objects
    )

#### Working with a GME dataset

In [156]:
# calling the object prints a short summary
gme_data

number of countries: 5 
number of exporters: 5 
number of importers: 5 
number of years: 2 
number of sectors: not_applicable 
dimensions: (40, 16)

**Note:**
Not every helper is demonstrated here, but the GME data object has built-in functions for descriptive or exploratory use. For example, `.countries_each_year()` displays the countries by year, and `.columns` and `.dtypes()` work in a similar way to native pandas.

In [157]:
# Info for each column
gme_data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 40 entries, 0 to 39
Data columns (total 16 columns):
Year                 40 non-null int64
Period               40 non-null int64
Reporter             40 non-null object
Reporter ISO         40 non-null object
Partner              40 non-null object
Partner ISO          40 non-null object
Trade Value (US$)    40 non-null int64
year                 40 non-null int64
country_d            40 non-null object
iso3_d               40 non-null object
distance             40 non-null float64
common_language      40 non-null int64
country_o            40 non-null object
iso3_o               40 non-null object
gdp_wdi_const_d      40 non-null float64
gdp_wdi_const_o      40 non-null float64
dtypes: float64(3), int64(5), object(8)
memory usage: 5.3+ KB


In [37]:
# Access the underlying DataFrame to pull the values of a specific column
#gme_data.data_frame["Reporter ISO"]

#### Creating and estimating a model
Two main steps: (1) defining a model and (2) estimating the model.

In [158]:
# simple baseline model, where trade value is dependent on certain variables

model_baseline = gme.EstimationModel(
    # the data object created above
    estimation_data = gme_data,
    # dependent or left hand side variable in a regression equation
    lhs_var = "Trade Value (US$)",
    # explanatory variables; these need to come from the gravity dataset, not UN Comtrade
    rhs_var = ["distance","common_language","gdp_wdi_const_o","gdp_wdi_const_d"]
)
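
Read together with the summary output further down (GLM, Poisson family, log link, no constant among the coefficients), this specification appears to correspond to the conditional mean below. The symbols are introduced here for illustration and do not appear in the notebook: $X_{ij,t}$ is the trade value reported by country $i$ with partner $j$ in year $t$, $\mathit{dist}_{ij}$ is `distance`, $\mathit{lang}_{ij}$ is `common_language`, and $\mathit{GDP}^{o}_{i,t}$ and $\mathit{GDP}^{d}_{j,t}$ are `gdp_wdi_const_o` and `gdp_wdi_const_d`.

$$
\mathbb{E}\left[X_{ij,t}\right] = \exp\left(\beta_1\,\mathit{dist}_{ij} + \beta_2\,\mathit{lang}_{ij} + \beta_3\,\mathit{GDP}^{o}_{i,t} + \beta_4\,\mathit{GDP}^{d}_{j,t}\right)
$$

All four regressors enter in levels, exactly as they are listed in `rhs_var`.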

In [161]:
estimate = model_baseline.estimate()

select specification variables: ['distance', 'common_language', 'gdp_wdi_const_o', 'gdp_wdi_const_d', 'Trade Value (US$)', 'Reporter ISO', 'Partner ISO', 'Year'], Observations excluded by user: {'rows': 0, 'columns': 8}
drop_intratrade: no, Observations excluded by user: {'rows': 0, 'columns': 0}
drop_imp: none, Observations excluded by user: {'rows': 0, 'columns': 0}
drop_exp: none, Observations excluded by user: {'rows': 0, 'columns': 0}
keep_imp: all available, Observations excluded by user: {'rows': 0, 'columns': 0}
keep_exp: all available, Observations excluded by user: {'rows': 0, 'columns': 0}
drop_years: none, Observations excluded by user: {'rows': 0, 'columns': 0}
keep_years: all available, Observations excluded by user: {'rows': 0, 'columns': 0}
drop_missing: yes, Observations excluded by user: {'rows': 0, 'columns': 0}
Estimation began at 10:30 PM  on Jun 17, 2020
Omitted Columns: []
Estimation completed at 10:30 PM  on Jun 17, 2020


In [164]:
results = estimate["all"]
results.summary()

0,1,2,3
Dep. Variable:,Trade Value (US$),No. Iterations:,1000.0
Model:,GLM,Df Residuals:,36.0
Model Family:,Poisson,Df Model:,3.0
Link Function:,log,Scale:,1.0
Method:,IRLS,Log-Likelihood:,-24046000000000.0
Covariance Type:,HC1,Deviance:,48092000000000.0
No. Observations:,40,Pearson chi2:,5.63e+20

0,1,2,3,4,5,6
,coef,std err,t,P>|t|,[0.025,0.975]
distance,0.0015,0.001,1.216,0.224,-0.001,0.004
common_language,6.7396,2.215,3.043,0.002,2.248,11.232
gdp_wdi_const_o,4.487e-13,5.74e-13,0.781,0.435,-7.16e-13,1.61e-12
gdp_wdi_const_d,3.719e-13,5.7e-13,0.652,0.514,-7.84e-13,1.53e-12
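
The summary reports 1000 iterations and GDP coefficients on the order of 1e-13, which may be a sign that the raw-level regressors are poorly scaled for the solver. Gravity specifications are often written with the logarithm of distance and GDP instead. A minimal sketch of that transformation follows; the `ln_` column names are illustrative assumptions, the toy rows are copied from the joined table shown earlier, and the sketch stops short of re-estimating the model.

```python
import numpy as np
import pandas as pd

# three rows copied from the joined table shown earlier
toy = pd.DataFrame({
    "distance": [8159.2344, 2236.3628, 8634.542],
    "gdp_wdi_const_o": [8909812000000.0, 8909812000000.0, 8909812000000.0],
    "gdp_wdi_const_d": [3696833000000.0, 5986138000000.0, 2682177000000.0],
})

# add natural-log versions of each level variable (all values must be > 0)
for column in ["distance", "gdp_wdi_const_o", "gdp_wdi_const_d"]:
    toy[f"ln_{column}"] = np.log(toy[column])

print(toy.filter(like="ln_").round(2))
```

The transformed columns could then be passed in `rhs_var` in place of the level variables.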


<a id="ID_part2"></a>
### Part 2
|| [0|Top](#ID_top) || [1|Part1](#ID_part1) || [2|Part2](#ID_part2) || [3|Part3](#ID_part3) || [4|Part4](#ID_part4) || [5|Part5](#ID_part5) ||

<a id="ID_part3"></a>
### Part 3
|| [0|Top](#ID_top) || [1|Part1](#ID_part1) || [2|Part2](#ID_part2) || [3|Part3](#ID_part3) || [4|Part4](#ID_part4) || [5|Part5](#ID_part5) ||

<a id="ID_part4"></a>
### Part 4
|| [0|Top](#ID_top) || [1|Part1](#ID_part1) || [2|Part2](#ID_part2) || [3|Part3](#ID_part3) || [4|Part4](#ID_part4) || [5|Part5](#ID_part5) ||

<a id="ID_part5"></a>
### Part 5
|| [0|Top](#ID_top) || [1|Part1](#ID_part1) || [2|Part2](#ID_part2) || [3|Part3](#ID_part3) || [4|Part4](#ID_part4) || [5|Part5](#ID_part5) ||