# Trade network analysis
**Brian Dew (brianwdew@gmail.com)**

`03_deg_dist.ipynb`

This notebook estimates an alpha value for the degree distribution for each product in each year.

The distribution of weighted indegree and outdegree is estimated as $p(x) = Cx^{-\alpha}$.

The `powerlaw` package follows the methods described in [Clauset et al. (2009)](https://arxiv.org/abs/0706.1062).
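
As a rough illustration of what is being estimated, the continuous maximum-likelihood estimator from Clauset et al. can be sketched in plain Python. This is a simplified sketch, not a reproduction of `powerlaw.Fit`: it assumes a fixed, user-supplied `xmin`, whereas the package also searches for a suitable `xmin`. The helper name `alpha_mle` and the sample values are hypothetical.

```python
import math

def alpha_mle(values, xmin):
    """Continuous power-law MLE for a fixed xmin (hypothetical helper).

    alpha = 1 + n / sum(ln(x_i / xmin)) over observations x_i >= xmin,
    with approximate standard error sigma = (alpha - 1) / sqrt(n).
    """
    tail = [x for x in values if x >= xmin]          # keep only the tail
    n = len(tail)
    log_sum = sum(math.log(x / xmin) for x in tail)
    if n == 0 or log_sum == 0:
        raise ValueError("need observations strictly above xmin")
    alpha = 1.0 + n / log_sum
    sigma = (alpha - 1.0) / math.sqrt(n)
    return alpha, sigma

# Made-up weighted outdegree values, for illustration only
sample = [1.0, 2.0, 4.0, 8.0, 16.0]
alpha, sigma = alpha_mle(sample, xmin=1.0)
```

A smaller `alpha` corresponds to a heavier tail, and `sigma` shrinks as more observations fall above `xmin`.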

---

TODO:

1. Add description of weighted outdegree centrality and what it represents!

#### Import packages

In [1]:
# uses pandas, networkx, powerlaw, and os packages
import pandas as pd
import networkx as nx
import powerlaw
import os
os.chdir('C:/Working/trade_network/data/')        # Change to working directory
if not os.path.exists('summary'):                 # Create the output folder if missing
    os.makedirs('summary')

#### Define alpha estimation function

This function takes a product ID, `prod`, as input. It builds a `networkx` network for the product, calculates the `out_degree` weighted by the USD value of trade `v` (in thousands), and estimates how well the weighted `out_degree` scores fit a power law distribution. The estimated alpha for the product is stored in `pl_alpha`, and its standard error in `pl_sigma`.

Weighted outdegree:

Each country's USD export volume for each product as a share of total exports of all countries.
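
A minimal sketch of this quantity in plain Python, assuming the trade flows for one product are available as `(i, j, v)` tuples (exporter, importer, value). The helper names and the flows are hypothetical; the notebook itself uses `networkx` for this step.

```python
from collections import defaultdict

def weighted_outdegree(edges):
    """Sum export values by exporter. edges: iterable of (i, j, v) tuples."""
    totals = defaultdict(float)
    for exporter, importer, value in edges:
        totals[exporter] += value
    return dict(totals)

def export_shares(totals):
    """Express each exporter's total as a share of all exports."""
    grand_total = sum(totals.values())
    return {country: v / grand_total for country, v in totals.items()}

# Made-up flows for one product: (exporter i, importer j, value v)
edges = [("A", "B", 30.0), ("A", "C", 20.0), ("B", "C", 40.0), ("C", "A", 10.0)]
totals = weighted_outdegree(edges)   # total exports per country
shares = export_shares(totals)       # each country's share of all exports
```

Countries that only import do not appear in `totals` here, whereas a directed graph would include them with a weighted outdegree of zero.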

In [2]:
# Define function called deg_dist which estimates the alpha value of the degree distribution.
def deg_dist(prod):
    """Estimate the power-law alpha of weighted outdegree for one product."""
    try:
        # networkx 1.x API (later versions use nx.from_pandas_edgelist)
        G = nx.from_pandas_dataframe(df.loc[prod], 'i', 'j', 'v', nx.DiGraph())  # build network
        deg = G.out_degree(weight='v').values()         # weighted outdegree for each country
        fit = powerlaw.Fit(deg)                         # fit the distribution
        pl_alpha[prod] = fit.power_law.alpha            # estimated alpha
        pl_sigma[prod] = fit.power_law.sigma            # standard error of the alpha estimate
    except Exception:                                   # skip products where the fit fails
        pass

#### Apply function to all data

The function defined above is applied to the data for each year `y` in the range. The estimated alpha and its standard error are recorded for each product in the dictionaries `pl_alpha` and `pl_sigma`, which are combined and saved as a separate csv file for each year.

In [3]:
%%capture 
for y in map(str, range(2008,2015)):         # start year & end year + 1 
    # read csv file for year
    df = pd.read_csv('clean/baci07_' + y + '_clean.csv', index_col='hs6', header=0).sort_index()
    df = df[['i','j','v']]         # take only relevant columns
    pl_alpha = {}                        # blank dictionary
    pl_sigma = {}
    for prod in df.index.unique():       # run deg_dist for every product
        deg_dist(prod)
    pl_alpha = pd.Series(pl_alpha)
    pl_sigma = pd.Series(pl_sigma)
    combined = pd.concat([pl_alpha, pl_sigma], axis=1) # One DataFrame with alpha and sigma columns
    # Save as csv
    combined.to_csv('summary/deg_dist_alpha_' + y + '.csv', index=True, float_format='%g')

#### Dummy csv

Creates a csv file that lists each product and year with a fat-tailed centrality score (alpha greater than 2; the code uses a cutoff of 1.999), flagged with a dummy variable `c` equal to 1.
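
The filtering logic can be sketched without pandas. This is an illustrative sketch only: it assumes the alpha estimates are held in a nested dictionary keyed by year and then by product code, and the helper name `fat_tailed_rows` and the values are hypothetical. The 1.999 cutoff is the one used in the notebook.

```python
def fat_tailed_rows(alphas_by_year, cutoff=1.999):
    """Return (year, hs6, 1) rows for products whose alpha exceeds the cutoff."""
    rows = []
    for year in sorted(alphas_by_year):
        for hs6, alpha in sorted(alphas_by_year[year].items()):
            if alpha > cutoff:
                rows.append((year, hs6, 1))   # 1 is the dummy flag
    return rows

# Made-up alpha estimates keyed by year, then by product code
alphas_by_year = {
    2008: {10110: 2.4, 10190: 1.7},
    2009: {10110: 1.9, 10190: 2.0},
}
rows = fat_tailed_rows(alphas_by_year)
```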

In [4]:
fat_tailed_prod = {}
for y in range(2008,2015):
    prod = pd.read_csv('summary/deg_dist_alpha_'+str(y)+'.csv').set_index('Unnamed: 0')
    fat_tailed_prod[y] = prod[prod['0'] > 1.999]                              # fat tail is alpha above 2
fat_tailed_prod = pd.concat(fat_tailed_prod, axis=0).reset_index()
fat_tailed_prod['c'] = 1                                                # to merge and create dummy
fat_tailed_prod.columns = ['year', 'hs6', 'alpha','sigma', 'c']
fat_tailed_prod[['year','hs6','c']].to_csv('fat_tailed_prod.csv', index=None)    # save as csv file