# Trade network analysis
**Brian Dew (brianwdew@gmail.com)**

`03_deg_dist.ipynb`

This notebook estimates the power-law exponent $\alpha$ of the weighted degree distribution for each product in each year.

The distribution of weighted indegree and outdegree is estimated as $p(x) = Cx^{-\alpha}$.

The `powerlaw` package follows the methods described in Clauset (2011).
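
For intuition, when the lower cutoff $x_{\min}$ is held fixed, the continuous maximum-likelihood estimate of the exponent has a closed form, $\hat{\alpha} = 1 + n / \sum_{i=1}^{n} \ln(x_i / x_{\min})$. The sketch below is only an illustration under that assumption and is not a substitute for the `powerlaw` package, which also searches for $x_{\min}$. The helper names `mle_alpha` and `sample_power_law` and the synthetic data are hypothetical and are not part of the notebook.

```python
import math
import random

def mle_alpha(values, xmin):
    """Continuous MLE of alpha for p(x) ~ x^-alpha, using only values >= xmin."""
    tail = [x for x in values if x >= xmin]
    if not tail:
        raise ValueError("no observations at or above xmin")
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0:
        raise ValueError("all observations equal xmin; alpha is undefined")
    return 1.0 + len(tail) / log_sum

def sample_power_law(alpha, xmin, n, seed=0):
    """Draw n synthetic values by inverse-transform sampling (illustration only)."""
    rng = random.Random(seed)
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

# Synthetic data with a known exponent, so the estimate can be sanity-checked
synthetic = sample_power_law(alpha=2.5, xmin=1.0, n=20000)
alpha_hat = mle_alpha(synthetic, xmin=1.0)
print(round(alpha_hat, 2))  # should land close to the true value of 2.5
```

With real trade data the cutoff is not known in advance, which is one reason to rely on a dedicated fitting routine rather than this formula alone.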

---

TODO:

1. Speed up run time. Update: sorting the dataframe first and then using `df.loc[prod]` gave a big improvement. Run time is down to about 46 seconds, most of which comes from the `powerlaw` line in `deg_dist`.

#### Import packages

In [1]:
# uses pandas, networkx, powerlaw, and os packages
import pandas as pd
import networkx as nx
import powerlaw
import os
os.chdir('C:/Working/trade_network/data/')        # Change to working directory
if not os.path.exists( 'summary/.'):              # Creates folder in directory if missing.
    os.makedirs('summary/.')

#### Define alpha estimation function

This function takes a product ID, `prod`, as input. It builds a `networkx` network for the product, calculates each country's `out_degree` weighted by the USD value of trade `v` (in thousands), and then fits a power law distribution to the weighted out-degree scores. The result is an alpha estimate for the product, which is stored in `pl_alpha`.
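
To make "weighted out-degree" concrete: for each source node `i`, it is the sum of `v` over all edges leaving `i`. The sketch below computes this by hand on an invented three-country edge list. The country codes, values, and the helper name `weighted_out_degree` are hypothetical and only serve as illustration.

```python
from collections import defaultdict

def weighted_out_degree(edges):
    """Sum the edge weight v over all edges leaving each source node i."""
    totals = defaultdict(float)
    for i, j, v in edges:
        totals[i] += v
        totals[j] += 0.0   # nodes with no outgoing edges still appear, with 0
    return dict(totals)

# Invented (i, j, v) rows for a single product
edges = [
    ("A", "B", 10.0),
    ("A", "C", 5.0),
    ("B", "C", 2.5),
]
out_deg = weighted_out_degree(edges)
print(out_deg)
```

A `networkx` directed graph built from the same rows should report the same per-node totals through `G.out_degree(weight='v')`, including a zero for nodes that only receive trade.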

In [2]:
# Define function called deg_dist which calculates alpha value of degree distribution.
def deg_dist(prod):
    """Estimate the power-law alpha of a product's weighted out-degree distribution."""
    try:
        # build a directed network from the product's trade flows
        G = nx.from_pandas_edgelist(df.loc[prod], 'i', 'j', 'v', nx.DiGraph())
        # weighted out-degree for each country
        deg = [d for _, d in G.out_degree(weight='v')]
        # fit the distribution and save the alpha value
        pl_alpha[prod] = powerlaw.Fit(deg).power_law.alpha
    except Exception:                                   # some products don't converge
        pass

#### Apply function to all data

The function defined above is applied to the data for each year in the range. The estimated alpha for each product is recorded in a dictionary called `pl_alpha`, which is saved as a separate csv file for each year.

In [3]:
%%capture 
for y in map(str, range(2008,2015)):         # start year & end year + 1 
    # read csv file for year
    df = pd.read_csv('clean/baci07_' + y + '_clean.csv', index_col='hs6', header=0).sort_index()
    df = df[['i','j','v']]         # take only relevant columns
    pl_alpha = {}                        # blank dictionary
    for prod in df.index.unique():       # run deg_dist for every product
        deg_dist(prod)
    pl_alpha = pd.Series(pl_alpha)            # convert the dictionary to a Series
    # Save as csv
    pl_alpha.to_csv('summary/deg_dist_alpha_' + y + '.csv', index=True, float_format='%g')