# Trade network analysis
**Brian Dew (brianwdew@gmail.com)**

`02_clean.ipynb`

Files used: 
* Country codes file from BACI, `country_code_baci07.csv`: http://www.cepii.fr/DATA_DOWNLOAD/baci/country_code_baci07.csv
* Broad Economic Categories (BEC) codes from the UN, `BECnoncons.csv`: http://unstats.un.org/unsd/tradekb/Knowledgebase/Intermediate-Goods-in-Trade-Statistics

This script: 
1. opens the raw csv file for each year (unpacked in the previous step);
2. drops products that the BEC classifies as final consumption goods;
3. converts the BACI country codes to ISO2; and
4. saves the result as a new file with the suffix `_clean`.

Note: The script creates about 2.4 GB of files and takes several minutes to run.

---

TODO:

1. Speed up run time. Can I use `.loc`? (`.ix` was removed in pandas 1.0, so it is no longer an option.)
2. The start and end years should be detected automatically rather than set by hand.
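
One possible way to address the second item is to read the years off the file names in `raw/` instead of hard-coding them. The sketch below assumes the raw files follow the `baci07_<year>_raw.csv` pattern used later in this notebook; `find_years` and `RAW_PATTERN` are hypothetical names introduced only for illustration.

```python
import re
from pathlib import Path

# Assumed naming pattern, taken from the loop in this notebook: baci07_<year>_raw.csv
RAW_PATTERN = re.compile(r'baci07_(\d{4})_raw\.csv$')

def find_years(filenames):
    """Return the sorted, de-duplicated years (as strings) found in the file names."""
    years = set()
    for name in filenames:
        match = RAW_PATTERN.search(str(name))
        if match:
            years.add(match.group(1))
    return sorted(years)

# Hypothetical usage: list whatever is in raw/ instead of writing range(2008, 2015)
raw_dir = Path('raw')
years = find_years(p.name for p in raw_dir.glob('baci07_*_raw.csv')) if raw_dir.exists() else []
```

The resulting `years` list could then replace `map(str, range(2008, 2015))` in the main loop, so that adding a new raw file is enough to include a new year.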

In [1]:
import pandas as pd
import os
os.chdir('C:/Working/trade_network/data/')
os.makedirs('clean', exist_ok=True)           # create the output folder if it does not exist

In [2]:
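# Create a dictionary mapping BACI country codes to ISO2 codes
# (keep_default_na=False stops pandas from reading the ISO2 code 'NA' as a missing value)
iso2 = pd.read_csv('country_code_baci07.csv', index_col='i', keep_default_na=False, na_values=[''])['iso2'].to_dict()
# Create a set of HS6 codes for goods that are not final consumption goods
# (the squeeze argument was removed from read_csv in pandas 2.0, so select the column directly)
BECcodes = set(pd.read_csv('BECnoncons.csv', usecols=['hs6'])['hs6'])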
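
The `keep_default_na=False, na_values=['']` arguments in the country code cell are presumably there because pandas treats the string `NA` as missing by default, which would discard Namibia's ISO2 code. A small check on toy data shows the difference; the numeric codes in `sample` are illustrative only and are not taken from the BACI file.

```python
import io
import pandas as pd

# Toy stand-in for country_code_baci07.csv; these numeric codes are made up
sample = "i,iso2\n1,NA\n2,FR\n3,\n"

# Default parsing: the literal string 'NA' is converted to NaN
default = pd.read_csv(io.StringIO(sample), index_col='i')['iso2']

# Parsing as in this notebook: only empty fields count as missing
strict = pd.read_csv(io.StringIO(sample), index_col='i',
                     keep_default_na=False, na_values=[''])['iso2']

namibia_lost = pd.isna(default.loc[1])      # True: 'NA' was read as missing
namibia_kept = strict.loc[1] == 'NA'        # True: 'NA' survives as a string
blank_is_missing = pd.isna(strict.loc[3])   # True: the empty field is still NaN
```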

In [3]:
for y in map(str, range(2008, 2015)):                     # start year, end year + 1
    rawfile = 'raw/baci07_' + y + '_raw.csv'
    df = pd.read_csv(rawfile, index_col='t')              # build dataframe
    df = df[df['hs6'].isin(BECcodes)].copy()              # drop final consumption goods
    for v in ['i', 'j']:
        df[v] = df[v].apply(lambda x: iso2.get(x, x))     # replace country codes with ISO2
    df = df.dropna(axis=0)                                # drop rows with any missing value
    cleanfile = 'clean/baci07_' + y + '_clean.csv'        # output file name
    df.to_csv(cleanfile, index=False, float_format='%g')  # save as csv
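
On the run-time question raised at the top of the notebook: the row-by-row `apply` with a lambda is a likely hot spot, and `Series.map` with a dictionary may be faster because the lookup is done in bulk. The sketch below is one way to keep the same semantics as `iso2.get(x, x)`. `recode` is a hypothetical helper, not part of the original script, and the toy codes are illustrative only.

```python
import pandas as pd

def recode(series, mapping):
    """Vectorised stand-in for series.apply(lambda x: mapping.get(x, x))."""
    mapped = series.map(mapping)              # codes absent from the mapping become NaN here
    known = series.isin(list(mapping))        # True where the code is a key of the mapping
    # Keep the mapped value for known codes (including NaN values in the mapping),
    # and fall back to the original code for unknown ones.
    return mapped.where(known, series)

# Toy data: 999 is not in the mapping, 5 maps to a missing ISO2 code
codes = pd.Series([4, 8, 999, 5])
toy_iso2 = {4: 'AF', 8: 'AL', 5: float('nan')}

result = recode(codes, toy_iso2)
expected = codes.apply(lambda x: toy_iso2.get(x, x))
```

In the loop this would read `df[v] = recode(df[v], iso2)`. Whether it actually shortens the run would need to be timed on the real files, since reading and writing the CSVs may dominate.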