In [3]:
import warnings
warnings.simplefilter('ignore', FutureWarning)

import matplotlib
matplotlib.rcParams['axes.grid'] = True # show gridlines by default
%matplotlib inline

import pandas as pd

## Getting Comtrade data into your notebook

In this exercise, you will practice loading data from Comtrade into a pandas dataframe and getting it into a form where you can start to work with it. 

The following steps and code are an example. Your task for this exercise is stated at the end, after the example.

The data is obtained from the [United Nations Comtrade](http://comtrade.un.org/data/) website, by selecting the following configuration:

- Type of Product: goods
- Frequency: monthly 
- Periods: all of 2020
- Reporter: Kenya
- Partners: all
- Flows: imports and exports
- HS (as reported) commodity codes: 0401 (Milk and cream, neither concentrated nor sweetened) and 0402 (Milk and cream, concentrated or sweetened)

Clicking on 'Preview' results in a message that the data exceeds 500 rows, so the data was downloaded using the *Download CSV* button instead, and the downloaded file was renamed to comtrade_milk_kenya_monthly_2020.csv.

In [12]:
LOCATION = 'comtrade_milk_kenya_monthly_2020.csv'

Load the data from the specified location, reading the code columns as strings so that leading zeros (for example in commodity code 0401) are preserved. Then preview a few rows of the dataset.

In [13]:
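milk = pd.read_csv(LOCATION, dtype={'Commodity Code': str, 'Reporter Code': str})
# Only the last expression in a cell is displayed, so show the tail here
# (use milk.head(5) instead to see the first rows)
milk.tail(5)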

| | Classification | Year | Period | Period Desc. | Aggregate Level | Is Leaf Code | Trade Flow Code | Trade Flow | Reporter Code | Reporter | ... | Qty | Alt Qty Unit Code | Alt Qty Unit | Alt Qty | Netweight (kg) | Gross weight (kg) | Trade Value (US$) | CIF Trade Value (US$) | FOB Trade Value (US$) | Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 253 | HS | 2020 | 202001 | January 2020 | 4 | 0 | 1 | Imports | 404 | Kenya | ... | | | | | 36000 | | 99577 | | | 0 |
| 254 | HS | 2020 | 202001 | January 2020 | 4 | 0 | 1 | Imports | 404 | Kenya | ... | | | | | 7 | | 14 | | | 0 |
| 255 | HS | 2020 | 202001 | January 2020 | 4 | 0 | 1 | Imports | 404 | Kenya | ... | | | | | 43 | | 422 | | | 0 |
| 256 | HS | 2020 | 202001 | January 2020 | 4 | 0 | 1 | Imports | 404 | Kenya | ... | | | | | 2227 | | 972 | | | 0 |
| 257 | HS | 2020 | 202001 | January 2020 | 4 | 0 | 1 | Imports | 404 | Kenya | ... | | | | | 144600 | | 350598 | | | 0 |
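The reason for passing dtype to read_csv is easiest to see on a tiny input. The sketch below uses a made-up three-column CSV built from two of the rows previewed above (trade values 99577 and 14, which the January import table later in this notebook lists under commodity 0402); it is an illustration, not the real download. Integer inference strips the leading zero, while dtype=str keeps the codes exactly as written.

```python
import io

import pandas as pd

# Hypothetical CSV in the shape of the Comtrade download, trimmed to 3 columns
csv_text = """Reporter Code,Commodity Code,Trade Value (US$)
404,0402,99577
404,0402,14
"""

# Without a dtype hint, pandas infers integers and drops the leading zero
inferred = pd.read_csv(io.StringIO(csv_text))

# With dtype=str for the code columns, the codes stay exactly as written
as_text = pd.read_csv(io.StringIO(csv_text),
                      dtype={'Commodity Code': str, 'Reporter Code': str})

print(inferred['Commodity Code'].tolist())  # [402, 402]
print(as_text['Commodity Code'].tolist())   # ['0402', '0402']
```

Keeping codes as strings also means that later comparisons, such as get_group('0402'), must use the string form.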


In [18]:
# Keep only the columns needed for the analysis
COLUMNS = ['Year', 'Period', 'Trade Flow', 'Reporter', 'Partner', 'Commodity', 'Commodity Code', 'Trade Value (US$)']
milk = milk[COLUMNS]
milk.head(10)

| | Year | Period | Trade Flow | Reporter | Partner | Commodity | Commodity Code | Trade Value (US$) |
|---|---|---|---|---|---|---|---|---|
| 0 | 2020 | 202001 | Imports | Kenya | World | Milk and cream; not concentrated nor containin... | 0401 | 4863634 |
| 1 | 2020 | 202001 | Exports | Kenya | World | Milk and cream; not concentrated nor containin... | 0401 | 15914 |
| 2 | 2020 | 202001 | Exports | Kenya | South Sudan | Milk and cream; not concentrated nor containin... | 0401 | 480 |
| 3 | 2020 | 202001 | Imports | Kenya | Uganda | Milk and cream; not concentrated nor containin... | 0401 | 4863634 |
| 4 | 2020 | 202001 | Exports | Kenya | Uganda | Milk and cream; not concentrated nor containin... | 0401 | 14166 |
| 5 | 2020 | 202001 | Exports | Kenya | Bunkers | Milk and cream; not concentrated nor containin... | 0401 | 1268 |
| 6 | 2020 | 202002 | Imports | Kenya | World | Milk and cream; not concentrated nor containin... | 0401 | 3456994 |
| 7 | 2020 | 202002 | Exports | Kenya | World | Milk and cream; not concentrated nor containin... | 0401 | 3511 |
| 8 | 2020 | 202002 | Imports | Kenya | Denmark | Milk and cream; not concentrated nor containin... | 0401 | 24552 |
| 9 | 2020 | 202002 | Imports | Kenya | Rwanda | Milk and cream; not concentrated nor containin... | 0401 | 14616 |
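An alternative, assuming you know which columns you need before loading, is to have read_csv parse only those columns through its usecols parameter. The toy CSV here is hypothetical: two export rows from the preview above, with the commodity description shortened and a 'Flag' column included as an example of a column you do not want.

```python
import io

import pandas as pd

COLUMNS = ['Year', 'Period', 'Trade Flow', 'Reporter', 'Partner',
           'Commodity', 'Commodity Code', 'Trade Value (US$)']

# Hypothetical CSV; 'Commodity' text shortened, 'Flag' is an unwanted column
csv_text = """Year,Period,Trade Flow,Reporter,Partner,Commodity,Commodity Code,Trade Value (US$),Flag
2020,202001,Exports,Kenya,South Sudan,Milk and cream,0401,480,0
2020,202001,Exports,Kenya,Uganda,Milk and cream,0401,14166,0
"""

# usecols skips unwanted columns while parsing; the resulting column order
# follows the file, not the order of the usecols list
milk_small = pd.read_csv(io.StringIO(csv_text), usecols=COLUMNS,
                         dtype={'Commodity Code': str})

# Reorder explicitly in case the file order differs from COLUMNS
milk_small = milk_small[COLUMNS]
print(milk_small.columns.tolist())
```

For a wide Comtrade download this avoids holding unused columns in memory, at the cost of having to know the column names up front.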


Derive two new dataframes that separate out the 'World' partner data and the data for individual partner countries.

In [19]:
milk_world = milk[milk['Partner'] == 'World']
milk_countries = milk[milk['Partner'] != 'World']
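Because the 'World' rows look like totals over partners, a quick sanity check is to sum the partner rows and compare. For the January 2020 exports of 0401 in the preview above, 480 + 14166 + 1268 = 15914, which matches the 'World' row. The sketch below automates that comparison on just those six rows. Whether the totals match for every month and commodity in the full download is something to check rather than assume.

```python
import pandas as pd

# January 2020 rows for commodity 0401, copied from the preview above
milk = pd.DataFrame({
    'Period': [202001] * 6,
    'Trade Flow': ['Imports', 'Exports', 'Exports', 'Imports', 'Exports', 'Exports'],
    'Partner': ['World', 'World', 'South Sudan', 'Uganda', 'Uganda', 'Bunkers'],
    'Commodity Code': ['0401'] * 6,
    'Trade Value (US$)': [4863634, 15914, 480, 4863634, 14166, 1268],
})

milk_world = milk[milk['Partner'] == 'World']
milk_countries = milk[milk['Partner'] != 'World']

keys = ['Period', 'Trade Flow', 'Commodity Code']
# Sum over individual partners for each period/flow/commodity
partner_totals = milk_countries.groupby(keys)['Trade Value (US$)'].sum()
# The reported 'World' value for the same keys
world_totals = milk_world.set_index(keys)['Trade Value (US$)']

# Building a DataFrame from the two Series aligns them on the shared index
comparison = pd.DataFrame({'partners': partner_totals, 'world': world_totals})
comparison['difference'] = comparison['world'] - comparison['partners']
print(comparison)
```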

In [20]:
# Save the partner-country data as CSV, without the index column
milk_countries.to_csv('kenyamilk.csv',index=False)

To load the data back in, again reading the commodity code as a string:

In [21]:
# 'Reporter Code' was dropped when the columns were limited, so only
# 'Commodity Code' needs a dtype hint here
load_test = pd.read_csv('kenyamilk.csv', dtype={'Commodity Code': str})
load_test.head(3)

| | Year | Period | Trade Flow | Reporter | Partner | Commodity | Commodity Code | Trade Value (US$) |
|---|---|---|---|---|---|---|---|---|
| 0 | 2020 | 202001 | Exports | Kenya | South Sudan | Milk and cream; not concentrated nor containin... | 0401 | 480 |
| 1 | 2020 | 202001 | Imports | Kenya | Uganda | Milk and cream; not concentrated nor containin... | 0401 | 4863634 |
| 2 | 2020 | 202001 | Exports | Kenya | Uganda | Milk and cream; not concentrated nor containin... | 0401 | 14166 |
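The index=False argument matters on the way back in: if the index is written, read_csv picks it up as an extra column named 'Unnamed: 0'. A minimal sketch with an in-memory CSV (io.StringIO standing in for the file) and two rows from the partner data above:

```python
import io

import pandas as pd

# Two partner rows; the non-default index mimics what the 'World' filter leaves
milk_countries = pd.DataFrame({
    'Trade Flow': ['Exports', 'Imports'],
    'Partner': ['South Sudan', 'Uganda'],
    'Commodity Code': ['0401', '0401'],
    'Trade Value (US$)': [480, 4863634],
}, index=[2, 3])

# to_csv() with no path returns the CSV text as a string
with_index = pd.read_csv(io.StringIO(milk_countries.to_csv()),
                         dtype={'Commodity Code': str})
without_index = pd.read_csv(io.StringIO(milk_countries.to_csv(index=False)),
                            dtype={'Commodity Code': str})

print(with_index.columns.tolist())
# ['Unnamed: 0', 'Trade Flow', 'Partner', 'Commodity Code', 'Trade Value (US$)']
print(without_index.columns.tolist())
# ['Trade Flow', 'Partner', 'Commodity Code', 'Trade Value (US$)']
```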


### Subsetting the data
For large or heterogeneous datasets, it is often convenient to create subsets of the data. To further separate out the imports:

In [24]:
milk_imports = milk[milk['Trade Flow'] == 'Imports']
milk_countries_imports = milk_countries[milk_countries['Trade Flow'] == 'Imports']
milk_world_imports = milk_world[milk_world['Trade Flow'] == 'Imports']
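The import subset of the partner-country data can also be taken in one step, either by combining conditions with & or by writing them as a query string. A sketch on a few hand-copied rows from the preview above, checking that all three approaches select the same rows:

```python
import pandas as pd

# A few rows copied from the preview above
milk = pd.DataFrame({
    'Trade Flow': ['Imports', 'Exports', 'Exports', 'Imports', 'Exports'],
    'Partner': ['World', 'World', 'South Sudan', 'Uganda', 'Uganda'],
    'Trade Value (US$)': [4863634, 15914, 480, 4863634, 14166],
})

# Two-step subset, as in the notebook cells
milk_countries = milk[milk['Partner'] != 'World']
step_by_step = milk_countries[milk_countries['Trade Flow'] == 'Imports']

# One-step subset: combine conditions with &, each wrapped in parentheses
combined = milk[(milk['Partner'] != 'World') & (milk['Trade Flow'] == 'Imports')]

# The same logic as a query string; backticks quote a column name with a space
queried = milk.query("Partner != 'World' and `Trade Flow` == 'Imports'")
print(combined)
```

The parentheses are required because & binds more tightly than == and != in Python.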

### Sorting the data

Having loaded the data, find the most valuable partners for imports in a particular month (here January 2020, period 202001) by sorting the data by *decreasing* trade value and then selecting the top few rows.

In [26]:
# Period was read as an integer, so compare against 202001 rather than '202001'
milk_imports_jan_2020 = milk_countries_imports[milk_countries_imports['Period'] == 202001]
milk_imports_jan_2020.sort_values('Trade Value (US$)', ascending=False).head(10)

| | Year | Period | Trade Flow | Reporter | Partner | Commodity | Commodity Code | Trade Value (US$) |
|---|---|---|---|---|---|---|---|---|
| 3 | 2020 | 202001 | Imports | Kenya | Uganda | Milk and cream; not concentrated nor containin... | 0401 | 4863634 |
| 250 | 2020 | 202001 | Imports | Kenya | Belgium | Milk and cream; concentrated or containing add... | 0402 | 926089 |
| 252 | 2020 | 202001 | Imports | Kenya | Netherlands | Milk and cream; concentrated or containing add... | 0402 | 412013 |
| 257 | 2020 | 202001 | Imports | Kenya | Uganda | Milk and cream; concentrated or containing add... | 0402 | 350598 |
| 253 | 2020 | 202001 | Imports | Kenya | New Zealand | Milk and cream; concentrated or containing add... | 0402 | 99577 |
| 256 | 2020 | 202001 | Imports | Kenya | United Arab Emirates | Milk and cream; concentrated or containing add... | 0402 | 972 |
| 255 | 2020 | 202001 | Imports | Kenya | United Kingdom | Milk and cream; concentrated or containing add... | 0402 | 422 |
| 251 | 2020 | 202001 | Imports | Kenya | Lebanon | Milk and cream; concentrated or containing add... | 0402 | 41 |
| 254 | 2020 | 202001 | Imports | Kenya | South Africa | Milk and cream; concentrated or containing add... | 0402 | 14 |
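For "top n by value" there is also DataFrame.nlargest, which the pandas documentation describes as equivalent to sort_values(..., ascending=False).head(n) but more performant. A sketch using the five largest January rows from the table above, deliberately listed out of order:

```python
import pandas as pd

# January 2020 import rows from the table above, in scrambled order
jan_imports = pd.DataFrame({
    'Partner': ['New Zealand', 'Uganda', 'Uganda', 'Belgium', 'Netherlands'],
    'Commodity Code': ['0402', '0401', '0402', '0402', '0402'],
    'Trade Value (US$)': [99577, 4863634, 350598, 926089, 412013],
}, index=[253, 3, 257, 250, 252])

# The three rows with the largest trade value, in descending order
top3 = jan_imports.nlargest(3, 'Trade Value (US$)')
print(top3)
```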


### Grouping the data
Split the partner-country data into two subsets (imports and exports) by grouping on trade flow.

In [27]:
groups = milk_countries.groupby('Trade Flow')

In [28]:
groups.get_group('Imports').head()

| | Year | Period | Trade Flow | Reporter | Partner | Commodity | Commodity Code | Trade Value (US$) |
|---|---|---|---|---|---|---|---|---|
| 3 | 2020 | 202001 | Imports | Kenya | Uganda | Milk and cream; not concentrated nor containin... | 0401 | 4863634 |
| 8 | 2020 | 202002 | Imports | Kenya | Denmark | Milk and cream; not concentrated nor containin... | 0401 | 24552 |
| 9 | 2020 | 202002 | Imports | Kenya | Rwanda | Milk and cream; not concentrated nor containin... | 0401 | 14616 |
| 11 | 2020 | 202002 | Imports | Kenya | Uganda | Milk and cream; not concentrated nor containin... | 0401 | 3417826 |
| 15 | 2020 | 202003 | Imports | Kenya | Belgium | Milk and cream; not concentrated nor containin... | 0401 | 2838 |
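Besides get_group, a groupby object can be iterated to get (key, sub-DataFrame) pairs, or aggregated to reduce each group to a single number. A minimal sketch on five rows copied from the earlier preview:

```python
import pandas as pd

# Five partner rows copied from the earlier preview
milk_countries = pd.DataFrame({
    'Period': [202001, 202001, 202001, 202002, 202002],
    'Trade Flow': ['Exports', 'Imports', 'Exports', 'Imports', 'Imports'],
    'Partner': ['South Sudan', 'Uganda', 'Uganda', 'Denmark', 'Rwanda'],
    'Trade Value (US$)': [480, 4863634, 14166, 24552, 14616],
})

groups = milk_countries.groupby('Trade Flow')

# Iterating yields (key, sub-DataFrame) pairs, one per trade flow
for flow, frame in groups:
    print(flow, len(frame))

# Aggregating reduces each group to one value, indexed by the group key
totals = groups['Trade Value (US$)'].sum()
print(totals)
```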


As well as grouping on a single column, you can group on several columns by passing their names as a list. For example, generate groups based on commodity code and trade flow, and then preview the keys that define the groups.

In [29]:
GROUPING_COMMFLOW = ['Commodity Code','Trade Flow']

groups = milk_countries.groupby(GROUPING_COMMFLOW)
groups.groups.keys()

dict_keys([('0401', 'Exports'), ('0401', 'Imports'), ('0402', 'Exports'), ('0402', 'Imports')])
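With more than one grouping column, each key is a tuple, and aggregations return a Series with a two-level MultiIndex that can be indexed with the same tuples. A sketch on four rows taken from the outputs above:

```python
import pandas as pd

# Four partner rows taken from the outputs above
milk_countries = pd.DataFrame({
    'Commodity Code': ['0401', '0401', '0402', '0402'],
    'Trade Flow': ['Exports', 'Imports', 'Imports', 'Imports'],
    'Partner': ['Uganda', 'Uganda', 'Belgium', 'Uganda'],
    'Trade Value (US$)': [14166, 4863634, 926089, 350598],
})

groups = milk_countries.groupby(['Commodity Code', 'Trade Flow'])

# Each key is a tuple with one value per grouping column
print(list(groups.groups.keys()))

# Summing gives a Series indexed by (Commodity Code, Trade Flow)
totals = groups['Trade Value (US$)'].sum()
print(totals.loc[('0402', 'Imports')])

# unstack moves the inner level into columns for a compact table
print(totals.unstack('Trade Flow'))
```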

Retrieve a group based on multiple grouping columns by passing in a tuple that specifies a value for each column, in the same order as the grouping list. For example, if a grouping is based on the 'Partner' and 'Trade Flow' columns, the argument of get_group has to be a partner/flow pair, such as ('Uganda', 'Imports'), to get all rows associated with imports from Uganda.

In [30]:
GROUPING_PARTNERFLOW = ['Partner','Trade Flow']
groups = milk_countries.groupby(GROUPING_PARTNERFLOW)

GROUP_PARTNERFLOW = ('Uganda', 'Imports')
groups.get_group(GROUP_PARTNERFLOW)

| | Year | Period | Trade Flow | Reporter | Partner | Commodity | Commodity Code | Trade Value (US$) |
|---|---|---|---|---|---|---|---|---|
| 3 | 2020 | 202001 | Imports | Kenya | Uganda | Milk and cream; not concentrated nor containin... | 0401 | 4863634 |
| 11 | 2020 | 202002 | Imports | Kenya | Uganda | Milk and cream; not concentrated nor containin... | 0401 | 3417826 |
| 20 | 2020 | 202003 | Imports | Kenya | Uganda | Milk and cream; not concentrated nor containin... | 0401 | 3879387 |
| 27 | 2020 | 202004 | Imports | Kenya | Uganda | Milk and cream; not concentrated nor containin... | 0401 | 4958907 |
| 34 | 2020 | 202005 | Imports | Kenya | Uganda | Milk and cream; not concentrated nor containin... | 0401 | 3209854 |
| 42 | 2020 | 202006 | Imports | Kenya | Uganda | Milk and cream; not concentrated nor containin... | 0401 | 2990831 |
| 55 | 2020 | 202007 | Imports | Kenya | Uganda | Milk and cream; not concentrated nor containin... | 0401 | 3652206 |
| 66 | 2020 | 202008 | Imports | Kenya | Uganda | Milk and cream; not concentrated nor containin... | 0401 | 3222280 |
| 78 | 2020 | 202010 | Imports | Kenya | Uganda | Milk and cream; not concentrated nor containin... | 0401 | 3917276 |
| 90 | 2020 | 202011 | Imports | Kenya | Uganda | Milk and cream; not concentrated nor containin... | 0401 | 3503799 |
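Conceptually, get_group with a tuple selects the same rows as a boolean filter on each grouping column. A small check on four rows from earlier outputs:

```python
import pandas as pd

# Four partner rows from earlier outputs
milk_countries = pd.DataFrame({
    'Period': [202001, 202001, 202002, 202002],
    'Trade Flow': ['Exports', 'Imports', 'Imports', 'Imports'],
    'Partner': ['Uganda', 'Uganda', 'Denmark', 'Uganda'],
    'Trade Value (US$)': [14166, 4863634, 24552, 3417826],
})

groups = milk_countries.groupby(['Partner', 'Trade Flow'])
# The tuple lists one value per grouping column, in the same order
uganda_imports = groups.get_group(('Uganda', 'Imports'))

# Equivalent boolean filter on the original frame
mask = (milk_countries['Partner'] == 'Uganda') & (milk_countries['Trade Flow'] == 'Imports')
print(uganda_imports.equals(milk_countries[mask]))  # True
```

Grouping pays off when you need many such subsets, since the split is computed once and each group is then looked up by key.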


To find the leading partner for a particular commodity, group by commodity, get the desired group, and then sort the result.

In [31]:
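# Group on a single column name rather than a one-element list: with a list,
# recent pandas versions warn that get_group expects a one-element tuple
groups = milk_countries.groupby('Commodity Code')
groups.get_group('0402').sort_values('Trade Value (US$)', ascending=False).head()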

| | Year | Period | Trade Flow | Reporter | Partner | Commodity | Commodity Code | Trade Value (US$) |
|---|---|---|---|---|---|---|---|---|
| 144 | 2020 | 202012 | Imports | Kenya | Uganda | Milk and cream; concentrated or containing add... | 0402 | 3396617 |
| 131 | 2020 | 202010 | Imports | Kenya | Uganda | Milk and cream; concentrated or containing add... | 0402 | 3351424 |
| 114 | 2020 | 202006 | Imports | Kenya | Uganda | Milk and cream; concentrated or containing add... | 0402 | 2619525 |
| 186 | 2020 | 202011 | Imports | Kenya | Uganda | Milk and cream; concentrated or containing add... | 0402 | 2270889 |
| 127 | 2020 | 202010 | Imports | Kenya | Netherlands | Milk and cream; concentrated or containing add... | 0402 | 1995667 |
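The cell above answers the question for one commodity at a time. To get the leading row for every commodity at once, one option is groupby(...).idxmax(), which returns the index label of the largest value in each group (the first one, if there is a tie), followed by .loc to fetch those rows. A sketch on five rows copied from the outputs above:

```python
import pandas as pd

# Five import rows copied from the outputs above
milk_countries = pd.DataFrame({
    'Period': [202001, 202002, 202012, 202010, 202006],
    'Trade Flow': ['Imports'] * 5,
    'Partner': ['Uganda', 'Denmark', 'Uganda', 'Netherlands', 'Uganda'],
    'Commodity Code': ['0401', '0401', '0402', '0402', '0402'],
    'Trade Value (US$)': [4863634, 24552, 3396617, 1995667, 2619525],
})

# For each commodity, the index label of the row with the largest trade value
best_rows = milk_countries.groupby('Commodity Code')['Trade Value (US$)'].idxmax()

# .loc with those labels returns one leading row per commodity
leaders = milk_countries.loc[best_rows]
print(leaders[['Commodity Code', 'Partner', 'Period', 'Trade Value (US$)']])
```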
