<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Load-the-data" data-toc-modified-id="Load-the-data-1">Load the data</a></span></li><li><span><a href="#Tidying-data" data-toc-modified-id="Tidying-data-2">Tidying data</a></span></li><li><span><a href="#Data-Visualization-with" data-toc-modified-id="Data-Visualization-with-3">Data Visualization with</a></span><ul class="toc-item"><li><span><a href="#Plot-for-annual-produce-over-years" data-toc-modified-id="Plot-for-annual-produce-over-years-3.1">Plot for annual produce over years</a></span></li><li><span><a href="#Food-v.-Feed" data-toc-modified-id="Food-v.-Feed-3.2">Food v. Feed</a></span></li></ul></li></ul></div>

In [1]:
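import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
# Matplotlib 3.6 renamed the seaborn styles to 'seaborn-v0_8-*';
# fall back to the old name on older versions
style = ('seaborn-v0_8-darkgrid' if 'seaborn-v0_8-darkgrid' in plt.style.available
         else 'seaborn-darkgrid')
plt.style.use(style)
import holoviews as hv
from holoext.bokeh import Mod  # plot-styling helper from the holoext package
# Load the Bokeh (interactive) and Matplotlib plotting backends
hv.extension('bokeh', 'matplotlib')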

# Load the data

In [76]:
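# Normalize each HeatMap frame's ranges independently
%opts HeatMap {+framewise}
# Display datetime64 values (such as the year axis) as four-digit years
hv.Dimension.type_formatters[np.datetime64] = '%Y'
# When working on a live server, uncomment the line below and enable
# widgets='live' for greatly improved performance and memory usage
#%output max_frames=2000 #widgets='live' #size = 400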

In [3]:
# Adjust the path to your local copy of FAO.csv; the file is read as ISO-8859-1 (Latin-1)
df = pd.read_csv(
    '/home/abanihirwe/.kaggle/datasets/dorbicycle/world-foodfeed-production/FAO.csv',
    encoding="ISO-8859-1")
df.head()

| | Area Abbreviation | Area Code | Area | Item Code | Item | Element Code | Element | Unit | latitude | longitude | ... | Y2004 | Y2005 | Y2006 | Y2007 | Y2008 | Y2009 | Y2010 | Y2011 | Y2012 | Y2013 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AFG | 2 | Afghanistan | 2511 | Wheat and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 3249.0 | 3486.0 | 3704.0 | 4164.0 | 4252.0 | 4538.0 | 4605.0 | 4711.0 | 4810 | 4895 |
| 1 | AFG | 2 | Afghanistan | 2805 | Rice (Milled Equivalent) | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 419.0 | 445.0 | 546.0 | 455.0 | 490.0 | 415.0 | 442.0 | 476.0 | 425 | 422 |
| 2 | AFG | 2 | Afghanistan | 2513 | Barley and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 58.0 | 236.0 | 262.0 | 263.0 | 230.0 | 379.0 | 315.0 | 203.0 | 367 | 360 |
| 3 | AFG | 2 | Afghanistan | 2513 | Barley and products | 5142 | Food | 1000 tonnes | 33.94 | 67.71 | ... | 185.0 | 43.0 | 44.0 | 48.0 | 62.0 | 55.0 | 60.0 | 72.0 | 78 | 89 |
| 4 | AFG | 2 | Afghanistan | 2514 | Maize and products | 5521 | Feed | 1000 tonnes | 33.94 | 67.71 | ... | 120.0 | 208.0 | 233.0 | 249.0 | 247.0 | 195.0 | 178.0 | 191.0 | 200 | 200 |


# Tidying data



We need to make sure our data is tidy before we do any plotting, filtering, transformations, summary statistics or regressions. Without a tidy dataset, we'll be fighting our tools to get the results we need; with one, all of those tasks become relatively easy.

In his 2014 paper [Tidy Data](http://vita.had.co.nz/papers/tidy-data.pdf), Hadley Wickham summarizes a tidy dataset as one where:

- Each variable forms a column
- Each observation forms a row
- Each type of observational unit forms a table
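To see why the first rule matters for this dataset, consider a toy table (invented values, not FAO data) laid out like FAO.csv, where the variable "year" lives in the column headers. The sketch below computes per-year totals from the wide layout and from a hand-built tidy version of the same numbers:

```python
import pandas as pd

# Toy data in the FAO.csv layout (values invented for illustration):
# the variable "year" is hidden in the column names Y1961, Y1962.
wide = pd.DataFrame({
    'Area': ['A', 'A', 'B'],
    'Item': ['Wheat', 'Rice', 'Wheat'],
    'Y1961': [10.0, 5.0, 7.0],
    'Y1962': [12.0, 6.0, 8.0],
})

# Wide layout: the year columns must be listed explicitly, and the year
# survives only as the strings 'Y1961', 'Y1962' in the result's index.
wide_totals = wide[['Y1961', 'Y1962']].sum()

# The same data in tidy form: one row per (Area, Item, year) observation.
tidy_toy = pd.DataFrame({
    'Area': ['A', 'A', 'B', 'A', 'A', 'B'],
    'Item': ['Wheat', 'Rice', 'Wheat'] * 2,
    'year': [1961] * 3 + [1962] * 3,
    'quantity': [10.0, 5.0, 7.0, 12.0, 6.0, 8.0],
})

# Tidy layout: year is an ordinary column, so grouping on it is direct.
tidy_totals = tidy_toy.groupby('year')['quantity'].sum()
print(wide_totals.tolist(), tidy_totals.tolist())  # [22.0, 26.0] [22.0, 26.0]
```

Both give the same totals, but in the tidy layout `year` can be filtered, grouped or plotted like any other column, with no string parsing of headers.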

For a detailed tutorial on tidying data in Python, check out this [blog article](http://www.jeannicholashould.com/tidy-data-in-python.html).

For this dataset, we want to explore how produce volume changes over the years, so let's create a tidy dataset that has `year` as one of its columns. We will use `pandas.melt` to do this.

`pandas.melt` takes observations that are spread across the columns `Y1961, Y1962, ...` and melts them down into a single column, with one row per original column. However, we don't want to lose the metadata (`Area`, `latitude`, etc.) shared by those observations. By passing those columns as `id_vars`, their values are repeated as many times as needed to stay attached to their observations.
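A minimal sketch of that behaviour on a hypothetical two-row slice in the FAO.csv layout (the area names and numbers are invented): two rows times three year columns become six rows, and each row's `Area` and `latitude` are repeated once per year.

```python
import pandas as pd

# Hypothetical slice mimicking FAO.csv (names and values invented)
wide = pd.DataFrame({
    'Area': ['Country A', 'Country B'],
    'latitude': [10.0, 20.0],
    'Y1961': [1.0, 2.0],
    'Y1962': [3.0, 4.0],
    'Y1963': [5.0, 6.0],
})

long = pd.melt(
    wide,
    id_vars=['Area', 'latitude'],            # metadata kept and repeated
    value_vars=['Y1961', 'Y1962', 'Y1963'],  # columns melted into rows
    var_name='year',
    value_name='produce_quantity')

print(long.shape)  # (6, 4)
print(long)
```

`melt` emits all rows for `Y1961` first, then `Y1962`, and so on, so the `year` column still holds the raw header strings; converting them to real dates is a separate step.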

In [4]:
# The year columns (Y1961 ... Y2013) start at column position 10
year_list = list(df.iloc[:, 10:].columns)
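
`df.iloc[:, 10:]` relies on the year columns starting at position 10. If the column layout might change, a name-based selection is sturdier. The sketch below assumes every year column is named `Y` followed by four digits (as in the header above) and uses a hypothetical empty `df_demo` with FAO-like column names:

```python
import pandas as pd

# Hypothetical columns mimicking FAO.csv: metadata first, then year columns
df_demo = pd.DataFrame(columns=['Area Abbreviation', 'Area', 'Item', 'latitude',
                                'longitude', 'Y1961', 'Y1962', 'Y2013'])

# Keep only columns named 'Y' + exactly four digits, wherever they sit
year_list_by_name = list(df_demo.filter(regex=r'^Y\d{4}$').columns)
print(year_list_by_name)  # ['Y1961', 'Y1962', 'Y2013']
```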

In [5]:
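# Melt the wide Y1961 ... Y2013 columns into (year, produce_quantity) rows,
# repeating the id_vars metadata for every year
tidy = pd.melt(
    df,
    id_vars=[
        'Area', 'Area Abbreviation', 'Element', 'Item', 'latitude', 'longitude'
    ],
    value_vars=year_list,
    var_name='year',
    value_name='produce_quantity')
# 'Y1961' -> Timestamp('1961-12-15'); the day within the year is arbitrary
tidy['year'] = pd.to_datetime(tidy['year'].str[1:] + '-12-15')
# Treat missing quantities as zero production
tidy['produce_quantity'] = tidy['produce_quantity'].fillna(0)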

tidy.head()

| | Area | Area Abbreviation | Element | Item | latitude | longitude | year | produce_quantity |
|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | Food | Wheat and products | 33.94 | 67.71 | 1961-12-15 | 1928.0 |
| 1 | Afghanistan | AFG | Food | Rice (Milled Equivalent) | 33.94 | 67.71 | 1961-12-15 | 183.0 |
| 2 | Afghanistan | AFG | Feed | Barley and products | 33.94 | 67.71 | 1961-12-15 | 76.0 |
| 3 | Afghanistan | AFG | Food | Barley and products | 33.94 | 67.71 | 1961-12-15 | 237.0 |
| 4 | Afghanistan | AFG | Feed | Maize and products | 33.94 | 67.71 | 1961-12-15 | 210.0 |


# Data Visualization with 
![](https://i.imgur.com/gvrbAjo.png)


We will first make a HoloViews object called a `Dataset`, which declares the independent variables (called key dimensions, or `kdims`, in HoloViews) and the dependent variables (called value dimensions, or `vdims`) that we want to work with:

In [12]:
dataset = hv.Dataset(
    tidy,
    kdims=[('Area', 'Country'), 'Area Abbreviation', 'Element', 'Item',
           'latitude', 'longitude', ('year', 'Year')],
    vdims=[('produce_quantity', 'Quantity')])

Here we've used the optional tuple syntax `(name, label)` to give some of the kdims and vdims a more meaningful label, while the remaining kdims keep their original names.

## Plot for annual produce over years

In [13]:
# Fix the quantity axis to 0..global maximum so every frame shares one scale
volume_range = 0, tidy.produce_quantity.max()
countries = [
    'Brazil', 'United States of America', 'India', 'China, mainland', 'Japan',
    'Russian Federation', 'France', 'Germany', 'Turkey', 'Mexico', 'Thailand',
    'Indonesia', 'Viet Nam'
]

In [14]:
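# Keep the selected countries and build one year-vs-quantity Curve per
# (Element, Item, Area) combination; dynamic=True renders frames on demand
curve = dataset.select(Area=countries).to(
    hv.Curve,
    kdims=['year'],
    vdims=['produce_quantity'],
    groupby=['Element', 'Item', 'Area'],
    dynamic=True).redim.range(produce_quantity=volume_range)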
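
Conceptually, `dataset.select(Area=countries)` keeps the rows whose `Area` is in the list, and `groupby=['Element', 'Item', 'Area']` yields one curve per distinct combination of those columns, each reachable from a widget. A rough pandas analogy on a toy frame (names and values invented) shows which curves that produces; it illustrates the idea and is not how HoloViews implements it:

```python
import pandas as pd

# Toy tidy data (invented values)
toy = pd.DataFrame({
    'Area': ['Brazil', 'Brazil', 'Brazil', 'Japan', 'Chad'],
    'Element': ['Food', 'Feed', 'Food', 'Food', 'Food'],
    'Item': ['Wheat', 'Wheat', 'Rice', 'Rice', 'Rice'],
    'year': [1961, 1961, 1961, 1961, 1961],
    'produce_quantity': [1.0, 2.0, 3.0, 4.0, 5.0],
})
countries = ['Brazil', 'Japan']

# Row filter, analogous to dataset.select(Area=countries)
selected = toy[toy['Area'].isin(countries)]

# Each group holds the data behind one year-vs-quantity curve
curves = {key: grp[['year', 'produce_quantity']]
          for key, grp in selected.groupby(['Element', 'Item', 'Area'])}
print(sorted(curves))
```

Chad is filtered out, leaving four (Element, Item, Area) keys, one per curve.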

In [15]:
Mod(xrotation=60, width=700, height=600,
    title='Annual Produce over Years').apply(
        curve.redim.unit(produce_quantity='Kilotonne'))

## Food v. Feed

In [34]:
element_agg = dataset.aggregate(dimensions=['Element'], function=np.sum)
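
`dataset.aggregate(dimensions=['Element'], function=np.sum)` collapses every other key dimension and sums the value dimension within each `Element`. On a plain DataFrame, a `groupby` should produce the same kind of totals; a sketch on invented numbers:

```python
import pandas as pd

# Toy tidy data (invented values)
toy = pd.DataFrame({
    'Element': ['Food', 'Feed', 'Food', 'Feed'],
    'Area': ['X', 'X', 'Y', 'Y'],
    'produce_quantity': [10.0, 4.0, 6.0, 1.0],
})

# Sum produce_quantity over all other columns, one total per Element
element_totals = toy.groupby('Element', as_index=False)['produce_quantity'].sum()
print(element_totals)
```

On the real data, `tidy.groupby('Element')['produce_quantity'].sum()` should match the bar heights, which makes a quick sanity check.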

In [37]:
Mod(title='Global Produce Quantity per Element').apply(
    hv.Bars(element_agg).redim.unit(produce_quantity='Kilotonne'))

In [88]:
%%opts HeatMap [xrotation=90 width=730 height=1300 colorbar=True tools=['hover'] toolbar='above'] (cmap='viridis')
ds = hv.Dataset(tidy, kdims=['year', 'Item', 'Area'])
hmap = ds.select(Area=countries).to(
    hv.HeatMap,
    vdims=['produce_quantity'],
    dynamic=True).redim.unit(produce_quantity='Kilotonne')
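
Each frame of this HeatMap is essentially an `Item` by `year` grid for one country. To inspect those numbers directly, a pandas pivot gives a table of the same shape. This is a sketch on invented data. Note that `Element` is not a key dimension of `ds`, so Food and Feed rows for the same item and year land in the same cell; summing them below is our own choice, and HoloViews may aggregate such duplicates differently.

```python
import pandas as pd

# Toy tidy data for one country (invented values)
toy = pd.DataFrame({
    'Area': ['Japan'] * 4,
    'Item': ['Rice', 'Rice', 'Barley', 'Barley'],
    'Element': ['Food', 'Food', 'Food', 'Feed'],
    'year': [1961, 1962, 1961, 1961],
    'produce_quantity': [9.0, 8.0, 2.0, 3.0],
})

# Rows = Item, columns = year; aggfunc='sum' merges Food and Feed in one cell
grid = toy[toy['Area'] == 'Japan'].pivot_table(
    index='Item', columns='year', values='produce_quantity', aggfunc='sum')
print(grid)
```

Combinations with no data (Barley in 1962 here) come out as `NaN`, which a heatmap would show as an empty cell.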

In [89]:
hmap

In [90]:
ds = hv.Dataset(tidy, kdims=['year', 'Item', 'Area'])
# dynamic=False precomputes every frame into a HoloMap instead of a DynamicMap
hmap = ds.select(Area=countries).to(
    hv.HeatMap,
    vdims=['produce_quantity'],
    dynamic=False)

In [80]:
Mod(width=700, xrotation=90, height=1500).apply(hmap)