# Introduction to Plotly

Let's start by importing the necessary libraries. 

Plotly can be used in offline and online modes. In online mode, every graph you create is saved to your Plotly account. However, the number of graphs you can store for free is limited, so you either have to keep deleting graphs from your account or pay.

The good news is that you can use offline mode, where your graphs are only displayed in your Jupyter Notebook and are not saved online. There is no limit on the number of graphs you can create in offline mode.

**Note:** starting with Plotly version 4.0, released in July 2019, offline mode is the default and the online functionality has moved to a separate package called `chart-studio` (more on Plotly version 4.0 [here](https://medium.com/plotly/plotly-py-4-0-is-here-offline-only-express-first-displayable-anywhere-fc444e5659ee)). So, depending on your Plotly version, you might not need to enable offline mode explicitly.
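If you are not sure which version you have installed, a small check can tell you. This is a sketch that reads the installed version with the standard library's `importlib.metadata` (Python 3.8+); the helper names `plotly_major_version` and `offline_is_default` are our own, not part of Plotly.

```python
from importlib.metadata import version, PackageNotFoundError


def plotly_major_version():
    """Return the major version of the installed plotly package, or None if it is missing."""
    try:
        return int(version("plotly").split(".")[0])
    except PackageNotFoundError:
        return None


def offline_is_default(major):
    """Plotly 4.0+ renders offline by default; earlier versions need offline mode set up explicitly."""
    return major is not None and major >= 4


major = plotly_major_version()
print("plotly major version:", major)
print("offline mode is the default:", offline_is_default(major))
```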

In [1]:
# pandas and numpy to work with datasets
import pandas as pd
import numpy as np

# plotly imports for Plotly visualizations
# (plotly.plotly, used for online mode, moved to the separate chart-studio
#  package in Plotly 4.0 and importing it fails there; offline plots don't need it)
import plotly.graph_objs as go
# initiate Plotly offline mode
from plotly.offline import iplot, init_notebook_mode
init_notebook_mode(connected=True)

# cufflinks to create Plotly viz with simplified syntax
import cufflinks as cf
# initiate cufflinks offline mode
cf.go_offline(connected=True)

## Dataset exploration

We will use the OECD meat consumption dataset from [Data World](https://data.world/oecd/meat-consumption).


**Summary** 

Meat consumption is related to living standards, diet, livestock production and consumer prices, as well as macroeconomic uncertainty and shocks to GDP. Compared to other commodities, meat is characterised by high production costs and high output prices. Meat demand is associated with higher incomes and a shift, due to urbanisation, towards food consumption patterns that favour more protein from animal sources. While the global meat industry provides food and a livelihood for billions of people, it also has significant environmental and health consequences for the planet. This indicator is presented for beef and veal, pig, poultry, and sheep. Meat consumption is measured in thousand tonnes of carcass weight (except for poultry, which is expressed as ready-to-cook weight) and in kilograms of retail weight per capita. The carcass weight to retail weight conversion factors are 0.7 for beef and veal, 0.78 for pigmeat, and 0.88 for both sheep meat and poultry meat.
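The conversion factors are plain multipliers: retail weight = carcass weight × factor. A minimal sketch, keyed by the dataset's `SUBJECT` codes (the dictionary and function names are our own), assuming the 0.88 poultry factor applies to the ready-to-cook weight in which poultry is reported:

```python
# Carcass-to-retail conversion factors quoted in the dataset summary
CARCASS_TO_RETAIL = {
    "BEEF": 0.7,
    "PIG": 0.78,
    "SHEEP": 0.88,
    "POULTRY": 0.88,  # assumption: applied to ready-to-cook weight
}


def carcass_to_retail(carcass_weight, meat_type):
    """Convert a carcass weight into retail weight for the given meat type."""
    return carcass_weight * CARCASS_TO_RETAIL[meat_type]


# 100 kg of carcass weight for each meat type, expressed as retail weight
for meat in CARCASS_TO_RETAIL:
    print(meat, round(carcass_to_retail(100, meat), 2))
```

This converts weights only. The `KG_CAP` figures are also divided by population, so applying these factors to `THND_TONNE` values alone would not reproduce the `KG_CAP` values.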



In [2]:
# loading dataset into pandas dataframe
df = pd.read_csv('meat_consumption_clean.csv')

In [3]:
# print out first 5 rows of the dataset
df.head()

|   | LOCATION | INDICATOR | SUBJECT | MEASURE | FREQUENCY | TIME | Value | Flag Codes |
|---|---|---|---|---|---|---|---|---|
| 0 | AUS | MEATCONSUMP | BEEF | KG_CAP | A | 1990 | 4e-06 |  |
| 1 | AUS | MEATCONSUMP | BEEF | KG_CAP | A | 1991 | 27.808401 |  |
| 2 | AUS | MEATCONSUMP | BEEF | KG_CAP | A | 1992 | 26.278166 |  |
| 3 | AUS | MEATCONSUMP | BEEF | KG_CAP | A | 1993 | 26.244478 |  |
| 4 | AUS | MEATCONSUMP | BEEF | KG_CAP | A | 1994 | 25.541244 |  |


In [4]:
# get the number of rows and columns in the dataset
df.shape

(9020, 8)

In [5]:
# get a sense of values in each column of the dataset
df.describe(include='all')

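|   | LOCATION | INDICATOR | SUBJECT | MEASURE | FREQUENCY | TIME | Value | Flag Codes |
|---|---|---|---|---|---|---|---|---|
| count | 9020 | 9020 | 9020 | 9020 | 9020 | 9020.0 | 9020.0 | 0.0 |
| unique | 39 | 1 | 4 | 2 | 1 |  |  |  |
| top | NGA | MEATCONSUMP | POULTRY | THND_TONNE | A |  |  |  |
| freq | 232 | 9020 | 2255 | 4524 | 9020 |  |  |  |
| mean |  |  |  |  |  | 2004.041242 | 2014.624 |  |
| std |  |  |  |  |  | 8.347088 | 9320.713 |  |
| min |  |  |  |  |  | 1990.0 | 7.505681e-08 |  |
| 25% |  |  |  |  |  | 1997.0 | 4.567496 |  |
| 50% |  |  |  |  |  | 2004.0 | 23.25614 |  |
| 75% |  |  |  |  |  | 2011.0 | 391.6184 |  |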


In [6]:
# get the list of unique countries
df.LOCATION.unique()

array(['AUS', 'CAN', 'JPN', 'KOR', 'MEX', 'NZL', 'TUR', 'USA', 'ARG',
       'BRA', 'CHL', 'CHN', 'COL', 'EGY', 'ETH', 'IND', 'IDN', 'IRN',
       'ISR', 'KAZ', 'MYS', 'NGA', 'PAK', 'PRY', 'PER', 'PHL', 'RUS',
       'SAU', 'ZAF', 'THA', 'UKR', 'VNM', 'WLD', 'EU27', 'OECD', 'BRICS',
       'NOR', 'CHE', 'GBR'], dtype=object)

In [7]:
# get the list of unique meat types
df.SUBJECT.unique()

array(['BEEF', 'PIG', 'POULTRY', 'SHEEP'], dtype=object)

In [8]:
# get the list of unique measures
df.MEASURE.unique()

array(['KG_CAP', 'THND_TONNE'], dtype=object)

Since there are two different measures of meat consumption in this dataset, let's split it into two datasets:

- `df_total` with total meat consumption in thousand tonnes of carcass weight
- `df_per_cap` with meat consumption in kilograms of retail weight per capita

In [9]:
# split the dataset into 2 based on the measure
df_total = df.query('MEASURE == "THND_TONNE"')
df_per_cap = df.query('MEASURE == "KG_CAP"')
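Because `MEASURE` takes only the two values listed above, the two subsets should together cover the whole dataset. The sketch below applies the same `query` split to a tiny made-up dataframe (the values are invented for illustration), shows that `query` matches an equivalent boolean mask, and checks that no rows are lost. On the real data you could run the same length check with `df`, `df_total` and `df_per_cap`.

```python
import pandas as pd

# Tiny made-up dataframe with the same column names as the meat dataset
toy = pd.DataFrame({
    "LOCATION": ["AUS", "AUS", "CAN", "CAN"],
    "SUBJECT": ["BEEF", "BEEF", "PIG", "PIG"],
    "MEASURE": ["KG_CAP", "THND_TONNE", "KG_CAP", "THND_TONNE"],
    "TIME": [1990, 1990, 1990, 1990],
    "Value": [25.0, 2500.0, 22.0, 1100.0],
})

toy_total = toy.query('MEASURE == "THND_TONNE"')
toy_per_cap = toy.query('MEASURE == "KG_CAP"')

# query() selects the same rows as filtering with a boolean mask
assert toy_total.equals(toy[toy.MEASURE == "THND_TONNE"])

# every row lands in exactly one of the two subsets
assert len(toy_total) + len(toy_per_cap) == len(toy)
print(len(toy_total), len(toy_per_cap))
```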

## Building graphs with Plotly

The native syntax of Plotly includes 3 main components: 

- **Data**

The Data object defines the data that will be displayed in the graph. 

- **Layout**

The Layout object defines all other features of the graph, such as the title, axis titles, etc.

- **Figure** 

The Figure object brings together the Data and the Layout and creates the graph to be plotted. 
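To make the three components concrete, here is a sketch of the same structure written as plain Python dictionaries with made-up values: a `data` list of traces and a `layout` dictionary, combined into one figure dictionary. Plotly accepts figures in this form, so with Plotly installed you could render it with `go.Figure(fig_dict).show()`.

```python
# Data: a list of traces, each describing one series to draw
data = [
    {"type": "scatter", "x": [1990, 1991, 1992], "y": [10, 12, 15], "mode": "lines"},
]

# Layout: everything that is not data (title, axis titles, ...)
layout = {
    "title": {"text": "Example title"},
    "xaxis": {"title": {"text": "Year"}},
    "yaxis": {"title": {"text": "Value"}},
}

# Figure: brings the data and the layout together
fig_dict = {"data": data, "layout": layout}
print(sorted(fig_dict))
```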


You can find full Plotly documentation for Python [here](https://plot.ly/python/).

Let's start by building a simple line chart summarizing the evolution of total meat consumption by year.

In [10]:
# remove the OECD and BRICS aggregate rows
df_total_clean = df_total.query('LOCATION != "OECD" & LOCATION != "BRICS"')

# group by year
df_total_year = df_total_clean.groupby('TIME').Value.sum().round(0).reset_index()
df_total_year

|   | TIME | Value |
|---|---|---|
| 0 | 1990 | 218203.0 |
| 1 | 1991 | 238417.0 |
| 2 | 1992 | 260198.0 |
| 3 | 1993 | 277449.0 |
| 4 | 1994 | 292060.0 |
| 5 | 1995 | 302593.0 |
| 6 | 1996 | 303447.0 |
| 7 | 1997 | 320615.0 |
| 8 | 1998 | 332461.0 |
| 9 | 1999 | 341410.0 |
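The `LOCATION` list printed earlier also contains `WLD` and `EU27`, which, like `OECD` and `BRICS`, look like aggregates rather than individual countries. If so, they remain in `df_total_clean`, and the yearly sums would count some consumption more than once. A sketch of a stricter filter, assuming these four codes are the only aggregates (the `AGGREGATES` list is our assumption), uses `isin` with a negated mask; it is shown on made-up data:

```python
import pandas as pd

# Assumption: these LOCATION codes are aggregates rather than single countries
AGGREGATES = ["OECD", "BRICS", "WLD", "EU27"]

# made-up rows with the same column names as df_total
toy = pd.DataFrame({
    "LOCATION": ["AUS", "CAN", "WLD", "EU27", "OECD", "BRICS"],
    "TIME": [1990] * 6,
    "Value": [1.0, 2.0, 100.0, 50.0, 40.0, 30.0],
})

# keep only rows whose LOCATION is not one of the aggregate codes
countries_only = toy[~toy.LOCATION.isin(AGGREGATES)]

# same yearly summary as above, now over individual countries only
yearly = countries_only.groupby("TIME").Value.sum().round(0).reset_index()
print(yearly)
```

On the real data, the equivalent filter would be `df_total[~df_total.LOCATION.isin(AGGREGATES)]`.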


In [11]:
# save the split datasets to csv for further use
df_total_clean.to_csv('df_total.csv', index=False)
df_per_cap.to_csv('df_per_cap.csv', index=False)

Here is the full syntax for defining the whole figure as a single object:

In [12]:
fig = go.Figure(
        data=[
            go.Scatter(x=df_total_year.TIME, 
                       y=df_total_year.Value,
                       mode='lines')
        ],
        layout=go.Layout(
            title=dict(text='Total Meat Consumption Evolution'),
            xaxis=dict(title='Year'),
            yaxis=dict(title='Meat consumption, thousand tonnes')
        )
    )
fig.show()

Here is a more step-by-step way to define the same graph, using the `add_trace` and `layout.update` methods of the figure object.

In [13]:
fig = go.Figure()
# create a data trace 
fig.add_trace(go.Scatter(x=df_total_year.TIME, 
                         y=df_total_year.Value,
                         mode='lines'))
# edit the layout
fig.layout.update(title='Total Meat Consumption Evolution',
                   xaxis_title='Year',
                   yaxis_title='Meat consumption, thousand tonnes')
fig.show()

Now let's build a multi-line chart by adding a split by meat type to the previous graph.

In [29]:
# create a dataframe with meat consumption by type by year
df_year_type = df_total_clean.groupby(['TIME', 'SUBJECT']).Value.sum().round(0).reset_index()
df_year_type.head()

|   | TIME | SUBJECT | Value |
|---|---|---|---|
| 0 | 1990 | BEEF | 71089.0 |
| 1 | 1990 | PIG | 84457.0 |
| 2 | 1990 | POULTRY | 50556.0 |
| 3 | 1990 | SHEEP | 12101.0 |
| 4 | 1991 | BEEF | 74012.0 |


To make plotting easier, let's pivot this dataframe so that each row represents a year, each column represents a type of meat, and each cell holds the meat consumption in thousand tonnes.

In [30]:
# pivot the dataframe using pivot_table function
df_year_type_pivot = df_year_type.pivot_table(index='TIME', columns='SUBJECT', values='Value').reset_index()
df_year_type_pivot.head()

| SUBJECT | TIME | BEEF | PIG | POULTRY | SHEEP |
|---|---|---|---|---|---|
| 0 | 1990 | 71089.0 | 84457.0 | 50556.0 | 12101.0 |
| 1 | 1991 | 74012.0 | 89966.0 | 61594.0 | 12845.0 |
| 2 | 1992 | 82849.0 | 95298.0 | 68323.0 | 13728.0 |
| 3 | 1993 | 83594.0 | 105873.0 | 73748.0 | 14234.0 |
| 4 | 1994 | 86789.0 | 111632.0 | 78938.0 | 14700.0 |
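`pivot_table` aggregates rows that share the same index/column pair, using the mean by default. Because `df_year_type` already has a single row per year and meat type, each "mean" is just that row's value, so `pivot` would give the same table here. A sketch on made-up numbers:

```python
import pandas as pd

# long format: one row per (TIME, SUBJECT) pair, values made up
long_df = pd.DataFrame({
    "TIME": [1990, 1990, 1991, 1991],
    "SUBJECT": ["BEEF", "PIG", "BEEF", "PIG"],
    "Value": [10.0, 20.0, 11.0, 21.0],
})

# wide format: one row per year, one column per meat type
wide = long_df.pivot_table(index="TIME", columns="SUBJECT", values="Value")

# with unique (TIME, SUBJECT) pairs, pivot() produces the same table
assert wide.equals(long_df.pivot(index="TIME", columns="SUBJECT", values="Value"))
print(wide.reset_index())
```

If the long table did contain duplicate pairs, `pivot` would raise an error, while `pivot_table` would silently average them, which is worth keeping in mind when choosing between the two.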


In [16]:
fig = go.Figure()
# create a data trace for each type of meat and add them to figure one by one
fig.add_trace(go.Scatter(x=df_year_type_pivot.TIME, 
                         y=df_year_type_pivot.BEEF,
                         mode='lines+markers',
                         name='beef'))
fig.add_trace(go.Scatter(x=df_year_type_pivot.TIME, 
                         y=df_year_type_pivot.PIG,
                         mode='lines+markers',
                         name='pig'))
fig.add_trace(go.Scatter(x=df_year_type_pivot.TIME, 
                         y=df_year_type_pivot.POULTRY,
                         mode='lines+markers',
                         name='poultry'))
fig.add_trace(go.Scatter(x=df_year_type_pivot.TIME, 
                         y=df_year_type_pivot.SHEEP,
                         mode='lines+markers',
                         name='sheep'))
# edit the layout
fig.layout.update(title='Meat Consumption Evolution by Meat Type',
                   xaxis_title='Year',
                   yaxis_title='Meat consumption, thousand tonnes')
fig.show()
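The four `add_trace` calls above differ only in the column they read and the trace name, so the traces can also be built in a loop over the meat-type columns. The sketch below builds them as plain trace dictionaries from a made-up stand-in for `df_year_type_pivot` (the helper `line_traces` is our own). With Plotly installed, you could then pass each one to `fig.add_trace(trace)` or build the figure with `go.Figure(data=traces)`.

```python
import pandas as pd


def line_traces(pivot_df, x_col="TIME"):
    """Build one 'lines+markers' scatter trace dict per column except x_col."""
    traces = []
    for col in pivot_df.columns:
        if col == x_col:
            continue
        traces.append({
            "type": "scatter",
            "x": pivot_df[x_col].tolist(),
            "y": pivot_df[col].tolist(),
            "mode": "lines+markers",
            "name": col.lower(),  # e.g. 'BEEF' -> 'beef', matching the names used above
        })
    return traces


# made-up stand-in for df_year_type_pivot
demo = pd.DataFrame({
    "TIME": [1990, 1991],
    "BEEF": [10.0, 11.0],
    "PIG": [20.0, 21.0],
})

traces = line_traces(demo)
for t in traces:
    print(t["name"], t["y"])
```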

## Building Plotly graphs with Cufflinks

The main advantage of using Cufflinks over native Plotly syntax is that it is much less verbose and can be used directly on pandas dataframe objects. You can find full Cufflinks documentation [here](https://plot.ly/ipython-notebooks/cufflinks/).

Let's recreate the same line chart using cufflinks.

In [17]:
# Option 1: we can use iplot directly on the pandas series resulting from the groupby
df_total_clean.groupby('TIME').Value.sum().round(0).iplot(kind='line', 
                                                    title='Total Meat Consumption Evolution',
                                                    xTitle='Year',
                                                    yTitle='Meat consumption, thousand tonnes')

In [18]:
# Option 2: we can use iplot on the df_total_year pandas Dataframe
# In this case we need to specify the x and y axis
df_total_year.iplot(kind='line', 
                    x='TIME',
                    y='Value',
                    title='Total Meat Consumption Evolution',
                    xTitle='Year',
                    yTitle='Meat consumption, thousand tonnes')

For a multi-line graph, Cufflinks syntax is also much less verbose than native Plotly syntax:

In [19]:
# Option 1: build a graph directly on the result of pivoting the dataframe

df_year_type.pivot_table(index='TIME', columns='SUBJECT', values='Value').\
iplot(kind='line', 
      mode='lines+markers',
      size=6,
      title='Meat Consumption Evolution by Meat Type',
      xTitle='Year',
      yTitle='Meat consumption, thousand tonnes')

In [20]:
# Option 2: build a graph based on the df_year_type_pivot dataframe

df_year_type_pivot.iplot(kind='line',
                         mode='lines+markers',
                         size=6,
                         x='TIME',
                         title='Meat Consumption Evolution by Meat Type',
                         xTitle='Year',
                         yTitle='Meat consumption, thousand tonnes')

### PRACTICE TIME

Create a bar chart showing total meat consumption in kg per capita for each country (one bar per country) in 2018, using either native Plotly syntax or Cufflinks.

**BONUS POINTS!** Turn the bar chart into a stacked bar chart by adding a split by type of meat.

[Here](https://plot.ly/python/bar-charts/) is the Plotly documentation for creating bar charts.
[Here](https://plot.ly/ipython-notebooks/cufflinks/#bar-charts) is the Cufflinks documentation for creating bar charts.

Remember to filter the dataframe to 2018 first and then summarize meat consumption by country (hint: you can use the pandas `groupby` function for that).

In [21]:
# start by filtering the df_per_cap dataset by year 2018


In [22]:
# continue by summarizing meat consumption by country


**Cufflinks option**

In [23]:
# create a bar chart


In [24]:
# for the bars to appear in descending order of meat consumption,
# sort the pandas series / dataframe by meat consumption
# (you can use the sort_values method for that)


In [25]:
# create stacked bar chart

# start by pivoting the 2018 dataframe to have types of meat as columns and countries as rows


# if you want the bars to appear ordered by total meat consumption, add a total column and sort the dataframe by it
# you can use the .sum() method providing the necessary axis as an argument


# drop the total column after the sorting so that it doesn't appear on the graph


# create a stacked bar chart (use barmode='stack')


**Plotly Option**

In [26]:
# create a bar chart


In [27]:
# create a sorted bar chart


In [28]:
# create stacked bar chart

# start by pivoting the 2018 dataframe to have types of meat as columns and countries as rows


# if you want the bars to appear ordered by total meat consumption, add a total column and sort the dataframe by it
# you can use the .sum() method providing the necessary axis as an argument


# drop the total column after the sorting so that it doesn't appear on the graph


# create a stacked bar chart (use barmode='stack')
