## MGT-499 Statistics and Data Science - Koala

Coffee is a beverage that is enjoyed by millions of people around the world. It is not only a tasty and invigorating drink, but also a cultural phenomenon that has its own rituals and lifestyles associated with it. From the rich, full-bodied aroma of a freshly brewed cup of coffee to the satisfying feeling of sipping a warm beverage on a cold morning, there are many reasons why coffee is such a pleasure. A large number of people we know drink coffee, whether it's in the morning at IMD to start their day or late at night in the library to finish a Statistics and Data Science assignment. Personally, we enjoy it, and we're sure we're not alone. Unfortunately, obtaining a cup of coffee will become more difficult and costly in the near future. Coffee is one of the crops threatened by climate change.

Global warming, deforestation, disease, and pests are all contributing to the loss of coffee crops, and scientists warn that if conservation, monitoring, and seed preservation measures are not implemented, one of the world's most beloved drinks may become extinct. Beyond the environmental consequences, coffee is a $70-billion-a-year industry that is primarily supplied by small-scale farms in Africa, Asia and Latin America. Not only is the supply chain jeopardized, but so are the livelihoods of the estimated 25 million farmers who rely on coffee for a living. Furthermore, countries that rely on coffee as a key economic sector may see their GDP fall substantially year after year.

But **what is the impact of climate change on coffee production?**

In this project, we will answer this question using Python and datasets on the production, consumption, and price of coffee from 1990 to 2019. These datasets come from the International Coffee Organization (ICO). We will also use the Quality of Government (QoG) Institute Standard Dataset, which is compiled by experts on public administration around the world. It is available as an individual dataset and as an aggregated dataset covering more than 100 countries. It contains historical temperature and rainfall data as well as many other environmental indicators.

In [None]:
%pip install chart_studio



In [None]:
# Import here what you need
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import chart_studio.plotly as py
import json
import plotly
from plotly.offline import iplot



## Coffee Production

### Data Import

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
coffee_production = pd.read_excel('/content/drive/MyDrive/Impacts of climate change on coffee/Database/Coffee Data/1a - Total production.xlsx', header = 3)
coffee_production.head(10)


In [None]:
# display the data types
print(coffee_production.dtypes.unique())


### Data Preparation

The data needs to be cleaned prior to the analysis and visualization.

For the production data, we are going to reformat the dataset, remove some rows that are not of interest to us, and change the unit of the data. The source reports production in thousands of 60 kg bags; we convert it to a more common unit, tons, by multiplying by 60 (one thousand 60 kg bags weigh 60 tons).
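
As a minimal sketch of the unit conversion and column relabeling described above, the example below uses a small made-up table. The country names, the values, and the `1990/91`-style column labels are assumptions for illustration only, not real ICO figures.

```python
import pandas as pd

# Hypothetical toy data in the source unit (thousand 60 kg bags); not real ICO figures.
toy = pd.DataFrame(
    {"1990/91": [1000.0, 250.0], "1991/92": [1100.0, 300.0]},
    index=pd.Index(["Country A", "Country B"], name="Country"),
)

# 1 thousand bags x 60 kg per bag = 60,000 kg = 60 tons
BAGS_TO_TONS = 60

# Keep only the first four characters of each column label, then convert the unit.
toy_tons = toy.rename(columns=lambda label: label[:4]) * BAGS_TO_TONS
print(toy_tons)
```

In this toy table, the 1,000 thousand bags of "Country A" become 60,000 tons.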

In [None]:
# drop the blank column in the dataframe
coffee_production.drop('Unnamed: 1', axis=1, inplace=True)
# drop the rows that contain NaN values
coffee_production.dropna(inplace=True)
# rename the first column, which holds the country names
coffee_production.rename(columns={'Crop year':'Country'}, inplace=True)
# use the country column as the index
coffee_production.set_index('Country', inplace=True)
# keep only the first year of each column label, dropping the "/" and what follows
coffee_production.rename(columns=lambda x: x[:4], inplace=True)
# remove rows that are either all 0 or not valuable for us
coffee_production.drop(['April group', 'October group','July group', 'Equatorial Guinea'], inplace=True)
# convert from thousand 60 kg bags to tons (1,000 bags x 60 kg = 60 tons)
coffee_production = coffee_production * 60
coffee_production.head(5)


After reformatting, we create two objects from the original dataframe to make the analysis easier: one with the production of each country in 2019 only, and one with the total production for each year.
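
The split can be sketched on a toy table, assuming (as in the prepared data) that countries are the index, years are the columns, and a `Total` row holds the yearly sum. All names and values below are made up for illustration.

```python
import pandas as pd

# Hypothetical toy table: countries as rows, years as columns, plus a 'Total' row.
toy = pd.DataFrame(
    {"2018": [10.0, 5.0, 15.0], "2019": [12.0, 6.0, 18.0]},
    index=pd.Index(["Country A", "Country B", "Total"], name="Country"),
)

# Per-country values for a single year, excluding the aggregate row.
latest_year = toy["2019"].drop("Total")

# Transposing makes the years the index, so selecting 'Total' gives one value per year.
yearly_total = toy.T["Total"]

print(latest_year)
print(yearly_total)
```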

In [None]:
# generation of a series of just values from 2019.
coffee_production_2019 = coffee_production['2019'].drop('Total')
coffee_production_2019.head(5)

In [None]:
# transpose to change the year into the index and then select just the total
coffee_production_total = coffee_production.T['Total']
coffee_production_total.head(5)

### Data Analysis

To analyze coffee production, consumption, and price, we are going to generate a series of maps that show the worldwide distribution of these factors, as well as line charts that show how they changed from 1990 to 2019.

Before generating the maps, we import the GeoJSON file needed for the choropleth map and compute a table of each country's share of total production, to better understand this factor.
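
A minimal sketch of such a share table, using made-up production values (the country names and numbers are assumptions, not ICO data): each value is divided by the column total and expressed as a percentage, then sorted in descending order.

```python
import pandas as pd

# Hypothetical production values in tons; not real ICO figures.
production = pd.Series({"Country A": 500.0, "Country B": 300.0, "Country C": 200.0})

# Share of each country in the total, as a percentage, largest first.
share_pct = (production / production.sum() * 100).sort_values(ascending=False)
print(share_pct)
```

By construction the shares add up to 100, which is a quick sanity check worth running on the real data as well.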


In [None]:
# load the GeoJSON with the country borders; the with-block closes the file afterwards
with open('/content/drive/MyDrive/Impacts of climate change on coffee/Database/countries.geojson') as json_file:
    countries = json.load(json_file)

In [None]:
# share of each country in the total production
(coffee_production_2019/coffee_production_2019.sum()*100).sort_values(ascending=False).head(10)


In [None]:
# custom color scale from light beige to dark brown
palette = plotly.colors.make_colorscale(["#ebe1d7","#f66749","#9b3727","#400604"])

fig = px.choropleth_mapbox(coffee_production_2019, geojson=countries, color=coffee_production_2019,
                           locations=coffee_production_2019.index, featureidkey="properties.ADMIN",
                           center={"lat": 23, "lon": 3}, labels={'color':'Production<br>(Tons)'},
                           mapbox_style="carto-positron", zoom=.2, 
                           color_continuous_scale=palette, custom_data=[coffee_production_2019.index, coffee_production_2019],
                           opacity=.9,  title='World Coffee Production in 2019')

fig.update_layout(margin={"b":30,'l':30,'t':50}, width=660, height=450)

fig.update_traces(
    hovertemplate='<b>%{customdata[0]}</b>' +  "<br>Production: %{customdata[1]:,.2f} Tons<extra></extra>")

fig.update_layout(title={
                        'y':0.95,
                        'x':0.5,
                        'xanchor': 'center',
                        'yanchor': 'top'})

fig.show(renderer="colab")

The first thing that leaps out on this map is that Brazil is by far the largest coffee grower in the world, producing almost 3.5 million tons of coffee in 2019, accounting for 35% of total global production.

When we started working on this project, we suspected that Brazil was the world's largest coffee producer because of the popularity and quality of its blends; nevertheless, we were surprised to learn that Vietnam was the second largest. In 2019, Vietnam produced over 1.8 million tons of coffee, accounting for 18.5% of total coffee production. Vietnam also produces one of the world's most costly coffees, Weasel Coffee, which is made from coffee beans that have been eaten and partially digested by a civet before being roasted.

In [None]:
top10_coffee_producers = coffee_production_2019.sort_values(ascending=False).iloc[0:10]

fig = px.bar(x=top10_coffee_producers.index, y=top10_coffee_producers, title='Top Coffee Producers in 2019')


fig.update_traces(
    hovertemplate='<b>%{y:,.2f}<extra></extra>',
    marker={'color':"#400604"})

fig.update_layout(margin={'r':50,'t':50})

fig.update_layout(xaxis={'title':''},
                  yaxis={'title':'Coffee Production, Tons'},
                  title={
                        'y':0.95,
                        'x':0.5,
                        'xanchor': 'center',
                        'yanchor': 'top'},
                  plot_bgcolor="#F9F9F9",
                  width=670,
                  height=450)

fig.show(renderer = "colab")

In [None]:
top5_list = ['Brazil', 'Vietnam', 'Colombia', 'Indonesia', 'Ethiopia']
top5_coffee_producers = coffee_production.loc[top5_list]
top5_coffee_producers.head(5)

In [None]:
# Create figure object
fig = go.Figure()

# Add one line per country (each row of the dataframe)
for country, values in top5_coffee_producers.iterrows():
    fig.add_trace(go.Scatter(x=top5_coffee_producers.columns, y=values.values, name=str(country)))
    
fig.update_layout(
    title="Production of Coffee from 1990 to 2019",
    xaxis_title="Year",
    yaxis_title="Coffee Production, Tons",
    legend_title="Country")

# Show figure
fig.show(renderer='colab')

This rate of production is thought to be unsustainable in the coming years, as a so-called "Coffee Crisis" is expected as a result of climate change. As temperatures continue to rise, the climatic conditions needed to grow coffee are jeopardized, particularly for Arabica, which requires moderate temperatures to thrive.