# Coffee Industry

## 1. Business Understanding

### Topic Introduction

Coffee is not just a popular beverage; it’s a crucial global commodity that supports the livelihoods of millions of small producers in developing countries. Every day, over 2.25 billion cups of coffee are consumed worldwide—enough to circle the globe 12 times if lined up in cups. Over 90% of coffee production occurs in developing nations, particularly in South America, yet most consumption happens in industrialized economies.

With 25 million small producers depending on coffee cultivation for their income, this industry involves more people than the entire population of Australia. In Brazil, which produces almost a third of the world’s coffee, over five million people are employed in the coffee sector—a number nearly equivalent to the entire population of Norway. Unlike sugar cane or cattle farming, coffee cultivation is labour-intensive and cannot be fully automated, requiring frequent human attention.

Unroasted, or green, coffee beans are among the most widely traded agricultural commodities in the world. Coffee futures are traded on commodity exchanges such as those in New York, which underscores coffee's significance in international trade. Major European trading and processing hubs, such as Hamburg and Trieste, further highlight its global importance.

### Project Objective

This project aims to `analyze the global coffee market`, focusing on `Colombia and key industry players`. It will provide small exporters with insights into global trade dynamics and `opportunities in non-traditional markets`, supporting the growth and competitiveness of Colombian coffee on the international stage.

## 2. Data Mining

### Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

### Sources

To achieve the project’s objective, I will first conduct a broad analysis of the `global coffee market`, examining key metrics like exports, imports, and consumption trends at a macro level. Then, the focus will shift to the `export performance` of key players, with particular emphasis on comparing Colombia’s role to its major competitors. This approach will help uncover opportunities and provide strategic insights into market dynamics.

#### Global Coffee Market

The Coffee Market refers specifically to the buying and selling of coffee as a commodity. It encompasses the economic aspects of coffee trade, including supply and demand, pricing, market trends, and the roles of exporters, importers, and consumers. The coffee market also includes the financial aspects, such as commodity exchanges and futures trading.

To contextualise this, let's explore the exchange of coffee products (e.g., green coffee beans, roasted coffee) and the economic forces that drive these transactions on a global scale.

##### Kaggle: Coffee Dataset

For a general view of the sector, I will work with a **Kaggle Dataset:** [Coffee Dataset](https://www.kaggle.com/datasets/michals22/coffee-dataset/data?select=Coffee_domestic_consumption.csv). This resource contains information about coffee production and consumption.

All data has been extracted from the official [International Coffee Organization](https://icocoffee.org/) (ICO) website.

**Datasets:**
- Coffee_domestic_consumption.csv
- Coffee_importers_consumption.csv
- Coffee_export.csv
- Coffee_import.csv
- Coffee_re_export.csv
- Coffee_green_coffee_inventorie.csv
- Coffee_production.csv

**Units:**

All statistics are given in kilograms (1 kg = 1,000 g).
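
The ICO itself usually reports volumes in thousands of 60 kg bags, so converting between the two units can help when cross-checking figures against ICO publications. A minimal sketch, assuming the standard 60 kg bag; the names `KG_PER_BAG`, `kg_to_bags`, and `bags_to_kg` are hypothetical and are not part of the dataset or of this notebook:

```python
KG_PER_BAG = 60  # assumption: the standard 60 kg green-coffee bag used in ICO statistics


def kg_to_bags(kilograms):
    """Convert a weight in kilograms to a number of 60 kg bags."""
    return kilograms / KG_PER_BAG


def bags_to_kg(bags):
    """Convert a number of 60 kg bags back to kilograms."""
    return bags * KG_PER_BAG


# Brazil's 1990/91 domestic consumption as listed in the dataset (kg)
print(kg_to_bags(492_000_000))
```

Both helpers use plain arithmetic, so they should also accept a pandas Series of values.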

**Load the Data:**

In [42]:
# Forward slashes work on every OS and avoid invalid escape sequences such as '\C'
domestic_consumption_raw = pd.read_csv('Raw_Datasets/Coffee_Production_and_Consumption/Coffee_domestic_consumption.csv')
importers_consumtion_raw = pd.read_csv('Raw_Datasets/Coffee_Production_and_Consumption/Coffee_importers_consumption.csv')
exports_raw = pd.read_csv('Raw_Datasets/Coffee_Production_and_Consumption/Coffee_export.csv')
imports_raw = pd.read_csv('Raw_Datasets/Coffee_Production_and_Consumption/Coffee_import.csv')
inventory_raw = pd.read_csv('Raw_Datasets/Coffee_Production_and_Consumption/Coffee_green_coffee_inventorie.csv')
production_raw = pd.read_csv('Raw_Datasets/Coffee_Production_and_Consumption/Coffee_production.csv')
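
Building the paths with `pathlib` is one way to keep the loading cell portable across operating systems. A minimal sketch: the folder and file names are the ones used in this notebook, while `DATA_DIR`, `FILES`, `dataset_path`, and the short dataset keys are hypothetical names introduced only for illustration:

```python
from pathlib import Path

# Folder layout taken from the loading cell in this notebook
DATA_DIR = Path("Raw_Datasets") / "Coffee_Production_and_Consumption"

# Hypothetical short names mapped to the file names read in this notebook
FILES = {
    "domestic_consumption": "Coffee_domestic_consumption.csv",
    "importers_consumption": "Coffee_importers_consumption.csv",
    "exports": "Coffee_export.csv",
    "imports": "Coffee_import.csv",
    "inventory": "Coffee_green_coffee_inventorie.csv",
    "production": "Coffee_production.csv",
    "re_exports": "Coffee_re_export.csv",
}


def dataset_path(name, base_dir=DATA_DIR):
    """Return the path of one raw CSV file; raises KeyError for unknown names."""
    return base_dir / FILES[name]


print(dataset_path("exports").as_posix())
```

A call such as `pd.read_csv(dataset_path("exports"))` would then load the corresponding file.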


#### Coffee Export Sector

##### Domestic Consumption

In [38]:
domestic_consumption_raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 55 entries, 0 to 54
Data columns (total 33 columns):
 #   Column                      Non-Null Count  Dtype 
---  ------                      --------------  ----- 
 0   Country                     55 non-null     object
 1   Coffee type                 55 non-null     object
 2   1990/91                     55 non-null     int64 
 3   1991/92                     55 non-null     int64 
 4   1992/93                     55 non-null     int64 
 5   1993/94                     55 non-null     int64 
 6   1994/95                     55 non-null     int64 
 7   1995/96                     55 non-null     int64 
 8   1996/97                     55 non-null     int64 
 9   1997/98                     55 non-null     int64 
 10  1998/99                     55 non-null     int64 
 11  1999/00                     55 non-null     int64 
 12  2000/01                     55 non-null     int64 
 13  2001/02                     55 non-null     int64 
 

##### Exports

In [39]:
exports_raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 55 entries, 0 to 54
Data columns (total 32 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Country       55 non-null     object
 1   1990          55 non-null     int64 
 2   1991          55 non-null     int64 
 3   1992          55 non-null     int64 
 4   1993          55 non-null     int64 
 5   1994          55 non-null     int64 
 6   1995          55 non-null     int64 
 7   1996          55 non-null     int64 
 8   1997          55 non-null     int64 
 9   1998          55 non-null     int64 
 10  1999          55 non-null     int64 
 11  2000          55 non-null     int64 
 12  2001          55 non-null     int64 
 13  2002          55 non-null     int64 
 14  2003          55 non-null     int64 
 15  2004          55 non-null     int64 
 16  2005          55 non-null     int64 
 17  2006          55 non-null     int64 
 18  2007          55 non-null     int64 
 19  2008      

##### Green Coffee Inventory

In [43]:
inventory_raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18 entries, 0 to 17
Data columns (total 32 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Country           18 non-null     object
 1   1990              18 non-null     int64 
 2   1991              18 non-null     int64 
 3   1992              18 non-null     int64 
 4   1993              18 non-null     int64 
 5   1994              18 non-null     int64 
 6   1995              18 non-null     int64 
 7   1996              18 non-null     int64 
 8   1997              18 non-null     int64 
 9   1998              18 non-null     int64 
 10  1999              18 non-null     int64 
 11  2000              18 non-null     int64 
 12  2001              18 non-null     int64 
 13  2002              18 non-null     int64 
 14  2003              18 non-null     int64 
 15  2004              18 non-null     int64 
 16  2005              18 non-null     int64 
 17  2006              

In [33]:
# Convert the wide format of the data into a long format with the `melt` function in pandas.
domestic_consumption_long_format = pd.melt(domestic_consumption_raw, id_vars=["Country", "Coffee type"], var_name="Period", value_name="Consumption")
domestic_consumption_long_format

Unnamed: 0,Country,Coffee type,Period,Consumption
0,Angola,Robusta/Arabica,1990/91,1200000
1,Bolivia (Plurinational State of),Arabica,1990/91,1500000
2,Brazil,Arabica/Robusta,1990/91,492000000
3,Burundi,Arabica/Robusta,1990/91,120000
4,Ecuador,Arabica/Robusta,1990/91,21000000
...,...,...,...,...
1700,Trinidad & Tobago,Robusta,Total_domestic_consumption,21090000
1701,Uganda,Robusta/Arabica,Total_domestic_consumption,284816400
1702,Venezuela,Arabica,Total_domestic_consumption,2386067999
1703,Viet Nam,Robusta/Arabica,Total_domestic_consumption,1920928320
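
To make the reshaping step concrete, here is a small sketch of `pd.melt` on a two-row frame. The country labels and all numbers are made up for illustration; only the column layout mirrors the dataset:

```python
import pandas as pd

# Toy wide-format frame: one row per country, one column per period (made-up values)
wide = pd.DataFrame({
    "Country": ["A", "B"],
    "Coffee type": ["Arabica", "Robusta"],
    "1990/91": [10, 20],
    "1991/92": [11, 21],
})

# Each period column becomes a block of rows; the id_vars are repeated for every period
long = pd.melt(wide, id_vars=["Country", "Coffee type"],
               var_name="Period", value_name="Consumption")
print(long)
```

The two period columns turn into four rows, one per country and period, which is the shape most plotting and grouping functions expect.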


In [31]:
# Filter out rows where the 'Period' column contains 'Total_domestic_consumption'
domestic_consumption = domestic_consumption_long_format[~domestic_consumption_long_format['Period'].str.contains('Total_domestic_consumption', case=False, na=False)]
domestic_consumption

Unnamed: 0,Country,Coffee type,Period,Consumption
0,Angola,Robusta/Arabica,1990/91,1200000
1,Bolivia (Plurinational State of),Arabica,1990/91,1500000
2,Brazil,Arabica/Robusta,1990/91,492000000
3,Burundi,Arabica/Robusta,1990/91,120000
4,Ecuador,Arabica/Robusta,1990/91,21000000
...,...,...,...,...
1645,Trinidad & Tobago,Robusta,2019/20,600000
1646,Uganda,Robusta/Arabica,2019/20,15240000
1647,Venezuela,Arabica,2019/20,76500000
1648,Viet Nam,Robusta/Arabica,2019/20,159000000
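
The `Period` labels are crop years such as `1990/91`. For time-series plots it may help to derive a numeric start year. A sketch on made-up data, assuming every real period label follows the `YYYY/YY` pattern; the column name `Start_year` is hypothetical:

```python
import pandas as pd

# Toy long-format frame including one summary row (made-up values)
df = pd.DataFrame({
    "Country": ["A", "A", "A"],
    "Period": ["1990/91", "1999/00", "Total_domestic_consumption"],
    "Consumption": [10, 12, 22],
})

# Keep only labels shaped like 'YYYY/YY', which also drops the summary row
is_crop_year = df["Period"].str.fullmatch(r"\d{4}/\d{2}")
crop_years = df[is_crop_year].copy()

# The first four characters are the calendar year in which the crop year starts
crop_years["Start_year"] = crop_years["Period"].str[:4].astype(int)
print(crop_years)
```

Matching on the label pattern is a slightly stricter alternative to filtering by the summary column's name, since any other non-period label would be excluded as well.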


##### Exports

In [32]:
exports_long_format = pd.melt(exports_raw, id_vars=["Country"], var_name="Year", value_name="Exports")
exports_long_format


Unnamed: 0,Country,Year,Exports
0,Angola,1990,5040000
1,Bolivia (Plurinational State of),1990,9360000
2,Brazil,1990,1016160000
3,Burundi,1990,35100000
4,Cameroon,1990,156660000
...,...,...,...
1700,Venezuela,Total_export,241260000
1701,Viet Nam,Total_export,24924480000
1702,Yemen,Total_export,101220000
1703,Zambia,Total_export,76260000


In [None]:
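domestic_consumption.to_csv('domestic_consumption.csv', index=False)

# `exports` is not defined in any cell shown here; as an assumption, derive it from the
# long-format frame by dropping the 'Total_export' rows, mirroring domestic_consumption.
exports = exports_long_format[exports_long_format['Year'] != 'Total_export']
exports.to_csv('exports.csv', index=False)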

##### Green Coffee Inventory

In [22]:
inventory = pd.melt(inventory_raw, id_vars=["Country"], var_name="Year", value_name="Inventory")
inventory.to_csv('inventory.csv', index=False)
inventory

Unnamed: 0,Country,Year,Inventory
0,Austria,1990,19980000
1,Cyprus,1990,600000
2,Denmark,1990,5340000
3,Finland,1990,10560000
4,France,1990,34380000
...,...,...,...
553,Norway,Total_inventorie,223200000
554,Switzerland,Total_inventorie,398100000
555,United Kingdom,Total_inventorie,186120000
556,United States of America,Total_inventorie,8984400000
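
The same melt step is repeated for every file, and each wide table carries a `Total...` summary column that ends up as ordinary rows in the long format (for example `Total_inventorie` in the output above). Summing the long frame as-is would count those totals twice. One possible reusable helper, sketched under the assumption that every summary column name starts with `Total`; the function name `to_long` is hypothetical and the numbers are made up:

```python
import pandas as pd


def to_long(wide, id_vars, var_name, value_name):
    """Melt a wide table to long format, leaving out 'Total...' summary columns."""
    # Assumption: summary columns are exactly those whose name starts with 'Total'
    value_cols = [c for c in wide.columns
                  if c not in id_vars and not str(c).startswith("Total")]
    return pd.melt(wide, id_vars=id_vars, value_vars=value_cols,
                   var_name=var_name, value_name=value_name)


# Toy example with made-up numbers
wide = pd.DataFrame({
    "Country": ["A", "B"],
    "1990": [1, 2],
    "1991": [3, 4],
    "Total_inventorie": [4, 6],
})
long = to_long(wide, ["Country"], "Year", "Inventory")
print(long)
```

Whether the summary rows should be dropped before saving depends on how the CSV files are used later; keeping them is fine as long as downstream aggregations filter them out.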


##### Imports

In [8]:
imports_raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35 entries, 0 to 34
Data columns (total 32 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Country       35 non-null     object
 1   1990          35 non-null     int64 
 2   1991          35 non-null     int64 
 3   1992          35 non-null     int64 
 4   1993          35 non-null     int64 
 5   1994          35 non-null     int64 
 6   1995          35 non-null     int64 
 7   1996          35 non-null     int64 
 8   1997          35 non-null     int64 
 9   1998          35 non-null     int64 
 10  1999          35 non-null     int64 
 11  2000          35 non-null     int64 
 12  2001          35 non-null     int64 
 13  2002          35 non-null     int64 
 14  2003          35 non-null     int64 
 15  2004          35 non-null     int64 
 16  2005          35 non-null     int64 
 17  2006          35 non-null     int64 
 18  2007          35 non-null     int64 
 19  2008      

In [26]:
imports = pd.melt(imports_raw, id_vars=["Country"], var_name="Year", value_name="Imports")
imports.to_csv('imports.csv', index=False)
imports

Unnamed: 0,Country,Year,Imports
0,Austria,1990,112800000
1,Belgium,1990,0
2,Belgium/Luxembourg,1990,120900000
3,Bulgaria,1990,16080000
4,Croatia,1990,0
...,...,...,...
1080,Russian Federation,Total_import,5731080000
1081,Switzerland,Total_import,3212700000
1082,Tunisia,Total_import,492540000
1083,United Kingdom,Total_import,6731460000
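
Before relying on or discarding a summary column, it can be worth checking that it really equals the sum of the yearly columns. A sketch of such a check on a toy frame with made-up numbers; whether the real `Total_import` column passes has not been verified here, and `totals_match` is a hypothetical helper:

```python
import pandas as pd


def totals_match(wide, total_col, id_vars=("Country",)):
    """Return a boolean Series: True where total_col equals the row-wise sum of the other value columns."""
    year_cols = [c for c in wide.columns if c not in id_vars and c != total_col]
    return wide[year_cols].sum(axis=1) == wide[total_col]


wide = pd.DataFrame({
    "Country": ["A", "B"],
    "1990": [1, 2],
    "1991": [3, 4],
    "Total_import": [4, 7],  # the second total is deliberately wrong
})
check = totals_match(wide, "Total_import")
print(check.tolist())
```

The exact comparison suits integer columns; for float columns such as the production data, a tolerance-based comparison like `numpy.isclose` would probably be safer.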


##### Importers Consumption

In [10]:
importers_consumtion_raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35 entries, 0 to 34
Data columns (total 32 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   Country                   35 non-null     object
 1   1990                      35 non-null     int64 
 2   1991                      35 non-null     int64 
 3   1992                      35 non-null     int64 
 4   1993                      35 non-null     int64 
 5   1994                      35 non-null     int64 
 6   1995                      35 non-null     int64 
 7   1996                      35 non-null     int64 
 8   1997                      35 non-null     int64 
 9   1998                      35 non-null     int64 
 10  1999                      35 non-null     int64 
 11  2000                      35 non-null     int64 
 12  2001                      35 non-null     int64 
 13  2002                      35 non-null     int64 
 14  2003                      35

In [28]:
importers_consumtion = pd.melt(importers_consumtion_raw, id_vars=["Country"], var_name="Year", value_name="Importers_Consumtion")
importers_consumtion.to_csv('importers_consumtion.csv', index=False)
importers_consumtion

Unnamed: 0,Country,Year,Importers_Consumtion
0,Austria,1990,80400000
1,Belgium,1990,0
2,Belgium/Luxembourg,1990,67440000
3,Bulgaria,1990,6120000
4,Croatia,1990,0
...,...,...,...
1080,Russian Federation,Total_import_consumption,5121240000
1081,Switzerland,Total_import_consumption,1717380000
1082,Tunisia,Total_import_consumption,488640000
1083,United Kingdom,Total_import_consumption,5002620000


##### Coffee Production

In [12]:
production_raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 55 entries, 0 to 54
Data columns (total 33 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Country           55 non-null     object 
 1   Coffee type       55 non-null     object 
 2   1990/91           55 non-null     float64
 3   1991/92           55 non-null     float64
 4   1992/93           55 non-null     float64
 5   1993/94           55 non-null     float64
 6   1994/95           55 non-null     float64
 7   1995/96           55 non-null     float64
 8   1996/97           55 non-null     float64
 9   1997/98           55 non-null     float64
 10  1998/99           55 non-null     float64
 11  1999/00           55 non-null     float64
 12  2000/01           55 non-null     float64
 13  2001/02           55 non-null     float64
 14  2002/03           55 non-null     float64
 15  2003/04           55 non-null     float64
 16  2004/05           55 non-null     float64
 17 

In [23]:
production = pd.melt(production_raw, id_vars=["Country", "Coffee type"], var_name="Period", value_name="Production")
production.to_csv('production.csv', index=False)
production

Unnamed: 0,Country,Coffee type,Period,Production
0,Angola,Robusta/Arabica,1990/91,3.000000e+06
1,Bolivia (Plurinational State of),Arabica,1990/91,7.380000e+06
2,Brazil,Arabica/Robusta,1990/91,1.637160e+09
3,Burundi,Arabica/Robusta,1990/91,2.922000e+07
4,Ecuador,Arabica/Robusta,1990/91,9.024000e+07
...,...,...,...,...
1700,Trinidad & Tobago,Robusta,Total_production,2.574000e+07
1701,Uganda,Robusta/Arabica,Total_production,5.919480e+09
1702,Venezuela,Arabica,Total_production,1.992780e+09
1703,Viet Nam,Robusta/Arabica,Total_production,2.880318e+10


##### Coffee Re-Exports

In [14]:
re_exports_raw = pd.read_csv('Raw_Datasets/Coffee_Production_and_Consumption/Coffee_re_export.csv')
re_exports_raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 35 entries, 0 to 34
Data columns (total 32 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Country          35 non-null     object
 1   1990             35 non-null     int64 
 2   1991             35 non-null     int64 
 3   1992             35 non-null     int64 
 4   1993             35 non-null     int64 
 5   1994             35 non-null     int64 
 6   1995             35 non-null     int64 
 7   1996             35 non-null     int64 
 8   1997             35 non-null     int64 
 9   1998             35 non-null     int64 
 10  1999             35 non-null     int64 
 11  2000             35 non-null     int64 
 12  2001             35 non-null     int64 
 13  2002             35 non-null     int64 
 14  2003             35 non-null     int64 
 15  2004             35 non-null     int64 
 16  2005             35 non-null     int64 
 17  2006             35 non-null     int6

In [24]:
re_exports = pd.melt(re_exports_raw, id_vars=["Country"], var_name="Year", value_name="Re-Exports")
re_exports.to_csv('re_exports.csv', index=False)  # own file name, so production.csv is not overwritten
re_exports

Unnamed: 0,Country,Year,Re-Exports
0,Austria,1990,24900000
1,Belgium,1990,0
2,Belgium/Luxembourg,1990,53460000
3,Bulgaria,1990,9960000
4,Croatia,1990,0
...,...,...,...
1080,Russian Federation,Total_re_export,609840000
1081,Switzerland,Total_re_export,1485780000
1082,Tunisia,Total_re_export,3660000
1083,United Kingdom,Total_re_export,1734120000
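
Because the imports and re-exports tables share the `Country` and `Year` keys, the long frames can be joined to compare the two series side by side. A sketch with made-up numbers; the derived column `Net_imports` is a hypothetical name and not something the dataset provides:

```python
import pandas as pd

# Toy long-format frames with made-up numbers
imports = pd.DataFrame({"Country": ["A", "B"], "Year": ["1990", "1990"],
                        "Imports": [100, 50]})
re_exports = pd.DataFrame({"Country": ["A", "B"], "Year": ["1990", "1990"],
                           "Re-Exports": [30, 0]})

# Inner join on the shared keys, then subtract re-exports from imports
trade = imports.merge(re_exports, on=["Country", "Year"], how="inner")
trade["Net_imports"] = trade["Imports"] - trade["Re-Exports"]
print(trade)
```

An inner join keeps only country and year pairs present in both frames, so any rows that fail to match would be dropped silently; comparing row counts before and after the merge is a quick safeguard.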
