# Coffee Industry

## 1. Business Understanding

### Brazil as the Industry Leader

Several factors give Brazilian coffee its distinctive character:

- Diverse Growing Regions:\
Brazil is the largest coffee producer in the world, and its vast land offers diverse microclimates, altitudes, and soil types. Coffee is grown in various regions, each imparting unique flavor profiles to the beans. Key regions include Minas Gerais, São Paulo, Espírito Santo, Bahia, and Paraná, among others. The diversity in geography allows for a wide range of coffee flavors, from bright and fruity to nutty and chocolatey.

- Varieties of Coffee Beans:\
Brazil predominantly grows Arabica beans, known for their smooth, mild flavor, although Robusta beans are also cultivated. Within Arabica, there are different varieties such as Bourbon, Catuai, and Mundo Novo, each offering distinct flavor notes and characteristics.

- Processing Methods:\
Brazil is renowned for its diverse coffee processing methods, particularly the natural (dry) process. In this method, coffee cherries are dried with the beans still inside, which allows the fruit's sugars to infuse into the beans, producing a sweet, full-bodied coffee with complex fruity notes. Brazil also uses pulped natural and washed processing methods, each contributing to different flavor profiles.

- Consistency and Volume:\
Brazilian coffee is known for its consistency in flavor and quality, thanks to the country's advanced agricultural practices and large-scale production. This consistency makes Brazilian coffee a popular choice for blends, providing a reliable base that enhances the flavors of other origins.

- Flavor Profile:\
Brazilian coffees are generally known for their balanced flavor with low acidity, medium body, and notes of chocolate, nuts, caramel, and sweet fruits. The flavor profile is often smooth and sweet, making it accessible and enjoyable for a wide range of coffee drinkers.

- Sustainability and Innovation:\
Brazil has a strong focus on sustainable coffee production, with many farms adopting environmentally friendly techniques and holding certifications such as Rainforest Alliance and Fairtrade. Additionally, Brazil is a leader in coffee research and innovation, constantly improving cultivation and processing methods to enhance quality and sustainability.

These factors combined make Brazilian coffee not only a staple in the global coffee industry but also a favorite among coffee enthusiasts for its versatility, consistency, and unique flavor profiles.

## 2. Data Mining

### Installs

In [1]:
# tabula-py is a Python wrapper for Tabula that extracts tables from PDF files into pandas DataFrames
# pip install pandas tabula-py

### Libraries

In [2]:
import numpy as np
import pandas as pd
import os

# Data profiling report tool
from ydata_profiling import ProfileReport
# Suppress warnings to keep the notebook output readable
import warnings
warnings.filterwarnings('ignore')
# Show every column when displaying wide DataFrames
pd.set_option('display.max_columns', None)

import time
import datetime
import pycountry

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
%matplotlib inline
import seaborn as sns

import pymysql
from sqlalchemy import create_engine, text

### Importing my Functions

In [11]:
from coffee_functions import process_files, clean_and_prepare_dataframe, create_sqlalchemy_engine, insert_dataframe_to_mysql
import config  # Access to MySQL

### Load the Data

##### All Exports

##### Coffee Exports

In [4]:
brazil_trade_raw = process_files(r"source\datasets\UN_Comtrade_Exports_Coffee\1_Brazil", "ImportsExports_Coffee", (2017, 2023))
brazil_trade_raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11953 entries, 0 to 11952
Data columns (total 48 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   TypeCode                  11953 non-null  object 
 1   FreqCode                  11953 non-null  object 
 2   RefPeriodId               11953 non-null  int64  
 3   RefYear                   11953 non-null  int64  
 4   RefMonth                  11953 non-null  int64  
 5   Period                    11953 non-null  int64  
 6   ReporterCode              11953 non-null  int64  
 7   ReporterISO               11953 non-null  object 
 8   ReporterDesc              11953 non-null  object 
 9   FlowCode                  11953 non-null  object 
 10  FlowDesc                  11953 non-null  object 
 11  PartnerCode               11953 non-null  int64  
 12  PartnerISO                11953 non-null  object 
 13  PartnerDesc               11953 non-null  object 
 14  Partne
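
The source of `process_files` is not shown in this notebook, so the following is only a minimal sketch of what such a loader might look like. It assumes one CSV per year named `<prefix>_<year>.csv` inside the folder; that naming pattern and the function name `load_yearly_csvs` are hypothetical and chosen for illustration.

```python
from pathlib import Path

import pandas as pd


def load_yearly_csvs(folder, prefix, year_range):
    """Hypothetical stand-in for process_files.

    Reads <prefix>_<year>.csv for every year in the inclusive range
    and stacks the files into a single DataFrame.
    """
    first_year, last_year = year_range
    frames = []
    for year in range(first_year, last_year + 1):
        path = Path(folder) / f"{prefix}_{year}.csv"  # assumed naming pattern
        if path.exists():
            frames.append(pd.read_csv(path))
    if not frames:
        raise FileNotFoundError(f"No files matching {prefix}_<year>.csv in {folder}")
    # ignore_index gives the combined frame one continuous RangeIndex
    return pd.concat(frames, ignore_index=True)
```

Years without a file are skipped silently in this sketch; a stricter version could raise instead so that gaps in the time series are noticed early.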

In [5]:
brazil_trade_raw

Unnamed: 0,TypeCode,FreqCode,RefPeriodId,RefYear,RefMonth,Period,ReporterCode,ReporterISO,ReporterDesc,FlowCode,FlowDesc,PartnerCode,PartnerISO,PartnerDesc,Partner2Code,Partner2ISO,Partner2Desc,ClassificationCode,ClassificationSearchCode,IsOriginalClassification,CmdCode,CmdDesc,AggrLevel,IsLeaf,CustomsCode,CustomsDesc,MosCode,MotCode,MotDesc,QtyUnitCode,QtyUnitAbbr,Qty,IsQtyEstimated,AltQtyUnitCode,AltQtyUnitAbbr,AltQty,IsAltQtyEstimated,NetWgt,IsNetWgtEstimated,GrossWgt,IsGrossWgtEstimated,Cifvalue,Fobvalue,PrimaryValue,LegacyEstimationFlag,IsReported,IsAggregate,Unnamed: 47
0,C,M,20170101,2017,1,201701,76,BRA,Brazil,X,Export,8,ALB,Albania,0,W00,World,H5,HS,True,90111,Coffee; not roasted or decaffeinated,6,True,C00,TOTAL CPC,0,0,TOTAL MOT,8,kg,5.800000e+04,False,21,1000 KG,58,False,5.760000e+04,False,0,False,,164736,164736,0,False,True,
1,C,M,20170101,2017,1,201701,76,BRA,Brazil,X,Export,12,DZA,Algeria,0,W00,World,H5,HS,True,90111,Coffee; not roasted or decaffeinated,6,True,C00,TOTAL CPC,0,0,TOTAL MOT,8,kg,1.160000e+05,False,21,1000 KG,116,False,1.152000e+05,False,0,False,,276446,276446,0,False,True,
2,C,M,20170101,2017,1,201701,76,BRA,Brazil,X,Export,24,AGO,Angola,0,W00,World,H5,HS,True,90121,"Coffee; roasted, not decaffeinated",6,True,C00,TOTAL CPC,0,0,TOTAL MOT,8,kg,8.100000e+02,False,8,kg,810,False,8.100000e+02,False,0,False,,7148,7148,0,False,True,
3,C,M,20170101,2017,1,201701,76,BRA,Brazil,X,Export,32,ARG,Argentina,0,W00,World,H5,HS,True,90111,Coffee; not roasted or decaffeinated,6,True,C00,TOTAL CPC,0,0,TOTAL MOT,8,kg,1.240000e+06,False,21,1000 KG,1240,False,1.243018e+06,False,0,False,,3625482,3625482,0,False,True,
4,C,M,20170101,2017,1,201701,76,BRA,Brazil,X,Export,32,ARG,Argentina,0,W00,World,H5,HS,True,90121,"Coffee; roasted, not decaffeinated",6,True,C00,TOTAL CPC,0,0,TOTAL MOT,8,kg,2.732000e+04,False,8,kg,27320,False,2.732000e+04,False,0,False,,222701,222701,0,False,True,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
11948,C,M,20231201,2023,12,202312,76,BRA,Brazil,M,Import,0,W00,World,0,W00,World,H6,HS,True,90122,"Coffee; roasted, decaffeinated",6,True,C00,TOTAL CPC,0,0,TOTAL MOT,8,kg,7.955000e+03,False,8,kg,7955,False,7.955000e+03,False,0,False,185509.0,170377,185509,0,False,True,
11949,C,M,20231201,2023,12,202312,76,BRA,Brazil,X,Export,0,W00,World,0,W00,World,H6,HS,True,90111,Coffee; not roasted or decaffeinated,6,True,C00,TOTAL CPC,0,0,TOTAL MOT,8,kg,4.383613e+08,False,-1,,0,False,2.435601e+08,False,0,False,,775531681,775531681,0,False,True,
11950,C,M,20231201,2023,12,202312,76,BRA,Brazil,X,Export,0,W00,World,0,W00,World,H6,HS,True,90112,"Coffee; decaffeinated, not roasted",6,True,C00,TOTAL CPC,0,0,TOTAL MOT,8,kg,2.000000e+00,False,8,kg,2,False,2.000000e+00,False,0,False,,96,96,0,False,True,
11951,C,M,20231201,2023,12,202312,76,BRA,Brazil,X,Export,0,W00,World,0,W00,World,H6,HS,True,90121,"Coffee; roasted, not decaffeinated",6,True,C00,TOTAL CPC,0,0,TOTAL MOT,8,kg,3.887250e+05,False,8,kg,388725,False,3.887440e+05,False,0,False,,3219766,3219766,0,False,True,


In [6]:
brazil_trade_raw.shape

(11953, 48)

## 3. Data Cleaning

In [7]:
brazil_trade = clean_and_prepare_dataframe(brazil_trade_raw)
brazil_trade.shape

(11292, 13)
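
`clean_and_prepare_dataframe` also lives in `coffee_functions` and is not shown here. Judging only from the 13 columns it returns, a rough sketch could select and rename columns as below. The row filter is an assumption: dropping the `W00` (World) aggregate rows is one plausible reason the row count falls, but the actual rule may differ. The name `clean_trade_frame` is hypothetical.

```python
import pandas as pd

# Raw Comtrade column -> cleaned column, matching the 13 columns of brazil_trade.
COLUMN_MAP = {
    "RefYear": "Year",
    "RefMonth": "Month",
    "Period": "Period",
    "ReporterISO": "ReporterISO",
    "ReporterDesc": "ReporterDesc",
    "FlowCode": "FlowCode",
    "FlowDesc": "FlowDesc",
    "PartnerISO": "PartnerISO",
    "PartnerDesc": "PartnerDesc",
    "CmdCode": "CmdCode",
    "CmdDesc": "CmdDesc",
    "Qty": "Qty_in_kg",
    "PrimaryValue": "PrimaryValue",
}


def clean_trade_frame(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for clean_and_prepare_dataframe."""
    # Keep only the columns of interest and give them the cleaned names.
    df = raw[list(COLUMN_MAP)].rename(columns=COLUMN_MAP)
    # ASSUMPTION: rows with partner 'W00' are world totals that would
    # double-count the per-country rows, so they are removed.
    df = df[df["PartnerISO"] != "W00"]
    return df.reset_index(drop=True)
```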

In [8]:
brazil_trade.head()

| | Year | Month | Period | ReporterISO | ReporterDesc | FlowCode | FlowDesc | PartnerISO | PartnerDesc | CmdCode | CmdDesc | Qty_in_kg | PrimaryValue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017 | 1 | 201701 | BRA | Brazil | X | Export | ALB | Albania | 90111 | Coffee; not roasted or decaffeinated | 58000.0 | 164736 |
| 1 | 2017 | 1 | 201701 | BRA | Brazil | X | Export | DZA | Algeria | 90111 | Coffee; not roasted or decaffeinated | 116000.0 | 276446 |
| 2 | 2017 | 1 | 201701 | BRA | Brazil | X | Export | AGO | Angola | 90121 | Coffee; roasted, not decaffeinated | 810.0 | 7148 |
| 3 | 2017 | 1 | 201701 | BRA | Brazil | X | Export | ARG | Argentina | 90111 | Coffee; not roasted or decaffeinated | 1240000.0 | 3625482 |
| 4 | 2017 | 1 | 201701 | BRA | Brazil | X | Export | ARG | Argentina | 90121 | Coffee; roasted, not decaffeinated | 27320.0 | 222701 |
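
For later time-series work it can help to turn `Period` into a real date and to derive a value per kilogram. The sketch below uses the first two rows shown above. The column names `Date` and `ValuePerKg` are hypothetical, and reading `PrimaryValue` as US dollars is an assumption based on how UN Comtrade usually reports trade values.

```python
import pandas as pd

# First two rows of the cleaned frame, copied from the preview above.
sample = pd.DataFrame({
    "Period": [201701, 201701],
    "PartnerISO": ["ALB", "DZA"],
    "Qty_in_kg": [58000.0, 116000.0],
    "PrimaryValue": [164736, 276446],
})

# Period is stored as an integer YYYYMM; parse it as the first day of the month.
sample["Date"] = pd.to_datetime(sample["Period"].astype(str), format="%Y%m")

# Value per kilogram; a zero quantity becomes NaN instead of producing inf.
positive_qty = sample["Qty_in_kg"].where(sample["Qty_in_kg"] > 0)
sample["ValuePerKg"] = sample["PrimaryValue"] / positive_qty

print(sample[["Date", "PartnerISO", "ValuePerKg"]])
```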


### Missing data (Null values)

In [9]:
# Checking for missing data
brazil_trade.isnull().sum().sort_values(ascending=False)

Year            0
Month           0
Period          0
ReporterISO     0
ReporterDesc    0
FlowCode        0
FlowDesc        0
PartnerISO      0
PartnerDesc     0
CmdCode         0
CmdDesc         0
Qty_in_kg       0
PrimaryValue    0
dtype: int64
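
A zero count from `isnull()` only rules out true nulls; blank or whitespace-only strings would still pass as present. The sketch below is one possible way to count both and report the share per column. The helper name `missing_report` and the demo data are hypothetical.

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count nulls per column, treating blank strings as missing too."""
    # Empty or whitespace-only strings become NaN before counting.
    normalized = df.replace(r"^\s*$", np.nan, regex=True)
    report = pd.DataFrame({
        "missing": normalized.isnull().sum(),
        "share": normalized.isnull().mean(),
    })
    return report.sort_values("missing", ascending=False)


# Small made-up frame to show the behaviour.
demo = pd.DataFrame({
    "PartnerISO": ["ALB", " ", None],
    "Qty_in_kg": [58000.0, None, 810.0],
})
report = missing_report(demo)
print(report)
```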

### Finding Duplicates

In [10]:
# Find duplicates
brazil_trade.duplicated().sum()

0
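
`duplicated()` without arguments only flags rows that are identical in every column. Two rows for the same month, flow, partner, and commodity but with different values would not be caught. The sketch below checks duplicates on a key instead; the choice of key columns is an assumption about how the data should be unique, and `key_duplicates` is a hypothetical helper.

```python
import pandas as pd

# ASSUMPTION: one row is expected per period, flow, partner, and commodity.
KEY_COLUMNS = ["Period", "FlowCode", "PartnerISO", "CmdCode"]


def key_duplicates(df: pd.DataFrame, key=KEY_COLUMNS) -> pd.DataFrame:
    """Return every row whose key appears more than once."""
    mask = df.duplicated(subset=key, keep=False)  # keep=False marks all copies
    return df[mask].sort_values(key)


# Made-up example: same key twice, but with different values.
demo = pd.DataFrame({
    "Period": [201701, 201701, 201702],
    "FlowCode": ["X", "X", "X"],
    "PartnerISO": ["ARG", "ARG", "ARG"],
    "CmdCode": [90111, 90111, 90111],
    "PrimaryValue": [100, 250, 300],
})
full_row_dupes = int(demo.duplicated().sum())
conflicts = key_duplicates(demo)
print(full_row_dupes, len(conflicts))
```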

## 4. Exploratory Data Analysis