# Coffee Industry

## 1. Business Understanding

### Vietnam, the King of Robusta Production

Vietnamese coffee is renowned for its distinctive characteristics, which set it apart from other coffee varieties worldwide. Here are the key features that make Vietnamese coffee unique:

- Robusta Bean Dominance:\
Vietnam is the world's largest producer of Robusta coffee beans, which have a stronger, more bitter flavor than the more common Arabica beans. Robusta beans also contain more caffeine than Arabica and have a distinct earthy, nutty flavor profile, which contributes to the bold taste of Vietnamese coffee.

- Traditional Brewing Method:\
Vietnamese coffee is traditionally brewed using a phin filter, a small metal drip filter that allows the coffee to slowly drip into the cup. This method produces a thick, strong coffee, often enjoyed in small servings.

- Sweetened Condensed Milk:\
A signature of Vietnamese coffee is the use of sweetened condensed milk, especially in cà phê sữa đá (iced milk coffee). The rich, creamy sweetness of the condensed milk perfectly balances the bitterness of the strong Robusta coffee, creating a unique flavor combination.

- Variety of Preparations:\
Vietnam has a rich coffee culture with a variety of unique preparations. Besides the popular cà phê sữa đá, other favorites include cà phê đen (black coffee), cà phê trứng (egg coffee, made with whipped egg yolk and sugar), and cà phê dừa (coconut coffee, which combines coffee with coconut milk or cream).

- Rich, Dark Roast:\
Vietnamese coffee beans are often roasted with a small amount of butter and sometimes with sugar or other additives, giving the coffee a distinctive dark roast that enhances its bold flavor.

- Street Coffee Culture:\
Coffee is deeply embedded in Vietnamese daily life, with a vibrant street coffee culture where small coffee stands and cafes offer freshly brewed coffee to locals and visitors alike. The social aspect of enjoying coffee on the streets or in small cafes is an integral part of the experience.

These elements combined create a coffee experience that is rich, strong, and uniquely Vietnamese, attracting coffee lovers worldwide who seek something different from the typical coffee offerings.

## 2. Data Mining

### Installs

In [1]:
# To extract tables from PDF files into pandas DataFrames
# pip install pandas tabula-py
# tabula-py is a Python wrapper for Tabula (tabula-java), so it also needs a Java runtime

### Libraries

In [2]:
import numpy as np
import pandas as pd
import os

# Data Profile Reporting Tool
from ydata_profiling import ProfileReport
# To avoid unneeded warning display
import warnings
warnings.filterwarnings('ignore')
pd.set_option('display.max_columns', None)

import time
import datetime
import pycountry

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
%matplotlib inline
import seaborn as sns

import pymysql
from sqlalchemy import create_engine, text

### Importing my Functions

In [3]:
from coffee_functions import process_files, clean_and_prepare_dataframe, create_sqlalchemy_engine, insert_dataframe_to_mysql
import config  # Access to MySQL
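
The two database helpers imported above are not shown in this notebook. As a rough sketch only, they might look like the following, assuming the `pymysql` driver (as the imports suggest) and assuming `config` supplies the connection settings. The parameter names `user`, `password`, `host`, and `database` are hypothetical, and the `_sketch` suffix marks these functions as stand-ins rather than the author's code.

```python
from sqlalchemy import create_engine
from sqlalchemy.engine import URL


def build_mysql_url(user, password, host, database, port=3306):
    """Build a MySQL connection URL for the pymysql driver (assumed driver)."""
    # URL.create accepts the raw password, so special characters
    # need no manual escaping.
    return URL.create(
        "mysql+pymysql",
        username=user,
        password=password,
        host=host,
        port=port,
        database=database,
    )


def create_engine_sketch(user, password, host, database, port=3306):
    """Hypothetical stand-in for create_sqlalchemy_engine."""
    return create_engine(build_mysql_url(user, password, host, database, port))


def insert_dataframe_sketch(df, table_name, engine, if_exists="replace"):
    """Hypothetical stand-in for insert_dataframe_to_mysql."""
    # index=False keeps the pandas row index out of the MySQL table.
    df.to_sql(table_name, con=engine, if_exists=if_exists, index=False)
```

Building the URL separately from the engine keeps credentials out of hand-formatted strings and makes the connection settings easy to inspect without opening a connection.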

### Load the Data

#### All Exports

#### Coffee Exports

In [4]:
vietnam_trade_raw = process_files(r"source\datasets\UN_Comtrade_Exports_Coffee\2_Vietnam", "ImportsExports_Coffee", (2017, 2023))
vietnam_trade_raw.info()

No files found for the pattern: source\datasets\UN_Comtrade_Exports_Coffee\2_Vietnam\ImportsExports_Coffee_*_WORLD_2023.csv
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8710 entries, 0 to 8709
Data columns (total 48 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   TypeCode                  8710 non-null   object 
 1   FreqCode                  8710 non-null   object 
 2   RefPeriodId               8710 non-null   int64  
 3   RefYear                   8710 non-null   int64  
 4   RefMonth                  8710 non-null   int64  
 5   Period                    8710 non-null   int64  
 6   ReporterCode              8710 non-null   int64  
 7   ReporterISO               8710 non-null   object 
 8   ReporterDesc              8710 non-null   object 
 9   FlowCode                  8710 non-null   object 
 10  FlowDesc                  8710 non-null   object 
 11  PartnerCode               8710 non-null   int64  
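
The source of `process_files` is not included here. The "No files found" message above suggests that it builds one glob pattern per year, of the form `<prefix>_*_WORLD_<year>.csv`, and treats the end year as inclusive. A minimal sketch under those assumptions (the name `process_files_sketch` and everything inside it are hypothetical):

```python
import glob
import os

import pandas as pd


def process_files_sketch(folder, prefix, year_range):
    """Hypothetical stand-in for coffee_functions.process_files."""
    start, end = year_range
    frames = []
    for year in range(start, end + 1):  # end year assumed to be inclusive
        pattern = os.path.join(folder, f"{prefix}_*_WORLD_{year}.csv")
        matches = sorted(glob.glob(pattern))
        if not matches:
            # Report the gap and carry on with the remaining years.
            print(f"No files found for the pattern: {pattern}")
            continue
        frames.extend(pd.read_csv(path) for path in matches)
    if not frames:
        return pd.DataFrame()
    # One continuous RangeIndex across all yearly files.
    return pd.concat(frames, ignore_index=True)
```

Skipping a missing year instead of raising lets the notebook run on the years that are available, which matches the output above: 2023 is reported as missing while 2017 to 2022 load normally.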

In [5]:
vietnam_trade_raw

Unnamed: 0,TypeCode,FreqCode,RefPeriodId,RefYear,RefMonth,Period,ReporterCode,ReporterISO,ReporterDesc,FlowCode,FlowDesc,PartnerCode,PartnerISO,PartnerDesc,Partner2Code,Partner2ISO,Partner2Desc,ClassificationCode,ClassificationSearchCode,IsOriginalClassification,CmdCode,CmdDesc,AggrLevel,IsLeaf,CustomsCode,CustomsDesc,MosCode,MotCode,MotDesc,QtyUnitCode,QtyUnitAbbr,Qty,IsQtyEstimated,AltQtyUnitCode,AltQtyUnitAbbr,AltQty,IsAltQtyEstimated,NetWgt,IsNetWgtEstimated,GrossWgt,IsGrossWgtEstimated,Cifvalue,Fobvalue,PrimaryValue,LegacyEstimationFlag,IsReported,IsAggregate,Unnamed: 47
0,C,M,20170101,2017,1,201701,704,VNM,Viet Nam,X,Export,12,DZA,Algeria,0,W00,World,H5,HS,True,90111,Coffee; not roasted or decaffeinated,6,True,C00,TOTAL CPC,0,0,TOTAL MOT,8,kg,2.874438e+06,True,-1,,0.000,False,2.874438e+06,True,0,False,0.000,8.559437e+06,8.559437e+06,6,True,False,
1,C,M,20170101,2017,1,201701,704,VNM,Viet Nam,X,Export,24,AGO,Angola,0,W00,World,H5,HS,True,90112,"Coffee; decaffeinated, not roasted",6,True,C00,TOTAL CPC,0,0,TOTAL MOT,8,kg,4.719790e+02,True,-1,,0.000,False,4.719790e+02,True,0,False,0.000,2.116500e+03,2.116500e+03,6,True,False,
2,C,M,20170101,2017,1,201701,704,VNM,Viet Nam,X,Export,32,ARG,Argentina,0,W00,World,H5,HS,True,90111,Coffee; not roasted or decaffeinated,6,True,C00,TOTAL CPC,0,0,TOTAL MOT,8,kg,2.098209e+05,True,-1,,0.000,False,2.098209e+05,True,0,False,0.000,6.248000e+05,6.248000e+05,6,True,False,
3,C,M,20170101,2017,1,201701,704,VNM,Viet Nam,X,Export,36,AUS,Australia,0,W00,World,H5,HS,True,90111,Coffee; not roasted or decaffeinated,6,True,C00,TOTAL CPC,0,0,TOTAL MOT,8,kg,6.551048e+05,True,-1,,0.000,False,6.551048e+05,True,0,False,0.000,1.950756e+06,1.950756e+06,6,True,False,
4,C,M,20170101,2017,1,201701,704,VNM,Viet Nam,X,Export,36,AUS,Australia,0,W00,World,H5,HS,True,90112,"Coffee; decaffeinated, not roasted",6,True,C00,TOTAL CPC,0,0,TOTAL MOT,8,kg,4.498286e+04,True,-1,,0.000,False,4.498286e+04,True,0,False,0.000,2.017171e+05,2.017171e+05,6,True,False,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8705,C,M,20221201,2022,12,202212,704,VNM,Viet Nam,M,Import,0,W00,World,0,W00,World,H6,HS,True,90122,"Coffee; roasted, decaffeinated",6,True,C00,TOTAL CPC,0,0,TOTAL MOT,8,kg,6.899570e+02,True,-1,,0.000,False,6.899570e+02,True,0,False,10076.183,,1.007618e+04,6,False,True,
8706,C,M,20221201,2022,12,202212,704,VNM,Viet Nam,X,Export,0,W00,World,0,W00,World,H6,HS,True,90111,Coffee; not roasted or decaffeinated,6,True,C00,TOTAL CPC,0,0,TOTAL MOT,8,kg,1.422020e+08,False,21,1000 KG,142202.038,False,1.422020e+08,False,0,False,,2.854629e+08,2.854629e+08,0,False,True,
8707,C,M,20221201,2022,12,202212,704,VNM,Viet Nam,X,Export,0,W00,World,0,W00,World,H6,HS,True,90112,"Coffee; decaffeinated, not roasted",6,True,C00,TOTAL CPC,0,0,TOTAL MOT,8,kg,2.447635e+06,False,21,1000 KG,2447.635,False,2.447635e+06,False,0,False,,7.712878e+06,7.712878e+06,0,False,True,
8708,C,M,20221201,2022,12,202212,704,VNM,Viet Nam,X,Export,0,W00,World,0,W00,World,H6,HS,True,90121,"Coffee; roasted, not decaffeinated",6,True,C00,TOTAL CPC,0,0,TOTAL MOT,8,kg,6.674590e+05,False,21,1000 KG,667.459,False,6.674590e+05,False,0,False,,3.142636e+06,3.142636e+06,0,False,True,


In [6]:
vietnam_trade_raw.shape

(8710, 48)

## 3. Data Cleaning

In [7]:
vietnam_trade = clean_and_prepare_dataframe(vietnam_trade_raw)
vietnam_trade.shape

(7898, 13)
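
`clean_and_prepare_dataframe` reduces the frame from 48 to 13 columns and drops 812 rows, but its code is not shown. The sketch below reproduces only the column selection and renaming that can be read off the outputs. `Qty_in_kg` is assumed to come from `Qty`, since the rows displayed report `Qty` in kg. The row filters are placeholders: the rule the real helper uses to drop rows is unknown.

```python
import pandas as pd

# Raw column -> cleaned column, in the order of the cleaned frame.
KEEP_AND_RENAME = {
    "RefYear": "Year",
    "RefMonth": "Month",
    "Period": "Period",
    "ReporterISO": "ReporterISO",
    "ReporterDesc": "ReporterDesc",
    "FlowCode": "FlowCode",
    "FlowDesc": "FlowDesc",
    "PartnerISO": "PartnerISO",
    "PartnerDesc": "PartnerDesc",
    "CmdCode": "CmdCode",
    "CmdDesc": "CmdDesc",
    "Qty": "Qty_in_kg",  # assumption: Qty is already expressed in kg
    "PrimaryValue": "PrimaryValue",
}


def clean_sketch(raw):
    """Hypothetical stand-in for clean_and_prepare_dataframe."""
    df = raw[list(KEEP_AND_RENAME)].rename(columns=KEEP_AND_RENAME)
    # Placeholder row filters (assumed): drop incomplete and repeated rows.
    df = df.dropna(subset=["Qty_in_kg", "PrimaryValue"]).drop_duplicates()
    return df.reset_index(drop=True)
```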

In [10]:
vietnam_trade.head()

| | Year | Month | Period | ReporterISO | ReporterDesc | FlowCode | FlowDesc | PartnerISO | PartnerDesc | CmdCode | CmdDesc | Qty_in_kg | PrimaryValue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017 | 1 | 201701 | VNM | Viet Nam | X | Export | DZA | Algeria | 90111 | Coffee; not roasted or decaffeinated | 2874438.484 | 8559437.401 |
| 1 | 2017 | 1 | 201701 | VNM | Viet Nam | X | Export | AGO | Angola | 90112 | Coffee; decaffeinated, not roasted | 471.979 | 2116.5 |
| 2 | 2017 | 1 | 201701 | VNM | Viet Nam | X | Export | ARG | Argentina | 90111 | Coffee; not roasted or decaffeinated | 209820.936 | 624800.0 |
| 3 | 2017 | 1 | 201701 | VNM | Viet Nam | X | Export | AUS | Australia | 90111 | Coffee; not roasted or decaffeinated | 655104.846 | 1950756.28 |
| 4 | 2017 | 1 | 201701 | VNM | Viet Nam | X | Export | AUS | Australia | 90112 | Coffee; decaffeinated, not roasted | 44982.859 | 201717.1 |
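
With `Period`, `Qty_in_kg`, and `PrimaryValue` in place, two derived columns are often useful for later analysis: a proper date and a value per kilogram. The sketch below uses two rows copied from the output above. The column names `Date` and `UnitValue` are hypothetical, and `PrimaryValue` is assumed to be in US dollars, as UN Comtrade normally reports it.

```python
import pandas as pd

# Two rows copied from the cleaned output above.
sample = pd.DataFrame({
    "Period": [201701, 201701],
    "PartnerISO": ["DZA", "AGO"],
    "Qty_in_kg": [2874438.484, 471.979],
    "PrimaryValue": [8559437.401, 2116.5],
})

# Period is an integer such as 201701; parse it as year + month.
sample["Date"] = pd.to_datetime(sample["Period"].astype(str), format="%Y%m")

# Trade value per kilogram (assumed USD/kg).
sample["UnitValue"] = sample["PrimaryValue"] / sample["Qty_in_kg"]

print(sample[["Date", "PartnerISO", "UnitValue"]].round(3))
```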


### Missing data (Null values)

In [8]:
# Checking for missing data
vietnam_trade.isnull().sum().sort_values(ascending=False)

Year            0
Month           0
Period          0
ReporterISO     0
ReporterDesc    0
FlowCode        0
FlowDesc        0
PartnerISO      0
PartnerDesc     0
CmdCode         0
CmdDesc         0
Qty_in_kg       0
PrimaryValue    0
dtype: int64

### Finding Duplicates

In [9]:
# Find duplicates
vietnam_trade.duplicated().sum()

0
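
`duplicated()` without arguments only flags rows that match in every column. Two records for the same month, flow, partner, and commodity but with different quantities would pass that check. A minimal sketch of a stricter check follows, assuming the four columns in `KEY` identify a record (an assumption, not something the source states):

```python
import pandas as pd

KEY = ["Period", "FlowCode", "PartnerISO", "CmdCode"]  # assumed record key


def key_duplicates(df, key=KEY):
    """Return every row whose key combination appears more than once."""
    # keep=False marks all members of a duplicated group, not only the repeats.
    mask = df.duplicated(subset=key, keep=False)
    return df[mask].sort_values(key)
```

Calling `key_duplicates(vietnam_trade)` and getting an empty frame back would mean each combination appears exactly once.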

## 4. Exploratory Data Analysis