# Assignment: Analysis of Coffee Production Types (1990 - 2020)

### Overview
In this assignment, you will explore the coffee production data from 1990 to 2020, focusing on different types of coffee, such as Robusta, Arabica, and blends like Arabica/Robusta and Robusta/Arabica. Your task will involve data aggregation, transformation, and correlation analysis to uncover patterns and relationships in coffee production over three decades.

### Objectives
* Aggregate coffee production data by type over a 30-year period.
* Transpose the aggregated data to facilitate correlation analysis.
* Calculate and analyze the correlation matrix to identify the strength of relationships between different coffee types.

### Dataset
Your dataset consists of yearly coffee production volumes from 1990 to 2020. In its initial format, each row is a producing country labeled with its coffee type, and each crop year (`1990_1991` through `2019_2020`) is a column, followed by a `total_production` column.

### Tasks
#### 1. Data Aggregation

* Group the coffee production data by type for each year from 1990 to 2020.
* Summarize the data to reflect total production volumes for each coffee type per year.

In [1]:
import pandas as pd

file_path = 'AD_450/coffee_production.csv'
coffee_df = pd.read_csv(file_path)

In [2]:
coffee_df.shape

(55, 33)

In [3]:
coffee_df.head(20)

,country,coffee_type,1990_1991,1991_1992,1992_1993,1993_1994,1994_1995,1995_1996,1996_1997,1997_1998,...,2011_2012,2012_2013,2013_2014,2014_2015,2015_2016,2016_2017,2017_2018,2018_2019,2019_2020,total_production
0,Angola,Robusta/Arabica,3000000,4740000,4680000,1980000,4620000,3720000,4260000,3840000,...,1740000,1980000,2100000,2340000,2460000,2700000,2100000,2520000,3120000,82080000
1,Bolivia (Plurinational State of),Arabica,7380000,6240000,7200000,3060000,7020000,8520000,7500000,8460000,...,7920000,6300000,7200000,6000000,5040000,4680000,5040000,4980000,4860000,207000000
2,Brazil,Arabica/Robusta,1637160000,1637580000,2076180000,1690020000,1691520000,1083600000,1751820000,1568880000,...,2915520000,3325080000,3281340000,3198300000,3172260000,3407280000,3164400000,3907860000,3492660000,75082980000
3,Burundi,Arabica/Robusta,29220000,40020000,37200000,23580000,39840000,26040000,24060000,15000000,...,12240000,24360000,9780000,14880000,16140000,11760000,12120000,12240000,16320000,623640000
4,Ecuador,Arabica/Robusta,90240000,127440000,71100000,124140000,142560000,113280000,119580000,71460000,...,49500000,49680000,39960000,38640000,38640000,38700000,37440000,29760000,33540000,1900380000
5,Indonesia,Robusta/Arabica,446460000,509580000,334140000,404580000,322080000,274380000,493260000,457260000,...,413340000,784200000,774060000,656760000,755100000,692460000,651120000,577080000,685980000,15404880000
6,Madagascar,Robusta,58920000,55980000,67320000,26520000,38460000,47100000,50940000,37440000,...,35100000,30000000,35040000,30060000,24840000,27180000,24240000,22740000,22980000,1045560000
7,Malawi,Arabica,6300000,7440000,8220000,3720000,5040000,5460000,2940000,3660000,...,1560000,1380000,1680000,1500000,1260000,1200000,840000,780000,960000,82260000
8,Papua New Guinea,Arabica/Robusta,57780000,44820000,54000000,61140000,68340000,60120000,65340000,64440000,...,84840000,42960000,50100000,47880000,42720000,70260000,44040000,55800000,45120000,1803120000
9,Paraguay,Arabica,7860000,4800000,3240000,4020000,1500000,1560000,1860000,2940000,...,1200000,1200000,1200000,1200000,1200000,1200000,1200000,1200000,1200000,62220000


In [4]:
coffee_by_type = coffee_df.groupby('coffee_type').sum(numeric_only=True)

In [5]:
coffee_by_type = coffee_by_type.drop(columns=['total_production'])

In [6]:
coffee_by_type.head()

coffee_type,1990_1991,1991_1992,1992_1993,1993_1994,1994_1995,1995_1996,1996_1997,1997_1998,1998_1999,1999_2000,...,2010_2011,2011_2012,2012_2013,2013_2014,2014_2015,2015_2016,2016_2017,2017_2018,2018_2019,2019_2020
Arabica,1807140000,2073960000,1895580000,1593720000,1715940000,1873440000,1665720000,1739400000,1686780000,1815000000,...,1846080000,1960020000,1992300000,2047680000,2075280000,2177340000,2429880000,2431260000,2445300000,2341020000
Arabica/Robusta,2399820000,2409960000,2787060000,2492700000,2503560000,1948140000,2615880000,2354940000,3043860000,3798120000,...,4000620000,3634680000,4025820000,3881640000,3785460000,3720480000,4042620000,3822840000,4603500000,4122780000
Robusta,266400000,354840000,228840000,198840000,269700000,228360000,390780000,326940000,257700000,485160000,...,162780000,248520000,212100000,253260000,198480000,171900000,153240000,185940000,214800000,200100000
Robusta/Arabica,1120440000,1237380000,999780000,1220280000,1109640000,1189320000,1526280000,1571940000,1543620000,1784760000,...,2395020000,2636400000,2840820000,3051960000,2958720000,3297840000,3113340000,3381540000,3084180000,3239280000
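
One way to sanity-check the aggregation step is to confirm that the per-type sums across the crop-year columns agree with the summed `total_production` column. The sketch below uses a small made-up table (`toy_df` and its values are hypothetical, not taken from the dataset) and assumes the crop-year columns follow the `YYYY_YYYY` naming seen in the output above.

```python
import pandas as pd

# Toy stand-in for coffee_production.csv; all values are invented for illustration.
toy_df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "coffee_type": ["Arabica", "Robusta", "Arabica", "Robusta"],
    "1990_1991": [10, 20, 30, 40],
    "1991_1992": [15, 25, 35, 45],
    "total_production": [25, 45, 65, 85],
})

# Select only the crop-year columns (names such as "1990_1991").
year_cols = toy_df.filter(regex=r"^\d{4}_\d{4}$").columns

# Sum the yearly volumes for each coffee type.
toy_by_type = toy_df.groupby("coffee_type")[year_cols].sum()

# Cross-check: summing across years should reproduce the summed totals per type.
expected_totals = toy_df.groupby("coffee_type")["total_production"].sum()
totals_match = bool((toy_by_type.sum(axis=1) == expected_totals).all())

print(toy_by_type)
print("Totals match:", totals_match)
```

Selecting the year columns explicitly also avoids having to drop `total_production` afterwards, which may be convenient if the file gains other non-year columns.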


#### 2. Data Transformation

* Transpose the aggregated data so that crop years become the rows and coffee types become the columns.
* This orientation prepares your data for correlation analysis, because `corr()` compares columns with one another.

In [7]:
coffee_by_type = coffee_by_type.transpose()

In [8]:
coffee_by_type.head()

| Crop year | Arabica | Arabica/Robusta | Robusta | Robusta/Arabica |
| --- | --- | --- | --- | --- |
| 1990_1991 | 1807140000 | 2399820000 | 266400000 | 1120440000 |
| 1991_1992 | 2073960000 | 2409960000 | 354840000 | 1237380000 |
| 1992_1993 | 1895580000 | 2787060000 | 228840000 | 999780000 |
| 1993_1994 | 1593720000 | 2492700000 | 198840000 | 1220280000 |
| 1994_1995 | 1715940000 | 2503560000 | 269700000 | 1109640000 |
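
The orientation matters because `DataFrame.corr()` computes pairwise correlations between columns. As a minimal sketch, the snippet below rebuilds only the five crop years shown above (the name `sample` is hypothetical) and compares the shape of the result in both orientations.

```python
import pandas as pd

# First five crop years of the transposed output shown above.
sample = pd.DataFrame(
    {
        "Arabica": [1807140000, 2073960000, 1895580000, 1593720000, 1715940000],
        "Arabica/Robusta": [2399820000, 2409960000, 2787060000, 2492700000, 2503560000],
        "Robusta": [266400000, 354840000, 228840000, 198840000, 269700000],
        "Robusta/Arabica": [1120440000, 1237380000, 999780000, 1220280000, 1109640000],
    },
    index=["1990_1991", "1991_1992", "1992_1993", "1993_1994", "1994_1995"],
)

# Coffee types as columns: one coefficient per pair of coffee types.
by_type = sample.corr()

# Crop years as columns: one coefficient per pair of years, which is not what we want here.
by_year = sample.T.corr()

print(by_type.shape)
print(by_year.shape)
```

With only five years of data the coefficients themselves will differ from the full 30-year matrix; the point of the sketch is the shape of the result, not its values.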


#### 3. Correlation Analysis

* Using the pandas library in Python, apply the `corr()` method to your transposed dataset to compute the correlation matrix.
* The correlation matrix should show the correlation coefficient (Pearson's r, the default for `corr()`) between each pair of coffee types.

In [9]:
coffee_by_type.corr()

| coffee_type | Arabica | Arabica/Robusta | Robusta | Robusta/Arabica |
| --- | --- | --- | --- | --- |
| Arabica | 1.0 | 0.623834 | -0.453044 | 0.743225 |
| Arabica/Robusta | 0.623834 | 1.0 | -0.312663 | 0.826251 |
| Robusta | -0.453044 | -0.312663 | 1.0 | -0.451428 |
| Robusta/Arabica | 0.743225 | 0.826251 | -0.451428 | 1.0 |
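
Rather than reading the strongest and weakest pairs off the matrix by eye, they can be extracted programmatically. The sketch below re-enters the coefficients from the matrix above and assumes that "strongest" and "weakest" refer to the absolute value of the coefficient; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

types = ["Arabica", "Arabica/Robusta", "Robusta", "Robusta/Arabica"]

# Coefficients copied from the correlation matrix above.
corr = pd.DataFrame(
    [
        [1.0, 0.623834, -0.453044, 0.743225],
        [0.623834, 1.0, -0.312663, 0.826251],
        [-0.453044, -0.312663, 1.0, -0.451428],
        [0.743225, 0.826251, -0.451428, 1.0],
    ],
    index=types,
    columns=types,
)

# Keep the upper triangle only (k=1 excludes the diagonal) so each pair appears once.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack().dropna()  # dropna() removes the masked cells

# Rank pairs by the magnitude of the coefficient, ignoring its sign.
strongest = pairs.abs().idxmax()
weakest = pairs.abs().idxmin()

print("Strongest:", strongest, pairs.loc[strongest])
print("Weakest:", weakest, pairs.loc[weakest])
```

If "weakest" were instead taken to mean the most negative coefficient, `pairs.idxmin()` would be the call to use, and it would point to a different pair.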


#### Questions
1. Examine the correlation matrix. Which two coffee types have the strongest correlation in production volumes over the years? What might this imply about their production dynamics?

Arabica/Robusta and Robusta/Arabica have the strongest correlation in production volumes over the years (r ≈ 0.83). This may imply that the two types need similar growing conditions, so a season that favors one also favors the other.

Weather is one example: if wet weather benefits both Arabica/Robusta and Robusta/Arabica beans, we would expect a wet year to show higher production of both types than a dry year.

Alternatively, or in conjunction with this theory, the countries producing Arabica/Robusta and Robusta/Arabica may be geographically close, so conditions that affect Arabica/Robusta producers also affect Robusta/Arabica producers. It would be interesting to see the two types plotted on the same graph over time, as well as a map of producers color-coded by the coffee type they produce.
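
As a rough sketch of the suggested plot, the snippet below draws both blends on one axis with matplotlib. It assumes matplotlib is installed and, to stay self-contained, uses only the five crop years shown in the transposed output earlier; the `blends` frame is a hypothetical stand-in for the full `coffee_by_type` data.

```python
import matplotlib.pyplot as plt
import pandas as pd

# First five crop years for the two blend types, copied from the transposed output.
years = ["1990_1991", "1991_1992", "1992_1993", "1993_1994", "1994_1995"]
blends = pd.DataFrame(
    {
        "Arabica/Robusta": [2399820000, 2409960000, 2787060000, 2492700000, 2503560000],
        "Robusta/Arabica": [1120440000, 1237380000, 999780000, 1220280000, 1109640000],
    },
    index=years,
)

# Draw one line per coffee type on a shared axis.
fig, ax = plt.subplots(figsize=(8, 4))
for coffee_type in blends.columns:
    ax.plot(blends.index, blends[coffee_type], marker="o", label=coffee_type)

ax.set_xlabel("Crop year")
ax.set_ylabel("Production volume")
ax.set_title("Blend production over time (first five crop years)")
ax.legend()
fig.tight_layout()
plt.show()
```

With the full data, replacing `blends` with `coffee_by_type[["Arabica/Robusta", "Robusta/Arabica"]]` should give the 30-year view.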

2. Identify the two coffee types with the weakest correlation. Discuss possible reasons for this weak relationship and any external factors that might influence these production types differently.

The weakest correlation is between Robusta and Arabica/Robusta (r ≈ -0.31), followed by Robusta and Robusta/Arabica (r ≈ -0.45). I mention the second-weakest pair because it complements the earlier finding that Robusta/Arabica and Arabica/Robusta have the strongest correlation: Robusta relates only weakly to both blends.

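These weak correlations may reflect the inverse of the explanations proposed above for high correlation: the growing conditions for these beans may differ, or the regions where they are grown may be geographically far apart. All three of Robusta's coefficients are negative, so Robusta production tended to fall in years when production of the other types rose.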