# Importing the libraries and datasets

In [34]:
# Libraries
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline 
sns.set(color_codes=True)

In [4]:
# FAO Dataframes

# FAO animal stocks data
stocks = pd.read_csv("FAOSTAT_animal_stocks.csv", low_memory=False)

# FAO meat production data
meat = pd.read_csv("FAOSTAT_meat_production.csv", low_memory=False)

# FAO livestock import / export data
imp_exp = pd.read_csv("FAOSTAT_Livestock_import_export.csv", low_memory=False)

# Exploratory data analysis

## Animal stocks data

Here we view the head, shape and summary information of the data.

In [5]:
stocks.head()

Unnamed: 0,Domain Code,Domain,Area Code (FAO),Area,Element Code,Element,Item Code (FAO),Item,Year Code,Year,Unit,Value,Flag,Flag Description
0,QCL,Crops and livestock products,255,Belgium,5320,Producing Animals/Slaughtered,867,"Meat, cattle",2000,2000,Head,832926.0,,Official data
1,QCL,Crops and livestock products,255,Belgium,5320,Producing Animals/Slaughtered,867,"Meat, cattle",2001,2001,Head,873268.0,,Official data
2,QCL,Crops and livestock products,255,Belgium,5320,Producing Animals/Slaughtered,867,"Meat, cattle",2002,2002,Head,932473.0,,Official data
3,QCL,Crops and livestock products,255,Belgium,5320,Producing Animals/Slaughtered,867,"Meat, cattle",2003,2003,Head,853641.0,,Official data
4,QCL,Crops and livestock products,255,Belgium,5320,Producing Animals/Slaughtered,867,"Meat, cattle",2004,2004,Head,842585.0,,Official data


In [12]:
stocks.shape

(3479, 14)

In [16]:
stocks.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3479 entries, 0 to 3478
Data columns (total 14 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Domain Code       3479 non-null   object 
 1   Domain            3479 non-null   object 
 2   Area Code (FAO)   3479 non-null   int64  
 3   Area              3479 non-null   object 
 4   Element Code      3479 non-null   int64  
 5   Element           3479 non-null   object 
 6   Item Code (FAO)   3479 non-null   int64  
 7   Item              3479 non-null   object 
 8   Year Code         3479 non-null   int64  
 9   Year              3479 non-null   int64  
 10  Unit              3479 non-null   object 
 11  Value             3401 non-null   float64
 12  Flag              1251 non-null   object 
 13  Flag Description  3479 non-null   object 
dtypes: float64(1), int64(5), object(8)
memory usage: 380.6+ KB


In [17]:
stocks.describe()

Unnamed: 0,Area Code (FAO),Element Code,Item Code (FAO),Year Code,Year,Value
count,3479.0,3479.0,3479.0,3479.0,3479.0,3401.0
mean,155.463639,5320.410463,1033.35355,1997.955734,1997.955734,13234700.0
std,97.847523,0.491988,75.725988,13.801676,13.801676,58845470.0
min,54.0,5320.0,867.0,1973.0,1973.0,0.0
25%,79.0,5320.0,1017.0,1986.0,1986.0,21100.0
50%,106.0,5320.0,1058.0,1999.0,1999.0,304000.0
75%,231.0,5321.0,1080.0,2010.0,2010.0,3792119.0
max,351.0,5321.0,1163.0,2020.0,2020.0,744917900.0


In its current layout the data is not suitable for statistical analysis, as each row holds a single country, item and year observation.
I need to reshape the data so that the values for each item type are in their own column.
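
As a minimal sketch of the reshaping described above, assuming the goal is one column per item type: the FAO extract is already in long form, so `pivot_table` is one way to spread `Item` into columns. Note that this uses `pivot_table` rather than `melt`, since `melt` goes in the opposite direction (wide to long). The toy rows and values below are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy long-format data mimicking the FAO layout (values are illustrative only)
toy = pd.DataFrame({
    "Area": ["Belgium", "Belgium", "Ireland", "Ireland"],
    "Year": [2000, 2000, 2000, 2000],
    "Item": ["Meat, cattle", "Meat, pig", "Meat, cattle", "Meat, pig"],
    "Value": [10.0, 20.0, 30.0, 40.0],
})

# One row per (Area, Year), one column per Item
wide = toy.pivot_table(index=["Area", "Year"], columns="Item", values="Value")
wide = wide.reset_index()
wide.columns.name = None  # drop the leftover "Item" label on the column axis
print(wide)
```

If an (Area, Year, Item) combination appears more than once, `pivot_table` averages the duplicates by default, so it is worth checking for duplicates first.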

In [18]:
stocks.Item.unique()

array(['Meat, cattle', 'Meat, chicken', 'Meat, duck', 'Meat, goat',
       'Meat, horse', 'Meat, pig', 'Meat, sheep', 'Meat, turkey',
       'Meat, rabbit', 'Meat, game'], dtype=object)

There are 10 animal categories in the stock data. Every category repeats the "Meat, " prefix, which I can remove when I clean the data.
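
One possible way to remove that repetition, sketched on a small sample of the item names rather than on the full dataframe. The variable names `items` and `clean` are placeholders, and capitalising the result is an optional extra step.

```python
import pandas as pd

# A sample of the item names returned by stocks.Item.unique()
items = pd.Series(["Meat, cattle", "Meat, chicken", "Meat, game"], name="Item")

# Drop the literal "Meat, " prefix and capitalise the remaining animal name
clean = items.str.replace("Meat, ", "", regex=False).str.capitalize()
print(clean.tolist())
```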

In [19]:
stocks.Area.unique()

array(['Belgium', 'China', 'Denmark', 'France', 'Germany', 'Ireland',
       'Italy', 'Luxembourg', 'Netherlands', 'United States of America'],
      dtype=object)

The dataset contains 8 EU countries (Belgium, Denmark, France, Germany, Luxembourg, Ireland, Italy and the Netherlands) alongside China and the United States of America. All eight EU countries have been member states since at least 1 January 1973.

Belgium, France, Germany, Luxembourg, Italy and the Netherlands founded the European Economic Community (EEC), the forerunner of the EU, in 1957, with Ireland and Denmark joining on 1 January 1973.

Therefore, to ensure data comparability between these countries and the US and China, I limited the dataset's time frame to 1 January 1973 through 31 December 2020 (the most recent data available). This ensures that all the European countries were member states throughout the period analysed.

We can confirm the time frame as follows:

In [20]:
stocks.Year.unique()

array([2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010,
       2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 1973,
       1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984,
       1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995,
       1996, 1997, 1998, 1999], dtype=int64)

In [31]:
print(f"The earliest year in the dataset is {stocks.Year.min()} and the latest year is {stocks.Year.max()}")

The earliest year in the dataset is 1973 and the latest year is 2020
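
If a raw FAO extract covered a wider range of years, the 1973 to 2020 restriction described above could be applied with a filter like the one below. This is only a sketch on a made-up frame (`toy`); the data loaded here already appears to be limited to that range.

```python
import pandas as pd

# Hypothetical frame with years outside the period of interest
toy = pd.DataFrame({"Year": [1961, 1972, 1973, 2000, 2020, 2021],
                    "Value": [1, 2, 3, 4, 5, 6]})

# Series.between is inclusive on both ends by default
in_range = toy[toy["Year"].between(1973, 2020)]
print(in_range["Year"].tolist())
```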


Next, we can check which units the stock data has been recorded in.

In [32]:
stocks.Unit.unique()

array(['Head', '1000 Head'], dtype=object)

Stock numbers are reported both per "Head" and per "1000 Head" of animal, so values need to be converted to a common unit before items are compared.
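
A minimal sketch of putting both units on the same scale, assuming "1000 Head" simply means the value should be multiplied by 1000. The frame `toy`, the column name `Value_head` and the chicken value are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Toy rows: one reported per head, one reported per thousand head
toy = pd.DataFrame({
    "Item": ["Meat, cattle", "Meat, chicken"],
    "Unit": ["Head", "1000 Head"],
    "Value": [832926.0, 250.0],
})

# Multiply values reported in thousands by 1000, leave the rest unchanged
factor = toy["Unit"].map({"Head": 1, "1000 Head": 1000})
toy["Value_head"] = toy["Value"] * factor
toy["Unit"] = "Head"
print(toy)
```

Writing the converted numbers to a new column keeps the original `Value` available for checking the conversion.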

In [33]:
stocks.Element.unique()

array(['Producing Animals/Slaughtered'], dtype=object)

In [6]:
meat.head()

Unnamed: 0,Domain Code,Domain,Area Code (FAO),Area,Element Code,Element,Item Code (FAO),Item,Year Code,Year,Unit,Value,Flag,Flag Description
0,QCL,Crops and livestock products,255,Belgium,5510,Production,867,"Meat, cattle",2000,2000,tonnes,275360.0,,Official data
1,QCL,Crops and livestock products,255,Belgium,5510,Production,867,"Meat, cattle",2001,2001,tonnes,285250.0,,Official data
2,QCL,Crops and livestock products,255,Belgium,5510,Production,867,"Meat, cattle",2002,2002,tonnes,305388.0,,Official data
3,QCL,Crops and livestock products,255,Belgium,5510,Production,867,"Meat, cattle",2003,2003,tonnes,275170.0,,Official data
4,QCL,Crops and livestock products,255,Belgium,5510,Production,867,"Meat, cattle",2004,2004,tonnes,280931.0,,Official data


In [7]:
imp_exp.head()

Unnamed: 0,Domain Code,Domain,Area Code (FAO),Area,Element Code,Element,Item Code (FAO),Item,Year Code,Year,Unit,Value,Flag,Flag Description
0,TCL,Crops and livestock products,255,Belgium,5608,Import Quantity,866,Cattle,2000,2000,Head,59395.0,,Official data
1,TCL,Crops and livestock products,255,Belgium,5608,Import Quantity,866,Cattle,2001,2001,Head,44232.0,,Official data
2,TCL,Crops and livestock products,255,Belgium,5608,Import Quantity,866,Cattle,2002,2002,Head,61054.0,,Official data
3,TCL,Crops and livestock products,255,Belgium,5608,Import Quantity,866,Cattle,2003,2003,Head,85727.0,,Official data
4,TCL,Crops and livestock products,255,Belgium,5608,Import Quantity,866,Cattle,2004,2004,Head,100891.0,,Official data


In [8]:
print(stocks.shape, meat.shape, imp_exp.shape)

(3479, 14) (3675, 14) (16204, 14)


In [10]:
meat.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3675 entries, 0 to 3674
Data columns (total 14 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Domain Code       3675 non-null   object 
 1   Domain            3675 non-null   object 
 2   Area Code (FAO)   3675 non-null   int64  
 3   Area              3675 non-null   object 
 4   Element Code      3675 non-null   int64  
 5   Element           3675 non-null   object 
 6   Item Code (FAO)   3675 non-null   int64  
 7   Item              3675 non-null   object 
 8   Year Code         3675 non-null   int64  
 9   Year              3675 non-null   int64  
 10  Unit              3675 non-null   object 
 11  Value             3611 non-null   float64
 12  Flag              1174 non-null   object 
 13  Flag Description  3675 non-null   object 
dtypes: float64(1), int64(5), object(8)
memory usage: 402.1+ KB


In [11]:
imp_exp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16204 entries, 0 to 16203
Data columns (total 14 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   Domain Code       16204 non-null  object 
 1   Domain            16204 non-null  object 
 2   Area Code (FAO)   16204 non-null  int64  
 3   Area              16204 non-null  object 
 4   Element Code      16204 non-null  int64  
 5   Element           16204 non-null  object 
 6   Item Code (FAO)   16204 non-null  int64  
 7   Item              16204 non-null  object 
 8   Year Code         16204 non-null  int64  
 9   Year              16204 non-null  int64  
 10  Unit              16204 non-null  object 
 11  Value             15917 non-null  float64
 12  Flag              3012 non-null   object 
 13  Flag Description  16204 non-null  object 
dtypes: float64(1), int64(5), object(8)
memory usage: 1.7+ MB


# Data preparation 

Import the country codes file, which can be used to map a country name to each code in the dataframes, then reshape the data with pandas melt.

In [8]:
country_codes = pd.read_csv("country_codes.tsv", sep='\t')
country_codes.head()

# This can be used if you get data only with country names, it allows you to
# map the country name to the code on new data sets and then you can make
# your visualisation

Unnamed: 0,Country,Alpha-2 code,Alpha-3 code,Numeric
0,Afghanistan,AF,AFG,4
1,Albania,AL,ALB,8
2,Algeria,DZ,DZA,12
3,American Samoa,AS,ASM,16
4,Andorra,AD,AND,20
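
The name-to-code mapping mentioned above could look roughly like the sketch below. The `data` frame and the lookup name `name_to_code` are hypothetical, and the code table is rebuilt from the first rows shown above so the example runs on its own.

```python
import pandas as pd

# Rows copied from the head of the country codes table shown above
country_codes = pd.DataFrame({
    "Country": ["Afghanistan", "Albania", "Algeria"],
    "Alpha-2 code": ["AF", "AL", "DZ"],
    "Alpha-3 code": ["AFG", "ALB", "DZA"],
})

# Hypothetical dataframe that only has country names
data = pd.DataFrame({"Area": ["Albania", "Algeria", "Atlantis"]})

# Build a name -> code lookup and apply it; unmatched names become NaN
name_to_code = country_codes.set_index("Country")["Alpha-3 code"]
data["Alpha-3 code"] = data["Area"].map(name_to_code)
print(data)
```

Names that are spelled differently in the two sources will not match, so it is worth checking the mapped column for missing values afterwards.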


In [15]:
df = pd.melt(slaughter, id_vars=['geo'], var_name="Year", 
             value_name="Number")

In [16]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 45792 entries, 0 to 45791
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   geo     45792 non-null  object
 1   Year    45792 non-null  object
 2   Number  40939 non-null  object
dtypes: object(3)
memory usage: 1.0+ MB


In [17]:
df.head()

Unnamed: 0,geo,Year,Number
0,AT,DATAFLOW,ESTAT:APRO_MT_PWGTM(1.0)
1,AT,DATAFLOW,ESTAT:APRO_MT_PWGTM(1.0)
2,AT,DATAFLOW,ESTAT:APRO_MT_PWGTM(1.0)
3,AT,DATAFLOW,ESTAT:APRO_MT_PWGTM(1.0)
4,AT,DATAFLOW,ESTAT:APRO_MT_PWGTM(1.0)
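
The output above shows `DATAFLOW` in the `Year` column, which suggests that metadata columns were melted along with the year columns. The sketch below illustrates why that can happen and one way to avoid it, assuming the source table is wide with one column per year. The frame `wide` and its values are hypothetical and do not reproduce the real `slaughter` data.

```python
import pandas as pd

# Hypothetical wide table: a metadata column plus one column per year
wide = pd.DataFrame({
    "DATAFLOW": ["X", "X"],
    "geo": ["AT", "BE"],
    "2019": [1.0, 2.0],
    "2020": [3.0, 4.0],
})

# Every column not in id_vars is melted, including the metadata column
naive = pd.melt(wide, id_vars=["geo"], var_name="Year", value_name="Number")

# Restricting value_vars to the year columns keeps metadata out of the result
year_cols = ["2019", "2020"]
tidy = pd.melt(wide, id_vars=["geo"], value_vars=year_cols,
               var_name="Year", value_name="Number")
tidy["Year"] = tidy["Year"].astype(int)
print(naive["Year"].unique(), tidy["Year"].unique())
```

Mixing text metadata with numeric values in one melted column also forces the `object` dtype, which matches the `df.info()` output above.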
