# How food shapes the World - Ada project

With 7.7 billion people in the world in 2019, surging food demand has raised serious concerns about the availability of productive agricultural land. Since expanding the cultivated surface has so far been the only real solution, this project aims to provide insights into how this problem is currently reshaping the world’s surface as we know it. We first investigate which kinds of environment are being encroached upon. We then move on to the “destructive” impact of certain foodstuffs and diet trends, and how it correlates with international trade flows. This study relies on the FAOSTAT data set from the United Nations.
We aim to provide a comprehensive visualization of the world under the growing pressure of food production. Key features will be the evolution of surfaces across time and space, the impact of selected crops, and the parties and areas involved. We intend to tackle all of the above with both social awareness and self-awareness.

## Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Opening the dataset

First of all, we need to open the dataset we are working on, that is, load it and display as much of it as possible to get a better view of it.

In [2]:
datapath = "Data/global-food-agriculture-statistics/"
current_fao = "current_FAO/raw_files/"

savanna = pd.read_csv(datapath + current_fao + "Emissions_Agriculture_Burning_Savanna_E_All_data_(Norm).csv", sep=",", encoding="ANSI")  # Savanna loss (burned areas)
crops = pd.read_csv(datapath + "fao_data_crops_data.csv", sep=",", encoding="UTF-8")                                                     # Where each product is grown
forests = pd.read_csv(datapath + current_fao + "Emissions_Land_Use_Forest_Land_E_All_Data_(Norm).csv", sep=",", encoding="ANSI")         # Forest loss
livestock = pd.read_csv(datapath + current_fao + "Trade_Crops_Livestock_E_All_Data_(Normalized).csv", sep=",", encoding="ANSI")          # Imports and exports
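
The `"ANSI"` codec name used above is only defined on Windows (it is an alias of `mbcs`), so these calls may fail on other systems. A minimal sketch of a more portable loader follows. The helper `read_csv_with_fallback` and the list of candidate encodings are assumptions made for illustration, not part of the original notebook, and the demo uses made-up bytes rather than the FAOSTAT files:

```python
import io
import pandas as pd

def read_csv_with_fallback(raw, encodings=("utf-8", "cp1252")):
    """Decode raw CSV bytes with the first encoding that works, then parse them."""
    for enc in encodings:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
        return pd.read_csv(io.StringIO(text), sep=","), enc
    raise ValueError(f"none of {encodings} could decode the data")

# Toy bytes (not FAOSTAT data) that are valid cp1252 but invalid UTF-8
raw = "Country,Value\nCôte d'Ivoire,1.5\n".encode("cp1252")
df, used = read_csv_with_fallback(raw)
print(used, df.loc[0, "Country"])
```

For a file on disk, the bytes could be obtained with `pathlib.Path(path).read_bytes()`. This reads the whole file into memory, which may matter for large files.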

Now let's quickly display these datasets and their types:

### Savanna

In [3]:
savanna.dtypes

Country Code      int64
Country          object
Item Code         int64
Item             object
Element Code      int64
Element          object
Year Code         int64
Year              int64
Unit             object
Value           float64
Flag             object
dtype: object

In [184]:
savanna.head()

Unnamed: 0,Country Code,Country,Item Code,Item,Element Code,Element,Year Code,Year,Unit,Value,Flag
0,2,Afghanistan,6760,Savanna,7246,Burned Area,1990,1990,Ha,0.9251,Fc
1,2,Afghanistan,6760,Savanna,7246,Burned Area,1991,1991,Ha,0.9251,Fc
2,2,Afghanistan,6760,Savanna,7246,Burned Area,1992,1992,Ha,0.9251,Fc
3,2,Afghanistan,6760,Savanna,7246,Burned Area,1993,1993,Ha,0.9251,Fc
4,2,Afghanistan,6760,Savanna,7246,Burned Area,1994,1994,Ha,0.9251,Fc


#### Description:
In this dataframe we have the following columns:
- Country Code: a unique number per country
- Country: the name of the country
- Item Code: a unique number per item
- Item: the type of ecosystem
- Element Code: a unique number per element
- Element: what was actually calculated, measured or estimated
- Year Code: a unique number per year (identical to the year)
- Year: the actual year
- Unit: the unit of the "Value" column
- Value: the value found for the element
- Flag: information on where the data comes from (Fc, A, NaN, F)

There are 50,000 rows in total.

All years from 1990 to 2014 are recorded, followed by projections for 2030 and 2050.

There are 275 countries in total.
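
These figures can be checked directly on the dataframe. A minimal sketch, using a small made-up frame with the same column names in place of `savanna` (the helper name `summarize` is hypothetical and the values are invented):

```python
import pandas as pd

# Toy stand-in for the savanna dataframe (invented values)
toy = pd.DataFrame({
    "Country": ["Afghanistan"] * 3 + ["Northern Africa"] * 3,
    "Year": [1990, 2014, 2030, 1990, 2014, 2050],
    "Value": [0.9251, 0.9251, 0.5, 10.0, 12.0, 15.0],
})

def summarize(df):
    """Return the row count, the number of distinct countries and the sorted years."""
    return {
        "rows": len(df),
        "countries": df["Country"].nunique(),
        "years": sorted(df["Year"].unique().tolist()),
    }

summary = summarize(toy)
print(summary)
```

Applied to the real data, `summarize(savanna)` would return the corresponding counts for the full dataset.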

#### Selection of useful data
As we are only interested in the area lost to burning, we can make a first selection:

In [241]:
# Keep only the rows measuring the burned area (capped at the first 50,000 rows)
savannaSurface = savanna[savanna['Element'].str.match('Burned Area')].head(50000)
savannaSurface.head()

Unnamed: 0,Country Code,Country,Item Code,Item,Element Code,Element,Year Code,Year,Unit,Value,Flag
0,2,Afghanistan,6760,Savanna,7246,Burned Area,1990,1990,Ha,0.9251,Fc
1,2,Afghanistan,6760,Savanna,7246,Burned Area,1991,1991,Ha,0.9251,Fc
2,2,Afghanistan,6760,Savanna,7246,Burned Area,1992,1992,Ha,0.9251,Fc
3,2,Afghanistan,6760,Savanna,7246,Burned Area,1993,1993,Ha,0.9251,Fc
4,2,Afghanistan,6760,Savanna,7246,Burned Area,1994,1994,Ha,0.9251,Fc


In [242]:
savanna.Item.drop_duplicates()


0                         Savanna
207                 Woody savanna
444              Closed shrubland
685                Open shrubland
928                     Grassland
1169     Burning - all categories
1358    Savanna and woody savanna
1547    Closed and open shrubland
Name: Item, dtype: object

Here is the list of all the ecosystem types for which we have data. It seems to contain two types of savanna, two types of shrubland, and grassland. In addition, the last three categories are apparently aggregates of the other items.
Let's verify that:
- Savanna and woody savanna is the sum of the items Savanna and Woody savanna.
- Closed and open shrubland is the sum of the items Closed shrubland and Open shrubland.
- Burning - all categories is the sum of all the ecosystem types.

To do that:

In [249]:
# Burned areas for Northern Africa in 1990, in the item order listed above
S_NA_90 = savannaSurface.Value[(savannaSurface['Year']==1990)  & (savannaSurface['Country']=='Northern Africa')].tolist()
print('Total Savanna : ', S_NA_90[0]+S_NA_90[1], S_NA_90[6])
print('Total shrubland : ', S_NA_90[2]+S_NA_90[3],S_NA_90[7])
print('Total Surfaces : ', sum(S_NA_90[:5]),S_NA_90[5])

Total Savanna :  28808703.812 28808703.812
Total shrubland :  1138970.0815 1138970.0815
Total Surfaces :  34143614.0124 34143614.0124


Our hypotheses were indeed correct. There are three main categories: savanna, shrubland and grassland, the first two with subcategories. We also have the total burned surface in "Burning - all categories".
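
The same check can be run for every country and year at once rather than for a single pair. A minimal sketch on a made-up frame (the values are invented for illustration; the column and item names follow the dataset above), assuming each (Country, Year, Item) combination appears only once:

```python
import numpy as np
import pandas as pd

items = ["Savanna", "Woody savanna", "Closed shrubland", "Open shrubland",
         "Grassland", "Burning - all categories",
         "Savanna and woody savanna", "Closed and open shrubland"]
# Invented values chosen so that the aggregates are consistent
toy = pd.DataFrame({"Country": "Northern Africa", "Year": 1990,
                    "Item": items,
                    "Value": [3.0, 2.0, 1.0, 0.5, 4.0, 10.5, 5.0, 1.5]})

# One row per (Country, Year), one column per item
wide = toy.pivot_table(index=["Country", "Year"], columns="Item", values="Value")

base = ["Savanna", "Woody savanna", "Closed shrubland", "Open shrubland", "Grassland"]
checks = pd.DataFrame({
    "savanna_ok": np.isclose(wide["Savanna"] + wide["Woody savanna"],
                             wide["Savanna and woody savanna"]),
    "shrubland_ok": np.isclose(wide["Closed shrubland"] + wide["Open shrubland"],
                               wide["Closed and open shrubland"]),
    "total_ok": np.isclose(wide[base].sum(axis=1),
                           wide["Burning - all categories"]),
}, index=wide.index)

# True for a column only if the relation holds in every (Country, Year) row
print(checks.all())
```

`np.isclose` is used instead of `==` so that small floating-point rounding differences in the sums do not produce false mismatches.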

Now we want to get rid of all the columns we do not need:
- Flag: it does not directly interest us; it is just good to keep in mind that not all our data was collected in the same way
- Unit: we are only dealing with [ha] now
- Year Code: gives no more information than the column "Year"
- Element: we only consider Burned Area, so there is no need to keep it in the dataframe
- Element Code: same reason as for Element

In [250]:
savanna = savannaSurface.drop(['Flag', 'Unit','Year Code', 'Element', 'Element Code'], axis=1).rename(columns={"Value": "Burned Area"})

In [251]:
savanna.head()

Unnamed: 0,Country Code,Country,Item Code,Item,Year,Burned Area
0,2,Afghanistan,6760,Savanna,1990,0.9251
1,2,Afghanistan,6760,Savanna,1991,0.9251
2,2,Afghanistan,6760,Savanna,1992,0.9251
3,2,Afghanistan,6760,Savanna,1993,0.9251
4,2,Afghanistan,6760,Savanna,1994,0.9251


#### Analysis of the desired value:
In this dataset, only the total burned area interests us. Here is the description of the values:

In [252]:
savanna['Burned Area'].describe()

count    5.000000e+04
mean     1.612733e+06
std      1.367873e+07
min      0.000000e+00
25%      0.000000e+00
50%      7.599320e+01
75%      1.165686e+04
max      3.472483e+08
Name: Burned Area, dtype: float64

The quartiles show that at least a quarter of the values are 0 ha of burned area, and the median is only about 76 ha, while the maximum reaches about 3.5e8 ha.
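
To quantify this, one could compute the share of zero values and describe the non-zero values separately. A minimal sketch on a made-up series (the numbers are illustrative, not taken from the dataset):

```python
import pandas as pd

# Toy stand-in for savanna['Burned Area'] (invented values)
burned = pd.Series([0.0, 0.0, 0.0, 75.9, 11656.9, 3.4e8], name="Burned Area")

# The mean of a boolean mask is the fraction of True values
zero_share = (burned == 0).mean()
nonzero = burned[burned > 0]

print(f"Share of zero values: {zero_share:.0%}")
print(nonzero.describe())
```

On the real data, `burned` would simply be `savanna['Burned Area']`.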

### Crops

In [5]:
crops.dtypes

country_or_area     object
element_code        object
element             object
year               float64
unit                object
value              float64
value_footnotes     object
category            object
dtype: object

In [6]:
crops.head(5)

Unnamed: 0,country_or_area,element_code,element,year,unit,value,value_footnotes,category
0,Americas +,31,Area Harvested,2007.0,Ha,49404.0,A,agave_fibres_nes
1,Americas +,31,Area Harvested,2006.0,Ha,49404.0,A,agave_fibres_nes
2,Americas +,31,Area Harvested,2005.0,Ha,49404.0,A,agave_fibres_nes
3,Americas +,31,Area Harvested,2004.0,Ha,49113.0,A,agave_fibres_nes
4,Americas +,31,Area Harvested,2003.0,Ha,48559.0,A,agave_fibres_nes


### Forests

In [7]:
forests.dtypes

Country Code      int64
Country          object
Item Code         int64
Item             object
Element Code      int64
Element          object
Year Code         int64
Year              int64
Unit             object
Value           float64
Flag             object
dtype: object

In [8]:
forests.head(5)

Unnamed: 0,Country Code,Country,Item Code,Item,Element Code,Element,Year Code,Year,Unit,Value,Flag
0,2,Afghanistan,6661,Forest,5110,Area,1990,1990,1000 Ha,1350.0,F
1,2,Afghanistan,6661,Forest,5110,Area,1991,1991,1000 Ha,1350.0,F
2,2,Afghanistan,6661,Forest,5110,Area,1992,1992,1000 Ha,1350.0,F
3,2,Afghanistan,6661,Forest,5110,Area,1993,1993,1000 Ha,1350.0,F
4,2,Afghanistan,6661,Forest,5110,Area,1994,1994,1000 Ha,1350.0,F


### Livestock

In [9]:
livestock.dtypes

Area Code         int64
Area             object
Item Code         int64
Item             object
Element Code      int64
Element          object
Year Code         int64
Year              int64
Unit             object
Value           float64
Flag             object
dtype: object

In [10]:
livestock.head(5)

Unnamed: 0,Area Code,Area,Item Code,Item,Element Code,Element,Year Code,Year,Unit,Value,Flag
0,2,Afghanistan,231,Almonds shelled,5910,Export Quantity,1961,1961,tonnes,0.0,
1,2,Afghanistan,231,Almonds shelled,5910,Export Quantity,1962,1962,tonnes,0.0,
2,2,Afghanistan,231,Almonds shelled,5910,Export Quantity,1963,1963,tonnes,0.0,
3,2,Afghanistan,231,Almonds shelled,5910,Export Quantity,1964,1964,tonnes,0.0,
4,2,Afghanistan,231,Almonds shelled,5910,Export Quantity,1965,1965,tonnes,0.0,
