# How food shapes the World - Ada project

With 7.7 billion people in the world in 2019, the surge in food demand has raised serious concerns about the availability of productive agricultural land. Since expanding the cultivable surface has so far been the only real solution, this project aims to provide insights into how this issue is currently reshaping the world’s surface as we know it. The investigation first focuses on the kinds of environments that are being encroached upon. It then moves on to the “destructive” impact of certain foodstuffs and diet trends, and relates it to international trade flows. The study relies on the FAOSTAT data set from the United Nations.
We endeavor to provide an exhaustive visualization of a world being reshaped by growing food pressure. Key features will be the evolution of surfaces across time and space, the impact of selected crops, and the parties and areas involved. We tackle all of the above with both social awareness and self-awareness in mind.

## Libraries

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Opening the dataset

First of all, we need to open the datasets we are working on, that is, load them and display as much of them as possible to get a better view of their content.

In [4]:
datapath = "Data/global-food-agriculture-statistics/"
current_fao = "current_FAO/raw_files/"

savanna = pd.read_csv(datapath + current_fao + "Emissions_Agriculture_Burning_Savanna_E_All_data_(Norm).csv", sep=",", encoding="ANSI")  # Savanna loss (burned area)
crops = pd.read_csv(datapath + "fao_data_crops_data.csv", sep=",", encoding="UTF-8")                                                     # Where each product is grown
forests = pd.read_csv(datapath + current_fao + "Emissions_Land_Use_Forest_Land_E_All_Data_(Norm).csv", sep=",", encoding="ANSI")         # Forest loss
livestock = pd.read_csv(datapath + current_fao + "Trade_Crops_Livestock_E_All_Data_(Normalized).csv", sep=",", encoding="ANSI")          # Imports and exports
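
The `"ANSI"` codec name is only recognized by Python on Windows, where it aliases the system code page, so the cell above may fail with a `LookupError` on other platforms. The following is a minimal sketch of a more portable loader, assuming the files are encoded either as UTF-8 or as Windows-1252. `read_fao_csv` is a hypothetical helper introduced for illustration and is not part of the project.

```python
import pandas as pd


def read_fao_csv(path, encodings=("utf-8", "cp1252")):
    """Read a CSV file, trying each candidate encoding in turn (hypothetical helper)."""
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(path, sep=",", encoding=enc)
        except UnicodeDecodeError as err:
            # Remember the failure and fall through to the next encoding.
            last_error = err
    raise last_error
```

With such a helper, each `pd.read_csv(..., encoding="ANSI")` call could become `read_fao_csv(...)`, provided the two assumed encodings really cover the files.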

We will now investigate each dataset and start to correlate them depending on the questions we want to answer for Milestone 3. 

### Table of contents

1. [Savanna](#savanna) 
2. [Forests](#forests)
3. [Merge of the Biomes Data Sets](#merge)
4. [Crops](#crops)
5. [Livestock](#livestock)
6. [Data to answer the questions](#questions)

<a id="savanna"></a>
### Savanna

In [5]:
# Keep only the rows that report a burned area
savannaSurface = savanna[savanna['Element'].str.match('Burned Area')]

# Keep only three ecosystems (savanna, shrubland, grassland) and shorten their names
ecosystems = ['Savanna and woody savanna', 'Closed and open shrubland', 'Grassland']
savannaSurface = savannaSurface[savannaSurface['Item'].isin(ecosystems)]
savannaSurface = savannaSurface.replace('Savanna and woody savanna', 'Savanna')
savannaSurface = savannaSurface.replace('Closed and open shrubland', 'Shrubland')

# Keep only the countries: rows after index label 377033 are not individual countries
savannaCountry = savannaSurface.truncate(after=377033)

# Drop the columns we do not need and give the remaining ones clearer names
savannaFinal = savannaCountry.drop(['Flag', 'Unit', 'Year Code', 'Element', 'Element Code', 'Country Code', 'Item Code'], axis=1)\
                             .rename(columns={"Value": "Area_loss", "Item": "Ecosystem"})
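
Cutting the table at a hard-coded index label only works for this exact file. A possible alternative, sketched here on a toy frame with made-up values, is to filter on `Country Code`. It assumes that FAOSTAT aggregate areas (world, continents, economic groups) use codes of 5000 and above, which should be checked against the actual file before relying on it.

```python
import pandas as pd

# Toy frame mimicking the FAOSTAT layout (values are made up for illustration).
toy = pd.DataFrame({
    "Country Code": [2, 2, 5000, 5100],
    "Country": ["Afghanistan", "Afghanistan", "World", "Africa"],
    "Item": ["Grassland", "Grassland", "Grassland", "Grassland"],
    "Year": [1990, 1991, 1990, 1990],
    "Value": [10.0, 12.0, 500.0, 200.0],
})

# Assumption: aggregate areas use codes of 5000 and above.
AGGREGATE_CODE_START = 5000

# Keep only the rows whose code falls below the aggregate range.
countries_only = toy[toy["Country Code"] < AGGREGATE_CODE_START]
print(countries_only["Country"].unique())
```

Unlike `truncate`, this filter would have to run before `Country Code` is dropped.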

In [6]:
savannaFinal.head()

| | Country | Ecosystem | Year | Area_loss |
|---|---|---|---|---|
| 928 | Afghanistan | Grassland | 1990 | 2323.1605 |
| 929 | Afghanistan | Grassland | 1991 | 2323.1605 |
| 930 | Afghanistan | Grassland | 1992 | 2323.1605 |
| 931 | Afghanistan | Grassland | 1993 | 2323.1605 |
| 932 | Afghanistan | Grassland | 1994 | 2323.1605 |


<a id="forests"></a>
### Forests

In [7]:
forests.dtypes

Country Code      int64
Country          object
Item Code         int64
Item             object
Element Code      int64
Element          object
Year Code         int64
Year              int64
Unit             object
Value           float64
Flag             object
dtype: object

In [8]:
forests.head(5)

| | Country Code | Country | Item Code | Item | Element Code | Element | Year Code | Year | Unit | Value | Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | Afghanistan | 6661 | Forest | 5110 | Area | 1990 | 1990 | 1000 Ha | 1350.0 | F |
| 1 | 2 | Afghanistan | 6661 | Forest | 5110 | Area | 1991 | 1991 | 1000 Ha | 1350.0 | F |
| 2 | 2 | Afghanistan | 6661 | Forest | 5110 | Area | 1992 | 1992 | 1000 Ha | 1350.0 | F |
| 3 | 2 | Afghanistan | 6661 | Forest | 5110 | Area | 1993 | 1993 | 1000 Ha | 1350.0 | F |
| 4 | 2 | Afghanistan | 6661 | Forest | 5110 | Area | 1994 | 1994 | 1000 Ha | 1350.0 | F |


<a id="merge"></a>
### Merge of the Biomes Data Sets
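
This section does not contain any code yet. As one possible starting point, the sketch below stacks a savanna-style table and a forest-style table into a single long table with the columns `Country`, `Ecosystem`, `Year` and `Area_loss`. It runs on toy frames with made-up values and assumes that forest loss can be approximated by the year-over-year drop in forest area. It does not align units (the forest table is in `1000 Ha`), which would have to be done before merging the real data.

```python
import pandas as pd

# Toy stand-ins for the two cleaned tables (made-up values).
savanna_toy = pd.DataFrame({
    "Country": ["Afghanistan", "Afghanistan"],
    "Ecosystem": ["Grassland", "Grassland"],
    "Year": [1990, 1991],
    "Area_loss": [2323.0, 2300.0],
})
forest_toy = pd.DataFrame({
    "Country": ["Afghanistan"] * 3,
    "Item": ["Forest"] * 3,
    "Year": [1990, 1991, 1992],
    "Value": [1350.0, 1340.0, 1345.0],
})

# Assumption: forest loss in a year is the drop in forest area since the previous year
# (a negative loss therefore means the forest area grew).
forest_toy = forest_toy.sort_values(["Country", "Year"])
forest_toy["Area_loss"] = -forest_toy.groupby("Country")["Value"].diff()

# The first year of each country has no previous year, so its loss is undefined.
forest_loss = (forest_toy.dropna(subset=["Area_loss"])
                         .rename(columns={"Item": "Ecosystem"})
                         [["Country", "Ecosystem", "Year", "Area_loss"]])

# Stack both tables into one long table with a shared schema.
biomes = pd.concat([savanna_toy, forest_loss], ignore_index=True)
```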

<a id="crops"></a>
### Crops

The `Crops` dataset gives us information about the distribution of crops in different areas over the years. We investigate this data set in more detail in the file `Crops.ipynb` and only present the resulting data sets and analysis here.

In [12]:
crops.dtypes

country_or_area     object
element_code        object
element             object
year               float64
unit                object
value              float64
value_footnotes     object
category            object
dtype: object

In [13]:
crops.head(5)

| | country_or_area | element_code | element | year | unit | value | value_footnotes | category |
|---|---|---|---|---|---|---|---|---|
| 0 | Americas + | 31 | Area Harvested | 2007.0 | Ha | 49404.0 | A | agave_fibres_nes |
| 1 | Americas + | 31 | Area Harvested | 2006.0 | Ha | 49404.0 | A | agave_fibres_nes |
| 2 | Americas + | 31 | Area Harvested | 2005.0 | Ha | 49404.0 | A | agave_fibres_nes |
| 3 | Americas + | 31 | Area Harvested | 2004.0 | Ha | 49113.0 | A | agave_fibres_nes |
| 4 | Americas + | 31 | Area Harvested | 2003.0 | Ha | 48559.0 | A | agave_fibres_nes |


From the investigation in `Crops.ipynb`, we describe each feature of the data frame:
   * `country_or_area`: the area where the product is cultivated. The investigation shows that areas come at different levels of grouping: countries, regions such as continents or even the whole world, and economic groupings such as `Low Income Food Deficit Countries`.
   * `element`: the kind of measurement reported for the crop. There is a lot of information on PIN, which is a production index qualifying the land needed per unit of crop production in 1961. There is also information on seeds and yields. To answer our specific questions, we only keep `Area Harvested` and `ProductionQuantity`, given in hectares and tonnes respectively, as shown in the `unit` feature. From now on, the data frames are generated for both elements.
   * `year`: the years range from 1961 to 2007. As the `Savanna` and `Forests` datasets start in 1990, we also start in 1990 so that we can correlate them. The investigation shows that the amount of data is uneven through time, with much less data for the older years.
   * `value`: the feature of greatest interest, as it holds the measured value. Its unit is given in the `unit` feature and matches the quantity named in `element`.
   * `category`: the product the row refers to. The investigation shows that there are groupings here too: we find individual products such as `Bananas` as well as groups of products such as `cereals_total`. When we combine the data sets, a first part treats all food products as one and a second part focuses on only 5 specific products to get a precise idea of the phenomena. We chose `palm_oil`, `sojabean`, `banana`, `wheat` and `rice`.
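
The selection described in the list above can be sketched as follows. The example runs on a toy frame with the same columns as `crops` and made-up values. The exact element spelling `Production Quantity` is an assumption that should be checked against the file.

```python
import pandas as pd

# Toy frame with the same columns as `crops` (values are made up).
crops_toy = pd.DataFrame({
    "country_or_area": ["Brazil", "Brazil", "Brazil", "Americas +"],
    "element": ["Area Harvested", "Seed", "Area Harvested", "Production Quantity"],
    "year": [1985.0, 1995.0, 1995.0, 2000.0],
    "unit": ["Ha", "tonnes", "Ha", "tonnes"],
    "value": [100.0, 5.0, 150.0, 900.0],
    "category": ["wheat", "wheat", "wheat", "rice"],
})

KEPT_ELEMENTS = ["Area Harvested", "Production Quantity"]  # assumed spellings
FIRST_YEAR = 1990  # first year shared with the savanna and forest data

# Keep the two elements of interest, from 1990 onwards.
mask = crops_toy["element"].isin(KEPT_ELEMENTS) & (crops_toy["year"] >= FIRST_YEAR)
crops_recent = crops_toy[mask]

# One data frame per element, as described above.
area_harvested = crops_recent[crops_recent["element"] == "Area Harvested"]
production = crops_recent[crops_recent["element"] == "Production Quantity"]
```

Keeping the two elements in separate frames avoids mixing hectares and tonnes in the same `value` column.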

<a id="livestock"></a>
### Livestock

In [14]:
livestock.dtypes

Area Code         int64
Area             object
Item Code         int64
Item             object
Element Code      int64
Element          object
Year Code         int64
Year              int64
Unit             object
Value           float64
Flag             object
dtype: object

In [15]:
livestock.head(5)

| | Area Code | Area | Item Code | Item | Element Code | Element | Year Code | Year | Unit | Value | Flag |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | Afghanistan | 231 | Almonds shelled | 5910 | Export Quantity | 1961 | 1961 | tonnes | 0.0 | |
| 1 | 2 | Afghanistan | 231 | Almonds shelled | 5910 | Export Quantity | 1962 | 1962 | tonnes | 0.0 | |
| 2 | 2 | Afghanistan | 231 | Almonds shelled | 5910 | Export Quantity | 1963 | 1963 | tonnes | 0.0 | |
| 3 | 2 | Afghanistan | 231 | Almonds shelled | 5910 | Export Quantity | 1964 | 1964 | tonnes | 0.0 | |
| 4 | 2 | Afghanistan | 231 | Almonds shelled | 5910 | Export Quantity | 1965 | 1965 | tonnes | 0.0 | |


<a id="questions"></a>
## Data to Answer the Questions 


##### Question 1. 
How has the cultivated surface evolved worldwide from 1990 to 2014?

##### Question 2. 
How much savanna, shrubland, grassland and forest area is lost per country?

##### Question 3. 
Do we see a correlation between the area lost by ecosystems and the area gained by agriculture?
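
A first way to approach this question is a Pearson correlation between the yearly ecosystem loss and the yearly gain in cultivated area. The sketch below uses made-up totals for a single country. The column names are hypothetical, and the gain is assumed to be the year-over-year change in cultivated area.

```python
import pandas as pd

# Made-up yearly totals for a single country, for illustration only.
yearly = pd.DataFrame({
    "Year": [1990, 1991, 1992, 1993, 1994],
    "ecosystem_area_lost": [10.0, 12.0, 15.0, 14.0, 18.0],
    "cultivated_area": [100.0, 103.0, 109.0, 112.0, 121.0],
})

# Assumption: area gained by agriculture = year-over-year change in cultivated area.
yearly["cultivated_area_gain"] = yearly["cultivated_area"].diff()

# Pearson correlation between loss and gain; the first year has no gain
# and is ignored automatically because its value is NaN.
r = yearly["ecosystem_area_lost"].corr(yearly["cultivated_area_gain"])
print(round(r, 3))
```

A high coefficient would only indicate that the two series move together. It would not by itself show that agriculture caused the loss.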

##### Question 4. 
For each selected crop (soybean, banana, wheat, rice, palm oil), what percentage of the total cultivated area does it represent?
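
The share of each crop is its harvested area divided by the total harvested area, times 100. A minimal sketch on made-up numbers for a single year, where the `other` row stands for all remaining crops:

```python
import pandas as pd

# Made-up harvested areas (Ha) for one year and one area.
area = pd.DataFrame({
    "category": ["wheat", "rice", "banana", "other"],
    "value": [200.0, 150.0, 50.0, 600.0],
})

# Share of each crop in the total harvested area, in percent.
total_area = area["value"].sum()
area["share_pct"] = 100 * area["value"] / total_area

# Keep only the crops we are interested in.
selected = area[area["category"].isin(["wheat", "rice", "banana"])]
print(selected[["category", "share_pct"]])
```

On the real data, grouped categories such as `cereals_total` overlap with individual crops, so the total should be built from non-overlapping categories to avoid double counting.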

##### Question 5. 
If there is a correlation between the increase in cultivated area and the area lost across all ecosystems, approximately how much area is lost because of these crops in particular?

##### Question 6. 
Are soybean, banana, wheat, rice and palm oil meant for export and/or import in each country over the years? Look more closely at the different economic segments and regions.