# ADA Project - Food self-sufficiency: what about Switzerland?

<div class="alert alert-block alert-warning">

**INFO** - Our **interactive plots are not visible** when you open the pulled notebook without running it; you can find them as a **PDF file in the doc folder**.
    

In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import folium

<div class="alert alert-block alert-success">
    
## We will first investigate the dataset we chose from the proposed list: "Global Food & Agriculture Statistics"

Our aim was initially to link food production to hunger in some areas. Another idea was to find the possible causes of food insufficiency (natural disasters, wars...). <br>

The FAO dataset is the one we downloaded from the course's link. It contains all the FAO data for world crop production. We started our analysis with this file but realized that, given the diversity of the data, we should rather focus our project on a region or country. Moreover, these data are somewhat out of date. <br>

You will find below our investigation of the "Global Food & Agriculture Statistics" dataset, as we want to make our reasoning explicit.

## Load data into a Pandas dataframe

In [None]:
complete_dataset = pd.read_csv('../data/fao_data_crops_data.csv')

In [None]:
# We split the data and metadata and store them in the 'crops' and 'flags' dataframes, respectively.
crops = complete_dataset.loc[:2255342].copy() 
flags = complete_dataset.loc[2255344:2255348].copy() 
# 'flags' contains the correspondence list of acronyms describing how a given sample was acquired --> only informative
flags.drop(['element','year','unit','value','value_footnotes','category'], axis=1, inplace = True) 
flags.rename(columns={'country_or_area':'acronym', 'element_code':'description'}, inplace=True) 
flags.set_index('acronym', inplace=True)
flags

## Exploratory data analysis

In [None]:
crops.head()

In [None]:
print("Size of the DataFrame: {s}\n".format(s=crops.shape))
print("Variable types present in DataFrame: \n{t}".format(t=crops.dtypes))

In [None]:
# List all the different footnotes values present in the dataset
footnotes = crops['value_footnotes'].unique() 
print(footnotes)
# Display dataframe that only contains one given value of 'value_footnotes'
display(crops.query('value_footnotes==@footnotes[4]')) 
# Return dataframe that only contains samples having NaN as value for 'value_footnotes'
crops[crops.value_footnotes.isnull()] 

In [None]:
print(crops['element'].unique())
print(crops['year'].unique())
print(crops['unit'].unique())
print(crops['category'].unique())
print(crops['element_code'].unique())
print(crops['country_or_area'].unique())

## Data preprocessing

We clean the data by dropping all rows containing only NaN values.
We also drop the rows where 'value_footnotes' is NR, as it means "not reported by country", so they won't be useful for our analysis. Rows whose 'value' is 0 are dropped as well.
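A minimal sketch of these cleaning steps on a hypothetical toy DataFrame (the values are made up; only the column names come from the FAO file). It differs from the cell below in one assumption: only rows with a missing 'value' are dropped, and `na=False` is passed to `str.contains` so that rows without any footnote are kept instead of making the boolean mask fail. Whether rows without a footnote should be kept depends on what a missing footnote means in the FAO data, which we have not verified.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows mimicking the columns used in this notebook (not real FAO data)
toy = pd.DataFrame({
    "country_or_area": ["A", "A", "B", None, "B"],
    "value": [10.0, 0.0, 5.0, np.nan, np.nan],
    "value_footnotes": ["Fc", "Fc", "NR", np.nan, "F"],
})

# 1. Drop rows in which every column is missing
toy = toy.dropna(how="all")
# 2. Drop rows whose 'value' is unknown
toy = toy.dropna(subset=["value"])
# 3. Drop rows flagged 'NR' (not reported by country); na=False keeps rows with no footnote
toy = toy[~toy["value_footnotes"].str.contains("NR", na=False)]
# 4. Drop rows with a zero value
toy = toy[toy["value"] != 0]
print(toy)
```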

In [None]:
# Returns a boolean of whether a column contains NaN (True) or not (False).
print(crops.isnull().values.any(axis=0)) 

# Drop rows which contain only missing values.
crops.dropna(how='all', inplace=True) 

In [None]:
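# We drop the samples where 'value' or 'value_footnotes' is unknown (NaN); a NaN 'value' is of no use, and removing NaN footnotes also lets str.contains below return a clean boolean mask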
crops.dropna(subset=['value', 'value_footnotes'], inplace=True) 

# Let's drop also all the samples that have 'NR' as a 'value_footnotes' value or 0 as 'value'
crops.drop(index=crops[crops['value_footnotes'].str.contains('NR')].index, inplace=True)
crops.drop(index=crops[crops['value']==0].index, inplace=True)


In our dataset, regions are indicated by a "+" at the end of their names. We separate regions from countries so that our analysis is easier and more precise.
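A minimal sketch of the region/country split on hypothetical names. Since the "+" sits at the end of the region names, `str.endswith` is a plain string test and avoids the regex escaping that `str.contains` needs (the cell below could equivalently use `str.contains('+', regex=False)`). The `strip()` call is only a precaution against trailing spaces, which we have not checked for.

```python
import pandas as pd

# Hypothetical names; in the FAO file, regional aggregates end with "+"
names = pd.Series(["Switzerland", "Europe +", "World +", "France"])

# endswith performs a plain string comparison, so "+" needs no escaping
is_region = names.str.strip().str.endswith("+")
regions = names[is_region].tolist()
countries = names[~is_region].tolist()
print(regions)    # ['Europe +', 'World +']
print(countries)  # ['Switzerland', 'France']
```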

In [None]:
regions_bool = crops['country_or_area'].str.contains('+', regex=False)
crops_regions = crops[regions_bool].copy()
crops_countries = crops[~regions_bool].copy()
crops_countries[crops_countries.country_or_area.str.contains('China')].tail()

We calculate the mean of each element for every country so we can compare the area harvested, seed or yield between countries. The mean is calculated over all years (and over all crop categories).
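A small illustration, on hypothetical toy rows, of what grouping by country and element only does: the mean is taken over all years and all categories at once. If a per-category comparison were wanted, 'category' would have to be added to the grouping keys.

```python
import pandas as pd

# Hypothetical toy data: two countries, one element, two years, up to two categories
toy = pd.DataFrame({
    "country_or_area": ["CH", "CH", "CH", "CH", "FR", "FR"],
    "element": ["Yield"] * 6,
    "category": ["wheat", "wheat", "barley", "barley", "wheat", "wheat"],
    "year": [2000, 2001, 2000, 2001, 2000, 2001],
    "value": [60.0, 70.0, 50.0, 40.0, 70.0, 74.0],
})

# Grouping by country and element only averages over years AND categories together
mean_by_country = (toy.groupby(["country_or_area", "element"])["value"]
                      .mean()
                      .rename("mean_element"))
print(mean_by_country)  # CH: (60 + 70 + 50 + 40) / 4 = 55.0, FR: 72.0
```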

In [None]:
# Calculate the mean of each element for every country (over all years and categories)
crops_countries_by_country_year = crops_countries.groupby(['country_or_area', 'element']) \
                            .agg({'value':'mean'}) \
                            .rename(columns={'value':'mean_element'}) 
crops_countries_by_country_year

In [None]:
area_harvested = crops_countries_by_country_year.loc[(slice(None),'Area Harvested'), :]
area_harvested.loc['United States of America']

#### Create a map showing yield by country (average over all years) 

The following maps provide insight into agricultural yield and area harvested in the world's countries.


In [None]:
yield_df = crops_countries_by_country_year.loc[(slice(None), 'Yield'), :]
# We take the log of the yield for the following plot, so that the quantile classes are more balanced
log_yield_df = np.log(yield_df[['mean_element']])
log_yield_df.head()

In [None]:
world_geo = 'https://raw.githubusercontent.com/johan/world.geo.json/master/countries.geo.json'
# Quantile bins of the plotted (log) values, in case the 'bins' argument below is enabled
Bins = list(log_yield_df.mean_element.quantile([0, 0.25, 0.5, 0.75, 1]))

m = folium.Map(zoom_start=3)

folium.Choropleth(
    geo_data=world_geo,
    name='choropleth',
    data=log_yield_df,
    columns=[log_yield_df.index.get_level_values(level='country_or_area').values,'mean_element'],
    key_on='feature.properties.name',
    fill_color='BuPu',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='log(yield)',
    #bins = Bins,
    reset=True
).add_to(m)

folium.LayerControl().add_to(m)

m

The countries with the highest yield are Iceland and Denmark. We can also see that in many African countries the yield is very low. Mongolia also has a very low yield. The yield of Switzerland is higher than that of its neighbours (averaged over years).

#### Area harvested (mean) / land area (mean over years) by country
 
This way of computing the ratio is hard to interpret. A better approach would be to compute the ratio for each year and make an interactive plot, so we can select the year we want to analyze and display it on the map.
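A minimal sketch of the per-year ratio suggested above, on hypothetical toy values (not real data). It assumes both tables are in long format (one row per country and year); the World Bank file loaded below has one column per year and would first need reshaping, e.g. with `DataFrame.melt`. It also assumes harvested areas are summed per year rather than averaged over crops. The year selection itself could then be done with `interact`, as elsewhere in this notebook.

```python
import pandas as pd

# Hypothetical toy values (NOT real data): harvested area in ha, summed over crops
harvested = pd.DataFrame({
    "country_or_area": ["Switzerland", "Switzerland", "France", "France"],
    "year": [2000, 2001, 2000, 2001],
    "area_ha": [40_000.0, 50_000.0, 300_000.0, 350_000.0],
})
# Hypothetical toy land areas in km², one row per country and year
land = pd.DataFrame({
    "country_or_area": ["Switzerland", "Switzerland", "France", "France"],
    "year": [2000, 2001, 2000, 2001],
    "land_km2": [1_000.0, 1_000.0, 5_000.0, 5_000.0],
})

# Match each harvested area with the land area of the same country and year
ratio = harvested.merge(land, on=["country_or_area", "year"], how="inner")
# 1 km² = 100 ha, so the land area is converted to hectares before dividing
ratio["ratio"] = ratio["area_ha"] / (ratio["land_km2"] * 100)

# One column per year: the map can then be drawn for a single selected year
ratio_by_year = ratio.pivot(index="country_or_area", columns="year", values="ratio")
print(ratio_by_year)
```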

In [None]:
surface_country = pd.read_csv('../data/API_AG.LND.TOTL.K2_DS2_en_csv_v2_422954.csv', skiprows=3)
surface_country.set_index('Country Name', inplace = True)

In [None]:
surface_country.drop(columns=['Country Code', 'Indicator Name', 'Indicator Code'], inplace=True)

In [None]:
surface_country

In [None]:
# Mean land area (km²) of each country over all available years
mean_ = surface_country.mean(axis=1)
df_surface_country = mean_.to_frame(name='mean_superficy')

In [None]:
df_surface_country

In [None]:
crops_countries_area = area_harvested.join(df_surface_country['mean_superficy'], on='country_or_area', how='left')
# Area harvested is in ha and land area in km² (1 km² = 100 ha), hence the factor 100
crops_countries_area['ratio'] = crops_countries_area['mean_element'] / (crops_countries_area['mean_superficy'] * 100)
crops_countries_area.dropna(inplace=True)

In [None]:
crops_countries_area.head()

#### Create a map showing this ratio by country

In [None]:
crops_countries_area_df = pd.DataFrame(crops_countries_area.ratio)
# Log-transform the ratio so that the colour classes of the map are more balanced
log_df = np.log(crops_countries_area_df)
log_df.head()

In [None]:
world_geo = 'https://raw.githubusercontent.com/johan/world.geo.json/master/countries.geo.json'
# Quantile bins of the plotted (log) values, in case a 'bins' argument is added below
Bins = list(log_df.ratio.quantile([0, 0.25, 0.5, 0.75, 1]))

m = folium.Map(zoom_start=3)

folium.Choropleth(
    geo_data=world_geo,
    name='choropleth',
    data=log_df,
    columns=[crops_countries_area.index.get_level_values(level='country_or_area').values,'ratio'],
    key_on='feature.properties.name',
    fill_color='BuPu',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='log(area harvested / land area)',
    reset=True
).add_to(m)

folium.LayerControl().add_to(m)

m

On the map above, we see that Switzerland does not have a high ratio of harvested area to total land area compared to its neighbours (averaged over years). Hence, in further analysis, we will investigate whether increasing this ratio could be feasible in the future.

In [None]:
crops_countries_by_country_by_category = crops_countries.groupby(['country_or_area','element', 'category'])
crops_countries_by_country_by_category = pd.DataFrame(data=crops_countries_by_country_by_category.value.sum().reset_index(name='total').sort_values(by='total',ascending=False))

crops_countries_by_country_by_category.head(10)

We can see that China and the United States are the main producers of cereals_total and cereale_rice_milled_eq.

  
## What are the principal foodstuffs produced in each country/region of the world? And which countries are the biggest producers of a given food?

In [None]:
main_product=crops_countries_by_country_by_category.drop_duplicates(subset='country_or_area', keep='first')
main_product.head(10)

We decided to keep only the 'Production Quantity' element for further analysis.

In [None]:
# Keep only the rows whose element is 'Production Quantity'
main_product_quantity = main_product[main_product.element.str.contains('Production Quantity')]
main_product_quantity.head(10)

In [None]:
main_product_quantity.category.unique()

The dataframe above (main_product_quantity) shows, for each country, the category with the highest production quantity.

We can see that the main category is cereals for China and the United States, and cereals_rice_milled for Canada. Nigeria and Poland mainly produce roots and tubers, the Philippines mainly produce sugar cane and Malaysia mainly produces oil_palm_fruit.
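One caveat on the cells above: `main_product` keeps each country's largest total across all elements before filtering on 'Production Quantity', and the elements have different units, so a country whose largest total belongs to another element (e.g. a yield) disappears from main_product_quantity. A minimal sketch of the alternative order (filter on the element first, then take the largest total per country), on hypothetical toy rows:

```python
import pandas as pd

# Hypothetical totals per country, element and category (units differ between elements)
totals = pd.DataFrame({
    "country_or_area": ["X", "X", "X", "Y", "Y"],
    "element": ["Yield", "Production Quantity", "Production Quantity",
                "Production Quantity", "Production Quantity"],
    "category": ["wheat", "wheat", "potatoes", "wheat", "potatoes"],
    "total": [900_000.0, 500.0, 800.0, 300.0, 100.0],
})

# Keep a single element first, so that only values with the same unit are compared
prod = totals[totals["element"] == "Production Quantity"]
# Row label of the largest production for each country, then select those rows
main_product = prod.loc[prod.groupby("country_or_area")["total"].idxmax()]
print(main_product[["country_or_area", "category", "total"]])
```

With the original order, country X would be dropped, because its largest total is a yield value.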

## Are all countries equal in terms of diversity of foodstuffs harvested?
    
To answer this question, we simply count the number of categories produced by each country, so we can get an idea of its food production diversity.
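The two-step count in the cells below (count per country and category, then number of rows per country) amounts to the number of distinct categories per country. A minimal sketch with `nunique` on hypothetical toy rows:

```python
import pandas as pd

# Hypothetical toy rows: a category can appear several times (years, elements)
toy = pd.DataFrame({
    "country_or_area": ["CH", "CH", "CH", "IT", "IT", "IT"],
    "category": ["wheat", "wheat", "apples", "wheat", "olives", "grapes"],
})

# Number of distinct categories per country, in one step
diversity = (toy.groupby("country_or_area")["category"]
                .nunique()
                .reset_index(name="category_diversity"))
print(diversity)  # CH: 2, IT: 3
```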

In [None]:
food_diversity = pd.DataFrame(crops_countries.groupby(['country_or_area','category'])['category'].count().reset_index(name='total'))
food_diversity.head(10)

In [None]:
food_diversity = pd.DataFrame(food_diversity.groupby(['country_or_area']).country_or_area.size().reset_index(name='category_diversity'))

In [None]:
food_diversity.sort_values('category_diversity', ascending=False).head(10)

In [None]:
sorted_diversity = food_diversity.sort_values('category_diversity', ascending=False).reset_index()
sorted_diversity.loc[sorted_diversity.country_or_area=='Switzerland']

In [None]:
sorted_diversity.category_diversity.describe()

In [None]:
plt.hist(sorted_diversity.category_diversity, bins=25)
plt.title('food production diversity distribution')

We see that Switzerland, with 66 agricultural products, is above the median food production diversity of all countries. This is a fair value, but still far from the top countries such as its neighbour Italy. So there could be some room for improvement in Switzerland's production diversity, which will be discussed later in the analysis.

### Interactive visualization of an element for a given category and country over the years

In [None]:
#TO RUN THIS: with conda --> conda install -c conda-forge ipywidgets
import ipywidgets as widgets
from ipywidgets import interact, interactive, fixed, interact_manual
#To enable interactive viz on lab --> conda install nodejs
#                                  + jupyter labextension install @jupyter-widgets/jupyterlab-manager

In [None]:
#Interactive visualization
def viz_evolution(country, element, category):
    df_to_plot = crops_countries[(crops_countries['country_or_area'] == country)
                                 & (crops_countries['element'] == element)
                                 & (crops_countries['category'] == category)]
    df_to_plot.plot(x='year', y='value',figsize=(20,10))
    plt.title(f'{element} values of {category} in {country} throughout years', fontsize= 20)
    plt.xlabel("Year", fontsize= 20)
    plt.ylabel("Values", fontsize= 20)
    plt.show()

countries = crops_countries.country_or_area.unique()    
elements = crops_countries.element.unique()
categories = crops_countries.category.unique()
interact(viz_evolution, country=countries, element = elements , category=categories)    

Now that we understand our dataset well, we can redefine our project goals.
As we have data for almost all the countries in the world, we had to choose the direction we wanted to follow, so we decided to focus our attention on one country only: Switzerland!
We will try to answer the following question: can Switzerland be self-sufficient in terms of food production? (see updates in the README)

<div class="alert alert-block alert-success">

# Project Update - Insight into Swiss agricultural production

We will focus on Switzerland compared to its neighbours. We would like to know whether Switzerland could be self-sufficient in terms of food production.

## Abstract

In the wake of the international food crisis of 2007-08, which triggered great volatility on world food markets and caused significant economic and social damage, food self-sufficiency policies have gained increased attention in a number of countries. <br>
Since then, several countries have expressed interest in improving their levels of food self-sufficiency, raising controversy in a highly economically interconnected world.

On 23 September 2018, in the small country of Switzerland, the debate materialized into a popular referendum asking the population whether a food self-sufficiency policy should be adopted. Such a policy could have unexpected consequences for a country like Switzerland, with many neighbours and such a small land area. <br>
This paper aims to analyse the questions surrounding the debate over food self-sufficiency in Switzerland. 

- What does Switzerland produce and in which quantities?
- What about the amount of imports/exports?
- Are all Swiss areas optimally harvested?
- Links to population size
- How is Swiss productivity evolving? Is it correlated with external factors such as temperature, fertilizer use, ...?

Then we will compare Switzerland with its neighbours. Does Switzerland import more than its neighbours (due to its small size?)? Is food self-sufficiency realistic for Switzerland? How many farms/farmers would it need?


## Data loading - Crops 

This dataset is our new starting point: it contains almost the same information as the "Global Food & Agriculture Statistics" dataset we already used, but its data are more recent.

We found most of the following data on the __[Food And Agriculture Organization of the United Nations Datasets](http://www.fao.org/faostat/en/#data)__ website (we will specify when a dataset comes from another source).

The file contains data about Switzerland and its neighbours (Italy, Germany, France, Austria and Liechtenstein).

In [None]:
raw_CH_crops_dataset = pd.read_csv('../data/FAOSTAT_data_crops_CHandNeighbours.csv')

Let's explore the structure of our dataset:

In [None]:
raw_CH_crops_dataset.head()

Keep only the relevant columns.

In [None]:
raw_CH_crops_dataset = raw_CH_crops_dataset[['Domain', 'Area', 'Element', 'Item', 'Year', 'Unit', 'Value', 'Flag Description']].copy()

In [None]:
raw_CH_crops_dataset.drop(index=raw_CH_crops_dataset[raw_CH_crops_dataset['Flag Description'].str.contains('Data not available')].index, inplace=True)

In [None]:
raw_CH_crops_dataset.head()

Let's also load the flags dataset, in case we need it later (it is very small, so loading it costs nothing).

In [None]:
flags = pd.read_csv('../data/FAOSTAT_data_flags.csv')
flags

In [None]:
print("Size of the DataFrame: {s}\n".format(s=raw_CH_crops_dataset.shape))
print("Variable types present in DataFrame: \n{t}".format(t=raw_CH_crops_dataset.dtypes))

Null values investigation:

In [None]:
print(raw_CH_crops_dataset.isna().values.any(axis=0)) 

What about the categories listed in our columns?

In [None]:
print(raw_CH_crops_dataset['Domain'].unique())
print(raw_CH_crops_dataset['Area'].unique())
print(raw_CH_crops_dataset['Element'].unique())
print(raw_CH_crops_dataset['Item'].unique())
print(raw_CH_crops_dataset['Year'].unique())
print(raw_CH_crops_dataset['Unit'].unique())
print(raw_CH_crops_dataset['Flag Description'].unique())

**Quick view of the crops dataset ready to be used**

In [None]:
raw_CH_crops_dataset.head()

In [None]:
data_check_agave=raw_CH_crops_dataset[raw_CH_crops_dataset.Item.str.contains("Agave fibres")]

In [None]:
data_check_agave

## **Crops plots:** what we can already see/investigate with this first dataset

Even though we will probably not use these plots in the final presentation/analysis, they help us see what is inside our data. They are quick and very visual.

### Plot production of all countries over time for a selected crop

This plot is interactive. It lets you choose an item (apples, berries...) and shows its production over the years for Switzerland and four of its neighbours (Austria, France, Germany and Italy; Liechtenstein is not plotted).

In [None]:
import ipywidgets as widgets
from ipywidgets import interact, interactive, fixed, interact_manual
# All these library imports will probably be moved to the beginning of the final version of the notebook,
# but for now we keep them where we use them, since we don't know yet what we will keep.

In [None]:
#Interactive visualization

#Plot the production of selected item for all countries over years
def viz_evolution(item):
    df_viz_evolution = raw_CH_crops_dataset.loc[raw_CH_crops_dataset['Element']=='Production'].loc[raw_CH_crops_dataset['Item']==item]
    
    # multiple line plot
    plt.figure(figsize=(20,10))
    plt.plot( 'Year', 'Value', data=df_viz_evolution.loc[df_viz_evolution['Area']=='Austria'], marker='', color='green',  label = 'Austria')
    plt.plot( 'Year', 'Value', data=df_viz_evolution.loc[df_viz_evolution['Area']=='France'], marker='', color='skyblue', label = 'France')
    plt.plot( 'Year', 'Value', data=df_viz_evolution.loc[df_viz_evolution['Area']=='Switzerland'], marker='', color='red', label = 'Switzerland', linewidth=3)
    plt.plot( 'Year', 'Value', data=df_viz_evolution.loc[df_viz_evolution['Area']=='Germany'], marker='', color='orange', label = 'Germany')
    plt.plot( 'Year', 'Value', data=df_viz_evolution.loc[df_viz_evolution['Area']=='Italy'], marker='', color='grey', label = 'Italy')
    
    plt.legend() 
    plt.title(f'Production of {item} in Switzerland and its neighbours throughout years', fontsize= 20)
    plt.xlabel("Year", fontsize= 20)
    plt.ylabel("Values", fontsize= 20)
    plt.show()
   
items = raw_CH_crops_dataset.Item.unique()
interact(viz_evolution, item = items)    
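The five `plt.plot` calls in `viz_evolution` differ only in the country and its colour. A minimal sketch of the same plot written as a loop over a colour dictionary; `plot_item` and `COLORS` are hypothetical names, and the function returns the figure instead of calling `plt.show()`, so it could be called from `viz_evolution` (followed by `plt.show()`). The toy DataFrame at the end is made up for illustration; in the notebook, `raw_CH_crops_dataset` would be passed instead.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Colours used in the cell above; Switzerland is drawn with a thicker line
COLORS = {"Austria": "green", "France": "skyblue", "Switzerland": "red",
          "Germany": "orange", "Italy": "grey"}

def plot_item(df, item):
    """Plot the 'Production' values of one item for each country in COLORS."""
    sel = df[(df["Element"] == "Production") & (df["Item"] == item)]
    fig, ax = plt.subplots(figsize=(20, 10))
    for area, color in COLORS.items():
        rows = sel[sel["Area"] == area]
        ax.plot(rows["Year"], rows["Value"], color=color, label=area,
                linewidth=3 if area == "Switzerland" else 1.5)
    ax.legend()
    ax.set_title(f"Production of {item} in Switzerland and its neighbours throughout years", fontsize=20)
    ax.set_xlabel("Year", fontsize=20)
    ax.set_ylabel("Values", fontsize=20)
    return fig, ax

# Toy usage with hypothetical data
toy = pd.DataFrame({
    "Element": ["Production"] * 5,
    "Item": ["Apples"] * 5,
    "Area": list(COLORS),
    "Year": [2000] * 5,
    "Value": [1.0, 2.0, 3.0, 4.0, 5.0],
})
fig, ax = plot_item(toy, "Apples")
```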

<div class="alert alert-block alert-success">
    For most items, Switzerland has the lowest production values. This can be explained by the small size of the country, but to better understand these values and to know whether they are sufficient to feed the Swiss population, we will analyse how Swiss land is used and occupied and look at Swiss demography.
    We will also analyse Swiss imports and exports to know what Switzerland needs and to estimate whether the country could produce it by itself.

<div class="alert alert-block alert-success">

Some interesting cases we could focus on:

- production of cherries and pears is decreasing over the years. Why? Is it linked to temperature or fertilizer use?
- production of raspberries is increasing and has become higher than that of Italy and Austria
- good production of spinach, better than Austria
- onions: production was excellent, then dropped sharply in 2004 and has stayed low since... we could try to explain why.

<div class="alert alert-block alert-warning">
    Should we get rid of items that Switzerland does not produce?
    e.g. dry beans, dry or buckwheat, eggplants and many others...
    
    I think we should, because there are really many items that are produced by only one country; they are useless for our analysis of Switzerland, and sorting them out will help us find more interesting things.
    
    For some items, data are missing for some years.
e.g. item=Artichokes
What should we do? Linear interpolation? Drop this item?
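If we choose linear interpolation, a minimal sketch on a hypothetical yearly series (the numbers are made up). Since rows flagged 'Data not available' were dropped earlier, missing years are simply absent, so the series is first reindexed on the full range of years to make the gaps explicit before `interpolate` fills them.

```python
import pandas as pd

# Hypothetical yearly production of one item and country, with 2002 and 2003 missing
s = pd.Series([100.0, 110.0, 140.0], index=[2000, 2001, 2004], name="Value")

# Reindex on every year between the first and last one: absent years become NaN
full = s.reindex(range(s.index.min(), s.index.max() + 1))
# Linear interpolation between the surrounding known years
filled = full.interpolate(method="linear")
print(filled)  # 2002 -> 120.0, 2003 -> 130.0
```

Interpolating long gaps invents data, so dropping items with many missing years may remain the safer option.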

### Plot production/area_harvested for all items of all countries over time.

This plot is interactive. It lets you choose an element (production/area harvested/yield) and shows the sum over all items for each country over the years (Switzerland and four of its neighbours).
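A minimal sketch of the same aggregation on hypothetical toy rows: `pivot_table` sums the values of all items and lays the result out with one column per country, which `DataFrame.plot` could then draw as one line per country. In the notebook, the rows would first be filtered on a single 'Element'. For 'Yield', such a sum is only a rough indicator, since a sum of per-item yields is not itself a yield.

```python
import pandas as pd

# Hypothetical toy rows for a single element
toy = pd.DataFrame({
    "Area": ["Switzerland", "Switzerland", "Switzerland", "Italy", "Italy"],
    "Element": ["Production"] * 5,
    "Item": ["Apples", "Pears", "Apples", "Apples", "Pears"],
    "Year": [2000, 2000, 2001, 2000, 2000],
    "Value": [10.0, 5.0, 12.0, 30.0, 20.0],
})

# Sum over items; years as rows, one column per country (missing combinations become NaN)
wide = toy.pivot_table(index="Year", columns="Area", values="Value", aggfunc="sum")
print(wide)
```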

In [None]:
# Sum the values of all items for each country, element and year
crops_sum = raw_CH_crops_dataset.groupby(['Area', 'Element','Year']) \
                                .agg({'Value':'sum'}) \
                                .rename(columns={'Value':'Sum'}) \
                                .reset_index()
crops_sum.head() # Sum of area/yield/production of items by country and year

In [None]:
#Interactive visualization

#Plot the selected element (sum of all items) for all countries over years
def viz_sum_evolution(element):
    df_viz_sum_evolution = crops_sum.loc[crops_sum['Element']== element]
    
    # multiple line plot
    plt.figure(figsize=(20,10))
    plt.plot( 'Year', 'Sum', data=df_viz_sum_evolution.loc[df_viz_sum_evolution['Area']=='Austria'], marker='', color='green',  label = 'Austria')
    plt.plot( 'Year', 'Sum', data=df_viz_sum_evolution.loc[df_viz_sum_evolution['Area']=='France'], marker='', color='skyblue', label = 'France')
    plt.plot( 'Year', 'Sum', data=df_viz_sum_evolution.loc[df_viz_sum_evolution['Area']=='Switzerland'], marker='', color='red', label = 'Switzerland', linewidth=3)
    plt.plot( 'Year', 'Sum', data=df_viz_sum_evolution.loc[df_viz_sum_evolution['Area']=='Germany'], marker='', color='orange', label = 'Germany')
    plt.plot( 'Year', 'Sum', data=df_viz_sum_evolution.loc[df_viz_sum_evolution['Area']=='Italy'], marker='', color='grey', label = 'Italy')
    
    plt.legend() 
    plt.title(f'{element} of all items in Switzerland and its neighbours throughout years', fontsize= 20)
    plt.xlabel("Year", fontsize= 20)
    plt.ylabel("Values", fontsize= 20)
    plt.show()
   
elements = crops_sum.Element.unique()
interact(viz_sum_evolution, element = elements)  

<div class="alert alert-block alert-success">
    
   Throughout the years, Switzerland has the lowest production and area harvested (summed over all items), but it always has one of the highest summed yields, and its yield is increasing.
   

## Data loading - Land use indicators 

The file contains data about Switzerland and its neighbours (Italy, Germany, France, Austria and Liechtenstein).
It will allow us to assess the agricultural potential of Switzerland: does the country use all its land or not?

Data exploration and pre-processing are very similar to the first dataset, so we will not describe all steps as precisely as before.

In [None]:
raw_land_use_dataset = pd.read_csv('../data/FAOSTAT_data_LandUseIndicators.csv')

In [None]:
raw_land_use_dataset.head()

In [None]:
raw_land_use_dataset =raw_land_use_dataset[['Domain', 'Area', 'Element', 'Item', 'Year', 'Unit', 'Value', 'Flag Description']]

In [None]:
print("Size of the DataFrame: {s}\n".format(s=raw_land_use_dataset.shape))
print("Variable types present in DataFrame: \n{t}".format(t=raw_land_use_dataset.dtypes))

In [None]:
print(raw_land_use_dataset.isnull().values.any(axis=0))  # --> PERFECT!

In [None]:
print(raw_land_use_dataset.isna().values.any(axis=0))  # --> PERFECT!

In [None]:
print(raw_land_use_dataset['Domain'].unique())
print(raw_land_use_dataset['Area'].unique())
print(raw_land_use_dataset['Element'].unique())
print(raw_land_use_dataset['Item'].unique())
print(raw_land_use_dataset['Year'].unique())
print(raw_land_use_dataset['Unit'].unique())
print(raw_land_use_dataset['Flag Description'].unique())

## **Land use indicators plots:** what we can already see/investigate with this second dataset

Even though we will probably not use these plots in the final presentation/analysis, they help us see what is inside our data. They are quick and very visual.

### Plot the lands distribution in Switzerland

We would like to refine these data (with additional datasets) by also including urban areas in the distribution.

In [None]:
import matplotlib.pyplot as plt

# DataFrames to plot
df_land = raw_land_use_dataset.loc[raw_land_use_dataset['Area']=='Switzerland'].loc[raw_land_use_dataset['Year']==2016].loc[raw_land_use_dataset['Element']=='Share in Land area']
df_agri = raw_land_use_dataset.loc[raw_land_use_dataset['Area']=='Switzerland'].loc[raw_land_use_dataset['Year']==2016].loc[raw_land_use_dataset['Element']=='Share in Agricultural land']

# Pie plot #1
labels1 = df_land.Item
sizes1 = df_land.Value
explode = (0, 0, 0.1, 0)  # only "explode" the 3rd slice

fig1, ax1 = plt.subplots()
ax1.pie(sizes1, explode=explode, labels=labels1, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
ax1.title.set_text('Distribution of lands in Switzerland, year 2016')
fig1.set_facecolor('white')

# Pie plot #2
labels2 = df_agri.Item
sizes2 = df_agri.Value
fig1, ax2 = plt.subplots()
ax2.pie(sizes2, labels=labels2, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax2.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
ax2.title.set_text('Distribution of agricultural lands in Switzerland, year 2016')

# General settings
fig1.set_facecolor('white')
plt.show()
df_land

In [None]:
import matplotlib.pyplot as plt

# DataFrames to plot
df_land = raw_land_use_dataset.loc[raw_land_use_dataset['Area']=='France'].loc[raw_land_use_dataset['Year']==2016].loc[raw_land_use_dataset['Element']=='Share in Land area']
df_agri = raw_land_use_dataset.loc[raw_land_use_dataset['Area']=='France'].loc[raw_land_use_dataset['Year']==2016].loc[raw_land_use_dataset['Element']=='Share in Agricultural land']

# Pie plot #1
labels1 = df_land.Item
sizes1 = df_land.Value
explode = (0, 0, 0.1, 0)  # only "explode" the 3rd slice

fig1, ax1 = plt.subplots()
ax1.pie(sizes1, explode=explode,labels=labels1, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
ax1.title.set_text('Distribution of lands in France, year 2016')
fig1.set_facecolor('white')

# Pie plot #2
labels2 = df_agri.Item
sizes2 = df_agri.Value
fig1, ax2 = plt.subplots()
ax2.pie(sizes2, labels=labels2, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax2.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
ax2.title.set_text('Distribution of agricultural lands in France, year 2016')

# General settings
fig1.set_facecolor('white')
plt.show()

In [None]:
import matplotlib.pyplot as plt

# DataFrames to plot
df_land = raw_land_use_dataset.loc[raw_land_use_dataset['Area']=='Germany'].loc[raw_land_use_dataset['Year']==2016].loc[raw_land_use_dataset['Element']=='Share in Land area']
df_agri = raw_land_use_dataset.loc[raw_land_use_dataset['Area']=='Germany'].loc[raw_land_use_dataset['Year']==2016].loc[raw_land_use_dataset['Element']=='Share in Agricultural land']

# Pie plot #1
labels1 = df_land.Item
sizes1 = df_land.Value
explode = (0, 0, 0.1, 0)  # only "explode" the 3rd slice

fig1, ax1 = plt.subplots()
ax1.pie(sizes1, explode=explode,labels=labels1, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
ax1.title.set_text('Distribution of lands in Germany, year 2016')
fig1.set_facecolor('white')

# Pie plot #2
labels2 = df_agri.Item
sizes2 = df_agri.Value
fig1, ax2 = plt.subplots()
ax2.pie(sizes2, labels=labels2, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax2.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
ax2.title.set_text('Distribution of agricultural lands in Germany, year 2016')

# General settings
fig1.set_facecolor('white')
plt.show()

In [None]:
import matplotlib.pyplot as plt

# DataFrames to plot
df_land = raw_land_use_dataset.loc[raw_land_use_dataset['Area']=='Italy'].loc[raw_land_use_dataset['Year']==2016].loc[raw_land_use_dataset['Element']=='Share in Land area']
df_agri = raw_land_use_dataset.loc[raw_land_use_dataset['Area']=='Italy'].loc[raw_land_use_dataset['Year']==2016].loc[raw_land_use_dataset['Element']=='Share in Agricultural land']

# Pie plot #1
labels1 = df_land.Item
sizes1 = df_land.Value
explode = (0, 0, 0.1, 0)  # only "explode" the 3rd slice

fig1, ax1 = plt.subplots()
ax1.pie(sizes1, explode=explode,labels=labels1, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
ax1.title.set_text('Distribution of lands in Italy, year 2016')
fig1.set_facecolor('white')

# Pie plot #2
labels2 = df_agri.Item
sizes2 = df_agri.Value
fig1, ax2 = plt.subplots()
ax2.pie(sizes2, labels=labels2, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax2.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
ax2.title.set_text('Distribution of agricultural lands in Italy, year 2016')

# General settings
fig1.set_facecolor('white')
plt.show()

In [None]:
import matplotlib.pyplot as plt

# DataFrames to plot
df_land = raw_land_use_dataset.loc[raw_land_use_dataset['Area']=='Liechtenstein'].loc[raw_land_use_dataset['Year']==2016].loc[raw_land_use_dataset['Element']=='Share in Land area']
df_agri = raw_land_use_dataset.loc[raw_land_use_dataset['Area']=='Liechtenstein'].loc[raw_land_use_dataset['Year']==2016].loc[raw_land_use_dataset['Element']=='Share in Agricultural land']

# Pie plot #1
labels1 = df_land.Item
sizes1 = df_land.Value
explode = (0, 0, 0.1, 0)  # only "explode" the 3rd slice

fig1, ax1 = plt.subplots()
ax1.pie(sizes1, explode=explode,labels=labels1, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
ax1.title.set_text('Distribution of lands in Liechtenstein, year 2016')
fig1.set_facecolor('white')

# Pie plot #2
labels2 = df_agri.Item
sizes2 = df_agri.Value
fig1, ax2 = plt.subplots()
ax2.pie(sizes2, labels=labels2, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax2.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
ax2.title.set_text('Distribution of agricultural lands in Liechtenstein, year 2016')

# General settings
fig1.set_facecolor('white')
plt.show()
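The five pie-chart cells above differ only in the country name. A minimal sketch of a helper that draws both pies side by side for one country and year; `plot_land_pies` is a hypothetical name, and the `explode` argument is left out because the tuple used above assumes exactly four slices. The toy DataFrame at the end is made up; in the notebook, `raw_land_use_dataset` and each country name would be passed instead.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_land_pies(df, area, year=2016):
    """Draw the land-area and agricultural-land pie charts for one country and year."""
    sel = df[(df["Area"] == area) & (df["Year"] == year)]
    fig, axes = plt.subplots(1, 2, figsize=(14, 6))
    fig.set_facecolor("white")
    panels = [("Share in Land area", f"Distribution of lands in {area}, year {year}"),
              ("Share in Agricultural land", f"Distribution of agricultural lands in {area}, year {year}")]
    for ax, (element, title) in zip(axes, panels):
        part = sel[sel["Element"] == element]
        ax.pie(part["Value"], labels=part["Item"], autopct="%1.1f%%",
               shadow=True, startangle=90)
        ax.axis("equal")  # keep the pie circular
        ax.set_title(title)
    return fig, axes

# Toy usage with hypothetical data
toy = pd.DataFrame({
    "Area": ["X"] * 4,
    "Year": [2016] * 4,
    "Element": ["Share in Land area"] * 2 + ["Share in Agricultural land"] * 2,
    "Item": ["Forest", "Other", "Cropland", "Pastures"],
    "Value": [40.0, 60.0, 30.0, 70.0],
})
fig, axes = plot_land_pies(toy, "X")
```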

<div class="alert alert-block alert-success">

From the first graphs (distribution of lands) we can see that only 45.2% of Switzerland's land is used for agriculture, compared to France, Italy or Germany, where around 64% of the land is exploited for agriculture. Land exploited for agriculture is the sum of the cropland and agricultural land shares. The percentage of forest is quite similar between those three countries, and the main difference lies in the percentage of land attributed to meadows and pastures. For example, France uses half as much of its land as Switzerland for meadows and pastures, Germany less than half, and Italy only one third of Switzerland's share. These plots suggest that Swiss agriculture is more oriented towards dairy products and livestock breeding.
    When comparing Switzerland with Liechtenstein, we find more similarities, as the percentage of land used for agriculture there is 42.5%.
    From the second graphs (distribution of agricultural lands) we can see that the majority of Switzerland's agricultural land is under permanent meadows and pastures. This is a huge share compared to the other countries, which favour cropland and arable land. This supports our previous hypothesis that Switzerland is more oriented towards dairy products and livestock breeding. We could hypothesize that Switzerland may be obliged to reduce the share of land dedicated to meadows and pastures in order to become food self-sufficient. This would also require labour and policy transitions and would impact the Swiss economy.
    However, an important aspect not shown by these data is the share of urban land. We should add it to our analysis.
    
PS: arable land is land that is or can be cultivated.

<div class="alert alert-block alert-warning">
    
If we decide to keep these graphs, we will need to display them better and write a function for the code; here I was only trying to extract useful information to guide the report or data story.
We will also need to add the percentage of urban land to the plots (see next step).

## Data loading - Land Cover
This file contains data about Switzerland and its neighbours (Italy, Germany, France, Austria and Liechtenstein). It complements the previous one, since it also covers urban areas.
Data exploration and pre-processing are very similar to the first dataset, so we will not describe every step as precisely as before.

In [None]:
raw_land_cover_dataset = pd.read_csv('../data/FAOSTAT_data_LandCover.csv')
raw_land_cover_dataset.head()

In [None]:
raw_land_cover_dataset = raw_land_cover_dataset[['Domain', 'Area', 'Element', 'Item', 'Year', 'Unit', 'Value', 'Flag Description']]
raw_land_cover_dataset.head()


In [None]:
print("Size of the DataFrame: {s}\n".format(s=raw_land_cover_dataset.shape))
print("Variable types present in DataFrame: \n{t}".format(t=raw_land_cover_dataset.dtypes))

In [None]:
raw_land_cover_dataset.drop(index=raw_land_cover_dataset[raw_land_cover_dataset['Flag Description'].str.contains('Data not available')].index, inplace=True)

In [None]:
print(raw_land_cover_dataset.isnull().values.any(axis=0))  # --> PERFECT!

In [None]:
print(raw_land_cover_dataset['Domain'].unique())
print(raw_land_cover_dataset['Area'].unique())
print(raw_land_cover_dataset['Element'].unique())
print(raw_land_cover_dataset['Item'].unique())
print(raw_land_cover_dataset['Year'].unique())
print(raw_land_cover_dataset['Unit'].unique())
print(raw_land_cover_dataset['Flag Description'].unique())

The `Value` column is expressed in units of 1000 ha, so to get the surface in hectares we multiply `Value` by 1000.

In [None]:
pd.options.mode.chained_assignment = None  # default='warn'; mutes SettingWithCopyWarning when assigning to a slice of a DataFrame
raw_land_cover_dataset["Surface"] = raw_land_cover_dataset["Value"] * 1000  # 1000 ha -> ha
raw_land_cover_dataset.drop(columns='Value')  # display only: 'Value' is kept (the pie charts below use it)

Let's compute the percentage of each country's surface that is allocated to urban areas, so that we can add it to our previous visualizations of Switzerland's land distribution.

In [None]:
df_surface_country.head()

Get rid of the multi-level column index:

In [None]:
df_surface_country.columns = df_surface_country.columns.map(lambda x: x[1]) 
df_surface_country = df_surface_country.reset_index()

In [None]:
df_surface_country

Retrieve the surface area of each country we are interested in:


In [None]:
# .iloc[0] extracts the single surface value of each country as a scalar
Switzerland_superficy = df_surface_country.loc[df_surface_country["Country Name"]=="Switzerland"].e.iloc[0]
Italy_superficy = df_surface_country.loc[df_surface_country["Country Name"]=="Italy"].e.iloc[0]
France_superficy = df_surface_country.loc[df_surface_country["Country Name"]=="France"].e.iloc[0]
Germany_superficy = df_surface_country.loc[df_surface_country["Country Name"]=="Germany"].e.iloc[0]
Austria_superficy = df_surface_country.loc[df_surface_country["Country Name"]=="Austria"].e.iloc[0]
Liechtenstein_superficy = df_surface_country.loc[df_surface_country["Country Name"]=="Liechtenstein"].e.iloc[0]
Austria_superficy

<div class="alert alert-block alert-warning">
    
Step to finish: compute the fraction for each country, add it to our dataframe and redo the previous plots.

In [None]:
list_to_keep = ['Switzerland', 'France', 'Germany', 'Italy', 'Liechtenstein', 'Austria']  # FAOSTAT spelling: 'Liechtenstein'
raw_land_cover_dataset = raw_land_cover_dataset[raw_land_cover_dataset.Area.isin(list_to_keep)]
raw_land_cover_dataset.head()


In [None]:
pd.options.mode.chained_assignment = None  # default='warn', Mutes warnings when copying a slice from a DataFrame.

# Hard-coded country land areas in km².
# Since 1 km² = 100 ha, dividing a surface in ha by an area in km² gives a percentage.
country_area_km2 = {
    'Switzerland': 39524.74159,
    'France': 547569.094154,
    'Italy': 294118.275862,
    'Germany': 349029.292834,
    'Liechtenstein': 160.0,  # FAOSTAT spelling
    'Austria': 82573.206492,
}
# Vectorized division: each row is divided by the area of its own country
raw_land_cover_dataset['Surface'] = raw_land_cover_dataset['Surface'] / raw_land_cover_dataset['Area'].map(country_area_km2)


In [None]:
raw_land_cover_dataset.head()

In [None]:
import matplotlib.pyplot as plt

# DataFrame to plot: Swiss land cover in 2016 (element 'Area from MODIS')
df_artificial_surface = raw_land_cover_dataset.loc[(raw_land_cover_dataset['Area'] == 'Switzerland')
                                                   & (raw_land_cover_dataset['Year'] == 2016)
                                                   & (raw_land_cover_dataset['Element'] == 'Area from MODIS')]
# Pie plot
labels1 = df_artificial_surface.Item
sizes1 = df_artificial_surface.Value

fig1, ax1 = plt.subplots(figsize=(30,15))
ax1.pie(sizes1, labels=labels1, autopct='%1.1f%%',
        shadow=True, startangle=45)
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
ax1.title.set_text('Land cover distribution in Switzerland, year 2016')
fig1.set_facecolor('white')
plt.show()
df_artificial_surface

<div class="alert alert-block alert-success">
    
Around 60% of Switzerland's land is unusable for agriculture (the sum of forests, shrub-covered areas, inland water bodies, permanent snow and glaciers, and artificial surfaces). Inland water bodies are mainly lakes, and artificial surfaces are urban areas.
The share of usable land could be increased through deforestation, but this is not in the country's interest for environmental reasons.

Comparison with the neighbours:

In [None]:
import matplotlib.pyplot as plt

# DataFrame to plot: French land cover in 2016 (element 'Area from MODIS')
df_artificial_surface = raw_land_cover_dataset.loc[(raw_land_cover_dataset['Area'] == 'France')
                                                   & (raw_land_cover_dataset['Year'] == 2016)
                                                   & (raw_land_cover_dataset['Element'] == 'Area from MODIS')]
# Pie plot
labels1 = df_artificial_surface.Item
sizes1 = df_artificial_surface.Value

fig1, ax1 = plt.subplots(figsize=(30,15))
ax1.pie(sizes1, labels=labels1, autopct='%1.1f%%',
        shadow=True, startangle=45)
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
ax1.title.set_text('Land cover distribution in France, year 2016')
fig1.set_facecolor('white')
plt.show()

In [None]:
import matplotlib.pyplot as plt

# DataFrame to plot: German land cover in 2016 (element 'Area from MODIS')
df_artificial_surface = raw_land_cover_dataset.loc[(raw_land_cover_dataset['Area'] == 'Germany')
                                                   & (raw_land_cover_dataset['Year'] == 2016)
                                                   & (raw_land_cover_dataset['Element'] == 'Area from MODIS')]
# Pie plot
labels1 = df_artificial_surface.Item
sizes1 = df_artificial_surface.Value

fig1, ax1 = plt.subplots(figsize=(30,15))
ax1.pie(sizes1, labels=labels1, autopct='%1.1f%%',
        shadow=True, startangle=45)
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
ax1.title.set_text('Land cover distribution in Germany, year 2016')
fig1.set_facecolor('white')
plt.show()

In [None]:
import matplotlib.pyplot as plt

# DataFrame to plot: Italian land cover in 2016 (element 'Area from MODIS')
df_artificial_surface = raw_land_cover_dataset.loc[(raw_land_cover_dataset['Area'] == 'Italy')
                                                   & (raw_land_cover_dataset['Year'] == 2016)
                                                   & (raw_land_cover_dataset['Element'] == 'Area from MODIS')]
# Pie plot
labels1 = df_artificial_surface.Item
sizes1 = df_artificial_surface.Value

fig1, ax1 = plt.subplots(figsize=(30,15))
ax1.pie(sizes1, labels=labels1, autopct='%1.1f%%',
        shadow=True, startangle=45)
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
ax1.title.set_text('Land cover distribution in Italy, year 2016')
fig1.set_facecolor('white')
plt.show()

In [None]:
import matplotlib.pyplot as plt

# DataFrame to plot: Austrian land cover in 2016 (element 'Area from MODIS')
df_artificial_surface = raw_land_cover_dataset.loc[(raw_land_cover_dataset['Area'] == 'Austria')
                                                   & (raw_land_cover_dataset['Year'] == 2016)
                                                   & (raw_land_cover_dataset['Element'] == 'Area from MODIS')]
# Pie plot
labels1 = df_artificial_surface.Item
sizes1 = df_artificial_surface.Value

fig1, ax1 = plt.subplots(figsize=(30,15))
ax1.pie(sizes1, labels=labels1, autopct='%1.1f%%',
        shadow=True, startangle=45)
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
ax1.title.set_text('Land cover distribution in Austria, year 2016')
fig1.set_facecolor('white')
plt.show()

In [None]:
import matplotlib.pyplot as plt

# DataFrame to plot: land cover of Liechtenstein (FAOSTAT spelling) in 2016
df_artificial_surface = raw_land_cover_dataset.loc[(raw_land_cover_dataset['Area'] == 'Liechtenstein')
                                                   & (raw_land_cover_dataset['Year'] == 2016)
                                                   & (raw_land_cover_dataset['Element'] == 'Area from CCI_CI')]
# Pie plot
labels1 = df_artificial_surface.Item
sizes1 = df_artificial_surface.Value

fig1, ax1 = plt.subplots(figsize=(30,15))
ax1.pie(sizes1, labels=labels1, autopct='%1.1f%%',
        shadow=True, startangle=45)
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
ax1.title.set_text('Land cover distribution in Liechtenstein, year 2016')
fig1.set_facecolor('white')
plt.show()
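
The pie-chart cells above differ only in the country and the FAOSTAT element. A minimal sketch of a reusable helper follows; `plot_land_cover_pie` and its default arguments are hypothetical, and it assumes a dataframe with the `Area`, `Year`, `Element`, `Item` and `Value` columns of `raw_land_cover_dataset`. It is demonstrated here on toy rows, not on the real data.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_land_cover_pie(df, country, year=2016, element='Area from MODIS'):
    """Hypothetical helper: pie chart of land-cover items for one country, year and element."""
    selection = df.loc[(df['Area'] == country) & (df['Year'] == year) & (df['Element'] == element)]
    fig, ax = plt.subplots(figsize=(10, 10))
    ax.pie(selection['Value'], labels=selection['Item'], autopct='%1.1f%%', startangle=45)
    ax.axis('equal')  # equal aspect ratio so the pie is drawn as a circle
    ax.set_title(f'Land cover distribution in {country}, year {year}')
    fig.set_facecolor('white')
    return fig, ax

# Hypothetical toy rows with the same columns as raw_land_cover_dataset
toy = pd.DataFrame({
    'Area': ['Switzerland'] * 3,
    'Year': [2016] * 3,
    'Element': ['Area from MODIS'] * 3,
    'Item': ['Tree-covered areas', 'Grassland', 'Artificial surfaces'],
    'Value': [12.0, 10.0, 3.0],
})
fig, ax = plot_land_cover_pie(toy, 'Switzerland')
```

On the real data, looping over `['Switzerland', 'France', 'Germany', 'Italy', 'Austria']`, calling `plot_land_cover_pie(raw_land_cover_dataset, country)` and then `plt.show()` should give the same kind of charts as above (not tested against the real file).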

<div class="alert alert-block alert-success">
    
Share of land not usable for agriculture in 2016:
- France: 51%
- Germany: 52.6%
- Italy: only 41.7%
- Austria: 66.8%
- Liechtenstein: no data

Compared to its neighbours, Switzerland therefore has one of the smallest shares of land usable for agriculture, yet it still achieves the best yields.

## Data loading - Demographical data 

This file contains data about Switzerland and its neighbours (Italy, Germany, France and Austria; Liechtenstein is missing from the dataset).
These data tell us the number of consumers in Switzerland and allow us to compare the potential food self-sufficiency of Switzerland and its neighbours. We would like to answer questions such as: with a growing population, can Swiss agriculture feed everybody in the coming years?

Data exploration and pre-processing are very similar to the first dataset, so we will not describe every step as precisely as before.

In [None]:
demography = pd.read_csv('../data/FAOSTAT_data_demography.csv')

In [None]:
demography

In [None]:
for col in demography:
    print(demography[col].unique())

In [None]:
demography = demography[['Area', 'Year', 'Value']]
demography

Since the unit of `Value` is 1000 persons, we multiply it by 1000 to express the population as a number of individuals.

In [None]:
pd.options.mode.chained_assignment = None  # default='warn'; mutes SettingWithCopyWarning when assigning to a slice of a DataFrame
demography["Population"] = demography["Value"] * 1000  # 1000 persons -> persons
demography.drop(columns='Value')  # display only: the 'Value' column is kept

In [None]:
# Plot the evolution of the population over the years
from scipy.stats import linregress

countries_colors = {'Austria': 'green', 'France': 'skyblue', 'Switzerland': 'red',
                    'Germany': 'orange', 'Italy': 'grey'}

plt.figure(figsize=(20,10))
for country, color in countries_colors.items():
    # Switzerland is drawn with a thicker line (1.5 is matplotlib's default width)
    plt.plot('Year', 'Population', data=demography.loc[demography['Area'] == country],
             color=color, label=country, linewidth=3 if country == 'Switzerland' else 1.5)

plt.legend()
plt.title('Evolution of the population over the years', fontsize=20)
plt.xlabel("Year", fontsize=20)
plt.ylabel("Population [persons]", fontsize=20)
plt.show()

In [None]:
min_swiss_demography = demography[demography.Area.str.contains('Switzerland')].Population.min()
min_swiss_demography

In [None]:
max_swiss_demography = demography[demography.Area.str.contains('Switzerland')].Population.max()
max_swiss_demography

In [None]:
delta_swiss_demography= max_swiss_demography - min_swiss_demography
delta_swiss_demography

<div class="alert alert-block alert-success">
    
As expected, the population is growing in every country. From 1950 to 2018, the Swiss population increased by about 3.8 million people (0.38 × 10^7); it has almost doubled. If the predicted demographic growth for the coming years holds true, how could Switzerland become self-sufficient?
    

<div class="alert alert-block alert-warning">
    Idea: we could fit a linear regression for each country to get the slope of its demographic growth, compare the slopes, and extrapolate them to later years for our predictive model.
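
A minimal sketch of this idea with `scipy.stats.linregress`, on a hypothetical toy series rather than the FAOSTAT data. The helper name `extrapolate_population` is ours, and a straight-line trend is a strong simplification of demographic growth, so any extrapolation should be read as indicative only.

```python
import numpy as np
from scipy.stats import linregress

def extrapolate_population(years, population, target_year):
    """Fit population = slope * year + intercept and evaluate the line at target_year."""
    fit = linregress(years, population)
    return fit.slope, fit.intercept + fit.slope * target_year

# Hypothetical toy series growing by exactly 50,000 persons per year
years = np.arange(2000, 2011)
population = 7_000_000 + 50_000 * (years - 2000)

slope, pop_2030 = extrapolate_population(years, population, 2030)
print(f"slope = {slope:.0f} persons/year, extrapolated 2030 population = {pop_2030:.0f}")
```

On the real data, one could call it per country, e.g. with `demography.loc[demography.Area == 'Switzerland', ['Year', 'Population']]`, and compare the resulting slopes.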

## Data loading - Swiss importations and exportations of agricultural goods 

These files contain data for Switzerland only. They provide insight into the trade of agricultural goods, that is, the imports and exports of each product.
Data exploration and pre-processing are very similar to the first dataset, so we will not describe every step as precisely as before.

In [None]:
CH_imports = pd.read_csv('../data/FAOSTAT_data_11-23-2019.csv')

In [None]:
CH_imports.head()

In [None]:
CH_exports = pd.read_csv('../data/FAOSTAT_data_exports.csv')

In [None]:
CH_exports.head()

To make the data processing and analysis simpler and more concise, we concatenate the export and import data (both datasets have exactly the same structure).

In [None]:
CH_trade = pd.concat([CH_imports, CH_exports])
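
A caveat about this concatenation: `pd.concat` keeps each input's original index, so the same labels appear in both the import half and the export half of `CH_trade`. Dropping rows by index label afterwards then removes the matching rows from both halves. The hypothetical toy frames below (where `'*'` marks an unofficial figure, as in the FAOSTAT `Flag` column) illustrate the effect and the usual remedy, `ignore_index=True`.

```python
import pandas as pd

# Two toy frames that, like CH_imports and CH_exports, both have a default 0..n-1 index
imports = pd.DataFrame({'Element': ['Import Quantity'] * 2, 'Flag': ['*', '']})
exports = pd.DataFrame({'Element': ['Export Quantity'] * 2, 'Flag': ['', '']})

stacked = pd.concat([imports, exports])
# Label 0 now appears twice: dropping the label of the unofficial import row
# also removes the (official) export row that shares that label.
dropped_by_label = stacked.drop(index=stacked.loc[stacked.Flag == '*'].index)

# Renumbering the rows at concatenation time keeps every label unique
stacked_unique = pd.concat([imports, exports], ignore_index=True)
dropped_safely = stacked_unique.drop(index=stacked_unique.loc[stacked_unique.Flag == '*'].index)

print(len(dropped_by_label), len(dropped_safely))  # 2 3
```

Filtering with a boolean mask, e.g. `df.loc[df.Flag != '*']`, avoids the problem as well, since it never looks up rows by label.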

In [None]:
CH_trade.dtypes

In [None]:
for col in CH_trade:
    print(CH_trade[col].unique())

To maximize the reliability of later results, we discard the figures obtained from unofficial sources (flag `*`).

In [None]:
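# Flag '*' marks unofficial figures. A boolean mask is used rather than dropping by index label,
# because the concatenated frame has duplicate index labels (imports and exports both start at 0).
unofficial_stats_mask = CH_trade.Flag == '*'

In [None]:
# Drop the unofficial data
CH_trade = CH_trade.loc[~unofficial_stats_mask]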

We keep only the import and export values expressed in tonnes, so that we can compare them with the agricultural production.

In [None]:
CH_trade = CH_trade.loc[CH_trade.Unit == 'tonnes']  # mask built from CH_trade itself, so it covers imports and exports

In [None]:
# Copy kept for a later task (trade network analysis)
CH_trade_network = CH_trade.copy()

In [None]:
CH_trade = CH_trade[['Element','Partner Countries', 'Item', 'Year', 'Unit', 'Value']]

To keep the model simple, we sum the importations and exportations for a given product over all partner countries.

In [None]:
CH_trade = CH_trade.groupby(['Item', 'Year', 'Element']).agg({'Value':'sum'})\
                                    .reset_index()

We improve the structure of our dataframe by pivoting its values of importations and exportations.

In [None]:
CH_trade_transformed = pd.pivot(CH_trade,columns = 'Element', values='Value')\
                .rename(columns={'Export Quantity':'Exported Quantity','Import Quantity':'Imported Quantity'})

In [None]:
CH_trade_transformed

In [None]:
CH_trade = pd.concat([CH_trade, CH_trade_transformed], axis=1, join='inner')

In [None]:
CH_trade.drop(columns=['Value', 'Element'], inplace=True)

In [None]:
CH_trade = CH_trade.groupby(['Item', 'Year'])\
                            .agg({'Exported Quantity':'mean','Imported Quantity':'mean'})\
                            .reset_index()
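
The pivot / concat / groupby sequence above reshapes the summed values so that imports and exports become columns. A possibly simpler alternative, sketched here on hypothetical toy records, is `DataFrame.pivot_table`, which sums over partner countries and spreads `Element` into columns in one step. We expect it to give the same table as the cells above, but we have not checked it against the real data.

```python
import pandas as pd

# Hypothetical long-format trade records, one row per (Item, Year, Element, partner)
trade = pd.DataFrame({
    'Item':    ['Apples', 'Apples', 'Apples', 'Wheat'],
    'Year':    [2016, 2016, 2016, 2016],
    'Element': ['Import Quantity', 'Import Quantity', 'Export Quantity', 'Import Quantity'],
    'Value':   [100.0, 50.0, 20.0, 400.0],
})

# Sum over partner countries and spread Element into columns in a single step
wide = (trade.pivot_table(index=['Item', 'Year'], columns='Element',
                          values='Value', aggfunc='sum')
             .rename(columns={'Export Quantity': 'Exported Quantity',
                              'Import Quantity': 'Imported Quantity'})
             .reset_index())
wide.columns.name = None  # drop the leftover 'Element' axis label
print(wide)
```

Item/year pairs without any export record (Wheat here) get `NaN`, as in the original pipeline.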
                                    

Combine the production and trade data into one dataframe, 'CH_data', so that all the information is in one place. Note that we have no import or export values before 1986, so production data before 1986 is not considered from here on.

In [None]:
CH_crops = raw_CH_crops_dataset[['Area', 'Item','Element', 'Year', 'Unit', 'Value']]

In [None]:
# Merge importations data with production data
CH_data = CH_crops.loc[CH_crops.Area=='Switzerland'].loc[CH_crops.Element=='Production'].loc[CH_crops.Year>= 1986]\
                                    .merge(CH_trade,on=['Item', 'Year'], how='inner')\
                                    .rename(columns={'Value':'Produced Quantity'})
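
The inner merge above keeps only the (Item, Year) pairs present in both tables. Items that Switzerland imports but does not produce would therefore not appear in `CH_data`, which may matter when summing imports later on. A quick check, sketched on hypothetical toy frames, is an outer merge with `indicator=True`, which labels each row as `both`, `left_only` or `right_only`.

```python
import pandas as pd

# Hypothetical toy production and trade tables keyed by (Item, Year)
production = pd.DataFrame({'Item': ['Apples', 'Wheat', 'Hops'], 'Year': [2016] * 3,
                           'Value': [250.0, 500.0, 1.0]})
trade = pd.DataFrame({'Item': ['Apples', 'Wheat', 'Rice'], 'Year': [2016] * 3,
                      'Imported Quantity': [10.0, 400.0, 80.0]})

check = production.merge(trade, on=['Item', 'Year'], how='outer', indicator=True)
print(check['_merge'].value_counts())
# Items produced but not traded (left_only) or traded but not produced (right_only)
print(check.loc[check['_merge'] != 'both', ['Item', '_merge']])
```

Applied to the real tables, the `right_only` rows would show which traded items the inner merge leaves out.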



In [None]:
CH_data

Combined with the land analysis of Switzerland, consumer trends and Swiss demography, these data should let us estimate whether the country has an interest in producing more of an item, whether it is able to do so, and whether it could then stop importing that item. --> Milestone 3
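
One standard way to quantify self-sufficiency per item is FAO's self-sufficiency ratio (SSR): production divided by domestic supply, where domestic supply is approximated as production + imports - exports, times 100. A value above 100% means the country produces more than it uses. The sketch below assumes the column names of `CH_data` and uses hypothetical toy rows; it ignores stock changes, and on the real data missing import or export values would propagate as `NaN` (whether they mean zero trade is an assumption we have not checked).

```python
import pandas as pd

def self_sufficiency_ratio(df: pd.DataFrame) -> pd.Series:
    """Self-sufficiency ratio in %: production / (production + imports - exports) * 100."""
    domestic_supply = df['Produced Quantity'] + df['Imported Quantity'] - df['Exported Quantity']
    return df['Produced Quantity'] / domestic_supply * 100

# Hypothetical toy rows with the same column names as CH_data
toy = pd.DataFrame({
    'Item': ['Potatoes', 'Apples'],
    'Produced Quantity': [400.0, 300.0],
    'Imported Quantity': [100.0, 10.0],
    'Exported Quantity': [0.0, 60.0],
})
toy['SSR [%]'] = self_sufficiency_ratio(toy)
print(toy)
```

Here the toy potatoes reach 80% (net importer) and the toy apples 120% (surplus).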

### Plot production, exports and imports of items in Switzerland over years.

This plot is interactive: choose an item (apples, berries, ...) to see its production, exports and imports in Switzerland over the years.

In [None]:
# Interactive visualization
from ipywidgets import interact

# Plot the production, imports and exports of the selected item in Switzerland over the years
def viz_evolution(item):
    df_viz_evolution = CH_data.loc[CH_data['Item']==item]
    
    # multiple line plot
    plt.figure(figsize=(20,10))
    plt.plot('Year', 'Produced Quantity', data=df_viz_evolution, marker='', color='red', label='production', linewidth=3)
    plt.plot('Year', 'Imported Quantity', data=df_viz_evolution, marker='', color='blue', label='imports', linewidth=3)
    plt.plot('Year', 'Exported Quantity', data=df_viz_evolution, marker='', color='green', label='exports', linewidth=3)
    plt.legend()
    plt.title(f'Production, imports and exports of {item} in Switzerland over the years', fontsize=20)
    plt.xlabel("Year", fontsize=20)
    plt.ylabel("Values [tonnes]", fontsize=20)
    plt.show()

items = CH_data.Item.unique()
interact(viz_evolution, item=items)

**Most produced, imported and exported products:**

- Most produced crop products

In [None]:
CH_data.loc[CH_data.Year == 2016].sort_values(by='Produced Quantity', ascending = False).head(10)

- Most imported crop products

In [None]:
CH_data.loc[CH_data.Year == 2016].sort_values(by='Imported Quantity', ascending = False).head(10)

- Most exported crop products

In [None]:
CH_data.loc[CH_data.Year == 2016].sort_values(by='Exported Quantity', ascending = False).head(10)

In [None]:
total_export_quantity = CH_data["Exported Quantity"].sum()
total_export_quantity

In [None]:
total_import_quantity = CH_data["Imported Quantity"].sum()
total_import_quantity

In [None]:
# Ratio of total imported to total exported quantity (tonnes, summed over all items and years)
dv = total_import_quantity/total_export_quantity
dv

<div class="alert alert-block alert-success">
    
Some of the most produced items are also among the most imported, such as potatoes, wheat, maize, grapes, lettuce and chicory, and sugar beet. This may reflect a high consumption of these items by the population and suggests that increasing their production could be a priority. Among the most exported items, it is not surprising to find several items that Switzerland produces in large quantities, such as wheat, potatoes, apples, maize and barley. But we also find some unexpected items, like oilseeds nes (nes: not elsewhere specified), which are imported more than they are produced and then exported in a larger quantity than is produced; this suggests an economic advantage in this transit.

Summing the total exported and imported quantities, we see that Switzerland imports about 70 times more than it exports. But are these imports meant for transit (see the oilseeds example) or for consumption?

**Least produced, imported and exported products:**

- Least produced crop products

In [None]:
CH_data.loc[CH_data.Year == 2016].sort_values(by='Produced Quantity', ascending = True).head(10)

<div class="alert alert-block alert-success">
Do the least produced items correspond to the most imported ones?
None of these items appears among the most imported ones. Is this related to the consumption habits of Swiss people? Is it necessary to increase their production if they do not seem to be in demand?
The only exception is oilseeds nes, but, as discussed above, they are also exported.
    
    

- Least imported crop products

In [None]:
CH_data.loc[CH_data.Year == 2016].sort_values(by='Imported Quantity', ascending = True).head(10)

<div class="alert alert-block alert-success">
Do the least imported items correspond to the most produced and exported ones? That would suggest a high self-sufficiency for these items.
We see no match between the least imported items and the most produced and exported ones.
    

- Least exported crop products

In [None]:
CH_data.loc[CH_data.Year == 2016].sort_values(by='Exported Quantity', ascending = True).head(10)

<div class="alert alert-block alert-success">

Are the least exported items the least produced and the most imported ones?
We find no overlap between the least produced and the least exported items, and the same holds for imports.
    

<div class="alert alert-block alert-warning">

Idea: we could compute correlation matrices to answer these questions.
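
A minimal sketch of this idea on a hypothetical 2016 snapshot using the quantity columns of `CH_data`. We use Spearman (rank) correlation as an assumption: since a few items dominate the tonnages, ranks are less sensitive to them than Pearson's coefficient. With only a few dozen items and a single year, the result is descriptive rather than conclusive.

```python
import pandas as pd

# Hypothetical 2016 snapshot with the same quantity columns as CH_data
snapshot = pd.DataFrame({
    'Produced Quantity': [900.0, 500.0, 300.0, 50.0, 10.0],
    'Imported Quantity': [300.0, 250.0, 20.0, 40.0, 5.0],
    'Exported Quantity': [30.0, 10.0, 15.0, 1.0, 0.0],
})

# Pairwise rank correlations between production, imports and exports
corr = snapshot.corr(method='spearman')
print(corr.round(2))
```

On the real data, the same call would be `CH_data.loc[CH_data.Year == 2016, ['Produced Quantity', 'Imported Quantity', 'Exported Quantity']].corr(method='spearman')`.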

### Plot most produced, exported and  imported items in Switzerland, year 2016.

This plot is interactive: values are shown when hovering over the bars.

In [None]:
import plotly.graph_objects as go  # conda install -c anaconda plotly #AND# jupyter labextension install @jupyterlab/plotly-extension

# Select the quantity columns by name rather than by position
quantity_cols = ['Produced Quantity', 'Exported Quantity', 'Imported Quantity']
x = ['Produced', 'Exported', 'Imported']
data_2016 = CH_data.loc[CH_data.Year == 2016]

fig = go.Figure()
for item in ['Wheat', 'Potatoes', 'Sugar beet', 'Maize']:
    # 2016 produced, exported and imported tonnes of this item, stacked per category
    y = data_2016.loc[data_2016.Item == item, quantity_cols].values[0]
    fig.add_trace(go.Bar(x=x, y=y, name=item))

fig.update_layout(
    title='Most produced, exported and imported items in Switzerland in 2016',
    yaxis_title="Values [tonnes]",
    barmode='stack',
    font=dict(
        family="Courier New, monospace",
        size=16,
        color="#7f7f7f")
    )
fig.show()


<div class="alert alert-block alert-success">

Switzerland is a very small exporter of wheat, potatoes, sugar beet and maize. It also imports some quantities of each of these items, which means that the country is not fully self-sufficient for them; increasing their production may have to be a priority for Switzerland to become food self-sufficient.

### Plot production, importation and exportation of agricultural goods in Switzerland throughout years

This plot is interactive: choose an item, and values are shown when hovering over the plot.

In [None]:
import plotly.graph_objects as go
from ipywidgets import interact

def viz_item_trade(item):
    df_item = CH_data.loc[CH_data.Item == item]
    years = df_item.Year  # only the years available for this item
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=years, y=df_item['Produced Quantity'], fill='tozeroy', name='Produced'))  # fill down to the x-axis
    fig.add_trace(go.Scatter(x=years, y=df_item['Exported Quantity'], fill='tozeroy', name='Exported'))  # fill down to the x-axis
    fig.add_trace(go.Scatter(x=years, y=df_item['Imported Quantity'], fill='tonexty', name='Imported'))  # fill to the previous (Exported) trace
    fig.update_layout(
        title=f"{item} production, exports and imports throughout the years in Switzerland",
        yaxis_title="Values [tonnes]",
        xaxis_title='Years'
        )
    fig.show()

items = CH_data.Item.unique()
interact(viz_item_trade, item=items)


### Plot production,  importation and exportation in Switzerland throughout years

This plot is interactive: values are shown on hover. As noted before, export values are much lower than production and import values, so exports will also be plotted separately afterwards to better show their trend.

In [None]:
total_crops_imports = CH_data.groupby('Year').agg({'Produced Quantity':'sum', 'Exported Quantity':'sum', 'Imported Quantity':'sum'})

In [None]:
fig = go.Figure()
years = total_crops_imports.index  # years in the same (sorted) order as the aggregated values
fig.add_trace(go.Scatter(x=years, y=total_crops_imports['Produced Quantity'].values, fill='tozeroy', name='Produced'))  # fill down to the x-axis
fig.add_trace(go.Scatter(x=years, y=total_crops_imports['Imported Quantity'].values, fill='tozeroy', name='Imported'))  # fill down to the x-axis
fig.add_trace(go.Scatter(x=years, y=total_crops_imports['Exported Quantity'].values, fill='tozeroy', name='Exported'))  # fill down to the x-axis
fig.update_layout(
    title="Sum of all importations, exportations and productions throughout years in Switzerland",
    yaxis_title="Values [tonnes]",
    xaxis_title='Years'
    )
fig.show()

<div class="alert alert-block alert-success">
    
Here again, Switzerland appears to be a very small exporter with fairly constant imports. Nevertheless, imports seem to have been slightly increasing since 2005. Is this due to a demand for food diversity or to production issues? Since production values vary constantly over the years, we could lean toward the first hypothesis (demand for food diversity as a consequence of globalization).

For the analysis and the writing of the report, I would say that:
- Switzerland is a good player after all, since it produces far more than it imports (ratio 3:1).
- Regarding the rise in imports: a higher demand for diversity would not necessarily affect the curve, since diversity does not mean quantity (we like chocolate but still eat more rice and other common food products). To me, the rise rather reflects that the population increases linearly while Swiss production is roughly constant, so more must be imported.

As exports are hardly visible on the previous graph due to the difference in scale, we plot them alone.

In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=total_crops_imports.index, y=total_crops_imports['Exported Quantity'].values, fill='tozeroy', name='Exported'))  # fill down to the x-axis
fig.update_layout(
    title="Sum of exportations throughout years in Switzerland",
    yaxis_title="Values [tonnes]",
    xaxis_title='Years'
    )
fig.show()

We could add an interactive plot here in which the item to display can be selected (but for now we are not sure whether and how plotly and ipywidgets can be combined).

In [None]:
CH_data2 = CH_data.copy().rename(columns={'Produced Quantity':'Country production', 'Imported Quantity':'Importation', 'Exported Quantity':'Exportation'})
CH_data_transformed = pd.melt(CH_data2, value_vars=['Country production', 'Importation'], id_vars=['Area', 'Element','Item','Year','Unit'], var_name='Input', value_name='Value')

In [None]:
CH_data_transformed.loc[CH_data_transformed.Item=='Potatoes']

<div class="alert alert-block alert-warning">
Step not finished: what did we want to do with CH_data2 again?

### Plot evolution of production and importations for five most important items (Switzerland data only)

This plot is interactive: values are shown when hovering over the plot.

In [None]:
CH_restrained = CH_data_transformed.loc[CH_data_transformed.Item.isin(['Apples','Wheat','Potatoes', 'Maize', 'Sugar beet'])]



In [None]:
# Just trying a plot
import plotly.express as px
fig = px.area(CH_restrained, x="Year", y="Value", color='Item',
      line_group='Input')
fig.update_layout(
    title="Switzerland's production/importation evolution for five most important items",
    yaxis_title="Values [tonnes]",
    xaxis_title='Years'
    )
fig.show()

<div class="alert alert-block alert-success">
Since 2005, production and imports of these five main products have been fairly constant.

Constant, as expected: we would not use this plot in the report, since it only shows "normality".
    

In [None]:
CH_data_transformed_exportations = pd.melt(CH_data2, value_vars='Exportation', id_vars=['Area', 'Element','Item','Year','Unit'], var_name='Input', value_name='Value')


In [None]:
CH_restrained_exportations = CH_data_transformed_exportations.loc[CH_data_transformed_exportations.Item.isin(['Apples','Wheat','Potatoes', 'Maize', 'Sugar beet'])]

In [None]:
import plotly.express as px
fig = px.area(CH_restrained_exportations, x="Year", y="Value", color='Item',
      line_group='Input')
fig.update_layout(
    title="Switzerland's exportations evolution for five most important items over time",
    yaxis_title="Values [tonnes]",
    xaxis_title='Years'
    )
fig.show()

<div class="alert alert-block alert-success">

They fluctuate a lot. In the most recent years, only potato exports are increasing. Are the others decreasing because Switzerland is producing less of them?

## Data loading - Italian importations and exportations of agricultural goods 

In [None]:
Italy_trade = pd.read_csv('../data/FAOSTAT_data_italy.csv')
Italy_trade.head()

In [None]:
Italy_trade.dtypes

In [None]:
unofficial_stats_index_it = Italy_trade.loc[Italy_trade.Flag=='*'].index

In [None]:
# Drop the unofficial data
Italy_trade = Italy_trade.drop(index = unofficial_stats_index_it)

In [None]:
# Keep only quantities expressed in tonnes
Italy_trade = Italy_trade.loc[Italy_trade.Unit=='tonnes']

In [None]:
Italy_trade.drop(index=Italy_trade[Italy_trade['Flag Description'].str.contains('Data not available')].index, inplace=True)

In [None]:
Italy_trade = Italy_trade[['Element','Area', 'Item', 'Year', 'Unit', 'Value']]
Italy_trade.head()

To keep the model simple, we sum the importations and exportations for a given product over all partner countries.


In [None]:
Italy_trade = Italy_trade.groupby(['Item', 'Year', 'Element']).agg({'Value':'sum'})\
                                    .reset_index()
Italy_trade.head()

We improve the structure of our dataframe by pivoting its values of importations and exportations.

In [None]:
Italy_trade_transformed = pd.pivot(Italy_trade,columns = 'Element', values='Value')\
                .rename(columns={'Export Quantity':'Exported Quantity','Import Quantity':'Imported Quantity'})
Italy_trade_transformed.head()

In [None]:
Italy_trade = pd.concat([Italy_trade, Italy_trade_transformed], axis=1, join='inner')
Italy_trade.drop(columns=['Value', 'Element'], inplace=True)
Italy_trade = Italy_trade.groupby(['Item', 'Year'])\
                            .agg({'Exported Quantity':'mean','Imported Quantity':'mean'})\
                            .reset_index()

In [None]:
Italy_trade.head()

Combine the production and trade data into one dataframe, 'Italy_data', so that all the information is in one place. Note that we have no import or export values before 1986, so production data before 1986 is not considered from here on.

In [None]:
Italy_crops = raw_CH_crops_dataset[['Area', 'Item','Element', 'Year', 'Unit', 'Value']]

In [None]:
# Merge importations data with production data
Italy_data = Italy_crops.loc[Italy_crops.Area=='Italy'].loc[Italy_crops.Element=='Production'].loc[Italy_crops.Year>= 1986]\
                                    .merge(Italy_trade,on=['Item', 'Year'], how='inner')\
                                    .rename(columns={'Value':'Produced Quantity'})



In [None]:
Italy_data.head()

- Most produced Items

In [None]:
Italy_data.loc[Italy_data.Year == 2016].sort_values(by='Produced Quantity', ascending = False).head(10)

- Most exported Items

In [None]:
Italy_data.loc[Italy_data.Year == 2016].sort_values(by='Exported Quantity', ascending = False).head(10)

- Most imported Items

In [None]:
Italy_data.loc[Italy_data.Year == 2016].sort_values(by='Imported Quantity', ascending = False).head(10)
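
The Italian cleaning steps repeat the Swiss ones, and the French section below repeats them again. A minimal sketch of a helper that could factor them out follows; `preprocess_trade` is hypothetical, it assumes the FAOSTAT trade columns used above (`Flag`, `Unit`, `Flag Description`, `Element`, `Item`, `Year`, `Value`), and it replaces the pivot / concat / groupby sequence with a single `pivot_table` call. It is demonstrated on toy rows only.

```python
import pandas as pd

def preprocess_trade(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: clean one FAOSTAT trade table and return imports/exports per item and year."""
    df = raw.loc[raw['Flag'] != '*']                      # drop unofficial figures
    df = df.loc[df['Unit'] == 'tonnes']                   # keep quantities in tonnes only
    not_available = df['Flag Description'].str.contains('Data not available', na=False)
    df = df.loc[~not_available]
    wide = (df.pivot_table(index=['Item', 'Year'], columns='Element',
                           values='Value', aggfunc='sum')  # sum over partner countries
              .rename(columns={'Export Quantity': 'Exported Quantity',
                               'Import Quantity': 'Imported Quantity'})
              .reset_index())
    wide.columns.name = None
    return wide

# Hypothetical toy rows: one unofficial figure and one value in US$ that should both be dropped
raw = pd.DataFrame({
    'Element': ['Import Quantity', 'Import Quantity', 'Export Quantity', 'Import Value'],
    'Item': ['Wheat'] * 4,
    'Year': [2016] * 4,
    'Unit': ['tonnes', 'tonnes', 'tonnes', '1000 US$'],
    'Value': [100.0, 999.0, 30.0, 50.0],
    'Flag': [None, '*', None, None],
    'Flag Description': ['Official data', 'Unofficial figure', 'Official data', 'Official data'],
})
print(preprocess_trade(raw))
```

With such a helper, `preprocess_trade(pd.read_csv('../data/FAOSTAT_data_italy.csv'))` and the French equivalent would each take one line (untested on the real files).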

## Data loading - French importations and exportations of agricultural goods 

In [None]:
France_trade = pd.read_csv('../data/FAOSTAT_data_france.csv')
France_trade.head()

In [None]:
France_trade.dtypes

In [None]:
unofficial_stats_index_fr = France_trade.loc[France_trade.Flag=='*'].index

In [None]:
# Drop the unofficial data
France_trade = France_trade.drop(index = unofficial_stats_index_fr)

In [None]:
# Keep only the rows expressed in tonnes
France_trade = France_trade.loc[France_trade.Unit=='tonnes']

In [None]:
France_trade.drop(index=France_trade[France_trade['Flag Description'].str.contains('Data not available')].index, inplace=True)

In [None]:
France_trade = France_trade[['Element','Area', 'Item', 'Year', 'Unit', 'Value']]
France_trade.head()

To keep the model simple, we sum the importations and exportations for a given product over all partner countries.

In [None]:
France_trade = France_trade.groupby(['Item', 'Year', 'Element']).agg({'Value':'sum'})\
                                    .reset_index()
France_trade.head()

We restructure the dataframe by pivoting it, so that imports and exports become two separate columns.
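This notebook uses a sequence of pivot, concat and groupby-mean. As an alternative, `pandas.pivot_table` can sum over partner countries and spread imports and exports into two columns in a single call. Below is a minimal sketch on a made-up long-format table. The `Partner` column name is an assumption, and the FAOSTAT export may name it differently.

```python
import pandas as pd

# Toy long-format trade table: one row per (partner, item, year, element); values are made up
trade = pd.DataFrame({
    "Partner": ["A", "B", "A", "B", "A", "A"],
    "Item": ["Wheat", "Wheat", "Wheat", "Wheat", "Wheat", "Maize"],
    "Year": [2016] * 6,
    "Element": ["Import Quantity", "Import Quantity", "Export Quantity",
                "Export Quantity", "Import Quantity", "Export Quantity"],
    "Value": [100.0, 50.0, 20.0, 5.0, 10.0, 7.0],
})

wide = (
    # sum over partners and pivot Element values into columns in one step
    trade.pivot_table(index=["Item", "Year"], columns="Element",
                      values="Value", aggfunc="sum")
    .rename(columns={"Export Quantity": "Exported Quantity",
                     "Import Quantity": "Imported Quantity"})
    .reset_index()
)
wide.columns.name = None  # drop the leftover "Element" label on the column axis
print(wide)
```

Item-years with no recorded import (here, maize) come out as `NaN`. Whether to fill them with 0 is a modelling choice.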

In [None]:
France_trade_transformed = pd.pivot(France_trade,columns = 'Element', values='Value')\
                .rename(columns={'Export Quantity':'Exported Quantity','Import Quantity':'Imported Quantity'})
France_trade_transformed.head()

In [None]:
France_trade = pd.concat([France_trade, France_trade_transformed], axis=1, join='inner')
France_trade.drop(columns=['Value', 'Element'], inplace=True)
France_trade = France_trade.groupby(['Item', 'Year'])\
                            .agg({'Exported Quantity':'mean','Imported Quantity':'mean'})\
                            .reset_index()
France_trade.head()

We combine production and trade data in one dataframe, 'France_data', so that all the information is in one place. Since we have no import or export values before 1986, production before 1986 is no longer considered from here on.

In [None]:
France_crops = raw_CH_crops_dataset[['Area', 'Item','Element', 'Year', 'Unit', 'Value']]

In [None]:
# Merge trade data (imports and exports) with production data
France_mask = (France_crops.Area == 'France') & (France_crops.Element == 'Production') & (France_crops.Year >= 1986)
France_data = France_crops.loc[France_mask]\
                                    .merge(France_trade, on=['Item', 'Year'], how='inner')\
                                    .rename(columns={'Value':'Produced Quantity'})



In [None]:
France_data.head()

- Most produced Items

In [None]:
France_data.loc[France_data.Year == 2016].sort_values(by='Produced Quantity', ascending = False).head(10)

- Most exported Items

In [None]:
France_data.loc[France_data.Year == 2016].sort_values(by='Exported Quantity', ascending = False).head(10)

- Most imported Items

In [None]:
France_data.loc[France_data.Year == 2016].sort_values(by='Imported Quantity', ascending = False).head(10)

## Data loading - Austrian importations and exportations of agricultural goods 

In [None]:
Austria_trade = pd.read_csv('../data/FAOSTAT_data_austria.csv')
Austria_trade.head()

In [None]:
Austria_trade.dtypes

In [None]:
unofficial_stats_index_au = Austria_trade.loc[Austria_trade.Flag=='*'].index

In [None]:
# Drop the unofficial data
Austria_trade = Austria_trade.drop(index = unofficial_stats_index_au)

In [None]:
# Keep only the rows expressed in tonnes
Austria_trade = Austria_trade.loc[Austria_trade.Unit=='tonnes']

In [None]:
Austria_trade.drop(index=Austria_trade[Austria_trade['Flag Description'].str.contains('Data not available')].index, inplace=True)

In [None]:
Austria_trade = Austria_trade[['Element','Area', 'Item', 'Year', 'Unit', 'Value']]
Austria_trade.head()

To keep the model simple, we sum the importations and exportations for a given product over all partner countries.

In [None]:
Austria_trade = Austria_trade.groupby(['Item', 'Year', 'Element']).agg({'Value':'sum'})\
                                    .reset_index()
Austria_trade.head()

We restructure the dataframe by pivoting it, so that imports and exports become two separate columns.

In [None]:
Austria_trade_transformed = pd.pivot(Austria_trade,columns = 'Element', values='Value')\
                .rename(columns={'Export Quantity':'Exported Quantity','Import Quantity':'Imported Quantity'})
Austria_trade_transformed.head()

In [None]:
Austria_trade = pd.concat([Austria_trade, Austria_trade_transformed], axis=1, join='inner')
Austria_trade.drop(columns=['Value', 'Element'], inplace=True)
Austria_trade = Austria_trade.groupby(['Item', 'Year'])\
                            .agg({'Exported Quantity':'mean','Imported Quantity':'mean'})\
                            .reset_index()
Austria_trade.head()

We combine production and trade data in one dataframe, 'Austria_data', so that all the information is in one place. Since we have no import or export values before 1986, production before 1986 is no longer considered from here on.

In [None]:
Austria_crops = raw_CH_crops_dataset[['Area', 'Item','Element', 'Year', 'Unit', 'Value']]

In [None]:
# Merge trade data (imports and exports) with production data
Austria_mask = (Austria_crops.Area == 'Austria') & (Austria_crops.Element == 'Production') & (Austria_crops.Year >= 1986)
Austria_data = Austria_crops.loc[Austria_mask]\
                                    .merge(Austria_trade, on=['Item', 'Year'], how='inner')\
                                    .rename(columns={'Value':'Produced Quantity'})

In [None]:
Austria_data.head()

- Most produced items

In [None]:
Austria_data.loc[Austria_data.Year == 2016].sort_values(by='Produced Quantity', ascending = False).head(10)

- Most exported items

In [None]:
Austria_data.loc[Austria_data.Year == 2016].sort_values(by='Exported Quantity', ascending = False).head(10)

- Most imported items

In [None]:
Austria_data.loc[Austria_data.Year == 2016].sort_values(by='Imported Quantity', ascending = False).head(10)

## Data loading - German importations and exportations of agricultural goods 

In [None]:
Germany_trade = pd.read_csv('../data/FAOSTAT_data_germany.csv')
Germany_trade.head()

In [None]:
Germany_trade.dtypes

In [None]:
unofficial_stats_index_ge = Germany_trade.loc[Germany_trade.Flag=='*'].index

In [None]:
# Drop the unofficial data
Germany_trade = Germany_trade.drop(index = unofficial_stats_index_ge)

In [None]:
# Keep only the rows expressed in tonnes
Germany_trade = Germany_trade.loc[Germany_trade.Unit=='tonnes']

In [None]:
Germany_trade.drop(index=Germany_trade[Germany_trade['Flag Description'].str.contains('Data not available')].index, inplace=True)

In [None]:
Germany_trade = Germany_trade[['Element','Area', 'Item', 'Year', 'Unit', 'Value']]
Germany_trade.head()

To keep the model simple, we sum the importations and exportations for a given product over all partner countries.

In [None]:
Germany_trade = Germany_trade.groupby(['Item', 'Year', 'Element']).agg({'Value':'sum'})\
                                    .reset_index()
Germany_trade.head()

We restructure the dataframe by pivoting it, so that imports and exports become two separate columns.

In [None]:
Germany_trade_transformed = pd.pivot(Germany_trade,columns = 'Element', values='Value')\
                .rename(columns={'Export Quantity':'Exported Quantity','Import Quantity':'Imported Quantity'})
Germany_trade_transformed.head()

In [None]:
Germany_trade = pd.concat([Germany_trade, Germany_trade_transformed], axis=1, join='inner')
Germany_trade.drop(columns=['Value', 'Element'], inplace=True)
Germany_trade = Germany_trade.groupby(['Item', 'Year'])\
                            .agg({'Exported Quantity':'mean','Imported Quantity':'mean'})\
                            .reset_index()
Germany_trade.head()

We combine production and trade data in one dataframe, 'Germany_data', so that all the information is in one place. Since we have no import or export values before 1986, production before 1986 is no longer considered from here on.

In [None]:
Germany_crops = raw_CH_crops_dataset[['Area', 'Item','Element', 'Year', 'Unit', 'Value']]

In [None]:
# Merge trade data (imports and exports) with production data
Germany_mask = (Germany_crops.Area == 'Germany') & (Germany_crops.Element == 'Production') & (Germany_crops.Year >= 1986)
Germany_data = Germany_crops.loc[Germany_mask]\
                                    .merge(Germany_trade, on=['Item', 'Year'], how='inner')\
                                    .rename(columns={'Value':'Produced Quantity'})

In [None]:
Germany_data.head()

- Most produced items

In [None]:
Germany_data.loc[Germany_data.Year == 2016].sort_values(by='Produced Quantity', ascending = False).head(10)

- Most exported items

In [None]:
Germany_data.loc[Germany_data.Year == 2016].sort_values(by='Exported Quantity', ascending = False).head(10)

- Most imported items

In [None]:
Germany_data.loc[Germany_data.Year == 2016].sort_values(by='Imported Quantity', ascending = False).head(10)

<div class="alert alert-block alert-success">

We can observe that apples, maize, potatoes, wheat and sugar beet are important items for all these countries, as they are often among the most produced, exported and imported items.
So we can focus our study on those products to answer the question of Swiss food self-sufficiency.

    We should replace oat with something else... What about Sugar beet??

## Data loading - Liechtenstein importations and exportations of agricultural goods 

No data was found on FAO.

## Data loading - Swiss temperatures

This dataset does not come from FAOSTAT but from: __[MeteoSwiss](https://www.meteoswiss.admin.ch/home/climate/swiss-climate-in-detail/Swiss-temperature-mean/Data-on-the-Swiss-temperature-mean.html)__

In [None]:
CH_temperatures = pd.read_csv('../data/10.18751-Climate-Timeseries-CHTM-1.1-swiss.txt', sep="\t", header=0, skiprows=15)

In [None]:
# Keep the years 1986 to 2017 (both included)
CH_temperatures = CH_temperatures.loc[CH_temperatures.time.between(1986, 2017)]

In [None]:
CH_temperatures = CH_temperatures.iloc[:,-3:]

In [None]:
CH_temperatures

### Plot: Is there a correlation between production and temperature?


In [None]:
CH_data.head()

In [None]:
# TODO: make this plot interactive, so that we can select a food item and see how its production
# is affected by temperature changes.
years = np.sort(CH_data.Year.unique())
fig, ax1 = plt.subplots()
# Sort by year so that each production value is plotted against the right year
data1 = CH_data.loc[CH_data.Item=='Potatoes'].sort_values(by='Year')['Produced Quantity']
data2 = CH_temperatures.year

color = 'tab:red'
ax1.set_xlabel('year')
ax1.set_ylabel('production', color=color)
ax1.plot(years, data1, color=color)
ax1.tick_params(axis='y', labelcolor=color)

ax2 = ax1.twinx()  # instantiate a second axes that shares the same x-axis

color = 'tab:blue'
ax2.set_ylabel('temperature', color=color)  # we already handled the x-label with ax1
ax2.plot(years, data2, color=color)
ax2.tick_params(axis='y', labelcolor=color)

ax1.set_title('Potato production and temperature every year')
fig.tight_layout()  # otherwise the right y-label is slightly clipped
plt.show()

<div class="alert alert-block alert-success">
As temperature increases, potato production appears to decrease.
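To back this visual impression with a number, we could compute the Pearson correlation coefficient between yearly production and yearly mean temperature over the same years. The sketch below assumes both series are already aligned year by year. The values are invented and are not taken from FAOSTAT or MeteoSwiss. A negative coefficient alone does not establish causation, because other factors such as harvested area also vary over time.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two series covering the same years."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("both series must cover the same years")
    # corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r
    return float(np.corrcoef(x, y)[0, 1])

# Made-up values for illustration only
temperature = [5.1, 5.6, 5.3, 6.0, 6.2]
production = [520, 480, 500, 450, 430]
print(round(pearson_r(temperature, production), 3))
```

`scipy.stats.pearsonr(x, y)` returns the same coefficient together with a p-value, which would help judge whether a correlation over about 30 years is meaningful.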


<div class="alert alert-block alert-warning">
For our predictive model, we will need a way to estimate future temperatures. If we base our reasoning on a +1.5°C increase by 2050, could we use a linear regression?
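We could compare two simple options:

- Extrapolate a least-squares linear trend fitted on past yearly means.
- Use an assumed scenario that rises linearly to +1.5°C in 2050.

Our plan does not fix the baseline for the +1.5°C. For illustration, the sketch below assumes it is counted from the last observed year. The function names and data are hypothetical.

```python
import numpy as np

def fitted_trend(years, temps, target_years):
    """Fit a least-squares line to past temperatures and evaluate it at target_years."""
    slope, intercept = np.polyfit(years, temps, deg=1)
    return slope * np.asarray(target_years, dtype=float) + intercept

def scenario_path(base_year, base_temp, target_years, end_year=2050, warming=1.5):
    """Assumed scenario: linear ramp from base_temp in base_year to base_temp + warming in end_year."""
    t = (np.asarray(target_years, dtype=float) - base_year) / (end_year - base_year)
    return base_temp + warming * t

# Made-up history, NOT the MeteoSwiss series
years = np.arange(2000, 2010)
temps = 5.0 + 0.05 * (years - 2000)

print(fitted_trend(years, temps, [2020, 2050]))
print(scenario_path(2017, 6.0, [2017, 2050]))
```

Extrapolating a trend over several decades is fragile, so the scenario path is easier to justify if we take the +1.5°C figure as given.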

## Switzerland importations and exportations network

We want to know which countries are Switzerland's main trading partners.

In [None]:
CH_trade_network = CH_trade_network[['Element','Reporter Countries','Partner Countries', 'Item', 'Year', 'Unit', 'Value']]
CH_trade_network.head()

In [None]:
CH_trade_network.Unit.unique()

We can build three different network graphs, weighted in different ways:

- according to the quantity exchanged
- according to the number of times two countries are linked
- according to the variety of products exchanged

We choose the first one, so we can drop the Year, Unit and Item columns. The network will show the main partners from 1985 to 2016 across all products.
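Each of the three weighting options listed above can be computed with a single groupby. Below is a sketch on made-up rows. It uses the column names of `CH_trade_network` (`Element`, `Partner Countries`, `Item`, `Value`), and the output column names are our own.

```python
import pandas as pd

# Toy trade rows with made-up numbers
rows = pd.DataFrame({
    "Element": ["Import Quantity"] * 4 + ["Export Quantity"],
    "Partner Countries": ["France", "France", "France", "Italy", "France"],
    "Item": ["Wheat", "Wheat", "Apples", "Maize", "Wheat"],
    "Year": [2015, 2016, 2016, 2016, 2016],
    "Value": [100.0, 120.0, 30.0, 80.0, 10.0],
})

weights_3 = (
    rows.groupby(["Element", "Partner Countries"])
    .agg(
        quantity=("Value", "sum"),        # 1) total quantity exchanged
        n_links=("Value", "count"),       # 2) number of recorded (item, year) flows
        n_products=("Item", "nunique"),   # 3) variety of products exchanged
    )
    .reset_index()
)
print(weights_3)
```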

The network graph is also directed: imports point towards Switzerland and exports point towards the partner country.
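One way to encode this direction is to build explicit `From` and `To` columns. Imports go from the partner to Switzerland, and exports go from Switzerland to the partner. Below is a vectorised sketch using `numpy.where`. The helper name and toy values are hypothetical.

```python
import numpy as np
import pandas as pd

def to_directed_edges(weights, home="Switzerland"):
    """Turn (Element, Partner Countries, Value) rows into From -> To edges (hypothetical helper)."""
    # Keep only import and export rows so every remaining row has a known direction
    flows = weights[weights["Element"].isin(["Import Quantity", "Export Quantity"])]
    is_import = (flows["Element"] == "Import Quantity").to_numpy()
    partner = flows["Partner Countries"].to_numpy()
    return pd.DataFrame({
        "From": np.where(is_import, partner, home),  # imports: partner -> home
        "To": np.where(is_import, home, partner),    # exports: home -> partner
        "weight": flows["Value"].to_numpy(),
    })

# Made-up example
weights = pd.DataFrame({
    "Element": ["Import Quantity", "Export Quantity"],
    "Partner Countries": ["France", "Germany"],
    "Value": [500.0, 200.0],
})
edges = to_directed_edges(weights)
print(edges)
```

With NetworkX, `nx.from_pandas_edgelist(edges, source="From", target="To", edge_attr="weight", create_using=nx.DiGraph())` keeps this direction, provided the source is the `From` column.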

In [None]:
CH_trade_network = CH_trade_network[['Element','Reporter Countries','Partner Countries', 'Value']]
CH_trade_network.head()

In [None]:
#compute the weights
weights= CH_trade_network.groupby(["Element","Partner Countries","Reporter Countries"]).agg({'Value':'sum'})\
                                    .reset_index()

weights.sort_values(by='Value', ascending = False).head(20)

<div class="alert alert-block alert-success">

France, Italy, Germany, Spain and the Netherlands are the countries from which Switzerland imports the largest cumulative quantities over the years. 
France, Germany, Austria, Italy and the United States are the main destinations of Swiss exports. 
    

In [None]:
import networkx as nx

# Helper function for printing various graph properties
def describe_graph(G):
    # nx.info() was removed in NetworkX 3.0, so print the node and edge counts directly
    print(f"Nodes: {G.number_of_nodes()}, Edges: {G.number_of_edges()}")
    if nx.is_connected(G):
        print("Avg. Shortest Path Length: %.4f" %nx.average_shortest_path_length(G))
        print("Diameter: %.4f" %nx.diameter(G)) # Longest shortest path
    else:
        print("Graph is not connected")
        print("Diameter and Avg shortest path length are not defined!")
    print("Density: %.4f" %nx.density(G))  # #edges/#edges-complete-graph
    # #closed-triplets(3*#triangles)/#all-triplets
    print("Global clustering coefficient aka Transitivity: %.4f" %nx.transitivity(G))

In [None]:
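import networkx as nx

G=nx.from_pandas_edgelist(weights, 'Reporter Countries', 'Partner Countries', edge_attr=['Value'], create_using=nx.Graph())

# Plot it: `k` (optimal distance between nodes) is a spring_layout parameter, not a drawing option
plt.figure(figsize=(18.5, 10.5))
pos = nx.spring_layout(G, k=1)
nx.draw(G, pos, with_labels=True, alpha=0.8)
plt.show()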


In [None]:
print(f"Nodes: {G.number_of_nodes()}, Edges: {G.number_of_edges()}")

In [None]:
describe_graph(G)  

To make the graph more readable, we keep only the 20 largest flows (imports and exports).

In [None]:
weights = weights.sort_values(by='Value', ascending = False).head(20)

G2=nx.from_pandas_edgelist(weights, 'Reporter Countries', 'Partner Countries', edge_attr=['Value'], create_using=nx.Graph())

# Plot it
plt.figure(figsize=(18.5, 10.5))
pos2 = nx.spring_layout(G2, k=0.05)
nx.draw(G2, pos2, with_labels=True, alpha=0.8)
plt.show()



In [None]:
print(f"Nodes: {G2.number_of_nodes()}, Edges: {G2.number_of_edges()}")

In [None]:
describe_graph(G2) 

<div class="alert alert-block alert-warning">
    Not finished: I did not manage to make the graph directed, because imported and exported quantities are in the same column of the table. We should split them into 2 different columns, something like:
create new_df with a 'from' column, a 'to' column and a 'weight' column
    
    iterate over the rows of the old dataframe:
    if imported: to = Switzerland and from = Partner country and weight = value
    if exported: to = Partner country and from = Switzerland and weight = value 
    and then do 
    G=nx.from_pandas_edgelist(new_df, 'from', 'to', edge_attr=['weight'], create_using=nx.DiGraph())
    
    and then it would be correct,
    and we would keep only the 20 most important ones, otherwise it is unreadable. 
    Do you think it is worth doing, or are we not going to use it????
    
    

In [None]:
To_list=[]
From_list=[]
Weight_list=[]

for i in range(0, weights.shape[0]):
    if weights.Element.iloc[i] == "Import Quantity":
        To_list.append("Switzerland")
        From_list.append(weights["Partner Countries"].iloc[i])
        Weight_list.append(weights.Value.iloc[i])
    if weights.Element.iloc[i] == "Export Quantity":
        To_list.append(weights["Partner Countries"].iloc[i])
        From_list.append("Switzerland")
        Weight_list.append(weights.Value.iloc[i])
        
trade_network_df= pd.DataFrame({'To': To_list, 'From': From_list, 'weight': Weight_list})
trade_network_df.head()
    


In [None]:
trade_network_df.sort_values(by='weight', ascending = False).head(20)

In [None]:
trade_network_df = trade_network_df.sort_values(by='weight', ascending = False)

In [None]:
trade_network_df.head()

In [None]:
trade_network_df["Logarithmic weight"] = np.log(trade_network_df.weight)
trade_network_df

<div class="alert alert-block alert-info">

Switzerland's main import partners are France, Germany, Italy, Spain, the Netherlands, Brazil and Austria.
    
Its main export partners are France, Germany, Austria, the United States and Italy.


In [None]:

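# Edges go from the 'From' column (source) to the 'To' column (target)
G3=nx.from_pandas_edgelist(trade_network_df[:10], 'From', 'To', edge_attr=['Logarithmic weight'], create_using=nx.DiGraph())

pos3 = nx.spring_layout(G3, k=0.05)
nx.draw_networkx(G3, pos3, node_size=500, with_labels=True, alpha=0.8)
plt.axis('off')  # must be called before plt.show() to affect the displayed figure
plt.show()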



In [None]:
G4=nx.from_pandas_edgelist(trade_network_df[:20], 'From', 'To', edge_attr=['Logarithmic weight'], create_using=nx.DiGraph())

# draw_shell computes its own shell layout, so no spring-layout parameter such as `k` is passed
nx.draw_shell(G4, node_size=500, with_labels=True, alpha=0.8)
plt.axis('off')
plt.show()



## Plan - What's coming next?

<div class="alert alert-block alert-info">
    
1. Defining what is food self-sufficiency
    1. $SSR = \frac{\text{Production} \times 100}{\text{Production} + \text{Imports} - \text{Exports}}$ (to be developed)
    2. Adapt it to the Swiss case: look at what we import (basic needs?), export (top exports? by far?) and the production graphs
    3. __[Ref. Paper "Food self-sufficiency: Making sense of it, and when it makes sense" By Jennifer Clapp](https://www.sciencedirect.com/science/article/pii/S0306919216305851#b0240)__. <br> Summary: __[Summary of Clapp's paper by the Resilience website](https://www.resilience.org/stories/2018-03-13/food-self-sufficiency-does-it-make-sense/)__
    4. Compare our results with other sources to check whether we obtain the same results (e.g. Switzerland's self-sufficiency on Wikipedia __[List of countries by food self-sufficiency rate](https://en.wikipedia.org/wiki/List_of_countries_by_food_self-sufficiency_rate)__)

    
2. Food situation of Switzerland from 1986 to 2017.
    1. Is/was it food self-sufficient? SSR scores over the years.
    2. Compare with its neighbours

    
3. Will it be **physically** possible for Switzerland to be food self-sufficient in the near future (in the sense of the 2018 initiative, since we have seen that the definition is relative), taking into account its population growth (estimated increase in consumption)? What would it imply in terms of:
    1. Area harvested (current ratio and estimation of its evolution)
    2. Farmer population 
    3. Temperature (correlation between climate and food production)
    4. Environment (are fertilizers needed? depends on productivity)

    
4. Attempt at an analysis of the **economic** consequences?
    1. Complicated... What about looking at what happened in countries that adopted food self-sufficiency policies, such as Senegal, India, the Philippines, Qatar, Bolivia and Russia? (Jaccard and correlations?)
    2. Jaccard similarity of countries based on SSR, to see which countries should adopt more food self-sufficiency policies?

# 1. Food self-sufficiency: food situation in Switzerland from 1986 to 2017

## Computing the SSR of Switzerland and its neighbours over the years

In [None]:
CH_data.head()

In [None]:
CH_clear = CH_data[["Year", "Produced Quantity", "Exported Quantity", "Imported Quantity"]]
CH_clear.head()

In [None]:
CH_ssr = CH_clear.groupby("Year")\
                .agg({'Produced Quantity':'sum','Exported Quantity':'sum','Imported Quantity':'sum'})\
                .reset_index()

CH_ssr.head(30)

In [None]:
SSR_list=[]
for i in range(0, CH_ssr.shape[0]):
    SSR_list.append((CH_ssr["Produced Quantity"].iloc[i]*100)/(CH_ssr["Produced Quantity"].iloc[i] + CH_ssr["Imported Quantity"].iloc[i]-CH_ssr["Exported Quantity"].iloc[i]))

CH_ssr["SSR"]=SSR_list
CH_ssr.head()

Now we do the same for Switzerland's neighbours
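We could avoid copying the loop for every neighbour. Because pandas arithmetic works element-wise, the SSR formula (production × 100 / (production + imports - exports)) can be applied to whole columns at once. Below is a sketch with a hypothetical `add_ssr` helper and invented yearly totals.

```python
import pandas as pd

def add_ssr(df):
    """Return a copy of df with an SSR column in percent:
    SSR = Production * 100 / (Production + Imports - Exports)."""
    out = df.copy()
    supply = out["Produced Quantity"] + out["Imported Quantity"] - out["Exported Quantity"]
    out["SSR"] = out["Produced Quantity"] * 100 / supply
    return out

# Made-up yearly totals, NOT the FAOSTAT figures
toy = pd.DataFrame({
    "Year": [2015, 2016],
    "Produced Quantity": [800.0, 900.0],
    "Imported Quantity": [300.0, 200.0],
    "Exported Quantity": [100.0, 200.0],
})
print(add_ssr(toy))
```

Reusing one function for all countries also lowers the risk of mixing up columns from different countries' tables when copying code.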

In [None]:
France_clear = France_data[["Year", "Produced Quantity", "Exported Quantity", "Imported Quantity"]]
France_ssr = France_clear.groupby("Year")\
                .agg({'Produced Quantity':'sum','Exported Quantity':'sum','Imported Quantity':'sum'})\
                .reset_index()
SSR_list_F=[]
for i in range(0, France_ssr.shape[0]):
    SSR_list_F.append((France_ssr["Produced Quantity"].iloc[i]*100)/(France_ssr["Produced Quantity"].iloc[i] + France_ssr["Imported Quantity"].iloc[i] - France_ssr["Exported Quantity"].iloc[i]))

France_ssr["SSR"]=SSR_list_F
France_ssr.head()


In [None]:
Germany_clear = Germany_data[["Year", "Produced Quantity", "Exported Quantity", "Imported Quantity"]]
Germany_ssr = Germany_clear.groupby("Year")\
                .agg({'Produced Quantity':'sum','Exported Quantity':'sum','Imported Quantity':'sum'})\
                .reset_index()
SSR_list_G=[]
for i in range(0, Germany_ssr.shape[0]):
    SSR_list_G.append((Germany_ssr["Produced Quantity"].iloc[i]*100)/(Germany_ssr["Produced Quantity"].iloc[i] + Germany_ssr["Imported Quantity"].iloc[i] - Germany_ssr["Exported Quantity"].iloc[i]))

Germany_ssr["SSR"]=SSR_list_G
Germany_ssr.head()



In [None]:
Italy_clear = Italy_data[["Year", "Produced Quantity", "Exported Quantity", "Imported Quantity"]]
Italy_ssr = Italy_clear.groupby("Year")\
                .agg({'Produced Quantity':'sum','Exported Quantity':'sum','Imported Quantity':'sum'})\
                .reset_index()
SSR_list_I=[]
for i in range(0, Italy_ssr.shape[0]):
    SSR_list_I.append((Italy_ssr["Produced Quantity"].iloc[i]*100)/(Italy_ssr["Produced Quantity"].iloc[i] + Italy_ssr["Imported Quantity"].iloc[i] - Italy_ssr["Exported Quantity"].iloc[i]))

Italy_ssr["SSR"]=SSR_list_I
Italy_ssr.head()



In [None]:
Austria_clear = Austria_data[["Year", "Produced Quantity", "Exported Quantity", "Imported Quantity"]]
Austria_ssr = Austria_clear.groupby("Year")\
                .agg({'Produced Quantity':'sum','Exported Quantity':'sum','Imported Quantity':'sum'})\
                .reset_index()
SSR_list_A=[]
for i in range(0, Austria_ssr.shape[0]):
    SSR_list_A.append((Austria_ssr["Produced Quantity"].iloc[i]*100)/(Austria_ssr["Produced Quantity"].iloc[i] + Austria_ssr["Imported Quantity"].iloc[i] - Austria_ssr["Exported Quantity"].iloc[i]))

Austria_ssr["SSR"]=SSR_list_A
Austria_ssr.head()




In [None]:
# Plot the evolution of the SSR over the years

plt.figure(figsize=(20,10))
plt.plot( 'Year', 'SSR', data=Austria_ssr, marker='', color='green',  label = 'Austria')
plt.plot( 'Year', 'SSR', data=France_ssr, marker='', color='skyblue', label = 'France')
plt.plot( 'Year', 'SSR', data=CH_ssr, marker='', color='red', label = 'Switzerland', linewidth=3)
plt.plot( 'Year', 'SSR', data=Germany_ssr, marker='', color='orange', label = 'Germany')
plt.plot( 'Year', 'SSR', data=Italy_ssr, marker='', color='grey', label = 'Italy')
    
plt.legend() 
plt.title('Evolution of the SSR over the years' , fontsize= 20)
plt.xlabel("Year", fontsize= 20)
plt.ylabel("SSR value in %", fontsize= 20)
plt.show()

<div class="alert alert-block alert-info">
    
We can see that Switzerland has the smallest SSR, oscillating between 70% and 90% over the years.
This indicates that, for the crops covered by our data, Switzerland was not food self-sufficient at any point between 1986 and 2017.
We can also observe that Germany, France and sometimes Austria have an SSR above 100%. This can be explained by the fact that our dataset mostly contains items produced in the country itself, so we are missing a lot of imports. Mathematically, an SSR above 100% means that exports exceed imports for the items considered.

We will now recompute the SSR using only our five main items: potatoes, wheat, sugar beet, apples and maize.
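The per-country cells that follow differ only in their input table. The sketch below does the same work in one pass:

1. Stack all countries into one frame.
2. Keep the five items with `isin`.
3. Compute the SSR per country and year in a single groupby.

The dict-of-frames input and the function name are assumptions, and the numbers are invented.

```python
import pandas as pd

KEEP = ("Potatoes", "Apples", "Maize", "Sugar beet", "Wheat")
QTY = ["Produced Quantity", "Imported Quantity", "Exported Quantity"]

def ssr_by_country(frames, items=KEEP):
    """frames: dict mapping a country name to a DataFrame with Item, Year and the three quantity columns."""
    combined = pd.concat(
        [df.assign(Country=name) for name, df in frames.items()], ignore_index=True
    )
    combined = combined[combined["Item"].isin(items)]  # restrict to the selected items
    totals = combined.groupby(["Country", "Year"])[QTY].sum().reset_index()
    totals["SSR"] = totals["Produced Quantity"] * 100 / (
        totals["Produced Quantity"] + totals["Imported Quantity"] - totals["Exported Quantity"]
    )
    return totals

# Made-up data for two fictitious countries
frames = {
    "A": pd.DataFrame({"Item": ["Wheat", "Oats"], "Year": [2016, 2016],
                       "Produced Quantity": [100.0, 1000.0],
                       "Imported Quantity": [50.0, 0.0],
                       "Exported Quantity": [50.0, 0.0]}),
    "B": pd.DataFrame({"Item": ["Potatoes"], "Year": [2016],
                       "Produced Quantity": [60.0],
                       "Imported Quantity": [40.0],
                       "Exported Quantity": [0.0]}),
}
print(ssr_by_country(frames))
```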

In [None]:
keep=["Potatoes","Apples","Maize","Sugar beet","Wheat"]
CH_clear_ = CH_data[CH_data.Item.isin(keep)]

In [None]:
CH_clear_.head()

In [None]:
CH_ssr_5 = CH_clear_.groupby("Year")\
                .agg({'Produced Quantity':'sum','Exported Quantity':'sum','Imported Quantity':'sum'})\
                .reset_index()
SSR_list_CH_5=[]
for i in range(0, CH_ssr_5.shape[0]):
    SSR_list_CH_5.append((CH_ssr_5["Produced Quantity"].iloc[i]*100)/(CH_ssr_5["Produced Quantity"].iloc[i] + CH_ssr_5["Imported Quantity"].iloc[i] - CH_ssr_5["Exported Quantity"].iloc[i]))

CH_ssr_5["SSR"]=SSR_list_CH_5
CH_ssr_5.head()

In [None]:
France_clear_ = France_data[France_data.Item.isin(keep)]

France_ssr_5 = France_clear_.groupby("Year")\
                .agg({'Produced Quantity':'sum','Exported Quantity':'sum','Imported Quantity':'sum'})\
                .reset_index()
SSR_list_FR_5=[]
for i in range(0, France_ssr_5.shape[0]):
    SSR_list_FR_5.append((France_ssr_5["Produced Quantity"].iloc[i]*100)/(France_ssr_5["Produced Quantity"].iloc[i] + France_ssr_5["Imported Quantity"].iloc[i] - France_ssr_5["Exported Quantity"].iloc[i]))

France_ssr_5["SSR"]=SSR_list_FR_5
France_ssr_5.head()

In [None]:
Germany_clear_ = Germany_data[Germany_data.Item.isin(keep)]

Germany_ssr_5 = Germany_clear_.groupby("Year")\
                .agg({'Produced Quantity':'sum','Exported Quantity':'sum','Imported Quantity':'sum'})\
                .reset_index()
SSR_list_G_5=[]
for i in range(0, Germany_ssr_5.shape[0]):
    SSR_list_G_5.append((Germany_ssr_5["Produced Quantity"].iloc[i]*100)/(Germany_ssr_5["Produced Quantity"].iloc[i] + Germany_ssr_5["Imported Quantity"].iloc[i] - Germany_ssr_5["Exported Quantity"].iloc[i]))

Germany_ssr_5["SSR"]=SSR_list_G_5
Germany_ssr_5.head()

In [None]:
Italy_clear_ = Italy_data[Italy_data.Item.isin(keep)]

Italy_ssr_5 = Italy_clear_.groupby("Year")\
                .agg({'Produced Quantity':'sum','Exported Quantity':'sum','Imported Quantity':'sum'})\
                .reset_index()
SSR_list_I_5=[]
for i in range(0, Italy_ssr_5.shape[0]):
    SSR_list_I_5.append((Italy_ssr_5["Produced Quantity"].iloc[i]*100)/(Italy_ssr_5["Produced Quantity"].iloc[i] + Italy_ssr_5["Imported Quantity"].iloc[i] - Italy_ssr_5["Exported Quantity"].iloc[i]))

Italy_ssr_5["SSR"]=SSR_list_I_5
Italy_ssr_5.head()

In [None]:
Austria_clear_ = Austria_data[Austria_data.Item.isin(keep)]

Austria_ssr_5 = Austria_clear_.groupby("Year")\
                .agg({'Produced Quantity':'sum','Exported Quantity':'sum','Imported Quantity':'sum'})\
                .reset_index()
SSR_list_A_5=[]
for i in range(0, Austria_ssr_5.shape[0]):
    SSR_list_A_5.append((Austria_ssr_5["Produced Quantity"].iloc[i]*100)/(Austria_ssr_5["Produced Quantity"].iloc[i] + Austria_ssr_5["Imported Quantity"].iloc[i] - Austria_ssr_5["Exported Quantity"].iloc[i]))

Austria_ssr_5["SSR"]=SSR_list_A_5
Austria_ssr_5.head()

In [None]:
# Plot the evolution of the SSR (5 main items) over the years

plt.figure(figsize=(20,10))
plt.plot( 'Year', 'SSR', data=Austria_ssr_5, marker='', color='green',  label = 'Austria')
plt.plot( 'Year', 'SSR', data=France_ssr_5, marker='', color='skyblue', label = 'France')
plt.plot( 'Year', 'SSR', data=CH_ssr_5, marker='', color='red', label = 'Switzerland', linewidth=3)
plt.plot( 'Year', 'SSR', data=Germany_ssr_5, marker='', color='orange', label = 'Germany')
plt.plot( 'Year', 'SSR', data=Italy_ssr_5, marker='', color='grey', label = 'Italy')
    
plt.legend() 
plt.title('Evolution of the SSR over the years for the 5 main products' , fontsize= 20)
plt.xlabel("Year", fontsize= 20)
plt.ylabel("SSR value in %", fontsize= 20)
plt.show()

<div class="alert alert-block alert-info">
    
This time it is Italy that has the lowest SSR over the years. The SSR of Switzerland is still around 90%, but it seems to decrease in the last years. 
    
The SSR of France is still very high, with the same possible explanations as before. 
    
Germany and Austria have similar SSRs.
    
Which version do we want to keep?

<div class="alert alert-block alert-warning">

TO DO: further analysis: 
        
3. __[Ref. Paper "Food self-sufficiency: Making sense of it, and when it makes sense" By Jennifer Clapp](https://www.sciencedirect.com/science/article/pii/S0306919216305851#b0240)__. <br> Summary: __[Summary of Clapp's paper by the Resilience website](https://www.resilience.org/stories/2018-03-13/food-self-sufficiency-does-it-make-sense/)__
4. Compare our results with other sources to check whether we obtain the same results (e.g. Switzerland's self-sufficiency on Wikipedia __[List of countries by food self-sufficiency rate](https://en.wikipedia.org/wiki/List_of_countries_by_food_self-sufficiency_rate)__)
        

## What does Switzerland import and export? Is it different from its neighbours in items or quantities?

<div class="alert alert-block alert-success">
Already done; we still need to decide how to present our results and do the analysis (see points C and D in the plan).

Answer the questions:
- What does Switzerland produce, and in which quantities?
- Does Switzerland import more than its neighbours (due to its small size?)

Let's plot the production of items in Switzerland over the years.

In [None]:
#Interactive visualization
from ipywidgets import interact

# Plot the production, imports and exports of the selected item in Switzerland over the years
def viz_evolution(item):
    df_viz_evolution = CH_data.loc[CH_data['Item']==item].sort_values(by='Year')
    
    # multiple line plot
    plt.figure(figsize=(20,10))
    plt.plot( 'Year', 'Produced Quantity', data=df_viz_evolution, marker='', color='red', label = 'production', linewidth=3)
    plt.plot('Year', 'Imported Quantity', data=df_viz_evolution, marker='', color='blue', label = 'imports', linewidth=3)
    plt.plot('Year', 'Exported Quantity', data=df_viz_evolution, marker='', color='green', label = 'exports', linewidth=3) 
    plt.legend() 
    plt.title(f'Production, imports and exports of {item} in Switzerland over the years', fontsize= 20)
    plt.xlabel("Year", fontsize= 20)
    plt.ylabel("Values [tonnes]", fontsize= 20)
    plt.show()
   
items = CH_data.Item.unique()
interact(viz_evolution, item = items)    

<div class="alert alert-block alert-info">
    
We have seen that some of the most produced items are also among the most imported, such as potatoes, wheat, maize, apples, grapes, lettuce and chicory, and sugar beet. This may reflect a high consumption of these items by the population, and suggests that increasing their production could be one of the priorities. Among the most exported items, it is not surprising to find several items that Switzerland produces in large quantities, such as wheat, potatoes, apples, maize and barley. But we also find some unexpected items, like oilseeds nes, which are imported in larger quantities than they are produced and then exported in quantities higher than domestic production. This may indicate an economic advantage in this re-export.
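To list such items systematically rather than by eye, we could filter the rows where the exported quantity exceeds domestic production. For those items, part of the exports must come from imports or from stocks. Below is a sketch with a hypothetical helper and made-up rows, using the column names of `CH_data`.

```python
import pandas as pd

def reexport_candidates(df):
    """Rows where more is exported than produced domestically (hypothetical helper)."""
    mask = df["Exported Quantity"] > df["Produced Quantity"]
    return df.loc[mask, ["Item", "Year", "Produced Quantity",
                         "Imported Quantity", "Exported Quantity"]]

# Made-up rows for illustration only
toy = pd.DataFrame({
    "Item": ["Oilseeds nes", "Wheat"],
    "Year": [2016, 2016],
    "Produced Quantity": [10.0, 500.0],
    "Imported Quantity": [200.0, 400.0],
    "Exported Quantity": [50.0, 20.0],
})
print(reexport_candidates(toy))
```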
    

We have observed that apples, maize, potatoes, wheat and sugar beet are important items for Switzerland and its neighbouring countries, as they are often among the most produced, exported and imported items. 
So we can focus our study on those products to answer the question of Swiss food self-sufficiency. 


In [None]:
import plotly # conda install -c anaconda plotly #AND# jupyter labextension install @jupyterlab/plotly-extension
import plotly.graph_objects as go
y_wheat = CH_data.loc[CH_data.Year == 2016].loc[CH_data.Item=='Wheat'].values[0,-3:]
y_potatoes = CH_data.loc[CH_data.Year == 2016].loc[CH_data.Item=='Potatoes'].values[0,-3:]
y_beet = CH_data.loc[CH_data.Year == 2016].loc[CH_data.Item=='Sugar beet'].values[0,-3:]
y_maize = CH_data.loc[CH_data.Year == 2016].loc[CH_data.Item=='Maize'].values[0,-3:]
y_apples = CH_data.loc[CH_data.Year == 2016].loc[CH_data.Item=='Apples'].values[0,-3:]

x=['Produced', 'Exported', 'Imported']
fig = go.Figure(go.Bar(x=x, y=y_wheat, name='Wheat'))
fig.add_trace(go.Bar(x=x, y=y_potatoes, name='Potatoes'))
fig.add_trace(go.Bar(x=x, y=y_beet, name='Sugar beet'))
fig.add_trace(go.Bar(x=x, y=y_maize, name='Maize'))
fig.add_trace(go.Bar(x=x, y=y_apples, name='Apples'))

fig.update_layout(
    title='Most produced, exported and imported items in Switzerland in 2016',
    yaxis_title="Values [tonnes]",
    barmode='stack', 
    font=dict(
        family="Courier New, monospace",
        size=16,
        color="#7f7f7f")
    )
fig.show()

y_wheat_F = France_data.loc[France_data.Year == 2016].loc[France_data.Item=='Wheat'].values[0,-3:]
y_potatoes_F = France_data.loc[France_data.Year == 2016].loc[France_data.Item=='Potatoes'].values[0,-3:]
y_beet_F = France_data.loc[France_data.Year == 2016].loc[France_data.Item=='Sugar beet'].values[0,-3:]
y_maize_F = France_data.loc[France_data.Year == 2016].loc[France_data.Item=='Maize'].values[0,-3:]
y_apples_F = France_data.loc[France_data.Year == 2016].loc[France_data.Item=='Apples'].values[0,-3:]

x=['Produced', 'Exported', 'Imported']
fig1 = go.Figure(go.Bar(x=x, y=y_wheat_F, name='Wheat'))
fig1.add_trace(go.Bar(x=x, y=y_potatoes_F, name='Potatoes'))
fig1.add_trace(go.Bar(x=x, y=y_beet_F, name='Sugar beet'))
fig1.add_trace(go.Bar(x=x, y=y_maize_F, name='Maize'))
fig1.add_trace(go.Bar(x=x, y=y_apples_F, name='Apples'))

fig1.update_layout(
    title='Most produced, exported and imported items in France in 2016',
    yaxis_title="Values [tonnes]",
    barmode='stack', 
    font=dict(
        family="Courier New, monospace",
        size=16,
        color="#7f7f7f")
    )
fig1.show()

y_wheat_G = Germany_data.loc[Germany_data.Year == 2016].loc[Germany_data.Item=='Wheat'].values[0,-3:]
y_potatoes_G = Germany_data.loc[Germany_data.Year == 2016].loc[Germany_data.Item=='Potatoes'].values[0,-3:]
y_beet_G = Germany_data.loc[Germany_data.Year == 2016].loc[Germany_data.Item=='Sugar beet'].values[0,-3:]
y_maize_G = Germany_data.loc[Germany_data.Year == 2016].loc[Germany_data.Item=='Maize'].values[0,-3:]
y_apples_G = Germany_data.loc[Germany_data.Year == 2016].loc[Germany_data.Item=='Apples'].values[0,-3:]

x=['Produced', 'Exported', 'Imported']
fig3 = go.Figure(go.Bar(x=x, y=y_wheat_G, name='Wheat'))
fig3.add_trace(go.Bar(x=x, y=y_potatoes_G, name='Potatoes'))
fig3.add_trace(go.Bar(x=x, y=y_beet_G, name='Sugar beet'))
fig3.add_trace(go.Bar(x=x, y=y_maize_G, name='Maize'))
fig3.add_trace(go.Bar(x=x, y=y_apples_G, name='Apples'))

fig3.update_layout(
    title='Most produced, exported and imported items in Germany in 2016',
    yaxis_title="Values [tonnes]",
    barmode='stack', 
    font=dict(
        family="Courier New, monospace",
        size=16,
        color="#7f7f7f")
    )
fig3.show()

y_wheat_I = Italy_data.loc[Italy_data.Year == 2016].loc[Italy_data.Item=='Wheat'].values[0,-3:]
y_potatoes_I = Italy_data.loc[Italy_data.Year == 2016].loc[Italy_data.Item=='Potatoes'].values[0,-3:]
y_beet_I = Italy_data.loc[Italy_data.Year == 2016].loc[Italy_data.Item=='Sugar beet'].values[0,-3:]
y_maize_I = Italy_data.loc[Italy_data.Year == 2016].loc[Italy_data.Item=='Maize'].values[0,-3:]
y_apples_I = Italy_data.loc[Italy_data.Year == 2016].loc[Italy_data.Item=='Apples'].values[0,-3:]

x=['Produced', 'Exported', 'Imported']
fig2 = go.Figure(go.Bar(x=x, y=y_wheat_I, name='Wheat'))
fig2.add_trace(go.Bar(x=x, y=y_potatoes_I, name='Potatoes'))
fig2.add_trace(go.Bar(x=x, y=y_beet_I, name='Sugar beet'))
fig2.add_trace(go.Bar(x=x, y=y_maize_I, name='Maize'))
fig2.add_trace(go.Bar(x=x, y=y_apples_I, name='Apples'))

fig2.update_layout(
    title='Most produced, exported and imported items in Italy in 2016',
    yaxis_title="Values [tonnes]",
    barmode='stack', 
    font=dict(
        family="Courier New, monospace",
        size=16,
        color="#7f7f7f")
    )
fig2.show()

y_wheat_A = Austria_data.loc[Austria_data.Year == 2016].loc[Austria_data.Item=='Wheat'].values[0,-3:]
y_potatoes_A = Austria_data.loc[Austria_data.Year == 2016].loc[Austria_data.Item=='Potatoes'].values[0,-3:]
y_beet_A = Austria_data.loc[Austria_data.Year == 2016].loc[Austria_data.Item=='Sugar beet'].values[0,-3:]
y_maize_A = Austria_data.loc[Austria_data.Year == 2016].loc[Austria_data.Item=='Maize'].values[0,-3:]
y_apples_A = Austria_data.loc[Austria_data.Year == 2016].loc[Austria_data.Item=='Apples'].values[0,-3:]

x=['Produced', 'Exported', 'Imported']
fig4 = go.Figure(go.Bar(x=x, y=y_wheat_A, name='Wheat'))
fig4.add_trace(go.Bar(x=x, y=y_potatoes_A, name='Potatoes'))
fig4.add_trace(go.Bar(x=x, y=y_beet_A, name='Sugar beet'))
fig4.add_trace(go.Bar(x=x, y=y_maize_A, name='Maize'))
fig4.add_trace(go.Bar(x=x, y=y_apples_A, name='Apples'))

fig4.update_layout(
    title='Most produced, exported and imported items in Austria in 2016',
    yaxis_title="Values [tonnes]",
    barmode='stack', 
    font=dict(
        family="Courier New, monospace",
        size=16,
        color="#7f7f7f")
    )
fig4.show()


<div class="alert alert-block alert-info">
We can see that, because of its small size, Switzerland has the lowest import, export and production values of the five countries. 
Switzerland exports very little wheat, potatoes, sugar beet and maize compared to its neighbours. Contrary to what its small size might suggest, it is not the biggest importer either. However, the country imports some quantity of each of these items, which means that it is not self-sufficient in them; increasing the production of these items could be a priority for making Switzerland food self-sufficient. 
The most imported product in Switzerland is wheat, and the least imported ones are apples and maize. The most produced one is sugar beet, and the most exported are potatoes and wheat. 
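One way to quantify "not self-sufficient" is the self-sufficiency ratio used by the FAO: production / (production + imports - exports) x 100, which ignores stock changes. The helper below is a hypothetical sketch, not part of the original analysis; it assumes the three values come in the same order as in the bar charts above (produced, exported, imported), as returned by `.values[0,-3:]`.

```python
def self_sufficiency_ratio(produced, exported, imported):
    """Self-sufficiency ratio in percent (stock changes are ignored).

    SSR = production / (production + imports - exports) * 100
    """
    domestic_supply = produced + imported - exported
    if domestic_supply <= 0:
        raise ValueError("domestic supply (production + imports - exports) must be positive")
    return 100.0 * produced / domestic_supply


# Hypothetical values in tonnes, not taken from the dataset
print(self_sufficiency_ratio(400, 0, 200))  # below 100%: imports are needed
```

Applied to the notebook data, a call such as `self_sufficiency_ratio(*y_wheat)` would give the 2016 wheat ratio; a value above 100% means the country is a net exporter of that item.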

In [None]:
import plotly.graph_objects as go

from ipywidgets import interact

def viz_item(item):
    # Rows of the selected item, sorted by year so that x and y stay aligned
    item_data = CH_data.loc[CH_data.Item == item].sort_values('Year')
    years = item_data.Year.values
    values = item_data.values[:, -3:]  # last three columns: produced, exported, imported quantities
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=years, y=values[:, 0], fill='tozeroy', name='Produced'))  # fill down to the x-axis
    fig.add_trace(go.Scatter(x=years, y=values[:, 1], fill='tozeroy', name='Exported'))  # fill down to the x-axis
    fig.add_trace(go.Scatter(x=years, y=values[:, 2], fill='tonexty', name='Imported'))  # fill down to the previous trace (Exported)
    fig.update_layout(
        title=f"{item} production, exports and imports throughout the years in Switzerland",
        yaxis_title="Values [tonnes]",
        xaxis_title='Years'
        )
    fig.show()

items = CH_data.Item.unique()
interact(viz_item, item=items)



In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=CH_data.Year.unique(), y=total_crops_imports['Produced Quantity'].values, fill='tozeroy', name='Produced')) # fill down to the x-axis
fig.add_trace(go.Scatter(x=CH_data.Year.unique(), y=total_crops_imports['Imported Quantity'].values, fill='tozeroy', name='Imported')) # fill down to the x-axis
fig.add_trace(go.Scatter(x=CH_data.Year.unique(), y=total_crops_imports['Exported Quantity'].values, fill='tozeroy', name='Exported')) # fill down to the x-axis
fig.update_layout(
    title="Total production, imports and exports throughout the years in Switzerland",
    yaxis_title="Values [tonnes]",
    xaxis_title='Years'
    )
fig.show()

<div class="alert alert-block alert-info">

Here again, we can see that Switzerland is a very small exporter and that its imports are fairly constant. Nevertheless, its imports seem to have been increasing slightly since 2005. Is this due to a demand for more diverse food or to production issues? Since production fluctuates around a roughly constant level over the years, the first hypothesis (demand for food diversity as a consequence of globalisation) seems more plausible. 

As exports are hardly visible in the previous plot because of the difference in scale, let's plot them alone. 

In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=CH_data.Year.unique(), y=total_crops_imports['Exported Quantity'].values, fill='tozeroy', name='Exported')) # fill down to the x-axis
fig.update_layout(
    title="Total exports throughout the years in Switzerland",
    yaxis_title="Values [tonnes]",
    xaxis_title='Years'
    )
fig.show()

In [None]:
CH_restrained = CH_data_transformed.loc[CH_data_transformed.Item.isin(['Apples','Wheat','Potatoes', 'Maize', 'Sugar beet'])]

import plotly.express as px
fig = px.area(CH_restrained, x="Year", y="Value", color='Item',
      line_group='Input')
fig.update_layout(
    title="Evolution of Switzerland's production and imports for its five most important items",
    yaxis_title="Values [tonnes]",
    xaxis_title='Years'
    )
fig.show()



<div class="alert alert-block alert-info">

Overall, production and imports look roughly constant, with small fluctuations over the years. 

In [None]:
CH_restrained_exportations = CH_data_transformed_exportations.loc[CH_data_transformed_exportations.Item.isin(['Apples','Wheat','Potatoes', 'Maize', 'Sugar beet'])]

In [None]:
import plotly.express as px
fig = px.area(CH_restrained_exportations, x="Year", y="Value", color='Item',
      line_group='Input')
fig.update_layout(
    title="Evolution of Switzerland's exports for its five most important items",
    yaxis_title="Values [tonnes]",
    xaxis_title='Years'
    )
fig.show()

<div class="alert alert-block alert-info">

Exports fluctuate a lot, and the overall exports of these products show a large dip between 1993 and 2010. As shown in the previous graph, the production and imports of these products remained roughly constant over the years, so one possible explanation for this decrease is a rise in domestic consumption, due to population growth or new consumer trends. It could also be explained by a decrease in the production of other, substitute items: a less diverse consumption would increase the consumption of these five main items and reduce the share left for export. 
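The consumption hypothesis above can be checked with apparent consumption (production + imports - exports, ignoring stock changes): if domestic use rose while production and imports stayed flat, apparent consumption should increase as exports fall. The sketch below uses hypothetical yearly totals, not values from the dataset, and assumes the same (produced, exported, imported) ordering as the earlier plots.

```python
def apparent_consumption(rows):
    """Apparent consumption per year: production + imports - exports (stock changes ignored).

    rows: iterable of (year, produced, exported, imported) tuples.
    """
    return {year: produced + imported - exported
            for year, produced, exported, imported in rows}


def relative_change(series, start_year, end_year):
    """Relative change of a {year: value} mapping between two years."""
    return (series[end_year] - series[start_year]) / series[start_year]


# Hypothetical totals in tonnes: same production and imports, lower exports in 2010
rows = [(1993, 1000, 150, 200), (2010, 1000, 50, 200)]
consumption = apparent_consumption(rows)
print(consumption)  # {1993: 1050, 2010: 1150}
print(round(relative_change(consumption, 1993, 2010), 3))
```

Dividing apparent consumption by the population of the same year would further help separate demographic growth from changes in per-capita consumption.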

# Trade partners of Switzerland

In [None]:
trade_network_df

<div class="alert alert-block alert-info">

Main import partners: France, Germany, Italy, Spain, the Netherlands, Brazil and Austria.

Main export partners: France, Germany, Austria, the United States and Italy.
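Lists like the ones above can be obtained by summing the trade weights per partner and sorting. The sketch below works on hypothetical (exporter, importer, weight) tuples; how these map onto the `To`/`From` columns of `trade_network_df` is an assumption that should be checked against the data.

```python
from collections import defaultdict


def top_partners(edges, country, direction="import", n=5):
    """Rank the trade partners of `country` by total edge weight.

    edges: iterable of (exporter, importer, weight) tuples.
    direction: "import" ranks the countries selling to `country`,
               "export" ranks the countries buying from `country`.
    """
    totals = defaultdict(float)
    for exporter, importer, weight in edges:
        if direction == "import" and importer == country:
            totals[exporter] += weight
        elif direction == "export" and exporter == country:
            totals[importer] += weight
    return sorted(totals, key=totals.get, reverse=True)[:n]


# Hypothetical edges, not taken from the dataset
edges = [("France", "Switzerland", 5.0), ("Germany", "Switzerland", 7.0),
         ("Switzerland", "Austria", 3.0), ("France", "Switzerland", 1.0)]
print(top_partners(edges, "Switzerland"))
print(top_partners(edges, "Switzerland", direction="export"))
```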



In [None]:
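import networkx as nx
import matplotlib.pyplot as plt

# Directed trade graph built from the first 20 rows of trade_network_df
G4 = nx.from_pandas_edgelist(trade_network_df[:20], 'To', 'From', edge_attr=['Logarithmic weight'], create_using=nx.DiGraph())

# Edge thickness proportional to the logarithmic trade weight (assumes positive weights; thickest edge = 5)
weights = [G4[u][v]['Logarithmic weight'] for u, v in G4.edges()]
widths = [5 * w / max(weights) for w in weights]

# `k` was dropped: it is a spring_layout parameter, not a drawing parameter
nx.draw_shell(G4, node_size=500, with_labels=True, width=widths, alpha=0.8)
plt.axis('off')
plt.show()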

<div class="alert alert-block alert-info">
The thickness of each edge reflects the (logarithmic) trade weight of the link between two countries. 
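For edge thickness to be readable, the weights have to be mapped to a sensible range of line widths. A common choice is min-max scaling; the helper below is a hypothetical sketch (the names and default widths are ours), and unlike dividing by the maximum it also behaves well if some logarithmic weights are zero or negative.

```python
def edge_widths(weights, min_width=0.5, max_width=5.0):
    """Min-max scale edge weights to line widths in [min_width, max_width]."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        # All weights are equal: use the middle width for every edge
        return [(min_width + max_width) / 2 for _ in weights]
    scale = (max_width - min_width) / (hi - lo)
    return [min_width + (w - lo) * scale for w in weights]


print(edge_widths([1, 2, 3]))  # [0.5, 2.75, 5.0]
```

The result can be passed to networkx through the `width` argument, for example `nx.draw_shell(G4, width=edge_widths([G4[u][v]['Logarithmic weight'] for u, v in G4.edges()]), ...)`.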

# 2. Will it be physically possible for Switzerland in a near future to be food self-sufficient? Predictive Model


## A. Area harvested (actual ratio and estimation of its evolution)

<div class="alert alert-block alert-success">
Started: see the land attribution, the interactive world map and the demographic map.

Compute Switzerland's agricultural potential. Does the country use all of its land or not? Demography of Switzerland: with a growing population, could Swiss agriculture feed everybody in the next few years?
    
Analysis: started

## Global Land distribution

In [None]:
import matplotlib.pyplot as plt

countries = ['Switzerland', 'France', 'Germany', 'Italy', 'Austria']

# One pie chart per country: share of each land cover class (MODIS) in 2016
for country in countries:
    df_land_cover = raw_land_cover_dataset.loc[(raw_land_cover_dataset['Area'] == country)
                                               & (raw_land_cover_dataset['Year'] == 2016)
                                               & (raw_land_cover_dataset['Element'] == 'Area from MODIS')]
    fig, ax = plt.subplots(figsize=(30, 15))
    ax.pie(df_land_cover.Value, labels=df_land_cover.Item, autopct='%1.1f%%',
           shadow=True, startangle=45)
    ax.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
    ax.set_title(f'Land cover distribution in {country}, year 2016')
    fig.set_facecolor('white')

plt.show()

<div class="alert alert-block alert-info">
    
In Switzerland, 60.5% of the land is not usable for agriculture (sum of artificial surfaces, inland water bodies, snow-covered and tree-covered areas).

In France, 51% of the land is not usable for agriculture.

In Germany, 52.6%.

In Italy, only 41.7%.

In Austria, 66.8%.

In Liechtenstein: no data.

So, compared to its neighbours, Switzerland has one of the smallest shares of land usable for agriculture, but it still achieves some of the best yields.

Now that we have the general land distribution for each country, let's focus our plots on agricultural land. 
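The percentages above are the sum of the excluded land cover classes divided by the total area. A minimal sketch of that computation follows; the class names in `NON_AGRICULTURAL` are simplified, hypothetical labels and would need to match the exact `Item` strings of `raw_land_cover_dataset`.

```python
# Hypothetical, simplified class names: adapt them to the dataset's `Item` labels
NON_AGRICULTURAL = {
    "Artificial surfaces",
    "Inland water bodies",
    "Permanent snow and glaciers",
    "Tree-covered areas",
}


def non_agricultural_share(areas):
    """Percentage of the total area covered by the NON_AGRICULTURAL classes.

    areas: dict mapping land cover class -> area (any consistent unit).
    """
    total = sum(areas.values())
    excluded = sum(area for cls, area in areas.items() if cls in NON_AGRICULTURAL)
    return 100.0 * excluded / total


# Hypothetical areas, not taken from the dataset
print(non_agricultural_share({"Artificial surfaces": 10, "Tree-covered areas": 30,
                              "Grassland": 50, "Inland water bodies": 10}))
```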

## Land distribution excluding artificial surfaces, inland water bodies and snow-covered areas

<div class="alert alert-block alert-warning">
    
Since we already plot a global land distribution, I think we should only keep the df_agri plots. 
What do you think? 

I would have said the opposite.

In [None]:
import matplotlib.pyplot as plt

countries = ['Switzerland', 'France', 'Germany', 'Italy', 'Austria', 'Liechtenstein']
land_use_2016 = raw_land_use_dataset.loc[raw_land_use_dataset['Year'] == 2016]


def draw_pie(df, title, explode_third=False):
    """Draw one pie chart of the `Value` column, labelled by `Item`."""
    fig, ax = plt.subplots()
    # "Explode" the 3rd slice; explode needs one entry per slice, so it is only applied to 4-slice pies
    explode = (0, 0, 0.1, 0) if explode_third and len(df) == 4 else None
    ax.pie(df.Value, explode=explode, labels=df.Item, autopct='%1.1f%%',
           shadow=True, startangle=45)
    ax.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
    ax.set_title(title)
    fig.set_facecolor('white')


for country in countries:
    df_country = land_use_2016.loc[land_use_2016['Area'] == country]
    # Pie 1: share of each land use in the total land area
    df_land = df_country.loc[df_country['Element'] == 'Share in Land area']
    draw_pie(df_land, f'Distribution of lands in {country}, year 2016', explode_third=True)
    # Pie 2: share of each land use within agricultural land
    df_agri = df_country.loc[df_country['Element'] == 'Share in Agricultural land']
    draw_pie(df_agri, f'Distribution of agricultural lands in {country}, year 2016')

plt.show()

<div class="alert alert-block alert-info">

From the first set of plots (distribution of land), we can see that only 45.2% of Switzerland's land is used for agriculture, compared to around 64% in France, Italy and Germany. Land used for agriculture is the sum of cropland and permanent meadows and pastures. The share of forest is quite similar across these countries; the main difference lies in the share of land devoted to meadows and pastures. For example, France devotes about half as much of its land to meadows and pastures as Switzerland does, Germany less than half, and Italy only about a third. These plots suggest that Swiss agriculture is more oriented towards dairy products and livestock breeding. 

When comparing Switzerland with Liechtenstein, we find more similarities, as the share of land used for agriculture there is 42.5%.

From the second set of plots (distribution of agricultural land), we can see that the majority of Switzerland's agricultural land is under permanent meadows and pastures. This is a much larger share than in the other countries, where cropland and arable land dominate. This supports our earlier hypothesis that Switzerland is more oriented towards dairy products and livestock breeding. We could hypothesise that Switzerland may have to reduce the share of land devoted to meadows and pastures in order to become food self-sufficient. This would also imply labour and policy transitions and would affect the Swiss economy. 

However, an important aspect that these data do not show is the share of urban land. We should add it to our analysis. 

PS: arable land is land that is, or can be, cultivated.
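The comparison above relies on the FAO breakdown of agricultural land into cropland and permanent meadows and pastures. The sketch below computes the relevant shares from hypothetical areas (not values from the dataset); dividing one country's meadow share by Switzerland's gives ratios such as "about half" or "about a third".

```python
def land_shares(cropland, meadows_pastures, total_land):
    """Shares (in %) used in the comparison; all areas must use the same unit.

    Agricultural land = cropland + permanent meadows and pastures.
    """
    agricultural = cropland + meadows_pastures
    return {
        "agricultural_pct_of_land": 100.0 * agricultural / total_land,
        "meadows_pct_of_land": 100.0 * meadows_pastures / total_land,
        "meadows_pct_of_agricultural": 100.0 * meadows_pastures / agricultural,
    }


# Hypothetical areas for two countries A and B
a = land_shares(cropland=10, meadows_pastures=30, total_land=100)
b = land_shares(cropland=45, meadows_pastures=15, total_land=100)
# Meadow share of B relative to A (0.5 means "half as much land")
print(b["meadows_pct_of_land"] / a["meadows_pct_of_land"])
```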

## Yield, Area harvested and production

In [None]:
# Interactive visualization
from ipywidgets import interact

# Line color for each country; Switzerland is drawn with a thicker line
country_colors = {'Austria': 'green', 'France': 'skyblue', 'Switzerland': 'red', 'Germany': 'orange', 'Italy': 'grey'}

# Plot the selected element (summed over all items) for Switzerland and its neighbours over the years
def viz_sum_evolution(element):
    df_viz_sum_evolution = crops_sum.loc[crops_sum['Element'] == element]

    # multiple line plot
    plt.figure(figsize=(20, 10))
    for country, color in country_colors.items():
        plt.plot('Year', 'Sum', data=df_viz_sum_evolution.loc[df_viz_sum_evolution['Area'] == country],
                 marker='', color=color, label=country, linewidth=3 if country == 'Switzerland' else 1.5)

    plt.legend()
    plt.title(f'{element} of all items in Switzerland and its neighbours throughout the years', fontsize=20)
    plt.xlabel("Year", fontsize=20)
    plt.ylabel("Values", fontsize=20)
    plt.show()

elements = crops_sum.Element.unique()
interact(viz_sum_evolution, element=elements)

<div class="alert alert-block alert-info">
    
Switzerland has the lowest production and area harvested (all items combined) throughout the years, but it has consistently had one of the highest yields, and its yield is increasing. 
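Yield is production divided by area harvested, and "increasing" can be quantified by the slope of a least-squares line through the yearly yields. A minimal sketch follows; the conversion assumes the hg/ha unit that FAOSTAT has historically used for yield (1 tonne = 10,000 hg), which should be checked against the `Unit` column of the data.

```python
def yield_hg_per_ha(production_tonnes, area_harvested_ha):
    """Yield in hg/ha (1 tonne = 10,000 hg)."""
    return production_tonnes * 10_000 / area_harvested_ha


def ols_slope(xs, ys):
    """Least-squares slope of ys against xs (e.g. yield change per year)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Hypothetical yearly yields: a positive slope means the yield is increasing
print(ols_slope([2000, 2001, 2002], [1.0, 3.0, 5.0]))  # 2.0
```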

## Map showing the yield of countries

In [None]:
world_geo = 'https://raw.githubusercontent.com/johan/world.geo.json/master/countries.geo.json'
# Quantile bins of the yield (not used at the moment: see the commented-out `bins` argument below)
Bins = list(yield_df.mean_element.quantile([0, 0.25, 0.5, 0.75, 1]))

m = folium.Map(zoom_start=3)

folium.Choropleth(
    geo_data=world_geo,
    name='choropleth',
    data=log_yield_df,
    columns=[log_yield_df.index.get_level_values(level='country_or_area').values,'mean_element'],
    key_on='feature.properties.name',
    fill_color='BuPu',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='yield',
    #bins = Bins,
    reset=True
).add_to(m)

folium.LayerControl().add_to(m)

m

<div class="alert alert-block alert-warning">

TO DO: zoom on Switzerland and its neighbours? 

## Map showing the ratio of area harvested to the surface of each country

In [None]:
world_geo = 'https://raw.githubusercontent.com/johan/world.geo.json/master/countries.geo.json'
# Quantile bins of the ratio (not used at the moment)
Bins = list(crops_countries_area.ratio.quantile([0, 0.25, 0.5, 0.75, 1]))

m = folium.Map(zoom_start=3)

folium.Choropleth(
    geo_data=world_geo,
    name='choropleth',
    data=log_df,
    columns=[crops_countries_area.index.get_level_values(level='country_or_area').values,'ratio'],
    key_on='feature.properties.name',
    fill_color='BuPu',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Area harvested / country surface ratio',
    reset=True
).add_to(m)

folium.LayerControl().add_to(m)

m

<div class="alert alert-block alert-warning">

TO DO: zoom on Switzerland and its neighbours? 

## Demographic Growth

In [None]:
# Plot of the evolution of the population over the years
from scipy.stats import linregress  # not used yet: candidate for a linear extrapolation of the population

country_colors = {'Austria': 'green', 'France': 'skyblue', 'Switzerland': 'red', 'Germany': 'orange', 'Italy': 'grey'}

plt.figure(figsize=(20, 10))
for country, color in country_colors.items():
    plt.plot('Year', 'Population', data=demography.loc[demography['Area'] == country],
             marker='', color=color, label=country, linewidth=3 if country == 'Switzerland' else 1.5)

plt.legend()
plt.title('Evolution of the population over the years', fontsize=20)
plt.xlabel("Year", fontsize=20)
plt.ylabel("Population", fontsize=20)
plt.show()

<div class="alert alert-block alert-warning">
For our predictive model we will need a way to estimate the future population values. Linear regression?
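A linear regression is the simplest option: fit a straight line to the yearly population and evaluate it at a future year. The sketch below implements ordinary least squares by hand on hypothetical values; `scipy.stats.linregress(years, values)` returns the same `slope` and `intercept`. A linear trend is a strong simplification and will not capture changes in growth rate.

```python
def fit_line(years, values):
    """Ordinary least-squares fit values ~ slope * year + intercept."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
             / sum((x - mean_x) ** 2 for x in years))
    intercept = mean_y - slope * mean_x
    return slope, intercept


def extrapolate_linear(years, values, target_year):
    """Predict the value at target_year from the fitted line."""
    slope, intercept = fit_line(years, values)
    return slope * target_year + intercept


# Hypothetical population values (arbitrary units), not taken from the dataset
print(extrapolate_linear([2000, 2001, 2002], [100.0, 110.0, 120.0], 2005))
```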

## B. Farmer population 

<div class="alert alert-block alert-warning">
TO DO: ANALYSIS: Is it increasing or decreasing? Why? (new machines, the job is no longer appealing...)

Answer the question: Is food self-sufficiency realistic for Switzerland? How many farmers would it need? 
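A first back-of-the-envelope answer to "how many farmers would it need?" is to scale the current workforce by the required increase in production, assuming that output per farmer stays constant (no further mechanisation or yield gains). This assumption, and the helper below, are hypothetical; the target production could for instance be the current domestic supply (production + imports - exports).

```python
import math


def farmers_needed(current_farmers, current_production, target_production):
    """Farmers needed to reach target_production at constant output per farmer.

    Strong simplifying assumption: labour productivity does not change.
    """
    output_per_farmer = current_production / current_farmers
    return math.ceil(target_production / output_per_farmer)


# Hypothetical numbers: 100 farmers produce 1000 t, the country needs 1500 t
print(farmers_needed(100, 1000, 1500))  # 150
```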
    

In [None]:
df_employ_basic = pd.read_csv('../data/FAOSTAT_data_12-10-2019_employment.csv')
df_employ_basic.columns = map(str.lower, df_employ_basic.columns)
df_employ_basic.head(2)

In [None]:
df_employ = df_employ_basic.drop(columns=['domain code', 'domain', 'area code', 'indicator code', 'source code',
                                          'year code']).copy()
df_employ.head(2)

In [None]:
df_employ.year.unique()

In [None]:
df_employ.indicator.unique()

In [None]:
df_employ.area.unique()

In [None]:
df_employ.flag.unique() #We only have international reliable sources

In [None]:
df_employ_newind = df_employ.set_index(['area', 'indicator'])
df_employ_newind = df_employ_newind.sort_index()
df_employ_newind.head(2)

In [None]:
# Number of people employed in agriculture, per country and year
country_colors = {'Austria': 'cyan', 'Germany': 'black', 'France': 'blue', 'Switzerland': 'red', 'Italy': 'green'}
fig, ax = plt.subplots(figsize=(15, 10))
for country, color in country_colors.items():
    df_employ_newind.loc[(country, 'Employment in agriculture')].plot(kind='scatter', color=color, x='year', y='value',
                                                                      ax=ax, label=country)
ax.set(title='Employment in Agriculture (1969-2017)',
       ylabel='Number of persons (thousands)',
       xlabel='Years')
ax.yaxis.label.set_size(30)
ax.xaxis.label.set_size(30)
ax.title.set_size(30)
plt.show()

<div class="alert alert-block alert-info">
    
Employment in agriculture is decreasing markedly in France, Germany and Italy, but remains roughly constant in Austria and Switzerland. 
The number of people employed in agriculture in Switzerland is very small, only around 200,000. Why? 
    

In [None]:
# Share of employees working in agriculture, per country and year
indicator = 'Share of employees in agriculture (% of total employees)'
country_colors = {'Austria': 'cyan', 'Germany': 'black', 'France': 'blue', 'Switzerland': 'red', 'Italy': 'green'}
fig, ax = plt.subplots(figsize=(17, 10))
for country, color in country_colors.items():
    df_employ_newind.loc[(country, indicator)].plot(kind='scatter', color=color, x='year', y='value',
                                                    ax=ax, label=country)
ax.set(title='Share of employees in agriculture',
       ylabel='% of total employees',
       xlabel='Years')
ax.yaxis.label.set_size(30)
ax.xaxis.label.set_size(30)
ax.title.set_size(30)
plt.show()

<div class="alert alert-block alert-info">
Agriculture accounts for a very small share of employees in Switzerland, around 1%, which is approximately the same as in its neighbours France, Germany and Austria.
Only Italy has a relatively high share of employees in agriculture, which is consistent with it also having the highest number of employees in this sector. 

In [None]:
# Employment-to-population ratio in rural areas, per country and year
indicator = 'Employment-to-population ratio, rural areas'
country_colors = {'Austria': 'cyan', 'Germany': 'black', 'France': 'blue', 'Switzerland': 'red', 'Italy': 'green'}
fig, ax = plt.subplots(figsize=(17, 10))
for country, color in country_colors.items():
    df_employ_newind.loc[(country, indicator)].plot(kind='scatter', color=color, x='year', y='value',
                                                    ax=ax, label=country)
ax.set(title='Employment-to-population ratio, RURAL AREAS',
       ylabel='share of the employed population in total working-age population',
       xlabel='Years')
ax.yaxis.label.set_size(13)
ax.xaxis.label.set_size(30)
ax.title.set_size(30)
plt.show()

<div class="alert alert-block alert-info">
Switzerland has the highest employment-to-population ratio in rural areas, i.e. the largest share of its rural working-age population is employed. This indicator counts employment in all sectors, not only agriculture, so on its own it does not show that agriculture is a major employer, especially given the low agricultural share of employees observed above.
We also notice that this ratio has been increasing over the past few years, which may indicate that rural employment is becoming more attractive.
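The two indicators plotted above measure different things. As a sketch with made-up numbers (all figures below are hypothetical, not taken from the dataset), the share of employees in agriculture divides agricultural employees by all employees, while the employment-to-population ratio divides all employed people, whatever their sector, by the working-age population:

```python
def share_in_agriculture(agri_employees, total_employees):
    """Share of employees working in agriculture, in % of total employees."""
    return 100 * agri_employees / total_employees


def employment_to_population_ratio(employed, working_age_population):
    """Share of the working-age population that is employed (any sector), in %."""
    return 100 * employed / working_age_population


# Hypothetical rural area: 800 employed out of 1000 working-age people,
# of whom only 40 work in agriculture.
print(share_in_agriculture(40, 800))              # 5.0
print(employment_to_population_ratio(800, 1000))  # 80.0
```

A high rural employment-to-population ratio can therefore coexist with a small agricultural share.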

## C. Temperature: climate impact and correlations with food production


<div class="alert alert-block alert-success">
Started

Question to answer:
How has productivity evolved over the years? Is it growing or decreasing? Which factors are correlated with the trend: temperature rise, fertilizer usage?

Let's look at how temperature changes affect the production of our five main items over the years.
Five main items: potatoes, wheat, maize, sugar beet and apples.
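One simple way to answer "is production growing or decreasing?" is to fit a straight line to production against year and look at the sign of the slope. A minimal sketch with `numpy.polyfit`, where `linear_trend` is a hypothetical helper and the values are made up rather than FAO figures:

```python
import numpy as np


def linear_trend(years, values):
    """Slope of the least-squares line through (years, values),
    in units of `values` per year. Positive = growing, negative = decreasing."""
    slope, _intercept = np.polyfit(years, values, deg=1)
    return slope


# Hypothetical production series (tonnes) over five years
toy_years = np.array([2010, 2011, 2012, 2013, 2014])
toy_production = np.array([100.0, 104.0, 98.0, 110.0, 112.0])
print(round(linear_trend(toy_years, toy_production), 2))  # 3.0 tonnes per year
```

The sign of the slope answers the growing/decreasing question; it says nothing about which factor drives the trend.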

In [None]:
# Plot the yearly production of this item (red, left axis) against the yearly temperature (blue, right axis).
# TODO: make this plot interactive so that any food item can be selected.
years = np.sort(CH_data.Year.unique())
fig, ax1 = plt.subplots()
# Sort by year so that the production values line up with the sorted `years` array
data1 = CH_data.loc[CH_data.Item=='Potatoes'].sort_values('Year')['Produced Quantity']
data2 = CH_temperatures.year

color = 'tab:red'
ax1.set_xlabel('year')
ax1.set_ylabel('production', color=color)
ax1.plot(years, data1, color=color)
ax1.tick_params(axis='y', labelcolor=color)

ax2 = ax1.twinx()  # second y-axis sharing the same x-axis

color = 'tab:blue'
ax2.set_ylabel('temperature', color=color)  # x-label already set on ax1
ax2.plot(years, data2, color=color)
ax2.tick_params(axis='y', labelcolor=color)

ax1.set_title('Potatoes production and temperatures every year')
fig.tight_layout()  # called after the title so neither the title nor the right y-label is clipped
plt.show()

In [None]:
# Plot the yearly production of this item (red, left axis) against the yearly temperature (blue, right axis).
# TODO: make this plot interactive so that any food item can be selected.
years = np.sort(CH_data.Year.unique())
fig, ax1 = plt.subplots()
# Sort by year so that the production values line up with the sorted `years` array
data1 = CH_data.loc[CH_data.Item=='Wheat'].sort_values('Year')['Produced Quantity']
data2 = CH_temperatures.year

color = 'tab:red'
ax1.set_xlabel('year')
ax1.set_ylabel('production', color=color)
ax1.plot(years, data1, color=color)
ax1.tick_params(axis='y', labelcolor=color)

ax2 = ax1.twinx()  # second y-axis sharing the same x-axis

color = 'tab:blue'
ax2.set_ylabel('temperature', color=color)  # x-label already set on ax1
ax2.plot(years, data2, color=color)
ax2.tick_params(axis='y', labelcolor=color)

ax1.set_title('Wheat production and temperatures every year')
fig.tight_layout()  # called after the title so neither the title nor the right y-label is clipped
plt.show()

In [None]:
# Plot the yearly production of this item (red, left axis) against the yearly temperature (blue, right axis).
# TODO: make this plot interactive so that any food item can be selected.
years = np.sort(CH_data.Year.unique())
fig, ax1 = plt.subplots()
# Sort by year so that the production values line up with the sorted `years` array
data1 = CH_data.loc[CH_data.Item=='Sugar beet'].sort_values('Year')['Produced Quantity']
data2 = CH_temperatures.year

color = 'tab:red'
ax1.set_xlabel('year')
ax1.set_ylabel('production', color=color)
ax1.plot(years, data1, color=color)
ax1.tick_params(axis='y', labelcolor=color)

ax2 = ax1.twinx()  # second y-axis sharing the same x-axis

color = 'tab:blue'
ax2.set_ylabel('temperature', color=color)  # x-label already set on ax1
ax2.plot(years, data2, color=color)
ax2.tick_params(axis='y', labelcolor=color)

ax1.set_title('Sugar beet production and temperatures every year')
fig.tight_layout()  # called after the title so neither the title nor the right y-label is clipped
plt.show()

In [None]:
# Plot the yearly production of this item (red, left axis) against the yearly temperature (blue, right axis).
# TODO: make this plot interactive so that any food item can be selected.
years = np.sort(CH_data.Year.unique())
fig, ax1 = plt.subplots()
# Sort by year so that the production values line up with the sorted `years` array
data1 = CH_data.loc[CH_data.Item=='Apples'].sort_values('Year')['Produced Quantity']
data2 = CH_temperatures.year

color = 'tab:red'
ax1.set_xlabel('year')
ax1.set_ylabel('production', color=color)
ax1.plot(years, data1, color=color)
ax1.tick_params(axis='y', labelcolor=color)

ax2 = ax1.twinx()  # second y-axis sharing the same x-axis

color = 'tab:blue'
ax2.set_ylabel('temperature', color=color)  # x-label already set on ax1
ax2.plot(years, data2, color=color)
ax2.tick_params(axis='y', labelcolor=color)

ax1.set_title('Apples production and temperatures every year')
fig.tight_layout()  # called after the title so neither the title nor the right y-label is clipped
plt.show()

In [None]:
# Plot the yearly production of this item (red, left axis) against the yearly temperature (blue, right axis).
# TODO: make this plot interactive so that any food item can be selected.
years = np.sort(CH_data.Year.unique())
fig, ax1 = plt.subplots()
# Sort by year so that the production values line up with the sorted `years` array
data1 = CH_data.loc[CH_data.Item=='Maize'].sort_values('Year')['Produced Quantity']
data2 = CH_temperatures.year

color = 'tab:red'
ax1.set_xlabel('year')
ax1.set_ylabel('production', color=color)
ax1.plot(years, data1, color=color)
ax1.tick_params(axis='y', labelcolor=color)

ax2 = ax1.twinx()  # second y-axis sharing the same x-axis

color = 'tab:blue'
ax2.set_ylabel('temperature', color=color)  # x-label already set on ax1
ax2.plot(years, data2, color=color)
ax2.tick_params(axis='y', labelcolor=color)

ax1.set_title('Maize production and temperatures every year')
fig.tight_layout()  # called after the title so neither the title nor the right y-label is clipped
plt.show()
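The five cells above differ only in the item name, and the comment in each suggests making the item selectable. A possible refactoring is sketched below as a single function; `plot_item_vs_temperature` is a hypothetical helper, the column names `Item`, `Year` and `Produced Quantity` come from `CH_data` as used above, and the temperatures are assumed to be a pandas Series indexed by year (an assumption: the notebook reads them from `CH_temperatures.year`, whose layout is not shown here). Toy data stands in for the real datasets so the block runs on its own.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_item_vs_temperature(data, temperatures, item):
    """Plot yearly production of `item` (left axis) against yearly temperature (right axis).

    data: DataFrame with columns 'Item', 'Year', 'Produced Quantity'
    temperatures: Series of yearly temperatures indexed by year (assumed layout)
    """
    # One value per year for the selected item, in chronological order
    production = (data.loc[data['Item'] == item]
                      .set_index('Year')['Produced Quantity']
                      .sort_index())
    # Keep only the temperatures of the years present in the production series
    temps = temperatures.reindex(production.index)

    fig, ax1 = plt.subplots()
    ax1.plot(production.index, production.values, color='tab:red')
    ax1.set_xlabel('year')
    ax1.set_ylabel('production', color='tab:red')
    ax1.tick_params(axis='y', labelcolor='tab:red')

    ax2 = ax1.twinx()  # second y-axis sharing the x-axis
    ax2.plot(temps.index, temps.values, color='tab:blue')
    ax2.set_ylabel('temperature', color='tab:blue')
    ax2.tick_params(axis='y', labelcolor='tab:blue')

    ax1.set_title(f'{item} production and temperatures every year')
    fig.tight_layout()
    return fig, ax1, ax2


# Toy data (hypothetical values)
toy = pd.DataFrame({'Item': ['Potatoes'] * 3 + ['Wheat'] * 3,
                    'Year': [2000, 2001, 2002] * 2,
                    'Produced Quantity': [500, 480, 470, 300, 310, 305]})
toy_temps = pd.Series([9.1, 9.4, 9.6], index=[2000, 2001, 2002])

for item in ['Potatoes', 'Wheat']:
    fig, ax1, ax2 = plot_item_vs_temperature(toy, toy_temps, item)
    plt.close(fig)  # in the notebook, call plt.show() instead
```

In the notebook, a dropdown could then be built with `ipywidgets.interact`, e.g. by wrapping the call in a lambda that takes only `item`, if ipywidgets is available.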

<div class="alert alert-block alert-info">
From a visual inspection of these plots, potato, maize and apple production tends to decrease as temperature increases.
Wheat production seems less affected by temperature changes.
Sugar beet production tends to increase with temperature, which suggests that this crop benefits from warmer conditions.
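The statements above rely on reading the twin-axis plots by eye. A Pearson correlation coefficient between yearly production and yearly temperature would put a number on each of them. A minimal sketch with `numpy.corrcoef`, where the helper name and the values are hypothetical; even a strong coefficient would not by itself show that temperature causes the change, since other factors evolve over the same years:

```python
import numpy as np


def production_temperature_correlation(production, temperature):
    """Pearson correlation between two yearly series of equal length."""
    return np.corrcoef(production, temperature)[0, 1]


toy_temperature = np.array([8.5, 8.9, 9.2, 9.6, 10.0])           # hypothetical °C
toy_production = np.array([520.0, 505.0, 490.0, 470.0, 455.0])   # hypothetical tonnes
r = production_temperature_correlation(toy_production, toy_temperature)
print(round(r, 3))  # negative: production falls as temperature rises
```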
    

<div class="alert alert-block alert-warning">
For our predictive model, we will need a way to estimate future temperatures. If we base our reasoning on a rise of +1.5°C by 2050, could we use a linear regression?
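One possible reading of the question: assume the yearly mean temperature rises along a straight line, reaching +1.5 °C in 2050 relative to a reference year, and interpolate the years in between. The sketch below does that; `project_temperature` is a hypothetical helper, and the reference year 2020 and baseline 9.0 °C are placeholders, not measured values. The straight-line path is itself an assumption; the regression alternative would be to fit `np.polyfit(past_years, past_temperatures, 1)` on the observed series and extend that line instead.

```python
import numpy as np


def project_temperature(years, baseline_year, baseline_temp, target_year=2050, warming=1.5):
    """Linear path from baseline_temp in baseline_year to baseline_temp + warming in target_year.

    Years after target_year are extrapolated along the same line (assumption)."""
    years = np.asarray(years, dtype=float)
    slope = warming / (target_year - baseline_year)  # °C per year
    return baseline_temp + slope * (years - baseline_year)


# Hypothetical baseline: 9.0 °C in 2020 (placeholder)
projection = project_temperature([2020, 2030, 2050], baseline_year=2020, baseline_temp=9.0)
print(projection)  # approximately [9.0, 9.5, 10.5]
```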

## D. Environment: are fertilizers and pesticides needed? (depends on productivity)

<div class="alert alert-block alert-warning">
TO DO: find a dataset, clean it, plot the correlation between fertilizer use and production, and analyse the result.

In [None]:
fertilizers_dataset = pd.read_csv('../data/FAOSTAT_data_fertilizers.csv')

In [None]:
# Keep only the columns relevant for the analysis
fertilizers_dataset = fertilizers_dataset[['Domain', 'Area', 'Element', 'Item', 'Year', 'Unit', 'Value', 'Flag Description']]

In [None]:
fertilizers_dataset.head()

In [None]:
fertilizers_dataset.Area.unique()  # No data available for Germany and Liechtenstein

In [None]:
#Compute total use of fertilizer by year (=combine all types)
fert_sum = fertilizers_dataset.groupby(['Area','Year'])\
                              .agg({'Value':'sum'})\
                              .rename(columns={'Value':'Sum'})\
                              .reset_index()                            
fert_sum

In [None]:
# Next: add the production for those years.
# Let's try with Switzerland first:
fert_ch = fert_sum.loc[fert_sum['Area']=='Switzerland']

In [None]:
pesticides_dataset = pd.read_csv('../data/FAOSTAT_data_pesticides.csv')

In [None]:
pesticides_dataset = pesticides_dataset[['Domain', 'Area', 'Element', 'Item', 'Year', 'Unit', 'Value', 'Flag Description']]

In [None]:
pesticides_dataset.head()

In [None]:
pest_ch = pesticides_dataset.loc[(pesticides_dataset['Area']=='Switzerland') & (pesticides_dataset['Item']=='Pesticides (total)')]
pest_ch = pest_ch[['Year','Value']]

In [None]:
prod_ch = crops_sum.loc[(crops_sum['Element']=='Production') & (crops_sum['Area']=='Switzerland')]

In [None]:
#pd.concat([prod_ch, fert_ch], sort=False).tail(60)
combo_ch = pd.merge(prod_ch, fert_ch, how='inner', on=['Year'])\
                .rename(columns={'Area_x':'Area', 'Sum_x':'Production', 'Sum_y':'Fertilizers'})\
                .drop(columns=['Area_y','Element'])
combo_ch

In [None]:
combo_ch = pd.merge(combo_ch, pest_ch, how='inner', on=['Year'])\
                .rename(columns={'Value':'Pesticides'})
combo_ch

In [None]:
plt.figure(figsize=(20,10))
plt.plot( 'Year', 'Production', data=combo_ch, marker='', color='green', label = 'Production', linewidth=3)
plt.plot( 'Year', 'Fertilizers', data=combo_ch, marker='', color='blue', label = 'Fertilizers', linewidth=3)
plt.plot( 'Year', 'Pesticides', data=combo_ch, marker='', color='red', label = 'Pesticides', linewidth=3)
plt.legend() 
plt.title('Production and use of fertilizers and pesticides over the years in Switzerland', fontsize=20)
plt.xlabel("Year", fontsize= 20)
plt.ylabel("Values [tonnes]", fontsize= 20)
plt.show()
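If production, fertilizer use and pesticide use differ by orders of magnitude, the smaller series can look flat on a shared tonnes axis even when they vary. One way to compare their trends is to index each series to its first year (first year = 100). A sketch with hypothetical numbers, where `index_to_first_year` is a hypothetical helper; in the notebook the same operation could be applied to the `Production`, `Fertilizers` and `Pesticides` columns of `combo_ch` before plotting:

```python
import pandas as pd


def index_to_first_year(df, columns, year_col='Year'):
    """Rescale each column so that its value in the earliest year equals 100."""
    df = df.sort_values(year_col)
    base = df[columns].iloc[0]            # values of the earliest year
    indexed = df[columns] / base * 100    # division aligned on column names
    indexed.insert(0, year_col, df[year_col])
    return indexed.reset_index(drop=True)


toy_combo = pd.DataFrame({'Year': [2002, 2000, 2001],
                          'Production': [1_100_000, 1_000_000, 1_050_000],
                          'Fertilizers': [44_000, 40_000, 42_000]})
print(index_to_first_year(toy_combo, ['Production', 'Fertilizers']))
```

Both toy series grow by 5% and then 10% relative to 2000, which the shared indexed scale makes visible even though their raw values differ by a factor of 25.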

Same analysis for the other countries:

In [None]:
fert_it = fert_sum.loc[fert_sum['Area']=='Italy']
prod_it = crops_sum.loc[(crops_sum['Element']=='Production') & (crops_sum['Area']=='Italy')]
combo_it = pd.merge(prod_it, fert_it, how='inner', on=['Year'])\
                .rename(columns={'Area_x':'Area', 'Sum_x':'Production', 'Sum_y':'Fertilizers'})\
                .drop(columns=['Area_y','Element'])

plt.figure(figsize=(20,10))
plt.plot('Year', 'Production', data=combo_it, marker='', color='green', label='Production', linewidth=3)
plt.plot('Year', 'Fertilizers', data=combo_it, marker='', color='red', label='Fertilizers', linewidth=3)
plt.legend()
plt.title('Use of fertilizers and production over the years in Italy', fontsize=20)
plt.xlabel("Year", fontsize=20)
plt.ylabel("Values [tonnes]", fontsize=20)
plt.show()

In [None]:
fert_fr = fert_sum.loc[fert_sum['Area']=='France']
prod_fr = crops_sum.loc[(crops_sum['Element']=='Production') & (crops_sum['Area']=='France')]
combo_fr = pd.merge(prod_fr, fert_fr, how='inner', on=['Year'])\
                .rename(columns={'Area_x':'Area', 'Sum_x':'Production', 'Sum_y':'Fertilizers'})\
                .drop(columns=['Area_y','Element'])

plt.figure(figsize=(20,10))
plt.plot('Year', 'Production', data=combo_fr, marker='', color='green', label='Production', linewidth=3)
plt.plot('Year', 'Fertilizers', data=combo_fr, marker='', color='red', label='Fertilizers', linewidth=3)
plt.legend()
plt.title('Use of fertilizers and production over the years in France', fontsize=20)
plt.xlabel("Year", fontsize=20)
plt.ylabel("Values [tonnes]", fontsize=20)
plt.show()

In [None]:
fert_au = fert_sum.loc[fert_sum['Area']=='Austria']
prod_au = crops_sum.loc[(crops_sum['Element']=='Production') & (crops_sum['Area']=='Austria')]
combo_au = pd.merge(prod_au, fert_au, how='inner', on=['Year'])\
                .rename(columns={'Area_x':'Area', 'Sum_x':'Production', 'Sum_y':'Fertilizers'})\
                .drop(columns=['Area_y','Element'])

plt.figure(figsize=(20,10))
plt.plot('Year', 'Production', data=combo_au, marker='', color='green', label='Production', linewidth=3)
plt.plot('Year', 'Fertilizers', data=combo_au, marker='', color='red', label='Fertilizers', linewidth=3)
plt.legend()
plt.title('Use of fertilizers and production over the years in Austria', fontsize=20)
plt.xlabel("Year", fontsize=20)
plt.ylabel("Values [tonnes]", fontsize=20)
plt.show()
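The Italy, France and Austria cells repeat the same filter, merge and rename steps. A possible refactoring is sketched below as a helper (hypothetical name `combine_production_fertilizers`) that filters both tables for one country and renames each `Sum` column before merging, so no `_x`/`_y` columns appear. It assumes `crops_sum` has columns `Area`, `Element`, `Year`, `Sum` and `fert_sum` has `Area`, `Year`, `Sum`, as the rename calls above imply; the toy tables only stand in for the real ones.

```python
import pandas as pd


def combine_production_fertilizers(crops_sum, fert_sum, country):
    """Merge yearly crop production and total fertilizer use for one country."""
    prod = (crops_sum.loc[(crops_sum['Element'] == 'Production') & (crops_sum['Area'] == country),
                          ['Area', 'Year', 'Sum']]
                     .rename(columns={'Sum': 'Production'}))
    fert = (fert_sum.loc[fert_sum['Area'] == country, ['Year', 'Sum']]
                    .rename(columns={'Sum': 'Fertilizers'}))
    # Inner join: keep only the years present in both tables
    return pd.merge(prod, fert, how='inner', on='Year')


# Toy tables (hypothetical values)
toy_crops = pd.DataFrame({'Area': ['Italy', 'Italy', 'Italy', 'France'],
                          'Element': ['Production', 'Production', 'Area harvested', 'Production'],
                          'Year': [2000, 2001, 2000, 2000],
                          'Sum': [900.0, 950.0, 50.0, 1200.0]})
toy_fert = pd.DataFrame({'Area': ['Italy', 'Italy', 'France'],
                         'Year': [2000, 2001, 2000],
                         'Sum': [30.0, 31.0, 45.0]})
combo_toy = combine_production_fertilizers(toy_crops, toy_fert, 'Italy')
print(combo_toy)
```

With such a helper, each country cell reduces to one call followed by the plotting code.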

#### Analysis: 

Inconclusive. For all the countries studied, fertilizer use stays roughly constant over the years (national regulations? are we already at the maximum authorized use?), with no visible correlation with production fluctuations.
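To check the absence of correlation with production fluctuations more formally, one could correlate the year-over-year relative changes of the two series rather than their levels, since shared long-term trends can make levels look related even when the yearly fluctuations are not. A sketch with `pandas.Series.pct_change` and `Series.corr` on hypothetical values, where `change_correlation` is a hypothetical helper; in the notebook it could be applied to `combo_ch['Production']` and `combo_ch['Fertilizers']`. Note that a perfectly constant series has zero variance, so its correlation with anything is undefined.

```python
import pandas as pd


def change_correlation(a, b):
    """Pearson correlation between the year-over-year relative changes of two series.
    The first (undefined) change is dropped automatically by Series.corr."""
    return a.pct_change().corr(b.pct_change())


# Hypothetical series whose yearly changes move together (+10%, -10%, +10%)
toy_production = pd.Series([100.0, 110.0, 99.0, 108.9])
toy_fertilizers = pd.Series([50.0, 55.0, 49.5, 54.45])
print(change_correlation(toy_production, toy_fertilizers))  # close to 1
```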

## E. Compute the predictive model (see README)

<div class="alert alert-block alert-warning">
TO DO: The model should return the percentage of additional land that could be allocated to each food item on top of the land already in use. To compute it, the model will take into account the different food items, the land available for agriculture, the temperatures and the demography. <br> Once we have this percentage, we can add it to the land already used in each country and compute the resulting increase in production for each food item. This will show whether imports can be reduced and let us draw conclusions about the food self-sufficiency of Switzerland. The model will be run for every year until 2030.
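A back-of-the-envelope version of the last step, under strong simplifying assumptions: yield per hectare stays constant, so extra land increases production proportionally; domestic supply (production + imports - exports) stays fixed, so every extra tonne replaces a tonne of imports; and self-sufficiency is measured with the FAO self-sufficiency ratio, production × 100 / (production + imports - exports). The helper name `projected_self_sufficiency` and all numbers are hypothetical and are not outputs of the planned model.

```python
def projected_self_sufficiency(production, imports, exports, extra_land_pct):
    """Return (new production, imports still needed, self-sufficiency ratio in %)."""
    domestic_supply = production + imports - exports            # held fixed (assumption)
    new_production = production * (1 + extra_land_pct / 100)   # constant yield (assumption)
    # Imports needed to keep the same domestic supply; 0 if production now covers it
    imports_needed = max(domestic_supply - new_production + exports, 0.0)
    ssr = 100 * new_production / domestic_supply
    return new_production, imports_needed, ssr


# Hypothetical item: 600 t produced, 500 t imported, 100 t exported, +10% land
print(projected_self_sufficiency(600.0, 500.0, 100.0, 0.0))   # baseline ratio: 60%
print(projected_self_sufficiency(600.0, 500.0, 100.0, 10.0))  # about 660 t, 440 t imported, 66%
```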

# 3. Economic consequences analysis? 

<div class="alert alert-block alert-warning">
Honestly, this is only if we have time...