# ADA Project - Food self-sufficiency: what about Switzerland?

<h1>Table of Contents<span class="tocSkip"></span></h1>

<div class="toc">  
    <ul class="toc-item">
        <li><span><a href="#Libraries-importation" data-toc-modified-id="Libraries-importation-0">Environment set up and libraries </a></span></li>
        <li><span><a href="#World-global-view" data-toc-modified-id="World-global-view-1">World global view</a></span>
            <ul class="toc-item">
                <li><span><a href="#Data-loading" data-toc-modified-id="Data-loading-1.1">Data loading and preprocessing</a></span>  
                </li>
                <li><span><a href="#Insights" data-toc-modified-id="Insights-1.2">Insights</a></span></li>
            </ul>
       </li>
    </ul>
    <ul class="toc-item">
        <li>
            <span><a href="#Switzerland" data-toc-modified-id="Switzerland-2">Switzerland</a></span>
            <ul class="toc-item"><li><span><a href="#Data-loading-and-cleaning" data-toc-modified-id="Data-loading-and-cleaning-2.1">Data loading and cleaning</a></span>
                <ul class="toc-item"><li><span><a href="#Crops-Dataset" data-toc-modified-id="Crops-Dataset-2.1.1">Dataset : crops</a></span></li></ul>
                <ul class="toc-item"><li><span><a href="#Land-Use-Areas-Dataset" data-toc-modified-id="Land-Use-Areas-Dataset-2.1.2">Dataset : land use (area)</a></span></li></ul>
                <ul class="toc-item"><li><span><a href="#Land-Use-Indicators-Dataset" data-toc-modified-id="Land-Use-Indicators-Dataset-2.1.3">Dataset : land use (indicators)</a></span></li></ul>
                <ul class="toc-item"><li><span><a href="#Land-Cover-Dataset" data-toc-modified-id="Land-Cover-Dataset-2.1.4">Dataset : land cover</a></span></li></ul>
                <ul class="toc-item"><li><span><a href="#Demography-Dataset" data-toc-modified-id="Demography-Dataset-2.1.5">Dataset : demography</a></span></li></ul>
                <ul class="toc-item"><li><span><a href="#Swiss-importations-and-exportations-of-agricultural-goods-Dataset" data-toc-modified-id="Swiss-importations-and-exportations-of-agricultural-goods-Dataset-2.1.6">Dataset : importation/exportation (CH)</a></span></li></ul>
                <ul class="toc-item"><li><span><a href="#Italian-importations-and-exportations-of-agricultural-goods-Dataset" data-toc-modified-id="Italian-importations-and-exportations-of-agricultural-goods-Dataset-2.1.7">Dataset : importation/exportation (I)</a></span></li></ul>
                <ul class="toc-item"><li><span><a href="#French-importations-and-exportations-of-agricultural-goods-Dataset" data-toc-modified-id="French-importations-and-exportations-of-agricultural-goods-Dataset-2.1.8">Dataset : importation/exportation (FR)</a></span></li></ul>
                <ul class="toc-item"><li><span><a href="#Austrian-importations-and-exportations-of-agricultural-goods-Dataset" data-toc-modified-id="Austrian-importations-and-exportations-of-agricultural-goods-Dataset-2.1.9">Dataset : importation/exportation (AU)</a></span></li></ul>
                <ul class="toc-item"><li><span><a href="#German-importations-and-exportations-of-agricultural-goods-Dataset" data-toc-modified-id="German-importations-and-exportations-of-agricultural-goods-Dataset-2.1.10">Dataset : importation/exportation (G)</a></span></li></ul>
                <ul class="toc-item"><li><span><a href="#Swiss-temperatures-Dataset" data-toc-modified-id="Swiss-temperatures-Dataset-2.1.11">Dataset : Swiss temperatures</a></span></li></ul>
                <ul class="toc-item"><li><span><a href="#Farmers-population-Dataset" data-toc-modified-id="Farmers-population-Dataset-2.1.12">Dataset : farmers population</a></span></li></ul>
                <ul class="toc-item"><li><span><a href="#Fertilizers-and-Pesticides-Dataset" data-toc-modified-id="Fertilizers-and-Pesticides-Dataset-2.1.13">Dataset : fertilizers and pesticides</a></span></li></ul>
                </li></ul>
            <ul class="toc-item"><li><span><a href="#Investigation-plots" data-toc-modified-id="Investigation-plots-2.2">Investigation plots</a></span></li></ul>
            <ul class="toc-item"><li><span><a href="#Main-results" data-toc-modified-id="Main-results-2.3">Main results</a></span>
                <ul class="toc-item"><li><span><a href="#SSR-score" data-toc-modified-id="SSR-score-2.3.1">SSR score</a></span></li></ul>
            </li></ul>
        </li>
    </ul>
</div>

# Libraries importation

In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import folium

- `conda install -c conda-forge ipywidgets`  --> installs ipywidgets

- `conda install nodejs` --> required to run the following line

- `jupyter labextension install @jupyter-widgets/jupyterlab-manager` --> also enables interactive visualization in JupyterLab

In [None]:
import ipywidgets as widgets
from ipywidgets import interact, interactive, fixed, interact_manual

In [None]:
import plotly.offline as py
py.init_notebook_mode(connected=False)
import plotly.graph_objs as go
import plotly.express as px

from scipy.stats import linregress

In [None]:
import warnings
warnings.filterwarnings("ignore", category=UserWarning) #mutes warnings

In [None]:
from ipywidgets import IntSlider
from ipywidgets.embed import embed_minimal_html

slider = IntSlider(value=40)
embed_minimal_html('export.html', views=[slider], title='Widgets export')

# World global view

<div class="alert alert-block alert-success">
    
## We first investigated the dataset chosen from the proposed list: "Global Food & Agriculture Statistics"

Our initial aim was to link food production to hunger in some areas. Another idea was to find the possible causes of food insufficiency (natural disasters, wars...). <br>

The FAO dataset is the one we downloaded from the course's link. It contains all the FAO data on world crop production. We started our analysis with this file but realized that, given the diversity of the data, we should rather focus our project on a region or a country. Moreover, this data is somewhat out of date. <br>

You will find our investigation of the "Global Food & Agriculture Statistics" dataset right below, as we want to make our reasoning explicit.

## Data loading


<span style='background :gray' > Load Data into a pandas dataframe </span>

In [None]:
complete_dataset = pd.read_csv('../data/fao_data_crops_data.csv')

In [None]:
# We split the data and metadata and store them in the 'crops' and 'flags' dataframes, respectively.
crops = complete_dataset.loc[:2255342].copy() 
flags = complete_dataset.loc[2255344:2255348].copy() 
# 'flags' contains the lookup list of acronyms that describe how a given sample was acquired --> only informative
flags.drop(['element','year','unit','value','value_footnotes','category'], axis=1, inplace = True) 
flags.rename(columns={'country_or_area':'acronym', 'element_code':'description'}, inplace=True) 
flags.set_index('acronym', inplace=True)
flags

<span style='background :gray' > Exploratory data analysis </span>

In [None]:
crops.head()

In [None]:
print("Size of the DataFrame: {s}\n".format(s=crops.shape))
print("Variable types present in DataFrame: \n{t}".format(t=crops.dtypes))

In [None]:
# List all the different footnotes values present in the dataset
footnotes = crops['value_footnotes'].unique() 
print(footnotes)
# Display dataframe that only contains one given value of 'value_footnotes'
display(crops.query('value_footnotes==@footnotes[4]')) 
# Return dataframe that only contains samples having NaN as value for 'value_footnotes'
crops[crops.value_footnotes.isnull()] 

In [None]:
print(crops['element'].unique())
print(crops['year'].unique())
print(crops['unit'].unique())
print(crops['category'].unique())
print(crops['element_code'].unique())
print(crops['country_or_area'].unique())


<span style='background :gray' > Data preprocessing </span>

We clean the data by dropping all the rows that contain only NaN values. 
We also drop the rows where value_footnotes is NR, which means "not reported by country", since they are of no use for our analysis.

In [None]:
# Returns a boolean of whether a column contains NaN (True) or not (False).
print(crops.isnull().values.any(axis=0)) 

# Drop rows which contain only missing values.
crops.dropna(how='all', inplace=True) 

In [None]:
# We drop the samples where 'value' or 'value_footnotes' is missing (NaN): a sample without a value
# is of no use, and the string test below needs non-missing footnotes
crops.dropna(subset=['value', 'value_footnotes'], inplace=True) 

# Also drop all the samples that have 'NR' in 'value_footnotes' or 0 as 'value'
crops.drop(index=crops[crops['value_footnotes'].str.contains('NR')].index, inplace=True)
crops.drop(index=crops[crops['value']==0].index, inplace=True)
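
As an illustration of the cleaning rules described above, here is a minimal sketch on a toy frame. The rows and footnote codes are made up, and unlike the cell above this variant keeps rows whose footnote is missing, by passing `na=False` to `str.contains`. It is one possible way to write the filter, not the one used in the rest of this notebook.

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the 'value' and 'value_footnotes' columns (values are made up).
toy = pd.DataFrame({
    "country_or_area": ["A", "B", "C", "D", "E"],
    "value": [10.0, np.nan, 5.0, 0.0, 7.0],
    "value_footnotes": ["Fc", "A", "NR", "F", np.nan],
})

# na=False makes a missing footnote count as "does not contain 'NR'".
is_nr = toy["value_footnotes"].str.contains("NR", na=False)

# Keep rows that have a non-missing, non-zero value and are not flagged 'NR'.
keep = toy["value"].notna() & (toy["value"] != 0) & ~is_nr
toy_clean = toy[keep]
print(toy_clean)
```

Building one boolean mask and filtering once avoids the repeated in-place `drop` calls, at the cost of being slightly less step-by-step.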


In our dataset, regions are indicated by a "+" at the end of their names. We want to separate regions from countries to facilitate our analysis so we can be more precise. 

In [None]:
regions_bool = crops['country_or_area'].str.contains(r'\+')  # raw string: '\+' matches a literal '+'
crops_regions = crops[regions_bool].copy()
crops_countries = crops[~regions_bool].copy()
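
The region/country split can be sketched on a few made-up names. This version assumes that a literal `+` anywhere in the name marks a region and uses `regex=False`, so the `+` does not need to be escaped.

```python
import pandas as pd

# Made-up names; the '+' suffix marks an aggregated region.
toy = pd.DataFrame(
    {"country_or_area": ["Switzerland", "Europe +", "Italy", "World +"]}
)

# regex=False treats '+' as a plain character instead of a regex quantifier.
is_region = toy["country_or_area"].str.contains("+", regex=False)

regions = toy[is_region]
countries = toy[~is_region]
print(countries)
```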

For every country, we compute the mean of each element (area harvested, seed, yield, ...) over all years, so that countries can be compared with each other. 

In [None]:
# Mean of each element for every country, taken over all years.
crops_countries_by_country_year = crops_countries.groupby(['country_or_area', 'element']) \
                            .agg({'value':'mean'}) \
                            .rename(columns={'value':'mean_element'}) 
crops_countries_by_country_year

In [None]:
area_harvested = crops_countries_by_country_year.loc[(slice(None),'Area Harvested'), :]
#area_harvested.loc['United States of America']
area_harvested.head()
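
The group-then-select pattern used above can be sketched on a toy frame with made-up numbers. Here `.xs` is used instead of `.loc[(slice(None), ...)]`; it selects one value of an index level and drops that level, which is an alternative we show for illustration only.

```python
import pandas as pd

# Made-up records: two countries, two elements.
toy = pd.DataFrame({
    "country_or_area": ["X", "X", "X", "Y", "Y"],
    "element": ["Yield", "Yield", "Area Harvested", "Yield", "Area Harvested"],
    "value": [2.0, 4.0, 100.0, 10.0, 50.0],
})

# Mean of 'value' per (country, element) pair; the result has a two-level index.
mean_by_country = (
    toy.groupby(["country_or_area", "element"])
       .agg(mean_element=("value", "mean"))
)

# Keep only the 'Yield' rows; the 'element' level is removed from the index.
toy_yield = mean_by_country.xs("Yield", level="element")
print(toy_yield)
```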

## Insights

### <span style='background :gray' >Create a map showing yield by country (average over all years) </span>

The following maps give an insight into agricultural yield and area harvested across the world's countries.


In [None]:
yield_df= crops_countries_by_country_year.loc[(slice(None),'Yield'), :]
# We take the log of the values for the following plot, so that the quantiles are more evenly spread
log_yield_df = np.log(yield_df[['mean_element']])
log_yield_df.head()
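
To see why a log transform helps with strongly skewed values, here is a small sketch with made-up yields spanning several orders of magnitude. It uses `np.log10` rather than the natural log of the cell above, only because the numbers are easier to read.

```python
import numpy as np
import pandas as pd

# Made-up, strongly right-skewed yields.
toy_yield = pd.Series([1.0, 10.0, 100.0, 1000.0, 10000.0])
log_yield = np.log10(toy_yield)

quantiles = [0, 0.25, 0.5, 0.75, 1]
raw_bins = toy_yield.quantile(quantiles).tolist()
log_bins = log_yield.quantile(quantiles).tolist()

# Width of each quantile bin on the raw scale and on the log scale.
raw_widths = np.diff(raw_bins)
log_widths = np.diff(log_bins)
print(raw_widths)
print(log_widths)
```

On the raw scale the last bin is far wider than the first one, so most of a linear colour scale would be spent on a few extreme countries. On the log scale the bins have comparable widths.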

In [None]:
m1 = folium.Map(location=[48, -102], zoom_start=3)

world_geo = 'https://raw.githubusercontent.com/johan/world.geo.json/master/countries.geo.json'
Bins = list(yield_df.mean_element.quantile([0, 0.25, 0.5, 0.75, 1]))

m1 = folium.Map(zoom_start=3)

folium.Choropleth(
    geo_data=world_geo,
    name='choropleth',
    data=log_yield_df,
    columns=[log_yield_df.index.get_level_values(level='country_or_area').values,'mean_element'],
    key_on='feature.properties.name',
    fill_color='BuPu',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='yield',
    #bins = Bins,
    reset=True
).add_to(m1)

folium.LayerControl().add_to(m1)

m1

In [None]:
m1.save('../doc/map_yield.html') #map saved as html file in doc folder.

The countries with the highest yield are Iceland and Denmark. Yield is very low in many African countries, and in Mongolia as well. Averaged over all years, the yield of Switzerland is higher than that of its neighbours.

<div class="alert alert-block alert-warning">

TODO: add more of our "world analysis" if the notebook still runs well once all the important steps are in place.
    

# Switzerland

<div class="alert alert-block alert-success">

# Project Update - Insight into Swiss agricultural production 

We will focus on Switzerland compared to its neighbours. We would like to know whether Switzerland could be self-sufficient in terms of food production. 

## Abstract

Following the international food crisis of 2007-08, which triggered strong volatility on world food markets and caused significant economic and social damage, food self-sufficiency policies have gained increased attention in a number of countries. <br>
Since then, several countries have expressed interest in raising their level of food self-sufficiency, which is controversial in a world that is so tightly connected economically.

On 23 September 2018, the debate reached the small country of Switzerland in the form of a popular referendum asking whether or not a food self-sufficiency policy should be adopted. Such a policy could have unexpected consequences for a country like Switzerland, which has many neighbours and so little land. <br>
This paper aims to analyse the questions surrounding the debate over food self-sufficiency in Switzerland. 

- What does Switzerland produce, and in what quantities? 
- How much does it import and export?
- Is all of Switzerland's land optimally harvested?
- How does this relate to population size?
- How is Swiss productivity evolving, and is it correlated with external factors such as temperature, fertilizer use, ...?

Then we will compare Switzerland with its neighbours. Does Switzerland import more than its neighbours (because of its small size)? Is food self-sufficiency realistic for Switzerland? How many farms and farmers would it need?


## **Data loading and cleaning**


<span style='background :gray' > Load Data into pandas dataframes </span>

### Crops Dataset

This dataset is our new starting point. It contains almost the same information as the "Global Food & Agriculture Statistics" dataset we already used, but the data are more recent.  

We found most of the following data on the __[Food And Agriculture Organization of the United Nations Datasets](http://www.fao.org/faostat/en/#data)__ website (we will specify later when a dataset does not come from this link).

The file contains data about Switzerland and its neighbours (Italy, Germany, France, Austria and Liechtenstein).

In [None]:
raw_CH_crops_dataset = pd.read_csv('../data/FAOSTAT_data_crops_CHandNeighbours.csv')

In [None]:
raw_CH_crops_dataset =raw_CH_crops_dataset[['Domain', 'Area', 'Element', 'Item', 'Year', 'Unit', 'Value', 'Flag Description']]
raw_CH_crops_dataset.drop(index=raw_CH_crops_dataset[raw_CH_crops_dataset['Flag Description'].str.contains('Data not available')].index, inplace=True)

In [None]:
raw_CH_crops_dataset.head()

In [None]:
print("Size of the DataFrame: {s}\n".format(s=raw_CH_crops_dataset.shape))
print("Variable types present in DataFrame: \n{t}".format(t=raw_CH_crops_dataset.dtypes))

In [None]:
print(raw_CH_crops_dataset.isna().values.any(axis=0)) 

What about the categories listed in our columns?

In [None]:
print(raw_CH_crops_dataset['Domain'].unique())
print(raw_CH_crops_dataset['Area'].unique())
print(raw_CH_crops_dataset['Element'].unique())
print(raw_CH_crops_dataset['Item'].unique())
print(raw_CH_crops_dataset['Year'].unique())
print(raw_CH_crops_dataset['Unit'].unique())
print(raw_CH_crops_dataset['Flag Description'].unique())

### Land Use Areas Dataset

File contains data about Switzerland and neighbours (Italy, Germany, France, Austria and Liechtenstein).
This file will allow us to assess the agricultural potential of Switzerland. Does the country use all its land or not? 

Data exploration and pre-processing are very similar to those of the first dataset. We will therefore not describe every step in as much detail as before.

In [None]:
dist_land_dataset = pd.read_csv('../data/FAOSTAT_data_NEWLandUse.csv')

In [None]:
dist_land_dataset = dist_land_dataset[['Domain','Area','Element','Item','Year','Value','Flag Description']]

In [None]:
pd.options.mode.chained_assignment = None  # default='warn', Mutes warnings when copying a slice from a DataFrame.
dist_land_dataset['Value'] = dist_land_dataset['Value'].apply(lambda x: x*1000)

In [None]:
dist_land_dataset.head()

### Land Use Indicators Dataset

File contains data about Switzerland and neighbours (Italy, Germany, France, Austria and Liechtenstein).
This file will allow us to assess the agricultural potential of Switzerland. Does the country use all its land or not? 

Data exploration and pre-processing are very similar to those of the first dataset. We will therefore not describe every step in as much detail as before.

In [None]:
raw_land_use_dataset = pd.read_csv('../data/FAOSTAT_data_LandUseIndicators.csv')

In [None]:
raw_land_use_dataset =raw_land_use_dataset[['Domain', 'Area', 'Element', 'Item', 'Year', 'Unit', 'Value', 'Flag Description']]
print(raw_land_use_dataset.isnull().values.any(axis=0))  # --> PERFECT!
raw_land_use_dataset.head()

In [None]:
print("Size of the DataFrame: {s}\n".format(s=raw_land_use_dataset.shape))
print("Variable types present in DataFrame: \n{t}".format(t=raw_land_use_dataset.dtypes))

In [None]:
print(raw_land_use_dataset['Domain'].unique())
print(raw_land_use_dataset['Area'].unique())
print(raw_land_use_dataset['Element'].unique())
print(raw_land_use_dataset['Item'].unique())
print(raw_land_use_dataset['Year'].unique())
print(raw_land_use_dataset['Unit'].unique())
print(raw_land_use_dataset['Flag Description'].unique())

### Land Cover Dataset

File contains data about Switzerland and neighbours (Italy, Germany, France, Austria and Liechtenstein). It is complementary to the previous one, since it **also references urban areas**.

Data exploration and pre-processing are very similar to those of the first dataset. We will therefore not describe every step in as much detail as before.

In [None]:
raw_land_cover_dataset = pd.read_csv('../data/FAOSTAT_data_LandCover.csv')

In [None]:
raw_land_cover_dataset =raw_land_cover_dataset[['Domain', 'Area', 'Element', 'Item', 'Year', 'Unit', 'Value', 'Flag Description']]
raw_land_cover_dataset.drop(index=raw_land_cover_dataset[raw_land_cover_dataset['Flag Description'].str.contains('Data not available')].index, inplace=True)
print(raw_land_cover_dataset.isnull().values.any(axis=0))  # --> PERFECT!

In [None]:
pd.options.mode.chained_assignment = None  # default='warn', Mutes warnings when copying a slice from a DataFrame.
raw_land_cover_dataset["Surface"] = raw_land_cover_dataset.Value.apply(lambda x: x*1000)
raw_land_cover_dataset = raw_land_cover_dataset.drop(columns='Value')
raw_land_cover_dataset['Unit'] = 'ha'
raw_land_cover_dataset.head()
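
The unit conversion above can also be written without `apply`, since multiplying a column by a scalar is vectorized in pandas. The sketch below uses made-up values and assumes, as the cell above suggests, that the raw `Value` column is expressed in thousands of hectares.

```python
import pandas as pd

# Made-up surfaces, assumed to be given in units of 1000 ha.
toy = pd.DataFrame({
    "Item": ["Cropland", "Forest land"],
    "Unit": ["1000 ha", "1000 ha"],
    "Value": [1.5, 12.0],
})

# Convert to hectares, relabel the unit and drop the raw column in one chain.
toy = (
    toy.assign(Surface=toy["Value"] * 1000, Unit="ha")
       .drop(columns="Value")
)
print(toy)
```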

In [None]:
print("Size of the DataFrame: {s}\n".format(s=raw_land_cover_dataset.shape))
print("Variable types present in DataFrame: \n{t}".format(t=raw_land_cover_dataset.dtypes))

In [None]:
print(raw_land_cover_dataset['Domain'].unique())
print(raw_land_cover_dataset['Area'].unique())
print(raw_land_cover_dataset['Element'].unique())
print(raw_land_cover_dataset['Item'].unique())
print(raw_land_cover_dataset['Year'].unique())
print(raw_land_cover_dataset['Unit'].unique())
print(raw_land_cover_dataset['Flag Description'].unique())

### Demography Dataset

The file contains data about Switzerland and its neighbours (Italy, Germany, France and Austria, but not Liechtenstein, which is missing from the dataset).
These data will tell us the number of consumers in Switzerland and let us compare potential food self-sufficiency between Switzerland and its neighbours. We would like to answer questions such as: with a growing population, can Swiss agriculture feed everybody in the next few years?

Data exploration and pre-processing are very similar to those of the first dataset. We will therefore not describe every step in as much detail as before.

In [None]:
demography = pd.read_csv('../data/FAOSTAT_data_demography.csv')

In [None]:
demography = demography[['Area', 'Year','Unit', 'Value']]

In [None]:
pd.options.mode.chained_assignment = None  # default='warn', Mutes warnings when copying a slice from a DataFrame.
demography["Population"] = demography.Value.apply(lambda x: x*1000)
demography=demography.drop(columns='Value')
demography['Unit'] = 'persons'
demography.head()

In [None]:
for col in demography:
    print (demography[col].unique())

In [None]:
data_demography = demography.loc[demography.Area=='Switzerland'].loc[demography.Year>=1986].loc[demography.Year<=2017].Population
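
The same selection (one country, a closed range of years) can be expressed with a single boolean mask. This is a sketch on made-up population figures; `Series.between` is inclusive on both ends by default, which matches the `>= 1986` and `<= 2017` conditions.

```python
import pandas as pd

# Made-up population figures, for illustration only.
toy = pd.DataFrame({
    "Area": ["Switzerland", "Switzerland", "Switzerland", "Italy"],
    "Year": [1985, 1986, 2017, 2000],
    "Population": [6.4e6, 6.5e6, 8.4e6, 57e6],
})

# One mask combining the country filter and the year range.
mask = (toy["Area"] == "Switzerland") & toy["Year"].between(1986, 2017)
swiss_population = toy.loc[mask, "Population"]
print(swiss_population)
```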

### Swiss importations and exportations of agricultural goods Dataset

The files contain data for Switzerland only. They give insight into the trade of agricultural goods, that is, the importations and exportations of a given product.
Data exploration and pre-processing are very similar to those of the first dataset. We will therefore not describe every step in as much detail as before.

In [None]:
CH_imports = pd.read_csv('../data/FAOSTAT_data_11-23-2019.csv')

In [None]:
CH_imports = CH_imports[['Reporter Countries', 'Partner Countries','Element','Item','Year','Unit','Value','Flag Description']]
CH_imports

In [None]:
CH_exports = pd.read_csv('../data/FAOSTAT_data_exports.csv')

In [None]:
CH_exports = CH_exports[['Reporter Countries', 'Partner Countries','Element','Item','Year','Unit','Value','Flag Description']]
CH_exports

In [None]:
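# ignore_index gives every row a unique label, so index-based drops on CH_trade only remove the intended rows
CH_trade = pd.concat([CH_imports, CH_exports], ignore_index=True)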

In [None]:
for col in CH_trade:
    print (CH_trade[col].unique())

To maximize the reliability of later results, we discard the figures that were obtained from an unofficial source.

In [None]:
unofficial_stats_index = CH_trade.loc[CH_trade['Flag Description']=='Unofficial figure'].index
# Drop the unofficial data
CH_trade = CH_trade.drop(index = unofficial_stats_index)

We keep only the importation and exportation values that are expressed in tonnes, so that we can compare them with the agricultural production.

In [None]:
CH_trade = CH_trade.loc[CH_trade.Unit=='tonnes']  # the mask must come from CH_trade itself, not from CH_imports
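
The two filters above (official figures only, tonnes only) can be sketched on toy import and export tables. The rows are made up. The point of the sketch is that the mask is built on the combined frame itself, and that `ignore_index=True` gives the combined frame unique row labels.

```python
import pandas as pd

# Made-up import and export rows.
imports = pd.DataFrame({
    "Element": ["Import Quantity", "Import Quantity"],
    "Flag Description": ["Official data", "Unofficial figure"],
    "Unit": ["tonnes", "tonnes"],
    "Value": [5, 7],
})
exports = pd.DataFrame({
    "Element": ["Export Quantity", "Export Value"],
    "Flag Description": ["Official data", "Official data"],
    "Unit": ["tonnes", "1000 US$"],
    "Value": [1, 2],
})

# Without ignore_index, both tables would contribute rows labelled 0 and 1.
trade = pd.concat([imports, exports], ignore_index=True)

# Keep official figures expressed in tonnes.
mask = (trade["Flag Description"] != "Unofficial figure") & (trade["Unit"] == "tonnes")
trade = trade[mask]
print(trade)
```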

In [None]:
# Copy kept for a later task (trade network)
CH_trade_network=CH_trade.copy()

In [None]:
CH_trade = CH_trade[['Element','Partner Countries', 'Item', 'Year', 'Unit', 'Value']]

In [None]:
CH_trade_network = CH_trade_network[['Element','Reporter Countries','Partner Countries', 'Item', 'Year', 'Unit', 'Value']]

In [None]:
#CH_trade_network = CH_trade_network[['Element','Reporter Countries','Partner Countries', 'Value']]

To keep the model simple, we sum the importations and exportations for a given product over all partner countries.

In [None]:
CH_trade = CH_trade.groupby(['Item', 'Year', 'Element']).agg({'Value':'sum'})\
                                    .reset_index()

In [None]:
CH_trade2 = CH_trade.copy()

In [None]:
CH_trade2.head()

We improve the structure of our dataframe by pivoting its values of importations and exportations.

In [None]:
CH_trade_transformed = pd.pivot(CH_trade,columns = 'Element', values='Value')\
                .rename(columns={'Export Quantity':'Exported Quantity','Import Quantity':'Imported Quantity'})

In [None]:
CH_trade_transformed

In [None]:
CH_trade = pd.concat([CH_trade, CH_trade_transformed], axis=1, join='inner')

In [None]:
CH_trade.drop(columns=['Value', 'Element'], inplace=True)

In [None]:
CH_trade = CH_trade.groupby(['Item', 'Year'])\
                            .agg({'Exported Quantity':'mean','Imported Quantity':'mean'})\
                            .reset_index()
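
The long-to-wide reshaping done in the previous cells (sum per item, year and element, then one column per element) can also be sketched in a single step with `pivot_table`. The numbers below are made up, and this is an alternative we show for illustration, not the code used in the rest of the notebook.

```python
import pandas as pd

# Made-up long-format trade records.
toy = pd.DataFrame({
    "Item": ["Wheat", "Wheat", "Wheat", "Apples"],
    "Year": [2000, 2000, 2000, 2000],
    "Element": ["Import Quantity", "Import Quantity", "Export Quantity", "Import Quantity"],
    "Value": [200.0, 100.0, 20.0, 50.0],
})

# One row per (Item, Year), one column per Element, summing over partner countries.
wide = (
    toy.pivot_table(index=["Item", "Year"], columns="Element",
                    values="Value", aggfunc="sum")
       .rename(columns={"Export Quantity": "Exported Quantity",
                        "Import Quantity": "Imported Quantity"})
       .reset_index()
)
wide.columns.name = None  # drop the leftover 'Element' label on the columns
print(wide)
```

An item with no export record for a given year gets NaN in `Exported Quantity`, so missing trade stays visible instead of being silently turned into zero.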
                                    

Combine production and trade data in one dataframe 'CH_data' so that all the information is in the same place. Note that we have no importation or exportation values before 1986, so the production of goods before 1986 is not considered from here on.

In [None]:
CH_crops = raw_CH_crops_dataset[['Area', 'Item','Element', 'Year', 'Unit', 'Value']]

In [None]:
# Merge importations data with production data
CH_data = CH_crops.loc[CH_crops.Area=='Switzerland'].loc[CH_crops.Element=='Production'].loc[CH_crops.Year>= 1986]\
                                    .merge(CH_trade,on=['Item', 'Year'], how='inner')\
                                    .rename(columns={'Value':'Produced Quantity'})



In [None]:
CH_data.head()

Combined with the land analysis of Switzerland, the consumer trends and the Swiss demography, these data should let us estimate whether the country has an interest in producing more of a given item, and whether it is able to produce enough of it to stop importing it. --> # Milestone 3

In [None]:
total_crops_imports = CH_data.groupby('Year').agg({'Produced Quantity':'sum', 'Exported Quantity':'sum', 'Imported Quantity':'sum'})
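
From yearly totals like these, a self-sufficiency ratio (SSR) can be derived. The sketch below assumes the usual FAO definition, SSR = production × 100 / (production + imports − exports). The numbers are made up, and whether the later "SSR score" section uses exactly this formula is an assumption on our side. Summing tonnes across very different crops is also a crude simplification.

```python
import pandas as pd

# Made-up yearly totals in tonnes.
totals = pd.DataFrame(
    {
        "Produced Quantity": [800.0, 900.0],
        "Imported Quantity": [400.0, 300.0],
        "Exported Quantity": [200.0, 200.0],
    },
    index=pd.Index([2000, 2001], name="Year"),
)

# Domestic supply = production + imports - exports.
supply = (
    totals["Produced Quantity"]
    + totals["Imported Quantity"]
    - totals["Exported Quantity"]
)

# Share of the domestic supply covered by domestic production, in percent.
totals["SSR (%)"] = 100 * totals["Produced Quantity"] / supply
print(totals)
```

An SSR of 100 % or more would mean that production covers the domestic supply. A value below 100 % means the country relies on net imports for the remainder.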

In [None]:
CH_data2 = CH_data.copy().rename(columns={'Produced Quantity':'Country production', 'Imported Quantity':'Importation', 'Exported Quantity':'Exportation'})
CH_data_transformed = pd.melt(CH_data2, value_vars=['Country production', 'Importation'], id_vars=['Area', 'Element','Item','Year','Unit'], var_name='Input', value_name='Value')

In [None]:
CH_restrained = CH_data_transformed.loc[CH_data_transformed.Item.isin(['Apples','Wheat','Potatoes', 'Maize', 'Sugar beet', 'Grapes', 'Barley'])]

In [None]:
CH_data_transformed_exportations = pd.melt(CH_data2, value_vars='Exportation', id_vars=['Area', 'Element','Item','Year','Unit'], var_name='Input', value_name='Value')

In [None]:
CH_restrained_exportations = CH_data_transformed_exportations.loc[CH_data_transformed_exportations.Item.isin(['Apples','Wheat','Potatoes', 'Maize', 'Sugar beet','Grapes','Barley'])]

In [None]:
CH_trade_network = CH_trade_network[['Element','Reporter Countries','Partner Countries', 'Item', 'Year', 'Unit', 'Value']]
CH_trade_network.head()

### Italian importations and exportations of agricultural goods Dataset


In [None]:
Italy_trade = pd.read_csv('../data/FAOSTAT_data_italy.csv')

In [None]:
Italy_trade.dtypes

In [None]:
unofficial_stats_index_it = Italy_trade.loc[Italy_trade.Flag=='*'].index

In [None]:
# Drop the unofficial data
Italy_trade = Italy_trade.drop(index = unofficial_stats_index_it)

In [None]:
#we keep only tonnes units
Italy_trade = Italy_trade.loc[Italy_trade.Unit=='tonnes']

In [None]:
Italy_trade.drop(index=Italy_trade[Italy_trade['Flag Description'].str.contains('Data not available')].index, inplace=True)

In [None]:
Italy_trade = Italy_trade[['Element','Area', 'Item', 'Year', 'Unit', 'Value']]

To keep the model simple, we sum the importations and exportations for a given product over all partner countries.


In [None]:
Italy_trade = Italy_trade.groupby(['Item', 'Year', 'Element']).agg({'Value':'sum'})\
                                    .reset_index()
Italy_trade.head()

We improve the structure of our dataframe by pivoting its values of importations and exportations.

In [None]:
Italy_trade_transformed = pd.pivot(Italy_trade,columns = 'Element', values='Value')\
                .rename(columns={'Export Quantity':'Exported Quantity','Import Quantity':'Imported Quantity'})
Italy_trade_transformed.head()

In [None]:
Italy_trade = pd.concat([Italy_trade, Italy_trade_transformed], axis=1, join='inner')
Italy_trade.drop(columns=['Value', 'Element'], inplace=True)
Italy_trade = Italy_trade.groupby(['Item', 'Year'])\
                            .agg({'Exported Quantity':'mean','Imported Quantity':'mean'})\
                            .reset_index()

In [None]:
Italy_trade.head()

Combine production and trade data in one dataframe 'Italy_data' so that all the information is in the same place. Note that we have no importation or exportation values before 1986, so the production of goods before 1986 is not considered from here on.

In [None]:
Italy_crops = raw_CH_crops_dataset[['Area', 'Item','Element', 'Year', 'Unit', 'Value']]

In [None]:
# Merge importations data with production data
Italy_data = Italy_crops.loc[Italy_crops.Area=='Italy'].loc[Italy_crops.Element=='Production'].loc[Italy_crops.Year>= 1986]\
                                    .merge(Italy_trade,on=['Item', 'Year'], how='inner')\
                                    .rename(columns={'Value':'Produced Quantity'})



In [None]:
Italy_data.head()
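
The same cleaning and merging steps are repeated for each neighbour. As a sketch, they could be factored into one function. `build_country_data` is a hypothetical helper that does not exist elsewhere in this notebook; it assumes the column names used above (`Flag`, `Flag Description`, `Unit`, `Element`, `Item`, `Year`, `Value`, `Area`) and is demonstrated on made-up rows.

```python
import pandas as pd

def build_country_data(trade, crops, country, first_year=1986):
    """Hypothetical helper: clean one country's trade table and merge it with production."""
    # Keep official figures ('*' marks unofficial ones) expressed in tonnes.
    trade = trade[(trade["Flag"] != "*") & (trade["Unit"] == "tonnes")]
    # Discard rows flagged as having no data.
    no_data = trade["Flag Description"].str.contains("Data not available", na=False)
    trade = trade[~no_data]
    # One row per (Item, Year) with imports and exports summed over partners.
    wide = (
        trade.pivot_table(index=["Item", "Year"], columns="Element",
                          values="Value", aggfunc="sum")
             .rename(columns={"Export Quantity": "Exported Quantity",
                              "Import Quantity": "Imported Quantity"})
             .reset_index()
    )
    wide.columns.name = None
    # Production of the chosen country from first_year onwards.
    production = crops[(crops["Area"] == country)
                       & (crops["Element"] == "Production")
                       & (crops["Year"] >= first_year)]
    return (production.merge(wide, on=["Item", "Year"], how="inner")
                      .rename(columns={"Value": "Produced Quantity"}))

# Made-up rows to exercise the helper.
toy_trade = pd.DataFrame({
    "Item": ["Wheat"] * 4,
    "Year": [2000] * 4,
    "Element": ["Import Quantity", "Import Quantity", "Export Quantity", "Import Value"],
    "Unit": ["tonnes", "tonnes", "tonnes", "1000 US$"],
    "Value": [100, 50, 30, 999],
    "Flag": [None, "*", None, None],
    "Flag Description": ["Official data", "Unofficial figure", "Official data", "Official data"],
})
toy_crops = pd.DataFrame({
    "Area": ["Italy", "Italy", "France"],
    "Item": ["Wheat", "Wheat", "Wheat"],
    "Element": ["Production"] * 3,
    "Year": [2000, 1980, 2000],
    "Unit": ["tonnes"] * 3,
    "Value": [500, 400, 900],
})

toy_italy = build_country_data(toy_trade, toy_crops, "Italy")
print(toy_italy)
```

With such a helper, each neighbour would need a single call instead of a dozen nearly identical cells. We keep the explicit cells in this notebook so that each intermediate table can be inspected.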

### French importations and exportations of agricultural goods Dataset


In [None]:
France_trade = pd.read_csv('../data/FAOSTAT_data_france.csv')

In [None]:
France_trade.dtypes

In [None]:
unofficial_stats_index_fr = France_trade.loc[France_trade.Flag=='*'].index

In [None]:
# Drop the unofficial data
France_trade = France_trade.drop(index = unofficial_stats_index_fr)

In [None]:
#we keep only tonnes units
France_trade = France_trade.loc[France_trade.Unit=='tonnes']

In [None]:
France_trade.drop(index=France_trade[France_trade['Flag Description'].str.contains('Data not available')].index, inplace=True)

In [None]:
France_trade = France_trade[['Element','Area', 'Item', 'Year', 'Unit', 'Value']]

To keep the model simple, we sum the importations and exportations for a given product over all partner countries.

In [None]:
France_trade = France_trade.groupby(['Item', 'Year', 'Element']).agg({'Value':'sum'})\
                                    .reset_index()
France_trade.head()

We improve the structure of our dataframe by pivoting its values of importations and exportations.

In [None]:
France_trade_transformed = pd.pivot(France_trade,columns = 'Element', values='Value')\
                .rename(columns={'Export Quantity':'Exported Quantity','Import Quantity':'Imported Quantity'})
France_trade_transformed.head()

In [None]:
France_trade = pd.concat([France_trade, France_trade_transformed], axis=1, join='inner')
France_trade.drop(columns=['Value', 'Element'], inplace=True)
France_trade = France_trade.groupby(['Item', 'Year'])\
                            .agg({'Exported Quantity':'mean','Imported Quantity':'mean'})\
                            .reset_index()
France_trade.head()

Combine production and trade data in one dataframe 'France_data' so that all the information is in the same place. Note that we have no importation or exportation values before 1986, so the production of goods before 1986 is not considered from here on.

In [None]:
France_crops = raw_CH_crops_dataset[['Area', 'Item','Element', 'Year', 'Unit', 'Value']]

In [None]:
# Merge importations data with production data
France_data = France_crops.loc[France_crops.Area=='France'].loc[France_crops.Element=='Production'].loc[France_crops.Year>= 1986]\
                                    .merge(France_trade,on=['Item', 'Year'], how='inner')\
                                    .rename(columns={'Value':'Produced Quantity'})

In [None]:
France_data.head()

### Austrian importations and exportations of agricultural goods Dataset

In [None]:
Austria_trade = pd.read_csv('../data/FAOSTAT_data_austria.csv')

In [None]:
Austria_trade.dtypes

In [None]:
unofficial_stats_index_au = Austria_trade.loc[Austria_trade.Flag=='*'].index

In [None]:
# Drop the unofficial data
Austria_trade = Austria_trade.drop(index = unofficial_stats_index_au)

In [None]:
#we keep only tonnes units
Austria_trade = Austria_trade.loc[Austria_trade.Unit=='tonnes']

In [None]:
Austria_trade.drop(index=Austria_trade[Austria_trade['Flag Description'].str.contains('Data not available')].index, inplace=True)

In [None]:
Austria_trade = Austria_trade[['Element','Area', 'Item', 'Year', 'Unit', 'Value']]

To keep the model simple, we sum the importations and exportations for a given product over all partner countries.

In [None]:
Austria_trade = Austria_trade.groupby(['Item', 'Year', 'Element']).agg({'Value':'sum'})\
                                    .reset_index()
Austria_trade.head()

We improve the structure of our dataframe by pivoting its values of importations and exportations.

In [None]:
Austria_trade_transformed = pd.pivot(Austria_trade,columns = 'Element', values='Value')\
                .rename(columns={'Export Quantity':'Exported Quantity','Import Quantity':'Imported Quantity'})
Austria_trade_transformed.head()

In [None]:
Austria_trade = pd.concat([Austria_trade, Austria_trade_transformed], axis=1, join='inner')
Austria_trade.drop(columns=['Value', 'Element'], inplace=True)
Austria_trade = Austria_trade.groupby(['Item', 'Year'])\
                            .agg({'Exported Quantity':'mean','Imported Quantity':'mean'})\
                            .reset_index()
Austria_trade.head()
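
The pivot, concat and groupby steps above can probably be condensed into a single `pivot_table` call. The following is only a minimal sketch on made-up numbers: the `trade` and `trade_wide` names are hypothetical, and it assumes the table has already been summed over partner countries, so each (Item, Year, Element) combination appears once.

```python
import pandas as pd

# Toy stand-in for the aggregated trade table (values are made up for illustration).
trade = pd.DataFrame({
    "Item": ["Apples", "Apples", "Wheat", "Wheat"],
    "Year": [2016, 2016, 2016, 2016],
    "Element": ["Export Quantity", "Import Quantity",
                "Export Quantity", "Import Quantity"],
    "Value": [10.0, 40.0, 5.0, 100.0],
})

# One row per (Item, Year), one column per trade direction.
trade_wide = (
    trade.pivot_table(index=["Item", "Year"], columns="Element",
                      values="Value", aggfunc="sum")
         .rename(columns={"Export Quantity": "Exported Quantity",
                          "Import Quantity": "Imported Quantity"})
         .reset_index()
)
trade_wide.columns.name = None  # drop the leftover 'Element' axis label
print(trade_wide)
```

If one trade direction is missing for an item and year, the corresponding cell stays `NaN`, as with the original approach.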

We combine production and trade data in a single dataframe, 'Austria_data', so that all the information is in one place. Note that import and export values are not available before 1986, so production before 1986 is not considered from here on.

In [None]:
Austria_crops = raw_CH_crops_dataset[['Area', 'Item','Element', 'Year', 'Unit', 'Value']]

In [None]:
# Merge importations data with production data (single combined boolean mask)
Austria_mask = (Austria_crops.Area=='Austria') & (Austria_crops.Element=='Production') & (Austria_crops.Year>=1986)
Austria_data = Austria_crops.loc[Austria_mask]\
                                    .merge(Austria_trade,on=['Item', 'Year'], how='inner')\
                                    .rename(columns={'Value':'Produced Quantity'})

In [None]:
Austria_data.head()

### German importations and exportations of agricultural goods Dataset

In [None]:
Germany_trade = pd.read_csv('../data/FAOSTAT_data_germany.csv')

In [None]:
Germany_trade.dtypes

In [None]:
unofficial_stats_index_ge = Germany_trade.loc[Germany_trade.Flag=='*'].index

In [None]:
# Drop the unofficial data
Germany_trade = Germany_trade.drop(index = unofficial_stats_index_ge)

In [None]:
#we keep only tonnes units
Germany_trade = Germany_trade.loc[Germany_trade.Unit=='tonnes']

In [None]:
Germany_trade.drop(index=Germany_trade[Germany_trade['Flag Description'].str.contains('Data not available')].index, inplace=True)

In [None]:
Germany_trade = Germany_trade[['Element','Area', 'Item', 'Year', 'Unit', 'Value']]
Germany_trade.head()

To keep the model simple, we sum the importations and exportations for a given product over all partner countries.

In [None]:
Germany_trade = Germany_trade.groupby(['Item', 'Year', 'Element']).agg({'Value':'sum'})\
                                    .reset_index()
Germany_trade.head()

We improve the structure of our dataframe by pivoting its values of importations and exportations.

In [None]:
Germany_trade_transformed = pd.pivot(Germany_trade,columns = 'Element', values='Value')\
                .rename(columns={'Export Quantity':'Exported Quantity','Import Quantity':'Imported Quantity'})
Germany_trade_transformed.head()

In [None]:
Germany_trade = pd.concat([Germany_trade, Germany_trade_transformed], axis=1, join='inner')
Germany_trade.drop(columns=['Value', 'Element'], inplace=True)
Germany_trade = Germany_trade.groupby(['Item', 'Year'])\
                            .agg({'Exported Quantity':'mean','Imported Quantity':'mean'})\
                            .reset_index()
Germany_trade.head()

We combine production and trade data in a single dataframe, 'Germany_data', so that all the information is in one place. Note that import and export values are not available before 1986, so production before 1986 is not considered from here on.

In [None]:
Germany_crops = raw_CH_crops_dataset[['Area', 'Item','Element', 'Year', 'Unit', 'Value']]

In [None]:
# Merge importations data with production data (single combined boolean mask)
Germany_mask = (Germany_crops.Area=='Germany') & (Germany_crops.Element=='Production') & (Germany_crops.Year>=1986)
Germany_data = Germany_crops.loc[Germany_mask]\
                                    .merge(Germany_trade,on=['Item', 'Year'], how='inner')\
                                    .rename(columns={'Value':'Produced Quantity'})

In [None]:
Germany_data.head()

### Swiss temperatures Dataset

This dataset does not come from FAOSTAT but from __[MeteoSwiss](https://www.meteoswiss.admin.ch/home/climate/swiss-climate-in-detail/Swiss-temperature-mean/Data-on-the-Swiss-temperature-mean.html)__.

In [None]:
CH_temperatures = pd.read_csv('../data/10.18751-Climate-Timeseries-CHTM-1.1-swiss.txt', sep="\t", header=0, skiprows=15)

In [None]:
CH_temperatures = CH_temperatures.loc[CH_temperatures.time>=1986].loc[CH_temperatures.time<=2017]

In [None]:
CH_temperatures.head()

In [None]:
CH_temperatures = CH_temperatures[["time","winter","summer"]]

In [None]:
CH_temperatures = CH_temperatures.rename(columns={"time":"Year"})
CH_temperatures.head()

### Farmers population Dataset

<div class="alert alert-block alert-warning">
TO DO: ANALYSIS: Is the farming population increasing or decreasing? Why? (new machines, job no longer appealing, ...)

Answer the question: Is food self-sufficiency realistic for Switzerland? How many farmers would it need?
    

In [None]:
df_employ_basic = pd.read_csv('../data/FAOSTAT_data_12-10-2019_employment.csv')

In [None]:
df_employ_basic.columns = map(str.lower, df_employ_basic.columns)
df_employ = df_employ_basic.drop(columns=['domain code','domain','area code','indicator code','source code',
                              'year code']).copy()
df_employ.head()

In [None]:
df_employ.indicator.unique()

### Fertilizers and Pesticides Dataset

In [None]:
fertilizers_dataset = pd.read_csv('../data/FAOSTAT_data_fertilizers.csv')

In [None]:
fertilizers_dataset =fertilizers_dataset[['Domain', 'Area', 'Element', 'Item', 'Year', 'Unit', 'Value', 'Flag Description']]

In [None]:
fertilizers_dataset.head()

In [None]:
fertilizers_dataset.Area.unique() #No data available for Germany and Liechtenstein

In [None]:
#Compute total use of fertilizer by year (=combine all types)
fert_sum = fertilizers_dataset.groupby(['Area','Year'])\
                              .agg({'Value':'sum'})\
                              .rename(columns={'Value':'Sum'})\
                              .reset_index()                            
fert_sum.head()

In [None]:
pesticides_dataset = pd.read_csv('../data/FAOSTAT_data_pesticides.csv')

In [None]:
pesticides_dataset = pesticides_dataset[['Domain', 'Area', 'Element', 'Item', 'Year', 'Unit', 'Value', 'Flag Description']]

In [None]:
pesticides_dataset.head()

## **Investigation plots**


<span style='background :gray' > These are very general plots. </span>

We use them to get a quick look at our data.

### <span style='background :gray' > - Production of all countries over time for a selected crop - </span>

**Dataset :** Crops

**Data :** Production of particular item over years by countries.

**Notes :** This plot is interactive. It allows you to select an item (apples, berries...) and shows its production over years for Switzerland and its neighbours (as listed above).


In [None]:
#Interactive visualization

#Plot the production of selected item for all countries over years
def viz_evolution(item):
    df_viz_evolution = raw_CH_crops_dataset.loc[raw_CH_crops_dataset['Element']=='Production'].loc[raw_CH_crops_dataset['Item']==item]
    
    # multiple line plot
    plt.figure(figsize=(20,10))
    plt.plot( 'Year', 'Value', data=df_viz_evolution.loc[df_viz_evolution['Area']=='Austria'], marker='', color='green',  label = 'Austria')
    plt.plot( 'Year', 'Value', data=df_viz_evolution.loc[df_viz_evolution['Area']=='France'], marker='', color='skyblue', label = 'France')
    plt.plot( 'Year', 'Value', data=df_viz_evolution.loc[df_viz_evolution['Area']=='Switzerland'], marker='', color='red', label = 'Switzerland', linewidth=3)
    plt.plot( 'Year', 'Value', data=df_viz_evolution.loc[df_viz_evolution['Area']=='Germany'], marker='', color='orange', label = 'Germany')
    plt.plot( 'Year', 'Value', data=df_viz_evolution.loc[df_viz_evolution['Area']=='Italy'], marker='', color='grey', label = 'Italy')
    
    plt.legend() 
    plt.title(f'Production of {item} in Switzerland and its neighbours throughout years', fontsize= 20)
    plt.xlabel("Year", fontsize= 20)
    plt.ylabel("Values", fontsize= 20)
    plt.show()
   
items = raw_CH_crops_dataset.Item.unique()
interact(viz_evolution, item = items)    
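
The same five `plt.plot` calls are repeated in several of the plots in this section. A possible way to factor them out is sketched below. `COUNTRY_COLORS` and `plot_by_country` are hypothetical helpers that are not part of the notebook, and the toy frame only mimics the `Area`/`Year`/`Value` layout of the crops table.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Colours mirror the ones used in the cell above; the dict itself is an illustrative helper.
COUNTRY_COLORS = {
    "Austria": "green",
    "France": "skyblue",
    "Switzerland": "red",
    "Germany": "orange",
    "Italy": "grey",
}

def plot_by_country(df, value_col, title):
    """Draw one line per country from a long-format frame with 'Area' and 'Year' columns."""
    fig, ax = plt.subplots(figsize=(20, 10))
    for country, color in COUNTRY_COLORS.items():
        subset = df.loc[df["Area"] == country].sort_values("Year")
        # Switzerland is emphasised with a thicker line, as in the plots of this section.
        width = 3 if country == "Switzerland" else 1.5
        ax.plot(subset["Year"], subset[value_col], color=color,
                label=country, linewidth=width)
    ax.legend()
    ax.set_title(title, fontsize=20)
    ax.set_xlabel("Year", fontsize=20)
    ax.set_ylabel(value_col, fontsize=20)
    return ax

# Toy data (made-up numbers) standing in for a filtered crops table.
toy = pd.DataFrame({
    "Area": ["Switzerland", "Switzerland", "France", "France"],
    "Year": [2015, 2016, 2015, 2016],
    "Value": [1.0, 1.2, 10.0, 11.0],
})
ax = plot_by_country(toy, "Value", "Toy production example")
```

In a notebook the figure is rendered at the end of the cell. The helper could then be called from inside an `interact` callback with the filtered dataframe.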

<div class="alert alert-block alert-success">
    For most items, Switzerland has the lowest production values. This can be explained by the small size of the country. To better understand these values and to know whether they are sufficient to feed the Swiss population, we will analyse how Swiss land is used and occupied, and look at Swiss demography.
    We will also analyse Swiss importations and exportations to find out what Switzerland needs, and try to estimate whether the country could produce it by itself.

### <span style='background :gray' > - Plot production/area_harvested for all items of all countries over time - </span>

**Dataset :** Crops

**Data :** Ratio of production over area harvested for all items over years by countries.

**Notes :** This plot is interactive. It allows you to select an element (production, area harvested or yield) and shows the sum over all items for each country over years (Switzerland and its neighbours).


In [None]:
# Sum of area/yield/production of items by country and year
crops_sum = raw_CH_crops_dataset.groupby(['Area', 'Element','Year']) \
                                .agg({'Value':'sum'}) \
                                .rename(columns={'Value':'Sum'}) \
                                .reset_index()

In [None]:
#Interactive visualization

#Plot the selected element (sum of all items) for all countries over years
def viz_sum_evolution(element):
    df_viz_sum_evolution = crops_sum.loc[crops_sum['Element']== element]
    
    # multiple line plot
    plt.figure(figsize=(20,10))
    plt.plot( 'Year', 'Sum', data=df_viz_sum_evolution.loc[df_viz_sum_evolution['Area']=='Austria'], marker='', color='green',  label = 'Austria')
    plt.plot( 'Year', 'Sum', data=df_viz_sum_evolution.loc[df_viz_sum_evolution['Area']=='France'], marker='', color='skyblue', label = 'France')
    plt.plot( 'Year', 'Sum', data=df_viz_sum_evolution.loc[df_viz_sum_evolution['Area']=='Switzerland'], marker='', color='red', label = 'Switzerland', linewidth=3)
    plt.plot( 'Year', 'Sum', data=df_viz_sum_evolution.loc[df_viz_sum_evolution['Area']=='Germany'], marker='', color='orange', label = 'Germany')
    plt.plot( 'Year', 'Sum', data=df_viz_sum_evolution.loc[df_viz_sum_evolution['Area']=='Italy'], marker='', color='grey', label = 'Italy')
    
    plt.legend() 
    plt.title(f'{element} of all items in Switzerland and its neighbours throughout years', fontsize= 20)
    plt.xlabel("Year", fontsize= 20)
    plt.ylabel("Values", fontsize= 20)
    plt.show()
   
elements = crops_sum.Element.unique()
interact(viz_sum_evolution, element = elements)  
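
Summing the `Yield` element over items adds up per-crop ratios, so the summed yield curve is only a rough indicator. One possible alternative, sketched here on made-up numbers, is an aggregate yield computed as total production divided by total area harvested. The `crops` and `totals` names and the `Aggregate yield` column are hypothetical, and the sketch assumes that production and area are expressed in compatible units (for instance tonnes and hectares).

```python
import pandas as pd

# Toy long-format data (made-up numbers) mimicking the crops table layout.
crops = pd.DataFrame({
    "Area": ["Switzerland"] * 4,
    "Year": [2016] * 4,
    "Item": ["Wheat", "Wheat", "Potatoes", "Potatoes"],
    "Element": ["Production", "Area harvested", "Production", "Area harvested"],
    "Value": [500.0, 100.0, 400.0, 10.0],
})

# Total production and total harvested area per country and year.
totals = (
    crops[crops["Element"].isin(["Production", "Area harvested"])]
    .pivot_table(index=["Area", "Year"], columns="Element",
                 values="Value", aggfunc="sum")
)

# Aggregate yield: total output divided by total area.
totals["Aggregate yield"] = totals["Production"] / totals["Area harvested"]
print(totals)
```

This weights each crop by the area it occupies, which a plain sum of yields does not do. Note that total tonnage still mixes very different crops, so the figure remains a coarse summary.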

<div class="alert alert-block alert-success">
    
   Switzerland has the lowest production and area harvested of all items throughout the years, but it always has one of the highest yields, and this yield is increasing.
   

### <span style='background :gray' > - Switzerland land area distribution, year 2017 - </span>

**Dataset :** Land Use Area

**Data :** Area for different categories

**Notes :** This plot is interactive. Shows values when cursor passes over the graph.


In [None]:
dist_land_CH_2017= dist_land_dataset.loc[dist_land_dataset['Area']=='Switzerland'].loc[dist_land_dataset['Year']==2017].copy() # copy so that a column can be added safely

In [None]:
parent = [np.NaN, 'Country area', 'Land area', 'Agriculture', 'Agricultural land', 'Cropland', 'Arable land', 'Arable land', 'Arable land', 'Cropland', 'Agricultural land',
    'Land under perm. meadows and pastures', 'Land under perm. meadows and pastures', 'Agriculture','Land area', 'Forestry', 'Forest land', 'Forest land', 'Forest land',
    'Land area', 'Country area', np.NaN, np.NaN,np.NaN,np.NaN,np.NaN,np.NaN]
dist_land_CH_2017['Parent'] = parent
dist_land_CH_2017

In [None]:
fig =go.Figure(go.Sunburst(
    labels=['Country area','Land area', 'Agriculture', 'Agricultural land', 'Cropland', 'Arable land', 'Land under temporary crops',
       'Land under temp. meadows and pastures', 'Land with temporary fallow', 'Land under permanent crops',
       'Land under perm. meadows and pastures', 'Perm. meadows & pastures - Cultivated',
       'Perm. meadows & pastures - Nat. growing', 'Land under protective cover', 'Forestry', 'Forest land',
       'Primary Forest', 'Other naturally regenerated forest', 'Planted Forest', 'Other land','Inland waters'],
    parents=['','Country area', 'Land area', 'Agriculture', 'Agricultural land', 'Cropland', 'Arable land',
       'Arable land', 'Arable land', 'Cropland', 'Agricultural land', 'Land under perm. meadows and pastures',
       'Land under perm. meadows and pastures', 'Agriculture', 'Land area', 'Forestry',
       'Forest land', 'Forest land', 'Forest land', 'Land area','Country area'],
    values=[4.1290390e+06, 3.9516030e+06, 1.5398066e+06, 1.5129990e+06,
       4.2308850e+05, 3.9818400e+05, 2.7100630e+05, 1.2422920e+05,
       2.9484000e+03, 2.4904500e+04, 1.0899105e+06, 6.1734310e+05,
       4.7256740e+05, 2.6807600e+04, 1.2540000e+06, 1.2540000e+06, 4.0000000e+04,
       1.0420000e+06, 1.7200000e+05, 1.1577964e+06, 1.7743600e+05],
))
# Update layout for tight margin
fig.update_layout(margin = dict(t=0, l=0, r=0, b=0))

fig.show()

<div class="alert alert-block alert-success">

**Country area =** Land area + Inland waters
    
**Land area =** Forestry + Agriculture + Other lands

**Forestry =** Forest land
    
**Agriculture =** Land under protective cover + Agricultural land
    
**Forest land =** Primary forest + Planted forest + Other naturally regenerated forest
    
**Agricultural land =** Cropland + Land under perm. meadows and pastures
    
**Cropland =** Arable land + Land under permanent crops
    
**Land under perm. meadows and pastures =** Perm. meadows and pastures Cultivated + Perm. meadows and pastures Nat. growing
    
**Arable land =** Land under temp. meadows and pastures + Land under temporary crops + Land under temp. fallow
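
The relations above can be checked against the values passed to the sunburst chart. The sketch below copies those values by hand (same units as the dataset); the `area`, `hierarchy` and `gaps` names are hypothetical and only a subset of the relations is tested.

```python
# Areas copied from the `values` list of the sunburst cell above.
area = {
    "Country area": 4129039.0,
    "Land area": 3951603.0,
    "Inland waters": 177436.0,
    "Agriculture": 1539806.6,
    "Forestry": 1254000.0,
    "Other land": 1157796.4,
    "Agricultural land": 1512999.0,
    "Land under protective cover": 26807.6,
    "Cropland": 423088.5,
    "Land under perm. meadows and pastures": 1089910.5,
    "Arable land": 398184.0,
    "Land under permanent crops": 24904.5,
}

# Parent category -> categories that should add up to it.
hierarchy = {
    "Country area": ["Land area", "Inland waters"],
    "Land area": ["Forestry", "Agriculture", "Other land"],
    "Agriculture": ["Land under protective cover", "Agricultural land"],
    "Agricultural land": ["Cropland", "Land under perm. meadows and pastures"],
    "Cropland": ["Arable land", "Land under permanent crops"],
}

gaps = {}
for parent, children in hierarchy.items():
    total = sum(area[child] for child in children)
    # Relative difference between the parent value and the sum of its children.
    gaps[parent] = abs(total - area[parent]) / area[parent]
    print(f"{parent}: children sum to {total:.1f}, relative gap {gaps[parent]:.1e}")
```

A relative gap close to zero for every parent indicates that the listed decomposition is consistent with the 2017 figures.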
    

### <span style='background :gray' > - Switzerland land area and agricultural land distribution, year 2017 - </span>

**Dataset :** Land Use Area

**Data :** Area for different categories

**Notes :** None.


In [None]:
fig, (ax1,ax2) = plt.subplots(1,2,figsize=(30,10))

size = 0.25

cmap1 = plt.cm.Reds
cmap2 = plt.cm.Greens
cmap3 = plt.cm.Purples
cmap4 = plt.cm.Oranges

outer_colors = [cmap1(.8), cmap2(.8), cmap3(.8)]
inner_colors = [*cmap1(np.linspace(.6, .1, 3)), *cmap2(np.linspace(.6, .2, 2)), *cmap3(np.linspace(.6, .2, 2))]
labels_1o = ['Forest land','Agriculture','Other land']
labels_1 = ['PF','NRF','PF','PC','AL','','OL','','']
labels_1i = ['Primary Forest','Other naturally regenerated forest','Planted Forest','Land under protective cover','Agricultural land','','Other land','','']
vals1 = np.array([[40000,1042000,172000], [26807.6,1512999,0], [1157796.4,0,0]])

ax1.axis('equal')
ax1.pie(vals1.sum(axis=1), radius=1, colors=outer_colors,labels=labels_1o, labeldistance=0.8, wedgeprops=dict(width=size, edgecolor='w'))
ax1.pie(vals1.flatten(), radius=1-size, colors=inner_colors, labels=labels_1, labeldistance=0.6, wedgeprops=dict(width=size, edgecolor='w'))
plt.margins(0,0)
ax1.set(aspect="equal", title='Land area distribution')


labels_2o = ['Cropland','Land under perm. meadows & pastures']
labels_2 = ['AL','Pc','C','Ng']
labels_2i = ['Arable land','Land under perm. crops','Cultivated','Naturally growing']
vals2 = np.array([[398184,24904.5], [617343.1,472567.4]])
              
ax2.axis('equal')
ax2.pie(vals2.sum(axis=1), radius=1, colors=[cmap2(.8), cmap4(.8)], labels=labels_2o, labeldistance=0.8, wedgeprops=dict(width=size, edgecolor='w'))
ax2.pie(vals2.flatten(), radius=1-size, colors=[*cmap2(np.linspace(.6, .1, 2)), *cmap4(np.linspace(.6, .2, 2))], labels=labels_2, labeldistance=0.6, wedgeprops=dict(width=size, edgecolor='w'))
plt.margins(0,0)
ax2.set(aspect="equal", title='Agricultural land distribution') 

fig.set_facecolor('white') #background color
plt.legend(loc=(0.9, 0.1))

handles, labels = ax1.get_legend_handles_labels()
ax1.legend(handles[3:], labels_1i, loc=(0.9, 0.1))
handles, labels = ax2.get_legend_handles_labels()
ax2.legend(handles[2:], labels_2i, loc=(0.9, 0.1))

plt.show()


<div class="alert alert-block alert-success">

Large parts of the land in Switzerland are unsuitable for agricultural use.    
However, if Switzerland wanted to increase its area harvested, it could use all the fields dedicated to animal husbandry (in a future where we would all be vegetarian), or at least make better use of arable land and/or land under permanent meadows and pastures (which could be reduced a bit without much consequence).
    
    Should we estimate the rise in production if those lands were used for food production instead of animal husbandry?

### <span style='background :gray' > - Switzerland land area and agricultural land distribution, year 2016 - </span> 
#### <span style='background :red' > **Maybe to delete** because the figures are mediocre</span>  


**Dataset :** Land Use Indicators

**Data :** Land use and agricultural land use indicators

**Notes :** None.


In [None]:
import matplotlib.pyplot as plt

# DataFrames to plot
df_land = raw_land_use_dataset.loc[raw_land_use_dataset['Area']=='Switzerland'].loc[raw_land_use_dataset['Year']==2016].loc[raw_land_use_dataset['Element']=='Share in Land area']
df_agri = raw_land_use_dataset.loc[raw_land_use_dataset['Area']=='Switzerland'].loc[raw_land_use_dataset['Year']==2016].loc[raw_land_use_dataset['Element']=='Share in Agricultural land']

# Pie plot #1
labels1 = df_land.Item
sizes1 = df_land.Value
explode = (0, 0, 0.1, 0)  # only "explode" the 3rd slice

fig1, ax1 = plt.subplots()
ax1.pie(sizes1, explode=explode, labels=labels1, autopct='%1.1f%%',
        shadow=True, startangle=90)
#ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
ax1.title.set_text('Distribution of lands in Switzerland, year 2016')
fig1.set_facecolor('white')

# Pie plot #2 (separate figure handle, so fig1 keeps pointing to the first pie)
labels2 = df_agri.Item
sizes2 = df_agri.Value
fig2, ax2 = plt.subplots()
ax2.pie(sizes2, labels=labels2, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax2.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
ax2.title.set_text('Distribution of agricultural lands in Switzerland, year 2016')

# General settings
fig2.set_facecolor('white')
plt.show()
df_land

<div class="alert alert-block alert-info">

From the first graphs (distribution of lands), we can see that only 45.2% of Swiss land is used in agriculture, compared to France, Italy or Germany, where around 64% of the land is exploited in agriculture. Land exploited in agriculture is the sum of cropland and agricultural land. The percentage of forest is quite similar across these countries; the main difference lies in the percentage of land allocated to meadows and pastures. For example, France devotes about half as large a share as Switzerland to meadows and pastures, Germany less than half, and Italy only about one third. We can deduce from these plots that Switzerland is more dedicated to dairy products and livestock breeding. 
    When comparing Switzerland with Liechtenstein, we find more similarities, as the percentage of land used in agriculture there is 42.5%.
    From the second graphs (distribution of agricultural lands), we can see that the majority of Swiss agricultural land is under permanent meadows and pastures. This is a huge share compared to the other countries, which favour cropland and arable land. This supports our earlier observation that Switzerland is more dedicated to dairy products and livestock breeding. We could hypothesize that Switzerland may have to reduce the share of land dedicated to meadows and pastures in order to become food self-sufficient. This would also require labour and policy transitions and would impact the Swiss economy. 
    However, an important aspect that is not shown by this data is the share of urban land. We should add it to our analysis. 
    
P.S.: Arable land is land that is or can be cultivated.

### <span style='background :gray' > - Switzerland land cover, with artificial areas, year 2016 - </span>  


**Dataset :** Land Cover

**Data :** Distribution of land between artificial and natural areas

**Notes :** This plot overlaps a bit with the others, but contains additional information about artificial areas.


In [None]:
import matplotlib.pyplot as plt

# DataFrames to plot
df_artificial_surface = raw_land_cover_dataset.loc[raw_land_cover_dataset['Area']=='Switzerland'].loc[raw_land_cover_dataset['Year']==2016].loc[raw_land_cover_dataset['Element']=='Area from MODIS']
# Pie plot #1
labels1 = df_artificial_surface.Item
sizes1 = df_artificial_surface.Surface

fig1, ax1 = plt.subplots(figsize=(30,15))
ax1.pie(sizes1, labels=labels1, labeldistance=1.05, autopct='%1.1f%%', shadow=True, startangle=45)
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
ax1.title.set_text('Distribution of artificial lands in Switzerland, year 2016')
fig1.set_facecolor('white')

# General settings
fig1.set_facecolor('white')
plt.show()
df_artificial_surface

<div class="alert alert-block alert-info">
    
        Comparison with neighbours:
    
In Switzerland, 60.5% of the land is not usable for agriculture (sum of artificial surfaces, inland water bodies, snow and tree-covered areas).
    
In France, 51% of the land is not usable for agriculture. 
    
In Germany, 52.6%.
    
In Italy, only 41.7%.
    
In Austria, 66.8%.
    
In Liechtenstein: no data.
    
So, compared to its neighbours, Switzerland has one of the smallest shares of land usable for agriculture, but it still manages to reach some of the highest yields. 
    
Now that we have the general distribution of land for each country, let's focus our plots on agricultural land. 

<div class="alert alert-block alert-success">
    
    Around 60% of Swiss land is unusable for agriculture (sum of forests, shrub-covered areas, inland water bodies, permanent snow and glaciers, and artificial surfaces). 
    Inland water bodies are lakes, and artificial areas are urban areas. 
    The usable share could be increased through deforestation, but this is not in the interest of the country for environmental reasons. 

### <span style='background :gray' > - Demographic evolution of Switzerland over the years - </span>

**Dataset :** Demography

**Data :** Population for different countries over years

**Notes :** None.


In [None]:
#plot of the evolution of the demography over the years

plt.figure(figsize=(20,10))
plt.plot( 'Year', 'Population', data=demography.loc[demography['Area']=='Austria'], marker='', color='green',  label = 'Austria')
plt.plot( 'Year', 'Population', data=demography.loc[demography['Area']=='France'], marker='', color='skyblue', label = 'France')
plt.plot( 'Year', 'Population', data=demography.loc[demography['Area']=='Switzerland'], marker='', color='red', label = 'Switzerland', linewidth=3)
plt.plot( 'Year', 'Population', data=demography.loc[demography['Area']=='Germany'], marker='', color='orange', label = 'Germany')
plt.plot( 'Year', 'Population', data=demography.loc[demography['Area']=='Italy'], marker='', color='grey', label = 'Italy')
    
plt.legend() 
plt.title('Evolution of the demography over the years' , fontsize= 20)
plt.xlabel("Year", fontsize= 20)
plt.ylabel("Population value", fontsize= 20)
plt.show()

In [None]:
min_swiss_demography = demography[demography.Area.str.contains('Switzerland')].Population.min()
min_swiss_demography

In [None]:
max_swiss_demography = demography[demography.Area.str.contains('Switzerland')].Population.max()
max_swiss_demography

In [None]:
delta_swiss_demography= max_swiss_demography - min_swiss_demography
delta_swiss_demography

<div class="alert alert-block alert-success">
    
We can see that, as expected, the population is growing in every country. From 1950 to 2018, the Swiss population increased by about 3.8 million people (0.38*10^7), which means it has almost doubled. If the world's demographic growth projections hold true for the coming years, how could Switzerland become self-sufficient? 
    

<div class="alert alert-block alert-warning">
    Idea: we could fit a linear regression for each country to get the slope of its demographic growth, then compare the slopes and extrapolate them to future years for our predictive model.
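
The regression idea above could look like the following minimal sketch. It uses `np.polyfit` on made-up numbers; `toy_demography`, `fit_trend`, `trends` and `target_year` are hypothetical names, and a straight line is only an assumption about the shape of the growth.

```python
import numpy as np
import pandas as pd

# Toy population series (made-up numbers) in the same long format as `demography`.
toy_demography = pd.DataFrame({
    "Area": ["A"] * 4 + ["B"] * 4,
    "Year": [2000, 2001, 2002, 2003] * 2,
    "Population": [100.0, 110.0, 120.0, 130.0, 50.0, 52.0, 54.0, 56.0],
})

def fit_trend(group):
    """Least-squares line Population ~ slope * Year + intercept for one country."""
    slope, intercept = np.polyfit(group["Year"], group["Population"], deg=1)
    return pd.Series({"slope": slope, "intercept": intercept})

# One row per country with the fitted slope and intercept.
trends = pd.DataFrame(
    {area: fit_trend(group) for area, group in toy_demography.groupby("Area")}
).T

# Extrapolate each fitted line to a future year (hypothetical target year).
target_year = 2010
trends["projection"] = trends["slope"] * target_year + trends["intercept"]
print(trends)
```

The slopes can then be compared between countries. Any extrapolation far beyond the observed years should be read with caution.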

### <span style='background :gray' > - Production, exports and imports of items in Switzerland over years. - </span>

**Dataset :** Importations/Exportations (CH)

**Data :** Importation and Exportations of different items 

**Notes :** This plot is interactive. It allows you to choose an item (apples, berries...) and shows its production, exportation and importation in Switzerland over years.


In [None]:
#Interactive visualization

#Plot the production, imports and exports of the selected item in Switzerland over years
def viz_evolution(item):
    df_viz_evolution = CH_data.loc[CH_data['Item']==item]
    
    # multiple line plot
    plt.figure(figsize=(20,10))
    plt.plot( 'Year', 'Produced Quantity', data=df_viz_evolution, marker='', color='red', label = 'production', linewidth=3)
    plt.plot('Year', 'Imported Quantity', data=df_viz_evolution, marker='', color='blue', label = 'imports', linewidth=3)
    plt.plot('Year', 'Exported Quantity', data=df_viz_evolution, marker='', color='green', label = 'exports', linewidth=3) 
    plt.legend() 
    plt.title(f'Production, imports and exports of {item} in Switzerland throughout years', fontsize= 20)
    plt.xlabel("Year", fontsize= 20)
    plt.ylabel("Values [tonnes]", fontsize= 20)
    plt.show()
   
items = CH_data.Item.unique()
interact(viz_evolution, item = items)    

**Most produced, imported and exported products :**

- Most produced crop products

In [None]:
CH_data.loc[CH_data.Year == 2016].sort_values(by='Produced Quantity', ascending = False).head(10)

- Most imported crop products

In [None]:
CH_data.loc[CH_data.Year == 2016].sort_values(by='Imported Quantity', ascending = False).head(10)

- Most exported crop products

In [None]:
CH_data.loc[CH_data.Year == 2016].sort_values(by='Exported Quantity', ascending = False).head(10)

In [None]:
total_export_quantity = CH_data["Exported Quantity"].sum()
total_export_quantity

In [None]:
total_import_quantity = CH_data["Imported Quantity"].sum()
total_import_quantity

In [None]:
dv=total_import_quantity/total_export_quantity
dv

<div class="alert alert-block alert-success">
    
We can see that some of the most produced items are also among the most imported, such as potatoes, wheat, maize, grapes, lettuce and chicory, and sugar beet. This may reflect a high consumption of these items by the population and suggests that one of the priorities could be to increase their production. Among the most exported items, it is not surprising to find several items that are highly produced in Switzerland, such as wheat, potatoes, apples, maize and barley. But we also find some unexpected items, such as "Oilseeds nes", which are imported in larger quantities than they are produced and then exported in larger quantities than they are produced, which suggests an economic advantage in this transit trade. 
    When summing the total amounts of exported and imported products, we can see that Switzerland imports 70 times more than it exports. But are these imports meant for transit (see the oilseeds example) or for consumption? 

**Least produced, imported and exported products :**

- Least produced crop products

In [None]:
CH_data.loc[CH_data.Year == 2016].sort_values(by='Produced Quantity', ascending = True).head(10)

<div class="alert alert-block alert-success">
    Do the least produced items correspond to the most imported ones? 
    We can see that none of these items appear among the most imported ones. Is this related to the consumption habits of Swiss people? Is it necessary to increase their production if they do not seem to be needed? 
    The only exception is "Oilseeds nes" but, as discussed previously, they are also exported. 
    
    

- Least imported products

In [None]:
CH_data.loc[CH_data.Year == 2016].sort_values(by='Imported Quantity', ascending = True).head(10)

<div class="alert alert-block alert-success">
    Do the least imported items correspond to the most produced and exported ones? This would suggest a high sufficiency for these items. 
    We can see no match between the least imported items and the most produced or most exported items. 
    

- Least exported crop products

In [None]:
CH_data.loc[CH_data.Year == 2016].sort_values(by='Exported Quantity', ascending = True).head(10)

<div class="alert alert-block alert-success">

Are the least exported items the least produced and the most imported ones? 
    There are no similarities between the least produced and the least exported items; the same holds for importations. 
    

<div class="alert alert-block alert-warning">

Idea: could we compute correlation matrices to answer these questions?
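
A correlation matrix between the three quantity columns could be obtained with `DataFrame.corr`. The sketch below runs on made-up numbers: the `toy`, `quantity_cols` and `corr` names are hypothetical, and it assumes a table with the same `Produced Quantity`, `Imported Quantity` and `Exported Quantity` columns as `CH_data`.

```python
import pandas as pd

# Toy single-year snapshot (made-up numbers) with the three quantity columns used above.
toy = pd.DataFrame({
    "Item": ["Wheat", "Potatoes", "Apples", "Maize"],
    "Produced Quantity": [500.0, 400.0, 200.0, 150.0],
    "Imported Quantity": [300.0, 80.0, 20.0, 120.0],
    "Exported Quantity": [5.0, 4.0, 2.0, 1.0],
})

quantity_cols = ["Produced Quantity", "Imported Quantity", "Exported Quantity"]

# Pearson correlation between the quantities, computed across items.
corr = toy[quantity_cols].corr()
print(corr)
```

On the real data, the same call could be applied to `CH_data.loc[CH_data.Year == 2016]`. A few very large items may dominate a Pearson correlation, so the result should be interpreted carefully.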

### <span style='background :gray' > - Production, exports and imports of items for CH neighbours over years. - </span>

**Dataset :** Importations/Exportations (I)(FR)(G)(AU) + crops

**Data :**  Most produced/imported/exported items 

**Notes :** 


#### Italy

- Most produced Items

In [None]:
Italy_data.loc[Italy_data.Year == 2016].sort_values(by='Produced Quantity', ascending = False).head(10)

- Most exported Items

In [None]:
Italy_data.loc[Italy_data.Year == 2016].sort_values(by='Exported Quantity', ascending = False).head(10)

- Most imported Items

In [None]:
Italy_data.loc[Italy_data.Year == 2016].sort_values(by='Imported Quantity', ascending = False).head(10)

#### France

- Most produced Items

In [None]:
France_data.loc[France_data.Year == 2016].sort_values(by='Produced Quantity', ascending = False).head(10)

- Most exported Items

In [None]:
France_data.loc[France_data.Year == 2016].sort_values(by='Exported Quantity', ascending = False).head(10)

- Most imported Items

In [None]:
France_data.loc[France_data.Year == 2016].sort_values(by='Imported Quantity', ascending = False).head(10)

#### Austria

- Most produced items

In [None]:
Austria_data.loc[Austria_data.Year == 2016].sort_values(by='Produced Quantity', ascending = False).head(10)

- Most exported items

In [None]:
Austria_data.loc[Austria_data.Year == 2016].sort_values(by='Exported Quantity', ascending = False).head(10)

- Most imported items

In [None]:
Austria_data.loc[Austria_data.Year == 2016].sort_values(by='Imported Quantity', ascending = False).head(10)

#### Germany

- Most produced items

In [None]:
Germany_data.loc[Germany_data.Year == 2016].sort_values(by='Produced Quantity', ascending = False).head(10)

- Most exported items

In [None]:
Germany_data.loc[Germany_data.Year == 2016].sort_values(by='Exported Quantity', ascending = False).head(10)

- Most imported items

In [None]:
Germany_data.loc[Germany_data.Year == 2016].sort_values(by='Imported Quantity', ascending = False).head(10)

<div class="alert alert-block alert-success">

We can observe that apples, maize, potatoes, wheat and sugar beet are important items for all countries, as they are often among the most produced, exported and imported items. 
So we can focus our study on those products to answer the question of Swiss food self-sufficiency. 

    We should replace oats with something else... What about sugar beet?

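### <span style='background :gray' > - Most produced, exported and imported items in Switzerland, year 2016 - </span>

**Dataset :** Importations/Exportations (CH)

**Data :** Produced, exported and imported quantities of selected items in 2016

**Notes :** Stacked bar chart; values are in tonnes.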

In [None]:
import plotly # conda install -c anaconda plotly #AND# jupyter labextension install @jupyterlab/plotly-extension
import plotly.graph_objects as go
y_wheat = CH_data.loc[CH_data.Year == 2016].loc[CH_data.Item=='Wheat'].values[0,-3:]
y_potatoes = CH_data.loc[CH_data.Year == 2016].loc[CH_data.Item=='Potatoes'].values[0,-3:]
y_beet = CH_data.loc[CH_data.Year == 2016].loc[CH_data.Item=='Sugar beet'].values[0,-3:]
y_maize = CH_data.loc[CH_data.Year == 2016].loc[CH_data.Item=='Maize'].values[0,-3:]
y_apples = CH_data.loc[CH_data.Year == 2016].loc[CH_data.Item=='Apples'].values[0,-3:]
y_barley = CH_data.loc[CH_data.Year == 2016].loc[CH_data.Item=='Barley'].values[0,-3:]
y_grapes = CH_data.loc[CH_data.Year == 2016].loc[CH_data.Item=='Grapes'].values[0,-3:]



x=['Produced', 'Exported', 'Imported']
fig = go.Figure(go.Bar(x=x, y=y_wheat, name='Wheat'))
fig.add_trace(go.Bar(x=x, y=y_potatoes, name='Potatoes'))
fig.add_trace(go.Bar(x=x, y=y_beet, name='Sugar beet'))
fig.add_trace(go.Bar(x=x, y=y_maize, name='Maize'))
fig.add_trace(go.Bar(x=x, y=y_apples, name='Apples'))
fig.add_trace(go.Bar(x=x, y=y_barley, name='Barley'))
fig.add_trace(go.Bar(x=x, y=y_grapes, name='Grapes'))

fig.update_layout(
    title='Most produced, exported and imported items in Switzerland in 2016',
    yaxis_title="Values [tonnes]",
    barmode='stack', 
    font=dict(
        family="Courier New, monospace",
        size=16,
        color="#7f7f7f")
    )
fig.show()
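
The bar values above are read with `.values[0,-3:]`, which relies on the three quantity columns being the last ones and being in the order produced, exported, imported. A possible sketch that selects them by name instead, assuming the column names used elsewhere in this notebook; `quantities_for` and the toy data are hypothetical:

```python
import pandas as pd

QUANTITY_COLUMNS = ["Produced Quantity", "Exported Quantity", "Imported Quantity"]

def quantities_for(data, item, year=2016):
    """Return [produced, exported, imported] for one item and year, selected by column name."""
    row = data.loc[(data.Year == year) & (data.Item == item), QUANTITY_COLUMNS]
    return row.iloc[0].tolist()

# Toy data for illustration only
toy = pd.DataFrame({
    "Item": ["Wheat", "Potatoes"],
    "Year": [2016, 2016],
    "Produced Quantity": [500, 400],
    "Exported Quantity": [5, 10],
    "Imported Quantity": [150, 60],
})
bars = {item: quantities_for(toy, item) for item in ["Wheat", "Potatoes"]}
print(bars)
```

Each entry of `bars` could then be passed as `y` to one `go.Bar` trace in a loop.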

### <span style='background :gray' > - Plot production,  importation and exportation of selected item in Switzerland throughout years - </span>

**Dataset :** Importations/Exportations (CH)

**Data :** Importations and exportations of different items 

**Notes :** This plot is interactive. 


In [None]:
import plotly.graph_objects as go

def viz_potatoe(item):
    # Last three columns of CH_data: produced, exported and imported quantities
    y_item = CH_data.loc[CH_data.Item==item].values[:,-3:]
    fig = go.Figure()
    fig.add_trace(go.Scatter(x=CH_data.Year.unique(), y=y_item[:,0], fill='tonexty', name='Produced')) # no previous trace, so this fills down to the x-axis
    fig.add_trace(go.Scatter(x=CH_data.Year.unique(), y=y_item[:,1], fill='tozeroy', name='Exported')) # fill down to the x-axis
    fig.add_trace(go.Scatter(x=CH_data.Year.unique(), y=y_item[:,2], fill='tonexty', name='Imported')) # fill to the previous trace (Exported)
    fig.update_layout(
        title=f"{item} importations and productions throughout years in Switzerland",
        yaxis_title="Values [tonnes]",
        xaxis_title='Years'
        )
    fig.show()

items = CH_data.Item.unique()
interact(viz_potatoe, item = items)  


### <span style='background :gray' > - Plot production,  importation and exportation in Switzerland throughout years - </span>

**Dataset :** Importations/Exportations (CH)

**Data :** Importations and exportations of different items 

**Notes :** This plot is interactive and shows values upon cursor selection. As reported before, exportation values are much lower than production and importation values. Hence, exportation values will be plotted separately afterwards, to better show their trend.


In [None]:
fig = go.Figure()
fig.add_trace(go.Scatter(x=CH_data.Year.unique(), y=total_crops_imports['Produced Quantity'].values, fill='tonexty', name='Produced')) # no previous trace, so this fills down to the x-axis
fig.add_trace(go.Scatter(x=CH_data.Year.unique(), y=total_crops_imports['Imported Quantity'].values, fill='tozeroy', name='Imported')) # fill down to the x-axis
fig.add_trace(go.Scatter(x=CH_data.Year.unique(), y=total_crops_imports['Exported Quantity'].values, fill='tozeroy', name='Exported')) # fill down to the x-axis
fig.update_layout(
    title="Sum of all importations, exportations and productions throughout years in Switzerland",
    yaxis_title="Values [tonnes]",
    xaxis_title='Years'
    )
fig.show()

<div class="alert alert-block alert-success">
    
Here again we can see that Switzerland is a very small exporter and that its importations are fairly constant. Nevertheless, its importations seem to have been slightly increasing since 2005. Is this because of a demand for food diversity or because of production issues? Since production fluctuates from year to year without a clear trend, we could follow the first hypothesis (demand for food diversity as a consequence of globalisation). 
    
    For the analysis and the writing of the report, I would say that:
        - CH is a good player after all, since it produces far more than it imports (ratio 3:1)
        - For the rise in importations: I would say that a higher demand for diversity would not affect the curve, since diversity does not mean quantity (we like chocolate but still eat more rice and other "common food products") --> To me it would rather reflect the fact that the population grows linearly while CH production is roughly constant, hence the need to import more!

As the exportations are hardly visible on the previous graph due to scale differences, we will plot them alone.

Maybe we could add here an interactive plot where we can select the item to display (but for now we are not sure if/how we can combine the plotly and ipywidgets libraries)

In [None]:
fig = px.area(CH_restrained, x="Year", y="Value", color='Item',
      line_group='Input')
fig.update_layout(
    title="Switzerland's production evolution for 6 most important items (no sugar beet)",
    yaxis_title="Values [tonnes]",
    xaxis_title='Years'
    )
fig.show()

In [None]:
fig = px.area(CH_restrained, x="Year", y="Value", color='Item',
      line_group='Input')

fig.update_layout(
    title="Switzerland's importation evolution for 6 most important items (no sugar beet)",
    yaxis_title="Values [tonnes]",
    xaxis_title='Years'
    )
fig.show()

In [None]:
fig = px.area(CH_restrained, x="Year", y="Value", color='Item',
      line_group='Input')

fig.update_layout(
    title="Switzerland's importation and production evolution for 6 most important items (no sugar beet)",
    yaxis_title="Values [tonnes]",
    xaxis_title='Years'
    )
fig.show()

In [None]:
import plotly.express as px
fig = px.area(CH_restrained_exportations, x="Year", y="Value", color='Item',
      line_group='Input')
fig.update_layout(
    title="Switzerland's exportations evolution for five most important items over time",
    yaxis_title="Values [tonnes]",
    xaxis_title='Years'
    )
fig.show()

In [None]:
CH_restrained = CH_crops.loc[CH_crops.Area=='Switzerland'].loc[CH_crops.Item.isin(['Apples','Wheat','Potatoes', 'Maize', 'Grapes', 'Barley'])].loc[CH_crops.Element=='Yield']

import plotly.express as px
fig = px.area(CH_restrained, x="Year", y="Value", color='Item',
      line_group='Item')
fig.update_layout(
    title="Switzerland's yield evolution for 6 most important items (no sugar beet)",
    yaxis_title="Values [hg/ha]",
    xaxis_title='Years'
    )
fig.show()

### <span style='background :gray' > -  Plot : Is there a correlation between production and temperature? - </span>

**Dataset :** Swiss temperatures

**Data :** 

**Notes :** 


In [None]:
CH_temperatures.head()

In [None]:
#we should make an interactive plot so we can select a food element and see how its production is affected
#by temperatures changes. 
years = np.sort(CH_data.Year.unique())
fig, ax1 = plt.subplots()
data1 = CH_data.loc[CH_data.Item=='Potatoes']['Produced Quantity']
data2 = CH_temperatures.summer

color = 'tab:red'
ax1.set_xlabel('year')
ax1.set_ylabel('production', color=color)
ax1.plot(years, data1, color=color)
ax1.tick_params(axis='y', labelcolor=color)

ax2 = ax1.twinx()  # instantiate a second axes that shares the same x-axis

color = 'tab:blue'
ax2.set_ylabel('temperature', color=color)  # we already handled the x-label with ax1
ax2.plot(years, data2, color=color)
ax2.tick_params(axis='y', labelcolor=color)

fig.tight_layout()  # otherwise the right y-label is slightly clipped
plt.title('Potatoes production and temperatures every year')
plt.show()

<div class="alert alert-block alert-success">
As temperature increases, production decreases. 


<div class="alert alert-block alert-info">
As temperature increases, the production of potatoes, maize and apples decreases.
Wheat production is less affected by temperature changes. 
Sugar beet production increases with temperature, which suggests that this crop benefits from heat. 
    

<div class="alert alert-block alert-info">

The Pearson correlation coefficient measures the linear relationship
 between two datasets. Strictly speaking, Pearson's correlation requires
 that each dataset be normally distributed. Like other correlation
 coefficients, this one varies between -1 and +1 with 0 implying no
 correlation. Correlations of -1 or +1 imply an exact linear
 relationship. Positive correlations imply that as x increases, so does
 y. Negative correlations imply that as x increases, y decreases.
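
To make the definition concrete, here is a minimal pure-Python sketch of the coefficient, $r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\sqrt{\sum_i (y_i - \bar{y})^2}}$. The function name `pearson_r` and the values are made up for illustration and are not the notebook's data:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

# Made-up values for illustration
temperature = [15.0, 16.0, 17.0, 18.0]
rising = [10.0, 12.0, 14.0, 16.0]    # exact increasing linear relation
falling = [40.0, 30.0, 20.0, 10.0]   # exact decreasing linear relation

r_rising = pearson_r(temperature, rising)
r_falling = pearson_r(temperature, falling)
print(r_rising, r_falling)
```

An exact increasing linear relation gives a coefficient of +1 and an exact decreasing one gives -1 (up to floating-point rounding).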

Calculation of the Pearson correlation between temperatures and the production of our 7 main items.

Let's check whether our temperatures are normally distributed.

In [None]:
plt.hist(CH_temperatures['summer'], bins=10)

In [None]:
plt.hist(CH_temperatures['winter'], bins=10)

<div class="alert alert-block info-alert">
The distributions are only roughly normal, so we are not sure that the Pearson coefficients are strictly valid here, but let's look at them anyway.

In [None]:
CH_corr_Apples_temp = CH_data_transformed.loc[CH_data_transformed.Item.str.contains("Apples")].loc[CH_data_transformed.Input=='Country production']
CH_corr_Apples_temp = CH_corr_Apples_temp[["Year","Value"]]

In [None]:
CH_corr_Apples_temp = CH_corr_Apples_temp.rename(columns={"Value":"Apples production"})

In [None]:
CH_corr_Grapes_temp = CH_data_transformed.loc[CH_data_transformed.Item.str.contains("Grapes")].loc[CH_data_transformed.Input=='Country production']
CH_corr_Grapes_temp = CH_corr_Grapes_temp[["Year","Value"]]
CH_corr_Grapes_temp = CH_corr_Grapes_temp.rename(columns={"Value":"Grapes production"})

CH_corr_Wheat_temp = CH_data_transformed.loc[CH_data_transformed.Item.str.contains("Wheat")].loc[CH_data_transformed.Input=='Country production']
CH_corr_Wheat_temp = CH_corr_Wheat_temp[["Year","Value"]]
CH_corr_Wheat_temp = CH_corr_Wheat_temp.rename(columns={"Value":"Wheat production"})

CH_corr_Potatoes_temp = CH_data_transformed.loc[CH_data_transformed.Item.str.contains("Potatoes")].loc[CH_data_transformed.Input=='Country production']
CH_corr_Potatoes_temp = CH_corr_Potatoes_temp[["Year","Value"]]
CH_corr_Potatoes_temp = CH_corr_Potatoes_temp.rename(columns={"Value":"Potatoes production"})

CH_corr_Barley_temp = CH_data_transformed.loc[CH_data_transformed.Item.str.contains("Barley")].loc[CH_data_transformed.Input=='Country production']
CH_corr_Barley_temp = CH_corr_Barley_temp[["Year","Value"]]
CH_corr_Barley_temp = CH_corr_Barley_temp.rename(columns={"Value":"Barley production"})

CH_corr_Maize_temp = CH_data_transformed.loc[CH_data_transformed.Item.str.contains("Maize")].loc[CH_data_transformed.Input=='Country production']
CH_corr_Maize_temp = CH_corr_Maize_temp[["Year","Value"]]
CH_corr_Maize_temp = CH_corr_Maize_temp.rename(columns={"Value":"Maize production"})

In [None]:
CH_corr_SB_temp = CH_data_transformed.loc[CH_data_transformed.Item.str.contains("Sugar beet")].loc[CH_data_transformed.Input=='Country production']
CH_corr_SB_temp = CH_corr_SB_temp[["Year","Value"]]
CH_corr_SB_temp = CH_corr_SB_temp.rename(columns={"Value":"Sugar beet production"})

In [None]:
plt.hist(CH_corr_Potatoes_temp['Potatoes production'], bins=10)

In [None]:
plt.hist(CH_corr_Grapes_temp['Grapes production'], bins=10)

In [None]:
plt.hist(CH_corr_Apples_temp['Apples production'], bins=10)

In [None]:
plt.hist(CH_corr_Wheat_temp['Wheat production'], bins=10)

In [None]:
plt.hist(CH_corr_Barley_temp['Barley production'], bins=10)

In [None]:
plt.hist(CH_corr_SB_temp['Sugar beet production'], bins=10)

In [None]:
plt.hist(CH_corr_Maize_temp['Maize production'], bins=10)

In [None]:
CH_summer_temp=CH_temperatures[["Year","summer"]]
CH_winter_temp=CH_temperatures[["Year","winter"]]

In [None]:
df_winter=CH_winter_temp.merge(CH_corr_SB_temp,on='Year').merge(CH_corr_Apples_temp,on='Year').merge(CH_corr_Grapes_temp,on='Year').merge(CH_corr_Wheat_temp,on='Year').merge(CH_corr_Potatoes_temp,on='Year').merge(CH_corr_Barley_temp,on='Year').merge(CH_corr_Maize_temp,on='Year')
df_winter.head()

In [None]:
df_summer=CH_summer_temp.merge(CH_corr_SB_temp,on='Year').merge(CH_corr_Apples_temp,on='Year').merge(CH_corr_Grapes_temp,on='Year').merge(CH_corr_Wheat_temp,on='Year').merge(CH_corr_Potatoes_temp,on='Year').merge(CH_corr_Barley_temp,on='Year').merge(CH_corr_Maize_temp,on='Year')
df_summer.head()
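
The per-item frames and the chained merges above could also be built in one step with a pivot. A sketch under stated assumptions: it matches item names exactly with `isin` (the cells above use `str.contains`), and `production_by_item`, the toy frames and the `'Importations'` label are hypothetical:

```python
import pandas as pd

def production_by_item(data, items):
    """One row per year, one '<item> production' column per item."""
    produced = data.loc[(data.Input == "Country production") & data.Item.isin(items)]
    wide = produced.pivot_table(index="Year", columns="Item", values="Value", aggfunc="sum")
    wide.columns = [f"{item} production" for item in wide.columns]
    return wide.reset_index()

# Toy data for illustration only
toy_production = pd.DataFrame({
    "Item": ["Apples", "Apples", "Wheat", "Wheat", "Apples"],
    "Input": ["Country production"] * 4 + ["Importations"],
    "Year": [2000, 2001, 2000, 2001, 2000],
    "Value": [10, 12, 50, 55, 3],
})
toy_temperatures = pd.DataFrame({"Year": [2000, 2001], "summer": [17.5, 18.1]})

df_summer_sketch = toy_temperatures.merge(
    production_by_item(toy_production, ["Apples", "Wheat"]), on="Year")
print(df_summer_sketch)
```

The result has the same layout as `df_summer`: one temperature column followed by one production column per item.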

In [None]:
df_corr_summer=df_summer.drop(columns=["Year"])
coeff_summer=df_corr_summer.corr().iloc[0][1:]#.sort_values(ascending=False)
coeff_summer=pd.DataFrame(coeff_summer).T
coeff_summer.head()

In [None]:
df_corr_winter=df_winter.drop(columns=["Year"])
coeff_winter=df_corr_winter.corr().iloc[0][1:].sort_values(ascending=False)
coeff_winter=pd.DataFrame(coeff_winter).T
coeff_winter.head()
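
Since the histograms above do not look clearly normal, a rank-based coefficient could be reported alongside Pearson's. This is an alternative we suggest here, not something computed in the cells above. A minimal sketch using `DataFrame.corr(method="spearman")` on made-up values:

```python
import pandas as pd

# Made-up values: production grows with temperature, but not linearly
toy = pd.DataFrame({
    "summer": [15.0, 16.0, 17.0, 18.0, 19.0],
    "Potatoes production": [1.0, 4.0, 9.0, 16.0, 25.0],
})

pearson = toy.corr(method="pearson").loc["summer", "Potatoes production"]
spearman = toy.corr(method="spearman").loc["summer", "Potatoes production"]
print(pearson, spearman)
```

Spearman's coefficient only looks at the ordering of the values, so a monotonic but non-linear relation still scores 1, while Pearson's coefficient stays slightly below 1.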

In [None]:
import seaborn as sns
import pandas as pd

plt.figure(figsize=(5,5)) 

sns.heatmap(coeff_summer.T)

In [None]:
plt.figure(figsize=(5,5)) 

sns.heatmap(coeff_winter.T)

### <span style='background :gray' > -   - </span>

**Dataset :**

**Data :** 

**Notes :** 


In [None]:
# set_index returns a new data frame, so df_employ itself is left untouched
df_employ_newind = df_employ.set_index(['area','indicator'])
df_employ_newind = df_employ_newind.sort_index()
df_employ_newind.head(2)

In [None]:
ax = plt.gca()
df_employ_newind.loc[('Austria','Employment in agriculture')].plot(kind='scatter',color='cyan',x='year',y='value',ax=ax, label='Austria')
df_employ_newind.loc[('Germany','Employment in agriculture')].plot(kind='scatter',color='black',x='year',y='value',ax=ax, label='Germany')
df_employ_newind.loc[('France','Employment in agriculture')].plot(kind='scatter',color='blue',x='year',y='value',ax=ax, label='France')
df_employ_newind.loc[('Switzerland','Employment in agriculture')].plot(kind='scatter',color='red',x='year',y='value',ax=ax, label='Switzerland')
df_employ_newind.loc[('Italy','Employment in agriculture')].plot(kind='scatter',color='green',x='year',y='value',ax=ax, label='Italy',
                                                                 figsize=(15,10))
ax.set(title='Employment in Agriculture (1969-2017)',
ylabel='Nb persons /1000',
xlabel='Years')
ax.yaxis.label.set_size(30)
ax.xaxis.label.set_size(30)
ax.title.set_size(30)
plt.show()

<div class="alert alert-block alert-info">
    
Employment in agriculture is decreasing sharply in France, Germany and Italy but is roughly constant in Austria and Switzerland. 
The number of people employed in the agricultural sector is very small in Switzerland, only around 200 000. Why? 
    

### <span style='background :gray' > -  Plot :  - </span>

**Dataset :**

**Data :** 

**Notes :** 


In [None]:
ax = plt.gca()
df_employ_newind.loc[('Austria','Share of employees in agriculture (% of total employees)')].plot(kind='scatter',color='cyan',x='year',y='value',ax=ax, label='Austria')
df_employ_newind.loc[('Germany','Share of employees in agriculture (% of total employees)')].plot(kind='scatter',color='black',x='year',y='value',ax=ax, label='Germany')
df_employ_newind.loc[('France','Share of employees in agriculture (% of total employees)')].plot(kind='scatter',color='blue',x='year',y='value',ax=ax, label='France')
df_employ_newind.loc[('Switzerland','Share of employees in agriculture (% of total employees)')].plot(kind='scatter',color='red',x='year',y='value',ax=ax, label='Switzerland')
df_employ_newind.loc[('Italy','Share of employees in agriculture (% of total employees)')].plot(kind='scatter',color='green',x='year',y='value',ax=ax, label='Italy',
                                                                                                figsize=(17,10))
ax.set(title='Share of employees in agriculture',
ylabel='% of total employees',
xlabel='Years')
ax.yaxis.label.set_size(30)
ax.xaxis.label.set_size(30)
ax.title.set_size(30)
plt.show()

<div class="alert alert-block alert-info">
In Switzerland, agriculture represents a very small share of employment, around 1%. This is approximately the same for its neighbours France, Germany and Austria.
Only Italy has a relatively high percentage of employees in agriculture, which is consistent with the fact that it also has the highest number of employees in this sector. 

### <span style='background :gray' > -  Plot :  - </span>

**Dataset :** 

**Data :** 

**Notes :** 


In [None]:
ax = plt.gca()
df_employ_newind.loc[('Austria','Employment-to-population ratio, rural areas')].plot(kind='scatter',color='cyan',x='year',y='value',ax=ax, label='Austria')
df_employ_newind.loc[('Germany','Employment-to-population ratio, rural areas')].plot(kind='scatter',color='black',x='year',y='value',ax=ax, label='Germany')
df_employ_newind.loc[('France','Employment-to-population ratio, rural areas')].plot(kind='scatter',color='blue',x='year',y='value',ax=ax, label='France')
df_employ_newind.loc[('Switzerland','Employment-to-population ratio, rural areas')].plot(kind='scatter',color='red',x='year',y='value',ax=ax, label='Switzerland')
df_employ_newind.loc[('Italy','Employment-to-population ratio, rural areas')].plot(kind='scatter',color='green',x='year',y='value',ax=ax, label='Italy',
                                                                                                figsize=(17,10))
ax.set(title='Employment-to-population ratio, RURAL AREAS',
ylabel='share of the employed population in total working-age population',
xlabel='Years')
ax.yaxis.label.set_size(13)
ax.xaxis.label.set_size(30)
ax.title.set_size(30)
plt.show()

<div class="alert alert-block alert-info">
Looking at the employment-to-population ratio in rural areas, we can see that Switzerland has the highest one, meaning that it has the largest share of its rural working-age population in employment. Since this indicator covers all rural employment and not only agriculture, it suggests, but does not prove, that agriculture is an important employment sector. 
We also note that this ratio has been increasing over the past few years, which may indicate that rural employment is attractive.

### <span style='background :gray' > -  Plot : Use of fertilizers and production over years in Switzerland - </span>

**Dataset :** fertilizers, pesticides and crops

**Data :** 

**Notes :** 


In [None]:
#Next : add the production for those years 
#Let's try with CH:
fert_ch = fert_sum.loc[fert_sum['Area']=='Switzerland']

In [None]:
pest_ch = pesticides_dataset.loc[pesticides_dataset['Area']=='Switzerland'].loc[pesticides_dataset['Item']=='Pesticides (total)']
pest_ch = pest_ch[['Year','Value']]

In [None]:
prod_ch = crops_sum.loc[crops_sum['Element']== 'Production'].loc[crops_sum['Area']== 'Switzerland']

In [None]:
#pd.concat([prod_ch, fert_ch], sort=False).tail(60)
combo_ch = pd.merge(prod_ch, fert_ch, how='inner', on=['Year'])\
                .rename(columns={'Area_x':'Area'})\
                .rename(columns={'Sum_x':'Production'})\
                .rename(columns={'Sum_y':'Fertilizers'})\
                .drop(columns=['Area_y','Element'])

In [None]:
combo_ch = pd.merge(combo_ch, pest_ch, how='inner', on=['Year'])\
                .rename(columns={'Value':'Pesticides'})
combo_ch.head()

In [None]:
plt.figure(figsize=(20,10))
plt.plot( 'Year', 'Production', data=combo_ch, marker='', color='green', label = 'Production', linewidth=3)
plt.plot( 'Year', 'Fertilizers', data=combo_ch, marker='', color='blue', label = 'Fertilizers', linewidth=3)
plt.plot( 'Year', 'Pesticides', data=combo_ch, marker='', color='red', label = 'Pesticides', linewidth=3)
plt.legend() 
plt.title('Production and use of fertilizers and pesticides over years in Switzerland', fontsize= 20)
plt.xlabel("Year", fontsize= 20)
plt.ylabel("Values [tonnes]", fontsize= 20)
plt.show()

# Main results

What's next:

<div class="alert alert-block alert-info">
    
1. Defining what is food self-sufficiency
    1. $SSR = \frac{Production \times 100}{Production + Imports - Exports}$ (to be developed)
    2. Adapt it to the Swiss case: look at what we import (basic needs?), what we export (top exports? by how much?) and the production graphs
    3. __[Ref. Paper "Food self-sufficiency: Making sense of it, and when it makes sense" By Jennifer Clapp](https://www.sciencedirect.com/science/article/pii/S0306919216305851#b0240)__. <br> Summary: __[Summary of Clapp's paper by the Resilience website](https://www.resilience.org/stories/2018-03-13/food-self-sufficiency-does-it-make-sense/)__
    4. Compare our results with other sources to check whether they agree (e.g. Switzerland's self-sufficiency on Wikipedia: __[List of countries by food self-sufficiency rate](https://en.wikipedia.org/wiki/List_of_countries_by_food_self-sufficiency_rate)__)

    
2. Food situation of Switzerland from 1986 to 2017.
    1. Is/was it food self-sufficient ? SSR scores over the years.
    2. Compare to neighbours

    
3. Will it be **physically** possible for Switzerland to be food self-sufficient in the near future (in the sense of the 2018 initiative, because we have seen that the definition is relative), taking into account its population growth (computation of the estimated increase in consumption)? What would it imply in terms of:
    1. Area harvested (actual ratio and estimation of its evolution)
    2. Farmers population 
    3. Temperature (climate impact food production correlation)
    4. Environment (use of fertilizers needed ? depends on productivity)

    
4. Attempt at an analysis of the **economic** consequences?
    1. Complicated ... What about looking at what happened in countries that adopted food self-sufficiency policies, such as Senegal, India, the Philippines, Qatar, Bolivia, and Russia? (Jaccard and correlations?)
    2. Jaccard similarity of countries based on SSR, to see which countries should adopt more food self-sufficiency policies?
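
As a quick check of the SSR formula from point 1, here is a small worked example. The function name `self_sufficiency_ratio` and the quantities are made up for illustration:

```python
def self_sufficiency_ratio(production, imports, exports):
    """SSR in percent: production relative to domestic supply."""
    domestic_supply = production + imports - exports
    return production * 100 / domestic_supply

# Made-up quantities in tonnes
ssr_importer = self_sufficiency_ratio(production=300, imports=100, exports=0)
ssr_exporter = self_sufficiency_ratio(production=300, imports=50, exports=100)
print(ssr_importer, ssr_exporter)  # 75.0 120.0
```

A country that produces three times what it imports and exports nothing has an SSR of 75%. The SSR exceeds 100% exactly when exports are larger than imports.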

## SSR score


##### <span style='background :gray' >Compute SSR for Switzerland and its neighbours over the years, then plot - Analysis</span>

In [None]:
CH_clear = CH_data[["Year", "Produced Quantity", "Exported Quantity", "Imported Quantity"]]

In [None]:
CH_ssr = CH_clear.groupby("Year")\
                .agg({'Produced Quantity':'sum','Exported Quantity':'sum','Imported Quantity':'sum'})\
                .reset_index()

In [None]:
SSR_list=[]
for i in range(0, CH_ssr.shape[0]):
    SSR_list.append((CH_ssr["Produced Quantity"].iloc[i]*100)/(CH_ssr["Produced Quantity"].iloc[i] + CH_ssr["Imported Quantity"].iloc[i]-CH_ssr["Exported Quantity"].iloc[i]))

CH_ssr["SSR"]=SSR_list
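
The same computation can be written without an explicit loop, because pandas applies arithmetic column-wise. A sketch of a reusable helper, assuming the column names of this notebook; `yearly_ssr` and the toy data are hypothetical:

```python
import pandas as pd

QUANTITIES = ["Produced Quantity", "Exported Quantity", "Imported Quantity"]

def yearly_ssr(country_data, items=None):
    """Sum the quantities per year and add an 'SSR' column in percent."""
    if items is not None:
        country_data = country_data[country_data.Item.isin(items)]
    totals = country_data.groupby("Year")[QUANTITIES].sum().reset_index()
    supply = (totals["Produced Quantity"] + totals["Imported Quantity"]
              - totals["Exported Quantity"])
    totals["SSR"] = totals["Produced Quantity"] * 100 / supply
    return totals

# Toy data for illustration only
toy = pd.DataFrame({
    "Year": [2000, 2000, 2001],
    "Item": ["Wheat", "Apples", "Wheat"],
    "Produced Quantity": [200, 100, 300],
    "Exported Quantity": [0, 0, 100],
    "Imported Quantity": [60, 40, 50],
})
toy_ssr = yearly_ssr(toy)
print(toy_ssr)
```

Calling one helper per country, for example `yearly_ssr(France_data)`, keeps the formula in a single place and avoids copy-paste slips between the per-country cells. The optional `items` argument restricts the computation to a list of products.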

Now we do the same for Switzerland's neighbours

In [None]:
France_clear = France_data[["Year", "Produced Quantity", "Exported Quantity", "Imported Quantity"]]
France_ssr = France_clear.groupby("Year")\
                .agg({'Produced Quantity':'sum','Exported Quantity':'sum','Imported Quantity':'sum'})\
                .reset_index()
SSR_list_F=[]
for i in range(0, France_ssr.shape[0]):
    SSR_list_F.append((France_ssr["Produced Quantity"].iloc[i]*100)/(France_ssr["Produced Quantity"].iloc[i] + France_ssr["Imported Quantity"].iloc[i] - France_ssr["Exported Quantity"].iloc[i]))

France_ssr["SSR"]=SSR_list_F

In [None]:
Germany_clear = Germany_data[["Year", "Produced Quantity", "Exported Quantity", "Imported Quantity"]]
Germany_ssr = Germany_clear.groupby("Year")\
                .agg({'Produced Quantity':'sum','Exported Quantity':'sum','Imported Quantity':'sum'})\
                .reset_index()
SSR_list_G=[]
for i in range(0, Germany_ssr.shape[0]):
    SSR_list_G.append((Germany_ssr["Produced Quantity"].iloc[i]*100)/(Germany_ssr["Produced Quantity"].iloc[i] + Germany_ssr["Imported Quantity"].iloc[i] - Germany_ssr["Exported Quantity"].iloc[i]))

Germany_ssr["SSR"]=SSR_list_G

In [None]:
Italy_clear = Italy_data[["Year", "Produced Quantity", "Exported Quantity", "Imported Quantity"]]
Italy_ssr = Italy_clear.groupby("Year")\
                .agg({'Produced Quantity':'sum','Exported Quantity':'sum','Imported Quantity':'sum'})\
                .reset_index()
SSR_list_I=[]
for i in range(0, Italy_ssr.shape[0]):
    SSR_list_I.append((Italy_ssr["Produced Quantity"].iloc[i]*100)/(Italy_ssr["Produced Quantity"].iloc[i] + Italy_ssr["Imported Quantity"].iloc[i] - Italy_ssr["Exported Quantity"].iloc[i]))

Italy_ssr["SSR"]=SSR_list_I

In [None]:
Austria_clear = Austria_data[["Year", "Produced Quantity", "Exported Quantity", "Imported Quantity"]]
Austria_ssr = Austria_clear.groupby("Year")\
                .agg({'Produced Quantity':'sum','Exported Quantity':'sum','Imported Quantity':'sum'})\
                .reset_index()
SSR_list_A=[]
for i in range(0, Austria_ssr.shape[0]):
    SSR_list_A.append((Austria_ssr["Produced Quantity"].iloc[i]*100)/(Austria_ssr["Produced Quantity"].iloc[i] + Austria_ssr["Imported Quantity"].iloc[i] - Austria_ssr["Exported Quantity"].iloc[i]))

Austria_ssr["SSR"]=SSR_list_A

In [None]:
plt.figure(figsize=(20,10))
plt.plot( 'Year', 'SSR', data=Austria_ssr, marker='', color='green',  label = 'Austria')
plt.plot( 'Year', 'SSR', data=France_ssr, marker='', color='skyblue', label = 'France')
plt.plot( 'Year', 'SSR', data=CH_ssr, marker='', color='red', label = 'Switzerland', linewidth=3)
plt.plot( 'Year', 'SSR', data=Germany_ssr, marker='', color='orange', label = 'Germany')
plt.plot( 'Year', 'SSR', data=Italy_ssr, marker='', color='grey', label = 'Italy')
    
plt.legend() 
plt.title('Evolution of the SSR over the years' , fontsize= 20)
plt.xlabel("Year", fontsize= 20)
plt.ylabel("SSR value in %", fontsize= 20)
plt.show()

<div class="alert alert-block alert-info">
    
We can see that Switzerland has the smallest SSR. It oscillates between 70% and 90% over the years.
This indicates that Switzerland has never been food self-sufficient.
We can also observe that Germany, France and sometimes Austria have an SSR above 100%. Mathematically, this means that their exports exceed their imports. It can partly be explained by the fact that our dataset contains mostly items produced in our country, so we are missing a lot of importations. 
    
We will now try to recompute those SSR with only our 7 main items:
    potatoes, wheat, sugar beet, apples, maize, barley and grapes. 

In [None]:
keep=["Potatoes","Apples","Maize","Sugar beet","Wheat", "Barley", "Grapes"]
CH_clear_ = CH_data[CH_data.Item.isin(keep)]

In [None]:
CH_ssr_5 = CH_clear_.groupby("Year")\
                .agg({'Produced Quantity':'sum','Exported Quantity':'sum','Imported Quantity':'sum'})\
                .reset_index()
SSR_list_CH_5=[]
for i in range(0, CH_ssr_5.shape[0]):
    SSR_list_CH_5.append((CH_ssr_5["Produced Quantity"].iloc[i]*100)/(CH_ssr_5["Produced Quantity"].iloc[i] + CH_ssr_5["Imported Quantity"].iloc[i] - CH_ssr_5["Exported Quantity"].iloc[i]))

CH_ssr_5["SSR"]=SSR_list_CH_5

In [None]:
France_clear_ = France_data[France_data.Item.isin(keep)]

France_ssr_5 = France_clear_.groupby("Year")\
                .agg({'Produced Quantity':'sum','Exported Quantity':'sum','Imported Quantity':'sum'})\
                .reset_index()
SSR_list_FR_5=[]
for i in range(0, France_ssr_5.shape[0]):
    SSR_list_FR_5.append((France_ssr_5["Produced Quantity"].iloc[i]*100)/(France_ssr_5["Produced Quantity"].iloc[i] + France_ssr_5["Imported Quantity"].iloc[i] - France_ssr_5["Exported Quantity"].iloc[i]))

France_ssr_5["SSR"]=SSR_list_FR_5

In [None]:
Germany_clear_ = Germany_data[Germany_data.Item.isin(keep)]

Germany_ssr_5 = Germany_clear_.groupby("Year")\
                .agg({'Produced Quantity':'sum','Exported Quantity':'sum','Imported Quantity':'sum'})\
                .reset_index()
SSR_list_G_5=[]
for i in range(0, Germany_ssr_5.shape[0]):
    SSR_list_G_5.append((Germany_ssr_5["Produced Quantity"].iloc[i]*100)/(Germany_ssr_5["Produced Quantity"].iloc[i] + Germany_ssr_5["Imported Quantity"].iloc[i] - Germany_ssr_5["Exported Quantity"].iloc[i]))

Germany_ssr_5["SSR"]=SSR_list_G_5

In [None]:
Italy_clear_ = Italy_data[Italy_data.Item.isin(keep)]

Italy_ssr_5 = Italy_clear_.groupby("Year")\
                .agg({'Produced Quantity':'sum','Exported Quantity':'sum','Imported Quantity':'sum'})\
                .reset_index()
SSR_list_I_5=[]
for i in range(0, Italy_ssr_5.shape[0]):
    SSR_list_I_5.append((Italy_ssr_5["Produced Quantity"].iloc[i]*100)/(Italy_ssr_5["Produced Quantity"].iloc[i] + Italy_ssr_5["Imported Quantity"].iloc[i] - Italy_ssr_5["Exported Quantity"].iloc[i]))

Italy_ssr_5["SSR"]=SSR_list_I_5

In [None]:
Austria_clear_ = Austria_data[Austria_data.Item.isin(keep)]

Austria_ssr_5 = Austria_clear_.groupby("Year")\
                .agg({'Produced Quantity':'sum','Exported Quantity':'sum','Imported Quantity':'sum'})\
                .reset_index()
SSR_list_A_5=[]
for i in range(0, Austria_ssr_5.shape[0]):
    SSR_list_A_5.append((Austria_ssr_5["Produced Quantity"].iloc[i]*100)/(Austria_ssr_5["Produced Quantity"].iloc[i] + Austria_ssr_5["Imported Quantity"].iloc[i] - Austria_ssr_5["Exported Quantity"].iloc[i]))

Austria_ssr_5["SSR"]=SSR_list_A_5

In [None]:
plt.figure(figsize=(20,10))
plt.plot( 'Year', 'SSR', data=Austria_ssr_5, marker='', color='green',  label = 'Austria')
plt.plot( 'Year', 'SSR', data=France_ssr_5, marker='', color='skyblue', label = 'France')
plt.plot( 'Year', 'SSR', data=CH_ssr_5, marker='', color='red', label = 'Switzerland', linewidth=3)
plt.plot( 'Year', 'SSR', data=Germany_ssr_5, marker='', color='orange', label = 'Germany')
plt.plot( 'Year', 'SSR', data=Italy_ssr_5, marker='', color='grey', label = 'Italy')
    
plt.legend() 
plt.title('Evolution of the SSR over the years for the 7 main products' , fontsize= 20)
plt.xlabel("Year", fontsize= 20)
plt.ylabel("SSR value in %", fontsize= 20)
plt.show()

<div class="alert alert-block alert-info">
    
This time it is Italy that has the lowest SSR over the years. The SSR of Switzerland is still around 90% but seems to have decreased in recent years. 
    
The SSR of France is still very high, with the same possible explanations as before. 
    
Germany and Austria have similar SSR.
    
Which version do you want to keep?

## Switzerland importations and exportations network

We want to know which countries are the main trade partners of Switzerland. 

 </div> <div class="alert alert-block alert-danger">
Select all the products that Switzerland imports but does not produce.

In [None]:
CH_imports.loc[~CH_imports.Item.isin(CH_data_transformed.Item.unique())].Item.unique()

In [None]:
CH_data_transformed.Item.unique()

 </div> <div class="alert alert-block alert-danger">
    
Select only the ones that are non-processed and non-animal food

In [None]:
import_selection = [
        'Grapefruit (inc. pomelos)', 'Oranges',
       'Pineapples', 'Anise, badian, fennel, coriander', 
        'Avocados', 'Bananas','Cashew nuts, with shell', 
        'Dates', 'Eggplants (aubergines)',
        'Lemons and limes', 'Lentils','Nutmeg, mace and cardamoms',
        'Persimmons','Rice - total  (Rice milled equivalent)', 'Rice, milled',
       'Roots and tubers nes','Watermelons','Coconuts',
       'Figs', 'Mangoes, mangosteens, guavas', 
       'Plantains and others', 'Sweet potatoes','Cranberries',
       'Fruit, tropical fresh nes', 'Sesame seed', 'Sorghum',
        'Chick peas', 'Cocoa, beans', 'Ginger','Hazelnuts, with shell',
        'Nuts nes','Papayas','Quinoa', 
        'Tangerines, mandarins, clementines, satsumas',
        'Almonds, with shell', 'Bambara beans', 'Brazil nuts, shelled',
       'Cashew nuts, shelled','Mustard seed','Vanilla',
        'Cinnamon (cannella)','Cloves', 'Olives', 
        'Pistachios', 'Kola nuts', 'Areca nuts',
    

     ]

 </div> <div class="alert alert-block alert-danger">
Use this selection instead if you want to plot the biggest partner countries for products that are also produced in Switzerland (the variable below must then be changed accordingly).

In [None]:
production_selection = ['Apples', 'Apricots', 'Artichokes', 'Asparagus', 'Barley',
       'Beans, green', 'Broad beans, horse beans, dry',
       'Cabbages and other brassicas', 'Carrots and turnips',
       'Cauliflowers and broccoli', 'Cherries', 'Chestnut',
       'Chillies and peppers, green', 'Cucumbers and gherkins',
       'Currants', 'Fruit, fresh nes', 'Garlic', 'Gooseberries', 'Grapes',
       'Hops', 'Kiwi fruit', 'Leeks, other alliaceous vegetables',
       'Lettuce and chicory', 'Linseed', 'Maize',
       'Melons, other (inc.cantaloupes)', 'Millet',
       'Mushrooms and truffles', 'Oats', 'Oilseeds nes', 'Onions, dry',
       'Onions, shallots, green', 'Peaches and nectarines', 'Pears',
       'Peas, dry', 'Peas, green', 'Plums and sloes', 'Potatoes',
       'Pumpkins, squash and gourds', 'Quinces', 'Rapeseed', 'Rye',
       'Soybeans', 'Spinach', 'Strawberries', 'Sugar beet',
       'Sunflower seed', 'Tobacco, unmanufactured', 'Tomatoes',
       'Triticale', 'Vegetables, fresh nes', 'Walnuts, with shell',
       'Wheat']

 </div> <div class="alert alert-block alert-danger">
    
The cell below restricts the plot to the importations of the 7 main products produced in Switzerland (don't run it if you want to show the most imported products).

In [None]:
# run only if needed!
top6=['Maize','Sugar beet','Potatoes','Wheat','Barley','Grapes', 'Apples']
import_selection = top6

In [None]:
CH_trade_import_selection = CH_trade_network.loc[CH_trade_network.Item.isin(import_selection)]

In [None]:
CH_trade_import_selection = CH_trade_import_selection[['Element','Reporter Countries','Partner Countries', 'Item', 'Value', 'Year']]

 </div> <div class="alert alert-block alert-danger">

Sum the traded quantities of all selected items per partner country and year

In [None]:
sum_trade = CH_trade_import_selection.groupby(["Element","Partner Countries","Reporter Countries", 'Year']).agg({'Value':'sum'})\
                                    .reset_index()
sum_trade.sort_values(by='Value', ascending=False).head()


 </div> <div class="alert alert-block alert-danger">
    
Display the 20 countries from which Switzerland imported the most of these products in 2017

In [None]:
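import networkx as nx

# 20 largest import flows into Switzerland in 2017
top20_2017 = sum_trade.loc[(sum_trade.Element=='Import Quantity') & (sum_trade.Year==2017)].sort_values(by='Value', ascending=False)[:20]
G=nx.from_pandas_edgelist(top20_2017, 'Reporter Countries', 'Partner Countries', edge_attr=['Value'], create_using=nx.Graph())

# Size each partner node by its imported quantity; nodes without a value (the reporter) get the largest size
value_by_partner = top20_2017.groupby('Partner Countries').Value.sum()
node_sizes = [value_by_partner.get(n, value_by_partner.max())/50 for n in G.nodes()]

# Plot it (k is a spring-layout parameter, so it is passed to the layout)
nx.draw(G, pos=nx.spring_layout(G, k=1), with_labels=True, alpha=0.8, node_size=node_sizes)
limits=plt.axis('off')
plt.show()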


 </div> <div class="alert alert-block alert-danger">
    
 Display the 10 countries from which Switzerland imports the most of these products, year by year (interactive)

In [None]:
world_countries_path = '../data/countries.csv'
world_countries = pd.read_csv(world_countries_path)
world_countries.rename(columns={'name':'Partner Countries'}, inplace=True)


 </div> <div class="alert alert-block alert-danger">
    
Modify the names of some countries which don't correspond to their names in the trade dataset
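
The next cell patches names by row position, which breaks silently if `countries.csv` is ever reordered. A name-based mapping is one possible alternative. The following is only a minimal sketch on toy data: the source names (`'Vietnam'`, `'Russia'`), the `name_fixes` dictionary and the coordinates are illustrative assumptions, not the actual contents of the file. A left merge with `indicator=True` then lists any partner that still has no coordinates.

```python
import pandas as pd

# Toy stand-in for countries.csv (hypothetical rows and coordinates)
countries = pd.DataFrame({
    'name': ['Vietnam', 'Russia', 'France'],
    'latitude': [14.06, 61.52, 46.23],
    'longitude': [108.28, 105.32, 2.21],
})

# Hypothetical mapping: name in the coordinates file -> name in the trade dataset
name_fixes = {'Vietnam': 'Viet Nam', 'Russia': 'Russian Federation'}
countries['Partner Countries'] = countries['name'].replace(name_fixes)

# Toy stand-in for the trade table
trade = pd.DataFrame({'Partner Countries': ['Viet Nam', 'France', 'Czechia'],
                      'Value': [10, 20, 5]})

# The left merge keeps every trade row; '_merge' flags rows without coordinates
merged = trade.merge(countries.drop(columns='name'), on='Partner Countries',
                     how='left', indicator=True)
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Partner Countries'].tolist()
print(unmatched)
```

Checking `unmatched` before calling `dropna` shows which partners would otherwise disappear from the map without notice.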

In [None]:
world_countries.at[235, 'Partner Countries']='Viet Nam'
world_countries.at[27, 'Partner Countries']= 'Bolivia (Plurinational State of)'
world_countries.at[92, 'Partner Countries']= 'China, Hong Kong SAR'
world_countries.at[222, 'Partner Countries']= 'China, Taiwan Province of'
world_countries.at[45, 'Partner Countries']= 'China, mainland'
world_countries.at[37, 'Partner Countries']= 'Democratic Republic of the Congo'
world_countries.at[52, 'Partner Countries']= 'Czechia'
world_countries.at[39, 'Partner Countries']= 'Congo'
world_countries.at[41, 'Partner Countries']= "Côte d'Ivoire"
world_countries.at[207, 'Partner Countries']= 'Eswatini'
world_countries.at[105, 'Partner Countries']= 'Iran (Islamic Republic of)'
world_countries.at[142, 'Partner Countries']= 'Myanmar'
world_countries.at[140, 'Partner Countries']= 'North Macedonia'
world_countries.at[179, 'Partner Countries']= 'Palestine'
world_countries.at[119, 'Partner Countries']= 'Republic of Korea'
world_countries.at[136, 'Partner Countries']= 'Republic of Moldova'
world_countries.at[187, 'Partner Countries']= 'Russian Federation'
world_countries.at[206, 'Partner Countries']= 'Syrian Arab Republic'
world_countries.at[223, 'Partner Countries']= 'United Republic of Tanzania'
world_countries.at[227, 'Partner Countries']= 'United States of America'
world_countries.at[232, 'Partner Countries']= 'Venezuela (Bolivarian Republic of)'

In [None]:
world_countries.set_index('Partner Countries', inplace=True)
world_countries.drop(columns='country', inplace=True)

In [None]:
sum_trade_geo = sum_trade.join(world_countries,on='Partner Countries', how='left')
sum_trade_geo.dropna(how='any', inplace=True)

In [None]:
sum_import_geo= sum_trade_geo.loc[sum_trade_geo.Element=='Import Quantity']

In [None]:
# draw the circle outline on the map only when the imported quantity is non-zero
sum_import_geo = sum_import_geo.assign(stroke=sum_import_geo.Value != 0)

In [None]:
def viz_evolution(year):
    # 10 largest partner countries for the chosen year
    df_viz_evolution = sum_import_geo.loc[sum_import_geo['Year']==year].sort_values(by='Value', ascending=False)[:10]
    
    # horizontal bar chart, largest partner at the top
    plt.figure(figsize=(20,10))
    plt.barh(df_viz_evolution['Partner Countries'][::-1].values, df_viz_evolution.Value[::-1])
    
    plt.title(f'Imports of Switzerland in {year}', fontsize= 20)
    plt.xlabel("Quantity (in tonnes)", fontsize= 20)
    plt.ylabel("Country", fontsize= 20)
    plt.show()


years = sum_import_geo.sort_values(by='Year').Year.unique()
interact(viz_evolution, year = years)

 </div> <div class="alert alert-block alert-danger">
    
Now display those on a map

 </div> <div class="alert alert-block alert-danger">
    
The map is for year 2017.

In [None]:
sum_import_geo_2017 = sum_import_geo.loc[sum_import_geo['Year']==2017]

In [None]:
m2 = folium.Map(location=[48, 0], zoom_start=2)
# One circle per partner country, with a radius proportional to the imported quantity
for _, row in sum_import_geo_2017.iterrows():
    folium.Circle(
      location=[float(row['latitude']), float(row['longitude'])],
      tooltip='%s : %s' % (row['Partner Countries'], row['Value']),
      radius=int(row['Value'])*8,
      color='crimson',
      stroke=bool(row['stroke']),
      fill=True,
      fill_color='crimson'
    ).add_to(m2)

folium.LayerControl().add_to(m2)

m2

In [None]:
m2.save('../doc/map_import.html') #map saved as html file in doc folder.

In [None]:
CH_trade_selection_sum = CH_trade2.loc[CH_trade2.Item.isin(import_selection)].groupby(['Element','Year']) \
                                .agg({'Value':'sum'}) \
                                .rename(columns={'Value':'Sum'}) \
                                .reset_index()

 </div> <div class="alert alert-block alert-danger">
    
`top6` is the list used by the cells below to produce the plot; replace it with `top_traditional` or `top_fancy` to plot those selections instead.

In [None]:
top6 = ['Apples','Wheat','Potatoes', 'Maize', 'Barley','Grapes']

In [None]:
top_traditional = ['Cabbages and other brassicas', 'Carrots and turnips', 'Cauliflowers and broccoli', 'Leeks, other alliaceous vegetables', 'Lettuce and chicory', 'Onions, dry','Onions, shallots, green', 'Apples','Wheat','Potatoes', 'Maize', 'Grapes', 'Barley', 'Sugar beet']


In [None]:
top_fancy = ['Apricots', 'Artichokes', 'Asparagus',
       'Cherries',
       'Chillies and peppers, green', 'Cucumbers and gherkins',
       'Garlic',
       'Kiwi fruit', 'Melons, other (inc.cantaloupes)',
       'Mushrooms and truffles',
       'Peaches and nectarines', 'Pears',
       'Spinach', 'Strawberries',
       'Tomatoes']

In [None]:
CH_top6_production = CH_data_transformed.loc[CH_data_transformed.Item.isin(top6)].loc[CH_data_transformed.Input=='Country production']
CH_top6_production_sum = CH_top6_production.groupby(['Area', 'Element','Year']) \
                                .agg({'Value':'sum'}) \
                                .rename(columns={'Value':'Sum'}) \
                                .reset_index()

CH_top6_importation = CH_data_transformed.loc[CH_data_transformed.Item.isin(top6)].loc[CH_data_transformed.Input=='Importation']
CH_top6_importation_sum = CH_top6_importation.groupby(['Area', 'Element','Year']) \
                                .agg({'Value':'sum'}) \
                                .rename(columns={'Value':'Sum'}) \
                                .reset_index()

In [None]:
CH_top6_exportation = CH_data_transformed_exportations.loc[CH_data_transformed_exportations.Item.isin(top6)]
CH_top6_exportation_sum = CH_top6_exportation.groupby(['Area', 'Element','Year']) \
                                .agg({'Value':'sum'}) \
                                .rename(columns={'Value':'Sum'}) \
                                .reset_index()


In [None]:
crops_sum_item = raw_CH_crops_dataset.groupby(['Area', 'Element','Year', 'Item']) \
                                .agg({'Value':'sum'}) \
                                .rename(columns={'Value':'Sum'}) \
                                .reset_index()

In [None]:
# Swiss figures for the selected items, restricted to 1986-2017
area_yield_top6 = crops_sum_item.loc[
    (crops_sum_item.Area=='Switzerland')
    & (crops_sum_item.Item.isin(top6))
    & (crops_sum_item.Year>=1986)
    & (crops_sum_item.Year<=2017)]

In [None]:
area_yield_top6_sum = area_yield_top6.groupby(['Element', 'Year'])\
                                .agg({'Sum':'sum'})\
                                .reset_index()

In [None]:
CH_urban_land = raw_land_cover_dataset.loc[
    (raw_land_cover_dataset.Area=='Switzerland')
    & (raw_land_cover_dataset.Item=='Artificial surfaces (including urban and associated areas)')
    & (raw_land_cover_dataset.Element=='Area from CCI_LC'),
    'Surface']
CH_urban_land=CH_urban_land*1000

In [None]:
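# Pad with zeros (6 leading, 2 trailing) so the series has one value per plotted year
start_CH_urban_land = pd.Series([0,0,0,0,0,0])
end_CH_urban_land = pd.Series([0,0])
# pd.concat replaces Series.append, which was removed in pandas 2.0
CH_urban_land_completed = pd.concat([start_CH_urban_land, CH_urban_land, end_CH_urban_land], ignore_index=True)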

In [None]:
fig, ax1 = plt.subplots(figsize=(12,8))

ind = CH_top6_production_sum.Year
width = 0.35       # the width of the bars: can also be len(x) sequence

p1 = ax1.bar(ind, CH_top6_production_sum.Sum, width)
p2 = ax1.bar(ind, CH_top6_importation_sum.Sum, width,
             bottom=CH_top6_production_sum.Sum)
p3 = ax1.bar(ind, CH_top6_exportation_sum.Sum, width, bottom=CH_top6_importation_sum.Sum+CH_top6_production_sum.Sum)
ax2=ax1.twinx()
p4 = ax2.plot(ind, data_demography, color='red')

ax3 = ax1.twinx()
ax3.spines["right"].set_position(("axes", 1.2))
p5 = ax3.plot(ind, area_yield_top6_sum.loc[area_yield_top6_sum.Element=='Area harvested'].Sum, color='lime')

ax4 = ax3.twiny()
# hide the ticks and labels of the extra x-axis (booleans, not the deprecated 'off' strings)
ax4.tick_params(axis='x', which='both', top=False, bottom=False, labelbottom=False, labeltop=False)
p6 = ax4.plot(ind, CH_urban_land_completed, color='black')

ax1.set_ylabel('Quantity [tonnes]')
ax2.set_ylabel('Population')
ax3.set_ylabel('Area harvested [Ha]')
ax4.set_ylabel('Urban area')
ax1.set_title('Production, imports and exports of the main products, with population')
plt.legend((p1[0], p2[0], p3[0], p4[0], p5[0], p6[0]), ('Production', 'Importation', 'Exportation', 'Population', 'Area harvested', 'Urban area'))

plt.show()

In [None]:
fig, ax1 = plt.subplots(figsize=(12,8))

ind = CH_trade_selection_sum.Year.unique()
width = 0.35       # the width of the bars: can also be len(x) sequence

p1 = ax1.bar(ind, CH_trade_selection_sum.loc[CH_trade_selection_sum.Element=='Import Quantity'].Sum, width, color='darkorange')
p2 = ax1.bar(ind, CH_trade_selection_sum.loc[CH_trade_selection_sum.Element=='Export Quantity'].Sum, width, color='green',
             bottom=CH_trade_selection_sum.loc[CH_trade_selection_sum.Element=='Import Quantity'].Sum)

ax2=ax1.twinx()
p3 = ax2.plot(ind, data_demography, color='blue')



ax1.set_ylabel('Quantity [tonnes]')
ax2.set_ylabel('Population')
ax1.set_title('Imports and exports of more exotic products, with population')
plt.legend((p1[0], p2[0], p3[0]), ('Importation', 'Exportation', 'Population'))

plt.show()


We can build 3 different network graphs, each weighted in a different way:

- by the quantity exchanged
- by the number of times two countries are linked
- by the variety of products exchanged

I chose the first one, so I can drop the year, unit and item columns. The network will show the biggest partners from 1985 to 2016 for all products.

The network graph is also directed (imports point towards Switzerland and exports towards the partner country).
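
To make the three weighting options concrete, here is a minimal sketch that computes all of them in one `groupby`. The toy records and the output column names (`quantity`, `n_links`, `n_items`) are assumptions for illustration; only the input column names follow the trade dataset used in this notebook.

```python
import pandas as pd

# Toy trade records (hypothetical values, same column names as the trade dataset)
trade = pd.DataFrame({
    'Element': ['Import Quantity'] * 4 + ['Export Quantity'],
    'Partner Countries': ['France', 'France', 'Italy', 'France', 'Italy'],
    'Item': ['Wheat', 'Apples', 'Wheat', 'Wheat', 'Apples'],
    'Year': [2016, 2016, 2016, 2017, 2017],
    'Value': [100, 50, 30, 120, 10],
})

keys = ['Element', 'Partner Countries']
edge_weights = trade.groupby(keys).agg(
    quantity=('Value', 'sum'),     # total quantity exchanged
    n_links=('Value', 'count'),    # number of records linking the two countries
    n_items=('Item', 'nunique'),   # variety of products exchanged
).reset_index()
print(edge_weights)
```

Any of the three columns could then be passed as the edge attribute when building the graph; the rest of this section uses the summed quantity.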

In [None]:
#compute the weights
weights= CH_trade_network.groupby(["Element","Partner Countries","Reporter Countries"]).agg({'Value':'sum'})\
                                    .reset_index()

weights.sort_values(by='Value', ascending = False).head(20)

<div class="alert alert-block alert-success">

France, Italy, Germany, Spain and the Netherlands are the countries from which Switzerland imported the largest quantities over the years. 
France, Germany, Austria, Italy and the United States are the main destinations of Swiss exports. 
    

In [None]:
# Helper function for printing various graph properties
def describe_graph(G):
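    # nx.info was removed in NetworkX 3.0, so print the basic counts directly
    print("Nodes: %d, Edges: %d" % (G.number_of_nodes(), G.number_of_edges()))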
    if nx.is_connected(G):
        print("Avg. Shortest Path Length: %.4f" %nx.average_shortest_path_length(G))
        print("Diameter: %.4f" %nx.diameter(G)) # Longest shortest path
    else:
        print("Graph is not connected")
        print("Diameter and Avg shortest path length are not defined!")
    print("Sparsity: %.4f" %nx.density(G))  # #edges/#edges-complete-graph
    # #closed-triplets(3*#triangles)/#all-triplets
    print("Global clustering coefficient aka Transitivity: %.4f" %nx.transitivity(G))

In [None]:
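import networkx as nx

# Undirected graph: import and export rows for the same partner collapse into a single edge
G=nx.from_pandas_edgelist(weights, 'Reporter Countries', 'Partner Countries', edge_attr=['Value'], create_using=nx.Graph())
 
# Plot it (k is a spring-layout parameter, so it is passed to the layout rather than to nx.draw)
nx.draw(G, pos=nx.spring_layout(G, k=1), with_labels=True, alpha=0.8)
plt.show()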


In [None]:
print("Nodes: %d, Edges: %d" % (G.number_of_nodes(), G.number_of_edges()))

In [None]:
describe_graph(G)  

To make it more readable, we keep only the 20 largest import and export flows.

In [None]:
# Keep the 20 largest flows (this overwrites `weights`)
weights = weights.sort_values(by='Value', ascending = False).head(20)

G2=nx.from_pandas_edgelist(weights, 'Reporter Countries', 'Partner Countries', edge_attr=['Value'], create_using=nx.Graph())
 
# Plot it (k is passed to the spring layout, not to nx.draw)
nx.draw(G2, pos=nx.spring_layout(G2, k=0.05), with_labels=True, alpha=0.8)
plt.show()

In [None]:
print("Nodes: %d, Edges: %d" % (G2.number_of_nodes(), G2.number_of_edges()))

In [None]:
describe_graph(G2) 

<div class="alert alert-block alert-warning">
    Not finished: I could not make the graph directed because imports and exports sit in the same column of the table. They should be split into two columns, along these lines:
create new_df with a from column, a to column and a weight column
    
    iterate over the rows of the old dataframe:
    if imported : to = Switzerland and from = Partner country and weight = value
    if exported : to = Partner country and from = Switzerland and weight = value 
    and then build 
    G=nx.from_pandas_edgelist(new_df, 'from', 'to', edge_attr=['weight'], create_using=nx.DiGraph())
    
    That would be correct,
    and we would keep only the 20 largest flows, otherwise the graph is unreadable. 
    Do you think this is worth doing, or will we not use it?
    
    

In [None]:
# Keep only import and export rows, then orient each flow:
# imports go from the partner to Switzerland, exports from Switzerland to the partner
flows = weights.loc[weights.Element.isin(["Import Quantity", "Export Quantity"])]
is_import = (flows.Element == "Import Quantity").values

trade_network_df = pd.DataFrame({
    'To': np.where(is_import, "Switzerland", flows["Partner Countries"].values),
    'From': np.where(is_import, flows["Partner Countries"].values, "Switzerland"),
    'weight': flows.Value.values,
})
trade_network_df = trade_network_df.sort_values(by='weight', ascending = False)
trade_network_df["Logarithmic weight"] = np.log(trade_network_df.weight)

<div class="alert alert-block alert-info">

Main import partners are: France, Germany, Italy, Spain, Netherlands, Brazil and Austria.
    
Main export partners are: France, Germany, Austria, United States and Italy.


In [None]:
# Directed edges run from the exporting country ('From') to the importing one ('To')
G3=nx.from_pandas_edgelist(trade_network_df[:10], 'From', 'To', edge_attr=['Logarithmic weight'], create_using=nx.DiGraph())

fig = plt.figure()
nx.draw_networkx(G3, pos=nx.spring_layout(G3, k=0.05), node_size=500, with_labels=True, alpha=0.8)
limits=plt.axis('off')
fig.set_facecolor("white")
plt.show()

# Trade partners of Switzerland


In [None]:
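# Directed edges run from the exporting country ('From') to the importing one ('To')
G4=nx.from_pandas_edgelist(trade_network_df[:20], 'From', 'To', edge_attr=['Logarithmic weight'], create_using=nx.DiGraph())
# Edge thickness follows the logarithmic weight (the divisor is an arbitrary scaling choice)
edge_widths = [d['Logarithmic weight']/4 for _, _, d in G4.edges(data=True)]
 
fig = plt.figure()
nx.draw_shell(G4, node_size=500, with_labels=True, alpha=0.8, width=edge_widths)
limits=plt.axis('off')
fig.set_facecolor("white")
plt.show()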

<div class="alert alert-block alert-info">
The weight of the link between countries is given by the thickness of the edge. 

## E. Compute the predictive model (ReadMe)