# ADA Project - Insight into Swiss agricultural production

We will focus on Switzerland compared to its neighbours. We would like to know whether Switzerland could be self-sufficient in terms of food production.

- What does Switzerland produce, and in what quantities?
- How much does it import and export?
- Are all Swiss agricultural areas optimally harvested?
- How do these quantities relate to population size?
- How is Swiss productivity evolving, and is it correlated with external factors such as temperature or fertilizer use?

Then we will compare Switzerland with its neighbours. Does CH import more than its neighbours (because of its small size)?

Is food self-sufficiency realistic for CH? How many farmers would it need?


In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import folium

## Data loading - Crops 

This dataset is our starting point, as it is the one we chose from the course's list.

The file contains data about Switzerland and its neighbours (Italy, Germany, France, Austria and Liechtenstein).

In [None]:
raw_CH_crops_dataset = pd.read_csv('../data/FAOSTAT_data_crops_CHandNeighbours.csv')

Let's explore the structure of our dataset :

In [None]:
raw_CH_crops_dataset.head()

Keep only relevant information.

In [None]:
raw_CH_crops_dataset = raw_CH_crops_dataset[['Domain', 'Area', 'Element', 'Item', 'Year', 'Unit', 'Value', 'Flag Description']]

In [None]:
# Drop the rows flagged as 'Data not available' (na=False keeps rows whose flag description is empty)
raw_CH_crops_dataset = raw_CH_crops_dataset.loc[~raw_CH_crops_dataset['Flag Description'].str.contains('Data not available', na=False)]

In [None]:
raw_CH_crops_dataset.head()
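The flag-based row filter can be illustrated on a toy frame. This is a minimal sketch: the rows and flag strings below are invented for illustration and are not taken from the FAOSTAT file.

```python
import pandas as pd

# Toy data (hypothetical values): one row carries the 'Data not available' flag
toy = pd.DataFrame({
    'Item': ['Apples', 'Wheat', 'Oats'],
    'Value': [100.0, None, 30.0],
    'Flag Description': ['Official data', 'Data not available', 'FAO estimate'],
})

# Boolean mask of the rows whose flag mentions missing data;
# na=False treats an empty flag description as "keep the row"
missing = toy['Flag Description'].str.contains('Data not available', na=False)

# Keep every other row and rebuild a clean index
toy_clean = toy.loc[~missing].reset_index(drop=True)
print(toy_clean)
```

Selecting with a boolean mask returns a new DataFrame, so no in-place modification of a slice is needed.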

Let's also load the flags dataset in case we need it later (it is very small, so loading it costs nothing).

In [None]:
flags = pd.read_csv('../data/FAOSTAT_data_flags.csv')
flags

In [None]:
print("Size of the DataFrame: {s}\n".format(s=raw_CH_crops_dataset.shape))
print("Variable types present in DataFrame: \n{t}".format(t=raw_CH_crops_dataset.dtypes))

Null values investigation:

In [None]:
print(raw_CH_crops_dataset.isnull().values.any(axis=0)) 

No NaN values found. Perfect.
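The check above prints one boolean per column but without the column names. As a small sketch on invented data (the toy frame below is an assumption, not the crops dataset), counting nulls per column keeps the names visible:

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values) with a single missing entry
toy = pd.DataFrame({
    'Area': ['Switzerland', 'France', 'Italy'],
    'Value': [1.0, np.nan, 3.0],
})

# Number of null entries per column, indexed by column name
null_counts = toy.isnull().sum()
print(null_counts)

# True if at least one column contains a null value
has_nulls = bool(null_counts.any())
print(has_nulls)
```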

What about the categories listed in our columns?

In [None]:
print(raw_CH_crops_dataset['Domain'].unique())
print(raw_CH_crops_dataset['Area'].unique())
print(raw_CH_crops_dataset['Element'].unique())
print(raw_CH_crops_dataset['Item'].unique())
print(raw_CH_crops_dataset['Year'].unique())
print(raw_CH_crops_dataset['Unit'].unique())
print(raw_CH_crops_dataset['Flag Description'].unique())

**Quick view of the crops dataset ready to be used**

In [None]:
raw_CH_crops_dataset.head()

## **Crops plots :** what we can already see/investigate with this first dataset

Even if we probably won't use these plots in the final presentation or analysis, they help us see what is inside our data. They are quick to make and very visual.

### Plot production of all countries over time for a selected crop

This plot is interactive. It lets you choose an item (apples, berries, ...) and shows its production over the years for Switzerland and its neighbours (Austria, France, Germany and Italy).

In [None]:
import ipywidgets as widgets
from ipywidgets import interact, interactive, fixed, interact_manual
# All these library imports will probably move to the beginning in the final version of the notebook,
# but for now we keep them where we use them, since we don't know yet what we will keep.

In [None]:
#Interactive visualization

#Plot the production of selected item for all countries over years
def viz_evolution(item):
    df_viz_evolution = raw_CH_crops_dataset.loc[raw_CH_crops_dataset['Element']=='Production'].loc[raw_CH_crops_dataset['Item']==item]
    
    # multiple line plot
    plt.figure(figsize=(20,10))
    plt.plot( 'Year', 'Value', data=df_viz_evolution.loc[df_viz_evolution['Area']=='Austria'], marker='', color='green',  label = 'Austria')
    plt.plot( 'Year', 'Value', data=df_viz_evolution.loc[df_viz_evolution['Area']=='France'], marker='', color='skyblue', label = 'France')
    plt.plot( 'Year', 'Value', data=df_viz_evolution.loc[df_viz_evolution['Area']=='Switzerland'], marker='', color='red', label = 'Switzerland', linewidth=3)
    plt.plot( 'Year', 'Value', data=df_viz_evolution.loc[df_viz_evolution['Area']=='Germany'], marker='', color='orange', label = 'Germany')
    plt.plot( 'Year', 'Value', data=df_viz_evolution.loc[df_viz_evolution['Area']=='Italy'], marker='', color='grey', label = 'Italy')
    
    plt.legend() 
    plt.title(f'Production of {item} in Switzerland and its neighbours throughout years', fontsize= 20)
    plt.xlabel("Year", fontsize= 20)
    plt.ylabel("Values", fontsize= 20)
    plt.show()
   
items = raw_CH_crops_dataset.Item.unique()
interact(viz_evolution, item = items)    
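As an alternative to one `plt.plot` call per country, the filtered data can be reshaped so that each country becomes a column. This is only a sketch on invented numbers; the toy frame mimics the column names used above but is not the real dataset.

```python
import pandas as pd

# Toy data (hypothetical values) with the same column names as the crops dataset
toy = pd.DataFrame({
    'Area': ['Switzerland', 'Switzerland', 'France', 'France'],
    'Element': ['Production'] * 4,
    'Item': ['Apples'] * 4,
    'Year': [2015, 2016, 2015, 2016],
    'Value': [10.0, 12.0, 100.0, 90.0],
})

# Keep the production rows of one item
selected = toy[(toy['Element'] == 'Production') & (toy['Item'] == 'Apples')]

# One row per year, one column per country
wide = selected.pivot(index='Year', columns='Area', values='Value')
print(wide)
```

Calling `wide.plot(figsize=(20, 10))` on such a frame would draw one line per country without listing the countries by hand.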

### Plot production/area_harvested for all items of all countries over time.

This plot is interactive. It lets you choose an element (production, area harvested or yield) and shows the sum over all items for each country over the years (Switzerland and its neighbours).

In [None]:
# Sum the values of all items for each country, element and year
crops_sum = raw_CH_crops_dataset.groupby(['Area', 'Element','Year']) \
                                .agg({'Value':'sum'}) \
                                .rename(columns={'Value':'Sum'}) \
                                .reset_index()
crops_sum.head() # Sum of area/yield/production of items by country and year

In [None]:
#Interactive visualization

#Plot the selected element (sum of all items) for all countries over years
def viz_sum_evolution(element):
    df_viz_sum_evolution = crops_sum.loc[crops_sum['Element']== element]
    
    # multiple line plot
    plt.figure(figsize=(20,10))
    plt.plot( 'Year', 'Sum', data=df_viz_sum_evolution.loc[df_viz_sum_evolution['Area']=='Austria'], marker='', color='green',  label = 'Austria')
    plt.plot( 'Year', 'Sum', data=df_viz_sum_evolution.loc[df_viz_sum_evolution['Area']=='France'], marker='', color='skyblue', label = 'France')
    plt.plot( 'Year', 'Sum', data=df_viz_sum_evolution.loc[df_viz_sum_evolution['Area']=='Switzerland'], marker='', color='red', label = 'Switzerland', linewidth=3)
    plt.plot( 'Year', 'Sum', data=df_viz_sum_evolution.loc[df_viz_sum_evolution['Area']=='Germany'], marker='', color='orange', label = 'Germany')
    plt.plot( 'Year', 'Sum', data=df_viz_sum_evolution.loc[df_viz_sum_evolution['Area']=='Italy'], marker='', color='grey', label = 'Italy')
    
    plt.legend() 
    plt.title(f'{element} of all items in Switzerland and its neighbours throughout years', fontsize= 20)
    plt.xlabel("Year", fontsize= 20)
    plt.ylabel("Values", fontsize= 20)
    plt.show()
   
elements = crops_sum.Element.unique()
interact(viz_sum_evolution, element = elements)  

## Data loading - Land use indicators 

File contains data about Switzerland and neighbours (Italy, Germany, France, Austria and Liechtenstein).

Data exploration and pre-processing are very similar to those of the first dataset. We will therefore not describe every step in as much detail as before.

In [None]:
raw_land_use_dataset = pd.read_csv('../data/FAOSTAT_data_LandUseIndicators.csv')

In [None]:
raw_land_use_dataset.head()

In [None]:
raw_land_use_dataset = raw_land_use_dataset[['Domain', 'Area', 'Element', 'Item', 'Year', 'Unit', 'Value', 'Flag Description']]

In [None]:
print("Size of the DataFrame: {s}\n".format(s=raw_land_use_dataset.shape))
print("Variable types present in DataFrame: \n{t}".format(t=raw_land_use_dataset.dtypes))

In [None]:
print(raw_land_use_dataset.isnull().values.any(axis=0))  # --> PERFECT!

In [None]:
print(raw_land_use_dataset['Domain'].unique())
print(raw_land_use_dataset['Area'].unique())
print(raw_land_use_dataset['Element'].unique())
print(raw_land_use_dataset['Item'].unique())
print(raw_land_use_dataset['Year'].unique())
print(raw_land_use_dataset['Unit'].unique())
print(raw_land_use_dataset['Flag Description'].unique())

## **Land use indicators plots :** what we can already see/investigate with this second dataset

Even if we probably won't use these plots in the final presentation or analysis, they help us see what is inside our data. They are quick to make and very visual.

### Plot the lands distribution in Switzerland

We would like to refine these data (with additional datasets) by also including urban areas in the distribution.

In [None]:
import matplotlib.pyplot as plt

# DataFrames to plot
df_land = raw_land_use_dataset.loc[raw_land_use_dataset['Area']=='Switzerland'].loc[raw_land_use_dataset['Year']==2016].loc[raw_land_use_dataset['Element']=='Share in Land area']
df_agri = raw_land_use_dataset.loc[raw_land_use_dataset['Area']=='Switzerland'].loc[raw_land_use_dataset['Year']==2016].loc[raw_land_use_dataset['Element']=='Share in Agricultural land']

# Pie plot #1
labels1 = df_land.Item
sizes1 = df_land.Value
explode = (0, 0, 0.1, 0)  # only "explode" the 3rd slice

fig1, ax1 = plt.subplots()
ax1.pie(sizes1, explode=explode,labels=labels1, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax1.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
ax1.title.set_text('Distribution of lands in Switzerland, year 2016')
fig1.set_facecolor('white')

# Pie plot #2
labels2 = df_agri.Item
sizes2 = df_agri.Value
fig2, ax2 = plt.subplots()  # separate figure, so that fig1 is not overwritten
ax2.pie(sizes2, labels=labels2, autopct='%1.1f%%',
        shadow=True, startangle=90)
ax2.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
ax2.title.set_text('Distribution of agricultural lands in Switzerland, year 2016')

# General settings
fig2.set_facecolor('white')
plt.show()

## Data loading - Demographic data

File contains data about Switzerland and neighbours (Italy, Germany, France, Austria and Liechtenstein).

Data exploration and pre-processing are very similar to those of the first dataset. We will therefore not describe every step in as much detail as before.

In [None]:
demography = pd.read_csv('../data/FAOSTAT_data_demography.csv')

In [None]:
demography

In [None]:
for col in demography:
    print (demography[col].unique())

In [None]:
demography = demography[['Area', 'Year', 'Value']]
demography

Since the values are given in units of 1000 persons, we multiply them by 1000 to express the population as a number of individuals.

In [None]:
# Build the Population column on a new DataFrame (no chained-assignment warning) and drop the raw values
demography = demography.assign(Population=demography['Value'] * 1000).drop(columns='Value')
demography

## Data loading - Import data for Switzerland

File contains data about Switzerland only.

Data exploration and pre-processing are very similar to those of the first dataset. We will therefore not describe every step in as much detail as before.

In [None]:
CH_imports = pd.read_csv('../data/FAOSTAT_data_11-23-2019.csv')

In [None]:
CH_imports.shape

In [None]:
CH_imports

In [None]:
CH_imports.dtypes

In [None]:
CH_imports.Year.min()

In [None]:
for col in CH_imports:
    print (CH_imports[col].unique())

In [None]:
unofficial_stats_index = CH_imports.loc[CH_imports.Flag=='*'].index

In [None]:
# Drop the unofficial data
CH_imports = CH_imports.drop(index = unofficial_stats_index)

In [None]:
# Select only the data with Unit 'tonnes'
CH_imports = CH_imports.loc[CH_imports.Unit=='tonnes']

In [None]:
CH_imports = CH_imports[['Partner Countries', 'Item', 'Year', 'Unit', 'Value']]

In [None]:
# Sum the importations over all the partner countries
CH_imports = CH_imports.groupby(['Item', 'Year']).agg({'Value':'sum'})\
                                    .reset_index()
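The aggregation over partner countries can be checked on a few invented rows. This is a sketch under the assumption that the import file has one row per partner country, item and year, as the column names above suggest; the numbers are made up.

```python
import pandas as pd

# Toy import rows (hypothetical values): one row per partner country, item and year
toy_imports = pd.DataFrame({
    'Partner Countries': ['France', 'Germany', 'France', 'Italy'],
    'Item': ['Wheat', 'Wheat', 'Potatoes', 'Potatoes'],
    'Year': [2016, 2016, 2016, 2016],
    'Unit': ['tonnes'] * 4,
    'Value': [50.0, 70.0, 5.0, 8.0],
})

# Total imports per item and year, summed over all partner countries
totals = toy_imports.groupby(['Item', 'Year'], as_index=False)['Value'].sum()
print(totals)
```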

In [None]:
CH_crops = raw_CH_crops_dataset[['Area', 'Element', 'Item', 'Year', 'Unit', 'Value']]

Combine production and importation data

In [None]:
# Merge importations data with production data
CH_data = CH_crops.loc[CH_crops.Area=='Switzerland'].loc[CH_crops.Element=='Production'].loc[CH_crops.Year>= 1986]\
                                    .merge(CH_imports,on=['Item', 'Year'], how='left', suffixes=('_crops', '_imports'))

In [None]:
CH_data
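With a left merge, produced items that have no import record end up with `NaN` in the import column. The sketch below, on invented data, uses the `indicator` option of `DataFrame.merge` to list such items; the frames and values are assumptions made for illustration.

```python
import pandas as pd

# Toy production and import tables (hypothetical values)
production = pd.DataFrame({
    'Item': ['Wheat', 'Potatoes', 'Oats'],
    'Year': [2016, 2016, 2016],
    'Value': [500.0, 400.0, 20.0],
})
imports = pd.DataFrame({
    'Item': ['Wheat', 'Potatoes'],
    'Year': [2016, 2016],
    'Value': [120.0, 13.0],
})

# Left merge keeps every production row; '_merge' records where each row was found
merged = production.merge(imports, on=['Item', 'Year'], how='left',
                          suffixes=('_crops', '_imports'), indicator=True)

# Items that are produced but have no matching import row
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Item'].tolist()
print(unmatched)
```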

### Plot production and import of items in Switzerland over years.

This plot is interactive. It lets you choose an item (apples, berries, ...) and shows its production and importation in Switzerland over the years.

In [None]:
#Interactive visualization

#Plot the production and imports of the selected item in Switzerland over years
def viz_evolution(item):
    df_viz_evolution = CH_data.loc[CH_data['Item']==item]
    
    # multiple line plot
    plt.figure(figsize=(20,10))
    plt.plot( 'Year', 'Value_crops', data=df_viz_evolution, marker='', color='red', label = 'crops', linewidth=3)
    plt.plot('Year', 'Value_imports', data=df_viz_evolution, marker='', color='blue', label = 'imports', linewidth=3) 
    plt.legend() 
    plt.title(f'Production and imports of {item} in Switzerland throughout years', fontsize= 20)
    plt.xlabel("Year", fontsize= 20)
    plt.ylabel("Values [tonnes]", fontsize= 20)
    plt.show()
   
items = CH_data.Item.unique()
interact(viz_evolution, item = items)    

**Most produced and imported products :**

- Most produced crops products

In [None]:
CH_data.loc[CH_data.Year == 2016].sort_values(by='Value_crops', ascending = False).head(20)

- Most imported crops products

In [None]:
CH_data.loc[CH_data.Year == 2016].sort_values(by='Value_imports', ascending = False).head(20)

### Plot most produced and imported items in Switzerland, year 2016.

This plot is interactive. Shows values upon cursor selection.

In [None]:
import plotly # conda install -c anaconda plotly #AND# jupyter labextension install @jupyterlab/plotly-extension
import plotly.graph_objects as go
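# Production and import values (in that order) of each item for 2016, selected by column name
CH_2016 = CH_data.loc[CH_data.Year == 2016]
y_wheat = CH_2016.loc[CH_2016.Item=='Wheat', ['Value_crops', 'Value_imports']].values[0]
y_potatoes = CH_2016.loc[CH_2016.Item=='Potatoes', ['Value_crops', 'Value_imports']].values[0]
y_beet = CH_2016.loc[CH_2016.Item=='Sugar beet', ['Value_crops', 'Value_imports']].values[0]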
#y_wheat = CH_data.loc[CH_data.Year == 2016].loc[CH_data.Item=='Wheat'].Value_crops.values
#y_potatoes = CH_data.loc[CH_data.Year == 2016].loc[CH_data.Item=='Potatoes'].Value_crops.values
#y_beet = CH_data.loc[CH_data.Year == 2016].loc[CH_data.Item=='Sugar beet'].Value_crops.values
x=['Produced', 'Imported']
fig = go.Figure(go.Bar(x=x, y=y_wheat, name='Wheat'))
fig.add_trace(go.Bar(x=x, y=y_potatoes, name='Potatoes'))
fig.add_trace(go.Bar(x=x, y=y_beet, name='Sugar beet'))

fig.update_layout(
    title='Most Imported and Produced items in Switzerland for year 2016',
    yaxis_title="Values [tonnes]",
    barmode='stack', 
    xaxis={'categoryorder':'category ascending'},
    font=dict(
        family="Courier New, monospace",
        size=18,
        color="#7f7f7f")
    )
fig.show()


### Plot importation/production of potatoes in Switzerland throughout years

This plot is interactive. Shows values upon cursor selection

In [None]:
import plotly.graph_objects as go

# Potato rows sorted by year, so that x and y values stay aligned
potatoes = CH_data.loc[CH_data.Item=='Potatoes'].sort_values('Year')
fig = go.Figure()
fig.add_trace(go.Scatter(x=potatoes.Year, y=potatoes.Value_crops, fill='tozeroy', name='Produced')) # fill down to the x-axis
fig.add_trace(go.Scatter(x=potatoes.Year, y=potatoes.Value_imports, fill='tozeroy', name='Imported')) # fill down to the x-axis
fig.update_layout(
    title="Potatoes importations and productions throughout years in Switzerland",
    yaxis_title="Values [tonnes]",
    xaxis_title='Years'
    )
fig.show()


### Plot total importation/production of all items in Switzerland throughout years

This plot is interactive. Shows values upon cursor selection

In [None]:
total_crops_imports = CH_data.groupby('Year').agg({'Value_crops':'sum', 'Value_imports':'sum'})

In [None]:
total_crops_imports.Value_crops.values

In [None]:
fig = go.Figure()
# The groupby index holds the years in sorted order, so it is used directly as the x-axis
fig.add_trace(go.Scatter(x=total_crops_imports.index, y=total_crops_imports.Value_crops.values, fill='tozeroy', name='Produced')) # fill down to the x-axis
fig.add_trace(go.Scatter(x=total_crops_imports.index, y=total_crops_imports.Value_imports.values, fill='tozeroy', name='Imported')) # fill down to the x-axis
fig.update_layout(
    title="Total importations and productions throughout years in Switzerland",
    yaxis_title="Values [tonnes]",
    xaxis_title='Years'
    )
fig.show()
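Comparing production with imports leads naturally to a self-sufficiency indicator. A self-sufficiency ratio is often computed as production divided by domestic supply (production + imports - exports). Since exports are not loaded in this notebook, the sketch below uses a simplified ratio that ignores them; this simplification and the toy numbers are assumptions, not results.

```python
import pandas as pd

# Toy yearly totals (hypothetical values, in tonnes)
toy = pd.DataFrame({
    'Year': [2015, 2016],
    'Value_crops': [300.0, 250.0],
    'Value_imports': [100.0, 250.0],
})

# Simplified ratio: share of production in (production + imports).
# Exports are ignored here, which is an assumption of this sketch.
toy['ratio'] = toy['Value_crops'] / (toy['Value_crops'] + toy['Value_imports'])
print(toy)
```

A ratio close to 1 would mean that most of the supply is produced domestically, while a ratio of 0.5 means that production and imports are equal.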


In [None]:
CH_data2 = CH_data.copy().rename(columns={'Value_crops':'Country production', 'Value_imports':'Importation'})
CH_data_transformed = pd.melt(CH_data2, value_vars=['Country production', 'Importation'], id_vars=['Area', 'Element','Item','Year','Unit'], var_name='Input', value_name='Value')

In [None]:
CH_data_transformed.loc[CH_data_transformed.Item=='Potatoes']
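The reshaping done with `pd.melt` turns the two value columns into a single `Value` column plus an `Input` label. A minimal sketch on invented data (the numbers are assumptions) shows the effect:

```python
import pandas as pd

# Toy wide table (hypothetical values): one column per kind of input
wide = pd.DataFrame({
    'Item': ['Potatoes', 'Wheat'],
    'Year': [2016, 2016],
    'Country production': [400.0, 500.0],
    'Importation': [13.0, 120.0],
})

# Long format: one row per (item, year, input) combination
long = pd.melt(wide, id_vars=['Item', 'Year'],
               value_vars=['Country production', 'Importation'],
               var_name='Input', value_name='Value')
print(long)
```

The long format is what plotting functions that group by a column (for example by `Input`) expect.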

### Plot --> I'll let you write this one, Max; I'm not sure here whether it is only CH, etc.

This plot is interactive. Shows values upon cursor selection

In [None]:
CH_restrained = CH_data_transformed.loc[CH_data_transformed.Item.isin(['Apples','Wheat','Potatoes', 'Maize', 'Oats'])]

In [None]:
# Just trying a plot: area plot of production and importation for the selected items
import plotly.express as px
fig = px.area(CH_restrained, x="Year", y="Value", color='Item',
      line_group="Input")
fig.update_layout(
    title="Production and importation of selected items in Switzerland",
    yaxis_title="Values [tonnes]",
    xaxis_title='Years'
    )
fig.show()

**Load Switzerland temperatures**

In [None]:
CH_temperatures = pd.read_csv('../data/10.18751-Climate-Timeseries-CHTM-1.1-swiss.txt', sep="\t", header=0, skiprows=15)

In [None]:
CH_temperatures = CH_temperatures.loc[CH_temperatures.time>=1986].loc[CH_temperatures.time<=2017]

In [None]:
CH_temperatures = CH_temperatures.iloc[:,-3:]

In [None]:
CH_temperatures

### Plot : Is there a correlation between production and temperature?


In [None]:
years = np.sort(CH_data.Year.unique())
fig, ax1 = plt.subplots()
data1 = CH_data.loc[CH_data.Item=='Potatoes'].Value_crops
data2 = CH_temperatures.year

color = 'tab:red'
ax1.set_xlabel('year')
ax1.set_ylabel('production', color=color)
ax1.plot(years, data1, color=color)
ax1.tick_params(axis='y', labelcolor=color)

ax2 = ax1.twinx()  # instantiate a second axes that shares the same x-axis

color = 'tab:blue'
ax2.set_ylabel('temperature', color=color)  # we already handled the x-label with ax1
ax2.plot(years, data2, color=color)
ax2.tick_params(axis='y', labelcolor=color)

fig.tight_layout()  # otherwise the right y-label is slightly clipped
plt.title('Potatoes production and temperatures every year')
plt.show()
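To go beyond a visual comparison, the two series can be aligned on the year and a Pearson correlation coefficient computed. The sketch below uses invented, perfectly linear numbers so that the result is easy to check; the column names `time` and `year` mirror the temperature table used above, and everything else is an assumption.

```python
import pandas as pd

# Toy series (hypothetical values): production falls as temperature rises
production = pd.DataFrame({
    'Year': [2014, 2015, 2016, 2017],
    'Value_crops': [400.0, 300.0, 200.0, 100.0],
})
temperature = pd.DataFrame({
    'time': [2013, 2014, 2015, 2016, 2017],
    'year': [4.5, 5.0, 5.5, 6.0, 6.5],  # annual mean temperature
})

# Align the two tables on the year, keeping only the years present in both
aligned = production.merge(temperature, left_on='Year', right_on='time', how='inner')

# Pearson correlation between production and annual temperature
r = aligned['Value_crops'].corr(aligned['year'])
print(r)
```

Merging on the year first avoids silently pairing values from different years when the two tables do not cover the same period. A correlation coefficient alone does not show that temperature drives production.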