# Bayesian Modeling of Crop Yields
---
## Final Project

**Authors**: Carnio, Gritti

**Course**: Bayesian Data Analysis and Probabilistic Programming

---

## 0. Preparations

### 0.1 Importing libraries

In [25]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import numpy as np

### 0.2 Loading the dataset

In [26]:
df = pd.read_csv("archive/yield_df.csv")

df["log_yield"] = np.log(df["hg/ha_yield"])  # natural log of the yield (hg/ha)
df["temp_bin"] = df["avg_temp"].round(1)     # bin temperature by rounding to 1 decimal place

### 0.3 Exploring the dataset

In [27]:
df.isna().sum()

Unnamed: 0                       0
Area                             0
Item                             0
Year                             0
hg/ha_yield                      0
average_rain_fall_mm_per_year    0
pesticides_tonnes                0
avg_temp                         0
log_yield                        0
temp_bin                         0
dtype: int64

The dataset does not contain any null values, so we can now explore it.
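One caveat about the null check: `isna()` does not flag infinite values, so a zero yield would give a `log_yield` of `-inf` that still counts as non-null. The sketch below is a complementary check, not part of the original analysis. The helper name `count_non_finite` is hypothetical, and we have not verified whether the dataset contains any zero yields.

```python
import numpy as np
import pandas as pd


def count_non_finite(data: pd.DataFrame) -> pd.Series:
    """Count NaN and +/-inf entries in each numeric column (hypothetical helper)."""
    # Only numeric columns can hold inf; text columns such as Area/Item are skipped.
    numeric = data.select_dtypes(include="number")
    # np.isfinite is False for NaN, +inf and -inf.
    mask = ~np.isfinite(numeric.to_numpy(dtype=float))
    return pd.Series(mask.sum(axis=0), index=numeric.columns, name="non_finite")
```

In the notebook this would be called as `count_non_finite(df)`; a count of zero for `log_yield` would indicate that the log transform produced only finite values.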

In [28]:
px.box(
    df,
    x="Item",
    y="hg/ha_yield",
    color="Item",
    title="Crop yield distributions",
    template="plotly_dark"
).update_layout(showlegend=False)


In [29]:
px.histogram(
    df,
    x="log_yield",
    color="Item",
    nbins=20,
    title="Crop yield log-distribution",
    template="plotly_dark"
)

In [30]:
px.scatter(
    df,
    x="average_rain_fall_mm_per_year",
    y="hg/ha_yield",
    color="Item",
    title="Yield vs Rainfall (per crop)",
    template="plotly_dark"
)


In [31]:
px.scatter(
    df,
    x="avg_temp",
    y="hg/ha_yield",
    color="Item",
    title="Yield vs Mean Temperature",
    template="plotly_dark"
)

In [32]:
mean_yield_year = (
    df
    .groupby(["Year", "Item"], as_index=False)
    .agg(mean_yield=("hg/ha_yield", "mean"))
)

px.line(
    mean_yield_year,
    x="Year",
    y="mean_yield",
    color="Item",
    markers=True,
    title="Crop yield per year",
    template="plotly_dark"
)


In [34]:
temp_effect = (
    df
    .groupby(["temp_bin", "Item"], as_index=False)
    .agg(mean_yield=("hg/ha_yield", "mean"))
)

px.line(
    temp_effect,
    x="temp_bin",
    y="mean_yield",
    color="Item",
    markers=True,
    title="Mean yield per temperature",
    template="plotly_dark"
)

We cannot see much correlation between temperature and crop yield, except for a few crops such as `Potatoes`, which seem to have a higher yield at temperatures near 10 degrees Celsius.
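The visual impression can be complemented with a number per crop. The following is a minimal sketch, assuming the column names used in this notebook (`Item`, `avg_temp`, `hg/ha_yield`); the helper name `temp_yield_rank_corr` is hypothetical. It computes Spearman's rank correlation as the Pearson correlation of the ranks.

```python
import pandas as pd


def temp_yield_rank_corr(data: pd.DataFrame) -> pd.Series:
    """Spearman rank correlation between temperature and yield, per crop (hypothetical helper)."""

    def _spearman(group: pd.DataFrame) -> float:
        # The Pearson correlation of the ranks equals Spearman's rho.
        return group["avg_temp"].rank().corr(group["hg/ha_yield"].rank())

    return (
        data.groupby("Item")[["avg_temp", "hg/ha_yield"]]
        .apply(_spearman)
        .rename("spearman_rho")
        .sort_values()
    )
```

In the notebook this would be called as `temp_yield_rank_corr(df)`. A single coefficient only captures a monotonic trend, so a crop whose yield peaks at an intermediate temperature (as `Potatoes` appear to) may still show a coefficient near zero; the line plot remains the better diagnostic in that case.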