# soilMoistureVisualization.ipynb
After [aggregating the soil moisture data](), the following notebook can be used to visualize the data.

##### Output graphs:
- Correlation plot
- Pair plot
- Histograms

In [None]:
import matplotlib.pyplot as plt  # type: ignore
from dotenv import load_dotenv
import sqlalchemy as sq
import seaborn as sns  # type: ignore
import pandas as pd
import numpy as np
import os, sys

sys.path.append("../")
from Shared.DataService import DataService

In [None]:
# table that holds the aggregated soil moisture data
MOISTURE_TABLE = "agg_soil_moisture"
ERGOT_TABLE = "agg_ergot_sample"  # table that holds the aggregated ergot data


# Load the database connection environment variables located in the docker folder
load_dotenv("../docker/.env")
PG_DB = os.getenv("POSTGRES_DB")
PG_ADDR = os.getenv("POSTGRES_ADDR")
PG_PORT = os.getenv("POSTGRES_PORT")
PG_USER = os.getenv("POSTGRES_USER")
PG_PW = os.getenv("POSTGRES_PW")

Purpose:  
Connect to the database

In [None]:
if (
    PG_DB is None
    or PG_ADDR is None
    or PG_PORT is None
    or PG_USER is None
    or PG_PW is None
):
    raise ValueError("Environment variables not set")

db = DataService(PG_DB, PG_ADDR, int(PG_PORT), PG_USER, PG_PW)
conn = db.connect()
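The check above reports only that *some* variable is missing. A minimal sketch of a more informative check follows. `require_env` is a hypothetical helper, not part of the project's `Shared.DataService` module. It lists every unset variable by name:

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Hypothetical helper: return the requested environment variables,
    raising a ValueError that names every variable that is not set."""
    missing = [name for name in names if os.getenv(name) is None]
    if missing:
        raise ValueError(f"Environment variables not set: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}
```

Usage would look like `cfg = require_env("POSTGRES_DB", "POSTGRES_ADDR", "POSTGRES_PORT", "POSTGRES_USER", "POSTGRES_PW")`, followed by passing `cfg["POSTGRES_DB"]`, `int(cfg["POSTGRES_PORT"])` and so on to `DataService`.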

Purpose:  
Load the ergot and soil moisture data from the database

Pseudocode:  
- Create the ergot SQL query
- Create the soil moisture SQL query
- [Load both from the database directly into DataFrames](https://pandas.pydata.org/docs/reference/api/pandas.read_sql_query.html)
- Close the connection to the database

In [None]:
ergotQuery = sq.text(f"SELECT * FROM {ERGOT_TABLE};")
moistureQuery = sq.text(
    f"SELECT soil_moisture_min, soil_moisture_max, soil_moisture_mean, year, district FROM {MOISTURE_TABLE};"
)

ergotDF = pd.read_sql_query(ergotQuery, conn)
moistureDF = pd.read_sql_query(moistureQuery, conn)

db.cleanup()
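The loading step can be tried without a PostgreSQL server. The sketch below uses an in-memory SQLite database from Python's standard `sqlite3` module as a stand-in, filled with a few made-up rows. The real notebook goes through `DataService`, whose internals are not shown here. The sketch also illustrates two points. Table names cannot be bound as query parameters, so they are interpolated from trusted constants. Values such as a year can be passed safely through `params`:

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for the PostgreSQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE agg_soil_moisture ("
    "district INTEGER, year INTEGER, soil_moisture_min REAL, "
    "soil_moisture_max REAL, soil_moisture_mean REAL)"
)
# Made-up toy rows, for illustration only.
conn.executemany(
    "INSERT INTO agg_soil_moisture VALUES (?, ?, ?, ?, ?)",
    [(4701, 2020, 0.10, 0.30, 0.20), (4801, 2020, 0.15, 0.35, 0.25)],
)

MOISTURE_TABLE = "agg_soil_moisture"

# Table names cannot be query parameters, so they are interpolated;
# only do this with trusted constants, never with user input.
query = (
    "SELECT soil_moisture_min, soil_moisture_max, soil_moisture_mean, year, district "
    f"FROM {MOISTURE_TABLE};"
)
moisture_df = pd.read_sql_query(query, conn)

# Values, unlike identifiers, can be bound safely (sqlite3 uses "?" placeholders).
rows_2020 = pd.read_sql_query(
    f"SELECT * FROM {MOISTURE_TABLE} WHERE year = ?;", conn, params=(2020,)
)

# Close the connection once the data is in memory.
conn.close()
```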

Purpose:  
Split the soil moisture DataFrame into one DataFrame per province, based on the district code (Manitoba below 4700, Saskatchewan 4700 to 4799, Alberta 4800 and above)

Pseudocode:  
- [Create the Alberta soil moisture DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html)
- [Create the Manitoba soil moisture DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html)
- [Create the Saskatchewan soil moisture DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html)

In [None]:
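# District codes: Alberta 48xx and above, Manitoba below 4700, Saskatchewan 47xx
ab_df = moistureDF.loc[moistureDF["district"] >= 4800]
mb_df = moistureDF.loc[moistureDF["district"] < 4700]

# Both bounds must hold, so combine them with & (| would match every row)
sk_df = moistureDF.loc[
    (moistureDF["district"] >= 4700) & (moistureDF["district"] < 4800)
]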
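A small example with made-up district codes shows why the Saskatchewan filter has to combine its two bounds with `&` (element-wise AND). The sketch assumes the same code ranges as the thresholds above: below 4700 for Manitoba, 47xx for Saskatchewan, 48xx and above for Alberta. With `|` (element-wise OR), every code satisfies at least one bound, so the result would silently contain all rows. `Series.between` is an equivalent alternative, assuming integer codes:

```python
import pandas as pd

# Made-up district codes, for illustration only.
df = pd.DataFrame({"district": [4601, 4602, 4701, 4702, 4801]})

ab = df.loc[df["district"] >= 4800]
mb = df.loc[df["district"] < 4700]

# Correct: both conditions must hold.
sk = df.loc[(df["district"] >= 4700) & (df["district"] < 4800)]

# Wrong: any code is either >= 4700 or < 4800, so every row is selected.
sk_wrong = df.loc[(df["district"] >= 4700) | (df["district"] < 4800)]

# Equivalent to the correct mask for integer codes (bounds are inclusive).
sk_between = df.loc[df["district"].between(4700, 4799)]
```

With the correct mask the three provinces partition the rows: each district falls into exactly one frame.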

Purpose:  
Join the ergot and moisture data into a single DataFrame

Pseudocode:  
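- [Join the ergot and moisture data into a single DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html) (inner join on year and district, so only district-years present in both tables are kept)
- [Drop the `year` and `district` join keys](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop.html), which are not attributes of interest for the correlation analysis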


In [None]:
final_df = moistureDF.merge(ergotDF, on=["year", "district"])
final_df.drop(columns=["year", "district"], inplace=True)
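Because `merge` defaults to an inner join, district-years that appear in only one table are dropped without warning. The sketch below uses made-up toy frames, and the `ergot_present` column is hypothetical. It shows how an outer merge with `indicator=True` reveals how many rows the inner join discarded:

```python
import pandas as pd

# Made-up toy frames; "ergot_present" is a hypothetical column name.
moisture = pd.DataFrame(
    {
        "year": [2020, 2020, 2021],
        "district": [4701, 4801, 4701],
        "soil_moisture_mean": [0.20, 0.25, 0.18],
    }
)
ergot = pd.DataFrame(
    {
        "year": [2020, 2021, 2022],
        "district": [4701, 4701, 4701],
        "ergot_present": [1, 0, 1],
    }
)

# Default how="inner": keep only (year, district) pairs present in both frames.
inner = moisture.merge(ergot, on=["year", "district"])

# Outer merge with indicator=True adds a "_merge" column whose values are
# "both", "left_only" or "right_only", showing where each row came from.
check = moisture.merge(ergot, on=["year", "district"], how="outer", indicator=True)
```

If each table is expected to hold at most one row per (year, district), passing `validate="one_to_one"` to `merge` makes pandas raise an error when that assumption is violated.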

Purpose:  
Create the correlation matrix between attributes

Pseudocode:  
- [Create the correlation matrix between attributes](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.corr.html)

In [None]:
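# numeric_only=True skips any non-numeric columns pulled in by SELECT *
# (pandas >= 2.0 raises an error on them otherwise)
corr = final_df.corr(numeric_only=True)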
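`DataFrame.corr` computes pairwise Pearson correlation by default. The sketch below uses a toy frame with made-up values and hypothetical column names. It shows that perfectly linear columns correlate at +1 or -1 and that `numeric_only=True` leaves out a string column. `method="spearman"` (rank correlation) is shown as an alternative for relationships that are monotonic but not linear:

```python
import pandas as pd

# Made-up toy values; "ergot_severity" and "label" are hypothetical columns.
df = pd.DataFrame(
    {
        "soil_moisture_mean": [0.1, 0.2, 0.3, 0.4],
        "ergot_severity": [1.0, 2.0, 3.0, 4.0],
        "inverse": [4.0, 3.0, 2.0, 1.0],
        "label": ["a", "b", "c", "d"],
    }
)

# Pearson correlation; the non-numeric "label" column is skipped.
corr = df.corr(numeric_only=True)

# Spearman rank correlation as an alternative measure.
rank_corr = df.corr(method="spearman", numeric_only=True)
```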

Purpose:  
Create a correlation plot between the soil moisture attributes and ergot

Note: this code is adapted from a seaborn gallery example; additional information can be found [here](https://seaborn.pydata.org/examples/many_pairwise_correlations.html)

In [None]:
sns.set_theme(style="white")

# Generate a mask for the upper triangle
mask = np.triu(np.ones_like(corr, dtype=bool))

# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(11, 9))

# Generate a custom diverging colormap
cmap = sns.diverging_palette(230, 20, as_cmap=True)

# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(
    corr,
    mask=mask,
    cmap=cmap,
    vmax=0.5,
    vmin=-0.5,
    center=0,
    square=True,
    linewidths=0.5,
    cbar_kws={"shrink": 0.5},
)
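The mask hides the redundant half of the symmetric correlation matrix. `sns.heatmap` does not draw cells where `mask` is `True`. The small NumPy sketch below uses a 3x3 identity matrix as a stand-in, since only the matrix shape matters here. It shows that `np.triu` keeps the diagonal in the mask, so the self-correlations of 1.0 are hidden too. Passing `k=1` would leave the diagonal visible:

```python
import numpy as np

# Stand-in for a 3x3 correlation matrix; only its shape is used.
corr_like = np.eye(3)

# True on and above the diagonal: these heatmap cells are hidden.
mask = np.triu(np.ones_like(corr_like, dtype=bool))

# k=1 starts one diagonal higher, so the main diagonal stays visible.
mask_keep_diag = np.triu(np.ones_like(corr_like, dtype=bool), k=1)
```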

Purpose:  
Create pair plots of the relationships between the soil moisture and ergot attributes

Pseudocode:  
- [Create the pair plots](https://seaborn.pydata.org/generated/seaborn.pairplot.html) (all columns of the merged DataFrame are plotted; a list of column names can be passed via `vars` to restrict the plot to attributes of interest)
- [Display the pair plots](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.show.html)

In [None]:
sns.pairplot(final_df)
plt.show()
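Plotting every column pair grows quadratically with the number of columns. One way to pick the attributes of interest is to keep the columns most strongly correlated with an ergot target column. `top_correlated` is a hypothetical helper, and ranking by absolute Pearson correlation is an assumed selection rule, not the author's method:

```python
import pandas as pd


def top_correlated(df: pd.DataFrame, target: str, k: int = 3) -> list[str]:
    """Hypothetical helper: the k numeric columns with the largest absolute
    Pearson correlation to `target`, followed by `target` itself."""
    r = df.corr(numeric_only=True)[target].drop(target)
    best = r.abs().sort_values(ascending=False).head(k).index.tolist()
    return best + [target]
```

The result could then be passed as `sns.pairplot(final_df, vars=top_correlated(final_df, "ergot_column"))`, where `"ergot_column"` is a placeholder for whichever ergot attribute is of interest.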

Purpose:  
Create side-by-side histograms of the minimum, mean and maximum soil moisture values for each province

Pseudocode:  
- [Create 3 subplots](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html)
- [Assign a figure title](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.suptitle.html)
- [Plot groups of histograms](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html) (minimum, mean and maximum values) for all provinces of interest
- [Add a legend](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html)

In [None]:
fig, axes = plt.subplots(1, 3)
fig.suptitle("Soil Moisture (<2cm thickness in %)")

provinces = [("sk", sk_df), ("ab", ab_df), ("mb", mb_df)]

# One subplot per statistic, each overlaying the three provinces
for ax, stat in zip(axes, ["min", "mean", "max"]):
    for label, df in provinces:
        ax.hist(df[f"soil_moisture_{stat}"], alpha=0.5, label=label)
    ax.title.set_text(stat)
    ax.legend(loc="upper left")
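By default, each `hist` call chooses its own bin range from the data it is given. Overlaid histograms for different provinces can therefore have bars of different widths, which makes them hard to compare. A minimal sketch of one common remedy computes shared bin edges over all provinces combined. `shared_bin_edges` is a hypothetical helper:

```python
import numpy as np


def shared_bin_edges(*samples, bins=10):
    """Hypothetical helper: bin edges computed over all samples combined,
    so overlaid histograms use identical bins."""
    combined = np.concatenate([np.asarray(s, dtype=float) for s in samples])
    combined = combined[~np.isnan(combined)]  # ignore missing values
    return np.histogram_bin_edges(combined, bins=bins)
```

The edges would be computed once per statistic, for example from the three provinces' `soil_moisture_min` columns, and passed as `bins=edges` to each `hist` call in that subplot. Adding `density=True` also normalizes the bars, which helps when the provinces contribute different numbers of rows.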