# stationDataVisualization.ipynb
After aggregating the weather station data, the following script can be used to visualize the data.

##### Output graphs:
- Correlation plots (examples: [hourly](https://github.com/ChromaticPanic/CGC_Grain_Outcome_Predictions/blob/main/.github/img/hlyCorrPlot.png), [daily](https://github.com/ChromaticPanic/CGC_Grain_Outcome_Predictions/blob/main/.github/img/dlyCorrPlot.png))
- Histograms (example: [hourly dew point temperature](https://github.com/ChromaticPanic/CGC_Grain_Outcome_Predictions/blob/main/.github/img/hlyDewTempDist.png))

In [None]:
import matplotlib.pyplot as plt  # type: ignore
from dotenv import load_dotenv
import sqlalchemy as sq
import seaborn as sns  # type: ignore
import pandas as pd
import numpy as np
import os, sys

sys.path.append("../")
from Shared.DataService import DataService

In [None]:
ERGOT_TABLE = "agg_ergot_sample"  # table that holds the aggregated ergot data
DLY_STATION_TABLE = "stations_dly"  # table that holds the daily stations
HLY_STATION_TABLE = "stations_hly"  # table that holds the hourly stations

AB_HLY_TABLE = "ab_hly_station_data"  # table that holds Alberta's hourly data
MB_HLY_TABLE = "mb_hly_station_data"  # table that holds Manitoba's hourly data
SK_HLY_TABLE = "sk_hly_station_data"  # table that holds Saskatchewan's hourly data

AB_DLY_TABLE = "ab_station_data"  # table that holds Alberta's daily data
MB_DLY_TABLE = "mb_station_data"  # table that holds Manitoba's daily data
SK_DLY_TABLE = "sk_station_data"  # table that holds Saskatchewan's daily data


# Load the database connection environment variables located in the docker folder
load_dotenv("../docker/.env")
PG_DB = os.getenv("POSTGRES_DB")
PG_ADDR = os.getenv("POSTGRES_ADDR")
PG_PORT = os.getenv("POSTGRES_PORT")
PG_USER = os.getenv("POSTGRES_USER")
PG_PW = os.getenv("POSTGRES_PW")

Purpose:  
Connect to the database

In [None]:
if (
    PG_DB is None
    or PG_ADDR is None
    or PG_PORT is None
    or PG_USER is None
    or PG_PW is None
):
    raise ValueError("Environment variables not set")

db = DataService(PG_DB, PG_ADDR, int(PG_PORT), PG_USER, PG_PW)
conn = db.connect()
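
The check above reports only that something is unset. As a minimal sketch, a variant that names the missing variables could look like the following. `require_env` is a hypothetical helper and is not part of the repository; the example values are made up.

```python
import os


def require_env(names, environ=os.environ):
    """Return the values for `names`, raising if any of them are unset.

    Hypothetical helper, used here for illustration only.
    """
    missing = [name for name in names if environ.get(name) is None]
    if missing:
        raise ValueError(f"Environment variables not set: {', '.join(missing)}")
    return [environ[name] for name in names]


# A stand-in mapping is used instead of the real environment
example_env = {"POSTGRES_DB": "postgres", "POSTGRES_PORT": "5432"}
db_name, port = require_env(["POSTGRES_DB", "POSTGRES_PORT"], example_env)
```

Passing the mapping explicitly keeps the helper easy to exercise without touching the real environment.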

Purpose:  
Load the hourly data (and metadata) from the database

Pseudocode:  
- Create the weather data SQL query
- Create the station SQL query
- [Load the data from the database directly into a DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.read_sql.html)

In [None]:
mbHlyQuery = sq.text(f"SELECT * FROM public.{MB_HLY_TABLE}")
skHlyQuery = sq.text(f"SELECT * FROM public.{SK_HLY_TABLE}")
abHlyQuery = sq.text(f"SELECT * FROM public.{AB_HLY_TABLE}")

hlyStationDataQuery = sq.text(
    f"""
    SELECT station_id, district FROM public.{HLY_STATION_TABLE}
    WHERE district IS NOT NULL;
    """
)

mb_hly_df = pd.read_sql(mbHlyQuery, conn)
sk_hly_df = pd.read_sql(skHlyQuery, conn)
ab_hly_df = pd.read_sql(abHlyQuery, conn)
hlyStations = pd.read_sql(hlyStationDataQuery, conn)

Purpose:  
Load the daily data (and metadata) from the database

Pseudocode:  
- Create the weather data SQL query
- Create the station SQL query
- [Load the data from the database directly into a DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.read_sql.html)

In [None]:
dlyWeatherDataQuery = sq.text(
    f"""
    SELECT * FROM public.{AB_DLY_TABLE}
    UNION
    SELECT * FROM public.{MB_DLY_TABLE}
    UNION
    SELECT * FROM public.{SK_DLY_TABLE};
    """
)

dlyStationDataQuery = sq.text(
    f"""
    SELECT station_id, district FROM public.{DLY_STATION_TABLE}
    WHERE district IS NOT NULL;
    """
)

dlyData = pd.read_sql(dlyWeatherDataQuery, conn)
dlyStations = pd.read_sql(dlyStationDataQuery, conn)
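
Note that `UNION` removes duplicate rows from the combined result, whereas `UNION ALL` keeps them. The sketch below illustrates the difference using an in-memory SQLite database as a stand-in for PostgreSQL; the table names and values are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; tables and values are illustrative
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE ab_demo (station_id TEXT, year INTEGER, mean_temp REAL);
    CREATE TABLE mb_demo (station_id TEXT, year INTEGER, mean_temp REAL);
    INSERT INTO ab_demo VALUES ('A1', 2000, 3.5), ('A1', 2000, 3.5);
    INSERT INTO mb_demo VALUES ('M1', 2000, 2.0);
    """
)

# UNION collapses the two identical Alberta rows into one
union_rows = con.execute(
    "SELECT * FROM ab_demo UNION SELECT * FROM mb_demo"
).fetchall()

# UNION ALL keeps every row, duplicates included
union_all_rows = con.execute(
    "SELECT * FROM ab_demo UNION ALL SELECT * FROM mb_demo"
).fetchall()
con.close()

print(len(union_rows), len(union_all_rows))
```

Either form requires the combined tables to have the same number of columns.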

Purpose:  
Load the ergot data from the database and close the database connection

Pseudocode:  
- Create the ergot data SQL query
- [Load the data from the database directly into a DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.read_sql.html)
- Close the connection to the database

In [None]:
ergotQuery = sq.text(f"SELECT * FROM public.{ERGOT_TABLE}")
ergotDF = pd.read_sql_query(ergotQuery, conn)

db.cleanup()

# Hourly Data Visualization

Purpose:  
Preprocesses the data for visualization

Pseudocode:  
- [Concatenate the hourly data from all provinces into one DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.concat.html)
- Convert district ([station metadata](https://github.com/ChromaticPanic/CGC_Grain_Outcome_Predictions#stations_hly)) to an integer, since keeping it as a double was throwing an error
- [Merge the DataFrames together](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html)
- [Drop irrelevant columns](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop.html)

In [None]:
hlyData = pd.concat([mb_hly_df, sk_hly_df, ab_hly_df])

hlyStations[["district"]] = hlyStations[["district"]].astype(int)

hlyDF = hlyData.merge(hlyStations, on="station_id")

final_hly_df = hlyDF.merge(ergotDF, on=["year", "district"])
final_hly_df.drop(columns=["id", "station_id", "year", "month", "day"], inplace=True)
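
The two merges above use `DataFrame.merge` with its default inner join, so weather rows without a matching station, or without ergot data for the same year and district, are dropped. A minimal sketch on toy data follows; every value is made up, and `ergot_metric` is a hypothetical column name.

```python
import pandas as pd

# Toy stand-ins for the notebook's DataFrames; all values are made up
weather = pd.DataFrame(
    {
        "station_id": ["S1", "S1", "S2"],
        "year": [2000, 2001, 2000],
        "mean_temp": [3.5, 4.0, 2.0],
    }
)
stations = pd.DataFrame({"station_id": ["S1", "S2"], "district": [4810.0, 4620.0]})
ergot = pd.DataFrame(
    {"year": [2000, 2001], "district": [4810, 4810], "ergot_metric": [0.1, 0.0]}
)

# Cast district to an integer so both merge keys share a type
stations["district"] = stations["district"].astype(int)

# Attach each weather row to its station's district
with_district = weather.merge(stations, on="station_id")

# Attach ergot data by year and district; S2 has no ergot rows and is dropped
merged = with_district.merge(ergot, on=["year", "district"])
print(merged)
```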

Purpose:  
Create the correlation matrix between attributes

Pseudocode:  
- [Create the correlation matrix between attributes](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.corr.html)

In [None]:
hly_corr = final_hly_df.corr()

Purpose:  
Create a correlation plot between hourly weather station data and ergot

Note: this code follows the seaborn [many pairwise correlations example](https://seaborn.pydata.org/examples/many_pairwise_correlations.html), which provides additional information

In [None]:
sns.set_theme(style="white")

# Generate a mask for the upper triangle
mask = np.triu(np.ones_like(hly_corr, dtype=bool))

# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(11, 9))

# Generate a custom diverging colormap
cmap = sns.diverging_palette(230, 20, as_cmap=True)

# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(
    hly_corr,
    mask=mask,
    cmap=cmap,
    vmax=0.1,
    vmin=-0.1,
    center=0,
    square=True,
    linewidths=0.5,
    cbar_kws={"shrink": 0.5},
)
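
A correlation matrix is symmetric, so the mask hides the redundant upper triangle together with the diagonal. The sketch below shows what the mask looks like for a small, made-up matrix (`demo_corr` is illustrative and not part of the notebook).

```python
import numpy as np

# Small symmetric matrix standing in for a correlation matrix
demo_corr = np.array(
    [
        [1.0, 0.2, -0.1],
        [0.2, 1.0, 0.05],
        [-0.1, 0.05, 1.0],
    ]
)

# True marks the cells the heatmap hides: the diagonal and everything above it
demo_mask = np.triu(np.ones_like(demo_corr, dtype=bool))

# Replace the hidden cells with NaN to see which values remain visible
visible = np.where(demo_mask, np.nan, demo_corr)
print(demo_mask)
print(visible)
```

Only the three values below the diagonal stay visible, one per unique pair of attributes.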

Purpose:  
Create side-by-side histograms of the minimum, mean and maximum temperature for each province

Pseudocode:  
- [Create 3 subplots](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html)
- [Assign a figure title](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.suptitle.html)
- [Plot groups of histograms](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html) (minimum, mean and maximum values) for all provinces of interest
- [Add a legend](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html)

In [None]:
fig, (ax1, ax2, ax3) = plt.subplots(1, 3)
fig.suptitle("Temperature (°C)")


ax1.hist(ab_hly_df["min_temp"], alpha=0.5, label="ab")
ax1.hist(mb_hly_df["min_temp"], alpha=0.5, label="mb")
ax1.hist(sk_hly_df["min_temp"], alpha=0.5, label="sk")
ax1.title.set_text("min")
ax1.legend(loc="upper left")

ax2.hist(ab_hly_df["mean_temp"], alpha=0.5, label="ab")
ax2.hist(mb_hly_df["mean_temp"], alpha=0.5, label="mb")
ax2.hist(sk_hly_df["mean_temp"], alpha=0.5, label="sk")
ax2.title.set_text("mean")
ax2.legend(loc="upper left")

ax3.hist(ab_hly_df["max_temp"], alpha=0.5, label="ab")
ax3.hist(mb_hly_df["max_temp"], alpha=0.5, label="mb")
ax3.hist(sk_hly_df["max_temp"], alpha=0.5, label="sk")
ax3.title.set_text("max")
ax3.legend(loc="upper left")

Purpose:  
Create side-by-side histograms of the minimum, mean and maximum dew point temperature for each province

Pseudocode:  
- [Create 3 subplots](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html)
- [Assign a figure title](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.suptitle.html)
- [Plot groups of histograms](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html) (minimum, mean and maximum values) for all provinces of interest
- [Add a legend](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html)

In [None]:
fig, (ax1, ax2, ax3) = plt.subplots(1, 3)
fig.suptitle("Dew Point Temperature (°C)")


ax1.hist(ab_hly_df["min_dew_point_temp"], alpha=0.5, label="ab")
ax1.hist(mb_hly_df["min_dew_point_temp"], alpha=0.5, label="mb")
ax1.hist(sk_hly_df["min_dew_point_temp"], alpha=0.5, label="sk")
ax1.title.set_text("min")
ax1.legend(loc="upper left")

ax2.hist(ab_hly_df["mean_dew_point_temp"], alpha=0.5, label="ab")
ax2.hist(mb_hly_df["mean_dew_point_temp"], alpha=0.5, label="mb")
ax2.hist(sk_hly_df["mean_dew_point_temp"], alpha=0.5, label="sk")
ax2.title.set_text("mean")
ax2.legend(loc="upper left")

ax3.hist(ab_hly_df["max_dew_point_temp"], alpha=0.5, label="ab")
ax3.hist(mb_hly_df["max_dew_point_temp"], alpha=0.5, label="mb")
ax3.hist(sk_hly_df["max_dew_point_temp"], alpha=0.5, label="sk")
ax3.title.set_text("max")
ax3.legend(loc="upper left")

Purpose:  
Create side-by-side histograms of the minimum, mean and maximum humidex for each province

Pseudocode:  
- [Create 3 subplots](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html)
- [Assign a figure title](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.suptitle.html)
- [Plot groups of histograms](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html) (minimum, mean and maximum values) for all provinces of interest
- [Add a legend](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html)

In [None]:
fig, (ax1, ax2, ax3) = plt.subplots(1, 3)
fig.suptitle("Humidity Index (air temperature + humidity)")


ax1.hist(ab_hly_df["min_humidex"], alpha=0.5, label="ab")
ax1.hist(mb_hly_df["min_humidex"], alpha=0.5, label="mb")
ax1.hist(sk_hly_df["min_humidex"], alpha=0.5, label="sk")
ax1.title.set_text("min")
ax1.legend(loc="upper left")

ax2.hist(ab_hly_df["mean_humidex"], alpha=0.5, label="ab")
ax2.hist(mb_hly_df["mean_humidex"], alpha=0.5, label="mb")
ax2.hist(sk_hly_df["mean_humidex"], alpha=0.5, label="sk")
ax2.title.set_text("mean")
ax2.legend(loc="upper left")

ax3.hist(ab_hly_df["max_humidex"], alpha=0.5, label="ab")
ax3.hist(mb_hly_df["max_humidex"], alpha=0.5, label="mb")
ax3.hist(sk_hly_df["max_humidex"], alpha=0.5, label="sk")
ax3.title.set_text("max")
ax3.legend(loc="upper left")

Purpose:  
Create a single histogram of total precipitation for each province

Pseudocode:  
- [Plot a histogram](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html) of total precipitation for all provinces of interest
- [Add a legend](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html)
- [Assign a title](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.title.html)
- [Limit the x-axis](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.xlim.html) to the range 0 to 2500
- Show the plot

In [None]:
plt.hist(ab_hly_df["total_precip"], alpha=0.5, label="ab")
plt.hist(mb_hly_df["total_precip"], alpha=0.5, label="mb")
plt.hist(sk_hly_df["total_precip"], alpha=0.5, label="sk")
plt.legend(loc="upper right")
plt.title("Total Precipitation (mm)")
plt.xlim(0, 2500)
plt.show()

Purpose:  
Create side-by-side histograms of the minimum, mean and maximum relative humidity for each province

Pseudocode:  
- [Create 3 subplots](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html)
- [Assign a figure title](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.suptitle.html)
- [Plot groups of histograms](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html) (minimum, mean and maximum values) for all provinces of interest
- [Add a legend](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html)

In [None]:
fig, (ax1, ax2, ax3) = plt.subplots(1, 3)
fig.suptitle("Relative Humidity (%)")


ax1.hist(ab_hly_df["min_rel_humid"], alpha=0.5, label="ab")
ax1.hist(mb_hly_df["min_rel_humid"], alpha=0.5, label="mb")
ax1.hist(sk_hly_df["min_rel_humid"], alpha=0.5, label="sk")
ax1.title.set_text("min")
ax1.legend(loc="upper left")

ax2.hist(ab_hly_df["mean_rel_humid"], alpha=0.5, label="ab")
ax2.hist(mb_hly_df["mean_rel_humid"], alpha=0.5, label="mb")
ax2.hist(sk_hly_df["mean_rel_humid"], alpha=0.5, label="sk")
ax2.title.set_text("mean")
ax2.legend(loc="upper left")

ax3.hist(ab_hly_df["max_rel_humid"], alpha=0.5, label="ab")
ax3.hist(mb_hly_df["max_rel_humid"], alpha=0.5, label="mb")
ax3.hist(sk_hly_df["max_rel_humid"], alpha=0.5, label="sk")
ax3.title.set_text("max")
ax3.legend(loc="upper left")

Purpose:  
Create side-by-side histograms of the minimum, mean and maximum station pressure for each province

Pseudocode:  
- [Create 3 subplots](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html)
- [Assign a figure title](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.suptitle.html)
- [Plot groups of histograms](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html) (minimum, mean and maximum values) for all provinces of interest
- [Add a legend](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html)

In [None]:
fig, (ax1, ax2, ax3) = plt.subplots(1, 3)
fig.suptitle("Station Pressure (kPa)")


ax1.hist(ab_hly_df["min_stn_press"], alpha=0.5, label="ab")
ax1.hist(mb_hly_df["min_stn_press"], alpha=0.5, label="mb")
ax1.hist(sk_hly_df["min_stn_press"], alpha=0.5, label="sk")
ax1.title.set_text("min")
ax1.legend(loc="upper left")

ax2.hist(ab_hly_df["mean_stn_press"], alpha=0.5, label="ab")
ax2.hist(mb_hly_df["mean_stn_press"], alpha=0.5, label="mb")
ax2.hist(sk_hly_df["mean_stn_press"], alpha=0.5, label="sk")
ax2.title.set_text("mean")
ax2.legend(loc="upper left")

ax3.hist(ab_hly_df["max_stn_press"], alpha=0.5, label="ab")
ax3.hist(mb_hly_df["max_stn_press"], alpha=0.5, label="mb")
ax3.hist(sk_hly_df["max_stn_press"], alpha=0.5, label="sk")
ax3.title.set_text("max")
ax3.legend(loc="upper left")

Purpose:  
Create side-by-side histograms of the minimum, mean and maximum visibility for each province

Pseudocode:  
- [Create 3 subplots](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html)
- [Assign a figure title](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.suptitle.html)
- [Plot groups of histograms](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html) (minimum, mean and maximum values) for all provinces of interest
- [Add a legend](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html)

In [None]:
fig, (ax1, ax2, ax3) = plt.subplots(1, 3)
fig.suptitle("Visibility (km)")


ax1.hist(ab_hly_df["min_visibility"], alpha=0.5, label="ab")
ax1.hist(mb_hly_df["min_visibility"], alpha=0.5, label="mb")
ax1.hist(sk_hly_df["min_visibility"], alpha=0.5, label="sk")
ax1.title.set_text("min")
ax1.legend(loc="upper left")

ax2.hist(ab_hly_df["mean_visibility"], alpha=0.5, label="ab")
ax2.hist(mb_hly_df["mean_visibility"], alpha=0.5, label="mb")
ax2.hist(sk_hly_df["mean_visibility"], alpha=0.5, label="sk")
ax2.title.set_text("mean")
ax2.legend(loc="upper left")

ax3.hist(ab_hly_df["max_visibility"], alpha=0.5, label="ab")
ax3.hist(mb_hly_df["max_visibility"], alpha=0.5, label="mb")
ax3.hist(sk_hly_df["max_visibility"], alpha=0.5, label="sk")
ax3.title.set_text("max")
ax3.legend(loc="upper left")
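
The histogram cells above differ only in the attribute name and the figure title. As a sketch under that assumption, the repeated steps could be folded into one function. `plot_min_mean_max` is a hypothetical helper and is not part of the notebook; the demo values are made up.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for this sketch; omit inside a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_min_mean_max(frames, attribute, title):
    """Plot min/mean/max histograms of `attribute` for each labelled DataFrame.

    Hypothetical helper; expects columns named min_<attribute>,
    mean_<attribute> and max_<attribute>, as in the cells above.
    """
    fig, axes = plt.subplots(1, 3)
    fig.suptitle(title)
    for ax, stat in zip(axes, ("min", "mean", "max")):
        # Overlay one semi-transparent histogram per province
        for label, df in frames.items():
            ax.hist(df[f"{stat}_{attribute}"], alpha=0.5, label=label)
        ax.set_title(stat)
        ax.legend(loc="upper left")
    return fig, axes


# Made-up values standing in for the provincial DataFrames
demo = pd.DataFrame(
    {"min_temp": [-5.0, 0.0], "mean_temp": [1.0, 4.0], "max_temp": [6.0, 9.0]}
)
fig, axes = plot_min_mean_max({"ab": demo, "mb": demo + 1}, "temp", "Temperature (°C)")
```

With the notebook's data, a call might look like `plot_min_mean_max({"ab": ab_hly_df, "mb": mb_hly_df, "sk": sk_hly_df}, "visibility", "Visibility (km)")`.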

# Daily Data Visualization

Purpose:  
Preprocesses the data for visualization

Pseudocode:  
- Convert district ([station metadata](https://github.com/ChromaticPanic/CGC_Grain_Outcome_Predictions#stations_hly)) to an integer, since keeping it as a double was throwing an error
- [Merge the DataFrames together](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html)
- [Drop irrelevant columns](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop.html)

In [None]:
dlyStations[["district"]] = dlyStations[["district"]].astype(int)

dlyData = dlyData.merge(dlyStations, on="station_id")

final_dly_df = dlyData.merge(ergotDF, on=["year", "district"])
final_dly_df.drop(columns=["station_id", "date", "month", "day", "year"], inplace=True)

Purpose:  
Create the correlation matrix between attributes

Pseudocode:  
- [Create the correlation matrix between attributes](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.corr.html)

In [None]:
dly_corr = final_dly_df.corr()

Purpose:  
Create a correlation plot between daily weather station data and ergot

Note: this code follows the seaborn [many pairwise correlations example](https://seaborn.pydata.org/examples/many_pairwise_correlations.html), which provides additional information

In [None]:
sns.set_theme(style="white")

# Generate a mask for the upper triangle
mask = np.triu(np.ones_like(dly_corr, dtype=bool))

# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(11, 9))

# Generate a custom diverging colormap
cmap = sns.diverging_palette(230, 20, as_cmap=True)

# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(
    dly_corr,
    mask=mask,
    cmap=cmap,
    vmax=0.1,
    center=0,
    square=True,
    linewidths=0.5,
    cbar_kws={"shrink": 0.5},
)