# stationVisualization.ipynb
After loading the [weather stations into the database](https://github.com/ChromaticPanic/CGC_Grain_Outcome_Predictions/blob/main/src/WeatherStation/importBoundariesAndStations.ipynb), the following script can be used to visualize the spread of the stations as well as the data they hold.

##### Output graphs:
- Station summaries for each district ([such as...](https://github.com/ChromaticPanic/CGC_Grain_Outcome_Predictions/blob/main/.github/img/TLDR.png))
- Region plots for stations ([such as...](https://github.com/ChromaticPanic/CGC_Grain_Outcome_Predictions/blob/main/.github/img/allStations.png))

Both take the following into account:
- Station elevation
- Whether each station is still active
- Whether each station reports hourly or daily
- Amount of data collected

In [None]:
import matplotlib.patches as mpatches  # type: ignore
from matplotlib import pyplot as plt  # type: ignore
from dotenv import load_dotenv
import geopandas as gpd  # type: ignore
import sqlalchemy as sq
import os, sys


sys.path.append("../")
from Shared.DataService import DataService

In [None]:
DLY_STATIONS_TABLE = "stations_dly"  # table that holds the daily stations
HLY_STATIONS_TABLE = "stations_hly"  # table that holds the hourly stations
AG_REGIONS_TABLE = "census_ag_regions"  # table that holds the agriculture regions

MB_CUTOFF_ELEVATION = 300 + 50  # the average elevation for MB in m plus a 50m buffer
SK_CUTOFF_ELEVATION = 610 + 50  # the average elevation for SK in m plus a 50m buffer
AB_CUTOFF_ELEVATION = 800 + 50  # the average elevation for AB in m plus a 50m buffer


# Load the database connection environment variables located in the docker folder
load_dotenv("../docker/.env")
PG_USER = os.getenv("POSTGRES_USER")
PG_PW = os.getenv("POSTGRES_PW")
PG_DB = os.getenv("POSTGRES_DB")
PG_ADDR = os.getenv("POSTGRES_ADDR")
PG_PORT = os.getenv("POSTGRES_PORT")

Purpose:  
Connects to the database

Pseudocode:  
- Load the environment variables
- Connect to the database

In [None]:
if (
    PG_DB is None
    or PG_ADDR is None
    or PG_PORT is None
    or PG_USER is None
    or PG_PW is None
):
    raise ValueError("Environment variables not set")

# Handles connections to the database
db = DataService(PG_DB, PG_ADDR, int(PG_PORT), PG_USER, PG_PW)
conn = db.connect()
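
The check above raises a generic error without saying which variable is missing. A minimal sketch of a more specific check follows; the helper names `missingVars` and `readPgSettings` are hypothetical and are not part of `DataService`. Unlike the original check, it also treats empty strings as missing.

```python
import os
from typing import Dict, List, Mapping, Union

# Variable names taken from the cell that reads ../docker/.env
REQUIRED_VARS = (
    "POSTGRES_USER",
    "POSTGRES_PW",
    "POSTGRES_DB",
    "POSTGRES_ADDR",
    "POSTGRES_PORT",
)


def missingVars(env: Mapping[str, str]) -> List[str]:
    """Return the required variable names that are absent or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def readPgSettings(env: Mapping[str, str] = os.environ) -> Dict[str, Union[str, int]]:
    """Collect the connection settings, naming any variable that is missing."""
    missing = missingVars(env)
    if missing:
        raise ValueError("Environment variables not set: " + ", ".join(missing))

    return {
        "db": env["POSTGRES_DB"],
        "addr": env["POSTGRES_ADDR"],
        "port": int(env["POSTGRES_PORT"]),  # the port is stored as text in .env
        "user": env["POSTGRES_USER"],
        "pw": env["POSTGRES_PW"],
    }
```

The returned values could then be passed to `DataService` in the same order as in the cell above.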

Purpose:  
Load the agriculture regions from the agriculture regions table ([readme](https://github.com/ChromaticPanic/CGC_Grain_Outcome_Predictions#census_ag_regions))

Pseudocode:  
- Create the agriculture regions SQL query
- [Load the data from the database directly into a DataFrame](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.from_postgis.html)
    - crs specifies the coordinate reference system, which in our case is EPSG:3347
    - geom_col specifies the name of the column that holds the geometry/borders

In [None]:
regionQuery = sq.text(
    f"select district, color, geometry FROM public.{AG_REGIONS_TABLE}"
)

agRegions = gpd.GeoDataFrame.from_postgis(
    regionQuery, conn, crs="EPSG:3347", geom_col="geometry"
)

Purpose:  
Load the daily weather stations from the daily weather stations table ([readme](https://github.com/ChromaticPanic/CGC_Grain_Outcome_Predictions#stations_dly))

Pseudocode:  
- Create the daily weather stations SQL query (all stations)
- Create the restricted daily weather stations SQL query
    - Restricted by elevation (a separate cutoff per province)
    - Returns only one station for any given set of coordinates
    - First and last years must not be null
- [Load the data from the database directly into a DataFrame](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.from_postgis.html)
    - crs specifies the coordinate reference system, which in our case is EPSG:3347
    - geom_col specifies the name of the column that holds the geometry/borders

In [None]:
allDlyQuery = sq.text(f"SELECT * FROM public.{DLY_STATIONS_TABLE}")

dlyQuery = sq.text(
    f"""
    SELECT latitude, longitude, MIN(dly_first_year), MAX(dly_last_year), district, geometry FROM public.{DLY_STATIONS_TABLE} 
    WHERE dly_first_year IS NOT NULL AND dly_last_year IS NOT NULL AND
        (elevation <= {MB_CUTOFF_ELEVATION} AND province = 'MB' OR elevation <= {SK_CUTOFF_ELEVATION} AND province = 'SK' OR elevation <= {AB_CUTOFF_ELEVATION} AND province = 'AB')
    GROUP BY latitude, longitude, district, geometry;
    """
)

allDlyStations = gpd.GeoDataFrame.from_postgis(
    allDlyQuery, conn, crs="EPSG:3347", geom_col="geometry"
)
dlyStations = gpd.GeoDataFrame.from_postgis(
    dlyQuery, conn, crs="EPSG:3347", geom_col="geometry"
)
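
To make the restricted query easier to follow, here is a minimal pure-Python sketch of the same logic: drop rows with missing years, apply the per-province elevation cutoff, then keep one entry per location with the earliest first year and latest last year. The row keys (`first_year`, `last_year` and so on) and the function name `restrictStations` are assumptions for illustration; the notebook itself does this work in SQL.

```python
from typing import Dict, List, Optional, Tuple

# Cutoffs mirror MB/SK/AB_CUTOFF_ELEVATION (average elevation plus a 50m buffer)
CUTOFFS = {"MB": 300 + 50, "SK": 610 + 50, "AB": 800 + 50}

Key = Tuple[float, float, int]  # (latitude, longitude, district)


def restrictStations(rows: List[dict]) -> Dict[Key, Tuple[int, int]]:
    """Mimic the WHERE and GROUP BY clauses of the restricted query."""
    grouped: Dict[Key, Tuple[int, int]] = {}

    for row in rows:
        first: Optional[int] = row["first_year"]
        last: Optional[int] = row["last_year"]
        if first is None or last is None:
            continue  # first and last years must not be null

        cutoff = CUTOFFS.get(row["province"])
        elevation = row["elevation"]
        if cutoff is None or elevation is None or elevation > cutoff:
            continue  # other provinces, unknown or too-high elevations are dropped

        # The SQL also groups by geometry, omitted here for brevity
        key = (row["latitude"], row["longitude"], row["district"])
        if key in grouped:
            lo, hi = grouped[key]
            grouped[key] = (min(lo, first), max(hi, last))
        else:
            grouped[key] = (first, last)

    return grouped
```

Two stations that share coordinates, one covering 1990 to 2000 and the other 1995 to 2022, collapse into a single entry covering 1990 to 2022.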

Purpose:  
Load the hourly weather stations from the hourly weather stations table ([readme](https://github.com/ChromaticPanic/CGC_Grain_Outcome_Predictions#stations_hly))

Pseudocode:  
- Create the hourly weather stations SQL query (all stations)
- Create the restricted hourly weather stations SQL query
    - Restricted by elevation (a separate cutoff per province)
    - Returns only one station for any given set of coordinates
    - First and last years must not be null
- [Load the data from the database directly into a DataFrame](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.from_postgis.html)
    - crs specifies the coordinate reference system, which in our case is EPSG:3347
    - geom_col specifies the name of the column that holds the geometry/borders

In [None]:
allHlyQuery = sq.text(f"SELECT * FROM public.{HLY_STATIONS_TABLE}")

hlyQuery = sq.text(
    f"""
    SELECT latitude, longitude, MIN(hly_first_year), MAX(hly_last_year), district, geometry FROM public.{HLY_STATIONS_TABLE} 
    WHERE hly_first_year IS NOT NULL AND hly_last_year IS NOT NULL AND
        (elevation <= {MB_CUTOFF_ELEVATION} AND province = 'MB' OR elevation <= {SK_CUTOFF_ELEVATION} AND province = 'SK' OR elevation <= {AB_CUTOFF_ELEVATION} AND province = 'AB')
    GROUP BY latitude, longitude, district, geometry;
    """
)

allHlyStations = gpd.GeoDataFrame.from_postgis(
    allHlyQuery, conn, crs="EPSG:3347", geom_col="geometry"
)
hlyStations = gpd.GeoDataFrame.from_postgis(
    hlyQuery, conn, crs="EPSG:3347", geom_col="geometry"
)

Purpose:  
Disconnect from the database

In [None]:
db.cleanup()

Purpose:  
Since no color is assigned to district 4612 (Northern Manitoba), we set its color to white

Pseudocode:  
- [Reference the locations in the dataframe where the district is equal to 4612 and assign the value](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html) in the column *color* to white

In [None]:
agRegions.loc[agRegions["district"] == 4612, "color"] = "white"

Purpose:  
Creates a region plot of the agriculture regions, each labeled with its district number

Pseudocode:  
- [Generate the minimum and maximum bounds of the geography](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.total_bounds.html)
- [Create a subplot](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html)
- [Assign the vertical view limit](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_ylim.html)
- [Assign the horizontal view limit](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_xlim.html)
- [Assign a title](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_title.html)
- [Plot the geometry/districts](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.plot.html#geopandas.GeoDataFrame.plot)
- [Add the centered labels](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.annotate.html)
- [Generate the region plot](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.show.html)

In [None]:
minx, miny, maxx, maxy = agRegions.total_bounds
fig, ax = plt.subplots(figsize=(20, 20))
ax.set_ylim(miny, maxy)
ax.set_xlim(minx, maxx)
ax.set_title("Geometry with District Identifiers")
agRegions.plot(ax=ax, color=agRegions["color"], edgecolor="black")
agRegions.apply(
    lambda x: ax.annotate(
        text=x["district"],
        xy=x.geometry.centroid.coords[0],
        ha="center",
        color="black",
        size=10,
    ),
    axis=1,
)

plt.show()

Purpose:  
Creates markers/labels to be used in region plots

Pseudocode:  
- [Create labels for stations](https://matplotlib.org/stable/api/patches_api.html)
- Assign colors to quantities of data per station
- [Create labels for the quantities of data](https://matplotlib.org/stable/api/patches_api.html)

In [None]:
# Create labels for stations
hourly = mpatches.Patch(color="black", label="Hourly stations (larger)")
daily = mpatches.Patch(color="red", label="Daily stations (smaller)")
stations = mpatches.Patch(color="red", label="Stations")

# Assign colors to quantities of data per station
hasLessThan5Col = "white"
hasLessThan10Col = "pink"
hasLessThan15Col = "red"
hasLessThan20Col = "maroon"
hasMoreThan20Col = "black"

# Create labels for the quantities of data
hasLessThan5Yrs = mpatches.Patch(
    color=hasLessThan5Col, label="Up to 5 years of data"
)
hasLessThan10Yrs = mpatches.Patch(
    color=hasLessThan10Col, label="Up to 10 years of data"
)
hasLessThan15Yrs = mpatches.Patch(
    color=hasLessThan15Col, label="Up to 15 years of data"
)
hasLessThan20Yrs = mpatches.Patch(
    color=hasLessThan20Col, label="Up to 20 years of data"
)
hasMoreThan20Yrs = mpatches.Patch(
    color=hasMoreThan20Col, label="More than 20 years of data"
)

Purpose:  
Creates a region plot for all daily weather stations

Pseudocode:  
- [Generate the minimum and maximum bounds of the geography](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.total_bounds.html)
- [Create a subplot](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html)
- [Assign the vertical view limit](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_ylim.html)
- [Assign the horizontal view limit](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_xlim.html)
- [Assign a title](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_title.html)
- [Plot the geometry/districts](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.plot.html#geopandas.GeoDataFrame.plot)
- [Plot the weather stations](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.plot.html#geopandas.GeoDataFrame.plot)
- [Add a legend](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html)

In [None]:
minx, miny, maxx, maxy = agRegions.total_bounds
fig, ax = plt.subplots(figsize=(20, 20))
ax.set_ylim(miny, maxy)
ax.set_xlim(minx, maxx)
ax.set_title("All Daily Stations")
agRegions.plot(ax=ax, color=agRegions["color"], edgecolor="black")
allDlyStations.plot(ax=ax, color="red", markersize=10)
plt.legend(handles=[stations], fontsize="30")
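
The bounds, view limits, title and district layer are set up the same way in every region plot in this notebook. One option, sketched here and not part of the original notebook, is a small helper that performs those shared steps. The name `drawRegionBase` is hypothetical; `ax` is assumed to be a matplotlib `Axes` and `regions` a GeoDataFrame with a `color` column, such as `agRegions`.

```python
def drawRegionBase(ax, regions, title):
    """Apply the steps shared by every region plot to an existing Axes.

    ax      -- a matplotlib Axes, e.g. from plt.subplots(figsize=(20, 20))
    regions -- a GeoDataFrame of districts with a "color" column
    title   -- the title to show above the plot
    """
    # total_bounds returns (minx, miny, maxx, maxy) for the whole geography
    minx, miny, maxx, maxy = regions.total_bounds

    ax.set_ylim(miny, maxy)  # vertical view limit
    ax.set_xlim(minx, maxx)  # horizontal view limit
    ax.set_title(title)

    # Draw the districts first so that stations plotted later sit on top
    regions.plot(ax=ax, color=regions["color"], edgecolor="black")
    return ax
```

With such a helper, a cell like the one above would reduce to creating the figure with `plt.subplots`, calling `drawRegionBase(ax, agRegions, "All Daily Stations")`, plotting the stations on `ax`, and adding the legend.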

Purpose:  
Creates a region plot for all hourly weather stations

Pseudocode:  
- [Generate the minimum and maximum bounds of the geography](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.total_bounds.html)
- [Create a subplot](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html)
- [Assign the vertical view limit](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_ylim.html)
- [Assign the horizontal view limit](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_xlim.html)
- [Assign a title](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_title.html)
- [Plot the geometry/districts](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.plot.html#geopandas.GeoDataFrame.plot)
- [Plot the weather stations](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.plot.html#geopandas.GeoDataFrame.plot)
- [Add a legend](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html)

In [None]:
minx, miny, maxx, maxy = agRegions.total_bounds
fig, ax = plt.subplots(figsize=(20, 20))
ax.set_ylim(miny, maxy)
ax.set_xlim(minx, maxx)
ax.set_title("All Hourly Stations")
agRegions.plot(ax=ax, color=agRegions["color"], edgecolor="black")
allHlyStations.plot(ax=ax, color="red", markersize=10)
plt.legend(handles=[stations], fontsize="30")

Purpose:  
Creates a region plot for all weather stations (daily and hourly)

Pseudocode:  
- [Generate the minimum and maximum bounds of the geography](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.total_bounds.html)
- [Create a subplot](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html)
- [Assign the vertical view limit](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_ylim.html)
- [Assign the horizontal view limit](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_xlim.html)
- [Assign a title](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_title.html)
- [Plot the geometry/districts](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.plot.html#geopandas.GeoDataFrame.plot)
- [Plot the weather stations](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.plot.html#geopandas.GeoDataFrame.plot)
- [Add a legend](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html)

In [None]:
minx, miny, maxx, maxy = agRegions.total_bounds
fig, ax = plt.subplots(figsize=(20, 20))
ax.set_ylim(miny, maxy)
ax.set_xlim(minx, maxx)
ax.set_title("All Daily and Hourly Stations")
agRegions.plot(ax=ax, color=agRegions["color"], edgecolor="black")
allHlyStations.plot(ax=ax, color="black", markersize=25)
allDlyStations.plot(ax=ax, color="red", markersize=5)
plt.legend(handles=[hourly, daily], fontsize="30")

Purpose:  
Creates a region plot for all weather stations, subject to the following restrictions:
- Restricted by elevation
- Only returns one station given the same set of coordinates
- First and last years must be valid

Pseudocode:  
- [Generate the minimum and maximum bounds of the geography](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.total_bounds.html)
- [Create a subplot](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html)
- [Assign the vertical view limit](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_ylim.html)
- [Assign the horizontal view limit](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_xlim.html)
- [Assign a title](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_title.html)
- [Plot the geometry/districts](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.plot.html#geopandas.GeoDataFrame.plot)
- [Plot the weather stations](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.plot.html#geopandas.GeoDataFrame.plot)
- [Add a legend](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html)

In [None]:
minx, miny, maxx, maxy = agRegions.total_bounds
fig, ax = plt.subplots(figsize=(20, 20))
ax.set_ylim(miny, maxy)
ax.set_xlim(minx, maxx)
ax.set_title("Restricted Daily and Hourly Stations")
agRegions.plot(ax=ax, color=agRegions["color"], edgecolor="black")
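# Hourly stations are drawn larger and in black, daily stations smaller and in red (matches the legend)
hlyStations.plot(ax=ax, color="black", markersize=25)
dlyStations.plot(ax=ax, color="red", markersize=5)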
plt.legend(handles=[hourly, daily], fontsize="30")

Purpose:  
Drops stations without the latest data (2022)

Pseudocode:  
- Create a GeoDataFrame from each set of weather stations that keeps only the rows whose max year column equals 2022 (all other rows are dropped)

In [None]:
activeDlyStations = gpd.GeoDataFrame(
    dlyStations.drop(dlyStations[dlyStations["max"] != 2022].index)
)

activeHlyStations = gpd.GeoDataFrame(
    hlyStations.drop(hlyStations[hlyStations["max"] != 2022].index)
)

Purpose:  
Creates a region plot for all active weather stations (those with data from 2022)

Pseudocode:  
- [Generate the minimum and maximum bounds of the geography](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.total_bounds.html)
- [Create a subplot](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html)
- [Assign the vertical view limit](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_ylim.html)
- [Assign the horizontal view limit](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_xlim.html)
- [Assign a title](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_title.html)
- [Plot the geometry/districts](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.plot.html#geopandas.GeoDataFrame.plot)
- [Plot the weather stations](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.plot.html#geopandas.GeoDataFrame.plot)
- [Add a legend](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html)

In [None]:
minx, miny, maxx, maxy = agRegions.total_bounds
fig, ax = plt.subplots(figsize=(20, 20))
ax.set_ylim(miny, maxy)
ax.set_xlim(minx, maxx)
ax.set_title("Active Daily and Hourly Stations")
agRegions.plot(ax=ax, color=agRegions["color"], edgecolor="black", label=5)
activeHlyStations.plot(ax=ax, color="black", markersize=25)
activeDlyStations.plot(ax=ax, color="red", markersize=5)

plt.legend(handles=[hourly, daily], fontsize="30")

Purpose:  
Creates duplicate GeoDataFrames of the original station data and then adds a color column (assigned to None)

In [None]:
# copy() keeps the new color column from leaking into the original station frames
coloredDlyStations = dlyStations.copy()
coloredHlyStations = hlyStations.copy()

coloredDlyStations["color"] = None
coloredHlyStations["color"] = None

Purpose:  
Assigns colors to stations depending on how many years of data are available for a station (last year - first year)

Pseudocode:  
- Calculate the number of years with data (lastYear - firstYear)
- Assign the color which corresponds to the following categories of data:
    - less than or equal to 5 years
    - less than or equal to 10 years
    - less than or equal to 15 years
    - less than or equal to 20 years
    - More than 20 years

In [None]:
def checkColor(df, index, firstYear, lastYear):
    numYrs = lastYear - firstYear

    if numYrs <= 5:
        df.at[index, "color"] = hasLessThan5Col
    elif numYrs <= 10:
        df.at[index, "color"] = hasLessThan10Col
    elif numYrs <= 15:
        df.at[index, "color"] = hasLessThan15Col
    elif numYrs <= 20:
        df.at[index, "color"] = hasLessThan20Col
    else:
        df.at[index, "color"] = hasMoreThan20Col
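
The bin boundaries are easy to misread, so here is a minimal standalone sketch of the same binning as a pure function that returns the color instead of writing it into a dataframe. The name `colorForYears` is hypothetical; it uses `bisect_left` from the standard library so that a span exactly on a threshold (5, 10, 15 or 20) falls into the lower bin, as in `checkColor`.

```python
from bisect import bisect_left

# Upper bounds (inclusive) of the first four bins, in years
THRESHOLDS = [5, 10, 15, 20]

# One color per bin; the last color covers everything above 20 years
COLORS = ["white", "pink", "red", "maroon", "black"]


def colorForYears(numYrs: float) -> str:
    """Return the color of the bin that a span of numYrs years falls into."""
    # bisect_left counts the thresholds strictly below numYrs,
    # so a value equal to a threshold stays in that threshold's bin
    return COLORS[bisect_left(THRESHOLDS, numYrs)]


if __name__ == "__main__":
    for span in (0, 5, 6, 10, 20, 21):
        print(span, colorForYears(span))
```

A station whose first and last years are 2017 and 2022 spans 5 years and is colored white, while one spanning 2016 to 2022 (6 years) is colored pink.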

Purpose:  
Iterate through the rows of data and add the corresponding color (dly and hly)

In [None]:
# Add the corresponding colors (years of data) for the dly stations
for index, row in coloredDlyStations.iterrows():
    checkColor(coloredDlyStations, index, row["min"], row["max"])

# Add the corresponding colors (years of data) for the hly stations
for index, row in coloredHlyStations.iterrows():
    checkColor(coloredHlyStations, index, row["min"], row["max"])

Purpose:  
Creates a region plot for daily weather stations such that different colors correspond to the amount of data available:
- up to 5 years = white
- more than 5, up to 10 years = pink
- more than 10, up to 15 years = red
- more than 15, up to 20 years = maroon
- more than 20 years = black

Pseudocode:  
- [Generate the minimum and maximum bounds of the geography](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.total_bounds.html)
- [Create a subplot](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html)
- [Assign the vertical view limit](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_ylim.html)
- [Assign the horizontal view limit](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_xlim.html)
- [Assign a title](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_title.html)
- [Plot the geometry/districts](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.plot.html#geopandas.GeoDataFrame.plot)
- [Plot the weather stations](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.plot.html#geopandas.GeoDataFrame.plot)
- [Add a legend](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html)

In [None]:
minx, miny, maxx, maxy = agRegions.total_bounds
fig, ax = plt.subplots(figsize=(20, 20))
ax.set_ylim(miny, maxy)
ax.set_xlim(minx, maxx)
ax.set_title("Number of Years of Data for Daily Stations")
agRegions.plot(ax=ax, color=agRegions["color"], edgecolor="black")
coloredDlyStations.plot(ax=ax, color=coloredDlyStations["color"], markersize=20)

plt.legend(
    handles=[
        hasLessThan5Yrs,
        hasLessThan10Yrs,
        hasLessThan15Yrs,
        hasLessThan20Yrs,
        hasMoreThan20Yrs,
    ],
    fontsize="30",
)

Purpose:  
Creates a region plot for hourly weather stations such that different colors correspond to the amount of data available:
- up to 5 years = white
- more than 5, up to 10 years = pink
- more than 10, up to 15 years = red
- more than 15, up to 20 years = maroon
- more than 20 years = black

Pseudocode:  
- [Generate the minimum and maximum bounds of the geography](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.total_bounds.html)
- [Create a subplot](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html)
- [Assign the vertical view limit](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_ylim.html)
- [Assign the horizontal view limit](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_xlim.html)
- [Assign a title](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_title.html)
- [Plot the geometry/districts](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.plot.html#geopandas.GeoDataFrame.plot)
- [Plot the weather stations](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.plot.html#geopandas.GeoDataFrame.plot)
- [Add a legend](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html)

In [None]:
minx, miny, maxx, maxy = agRegions.total_bounds
fig, ax = plt.subplots(figsize=(20, 20))
ax.set_ylim(miny, maxy)
ax.set_xlim(minx, maxx)
ax.set_title("Number of Years of Data for Hourly Stations (darker means more)")
agRegions.plot(ax=ax, color=agRegions["color"], edgecolor="black")
coloredHlyStations.plot(ax=ax, color=coloredHlyStations["color"], markersize=20)

plt.legend(
    handles=[
        hasLessThan5Yrs,
        hasLessThan10Yrs,
        hasLessThan15Yrs,
        hasLessThan20Yrs,
        hasMoreThan20Yrs,
    ],
    fontsize="30",
)

Purpose:  
Creates a region plot for all weather stations such that different colors correspond to the amount of data available:
- up to 5 years = white
- more than 5, up to 10 years = pink
- more than 10, up to 15 years = red
- more than 15, up to 20 years = maroon
- more than 20 years = black

Pseudocode:  
- [Generate the minimum and maximum bounds of the geography](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoSeries.total_bounds.html)
- [Create a subplot](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html)
- [Assign the vertical view limit](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_ylim.html)
- [Assign the horizontal view limit](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_xlim.html)
- [Assign a title](https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_title.html)
- [Plot the geometry/districts](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.plot.html#geopandas.GeoDataFrame.plot)
- [Plot the weather stations](https://geopandas.org/en/stable/docs/reference/api/geopandas.GeoDataFrame.plot.html#geopandas.GeoDataFrame.plot)
- [Add a legend](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.legend.html)

In [None]:
minx, miny, maxx, maxy = agRegions.total_bounds
fig, ax = plt.subplots(figsize=(20, 20))
ax.set_ylim(miny, maxy)
ax.set_xlim(minx, maxx)
ax.set_title("Number of Years of Data for Daily and Hourly Stations")
agRegions.plot(ax=ax, color=agRegions["color"], edgecolor="black")
coloredHlyStations.plot(ax=ax, color=coloredHlyStations["color"], markersize=25)
coloredDlyStations.plot(ax=ax, color=coloredDlyStations["color"], markersize=5)

plt.legend(
    title="Hourly Stations are Larger, Daily Stations are Smaller",
    title_fontsize="20",
    handles=[
        hasLessThan5Yrs,
        hasLessThan10Yrs,
        hasLessThan15Yrs,
        hasLessThan20Yrs,
        hasMoreThan20Yrs,
    ],
    fontsize="30",
)

Purpose:  
Creates a report on the weather station data that covers the following:
- Raw number of stations
- Number of stations that satisfy elevation and have a unique position
- Number of stations with at most 5 years of data
- Number of stations with more than 5 and at most 10 years of data
- Number of stations with more than 10 and at most 15 years of data
- Number of stations with more than 15 and at most 20 years of data
- Number of stations with more than 20 years of data
- Number of stations that were active as of 2022

In [None]:
def genReport(
    allHlyStations: gpd.GeoDataFrame,
    hlyStations: gpd.GeoDataFrame,
    coloredHlyStations: gpd.GeoDataFrame,
    activeHlyStations: gpd.GeoDataFrame,
    allDlyStations: gpd.GeoDataFrame,
    dlyStations: gpd.GeoDataFrame,
    coloredDlyStations: gpd.GeoDataFrame,
    activeDlyStations: gpd.GeoDataFrame,
    append: str = "",
):
    print(
        f"""
        {append}Raw number of hourly stations: {len(allHlyStations.index)}
        {append}Raw number of daily stations: {len(allDlyStations.index)}

        {append}Number of hourly stations that satisfy elevation and have a unique position: {len(hlyStations.index)}
        {append}Number of daily stations that satisfy elevation and have a unique position: {len(dlyStations.index)}

        {append}The following statistics describe the restricted set of weather stations:
        {append}\tNumber of hourly stations with at most 5 years of data: {len(coloredHlyStations[coloredHlyStations["color"] == hasLessThan5Col])}
        {append}\tNumber of daily stations with at most 5 years of data: {len(coloredDlyStations[coloredDlyStations["color"] == hasLessThan5Col])}

        {append}\tNumber of hourly stations with more than 5 and at most 10 years of data: {len(coloredHlyStations[coloredHlyStations["color"] == hasLessThan10Col])}
        {append}\tNumber of daily stations with more than 5 and at most 10 years of data: {len(coloredDlyStations[coloredDlyStations["color"] == hasLessThan10Col])}

        {append}\tNumber of hourly stations with more than 10 and at most 15 years of data: {len(coloredHlyStations[coloredHlyStations["color"] == hasLessThan15Col])}
        {append}\tNumber of daily stations with more than 10 and at most 15 years of data: {len(coloredDlyStations[coloredDlyStations["color"] == hasLessThan15Col])}

        {append}\tNumber of hourly stations with more than 15 and at most 20 years of data: {len(coloredHlyStations[coloredHlyStations["color"] == hasLessThan20Col])}
        {append}\tNumber of daily stations with more than 15 and at most 20 years of data: {len(coloredDlyStations[coloredDlyStations["color"] == hasLessThan20Col])}
    
        {append}\tNumber of hourly stations with more than 20 years of data: {len(coloredHlyStations[coloredHlyStations["color"] == hasMoreThan20Col])}
        {append}\tNumber of daily stations with more than 20 years of data: {len(coloredDlyStations[coloredDlyStations["color"] == hasMoreThan20Col])}
    
        {append}\tNumber of hourly stations that were active as of 2022: {len(activeHlyStations.index)}
        {append}\tNumber of daily stations that were active as of 2022: {len(activeDlyStations.index)}\n
        """
    )

Purpose:  
Creates reports for all loaded data

In [None]:
genReport(
    allHlyStations,
    hlyStations,
    coloredHlyStations,
    activeHlyStations,
    allDlyStations,
    dlyStations,
    coloredDlyStations,
    activeDlyStations,
)
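
Because each station receives exactly one color, the per-bin figures in the report are disjoint and add up to the size of the restricted set. A minimal sketch of that tally using `collections.Counter` follows; the names `BIN_LABELS` and `tallyBins` are hypothetical, and the input is assumed to be the values of a `color` column such as `coloredDlyStations["color"]`.

```python
from collections import Counter
from typing import Dict, Iterable

# Color of each bin mapped to a description of the span it covers
BIN_LABELS = {
    "white": "at most 5 years",
    "pink": "more than 5 and at most 10 years",
    "red": "more than 10 and at most 15 years",
    "maroon": "more than 15 and at most 20 years",
    "black": "more than 20 years",
}


def tallyBins(colors: Iterable[str]) -> Dict[str, int]:
    """Count how many stations fall into each bin, including empty bins."""
    counts = Counter(colors)
    return {label: counts.get(color, 0) for color, label in BIN_LABELS.items()}


if __name__ == "__main__":
    sample = ["white", "white", "pink", "black", "black", "black"]
    for label, count in tallyBins(sample).items():
        print(f"{label}: {count}")
```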

Purpose:  
Creates reports based on districts and provinces

In [None]:
for index, row in agRegions.iterrows():
    currDistrict = row["district"]
    province = "SK"

    if currDistrict < 4700:
        province = "MB"
    elif currDistrict >= 4800:
        province = "AB"

    allDistHlyStations = allHlyStations[allHlyStations["district"] == currDistrict]
    hlyDistStations = hlyStations[hlyStations["district"] == currDistrict]
    coloredDistHlyStations = coloredHlyStations[
        coloredHlyStations["district"] == currDistrict
    ]
    activeDistHlyStations = activeHlyStations[
        activeHlyStations["district"] == currDistrict
    ]

    allDistDlyStations = allDlyStations[allDlyStations["district"] == currDistrict]
    dlyDistStations = dlyStations[dlyStations["district"] == currDistrict]
    coloredDistDlyStations = coloredDlyStations[
        coloredDlyStations["district"] == currDistrict
    ]
    activeDistDlyStations = activeDlyStations[
        activeDlyStations["district"] == currDistrict
    ]

    print(f"District: {currDistrict} which is in {province}")
    genReport(
        allDistHlyStations,
        hlyDistStations,
        coloredDistHlyStations,
        activeDistHlyStations,
        allDistDlyStations,
        dlyDistStations,
        coloredDistDlyStations,
        activeDistDlyStations,
        "\t",
    )

    if len(hlyDistStations) > 0:
        print(
            f'\tHourly date range falls into: {int(hlyDistStations["min"].min())} - {int(hlyDistStations["max"].max())}'
        )
    if len(dlyDistStations) > 0:
        print(
            f'\tDaily date range falls into: {int(dlyDistStations["min"].min())} - {int(dlyDistStations["max"].max())}\n\n\n'
        )
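
The loop above infers the province from the district number: values below 4700 are treated as Manitoba, values of 4800 and above as Alberta, and everything in between as Saskatchewan. As a minimal sketch, the same rule can be written as a standalone function (the name `provinceForDistrict` is hypothetical), which makes the boundaries easy to check in isolation.

```python
def provinceForDistrict(district: int) -> str:
    """Map a district number to its province using the loop's thresholds."""
    if district < 4700:
        return "MB"
    if district >= 4800:
        return "AB"
    return "SK"  # 4700 to 4799


if __name__ == "__main__":
    for district in (4612, 4700, 4799, 4800):
        print(district, provinceForDistrict(district))
```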