# soilMoistureAggregation.ipynb
After loading the [soil moisture data](https://github.com/ChromaticPanic/CGC_Grain_Outcome_Predictions/blob/main/src/SatelliteSoilMoisture/PullMoistureData.ipynb), use this notebook to calculate the minimum, maximum and mean soil moisture per district and date.

##### Output:
- [agg_soil_moisture](https://github.com/ChromaticPanic/CGC_Grain_Outcome_Predictions#agg_soil_moisture)

In [None]:
from SoilMoistureQueryHandler import SoilMoistureQueryHandler  # type: ignore
import matplotlib.pyplot as plt  # type: ignore
from dotenv import load_dotenv
import geopandas as gpd  # type: ignore
import sqlalchemy as sq
import pandas as pd
import xarray as xr
import os, sys

sys.path.append("../")
from Shared.DataService import DataService

In [None]:
LOG_FILE = "/data/pull_moisture.log"  # The file used to store progress information

# The table that will store the aggregated soil moisture data
TABLE = "agg_soil_moisture"
SOIL_MOISTURE_TABLE = "soil_moisture"  # The table that stores the soil moisture data


# Load the database connection environment variables located in the docker folder
load_dotenv("../docker/.env")
PG_USER = os.getenv("POSTGRES_USER")
PG_PW = os.getenv("POSTGRES_PW")
PG_DB = os.getenv("POSTGRES_DB")
PG_ADDR = os.getenv("POSTGRES_ADDR")
PG_PORT = os.getenv("POSTGRES_PORT")

Purpose:  
Writes progress updates to a log file, or to the console if no filename is provided

Pseudocode:  
- Check if a filename is provided
- If so, open the file in append mode and add the progress message
- Otherwise (or if the file cannot be written), print the message

In [None]:
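def updateLog(fileName: str, message: str) -> None:
    try:
        if fileName is not None:
            # Append so earlier progress messages are preserved
            with open(fileName, "a") as log:
                log.write(message + "\n")
        else:
            print(message)
    except Exception:
        # Fall back to the console if the log file cannot be written
        print(message)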

Purpose:  
Connect to the database

In [None]:
if (
    PG_DB is None
    or PG_ADDR is None
    or PG_PORT is None
    or PG_USER is None
    or PG_PW is None
):
    updateLog(LOG_FILE, "Missing database credentials")
    raise ValueError("Environment variables are not set")

db = DataService(PG_DB, PG_ADDR, int(PG_PORT), PG_USER, PG_PW)
conn = db.connect()

Purpose:  
Loads the soil moisture data from the soil moisture table

Tables:  
- [soil_moisture](https://github.com/ChromaticPanic/CGC_Grain_Outcome_Predictions#soil_moisture)

Pseudocode:  
- Create the soil moisture SQL query
- [Load the data from the database directly into a DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.read_sql.html)

In [None]:
# The f-string prefix is required so the table name is interpolated
query = sq.text(f"SELECT * FROM public.{SOIL_MOISTURE_TABLE}")
sm_df = pd.read_sql(query, conn)
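
The load step can be tried without a running PostgreSQL instance. The sketch below is only an illustration: it assumes an in-memory SQLite database as a stand-in, and the table layout and sample row are hypothetical rather than the project's real schema. It shows the table name being interpolated into the query before `pd.read_sql` runs it.

```python
import sqlite3

import pandas as pd

SOIL_MOISTURE_TABLE = "soil_moisture"

# In-memory SQLite stands in for the PostgreSQL database (assumption)
conn = sqlite3.connect(":memory:")
conn.execute(
    f"CREATE TABLE {SOIL_MOISTURE_TABLE} "
    "(date TEXT, district INTEGER, soil_moisture REAL)"
)
# Hypothetical sample row
conn.execute(f"INSERT INTO {SOIL_MOISTURE_TABLE} VALUES ('2020-06-01', 4600, 0.25)")
conn.commit()

# The f prefix substitutes the table name into the SQL text
query = f"SELECT * FROM {SOIL_MOISTURE_TABLE}"
df = pd.read_sql(query, conn)
conn.close()

print(df)
```

Without the `f` prefix the braces would be sent to the database literally and the query would fail.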

Purpose:  
Extract the individual date components (to replace the datetime64 date column)

Pseudocode:  
- [Convert the date column into type datetime](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html)
- Extract the year, month and day
- [Delete](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop.html) the original date column

In [None]:
sm_df["date"] = pd.to_datetime(sm_df["date"])
sm_df["year"] = sm_df["date"].dt.year
sm_df["month"] = sm_df["date"].dt.month
sm_df["day"] = sm_df["date"].dt.day

sm_df.drop(columns="date", inplace=True)
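
As a quick check of the date handling, the same steps can be run on a small hand-made DataFrame. The rows below are hypothetical sample values, not data from the `soil_moisture` table.

```python
import pandas as pd

# Hypothetical sample rows standing in for the database table
sample = pd.DataFrame(
    {
        "date": ["2020-06-01", "2020-06-02"],
        "soil_moisture": [0.25, 0.75],
    }
)

# Parse the strings, then split the timestamp into its components
sample["date"] = pd.to_datetime(sample["date"])
sample["year"] = sample["date"].dt.year
sample["month"] = sample["date"].dt.month
sample["day"] = sample["date"].dt.day

# The original column is no longer needed once the parts are extracted
sample = sample.drop(columns="date")

print(sample.columns.tolist())
```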

Purpose:  
Aggregate the soil moisture data by year, month, day, cr_num and district

Pseudocode:  
- [Aggregate](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.agg.html) the columns [by year, month, day, cr_num and district](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html)
- Rename the aggregated columns in the final DataFrame

In [None]:
sm_df = (
    sm_df.groupby(["year", "month", "day", "cr_num", "district"])
    .agg({"soil_moisture": ["min", "max", "mean"]})
    .reset_index()
)

sm_df.columns = [  # type: ignore
    "year",
    "month",
    "day",
    "cr_num",
    "district",
    # Order must match the statistics passed to agg: min, max, mean
    "soil_moisture_min",
    "soil_moisture_max",
    "soil_moisture_mean",
]
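
Assigning a flat list of names depends on the list order matching the order of the statistics. One alternative, shown here only as a sketch and not as the notebook's own method, is pandas named aggregation, which ties each output column name to its statistic. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample: two districts, two readings each on the same day
sample = pd.DataFrame(
    {
        "year": [2020, 2020, 2020, 2020],
        "month": [6, 6, 6, 6],
        "day": [1, 1, 1, 1],
        "cr_num": [1, 1, 2, 2],
        "district": [4600, 4600, 4601, 4601],
        "soil_moisture": [0.25, 0.75, 0.125, 0.375],
    }
)

# Each keyword names an output column and the statistic that fills it
agg = (
    sample.groupby(["year", "month", "day", "cr_num", "district"])
    .agg(
        soil_moisture_min=("soil_moisture", "min"),
        soil_moisture_max=("soil_moisture", "max"),
        soil_moisture_mean=("soil_moisture", "mean"),
    )
    .reset_index()
)

print(agg)
```

The result already has flat column names, so no separate renaming step is needed.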

Purpose:  
Push the aggregated soil moisture data to the database, then close the connection

In [None]:
sm_df.to_sql(TABLE, conn, schema="public", if_exists="replace")
db.cleanup()