In [None]:
%run ./0_workspace_setup.ipynb


# Updating Climate Drivers with gdptools

The Willamette River modeling domain currently uses climate data from 1979–2022. The gridMET dataset, which is updated daily with a one-day lag, allows us to regularly update our climate drivers with the latest available data.

## Introduction

This notebook introduces tools and workflows for updating climate drivers in the current modeling domain. It covers:

1. An overview of the [`gdptools`](https://gdptools.readthedocs.io/en/develop/) package, which spatially interpolates gridded climate data to the polygonal modeling domain (HRUs) using areal-intersection weights.
2. A workflow for updating climate drivers using:
   - **`gdptools`** ([repo](https://code.usgs.gov/wma/nhgf/toolsteam/gdptools))
   - **`pyPRMS`** ([repo](https://github.com/DOI-USGS/pyPRMS/tree/development/docs))

## `gdptools` Package

`gdptools` is a Python package for spatially interpolating gridded data to a polygonal fabric using areal weighting. It is used here to interpolate [gridMET climate data](https://www.climatologylab.org/gridmet.html) to the Willamette River modeling domain. `gdptools` was also used to create the original climate drivers for this domain.

While `gdptools` provides the initial spatial interpolation, further post-processing is sometimes required:
1. Renaming variables and dimensions for compatibility with PRMS, pyPRMS, and pyWatershed.
2. Filling missing data, if the gridded dataset does not fully overlap the modeling domain. For the Willamette River domain, gridMET coverage is complete, but we include the filling step for completeness.

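The gap-filling step mentioned above can be approached in several ways. The following is a minimal sketch of one option, nearest-neighbor filling, and is not necessarily the method used by `gdptools` or pyPRMS. The function name `fill_from_nearest`, the centroid coordinates, and the sample values are all hypothetical.

```python
import numpy as np


def fill_from_nearest(values, x, y):
    """Fill NaN entries with the value of the nearest HRU that has data.

    values: per-HRU values for one time step (NaN marks missing data).
    x, y:   hypothetical HRU centroid coordinates in a projected CRS.
    """
    values = np.asarray(values, dtype=float).copy()
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    missing = np.isnan(values)
    valid_idx = np.flatnonzero(~missing)
    if valid_idx.size == 0:
        return values  # nothing to fill from

    for m in np.flatnonzero(missing):
        # Squared distance from the missing HRU to every HRU with data
        dist2 = (x[valid_idx] - x[m]) ** 2 + (y[valid_idx] - y[m]) ** 2
        values[m] = values[valid_idx[np.argmin(dist2)]]
    return values


# Three HRUs on a line; the middle one has no data and sits closest to the first.
filled = fill_from_nearest([1.0, np.nan, 3.0], x=[0.0, 1.0, 5.0], y=[0.0, 0.0, 0.0])
print(filled)
```

For the Willamette River domain this step should be a no-op, since gridMET covers every HRU.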
### Source and Target Data

- **Source data:** gridMET gridded climate data.
- **Target data:** The modeling domain, defined in `0_workspace_setup.ipynb` and read below from `./domain_data/willamette_river/GIS/model_layers.gpkg`.

`gdptools` provides an interface to the ClimateR-Catalog, a collection of gridded climate datasets and metadata. gridMET is included in this catalog.

### Working with the ClimateR-Catalog

The [ClimateR-Catalog](https://github.com/mikejohnson51/climateR-catalogs) contains metadata for gridded climate datasets, including URLs, variable names, and coordinate information. We use the latest Parquet catalog file from [this release](https://github.com/mikejohnson51/climateR-catalogs/releases/download/June-2024/catalog.parquet), read it into a pandas DataFrame, and filter for the gridMET dataset. We then create a dictionary mapping variables of interest to their catalog entries.

`gdptools` provides the `ClimRCatData` data class, which we use in the workflow below. The steps are:
1. Read the catalog into a pandas DataFrame.
2. Search for the relevant gridMET data.
3. Use the `ClimRCatData` class to access the data.

In [None]:
import geopandas as gpd
import pandas as pd
import numpy as np
import xarray as xr

import hvplot.xarray
import hvplot.pandas

from gdptools import WeightGen
from gdptools import AggGen
from gdptools import ClimRCatData

from pathlib import Path

## Inspect the existing climate driver file
In this section, we read the existing climate driver file, `./domain_data/willamette_river/cbh.nc`, with the `xarray` library and inspect its structure.

The existing driver file covers 1979–2022; we will extend it with the latest available gridMET data. Note the data variable names (`tmax`, `tmin`, `prcp`), their units, and the dimension names, particularly `nhru`: the output from `gdptools` must be renamed and converted to match them in the post-processing steps below.

In [None]:
existing_ds = xr.open_dataset("domain_data/willamette_river/cbh.nc")
existing_ds

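Once the end of the existing record is known, the update period follows directly from it. The sketch below assumes the existing file ends on 2022-12-31 and that the update should run through 2024-12-31. In practice the end date would be read from the time coordinate of the existing dataset rather than hard-coded.

```python
import pandas as pd

# Assumed last time step of the existing driver file (end of 2022)
existing_end = pd.Timestamp("2022-12-31")

# Assumed target end date for the update
update_end = pd.Timestamp("2024-12-31")

# The update starts the day after the existing record ends
update_start = existing_end + pd.Timedelta(days=1)

# Period as a [start, end] list of ISO date strings
period = [update_start.strftime("%Y-%m-%d"), update_end.strftime("%Y-%m-%d")]

# Number of daily time steps the update should contain
n_days = len(pd.date_range(update_start, update_end, freq="D"))
print(period, n_days)
```

Comparing `n_days` with the length of the time dimension in the aggregated output is a quick way to confirm that no days are missing.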
## 1. Read the ClimateR-Catalog into a pandas DataFrame and parameterize the `ClimRCatData` class, which we use to represent our source data.

The ClimateR-Catalog is a collection of gridded climate datasets and associated metadata. We read the latest catalog file into a pandas DataFrame for further processing. In addition to the Parquet file, a [JSON](https://github.com/mikejohnson51/climateR-catalogs/releases/download/June-2024/catalog.json) version is available; first-time users may find it useful to open it in a text editor that supports JSON formatting to see the structure of the catalog. The `gdptools` documentation also has a [table](https://gdptools.readthedocs.io/en/develop/#example-catalog-datasets) of common datasets available in the catalog.

In [None]:
climrcat_url = "https://github.com/mikejohnson51/climateR-catalogs/releases/download/June-2024/catalog.parquet"
climrcat_df = pd.read_parquet(climrcat_url)
climrcat_df

### Pandas DataFrame Query Language

pandas provides a `query()` method that filters the rows of a DataFrame with a boolean expression written as a string. Inside the expression, a name prefixed with `@` refers to a Python variable in the surrounding scope. For example, the following cell filters the catalog for rows where the `id` column equals `gridmet`:

In [None]:
_id = "gridmet"
gridmet_df = climrcat_df.query("id == @_id")
gridmet_df

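The same `query()` pattern can be tried on a small, self-contained DataFrame. The toy catalog below is a hypothetical stand-in with only two columns; the real catalog has many more.

```python
import pandas as pd

# Toy stand-in for the catalog (hypothetical rows, two columns only)
toy_catalog = pd.DataFrame(
    {
        "id": ["gridmet", "gridmet", "other_dataset"],
        "variable": ["tmmn", "pr", "tmax"],
    }
)

_id = "gridmet"
_var = "pr"

# "@name" substitutes the value of a local Python variable into the expression
by_id = toy_catalog.query("id == @_id")

# Conditions can be combined with "and" (or "&")
by_id_and_var = toy_catalog.query("id == @_id and variable == @_var")

print(by_id)
print(by_id_and_var)
```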
### Further Processing of the ClimateR-Catalog Data for use with `gdptools` `ClimRCatData`

Once we have filtered the DataFrame for the gridMET dataset, we can create a dictionary that maps variable names to their corresponding catalog entries. This will allow us to easily access the data for each variable when using `gdptools`.


In [None]:
# Create a dictionary of ClimateR-Catalog entries, one per variable
tvars = ["tmmn", "tmmx", "pr"]
cat_params = [
    gridmet_df.query("id == @_id & variable == @_var").to_dict(orient="records")[0]
    for _var in tvars
]

cat_dict = dict(zip(tvars, cat_params))

# Show an example catalog entry, here for "tmmn".
cat_dict.get("tmmn")

### Read in the target data file and inspect its contents

The `ClimRCatData` class requires target data that defines the polygonal modeling domain (HRUs). We read the `nhru` layer of `./domain_data/willamette_river/GIS/model_layers.gpkg` and inspect its contents. This layer contains the HRU geometries used for the spatial interpolation of the climate data. We also need the name of the column that identifies each HRU, in this case `model_hru_idx`.

In [None]:
target_gdf = gpd.read_file(
    "./domain_data/willamette_river/GIS/model_layers.gpkg", layer="nhru"
)
target_gdf

### Parameterize the `ClimRCatData` class

We use this data class to parameterize the `WeightGen` and `AggGen` classes in `gdptools`, which generate areal weights and aggregate data, respectively. The `period` argument sets the time bounds of the data to process; here we extend the existing data through the end of 2024.

In [None]:
user_data = ClimRCatData(
    cat_dict=cat_dict,
    f_feature=target_gdf,
    id_feature="model_hru_idx",
    period=["2023-01-01", "2024-12-31"],
)

## 2. Generate Areal Weights with `gdptools`

The areal weights generator (`WeightGen`) in `gdptools` creates the weights used to interpolate gridded climate data to the polygonal modeling domain (HRUs). The weights are calculated from the intersection of the grid cells with the HRU geometries. The result is a table with the target id column (`model_hru_idx`), the grid cell indices (`i` and `j`), and the normalized areal weight of each cell within each HRU. As long as the source grid and the target polygons are unchanged, the weights can be reused for subsequent updates to the climate drivers.

> Note: The `calculate_weights` method returns the weights as a pandas DataFrame, and also saves the weights to a CSV file for later use. 

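To make the idea of normalized areal weights concrete, here is a minimal sketch with made-up numbers. It is not the `gdptools` implementation: the intersection areas, HRU areas, and grid values are hypothetical, and only the column names `model_hru_idx`, `i`, `j`, and `wght` follow the weights table described above.

```python
import pandas as pd

# Hypothetical intersection areas (m^2) between grid cells (i, j) and two HRUs
intersections = pd.DataFrame(
    {
        "model_hru_idx": [1, 1, 2, 2, 2],
        "i": [0, 0, 0, 1, 1],
        "j": [0, 1, 1, 0, 1],
        "area": [300.0, 100.0, 50.0, 50.0, 100.0],
    }
)

# Hypothetical total area of each HRU; both are fully covered by the grid
hru_area = {1: 400.0, 2: 200.0}

# Normalized weight: the fraction of the HRU area that falls in each cell
intersections["wght"] = intersections["area"] / intersections["model_hru_idx"].map(hru_area)

# Hypothetical gridded values for one time step
grid = pd.DataFrame({"i": [0, 0, 1, 1], "j": [0, 1, 0, 1], "value": [2.0, 4.0, 6.0, 8.0]})

# Attach the cell values and compute the area-weighted mean per HRU
joined = intersections.merge(grid, on=["i", "j"], how="left")
hru_mean = (joined["wght"] * joined["value"]).groupby(joined["model_hru_idx"]).sum()
wght_sums = joined.groupby("model_hru_idx")["wght"].sum()

print(wght_sums)
print(hru_mean)
```

Because both HRUs are fully covered here, the weights sum to 1 for each. An HRU only partly covered by the grid would have weights summing to less than 1.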
In [None]:
gdptools_path = Path("./domain_data/willamette_river/gdptools")
gdptools_path.mkdir(parents=True, exist_ok=True)
weights_file = gdptools_path / "gridmet_Wn_wghts.csv"

if weights_file.exists():
    # Reuse the weights saved by a previous run
    wghts = pd.read_csv(weights_file)
else:
    # Generate the weights and save them to weights_file
    wght_gen = WeightGen(
        user_data=user_data,
        method="serial",
        output_file=weights_file,
        weight_gen_crs=5070,
    )
    wghts = wght_gen.calculate_weights()

In [None]:
wghts

A simple check on the generated weights is to group them by the target id column and sum the weights. The sum should equal 1 for each target id, indicating that the weights are normalized. A sum of less than 1 indicates that the gridded data does not fully cover the HRU geometry, and those gaps must be filled in the post-processing step.

In [None]:
# Sum only the weight column; summing the i and j index columns is not meaningful
sum_wghts = wghts.groupby("model_hru_idx")["wght"].sum().reset_index()
sum_wghts

### Inspect the generated weights

Here we provide a quick inspection of the generated weights to ensure there are no HRUs with weights that sum to less than 1. If there are, we will need to fill those gaps in the post-processing step.

> Note: In this case there are no HRUs with weights that sum to less than 1, indicating that the gridded data fully covers the HRU geometries.

In [None]:
# Absolute tolerance (atol); np.isclose also applies its default relative tolerance (rtol=1e-5)
tolerance = 1e-6

# Boolean mask: which values are NOT close to 1 (outside tolerance)
not_close_to_1 = ~np.isclose(sum_wghts['wght'], 1.0, atol=tolerance)

print(sum_wghts[not_close_to_1])

## 3. Aggregate the Climate Data with `gdptools`

In [None]:
agg_out_path = Path("./domain_data/willamette_river/gdptools")
agg_gen = AggGen(
    user_data=user_data,
    stat_method="masked_mean",
    agg_engine="serial",
    agg_writer="netcdf",
    weights="./domain_data/willamette_river/gdptools/gridmet_Wn_wghts.csv",
    out_path=agg_out_path,
    file_prefix="cbh_2024_temp",
)
ngdf, ds_out = agg_gen.calculate_agg()

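The `stat_method="masked_mean"` option above names a weighted mean that tolerates missing cells. The sketch below illustrates that general idea under the assumption that missing (NaN) cells are skipped and the remaining weights are renormalized. It is an illustration only, not the `gdptools` implementation, and the function name `masked_weighted_mean` is hypothetical.

```python
import numpy as np


def masked_weighted_mean(values, weights):
    """Weighted mean that skips NaN cells and renormalizes the remaining weights."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    valid = ~np.isnan(values)
    if not valid.any():
        return np.nan  # no usable cells for this polygon
    return float(np.sum(values[valid] * weights[valid]) / np.sum(weights[valid]))


# One HRU intersecting three cells; the second cell has no data
cell_values = [10.0, np.nan, 20.0]
cell_weights = [0.5, 0.25, 0.25]

# A plain weighted sum propagates the NaN to the HRU value
plain = float(np.sum(np.asarray(cell_values) * np.asarray(cell_weights)))

# The masked version uses only the two cells with data
masked = masked_weighted_mean(cell_values, cell_weights)
print(plain, masked)
```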
In [None]:
new_climate_ds = xr.open_dataset("domain_data/willamette_river/gdptools/cbh_2024_temp.nc")
new_climate_ds

In [None]:
ds_new = new_climate_ds.rename(
    {
        "daily_minimum_temperature": "tmin",
        "daily_maximum_temperature": "tmax",
        "precipitation_amount": "precip",
        "model_hru_idx": "hruid",
    }
)
ds_new

In [None]:
import pint_xarray

# 1. Quantify the dataset: attaches Pint units to each variable
quantified = ds_new.pint.quantify()

# 2. Convert units; assign() returns a new Dataset with the converted variables,
#    so the variables do not have to be broken out and reassembled
quantified_converted = quantified.assign(
    tmin=quantified["tmin"].pint.to("degF"),
    tmax=quantified["tmax"].pint.to("degF"),
    precip=quantified["precip"].pint.to("inch")  # "inch" or "inches" depending on your registry
)

# 3. Dequantify: converts Pint Quantities back to vanilla xarray, puts units in .attrs
dequantified = quantified_converted.pint.dequantify()

# 4. Save to NetCDF
dequantified.to_netcdf("./domain_data/willamette_river/gdptools/cbh_2024.nc")

# 5. (Optional) Inspect units to verify
for v in dequantified.data_vars:
    print(f"{v}: {dequantified[v].attrs.get('units', 'No units attribute')}")

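As an independent check on the converted values, the same conversions can be written out by hand. This sketch assumes the source temperatures are in kelvin and the source precipitation is in millimeters; the helper names are hypothetical.

```python
import numpy as np


def kelvin_to_fahrenheit(t_k):
    """Convert kelvin to degrees Fahrenheit."""
    return (np.asarray(t_k, dtype=float) - 273.15) * 9.0 / 5.0 + 32.0


def mm_to_inch(p_mm):
    """Convert millimeters to inches (1 inch = 25.4 mm)."""
    return np.asarray(p_mm, dtype=float) / 25.4


# Spot checks with well-known reference points
freezing_f = kelvin_to_fahrenheit(273.15)  # freezing point of water
one_inch = mm_to_inch(25.4)
print(freezing_f, one_inch)
```

Applying these functions to a few values from `ds_new` and comparing them with the corresponding values in `dequantified` confirms that the units were interpreted as expected.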
In [None]:
dequantified