# 02 - Process models in pynhm

*James McCreight, September 2022*

---

The representation of "individual" physical processes in the model is fundamental. In pynhm we aim to make individual processes distinct units of code which can be run individually, assuming their inputs can be found. In our re-expression of PRMS, there are 8 "individual" processes. (I keep putting "individual" in quotes because there is certainly debate over what that means and whether processes are always aggregations of other processes. Effectively, the individual processes are conceptualizations that are useful for certain applications.) These are, with the associated pynhm class names in parentheses:

* solar geometry (PRMSSolarGeometry)
* atmosphere (PRMSAtmosphere)
* canopy (PRMSCanopy)
* snow (PRMSSnow)
* runoff (PRMSRunoff)
* soil (PRMSSoilzone)
* groundwater (PRMSGroundwater)
* stream flow (PRMSChannel)

Given these process conceptualizations, we want to be able to run them in isolation. This provides a concise way to test hypotheses, particularly about individual processes.

## Requirements: pyws_nb virtual env
The pynhm virtual environment was installed in notebook 00. You need this environment to proceed. __This notebook is to be run with a Python kernel using the conda env: pyws_nb.__

## PRMSChannel: Single process model example
Let's say we are interested in just running the stream flow process in isolation. In pynhm we would do this:

In [None]:
%load_ext jupyter_black

In [None]:
import pathlib as pl
import pywatershed
import tempfile

pynhm_root = pywatershed.constants.__pywatershed_root__
domain = "drb_2yr"
domain_dir = pynhm_root.parent / f"test_data/{domain}"
input_dir = domain_dir / "output"
run_dir = pl.Path(tempfile.mkdtemp())

control = pywatershed.Control.load(domain_dir / "control.test")
control.config["input_dir"] = input_dir
params = pywatershed.parameters.PrmsParameters.load(domain_dir / "myparam.param")

streamflow_model = pywatershed.Model(
    [pywatershed.PRMSChannel], control=control, parameters=params
)

We have not yet run the model, but a lot has already happened. Here's a "play-by-play".
We imported the package (as `pywatershed`) and asked it where it is located (`pynhm_root`). Relative to this location we know where our test data are, and we use that to assign the `domain_dir` variable, the directory holding the control and parameter files for our model domain. We chose the "drb_2yr" domain for this model. For our `input_dir` we selected the `output` subdirectory of the domain directory. This choice says we want to look for our inputs among the PRMS outputs; that is, we want to drive our stream channel model with fluxes calculated by PRMS. We can illustrate that with the following code.

In [None]:
pywatershed.analysis.ModelGraph(streamflow_model, hide_variables=False).SVG(
    verbose=True
)

The plot shows the inputs and (public) variables of the PRMSChannel process (the only process in our `streamflow_model`). The graph shows its inputs coming from file. Its inputs are mass flux volumes from groundwater (gwres_flow_vol), surface runoff (sroff_vol), and the subsurface reservoir (ssres_flow_vol). The files used for these inputs hold the fluxes calculated by PRMS (in this case converted to volumes instead of the inches output natively by PRMS). We are using the PRMS-calculated values out of convenience, but also because if our `pywatershed.PRMSChannel` process representation matches that of PRMS, our outputs will be identical. We will verify that momentarily. Also of note in the plot are the mass balance/budget terms outlined in blue. The stream channel has a "global" budget, which is calculated over the entire spatial domain, whereas HRU-based processes track a budget on each HRU.
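
Because the inputs are read from file, it can be useful to confirm that the files exist before building the model. The sketch below assumes each input is stored as `<name>.nc` in the input directory, which is the naming the comparison code later in this notebook relies on. `find_missing_inputs` is a hypothetical helper written for illustration, not part of the package, and it is demonstrated on a throwaway directory.

```python
import pathlib as pl
import tempfile


def find_missing_inputs(input_dir, input_names):
    """Return the input names that have no `<name>.nc` file in input_dir."""
    input_dir = pl.Path(input_dir)
    return [
        name for name in input_names if not (input_dir / f"{name}.nc").exists()
    ]


# The three PRMSChannel inputs named in the text.
channel_inputs = ["gwres_flow_vol", "sroff_vol", "ssres_flow_vol"]

# Demonstrate on a temporary directory holding only two of the three files.
demo_dir = pl.Path(tempfile.mkdtemp())
for name in channel_inputs[:2]:
    (demo_dir / f"{name}.nc").touch()

missing = find_missing_inputs(demo_dir, channel_inputs)
print(missing)  # ['ssres_flow_vol']
```

In this notebook the equivalent call would be `find_missing_inputs(input_dir, channel_inputs)`.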

Returning to the previous code block, a temporary directory is created for the output of this model. Next, the control and parameter files are read in from the domain directory, and the control object is told which directory to search for input data. Finally, the `streamflow_model` is instantiated with the single process, the control object, and the parameters.

To run this model with netcdf output, we do the following:

In [None]:
%%time
streamflow_model.run(netcdf_dir=run_dir, finalize=True)

In [None]:
import process_plot

gis_dir = pynhm_root.parent / f"examples/pynhm_gis/{domain}"
proc_plot = process_plot.ProcessPlot(gis_dir)

proc_name = "PRMSChannel"
var_name = "seg_outflow"
proc = streamflow_model.processes[proc_name]
display(proc_plot.plot_seg_var(var_name, proc))

Now we can easily check the results of our pynhm model against PRMS. The model only maintains the current state, not the full timeseries, as the shape of `seg_outflow` shows:

In [None]:
streamflow_model.processes[
    "PRMSChannel"
].seg_outflow.shape  # tolower on the names as they are instances?

So we'll evaluate the full timeseries of output retrieved from disk.

In [None]:
import numpy as np
import xarray as xr


def compare_results(var, verbose=False):
    ans_file = input_dir / f"{var}.nc"
    if not ans_file.exists():
        print(f"PRMS does not output {var}")
        return
    ans_prms = xr.open_dataset(ans_file)
    result_pynhm = xr.open_dataset(run_dir / f"{var}.nc")
    outflow_errs = result_pynhm - ans_prms
    outflow_rel_errs = outflow_errs / ans_prms
    tol = np.finfo(np.float32).resolution
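    # A bare Dataset is truthy whenever it has data variables, so reduce the
    # comparison to plain booleans before asserting.
    within_tol = (abs(outflow_errs).max() < tol) | (abs(outflow_rel_errs).max() < tol)
    assert all(bool(ok) for ok in within_tol.data_vars.values())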

    if verbose:
        display(ans_prms)
        display(result_pynhm)
        display(abs(outflow_errs).max().values())
        display(abs(outflow_rel_errs).max().values())

    return


for vv in pywatershed.PRMSChannel.get_variables():
    print(f"comparing variable: {vv}")
    verbose = False
    if vv == "seg_outflow":
        verbose = True
    compare_results(vv, verbose=verbose)
    print("")

The above code is similar to the test of the pynhm `PRMSChannel` against PRMS 5.2.1 output. We see that the absolute differences have a maximum of 0.02661, while the absolute differences normalized by the PRMS magnitudes are less than about 7.1e-7. That is, the errors are within single-precision floating-point accuracy. We call this result good enough because PRMS uses a mixture of single- and double-precision types.
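
To see why an absolute difference of a few hundredths can still count as agreement, here is a minimal, dependency-free sketch of the "absolute OR relative error" test. The tolerance `1e-6` is the value of `np.finfo(np.float32).resolution` used above; the helper name `within_single_precision` and the sample numbers are hypothetical and are not taken from the model run.

```python
# Value of np.finfo(np.float32).resolution, written out to avoid a dependency.
TOL = 1e-6


def within_single_precision(result, answer, tol=TOL):
    """Return True if the absolute OR the relative error is below tol."""
    abs_err = abs(result - answer)
    if abs_err < tol:
        return True
    # The relative test is undefined for a zero reference value.
    if answer == 0:
        return False
    return abs_err / abs(answer) < tol


# Hypothetical (result, answer) pairs.
pairs = [
    (40000.02661, 40000.0),  # large flow: absolute test fails, relative passes
    (1.0e-8, 0.0),  # zero reference: absolute test passes
    (1.5, 1.0),  # both tests fail
]
checks = [within_single_precision(r, a) for r, a in pairs]
print(checks)  # [True, True, False]
```

The first pair mirrors the situation described above: the absolute error is large only because the flow itself is large.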

Note that the static method `get_variables()` is defined on the `PRMSChannel` class but can also be called on the instance in the model (`streamflow_model.processes['PRMSChannel']`; TODO: this key should probably not be capitalized). This is one way in which process models are self-describing. Also apparent in the netcdf output is metadata describing the variable (particularly if you click the small paper-like icon to the right of `seg_outflow`). If you want a full description of the public inputs, variables, and parameters for a given process model, you can do the following:

In [None]:
from pprint import pprint, pformat

# To save space here we print only the first 50 lines; feel free to run the
# full version yourself:
# pprint(pywatershed.PRMSChannel.description(), sort_dicts=False)
whole_repr = pformat(pywatershed.PRMSChannel.description(), sort_dicts=False)
for line in whole_repr.splitlines()[:50]:
    print(line)

We will discuss how process models are self-describing and the metadata in later notebooks.

## Multi-process model

Usually, a single process is just not enough! Let's make a model with more processes. For example, suppose we are interested in soil and groundwater processes and how they contribute to changes in streamflow. We simply add the process classes we want to include as the first argument to `pywatershed.Model()`.

In [None]:
control = pywatershed.Control.load(domain_dir / "control.test")
control.config["input_dir"] = input_dir

multi_proc_model = pywatershed.Model(
    [
        pywatershed.PRMSSoilzone,
        pywatershed.PRMSGroundwater,
        pywatershed.PRMSChannel,
    ],
    control=control,
    parameters=params,
)
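
With several processes in one model, an input may be supplied by another process in the model instead of being read from file. The toy sketch below illustrates that idea only; it is not how the package resolves inputs internally. The variable lists are simplified assumptions: in particular, it assumes the subsurface reservoir flux `ssres_flow_vol` is produced by `PRMSSoilzone` and `gwres_flow_vol` by `PRMSGroundwater`, and `input_sources` is a hypothetical helper.

```python
# Simplified, assumed outputs: only the fluxes relevant to PRMSChannel are shown.
outputs = {
    "PRMSSoilzone": {"ssres_flow_vol"},
    "PRMSGroundwater": {"gwres_flow_vol"},
    "PRMSChannel": {"seg_outflow"},
}
# The PRMSChannel inputs named earlier in this notebook.
channel_inputs = {"gwres_flow_vol", "sroff_vol", "ssres_flow_vol"}


def input_sources(wanted, outputs, consumer):
    """Map each input to the process that produces it, or to "file" if none does."""
    sources = {}
    for name in sorted(wanted):
        producers = [
            proc
            for proc, variables in outputs.items()
            if name in variables and proc != consumer
        ]
        sources[name] = producers[0] if producers else "file"
    return sources


sources = input_sources(channel_inputs, outputs, "PRMSChannel")
for name, source in sources.items():
    print(f"{name} <- {source}")
```

Under these assumptions, `sroff_vol` still comes from file because no runoff process is included in this three-process model.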

As before, we can plot the model graph. This time we adopt the standard color palette for the NHM processes that's already defined in the package.

In [None]:
palette = pywatershed.analysis.utils.colorbrewer.nhm_process_colors(multi_proc_model)
pywatershed.analysis.utils.colorbrewer.jupyter_palette(palette)
try:
    pywatershed.analysis.ModelGraph(
        multi_proc_model, process_colors=palette, hide_variables=False
    ).SVG(verbose=True)
except Exception:
    print("Sorry, this fails for no good reason on Mac ARM architecture")

We run the model just as before:

In [None]:
%%time
run_dir_multi_proc = pl.Path(tempfile.mkdtemp())
multi_proc_model.run(netcdf_dir=run_dir_multi_proc, finalize=True)

Let's plot some variables, say soil recharge storage, at the final time of the simulation. 

In [None]:
import process_plot

gis_dir = pynhm_root.parent / f"examples/pynhm_gis/{domain}"
proc_plot = process_plot.ProcessPlot(gis_dir)

proc_name = "PRMSSoilzone"
var_name = "soil_rechr"
proc = multi_proc_model.processes[proc_name]
display(proc_plot.plot_hru_var(var_name, proc))

We can also inspect the mass budget at the final time. Processes on HRUs check budgets on individual HRUs (or spatial units), whereas `PRMSChannel` tracks a global budget, checking whether the total water entering and leaving the channel model is consistent with its storage changes.

In [None]:
pprint(multi_proc_model.processes["PRMSSoilzone"].get_mass_budget_terms())
multi_proc_model.processes["PRMSSoilzone"].budget

In [None]:
pprint(multi_proc_model.processes["PRMSChannel"].get_mass_budget_terms())
multi_proc_model.processes["PRMSChannel"].budget
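
The difference between a per-unit budget and a global budget can be shown with a few numbers. This is a minimal sketch with hypothetical volumes, not the package's budget implementation: it treats a budget as closed when inputs minus outputs minus storage change is zero, and `budget_imbalance` is a name introduced here for illustration.

```python
def budget_imbalance(inputs, outputs, storage_change):
    """Return inputs - outputs - storage_change; zero means the budget closes."""
    return inputs - outputs - storage_change


# Hypothetical volumes for three spatial units over one time step.
inflow = [10.0, 4.0, 6.0]
outflow = [7.0, 5.0, 2.0]
storage_change = [2.0, 0.0, 4.0]

# Per-unit budget: every unit must balance on its own.
per_unit = [
    budget_imbalance(i, o, ds)
    for i, o, ds in zip(inflow, outflow, storage_change)
]

# Global budget: only the domain totals must balance.
global_imbalance = budget_imbalance(
    sum(inflow), sum(outflow), sum(storage_change)
)

print(per_unit)  # [1.0, -1.0, 0.0]
print(global_imbalance)  # 0.0
```

Here the global budget closes even though two units do not balance individually, which is why a per-unit check is the stricter of the two.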

## NHM Model

In [None]:
control = pywatershed.Control.load(domain_dir / "control.test")
control.config["input_dir"] = domain_dir  # the PRMS/NHM inputs are all in this level

nhm = pywatershed.Model(
    [
        pywatershed.PRMSSolarGeometry,
        pywatershed.PRMSAtmosphere,
        pywatershed.PRMSCanopy,
        pywatershed.PRMSSnow,
        pywatershed.PRMSRunoff,
        pywatershed.PRMSGroundwater,
        pywatershed.PRMSSoilzone,
        pywatershed.PRMSChannel,
    ],
    control=control,
    parameters=params,
)

In [None]:
palette = pywatershed.analysis.utils.colorbrewer.nhm_process_colors(nhm)
pywatershed.analysis.utils.colorbrewer.jupyter_palette(palette)
try:
    pywatershed.analysis.ModelGraph(
        nhm, process_colors=palette, hide_variables=False
    ).SVG(verbose=True)
except Exception:
    print("Sorry, this fails for no good reason on Mac ARM architecture")