# Multi-process models in pywatershed: PRMS-legacy instantiation

Because `pywatershed` has its roots in the Precipitation-Runoff Modeling System (PRMS, Regan et al., 2015), it supports PRMS-model instantiation from legacy PRMS input files. The traditional PRMS input files are the control file, the parameter file, and the climate-by-hru (CBH) files (typically daily precipitation and maximum and minimum temperatures). Native PRMS control and parameter files are supported directly, while the CBH files must first be pre-processed to NetCDF format.

Below we'll show how to preprocess the CBH files to NetCDF and how to instantiate a `Model` using PRMS-native files. In this notebook we'll reproduce the basic results from the previous notebook (01) for the NHM full model and its submodel. As in the previous notebooks, we'll run PRMS process models on the Delaware River Basin (DRB) subdomain of the NHM for a 2-year period using `pywatershed`.

## Prerequisites

In [None]:
# auto-format the code in this notebook
%load_ext jupyter_black

In [None]:
import pathlib as pl
from platform import processor
from pprint import pprint
from shutil import rmtree
from sys import platform

import pydoc

import hvplot.pandas  # noqa
import numpy as np
import pywatershed as pws
import xarray as xr

from helpers import gis_files

gis_files.download()  # this downloads GIS files

The domain directory is where we have all the required inputs to run this model (among others) and `nb_output_dir` is where this notebook will write its output. 

In [None]:
domain_dir = pws.constants.__pywatershed_root__ / "data/drb_2yr"
nb_output_dir = pl.Path("./02_prms_legacy_models")

Since the CBH files must be preprocessed to NetCDF, we start with that conversion. Each CBH file is written to its own NetCDF file, named after its variable.

In [None]:
cbh_nc_dir = nb_output_dir / "drb_2yr_cbh_files"
if cbh_nc_dir.exists():
    rmtree(cbh_nc_dir)
cbh_nc_dir.mkdir(parents=True)

cbh_files = {
    "prcp": domain_dir / "prcp.cbh",
    "tmax": domain_dir / "tmax.cbh",
    "tmin": domain_dir / "tmin.cbh",
}

params = pws.parameters.PrmsParameters.load(domain_dir / "myparam.param")

for kk, vv in cbh_files.items():
    out_file = cbh_nc_dir / f"{kk}.nc"
    pws.utils.cbh_files_to_netcdf({kk: vv}, params, out_file)
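
Before pointing a model at `cbh_nc_dir`, it can be worth confirming that the loop above left one NetCDF file per forcing variable. The following is a minimal sketch using only the standard library. `missing_netcdf_outputs` is a hypothetical helper, not part of `pywatershed`, and the example runs against a temporary directory that stands in for `cbh_nc_dir`.

```python
import pathlib as pl
import tempfile


def missing_netcdf_outputs(out_dir, var_names):
    """Return the variable names that have no `<name>.nc` file in out_dir.

    Hypothetical helper for illustration; not part of pywatershed.
    """
    out_dir = pl.Path(out_dir)
    return [name for name in var_names if not (out_dir / f"{name}.nc").exists()]


# Demonstrate on a temporary directory that stands in for cbh_nc_dir.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = pl.Path(tmp)
    for name in ("prcp", "tmax"):
        (demo_dir / f"{name}.nc").touch()  # empty placeholder files
    missing = missing_netcdf_outputs(demo_dir, ["prcp", "tmax", "tmin"])

print(missing)
```

In this notebook the equivalent call would be `missing_netcdf_outputs(cbh_nc_dir, cbh_files)`, which should return an empty list if all three conversions succeeded.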

## An NHM multi-process model: PRMS-legacy instantiation

The 8 conceptual `Process` classes that comprise the NHM are, in order:

In [None]:
nhm_processes = [
    pws.PRMSSolarGeometry,
    pws.PRMSAtmosphere,
    pws.PRMSCanopy,
    pws.PRMSSnow,
    pws.PRMSRunoff,
    pws.PRMSSoilzone,
    pws.PRMSGroundwater,
    pws.PRMSChannel,
]

A multi-process model, comprised of the above `Process`es (see notebook 00), is assembled by the `Model` class. We can take a quick look at the first 22 lines of help on `Model`:

In [None]:
# this is equivalent to help() but we get the multiline string and just look at part of it
model_help = pydoc.render_doc(pws.Model, "Help on %s")
# the first 22 lines of help(pws.Model)
print("\n".join(model_help.splitlines()[0:22]))
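
The same slicing pattern works for any object. As a small sketch on a standard-library class (chosen here only so the example runs without `pywatershed`), `pydoc.render_doc` also accepts a `renderer` argument. Passing `pydoc.plaintext` avoids the backspace overstrikes that the default text renderer uses for bold, which some front ends display literally.

```python
import pydoc
from collections import OrderedDict

# renderer=pydoc.plaintext returns the help text without the overstrike
# characters that the default text renderer uses for bold.
doc_text = pydoc.render_doc(OrderedDict, "Help on %s", renderer=pydoc.plaintext)
doc_lines = doc_text.splitlines()

# Print only the first few lines, as done for pws.Model above.
print("\n".join(doc_lines[:5]))
```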

The `help()` mentions that there are 2 distinct ways of instantiating a `Model` class. In this notebook, we focus on the PRMS-legacy instantiation (see previous notebook for the pywatershed-centric way).

With the PRMS-legacy approach, the first argument is a "process list", which is what we defined with the 8 NHM classes above. In addition, we'll supply the `control` and `parameters` arguments. The full `help()` describes PRMS-legacy instantiation and provides examples. Please use it for reference and more details. Here we'll give an extended concrete example.

We already loaded a `PrmsParameters` object from a PRMS-native file when we converted the CBH files. We'll just check that it is an instance/subclass of the `Parameters` class. Then we'll return the object (which invokes the `__repr__` method on the object, just giving information about it) to see what we got.

In [None]:
print(isinstance(params, pws.Parameters))
params

We got a `PrmsParameters` object, which is a subclass of `Parameters`. We now load the control file into a `Control` object.

In [None]:
control = pws.Control.load(domain_dir / "control.test")
control

Now we'll edit this control object. First we'll reduce the total simulation time to six months for the purposes of this demonstration (but feel free to increase this to the full 2 years available, if you like). Next we'll specify several global options, including the location of the atmospheric forcing/input data, the budget type, and the calculation method.

In [None]:
control.edit_end_time(np.datetime64("1979-07-01T00:00:00"))
run_dir = nb_output_dir / "nhm"
control.options = control.options | {
    "input_dir": cbh_nc_dir,
    "budget_type": "warn",
    "calc_method": "numba",
    "netcdf_output_dir": run_dir,
}
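
The `|` operator used above is the standard dictionary merge (Python 3.9+): it builds a new dictionary in which keys from the right-hand operand override those on the left, and it leaves both operands unchanged. A minimal sketch with a plain `dict` standing in for `control.options` follows. The keys and values here are illustrative only and are not a statement of what the control file contains.

```python
# Stand-in for control.options; keys and values are illustrative only.
base_options = {"budget_type": "error", "verbosity": 0}
overrides = {"budget_type": "warn", "calc_method": "numba"}

# Right-hand values win on shared keys; a new dict is returned.
merged = base_options | overrides

print(merged)
print(base_options)  # the left operand is not modified
```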

Now we can initialize the NHM model.

In [None]:
nhm = pws.Model(
    nhm_processes,
    control=control,
    parameters=params,
)

Numba is a Python package for just-in-time (jit) compiling. It takes code written using the `numpy` package and accelerates it by compiling at run time. The processes listed above compiled their code on initialization of the model. The processes that benefit from jit compiling are those that are challenging to vectorize and that contain a loop over space, which the compiling accelerates. The remaining processes, which are not jit compiled, are fully vectorized (in space and time): `PRMSSolarGeometry` and `PRMSAtmosphere`. In fact, these two compute for all time at the beginning of the simulation, which is why there is a pause before the time loop starts in the next cell.
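
To see why some loops over space resist vectorization, consider a calculation where each element depends on results already computed for other elements. The sketch below is a toy example in plain Python and is not code from `pywatershed`; `route_flow` and its arguments are hypothetical. Explicit loops of this shape are the kind that a jit compiler can speed up.

```python
def route_flow(lateral_inflow, downstream_index):
    """Accumulate flow down a network of segments.

    Toy illustration only, not pywatershed's routing scheme.
    Segments are assumed to be ordered upstream to downstream, and
    downstream_index[i] is the segment that i drains into (-1 for an outlet).
    """
    outflow = list(lateral_inflow)
    for seg, down in enumerate(downstream_index):
        # By the time we reach `seg`, all of its upstream segments have
        # already added their outflow to outflow[seg].
        if down >= 0:
            outflow[down] += outflow[seg]
    return outflow


# Segments 0 and 1 join at segment 2, which drains to the outlet segment 3.
seg_flow = route_flow([1.0, 2.0, 0.5, 0.25], [2, 2, 3, -1])
print(seg_flow)
```

Because each segment's value depends on values computed earlier in the same pass, the loop cannot be replaced by a single elementwise array expression.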

Now we can run the model, requesting it finalize at the end, and we'll time the result.

In [None]:
%%time
nhm.run(finalize=True)

Now that we've run our NHM model on the DRB, let's take a look at the simulated streamflow. The streamflow values from the final timestep are still in memory, so we'll plot those on the stream network overlaid on the watershed.

In [None]:
proc_plot = pws.analysis.ProcessPlot(gis_files.gis_dir / "drb_2yr")
proc_name = "PRMSChannel"
var_name = "seg_outflow"
proc = nhm.processes[proc_name]
display(proc_plot.plot(var_name, proc))

Above we see the spatial domain, the outline of its extent, the network on which streamflow is calculated, and the final simulated values of streamflow. 

Let us turn to the model structure in more detail: how do atmospheric inputs/forcings result in the simulated streamflow above? We will produce the model graph, which shows the flow of information from the input files through all the process representations, all the way down to the channel streamflow.

First, we print a color legend for each represented process in the NHM. Each process is outlined by a box of this color, and values/fluxes flowing from a process have the color of the originating process. Each process's data is placed into one of three categories: inputs (blue), parameters (orange), and variables (green). Finally, a variable outlined (above and on the sides) participates in the mass budget of its process. This diagram gives very specific information about the model conceptualization, how the processes relate to each other, and the complexity of the individual processes.

All of this information is public for each process (indeed in static methods), so we can produce these plots programmatically without needing to run the `Model`. As discussed in previous notebooks, the `Model` object contains all the information needed to generate the plot when it is initialized.

(Note that the underlying graphviz/dot program that generates the plot is not fully working on Mac ARM/M1, so plots here and below are less detailed if you are using such a machine. The notebooks in the repo will be complete for your reference.)

In [None]:
palette = pws.analysis.utils.colorbrewer.nhm_process_colors(nhm)
pws.analysis.utils.colorbrewer.jupyter_palette(palette)
show_params = not (platform == "darwin" and processor() == "arm")
try:
    pws.analysis.ModelGraph(
        nhm,
        hide_variables=False,
        process_colors=palette,
        show_params=show_params,
    ).SVG(verbose=True, dpi=48)
except Exception:  # avoid a bare except, which would also catch KeyboardInterrupt
    print("In some cases, dot fails on Mac ARM machines.")

## NHM Submodel for the Delaware River Basin

Now suppose you wanted to change parameters or model process representation in the PRMSSoilzone to better predict streamflow. As the model is 1-way coupled, you can simply run a submodel starting with PRMSSoilzone and running through PRMSChannel. We simply change our process list to get this "submodel" of the full NHM model above.

In [None]:
submodel_processes = [pws.PRMSSoilzone, pws.PRMSGroundwater, pws.PRMSChannel]

We can reuse the existing parameter object (since it is write protected). However, we have to reload the control because it tracked time through the previous simulation. We point the model to look for its input in the output of the full model above. We'll write the submodel output to its own directory and limit it to the `PRMSChannel` variables.

In [None]:
control = pws.Control.load(domain_dir / "control.test")
control.edit_end_time(np.datetime64("1979-07-01T00:00:00"))
control.options = control.options | {
    "input_dir": run_dir,
    "budget_type": "warn",
    "calc_method": "numba",
    "netcdf_output_dir": nb_output_dir / "nhm_submodel",
}

control.options["netcdf_output_var_names"] = pws.PRMSChannel.get_variables()
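
The reason the submodel needs `input_dir` to point at the full model's output can be sketched with a toy dependency check: any input that no process in the list produces has to be read from file. The process and variable names below are illustrative only and are not `pywatershed`'s actual metadata; `inputs_needed_from_files` is a hypothetical helper.

```python
# Toy description of a one-way coupled chain of processes.
# Names are illustrative only, not pywatershed's actual metadata.
toy_metadata = {
    "soilzone": {"inputs": {"infil", "potet"}, "outputs": {"ssres_flow", "soil_to_gw"}},
    "groundwater": {"inputs": {"soil_to_gw"}, "outputs": {"gwres_flow"}},
    "channel": {"inputs": {"sroff", "ssres_flow", "gwres_flow"}, "outputs": {"seg_outflow"}},
}


def inputs_needed_from_files(process_order, metadata):
    """Return inputs that no earlier process in process_order produces."""
    produced = set()
    from_files = set()
    for name in process_order:
        # Anything not yet produced upstream must come from a file.
        from_files |= metadata[name]["inputs"] - produced
        produced |= metadata[name]["outputs"]
    return sorted(from_files)


needed = inputs_needed_from_files(["soilzone", "groundwater", "channel"], toy_metadata)
print(needed)
```

In the toy chain, the inputs that would have come from upstream processes in a full model are exactly the ones the submodel must find as files.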

We'll instantiate the model and display its `ModelGraph`.

In [None]:
submodel = pws.Model(
    submodel_processes,
    control=control,
    parameters=params,
)

pws.analysis.ModelGraph(
    submodel,
    hide_variables=not show_params,
    show_params=show_params,
    process_colors=palette,
).SVG(verbose=True, dpi=48)

Now we'll run the submodel.

In [None]:
%%time
submodel.run(finalize=True)