# Multi-process models in pywatershed: PRMS-legacy instantiation

Because pywatershed has its roots in the Precipitation-Runoff Modeling System (PRMS, Regan et al., 2015), pywatershed supports PRMS-model instantiation from legacy PRMS input files. The traditional PRMS input files are the control file, the parameter file, and the climate-by-hru (CBH) files (typically daily precipitation and maximum and minimum temperatures). While the CBH files need to be pre-processed to NetCDF format, native PRMS control and parameter files are supported directly.

Below we'll show how to preprocess the CBH files to NetCDF and how to instantiate a pywatershed `Model` using PRMS-native files. In this notebook we'll reproduce the basic results from the previous notebook (`01_multi-process_models.ipynb`) for the NHM full model and its submodel. As in the previous notebooks, this example runs PRMS process models on the Delaware River Basin (DRB) subdomain of the NHM for a 2-year period using `pywatershed`.

## Prerequisites

In [None]:
import pathlib as pl
from platform import processor
from pprint import pprint
from shutil import rmtree
from sys import platform
import warnings

import pydoc

import hvplot.pandas  # noqa
import jupyter_black
import numpy as np
import pywatershed as pws
from pywatershed.utils import gis_files
import xarray as xr

gis_files.download()  # this downloads GIS files

jupyter_black.load()  # auto-format the code in this notebook

The domain directory is where we have all the required inputs to run this model (among others) and `nb_output_dir` is where this notebook will write its output. 

In [None]:
domain_dir = pws.constants.__pywatershed_root__ / "data/drb_2yr"
nb_output_dir = pl.Path("./02_prms_legacy_models")

## Preprocess CBH files to NetCDF format
We need to preprocess the CBH files to NetCDF format. We'll create a directory to hold the converted files and then process the precipitation (prcp), maximum temperature (tmax), and minimum temperature (tmin) files from CBH to NetCDF using a utility from pywatershed.

In [None]:
cbh_nc_dir = nb_output_dir / "drb_2yr_cbh_files"
if cbh_nc_dir.exists():
    rmtree(cbh_nc_dir)
cbh_nc_dir.mkdir(parents=True)

cbh_files = [
    domain_dir / "prcp.cbh",
    domain_dir / "tmax.cbh",
    domain_dir / "tmin.cbh",
]

params = pws.parameters.PrmsParameters.load(domain_dir / "myparam.param")

for cbh_file in cbh_files:
    out_file = cbh_nc_dir / cbh_file.with_suffix(".nc").name
    pws.utils.cbh_file_to_netcdf(cbh_file, params, out_file)

## An NHM multi-process model: PRMS-legacy instantiation

The 8 conceptual `Process` classes that comprise the NHM in pywatershed are, in order:

In [None]:
nhm_processes = [
    pws.PRMSSolarGeometry,
    pws.PRMSAtmosphere,
    pws.PRMSCanopy,
    pws.PRMSSnow,
    pws.PRMSRunoff,
    pws.PRMSSoilzone,
    pws.PRMSGroundwater,
    pws.PRMSChannel,
]

submodel_processes = [pws.PRMSSoilzone, pws.PRMSGroundwater, pws.PRMSChannel]

A multi-process model, comprised of the above `Process`es (see notebook `00_processes.ipynb`), is assembled by the `Model` class. We can take a quick look at the first 22 lines of help on `Model`:

In [None]:
# this is equivalent to help() but we get the multiline string and just look at part of it
model_help = pydoc.render_doc(pws.Model, "Help on %s")
# the first 22 lines of help(pws.Model)
print("\n".join(model_help.splitlines()[0:22]))
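The same pattern works for any object, so it can be wrapped in a small helper. The sketch below is an illustration, not part of pywatershed: `help_head` is a hypothetical name, and it is demonstrated on a standard-library class so it runs without pywatershed installed. It also passes `renderer=pydoc.plaintext`, which avoids the overstrike characters the default text renderer uses for bold.

```python
import pydoc
from collections import OrderedDict


def help_head(obj, n_lines=22):
    """Return the first n_lines of the help text for obj as one string."""
    text = pydoc.render_doc(obj, "Help on %s", renderer=pydoc.plaintext)
    return "\n".join(text.splitlines()[:n_lines])


# Demonstrated on a standard-library class; in the notebook, pws.Model
# would take its place.
head = help_head(OrderedDict, n_lines=5)
print(head)
```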

The `help()` mentions that there are 2 distinct ways of instantiating a `Model` class. In this notebook, we focus on the PRMS-legacy instantiation (see previous notebook for the pywatershed-centric way).

With the PRMS-legacy approach, the first argument is a "process list", which is what we defined with the 8 NHM classes above. In addition, we must also supply the `control` and `parameters` arguments. The full `help()` describes PRMS-legacy instantiation and provides examples. Please use it for reference and more details. Here we'll give an extended concrete example.

We already loaded a `PrmsParameters` object from a PRMS-native file when we converted the CBH files. We'll just check that it is an instance/subclass of the `Parameters` class. Then we'll return the object (which invokes the `__repr__` method on the object, just giving information about it) to see what we got.

In [None]:
print(isinstance(params, pws.Parameters))
params

We now load the PRMS-native control file into a `Control` object.

In [None]:
control = pws.Control.load_prms(
    domain_dir / "nhm.control", warn_unused_options=False
)

control

When loading this PRMS-native control file, we suppress the warnings that indicate which PRMS options are not being used by pywatershed. For a complete discussion of these, see the help on `Control.load_prms()` in the documentation. The documentation also covers what we see in the above output: the `netcdf_output_var_names` in `control.options` is the combination of `nhruOutVar_names` and `nsegmentOutVar_names` from the PRMS-native `nhm.control` file. For this reason, the requested output in this example differs from the previous notebook, where all variables were output. We'll keep all these variables for this model run and reduce the requested output later on in this notebook. However, we'll need to add two pywatershed variables to this list to be able to run the sub-model below.

Now we'll edit this control instance. First, we'll add the additional two output variables necessary to provide boundary conditions to the sub-model below. Next, we'll reduce the total simulation time to six months for the purposes of this demonstration (but feel free to increase this to the full 2 years available, if you like). Then, we'll specify several global options, including the location of the atmospheric forcing/input data, the budget type, and the calculation method.

In [None]:
control.options["netcdf_output_var_names"] += ["infil_hru", "sroff_vol"]
control.edit_end_time(np.datetime64("1979-07-01T00:00:00"))
run_dir = nb_output_dir / "nhm"
if run_dir.exists():
    rmtree(run_dir)
control.options = control.options | {
    "input_dir": cbh_nc_dir,
    "budget_type": "warn",
    "calc_method": "numba",
    "netcdf_output_dir": run_dir,
}
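The `|` operator used above builds a new mapping in which keys from the right-hand side win. A minimal sketch of that behavior with plain dictionaries follows; it assumes Python 3.9 or later, the option values are placeholders, and a plain `dict` stands in for `control.options`, whose exact type is not shown here.

```python
# Plain dicts standing in for control.options; values are placeholders.
base_options = {"budget_type": "error", "calc_method": "numpy", "verbosity": 0}
overrides = {"budget_type": "warn", "calc_method": "numba"}

# Keys present on both sides take the right-hand value; other keys are kept.
merged = base_options | overrides
print(merged)

# The left operand is not modified, which is why the notebook assigns the
# result back to control.options.
print(base_options["budget_type"])
```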

Now we can initialize the NHM model.

In [None]:
nhm = pws.Model(
    nhm_processes,
    control=control,
    parameters=params,
)

Numba is a Python package for just-in-time compiling. It takes code written using the `numpy` package and accelerates it by compiling at run time. The processes listed above compiled their code on initialization of the model. The processes that benefit from jit compiling are challenging to vectorize and have a loop over space which is accelerated by the compiling. The remaining processes, not jit compiled, are all vectorized (in space and time): PRMSSolarGeometry, PRMSAtmosphere. In fact, these two compute for all time at the beginning of the simulation (which is why there is a pause before the time loop starts in the next cell.)

Now we can run the model, requesting it finalize at the end, and we'll time the result.

In [None]:
%%time
nhm.run(finalize=True)

Now that we've run our NHM model on the DRB, let's take a look at the simulated streamflow. The streamflow values from the final timestep are still in memory, so we'll plot those on the stream network overlaid on the watershed.

In [None]:
proc_plot = pws.analysis.ProcessPlot(gis_files.gis_dir / "drb_2yr")
proc_name = "PRMSChannel"
var_name = "seg_outflow"
proc = nhm.processes[proc_name]
proc_plot.plot(var_name, proc)

Above we see the network on which streamflow is calculated, and the final simulated values of streamflow. 

Let us turn to the model structure in more detail: how do atmospheric inputs/forcings result in the simulated streamflow above? We will produce the model graph, which shows the flow of information from the input files through all the process representations, all the way down to the channel streamflow.

First, we print a color legend for each represented process in the NHM. Each process is outlined by a box of this color, and values/fluxes flowing from a process have the color of the originating process. Finally, a variable outlined in blue (above and on the sides) participates in the mass budget of its process. This diagram gives very specific information about the model conceptualization, how the processes relate to each other, and the complexity of the individual processes. (Note that the underlying graphviz/dot program that generates the plot is not fully working on Mac ARM/M1, so plots here and below are less detailed if you are using such a machine; the notebooks in the repo will be complete for your reference.)

Each process's data is placed into one of three categories: inputs (blue), parameters (orange), and variables (green). All of this information is public for each process (indeed in static methods), so we can produce these plots programmatically without needing to run the `Model`. As discussed in the previous notebook, the `Model` object contains all the information needed to generate the plot when it is initialized.

In [None]:
palette = pws.analysis.utils.colorbrewer.nhm_process_colors(nhm)
pws.analysis.utils.colorbrewer.jupyter_palette(palette)
show_params = not (platform == "darwin" and processor() == "arm")
try:
    pws.analysis.ModelGraph(
        nhm,
        hide_variables=False,
        process_colors=palette,
        show_params=show_params,
    ).SVG(verbose=True, dpi=48)
except Exception:  # avoid a bare except, which would also catch KeyboardInterrupt
    static_url = "https://github.com/EC-USGS/pywatershed/releases/download/1.1.0/notebook_01_cell_11_model_graph.png"
    print(
        f"Dot fails on some machines. You can see the graph at this url: {static_url}"
    )
    from IPython.display import Image

    display(Image(url=static_url, width=1300))

## NHM Submodel for the Delaware River Basin 
Now suppose you wanted to change parameters or model process representation in the PRMSSoilzone to better predict streamflow. As the model is 1-way coupled, you can simply run a submodel starting with PRMSSoilzone and running through PRMSChannel. We simply change our process list to get this "submodel" of the full NHM model above.
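Why one-way coupling makes this possible can be shown with a toy chain of functions. The sketch below uses made-up processes and coefficients (none of it is pywatershed code): the full chain stores every output, and a downstream-only chain that reads the stored upstream outputs as boundary conditions reproduces the downstream results exactly.

```python
# Toy stand-ins for processes, NOT pywatershed classes. Coefficients are made up.
def toy_runoff(precip):
    return {"infil": 0.6 * precip, "sroff": 0.4 * precip}


def toy_soilzone(infil):
    return {"recharge": 0.5 * infil}


def toy_channel(sroff, recharge):
    return {"outflow": sroff + recharge}


def run_full(precip_series):
    """Run every process in order and store all outputs per time step."""
    stored = []
    for precip in precip_series:
        out = toy_runoff(precip)
        out.update(toy_soilzone(out["infil"]))
        out.update(toy_channel(out["sroff"], out["recharge"]))
        stored.append(out)
    return stored


def run_sub(stored_upstream):
    """Run only soilzone and channel, reading upstream outputs as inputs."""
    results = []
    for step in stored_upstream:
        out = toy_soilzone(step["infil"])
        out.update(toy_channel(step["sroff"], out["recharge"]))
        results.append(out)
    return results


full = run_full([1.0, 0.0, 2.5])
sub = run_sub(full)
print(all(s["outflow"] == f["outflow"] for s, f in zip(sub, full)))
```

Nothing downstream feeds back into `toy_runoff`, so its stored outputs are sufficient inputs for the shorter chain. The same reasoning is why the full run earlier in this notebook added `infil_hru` and `sroff_vol` to its output list.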

We can reuse the existing parameter object (since it is write protected). However, we have to re-load the control because it tracked time through the previous simulation. We point the submodel to look for its input in the output of the full model above. We'll also write the submodel output to its own directory and restrict it to the `PRMSChannel` variables.

In [None]:
control_sub = pws.Control.load_prms(
    domain_dir / "nhm.control", warn_unused_options=False
)

run_dir_submodel = nb_output_dir / "nhm_submodel"
if run_dir_submodel.exists():
    rmtree(run_dir_submodel)

control_sub.edit_end_time(np.datetime64("1979-07-01T00:00:00"))
control_sub.options = control_sub.options | {
    "input_dir": run_dir,
    "budget_type": "warn",
    "calc_method": "numba",
    "netcdf_output_dir": run_dir_submodel,
}

control_sub.options["netcdf_output_var_names"] = (
    pws.PRMSChannel.get_variables()
)

We'll instantiate the submodel.

In [None]:
submodel = pws.Model(
    submodel_processes,
    control=control_sub,
    parameters=params,
)

Now we'll run the submodel.

In [None]:
%%time
submodel.run(finalize=True)

We can visualize the sub-model with its `ModelGraph`.

In [None]:
show_params = not (platform == "darwin" and processor() == "arm")
try:
    pws.analysis.ModelGraph(
        submodel,
        hide_variables=False,
        process_colors=palette,
        show_params=show_params,
    ).SVG(verbose=True, dpi=48)
except Exception:  # avoid a bare except, which would also catch KeyboardInterrupt
    static_url = "https://github.com/EC-USGS/pywatershed/releases/download/1.1.0/notebook_01_cell_45_submodel_graph.png"
    print(
        f"Dot fails on some machines. You can see the graph at this url: {static_url}"
    )
    from IPython.display import Image

    display(Image(url=static_url, width=700))

Finally we can check that the output of the model and the submodel are identical. 

In [None]:
vars_output_both = set(control.options["netcdf_output_var_names"]) & set(
    control_sub.options["netcdf_output_var_names"]
)
for var in vars_output_both:
    print(var)
    nhm_da = xr.open_dataarray(run_dir / f"{var}.nc")
    sub_da = xr.open_dataarray(run_dir_submodel / f"{var}.nc")
    xr.testing.assert_equal(nhm_da, sub_da)
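One caveat with a loop like the one above: if the two output lists share no names, the loop body never runs and the check passes vacuously. A minimal sketch of a guard follows, using plain lists in place of the control options. `seg_outflow`, `infil_hru` and `sroff_vol` appear earlier in this notebook; the remaining name is only illustrative.

```python
# Plain lists standing in for the two netcdf_output_var_names options.
full_vars = ["seg_outflow", "infil_hru", "sroff_vol"]
sub_vars = ["seg_outflow", "seg_lateral_inflow"]  # second name is illustrative

# Sorting makes the comparison order deterministic from run to run.
shared = sorted(set(full_vars) & set(sub_vars))

# Fail loudly instead of silently comparing nothing.
assert shared, "no output variables in common between the two runs"
print(shared)
```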