# Multi-process models in pywatershed

In notebook `00_processes.ipynb`, we looked at how individual Process representations work and are designed. In this notebook we learn how to put multiple Processes together into a composite model.

The starting point for the development of `pywatershed` was the NHM configuration of the PRMS model. In this notebook, we'll first construct a full NHM configuration run. The spatial domain we'll use will again be the Delaware River Basin. Once we construct the full model, we'll look at how we can also construct sub-models of the full NHM configuration.

In [None]:
%load_ext jupyter_black

In [None]:
import pathlib as pl
from pprint import pprint
import yaml

import pydoc

# import hvplot.xarray  # noqa
import numpy as np
import pywatershed as pws

# from tqdm.notebook import tqdm
# import xarray as xr

In [None]:
domain_dir = pws.constants.__pywatershed_root__ / "data/drb_2yr"
nb_output_dir = pl.Path("./01_multi-process_models")

The 8 conceptual Process classes that comprise the NHM are, in order:

In [None]:
nhm_processes = [
    pws.PRMSSolarGeometry,
    pws.PRMSAtmosphere,
    pws.PRMSCanopy,
    pws.PRMSSnow,
    pws.PRMSRunoff,
    pws.PRMSSoilzone,
    pws.PRMSGroundwater,
    pws.PRMSChannel,
]

We assemble a multi-process model from these classes using the `Model` class.

In [None]:
# this is equivalent to help() but we get the multiline string and just look at part of it
model_help = pydoc.render_doc(pws.Model, "Help on %s")
print("\n".join(model_help.splitlines()[0:20]))  # the first 20 lines of help

The help mentions that there are 2 distinct ways of instantiating a Model class. In this notebook, we focus on the pywatershed-centric instantiation and leave the PRMS-legacy instantiation to the following notebook.

With the pywatershed-centric approach, the first argument is a "model dictionary" which does nearly all the work (the other arguments keep their default values). The help describes the model dictionary and provides examples. Please use it for reference and more details. Here we'll give an extended concrete example. The help also describes how a Model can be instantiated from a model dictionary in a yaml file. First we'll build a model dictionary in memory, then we'll write it out as a yaml file and build our model from it.

## Model dictionary in memory

Because our (pre-existing) parameter files and our Process classes are consistently named, we can begin to build the model dictionary quickly.

In [None]:
model_dict = {}

for proc in nhm_processes:
    # this is the class name
    proc_name = proc.__name__
    # the processes can have arbitrary names in the model_dict and
    # an instance should not have a capitalized name anyway (according to
    # python convention), so rename from the class name
    proc_rename = "prms_" + proc_name[4:].lower()
    # each process has a dictionary of information
    model_dict[proc_rename] = {}
    # alias to shorten lines below
    proc_dict = model_dict[proc_rename]
    # required key "class" specifies the class
    proc_dict["class"] = proc
    # the "parameters" key provides an instance of Parameters
    proc_param_file = domain_dir / f"parameters_{proc_name}.nc"
    proc_dict["parameters"] = pws.Parameters.from_netcdf(proc_param_file)
    # the "dis" key names the discretization this process uses,
    # which we'll supply shortly in the model dictionary
    if proc_rename == "prms_channel":
        proc_dict["dis"] = "dis_both"
    else:
        proc_dict["dis"] = "dis_hru"

Let's look at what we have so far in the model dict.

In [None]:
pprint(model_dict, sort_dicts=False)

We have given a name to each process and then supplied the class, its parameters, and its discretization for the full set of processes. Now we'll need to add the discretizations to the model dictionary. They are added at the top level and correspond to the names the processes used. 
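
To make the shape of the dictionary concrete, here is a minimal stand-alone sketch of the structure built so far. It uses plain strings as placeholders for the classes and the `Parameters` instances, so it runs without `pywatershed`. The placeholder values and the name `sketch_dict` are illustrative assumptions, not real objects.

```python
# Stand-in sketch: strings replace the real classes and Parameters objects.
class_names = ["PRMSSolarGeometry", "PRMSCanopy", "PRMSChannel"]

sketch_dict = {}
for class_name in class_names:
    # same renaming rule as above: drop "PRMS", lowercase, prefix "prms_"
    proc_rename = "prms_" + class_name[4:].lower()
    sketch_dict[proc_rename] = {
        "class": class_name,  # placeholder for the class object
        "parameters": f"<Parameters from parameters_{class_name}.nc>",  # placeholder
        "dis": "dis_both" if proc_rename == "prms_channel" else "dis_hru",
    }

for name, entry in sketch_dict.items():
    print(name, "->", entry["class"], "on", entry["dis"])
```

Each top-level key is a user-chosen process name, and each value holds the three entries described above.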

In [None]:
model_dict = model_dict | {
    "dis_hru": pws.Parameters.from_netcdf(
        domain_dir / "parameters_dis_hru.nc"
    ),
    "dis_both": pws.Parameters.from_netcdf(
        domain_dir / "parameters_dis_both.nc"
    ),
}
pprint(model_dict, sort_dicts=False)

For the time being, PRMSChannel needs to know about both HRUs and segments, so `dis_both` is used. We plan to remove this requirement in the near future by implementing "exchanges" between processes into the model dictionary. Stay tuned.

You may have noticed that we are missing a Control object to provide time information to the processes. We'll supply that, and we'll also supply the order in which the processes are executed.

In [None]:
control = pws.Control(
    start_time=np.datetime64("1979-01-01T00:00:00"),
    end_time=np.datetime64("1979-07-01T00:00:00"),
    time_step=np.timedelta64(24, "h"),
    options={
        "input_dir": domain_dir,
        "init_vars_from_file": 0,
        "dprst_flag": True,
    },
)
model_order = ["prms_" + proc.__name__[4:].lower() for proc in nhm_processes]
model_dict = model_dict | {"control": control, "model_order": model_order}
pprint(model_dict, sort_dicts=False)

The `model_dict` now specifies a complete model built from multiple processes. The way these processes are connected can be figured out by the Model class, because each process fully describes itself (as we saw in the previous notebook). So let's see what happens:
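
The idea of self-describing processes can be illustrated with a toy sketch that is independent of `pywatershed`. It assumes each process declares the names of its inputs and outputs, and it wires every input to the earlier process that produces it, falling back to a file otherwise. The process and variable names and the helper `wire_inputs` are hypothetical, and the sketch does not claim to reflect how `Model` is actually implemented.

```python
# Toy metadata: process name -> declared inputs and outputs (all hypothetical).
toy_processes = {
    "atmosphere": {"inputs": ["prcp_file"], "outputs": ["rain", "snow"]},
    "canopy": {"inputs": ["rain", "snow"], "outputs": ["net_rain", "net_snow"]},
    "runoff": {"inputs": ["net_rain", "soil_moist_prev"], "outputs": ["sroff"]},
}
toy_order = ["atmosphere", "canopy", "runoff"]


def wire_inputs(processes, order):
    """Map each process input to the earlier process that outputs it, else "file"."""
    producers = {}  # variable name -> process that produces it
    wiring = {}
    for name in order:
        wiring[name] = {
            var: producers.get(var, "file") for var in processes[name]["inputs"]
        }
        # register this process's outputs for the processes that follow it
        for var in processes[name]["outputs"]:
            producers[var] = name
    return wiring


wiring = wire_inputs(toy_processes, toy_order)
for name in toy_order:
    print(name, wiring[name])
```

Because the wiring follows from the declared names alone, the user only has to list the processes and their order.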

In [None]:
model = pws.Model(model_dict)
model.initialize_netcdf(nb_output_dir / "nhm_memory")
model.run()

## Model dictionary yaml file
It may be preferable to have a model dictionary encoded in a yaml file. Let's do that. First we'll need to write the control as a yaml file. To do that we need a serializable dictionary in python.

In [None]:
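run_dir = nb_output_dir / "nhm_yaml"
# create parent directories too, in case the output directory does not exist yet
run_dir.mkdir(parents=True, exist_ok=True)
control_dict = {
    "start_time": str(control.start_time),
    "end_time": str(control.end_time),
    # these slices assume str(control.time_step) has the form "24 hours"
    "time_step": str(control.time_step)[0:2],
    "time_step_units": str(control.time_step)[3:4],
    "netcdf_output_dir": run_dir,
} | control.options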

pprint(control_dict, sort_dicts=False)

We add the option `netcdf_output_dir` to the control since we assume we won't be able to do so at run time. Note that this option and the `input_dir` option are `pathlib.Path` objects. These are not what we want to write to file; we want their string versions. We could call `str()` on each one by hand, but it is handier to write a small, recursive function that does this on a supplied dictionary, since this will be a recurring task with the model dictionary we are about to create.
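
As a sketch of the idea, the variant below returns a new dictionary instead of modifying its argument, and it also descends into lists. The name `paths_to_str` is hypothetical and is not part of `pywatershed`; the notebook's own helper is defined in the next cell.

```python
import pathlib as pl


def paths_to_str(obj):
    """Return a copy of obj with every pathlib path replaced by its string."""
    if isinstance(obj, dict):
        return {key: paths_to_str(val) for key, val in obj.items()}
    if isinstance(obj, list):
        return [paths_to_str(val) for val in obj]
    if isinstance(obj, pl.PurePath):
        return str(obj)
    return obj


# illustrative input only; PurePosixPath keeps the result platform independent
example = {
    "input_dir": pl.PurePosixPath("data/drb_2yr"),
    "nested": {"files": [pl.PurePosixPath("a.nc"), "b.nc"]},
    "dprst_flag": True,
}
converted = paths_to_str(example)
print(converted)
```

Returning a copy leaves the original dictionary usable with its `Path` objects, at the cost of a little extra memory.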

In [None]:
def dict_pl_to_str(the_dict):
    """Convert pathlib.Path values to str, in place, recursing into nested dicts."""
    for key, val in the_dict.items():
        if isinstance(val, dict):
            the_dict[key] = dict_pl_to_str(val)
        elif isinstance(val, pl.Path):
            the_dict[key] = str(val)

    return the_dict


control_dict = dict_pl_to_str(control_dict)
pprint(control_dict, sort_dicts=False)

Next, we need to supply the model dictionary as above, but with paths to the parameter and discretization netcdf files instead of instantiated `Parameters` objects.

In [None]:
control_yaml_file = run_dir / "control.yml"
model_dict = {
    "control": control_yaml_file.resolve(),
    "dis_hru": domain_dir / "parameters_dis_hru.nc",
    "dis_both": domain_dir / "parameters_dis_both.nc",
    "model_order": model_order,
}

for proc in nhm_processes:
    proc_name = proc.__name__
    proc_rename = "prms_" + proc_name[4:].lower()
    model_dict[proc_rename] = {}
    proc_dict = model_dict[proc_rename]
    proc_dict["class"] = proc_name
    proc_param_file = domain_dir / f"parameters_{proc_name}.nc"
    proc_dict["parameters"] = proc_param_file
    if proc_rename == "prms_channel":
        proc_dict["dis"] = "dis_both"
    else:
        proc_dict["dis"] = "dis_hru"

model_dict = dict_pl_to_str(model_dict)
pprint(model_dict, sort_dicts=False)

A note on paths in the yaml file: because we are using files in two different locations which are not easily described relative to the location of the yaml file, we are using absolute paths. However, one can also describe all paths relative to the location of the yaml file if that is more suitable to your purposes.
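
If relative paths are preferred, they can be computed with the standard library before writing the yaml file. The sketch below only illustrates the path arithmetic: the directory layout and the helper name `relative_to_yaml` are hypothetical, and it does not show how `pywatershed` itself resolves paths.

```python
import os
import pathlib as pl


def relative_to_yaml(target, yaml_file):
    """Express target relative to the directory containing yaml_file."""
    return pl.Path(os.path.relpath(target, start=pl.Path(yaml_file).parent))


# hypothetical layout, for illustration only
yaml_file = pl.Path("/work/run/nhm_yaml/model_dict.yml")
param_file = pl.Path("/work/data/drb_2yr/parameters_dis_hru.nc")

rel = relative_to_yaml(param_file, yaml_file)
print(rel.as_posix())

# joining the yaml directory with the relative path recovers the original file
recovered = pl.Path(os.path.normpath(yaml_file.parent / rel))
print(recovered == param_file)
```

`os.path.relpath` is used rather than `Path.relative_to` because the target here is not inside the yaml file's directory, so the result needs `..` components.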

Finally, we have the control and model dictionaries ready to write to yaml.

In [None]:
model_dict_yaml_file = run_dir / "model_dict.yml"
dump_dict = {control_yaml_file: control_dict, model_dict_yaml_file: model_dict}
for yaml_path, contents in dump_dict.items():
    with open(yaml_path, "w") as file:
        yaml.dump(contents, file)

In [None]:
! cat 01_multi-process_models/nhm_yaml/control.yml

In [None]:
! cat 01_multi-process_models/nhm_yaml/model_dict.yml

In [None]:
model = pws.Model.from_yml(model_dict_yaml_file)
model.run()