# Processes

The atomic unit of modeling in `pywatershed` is the process representation. `Process` is the class that provides the basis for process modeling. The design of `Process` supports `pywatershed` goals of providing modular, concise, and self-describing models.

In this notebook, we'll take a look at an example `Process`. First we'll examine 

*How does a process work?* How does a process manage:
* time
* parameters
* inputs
* variables
* options
* initialization
* advancing
* calculation
* outputting to disk
* mass balance/budget tracking
* finalization

Second, we'll look at 

*Process design features*:
* Self-describing
    * dimensions, coords, parameters, inputs, variables, init_values,
      restart_variables
* Process and ConservativeProcess: budget

## Prerequisites

In [None]:
# auto-format the code in this notebook
%load_ext jupyter_black

In [None]:
import pathlib as pl
from pprint import pprint
import pydoc

import hvplot.xarray  # noqa
import numpy as np
import pywatershed as pws
from tqdm.notebook import tqdm
import xarray as xr

from helpers import gis_files

gis_files.download()  # this downloads GIS files

## How a Process works

We'll pick a simple, conceptual groundwater representation from PRMS as a case study. We'll simulate groundwater on the Delaware River Basin. 

To get started, we can ask for `help()`. At the moment `help` is a bit too verbose, so we'll look at just its first 22 lines, which describe how a `PRMSGroundwater` is instantiated.

In [None]:
# this is equivalent to help() but we get the multiline string and just look at part of it
prms_gw_help = pydoc.render_doc(pws.PRMSGroundwater, "Help on %s")
# the first 22 lines of help(pws.PRMSGroundwater)
print("\n".join(prms_gw_help.splitlines()[0:22]))

The first line describes the module in which this class lives. The next line says that this class' parent class is `ConservativeProcess`. Next the signature for instantiating PRMSGroundwater is given, detailing its argument names, types, and defaults. Below this, more descriptive documentation is provided. 

A `ConservativeProcess` is a special `Process` that tracks mass conservation. That is to say that the `PRMSGroundwater` is a `ConservativeProcess`, which is itself a `Process`. So `PRMSGroundwater` is a concrete example we can investigate in more detail.

To get a PRMS groundwater reservoir representation, we need to supply the arguments. We'll discuss each of the arguments as we go.

*`control`:*
An instance of the `Control` class. According to `help(pws.Control)`, "Control manages global time and options, and provides metadata". Hydrologic processes are generally prognostic, meaning that the next state depends on the current state ($X_{t+1}(X_t)$). The fundamental progression of time in a `pywatershed` model simulation is managed by an instance of the `Control` class.

We'll create a `Control` object that just handles time.

In [None]:
control = pws.Control(
    start_time=np.datetime64("1979-01-01T00:00:00"),
    end_time=np.datetime64("1979-07-01T00:00:00"),
    time_step=np.timedelta64(24, "h"),
)
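To make the time specification above concrete, here is a minimal sketch that uses only NumPy (not `pywatershed`) to enumerate the daily time axis implied by these start, end, and step values. Whether `Control` counts both endpoints as simulated times is an assumption in this sketch; `control.n_times` is the authoritative value.

```python
import numpy as np

start_time = np.datetime64("1979-01-01T00:00:00")
end_time = np.datetime64("1979-07-01T00:00:00")
time_step = np.timedelta64(24, "h")

# Number of whole time steps between the two endpoints
n_steps = int((end_time - start_time) / time_step)

# Assumption: the time axis includes both the start and the end time
times = start_time + np.arange(n_steps + 1) * time_step

print(n_steps, times[0], times[-1])
```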

*`discretization`:*
From the signature we see that a discretization is of class `Parameters`. These are the static data that describe the spatial aspect of this Process. We'll load these from an existing file.

In [None]:
pws_root = pws.constants.__pywatershed_root__
domain_dir = pws_root / "data/drb_2yr"

dis_file = domain_dir / "parameters_dis_hru.nc"
assert dis_file.exists()
dis = pws.Parameters.from_netcdf(dis_file)
assert isinstance(dis, pws.Parameters)

*`parameters`:*
Of class `Parameters`, this argument provides the static/parameter values that the model uses (beyond those in the discretization). They typically do not vary with the state of the model. We'll load these from an existing file.

In [None]:
param_file = domain_dir / "parameters_PRMSGroundwater.nc"
assert param_file.exists()
params = pws.Parameters.from_netcdf(param_file)
assert isinstance(params, pws.Parameters)

The remaining arguments we can supply in the call to `PRMSGroundwater`. 

*`soil_to_gw`, `ssr_to_gw`, and `dprst_seep_hru`:*
These are the time-varying variables that are the inputs or forcings of this Process. Note that the type to be supplied is `Union[str, pl.Path, numpy.ndarray, pywatershed.base.adapter.Adapter]`. We'll choose to provide `pl.Path` objects that point to static input files. These files have the required timeseries of inputs, and `Process` knows how to adapt this kind of netcdf input.

*`budget_type`:*
The `PRMSGroundwater` process computes a mass-balance or mass budget because it is a special subclass of `Process` 
called a `ConservativeProcess`. This option describes what to do when the budget does not balance. We'll elect to `error`.
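
The idea behind this option can be sketched in plain Python. The helper below is hypothetical and is not the `pywatershed` implementation: it compares inputs minus outputs against the storage change and either raises or warns on an imbalance. The `"warn"` alternative and the tolerances are assumptions made for illustration.

```python
import math
import warnings


def enforce_budget(inputs_sum, outputs_sum, storage_change, budget_type="error"):
    """Hypothetical helper: check that inputs - outputs equals the storage change."""
    balance = inputs_sum - outputs_sum
    # Tolerances are illustrative choices, not values taken from pywatershed
    balanced = math.isclose(balance, storage_change, rel_tol=1e-9, abs_tol=1e-12)
    if not balanced:
        message = f"budget imbalance: {balance} vs {storage_change}"
        if budget_type == "error":
            raise ValueError(message)
        if budget_type == "warn":  # assumed alternative to "error"
            warnings.warn(message)
    return balanced


print(enforce_budget(1.0, 0.75, 0.25))
```

With `budget_type="error"`, an imbalance stops the run at the time step where it occurs, which makes the failure easy to locate.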

*`calc_method`:*
The numerics behind the core calculations in this process. This process has 3 options. We'll take "numpy".

*`verbose`:* 
How much extra information do we want printed to the screen?

Putting it all together:

In [None]:
prms_gw = pws.PRMSGroundwater(
    control=control,
    discretization=dis,
    parameters=params,
    soil_to_gw=domain_dir / "soil_to_gw.nc",
    ssr_to_gw=domain_dir / "ssr_to_gw.nc",
    dprst_seep_hru=domain_dir / "dprst_seep_hru.nc",
    budget_type="error",
    calc_method="numpy",
    verbose=False,
)

Now we have an instance of a `PRMSGroundwater`. 

In [None]:
prms_gw

If we want output, we can initialize it by passing the desired directory for output.

In [None]:
run_dir = pl.Path("./00_processes")
run_dir.mkdir(exist_ok=True)
prms_gw.initialize_netcdf(run_dir)

Now we are ready to simulate. This sequence is the convention for advancing and calculating `Processes` in `pywatershed`. 

In [None]:
for tt in tqdm(range(control.n_times)):
    control.advance()
    prms_gw.advance()
    prms_gw.calculate(control.time_step)
    prms_gw.output()
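
To illustrate why `advance()` comes before `calculate()`, here is a minimal sketch of a hypothetical linear reservoir. It is not the PRMS groundwater formulation; the class name, the coefficient, and the inflow values are all made up for illustration. Advancing stores the previous state, and the calculation then derives the new state from it.

```python
import numpy as np


class ToyReservoir:
    """Hypothetical linear reservoir; not the PRMS groundwater equations."""

    def __init__(self, n_units, coef, initial_storage):
        self.coef = coef
        self.storage = np.full(n_units, initial_storage, dtype=float)
        self.storage_old = self.storage.copy()
        self.flow = np.zeros(n_units)

    def advance(self):
        # Keep the previous state; the new state is computed from it
        self.storage_old[:] = self.storage

    def calculate(self, inflow):
        # Outflow is a fixed fraction of the water available this step
        available = self.storage_old + inflow
        self.flow[:] = self.coef * available
        self.storage[:] = available - self.flow


res = ToyReservoir(n_units=3, coef=0.5, initial_storage=4.0)
inflow = np.array([0.0, 2.0, 4.0])
for _ in range(2):
    res.advance()
    res.calculate(inflow)

print(res.storage, res.flow)
```

Because the previous storage is kept alongside the current one, the storage change for the step can be compared against inflow minus outflow, which is the basis of a mass budget.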

No errors arose, so we assume that mass balance was maintained for the entire run. Before we finalize the process, we can print out the budget at the current time.

In [None]:
prms_gw.budget

If the budget did not balance, an error would be thrown and the `=` would be `!=!` in the summary.

We can see from the summary that we reached the end time, 1979-07-01. 

Now we can finalize the process. 

In [None]:
prms_gw.finalize()

Let's take a look at the output that was written to file. 

In [None]:
output_files = sorted(run_dir.glob("*"))
pprint(output_files)

We could look at the `PRMSGroundwater_budget.nc` file if we wanted more information on the budget through time. Let's try it out.

In [None]:
budget_ds = xr.open_dataset("00_processes/PRMSGroundwater_budget.nc")
display(budget_ds)

Note that it doesn't keep all the terms, just the aggregate inputs, outputs, and storage changes. The storage change on individual spatial units is itself written to output, in this case in `gwres_stor_change.nc`. We can easily plot with a slider for selecting HRUs.

In [None]:
budget_ds.storage_changes_sum.hvplot(groupby="nhm_id")

Let's take a look at the output flux variable from its own file. 

In [None]:
var = "gwres_flow"
# There is only one variable per file, so bring this into a xr.DataArray
var_da = xr.open_dataset(f"00_processes/{var}.nc")[var]
display(var_da)
var_da.hvplot(groupby="nhm_id")

We can also generate a map of `gwres_flow` on the HRUs. Here we'll plot the mean `gwres_flow` for the simulation period.

In [None]:
proc_plot = pws.analysis.process_plot.ProcessPlot(gis_files.gis_dir / "drb_2yr")
proc_plot.plot_hru_var(
    var_name="gwres_flow",
    process=prms_gw,
    data=var_da.mean(dim="time"),
    data_units=var_da.attrs["units"],
    nhm_id=var_da["nhm_id"],
)

## Process design

That's great! But you may have a few questions.
* Why did we get those output variables?
* How were the terms in the mass budget decided or known?
* What are the parameters that were in that parameter file?
* What are the units of the inputs?

The answer is that Processes (and ConservativeProcesses) are self-describing in code. For example:

In [None]:
pws.PRMSGroundwater.get_dimensions()

In [None]:
pws.PRMSGroundwater.get_parameters()

In [None]:
pws.PRMSGroundwater.get_mass_budget_terms()

In this case, this code exactly specifies how the budget is calculated and is like what the developer of any `Process` would have to specify.
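
As a rough sketch of how a declarative specification like this can drive a generic budget calculation, consider the following. The dictionary layout (keys `inputs`, `outputs`, and `storage_changes`) and the numeric values are assumptions made for illustration, and the term list is simplified; compare against the actual output of `get_mass_budget_terms()` above.

```python
# Hypothetical budget declaration, modeled loosely on names used in this notebook
budget_terms = {
    "inputs": ["soil_to_gw", "ssr_to_gw", "dprst_seep_hru"],
    "outputs": ["gwres_flow"],
    "storage_changes": ["gwres_stor_change"],
}

# Illustrative values for a single spatial unit (made up, not model output)
state = {
    "soil_to_gw": 0.25,
    "ssr_to_gw": 0.5,
    "dprst_seep_hru": 0.25,
    "gwres_flow": 0.75,
    "gwres_stor_change": 0.25,
}


def budget_sums(terms, values):
    """Sum the named terms within each budget component."""
    return {
        component: sum(values[name] for name in names)
        for component, names in terms.items()
    }


sums = budget_sums(budget_terms, state)
residual = sums["inputs"] - sums["outputs"] - sums["storage_changes"]
print(sums, residual)
```

The generic function never mentions groundwater: it only needs the declared term names, which is what makes the same budget code reusable across processes.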

In [None]:
pws.PRMSGroundwater.get_variables()

These are the public variables that the `Process` maintains and, by default, it writes all of them out when output is requested. This list can be subset, or output can be left off entirely. Note, however, that `pywatershed` only maintains the current time in memory (and the previous time for some prognostic variables).

Moreover, all public variables are required to have metadata. Parameters also have metadata. The `meta` module (which `control` provides easy access to) can give metadata for requested variables.

In [None]:
control.meta.find_variables(pws.PRMSGroundwater.get_variables())

When we put all these functionalities together, we get the following function, which deeply documents the internals of the `Process`:

In [None]:
pws.PRMSGroundwater.description()

These self-describing capabilities help users and programmers get answers about `Processes`. The self-describing design also helps support generic code that connects multiple processes. Multi-process models will be explored in the next notebook.
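
To give a flavor of what such generic code might look like, the sketch below uses two hypothetical toy classes (not `pywatershed` classes) that describe their own inputs and variables. A generic function then works out which process can supply each input, without knowing anything specific about either class. The `precip` input and the class names are invented for illustration.

```python
class ToySoil:
    """Hypothetical upstream process."""

    @staticmethod
    def get_inputs():
        return ("precip",)

    @staticmethod
    def get_variables():
        return ("soil_to_gw", "ssr_to_gw")


class ToyGroundwater:
    """Hypothetical downstream process."""

    @staticmethod
    def get_inputs():
        return ("soil_to_gw", "ssr_to_gw", "dprst_seep_hru")

    @staticmethod
    def get_variables():
        return ("gwres_flow", "gwres_stor_change")


def resolve_inputs(processes):
    """Map each input to the process providing it, or None if it is external."""
    providers = {
        var: proc.__name__ for proc in processes for var in proc.get_variables()
    }
    return {
        proc.__name__: {name: providers.get(name) for name in proc.get_inputs()}
        for proc in processes
    }


wiring = resolve_inputs([ToySoil, ToyGroundwater])
print(wiring)
```

Inputs that resolve to `None` are not produced by any process in the list, so they would have to come from elsewhere, such as files on disk.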

More in-depth details about the design of the `Process` class are available in the documentation (e.g. `help(pws.Process)` or [online](https://pywatershed.readthedocs.io/en/latest/api/generated/pywatershed.base.Process.html)).