# PRMSChannel FlowGraph with a STARFIT Reservoir: Big Sandy Reservoir

This notebook demonstrates the capabilities of the `FlowGraph` class and its associated classes
`FlowNode` and `FlowNodeMaker` in a real-world example. This example starts from an existing graph of 
flow, embedded in `PRMSChannel` and its parameters, and adds in a single new node to represent a reservoir
within the `PRMSChannel` simulation. 

`FlowGraph` is the class that combines different flow methods in user-specified ways. In this case
we combine nodes of class `PRMSChannelFlowNode` (a re-expression of `PRMSChannel` as a `FlowNode`
to work with `FlowGraph`) with one node of class `StarfitFlowNode`.

Please see these links to the documentation for more details on each:
[`FlowGraph`](https://pywatershed.readthedocs.io/en/latest/api/generated/pywatershed.FlowGraph.html), 
[`StarfitFlowNode`](https://pywatershed.readthedocs.io/en/latest/api/generated/pywatershed.StarfitFlowNode.html), and 
[`PRMSChannelFlowNode`](https://pywatershed.readthedocs.io/en/latest/api/generated/pywatershed.PRMSChannelFlowNode.html).

In [None]:
import pathlib as pl
from pprint import pprint

import jupyter_black
import numpy as np
import pandas as pd
from tqdm.auto import tqdm
import xarray as xr

import hvplot.xarray  # noqa, after xr
import hvplot.pandas  # noqa, after pandas

import pyPRMS
import pywatershed as pws
from pywatershed.plot import DomainPlot
from pywatershed.constants import zero

ndays_run = 365 * 2
plot_height = 600
plot_width = 1000

jupyter_black.load()

In [None]:
pws.utils.addtl_domain_files.download()

In [None]:
nb_output_dir = pl.Path("./06_flow_graph_starfit")
if not nb_output_dir.exists():
    nb_output_dir.mkdir()

## The Big Sandy Dike and the Flaming Gorge Domain
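Let's get to know something about this domain and reservoir. The STARFIT parameters for Big Sandy can be derived from the full Global Reservoir and Dam (GRanD) data set, as shown (commented out) in the next cell. Here we instead load pre-computed parameters from a file distributed with pywatershed.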

In [None]:
# This cell shows how one would get parameters for reservoirs in the ISTARF-CONUS database.
# We use pre-canned parameters and do not do this here because the GRanD file
# must be downloaded manually from
# https://ln.sync.com/dl/bd47eb6b0/anhxaikr-62pmrgtq-k44xf84f-pyz4atkm/view/default/447819520013
# then unpacked and placed in the location below, which cannot be automated
# for testing this notebook.
# param_src_dir = nb_output_dir / "param_sources"
# param_src_dir.mkdir(exist_ok=True)
# grand_file = param_src_dir / "GRanD_Version_1_3/GRanD_reservoirs_v1_3.dbf"
# sf_params = pws.parameters.StarfitParameters.from_istarf_conus_grand(
#     grand_file=grand_file,
#     files_directory=param_src_dir,
#     grand_ids=[419],  # the id for Big Sandy
# )

In [None]:
pkg_root = pws.constants.__pywatershed_root__
big_sandy_param_file = pkg_root / "data/big_sandy_starfit_parameters.nc"
sf_params = pws.Parameters.from_netcdf(big_sandy_param_file, use_xr=True)

In [None]:
# Take a look in pandas format
sf_params.to_xr_ds().to_pandas()

The reservoir capacity is 67.1 million cubic meters (MCM). Let's find it on a map... 

In [None]:
start_lat = sf_params.parameters["LAT_DD"]
start_lon = sf_params.parameters["LONG_DD"]

In [None]:
domain_dir = pkg_root / "data/pywatershed_addtl_domains/fgr_2yr"
domain_gis_dir = pkg_root / "data/pywatershed_gis/fgr_2yr"

control_file = domain_dir / "nhm.control"

shp_file_hru = domain_gis_dir / "model_nhru.shp"
shp_file_seg = domain_gis_dir / "model_nsegment.shp"

In [None]:
domain_plot = DomainPlot(
    hru_shp_file=shp_file_hru,
    segment_shp_file=shp_file_seg,
    hru_parameters=domain_dir / "parameters_dis_hru.nc",
    hru_parameter_names=[
        "nhm_id",
        "hru_lat",
        "hru_lon",
        "hru_area",
    ],
    segment_parameters=domain_dir / "parameters_dis_seg.nc",
    segment_parameter_names=[
        "nhm_seg",
        "tosegment",
        "seg_length",
        "seg_slope",
        "seg_cum_area",
    ],
    start_lat=start_lat,
    start_lon=start_lon,
    start_zoom=13,
)

From the above, by mousing over the segments we can see the reservoir would be inserted above nhm_seg 44426 and below nhm_segs 44434 and 44435. 
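
To make the insertion point concrete, here is a minimal sketch of what "above 44426 and below 44434 and 44435" means in terms of graph connectivity. It is not how pywatershed stores or edits its graph: the `to_seg` dictionary, the `insert_node_above` helper, and the label 999 are all hypothetical, and only the three segment ids are taken from the map.

```python
# Hypothetical downstream map: each segment id points to the segment it
# flows into (0 marks the end of this small excerpt of the network).
to_seg = {44434: 44426, 44435: 44426, 44426: 0}


def insert_node_above(to_seg, new_id, flows_to):
    """Return a copy of to_seg with new_id placed directly above flows_to."""
    # Everything that used to flow into `flows_to` now flows into the new node.
    rewired = {
        seg: (new_id if down == flows_to else down)
        for seg, down in to_seg.items()
    }
    # The new node itself flows into `flows_to`.
    rewired[new_id] = flows_to
    return rewired


with_reservoir = insert_node_above(to_seg, new_id=999, flows_to=44426)
print(with_reservoir)
```

After the insertion, both upstream segments drain to the new node, and the new node is the only inflow to 44426.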

For more context, zooming out shows the full Flaming Gorge Domain on the Green River. The OpenStreetMap layer shows that Big Sandy Dike is located near Farson, WY. In the EsriSatellite layer, we observe that this is a very dry, high plains region with farming downstream of the Big Sandy and Eden reservoirs around Farson. We can also see that the reservoir is fed by snowpack and seasonal runoff from the high Wind River Range to the northeast. The photo of Arrowhead Lake below (taken by the author in August 2023) looks southeast at Temple Mountain, across the furthest upstream HRU of the Big Sandy Dike. 
![Arrowhead Lake, August 2023](static/arrowhead_lake.jpg)

## NHM Run on Flaming Gorge Domain: NO Big Sandy
The NHM does not represent any reservoirs. From the above plot, we'll assume the outflows of Big Sandy would be at segment 44426. Let's see how the NHM represents flow at this location, without any reservoir representation. We can run pywatershed using the "legacy instantiation" as described in Notebook 02.

In [None]:
control = pws.Control.load_prms(control_file, warn_unused_options=False)
control.edit_n_time_steps(ndays_run)

parameter_file = domain_dir / control.options["parameter_file"]
params = pws.parameters.PrmsParameters.load(parameter_file)

In [None]:
# We'll output the to-channel fluxes for use later when running FlowGraph as a post-process.
run_dir = nb_output_dir / "fgr_nhm"

control.options = control.options | {
    "input_dir": domain_dir,
    "budget_type": "error",
    "calc_method": "numba",
    "netcdf_output_dir": run_dir,
    "netcdf_output_var_names": [
        "seg_outflow",
        "sroff_vol",
        "ssres_flow_vol",
        "gwres_flow_vol",
    ],
}

nhm_processes = [
    pws.PRMSSolarGeometry,
    pws.PRMSAtmosphere,
    pws.PRMSCanopy,
    pws.PRMSSnow,
    pws.PRMSRunoff,
    pws.PRMSSoilzone,
    pws.PRMSGroundwater,
    pws.PRMSChannel,
]

In [None]:
%%time
if not run_dir.exists():
    # must delete the run dir to re-run
    run_dir.mkdir()
    nhm = pws.Model(
        nhm_processes,
        control=control,
        parameters=params,
    )
    nhm.run(finalize=True)  # finalize=True already finalizes the model

In [None]:
outflow = xr.open_dataarray(run_dir / "seg_outflow.nc").sel(nhm_seg=44426)

In [None]:
outflow.hvplot(width=plot_width, height=plot_height)

In [None]:
outflow_obs = (
    xr.open_dataarray(run_dir / "seg_outflow.nc")
    .sel(nhm_seg=44438)
    .rename("modeled")
    .to_dataframe()["modeled"]
)
obs_all = pyPRMS.DataFile(domain_dir / "sf_data").data_by_variable("runoff")
wh_poi_obs = np.where(params.parameters["poi_gage_segment"] == 184)
gage_id = params.parameters["poi_gage_id"][wh_poi_obs][0]
obs = obs_all[f"runoff_{gage_id}"]
obs.rename("gage " + obs.name, inplace=True)

outflow_obs.hvplot() * obs[0 : (365 * 2)].hvplot()

## FlowGraph in Model: NHM with a STARFIT representation of Big Sandy

Because `FlowGraph` is not part of PRMS, we can't run `FlowGraph` with PRMS/NHM using the legacy instantiation (e.g., notebook 02). We have to use a multi-process model and set it up "the pywatershed way" (as described in notebook 01). The next three cells build the multi-process model which flows into the `FlowGraph`. We then use a helper function to insert the STARFIT reservoir into the `FlowNode` representation of the PRMS/NHM Muskingum-Mann channel routing, and we append this `FlowGraph` to our multi-process model.

In [None]:
params_file_channel = domain_dir / "parameters_PRMSChannel.nc"
params_channel = pws.parameters.PrmsParameters.from_netcdf(params_file_channel)

dis_file = domain_dir / "parameters_dis_hru.nc"
dis_hru = pws.Parameters.from_netcdf(dis_file, encoding=False)

dis_both_file = domain_dir / "parameters_dis_both.nc"
dis_both = pws.Parameters.from_netcdf(dis_both_file, encoding=False)

In [None]:
control = pws.Control.load_prms(control_file, warn_unused_options=False)
control.edit_n_time_steps(ndays_run)
run_dir = nb_output_dir / "fgr_starfit"
control.options = control.options | {
    "input_dir": domain_dir,
    "budget_type": "error",
    "calc_method": "numba",
    "netcdf_output_dir": run_dir,
    "netcdf_output_var_names": [
        "node_outflows",
        "node_upstream_inflows",
        "node_storages",
    ],
}

In [None]:
nhm_processes = [
    pws.PRMSSolarGeometry,
    pws.PRMSAtmosphere,
    pws.PRMSCanopy,
    pws.PRMSSnow,
    pws.PRMSRunoff,
    pws.PRMSSoilzone,
    pws.PRMSGroundwater,  # stop here, we'll add PRMSChannel as part of FlowGraph later
]

model_dict = {
    "control": control,
    "dis_both": dis_both,
    "dis_hru": dis_hru,
    "model_order": [],
}

# As in notebook 01
for proc in nhm_processes:
    proc_name = proc.__name__
    proc_rename = "prms_" + proc_name[4:].lower()
    model_dict["model_order"] += [proc_rename]
    model_dict[proc_rename] = {}
    proc_dict = model_dict[proc_rename]
    proc_dict["class"] = proc
    proc_param_file = domain_dir / f"parameters_{proc_name}.nc"
    proc_dict["parameters"] = pws.Parameters.from_netcdf(proc_param_file)
    if proc_rename == "prms_channel":
        proc_dict["dis"] = "dis_both"
    else:
        proc_dict["dis"] = "dis_hru"

In [None]:
# what did that give us?
pprint(model_dict, sort_dicts=False)

Now we have a model dictionary describing all the processes which flow into the `PRMSChannel` (Muskingum-Mann). We have a helper function, `prms_channel_flow_graph_to_model_dict`, that we can use to add a `FlowGraph` to this model. The function takes the existing `model_dict`, the `PRMSChannel` data, and additional user-supplied information to construct a `FlowGraph` with new nodes inserted into the `PRMSChannel`. In this case we'll add a single new node to the `PRMSChannel`: a `StarfitFlowNode` inserted above nhm segment 44426 (and below 44434 and 44435) to represent the Big Sandy dike. This `FlowGraph` instance is finally added to the `model_dict` with the name "prms_channel_flow_graph". 

We'll see that the `prms_channel_flow_graph_to_model_dict` helper function also adds an `InflowExchange` instance to the `model_dict`, named "inflow_exchange". This `InflowExchange` manages getting the fluxes from the other processes into the `FlowGraph`. Zero lateral flows are supplied to the `StarfitFlowNode` for Big Sandy in this case (though we could do otherwise).
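
As a rough illustration of the exchange idea (not pywatershed's `InflowExchange` implementation, which also handles units and the real HRU-to-segment mapping), the sketch below sums three to-channel flux terms per node and assigns zero lateral inflow to a newly added node. The flux names echo the NHM output variables used earlier in this notebook; the values, the `hru_to_node` mapping, and the `lateral_inflows` helper are hypothetical.

```python
# Hypothetical per-HRU fluxes for one time step (arbitrary volume units).
sroff_vol = [1.0, 2.0, 0.5]
ssres_flow_vol = [0.25, 0.5, 0.25]
gwres_flow_vol = [0.5, 0.25, 0.25]

# Hypothetical mapping: HRU i drains to channel node hru_to_node[i].
hru_to_node = [0, 1, 1]
n_channel_nodes = 2
n_new_nodes = 1  # e.g. one reservoir node appended after the channel nodes


def lateral_inflows(fluxes, hru_to_node, n_channel_nodes, n_new_nodes):
    """Sum all flux terms per channel node; new nodes receive zero."""
    inflows = [0.0] * (n_channel_nodes + n_new_nodes)
    for flux in fluxes:
        for hru, value in enumerate(flux):
            inflows[hru_to_node[hru]] += value
    return inflows


inflows = lateral_inflows(
    [sroff_vol, ssres_flow_vol, gwres_flow_vol],
    hru_to_node,
    n_channel_nodes,
    n_new_nodes,
)
print(inflows)
```

The last entry stays at zero, mirroring the choice made here for the Big Sandy node: it receives water only from upstream nodes.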

In [None]:
model_dict = pws.prms_channel_flow_graph_to_model_dict(
    model_dict=model_dict,
    prms_channel_dis=dis_both,
    prms_channel_dis_name="dis_both",
    prms_channel_params=params_channel,
    new_nodes_maker_dict={
        "starfit": pws.hydrology.starfit.StarfitFlowNodeMaker(
            None,
            sf_params,
            budget_type="error",
            compute_daily=True,
        )
    },
    new_nodes_maker_names=["starfit"],
    new_nodes_maker_indices=[0],
    new_nodes_maker_ids=[999],
    new_nodes_flow_to_nhm_seg=[44426],
    graph_budget_type="warn",  # move to error
)

In [None]:
# The "inflow_exchange" and "prms_channel_flow_graph" have been added to the model
pprint(model_dict, sort_dicts=False)

In [None]:
%%time
if not run_dir.exists():
    run_dir.mkdir()
    model = pws.Model(model_dict)
    model.run()
    model.finalize()

    model.processes["prms_channel_flow_graph"].budget

In [None]:
wh_44426 = np.where(params.parameters["nhm_seg"] == 44426)[0]
outflow_nodes = xr.open_dataarray(run_dir / "node_outflows.nc")[:, wh_44426]
outflow_nodes = outflow_nodes.drop_vars(set(outflow_nodes.coords) - {"time"})

In [None]:
xr.merge([outflow, outflow_nodes]).rename(
    {"seg_outflow": "NHM", "node_outflows": "STARFIT"}
).hvplot(
    width=plot_width,
    height=plot_height,
    ylabel="streamflow (cfs)",
)

In [None]:
storage_nodes = xr.open_dataarray(run_dir / "node_storages.nc")[:, -1]
storage_nodes = storage_nodes.drop_vars(set(storage_nodes.coords) - {"time"})
storage_nodes.hvplot(width=plot_width, height=plot_height)

In [None]:
xr.merge([outflow, outflow_nodes, storage_nodes]).rename(
    {
        "seg_outflow": "NHM",
        "node_outflows": "Big Sandy Outflow",
        "node_storages": "Big Sandy Storage",
    }
).hvplot(
    width=int(plot_width / 1.25),
    height=int(plot_height / 1.3 / 1.2),
    ylabel="streamflow (cfs)\nstorage (million cubic feet)",
).opts(
    legend_position="top_left"
)

## FlowGraph as a post-process: Drive FlowGraph with STARFIT representation of Big Sandy and Pass-Through using NHM output files
Above we ran the equivalent of the full NHM but with a `StarfitFlowNode` inserted at Big Sandy. Pywatershed is flexible, and no two process representations are two-way coupled in the NHM configuration. This means that the `FlowGraph` in the model run above could instead be run as a post-process on the output of the rest of the model chain.

In fact, we can use the output of the first model run above, without any reservoir representation, to drive just the `FlowGraph` from the previous run. We call running `FlowGraph` in this way a "post-process". If you were running the no-reservoir model and investigating which `FlowGraph` designs give better flow representations, this is the method you'd want to follow, rather than re-running all the model processes upstream of `FlowGraph` every time.
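
The reason a post-process is valid can be shown with a toy example. The sketch below assumes a strictly one-way coupling: a hypothetical `upstream_step` function stands in for all processes above the channel, and a hypothetical `LinearStore` stands in for routing. Neither is a pywatershed class, and a JSON file stands in for the NetCDF output.

```python
import json
import tempfile
from pathlib import Path


def upstream_step(forcing_value):
    """Hypothetical stand-in for every process upstream of the channel."""
    return 2.0 * forcing_value


class LinearStore:
    """Hypothetical routing: releases a fixed fraction of storage each step."""

    def __init__(self, k=0.5):
        self.k = k
        self.storage = 0.0

    def step(self, inflow):
        self.storage += inflow
        outflow = self.k * self.storage
        self.storage -= outflow
        return outflow


forcing = [1.0, 0.0, 3.0, 2.0]

# In-model run: upstream and routing advance together each time step.
store = LinearStore()
in_model = [store.step(upstream_step(f)) for f in forcing]

# Post-process run: save the upstream fluxes once, then route from file.
with tempfile.TemporaryDirectory() as tmp:
    flux_file = Path(tmp) / "lateral_inflows.json"
    flux_file.write_text(json.dumps([upstream_step(f) for f in forcing]))
    store = LinearStore()
    post = [store.step(q) for q in json.loads(flux_file.read_text())]

print(in_model == post)
```

Because routing never feeds back into the upstream calculation, the two runs give identical outflows, and only the cheap routing step needs repeating when the routing design changes.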

For this post-process case we have a different helper function, `prms_channel_flow_graph_postprocess`, to which we supply most of the same information about the `FlowGraph`. However, instead of an existing `model_dict` (as in the previous model above), we tell it where to find its inputs on file.

For extra fun and illustration, we'll not only add the `StarfitFlowNode` for Big Sandy, we'll also demonstrate that additional nodes can be added to the `FlowGraph` by putting an arbitrary `PassThroughFlowNode` elsewhere on the domain. By design this node has no effect on the flows, but adding it here shows how easily additional nodes can be added to a `FlowGraph`.
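
The "no effect by design" property is easy to check on a toy chain. In the sketch below, `route_chain`, `HalfStore`, and `pass_through` are hypothetical names and do not reflect the `FlowNode` interface in pywatershed. The point is only that a node whose outflow equals its inflow leaves the downstream result unchanged.

```python
def route_chain(nodes, headwater_inflows):
    """Push each time step's inflow through the nodes, upstream to downstream."""
    outlet = []
    for inflow in headwater_inflows:
        flow = inflow
        for node in nodes:
            flow = node(flow)
        outlet.append(flow)
    return outlet


class HalfStore:
    """Hypothetical storage node: releases half of what it holds each step."""

    def __init__(self):
        self.storage = 0.0

    def __call__(self, inflow):
        self.storage += inflow
        outflow = 0.5 * self.storage
        self.storage -= outflow
        return outflow


def pass_through(inflow):
    """Hypothetical pass-through node: outflow equals inflow, no storage."""
    return inflow


inflows = [4.0, 0.0, 2.0]
without = route_chain([HalfStore(), HalfStore()], inflows)
with_extra = route_chain([HalfStore(), pass_through, HalfStore()], inflows)
print(without == with_extra)
```

The comparison holds for any inflow series, since the extra node neither stores nor delays water.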

In [None]:
control = pws.Control.load_prms(control_file, warn_unused_options=False)
control.edit_n_time_steps(ndays_run)
run_dir = nb_output_dir / "fgr_starfit_post"
control.options = control.options | {
    "input_dir": domain_dir,
    "budget_type": "error",
    "calc_method": "numba",
    "netcdf_output_dir": run_dir,
    "netcdf_output_var_names": [
        "node_outflows",
        "node_upstream_inflows",
        "node_storages",
    ],
}

params_file_channel = domain_dir / "parameters_PRMSChannel.nc"
params_channel = pws.parameters.PrmsParameters.from_netcdf(params_file_channel)

if "dis_hru" in locals().keys():
    del dis_hru

dis_both_file = domain_dir / "parameters_dis_both.nc"
dis_both = pws.Parameters.from_netcdf(dis_both_file, encoding=False)

In [None]:
sfp_ds = sf_params.to_xr_ds().copy()
cap_mult = 1.5
sfp_ds["GRanD_CAP_MCM"] *= cap_mult
sf_params_new = pws.Parameters.from_ds(sfp_ds)

In [None]:
input_dir = nb_output_dir / "fgr_nhm"  # use the output of the NHM run

flow_graph = pws.prms_channel_flow_graph_postprocess(
    control=control,
    input_dir=input_dir,
    prms_channel_dis=dis_both,
    prms_channel_params=params_channel,
    new_nodes_maker_dict={
        "starfit": pws.hydrology.starfit.StarfitFlowNodeMaker(
            None,
            sf_params_new,
            compute_daily=True,  # match the in-model run so only capacity differs
            budget_type="error",
        ),
        "pass_through": pws.hydrology.pass_through_flow_node.PassThroughFlowNodeMaker(),
    },
    new_nodes_maker_names=["starfit", "pass_through"],
    new_nodes_maker_indices=[0, 0],  # relative to the individual NodeMakers
    new_nodes_maker_ids=[999, 9999],  # relative to the individual NodeMakers
    new_nodes_flow_to_nhm_seg=[
        44426,
        44435,
    ],  # the second is a pass through above the first
    addtl_output_vars=["spill", "release"],
)

In [None]:
%%time
if not run_dir.exists():
    run_dir.mkdir()
    flow_graph.initialize_netcdf()
    for istep in tqdm(range(control.n_times)):
        control.advance()
        flow_graph.advance()
        flow_graph.calculate(1.0)
        flow_graph.output()

    flow_graph.finalize()
    flow_graph.budget

In [None]:
# wh_44426 = np.where(params.parameters["nhm_seg"] == 44426)[0]
outflow_nodes_post = xr.open_dataarray(run_dir / "node_outflows.nc")
wh_big_sandy = (outflow_nodes_post.node_maker_id == 999) & (
    outflow_nodes_post.node_maker_name == "starfit"
)
outflow_nodes_post = outflow_nodes_post[:, wh_big_sandy].rename(
    "node_outflows_post"
)
outflow_nodes_post = outflow_nodes_post.drop_vars(
    set(outflow_nodes_post.coords) - {"time"}
)

In [None]:
xr.merge(
    [
        outflow,
        outflow_nodes,
        outflow_nodes_post,
    ]
).rename(
    {
        "seg_outflow": "NHM",
        "node_outflows": "STARFIT",
        "node_outflows_post": f"STARFIT CAP*{cap_mult}",
    }
).hvplot(width=plot_width, height=plot_height, ylabel="streamflow (cfs)")

In [None]:
storage_nodes_post = xr.open_dataarray(run_dir / "node_storages.nc")
wh_big_sandy = (storage_nodes_post.node_maker_id == 999) & (
    storage_nodes_post.node_maker_name == "starfit"
)
storage_nodes_post = storage_nodes_post[:, wh_big_sandy].rename(
    "node_storages_post"
)
storage_nodes_post = storage_nodes_post.drop_vars(
    set(storage_nodes_post.coords) - {"time"}
)
xr.merge(
    [
        storage_nodes,
        storage_nodes_post,
    ]
).hvplot(
    width=plot_width,
    height=plot_height,
    ylabel="storage (million cubic feet)",
)

In [None]:
xr.merge(
    [
        outflow,
        outflow_nodes,
        storage_nodes,
        outflow_nodes_post,
        storage_nodes_post,
    ]
).rename(
    {
        "seg_outflow": "NHM",
        "node_outflows": "Big Sandy Outflow",
        "node_storages": "Big Sandy Storage",
        "node_outflows_post": f"Big Sandy Outflow CAP*{cap_mult}",
        "node_storages_post": f"Big Sandy Storage CAP*{cap_mult}",
    }
).hvplot(
    width=plot_width,
    height=plot_height,
    ylabel="streamflow (cfs)\nstorage (million cubic feet)",
)

While the `FlowGraph` itself only looks at lateral inflows, upstream inflows, and total outflows at each node, there may be other variables of interest to the user on a node. Looking at the properties or attributes of an individual `FlowNode` reveals what other variables are available for nodes of that type. Above, the argument `addtl_output_vars=["spill", "release"]` was passed in the call to `pws.prms_channel_flow_graph_postprocess`. This requests that these variables be written to NetCDF files on nodes where they are available; nodes where they are not available will contain missing (NaN) values. For a `StarfitFlowNode`, the total outflow has two components, the spill and the release, and the node outflow variable alone does not show their individual contributions. So we request that these variables be output, and we see that in the second summer (June 13-16, 1980) there is indeed a spill event on Big Sandy which contributes to the total outflow.
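
To illustrate why the split matters, here is a toy reservoir mass balance in which the total outflow is the sum of a release and a spill. This is deliberately not the STARFIT release policy: the `reservoir_step` function, the constant target release, and all numbers are hypothetical, with arbitrary units.

```python
import math


def reservoir_step(storage, inflow, capacity, target_release):
    """One step of a toy reservoir; returns (storage, release, spill)."""
    storage += inflow
    # Release what the (hypothetical) policy asks for, if water is available.
    release = min(target_release, storage)
    storage -= release
    # Anything above capacity leaves as spill.
    spill = max(0.0, storage - capacity)
    storage -= spill
    return storage, release, spill


storage, capacity = 8.0, 10.0
rows = []
for inflow in [1.0, 6.0, 0.5]:
    storage, release, spill = reservoir_step(storage, inflow, capacity, 2.0)
    rows.append(
        {"outflow": release + spill, "release": release, "spill": spill}
    )

# A node type without these variables would report NaN for them instead.
channel_row = {"outflow": 3.0, "release": math.nan, "spill": math.nan}

for row in rows:
    print(row)
```

Only the second step spills, yet from the `outflow` column alone that step is indistinguishable from a larger release, which is why the components are requested separately.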

In [None]:
spill = xr.open_dataarray(run_dir / "spill.nc")[:, wh_big_sandy]
release = xr.open_dataarray(run_dir / "release.nc")[:, wh_big_sandy]
drop_vars = set(spill.coords) - {"time"}
spill = spill.drop_vars(drop_vars)
release = release.drop_vars(drop_vars)

In [None]:
xr.merge(
    [
        outflow_nodes_post,
        spill,
        release,
    ]
).rename(
    {
        "node_outflows_post": f"Big Sandy Outflow CAP*{cap_mult}",
        "spill": f"Big Sandy Spill CAP*{cap_mult}",
        "release": f"Big Sandy Release CAP*{cap_mult}",
    }
).hvplot(
    width=plot_width,
    height=plot_height,
    ylabel="streamflow (cfs)",
)