# PRMSChannel FlowGraph with a STARFIT Reservoir: Big Sandy Reservoir

This notebook demonstrates the `FlowGraph` class and its associated classes
`FlowNode` and `FlowNodeMaker` in a real-world example. The example starts from an existing
flow graph, the one in `PRMSChannel`, and adds a single new node that represents a reservoir
within the `PRMSChannel` simulation. `FlowGraph` is the class that takes different
flow methods and combines them in user-specified ways. In this case we combine nodes of class
`PRMSChannelFlowNode` with one node of class `StarfitFlowNode`.

Please see these links to the documentation for more details on 
[`FlowGraph`](https://pywatershed.readthedocs.io/en/latest/api/generated/pywatershed.FlowGraph.html), 
[`StarfitFlowNode`](https://pywatershed.readthedocs.io/en/latest/api/generated/pywatershed.StarfitFlowNode.html), and 
[`PRMSChannelFlowNode`](https://pywatershed.readthedocs.io/en/latest/api/generated/pywatershed.PRMSChannelFlowNode.html).
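
To make the idea of combining different node types concrete, here is a minimal toy sketch in plain Python. It is not pywatershed's implementation: `ToyChannelNode`, `ToyReservoirNode`, and `route_one_step` are hypothetical names, and the reservoir rule (release a fixed fraction of storage) is an assumption chosen only for illustration, not the STARFIT rule.

```python
class ToyChannelNode:
    """Hypothetical channel node: releases everything it receives in the same step."""

    def step(self, inflow):
        return inflow


class ToyReservoirNode:
    """Hypothetical reservoir node: stores inflow, releases a fixed fraction of storage."""

    def __init__(self, release_fraction=0.25):
        self.storage = 0.0
        self.release_fraction = release_fraction

    def step(self, inflow):
        self.storage += inflow
        release = self.release_fraction * self.storage
        self.storage -= release
        return release


def route_one_step(nodes, to_node, lateral):
    """Route one time step; nodes must be ordered upstream to downstream.

    to_node[i] is the index of the node that node i flows to, or -1 at the outlet.
    """
    upstream = [0.0] * len(nodes)
    outflows = [0.0] * len(nodes)
    for i, node in enumerate(nodes):
        # Each node only needs a common `step` interface, whatever its class.
        outflows[i] = node.step(upstream[i] + lateral[i])
        if to_node[i] >= 0:
            upstream[to_node[i]] += outflows[i]
    return outflows


# Two channel nodes feed a reservoir node, which feeds a final channel node.
nodes = [ToyChannelNode(), ToyChannelNode(), ToyReservoirNode(), ToyChannelNode()]
to_node = [2, 2, 3, -1]
lateral = [4.0, 6.0, 0.0, 1.0]
outflows = route_one_step(nodes, to_node, lateral)
print(outflows)
```

The point of the sketch is only that the graph traversal does not care which class a node belongs to, as long as every node exposes the same interface.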

In [None]:
import pathlib as pl
from pprint import pprint

import jupyter_black
import numpy as np
import pandas as pd
from tqdm.auto import tqdm
import xarray as xr

import hvplot.xarray  # noqa, after xr
import hvplot.pandas  # noqa, after pandas

import pyPRMS
import pywatershed as pws
from pywatershed.plot import DomainPlot
from pywatershed.constants import zero

ndays_run = 365 * 2
plot_height = 600
plot_width = 1000

jupyter_black.load()

# to remove:
import warnings
warnings.filterwarnings("ignore")

In [None]:
pws.utils.addtl_domain_files.download()

In [None]:
nb_output_dir = pl.Path("./06_flow_graph_starfit")
if not nb_output_dir.exists():
    nb_output_dir.mkdir()

## The Big Sandy Dike and the Flaming Gorge Domain
Let's get to know something about this domain and reservoir. We'll load the STARFIT parameter file for Big Sandy, which carries the reservoir's attributes from the Global Reservoir and Dam (GRanD) data set.

In [None]:
pkg_root = pws.constants.__pywatershed_root__
big_sandy_param_file = pkg_root / "data/big_sandy_starfit_parameters.nc"
sf_params = pws.Parameters.from_netcdf(big_sandy_param_file, use_xr=True)

In [None]:
# Take a look in pandas format
sf_params.to_xr_ds().to_pandas()

The reservoir capacity is 67.1 million cubic meters (MCM). Let's find it on a map... 

In [None]:
start_lat = sf_params.parameters["LAT_DD"]
start_lon = sf_params.parameters["LONG_DD"]

In [None]:
domain_dir = pkg_root / "data/pywatershed_addtl_domains/fgr_2yr"
domain_gis_dir = pkg_root / "data/pywatershed_gis/fgr_2yr"

control_file = domain_dir / "nhm.control"

shp_file_hru = domain_gis_dir / "model_nhru.shp"
shp_file_seg = domain_gis_dir / "model_nsegment.shp"

In [None]:
# add GRanD shp file? or add to the object afterwards? option to get polygons
# for sf_params above? but how to show connectivity?
DomainPlot(
    hru_shp_file=shp_file_hru,
    segment_shp_file=shp_file_seg,
    hru_parameters=domain_dir / "parameters_dis_hru.nc",
    hru_parameter_names=[
        "nhm_id",
        "hru_lat",
        "hru_lon",
        "hru_area",
    ],
    segment_parameters=domain_dir / "parameters_dis_seg.nc",
    segment_parameter_names=[
        "nhm_seg",
        "tosegment",
        "seg_length",
        "seg_slope",
        "seg_cum_area",
    ],
    start_lat=start_lat,
    start_lon=start_lon,
    start_zoom=13,
)

From the above, by mousing over the segments we can see the reservoir should be inserted above nhm_seg 44426 and below nhm_segs 44434 and 44435. 
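
Conceptually, inserting the reservoir means rewiring the segment connectivity so that everything that used to flow into 44426 now flows into the new node, which in turn flows into 44426. The sketch below illustrates this with a plain dictionary. The function `insert_node_above` and the label `"big_sandy"` are hypothetical, the downstream target of 44426 is left as `None` because it is not needed here, and this is not how pywatershed stores connectivity internally.

```python
def insert_node_above(to_seg, new_id, flows_to):
    """Return a copy of to_seg with new_id inserted immediately upstream of flows_to."""
    rewired = {
        seg: (new_id if down == flows_to else down) for seg, down in to_seg.items()
    }
    rewired[new_id] = flows_to
    return rewired


# Toy connectivity: segment id -> downstream segment id (None: not relevant here).
to_seg = {44434: 44426, 44435: 44426, 44426: None}
rewired = insert_node_above(to_seg, new_id="big_sandy", flows_to=44426)
print(rewired)
```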

For more context, zooming out shows the full Flaming Gorge Domain on the Green River. The OpenStreetMap layer shows that Big Sandy Dike is located near Farson, WY. In the EsriSatellite layer, we observe this is a very dry, high plains region with farming downstream of the Big Sandy and Eden reservoirs around Farson. We can also see that the reservoir is fed by snowpack and seasonal runoff from the high Wind River Range to the northeast. The photo of Arrowhead Lake below (taken by the author in August 2023) looks southeast at Temple Mountain, across the furthest upstream HRU of the Big Sandy Dike.
![Arrowhead Lake, August 2023](static/arrowhead_lake.jpg)

## NHM Run on Flaming Gorge Domain: NO Big Sandy
The NHM does not represent any reservoirs. From above, we'll assume the outflows of Big Sandy are on segment 44426. We'll see how the NHM represents flow at Big Sandy.
We can run pywatershed using the "legacy instantiation" as described in Notebook 02.

In [None]:
control = pws.Control.load_prms(control_file, warn_unused_options=False)
control.edit_n_time_steps(ndays_run)
parameter_file = domain_dir / control.options["parameter_file"]
params = pws.parameters.PrmsParameters.load(parameter_file)

In [None]:
# domain_dir = pl.Path("/Users/jmccreight/usgs/data/pynhm/fgr_2yr")
# # run just once to convert CBH/day forcing files to pywatershed, NetCDF format
# cbh_nc_dir = domain_dir
# cbh_files = [
#     domain_dir / "prcp_2yr.cbh",
#     domain_dir / "tmax_2yr.cbh",
#     domain_dir / "tmin_2yr.cbh",
# ]

# params = pws.parameters.PrmsParameters.load(domain_dir / "myparam.param")

# for cbh_file in cbh_files:
#     out_file = cbh_nc_dir / cbh_file.with_suffix(".nc").name
#     pws.utils.cbh_file_to_netcdf(cbh_file, params, out_file)

In [None]:
# We'll output the to-channel fluxes for use later when running FlowGraph as a post-process.
run_dir = nb_output_dir / "fgr_nhm"

control.options = control.options | {
    "input_dir": domain_dir,
    "budget_type": "error",
    "calc_method": "numba",
    "netcdf_output_dir": run_dir,
    "netcdf_output_var_names": [
        "seg_outflow",
        "sroff_vol",
        "ssres_flow_vol",
        "gwres_flow_vol",
    ],
}

nhm_processes = [
    pws.PRMSSolarGeometry,
    pws.PRMSAtmosphere,
    pws.PRMSCanopy,
    pws.PRMSSnow,
    pws.PRMSRunoff,
    pws.PRMSSoilzone,
    pws.PRMSGroundwater,
    pws.PRMSChannel,
]

In [None]:
%%time
if not run_dir.exists():
    # must delete the run dir to re-run
    run_dir.mkdir()
    nhm = pws.Model(
        nhm_processes,
        control=control,
        parameters=params,
    )
        nhm.run(finalize=True)  # finalize=True already finalizes the model

In [None]:
outflow = xr.open_dataarray(run_dir / "seg_outflow.nc").sel(nhm_seg=44426)

In [None]:
outflow.hvplot(width=plot_width, height=plot_height)

In [None]:
outflow_obs = (
    xr.open_dataarray(run_dir / "seg_outflow.nc")
    .sel(nhm_seg=44438)
    .rename("modeled")
    .to_dataframe()["modeled"]
)
obs_all = pyPRMS.DataFile(domain_dir / "sf_data").data_by_variable("runoff")
wh_poi_obs = np.where(params.parameters["poi_gage_segment"] == 184)
gage_id = params.parameters["poi_gage_id"][wh_poi_obs][0]
obs = obs_all[f"runoff_{gage_id}"]
obs.rename("gage " + obs.name, inplace=True)

outflow_obs.hvplot() * obs[0 : (365 * 2)].hvplot()

## FlowGraph in Model: NHM with a STARFIT representation of Big Sandy

Because FlowGraph is not part of PRMS, we can't run FlowGraph with PRMS/NHM using the legacy instantiation (e.g., notebook 02). We have to use a multi-process model, the pywatershed way (e.g., notebook 01). The next three cells build the multi-process model above the FlowGraph. We then use a helper function to insert the STARFIT reservoir into the PRMS/NHM Muskingum-Mann channel routing and append it to our multi-process model.

In [None]:
params_file_channel = domain_dir / "parameters_PRMSChannel.nc"
params_channel = pws.parameters.PrmsParameters.from_netcdf(params_file_channel)

dis_file = domain_dir / "parameters_dis_hru.nc"
dis_hru = pws.Parameters.from_netcdf(dis_file, encoding=False)

dis_both_file = domain_dir / "parameters_dis_both.nc"
dis_both = pws.Parameters.from_netcdf(dis_both_file, encoding=False)

In [None]:
control = pws.Control.load_prms(control_file, warn_unused_options=False)
control.edit_n_time_steps(ndays_run)
run_dir = nb_output_dir / "fgr_starfit"
control.options = control.options | {
    "input_dir": domain_dir,
    "budget_type": "error",
    "calc_method": "numba",
    "netcdf_output_dir": run_dir,
    "netcdf_output_var_names": [
        "node_outflows",
        "node_upstream_inflows",
        "node_storages",
    ],
}

In [None]:
nhm_processes = [
    pws.PRMSSolarGeometry,
    pws.PRMSAtmosphere,
    pws.PRMSCanopy,
    pws.PRMSSnow,
    pws.PRMSRunoff,
    pws.PRMSSoilzone,
    pws.PRMSGroundwater,  # stop here, we'll add PRMSChannel as part of FlowGraph later
]

model_dict = {
    "control": control,
    "dis_hru": dis_hru,
    "dis_both": dis_both,
    "model_order": [],
}

# As in notebook 01
for proc in nhm_processes:
    proc_name = proc.__name__
    proc_rename = "prms_" + proc_name[4:].lower()
    model_dict["model_order"] += [proc_rename]
    model_dict[proc_rename] = {}
    proc_dict = model_dict[proc_rename]
    proc_dict["class"] = proc
    proc_param_file = domain_dir / f"parameters_{proc_name}.nc"
    proc_dict["parameters"] = pws.Parameters.from_netcdf(proc_param_file)
    if proc_rename == "prms_channel":
        proc_dict["dis"] = "dis_both"
    else:
        proc_dict["dis"] = "dis_hru"

In [None]:
# what did that give us?
pprint(model_dict, sort_dicts=False)

Now we have a model dictionary describing everything above the `PRMSChannel` (Muskingum-Mann). We can use a helper function, `prms_channel_flow_graph_to_model_dict`, to add a `FlowGraph` to this model. The function takes the existing `model_dict`, the `PRMSChannel` information, plus additional user-supplied information, to construct a `FlowGraph` with a new `StarfitFlowNode` inserted in the `PRMSChannel` at the location above nhm segment 44426 (and below 44434 and 44435) to represent Big Sandy. This `FlowGraph` instance is added to the `model_dict` by name "prms_channel_flow_graph".

The function will also add an `InflowExchange` instance to the `model_dict` named "inflow_exchange", which manages getting the fluxes from PRMS to the FlowGraph. Zero lateral flows are supplied to the `StarfitFlowNode` for Big Sandy in this case (though we could do otherwise).
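
As a rough illustration of the exchange idea, the toy sketch below sums three to-channel flux components per existing segment and supplies zero lateral flow to each new node. The function `lateral_inflows` is hypothetical and the values are made up. It also assumes the fluxes are already expressed per segment and that new nodes come after the existing ones, so it should not be read as the actual `InflowExchange` logic.

```python
def lateral_inflows(sroff, ssres, gwres, n_new_nodes):
    """Sum the flux components per segment, then append zeros for the new nodes."""
    per_segment = [s + i + g for s, i, g in zip(sroff, ssres, gwres)]
    return per_segment + [0.0] * n_new_nodes


# Two existing segments and one new (reservoir) node that gets zero lateral flow.
inflows = lateral_inflows(
    sroff=[1.0, 2.0],
    ssres=[0.5, 0.5],
    gwres=[0.25, 0.25],
    n_new_nodes=1,
)
print(inflows)
```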

In [None]:
model_dict = pws.prms_channel_flow_graph_to_model_dict(
    model_dict=model_dict,
    prms_channel_dis=dis_both,
    prms_channel_dis_name="dis_both",
    prms_channel_params=params_channel,
    new_nodes_maker_dict={
        "starfit": pws.hydrology.starfit.StarfitFlowNodeMaker(
            None,
            sf_params,
            budget_type="error",
            compute_daily=True,
        )
    },
    new_nodes_maker_names=["starfit"],
    new_nodes_maker_indices=[0],
    new_nodes_flow_to_nhm_seg=[44426],
    graph_budget_type="warn",  # move to error
)

In [None]:
# The "inflow_exchange" and "prms_channel_flow_graph" have been added to the model
pprint(model_dict, sort_dicts=False)

In [None]:
%%time
if not run_dir.exists():
    run_dir.mkdir()
    model = pws.Model(model_dict)
    model.run()
    model.finalize()

In [None]:
model.processes["prms_channel_flow_graph"].budget

In [None]:
wh_44426 = np.where(params.parameters["nhm_seg"] == 44426)[0]
outflow_nodes = xr.open_dataarray(run_dir / "node_outflows.nc")[
    :, wh_44426
].drop_vars("node_coord")

In [None]:
xr.merge([outflow, outflow_nodes]).rename(
    {"seg_outflow": "NHM", "node_outflows": "STARFIT"}
).hvplot(
    width=plot_width,
    height=plot_height,
    ylabel="streamflow (cfs)",
)

In [None]:
storage_nodes = xr.open_dataarray(run_dir / "node_storages.nc")[
    :, -1
].drop_vars("node_coord")
storage_nodes.hvplot(width=plot_width, height=plot_height)

In [None]:
xr.merge([outflow, outflow_nodes, storage_nodes]).rename(
    {
        "seg_outflow": "NHM",
        "node_outflows": "Big Sandy Outflow",
        "node_storages": "Big Sandy Storage",
    }
).hvplot(
    width=plot_width,
    height=plot_height,
    ylabel="streamflow (cfs)\nstorage (million cubic feet)",
)

## FlowGraph as a post-process: Drive FlowGraph with STARFIT representation of Big Sandy and Pass-Through using NHM output files
Above we ran the full NHM with a `StarfitFlowNode` at Big Sandy. However, in the NHM configuration no two process representations are two-way coupled; see the [figure in the extended release notes](https://ec-usgs.github.io/pywatershed/assets/img/pywatershed_NHM_model_graph.png). (Note that some PRMS configurations in pywatershed can be two-way coupled between Runoff and Soilzone and/or Canopy and Snow.) In this case, `PRMSChannel` is one-way coupled to (forced by) the rest of the model. So we can take the outputs of the first NHM run above, which has no reservoir representation, and use them to drive just the `FlowGraph` from the second run. We might call running `FlowGraph` in this way a "post-process". If you were running the no-reservoir model and testing hypotheses about which FlowGraphs give better flow representations, this is the method you'd want to follow.
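
The reasoning behind the post-process can be shown with a small toy sketch: when the upstream fluxes never depend on the router, feeding them to the router step by step or replaying them from saved values gives identical results. Everything here (`upstream_flux`, `make_router`, the release rule) is hypothetical and stands in for the real model components.

```python
def upstream_flux(step):
    """Hypothetical to-channel flux; depends only on the time step, never on the router."""
    return 1.0 + (step % 3)


def make_router(release_fraction=0.5):
    """Return a toy router that stores inflow and releases a fraction of its storage."""
    storage = 0.0

    def advance(inflow):
        nonlocal storage
        storage += inflow
        release = release_fraction * storage
        storage -= release
        return release

    return advance


n_steps = 6

# Coupled run: fluxes are computed and routed within the same time loop.
router = make_router()
coupled = [router(upstream_flux(t)) for t in range(n_steps)]

# Post-process: fluxes are saved first, then replayed through a fresh router.
saved = [upstream_flux(t) for t in range(n_steps)]
router = make_router()
post = [router(q) for q in saved]

print(coupled == post)
```

If the router fed information back to the upstream model (two-way coupling), the saved fluxes would no longer be valid for a different router, and this shortcut would not apply.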

For this case we have a different helper function, `prms_channel_flow_graph_postprocess`, to which we supply most of the same information about the `FlowGraph`. Instead of passing an existing `model_dict` (as above), we tell it where to find its inputs on file.

For further illustration, we'll not only add the `StarfitFlowNode` for Big Sandy, we'll also demonstrate that we can add more nodes to the `FlowGraph` by putting an arbitrary `PassThroughNode` elsewhere on the domain. This node has no effect on the flows by design, but adding it here shows how additional nodes can easily be added to a `FlowGraph`.
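
Adding a second node changes where the reservoir sits along the node dimension of the output. Judging from the indexing used later in this notebook (the pass-through is described as the last node), new nodes appear to be appended after the existing segment nodes in the order they are listed. The sketch below mimics that ordering with plain lists; the segment names and the `node_order` function are hypothetical, and the ordering itself is an inference from this notebook rather than a documented guarantee.

```python
def node_order(existing, new_node_names):
    """Toy node ordering: existing segment nodes first, then new nodes as listed."""
    return list(existing) + list(new_node_names)


existing_nodes = ["seg_0", "seg_1", "seg_2"]  # hypothetical segment nodes

in_model = node_order(existing_nodes, ["starfit"])
post = node_order(existing_nodes, ["starfit", "pass_through"])

# The reservoir is last when it is the only new node, second to last otherwise.
print(in_model[-1], post[-2], post[-1])
```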

In [None]:
control = pws.Control.load_prms(control_file, warn_unused_options=False)
control.edit_n_time_steps(ndays_run)
run_dir = nb_output_dir / "fgr_starfit_post"
control.options = control.options | {
    "input_dir": domain_dir,
    "budget_type": "error",
    "calc_method": "numba",
    "netcdf_output_dir": run_dir,
    "netcdf_output_var_names": [
        "node_outflows",
        "node_upstream_inflows",
        "node_storages",
    ],
}

params_file_channel = domain_dir / "parameters_PRMSChannel.nc"
params_channel = pws.parameters.PrmsParameters.from_netcdf(params_file_channel)

if "dis_hru" in locals().keys():
    del dis_hru

dis_both_file = domain_dir / "parameters_dis_both.nc"
dis_both = pws.Parameters.from_netcdf(dis_both_file, encoding=False)

In [None]:
sfp_ds = sf_params.to_xr_ds().copy()
cap_mult = 1.5
sfp_ds["GRanD_CAP_MCM"] *= cap_mult
sf_params_new = pws.Parameters.from_ds(sfp_ds)

In [None]:
input_dir = nb_output_dir / "fgr_nhm"  # use the output of the NHM run

flow_graph = pws.prms_channel_flow_graph_postprocess(
    control=control,
    input_dir=input_dir,
    prms_channel_dis=dis_both,
    prms_channel_params=params_channel,
    new_nodes_maker_dict={
        "starfit": pws.hydrology.starfit.StarfitFlowNodeMaker(
            None,
            sf_params_new,
            compute_daily=True,  # match the in-model run so only capacity differs
            budget_type="error",
        ),
        "pass_through": pws.hydrology.pass_through_node.PassThroughNodeMaker(),
    },
    new_nodes_maker_names=["starfit", "pass_through"],
    new_nodes_maker_indices=[0, 0],  # relative to the individual NodeMakers
    new_nodes_flow_to_nhm_seg=[
        44426,
        44435,
    ],  # the second is a pass through above the first
)

In [None]:
%%time
if not run_dir.exists():
    run_dir.mkdir()
    flow_graph.initialize_netcdf()
    for istep in tqdm(range(control.n_times)):
        control.advance()
        flow_graph.advance()
        flow_graph.calculate(1.0)
        flow_graph.output()

    flow_graph.finalize()

In [None]:
wh_44426 = np.where(params.parameters["nhm_seg"] == 44426)[0]
outflow_nodes_post = (
    xr.open_dataarray(run_dir / "node_outflows.nc")[:, wh_44426]
    .drop_vars("node_coord")
    .rename("node_outflows_post")
)

In [None]:
xr.merge(
    [
        outflow,
        outflow_nodes,
        outflow_nodes_post,
    ]
).rename(
    {
        "seg_outflow": "NHM",
        "node_outflows": "STARFIT",
        "node_outflows_post": f"STARFIT CAP*{cap_mult}",
    }
).hvplot(width=plot_width, height=plot_height, ylabel="streamflow (cfs)")

In [None]:
storage_nodes_post = (
    xr.open_dataarray(run_dir / "node_storages.nc")[
        :, -2
    ]  # pass through is the last node this time
    .drop_vars("node_coord")
    .rename("node_storages_post")
)
xr.merge(
    [
        storage_nodes,
        storage_nodes_post,
    ]
).hvplot(
    width=plot_width,
    height=plot_height,
    ylabel="storage (million cubic feet)",
)

In [None]:
xr.merge(
    [
        outflow,
        outflow_nodes,
        storage_nodes,
        outflow_nodes_post,
        storage_nodes_post,
    ]
).rename(
    {
        "seg_outflow": "NHM",
        "node_outflows": "Big Sandy Outflow",
        "node_storages": "Big Sandy Storage",
        "node_outflows_post": f"Big Sandy Outflow CAP*{cap_mult}",
        "node_storages_post": f"Big Sandy Storage CAP*{cap_mult}",
    }
).hvplot(
    width=plot_width,
    height=plot_height,
    ylabel="streamflow (cfs)\nstorage (million cubic feet)",
)