## 1. Introduction
This notebook demonstrates a simple but potentially very useful function for finding elements of the NWM (National Water Model) network by their GNIS name.

The scripts used to build the GNIS names table used here have not been maintained. We are working on updating them; in the meantime, any results from this tool should be considered only partially reproducible.

## 2. First steps
Import the basic libraries, set a root path, clone the repository if running on Colab, and add the framework directory to the module search path.

In [None]:
import sys
import os
import pathlib
import subprocess
import json
import xarray as xr
import pandas as pd

try:
    import google.colab

    ENV_IS_CL = True
    root = pathlib.Path("/content/t-route").resolve()
    subprocess.run(["git", "clone", "https://github.com/NOAA-OWP/t-route.git"])
except ImportError:
    # Not on Colab: assume the notebook sits one level below the repository root
    ENV_IS_CL = False
    root = pathlib.Path("..").resolve()
sys.path.append(os.path.join(root, "src", "python_framework_v02"))


## 3. Import named segments
### 3a. Import the raw .json file
This file is a dictionary organized as:
```
{"Name of River as Key Value":[list,of,nwm,link,ids,with,that,name],
 "Another Name River":[xxxx,yyyy,etc.]}
```
That structure minimizes duplication. However, it should be noted that it also means that every Sand Creek is listed under the same `"Sand Creek"` key and that care must be taken in interpreting the results.
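
To make the caveat concrete, here is a minimal sketch of one way to tell same-named rivers apart once downstream pointers (like those in the route-link table) are available. Everything here is hypothetical: the IDs are made up, and `toy_to` and `split_by_outlet` are illustrative names, not part of the repository.

```python
# Toy names table in the same shape as the JSON file; all IDs are made up.
toy_names = {
    "Sand Creek": [101, 102, 201],  # two unrelated creeks sharing one name
    "Clear Creek": [301, 302],
}

# Hypothetical downstream pointers for the toy segments
# (0 stands for "no downstream segment in this toy table").
toy_to = {101: 102, 102: 0, 201: 0, 301: 302, 302: 0}


def split_by_outlet(seg_ids, to):
    """Group same-named segments by the last same-named segment they drain to."""
    members = set(seg_ids)
    groups = {}
    for seg in seg_ids:
        outlet = seg
        # Follow downstream pointers while we stay inside the same-named set
        while to.get(outlet) in members:
            outlet = to[outlet]
        groups.setdefault(outlet, []).append(seg)
    return groups


sand_creeks = split_by_outlet(toy_names["Sand Creek"], toy_to)
print(sand_creeks)
```

In this toy case the three `"Sand Creek"` segments fall into two groups, one per creek. A real river whose named segments are interrupted by unnamed or differently named segments would also be split by this approach, so treat it as a rough aid rather than a definitive answer.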

In [None]:
json_names_path = os.path.join(
    root, r"test", "input", "json", r"nwm_reaches_conus_v21_wgnis_name.json",
)

with open(json_names_path, "r") as json_file:
    names = json.load(json_file)
    if names:
        print(f"success!! -- imported {len(names)} GNIS names from the table.")


### 3b. Convert the names into a dataframe keyed on segment id

In [None]:
ids_wnames = {}
for name, ids in names.items():
    for seg_id in ids:  # avoid shadowing the built-in `id`
        ids_wnames[seg_id] = name

ids_wnames_df = pd.DataFrame.from_dict(ids_wnames, orient="index", columns=["name"])


## 4. Import the NWM segments 
### 4a. Import the raw table (a.k.a. 'the Route/Link' file) as a dataframe
This is the same file used for driving the channel computations of the National Water Model. Other scripts in this repository will download it automatically for you in the background -- here, we're just a little more practical: if you don't have it, just do this: 
```
export ROUTELINK="RouteLink_NHDPLUS.nc"
wget -c https://www.nco.ncep.noaa.gov/pmb/codes/nwprod/nwm.v2.0.2/parm/domain/$ROUTELINK
nccopy -d1 -s $ROUTELINK ${ROUTELINK/\.nc/_compressed.nc}
```
Then copy the resulting file into the `geo_file_path` given below.

In [None]:
geo_file_path = os.path.join(root, "test", "input", "geo", "Channels",)

full_nc_path = os.path.join(geo_file_path, "RouteLink_NHDPLUS.nwm.v2.0.2.nc",)

key_col = "link"

rows = None
rows = (xr.open_dataset(full_nc_path)).to_dataframe()
rows = rows.set_index([key_col])
if rows is not None:
    print(f"success!! -- created dataframe with {len(rows)} nwm segments.")


### 4b. Build a graph representation of the segment table
This takes a second. First we pick out the two columns we want in the table, `to` and `Length`; then we crush the subset dataframe into a dictionary indexed by segment, with sub-dictionary elements giving the length and downstream connection, like this:
```
{ Segment ID: {'to': downstream_segment_id, 'Length': segment_length},
    12582913: {'to': 12582911, 'Length': 1203.0},
    20971522: {'to': 20971524, 'Length': 16.0},
    ...
}
```
We can leave the proof for later, but this is a directed acyclic graph (DAG) representation of the stream network that we can use to figure out where the network goes. In the broader repository, we have a whole pile of tools related to this concept, and perhaps we can later come back here and make sure they are all in sync. For now, this is sufficient to mention as a teaser.
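
As a small illustration of "figuring out where the network goes", the sketch below walks downstream through a dictionary of this shape and accumulates length. The IDs and lengths are invented, `trace_downstream` is a hypothetical helper, and the walk assumes that a terminal segment points to an ID that is not a key in the dictionary.

```python
# Toy connections dictionary in the same shape as the one described above.
toy_connections = {
    1: {"to": 2, "Length": 100.0},
    2: {"to": 4, "Length": 250.0},
    3: {"to": 4, "Length": 80.0},
    4: {"to": 0, "Length": 500.0},  # 0 is not a key, so the walk stops here
}


def trace_downstream(start, connections, downstream_col="to", length_col="Length"):
    """Follow downstream pointers from `start`; return the path and its total length."""
    path, total = [], 0.0
    seg = start
    while seg in connections:
        path.append(seg)
        total += connections[seg][length_col]
        seg = connections[seg][downstream_col]
    return path, total


path, total_length = trace_downstream(1, toy_connections)
print(path, total_length)
```

Because the graph is acyclic, the loop always terminates; on a table with a bad (cyclic) pointer it would not, so a visited-set check would be a sensible addition for real data.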

In [None]:
# NOTE: This is essentially the same thing provided by `get_down_connections` in networkbuilder
# TODO: Update networkbuilder to use a dataframe as is done here.
downstream_col = "to"
length_col = "Length"

mask_set = set(rows.index.values)
rows_dl = rows[
    [downstream_col, length_col]
].copy()  # Reduce the dataframe to just the needed keys
# Index with a list: newer pandas versions reject a set as an indexer
connections = (rows_dl.loc[list(mask_set)]).to_dict("index")
if len(connections) == len(rows):
    print("success!! -- created dictionary DAG for nwm segments.")


## 5. Now, to process
### 5a. Let's remind ourselves where we are headed
_Our Goal?_
  * Find named segments tributary to a particular named river

_The general process, i.e., pseudocode?_
  * Make a set of named segments for the river
  * Make a set of connections pointing to the named segments
  * Drop from that set connections that are in the river itself (i.e., segments in the river that point to the next downstream segment in the river)
  * Optionally, drop from that set connections that don't have a name themselves
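
The steps above can be sketched on a toy network with plain set operations. The IDs and names below are invented for illustration, and the `toy_` variables are hypothetical stand-ins for the real tables.

```python
# Toy network: segments 10 -> 11 -> 12 form the main stem.
toy_connections = {
    10: {"to": 11},  # main stem
    11: {"to": 12},  # main stem
    12: {"to": 0},   # main stem outlet
    20: {"to": 11},  # named tributary joining the main stem
    30: {"to": 12},  # unnamed tributary joining the main stem
    40: {"to": 20},  # upstream of the named tributary, not a direct tributary
}
toy_names = {"Big River": [10, 11, 12], "Little Creek": [20, 40]}

# 1. Set of named segments for the river
toy_river = set(toy_names["Big River"])

# 2. Set of segments whose downstream pointer lands on the river
toy_pointing_in = {
    seg for seg, conn in toy_connections.items() if conn["to"] in toy_river
}

# 3. Drop the river's own segments, leaving only tributary connections
toy_tribs = toy_pointing_in - toy_river

# 4. Optionally keep only tributaries that have a name of their own
toy_named = {seg for ids in toy_names.values() for seg in ids}
toy_named_tribs = toy_tribs & toy_named

print(toy_tribs, toy_named_tribs)
```

Segment 40 drains to the named tributary rather than to the river itself, so it is correctly left out of the result.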

Before we continue, 
### 5b. We should review what we have


In [None]:
print(f"Using names table: {json_names_path}")
print(f"Number of names in names table: {len(names)}")
print(f"Number of unique nwm segments with names in table: {len(ids_wnames)}")

print(f"Using nwm route-link file: {full_nc_path}")
print(f"Total number of nwm segments: {len(rows)}")


### 5c. Choose a River

In [None]:
name_to_find = "Cache la Poudre River"
# name_to_find = "Mississippi River"
# name_to_find = "Missouri River"
try:
    named_segs = names[name_to_find]
except KeyError:
    print(f"{name_to_find} not found in the names table")
    raise ValueError(f"{name_to_find} not found in the names table")

print(
    f"Found {len(named_segs)} segments in GNIS Names table with the name '{name_to_find}'"
)


### 5d. Isolate set of nwm segments pointing to the chosen river
(We'll remove from the set all of the self-referential segments.)

In [None]:
print(f"Searching for tributary connections of the {name_to_find}")
to_named_segs = set()
named_segs_set = set(named_segs)  # set membership is much faster than scanning a list
for k, v in connections.items():
    if v[downstream_col] in named_segs_set:
        to_named_segs.add(k)
print(
    f"Found {len(to_named_segs)} connections pointing to the selected segments of the {name_to_find}"
)
tribs = to_named_segs - set(named_segs)
print(
    f"... and of those, {len(tribs)} are tributary connections, that is, other rivers flowing into the {name_to_find}"
)
print(f"(The rest, presumably, were part of the {name_to_find} itself.)")


### 5e. List the tributaries we've found, by ID and by name, if there is one

In [None]:
print(
    f"Found the following segments of the NWM that are tributary to the {name_to_find}"
)
print(tribs)


In [None]:
named_tribs = tribs & set(ids_wnames_df.index)
print(f"Of the {len(tribs)} tributary segments, {len(named_tribs)} have names.")


In [None]:
print(
    f"The nwm segments that are the downstream-most portion of named rivers directly tributary to the {name_to_find} are:"
)
print(named_tribs)


In [None]:
print("...or listed by name")
named_tribs_wnames = {trib: ids_wnames[trib] for trib in named_tribs}
print(named_tribs_wnames)
