# COMID Hierarchy

Exploration of using [HyRiver](https://docs.hyriver.io/) to get a list of `fromcomid` values for any `comid`.
Potentially an alternate approach for completing `geography/huc12_reach_analysis.ipynb`.

Building off `geography/HUC hierarchy.ipynb`.

Authors: Anthony Aufdenkampe



## Imports & Paths

In [2]:
from pathlib import Path

import numpy as np
import pandas as pd
import geopandas as gpd
import pyarrow as pa

import matplotlib.patches as mpatches
import matplotlib.pyplot as plt


import pynhd
# import pynhd as nhd
# from pynhd import NLDI, NHDPlusHR, WaterData

In [3]:
# Set your project directory to your local folder for your clone of this repository
project_path = Path.cwd().parent
project_path

PosixPath('/Users/aaufdenkampe/Documents/Python/pollution-assessment')

# Try HyRiver

Starting from this example:
https://docs.hyriver.io/examples/notebooks/nhdplus.html

In [30]:
nldi = pynhd.NLDI()  # instantiate the NLDI web service client
comid = 4651852

nldi.navigate_byid(
    fsource='comid',
    fid=comid,
    navigation="upstreamTributaries",
    source="flowlines",
    distance=1,  # maximum navigation distance, in km
)

Unnamed: 0,geometry,nhdplus_comid
0,"LINESTRING (-75.66779 39.92765, -75.66713 39.9...",4651852


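This returns only a single aggregated flowline, for the starting `comid` itself.
I can't seem to find a way to get upstream comids this way, although `distance=1` limits navigation to 1 km, which may explain the single result.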

# Try USGS ScienceBase

## Latest USGS NHDPlus v2.1 on ScienceBase
Download from https://www.sciencebase.gov/catalog/item/63cb311ed34e06fef14f40a3


In [12]:
# From https://www.sciencebase.gov/catalog/item/63cb311ed34e06fef14f40a3
file = project_path / 'geography/enhd_nhdplusatts.parquet'

enhd_nhdplusatts = pd.read_parquet(
    file,
    # dtype_backend='pyarrow',
)

In [13]:
enhd_nhdplusatts.info()
enhd_nhdplusatts

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2667754 entries, 0 to 2667753
Data columns (total 32 columns):
 #   Column      Dtype   
---  ------      -----   
 0   comid       float64 
 1   tocomid     float64 
 2   fcode       float64 
 3   lengthkm    float64 
 4   reachcode   object  
 5   frommeas    float64 
 6   tomeas      float64 
 7   areasqkm    float64 
 8   arbolatesu  float64 
 9   terminalpa  float64 
 10  hydroseq    float64 
 11  levelpathi  float64 
 12  pathlength  float64 
 13  dnlevelpat  float64 
 14  dnhydroseq  float64 
 15  totdasqkm   float64 
 16  terminalfl  float64 
 17  streamleve  float64 
 18  streamorde  float64 
 19  vpuin       float64 
 20  vpuout      float64 
 21  wbareatype  category
 22  slope       float64 
 23  slopelenkm  float64 
 24  ftype       object  
 25  gnis_name   object  
 26  gnis_id     object  
 27  wbareacomi  int32   
 28  hwnodesqkm  float64 
 29  rpuid       object  
 30  vpuid       object  
 31  roughness   float64 
dty

Unnamed: 0,comid,tocomid,fcode,lengthkm,reachcode,frommeas,tomeas,areasqkm,arbolatesu,terminalpa,...,slope,slopelenkm,ftype,gnis_name,gnis_id,wbareacomi,hwnodesqkm,rpuid,vpuid,roughness
0,8318793.0,8318791.0,46003.0,1.295,18010102000885,0.0,100.0,1.3104,9.382,2071985.0,...,0.060834,1.295,StreamRiver,Bear Creek,218814,0,,18c,18,0.1821
1,8318787.0,8318793.0,46003.0,1.323,18010102000886,0.0,100.0,1.8486,5.717,2071985.0,...,0.095344,1.323,StreamRiver,Bear Creek,218814,0,,18c,18,0.2622
2,8318775.0,8318787.0,46003.0,2.883,18010102000887,0.0,100.0,4.0950,2.883,2071985.0,...,0.112738,2.732,StreamRiver,Bear Creek,218814,0,0.3735,18c,18,0.2064
3,8318785.0,8318793.0,46003.0,2.370,18010102000888,0.0,100.0,2.2689,2.370,2071985.0,...,0.151423,2.220,StreamRiver,,,0,0.1935,18c,18,0.3226
4,8318789.0,8318787.0,46003.0,1.511,18010102000889,0.0,100.0,1.1691,1.511,2071985.0,...,0.162461,1.361,StreamRiver,,,0,0.0711,18c,18,0.4175
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2667749,210053.0,206879.0,55800.0,0.378,12110208016875,0.0,100.0,0.8154,25.051,1771890.0,...,0.002011,0.378,ArtificialPath,,,205451,,12d,12,0.1003
2667750,210055.0,210053.0,55800.0,0.058,12110208016876,0.0,100.0,0.0009,4.122,1771890.0,...,0.000010,0.058,ArtificialPath,,,205451,,12d,12,0.1122
2667751,210057.0,210053.0,55800.0,0.033,12110208016877,0.0,100.0,0.0135,20.551,1771890.0,...,0.000010,0.033,ArtificialPath,,,205451,,12d,12,0.1245
2667752,942110021.0,33771791.0,46006.0,0.221,12110208017252,0.0,100.0,0.0621,0.221,2608416.0,...,0.000010,0.071,StreamRiver,,,0,,12d,12,0.1437


In [16]:
enhd_nhdplusatts.columns

Index(['comid', 'tocomid', 'fcode', 'lengthkm', 'reachcode', 'frommeas',
       'tomeas', 'areasqkm', 'arbolatesu', 'terminalpa', 'hydroseq',
       'levelpathi', 'pathlength', 'dnlevelpat', 'dnhydroseq', 'totdasqkm',
       'terminalfl', 'streamleve', 'streamorde', 'vpuin', 'vpuout',
       'wbareatype', 'slope', 'slopelenkm', 'ftype', 'gnis_name', 'gnis_id',
       'wbareacomi', 'hwnodesqkm', 'rpuid', 'vpuid', 'roughness'],
      dtype='object')

In [22]:
# Set types
columns_to_int = ['comid','tocomid', 'fcode']
enhd_nhdplusatts[columns_to_int] = enhd_nhdplusatts[columns_to_int].astype(pd.Int64Dtype())

columns_to_cat = ['reachcode', 'ftype', 'gnis_name', 'gnis_id', 'rpuid', 'vpuid', 'roughness']
enhd_nhdplusatts[columns_to_cat] = enhd_nhdplusatts[columns_to_cat].astype(pd.CategoricalDtype())

# Set index
enhd_nhdplusatts.sort_values('comid', inplace=True)
enhd_nhdplusatts.set_index('comid', inplace=True)
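The cast to `pd.Int64Dtype()` matters because the parquet file stores the ID columns as `float64`. The sketch below uses a toy frame with made-up IDs (not values from the dataset) to show how the nullable integer type displays IDs as integers while still allowing missing values. Whether the real `tocomid` column contains any missing values is not shown above, so the `NaN` here is an assumption for illustration only.

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical IDs, stored as floats like the raw parquet columns
toy = pd.DataFrame({
    "comid": [101.0, 102.0, 103.0],
    "tocomid": [102.0, 103.0, np.nan],  # assumed: a reach with no downstream ID
})

# Nullable Int64 keeps the missing value as <NA> instead of forcing floats
toy = toy.astype({"comid": "Int64", "tocomid": "Int64"})

print(toy.dtypes)
print(toy)
```

A plain `int64` cast would raise on the missing value, which is why the nullable type is the safer choice for ID columns read in as floats.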

In [23]:
enhd_nhdplusatts.info()

<class 'pandas.core.frame.DataFrame'>
Index: 2667754 entries, 101 to 948100740
Data columns (total 31 columns):
 #   Column      Dtype   
---  ------      -----   
 0   tocomid     Int64   
 1   fcode       Int64   
 2   lengthkm    float64 
 3   reachcode   category
 4   frommeas    float64 
 5   tomeas      float64 
 6   areasqkm    float64 
 7   arbolatesu  float64 
 8   terminalpa  float64 
 9   hydroseq    float64 
 10  levelpathi  float64 
 11  pathlength  float64 
 12  dnlevelpat  float64 
 13  dnhydroseq  float64 
 14  totdasqkm   float64 
 15  terminalfl  float64 
 16  streamleve  float64 
 17  streamorde  float64 
 18  vpuin       float64 
 19  vpuout      float64 
 20  wbareatype  category
 21  slope       float64 
 22  slopelenkm  float64 
 23  ftype       category
 24  gnis_name   category
 25  gnis_id     category
 26  wbareacomi  int32   
 27  hwnodesqkm  float64 
 28  rpuid       category
 29  vpuid       category
 30  roughness   category
dtypes: Int64(2), category(8),
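
With `comid` as the index and `tocomid` as a column, the `fromcomid` list for any reach can in principle be derived by inverting the downstream pointer. The following is a minimal sketch on a toy network with made-up IDs. The helper name `get_fromcomids` and the use of `0` as a "no downstream reach" marker are assumptions for illustration, not something confirmed by the dataset.

```python
import pandas as pd

# Toy network with hypothetical IDs: 1 and 2 flow into 3, 3 flows into 4,
# and 4 is the outlet (tocomid 0 is an assumed "no downstream" marker)
toy = pd.DataFrame(
    {"tocomid": [3, 3, 4, 0]},
    index=pd.Index([1, 2, 3, 4], name="comid"),
)

# Invert the downstream pointer: group comids by the reach they drain to
fromcomids = (
    toy.reset_index()
    .groupby("tocomid")["comid"]
    .apply(list)
)

def get_fromcomids(comid):
    """Return the comids immediately upstream of `comid` (empty if none)."""
    return fromcomids.get(comid, [])

print(get_fromcomids(3))
print(get_fromcomids(1))
```

The same `groupby` could be run on `enhd_nhdplusatts` after the index is set, although at roughly 2.7 million rows it would be worth checking memory use first.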

## Add to DRWI Reaches

In [25]:
# read geometry data from DRWI GeoParquet files
reach_gdf = gpd.read_parquet(project_path / 'geography/reach_gdf.parquet')

In [26]:
reach_gdf.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Index: 19496 entries, 1748535 to 932040370
Data columns (total 17 columns):
 #   Column              Non-Null Count  Dtype   
---  ------              --------------  -----   
 0   catchment_hectares  19496 non-null  float64 
 1   watershed_hectares  19496 non-null  float64 
 2   maflowv             19496 non-null  float64 
 3   geometry            19494 non-null  geometry
 4   cluster             17358 non-null  category
 5   sub_focusarea       186 non-null    Int64   
 6   nord                18870 non-null  Int64   
 7   nordstop            18844 non-null  Int64   
 8   huc12               19496 non-null  category
 9   streamorder         19496 non-null  int64   
 10  headwater           19496 non-null  int64   
 11  phase               4082 non-null   category
 12  fa_name             4082 non-null   category
 13  in_drb              19496 non-null  boolean 
 14  huc08               19496 non-null  category
 15  huc10               194

In [27]:
reach_gdf['tocomid'] = enhd_nhdplusatts.tocomid
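
This assignment relies on pandas index alignment: both frames are indexed by `comid`, so values are matched by label and not by row position, and any reach whose `comid` is absent from the attribute table receives a missing value. A small sketch with made-up frames (the names `atts` and `reaches` are hypothetical stand-ins) shows the behavior and a quick way to count unmatched reaches:

```python
import pandas as pd

# Hypothetical attribute table indexed by comid (stand-in for enhd_nhdplusatts)
atts = pd.DataFrame(
    {"tocomid": pd.array([20, 30, 40], dtype="Int64")},
    index=pd.Index([10, 20, 30], name="comid"),
)

# Hypothetical reach table in a different order, with one comid (99) not in atts
reaches = pd.DataFrame(
    {"streamorder": [2, 1, 1]},
    index=pd.Index([30, 10, 99], name="comid"),
)

# Assignment aligns on the index: values are matched by comid, not by position
reaches["tocomid"] = atts["tocomid"]

# Count reaches that found no match in the attribute table
n_unmatched = int(reaches["tocomid"].isna().sum())
print(reaches)
print(f"unmatched reaches: {n_unmatched}")
```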

In [28]:
reach_gdf.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Index: 19496 entries, 1748535 to 932040370
Data columns (total 18 columns):
 #   Column              Non-Null Count  Dtype   
---  ------              --------------  -----   
 0   catchment_hectares  19496 non-null  float64 
 1   watershed_hectares  19496 non-null  float64 
 2   maflowv             19496 non-null  float64 
 3   geometry            19494 non-null  geometry
 4   cluster             17358 non-null  category
 5   sub_focusarea       186 non-null    Int64   
 6   nord                18870 non-null  Int64   
 7   nordstop            18844 non-null  Int64   
 8   huc12               19496 non-null  category
 9   streamorder         19496 non-null  int64   
 10  headwater           19496 non-null  int64   
 11  phase               4082 non-null   category
 12  fa_name             4082 non-null   category
 13  in_drb              19496 non-null  boolean 
 14  huc08               19496 non-null  category
 15  huc10               194
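
Once `tocomid` is attached to each reach, the full set of upstream comids for any reach could be collected by walking the network. The sketch below does a breadth-first search on a toy network with made-up IDs. The function name `all_upstream` and the `0` outlet marker are assumptions for illustration, and the approach is one possible option, not necessarily how `huc12_reach_analysis.ipynb` does it.

```python
from collections import defaultdict, deque

import pandas as pd

# Toy network with hypothetical IDs: 1 -> 3, 2 -> 3, 3 -> 5, 4 -> 5, 5 is the outlet
toy = pd.DataFrame(
    {"tocomid": [3, 3, 5, 5, 0]},  # 0 is an assumed "no downstream reach" marker
    index=pd.Index([1, 2, 3, 4, 5], name="comid"),
)

# Build a lookup of immediate upstream neighbours for each comid
upstream_of = defaultdict(list)
for comid, tocomid in toy["tocomid"].items():
    upstream_of[tocomid].append(comid)

def all_upstream(comid):
    """Collect every comid upstream of `comid` with a breadth-first search."""
    found = []
    seen = {comid}  # avoids revisiting a reach if the network has repeats
    queue = deque([comid])
    while queue:
        current = queue.popleft()
        for parent in upstream_of.get(current, []):
            if parent not in seen:
                seen.add(parent)
                found.append(parent)
                queue.append(parent)
    return found

print(all_upstream(5))
print(all_upstream(3))
```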

In [None]:
# Save the updated reaches (now with `tocomid`) back to the same GeoParquet file
reach_gdf.to_parquet(
    project_path / 'geography/reach_gdf.parquet',
    engine='pyarrow'
)