# Subset NGEN HydroFabric on S3

**Authors:**  
   - Tony Castronova <acastronova@cuahsi.org>    
   - Irene Garousi-Nejad <igarousi@cuahsi.org>  
    
**Last Updated:** 06.19.2024   

**Description**:  

The purpose of this Jupyter Notebook is to demonstrate the process of preparing inputs required to execute the [NOAA Next Generation (NextGen) Water Resource Modeling Framework](https://github.com/NOAA-OWP/ngen). These inputs consist of the following components: 

- Hydrologic and hydrodynamic graphs based on the National Hydrologic Geospatial Fabric (Hydrofabric) data, which includes catchments, nexuses, and flowlines. 
- Model domain parameters represented as configuration files.
- Meteorological forcing data.

The Hydrofabric data can be accessed publicly through the AWS catalog. In this notebook, we use the **pre-release** version of the dataset, which was the most recent version available on the Amazon S3 Bucket when this notebook was developed (https://nextgen-hydrofabric.s3.amazonaws.com/index.html#pre-release/). The configuration files encompass model default parameters, formulations, input and output paths, simulation time step, initial conditions, and other relevant settings. This example demonstrates the retrieval of hydrofabric data, followed by the extraction of the information necessary for creating the parameter configuration file. These files are created for running the Conceptual Functional Equivalent (CFE) model and the Simple Logical Tautology Handler (SLoTH) in the NGEN framework. To prepare forcing data, run the *ngen-hydrofabric-subset.ipynb* Jupyter Notebook.

**Software Requirements**:  

The software versions used to develop this notebook are listed below. To avoid version conflicts among Python packages, we recommend creating a new environment and installing the required packages specifically for this notebook.


> boto3: 1.26.76  
  dask-core: 2023.4.0  
  fiona: 1.9.3  
  fsspec: 2023.4.0  
  geopandas: 0.12.2   
  ipyleaflet: 0.17.2  
  ipywidgets: 7.7.5   
  matplotlib: 3.7.1   
  netcdf4: 1.6.3   
  numpy: 1.24.2  
  pandas: 2.0.0  
  requests: 2.28.2  
  s3fs: 2023.4.0  
  scipy: 1.10.1  
  xarray: 2023.4.1
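
To see how an existing environment compares with the versions listed above, a small check along the following lines may help. This is a minimal sketch using only the standard library; the helper name `check_versions` is hypothetical, and distribution names can differ between conda and pip (for example, `dask-core` is a conda package name), so adjust the keys to match how the packages were installed.

```python
from importlib import metadata


def check_versions(expected):
    """Compare pinned versions against what is installed.

    Returns {name: (expected_version, installed_version_or_None)}.
    """
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report


# a few of the pins listed above; extend as needed
pinned = {"geopandas": "0.12.2", "fiona": "1.9.3", "ipyleaflet": "0.17.2"}

for name, (wanted, installed) in check_versions(pinned).items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: expected {wanted}, found {installed} ({status})")
```

A mismatch does not necessarily mean the notebook will fail; it only flags where the environment departs from the one used during development.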
  
**Supplementary Code**

This notebook relies on the following external scripts:  
- `subset.py` - A script originally written by Nels Frazier to subset the NGen Hydrofabric

---

In [5]:
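import os
import time
import fiona
import pandas
import subset
import pyproj
import shapely  # shapely.Point (used below) requires Shapely 2.0 or later
import datetime
import geopandas
import ipyleaflet
import geopandas as gpd
from pathlib import Path
from sidecar import Sidecar
from requests import Request
from ipywidgets import Layout
from shapely.geometry import box  # used below to compute the map center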

# from helpers import SideCarMap

## 1. Create a map and load the Hydrofabric VPU geometries

The data that we'll be using in this notebook are located inside Vector Processing Unit (VPU) 16. This area of the Hydrofabric covers the Great Basin. See the [NextGen Hydrofabric](https://mikejohnson51.github.io/hyAggregate/) help pages for more information regarding these data. A geopackage file may consist of many layers. Use `Fiona` to view the vector layers that are included within the `nextgen_16` geopackage.

In [6]:
# list the layers included in the nextgen geopackage for VPU 16.
fiona.listlayers('sample-data/nextgen_16.gpkg')

['hydrolocations',
 'nexus',
 'flowpaths',
 'lakes',
 'divides',
 'network',
 'flowpath_attributes',
 'layer_styles']
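
Because a GeoPackage is a SQLite database, the same layer list can also be read from its `gpkg_contents` table, which additionally reports whether each entry holds spatial features or plain attributes. The following is a minimal sketch using only the standard library; the helper name `list_gpkg_layers` is hypothetical and is not part of Fiona or of this notebook's supplementary code.

```python
import os
import sqlite3


def list_gpkg_layers(path):
    """Return (table_name, data_type) pairs registered in a GeoPackage."""
    con = sqlite3.connect(path)
    try:
        rows = con.execute(
            "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
        ).fetchall()
    finally:
        con.close()
    return rows


gpkg = "sample-data/nextgen_16.gpkg"  # same path as in the cell above
if os.path.exists(gpkg):
    for table_name, data_type in list_gpkg_layers(gpkg):
        print(f"{table_name}: {data_type}")
```

The existence check matters here because `sqlite3.connect` silently creates an empty file when the path does not exist.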

Our workflow will require the ID of the outlet catchment, so let's load the 'divides' layer. This layer contains over 30,000 features so we will not attempt to display it in the notebook. For reference, these data cover the following area:

<div>
<img src="img/vpu-16.png" width="500"/>
</div>



Load these data into `Geopandas` and convert them into the coordinate reference system we will be using in our leaflet map (EPSG: 4326).

In [7]:
gdf = geopandas.read_file('sample-data/nextgen_16.gpkg', layer='divides')
gdf = gdf.to_crs(epsg=4326)

Create an interactive map interface for us to select our area of interest. This map will contain USGS river gauges and NHD+ reach geometries to help us select our area of interest. 

In [11]:
class SideCarMap():
    def __init__(self, basemap=ipyleaflet.basemaps.OpenStreetMap.Mapnik, gdf=None, plot_gdf=False, name='Map'):
        self.selected_id = None
        self.selected_layer = None
        self.map = None
        self.basemap = basemap
        self.gdf = gdf
        self.plot_gdf = plot_gdf
        self.name = name

    def display_map(self):
        defaultLayout=Layout(width='960px', height='940px')

        self.map = ipyleaflet.Map(
            # use the basemap passed to the constructor
            basemap=ipyleaflet.basemap_to_tiles(self.basemap),
            # the layout sizes the map widget itself, not the tile layer
            layout=defaultLayout,
            center=(45.9163, -94.8593),
            zoom=9,
            scroll_wheel_zoom=True,
            tap=False
            )
        
        
        # add USGS Gages
        self.map.add_layer(
            ipyleaflet.WMSLayer(
                url='http://arcgis.cuahsi.org/arcgis/services/NHD/usgs_gages/MapServer/WmsServer',
                layers='0',
                transparent=True,
                format='image/png',
                min_zoom=8,
                max_zoom=18,
                )
        )
        
        # add NHD+ Reaches
        self.map.add_layer(
            ipyleaflet.WMSLayer(
                url='https://hydro.nationalmap.gov/arcgis/services/nhd/MapServer/WMSServer',
                layers='6',
                transparent=True,
                format='image/png',
                min_zoom=8,
                max_zoom=18,
                )
        )

        # add features from geopandas if they are provided
        if self.gdf is not None:

            # update the map center point
            polygon = box(*self.gdf.total_bounds)
            approx_center = (polygon.centroid.y, polygon.centroid.x)
            self.map.center = approx_center

            # bind the map handler function
            self.map.on_interaction(self.handle_map_interaction)

            if self.plot_gdf:
                print('Loading GDF Features...', end='')
                st = time.time()
                geo_data = ipyleaflet.GeoData(geo_dataframe = self.gdf,
                       style={'color': 'blue', 'opacity':0.5, 'weight':1.9,}
                      )
                self.map.add(geo_data)

                print(f'{time.time() - st:0.2f} sec')

        sc = Sidecar(title=self.name)
        with sc:
            display(self.map)
        
    def handle_map_interaction(self, **kwargs):
    
        if kwargs.get('type') == 'click':
            print(kwargs)
            lat, lon = kwargs['coordinates']
            print(f'{lat}, {lon}')
            
            # query the reach nearest this point
            point = shapely.Point(lon, lat)

            # buffer the selected point by a small degree. This
            # is a hack for now and Buffer operations should only
            # be applied in a projected coordinate system in the future.
            print('buffering')
            pt_buf = point.buffer(0.001) 

            try:
                # remove the previously selected layers
                if self.selected_layer is not None:
                    self.map.remove(self.selected_layer)
                # while len(self.map.layers) > 3:
                #     self.map.remove(self.map.layers[-1]);
                
                # query the divide that intersects with the buffered point
                print('intersecting...')
                mask = self.gdf.intersects(pt_buf)
                print(f'found {len(self.gdf.loc[mask])} reaches')
                print('saving selected...')
                self.selected(value=self.gdf.loc[mask].iloc[0])

                # highlight this layer on the map
                wlayer = ipyleaflet.WKTLayer(
                    wkt_string=self.selected().geometry.wkt,
                    style={'color': 'green', 'opacity':1, 'weight':2.,})
                self.map.add(wlayer)
                self.selected_layer = self.map.layers[-1]
                
            except Exception: 
                print('Could not find reach for selected area')

    # getter/setter for the selected reach
    def selected(self, value=None):
        if value is None:
            if self.selected_id is None:
                print('No reach is selected.\nUse the map interface to select a reach of interest')
            else:
                return self.selected_id
        else:
            self.selected_id = value
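
The click handler above buffers the selected point by 0.001 degrees, and its comment notes that buffering in geographic coordinates is a stopgap. To get a feel for how large that buffer is on the ground, the following sketch converts a buffer in degrees to approximate meters. It assumes a spherical Earth with a mean radius of 6,371 km, and the helper name `degree_buffer_in_meters` is hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation


def degree_buffer_in_meters(buffer_deg, lat_deg):
    """Approximate ground size of a buffer given in degrees.

    Returns (north_south_m, east_west_m) at the given latitude.
    """
    meters_per_degree = math.pi / 180 * EARTH_RADIUS_M
    north_south = buffer_deg * meters_per_degree
    # meridians converge toward the poles, so a degree of longitude shrinks
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


# latitude taken roughly from the study area in VPU 16
ns, ew = degree_buffer_in_meters(0.001, 40.55)
print(f"0.001 degrees is about {ns:.0f} m north-south and {ew:.0f} m east-west")
```

The two values differ, which is why a buffer in degrees is not circular on the ground and why a projected coordinate system is preferable for this operation.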


Launch the interactive map interface for us to select our area of interest. We'll pass the `divides` geometries that were loaded above so we can query them when the map is clicked and retrieve their metadata. 

In [12]:
m = SideCarMap(gdf=gdf, name='VPU 16 Map')
m.display_map()

After selecting our outlet catchment, we can access its metadata in the notebook using the `m.selected()` command. We're interested in the `id` attribute of this feature. It should look something like: `wb-2853613`.

In [13]:
selected_catchment = m.selected()
selected_catchment

divide_id                                                      cat-2853639
toid                                                           nex-2853640
type                                                               network
ds_id                                                                  NaN
areasqkm                                                           10.9161
id                                                              wb-2853639
lengthkm                                                          5.111081
tot_drainage_areasqkm                                              71.1414
has_flowline                                                          True
geometry                 POLYGON ((-111.74018480488945 40.5554322243749...
Name: 790, dtype: object
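
The identifiers in this record share a numeric part and differ only by prefix (`cat-`, `wb-`, `nex-`). Before passing an ID to the subsetting step, it can be useful to confirm that it has the expected shape. The sketch below is an assumption-laden illustration: the helper name `split_feature_id` is hypothetical, and the pattern covers only the three prefixes visible in the output above, so other prefixes used elsewhere in the hydrofabric would be rejected.

```python
import re

# only the prefixes seen in the record above; an assumption, not a full list
ID_PATTERN = re.compile(r"^(cat|wb|nex)-(\d+)$")


def split_feature_id(feature_id):
    """Split an ID such as 'wb-2853639' into ('wb', 2853639)."""
    match = ID_PATTERN.match(feature_id)
    if match is None:
        raise ValueError(f"unexpected feature ID: {feature_id!r}")
    prefix, number = match.groups()
    return prefix, int(number)


print(split_feature_id("wb-2853639"))
```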

## 3. Subset hydrofabric data for the selected area

The following code uses the `id` of the catchment selected on the map (`selected_catchment`) and its VPU number to call the hydrofabric subsetting script (`subset.py`). The subsetting function, `subset_upstream`, traces the network in reverse: starting from the most downstream nexus linked to the chosen catchment, it identifies and selects all upstream divides, catchments, nexuses, and flowlines.
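
The idea behind upstream tracing can be illustrated with a small breadth-first search over `id` to `toid` links. This is a minimal sketch of the general technique, not the implementation in `subset.py`; the function name `trace_upstream` and the toy network are hypothetical.

```python
from collections import defaultdict, deque


def trace_upstream(edges, outlet):
    """Return every feature ID that drains to `outlet`, including the outlet.

    `edges` is an iterable of (id, toid) pairs, where each feature
    drains into the feature named by `toid`.
    """
    # invert the downstream links so we can walk against the flow
    upstream_of = defaultdict(list)
    for src, dst in edges:
        upstream_of[dst].append(src)

    found = {outlet}
    queue = deque([outlet])
    while queue:
        current = queue.popleft()
        for parent in upstream_of[current]:
            if parent not in found:
                found.add(parent)
                queue.append(parent)
    return found


# hypothetical toy network, not taken from the hydrofabric
edges = [
    ("wb-1", "nex-10"), ("wb-2", "nex-10"),
    ("nex-10", "wb-3"), ("wb-3", "nex-11"),
    ("wb-4", "nex-11"), ("nex-11", "wb-5"),
]
print(sorted(trace_upstream(edges, "wb-3")))
```

Features that join the network downstream of the chosen outlet (here `wb-4` and `wb-5`) are never visited, which is what keeps the subset limited to the contributing area.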

In [17]:
selected_catchment.id

'wb-2853639'

In [20]:

# avoid shadowing the built-in id() function
catchment_id = selected_catchment.id
vpu = 16

output_files = []
 
print(50*'-'+f'\nProcessing VPU {vpu}, {catchment_id} ')
st = time.time()
# build the hydrofabric_url
# the complete dataset can be found at: s3://lynker-spatial
hydrofabric_url = f's3://lynker-spatial/hydrofabric/v20.1/gpkg/nextgen_{vpu}.gpkg'
subset.subset_upstream(hydrofabric_url, catchment_id)

# move these files into a subdir to keep things orderly;
# append a counter if a directory with this name already exists
counter = 1
outpath = catchment_id
while os.path.exists(outpath):
    outpath = catchment_id + "_" + str(counter)
    counter += 1
os.mkdir(outpath)
for subdir in ['config', 'forcings', 'outputs']:
    os.mkdir(os.path.join(outpath, subdir))

for f in [f'{catchment_id}_upstream_subset.gpkg',
          'catchments.geojson',
          'crosswalk.json',
          'flowpath_edge_list.json',
          'flowpaths.geojson',
          'nexus.geojson',
          'cfe_noahowp_attributes.csv']:
    os.rename(f, os.path.join(outpath, 'config', f))
    
# output_files.append(f'{catchment_id}_upstream_subset.gpkg')
print(f'Output files located at: {outpath}')
print(f'Completed in {time.time() - st} seconds\n'+50*'-')    

outdir = Path(outpath)

--------------------------------------------------
Processing VPU 16, wb-2853639 
s3://lynker-spatial/hydrofabric/v20.1/gpkg/nextgen_16.gpkg
Building Graph Network




FileNotFoundError: s3://lynker-spatial/hydrofabric/v20.1/gpkg/nextgen_16_cfe_noahowp.parquet

## 4. Add the Subset Hydrofabric to the map


Execute the following code cell to add the subset catchments and rivers from the geopackage file as visual overlays on the map. We begin by reading the `divides` and `flowpaths` layers stored within the geopackage file. We then reproject these layers into the geographic (latitude/longitude) coordinate reference system that the map expects for overlay geometries. Next, we create a WKTLayer for each catchment and river in the subset. A WKTLayer represents a shape's geometry as a Well-Known Text (WKT) string, a compact text description of its coordinates.
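
For readers unfamiliar with WKT, the sketch below builds a WKT string for a simple polygon by hand. It is for illustration only: the helper name `polygon_wkt` and the square's coordinates are hypothetical, and in the notebook the string comes from each geometry's `.wkt` attribute instead.

```python
def polygon_wkt(ring):
    """Build a WKT POLYGON string from a list of (lon, lat) pairs."""
    if ring[0] != ring[-1]:
        ring = list(ring) + [ring[0]]  # WKT rings must be closed
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON (({coords}))"


# hypothetical square near the study area, for illustration only
square = [(-111.75, 40.55), (-111.74, 40.55), (-111.74, 40.56), (-111.75, 40.56)]
print(polygon_wkt(square))
```

Note that WKT lists coordinates as x then y, that is, longitude before latitude.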

In [None]:
# read the subset layers from the geopackage
subset_gpkg = f'{outdir}/config/{selected_catchment.id}_upstream_subset.gpkg'
catchments = geopandas.read_file(subset_gpkg, layer='divides')
rivers = geopandas.read_file(subset_gpkg, layer='flowpaths')

# ipyleaflet expects overlay geometries in geographic coordinates
# (EPSG:4326, WGS 84), the same CRS used for the divides loaded earlier
catchments = catchments.to_crs(epsg=4326)
rivers = rivers.to_crs(epsg=4326)

# add the catchments to the map
# (m is the SideCarMap wrapper; the ipyleaflet map is m.map)
for idx, shape in catchments.iterrows():
    wkt = ipyleaflet.WKTLayer(wkt_string=shape.geometry.wkt)
    wkt.style = {'color': 'green'}
    m.map.add_layer(wkt)
    
# add the rivers to the map
for idx, shape in rivers.iterrows():
    wkt = ipyleaflet.WKTLayer(wkt_string=shape.geometry.wkt)
    wkt.style = {'color': 'blue'}
    m.map.add_layer(wkt)