<h1 style="
    font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
    font-size: 36px;
    color: #2c3e50;
    background-color: #ecf0f1;
    padding: 20px;
    border-radius: 12px;
    text-align: center;
    box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.1);">
    NextGen Data Preparation
</h1>

**Authors:** 

<ul style="line-height:1.5;">
<li>Ayman Nassar <a href="mailto:ayman.nassar@usu.edu">(ayman.nassar@usu.edu)</a></li>
<li>David Tarboton <a href="mailto:david.tarboton@usu.edu">(david.tarboton@usu.edu)</a></li>
<li>Furqan Baig <a href="mailto:fbaig@illinois.edu">(fbaig@illinois.edu)</a></li>
</ul>

**Last Updated:** 4/22/2025

**Purpose:**

This notebook simplifies the preparation of input data for running the NextGen water model. It helps researchers subset and retrieve the data the model requires. For more information on NextGen data preparation, see the [**ngiab-data-preprocess package page**](https://pypi.org/project/ngiab-data-preprocess/).

**Audience:**

Researchers who are familiar with Jupyter Notebooks, basic Python, and basic hydrologic data analysis.

**Description:**

This notebook accepts a gage ID, catchment ID, or Vector Processing Unit (VPU) code to define the location of interest, along with start and end dates for the desired time period. It uses geometry and model attributes from the [**v2.2 hydrofabric**](https://lynker-spatial.s3-us-west-2.amazonaws.com/hydrofabric/v2.2/conus/conus_nextgen.gpkg) and AORC forcing data at a 1-km grid resolution. The forcings are computed as a weighted mean of the gridded AORC data, with weights derived using the `Exact Extract` tool and the computation done in `NumPy`.

**Data Description:**

This Jupyter notebook automatically prepares a complete run package for NextGen simulations by extracting geometry and model attributes from [**Hydrofabric version 2.2**](https://lynker-spatial.s3-us-west-2.amazonaws.com/hydrofabric/v2.2/conus/conus_nextgen.gpkg), with detailed documentation available [**here**](https://lynker-spatial.s3-us-west-2.amazonaws.com/hydrofabric/v2.2/hfv2.2-data_model.html). It utilizes AORC meteorological forcing data at a 1-km grid resolution and generates all configuration files required for simulation, including the NextGen model configuration (`realization.json`), routing configuration (`troute.yaml`), and model parameter files for each catchment.

**Software Requirements:**

This notebook requires the `hydrofabric_visualization_utils.py` module to visualize the hydrofabric dataset.

### 1. Prepare the Python Environment

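Import the **hydrofabric_visualization_utils** module, which provides tools to visualize hydrofabric datasets, particularly `divides`, `flowpaths`, and `nexus`. The import is commented out in the cell below; uncomment it to enable the interactive map in Section 3.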

In [None]:
#from hydrofabric_visualization_utils import display_hydrofabric_map

### 2. Set Inputs

This section allows users to define key variables required for NextGen data preprocessing, including the hydrofabric ID (to specify the spatial domain of interest) and the start and end dates.

In [1]:
# Define your Hydrofabric ID
hydrofabric_id = "gage-10109001"  # This can be a catchment ID (e.g., 'cat-7080'), a gage ID (e.g., 'gage-10154200'), 
                             # or a Vector Processing Unit (VPU) code (e.g., '01')

# Start date
start_date = "2021-10-01"    # Specify the start date in the format 'YYYY-MM-DD'

# End date
end_date = "2022-09-30"      # Specify the end date in the format 'YYYY-MM-DD'
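
Before launching the longer steps, it can help to confirm that the two date strings parse and are in the right order. The helper below is a minimal sketch; `check_period` and `DATE_FORMAT` are hypothetical names introduced here and are not part of `ngiab_data_cli`.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"  # matches the 'YYYY-MM-DD' strings used in the cell above


def check_period(start_date: str, end_date: str) -> int:
    """Parse both dates, require start < end, and return the span in days."""
    # strptime raises ValueError if a string does not match DATE_FORMAT
    start = datetime.strptime(start_date, DATE_FORMAT)
    end = datetime.strptime(end_date, DATE_FORMAT)
    if end <= start:
        raise ValueError(
            f"end_date {end_date} must be later than start_date {start_date}"
        )
    return (end - start).days


print(check_period("2021-10-01", "2022-09-30"))  # 364
```

A failed check raises `ValueError` immediately, which is cheaper than discovering a typo after the forcing download has started.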

In [2]:
# Download hydrofabric if not available in default location
import os, sys
venv_site_packages = '/ngen/.venv/lib/python3.11/site-packages'
sys.path.insert(0, venv_site_packages)
if not os.path.isdir(os.path.expanduser('~/.ngiab')):
    from data_sources.source_validation import download_and_update_hf
    download_and_update_hf()
    from data_sources.source_validation import file_paths
    file_paths.set_working_dir('~/ngiab_preprocess_output/')
sys.path.remove(venv_site_packages)
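
The cell above prepends the virtual environment's `site-packages` to `sys.path` and removes it afterwards. If the download raises an exception, the removal is skipped. One possible alternative, sketched below, wraps the same idea in a context manager so the path is always restored; `temporary_sys_path` is a hypothetical helper name, not something provided by the NGIAB tooling.

```python
import sys
from contextlib import contextmanager


@contextmanager
def temporary_sys_path(path: str):
    """Prepend `path` to sys.path for the duration of the with-block."""
    sys.path.insert(0, path)
    try:
        yield
    finally:
        # Runs even if the body raises, so sys.path is always restored.
        try:
            sys.path.remove(path)
        except ValueError:
            pass  # the body already removed it


# Usage sketch (path and import taken from the cell above):
# with temporary_sys_path('/ngen/.venv/lib/python3.11/site-packages'):
#     from data_sources.source_validation import download_and_update_hf
#     download_and_update_hf()
```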


### 3. Subset Hydrofabric using Catchment ID, Gage ID, or VPU

This section delineates the entire upstream area of your point of interest (e.g., catchment, gage, or flowpath) and writes the result to a GeoPackage. The hydrofabric IDs that can be used to define the spatial domain of interest are described on the [**ngiab-data-preprocess package page**](https://pypi.org/project/ngiab-data-preprocess/). The subset uses geometry and model attributes from the [**v2.2 hydrofabric**](https://lynker-spatial.s3-us-west-2.amazonaws.com/hydrofabric/v2.2/conus/conus_nextgen.gpkg); more information on the data sources is available [**here**](https://lynker-spatial.s3-us-west-2.amazonaws.com/hydrofabric/v2.2/hfv2.2-data_model.html).
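
To illustrate what "delineating the upstream area" means, the sketch below walks a toy drainage network upstream from an outlet using a breadth-first search. It is not the implementation used by `ngiab_data_cli`; the function name, the `flows_to` mapping, and the feature IDs are all made up for the example, and the real hydrofabric network also includes nexus points between catchments.

```python
from collections import deque


def upstream_of(outlet: str, flows_to: dict[str, str]) -> set[str]:
    """Return `outlet` plus every feature that drains to it."""
    # Invert the downstream pointers: for each feature, which ones flow into it?
    flows_from: dict[str, list[str]] = {}
    for feature, downstream in flows_to.items():
        flows_from.setdefault(downstream, []).append(feature)

    found = {outlet}
    queue = deque([outlet])
    while queue:
        current = queue.popleft()
        for upstream in flows_from.get(current, []):
            if upstream not in found:  # guard against revisiting
                found.add(upstream)
                queue.append(upstream)
    return found


# Toy network: cat-1 and cat-2 drain to cat-3, which drains to cat-4.
# cat-5 belongs to a different branch and must not be selected.
toy = {"cat-1": "cat-3", "cat-2": "cat-3", "cat-3": "cat-4", "cat-5": "cat-6"}
print(sorted(upstream_of("cat-3", toy)))  # ['cat-1', 'cat-2', 'cat-3']
```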

In [3]:
!source /ngen/.venv/bin/activate && python -m ngiab_data_cli -i $hydrofabric_id -s

Initializing.....
2025-07-17 23:10:59,571 - INFO - Getting catid for 10109001, in /home/jovyan/.ngiab/hydrofabric/v2.2/conus_nextgen.gpkg
2025-07-17 23:10:59,576 - INFO - Found cat-2861391 from gage-10109001
2025-07-17 23:10:59,576 - INFO - Processing cat-2861391 in /home/jovyan/ngiab_preprocess_output/gage-10109001
2025-07-17 23:10:59,576 - INFO - Building network graph
2025-07-17 23:11:09,423 - INFO - Upstream catchments: 88
2025-07-17 23:11:09,423 - INFO - Subsetting hydrofabric
2025-07-17 23:11:09,751 - INFO - Subsetting tables: ['divides', 'divide-attributes', 'flowpath-attributes', 'flowpath-attributes-ml', 'flowpaths', 'hydrolocations', 'nexus', 'pois', 'lakes', 'network']
2025-07-17 23:11:11,136 - INFO - Subset complete for 213 features (catchments + nexuses)
2025-07-17 23:11:11,136 - INFO - Subsetting complete.
2025-07-17 23:11:11,136 - INFO - All operations completed successfully.
2025-07-17 23:

Visualize the hydrofabric subset on an interactive map, showcasing key features such as divides, flowpaths, and nexus points.

In [4]:
# Provide the path to your GeoPackage
#gpkg_path = f"/home/jovyan/ngiab_preprocess_output/{hydrofabric_id}/config/{hydrofabric_id}_subset.gpkg"

# Display the interactive map
#display_hydrofabric_map(gpkg_path)
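
If the visualization module is not available, the subset can still be inspected without extra dependencies. A GeoPackage is an SQLite database, and the GeoPackage standard lists its layers in the `gpkg_contents` table. The sketch below reads that table with the standard library; `list_gpkg_layers` is a hypothetical helper name introduced for this example.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_gpkg_layers(gpkg_path: str) -> list[tuple[str, str]]:
    """Return (table_name, data_type) pairs from the GeoPackage contents table."""
    if not Path(gpkg_path).is_file():
        # sqlite3.connect would otherwise silently create an empty database
        raise FileNotFoundError(gpkg_path)
    with closing(sqlite3.connect(gpkg_path)) as conn:
        return conn.execute(
            "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
        ).fetchall()


# Usage sketch, with the path pattern from the commented-out cell above:
# gpkg_path = f"/home/jovyan/ngiab_preprocess_output/{hydrofabric_id}/config/{hydrofabric_id}_subset.gpkg"
# for name, kind in list_gpkg_layers(gpkg_path):
#     print(name, kind)
```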

### 4. Generate Forcings for a Specific Catchment ID
Download and process meteorological forcing data tailored to the selected catchment. This process uses pre-defined start and end dates for the time period of interest.
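
The Description notes that catchment forcings are a weighted mean of the gridded AORC values. The sketch below shows that arithmetic in `NumPy` for one catchment, assuming the weights are the fraction of each grid cell covered by the catchment. The function name, array shapes, and numbers are illustrative assumptions, not the tool's internal code.

```python
import numpy as np


def weighted_catchment_mean(grid_values, weights):
    """Weighted mean over grid cells, one value per timestep.

    grid_values: shape (n_timesteps, n_cells), forcing values of the cells
                 that touch the catchment
    weights:     shape (n_cells,), coverage fraction of each cell (assumed)
    """
    grid_values = np.asarray(grid_values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    total = weights.sum()
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Matrix-vector product sums value * weight across cells for every timestep
    return grid_values @ weights / total


values = np.array([[2.0, 4.0, 6.0],   # timestep 0
                   [1.0, 1.0, 1.0]])  # timestep 1
coverage = np.array([1.0, 0.5, 0.5])  # one full cell, two half-covered cells
print(weighted_catchment_mean(values, coverage).tolist())  # [3.5, 1.0]
```

For timestep 0 the result is (2·1 + 4·0.5 + 6·0.5) / 2 = 3.5; a uniform field returns the same value regardless of the weights.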


In [6]:
!source /ngen/.venv/bin/activate && python -m ngiab_data_cli -i "$hydrofabric_id" -f --start "$start_date" --end "$end_date"

Initializing.....
2025-07-17 23:17:00,327 - INFO - Getting catid for 10109001, in /home/jovyan/.ngiab/hydrofabric/v2.2/conus_nextgen.gpkg
2025-07-17 23:17:00,331 - INFO - Found cat-2861391 from gage-10109001
2025-07-17 23:17:00,331 - INFO - Processing cat-2861391 in /home/jovyan/ngiab_preprocess_output/gage-10109001
2025-07-17 23:17:01,087 - INFO - Upstream catchments: 88
2025-07-17 23:17:01,088 - INFO - Generating forcings from 2021-10-01 00:00:00 to 2022-09-30 00:00:00...
2025-07-17 23:17:26,275 - INFO - No cache found
2025-07-17 23:17:26,297 - INFO - Selected time range and clipped to bounds
2025-07-17 23:17:26,297 - INFO - Downloading and caching forcing data, this may take a while
2025-07-17 23:24:37,399 - INFO - Computing zonal stats in parallel for all timesteps
Processing DLWRF_surface ━━━  12% 1/8 •  Elapsed Time: 0:…  Remaining Time: -:-…
Processing DLWRF_

### 5. Create Model Configuration/Realization
Generate all configuration files needed to run NextGen, including `realization.json` for model setup, `troute.yaml` for routing, and individual per-catchment model configurations.
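
Once the cell below has run, a quick look at the output folder can confirm that the run package is in place. The model log later in this notebook reports file counts for `forcings`, `config`, and `outputs`, so the sketch below counts files in those three subfolders. `summarize_run_package` and `EXPECTED_SUBDIRS` are hypothetical names, and the sketch assumes the three folders sit directly under the output directory.

```python
from pathlib import Path

# Subfolder names as they appear in the model run log (assumed layout)
EXPECTED_SUBDIRS = ("forcings", "config", "outputs")


def summarize_run_package(data_dir: str) -> dict[str, int]:
    """Count the files in each expected subfolder; -1 marks a missing folder."""
    root = Path(data_dir).expanduser()
    summary = {}
    for name in EXPECTED_SUBDIRS:
        folder = root / name
        if folder.is_dir():
            summary[name] = sum(1 for p in folder.iterdir() if p.is_file())
        else:
            summary[name] = -1
    return summary


# Usage sketch, with the output folder reported by the CLI:
# print(summarize_run_package('/home/jovyan/ngiab_preprocess_output/gage-10109001'))
```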

In [7]:
!source /ngen/.venv/bin/activate && python -m ngiab_data_cli -i "$hydrofabric_id" -r --start "$start_date" --end "$end_date"

Initializing.....
2025-07-17 23:24:48,955 - INFO - Getting catid for 10109001, in /home/jovyan/.ngiab/hydrofabric/v2.2/conus_nextgen.gpkg
2025-07-17 23:24:48,959 - INFO - Found cat-2861391 from gage-10109001
2025-07-17 23:24:48,959 - INFO - Processing cat-2861391 in /home/jovyan/ngiab_preprocess_output/gage-10109001
2025-07-17 23:24:49,729 - INFO - Upstream catchments: 88
2025-07-17 23:24:49,729 - INFO - Creating realization from 2021-10-01 00:00:00 to 2022-09-30 00:00:00...
2025-07-17 23:25:00,776 - INFO - Realization creation complete.
2025-07-17 23:25:00,776 - INFO - All operations completed successfully.
2025-07-17 23:25:00,776 - INFO - Output folder: file:////home/jovyan/ngiab_preprocess_output/gage-10109001


In [8]:
from pyngiab import PyNGIAB
data_dir = '/home/jovyan/ngiab_preprocess_output/gage-10109001' #'/shared/examples/ngiab_preprocess_output/gage-10109001'
# serial execution of the model
test_ngiab_serial = PyNGIAB(data_dir, serial_execution_mode=True)
test_ngiab_serial.run()

WARN: pydantic version: Required(v1), Found(2.7.4).
WARN: numpy version: Required(1.26.4), Found(1.26.4).
Required dependencies not found in system path, looking into /ngen/.venv/
Valid dependencies found in venv: /ngen/.venv/
*****************
forcings exists. 2 forcings files found.
config exists. 3 config files found.
outputs exists. 0 outputs files found.
Run NextGen Model Framework in Serial Model ...
Running command: /dmod/bin/ngen-serial config/gage-10109001_subset.gpkg all config/gage-10109001_subset.gpkg all config/realization.json
NGen Framework 0.3.0
Building Nexus collection
Reading 38 features from layer nexus using ID column `id`
Building Catchment collection
Reading 88 features from layer divides using ID column `divide_id`
Initializing formulations
[   {
          name :    bmi_c++ ,
          params :         {
               allow_exceed_end_time :            true ,
               fixed_time_step :           false ,
               init_config :       /dev/null ,
     

Definition of "au" in "/usr/share/xml/udunits/udunits2-accepted.xml", line 123, overrides prefixed-unit "1.6605402e-45 kilogram"
Definition of "kt" in "/usr/share/xml/udunits/udunits2-common.xml", line 105, overrides prefixed-unit "1000000 kilogram"
Definition of "microns" in "/usr/share/xml/udunits/udunits2-common.xml", line 411, overrides prefixed-unit "1e-15 second"
Definition of "ft" in "/usr/share/xml/udunits/udunits2-common.xml", line 522, overrides prefixed-unit "1e-12 kilogram"
Definition of "yd" in "/usr/share/xml/udunits/udunits2-common.xml", line 531, overrides prefixed-unit "8.64e-20 second"
Definition of "pt" in "/usr/share/xml/udunits/udunits2-common.xml", line 785, overrides prefixed-unit "1e-09 kilogram"
Definition of "at" in "/usr/share/xml/udunits/udunits2-common.xml", line 1250, overrides prefixed-unit "1e-15 kilogram"
Definition of "ph" in "/usr/share/xml/udunits/udunits2-common.xml", line 1880, overrides prefixed-unit "3.6e-09 second"
Definition of "nt" in "/usr/sh

Updating layer: surface layer
Running timestep 100
Updating layer: surface layer
Running timestep 200
Updating layer: surface layer
Running timestep 300
Updating layer: surface layer
Running timestep 400
Updating layer: surface layer
Running timestep 500
Updating layer: surface layer
Running timestep 600
Updating layer: surface layer
Running timestep 700
Updating layer: surface layer
Running timestep 800
Updating layer: surface layer
Running timestep 900
Updating layer: surface layer
Running timestep 1000
Updating layer: surface layer
Running timestep 1100
Updating layer: surface layer
Running timestep 1200
Updating layer: surface layer
Running timestep 1300
Updating layer: surface layer
Running timestep 1400
Updating layer: surface layer
Running timestep 1500
Updating layer: surface layer
Running timestep 1600
Updating layer: surface layer
Running timestep 1700
Updating layer: surface layer
Running timestep 1800
Updating layer: surface layer
Running timestep 1900
Updating layer: surfa

2025-07-17 23:27:58,267 - root - INFO - [AbstractNetwork.py:525 - create_independent_networks]: organizing connections into reaches ...
2025-07-17 23:27:58,268 - root - INFO - [AbstractNetwork.py:682 - initial_warmstate_preprocess]: setting channel initial states ...
2025-07-17 23:27:58,301 - root - INFO - [AbstractNetwork.py:128 - assemble_forcings]: Creating a DataFrame of lateral inflow forcings ...


supernetwork connections set complete
... in 2.213597536087036 seconds.


2025-07-17 23:28:01,043 - root - INFO - [DataAssimilation.py:77 - __init__]: NudgingDA class is Started.
2025-07-17 23:28:01,044 - root - INFO - [DataAssimilation.py:286 - __init__]: PersistenceDA class is started.
2025-07-17 23:28:01,044 - root - INFO - [DataAssimilation.py:840 - __init__]: RFCDA class is started.
2025-07-17 23:28:01,045 - root - INFO - [DataAssimilation.py:719 - __init__]: great_lake class is started.
2025-07-17 23:28:01,045 - root - INFO - [__main__.py:1183 - nwm_route]: executing routing computation ...
2025-07-17 23:28:01,048 - root - INFO - [compute.py:659 - compute_nhd_routing_v02]: JIT Preprocessing time 0.0009777545928955078 seconds.
2025-07-17 23:28:01,048 - root - INFO - [compute.py:660 - compute_nhd_routing_v02]: starting Parallel JIT calculation
2025-07-17 23:28:05,337 - root - INFO - [compute.py:908 - compute_nhd_routing_v02]: PARALLEL TIME 4.289074182510376 seconds.
2025-07-17 23:28:05,342 - root - INFO - [output.py:180 - nwm_output_generator]: Handling 

Finished routing
NGen top-level timings:
	NGen::init: 2.58224
	NGen::simulation: 169.276
	NGen::routing: 9.82983
NGIAB executed successfully ...


True

In [9]:
data_dir = '/home/jovyan/ngiab_preprocess_output/gage-10109001'
# parallel execution of the model (used here because serial_execution_mode is not set; see the log below)
test_ngiab_parallel = PyNGIAB(data_dir)
test_ngiab_parallel.run()

WARN: pydantic version: Required(v1), Found(2.7.4).
WARN: numpy version: Required(1.26.4), Found(1.26.4).
Required dependencies not found in system path, looking into /ngen/.venv/
Valid dependencies found in venv: /ngen/.venv/
*****************
forcings exists. 2 forcings files found.
config exists. 3 config files found.
outputs exists. 0 outputs files found.
Run NextGen Model Framework in Parallel Model ...
No partitions file found, generating ...
Reading 88 features from layer divides using ID column `divide_id`
Partitioning 88 catchments into 12 partitions.
Reading 38 features from layer nexus using ID column `id`
Validating catchments...

Number of catchments is: 88
Catchment validation completed
Found 5 remotes in partition 0
Found 7 remotes in partition 1
Found 5 remotes in partition 2
Found 2 remotes in partition 3
Found 5 remotes in partition 4
Found 4 remotes in partition 5
Found 6 remotes in partition 6
Found 1 remotes in partition 7
Found 4 remotes in partition 8
Found 2 rem

Definition of "au" in "/usr/share/xml/udunits/udunits2-accepted.xml", line 123, overrides prefixed-unit "1.6605402e-45 kilogram"
Definition of "kt" in "/usr/share/xml/udunits/udunits2-common.xml", line 105, overrides prefixed-unit "1000000 kilogram"
Definition of "microns" in "/usr/share/xml/udunits/udunits2-common.xml", line 411, overrides prefixed-unit "1e-15 second"
Definition of "au" in "/usr/share/xml/udunits/udunits2-accepted.xml", line 123, overrides prefixed-unit "1.6605402e-45 kilogram"
Definition of "ft" in "/usr/share/xml/udunits/udunits2-common.xml", line 522, overrides prefixed-unit "1e-12 kilogram"
Definition of "yd" in "/usr/share/xml/udunits/udunits2-common.xml", line 531, overrides prefixed-unit "8.64e-20 second"
Definition of "pt" in "/usr/share/xml/udunits/udunits2-common.xml", line 785, overrides prefixed-unit "1e-09 kilogram"
Definition of "kt" in "/usr/share/xml/udunits/udunits2-common.xml", line 105, overrides prefixed-unit "1000000 kilogram"
Definition of "micro

Catchment topology is dendritic.
Running Models
Updating layer: surface layer
Running timestep 0
Building Feature Index
Catchment topology is dendritic.
Running Models
Updating layer: surface layer
Running timestep 0
Updating layer: surface layer
Running timestep 100
Updating layer: surface layer
Running timestep 100
Updating layer: surface layer
Running timestep 100
Updating layer: surface layer
Running timestep 100
Updating layer: surface layer
Running timestep 200
Updating layer: surface layer
Running timestep 200
Updating layer: surface layer
Running timestep 300
Updating layer: surface layer
Running timestep 300
Updating layer: surface layer
Running timestep 200
Updating layer: surface layer
Running timestep 200
Updating layer: surface layer
Running timestep 400
Updating layer: surface layer
Running timestep 400
Building Feature Index
Catchment topology is dendritic.
Running Models
Updating layer: surface layer
Running timestep 0
Updating layer: surface layer
Running timestep 500


Definition of "au" in "/usr/share/xml/udunits/udunits2-accepted.xml", line 123, overrides prefixed-unit "1.6605402e-45 kilogram"
Definition of "kt" in "/usr/share/xml/udunits/udunits2-common.xml", line 105, overrides prefixed-unit "1000000 kilogram"
Definition of "microns" in "/usr/share/xml/udunits/udunits2-common.xml", line 411, overrides prefixed-unit "1e-15 second"
Definition of "ft" in "/usr/share/xml/udunits/udunits2-common.xml", line 522, overrides prefixed-unit "1e-12 kilogram"
Definition of "yd" in "/usr/share/xml/udunits/udunits2-common.xml", line 531, overrides prefixed-unit "8.64e-20 second"
Definition of "pt" in "/usr/share/xml/udunits/udunits2-common.xml", line 785, overrides prefixed-unit "1e-09 kilogram"
Definition of "at" in "/usr/share/xml/udunits/udunits2-common.xml", line 1250, overrides prefixed-unit "1e-15 kilogram"
Definition of "ph" in "/usr/share/xml/udunits/udunits2-common.xml", line 1880, overrides prefixed-unit "3.6e-09 second"
Definition of "nt" in "/usr/sh

Updating layer: surface layer
Running timestep 300
Updating layer: surface layer
Running timestep 600
Updating layer: surface layer
Running timestep 600
Updating layer: surface layer
Running timestep 400
Updating layer: surface layer
Running timestep 700
Updating layer: surface layer
Running timestep 700
Updating layer: surface layer
Running timestep 400
Updating layer: surface layer
Running timestep 800
Updating layer: surface layer
Running timestep 800
Updating layer: surface layer
Running timestep 500
Updating layer: surface layer
Running timestep 900
Updating layer: surface layer
Running timestep 900
Updating layer: surface layer
Running timestep 500
Updating layer: surface layer
Running timestep 600
Updating layer: surface layer
Running timestep 1000
Updating layer: surface layer
Running timestep 1000
Updating layer: surface layer
Running timestep 1100
Updating layer: surface layer
Running timestep 1100
Updating layer: surface layer
Running timestep 700
Updating layer: surface lay

2025-07-18 00:01:19,771 - root - INFO - [AbstractNetwork.py:525 - create_independent_networks]: organizing connections into reaches ...
2025-07-18 00:01:19,772 - root - INFO - [AbstractNetwork.py:682 - initial_warmstate_preprocess]: setting channel initial states ...
2025-07-18 00:01:19,799 - root - INFO - [AbstractNetwork.py:128 - assemble_forcings]: Creating a DataFrame of lateral inflow forcings ...


supernetwork connections set complete
... in 2.223864793777466 seconds.


2025-07-18 00:01:22,462 - root - INFO - [DataAssimilation.py:77 - __init__]: NudgingDA class is Started.
2025-07-18 00:01:22,463 - root - INFO - [DataAssimilation.py:286 - __init__]: PersistenceDA class is started.
2025-07-18 00:01:22,464 - root - INFO - [DataAssimilation.py:840 - __init__]: RFCDA class is started.
2025-07-18 00:01:22,464 - root - INFO - [DataAssimilation.py:719 - __init__]: great_lake class is started.
2025-07-18 00:01:22,465 - root - INFO - [__main__.py:1183 - nwm_route]: executing routing computation ...
2025-07-18 00:01:22,467 - root - INFO - [compute.py:659 - compute_nhd_routing_v02]: JIT Preprocessing time 0.0008783340454101562 seconds.
2025-07-18 00:01:22,467 - root - INFO - [compute.py:660 - compute_nhd_routing_v02]: starting Parallel JIT calculation
2025-07-18 00:01:26,618 - root - INFO - [compute.py:908 - compute_nhd_routing_v02]: PARALLEL TIME 4.15074348449707 seconds.
2025-07-18 00:01:26,623 - root - INFO - [output.py:180 - nwm_output_generator]: Handling o

Finished routing
NGen top-level timings:
	NGen::init: 2.93814
	NGen::simulation: 146.274
	NGen::routing: 9.61265
NGIAB executed successfully ...


True
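
Both runs end with an "NGen top-level timings" block, which makes a side-by-side comparison straightforward. The sketch below parses those blocks, copied from the two logs above, and prints the serial-to-parallel ratio for each stage. `parse_ngen_timings` is a hypothetical helper, and the log does not state the unit of the timings.

```python
import re


def parse_ngen_timings(log_text: str) -> dict[str, float]:
    """Extract 'NGen::<stage>: <value>' pairs from a run log."""
    return {
        stage: float(value)
        for stage, value in re.findall(r"NGen::(\w+):\s*([0-9.]+)", log_text)
    }


# Timing blocks copied from the serial and parallel runs above
serial_log = """NGen top-level timings:
    NGen::init: 2.58224
    NGen::simulation: 169.276
    NGen::routing: 9.82983"""
parallel_log = """NGen top-level timings:
    NGen::init: 2.93814
    NGen::simulation: 146.274
    NGen::routing: 9.61265"""

serial = parse_ngen_timings(serial_log)
parallel = parse_ngen_timings(parallel_log)
for stage in serial:
    ratio = serial[stage] / parallel[stage]
    print(f"{stage}: serial {serial[stage]}, parallel {parallel[stage]}, ratio {ratio:.2f}")
```

For this 88-catchment basin the simulation stage is only modestly faster in parallel, while initialization and routing are essentially unchanged.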
