<h1 style="
    font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
    font-size: 36px;
    color: #2c3e50;
    background-color: #ecf0f1;
    padding: 20px;
    border-radius: 12px;
    text-align: center;
    box-shadow: 0px 4px 10px rgba(0, 0, 0, 0.1);">
    NextGen Data Preparation
</h1>

**Authors:** 

<ul style="line-height:1.5;">
<li>Ayman Nassar <a href="mailto:ayman.nassar@usu.edu">(ayman.nassar@usu.edu)</a></li>
<li>David Tarboton <a href="mailto:david.tarboton@usu.edu">(david.tarboton@usu.edu)</a></li>
<li>Furqan Baig <a href="mailto:fbaig@illinois.edu">(fbaig@illinois.edu)</a></li>
</ul>

**Last Updated:** 4/22/2025

**Purpose:**

This notebook simplifies the preparation of input data for running the NextGen water model. It helps researchers subset the hydrofabric and retrieve the data the model requires. For more information on NextGen data preparation, see the [**ngiab-data-preprocess package page**](https://pypi.org/project/ngiab-data-preprocess/).

**Audience:**

Researchers who are familiar with Jupyter Notebooks, basic Python, and basic hydrologic data analysis.

**Description:**

This notebook accepts a gage ID, catchment ID, or Vector Processing Unit (VPU) code to define a specific location, along with start and end dates for the desired time period. It uses geometry and model attributes from the [**v2.2 hydrofabric**](https://lynker-spatial.s3-us-west-2.amazonaws.com/hydrofabric/v2.2/conus/conus_nextgen.gpkg) and AORC forcing data at a 1-km grid resolution. The forcings are computed as a weighted mean of the gridded AORC data, with weights derived using the `Exact Extract` tool and the mean calculated using `NumPy`.
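The weighted mean described above can be illustrated with a minimal NumPy sketch. It is not the package's implementation: the function name, the array layout (time by flattened grid cell), and the toy values are assumptions made for illustration, and the weights stand in for the per-cell coverage fractions that a tool such as `Exact Extract` would supply.

```python
import numpy as np

def weighted_catchment_mean(grid_values, cell_indices, coverage_weights):
    """Coverage-weighted mean of gridded values over one catchment.

    grid_values: 2-D array (time, cells) of one forcing variable on a flattened grid.
    cell_indices: 1-D integer array of the grid cells that intersect the catchment.
    coverage_weights: 1-D array, fraction of each listed cell inside the catchment.
    """
    weights = np.asarray(coverage_weights, dtype=float)
    values = np.asarray(grid_values, dtype=float)[:, cell_indices]
    # Dividing by the weight sum makes partially covered cells contribute
    # in proportion to their overlap with the catchment.
    return values @ weights / weights.sum()

# Hypothetical toy data: 2 time steps on a 4-cell grid; the catchment
# overlaps cell 1 (25% of the weight) and cell 2 (75% of the weight).
precip = np.array([[0.0, 2.0, 4.0, 9.0],
                   [1.0, 1.0, 3.0, 9.0]])
catchment_mean = weighted_catchment_mean(precip, np.array([1, 2]), np.array([0.25, 0.75]))
print(catchment_mean)  # one value per time step
```

Because the weights depend only on geometry, they can be computed once per catchment and reused for every time step and variable.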

**Data Description:**

This Jupyter notebook automatically prepares a complete run package for NextGen simulations by extracting geometry and model attributes from [**Hydrofabric version 2.2**](https://lynker-spatial.s3-us-west-2.amazonaws.com/hydrofabric/v2.2/conus/conus_nextgen.gpkg), with detailed documentation available [**here**](https://lynker-spatial.s3-us-west-2.amazonaws.com/hydrofabric/v2.2/hfv2.2-data_model.html). It utilizes AORC meteorological forcing data at a 1-km grid resolution and generates all configuration files required for simulation, including the NextGen model configuration (`realization.json`), routing configuration (`troute.yaml`), and model parameter files for each catchment.

**Software Requirements:**

This notebook requires the `hydrofabric_visualization_utils.py` module to visualize the hydrofabric dataset.

### 1. Prepare the Python Environment

Import the **hydrofabric_visualization_utils** module, which provides tools to visualize hydrofabric datasets, particularly `divides`, `flowpaths` and `nexus`.

In [None]:
#from hydrofabric_visualization_utils import display_hydrofabric_map

### 2. Set Inputs

This section allows users to define key variables required for NextGen data preprocessing, including the hydrofabric ID (to specify the spatial domain of interest) and the start and end dates.

In [1]:
# Define your Hydrofabric ID
hydrofabric_id = "gage-10109001"  # This can be a catchment ID (e.g., 'cat-7080'), a gage ID (e.g., 'gage-10154200'), 
                             # or a Vector Processing Unit (VPU) code (e.g., '01')

# Start date
start_date = "2021-10-01"    # Specify the start date in the format 'YYYY-MM-DD'

# End date
end_date = "2022-09-30"      # Specify the end date in the format 'YYYY-MM-DD'
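
Before launching the longer steps, it can help to confirm that the two dates parse and are in order. The following is a minimal sketch using only the standard library; `check_date_range` is a hypothetical helper, not part of the preprocessing package.

```python
from datetime import date

def check_date_range(start, end):
    """Parse two 'YYYY-MM-DD' strings and return the number of days they span."""
    start_day = date.fromisoformat(start)  # raises ValueError on a malformed date
    end_day = date.fromisoformat(end)
    if end_day < start_day:
        raise ValueError(f"end date {end} is before start date {start}")
    return (end_day - start_day).days + 1  # inclusive of both end points

# The values below repeat the inputs defined in the cell above.
n_days = check_date_range("2021-10-01", "2022-09-30")
print(n_days)  # → 365
```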

In [2]:
# Download hydrofabric if not available in default location
import os, sys
venv_site_packages = '/ngen/.venv/lib/python3.11/site-packages'
sys.path.insert(0, venv_site_packages)

#import ngiab_data_cli #ngiab_data_preprocess
if not os.path.isdir(os.path.expanduser('~/.ngiab/')):  # os.path.isdir does not expand '~' on its own
    from data_sources.source_validation import download_and_update_hf
    download_and_update_hf()
    from data_sources.source_validation import file_paths
    file_paths.set_working_dir('~/ngiab_preprocess_output/')
sys.path.remove(venv_site_packages)


### 3. Subset Hydrofabric using Catchment ID, Gage ID, or VPU

This section delineates the entire upstream area of your point of interest (e.g., catchment, gage, or flowpath) and writes the result to a GeoPackage. The hydrofabric IDs that can be used to define the spatial domain of interest are described on the [**ngiab-data-preprocess package page**](https://pypi.org/project/ngiab-data-preprocess/). The subset uses geometry and model attributes from the [**v2.2 hydrofabric**](https://lynker-spatial.s3-us-west-2.amazonaws.com/hydrofabric/v2.2/conus/conus_nextgen.gpkg); the data sources are documented [**here**](https://lynker-spatial.s3-us-west-2.amazonaws.com/hydrofabric/v2.2/hfv2.2-data_model.html).
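
The idea of collecting everything upstream of an outlet can be sketched as a breadth-first search over a "drains to" mapping. This is an illustration only, not the tool's algorithm: the function name, the toy network, and its `cat-` IDs are hypothetical, and the real hydrofabric also routes catchments through nexus points.

```python
from collections import defaultdict, deque

def upstream_of(outlet, downstream_of):
    """Return the outlet plus every ID that drains to it, found breadth-first."""
    # Invert the 'drains to' mapping so each ID lists its direct contributors.
    contributors = defaultdict(list)
    for source, target in downstream_of.items():
        contributors[target].append(source)

    found = {outlet}
    queue = deque([outlet])
    while queue:
        current = queue.popleft()
        for neighbor in contributors[current]:
            if neighbor not in found:
                found.add(neighbor)
                queue.append(neighbor)
    return found

# Hypothetical toy network: cat-1 and cat-2 drain to cat-3;
# cat-3 and cat-5 drain to cat-4.
toy_network = {"cat-1": "cat-3", "cat-2": "cat-3", "cat-3": "cat-4", "cat-5": "cat-4"}
subset = upstream_of("cat-3", toy_network)
print(sorted(subset))  # → ['cat-1', 'cat-2', 'cat-3']
```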

In [3]:
!source /ngen/.venv/bin/activate && python -m ngiab_data_cli -i "$hydrofabric_id" -s

Initializing.....
2025-07-14 20:13:56,250 - INFO - Getting catid for 10109001, in /home/jovyan/.ngiab/hydrofabric/v2.2/conus_nextgen.gpkg
2025-07-14 20:13:56,254 - INFO - Found cat-2861391 from gage-10109001
2025-07-14 20:13:56,255 - INFO - Processing cat-2861391 in /home/jovyan/ngiab_preprocess_output/gage-10109001
2025-07-14 20:13:56,255 - INFO - Building network graph
2025-07-14 20:14:05,308 - INFO - Upstream catchments: 88
2025-07-14 20:14:05,308 - INFO - Subsetting hydrofabric
2025-07-14 20:14:05,630 - INFO - Subsetting tables: ['divides', 'divide-attributes', 'flowpath-attributes', 'flowpath-attributes-ml', 'flowpaths', 'hydrolocations', 'nexus', 'pois', 'lakes', 'network']
2025-07-14 20:14:06,177 - INFO - Subset complete for 213 features (catchments + nexuses)
2025-07-14 20:14:06,177 - INFO - Subsetting complete.
2025-07-14 20:14:06,177 - INFO - All operations completed successfully.

Visualize the hydrofabric subset on an interactive map, showcasing key features such as divides, flowpaths, and nexus points.

In [None]:
# Provide the path to your GeoPackage
#gpkg_path = f"/home/jovyan/ngiab_preprocess_output/{hydrofabric_id}/config/{hydrofabric_id}_subset.gpkg"

# Display the interactive map
#display_hydrofabric_map(gpkg_path)
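
If the visualization module is unavailable, the layers in a GeoPackage can still be listed with the standard library, because a GeoPackage is an SQLite database whose `gpkg_contents` table registers its layers. The sketch below is a hedged illustration: `list_gpkg_layers` is a hypothetical helper, and the demo file is a bare stand-in containing only a `gpkg_contents` table, not a real hydrofabric subset.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

def list_gpkg_layers(gpkg_path):
    """Return (table_name, data_type) pairs registered in a GeoPackage."""
    with closing(sqlite3.connect(gpkg_path)) as connection:
        return connection.execute(
            "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
        ).fetchall()

# Build a stand-in file so the example runs without a real subset.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_subset.gpkg")
    with closing(sqlite3.connect(demo_path)) as connection:
        connection.execute("CREATE TABLE gpkg_contents (table_name TEXT, data_type TEXT)")
        connection.executemany(
            "INSERT INTO gpkg_contents VALUES (?, ?)",
            [("nexus", "features"), ("divides", "features"), ("flowpaths", "features")],
        )
        connection.commit()
    layers = list_gpkg_layers(demo_path)

print(layers)
```

To inspect an actual subset, pass the `gpkg_path` defined in the cell above instead of the stand-in file.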

### 4. Generate Forcings for a Specific Catchment ID
Download and process meteorological forcing data for the selected spatial domain, using the start and end dates defined in Section 2.


In [None]:
!source /ngen/.venv/bin/activate && python -m ngiab_data_cli -i "$hydrofabric_id" -f --start "$start_date" --end "$end_date"
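
The same command can be assembled in Python as an argument list, which avoids shell quoting issues when an ID or date contains unexpected characters. This is a sketch that only builds and prints the command; `build_cli_command` is a hypothetical helper, and the flags are limited to those already used in this notebook (`-i`, `-s`, `-f`, `-r`, `--start`, `--end`).

```python
import shlex

def build_cli_command(hydrofabric_id, step_flag, start=None, end=None):
    """Assemble the argument list for one ngiab_data_cli step without running it."""
    command = ["python", "-m", "ngiab_data_cli", "-i", hydrofabric_id, step_flag]
    if start is not None and end is not None:
        command += ["--start", start, "--end", end]
    return command

forcing_command = build_cli_command("gage-10109001", "-f", "2021-10-01", "2022-09-30")
print(shlex.join(forcing_command))
```

The resulting list could be passed to `subprocess.run` from an environment where `ngiab_data_cli` is installed.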

### 5. Create Model Configuration/Realization
Generate all configuration files needed to run NextGen, including `realization.json` for model setup, `troute.yaml` for routing, and the individual per-catchment model configurations.

In [None]:
!source /ngen/.venv/bin/activate && python -m ngiab_data_cli -i "$hydrofabric_id" -r --start "$start_date" --end "$end_date"
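
A quick check that the expected files were written can catch a failed step before the model run. The sketch below assumes that `realization.json` and `troute.yaml` land in the `config` subfolder of the output directory, alongside the subset GeoPackage; that location is an assumption, and `missing_config_files` is a hypothetical helper. The demo uses a temporary directory rather than real output.

```python
import tempfile
from pathlib import Path

def missing_config_files(run_dir, expected=("realization.json", "troute.yaml")):
    """Return the expected file names that are absent from <run_dir>/config."""
    config_dir = Path(run_dir) / "config"  # assumed location of the generated files
    return [name for name in expected if not (config_dir / name).is_file()]

# Stand-in output folder containing only one of the two expected files.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "gage-10109001"
    (run_dir / "config").mkdir(parents=True)
    (run_dir / "config" / "realization.json").write_text("{}")
    missing = missing_config_files(run_dir)

print(missing)  # → ['troute.yaml']
```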

In [None]:
from pyngiab import PyNGIAB
data_dir = '/home/jovyan/ngiab_preprocess_output/gage-10109001'
# serial execution of the model
test_ngiab_serial = PyNGIAB(data_dir, serial_execution_mode=True)
test_ngiab_serial.run()

In [None]:
data_dir = '/home/jovyan/ngiab_preprocess_output/gage-10109001'
# run the model again with the default execution mode (serial_execution_mode not set)
test_ngiab = PyNGIAB(data_dir)
test_ngiab.run()