## Datasets Used

### 1. HDMA River Network

**Description**  
The river network used in this study is derived from the **Hydrologic Derivatives for Modeling and Applications (HDMA)** database.

**Citation**  
Verdin, K. L. (2017).  
*Hydrologic Derivatives for Modeling and Applications (HDMA) database.*  
U.S. Geological Survey data release.  

**Dataset Access**    
https://doi.org/10.5066/F7S180ZP

---

### 2. HydroLAKES (Version 1)

**Description**  
A global vector database of lakes and reservoirs, providing detailed information on lake shorelines, surface area, volume, depth estimates, and hydrological connectivity. HydroLAKES is widely used in global hydrology and water resources studies.

**Citation**  
Messager, M. L., Lehner, B., Grill, G., Nedeva, I., & Schmitt, O. (2016).  
*Estimating the volume and age of water stored in global lakes using a geostatistical approach.*  
**Nature Communications**, 7, 13603.  
https://doi.org/10.1038/ncomms13603

**Dataset Access**  
- HydroLAKES product page:  
  https://www.hydrosheds.org/products/hydrolakes

## Assigning parameters and folders

In [1]:
# output folder where the result files will be written
OutFolder = '/Users/shg096/Desktop/LakeRiverOut/HDMA/'

# location of the HDMA river (riv) and catchment (cat) shapefiles, per region
regions = {
    "NorthAmerica": {
        "files": {
            "riv": "/Users/shg096/Downloads/na_streams/na_streams.shp",
            "cat": "/Users/shg096/Downloads/na_catch/na_catch.shp",
        }
    },
}

# location of HydroLAKES
lake_file = '/Volumes/F:/hydrography/hydrolakes/HydroLAKES_polys_v10_shp/HydroLAKES_polys_v10_shp/HydroLAKES_polys_v10.shp'
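
Before running the rest of the workflow, it can help to confirm that the configured inputs exist on disk. The following is a minimal sketch and not part of the `riverlakenetwork` package: the helper names `collect_input_paths` and `find_missing_paths` are hypothetical, and the sketch assumes that `regions` keeps the nested `{"files": {...}}` layout shown above.

```python
from pathlib import Path


def collect_input_paths(regions, lake_file):
    """Flatten the nested `regions` dict and append the lake file path."""
    paths = [path for spec in regions.values() for path in spec["files"].values()]
    paths.append(lake_file)
    return paths


def find_missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk."""
    return [p for p in paths if not Path(p).exists()]
```

With the variables from the cell above, `find_missing_paths(collect_input_paths(regions, lake_file))` returns an empty list when every shapefile is in place, and otherwise lists the paths that need fixing.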


In [2]:
# load the needed packages
import os
import shutil
import geopandas as gpd
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from riverlakenetwork import Utility, BurnLakes
import warnings

# silence all warnings to keep the notebook output readable
warnings.filterwarnings("ignore")

In [3]:
# load the HydroLAKES dataset
lake = gpd.read_file(lake_file)
# merge Lakes Michigan and Huron, as they are hydraulically connected
# remove lake ID 847 (North America): it lies close to the ocean and causes a loop
lake = Utility.FixHydroLAKESv1(lake, merge_lakes={"Michigan+Huron": [6, 8]}, lake_to_remove=[847])
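
The comment above notes that lake 847 is removed because it causes a loop. As a rough illustration of what a loop means in a network where every element points to a single downstream element (as with the `COMID` and `NextDownCOMID` columns used later in this notebook), the sketch below walks such a mapping and reports the first cycle it finds. It is not the check implemented in `riverlakenetwork`; the function name `find_loop` is hypothetical, and treating any ID that is missing from the mapping as an outlet is an assumption.

```python
def find_loop(next_down):
    """Return the IDs that form a loop, or None if the network has no loop.

    `next_down` maps each ID to its downstream ID. An ID that is not a key
    of the mapping (for example 0 or -1) is treated as an outlet (assumption).
    """
    state = {}  # ID -> "visiting" (on the current walk) or "done"
    for start in next_down:
        path = []
        node = start
        # walk downstream until an outlet or an already-seen ID is reached
        while node in next_down and node not in state:
            state[node] = "visiting"
            path.append(node)
            node = next_down[node]
        if state.get(node) == "visiting":
            # the walk ran back into itself: everything from `node` on is a loop
            return path[path.index(node):]
        for visited in path:
            state[visited] = "done"
    return None
```

For example, `find_loop({1: 2, 2: 3, 3: 0})` finds no loop because ID 3 drains to an outlet, while `find_loop({1: 2, 2: 3, 3: 1})` reports the three IDs that drain into each other.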

In [4]:
# loop over regions and their files
for region, files in regions.items():
        
    # read the HDMA river and catchment files for this region
    riv, cat = Utility.hdma_read_file(riv_file=files["files"]["riv"],
                                      cat_file=files["files"]["cat"])
    
    # folder for the original (unmodified) river and catchment files
    region_base = f"{region}"
    # remove the folder if it already exists, then recreate it empty
    org_folder = os.path.join(OutFolder, f"{region_base}_org")
    if os.path.isdir(org_folder):
        try:
            shutil.rmtree(org_folder)
        except OSError as e:
            raise RuntimeError(f"Failed to remove {org_folder}: {e}") from e
    os.makedirs(org_folder, exist_ok=True)
    
    # save the original riv and cat
    riv.to_file(os.path.join(org_folder, "riv.gpkg"))
    cat.to_file(os.path.join(org_folder, "cat.gpkg"))
    
    # build the config that is passed to BurnLakes
    config = {
        "riv": riv,
        "riv_dict": {
            "COMID": {"col":"COMID"},
            "NextDownCOMID": {"col":"NextDownCOMID"},
            "length": {"col":"length"},
            "uparea": {"col":"uparea","unit":"km2"}
        },
        "cat": cat,
        "cat_dict": {
            "COMID": {"col":"COMID"},
            "unitarea": {"col":"unitarea","unit":"km2"},
        },
        "lake": lake,
        "lake_dict": {
            "LakeCOMID": {"col":"Hylak_id"},
            "unitarea": {"col":"Lake_area","unit":"km2"}
        },
    }

    # burn lakes into river network
    bl = BurnLakes(config)

    # folder for the lake-corrected outputs
    region_base = f"{region}"
    # remove the folder if it already exists, then recreate it empty
    corrected_folder = os.path.join(OutFolder, f"{region_base}_corrected")
    if os.path.isdir(corrected_folder):
        try:
            shutil.rmtree(corrected_folder)
        except OSError as e:
            raise RuntimeError(f"Failed to remove {corrected_folder}: {e}") from e
    os.makedirs(corrected_folder, exist_ok=True)

    # save riv, cat, and lake
    bl.riv.to_file(os.path.join(corrected_folder, "riv.gpkg"))
    bl.cat.to_file(os.path.join(corrected_folder, "cat.gpkg"))
    bl.lake.to_file(os.path.join(corrected_folder, "lake.gpkg"))
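
The cell above repeats the same delete-and-recreate logic for the `_org` and `_corrected` folders. One possible way to factor it out is sketched below. `reset_folder` is a hypothetical helper name, not something provided by `riverlakenetwork`, and the sketch keeps the notebook's behavior of wiping any previous contents.

```python
import os
import shutil


def reset_folder(path):
    """Delete `path` if it exists, recreate it empty, and return it."""
    if os.path.isdir(path):
        try:
            shutil.rmtree(path)
        except OSError as e:
            # keep the original error attached for easier debugging
            raise RuntimeError(f"Failed to remove {path}: {e}") from e
    os.makedirs(path, exist_ok=True)
    return path
```

Inside the loop this could be used as `org_folder = reset_folder(os.path.join(OutFolder, f"{region}_org"))`, and likewise for the `_corrected` folder.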

=== Input loader started at : 2026-01-03 19:31:54  ===
riv: Loaded
riv_dict: {'COMID': {'col': 'COMID'}, 'NextDownCOMID': {'col': 'NextDownCOMID'}, 'length': {'col': 'length'}, 'uparea': {'col': 'uparea', 'unit': 'km2'}}
cat: Loaded
cat_dict: {'COMID': {'col': 'COMID'}, 'unitarea': {'col': 'unitarea', 'unit': 'km2'}}
lake: Loaded
lake_dict: {'LakeCOMID': {'col': 'Hylak_id'}, 'unitarea': {'col': 'Lake_area', 'unit': 'km2'}}
=== Input loader finished at: 2026-01-03 19:31:55  ===
=== Input checker started at : 2026-01-03 19:31:55  ===
Subbasin and lake area units are consistent: km2
riv CRS: EPSG:4326
cat CRS: EPSG:4326
lake CRS: EPSG:4326
✅ No loop detected in network topology
=== Input checker finished at: 2026-01-03 19:31:58  ===
=== Resolving lakes started at : 2026-01-03 19:31:58  ===
==== Number of lakes after subsetting: 990182 ====
==== Number of lakes after removing intersection with only one lake: 55567 ====
==== Number of lakes after removing lakes that do not touch starting or