## Datasets Used

### 1. MERIT-Hydro Derived River Network

**Description**  
The regional river network used in this study is independent of MERIT-Basins: it was derived directly from the **MERIT Hydro** DEM for a region in Canada and is five times denser than MERIT-Basins. Some preprocessing (the preparation script referenced in the code below) is applied so that both the derived river network and its subbasins are ready for use with the `riverlakenetwork` package.

**Citation**  
Yamazaki, D., Ikeshima, D., Sosa, J., Bates, P. D., Allen, G. H., & Pavelsky, T. M. (2019).  
*MERIT Hydro: A high‑resolution global hydrography map based on the latest topography datasets.*  
**Water Resources Research**, 55, 5053–5073.  
https://doi.org/10.1029/2019WR024873

**Dataset Access**  
- MERIT Hydro global hydrography dataset:  
  https://hydro.iis.u-tokyo.ac.jp/~yamadai/MERIT_Hydro/

---

### 2. HydroLAKES (Version 1)

**Description**  
A global vector database of lakes and reservoirs, providing detailed information on lake shorelines, surface area, volume, depth estimates, and hydrological connectivity. HydroLAKES is widely used in global hydrology and water resources studies.

**Citation**  
Messager, M. L., Lehner, B., Grill, G., Nedeva, I., & Schmitt, O. (2016).  
*Estimating the volume and age of water stored in global lakes using a geostatistical approach.*  
**Nature Communications**, 7, 13603.  
https://doi.org/10.1038/ncomms13603

**Dataset Access**  
- HydroLAKES product page:  
  https://www.hydrosheds.org/products/hydrolakes

### Assigning parameters and folders

In [1]:
# output folder where the results will be written
OutFolder = '/Users/shg096/Desktop/LakeRiverOut/MERITDerivedSK/'

# location of the MERIT-Hydro derived river network and subbasin files;
# run the script under examples/preparation to generate rivers_final_SK and basins_final_SK
riv_file = "/Users/shg096/Desktop/RiverLakeNetwork/examples/preparation/rivers_final_SK.shp"
cat_file = "/Users/shg096/Desktop/RiverLakeNetwork/examples/preparation/basins_final_SK.shp"

# location of HydroLAKES
lake_file = '/Volumes/F:/hydrography/hydrolakes/HydroLAKES_polys_v10_shp/HydroLAKES_polys_v10_shp/HydroLAKES_polys_v10.shp'
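
The paths above are specific to the author's machine. Before loading anything, it can help to confirm that every configured input exists, so that a typo fails early with a clear message. The following is a minimal sketch using only the standard library; the helper name `find_missing_inputs` is hypothetical and is not part of `riverlakenetwork`.

```python
from pathlib import Path


def find_missing_inputs(paths):
    """Return the labels of all entries in `paths` whose file does not exist.

    `paths` maps a short label (e.g. "riv_file") to a file path.
    Illustrative helper only; it is not part of riverlakenetwork.
    """
    return [label for label, p in paths.items() if not Path(p).is_file()]


# Possible use in the notebook, with the variables defined in the cell above:
# missing = find_missing_inputs(
#     {"riv_file": riv_file, "cat_file": cat_file, "lake_file": lake_file}
# )
# if missing:
#     raise FileNotFoundError(f"Missing input files: {missing}")
```

The check only tests for the existence of the `.shp` file itself; the sidecar files of a shapefile (`.shx`, `.dbf`, `.prj`) are not verified.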

In [2]:
# load the needed packages
import os
import shutil
import geopandas as gpd
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from riverlakenetwork import Utility, BurnLakes
import warnings
warnings.filterwarnings("ignore")  # suppress all warnings in the notebook output

In [3]:
# load the HydroLAKES dataset
lake = gpd.read_file(lake_file)
# merge Lakes Michigan and Huron, as they are hydraulically connected
lake = Utility.FixHydroLAKESv1(lake, merge_lakes={"Michigan+Huron": [6, 8]})
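
The `merge_lakes` argument groups HydroLAKES polygons by `Hylak_id` under a new name. As an illustration of that data structure only (the notebook does not show how `Utility.FixHydroLAKESv1` works internally, so this is not its implementation), the sketch below inverts such a dictionary into an id-to-group lookup and rejects an id that is listed in two different groups. The function name `invert_merge_groups` is hypothetical.

```python
def invert_merge_groups(merge_lakes):
    """Map each lake id to the name of the merge group that contains it.

    Raises ValueError if the same id appears in more than one group.
    Illustrative helper only; it is not part of riverlakenetwork.
    """
    lookup = {}
    for group_name, lake_ids in merge_lakes.items():
        for lake_id in lake_ids:
            if lake_id in lookup and lookup[lake_id] != group_name:
                raise ValueError(
                    f"Lake id {lake_id} is in both "
                    f"{lookup[lake_id]!r} and {group_name!r}"
                )
            lookup[lake_id] = group_name
    return lookup


lookup = invert_merge_groups({"Michigan+Huron": [6, 8]})
print(lookup)  # {6: 'Michigan+Huron', 8: 'Michigan+Huron'}
```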

In [4]:
# read the river network and subbasin files
riv = gpd.read_file(riv_file)
cat = gpd.read_file(cat_file)

# recreate the folder from scratch (remove it first if it already exists)
org_folder = os.path.join(OutFolder, "org")
if os.path.isdir(org_folder):
    try:
        shutil.rmtree(org_folder)
    except OSError as e:
        raise RuntimeError(f"Failed to remove {org_folder}: {e}") from e
os.makedirs(org_folder, exist_ok=True)

# save riv and cat
riv.to_file(os.path.join(org_folder, "riv.gpkg"))
cat.to_file(os.path.join(org_folder, "cat.gpkg"))

# build the configuration that is passed to BurnLakes
config = {
    "riv": riv,
    "riv_dict": {
        "COMID": {"col": "link_id"},
        "NextDownCOMID": {"col": "ds_link_id"},
        "length": {"col": "length"},
        "uparea": {"col": "uparea", "unit": "km2"},
    },
    "cat": cat,
    "cat_dict": {
        "COMID": {"col": "link_id"},
        "unitarea": {"col": "unitarea", "unit": "km2"},
    },
    "lake": lake,
    "lake_dict": {
        "LakeCOMID": {"col": "Hylak_id"},
        "unitarea": {"col": "Lake_area", "unit": "km2"},
    },
}

# burn lakes into river network
bl = BurnLakes(config)

# recreate the folder from scratch (remove it first if it already exists)
corrected_folder = os.path.join(OutFolder, "corrected")
if os.path.isdir(corrected_folder):
    try:
        shutil.rmtree(corrected_folder)
    except OSError as e:
        raise RuntimeError(f"Failed to remove {corrected_folder}: {e}") from e
os.makedirs(corrected_folder, exist_ok=True)

# save riv, cat, and lake
bl.riv.to_file(os.path.join(corrected_folder, "riv.gpkg"))
bl.cat.to_file(os.path.join(corrected_folder, "cat.gpkg"))
bl.lake.to_file(os.path.join(corrected_folder, "lake.gpkg"))
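
Each `*_dict` entry in the configuration maps a role expected by `BurnLakes` (for example `COMID`) to a column name in the corresponding layer, optionally with a unit. One way to catch a misspelled column before running the workflow is to compare those names against the columns actually present. This is a minimal sketch under that reading of the configuration; `missing_columns` is a hypothetical helper and does not replace the package's own input checker.

```python
def missing_columns(available_columns, mapping):
    """Return {role: column} for every mapped column absent from the layer.

    `available_columns` is any iterable of column names (e.g. `riv.columns`);
    `mapping` follows the notebook's layout: {role: {"col": name, ...}}.
    Illustrative helper only; it is not part of riverlakenetwork.
    """
    available = set(available_columns)
    return {
        role: spec["col"]
        for role, spec in mapping.items()
        if spec["col"] not in available
    }


# Stand-alone example with a made-up column list that lacks "uparea"
riv_dict = {
    "COMID": {"col": "link_id"},
    "NextDownCOMID": {"col": "ds_link_id"},
    "length": {"col": "length"},
    "uparea": {"col": "uparea", "unit": "km2"},
}
print(missing_columns(["link_id", "ds_link_id", "length", "geometry"], riv_dict))
# {'uparea': 'uparea'}
```

In the notebook this could be called as `missing_columns(riv.columns, config["riv_dict"])`, and likewise for the `cat` and `lake` layers.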

=== Input loader started at : 2026-01-03 18:39:18  ===
riv: Loaded
riv_dict: {'COMID': {'col': 'link_id'}, 'NextDownCOMID': {'col': 'ds_link_id'}, 'length': {'col': 'length'}, 'uparea': {'col': 'uparea', 'unit': 'km2'}}
cat: Loaded
cat_dict: {'COMID': {'col': 'link_id'}, 'unitarea': {'col': 'unitarea', 'unit': 'km2'}}
lake: Loaded
lake_dict: {'LakeCOMID': {'col': 'Hylak_id'}, 'unitarea': {'col': 'Lake_area', 'unit': 'km2'}}
=== Input loader finished at: 2026-01-03 18:39:19  ===
=== Input checker started at : 2026-01-03 18:39:19  ===
Subbasin and lake area units are consistent: km2
riv CRS: EPSG:4326
cat CRS: EPSG:4326
lake CRS: EPSG:4326
✅ No loop detected in network topology
=== Input checker finished at: 2026-01-03 18:39:21  ===
=== Resolving lakes started at : 2026-01-03 18:39:21  ===
==== Number of lakes after subsetting: 4029 ====
==== Number of lakes after removing intersection with only one lake: 1012 ====
==== Number of lakes after removing lakes that do not touch starting or e