# SYMFLUENCE Tutorial 03b — Continental-Scale Modeling (North America)

## Introduction

This tutorial is the final scaling step in the SYMFLUENCE series: it moves from regional to continental-scale hydrological modeling. Building on the regional Iceland domain of Tutorial 03a, it sets up a model for an entire continent.

Continental-scale modeling adds substantial computational cost along with new scientific opportunities. Using North America as the example (about 24.7 million km²), we show how SYMFLUENCE handles the data volumes and methodological challenges of modeling hydrology across a continental landmass.

This scale differs fundamentally from previous tutorials. Modeling units expand from hundreds to tens of thousands of HRUs, data volumes transition from gigabytes to multi-terabyte datasets, and processing extends from hours to days across distributed computing clusters. Continental modeling requires high-performance computing infrastructure with substantial resource allocations.

**Important Note**: While this tutorial demonstrates continental-scale model setup and configuration principles, actual execution typically requires HPC resources beyond desktop environments. The tutorial prepares users to understand and configure continental applications for execution when appropriate computational resources become available.


## Step 1 — Continental configuration

We generate a continental-scale configuration from the template and set the North American domain extent.

In [None]:
# Import libraries
from pathlib import Path
import yaml
import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd
import numpy as np
import xarray as xr
import warnings

warnings.filterwarnings('ignore')

from symfluence import SYMFLUENCE
from symfluence.resources import get_config_template


In [None]:
SYMFLUENCE_CODE_DIR = Path.cwd().resolve()
# Step 1 — Create continental configuration for North America
config_template = get_config_template()

# Load base configuration
with open(config_template, 'r') as f:
    config = yaml.safe_load(f)

# Configure for continental North America
config['SYMFLUENCE_CODE_DIR'] = str(SYMFLUENCE_CODE_DIR)

# Continental domain settings
config['DOMAIN_NAME'] = 'North_America'
config['DOMAIN_DEFINITION_METHOD'] = 'delineate'
config['DELINEATION_METHOD'] = 'stream_threshold'
config['DOMAIN_DISCRETIZATION'] = 'GRUs'
config['BOUNDING_BOX_COORDS'] = '85/-178/5/-53'  # North America extent
config['STREAM_THRESHOLD'] = 15000  # High threshold for continental scale

# Delineation options
config['DELINEATE_COASTAL_WATERSHEDS'] = True
config['DELINEATE_BY_POURPOINT'] = False
config['CLEANUP_INTERMEDIATE_FILES'] = False

# Pour point value carried over from Tutorial 03a (a point in Iceland). It is retained
# because a pour point file is still created below; it does not drive the delineation
# here, since DELINEATE_BY_POURPOINT is False.
config['POUR_POINT_COORDS'] = '64.01/-16.01'

# Experiment settings (short period for demonstration)
config['EXPERIMENT_ID'] = 'continental_tutorial'
config['EXPERIMENT_TIME_START'] = '2018-01-01 01:00'
config['EXPERIMENT_TIME_END'] = '2018-01-31 23:00'

# Model settings
config['HYDROLOGICAL_MODEL'] = 'SUMMA'
config['ROUTING_MODEL'] = 'mizuRoute'

# HPC settings (adjust based on available resources)
config['MPI_PROCESSES'] = 4  # Configure based on your HPC allocation

# Save configuration
config_path = Path("./config_north_america_tutorial.yaml")
with open(config_path, 'w') as f:
    yaml.dump(config, f, default_flow_style=False, sort_keys=False)

print(f"✅ Continental North America configuration saved: {config_path}")
print(f"\n⚠️  Note: Continental execution requires HPC resources")
print(f"   Estimated domain: ~24.7 million km²")
print(f"   Recommended: 16+ cores, 100+ GB RAM")
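
The `BOUNDING_BOX_COORDS` string appears to use a `lat_max/lon_min/lat_min/lon_max` ordering. This is inferred from the North America value; the tutorial does not state the format. A minimal sketch of a sanity check under that assumption is shown below. `BoundingBox` and `parse_bounding_box` are hypothetical helpers for illustration, not part of SYMFLUENCE.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    lat_max: float
    lon_min: float
    lat_min: float
    lon_max: float


def parse_bounding_box(coords: str) -> BoundingBox:
    """Parse 'lat_max/lon_min/lat_min/lon_max' (assumed ordering) and validate it."""
    parts = coords.split('/')
    if len(parts) != 4:
        raise ValueError(f"Expected 4 '/'-separated values, got {len(parts)}: {coords!r}")
    lat_max, lon_min, lat_min, lon_max = (float(p) for p in parts)
    if not (-90 <= lat_min < lat_max <= 90):
        raise ValueError(f"Latitudes out of range or misordered: {lat_min}, {lat_max}")
    if not (-180 <= lon_min < lon_max <= 180):
        raise ValueError(f"Longitudes out of range or misordered: {lon_min}, {lon_max}")
    return BoundingBox(lat_max, lon_min, lat_min, lon_max)


north_america = parse_bounding_box('85/-178/5/-53')
lat_span = north_america.lat_max - north_america.lat_min
lon_span = north_america.lon_max - north_america.lon_min
print(f"North America box spans {lat_span:.0f} degrees of latitude "
      f"and {lon_span:.0f} degrees of longitude")
```

Running a check like this before saving the YAML file catches swapped or misordered coordinates early, which matters when a failed delineation can cost hours of compute.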

In [None]:
# Initialize SYMFLUENCE
symfluence = SYMFLUENCE(config_path)
project_dir = symfluence.managers['project'].setup_project()

# Create pour point file (technical requirement)
pour_point_path = symfluence.managers['project'].create_pour_point()

print(f"✅ Continental project structure created at: {project_dir}")

## Step 2 — Continental domain delineation

Continental delineation identifies thousands of independent drainage systems across the entire continent, from Arctic watersheds to tropical systems.

### Step 2a — Geospatial attributes

Acquire continental-scale elevation, land cover, and soil data.

**Note**: This step may require several hours depending on data availability and network conditions.

In [None]:
# Step 2a — Continental attribute acquisition
print("⚠️  Continental attribute acquisition may require several hours")
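# The acquisition call is commented out; uncomment it to download the data.
# symfluence.managers['geospatial'].acquire_geospatial_attributes()
print("ℹ️  Acquisition skipped: uncomment the call above to acquire continental attributes")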

### Step 2b — Continental delineation and discretization

Delineate the complete North American drainage network including major river basins and coastal systems.

**Note**: This computational step may require substantial time on HPC systems.
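
To illustrate why a higher `STREAM_THRESHOLD` gives a coarser network, the sketch below applies two thresholds to a synthetic flow-accumulation grid. It assumes the threshold is a minimum number of contributing cells, which is the usual convention for threshold-based delineation. The tutorial does not state the units SYMFLUENCE uses, and the grid here is random data rather than a real DEM product.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic stand-in for a flow-accumulation grid: heavy-tailed values, so most
# cells drain a small area and a few drain a very large one.
flow_accumulation = rng.pareto(a=1.0, size=(500, 500)) * 100


def stream_mask(accumulation: np.ndarray, threshold: float) -> np.ndarray:
    """Mark cells whose accumulated contributing area meets the threshold."""
    return accumulation >= threshold


stream_cell_counts = {}
for threshold in (2000, 15000):
    stream_cell_counts[threshold] = int(stream_mask(flow_accumulation, threshold).sum())
    print(f"threshold={threshold:>6}: {stream_cell_counts[threshold]:,} stream cells")
```

Raising the threshold can only remove stream cells, never add them, so the number of segments and catchments shrinks as the threshold grows.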

In [None]:
# Step 2b — Continental delineation
print("⚠️  Continental delineation is computationally intensive")
print("   Recommended: Submit as HPC batch job")
symfluence.managers['domain'].define_domain()
print("✅ Continental delineation complete")

In [None]:
# Step 2b (continued) — Domain discretization (DOMAIN_DISCRETIZATION = 'GRUs')
hru_path = symfluence.managers['domain'].discretize_domain()
print("✅ Domain discretization complete")

### Step 2c — Verification

Verify the continental domain structure and summarize the drainage network complexity.

In [None]:
# Step 2c — Continental domain verification
catchment_path = project_dir / 'shapefiles' / 'catchment'
network_path = project_dir / 'shapefiles' / 'river_network'

if catchment_path.exists() and network_path.exists():
    # Load shapefiles (may be large)
    print("Loading continental shapefiles...")
    basins_gdf = gpd.read_file(list(catchment_path.glob('*.shp'))[0])
    rivers_gdf = gpd.read_file(list(network_path.glob('*.shp'))[0])
    
    # Continental summary
    if 'DSLINKNO' in rivers_gdf.columns:
        outlet_count = len(rivers_gdf[rivers_gdf['DSLINKNO'] == -1])
    else:
        outlet_count = 'Unknown'
    
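    # Areas are only meaningful in a projected CRS; reproject to an equal-area
    # CRS (EPSG:6933) first if the shapefile uses geographic coordinates.
    if basins_gdf.crs is not None and basins_gdf.crs.is_geographic:
        basins_area = basins_gdf.to_crs('EPSG:6933')
    else:
        basins_area = basins_gdf
    total_area = basins_area.geometry.area.sum() / 1e12  # m² to million km²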
    
    print(f"\n📊 Continental Domain Summary:")
    print(f"   Watersheds: {len(basins_gdf):,}")
    print(f"   Coastal outlets: {outlet_count}")
    print(f"   Stream segments: {len(rivers_gdf):,}")
    print(f"   Total area: {total_area:.1f} million km²")
    
    # Simplified visualization (full rendering may be slow)
    fig, ax = plt.subplots(1, 1, figsize=(14, 10))
    
    # Sample basins for visualization if very large
    if len(basins_gdf) > 1000:
        sample_size = 1000
        basins_sample = basins_gdf.sample(n=sample_size)
        print(f"   Visualizing {sample_size} sample watersheds...")
    else:
        basins_sample = basins_gdf
    
    basins_sample.plot(ax=ax, edgecolor='black', facecolor='lightblue', 
                       alpha=0.3, linewidth=0.3)
    
    ax.set_title(f'North America Continental Domain\n{len(basins_gdf):,} Watersheds', 
                 fontsize=14, fontweight='bold')
    ax.set_xlabel('Longitude')
    ax.set_ylabel('Latitude')
    plt.tight_layout()
    plt.show()
else:
    print("⚠️  Shapefile verification pending - run delineation first")
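
The verification cell counts outlets as segments with `DSLINKNO == -1`. The same columns can be used to group every segment by the outlet it drains to, which gives the size of each independent drainage system. The sketch below uses a toy network held in a plain dictionary; with the real shapefile, the mapping would be built from the `LINKNO` and `DSLINKNO` columns.

```python
from collections import Counter

# Toy network: LINKNO -> DSLINKNO (downstream segment); -1 marks an outlet.
downstream = {1: 3, 2: 3, 3: -1, 4: 5, 5: -1, 6: -1}


def find_outlet(link: int, downstream: dict) -> int:
    """Follow downstream links until reaching a segment whose DSLINKNO is -1."""
    seen = set()
    while downstream[link] != -1:
        if link in seen:
            raise ValueError(f"Cycle detected at segment {link}")
        seen.add(link)
        link = downstream[link]
    return link


# Number of segments draining to each outlet (the outlet segment counts itself).
segments_per_outlet = Counter(find_outlet(link, downstream) for link in downstream)

print(f"Independent drainage systems: {len(segments_per_outlet)}")
for outlet, n_segments in sorted(segments_per_outlet.items()):
    print(f"  outlet {outlet}: {n_segments} segment(s)")
```

For networks with tens of thousands of segments, caching the outlet found for each visited segment avoids repeated traversals. The plain loop is kept here for readability.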

## Step 3 — Continental data processing

Process meteorological forcing and observational data across the continental domain.

**Note**: Continental data processing requires substantial storage (multi-TB) and processing time.
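
A rough estimate shows how storage scales with domain size and simulation length. The sketch below uses the bounding box and experiment period from the configuration. The hourly time step, 0.25° grid spacing, variable count, and 4-byte values are illustrative assumptions rather than properties of any specific forcing product.

```python
from datetime import datetime


def forcing_size_bytes(n_lat, n_lon, n_steps, n_variables, bytes_per_value=4):
    """Uncompressed size of a gridded forcing dataset."""
    return n_lat * n_lon * n_steps * n_variables * bytes_per_value


# Experiment period from the configuration; hourly steps are assumed.
start = datetime(2018, 1, 1, 1)
end = datetime(2018, 1, 31, 23)
n_steps = int((end - start).total_seconds() // 3600) + 1  # inclusive of both ends

# Bounding box 85/-178/5/-53 at a hypothetical 0.25 degree resolution.
resolution_deg = 0.25
n_lat = int((85 - 5) / resolution_deg)
n_lon = int((-53 - (-178)) / resolution_deg)
n_variables = 7  # hypothetical number of forcing variables

size = forcing_size_bytes(n_lat, n_lon, n_steps, n_variables)
print(f"{n_steps} hourly steps over {n_lat * n_lon:,} grid cells")
print(f"Uncompressed size for one month: {size / 1e9:.1f} GB")
print(f"Scaled to 30 years: {size * 12 * 30 / 1e12:.1f} TB")
```

Under these assumptions a single month already occupies a few gigabytes, and a multi-decade record reaches the terabyte range before model outputs are counted.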

### Step 3a — Meteorological forcing

Acquire continental-scale forcing data across diverse climate zones.

In [None]:
# Step 3a — Continental forcing acquisition
print("⚠️  Continental forcing requires substantial storage and processing time")
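# The acquisition call is commented out; uncomment it to download the forcing data.
# symfluence.managers['data'].acquire_forcings()
print("ℹ️  Acquisition skipped: uncomment the call above to acquire continental forcing")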

### Step 3b — Streamflow observations

Process observations from continental-scale gauging networks.

In [None]:
# Step 3b — Continental observation processing
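# The processing call is commented out; uncomment it to process observations.
# symfluence.managers['data'].process_observed_data()
print("ℹ️  Processing skipped: uncomment the call above to process streamflow observations")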

### Step 3c — Model-agnostic preprocessing

Standardize continental-scale data for model consumption.

In [None]:
# Step 3c — Continental preprocessing
symfluence.managers['data'].run_model_agnostic_preprocessing()
print("✅ Continental preprocessing complete")

## Step 4 — Continental model execution

Configure and execute SUMMA-mizuRoute for continental-scale simulation.

**Critical Note**: Continental execution typically requires:
- HPC cluster with 100+ cores
- 100+ GB RAM
- Days to weeks of computation
- Multi-TB storage for outputs
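
Requirements like these arise because the modeling units must be spread across many processes. The sketch below shows one simple way to split units into near-equal contiguous blocks. `partition_units` is a hypothetical helper, and the tutorial does not describe how SYMFLUENCE or SUMMA actually distribute work, so treat it as an illustration of the idea only.

```python
def partition_units(n_units: int, n_processes: int) -> list:
    """Split n_units into contiguous (start, count) blocks whose sizes differ by at most 1."""
    if n_processes < 1:
        raise ValueError("n_processes must be at least 1")
    base, extra = divmod(n_units, n_processes)
    blocks = []
    start = 0
    for rank in range(n_processes):
        # The first `extra` ranks take one additional unit each.
        count = base + (1 if rank < extra else 0)
        blocks.append((start, count))
        start += count
    return blocks


# Hypothetical domain of 10,000 units on the 4 processes set in MPI_PROCESSES.
for rank, (start, count) in enumerate(partition_units(10_000, 4)):
    print(f"rank {rank}: units {start} to {start + count - 1} ({count} units)")
```

Equal counts do not guarantee equal runtimes, since units in snow-dominated or steep terrain may take longer to solve, but a balanced split is a reasonable starting point.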

In [None]:
# Step 4a — Continental model configuration
print("⚠️  Continental model configuration may take substantial time")
symfluence.managers['model'].preprocess_models()
print("✅ Continental model configuration complete")

In [None]:
# Step 4b — Continental model execution
print(f"\n⚠️  Continental Execution Requirements:")
print(f"   Recommended: HPC batch submission")
print(f"\nUse: ./symfluence --config {config_path} --submit_slurm")
print(f"\nRunning {config['HYDROLOGICAL_MODEL']} with {config['ROUTING_MODEL']}...")
symfluence.managers['model'].run_models()
print("✅ Continental simulation complete")

## Step 5 — Continental-scale evaluation

Analyze continental patterns, major river basin performance, and large-scale hydrological processes.

In [None]:
# Step 5 — Continental evaluation framework
simulation_dir = project_dir / 'simulations' / config['EXPERIMENT_ID']
summa_dir = simulation_dir / 'SUMMA'
routing_dir = simulation_dir / 'mizuRoute'

if summa_dir.exists() and routing_dir.exists():
    print("Loading continental simulation outputs...")
    
    summa_files = list(summa_dir.glob('*day.nc'))
    routing_files = list(routing_dir.glob('*.nc'))
    
    if summa_files and routing_files:
        # Load data (may require chunking for large datasets)
        summa_data = xr.open_dataset(summa_files[0], chunks={'hru': 1000})
        routing_data = xr.open_dataset(routing_files[0], chunks={'seg': 1000})
        
        print(f"\n📊 Continental Simulation Summary:")
        print(f"   Simulation period: {len(summa_data.time)} days")
        print(f"   HRUs simulated: {len(summa_data.hru):,}")
        print(f"   Stream segments: {len(routing_data.seg):,}")
        
        # Continental-scale visualization
        fig, axes = plt.subplots(2, 2, figsize=(16, 12))
        
        # Continental SWE distribution
        if 'scalarSWE' in summa_data:
            swe_mean = summa_data['scalarSWE'].mean(dim='time').compute()
            axes[0, 0].hist(swe_mean.values, bins=50, edgecolor='black', alpha=0.7)
            axes[0, 0].set_xlabel('Mean SWE (mm)', fontweight='bold')
            axes[0, 0].set_ylabel('Frequency', fontweight='bold')
            axes[0, 0].set_title('Continental Snow Distribution', fontweight='bold')
            axes[0, 0].grid(True, alpha=0.3)
        
        # Major outlet flows (needs rivers_gdf from the Step 2c verification cell)
        if 'IRFroutedRunoff' in routing_data and 'rivers_gdf' in locals():
            outlets = rivers_gdf[rivers_gdf['DSLINKNO'] == -1]['LINKNO'].values[:100]  # Sample
            outlet_flows = routing_data['IRFroutedRunoff'].sel(seg=outlets).mean(dim='time').compute()
            axes[0, 1].hist(outlet_flows.values, bins=50, edgecolor='black', alpha=0.7, color='blue')
            axes[0, 1].set_xlabel('Mean Flow (m³/s)', fontweight='bold')
            axes[0, 1].set_ylabel('Number of Outlets', fontweight='bold')
            axes[0, 1].set_title('Major Outlet Distribution', fontweight='bold')
            axes[0, 1].grid(True, alpha=0.3)
        
        # Continental runoff patterns
        if 'scalarTotalRunoff' in summa_data:
            runoff_mean = summa_data['scalarTotalRunoff'].mean(dim='time').compute()
            axes[1, 0].hist(runoff_mean.values, bins=50, edgecolor='black', alpha=0.7, color='green')
            axes[1, 0].set_xlabel('Mean Runoff (mm/day)', fontweight='bold')
            axes[1, 0].set_ylabel('Frequency', fontweight='bold')
            axes[1, 0].set_title('Continental Runoff Distribution', fontweight='bold')
            axes[1, 0].grid(True, alpha=0.3)
        
        # Tutorial progression summary
        tutorial_scales = ['Lumped\n(02a)', 'Semi-Dist\n(02b)', 'Elevation\n(02c)', 
                          'Regional\n(03a)', 'Continental\n(03b)']
        scale_units = [1, 15, 45, 100, len(basins_gdf) if 'basins_gdf' in locals() else 10000]
        
        axes[1, 1].bar(tutorial_scales, scale_units, 
                       color=['lightcoral', 'lightgreen', 'lightblue', 'gold', 'mediumpurple'],
                       alpha=0.7, edgecolor='navy')
        axes[1, 1].set_ylabel('Computational Units (log scale)', fontweight='bold')
        axes[1, 1].set_yscale('log')
        axes[1, 1].set_title('Tutorial Progression: Complete Spatial Hierarchy', fontweight='bold')
        axes[1, 1].grid(True, alpha=0.3, axis='y')
        
        plt.suptitle(f'Continental-Scale Analysis — {config["DOMAIN_NAME"]}',
                     fontsize=16, fontweight='bold')
        plt.tight_layout()
        plt.show()
        
        summa_data.close()
        routing_data.close()
    else:
        print("⚠️  Simulation outputs not found")
else:
    print("⚠️  Simulation directories not found")
    print("   Continental evaluation awaits model execution")

print("\n✅ Continental evaluation framework complete")
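
The evaluation cell opens the outputs with `chunks={'hru': 1000}` so that statistics are computed block by block rather than on the whole array at once. The sketch below shows the same idea with plain NumPy on synthetic data: the time mean is computed per block of 1,000 units and matches the result of a single full-array mean. `blockwise_time_mean` is a hypothetical helper, not an xarray or SYMFLUENCE function.

```python
import numpy as np


def blockwise_time_mean(data: np.ndarray, block_size: int = 1000) -> np.ndarray:
    """Time-mean of a (time, hru) array, computed one block of HRUs at a time."""
    n_hru = data.shape[1]
    result = np.empty(n_hru, dtype=np.float64)
    for start in range(0, n_hru, block_size):
        stop = min(start + block_size, n_hru)
        result[start:stop] = data[:, start:stop].mean(axis=0)
    return result


rng = np.random.default_rng(seed=0)
synthetic_swe = rng.random((31, 2500))  # 31 time steps, 2,500 units of synthetic data

means = blockwise_time_mean(synthetic_swe, block_size=1000)
matches_full = bool(np.allclose(means, synthetic_swe.mean(axis=0)))
print(f"Blockwise result shape: {means.shape}, matches full-array mean: {matches_full}")
```

With real outputs each block would be read from disk on demand, so peak memory depends on the block size rather than on the total number of units.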