# CONFLUENCE Tutorial - 7: Continental-Scale Modeling (North America)

## Introduction

This tutorial represents the ultimate scaling challenge in our CONFLUENCE series: continental-scale hydrological modeling. Moving from regional domains like Iceland to an entire continent introduces unprecedented computational complexity and scientific opportunities. Using North America as our example, we'll demonstrate how CONFLUENCE handles the massive data volumes, computational demands, and methodological challenges of modeling hydrology across an entire continent.

### The Scale Challenge

Continental-scale modeling represents a quantum leap in complexity:

**Spatial Scale**: 
- **North America**: ~24.7 million km² (240 times larger than Iceland)
- **Watersheds**: Thousands of independent drainage systems
- **Climate zones**: Arctic to tropical, maritime to continental
- **Elevation range**: Below sea level (Death Valley, -86 m) to >6,000 m (Denali, 6,190 m)

**Computational Scale**:
- **Modeling units**: Tens to hundreds of thousands of HRUs
- **Data volume**: Terabytes of input/output files
- **Processing time**: Days to weeks of computation
- **Memory requirements**: 100+ GB RAM for full continental runs

### Why Continental-Scale Modeling?

Continental modeling addresses scientific questions impossible at smaller scales:

1. **Climate Change Assessment**: Understanding how continental water resources respond to changing climate patterns
2. **Water Security**: Assessing water availability across entire continents for policy and planning
3. **Comparative Hydrology**: Studying how different regions respond to similar climate forcing
4. **Earth System Science**: Providing land surface inputs for global climate models
5. **Transboundary Water Management**: Managing water resources across international boundaries

### North America: A Continental Modeling Laboratory

North America provides an ideal continental-scale case study:

**Geographic Diversity**:
- **Arctic regions**: Permafrost, snow-dominated hydrology
- **Temperate forests**: Complex seasonal cycles
- **Great Plains**: Continental climate, agricultural impacts
- **Mountainous regions**: Elevation-dependent processes, snowmelt-driven systems
- **Coastal areas**: Maritime climate influences
- **Desert regions**: Arid and semi-arid hydrology

**Hydrological Complexity**:
- **Major river systems**: Mississippi, Colorado, Columbia, Mackenzie
- **Great Lakes**: Massive freshwater reservoirs
- **Glacial systems**: Alaska, Canadian Arctic
- **Groundwater aquifers**: Ogallala, regional confined systems
- **Seasonal patterns**: Extreme seasonal variability across regions

**Data Availability**:
- **Extensive gauging networks**: USGS, Water Survey of Canada
- **Satellite observations**: Continental-scale remote sensing
- **Climate data**: Dense station networks in populated regions (sparse in the Arctic), complemented by reanalysis products such as ERA5
- **Topographic data**: High-resolution continental DEMs

### Technical Challenges

Continental modeling introduces unique technical challenges:

1. **Data Volume Management**: Processing and storing terabytes of geospatial and climate data
2. **Computational Efficiency**: Optimizing models for high-performance computing systems
3. **Spatial Heterogeneity**: Representing diverse climate, terrain, and vegetation across the continent
4. **Parameter Estimation**: Calibrating thousands of watersheds with limited observations
5. **Validation**: Assessing model performance across diverse hydrological regimes

### Computational Requirements

Continental modeling demands high-performance computing resources:

- **Memory**: 100+ GB RAM for full domain processing
- **Storage**: 1+ TB for complete input/output datasets
- **Processing**: 40+ CPU cores for parallel execution
- **Runtime**: Days to weeks for full simulations
- **Network**: High-speed data transfer capabilities
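A back-of-envelope estimate shows where the terabyte figures come from. The sketch below uses a hypothetical helper (`forcing_volume_gb` is not part of CONFLUENCE). It assumes hourly forcing, as with ERA5, and seven forcing variables, roughly the meteorological set SUMMA reads. It also assumes one value per HRU per time step, stored as uncompressed 32-bit floats.

```python
def forcing_volume_gb(n_hru: int, n_timesteps: int, n_vars: int = 7,
                      bytes_per_value: int = 4) -> float:
    """Uncompressed forcing size in GB (1 GB = 1e9 bytes).

    Assumes one value per HRU, time step and variable (float32 by default).
    """
    return n_hru * n_timesteps * n_vars * bytes_per_value / 1e9


hours_per_year = 365 * 24  # 8,760 hourly steps (leap years ignored)

# 100,000 HRUs (upper end of the continental range), one year of hourly forcing
one_year = forcing_volume_gb(100_000, hours_per_year)
print(f"One year: {one_year:,.1f} GB")             # 24.5 GB
print(f"40 years: {40 * one_year / 1000:,.2f} TB")  # 0.98 TB
```

Model output, intermediate files, and the raw gridded source data add to this, so multi-decade continental runs quickly reach the terabyte range. NetCDF compression can reduce the on-disk size.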

### Key Configuration Adaptations

Continental modeling requires specific configuration changes:

```yaml
STREAM_THRESHOLD: 7500          # Higher threshold for manageable complexity
MPI_PROCESSES: 40              # Massive parallelization
MIN_GRU_SIZE: 50               # Larger minimum units
BOUNDING_BOX_COORDS: 83.0/-170.0/5.0/-50.0  # Continental extent
DELINEATE_COASTAL_WATERSHEDS: True          # Include ocean-draining basins
```
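Judging by the values above, `BOUNDING_BOX_COORDS` is written as `lat_max/lon_min/lat_min/lon_max` (83°N, 170°W, 5°N, 50°W). The sketch below uses hypothetical helpers to parse that string and compute the area of the latitude/longitude rectangle on a sphere, A = R²·Δλ·(sin φ_north − sin φ_south). It assumes a spherical Earth with a mean radius of 6,371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def parse_bbox(coords: str) -> dict:
    """Split 'lat_max/lon_min/lat_min/lon_max' into named floats."""
    north, west, south, east = (float(v) for v in coords.split('/'))
    if not (south < north and west < east):
        raise ValueError(f"Inconsistent bounding box: {coords}")
    return {'north': north, 'west': west, 'south': south, 'east': east}


def bbox_area_km2(bbox: dict) -> float:
    """Area of a latitude/longitude rectangle on a sphere, in km²."""
    d_lon = math.radians(bbox['east'] - bbox['west'])
    d_sin_lat = (math.sin(math.radians(bbox['north']))
                 - math.sin(math.radians(bbox['south'])))
    return EARTH_RADIUS_KM ** 2 * d_lon * d_sin_lat


bbox = parse_bbox('83.0/-170.0/5.0/-50.0')
print(f"Bounding box area: {bbox_area_km2(bbox) / 1e6:.1f} million km²")  # 77.0
```

The box covers about 77 million km², roughly three times North America's ~24.7 million km². A lat/lon rectangle also takes in large areas of ocean and part of Greenland, and the data extracted for the box includes all of that before delineation discards it.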

### Scientific Applications

Continental-scale modeling enables research impossible at smaller scales:

- **Continental water balance**: Understanding precipitation-evapotranspiration patterns
- **Climate change impacts**: Assessing vulnerability across diverse regions
- **Extreme event analysis**: Continental-scale flood and drought assessment
- **Ecosystem services**: Quantifying water-related services across biomes
- **Policy support**: Informing continental-scale water management decisions

### Methodological Considerations

Continental modeling requires adapted approaches:

1. **Hierarchical Modeling**: Break continent into manageable regions
2. **Parallel Processing**: Leverage MPI for massive parallelization
3. **Staged Execution**: Run shorter periods before full simulations
4. **Adaptive Resolution**: Use coarser resolution in less critical areas
5. **Selective Validation**: Focus detailed validation on key regions
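Hierarchical modeling and staged execution both start by splitting the continental extent into smaller pieces. The following is a minimal sketch that assumes simple rectangular tiles in the same `lat_max/lon_min/lat_min/lon_max` format. `split_bbox` is a hypothetical helper, not a CONFLUENCE function.

```python
def split_bbox(north: float, west: float, south: float, east: float,
               n_lat: int, n_lon: int) -> list[str]:
    """Split a bounding box into n_lat x n_lon tiles.

    Tiles are ordered north-to-south, then west-to-east, and returned as
    'lat_max/lon_min/lat_min/lon_max' strings.
    """
    lat_step = (north - south) / n_lat
    lon_step = (east - west) / n_lon
    tiles = []
    for i in range(n_lat):
        tile_north = north - i * lat_step
        tile_south = north - (i + 1) * lat_step
        for j in range(n_lon):
            tile_west = west + j * lon_step
            tile_east = west + (j + 1) * lon_step
            tiles.append(f"{tile_north}/{tile_west}/{tile_south}/{tile_east}")
    return tiles


for tile in split_bbox(83.0, -170.0, 5.0, -50.0, n_lat=2, n_lon=2):
    print(tile)
```

Rectangular tiles cut across drainage divides. They therefore suit staging data downloads and preprocessing better than the hydrological model itself. For the model, regions are better drawn along major basin boundaries so that no river is split between runs.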

### What You'll Learn

In this tutorial you will learn to:

1. **Configure continental domains** using bounding box coordinates
2. **Manage massive data volumes** with efficient processing strategies
3. **Optimize for HPC resources** with parallel processing configuration
4. **Handle computational complexity** through strategic parameter choices
5. **Understand scaling limitations** and trade-offs at continental scales
6. **Apply continental modeling** to real-world scientific questions

### Tutorial Structure

This tutorial follows the CONFLUENCE workflow but emphasizes continental-scale considerations:

1. **Continental Setup**: Configure domain spanning entire continent
2. **Massive Data Acquisition**: Handle terabyte-scale geospatial datasets
3. **Continental Delineation**: Create thousands of watersheds automatically
4. **HPC-Optimized Processing**: Parallel processing of continental-scale data
5. **Model Configuration**: Prepare inputs for continental-scale simulation
6. **Resource Planning**: Understand computational requirements for execution

### Computational Reality Check

**Important Note**: This tutorial demonstrates continental-scale model setup, which has substantial computational requirements. Running a continental model requires:

- Access to high-performance computing clusters
- Substantial computational allocations (thousands of CPU hours)
- Significant storage capacity and data transfer capabilities
- Expertise in HPC job scheduling and resource management

The tutorial prepares you to understand and configure continental-scale modeling, providing the foundation for execution if appropriate computational resources become available.

### Tutorial Progression Summary

Our complete spatial scaling series:

| Scale | Example | Area | Complexity |
|-------|---------|------|------------|
| Point | Paradise SNOTEL | 0.001 km² | Vertical processes |
| Watershed | Bow at Banff | 2,200 km² | Basin response |
| Regional | Iceland | 103,000 km² | Multiple basins |
| Continental | North America | 24,700,000 km² | Thousands of basins |

This tutorial represents the culmination of our spatial scaling journey, demonstrating CONFLUENCE's capability to handle the most challenging scales in hydrological modeling while maintaining scientific rigor and workflow efficiency.

## Step 1: Continental-Scale Setup with Massive Computational Configuration
Step 1 writes a continental configuration file and initializes CONFLUENCE with it. Relative to the regional Iceland setup, the move to North America changes the configuration along every dimension summarized below.

### Modeling Evolution: Regional → Continental Scale

- **Spatial Scale**: Regional domain (~100,000 km²) → Continental domain (~25,000,000 km²)
- **Drainage Systems**: Multiple independent systems → Thousands of independent systems
- **Computational Units**: Hundreds of HRUs → Tens to hundreds of thousands of HRUs
- **Data Volume**: Gigabytes → Terabytes of input/output data
- **Computational Requirements**: Standard computing → High-performance computing mandatory


```python
# =============================================================================
# STEP 1: CONTINENTAL-SCALE SETUP WITH MASSIVE COMPUTATIONAL CONFIGURATION
# =============================================================================

# Import required libraries
import sys
from pathlib import Path
import yaml
import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd
import numpy as np
from shapely.geometry import box
import contextily as cx
from datetime import datetime
import xarray as xr

# Add CONFLUENCE to path
confluence_path = Path('../').resolve()
sys.path.append(str(confluence_path))

# Import CONFLUENCE
from CONFLUENCE import CONFLUENCE

plt.style.use('default')
%matplotlib inline

print("=== CONFLUENCE Tutorial 03b: Continental-Scale Modeling ===")
print("Ultimate scaling challenge: Regional to continental hydrological modeling")

# =============================================================================
# CONFIGURATION FOR CONTINENTAL NORTH AMERICA MODELING
# =============================================================================

print("\n🌍 Configuring Continental North America Domain...")

# Set directory paths
CONFLUENCE_CODE_DIR = confluence_path
CONFLUENCE_DATA_DIR = Path('/Users/darrieythorsson/compHydro/data/CONFLUENCE_data')  # ← Update this path

# Verify paths exist and create if needed
if not CONFLUENCE_CODE_DIR.exists():
    raise FileNotFoundError(f"CONFLUENCE code directory not found: {CONFLUENCE_CODE_DIR}")

if not CONFLUENCE_DATA_DIR.exists():
    print(f"Data directory doesn't exist. Creating: {CONFLUENCE_DATA_DIR}")
    CONFLUENCE_DATA_DIR.mkdir(parents=True, exist_ok=True)

# Load North America configuration template or create from base template
na_config_path = CONFLUENCE_CODE_DIR / '0_config_files' / 'config_North_America.yaml'

if not na_config_path.exists():
    print("⚠️  North America configuration template not found. Creating from base template...")
    # Load base template and adapt for continental North America
    template_config_path = CONFLUENCE_CODE_DIR / '0_config_files' / 'config_template.yaml'
    with open(template_config_path, 'r') as f:
        config_dict = yaml.safe_load(f)

    # Configure for North America continental modeling
    na_config_updates = {
        'DOMAIN_NAME': 'North_America',
        'EXPERIMENT_ID': 'run_1',
        'BOUNDING_BOX_COORDS': '83.0/-170.0/5.0/-50.0',  # North America continental extent
        'DELINEATE_BY_POURPOINT': False,  # Continental bounding box approach
        'DELINEATE_COASTAL_WATERSHEDS': True,  # Include all coastal systems
        'DOMAIN_DEFINITION_METHOD': 'delineate',
        'DOMAIN_DISCRETIZATION': 'GRUs',
        'STREAM_THRESHOLD': 7500,  # High threshold for continental scale
        'MIN_GRU_SIZE': 50,  # Larger minimum for computational efficiency
        'MPI_PROCESSES': 40,  # Parallel processes for HPC execution
        'HYDROLOGICAL_MODEL': 'SUMMA',
        'ROUTING_MODEL': 'mizuRoute',
        'FORCING_DATASET': 'ERA5',
        'EXPERIMENT_TIME_START': '2018-01-01 01:00',
        'EXPERIMENT_TIME_END': '2018-03-31 23:00',  # Short period for demo
    }
    config_dict.update(na_config_updates)
else:
    with open(na_config_path, 'r') as f:
        config_dict = yaml.safe_load(f)

# Update for tutorial-specific settings
config_updates = {
    'CONFLUENCE_CODE_DIR': str(CONFLUENCE_CODE_DIR),
    'CONFLUENCE_DATA_DIR': str(CONFLUENCE_DATA_DIR),
    'DOMAIN_NAME': 'North_America_tutorial',
    'EXPERIMENT_ID': 'continental_tutorial',
    'EXPERIMENT_TIME_START': '2018-01-01 01:00',
    'EXPERIMENT_TIME_END': '2018-03-31 23:00',  # Short for tutorial demonstration
    'SPATIAL_MODE': 'Distributed',
}

config_dict.update(config_updates)

# Add experiment metadata
config_dict['NOTEBOOK_CREATION_TIME'] = datetime.now().isoformat()
config_dict['NOTEBOOK_CREATOR'] = 'CONFLUENCE_Tutorial_03b'
config_dict['SPATIAL_EVOLUTION'] = 'Regional to continental-scale modeling'

# Save continental configuration
continental_config_path = CONFLUENCE_CODE_DIR / '0_config_files' / 'config_continental_tutorial.yaml'
with open(continental_config_path, 'w') as f:
    yaml.dump(config_dict, f, default_flow_style=False, sort_keys=False)

print(f"✅ Continental North America configuration saved: {continental_config_path}")

# =============================================================================
# INITIALIZE CONFLUENCE FOR CONTINENTAL MODELING
# =============================================================================

print(f"\n⚙️  Initializing CONFLUENCE for Continental Modeling...")

# Initialize CONFLUENCE with continental config
confluence = CONFLUENCE(continental_config_path)
```

```text
=== CONFLUENCE Tutorial 03b: Continental-Scale Modeling ===
Ultimate scaling challenge: Regional to continental hydrological modeling

🌍 Configuring Continental North America Domain...
✅ Continental North America configuration saved: /Users/darrieythorsson/compHydro/code/CONFLUENCE/0_config_files/config_continental_tutorial.yaml

⚙️  Initializing CONFLUENCE for Continental Modeling...

17:45:32 ● CONFLUENCE Logging Initialized
17:45:32 ● Domain: North_America_tutorial
17:45:32 ● Experiment ID: continental_tutorial
17:45:32 ● Log Level: INFO
17:45:32 ● Log File: /Users/darrieythorsson/compHydro/data/CONFLUENCE_data/domain_North_America_tutorial/_workLog_North_America_tutorial/confluence_general_North_America_tutorial_20250718_174532.log

17:45:32 ● Configuration logged to: /Users/darrieythorsson/compHydro/data/CONFLUENCE_data/domain_North_America_tutorial/_workLog_North_America_tutorial/config_North_America_tutorial_20250718_174532.yaml
17:45:32 ● Initializing CONFLUENCE system
17:45:32
```
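Before starting the expensive steps, it can be worth sanity-checking the saved configuration. The sketch below is a hypothetical helper, not a CONFLUENCE validation API. The keys it checks and the time format (`YYYY-MM-DD HH:MM`, as used in the configuration above) are assumptions based on this tutorial.

```python
from datetime import datetime

# Assumed minimum set of keys for a continental run (based on this tutorial)
REQUIRED_KEYS = ['DOMAIN_NAME', 'BOUNDING_BOX_COORDS', 'STREAM_THRESHOLD',
                 'MPI_PROCESSES', 'EXPERIMENT_TIME_START', 'EXPERIMENT_TIME_END']
TIME_FORMAT = '%Y-%m-%d %H:%M'  # assumed format, matching the configuration above


def check_continental_config(cfg: dict) -> list[str]:
    """Return a list of problems found in a configuration dictionary (empty if none)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in cfg]

    # Bounding box: four numbers ordered lat_max/lon_min/lat_min/lon_max
    if 'BOUNDING_BOX_COORDS' in cfg:
        try:
            north, west, south, east = (
                float(v) for v in str(cfg['BOUNDING_BOX_COORDS']).split('/'))
            if not (south < north and west < east):
                problems.append("bounding box: expected lat_max/lon_min/lat_min/lon_max ordering")
        except ValueError:
            problems.append("bounding box: expected four '/'-separated numbers")

    # Parallel processes must be a positive integer
    try:
        if int(cfg.get('MPI_PROCESSES', 1)) < 1:
            problems.append("MPI_PROCESSES must be at least 1")
    except (TypeError, ValueError):
        problems.append("MPI_PROCESSES must be an integer")

    # Simulation period must be parseable and end after it starts
    if 'EXPERIMENT_TIME_START' in cfg and 'EXPERIMENT_TIME_END' in cfg:
        try:
            start = datetime.strptime(str(cfg['EXPERIMENT_TIME_START']), TIME_FORMAT)
            end = datetime.strptime(str(cfg['EXPERIMENT_TIME_END']), TIME_FORMAT)
            if end <= start:
                problems.append("experiment end time must be after start time")
        except ValueError:
            problems.append(f"experiment times must use the format {TIME_FORMAT}")

    return problems


example = {
    'DOMAIN_NAME': 'North_America_tutorial',
    'BOUNDING_BOX_COORDS': '83.0/-170.0/5.0/-50.0',
    'STREAM_THRESHOLD': 7500,
    'MPI_PROCESSES': 40,
    'EXPERIMENT_TIME_START': '2018-01-01 01:00',
    'EXPERIMENT_TIME_END': '2018-03-31 23:00',
}
print(check_continental_config(example) or "No problems found")
```

In the notebook, the same check can be run on the dictionary that was just saved, for example `check_continental_config(config_dict)`.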

## Step 2: Continental-Scale Data Acquisition and Massive Multi-Watershed Delineation
The transition to continental modeling requires handling unprecedented data volumes and computational complexity. Unlike regional modeling with manageable datasets, we now process terabyte-scale geospatial data and delineate thousands of independent drainage systems across an entire continent. This represents the ultimate challenge in spatial data processing, requiring high-performance computing infrastructure and sophisticated data management strategies.

### Scientific Context: Continental-Scale Geospatial Processing

**Massive Data Acquisition Principles:**
- **Terabyte-Scale Datasets**: Continental DEMs, climate data, and attribute datasets measured in terabytes
- **Multi-Resolution Integration**: Combining global datasets with regional high-resolution data where available
- **Distributed Processing**: HPC-based parallel processing mandatory for reasonable execution times
- **Storage Infrastructure**: High-capacity, high-speed storage systems for data management
- **Quality Control at Scale**: Automated validation across continental-scale heterogeneous datasets
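Quality control at scale usually means computing summaries chunk by chunk instead of loading a whole continental dataset into memory. The following is a minimal sketch of a running accumulator. `RunningStats` is a hypothetical class that tracks count, sum, minimum, maximum, and non-finite values across chunks. The commented geopandas loop is an assumed usage pattern, not a CONFLUENCE routine.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Accumulates summary statistics over chunks without holding all values."""
    count: int = 0
    total: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf
    n_invalid: int = 0  # NaN or infinite values flagged for quality control

    def update(self, values) -> None:
        for v in values:
            v = float(v)
            if not math.isfinite(v):
                self.n_invalid += 1
                continue
            self.count += 1
            self.total += v
            self.minimum = min(self.minimum, v)
            self.maximum = max(self.maximum, v)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else math.nan


# Assumed usage with a large shapefile, 50,000 features at a time:
#   n = pyogrio.read_info(path)['features']
#   for start in range(0, n, 50_000):
#       chunk = gpd.read_file(path, rows=slice(start, start + 50_000))
#       stats.update(chunk.to_crs('ESRI:102008').geometry.area / 1e6)

stats = RunningStats()
for chunk in ([12.0, 40.5], [7.5, float('nan')], [100.0]):
    stats.update(chunk)
print(stats.count, stats.n_invalid, stats.minimum, stats.maximum, stats.mean)
```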

**Continental Delineation Complexity:**
- **Thousands of Watersheds**: North America contains thousands of independent drainage systems
- **Extreme Topographic Diversity**: Arctic tundra to tropical highlands, coastal plains to continental divides
- **Multi-National Boundaries**: Watersheds spanning multiple countries with different data standards
- **Scale-Dependent Processing**: Algorithms optimized for continental extent while maintaining accuracy

The continental approach captures the complete hydrological picture of an entire continent, essential for climate change assessment, transboundary water management, and Earth system science applications.


```python
# =============================================================================
# STEP 2: CONTINENTAL-SCALE DATA ACQUISITION AND MASSIVE MULTI-WATERSHED DELINEATION
# =============================================================================

import pyogrio  # Installed by default with geopandas >= 1.0; reads feature counts without loading geometries

# Equal-area projection (North America Albers Equal Area Conic) for area calculations.
# Computing .area directly on geographic (lat/lon) coordinates returns square degrees.
EQUAL_AREA_CRS = 'ESRI:102008'


def areas_km2(gdf):
    """Polygon areas in km², computed in an equal-area projection."""
    return gdf.to_crs(EQUAL_AREA_CRS).geometry.area / 1e6


print(f"\n🗂️  Initializing Continental Project Structure...")

# Initialize continental project directory structure
project_dir = Path(confluence.managers['project'].setup_project())

# Create pour point (required for technical reasons but not used for continental delineation)
pour_point_path = confluence.managers['project'].create_pour_point()

print(f"✅ Continental project structure created")
print(f"   📁 Project directory: {project_dir}")
print(f"   🎯 Pour point file: Created (not used for continental delineation)")

# List the top-level directories created for the project
print(f"\n📋 Continental Project Directory Structure:")
created_dirs = sorted(p.name for p in project_dir.iterdir() if p.is_dir())
for dir_name in created_dirs:
    print(f"   📁 {dir_name}")

print(f"\n🌍 Continental-Scale Geospatial Data Acquisition...")

# Execute continental geospatial data acquisition
confluence.managers['data'].acquire_attributes()

print("✅ Continental geospatial data acquisition complete")

# =============================================================================
# CONTINENTAL DOMAIN DELINEATION: THOUSANDS OF WATERSHEDS
# =============================================================================

print(f"\n🌊 Continental Multi-Watershed Delineation Process...")

print(f"\n🔧 Continental Delineation Configuration:")
continental_delineation_config = [
    f"Method: {confluence.config['DOMAIN_DEFINITION_METHOD']} (continental watershed delineation)",
    f"Pour point mode: {confluence.config['DELINEATE_BY_POURPOINT']} (continental bounding box)",
    f"Coastal watersheds: {confluence.config.get('DELINEATE_COASTAL_WATERSHEDS', True)} (all coastal systems)",
    f"Stream threshold: {confluence.config['STREAM_THRESHOLD']} (continental-scale optimization)",
    f"MPI processes: {confluence.config['MPI_PROCESSES']} (massive parallelization)",
    f"Expected watersheds: Thousands of independent drainage systems"
]

for config_item in continental_delineation_config:
    print(f"   ⚙️  {config_item}")

# Execute continental domain delineation
watershed_path = confluence.managers['domain'].define_domain()

print("✅ Continental multi-watershed delineation complete")

# =============================================================================
# CONTINENTAL DRAINAGE SYSTEM ANALYSIS: THOUSANDS OF WATERSHEDS
# =============================================================================

print(f"\n📊 Continental Drainage System Analysis...")

# Load and analyze continental watersheds
basin_path = project_dir / 'shapefiles' / 'river_basins'
network_path = project_dir / 'shapefiles' / 'river_network'

continental_basin_count = 0
basin_files = []
continental_basins_gdf = None

if basin_path.exists():
    basin_files = list(basin_path.glob('*.shp'))
    if basin_files:
        try:
            print(f"📊 Analyzing continental watershed results...")

            # Exact watershed count from the file header, without loading geometries
            continental_basin_count = pyogrio.read_info(basin_files[0])['features']
            basin_file_size = basin_files[0].stat().st_size / (1024**2)  # Size in MB
            print(f"   Basin shapefile size: {basin_file_size:.1f} MB")
            print(f"   Total watersheds: {continental_basin_count:,}")

            if basin_file_size > 1000:  # If larger than ~1 GB, analyze a sample only
                print(f"   ⚠️  Large continental dataset detected")
                print(f"   Loading sample of watersheds for analysis...")

                sample_size = 1000
                continental_basins_sample = gpd.read_file(basin_files[0], rows=slice(0, sample_size))

                if not continental_basins_sample.empty:
                    sample_areas = areas_km2(continental_basins_sample)
                    avg_area = sample_areas.mean()

                    print(f"   Sample watersheds: {len(continental_basins_sample):,}")
                    print(f"   Sample area: {sample_areas.sum():,.0f} km²")
                    print(f"   Average watershed size (sample): {avg_area:.1f} km²")
                    # The first rows are not a random sample, so this is only a rough estimate
                    print(f"   Estimated total area: ~{avg_area * continental_basin_count:,.0f} km²")

            else:
                # Load full dataset if manageable
                continental_basins_gdf = gpd.read_file(basin_files[0])
                print(f"✅ Continental watersheds loaded")

                if not continental_basins_gdf.empty:
                    basin_areas = areas_km2(continental_basins_gdf)

                    print(f"   Total continental area: {basin_areas.sum():,.0f} km²")
                    print(f"   Average watershed size: {basin_areas.mean():.1f} km²")
                    print(f"   Watershed size range: {basin_areas.min():.1f} to {basin_areas.max():.1f} km²")

                    # Analyze continental watershed characteristics
                    if 'elevation' in continental_basins_gdf.columns:
                        elev = continental_basins_gdf['elevation']
                        print(f"   Elevation diversity: {elev.min():.0f}m to {elev.max():.0f}m")
                        print(f"   Continental elevation span: {elev.max() - elev.min():.0f}m")

        except Exception as e:
            print(f"❌ Error analyzing continental basin data: {str(e)}")
            print(f"   This may indicate memory limitations with continental-scale datasets")
    else:
        print(f"❌ No basin shapefiles found in {basin_path}")
else:
    print(f"❌ Basin directory not found: {basin_path}")

# Analyze continental stream network
continental_network_count = 0
network_files = []
continental_rivers_gdf = None

if network_path.exists():
    network_files = list(network_path.glob('*.shp'))
    if network_files:
        try:
            # File size, plus the exact segment count from the file header
            network_file_size = network_files[0].stat().st_size / (1024**2)  # Size in MB
            continental_network_count = pyogrio.read_info(network_files[0])['features']
            print(f"\n🌊 Continental Stream Network Analysis:")
            print(f"   Stream network file size: {network_file_size:.1f} MB")
            print(f"   Stream segments: {continental_network_count:,}")

            if network_file_size > 500:  # Large file handling
                print(f"   ⚠️  Large continental stream network detected")
                print(f"   Loading sample for analysis...")

                # Load sample of stream network (first rows, not a random sample)
                sample_streams = gpd.read_file(network_files[0], rows=slice(0, 1000))
                print(f"   Sample stream segments: {len(sample_streams):,}")

                if 'Length' in sample_streams.columns and not sample_streams.empty:
                    # 'Length' is assumed to be in metres, hence the division by 1000
                    sample_length = sample_streams['Length'].sum() / 1000  # km
                    mean_length = sample_streams['Length'].mean() / 1000  # km per segment
                    print(f"   Sample stream length: {sample_length:,.0f} km")
                    # Extrapolate from the mean sample segment length to all segments
                    print(f"   Estimated total length: ~{mean_length * continental_network_count:,.0f} km")

            else:
                # Load full network if manageable
                continental_rivers_gdf = gpd.read_file(network_files[0])
                print(f"✅ Continental stream network loaded")

                if 'Length' in continental_rivers_gdf.columns:
                    total_length = continental_rivers_gdf['Length'].sum() / 1000  # km
                    print(f"   Total stream length: {total_length:,.0f} km")

        except Exception as e:
            print(f"❌ Error analyzing continental stream network: {str(e)}")
    else:
        print(f"❌ No stream network files found in {network_path}")
else:
    print(f"❌ Stream network directory not found: {network_path}")
```


```text
🗂️  Initializing Continental Project Structure...
17:53:34 ● Setting up project for domain: North_America_tutorial
17:53:34 ● Project directory created at: /Users/darrieythorsson/compHydro/data/CONFLUENCE_data/domain_North_America_tutorial
17:53:34 ● Pour point shapefile created successfully: /Users/darrieythorsson/compHydro/data/CONFLUENCE_data/domain_North_America_tutorial/shapefiles/pour_point/North_America_tutorial_pourPoint.shp
✅ Continental project structure created
   📁 Project directory: /Users/darrieythorsson/compHydro/data/CONFLUENCE_data/domain_North_America_tutorial
   🎯 Pour point file: Created (not used for continental delineation)

📋 Continental Project Directory Structure:

🌍 Continental-Scale Geospatial Data Acquisition...
17:53:34 ● Starting attribute acquisition
17:53:34 ● Acquiring elevation data
usage: extract-gis.sh -d DATASET -io DIR -v var1[,var2,[...]] [-jVhEu] [-t BOOL] [-c DIR] [-se DATE] [-r INT] [-ln REAL,REAL] [-f PATH] [-F STR] [-p STR] [-a stat1[,stat2,

getopt: illegal option -- n


CalledProcessError: Command '['/Users/darrieythorsson/compHydro/data/CONFLUENCE_data/installs/gistool/extract-gis.sh', '--dataset=MERIT-Hydro', '--dataset-dir=/work/comphyd_lab/data/geospatial-data/MERIT-Hydro', '--output-dir=/Users/darrieythorsson/compHydro/data/CONFLUENCE_data/domain_North_America_tutorial/attributes/elevation/dem', '--lat-lims=85,5', '--lon-lims=-180,-53', '--variable=elv', '--prefix=domain_North_America_tutorial_', '--print-geotiff=true', '--cache=/work/comphyd_lab/users/darri/cache_North_America_tutorial', '--cluster=/work/comphyd_lab/users/darri/data/CONFLUENCE_data/installs/datatool/etc/clusters/ucalgary-arc.json']' returned non-zero exit status 1.
```
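The log above shows attribute acquisition failing on a laptop. `getopt` rejected the `-n` option passed by `extract-gis.sh`, which is consistent with the BSD `getopt` that ships with macOS rather than GNU `getopt`. In addition, the dataset, cache, and cluster paths in the command (`/work/comphyd_lab/data/geospatial-data/MERIT-Hydro`, `ucalgary-arc.json`) point to an HPC file system, so this step appears intended to run on the cluster that hosts those datasets. A quick preflight check for GNU `getopt` can catch the first problem before a long job is launched. The helper below is a hypothetical sketch. It relies on GNU `getopt -T` exiting with status 4, which BSD `getopt` does not do.

```python
import shutil
import subprocess


def has_gnu_getopt(executable: str = "getopt") -> bool:
    """Return True if `executable` on PATH is GNU (enhanced) getopt.

    GNU getopt's `-T` test option exits with status 4; other versions do not.
    """
    path = shutil.which(executable)
    if path is None:
        return False  # no getopt on PATH at all
    result = subprocess.run([path, "-T"], capture_output=True)
    return result.returncode == 4


if not has_gnu_getopt():
    print("GNU getopt not found on PATH; shell tools that use long options may fail")
```

On macOS, GNU `getopt` can be installed with Homebrew (`gnu-getopt`), but the formula is keg-only and must be placed on `PATH` explicitly. The HPC-specific data paths still require running on the cluster.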

## Step 3: Continental-Scale Data Pipeline and Massive Computational Processing
The same model-agnostic preprocessing framework now scales to handle tens to hundreds of thousands of computational units across an entire continent, representing the ultimate challenge in hydrological data processing. Unlike previous tutorials managing hundreds or thousands of units, we now orchestrate massive parallel processing across continental-scale heterogeneity, requiring high-performance computing infrastructure and sophisticated memory management strategies.

### Data Pipeline Scaling: Regional → Continental Magnitude

- **Computational Units**: Hundreds to thousands of HRUs → Tens to hundreds of thousands of HRUs
- **Forcing Data Volume**: Gigabytes → Multi-terabyte meteorological datasets
- **Processing Complexity**: Regional diversity → Continental heterogeneity across all climate zones
- **Memory Management**: Standard RAM → Distributed memory systems with 100+ GB requirements
- **Infrastructure**: Single-node processing → HPC clusters with massive parallelization

### Continental-Scale Processing Considerations

**Extreme Hydrological Diversity**: North America encompasses every major hydrological regime - Arctic permafrost systems, temperate forests, arid deserts, tropical highlands, glacial systems, and coastal zones - each requiring specialized process representation within the same modeling framework.

**Terabyte-Scale Forcing Distribution**: Continental meteorological datasets reach terabyte scales, requiring efficient data streaming, parallel I/O, and memory management distributed across many compute nodes.

**Massive Computational Orchestration**: Coordinating preprocessing across tens to hundreds of thousands of HRUs demands advanced job scheduling, load balancing, and fault tolerance mechanisms typical of supercomputing applications.
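A common way to balance work across a fixed number of MPI processes is the greedy longest-processing-time (LPT) heuristic. It sorts units by estimated cost and always hands the next one to the least-loaded process, which keeps the busiest process within 4/3 of the optimal load. The sketch below uses the number of HRUs per GRU as a cost proxy. It illustrates the technique rather than how CONFLUENCE or SUMMA actually partitions GRUs, and `balance_grus` is a hypothetical helper.

```python
import heapq


def balance_grus(hru_counts: dict, n_procs: int) -> list[list]:
    """Assign GRUs to processes with the greedy LPT heuristic.

    hru_counts maps GRU ID -> number of HRUs (used as a proxy for cost).
    Returns one list of GRU IDs per process.
    """
    heap = [(0, p) for p in range(n_procs)]  # min-heap of (current load, process index)
    assignment = [[] for _ in range(n_procs)]
    # Largest GRUs first, so the small ones placed last can even out the loads
    for gru_id, cost in sorted(hru_counts.items(), key=lambda kv: kv[1], reverse=True):
        load, proc = heapq.heappop(heap)  # least-loaded process
        assignment[proc].append(gru_id)
        heapq.heappush(heap, (load + cost, proc))
    return assignment


hru_counts = {'A': 5, 'B': 4, 'C': 3, 'D': 3, 'E': 2, 'F': 1}
assignment = balance_grus(hru_counts, n_procs=2)
loads = [sum(hru_counts[g] for g in procs) for procs in assignment]
print(assignment, loads)  # [['A', 'D', 'F'], ['B', 'C', 'E']] [9, 9]
```

Real runtimes depend on more than HRU count, for example snow and frozen-soil processes in the Arctic or short adaptive time steps in steep terrain. Measured runtimes from a short staged run are therefore a better cost estimate once available.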

**Quality Control at Scale**: Ensuring data consistency and scientific validity across continental heterogeneity requires automated validation systems capable of handling millions of data points and identifying anomalies across diverse conditions.

The same preprocessing philosophy maintains consistent data standards across this unprecedented spatial and computational complexity while requiring fundamental adaptations for high-performance computing environments.


# =============================================================================
# STEP 3: CONTINENTAL-SCALE DATA PIPELINE AND MASSIVE COMPUTATIONAL PROCESSING
# =============================================================================

import pyogrio  # Installed by default with geopandas >= 1.0; reads feature counts without loading geometries

print(f"\n🌍 Continental Domain Discretization Process...")

print(f"   Discretization method: {confluence.config['DOMAIN_DISCRETIZATION']}")

print(f"\n⚙️  Executing continental domain discretization...")

# Execute continental domain discretization
hru_path = confluence.managers['domain'].discretize_domain()

print("✅ Continental domain discretization complete")

# =============================================================================
# COMPUTATIONAL UNIT ANALYSIS
# =============================================================================

print(f"\n📊 Massive Computational Unit Analysis...")

# Load and analyze continental computational units
catchment_path = project_dir / 'shapefiles' / 'catchment'
continental_hru_count = 0
continental_hru_gdf = None

if catchment_path.exists():
    hru_files = list(catchment_path.glob('*.shp'))
    if hru_files:
        try:
            # Check file size for continental HRU dataset
            hru_file_size = hru_files[0].stat().st_size / (1024**3)  # Size in GB
            print(f"✅ Continental HRU dataset created")
            print(f"   HRU shapefile size: {hru_file_size:.2f} GB")

            # Shapefile components are conventionally limited to 2 GB, so a 5 GB
            # threshold would never trigger; sample anything above 1 GB instead
            if hru_file_size > 1.0:
                print(f"   ⚠️  Massive continental HRU dataset detected")
                print(f"   Using statistical sampling for analysis...")

                # Exact HRU count from the file header, without loading geometries
                continental_hru_count = pyogrio.read_info(hru_files[0])['features']
                print(f"   Total HRUs: {continental_hru_count:,}")

                # Load sample for computational analysis (first rows, not a random sample)
                sample_size = 5000  # Larger sample for continental scale
                continental_hru_sample = gpd.read_file(hru_files[0], rows=slice(0, sample_size))
                print(f"   Sample HRUs analyzed: {len(continental_hru_sample):,}")

                if 'GRU_ID' in continental_hru_sample.columns:
                    sample_grus = continental_hru_sample['GRU_ID'].nunique()
                    if sample_grus > 0:
                        hrus_per_gru = len(continental_hru_sample) / sample_grus
                        print(f"   Sample GRUs: {sample_grus:,}")
                        print(f"   Average HRUs per GRU (sample): {hrus_per_gru:.1f}")
                
            else:
                # Load full dataset if manageable
                continental_hru_gdf = gpd.read_file(hru_files[0])
                continental_hru_count = len(continental_hru_gdf)
                
                print(f"   Total HRUs: {continental_hru_count:,}")
                
                if 'GRU_ID' in continental_hru_gdf.columns:
                    unique_grus = continental_hru_gdf['GRU_ID'].nunique()
                    print(f"   Unique GRUs: {unique_grus:,}")
                    
                    if unique_grus > 0:
                        avg_hrus_per_gru = continental_hru_count / unique_grus
                        print(f"   Average HRUs per GRU: {avg_hrus_per_gru:.1f}")
                
        except Exception as e:
            print(f"❌ Error analyzing continental HRUs: {str(e)}")
            print(f"   This may indicate memory limitations with continental datasets")
            continental_hru_count = continental_basin_count * 5  # Rough estimate
    else:
        print(f"📋 No continental HRU files found")
        continental_hru_count = continental_basin_count * 5  # Rough estimate
else:
    print(f"📋 HRU directory not found")
    continental_hru_count = continental_basin_count * 5  # Rough estimate

# =============================================================================
# FORCING DATA ACQUISITION AND PROCESSING
# =============================================================================

print(f"\n🌡️  Continental Terabyte-Scale Forcing Data Pipeline...")

# Check for existing forcing data or estimate requirements
forcing_dir = project_dir / 'forcing' / 'raw_data'
forcing_data_available = forcing_dir.exists() and len(list(forcing_dir.glob('*.nc'))) > 0

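if not forcing_data_available:
    print(f"\n⬇️  Continental forcing data not found locally")
    
    # The acquisition call is commented out because continental forcing downloads
    # can reach terabytes; uncomment it to fetch the configured forcing dataset
    # confluence.managers['data'].acquire_forcings()

    print("⏭️  Continental forcing data acquisition skipped (call is commented out)")
    
else:
    print(f"\n✅ Continental forcing data available")
    print(f"   Reusing existing terabyte-scale meteorological datasets")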

# =============================================================================
# MASSIVE MODEL-AGNOSTIC PREPROCESSING
# =============================================================================

print(f"\n⚙️  Executing Continental Model-Agnostic Preprocessing...")

confluence.managers['data'].run_model_agnostic_preprocessing()

print("✅ Continental model-agnostic preprocessing complete")

# =============================================================================
# MODEL-SPECIFIC PREPROCESSING FOR CONTINENTAL INFRASTRUCTURE
# =============================================================================

print(f"\n🔧 Model-Specific Preprocessing: Continental SUMMA + mizuRoute...")

confluence.managers['model'].preprocess_models()

print("✅ Continental model-specific preprocessing complete")
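
The HRU estimate in the computational unit analysis above scales the sample's HRUs-per-GRU ratio up to the full GRU count. A minimal, dependency-free sketch of that ratio estimator is below. The function name `estimate_total_hrus` is hypothetical, and the estimate is only as representative as the sample: reading the first rows of a shapefile yields a contiguous block, not a random draw.

```python
from typing import Sequence


def estimate_total_hrus(sample_gru_ids: Sequence[int], total_gru_count: int) -> int:
    """Scale a sample's HRUs-per-GRU ratio up to the full domain.

    Hypothetical helper: `sample_gru_ids` holds the GRU ID of each sampled HRU
    (one entry per HRU row); `total_gru_count` is the number of GRUs in the domain.
    """
    if not sample_gru_ids or total_gru_count <= 0:
        return 0
    hrus_per_gru = len(sample_gru_ids) / len(set(sample_gru_ids))
    return int(hrus_per_gru * total_gru_count)


# 6 sampled HRUs spread over 3 GRUs -> 2 HRUs per GRU
print(estimate_total_hrus([1, 1, 2, 2, 3, 3], total_gru_count=1000))  # 2000
```

A GRU cut off at the sample boundary contributes fewer HRUs than it really has, so a small contiguous sample tends to bias the ratio slightly low.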



## Step 4: Continental-Scale Model Execution and Supercomputing Infrastructure
The same SUMMA process-based physics now faces its ultimate scaling challenge: execution across tens to hundreds of thousands of computational units spanning an entire continent. This represents a fundamental transition from standard computing to supercomputing applications, requiring sophisticated job scheduling, massive parallel coordination, and infrastructure typically reserved for climate modeling and weather prediction systems.
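
One common way to parallelize SUMMA is to split the GRUs into contiguous batches and launch one run per batch, for example as entries of a scheduler job array. The sketch below computes 1-based `(startGRU, countGRU)` pairs of the kind SUMMA's `-g` option takes. The helper name, the `summa.exe` executable name and the `fileManager.txt` path are placeholders, so check the option syntax against your SUMMA build's help output.

```python
def gru_batches(n_grus: int, n_jobs: int) -> list[tuple[int, int]]:
    """Split GRUs 1..n_grus into at most n_jobs contiguous (start, count) batches.

    Starts are 1-based; batch sizes differ by at most one GRU.
    """
    if n_grus <= 0 or n_jobs <= 0:
        return []
    n_jobs = min(n_jobs, n_grus)  # never create empty batches
    base, extra = divmod(n_grus, n_jobs)
    batches, start = [], 1
    for job in range(n_jobs):
        count = base + (1 if job < extra else 0)  # first `extra` batches get one more GRU
        batches.append((start, count))
        start += count
    return batches


# Placeholder executable and file manager path; adapt to your installation
for start, count in gru_batches(n_grus=10, n_jobs=3):
    print(f"summa.exe -g {start} {count} -m fileManager.txt")
```

Contiguous batches mean that concatenating the per-batch outputs in batch order reproduces the original GRU order.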

### Model Execution Scaling: Regional → Continental Supercomputing

- **Computational Infrastructure**: Standard HPC clusters → National supercomputing facilities
- **Resource Requirements**: 100s of cores → 1000s of cores with specialized interconnects
- **Memory Architecture**: Shared memory systems → Distributed memory supercomputers
- **Execution Time**: Hours/days → Weeks/months for full continental simulations
- **Job Management**: Simple batch jobs → Complex workflow orchestration systems
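
The jump in execution time follows from simple arithmetic: total work grows with HRUs × time steps, and wall-clock time is that work divided by the cores that can run concurrently. The sketch below makes this explicit. The per-HRU-step cost and the parallel efficiency are hypothetical inputs you would measure on a short benchmark run, not published SUMMA figures.

```python
def estimate_wall_clock_hours(
    n_hrus: int,
    n_timesteps: int,
    seconds_per_hru_step: float,
    n_cores: int,
    parallel_efficiency: float = 1.0,
) -> float:
    """Back-of-envelope wall-clock hours for a run parallelized over HRUs/GRUs.

    seconds_per_hru_step and parallel_efficiency are assumptions to calibrate
    from a short benchmark run on the target machine.
    """
    if not 0.0 < parallel_efficiency <= 1.0:
        raise ValueError("parallel_efficiency must be in (0, 1]")
    core_seconds = n_hrus * n_timesteps * seconds_per_hru_step
    # No more cores than HRUs can be kept busy
    busy_cores = max(1, min(n_cores, n_hrus))
    return core_seconds / (busy_cores * parallel_efficiency) / 3600.0


# Hypothetical inputs: 100,000 HRUs, 10 years of hourly steps, 1 ms per HRU-step, 1,000 cores
hours = estimate_wall_clock_hours(100_000, 10 * 365 * 24, 1e-3, 1_000)
print(f"{hours:.2f} h")  # 2.43 h
```

Calibration or ensemble experiments multiply this figure by the number of model runs, which is one reason full continental studies can stretch from hours to weeks.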


In [None]:
# =============================================================================
# STEP 4: CONTINENTAL-SCALE MODEL EXECUTION
# =============================================================================

# Execute the coupled SUMMA + mizuRoute model system across the continental domain
print(f"\n⚙️  Running continental multi-watershed simulation...")
confluence.managers['model'].run_models()

print("✅ Continental multi-watershed simulation complete")

## Step 5: Continental-Scale Analysis and Earth System Assessment
The culmination of our modeling series moves beyond traditional watershed performance evaluation to Earth system science applications. With the model run across tens of thousands of computational units, we now analyze continental water resources, climate sensitivity, and hydrological patterns across an entire continent. This is the final step in the series' spatial progression: it yields insights that smaller domains cannot provide and demonstrates how CONFLUENCE can contribute to global Earth system understanding.

### Analysis Framework Evolution: Regional → Continental Earth System Science

- **Assessment Scale**: Regional pattern analysis → Continental Earth system component analysis  
- **Scientific Applications**: Regional water resources → Global climate model inputs and continental water security
- **Comparative Scope**: Inter-watershed analysis → Continental-scale statistical hydrology across thousands of systems
- **Policy Relevance**: Regional water management → National and international water policy frameworks
- **Earth System Integration**: Regional hydrology → Continental water cycle component for climate science

### Continental-Scale Analysis Capabilities

**Continental Water Balance**: Comprehensive assessment of precipitation, evapotranspiration, runoff, and storage across North America's complete hydrological spectrum, providing unprecedented insights into continental-scale water cycle dynamics.
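
Continental water balance totals should weight each HRU by its area, because HRUs in a continental discretization can differ widely in size; a plain mean over the `hru` dimension counts a small mountain HRU as much as a large prairie one. A minimal sketch with plain Python lists follows. The argument names are illustrative; in practice the areas would come from the HRU shapefile or the model's attribute file.

```python
from typing import Sequence


def area_weighted_mean(values: Sequence[float], areas: Sequence[float]) -> float:
    """Area-weighted mean of a per-HRU quantity (e.g., mean SWE or runoff depth)."""
    if len(values) != len(areas):
        raise ValueError("values and areas must have the same length")
    total_area = sum(areas)
    if total_area <= 0:
        raise ValueError("total area must be positive")
    return sum(v * a for v, a in zip(values, areas)) / total_area


# Two HRUs: 10 mm over 1 km2 and 30 mm over 3 km2 (the unweighted mean would be 20 mm)
print(area_weighted_mean([10.0, 30.0], [1.0, 3.0]))  # 25.0
```

With xarray, `var_data.weighted(hru_area).mean('hru')` performs the same reduction, assuming `hru_area` is a DataArray aligned on the `hru` dimension.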

**Climate Change Sensitivity**: Analysis of how thousands of diverse watersheds respond to climate forcing, enabling robust statistical assessment of climate change impacts across the full range of North American hydrological conditions.
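
One standard way to summarize climate sensitivity across many watersheds is the precipitation elasticity of streamflow: the percentage change in flow per percentage change in precipitation. The sketch below uses the median-based nonparametric estimator of Sankarasubramanian et al. (2001) on paired annual series. It is one option among several, not necessarily what CONFLUENCE computes, and the function name is hypothetical.

```python
from statistics import mean, median
from typing import Sequence


def precipitation_elasticity(precip: Sequence[float], flow: Sequence[float]) -> float:
    """Median-based nonparametric elasticity of streamflow to precipitation.

    precip, flow: paired annual totals for one watershed.
    Years where precipitation equals its mean are skipped to avoid dividing by zero.
    """
    if len(precip) != len(flow) or len(precip) < 2:
        raise ValueError("need at least two paired years")
    p_bar, q_bar = mean(precip), mean(flow)
    if q_bar == 0:
        raise ValueError("mean flow must be non-zero")
    ratios = [
        (q - q_bar) / (p - p_bar) * (p_bar / q_bar)
        for p, q in zip(precip, flow)
        if p != p_bar
    ]
    if not ratios:
        raise ValueError("precipitation shows no variability")
    return median(ratios)


# Flow proportional to precipitation gives an elasticity of 1
print(precipitation_elasticity([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # 1.0
```

Computing this per watershed and mapping the values is one way to compare sensitivity across the full range of North American hydrological conditions.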

**Earth System Science Applications**: Continental-scale land surface fluxes and water balance components essential for global climate models, weather prediction systems, and Earth system research.
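
Climate models exchange land surface fluxes in energy units, while water balance analyses use depths. SUMMA's `scalarLatHeatTotal` is a latent heat flux in W m⁻², so comparing it with precipitation or runoff requires dividing by the latent heat of vaporization. The sketch below assumes λ ≈ 2.45 × 10⁶ J kg⁻¹ (a typical near-surface value that varies slightly with temperature) and uses the fact that 1 kg m⁻² of water equals 1 mm of depth. Check the model's sign convention for this flux in the output attributes before interpreting negative values.

```python
LATENT_HEAT_VAPORIZATION = 2.45e6  # J kg-1, approximate value near 20 °C (assumption)
SECONDS_PER_DAY = 86_400


def latent_heat_to_et_mm_per_day(latent_heat_w_m2: float) -> float:
    """Convert a latent heat flux (W m-2) to an evapotranspiration rate (mm day-1).

    W m-2 = J s-1 m-2; dividing by lambda gives kg m-2 s-1, and 1 kg m-2 of water = 1 mm.
    The sign is preserved, so apply the model's sign convention separately.
    """
    return latent_heat_w_m2 / LATENT_HEAT_VAPORIZATION * SECONDS_PER_DAY


print(f"{latent_heat_to_et_mm_per_day(100.0):.2f} mm/day")  # 3.53 mm/day
```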

**Transboundary Water Resources**: Assessment of water availability and variability across international boundaries, supporting continental-scale water management and policy decisions.
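
Transboundary accounting usually needs volumes or discharges rather than depths. Converting a mean annual runoff depth over a basin, or over the part of a basin on one side of a border, into a mean discharge is straightforward arithmetic. The helpers below are hypothetical illustrations that use a 365.25-day year.

```python
SECONDS_PER_YEAR = 365.25 * 86_400  # Julian year in seconds


def runoff_depth_to_discharge(depth_mm_per_year: float, area_km2: float) -> float:
    """Mean discharge (m3 s-1) from a mean annual runoff depth (mm yr-1) over an area (km2)."""
    volume_m3_per_year = (depth_mm_per_year / 1000.0) * (area_km2 * 1e6)
    return volume_m3_per_year / SECONDS_PER_YEAR


def runoff_depth_to_volume_km3(depth_mm_per_year: float, area_km2: float) -> float:
    """Annual runoff volume (km3 yr-1); 1 mm = 1e-6 km, so volume = depth * 1e-6 * area."""
    return depth_mm_per_year * 1e-6 * area_km2


# Hypothetical basin share: 250 mm/yr of runoff over 100,000 km2
print(f"{runoff_depth_to_discharge(250.0, 100_000):.0f} m3/s")    # 792 m3/s
print(f"{runoff_depth_to_volume_km3(250.0, 100_000):.1f} km3/yr")  # 25.0 km3/yr
```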

Together, these analyses turn a continental simulation into inputs for Earth system science, water resources assessment, and policy, completing the spatial modeling progression of this tutorial series.


In [None]:
# =============================================================================
# STEP 5: CONTINENTAL-SCALE ANALYSIS AND EARTH SYSTEM ASSESSMENT
# =============================================================================

print(f"\n🌍 Loading Continental-Scale Simulation Results...")

# Load continental simulation outputs
simulation_dir = project_dir / 'simulations' / config_dict['EXPERIMENT_ID']
summa_dir = simulation_dir / 'SUMMA'
routing_dir = simulation_dir / 'mizuRoute'

# Initialize variables for continental analysis
continental_summa_data = None
continental_routing_data = None
continental_analysis_ready = False

# Load massive SUMMA continental outputs
summa_files = list(summa_dir.glob('*.nc')) if summa_dir.exists() else []
if summa_files:
    try:
        # Continental runs can produce many large output files (e.g., one per GRU batch)
        print(f"✅ Continental SUMMA outputs available")
        print(f"   Files: {len(summa_files)} netCDF files")
        
        # Check file sizes for continental datasets
        total_summa_size = sum(f.stat().st_size for f in summa_files) / (1024**3)  # GB
        print(f"   Total SUMMA output: {total_summa_size:.1f} GB")
        
        # Open only the first file (lazily); if SUMMA ran in GRU batches, this file
        # covers just that batch's HRUs, not the whole continent
        continental_summa_data = xr.open_dataset(sorted(summa_files)[0])
        
        if 'hru' in continental_summa_data.dims:
            n_hrus_output = continental_summa_data.sizes['hru']
            print(f"   HRUs in output: {n_hrus_output:,}")
        
        if 'time' in continental_summa_data.dims:
            n_timesteps = continental_summa_data.sizes['time']
            print(f"   Time steps: {n_timesteps:,}")
            
        print(f"   Variables: {len(continental_summa_data.data_vars)} output variables")
        continental_analysis_ready = True
        
    except Exception as e:
        print(f"⚠️  Continental SUMMA analysis limited: {e}")
else:
    print(f"⚠️  No continental SUMMA outputs found - using demonstration framework")

# Load massive mizuRoute continental outputs  
routing_files = list(routing_dir.glob('*.nc')) if routing_dir.exists() else []
if routing_files:
    try:
        continental_routing_data = xr.open_dataset(sorted(routing_files)[0])
        
        print(f"✅ Continental mizuRoute outputs available")
        if 'seg' in continental_routing_data.dims:
            n_segments = continental_routing_data.sizes['seg']
            print(f"   Stream segments: {n_segments:,}")
            
        if 'IRFroutedRunoff' in continental_routing_data.data_vars:
            print(f"   Routed streamflow (IRFroutedRunoff): available for every stream segment")
            
        # Check routing output size
        routing_size = sum(f.stat().st_size for f in routing_files) / (1024**3)  # GB
        print(f"   Total routing output: {routing_size:.1f} GB")
        
    except Exception as e:
        print(f"⚠️  Continental routing analysis limited: {e}")
else:
    print(f"⚠️  No continental routing outputs found - using demonstration framework")

print(f"\n📊 Continental Analysis Capability: {'Full Analysis Available' if continental_analysis_ready else 'Demonstration Framework'}")

# =============================================================================
# CONTINENTAL WATER BALANCE AND EARTH SYSTEM ANALYSIS
# =============================================================================

print(f"\n💧 Continental Water Balance and Earth System Analysis...")

if continental_analysis_ready and continental_summa_data is not None:
    try:
        print(f"✅ Analyzing continental water balance across North America")
        
        # Extract continental water balance components
        available_vars = list(continental_summa_data.data_vars.keys())
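        # Names must match variables requested in SUMMA's output control file;
        # components that are not found are reported as unavailable below
        continental_water_components = {
            'Total Soil Water': 'scalarTotalSoilWat',
            'Snow Water Equivalent': 'scalarSWE',
            'Surface Runoff': 'scalarSurfaceRunoff', 
            'Evapotranspiration': 'scalarLatHeatTotal',  # Latent heat flux (W m-2): convert to a water flux before comparing
            'Net Precipitation': 'scalarNetPrecipitation',
            'Groundwater Flow': 'scalarGroundwater'
        }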
        
        print(f"\n💧 Continental Water Balance Components:")
        continental_analysis_results = {}
        
        for component_name, var_key in continental_water_components.items():
            if var_key in available_vars:
                var_data = continental_summa_data[var_key]
                
                # Calculate continental statistics
                if 'hru' in var_data.dims and 'time' in var_data.dims:
                    # Unweighted means over HRUs: HRU areas differ, so this is not
                    # an area-weighted continental mean
                    continental_mean = var_data.mean(dim=['hru', 'time']).values
                    spatial_std = var_data.mean(dim='time').std(dim='hru').values
                    temporal_std = var_data.mean(dim='hru').std(dim='time').values
                    
                    continental_analysis_results[component_name] = {
                        'mean': continental_mean,
                        'spatial_variability': spatial_std,
                        'temporal_variability': temporal_std
                    }
                    
                    print(f"   💧 {component_name}:")
                    print(f"      Continental mean: {continental_mean:.2f}")
                    print(f"      Spatial variability: {spatial_std:.2f}")
                    print(f"      Temporal variability: {temporal_std:.2f}")
                else:
                    print(f"   📋 {component_name}: Available but requires processing")
            else:
                print(f"   ❌ {component_name}: Not available in outputs")
        
        # Calculate continental water balance
        print(f"\n🌍 Continental Water Balance Summary:")
        print(f"   Analysis period: {continental_summa_data.time.min().values} to {continental_summa_data.time.max().values}")
        n_hru_units = continental_summa_data.sizes.get('hru', 0)
        print(f"   Spatial coverage: {n_hru_units:,} computational units")
        print(f"   Continental extent: North American domain")
        
        if len(continental_analysis_results) >= 3:
            print(f"   Water balance components analyzed: {len(continental_analysis_results)} (closure not evaluated)")
            print(f"   Spatial samples: {n_hru_units:,} HRUs")
        
    except Exception as e:
        print(f"   ⚠️  Continental water balance analysis error: {e}")
        continental_analysis_ready = False

else:
    print(f"📋 Demonstrating continental water balance analysis framework:")
    demo_continental_balance = [
        f"Precipitation patterns: Continental gradients from Arctic to tropical zones",
        f"Evapotranspiration: Climate-dependent water losses across continental diversity",
        f"Snow water equivalent: Seasonal storage across elevation and latitude gradients",
        f"Surface runoff: Continental patterns in water availability and timing",
        f"Soil water storage: Regional patterns in water retention and drought sensitivity",
        f"Groundwater contributions: Continental-scale groundwater-surface water interactions"
    ]
    
    for component in demo_continental_balance:
        print(f"   💧 {component}")

# =============================================================================
# CONTINENTAL CLIMATE SENSITIVITY AND CHANGE ASSESSMENT
# =============================================================================

print(f"\n🌡️  Continental Climate Sensitivity and Change Assessment...")

print(f"🌍 Continental Climate Gradient Analysis:")
continental_climate_analysis = [
    f"Arctic regions: Permafrost hydrology and extreme seasonal cycles",
    f"Boreal systems: Snow-dominated hydrology with forest influences", 
    f"Temperate zones: Balanced precipitation-evaporation with seasonal variability",
    f"Great Plains: Continental climate effects on agricultural water resources",
    f"Mountain systems: Elevation-dependent processes across major ranges",
    f"Coastal regions: Maritime influences on temperature and precipitation",
    f"Arid Southwest: Water-limited hydrology and drought sensitivity"
]

for analysis in continental_climate_analysis:
    print(f"   🌡️  {analysis}")

if continental_analysis_ready and continental_summa_data is not None:
    try:
        # Analyze continental climate sensitivity
        print(f"\n🌊 Continental Climate Sensitivity Analysis:")
        
        # Temperature analysis if available
        temp_vars = ['scalarAirTemperature', 'scalarAirTemp', 'airTemp']
        temp_var = None
        for var in temp_vars:
            if var in continental_summa_data.data_vars:
                temp_var = var
                break
        
        if temp_var:
            temp_data = continental_summa_data[temp_var]
            print(f"   Temperature analysis across {continental_summa_data.sizes.get('hru', 0):,} sites:")
            
            if 'hru' in temp_data.dims:
                continental_temp_mean = float(temp_data.mean().values)
                continental_temp_range = float(temp_data.max().values - temp_data.min().values)
                
                # SUMMA-style outputs typically store air temperature in Kelvin; convert
                # the mean to °C (a temperature range is identical in K and °C)
                if continental_temp_mean > 200:
                    continental_temp_mean -= 273.15
                
                print(f"   Continental mean temperature: {continental_temp_mean:.2f}°C")
                print(f"   Continental temperature range: {continental_temp_range:.2f}°C")
                print(f"   Climate diversity: Arctic to tropical conditions represented")
        
        # Snow analysis for climate sensitivity
        if 'scalarSWE' in continental_summa_data.data_vars:
            swe_data = continental_summa_data['scalarSWE']
            
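            if 'hru' in swe_data.dims and 'time' in swe_data.dims:
                max_swe = swe_data.max(dim='time')
                # SWE is in kg m-2 (equivalent to mm); count HRUs whose peak SWE exceeds 10 mm.
                # DataArray.dims is a tuple of names, so use .sizes for the HRU count
                snow_coverage = float((max_swe > 10).sum().values) / swe_data.sizes['hru'] * 100
                
                print(f"   Snow-influenced HRUs: {snow_coverage:.1f}% of computational units")
                print(f"   Snow variability: Critical for continental water resources")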
        
    except Exception as e:
        print(f"   ⚠️  Climate sensitivity analysis error: {e}")

# =============================================================================
# CONTINENTAL STREAMFLOW ANALYSIS: THOUSANDS OF OUTLETS
# =============================================================================

print(f"\n🌊 Continental Streamflow Analysis: Thousands of Outlets...")

if continental_routing_data is not None and 'IRFroutedRunoff' in continental_routing_data.data_vars:
    try:
        streamflow_data = continental_routing_data['IRFroutedRunoff']
        # DataArray.dims is a tuple of names; .sizes maps dimension name -> length
        n_outlets = streamflow_data.sizes.get('seg', 0)
        
        print(f"✅ Continental streamflow analysis across {n_outlets:,} stream segments")
        
        # Analyze continental streamflow patterns
        if n_outlets > 0:
            # Calculate streamflow statistics across continental outlets
            mean_flows = streamflow_data.mean(dim='time')
            max_flows = streamflow_data.max(dim='time')
            
            print(f"\n🌊 Continental Streamflow Patterns:")
            print(f"   Segment count: {n_outlets:,} routed stream segments")
            print(f"   Flow magnitude range: {mean_flows.min().values:.2f} to {mean_flows.max().values:.2f} m³/s (mean)")
            print(f"   Peak flow range: {max_flows.min().values:.2f} to {max_flows.max().values:.2f} m³/s (maximum)")
            
            # Identify the highest-flow segments
            if n_outlets >= 10:
                # Top 10 segments by mean flow (often consecutive reaches of the same large rivers)
                largest_outlets = mean_flows.argsort()[-10:]
                largest_flows = mean_flows.isel(seg=largest_outlets)
                
                print(f"   Highest-flow segments: {len(largest_outlets)} identified")
                print(f"   Largest segment discharge: {largest_flows.max().values:.0f} m³/s mean flow")
            
            # Summing over all segments counts upstream flow once per downstream reach, so this
            # is not continental discharge; a true total sums only terminal (outlet) segments
            summed_segment_flow = mean_flows.sum().values
            print(f"   Sum of segment mean flows (double-counts nested reaches): {summed_segment_flow:.0f} m³/s")
            
            # Seasonal analysis if temporal data available
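            if 'time' in streamflow_data.dims:
                # Monthly climatology averaged across all segments (large rivers dominate)
                monthly_mean = streamflow_data.groupby('time.month').mean()
                # idxmax/idxmin return the month label itself, which stays correct even
                # when the simulation does not cover all 12 months
                peak_month = int(monthly_mean.mean(dim='seg').idxmax('month').values)
                low_month = int(monthly_mean.mean(dim='seg').idxmin('month').values)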
                
                month_names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                              'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
                
                print(f"   Continental peak flow: {month_names[peak_month-1]} (averaged)")
                print(f"   Continental low flow: {month_names[low_month-1]} (averaged)")
        
    except Exception as e:
        print(f"   ⚠️  Continental streamflow analysis error: {e}")
else:
    print(f"📋 Continental streamflow analysis framework:")
    continental_streamflow_characteristics = [
        f"Thousands of outlets: Independent coastal and transboundary discharge points",
        f"Major river systems: Mississippi, Colorado, Columbia, Mackenzie, St. Lawrence",
        f"Arctic discharge: Unique seasonal patterns in northern watersheds",
        f"Pacific coastal: Steep gradient, short residence time systems",
        f"Atlantic coastal: Diverse from tropical to boreal discharge patterns",
        f"Great Lakes: Massive freshwater system effects on regional hydrology"
    ]
    
    for characteristic in continental_streamflow_characteristics:
        print(f"   🌊 {characteristic}")

# =============================================================================
# EARTH SYSTEM SCIENCE APPLICATIONS
# =============================================================================

print(f"\n🌍 Earth System Science Applications...")

print(f"🔬 Continental Modeling Contributions to Earth System Science:")
earth_system_applications = [
    f"Global climate models: Land surface water and energy fluxes at continental scale",
    f"Weather prediction: Continental-scale land surface boundary conditions",
    f"Climate change assessment: Statistical hydrology across thousands of watersheds",
    f"Carbon cycle research: Water controls on continental carbon fluxes",
    f"Ecosystem modeling: Water availability for continental vegetation dynamics",
    f"Atmospheric modeling: Evapotranspiration and surface energy budget"
]

for application in earth_system_applications:
    print(f"   🌍 {application}")

if continental_analysis_ready:
    print(f"\n📊 Earth System Science Data Products:")
    earth_system_products = [
        f"Continental water balance: Precipitation-evapotranspiration patterns",
        f"Seasonal water storage: Snow and soil water across climate gradients",
        f"Surface energy fluxes: Latent and sensible heat across continental diversity",
        f"Runoff coefficients: Precipitation-to-runoff relationships by region",
        f"Drought indicators: Continental-scale water stress assessment",
        f"Flood potential: Peak flow statistics across thousands of watersheds"
    ]
    
    for product in earth_system_products:
        print(f"   📈 {product}")

# =============================================================================
# CONTINENTAL WATER RESOURCES AND POLICY APPLICATIONS
# =============================================================================

print(f"\n💧 Continental Water Resources and Policy Applications...")

print(f"🏛️  National and International Water Policy Support:")
policy_applications = [
    f"Water security assessment: National-scale water availability and vulnerability",
    f"Transboundary management: International watershed cooperation frameworks",
    f"Climate adaptation: Continental-scale climate change impact assessment",
    f"Infrastructure planning: Continental-scale water storage and conveyance needs",
    f"Ecosystem services: Water-related services across continental landscapes",
    f"Emergency management: Continental-scale flood and drought early warning"
]

for application in policy_applications:
    print(f"   💧 {application}")

# Calculate continental water resources metrics if data available
if continental_analysis_ready and 'continental_analysis_results' in locals():
    print(f"\n📊 Continental Water Resources Assessment:")
    
    # Example water resources calculations
    continental_metrics = [
        f"Water availability: Distributed across {continental_summa_data.dims.get('hru', 0):,} assessment units",
        f"Seasonal storage: Snow and soil water patterns across continental gradients",
        f"Regional variability: Statistical analysis across thousands of independent systems",
        f"Climate sensitivity: Robust assessment across complete North American diversity",
        f"Resource security: Continental-scale evaluation of water stress and abundance"
    ]
    
    for metric in continental_metrics:
        print(f"   📊 {metric}")

# =============================================================================
# COMPREHENSIVE CONTINENTAL VISUALIZATION FRAMEWORK
# =============================================================================

print(f"\n📈 Creating comprehensive continental analysis visualization...")

# For continental scale, we would create multiple visualization products
print(f"🗺️  Continental Visualization Products:")

if 'continental_basins_gdf' in locals() or 'basins_gdf' in locals():
    
    # Set up continental visualization framework
    fig, axes = plt.subplots(3, 2, figsize=(20, 24))
    
    # Use available basin data for visualization framework
    if 'continental_basins_gdf' in locals() and continental_basins_gdf is not None:
        basins_for_viz = continental_basins_gdf
    elif 'basins_gdf' in locals() and basins_gdf is not None:
        basins_for_viz = basins_gdf
    else:
        basins_for_viz = None
    
    if basins_for_viz is not None and not basins_for_viz.empty:
        
        # Continental watersheds overview (top left)
        ax1 = axes[0, 0]
        if 'GRU_ID' in basins_for_viz.columns:
            basins_for_viz.plot(ax=ax1, column='GRU_ID', cmap='tab20', 
                               edgecolor='gray', linewidth=0.1, alpha=0.7, legend=False)
        else:
            basins_for_viz.plot(ax=ax1, cmap='tab20', 
                               edgecolor='gray', linewidth=0.1, alpha=0.7, legend=False)
        
        ax1.set_title(f'Continental Watershed Network\n{len(basins_for_viz):,} Independent Systems', 
                     fontweight='bold', fontsize=12)
        ax1.set_xlabel('Longitude')
        ax1.set_ylabel('Latitude')
        ax1.grid(True, alpha=0.3)
        
        # Continental water balance (top right)
        ax2 = axes[0, 1]
        if continental_analysis_ready and 'continental_analysis_results' in locals():
            # Plot water balance components
            components = list(continental_analysis_results.keys())[:5]  # Top 5 components
            values = [continental_analysis_results[comp]['mean'] for comp in components]
            
            bars = ax2.bar(range(len(components)), values, color='steelblue', alpha=0.7)
            ax2.set_xticks(range(len(components)))
            ax2.set_xticklabels([comp.replace(' ', '\n') for comp in components], rotation=0, ha='center')
            ax2.set_ylabel('Continental Mean (native units per component)')
            ax2.set_title('Continental Water Balance Components', fontweight='bold')
            ax2.grid(True, alpha=0.3, axis='y')
            
            # Add value labels
            for bar, value in zip(bars, values):
                ax2.text(bar.get_x() + bar.get_width()/2., bar.get_height() + max(values)*0.01,
                        f'{value:.2f}', ha='center', va='bottom', fontsize=9)
        else:
            ax2.text(0.5, 0.5, 'Continental\nWater Balance\nAnalysis\n\n(Requires Output Data)', 
                    transform=ax2.transAxes, ha='center', va='center',
                    fontsize=14, bbox=dict(facecolor='lightblue', alpha=0.5))
            ax2.set_title('Continental Water Balance', fontweight='bold')
        
        # Climate sensitivity analysis (middle left)
        ax3 = axes[1, 0]
        # Demo climate zones across continent
        climate_zones = ['Arctic', 'Boreal', 'Temperate', 'Continental', 'Arid', 'Coastal']
        zone_counts = [len(basins_for_viz)//6] * 6  # Equal distribution for demo
        
        bars = ax3.bar(climate_zones, zone_counts, color='lightgreen', alpha=0.7, edgecolor='darkgreen')
        ax3.set_ylabel('Number of Watersheds')
        ax3.set_title('Continental Climate Zone Distribution\n(placeholder: equal split, not model output)', fontweight='bold')
        ax3.grid(True, alpha=0.3, axis='y')
        
        for bar, count in zip(bars, zone_counts):
            ax3.text(bar.get_x() + bar.get_width()/2., bar.get_height() + max(zone_counts)*0.01,
                    f'{count}', ha='center', va='bottom', fontsize=9)
        
        # Continental streamflow patterns (middle right)
        ax4 = axes[1, 1]
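        if continental_routing_data is not None:
            # Read the segment count directly so this cell works even if n_outlets was never set
            n_segments_viz = continental_routing_data.sizes.get('seg', 0)
            ax4.text(0.5, 0.5, f'Continental Streamflow\nAnalysis\n\n{n_segments_viz:,} segments\nanalyzed', 
                    transform=ax4.transAxes, ha='center', va='center',
                    fontsize=14, bbox=dict(facecolor='lightcoral', alpha=0.5))
        else:
            # Illustrative seasonal flow pattern (not model output)
            months = ['J', 'F', 'M', 'A', 'M', 'J', 'J', 'A', 'S', 'O', 'N', 'D']
            # Schematic North American pattern (spring snowmelt peak)
            flow_pattern = [0.7, 0.6, 0.8, 1.2, 1.5, 1.3, 0.9, 0.7, 0.6, 0.7, 0.8, 0.8]
            
            # Plot against month index: repeated string labels ('J', 'M', 'A') would be
            # merged into single categories if passed directly as x values
            ax4.plot(range(12), flow_pattern, 'o-', color='blue', linewidth=2, markersize=6)
            ax4.set_xticks(range(12))
            ax4.set_xticklabels(months)
            ax4.set_ylabel('Normalized Flow')
            ax4.set_title('Continental Seasonal Flow Pattern (illustrative)', fontweight='bold')
            ax4.grid(True, alpha=0.3)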
        
        # Earth system applications (bottom left)
        ax5 = axes[2, 0]
        
        # Illustrative relevance scores chosen for display only (not computed from model output)
        es_applications = ['Climate\nModels', 'Weather\nPrediction', 'Carbon\nCycle', 'Ecosystem\nModeling']
        es_importance = [100, 95, 85, 90]
        
        bars = ax5.bar(es_applications, es_importance, color='purple', alpha=0.7)
        ax5.set_ylabel('Illustrative Relevance Score (%)')
        ax5.set_title('Earth System Science Applications (illustrative)', fontweight='bold')
        ax5.grid(True, alpha=0.3, axis='y')
        ax5.set_ylim(0, 110)
        
        for bar, score in zip(bars, es_importance):
            ax5.text(bar.get_x() + bar.get_width()/2., bar.get_height() + 2,
                    f'{score}%', ha='center', va='bottom', fontsize=9)
        
        # Tutorial series culmination (bottom right)
        ax6 = axes[2, 1]
        
        # Complete tutorial progression
        tutorial_scales = ['Lumped\n(02a)', 'Semi-Dist\n(02b)', 'Elevation\n(02c)', 'Regional\n(03a)', 'Continental\n(03b)']
        scale_complexity = [1, 15, 45, 100, len(basins_for_viz)]
        
        bars = ax6.bar(tutorial_scales, scale_complexity, 
                       color=['lightcoral', 'lightgreen', 'lightblue', 'gold', 'mediumpurple'], 
                       alpha=0.7, edgecolor='navy')
        
        ax6.set_ylabel('Computational Units (log scale)')
        ax6.set_yscale('log')
        ax6.set_title('Tutorial Series: Complete Spatial Hierarchy', fontweight='bold')
        ax6.grid(True, alpha=0.3, axis='y')
        
        # Add value labels
        for bar, complexity in zip(bars, scale_complexity):
            ax6.text(bar.get_x() + bar.get_width()/2., bar.get_height() * 1.2,
                    f'{complexity:,}', ha='center', va='bottom', fontsize=9, rotation=45)
        
        plt.suptitle(f'Continental-Scale Analysis: North America Earth System Assessment', 
                     fontsize=16, fontweight='bold')
        plt.tight_layout()
        plt.show()
        
        print(f"✅ Continental visualization framework complete")
    
    else:
        print(f"📋 Continental visualization framework established (requires basin data)")
else:
    print(f"📋 Continental visualization framework prepared for continental-scale datasets")
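
Summing mean flow over every routed segment counts water from upstream reaches once for each downstream segment it passes through. A continental discharge total should instead sum mean flow only over terminal segments, which have no downstream segment. The sketch below assumes the network topology is available as a mapping from segment ID to downstream segment ID, with a non-positive downstream ID marking an outlet. mizuRoute topology files carry `segId` and `downSegId` variables, but verify the outlet convention used in your network file. The function name and example IDs are hypothetical.

```python
from typing import Mapping


def outlet_discharge_total(
    mean_flow: Mapping[int, float], down_seg: Mapping[int, int]
) -> float:
    """Sum mean flow (m3 s-1) over terminal segments only.

    mean_flow: segment ID -> mean routed flow.
    down_seg:  segment ID -> downstream segment ID; a value <= 0, a missing entry,
               or an ID outside the network is treated as an outlet.
    """
    return sum(
        flow
        for seg, flow in mean_flow.items()
        if down_seg.get(seg, 0) <= 0 or down_seg[seg] not in mean_flow
    )


# Hypothetical network: segments 1 and 2 drain into outlet 3; segment 4 is a separate outlet
flows = {1: 10.0, 2: 5.0, 3: 16.0, 4: 2.0}
downstream = {1: 3, 2: 3, 3: 0, 4: -1}
print(outlet_discharge_total(flows, downstream))  # 18.0 (summing all segments gives 33.0)
```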

**Ready to explore large sample simulations?** → **[Tutorial 04a: Large Sample Studies - FLUXNET](./04a_large_sample_fluxnet.ipynb)**