# CONFLUENCE Tutorial - 4: Semi-Distributed Basin Workflow (Bow River at Banff)

## Introduction

This tutorial demonstrates the next step in spatial modeling complexity: semi-distributed basin modeling. Building on the lumped basin approach from the previous tutorial, we now introduce spatial discretization by dividing the watershed into multiple connected units called Grouped Response Units (GRUs). This approach bridges the gap between simple lumped models and fully distributed models, offering a balance between computational efficiency and spatial realism.

### What is Semi-Distributed Modeling?

Semi-distributed modeling divides the watershed into multiple sub-basins, each treated as a separate modeling unit:

- **Spatial discretization**: The watershed is divided into multiple GRUs based on stream network topology
- **Connected units**: Each GRU is connected to downstream units through a routing network
- **Intermediate complexity**: More realistic than lumped models but simpler than fully distributed approaches
- **Computational efficiency**: Fewer units than fully distributed models make it suitable for calibration and uncertainty analysis

### Key Concepts

**Grouped Response Units (GRUs)**: Sub-basins that drain to specific points along the stream network. Each GRU is assumed to have broadly similar hydrological characteristics and is modeled as a single responding unit.

**Stream Network Delineation**: Using digital elevation models and flow accumulation algorithms to automatically identify stream channels and divide the watershed into connected sub-basins.

**Routing**: The process of moving water from upstream GRUs to downstream GRUs through the stream network, accounting for travel time and channel storage.

**Stream Threshold**: A parameter that controls how many sub-basins are created - higher thresholds create fewer, larger GRUs.
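
To make the routing concept concrete, here is a minimal sketch that assumes a hypothetical chain of three GRUs and a fixed travel-time lag per reach. It is not how mizuRoute works internally (mizuRoute offers more sophisticated routing schemes); the function name, GRU IDs, and lag values are illustrative assumptions only.

```python
import numpy as np

def route_with_lag(runoff, downstream, lag_steps):
    """Route per-GRU runoff to the outlet using a fixed lag per reach.

    runoff:     dict gru_id -> 1-D array of local runoff (e.g. m3/s)
    downstream: dict gru_id -> downstream gru_id, or None at the outlet
    lag_steps:  dict gru_id -> time steps needed to cross that GRU's reach
    """
    n_steps = len(next(iter(runoff.values())))
    outlet_flow = np.zeros(n_steps)
    for gru_id, local in runoff.items():
        # Accumulate the lag along the path from this GRU to the outlet
        total_lag, current = 0, gru_id
        while current is not None:
            total_lag += lag_steps[current]
            current = downstream[current]
        # Delay the local runoff by the accumulated lag
        shifted = np.zeros(n_steps)
        if total_lag < n_steps:
            shifted[total_lag:] = local[:n_steps - total_lag]
        outlet_flow += shifted
    return outlet_flow

# Hypothetical three-GRU chain: 1 -> 2 -> 3 -> outlet
downstream = {1: 2, 2: 3, 3: None}
lag_steps = {1: 1, 2: 1, 3: 0}
pulse = np.array([10.0, 0.0, 0.0, 0.0, 0.0])
runoff = {1: pulse, 2: pulse, 3: pulse}

outlet = route_with_lag(runoff, downstream, lag_steps)
print(outlet)
```

Each GRU receives the same runoff pulse, but the contributions reach the outlet at different times, so the outlet hydrograph is spread over several steps instead of arriving as one peak. This is the timing effect a lumped model without routing cannot represent.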

### Why Semi-Distributed Modeling?

1. **Spatial heterogeneity**: Captures important spatial variations in climate, topography, and land cover across the watershed
2. **Process representation**: Better represents elevation-dependent processes like snow accumulation and temperature gradients
3. **Computational efficiency**: Fewer units than fully distributed models while maintaining key spatial patterns
4. **Routing dynamics**: Explicitly represents travel time and attenuation effects in the stream network
5. **Diagnostic capability**: Allows examination of contributions from different parts of the watershed
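
As an illustration of why elevation matters, the sketch below adjusts a reference air temperature to the mean elevation of each GRU using a constant lapse rate. The lapse rate of -6.5 K per km, the function name, and the GRU elevations are assumptions chosen for illustration; they are not values taken from CONFLUENCE or from the Bow River data.

```python
import numpy as np

# Assumed constant environmental lapse rate (K per metre of elevation gain)
LAPSE_RATE_K_PER_M = -0.0065

def lapse_adjust(temp_ref_c, elev_ref_m, elev_gru_m, lapse=LAPSE_RATE_K_PER_M):
    """Shift a reference temperature to each GRU's mean elevation."""
    elev_gru_m = np.asarray(elev_gru_m, dtype=float)
    return temp_ref_c + lapse * (elev_gru_m - elev_ref_m)

# Hypothetical GRU mean elevations (m) and a 2.0 °C reading at 1400 m
gru_elev = [1400.0, 2000.0, 2600.0]
gru_temp = lapse_adjust(2.0, 1400.0, gru_elev)

for z, t in zip(gru_elev, gru_temp):
    phase = "snow likely" if t < 0 else "rain likely"
    print(f"{z:6.0f} m: {t:5.1f} °C ({phase})")
```

With a single lumped unit the whole basin would see one temperature; with several GRUs, the higher units fall below freezing while the valley stays above it, which changes where snow accumulates.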

### Case Study: Bow River at Banff - Semi-Distributed Setup

For this tutorial, we'll use the same Bow River watershed but divide it into multiple GRUs:

**Configuration changes from lumped model**:
- **Domain Method**: `delineate` instead of `lumped`
- **Stream Threshold**: 5000 (creates multiple sub-basins)
- **Routing Model**: mizuRoute connects the GRUs
- **Spatial Complexity**: Multiple units instead of single unit

**Expected outcomes**:
- Better representation of elevation gradients
- Improved timing of snowmelt contributions
- More realistic representation of spatial climate variability
- Enhanced ability to diagnose spatial patterns

### Technical Implementation

The semi-distributed approach uses several key components:

1. **Watershed Delineation**: Automatic identification of sub-basins using flow accumulation algorithms
2. **Stream Network Extraction**: Creation of river network topology connecting the sub-basins
3. **GRU Characterization**: Calculation of average characteristics for each sub-basin
4. **Routing Setup**: Configuration of mizuRoute to move water between GRUs
5. **Model Integration**: Coupling of SUMMA (land surface) with mizuRoute (routing)
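
The GRU characterization step (item 3) amounts to a zonal statistic: averaging a gridded attribute over the cells that belong to each sub-basin. The following sketch assumes a tiny hypothetical DEM and a matching raster of GRU IDs, with 0 meaning "outside the watershed". CONFLUENCE's own implementation may differ; `zonal_mean` is an illustrative helper, not part of its API.

```python
import numpy as np

def zonal_mean(values, zone_ids):
    """Mean of `values` for each positive zone ID (0 = outside the watershed)."""
    values = np.asarray(values, dtype=float).ravel()
    zone_ids = np.asarray(zone_ids).ravel()
    valid = zone_ids > 0
    # bincount sums values and counts cells per zone in one pass
    sums = np.bincount(zone_ids[valid], weights=values[valid])
    counts = np.bincount(zone_ids[valid])
    ids = np.nonzero(counts)[0]
    return {int(i): sums[i] / counts[i] for i in ids}

# Hypothetical 3x3 DEM (m) and GRU ID raster
dem = np.array([[2400.0, 2200.0, 1900.0],
                [2300.0, 2000.0, 1700.0],
                [2100.0, 1800.0, 1500.0]])
gru = np.array([[1, 1, 2],
                [1, 2, 2],
                [0, 3, 3]])

mean_elev = zonal_mean(dem, gru)
for gru_id, elev in mean_elev.items():
    print(f"GRU {gru_id}: mean elevation {elev:.0f} m")
```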

### What You'll Learn

This tutorial will teach you how to:

1. **Configure semi-distributed models** with multiple GRUs
2. **Control spatial discretization** using stream threshold parameters
3. **Understand routing processes** and their impact on streamflow timing
4. **Visualize spatial model structure** with GRUs and stream networks
5. **Interpret distributed model results** and compare with lumped approaches
6. **Manage increased model complexity** while maintaining workflow efficiency

### Tutorial Structure

We'll follow the same CONFLUENCE workflow as before, but with key differences:

1. **Project Setup**: Initialize directory structure for semi-distributed modeling
2. **Domain Delineation**: Create multiple sub-basins using stream network analysis
3. **Spatial Discretization**: Convert sub-basins to GRUs for modeling
4. **Data Processing**: Prepare inputs for multiple modeling units
5. **Model Configuration**: Set up SUMMA + mizuRoute for routing
6. **Model Execution**: Run the coupled land surface + routing model
7. **Results Analysis**: Compare semi-distributed vs. lumped model performance

### Key Differences from Lumped Modeling

| Aspect | Lumped Model | Semi-Distributed Model |
|--------|--------------|----------------------|
| **Spatial Units** | 1 unit | Multiple GRUs |
| **Domain Method** | `lumped` | `delineate` |
| **Routing** | None | mizuRoute |
| **Complexity** | Simple | Intermediate |
| **Computation** | Fast | Moderate |
| **Spatial Detail** | None | Sub-basin level |

By completing this tutorial, you'll understand how to add spatial complexity to your hydrological models while maintaining computational efficiency - a crucial step toward fully distributed modeling applications.

## Step 1: Semi-Distributed Setup with Data Reuse
Building on the lumped basin modeling from Tutorial 02a, we now advance to semi-distributed watershed modeling. This approach balances computational efficiency against spatial realism: multiple connected sub-basins capture key spatial heterogeneity while keeping model complexity manageable.

### Modeling Evolution: Lumped → Semi-Distributed

- Spatial Units: Single watershed → Multiple connected sub-basins (GRUs)
- Domain Method: 'lumped' → 'delineate' with stream network analysis
- Routing Complexity: No routing → Explicit stream network routing
- Spatial Detail: Basin-averaged → Sub-basin scale heterogeneity
- Process Representation: Uniform response → Spatially-distributed runoff generation

The same CONFLUENCE framework handles this added complexity. Reusing the DEM, soil, land cover, forcing, and streamflow observation data from Tutorial 02a avoids repeating downloads and preprocessing, which keeps iterative model development fast.

In [None]:
# =============================================================================
# STEP 1: SEMI-DISTRIBUTED SETUP WITH DATA REUSE
# =============================================================================

# Import required libraries
import sys
import os
from pathlib import Path
import yaml
import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd
from datetime import datetime
import xarray as xr
import numpy as np
import shutil

# Add CONFLUENCE to path
confluence_path = Path('../').resolve()
sys.path.append(str(confluence_path))

# Import main CONFLUENCE class
from CONFLUENCE import CONFLUENCE

# Set up plotting style
plt.style.use('default')
%matplotlib inline

print("=== CONFLUENCE Tutorial 02b: Semi-Distributed Basin Modeling ===")
print("Advancing from lumped to spatially-explicit watershed representation")

# =============================================================================
# CONFIGURATION FOR SEMI-DISTRIBUTED BOW RIVER MODELING
# =============================================================================

print("\n🌊 Configuring Semi-Distributed Bow River Watershed...")

# Set directory paths
CONFLUENCE_CODE_DIR = confluence_path
CONFLUENCE_DATA_DIR = Path('/Users/darrieythorsson/compHydro/data/CONFLUENCE_data')  # ← Update this path

# Load template configuration and customize for semi-distributed modeling
config_template_path = CONFLUENCE_CODE_DIR / '0_config_files' / 'config_template.yaml'

with open(config_template_path, 'r') as f:
    config_dict = yaml.safe_load(f)

# Update for semi-distributed Bow River modeling
config_updates = {
    'CONFLUENCE_CODE_DIR': str(CONFLUENCE_CODE_DIR),
    'CONFLUENCE_DATA_DIR': str(CONFLUENCE_DATA_DIR),
    'DOMAIN_NAME': 'Bow_at_Banff_distributed',
    'EXPERIMENT_ID': 'semi_distributed_tutorial',
    'POUR_POINT_COORDS': '51.1722/-115.5717',  # Same as lumped model
    'DOMAIN_DEFINITION_METHOD': 'delineate',    # KEY CHANGE: watershed delineation vs lumped
    'STREAM_THRESHOLD': 5000,                   # Controls number of sub-basins
    'DOMAIN_DISCRETIZATION': 'GRUs',            # Grouped Response Units
    'HYDROLOGICAL_MODEL': 'SUMMA',
    'ROUTING_MODEL': 'mizuRoute',               # Essential for connected sub-basins
    'EXPERIMENT_TIME_START': '2011-01-01 01:00',
    'EXPERIMENT_TIME_END': '2018-12-31 23:00',
    'CALIBRATION_PERIOD': '2011-01-01, 2015-12-31',
    'EVALUATION_PERIOD': '2016-01-01, 2018-12-31',
    'SPINUP_PERIOD': '2011-01-01, 2011-12-31',
    'STATION_ID': '05BB001',
    'DOWNLOAD_WSC_DATA': True
}

config_dict.update(config_updates)

# Add experiment metadata
config_dict['NOTEBOOK_CREATION_TIME'] = datetime.now().isoformat()
config_dict['NOTEBOOK_CREATOR'] = 'CONFLUENCE_Tutorial_02b'
config_dict['SPATIAL_EVOLUTION'] = 'Lumped to semi-distributed watershed modeling'

# Save configuration
temp_config_path = CONFLUENCE_CODE_DIR / '0_config_files' / 'config_semi_distributed.yaml'
with open(temp_config_path, 'w') as f:
    yaml.dump(config_dict, f, default_flow_style=False, sort_keys=False)

print(f"✅ Semi-distributed configuration saved: {temp_config_path}")

# =============================================================================
# INTELLIGENT DATA REUSE FROM TUTORIAL 02A
# =============================================================================

print(f"\n📂 Smart Data Reuse from Tutorial 02a...")

# Check for existing data from lumped model tutorial
lumped_domain = 'Bow_at_Banff'  # From Tutorial 02a
lumped_data_dir = CONFLUENCE_DATA_DIR / f'domain_{lumped_domain}'

if lumped_data_dir.exists():
    print(f"✅ Found existing data from Tutorial 02a: {lumped_data_dir}")
    
    # Define reusable data categories
    reusable_data = {
        'Elevation (DEM)': lumped_data_dir / 'attributes' / 'elevation',
        'Soil Data': lumped_data_dir / 'attributes' / 'soilclass', 
        'Land Cover': lumped_data_dir / 'attributes' / 'landclass',
        'ERA5 Forcing': lumped_data_dir / 'forcing' / 'raw_data',
        'WSC Observations': lumped_data_dir / 'observations' / 'streamflow'
    }
    
    # Check availability and copy reusable data
    print(f"\n🔄 Copying and Adapting Reusable Data...")
    
    # Initialize CONFLUENCE first to create directory structure
    confluence = CONFLUENCE(temp_config_path)
    project_dir = confluence.managers['project'].setup_project()
    
    def copy_with_name_adaptation(src_path, dst_path, old_name, new_name):
        """Copy files with name adaptation for new domain"""
        if not src_path.exists():
            return False
            
        dst_path.parent.mkdir(parents=True, exist_ok=True)
        
        if src_path.is_dir():
            # Copy directory contents with name adaptation
            for src_file in src_path.rglob('*'):
                if src_file.is_file():
                    rel_path = src_file.relative_to(src_path)
                    # Adapt filename
                    new_filename = src_file.name.replace(old_name, new_name)
                    dst_file = dst_path / rel_path.parent / new_filename
                    dst_file.parent.mkdir(parents=True, exist_ok=True)
                    shutil.copy2(src_file, dst_file)
            return True
        elif src_path.is_file():
            # Copy single file with name adaptation
            new_filename = dst_path.name.replace(old_name, new_name)
            dst_file = dst_path.parent / new_filename
            dst_file.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_path, dst_file)
            return True
        return False
    
    # Copy reusable data with appropriate naming
    for data_type, src_path in reusable_data.items():
        if src_path.exists():
            # Determine destination path
            rel_path = src_path.relative_to(lumped_data_dir)
            dst_path = project_dir / rel_path
            
            # Copy with name adaptation
            success = copy_with_name_adaptation(
                src_path, dst_path, 
                lumped_domain, config_dict['DOMAIN_NAME']
            )
            
            if success:
                print(f"   ✅ {data_type}: Copied and adapted")
            else:
                print(f"   ⚠️  {data_type}: Copy failed")
        else:
            print(f"   📋 {data_type}: Not found, will acquire fresh")
    
    print(f"\n💡 Data Reuse Benefits:")
    reuse_benefits = [
        "Eliminates redundant DEM and forcing data downloads",
        "Accelerates workflow development and testing",
        "Maintains data consistency across model comparisons",
        "Enables rapid exploration of alternative configurations",
        "Reduces computational overhead for iterative modeling"
    ]
    
    for benefit in reuse_benefits:
        print(f"   🚀 {benefit}")

else:
    print(f"⚠️  No existing data found from Tutorial 02a")
    print(f"   Will acquire all data from scratch")
    
    # Initialize CONFLUENCE and create project structure
    confluence = CONFLUENCE(temp_config_path)
    project_dir = confluence.managers['project'].setup_project()

# Create pour point
pour_point_path = confluence.managers['project'].create_pour_point()

# =============================================================================
# SEMI-DISTRIBUTED CONFIGURATION SUMMARY
# =============================================================================

print(f"\n🌊 Semi-Distributed Configuration Summary:")
distributed_info = {
    "Spatial Approach": "Semi-distributed with connected sub-basins",
    "Domain Method": f"{config_dict['DOMAIN_DEFINITION_METHOD']} (automatic watershed + stream network)",
    "Stream Threshold": f"{config_dict['STREAM_THRESHOLD']} (controls sub-basin count)",
    "Discretization": f"{config_dict['DOMAIN_DISCRETIZATION']} (Grouped Response Units)",
    "Routing Model": f"{config_dict['ROUTING_MODEL']} (stream network routing)",
    "Expected GRUs": "~6-12 sub-basins (depends on stream threshold)",
    "Spatial Detail": "Sub-basin scale heterogeneity capture"
}

for key, value in distributed_info.items():
    print(f"   🏞️  {key}: {value}")

print(f"\n🔄 Key Differences from Tutorial 02a (Lumped):")
differences = [
    "Multiple sub-basins vs single watershed unit",
    "Stream network delineation vs geometric boundary",
    "mizuRoute routing vs no routing component", 
    "Spatial heterogeneity vs spatially-averaged representation",
    "Connected GRU topology vs isolated modeling unit"
]

for diff in differences:
    print(f"   🔀 {diff}")

print(f"\n🎯 Expected Modeling Advantages:")
advantages = [
    "Captures elevation-dependent snow processes",
    "Represents spatial climate gradients",
    "Explicit routing delays and channel storage",
    "Sub-basin contribution analysis capability",
    "Foundation for fully-distributed modeling"
]

for advantage in advantages:
    print(f"   📈 {advantage}")

print(f"\n🚀 Semi-distributed setup complete - Ready for stream network delineation!")
print(f"   → Data reuse: Efficient workflow development")
print(f"   → Configuration: Semi-distributed spatial representation")
print(f"   → Framework: Same CONFLUENCE architecture, increased complexity")

## Step 2: Stream Network Delineation and Spatial Connectivity
The transition to semi-distributed modeling requires spatial analysis that automatically identifies sub-basins and their connectivity. This process transforms a continuous landscape into a network of connected modeling units that preserves the topology of watershed drainage while keeping the spatial discretization computationally manageable.

### Scientific Context: Stream Network Analysis

**Hydrologic Network Principles:**

- Flow Accumulation: Upslope area contributing to each grid cell
- Stream Threshold: Minimum contributing area to define stream channels
- Watershed Segmentation: Division of landscape by stream network topology
- Connectivity Preservation: Maintaining upstream-downstream relationships
- Scale Optimization: Balancing spatial detail with computational tractability

The stream threshold parameter critically controls model complexity: lower values create more sub-basins with finer spatial detail, while higher values produce fewer, larger units with reduced computational demands.
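
The effect of the threshold can be sketched on a toy drainage network. The example below assumes flow directions are already known (each cell lists the cell it drains to), computes flow accumulation as a count of contributing cells, and then counts stream cells for several thresholds. The network and the function name are hypothetical; the delineation tools used by CONFLUENCE operate on a full DEM rather than a hand-written list.

```python
import numpy as np

def flow_accumulation(receiver):
    """Number of cells draining through each cell, including the cell itself.

    receiver[i] is the index of the cell that cell i drains to, or -1 at the outlet.
    """
    n = len(receiver)
    acc = np.ones(n, dtype=int)
    n_donors = np.zeros(n, dtype=int)
    for r in receiver:
        if r >= 0:
            n_donors[r] += 1
    # Start at headwater cells (no donors) and work downstream
    stack = [i for i in range(n) if n_donors[i] == 0]
    while stack:
        i = stack.pop()
        r = receiver[i]
        if r >= 0:
            acc[r] += acc[i]
            n_donors[r] -= 1
            if n_donors[r] == 0:
                stack.append(r)
    return acc

# Hypothetical 8-cell network: cells 0,1 -> 2; 2,3 -> 4; 4,5 -> 6; 6 -> 7 (outlet)
receiver = [2, 2, 4, 4, 6, 6, 7, -1]
acc = flow_accumulation(receiver)

# Cells at or above the threshold are treated as stream channel
stream_cells = {t: int((acc >= t).sum()) for t in (2, 5, 8)}
print(acc, stream_cells)
```

Raising the threshold shrinks the set of stream cells, which in turn reduces the number of stream segments and the sub-basins that drain to them.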

In [None]:
# =============================================================================
# STEP 2: STREAM NETWORK DELINEATION AND SPATIAL CONNECTIVITY
# =============================================================================

print("=== Step 2: Stream Network Delineation and Spatial Connectivity ===")
print("Transforming continuous landscape into connected sub-basin network")

# =============================================================================
# ATTRIBUTE ACQUISITION FOR NETWORK ANALYSIS
# =============================================================================

print(f"\n🗺️  Ensuring Digital Elevation Model Availability...")

# Check if DEM was copied from Tutorial 02a, otherwise acquire
dem_path = project_dir / 'attributes' / 'elevation' / 'dem'
if not dem_path.exists() or len(list(dem_path.glob('*.tif'))) == 0:
    print(f"   DEM not found, acquiring fresh geospatial attributes...")
    confluence.managers['data'].acquire_attributes()
    print("✅ Geospatial attributes acquired")
else:
    print(f"✅ DEM available from previous workflow")

# =============================================================================
# STREAM NETWORK DELINEATION PROCESS
# =============================================================================

print(f"\n🌊 Stream Network Delineation Process...")
print(f"   Method: {config_dict['DOMAIN_DEFINITION_METHOD']} (automatic watershed delineation)")
print(f"   Stream threshold: {config_dict['STREAM_THRESHOLD']} (flow accumulation cells)")
print(f"   Pour point: {config_dict['POUR_POINT_COORDS']}")

print(f"\n🔧 Automated Delineation Workflow:")
delineation_steps = [
    "DEM preprocessing: Sink filling and flow direction calculation",
    "Flow accumulation: Cumulative upslope contributing area",
    f"Stream definition: Channels where accumulation ≥ {config_dict['STREAM_THRESHOLD']} cells",
    "Stream segmentation: Breaking network into computational reaches",
    "Sub-basin delineation: Contributing areas for each stream segment", 
    "Topology creation: Upstream-downstream connectivity matrix",
    "Quality control: Geometric and topological validation"
]

for i, step in enumerate(delineation_steps, 1):
    print(f"   {i}. {step}")

print(f"\n⚙️  Executing stream network delineation...")
watershed_path = confluence.managers['domain'].define_domain()

print("✅ Stream network delineation complete")

# =============================================================================
# SPATIAL CONNECTIVITY ANALYSIS
# =============================================================================

print(f"\n🔷 Domain Discretization: Creating Connected GRUs...")

# Execute domain discretization to create GRUs
hru_path = confluence.managers['domain'].discretize_domain()

print("✅ GRU discretization complete")

# =============================================================================
# NETWORK STRUCTURE ANALYSIS AND VISUALIZATION
# =============================================================================

print(f"\n📊 Analyzing Created Network Structure...")

# Load and analyze created spatial products
basin_dir = project_dir / 'shapefiles' / 'river_basins'
network_dir = project_dir / 'shapefiles' / 'river_network'
catchment_dir = project_dir / 'shapefiles' / 'catchment'

if basin_dir.exists() and network_dir.exists():
    # Load spatial data
    basin_files = list(basin_dir.glob('*.shp'))
    network_files = list(network_dir.glob('*.shp'))
    
    if basin_files and network_files:
        basins_gdf = gpd.read_file(basin_files[0])
        network_gdf = gpd.read_file(network_files[0])
        
        print(f"\n📋 Network Structure Summary:")
        print(f"   Sub-basins (GRUs): {len(basins_gdf)}")
        print(f"   Stream segments: {len(network_gdf)}")
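        # Areas are only meaningful in a projected CRS; a geographic CRS would give square degrees
        area_gdf = basins_gdf
        if basins_gdf.crs is not None and basins_gdf.crs.is_geographic:
            area_gdf = basins_gdf.to_crs(basins_gdf.estimate_utm_crs())
        total_area_km2 = area_gdf.geometry.area.sum() / 1e6
        print(f"   Total watershed area: {total_area_km2:.1f} km²")
        print(f"   Average GRU size: {total_area_km2 / len(basins_gdf):.1f} km²")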
        
        # Analyze GRU characteristics
        if 'elevation' in basins_gdf.columns:
            print(f"   Elevation range: {basins_gdf['elevation'].min():.0f}m to {basins_gdf['elevation'].max():.0f}m")
            print(f"   Elevation gradient: {basins_gdf['elevation'].max() - basins_gdf['elevation'].min():.0f}m span")
        
        # Stream network characteristics
        if 'Length' in network_gdf.columns:
            total_length = network_gdf['Length'].sum() / 1000  # Convert to km
            print(f"   Total stream length: {total_length:.1f} km")
        
        # =============================================================================
        # COMPREHENSIVE NETWORK VISUALIZATION
        # =============================================================================
        
        print(f"\n🗺️  Creating network structure visualization...")
        
        fig, axes = plt.subplots(1, 2, figsize=(18, 9))
        
        # Left plot: Sub-basin network with elevation
        ax1 = axes[0]
        
        if 'elevation' in basins_gdf.columns:
            # Color by elevation
            basins_plot = basins_gdf.plot(ax=ax1, column='elevation', cmap='terrain',
                                        edgecolor='black', linewidth=1, legend=True,
                                        legend_kwds={'label': 'Elevation (m)', 'shrink': 0.8})
        else:
            # Color by GRU ID
            basins_plot = basins_gdf.plot(ax=ax1, column='GRU_ID', cmap='viridis',
                                        edgecolor='black', linewidth=1, legend=True,
                                        legend_kwds={'label': 'GRU ID', 'shrink': 0.8})
        
        # Add stream network
        network_gdf.plot(ax=ax1, color='blue', linewidth=2, alpha=0.8)
        
        # Add pour point
        pour_point_gdf = gpd.read_file(pour_point_path)
        pour_point_gdf.plot(ax=ax1, color='red', markersize=150, marker='o',
                           edgecolor='white', linewidth=2, zorder=5)
        
        ax1.set_title(f'Semi-Distributed Network\n{len(basins_gdf)} Sub-basins', 
                     fontsize=14, fontweight='bold')
        ax1.set_xlabel('Longitude', fontsize=12)
        ax1.set_ylabel('Latitude', fontsize=12)
        ax1.grid(True, alpha=0.3)
        
        # Right plot: Network topology schematic
        ax2 = axes[1]
        
        # Create simplified network topology visualization
        if 'gru_to_seg' in basins_gdf.columns and 'DSLINKNO' in network_gdf.columns:
            # This would require more complex network analysis
            # For now, show GRU connectivity conceptually
            ax2.text(0.5, 0.9, 'Network Topology', ha='center', va='top', 
                    transform=ax2.transAxes, fontsize=16, fontweight='bold')
            
            # Show some basic connectivity info
            connectivity_info = [
                f"Stream Threshold: {config_dict['STREAM_THRESHOLD']} cells",
                f"Generated {len(basins_gdf)} connected sub-basins",
                f"Each GRU drains to downstream neighbor",
                f"Network preserves watershed topology",
                f"Enables spatially-distributed routing"
            ]
            
            for i, info in enumerate(connectivity_info):
                ax2.text(0.05, 0.8 - i*0.1, f"• {info}", transform=ax2.transAxes,
                        fontsize=12, va='top')
            
            # Add schematic network diagram
            ax2.text(0.5, 0.4, 'GRU₁ → GRU₂ → GRU₃ → ... → Outlet', 
                    ha='center', va='center', transform=ax2.transAxes,
                    fontsize=14, fontweight='bold',
                    bbox=dict(facecolor='lightblue', alpha=0.7, boxstyle='round,pad=0.5'))
            
        ax2.set_xlim(0, 1)
        ax2.set_ylim(0, 1)
        ax2.axis('off')
        
        plt.suptitle(f'Semi-Distributed Bow River Watershed: Network Analysis', 
                     fontsize=16, fontweight='bold')
        plt.tight_layout()
        plt.show()
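
The right-hand panel above only describes connectivity in words. As a rough sketch of the "more complex network analysis" it mentions, the code below orders stream segments from headwaters to outlet given each segment's downstream link. It assumes the downstream ID uses `-1` (or any ID not in the table) at the outlet; the segment IDs are hypothetical, and the function is an illustration rather than part of CONFLUENCE.

```python
def upstream_to_downstream_order(ds_link):
    """Return segment IDs ordered so every segment precedes its downstream neighbor.

    ds_link: dict segment_id -> downstream segment_id (an ID not in the dict marks the outlet)
    """
    # Count how many segments drain directly into each segment
    upstream_count = {seg: 0 for seg in ds_link}
    for ds in ds_link.values():
        if ds in upstream_count:
            upstream_count[ds] += 1
    # Headwater segments have no upstream neighbors
    ready = sorted(seg for seg, n in upstream_count.items() if n == 0)
    order = []
    while ready:
        seg = ready.pop(0)
        order.append(seg)
        ds = ds_link[seg]
        if ds in upstream_count:
            upstream_count[ds] -= 1
            if upstream_count[ds] == 0:
                ready.append(ds)
    return order

# Hypothetical network: 1 and 2 join at 3; 3 and 4 join at 5; 5 is the outlet
ds_link = {1: 3, 2: 3, 3: 5, 4: 5, 5: -1}
order = upstream_to_downstream_order(ds_link)
print(order)
```

If the river network shapefile carries a segment ID column alongside `DSLINKNO` (for example `LINKNO`, which is an assumption about the column name), the input could be built with `dict(zip(network_gdf['LINKNO'], network_gdf['DSLINKNO']))`.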

## Step 3: Multi-GRU Data Pipeline
The same model-agnostic preprocessing framework now scales to multiple connected sub-basins, so the transition from single-unit to multi-unit modeling requires no new workflow. The core data quality and standardization principles remain unchanged, but the spatial processing now distributes forcing across the GRU network and prepares the routing connectivity between sub-basins.

### Data Pipeline Scaling: Lumped → Semi-Distributed

- Forcing Distribution: Single watershed average → Multiple GRU-specific forcing
- Spatial Processing: One computational unit → Network of connected units
- Routing Integration: No connectivity → Explicit stream network routing
- Model Configuration: Single SUMMA instance → Multi-GRU SUMMA + mizuRoute
- Computational Scaling: Linear increase with GRU count while maintaining quality

The same preprocessing philosophy ensures consistent data standards across spatial scales, enabling robust model intercomparison and maintaining the scientific rigor established in previous tutorials.
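
One common way to distribute gridded forcing to GRUs is an area-weighted average over the grid cells each GRU overlaps. The sketch below assumes the overlap areas are already known (in practice they come from intersecting the forcing grid with the GRU polygons). All numbers and the function name are hypothetical, and CONFLUENCE's remapping may use a different method.

```python
import numpy as np

def remap_to_grus(grid_values, overlap_area):
    """Area-weighted average of grid-cell values for each GRU.

    grid_values:  (n_cells,) forcing value per grid cell
    overlap_area: (n_grus, n_cells) intersection area of each GRU with each cell
    """
    overlap_area = np.asarray(overlap_area, dtype=float)
    # Normalize each GRU's row so its weights sum to 1
    weights = overlap_area / overlap_area.sum(axis=1, keepdims=True)
    return weights @ np.asarray(grid_values, dtype=float)

# Hypothetical precipitation (mm/h) for three grid cells
precip = np.array([2.0, 4.0, 6.0])
# Hypothetical overlap areas (km²): rows are GRUs, columns are grid cells
overlap = np.array([[30.0, 10.0, 0.0],
                    [0.0, 20.0, 20.0]])

gru_precip = remap_to_grus(precip, overlap)
print(gru_precip)

# Volume check: area-weighted totals agree before and after remapping
volume_grid = (precip * overlap.sum(axis=0)).sum()
volume_gru = (gru_precip * overlap.sum(axis=1)).sum()
print(volume_grid, volume_gru)
```

Because the weights are proportional to overlap area, the total water volume over the watershed is the same whether it is summed over grid cells or over GRUs.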

In [None]:
# =============================================================================
# STEP 3: MULTI-GRU DATA PIPELINE
# =============================================================================

print("=== Step 3: Multi-GRU Data Pipeline for Semi-Distributed Modeling ===")
print("Scaling model-agnostic preprocessing to connected sub-basin networks")

# =============================================================================
# STREAMFLOW OBSERVATIONS: SAME OUTLET, DISTRIBUTED PROCESSES
# =============================================================================

print(f"\n🌊 Streamflow Observations for Multi-GRU Validation...")
print(f"   Station: WSC {config_dict['STATION_ID']} (same outlet as lumped model)")
print(f"   Integration concept: Multiple GRU contributions → single outlet response")
print(f"   Scientific advantage: Spatial process attribution with same validation target")

print(f"\n🎯 Multi-GRU Validation Framework:")
multi_gru_context = [
    "Same outlet validation enables direct lumped vs distributed comparison",
    "Upstream GRU contributions can be individually analyzed",
    "Routing delays and channel storage explicitly represented",
    "Sub-basin hydrologic signatures can be extracted",
    "Spatial process attribution while maintaining validation consistency"
]

for context in multi_gru_context:
    print(f"   📊 {context}")

# Execute streamflow data processing (reuses processed data if available)
print(f"\n📥 Processing WSC streamflow observations...")
confluence.managers['data'].process_observed_data()
print("✅ Streamflow validation data ready")

# =============================================================================
# MULTI-GRU METEOROLOGICAL FORCING DISTRIBUTION
# =============================================================================

print(f"\n🌦️  Multi-GRU Meteorological Forcing Distribution...")
print(f"   GRU count: {len(basins_gdf)} sub-basins")
# Guard against a missing 'elevation' column, as in Step 2
if 'elevation' in basins_gdf.columns:
    print(f"   Elevation range: {basins_gdf['elevation'].min():.0f}m to {basins_gdf['elevation'].max():.0f}m")
print(f"   Spatial challenge: Distribute ERA5 grids across elevation gradient")

print(f"\n📈 Semi-Distributed Forcing Strategy:")
forcing_strategy = [
    "ERA5 spatial interpolation: Grid cells → individual GRU centroids",
    "Elevation corrections: Lapse rate adjustments for mountain gradient",
    "Conservative remapping: Mass/energy balance preservation across GRUs",
    "Quality assurance: Ensure realistic gradients and no discontinuities",
    "Routing preparation: Forcing aligned with GRU network topology"
]

for strategy in forcing_strategy:
    print(f"   ⛰️  {strategy}")

# Check if forcing data was copied, otherwise acquire
forcing_dir = project_dir / 'forcing' / 'raw_data'
if not forcing_dir.exists() or len(list(forcing_dir.glob('*.nc'))) == 0:
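    print(f"\n⬇️  No ERA5 forcing found in {forcing_dir}")
    # Uncomment the next line to download ERA5 forcing (this can take a long time)
    # confluence.managers['data'].acquire_forcings()
    print("⚠️  ERA5 acquisition is commented out in this tutorial; enable it before preprocessing")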
else:
    print(f"\n✅ ERA5 forcing available from data reuse")

# =============================================================================
# MODEL-AGNOSTIC PREPROCESSING: MULTI-GRU SPATIAL PROCESSING
# =============================================================================

print(f"\n🔧 Model-Agnostic Preprocessing for {len(basins_gdf)}-GRU Network...")

print(f"\n⚙️  Multi-GRU Preprocessing Pipeline:")
multi_gru_preprocessing = [
    f"Spatial remapping: ERA5 → {len(basins_gdf)} GRU-specific forcing datasets",
    "GRU characterization: Individual elevation, soil, land cover statistics",
    "Network topology: Upstream-downstream connectivity preservation",
    "Quality control: Cross-GRU consistency and gradient validation",
    "Format standardization: Multi-GRU NetCDF with routing topology"
]

for process in multi_gru_preprocessing:
    print(f"   🔄 {process}")

# Execute model-agnostic preprocessing
print(f"\n⚙️  Executing multi-GRU model-agnostic preprocessing...")
confluence.managers['data'].run_model_agnostic_preprocessing()
print("✅ Multi-GRU preprocessing complete")

print(f"\n🎯 Multi-GRU Preprocessing Outputs:")
multi_gru_outputs = [
    f"GRU-specific forcing: {len(basins_gdf)} individual meteorological datasets",
    "Network topology file: Stream connectivity and routing parameters",
    "GRU attribute table: Individual sub-basin characteristics",
    "Spatial mapping: Conservative remapping with mass balance closure",
    "Quality reports: Multi-GRU consistency and gradient analysis"
]

for output in multi_gru_outputs:
    print(f"   📦 {output}")

# =============================================================================
# MODEL-SPECIFIC PREPROCESSING: SUMMA + MIZUROUTE INTEGRATION
# =============================================================================

print(f"\n🌊 SUMMA + mizuRoute Integration for Semi-Distributed Modeling...")
print(f"   Hydrological model: {config_dict['HYDROLOGICAL_MODEL']} ({len(basins_gdf)} GRUs)")
print(f"   Routing model: {config_dict['ROUTING_MODEL']} (network connectivity)")
print(f"   Integration: Distributed physics + explicit routing")

print(f"\n🔧 Semi-Distributed Model Configuration:")
semi_distributed_config = [
    f"SUMMA setup: {len(basins_gdf)} independent GRU simulations",
    "Parameter assignment: GRU-specific soil, vegetation, topographic parameters",
    "Network configuration: mizuRoute connectivity matrix and routing parameters",
    "Runoff coupling: GRU surface/subsurface flow → stream network input",
    "Output coordination: Individual GRU states + integrated outlet streamflow"
]

for config in semi_distributed_config:
    print(f"   🌲 {config}")

# Execute model-specific preprocessing
print(f"\n🔧 Executing SUMMA + mizuRoute configuration...")
confluence.managers['model'].preprocess_models()
print("✅ Semi-distributed model configuration complete")

print(f"\n📊 Expected Semi-Distributed Outputs:")
semi_distributed_outputs = [
    "Multi-GRU streamflow: Individual sub-basin contributions",
    "Outlet streamflow: Integrated response with routing delays",
    "Spatial water balance: GRU-level ET, storage, runoff generation",
    "Routing diagnostics: Channel storage and travel times",
    "Network attribution: Upstream vs downstream process contributions"
]

for output in semi_distributed_outputs:
    print(f"   📈 {output}")

# =============================================================================
# PREPROCESSING SCALING ANALYSIS
# =============================================================================

print(f"\n📊 Preprocessing Scaling Analysis:")

scaling_metrics = {
    "Spatial units": f"{len(basins_gdf)} GRUs vs 1 lumped unit",
    "Forcing datasets": f"{len(basins_gdf)} GRU-specific vs 1 watershed average",
    "Model instances": f"{len(basins_gdf)} SUMMA + 1 mizuRoute vs 1 SUMMA only",
    "Configuration complexity": "Network topology + individual GRU parameters vs single parameter set",
    "Expected runtime": f"~{len(basins_gdf)}× increase vs lumped baseline"
}

for metric, value in scaling_metrics.items():
    print(f"   📏 {metric}: {value}")

print(f"\n🔬 Scientific Advantages Achieved:")
scientific_advantages = [
    "Spatial process attribution: Identify which sub-basins contribute to outlet response",
    "Elevation gradient representation: Capture mountain watershed heterogeneity",
    "Routing process inclusion: Explicit channel delays and storage effects",
    "Comparative framework: Same validation target as lumped model",
    "Scaling foundation: Intermediate step toward fully distributed modeling"
]

for advantage in scientific_advantages:
    print(f"   🎯 {advantage}")

# =============================================================================
# DATA PIPELINE SUMMARY FOR SEMI-DISTRIBUTED MODELING
# =============================================================================

print(f"\n✅ Semi-Distributed Data Pipeline Summary:")

pipeline_achievements = [
    "✅ Multi-GRU forcing distribution with elevation corrections",
    "✅ Stream network topology preserved in preprocessing outputs",
    "✅ SUMMA + mizuRoute integration configured for connected sub-basins",
    "✅ Same model-agnostic framework scaled to multi-unit complexity",
    "✅ Quality-controlled inputs ready for semi-distributed simulation"
]

for achievement in pipeline_achievements:
    print(f"   {achievement}")

print(f"\n🌐 Framework Versatility Across Spatial Scales:")
print(f"   📊 Same preprocessing pipeline handles:")
print(f"      • Tutorial 02a: Lumped basin (1 unit)")
print(f"      • Tutorial 02b: Semi-distributed ({len(basins_gdf)} connected units)")
print(f"      • Future distributed: Hundreds of spatially-explicit units")
print(f"      • Large-sample: Thousands of watersheds across continents")

print(f"\n🚀 Ready for semi-distributed SUMMA + mizuRoute execution!")
print(f"   → Multi-GRU inputs: Spatially-distributed and quality-controlled")
print(f"   → Network topology: Stream connectivity preserved")
print(f"   → Model integration: Physics + routing ready for execution")
print(f"   → Scaling demonstration: Same framework, increased spatial complexity")
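
The preprocessing summary above refers to conservative remapping with mass balance closure. The sketch below illustrates that idea with an area-weighted average of grid-cell values onto GRUs, followed by a volume check. It is a minimal illustration, not CONFLUENCE's implementation: the function name `remap_to_grus`, the overlap areas, and the precipitation values are all hypothetical.

```python
import numpy as np

def remap_to_grus(grid_values, overlap_area):
    """Area-weighted mean of grid-cell values for each GRU.

    grid_values:  shape (n_cells,), e.g. precipitation depth per grid cell
    overlap_area: shape (n_grus, n_cells), area of each cell that falls inside each GRU
    """
    grid_values = np.asarray(grid_values, dtype=float)
    overlap_area = np.asarray(overlap_area, dtype=float)
    gru_area = overlap_area.sum(axis=1)          # total area covered per GRU
    return overlap_area @ grid_values / gru_area

# Hypothetical overlap areas (km²) between 3 grid cells and 2 GRUs
overlap_km2 = np.array([[10.0, 5.0, 0.0],
                        [0.0, 5.0, 20.0]])
precip_mm = np.array([2.0, 4.0, 6.0])            # hypothetical grid-cell precipitation

gru_precip = remap_to_grus(precip_mm, overlap_km2)

# Mass balance check: depth × area summed over GRUs must match the same sum over cell fragments
volume_grus = (gru_precip * overlap_km2.sum(axis=1)).sum()
volume_cells = (overlap_km2 * precip_mm).sum()
print(gru_precip, np.isclose(volume_grus, volume_cells))
```

Weighting by overlap area rather than taking a plain mean of intersecting cells is what keeps the total water volume unchanged by the remapping.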

## Geospatial Domain Definition - Data Acquisition and Preparation

We'll reuse some of the geospatial data from the lumped model tutorial, where appropriate.

In [None]:
# Check if we can reuse data from the lumped model
lumped_dem_path = CONFLUENCE_DATA_DIR / 'domain_Bow_at_Banff_lumped_tutorial' / 'attributes' / 'elevation' / 'dem'
lumped_forcing_path = CONFLUENCE_DATA_DIR / 'domain_Bow_at_Banff_lumped_tutorial' / 'forcing' / 'raw_data'
can_reuse = lumped_dem_path.exists()
can_reuse_forcing = lumped_forcing_path.exists()

if can_reuse or can_reuse_forcing:
    import shutil
    
    # Create a function to copy files with name substitution
    def copy_with_name_substitution(src_path, dst_path, old_str='_lumped', new_str='_distributed'):
        if not src_path.exists():
            return False
            
        # Create destination directory if it doesn't exist
        dst_path.parent.mkdir(parents=True, exist_ok=True)
        
        if src_path.is_dir():
            # Copy entire directory
            if not dst_path.exists():
                dst_path.mkdir(parents=True, exist_ok=True)
                
            # Copy all files with name substitution
            for src_file in src_path.glob('**/*'):
                if src_file.is_file():
                    # Create relative path
                    rel_path = src_file.relative_to(src_path)
                    # Create new filename with substitution
                    new_name = src_file.name.replace(old_str, new_str)
                    # Create destination path
                    dst_file = dst_path / rel_path.parent / new_name
                    # Create parent directories if they don't exist
                    dst_file.parent.mkdir(parents=True, exist_ok=True)
                    # Copy the file
                    shutil.copy2(src_file, dst_file)
            return True
        elif src_path.is_file():
            # Copy single file with name substitution
            new_name = dst_path.name.replace(old_str, new_str)
            dst_file = dst_path.parent / new_name
            dst_file.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_path, dst_file)
            return True
        
        return False

    print("Found existing geospatial data from lumped model. Copying and renaming files...")
    
    # Copy and rename DEM and other attribute data
    if can_reuse:
        # Define paths
        src_attr_path = CONFLUENCE_DATA_DIR / 'domain_Bow_at_Banff_lumped_tutorial' / 'attributes'
        dst_attr_path = CONFLUENCE_DATA_DIR / 'domain_Bow_at_Banff_distributed' / 'attributes'
        
        # Copy attributes with name substitution
        copied = copy_with_name_substitution(src_attr_path, dst_attr_path, '_lumped_tutorial', '_distributed')
        if copied:
            print("✓ Copied and renamed attribute files from lumped model")
    
    # Copy and rename forcing data
    if can_reuse_forcing:
        # Define paths
        src_forcing_path = CONFLUENCE_DATA_DIR / 'domain_Bow_at_Banff_lumped_tutorial' / 'forcing' / 'raw_data'
        dst_forcing_path = CONFLUENCE_DATA_DIR / 'domain_Bow_at_Banff_distributed' / 'forcing' / 'raw_data'
         
        # Copy forcing data with name substitution
        copied = copy_with_name_substitution(src_forcing_path, dst_forcing_path, '_lumped_tutorial', '_distributed')
        if copied:
            print("✓ Copied and renamed forcing data from lumped model")
            
    print("The distributed model will use these copied files as a starting point.")
else:
    print("No existing data found from the lumped model. Will acquire all data from scratch.")

# Step 2: Geospatial Domain Definition - Data Acquisition
# Acquire whatever could not be copied, so that a partial lumped-model directory
# does not leave attributes or forcing data missing.
if not can_reuse or not can_reuse_forcing:
    print("\n=== Step 2: Geospatial Domain Definition - Data Acquisition ===")

if not can_reuse:
    print("Acquiring geospatial attributes (DEM, soil, land cover)...")
    confluence.managers['data'].acquire_attributes()

if not can_reuse_forcing:
    print(f"\nAcquiring forcing data: {confluence.config['FORCING_DATASET']}")
    confluence.managers['data'].acquire_forcings()

print("\n✓ Geospatial attributes and forcing data ready")

## Step 4: Streamlined Semi-Distributed Model Execution

The same SUMMA process-based physics now runs across multiple connected sub-basins, and mizuRoute routes the runoff they generate through the stream network. The workflow is the one used for the lumped model; what changes is the number of modeling units and the addition of explicit network routing.

### Model Execution Scaling: Lumped → Semi-Distributed

- Computational Units: Single SUMMA instance → Multiple coordinated GRU simulations
- Process Integration: Isolated water balance → Network of connected water balances
- Routing Complexity: No routing → Explicit stream network with travel times
- Spatial Coupling: Uniform response → Spatially-distributed runoff + routing
- Output Integration: Direct streamflow → Multi-GRU contributions + network routing

The workflow orchestration is unchanged. mizuRoute takes the runoff generated by each GRU and routes it through the stream network, so that the streamflow at the outlet includes routing delays and channel storage effects.
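
To make the coupling concrete, the sketch below accumulates GRU runoff down a small network and delays each reach's outflow by a fixed number of time steps. It is a deliberately simplified pure-lag scheme under stated assumptions, not mizuRoute's routing method. The function `route_with_lag`, the three-GRU network, and the lag values are hypothetical.

```python
from collections import deque
import numpy as np

def route_with_lag(runoff, downstream, lag_steps):
    """Route runoff through a tree network with a fixed delay per reach.

    runoff:     dict gru_id -> sequence of local runoff (m³/s) per time step
    downstream: dict gru_id -> downstream gru_id, or None for the outlet
    lag_steps:  dict gru_id -> travel time through that GRU's reach, in whole time steps
    """
    n_steps = len(next(iter(runoff.values())))
    inflow = {gru: np.zeros(n_steps) for gru in runoff}   # flow arriving from upstream
    n_upstream = {gru: 0 for gru in runoff}
    for down in downstream.values():
        if down is not None:
            n_upstream[down] += 1

    outflow = {}
    # Start at headwaters; a GRU is processed once all of its upstream GRUs are done
    ready = deque(gru for gru, count in n_upstream.items() if count == 0)
    while ready:
        gru = ready.popleft()
        total = np.asarray(runoff[gru], dtype=float) + inflow[gru]
        lag = min(lag_steps[gru], n_steps)
        delayed = np.zeros(n_steps)
        delayed[lag:] = total[:n_steps - lag]              # shift the hydrograph by the lag
        outflow[gru] = delayed
        down = downstream[gru]
        if down is not None:
            inflow[down] += delayed
            n_upstream[down] -= 1
            if n_upstream[down] == 0:
                ready.append(down)
    return outflow

# Hypothetical network: GRUs 1 and 2 both drain into GRU 3, which is the outlet
runoff = {1: [1, 0, 0, 0, 0], 2: [0, 2, 0, 0, 0], 3: [1, 0, 0, 0, 0]}
downstream = {1: 3, 2: 3, 3: None}
lag_steps = {1: 2, 2: 1, 3: 1}

routed = route_with_lag(runoff, downstream, lag_steps)
outlet_flow = routed[3]
print(outlet_flow)
```

In this toy case the pulses from GRUs 1 and 2 reach the outlet reach in the same time step, which shows how travel times shape the outlet hydrograph even when total volume is conserved.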

In [None]:
# =============================================================================
# STEP 4: STREAMLINED SEMI-DISTRIBUTED MODEL EXECUTION
# =============================================================================

print("=== Step 4: Semi-Distributed SUMMA + mizuRoute Execution ===")
print("Orchestrating multi-GRU physics with explicit stream network routing")

# =============================================================================
# MULTI-GRU SUMMA + NETWORK ROUTING EXECUTION
# =============================================================================

print(f"\n🌊 Executing Semi-Distributed Watershed Simulation...")
print(f"   Hydrological model: {config_dict['HYDROLOGICAL_MODEL']} ({len(basins_gdf)} GRU instances)")
print(f"   Routing model: {config_dict['ROUTING_MODEL']} (stream network integration)")
print(f"   Domain complexity: {len(basins_gdf)} connected sub-basins")
print(f"   Target: Routed streamflow at WSC {config_dict['STATION_ID']}")

print(f"\n⚡ Semi-Distributed Execution Framework:")
execution_framework = [
    f"Multi-GRU SUMMA: {len(basins_gdf)} independent physics simulations",
    "Runoff generation: Surface and subsurface flow from each GRU",
    "mizuRoute coupling: GRU runoff → stream network input",
    "Network routing: Flow transport with travel times and storage",
    "Outlet integration: Multi-GRU contributions → final streamflow"
]

for process in execution_framework:
    print(f"   🌊 {process}")

print(f"\n🔄 Computational Complexity Scaling:")
complexity_aspects = [
    f"Spatial processing: {len(basins_gdf)}× increase in model units",
    "Network routing: Additional computational overhead for stream transport",
    "Memory requirements: Multi-GRU state variables + routing network",
    "I/O operations: Distributed outputs + network connectivity data",
    "Quality assurance: Multi-unit mass balance + routing conservation"
]

for aspect in complexity_aspects:
    print(f"   📊 {aspect}")

# Execute the semi-distributed model system
print(f"\n🏃‍♂️ Running semi-distributed SUMMA + mizuRoute simulation...")
print(f"   Note: Execution time ~{len(basins_gdf)}× longer than lumped model")
confluence.managers['model'].run_models()
print("✅ Semi-distributed simulation complete")

# =============================================================================
# MULTI-GRU OUTPUT VERIFICATION AND ANALYSIS
# =============================================================================

print(f"\n🔍 Semi-Distributed Simulation Output Verification...")

# Locate and verify simulation outputs
sim_dir = confluence.project_dir / "simulations" / config_dict['EXPERIMENT_ID']
summa_outputs = sim_dir / "SUMMA"
routing_outputs = sim_dir / "mizuRoute"

print(f"   📁 SUMMA outputs: {summa_outputs}")
print(f"   📁 mizuRoute outputs: {routing_outputs}")

# Check for key output files
output_categories = {
    "SUMMA multi-GRU": f"{config_dict['EXPERIMENT_ID']}_day.nc",
    "mizuRoute streamflow": f"{config_dict['EXPERIMENT_ID']}_mizuRoute_output.nc",
    "Network topology": "topology.nc"
}

for output_type, filename in output_categories.items():
    # Look for each file in both output directories and report the first match
    candidates = [summa_outputs / filename, routing_outputs / filename]
    found = next((path for path in candidates if path.exists()), None)

    if found is not None:
        file_size = found.stat().st_size / (1024 * 1024)  # MB
        print(f"   ✅ {output_type}: {filename} ({file_size:.1f} MB)")
    else:
        print(f"   ⚠️  {output_type}: {filename} not found under {sim_dir}")

print(f"\n📊 Semi-Distributed Simulation Products:")
simulation_products = [
    f"Multi-GRU water balance: Individual sub-basin ET, storage, runoff",
    f"Distributed runoff: {len(basins_gdf)} spatially-explicit runoff contributions",
    "Routed streamflow: Network-integrated flow with delays and storage",
    "Spatial attribution: Upstream vs downstream process contributions",
    "Routing diagnostics: Channel travel times and storage dynamics"
]

for product in simulation_products:
    print(f"   📈 {product}")

# =============================================================================
# NETWORK ROUTING VERIFICATION
# =============================================================================

print(f"\n🌊 Network Routing Integration Assessment...")

# Quick verification of routing outputs
import xarray as xr  # imported before the try block so the name is always defined

try:
    routing_files = sorted(routing_outputs.glob("*.nc"))
    if routing_files:
        # Load routing output; the context manager closes the file even if a check fails
        with xr.open_dataset(routing_files[0]) as routing_ds:
            print(f"   ✅ Network routing simulation loaded")
            print(f"   Network segments: {routing_ds.sizes['seg'] if 'seg' in routing_ds.dims else 'Unknown'}")
            print(f"   Available variables: {list(routing_ds.data_vars)[:5]}...")

            # Check for streamflow variable
            if 'IRFroutedRunoff' in routing_ds.data_vars:
                streamflow = routing_ds['IRFroutedRunoff']
                print(f"   📊 Routed streamflow range: {float(streamflow.min()):.2f} to {float(streamflow.max()):.2f} m³/s")

                # Check for multiple segments (network complexity)
                if 'seg' in streamflow.dims and streamflow.sizes['seg'] > 1:
                    print(f"   🌊 Network complexity: {streamflow.sizes['seg']} stream segments")
    else:
        print(f"   ⚠️  No NetCDF files found in {routing_outputs}")

except Exception as e:
    print(f"   ⚠️  Routing verification failed: {e}")

print(f"\n🎯 Semi-Distributed Integration Achievements:")
integration_achievements = [
    f"✅ {len(basins_gdf)}-GRU SUMMA simulation executed successfully",
    "✅ Distributed runoff generation completed across sub-basins",
    "✅ Stream network routing with travel times and storage",
    "✅ Multi-scale water balance maintained (GRU + network levels)",
    "✅ Spatially-explicit streamflow generation at basin outlet"
]

for achievement in integration_achievements:
    print(f"   {achievement}")

# =============================================================================
# COMPUTATIONAL PERFORMANCE ANALYSIS
# =============================================================================

print(f"\n⚙️  Computational Performance Analysis:")

performance_metrics = {
    "Spatial scaling": f"{len(basins_gdf)} sub-basins vs 1 lumped unit",
    "Model complexity": f"Multi-GRU SUMMA + mizuRoute vs single SUMMA",
    "Output complexity": f"Distributed states + network routing vs lumped response",
    "Memory scaling": f"~{len(basins_gdf)}× state variables + routing network",
    "Processing overhead": "Network topology + multi-unit coordination"
}

for metric, description in performance_metrics.items():
    print(f"   📊 {metric}: {description}")

print(f"\n🔬 Scientific Modeling Advances:")
modeling_advances = [
    "Spatial process representation: Sub-basin scale heterogeneity capture",
    "Routing physics: Explicit channel delays and storage effects",
    "Network attribution: Individual GRU contributions to outlet response",
    "Scaling demonstration: Same framework handles increased complexity",
    "Foundation established: Ready for fully distributed applications"
]

for advance in modeling_advances:
    print(f"   📈 {advance}")

print(f"\n🚀 Semi-distributed execution complete!")
print(f"   → Multi-GRU simulation: Distributed physics across sub-basin network")
print(f"   → Network routing: Realistic streamflow with travel times")
print(f"   → Spatial attribution: Individual sub-basin process analysis capability")
print(f"   → Ready for comprehensive performance evaluation and comparison")
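
Evaluating the routed streamflow requires knowing which segment is the basin outlet. One way to identify it, sketched below, is to look for the segment whose downstream ID does not appear among the segment IDs. The arrays `seg_id` and `down_seg_id` and the use of `0` for "no downstream segment" are assumptions for this toy example, not the variable names or conventions of a particular topology file.

```python
import numpy as np

def find_outlets(seg_id, down_seg_id):
    """Return IDs of segments that drain to no other segment in the network."""
    seg_id = np.asarray(seg_id)
    down_seg_id = np.asarray(down_seg_id)
    # A segment is an outlet if its downstream ID is not itself a segment in the network
    is_outlet = ~np.isin(down_seg_id, seg_id)
    return seg_id[is_outlet]

# Hypothetical network: 101 and 102 drain to 103, 103 drains to 104, 104 leaves the domain
seg_id = np.array([101, 102, 103, 104])
down_seg_id = np.array([103, 103, 104, 0])

outlets = find_outlets(seg_id, down_seg_id)
print(outlets)
```

A single-outlet basin should return exactly one ID; more than one suggests disconnected sub-networks that are worth checking before evaluation.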

## Step 5: Evaluation and Performance Comparison

The same evaluation framework is now applied to the semi-distributed simulation, which allows direct comparison with the lumped model from Tutorial 02a. Because the validation target and performance metrics are unchanged, differences in streamflow prediction skill can be traced to spatial discretization and explicit routing.

### Evaluation Framework Extension: Lumped → Semi-Distributed

- Validation Target: Same WSC outlet streamflow for direct comparison
- Process Attribution: Individual GRU contributions vs aggregated watershed response
- Routing Effects: Travel times and channel storage vs instantaneous response
- Spatial Insights: Sub-basin process analysis vs lumped representation
- Performance Trade-offs: Increased complexity vs prediction accuracy

The CONFLUENCE evaluation infrastructure is unchanged. The semi-distributed outputs additionally support spatial process attribution and assessment of network routing.
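
The two efficiency metrics used in the evaluation can be written as small standalone functions. The sketch below follows the Nash-Sutcliffe and Kling-Gupta formulas used in the evaluation cell. The function names and the synthetic series are illustrative only.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus squared error relative to the variance of observations."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(obs, sim):
    """Kling-Gupta efficiency from correlation (r), variability ratio (alpha), and bias ratio (beta)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

# Synthetic example: a simulation that overestimates every flow by 20%
obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sim = 1.2 * obs

nse_val = nse(obs, sim)
kge_val = kge(obs, sim)
print(f"NSE = {nse_val:.3f}, KGE = {kge_val:.3f}")
```

In this example the correlation is perfect, so the whole KGE penalty comes from the variability and bias terms. This decomposition is why KGE is often reported alongside NSE.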

In [None]:
# =============================================================================
# STEP 5: STREAMLINED SEMI-DISTRIBUTED EVALUATION AND PERFORMANCE COMPARISON
# =============================================================================

print("=== Step 5: Semi-Distributed Performance Evaluation and Spatial Analysis ===")
print("Comprehensive assessment of multi-GRU modeling with routing integration")

# =============================================================================
# STREAMFLOW DATA LOADING: MULTI-GRU + ROUTING INTEGRATION
# =============================================================================

print(f"\n🌊 Loading Semi-Distributed Streamflow Results...")

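# Imports used in this cell (harmless if already imported earlier in the notebook)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import xarray as xr

# Load observed streamflow (same as lumped model for direct comparison)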
obs_path = confluence.project_dir / "observations" / "streamflow" / "preprocessed" / f"{config_dict['DOMAIN_NAME']}_streamflow_processed.csv"

if obs_path.exists():
    obs_df = pd.read_csv(obs_path, parse_dates=['datetime'])
    obs_df.set_index('datetime', inplace=True)
    
    print(f"✅ WSC observations loaded")
    print(f"   Station: {config_dict['STATION_ID']} (same as lumped model)")
    print(f"   Period: {obs_df.index.min()} to {obs_df.index.max()}")
    print(f"   Flow range: {obs_df['discharge_cms'].min():.1f} to {obs_df['discharge_cms'].max():.1f} m³/s")
else:
    print(f"⚠️  Observed streamflow not found")
    obs_df = None

# Load semi-distributed simulation from mizuRoute
routing_dir = confluence.project_dir / "simulations" / config_dict['EXPERIMENT_ID'] / "mizuRoute"
# Sorted so the file choice is deterministic; only the first file is read below
routing_files = sorted(routing_dir.glob("*.nc"))

if routing_files:
    # Load mizuRoute network output
    routing_ds = xr.open_dataset(routing_files[0])
    
    # Extract outlet streamflow (typically the downstream-most segment)
    if 'IRFroutedRunoff' in routing_ds.data_vars:
        # Identify the outlet segment from SIM_REACH_ID. If it is not set, fall back to the last
        # reach in the file, which assumes reaches are stored in upstream-to-downstream order.
        reach_id = int(config_dict.get('SIM_REACH_ID', routing_ds.reachID.values[-1]))
        
        # Find segment index for outlet
        segment_indices = np.where(routing_ds.reachID.values == reach_id)[0]
        
        if len(segment_indices) > 0:
            segment_idx = segment_indices[0]
            sim_streamflow = routing_ds['IRFroutedRunoff'].isel(seg=segment_idx)
            sim_df = sim_streamflow.to_pandas()
            
            print(f"✅ Semi-distributed simulation loaded")
            print(f"   Outlet segment: {reach_id}")
            print(f"   Period: {sim_df.index.min()} to {sim_df.index.max()}")
            print(f"   Flow range: {sim_df.min():.1f} to {sim_df.max():.1f} m³/s")
            
        else:
            print(f"⚠️  Outlet segment {reach_id} not found")
            sim_df = None
    else:
        print(f"⚠️  Streamflow variable not found in routing output")
        sim_df = None
        
    routing_ds.close()
else:
    print(f"⚠️  mizuRoute output not found")
    sim_df = None

# =============================================================================
# SEMI-DISTRIBUTED PERFORMANCE ASSESSMENT
# =============================================================================

if obs_df is not None and sim_df is not None:
    print(f"\n📊 Semi-Distributed Streamflow Performance Assessment...")
    
    # Align data to common period
    start_date = max(obs_df.index.min(), sim_df.index.min())
    end_date = min(obs_df.index.max(), sim_df.index.max())
    
    # Skip initial spinup period
    start_date = start_date + pd.DateOffset(months=6)
    
    print(f"   Evaluation period: {start_date} to {end_date}")
    print(f"   Duration: {(end_date - start_date).days} days")
    
    # Resample to daily and filter to common period
    obs_daily = obs_df['discharge_cms'].resample('D').mean().loc[start_date:end_date]
    sim_daily = sim_df.resample('D').mean().loc[start_date:end_date]
    
    # Remove NaN values
    valid_mask = ~(obs_daily.isna() | sim_daily.isna())
    obs_valid = obs_daily[valid_mask]
    sim_valid = sim_daily[valid_mask]
    
    print(f"   Valid paired observations: {len(obs_valid)} days")
    
    # Calculate comprehensive performance metrics
    print(f"\n📈 Semi-Distributed Performance Metrics:")
    
    # Basic statistics
    rmse = np.sqrt(((obs_valid - sim_valid) ** 2).mean())
    bias = (sim_valid - obs_valid).mean()
    mae = np.abs(obs_valid - sim_valid).mean()
    pbias = 100 * bias / obs_valid.mean()
    
    # Efficiency metrics
    nse = 1 - ((obs_valid - sim_valid) ** 2).sum() / ((obs_valid - obs_valid.mean()) ** 2).sum()
    
    # Kling-Gupta Efficiency
    r = obs_valid.corr(sim_valid)
    alpha = sim_valid.std() / obs_valid.std()
    beta = sim_valid.mean() / obs_valid.mean()
    kge = 1 - np.sqrt((r - 1)**2 + (alpha - 1)**2 + (beta - 1)**2)
    
    # Display performance metrics
    print(f"   📊 RMSE: {rmse:.2f} m³/s")
    print(f"   📊 Bias: {bias:+.2f} m³/s ({pbias:+.1f}%)")
    print(f"   📊 MAE: {mae:.2f} m³/s")
    print(f"   📊 Correlation (r): {r:.3f}")
    print(f"   📊 Nash-Sutcliffe (NSE): {nse:.3f}")
    print(f"   📊 Kling-Gupta (KGE): {kge:.3f}")
    
    # =============================================================================
    # ROUTING AND SPATIAL EFFECTS ANALYSIS
    # =============================================================================
    
    print(f"\n🌊 Routing and Spatial Effects Analysis:")
    
    # Compare the timing of the largest observed and simulated peaks.
    # A timing difference reflects routing delays, but also snowmelt timing and forcing errors.
    if len(obs_valid) > 0:
        obs_max_date = obs_valid.idxmax()
        sim_max_date = sim_valid.idxmax()
        peak_timing_diff = (sim_max_date - obs_max_date).days
        
        print(f"   ⏱️  Peak timing: {peak_timing_diff:+d} days difference")
        print(f"   🌊 Simulated peak: {'Delayed' if peak_timing_diff > 0 else 'Advanced' if peak_timing_diff < 0 else 'Aligned'} relative to observed")
    
    # Flow regime analysis (non-exceedance percentiles of daily flow, not exceedance-based Q95/Q05)
    flow_stats = {
        'High flows (95th percentile)': (obs_valid.quantile(0.95), sim_valid.quantile(0.95)),
        'Medium flows (50th percentile)': (obs_valid.quantile(0.50), sim_valid.quantile(0.50)),
        'Low flows (5th percentile)': (obs_valid.quantile(0.05), sim_valid.quantile(0.05))
    }
    
    print(f"\n📊 Flow Regime Assessment:")
    for regime, (obs_q, sim_q) in flow_stats.items():
        bias_pct = 100 * (sim_q - obs_q) / obs_q
        print(f"   {regime}: Obs={obs_q:.1f}, Sim={sim_q:.1f} m³/s ({bias_pct:+.1f}%)")
    
    # =============================================================================
    # COMPREHENSIVE SEMI-DISTRIBUTED VISUALIZATION
    # =============================================================================
    
    print(f"\n📈 Creating semi-distributed evaluation visualization...")
    
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    
    # Time series comparison (top left)
    ax1 = axes[0, 0]
    ax1.plot(obs_valid.index, obs_valid.values, 'b-',
             label='WSC Observed', linewidth=1.5, alpha=0.8)
    ax1.plot(sim_valid.index, sim_valid.values, 'r-',
             label=f'Semi-Distributed ({len(basins_gdf)} GRUs)', linewidth=1.5, alpha=0.8)
    
    ax1.set_ylabel('Discharge (m³/s)', fontsize=11)
    ax1.set_title('Semi-Distributed Streamflow Comparison', fontweight='bold')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    
    # Add performance metrics
    metrics_text = f'NSE: {nse:.3f}\nKGE: {kge:.3f}\nBias: {pbias:+.1f}%\nGRUs: {len(basins_gdf)}'
    ax1.text(0.02, 0.95, metrics_text, transform=ax1.transAxes,
             bbox=dict(facecolor='white', alpha=0.8), fontsize=10, verticalalignment='top')
    
    # Scatter plot with routing emphasis (top right)
    ax2 = axes[0, 1]
    ax2.scatter(obs_valid, sim_valid, alpha=0.5, c='green', s=20)
    max_val = max(obs_valid.max(), sim_valid.max())
    ax2.plot([0, max_val], [0, max_val], 'k--', label='1:1 line')
    ax2.set_xlabel('Observed (m³/s)', fontsize=11)
    ax2.set_ylabel('Semi-Distributed (m³/s)', fontsize=11)
    ax2.set_title('Obs vs Sim with Network Routing', fontweight='bold')
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    
    # Monthly climatology (bottom left)
    ax3 = axes[1, 0]
    monthly_obs = obs_valid.groupby(obs_valid.index.month).mean()
    monthly_sim = sim_valid.groupby(sim_valid.index.month).mean()
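    months = range(1, 13)
    month_names = ['J', 'F', 'M', 'A', 'M', 'J', 'J', 'A', 'S', 'O', 'N', 'D']
    
    # Plot against the months actually present, so a record that does not cover
    # all twelve months cannot cause an x/y length mismatch
    ax3.plot(monthly_obs.index, monthly_obs.values, 'o-', label='Observed',
             color='blue', linewidth=2, markersize=6)
    ax3.plot(monthly_sim.index, monthly_sim.values, 's-', label='Semi-Distributed',
             color='red', linewidth=2, markersize=6)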
    
    ax3.set_xticks(months)
    ax3.set_xticklabels(month_names)
    ax3.set_ylabel('Mean Discharge (m³/s)', fontsize=11)
    ax3.set_title('Seasonal Flow Regime', fontweight='bold')
    ax3.legend()
    ax3.grid(True, alpha=0.3)
    
    # Flow duration curve (bottom right)
    ax4 = axes[1, 1]
    
    # Calculate exceedance probabilities
    obs_sorted = obs_valid.sort_values(ascending=False)
    sim_sorted = sim_valid.sort_values(ascending=False)
    obs_ranks = np.arange(1., len(obs_sorted) + 1) / len(obs_sorted) * 100
    sim_ranks = np.arange(1., len(sim_sorted) + 1) / len(sim_sorted) * 100
    
    ax4.semilogy(obs_ranks, obs_sorted, 'b-', label='Observed', linewidth=2)
    ax4.semilogy(sim_ranks, sim_sorted, 'r-', label='Semi-Distributed', linewidth=2)
    
    ax4.set_xlabel('Exceedance Probability (%)', fontsize=11)
    ax4.set_ylabel('Discharge (m³/s)', fontsize=11)
    ax4.set_title('Flow Duration Curve', fontweight='bold')
    ax4.legend()
    ax4.grid(True, alpha=0.3)
    
    plt.suptitle(f'Semi-Distributed Evaluation - {config_dict["DOMAIN_NAME"]} ({len(basins_gdf)} GRUs)',
                 fontsize=14, fontweight='bold')
    plt.tight_layout()
    plt.show()

else:
    print("⚠️  Cannot perform semi-distributed evaluation - missing data")
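
The peak-timing check in the cell above compares only the single largest peak. A complementary sketch, shown here on synthetic data, estimates the overall timing offset as the lag that maximizes the correlation between the two daily series. The function `best_lag_days` and the Gaussian test hydrograph are hypothetical and not part of CONFLUENCE.

```python
import numpy as np
import pandas as pd

def best_lag_days(obs, sim, max_lag=10):
    """Lag in days that best aligns sim with obs; positive means the simulation is late."""
    correlations = {
        lag: obs.corr(sim.shift(-lag))     # shift sim earlier by `lag` days, then correlate
        for lag in range(-max_lag, max_lag + 1)
    }
    return max(correlations, key=correlations.get)

# Synthetic daily hydrograph with one smooth peak, and a copy delayed by two days
dates = pd.date_range("2011-05-01", periods=60, freq="D")
t = np.arange(60)
obs = pd.Series(np.exp(-((t - 30) / 5.0) ** 2), index=dates)
sim = obs.shift(2)

lag = best_lag_days(obs, sim)
print(f"Estimated lag: {lag:+d} days")
```

Because it uses the whole series, this estimate is less sensitive to a single mismatched event than comparing the dates of the two maxima.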
