# CONFLUENCE Tutorial - 4: Semi-Distributed Basin Workflow (Bow River at Banff)

## Introduction
This tutorial takes the next step in spatial modeling complexity: semi-distributed basin modeling. Building on the lumped basin approach from Tutorial 02a, we now introduce spatial discretization by dividing the watershed into multiple connected units called Grouped Response Units (GRUs). This approach sits between simple lumped models and fully distributed representations, trading a modest increase in computational cost for a more realistic description of spatial variability.

## Semi-Distributed Modeling Philosophy
Semi-distributed modeling divides the watershed into multiple sub-basins, each treated as a separate modeling unit within a connected network. The GRUs are derived from the stream network topology, and each one links to its downstream neighbor through a routing network. The result is an intermediate level of complexity: more spatial realism than a lumped model, but far fewer units than a fully distributed one, which keeps run times low enough for calibration and uncertainty analysis.

## Key Scientific Concepts
The Grouped Response Unit concept represents sub-basins that drain to specific points along the stream network, with each GRU containing similar hydrological characteristics and responding as a unified computational unit. Stream network delineation employs digital elevation models and flow accumulation algorithms to automatically identify stream channels and divide the watershed into connected sub-basins. Routing processes move water from upstream GRUs to downstream GRUs through the stream network while accounting for travel time and channel storage effects. The stream threshold parameter critically controls model complexity by determining how many sub-basins are created, with higher thresholds producing fewer, larger GRUs.
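
The upstream-to-downstream connectivity described above can be sketched with a toy network. The GRU IDs, areas, and the `upstream_area` helper below are invented for illustration and are not part of CONFLUENCE or mizuRoute; the sketch only shows how a "drains to" table lets quantities be accumulated from headwaters to the outlet.

```python
# Toy GRU network: each GRU drains to one downstream GRU (None marks the outlet).
# IDs and areas are hypothetical values chosen for illustration only.
downstream = {1: 3, 2: 3, 3: 5, 4: 5, 5: None}
area_km2 = {1: 400.0, 2: 350.0, 3: 500.0, 4: 450.0, 5: 500.0}


def upstream_area(downstream, area):
    """Return the total contributing area (own + all upstream) for each GRU."""
    total = dict(area)
    # Count how many GRUs drain directly into each GRU
    n_inflows = {gru: 0 for gru in downstream}
    for gru, down in downstream.items():
        if down is not None:
            n_inflows[down] += 1
    # Start at headwaters and move downstream once all inflows are accounted for
    ready = [gru for gru, n in n_inflows.items() if n == 0]
    while ready:
        gru = ready.pop()
        down = downstream[gru]
        if down is not None:
            total[down] += total[gru]
            n_inflows[down] -= 1
            if n_inflows[down] == 0:
                ready.append(down)
    return total


totals = upstream_area(downstream, area_km2)
outlet = next(gru for gru, down in downstream.items() if down is None)
print(f"Outlet GRU {outlet} drains {totals[outlet]:.0f} km²")
```

The same traversal order (headwaters first) is what a routing scheme needs so that every segment receives its upstream inflow before passing water on.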

## Scientific Advantages and Applications
Semi-distributed modeling captures important spatial variations in climate, topography, and land cover across the watershed while better representing elevation-dependent processes like snow accumulation and temperature gradients. This approach maintains computational efficiency with fewer units than fully distributed models while preserving key spatial patterns, explicitly represents routing dynamics including travel time and attenuation effects in the stream network, and provides diagnostic capabilities that allow examination of contributions from different watershed regions.
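
To see why elevation matters, consider a minimal sketch that shifts one forcing temperature to several GRU elevations. The constant lapse rate of 6.5 °C per km, the elevations, and the `lapse_adjust` helper are assumptions made for this example; this is not necessarily how CONFLUENCE or SUMMA adjust forcing.

```python
LAPSE_RATE_C_PER_M = 0.0065  # assumed constant lapse rate (6.5 °C per km)


def lapse_adjust(temp_c, source_elev_m, target_elev_m, lapse=LAPSE_RATE_C_PER_M):
    """Shift a temperature from the source elevation to the target elevation."""
    return temp_c - lapse * (target_elev_m - source_elev_m)


# Hypothetical forcing value and GRU mean elevations
forcing_temp_c = 4.0
forcing_elev_m = 1400.0
gru_elev_m = {'valley': 1400.0, 'mid_slope': 2000.0, 'alpine': 2800.0}

gru_temp_c = {
    name: lapse_adjust(forcing_temp_c, forcing_elev_m, elev)
    for name, elev in gru_elev_m.items()
}

# A lumped model sees only one elevation (here the unweighted mean of the three)
lumped_elev_m = sum(gru_elev_m.values()) / len(gru_elev_m)
lumped_temp_c = lapse_adjust(forcing_temp_c, forcing_elev_m, lumped_elev_m)

for name, temp in gru_temp_c.items():
    print(f"{name:>9}: {temp:+.1f} °C")
print(f"   lumped: {lumped_temp_c:+.1f} °C")
```

In this toy case the valley GRU stays above freezing while the alpine GRU is well below it, a contrast that a single lumped temperature cannot represent.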

## Case Study: Bow River at Banff Semi-Distributed Configuration
For this tutorial, we employ the same Bow River watershed from Tutorial 02a but divide it into multiple GRUs through several configuration modifications. The domain method changes from lumped to delineate for watershed subdivision, a stream threshold of 5000 creates multiple sub-basins, mizuRoute provides routing connectivity between GRUs, and spatial complexity increases from single-unit to multi-unit representation. Expected outcomes include better representation of elevation gradients, improved timing of snowmelt contributions, more realistic representation of spatial climate variability, and enhanced ability to diagnose spatial process patterns.

## Technical Implementation Framework
The semi-distributed approach integrates several key components:

- Automated watershed delineation: identifies sub-basins using flow accumulation algorithms
- Stream network extraction: creates the river network topology connecting the sub-basins
- GRU characterization: calculates average characteristics for each sub-basin
- Routing setup: configures mizuRoute to move water between GRUs
- Model integration: couples SUMMA land surface processes with mizuRoute routing capabilities

## Learning Objectives and Tutorial Structure
Through this tutorial, you will learn to:

- Configure semi-distributed models with multiple GRUs
- Control spatial discretization using the stream threshold parameter
- Understand routing processes and their impact on streamflow timing
- Visualize spatial model structure with GRUs and stream networks
- Interpret distributed model results and compare them with lumped approaches
- Manage increased model complexity while maintaining workflow efficiency

This tutorial follows the established CONFLUENCE workflow, with key modifications for spatial complexity:

- Project setup: initialization for semi-distributed modeling
- Domain delineation: creating multiple sub-basins using stream network analysis
- Spatial discretization: converting sub-basins to GRUs for modeling
- Data processing: preparing inputs for multiple modeling units
- Model configuration: setting up the coupled SUMMA and mizuRoute system
- Model execution: running the integrated land surface and routing model
- Results analysis: comparing semi-distributed and lumped model performance

By completing this tutorial, you will understand how to add spatial complexity to hydrological models while maintaining computational efficiency, which lays the groundwork for fully distributed modeling applications.

## Step 1: Semi-Distributed Setup with Data Reuse
Building on the lumped basin modeling from Tutorial 02a, we now advance to semi-distributed watershed modeling. The goal is a practical balance between computational efficiency and spatial realism: multiple connected sub-basins that capture key spatial heterogeneity while keeping model complexity manageable.

The same CONFLUENCE framework seamlessly handles this complexity increase while data reuse from Tutorial 02a eliminates redundant preprocessing, demonstrating efficient workflow management for iterative model development.

In [None]:
# Import the libraries we'll need in this notebook
import sys
import os
from pathlib import Path
import yaml
import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd
from datetime import datetime
import xarray as xr
import numpy as np
import shutil

# Add CONFLUENCE to path
confluence_path = Path('../').resolve()
sys.path.append(str(confluence_path))

# Import main CONFLUENCE class
from CONFLUENCE import CONFLUENCE

# Set up plotting style
plt.style.use('default')
%matplotlib inline

# =============================================================================
# CONFIGURATION FOR SEMI-DISTRIBUTED BOW RIVER MODELING
# =============================================================================

# Set directory paths
CONFLUENCE_CODE_DIR = confluence_path
CONFLUENCE_DATA_DIR = Path('/Users/darrieythorsson/compHydro/data/CONFLUENCE_data')  
#CONFLUENCE_DATA_DIR = Path('/path/to/your/CONFLUENCE_data') 

# Load template configuration and customize for semi-distributed modeling
config_template_path = CONFLUENCE_CODE_DIR / '0_config_files' / 'config_template.yaml'

with open(config_template_path, 'r') as f:
    config_dict = yaml.safe_load(f)

# Update for semi-distributed Bow River modeling
config_updates = {
    'CONFLUENCE_CODE_DIR': str(CONFLUENCE_CODE_DIR),
    'CONFLUENCE_DATA_DIR': str(CONFLUENCE_DATA_DIR),
    'DOMAIN_NAME': 'Bow_at_Banff_distributed',
    'EXPERIMENT_ID': 'distributed_tutorial',
    'POUR_POINT_COORDS': '51.1722/-115.5717',  # Same as lumped model
    'DOMAIN_DEFINITION_METHOD': 'delineate',    # KEY CHANGE: watershed delineation vs lumped
    'STREAM_THRESHOLD': 5000,                   # Controls number of sub-basins
    'DOMAIN_DISCRETIZATION': 'GRUs',            # Grouped Response Units
    'HYDROLOGICAL_MODEL': 'SUMMA',
    'ROUTING_MODEL': 'mizuRoute',               
    'EXPERIMENT_TIME_START': '2011-01-01 01:00',
    'EXPERIMENT_TIME_END': '2018-12-31 23:00',
    'CALIBRATION_PERIOD': '2011-01-01, 2015-12-31',
    'EVALUATION_PERIOD': '2016-01-01, 2018-12-31',
    'SPINUP_PERIOD': '2011-01-01, 2011-12-31',
    'STATION_ID': '05BB001',
    'DOWNLOAD_WSC_DATA': True
}

config_dict.update(config_updates)

# Save configuration
temp_config_path = CONFLUENCE_CODE_DIR / '0_config_files' / 'config_semi_distributed.yaml'
with open(temp_config_path, 'w') as f:
    yaml.dump(config_dict, f, default_flow_style=False, sort_keys=False)

# =============================================================================
# DATA REUSE FROM TUTORIAL 02A
# =============================================================================

# Check for existing data from lumped model tutorial
lumped_domain = 'Bow_at_Banff'  # From Tutorial 02a
lumped_data_dir = CONFLUENCE_DATA_DIR / f'domain_{lumped_domain}'

if lumped_data_dir.exists():
    print(f"✅ Found existing data from Tutorial 02a: {lumped_data_dir}")
    
    # Define reusable data categories
    reusable_data = {
        'Elevation (DEM)': lumped_data_dir / 'attributes' / 'elevation',
        'Soil Data': lumped_data_dir / 'attributes' / 'soilclass', 
        'Land Cover': lumped_data_dir / 'attributes' / 'landclass',
        'ERA5 Forcing': lumped_data_dir / 'forcing' / 'raw_data',
        'WSC Observations': lumped_data_dir / 'observations' / 'streamflow'
    }
    
    # Check availability and copy reusable data
    print(f"\n🔄 Copying and Adapting Reusable Data...")
    
    # Initialize CONFLUENCE first to create directory structure
    confluence = CONFLUENCE(temp_config_path)
    project_dir = confluence.managers['project'].setup_project()
    
    def copy_with_name_adaptation(src_path, dst_path, old_name, new_name):
        """Copy files with name adaptation for new domain"""
        if not src_path.exists():
            return False
            
        dst_path.parent.mkdir(parents=True, exist_ok=True)
        
        if src_path.is_dir():
            # Copy directory contents with name adaptation
            for src_file in src_path.rglob('*'):
                if src_file.is_file():
                    rel_path = src_file.relative_to(src_path)
                    # Adapt filename
                    new_filename = src_file.name.replace(old_name, new_name)
                    dst_file = dst_path / rel_path.parent / new_filename
                    dst_file.parent.mkdir(parents=True, exist_ok=True)
                    shutil.copy2(src_file, dst_file)
            return True
        elif src_path.is_file():
            # Copy single file with name adaptation
            new_filename = dst_path.name.replace(old_name, new_name)
            dst_file = dst_path.parent / new_filename
            dst_file.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_path, dst_file)
            return True
        return False
    
    # Copy reusable data with appropriate naming
    for data_type, src_path in reusable_data.items():
        if src_path.exists():
            # Determine destination path
            rel_path = src_path.relative_to(lumped_data_dir)
            dst_path = project_dir / rel_path
            
            # Copy with name adaptation
            success = copy_with_name_adaptation(
                src_path, dst_path, 
                lumped_domain, config_dict['DOMAIN_NAME']
            )
            
            if success:
                print(f"   ✅ {data_type}: Copied and adapted")
            else:
                print(f"   ⚠️  {data_type}: Copy failed")
        else:
            print(f"   📋 {data_type}: Not found, will acquire fresh")
    
else:
    print(f"⚠️  No existing data found from Tutorial 02a")
    print(f"   Will acquire all data from scratch")
    
    # Initialize CONFLUENCE and create project structure
    confluence = CONFLUENCE(temp_config_path)
    project_dir = confluence.managers['project'].setup_project()

# Create pour point
pour_point_path = confluence.managers['project'].create_pour_point()
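
Before moving on, it can help to sanity-check the configuration strings set above. The sketch below assumes that `POUR_POINT_COORDS` is ordered latitude/longitude and that periods are written as `start, end` dates, as the values above suggest. The helpers `parse_pour_point` and `parse_period` are hypothetical and are not part of the CONFLUENCE API.

```python
from datetime import datetime

# Values copied from the configuration cell above
settings = {
    'POUR_POINT_COORDS': '51.1722/-115.5717',
    'EXPERIMENT_TIME_START': '2011-01-01 01:00',
    'EXPERIMENT_TIME_END': '2018-12-31 23:00',
    'CALIBRATION_PERIOD': '2011-01-01, 2015-12-31',
    'EVALUATION_PERIOD': '2016-01-01, 2018-12-31',
    'SPINUP_PERIOD': '2011-01-01, 2011-12-31',
}


def parse_pour_point(coords):
    """Split a 'lat/lon' string (assumed order) and range-check both values."""
    lat_str, lon_str = coords.split('/')
    lat, lon = float(lat_str), float(lon_str)
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError(f"Coordinates out of range: {coords}")
    return lat, lon


def parse_period(period):
    """Split a 'start, end' string into two dates and check their order."""
    start_str, end_str = (part.strip() for part in period.split(','))
    start = datetime.strptime(start_str, '%Y-%m-%d').date()
    end = datetime.strptime(end_str, '%Y-%m-%d').date()
    if start > end:
        raise ValueError(f"Period starts after it ends: {period}")
    return start, end


lat, lon = parse_pour_point(settings['POUR_POINT_COORDS'])
exp_start = datetime.strptime(settings['EXPERIMENT_TIME_START'], '%Y-%m-%d %H:%M').date()
exp_end = datetime.strptime(settings['EXPERIMENT_TIME_END'], '%Y-%m-%d %H:%M').date()

# Every sub-period should fall inside the experiment window
period_ok = {}
for key in ('SPINUP_PERIOD', 'CALIBRATION_PERIOD', 'EVALUATION_PERIOD'):
    start, end = parse_period(settings[key])
    period_ok[key] = exp_start <= start and end <= exp_end
    print(f"{key}: {start} to {end} (inside experiment window: {period_ok[key]})")
```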


## Step 2: Stream Network Delineation and Spatial Connectivity
The transition to semi-distributed modeling requires spatial analysis that automatically identifies sub-basins and their connectivity. This process transforms a continuous landscape into a network of connected modeling units that preserves the essential topology of watershed drainage while keeping the spatial discretization computationally manageable.

### Scientific Context: Stream Network Analysis

Hydrologic network principles:

- Flow Accumulation: Upslope area contributing to each grid cell
- Stream Threshold: Minimum contributing area to define stream channels
- Watershed Segmentation: Division of landscape by stream network topology
- Connectivity Preservation: Maintaining upstream-downstream relationships
- Scale Optimization: Balancing spatial detail with computational tractability

The stream threshold parameter critically controls model complexity: lower values create more sub-basins with finer spatial detail, while higher values produce fewer, larger units with reduced computational demands.
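
The effect of the threshold can be illustrated on a tiny synthetic DEM. The sketch below uses a simplified steepest-neighbor rule (each cell drains to its lowest of eight neighbors) and counts the threshold in cells. The DEM values and helper functions are invented for this example and do not reproduce the delineation tools that CONFLUENCE calls; the units of `STREAM_THRESHOLD` in the real workflow are not assumed here.

```python
# Synthetic 5x5 DEM with a valley running down the middle column
dem = [
    [9, 8, 7, 8, 9],
    [8, 6, 5, 6, 8],
    [7, 5, 3, 5, 7],
    [8, 6, 2, 6, 8],
    [9, 7, 1, 7, 9],
]


def flow_accumulation(dem):
    """Count the cells draining through each cell (including the cell itself)."""
    nrows, ncols = len(dem), len(dem[0])
    acc = [[1] * ncols for _ in range(nrows)]
    # Visit cells from highest to lowest so upstream totals are complete first
    cells = sorted(
        ((dem[r][c], r, c) for r in range(nrows) for c in range(ncols)),
        reverse=True,
    )
    for elev, r, c in cells:
        lowest = None
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (dr or dc) and 0 <= rr < nrows and 0 <= cc < ncols:
                    if dem[rr][cc] < elev and (
                        lowest is None or dem[rr][cc] < dem[lowest[0]][lowest[1]]
                    ):
                        lowest = (rr, cc)
        if lowest is not None:
            acc[lowest[0]][lowest[1]] += acc[r][c]
    return acc


def count_stream_cells(acc, threshold):
    """Number of cells whose contributing area meets the threshold."""
    return sum(value >= threshold for row in acc for value in row)


acc = flow_accumulation(dem)
for threshold in (2, 5, 10, 25):
    print(f"threshold={threshold:>2} cells -> {count_stream_cells(acc, threshold)} stream cells")
```

Raising the threshold shortens the extracted stream network, which in turn reduces the number of sub-basins that the network defines.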

In [None]:
# Check if DEM was copied from Tutorial 02a, otherwise run data acquisition
dem_path = project_dir / 'attributes' / 'elevation' / 'dem'
if not dem_path.exists() or len(list(dem_path.glob('*.tif'))) == 0:
    print(f"   DEM not found, acquiring fresh geospatial attributes...")
    confluence.managers['data'].acquire_attributes()
    print("✅ Geospatial attributes acquired")
else:
    print(f"✅ DEM available from previous workflow")

# Executing stream network delineation
watershed_path = confluence.managers['domain'].define_domain()
print("✅ Stream network delineation complete")

# Execute domain discretization to create GRUs
hru_path = confluence.managers['domain'].discretize_domain()
print("✅ GRU discretization complete")

## Network Structure Analysis and Visualization

In [None]:
# Load and analyze created spatial products
basin_dir = project_dir / 'shapefiles' / 'river_basins'
network_dir = project_dir / 'shapefiles' / 'river_network'
catchment_dir = project_dir / 'shapefiles' / 'catchment'

# Load spatial data
basin_files = list(basin_dir.glob('*.shp'))
network_files = list(network_dir.glob('*.shp'))

basins_gdf = gpd.read_file(basin_files[0])
network_gdf = gpd.read_file(network_files[0])

# Project to appropriate CRS for area calculations
# For Bow River at Banff (Alberta), UTM Zone 11N is appropriate
target_crs = 'EPSG:32611'  # UTM Zone 11N
basins_projected = basins_gdf.to_crs(target_crs)

print(f"\n📋 Network Structure Summary:")
print(f"   Sub-basins (GRUs): {len(basins_gdf)}")
print(f"   Stream segments: {len(network_gdf)}")

# Calculate areas using projected geometries
total_area_m2 = basins_projected.geometry.area.sum()
total_area_km2 = total_area_m2 / 1e6
avg_gru_size_km2 = total_area_km2 / len(basins_gdf)

print(f"   Total watershed area: {total_area_km2:.1f} km²")
print(f"   Average GRU size: {avg_gru_size_km2:.1f} km²")

# Analyze GRU characteristics
if 'elevation' in basins_gdf.columns:
    print(f"   Elevation range: {basins_gdf['elevation'].min():.0f}m to {basins_gdf['elevation'].max():.0f}m")
    print(f"   Elevation gradient: {basins_gdf['elevation'].max() - basins_gdf['elevation'].min():.0f}m span")

# Stream network characteristics
if 'Length' in network_gdf.columns:
    total_length = network_gdf['Length'].sum() / 1000  # Convert to km
    print(f"   Total stream length: {total_length:.1f} km")

print(f"\n🗺️  Creating network structure visualization...")

# A single panel: plt.subplots() without a grid returns one Axes object
fig, ax1 = plt.subplots(figsize=(18, 9))

# Sub-basin network with elevation (use original CRS for plotting)

if 'elevation' in basins_gdf.columns:
    # Color by elevation
    basins_plot = basins_gdf.plot(ax=ax1, column='elevation', cmap='terrain',
                                edgecolor='black', linewidth=1, legend=True,
                                legend_kwds={'label': 'Elevation (m)', 'shrink': 0.8})
else:
    # Color by GRU ID
    basins_plot = basins_gdf.plot(ax=ax1, column='GRU_ID', cmap='viridis',
                                edgecolor='black', linewidth=1, legend=True,
                                legend_kwds={'label': 'GRU ID', 'shrink': 0.8})

# Add stream network
network_gdf.plot(ax=ax1, color='blue', linewidth=2, alpha=0.8)

# Add pour point
pour_point_gdf = gpd.read_file(pour_point_path)
pour_point_gdf.plot(ax=ax1, color='red', markersize=150, marker='o',
                   edgecolor='white', linewidth=2, zorder=5)

ax1.set_title(f'Semi-Distributed Network\n{len(basins_gdf)} Sub-basins', 
             fontsize=14, fontweight='bold')
ax1.set_xlabel('Longitude', fontsize=12)
ax1.set_ylabel('Latitude', fontsize=12)
ax1.grid(True, alpha=0.3)



plt.tight_layout()
plt.show()

## Step 3: Multi-GRU Data Pipeline
The same model-agnostic preprocessing framework now scales to multiple connected sub-basins, demonstrating CONFLUENCE's seamless transition from single-unit to multi-unit spatial modeling. The core data quality and standardization principles remain unchanged; however, spatial processing now handles distributed forcing across the GRU network and routing connectivity between sub-basins.

The same preprocessing philosophy ensures consistent data standards across spatial scales, enabling robust model intercomparison and maintaining the scientific rigor established in previous tutorials.
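
One common way to move gridded forcing onto sub-basins is area-weighted averaging over the grid cells that each GRU overlaps. The sketch below illustrates that idea with invented cell names, overlap areas, and temperatures; whether CONFLUENCE's preprocessing uses exactly this weighting is not stated here, so treat it as an assumption.

```python
# Hypothetical gridded forcing values (air temperature in °C) per grid cell
grid_temp_c = {'cell_A': -2.0, 'cell_B': 1.0, 'cell_C': 3.0}

# Hypothetical overlap area (km²) between each GRU and each grid cell
overlap_km2 = {
    'GRU_1': {'cell_A': 300.0, 'cell_B': 100.0},
    'GRU_2': {'cell_B': 200.0, 'cell_C': 200.0},
}


def area_weighted_mean(values, overlaps):
    """Average grid-cell values using overlap areas as weights."""
    total_area = sum(overlaps.values())
    return sum(values[cell] * area for cell, area in overlaps.items()) / total_area


gru_temp_c = {
    gru: area_weighted_mean(grid_temp_c, overlaps)
    for gru, overlaps in overlap_km2.items()
}

for gru, temp in gru_temp_c.items():
    print(f"{gru}: {temp:+.2f} °C")
```

With a single lumped unit there is only one set of weights; with multiple GRUs each unit receives its own forcing series, which is where the additional spatial detail enters the model.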

In [None]:
# Execute streamflow data processing 
confluence.managers['data'].process_observed_data()
print("✅ Streamflow validation data ready")

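# Check if forcing data was copied from Tutorial 02a
forcing_dir = project_dir / 'forcing' / 'raw_data'
if not forcing_dir.exists() or len(list(forcing_dir.glob('*.nc'))) == 0:
    # Acquisition is left commented out here; uncomment to acquire fresh forcing data
    # confluence.managers['data'].acquire_forcings()
    print("⚠️  No forcing data found; the acquisition call above is commented out")
else:
    print("✅ Forcing data available from previous workflow")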

# Execute model-agnostic preprocessing
confluence.managers['data'].run_model_agnostic_preprocessing()
print("✅ Multi-GRU preprocessing complete")


# Execute model-specific preprocessing
confluence.managers['model'].preprocess_models()
print("✅ Semi-distributed model configuration complete")

## Step 4: Streamlined Semi-Distributed Model Execution
The same SUMMA process-based physics now executes across multiple connected sub-basins, representing an advance in spatial modeling complexity. This integration of distributed runoff generation with explicit network routing demonstrates how the same computational framework scales from single-unit to multi-unit watershed simulation while maintaining physical realism and computational efficiency. The same workflow orchestration ensures robust execution across this increased complexity while mizuRoute integration transforms the distributed runoff into realistic streamflow with routing delays and channel storage effects.
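
Routing delay and attenuation can be illustrated by convolving a runoff pulse with a unit hydrograph. This is a generic textbook sketch with invented weights, not mizuRoute's actual routing scheme; the `route` helper is hypothetical.

```python
def route(runoff, unit_hydrograph):
    """Convolve a runoff series with a unit hydrograph (weights sum to 1)."""
    routed = [0.0] * (len(runoff) + len(unit_hydrograph) - 1)
    for i, q in enumerate(runoff):
        for j, weight in enumerate(unit_hydrograph):
            routed[i + j] += q * weight
    return routed


# A single runoff pulse at time step 1 (arbitrary units)
runoff = [0.0, 10.0, 0.0, 0.0, 0.0, 0.0]
# Invented unit hydrograph: spreads each pulse over four time steps
unit_hydrograph = [0.1, 0.3, 0.4, 0.2]

routed = route(runoff, unit_hydrograph)

peak_in = runoff.index(max(runoff))
peak_out = routed.index(max(routed))
print(f"Input peak:  {max(runoff):.1f} at step {peak_in}")
print(f"Routed peak: {max(routed):.1f} at step {peak_out}")
print(f"Volume in: {sum(runoff):.1f}, volume out: {sum(routed):.1f}")
```

The routed peak arrives later and is lower than the input pulse, while the total volume is conserved. These are the timing and attenuation effects that routing adds to the simulated hydrograph.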

In [None]:
# Execute the semi-distributed model system
confluence.managers['model'].run_models()
print("✅ Semi-distributed simulation complete")

## Step 5: Evaluation and Performance Comparison

In [None]:
# Load observed streamflow (same as lumped model for direct comparison)
obs_path = confluence.project_dir / "observations" / "streamflow" / "preprocessed" / f"{config_dict['DOMAIN_NAME']}_streamflow_processed.csv"
obs_df = pd.read_csv(obs_path, parse_dates=['datetime'])
obs_df.set_index('datetime', inplace=True)

# Load semi-distributed simulation from mizuRoute
routing_dir = confluence.project_dir / "simulations" / config_dict['EXPERIMENT_ID'] / "mizuRoute"
routing_files = list(routing_dir.glob("*.nc"))
print(routing_dir)

if routing_files:
    # Load mizuRoute network output
    routing_ds = xr.open_dataset(routing_files[0])
    
    # Extract outlet streamflow (typically the downstream-most segment)
    if 'IRFroutedRunoff' in routing_ds.data_vars:
        # Find outlet segment (could be identified by SIM_REACH_ID or maximum downstream position)
        reach_id = int(config_dict.get('SIM_REACH_ID', routing_ds.reachID.values[-1]))
        
        # Find segment index for outlet
        segment_indices = np.where(routing_ds.reachID.values == reach_id)[0]
        
        if len(segment_indices) > 0:
            segment_idx = segment_indices[0]
            sim_streamflow = routing_ds['IRFroutedRunoff'].isel(seg=segment_idx)
            sim_df = sim_streamflow.to_pandas()
                        
        else:
            print(f"⚠️  Outlet segment {reach_id} not found")
            sim_df = None
    else:
        print(f"⚠️  Streamflow variable not found in routing output")
        sim_df = None
        
    routing_ds.close()
else:
    print(f"⚠️  mizuRoute output not found")
    sim_df = None

# =============================================================================
# SEMI-DISTRIBUTED PERFORMANCE ASSESSMENT
# =============================================================================
    
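# Stop here if no simulated streamflow was loaded above
if sim_df is None:
    raise RuntimeError("No simulated streamflow available; check the mizuRoute output before evaluating")

# Align data to common period
start_date = max(obs_df.index.min(), sim_df.index.min())
end_date = min(obs_df.index.max(), sim_df.index.max())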

# Skip the first 6 months as a warm-up
# (note: SPINUP_PERIOD in the configuration covers all of 2011)
start_date = start_date + pd.DateOffset(months=6)

# Resample to daily and filter to common period
obs_daily = obs_df['discharge_cms'].resample('D').mean().loc[start_date:end_date]
sim_daily = sim_df.resample('D').mean().loc[start_date:end_date]

# Remove NaN values
valid_mask = ~(obs_daily.isna() | sim_daily.isna())
obs_valid = obs_daily[valid_mask]
sim_valid = sim_daily[valid_mask]

# Calculate comprehensive performance metrics
print(f"\n📈 Semi-Distributed Performance Metrics:")

# Basic statistics
rmse = np.sqrt(((obs_valid - sim_valid) ** 2).mean())
bias = (sim_valid - obs_valid).mean()
mae = np.abs(obs_valid - sim_valid).mean()
pbias = 100 * bias / obs_valid.mean()

# Efficiency metrics
nse = 1 - ((obs_valid - sim_valid) ** 2).sum() / ((obs_valid - obs_valid.mean()) ** 2).sum()

# Kling-Gupta Efficiency
r = obs_valid.corr(sim_valid)
alpha = sim_valid.std() / obs_valid.std()
beta = sim_valid.mean() / obs_valid.mean()
kge = 1 - np.sqrt((r - 1)**2 + (alpha - 1)**2 + (beta - 1)**2)

# Display performance metrics
print(f"   📊 RMSE: {rmse:.2f} m³/s")
print(f"   📊 Bias: {bias:+.2f} m³/s ({pbias:+.1f}%)")
print(f"   📊 MAE: {mae:.2f} m³/s")
print(f"   📊 Correlation (r): {r:.3f}")
print(f"   📊 Nash-Sutcliffe (NSE): {nse:.3f}")
print(f"   📊 Kling-Gupta (KGE): {kge:.3f}")

# =============================================================================
# ROUTING AND SPATIAL EFFECTS ANALYSIS
# =============================================================================

# Analyze peak flow timing (routing delay effects)
obs_peaks = obs_valid[obs_valid > obs_valid.quantile(0.95)]
sim_peaks = sim_valid[sim_valid > sim_valid.quantile(0.95)]

if len(obs_peaks) > 0 and len(sim_peaks) > 0:
    # Compare the dates of the largest observed and simulated daily flows
    obs_max_date = obs_valid.idxmax()
    sim_max_date = sim_valid.idxmax()
    peak_timing_diff = (sim_max_date - obs_max_date).days
    print(f"\n⏱️  Peak timing: simulated maximum is {peak_timing_diff:+d} days relative to observed")
    
# Flow regime analysis
# Labels use percentiles of the flow series (not exceedance probabilities)
flow_stats = {
    'High flows (95th percentile)': (obs_valid.quantile(0.95), sim_valid.quantile(0.95)),
    'Medium flows (50th percentile)': (obs_valid.quantile(0.50), sim_valid.quantile(0.50)),
    'Low flows (5th percentile)': (obs_valid.quantile(0.05), sim_valid.quantile(0.05))
}

print(f"\n📊 Flow Regime Assessment:")
for regime, (obs_q, sim_q) in flow_stats.items():
    bias_pct = 100 * (sim_q - obs_q) / obs_q
    print(f"   {regime}: Obs={obs_q:.1f}, Sim={sim_q:.1f} m³/s ({bias_pct:+.1f}%)")

# =============================================================================
# COMPREHENSIVE SEMI-DISTRIBUTED VISUALIZATION
# =============================================================================

print(f"\n📈 Creating semi-distributed evaluation visualization...")

fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# Time series comparison (top left)
ax1 = axes[0, 0]
ax1.plot(obs_valid.index, obs_valid.values, 'b-',
         label='WSC Observed', linewidth=1.5, alpha=0.8)
ax1.plot(sim_valid.index, sim_valid.values, 'r-',
         label=f'Semi-Distributed ({len(basins_gdf)} GRUs)', linewidth=1.5, alpha=0.8)

ax1.set_ylabel('Discharge (m³/s)', fontsize=11)
ax1.set_title('Semi-Distributed Streamflow Comparison', fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Add performance metrics
metrics_text = f'NSE: {nse:.3f}\nKGE: {kge:.3f}\nBias: {pbias:+.1f}%\nGRUs: {len(basins_gdf)}'
ax1.text(0.02, 0.95, metrics_text, transform=ax1.transAxes,
         bbox=dict(facecolor='white', alpha=0.8), fontsize=10, verticalalignment='top')

# Scatter plot with routing emphasis (top right)
ax2 = axes[0, 1]
ax2.scatter(obs_valid, sim_valid, alpha=0.5, c='green', s=20)
max_val = max(obs_valid.max(), sim_valid.max())
ax2.plot([0, max_val], [0, max_val], 'k--', label='1:1 line')
ax2.set_xlabel('Observed (m³/s)', fontsize=11)
ax2.set_ylabel('Semi-Distributed (m³/s)', fontsize=11)
ax2.set_title('Obs vs Sim with Network Routing', fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)

# Monthly climatology (bottom left)
ax3 = axes[1, 0]
monthly_obs = obs_valid.groupby(obs_valid.index.month).mean()
monthly_sim = sim_valid.groupby(sim_valid.index.month).mean()
months = range(1, 13)
month_names = ['J', 'F', 'M', 'A', 'M', 'J', 'J', 'A', 'S', 'O', 'N', 'D']

ax3.plot(months, monthly_obs.values, 'o-', label='Observed',
         color='blue', linewidth=2, markersize=6)
ax3.plot(months, monthly_sim.values, 's-', label='Semi-Distributed',
         color='red', linewidth=2, markersize=6)

ax3.set_xticks(months)
ax3.set_xticklabels(month_names)
ax3.set_ylabel('Mean Discharge (m³/s)', fontsize=11)
ax3.set_title('Seasonal Flow Regime', fontweight='bold')
ax3.legend()
ax3.grid(True, alpha=0.3)

# Flow duration curve (bottom right)
ax4 = axes[1, 1]

# Calculate exceedance probabilities
obs_sorted = obs_valid.sort_values(ascending=False)
sim_sorted = sim_valid.sort_values(ascending=False)
obs_ranks = np.arange(1., len(obs_sorted) + 1) / len(obs_sorted) * 100
sim_ranks = np.arange(1., len(sim_sorted) + 1) / len(sim_sorted) * 100

ax4.semilogy(obs_ranks, obs_sorted, 'b-', label='Observed', linewidth=2)
ax4.semilogy(sim_ranks, sim_sorted, 'r-', label='Semi-Distributed', linewidth=2)

ax4.set_xlabel('Exceedance Probability (%)', fontsize=11)
ax4.set_ylabel('Discharge (m³/s)', fontsize=11)
ax4.set_title('Flow Duration Curve', fontweight='bold')
ax4.legend()
ax4.grid(True, alpha=0.3)

plt.suptitle(f'Semi-Distributed Evaluation - {config_dict["DOMAIN_NAME"]} ({len(basins_gdf)} GRUs)',
             fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()
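
The efficiency metrics computed inline above can also be wrapped in small functions, which makes it easier to apply the same calculation to another simulation, such as the lumped run from Tutorial 02a. This is a plain-Python sketch of the same NSE and KGE formulas used in the cell above; the function names and toy data are illustrative.

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mean_obs = sum(obs) / len(obs)
    error = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - error / spread


def kge(obs, sim):
    """Kling-Gupta efficiency from correlation, variability ratio, and mean ratio."""
    n = len(obs)
    mean_obs, mean_sim = sum(obs) / n, sum(sim) / n
    std_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs) / (n - 1))
    std_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in sim) / (n - 1))
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim)) / (n - 1)
    r = cov / (std_obs * std_sim)   # linear correlation
    alpha = std_sim / std_obs       # variability ratio
    beta = mean_sim / mean_obs      # bias ratio
    return 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Toy discharge series (m³/s), invented for illustration
obs = [10.0, 20.0, 40.0, 30.0, 15.0]
sim_doubled = [2 * q for q in obs]   # right timing, twice the magnitude
sim_mean = [sum(obs) / len(obs)] * len(obs)  # constant at the observed mean

print(f"Perfect sim:  NSE={nse(obs, obs):.3f}, KGE={kge(obs, obs):.3f}")
print(f"Doubled sim:  NSE={nse(obs, sim_doubled):.3f}, KGE={kge(obs, sim_doubled):.3f}")
print(f"Mean-only sim: NSE={nse(obs, sim_mean):.3f}")
```

The doubled simulation has perfect correlation but is penalized through the variability and bias terms, which shows how KGE separates timing errors from magnitude errors.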

## Summary: Semi-Distributed Basin-Scale Modeling
This tutorial successfully demonstrated the advancement from lumped to semi-distributed watershed modeling using CONFLUENCE. Through the enhanced Bow River at Banff case study, we illustrated how the same standardized workflow framework seamlessly scales from single-unit to multi-unit spatial representation while introducing explicit stream network routing and sub-basin connectivity, establishing the foundation for fully distributed hydrological modeling applications.

## Key Methodological Achievements
The tutorial established multi-unit spatial discretization through automated watershed delineation that creates connected sub-basins based on stream network topology and flow accumulation algorithms. Stream network routing integration was successfully implemented through mizuRoute coupling with SUMMA, enabling explicit representation of travel times and channel storage effects between connected GRUs. Intelligent data reuse capabilities were demonstrated through efficient adaptation of geospatial and forcing data from Tutorial 02a, showcasing CONFLUENCE's support for iterative model development and comparative analysis.

## Scientific Process Understanding
The evaluation demonstrated CONFLUENCE's ability to represent spatially distributed watershed processes through multiple connected sub-basins that capture elevation gradients and heterogeneous runoff generation while maintaining an integrated outlet response. Routing dynamics and timing effects were simulated through explicit stream network connectivity that accounts for travel delays and channel storage in streamflow generation. The sub-basin structure also enables spatial process attribution: contributing areas and process patterns can be examined across the watershed network.

## Framework Scalability Validation
This tutorial confirmed CONFLUENCE's seamless complexity scaling by applying identical workflow principles from lumped through semi-distributed modeling without requiring fundamental architectural modifications. The model-agnostic preprocessing approach proved equally effective for multi-GRU spatial processing and routing network configuration, reinforcing the framework's broad applicability across modeling scales. Computational efficiency optimization was demonstrated through intelligent data reuse and workflow management that minimizes redundant processing while enabling rapid exploration of alternative spatial discretization strategies.

This foundation in semi-distributed basin modeling establishes essential principles for managing spatial complexity and network connectivity while preparing for the fully distributed modeling approaches and large-scale applications in subsequent tutorials.

### Next Focus: Distributed Watershed Modeling

**Ready to explore distributed basin simulations?** → **[Tutorial 02c: Basin Scale - Distributed Watershed](./02c_basin_distributed.ipynb)**