# CONFLUENCE Tutorial - 3: Lumped Basin Workflow (Bow River at Banff)

## Introduction

This tutorial shows how to scale up from point-scale modeling to basin-scale streamflow simulation using CONFLUENCE. Building on our previous tutorials with SNOTEL and FLUXNET data, we now demonstrate how to model an entire watershed as a single unit to generate streamflow at the basin outlet.

### What is Lumped Basin Modeling?

Lumped basin modeling treats the entire watershed as one homogeneous unit, averaging all the spatial variability across the catchment. While this is a simplification, it's a valuable approach because:

- **Simplicity**: Easier to understand and implement than distributed models
- **Computational efficiency**: Fast execution makes it ideal for calibration and uncertainty analysis  
- **Baseline performance**: Establishes whether a model can capture the basic watershed response before adding spatial complexity
- **Parameter identification**: Simpler structure makes it easier to understand which parameters control model behavior

### Case Study: Bow River at Banff

We'll use the Bow River at Banff as our example watershed:

- **Location**: Canadian Rockies, Alberta, Canada
- **Drainage area**: ~2,210 km²
- **Elevation**: Ranges from 1,384 m at the outlet to over 3,400 m in the headwaters
- **Climate**: Snow-dominated mountain system with pronounced seasonal cycles
- **Gauging station**: Water Survey of Canada station 05BB001 with long-term observations

This watershed presents interesting modeling challenges:
- Strong elevation gradients affecting temperature and precipitation
- Complex snow dynamics across elevation zones
- Seasonal storage in snowpack and glaciers
- Pronounced spring freshet from snowmelt

### What You'll Learn

This tutorial will teach you how to:

1. **Set up a basin-scale project** with CONFLUENCE's automated workflow
2. **Delineate watersheds** automatically from digital elevation models
3. **Aggregate spatial data** to create catchment-averaged characteristics
4. **Process meteorological forcing** data for basin-scale modeling
5. **Configure and run SUMMA** for lumped basin simulation
6. **Evaluate model performance** using standard hydrological metrics
7. **Interpret results** and understand model limitations

### Tutorial Overview

We'll walk through the complete CONFLUENCE workflow step by step:

1. **Project Setup**: Create the organized directory structure
2. **Watershed Delineation**: Automatically identify the watershed boundary
3. **Data Acquisition**: Get elevation, soil, and land cover data
4. **Forcing Data**: Process meteorological inputs
5. **Model Configuration**: Set up SUMMA for the lumped basin
6. **Model Execution**: Run the simulation
7. **Results Analysis**: Compare simulated and observed streamflow

By the end of this tutorial, you'll understand how CONFLUENCE handles the transition from point-scale to basin-scale modeling and be ready to explore more complex distributed modeling approaches.

## Step 1: Basin-Scale Workflow Setup

Building on the point-scale modeling expertise from Tutorials 01a and 01b, we now advance to basin-scale hydrological modeling. This represents a fundamental scaling transition: from process validation at individual sites to integrated watershed simulation that captures the collective hydrological response of an entire catchment.

### Scaling Transition: Point → Basin

- Spatial Scale: Single location → Entire watershed (~2,210 km²)
- Process Integration: Isolated vertical processes → Integrated water balance with routing
- Validation Target: Local states (SWE, SM, LE) → Streamflow at basin outlet
- Complexity: Uniform characteristics → Spatially-averaged catchment properties
- Scientific Challenge: Process understanding → Emergent watershed behavior

The same CONFLUENCE architecture seamlessly handles this transition, demonstrating the framework's scalability from point validation through basin-scale prediction while maintaining reproducible workflow principles.

In [None]:
# =============================================================================
# STEP 1: BASIN-SCALE WORKFLOW SETUP
# =============================================================================

# Import required libraries
import sys
import os
from pathlib import Path
import yaml
import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd
from datetime import datetime
import xarray as xr
import numpy as np

# Add CONFLUENCE to path
confluence_path = Path('../').resolve()
sys.path.append(str(confluence_path))

# Import main CONFLUENCE class
from CONFLUENCE import CONFLUENCE

# Set up plotting style
plt.style.use('default')
%matplotlib inline

print("=== CONFLUENCE Tutorial 02a: Lumped Basin Modeling ===")
print("Scaling from point-scale validation to basin-scale streamflow simulation")

# =============================================================================
# CONFIGURATION FOR BOW RIVER AT BANFF WATERSHED
# =============================================================================

print("\n🏔️ Configuring for Bow River at Banff Watershed")

# Set directory paths
CONFLUENCE_CODE_DIR = confluence_path
CONFLUENCE_DATA_DIR = Path('/Users/darrieythorsson/compHydro/data/CONFLUENCE_data')  # ← Update this path

# Load template configuration and customize for basin-scale modeling
config_template_path = CONFLUENCE_CODE_DIR / '0_config_files' / 'config_template.yaml'

with open(config_template_path, 'r') as f:
    config_dict = yaml.safe_load(f)

# Update for Bow River basin-scale modeling
config_updates = {
    'CONFLUENCE_CODE_DIR': str(CONFLUENCE_CODE_DIR),
    'CONFLUENCE_DATA_DIR': str(CONFLUENCE_DATA_DIR),
    'DOMAIN_NAME': 'Bow_at_Banff_lumped',
    'EXPERIMENT_ID': 'run_1',
    'POUR_POINT_COORDS': '51.1722/-115.5717',  # Banff gauging station
    'DOMAIN_DEFINITION_METHOD': 'lumped',    # Watershed delineation vs point buffer
    'DOMAIN_DISCRETIZATION': 'GRUs',          # Single HRU for entire watershed (value is reused in shapefile names, so case matters)
    'HYDROLOGICAL_MODEL': 'SUMMA',
    'EXPERIMENT_TIME_START': '2004-01-01 01:00',
    'EXPERIMENT_TIME_END': '2018-12-31 23:00',
    'CALIBRATION_PERIOD': '2004-01-01, 2010-12-31',
    'EVALUATION_PERIOD': '2011-01-01, 2018-12-31',
    'SPINUP_PERIOD': '2004-01-01, 2005-12-31',
    'STATION_ID': '05BB001',                     # WSC streamflow station
    'DOWNLOAD_WSC_DATA': True
}

config_dict.update(config_updates)

# Add experiment metadata for traceability
config_dict['NOTEBOOK_CREATION_TIME'] = datetime.now().isoformat()
config_dict['NOTEBOOK_CREATOR'] = 'CONFLUENCE_Tutorial_02a'
config_dict['SCALING_TRANSITION'] = 'Point-scale to basin-scale lumped modeling'

# Save configuration
temp_config_path = CONFLUENCE_CODE_DIR / '0_config_files' / 'config_basin_notebook.yaml'
with open(temp_config_path, 'w') as f:
    yaml.dump(config_dict, f, default_flow_style=False, sort_keys=False)

print(f"✅ Configuration saved: {temp_config_path}")

# =============================================================================
# SYSTEM INITIALIZATION AND PROJECT STRUCTURE
# =============================================================================

print("\n🏗️  Initializing CONFLUENCE for Basin-Scale Modeling...")

# Initialize CONFLUENCE with basin configuration
confluence = CONFLUENCE(temp_config_path)

print(f"✅ System initialized with {len(confluence.managers)} managers")

# Create project structure
print(f"\n📁 Creating basin-scale project structure for {config_dict['DOMAIN_NAME']}...")
project_dir = confluence.managers['project'].setup_project()
pour_point_path = confluence.managers['project'].create_pour_point()

print(f"✅ Project directory: {project_dir}")
print(f"✅ Pour point created: {pour_point_path}")

print(f"\n🚀 Basin-scale setup complete - Ready for watershed delineation and streamflow modeling!")
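
The configuration above stores the pour point as a `'lat/lon'` string and each period as a `'start, end'` string. A typo in either only surfaces later in the workflow, so it can help to check them up front. The following is a minimal sketch of such a check; `parse_pour_point` and `parse_period` are hypothetical helpers written for this illustration and are not part of CONFLUENCE.

```python
from datetime import datetime
from typing import Tuple


def parse_pour_point(coords: str) -> Tuple[float, float]:
    """Split a 'lat/lon' string into floats and check the valid ranges."""
    lat_str, lon_str = coords.split('/')
    lat, lon = float(lat_str), float(lon_str)
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"Pour point out of range: {coords}")
    return lat, lon


def parse_period(period: str) -> Tuple[datetime, datetime]:
    """Split a 'YYYY-MM-DD, YYYY-MM-DD' string into start and end dates."""
    start_str, end_str = (part.strip() for part in period.split(','))
    start = datetime.strptime(start_str, '%Y-%m-%d')
    end = datetime.strptime(end_str, '%Y-%m-%d')
    if end <= start:
        raise ValueError(f"Period ends before it starts: {period}")
    return start, end


# Values copied from the configuration dictionary in this step
lat, lon = parse_pour_point('51.1722/-115.5717')
cal_start, cal_end = parse_period('2004-01-01, 2010-12-31')
eval_start, eval_end = parse_period('2011-01-01, 2018-12-31')

print(f"Pour point: lat={lat}, lon={lon}")
print(f"Calibration ends before evaluation starts: {cal_end < eval_start}")
```

Running a check like this before `CONFLUENCE(temp_config_path)` keeps configuration mistakes separate from errors raised by the workflow itself.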

## Step 2: Basin Representation and Spatial Discretization Fundamentals

The transition from point-scale to basin-scale modeling requires fundamental decisions about how to represent spatial heterogeneity within the watershed. Unlike point-scale modeling where we assume uniform conditions, basin-scale modeling must address the challenge of capturing spatial variability while maintaining computational tractability.

### Scientific Context: Basin Representation Philosophy

**Spatial Heterogeneity Challenges:**

- Elevation Gradients: Temperature lapse rates, precipitation patterns, snow line dynamics
- Vegetation Patterns: Forest vs alpine zones affecting evapotranspiration and interception
- Soil Variability: Infiltration, storage capacity, and drainage characteristics
- Topographic Effects: Slope, aspect, and drainage network configuration
- Climate Gradients: Orographic precipitation, temperature inversions, wind patterns

**Representation Strategies:**

- Lumped Approach: Single computational unit with averaged characteristics
- Semi-Distributed: Multiple units based on similarity (elevation bands, soil types, land cover)
- Fully Distributed: Grid-based representation with explicit spatial patterns

The Grouped Response Unit (GRU) concept provides flexible spatial discretization, allowing users to choose the appropriate level of complexity for their scientific objectives and computational constraints.
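
To make "averaged characteristics" concrete, the sketch below collapses several sub-units into one lumped value by area weighting. The elevation bands and their areas are made-up numbers chosen for illustration; they are not Bow River data, and this is not CONFLUENCE's internal aggregation code.

```python
import numpy as np

# Hypothetical elevation bands: area (km²) and mean elevation (m) of each band
band_area_km2 = np.array([400.0, 900.0, 700.0, 200.0])
band_elev_m = np.array([1500.0, 2000.0, 2500.0, 3000.0])

# Each band contributes in proportion to its share of the total area
weights = band_area_km2 / band_area_km2.sum()
lumped_elev_m = float(np.sum(weights * band_elev_m))

print(f"Band weights: {np.round(weights, 3)}")
print(f"Lumped (area-weighted) elevation: {lumped_elev_m:.1f} m")
```

A semi-distributed setup would keep the four bands as separate units. The lumped approach keeps only the single weighted value, which is what makes it fast and also what discards the within-basin gradients.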

In [None]:
# =============================================================================
# STEP 2: BASIN REPRESENTATION AND SPATIAL DISCRETIZATION
# =============================================================================

print("=== Step 2: Basin Representation and Spatial Discretization ===")

# Update discretization method to GRUs for this demonstration
config_dict['DOMAIN_DISCRETIZATION'] = 'GRUs'

# Save updated configuration
with open(temp_config_path, 'w') as f:
    yaml.dump(config_dict, f, default_flow_style=False, sort_keys=False)

# Reinitialize with updated config
confluence = CONFLUENCE(temp_config_path)

# Execute attribute acquisition
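# Attribute acquisition is commented out for this demonstration;
# uncomment the call below if the geospatial attributes are not yet in the project directory
print(f"\n⬇️  Geospatial attribute acquisition (skipped in this demonstration)...")
# confluence.managers['data'].acquire_attributes()
print("ℹ️  Attribute acquisition skipped")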

print(f"\n⚙️  Executing watershed delineation...")
watershed_path = confluence.managers['domain'].define_domain()
print("✅ Watershed delineation complete")

print(f"\n🔧 Executing domain discretization...")
confluence.managers['domain'].discretize_domain()
hru_path = str(Path(config_dict['CONFLUENCE_DATA_DIR']) / f"domain_{config_dict['DOMAIN_NAME']}" / 'shapefiles' / 'catchment' / f"{config_dict['DOMAIN_NAME']}_HRUs_{config_dict['DOMAIN_DISCRETIZATION']}.shp")
print("✅ Domain discretization complete")

# =============================================================================
# VISUALIZATION AND ANALYSIS OF BASIN REPRESENTATION
# =============================================================================

print(f"\n📊 Analyzing Created Basin Representation...")

print(watershed_path)
print(hru_path)
print(pour_point_path)

# Load spatial data
watershed_gdf = gpd.read_file(str(watershed_path[1]))
hru_gdf = gpd.read_file(hru_path)
pour_point_gdf = gpd.read_file(pour_point_path)

print(f"\n📋 Basin Characteristics:")
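# Areas of lon/lat geometries come out in deg², not m², so reproject to an
# equal-area CRS first when the shapefile uses a geographic CRS
area_gdf = watershed_gdf
if watershed_gdf.crs is not None and watershed_gdf.crs.is_geographic:
    area_gdf = watershed_gdf.to_crs(epsg=6933)  # EASE-Grid 2.0 Global (equal-area)
total_area_m2 = area_gdf.geometry.area.sum()
total_area_km2 = total_area_m2 / 1e6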
print(f"   Watershed area: {total_area_km2:.1f} km²")
print(f"   Number of GRUs: {len(hru_gdf)}")
print(f"   Average GRU size: {total_area_km2/len(hru_gdf):.1f} km²")

# Display HRU characteristics if available
if 'elevation' in hru_gdf.columns:
    print(f"   Elevation range: {hru_gdf['elevation'].min():.0f}m to {hru_gdf['elevation'].max():.0f}m")
if 'slope' in hru_gdf.columns:
    print(f"   Slope range: {hru_gdf['slope'].min():.1f}° to {hru_gdf['slope'].max():.1f}°")

# Create comprehensive visualization
print(f"\n🗺️  Creating basin representation visualization...")

fig, axes = plt.subplots(1, 2, figsize=(16, 8))

# Left plot: Watershed boundary and pour point
ax1 = axes[0]
watershed_gdf.plot(ax=ax1, facecolor='lightblue', edgecolor='navy', 
                  linewidth=2, alpha=0.7)
pour_point_gdf.plot(ax=ax1, color='red', markersize=100, marker='o',
                   edgecolor='white', linewidth=2, zorder=5)

ax1.set_title('Delineated Watershed Boundary', fontsize=14, fontweight='bold')
ax1.set_xlabel('Longitude', fontsize=12)
ax1.set_ylabel('Latitude', fontsize=12)
ax1.grid(True, alpha=0.3)

# Add area annotation
ax1.text(0.02, 0.98, f'Area: {total_area_km2:.1f} km²\nPour Point: WSC {config_dict["STATION_ID"]}',
         transform=ax1.transAxes, fontsize=10, verticalalignment='top',
         bbox=dict(facecolor='white', alpha=0.8, boxstyle='round,pad=0.3'))

# Right plot: HRU discretization
ax2 = axes[1]

# Color HRUs by elevation if available, otherwise by area
if 'elevation' in hru_gdf.columns:
    hru_gdf.plot(ax=ax2, column='elevation', cmap='terrain', 
                edgecolor='black', linewidth=0.5, legend=True)
    colorbar_label = 'Elevation (m)'
else:
    hru_gdf.plot(ax=ax2, column=hru_gdf.geometry.area, cmap='viridis',
                edgecolor='black', linewidth=0.5, legend=True) 
    colorbar_label = 'Area (deg²)'

pour_point_gdf.plot(ax=ax2, color='red', markersize=100, marker='o',
                   edgecolor='white', linewidth=2, zorder=5)

ax2.set_title(f'GRU Discretization ({len(hru_gdf)} units)', fontsize=14, fontweight='bold')
ax2.set_xlabel('Longitude', fontsize=12)
ax2.set_ylabel('Latitude', fontsize=12)
ax2.grid(True, alpha=0.3)

# Add discretization info
ax2.text(0.02, 0.98, f'GRUs: {len(hru_gdf)}\nAvg. size: {total_area_km2/len(hru_gdf):.1f} km²',
         transform=ax2.transAxes, fontsize=10, verticalalignment='top',
         bbox=dict(facecolor='white', alpha=0.8, boxstyle='round,pad=0.3'))

plt.suptitle(f'Bow River at Banff: Basin Representation', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()


## Step 3: Data Pipeline for Basin-Scale Streamflow Modeling

The same model-agnostic preprocessing framework now scales from point validation to basin-scale streamflow simulation. The core philosophy (standardized, quality-controlled data products) remains unchanged, but the spatial context shifts from single locations to integrated watershed responses.

### Data Pipeline Scaling: Point → Basin

- Forcing Data: Same ERA5 global data, now basin-averaged across watershed
- Validation Target: Streamflow hydrographs vs local states (SWE, SM, LE)
- Spatial Processing: Watershed-scale remapping vs single-point extraction
- Temporal Integration: Daily streamflow vs sub-daily energy cycles
- Process Focus: Integrated water balance with routing vs isolated vertical processes

The same CONFLUENCE preprocessing pipeline handles both scales seamlessly, demonstrating the framework's scalability while maintaining data quality and reproducibility standards.
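
The shift from sub-daily cycles to daily streamflow is a temporal aggregation. As a small sketch of that step, the example below averages a synthetic hourly series to daily means with pandas. The series is invented for illustration and stands in for hourly model output.

```python
import numpy as np
import pandas as pd

# Synthetic hourly series covering two full days (values 0, 1, ..., 47)
hours = pd.date_range('2004-01-01 00:00', periods=48, freq=pd.Timedelta(hours=1))
hourly_q = pd.Series(np.arange(48, dtype=float), index=hours)

# Daily mean: each calendar day averages its 24 hourly values
daily_q = hourly_q.resample('D').mean()

print(daily_q)
```

The first day averages the values 0 to 23 and the second day 24 to 47, so both observed and simulated series end up on the same daily time step before they are compared.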

In [None]:
# =============================================================================
# STEP 3: STREAMLINED DATA PIPELINE FOR BASIN-SCALE STREAMFLOW MODELING
# =============================================================================

print(f"\n🌊 Processing Streamflow Observations for Basin Outlet...")
print(f"   Station: WSC {config_dict['STATION_ID']} (Bow River at Banff)")

# Execute streamflow data processing
print(f"\n📥 Processing WSC streamflow observations...")
confluence.managers['data'].process_observed_data()
print("✅ Streamflow data processing complete")

print(f"\n🌦️  Acquiring Basin-Averaged Meteorological Forcing...")
print(f"   Watershed area: ~{total_area_km2:.1f} km² (Bow River at Banff)")
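# Elevation is only reported when the discretization step wrote it to the shapefile
if 'elevation' in hru_gdf.columns:
    print(f"   Elevation range: {hru_gdf['elevation'].min():.0f}m to {hru_gdf['elevation'].max():.0f}m")

# Forcing acquisition is commented out for this demonstration;
# uncomment the call below to download the forcing data
print(f"\n⬇️  Basin-scale forcing acquisition (skipped in this demonstration)...")
# confluence.managers['data'].acquire_forcings()
print("ℹ️  Forcing acquisition skipped")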

# Execute model-agnostic preprocessing
print(f"\n⚙️  Executing basin-scale model-agnostic preprocessing...")
confluence.managers['data'].run_model_agnostic_preprocessing()
print("✅ Model-agnostic preprocessing complete")

print(f"\n🌊 SUMMA + mizuRoute Configuration for Basin Streamflow...")
print(f"   Hydrological model: {config_dict['HYDROLOGICAL_MODEL']} (process-based)")
print(f"   Routing model: {config_dict.get('ROUTING_MODEL', 'not set')} (streamflow routing)")

# Execute model-specific preprocessing
print(f"\n🔧 Executing SUMMA + mizuRoute preprocessing...")
confluence.managers['model'].preprocess_models()
print("✅ Basin-scale model configuration complete")

## Step 4: Streamlined Basin-Scale Model Execution

The same SUMMA process-based physics now scales from point validation to integrated basin simulation, but with the critical addition of streamflow routing. This represents a fundamental modeling advancement: from isolated vertical processes to coupled vertical-horizontal water transport that generates streamflow at the basin outlet.

### Model Execution Scaling: Point → Basin

- Spatial Integration: Single HRU → Multiple GRUs with routing connectivity
- Process Coupling: Vertical water balance → Vertical + horizontal flow routing
- Output Target: Local states → Streamflow hydrograph at outlet
- Temporal Integration: Sub-daily energy cycles → Daily streamflow generation
- Validation Shift: Direct state comparison → Integrated basin response

The same workflow orchestration ensures robust execution, while mizuRoute routing transforms distributed runoff into simulated streamflow at the basin outlet, the quantity that water resources management relies on.
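
For a single lumped unit, the link between basin-average runoff and outlet discharge is a unit conversion: a runoff rate (depth per time) multiplied by the basin area gives a volume per time. The sketch below assumes runoff in m/s and area in m². The 1.5 mm/day runoff value is invented, and `runoff_to_discharge` is a hypothetical helper, not a SUMMA or mizuRoute function. Check the `units` attribute of your own model output before relying on this.

```python
def runoff_to_discharge(runoff_m_per_s: float, area_m2: float) -> float:
    """Convert a basin-average runoff rate (m/s) to discharge (m³/s)."""
    return runoff_m_per_s * area_m2


# Approximate Bow River at Banff drainage area (~2,210 km²) in m²
area_m2 = 2210.0 * 1e6

# Illustrative runoff of 1.5 mm/day, converted to m/s
runoff_mm_per_day = 1.5
runoff_m_per_s = runoff_mm_per_day / 1000.0 / 86400.0

discharge_cms = runoff_to_discharge(runoff_m_per_s, area_m2)
print(f"{runoff_mm_per_day} mm/day over {area_m2 / 1e6:.0f} km² = {discharge_cms:.1f} m³/s")
```

This direct scaling ignores travel time through the river network, which is the part a routing model adds.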

In [None]:
# =============================================================================
# STEP 4: STREAMLINED BASIN-SCALE MODEL EXECUTION
# =============================================================================

# Execute the integrated model system
print(f"\n🏃‍♂️ Running SUMMA + mizuRoute basin simulation...")
confluence.managers['model'].run_models()
print("✅ Basin-scale integrated simulation complete")

## Step 5: Streamflow Evaluation and Basin Performance Assessment

The same CONFLUENCE evaluation framework now transitions from point-scale validation to basin-scale streamflow assessment. This represents a fundamental shift in validation philosophy: from direct process comparison (SWE, SM, LE) to integrated response evaluation where all upstream processes collectively generate the streamflow signal at the basin outlet.

### Evaluation Framework Transition: Point → Basin

- Validation Target: Local states (SWE, soil moisture, energy fluxes) → Streamflow hydrograph
- Process Integration: Direct measurement comparison → Emergent watershed response
- Temporal Patterns: Sub-daily cycles → Seasonal flow regimes and flood events
- Performance Metrics: State variable accuracy → Hydrologic signatures and timing
- Scientific Interpretation: Process physics → Water balance closure and prediction skill

The same evaluation infrastructure seamlessly handles this transition, demonstrating CONFLUENCE's versatility across validation scales while maintaining rigorous performance assessment standards.
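
Two metrics used throughout streamflow evaluation are the Nash-Sutcliffe Efficiency (NSE) and the Kling-Gupta Efficiency (KGE). The sketch below writes both as standalone functions and applies them to a small synthetic series, to show how each responds to a perfect simulation, a mean-only simulation, and a 10% overestimate. The function names and flow values are illustrative only.

```python
import numpy as np


def nse(obs: np.ndarray, sim: np.ndarray) -> float:
    """Nash-Sutcliffe Efficiency: 1 is perfect, 0 matches the observed mean."""
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))


def kge(obs: np.ndarray, sim: np.ndarray) -> float:
    """Kling-Gupta Efficiency from correlation, variability ratio, and bias ratio."""
    r = np.corrcoef(obs, sim)[0, 1]      # linear correlation
    alpha = sim.std() / obs.std()        # variability ratio
    beta = sim.mean() / obs.mean()       # bias ratio
    return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))


# Synthetic daily flows (m³/s), for illustration only
obs = np.array([10.0, 20.0, 40.0, 30.0, 15.0])
sim_mean_only = np.full_like(obs, obs.mean())  # always predicts the observed mean
sim_high = obs * 1.1                           # consistent 10% overestimate

print(f"Perfect simulation:  NSE={nse(obs, obs):.3f}, KGE={kge(obs, obs):.3f}")
print(f"Mean-only simulation: NSE={nse(obs, sim_mean_only):.3f}")
print(f"10% overestimate:    NSE={nse(obs, sim_high):.3f}, KGE={kge(obs, sim_high):.3f}")
```

A uniform 10% overestimate leaves the correlation at 1 but raises both the variability and bias ratios to 1.1. KGE makes those separate contributions visible, whereas NSE folds them into a single squared-error term.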

In [None]:
# =============================================================================
# STEP 5: STREAMLINED STREAMFLOW EVALUATION AND BASIN PERFORMANCE ASSESSMENT
# =============================================================================

print(f"\n🌊 Loading Streamflow Simulation and Observations...")

# Load observed streamflow data
obs_path = confluence.project_dir / "observations" / "streamflow" / "preprocessed" / f"{config_dict['DOMAIN_NAME']}_streamflow_processed.csv"

if obs_path.exists():
    obs_df = pd.read_csv(obs_path, parse_dates=['datetime'])
    obs_df.set_index('datetime', inplace=True)
    
    print(f"✅ WSC observations loaded")
    print(f"   Station: {config_dict['STATION_ID']} (Bow River at Banff)")
    print(f"   Period: {obs_df.index.min()} to {obs_df.index.max()}")
    print(f"   Flow range: {obs_df['discharge_cms'].min():.1f} to {obs_df['discharge_cms'].max():.1f} m³/s")
else:
    print(f"⚠️  Observed streamflow not found at {obs_path}")
    obs_df = None

# Load simulated streamflow from mizuRoute
summa_dir = confluence.project_dir / "simulations" / config_dict['EXPERIMENT_ID'] / "SUMMA"
routing_dir = confluence.project_dir / "simulations" / config_dict['EXPERIMENT_ID'] / "mizuRoute"
routing_files = list(routing_dir.glob("*.nc"))

if routing_files:
    # Load mizuRoute output
    routing_ds = xr.open_dataset(routing_files[0])
    
    # Extract streamflow variable (typically IRFroutedRunoff)
    if 'IRFroutedRunoff' in routing_ds.data_vars:
        sim_streamflow = routing_ds['IRFroutedRunoff']
        
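        # Convert to pandas for easier analysis
        # (this is a DataFrame when the variable has a segment dimension besides time)
        sim_df = sim_streamflow.to_pandas()

        print(f"✅ mizuRoute simulation loaded")
        print(f"   Period: {sim_df.index.min()} to {sim_df.index.max()}")
        # Reduce over all dimensions to scalars so the format spec works for 1-D and 2-D output
        print(f"   Flow range: {float(sim_streamflow.min()):.1f} to {float(sim_streamflow.max()):.1f} m³/s")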
        
        routing_ds.close()
    else:
        print(f"⚠️  Streamflow variable not found in mizuRoute output")
        print(f"   Available variables: {list(routing_ds.data_vars)}")
        sim_df = None
        routing_ds.close()
else:
    print(f"⚠️  mizuRoute output files not found in {routing_dir}; falling back to SUMMA output in {summa_dir}")
    summa_files = sorted(summa_dir.glob("*_timestep.nc"))
    if summa_files:
        sim_ds = xr.open_dataset(summa_files[0])
        shp_file = gpd.read_file(str(confluence.project_dir / "shapefiles" / "catchment" / f"{config_dict['DOMAIN_NAME']}_HRUs_{config_dict['DOMAIN_DISCRETIZATION']}.shp"))
        shp_area = shp_file['GRU_area'].values[0]  # GRU area, assumed to be in m²
        # Runoff rate (assumed m/s) × area (m²) → discharge (m³/s)
        sim_streamflow = sim_ds['averageRoutedRunoff'] * shp_area
        sim_df = sim_streamflow.to_pandas()
        print(f"✅ SUMMA simulation loaded")
        sim_ds.close()
    else:
        # Without any model output the evaluation below is skipped
        print(f"⚠️  No SUMMA output files found in {summa_dir}")
        sim_df = None


# =============================================================================
# STREAMFLOW PERFORMANCE EVALUATION
# =============================================================================

if obs_df is not None and sim_df is not None:
    print(f"\n📊 Streamflow Performance Assessment...")
    
    # Align data to common period
    start_date = max(obs_df.index.min(), sim_df.index.min())
    end_date = min(obs_df.index.max(), sim_df.index.max())
    
    # Skip initial spinup period
    start_date = start_date + pd.DateOffset(months=6)
    
    print(f"   Evaluation period: {start_date} to {end_date}")
    print(f"   Duration: {(end_date - start_date).days} days")
    
    # Filter to common period and resample to daily
    obs_daily = obs_df['discharge_cms'].resample('D').mean().loc[start_date:end_date]
    sim_daily = sim_df.resample('D').mean().loc[start_date:end_date]
    
    # Ensure sim_daily is a Series (in case it's a DataFrame with one column)
    if isinstance(sim_daily, pd.DataFrame):
        if sim_daily.shape[1] == 1:
            sim_daily = sim_daily.iloc[:, 0]  # Take the first (and likely only) column
        else:
            print(f"⚠️  sim_daily has {sim_daily.shape[1]} columns. Using the first column.")
            print(f"   Available columns: {list(sim_daily.columns)}")
            sim_daily = sim_daily.iloc[:, 0]
    
    # Remove any remaining NaN values
    valid_mask = ~(obs_daily.isna() | sim_daily.isna())
    obs_valid = obs_daily[valid_mask]
    sim_valid = sim_daily[valid_mask]    
    print(f"   Valid paired observations: {len(obs_valid)} days")
    
    # Calculate comprehensive performance metrics
    print(f"\n📈 Streamflow Performance Metrics:")
    
    # Basic statistics
    rmse = np.sqrt(((obs_valid - sim_valid) ** 2).mean())
    bias = (sim_valid - obs_valid).mean()
    mae = np.abs(obs_valid - sim_valid).mean()
    
    # Relative metrics
    pbias = 100 * bias / obs_valid.mean()
    
    # Nash-Sutcliffe Efficiency
    nse = 1 - ((obs_valid - sim_valid) ** 2).sum() / ((obs_valid - obs_valid.mean()) ** 2).sum()
    
    # Kling-Gupta Efficiency  
    r = obs_valid.corr(sim_valid)
    alpha = sim_valid.std() / obs_valid.std()
    beta = sim_valid.mean() / obs_valid.mean()
    kge = 1 - np.sqrt((r - 1)**2 + (alpha - 1)**2 + (beta - 1)**2)
    
    # Display performance metrics
    print(f"   📊 RMSE: {rmse:.2f} m³/s")
    print(f"   📊 Bias: {bias:+.2f} m³/s ({pbias:+.1f}%)")
    print(f"   📊 MAE: {mae:.2f} m³/s")
    print(f"   📊 Correlation (r): {r:.3f}")
    print(f"   📊 Nash-Sutcliffe (NSE): {nse:.3f}")
    print(f"   📊 Kling-Gupta (KGE): {kge:.3f}")
    
    # Hydrologic signature analysis
    print(f"\n🌊 Hydrologic Signature Analysis:")
    
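    # Flow statistics as non-exceedance percentiles.
    # Note: hydrologists often write "Q95" for the flow exceeded 95% of the time
    # (a low flow), so the printed labels name the percentiles explicitly.
    obs_q95 = obs_valid.quantile(0.95)  # High flows
    sim_q95 = sim_valid.quantile(0.95)
    obs_q05 = obs_valid.quantile(0.05)  # Low flows
    sim_q05 = sim_valid.quantile(0.05)

    print(f"   High flows (95th percentile): Obs={obs_q95:.1f}, Sim={sim_q95:.1f} m³/s")
    print(f"   Low flows (5th percentile): Obs={obs_q05:.1f}, Sim={sim_q05:.1f} m³/s")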
    
    # Seasonal timing
    obs_monthly = obs_valid.groupby(obs_valid.index.month).mean()
    sim_monthly = sim_valid.groupby(sim_valid.index.month).mean()
    peak_month_obs = obs_monthly.idxmax()
    peak_month_sim = sim_monthly.idxmax()
    
    print(f"   Peak flow timing: Obs=Month {peak_month_obs}, Sim=Month {peak_month_sim}")
    
    # =============================================================================
    # STREAMFLOW VISUALIZATION
    # =============================================================================
    
    print(f"\n📈 Creating comprehensive streamflow evaluation...")
    
    fig, axes = plt.subplots(2, 2, figsize=(16, 12))
    
    # Time series comparison (top left)
    ax1 = axes[0, 0]
    ax1.plot(obs_valid.index, obs_valid.values, 'b-', 
             label='WSC Observed', linewidth=1.5, alpha=0.8)
    ax1.plot(sim_valid.index, sim_valid.values, 'r-', 
             label='SUMMA + mizuRoute', linewidth=1.5, alpha=0.8)
    
    ax1.set_ylabel('Discharge (m³/s)', fontsize=11)
    ax1.set_title('Streamflow Time Series', fontweight='bold')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    
    # Add performance metrics
    metrics_text = f'NSE: {nse:.3f}\nKGE: {kge:.3f}\nBias: {pbias:+.1f}%'
    ax1.text(0.02, 0.95, metrics_text, transform=ax1.transAxes,
             bbox=dict(facecolor='white', alpha=0.8), fontsize=10, verticalalignment='top')
    
    # Scatter plot (top right)
    ax2 = axes[0, 1]
    ax2.scatter(obs_valid, sim_valid, alpha=0.5, c='blue', s=20)
    max_val = max(obs_valid.max(), sim_valid.max())
    ax2.plot([0, max_val], [0, max_val], 'k--', label='1:1 line')
    ax2.set_xlabel('Observed (m³/s)', fontsize=11)
    ax2.set_ylabel('Simulated (m³/s)', fontsize=11)
    ax2.set_title('Obs vs Sim Streamflow', fontweight='bold')
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    
    # Monthly climatology (bottom left)
    ax3 = axes[1, 0]
    months = range(1, 13)
    month_names = ['J', 'F', 'M', 'A', 'M', 'J', 'J', 'A', 'S', 'O', 'N', 'D']
    
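    # Plot against the months actually present, so a record that lacks some
    # calendar months does not raise a shape mismatch
    ax3.plot(obs_monthly.index, obs_monthly.values, 'o-', label='Observed', 
             color='blue', linewidth=2, markersize=6)
    ax3.plot(sim_monthly.index, sim_monthly.values, 'o-', label='Simulated', 
             color='red', linewidth=2, markersize=6)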
    
    ax3.set_xticks(months)
    ax3.set_xticklabels(month_names)
    ax3.set_ylabel('Mean Discharge (m³/s)', fontsize=11)
    ax3.set_title('Seasonal Flow Regime', fontweight='bold')
    ax3.legend()
    ax3.grid(True, alpha=0.3)
    
    # Flow duration curve (bottom right)
    ax4 = axes[1, 1]
    
    # Calculate exceedance probabilities
    obs_sorted = obs_valid.sort_values(ascending=False)
    sim_sorted = sim_valid.sort_values(ascending=False)
    obs_ranks = np.arange(1., len(obs_sorted) + 1) / len(obs_sorted) * 100
    sim_ranks = np.arange(1., len(sim_sorted) + 1) / len(sim_sorted) * 100
    
    ax4.semilogy(obs_ranks, obs_sorted, 'b-', label='Observed', linewidth=2)
    ax4.semilogy(sim_ranks, sim_sorted, 'r-', label='Simulated', linewidth=2)
    
    ax4.set_xlabel('Exceedance Probability (%)', fontsize=11)
    ax4.set_ylabel('Discharge (m³/s)', fontsize=11)
    ax4.set_title('Flow Duration Curve', fontweight='bold')
    ax4.legend()
    ax4.grid(True, alpha=0.3)
    
    plt.suptitle(f'Basin-Scale Streamflow Evaluation - {config_dict["DOMAIN_NAME"]}',
                 fontsize=14, fontweight='bold')
    plt.tight_layout()
    plt.show()

else:
    print("⚠️  Cannot perform streamflow evaluation - missing simulation or observation data")


**Ready to explore semi-distributed basin simulations?** → **[Tutorial 02b: Basin Scale - Semi-Distributed Watershed](./02b_basin_semi_distributed.ipynb)**