# CONFLUENCE Tutorial 02b: Basin-Scale Workflow (Bow River at Banff, Semi-Distributed)

## Introduction

This tutorial advances from lumped to semi-distributed watershed modeling. Instead of representing the basin as a single unit, we now subdivide it into multiple connected sub-basins (GRUs) that capture spatial variability while maintaining computational efficiency.

Building on Tutorial 02a's lumped approach, semi-distributed modeling adds spatial detail through automated watershed delineation that creates multiple sub-basins, stream network topology that connects GRUs through routing, and spatially-distributed characteristics that better represent elevation gradients and heterogeneous processes.

The key configuration change is `DOMAIN_DEFINITION_METHOD: 'delineate'` with a `STREAM_THRESHOLD` parameter controlling the number of sub-basins. Smaller thresholds create more GRUs (finer spatial detail) but increase computational cost.
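The effect of the threshold can be illustrated with a minimal sketch. It assumes the threshold is compared against a flow-accumulation grid, as is common in DEM-based delineation tools; the tutorial does not state the exact units of `STREAM_THRESHOLD`, and the grid below is synthetic, not derived from the Bow River DEM. The function name `stream_cell_counts` is hypothetical.

```python
import numpy as np

def stream_cell_counts(flow_accumulation, thresholds):
    """Count cells classified as 'stream' for each candidate threshold.

    A cell is treated as part of the stream network when its flow
    accumulation is at least the threshold (an assumed convention).
    """
    acc = np.asarray(flow_accumulation)
    return {int(t): int((acc >= t).sum()) for t in thresholds}

# Synthetic, heavy-tailed accumulation values: most cells drain few
# upstream cells, a handful drain very many (illustrative only).
rng = np.random.default_rng(seed=0)
synthetic_acc = rng.pareto(a=1.0, size=(200, 200)) * 50

counts = stream_cell_counts(synthetic_acc, [1000, 5000, 20000])
for threshold, n_cells in counts.items():
    print(f"threshold={threshold:>6}: {n_cells} stream cells")
```

Lower thresholds mark more cells as streams, which produces a denser network and therefore more sub-basins; the exact GRU count for a real basin depends on the DEM and the delineation tool.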

We continue with the **Bow River at Banff** watershed, now discretized into multiple GRUs connected by mizuRoute for explicit stream network routing. Compared with the lumped setup, this structure can better represent snowmelt timing, spatial climate variability, and runoff generation patterns.


## Step 1: Configuration and data reuse

We generate a semi-distributed configuration and, where available, copy attribute, forcing, and streamflow data from Tutorial 02a instead of acquiring them again.

In [None]:
# Step 1: Semi-distributed configuration with data reuse

from pathlib import Path
import yaml
import shutil
import sys
sys.path.append(str(Path("../..").resolve()))
from CONFLUENCE import CONFLUENCE

# Define directories
CONFLUENCE_CODE_DIR = Path("../..").resolve()
CONFLUENCE_DATA_DIR = Path("/path/to/CONFLUENCE_data").resolve()

# Load template
config_template = CONFLUENCE_CODE_DIR / '0_config_files' / 'config_template.yaml'
with open(config_template, 'r') as f:
    config = yaml.safe_load(f)

# === Modify for semi-distributed basin ===
config['CONFLUENCE_CODE_DIR'] = str(CONFLUENCE_CODE_DIR)
config['CONFLUENCE_DATA_DIR'] = str(CONFLUENCE_DATA_DIR)
config['DOMAIN_NAME'] = 'Bow_at_Banff_semi_distributed'
config['EXPERIMENT_ID'] = 'run_1'
config['POUR_POINT_COORDS'] = '51.1722/-115.5717'

# Key changes for semi-distributed
config['DOMAIN_DEFINITION_METHOD'] = 'delineate'  # Watershed subdivision
config['STREAM_THRESHOLD'] = 5000  # Controls number of sub-basins
config['DOMAIN_DISCRETIZATION'] = 'GRUs'

config['HYDROLOGICAL_MODEL'] = 'SUMMA'
config['ROUTING_MODEL'] = 'mizuRoute'

config['EXPERIMENT_TIME_START'] = '2011-01-01 01:00'
config['EXPERIMENT_TIME_END'] = '2018-12-31 23:00'
config['CALIBRATION_PERIOD'] = '2011-01-01, 2015-12-31'
config['EVALUATION_PERIOD'] = '2016-01-01, 2018-12-31'
config['SPINUP_PERIOD'] = '2011-01-01, 2011-12-31'
config['STATION_ID'] = '05BB001'
config['DOWNLOAD_WSC_DATA'] = True

# Save configuration
config_path = CONFLUENCE_CODE_DIR / '0_config_files' / 'config_semi_distributed.yaml'
with open(config_path, 'w') as f:
    yaml.dump(config, f, default_flow_style=False, sort_keys=False)

print(f"✅ Configuration saved: {config_path}")

# === Data reuse from Tutorial 02a ===
lumped_domain = 'Bow_at_Banff_lumped'
lumped_data_dir = CONFLUENCE_DATA_DIR / f'domain_{lumped_domain}'

def copy_with_name_adaptation(src, dst, old_name, new_name):
    """Copy directory and adapt filenames"""
    if not src.exists():
        return False
    dst.parent.mkdir(parents=True, exist_ok=True)
    if src.is_file():
        shutil.copy2(src, dst)
        return True
    shutil.copytree(src, dst, dirs_exist_ok=True)
    # Rename files containing old domain name
    for file in dst.rglob('*'):
        if file.is_file() and old_name in file.name:
            new_file = file.parent / file.name.replace(old_name, new_name)
            file.rename(new_file)
    return True

# Initialize CONFLUENCE first
confluence = CONFLUENCE(config_path)
project_dir = confluence.managers['project'].setup_project()

if lumped_data_dir.exists():
    print(f"\n📋 Reusing data from Tutorial 02a: {lumped_data_dir}")
    
    reusable_data = {
        'Elevation': lumped_data_dir / 'attributes' / 'elevation',
        'Land Cover': lumped_data_dir / 'attributes' / 'land_cover',
        'Soils': lumped_data_dir / 'attributes' / 'soils',
        'Forcing': lumped_data_dir / 'forcing' / 'raw_data',
        'Streamflow': lumped_data_dir / 'observations' / 'streamflow'
    }
    
    for data_type, src_path in reusable_data.items():
        if src_path.exists():
            rel_path = src_path.relative_to(lumped_data_dir)
            dst_path = project_dir / rel_path
            success = copy_with_name_adaptation(src_path, dst_path, lumped_domain, config['DOMAIN_NAME'])
            if success:
                print(f"   ✅ {data_type}: Copied")
        else:
            print(f"   📋 {data_type}: Not found")
else:
    print("\n⚠️  No data from Tutorial 02a found. Will acquire fresh data.")

# Create pour point
pour_point_path = confluence.managers['project'].create_pour_point()
print(f"\n✅ Project structure created at: {project_dir}")
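
Before moving on, it can help to confirm that the saved configuration really carries the semi-distributed settings. The following is a minimal sketch of such a check, not part of CONFLUENCE: the function name `check_semi_distributed_config` and the `REQUIRED_SETTINGS` table are hypothetical, while the keys and expected values are taken from the cell above.

```python
# Settings that distinguish this tutorial from the lumped setup
# (keys and values as used in the configuration cell above).
REQUIRED_SETTINGS = {
    "DOMAIN_DEFINITION_METHOD": "delineate",
    "DOMAIN_DISCRETIZATION": "GRUs",
    "ROUTING_MODEL": "mizuRoute",
}

def check_semi_distributed_config(cfg):
    """Return a list of problems; an empty list means the config looks right."""
    problems = []
    for key, expected in REQUIRED_SETTINGS.items():
        actual = cfg.get(key)
        if actual != expected:
            problems.append(f"{key}: expected {expected!r}, found {actual!r}")

    threshold = cfg.get("STREAM_THRESHOLD")
    is_number = isinstance(threshold, (int, float)) and not isinstance(threshold, bool)
    if not is_number or threshold <= 0:
        problems.append(f"STREAM_THRESHOLD: expected a positive number, found {threshold!r}")
    return problems

example_cfg = {
    "DOMAIN_DEFINITION_METHOD": "delineate",
    "DOMAIN_DISCRETIZATION": "GRUs",
    "ROUTING_MODEL": "mizuRoute",
    "STREAM_THRESHOLD": 5000,
}
print(check_semi_distributed_config(example_cfg) or "configuration looks consistent")
```

In the notebook, the same function could be applied to the dictionary returned by `yaml.safe_load` on `config_path`.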

## Step 2: Domain definition (multi-GRU)

Delineate the watershed into multiple sub-basins using stream network analysis and create connected GRUs.

### Step 2a: Attribute check

Verify DEM availability from data reuse, or acquire fresh if needed.

In [None]:
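# Step 2a: DEM availability check
dem_path = project_dir / 'attributes' / 'elevation' / 'dem'

def dem_available():
    """True when at least one GeoTIFF exists in the DEM directory."""
    return dem_path.exists() and any(dem_path.glob('*.tif'))

if not dem_available():
    print("   DEM not found, acquiring geospatial attributes...")
    # If using MAF supported HPC, uncomment the line below
    # confluence.managers['data'].acquire_attributes()

# Report the actual state: the acquisition call above is commented out by default
if dem_available():
    print("✅ DEM available")
else:
    print("⚠️  DEM still missing: uncomment acquire_attributes() above or supply a DEM manually")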

### Step 2b: Stream network delineation

Automated watershed subdivision based on stream threshold parameter.

In [None]:
# Step 2b: Stream network delineation
watershed_path = confluence.managers['domain'].define_domain()
print("✅ Stream network delineation complete")

### Step 2c: GRU discretization

Convert sub-basins to GRUs with routing connectivity.
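
Routing connectivity means that each river segment records which segment it drains into. The sketch below illustrates that idea on a toy network; the segment IDs, the dictionary layout, and the helper names are hypothetical and do not reflect the file formats that CONFLUENCE or mizuRoute actually use.

```python
def find_outlets(downstream_of):
    """Segments whose downstream ID is not itself a segment in the network."""
    return [seg for seg, down in downstream_of.items() if down not in downstream_of]

def upstream_segments(downstream_of, target):
    """All segments whose flow eventually passes through `target`."""
    upstream = set()
    for seg in downstream_of:
        visited = {seg}
        current = downstream_of[seg]
        # Walk downstream until leaving the network or revisiting a segment
        while current in downstream_of and current not in visited:
            if current == target:
                upstream.add(seg)
                break
            visited.add(current)
            current = downstream_of[current]
    return upstream

# Toy network: segments 1 and 2 join at 3; segments 3 and 4 join at 5;
# segment 5 drains to ID 0, which marks "outside the domain" here.
toy_network = {1: 3, 2: 3, 3: 5, 4: 5, 5: 0}

print("Outlet segments:", find_outlets(toy_network))
print("Upstream of segment 3:", sorted(upstream_segments(toy_network, 3)))
print("Upstream of segment 5:", sorted(upstream_segments(toy_network, 5)))
```

A single outlet segment whose upstream set covers every other segment indicates a fully connected network draining to the pour point.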

In [None]:
# Step 2c: GRU discretization
hru_path = confluence.managers['domain'].discretize_domain()
print("✅ GRU discretization complete")

### Step 2d: Network visualization

Visualize the semi-distributed structure: sub-basins and stream network.

In [None]:
# Step 2d: Network structure visualization

import geopandas as gpd
import matplotlib.pyplot as plt

# Load spatial products
basin_dir = project_dir / 'shapefiles' / 'river_basins'
network_dir = project_dir / 'shapefiles' / 'river_network'

basin_files = list(basin_dir.glob('*.shp'))
network_files = list(network_dir.glob('*.shp'))

if basin_files:
    basins_gdf = gpd.read_file(basin_files[0])
    print(f"Number of GRUs: {len(basins_gdf)}")
    
    fig, ax = plt.subplots(1, 1, figsize=(10, 10))
    basins_gdf.boundary.plot(ax=ax, color='blue', linewidth=1)
    basins_gdf.plot(ax=ax, column='GRU_ID', cmap='tab20', alpha=0.5, legend=False)
    
    if network_files:
        network_gdf = gpd.read_file(network_files[0])
        network_gdf.plot(ax=ax, color='darkblue', linewidth=2, label='Stream Network')
    
    pour_point_gdf = gpd.read_file(pour_point_path)
    pour_point_gdf.plot(ax=ax, color='red', markersize=150, marker='*', label='Pour Point')
    
    ax.set_title(f'Semi-Distributed Structure\n{len(basins_gdf)} Connected GRUs', fontweight='bold')
    ax.legend()
    ax.set_xlabel('Longitude')
    ax.set_ylabel('Latitude')
    plt.tight_layout()
    plt.show()
else:
    print("⚠️  Basin shapefiles not found")

## Step 3: Data preprocessing

Process forcing and observation data for multiple GRUs.

In [None]:
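# Step 3a: Streamflow observations
# If using MAF supported HPC, uncomment the line below
# confluence.managers['data'].process_observed_data()
# Report whether the processed file (read again in Step 5) is actually present
obs_file = (project_dir / 'observations' / 'streamflow' / 'preprocessed'
            / f"{config['DOMAIN_NAME']}_streamflow_processed.csv")
print("✅ Streamflow data available" if obs_file.exists()
      else "⚠️  Processed streamflow not found: uncomment the line above")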

In [None]:
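# Step 3b: Forcing data
# If using MAF supported HPC, uncomment the line below
# confluence.managers['data'].acquire_forcings()
# Report whether raw forcing files are actually present
forcing_dir = project_dir / 'forcing' / 'raw_data'
has_forcing = forcing_dir.exists() and any(forcing_dir.iterdir())
print("✅ Forcing data available" if has_forcing
      else "⚠️  No forcing data found: uncomment the line above")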

In [None]:
# Step 3c: Model-agnostic preprocessing
confluence.managers['data'].run_model_agnostic_preprocessing()
print("✅ Model-agnostic preprocessing complete")

## Step 4: Model execution

Configure and run SUMMA-mizuRoute with multiple connected GRUs.

In [None]:
# Step 4a: Model configuration
confluence.managers['model'].preprocess_models()
print("✅ Semi-distributed model configuration complete")

In [None]:
# Step 4b: Model execution
print(f"Running {config['HYDROLOGICAL_MODEL']} with {config['ROUTING_MODEL']} ({len(basins_gdf)} GRUs)...")
confluence.managers['model'].run_models()
print("✅ Semi-distributed simulation complete")

## Step 5: Evaluation

Compare semi-distributed results against observations and lumped baseline.

In [None]:
# Step 5: Semi-distributed evaluation

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import xarray as xr

# Load observed streamflow
obs_path = project_dir / "observations" / "streamflow" / "preprocessed" / f"{config['DOMAIN_NAME']}_streamflow_processed.csv"
obs_df = pd.read_csv(obs_path, parse_dates=['datetime'])
obs_df.set_index('datetime', inplace=True)

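# Load simulated streamflow
routing_dir = project_dir / "simulations" / config['EXPERIMENT_ID'] / "mizuRoute"
sim_files = sorted(routing_dir.glob('*_routed.nc'))
if not sim_files:
    raise FileNotFoundError(f"No routed streamflow in: {routing_dir}")

sim_ds = xr.open_dataset(sim_files[0])
routed = sim_ds['IRFroutedRunoff']
# With multiple GRUs the routed runoff holds one series per river segment.
# Assumption: the outlet is the segment with the largest mean flow. If your
# configuration names the outlet reach explicitly, select that reach instead.
seg_dims = [d for d in routed.dims if d != 'time']
if seg_dims:
    outlet = int(routed.mean('time').argmax(seg_dims[0]))
    routed = routed.isel({seg_dims[0]: outlet})
sim_df = routed.to_dataframe(name='discharge_sim')[['discharge_sim']]
sim_df.index.name = 'datetime'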

# Merge and align
eval_df = obs_df.join(sim_df, how='inner')
obs_valid = eval_df['discharge_obs'].dropna()
sim_valid = eval_df.loc[obs_valid.index, 'discharge_sim']

# Metrics
def nse(obs, sim):
    return float(1 - np.sum((obs - sim)**2) / np.sum((obs - obs.mean())**2))

def kge(obs, sim):
    r = obs.corr(sim)
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return float(1 - np.sqrt((r-1)**2 + (alpha-1)**2 + (beta-1)**2))

def pbias(obs, sim):
    return float(100 * (sim.sum() - obs.sum()) / obs.sum())

nse_val = round(nse(obs_valid, sim_valid), 3)
kge_val = round(kge(obs_valid, sim_valid), 3)
pbias_val = round(pbias(obs_valid, sim_valid), 1)

print(f"Performance Metrics:")
print(f"  NSE: {nse_val}")
print(f"  KGE: {kge_val}")
print(f"  PBIAS: {pbias_val}%")

# Visualization
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Time series
axes[0, 0].plot(obs_valid.index, obs_valid.values, 'b-', label='Observed', linewidth=1.2, alpha=0.7)
axes[0, 0].plot(sim_valid.index, sim_valid.values, 'r-', label=f'Semi-Distributed ({len(basins_gdf)} GRUs)', 
                linewidth=1.2, alpha=0.7)
axes[0, 0].set_ylabel('Discharge (m³/s)')
axes[0, 0].set_title('Semi-Distributed Streamflow')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)
axes[0, 0].text(0.02, 0.95, f"NSE: {nse_val}\nKGE: {kge_val}\nBias: {pbias_val}%\nGRUs: {len(basins_gdf)}",
                transform=axes[0, 0].transAxes, verticalalignment='top',
                bbox=dict(facecolor='white', alpha=0.8), fontsize=9)

# Scatter
axes[0, 1].scatter(obs_valid, sim_valid, alpha=0.5, s=10, c='green')
max_val = max(obs_valid.max(), sim_valid.max())
axes[0, 1].plot([0, max_val], [0, max_val], 'k--', alpha=0.5)
axes[0, 1].set_xlabel('Observed (m³/s)')
axes[0, 1].set_ylabel('Simulated (m³/s)')
axes[0, 1].set_title('Observed vs Simulated')
axes[0, 1].grid(True, alpha=0.3)

# Monthly climatology
monthly_obs = obs_valid.groupby(obs_valid.index.month).mean()
monthly_sim = sim_valid.groupby(sim_valid.index.month).mean()
month_names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
               'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
axes[1, 0].plot(monthly_obs.index, monthly_obs.values, 'b-o', label='Observed', markersize=6)
axes[1, 0].plot(monthly_sim.index, monthly_sim.values, 'r-o', label='Simulated', markersize=6)
axes[1, 0].set_xticks(range(1, 13))
axes[1, 0].set_xticklabels(month_names)
axes[1, 0].set_ylabel('Mean Discharge (m³/s)')
axes[1, 0].set_title('Seasonal Flow Regime')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# Flow duration curve
obs_sorted = obs_valid.sort_values(ascending=False)
sim_sorted = sim_valid.sort_values(ascending=False)
obs_ranks = np.arange(1., len(obs_sorted) + 1) / len(obs_sorted) * 100
sim_ranks = np.arange(1., len(sim_sorted) + 1) / len(sim_sorted) * 100
axes[1, 1].semilogy(obs_ranks, obs_sorted, 'b-', label='Observed', linewidth=2)
axes[1, 1].semilogy(sim_ranks, sim_sorted, 'r-', label='Simulated', linewidth=2)
axes[1, 1].set_xlabel('Exceedance Probability (%)')
axes[1, 1].set_ylabel('Discharge (m³/s)')
axes[1, 1].set_title('Flow Duration Curve')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

plt.suptitle(f'Semi-Distributed Evaluation: {config["DOMAIN_NAME"]} ({len(basins_gdf)} GRUs)',
             fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print("\n✅ Semi-distributed evaluation complete")
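
To build intuition for the three metrics, the sketch below re-declares the same `nse`, `kge`, and `pbias` definitions used in the evaluation cell and applies them to a small made-up series. The discharge values are illustrative only and are not Bow River data.

```python
import numpy as np
import pandas as pd

def nse(obs, sim):
    # Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean
    return float(1 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))

def kge(obs, sim):
    # Kling-Gupta efficiency from correlation, variability ratio, and bias ratio
    r = obs.corr(sim)
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return float(1 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))

def pbias(obs, sim):
    # Percent bias: positive values mean the simulation overestimates volume
    return float(100 * (sim.sum() - obs.sum()) / obs.sum())

obs = pd.Series([10.0, 20.0, 40.0, 30.0, 15.0])      # made-up discharge values
perfect = obs.copy()                                  # simulation equals observations
scaled = obs * 1.1                                    # uniform 10% overestimate
mean_only = pd.Series(obs.mean(), index=obs.index)    # constant at the observed mean

print("perfect   :", nse(obs, perfect), kge(obs, perfect), pbias(obs, perfect))
print("mean only : NSE =", nse(obs, mean_only))
print("10% high  : KGE =", round(kge(obs, scaled), 3), " PBIAS =", round(pbias(obs, scaled), 1))
```

A perfect simulation yields NSE = 1, KGE = 1, and PBIAS = 0. A constant simulation at the observed mean yields NSE = 0, which is why NSE values near zero indicate little skill. A uniform 10% overestimate keeps the correlation at 1 but gives PBIAS = 10% and KGE = 1 - sqrt(0.02), about 0.859, because both the variability ratio and the bias ratio equal 1.1.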

## Summary

This tutorial demonstrated semi-distributed watershed modeling with multiple connected GRUs. Key advances over lumped modeling include spatial representation of elevation gradients, explicit stream network routing between sub-basins, and the ability to attribute simulated processes to individual sub-basins rather than to the basin as a whole.

Achievements:
- Automated watershed subdivision using stream threshold
- Multi-GRU model configuration with routing connectivity
- Efficient data reuse from Tutorial 02a
- Spatially-distributed process simulation

The semi-distributed approach balances spatial detail with computational efficiency, providing the foundation for fully distributed modeling applications.

### Next: Elevation-Based Distributed Modeling

**Ready for maximum spatial detail?** → **[Tutorial 02c: Elevation-Based Distributed Watershed](./02c_basin_distributed.ipynb)**