# CONFLUENCE Tutorial: Distributed Basin Workflow with Delineation

This notebook demonstrates the distributed modeling approach using the delineation method. We'll use the same Bow River at Banff location but create a distributed model with multiple GRUs (Grouped Response Units).

## Key Differences from Lumped Model

- **Domain Method**: `delineate` instead of `lumped`
- **Stream Threshold**: 5000 (sets how dense the stream network is; lower values create more sub-basins)
- **Multiple GRUs**: Each sub-basin becomes a GRU
- **Routing**: mizuRoute connects the GRUs
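To make the role of the stream threshold concrete, here is a minimal sketch. It assumes the threshold is applied to a flow-accumulation grid (the number of upstream cells draining through each cell), as is typical for DEM-based delineation. The grid values and the `count_stream_cells` helper are illustrative and not part of CONFLUENCE.

```python
# Illustrative flow-accumulation grid: each value is the number of
# upstream cells draining through that cell (made-up numbers).
flow_accumulation = [
    [1,     1,    1, 1],
    [2,  6000,    3, 1],
    [1,  9000, 2500, 1],
    [1, 12000,    1, 1],
]

def count_stream_cells(grid, threshold):
    """Count cells whose accumulation meets or exceeds the threshold."""
    return sum(1 for row in grid for value in row if value >= threshold)

stream_cell_counts = {
    threshold: count_stream_cells(flow_accumulation, threshold)
    for threshold in (2000, 5000, 10000)
}
for threshold, n_cells in stream_cell_counts.items():
    print(f"threshold={threshold:>5}: {n_cells} stream cells")
```

Raising the threshold leaves fewer stream cells, and therefore fewer stream segments and sub-basins; lowering it does the opposite.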

## Learning Objectives

1. Understand watershed delineation with stream networks
2. Create a distributed model with multiple GRUs
3. Compare with the lumped approach from Tutorial 1

## 1. Setup and Import Libraries

In [None]:
# Import required libraries
import sys
import os
from pathlib import Path
import yaml
import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd
from datetime import datetime
import numpy as np

# Add CONFLUENCE to path
confluence_path = Path('../').resolve()
sys.path.append(str(confluence_path))

# Import CONFLUENCE
from CONFLUENCE import CONFLUENCE

# Set up plotting style
plt.style.use('default')
%matplotlib inline

## 2. Create Configuration for Distributed Model

We'll modify the configuration from Tutorial 1 to create a distributed model.

In [None]:
# Load the template configuration
config_template_path = confluence_path / '0_config_files' / 'config_template.yaml'
with open(config_template_path, 'r') as f:
    config = yaml.safe_load(f)

# Modify for distributed delineation
config['DOMAIN_NAME'] = 'Bow_at_Banff_distributed'
config['EXPERIMENT_ID'] = 'distributed_tutorial'
config['DOMAIN_DEFINITION_METHOD'] = 'delineate'  # Changed from 'lumped'
config['STREAM_THRESHOLD'] = 5000  # Controls stream density: higher values give fewer, larger sub-basins
config['DOMAIN_DISCRETIZATION'] = 'GRUs'  # Keep as GRUs
config['SPATIAL_MODE'] = 'Distributed'  # Changed from 'Lumped'

# Use just one parallel process for this tutorial
config['MPI_PROCESSES'] = 1

# Save the modified configuration
distributed_config_path = Path('./bow_distributed_config.yaml')
with open(distributed_config_path, 'w') as f:
    yaml.dump(config, f, default_flow_style=False)

print("=== Modified Configuration for Distributed Model ===")
print(f"Domain Name: {config['DOMAIN_NAME']}")
print(f"Domain Method: {config['DOMAIN_DEFINITION_METHOD']}")
print(f"Stream Threshold: {config['STREAM_THRESHOLD']}")
print(f"Spatial Mode: {config['SPATIAL_MODE']}")
print(f"\nConfiguration saved to: {distributed_config_path}")
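A quick way to confirm which settings differ from the lumped setup is to diff two configuration dictionaries. The sketch below uses plain dictionaries so it runs on its own. `diff_config` is a hypothetical helper, and the lumped values are the ones noted in the comments of the cell above; only three keys are shown.

```python
def diff_config(base, modified):
    """Return {key: (old, new)} for every key whose value differs."""
    keys = sorted(set(base) | set(modified))
    return {
        key: (base.get(key), modified.get(key))
        for key in keys
        if base.get(key) != modified.get(key)
    }

# Lumped values as noted in the comments of the configuration cell
lumped_settings = {
    'DOMAIN_DEFINITION_METHOD': 'lumped',
    'SPATIAL_MODE': 'Lumped',
    'DOMAIN_DISCRETIZATION': 'GRUs',
}
distributed_settings = {
    'DOMAIN_DEFINITION_METHOD': 'delineate',
    'SPATIAL_MODE': 'Distributed',
    'DOMAIN_DISCRETIZATION': 'GRUs',
}

changes = diff_config(lumped_settings, distributed_settings)
for key, (old, new) in changes.items():
    print(f"{key}: {old!r} -> {new!r}")
```

In the notebook, the same helper could be applied to a copy of the template dictionary taken before modification and to `config`.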

## 3. Initialize CONFLUENCE with Distributed Configuration

In [None]:
# Initialize CONFLUENCE with distributed configuration
confluence = CONFLUENCE(distributed_config_path)

print("=== CONFLUENCE Initialized for Distributed Model ===")
print(f"Project Directory: {confluence.project_dir}")
print(f"Data Directory: {confluence.data_dir}")

## 4. Visualize Lumped vs Distributed Concept

In [None]:
# Create visualization comparing lumped vs distributed
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

# Lumped representation
ax1.add_patch(plt.Rectangle((0, 0), 1, 1, fill=True, color='lightblue', alpha=0.7))
ax1.text(0.5, 0.5, 'Entire Basin\n(1 Unit)', ha='center', va='center', fontsize=14, fontweight='bold')
ax1.arrow(0.5, 0.05, 0, -0.1, head_width=0.03, head_length=0.02, fc='blue', ec='blue')
ax1.text(0.5, -0.08, 'Single Output', ha='center', fontsize=10)
ax1.set_xlim(-0.1, 1.1)
ax1.set_ylim(-0.15, 1.1)
ax1.set_title('Lumped Model (Tutorial 1)', fontsize=16, fontweight='bold')
ax1.axis('off')

# Distributed representation
# Create sub-basins with river network
colors = ['lightblue', 'lightgreen', 'lightcoral', 'lightyellow']
positions = [(0.2, 0.7), (0.7, 0.7), (0.2, 0.3), (0.5, 0.1)]
labels = ['GRU 1', 'GRU 2', 'GRU 3', 'GRU 4']

for pos, color, label in zip(positions, colors, labels):
    # Use facecolor rather than color so the black edgecolor is not overridden
    circle = plt.Circle(pos, 0.15, facecolor=color, edgecolor='black', alpha=0.7)
    ax2.add_patch(circle)
    ax2.text(pos[0], pos[1], label, ha='center', va='center', fontsize=10, fontweight='bold')

# Draw river network
ax2.plot([0.2, 0.2, 0.5], [0.7, 0.3, 0.1], 'b-', linewidth=3)  # Main stem
ax2.plot([0.7, 0.5], [0.7, 0.1], 'b-', linewidth=2)  # Tributary
ax2.arrow(0.5, 0.1, 0, -0.1, head_width=0.03, head_length=0.02, fc='blue', ec='blue')
ax2.text(0.5, -0.07, 'Routed Output', ha='center', va='top', fontsize=10)  # Placed below the GRU 4 circle

ax2.set_xlim(-0.1, 1.1)
ax2.set_ylim(-0.15, 1.1)
ax2.set_title('Distributed Model (This Tutorial)', fontsize=16, fontweight='bold')
ax2.axis('off')

plt.tight_layout()
plt.show()
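The schematic's river network can also be written down as a downstream-ID table, which is the kind of topology a routing model needs. The sketch below sums local runoff down that network to the outlet. It ignores travel time and channel storage, so it is a plain accumulation and not mizuRoute's routing scheme; the runoff values are made up for illustration.

```python
# Downstream link for each GRU in the schematic (None marks the outlet)
downstream = {'GRU 1': 'GRU 3', 'GRU 2': 'GRU 4', 'GRU 3': 'GRU 4', 'GRU 4': None}

# Illustrative local runoff per GRU in m³/s (made-up values)
local_runoff = {'GRU 1': 2.0, 'GRU 2': 1.5, 'GRU 3': 3.0, 'GRU 4': 0.5}

def accumulate_flow(downstream, local_runoff):
    """Add each GRU's local runoff to itself and every GRU downstream of it."""
    total = {gru: 0.0 for gru in downstream}
    for gru, runoff in local_runoff.items():
        current = gru
        while current is not None:
            total[current] += runoff
            current = downstream[current]
    return total

accumulated = accumulate_flow(downstream, local_runoff)
for gru in sorted(accumulated):
    print(f"{gru}: {accumulated[gru]:.1f} m³/s")
```

The outlet (GRU 4) ends up with the sum of all local contributions, which is the lumped view of the basin; the intermediate totals are what the distributed setup adds.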

## 5. Step 1: Setup Project Structure

In [None]:
# Setup project directories
print("Creating project directory structure for distributed model...")
confluence.setup_project()

# List created directories
print("\nCreated directories:")
for item in sorted(confluence.project_dir.iterdir()):
    if item.is_dir():
        print(f"  📁 {item.name}")

## 6. Step 2: Create Pour Point (Same as Lumped)

In [None]:
# Create pour point shapefile - same location as lumped model
print(f"Creating pour point shapefile from coordinates: {confluence.config['POUR_POINT_COORDS']}")
confluence.create_pourPoint()

# The pour point is the same as the lumped model
print("\nNote: The pour point location is identical to the lumped model.")
print("The difference is in how we subdivide the watershed above this point.")
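Because the whole delineation hangs off this single point, it can be worth sanity-checking the coordinates before running the step. The sketch below assumes the coordinates are stored as a `'lat/lon'` string in decimal degrees. That format, the `parse_pour_point` helper, and the example value (an approximate location for Banff) are assumptions for illustration, so check them against your own configuration file.

```python
def parse_pour_point(coords):
    """Parse a 'lat/lon' string and check both values are in valid ranges."""
    lat_text, lon_text = coords.split('/')
    lat, lon = float(lat_text), float(lon_text)
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"Latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"Longitude out of range: {lon}")
    return lat, lon

# Approximate, illustrative coordinates near Banff (assumed, not read from the config)
lat, lon = parse_pour_point('51.17/-115.57')
print(f"Pour point: lat={lat}, lon={lon}")
```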

## 7. Step 3: Acquire Geospatial Attributes

In [None]:
# Acquire DEM, soil classes, and land cover
print("Acquiring geospatial attributes...")
print("These are the same data sources as the lumped model.")

confluence.acquire_attributes()

print("\n✓ Geospatial attributes acquired")

## 8. Step 4: Delineate Distributed Domain

This is where the main difference occurs: we'll create multiple sub-basins instead of one lumped basin.

In [None]:
# Delineate the watershed with stream network
print(f"Delineating distributed watershed...")
print(f"Method: {confluence.config['DOMAIN_DEFINITION_METHOD']}")
print(f"Stream threshold: {confluence.config['STREAM_THRESHOLD']}")
print("\nThis will create multiple sub-basins connected by a stream network.")

confluence.define_domain()

# Check outputs
basin_path = confluence.project_dir / 'shapefiles' / 'river_basins'
network_path = confluence.project_dir / 'shapefiles' / 'river_network'

# Start with an empty list so the check below works even if the folder is missing
basin_files = []
if basin_path.exists():
    basin_files = list(basin_path.glob('*.shp'))
    print(f"\n✓ Created basin shapefiles: {len(basin_files)}")

if network_path.exists():
    network_files = list(network_path.glob('*.shp'))
    print(f"✓ Created river network shapefiles: {len(network_files)}")

# Load and check number of basins
if basin_files:
    gdf = gpd.read_file(basin_files[0])
    print(f"\nNumber of sub-basins (GRUs): {len(gdf)}")
    # .area is only in m² for a projected CRS, so reproject geographic data first
    area_gdf = gdf
    if gdf.crs is not None and gdf.crs.is_geographic:
        area_gdf = gdf.to_crs(gdf.estimate_utm_crs())
    print(f"Total area: {area_gdf.geometry.area.sum() / 1e6:.2f} km²")
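A note on areas: `geometry.area` returns values in the squared units of the layer's coordinate reference system, so a km² figure is only meaningful for a projected CRS in metres. The sketch below, which assumes a spherical Earth with a mean radius of 6371 km, shows why square degrees cannot be converted with a single factor: the ground area of a one-degree cell depends on latitude.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation

def lat_lon_cell_area_km2(lat_south, lat_north, lon_width_deg):
    """Area of a latitude/longitude box on a sphere, in km²."""
    lon_width_rad = math.radians(lon_width_deg)
    band = math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    return EARTH_RADIUS_KM ** 2 * lon_width_rad * band

area_equator = lat_lon_cell_area_km2(0.0, 1.0, 1.0)
area_banff_latitude = lat_lon_cell_area_km2(51.0, 52.0, 1.0)
print(f"1° x 1° cell at the equator: {area_equator:,.0f} km²")
print(f"1° x 1° cell at 51-52° N:    {area_banff_latitude:,.0f} km²")
```

For real shapefiles, reprojecting to a projected CRS before calling `.area` is the more practical route.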

## 9. Visualize the Distributed Domain

In [None]:
# Plot the distributed domain
print("Creating distributed domain visualization...")
confluence.plot_domain()

# Display the domain plot
plot_path = confluence.project_dir / 'plots' / 'domain' / 'domain_map.png'
if plot_path.exists():
    from IPython.display import Image, display
    display(Image(filename=str(plot_path)))
else:
    # Create custom visualization
    basin_files = list((confluence.project_dir / 'shapefiles' / 'river_basins').glob('*.shp'))
    network_files = list((confluence.project_dir / 'shapefiles' / 'river_network').glob('*.shp'))
    
    if basin_files and network_files:
        fig, ax = plt.subplots(figsize=(12, 10))
        
        # Load data
        basins = gpd.read_file(basin_files[0])
        rivers = gpd.read_file(network_files[0])
        
        # Plot basins with different colors
        basins.plot(ax=ax, column='GRU_ID', cmap='viridis', 
                   alpha=0.7, edgecolor='black', linewidth=0.5)
        
        # Plot river network
        rivers.plot(ax=ax, color='blue', linewidth=2)
        
        # Add pour point
        pour_point_path = confluence.project_dir / 'shapefiles' / 'pour_point' / f"{confluence.config['DOMAIN_NAME']}_pourPoint.shp"
        if pour_point_path.exists():
            pour_point = gpd.read_file(pour_point_path)
            pour_point.plot(ax=ax, color='red', markersize=150, marker='o', zorder=5)
        
        ax.set_title(f'Distributed Domain: {len(basins)} Sub-basins', fontsize=16, fontweight='bold')
        ax.set_xlabel('Longitude')
        ax.set_ylabel('Latitude')
        
        # Add colorbar for GRU IDs
        sm = plt.cm.ScalarMappable(cmap='viridis', 
                                   norm=plt.Normalize(vmin=basins['GRU_ID'].min(), 
                                                     vmax=basins['GRU_ID'].max()))
        sm.set_array([])  # Public API instead of assigning the private _A attribute
        cbar = fig.colorbar(sm, ax=ax, shrink=0.8)
        cbar.set_label('GRU ID', fontsize=12)
        
        plt.tight_layout()
        plt.show()

## 10. Step 5: Create GRU-based HRUs

In [None]:
# Create HRUs based on GRUs (1 GRU = 1 HRU)
print(f"Creating HRUs based on GRUs...")
print(f"Method: {confluence.config['DOMAIN_DISCRETIZATION']}")
print("For this tutorial: 1 GRU = 1 HRU (simplest case)")

confluence.discretize_domain()

# Check the created HRU shapefile
hru_path = confluence.project_dir / 'shapefiles' / 'catchment'
if hru_path.exists():
    hru_files = list(hru_path.glob('*.shp'))
    print(f"\n✓ Created HRU shapefiles: {len(hru_files)}")
    
    if hru_files:
        hru_gdf = gpd.read_file(hru_files[0])
        print(f"\nHRU Statistics:")
        print(f"Number of HRUs: {len(hru_gdf)}")
        print(f"Number of GRUs: {hru_gdf['GRU_ID'].nunique()}")
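        # .area is only in m² for a projected CRS, so reproject geographic data first
        hru_area_gdf = hru_gdf
        if hru_gdf.crs is not None and hru_gdf.crs.is_geographic:
            hru_area_gdf = hru_gdf.to_crs(hru_gdf.estimate_utm_crs())
        print(f"Total area: {hru_area_gdf.geometry.area.sum() / 1e6:.2f} km²")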
        
        # Show HRU distribution
        hru_counts = hru_gdf['GRU_ID'].value_counts()
        print(f"\nHRUs per GRU:")
        for gru_id, count in hru_counts.items():
            print(f"  GRU {gru_id}: {count} HRUs")
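The 1:1 relationship between GRUs and HRUs can be checked programmatically rather than read off the printout. A minimal sketch with plain lists standing in for the `GRU_ID` column follows; the IDs are made up, and `hrus_per_gru` and `is_one_to_one` are hypothetical helpers.

```python
from collections import Counter

def hrus_per_gru(gru_ids):
    """Count how many HRUs carry each GRU ID."""
    return Counter(gru_ids)

def is_one_to_one(gru_ids):
    """True when every GRU ID appears exactly once, i.e. 1 GRU = 1 HRU."""
    return all(count == 1 for count in hrus_per_gru(gru_ids).values())

one_hru_each = [1, 2, 3, 4]        # one row (HRU) per GRU
subdivided = [1, 1, 2, 3, 3, 3]    # some GRUs split into several HRUs

print(is_one_to_one(one_hru_each))
print(is_one_to_one(subdivided))
print(dict(hrus_per_gru(subdivided)))
```

With the shapefile loaded, passing `hru_gdf['GRU_ID']` to `is_one_to_one` should perform the same check, since `Counter` accepts any iterable.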

## 11. Remaining Steps (Same as Lumped)

The remaining workflow steps are similar to the lumped model, but CONFLUENCE handles the multiple GRUs automatically.

In [None]:
# Process observed data
print("Step 6: Processing observed streamflow data...")
confluence.process_observed_data()
print("✓ Observed data processed\n")

# Acquire forcing data
print("Step 7: Acquiring forcing data...")
confluence.acquire_forcings()
print("✓ Forcing data acquired\n")

# Model-agnostic preprocessing
print("Step 8: Running model-agnostic preprocessing...")
confluence.model_agnostic_pre_processing()
print("✓ Preprocessing completed\n")

# Model-specific preprocessing
print("Step 9: Preparing model input files...")
confluence.model_specific_pre_processing()
print("✓ Model inputs prepared")
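Since these steps run back to back and some can take a while, it may help to time each one and stop at the first failure. The sketch below uses a hypothetical `run_steps` helper with stand-in functions so it runs on its own; in the notebook the callables would be the `confluence` methods used above.

```python
import time

def run_steps(steps):
    """Run (label, callable) pairs in order; return [(label, seconds), ...]."""
    timings = []
    for label, step in steps:
        start = time.perf_counter()
        try:
            step()
        except Exception as exc:
            # Report which step failed, then re-raise so the error is not hidden
            print(f"✗ {label} failed: {exc}")
            raise
        elapsed = time.perf_counter() - start
        timings.append((label, elapsed))
        print(f"✓ {label} ({elapsed:.2f} s)")
    return timings

# Stand-in steps; replace the lambdas with e.g. confluence.process_observed_data
demo_steps = [
    ("Process observed data", lambda: None),
    ("Acquire forcing data", lambda: None),
]
timings = run_steps(demo_steps)
```

Re-raising keeps the original traceback visible, which matters when a later step depends on files written by an earlier one.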

## 12. Run the Distributed Model

In [None]:
# Run the model
print(f"Running distributed {confluence.config['HYDROLOGICAL_MODEL']} model...")
print("Number of GRUs: see the delineation output from Step 4")
print("Note: This will take longer than the lumped model due to multiple units.")

confluence.run_models()

print("\n✓ Model execution completed")

## 13. Visualize Distributed Results

In [None]:
# Create visualization
print("Creating distributed model visualization...")
confluence.visualise_model_output()

# Display results
plot_path = confluence.project_dir / 'plots' / 'results' / 'streamflow_comparison.png'
if plot_path.exists():
    from IPython.display import Image, display
    display(Image(filename=str(plot_path)))
    
    print("\nNote: The streamflow output is from the basin outlet (routed through all GRUs).")

## 14. Compare Distributed vs Lumped Structure

Let's create a visual comparison of the domain structures.

In [None]:
# Create comparison visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 8))

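# Lumped model visualization (only available if the lumped domain from Tutorial 1 exists)
lumped_basin_path = confluence.data_dir / 'domain_Bow_at_Banff_lumped' / 'shapefiles' / 'river_basins'
lumped_files = list(lumped_basin_path.glob('*.shp')) if lumped_basin_path.exists() else []
if lumped_files:
    lumped_basin = gpd.read_file(lumped_files[0])
    lumped_basin.plot(ax=ax1, color='lightblue', edgecolor='navy', linewidth=2)
else:
    # Show a placeholder instead of leaving an empty set of axes
    ax1.text(0.5, 0.5, 'Lumped basin shapefile not found', ha='center', va='center',
             transform=ax1.transAxes)
ax1.set_title('Lumped Model\n(1 Unit)', fontsize=14, fontweight='bold')
ax1.axis('off')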

# Distributed model visualization
dist_basin_path = confluence.project_dir / 'shapefiles' / 'river_basins'
dist_network_path = confluence.project_dir / 'shapefiles' / 'river_network'

if dist_basin_path.exists() and dist_network_path.exists():
    dist_basin_files = list(dist_basin_path.glob('*.shp'))
    dist_network_files = list(dist_network_path.glob('*.shp'))
    
    if dist_basin_files and dist_network_files:
        basins = gpd.read_file(dist_basin_files[0])
        network = gpd.read_file(dist_network_files[0])
        
        basins.plot(ax=ax2, column='GRU_ID', cmap='viridis', 
                   edgecolor='black', linewidth=0.5, alpha=0.7)
        network.plot(ax=ax2, color='blue', linewidth=2)
        ax2.set_title(f'Distributed Model\n({len(basins)} GRUs)', fontsize=14, fontweight='bold')
        ax2.axis('off')

plt.suptitle('Lumped vs Distributed Domain Structure', fontsize=16, fontweight='bold')
plt.tight_layout()
plt.show()

## 15. Summary and Key Differences

### What we accomplished:
1. Created a distributed model with multiple GRUs
2. Used stream network delineation
3. Maintained 1:1 relationship between GRUs and HRUs
4. Ran the same model (SUMMA) in distributed mode

### Key differences from lumped model:
- **Multiple spatial units**: Several GRUs instead of one
- **River routing**: mizuRoute connects the GRUs
- **More detailed representation**: Captures spatial variability
- **Longer computation time**: More units to simulate

### Next steps:
1. Compare performance metrics between lumped and distributed
2. Experiment with different stream thresholds
3. Try different discretization methods (elevation bands, land cover)
4. Calibrate parameters for individual GRUs
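For the first of these next steps, two commonly used streamflow metrics are the Nash-Sutcliffe efficiency (NSE) and the Kling-Gupta efficiency (KGE). A minimal sketch in plain Python follows. The observed and simulated series are made-up numbers, not results from either tutorial, and the function names are illustrative.

```python
import math
from statistics import mean, pstdev

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit."""
    obs_mean = mean(obs)
    residual = sum((s - o) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - residual / variance

def kge(obs, sim):
    """Kling-Gupta efficiency (2009 form): 1 is a perfect fit."""
    obs_mean, sim_mean = mean(obs), mean(sim)
    obs_std, sim_std = pstdev(obs), pstdev(sim)
    covariance = mean([(o - obs_mean) * (s - sim_mean) for o, s in zip(obs, sim)])
    r = covariance / (obs_std * sim_std)   # linear correlation
    alpha = sim_std / obs_std              # variability ratio
    beta = sim_mean / obs_mean             # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

# Made-up streamflow series in m³/s, for illustration only
observed = [10.0, 12.0, 20.0, 35.0, 28.0, 15.0]
simulations = {
    'lumped': [12.0, 13.0, 17.0, 30.0, 30.0, 18.0],
    'distributed': [11.0, 12.5, 19.0, 33.0, 29.0, 16.0],
}
for name, simulated in simulations.items():
    print(f"{name}: NSE={nse(observed, simulated):.3f}, KGE={kge(observed, simulated):.3f}")
```

Both metrics need the observed and simulated series aligned on the same time steps, so any real comparison should match the two on their timestamps first.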

In [None]:
# Print final summary
print("=== Distributed Workflow Complete ===\n")
print(f"Project: {confluence.config['DOMAIN_NAME']}")
print(f"Method: {confluence.config['DOMAIN_DEFINITION_METHOD']}")
print(f"Stream Threshold: {confluence.config['STREAM_THRESHOLD']}")
print(f"Model: {confluence.config['HYDROLOGICAL_MODEL']}")
print("\nKey outputs:")
print("  - Basin shapefile: shapefiles/river_basins/")
print("  - River network: shapefiles/river_network/")
print(f"  - Model results: simulations/{confluence.config['EXPERIMENT_ID']}/")
print(f"  - Comparison plots: plots/results/")