# CONFLUENCE Tutorial: Lumped Basin Workflow (Bow River at Banff)

This notebook walks through the complete CONFLUENCE workflow for a lumped basin model using the Bow River at Banff as an example. We'll execute each step individually to understand what's happening at each stage.

## Overview

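We'll run through:
1. Project setup and configuration
2. Pour point creation and geospatial attribute acquisition
3. Domain definition (watershed delineation) and discretization
4. Observation and forcing data acquisition
5. Model preprocessing (model-agnostic and model-specific)
6. Model execution
7. Results visualization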

## 1. Setup and Import Libraries

In [1]:
# Import required libraries
import sys
import os
from pathlib import Path
import yaml
import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd
from datetime import datetime
import contextily as cx

# Add CONFLUENCE to path
confluence_path = Path('../').resolve()
sys.path.append(str(confluence_path))

# Import CONFLUENCE
from CONFLUENCE import CONFLUENCE

# Set up plotting style
plt.style.use('default')
%matplotlib inline

## 2. Load and Examine Configuration

In [2]:
# Set directory paths
CONFLUENCE_CODE_DIR = confluence_path
CONFLUENCE_DATA_DIR = Path('/work/comphyd_lab/data/CONFLUENCE_data')  # ← Modify this path for your system

# Verify paths exist
if not CONFLUENCE_CODE_DIR.exists():
    raise FileNotFoundError(f"CONFLUENCE code directory not found: {CONFLUENCE_CODE_DIR}")

if not CONFLUENCE_DATA_DIR.exists():
    print(f"Data directory doesn't exist. Creating: {CONFLUENCE_DATA_DIR}")
    CONFLUENCE_DATA_DIR.mkdir(parents=True, exist_ok=True)

# Load the template configuration
config_path = CONFLUENCE_CODE_DIR / '0_config_files' / 'config_template.yaml'

# Before loading, let's update the config with our directory paths
with open(config_path, 'r') as f:
    config_dict = yaml.safe_load(f)

# Update the directory paths in the config
config_dict['CONFLUENCE_CODE_DIR'] = str(CONFLUENCE_CODE_DIR)
config_dict['CONFLUENCE_DATA_DIR'] = str(CONFLUENCE_DATA_DIR)

# Save the updated config as the active configuration file
temp_config_path = CONFLUENCE_CODE_DIR / 'config_active.yaml'
with open(temp_config_path, 'w') as f:
    yaml.dump(config_dict, f, default_flow_style=False)

# Initialize CONFLUENCE with updated config
confluence = CONFLUENCE(temp_config_path)

# Display key configuration settings
print("=== Directory Configuration ===")
print(f"Code Directory: {CONFLUENCE_CODE_DIR}")
print(f"Data Directory: {CONFLUENCE_DATA_DIR}")
print("\n=== Key Configuration Settings ===")
print(f"Domain Name: {confluence.config['DOMAIN_NAME']}")
print(f"Pour Point: {confluence.config['POUR_POINT_COORDS']}")
print(f"Spatial Mode: {confluence.config['SPATIAL_MODE']}")
print(f"Model: {confluence.config['HYDROLOGICAL_MODEL']}")
print(f"Simulation Period: {confluence.config['EXPERIMENT_TIME_START']} to {confluence.config['EXPERIMENT_TIME_END']}")
print(f"Project Directory: {confluence.project_dir}")

2025-05-10 13:18:35,619 - confluence_general - INFO - Initializing VariableHandler for dataset: ERA5 and model: SUMMA


=== Directory Configuration ===
Code Directory: /home/darri.eythorsson/code/CONFLUENCE
Data Directory: /work/comphyd_lab/data/CONFLUENCE_data

=== Key Configuration Settings ===
Domain Name: Bow_at_Banff_lumped
Pour Point: 51.1722/-115.5717
Spatial Mode: Lumped
Model: SUMMA
Simulation Period: 2011-01-01 01:00 to 2022-12-31 23:00
Project Directory: /work/comphyd_lab/data/CONFLUENCE_data/domain_Bow_at_Banff_lumped

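The cell above reads several keys from `confluence.config`, and a missing key only surfaces as a `KeyError` at the point of use. A minimal sketch of a pre-flight check follows. The helper name `missing_config_keys` is hypothetical and not part of CONFLUENCE, the key list is simply the set of keys printed above, and the configuration is assumed to behave like a plain dictionary:

```python
# Hypothetical helper (not part of CONFLUENCE): report config keys that are
# absent or empty before any workflow step tries to read them.
REQUIRED_KEYS = [
    "DOMAIN_NAME",
    "POUR_POINT_COORDS",
    "SPATIAL_MODE",
    "HYDROLOGICAL_MODEL",
    "EXPERIMENT_TIME_START",
    "EXPERIMENT_TIME_END",
]


def missing_config_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are missing or set to None/empty string."""
    return [key for key in required if config.get(key) in (None, "")]


# Example using the values printed above, with the simulation period left out
example_config = {
    "DOMAIN_NAME": "Bow_at_Banff_lumped",
    "POUR_POINT_COORDS": "51.1722/-115.5717",
    "SPATIAL_MODE": "Lumped",
    "HYDROLOGICAL_MODEL": "SUMMA",
}
print(missing_config_keys(example_config))
```

In the notebook, the same function could be pointed at `confluence.config` right after initialization, provided that object supports `.get()`.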

## 3. Step 1: Set Up Project Structure

In [3]:
# Set up project directories
print("Creating project directory structure...")
confluence.setup_project()

# List created directories
print("\nCreated directories:")
for item in sorted(confluence.project_dir.iterdir()):
    if item.is_dir():
        print(f"  📁 {item.name}")

2025-05-10 13:18:37,465 - confluence_general - INFO - Setting up project for domain: Bow_at_Banff_lumped
2025-05-10 13:18:37,479 - confluence_general - INFO - Project directory created at: /work/comphyd_lab/data/CONFLUENCE_data/domain_Bow_at_Banff_lumped
2025-05-10 13:18:37,480 - confluence_general - INFO - shapefiles directories created


Creating project directory structure...

Created directories:
  📁 _workLog_Bow_at_Banff_lumped
  📁 attributes
  📁 cache
  📁 documentation
  📁 emulation
  📁 evaluation
  📁 forcing
  📁 observations
  📁 optimisation
  📁 plots
  📁 settings
  📁 shapefiles
  📁 simulations

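If a later step fails because a folder is missing, it can help to compare the project directory against the listing above. The sketch below is one way to do that with `pathlib`. The helper name `missing_subdirs` is hypothetical, the folder names are copied from the output above, and the demonstration runs on a throwaway temporary directory rather than a real project:

```python
import tempfile
from pathlib import Path

# Subdirectory names taken from the listing above (the work-log folder is
# omitted because its name depends on the domain).
EXPECTED_SUBDIRS = [
    "attributes", "cache", "documentation", "emulation", "evaluation",
    "forcing", "observations", "optimisation", "plots", "settings",
    "shapefiles", "simulations",
]


def missing_subdirs(project_dir, expected=EXPECTED_SUBDIRS):
    """Return the expected subdirectory names that do not exist under project_dir."""
    project_dir = Path(project_dir)
    return [name for name in expected if not (project_dir / name).is_dir()]


# Demonstration on a temporary directory that contains only two of the folders
with tempfile.TemporaryDirectory() as tmp:
    for name in ("forcing", "settings"):
        (Path(tmp) / name).mkdir()
    demo_missing = missing_subdirs(tmp)

print(demo_missing)
```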

## 4. Step 2: Create Pour Point Shapefile

In [None]:
# Create pour point shapefile from coordinates
print(f"Creating pour point shapefile from coordinates: {confluence.config['POUR_POINT_COORDS']}")
confluence.create_pourPoint()

# Visualize the pour point
pour_point_path = confluence.project_dir / 'shapefiles' / 'pour_point' / f"{confluence.config['DOMAIN_NAME']}_pourPoint.shp"
if pour_point_path.exists():
    gdf = gpd.read_file(pour_point_path)
    
    # Reproject to Web Mercator for basemap compatibility
    gdf_web = gdf.to_crs(epsg=3857)
    
    fig, ax = plt.subplots(figsize=(12, 10))
    
    # Plot the pour point
    gdf_web.plot(ax=ax, color='red', markersize=200, marker='o', 
                 edgecolor='white', linewidth=2, zorder=5)
    
    # Set the map extent first: add_basemap fetches tiles for the current axis limits
    minx, miny, maxx, maxy = gdf_web.total_bounds
    pad = 5000  # Web Mercator units (roughly 3 km on the ground at this latitude)
    ax.set_xlim(minx - pad, maxx + pad)
    ax.set_ylim(miny - pad, maxy + pad)
    
    # Add basemap
    cx.add_basemap(ax, 
                   source=cx.providers.CartoDB.Positron,
                   zoom=15,
                   alpha=0.8)
    
    # Add context label
    # Read lat/lon from the original GeoDataFrame (assumed to be in EPSG:4326) for label positioning
    lat, lon = gdf.geometry.iloc[0].y, gdf.geometry.iloc[0].x
    label_point = gpd.GeoDataFrame(
        geometry=gpd.points_from_xy([lon + 0.01], [lat + 0.01]),
        crs='EPSG:4326'
    ).to_crs(epsg=3857)
    
    ax.text(label_point.geometry.iloc[0].x, 
            label_point.geometry.iloc[0].y,
            'Bow River at Banff\n(Pour Point)', 
            fontsize=14, 
            bbox=dict(boxstyle='round,pad=0.5', facecolor='yellow', alpha=0.8),
            fontweight='bold',
            verticalalignment='bottom')
    
    # Add a north arrow (north is up in Web Mercator)
    from matplotlib.patches import FancyArrowPatch
    
    # North arrow
    arrow_x = ax.get_xlim()[0] + (ax.get_xlim()[1] - ax.get_xlim()[0]) * 0.9
    arrow_y = ax.get_ylim()[0] + (ax.get_ylim()[1] - ax.get_ylim()[0]) * 0.85
    arrow = FancyArrowPatch((arrow_x, arrow_y), 
                           (arrow_x, arrow_y + 1000),
                           mutation_scale=20, 
                           color='black',
                           zorder=10)
    ax.add_patch(arrow)
    ax.text(arrow_x, arrow_y + 1500, 'N', ha='center', va='bottom', fontweight='bold', fontsize=14)
    
    # Add coordinates to title
    ax.set_title(f'Pour Point Location: Bow River at Banff\nCoordinates: {lat:.4f}°N, {abs(lon):.4f}°W',
                fontsize=16, fontweight='bold', pad=20)
    
    # Remove axis labels (not meaningful in Web Mercator)
    ax.set_xlabel('')
    ax.set_ylabel('')
    ax.set_xticklabels([])
    ax.set_yticklabels([])
    
    plt.tight_layout()
    plt.show()
else:
    print("Pour point shapefile not found!")

Creating pour point shapefile from coordinates: 51.1722/-115.5717


2025-05-10 13:18:40,018 - pyogrio._io - INFO - Created 1 records
2025-05-10 13:18:40,021 - confluence_general - INFO - Pour point shapefile created successfully: /work/comphyd_lab/data/CONFLUENCE_data/domain_Bow_at_Banff_lumped/shapefiles/pour_point/Bow_at_Banff_lumped_pourPoint.shp

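`POUR_POINT_COORDS` is stored as a single `lat/lon` string (`51.1722/-115.5717` above). The sketch below splits it into numbers and estimates how much ground a 5000-unit Web Mercator padding covers at that latitude. It assumes the string is always `latitude/longitude` in decimal degrees, and it uses the spherical Mercator approximation that ground distance is roughly map distance times cos(latitude). Both function names are illustrative and not part of CONFLUENCE:

```python
import math


def parse_pour_point(coords):
    """Split a 'lat/lon' string into (lat, lon) floats in decimal degrees."""
    lat_text, lon_text = coords.split("/")
    lat, lon = float(lat_text), float(lon_text)
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"Coordinates out of range: {coords!r}")
    return lat, lon


def mercator_ground_distance(map_distance, lat_deg):
    """Approximate ground distance for a Web Mercator map distance at a latitude."""
    return map_distance * math.cos(math.radians(lat_deg))


lat, lon = parse_pour_point("51.1722/-115.5717")
ground_pad_m = mercator_ground_distance(5000, lat)
print(f"lat={lat}, lon={lon}, 5000 map units ≈ {ground_pad_m:.0f} m on the ground")
```

Because Web Mercator stretches distances away from the equator, map-unit offsets at Banff correspond to noticeably shorter distances on the ground.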

## 5. Step 3: Acquire Geospatial Attributes

In [None]:
# Acquire DEM, soil classes, and land cover
print("Acquiring geospatial attributes (DEM, soil, land cover)...")
print("This step downloads data from configured sources.")
print(f"Bounding box: {confluence.config['BOUNDING_BOX_COORDS']}")

confluence.acquire_attributes()

# Check downloaded files
attribute_dirs = {
    'DEM': confluence.project_dir / 'attributes' / 'elevation' / 'dem',
    'Soil': confluence.project_dir / 'attributes' / 'soilclass',
    'Land': confluence.project_dir / 'attributes' / 'landclass'
}

print("\nDownloaded attribute files:")
for name, path in attribute_dirs.items():
    if path.exists():
        files = list(path.glob('*.tif')) + list(path.glob('*.tiff'))
        print(f"  {name}: {len(files)} files")

## 6. Step 4: Define Domain (Watershed Delineation)

In [None]:
# Delineate the watershed
print(f"Delineating watershed using method: {confluence.config['DOMAIN_DEFINITION_METHOD']}")
print(f"Tool: {confluence.config['LUMPED_WATERSHED_METHOD']}")
print(f"Stream threshold: {confluence.config['STREAM_THRESHOLD']}")

confluence.define_domain()

# Check outputs
basin_path = confluence.project_dir / 'shapefiles' / 'river_basins'
if basin_path.exists():
    basin_files = list(basin_path.glob('*.shp'))
    print(f"\nCreated {len(basin_files)} basin shapefile(s)")
    for f in basin_files:
        print(f"  - {f.name}")

## 7. Visualize the Delineated Domain

In [None]:
# Plot the delineated domain
print("Creating domain visualization...")
confluence.plot_domain()

# Display the plot if created
plot_path = confluence.project_dir / 'plots' / 'domain' / 'domain_map.png'
if plot_path.exists():
    from IPython.display import Image, display
    display(Image(filename=str(plot_path)))
else:
    print("Domain plot not found. Creating simple visualization...")
    
    # Try to load and plot the basin shapefile
    basin_files = list((confluence.project_dir / 'shapefiles' / 'river_basins').glob('*.shp'))
    if basin_files:
        basin_gdf = gpd.read_file(basin_files[0])
        
        fig, ax = plt.subplots(figsize=(10, 8))
        basin_gdf.plot(ax=ax, facecolor='lightblue', edgecolor='navy', linewidth=2)
        
        # Add pour point
        if pour_point_path.exists():
            pour_gdf = gpd.read_file(pour_point_path)
            pour_gdf.plot(ax=ax, color='red', markersize=100, marker='o', zorder=5)
        
        ax.set_title('Bow River Basin at Banff', fontsize=16, fontweight='bold')
        ax.set_xlabel('Longitude')
        ax.set_ylabel('Latitude')
        ax.grid(True, alpha=0.3)
        plt.tight_layout()
        plt.show()

## 8. Step 5: Domain Discretization (Create Lumped HRU)

In [None]:
# Create lumped HRU
print(f"Creating lumped HRU using method: {confluence.config['DOMAIN_DISCRETIZATION']}")
confluence.discretize_domain()

# Check the created HRU shapefile
hru_path = confluence.project_dir / 'shapefiles' / 'catchment'
if hru_path.exists():
    hru_files = list(hru_path.glob('*.shp'))
    print(f"\nCreated {len(hru_files)} HRU shapefile(s)")
    
    if hru_files:
        # Load and display HRU properties
        hru_gdf = gpd.read_file(hru_files[0])
        print("\nHRU Properties:")
        print(f"Number of HRUs: {len(hru_gdf)}")
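        # geometry.area is in CRS units, so reproject to UTM first if the layer is geographic
        hru_metric = hru_gdf
        if hru_gdf.crs is not None and hru_gdf.crs.is_geographic:
            hru_metric = hru_gdf.to_crs(hru_gdf.estimate_utm_crs())
        print(f"Total area: {hru_metric.geometry.area.sum() / 1e6:.2f} km²")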
        
        # For lumped model, should be single HRU
        if len(hru_gdf) == 1:
            print("✓ Successfully created single lumped HRU")

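A note on the area printed above: `geometry.area` is computed in the units of the layer's CRS, so a shapefile stored in geographic coordinates yields square degrees rather than square metres. As a rough illustration of how far apart those are, this sketch computes the true area of a 1° × 1° cell near Banff on a sphere. The mean Earth radius and the function name are assumptions made for the example:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def lat_lon_cell_area_km2(lat_south, lat_north, lon_west, lon_east):
    """Area of a latitude/longitude rectangle on a sphere, in km²."""
    d_lon = math.radians(lon_east - lon_west)
    band = math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    return EARTH_RADIUS_KM ** 2 * d_lon * band


# One "square degree" around Banff covers thousands of square kilometres
cell_km2 = lat_lon_cell_area_km2(51.0, 52.0, -116.0, -115.0)
print(f"1° x 1° cell at 51-52°N: {cell_km2:.0f} km²")
```

If the reported basin area looks implausibly small, an unprojected CRS is a likely cause.
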
## 9. Step 6: Process Observed Streamflow Data

In [None]:
# Process observed streamflow data
print(f"Processing observed streamflow data from: {confluence.config['STREAMFLOW_DATA_PROVIDER']}")
print(f"Station ID: {confluence.config['STATION_ID']}")

confluence.process_observed_data()

# Check if processed data exists
obs_path = confluence.project_dir / 'observations' / 'streamflow' / 'preprocessed' / f"{confluence.config['DOMAIN_NAME']}_streamflow_processed.csv"
if obs_path.exists():
    obs_df = pd.read_csv(obs_path)
    print(f"\nProcessed streamflow data:")
    print(f"Period: {obs_df.iloc[0]['Date']} to {obs_df.iloc[-1]['Date']}")
    print(f"Number of records: {len(obs_df)}")
    
    # Quick plot
    fig, ax = plt.subplots(figsize=(12, 6))
    ax.plot(pd.to_datetime(obs_df['Date']), obs_df['dischargeCubicMetresPerSecond'])
    ax.set_xlabel('Date')
    ax.set_ylabel('Discharge (m³/s)')
    ax.set_title(f'Observed Streamflow - Bow River at Banff ({confluence.config["STATION_ID"]})')
    ax.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()

## 10. Step 7: Acquire Forcing Data

In [None]:
# Acquire forcing data
print(f"Acquiring forcing data: {confluence.config['FORCING_DATASET']}")
print(f"Period: {confluence.config['FORCING_START_YEAR']} to {confluence.config['FORCING_END_YEAR']}")
print(f"Variables: {confluence.config['FORCING_VARIABLES']}")

confluence.acquire_forcings()

# Check downloaded data
forcing_path = confluence.project_dir / 'forcing' / 'raw_data'
if forcing_path.exists():
    files = list(forcing_path.glob('*'))
    print(f"\nDownloaded {len(files)} forcing files")
    for f in files[:5]:  # Show first 5
        print(f"  - {f.name}")
    if len(files) > 5:
        print("  ...")

## 11. Step 8: Model-Agnostic Preprocessing

In [None]:
# Process forcing data for the basin
print("Running model-agnostic preprocessing...")
print("This step:")
print("  - Calculates basin-averaged forcing")
print("  - Applies lapse rate corrections")
print("  - Creates intersection shapefiles")

confluence.model_agnostic_pre_processing()

# Check outputs
basin_forcing_path = confluence.project_dir / 'forcing' / 'basin_averaged_data'
if basin_forcing_path.exists():
    files = list(basin_forcing_path.glob('*.nc'))
    print(f"\nCreated {len(files)} basin-averaged forcing files")

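The first two operations listed in the cell above can be illustrated with plain arithmetic. This is a sketch under stated assumptions, not CONFLUENCE's implementation: the basin average is taken as an area-weighted mean of the grid cells that overlap the basin, and the temperature correction uses a constant lapse rate of -6.5 K per km, a commonly used default that may differ from the value CONFLUENCE applies. All names and numbers are illustrative:

```python
def area_weighted_mean(values, overlap_areas):
    """Average grid-cell values, weighting each by its overlap area with the basin."""
    total_area = sum(overlap_areas)
    if total_area <= 0:
        raise ValueError("Overlap areas must sum to a positive number")
    return sum(v * a for v, a in zip(values, overlap_areas)) / total_area


def lapse_rate_correct(temp_k, source_elev_m, target_elev_m, lapse_rate_k_per_m=-0.0065):
    """Shift a temperature from the forcing-grid elevation to the basin elevation."""
    return temp_k + lapse_rate_k_per_m * (target_elev_m - source_elev_m)


# Two grid cells covering 25% and 75% of the basin (illustrative numbers)
basin_temp = area_weighted_mean([270.0, 274.0], [0.25, 0.75])
# Basin mean elevation 500 m above the grid-cell elevation (illustrative numbers)
corrected = lapse_rate_correct(basin_temp, source_elev_m=1600.0, target_elev_m=2100.0)
print(f"basin mean: {basin_temp:.2f} K, lapse-corrected: {corrected:.2f} K")
```

The weights here play the role of the intersection between forcing grid cells and the basin polygon, which is why the preprocessing step also creates intersection shapefiles.
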
## 12. Step 9: Model-Specific Preprocessing

In [None]:
# Prepare model-specific input files
print(f"Preparing {confluence.config['HYDROLOGICAL_MODEL']} input files...")
confluence.model_specific_pre_processing()

# Check model input directory
model_input_path = confluence.project_dir / 'forcing' / f"{confluence.config['HYDROLOGICAL_MODEL']}_input"
if model_input_path.exists():
    files = list(model_input_path.glob('*'))
    print(f"\nCreated {len(files)} model input files")
    
# Check model settings
settings_path = confluence.project_dir / 'settings' / confluence.config['HYDROLOGICAL_MODEL']
if settings_path.exists():
    files = list(settings_path.glob('*'))
    print(f"\nCreated {len(files)} model configuration files")
    for f in files[:10]:  # Show first 10
        print(f"  - {f.name}")

## 13. Step 10: Run the Model

In [None]:
# Run the hydrological model
print(f"Running {confluence.config['HYDROLOGICAL_MODEL']} model...")
print("This may take several minutes depending on simulation length and system performance.")

confluence.run_models()

# Check output files
sim_path = confluence.project_dir / 'simulations' / confluence.config['EXPERIMENT_ID'] / confluence.config['HYDROLOGICAL_MODEL']
if sim_path.exists():
    files = list(sim_path.glob('*.nc'))
    print(f"\nModel completed. Created {len(files)} output files.")
    
    if confluence.config.get('ROUTING_MODEL') == 'mizuRoute':
        mizu_path = confluence.project_dir / 'simulations' / confluence.config['EXPERIMENT_ID'] / 'mizuRoute'
        if mizu_path.exists():
            files = list(mizu_path.glob('*.nc'))
            print(f"Routing completed. Created {len(files)} output files.")

## 14. Step 11: Visualize Results

In [None]:
# Create visualization
print("Creating model output visualization...")
confluence.visualise_model_output()

# Display the streamflow comparison plot
plot_path = confluence.project_dir / 'plots' / 'results' / 'streamflow_comparison.png'
if plot_path.exists():
    from IPython.display import Image, display
    display(Image(filename=str(plot_path)))
else:
    print("Streamflow comparison plot not found")

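A visual comparison is often complemented by a summary score. One common option is the Nash-Sutcliffe efficiency (NSE), where 1 means a perfect match and 0 means the simulation is no better than the observed mean. The sketch below is a plain-Python version for paired lists of values. It is not taken from CONFLUENCE, and the sample numbers are made up for illustration:

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency for paired observed and simulated values."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("Need two non-empty series of equal length")
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance_term = sum((o - mean_obs) ** 2 for o in observed)
    if variance_term == 0:
        raise ValueError("Observed series is constant; NSE is undefined")
    return 1.0 - squared_error / variance_term


# Made-up discharge values in m³/s, for illustration only
obs = [10.0, 20.0, 30.0, 40.0]
sim = [12.0, 18.0, 33.0, 37.0]
print(f"NSE = {nash_sutcliffe(obs, sim):.3f}")
```

Observed and simulated series need to be aligned on the same time steps before a score like this is meaningful.
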
## 15. Summary and Next Steps

Congratulations! You've completed a full lumped basin modeling workflow with CONFLUENCE.

### What we accomplished:
1. Set up a project for the Bow River at Banff
2. Delineated the watershed as a single lumped unit
3. Acquired and processed forcing data
4. Ran a hydrological model (SUMMA)
5. Visualized results against observations

### Next steps you could try:
1. Run model calibration (see notebook 05)
2. Try different model configurations
3. Extend the simulation period
4. Compare with distributed model results

### Key files created:
- Project configuration: `config_active.yaml` (written from `config_template.yaml` in the CONFLUENCE code directory)
- Model outputs: `simulations/{experiment_id}/`
- Plots: `plots/results/`
- Processed data: `forcing/basin_averaged_data/`

In [None]:
# Print summary of key outputs
print("=== Workflow Complete ===\n")
print(f"Project: {confluence.config['DOMAIN_NAME']}")
print(f"Experiment: {confluence.config['EXPERIMENT_ID']}")
print(f"Model: {confluence.config['HYDROLOGICAL_MODEL']}")
print("\nKey outputs:")
print("  - Watershed shapefile: shapefiles/river_basins/")
print(f"  - Model results: simulations/{confluence.config['EXPERIMENT_ID']}/")
print("  - Plots: plots/results/")
print("  - Forcing data: forcing/basin_averaged_data/")