# CONFLUENCE Tutorial: Point-Scale Workflow (SNOTEL Example)

This notebook demonstrates CONFLUENCE's simplest spatial configuration: point-scale modeling. We'll simulate vertical processes at a single snow monitoring location: [the Loveland Basin SNOTEL station (id: 602)](https://wcc.sc.egov.usda.gov/nwcc/site?sitenum=602).

## Overview

Point-scale modeling focuses on vertical processes without spatial heterogeneity. Here we compare simulated and observed snow accumulation and melt at the station.
 
## Learning Objectives

1. Understand point-scale modeling in CONFLUENCE
2. Configure CONFLUENCE for single-point simulations
3. Analyze point-scale SUMMA snow simulations

In [None]:
# Import required libraries
import sys
import os
from pathlib import Path
import yaml
import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd
from datetime import datetime
import contextily as cx
import xarray as xr
import numpy as np

# Add CONFLUENCE to path
confluence_path = Path('../').resolve()
sys.path.append(str(confluence_path))

# Import main CONFLUENCE class
from CONFLUENCE import CONFLUENCE

# Set up plotting style
plt.style.use('default')
%matplotlib inline

## Create Point-Scale Configuration

We'll use the point configuration template in 0_config_files/config_point_template.yaml as our baseline.

We copy the template to a new location and update the key configuration settings for our specific experiment.

In [None]:
# Set directory paths
CONFLUENCE_CODE_DIR = confluence_path
CONFLUENCE_DATA_DIR = Path('/work/comphyd_lab/data/CONFLUENCE_data')  # ← User should modify this path

# Load template configuration
config_template_path = CONFLUENCE_CODE_DIR / '0_config_files' / 'config_point_template.yaml'

# Read config file
with open(config_template_path, 'r') as f:
    config_dict = yaml.safe_load(f)

# Update paths and settings 
config_dict['CONFLUENCE_CODE_DIR'] = str(CONFLUENCE_CODE_DIR)
config_dict['CONFLUENCE_DATA_DIR'] = str(CONFLUENCE_DATA_DIR)

# Update name and experiment id
config_dict['DOMAIN_NAME'] = 'loveland_pass_snotel'
config_dict['EXPERIMENT_ID'] = 'point_scale_tutorial'

# Save point-scale configuration to temporary file
temp_config_path = CONFLUENCE_CODE_DIR / '0_config_files' / 'config_point_notebook.yaml'
with open(temp_config_path, 'w') as f:
    yaml.dump(config_dict, f)

# Initialize CONFLUENCE with the new configuration
confluence = CONFLUENCE(temp_config_path)

# Print summary of the key settings
print("=== Point-Scale Configuration ===")
print(f"Domain Name: {config_dict['DOMAIN_NAME']}")
print(f"Spatial Mode: {config_dict['SPATIAL_MODE']}")
print(f"Location: {config_dict['POUR_POINT_COORDS']}")
print(f"Period: {config_dict['EXPERIMENT_TIME_START']} to {config_dict['EXPERIMENT_TIME_END']}")
print(f"Forcing Data: {config_dict['FORCING_DATASET']}")

## 3. Setup Project Structure

1. We set up the basic directory structure under the root data directory specified by the CONFLUENCE_DATA_DIR setting in the configuration file.
2. We create a point shapefile from the coordinates given in POUR_POINT_COORDS.

In [None]:
# Step 1: Project Initialization
print("=== Step 1: Project Initialization ===")
print("Creating point-scale project structure...")

# Setup project
project_dir = confluence.managers['project'].setup_project()

# Create pour point (in this case, our SNOTEL location)
pour_point_path = confluence.managers['project'].create_pour_point()

# List created directories
print("\nCreated directories:")
for item in sorted(project_dir.iterdir()):
    if item.is_dir():
        print(f"  📁 {item.name}")

## 4. Geospatial Domain Definition - Data acquisition
Now we acquire the geospatial attributes needed to set up our model:

- Elevation
- Land Cover
- Soil Classifications

We use the Model Agnostic Framework [gistool (Keshavarz et al., 2025)](https://github.com/CH-Earth/gistool) to subset the data based on the coordinates set in BOUNDING_BOX_COORDS.

When SPATIAL_MODE is set to Point, CONFLUENCE automatically updates BOUNDING_BOX_COORDS to a square buffer around the point. The buffer defaults to 0.001 degrees and can be changed with the POINT_BUFFER_DISTANCE setting.
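The buffering step can be pictured with a minimal sketch. The function name `point_to_bounding_box` is hypothetical, and the string formats (`lat/lon` for the point, `lat_max/lon_min/lat_min/lon_max` for the box) and the choice to apply the buffer on each side of the point are assumptions rather than something this notebook confirms. The coordinates are illustrative only.

```python
def point_to_bounding_box(pour_point: str, buffer_deg: float = 0.001) -> str:
    """Return a square bounding box centred on a 'lat/lon' point.

    Assumptions (not confirmed by the tutorial): the point is written as
    'lat/lon', the box as 'lat_max/lon_min/lat_min/lon_max', and the buffer
    is applied on each side of the point.
    """
    lat, lon = (float(value) for value in pour_point.split("/"))
    lat_max, lat_min = lat + buffer_deg, lat - buffer_deg
    lon_min, lon_max = lon - buffer_deg, lon + buffer_deg
    return f"{lat_max:.4f}/{lon_min:.4f}/{lat_min:.4f}/{lon_max:.4f}"


# Illustrative coordinates only, not the station's surveyed location
print(point_to_bounding_box("39.67/-105.90"))
```

A larger `buffer_deg` would pull in more raster cells around the station, which may matter when the attribute rasters are coarser than the default buffer.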

In [None]:
# Acquire attributes
print("Acquiring geospatial attributes for point location...")

print(f"Minimal bounding box: {confluence.config.get('BOUNDING_BOX_COORDS')}")

#confluence.managers['data'].acquire_attributes()

## 6. Geospatial Domain Definition - Domain creation

CONFLUENCE creates the shapefiles required to pre-process and configure the models.

When run in point configuration, a square polygon is produced in path/to/domain_dir/shapefiles/catchment.

In [None]:
# Geospatial Domain Definition and Analysis
print("=== Step 2: Geospatial Domain Definition and Analysis ===")

# Define domain
print("\nDefining minimal domain for point-scale simulation...")
watershed_path = confluence.managers['domain'].define_domain()
# Discretize domain (single HRU for point-scale)
print("\nCreating single HRU for point-scale simulation...")
hru_path = confluence.managers['domain'].discretize_domain()

# Check outputs
print("\nDomain definition complete:")
print(f"  - Watershed defined: {watershed_path is not None}")
print(f"  - HRUs created: {hru_path is not None}")

## 7. Model Agnostic Data Pre-Processing - Data Acquisition
Next we need meteorological data to run our models and observations to compare them with. 

We use the Model Agnostic Framework [datatool (Keshavarz et al., 2025)](https://github.com/CH-Earth/datatool) to subset the forcing dataset defined in FORCING_DATASET to the spatial bounding box and to the period between EXPERIMENT_TIME_START and EXPERIMENT_TIME_END.

A temperature lapse rate can be applied to the forcing data by setting APPLY_LAPSE_RATE: True. The rate is set with LAPSE_RATE, which defaults to 0.0065 K m⁻¹.
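As a rough illustration of what a lapse-rate correction does, here is a minimal sketch of the standard linear adjustment. The function name `apply_lapse_rate` and the two elevations are hypothetical, and this is not necessarily how CONFLUENCE applies the correction internally.

```python
def apply_lapse_rate(temp_k, forcing_elev_m, target_elev_m, lapse_rate=0.0065):
    """Shift air temperature from the forcing elevation to the target elevation.

    Temperature drops by `lapse_rate` kelvin for every metre of elevation gain,
    so a target above the forcing cell ends up colder.
    """
    return temp_k - lapse_rate * (target_elev_m - forcing_elev_m)


# Hypothetical elevations: forcing grid cell at 3000 m, station at 3500 m
adjusted = apply_lapse_rate(273.15, forcing_elev_m=3000.0, target_elev_m=3500.0)
print(f"Adjusted temperature: {adjusted:.2f} K")
```

With the default rate, a 500 m elevation difference shifts the temperature by 3.25 K, which can move a site across the rain-snow threshold.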

In [None]:
# Step 3: Model Agnostic Data Pre-Processing
print("=== Step 3: Model Agnostic Data Pre-Processing ===")

# Acquire forcings
print("\nAcquiring forcing data for point location...")
print(f"Dataset: {confluence.config['FORCING_DATASET']}")
#confluence.managers['data'].acquire_forcings()

MAF has the SNOTEL dataset in storage, and CONFLUENCE can acquire data by station id when DOWNLOAD_SNOTEL is set to 'true' and SNOTEL_STATION is set to the station id, which in our case is '602'. The path to the SNOTEL data can be set manually with the SNOTEL_PATH setting.
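A minimal sketch of how these settings could be added to the configuration dictionary follows. The small `config_dict` here is a stand-in for the one loaded earlier in the notebook, and it is an assumption that the settings need to be in place before the configuration file is written and CONFLUENCE is initialized.

```python
# Stand-in for the configuration dictionary loaded earlier in the notebook
config_dict = {"DOMAIN_NAME": "loveland_pass_snotel"}

snotel_settings = {
    "DOWNLOAD_SNOTEL": "true",  # string flag, as written in the text above
    "SNOTEL_STATION": "602",    # station id used in this tutorial
}
config_dict.update(snotel_settings)

# SNOTEL_PATH is left unset here; it is only needed to point at the data manually
for key in ("DOWNLOAD_SNOTEL", "SNOTEL_STATION"):
    print(f"{key}: {config_dict[key]}")
```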


In [None]:
# Process observed data 
print("Processing observed data...")
#confluence.managers['data'].process_observed_data()

## 8. Model Agnostic Data Pre-Processing - Remapping and zonal statistics

Remapping of the forcing data and zonal statistics calculations for the geospatial attributes are performed in one model-agnostic pre-processing step.

The forcing data are remapped onto the defined hydrofabric using [EASYMORE (Gharari et al., 2023)](https://www.sciencedirect.com/science/article/pii/S2352711023002431).
Zonal statistics are run using [rasterstats](https://pypi.org/project/rasterstats/).
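To make the idea of zonal statistics concrete, here is a minimal sketch in plain Python. It does not use rasterstats and is not how that library is implemented. The helper names, the tiny rasters and the mask are all hypothetical: a mean is taken for a continuous attribute (elevation) and a majority class for a categorical one (land cover) over the cells that fall inside the catchment.

```python
from collections import Counter
from statistics import mean


def cells_in_zone(raster, mask):
    """Collect the raster cells whose mask entry is True."""
    return [
        value
        for raster_row, mask_row in zip(raster, mask)
        for value, inside in zip(raster_row, mask_row)
        if inside
    ]


def zonal_mean(raster, mask):
    """Mean of a continuous raster over the zone."""
    return mean(cells_in_zone(raster, mask))


def zonal_majority(raster, mask):
    """Most frequent class of a categorical raster over the zone."""
    return Counter(cells_in_zone(raster, mask)).most_common(1)[0][0]


# Hypothetical 2x2 rasters; the mask marks the cells covered by the catchment
elevation = [[3500.0, 3510.0], [3520.0, 3600.0]]
land_cover = [[1, 1], [7, 1]]
catchment_mask = [[True, True], [True, False]]

print("Mean elevation:", zonal_mean(elevation, catchment_mask))
print("Majority land cover class:", zonal_majority(land_cover, catchment_mask))
```

For a point-scale domain the zone is the small square polygon, so it may cover only one or a few raster cells.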

In [None]:
# Run model-agnostic preprocessing
print("\nRunning model-agnostic preprocessing...")

confluence.managers['data'].run_model_agnostic_preprocessing()

print("\nModel-agnostic preprocessing complete")

## 9. Model Specific - Pre Processing 

Using the model-agnostic output, the model-specific input files are prepared in one model-specific preprocessing step.

In [None]:
# Step 4: Model Specific Processing and Initialization
print("=== Step 4: Model Specific Processing and Initialization ===")

confluence.managers['model'].preprocess_models()

print("\nModel-specific preprocessing complete")

## 10. Model Specific - Initialisation

Once the model input files have been created, the models are instantiated and run with their default configurations.

In [None]:
# Run models
print(f"\nRunning {confluence.config['HYDROLOGICAL_MODEL']} for point-scale simulation...")
confluence.managers['model'].run_models()

print("\nPoint-scale model run complete")

## 11. Result visualisation

Now let's look at how our simulations turned out.
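The comparison in the next cell reports RMSE, bias and correlation. As a minimal sketch of what those three numbers mean, the hypothetical helper `swe_metrics` below computes them for two short, made-up SWE series using only the standard library. The notebook cell itself uses pandas and NumPy instead.

```python
from math import sqrt


def swe_metrics(obs, sim):
    """Return RMSE, mean bias (sim minus obs) and Pearson correlation."""
    n = len(obs)
    diffs = [s - o for o, s in zip(obs, sim)]
    rmse = sqrt(sum(d * d for d in diffs) / n)
    bias = sum(diffs) / n

    mean_obs, mean_sim = sum(obs) / n, sum(sim) / n
    covariance = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    corr = covariance / sqrt(var_obs * var_sim)
    return rmse, bias, corr


# Made-up SWE values in mm: the simulation is 10 mm too high on every day
obs = [0.0, 100.0, 200.0]
sim = [10.0, 110.0, 210.0]
rmse, bias, corr = swe_metrics(obs, sim)
print(f"RMSE: {rmse:.2f} mm, Bias: {bias:.2f} mm, Correlation: {corr:.2f}")
```

A constant offset like this gives a perfect correlation alongside a non-zero bias, which is why the three metrics are worth reading together.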

In [None]:
# Step 11: Visualize Observed vs. Simulated SWE
print("=== Step 11: Comparing Observed vs. Simulated SWE ===")

# 1. Load the observed SWE data
obs_swe_path = Path(config_dict['CONFLUENCE_DATA_DIR']) / f"domain_{config_dict['DOMAIN_NAME']}" / "observations" / "snow" / "swe" / f"{config_dict['DOMAIN_NAME']}_swe_processed.csv"

# Check if observed data exists
if not obs_swe_path.exists():
    print(f"Warning: Observed SWE data not found at {obs_swe_path}")
    print("Checking for alternative locations...")
    # Try to find data in parent directories
    alt_paths = list(Path(config_dict['CONFLUENCE_DATA_DIR']).glob(f"**/observations/snow/swe/*_swe_processed.csv"))
    if alt_paths:
        obs_swe_path = alt_paths[0]
        print(f"Found alternative SWE data at: {obs_swe_path}")
    else:
        print("No observed SWE data found. Only simulated data will be displayed.")

# Load observed SWE data if available
if obs_swe_path.exists():
    print(f"Loading observed SWE data from: {obs_swe_path}")
    obs_swe = pd.read_csv(obs_swe_path, parse_dates=['Date'])
    obs_swe.set_index('Date', inplace=True)
    print(f"Observed data period: {obs_swe.index.min()} to {obs_swe.index.max()}")
    print(f"Observed SWE range: {obs_swe['SWE'].min():.2f} to {obs_swe['SWE'].max():.2f} mm")
else:
    obs_swe = None

# 2. Load the simulated SWE data
sim_path = Path(config_dict['CONFLUENCE_DATA_DIR']) / f"domain_{config_dict['DOMAIN_NAME']}" / "simulations" / config_dict['EXPERIMENT_ID'] / "SUMMA" / f"{config_dict['EXPERIMENT_ID']}_day.nc"

# Check for alternative NetCDF file patterns if not found
if not sim_path.exists():
    print(f"Simulated data not found at {sim_path}")
    print("Checking for alternative NetCDF files...")
    alt_sim_paths = list(Path(config_dict['CONFLUENCE_DATA_DIR']).glob(f"domain_{config_dict['DOMAIN_NAME']}/simulations/{config_dict['EXPERIMENT_ID']}/SUMMA/*.nc"))
    
    if alt_sim_paths:
        sim_path = alt_sim_paths[0]
        print(f"Found alternative simulation data at: {sim_path}")
    else:
        raise FileNotFoundError(f"No simulation results found for experiment {config_dict['EXPERIMENT_ID']}")

# Load simulated data
print(f"Loading simulated data from: {sim_path}")
ds = xr.open_dataset(sim_path)

# Extract scalarSWE for the first HRU (a point-scale run has a single HRU)
# Positional selection avoids assuming that the HRU is labelled 1
sim_swe = ds['scalarSWE'].isel(hru=0).to_dataframe().reset_index()[['time', 'scalarSWE']]
sim_swe.columns = ['Date', 'SWE']
sim_swe.set_index('Date', inplace=True)
print(f"Simulated data period: {sim_swe.index.min()} to {sim_swe.index.max()}")
print(f"Simulated SWE range: {sim_swe['SWE'].min():.2f} to {sim_swe['SWE'].max():.2f} mm")

# 3. Find common date range if observed data exists
if obs_swe is not None:
    # Ensure same frequency for both datasets
    obs_swe = obs_swe.resample('D').mean()  # Daily mean if multiple obs per day
    sim_swe = sim_swe.resample('D').mean()  # Daily mean if sub-daily sim data
    
    # Find common date range
    start_date = max(obs_swe.index.min(), sim_swe.index.min())
    end_date = min(obs_swe.index.max(), sim_swe.index.max())
    
    print(f"\nCommon data period: {start_date} to {end_date}")
    
    # Filter to common period
    obs_period = obs_swe.loc[start_date:end_date]
    sim_period = sim_swe.loc[start_date:end_date]
    
    # Handle different units if necessary (assuming both are in mm)
    # If different units, add conversion here
    
    # Calculate performance metrics
    rmse = np.sqrt(((obs_period['SWE'] - sim_period['SWE']) ** 2).mean())
    bias = (sim_period['SWE'] - obs_period['SWE']).mean()
    corr = obs_period['SWE'].corr(sim_period['SWE'])
    
    print(f"Performance metrics:")
    print(f"  - RMSE: {rmse:.2f} mm")
    print(f"  - Bias: {bias:.2f} mm")
    print(f"  - Correlation: {corr:.2f}")
    
    # 4. Visualize the comparison
    plt.figure(figsize=(12, 6))
    
    # Plot both time series
    plt.plot(obs_period.index, obs_period['SWE'], 'o-', label='Observed SWE', color='black', alpha=0.7, markersize=4)
    plt.plot(sim_period.index, sim_period['SWE'], '-', label='Simulated SWE', color='blue', linewidth=2)
        
    # Styling
    plt.title(f"SWE Comparison at {config_dict['DOMAIN_NAME'].replace('_', ' ').title()}", fontsize=14)
    plt.xlabel('Date', fontsize=12)
    plt.ylabel('Snow Water Equivalent (mm)', fontsize=12)
    plt.grid(True, alpha=0.3)
    plt.legend(fontsize=12)
    
    # Add annotation with metrics
    plt.text(0.02, 0.95, f"RMSE: {rmse:.2f} mm\nBias: {bias:.2f} mm\nCorr: {corr:.2f}", 
             transform=plt.gca().transAxes, fontsize=12, 
             bbox=dict(facecolor='white', alpha=0.8, boxstyle='round,pad=0.5'))
    
    plt.tight_layout()
    plt.show()
    
    # 5. Scatter plot
    plt.figure(figsize=(8, 8))
    plt.scatter(obs_period['SWE'], sim_period['SWE'], color='blue', alpha=0.7)
    
    # Add 1:1 line
    max_val = max(obs_period['SWE'].max(), sim_period['SWE'].max())
    plt.plot([0, max_val], [0, max_val], 'k--', label='1:1 line')
    
    # Styling
    plt.title(f"Observed vs. Simulated SWE", fontsize=14)
    plt.xlabel('Observed SWE (mm)', fontsize=12)
    plt.ylabel('Simulated SWE (mm)', fontsize=12)
    plt.grid(True, alpha=0.3)
    plt.legend(fontsize=12)
    plt.axis('equal')
    
    # Add annotation with metrics
    plt.text(0.02, 0.95, f"RMSE: {rmse:.2f} mm\nBias: {bias:.2f} mm\nCorr: {corr:.2f}", 
             transform=plt.gca().transAxes, fontsize=12, 
             bbox=dict(facecolor='white', alpha=0.8, boxstyle='round,pad=0.5'))
    
    plt.tight_layout()
    plt.show()

else:
    # If no observed data, just plot simulated
    plt.figure(figsize=(12, 6))
    plt.plot(sim_swe.index, sim_swe['SWE'], '-', label='Simulated SWE', color='blue', linewidth=2)
    plt.title(f"Simulated SWE at {config_dict['DOMAIN_NAME'].replace('_', ' ').title()}", fontsize=14)
    plt.xlabel('Date', fontsize=12)
    plt.ylabel('Snow Water Equivalent (mm)', fontsize=12)
    plt.grid(True, alpha=0.3)
    plt.legend(fontsize=12)
    plt.tight_layout()
    plt.show()

# Close the dataset
ds.close()

print("\nSWE visualization complete")

## Summary: Point-Scale Modeling Insights

### Next Steps:

1. **Scale up to lumped basin**: Use calibrated parameters
2. **Compare multiple sites**: Test model transferability
