# SYMFLUENCE Tutorial 1a — Point-Scale Workflow (Paradise SNOTEL)

## Introduction

This notebook demonstrates the point-scale modeling workflow in **SYMFLUENCE**, a framework for reproducible and modular computational hydrology. At the point scale, we simulate vertical energy and water fluxes at a single site, independent of routing or lateral flow, to isolate and evaluate model process representations.

Here, we focus on the **Paradise SNOTEL station (ID 602)**, located at 1,630 m elevation in Washington’s Cascade Range. This site represents a transitional snow climate and provides long-term observations of snow water equivalent (SWE) and soil moisture across multiple depths. By reproducing the observed seasonal snow and soil moisture dynamics, this tutorial demonstrates how SYMFLUENCE structures a controlled, transparent, and fully reproducible point-scale experiment.

Through this example, you will see how configuration-driven workflows manage experiment setup, geospatial definition, input data preprocessing, model instantiation, and performance evaluation—building a foundation for more complex distributed modeling studies later in the series.

### What you will learn

1. **Typed configuration** — Create a validated config using `SymfluenceConfig.from_minimal()`
2. **Point-scale domain** — Define a single-GRU domain around a SNOTEL station
3. **Input preprocessing** — Acquire forcings, observations, and run model-agnostic preprocessing
4. **Model execution** — Run SUMMA for a point-scale simulation
5. **Calibration** — Calibrate snow and soil parameters against observed SWE using Differential Evolution (DE)

In [None]:
# Environment setup — HDF5 locking workaround and verification
#
# The HDF5_USE_FILE_LOCKING='FALSE' setting prevents file-locking errors
# that can occur on certain filesystems (NFS, FUSE, conda-managed HDF stacks).

import os
import sys
import warnings
from pathlib import Path

os.environ['HDF5_USE_FILE_LOCKING'] = 'FALSE'

warnings.filterwarnings('ignore', message='.*is an EXPERIMENTAL module.*')
warnings.filterwarnings('ignore', message='.*import failed.*')

print(f"Python executable: {sys.executable}")

try:
    import symfluence
    print(f"SYMFLUENCE version: {symfluence.__version__}")
    print(f"SYMFLUENCE location: {Path(symfluence.__file__).parent}")
except ImportError:
    print("ERROR: SYMFLUENCE not found. Please activate the symfluence environment.")
    sys.exit(1)

# Step 1 — Configuration

We create a typed, validated configuration using `SymfluenceConfig.from_minimal()`. This replaces the legacy YAML-template workflow with a single factory call that provides:

- **Type safety** — Pydantic validates all fields at creation time
- **Smart defaults** — Model-specific defaults are applied automatically
- **Immutability** — Configuration is frozen after creation (thread-safe, reproducible)
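
The immutability guarantee can be illustrated without SYMFLUENCE itself. The sketch below uses a standard-library frozen dataclass as a stand-in; `MiniConfig` and its fields are hypothetical and are not part of the SYMFLUENCE API, which is described above as Pydantic-based. It only shows the general pattern: in-place changes are rejected, and a changed setting means building a new object.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class MiniConfig:
    """Hypothetical stand-in for a frozen configuration object."""
    domain_name: str
    iterations: int = 20


cfg = MiniConfig(domain_name="paradise")

# In-place mutation is rejected because the instance is frozen.
try:
    cfg.iterations = 50
    mutated = True
except FrozenInstanceError:
    mutated = False

# Changing a setting means building a new object; the original is untouched.
cfg_more_iters = replace(cfg, iterations=50)

print(mutated, cfg.iterations, cfg_more_iters.iterations)
```

In the tutorial itself, the equivalent of `replace` is calling `SymfluenceConfig.from_minimal()` again with the new values and re-initializing SYMFLUENCE.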

> **Note:** All configuration is frozen once created. If you need to change a setting, you must re-create the config and re-initialize SYMFLUENCE.

In [None]:
# Step 1a — Create typed configuration for Paradise SNOTEL

from pathlib import Path
from symfluence.core.config.models import SymfluenceConfig

# Resolve code and data directories
# Convention: SYMFLUENCE_data sits alongside the SYMFLUENCE repo directory
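# In a notebook __file__ is undefined, so start from the current working directory
start_dir = Path(__file__).resolve().parent.parent.parent if '__file__' in globals() else Path.cwd().resolve()

# Walk up until we find the repo root (contains src/symfluence)
repo_root = start_dir
while not (repo_root / 'src' / 'symfluence').exists():
    if repo_root == repo_root.parent:
        # Reached the filesystem root without a match; fall back to the start
        print("WARNING: src/symfluence not found in any parent directory; using the starting directory")
        repo_root = start_dir
        break
    repo_root = repo_root.parent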

data_dir = repo_root.parent / 'SYMFLUENCE_data'
if not data_dir.exists():
    # Fallback: check environment variable or use default
    import os
    data_dir = Path(os.getenv('SYMFLUENCE_DATA_DIR', str(repo_root / 'data')))

print(f"Repo root: {repo_root}")
print(f"Data dir:  {data_dir}")

config = SymfluenceConfig.from_minimal(
    # === Identification ===
    domain_name='paradise',
    experiment_id='run_1',

    # === Paths ===
    SYMFLUENCE_DATA_DIR=str(data_dir),
    SYMFLUENCE_CODE_DIR=str(repo_root),

    # === Model & Forcing ===
    model='SUMMA',
    forcing_dataset='ERA5',

    # === Spatial Domain (point-scale) ===
    definition_method='point',
    discretization='GRUs',
    pour_point_coords='46.78/-121.75',
    bounding_box_coords='46.781/-121.751/46.779/-121.749',

    # === Temporal Extent ===
    time_start='2000-01-01 01:00',
    time_end='2002-12-31 23:00',
    spinup_period='2000-01-01, 2000-09-30',
    calibration_period='2000-10-01, 2001-09-30',
    evaluation_period='2001-10-01, 2002-09-30',

    # === Observations ===
    DOWNLOAD_SNOTEL=True,
    SNOTEL_STATION='602',                       # Paradise SNOTEL (ID 602)

    # === Calibration (Differential Evolution) ===
    # DE is a generation-based algorithm that evaluates a full population each
    # iteration, making it robust to occasional model failures from poor
    # parameter combinations — unlike DDS which starts from a single point.
    OPTIMIZATION_METHODS=['iteration'],
    optimization_algorithm='DE',
    optimization_metric='KGE',
    optimization_target='swe',
    calibration_timestep='daily',
    iterations=20,
    POPULATION_SIZE=10,
    PARAMS_TO_CALIBRATE='tempCritRain,tempRangeTimestep,frozenPrecipMultip,albedoMax,albedoMinWinter,albedoDecayRate,k_soil,vGn_n,theta_sat',
)

print(f"\nDomain:      {config.domain.name}")
print(f"Model:       {config.model.hydrological_model}")
print(f"Forcing:     {config.forcing.dataset}")
print(f"Period:      {config.domain.time_start} to {config.domain.time_end}")
print(f"Algorithm:   {config.optimization.algorithm}")
print(f"Metric:      {config.optimization.metric}")
print(f"Data dir:    {config.system.data_dir}")

### Step 1b — Download Example Data (Optional)

You can download pre-processed example data from GitHub releases. This step downloads and extracts the example data to your `SYMFLUENCE_DATA_DIR`.

If you already have the example data, skip this step.

### Step 1c — Initialize SYMFLUENCE & Project Setup

Initialize the framework and create the standardized project directory. The config object is passed directly (no intermediate YAML file needed).

In [None]:
# Step 1c — Initialize SYMFLUENCE and create project structure

from symfluence import SYMFLUENCE

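# Note: this rebinds the name `symfluence` from the imported package (setup cell)
# to the framework instance used throughout the rest of the notebook.
symfluence = SYMFLUENCE(config, visualize=True)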

# Create standardized project layout and pour-point feature
project_dir = symfluence.managers['project'].setup_project()
pour_point_path = symfluence.managers['project'].create_pour_point()

print(f"Project root: {project_dir}")
print(f"Pour point:   {pour_point_path}")

# Brief top-level directory preview
print("\nTop-level structure:")
for p in sorted(Path(project_dir).iterdir()):
    if p.is_dir():
        print(f"├── {p.name}")

# Step 2 — Domain Definition (point-scale GRU)

For the Paradise SNOTEL example, the domain is a **single GRU** representing the site footprint.  
This keeps the workflow strictly point-scale (no routing) and aligns the geometry with the pour point created in Step 1c.

### Step 2a — Geospatial Attribute Acquisition

Acquire site attributes (elevation, land cover, soils, etc.). These are model-agnostic inputs used to parameterize vertical energy and water balance at the site.

- Uncomment the acquisition line below to download data (set `DATA_ACCESS` to `'cloud'` or `'maf'` in config)
- Alternatively, copy pre-downloaded attributes, forcing, and observation directories into the domain directory

In [None]:
# Step 2a — Acquire attributes (model-agnostic)
# Uncomment to acquire data (set DATA_ACCESS to 'cloud' or 'maf' in config)
# symfluence.managers['data'].acquire_attributes()
print("Attribute acquisition step finished (no download occurs while the call above is commented out)")

### Step 2b — Domain Definition (point-scale)

With attributes prepared, we define a point-scale domain consistent with the pour point.  
For this example, the domain is a minimal footprint around the Paradise SNOTEL site.
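
A quick sanity check on the coordinates from Step 1a is to confirm that the pour point lies inside the bounding box. The sketch below parses the two strings with plain Python. The helper names are hypothetical, and the `lat_max/lon_min/lat_min/lon_max` ordering is an assumption inferred from the values used in this tutorial, not a documented SYMFLUENCE contract.

```python
def parse_point(text):
    """Parse 'lat/lon' into a (lat, lon) tuple of floats."""
    lat, lon = (float(v) for v in text.split("/"))
    return lat, lon


def parse_bbox(text):
    """Parse 'lat_max/lon_min/lat_min/lon_max' (assumed ordering)."""
    lat_max, lon_min, lat_min, lon_max = (float(v) for v in text.split("/"))
    return lat_max, lon_min, lat_min, lon_max


def point_in_bbox(point, bbox):
    """Return True if the point falls within the box, edges included."""
    lat, lon = point
    lat_max, lon_min, lat_min, lon_max = bbox
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max


# Values copied from the configuration in Step 1a
pour_point = parse_point("46.78/-121.75")
bbox = parse_bbox("46.781/-121.751/46.779/-121.749")
inside = point_in_bbox(pour_point, bbox)
print(f"Pour point inside bounding box: {inside}")
```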

In [None]:
# Step 2b — Define the point-scale domain
watershed_path = symfluence.managers['domain'].define_domain()
print("Domain definition complete")
print(f"Domain file: {watershed_path}")

### Step 2c — Discretization

Discretization writes the **catchment HRU shapefile** and related artifacts required by downstream steps.  
For the point-scale case we set `discretization='GRUs'`, which creates a **single HRU** identical to the GRU while still generating the standardized outputs.

In [None]:
# Step 2c — Discretization (GRUs → HRUs 1:1)
hru_path = symfluence.managers['domain'].discretize_domain()
print("Domain discretization complete")
print(f"HRU file: {hru_path}")

### Step 2d — Verification & Visualization

Plot the GRU–HRU overlay to confirm that discretization produced the expected geometry.

In [None]:
# Step 2d — Verify domain outputs and visualize
from IPython.display import Image, display

plot_path = symfluence.managers['domain'].visualize_domain()

if plot_path:
    print(f"Domain plot saved to: {plot_path}")
    display(Image(filename=str(plot_path)))
else:
    print("No domain plot was produced")

# Step 3 — Input Preprocessing (model-agnostic)

We prepare inputs in three steps:
1. Acquire **meteorological forcings** (ERA5)
2. Process **observations** (SNOTEL SWE and soil moisture)
3. Run **model-agnostic preprocessing** to standardize time steps, variables, and units

### Step 3a — Acquire Meteorological Forcings (ERA5)

Downloads/subsets the forcings for the Paradise domain.

In [None]:
# Step 3a — Forcings
# Uncomment to acquire data (set DATA_ACCESS to 'cloud' or 'maf' in config)
# symfluence.managers['data'].acquire_forcings()
print("Forcing acquisition step finished (no download occurs while the call above is commented out)")

### Step 3b — Process Observations (SNOTEL)

Parses site observations (SWE, soil moisture), applies basic QA/QC, and stores standardized outputs.
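
To make the QA/QC idea concrete, here is a minimal sketch of one plausible cleaning pass. It assumes SWE arrives in inches and that gaps and negative readings should be dropped; both the rules and the `clean_swe` helper are hypothetical and do not describe what `process_observed_data()` actually does. The sample values are made up.

```python
INCH_TO_MM = 25.4


def clean_swe(records):
    """Convert SWE from inches to mm and drop missing or negative values.

    `records` is a list of (date_string, swe_inches_or_None) tuples.
    """
    cleaned = []
    for date, swe_in in records:
        if swe_in is None or swe_in < 0:
            continue  # hypothetical QA rule: skip gaps and negative readings
        cleaned.append((date, round(swe_in * INCH_TO_MM, 1)))
    return cleaned


# Made-up sample records for illustration only
raw = [
    ("2001-03-01", 40.0),
    ("2001-03-02", None),   # missing reading
    ("2001-03-03", -0.1),   # physically impossible value
    ("2001-03-04", 41.5),
]
swe_mm = clean_swe(raw)
print(swe_mm)
```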

In [None]:
# Step 3b — Observations
# Uncomment to download and process observations
# symfluence.managers['data'].process_observed_data()
print("Observation processing step finished (nothing is processed while the call above is commented out)")

### Step 3c — Model-Agnostic Preprocessing

Standardizes variable names, units, and time steps so multiple models can consume the same inputs consistently.
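
As an illustration of what unit standardization involves, the sketch below converts an hourly precipitation depth in metres (the form in which ERA5 reports total precipitation) into a mass flux in kg m-2 s-1 (the form SUMMA reads as a precipitation rate). The function names are hypothetical and the sample depths are made up; this is not SYMFLUENCE's implementation.

```python
WATER_DENSITY = 1000.0      # kg m-3
SECONDS_PER_HOUR = 3600.0


def precip_depth_to_rate(depth_m, step_seconds=SECONDS_PER_HOUR):
    """Convert an accumulated precipitation depth (m) to a mass flux (kg m-2 s-1)."""
    return depth_m * WATER_DENSITY / step_seconds


def celsius_to_kelvin(temp_c):
    """Convert air temperature from degrees Celsius to kelvin."""
    return temp_c + 273.15


# Made-up hourly accumulations in metres of water
hourly_depth_m = [0.0, 0.0018, 0.0036]
rates = [precip_depth_to_rate(d) for d in hourly_depth_m]
print(rates)
```

A depth of 3.6 mm in one hour corresponds to a flux of about 0.001 kg m-2 s-1, which is a handy order-of-magnitude check when inspecting preprocessed forcing files.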

In [None]:
# Step 3c — Model-agnostic preprocessing
symfluence.managers['data'].run_model_agnostic_preprocessing()
print("Model-agnostic preprocessing complete")

### Step 3d — Verification

Confirm the expected folders exist and contain files.

In [None]:
# Step 3d — Verify preprocessing outputs

from pathlib import Path

# Path(...) is a no-op if data_dir is already a Path and guards against a plain string
domain_dir = Path(config.system.data_dir) / f"domain_{config.domain.name}"

# Labels mirror the directory paths relative to the domain directory
targets = {
    "forcing/raw_data":                          domain_dir / "forcing" / "raw_data",
    "forcing/basin_averaged_data":               domain_dir / "forcing" / "basin_averaged_data",
    "observations/snow/swe/raw":                 domain_dir / "observations" / "snow" / "swe" / "raw",
    "observations/snow/swe/processed":           domain_dir / "observations" / "snow" / "swe" / "processed",
    "observations/soil_moisture/ismn/raw":       domain_dir / "observations" / "soil_moisture" / "ismn" / "raw",
    "observations/soil_moisture/ismn/processed": domain_dir / "observations" / "soil_moisture" / "ismn" / "processed",
}

def count_files(p: Path) -> int:
    return sum(1 for x in p.iterdir() if x.is_file()) if p.exists() else 0

for label, path in targets.items():
    exists = path.exists()
    n = count_files(path)
    status = "OK" if exists and n > 0 else ("empty" if exists else "missing")
    suffix = f"({n} files)" if exists else ""
    print(f"[{status:>7}] {label}  {suffix}")

# Step 4 — Model-Specific Preprocessing & Run (SUMMA)

We now convert the model-agnostic inputs into **SUMMA-ready inputs**, then run the model for the Paradise point-scale case.

### Step 4a — SUMMA-Specific Preprocessing

Creates the SUMMA input bundle (metadata, parameter tables, forcing links) from the standardized inputs.

In [None]:
# Step 4a — SUMMA-specific preprocessing
symfluence.managers['model'].preprocess_models()
print("Model-specific preprocessing complete")

### Step 4b — Run the Model

Executes the point-scale SUMMA simulation.

In [None]:
# Step 4b — Run SUMMA
print(f"Running {config.model.hydrological_model} for point-scale simulation...")
symfluence.managers['model'].run_models()
print("Point-scale model run complete")

### Step 4c — Verification

Print where SUMMA inputs and run outputs were written.

In [None]:
# Step 4c — Verify SUMMA outputs

# Path(...) is a no-op if data_dir is already a Path and guards against a plain string
domain_dir = Path(config.system.data_dir) / f"domain_{config.domain.name}"
summa_in   = domain_dir / "forcing" / "SUMMA_input"
results    = domain_dir / "simulations" / config.domain.experiment_id / "SUMMA"

print(f"SUMMA input dir: {summa_in if summa_in.exists() else '(not found)'}")
print(f"Results dir:     {results if results.exists() else '(not found)'}")

### Step 4d — SWE Evaluation (uncalibrated)

Visualize the uncalibrated SWE simulation against SNOTEL observations.
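
Comparing a sub-daily simulation with daily observations requires bringing both series to the same time step; the configuration in Step 1a sets `calibration_timestep='daily'`. The sketch below shows one simple way to do that with the standard library, averaging hourly values per calendar day. The `daily_mean` helper and the sample values are hypothetical, and the aggregation rule SYMFLUENCE applies internally may differ.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_mean(timestamps, values):
    """Average sub-daily values into one value per calendar day."""
    buckets = defaultdict(list)
    for ts, value in zip(timestamps, values):
        buckets[ts.date()].append(value)
    return {day: sum(vals) / len(vals) for day, vals in sorted(buckets.items())}


# Two days of made-up hourly SWE values in mm
start = datetime(2001, 3, 1, 0)
hours = [start + timedelta(hours=h) for h in range(48)]
swe_hourly = [1000.0 + h for h in range(48)]

swe_daily = daily_mean(hours, swe_hourly)
for day, value in swe_daily.items():
    print(day, value)
```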

In [None]:
# Step 4d — SWE evaluation (uncalibrated)
from IPython.display import Image, display

plot_paths = symfluence.managers['reporting'].visualize_summa_outputs(
    experiment_id=config.domain.experiment_id
) or {}  # treat a missing return value as "no plots"

if 'scalarSWE' in plot_paths:
    swe_plot = Path(plot_paths['scalarSWE'])
    print(f"SWE evaluation plot: {swe_plot}")
    display(Image(filename=str(swe_plot)))
else:
    print("scalarSWE plot not found. Available plots:")
    for var, path in plot_paths.items():
        print(f"  - {var}: {path}")

print("\nUncalibrated SWE evaluation complete")

# Step 5 — Calibration (Differential Evolution)

We calibrate SUMMA snow and soil parameters against SNOTEL SWE observations using **Differential Evolution (DE)**.
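
The configuration in Step 1a sets `optimization_metric='KGE'`. For reference, the sketch below computes the Kling-Gupta efficiency in its original three-component form (correlation, variability ratio, bias ratio). The `kge` function is a stand-alone illustration with made-up sample values; the exact KGE variant that SYMFLUENCE evaluates is not shown in this notebook, so treat the formula choice as an assumption.

```python
import math


def kge(sim, obs):
    """Kling-Gupta efficiency for paired simulated and observed series."""
    n = len(obs)
    if n != len(sim) or n < 2:
        raise ValueError("sim and obs must have equal length >= 2")
    mu_s = sum(sim) / n
    mu_o = sum(obs) / n
    sd_s = math.sqrt(sum((s - mu_s) ** 2 for s in sim) / n)
    sd_o = math.sqrt(sum((o - mu_o) ** 2 for o in obs) / n)
    if sd_s == 0 or sd_o == 0 or mu_o == 0:
        return float("nan")  # components undefined for constant or zero-mean series
    cov = sum((s - mu_s) * (o - mu_o) for s, o in zip(sim, obs)) / n
    r = cov / (sd_s * sd_o)   # linear correlation
    alpha = sd_s / sd_o       # variability ratio
    beta = mu_s / mu_o        # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Made-up SWE values in mm
obs_swe = [0.0, 120.0, 480.0, 900.0, 650.0, 100.0]
half_swe = [0.5 * v for v in obs_swe]

print(kge(obs_swe, obs_swe))
print(kge(half_swe, obs_swe))
```

A perfect match scores 1. A simulation that captures the timing but only half the magnitude keeps a correlation of 1 yet is penalized through both the variability and bias terms, which is why KGE suits a target like peak SWE.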

### Why DE instead of DDS?

- **DDS** (Dynamically Dimensioned Search) starts from a single initial point. If that point produces NaN values or model failures, the entire calibration can fail immediately.
- **DE** is a generation-based algorithm that evaluates an entire population each iteration. It can tolerate individual model failures within a generation and still make progress, provided some population members run successfully.
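
The failure tolerance described above can be seen in a small stand-alone sketch of a basic DE scheme (DE/rand/1/bin). This is not SYMFLUENCE's optimizer: the function, its defaults, and the toy objective are hypothetical, and mapping `iterations` to generations and `POPULATION_SIZE` to `pop_size` is an assumption. Failed evaluations return NaN and are scored as infinitely bad, so they never replace a valid member.

```python
import math
import random


def differential_evolution(objective, bounds, pop_size=10, generations=20,
                           weight=0.8, crossover=0.9, seed=0):
    """Minimise `objective` with a basic DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)

    def score(x):
        value = objective(x)
        # A failed run (NaN or inf) is treated as the worst possible score
        return value if math.isfinite(value) else math.inf

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [score(x) for x in pop]
    history = [min(fit)]

    for _ in range(generations):
        next_pop, next_fit = list(pop), list(fit)
        for i in range(pop_size):
            # Pick three distinct members other than the current target
            others = [j for j in range(pop_size) if j != i]
            a, b, c = rng.sample(others, 3)
            forced = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for k, (lo, hi) in enumerate(bounds):
                if k == forced or rng.random() < crossover:
                    gene = pop[a][k] + weight * (pop[b][k] - pop[c][k])
                    gene = min(max(gene, lo), hi)  # clip to parameter bounds
                else:
                    gene = pop[i][k]
                trial.append(gene)
            trial_fit = score(trial)
            if trial_fit <= fit[i]:  # greedy one-to-one selection
                next_pop[i], next_fit[i] = trial, trial_fit
        pop, fit = next_pop, next_fit
        history.append(min(fit))

    best = min(range(pop_size), key=lambda i: fit[i])
    return pop[best], fit[best], history


def toy_objective(x):
    """Hypothetical stand-in for a model run that fails in part of the space."""
    if x[0] > 1.5:
        return float("nan")
    return (x[0] - 1.0) ** 2 + (x[1] + 0.5) ** 2


best_x, best_f, history = differential_evolution(toy_objective, [(-2, 2), (-2, 2)])
print(best_x, best_f)
```

Because selection is one-to-one and greedy, the best score in `history` never gets worse from one generation to the next, even when some trial vectors fall in the failing region.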

### Calibration parameters

| Parameter | Description |
|-----------|-------------|
| `tempCritRain` | Temperature threshold for rain vs snow |
| `tempRangeTimestep` | Temperature range for mixed precipitation |
| `frozenPrecipMultip` | Frozen precipitation undercatch correction |
| `albedoMax` | Maximum snow albedo |
| `albedoMinWinter` | Minimum winter snow albedo |
| `albedoDecayRate` | Rate of snow albedo decay |
| `k_soil` | Saturated hydraulic conductivity of soil |
| `vGn_n` | van Genuchten n parameter (pore-size distribution) |
| `theta_sat` | Saturated volumetric water content (soil porosity) |

> **Note:** The configuration was set in Step 1a. If you need to change calibration settings, you must re-create the config and re-initialize SYMFLUENCE (all configs are frozen after creation).

In [None]:
# Step 5a — Run calibration (DE + KGE)

print(f"Algorithm:          {config.optimization.algorithm}")
print(f"Metric:             {config.optimization.metric}")
print(f"Target:             {config.optimization.target}")
print(f"Iterations:         {config.optimization.iterations}")
print(f"Population size:    {config.optimization.population_size}")
print(f"Calibration period: {config.domain.calibration_period}")
print()

results_file = symfluence.managers['optimization'].calibrate_model()
print(f"\nCalibration results file: {results_file}")

In [None]:
# Step 5b — Post-calibration visualization

from IPython.display import Image, display

plot_paths = symfluence.managers['reporting'].visualize_calibration_results(
    experiment_id=config.domain.experiment_id
) or {}  # treat a missing return value as "no plots"

for plot_name, plot_path in plot_paths.items():
    print(f"\n{plot_name}:")
    display(Image(filename=str(plot_path)))

print("\nPost-calibration visualization complete")