# SYMFLUENCE Tutorial 1b — Point-Scale Workflow (FLUXNET CA-NS7)

## Introduction

This notebook mirrors the configuration-first style established in **Tutorial 01a** and adapts it for **energy-balance validation at a FLUXNET tower (CA-NS7)**. We simulate point-scale land–atmosphere exchanges and evaluate **latent heat flux (LE, the energy equivalent of evapotranspiration)** and **sensible heat flux (H)** against FLUXNET observations.

The workflow is strictly configuration-driven and fully reproducible:
1. Create a typed config with `SymfluenceConfig.from_minimal()`
2. Initialize SYMFLUENCE and standard project layout
3. Define the point-scale domain
4. Acquire & preprocess inputs
5. Run **SUMMA**
6. Evaluate and calibrate energy fluxes

### What you will learn

1. **Energy-balance evaluation** — Compare simulated LE/H against FLUXNET tower data
2. **ET-focused calibration** — Calibrate vegetation and canopy parameters using DE
3. **FLUXNET integration** — Use FLUXNET tower data as the observation source

In [None]:
# Environment setup — HDF5 locking workaround and verification
#
# The HDF5_USE_FILE_LOCKING='FALSE' setting prevents file-locking errors
# that can occur on certain filesystems (NFS, FUSE, conda-managed HDF stacks).

import os
import sys
import warnings
from pathlib import Path

os.environ['HDF5_USE_FILE_LOCKING'] = 'FALSE'

warnings.filterwarnings('ignore', message='.*is an EXPERIMENTAL module.*')
warnings.filterwarnings('ignore', message='.*import failed.*')

print(f"Python executable: {sys.executable}")

try:
    import symfluence
    print(f"SYMFLUENCE version: {symfluence.__version__}")
    print(f"SYMFLUENCE location: {Path(symfluence.__file__).parent}")
except ImportError:
    print("ERROR: SYMFLUENCE not found. Please activate the symfluence environment.")
    sys.exit(1)

# Step 1 — Configuration

We create a typed, validated configuration for the **CA-NS7** FLUXNET site using `SymfluenceConfig.from_minimal()`. This provides Pydantic-validated fields, model defaults, and an immutable (frozen) config object.
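
To make the frozen behaviour concrete, here is a minimal sketch that uses a standard-library frozen dataclass as a stand-in. `ToyConfig` is hypothetical and is not the SYMFLUENCE class; the real config is Pydantic-based and may raise a different exception type. The pattern is the same: assignment is rejected, and a changed setting means building a new object.

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class ToyConfig:
    """Hypothetical stand-in for a frozen config (not the SYMFLUENCE class)."""
    algorithm: str = "DE"
    iterations: int = 20


toy = ToyConfig()

# In-place mutation is rejected on a frozen object.
try:
    toy.algorithm = "DDS"
    mutated = True
except FrozenInstanceError:
    mutated = False

# The supported route is to build a new object with the changed field.
toy_dds = replace(toy, algorithm="DDS")
print(mutated, toy.algorithm, toy_dds.algorithm)
```

In the tutorial itself, the equivalent of `replace` is calling `SymfluenceConfig.from_minimal()` again with the new value and re-initializing SYMFLUENCE.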

> **Note:** All configuration is frozen once created. If you need to change a setting (e.g., switch the calibration algorithm), you must re-create the config and re-initialize SYMFLUENCE.

In [None]:
# Step 1a — Create typed configuration for CA-NS7 FLUXNET

from pathlib import Path
from symfluence.core.config.models import SymfluenceConfig

# Resolve code and data directories
# Convention: SYMFLUENCE_data sits alongside the SYMFLUENCE repo directory
repo_root = Path(__file__).resolve().parent.parent.parent if '__file__' in dir() else Path.cwd().resolve()
# Walk up until we find the repo root (contains src/symfluence)
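while repo_root != repo_root.parent:
    if (repo_root / 'src' / 'symfluence').exists():
        break
    repo_root = repo_root.parent
else:
    # Reached the filesystem root without finding src/symfluence
    print("WARNING: could not locate src/symfluence; repo_root may be wrong.")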

data_dir = repo_root.parent / 'SYMFLUENCE_data'
if not data_dir.exists():
    # Fallback: check environment variable or use default
    import os
    data_dir = Path(os.getenv('SYMFLUENCE_DATA_DIR', str(repo_root / 'data')))

print(f"Repo root: {repo_root}")
print(f"Data dir:  {data_dir}")

config = SymfluenceConfig.from_minimal(
    # === Identification ===
    domain_name='CA-NS7',
    experiment_id='run_fluxnet_1',

    # === Paths ===
    SYMFLUENCE_DATA_DIR=str(data_dir),
    SYMFLUENCE_CODE_DIR=str(repo_root),

    # === Model & Forcing ===
    model='SUMMA',
    forcing_dataset='ERA5',

    # === Spatial Domain (point-scale) ===
    definition_method='point',
    discretization='GRUs',                          # 1 GRU => 1 HRU
    pour_point_coords='56.6358/-99.9483',           # CA-NS7 coordinates
    bounding_box_coords='56.6858/-99.9983/56.5858/-99.8983',

    # === Temporal Extent ===
    time_start='2001-01-01 01:00',
    time_end='2005-12-31 23:00',
    spinup_period='2001-01-01, 2002-09-30',
    calibration_period='2002-10-01, 2003-09-30',
    evaluation_period='2003-10-01, 2004-09-30',

    # === Observations ===
    DOWNLOAD_FLUXNET=True,
    FLUXNET_STATION='CA-NS7',
    ET_OBS_SOURCE='fluxnet',                        # Use FLUXNET tower data (not MODIS MOD16)

    # === Calibration (Differential Evolution) ===
    # DE is a generation-based algorithm that evaluates a full population each
    # iteration, making it robust to occasional model failures from poor
    # parameter combinations.
    OPTIMIZATION_METHODS=['iteration'],
    optimization_algorithm='DE',
    optimization_metric='KGE',
    optimization_target='et',
    calibration_timestep='daily',
    iterations=20,
    POPULATION_SIZE=10,
    PARAMS_TO_CALIBRATE='minStomatalResistance,cond2photo_slope,vcmax25_canopyTop,jmax25_scale,summerLAI,rootingDepth,z0Canopy,windReductionParam',
)

print(f"\nDomain:      {config.domain.name}")
print(f"Model:       {config.model.hydrological_model}")
print(f"Forcing:     {config.forcing.dataset}")
print(f"Period:      {config.domain.time_start} to {config.domain.time_end}")
print(f"Algorithm:   {config.optimization.algorithm}")
print(f"Metric:      {config.optimization.metric}")
print(f"Target:      {config.optimization.target}")
print(f"Data dir:    {config.system.data_dir}")
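
Before moving on, it can help to sanity-check the period strings passed above. The sketch below is not part of SYMFLUENCE: `parse_period` and `check_periods` are hypothetical helpers, and they assume the `'YYYY-MM-DD, YYYY-MM-DD'` format used in this config and that the periods are listed in chronological order.

```python
from datetime import datetime


def parse_period(text):
    """Split 'YYYY-MM-DD, YYYY-MM-DD' into a (start, end) pair of dates."""
    start, end = (
        datetime.strptime(part.strip(), "%Y-%m-%d").date()
        for part in text.split(",")
    )
    return start, end


def check_periods(time_start, time_end, periods):
    """Return a list of problems; an empty list means the periods are consistent."""
    run_start = datetime.strptime(time_start, "%Y-%m-%d %H:%M").date()
    run_end = datetime.strptime(time_end, "%Y-%m-%d %H:%M").date()
    problems = []
    previous_end = None
    for name, text in periods.items():
        start, end = parse_period(text)
        if start >= end:
            problems.append(f"{name}: start is not before end")
        if start < run_start or end > run_end:
            problems.append(f"{name}: outside the simulation window")
        if previous_end is not None and start <= previous_end:
            problems.append(f"{name}: overlaps the preceding period")
        previous_end = end
    return problems


problems = check_periods(
    "2001-01-01 01:00",
    "2005-12-31 23:00",
    {
        "spinup": "2001-01-01, 2002-09-30",
        "calibration": "2002-10-01, 2003-09-30",
        "evaluation": "2003-10-01, 2004-09-30",
    },
)
print(problems or "periods are consistent")
```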

## Step 1b — Initialize SYMFLUENCE & Project Setup

Initialize the framework and create the standardized project directory. The config object is passed directly.

In [None]:
# Step 1b — Initialize SYMFLUENCE and create project structure

from symfluence import SYMFLUENCE

symfluence = SYMFLUENCE(config, visualize=True)

# Create standardized project layout and pour-point feature
project_dir = symfluence.managers['project'].setup_project()
pour_point_path = symfluence.managers['project'].create_pour_point()

print(f"Project root: {project_dir}")
print(f"Pour point:   {pour_point_path}")

# Brief top-level directory preview
print("\nTop-level structure:")
for p in sorted(Path(project_dir).iterdir()):
    if p.is_dir():
        print(f"├── {p.name}")

# Step 2 — Domain Definition (point-scale GRU)

The domain is a **single GRU** around the flux tower footprint, ensuring a strictly point-scale (non-routed) experiment.

### Step 2a — Geospatial Attribute Acquisition

Acquire site attributes (elevation, land cover, soils). These are model-agnostic inputs.

- Uncomment the acquisition line below to download data (set `DATA_ACCESS` to `'cloud'` or `'maf'` in config)
- Alternatively, copy pre-downloaded attributes, forcing, and observation directories into the domain directory

In [None]:
# Step 2a — Acquire attributes (model-agnostic)
# Uncomment to acquire data (set DATA_ACCESS to 'cloud' or 'maf' in config)
# symfluence.managers['data'].acquire_attributes()
print("Attribute acquisition step finished (nothing is downloaded while the line above stays commented out)")

### Step 2b — Domain Definition (point-scale)

Define a minimal footprint around **CA-NS7** consistent with the pour point.
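
The bounding box in Step 1a is the tower location padded by 0.05° on each side. A small helper makes that relationship explicit. `bbox_from_point` is a hypothetical function, and the `lat_max/lon_min/lat_min/lon_max` ordering is inferred from the values used in this tutorial rather than taken from the SYMFLUENCE documentation.

```python
def bbox_from_point(lat, lon, buffer_deg=0.05):
    """Build a 'lat_max/lon_min/lat_min/lon_max' string around a point."""
    return (
        f"{lat + buffer_deg:.4f}/{lon - buffer_deg:.4f}/"
        f"{lat - buffer_deg:.4f}/{lon + buffer_deg:.4f}"
    )


bbox = bbox_from_point(56.6358, -99.9483)  # CA-NS7 coordinates
print(bbox)
```

For the CA-NS7 coordinates this reproduces the `bounding_box_coords` string from the config.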

In [None]:
# Step 2b — Define the point-scale domain
watershed_path = symfluence.managers['domain'].define_domain()
print("Domain definition complete")
print(f"Domain file: {watershed_path}")

### Step 2c — Discretization

Creates the **catchment HRU** artifacts required by downstream steps (1:1 with the GRU for point scale).

In [None]:
# Step 2c — Discretization (GRUs → HRUs 1:1)
hru_path = symfluence.managers['domain'].discretize_domain()
print("Domain discretization complete")
print(f"HRU file: {hru_path}")

### Step 2d — Verification & Visualization

Verify the expected shapefiles and draw the GRU–HRU overlay.
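
A shapefile is only readable when its mandatory companion files are present. The sketch below checks for them. `missing_shapefile_parts` is a hypothetical helper, and it assumes the paths returned by the domain steps point at `.shp` files.

```python
from pathlib import Path

# Mandatory components of an ESRI shapefile
REQUIRED_SUFFIXES = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_path):
    """Return the mandatory suffixes that are absent for a shapefile."""
    shp_path = Path(shp_path)
    return [s for s in REQUIRED_SUFFIXES if not shp_path.with_suffix(s).exists()]
```

In this notebook it could be called as `missing_shapefile_parts(watershed_path)` or `missing_shapefile_parts(hru_path)`; an empty list means all three components were found.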

In [None]:
# Step 2d — Verify domain outputs and visualize
from IPython.display import Image, display

plot_path = symfluence.managers['domain'].visualize_domain()
if plot_path:
    print(f"Domain plot saved to: {plot_path}")
    display(Image(filename=str(plot_path)))
else:
    print("No domain plot was returned; check the output of the domain steps above.")

# Step 3 — Input Preprocessing (model-agnostic)

We prepare inputs in three steps:
1. Acquire **meteorological forcings** (ERA5)
2. Process **FLUXNET observations** (LE, H)
3. Run **model-agnostic preprocessing** to standardize variables and time steps

### Step 3a — Acquire Meteorological Forcings (ERA5)

In [None]:
# Step 3a — Forcings
# Uncomment to acquire data (set DATA_ACCESS to 'cloud' or 'maf' in config)
# symfluence.managers['data'].acquire_forcings()
print("Forcing acquisition step finished (nothing is downloaded while the line above stays commented out)")

### Step 3b — Process Observations (FLUXNET)

Parses FLUXNET tower observations (latent heat, sensible heat), applies QA/QC, and stores standardized outputs.
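
As an illustration of what a QA/QC pass can look like, the sketch below masks missing and low-quality records. It is not the SYMFLUENCE implementation. The column names follow the FLUXNET2015 convention (`LE_F_MDS`, `LE_F_MDS_QC`) and are an assumption about the input file; `filter_flux` and the threshold `MAX_QC` are hypothetical choices.

```python
import csv
import io

MISSING = -9999.0  # FLUXNET missing-value sentinel
MAX_QC = 1         # keep measured (0) and good-quality gap-filled (1) records


def filter_flux(rows, value_col, qc_col):
    """Return values as floats, with missing or low-quality records set to None."""
    cleaned = []
    for row in rows:
        value = float(row[value_col])
        qc = int(float(row[qc_col]))
        ok = value != MISSING and 0 <= qc <= MAX_QC
        cleaned.append(value if ok else None)
    return cleaned


# Four made-up half-hourly records, for illustration only
sample = io.StringIO(
    "TIMESTAMP_START,LE_F_MDS,LE_F_MDS_QC\n"
    "200301010000,12.5,0\n"
    "200301010030,-9999,3\n"
    "200301010100,15.1,2\n"
    "200301010130,14.0,1\n"
)
le = filter_flux(csv.DictReader(sample), "LE_F_MDS", "LE_F_MDS_QC")
print(le)
```

The same function applies to sensible heat by passing `H_F_MDS` and `H_F_MDS_QC`.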

In [None]:
# Step 3b — Observations
# Uncomment to download and process observations
# symfluence.managers['data'].process_observed_data()
print("FLUXNET observation step finished (nothing is processed while the line above stays commented out)")

### Step 3c — Model-Agnostic Preprocessing

Standardizes variable names, units, and time steps so multiple models can consume the same inputs consistently.
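
One example of the kind of unit standardization involved: ERA5 reports hourly precipitation as an accumulated depth in metres, while SUMMA expects a precipitation rate in kg m-2 s-1. The sketch below shows that conversion only. `hourly_depth_to_rate` is a hypothetical helper, and whether SYMFLUENCE performs the conversion exactly this way is an assumption.

```python
SECONDS_PER_HOUR = 3600.0
WATER_DENSITY = 1000.0  # kg m-3


def hourly_depth_to_rate(precip_m):
    """Convert an hourly precipitation depth (m) to a mass flux (kg m-2 s-1)."""
    return precip_m * WATER_DENSITY / SECONDS_PER_HOUR


rate = hourly_depth_to_rate(0.001)  # 1 mm of water falling in one hour
print(f"{rate:.6e} kg m-2 s-1")
```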

In [None]:
# Step 3c — Model-agnostic preprocessing
symfluence.managers['data'].run_model_agnostic_preprocessing()
print("Model-agnostic preprocessing complete")

### Step 3d — Verification

Confirm the expected folders exist and contain files.

In [None]:
# Step 3d — Verify preprocessing outputs

from pathlib import Path

domain_dir = Path(config.system.data_dir) / f"domain_{config.domain.name}"

targets = {
    "forcing/raw_data":            domain_dir / "forcing" / "raw_data",
    "forcing/basin_averaged_data": domain_dir / "forcing" / "basin_averaged_data",
}

def count_files(p: Path) -> int:
    return sum(1 for x in p.iterdir() if x.is_file()) if p.exists() else 0

for label, path in targets.items():
    exists = path.exists()
    n = count_files(path)
    status = "OK" if exists and n > 0 else ("empty" if exists else "missing")
    suffix = f"({n} files)" if exists else ""
    print(f"[{status:>7}] {label}  {suffix}")

# Step 4 — Model-Specific Preprocessing & Run (SUMMA)

### Step 4a — SUMMA-Specific Preprocessing

Creates the SUMMA input bundle (metadata, parameter tables, forcing links) from the standardized inputs.

In [None]:
# Step 4a — SUMMA-specific preprocessing
symfluence.managers['model'].preprocess_models()
print("Model-specific preprocessing complete")

### Step 4b — Run the Model

Executes the point-scale SUMMA simulation.

In [None]:
# Step 4b — Run SUMMA
print(f"Running {config.model.hydrological_model} for point-scale simulation...")
symfluence.managers['model'].run_models()
print("Point-scale model run complete")

### Step 4c — Verification

Print where SUMMA inputs and run outputs were written.

In [None]:
# Step 4c — Verify SUMMA outputs

domain_dir = Path(config.system.data_dir) / f"domain_{config.domain.name}"
summa_in   = domain_dir / "forcing" / "SUMMA_input"
results    = domain_dir / "simulations" / config.domain.experiment_id / "SUMMA"

print(f"SUMMA input dir: {summa_in if summa_in.exists() else '(not found)'}")
print(f"Results dir:     {results if results.exists() else '(not found)'}")

# Step 5 — ET & H Evaluation (FLUXNET vs Simulation)

Compare simulated latent heat flux (LE, the energy form of ET) and sensible heat flux (H) against FLUXNET tower observations.
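
Tower latent heat flux is reported in W m-2, while ET is usually discussed as a water depth per day. The two are related through the latent heat of vaporization. The sketch below shows the conversion; `le_to_et_mm_per_day` is a hypothetical helper, and the constant 2.45e6 J kg-1 is an approximation that varies slightly with temperature.

```python
LATENT_HEAT_VAPORIZATION = 2.45e6  # J kg-1, approximate value near 20 degrees C
SECONDS_PER_DAY = 86400.0


def le_to_et_mm_per_day(le_w_m2, lam=LATENT_HEAT_VAPORIZATION):
    """Convert a daily-mean latent heat flux (W m-2) to ET (mm day-1).

    1 kg m-2 of water equals a depth of 1 mm, so no density term is needed.
    """
    return le_w_m2 * SECONDS_PER_DAY / lam


et = le_to_et_mm_per_day(100.0)
print(f"{et:.2f} mm/day")  # prints "3.53 mm/day"
```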

In [None]:
# Step 5a — Energy flux evaluation (uncalibrated)

from IPython.display import Image, display

plot_paths = symfluence.managers['reporting'].visualize_summa_outputs(
    experiment_id=config.domain.experiment_id
)

flux_vars = ['scalarLatHeatTotal', 'scalarSenHeatTotal']  # latent and sensible heat
found_plots = False

for var in flux_vars:
    if var in plot_paths:
        plot_file = Path(plot_paths[var])
        var_label = "Latent Heat (ET)" if "Lat" in var else "Sensible Heat"
        print(f"\n{var_label} evaluation plot: {plot_file}")
        display(Image(filename=str(plot_file)))
        found_plots = True

if not found_plots:
    print("Energy flux plots not found. Available plots:")
    for var, path in plot_paths.items():
        print(f"  - {var}: {path}")

print("\nUncalibrated energy flux evaluation complete")

# Step 6 — Calibration (Differential Evolution)

We calibrate SUMMA vegetation and canopy parameters against FLUXNET ET observations using **Differential Evolution (DE)**.
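
The objective configured in Step 1a is the Kling-Gupta efficiency (KGE), which combines correlation, variability ratio, and bias ratio into one score with an optimum of 1. The sketch below implements the original 2009 formulation in plain Python. `kge` is a hypothetical helper; whether SYMFLUENCE uses this form or a later variant is not stated in this tutorial.

```python
import math
from statistics import mean, pstdev


def kge(sim, obs):
    """Kling-Gupta efficiency (2009 form); 1.0 is a perfect match."""
    mu_s, mu_o = mean(sim), mean(obs)
    sd_s, sd_o = pstdev(sim), pstdev(obs)
    cov = mean([(s - mu_s) * (o - mu_o) for s, o in zip(sim, obs)])
    r = cov / (sd_s * sd_o)  # linear correlation
    alpha = sd_s / sd_o      # variability ratio
    beta = mu_s / mu_o       # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


obs = [1.0, 2.0, 3.0, 4.0]
print(kge(obs, obs))                    # perfect agreement
print(kge([2.0, 4.0, 6.0, 8.0], obs))   # doubled mean and spread
```

A simulation that is perfectly correlated but twice as large scores 1 - sqrt(2), about -0.41, which shows how strongly KGE penalizes bias and variability errors even when the timing is right.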

### Why DE instead of DDS?

- **DDS** (Dynamically Dimensioned Search) starts from a single initial point, so the run can fail immediately if that point produces NaN values or a model crash.
- **DE** evaluates an entire population each iteration, tolerating individual failures within a generation.
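
The failure tolerance described above can be sketched with a minimal DE/rand/1/bin loop. This is an illustration, not the SYMFLUENCE optimizer: `differential_evolution` and `toy_objective` are hypothetical, the control parameters `f` and `cr` are common textbook defaults, and the toy objective stands in for a model run that returns NaN in part of the parameter space.

```python
import math
import random


def differential_evolution(objective, bounds, pop_size=10, generations=20,
                           f=0.8, cr=0.9, seed=0):
    """Minimal DE/rand/1/bin maximizer that treats NaN scores as failures."""
    rng = random.Random(seed)
    dim = len(bounds)

    def score(x):
        value = objective(x)
        # A failed run (NaN) gets the worst possible score instead of aborting.
        return -math.inf if math.isnan(value) else value

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fitness = [score(x) for x in pop]

    for _ in range(generations):
        next_pop, next_fit = list(pop), list(fitness)
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == forced or rng.random() < cr:
                    gene = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    gene = min(max(gene, lo), hi)  # clip to the bounds
                else:
                    gene = pop[i][d]
                trial.append(gene)
            trial_fit = score(trial)
            if trial_fit >= fitness[i]:  # greedy one-to-one selection
                next_pop[i], next_fit[i] = trial, trial_fit
        pop, fitness = next_pop, next_fit

    best = max(range(pop_size), key=fitness.__getitem__)
    return pop[best], fitness[best]


def toy_objective(x):
    """Hypothetical stand-in for a model run; fails (NaN) when x[0] < 0.2."""
    if x[0] < 0.2:
        return float("nan")
    return 1.0 - (x[0] - 0.7) ** 2 - (x[1] - 0.3) ** 2


best_x, best_score = differential_evolution(toy_objective, [(0.0, 1.0), (0.0, 1.0)])
print(best_x, best_score)
```

Population members that land in the failing region simply lose the selection step, so the search continues with the rest of the population. The defaults here mirror the tutorial's `POPULATION_SIZE=10` and `iterations=20`, assuming one iteration corresponds to one generation.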

### Calibration parameters

| Parameter | Description |
|-----------|-------------|
| `minStomatalResistance` | Minimum stomatal resistance |
| `cond2photo_slope` | Slope of the Ball–Berry stomatal conductance model |
| `vcmax25_canopyTop` | Maximum rate of carboxylation at 25°C |
| `jmax25_scale` | Scaling factor for electron transport rate |
| `summerLAI` | Peak leaf area index |
| `rootingDepth` | Maximum rooting depth |
| `z0Canopy` | Canopy roughness length |
| `windReductionParam` | Within-canopy wind reduction parameter |

> **Note:** The configuration was set in Step 1a. If you need to change calibration settings, you must re-create the config and re-initialize SYMFLUENCE (all configs are frozen after creation).

In [None]:
# Step 6a — Run calibration (DE + KGE on ET)

print(f"Algorithm:          {config.optimization.algorithm}")
print(f"Metric:             {config.optimization.metric}")
print(f"Target:             {config.optimization.target}")
print(f"Iterations:         {config.optimization.iterations}")
print(f"Population size:    {config.optimization.population_size}")
print(f"Calibration period: {config.domain.calibration_period}")
print()

results_file = symfluence.managers['optimization'].calibrate_model()
print(f"\nCalibration results file: {results_file}")

In [None]:
# Step 6b — Post-calibration visualization

from IPython.display import Image, display

plot_paths = symfluence.managers['reporting'].visualize_calibration_results(
    experiment_id=config.domain.experiment_id
)

for plot_name, plot_path in plot_paths.items():
    print(f"\n{plot_name}:")
    display(Image(filename=str(plot_path)))

print("\nPost-calibration visualization complete")