# CONFLUENCE Tutorial 1b — Point-Scale Workflow (FLUXNET CA-NS7)

## Introduction
This notebook mirrors the concise, configuration-first style established in **Tutorial 01a** and adapts it for **energy-balance validation at a FLUXNET tower (CA-NS7)**. We simulate point-scale land–atmosphere exchanges and evaluate the **latent heat flux (LE, expressed as evapotranspiration, ET)** and the **sensible heat flux (H)** against FLUXNET observations.

The workflow is strictly configuration-driven and fully reproducible:
1) write a minimal config, 2) initialize CONFLUENCE and standard project layout, 3) define the point-scale domain, 4) acquire & preprocess inputs, 5) run **SUMMA**, and 6) evaluate fluxes.


## Step 1 — Configuration (pick or generate)

We start by generating a compact configuration for the **CA-NS7** FLUXNET site using the same pattern as 01a. This keeps initialization a one-liner and the workflow fully reproducible.
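The `BOUNDING_BOX_COORDS` value set in the cell below is a ±0.05° window around the tower coordinates in `POUR_POINT_COORDS`, written as `lat_max/lon_min/lat_min/lon_max` (an ordering inferred from the values themselves). A minimal sketch of deriving it, assuming that ordering and a fixed buffer; `bbox_from_point` is a hypothetical helper, not part of CONFLUENCE. Rounding the corners avoids float artifacts such as `56.585800000000006`.

```python
def bbox_from_point(lat: float, lon: float, buffer_deg: float = 0.05, ndigits: int = 4) -> str:
    """Return a 'lat_max/lon_min/lat_min/lon_max' string centred on a point.

    Hypothetical helper: the ordering is inferred from this notebook's config values.
    """
    # Round each corner so float arithmetic does not leak digits like ...000006
    lat_max = round(lat + buffer_deg, ndigits)
    lat_min = round(lat - buffer_deg, ndigits)
    lon_min = round(lon - buffer_deg, ndigits)
    lon_max = round(lon + buffer_deg, ndigits)
    return f"{lat_max}/{lon_min}/{lat_min}/{lon_max}"

# CA-NS7 tower coordinates, as given in POUR_POINT_COORDS
print(bbox_from_point(56.6358, -99.9483))  # 56.6858/-99.9983/56.5858/-99.8983
```

A larger `buffer_deg` widens the domain around the footprint; for a strictly point-scale run, a small buffer keeps the single GRU close to the tower.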

In [None]:
# Step 1 — Create a site-specific configuration for the CA-NS7 FLUXNET example
from pathlib import Path
import yaml

# Path to the default template configuration (same pattern as 01a)
config_template = Path("../../0_config_files/config_template.yaml")

# Load the base configuration
with open(config_template, "r") as f:
    config = yaml.safe_load(f)

# === Modify key entries for the CA-NS7 point-scale case ===
# Code & data directories
config["CONFLUENCE_CODE_DIR"] = str(Path("../../").resolve())
# Uncomment to override the template's data directory (later cells read CONFLUENCE_DATA_DIR)
#config["CONFLUENCE_DATA_DIR"] = str(Path("/path/to/CONFLUENCE_data").resolve())

# Point-scale domain settings
config["DOMAIN_DEFINITION_METHOD"] = "point"
config["DOMAIN_DISCRETIZATION"] = "GRUs"  # 1 GRU => 1 HRU
config["DOMAIN_NAME"] = "CA-NS7"
config["POUR_POINT_COORDS"] = "56.6358/-99.9483"  # CA-NS7 coordinates (lat/lon)
config["BOUNDING_BOX_COORDS"] = "56.6858/-99.9983/56.5858/-99.8983"  # ±0.05° box: lat_max/lon_min/lat_min/lon_max


# Data/forcing & model
config["HYDROLOGICAL_MODEL"] = "SUMMA"
config["FORCING_DATASET"] = "ERA5"  # Used for meteorological inputs
config["DOWNLOAD_FLUXNET"] = True
config["FLUXNET_STATION"] = "CA-NS7"

# Define the temporal extent of the experiment
config["EXPERIMENT_TIME_START"] = "2000-01-01 01:00"
config["EXPERIMENT_TIME_END"] = "2002-12-31 23:00"
config['CALIBRATION_PERIOD'] = "2000-10-01, 2001-09-30"
config['EVALUATION_PERIOD'] = "2001-10-01, 2002-09-30"
config['SPINUP_PERIOD'] = "2000-01-01, 2000-09-30"

# (Optional) Paths to institutional data roots — customize if using shared infra
config['DATATOOL_DATASET_ROOT'] = '/path/to/meteorological-data/'
config['GISTOOL_DATASET_ROOT']  = '/path/to/geospatial-data/'
config['TOOL_CACHE']            = '/path/to/cache/dir'
config['CLUSTER_JSON']          = '/path/to/cluster.json'

# Basic optimization knobs if desired (example only)
config['PARAMS_TO_CALIBRATE'] = 'minStomatalResistance,cond2photo_slope,vcmax25_canopyTop,jmax25_scale,summerLAI,rootingDepth,soilStressParam,z0Canopy,windReductionParam'
config['OPTIMISATION_TARGET'] = 'et'
config['ITERATIVE_OPTIMIZATION_ALGORITHM'] = 'DDS'
config['OPTIMIZATION_METRIC'] = 'RMSE'
config['CALIBRATION_TIMESTEP'] = 'daily'

# Unique experiment ID for outputs
config["EXPERIMENT_ID"] = "run_fluxnet_1"

# === Save the customized configuration ===
out_config = Path("../../0_config_files/config_fluxnet_CA-NS7.yaml")
with open(out_config, "w") as f:
    yaml.dump(config, f, default_flow_style=False, sort_keys=False)

print(f"✅ New configuration written to: {out_config}")

### Step 1b — Initialize CONFLUENCE
Initialize the framework using the configuration prepared above.
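Before initializing, a quick check of the in-memory `config` dictionary from Step 1 can catch missing keys or leftover `/path/to/...` placeholders. This is a minimal sketch; `REQUIRED_KEYS` only lists keys that later cells of this notebook read, and both helpers are hypothetical, not CONFLUENCE's own validation.

```python
# Keys that later cells of this notebook read (illustrative checklist, not CONFLUENCE's own)
REQUIRED_KEYS = [
    "CONFLUENCE_DATA_DIR", "DOMAIN_NAME", "EXPERIMENT_ID",
    "HYDROLOGICAL_MODEL", "EXPERIMENT_TIME_START", "EXPERIMENT_TIME_END",
]

def missing_config_keys(cfg: dict, required=REQUIRED_KEYS) -> list:
    """Return required keys that are absent, None, or empty strings."""
    return [k for k in required if cfg.get(k) in (None, "")]

def placeholder_paths(cfg: dict) -> list:
    """Return keys whose values still hold '/path/to/...' template placeholders."""
    return [k for k, v in cfg.items() if isinstance(v, str) and v.startswith("/path/to/")]

# Usage with a small in-memory example; in the notebook, pass the `config` dict from Step 1
example = {"DOMAIN_NAME": "CA-NS7", "EXPERIMENT_ID": "run_fluxnet_1", "TOOL_CACHE": "/path/to/cache/dir"}
print(missing_config_keys(example))
print(placeholder_paths(example))
```

Placeholder paths are only a problem for the steps that use them (for example, attribute or forcing acquisition), so the second list is a warning rather than a hard failure.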

In [None]:
# Step 1b — Initialize CONFLUENCE
import os, sys
sys.path.append(os.path.abspath(os.path.join("..", "..")))
from CONFLUENCE import CONFLUENCE  # adjust if your import path differs

config_path = "../../0_config_files/config_fluxnet_CA-NS7.yaml"
confluence = CONFLUENCE(config_path)

print("✅ CONFLUENCE initialized successfully.")
print(f"Configuration loaded from: {config_path}")

### Step 1c — Project structure setup
Create the standardized project directory and a pour-point feature for the site.

In [None]:
# Step 1c — Project structure setup
from pathlib import Path

# 1) Create the standardized project layout (logs, config link, data/output folders, etc.)
project_dir = confluence.managers['project'].setup_project()

# 2) Create a pour-point feature (site reference geometry for point-scale workflows)
pour_point_path = confluence.managers['project'].create_pour_point()

print("✅ Project structure created.")
print(f"Project root: {project_dir}")
print(f"Pour point:   {pour_point_path}")

# 3) Brief top-level directory preview
print("\nTop-level structure:")
for p in sorted(Path(project_dir).iterdir()):
    if p.is_dir():
        print(f"├── {p.name}")

## Step 2 — Domain definition (point-scale GRU)
The domain is a **single GRU** around the flux tower footprint, ensuring a strictly point-scale (non-routed) experiment.

### Step 2a — Geospatial attribute acquisition

In [None]:
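# Step 2a — Acquire attributes (model-agnostic)
# Commented out here; uncomment to fetch attributes via GISTOOL_DATASET_ROOT (Step 1),
# or leave as-is if the attribute data already exist in the project
#confluence.managers['data'].acquire_attributes()
print("✅ Attribute acquisition step finished (no-op while the call above is commented out)")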

### Step 2b — Domain definition (point-scale)
Define a minimal footprint around **CA-NS7** consistent with the pour point.
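A cheap consistency check before defining the domain is to confirm that the pour point actually falls inside the bounding box. A minimal sketch, assuming the `lat/lon` and `lat_max/lon_min/lat_min/lon_max` string formats used in Step 1; `parse_coords` and `point_in_bbox` are hypothetical helpers.

```python
def parse_coords(text: str) -> list:
    """Split an 'a/b/...' coordinate string into floats."""
    return [float(part) for part in text.split("/")]

def point_in_bbox(point: str, bbox: str) -> bool:
    """Check a 'lat/lon' point against a 'lat_max/lon_min/lat_min/lon_max' box."""
    lat, lon = parse_coords(point)
    lat_max, lon_min, lat_min, lon_max = parse_coords(bbox)
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max

print(point_in_bbox("56.6358/-99.9483", "56.6858/-99.9983/56.5858/-99.8983"))  # True
```

In the notebook, the same call can take `config["POUR_POINT_COORDS"]` and `config["BOUNDING_BOX_COORDS"]` directly.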

In [None]:
# Step 2b — Define the point-scale domain
watershed_path = confluence.managers['domain'].define_domain()
print("✅ Domain definition complete")
print(f"Domain file: {watershed_path}")

### Step 2c — Discretization (required even for 1 GRU = 1 HRU)
This step creates the **catchment HRU** artifacts required by downstream steps; at point scale the single HRU remains 1:1 with the GRU.
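The 1:1 GRU–HRU relationship can be checked from the HRU attribute table alone. A minimal sketch, assuming the HRU shapefile carries `GRU_ID` and `HRU_ID` columns (column names are an assumption; adjust them to your file); `is_one_to_one` is a hypothetical helper.

```python
import pandas as pd

def is_one_to_one(hru_table: pd.DataFrame, gru_col: str = "GRU_ID", hru_col: str = "HRU_ID") -> bool:
    """True if each GRU maps to exactly one HRU and HRU IDs are unique."""
    hrus_per_gru = hru_table.groupby(gru_col)[hru_col].nunique()
    return bool((hrus_per_gru == 1).all()) and bool(hru_table[hru_col].is_unique)

print(is_one_to_one(pd.DataFrame({"GRU_ID": [1], "HRU_ID": [1]})))        # True
print(is_one_to_one(pd.DataFrame({"GRU_ID": [1, 1], "HRU_ID": [1, 2]})))  # False
```

Because a GeoDataFrame is a DataFrame, `is_one_to_one(gpd.read_file(hru_fp))` would work on the file verified in Step 2d.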

In [None]:
# Step 2c — Discretization (GRUs → HRUs 1:1)
hru_path = confluence.managers['domain'].discretize_domain()
print("✅ Domain discretization complete")
print(f"HRU file: {hru_path}")

### Step 2d — Verification & inspection (CA-NS7)
We verify that the expected shapefiles exist in their standardized locations, then draw a minimal GRU–HRU overlay.

In [None]:
# Step 2d — Verify domain outputs and inspect geometry
from pathlib import Path
import geopandas as gpd
import matplotlib.pyplot as plt
import yaml

# 1) Read config to derive data & domain paths
with open("../../0_config_files/config_fluxnet_CA-NS7.yaml") as f:
    cfg = yaml.safe_load(f)

data_dir   = Path(cfg["CONFLUENCE_DATA_DIR"])
domain_dir = data_dir / f"domain_{cfg['DOMAIN_NAME']}"
shp_dir    = domain_dir / "shapefiles"

# 2) Explicit expected shapefiles for CA-NS7
gru_fp = shp_dir / "river_basins" / f"{cfg['DOMAIN_NAME']}_riverBasins_point.shp"
hru_fp = shp_dir / "catchment"     / f"{cfg['DOMAIN_NAME']}_HRUs_GRUs.shp"

# 3) Verify presence
for label, path in [("GRU", gru_fp), ("HRU", hru_fp)]:
    if not path.exists():
        raise FileNotFoundError(f"❌ Expected {label} file not found: {path}")
    print(f"✅ {label} file found: {path}")

# 4) Minimal overlay plot
gru = gpd.read_file(gru_fp)
hru = gpd.read_file(hru_fp)
if hru.crs != gru.crs:
    hru = hru.to_crs(gru.crs)
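ax = gru.plot(figsize=(6, 6), color="lightgrey", edgecolor="black")
# Give the unfilled HRU overlay an explicit edge colour so its outline is visible
hru.plot(ax=ax, facecolor="none", edgecolor="red", linewidth=1.5)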
ax.set_title("CA-NS7 — GRU vs HRU")
ax.set_xlabel("")
ax.set_ylabel("")
ax.set_aspect("equal")
plt.tight_layout()
plt.show()

## Step 3 — Input preprocessing (model-agnostic)
We prepare inputs in three small moves: 1) acquire **meteorological forcings**, 2) process **FLUXNET observations**, and 3) run **model-agnostic preprocessing** to standardize variables and time steps.
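One way to confirm that preprocessing left a forcing series on a regular time step is to inspect consecutive timestamp differences. The sketch below assumes an hourly step (ERA5's native resolution, and consistent with the `01:00` start in Step 1); `check_regular_timestep` is a hypothetical helper, and reading timestamps from, e.g., `xr.open_dataset(path)["time"].values` for a file in `forcing/basin_averaged_data` is an assumption about the file layout.

```python
import pandas as pd

def check_regular_timestep(times, expected=pd.Timedelta(hours=1)):
    """Return (is_regular, n_irregular_steps) for a sequence of timestamps."""
    idx = pd.DatetimeIndex(times).sort_values()
    steps = pd.Series(idx).diff().dropna()          # gap between consecutive timestamps
    n_irregular = int((steps != expected).sum())    # steps that are not exactly `expected`
    return n_irregular == 0, n_irregular

times = pd.date_range("2000-01-01 01:00", periods=48, freq="60min")
print(check_regular_timestep(times))             # (True, 0)
print(check_regular_timestep(times.delete(10)))  # (False, 1): one 2-hour gap
```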

### Step 3a — Acquire meteorological forcings (ERA5)

In [None]:
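# Step 3a — Forcings
# Commented out here; uncomment to download ERA5 forcings (needs the data roots set in Step 1),
# or leave as-is if forcing/raw_data is already populated
#confluence.managers['data'].acquire_forcings()
print("✅ Forcing acquisition step finished (no-op while the call above is commented out)")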

### Step 3b — Process observations (FLUXNET)
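FLUXNET2015-format files record the start of each averaging interval in `TIMESTAMP_START` as a `YYYYMMDDHHMM` integer and flag missing values with `-9999`. If the processed CSV from this step keeps those conventions (an assumption worth checking), a minimal parsing sketch looks like this; `parse_fluxnet_table` is hypothetical, not a CONFLUENCE function.

```python
import numpy as np
import pandas as pd

def parse_fluxnet_table(df: pd.DataFrame) -> pd.DataFrame:
    """Index a FLUXNET-format table by TIMESTAMP_START and mask -9999 fill values."""
    out = df.drop(columns=["TIMESTAMP_END"], errors="ignore")  # returns a copy
    # TIMESTAMP_START holds YYYYMMDDHHMM integers: parse as text with an explicit format,
    # never as a raw integer (which pandas would read as an epoch offset)
    stamps = pd.to_datetime(out.pop("TIMESTAMP_START").astype(str), format="%Y%m%d%H%M")
    out.index = pd.DatetimeIndex(stamps, name="timestamp")
    return out.replace(-9999, np.nan)

raw = pd.DataFrame({
    "TIMESTAMP_START": [200001010000, 200001010030, 200001010100],
    "LE_F_MDS": [10.0, -9999.0, 30.0],
    "H_F_MDS": [5.0, 6.0, -9999.0],
})
parsed = parse_fluxnet_table(raw)
print(parsed["LE_F_MDS"].resample("1D").mean())  # daily mean of 10 and 30; the fill value is ignored
```

Masking the fill value before averaging matters: a single `-9999` left in a half-hourly series would dominate the daily mean.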

In [None]:
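# Step 3b — Observations
# Commented out here; uncomment to process the FLUXNET record configured in Step 1
# (Step 5 expects processed CSVs under observations/energy_fluxes/processed)
#confluence.managers['data'].process_observed_data()
print("✅ FLUXNET processing step finished (no-op while the call above is commented out)")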

### Step 3c — Model-agnostic preprocessing

In [None]:
# Step 3c — Model-agnostic preprocessing
confluence.managers['data'].run_model_agnostic_preprocessing()
print("✅ Model-agnostic preprocessing complete")

### Step 3d — Quick verification
Confirm that the expected forcing folders exist and contain files (paths are derived from the configuration, not hard-coded).

In [None]:
from pathlib import Path
import yaml

with open("../../0_config_files/config_fluxnet_CA-NS7.yaml") as f:
    cfg = yaml.safe_load(f)

data_dir   = Path(cfg["CONFLUENCE_DATA_DIR"])
domain_dir = data_dir / f"domain_{cfg['DOMAIN_NAME']}"

targets = {
    "forcing/raw_data":                        domain_dir / "forcing" / "raw_data",
    "forcing/basin_averaged_data":             domain_dir / "forcing" / "basin_averaged_data",
}

def count_files(p: Path) -> int:
    return sum(1 for x in p.iterdir() if x.is_file()) if p.exists() else 0

for label, path in targets.items():
    exists = path.exists()
    n = count_files(path)
    status = "✅" if exists and n > 0 else ("⚠️ empty" if exists else "❌ missing")
    suffix = f"({n} files)" if exists else ""
    print(f"{status} {label}  {suffix}")

## Step 4 — Model-specific preprocessing & model run (SUMMA)

### Step 4a — SUMMA-specific preprocessing

In [None]:
# Step 4a — SUMMA-specific preprocessing
confluence.managers['model'].preprocess_models()
print("✅ Model-specific preprocessing complete")

### Step 4b — Instantiate & run the model

In [None]:
# Step 4b — Instantiate & run SUMMA
print(f"Running {confluence.config['HYDROLOGICAL_MODEL']} for point-scale simulation…")
confluence.managers['model'].run_models()
print("✅ Point-scale model run complete")

### Step 4c — Quick verification
Print where SUMMA inputs and run outputs were written (paths are derived from the configuration).

In [None]:
from pathlib import Path
import yaml

with open("../../0_config_files/config_fluxnet_CA-NS7.yaml") as f:
    cfg = yaml.safe_load(f)

data_dir   = Path(cfg["CONFLUENCE_DATA_DIR"])
domain_dir = data_dir / f"domain_{cfg['DOMAIN_NAME']}"

# Common locations used by the model manager
summa_in   = domain_dir / "forcing" / "SUMMA_input"
results    = domain_dir / "simulations" / cfg['EXPERIMENT_ID'] / 'SUMMA'

print("SUMMA input dir:", summa_in if summa_in.exists() else "(not found)")
print("Results dir:",    results if results.exists()    else "(not found)")

## Step 5 — Energy Flux Validation (FLUXNET vs Simulation)
We evaluate simulated energy fluxes against FLUXNET tower observations, focusing on evapotranspiration (ET) and sensible heat (H).
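The evaluation below converts latent heat to evapotranspiration with a factor of about 0.0353 mm day⁻¹ per W m⁻². It follows from $ET = LE \cdot 86400 / \lambda_v$, where $\lambda_v \approx 2.45 \times 10^6$ J kg⁻¹ is the latent heat of vaporization and 1 kg m⁻² of water equals 1 mm of depth. A small sketch with hypothetical helpers, holding $\lambda_v$ fixed at 2.45 MJ kg⁻¹ (in reality it varies slightly with temperature):

```python
SECONDS_PER_DAY = 86_400.0

def le_to_et_factor(lambda_v: float = 2.45e6) -> float:
    """mm/day of ET per 1 W/m² of latent heat (J/s/m² * s/day / (J/kg) = kg/m²/day = mm/day)."""
    return SECONDS_PER_DAY / lambda_v

def le_to_et(le_wm2, lambda_v: float = 2.45e6):
    """Convert a daily-mean latent heat flux (W/m²) to ET (mm/day)."""
    return le_wm2 * le_to_et_factor(lambda_v)

print(round(le_to_et_factor(), 4))  # 0.0353
print(round(le_to_et(100.0), 2))    # 3.53
```

Because the conversion is linear, applying it before or after daily averaging gives the same daily ET.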

In [None]:
# Step 5 — Energy flux evaluation

from pathlib import Path
import yaml, pandas as pd, numpy as np, xarray as xr
import matplotlib.pyplot as plt
import re

# --- Load configuration ---
with open("../../0_config_files/config_fluxnet_CA-NS7.yaml") as f:
    cfg = yaml.safe_load(f)

data_dir   = Path(cfg["CONFLUENCE_DATA_DIR"])
domain_dir = data_dir / f"domain_{cfg['DOMAIN_NAME']}"

# --- Helper functions ---
def load_summa_daily(nc_path, skip_spinup=True):
    """Load daily SUMMA output and optionally skip spinup year"""
    ds = xr.open_dataset(nc_path)
    if skip_spinup:
        start_year = int(ds["time"].dt.year.min()) + 1
        ds = ds.sel(time=slice(f"{start_year}-01-01", None))
    return ds

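def find_variable(candidates, dataset):
    """Return the first candidate present in an xarray Dataset or a collection of column names"""
    # xarray Datasets expose names via .data_vars; pandas columns (an Index) iterate as names
    names = set(dataset.data_vars) if hasattr(dataset, "data_vars") else set(dataset)
    for var in candidates:
        if var in names:
            return var
    return None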

def to_series(ds, var_name):
    """Convert an xarray variable to a pandas Series indexed by time"""
    var = ds[var_name]
    # Reduce every non-time dimension (e.g. hru/gru) to its first element;
    # a point-scale run has a single HRU, so nothing is lost
    for dim in [d for d in var.dims if d != "time"]:
        var = var.isel({dim: 0})
    return var.to_series()

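def align_series(sim, obs):
    """Align simulation and observation series on shared calendar days"""
    # Normalize both indices to midnight so daily model output stamped at another
    # hour still matches the daily-resampled observations
    sim = sim.copy()
    obs = obs.copy()
    sim.index = pd.DatetimeIndex(sim.index).normalize()
    obs.index = pd.DatetimeIndex(obs.index).normalize()
    idx = sim.index.intersection(obs.index)
    return sim.loc[idx].astype(float), obs.loc[idx].astype(float)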

def compute_metrics(sim, obs):
    """Compute RMSE, bias, and correlation"""
    diff = (sim - obs).to_numpy(dtype=float)
    return {
        'RMSE': round(float(np.sqrt(np.nanmean(diff**2))), 3),
        'Bias': round(float((sim - obs).mean()), 3),
        'r': round(float(sim.corr(obs)), 3)
    }

# --- Load simulation results ---
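# Search only this experiment's outputs, in a deterministic order
sim_dir = domain_dir / "simulations" / cfg["EXPERIMENT_ID"]
nc_files = sorted(sim_dir.rglob("*_day.nc"))
if not nc_files:
    raise FileNotFoundError(f"No daily SUMMA output found under {sim_dir} (pattern '*_day.nc')")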

ds_eval = load_summa_daily(nc_files[0], skip_spinup=True)

# Extract ET (prefer direct ET variable, fall back to LE conversion)
et_candidates = ["ET", "evapotranspiration", "evspsbl", "tEvap"]
le_candidates = ["Qle", "qle", "latentHeatFlux", "scalarLatHeatTotal", "LE"]

et_var = find_variable(et_candidates, ds_eval)
if et_var is None:
    le_var = find_variable(le_candidates, ds_eval)
    if le_var is None:
        raise KeyError(f"Could not find ET or LE in simulation. Tried: {et_candidates + le_candidates}")
    # Convert LE (W/m²) to ET (mm/day): 1 W/m² ≈ 0.0353 mm/day
    et_sim = to_series(ds_eval, le_var) * 0.0353
else:
    et_sim = to_series(ds_eval, et_var)

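# Extract H (sensible heat)
# FLUXNET reports upward (surface-to-atmosphere) fluxes as positive; confirm that the
# simulated LE/H variables use the same sign convention before comparing
h_candidates = ["Qh", "qh", "sensibleHeatFlux", "scalarSenHeatTotal", "H"]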
h_var = find_variable(h_candidates, ds_eval)
h_sim = to_series(ds_eval, h_var) if h_var else None

# --- Load FLUXNET observations ---
proc_dir = domain_dir / "observations" / "energy_fluxes" / "processed"
obs_files = sorted(proc_dir.rglob("*.csv"))
if not obs_files:
    raise FileNotFoundError(f"No processed FLUXNET CSV found in: {proc_dir}")

obs = pd.read_csv(obs_files[0])

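# FLUXNET files flag missing values with -9999; mask them before any averaging
obs = obs.replace(-9999, np.nan)

# Parse timestamp: check FLUXNET's TIMESTAMP_START (YYYYMMDDHHMM) first, because the
# generic search below would also match it and mis-parse the integers as epoch offsets
if "TIMESTAMP_START" in obs.columns:
    obs["timestamp"] = pd.to_datetime(obs["TIMESTAMP_START"].astype(str), format="%Y%m%d%H%M", errors="coerce")
    ts_col = "timestamp"
else:
    ts_col = next((c for c in obs.columns if re.search(r"timestamp|time|date", c, re.I)), None)
    if ts_col is None:
        raise KeyError("Could not find timestamp column in FLUXNET data")
    obs[ts_col] = pd.to_datetime(obs[ts_col])

obs = obs.dropna(subset=[ts_col]).set_index(ts_col).sort_index()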

# Extract LE and H from observations
le_obs_col = find_variable(["LE_F_MDS", "LE", "ET_from_LE_mm_per_day"], obs.columns)
h_obs_col = find_variable(["H_F_MDS", "H"], obs.columns)

if le_obs_col is None:
    raise KeyError("Could not find LE in FLUXNET observations")

# Convert LE to ET if needed and aggregate to daily
if le_obs_col in ["LE_F_MDS", "LE"]:
    obs["ET_obs"] = obs[le_obs_col] * 0.0353
else:
    obs["ET_obs"] = obs[le_obs_col]

obs_daily = obs.resample("1D").mean(numeric_only=True)
et_obs = obs_daily["ET_obs"].dropna()
h_obs = obs_daily[h_obs_col].dropna() if h_obs_col else None

# --- Compute metrics ---
et_sim_a, et_obs_a = align_series(et_sim, et_obs)
et_metrics = compute_metrics(et_sim_a, et_obs_a)
print("ET metrics:", et_metrics)

if h_sim is not None and h_obs is not None:
    h_sim_a, h_obs_a = align_series(h_sim, h_obs)
    h_metrics = compute_metrics(h_sim_a, h_obs_a)
    print("H metrics:", h_metrics)
else:
    h_metrics = None
    print("H metrics: Not available")

# --- Visualization ---
n_plots = 2 if h_metrics else 1
fig, axes = plt.subplots(n_plots, 2, figsize=(12, 5*n_plots), squeeze=False)

# ET time series
axes[0, 0].plot(et_obs_a.index, et_obs_a.values, 'b-', label='Observed', linewidth=1.2, alpha=0.7)
axes[0, 0].plot(et_sim_a.index, et_sim_a.values, 'r-', label='Simulated', linewidth=1.2, alpha=0.7)
axes[0, 0].set_ylabel('ET (mm/day)')
axes[0, 0].set_title('Evapotranspiration')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

# ET scatter
axes[0, 1].scatter(et_obs_a, et_sim_a, alpha=0.5, s=10)
max_val = max(et_obs_a.max(), et_sim_a.max())
axes[0, 1].plot([0, max_val], [0, max_val], 'k--', alpha=0.5)
axes[0, 1].set_xlabel('Observed ET (mm/day)')
axes[0, 1].set_ylabel('Simulated ET (mm/day)')
axes[0, 1].set_title(f"ET Comparison (r={et_metrics['r']}, RMSE={et_metrics['RMSE']})")
axes[0, 1].grid(True, alpha=0.3)

# H plots (if available)
if h_metrics:
    axes[1, 0].plot(h_obs_a.index, h_obs_a.values, 'b-', label='Observed', linewidth=1.2, alpha=0.7)
    axes[1, 0].plot(h_sim_a.index, h_sim_a.values, 'r-', label='Simulated', linewidth=1.2, alpha=0.7)
    axes[1, 0].set_ylabel('H (W/m²)')
    axes[1, 0].set_title('Sensible Heat Flux')
    axes[1, 0].legend()
    axes[1, 0].grid(True, alpha=0.3)
    
    axes[1, 1].scatter(h_obs_a, h_sim_a, alpha=0.5, s=10)
    # H can be negative, so span the 1:1 line over the full data range
    lo_h = min(h_obs_a.min(), h_sim_a.min())
    hi_h = max(h_obs_a.max(), h_sim_a.max())
    axes[1, 1].plot([lo_h, hi_h], [lo_h, hi_h], 'k--', alpha=0.5)
    axes[1, 1].set_xlabel('Observed H (W/m²)')
    axes[1, 1].set_ylabel('Simulated H (W/m²)')
    axes[1, 1].set_title(f"H Comparison (r={h_metrics['r']}, RMSE={h_metrics['RMSE']})")
    axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("✅ Energy flux evaluation complete")

In [None]:
# Step 5b — Run calibration (DDS with RMSE on daily ET, as configured in Step 1)

results_file = confluence.managers['optimization'].calibrate_model()
print("Calibration results file:", results_file)