# CONFLUENCE Tutorial - 2: Point-Scale Workflow (FLUXNET Example)

## Introduction

Building on the previous tutorial's foundation in CONFLUENCE workflow management and point-scale modeling, this notebook extends our analysis to focus on energy balance and evapotranspiration processes. While the SNOTEL tutorial emphasized snow dynamics and soil moisture in mountain environments, this example demonstrates CONFLUENCE's capabilities for simulating land-atmosphere interactions using eddy covariance flux tower observations.

### FLUXNET: A Global Network for Energy and Carbon Flux Observations

The FLUXNET network represents one of the most comprehensive global observational frameworks for studying land-atmosphere interactions, providing continuous measurements of energy, water, and carbon fluxes using the eddy covariance technique. These towers offer unique advantages for hydrological model evaluation:

1. **Direct flux measurements**: Evapotranspiration and sensible heat flux observations provide direct validation targets for land surface energy balance models
2. **High temporal resolution**: Sub-daily measurements capture diurnal cycles and rapid response to environmental drivers
3. **Multi-year records**: Long-term observations enable assessment of seasonal dynamics and interannual variability
4. **Ecosystem diversity**: Sites span major biomes, allowing process-based model evaluation across diverse vegetation types and climatic conditions
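Here is a minimal sketch of the eddy covariance principle. The turbulent flux over one averaging block is the covariance of vertical wind and a scalar, and for sensible heat it is scaled by air density and heat capacity. It assumes constant air density ρ ≈ 1.2 kg m⁻³ and c_p ≈ 1004 J kg⁻¹ K⁻¹. The function name `sensible_heat_flux` and the synthetic data are illustrative only; they are not part of FLUXNET or CONFLUENCE.

```python
import numpy as np

# Assumed typical near-surface constants (not site-specific measurements)
RHO_AIR = 1.2      # air density, kg m-3
CP_AIR = 1004.0    # specific heat of dry air at constant pressure, J kg-1 K-1


def sensible_heat_flux(w, t_air):
    """Sensible heat flux H (W m-2) for one averaging block.

    Uses Reynolds decomposition: H = rho * cp * mean(w' * T'), where primes
    are deviations from the block mean of vertical wind w (m s-1) and air
    temperature t_air (K).
    """
    w = np.asarray(w, dtype=float)
    t_air = np.asarray(t_air, dtype=float)
    w_prime = w - w.mean()          # vertical wind fluctuations
    t_prime = t_air - t_air.mean()  # temperature fluctuations
    return RHO_AIR * CP_AIR * np.mean(w_prime * t_prime)


# Synthetic 30-minute block sampled at 10 Hz (18,000 samples). Updrafts are
# made slightly warmer than downdrafts, so H should come out positive (upward).
rng = np.random.default_rng(42)
w = rng.normal(0.0, 0.3, 18_000)
t_air = 290.0 + 0.5 * w + rng.normal(0.0, 0.1, 18_000)
print(f"H = {sensible_heat_flux(w, t_air):.1f} W m-2")
```

Operational FLUXNET processing adds steps this sketch omits, such as coordinate rotation, despiking, and density corrections. Gap-filling then produces variables like `H_F_MDS`.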

### Scientific Importance of Energy Balance Modeling

Accurate representation of land-atmosphere energy exchanges is fundamental to hydrological modeling for several reasons:

1. **Evapotranspiration partitioning**: Understanding the relative contributions of soil evaporation, plant transpiration, and canopy interception to total water loss
2. **Coupling with soil moisture**: Energy balance directly influences soil moisture dynamics through evapotranspiration demand and soil-plant-atmosphere feedback mechanisms
3. **Vegetation stress**: Accurate simulation of plant water stress and stomatal response to environmental conditions
4. **Climate sensitivity**: Land-atmosphere interactions represent a key feedback mechanism in climate variability and change
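Flux towers measure the surface energy balance one component at a time. The relation below uses the notation of the rest of this notebook (LE, H, Rn, G) and neglects canopy and air-column heat storage. Latent heat converts to an evaporated water depth through the latent heat of vaporization λ, taken here as ≈ 2.45 × 10⁶ J kg⁻¹ (an assumed typical value; λ varies slightly with temperature). The conversion also uses 1 kg m⁻² of water = 1 mm of depth:

$$
R_n - G = H + LE
$$

$$
ET\;[\mathrm{mm\,day^{-1}}] = \frac{LE\;[\mathrm{W\,m^{-2}}] \times 86400\;\mathrm{s\,day^{-1}}}{\lambda\;[\mathrm{J\,kg^{-1}}]} \approx \frac{86400}{2.45\times10^{6}}\,LE \approx 0.0353 \times LE
$$

This is the origin of the 0.0353 factor used in the evaluation code. At real towers the measured $H + LE$ often falls short of $R_n - G$, so the ratio $(H + LE)/(R_n - G)$ serves as a diagnostic of energy balance closure.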

### Case Study: CA-NS7 Boreal Forest Site

This tutorial focuses on the CA-NS7 FLUXNET site, located in the boreal forest of northern Manitoba, Canada (56.6358°N, 99.9483°W). This site presents distinct scientific challenges compared to the mountain snow environment of the previous tutorial:

**Site characteristics:**
- **Ecosystem type**: Mature boreal forest dominated by black spruce (*Picea mariana*)
- **Climate regime**: Continental boreal climate with pronounced seasonal temperature variations
- **Elevation**: 260 m above sea level
- **Soil conditions**: Organic-rich soils with seasonal freezing and variable drainage
- **Observational period**: Multi-year records of energy, water, and carbon fluxes

**Scientific challenges:**
- **Seasonal vegetation dynamics**: Pronounced phenological cycles affecting canopy conductance and interception
- **Freeze-thaw processes**: Soil and vegetation interactions during spring thaw periods
- **Boreal forest energy balance**: Complex canopy structure effects on radiation partitioning and aerodynamic properties
- **Interannual variability**: Sensitivity to climate drivers and ecosystem disturbance history

## Learning Objectives

Through this tutorial, you will:

1. **Extend CONFLUENCE applications**: Apply the workflow to energy balance modeling and flux tower validation
2. **Understand ecosystem-specific modeling**: Configure SUMMA for boreal forest conditions and vegetation parameterizations
3. **Evaluate energy balance processes**: Compare simulated and observed evapotranspiration and sensible heat flux using established metrics
4. **Interpret land-atmosphere interactions**: Analyze the physical drivers of model-observation discrepancies in energy partitioning
5. **Connect point-scale to ecosystem scales**: Understand how flux tower "footprints" relate to model grid cell assumptions

### Tutorial Structure

This tutorial follows the established CONFLUENCE workflow while emphasizing energy balance processes:

1. **Configuration**: Adapt point-scale setup for boreal forest conditions
2. **Data acquisition**: Integrate FLUXNET observations with meteorological forcing
3. **Model execution**: Run SUMMA with appropriate vegetation and soil parameterizations
4. **Flux validation**: Compare simulated and observed energy balance components
5. **Process analysis**: Interpret results in the context of boreal ecosystem dynamics

By completing this tutorial, you'll develop expertise in energy balance modeling that complements the snow and soil moisture focus of the previous example, providing a more comprehensive foundation for distributed hydrological modeling applications.

## Step 1: Rapid Workflow Setup for FLUXNET Energy Balance Modeling
Building on the CONFLUENCE fundamentals established in Tutorial 01a, we can now streamline the initial workflow setup. This step efficiently configures the system for energy balance validation at the CA-NS7 boreal forest FLUXNET site, leveraging the same reproducible framework while focusing on ecosystem-specific parameterization.
Key Differences from Tutorial 01a:

- Location: CA-NS7 (northern Manitoba boreal forest) vs. Paradise SNOTEL (Cascade Mountains)
- Validation Focus: Energy fluxes (LE, H, Rn) vs. snow/soil moisture (SWE, SM)
- Ecosystem Type: Mature boreal forest vs. transitional snow zone
- Temporal Emphasis: Sub-daily energy cycles vs. seasonal snow dynamics
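The spin-up, calibration, and evaluation periods configured below should sit inside the experiment window and should not overlap. The following is a minimal sketch of such a check, using the period strings from this tutorial's `config_updates`. `parse_period` and `check_periods` are hypothetical helpers, not CONFLUENCE functions. The `'start, end'` string format is assumed from the values shown in the configuration cell.

```python
import pandas as pd


def parse_period(period_str):
    """Split a 'YYYY-MM-DD, YYYY-MM-DD' string into (start, end) Timestamps."""
    start, end = (pd.Timestamp(part.strip()) for part in period_str.split(','))
    if start > end:
        raise ValueError(f"Period starts after it ends: {period_str!r}")
    return start, end


def check_periods(config):
    """Hypothetical sanity check: spin-up, calibration, and evaluation periods
    must lie inside the experiment window and follow each other without overlap."""
    # Normalize to midnight so a '01:00' start does not exclude the first day
    exp_start = pd.Timestamp(config['EXPERIMENT_TIME_START']).normalize()
    exp_end = pd.Timestamp(config['EXPERIMENT_TIME_END'])
    ordered = [parse_period(config[key]) for key in
               ('SPINUP_PERIOD', 'CALIBRATION_PERIOD', 'EVALUATION_PERIOD')]
    for start, end in ordered:
        if start < exp_start or end > exp_end:
            raise ValueError(f"{start.date()}..{end.date()} lies outside the experiment window")
    # Each period must begin after the previous one ends
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start <= prev_end:
            raise ValueError(f"Periods overlap at {next_start.date()}")
    return ordered


config_updates = {
    'EXPERIMENT_TIME_START': '2001-01-01 01:00',
    'EXPERIMENT_TIME_END': '2005-12-31 23:00',
    'SPINUP_PERIOD': '2001-01-01, 2001-12-31',
    'CALIBRATION_PERIOD': '2002-01-01, 2003-12-31',
    'EVALUATION_PERIOD': '2004-01-01, 2005-12-31',
}
for start, end in check_periods(config_updates):
    print(start.date(), "to", end.date())
```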

In [None]:
# =============================================================================
# STEP 1: RAPID WORKFLOW SETUP FOR FLUXNET ENERGY BALANCE MODELING
# =============================================================================

# Import required libraries
import sys
import os
from pathlib import Path
import yaml
import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd
from datetime import datetime
import xarray as xr
import numpy as np

# Add CONFLUENCE to path
confluence_path = Path('../').resolve()
sys.path.append(str(confluence_path))

# Import main CONFLUENCE class
from CONFLUENCE import CONFLUENCE

# Set up plotting style
plt.style.use('default')
%matplotlib inline

print("=== CONFLUENCE Tutorial 01b: FLUXNET Energy Balance Validation ===")
print(f"Building on Tutorial 01a foundations for rapid workflow deployment")

# =============================================================================
# CONFIGURATION FOR CA-NS7 BOREAL FOREST SITE
# =============================================================================

print("\n🌿 Configuring for CA-NS7 Boreal Forest FLUXNET Site")

# Set directory paths
CONFLUENCE_CODE_DIR = confluence_path
CONFLUENCE_DATA_DIR = Path('/Users/darrieythorsson/compHydro/data/CONFLUENCE_data')  # ← Update this path

# Load template configuration and customize for FLUXNET site
config_template_path = CONFLUENCE_CODE_DIR / '0_config_files' / 'config_point_template.yaml'

with open(config_template_path, 'r') as f:
    config_dict = yaml.safe_load(f)

# Update for CA-NS7 boreal forest site
config_updates = {
    'CONFLUENCE_CODE_DIR': str(CONFLUENCE_CODE_DIR),
    'CONFLUENCE_DATA_DIR': str(CONFLUENCE_DATA_DIR),
    'DOMAIN_NAME': 'CA-NS7',
    'POUR_POINT_COORDS': '56.6358/-99.9483',  # CA-NS7 coordinates
    'DOWNLOAD_FLUXNET': 'true',
    'FLUXNET_STATION': 'CA-NS7',
    'EXPERIMENT_TIME_START': '2001-01-01 01:00',  # FLUXNET data availability
    'EXPERIMENT_TIME_END': '2005-12-31 23:00',
    'CALIBRATION_PERIOD': '2002-01-01, 2003-12-31',
    'EVALUATION_PERIOD': '2004-01-01, 2005-12-31',
    'SPINUP_PERIOD': '2001-01-01, 2001-12-31'
}

config_dict.update(config_updates)

# Add experiment metadata for traceability
config_dict['NOTEBOOK_CREATION_TIME'] = datetime.now().isoformat()
config_dict['NOTEBOOK_CREATOR'] = 'CONFLUENCE_Tutorial_01b'
config_dict['TARGET_PROCESSES'] = 'Energy balance, evapotranspiration, boreal forest dynamics'

# Save configuration
temp_config_path = CONFLUENCE_CODE_DIR / '0_config_files' / 'config_fluxnet_notebook.yaml'
with open(temp_config_path, 'w') as f:
    yaml.dump(config_dict, f, default_flow_style=False, sort_keys=False)

print(f"✅ Configuration saved: {temp_config_path}")

# =============================================================================
# SYSTEM INITIALIZATION AND PROJECT STRUCTURE
# =============================================================================

print("\n🏗️  Initializing CONFLUENCE System...")

# Initialize CONFLUENCE with FLUXNET configuration
confluence = CONFLUENCE(temp_config_path)

print(f"✅ System initialized with {len(confluence.managers)} managers")

# Create project structure
print(f"\n📁 Creating project structure for {config_dict['DOMAIN_NAME']}...")
project_dir = confluence.managers['project'].setup_project()
pour_point_path = confluence.managers['project'].create_pour_point()

print(f"✅ Project directory: {project_dir}")
print(f"✅ Pour point created: {pour_point_path}")

print(f"\n🔄 Workflow Status:")
workflow_status = confluence.workflow_orchestrator.get_workflow_status()
print(f"   Total steps: {workflow_status['total_steps']}")
print(f"   Completed: {workflow_status['completed_steps']}")

## Step 2: Geospatial Domain Setup 
Having established the geospatial domain definition principles in Tutorial 01a, we can now efficiently configure the spatial framework for our boreal forest FLUXNET site. The same point-scale approach applies, but the underlying geospatial characteristics reflect the distinct boreal ecosystem.
Geospatial Contrasts: CA-NS7 vs Paradise SNOTEL

- Elevation: 260 m (boreal lowland) vs 1,630 m (mountain transitional zone)
- Vegetation: Mature black spruce forest vs mixed coniferous/alpine vegetation
- Soils: Organic-rich boreal soils vs mineral mountain soils
- Climate: Continental boreal vs maritime-influenced mountain climate
- Drainage: Variable boreal drainage vs steep mountain topography

The same CONFLUENCE spatial framework handles both environments, which illustrates that the modeling approach transfers across ecosystems. Site-specific physical characteristics (elevation, soils, land cover) are captured through the attribute acquisition step rather than through changes to the framework.
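The verification step below reports the HRU area in degree², which is not a fixed area because a degree of longitude shrinks with the cosine of latitude. The sketch below converts a latitude/longitude box to km² on a spherical Earth. It assumes a mean radius of 6371 km, which is an approximation. The 0.01° box around the CA-NS7 coordinates is hypothetical and chosen only for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def latlon_box_area_km2(lat_min, lat_max, lon_min, lon_max):
    """Area of a latitude/longitude box on a sphere:
    A = R^2 * (lon_max - lon_min in radians) * (sin(lat_max) - sin(lat_min))."""
    dlon = math.radians(lon_max - lon_min)
    dsin = math.sin(math.radians(lat_max)) - math.sin(math.radians(lat_min))
    return EARTH_RADIUS_KM ** 2 * dlon * dsin


# Hypothetical 0.01 deg x 0.01 deg box centred on the CA-NS7 coordinates
lat, lon, half = 56.6358, -99.9483, 0.005
area = latlon_box_area_km2(lat - half, lat + half, lon - half, lon + half)
print(f"0.0001 degree² at {lat}°N ≈ {area:.3f} km²")
```

With geopandas, the equivalent approach is to reproject to a projected or equal-area CRS (`GeoDataFrame.to_crs`) before calling `.area`.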

In [None]:
# =============================================================================
# STEP 2: STREAMLINED GEOSPATIAL DOMAIN SETUP FOR BOREAL FOREST SITE
# =============================================================================

print("=== Step 2: Geospatial Domain Setup for Boreal Forest Energy Balance ===")
print("Efficient spatial configuration leveraging Tutorial 01a foundations")

# =============================================================================
# RAPID ATTRIBUTE ACQUISITION FOR BOREAL FOREST CHARACTERISTICS
# =============================================================================

print(f"\n🌲 Acquiring Boreal Forest Geospatial Characteristics...")
print(f"   Location: CA-NS7 ({config_dict['POUR_POINT_COORDS']})")
print(f"   Bounding box: {config_dict.get('BOUNDING_BOX_COORDS', 'Auto-generated for point-scale')}")

print(f"\n📊 Expected Boreal Forest Attributes:")
boreal_attributes = [
    "Low elevation (~260m) with minimal topographic complexity",
    "Organic-rich soils with seasonal freeze-thaw dynamics",
    "Mature coniferous forest (Picea mariana dominated)", 
    "Continental climate with pronounced seasonal temperature range",
    "Variable drainage conditions typical of boreal landscapes"
]

for attr in boreal_attributes:
    print(f"   🌿 {attr}")

# Attribute acquisition (uncomment the call below to download attributes)
print(f"\n⬇️  Geospatial attribute acquisition...")
#confluence.managers['data'].acquire_attributes()
print("⏭️  Attribute acquisition skipped (call is commented out)")

# =============================================================================
# DOMAIN DELINEATION AND DISCRETIZATION FOR POINT-SCALE FLUX TOWER
# =============================================================================

print(f"\n🎯 Point-Scale Domain Configuration for Flux Tower Footprint...")

# Domain delineation (single GRU representing flux tower footprint)
print(f"   Creating computational boundary representing flux tower footprint...")
watershed_path = confluence.managers['domain'].define_domain()

# Domain discretization (single HRU for energy balance modeling)  
print(f"   Creating single HRU for energy balance simulation...")
hru_path = confluence.managers['domain'].discretize_domain()

print(f"✅ Spatial domain configuration complete")

# =============================================================================
# VERIFICATION AND BOREAL FOREST CHARACTERIZATION
# =============================================================================

print(f"\n🔍 Verifying Boreal Forest Domain Characteristics...")

# Verify HRU creation and inspect characteristics
if hru_path and Path(hru_path).exists():
    hru_gdf = gpd.read_file(hru_path)
    
    print(f"\n📋 Spatial Domain Summary:")
    print(f"   Number of HRUs: {len(hru_gdf)} (point-scale representation)")
    print(f"   Domain area: {hru_gdf.geometry.area.sum():.6f} degree²")
    print(f"   Centroid: ({hru_gdf.geometry.centroid.x.iloc[0]:.6f}, {hru_gdf.geometry.centroid.y.iloc[0]:.6f})")
    
    # Display representative characteristics if available
    if 'elevation' in hru_gdf.columns:
        print(f"   Elevation: {hru_gdf['elevation'].iloc[0]:.1f} m")
    if 'landclass' in hru_gdf.columns:
        print(f"   Dominant land cover: {hru_gdf['landclass'].iloc[0]}")
    if 'soilclass' in hru_gdf.columns:
        print(f"   Soil classification: {hru_gdf['soilclass'].iloc[0]}")
    
    print(f"\n🌲 Boreal Forest Spatial Context:")
    print(f"   → Single HRU represents flux tower measurement footprint")
    print(f"   → Uniform characteristics assumption appropriate for homogeneous forest")
    print(f"   → Contrasts with distributed modeling where spatial heterogeneity matters")
    
else:
    print("⚠️  HRU verification failed - check domain discretization")


## Step 3: Data Pipeline
Leveraging the model-agnostic preprocessing concepts established in Tutorial 01a, we can now efficiently prepare the data pipeline for boreal forest energy balance modeling. The same standardized framework seamlessly handles the transition from snow/soil validation to energy flux evaluation, demonstrating CONFLUENCE's versatility across diverse validation objectives.
Data Pipeline Adaptation: SNOTEL → FLUXNET

- Forcing Data: Same ERA5 global reanalysis, different coordinates and period
- Validation Targets: Energy fluxes (LE, H, Rn, G) vs snow/soil states (SWE, SM)
- Temporal Focus: Sub-daily energy cycles vs seasonal snow dynamics
- Ecosystem Context: Boreal forest processes vs mountain snow processes
- Preprocessing Benefits: Same quality-controlled, standardized pipeline serves both applications

This illustrates the core of CONFLUENCE's model-agnostic philosophy: because every site passes through the same data preparation, differences between sites, ecosystems, and validation targets can be attributed to the processes themselves rather than to inconsistent preprocessing.
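FLUXNET half-hourly files resolve the sub-daily cycle, and one quick way to inspect it is to average fluxes by time of day. The following is a minimal pandas sketch using a hypothetical helper, `mean_diurnal_cycle`. The latent heat values are synthetic, not CA-NS7 data.

```python
import numpy as np
import pandas as pd


def mean_diurnal_cycle(series):
    """Average a half-hourly flux series by time of day (0.0, 0.5, ..., 23.5 h)."""
    time_of_day = np.asarray(series.index.hour + series.index.minute / 60.0)
    return series.groupby(time_of_day).mean()


# Synthetic two days of half-hourly LE (W m-2): zero at night and a daytime
# hump peaking at 12:00. Values are illustrative, not CA-NS7 observations.
index = pd.date_range('2004-07-01', periods=96, freq='30min')
hours = (index.hour + index.minute / 60.0).to_numpy()
le = pd.Series(np.clip(200.0 * np.sin(np.pi * (hours - 6.0) / 12.0), 0.0, None), index=index)

cycle = mean_diurnal_cycle(le)
print(cycle.idxmax(), cycle.max())
```

The evaluation later in this notebook compares daily means, which averages this cycle away. A diurnal composite like this is the natural next step for sub-daily evaluation.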

In [None]:
print("=== Step 3: Data Pipeline ===")

# =============================================================================
# OBSERVATIONAL DATA: FLUXNET ENERGY BALANCE MEASUREMENTS
# =============================================================================

# Execute observational data processing
print(f"\n📥 Processing FLUXNET observational datasets...")
confluence.managers['data'].process_observed_data()
print("✅ FLUXNET data processing complete")

print(f"\n🌦️  Acquiring ERA5 Forcing for Boreal Forest Site...")
print(f"   Location: {config_dict['POUR_POINT_COORDS']} (northern Manitoba)")
print(f"   Period: {config_dict['EXPERIMENT_TIME_START']} to {config_dict['EXPERIMENT_TIME_END']}")
print(f"   Climate context: Continental boreal with pronounced seasonality")

# Forcing acquisition (uncomment the call below to download ERA5 data)
print(f"\n⬇️  Forcing data acquisition...")
# confluence.managers['data'].acquire_forcings()
print("⏭️  ERA5 forcing acquisition skipped (call is commented out)")

print(f"\n🔧 Model-Agnostic Preprocessing Pipeline...")
print(f"   Same framework as Tutorial 01a, different validation targets")

# Model-agnostic preprocessing (uncomment the call below to run it)
print(f"\n⚙️  Model-agnostic preprocessing...")
#confluence.managers['data'].run_model_agnostic_preprocessing()
print("⏭️  Model-agnostic preprocessing skipped (call is commented out)")

print(f"\n🌿 SUMMA-Specific Configuration for Boreal Forest Energy Balance...")

# Execute model-specific preprocessing
print(f"\n🔧 Executing SUMMA-specific preprocessing...")
confluence.managers['model'].preprocess_models()
print("✅ SUMMA configuration complete")

## Step 4: Model Execution 
Building on the detailed model instantiation concepts from Tutorial 01a, we can now efficiently execute the energy balance simulation. The same SUMMA process-based physics applies, but with emphasis on land-atmosphere energy exchange rather than snow accumulation and soil moisture dynamics.


The same workflow orchestration ensures reproducible and comparable simulations across both applications.

In [None]:
print("=== Step 4: Energy Balance Simulation Execution ===")

# Execute the model
print(f"\n🏃‍♂️ Running SUMMA energy balance simulation...")
confluence.managers['model'].run_models()
print("✅ SUMMA simulation complete")

## Step 5: ET Process Validation
Building on the comprehensive evaluation framework established in Tutorial 01a, we now focus on energy flux validation using FLUXNET observations. The same scientific evaluation principles apply, but with emphasis on land-atmosphere energy exchange rather than snow/soil state variables.
Evaluation Framework Adaptation: Snow/Soil → Energy Balance

- Validation Targets: Latent heat (LE), sensible heat (H), net radiation (Rn) vs SWE, soil moisture
- Process Focus: Evapotranspiration partitioning vs snow accumulation/melt dynamics
- Temporal Scales: Sub-daily energy cycles vs seasonal snow evolution
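- Performance Metrics: RMSE, bias, MAE, correlation, and Nash-Sutcliffe efficiency (NSE) for daily ET, plus the energy balance closure ratio (LE + H)/(Rn - G) of the observations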
- Physical Interpretation: Stomatal conductance and canopy processes vs snow physics
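The evaluation cell scores daily ET with RMSE, bias, MAE, Pearson correlation, and NSE, and it computes a per-time-step closure ratio for the observations. The sketch below reproduces the same metrics. It also adds a bulk closure ratio that sums fluxes before dividing, a common alternative to per-time-step ratios. The helper names and toy values are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def paired_metrics(obs, sim):
    """Skill scores computed only on dates where both series have data."""
    # concat aligns on the index; dropna keeps the dates present in both series
    pair = pd.concat({'obs': obs, 'sim': sim}, axis=1).dropna()
    err = pair['sim'] - pair['obs']
    obs_ss = ((pair['obs'] - pair['obs'].mean()) ** 2).sum()  # NSE denominator
    return {
        'n': len(pair),
        'rmse': float(np.sqrt((err ** 2).mean())),
        'bias': float(err.mean()),                  # positive = model too high
        'mae': float(err.abs().mean()),
        'r': float(pair['obs'].corr(pair['sim'])),  # NaN if either series is constant
        'nse': float(1.0 - (err ** 2).sum() / obs_ss) if obs_ss > 0 else float('nan'),
    }


def energy_balance_ratio(le, h, rn, g):
    """Bulk closure sum(LE + H) / sum(Rn - G) over time steps with all four fluxes.

    Summing before dividing avoids the near-zero night-time denominators
    that make a per-time-step ratio unstable.
    """
    df = pd.concat({'le': le, 'h': h, 'rn': rn, 'g': g}, axis=1).dropna()
    return float((df['le'] + df['h']).sum() / (df['rn'] - df['g']).sum())


# Toy values for illustration only (not CA-NS7 data)
days = pd.date_range('2004-06-01', periods=4, freq='D')
obs = pd.Series([1.0, 2.0, 3.0, np.nan], index=days)
sim = pd.Series([1.5, 2.0, 2.5, 4.0], index=days)
print(paired_metrics(obs, sim))

steps = pd.date_range('2004-06-01 12:00', periods=2, freq='30min')
print(energy_balance_ratio(pd.Series([100.0, 50.0], index=steps),
                           pd.Series([80.0, 20.0], index=steps),
                           pd.Series([250.0, 100.0], index=steps),
                           pd.Series([20.0, 10.0], index=steps)))
```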

In [None]:
print("=== Step 5: ET Process Evaluation  ===")

def process_fluxnet_data_inline(domain_name, data_dir):
    """Process raw FLUXNET data into standardized format for CONFLUENCE """
    
    # Set up paths
    data_dir = Path(data_dir)
    domain_dir = data_dir / f"domain_{domain_name}"
    raw_fluxnet_dir = domain_dir / "observations" / "fluxnet" / "raw_data"
    processed_dir = domain_dir / "observations" / "energy_fluxes" / "fluxnet" / "processed"
    
    # Create processed directory if it doesn't exist
    processed_dir.mkdir(parents=True, exist_ok=True)
    
    print(f"🔄 Processing FLUXNET data for domain: {domain_name}")
    print(f"   Raw data directory: {raw_fluxnet_dir}")
    print(f"   Output directory: {processed_dir}")
    
    # Find FLUXNET files
    fluxnet_files = list(raw_fluxnet_dir.glob(f"FLX_{domain_name}_FLUXNET2015_FULLSET_*.csv"))
    
    if not fluxnet_files:
        print(f"❌ No FLUXNET files found in {raw_fluxnet_dir}")
        return False
    
    print(f"   Found {len(fluxnet_files)} FLUXNET files")
    
    # Process halfhourly data (most detailed for energy balance)
    hh_files = [f for f in fluxnet_files if "_HH_" in f.name]
    
    if not hh_files:
        print("❌ No halfhourly (_HH_) files found")
        return False
    
    file_path = hh_files[0]  # Use first halfhourly file
    print(f"   Processing: {file_path.name}")
    
    try:
        # Read the CSV file
        df = pd.read_csv(file_path)
        print(f"   Loaded {len(df)} rows, {len(df.columns)} columns")
        
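        # Create timestamp from TIMESTAMP_START (format YYYYMMDDHHMM)
        df['timestamp'] = pd.to_datetime(df['TIMESTAMP_START'].astype(str), format='%Y%m%d%H%M', errors='coerce')
        df = df.dropna(subset=['timestamp'])
        
        # FLUXNET2015 files name net radiation NETRAD; map it to the RNET name used below
        if 'NETRAD' in df.columns and 'RNET' not in df.columns:
            df = df.rename(columns={'NETRAD': 'RNET'})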
        
        if len(df) == 0:
            print("❌ No valid timestamps found")
            return False
        
        print(f"   Date range: {df['timestamp'].min()} to {df['timestamp'].max()}")
        
        # Key FLUXNET variables for energy balance
        key_variables = {
            'LE_F_MDS': 'Latent heat flux (gap-filled)',
            'H_F_MDS': 'Sensible heat flux (gap-filled)', 
            'RNET': 'Net radiation',
            'G_F_MDS': 'Ground heat flux (gap-filled)',
            'LE_F_MDS_QC': 'LE quality flag',
            'H_F_MDS_QC': 'H quality flag',
            'TA_F_MDS': 'Air temperature (gap-filled)',
            'PA_F': 'Atmospheric pressure',
            'WS_F': 'Wind speed',
            'RH': 'Relative humidity',
            'VPD_F_MDS': 'Vapor pressure deficit (gap-filled)',
            'SW_IN_F_MDS': 'Incoming shortwave radiation (gap-filled)',
            'P_F': 'Precipitation (gap-filled)',
        }
        
        # Select available variables
        available_vars = ['timestamp']
        for var in key_variables.keys():
            if var in df.columns:
                available_vars.append(var)
        
        print(f"   Available energy variables: {[v for v in available_vars if v != 'timestamp']}")
        
        # Create subset with available variables
        processed_df = df[available_vars].copy()
        
        # Replace FLUXNET missing value codes with NaN (matches both int and float forms)
        processed_df = processed_df.replace([-9999, -6999], np.nan)
        
        # Convert LE (W/m²) to ET (mm/day): 86400 s/day ÷ λ (≈2.45e6 J/kg) ≈ 0.0353
        if 'LE_F_MDS' in processed_df.columns:
            processed_df['ET_from_LE_mm_per_day'] = processed_df['LE_F_MDS'] * 0.0353
            print("   ✅ Created ET_from_LE_mm_per_day (conversion factor: 0.0353)")
        
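        # Energy balance closure ratio (LE + H) / (Rn - G) per time step.
        # Night-time denominators near zero make single values unstable, so
        # infinities are set to NaN; aggregate before interpreting closure.
        energy_components = ['LE_F_MDS', 'H_F_MDS', 'G_F_MDS', 'RNET']
        if all(var in processed_df.columns for var in energy_components):
            available_energy = processed_df['RNET'] - processed_df['G_F_MDS']
            closure = (processed_df['LE_F_MDS'] + processed_df['H_F_MDS']) / available_energy
            processed_df['ENERGY_CLOSURE'] = closure.replace([np.inf, -np.inf], np.nan)
            print("   ✅ Calculated energy balance closure ratio")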
        
        # Save processed data
        output_file = processed_dir / f"{domain_name}_fluxnet_processed.csv"
        processed_df.to_csv(output_file, index=False)
        print(f"   💾 Saved: {output_file}")
        
        # Print data quality summary
        if 'LE_F_MDS' in processed_df.columns:
            valid_le = processed_df['LE_F_MDS'].notna().sum()
            print(f"   📊 Valid LE observations: {valid_le}/{len(processed_df)} ({100*valid_le/len(processed_df):.1f}%)")
        
        if 'ET_from_LE_mm_per_day' in processed_df.columns:
            et_stats = processed_df['ET_from_LE_mm_per_day'].describe()
            print(f"   📊 ET range: {et_stats['min']:.2f} to {et_stats['max']:.2f} mm/day (mean: {et_stats['mean']:.2f})")
        
        return True
        
    except Exception as e:
        print(f"   ❌ Error processing {file_path.name}: {str(e)}")
        return False

# =============================================================================
# CHECK AND PROCESS FLUXNET DATA
# =============================================================================

print("\n🔧 Checking and Processing FLUXNET Data...")

# Check if processed FLUXNET data exists
fluxnet_processed_path = confluence.project_dir / "observations" / "energy_fluxes" / "fluxnet" / "processed" / f"{config_dict['DOMAIN_NAME']}_fluxnet_processed.csv"

if not fluxnet_processed_path.exists():
    print("⚠️  Processed FLUXNET data not found. Processing raw data...")
    
    # Process the data using our inline function
    success = process_fluxnet_data_inline(config_dict['DOMAIN_NAME'], str(CONFLUENCE_DATA_DIR))
    
    if success:
        print("✅ FLUXNET data processed successfully")
    else:
        print("❌ FLUXNET processing failed")
else:
    print("✅ Processed FLUXNET data already exists")

# =============================================================================
# SIMULATION DATA LOADING WITH PROPER DATE FILTERING
# =============================================================================

print(f"\n⚡ Loading Energy Balance Simulation Results...")

# Load simulation data with proper filtering
sim_dir = confluence.project_dir / "simulations" / config_dict['EXPERIMENT_ID'] / "SUMMA"

# Try different possible output files
output_files = [
    sim_dir / f"{config_dict['EXPERIMENT_ID']}_day.nc",
    sim_dir / f"{config_dict['DOMAIN_NAME']}_day.nc", 
    sim_dir / "day.nc"
]

# Find the first existing output file
sim_file = next((path for path in output_files if path.exists()), None)

if sim_file is None:
    raise FileNotFoundError(
        "No SUMMA daily output found. Checked:\n" + "\n".join(f"   {p}" for p in output_files)
    )

# Load simulation data
ds = xr.open_dataset(sim_file)
print(f"✅ Loaded simulation data from: {sim_file}")
print(f"   Full period: {ds.time.min().values} to {ds.time.max().values}")

# Check data structure
print(f"   Dataset dimensions: {dict(ds.sizes)}")
print(f"   Coordinate variables: {list(ds.coords.keys())}")

# Filter to experiment period (label-based slice on the time coordinate)
start_date = pd.to_datetime(config_dict['EXPERIMENT_TIME_START'])
end_date = pd.to_datetime(config_dict['EXPERIMENT_TIME_END'])
evaluation_data = ds.sel(time=slice(start_date, end_date))

if evaluation_data.sizes['time'] == 0:
    print(f"⚠️  No data in experiment period. Using full dataset for demonstration.")
    evaluation_data = ds

print(f"   Evaluation period: {evaluation_data.time.min().values} to {evaluation_data.time.max().values}")

# Identify available energy balance variables
energy_variables = {
    'scalarLatHeatTotal': 'Latent heat flux (LE) - Evapotranspiration energy',
    'scalarSenHeatTotal': 'Sensible heat flux (H) - Convective energy transfer',
    'scalarNetRadiation': 'Net radiation (Rn) - Available energy',
    'scalarGroundHeatFlux': 'Ground heat flux (G) - Soil energy storage'
}

available_energy_vars = {var: desc for var, desc in energy_variables.items() 
                       if var in evaluation_data.data_vars}

print(f"\n📊 Available Energy Balance Variables:")
for var, desc in available_energy_vars.items():
    print(f"   ⚡ {var}: {desc}")

# ET component variables for detailed analysis
et_components = {
    'scalarTotalET': 'Total evapotranspiration',
    'scalarCanopyTranspiration': 'Plant transpiration',
    'scalarCanopyEvaporation': 'Canopy interception evaporation', 
    'scalarGroundEvaporation': 'Soil surface evaporation',
    'scalarCanopySublimation': 'Canopy sublimation',
    'scalarSnowSublimation': 'Snow sublimation'
}

available_et_components = {var: desc for var, desc in et_components.items()
                         if var in evaluation_data.data_vars}

print(f"\n🌿 Available ET Component Variables:")
for var, desc in available_et_components.items():
    print(f"   🍃 {var}: {desc}")

# =============================================================================
# FLUXNET OBSERVATION DATA LOADING
# =============================================================================

print(f"\n📊 Loading FLUXNET Energy Flux Observations...")

# Load processed FLUXNET data
if fluxnet_processed_path.exists():
    fluxnet_df = pd.read_csv(fluxnet_processed_path)
    fluxnet_df['timestamp'] = pd.to_datetime(fluxnet_df['timestamp'])
    fluxnet_df.set_index('timestamp', inplace=True)
    
    print(f"✅ FLUXNET data loaded")
    print(f"   Records: {len(fluxnet_df)}")
    print(f"   Period: {fluxnet_df.index.min()} to {fluxnet_df.index.max()}")
    
    # Show key available variables (excluding QC flags)
    key_vars = [col for col in fluxnet_df.columns if not col.endswith('_QC') and col in 
                ['LE_F_MDS', 'H_F_MDS', 'RNET', 'G_F_MDS', 'ET_from_LE_mm_per_day', 'TA_F_MDS', 'VPD_F_MDS']]
    print(f"   Key variables: {key_vars}")
    
    # Check data quality
    if 'ET_from_LE_mm_per_day' in fluxnet_df.columns:
        et_valid = fluxnet_df['ET_from_LE_mm_per_day'].notna().sum()
        print(f"   Valid ET observations: {et_valid}/{len(fluxnet_df)} ({100*et_valid/len(fluxnet_df):.1f}%)")
        
        et_stats = fluxnet_df['ET_from_LE_mm_per_day'].describe()
        print(f"   ET range: {et_stats['min']:.2f} to {et_stats['max']:.2f} mm/day (mean: {et_stats['mean']:.2f})")
    
else:
    print(f"⚠️  FLUXNET data still not found at {fluxnet_processed_path}")
    print("   Proceeding with simulation-only analysis")
    fluxnet_df = None

# =============================================================================
# ENERGY BALANCE EVALUATION: LATENT HEAT FLUX (ET)
# =============================================================================

print(f"\n🌿 Latent Heat Flux (Evapotranspiration) Evaluation...")

# Check for ET data in simulation
et_var = None
conversion_factor = None

if 'scalarLatHeatTotal' in evaluation_data.data_vars:
    et_var = 'scalarLatHeatTotal'
    conversion_factor = 0.0353  # W/m² to mm/day
    print(f"   Using {et_var} (W/m²) → converted to mm/day")
elif 'scalarTotalET' in evaluation_data.data_vars:
    et_var = 'scalarTotalET'
    conversion_factor = 86400  # kg m-2 s-1 to mm/day  
    print(f"   Using {et_var} (kg m-2 s-1) → converted to mm/day")

if et_var and fluxnet_df is not None and 'ET_from_LE_mm_per_day' in fluxnet_df.columns:
    
    # Extract simulated ET and convert units
    sim_et_xr = evaluation_data[et_var]
    
    # If multi-dimensional, take spatial mean first
    if len(sim_et_xr.dims) > 1:
        spatial_dims = [dim for dim in sim_et_xr.dims if dim != 'time']
        sim_et_xr = sim_et_xr.mean(dim=spatial_dims)
        print(f"   📍 Averaged over spatial dimensions: {spatial_dims}")
    
    # Convert to pandas Series
    sim_et_raw = sim_et_xr.to_pandas()
    
    # Handle negative values (SUMMA convention: negative = leaving system)
    median_val = sim_et_raw.median()
    if median_val < 0:
        sim_et_raw = -sim_et_raw
        print(f"   ⚡ Inverted sign for {et_var} (negative values indicate water leaving system)")
    
    sim_et_mm_day = sim_et_raw * conversion_factor
    
    print(f"   ✅ SUMMA ET extracted and converted")
    print(f"   Raw range: {sim_et_raw.min():.3f} to {sim_et_raw.max():.3f}")
    print(f"   ET range: {sim_et_mm_day.min():.2f} to {sim_et_mm_day.max():.2f} mm/day")
    
    # Find common period and align data
    common_start = max(fluxnet_df.index.min(), sim_et_mm_day.index.min())
    common_end = min(fluxnet_df.index.max(), sim_et_mm_day.index.max())
    
    print(f"\n🔄 Data Alignment:")
    print(f"   FLUXNET period: {fluxnet_df.index.min()} to {fluxnet_df.index.max()}")
    print(f"   SUMMA period: {sim_et_mm_day.index.min()} to {sim_et_mm_day.index.max()}")
    print(f"   Common period: {common_start} to {common_end}")
    print(f"   Duration: {(common_end - common_start).days} days")
    
    # Resample to daily and filter to common period
    obs_daily = fluxnet_df['ET_from_LE_mm_per_day'].resample('D').mean().loc[common_start:common_end]
    sim_daily = sim_et_mm_day.resample('D').mean().loc[common_start:common_end]
    
    # Pair the series on shared dates and drop days where either is missing
    paired = pd.concat({'obs': obs_daily, 'sim': sim_daily}, axis=1).dropna()
    obs_valid = paired['obs']
    sim_valid = paired['sim']
    
    print(f"   Valid paired observations: {len(obs_valid)} days")
    
    if len(obs_valid) > 10:  # Need minimum data for meaningful analysis
        
        # Calculate performance metrics
        print(f"\n📊 Evapotranspiration Performance Metrics:")
        
        rmse = np.sqrt(((obs_valid - sim_valid) ** 2).mean())
        bias = (sim_valid - obs_valid).mean()
        mae = np.abs(obs_valid - sim_valid).mean()
        
        # Handle correlation calculation
        try:
            corr = obs_valid.corr(sim_valid)
            if pd.isna(corr):
                corr = 0.0
        except Exception:
            corr = 0.0
        
        # Nash-Sutcliffe Efficiency
        if obs_valid.var() > 0:
            nse = 1 - ((obs_valid - sim_valid) ** 2).sum() / ((obs_valid - obs_valid.mean()) ** 2).sum()
        else:
            nse = np.nan
        
        print(f"   📈 RMSE: {rmse:.2f} mm/day")
        print(f"   📈 Bias: {bias:+.2f} mm/day")
        print(f"   📈 MAE: {mae:.2f} mm/day") 
        print(f"   📈 Correlation: {corr:.3f}")
        print(f"   📈 Nash-Sutcliffe Efficiency: {nse:.3f}")
        
        # Seasonal analysis using meteorological seasons (DJF, MAM, JJA, SON)
        print(f"\n🗓️ Seasonal ET Performance:")
        season_map = {12: 'Winter', 1: 'Winter', 2: 'Winter',
                      3: 'Spring', 4: 'Spring', 5: 'Spring',
                      6: 'Summer', 7: 'Summer', 8: 'Summer',
                      9: 'Fall', 10: 'Fall', 11: 'Fall'}
        seasonal_data = pd.DataFrame({
            'obs': obs_valid,
            'sim': sim_valid,
            'season': obs_valid.index.month.map(season_map)
        })
        
        for label in ['Winter', 'Spring', 'Summer', 'Fall']:
            subset = seasonal_data[seasonal_data['season'] == label]
            if subset.empty:
                continue
            obs_mean = subset['obs'].mean()
            sim_mean = subset['sim'].mean()
            # Correlation needs a handful of points to be meaningful
            season_corr = subset['obs'].corr(subset['sim']) if len(subset) > 3 else np.nan
            print(f"   {label:6s}: Obs={obs_mean:.2f}, Sim={sim_mean:.2f} mm/day, "
                  f"Bias={sim_mean - obs_mean:+.2f}, r={season_corr:.3f}")
        
        # Create comprehensive ET visualization
        print(f"\n📈 Creating ET comparison visualization...")
        
        fig, axes = plt.subplots(2, 2, figsize=(15, 10))
        
        # Time series comparison
        ax1 = axes[0, 0]
        obs_plot = obs_daily.dropna()
        sim_plot = sim_daily.dropna()
        
        ax1.plot(obs_plot.index, obs_plot.values, 'o-', label='FLUXNET ET', 
                 color='blue', alpha=0.7, markersize=1, linewidth=1)
        ax1.plot(sim_plot.index, sim_plot.values, '-', label='SUMMA ET', 
                 color='red', linewidth=2)
        ax1.set_title('Evapotranspiration Time Series', fontweight='bold')
        ax1.set_ylabel('ET (mm/day)')
        ax1.legend()
        ax1.grid(True, alpha=0.3)
        
        # Scatter plot
        ax2 = axes[0, 1]
        ax2.scatter(obs_valid, sim_valid, alpha=0.6, c='green', s=20)
        max_val = max(obs_valid.max(), sim_valid.max())
        min_val = min(obs_valid.min(), sim_valid.min())
        ax2.plot([min_val, max_val], [min_val, max_val], 'k--', label='1:1 line')
        ax2.set_xlabel('FLUXNET ET (mm/day)')
        ax2.set_ylabel('SUMMA ET (mm/day)')
        ax2.set_title('Observed vs. Simulated ET', fontweight='bold')
        ax2.legend()
        ax2.grid(True, alpha=0.3)
        
        # Add metrics text
        metrics_text = f'r = {corr:.3f}\nRMSE = {rmse:.2f}\nBias = {bias:+.2f}'
        ax2.text(0.05, 0.95, metrics_text, transform=ax2.transAxes,
                 bbox=dict(facecolor='white', alpha=0.8), fontsize=10, verticalalignment='top')
        
        # Monthly climatology
        ax3 = axes[1, 0]
        monthly_obs = obs_valid.groupby(obs_valid.index.month).mean()
        monthly_sim = sim_valid.groupby(sim_valid.index.month).mean()
        months = range(1, 13)
        month_names = ['J', 'F', 'M', 'A', 'M', 'J', 'J', 'A', 'S', 'O', 'N', 'D']
        
        # Reindex to all 12 months so months without data plot as gaps (NaN)
        full_monthly_obs = monthly_obs.reindex(list(months))
        full_monthly_sim = monthly_sim.reindex(list(months))
        
        ax3.plot(months, full_monthly_obs, 'o-', label='FLUXNET', color='blue', linewidth=2)
        ax3.plot(months, full_monthly_sim, 'o-', label='SUMMA', color='red', linewidth=2)
        ax3.set_xticks(months)
        ax3.set_xticklabels(month_names)
        ax3.set_ylabel('ET (mm/day)')
        ax3.set_title('Monthly ET Climatology', fontweight='bold')
        ax3.legend()
        ax3.grid(True, alpha=0.3)
        
        # Residuals
        ax4 = axes[1, 1]
        residuals = sim_valid - obs_valid
        ax4.scatter(obs_valid.index, residuals, alpha=0.6, c='purple', s=15)
        ax4.axhline(y=0, color='black', linestyle='-', alpha=0.5)
        if residuals.std() > 0:
            ax4.axhline(y=residuals.std(), color='red', linestyle='--', alpha=0.5, label='+1σ')
            ax4.axhline(y=-residuals.std(), color='red', linestyle='--', alpha=0.5, label='-1σ')
            ax4.legend()
        ax4.set_ylabel('Residuals (mm/day)')
        ax4.set_title('Model Residuals', fontweight='bold')
        ax4.grid(True, alpha=0.3)
        
        plt.suptitle(f'Evapotranspiration Evaluation - {config_dict["DOMAIN_NAME"]} Boreal Forest', 
                     fontsize=14, fontweight='bold')
        plt.tight_layout()
        plt.show()
        
    else:
        print(f"⚠️  Insufficient overlapping data for analysis ({len(obs_valid)} days)")
        print("   Need at least 10 days of valid paired observations")

else:
    print("⚠️  Cannot perform ET evaluation:")
    if et_var is None:
        print("   - No suitable ET variable found in simulation")
        print(f"   - Available variables: {list(evaluation_data.data_vars.keys())[:10]}...")
    if fluxnet_df is None:
        print("   - No FLUXNET observation data available")
    if fluxnet_df is not None and 'ET_from_LE_mm_per_day' not in fluxnet_df.columns:
        print("   - No ET data in FLUXNET observations")
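# -----------------------------------------------------------------------------
# Optional sketch (not part of the original workflow): the skill metrics used
# in the evaluation block above, wrapped in a reusable function so they can be
# applied to other sites or variables. `et_skill_metrics` is a hypothetical
# helper name; it assumes two pandas Series of daily ET (mm/day) that share a
# DatetimeIndex, and mirrors the RMSE, bias, MAE, Pearson r and NSE formulas
# computed above.
# -----------------------------------------------------------------------------
import numpy as np
import pandas as pd


def et_skill_metrics(obs: pd.Series, sim: pd.Series) -> dict:
    """Return RMSE, bias, MAE, Pearson r and NSE for paired obs/sim series."""
    # Align on the shared index and keep only days where both values exist
    paired = pd.concat({'obs': obs, 'sim': sim}, axis=1).dropna()
    o, s = paired['obs'], paired['sim']
    err = s - o
    # NSE denominator: variance of observations around their mean
    denom = ((o - o.mean()) ** 2).sum()
    return {
        'rmse': float(np.sqrt((err ** 2).mean())),
        'bias': float(err.mean()),  # positive = model overestimates ET
        'mae': float(err.abs().mean()),
        'r': float(o.corr(s)) if len(paired) > 1 else np.nan,
        'nse': float(1 - (err ** 2).sum() / denom) if denom > 0 else np.nan,
        'n': len(paired),
    }

# Usage: et_skill_metrics(obs_daily, sim_daily)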

# =============================================================================
# ET COMPONENT ANALYSIS (IF AVAILABLE)
# =============================================================================
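# -----------------------------------------------------------------------------
# Optional sketch (not part of the original workflow): the unit conversions
# behind these ET comparisons. One kilogram of water spread over 1 m^2 forms a
# 1 mm layer, so a SUMMA mass flux in kg m-2 s-1 becomes mm/day after
# multiplying by 86400 s/day. Latent heat flux LE (W m-2 = J m-2 s-1) becomes a
# water flux after dividing by the latent heat of vaporization; here it is
# assumed constant at 2.45e6 J kg-1 (roughly its value near 20 °C), which may
# differ from what the FLUXNET preprocessing earlier in this notebook used.
# The constant and function names below are hypothetical.
# -----------------------------------------------------------------------------
import pandas as pd

SECONDS_PER_DAY = 86400.0
LAMBDA_V = 2.45e6  # J kg-1, assumed constant latent heat of vaporization


def mass_flux_to_mm_day(flux: pd.Series, flip_if_negative: bool = True) -> pd.Series:
    """Convert a water flux in kg m-2 s-1 to mm/day (1 kg m-2 of water == 1 mm)."""
    # Same heuristic as the notebook: a negative median means the series treats
    # water leaving the surface as negative, so flip it to make ET positive
    if flip_if_negative and flux.median() < 0:
        flux = -flux
    return flux * SECONDS_PER_DAY


def le_to_et_mm_day(le_w_m2, lambda_v: float = LAMBDA_V):
    """Convert latent heat flux (W m-2) to evapotranspiration (mm/day)."""
    # (J m-2 s-1) / (J kg-1) = kg m-2 s-1 = mm/s, then scale seconds to days
    return le_w_m2 / lambda_v * SECONDS_PER_DAY

# Usage: le_to_et_mm_day(100.0) gives about 3.53 mm/day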

if available_et_components and len(available_et_components) > 1:
    print(f"\n🍃 ET Component Process Analysis...")
    
    # Extract and convert ET components
    et_comp_data = {}
    for comp_var, description in available_et_components.items():
        comp_xr = evaluation_data[comp_var]
        
        # If multi-dimensional, take spatial mean first
        if len(comp_xr.dims) > 1:
            spatial_dims = [dim for dim in comp_xr.dims if dim != 'time']
            comp_xr = comp_xr.mean(dim=spatial_dims)
        
        # Convert to pandas Series
        comp_ts = comp_xr.to_pandas()
        
        # Flip sign if fluxes are mostly negative (negative values indicate water leaving the land surface)
        median_val = comp_ts.median()
        if median_val < 0:
            comp_ts = -comp_ts
        
        # Convert from kg m-2 s-1 to mm/day
        comp_ts_mm_day = comp_ts * 86400
        et_comp_data[comp_var] = comp_ts_mm_day
        
        print(f"   🌱 {comp_var}: {comp_ts_mm_day.mean():.3f} ± {comp_ts_mm_day.std():.3f} mm/day")
    
    # Create component visualization
    print(f"\n📊 Creating ET component analysis...")
    
    fig, axes = plt.subplots(2, 1, figsize=(14, 8))
    
    # Component time series (monthly means)
    ax1 = axes[0]
    colors = plt.cm.Set2.colors
    
    for i, (comp_var, comp_data) in enumerate(et_comp_data.items()):
        if comp_var != 'scalarTotalET':  # Skip total for component plot
            monthly_comp = comp_data.resample('M').mean()
            ax1.plot(monthly_comp.index, monthly_comp.values, 
                    label=comp_var.replace('scalar', '').replace('Canopy', 'Can.').replace('Ground', 'Grd.'), 
                    color=colors[i % len(colors)], linewidth=2, marker='o', markersize=3)
    
    ax1.set_title('SUMMA ET Components - Monthly Means (Boreal Forest)', fontweight='bold')
    ax1.set_ylabel('ET Component (mm/day)')
    ax1.legend(loc='upper right', fontsize=9)
    ax1.grid(True, alpha=0.3)
    
    # Component seasonal climatology
    ax2 = axes[1]
    months = range(1, 13)
    month_names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 
                  'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
    
    for i, (comp_var, comp_data) in enumerate(et_comp_data.items()):
        if comp_var != 'scalarTotalET':
            monthly_mean = comp_data.groupby(comp_data.index.month).mean()
            
            # Reindex to all 12 months; months without data stay NaN (plotted as gaps)
            # rather than being set to 0, which would falsely imply zero ET
            full_monthly = monthly_mean.reindex(list(months))
            
            ax2.plot(months, full_monthly.values, 'o-', linewidth=2,
                    label=comp_var.replace('scalar', '').replace('Canopy', 'Can.').replace('Ground', 'Grd.'), 
                    color=colors[i % len(colors)])
    
    ax2.set_title('ET Component Seasonal Climatology (Boreal Forest)', fontweight='bold')
    ax2.set_xlabel('Month')
    ax2.set_ylabel('ET Component (mm/day)')
    ax2.set_xticks(months)
    ax2.set_xticklabels(month_names)
    ax2.legend(loc='upper right', fontsize=9)
    ax2.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Calculate component contributions
    print(f"\n📊 Annual ET Component Contributions:")
    annual_totals = {}
    for comp_var, comp_data in et_comp_data.items():
        if comp_var != 'scalarTotalET':
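            # Average to daily first: the series holds mm/day rates, so summing
            # sub-daily (e.g. hourly) values directly would overstate the total
            daily_comp = comp_data.resample('D').mean()
            annual_total = daily_comp.groupby(daily_comp.index.year).sum().mean()  # Average annual total (mm/yr)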
            annual_totals[comp_var] = annual_total
    
    total_annual = sum(annual_totals.values())
    for comp_var, annual_total in annual_totals.items():
        percentage = (annual_total / total_annual) * 100 if total_annual > 0 else 0
        print(f"   🌿 {comp_var.replace('scalar', '')}: {annual_total:.1f} mm/yr ({percentage:.1f}%)")
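# -----------------------------------------------------------------------------
# Optional sketch (not part of the original workflow): annual totals from a
# rate series. The component series hold rates in mm/day sampled at the model
# output time step, so summing them directly is only correct for daily output
# (with hourly output the sum is 24x too large). Averaging to daily means first
# makes the total independent of the output time step. `annual_total_from_rate`
# is a hypothetical helper; it assumes a pandas Series with a DatetimeIndex.
# Partial years at either end of the record will still pull the mean down.
# -----------------------------------------------------------------------------
import pandas as pd


def annual_total_from_rate(rate_mm_day: pd.Series) -> float:
    """Mean annual total (mm/yr) from an ET rate series in mm/day."""
    daily_mean = rate_mm_day.resample('D').mean()  # one mm/day value per day
    # Summing daily mm/day values gives mm; group by calendar year, then average
    annual = daily_mean.groupby(daily_mean.index.year).sum()
    return float(annual.mean())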

**Ready to explore Basin Scale simulations?** → **[Tutorial 03a: Basin Scale - Lumped Watershed](./02a_basin_lumped.ipynb)**