# CONFLUENCE Tutorial - 9: NorSWE Large Sample Study (Snow Observation Network)

## Introduction

This tutorial extends our large sample studies approach to focus specifically on snow hydrology validation using the NorSWE (Northern Hemisphere Snow Water Equivalent) dataset. Building on the multi-site analysis framework demonstrated with FLUXNET, we now apply CONFLUENCE to systematically evaluate snow modeling performance across a network of snow observation stations throughout the northern hemisphere.

### NorSWE: A Critical Snow Observation Network

The NorSWE dataset represents one of the most comprehensive collections of snow observations available for hydrological model validation:

**Spatial Coverage**:
- **Northern Hemisphere focus**: Stations across snow-dominated regions
- **Nordic emphasis**: Dense coverage across Scandinavia and Finland
- **Elevation gradients**: From coastal lowlands to high mountain regions
- **Climate diversity**: Maritime, continental, and Arctic snow climates

**Observational Richness**:
- **Snow Water Equivalent (SWE)**: Direct measurements of snow mass
- **Snow Depth**: Complementary snowpack structure information
- **Long-term records**: Multi-decade observations at many sites
- **Quality control**: Standardized measurement protocols and data processing

### Scientific Importance of Snow Validation

Snow processes represent some of the most challenging aspects of hydrological modeling:

**Physical Complexity**:
- **Phase transitions**: Freezing, melting, and sublimation processes
- **Energy balance**: Complex interactions between radiation, temperature, and wind
- **Layered structure**: Metamorphism and density changes within the snowpack
- **Spatial variability**: Strong elevation and aspect dependencies

**Hydrological Significance**:
- **Seasonal storage**: Snow acts as a natural reservoir in many regions
- **Timing control**: Snowmelt timing affects peak flows and water availability
- **Climate sensitivity**: Snow processes are highly sensitive to temperature changes
- **Extreme events**: Snow-rain transitions and rain-on-snow events

### Why NorSWE for Large Sample Snow Studies?

NorSWE provides unique advantages for systematic snow model evaluation:

1. **Process Focus**: Dedicated snow observations rather than mixed-variable datasets
2. **Measurement Quality**: Direct SWE measurements provide unambiguous validation targets
3. **Environmental Gradients**: Sites span elevation, latitude, and climate gradients
4. **Seasonal Dynamics**: Full seasonal cycle observations capture accumulation and ablation
5. **Nordic Expertise**: Stations operated by countries with world-leading snow science

### Research Questions for Snow Modeling

Large sample studies with NorSWE enable investigation of critical snow science questions:

1. **Model Physics**: How well do different snow process representations perform across environments?
2. **Climate Controls**: Which meteorological variables most strongly control snow accumulation and melt?
3. **Elevation Effects**: How do snow processes change with elevation, and how well do models represent those changes?
4. **Regional Patterns**: Are there systematic regional biases in snow modeling?
5. **Seasonal Dynamics**: Can models capture both accumulation and ablation processes accurately?

### Unique Challenges of Snow Modeling

Snow modeling presents distinct challenges compared to other hydrological processes:

**Meteorological Sensitivity**:
- **Temperature thresholds**: Critical temperature for rain-snow transitions
- **Radiation balance**: Complex interactions between shortwave and longwave radiation
- **Wind effects**: Redistribution and sublimation processes
- **Humidity control**: Sublimation rates and surface energy balance

**Temporal Dynamics**:
- **Seasonal cycle**: Distinct accumulation and ablation seasons
- **Diurnal variation**: Strong daily cycles in energy balance
- **Event-based processes**: Individual storm impacts on snowpack
- **Intermittency**: Episodic accumulation and melt events

### CONFLUENCE's Snow Modeling Capabilities

CONFLUENCE's integration with SUMMA provides sophisticated snow modeling capabilities:

**Advanced Snow Physics**:
- **Multi-layer snowpack**: Explicit representation of snow stratigraphy
- **Energy balance**: Detailed surface energy balance calculations
- **Metamorphism**: Snow density and thermal property evolution
- **Liquid water**: Representation of liquid water flow through snow

**Flexible Parameterizations**:
- **Multiple options**: Different approaches for key snow processes
- **Sensitivity analysis**: Test different process representations
- **Decision analysis**: Compare alternative model structures
- **Uncertainty quantification**: Assess parameter and structural uncertainty

### NorSWE vs. FLUXNET: Complementary Approaches

While FLUXNET focused on energy balance validation, NorSWE provides complementary insights:

| Aspect | FLUXNET | NorSWE |
|--------|---------|--------|
| **Focus** | Energy/carbon fluxes | Snow mass/depth |
| **Process** | Continuous processes | Seasonal accumulation |
| **Validation** | Flux measurements | State variables |
| **Complexity** | Ecosystem interactions | Phase change physics |
| **Temporal** | Year-round | Seasonal focus |

### Expected Outcomes

This tutorial demonstrates several key capabilities for snow-focused large sample studies:

1. **Snow-Specific Configuration**: Adapt CONFLUENCE configurations for snow observation sites
2. **Seasonal Analysis**: Focus on snow accumulation and ablation periods
3. **Multi-Variable Validation**: Compare simulated SWE and snow depth against observations
4. **Elevation Analysis**: Examine how model performance varies with elevation
5. **Climate Sensitivity**: Assess model performance across different snow climates
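Multi-variable validation ultimately reduces to scoring simulated series against observed ones. The sketch below computes the four scores listed under `comparison_metrics` in the experiment configuration (correlation, RMSE, bias, NSE). It assumes paired daily values, for example SWE in mm, with missing observations already removed. The function name `snow_scores` is hypothetical and is not part of CONFLUENCE.

```python
import math

def snow_scores(obs, sim):
    """Pearson r, RMSE, bias and Nash-Sutcliffe efficiency for paired series.

    Assumes obs and sim are equal-length sequences of floats with no gaps.
    """
    n = len(obs)
    if n != len(sim) or n < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    # Bias: mean simulated minus mean observed (same units as the inputs)
    bias = mean_sim - mean_obs
    # Sum of squared errors, shared by RMSE and NSE
    sse = sum((s - o) ** 2 for o, s in zip(obs, sim))
    rmse = math.sqrt(sse / n)
    # Pearson correlation from centred cross- and auto-products
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    r = cov / math.sqrt(var_obs * var_sim)
    # NSE = 1 - SSE / (sum of squared deviations of obs about their mean)
    nse = 1.0 - sse / var_obs
    return {"r": r, "rmse": rmse, "bias": bias, "nse": nse}

# Usage with a short illustrative SWE series (mm)
print(snow_scores([0.0, 20.0, 60.0, 120.0, 80.0, 10.0],
                  [0.0, 25.0, 55.0, 100.0, 90.0, 20.0]))
```

NSE compares the model against the observed mean as a baseline. A value near 1 means the model captures most of the seasonal variance, and a negative value means the mean would have been a better predictor.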

### Methodological Considerations

Snow-focused large sample studies require specific methodological approaches:

**Site Selection**:
- **Elevation gradients**: Represent different snow accumulation zones
- **Climate diversity**: Include maritime, continental, and Arctic sites
- **Data quality**: Ensure reliable SWE and snow depth measurements
- **Temporal coverage**: Adequate seasonal cycle representation
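Data quality and temporal coverage can be screened with a completeness percentage that counts only snow-season days. This is a minimal sketch under the following assumptions:

- The input is a daily series of `(date, value)` pairs, with `None` or NaN on missing days.
- The season runs from October to May, matching `focus_months`.

It illustrates the idea only and is not necessarily how `run_norswe-2.py` computes `swq_completeness`.

```python
from datetime import date, timedelta

SNOW_MONTHS = {10, 11, 12, 1, 2, 3, 4, 5}  # October to May, as in focus_months

def season_completeness(records, months=SNOW_MONTHS):
    """Percent of snow-season days carrying a valid (non-None, non-NaN) value."""
    in_season = [v for d, v in records if d.month in months]
    if not in_season:
        return 0.0
    # v == v is False only for NaN
    valid = sum(1 for v in in_season if v is not None and v == v)
    return 100.0 * valid / len(in_season)

# Usage: one water year (2015-10-01 to 2016-09-30) with SWE reported every other day
start = date(2015, 10, 1)
records = [(start + timedelta(days=i), 50.0 if i % 2 == 0 else None) for i in range(366)]
print(f"{season_completeness(records):.1f}%")
```

Restricting the denominator to the snow season keeps stations from being penalised for summer gaps, when SWE is zero or simply not measured.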

**Analysis Approaches**:
- **Seasonal statistics**: Focus on peak SWE, melt timing, and duration
- **Process evaluation**: Assess accumulation vs. ablation performance
- **Threshold analysis**: Evaluate temperature and precipitation thresholds
- **Extreme events**: Analyze performance during unusual snow years
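Peak SWE, its timing, and snow-season duration can be read directly from a daily SWE series. This minimal sketch assumes one season of gap-free daily SWE in mm. It also uses a hypothetical 1 mm threshold for counting a day as snow-covered. The threshold and the function name are illustrative choices, not values defined by NorSWE or CONFLUENCE.

```python
from datetime import date, timedelta

def seasonal_signatures(start, swe, snow_threshold=1.0):
    """Peak SWE, peak date, snow-covered days, first snow and melt-out for one season (mm)."""
    if not swe:
        raise ValueError("empty SWE series")
    # max() returns the first index that reaches the maximum
    peak_idx = max(range(len(swe)), key=lambda i: swe[i])
    covered = [i for i, v in enumerate(swe) if v >= snow_threshold]
    return {
        "peak_swe": swe[peak_idx],
        "peak_date": start + timedelta(days=peak_idx),
        # Days at or above the threshold (mid-winter melt-outs count as snow-free)
        "snow_days": len(covered),
        "first_snow": start + timedelta(days=covered[0]) if covered else None,
        # First snow-free day after the last snow-covered day
        "melt_out": start + timedelta(days=covered[-1] + 1) if covered else None,
    }

# Usage: synthetic season, 100 days rising by 2 mm/day then 50 days falling by 4 mm/day
swe = [0.0] * 10 + [2.0 * d for d in range(1, 101)] + [200.0 - 4.0 * d for d in range(1, 51)] + [0.0] * 20
sig = seasonal_signatures(date(2015, 10, 1), swe)
print(sig["peak_swe"], sig["peak_date"], sig["snow_days"])
```

Computing these signatures for both observed and simulated SWE, then differencing them, turns the "melt timing" question into a number of days early or late.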

### Tutorial Structure

This tutorial follows the established large sample framework while emphasizing snow-specific aspects:

1. **NorSWE Site Selection**: Choose representative sites across snow environments
2. **Snow-Focused Configuration**: Adapt CONFLUENCE for snow observation validation
3. **Seasonal Analysis Setup**: Configure for snow season evaluation
4. **Batch Processing**: Execute CONFLUENCE across multiple snow sites
5. **Snow-Specific Results**: Collect and analyze SWE and snow depth outputs
6. **Elevation Analysis**: Examine performance across elevation gradients
7. **Climate Comparison**: Compare results across different snow climates

### Scientific Impact

NorSWE large sample studies contribute to advancing snow science:

- **Model Validation**: Systematic evaluation of snow process representations
- **Process Understanding**: Identify key controls on snow accumulation and melt
- **Climate Applications**: Improve projections of snow under a changing climate
- **Operational Applications**: Enhance seasonal forecasting and water management
- **Uncertainty Assessment**: Quantify reliability of snow predictions

### Building on Previous Tutorials

This tutorial leverages all the skills developed throughout the CONFLUENCE series:

- **Point-scale understanding**: Foundation in vertical snow processes
- **Workflow automation**: Efficient multi-site processing
- **Configuration management**: Template-based site setup
- **Results analysis**: Statistical evaluation of multi-site results
- **Visualization**: Clear presentation of spatial and temporal patterns
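Template-based site setup means copying one base configuration and overriding only the site-specific keys. This minimal sketch assumes the template has already been parsed into a dictionary.

- `POUR_POINT_COORDS` and `BOUNDING_BOX_COORDS` mirror the columns the station table builds in the code below, including the `lat_max/lon_min/lat_min/lon_max` ordering.
- `DOMAIN_NAME`, the template contents, and the station ID are hypothetical placeholders, not a confirmed CONFLUENCE schema.

```python
import copy

def build_site_config(template, station, buffer=0.1):
    """Per-station config dict derived from a shared template (hypothetical keys)."""
    cfg = copy.deepcopy(template)  # never mutate the shared template
    lat, lon = station["lat"], station["lon"]

    def fmt(x):
        # Round away binary floating-point noise such as 61.599999999999994
        return str(round(x, 6))

    # Site name restricted to ASCII letters, digits and underscores
    cfg["DOMAIN_NAME"] = "".join(
        c if (c.isascii() and c.isalnum()) or c == "_" else "_" for c in station["station_id"]
    )
    cfg["POUR_POINT_COORDS"] = f"{fmt(lat)}/{fmt(lon)}"
    # lat_max/lon_min/lat_min/lon_max, the same ordering as the tutorial's station table
    cfg["BOUNDING_BOX_COORDS"] = "/".join(
        fmt(v) for v in (lat + buffer, lon - buffer, lat - buffer, lon + buffer)
    )
    return cfg

# Usage with a made-up station record
template = {"DOMAIN_NAME": "TEMPLATE", "SETTINGS": {"model": "SUMMA"}}
site = build_site_config(template, {"station_id": "NO-1234.a", "lat": 61.5, "lon": 9.25})
print(site["DOMAIN_NAME"], site["BOUNDING_BOX_COORDS"])
```

The deep copy matters because nested sections would otherwise be shared between sites, so editing one site's settings would silently change every other site's settings.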

By applying these skills to snow-focused validation, you'll gain expertise in one of the most challenging aspects of hydrological modeling while contributing to improved understanding of snow processes across diverse northern hemisphere environments.

The combination of CONFLUENCE's sophisticated snow modeling capabilities with NorSWE's comprehensive observation network provides a powerful framework for advancing snow science through systematic, large sample analysis.

## Step 1: Large Sample Snow Study Experimental Design and Site Selection
Transitioning from the FLUXNET energy balance focus to systematic snow hydrology validation, this step establishes the foundation for large sample snow modeling using the comprehensive NorSWE observation network. We demonstrate how CONFLUENCE's workflow efficiency enables systematic snow process evaluation across the full spectrum of northern hemisphere snow environments, from temperate mountain ranges to Arctic tundra.

### Snow Modeling Evolution: Energy Balance → Snow Process Physics

**From FLUXNET to NorSWE**: Complementary large sample approaches addressing different Earth system processes
- **FLUXNET focus**: Energy and carbon fluxes across diverse ecosystems and climate zones
- **NorSWE focus**: Snow mass and depth dynamics across elevation and climate gradients
- **Process emphasis**: Continuous energy exchange → Seasonal accumulation and ablation dynamics
- **Validation targets**: Flux measurements → State variable validation with strong seasonal signals
- **Environmental controls**: Ecosystem-climate interactions → Topographic-climate interactions

### The Unique Challenge of Snow Hydrology at Scale

Snow processes present distinct modeling challenges that require specialized large sample approaches:

**Physical Process Complexity**:
- **Phase transitions**: Accurate representation of freezing, melting, and sublimation across diverse environments
- **Energy balance**: Complex multi-component surface energy balance varying with snow cover evolution
- **Layered snowpack dynamics**: Multi-layer snow metamorphism, density evolution, and thermal property changes
- **Spatial variability**: Strong elevation, aspect, and microclimate dependencies requiring careful upscaling

**Temporal Process Dynamics**:
- **Seasonal accumulation**: Episodic storm-by-storm snow accumulation throughout winter months
- **Ablation complexity**: Energy-driven melt processes with strong diurnal and seasonal cycles
- **Intermittent dynamics**: Snow-rain transitions, sublimation events, and refreeze cycles
- **Critical timing**: Peak SWE timing, melt onset, and snow disappearance dates affecting water resources

**Environmental Control Complexity**:
- **Elevation gradients**: Systematic changes in temperature, precipitation phase, and energy balance
- **Climate regime diversity**: Maritime, continental, and Arctic snow climates with distinct process signatures
- **Latitude effects**: Radiation balance changes affecting snow energy balance across climate zones
- **Topographic interactions**: Slope, aspect, and shelter effects on snow accumulation and ablation
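Elevation gradients matter in practice whenever forcing data represent a different elevation than the station. This is a minimal sketch of a constant lapse-rate correction, assuming the 6.5 K km⁻¹ lapse rate of the International Standard Atmosphere. Real near-surface lapse rates vary with season, humidity, and inversions. The sketch is not necessarily how CONFLUENCE adjusts its forcing.

```python
STANDARD_LAPSE_RATE = 0.0065  # K per m (6.5 K/km, International Standard Atmosphere)

def adjust_temperature(t_forcing_k, z_forcing_m, z_station_m, lapse_rate=STANDARD_LAPSE_RATE):
    """Shift air temperature (K) from the forcing elevation to the station elevation."""
    # Temperature decreases with height, so a station above the forcing cell is colder
    return t_forcing_k - lapse_rate * (z_station_m - z_forcing_m)

# Usage: forcing cell at 800 m, station at 1400 m
print(round(adjust_temperature(274.15, 800.0, 1400.0), 2))
```

The 600 m offset in this example shifts the air temperature from about +1 °C to about -2.9 °C. That is enough to change the precipitation phase, which is why elevation mismatches propagate directly into snow accumulation errors.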

### NorSWE: Premier Snow Observation Network for Large Sample Studies

The Northern Hemisphere Snow Water Equivalent (NorSWE) dataset provides unparalleled opportunities for systematic snow model validation:

**Comprehensive Spatial Coverage**:
- **Northern Hemisphere scope**: Extensive coverage across all major snow-dominated regions
- **Elevation gradients**: From coastal lowlands to high mountain environments (0-3000+ m)
- **Climate diversity**: Maritime, continental, boreal, and Arctic snow climate regimes
- **Multi-national network**: Coordinated observations from leading snow science institutions

**High-Quality Observations**:
- **Direct SWE measurements**: Snow Water Equivalent providing unambiguous validation targets for snow mass
- **Complementary snow depth**: Structural validation enabling density and compaction process evaluation
- **Quality-controlled data**: Standardized measurement protocols and comprehensive data validation
- **Long-term records**: Multi-decade observations enabling robust statistical analysis and trend assessment
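Pairing SWE with snow depth is what makes density evaluation possible. SWE in kg m⁻² divided by depth in m gives bulk density in kg m⁻³, and 1 kg m⁻² of SWE equals 1 mm of liquid water. This minimal sketch makes two assumptions:

- Observations are co-located and use the units given for `snw` and `snd` in the code below.
- A hypothetical minimum depth of 5 cm avoids unstable ratios for very shallow snow.

```python
import math

WATER_DENSITY = 1000.0  # kg m^-3

def bulk_density(swe_kg_m2, depth_m, min_depth_m=0.05):
    """Bulk snow density (kg m^-3) from SWE and depth; NaN if depth is missing or too small."""
    if swe_kg_m2 is None or depth_m is None or depth_m < min_depth_m:
        return math.nan  # the ratio is unreliable for missing or very shallow snow
    return swe_kg_m2 / depth_m

# Usage: 150 kg m^-2 (150 mm) of SWE in a 0.5 m snowpack
rho = bulk_density(150.0, 0.5)
print(rho, rho / WATER_DENSITY)  # bulk density and its ratio to water density
```

Comparing observed and simulated bulk density over a season isolates compaction and metamorphism errors that SWE alone cannot reveal.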

**Process-Focused Design**:
- **Snow-specific measurements**: Dedicated snow observations rather than mixed-variable datasets
- **Seasonal dynamics**: Complete seasonal cycle capture from accumulation through ablation
- **Event resolution**: Sufficient temporal resolution to capture individual storm and melt events
- **Measurement precision**: High-accuracy observations suitable for detailed model validation

### Strategic Site Selection for Snow Process Understanding

Large sample snow studies require specialized site selection strategies that differ from energy balance applications:

**Environmental Gradient Prioritization**:
- **Elevation stratification**: Systematic sampling across elevation zones representing different snow climates
- **Climate regime coverage**: Maritime, continental, and Arctic environments with distinct snow physics
- **Latitude transects**: Temperature and radiation gradients affecting snow accumulation and ablation
- **Regional representation**: Coverage of major snow-dominated mountain ranges and Arctic regions

**Data Quality Assessment**:
- **Multi-variable completeness**: Both SWE and snow depth data availability for comprehensive validation
- **Seasonal coverage**: Adequate representation of both accumulation and ablation seasons
- **Measurement precision**: High-quality observations suitable for detailed process evaluation
- **Temporal consistency**: Long-term records enabling robust statistical analysis

**Process Validation Objectives**:
- **Mass balance validation**: SWE observations providing fundamental snow mass validation
- **Structural validation**: Snow depth enabling snowpack density and compaction assessment
- **Seasonal timing**: Peak SWE, melt onset, and snow disappearance timing validation
- **Process-specific metrics**: Accumulation rate, ablation rate, and persistence evaluation
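Accumulation and ablation rates follow from the same daily series once the peak is located. This minimal sketch assumes gap-free daily SWE in mm for a single season. It defines the rates as the mean rise from the last snow-free day to the peak and the mean fall from the peak to the first snow-free day. Other definitions, such as averaging only days with positive change, are equally defensible.

```python
def phase_rates(swe, snow_threshold=1.0):
    """Mean accumulation and ablation rates (mm/day) for one season of daily SWE (mm).

    Accumulation runs from the last snow-free day before onset to the peak;
    ablation runs from the peak to the first snow-free day after melt-out.
    Returns None if SWE never reaches snow_threshold.
    """
    covered = [i for i, v in enumerate(swe) if v >= snow_threshold]
    if not covered:
        return None
    start = max(covered[0] - 1, 0)
    end = min(covered[-1] + 1, len(swe) - 1)
    peak = max(range(start, end + 1), key=lambda i: swe[i])
    acc = (swe[peak] - swe[start]) / (peak - start) if peak > start else float("nan")
    abl = (swe[peak] - swe[end]) / (end - peak) if end > peak else float("nan")
    return acc, abl

# Usage: synthetic season, 100 days rising by 2 mm/day then 50 days falling by 4 mm/day
swe = [0.0] * 10 + [2.0 * d for d in range(1, 101)] + [200.0 - 4.0 * d for d in range(1, 51)] + [0.0] * 20
print(phase_rates(swe))
```

Comparing these rates between simulated and observed series can help separate errors that tend to originate in snowfall input from errors in melt energy. A single season-wide RMSE blends the two.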

### Advanced Snow Physics Through SUMMA Integration

CONFLUENCE's integration with SUMMA provides sophisticated snow modeling capabilities essential for process-based validation:

**Multi-Layer Snowpack Representation**:
- **Explicit snow stratigraphy**: Layer-by-layer representation of snow metamorphism and density evolution
- **Thermal properties**: Dynamic snow thermal conductivity and heat capacity based on density and temperature
- **Liquid water dynamics**: Representation of liquid water percolation, retention, and refreezing processes
- **Snow-vegetation interactions**: Canopy interception, sublimation, and under-canopy snow processes
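Liquid water retention can be pictured as a bucket whose capacity is a fixed fraction of the ice mass. Meltwater or rain first fills that capacity, and only the excess drains. This is a minimal single-layer sketch with a hypothetical 5% holding fraction and no refreezing. SUMMA's multi-layer percolation scheme is considerably more detailed, and this is not its formulation.

```python
def percolate(ice_mm, liquid_mm, water_input_mm, holding_fraction=0.05):
    """One time step of a single-layer bucket; returns (new_liquid_mm, drainage_mm).

    Capacity is holding_fraction * ice_mm (mm water equivalent); liquid above it drains.
    """
    capacity = holding_fraction * ice_mm
    liquid = liquid_mm + water_input_mm
    drainage = max(0.0, liquid - capacity)
    return liquid - drainage, drainage

# Usage: 200 mm of ice, no stored liquid, 15 mm of melt plus rain in one step
print(percolate(200.0, 0.0, 15.0))
```

Even this toy version shows why retention delays runoff: the first 10 mm of input in the example is stored, and only the remaining 5 mm leaves the snowpack.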

**Comprehensive Energy Balance**:
- **Radiation components**: Shortwave and longwave radiation balance with snow albedo evolution
- **Turbulent fluxes**: Sensible and latent heat exchange with detailed aerodynamic formulations
- **Ground heat flux**: Soil-snow interface heat transfer affecting basal melt processes
- **Precipitation phase**: Temperature-dependent rain-snow partitioning algorithms
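One common approach to temperature-dependent rain-snow partitioning is a linear ramp. Precipitation is all snow below a lower threshold, all rain above an upper one, and a mixture in between. This minimal sketch uses hypothetical thresholds of 0 °C and 2 °C. SUMMA has its own formulation and parameters for this, so treat the numbers and function names as placeholders.

```python
def snowfall_fraction(t_air_c, t_snow=0.0, t_rain=2.0):
    """Fraction of precipitation falling as snow; linear between two air-temperature thresholds."""
    if t_air_c <= t_snow:
        return 1.0
    if t_air_c >= t_rain:
        return 0.0
    return (t_rain - t_air_c) / (t_rain - t_snow)

def partition(precip_mm, t_air_c):
    """Split a precipitation amount into (snowfall_mm, rainfall_mm)."""
    frac = snowfall_fraction(t_air_c)
    return precip_mm * frac, precip_mm * (1.0 - frac)

# Usage: 10 mm of precipitation at +0.5 degC
print(partition(10.0, 0.5))
```

Because most of the partitioning happens within a couple of degrees of freezing, small temperature biases near 0 °C can swing accumulation substantially. This is one reason threshold analysis appears among the analysis approaches.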

**Flexible Process Parameterizations**:
- **Multiple options**: Alternative formulations for key snow processes enabling systematic evaluation
- **Decision analysis**: Comparison of different process representations across environmental gradients
- **Parameter sensitivity**: Assessment of parameter uncertainty across diverse snow environments
- **Process uncertainty**: Quantification of structural uncertainty in snow process representation

### Scientific Innovation: Continental-Scale Snow Process Validation

This large sample approach enables breakthrough capabilities for snow science:

❄️ **Process Generalization**: Statistical identification of universal snow process patterns vs. environment-specific behaviors across northern hemisphere snow regimes

⛰️ **Elevation-Climate Synthesis**: Systematic quantification of how snow processes change across elevation and climate gradients

🌨️ **Seasonal Dynamics**: Comprehensive evaluation of both accumulation and ablation season process representation in models

📊 **Multi-Variable Integration**: Combined SWE and snow depth validation providing both mass balance and structural validation

🔍 **Uncertainty Quantification**: Robust assessment of snow model reliability across diverse northern hemisphere environments

📈 **Model Physics Evaluation**: Systematic testing of snow energy balance and phase change representations across environmental gradients

The experimental design demonstrated here establishes the foundation for transforming snow hydrology from individual mountain case studies to systematic, statistically robust analysis across the full spectrum of northern hemisphere snow environments, enabling confident identification of universal snow process patterns while quantifying regional variations and model uncertainties.


# =============================================================================
# STEP 1: LARGE SAMPLE SNOW STUDY EXPERIMENTAL DESIGN AND SITE SELECTION
# =============================================================================

import sys
import os
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import subprocess
import yaml
from datetime import datetime
import xarray as xr
import seaborn as sns
import warnings

# Suppress warnings for cleaner output
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.filterwarnings("ignore", category=UserWarning)

# Add CONFLUENCE to path
confluence_path = Path('../').resolve()
sys.path.append(str(confluence_path))

# Set up plotting style for snow visualization
plt.style.use('default')
sns.set_palette("coolwarm")
%matplotlib inline

print("=== CONFLUENCE Tutorial 9: NorSWE Large Sample Snow Study ===")
print("Snow hydrology scaling: Single sites to systematic northern hemisphere validation")

# =============================================================================
# LARGE SAMPLE SNOW EXPERIMENTAL DESIGN CONFIGURATION
# =============================================================================

print(f"\n❄️ Large Sample Snow Experimental Design Configuration...")

# Define the large sample snow experiment parameters
snow_sample_config = {
    # Experiment identification
    'experiment_name': 'norswe_large_sample_tutorial',
    'experiment_type': 'multi_site_snow_validation',
    'analysis_scale': 'northern_hemisphere_snow_gradients',
    
    # Site selection criteria specific to snow hydrology
    'max_stations': 25,  # Manageable number for tutorial demonstration
    'site_selection_strategy': 'elevation_climate_diversity',
    'min_swe_completeness': 30.0,  # Minimum % SWE data completeness
    'min_depth_completeness': 30.0,  # Minimum % snow depth data completeness
    'elevation_range': (0, 3000),  # Elevation range for snow sites (meters)
    'latitude_range': (45, 75),   # Northern snow regions focus
    
    # Data and processing configuration
    'norswe_path': '/work/comphyd_lab/data/geospatial-data/NorSWE/NorSWE-NorEEN_1979-2021_v2.nc',
    'template_config': '../CONFLUENCE/0_config_files/config_norswe_template.yaml',
    'config_output_dir': '../CONFLUENCE/0_config_files/norswe',
    'norswe_script': './run_norswe-2.py',
    'base_data_path': '/work/comphyd_lab/data/CONFLUENCE_data/norswe',
    
    # Temporal filtering for focused analysis
    'start_year': 2010,
    'end_year': 2020,
    'focus_months': [10, 11, 12, 1, 2, 3, 4, 5],  # Snow season months
    
    # Processing options
    'batch_processing': True,
    'parallel_execution': True,
    'dry_run_mode': False,  # Set to True for testing without job submission
    
    # Snow-specific analysis objectives
    'primary_variables': ['SWE', 'snow_depth', 'snow_season_length', 'peak_SWE_timing'],
    'comparison_metrics': ['correlation', 'rmse', 'bias', 'nse', 'seasonal_timing'],
    'snow_processes': ['accumulation', 'ablation', 'seasonal_dynamics', 'elevation_gradients']
}

print(f"✅ Snow experimental design configured")
print(f"   ❄️ Experiment: {snow_sample_config['experiment_name']}")
print(f"   🏔️ Scale: {snow_sample_config['analysis_scale']}")
print(f"   📈 Strategy: {snow_sample_config['site_selection_strategy']}")
print(f"   🎯 Max stations: {snow_sample_config['max_stations']}")
print(f"   📅 Period: {snow_sample_config['start_year']}-{snow_sample_config['end_year']}")

# =============================================================================
# CREATE SNOW EXPERIMENT DIRECTORY STRUCTURE
# =============================================================================

print(f"\n📁 Creating Snow Experiment Directory Structure...")

# Create main experiment directory
experiment_dir = Path(f"./experiments/{snow_sample_config['experiment_name']}")
experiment_dir.mkdir(parents=True, exist_ok=True)

# Create subdirectories for snow analysis organization
subdirs = {
    'configs': 'Generated CONFLUENCE configuration files for snow sites',
    'logs': 'Snow modeling execution logs and monitoring',
    'results': 'Aggregated snow validation results and analysis outputs',
    'plots': 'Snow visualization outputs and performance maps',
    'reports': 'Snow validation summary reports and statistics',
    'snow_data': 'Processed NorSWE observation data extracts'
}

for subdir, description in subdirs.items():
    (experiment_dir / subdir).mkdir(exist_ok=True)
    print(f"   📁 {subdir}/: {description}")

# Save experiment configuration
config_file = experiment_dir / 'snow_experiment_config.yaml'
with open(config_file, 'w') as f:
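    # safe_dump writes tuples (e.g. elevation_range) as plain YAML lists, so the file
    # can be read back with yaml.safe_load; yaml.dump would tag them !!python/tuple
    yaml.safe_dump(snow_sample_config, f, default_flow_style=False)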

print(f"✅ Snow experiment directory structure created: {experiment_dir}")
print(f"   📋 Configuration saved: {config_file}")

# =============================================================================
# LOAD AND EXPLORE NORSWE DATASET
# =============================================================================

print(f"\n🌨️ Loading and Exploring NorSWE Dataset...")

# Check if NorSWE file exists
norswe_file = Path(snow_sample_config['norswe_path'])
if not norswe_file.exists():
    print(f"❌ NorSWE file not found: {norswe_file}")
    print(f"   Please ensure the NorSWE dataset is available at the specified path")
    raise FileNotFoundError(f"NorSWE dataset not found: {norswe_file}")

try:
    # Open NorSWE dataset for exploration
    print(f"   📊 Opening NorSWE dataset: {norswe_file.name}")
    ds = xr.open_dataset(norswe_file)
    
    print(f"✅ NorSWE dataset loaded successfully")
    print(f"   📊 Dataset structure:")
    print(f"      Time range: {ds.time.values[0]} to {ds.time.values[-1]}")
    print(f"      Number of stations: {len(ds.station_id)}")
    print(f"      Spatial extent: {ds.lat.min().values:.1f}°N to {ds.lat.max().values:.1f}°N")
    print(f"      Elevation range: {ds.elevation.min().values:.0f}m to {ds.elevation.max().values:.0f}m")
    
    # Display key variables
    print(f"   📋 Key snow variables:")
    snow_vars = {
        'snw': 'Snow Water Equivalent (SWE) [kg/m²]',
        'snd': 'Snow Depth [m]'
    }
    
    for var, description in snow_vars.items():
        if var in ds.data_vars:
            print(f"      ❄️ {var}: {description}")
            
            # Basic statistics
            data_vals = ds[var].values
            valid_data = data_vals[~np.isnan(data_vals)]
            
            if len(valid_data) > 0:
                print(f"         Range: {valid_data.min():.2f} to {valid_data.max():.2f}")
                print(f"         Valid observations: {len(valid_data):,}")
    
    # Check coordinate information
    print(f"   🗺️ Coordinate information:")
    print(f"      Coordinate variables: {list(ds.coords.keys())}")
    
    # Longitude extent (latitude and elevation extents are printed above).
    # Assumes longitudes in [-180, 180], so stations west of Greenwich are negative.
    print(f"   🌍 Longitude range: {ds.lon.min().values:.1f}° to {ds.lon.max().values:.1f}°")
    
    # Close dataset to free memory
    ds.close()
    
except Exception as e:
    print(f"❌ Error loading NorSWE dataset: {e}")
    raise

# =============================================================================
# PROCESS NORSWE STATION DATA FOR SITE SELECTION
# =============================================================================

print(f"\n🔄 Processing NorSWE Station Data for Site Selection...")

# Process or load existing station data
stations_csv = 'norswe_stations.csv'

if Path(stations_csv).exists():
    print(f"   📊 Loading existing processed station data from {stations_csv}")
    stations_df = pd.read_csv(stations_csv)
    print(f"   ✅ Loaded {len(stations_df)} stations from existing file")
else:
    print(f"   ⚠️ Processed station file not found: {stations_csv}")
    print(f"   📝 Instructions: Run the processing script first to generate station metadata")
    print(f"      python run_norswe-2.py --norswe_path {snow_sample_config['norswe_path']} --output_dir ./temp --force_reprocess")
    
    # Create a minimal demonstration dataset
    print(f"   🔧 Creating demonstration dataset for tutorial purposes...")
    
    # Open dataset again to extract basic station info
    ds = xr.open_dataset(norswe_file)
    
    # Create basic station dataframe. Completeness values are random placeholders,
    # drawn from a seeded generator so the demonstration selection is reproducible.
    rng = np.random.default_rng(42)
    n_stations = len(ds.station_id)
    stations_df = pd.DataFrame({
        'station_id': ds.station_id.values,
        'station_name': [f"Station_{i:04d}" for i in range(n_stations)],
        'lat': ds.lat.values,
        'lon': ds.lon.values,
        'elevation': ds.elevation.values,
        'source': ds.source.values if 'source' in ds.variables else ['NorSWE'] * n_stations,
        'swq_completeness': rng.uniform(20, 95, n_stations),  # Demo values
        'snd_completeness': rng.uniform(15, 90, n_stations),  # Demo values
    })
    
    # Add required CONFLUENCE formatting
    buffer = 0.1
    stations_df['BOUNDING_BOX_COORDS'] = (
        (stations_df['lat'] + buffer).astype(str) + '/' +
        (stations_df['lon'] - buffer).astype(str) + '/' +
        (stations_df['lat'] - buffer).astype(str) + '/' +
        (stations_df['lon'] + buffer).astype(str)
    )
    
    stations_df['POUR_POINT_COORDS'] = stations_df['lat'].astype(str) + '/' + stations_df['lon'].astype(str)
    # Cast to str first so the .str accessor also works for non-string IDs
    stations_df['Watershed_Name'] = stations_df['station_id'].astype(str).str.replace('[^a-zA-Z0-9_]', '_', regex=True)
    
    ds.close()

print(f"\n📊 NorSWE Station Database Structure:")
print(f"   Total stations available: {len(stations_df)}")
print(f"   Database columns ({len(stations_df.columns)}):")
for i, col in enumerate(stations_df.columns):
    print(f"     {i+1:2d}. {col}")

# =============================================================================
# SNOW-SPECIFIC ENVIRONMENTAL GRADIENT ANALYSIS
# =============================================================================

print(f"\n🏔️ Snow-Specific Environmental Gradient Analysis...")

# Analyze environmental diversity for snow modeling
snow_environmental_summary = {}

# Elevation distribution analysis (critical for snow)
elevation_stats = stations_df['elevation'].describe()
snow_environmental_summary['elevation_range'] = (elevation_stats['min'], elevation_stats['max'])
print(f"   ⛰️ Elevation diversity: {elevation_stats['min']:.0f} to {elevation_stats['max']:.0f} m")
print(f"      Mean elevation: {elevation_stats['mean']:.0f} m")
print(f"      Elevation quartiles: Q1={elevation_stats['25%']:.0f}m, Q3={elevation_stats['75%']:.0f}m")

# Latitude distribution (snow climate indicator)
latitude_stats = stations_df['lat'].describe()
snow_environmental_summary['latitude_range'] = (latitude_stats['min'], latitude_stats['max'])
print(f"   🌍 Latitude range: {latitude_stats['min']:.1f}° to {latitude_stats['max']:.1f}°N")
print(f"      Snow climate zones: Arctic ({latitude_stats['max']:.1f}°N) to temperate ({latitude_stats['min']:.1f}°N)")

# Data completeness analysis (critical for snow validation)
if 'swq_completeness' in stations_df.columns and 'snd_completeness' in stations_df.columns:
    swe_completeness_stats = stations_df['swq_completeness'].describe()
    depth_completeness_stats = stations_df['snd_completeness'].describe()
    
    print(f"   📊 SWE data completeness: {swe_completeness_stats['min']:.1f}% to {swe_completeness_stats['max']:.1f}%")
    print(f"      Mean SWE completeness: {swe_completeness_stats['mean']:.1f}%")
    print(f"   📏 Snow depth completeness: {depth_completeness_stats['min']:.1f}% to {depth_completeness_stats['max']:.1f}%")
    print(f"      Mean depth completeness: {depth_completeness_stats['mean']:.1f}%")

# Source/type analysis if available
if 'source' in stations_df.columns:
    source_counts = stations_df['source'].value_counts()
    snow_environmental_summary['data_sources'] = len(source_counts)
    print(f"   📡 Data source diversity: {len(source_counts)} different sources")
    print(f"      Primary sources: {', '.join(map(str, source_counts.head(3).index))}")

# =============================================================================
# STRATEGIC SNOW SITE SELECTION
# =============================================================================

print(f"\n🎯 Strategic Snow Site Selection for Large Sample Analysis...")

# Implement snow-specific site selection strategy
selection_strategy = snow_sample_config['site_selection_strategy']
max_stations = snow_sample_config['max_stations']
min_swe_completeness = snow_sample_config['min_swe_completeness']
min_depth_completeness = snow_sample_config['min_depth_completeness']

print(f"   Strategy: {selection_strategy}")
print(f"   Target stations: {max_stations}")
print(f"   SWE completeness threshold: {min_swe_completeness}%")
print(f"   Depth completeness threshold: {min_depth_completeness}%")

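# Restrict candidates to the configured elevation and latitude domain
elev_lo, elev_hi = snow_sample_config['elevation_range']
lat_lo, lat_hi = snow_sample_config['latitude_range']
in_domain = (
    stations_df['elevation'].between(elev_lo, elev_hi) &
    stations_df['lat'].between(lat_lo, lat_hi)
)

# Apply data quality filters inside that domain
quality_filtered = stations_df[
    in_domain &
    (stations_df['swq_completeness'] >= min_swe_completeness) &
    (stations_df['snd_completeness'] >= min_depth_completeness)
].copy()

print(f"   🔍 Quality filtering results:")
print(f"      Initial stations: {len(stations_df)}")
print(f"      Inside elevation/latitude domain: {int(in_domain.sum())}")
print(f"      After quality filter: {len(quality_filtered)}")
print(f"      Filtered out: {len(stations_df) - len(quality_filtered)} stations (domain or data quality)")

if len(quality_filtered) == 0:
    print(f"   ⚠️ No stations meet quality criteria. Relaxing thresholds...")
    # Relax completeness thresholds for demonstration (domain limits still apply)
    min_swe_completeness = 10.0
    min_depth_completeness = 10.0
    quality_filtered = stations_df[
        in_domain &
        (stations_df['swq_completeness'] >= min_swe_completeness) &
        (stations_df['snd_completeness'] >= min_depth_completeness)
    ].copy()
    print(f"      Relaxed filter results: {len(quality_filtered)} stations")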

if selection_strategy == 'elevation_climate_diversity':
    # Strategy: Maximize elevation and climate diversity for snow modeling
    selected_sites = []
    
    # Create elevation bands for snow environments
    elevation_bands = [
        (0, 500, 'Lowland'),
        (500, 1000, 'Montane'),
        (1000, 1500, 'Subalpine'),
        (1500, 2000, 'Alpine'),
        (2000, 10000, 'High Alpine')
    ]
    
    # Create latitude bands for climate zones
    latitude_bands = [
        (45, 55, 'Temperate'),
        (55, 65, 'Boreal'),
        (65, 75, 'Arctic')
    ]
    
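    # Ceiling division so the per-zone quota can add up to max_stations
    # (any overshoot is trimmed after the loop)
    sites_per_zone = max(1, -(-max_stations // (len(elevation_bands) * len(latitude_bands))))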
    
    print(f"   🏔️ Elevation-Climate sampling strategy:")
    print(f"      Elevation bands: {len(elevation_bands)}")
    print(f"      Climate bands: {len(latitude_bands)}")
    print(f"      Target sites per zone: ~{sites_per_zone}")
    
    for elev_min, elev_max, elev_label in elevation_bands:
        for lat_min, lat_max, lat_label in latitude_bands:
            
            # Find stations in this elevation-climate zone
            zone_stations = quality_filtered[
                (quality_filtered['elevation'] >= elev_min) &
                (quality_filtered['elevation'] < elev_max) &
                (quality_filtered['lat'] >= lat_min) &
                (quality_filtered['lat'] < lat_max)
            ]
            
            if len(zone_stations) > 0:
                # Prioritize by data completeness
                zone_stations = zone_stations.sort_values(
                    by=['swq_completeness', 'snd_completeness'], 
                    ascending=False
                )
                
                # Sample up to sites_per_zone from this climate-elevation zone
                n_sample = min(sites_per_zone, len(zone_stations))
                sampled = zone_stations.head(n_sample)
                selected_sites.extend(sampled.index.tolist())
                
                print(f"      {elev_label} × {lat_label}: {n_sample}/{len(zone_stations)} stations selected")
                
                if len(selected_sites) >= max_stations:
                    break
        
        if len(selected_sites) >= max_stations:
            break
    
    # Trim to exact number if over
    if len(selected_sites) > max_stations:
        selected_sites = selected_sites[:max_stations]
    
    selected_df = quality_filtered.loc[selected_sites].copy()

elif selection_strategy == 'random_sampling':
    # Strategy: Random sampling from quality-filtered stations (fixed seed for reproducibility)
    selected_df = quality_filtered.sample(n=min(max_stations, len(quality_filtered)), random_state=42)

else:
    # Default: use highest quality sites
    selected_df = quality_filtered.sort_values(
        by=['swq_completeness', 'snd_completeness'], 
        ascending=False
    ).head(max_stations).copy()

print(f"✅ Snow site selection complete: {len(selected_df)} stations selected")

# =============================================================================
# VALIDATE SNOW MODELING TEMPLATE CONFIGURATION
# =============================================================================

print(f"\n📋 Validating Snow Modeling Template Configuration...")

template_path = Path(snow_sample_config['template_config'])

if template_path.exists():
    print(f"✅ Snow template configuration found: {template_path}")
    
    # Load and verify template structure
    try:
        with open(template_path, 'r') as f:
            template_config = yaml.safe_load(f)
        
        # Check key template parameters for snow modeling
        required_keys = ['DOMAIN_NAME', 'POUR_POINT_COORDS', 'BOUNDING_BOX_COORDS', 
                        'HYDROLOGICAL_MODEL', 'EXPERIMENT_TIME_START', 'EXPERIMENT_TIME_END']
        
        missing_keys = [key for key in required_keys if key not in template_config]
        
        if not missing_keys:
            print(f"✅ Snow template validation successful")
            print(f"   📝 Template contains all required parameters for snow modeling")
            
            # Check snow-specific settings if available
            if 'HYDROLOGICAL_MODEL' in template_config:
                model = template_config['HYDROLOGICAL_MODEL']
                print(f"   ❄️ Hydrological model: {model}")
                
                if model.upper() == 'SUMMA':
                    print(f"      ✅ SUMMA selected - excellent for snow physics modeling")
                    print(f"      🌨️ SUMMA capabilities: Multi-layer snowpack, energy balance, snow-vegetation interactions")
        else:
            print(f"⚠️  Snow template missing required keys: {missing_keys}")
            
    except Exception as e:
        print(f"❌ Snow template validation failed: {e}")
else:
    print(f"❌ Snow template configuration not found: {template_path}")
    print(f"   Please ensure the snow modeling template file exists")

# =============================================================================
# COMPREHENSIVE SNOW SITE SELECTION VISUALIZATION
# =============================================================================

print(f"\n📈 Creating Comprehensive Snow Site Selection Visualization...")

# Create comprehensive snow site selection visualization
fig, axes = plt.subplots(2, 3, figsize=(20, 12))

# Map 1: Global distribution with elevation coloring (top left)
ax1 = axes[0, 0]
ax1.scatter(stations_df['lon'], stations_df['lat'], 
           c='lightgray', alpha=0.3, s=15, label='Available stations')
scatter1 = ax1.scatter(selected_df['lon'], selected_df['lat'], 
                      c=selected_df['elevation'], cmap='terrain', s=60, 
                      edgecolors='black', linewidth=0.5, label='Selected stations')

ax1.set_xlabel('Longitude')
ax1.set_ylabel('Latitude')
ax1.set_title(f'Snow Site Selection: Elevation Distribution\n{len(selected_df)} of {len(stations_df)} stations')
ax1.grid(True, alpha=0.3)
ax1.legend()

# Add colorbar for elevation
cbar1 = plt.colorbar(scatter1, ax=ax1)
cbar1.set_label('Elevation (m)')

# Map 2: Elevation distribution (top middle)
ax2 = axes[0, 1]
ax2.hist(stations_df['elevation'], bins=20, alpha=0.5, color='lightblue', 
         label='Available', edgecolor='blue')
ax2.hist(selected_df['elevation'], bins=15, alpha=0.7, color='red', 
         label='Selected', edgecolor='darkred')
ax2.set_xlabel('Elevation (m)')
ax2.set_ylabel('Number of Stations')
ax2.set_title('Elevation Distribution\n(Snow Climate Zones)')
ax2.legend()
ax2.grid(True, alpha=0.3, axis='y')

# Add elevation zone labels
elevation_zones = [(0, 500, 'Lowland'), (500, 1000, 'Montane'), 
                  (1000, 1500, 'Subalpine'), (1500, 2000, 'Alpine'), (2000, 3000, 'High Alpine')]
for i, (min_elev, max_elev, label) in enumerate(elevation_zones):
    if i % 2 == 0:  # Every other label to avoid crowding
        ax2.axvspan(min_elev, max_elev, alpha=0.1, color='gray')
        ax2.text((min_elev + max_elev)/2, ax2.get_ylim()[1]*0.9, label, 
                ha='center', fontsize=8, rotation=45)

# Map 3: Latitude vs elevation scatter (top right)
ax3 = axes[0, 2]
scatter3 = ax3.scatter(selected_df['lat'], selected_df['elevation'],
                       c=selected_df['swq_completeness'], cmap='viridis', s=80,
                       edgecolors='black', linewidth=0.5)
ax3.set_xlabel('Latitude (°N)')
ax3.set_ylabel('Elevation (m)')
ax3.set_title('Selected Sites: Climate-Elevation Relationship')
ax3.grid(True, alpha=0.3)

# Add colorbar for SWE completeness using the scatter handle returned above
cbar3 = plt.colorbar(scatter3, ax=ax3)
cbar3.set_label('SWE Data Completeness (%)')

# Map 4: Data completeness comparison (bottom left)
ax4 = axes[1, 0]
ax4.scatter(selected_df['swq_completeness'], selected_df['snd_completeness'], 
           s=60, alpha=0.7, c='purple', edgecolors='black', linewidth=0.5)
ax4.set_xlabel('SWE Data Completeness (%)')
ax4.set_ylabel('Snow Depth Data Completeness (%)')
ax4.set_title('Data Quality Assessment\n(Selected Snow Sites)')
ax4.grid(True, alpha=0.3)

# Add quality threshold lines
ax4.axvline(x=min_swe_completeness, color='red', linestyle='--', alpha=0.7, 
           label=f'SWE threshold ({min_swe_completeness}%)')
ax4.axhline(y=min_depth_completeness, color='red', linestyle='--', alpha=0.7,
           label=f'Depth threshold ({min_depth_completeness}%)')
ax4.legend()

# Map 5: Longitude distribution (bottom middle)
ax5 = axes[1, 1]
ax5.hist(selected_df['lon'], bins=15, color='skyblue', alpha=0.7, edgecolor='black')
ax5.set_xlabel('Longitude (°E)')
ax5.set_ylabel('Number of Stations')
ax5.set_title('Longitudinal Distribution\n(Continental Snow Regimes)')
ax5.grid(True, alpha=0.3, axis='y')

# Add regional labels
if selected_df['lon'].min() < 0:
    ax5.text(-100, ax5.get_ylim()[1]*0.8, 'North America', ha='center', fontsize=9, 
            bbox=dict(boxstyle="round,pad=0.3", facecolor="yellow", alpha=0.5))
if selected_df['lon'].max() > 0:
    ax5.text(50, ax5.get_ylim()[1]*0.8, 'Eurasia', ha='center', fontsize=9,
            bbox=dict(boxstyle="round,pad=0.3", facecolor="lightgreen", alpha=0.5))

# Map 6: Selection summary statistics (bottom right)
ax6 = axes[1, 2]

# Create summary statistics
selection_stats = [
    ('Total Available', len(stations_df)),
    ('Quality Filtered', len(quality_filtered)),
    ('Finally Selected', len(selected_df)),
    ('High Quality\n(>80% complete)', len(selected_df[selected_df['swq_completeness'] > 80])),
    ('Mountain Sites\n(>1000m)', len(selected_df[selected_df['elevation'] > 1000]))
]

categories = [stat[0] for stat in selection_stats]
counts = [stat[1] for stat in selection_stats]
colors = ['lightblue', 'yellow', 'green', 'orange', 'purple']

bars = ax6.bar(categories, counts, color=colors, alpha=0.7, edgecolor='black')

# Add value labels on bars
for bar, count in zip(bars, counts):
    ax6.text(bar.get_x() + bar.get_width()/2., bar.get_height() + 0.5,
            str(count), ha='center', va='bottom', fontweight='bold')

ax6.set_ylabel('Number of Stations')
ax6.set_title('Snow Site Selection Summary')
ax6.tick_params(axis='x', rotation=45)
ax6.grid(True, alpha=0.3, axis='y')

plt.suptitle(f'NorSWE Large Sample Snow Study - Site Selection Analysis\n{snow_sample_config["experiment_name"]}', 
             fontsize=16, fontweight='bold')
plt.tight_layout()

# Save visualization
selection_plot_path = experiment_dir / 'plots' / 'snow_site_selection_overview.png'
plt.savefig(selection_plot_path, dpi=300, bbox_inches='tight')
plt.show()

print(f"✅ Snow site selection visualization saved: {selection_plot_path}")

# =============================================================================
# SAVE SELECTED SNOW SITES FOR PROCESSING
# =============================================================================

print(f"\n💾 Saving Selected Snow Sites for Large Sample Processing...")

# Save selected sites to CSV
selected_sites_csv = experiment_dir / 'selected_snow_sites.csv'
selected_df.to_csv(selected_sites_csv, index=False)

print(f"✅ Selected snow sites saved: {selected_sites_csv}")
print(f"   ❄️ Sites ready for processing: {len(selected_df)}")

# Create comprehensive summary report
summary_report = experiment_dir / 'reports' / 'snow_site_selection_summary.txt'

with open(summary_report, 'w') as f:
    f.write("NorSWE Large Sample Snow Study - Site Selection Summary\n")
    f.write("=" * 55 + "\n\n")
    f.write(f"Experiment: {snow_sample_config['experiment_name']}\n")
    f.write(f"Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n")
    f.write(f"Selection Strategy: {snow_sample_config['site_selection_strategy']}\n\n")
    
    f.write(f"SITE SELECTION RESULTS:\n")
    f.write(f"  Available stations: {len(stations_df)}\n")
    f.write(f"  Quality filtered: {len(quality_filtered)}\n")
    f.write(f"  Selected stations: {len(selected_df)}\n")
    f.write(f"  Selection ratio: {len(selected_df)/len(stations_df)*100:.1f}%\n\n")
    
    f.write(f"ENVIRONMENTAL GRADIENTS:\n")
    f.write(f"  Elevation range: {selected_df['elevation'].min():.0f} to {selected_df['elevation'].max():.0f} m\n")
    f.write(f"  Latitude range: {selected_df['lat'].min():.1f}° to {selected_df['lat'].max():.1f}°N\n")
    f.write(f"  Longitude range: {selected_df['lon'].min():.1f}° to {selected_df['lon'].max():.1f}°E\n\n")
    
    f.write(f"DATA QUALITY ASSESSMENT:\n")
    f.write(f"  SWE completeness: {selected_df['swq_completeness'].mean():.1f}% ± {selected_df['swq_completeness'].std():.1f}%\n")
    f.write(f"  Depth completeness: {selected_df['snd_completeness'].mean():.1f}% ± {selected_df['snd_completeness'].std():.1f}%\n")
    f.write(f"  High quality sites (>80%): {len(selected_df[selected_df['swq_completeness'] > 80])}\n\n")
    
    f.write(f"SNOW ENVIRONMENT CHARACTERISTICS:\n")
    f.write(f"  Mountain sites (>1000m): {len(selected_df[selected_df['elevation'] > 1000])}\n")
    f.write(f"  High elevation sites (>1500m): {len(selected_df[selected_df['elevation'] > 1500])}\n")
    f.write(f"  Arctic sites (>65°N): {len(selected_df[selected_df['lat'] > 65])}\n")
    f.write(f"  Boreal sites (55-65°N): {len(selected_df[(selected_df['lat'] >= 55) & (selected_df['lat'] <= 65)])}\n\n")
    
    f.write(f"ANALYSIS PERIOD:\n")
    f.write(f"  Start year: {snow_sample_config['start_year']}\n")
    f.write(f"  End year: {snow_sample_config['end_year']}\n")
    f.write(f"  Focus months: {snow_sample_config['focus_months']}\n\n")
    
    f.write(f"Note: This site selection prioritizes elevation and climate diversity for comprehensive snow process validation.\n")

print(f"✅ Summary report saved: {summary_report}")

# =============================================================================
# STORE RESULTS FOR SUBSEQUENT STEPS
# =============================================================================

# Store key variables for use in subsequent notebook cells
complete_stations = selected_df.copy()
experiment_config = snow_sample_config.copy()

print(f"\n🎯 Large Sample Snow Study Configuration Summary:")

configuration_summary = [
    f"Experimental design: {snow_sample_config['experiment_type']}",
    f"Analysis scale: {snow_sample_config['analysis_scale']}",
    f"Site selection: {len(selected_df)} stations from {len(stations_df)} available",
    f"Elevation diversity: {selected_df['elevation'].min():.0f}m to {selected_df['elevation'].max():.0f}m",
    f"Climate coverage: {selected_df['lat'].min():.1f}°N to {selected_df['lat'].max():.1f}°N latitude",
    f"Template configuration: Validated and ready for snow modeling"
]

for summary in configuration_summary:
    print(f"   ✅ {summary}")

print(f"\n❄️ Snow Science Objectives:")
snow_objectives = [
    f"Multi-variable validation: SWE and snow depth across elevation gradients",
    f"Seasonal dynamics: Accumulation vs. ablation process performance",
    f"Climate sensitivity: Snow modeling across temperature and precipitation regimes",
    f"Elevation effects: Systematic assessment of topographic snow process controls",
    f"Process generalization: Universal vs. environment-specific snow physics patterns"
]

for objective in snow_objectives:
    print(f"   🎓 {objective}")

print(f"\n🚀 Ready for Large Sample Snow Processing:")
next_steps = [
    f"Snow template configuration: Validated for multi-site snow modeling deployment",
    f"Site selection: {len(selected_df)} diverse snow environments prepared for analysis",
    f"Batch processing: Ready for systematic CONFLUENCE snow simulation execution",
    f"Output analysis: Framework prepared for multi-site snow validation result aggregation",
    f"Statistical analysis: Tools ready for comparative snow hydrology insights"
]

for step in next_steps:
    print(f"   ✅ {step}")

print(f"\n✅ Step 1 Complete: Large sample snow experiment designed and configured")
print(f"   ❄️ Next: Execute systematic multi-site CONFLUENCE snow processing")
print(f"   📊 Goal: Comparative snow hydrology across northern hemisphere environmental gradients")

## Step 2: Large Sample Snow Modeling Execution
Building on the experimental design and NorSWE station selection from Step 1, we now run large sample snow modeling across diverse northern hemisphere environments. This step shows how CONFLUENCE processes snow hydrology validation systematically against the NorSWE observation network. It scales from single-site snow modeling to comparative snow science across the northern hemisphere.

### Snow Modeling Scaling: Single Sites → Northern Hemisphere Analysis

**Traditional Snow Modeling**: Individual site case studies with limited transferability
- Site-specific snow process calibration with unclear generalizability
- Manual configuration for each elevation zone or climate region
- Limited ability to identify universal vs. local snow process controls
- Difficulty distinguishing model limitations from site-specific effects

**Large Sample Snow Modeling**: Systematic validation across environmental gradients
- **Automated snow-specific configuration** across elevation, latitude, and climate gradients
- **Parallel snow simulations** submitted as batch jobs on an HPC cluster, so many sites run concurrently rather than one after another
- **Standardized snow validation protocols** enabling direct performance comparison
- **Multi-variable snow assessment** integrating SWE, snow depth, and seasonal dynamics
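Automated per-site configuration amounts to copying the template and overriding the site-specific keys. The sketch below assumes the template is already loaded as a dict, as in the Step 1 validation. It overrides the `DOMAIN_NAME`, `POUR_POINT_COORDS`, and `BOUNDING_BOX_COORDS` keys checked there. Several details are assumptions for illustration, not the logic inside `run_norswe-2.py`: the `lat/lon` and `north/west/south/east` string formats, the 0.1° buffer, and the helper name `make_site_config`.

```python
import copy


def make_site_config(template: dict, site_name: str, lat: float, lon: float,
                     buffer_deg: float = 0.1) -> dict:
    """Return a per-station copy of a template config (hypothetical helper)."""
    config = copy.deepcopy(template)  # never mutate the shared template
    config["DOMAIN_NAME"] = site_name
    # Assumed formats: pour point "lat/lon"; bounding box "north/west/south/east"
    config["POUR_POINT_COORDS"] = f"{lat}/{lon}"
    north, south = round(lat + buffer_deg, 4), round(lat - buffer_deg, 4)
    west, east = round(lon - buffer_deg, 4), round(lon + buffer_deg, 4)
    config["BOUNDING_BOX_COORDS"] = f"{north}/{west}/{south}/{east}"
    return config


template = {"DOMAIN_NAME": "template", "HYDROLOGICAL_MODEL": "SUMMA",
            "POUR_POINT_COORDS": "", "BOUNDING_BOX_COORDS": ""}
site_cfg = make_site_config(template, "norswe_station_001", 61.5, 10.2)
print(site_cfg["BOUNDING_BOX_COORDS"])  # 61.6/10.1/61.4/10.3
```

Deep-copying keeps nested template sections independent, so one station's overrides cannot leak into the next station's configuration.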

### The Unique Challenges of Snow Process Validation

Snow hydrology presents distinct modeling challenges that require specialized large sample approaches:

**Physical Process Complexity**: 
- **Phase transitions**: Accurate representation of freezing, melting, and sublimation processes
- **Energy balance**: Complex interactions between radiation, temperature, wind, and humidity
- **Layered snowpack structure**: Multi-layer snow metamorphism and thermal properties
- **Spatial variability**: Strong elevation, aspect, and microclimate dependencies

**Temporal Dynamics**:
- **Seasonal accumulation**: Episodic snow accumulation events throughout winter
- **Ablation processes**: Complex melt dynamics driven by energy balance components  
- **Diurnal cycles**: Strong daily temperature and radiation variations
- **Extreme events**: Rain-on-snow events and rapid melt episodes

**Validation Complexity**:
- **State variable validation**: SWE and snow depth as integrated measures of snow processes
- **Seasonal metrics**: Peak SWE timing, snow season length, melt rate assessment
- **Process-based evaluation**: Distinguishing accumulation vs. ablation performance
- **Multi-scale representation**: Point measurements vs. model grid cell assumptions
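One way to separate accumulation from ablation performance is to split a season at the observed peak SWE and compute the mean bias on each side. The sketch below assumes daily observed and simulated SWE series for a single snow season that share a `DatetimeIndex`. The function name `phase_bias` and the toy values are illustrative only.

```python
import pandas as pd


def phase_bias(obs: pd.Series, sim: pd.Series) -> dict:
    """Mean bias (sim - obs) up to and after the observed peak SWE date."""
    # Align on common dates and drop days missing from either series
    df = pd.concat({"obs": obs, "sim": sim}, axis=1).dropna()
    peak_date = df["obs"].idxmax()
    accumulation = df.loc[:peak_date]        # season start through the peak (inclusive)
    ablation = df.loc[peak_date:].iloc[1:]   # days strictly after the peak
    return {
        "peak_date": peak_date,
        "accumulation_bias": float((accumulation["sim"] - accumulation["obs"]).mean()),
        "ablation_bias": float((ablation["sim"] - ablation["obs"]).mean()),
    }


dates = pd.date_range("2021-02-01", periods=6, freq="D")
obs = pd.Series([0, 50, 100, 150, 80, 0], index=dates, dtype=float)
sim = pd.Series([0, 40, 90, 140, 100, 30], index=dates, dtype=float)
result = phase_bias(obs, sim)
# Here the simulation underestimates accumulation and melts out too slowly
print(result)
```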

### CONFLUENCE's Advanced Snow Physics Integration

The large sample framework leverages CONFLUENCE's sophisticated snow modeling capabilities:

**SUMMA Snow Physics**:
- **Multi-layer snowpack**: Explicit representation of snow stratigraphy and metamorphism
- **Advanced energy balance**: Detailed surface energy balance with radiation, turbulent, and ground heat fluxes
- **Liquid water dynamics**: Representation of liquid water percolation and refreezing
- **Snow-vegetation interactions**: Canopy interception and sub-canopy snow processes

**Flexible Process Representations**:
- **Multiple parameterization options** for key snow processes enable systematic testing
- **Decision analysis capabilities** for comparing alternative model structures
- **Parameter sensitivity assessment** across diverse snow environments
- **Uncertainty quantification** for snow predictions under different conditions
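Systematic testing of alternative model structures starts by enumerating the combinations of process options to run. The sketch below uses `itertools.product` over a small, hypothetical subset of SUMMA snow decisions. Verify the decision names and option strings against the decisions file of your SUMMA version. How CONFLUENCE consumes such a list is not shown here.

```python
from itertools import product

# Hypothetical subset of SUMMA snow-related decisions and their options
snow_decisions = {
    "alb_method": ["conDecay", "varDecay"],
    "thCondSnow": ["tyen1965", "jrdn1991", "smnv2000"],
    "compaction": ["consettl", "anderson"],
}


def decision_combinations(options: dict) -> list:
    """Return every combination of options as a list of {decision: option} dicts."""
    names = list(options)
    return [dict(zip(names, combo)) for combo in product(*options.values())]


combos = decision_combinations(snow_decisions)
print(len(combos))  # 2 * 3 * 2 = 12 candidate model structures
print(combos[0])
```

Each added decision multiplies the number of runs per site. Across many stations, even a few decisions grow the job count quickly.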

### Northern Hemisphere Snow Science Applications

Large sample snow modeling with NorSWE enables investigation of fundamental snow science questions:

🌨️ **Climate Controls**: Systematic assessment of temperature and precipitation thresholds controlling snow accumulation and persistence

⛰️ **Elevation Dependencies**: Quantification of how snow processes change with elevation and their representation in models

🌍 **Regional Patterns**: Identification of systematic regional biases in snow modeling across different snow climate regimes

📅 **Seasonal Dynamics**: Evaluation of model skill in capturing both accumulation season and ablation season processes

❄️ **Process Universality**: Testing whether snow process representations are consistent across diverse northern hemisphere environments
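Questions about elevation dependencies come down to grouping per-site scores by environmental band. The sketch below reuses the elevation and latitude band edges from the Step 1 selection strategy. The lower bound is inclusive and the upper bound exclusive, matching `pd.cut(..., right=False)`. The `results` table and its `swe_nse` column are made-up placeholders standing in for real Step 3 output.

```python
import pandas as pd

# Placeholder per-site results (made-up values for illustration)
results = pd.DataFrame({
    "elevation": [120, 450, 640, 1180, 1720, 2300],
    "lat": [60.1, 68.0, 62.3, 58.7, 66.2, 46.5],
    "swe_nse": [0.82, 0.70, 0.75, 0.61, 0.55, 0.40],
})

# Band edges and labels copied from the Step 1 selection strategy
results["elev_band"] = pd.cut(
    results["elevation"], bins=[0, 500, 1000, 1500, 2000, 10000], right=False,
    labels=["Lowland", "Montane", "Subalpine", "Alpine", "High Alpine"])
results["lat_band"] = pd.cut(
    results["lat"], bins=[45, 55, 65, 75], right=False,
    labels=["Temperate", "Boreal", "Arctic"])

# Median skill per elevation band; observed=True skips empty bands
by_elevation = results.groupby("elev_band", observed=True)["swe_nse"].median()
print(by_elevation)
```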

The automated workflow demonstrated here makes systematic snow model evaluation feasible at a scale that would otherwise be impractical, because each site would have to be configured and run by hand.

---

# =============================================================================
# STEP 2: EXECUTE LARGE SAMPLE NORSWE SNOW PROCESSING
# =============================================================================

print("=== Step 2: NorSWE Large Sample Snow Processing Execution ===")

import subprocess
import time
import glob
from datetime import datetime
import geopandas as gpd
import xarray as xr
from scipy import stats
import warnings
warnings.filterwarnings('ignore')

def run_norswe_script_from_notebook():
    """
    Execute the run_norswe-2.py script from within the notebook
    """
    print(f"\n❄️ Executing NorSWE Large Sample Snow Processing Script...")
    
    script_path = "./run_norswe-2.py"
    
    if not Path(script_path).exists():
        print(f"❌ Script not found: {script_path}")
        return False
    
    print(f"   📝 Script location: {script_path}")
    print(f"   🎯 Target sites: {len(complete_stations)} NorSWE stations")
    print(f"   ⏰ Processing started: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    
    try:
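        # Prepare script arguments based on experiment configuration.
        # sys.executable runs the script with the notebook's own interpreter,
        # avoiding a different 'python' that may come first on PATH.
        import sys
        script_args = [
            sys.executable, script_path,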
            '--norswe_path', experiment_config['norswe_path'],
            '--template_config', experiment_config['template_config'],
            '--output_dir', experiment_config['output_dir'],
            '--config_dir', experiment_config['config_dir'],
            '--min_completeness', str(experiment_config['min_completeness']),
            '--max_stations', str(experiment_config['max_stations']),
            '--base_path', experiment_config['base_path']
        ]
        
        # Add optional year filtering
        if experiment_config.get('start_year'):
            script_args.extend(['--start_year', str(experiment_config['start_year'])])
        if experiment_config.get('end_year'):
            script_args.extend(['--end_year', str(experiment_config['end_year'])])
        
        # Add no_submit flag if specified
        if experiment_config.get('no_submit', False):
            script_args.append('--no_submit')
        
        print(f"   🔧 Script arguments: {' '.join(script_args[2:])}")
        
        # Launch the script with pipes so its confirmation prompt can be answered
        # programmatically (text=True exchanges str rather than bytes)
        process = subprocess.Popen(
            script_args,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True
        )
        
        # Send 'y' to confirm job submission when prompted (unless no_submit)
        if not experiment_config.get('no_submit', False):
            stdout, stderr = process.communicate(input='y\n')
        else:
            stdout, stderr = process.communicate()
        
        # Print the output
        if stdout:
            print("📋 Script Output:")
            for line in stdout.split('\n'):
                if line.strip():
                    print(f"   {line}")
        
        if stderr:
            print("⚠️  Script Warnings/Errors:")
            for line in stderr.split('\n'):
                if line.strip():
                    print(f"   {line}")
        
        if process.returncode == 0:
            print(f"✅ NorSWE processing script completed successfully")
            return True
        else:
            print(f"❌ Script failed with return code: {process.returncode}")
            return False
            
    except Exception as e:
        print(f"❌ Error running script: {e}")
        return False

def monitor_snow_job_progress():
    """
    Monitor the progress of submitted CONFLUENCE snow modeling jobs
    """
    print(f"\n📊 Monitoring Snow Modeling Job Progress...")
    
    try:
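        # Check job queue status. subprocess does not expand shell variables,
        # so '$USER' would be passed literally; resolve the username instead.
        import getpass
        result = subprocess.run(['squeue', '-u', getpass.getuser()],
                                capture_output=True, text=True)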
        
        if result.returncode == 0:
            queue_lines = result.stdout.strip().split('\n')
            confluence_jobs = [line for line in queue_lines 
                             if 'CONFLUENCE' in line or any(site in line 
                             for site in complete_stations['Watershed_Name'][:5])]
            
            print(f"   ❄️ Snow modeling jobs in queue: {len(confluence_jobs)}")
            
            if confluence_jobs:
                print("   📋 Active NorSWE CONFLUENCE jobs:")
                for job in confluence_jobs[:10]:  # Show first 10
                    print(f"     {job}")
                if len(confluence_jobs) > 10:
                    print(f"     ... and {len(confluence_jobs) - 10} more")
        else:
            print("   ⚠️  Unable to check job queue status")
            
    except Exception as e:
        print(f"   ⚠️  Error checking job status: {e}")

# Execute the NorSWE processing script
script_success = run_norswe_script_from_notebook()

if script_success:
    print(f"\n✅ Step 2 Complete: NorSWE snow modeling initiated")
    
    # Monitor initial job status
    monitor_snow_job_progress()
    
    print(f"\n📝 Next Steps:")
    print(f"   1. Snow modeling jobs will process in parallel on the cluster")
    print(f"   2. Results will include SWE and snow depth simulations")
    print(f"   3. Step 3 will analyze snow validation metrics")
    
else:
    print(f"\n⚠️  Step 2 Issue: Script execution had problems")
    print(f"   Proceeding to Step 3 with any existing results...")



## Step 3: Multi-Site Snow Validation and Process Analysis
With large sample snow modeling executed, we now turn to the analysis that systematic multi-site validation against NorSWE observations makes possible. This step covers snow process evaluation, seasonal dynamics analysis, and elevation-climate performance assessment. These are questions that single-site studies cannot answer with statistical confidence.

### Snow Science Evolution: Case Studies → Systematic Snow Process Understanding

**Traditional Snow Validation**: Individual site snow model evaluation
- Site-specific snow process calibration with limited transferability
- Difficulty separating universal snow physics from local environmental effects
- Manual comparison across different snow climates and elevation zones
- Limited statistical power for robust snow process generalization

**Large Sample Snow Validation**: Systematic multi-site snow process analysis
- **Continental-scale pattern recognition** across elevation and climate gradients
- **Statistical hypothesis testing** for snow process representations with robust sample sizes
- **Process universality assessment** distinguishing general vs. site-specific snow dynamics
- **Model transferability evaluation** across diverse northern hemisphere snow environments

### Comprehensive Snow Analysis Framework

**Tier 1: Snow Domain Spatial Overview**
- **Automated discovery** of completed snow modeling domains across elevation gradients
- **Processing status assessment** including simulation completion and observation availability
- **Northern Hemisphere spatial distribution** showing snow modeling coverage across climate zones
- **Elevation-based analysis** revealing snow modeling performance across topographic gradients

**Tier 2: Multi-Variable Snow Process Validation**
- **SWE validation**: Snow Water Equivalent comparison providing mass balance assessment
- **Snow depth validation**: Complementary structural validation of snowpack representation
- **Seasonal dynamics evaluation**: Peak SWE timing, snow season length, and ablation rate analysis
- **Performance metric calculation** across correlation, RMSE, bias, and Nash-Sutcliffe efficiency
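The four metrics can be computed on paired daily values once missing days are removed. The sketch below uses NumPy and assumes observed and simulated SWE (or depth) are already aligned to the same dates. NSE here is 1 - Σ(sim - obs)² / Σ(obs - mean(obs))². Bias is the mean of (sim - obs), so a positive bias means the model overestimates. The function name `snow_metrics` is illustrative.

```python
import numpy as np


def snow_metrics(obs, sim) -> dict:
    """Pearson r, RMSE, mean bias, and Nash-Sutcliffe efficiency on paired values."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    valid = ~(np.isnan(obs) | np.isnan(sim))  # keep only days with both values
    obs, sim = obs[valid], sim[valid]
    error = sim - obs
    return {
        "r": float(np.corrcoef(obs, sim)[0, 1]),
        "rmse": float(np.sqrt(np.mean(error ** 2))),
        "bias": float(np.mean(error)),
        "nse": float(1.0 - np.sum(error ** 2) / np.sum((obs - obs.mean()) ** 2)),
    }


# Simulation offset by +10 mm: perfect correlation, but nonzero bias and RMSE
m = snow_metrics([0, 50, 100, 150, np.nan], [10, 60, 110, 160, 20])
print(m)
```

The example shows why several metrics are reported together: correlation alone would rate this constant 10 mm offset as perfect.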

### Snow Hydrology Innovation at Scale

Multi-variable snow validation across many NorSWE sites opens three lines of analysis:

**Snow Process Understanding**:
- **Accumulation vs. ablation performance** revealing process-specific model skill patterns
- **Temperature threshold evaluation** across different climate regimes and elevation zones
- **Energy balance validation** through integrated SWE and snow depth comparison
- **Seasonal cycle assessment** identifying universal vs. climate-specific snow dynamics

**Elevation-Climate Interactions**:
- **Elevation gradient analysis** revealing systematic changes in snow process representation
- **Climate zone performance** across maritime, continental, and Arctic snow regimes
- **Latitude-elevation interactions** showing complex environmental controls on snow modeling
- **Topographic effect quantification** on snow accumulation and ablation processes

**Model Process Evaluation**:
- **Snow physics assessment** across different process representations in SUMMA
- **Parameter transferability** testing consistency of snow model parameters across sites
- **Structural uncertainty** evaluation through multi-site performance patterns
- **Process-based diagnostics** identifying where snow models succeed vs. struggle
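Elevation gradient analysis can start with a rank correlation between per-site skill and elevation, which does not assume a linear relationship. The sketch below uses `scipy.stats.spearmanr` on placeholder values. With real output, you would pass the elevation column and a metric column from the results table.

```python
from scipy import stats

# Placeholder per-site values (made-up): elevation in m and SWE NSE
elevation = [120, 450, 640, 1180, 1720, 2300]
swe_nse = [0.82, 0.70, 0.75, 0.61, 0.55, 0.40]

rho, p_value = stats.spearmanr(elevation, swe_nse)
print(f"Spearman rho = {rho:.3f}, p = {p_value:.4f}")
# A negative rho means skill tends to decline with elevation
```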

### Breakthrough Snow Science Capabilities

This multi-site analysis framework provides the following capabilities for snow hydrology:

❄️ **Continental Snow Assessment**: Comprehensive evaluation of snow model performance across the full range of northern hemisphere snow environments

📈 **Snow Process Generalization**: Statistical identification of universal snow process patterns vs. climate/elevation-specific behaviors

🎯 **Model Physics Validation**: Systematic testing of snow energy balance and phase change representations across environmental gradients

🔍 **Uncertainty Quantification**: Robust assessment of snow prediction reliability across different snow climate regimes

📊 **Transferability Analysis**: Evaluation of snow model parameter and process consistency across diverse environments

⛰️ **Elevation-Climate Synthesis**: Understanding complex interactions between topographic and climatic controls on snow processes
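For uncertainty quantification across sites, a percentile bootstrap gives a confidence interval for a summary statistic, such as the median NSE, without assuming a distribution. The sketch below is one simple approach and not necessarily what CONFLUENCE uses. The resample count, seed, and input values are arbitrary placeholders.

```python
import numpy as np


def bootstrap_median_ci(values, n_boot=2000, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval for the median across sites."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample sites with replacement and record each resample's median
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    medians = np.median(values[idx], axis=1)
    lower, upper = np.quantile(medians, [alpha / 2, 1 - alpha / 2])
    return float(np.median(values)), float(lower), float(upper)


# Placeholder per-site NSE values
median_nse, ci_low, ci_high = bootstrap_median_ci(
    [0.82, 0.70, 0.75, 0.61, 0.55, 0.40, 0.68, 0.77])
print(f"median NSE {median_nse:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

Resampling whole sites, rather than individual days, treats the station as the unit of replication. This matches the cross-site question being asked.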

### Seasonal Snow Dynamics Focus

The analysis emphasizes critical seasonal snow process evaluation:

**Accumulation Season Analysis**:
- Temperature-precipitation threshold performance across climate gradients
- Storm-by-storm snow accumulation event representation
- Wind redistribution and sublimation process evaluation
- Snow-rain transition accuracy in different elevation zones

**Ablation Season Analysis**:
- Energy balance component performance during melt periods
- Diurnal melt cycle representation across latitude gradients
- Rain-on-snow event simulation across diverse climate conditions
- Seasonal melt timing and rate accuracy evaluation
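Melt timing and rate can be summarized from a SWE series with three quantities: the date of peak SWE, the first later date on which SWE drops below a snow-free threshold, and the mean ablation rate between them. The sketch below assumes a daily SWE series in mm for one season. The 1 mm snow-free threshold and the helper name `melt_season_stats` are assumptions. Running it on both observed and simulated series lets you compare melt-out dates and rates directly.

```python
import pandas as pd


def melt_season_stats(swe: pd.Series, snow_free_mm: float = 1.0) -> dict:
    """Peak date, melt-out date, and mean ablation rate (mm/day) for one season."""
    swe = swe.dropna()
    peak_date = swe.idxmax()
    after_peak = swe.loc[peak_date:]
    # First date on or after the peak with SWE below the snow-free threshold
    snow_free = after_peak[after_peak < snow_free_mm]
    if swe.loc[peak_date] < snow_free_mm or snow_free.empty:
        # No snowpack at all, or it never melted out within the record
        return {"peak_date": peak_date, "melt_out_date": None, "melt_rate_mm_per_day": None}
    melt_out = snow_free.index[0]
    days = (melt_out - peak_date).days
    rate = (swe.loc[peak_date] - swe.loc[melt_out]) / days
    return {"peak_date": peak_date, "melt_out_date": melt_out,
            "melt_rate_mm_per_day": float(rate)}


dates = pd.date_range("2021-04-01", periods=6, freq="D")
swe_obs = pd.Series([200, 250, 200, 100, 50, 0], index=dates, dtype=float)
melt_obs = melt_season_stats(swe_obs)
print(melt_obs)
```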

**Annual Cycle Integration**:
- Peak SWE magnitude and timing performance across elevations
- Snow season length representation across climate gradients
- Interannual variability capture across diverse snow regimes
- Snow persistence modeling in marginal snow environments
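Peak SWE magnitude and timing, and snow season length, are computed per water year. The sketch below assumes a daily SWE series in mm and an October 1 water-year start, a common Northern Hemisphere convention. It counts snow-covered days as days with at least 1 mm SWE. Both the start month and the threshold are assumptions you may want to change. Interannual variability then follows from comparing these rows across years for the observed and simulated series.

```python
import pandas as pd


def annual_snow_metrics(swe: pd.Series, snow_mm: float = 1.0,
                        wy_start_month: int = 10) -> pd.DataFrame:
    """Peak SWE, day of water year of the peak, and snow-covered days per water year."""
    swe = swe.dropna()
    # Months from wy_start_month onward belong to the next calendar year's water year
    water_year = swe.index.year + (swe.index.month >= wy_start_month).astype(int)
    rows = []
    for wy, season in swe.groupby(water_year):
        wy_start = pd.Timestamp(year=int(wy) - 1, month=wy_start_month, day=1)
        peak_date = season.idxmax()
        rows.append({
            "water_year": int(wy),
            "peak_swe_mm": float(season.max()),
            "peak_day_of_water_year": (peak_date - wy_start).days + 1,
            "snow_days": int((season >= snow_mm).sum()),
        })
    return pd.DataFrame(rows).set_index("water_year")


dates = pd.date_range("2020-12-30", periods=5, freq="D")
swe = pd.Series([0, 10, 30, 20, 0], index=dates, dtype=float)
annual = annual_snow_metrics(swe)
print(annual)
```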

The multi-site snow validation demonstrated here moves snow model evaluation from individual mountain case studies toward systematic, statistically grounded analysis across the range of northern hemisphere snow environments. With enough sites, it becomes possible to separate snow process patterns that hold everywhere from regional variations, and to quantify model uncertainty across snow climate regimes.

# =============================================================================
# STEP 3: COMPREHENSIVE SNOW OUTPUT ANALYSIS
# =============================================================================

print(f"\n=== Step 3: NorSWE Snow Validation Analysis ===")

def discover_completed_snow_domains():
    """
    Discover all completed NorSWE domain directories and their snow outputs
    """
    print(f"\n🔍 Discovering Completed NorSWE Snow Modeling Domains...")
    
    # Base data directory pattern
    base_path = Path(experiment_config['base_path'])
    domain_pattern = str(base_path / "domain_*")
    
    # Find all domain directories
    domain_dirs = glob.glob(domain_pattern)
    
    print(f"   📁 Found {len(domain_dirs)} total domain directories")
    
    completed_domains = []
    
    for domain_dir in domain_dirs:
        domain_path = Path(domain_dir)
        domain_name = domain_path.name.replace('domain_', '')
        
        # Check if this is a NorSWE domain (should match our selected stations)
        if any(domain_name in site for site in complete_stations['Watershed_Name'].values):
            
            # Check for key output files
            shapefile_path = domain_path / "shapefiles" / "catchment" / f"{domain_name}_HRUs.shp"
            simulation_dir = domain_path / "simulations"
            obs_dir = domain_path / "observations" / "snow" / "raw_data"
            
            domain_info = {
                'domain_name': domain_name,
                'domain_path': domain_path,
                'has_shapefile': shapefile_path.exists(),
                'shapefile_path': shapefile_path if shapefile_path.exists() else None,
                'has_simulations': simulation_dir.exists(),
                'simulation_path': simulation_dir if simulation_dir.exists() else None,
                'has_observations': obs_dir.exists(),
                'observation_path': obs_dir if obs_dir.exists() else None,
                'simulation_files': [],
                'swe_obs_file': None,
                'depth_obs_file': None
            }
            
            # Find simulation output files
            if simulation_dir.exists():
                nc_files = list(simulation_dir.glob("**/*.nc"))
                domain_info['simulation_files'] = nc_files
                domain_info['has_results'] = len(nc_files) > 0
            else:
                domain_info['has_results'] = False
            
            # Find observation files
            if obs_dir.exists():
                swe_files = list((obs_dir / "swe").glob("*.csv"))
                depth_files = list((obs_dir / "depth").glob("*.csv"))
                
                if swe_files:
                    domain_info['swe_obs_file'] = swe_files[0]
                if depth_files:
                    domain_info['depth_obs_file'] = depth_files[0]
            
            completed_domains.append(domain_info)
    
    print(f"   ❄️ NorSWE domains found: {len(completed_domains)}")
    print(f"   📊 Domains with shapefiles: {sum(1 for d in completed_domains if d['has_shapefile'])}")
    print(f"   📈 Domains with simulation results: {sum(1 for d in completed_domains if d['has_results'])}")
    print(f"   📋 Domains with observations: {sum(1 for d in completed_domains if d['has_observations'])}")
    
    return completed_domains

def create_snow_domain_overview_map(completed_domains):
    """
    Create an overview map showing all snow domain locations and their completion status
    """
    print(f"\n🗺️  Creating Snow Domain Overview Map...")
    
    # Create figure for overview map
    fig, axes = plt.subplots(2, 2, figsize=(20, 16))
    
    # Map 1: Global overview with completion status (focus on Northern Hemisphere)
    ax1 = axes[0, 0]
    
    # Plot all selected sites
    ax1.scatter(complete_stations['lon'], complete_stations['lat'], 
               c='lightgray', alpha=0.5, s=30, label='Selected stations', marker='o')
    
    # Plot completed domains with different colors for different completion levels
    for domain in completed_domains:
        domain_name = domain['domain_name']
        
        # Find corresponding site in complete_stations
        site_row = complete_stations[complete_stations['Watershed_Name'] == domain_name]
        
        if not site_row.empty:
            lat = site_row['lat'].iloc[0]
            lon = site_row['lon'].iloc[0]
            
            # Color based on completion status
            if domain['has_results'] and domain['has_observations']:
                color = 'green'
                label = 'Complete with snow validation'
                marker = 's'
                size = 60
            elif domain['has_results']:
                color = 'orange' 
                label = 'Simulation complete'
                marker = '^'
                size = 50
            elif domain['has_observations']:
                color = 'blue'
                label = 'Observations only'
                marker = 'D'
                size = 40
            else:
                color = 'red'
                label = 'Processing started'
                marker = 'v'
                size = 30
            
            ax1.scatter(lon, lat, c=color, s=size, marker=marker, alpha=0.8,
                       edgecolors='black', linewidth=0.5)
    
    ax1.set_xlabel('Longitude')
    ax1.set_ylabel('Latitude')
    ax1.set_title('NorSWE Snow Domain Processing Status Overview')
    ax1.grid(True, alpha=0.3)
    ax1.set_xlim(-180, 180)
    ax1.set_ylim(30, 85)  # Focus on Northern Hemisphere snow regions
    
    # Create custom legend with proxy artists (plt.scatter would draw empty
    # collections onto whichever axes happens to be active)
    from matplotlib.lines import Line2D
    legend_styles = [
        ('green', 's', 'Complete with validation'),
        ('orange', '^', 'Simulation complete'),
        ('blue', 'D', 'Observations extracted'),
        ('red', 'v', 'Processing started'),
        ('lightgray', 'o', 'Selected stations'),
    ]
    legend_elements = [
        Line2D([0], [0], linestyle='none', marker=m, markerfacecolor=c,
               markeredgecolor='black', markersize=8, label=lbl)
        for c, m, lbl in legend_styles
    ]
    ax1.legend(handles=legend_elements, loc='lower left')
    
    # Map 2: Completion statistics by elevation bands
    ax2 = axes[0, 1]
    
    # Create elevation bands
    elevation_bands = [(0, 500), (500, 1000), (1000, 1500), (1500, 2000), (2000, 10000)]
    band_labels = ['0-500m', '500-1000m', '1000-1500m', '1500-2000m', '>2000m']
    
    elevation_completion = {}
    
    for domain in completed_domains:
        domain_name = domain['domain_name']
        site_row = complete_stations[complete_stations['Watershed_Name'] == domain_name]
        
        if not site_row.empty:
            elevation = site_row['elevation'].iloc[0]
            
            # Find elevation band
            for i, (min_elev, max_elev) in enumerate(elevation_bands):
                if min_elev <= elevation < max_elev:
                    band_label = band_labels[i]
                    
                    if band_label not in elevation_completion:
                        elevation_completion[band_label] = {'total': 0, 'complete': 0, 'partial': 0}
                    
                    elevation_completion[band_label]['total'] += 1
                    
                    if domain['has_results'] and domain['has_observations']:
                        elevation_completion[band_label]['complete'] += 1
                    elif domain['has_results'] or domain['has_observations']:
                        elevation_completion[band_label]['partial'] += 1
                    break
    
    # Create stacked bar chart
    if elevation_completion:
        # Keep bands in ascending elevation order (dict order follows discovery order)
        bands = [b for b in band_labels if b in elevation_completion]
        complete_counts = [elevation_completion[b]['complete'] for b in bands]
        partial_counts = [elevation_completion[b]['partial'] for b in bands]
        pending_counts = [elevation_completion[b]['total'] - 
                         elevation_completion[b]['complete'] - 
                         elevation_completion[b]['partial'] for b in bands]
        
        x_pos = range(len(bands))
        
        ax2.bar(x_pos, complete_counts, label='Complete', color='green', alpha=0.7)
        ax2.bar(x_pos, partial_counts, bottom=complete_counts, 
               label='Partial', color='orange', alpha=0.7)
        ax2.bar(x_pos, pending_counts, 
               bottom=[c+p for c,p in zip(complete_counts, partial_counts)], 
               label='Pending', color='red', alpha=0.7)
        
        ax2.set_xticks(x_pos)
        ax2.set_xticklabels(bands, rotation=45, ha='right')
        ax2.set_ylabel('Number of Sites')
        ax2.set_title('Processing Status by Elevation Band')
        ax2.legend()
        ax2.grid(True, alpha=0.3, axis='y')
    
    # Map 3: Station elevation vs latitude
    ax3 = axes[1, 0]
    
    domain_elevations = []
    domain_latitudes = []
    
    for domain in completed_domains:
        domain_name = domain['domain_name']
        site_row = complete_stations[complete_stations['Watershed_Name'] == domain_name]
        
        if not site_row.empty:
            elevation = site_row['elevation'].iloc[0]
            latitude = site_row['lat'].iloc[0]
            domain_elevations.append(elevation)
            domain_latitudes.append(latitude)
            
            # Color code by completion status (same scheme as the overview map)
            if domain['has_results'] and domain['has_observations']:
                color = 'green'
            elif domain['has_results']:
                color = 'orange'
            elif domain['has_observations']:
                color = 'blue'
            else:
                color = 'red'
            
            ax3.scatter(latitude, elevation, c=color, alpha=0.7, s=40, edgecolors='black', linewidth=0.5)
    
    ax3.set_xlabel('Latitude')
    ax3.set_ylabel('Elevation (m)')
    ax3.set_title('Station Distribution: Elevation vs Latitude')
    ax3.grid(True, alpha=0.3)
    
    # Map 4: Processing summary statistics
    ax4 = axes[1, 1]
    
    # Summary statistics
    total_selected = len(complete_stations)
    total_discovered = len(completed_domains)
    total_with_results = sum(1 for d in completed_domains if d['has_results'])
    total_with_obs = sum(1 for d in completed_domains if d['has_observations'])
    total_complete = sum(1 for d in completed_domains if d['has_results'] and d['has_observations'])
    
    categories = ['Selected', 'Processing\nStarted', 'Simulation\nComplete', 'Observations\nExtracted', 'Ready for\nValidation']
    counts = [total_selected, total_discovered, total_with_results, total_with_obs, total_complete]
    colors = ['lightblue', 'yellow', 'orange', 'cyan', 'green']
    
    bars = ax4.bar(categories, counts, color=colors, alpha=0.7, edgecolor='black')
    
    # Add value labels on bars
    for bar, count in zip(bars, counts):
        ax4.text(bar.get_x() + bar.get_width()/2., bar.get_height() + 0.5,
                str(count), ha='center', va='bottom', fontweight='bold')
    
    ax4.set_ylabel('Number of Sites')
    ax4.set_title('Snow Modeling Processing Progress')
    ax4.grid(True, alpha=0.3, axis='y')
    
    plt.suptitle('NorSWE Large Sample Snow Study - Domain Overview', 
                 fontsize=16, fontweight='bold')
    plt.tight_layout()
    
    # Save the overview map
    overview_path = experiment_dir / 'plots' / 'snow_domain_overview_map.png'
    overview_path.parent.mkdir(parents=True, exist_ok=True)
    plt.savefig(overview_path, dpi=300, bbox_inches='tight')
    plt.show()
    
    print(f"✅ Snow domain overview map saved: {overview_path}")
    
    return total_selected, total_discovered, total_with_results, total_with_obs, total_complete

def extract_snow_results_from_domains(completed_domains):
    """
    Extract snow simulation results (SWE and snow depth) from all completed domains
    """
    print(f"\n❄️ Extracting Snow Results from Completed Domains...")
    
    snow_results = []
    processing_summary = {
        'total_domains': len(completed_domains),
        'domains_with_results': 0,
        'domains_with_snow': 0,
        'failed_extractions': 0
    }
    
    for domain in completed_domains:
        if not domain['has_results']:
            continue
            
        domain_name = domain['domain_name']
        processing_summary['domains_with_results'] += 1
        
        try:
            print(f"   🔄 Processing {domain_name}...")
            
            # Find simulation output files
            nc_files = domain['simulation_files']
            
            # Look for daily or monthly output files
            daily_files = [f for f in nc_files if 'day' in f.name.lower()]
            monthly_files = [f for f in nc_files if 'month' in f.name.lower()]
            timestep_files = [f for f in nc_files if 'timestep' in f.name.lower()]
            
            output_file = None
            if daily_files:
                output_file = daily_files[0]
            elif timestep_files:
                output_file = timestep_files[0]
            elif monthly_files:
                output_file = monthly_files[0]
            elif nc_files:
                output_file = nc_files[0]  # Use any available file
            
            if output_file is None:
                print(f"     ❌ No suitable output files found")
                processing_summary['failed_extractions'] += 1
                continue
            
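            # Load the netCDF file into memory; load_dataset closes the file handle
            # afterwards, so open files do not accumulate across many domains
            ds = xr.load_dataset(output_file)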
            
            # Look for snow variables
            snow_vars = {}
            
            # Common SUMMA snow variable names
            if 'scalarSWE' in ds.data_vars:
                snow_vars['swe'] = 'scalarSWE'
            elif 'SWE' in ds.data_vars:
                snow_vars['swe'] = 'SWE'
            
            if 'scalarSnowDepth' in ds.data_vars:
                snow_vars['depth'] = 'scalarSnowDepth'
            elif 'snowDepth' in ds.data_vars:
                snow_vars['depth'] = 'snowDepth'
            elif 'snow_depth' in ds.data_vars:
                snow_vars['depth'] = 'snow_depth'
            
            if not snow_vars:
                print(f"     ⚠️  No snow variables found in {output_file.name}")
                available_vars = list(ds.data_vars.keys())
                print(f"     Available variables: {available_vars[:10]}...")
                processing_summary['failed_extractions'] += 1
                continue
            
            print(f"     ❄️ Using snow variables: {snow_vars}")
            
            # Extract snow data
            extracted_data = {}
            
            for var_type, var_name in snow_vars.items():
                snow_data = ds[var_name]
                
                # Handle multi-dimensional data (take spatial mean if needed)
                if len(snow_data.dims) > 1:
                    spatial_dims = [dim for dim in snow_data.dims if dim != 'time']
                    if spatial_dims:
                        snow_data = snow_data.mean(dim=spatial_dims)
                
                # Convert to pandas Series
                snow_series = snow_data.to_pandas()
                
                # Handle negative values and unit conversion
                if var_type == 'swe':
                    # Small negative SWE values are numerical artifacts: clip them to
                    # zero rather than flipping their sign with abs()
                    snow_series = snow_series.clip(lower=0)
                    # No conversion needed: SUMMA typically outputs SWE in kg/m²,
                    # and 1 kg/m² of water equals 1 mm of water depth
                elif var_type == 'depth':
                    # Snow depth cannot be negative; clip artifacts to zero
                    snow_series = snow_series.clip(lower=0)
                    # Heuristic unit check: a maximum below 10 is assumed to be
                    # in metres and converted to cm
                    if snow_series.max() < 10:
                        snow_series = snow_series * 100
                
                extracted_data[var_type] = snow_series
            
            # Get site information
            site_row = complete_stations[complete_stations['Watershed_Name'] == domain_name]
            
            if site_row.empty:
                print(f"     ⚠️  Site information not found for {domain_name}")
                continue
            
            # Calculate snow season statistics
            snow_stats = {}
            
            for var_type, series in extracted_data.items():
                # Basic statistics
                snow_stats[f'{var_type}_mean'] = series.mean()
                snow_stats[f'{var_type}_max'] = series.max()
                snow_stats[f'{var_type}_std'] = series.std()
                
                # Seasonal statistics
                if len(series) > 0:
                    # Find peak snow (maximum value)
                    peak_idx = series.idxmax()
                    snow_stats[f'{var_type}_peak_date'] = peak_idx
                    snow_stats[f'{var_type}_peak_value'] = series[peak_idx]
                    
                    # Snow season length: number of output timesteps above the threshold
                    # over the full record (equal to days when daily output is used)
                    threshold = 10 if var_type == 'swe' else 5  # 10 mm SWE or 5 cm depth
                    snow_days = (series > threshold).sum()
                    snow_stats[f'{var_type}_season_length'] = snow_days
            
            # Store results (period and length taken from the first extracted variable)
            first_series = next(iter(extracted_data.values()))
            result = {
                'domain_name': domain_name,
                'station_id': site_row['station_id'].iloc[0],
                'latitude': site_row['lat'].iloc[0],
                'longitude': site_row['lon'].iloc[0],
                'elevation': site_row['elevation'].iloc[0],
                'data_period': f"{first_series.index.min()} to {first_series.index.max()}",
                'data_points': len(first_series),
                'output_file': str(output_file)
            }
            
            # Add time series data
            result.update(extracted_data)
            
            # Add statistics
            result.update(snow_stats)
            
            snow_results.append(result)
            processing_summary['domains_with_snow'] += 1
            
            swe_info = f"{snow_stats.get('swe_mean', 0):.1f} mm (max: {snow_stats.get('swe_max', 0):.1f})" if 'swe' in extracted_data else "N/A"
            depth_info = f"{snow_stats.get('depth_mean', 0):.1f} cm (max: {snow_stats.get('depth_max', 0):.1f})" if 'depth' in extracted_data else "N/A"
            
            print(f"     ✅ Snow extracted - SWE: {swe_info}, Depth: {depth_info}")
            
        except Exception as e:
            print(f"     ❌ Error processing {domain_name}: {e}")
            processing_summary['failed_extractions'] += 1
    
    print(f"\n❄️ Snow Extraction Summary:")
    print(f"   Total domains: {processing_summary['total_domains']}")
    print(f"   Domains with results: {processing_summary['domains_with_results']}")
    print(f"   Successful snow extractions: {processing_summary['domains_with_snow']}")
    print(f"   Failed extractions: {processing_summary['failed_extractions']}")
    
    return snow_results, processing_summary
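
# --- Optional sketch: unit-aware snow depth conversion ---------------------
# The extraction above infers snow-depth units from the data range (a maximum
# below 10 is assumed to be metres). A less fragile alternative is to read the
# variable's `units` attribute from the netCDF file, for example
#     snow_series = depth_to_cm(snow_series, ds[var_name].attrs.get('units'))
# `depth_to_cm` and DEPTH_TO_CM are hypothetical; the unit strings recognised
# here are assumptions and should be checked against your own SUMMA output.
DEPTH_TO_CM = {"m": 100.0, "cm": 1.0, "mm": 0.1}

def depth_to_cm(values, units):
    """Convert snow depth values to cm given a units string (hypothetical helper)."""
    factor = DEPTH_TO_CM.get(str(units).strip().lower())
    if factor is None:
        # Fail loudly instead of guessing when the units are unknown or missing
        raise ValueError(f"Unrecognised snow depth units: {units!r}")
    return values * factor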

def load_norswe_observations(completed_domains):
    """
    Load NorSWE observation data for snow validation
    """
    print(f"\n📥 Loading NorSWE Snow Observation Data...")
    
    norswe_obs = {}
    obs_summary = {
        'sites_found': 0,
        'sites_with_swe': 0,
        'sites_with_depth': 0,
        'total_swe_observations': 0,
        'total_depth_observations': 0
    }
    
    # Look for extracted NorSWE observation data in domain directories
    for domain in completed_domains:
        if not domain['has_observations']:
            continue
            
        domain_name = domain['domain_name']
        
        try:
            print(f"   📊 Loading {domain_name}...")
            
            obs_summary['sites_found'] += 1
            site_obs = {}
            
            # Load SWE observations
            if domain['swe_obs_file']:
                swe_df = pd.read_csv(domain['swe_obs_file'])
                swe_df['time'] = pd.to_datetime(swe_df['time'])
                swe_df.set_index('time', inplace=True)
                
                swe_obs = swe_df['SWE_kg_m2'].dropna()
                
                if len(swe_obs) > 0:
                    site_obs['swe_timeseries'] = swe_obs
                    site_obs['swe_mean'] = swe_obs.mean()
                    site_obs['swe_max'] = swe_obs.max()
                    site_obs['swe_std'] = swe_obs.std()
                    
                    # Seasonal statistics
                    peak_idx = swe_obs.idxmax()
                    site_obs['swe_peak_date'] = peak_idx
                    site_obs['swe_peak_value'] = swe_obs[peak_idx]
                    
                    # Snow season length: observations with >10 mm SWE
                    # (equal to days only for daily records)
                    snow_days = (swe_obs > 10).sum()
                    site_obs['swe_season_length'] = snow_days
                    
                    obs_summary['sites_with_swe'] += 1
                    obs_summary['total_swe_observations'] += len(swe_obs)
                    
                    print(f"     ❄️ SWE obs: {swe_obs.mean():.1f} ± {swe_obs.std():.1f} mm ({len(swe_obs)} points)")
            
            # Load snow depth observations  
            if domain['depth_obs_file']:
                depth_df = pd.read_csv(domain['depth_obs_file'])
                depth_df['time'] = pd.to_datetime(depth_df['time'])
                depth_df.set_index('time', inplace=True)
                
                depth_obs = depth_df['Depth_m'].dropna() * 100  # Convert m to cm
                
                if len(depth_obs) > 0:
                    site_obs['depth_timeseries'] = depth_obs
                    site_obs['depth_mean'] = depth_obs.mean()
                    site_obs['depth_max'] = depth_obs.max()
                    site_obs['depth_std'] = depth_obs.std()
                    
                    # Seasonal statistics
                    peak_idx = depth_obs.idxmax()
                    site_obs['depth_peak_date'] = peak_idx
                    site_obs['depth_peak_value'] = depth_obs[peak_idx]
                    
                    # Snow season length: observations with >5 cm depth
                    # (equal to days only for daily records)
                    snow_days = (depth_obs > 5).sum()
                    site_obs['depth_season_length'] = snow_days
                    
                    obs_summary['sites_with_depth'] += 1
                    obs_summary['total_depth_observations'] += len(depth_obs)
                    
                    print(f"     📏 Depth obs: {depth_obs.mean():.1f} ± {depth_obs.std():.1f} cm ({len(depth_obs)} points)")
            
            # Add site metadata
            site_row = complete_stations[complete_stations['Watershed_Name'] == domain_name]
            if not site_row.empty:
                site_obs['latitude'] = site_row['lat'].iloc[0]
                site_obs['longitude'] = site_row['lon'].iloc[0]
                site_obs['elevation'] = site_row['elevation'].iloc[0]
                site_obs['station_id'] = site_row['station_id'].iloc[0]
                
                norswe_obs[domain_name] = site_obs
                
        except Exception as e:
            print(f"     ❌ Error loading {domain_name}: {e}")
    
    print(f"\n❄️ NorSWE Observation Summary:")
    print(f"   Sites with observation files: {obs_summary['sites_found']}")
    print(f"   Sites with SWE observations: {obs_summary['sites_with_swe']}")
    print(f"   Sites with depth observations: {obs_summary['sites_with_depth']}")
    print(f"   Total SWE observations: {obs_summary['total_swe_observations']}")
    print(f"   Total depth observations: {obs_summary['total_depth_observations']}")
    
    return norswe_obs, obs_summary
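
# --- Optional sketch: shared pairing and skill-score helpers ---------------
# The comparison function below computes the same metrics twice, once for SWE
# and once for snow depth. One way to factor that out is a helper that pairs
# two series on common daily timestamps and another that returns the scores.
# `pair_daily` and `snow_skill_scores` are hypothetical names; the metric
# definitions mirror those used in create_snow_comparison_analysis.
import numpy as np
import pandas as pd

def pair_daily(sim, obs):
    """Resample both series to daily means and keep days where both are valid."""
    paired = pd.concat(
        [sim.resample('D').mean().rename('sim'), obs.resample('D').mean().rename('obs')],
        axis=1,
    ).dropna()
    return paired['sim'], paired['obs']

def snow_skill_scores(sim, obs):
    """RMSE, bias (sim - obs), MAE, Pearson r and NSE for already-paired series."""
    err = sim - obs
    denom = ((obs - obs.mean()) ** 2).sum()
    return {
        'rmse': float(np.sqrt((err ** 2).mean())),
        'bias': float(err.mean()),
        'mae': float(err.abs().mean()),
        'correlation': float(obs.corr(sim)),
        # NSE is undefined when the observations have zero variance
        'nse': float(1 - (err ** 2).sum() / denom) if denom > 0 else np.nan,
        'n_points': len(obs),
    }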

def create_snow_comparison_analysis(snow_results, norswe_obs):
    """
    Create comprehensive snow comparison analysis between simulated and observed snow
    """
    print(f"\n❄️ Creating Snow Comparison Analysis...")
    
    # Find sites with both simulated and observed data
    common_sites = []
    
    for sim_result in snow_results:
        domain_name = sim_result['domain_name']
        
        if domain_name in norswe_obs:
            # Align time periods for both SWE and snow depth
            comparisons = {}
            
            # SWE comparison
            if 'swe' in sim_result and 'swe_timeseries' in norswe_obs[domain_name]:
                sim_swe = sim_result['swe']
                obs_swe = norswe_obs[domain_name]['swe_timeseries']
                
                # Find common time period
                common_start = max(sim_swe.index.min(), obs_swe.index.min())
                common_end = min(sim_swe.index.max(), obs_swe.index.max())
                
                if common_start < common_end:
                    # Resample to daily and align
                    sim_daily = sim_swe.resample('D').mean().loc[common_start:common_end]
                    obs_daily = obs_swe.resample('D').mean().loc[common_start:common_end]
                    
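                    # Pair on common dates and drop days where either series is missing
                    # (concat aligns the two indices, avoiding unalignable boolean masks)
                    paired = pd.concat([sim_daily.rename('sim'), obs_daily.rename('obs')], axis=1).dropna()
                    sim_valid = paired['sim']
                    obs_valid = paired['obs']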
                    
                    if len(sim_valid) > 30:  # Need minimum data for meaningful comparison
                        
                        # Calculate performance metrics
                        rmse = np.sqrt(((obs_valid - sim_valid) ** 2).mean())
                        bias = (sim_valid - obs_valid).mean()
                        mae = np.abs(obs_valid - sim_valid).mean()
                        
                        # Correlation (an undefined value, e.g. for a constant series, is reported as 0)
                        try:
                            correlation = obs_valid.corr(sim_valid)
                            if pd.isna(correlation):
                                correlation = 0.0
                        except Exception:
                            correlation = 0.0
                        
                        # Nash-Sutcliffe Efficiency
                        if obs_valid.var() > 0:
                            nse = 1 - ((obs_valid - sim_valid) ** 2).sum() / ((obs_valid - obs_valid.mean()) ** 2).sum()
                        else:
                            nse = np.nan
                        
                        comparisons['swe'] = {
                            'sim_data': sim_valid,
                            'obs_data': obs_valid,
                            'rmse': rmse,
                            'bias': bias,
                            'mae': mae,
                            'correlation': correlation,
                            'nse': nse,
                            'n_points': len(sim_valid)
                        }
            
            # Snow depth comparison
            if 'depth' in sim_result and 'depth_timeseries' in norswe_obs[domain_name]:
                sim_depth = sim_result['depth']
                obs_depth = norswe_obs[domain_name]['depth_timeseries']
                
                # Find common time period
                common_start = max(sim_depth.index.min(), obs_depth.index.min())
                common_end = min(sim_depth.index.max(), obs_depth.index.max())
                
                if common_start < common_end:
                    # Resample to daily and align
                    sim_daily = sim_depth.resample('D').mean().loc[common_start:common_end]
                    obs_daily = obs_depth.resample('D').mean().loc[common_start:common_end]
                    
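                    # Pair on common dates and drop days where either series is missing
                    # (concat aligns the two indices, avoiding unalignable boolean masks)
                    paired = pd.concat([sim_daily.rename('sim'), obs_daily.rename('obs')], axis=1).dropna()
                    sim_valid = paired['sim']
                    obs_valid = paired['obs']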
                    
                    if len(sim_valid) > 30:  # Need minimum data for meaningful comparison
                        
                        # Calculate performance metrics
                        rmse = np.sqrt(((obs_valid - sim_valid) ** 2).mean())
                        bias = (sim_valid - obs_valid).mean()
                        mae = np.abs(obs_valid - sim_valid).mean()
                        
                        # Correlation (an undefined value, e.g. for a constant series, is reported as 0)
                        try:
                            correlation = obs_valid.corr(sim_valid)
                            if pd.isna(correlation):
                                correlation = 0.0
                        except Exception:
                            correlation = 0.0
                        
                        # Nash-Sutcliffe Efficiency
                        if obs_valid.var() > 0:
                            nse = 1 - ((obs_valid - sim_valid) ** 2).sum() / ((obs_valid - obs_valid.mean()) ** 2).sum()
                        else:
                            nse = np.nan
                        
                        comparisons['depth'] = {
                            'sim_data': sim_valid,
                            'obs_data': obs_valid,
                            'rmse': rmse,
                            'bias': bias,
                            'mae': mae,
                            'correlation': correlation,
                            'nse': nse,
                            'n_points': len(sim_valid)
                        }
            
            if comparisons:
                common_site = {
                    'domain_name': domain_name,
                    'latitude': sim_result['latitude'],
                    'longitude': sim_result['longitude'],
                    'elevation': sim_result['elevation'],
                    'station_id': sim_result['station_id'],
                    'comparisons': comparisons
                }
                
                common_sites.append(common_site)
                
                # Print summary (point counts are reported per variable)
                comp_summary = []
                for var_type, comp_data in comparisons.items():
                    comp_summary.append(f"{var_type}: r={comp_data['correlation']:.3f}, "
                                        f"RMSE={comp_data['rmse']:.2f}, n={comp_data['n_points']}")
                
                print(f"   ✅ {domain_name}: {', '.join(comp_summary)}")
    
    print(f"\n❄️ Snow Comparison Summary:")
    print(f"   Sites with both sim and obs: {len(common_sites)}")
    
    if len(common_sites) == 0:
        print("   ⚠️  No sites with overlapping sim/obs data for comparison")
        return None
    
    # Create comprehensive snow comparison visualization
    fig, axes = plt.subplots(2, 3, figsize=(20, 12))
    
    # SWE scatter plot (top left)
    ax1 = axes[0, 0]
    
    swe_sites = [site for site in common_sites if 'swe' in site['comparisons']]
    
    if swe_sites:
        all_obs_swe = np.concatenate([site['comparisons']['swe']['obs_data'].values for site in swe_sites])
        all_sim_swe = np.concatenate([site['comparisons']['swe']['sim_data'].values for site in swe_sites])
        
        ax1.scatter(all_obs_swe, all_sim_swe, alpha=0.5, s=15, c='blue')
        
        # 1:1 line
        min_val = min(all_obs_swe.min(), all_sim_swe.min())
        max_val = max(all_obs_swe.max(), all_sim_swe.max())
        ax1.plot([min_val, max_val], [min_val, max_val], 'k--', label='1:1 line')
        
        ax1.set_xlabel('Observed SWE (mm)')
        ax1.set_ylabel('Simulated SWE (mm)')
        ax1.set_title('SWE: Simulated vs Observed')
        ax1.legend()
        ax1.grid(True, alpha=0.3)
        
        # Add overall statistics
        overall_corr = np.corrcoef(all_obs_swe, all_sim_swe)[0,1]
        overall_rmse = np.sqrt(np.mean((all_obs_swe - all_sim_swe)**2))
        overall_bias = np.mean(all_sim_swe - all_obs_swe)
        
        stats_text = f'r = {overall_corr:.3f}\nRMSE = {overall_rmse:.1f}\nBias = {overall_bias:+.1f}'
        ax1.text(0.05, 0.95, stats_text, transform=ax1.transAxes,
                 bbox=dict(facecolor='white', alpha=0.8), fontsize=10, verticalalignment='top')
    
    # Snow depth scatter plot (top middle)
    ax2 = axes[0, 1]
    
    depth_sites = [site for site in common_sites if 'depth' in site['comparisons']]
    
    if depth_sites:
        all_obs_depth = np.concatenate([site['comparisons']['depth']['obs_data'].values for site in depth_sites])
        all_sim_depth = np.concatenate([site['comparisons']['depth']['sim_data'].values for site in depth_sites])
        
        ax2.scatter(all_obs_depth, all_sim_depth, alpha=0.5, s=15, c='purple')
        
        # 1:1 line
        min_val = min(all_obs_depth.min(), all_sim_depth.min())
        max_val = max(all_obs_depth.max(), all_sim_depth.max())
        ax2.plot([min_val, max_val], [min_val, max_val], 'k--', label='1:1 line')
        
        ax2.set_xlabel('Observed Snow Depth (cm)')
        ax2.set_ylabel('Simulated Snow Depth (cm)')
        ax2.set_title('Snow Depth: Simulated vs Observed')
        ax2.legend()
        ax2.grid(True, alpha=0.3)
        
        # Add overall statistics
        overall_corr = np.corrcoef(all_obs_depth, all_sim_depth)[0,1]
        overall_rmse = np.sqrt(np.mean((all_obs_depth - all_sim_depth)**2))
        overall_bias = np.mean(all_sim_depth - all_obs_depth)
        
        stats_text = f'r = {overall_corr:.3f}\nRMSE = {overall_rmse:.1f}\nBias = {overall_bias:+.1f}'
        ax2.text(0.05, 0.95, stats_text, transform=ax2.transAxes,
                 bbox=dict(facecolor='white', alpha=0.8), fontsize=10, verticalalignment='top')
    
    # Performance vs elevation (top right)
    ax3 = axes[0, 2]
    
    elevations = [site['elevation'] for site in common_sites if 'swe' in site['comparisons']]
    swe_correlations = [site['comparisons']['swe']['correlation'] for site in common_sites if 'swe' in site['comparisons']]
    
    if elevations and swe_correlations:
        ax3.scatter(elevations, swe_correlations, alpha=0.7, s=40, c='green')
        ax3.set_xlabel('Elevation (m)')
        ax3.set_ylabel('SWE Correlation')
        ax3.set_title('SWE Performance vs Elevation')
        ax3.grid(True, alpha=0.3)
        ax3.set_ylim(0, 1)
    
    # SWE bias distribution (bottom left)
    ax4 = axes[1, 0]
    
    if swe_sites:
        swe_biases = [site['comparisons']['swe']['bias'] for site in swe_sites]
        ax4.hist(swe_biases, bins=15, color='lightblue', alpha=0.7, edgecolor='black')
        ax4.axvline(x=0, color='red', linestyle='--', label='Zero bias')
        ax4.set_xlabel('SWE Bias (mm)')
        ax4.set_ylabel('Number of Sites')
        ax4.set_title('Distribution of SWE Bias')
        ax4.legend()
        ax4.grid(True, alpha=0.3, axis='y')
    
    # Snow depth bias distribution (bottom middle)
    ax5 = axes[1, 1]
    
    if depth_sites:
        depth_biases = [site['comparisons']['depth']['bias'] for site in depth_sites]
        ax5.hist(depth_biases, bins=15, color='lightcoral', alpha=0.7, edgecolor='black')
        ax5.axvline(x=0, color='red', linestyle='--', label='Zero bias')
        ax5.set_xlabel('Snow Depth Bias (cm)')
        ax5.set_ylabel('Number of Sites')
        ax5.set_title('Distribution of Snow Depth Bias')
        ax5.legend()
        ax5.grid(True, alpha=0.3, axis='y')
    
    # Performance by latitude (bottom right)
    ax6 = axes[1, 2]
    
    latitudes = [site['latitude'] for site in common_sites if 'swe' in site['comparisons']]
    swe_rmses = [site['comparisons']['swe']['rmse'] for site in common_sites if 'swe' in site['comparisons']]
    
    if latitudes and swe_rmses:
        ax6.scatter(latitudes, swe_rmses, alpha=0.7, s=40, c='orange')
        ax6.set_xlabel('Latitude')
        ax6.set_ylabel('SWE RMSE (mm)')
        ax6.set_title('SWE Performance vs Latitude')
        ax6.grid(True, alpha=0.3)
    
    plt.suptitle('NorSWE Large Sample Snow Comparison Analysis', 
                 fontsize=16, fontweight='bold')
    plt.tight_layout()
    
    # Save comparison plot
    comparison_path = experiment_dir / 'plots' / 'snow_comparison_analysis.png'
    plt.savefig(comparison_path, dpi=300, bbox_inches='tight')
    plt.show()
    
    print(f"✅ Snow comparison analysis saved: {comparison_path}")
    
    # Create spatial performance map
    fig, axes = plt.subplots(1, 2, figsize=(20, 8))
    
    # Map 1: SWE correlation map
    ax1 = axes[0]
    
    lats = [site['latitude'] for site in common_sites if 'swe' in site['comparisons']]
    lons = [site['longitude'] for site in common_sites if 'swe' in site['comparisons']]
    corrs = [site['comparisons']['swe']['correlation'] for site in common_sites if 'swe' in site['comparisons']]
    
    if lats and lons and corrs:
        scatter1 = ax1.scatter(lons, lats, c=corrs, cmap='RdYlBu', s=80, 
                              vmin=0, vmax=1, edgecolors='black', linewidth=0.5)
        
        ax1.set_xlabel('Longitude')
        ax1.set_ylabel('Latitude')
        ax1.set_title('Snow Model Performance: SWE Correlation')
        ax1.grid(True, alpha=0.3)
        ax1.set_xlim(-180, 180)
        ax1.set_ylim(30, 85)  # Northern Hemisphere focus
        
        # Add colorbar
        cbar1 = plt.colorbar(scatter1, ax=ax1)
        cbar1.set_label('SWE Correlation')
    
    # Map 2: SWE bias map
    ax2 = axes[1]
    
    biases = [site['comparisons']['swe']['bias'] for site in common_sites if 'swe' in site['comparisons']]
    
    if lats and lons and biases:
        # Symmetric limits keep zero bias at the centre of the diverging colormap
        max_abs_bias = float(np.nanmax(np.abs(biases)))
        
        scatter2 = ax2.scatter(lons, lats, c=biases, cmap='RdBu_r', s=80,
                              vmin=-max_abs_bias, vmax=max_abs_bias, 
                              edgecolors='black', linewidth=0.5)
        
        ax2.set_xlabel('Longitude')
        ax2.set_ylabel('Latitude')
        ax2.set_title('Snow Model Performance: SWE Bias (Sim - Obs)')
        ax2.grid(True, alpha=0.3)
        ax2.set_xlim(-180, 180)
        ax2.set_ylim(30, 85)
        
        # Add colorbar
        cbar2 = plt.colorbar(scatter2, ax=ax2)
        cbar2.set_label('SWE Bias (mm)')
    
    plt.suptitle('NorSWE Large Sample Snow Performance - Spatial Distribution', 
                 fontsize=16, fontweight='bold')
    plt.tight_layout()
    
    # Save spatial analysis
    spatial_path = experiment_dir / 'plots' / 'snow_spatial_performance.png'
    plt.savefig(spatial_path, dpi=300, bbox_inches='tight')
    plt.show()
    
    print(f"✅ Snow spatial performance map saved: {spatial_path}")
    
    return common_sites
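
The pooled r, RMSE and bias in `create_snow_comparison_analysis` are computed with `np.corrcoef` and plain means. A single `nan` in either series turns all three statistics into `nan`. A constant series, such as an all-zero snow-free record, leaves the correlation undefined. If your extracted series can contain gaps, a masking helper along these lines is one option. This is a minimal sketch: `snow_metrics` is a hypothetical helper, not part of CONFLUENCE. It assumes the observed and simulated values are already aligned in time, and it keeps the tutorial's sim - obs bias convention.

```python
import numpy as np


def snow_metrics(obs, sim):
    """Return (r, rmse, bias) for paired observed/simulated values.

    Hypothetical helper: pairs where either value is NaN are dropped first.
    Bias follows the tutorial's convention, mean(sim - obs).
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)

    # Keep only time steps where both series have a finite value
    valid = np.isfinite(obs) & np.isfinite(sim)
    obs, sim = obs[valid], sim[valid]
    if obs.size < 2:
        # Too few valid pairs to say anything useful
        return np.nan, np.nan, np.nan

    diff = sim - obs
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    bias = float(np.mean(diff))

    # Pearson r is undefined when either series is constant (e.g. all-zero SWE)
    if np.std(obs) == 0 or np.std(sim) == 0:
        r = np.nan
    else:
        r = float(np.corrcoef(obs, sim)[0, 1])
    return r, rmse, bias


# Illustrative values (mm SWE), with one missing observation
obs = np.array([0.0, 50.0, 120.0, np.nan, 80.0])
sim = np.array([5.0, 45.0, 110.0, 60.0, 90.0])
r, rmse, bias = snow_metrics(obs, sim)
print(f"r = {r:.3f}, RMSE = {rmse:.1f} mm, bias = {bias:+.1f} mm")
```

Applying it to the concatenated `all_obs_swe` and `all_sim_swe` arrays would produce pooled statistics over valid pairs only.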

# Execute Step 3 Analysis
print("\n🔍 Step 3.1: Snow Domain Discovery and Overview")

# Discover completed domains
completed_domains = discover_completed_snow_domains()

# Create domain overview map
total_selected, total_discovered, total_with_results, total_with_obs, total_complete = create_snow_domain_overview_map(completed_domains)

print("\n❄️ Step 3.2: Snow Results Extraction")

# Extract snow results from simulations
snow_results, snow_processing_summary = extract_snow_results_from_domains(completed_domains)

# Load NorSWE observations
norswe_obs, obs_summary = load_norswe_observations(completed_domains)

print("\n❄️ Step 3.3: Snow Comparison Analysis")

# Create snow comparison analysis
if snow_results and norswe_obs:
    common_sites = create_snow_comparison_analysis(snow_results, norswe_obs)
else:
    print("   ⚠️  Insufficient data for snow comparison analysis")
    common_sites = None

# Create final summary report
print("\n📋 Creating Final NorSWE Snow Study Summary Report...")

summary_report_path = experiment_dir / 'reports' / 'norswe_final_report.txt'
summary_report_path.parent.mkdir(parents=True, exist_ok=True)

with open(summary_report_path, 'w') as f:
    f.write("NorSWE Large Sample Snow Study - Final Analysis Report\n")
    f.write("=" * 58 + "\n\n")
    f.write(f"Generated: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n\n")
    
    f.write("PROCESSING SUMMARY:\n")
    f.write(f"  Sites selected for analysis: {total_selected}\n")
    f.write(f"  Processing initiated: {total_discovered}\n")
    f.write(f"  Simulation results available: {total_with_results}\n")
    f.write(f"  Observations extracted: {total_with_obs}\n")
    f.write(f"  Complete snow validation: {total_complete}\n")
    f.write(f"  Snow extractions successful: {snow_processing_summary['domains_with_snow']}\n")
    f.write(f"  Sites with NorSWE SWE observations: {obs_summary['sites_with_swe']}\n")
    
    if common_sites:
        f.write(f"  Sites with sim/obs comparison: {len(common_sites)}\n\n")
        
        # SWE performance summary
        swe_sites = [site for site in common_sites if 'swe' in site['comparisons']]
        if swe_sites:
            swe_correlations = [site['comparisons']['swe']['correlation'] for site in swe_sites]
            swe_rmses = [site['comparisons']['swe']['rmse'] for site in swe_sites]
            swe_biases = [site['comparisons']['swe']['bias'] for site in swe_sites]
            
            f.write("SWE PERFORMANCE SUMMARY:\n")
            f.write(f"  Mean correlation: {np.mean(swe_correlations):.3f} ± {np.std(swe_correlations):.3f}\n")
            f.write(f"  Mean RMSE: {np.mean(swe_rmses):.1f} ± {np.std(swe_rmses):.1f} mm\n")
            f.write(f"  Mean bias: {np.mean(swe_biases):+.1f} ± {np.std(swe_biases):.1f} mm\n\n")
        
        # Snow depth performance summary
        depth_sites = [site for site in common_sites if 'depth' in site['comparisons']]
        if depth_sites:
            depth_correlations = [site['comparisons']['depth']['correlation'] for site in depth_sites]
            depth_rmses = [site['comparisons']['depth']['rmse'] for site in depth_sites]
            depth_biases = [site['comparisons']['depth']['bias'] for site in depth_sites]
            
            f.write("SNOW DEPTH PERFORMANCE SUMMARY:\n")
            f.write(f"  Mean correlation: {np.mean(depth_correlations):.3f} ± {np.std(depth_correlations):.3f}\n")
            f.write(f"  Mean RMSE: {np.mean(depth_rmses):.1f} ± {np.std(depth_rmses):.1f} cm\n")
            f.write(f"  Mean bias: {np.mean(depth_biases):+.1f} ± {np.std(depth_biases):.1f} cm\n\n")
        
        # Only write the ranking section when SWE comparisons exist
        if swe_sites:
            f.write("BEST PERFORMING SITES (by SWE correlation):\n")
            sorted_sites = sorted(swe_sites, key=lambda x: x['comparisons']['swe']['correlation'], reverse=True)
            for i, site in enumerate(sorted_sites[:5], start=1):
                f.write(f"  {i}. {site['domain_name']}: r={site['comparisons']['swe']['correlation']:.3f}, RMSE={site['comparisons']['swe']['rmse']:.1f} mm\n")

print(f"✅ Final summary report saved: {summary_report_path}")
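
The report's mean ± std lines use `np.mean` and `np.std`, which return `nan` if any site's correlation is `nan`. `sorted()` can also return an arbitrary order when `nan` keys are mixed in, because every comparison with `nan` is false. A NaN-tolerant variant of the summary and top-N ranking might look like the sketch below. It assumes the `common_sites` structure used above: `'domain_name'` plus `site['comparisons'][variable]` holding `'correlation'`, `'rmse'` and `'bias'`. `summarize_site_metrics` is a hypothetical helper, and the site records in the example are made up.

```python
import math

import numpy as np


def summarize_site_metrics(sites, variable="swe", top_n=5):
    """Mean/std of per-site metrics plus a top-N ranking by correlation.

    Hypothetical helper. Assumes each site is a dict with 'domain_name' and
    site['comparisons'][variable] = {'correlation': ..., 'rmse': ..., 'bias': ...}.
    """
    rows = [s for s in sites if variable in s["comparisons"]]

    summary = {}
    for key in ("correlation", "rmse", "bias"):
        values = np.array([s["comparisons"][variable][key] for s in rows], dtype=float)
        if np.isfinite(values).any():
            # nanmean/nanstd skip sites where this metric could not be computed
            summary[key] = (float(np.nanmean(values)), float(np.nanstd(values)))
        else:
            summary[key] = (math.nan, math.nan)

    # Drop NaN correlations first so the ranking order is well defined
    ranked = sorted(
        (s for s in rows if not math.isnan(s["comparisons"][variable]["correlation"])),
        key=lambda s: s["comparisons"][variable]["correlation"],
        reverse=True,
    )
    top = [(s["domain_name"], s["comparisons"][variable]["correlation"]) for s in ranked[:top_n]]
    return summary, top


# Made-up site records for illustration only
sites = [
    {"domain_name": "site_a", "comparisons": {"swe": {"correlation": 0.91, "rmse": 35.0, "bias": -12.0}}},
    {"domain_name": "site_b", "comparisons": {"swe": {"correlation": float("nan"), "rmse": 60.0, "bias": 20.0}}},
    {"domain_name": "site_c", "comparisons": {"swe": {"correlation": 0.78, "rmse": 48.0, "bias": 4.0}}},
    {"domain_name": "site_d", "comparisons": {"depth": {"correlation": 0.85, "rmse": 12.0, "bias": 1.5}}},
]
summary, top = summarize_site_metrics(sites, "swe", top_n=2)
for name, (mean, std) in summary.items():
    print(f"{name}: {mean:.3f} ± {std:.3f}")
for rank, (domain, r) in enumerate(top, start=1):
    print(f"{rank}. {domain}: r={r:.3f}")
```

The same call with `variable="depth"` covers the snow depth summary.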

print("\n🎉 Step 3 Complete: NorSWE Snow Validation Analysis")
print(f"   📁 Results saved to: {experiment_dir}")
print(f"   ❄️ Snow domain overview: {total_complete}/{total_selected} sites with complete validation")

if common_sites:
    swe_sites = [site for site in common_sites if 'swe' in site['comparisons']]
    depth_sites = [site for site in common_sites if 'depth' in site['comparisons']]
    
    if swe_sites:
        swe_correlations = [site['comparisons']['swe']['correlation'] for site in swe_sites]
        print(f"   📊 SWE analysis: {len(swe_sites)} sites with sim/obs comparison")
        print(f"   📈 SWE performance: Mean r = {np.mean(swe_correlations):.3f}")
    
    if depth_sites:
        depth_correlations = [site['comparisons']['depth']['correlation'] for site in depth_sites]
        print(f"   📏 Depth analysis: {len(depth_sites)} sites with sim/obs comparison")
        print(f"   📈 Depth performance: Mean r = {np.mean(depth_correlations):.3f}")
else:
    print("   📈 Performance: Awaiting more simulation results")

print("\n✅ Large Sample NorSWE Snow Analysis Complete!")
print("   ❄️ Multi-site snow hydrology validation achieved")
print("   📊 Performance examined across elevation and latitude gradients")
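
The elevation scatter plot shows the relationship between SWE correlation and elevation visually. Grouping sites into elevation bands is one simple way to put numbers on that relationship. The sketch below assumes you pass lists like the `elevations` and `swe_correlations` built in `create_snow_comparison_analysis`. `performance_by_band` is a hypothetical helper, and the band edges and values in the example are illustrative only, not NorSWE results.

```python
import numpy as np


def performance_by_band(elevations, scores, edges):
    """Group sites into elevation bands and average a score per band.

    Hypothetical helper. `edges` are increasing band boundaries in metres;
    sites below edges[0] or at/above edges[-1] are left out.
    Returns a list of (lower, upper, n_sites, mean_score) tuples.
    """
    elevations = np.asarray(elevations, dtype=float)
    scores = np.asarray(scores, dtype=float)

    # np.digitize gives i such that edges[i-1] <= elevation < edges[i]
    band = np.digitize(elevations, edges)

    result = []
    for i in range(1, len(edges)):
        in_band = (band == i) & np.isfinite(scores)
        n_sites = int(in_band.sum())
        mean_score = float(scores[in_band].mean()) if n_sites else float("nan")
        result.append((edges[i - 1], edges[i], n_sites, mean_score))
    return result


# Illustrative values only: site elevations (m) and SWE correlations
elevations = [150, 420, 800, 950, 1600, 2300]
swe_correlations = [0.92, 0.88, 0.81, 0.85, 0.70, 0.60]
for lower, upper, n_sites, mean_r in performance_by_band(elevations, swe_correlations, [0, 500, 1000, 2000]):
    print(f"{lower:>4}-{upper:<4} m  n={n_sites}  mean r={mean_r:.2f}")
```

The same call works for latitude bands, or for RMSE instead of correlation. With only a handful of sites per band the means are noisy, so report `n_sites` alongside them.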

**Ready to explore large sample basin simulations?** → **[Tutorial 04c: Large Sample Studies - CAMELS-Spat](./04c_large_sample_camelsspat.ipynb)**