# CONFLUENCE Tutorial - 9: NorSWE Large Sample Study (Snow Observation Network)

## Introduction

This tutorial extends our large sample approach to snow hydrology validation using the NorSWE (Northern Hemisphere Snow Water Equivalent) dataset. Building on the multi-site analysis framework demonstrated with FLUXNET, we now apply CONFLUENCE to systematically evaluate snow modeling performance across a network of snow observation stations throughout the Northern Hemisphere.

### NorSWE: A Critical Snow Observation Network

The NorSWE dataset represents one of the most comprehensive collections of snow observations available for hydrological model validation:

**Spatial Coverage**:
- **Northern Hemisphere focus**: Stations across snow-dominated regions
- **Nordic emphasis**: Dense coverage in Scandinavia and Finland
- **Elevation gradients**: From coastal lowlands to high mountain regions
- **Climate diversity**: Maritime, continental, and Arctic snow climates

**Observational Richness**:
- **Snow Water Equivalent (SWE)**: Direct measurements of snow mass
- **Snow Depth**: Complementary snow pack structure information
- **Long-term records**: Multi-decade observations at many sites
- **Quality control**: Standardized measurement protocols and data processing

### Scientific Importance of Snow Validation

Snow processes represent some of the most challenging aspects of hydrological modeling:

**Physical Complexity**:
- **Phase transitions**: Freezing, melting, and sublimation processes
- **Energy balance**: Complex interactions between radiation, temperature, and wind
- **Layered structure**: Metamorphism and density changes within the snowpack
- **Spatial variability**: Strong elevation and aspect dependencies

**Hydrological Significance**:
- **Seasonal storage**: Snow acts as a natural reservoir in many regions
- **Timing control**: Snowmelt timing affects peak flows and water availability
- **Climate sensitivity**: Snow processes are highly sensitive to temperature changes
- **Extreme events**: Snow-rain transitions and rain-on-snow events

### Why NorSWE for Large Sample Snow Studies?

NorSWE provides unique advantages for systematic snow model evaluation:

1. **Process Focus**: Dedicated snow observations rather than mixed-variable datasets
2. **Measurement Quality**: Direct SWE measurements provide unambiguous validation targets
3. **Environmental Gradients**: Sites span elevation, latitude, and climate gradients
4. **Seasonal Dynamics**: Full seasonal cycle observations capture accumulation and ablation
5. **Nordic Expertise**: Stations operated by countries with world-leading snow science

### Research Questions for Snow Modeling

Large sample studies with NorSWE enable investigation of critical snow science questions:

1. **Model Physics**: How well do different snow process representations perform across environments?
2. **Climate Controls**: Which meteorological variables most strongly control snow accumulation and melt?
3. **Elevation Effects**: How do snow processes change with elevation, and how well do models represent those changes?
4. **Regional Patterns**: Are there systematic regional biases in snow modeling?
5. **Seasonal Dynamics**: Can models capture both accumulation and ablation processes accurately?

### Unique Challenges of Snow Modeling

Snow modeling presents distinct challenges compared to other hydrological processes:

**Meteorological Sensitivity**:
- **Temperature thresholds**: Critical temperature for rain-snow transitions
- **Radiation balance**: Complex interactions between shortwave and longwave radiation
- **Wind effects**: Redistribution and sublimation processes
- **Humidity control**: Sublimation rates and surface energy balance

**Temporal Dynamics**:
- **Seasonal cycle**: Distinct accumulation and ablation seasons
- **Diurnal variation**: Strong daily cycles in energy balance
- **Event-based processes**: Individual storm impacts on snowpack
- **Intermittency**: Episodic accumulation and melt events

### CONFLUENCE's Snow Modeling Capabilities

CONFLUENCE's integration with SUMMA provides sophisticated snow modeling capabilities:

**Advanced Snow Physics**:
- **Multi-layer snowpack**: Explicit representation of snow stratigraphy
- **Energy balance**: Detailed surface energy balance calculations
- **Metamorphism**: Snow density and thermal property evolution
- **Liquid water**: Representation of liquid water flow through snow

**Flexible Parameterizations**:
- **Multiple options**: Different approaches for key snow processes
- **Sensitivity analysis**: Test different process representations
- **Decision analysis**: Compare alternative model structures
- **Uncertainty quantification**: Assess parameter and structural uncertainty

### NorSWE vs. FLUXNET: Complementary Approaches

While FLUXNET focused on energy balance validation, NorSWE provides complementary insights:

| Aspect | FLUXNET | NorSWE |
|--------|---------|--------|
| **Focus** | Energy/carbon fluxes | Snow mass/depth |
| **Process** | Continuous processes | Seasonal accumulation |
| **Validation** | Flux measurements | State variables |
| **Complexity** | Ecosystem interactions | Phase change physics |
| **Temporal** | Year-round | Seasonal focus |

### Expected Outcomes

This tutorial demonstrates several key capabilities for snow-focused large sample studies:

1. **Snow-Specific Configuration**: Adapt CONFLUENCE configurations for snow observation sites
2. **Seasonal Analysis**: Focus on snow accumulation and ablation periods
3. **Multi-Variable Validation**: Compare both SWE and snow depth simulations
4. **Elevation Analysis**: Examine how model performance varies with elevation
5. **Climate Sensitivity**: Assess model performance across different snow climates

### Methodological Considerations

Snow-focused large sample studies require specific methodological approaches:

**Site Selection**:
- **Elevation gradients**: Represent different snow accumulation zones
- **Climate diversity**: Include maritime, continental, and Arctic sites
- **Data quality**: Ensure reliable SWE and snow depth measurements
- **Temporal coverage**: Adequate seasonal cycle representation

**Analysis Approaches**:
- **Seasonal statistics**: Focus on peak SWE, melt timing, and duration
- **Process evaluation**: Assess accumulation vs. ablation performance
- **Threshold analysis**: Evaluate temperature and precipitation thresholds
- **Extreme events**: Analyze performance during unusual snow years
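
The seasonal statistics listed above (peak SWE, melt timing, and duration) can be sketched for a single daily SWE series as follows. This is a minimal illustration, not part of CONFLUENCE: the function name `seasonal_snow_stats`, the 1 kg/m² snow-cover threshold, and the synthetic season are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def seasonal_snow_stats(swe, snow_threshold=1.0):
    """Summarise one snow season from a daily SWE series (kg/m²).

    `snow_threshold` is an illustrative cutoff for counting a day as snow-covered.
    """
    swe = swe.dropna()
    peak_date = swe.idxmax()
    covered = swe[swe > snow_threshold]
    # Melt-out: first day from the peak onward with SWE at or below the threshold
    after_peak = swe.loc[peak_date:]
    melted = after_peak[after_peak <= snow_threshold]
    return {
        'peak_swe': float(swe.max()),
        'peak_date': peak_date,
        'snow_days': int(len(covered)),
        'melt_out_date': melted.index[0] if len(melted) else None,
    }


# Synthetic season: linear accumulation to 200 kg/m², linear melt, then snow-free
days = pd.date_range('2015-10-01', periods=181, freq='D')
values = np.concatenate([
    np.linspace(0.0, 200.0, 101),   # accumulation
    np.linspace(196.0, 0.0, 50),    # ablation
    np.zeros(30),                   # snow-free period
])
season = pd.Series(values, index=days)
stats = seasonal_snow_stats(season)
print(stats)
```

Applying the same function to both observed and simulated series gives paired values (for example, peak SWE error or melt-out date offset) that can be compared across sites.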

### Tutorial Structure

This tutorial follows the established large sample framework while emphasizing snow-specific aspects:

1. **NorSWE Site Selection**: Choose representative sites across snow environments
2. **Snow-Focused Configuration**: Adapt CONFLUENCE for snow observation validation
3. **Seasonal Analysis Setup**: Configure for snow season evaluation
4. **Batch Processing**: Execute CONFLUENCE across multiple snow sites
5. **Snow-Specific Results**: Collect and analyze SWE and snow depth outputs
6. **Elevation Analysis**: Examine performance across elevation gradients
7. **Climate Comparison**: Compare results across different snow climates

### Scientific Impact

NorSWE large sample studies contribute to advancing snow science:

- **Model Validation**: Systematic evaluation of snow process representations
- **Process Understanding**: Identify key controls on snow accumulation and melt
- **Climate Applications**: Improve projections of snow under changing climate
- **Operational Applications**: Enhance seasonal forecasting and water management
- **Uncertainty Assessment**: Quantify reliability of snow predictions

### Building on Previous Tutorials

This tutorial leverages all the skills developed throughout the CONFLUENCE series:

- **Point-scale understanding**: Foundation in vertical snow processes
- **Workflow automation**: Efficient multi-site processing
- **Configuration management**: Template-based site setup
- **Results analysis**: Statistical evaluation of multi-site results
- **Visualization**: Clear presentation of spatial and temporal patterns

By applying these skills to snow-focused validation, you'll gain expertise in one of the most challenging aspects of hydrological modeling while contributing to improved understanding of snow processes across diverse Northern Hemisphere environments.

The combination of CONFLUENCE's sophisticated snow modeling capabilities with NorSWE's comprehensive observation network provides a powerful framework for advancing snow science through systematic, large sample analysis.

## 1. Setup and Imports

In [None]:
import sys
import os
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import subprocess
import yaml
from datetime import datetime
import xarray as xr
import seaborn as sns

# Add CONFLUENCE to path
confluence_path = Path('../').resolve()
sys.path.append(str(confluence_path))

# Set up plotting style
plt.style.use('default')
sns.set_palette("coolwarm")
%matplotlib inline

print("Setup complete!")

## 2. Configure the Experiment

In [None]:
# Configuration for the NorSWE large sample experiment
experiment_config = {
    'experiment_name': 'norswe_tutorial',
    'norswe_path': '/work/comphyd_lab/data/geospatial-data/NorSWE/NorSWE-NorEEN_1979-2021_v2.nc',
    'template_config': '../CONFLUENCE/0_config_files/config_norswe_template.yaml',
    'config_dir': '../CONFLUENCE/0_config_files/norswe',
    'output_dir': './norswe_output',
    'base_path': '/work/comphyd_lab/data/CONFLUENCE_data/norswe',
    'min_completeness': 0.0,  # Minimum data completeness in percent (0.0 applies no completeness threshold)
    'max_stations': 10,  # Number of stations to process
    'start_year': 2010,  # Optional: filter data by year
    'end_year': 2020,
    'no_submit': False  # Set to True for dry run
}

# Create directories
experiment_dir = Path(f"./experiments/{experiment_config['experiment_name']}")
experiment_dir.mkdir(parents=True, exist_ok=True)
Path(experiment_config['output_dir']).mkdir(parents=True, exist_ok=True)
Path(experiment_config['config_dir']).mkdir(parents=True, exist_ok=True)

# Save configuration
with open(experiment_dir / 'experiment_config.yaml', 'w') as f:
    yaml.dump(experiment_config, f)

print(f"Experiment configured: {experiment_config['experiment_name']}")
print(f"Processing up to {experiment_config['max_stations']} NorSWE stations")
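
Before launching a batch of runs, it can help to sanity-check the experiment settings. The sketch below is an illustrative helper, not part of CONFLUENCE: the function name `validate_experiment_config` and the specific checks are assumptions based on the keys used in the configuration above.

```python
def validate_experiment_config(cfg):
    """Return a list of human-readable problems found in the experiment settings."""
    problems = []
    if not 0.0 <= cfg['min_completeness'] <= 100.0:
        problems.append('min_completeness must be a percentage between 0 and 100')
    if cfg['max_stations'] < 1:
        problems.append('max_stations must be at least 1')
    start, end = cfg.get('start_year'), cfg.get('end_year')
    if start is not None and end is not None and start > end:
        problems.append('start_year must not be later than end_year')
    return problems


# Example with the year range accidentally reversed
example = {'min_completeness': 0.0, 'max_stations': 10,
           'start_year': 2020, 'end_year': 2010}
issues = validate_experiment_config(example)
print(issues)
```

An empty list means none of these checks failed; any other result can be printed or raised before directories are created and jobs are submitted.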

## 3. Explore NorSWE Dataset

In [None]:
'''
# Open NorSWE dataset
ds = xr.open_dataset(experiment_config['norswe_path'])

print("NorSWE Dataset Information:")
print(f"Time range: {ds.time.values[0]} to {ds.time.values[-1]}")
print(f"Number of stations: {len(ds.station_id)}")
print(f"Variables: {list(ds.data_vars)}")
print(f"Coordinates: {list(ds.coords)}")

# Display dataset structure
print("\nDataset structure:")
print(ds)

ds.close()
'''

## 4. Process NorSWE Station Data

In [None]:
# Import the processing function from the script
sys.path.append(str(confluence_path / '9_scripts'))
from run_sites_norswe import process_norswe_data

# Load the pre-processed station table from CSV
# (to regenerate it from the NorSWE file, use the commented call below)
stations_csv = Path('norswe_stations.csv')
stations_df = pd.read_csv(stations_csv)

'''
stations_df = process_norswe_data(
    experiment_config['norswe_path'],
    str(stations_csv),
    start_year=experiment_config.get('start_year'),
    end_year=experiment_config.get('end_year'),
    use_existing_csv=True
)
'''

print(f"Processed {len(stations_df)} stations")
print("\nStation data columns:")
for col in stations_df.columns:
    print(f"  - {col}")

# Display sample stations
print("\nSample stations:")
display(stations_df[['station_id', 'station_name', 'lat', 'lon', 'elevation', 'swq_completeness']].head())
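
The `swq_completeness` and `snd_completeness` columns are used later as percentages. How `process_norswe_data` computes them is not shown here; one plausible definition, sketched below under that assumption, is the share of non-missing values in a station's record. The helper name `percent_complete` and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd


def percent_complete(series):
    """Share of non-missing values in a series, in percent."""
    if len(series) == 0:
        return 0.0
    return 100.0 * series.notna().sum() / len(series)


# Synthetic daily SWE record with 3 of 10 values missing
swe = pd.Series([10.0, 12.0, np.nan, 15.0, np.nan, 20.0, 22.0, np.nan, 25.0, 30.0])
completeness = percent_complete(swe)
print(f"{completeness:.1f}%")
```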


## 5. Visualize Station Distribution

In [None]:
# Create station distribution map
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 8))

# Geographic distribution
scatter = ax1.scatter(stations_df['lon'], stations_df['lat'], 
                     c=stations_df['elevation'], cmap='terrain',
                     s=50, alpha=0.7, edgecolors='black', linewidth=0.5)
ax1.set_title('NorSWE Station Locations', fontsize=16, fontweight='bold')
ax1.set_xlabel('Longitude', fontsize=12)
ax1.set_ylabel('Latitude', fontsize=12)
ax1.grid(True, alpha=0.3)

# Add colorbar for elevation
cbar = plt.colorbar(scatter, ax=ax1)
cbar.set_label('Elevation (m)', fontsize=12)

# Elevation distribution
ax2.hist(stations_df['elevation'], bins=20, color='skyblue', edgecolor='black', alpha=0.7)
ax2.set_xlabel('Elevation (m)', fontsize=12)
ax2.set_ylabel('Number of Stations', fontsize=12)
ax2.set_title('Elevation Distribution', fontsize=14)
ax2.grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()

# Data completeness distribution
fig, ax = plt.subplots(figsize=(10, 6))
ax.hist(stations_df['swq_completeness'], bins=20, color='lightblue', 
        edgecolor='black', alpha=0.7, label='SWE')
ax.hist(stations_df['snd_completeness'], bins=20, color='lightcoral', 
        edgecolor='black', alpha=0.5, label='Snow Depth')
ax.set_xlabel('Data Completeness (%)', fontsize=12)
ax.set_ylabel('Number of Stations', fontsize=12)
ax.set_title('Data Completeness Distribution', fontsize=14)
ax.legend()
ax.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()

## 6. Select Stations for Processing

In [None]:
# Filter stations by data completeness
complete_stations = stations_df[
    (stations_df['swq_completeness'] >= experiment_config['min_completeness']) &
    (stations_df['snd_completeness'] >= experiment_config['min_completeness'])
].copy()

print(f"Found {len(complete_stations)} stations with ≥{experiment_config['min_completeness']}% completeness")

# Select stations (ranked by data completeness)
if len(complete_stations) > experiment_config['max_stations']:
    # Keep the stations with the most complete SWE and snow depth records
    complete_stations = complete_stations.sort_values(
        by=['swq_completeness', 'snd_completeness'], 
        ascending=False
    ).head(experiment_config['max_stations'])

print(f"\nSelected {len(complete_stations)} stations for processing")
display(complete_stations[['station_id', 'station_name', 'elevation', 'swq_completeness']])

# Save selected stations
complete_stations.to_csv(experiment_dir / 'selected_stations.csv', index=False)
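
The selection above ranks stations by completeness only. If elevation diversity matters for the experiment, one option is to pick the most complete station from each elevation band. The sketch below is an assumption-laden illustration: the helper `select_by_elevation_band` is hypothetical, the bands are quantile-based via `pd.qcut`, and the station values are made up.

```python
import pandas as pd


def select_by_elevation_band(df, n_bands, per_band=1):
    """Pick the most complete station(s) from each elevation quantile band."""
    banded = df.copy()
    # qcut splits stations into bands holding roughly equal numbers of stations
    banded['elev_band'] = pd.qcut(banded['elevation'], q=n_bands,
                                  labels=False, duplicates='drop')
    ranked = banded.sort_values('swq_completeness', ascending=False)
    picked = ranked.groupby('elev_band').head(per_band)
    return picked.sort_values('elevation').drop(columns='elev_band')


# Synthetic station table (values are made up for illustration)
demo = pd.DataFrame({
    'station_id': ['A', 'B', 'C', 'D', 'E', 'F'],
    'elevation': [50, 120, 400, 650, 1100, 1500],
    'swq_completeness': [90.0, 95.0, 60.0, 80.0, 99.0, 70.0],
})
selected = select_by_elevation_band(demo, n_bands=3)
print(selected['station_id'].tolist())
```

Quantile bands keep the number of candidates per band similar; fixed elevation intervals would be an alternative when specific accumulation zones are of interest.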

## 7. Extract Snow Observation Data

In [None]:
'''
# Example: Extract and plot snow data for one station
from run_sites_norswe import extract_snow_data

# Select first station as example
example_station = complete_stations.iloc[0]
station_id = example_station['station_id']
station_name = example_station['Watershed_Name']

# Create output directory for this station
station_dir = Path(experiment_config['base_path']) / f"domain_{station_name}"
station_dir.mkdir(parents=True, exist_ok=True)

# Extract snow data
swe_file, snd_file = extract_snow_data(
    experiment_config['norswe_path'],
    station_id,
    str(station_dir),
    start_year=experiment_config.get('start_year'),
    end_year=experiment_config.get('end_year')
)

# Load and visualize the extracted data
swe_df = pd.read_csv(swe_file)
snd_df = pd.read_csv(snd_file)

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8), sharex=True)

# Plot SWE
ax1.plot(pd.to_datetime(swe_df['time']), swe_df['SWE_kg_m2'], 
         color='blue', linewidth=1.5, alpha=0.7)
ax1.set_ylabel('SWE (kg/m²)', fontsize=12)
ax1.set_title(f'Snow Observations - {station_name} (ID: {station_id})', fontsize=14)
ax1.grid(True, alpha=0.3)

# Plot snow depth
ax2.plot(pd.to_datetime(snd_df['time']), snd_df['Depth_m'] * 100, 
         color='purple', linewidth=1.5, alpha=0.7)
ax2.set_ylabel('Snow Depth (cm)', fontsize=12)
ax2.set_xlabel('Date', fontsize=12)
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"Extracted snow data for {station_name}:")
print(f"  SWE file: {swe_file}")
print(f"  Snow depth file: {snd_file}")
'''
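
Because both SWE and snow depth are extracted, the two can be cross-checked through bulk snow density (SWE in kg/m² divided by depth in m gives kg/m³). The sketch below reuses the `SWE_kg_m2` and `Depth_m` column names from the cell above; the helper name, the 0.05 m minimum depth, and the 50 to 600 kg/m³ plausibility bounds are illustrative assumptions, not NorSWE quality-control rules.

```python
import numpy as np
import pandas as pd


def bulk_snow_density(swe_kg_m2, depth_m, min_depth_m=0.05):
    """Bulk density (kg/m³) = SWE (kg/m²) / depth (m); NaN where the snowpack is too shallow."""
    depth = depth_m.where(depth_m >= min_depth_m)
    return swe_kg_m2 / depth


# Synthetic paired observations (made-up values)
obs = pd.DataFrame({
    'SWE_kg_m2': [0.0, 60.0, 150.0, 90.0],
    'Depth_m': [0.0, 0.30, 0.50, 0.10],
})
obs['density'] = bulk_snow_density(obs['SWE_kg_m2'], obs['Depth_m'])
# Flag values outside a loose plausibility range (bounds are illustrative)
obs['suspect'] = (obs['density'] < 50) | (obs['density'] > 600)
print(obs)
```

Rows flagged as suspect are candidates for manual inspection rather than automatic removal, since very dense or very light snow does occur.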

## 8. Generate Configuration Files

In [None]:
'''
# Generate configs for selected stations
from run_sites_norswe import generate_config_file

config_dir = Path(experiment_config['config_dir'])
generated_configs = []

for _, station in complete_stations.iterrows():
    station_name = station['Watershed_Name']
    pour_point = station['POUR_POINT_COORDS']
    bounding_box = station['BOUNDING_BOX_COORDS']
    
    # Generate config file
    config_path = config_dir / f"config_{station_name}.yaml"
    
    generate_config_file(
        experiment_config['template_config'],
        str(config_path),
        station_name,
        pour_point,
        bounding_box
    )
    
    generated_configs.append(config_path)

print(f"Generated {len(generated_configs)} configuration files")
print("\nExample config locations:")
for config in generated_configs[:3]:
    print(f"  - {config}")
'''
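
The implementation of `generate_config_file` lives in `run_sites_norswe` and is not shown in this tutorial. As a rough idea of what template-based configuration can look like, the sketch below rewrites selected `KEY: value` lines in a template text. The function `fill_template`, the template fragment, and the key names and values in the example are hypothetical.

```python
import re


def fill_template(template_text, replacements):
    """Replace the value of each `KEY: value` line whose key appears in `replacements`."""
    lines = []
    for line in template_text.splitlines():
        match = re.match(r'^(\s*)([A-Za-z_][A-Za-z0-9_]*):', line)
        if match and match.group(2) in replacements:
            indent, key = match.group(1), match.group(2)
            line = f"{indent}{key}: {replacements[key]}"
        lines.append(line)
    return "\n".join(lines)


# Hypothetical template fragment and station values
template = "DOMAIN_NAME: template\nPOUR_POINT_COORDS: 0.0/0.0\nFORCING_DATASET: ERA5"
filled = fill_template(template, {
    'DOMAIN_NAME': 'norswe_0001',
    'POUR_POINT_COORDS': '61.5/9.2',
})
print(filled)
```

Editing the text line by line leaves untouched lines exactly as they were, but any trailing comment on a replaced line is dropped.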

## 9. Launch CONFLUENCE Jobs

In [None]:
'''
# Prepare to launch CONFLUENCE runs
from run_sites_norswe import run_confluence

submitted_jobs = []
skipped_jobs = []

# Interactive decision (for notebook, we'll simulate 'y' or 'n')
if experiment_config['no_submit']:
    submit_jobs = 'n'
    print("DRY RUN MODE - No jobs will be submitted")
else:
    submit_jobs = 'y'  # In real notebook, you'd ask user
    print("Preparing to submit CONFLUENCE jobs...")

if submit_jobs == 'y':
    for _, station in complete_stations.iterrows():
        station_name = station['Watershed_Name']
        
        # Check if simulation already exists
        sim_path = Path(experiment_config['base_path']) / f"domain_{station_name}" / "simulations" / "run_1" / "SUMMA" / "run_1_timestep.nc"
        
        if sim_path.exists():
            print(f"Skipping {station_name} - simulation already exists")
            skipped_jobs.append(station_name)
            continue
        
        # Submit job
        config_path = Path(experiment_config['config_dir']) / f"config_{station_name}.yaml"
        job_id = run_confluence(str(config_path), station_name)
        
        if job_id:
            submitted_jobs.append((station_name, job_id))
            print(f"Submitted job for {station_name}: {job_id}")
        
        # Small delay between submissions
        import time
        time.sleep(2)

# Summary
print("\nJob submission summary:")
print(f"  Submitted: {len(submitted_jobs)}")
print(f"  Skipped: {len(skipped_jobs)}")

if submitted_jobs:
    print("\nSubmitted jobs:")
    for station_name, job_id in submitted_jobs[:5]:
        print(f"  {station_name}: {job_id}")
'''
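
How `run_confluence` obtains the job ID is not shown here. If it wraps Slurm's `sbatch`, the ID can be read from the confirmation line that `sbatch` prints on success (`Submitted batch job <id>`). The helper below is a hypothetical sketch of that parsing step only; it does not submit anything.

```python
import re


def parse_sbatch_job_id(stdout):
    """Extract the numeric job ID from sbatch output, or return None if absent."""
    match = re.search(r'Submitted batch job (\d+)', stdout)
    return match.group(1) if match else None


print(parse_sbatch_job_id("Submitted batch job 4815162\n"))
print(parse_sbatch_job_id("sbatch: error: invalid partition specified"))
```

Returning `None` on failure matches the `if job_id:` check in the submission loop above, so failed submissions are simply not recorded.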

## 10. Monitor Job Status

In [None]:
'''
# Check job status
def check_job_status(user=None):
    user = user or os.environ.get('USER')
    cmd = ['squeue', '-u', user]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout

print("Current job status:")
print(check_job_status())

# Save job information
if submitted_jobs:
    job_df = pd.DataFrame(submitted_jobs, columns=['station_name', 'job_id'])
    job_df.to_csv(experiment_dir / 'submitted_jobs.csv', index=False)
    print(f"\nJob information saved to {experiment_dir / 'submitted_jobs.csv'}")
'''
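
The raw `squeue` text can be summarised into counts per job state. The sketch below assumes the default `squeue` column layout with an `ST` column and job names without spaces; the helper name and the sample output are made up for illustration.

```python
from collections import Counter


def count_job_states(squeue_output):
    """Count jobs per state code (e.g. R, PD) from default-format squeue text."""
    lines = [ln for ln in squeue_output.splitlines() if ln.strip()]
    if not lines:
        return Counter()
    header = lines[0].split()
    state_col = header.index('ST')  # raises ValueError if the layout differs
    return Counter(ln.split()[state_col] for ln in lines[1:])


# Made-up sample in the default squeue layout
sample = """\
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
    101       cpu norswe_a    alice  R       5:12      1 node01
    102       cpu norswe_b    alice PD       0:00      1 (Priority)
    103       cpu norswe_c    alice  R       1:03      1 node02
"""
states = count_job_states(sample)
print(dict(states))
```

In the notebook, the same function could be applied to the string returned by `check_job_status()`.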

## 11. Find Completed Simulations

In [None]:
# Find completed simulations
base_path = Path(experiment_config['base_path'])
completed = []

for _, station in complete_stations.iterrows():
    station_name = station['Watershed_Name']
    sim_path = base_path / f"domain_{station_name}" / "simulations" / "run_1" / "SUMMA"
    
    if sim_path.exists() and list(sim_path.glob("*timestep*.nc")):
        completed.append({
            'station_name': station_name,
            'station_id': station['station_id'],
            'elevation': station['elevation'],
            'sim_path': sim_path
        })

print(f"Found {len(completed)} completed simulations")
if completed:
    completed_df = pd.DataFrame(completed)
    display(completed_df[['station_name', 'station_id', 'elevation']])

## 12. Load and Compare Results

In [None]:
# Function to load SUMMA snow output
def load_summa_snow(sim_path):
    """Return a DataFrame of daily SUMMA snow output, or None if no daily file exists."""
    # Daily output files (the completion check in step 11 looks for timestep files)
    output_files = sorted(Path(sim_path).glob("*day*.nc"))
    if not output_files:
        return None

    # Read the first daily file; the context manager closes it afterwards
    with xr.open_dataset(output_files[0]) as ds:
        data = {'time': pd.to_datetime(ds.time.values)}

        # Extract SWE and snow depth if available
        if 'scalarSWE' in ds.variables:
            data['swe'] = ds.scalarSWE.values.flatten()
        if 'scalarSnowDepth' in ds.variables:
            data['depth'] = ds.scalarSnowDepth.values.flatten()

    return pd.DataFrame(data)

# Plot modeled SWE for completed simulations (observation loading is left commented out)
if completed:
    fig, ax = plt.subplots(figsize=(12, 6))
    
    for site in completed[:30]:
        # Load model output
        model_data = load_summa_snow(site['sim_path'])
        
        # Load observations (uncomment once the snow data from step 7 has been extracted)
        #obs_path = base_path / f"domain_{site['station_name']}" / "observations" / "snow" / "raw_data"
        #swe_obs = pd.read_csv(obs_path / "swe" / f"{site['station_id']}_swe.csv")
        #snd_obs = pd.read_csv(obs_path / "depth" / f"{site['station_id']}_depth.csv")
        
        if model_data is not None and 'swe' in model_data:
            # Label each line with the station name and elevation
            ax.plot(model_data['time'], model_data['swe'],
                    label=f"{site['station_name']} ({site['elevation']} m)", alpha=0.7)
    
    ax.set_ylabel('SWE (kg/m²)')
    ax.set_title('Simulated SWE at completed NorSWE sites')
    ax.legend(fontsize=8)
    ax.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
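
Once the observation files are loaded, simulated and observed SWE can be compared on the dates they share. The sketch below computes bias, RMSE, and Nash-Sutcliffe efficiency for one site; the function name `swe_metrics` and the sample values are hypothetical, and both series are assumed to be in kg/m² with a datetime index.

```python
import numpy as np
import pandas as pd


def swe_metrics(sim, obs):
    """Bias, RMSE, and Nash-Sutcliffe efficiency on the dates both series share."""
    paired = pd.concat([sim.rename('sim'), obs.rename('obs')],
                       axis=1, join='inner').dropna()
    error = paired['sim'] - paired['obs']
    obs_var = ((paired['obs'] - paired['obs'].mean()) ** 2).sum()
    return {
        'n': len(paired),
        'bias': float(error.mean()),
        'rmse': float(np.sqrt((error ** 2).mean())),
        'nse': float(1.0 - (error ** 2).sum() / obs_var) if obs_var > 0 else float('nan'),
    }


# Synthetic example: daily simulation vs. sparse observations (made-up values)
dates = pd.date_range('2016-01-01', periods=6, freq='D')
sim = pd.Series([100.0, 110.0, 120.0, 130.0, 140.0, 150.0], index=dates)
obs = pd.Series([90.0, 130.0, 140.0], index=dates[[0, 2, 4]])
metrics = swe_metrics(sim, obs)
print(metrics)
```

The inner join matters for SWE in particular, since manual snow surveys are often much sparser than the daily model output.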

## 13. Performance Analysis Across Elevations