# 🌡️ Climate Model Validation Workshop

## Complete End-to-End Analysis
**Study Area:** AMMAN ZARQA Basin, Jordan  
**Models:** 6 RICCAR Climate Models (SSP4.5)  
**Stations:** AL0019, AL0035, AL0059  
**Period:** 1990-2014

---

## 🎯 Workshop Overview

### **Part 1: Station Data Extraction** 📍
- Download climate model NetCDF files
- Extract temperature data at station locations

### **Part 2: Monthly Climatology** 📊
- Convert daily data to monthly climatology
- Calculate seasonal temperature patterns

### **Part 3: Station Data Processing** 🌡️
- Load observed temperature data
- Process station climatology

### **Part 4: Model Validation** 🎓
- Calculate 5 validation metrics (RMSE, R, NSE, PBIAS, MAE)
- Compare models against observations
- Rank model performance

---

## 📚 Learning Objectives
- ✅ Extract and process climate data using Python
- ✅ Calculate validation metrics and interpret results
- ✅ Create publication-quality visualizations
- ✅ Make informed decisions about model selection

**Estimated Time: 45-60 minutes**

# ⚙️ Setup: Install Packages and Download Data

In [None]:
!pip install xarray netcdf4 openpyxl requests tqdm seaborn -q
print("✅ Packages installed successfully!")

In [None]:
import os
import urllib.request
import xarray as xr
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import glob
from tqdm import tqdm

print("📚 Libraries imported successfully!")

In [None]:
def download_workshop_data():
    """Download all required files from GitHub repository"""
    
    print("🎯 DOWNLOADING WORKSHOP DATA")
    print("=" * 40)
    
    base_url = "https://raw.githubusercontent.com/MoawiahHussien/climate-model-validation-workshop/main/"
    
    os.makedirs("input_data/models_netcdf", exist_ok=True)
    os.makedirs("input_data/stations_data", exist_ok=True)
    
    nc_files = [
        "arcgis_merged_Tmax_CMCC-CM2-SR5.nc",
        "arcgis_merged_Tmax_CNRM-ESM2-1.nc", 
        "arcgis_merged_Tmax_EC-Earth3-Veg.nc",
        "arcgis_merged_Tmax_IPSL-CM6A-LR.nc",
        "arcgis_merged_Tmax_MPI-ESM1-2-LR.nc",
        "arcgis_merged_Tmax_NorESM2-MM.nc"
    ]
    
    station_files = ["AL0019.xlsx", "AL0035.xlsx", "AL0059.xlsx"]
    
    print("📥 Downloading climate model files...")
    for nc_file in nc_files:
        file_url = base_url + "Input%20Files/Models.Nc.ArcGIS.Compatible/" + nc_file
        local_path = f"input_data/models_netcdf/{nc_file}"
        
        if os.path.exists(local_path):
            print(f"✅ Using cached: {nc_file}")
            continue
        
        try:
            print(f"📥 Downloading: {nc_file}")
            urllib.request.urlretrieve(file_url, local_path)
            file_size = os.path.getsize(local_path) / (1024*1024)
            print(f"   ✅ Complete: {file_size:.1f} MB")
        except Exception as e:
            print(f"   ❌ Failed: {e}")
    
    print("\n📥 Downloading station files...")
    for station_file in station_files:
        file_url = base_url + "Input%20Files/Stations.Daily/" + station_file
        local_path = f"input_data/stations_data/{station_file}"
        
        if os.path.exists(local_path):
            print(f"✅ Using cached: {station_file}")
            continue
        
        try:
            print(f"📥 Downloading: {station_file}")
            urllib.request.urlretrieve(file_url, local_path)
            file_size = os.path.getsize(local_path) / (1024*1024)
            print(f"   ✅ Complete: {file_size:.1f} MB")
        except Exception as e:
            print(f"   ❌ Failed: {e}")
    
    print(f"\n🎉 Data download complete!")

download_workshop_data()
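
The download loop above treats any file that already exists as cached, so an interrupted download could leave a truncated file that is silently reused on the next run. One possible guard, sketched here with a hypothetical helper `download_atomic` (not part of the workshop repository), is to download to a temporary `.part` file and rename it only on success:

```python
import os
import urllib.request

def download_atomic(url, local_path):
    """Download url to local_path via a temporary '.part' file (hypothetical helper).

    Returns True if a download happened, False if local_path already existed.
    """
    if os.path.exists(local_path):
        return False  # already cached, nothing downloaded
    part_path = local_path + ".part"
    try:
        urllib.request.urlretrieve(url, part_path)
        os.replace(part_path, local_path)  # rename only after a complete download
        return True
    finally:
        # Remove a leftover partial file if the download failed midway
        if os.path.exists(part_path):
            os.remove(part_path)
```

In the loops above, a call such as `download_atomic(file_url, local_path)` could stand in for the `os.path.exists` check plus `urlretrieve`. Errors still propagate to the caller, so the surrounding `try`/`except` stays useful.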

In [None]:
# Workshop configuration
stations = {
    'AL0019': {'lat': 31.95, 'lon': 35.93, 'name': 'Amman Airport'},
    'AL0035': {'lat': 32.01, 'lon': 35.85, 'name': 'Zarqa Station'}, 
    'AL0059': {'lat': 31.97, 'lon': 36.12, 'name': 'Russeifa Station'}
}

model_files = {
    'CMCC': 'arcgis_merged_Tmax_CMCC-CM2-SR5.nc',
    'CNRM': 'arcgis_merged_Tmax_CNRM-ESM2-1.nc',
    'EC-Earth3': 'arcgis_merged_Tmax_EC-Earth3-Veg.nc',
    'IPSL': 'arcgis_merged_Tmax_IPSL-CM6A-LR.nc',
    'MPI': 'arcgis_merged_Tmax_MPI-ESM1-2-LR.nc',
    'NorESM2': 'arcgis_merged_Tmax_NorESM2-MM.nc'
}

model_colors = {
    'CMCC': '#1f77b4', 'CNRM': '#ff7f0e', 'EC-Earth3': '#2ca02c',
    'IPSL': '#d62728', 'MPI': '#9467bd', 'NorESM2': '#8c564b'
}

print(f"🎯 WORKSHOP CONFIGURATION")
print(f"📍 Target Stations: {len(stations)}")
print(f"🌡️ Climate Models: {len(model_files)}")
for station_id, info in stations.items():
    print(f"  {station_id}: {info['name']} ({info['lat']:.2f}°N, {info['lon']:.2f}°E)")

# 📍 Part 1: Station Data Extraction from Climate Models

In [None]:
def extract_station_data():
    """Extract temperature data at station locations from all models"""
    
    print(f"📍 EXTRACTING DATA AT STATION LOCATIONS")
    print("-" * 45)
    
    model_station_data = {}
    extraction_log = []
    
    for model_name, filename in model_files.items():
        print(f"\n🔄 Processing: {model_name}")
        
        file_path = f"input_data/models_netcdf/{filename}"
        
        if not os.path.exists(file_path):
            print(f"    ❌ File not found: {filename}")
            continue
        
        ds = xr.open_dataset(file_path)
        model_station_data[model_name] = {}
        
        for station_id, station_info in stations.items():
            station_data = ds.tasmaxAdjust.sel(
                lat=station_info['lat'], 
                lon=station_info['lon'], 
                method='nearest'
            )
            
            grid_lat = float(station_data.lat.values)
            grid_lon = float(station_data.lon.values)
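            # Approximate distance: 1 degree is about 111 km; scale longitude by cos(latitude)
            dlat = grid_lat - station_info['lat']
            dlon = (grid_lon - station_info['lon']) * np.cos(np.radians(station_info['lat']))
            distance_km = np.sqrt(dlat**2 + dlon**2) * 111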
            
            model_station_data[model_name][station_id] = station_data.values
            
            extraction_log.append({
                'Model': model_name,
                'Station': station_id,
                'Station_Name': station_info['name'],
                'Distance_km': round(distance_km, 2),
                'Days_Extracted': len(station_data),
                'Valid_Days': int(np.sum(~np.isnan(station_data.values)))
            })
            
            print(f"    ✅ {station_id}: {len(station_data)} days, distance: {distance_km:.2f} km")
        
        ds.close()
    
    print(f"\n✅ Part 1 Complete: {len(extraction_log)} extractions")
    return model_station_data, pd.DataFrame(extraction_log)

extracted_data, extraction_summary = extract_station_data()
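
To make the nearest-grid-point step less of a black box, here is a minimal sketch of what `sel(..., method='nearest')` amounts to for a regular latitude/longitude grid, written with plain NumPy. The grid below is made up for illustration and is not the RICCAR grid; `nearest_index` and `haversine_km` are hypothetical helper names. The distance uses the haversine formula, which accounts for meridians converging with latitude.

```python
import numpy as np

def nearest_index(coords, target):
    """Index of the coordinate value closest to target."""
    return int(np.argmin(np.abs(np.asarray(coords) - target)))

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    dphi = phi2 - phi1
    dlmb = np.radians(lon2 - lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlmb / 2) ** 2
    return float(2 * radius_km * np.arcsin(np.sqrt(a)))

# Made-up 0.25-degree grid, for illustration only
grid_lats = np.arange(31.5, 32.51, 0.25)
grid_lons = np.arange(35.5, 36.51, 0.25)

station_lat, station_lon = 31.95, 35.93  # AL0019 from the configuration cell

# Latitude and longitude are searched independently on a regular grid
i = nearest_index(grid_lats, station_lat)
j = nearest_index(grid_lons, station_lon)
distance = haversine_km(station_lat, station_lon, grid_lats[i], grid_lons[j])

print(f"Nearest grid point: {grid_lats[i]:.2f}N, {grid_lons[j]:.2f}E ({distance:.2f} km away)")
```

A large distance relative to the grid spacing would suggest the station lies near the edge of the model domain.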

In [None]:
# Part 1 visualization
summary_data = []
for model_name in extracted_data.keys():
    for station_id in extracted_data[model_name].keys():
        temp_data = extracted_data[model_name][station_id]
        summary_data.append({
            'Model': model_name,
            'Station': station_id,
            'Station_Name': stations[station_id]['name'],
            'Mean_Temp': round(np.nanmean(temp_data), 2)
        })

summary_df = pd.DataFrame(summary_data)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Distance to grid points
ax1 = axes[0]
extraction_summary.boxplot(column='Distance_km', by='Station', ax=ax1)
ax1.set_title('Distance to Nearest Grid Point\n(Spatial Resolution Check)', fontweight='bold')
ax1.set_ylabel('Distance (km)')
ax1.set_xlabel('Station ID')
ax1.grid(True, alpha=0.3)

# Temperature by model
ax2 = axes[1]
models = list(model_colors.keys())
model_temps = [summary_df[summary_df['Model'] == model]['Mean_Temp'].values for model in models]
bp = ax2.boxplot(model_temps, labels=models, patch_artist=True, widths=0.6)
for patch, model in zip(bp['boxes'], models):
    patch.set_facecolor(model_colors[model])
    patch.set_alpha(0.7)

ax2.set_title('Mean Temperature by Model\n1990-2014 Daily Average', fontweight='bold')
ax2.set_ylabel('Temperature (°C)')
ax2.set_xlabel('Climate Model')
ax2.tick_params(axis='x', rotation=45)
ax2.grid(True, alpha=0.3)

fig.suptitle('Part 1: Climate Model Data Extraction Results', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print("\n🎯 PART 1 SUMMARY")
print(f"✅ Models processed: {len(extracted_data)}")
print(f"✅ Stations extracted: {len(stations)}")
avg_distance = extraction_summary['Distance_km'].mean()
print(f"📏 Average grid distance: {avg_distance:.2f} km")
print(f"\n➡️ Ready for Part 2: Monthly Climatology Calculation")

# 📊 Part 2: Monthly Climatology Calculation

In [None]:
def calculate_monthly_climatology_from_memory(extracted_data):
    """Calculate monthly climatology from extracted data in memory"""
    
    print(f"📊 CALCULATING MONTHLY CLIMATOLOGY")
    print("-" * 35)
    
    climatology_data = []
    
    for model_name in extracted_data.keys():
        for station_id in extracted_data[model_name].keys():
            
            print(f"  🔄 Processing {model_name} - {station_id}")
            
            temp_data = extracted_data[model_name][station_id]
            # Assumes the series starts on 1990-01-01 and follows a standard (Gregorian) calendar
            time_index = pd.date_range('1990-01-01', periods=len(temp_data), freq='D')
            
            df = pd.DataFrame({
                'Date': time_index,
                'Temperature_C': temp_data
            })
            
            df['Month'] = df['Date'].dt.month
            
            monthly_clim = df.groupby('Month')['Temperature_C'].agg([
                ('Mean_Temp', 'mean'),
                ('Min_Temp', 'min'),
                ('Max_Temp', 'max'),
                ('Std_Temp', 'std')
            ]).round(2)
            
            month_names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
            monthly_clim['Month_Name'] = month_names
            monthly_clim['Model'] = model_name
            monthly_clim['Station'] = station_id
            monthly_clim['Station_Name'] = stations[station_id]['name']
            monthly_clim = monthly_clim.reset_index()
            
            climatology_data.append(monthly_clim)
    
    all_climatology = pd.concat(climatology_data, ignore_index=True)
    
    print(f"\n✅ Part 2 Complete: Monthly climatology for {len(climatology_data)} model-station combinations")
    return all_climatology

model_climatology = calculate_monthly_climatology_from_memory(extracted_data)

if model_climatology is not None:
    print(f"\n📊 Climatology Summary:")
    print(f"  Total records: {len(model_climatology)}")
    print(f"  Models: {model_climatology['Model'].nunique()}")
    print(f"  Stations: {model_climatology['Station'].nunique()}")
    print(f"  Temperature range: {model_climatology['Mean_Temp'].min():.1f}°C to {model_climatology['Mean_Temp'].max():.1f}°C")
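
The climatology step boils down to grouping every January together, every February together, and so on, across all years. The sketch below shows that idea on a synthetic daily series (a sine wave invented for illustration, not model output). It groups on the timestamps carried by the series itself; with real NetCDF data, the file's own time coordinate could play that role instead of a reconstructed date range.

```python
import numpy as np
import pandas as pd

# Synthetic daily Tmax: a seasonal sine wave, for illustration only
dates = pd.date_range("1990-01-01", "2014-12-31", freq="D")
day_of_year = dates.dayofyear.to_numpy()
tmax = 25 + 10 * np.sin(2 * np.pi * (day_of_year - 105) / 365.25)

daily = pd.Series(tmax, index=dates, name="Temperature_C")

# Group all Januaries together, all Februaries together, and so on
climatology = (
    daily.groupby(daily.index.month)
    .agg(["mean", "min", "max", "std"])
    .round(2)
)
climatology.index.name = "Month"

print(climatology)
```

The result has one row per calendar month, which is the shape the validation step later compares against the observed climatology.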

In [None]:
# Part 2 visualization
if model_climatology is not None:
    
    print(f"📈 CREATING CLIMATOLOGY VISUALIZATION")
    
    fig, axes = plt.subplots(1, 3, figsize=(15, 5))
    stations_order = ['AL0019', 'AL0035', 'AL0059']
    
    for i, station_id in enumerate(stations_order):
        ax = axes[i]
        
        for model in model_climatology['Model'].unique():
            data = model_climatology[(model_climatology['Model'] == model) & 
                                    (model_climatology['Station'] == station_id)]
            
            if not data.empty:
                months = data['Month'].values
                temps = data['Mean_Temp'].values
                
                ax.plot(months, temps, 'o-', color=model_colors[model], 
                       label=model, linewidth=2, markersize=6, alpha=0.8)
        
        ax.set_title(f'{station_id}\n({stations[station_id]["name"]})', fontweight='bold')
        ax.set_xlabel('Month')
        ax.set_ylabel('Temperature (°C)')
        ax.set_xticks(range(1, 13))
        ax.set_xticklabels(['J', 'F', 'M', 'A', 'M', 'J', 'J', 'A', 'S', 'O', 'N', 'D'])
        ax.grid(True, alpha=0.3)
        
        if i == 2:
            ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left', fontsize=8)
    
    fig.suptitle('Part 2: Monthly Temperature Climatology by Station\n1990-2014 Average (All Models)', 
                 fontsize=14, fontweight='bold')
    
    plt.tight_layout()
    plt.subplots_adjust(right=0.85)
    plt.show()
    
    print(f"\n🎯 PART 2 SUMMARY")
    print(f"✅ Monthly climatology calculated for {len(model_climatology['Model'].unique())} models")
    print(f"✅ {len(model_climatology['Station'].unique())} stations processed")
    print(f"✅ Temperature range: {model_climatology['Mean_Temp'].min():.1f}°C to {model_climatology['Mean_Temp'].max():.1f}°C")
    print(f"\n➡️ Ready for Part 3: Station Data Processing")

# 🌡️ Part 3: Station Data Processing

In [None]:
def process_station_data():
    """Process observed station data for validation"""
    
    print(f"🌡️ PROCESSING STATION DATA")
    print("-" * 30)
    
    station_climatology = []
    target_stations = ['AL0019', 'AL0035', 'AL0059']
    
    for station_id in target_stations:
        station_file = f"input_data/stations_data/{station_id}.xlsx"
        
        if not os.path.exists(station_file):
            print(f"❌ Station file not found: {station_id}.xlsx")
            continue
        
        print(f"🔄 Processing station: {station_id}")
        
        try:
            df = pd.read_excel(station_file)
            
            if 'Date' not in df.columns or 'Corrected Tmax' not in df.columns:
                print(f"❌ Missing required columns in {station_id}")
                continue
            
            df['Date'] = pd.to_datetime(df['Date'])
            df = df[(df['Date'].dt.year >= 1990) & (df['Date'].dt.year <= 2014)]
            df['Temperature'] = pd.to_numeric(df['Corrected Tmax'], errors='coerce')
            df = df[(df['Temperature'] >= -20) & (df['Temperature'] <= 60)]
            df['Month'] = df['Date'].dt.month
            
            monthly_clim = df.groupby('Month')['Temperature'].agg([
                ('Mean_Temp', 'mean'),
                ('Min_Temp', 'min'),
                ('Max_Temp', 'max'),
                ('Std_Temp', 'std'),
                ('Count', 'count')
            ]).round(2)
            
            month_names = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
            monthly_clim['Month_Name'] = month_names
            monthly_clim['Station'] = station_id
            monthly_clim['Station_Name'] = stations[station_id]['name']
            monthly_clim = monthly_clim.reset_index()
            
            station_climatology.append(monthly_clim)
            
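            valid_days = df['Temperature'].notna().sum()
            # The range filter above already dropped missing and out-of-range rows,
            # so measure completeness against the calendar days in 1990-2014
            total_days = len(pd.date_range('1990-01-01', '2014-12-31', freq='D'))
            completeness = (valid_days / total_days) * 100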
            
            print(f"  ✅ {station_id}: {completeness:.1f}% data completeness")
            
        except Exception as e:
            print(f"  ❌ Error processing {station_id}: {e}")
            continue
    
    if station_climatology:
        combined_station_clim = pd.concat(station_climatology, ignore_index=True)
        print(f"\n✅ Part 3 Complete: Station climatology for {len(station_climatology)} stations")
        return combined_station_clim
    else:
        print("❌ No station data processed")
        return None

station_climatology = process_station_data()
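
One detail of the quality-control filter is easy to miss: comparisons against `NaN` evaluate to `False`, so a range filter removes missing values together with implausible ones. The toy table below (values invented for illustration) shows the effect and measures completeness against the number of calendar days rather than the number of rows that survive.

```python
import numpy as np
import pandas as pd

# Nine records covering a ten-day window: one date is absent,
# one value is missing and one reading is implausible
obs = pd.DataFrame({
    "Date": pd.to_datetime([
        "2000-01-01", "2000-01-02", "2000-01-03", "2000-01-04", "2000-01-06",
        "2000-01-07", "2000-01-08", "2000-01-09", "2000-01-10",
    ]),
    "Temperature": [12.1, 13.0, np.nan, 11.5, 99.9, 12.8, 13.3, 12.0, 11.9],
})

expected_days = len(pd.date_range("2000-01-01", "2000-01-10", freq="D"))

# NaN fails both comparisons, so missing values are dropped along with outliers
clean = obs[(obs["Temperature"] >= -20) & (obs["Temperature"] <= 60)]

completeness = 100 * len(clean) / expected_days
print(f"{len(clean)} valid days out of {expected_days}: {completeness:.1f}% complete")
```

Dividing by the filtered row count instead would always report 100%, whatever the gaps in the record.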

In [None]:
# Part 3 visualization
if station_climatology is not None:
    
    print("📊 CREATING STATION VISUALIZATION")
    
    station_colors = {'AL0019': '#1f77b4', 'AL0035': '#ff7f0e', 'AL0059': '#2ca02c'}
    
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    
    # Monthly climatology
    ax1 = axes[0]
    for station_id in ['AL0019', 'AL0035', 'AL0059']:
        station_clim = station_climatology[station_climatology['Station'] == station_id]
        if not station_clim.empty:
            months = station_clim['Month'].values
            temps = station_clim['Mean_Temp'].values
            ax1.plot(months, temps, 'o-', color=station_colors[station_id], 
                    label=f'{station_id} ({stations[station_id]["name"]})', linewidth=2, markersize=6)
    
    ax1.set_title('Observed Monthly Temperature Climatology\n1990-2014 Average', fontweight='bold')
    ax1.set_xlabel('Month')
    ax1.set_ylabel('Temperature (°C)')
    ax1.set_xticks(range(1, 13))
    ax1.set_xticklabels(['J', 'F', 'M', 'A', 'M', 'J', 'J', 'A', 'S', 'O', 'N', 'D'])
    ax1.grid(True, alpha=0.3)
    ax1.legend(fontsize=9)
    
    # Data completeness
    ax2 = axes[1]
    stations_processed = []
    data_completeness = []
    
    for station_id in ['AL0019', 'AL0035', 'AL0059']:
        if station_id in station_climatology['Station'].values:
            stations_processed.append(station_id)
            station_data = station_climatology[station_climatology['Station'] == station_id]
            avg_count = station_data['Count'].mean()
            expected_days_per_month = 25 * 365.25 / 12
            completeness = min(100, (avg_count / expected_days_per_month) * 100)
            data_completeness.append(completeness)
    
    bars = ax2.bar(range(len(stations_processed)), data_completeness, 
                   color=[station_colors[sid] for sid in stations_processed], alpha=0.7, edgecolor='black')
    
    ax2.set_title('Station Data Quality\n1990-2014 Period', fontweight='bold')
    ax2.set_ylabel('Data Completeness (%)')
    ax2.set_xlabel('Station')
    ax2.set_xticks(range(len(stations_processed)))
    ax2.set_xticklabels(stations_processed)
    ax2.set_ylim(0, 105)
    ax2.grid(True, alpha=0.3, axis='y')
    
    for bar, pct in zip(bars, data_completeness):
        height = bar.get_height()
        ax2.text(bar.get_x() + bar.get_width()/2., height + 1,
                f'{pct:.1f}%', ha='center', va='bottom', fontweight='bold')
    
    fig.suptitle('Part 3: Station Data Processing Results', fontsize=14, fontweight='bold')
    plt.tight_layout()
    plt.show()
    
    print("\n🎯 PART 3 SUMMARY")
    print(f"✅ Stations processed: {len(station_climatology['Station'].unique())}")
    print(f"✅ Total monthly records: {len(station_climatology)}")
    print(f"✅ Study period: 1990-2014 (25 years)")
    
    print(f"\n📊 Station Data Quality:")
    for station_id in station_climatology['Station'].unique():
        station_data = station_climatology[station_climatology['Station'] == station_id]
        avg_temp = station_data['Mean_Temp'].mean()
        print(f"  {station_id}: avg temp {avg_temp:.1f}°C")
    
    print(f"\n➡️ Ready for Part 4: Model Validation")
else:
    print("❌ Part 3 failed - no station data available")

# 🎓 Part 4: Model Validation
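
For reference, the five metrics in this part can be written as follows. The notation is introduced here for illustration: $M_i$ and $O_i$ are the model and observed monthly means for month $i$, $\bar{M}$ and $\bar{O}$ are their averages, and $n$ is the number of valid month pairs. The formulas mirror the NumPy expressions in `calculate_validation_metrics`.

$$
\begin{aligned}
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(M_i - O_i\right)^2} \\
R &= \frac{\sum_{i=1}^{n}\left(M_i - \bar{M}\right)\left(O_i - \bar{O}\right)}{\sqrt{\sum_{i=1}^{n}\left(M_i - \bar{M}\right)^2 \, \sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2}} \\
\mathrm{NSE} &= 1 - \frac{\sum_{i=1}^{n}\left(M_i - O_i\right)^2}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2} \\
\mathrm{PBIAS} &= 100 \times \frac{\sum_{i=1}^{n}\left(M_i - O_i\right)}{\sum_{i=1}^{n} O_i} \\
\mathrm{MAE} &= \frac{1}{n}\sum_{i=1}^{n}\left|M_i - O_i\right|
\end{aligned}
$$

NSE equals 1 for a perfect match and falls below 0 when the model predicts worse than the observed mean would. With this sign convention, a positive PBIAS means the model is warmer than the observations on average. Some references define PBIAS with the opposite sign, so check the convention before comparing values across studies.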

In [None]:
def calculate_validation_metrics(model_temps, obs_temps):
    """Calculate the 5 validation metrics"""
    
    valid_mask = ~(np.isnan(model_temps) | np.isnan(obs_temps))
    model_valid = model_temps[valid_mask]
    obs_valid = obs_temps[valid_mask]
    
    if len(model_valid) < 3:
        return None
    
    metrics = {}
    
    # 1. Root Mean Square Error (RMSE)
    metrics['RMSE'] = np.sqrt(np.mean((model_valid - obs_valid) ** 2))
    
    # 2. Correlation coefficient (R)
    metrics['R'] = np.corrcoef(model_valid, obs_valid)[0, 1]
    
    # 3. Nash-Sutcliffe Efficiency (NSE)
    mean_obs = np.mean(obs_valid)
    metrics['NSE'] = 1 - (np.sum((model_valid - obs_valid) ** 2) / 
                         np.sum((obs_valid - mean_obs) ** 2))
    
    # 4. Percent Bias (PBIAS)
    metrics['PBIAS'] = 100 * (np.mean(model_valid - obs_valid) / np.mean(obs_valid))
    
    # 5. Mean Absolute Error (MAE)
    metrics['MAE'] = np.mean(np.abs(model_valid - obs_valid))
    
    return metrics

def perform_validation(model_clim, station_clim):
    """Perform complete model validation"""
    
    print(f"🎓 PERFORMING MODEL VALIDATION")
    print("-" * 30)
    
    if model_clim is None or station_clim is None:
        print("❌ Missing climatology data for validation")
        return None, None
    
    validation_results = []
    models = model_clim['Model'].unique()
    station_ids = station_clim['Station'].unique()
    
    for model in models:
        for station_id in station_ids:
            
            print(f"  🔄 Validating {model} at {station_id}")
            
            model_data = model_clim[(model_clim['Model'] == model) & 
                                  (model_clim['Station'] == station_id)]
            
            station_data = station_clim[station_clim['Station'] == station_id]
            
            if model_data.empty or station_data.empty:
                print(f"    ❌ No data available")
                continue
            
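            # Align model and observations on calendar month before comparing,
            # so a month missing from either side cannot shift the pairing
            paired = model_data.merge(station_data, on='Month', suffixes=('_model', '_obs'))
            model_temps = paired['Mean_Temp_model'].values
            obs_temps = paired['Mean_Temp_obs'].values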
            
            metrics = calculate_validation_metrics(model_temps, obs_temps)
            
            if metrics is None:
                print(f"    ❌ Insufficient data for validation")
                continue
            
            result = {
                'Model': model,
                'Station': station_id,
                'Station_Name': stations[station_id]['name'],
                **metrics
            }
            validation_results.append(result)
            
            print(f"    ✅ R={metrics['R']:.3f}, RMSE={metrics['RMSE']:.2f}°C, NSE={metrics['NSE']:.3f}")
    
    if not validation_results:
        print("❌ No validation results generated")
        return None, None
    
    results_df = pd.DataFrame(validation_results)
    
    # Calculate model ranking
    model_ranking = results_df.groupby('Model').agg({
        'RMSE': 'mean',
        'R': 'mean',
        'NSE': 'mean', 
        'PBIAS': lambda x: np.mean(np.abs(x)),
        'MAE': 'mean'
    }).round(3)
    
    model_ranking['RMSE_rank'] = model_ranking['RMSE'].rank(ascending=True)
    model_ranking['R_rank'] = model_ranking['R'].rank(ascending=False)
    model_ranking['NSE_rank'] = model_ranking['NSE'].rank(ascending=False)
    model_ranking['PBIAS_rank'] = model_ranking['PBIAS'].rank(ascending=True)
    model_ranking['MAE_rank'] = model_ranking['MAE'].rank(ascending=True)
    
    rank_cols = ['RMSE_rank', 'R_rank', 'NSE_rank', 'PBIAS_rank', 'MAE_rank']
    model_ranking['Average_Rank'] = model_ranking[rank_cols].mean(axis=1)
    model_ranking = model_ranking.sort_values('Average_Rank')
    
    print(f"\n✅ Part 4 Complete: {len(results_df)} validations performed")
    return results_df, model_ranking

validation_results, model_ranking = perform_validation(model_climatology, station_climatology)
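
The ranking step turns each metric into a rank and averages the ranks with equal weight. A minimal sketch on invented scores for three hypothetical models (`Model_A`, `Model_B`, `Model_C`) shows the mechanics. It uses three of the five metrics to stay short.

```python
import pandas as pd

# Invented scores for three hypothetical models, for illustration only
scores = pd.DataFrame(
    {"RMSE": [1.2, 0.8, 1.5], "R": [0.98, 0.99, 0.95], "NSE": [0.90, 0.95, 0.80]},
    index=["Model_A", "Model_B", "Model_C"],
)

ranks = pd.DataFrame({
    "RMSE_rank": scores["RMSE"].rank(ascending=True),  # lower error is better
    "R_rank": scores["R"].rank(ascending=False),       # higher correlation is better
    "NSE_rank": scores["NSE"].rank(ascending=False),   # higher efficiency is better
})
ranks["Average_Rank"] = ranks.mean(axis=1)

ranking = ranks.sort_values("Average_Rank")
print(ranking)
```

Equal weighting is a design choice rather than a rule. RMSE, MAE, and NSE are closely related, so error magnitude effectively counts several times in the average. Tied scores receive the average of the tied ranks, which is the pandas default.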

In [None]:
# Part 4 comprehensive validation visualization
if validation_results is not None and model_ranking is not None:
    
    print("📊 CREATING VALIDATION VISUALIZATION")
    
    fig, axes = plt.subplots(2, 3, figsize=(15, 10))
    
    # 1. RMSE Heatmap
    ax1 = axes[0, 0]
    rmse_pivot = validation_results.pivot(index='Station', columns='Model', values='RMSE')
    sns.heatmap(rmse_pivot, annot=True, fmt='.2f', cmap='Reds', ax=ax1, cbar_kws={'label': 'RMSE (°C)'})
    ax1.set_title('RMSE\n(Lower is Better)', fontweight='bold')
    
    # 2. Correlation Heatmap
    ax2 = axes[0, 1]
    r_pivot = validation_results.pivot(index='Station', columns='Model', values='R')
    sns.heatmap(r_pivot, annot=True, fmt='.3f', cmap='Blues', ax=ax2, cbar_kws={'label': 'Correlation (R)'})
    ax2.set_title('Correlation\n(Higher is Better)', fontweight='bold')
    
    # 3. NSE Heatmap
    ax3 = axes[0, 2]
    nse_pivot = validation_results.pivot(index='Station', columns='Model', values='NSE')
    sns.heatmap(nse_pivot, annot=True, fmt='.3f', cmap='Greens', ax=ax3, cbar_kws={'label': 'NSE'})
    ax3.set_title('Nash-Sutcliffe Efficiency\n(Higher is Better)', fontweight='bold')
    
    # 4. PBIAS Heatmap
    ax4 = axes[1, 0]
    pbias_pivot = validation_results.pivot(index='Station', columns='Model', values='PBIAS')
    sns.heatmap(pbias_pivot, annot=True, fmt='.1f', cmap='RdBu_r', center=0, ax=ax4, cbar_kws={'label': 'PBIAS (%)'})
    ax4.set_title('Percent Bias\n(Closer to 0 is Better)', fontweight='bold')
    
    # 5. Model Performance Box Plot
    ax5 = axes[1, 1]
    models = list(model_colors.keys())
    rmse_data = [validation_results[validation_results['Model'] == model]['RMSE'].values for model in models]
    bp = ax5.boxplot(rmse_data, labels=models, patch_artist=True)
    for patch, model in zip(bp['boxes'], models):
        patch.set_facecolor(model_colors[model])
        patch.set_alpha(0.7)
    ax5.set_title('RMSE Distribution\nAcross All Stations', fontweight='bold')
    ax5.set_ylabel('RMSE (°C)')
    ax5.tick_params(axis='x', rotation=45)
    ax5.grid(True, alpha=0.3)
    
    # 6. Model Ranking
    ax6 = axes[1, 2]
    models_ranked = model_ranking.index
    ranks = model_ranking['Average_Rank'].values
    colors_list = [model_colors[model] for model in models_ranked]
    bars = ax6.barh(range(len(models_ranked)), ranks, color=colors_list, alpha=0.7)
    ax6.set_yticks(range(len(models_ranked)))
    ax6.set_yticklabels(models_ranked)
    ax6.set_xlabel('Average Rank')
    ax6.set_title('Model Performance Ranking\n(Lower is Better)', fontweight='bold')
    ax6.grid(True, alpha=0.3, axis='x')
    
    for i, (bar, rank) in enumerate(zip(bars, ranks)):
        ax6.text(rank + 0.05, bar.get_y() + bar.get_height()/2, 
                f'{rank:.2f}', ha='left', va='center', fontweight='bold')
    
    fig.suptitle('Part 4: Climate Model Validation Results\nAMMAN ZARQA Basin, Jordan (1990-2014)', 
                 fontsize=14, fontweight='bold')
    
    plt.tight_layout()
    plt.subplots_adjust(top=0.90)
    plt.show()
    
    print("✅ Validation visualization complete!")

# 🎯 Workshop Results Summary

In [None]:
# Display final workshop results
print("🎯 WORKSHOP RESULTS SUMMARY")
print("=" * 40)

if validation_results is not None and model_ranking is not None:
    print(f"✅ Validation completed: {len(validation_results)} model-station combinations")
    print(f"✅ Models evaluated: {len(validation_results['Model'].unique())}")
    print(f"✅ Stations evaluated: {len(validation_results['Station'].unique())}")
    
    print(f"\n🏆 MODEL RANKING (Best to Worst):")
    for i, (model, row) in enumerate(model_ranking.iterrows(), 1):
        rmse = row['RMSE']
        r = row['R']
        nse = row['NSE']
        print(f"  {i}. {model}: RMSE={rmse:.2f}°C, R={r:.3f}, NSE={nse:.3f}")
    
    best_model = model_ranking.index[0]
    worst_model = model_ranking.index[-1]
    
    print(f"\n📊 KEY FINDINGS:")
    print(f"  🥇 Best performing model: {best_model}")
    print(f"  📉 Worst performing model: {worst_model}")
    print(f"  📏 RMSE range: {validation_results['RMSE'].min():.2f} - {validation_results['RMSE'].max():.2f}°C")
    print(f"  📈 Correlation range: {validation_results['R'].min():.3f} - {validation_results['R'].max():.3f}")
    
    print(f"\n📋 DETAILED VALIDATION RESULTS:")
    display(validation_results)
    
    print(f"\n🏆 MODEL RANKING TABLE:")
    display(model_ranking[['RMSE', 'R', 'NSE', 'PBIAS', 'MAE', 'Average_Rank']])

print(f"\n🎉 WORKSHOP COMPLETE!")
print(f"🎓 You have successfully validated climate models for AMMAN ZARQA Basin!")
print(f"📚 Skills gained: NetCDF processing, climatology calculation, statistical validation")
print(f"🔗 Repository: https://github.com/MoawiahHussien/climate-model-validation-workshop")

# 🎉 Workshop Conclusion

## Congratulations!

You have successfully completed the **Climate Model Validation Workshop**!

### 🎯 What You Accomplished:

1. **📍 Data Extraction**: Downloaded and processed 6 climate model NetCDF files
2. **📊 Climatology Calculation**: Converted daily data to monthly climatology patterns
3. **🌡️ Station Processing**: Handled observed temperature data and quality control
4. **🎓 Model Validation**: Implemented 5 statistical validation metrics
5. **📈 Visualization**: Created publication-quality analysis plots
6. **🏆 Model Ranking**: Identified best-performing climate models for your region

### 💡 Key Skills Gained:

- **Python Climate Data Processing**: xarray, pandas, numpy
- **Statistical Validation**: RMSE, R, NSE, PBIAS, MAE metrics
- **Data Visualization**: matplotlib, seaborn plotting
- **Climate Science**: Model evaluation and selection
- **Research Workflow**: End-to-end validation pipeline

### 🌍 Climate Science Insights:

- Model grid resolution affects local representation
- Different models show varying performance at different locations
- Multiple validation metrics provide comprehensive assessment
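- The seasonal cycle dominates a monthly climatology, so correlation (R) tends to be high for every model; RMSE, MAE, and PBIAS are usually what separate the models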
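
A small sketch illustrates why several metrics are needed. The observed values below are invented, and the hypothetical model is simply the observations plus a constant 2 °C warm bias. By construction the correlation is perfect, yet the error metrics still flag the bias.

```python
import numpy as np

# Invented monthly climatology (°C), for illustration only
obs = np.array([13.0, 15.0, 19.0, 24.0, 29.0, 32.0, 34.0, 34.0, 31.0, 27.0, 20.0, 15.0])
model = obs + 2.0  # hypothetical model that is uniformly 2 °C too warm

r = np.corrcoef(model, obs)[0, 1]
rmse = np.sqrt(np.mean((model - obs) ** 2))
nse = 1 - np.sum((model - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)
pbias = 100 * np.sum(model - obs) / np.sum(obs)

print(f"R={r:.3f}, RMSE={rmse:.2f} °C, NSE={nse:.3f}, PBIAS={pbias:.1f}%")
```

R is 1 and RMSE is 2 °C here, so a ranking based on correlation alone would miss the bias entirely.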

---
📧 **Questions?** Contact the workshop instructor  
🔗 **Repository:** https://github.com/MoawiahHussien/climate-model-validation-workshop