# 🎯 Part 1: Station Data Extraction from Climate Models

## Climate Model Validation Workshop
**Study Area:** AMMAN ZARQA Basin, Jordan  
**Models:** 6 RICCAR Climate Models (SSP4.5)  
**Stations:** AL0019, AL0035, AL0059  
**Period:** 1990-2014

---

## 📚 Learning Objectives
By the end of this notebook, you will:
- ✅ Extract climate model data at specific station locations
- ✅ Understand spatial resolution and nearest neighbor selection
- ✅ Assess data quality and completeness
- ✅ Create visualizations showing model temperature patterns

---

## ⚙️ Setup: Install Packages and Download Data

In [None]:
# Install required packages (openpyxl is needed later by DataFrame.to_excel)
!pip install xarray netcdf4 requests tqdm openpyxl -q

print("✅ Packages installed successfully!")

In [None]:
import os
import urllib.request
from tqdm import tqdm
import time

def download_workshop_data():
    """Download NetCDF files from GitHub repository"""
    
    print("🎯 DOWNLOADING WORKSHOP DATA")
    print("=" * 40)
    
    # GitHub repository base URL
    base_url = "https://raw.githubusercontent.com/MoawiahHussien/climate-model-validation-workshop/main/"
    
    # Create directories
    os.makedirs("input_data/models_netcdf", exist_ok=True)
    os.makedirs("workshop_output", exist_ok=True)
    
    # NetCDF files to download
    nc_files = [
        "arcgis_merged_Tmax_CMCC-CM2-SR5.nc",
        "arcgis_merged_Tmax_CNRM-ESM2-1.nc", 
        "arcgis_merged_Tmax_EC-Earth3-Veg.nc",
        "arcgis_merged_Tmax_IPSL-CM6A-LR.nc",
        "arcgis_merged_Tmax_MPI-ESM1-2-LR.nc",
        "arcgis_merged_Tmax_NorESM2-MM.nc"
    ]
    
    # Download each file
    for nc_file in nc_files:
        file_url = base_url + "Input%20Files/Models.Nc.ArcGIS.Compatible/" + nc_file
        local_path = f"input_data/models_netcdf/{nc_file}"
        
        if os.path.exists(local_path):
            print(f"✅ Using cached: {nc_file}")
            continue
        
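        try:
            print(f"📥 Downloading: {nc_file}")
            # Download to a temporary name first so an interrupted transfer
            # is never mistaken for a cached file on the next run
            tmp_path = local_path + ".part"
            urllib.request.urlretrieve(file_url, tmp_path)
            os.replace(tmp_path, local_path)
            file_size = os.path.getsize(local_path) / (1024*1024)
            print(f"   ✅ Complete: {file_size:.1f} MB")
        except Exception as e:
            print(f"   ❌ Failed: {e}")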
    
    print("\n🎉 Download step finished! Check above for any failed files.")

# Download the data
download_workshop_data()
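
A failed or redirected download can leave a file on disk that is not valid NetCDF (for example an HTML error page), and the cache check above would then reuse it. The sketch below is one way to catch that by inspecting the first bytes of each file. The helper names `looks_like_netcdf` and `check_downloads` are hypothetical and not part of the workshop code; the check assumes that classic NetCDF files start with `CDF` and that NetCDF-4 files carry the HDF5 signature at the start of the file.

```python
import os

# File signatures: classic NetCDF starts with b"CDF"; NetCDF-4 is stored as HDF5.
NETCDF_CLASSIC_MAGIC = b"CDF"
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"

def looks_like_netcdf(path):
    """Return True if the first bytes of `path` match a NetCDF signature (hypothetical helper)."""
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as f:
        header = f.read(8)
    return header.startswith(NETCDF_CLASSIC_MAGIC) or header == HDF5_MAGIC

def check_downloads(folder="input_data/models_netcdf"):
    """Map each .nc file name in `folder` to True/False; empty dict if the folder is missing."""
    if not os.path.isdir(folder):
        return {}
    return {
        name: looks_like_netcdf(os.path.join(folder, name))
        for name in sorted(os.listdir(folder))
        if name.endswith(".nc")
    }

# Report the status of every downloaded file
for name, ok in check_downloads().items():
    print(f"{'✅' if ok else '❌'} {name}")
```

Any file flagged with ❌ can be deleted and the download cell re-run.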

## 📚 Import Required Libraries

In [None]:
import xarray as xr
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
import glob

print("📚 Libraries imported successfully!")
print(f"📍 Current working directory: {os.getcwd()}")

## 📍 Define Station Locations and Model Information

In [None]:
# Station coordinates for AMMAN ZARQA basin
stations = {
    'AL0019': {'lat': 31.95, 'lon': 35.93, 'name': 'Amman Airport'},
    'AL0035': {'lat': 32.01, 'lon': 35.85, 'name': 'Zarqa Station'}, 
    'AL0059': {'lat': 31.97, 'lon': 36.12, 'name': 'Russeifa Station'}
}

# Model files (6 RICCAR models)
model_files = {
    'CMCC': 'arcgis_merged_Tmax_CMCC-CM2-SR5.nc',
    'CNRM': 'arcgis_merged_Tmax_CNRM-ESM2-1.nc',
    'EC-Earth3': 'arcgis_merged_Tmax_EC-Earth3-Veg.nc',
    'IPSL': 'arcgis_merged_Tmax_IPSL-CM6A-LR.nc',
    'MPI': 'arcgis_merged_Tmax_MPI-ESM1-2-LR.nc',
    'NorESM2': 'arcgis_merged_Tmax_NorESM2-MM.nc'
}

print(f"🎯 WORKSHOP SETUP COMPLETE")
print(f"📍 Target Stations: {len(stations)} stations")
print(f"🌡️ Models to Process: {len(model_files)} models")

for station_id, info in stations.items():
    print(f"  {station_id}: {info['name']} ({info['lat']:.2f}°N, {info['lon']:.2f}°E)")

## 🔄 Extract Temperature Data at Station Locations

In [None]:
def extract_station_data():
    """Extract temperature data at station locations from all models"""
    
    print(f"📍 EXTRACTING DATA AT STATION LOCATIONS")
    print("-" * 45)
    
    # Storage for results
    model_station_data = {}
    extraction_log = []
    
    # Process each model
    for model_name, filename in model_files.items():
        print(f"\n🔄 Processing: {model_name}")
        
        file_path = f"input_data/models_netcdf/{filename}"
        
        # Check if file exists
        if not os.path.exists(file_path):
            print(f"    ❌ File not found: {filename}")
            continue
        
        # Open NetCDF file
        ds = xr.open_dataset(file_path)
        
        # Initialize storage for this model
        model_station_data[model_name] = {}
        
        # Extract data for each station
        for station_id, station_info in stations.items():
            
            # Extract data at station location (nearest grid point)
            station_data = ds.tasmaxAdjust.sel(
                lat=station_info['lat'], 
                lon=station_info['lon'], 
                method='nearest'
            )
            
            # Get grid coordinates
            grid_lat = float(station_data.lat.values)
            grid_lon = float(station_data.lon.values)
            
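            # Calculate approximate distance in km (~111 km per degree of latitude).
            # The longitude difference is scaled by cos(latitude) because
            # meridians converge away from the equator.
            dlat = grid_lat - station_info['lat']
            dlon = (grid_lon - station_info['lon']) * np.cos(np.radians(station_info['lat']))
            distance_km = np.sqrt(dlat**2 + dlon**2) * 111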
            
            # Store extracted data
            model_station_data[model_name][station_id] = station_data.values
            
            # Log extraction info
            extraction_log.append({
                'Model': model_name,
                'Station': station_id,
                'Station_Name': station_info['name'],
                'Target_Lat': station_info['lat'],
                'Target_Lon': station_info['lon'],
                'Grid_Lat': round(grid_lat, 3),
                'Grid_Lon': round(grid_lon, 3),
                'Distance_km': round(distance_km, 2),
                'Days_Extracted': len(station_data),
                'Valid_Days': int(np.sum(~np.isnan(station_data.values)))
            })
            
            print(f"    ✅ {station_id}: {len(station_data)} days, distance: {distance_km:.2f} km")
        
        ds.close()
    
    return model_station_data, pd.DataFrame(extraction_log)

# Execute extraction
extracted_data, extraction_summary = extract_station_data()

print(f"\n🎉 EXTRACTION COMPLETE!")
print(f"✅ Successfully extracted data for {len(extracted_data)} models")
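
To make the nearest-grid-point step concrete, the sketch below reproduces it on a made-up regular grid using only the standard library. The 0.1° grid, the helper names `nearest_value` and `haversine_km`, and the mean Earth radius of 6371 km are assumptions for illustration; the real grid coordinates come from the NetCDF files. It also checks that the station lies inside the grid, since a nearest-neighbour lookup will otherwise silently return an edge cell.

```python
import math

def nearest_value(values, target):
    """Return the coordinate closest to `target` (the idea behind method='nearest' on one axis)."""
    return min(values, key=lambda v: abs(v - target))

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# Hypothetical 0.1-degree grid covering 31-33°N, 35-37°E (not the real model grid)
grid_lats = [round(31.0 + 0.1 * i, 1) for i in range(21)]
grid_lons = [round(35.0 + 0.1 * i, 1) for i in range(21)]

# Station AL0059 coordinates from the `stations` dictionary
station_lat, station_lon = 31.97, 36.12

# A station outside the grid would still get a "nearest" cell, so check the bounds first
inside_domain = (grid_lats[0] <= station_lat <= grid_lats[-1]
                 and grid_lons[0] <= station_lon <= grid_lons[-1])

grid_lat = nearest_value(grid_lats, station_lat)
grid_lon = nearest_value(grid_lons, station_lon)
distance_km = haversine_km(station_lat, station_lon, grid_lat, grid_lon)

print(f"Inside grid domain: {inside_domain}")
print(f"Nearest grid point: {grid_lat}°N, {grid_lon}°E ({distance_km:.2f} km away)")
```

On a 0.1° grid the distance can never exceed roughly 8 km at this latitude, so a much larger value in the extraction log would point to a coarser grid or a station outside the model domain.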

## 💾 Save Extraction Results

In [None]:
# Create output directory
output_dir = "workshop_output/Part1_Station_Extraction"
os.makedirs(output_dir, exist_ok=True)

# Save extraction log
log_file = f"{output_dir}/extraction_log.xlsx"
extraction_summary.to_excel(log_file, index=False)
print(f"✅ Extraction log saved: extraction_log.xlsx")

# Save daily temperature data for each model-station combination
files_saved = 0
for model_name in extracted_data.keys():
    for station_id in extracted_data[model_name].keys():
        
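        # Create time index for the data
        # NOTE: assumes a gap-free standard calendar starting on 1990-01-01; a model with a
        # 365-day or 360-day calendar would need its own time coordinate from the NetCDF file
        time_index = pd.date_range('1990-01-01', periods=len(extracted_data[model_name][station_id]), freq='D')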
        
        # Create DataFrame
        df = pd.DataFrame({
            'Date': time_index,
            'Temperature_C': extracted_data[model_name][station_id],
            'Model': model_name,
            'Station': station_id,
            'Station_Name': stations[station_id]['name']
        })
        
        # Save individual file
        output_file = f"{output_dir}/{model_name}_{station_id}_daily_temps.csv"
        df.to_csv(output_file, index=False)
        files_saved += 1

print(f"✅ Daily temperature files saved: {files_saved} files")
print(f"📁 All results saved to: {output_dir}")
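
The date index above is rebuilt with `pd.date_range`, which only lines up with the model data if the file uses a standard calendar with no gaps. Some climate models use 365-day (no-leap) or 360-day calendars, so a quick length check can flag a mismatch before the CSV files are trusted. The sketch below is an illustration under stated assumptions: the helper names `expected_day_counts` and `guess_calendar` are hypothetical, and the series is assumed to cover whole years from 1 January of the first year to 31 December of the last.

```python
from datetime import date

def expected_day_counts(start_year, end_year):
    """Number of daily time steps each calendar type would have for whole years."""
    n_years = end_year - start_year + 1
    # Standard calendar: let the datetime module count leap days
    standard = (date(end_year + 1, 1, 1) - date(start_year, 1, 1)).days
    return {
        "standard": standard,
        "noleap": 365 * n_years,
        "360_day": 360 * n_years,
    }

def guess_calendar(n_days, start_year=1990, end_year=2014):
    """Return the calendar name whose expected length matches n_days, else 'unknown'."""
    for name, count in expected_day_counts(start_year, end_year).items():
        if n_days == count:
            return name
    return "unknown"

counts = expected_day_counts(1990, 2014)
for name, count in counts.items():
    print(f"{name:>8}: {count} days")
```

In the notebook, a call such as `guess_calendar(len(extracted_data[model_name][station_id]))` returning anything other than `'standard'` would suggest reading the dates from the dataset's own time coordinate instead.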

## 📊 Create Visualization of Results

In [None]:
# Create temperature summary for visualization
summary_data = []
for model_name in extracted_data.keys():
    for station_id in extracted_data[model_name].keys():
        temp_data = extracted_data[model_name][station_id]
        summary_data.append({
            'Model': model_name,
            'Station': station_id,
            'Station_Name': stations[station_id]['name'],
            'Mean_Temp': round(np.nanmean(temp_data), 2),
            'Min_Temp': round(np.nanmin(temp_data), 2),
            'Max_Temp': round(np.nanmax(temp_data), 2)
        })

summary_df = pd.DataFrame(summary_data)

# Model colors
model_colors = {
    'CMCC': '#1f77b4', 'CNRM': '#ff7f0e', 'EC-Earth3': '#2ca02c',
    'IPSL': '#d62728', 'MPI': '#9467bd', 'NorESM2': '#8c564b'
}

# Create visualization
fig, axes = plt.subplots(1, 3, figsize=(16, 6))

# 1. Distance to grid points
ax1 = axes[0]
extraction_summary.boxplot(column='Distance_km', by='Station', ax=ax1)
ax1.set_title('Distance to Nearest Grid Point by Station\n(Spatial Resolution Check)', 
              fontsize=11, fontweight='bold', pad=15)
ax1.set_ylabel('Distance (km)', fontsize=10)
ax1.set_xlabel('Station ID', fontsize=10)
ax1.grid(True, alpha=0.3)

# 2. Temperature ranges by model
ax2 = axes[1]
models = list(model_colors.keys())
model_temps = [summary_df[summary_df['Model'] == model]['Mean_Temp'].values for model in models]

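bp = ax2.boxplot(model_temps, patch_artist=True, widths=0.6)
# Set tick labels separately: boxplot's `labels` keyword is deprecated in Matplotlib 3.9+
ax2.set_xticklabels(models)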
for patch, model in zip(bp['boxes'], models):
    patch.set_facecolor(model_colors[model])
    patch.set_alpha(0.7)

ax2.set_title('Mean Daily Maximum Temperature by Model\n1990-2014 Average', 
              fontsize=11, fontweight='bold', pad=15)
ax2.set_ylabel('Temperature (°C)', fontsize=10)
ax2.set_xlabel('Climate Model', fontsize=10)
ax2.tick_params(axis='x', rotation=45, labelsize=9)
ax2.grid(True, alpha=0.3)

# 3. Temperature by station (with model colors)
ax3 = axes[2]
stations_order = ['AL0019', 'AL0035', 'AL0059']

for model in models:
    model_data = summary_df[summary_df['Model'] == model]
    temps = []
    x_positions = []
    
    for i, station_id in enumerate(stations_order):
        station_temp = model_data[model_data['Station'] == station_id]['Mean_Temp']
        if not station_temp.empty:
            temps.append(station_temp.values[0])
            x_positions.append(i)
    
    if temps:
        ax3.scatter(x_positions, temps, label=model, color=model_colors[model], 
                   alpha=0.8, s=80, edgecolors='black', linewidth=0.5)

ax3.set_title('Mean Daily Maximum Temperature by Station\n1990-2014 Average (All Models)', 
              fontsize=11, fontweight='bold', pad=15)
ax3.set_ylabel('Temperature (°C)', fontsize=10)
ax3.set_xlabel('Station Location', fontsize=10)
ax3.set_xticks(range(len(stations_order)))
ax3.set_xticklabels([f"{sid}\n({stations[sid]['name']})" for sid in stations_order], fontsize=8)
ax3.legend(bbox_to_anchor=(1.05, 1), loc='upper left', fontsize=9)
ax3.grid(True, alpha=0.3)

fig.suptitle('Climate Model Data Extraction Results\nPeriod: 1990-2014 (25 years, daily data)', 
             fontsize=13, fontweight='bold', y=0.95)

plt.tight_layout()
plt.subplots_adjust(top=0.82, right=0.85, bottom=0.15, wspace=0.35)
plt.show()

print("📊 Visualization complete!")

## 🎯 Part 1 Summary and Results

In [None]:
# Display final summary
print("🎯 PART 1 SUMMARY")
print("=" * 20)
print(f"✅ Models processed: {len(extracted_data)}")
print(f"✅ Stations extracted: {len(stations)}")
print(f"✅ Total extractions: {len(extraction_summary)}")

# Show data quality
avg_distance = extraction_summary['Distance_km'].mean()
avg_completeness = ((extraction_summary['Valid_Days'].sum() / extraction_summary['Days_Extracted'].sum()) * 100)

print(f"\n📊 Data Quality:")
print(f"  Average distance to grid: {avg_distance:.2f} km")
print(f"  Average data completeness: {avg_completeness:.1f}%")

# Show extraction summary table
print(f"\n📋 Extraction Summary:")
display(extraction_summary)

print(f"\n🎓 Key Learning Points:")
print(f"  • Climate models use grid points - we find the nearest one to each station")
print(f"  • Spatial resolution matters - closer grid points give better representation")
print(f"  • Each model simulates slightly different temperatures at the same location")
print(f"  • Data completeness for our study period: {avg_completeness:.1f}% (check this before trusting the results)")

print(f"\n➡️ Ready for Part 2: Monthly Climatology Calculation")
print(f"📁 Find your results in: workshop_output/Part1_Station_Extraction/")
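
A pooled completeness figure like the one above can hide a single series with many gaps, so it can help to look at each model-station pair separately as well. The sketch below shows the idea on made-up values using only the standard library; the helper name `completeness_percent`, the model and station labels, and the sample numbers are illustrative assumptions, not workshop results.

```python
import math

def completeness_percent(values):
    """Percentage of entries that are not NaN (0.0 for an empty series)."""
    if len(values) == 0:
        return 0.0
    valid = sum(1 for v in values if not math.isnan(v))
    return 100.0 * valid / len(values)

nan = float("nan")

# Made-up daily values for two hypothetical model-station series
sample_series = {
    ("ModelA", "ST01"): [30.1, 29.8, 31.2, 30.5],
    ("ModelB", "ST01"): [28.9, nan, nan, 29.4],
}

# Completeness of each series on its own
per_series = {key: completeness_percent(vals) for key, vals in sample_series.items()}

# Pooled completeness: all valid days divided by all days, as in the summary cell
all_values = [v for vals in sample_series.values() for v in vals]
pooled = completeness_percent(all_values)

for (model, station), pct in per_series.items():
    print(f"{model} / {station}: {pct:.1f}% complete")
print(f"Pooled: {pooled:.1f}% | Worst series: {min(per_series.values()):.1f}%")
```

The same per-series view is available in the notebook from the `Valid_Days` and `Days_Extracted` columns of `extraction_summary`.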

## 🚀 Next Steps

**Congratulations!** You've successfully completed Part 1 of the Climate Model Validation Workshop.

### What you accomplished:
- ✅ Downloaded 6 climate model NetCDF files
- ✅ Extracted daily temperature data at 3 station locations  
- ✅ Assessed spatial resolution and data quality
- ✅ Created visualizations showing model differences
- ✅ Saved organized results for further analysis

### Ready for Part 2:
**Part 2: Monthly Climatology Calculation**
- Calculate monthly averages from daily data
- Create seasonal temperature cycles
- Compare climatological patterns between models

---
📧 **Questions?** Contact the workshop instructor  
🔗 **Repository:** https://github.com/MoawiahHussien/climate-model-validation-workshop