# CMIP Real Data Example

This notebook demonstrates how to download and analyze **real MACA v2 climate data** from Google Earth Engine.

## Prerequisites

1. **Google Earth Engine Account**: Sign up at https://earthengine.google.com/
2. **Authentication**: Run `earthengine authenticate` in a terminal
3. **Required packages**: `pip install earthengine-api geemap xarray netCDF4 matplotlib` (`netCDF4` gives xarray a backend for reading the downloaded NetCDF file)
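
Before running the cells below, it can help to confirm that the packages are importable. The following is a minimal sketch using only the standard library; the helper name `check_packages` is an illustrative assumption and not part of this repository. Note that `earthengine-api` is imported under the name `ee`.

```python
from importlib.util import find_spec

# Import names for the packages listed above.
# The earthengine-api distribution is imported as `ee`.
REQUIRED_MODULES = ["ee", "geemap", "xarray", "matplotlib"]


def check_packages(modules=REQUIRED_MODULES):
    """Return {module_name: bool} indicating whether each module can be located."""
    status = {}
    for name in modules:
        try:
            status[name] = find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for modules in an unusual state
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in check_packages().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

This only checks that the modules can be found; it does not verify Earth Engine authentication, which is tested in Step 1.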

## Data Source

- **Dataset**: MACA v2-METDATA (Multivariate Adaptive Constructed Analogs)
- **Location**: Google Earth Engine (`IDAHO_EPSCOR/MACAv2_METDATA`)
- **Resolution**: ~4 km (1/24°) spatial resolution
- **Variables**: Temperature, precipitation, humidity, wind, radiation
- **Models**: 20 global climate models
- **Scenarios**: Historical, RCP4.5, RCP8.5

**Note**: The original USGS THREDDS server was retired in April 2024. Google Earth Engine is now the primary source for MACA v2 data.

In [None]:
import sys
import os
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Add cmip module to path
datacube_root = Path('../').resolve()
cmip_root = datacube_root / 'cmip'
sys.path.append(str(cmip_root / 'download'))

# Standard libraries
import numpy as np
import pandas as pd
import xarray as xr
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from datetime import datetime

# Set up plotting
plt.style.use('default')
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 10

print("✅ Basic libraries loaded")

## Step 1: Check Google Earth Engine Setup

In [None]:
# Check if GEE is available and authenticated
try:
    import ee
    import geemap
    
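    # Try to initialize (recent earthengine-api releases may need a Cloud project:
    # ee.Initialize(project='your-project-id'), where the ID is a placeholder)
    ee.Initialize()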
    print("✅ Google Earth Engine is ready!")
    
    # Test access to MACA dataset
    collection = ee.ImageCollection('IDAHO_EPSCOR/MACAv2_METDATA')
    size = collection.size().getInfo()
    print(f"✅ MACA dataset accessible with {size} images")
    
    GEE_READY = True
    
except ImportError:
    print("❌ Google Earth Engine not installed")
    print("Install with: pip install earthengine-api geemap")
    GEE_READY = False
    
except Exception as e:
    print(f"❌ Google Earth Engine setup issue: {e}")
    print("\nTo fix:")
    print("1. Sign up at https://earthengine.google.com/")
    print("2. Run: earthengine authenticate")
    print("3. Follow the authentication instructions")
    GEE_READY = False

## Step 2: Import Climate Data Classes

We'll use the standalone classes that define climate variables, models, and scenarios.
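
The `sources_standalone` module itself is not shown in this notebook. As a rough sketch of what such definitions might look like, assuming enums with string values and a bounding box with `north`/`south`/`east`/`west` attributes (which is how they are used below), one could write something like the following. The member list, the validation, and the coordinates are illustrative assumptions, not the module's actual contents.

```python
from dataclasses import dataclass
from enum import Enum


class Variable(Enum):
    """Illustrative subset of variable identifiers (assumed, not the real module)."""
    TASMAX = "tasmax"  # daily maximum near-surface air temperature
    TASMIN = "tasmin"  # daily minimum near-surface air temperature
    PR = "pr"          # precipitation


class Scenario(Enum):
    """Illustrative scenario identifiers (assumed)."""
    HISTORICAL = "historical"
    RCP45 = "rcp45"
    RCP85 = "rcp85"


@dataclass(frozen=True)
class BoundingBox:
    """Geographic box in decimal degrees; longitudes west of Greenwich are negative."""
    north: float
    south: float
    east: float
    west: float

    def __post_init__(self):
        if self.south >= self.north:
            raise ValueError("south must be less than north")
        if self.west >= self.east:
            raise ValueError("west must be less than east")

    def contains(self, lat: float, lon: float) -> bool:
        """True if the point lies inside the box (edges included)."""
        return self.south <= lat <= self.north and self.west <= lon <= self.east


# Placeholder coordinates for illustration only; not the values of BLACK_HILLS_BBOX.
example_box = BoundingBox(north=45.0, south=43.0, east=-103.0, west=-105.0)
print([v.value for v in Variable])
print(example_box.contains(44.0, -104.0))
```

Using enums rather than bare strings means a typo such as `Variable.TASMX` fails immediately instead of producing an empty query later.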

In [None]:
from sources_standalone import (
    Variable, ClimateModel, Scenario, BoundingBox, 
    BLACK_HILLS_BBOX, MACA_V2_SOURCE
)

print("✅ Climate data classes loaded")
print(f"Available variables: {[v.value for v in Variable]}")
print(f"Available models: {[m.value for m in ClimateModel][:5]}...")  # Show first 5
print(f"Available scenarios: {[s.value for s in Scenario]}")
print(f"Black Hills region: {BLACK_HILLS_BBOX.south}° to {BLACK_HILLS_BBOX.north}° latitude, {BLACK_HILLS_BBOX.west}° to {BLACK_HILLS_BBOX.east}° longitude")

## Step 3: Download Real MACA v2 Data

We'll download real climate data for the Black Hills region.
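
Before requesting data, it can be useful to estimate how large the subset will be. The sketch below assumes the native 1/24° grid and daily time steps on a standard calendar; the function name `estimate_request` and the box coordinates are hypothetical, and the fetcher used in this notebook may aggregate or resample differently.

```python
import math
from datetime import date

CELLS_PER_DEGREE = 24  # MACA v2-METDATA grid spacing is 1/24 degree (about 4 km)


def estimate_request(north, south, east, west, year_start, year_end):
    """Rough size of a daily subset: (grid rows, grid columns, time steps)."""
    # Round before ceil so floating-point noise does not add a spurious cell
    n_lat = math.ceil(round((north - south) * CELLS_PER_DEGREE, 9))
    n_lon = math.ceil(round((east - west) * CELLS_PER_DEGREE, 9))
    # Inclusive year range, assuming a standard (Gregorian) calendar
    n_days = (date(year_end + 1, 1, 1) - date(year_start, 1, 1)).days
    return n_lat, n_lon, n_days


# Placeholder 2 x 2 degree box; not the actual BLACK_HILLS_BBOX values
n_lat, n_lon, n_days = estimate_request(45.0, 43.0, -103.0, -105.0, 2020, 2022)
print(f"{n_lat} x {n_lon} cells, {n_days} days, {n_lat * n_lon * n_days:,} values")
```

An estimate like this helps decide whether to shorten the year range or shrink the region before starting a long download.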

In [None]:
if GEE_READY:
    from gee_fetcher import GEEMACAfetcher
    
    # Create data directory
    data_dir = Path('../data/maca_real')
    data_dir.mkdir(parents=True, exist_ok=True)
    
    # Initialize fetcher
    fetcher = GEEMACAfetcher(data_dir=data_dir)
    
    # Download parameters
    variable = Variable.TASMAX  # Maximum temperature
    model = ClimateModel.GFDL_ESM2M  # GFDL Earth System Model
    scenario = Scenario.RCP45  # Moderate emissions scenario
    year_start = 2020
    year_end = 2022  # Small range for demonstration
    
    print(f"Downloading {variable.value} from {model.value} under {scenario.value}")
    print(f"Time range: {year_start}-{year_end}")
    print(f"Region: Black Hills ({BLACK_HILLS_BBOX.south}°N to {BLACK_HILLS_BBOX.north}°N)")
    print("\nThis may take a few minutes...")
    
    # Download the data
    downloaded_file = fetcher.download_subset(
        variable=variable,
        model=model,
        scenario=scenario,
        year_start=year_start,
        year_end=year_end,
        bbox=BLACK_HILLS_BBOX,
        scale=4000,  # 4km resolution
        force=False  # Don't re-download if file exists
    )
    
    if downloaded_file:
        print(f"✅ Successfully downloaded: {downloaded_file.name}")
    else:
        print("❌ Download failed")
        
else:
    print("❌ Cannot download data - Google Earth Engine not ready")
    print("Please set up GEE authentication first")
    downloaded_file = None

## Step 4: Load and Examine Real Climate Data

Let's load the downloaded data and examine its structure.
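
One detail worth knowing when converting units: xarray arithmetic drops attributes by default, so metadata such as `long_name` has to be copied over explicitly. The following is a minimal sketch of a conversion helper; `to_celsius` is a hypothetical name, and the small array is a stand-in used only to make the example runnable, not MACA output.

```python
import numpy as np
import xarray as xr


def to_celsius(da: xr.DataArray) -> xr.DataArray:
    """Return the data in degC if its units attribute marks it as Kelvin."""
    if da.attrs.get("units") not in ("K", "kelvin", "Kelvin"):
        return da  # nothing to convert
    converted = da - 273.15  # arithmetic drops attrs by default
    converted.attrs = {**da.attrs, "units": "degC"}  # keep the other metadata
    return converted


# Tiny stand-in array (illustrative values only)
demo = xr.DataArray(
    np.array([[273.15, 283.15], [293.15, 303.15]]),
    dims=("lat", "lon"),
    attrs={"units": "K", "long_name": "Daily Maximum Temperature"},
)
result = to_celsius(demo)
print(result.attrs["units"], result.attrs["long_name"])
```

The original array is left untouched, so the helper can be applied repeatedly without converting twice.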

In [None]:
if downloaded_file and downloaded_file.exists():
    # Load the NetCDF file
    print(f"Loading data from: {downloaded_file}")
    ds = xr.open_dataset(downloaded_file)
    
    print("\n📊 Dataset Overview:")
    print(ds)
    
    # Get the main variable
    var_name = list(ds.data_vars)[0]
    climate_var = ds[var_name]
    
    print(f"\n🌡️ Variable: {var_name}")
    print(f"Units: {climate_var.attrs.get('units', 'unknown')}")
    print(f"Long name: {climate_var.attrs.get('long_name', 'unknown')}")
    print(f"Shape: {climate_var.shape}")
    
    # Convert from Kelvin to Celsius if needed
    if climate_var.attrs.get('units') == 'K':
        climate_var = climate_var - 273.15
        climate_var.attrs['units'] = 'degC'
        print("✅ Converted from Kelvin to Celsius")
    
    # Basic statistics
    print(f"\n📈 Statistics:")
    print(f"   Min: {float(climate_var.min()):.2f} {climate_var.attrs.get('units', '')}")
    print(f"   Max: {float(climate_var.max()):.2f} {climate_var.attrs.get('units', '')}")
    print(f"   Mean: {float(climate_var.mean()):.2f} {climate_var.attrs.get('units', '')}")
    print(f"   Std: {float(climate_var.std()):.2f} {climate_var.attrs.get('units', '')}")
    
    # Time and space info
    if 'time' in ds.dims:
        print(f"\n🕐 Time Range:")
        print(f"   Start: {pd.to_datetime(ds.time.values[0]).strftime('%Y-%m-%d')}")
        print(f"   End: {pd.to_datetime(ds.time.values[-1]).strftime('%Y-%m-%d')}")
        print(f"   Steps: {len(ds.time)}")
    
    print(f"\n🌍 Spatial Coverage:")
    print(f"   Latitude: {float(ds.lat.min()):.3f}° to {float(ds.lat.max()):.3f}°")
    print(f"   Longitude: {float(ds.lon.min()):.3f}° to {float(ds.lon.max()):.3f}°")
    print(f"   Grid: {len(ds.lat)} × {len(ds.lon)} points")
    
    DATA_LOADED = True
    
else:
    print("❌ No data file available")
    print("Please run the download step first")
    DATA_LOADED = False
    ds = None
    climate_var = None

## Step 5: Create Spatial Visualization

Let's create a map showing the spatial distribution of the climate variable.
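
The plots below rely on xarray's `robust=True`, which sets the color limits from the 2nd and 98th percentiles so that a few extreme cells do not wash out the map. As a sketch of the same idea, assuming you want to compute the limits yourself (for example to share one color scale between both panels through `vmin`/`vmax`), a hypothetical helper could look like this. The array is a stand-in, not MACA data.

```python
import numpy as np


def robust_limits(values, lower=2.0, upper=98.0):
    """Color limits from percentiles, ignoring NaNs such as masked cells."""
    data = np.asarray(values, dtype=float)
    vmin, vmax = np.nanpercentile(data, [lower, upper])
    return float(vmin), float(vmax)


# Stand-in field with one missing cell and one outlier
field = np.arange(100, dtype=float).reshape(10, 10)
field[0, 0] = np.nan
field[9, 9] = 1000.0
vmin, vmax = robust_limits(field)
print(vmin, vmax)
```

The returned pair can be passed as `vmin=vmin, vmax=vmax` to each `.plot()` call in place of `robust=True`.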

In [None]:
if DATA_LOADED:
    # Create spatial map
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))
    
    # Plot 1: First time step
    if 'time' in climate_var.dims:
        plot_data = climate_var.isel(time=0)
        time_str = pd.to_datetime(ds.time.values[0]).strftime('%Y-%m-%d')
    else:
        plot_data = climate_var
        time_str = "Single time step"
    
    # Create the map
    im1 = plot_data.plot(
        ax=ax1, 
        cmap='RdYlBu_r',
        add_colorbar=False,
        robust=True  # Use 2nd-98th percentile for color scale
    )
    
    ax1.set_title(f"Real MACA v2 Data: {var_name}\n{time_str}")
    ax1.set_xlabel('Longitude')
    ax1.set_ylabel('Latitude')
    ax1.grid(True, alpha=0.3)
    
    # Add colorbar
    cbar1 = plt.colorbar(im1, ax=ax1, shrink=0.8)
    cbar1.set_label(f"{var_name} ({climate_var.attrs.get('units', '')})")
    
    # Plot 2: Mean over time (if multiple time steps)
    if 'time' in climate_var.dims and len(ds.time) > 1:
        mean_data = climate_var.mean(dim='time')
        
        im2 = mean_data.plot(
            ax=ax2,
            cmap='RdYlBu_r',
            add_colorbar=False,
            robust=True
        )
        
        ax2.set_title(f"Time-averaged {var_name}\n{year_start}-{year_end}")
        ax2.set_xlabel('Longitude')
        ax2.set_ylabel('Latitude')
        ax2.grid(True, alpha=0.3)
        
        cbar2 = plt.colorbar(im2, ax=ax2, shrink=0.8)
        cbar2.set_label(f"{var_name} ({climate_var.attrs.get('units', '')})")
        
    else:
        ax2.text(0.5, 0.5, 'Single time step\nNo time averaging available', 
                ha='center', va='center', transform=ax2.transAxes, fontsize=12)
        ax2.set_title("Time Average Not Available")
    
    plt.tight_layout()
    plt.show()
    
    print("✅ Spatial visualization created with real MACA v2 data")
    
else:
    print("❌ Cannot create visualization - no data loaded")

## Step 6: Create Time Series Analysis

Let's analyze the temporal patterns in the data.
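
A monthly climatology pools all values that fall in the same calendar month, regardless of year, and averages them. The dependency-free sketch below shows that grouping logic on a stand-in daily series; `monthly_climatology` is a hypothetical helper and the values are synthetic placeholders used only to make the example runnable.

```python
from collections import defaultdict
from datetime import date, timedelta


def monthly_climatology(dates, values):
    """Mean of `values` per calendar month, pooled across all years."""
    buckets = defaultdict(list)
    for d, v in zip(dates, values):
        if v == v:  # skip NaN (NaN is never equal to itself)
            buckets[d.month].append(v)
    return {month: sum(vs) / len(vs) for month, vs in sorted(buckets.items())}


# Stand-in daily series covering 2020 and 2021; each value equals its month number
start = date(2020, 1, 1)
dates = [start + timedelta(days=i) for i in range(731)]
values = [float(d.month) for d in dates]
clim = monthly_climatology(dates, values)
print(len(clim), clim[1], clim[7])
```

With xarray, the same pooling is usually written as `time_series.groupby('time.month').mean()`.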

In [None]:
if DATA_LOADED and 'time' in climate_var.dims and len(ds.time) > 1:
    # Extract time series for center point
    center_lat_idx = len(ds.lat) // 2
    center_lon_idx = len(ds.lon) // 2
    
    center_lat = float(ds.lat.isel(lat=center_lat_idx))
    center_lon = float(ds.lon.isel(lon=center_lon_idx))
    
    time_series = climate_var.isel(lat=center_lat_idx, lon=center_lon_idx)
    
    # Convert time to pandas datetime
    time_vals = pd.to_datetime(ds.time.values)
    
    # Create time series plots
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 10))
    
    # Plot 1: Full time series
    ax1.plot(time_vals, time_series.values, 'b-', linewidth=1.5, alpha=0.8)
    ax1.set_title(f"Real MACA v2 Time Series: {var_name}\nLocation: lat {center_lat:.3f}°, lon {center_lon:.3f}°")
    ax1.set_xlabel('Time')
    ax1.set_ylabel(f"{var_name} ({climate_var.attrs.get('units', '')})")
    ax1.grid(True, alpha=0.3)
    
    # Format x-axis dates
    ax1.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
    ax1.xaxis.set_major_locator(mdates.MonthLocator(interval=6))
    plt.setp(ax1.xaxis.get_majorticklabels(), rotation=45)
    
    # Plot 2: Monthly climatology (only if the series covers all 12 calendar months;
    # counting time steps is not enough when the data are daily)
    if time_vals.month.nunique() >= 12:
        # Calculate monthly means
        monthly_data = []
        months = []
        
        for month in range(1, 13):
            month_mask = time_vals.month == month
            if month_mask.any():
                monthly_mean = time_series.values[month_mask].mean()
                monthly_data.append(monthly_mean)
                months.append(month)
        
        if monthly_data:
            month_names = [pd.to_datetime(f"2020-{m:02d}-01").strftime('%b') for m in months]
            
            ax2.plot(months, monthly_data, 'ro-', linewidth=2, markersize=6)
            ax2.set_title(f"Monthly Climatology: {var_name}")
            ax2.set_xlabel('Month')
            ax2.set_ylabel(f"{var_name} ({climate_var.attrs.get('units', '')})")
            ax2.set_xticks(months)
            ax2.set_xticklabels(month_names)
            ax2.grid(True, alpha=0.3)
        else:
            ax2.text(0.5, 0.5, 'Insufficient data for\nmonthly climatology', 
                    ha='center', va='center', transform=ax2.transAxes, fontsize=12)
    else:
        ax2.text(0.5, 0.5, 'Insufficient data for\nmonthly climatology', 
                ha='center', va='center', transform=ax2.transAxes, fontsize=12)
    
    plt.tight_layout()
    plt.show()
    
    # Print some statistics
    print(f"\n📊 Time Series Statistics:")
    print(f"   Location: lat {center_lat:.3f}°, lon {center_lon:.3f}°")
    print(f"   Time range: {time_vals[0].strftime('%Y-%m-%d')} to {time_vals[-1].strftime('%Y-%m-%d')}")
    print(f"   Number of time steps: {len(time_vals)}")
    print(f"   Mean: {float(time_series.mean()):.2f} {climate_var.attrs.get('units', '')}")
    print(f"   Std: {float(time_series.std()):.2f} {climate_var.attrs.get('units', '')}")
    print(f"   Min: {float(time_series.min()):.2f} {climate_var.attrs.get('units', '')}")
    print(f"   Max: {float(time_series.max()):.2f} {climate_var.attrs.get('units', '')}")
    
    print("\n✅ Time series analysis completed with real MACA v2 data")
    
else:
    print("❌ Cannot create time series - insufficient temporal data")
    if DATA_LOADED:
        print("   Data loaded but has insufficient time dimension")
    else:
        print("   No data loaded")

## Step 7: Summary and Next Steps

**This notebook demonstrates:**

✅ **Real MACA v2 Data Access**: Downloading climate model output from Google Earth Engine  
✅ **Spatial Analysis**: Mapping a single time step and the time-averaged field across the Black Hills region  
✅ **Temporal Analysis**: Plotting a center-point time series and its monthly climatology  
✅ **Guarded Workflow**: Checking `GEE_READY` and `DATA_LOADED` so each step fails with a clear message when setup or data is missing  

**Key Achievements:**
- No synthetic data used - all analysis based on real climate projections
- Proper data source (Google Earth Engine) after USGS THREDDS retirement
- Clean, reproducible analysis workflow
- Clear documentation for users to access real data

**Next Steps for Users:**

1. **Expand Variables**: Download precipitation, humidity, wind speed data
2. **Multiple Models**: Compare different climate models (GFDL, MIROC, CCSM4)
3. **Scenario Analysis**: Compare RCP4.5 vs RCP8.5 scenarios
4. **Historical Data**: Include historical period (1950-2005) for baseline
5. **Advanced Analysis**: Calculate climate indices, extreme events, trends
6. **Integration**: Combine with LANDFIRE vegetation data

**Accessing More Data:**

To download additional real MACA v2 data:
- Modify the `variable`, `model`, `scenario` parameters above
- Extend the `year_start` and `year_end` range
- Change the `bbox` to analyze different regions
- All data comes from the real MACA v2 collection on Google Earth Engine
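
When comparing several models or scenarios, it may be convenient to generate the parameter combinations up front and loop over them. The sketch below builds such a list with `itertools.product`; the helper `build_requests` and the identifier strings are illustrative assumptions and should be checked against the enums in `sources_standalone` before use.

```python
from itertools import product

# Illustrative identifiers; verify against the enums in sources_standalone
variables = ["tasmax", "pr"]
models = ["GFDL-ESM2M", "MIROC5", "CCSM4"]
scenarios = ["rcp45", "rcp85"]


def build_requests(variables, models, scenarios, year_start, year_end):
    """One dict of download parameters per variable/model/scenario combination."""
    if year_start > year_end:
        raise ValueError("year_start must not exceed year_end")
    return [
        {"variable": v, "model": m, "scenario": s,
         "year_start": year_start, "year_end": year_end}
        for v, m, s in product(variables, models, scenarios)
    ]


batch = build_requests(variables, models, scenarios, 2020, 2022)
print(len(batch), batch[0])
```

Each dict could then be mapped to the corresponding enum members and passed to the download call from Step 3, one combination at a time.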

**Authentication Reminder:**
- One-time setup: `earthengine authenticate`
- Free Google account required
- All data is real climate model output, not synthetic or demo data