# ⚠️ DEPRECATED: This notebook has been replaced

**This notebook is deprecated and will be removed.**

## 🆕 Use the New Real Data Notebook Instead

**For working with real MACA v2 climate data, use:**
- **`cmip_real_data_example.ipynb`** - Works with real Google Earth Engine data

## Why This Notebook is Deprecated

This notebook was created to demonstrate the CMIP processing pipeline, but:
- Its imports are broken, so the code cells below cannot run as-is
- It downloaded MACA v2 data from the USGS THREDDS server, which was retired in April 2024
- The code structure is overly complex for basic climate data analysis

## Migration Path

**Instead of this notebook, use `cmip_real_data_example.ipynb` which:**
- ✅ Downloads real MACA v2 data from Google Earth Engine
- ✅ Clean, simple imports that work
- ✅ No synthetic data generation
- ✅ Proper authentication setup
- ✅ Real climate analysis workflows

## Quick Start with Real Data

1. Open `cmip_real_data_example.ipynb`
2. Install required packages: `pip install earthengine-api geemap`
3. Authenticate with Google Earth Engine: `earthengine authenticate`
4. Run the notebook to download and analyze real MACA v2 data

**All analysis should be done with real climate data only.**

## Step 1: Download Real Climate Data

Let's download actual MACA v2 climate projections for the Black Hills region.

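```python
from pathlib import Path

import matplotlib.pyplot as plt

# MACAfetcher, download_black_hills_subset, Variable, ClimateModel, Scenario and
# the loader, validator, cleaner, plotter and datacube helpers used in later cells
# must already be imported from this project's CMIP modules (import paths not shown).

# Set up data directory; parents=True also creates ./data if it does not exist
data_dir = Path('./data/cmip_demo')
data_dir.mkdir(parents=True, exist_ok=True)

# Initialize fetcher
fetcher = MACAfetcher(data_dir=str(data_dir))

print("Downloading real climate data for Black Hills...")
print("This may take a few minutes depending on your internet connection.")

# Download a small subset for demonstration
downloaded_files = download_black_hills_subset(
    variables=[Variable.TASMAX, Variable.PR],  # Daily maximum temperature and precipitation
    models=[ClimateModel.GFDL_ESM2M],  # One model for faster demo
    scenarios=[Scenario.RCP45, Scenario.RCP85],  # Two scenarios
    year_start=2020,
    year_end=2025,  # Small time window for demo
    data_dir=str(data_dir)
)

print(f"Downloaded {len(downloaded_files)} files:")
for file in downloaded_files:
    print(f"  - {file.name}")
```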

## Step 2: Load and Validate Data

Now let's load the downloaded data with proper calendar handling and run quality control.
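`ClimateValidator`'s internals are not shown here. As a rough, hypothetical sketch of the kind of range check a validator like this typically runs, the function below counts missing and physically implausible values for one variable. The bounds and units are illustrative assumptions, not the thresholds the project uses.

```python
import numpy as np

# Hypothetical plausibility bounds; these are illustrative assumptions,
# not the thresholds ClimateValidator actually applies.
PLAUSIBLE_RANGES = {
    "tasmax": (200.0, 340.0),  # kelvin (assumed units)
    "pr": (0.0, 500.0),        # mm/day (assumed units)
}


def check_range(name, values):
    """Summarize missing and out-of-range values for one variable."""
    lo, hi = PLAUSIBLE_RANGES[name]
    arr = np.asarray(values, dtype=float).ravel()
    finite = arr[np.isfinite(arr)]  # drop NaN/inf before comparing
    return {
        "n_total": int(arr.size),
        "n_missing": int(arr.size - finite.size),
        "n_out_of_range": int(np.count_nonzero((finite < lo) | (finite > hi))),
        "min": float(finite.min()) if finite.size else float("nan"),
        "max": float(finite.max()) if finite.size else float("nan"),
    }
```

For example, `check_range("tasmax", [290.1, 301.4, float("nan"), 150.0])` reports one missing value and one out-of-range value (the 150 K entry).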

```python
# Initialize utilities
loader = CMIPLoader()
validator = ClimateValidator()

# Load the first dataset as an example
if downloaded_files:
    example_file = downloaded_files[0]
    print(f"Loading: {example_file.name}")

    # Load with proper calendar handling
    ds = loader.load_dataset(example_file)

    # Display dataset info
    print("\nDataset information:")
    print(ds)

    # Get time information
    time_info = loader.get_time_info(ds)
    print(f"\nTime info: {time_info}")

    # Run quality control
    print("\nRunning quality control...")
    validation_results = validator.validate_dataset(ds)

    for var_name, results in validation_results.items():
        print(f"\nVariable: {var_name}")
        print(f"  Quality: {results['overall_quality'].value}")
        print(f"  Statistics: {results['statistics']}")
        if results['issues']:
            print(f"  Issues: {results['issues']}")
else:
    print("No files were downloaded. Check your internet connection.")
```

## Step 3: Data Cleaning and Processing

Standardize units, remove statistical outliers, and compute a monthly climatology.
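How `standardize_units` and `remove_outliers(method='iqr', factor=3.0)` work internally is not shown. A minimal numpy sketch of what those two steps usually mean, assuming temperature arrives in kelvin and precipitation as a CF-style flux in kg m-2 s-1 (check each variable's `units` attribute first; the helper names are hypothetical):

```python
import numpy as np


def kelvin_to_celsius(t_k):
    """Convert temperature from kelvin to degrees Celsius."""
    return np.asarray(t_k, dtype=float) - 273.15


def flux_to_mm_per_day(pr_flux):
    """Convert precipitation flux (kg m-2 s-1) to mm/day.

    1 kg of water spread over 1 m^2 is a 1 mm layer, and a day has 86400 s.
    """
    return np.asarray(pr_flux, dtype=float) * 86400.0


def mask_iqr_outliers(values, factor=3.0):
    """Set values outside [Q1 - factor*IQR, Q3 + factor*IQR] to NaN."""
    arr = np.asarray(values, dtype=float)
    q1, q3 = np.nanpercentile(arr, [25, 75])  # ignores existing NaNs
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    # Comparisons with NaN are False, so existing NaNs stay NaN
    return np.where((arr < lower) | (arr > upper), np.nan, arr)
```

A factor of 3.0 corresponds to Tukey's "far out" fence, so only extreme values are masked. For a skewed variable like daily precipitation, even that fence can catch genuine storm days, so it is worth inspecting flagged values before discarding them.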

```python
if downloaded_files:
    # Initialize cleaner and resampler
    cleaner = ClimateCleaner()
    resampler = ClimateResampler()

    # Standardize units, then remove IQR-based outliers (factor=3.0)
    ds_clean = cleaner.standardize_units(ds)
    ds_clean = cleaner.remove_outliers(ds_clean, method='iqr', factor=3.0)

    print("Data cleaned!")

    # Compare units before and after cleaning for the first data variable
    # (tasmax or pr, depending on which file was loaded in Step 2)
    var_name = list(ds.data_vars)[0]

    print(f"\nUnits standardization for {var_name}:")
    print(f"  Original units: {ds[var_name].attrs.get('units', 'unknown')}")
    print(f"  Cleaned units: {ds_clean[var_name].attrs.get('units', 'unknown')}")

    # Calculate a monthly climatology
    monthly_clim = resampler.calculate_climatology(ds_clean, period='month')
    print(f"\nCalculated monthly climatology with shape: {monthly_clim[var_name].shape}")
```
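`calculate_climatology(ds_clean, period='month')` presumably averages all values that fall in each calendar month; with plain xarray the usual idiom is `ds.groupby("time.month").mean("time")`. The same idea in numpy, as a sketch with hypothetical names:

```python
import numpy as np


def monthly_climatology(months, values):
    """Mean of `values` for each calendar month 1-12; NaN where a month has no data."""
    months = np.asarray(months)
    values = np.asarray(values, dtype=float)
    clim = np.full(12, np.nan)
    for m in range(1, 13):
        sel = values[months == m]
        sel = sel[np.isfinite(sel)]  # skip missing values
        if sel.size:
            clim[m - 1] = sel.mean()
    return clim
```

The result always has 12 entries, which is why a monthly climatology's time dimension collapses to length 12 regardless of how many years were loaded.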

## Step 4: Create Visualizations

Plot a map of the first time step, then a time series and a seasonal cycle at the center of the Black Hills.
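Both point plots in the cell below start from the midpoint of `BLACK_HILLS_BBOX`, and `show_trend=True` presumably overlays a trend line. A small sketch of those pieces, assuming a nearest-grid-cell lookup and a linear least-squares trend (neither is confirmed by the plotter code; all names here are hypothetical):

```python
import numpy as np


def bbox_center(west, east, south, north):
    """(lat, lon) midpoint of a bounding box that does not cross the antimeridian."""
    return (south + north) / 2.0, (west + east) / 2.0


def nearest_index(coord_values, target):
    """Index of the grid coordinate closest to `target`."""
    coords = np.asarray(coord_values, dtype=float)
    return int(np.argmin(np.abs(coords - target)))


def linear_trend_per_decade(years, values):
    """Least-squares linear slope of `values` over time, in units per decade."""
    slope, _intercept = np.polyfit(np.asarray(years, dtype=float),
                                   np.asarray(values, dtype=float), deg=1)
    return slope * 10.0
```

With placeholder bounds (not the project's constant), `bbox_center(-104.5, -103.0, 43.5, 44.5)` returns `(44.0, -103.75)`. If a dataset stores longitudes as 0 to 360 degrees east, convert the target with `lon % 360` before the lookup.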

```python
if downloaded_files:
    # Initialize plotters
    map_plotter = ClimateMapPlotter()
    ts_plotter = ClimateTimeSeriesPlotter()

    # Create a map for the first time step
    fig = map_plotter.plot_variable(
        ds_clean,
        var_name,
        time_index=0,
        title=f"{var_name} - First Time Step",
        extent=[BLACK_HILLS_BBOX.west - 1, BLACK_HILLS_BBOX.east + 1,
                BLACK_HILLS_BBOX.south - 1, BLACK_HILLS_BBOX.north + 1]
    )
    plt.show()

    # Create a time series for a point in the Black Hills
    center_lat = (BLACK_HILLS_BBOX.north + BLACK_HILLS_BBOX.south) / 2
    center_lon = (BLACK_HILLS_BBOX.east + BLACK_HILLS_BBOX.west) / 2

    fig = ts_plotter.plot_location_timeseries(
        ds_clean,
        var_name,
        lat=center_lat,
        lon=center_lon,
        show_trend=True,
        title=f"{var_name} Time Series - Black Hills Center"
    )
    plt.show()

    # Plot seasonal cycle
    fig = ts_plotter.plot_seasonal_cycle(
        ds_clean,
        var_name,
        lat=center_lat,
        lon=center_lon,
        title=f"{var_name} Seasonal Cycle - Black Hills"
    )
    plt.show()
```

## Step 5: Build Unified Datacube

Combine all downloaded datasets into a unified datacube.
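Assuming `spatial_resolution=0.05` in the cell below is given in degrees, the cell size in kilometres depends on latitude. This sketch (hypothetical helpers, spherical Earth with R = 6371 km) builds a regular 0.05° axis of cell centers and estimates the cell size near the Black Hills:

```python
import math

import numpy as np

# Length of one degree of arc on a spherical Earth (R = 6371 km), about 111.2 km
KM_PER_DEG = 2 * math.pi * 6371.0 / 360.0


def regular_axis(start, stop, step):
    """Cell-center coordinates tiling [start, stop] at a fixed step in degrees."""
    n = int(math.ceil(round((stop - start) / step, 9)))  # round() guards float noise
    return start + step * (np.arange(n) + 0.5)


def cell_size_km(resolution_deg, lat_deg):
    """Approximate (north-south, east-west) extent of a grid cell in km."""
    ns = resolution_deg * KM_PER_DEG
    ew = ns * math.cos(math.radians(lat_deg))  # meridians converge toward the poles
    return ns, ew
```

At 44°N a 0.05° cell is about 5.6 km north-south but only about 4 km east-west. Whether `build_black_hills_datacube` places coordinates at cell centers is not shown; the sketch assumes it does.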

```python
if len(downloaded_files) > 1:  # Only if we have multiple files
    print("Building unified datacube from all downloaded files...")

    # Use the convenience function to build a datacube
    output_path = data_dir / 'black_hills_datacube.nc'

    try:
        datacube = build_black_hills_datacube(
            data_dir=data_dir,
            output_path=output_path,
            scenarios=['rcp45', 'rcp85'],
            models=['GFDL-ESM2M'],
            variables=['tasmax', 'pr'],
            time_range=('2020-01-01', '2025-12-31'),
            spatial_resolution=0.05  # 0.05 degrees: ~5.6 km N-S, ~4 km E-W at Black Hills latitudes
        )

        print("\nDatacube built successfully!")
        # Dataset.sizes maps each dimension name to its length
        print(f"Shape: {dict(datacube.sizes)}")
        print(f"Variables: {list(datacube.data_vars)}")
        print(f"Saved to: {output_path}")

        # Show a quick summary
        print("\nDatacube summary:")
        print(datacube)

    except Exception as e:
        print(f"Error building datacube: {e}")
        print("This might happen if not all required files were downloaded.")

else:
    print("Need multiple files to demonstrate datacube building.")
    print("Try running the download cell again or check your internet connection.")
```

## Step 6: Compare Scenarios (if multiple scenarios downloaded)

Compare RCP4.5 and RCP8.5 at the Black Hills center point and as a difference map.
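The cell below compares `rcp45_files[0]` with `rcp85_files[0]`, which only lines up if both lists happen to start with the same variable. One way to pair files explicitly is sketched here; it assumes, hypothetically, that file names contain the variable and scenario as underscore-separated tokens:

```python
from pathlib import Path


def pair_by_variable(files, variables=("tasmax", "pr"),
                     scenarios=("rcp45", "rcp85")):
    """Group paths as {variable: {scenario: path}}, keeping only complete pairs.

    Hypothetical naming assumption: e.g. 'macav2_tasmax_GFDL-ESM2M_rcp45_2020_2025.nc'.
    """
    grouped = {}
    for f in files:
        tokens = Path(f).stem.split("_")  # match whole tokens, not substrings
        var = next((v for v in variables if v in tokens), None)
        scen = next((s for s in scenarios if s in tokens), None)
        if var and scen:
            grouped.setdefault(var, {})[scen] = Path(f)
    # Drop variables that are missing under any scenario
    return {v: s for v, s in grouped.items() if len(s) == len(scenarios)}
```

Matching whole tokens keeps `'pr'` from matching inside unrelated words. Only variables present under both scenarios are returned, so `pairs['tasmax']['rcp45']` and `pairs['tasmax']['rcp85']` refer to the same variable before they are loaded and differenced.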

```python
# Check if we have files from multiple scenarios
rcp45_files = [f for f in downloaded_files if 'rcp45' in f.name]
rcp85_files = [f for f in downloaded_files if 'rcp85' in f.name]

if rcp45_files and rcp85_files:
    print("Comparing RCP4.5 and RCP8.5 scenarios...")

    # Load datasets for comparison
    ds_rcp45 = loader.load_dataset(rcp45_files[0])
    ds_rcp85 = loader.load_dataset(rcp85_files[0])

    # Clean both datasets
    ds_rcp45_clean = cleaner.standardize_units(ds_rcp45)
    ds_rcp85_clean = cleaner.standardize_units(ds_rcp85)

    # Create comparison datasets
    datasets = {
        'RCP4.5': ds_rcp45_clean,
        'RCP8.5': ds_rcp85_clean
    }

    # Plot scenario comparison
    center_lat = (BLACK_HILLS_BBOX.north + BLACK_HILLS_BBOX.south) / 2
    center_lon = (BLACK_HILLS_BBOX.east + BLACK_HILLS_BBOX.west) / 2

    var_name = list(ds_rcp45_clean.data_vars)[0]
    # The first file of each scenario may hold different variables; fail early if so
    if var_name not in ds_rcp85_clean.data_vars:
        raise ValueError(f"{rcp85_files[0].name} does not contain '{var_name}'")

    fig = ts_plotter.plot_scenario_comparison(
        datasets,
        var_name,
        lat=center_lat,
        lon=center_lon,
        title=f"{var_name} Scenario Comparison - Black Hills"
    )
    plt.show()

    # Plot difference map
    fig = map_plotter.plot_difference(
        ds_rcp45_clean,
        ds_rcp85_clean,
        var_name,
        time_index=0,
        title=f"{var_name} Difference (RCP8.5 - RCP4.5)",
        extent=[BLACK_HILLS_BBOX.west - 1, BLACK_HILLS_BBOX.east + 1,
                BLACK_HILLS_BBOX.south - 1, BLACK_HILLS_BBOX.north + 1]
    )
    plt.show()

else:
    print("Need files from multiple scenarios for comparison.")
    print(f"Found RCP4.5 files: {len(rcp45_files)}")
    print(f"Found RCP8.5 files: {len(rcp85_files)}")
```

## Summary

This demo showed the complete CMIP data processing pipeline:

1. **Real Data Download**: Downloaded actual MACA v2 climate projections from the USGS THREDDS server (retired in April 2024)
2. **Robust Loading**: Handled non-standard calendars and coordinate systems
3. **Quality Control**: Validated data ranges and identified potential issues
4. **Data Cleaning**: Standardized units and removed statistical outliers
5. **Visualization**: Created maps and time series plots
6. **Datacube Creation**: Built unified datacubes for analysis
7. **Scenario Comparison**: Compared different climate scenarios

All processing used **real climate model data** - no synthetic data was used!

### Key Improvements Over Previous Implementation:

- ✅ **Proper Calendar Handling**: Uses cftime for non-standard calendars
- ✅ **Real Data Integration**: Downloaded from the USGS THREDDS server (retired in April 2024, so this download path no longer works)
- ✅ **Robust Error Handling**: Graceful fallbacks and clear error messages
- ✅ **Quality Control**: Comprehensive data validation
- ✅ **Modular Design**: Clean separation of concerns
- ✅ **Memory Efficient**: Chunked processing for large datasets
- ✅ **Standards Compliant**: CF-conventions and proper metadata
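Calendar handling matters because some CMIP models use a 365-day (`noleap`) calendar, in which a `days since` offset decodes to a different date than it would in the Gregorian calendar. A stdlib-only sketch of that decoding, for illustration only (in practice cftime or xarray does this; the function names are hypothetical):

```python
import datetime as dt

# Month lengths in a 365-day ("noleap") year: February always has 28 days
_MONTH_LENGTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def decode_noleap(days_since, ref_year=1900):
    """Decode 'days since <ref_year>-01-01' in a 365-day calendar to (year, month, day)."""
    days = int(days_since)
    year = ref_year + days // 365
    doy = days % 365  # 0-based day of year
    month = 1
    for length in _MONTH_LENGTHS:
        if doy < length:
            break
        doy -= length
        month += 1
    return year, month, doy + 1


def decode_gregorian(days_since, ref_year=1900):
    """Same offset interpreted with Python's (proleptic Gregorian) datetime, for comparison."""
    d = dt.date(ref_year, 1, 1) + dt.timedelta(days=int(days_since))
    return d.year, d.month, d.day
```

The two agree until the first Gregorian leap day: `decode_noleap(1825)` gives `(1905, 1, 1)`, while `decode_gregorian(1825)` gives `(1904, 12, 31)` because 1904 was a leap year.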

### Next Steps:

- Integrate with LANDFIRE vegetation data
- Add more climate variables and models
- Implement advanced analysis functions
- Create interactive dashboards