# Drought Characteristics Analysis Using Run Theory

This notebook demonstrates how to analyze drought events using run theory (Yevjevich, 1967). We'll explore **three modes** of drought analysis:

1. **Event-Based Analysis** - Identify and characterize complete drought events
2. **Time-Series Monitoring** - Track drought evolution month-by-month
3. **Period Statistics** - Gridded statistics for decision-making

**Learning Objectives:**
1. Understand run theory and drought event identification
2. Calculate drought characteristics (duration, magnitude, intensity, peak)
3. Use three analysis modes for different purposes
4. Understand dual magnitude (cumulative vs instantaneous)
5. Answer decision-maker questions with spatial statistics
6. Visualize drought events and evolution

## Run Theory Framework

Run theory provides a systematic approach to analyzing climate extremes:

![Run Theory](../images/runtheory.png)

**Key Concepts:**
- **Events** are identified when SPI/SPEI crosses a threshold
- **Duration (D)**: How long the event lasts
- **Magnitude (M)**: Total accumulated deficit/surplus
- **Intensity (I)**: Average severity = M/D
- **Inter-arrival Time (T)**: Time between events
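
The concepts above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the `runtheory` module's implementation: the helper names `find_runs` and `characterize` are hypothetical, and magnitude is assumed here to be the deficit accumulated below the threshold (the package may define it differently, for example relative to zero).

```python
def find_runs(values, threshold):
    """Return (start, end) index pairs for runs where values < threshold.

    `end` is exclusive, so a run covers values[start:end].
    """
    runs = []
    start = None
    for i, v in enumerate(values):
        if v < threshold:
            if start is None:
                start = i               # a new run begins
        elif start is not None:
            runs.append((start, i))     # the run ended at the previous step
            start = None
    if start is not None:               # series ends while still below threshold
        runs.append((start, len(values)))
    return runs


def characterize(values, threshold, run):
    """Compute duration, magnitude, intensity and peak for one run."""
    start, end = run
    segment = values[start:end]
    duration = len(segment)
    # Assumption: magnitude is the accumulated deficit below the threshold.
    magnitude = sum(threshold - v for v in segment)
    return {
        "start": start,
        "duration": duration,
        "magnitude": magnitude,
        "intensity": magnitude / duration,   # I = M / D
        "peak": min(segment),                # most negative value in the run
    }


spi_demo = [0.3, -1.5, -2.0, -1.3, 0.1, 0.4, -1.4, -1.6, 0.2]
runs = find_runs(spi_demo, threshold=-1.2)
events_demo = [characterize(spi_demo, -1.2, r) for r in runs]
for e in events_demo:
    print(e)
```

The synthetic series contains two runs below -1.2: one of three steps and one of two steps.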

**This notebook works for BOTH extremes:**
- 🌵 **Drought (dry)**: Use negative thresholds (e.g., -1.2)
- 🌊 **Flooding (wet)**: Use positive thresholds (e.g., +1.2)

The analysis code is identical; only the threshold sign changes!
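
How the package switches between dry and wet tests internally is not shown in this notebook. One plausible rule, sketched below with a hypothetical helper `is_extreme`, is to let the sign of the threshold choose the direction of the comparison.

```python
def is_extreme(value, threshold):
    """True when `value` lies beyond `threshold` on the threshold's own side.

    Negative threshold -> dry test (value < threshold).
    Non-negative threshold -> wet test (value > threshold).
    This sign convention is an assumption made for illustration.
    """
    if threshold < 0:
        return value < threshold
    return value > threshold


spi_demo = [0.5, -1.6, 1.4, -0.8, 2.1]
dry_flags = [is_extreme(v, -1.2) for v in spi_demo]
wet_flags = [is_extreme(v, +1.2) for v in spi_demo]
print(dry_flags)
print(wet_flags)
```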

For a detailed explanation, see [docs/user-guide/runtheory.md](../docs/user-guide/runtheory.md).


## 1. Setup and Imports

In [2]:
# Add src directory to Python path
import sys
sys.path.insert(0, '../src')

# Core libraries
import numpy as np
import xarray as xr
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
import os

# Import drought analysis functions
from runtheory import (
    identify_events,
    calculate_timeseries,
    calculate_period_statistics,
    calculate_annual_statistics,
    compare_periods,
    summarize_events
)

# Import visualization functions
from visualization import (
    plot_index,
    plot_events,
    plot_event_timeline,
    plot_spatial_stats,
    generate_location_filename
)

# Plotting settings
plt.rcParams['figure.figsize'] = (14, 6)
plt.rcParams['font.size'] = 10

# Create output directories (including netcdf/, which later cells write to)
os.makedirs('../output/csv', exist_ok=True)
os.makedirs('../output/netcdf', exist_ok=True)
os.makedirs('../output/plots/single', exist_ok=True)
os.makedirs('../output/plots/spatial', exist_ok=True)

print("✓ All imports successful!")
print("✓ Output directories created")


## 2. Load SPI Data

We'll use SPI-12 calculated in notebook 01. If you haven't run that notebook yet, you'll need to calculate SPI first.

In [None]:
# Load SPI-12 from previous notebook
# Adjust path if your SPI file is elsewhere
spi_file = '../output/netcdf/spi_12.nc'

if os.path.exists(spi_file):
    spi = xr.open_dataset(spi_file)['spi_gamma_12_month']
    print("✓ SPI-12 loaded successfully!")
else:
    print("❌ SPI file not found. Please run notebook 01_calculate_spi.ipynb first.")
    print(f"   Expected location: {spi_file}")
    # No fallback data is generated here; stop so later cells do not fail on an undefined `spi`.
    # (To use another data source, add your data loading code here instead of raising.)
    raise FileNotFoundError("Please calculate SPI first using notebook 01")

print(f"  Shape: {spi.shape}")
print(f"  Dimensions: {spi.dims}")
print(f"  Time range: {spi.time[0].values} to {spi.time[-1].values}")
print(f"  Spatial extent: {len(spi.lat)} x {len(spi.lon)} grid")

## 3. Select Sample Location

For event-based and time-series analysis, we'll extract a single location.
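
The cell below takes the middle of the grid. To pick the cell closest to a specific coordinate instead, a nearest-index lookup is enough. The sketch uses a hypothetical helper `nearest_index` on made-up coordinate lists.

```python
def nearest_index(coords, target):
    """Index of the coordinate value closest to `target`."""
    return min(range(len(coords)), key=lambda i: abs(coords[i] - target))


# Illustrative coordinates only; replace with spi.lat.values / spi.lon.values
lats_demo = [-10.0, -7.5, -5.0, -2.5, 0.0]
lons_demo = [100.0, 102.5, 105.0, 107.5]

lat_i = nearest_index(lats_demo, -6.2)
lon_i = nearest_index(lons_demo, 106.8)
print(lat_i, lon_i, lats_demo[lat_i], lons_demo[lon_i])
```

With xarray, the same selection can be written as `spi.sel(lat=-6.2, lon=106.8, method='nearest')`.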

In [None]:
# Select a sample location (adjust indices for your data)
lat_idx = len(spi.lat) // 2  # Middle of grid
lon_idx = len(spi.lon) // 2

# Extract location
spi_loc = spi.isel(lat=lat_idx, lon=lon_idx)
lat_val = float(spi.lat.values[lat_idx])
lon_val = float(spi.lon.values[lon_idx])

print(f"Selected location: {lat_val:.2f}°N, {lon_val:.2f}°E")
print(f"SPI time series length: {len(spi_loc)} months")
print(f"Mean SPI: {spi_loc.mean().values:.3f}")
print(f"Std SPI: {spi_loc.std().values:.3f}")

# Quick visualization
fig, ax = plt.subplots(figsize=(14, 5))
spi_loc.plot(ax=ax, linewidth=0.8, color='steelblue')
ax.axhline(y=0, color='k', linestyle='-', linewidth=0.8, alpha=0.3)
ax.axhline(y=-1.2, color='red', linestyle='--', linewidth=0.8, alpha=0.5, label='Threshold -1.2')
ax.fill_between(spi_loc.time, -5, 0, alpha=0.1, color='red', label='Dry')
ax.fill_between(spi_loc.time, 0, 5, alpha=0.1, color='blue', label='Wet')
ax.set_ylim(-3, 3)
ax.set_title(f'SPI-12 at {lat_val:.2f}°N, {lon_val:.2f}°E')
ax.set_ylabel('SPI-12')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

## 4. Mode 1: Event-Based Analysis

Identify complete drought events using run theory. Each event is characterized by:
- **Duration** (D): Length in months
- **Magnitude** (M): Cumulative deficit
- **Intensity** (I): M / D (average severity)
- **Peak** (P): Minimum SPI value
- **Inter-arrival**: Time between events
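
The minimum-duration filter and the inter-arrival time can be illustrated as follows. This is a sketch with a hypothetical helper `filter_and_space`. It assumes inter-arrival time is measured from the start of one retained event to the start of the next, which may differ from the convention used by `identify_events`.

```python
def filter_and_space(runs, min_duration):
    """Keep runs lasting at least `min_duration` steps and add inter-arrival times.

    `runs` is a list of (start, end) index pairs with `end` exclusive.
    Assumption: inter-arrival is start-to-start between retained events.
    """
    kept = [(s, e) for s, e in runs if e - s >= min_duration]
    result = []
    for k, (s, e) in enumerate(kept):
        interarrival = s - kept[k - 1][0] if k > 0 else None
        result.append({"start": s, "duration": e - s, "interarrival": interarrival})
    return result


# Four candidate runs (in months); the one-month run is dropped
runs_demo = [(2, 7), (10, 11), (15, 19), (30, 33)]
filtered_demo = filter_and_space(runs_demo, min_duration=3)
for row in filtered_demo:
    print(row)
```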

In [None]:
# Identify drought events
threshold = -1.2  # Moderate drought threshold
min_duration = 3  # Minimum 3 months to be considered an event

print(f"Identifying drought events with:")
print(f"  Threshold: {threshold}")
print(f"  Minimum duration: {min_duration} months")
print()

events = identify_events(spi_loc, threshold=threshold, min_duration=min_duration)

print(f"✓ Found {len(events)} drought events")
print()
print("Event Summary:")
print(events.head(10))

### Event Statistics

In [None]:
# Summarize events
summary = summarize_events(events)

print("Drought Event Statistics:")
print("=" * 50)
print(f"Total events: {summary['num_events']}")
print(f"Mean duration: {summary['mean_duration']:.1f} months")
print(f"Max duration: {summary['max_duration']} months")
print(f"Mean magnitude: {summary['mean_magnitude']:.2f}")
print(f"Max magnitude: {summary['max_magnitude']:.2f}")
print(f"Most severe peak: {summary['most_severe_peak']:.2f}")
print(f"Mean inter-arrival: {summary['mean_interarrival']:.1f} months")

# Basic statistics
print("\nEvent Characteristics Distribution:")
print(events[['duration', 'magnitude', 'intensity', 'peak']].describe())

### Visualize Events

In [None]:
# Plot events timeline
fig = plot_events(spi_loc, events, threshold=threshold,
                  title=f'Drought Events at {lat_val:.2f}°N, {lon_val:.2f}°E')

# Save
filename = generate_location_filename('drought_events', lat_val, lon_val, 'png')
plt.savefig(f'../output/plots/single/{filename}', dpi=300, bbox_inches='tight')
print(f"✓ Saved plot: {filename}")

plt.show()

### Save Events to CSV

In [None]:
# Save events DataFrame
csv_filename = generate_location_filename('drought_events', lat_val, lon_val, 'csv')
events.to_csv(f'../output/csv/{csv_filename}', index=False)
print(f"✓ Events saved to: ../output/csv/{csv_filename}")
print(f"  Total events: {len(events)}")
print(f"  Columns: {list(events.columns)}")

## 5. Mode 2: Time-Series Monitoring

Calculate month-by-month drought characteristics for real-time monitoring. Because the values are updated at every time step, this mode tracks how severity varies within an event, which is useful for operational systems.
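
A minimal sketch of the per-month bookkeeping is shown below. The helper `track_state` is hypothetical and applies no minimum-duration filter; the field names mirror columns used later in this notebook (`is_event`, `event_id`, `duration`), but the real `calculate_timeseries` output may be computed differently.

```python
def track_state(values, threshold):
    """Label each time step with is_event, event_id and running duration."""
    rows = []
    event_id = 0
    duration = 0
    for v in values:
        if v < threshold:
            if duration == 0:
                event_id += 1          # first step of a new event
            duration += 1
            rows.append({"is_event": True, "event_id": event_id, "duration": duration})
        else:
            duration = 0               # any ongoing event has ended
            rows.append({"is_event": False, "event_id": None, "duration": 0})
    return rows


spi_demo = [0.2, -1.3, -1.8, -0.5, -1.4, -1.5, -1.9]
state = track_state(spi_demo, threshold=-1.2)
print(state[-1])   # status for the latest month
```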

In [None]:
# Calculate drought time series
print("Calculating drought time series...")
ts = calculate_timeseries(spi_loc, threshold=threshold)

print("✓ Time series calculated")
print(f"  Length: {len(ts)} months")
print(f"  Columns: {list(ts.columns)}")
print()
print("Sample data:")
print(ts.head(10))

### Current Drought Status

In [None]:
# Check current status (last month in data)
current = ts.iloc[-1]

print("Current Drought Status:")
print("=" * 50)
print(f"Date: {current['time']}")
print(f"SPI-12: {current['index_value']:.2f}")

if current['is_event']:
    print("\n🔴 IN DROUGHT")
    print(f"  Event ID: {current['event_id']}")
    print(f"  Duration: {current['duration']} months")
    print(f"  Cumulative magnitude: {current['magnitude_cumulative']:.2f}")
    print(f"  Current severity: {current['magnitude_instantaneous']:.2f}")
    print(f"  Intensity: {current['intensity']:.2f}")
else:
    print("\n🟢 NOT IN DROUGHT")
    print("  Normal conditions")

### Drought Evolution Analysis

Check whether the recent drought is worsening or easing, based on the trend in instantaneous magnitude.
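
The classification logic can be sketched independently of pandas. The helper `classify_trend` is hypothetical and assumes that larger instantaneous magnitude means a more severe drought. It also handles one edge case explicitly: for a constant series, pandas reports both `is_monotonic_decreasing` and `is_monotonic_increasing` as `True`, so a "steady" check comes first here.

```python
def classify_trend(severity):
    """Classify a short sequence of severity values (larger = more severe)."""
    if len(severity) < 2:
        return "insufficient data"
    diffs = [b - a for a, b in zip(severity, severity[1:])]
    if all(d == 0 for d in diffs):
        return "steady"
    if all(d <= 0 for d in diffs):
        return "easing"
    if all(d >= 0 for d in diffs):
        return "worsening"
    return "fluctuating"


for sample in ([0.9, 0.6, 0.2], [0.1, 0.4, 0.8], [0.5, 0.9, 0.3], [0.4, 0.4]):
    print(sample, "->", classify_trend(sample))
```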

In [None]:
# Analyze recent trend (last 3 months)
recent = ts.tail(3)

if recent['is_event'].any():
    recent_inst = recent[recent['is_event']]['magnitude_instantaneous']
    
    if len(recent_inst) >= 2:
        if recent_inst.is_monotonic_decreasing:
            print("📉 DROUGHT EASING - Instantaneous severity decreasing")
        elif recent_inst.is_monotonic_increasing:
            print("📈 DROUGHT WORSENING - Instantaneous severity increasing")
        else:
            print("➡️ DROUGHT FLUCTUATING - Variable severity")
        
        print(f"\nRecent instantaneous magnitude:")
        for idx, row in recent[recent['is_event']].iterrows():
            print(f"  {row['time']}: {row['magnitude_instantaneous']:.2f}")
else:
    print("No active drought in recent months")

### Visualize Drought Timeline (5-Panel Plot)

This shows:
1. Index value (SPI-12)
2. Duration
3. Magnitude - Cumulative (blue, always increasing)
4. Magnitude - Instantaneous (red, NDVI-like pattern)
5. Intensity

In [None]:
# Plot 5-panel drought evolution
fig = plot_event_timeline(ts, title=f'Drought Evolution at {lat_val:.2f}°N, {lon_val:.2f}°E')

# Save
filename = generate_location_filename('drought_timeline', lat_val, lon_val, 'png')
plt.savefig(f'../output/plots/single/{filename}', dpi=300, bbox_inches='tight')
print(f"✓ Saved timeline plot: {filename}")

plt.show()

### Magnitude Comparison: Cumulative vs Instantaneous

**Key Differences:**
- **Cumulative (blue)**: Total deficit, monotonically increases during drought, like debt accumulation
- **Instantaneous (red)**: Current severity, varies with SPI pattern, like NDVI crop phenology
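
A small worked example makes the contrast concrete. The sketch assumes the instantaneous value is the deficit below the threshold at each step and the cumulative value is its running sum; `dual_magnitude` is a hypothetical helper, and `docs/user-guide/magnitude.md` remains the reference for the package's actual definitions.

```python
def dual_magnitude(values, threshold):
    """Return (instantaneous, cumulative) deficit lists for one event.

    Assumption: instantaneous = threshold - value at each step,
    cumulative = running sum of the instantaneous values.
    """
    instantaneous = [threshold - v for v in values]
    cumulative = []
    total = 0.0
    for d in instantaneous:
        total += d
        cumulative.append(total)
    return instantaneous, cumulative


# One synthetic five-month event that deepens and then recovers
event_spi = [-1.4, -1.9, -2.3, -1.7, -1.3]
inst, cum = dual_magnitude(event_spi, threshold=-1.2)
print("instantaneous:", [round(x, 2) for x in inst])
print("cumulative:   ", [round(x, 2) for x in cum])
```

The instantaneous series rises to a maximum in the third month and then falls, while the cumulative series never decreases.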

In [None]:
# Compare both magnitude types
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 8), sharex=True)

# Cumulative magnitude (blue)
ts[ts['is_event']].plot(x='time', y='magnitude_cumulative', ax=ax1, 
                          color='steelblue', linewidth=2, label='Cumulative')
ax1.fill_between(ts[ts['is_event']]['time'], 0, 
                 ts[ts['is_event']]['magnitude_cumulative'],
                 alpha=0.3, color='blue')
ax1.set_ylabel('Cumulative Magnitude', fontsize=11)
ax1.set_title('Cumulative Magnitude (Total Deficit - Always Increasing)', fontsize=12)
ax1.legend()
ax1.grid(True, alpha=0.3)

# Instantaneous magnitude (red)
ts[ts['is_event']].plot(x='time', y='magnitude_instantaneous', ax=ax2,
                          color='darkred', linewidth=2, label='Instantaneous')
ax2.fill_between(ts[ts['is_event']]['time'], 0,
                 ts[ts['is_event']]['magnitude_instantaneous'],
                 alpha=0.3, color='red')
ax2.set_ylabel('Instantaneous Magnitude', fontsize=11)
ax2.set_title('Instantaneous Magnitude (Current Severity - NDVI-like Pattern)', fontsize=12)
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.suptitle('Dual Magnitude Comparison', fontsize=14, y=0.995)
plt.tight_layout()

# Save
filename = generate_location_filename('magnitude_comparison', lat_val, lon_val, 'png')
plt.savefig(f'../output/plots/single/{filename}', dpi=300, bbox_inches='tight')
print(f"✓ Saved magnitude comparison: {filename}")

plt.show()

print("\nNote: See docs/user-guide/magnitude.md for detailed explanation")

### Save Time Series to CSV

In [None]:
# Save timeseries DataFrame
csv_filename = generate_location_filename('drought_timeseries', lat_val, lon_val, 'csv')
ts.to_csv(f'../output/csv/{csv_filename}', index=False)
print(f"✓ Time series saved to: ../output/csv/{csv_filename}")
print(f"  Length: {len(ts)} months")

## 6. Mode 3: Period Statistics (Gridded)

Calculate spatial drought statistics for specific time periods. This answers questions like:
- "How many events occurred in 2023?"
- "Where was the worst drought in the last 5 years?"
- "How does recent compare to historical?"
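
Conceptually, each grid cell's series is cut to the requested years and summarized on its own. The sketch below does this for a two-cell toy grid with a hypothetical helper `period_stats`. It assumes events are clipped to the window, so an event that began before `start_year` is counted only by its months inside the period; `calculate_period_statistics` may treat boundaries differently.

```python
def period_stats(series, years, threshold, start_year, end_year, min_duration):
    """Summarize one grid cell's series inside [start_year, end_year]."""
    window = [v for v, y in zip(series, years) if start_year <= y <= end_year]
    num_events = 0
    months_in_drought = 0
    worst_peak = None
    run = []
    for v in window + [float("inf")]:      # sentinel closes a trailing run
        if v < threshold:
            run.append(v)
            continue
        if len(run) >= min_duration:       # run qualifies as an event
            num_events += 1
            months_in_drought += len(run)
            worst_peak = min(run) if worst_peak is None else min(worst_peak, min(run))
        run = []
    pct = 100.0 * months_in_drought / len(window) if window else 0.0
    return {"num_events": num_events, "worst_peak": worst_peak,
            "pct_time_in_drought": pct}


# Synthetic data: 6 time steps per year for brevity, two grid cells
years_demo = [2022] * 6 + [2023] * 6
grid_demo = {
    (0, 0): [0.1, -1.3, -1.5, -1.4, 0.2, 0.3, -1.6, -2.1, -1.8, -1.3, 0.0, 0.4],
    (0, 1): [0.5, 0.2, -0.4, 0.1, 0.3, 0.6, 0.2, -1.3, 0.1, 0.4, 0.7, 0.9],
}
stats_demo = {
    cell: period_stats(s, years_demo, -1.2, 2023, 2023, min_duration=3)
    for cell, s in grid_demo.items()
}
for cell, result in stats_demo.items():
    print(cell, result)
```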

### Question 1: "What happened in 2023?"

In [None]:
# Calculate statistics for 2023
print("Calculating drought statistics for 2023...")
stats_2023 = calculate_period_statistics(spi, threshold=threshold,
                                         start_year=2023, end_year=2023,
                                         min_duration=min_duration)

print("✓ Statistics calculated")
print(f"  Variables: {list(stats_2023.data_vars)}")
print(f"  Dimensions: {stats_2023.dims}")
print()
print("Regional averages for 2023:")
for var in stats_2023.data_vars:
    mean_val = float(stats_2023[var].mean().values)
    print(f"  {var}: {mean_val:.2f}")

### Map 1: Number of Events in 2023

In [None]:
# Plot number of events
fig = plot_spatial_stats(stats_2023, variable='num_events',
                         title='Number of Drought Events in 2023',
                         cmap='YlOrRd')

plt.savefig('../output/plots/spatial/num_events_2023.png', dpi=300, bbox_inches='tight')
print("✓ Saved: num_events_2023.png")

plt.show()

### Map 2: Worst Severity in 2023

In [None]:
# Plot worst peak
fig = plot_spatial_stats(stats_2023, variable='worst_peak',
                         title='Worst Drought Severity in 2023',
                         cmap='RdYlBu_r')

plt.savefig('../output/plots/spatial/worst_peak_2023.png', dpi=300, bbox_inches='tight')
print("✓ Saved: worst_peak_2023.png")

plt.show()

### Save 2023 Statistics

In [None]:
# Save to NetCDF
stats_2023.to_netcdf('../output/netcdf/drought_stats_2023.nc')
print("✓ Statistics saved to: ../output/netcdf/drought_stats_2023.nc")

### Question 2: "Where was the worst drought in the last 5 years?"

In [None]:
# Calculate statistics for 2020-2024
print("Calculating drought statistics for 2020-2024...")
stats_5yr = calculate_period_statistics(spi, threshold=threshold,
                                        start_year=2020, end_year=2024,
                                        min_duration=min_duration)

print("✓ 5-year statistics calculated")

# Plot worst peak over 5 years
fig = plot_spatial_stats(stats_5yr, variable='worst_peak',
                         title='Worst Drought Severity (2020-2024)',
                         cmap='RdYlBu_r')

plt.savefig('../output/plots/spatial/worst_peak_2020-2024.png', dpi=300, bbox_inches='tight')
print("✓ Saved: worst_peak_2020-2024.png")

plt.show()

# Save
stats_5yr.to_netcdf('../output/netcdf/drought_stats_2020-2024.nc')
print("✓ Saved: drought_stats_2020-2024.nc")

### Multi-Variable Summary

In [None]:
# Create 2x2 panel of key statistics
fig, axes = plt.subplots(2, 2, figsize=(16, 12))

variables = ['num_events', 'worst_peak', 'total_magnitude', 'pct_time_in_drought']
titles = ['Event Count', 'Worst Severity', 'Total Magnitude', '% Time in Drought']
cmaps = ['YlOrRd', 'RdYlBu_r', 'YlOrRd', 'Reds']

for ax, var, title, cmap in zip(axes.flat, variables, titles, cmaps):
    stats_5yr[var].plot(ax=ax, cmap=cmap, cbar_kwargs={'label': title})
    ax.set_title(title, fontsize=12)
    ax.set_xlabel('Longitude')
    ax.set_ylabel('Latitude')

plt.suptitle('Drought Statistics Summary (2020-2024)', fontsize=14, y=0.995)
plt.tight_layout()

plt.savefig('../output/plots/spatial/summary_2020-2024.png', dpi=300, bbox_inches='tight')
print("✓ Saved: summary_2020-2024.png")

plt.show()

## 7. Annual Statistics

Calculate drought statistics for each year to analyze trends over time.
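
The idea of grouping by year can be shown on a single synthetic series. This sketch is a deliberate simplification: it counts below-threshold time steps per year rather than events, because events can straddle year boundaries and this notebook does not show how `calculate_annual_statistics` assigns them. The helper `annual_summary` is hypothetical.

```python
from collections import defaultdict


def annual_summary(series, years, threshold):
    """Per-year count of below-threshold steps and the lowest value reached."""
    by_year = defaultdict(list)
    for v, y in zip(series, years):
        by_year[y].append(v)
    summary = {}
    for y in sorted(by_year):
        vals = by_year[y]
        summary[y] = {
            "months_below": sum(1 for v in vals if v < threshold),
            "lowest_value": min(vals),
        }
    return summary


# Synthetic data: 3 time steps per year for brevity
years_demo = [2021] * 3 + [2022] * 3 + [2023] * 3
series_demo = [0.2, -1.3, -0.4, -1.5, -2.2, -1.9, 0.6, 0.1, -0.2]
annual_demo = annual_summary(series_demo, years_demo, threshold=-1.2)

# Year with the most negative value
worst_year_demo = min(annual_demo, key=lambda y: annual_demo[y]["lowest_value"])
print(annual_demo)
print("Worst year:", worst_year_demo)
```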

In [None]:
# Calculate annual statistics
print("Calculating annual drought statistics...")
print("⚠️ This may take a few minutes for large grids...")

annual = calculate_annual_statistics(spi, threshold=threshold, min_duration=min_duration)

print("✓ Annual statistics calculated")
print(f"  Years: {len(annual.year)}")
print(f"  Variables: {list(annual.data_vars)}")

# Save
annual.to_netcdf('../output/netcdf/drought_stats_annual.nc')
print("✓ Saved: drought_stats_annual.nc")

### Time Series of Regional Averages

In [None]:
# Plot annual time series of regional average
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 8))

# Number of events
regional_events = annual.num_events.mean(dim=['lat', 'lon'])
regional_events.plot(ax=ax1, marker='o', linewidth=2, color='darkred')
ax1.axhline(y=regional_events.mean(), color='k', linestyle='--', 
            linewidth=1, alpha=0.5, label='Long-term mean')
ax1.set_ylabel('Average Events per Grid Cell')
ax1.set_title('Annual Drought Event Frequency (Regional Average)')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Worst peak
regional_peak = annual.worst_peak.mean(dim=['lat', 'lon'])
regional_peak.plot(ax=ax2, marker='o', linewidth=2, color='navy')
ax2.axhline(y=regional_peak.mean(), color='k', linestyle='--',
            linewidth=1, alpha=0.5, label='Long-term mean')
ax2.set_ylabel('Average Worst Peak')
ax2.set_title('Annual Worst Drought Severity (Regional Average)')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('../output/plots/spatial/annual_trends.png', dpi=300, bbox_inches='tight')
print("✓ Saved: annual_trends.png")

plt.show()

# Identify worst years
worst_year_events = int(annual.year[regional_events.argmax()].values)
worst_year_severity = int(annual.year[regional_peak.argmin()].values)

print(f"\nWorst years:")
print(f"  Most events: {worst_year_events} ({regional_events.max().values:.2f} avg events)")
print(f"  Worst severity: {worst_year_severity} (SPI: {regional_peak.min().values:.2f})")

## 8. Period Comparison

Compare drought characteristics between historical baseline and recent period.
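
When the periods being compared differ in length, as with a 30-year baseline and a 4-year recent window, raw event counts are not directly comparable. One simple remedy is to express counts as a rate. The helper `events_per_decade` and the event counts below are hypothetical, chosen only to illustrate the arithmetic.

```python
def events_per_decade(num_events, start_year, end_year):
    """Convert an event count into a rate per 10 years (inclusive year range)."""
    n_years = end_year - start_year + 1
    return 10.0 * num_events / n_years


# Hypothetical counts for one grid cell
hist_rate = events_per_decade(12, 1991, 2020)    # 12 events in 30 years
recent_rate = events_per_decade(2, 2021, 2024)   # 2 events in 4 years
print(f"Historical: {hist_rate:.1f} events/decade")
print(f"Recent:     {recent_rate:.1f} events/decade")
print(f"Change:     {recent_rate - hist_rate:+.1f} events/decade")
```

In this example the raw count falls from 12 to 2, yet the rate rises from 4 to 5 events per decade, so the two views can point in opposite directions.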

In [None]:
# Compare historical (1991-2020) vs recent (2021-2024)
print("Comparing historical vs recent periods...")

comparison = compare_periods(
    spi,
    periods=[(1991, 2020), (2021, 2024)],
    period_names=['Historical (1991-2020)', 'Recent (2021-2024)'],
    threshold=threshold,
    min_duration=min_duration
)

print("✓ Period comparison complete")
print(f"  Periods: {list(comparison.period.values)}")
print(f"  Variables: {list(comparison.data_vars)}")

### Side-by-Side Comparison

In [None]:
# Plot both periods
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))

# Historical
comparison.sel(period='Historical (1991-2020)').num_events.plot(
    ax=ax1, cmap='YlOrRd', vmin=0, vmax=15,
    cbar_kwargs={'label': 'Events'}
)
ax1.set_title('Historical Period (1991-2020)', fontsize=12)

# Recent
comparison.sel(period='Recent (2021-2024)').num_events.plot(
    ax=ax2, cmap='YlOrRd', vmin=0, vmax=15,
    cbar_kwargs={'label': 'Events'}
)
ax2.set_title('Recent Period (2021-2024)', fontsize=12)

plt.suptitle('Drought Event Comparison', fontsize=14, y=0.98)
plt.tight_layout()

plt.savefig('../output/plots/spatial/period_comparison.png', dpi=300, bbox_inches='tight')
print("✓ Saved: period_comparison.png")

plt.show()

### Change Map (Recent - Historical)

In [None]:
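# Calculate difference
# Note: the periods differ in length (30 vs 4 years), so differences in raw counts
# such as num_events partly reflect period length, not only a change in drought behavior
diff = comparison.sel(period='Recent (2021-2024)') - comparison.sel(period='Historical (1991-2020)')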

# Plot change in events
fig, ax = plt.subplots(figsize=(12, 6))

diff.num_events.plot(ax=ax, cmap='RdBu_r', center=0,
                     cbar_kwargs={'label': 'Change in Events'})
ax.set_title('Change in Drought Events (Recent - Historical)', fontsize=13)

plt.tight_layout()
plt.savefig('../output/plots/spatial/events_change.png', dpi=300, bbox_inches='tight')
print("✓ Saved: events_change.png")

plt.show()

# Summary statistics
print("\nChange Summary (Recent - Historical):")
print("=" * 50)
for var in ['num_events', 'total_magnitude', 'worst_peak']:
    change = float(diff[var].mean().values)
    print(f"{var}: {change:+.2f} (average change)")

## 9. Summary and Best Practices

### Three Modes Recap

| Mode | Function | Use Case | Output |
|------|----------|----------|--------|
| **Event-Based** | `identify_events()` | Historical analysis | DataFrame of events |
| **Time-Series** | `calculate_timeseries()` | Real-time monitoring | DataFrame by month |
| **Period Stats** | `calculate_period_statistics()` | Decision support | Gridded statistics |

### Magnitude Types

- **Cumulative**: Total deficit, always increasing, use for event comparison
- **Instantaneous**: Current severity, NDVI-like pattern, use for monitoring evolution

See: `docs/user-guide/magnitude.md` for detailed explanation

### Best Practices

1. **Threshold**: Use -1.0 or -1.2 for operational monitoring
2. **Min Duration**: 3 months for SPI-12 (captures sustained events)
3. **Period Statistics**: Answer specific questions ("What happened in 2023?")
4. **Annual Analysis**: Identify trends and worst years
5. **Comparison**: Use fixed historical baseline for climate analysis

### Output Organization

```
output/
├── csv/
│   ├── drought_events_lat*.##_lon*.##.csv
│   └── drought_timeseries_lat*.##_lon*.##.csv
├── netcdf/
│   ├── drought_stats_2023.nc
│   ├── drought_stats_2020-2024.nc
│   └── drought_stats_annual.nc
└── plots/
    ├── single/  # Location-specific (lat/lon in filename)
    └── spatial/ # Maps
```

### Next Steps

- See `04_visualization_gallery.ipynb` for more plotting options
- Read `docs/user-guide/runtheory.md` for detailed methodology
- Apply to your own SPI/SPEI datasets

In [None]:
# Final summary
print("\n" + "="*60)
print("NOTEBOOK COMPLETE")
print("="*60)
print(f"\n✓ Analyzed location: {lat_val:.2f}°N, {lon_val:.2f}°E")
print(f"✓ Found {len(events)} drought events")
print(f"✓ Created time series with {len(ts)} months")
print("✓ Calculated period statistics")
print("\n✓ All outputs saved to ../output/")
print("\nSee output directories for:")
print("  - CSV files in output/csv/")
print("  - NetCDF files in output/netcdf/")
print("  - Plots in output/plots/single/ and output/plots/spatial/")