# Day 4, Session 2: LSTM Drought Monitoring Lab
## Hands-on Implementation for Mindanao Agricultural Regions

---

## 🎯 Learning Objectives

By the end of this lab, you will be able to:

1. **Acquire and preprocess** multi-year Sentinel-2 NDVI time series for a study area
2. **Create training sequences** using a sliding-window approach for time series forecasting
3. **Build LSTM models** using TensorFlow/Keras with an appropriate architecture
4. **Train and validate** models with proper temporal data splitting
5. **Evaluate forecast accuracy** using RMSE, MAE, and visual diagnostics
6. **Interpret predictions** in the context of drought monitoring
7. **Deploy** models for operational early warning systems

---

## 📋 Session Overview

- **Duration**: 2.5 hours (150 minutes)
- **Format**: Hands-on Lab
- **Difficulty**: Intermediate to Advanced
- **Application**: Drought forecasting for Bukidnon and South Cotabato, Mindanao

---

## 🌾 Case Study: Mindanao Drought Monitoring

### Why Mindanao?

**Bukidnon and South Cotabato** are critical agricultural provinces producing:
- Corn (major crop)
- Rice
- Coffee, Pineapple, Vegetables

**Climate Challenges:**
- Pronounced dry season (November-April)
- Strong El Niño impacts
- 2015-2016 El Niño: Severe drought affecting 2.5 million people

**Objective:** Predict drought conditions **1-3 months ahead**, allowing time to:
- Adjust planting calendars
- Activate irrigation systems
- Distribute drought-resistant seeds
- Pre-position crop insurance programs

## Part 1: Setup and Data Loading (20 minutes)

In [None]:
# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Deep Learning
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau

# Machine Learning
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# Configure matplotlib
plt.rcParams['figure.figsize'] = (14, 6)
sns.set_style('whitegrid')

print(f"TensorFlow version: {tf.__version__}")
print(f"GPU Available: {tf.config.list_physical_devices('GPU')}")
print("Setup complete! 🚀")

### Generate Synthetic Mindanao NDVI Data

For this lab, we'll use synthetic but realistic data. In production, you would load actual Sentinel-2 NDVI from Google Earth Engine or pre-processed files.
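
For the pre-processed-file route, the snippet below is a minimal sketch of how per-acquisition NDVI values might be aggregated to the monthly frequency used in this lab. The CSV layout (`date`, `ndvi` columns) and the values are assumptions made for illustration, not a real export format.

```python
import io
import pandas as pd

# Hypothetical pre-processed export: one row per Sentinel-2 acquisition.
# An empty ndvi field stands for a cloud-masked scene.
csv_text = """date,ndvi
2020-01-05,0.62
2020-01-15,0.64
2020-01-25,
2020-02-04,0.58
2020-02-14,0.60
2020-04-09,0.55
"""

obs = pd.read_csv(io.StringIO(csv_text), parse_dates=['date'])

# Average all valid acquisitions within each calendar month ('MS' = month start).
monthly = obs.set_index('date')['ndvi'].resample('MS').mean()

# Months without any valid observation come back as NaN;
# fill short gaps (here at most 2 months) by interpolating in time.
monthly_filled = monthly.interpolate(method='time', limit=2)

print(monthly_filled)
```

In this toy input March has no acquisitions, so its value is interpolated between February and April. Whether interpolation is acceptable for longer gaps depends on the application.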

In [None]:
def generate_mindanao_drought_data(start_date='2015-01-01', end_date='2021-12-31'):
    """
    Generate synthetic NDVI time series for Mindanao with realistic drought patterns.
    """
    # Create monthly date range
    dates = pd.date_range(start=start_date, end=end_date, freq='MS')
    n_months = len(dates)
    
    # Initialize arrays
    ndvi = np.zeros(n_months)
    rainfall = np.zeros(n_months)
    temperature = np.zeros(n_months)
    oni = np.zeros(n_months)  # El Niño index
    
    base_ndvi = 0.70
    
    for i, date in enumerate(dates):
        month = date.month
        year = date.year
        
        # Seasonal patterns (simplified for this synthetic series: Nov-Mar is modeled as the wetter, greener period)
        if month in [11, 12, 1, 2, 3]:  # Wet season
            seasonal_factor = 0.85 + 0.10 * np.sin(2 * np.pi * month / 12)
            rain_base = 250
            temp_base = 26
        else:  # Dry season
            seasonal_factor = 0.75 + 0.05 * np.sin(2 * np.pi * month / 12)
            rain_base = 80
            temp_base = 28
        
        # El Niño drought events (2015-2016)
        drought_factor = 1.0
        if year == 2015 and month >= 6:
            drought_factor = 0.60
            oni[i] = 2.5 + np.random.randn() * 0.3  # Strong El Niño
            rain_base *= 0.4
            temp_base += 2
        elif year == 2016 and month <= 6:
            drought_factor = 0.65
            oni[i] = 2.0 + np.random.randn() * 0.3
            rain_base *= 0.5
            temp_base += 1.5
        else:
            oni[i] = np.random.randn() * 0.5  # Normal conditions
        
        # Calculate values
        ndvi[i] = base_ndvi * seasonal_factor * drought_factor + np.random.normal(0, 0.03)
        ndvi[i] = np.clip(ndvi[i], 0.2, 0.9)
        
        rainfall[i] = max(0, rain_base + np.random.normal(0, 40))
        temperature[i] = temp_base + np.random.normal(0, 1.5)
    
    # Create DataFrame
    df = pd.DataFrame({
        'date': dates,
        'ndvi': ndvi,
        'rainfall': rainfall,
        'temperature': temperature,
        'oni': oni
    })
    
    return df

# Generate data
df = generate_mindanao_drought_data()
print(f"Generated {len(df)} months of data")
print(f"Date range: {df['date'].min()} to {df['date'].max()}")
print(f"\nFirst 5 rows:")
print(df.head())
print(f"\nBasic statistics:")
print(df[['ndvi', 'rainfall', 'temperature', 'oni']].describe())

## Part 2: Exploratory Data Analysis (25 minutes)

In [None]:
# Visualize NDVI time series
fig, axes = plt.subplots(3, 1, figsize=(15, 10), sharex=True)

# Plot 1: NDVI
axes[0].plot(df['date'], df['ndvi'], 'g-', linewidth=2, label='NDVI')
axes[0].axhline(y=df['ndvi'].mean(), color='gray', linestyle='--', alpha=0.5, label='Mean')
# Highlight 2015-2016 drought
axes[0].axvspan(pd.Timestamp('2015-06-01'), pd.Timestamp('2016-06-01'), 
                alpha=0.2, color='red', label='2015-16 El Niño Drought')
axes[0].set_ylabel('NDVI', fontsize=12)
axes[0].set_title('Mindanao NDVI Time Series (2015-2021)', fontsize=14, fontweight='bold')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Plot 2: Rainfall
axes[1].bar(df['date'], df['rainfall'], color='blue', alpha=0.6, width=20)
axes[1].set_ylabel('Rainfall (mm)', fontsize=12)
axes[1].set_title('Monthly Rainfall', fontsize=12)
axes[1].grid(True, alpha=0.3)

# Plot 3: ONI (El Niño Index)
colors = ['red' if x > 0.5 else 'blue' if x < -0.5 else 'gray' for x in df['oni']]
axes[2].bar(df['date'], df['oni'], color=colors, alpha=0.7, width=20)
axes[2].axhline(y=0.5, color='red', linestyle='--', alpha=0.5, label='El Niño threshold')
axes[2].axhline(y=-0.5, color='blue', linestyle='--', alpha=0.5, label='La Niña threshold')
axes[2].set_ylabel('ONI Index', fontsize=12)
axes[2].set_xlabel('Date', fontsize=12)
axes[2].set_title('Oceanic Niño Index (ONI)', fontsize=12)
axes[2].legend()
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

### 🎯 Exercise 1: Data Exploration

**Tasks:**
1. Calculate the mean NDVI for the dry-season months (Apr-Oct) vs. the wet-season months (Nov-Mar), using the same month groupings as the data generator
2. Identify the month with the lowest NDVI value
3. Calculate the correlation between NDVI and rainfall

In [None]:
# TODO: Complete Exercise 1

# Task 1: Seasonal NDVI means
df['month'] = df['date'].dt.month
# dry_season_ndvi = ...
# wet_season_ndvi = ...

# Task 2: Lowest NDVI month
# lowest_ndvi_idx = ...

# Task 3: Correlation
# correlation = ...
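
If you get stuck, the pattern below is one possible approach, shown on a small made-up frame rather than on `df`. The toy values are invented for illustration, and `wet_months` simply mirrors the month list in the generator's wet-season branch.

```python
import pandas as pd

# Toy stand-in for `df` (same column names, invented values).
toy = pd.DataFrame({
    'date': pd.date_range('2020-01-01', periods=12, freq='MS'),
    'ndvi': [0.66, 0.65, 0.64, 0.55, 0.52, 0.50, 0.48, 0.47, 0.49, 0.51, 0.60, 0.63],
    'rainfall': [260, 240, 250, 90, 80, 70, 60, 65, 75, 85, 230, 255],
})
toy['month'] = toy['date'].dt.month

# Task 1: seasonal means via a boolean mask on the month number.
wet_months = [11, 12, 1, 2, 3]
is_wet = toy['month'].isin(wet_months)
wet_season_ndvi = toy.loc[is_wet, 'ndvi'].mean()
dry_season_ndvi = toy.loc[~is_wet, 'ndvi'].mean()

# Task 2: locate the row with the lowest NDVI, then read its date.
lowest_ndvi_idx = toy['ndvi'].idxmin()
lowest_month = toy.loc[lowest_ndvi_idx, 'date']

# Task 3: Pearson correlation between NDVI and rainfall.
correlation = toy['ndvi'].corr(toy['rainfall'])

print(f"Wet-season mean NDVI: {wet_season_ndvi:.3f}")
print(f"Dry-season mean NDVI: {dry_season_ndvi:.3f}")
print(f"Lowest NDVI month:    {lowest_month:%Y-%m}")
print(f"NDVI-rainfall corr:   {correlation:.3f}")
```

The same three steps should carry over to `df` unchanged, since it uses the same column names.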

## Part 3: Sequence Creation (30 minutes)

### Create Sliding Window Sequences for LSTM

In [None]:
# Hyperparameters
LOOKBACK_WINDOW = 12  # Use 12 months of history
FORECAST_HORIZON = 1   # Predict 1 month ahead

# Features to use
feature_columns = ['ndvi', 'rainfall', 'temperature', 'oni']
target_column = 'ndvi'

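# Normalize data
# Fit the scaler on the training period only (through 2019, matching the temporal
# split used later) so that validation and test statistics do not leak into training.
scaler = MinMaxScaler(feature_range=(0, 1))
train_rows = df['date'] <= pd.Timestamp('2019-12-31')
scaler.fit(df.loc[train_rows, feature_columns])
df_scaled = df.copy()
df_scaled[feature_columns] = scaler.transform(df[feature_columns])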

print(f"Lookback window: {LOOKBACK_WINDOW} months")
print(f"Forecast horizon: {FORECAST_HORIZON} month(s)")
print(f"Features: {feature_columns}")

In [None]:
def create_sequences(data, features, target, lookback, horizon):
    """
    Create input-output sequences for LSTM training.
    
    Args:
        data: DataFrame with features
        features: List of feature column names
        target: Target column name
        lookback: Number of time steps to look back
        horizon: Number of time steps to forecast ahead
    
    Returns:
        X: Input sequences (samples, lookback, n_features)
        y: Target values (samples,)
        dates: Corresponding dates for sequences
    """
    X, y, dates = [], [], []
    
    feature_data = data[features].values
    target_data = data[target].values
    date_data = data['date'].values
    
    for i in range(lookback, len(data) - horizon + 1):
        # Input sequence: [i-lookback : i]
        X.append(feature_data[i - lookback:i])
        
        # Target value: i + horizon - 1
        y.append(target_data[i + horizon - 1])
        
        # Date of prediction
        dates.append(date_data[i + horizon - 1])
    
    return np.array(X), np.array(y), np.array(dates)

# Create sequences
X, y, dates = create_sequences(
    df_scaled,
    feature_columns,
    target_column,
    LOOKBACK_WINDOW,
    FORECAST_HORIZON
)

print(f"X shape: {X.shape} (samples, time_steps, features)")
print(f"y shape: {y.shape} (samples,)")
print(f"Total sequences: {len(X)}")
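
To see exactly which values end up in each window, it can help to run the same indexing on a tiny series. The sketch below is a single-feature toy version of the loop above; `make_windows` is a hypothetical helper name used only here.

```python
import numpy as np

def make_windows(series, lookback, horizon):
    """Toy single-feature version of the sliding-window indexing."""
    X, y = [], []
    for i in range(lookback, len(series) - horizon + 1):
        X.append(series[i - lookback:i])   # the `lookback` values before index i
        y.append(series[i + horizon - 1])  # the value `horizon` steps after the window
    return np.array(X), np.array(y)

series = np.arange(10)  # values 0..9, so each value equals its index

# One step ahead: the window [0, 1, 2] is paired with target 3, and so on.
X_toy, y_toy = make_windows(series, lookback=3, horizon=1)
print(X_toy[0], '->', y_toy[0])
print(X_toy[-1], '->', y_toy[-1])
print(X_toy.shape, y_toy.shape)

# Three steps ahead: fewer samples, because the target must still exist.
X_h3, y_h3 = make_windows(series, lookback=3, horizon=3)
print(X_h3[0], '->', y_h3[0])
print(X_h3.shape, y_h3.shape)
```

The number of samples is `len(series) - lookback - horizon + 1`. For the 84 months in this lab with a 12-month lookback and a 1-month horizon, that gives 72 sequences.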

### Temporal Train-Validation-Test Split

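**CRITICAL:** Split the data chronologically, not randomly. A random split would mix later months into the training set, letting the model learn from the future of the periods it is evaluated on (data leakage).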

In [None]:
# Temporal split: train on the earliest years, then hold out 2020 for validation and 2021 for testing

# Define split points
train_end = pd.Timestamp('2019-12-31')
val_end = pd.Timestamp('2020-12-31')

# Get indices
train_mask = dates <= train_end
val_mask = (dates > train_end) & (dates <= val_end)
test_mask = dates > val_end

# Split data
X_train, y_train = X[train_mask], y[train_mask]
X_val, y_val = X[val_mask], y[val_mask]
X_test, y_test = X[test_mask], y[test_mask]

dates_train = dates[train_mask]
dates_val = dates[val_mask]
dates_test = dates[test_mask]

print("Data Split:")
print(f"  Train: {len(X_train)} sequences ({dates_train[0]} to {dates_train[-1]})")
print(f"  Val:   {len(X_val)} sequences ({dates_val[0]} to {dates_val[-1]})")
print(f"  Test:  {len(X_test)} sequences ({dates_test[0]} to {dates_test[-1]})")
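
A quick sanity check on the masks can catch split mistakes early. The sketch below rebuilds the target dates as a toy array (one per month for 2016-2021, which is an assumption matching a 12-month lookback on 2015-2021 data) and verifies that the three splits are disjoint and ordered in time.

```python
import numpy as np

# Toy target dates: first day of each month, 2016-01 through 2021-12.
dates_toy = np.arange('2016-01', '2022-01', dtype='datetime64[M]').astype('datetime64[D]')

train_end = np.datetime64('2019-12-31')
val_end = np.datetime64('2020-12-31')

train_mask = dates_toy <= train_end
val_mask = (dates_toy > train_end) & (dates_toy <= val_end)
test_mask = dates_toy > val_end

# Every sample must land in exactly one split.
membership = train_mask.astype(int) + val_mask.astype(int) + test_mask.astype(int)
assert (membership == 1).all()

# Splits must be strictly ordered in time: train < val < test.
assert dates_toy[train_mask].max() < dates_toy[val_mask].min()
assert dates_toy[val_mask].max() < dates_toy[test_mask].min()

n_train, n_val, n_test = int(train_mask.sum()), int(val_mask.sum()), int(test_mask.sum())
print(n_train, n_val, n_test)
```

Note that the input windows of the first validation sequences still reach back into training months. That overlap is expected, because only the targets need to be separated in time.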

## Part 4: LSTM Model Building (30 minutes)

In [None]:
# TODO: Build LSTM model

def build_lstm_model(input_shape, lstm_units=(64, 32), dropout=0.2, learning_rate=0.001):
    """
    Build LSTM model for drought forecasting.
    
    Complete the architecture below.
    """
    model = Sequential(name='LSTM_Drought_Forecaster')
    
    # TODO: Add LSTM layers
    # Hint: First LSTM should have return_sequences=True if stacking layers
    # model.add(LSTM(...))
    # model.add(Dropout(...))
    
    # TODO: Add output layers
    # model.add(Dense(...))
    # model.add(Dense(1, activation='linear'))
    
    # TODO: Compile model
    # optimizer = keras.optimizers.Adam(learning_rate=learning_rate)
    # model.compile(...)
    
    return model

# Build model
input_shape = (LOOKBACK_WINDOW, len(feature_columns))
# model = build_lstm_model(input_shape)
# model.summary()

## Part 5: Model Training (20 minutes)

In [None]:
# TODO: Configure callbacks and train

# callbacks = [
#     EarlyStopping(monitor='val_loss', patience=20, restore_best_weights=True),
#     ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=10, min_lr=1e-6)
# ]

# BATCH_SIZE = 32
# EPOCHS = 100

# history = model.fit(
#     X_train, y_train,
#     validation_data=(X_val, y_val),
#     epochs=EPOCHS,
#     batch_size=BATCH_SIZE,
#     callbacks=callbacks,
#     verbose=1
# )

## Part 6: Model Evaluation (30 minutes)

In [None]:
# TODO: Make predictions and evaluate

# y_test_pred = model.predict(X_test).flatten()

# # Inverse transform to original scale
# def inverse_transform_ndvi(values, scaler, feature_columns):
#     dummy = np.zeros((len(values), len(feature_columns)))
#     ndvi_idx = feature_columns.index('ndvi')
#     dummy[:, ndvi_idx] = values
#     inverse = scaler.inverse_transform(dummy)
#     return inverse[:, ndvi_idx]

# y_test_actual = inverse_transform_ndvi(y_test, scaler, feature_columns)
# y_test_pred_original = inverse_transform_ndvi(y_test_pred, scaler, feature_columns)

# # Calculate metrics
# rmse = np.sqrt(mean_squared_error(y_test_actual, y_test_pred_original))
# mae = mean_absolute_error(y_test_actual, y_test_pred_original)
# r2 = r2_score(y_test_actual, y_test_pred_original)

# print(f"\nTest Set Performance:")
# print(f"  RMSE: {rmse:.4f}")
# print(f"  MAE:  {mae:.4f}")
# print(f"  R²:   {r2:.4f}")
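
The reason for inverse-transforming before computing metrics is easier to see with numbers. The sketch below uses plain NumPy and an assumed NDVI range for the scaler (`ndvi_min`, `ndvi_max` and all values are made up) to show how errors in scaled space relate to errors in NDVI units.

```python
import numpy as np

# Hypothetical NDVI range learned by a min-max scaler.
ndvi_min, ndvi_max = 0.30, 0.70

# Made-up scaled targets and predictions for four test months.
y_true_scaled = np.array([0.50, 0.75, 0.25, 1.00])
y_pred_scaled = np.array([0.55, 0.70, 0.35, 0.90])

def to_ndvi(values):
    # Min-max scaling to (0, 1) is x_scaled = (x - min) / (max - min),
    # so the inverse is x = x_scaled * (max - min) + min.
    return values * (ndvi_max - ndvi_min) + ndvi_min

y_true = to_ndvi(y_true_scaled)
y_pred = to_ndvi(y_pred_scaled)

def rmse(a, b):
    return np.sqrt(np.mean((a - b) ** 2))

def mae(a, b):
    return np.mean(np.abs(a - b))

def r2(actual, pred):
    ss_res = np.sum((actual - pred) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

print(f"Scaled space: RMSE={rmse(y_true_scaled, y_pred_scaled):.4f}, "
      f"MAE={mae(y_true_scaled, y_pred_scaled):.4f}, R2={r2(y_true_scaled, y_pred_scaled):.4f}")
print(f"NDVI units:   RMSE={rmse(y_true, y_pred):.4f}, "
      f"MAE={mae(y_true, y_pred):.4f}, R2={r2(y_true, y_pred):.4f}")
```

RMSE and MAE shrink by the factor `ndvi_max - ndvi_min` when converted to NDVI units, while R² is the same in both spaces. Reporting RMSE and MAE in NDVI units keeps them comparable with thresholds such as NDVI < 0.4.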

### Visualize Predictions

In [None]:
# TODO: Create visualization

# fig, ax = plt.subplots(figsize=(15, 6))
# ax.plot(dates_test, y_test_actual, 'g-', linewidth=2, marker='o', label='Actual NDVI')
# ax.plot(dates_test, y_test_pred_original, 'r--', linewidth=2, marker='x', label='Predicted NDVI')
# ax.set_xlabel('Date', fontsize=12)
# ax.set_ylabel('NDVI', fontsize=12)
# ax.set_title('LSTM Drought Forecasting: Test Set Predictions', fontsize=14, fontweight='bold')
# ax.legend(fontsize=11)
# ax.grid(True, alpha=0.3)
# plt.tight_layout()
# plt.show()

## 🎯 Final Exercise: Operational Deployment

**Task:** Design a simple operational forecast system.

1. Define a drought threshold (e.g., NDVI < 0.4)
2. Identify the months in which the model predicts drought
3. Calculate the lead time (how many months before the observed drought onset the model first flags it)
4. Assess the false alarm rate

In [None]:
# TODO: Complete operational analysis

# DROUGHT_THRESHOLD = 0.4
# predicted_drought = ...
# actual_drought = ...
# Calculate true positives, false positives, etc.
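
As a starting point, the sketch below works through the four steps on invented NDVI values (original scale). The arrays are illustrative only, and the two false-alarm measures follow common forecast-verification definitions, which may differ from the one your agency uses.

```python
import numpy as np

DROUGHT_THRESHOLD = 0.4

# Made-up monthly NDVI values for illustration only.
actual_ndvi = np.array([0.55, 0.50, 0.42, 0.38, 0.35, 0.37, 0.45, 0.52])
predicted_ndvi = np.array([0.54, 0.46, 0.39, 0.37, 0.36, 0.41, 0.39, 0.50])

actual_drought = actual_ndvi < DROUGHT_THRESHOLD
predicted_drought = predicted_ndvi < DROUGHT_THRESHOLD

# Contingency table counts.
tp = int(np.sum(predicted_drought & actual_drought))    # alerts that verified
fp = int(np.sum(predicted_drought & ~actual_drought))   # false alarms
fn = int(np.sum(~predicted_drought & actual_drought))   # missed drought months
tn = int(np.sum(~predicted_drought & ~actual_drought))  # correct "no drought"

def safe_ratio(numerator, denominator):
    # Return NaN instead of raising when a class is absent from the sample.
    return numerator / denominator if denominator else float('nan')

hit_rate = safe_ratio(tp, tp + fn)           # share of drought months detected
false_alarm_ratio = safe_ratio(fp, tp + fp)  # share of alerts that were wrong
false_alarm_rate = safe_ratio(fp, fp + tn)   # share of non-drought months flagged

# Lead time: gap between the first flagged month and the first observed drought month.
# np.argmax returns 0 for an all-False array, so check .any() first.
if actual_drought.any() and predicted_drought.any():
    lead_time_months = int(np.argmax(actual_drought)) - int(np.argmax(predicted_drought))
else:
    lead_time_months = None

print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")
print(f"Hit rate: {hit_rate:.2f}")
print(f"False alarm ratio: {false_alarm_ratio:.2f}, false alarm rate: {false_alarm_rate:.2f}")
print(f"Lead time (months): {lead_time_months}")
```

If the evaluation period contains no observed drought months, the hit rate and lead time are undefined, so it may be worth running this analysis on a period that includes a known event.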

## 🌾 Key Takeaways

In this lab, you:
- ✅ Built an end-to-end LSTM drought forecasting system
- ✅ Processed multi-year time series data
- ✅ Implemented proper temporal validation
- ✅ Evaluated operational forecast accuracy
- ✅ Outlined deployment considerations for Philippine agencies

**Next:** Session 3 explores emerging AI trends (Foundation Models, XAI) to further enhance these systems!