# Day 4, Session 1: LSTMs for Earth Observation Time Series
## INSTRUCTOR VERSION - Complete Solutions

## 🎯 Learning Objectives

By the end of this session, you will be able to:
1. **Understand** the importance of time series analysis in Earth Observation
2. **Explain** why RNNs face challenges with long sequences (vanishing/exploding gradients)
3. **Describe** LSTM architecture and how gates solve RNN limitations
4. **Implement** an LSTM model for drought prediction using NDVI time series
5. **Apply** LSTMs to Philippine EO challenges (Mindanao drought monitoring)

## 📋 Session Overview

- **Duration**: 1.5 hours
- **Prerequisites**: Basic understanding of neural networks (from Day 3)
- **Application Focus**: Drought monitoring in Mindanao agricultural regions
- **Key Dataset**: Simulated Sentinel-2 NDVI time series (2019-2024)

---

## 🌍 Philippine Context

The Philippines experiences significant climate variability, with El Niño events causing severe droughts, particularly affecting Mindanao's agricultural regions. PAGASA reports that drought events have increased in frequency and intensity, making early warning systems critical for:
- **Food security** in Bukidnon and South Cotabato
- **Water resource management** for irrigation systems
- **Agricultural planning** and crop insurance programs

Time series analysis of satellite-derived vegetation indices enables us to detect early drought signals and predict future conditions.

## 📚 Module 1: Introduction to Time Series in Earth Observation

### What are EO Time Series?

Earth Observation time series are sequences of measurements of the same location acquired repeatedly over time, ideally at regular intervals. Common examples include:

1. **NDVI (Normalized Difference Vegetation Index)**: Measures vegetation health
   - Range: -1 to +1 (higher values = healthier vegetation)
   - Sentinel-2 provides 5-day revisit time
   
2. **SAR Backscatter**: Radar signal strength indicating surface properties
   - Sentinel-1 provides 6-12 day revisit
   - Sensitive to soil moisture and vegetation structure

3. **Land Surface Temperature**: Thermal measurements from satellites
   - Critical for drought and heat stress monitoring
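
As a quick illustration of the first index in the list above, NDVI is computed from near-infrared (NIR) and red reflectance as (NIR - Red) / (NIR + Red); for Sentinel-2 these are bands B8 and B4. The sketch below uses made-up reflectance values, and the helper name `compute_ndvi` is hypothetical rather than part of any library.

```python
import numpy as np

def compute_ndvi(nir, red, eps=1e-10):
    """NDVI = (NIR - Red) / (NIR + Red), computed element-wise."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    # eps guards against division by zero where both bands are 0
    return (nir - red) / (nir + red + eps)

# Hypothetical surface reflectance values (not real measurements)
nir = np.array([0.45, 0.30, 0.10])   # dense crop, stressed crop, bare soil
red = np.array([0.05, 0.12, 0.09])

ndvi_example = compute_ndvi(nir, red)
print(np.round(ndvi_example, 2))
```

Healthy vegetation reflects strongly in the NIR and absorbs red light, which is why the first value is the highest.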

### Why Time Series Matter for Philippine EO Applications

- **Phenology Tracking**: Monitor rice cropping calendars in Central Luzon
- **Drought Detection**: Early warning for Mindanao agricultural zones
- **Land Change Detection**: Urban expansion in Metro Manila
- **Disaster Impact Assessment**: Pre/post typhoon vegetation analysis

### 💭 Think-Through Discussion

**Question**: How might seasonal patterns in NDVI differ between irrigated rice fields in Nueva Ecija and rainfed corn farms in Bukidnon? What implications does this have for drought monitoring?

**Answer**: Irrigated rice fields show consistent NDVI patterns following cropping calendars, while rainfed farms are more sensitive to precipitation variability. This means drought detection thresholds must be location-specific and consider irrigation infrastructure.
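
One way to make thresholds location-specific is to express each observation as an anomaly relative to that site's own history. The sketch below is only illustrative: the site names, NDVI values, and the helper `ndvi_anomaly` are all assumptions, not part of the session dataset.

```python
import numpy as np

# Hypothetical NDVI histories for two sites (illustrative values only)
site_history = {
    "irrigated_rice": np.array([0.70, 0.72, 0.71, 0.69, 0.73, 0.70]),
    "rainfed_corn":   np.array([0.65, 0.45, 0.70, 0.40, 0.68, 0.50]),
}

def ndvi_anomaly(value, history):
    """Standardized anomaly of `value` relative to the site's own history."""
    return (value - history.mean()) / history.std(ddof=1)

new_observation = 0.60  # the same NDVI observed at both sites
anomalies = {site: ndvi_anomaly(new_observation, h) for site, h in site_history.items()}

for site, z in anomalies.items():
    print(f"{site}: anomaly = {z:+.2f}")
```

The same NDVI value is a strong negative anomaly for the stable irrigated site but unremarkable for the more variable rainfed site, which is why a single absolute threshold can mislead.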

## 🛠️ Setup and Environment Configuration

In [None]:
# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# TensorFlow and Keras imports
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.optimizers import Adam

# Scikit-learn for preprocessing
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# Configure matplotlib for better visualizations
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 10
sns.set_style('whitegrid')

print(f"TensorFlow version: {tf.__version__}")
print(f"GPU Available: {tf.config.list_physical_devices('GPU')}")
print("Setup complete! 🚀")

## 📊 Data Generation: Simulating Realistic NDVI Time Series

We'll create a synthetic but realistic NDVI time series that mimics conditions in Bukidnon, Mindanao:
- Normal seasonal patterns (wet/dry seasons)
- El Niño drought events (2019, 2023)
- Random variations and noise

In [None]:
def generate_mindanao_ndvi_timeseries(start_date='2019-01-01', end_date='2024-12-31', 
                                      location='Bukidnon'):
    """
    Generate synthetic NDVI time series mimicking Mindanao agricultural patterns.
    Includes seasonal variations, drought events, and realistic noise.
    
    Parameters:
    -----------
    start_date : str
        Start date of the time series
    end_date : str
        End date of the time series
    location : str
        Location name for reference
    
    Returns:
    --------
    pd.DataFrame
        DataFrame with date, NDVI, precipitation, and drought index
    """
    
    # Create date range (10-day composites, similar to Sentinel-2)
    dates = pd.date_range(start=start_date, end=end_date, freq='10D')
    n_samples = len(dates)
    
    # Initialize arrays
    ndvi = np.zeros(n_samples)
    precipitation = np.zeros(n_samples)
    
    # Base NDVI for healthy vegetation in Mindanao
    base_ndvi = 0.75
    
    for i, date in enumerate(dates):
        # Seasonal component (wet season: Nov-Apr, dry season: May-Oct)
        month = date.month
        if month in [11, 12, 1, 2, 3, 4]:  # Wet season
            seasonal_factor = 0.85 + 0.1 * np.sin(2 * np.pi * month / 12)
            precip_base = 250 + 50 * np.random.randn()  # mm/month
        else:  # Dry season
            seasonal_factor = 0.7 + 0.05 * np.sin(2 * np.pi * month / 12)
            precip_base = 100 + 30 * np.random.randn()  # mm/month
        
        # El Niño drought events (Apr-Sep 2019, Feb-Jul 2023)
        drought_factor = 1.0
        if (date.year == 2019 and 4 <= month <= 9) or \
           (date.year == 2023 and 2 <= month <= 7):
            drought_factor = 0.6 + 0.2 * np.random.random()
            precip_base *= 0.4  # Reduced precipitation during drought
        
        # Calculate NDVI with noise
        ndvi[i] = base_ndvi * seasonal_factor * drought_factor
        ndvi[i] += np.random.normal(0, 0.03)  # Add noise
        ndvi[i] = np.clip(ndvi[i], 0.1, 0.95)  # Realistic bounds
        
        # Calculate precipitation
        precipitation[i] = max(0, precip_base)
    
    # Smooth the time series (moving average)
    window = 3
    ndvi_smooth = pd.Series(ndvi).rolling(window=window, center=True).mean()
    # Fill the NaN edges left by the centered window
    # (fillna(method=...) is deprecated in recent pandas versions)
    ndvi_smooth = ndvi_smooth.bfill().ffill()
    
    # Calculate drought index (simplified: standardized deviation from the ~1-year rolling mean)
    ndvi_mean = ndvi_smooth.rolling(window=36, center=True).mean().bfill().ffill()
    ndvi_anomaly = ndvi_smooth - ndvi_mean
    # Divide by the spread of the anomaly itself so that -1 means one standard deviation below normal
    drought_index = ndvi_anomaly / ndvi_anomaly.std()
    
    # Create DataFrame
    df = pd.DataFrame({
        'date': dates,
        'ndvi': ndvi_smooth.values,
        'precipitation': precipitation,
        'drought_index': drought_index.values,
        'location': location
    })
    
    return df

# Generate data for Bukidnon, Mindanao
df_mindanao = generate_mindanao_ndvi_timeseries()
print(f"Generated {len(df_mindanao)} time points of NDVI data")
print(f"Date range: {df_mindanao['date'].min()} to {df_mindanao['date'].max()}")
print(f"\nFirst 5 rows:")
print(df_mindanao.head())
print(f"\nBasic statistics:")
print(df_mindanao[['ndvi', 'precipitation', 'drought_index']].describe())

## 📈 Visualizing the Time Series Data

Let's visualize our NDVI time series to understand the patterns, including the drought events.

In [None]:
# Create comprehensive visualization
fig, axes = plt.subplots(3, 1, figsize=(14, 10), sharex=True)

# Plot 1: NDVI Time Series
ax1 = axes[0]
ax1.plot(df_mindanao['date'], df_mindanao['ndvi'], 'g-', linewidth=1.5, label='NDVI')
ax1.axhline(y=df_mindanao['ndvi'].mean(), color='gray', linestyle='--', alpha=0.5, label='Mean NDVI')

# Highlight drought periods
drought_periods = [
    ('2019-04-01', '2019-09-30', '2019 El Niño'),
    ('2023-02-01', '2023-07-31', '2023 El Niño')
]
for start, end, label in drought_periods:
    ax1.axvspan(pd.to_datetime(start), pd.to_datetime(end), alpha=0.2, color='orange', label=label)

ax1.set_ylabel('NDVI', fontsize=11)
ax1.set_title('NDVI Time Series for Bukidnon, Mindanao (2019-2024)', fontsize=12, fontweight='bold')
ax1.legend(loc='upper right')
ax1.grid(True, alpha=0.3)
ax1.set_ylim([0.2, 0.9])

# Plot 2: Precipitation
ax2 = axes[1]
ax2.bar(df_mindanao['date'], df_mindanao['precipitation'], color='blue', alpha=0.6, width=8)
ax2.set_ylabel('Precipitation (mm)', fontsize=11)
ax2.set_title('Precipitation Patterns', fontsize=11)
ax2.grid(True, alpha=0.3)

# Plot 3: Drought Index
ax3 = axes[2]
colors = ['red' if x < -1 else 'orange' if x < 0 else 'green' for x in df_mindanao['drought_index']]
ax3.scatter(df_mindanao['date'], df_mindanao['drought_index'], c=colors, alpha=0.6, s=10)
ax3.axhline(y=0, color='black', linestyle='-', linewidth=0.5)
ax3.axhline(y=-1, color='red', linestyle='--', alpha=0.5, label='Drought threshold')
ax3.set_ylabel('Drought Index', fontsize=11)
ax3.set_xlabel('Date', fontsize=11)
ax3.set_title('Drought Index (Negative values indicate drought stress)', fontsize=11)
ax3.legend(loc='upper right')
ax3.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Print drought statistics (each time step is one 10-day composite, not one day)
drought_steps = (df_mindanao['drought_index'] < -1).sum()
print(f"\n📊 Drought Statistics:")
print(f"Time steps with severe drought (index < -1): {drought_steps}")
print(f"Approximate days in severe drought: {drought_steps * 10}")
print(f"Percentage of time in drought: {drought_steps/len(df_mindanao)*100:.1f}%")

## 🧠 Module 2: Understanding RNNs and Their Limitations

### Recurrent Neural Networks (RNNs) Basics

RNNs are designed to work with sequential data by maintaining a **hidden state** that acts as memory:

$$h_t = \tanh(W_{hh} \cdot h_{t-1} + W_{xh} \cdot x_t + b_h)$$
$$y_t = W_{hy} \cdot h_t + b_y$$

Where:
- $h_t$: hidden state at time $t$
- $x_t$: input at time $t$
- $W_{hh}, W_{xh}, W_{hy}$: weight matrices
- $b_h, b_y$: bias terms
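
The recurrence above can be sketched in a few lines of NumPy. This is a minimal illustration, not a trained model: the layer sizes, random weights, and NDVI inputs are all assumed for the example, and `rnn_step` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, input_size = 4, 1   # illustrative sizes

# Randomly initialized parameters, named after the symbols above
W_hh = rng.normal(scale=0.5, size=(hidden_size, hidden_size))
W_xh = rng.normal(scale=0.5, size=(hidden_size, input_size))
W_hy = rng.normal(scale=0.5, size=(1, hidden_size))
b_h = np.zeros(hidden_size)
b_y = np.zeros(1)

def rnn_step(h_prev, x_t):
    """One recurrence: update the hidden state, then read out y_t."""
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)
    y_t = W_hy @ h_t + b_y
    return h_t, y_t

# Run over a short hypothetical NDVI sequence
h_state = np.zeros(hidden_size)
for ndvi_value in [0.72, 0.70, 0.55, 0.48]:
    h_state, y_out = rnn_step(h_state, np.array([ndvi_value]))

print(h_state.shape, y_out.shape)
```

Note that the same weights are reused at every time step; only the hidden state changes as the sequence is consumed.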

### The Vanishing Gradient Problem

During backpropagation through time, gradients are multiplied repeatedly:

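**Example**: If the gradient is scaled by 0.5 at each step:
- After 10 steps: $0.5^{10} \approx 0.001$
- After 50 steps: $0.5^{50} \approx 8.9 \times 10^{-16}$ (essentially zero!)

This means the network **struggles to learn long-term dependencies**: the error signal from distant time steps is too small to update the weights.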

### The Exploding Gradient Problem

Conversely, if the gradient is scaled by 1.5 at each step:
- After 10 steps: $1.5^{10} \approx 58$
- After 50 steps: $1.5^{50} \approx 6.4 \times 10^{8}$ (and still growing, eventually overflowing)

This causes **unstable training** and can produce NaN values.
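
The arithmetic in both examples can be checked directly. The snippet below is a simplification that treats the per-step gradient scaling as a single constant factor, which real networks do not have.

```python
factors = (0.5, 1.5)     # per-step gradient scaling from the examples above
horizons = (10, 50)      # number of time steps to backpropagate through

# factor ** n is the overall scaling after n steps
results = {f: [f ** n for n in horizons] for f in factors}

for f, values in results.items():
    summary = ", ".join(f"{n} steps -> {v:.2e}" for n, v in zip(horizons, values))
    print(f"factor {f}: {summary}")
```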

### 🎯 Mini-Challenge 1 - SOLUTION

**Task**: Calculate how many time steps it takes for a gradient of 0.9 to shrink below 0.01. What does this mean for analyzing a year of monthly NDVI data?

In [None]:
# SOLUTION: Mini-Challenge 1 - Calculate gradient vanishing
gradient_factor = 0.9
threshold = 0.01

# Method 1: Using a loop
steps = 0
current_gradient = 1.0  # unscaled gradient before any step, so the loop count matches Method 2
while current_gradient >= threshold:
    current_gradient *= gradient_factor
    steps += 1

print(f"Gradient shrinks below {threshold} after {steps} steps")
print(f"For monthly data, this means we can only learn patterns from the last {steps} months")
print(f"\n📊 Implications:")
print(f"  - A vanilla RNN would struggle to connect a drought event to conditions {steps} months earlier")
print(f"  - This is why LSTMs are crucial for long-term time series analysis")

# Method 2: Using logarithm (more elegant)
import math
steps_math = math.ceil(math.log(threshold) / math.log(gradient_factor))
print(f"\n✓ Verification using logarithm: {steps_math} steps")

## 🏗️ Module 3: LSTM Architecture - The Solution

### The LSTM Cell: A Smart Memory System

LSTMs mitigate the vanishing gradient problem through a gating mechanism. Think of an LSTM as a **smart student taking notes during a lecture**:

1. **Forget Gate**: Decides what old information to discard
2. **Input Gate**: Determines what new information to store
3. **Output Gate**: Controls what information to pass forward
4. **Cell State**: The "notebook" carrying information through time

### Mathematical Formulation

The LSTM operations at time step $t$:

**Forget Gate**: $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$

**Input Gate**: $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$

**Candidate Values**: $\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$

**Cell State Update**: $C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$

**Output Gate**: $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$

**Hidden State**: $h_t = o_t * \tanh(C_t)$

Where $\sigma$ is the sigmoid function (outputs 0-1) and $*$ is element-wise multiplication.
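
The six equations above map almost line for line onto NumPy. The following is a minimal sketch of a single LSTM step with randomly initialized weights; the sizes, the `params` dictionary layout, and the helper names `sigmoid` and `lstm_step` are assumptions made for illustration and do not reflect how Keras stores its weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, params):
    """One LSTM time step following the equations above."""
    concat = np.concatenate([h_prev, x_t])                      # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ concat + params["b_f"])       # forget gate
    i_t = sigmoid(params["W_i"] @ concat + params["b_i"])       # input gate
    C_tilde = np.tanh(params["W_C"] @ concat + params["b_C"])   # candidate values
    C_t = f_t * C_prev + i_t * C_tilde                          # cell state update
    o_t = sigmoid(params["W_o"] @ concat + params["b_o"])       # output gate
    h_t = o_t * np.tanh(C_t)                                    # hidden state
    return h_t, C_t

hidden_size, input_size = 3, 1   # illustrative sizes
rng = np.random.default_rng(42)
params = {}
for gate in ("f", "i", "C", "o"):
    params[f"W_{gate}"] = rng.normal(scale=0.5, size=(hidden_size, hidden_size + input_size))
    params[f"b_{gate}"] = np.zeros(hidden_size)

h_t, C_t = np.zeros(hidden_size), np.zeros(hidden_size)
for ndvi_value in [0.72, 0.70, 0.55, 0.48]:   # hypothetical NDVI values
    h_t, C_t = lstm_step(np.array([ndvi_value]), h_t, C_t, params)

print("h:", np.round(h_t, 3))
print("C:", np.round(C_t, 3))
```

Each of the four weight matrices acts on the concatenated vector $[h_{t-1}, x_t]$, which is why an LSTM layer has roughly four times the parameters of a vanilla RNN layer of the same size.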

### Why LSTMs Work

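The **cell state** $C_t$ acts as a "conveyor belt" that can carry information across many time steps with little change. Because the update $C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$ is additive, the gradient flowing back along the cell state is scaled by the forget gate $f_t$ rather than repeatedly multiplied by a weight matrix. When $f_t$ stays close to 1, the gradient is largely preserved, which greatly reduces the vanishing gradient problem. Exploding gradients can still occur and are usually handled with gradient clipping.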
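
A simplified derivation makes the "conveyor belt" idea concrete. As an assumption for illustration, treat the gates as constants with respect to $C_{t-1}$ (that is, ignore their indirect dependence through $h_{t-1}$). Differentiating the cell state update then gives:

$$\frac{\partial C_t}{\partial C_{t-1}} \approx \operatorname{diag}(f_t) \quad\Rightarrow\quad \frac{\partial C_t}{\partial C_{t-k}} \approx \prod_{j=t-k+1}^{t} \operatorname{diag}(f_j)$$

For comparison, the vanilla RNN recurrence yields:

$$\frac{\partial h_t}{\partial h_{t-1}} = \operatorname{diag}(1 - h_t^2) \, W_{hh}$$

In the RNN, every step multiplies by $W_{hh}$ and by a tanh derivative that is at most 1. In the LSTM, the per-step factor along the cell state is the forget gate, which the network can learn to keep near 1 whenever information should be retained.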

### 💭 Think-Through Discussion - ANSWER

**Question**: In drought monitoring, what kind of information might the "forget gate" discard and what might the "input gate" preserve? Think about seasonal patterns vs. anomalies.

**Answer**: The forget gate might discard routine seasonal variations once they're learned, while the input gate would preserve anomalous drought signals. During a drought event, the input gate opens wide to capture the unusual low NDVI values, and the output gate ensures this critical information is propagated forward for prediction.

In [None]:
# Visualize LSTM Architecture Conceptually
from scipy.ndimage import gaussian_filter1d

def visualize_lstm_concept():
    """
    Create a conceptual visualization of LSTM information flow.
    """
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
    
    # Left plot: RNN vs LSTM gradient flow
    time_steps = np.arange(0, 50)
    rnn_gradient = 0.9 ** time_steps
    lstm_gradient = 0.95 ** time_steps  # LSTMs maintain gradients better
    
    ax1.plot(time_steps, rnn_gradient, 'r-', label='Vanilla RNN', linewidth=2)
    ax1.plot(time_steps, lstm_gradient, 'b-', label='LSTM', linewidth=2)
    ax1.axhline(y=0.01, color='gray', linestyle='--', alpha=0.5, label='Effective threshold')
    ax1.set_xlabel('Time Steps', fontsize=11)
    ax1.set_ylabel('Gradient Magnitude', fontsize=11)
    ax1.set_title('Gradient Flow: RNN vs LSTM', fontsize=12, fontweight='bold')
    ax1.set_yscale('log')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    
    # Right plot: LSTM gates behavior simulation
    time = np.linspace(0, 24, 100)  # 24 months
    
    # Simulate gate activations during drought event
    normal_period = (time < 6) | (time > 18)
    drought_period = (time >= 6) & (time <= 18)
    
    forget_gate = np.where(normal_period, 0.3, 0.8)  # f_t near 1 retains the cell state: retain more during drought
    input_gate = np.where(drought_period, 0.9, 0.4)   # Store more during drought
    output_gate = np.where(drought_period, 0.95, 0.6) # Output more during drought
    
    # Add some smooth transitions
    forget_gate = gaussian_filter1d(forget_gate, sigma=2)
    input_gate = gaussian_filter1d(input_gate, sigma=2)
    output_gate = gaussian_filter1d(output_gate, sigma=2)
    
    ax2.plot(time, forget_gate, 'r-', label='Forget Gate', linewidth=2)
    ax2.plot(time, input_gate, 'g-', label='Input Gate', linewidth=2)
    ax2.plot(time, output_gate, 'b-', label='Output Gate', linewidth=2)
    ax2.axvspan(6, 18, alpha=0.2, color='orange', label='Drought Period')
    ax2.set_xlabel('Time (months)', fontsize=11)
    ax2.set_ylabel('Gate Activation (0-1)', fontsize=11)
    ax2.set_title('LSTM Gates During Drought Monitoring', fontsize=12, fontweight='bold')
    ax2.legend(loc='right')
    ax2.grid(True, alpha=0.3)
    ax2.set_ylim([0, 1])
    
    plt.tight_layout()
    plt.show()

visualize_lstm_concept()

print("🔍 Key Insights:")
print("1. LSTMs maintain gradient flow much better than vanilla RNNs (both panels are illustrative, not measured)")
print("2. During drought events, the input and output gates open more to capture and propagate anomaly information")
print("3. The forget gate stays closer to 1 during drought, so the cell state retains the drought signal")

## 🔧 Data Preparation for LSTM

LSTMs require data in a specific format: sequences of fixed length. We'll create sliding windows from our time series.

### Sliding Window Approach

For drought prediction, we'll use:
- **Input**: 12 consecutive time steps of historical NDVI (12 × 10-day composites, about 120 days)
- **Output**: The NDVI value at the next time step (the following 10-day composite)

Example:
```
Window 1: Steps 1-12 → Predict Step 13
Window 2: Steps 2-13 → Predict Step 14
Window 3: Steps 3-14 → Predict Step 15
...
```
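
The windowing pattern can be tried on a toy series first. This sketch uses NumPy's `sliding_window_view` as a vectorized alternative to an explicit loop; the 16-point series and the names `X_toy` and `y_toy` are purely illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

series = np.arange(1, 17)      # toy series: 1, 2, ..., 16
seq_length = 12

# Each row holds 12 inputs followed by the 1 target that comes next
windows = sliding_window_view(series, seq_length + 1)
X_toy, y_toy = windows[:, :seq_length], windows[:, -1]

print(X_toy.shape, y_toy.shape)
print(X_toy[0], "->", y_toy[0])
```

A series of length N yields N - 12 windows, so a short time series produces relatively few training samples.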

In [None]:
def create_sequences(data, seq_length=12, prediction_horizon=1):
    """
    Create sequences for LSTM training.
    
    Parameters:
    -----------
    data : np.array
        Time series data
    seq_length : int
        Number of time steps to use as input
    prediction_horizon : int
        Number of time steps ahead to predict
    
    Returns:
    --------
    X, y : np.arrays
        Input sequences and targets
    """
    X, y = [], []
    
    for i in range(len(data) - seq_length - prediction_horizon + 1):
        # Input sequence
        X.append(data[i:i + seq_length])
        # Target value
        y.append(data[i + seq_length + prediction_horizon - 1])
    
    return np.array(X), np.array(y)

# Prepare NDVI data
ndvi_values = df_mindanao['ndvi'].values.reshape(-1, 1)

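# Normalize data (important for neural networks).
# Fit the scaler on the first 80% of the series only, so that validation
# statistics do not leak into training; then transform the full series.
n_fit = int(0.8 * len(ndvi_values))
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(ndvi_values[:n_fit])
ndvi_scaled = scaler.transform(ndvi_values)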

# Create sequences
SEQ_LENGTH = 12  # Use 12 time steps (120 days) to predict next time step
X, y = create_sequences(ndvi_scaled, seq_length=SEQ_LENGTH)

print(f"📦 Data Shape:")
print(f"Input sequences (X): {X.shape}")
print(f"Target values (y): {y.shape}")
print(f"\nExample:")
print(f"First input sequence (scaled): {X[0].flatten()[:5]}... (showing first 5 values)")
print(f"Corresponding target: {y[0]}")

# Split into training and validation sets
split_index = int(0.8 * len(X))
X_train, X_val = X[:split_index], X[split_index:]
y_train, y_val = y[:split_index], y[split_index:]

print(f"\n📊 Dataset Split:")
print(f"Training samples: {len(X_train)}")
print(f"Validation samples: {len(X_val)}")

## 🎯 Mini-Challenge 2 - SOLUTION: Data Exploration

In [None]:
# SOLUTION: Mini-Challenge 2 - Analyze the sequences
# Task: Find and visualize a sequence that leads to a drought (low NDVI) prediction

drought_indices = np.where(y_train < 0.3)[0]
if len(drought_indices) > 0:
    drought_idx = drought_indices[0]
    
    plt.figure(figsize=(12, 5))
    
    # Plot the sequence leading to drought
    plt.subplot(1, 2, 1)
    plt.plot(range(SEQ_LENGTH), X_train[drought_idx], 'b-o', label='Input sequence', linewidth=2)
    plt.axvline(x=SEQ_LENGTH-0.5, color='gray', linestyle='--', alpha=0.5)
    plt.plot(SEQ_LENGTH, y_train[drought_idx], 'ro', markersize=12, label='Target value (drought)', zorder=5)
    plt.xlabel('Time Step')
    plt.ylabel('Scaled NDVI')
    plt.title('Sequence Leading to Drought Prediction')
    plt.legend()
    plt.grid(True, alpha=0.3)
    
    # Plot distribution of target values
    plt.subplot(1, 2, 2)
    plt.hist(y_train, bins=50, alpha=0.7, color='green', edgecolor='black')
    plt.axvline(x=0.3, color='red', linestyle='--', linewidth=2, label='Drought threshold')
    plt.axvline(x=y_train[drought_idx], color='orange', linestyle='--', linewidth=2, label='Example drought value')
    plt.xlabel('Scaled NDVI')
    plt.ylabel('Frequency')
    plt.title('Distribution of Target NDVI Values')
    plt.legend()
    plt.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    print(f"\n📊 Analysis:")
    print(f"Found {len(drought_indices)} sequences leading to drought")
    print(f"Drought threshold (scaled): 0.3")
    print(f"Example drought target value: {y_train[drought_idx][0]:.4f}")
    print(f"Mean of input sequence: {X_train[drought_idx].mean():.4f}")
    print(f"Trend (last - first): {(X_train[drought_idx][-1] - X_train[drought_idx][0])[0]:.4f}")
else:
    print("No severe drought events found in training data (threshold: 0.3)")
    print("Try a higher threshold (e.g., 0.4) to find drought examples")

## 🏛️ Building the LSTM Model

Now let's build our LSTM model for drought prediction. We'll use an architecture suitable for time series forecasting.
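
Before building the model, it helps to estimate its size. The sketch below counts trainable parameters by hand, assuming the architecture used in this session (two stacked LSTM layers with 64 and 32 units on one input feature, a 16-unit dense layer, and a single output) and standard LSTM layers with biases. The helper names are hypothetical; the total can be compared against `model.summary()`.

```python
def lstm_param_count(units, input_dim):
    """Three gates plus the candidate: each has weights over [h_{t-1}, x_t] and a bias."""
    return 4 * (units * (units + input_dim) + units)

def dense_param_count(units, input_dim):
    """Weight matrix plus bias vector."""
    return units * input_dim + units

layer_params = {
    "lstm_1 (64 units, 1 input feature)": lstm_param_count(64, 1),
    "lstm_2 (32 units, 64 inputs)": lstm_param_count(32, 64),
    "dense_1 (16 units, 32 inputs)": dense_param_count(16, 32),
    "output (1 unit, 16 inputs)": dense_param_count(1, 16),
}
total_params = sum(layer_params.values())

for name, count in layer_params.items():
    print(f"{name}: {count:,}")
print(f"Total: {total_params:,}")
```

Dropout layers add no parameters. With only a few hundred training windows, a model of this size can overfit easily, which is one reason dropout and early stopping are used.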

In [None]:
def build_lstm_model(seq_length, n_features=1, lstm_units=(64, 32), 
                    dropout_rate=0.2, learning_rate=0.001):
    """
    Build an LSTM model for time series prediction.
    
    Parameters:
    -----------
    seq_length : int
        Length of input sequences
    n_features : int
        Number of features per time step
    lstm_units : list or tuple of two ints
        Number of units in the first and second LSTM layers
    dropout_rate : float
        Dropout rate for regularization
    learning_rate : float
        Learning rate for Adam optimizer
    
    Returns:
    --------
    model : keras.Model
        Compiled LSTM model
    """
    
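    model = Sequential([
        # Explicit input layer: (time steps, features per step).
        # Newer Keras versions warn when input_shape is passed to a layer instead.
        keras.Input(shape=(seq_length, n_features)),
        
        # First LSTM layer with return sequences
        LSTM(lstm_units[0], 
             activation='tanh',
             return_sequences=True,  # Return full sequence for next LSTM
             name='lstm_1'),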
        
        # Dropout for regularization
        Dropout(dropout_rate, name='dropout_1'),
        
        # Second LSTM layer
        LSTM(lstm_units[1], 
             activation='tanh',
             return_sequences=False,  # Only return last output
             name='lstm_2'),
        
        # Dropout
        Dropout(dropout_rate, name='dropout_2'),
        
        # Dense layer for final prediction
        Dense(16, activation='relu', name='dense_1'),
        
        # Output layer
        Dense(1, activation='linear', name='output')
    ])
    
    # Compile model
    optimizer = Adam(learning_rate=learning_rate)
    model.compile(
        optimizer=optimizer,
        loss='mse',
        metrics=['mae']
    )
    
    return model

# Build the model
model = build_lstm_model(
    seq_length=SEQ_LENGTH,
    n_features=1,
    lstm_units=[64, 32],
    dropout_rate=0.2,
    learning_rate=0.001
)

# Display model architecture
model.summary()

print("\n✅ Model built successfully!")
print(f"Total parameters: {model.count_params():,}")

## 🚀 Training the LSTM Model - COMPLETE SOLUTION

We'll train our model with early stopping to prevent overfitting and learning rate reduction for better convergence.

### Training Best Practices

1. **Early Stopping**: Stop training when validation loss stops improving
2. **Learning Rate Reduction**: Reduce LR when loss plateaus
3. **Batch Size**: Balance between stability (large) and generalization (small)
4. **Monitoring**: Track both training and validation metrics

In [None]:
# Define callbacks
early_stopping = EarlyStopping(
    monitor='val_loss',
    patience=15,
    restore_best_weights=True,
    verbose=1
)

reduce_lr = ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.5,
    patience=5,
    min_lr=1e-6,
    verbose=1
)

# SOLUTION: Complete training code
EPOCHS = 100
BATCH_SIZE = 32

print("🏋️ Training LSTM model...")
print(f"Epochs: {EPOCHS}")
print(f"Batch size: {BATCH_SIZE}")
print(f"Training samples: {len(X_train)}")
print(f"Validation samples: {len(X_val)}\n")

history = model.fit(
    X_train, y_train,
    epochs=EPOCHS,
    batch_size=BATCH_SIZE,
    validation_data=(X_val, y_val),
    callbacks=[early_stopping, reduce_lr],
    verbose=1
)

print("\n✅ Training completed!")

## 📊 Training Visualization and Analysis

In [None]:
def plot_training_history(history):
    """
    Visualize training and validation metrics.
    """
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
    
    # Plot loss
    ax1.plot(history.history['loss'], label='Training Loss', linewidth=2)
    ax1.plot(history.history['val_loss'], label='Validation Loss', linewidth=2)
    ax1.set_xlabel('Epoch')
    ax1.set_ylabel('Loss (MSE)')
    ax1.set_title('Model Loss During Training')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    
    # Plot MAE
    ax2.plot(history.history['mae'], label='Training MAE', linewidth=2)
    ax2.plot(history.history['val_mae'], label='Validation MAE', linewidth=2)
    ax2.set_xlabel('Epoch')
    ax2.set_ylabel('MAE')
    ax2.set_title('Mean Absolute Error During Training')
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
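    # Report metrics at the epoch with the lowest validation loss. When
    # EarlyStopping(restore_best_weights=True) triggers, the model is restored
    # to this epoch, so the last epoch's metrics can be misleading.
    best_epoch = int(np.argmin(history.history['val_loss']))
    best_train_loss = history.history['loss'][best_epoch]
    best_val_loss = history.history['val_loss'][best_epoch]
    best_train_mae = history.history['mae'][best_epoch]
    best_val_mae = history.history['val_mae'][best_epoch]
    
    print(f"📈 Metrics at Best Epoch ({best_epoch + 1} of {len(history.history['loss'])}):")
    print(f"Training Loss: {best_train_loss:.6f}")
    print(f"Validation Loss: {best_val_loss:.6f}")
    print(f"Training MAE: {best_train_mae:.6f}")
    print(f"Validation MAE: {best_val_mae:.6f}")
    
    if best_val_loss > best_train_loss * 1.5:
        print("\n⚠️ Warning: Model might be overfitting!")
    else:
        print("\n✅ Model generalization looks good!")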

plot_training_history(history)

## 🔮 Model Evaluation and Predictions - COMPLETE SOLUTION

In [None]:
# SOLUTION: Complete prediction code
# Make predictions on validation set
y_pred_scaled = model.predict(X_val, verbose=0)

# Inverse transform to get actual NDVI values
y_pred = scaler.inverse_transform(y_pred_scaled)
y_val_actual = scaler.inverse_transform(y_val)

# Calculate metrics
mse = mean_squared_error(y_val_actual, y_pred)
mae = mean_absolute_error(y_val_actual, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_val_actual, y_pred)

print("📊 Validation Set Performance:")
print(f"Mean Squared Error: {mse:.6f}")
print(f"Root Mean Squared Error: {rmse:.6f}")
print(f"Mean Absolute Error: {mae:.6f}")
print(f"R² Score: {r2:.4f}")
print(f"\nIn NDVI terms:")
print(f"Average prediction error: ±{mae:.3f} NDVI units")
print(f"\n📈 Interpretation:")
print(f"  - The model explains {r2*100:.1f}% of the variance in NDVI values")
print(f"  - For drought detection (NDVI < 0.4), an error of ±{mae:.3f} is {'acceptable' if mae < 0.05 else 'moderate'}")

## 📈 Visualizing Predictions - COMPLETE SOLUTION

In [None]:
def visualize_predictions(y_true, y_pred, dates=None, n_points=100):
    """
    Visualize actual vs predicted NDVI values.
    """
    fig, axes = plt.subplots(2, 1, figsize=(14, 8))
    
    # Plot 1: Time series comparison
    ax1 = axes[0]
    x_axis = range(len(y_true[:n_points]))
    
    ax1.plot(x_axis, y_true[:n_points], 'g-', label='Actual NDVI', linewidth=2, alpha=0.7)
    ax1.plot(x_axis, y_pred[:n_points], 'b--', label='Predicted NDVI', linewidth=2, alpha=0.7)
    
    # Highlight areas with large errors
    errors = np.abs(y_true[:n_points].flatten() - y_pred[:n_points].flatten())
    large_errors = errors > 0.1
    if np.any(large_errors):
        ax1.scatter(np.where(large_errors)[0], y_true[:n_points][large_errors], 
                   color='red', s=30, alpha=0.5, label='Large errors (>0.1)')
    
    ax1.set_xlabel('Time Steps', fontsize=11)
    ax1.set_ylabel('NDVI', fontsize=11)
    ax1.set_title('LSTM Predictions vs Actual NDVI Values', fontsize=12, fontweight='bold')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    
    # Plot 2: Scatter plot
    ax2 = axes[1]
    ax2.scatter(y_true, y_pred, alpha=0.5, s=10)
    
    # Add perfect prediction line
    min_val = min(y_true.min(), y_pred.min())
    max_val = max(y_true.max(), y_pred.max())
    ax2.plot([min_val, max_val], [min_val, max_val], 'r--', alpha=0.5, label='Perfect prediction', linewidth=2)
    
    ax2.set_xlabel('Actual NDVI', fontsize=11)
    ax2.set_ylabel('Predicted NDVI', fontsize=11)
    ax2.set_title('Prediction Accuracy Scatter Plot', fontsize=12, fontweight='bold')
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()

visualize_predictions(y_val_actual, y_pred)

## 🌾 Module 4: LSTM Applications in Philippine Earth Observation

### Real-World Applications

1. **Drought Forecasting (Mindanao)**
   - Predict drought 1-3 months ahead
   - Enable early warning for farmers
   - Support irrigation planning

2. **Crop Yield Prediction**
   - Combine NDVI with weather data
   - Forecast rice/corn yields
   - Support food security planning

3. **Phenology Analysis**
   - Track cropping calendars
   - Detect planting/harvest dates
   - Monitor seasonal shifts due to climate change

4. **Land Cover Change Detection**
   - Identify deforestation patterns
   - Monitor urban expansion
   - Track agricultural conversion

### Practical Deployment Considerations

For operational use in Philippine agencies:

1. **Data Requirements**
   - Minimum 2-3 years of historical data
   - Regular updates (weekly/bi-weekly)
   - Cloud-free observations critical

2. **Model Updates**
   - Retrain quarterly with new data
   - Validate against ground truth
   - Account for seasonal variations

3. **Integration with Existing Systems**
   - DOST-ASTI DATOS platform
   - PAGASA seasonal forecasts
   - PhilSA Space+ Dashboard
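
Because cloud-free observations are not always available, operational pipelines typically need a gap-filling step before sequences are built. The sketch below shows one simple option, linear interpolation over time with `np.interp`; the NDVI values are hypothetical, and other approaches (temporal smoothing filters, SAR-based fill) may suit a given agency better.

```python
import numpy as np

# Hypothetical 10-day NDVI composites; NaN marks cloud-contaminated dates
days = np.arange(0, 80, 10)
ndvi_obs = np.array([0.72, 0.70, np.nan, np.nan, 0.58, 0.55, np.nan, 0.60])

valid = ~np.isnan(ndvi_obs)
ndvi_filled = ndvi_obs.copy()
# Interpolate linearly between the nearest valid observations on either side
ndvi_filled[~valid] = np.interp(days[~valid], days[valid], ndvi_obs[valid])

print(np.round(ndvi_filled, 3))
```

Long gaps during the wet season can hide real vegetation changes, so interpolated values are best flagged and treated with caution downstream.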

### 💭 Think-Through Discussion - ANSWER

**Question**: How would you modify this LSTM approach to integrate multiple data sources (e.g., Sentinel-1 SAR, Sentinel-2 optical, weather data) for improved drought prediction? What challenges might arise?

**Answer**: 
- **Architecture Modification**: Change input shape from (seq_length, 1) to (seq_length, n_features) to accommodate multiple variables
- **Normalization**: Each variable must be normalized separately as they have different scales
- **Temporal Alignment**: Ensure all data sources have the same temporal resolution or use interpolation
- **Missing Data**: SAR data may be more consistently available than optical data (cloud-independent)
- **Feature Engineering**: Could include derived indices like NDWI, EVI, VH/VV ratio
- **Challenges**: Different revisit times, data volume increases significantly, increased model complexity
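
The input-shape and normalization points above can be sketched as follows. This is a minimal illustration using random placeholder values for three already time-aligned variables; the variable choice, value ranges, and the names `X_multi` and `y_multi` are assumptions, not real data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps, seq_length = 60, 12

# Hypothetical, already time-aligned series (random placeholders, not real data)
features = np.column_stack([
    rng.uniform(0.3, 0.8, n_steps),     # NDVI (unitless)
    rng.uniform(-20, -10, n_steps),     # SAR VH backscatter (dB)
    rng.uniform(0, 300, n_steps),       # precipitation (mm)
])

# Min-max scale each column separately, using training rows only for the statistics
n_train = int(0.8 * n_steps)
col_min = features[:n_train].min(axis=0)
col_max = features[:n_train].max(axis=0)
scaled = (features - col_min) / (col_max - col_min)

# All features as input windows; next-step NDVI (column 0) as the target
X_multi = np.stack([scaled[i:i + seq_length] for i in range(n_steps - seq_length)])
y_multi = scaled[seq_length:, 0]

print(X_multi.shape, y_multi.shape)
```

The resulting input has shape `(samples, seq_length, n_features)`, so the only change needed in the model is setting `n_features=3`.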

## 🎯 Mini-Challenge 3 - SOLUTION: Drought Alert System

In [None]:
# SOLUTION: Mini-Challenge 3 - Implement drought alert system
def drought_alert_system(predicted_ndvi, historical_mean=0.7, threshold_mild=0.6, 
                         threshold_severe=0.4):
    """
    Generate drought alerts based on predicted NDVI values.
    
    Parameters:
    -----------
    predicted_ndvi : float
        Predicted NDVI value
    historical_mean : float
        Historical mean NDVI for the location
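    threshold_mild : float
        NDVI below this value (and at or above threshold_severe) triggers a MODERATE alert;
        values from here up to 90% of historical_mean trigger a MILD alert
    threshold_severe : float
        NDVI below this value triggers a SEVERE alert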
    
    Returns:
    --------
    dict : Alert information
    """
    
    deviation = ((predicted_ndvi - historical_mean) / historical_mean) * 100
    
    if predicted_ndvi < threshold_severe:
        alert_level = "SEVERE"
        color = "🔴"
        message = "SEVERE DROUGHT ALERT: Immediate action required"
        recommendations = [
            "Activate emergency irrigation systems",
            "Consider crop insurance claims",
            "Monitor daily for further deterioration",
            "Coordinate with PAGASA and DA for support"
        ]
    elif predicted_ndvi < threshold_mild:
        alert_level = "MODERATE"
        color = "🟠"
        message = "MODERATE DROUGHT WARNING: Monitor closely and prepare interventions"
        recommendations = [
            "Optimize irrigation schedules",
            "Monitor soil moisture levels",
            "Prepare drought-resistant crop varieties",
            "Update contingency plans"
        ]
    elif predicted_ndvi < historical_mean * 0.9:
        alert_level = "MILD"
        color = "🟡"
        message = "MILD STRESS DETECTED: Continue monitoring"
        recommendations = [
            "Maintain regular monitoring",
            "Check weather forecasts",
            "Ensure irrigation systems are functional"
        ]
    else:
        alert_level = "NORMAL"
        color = "🟢"
        message = "NORMAL CONDITIONS: No drought detected"
        recommendations = [
            "Continue routine monitoring",
            "Maintain preparedness for future events"
        ]
    
    return {
        'alert_level': alert_level,
        'message': f"{color} {message}",
        'ndvi_value': predicted_ndvi,
        'deviation_percent': deviation,
        'recommendations': recommendations
    }

# Test the alert system with various NDVI values.
# 0.62 exercises the MILD branch (0.60 <= NDVI < 0.63 with the default settings).
test_values = [0.3, 0.45, 0.55, 0.62, 0.65, 0.75]

print("🚨 DROUGHT ALERT SYSTEM TEST\n")
print("="*70)

for val in test_values:
    alert = drought_alert_system(val)
    print(f"\nNDVI: {val:.2f}")
    print(f"Alert: {alert['message']}")
    print(f"Deviation from normal: {alert['deviation_percent']:.1f}%")
    print(f"Recommendations:")
    for i, rec in enumerate(alert['recommendations'], 1):
        print(f"  {i}. {rec}")
    print("-"*70)

# Test with actual predictions from our model
print("\n\n🔍 ANALYZING MODEL PREDICTIONS\n")
print("="*70)

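# Find the lowest predicted NDVI (potential drought).
# NOTE: the alert thresholds are in NDVI units. If y_pred is still in the
# scaler's normalized range, apply scaler.inverse_transform(y_pred) first.
min_pred_idx = y_pred.argmin()
min_ndvi = y_pred.flatten()[min_pred_idx]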

print(f"\nLowest predicted NDVI in validation set: {min_ndvi:.3f}")
alert = drought_alert_system(min_ndvi)
print(f"\n{alert['message']}")
print(f"\nRecommended Actions:")
for i, rec in enumerate(alert['recommendations'], 1):
    print(f"  {i}. {rec}")
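
When alerts are needed for a whole batch of predictions rather than a single value, the same threshold logic can be vectorized. The sketch below is one possible approach, assuming the predictions are already in NDVI units (not scaled). `classify_alert_levels` is a hypothetical helper that mirrors the default thresholds of `drought_alert_system` above, and the prediction values are made up.

```python
import numpy as np

def classify_alert_levels(ndvi, historical_mean=0.7,
                          threshold_mild=0.6, threshold_severe=0.4):
    """Vectorized alert levels for an array of NDVI values (hypothetical helper)."""
    ndvi = np.asarray(ndvi, dtype=float)
    levels = np.full(ndvi.shape, "NORMAL", dtype="U8")
    # Apply masks from least to most severe so the stricter class wins
    levels[ndvi < historical_mean * 0.9] = "MILD"
    levels[ndvi < threshold_mild] = "MODERATE"
    levels[ndvi < threshold_severe] = "SEVERE"
    return levels

# Made-up predictions in NDVI units
predictions = np.array([0.30, 0.45, 0.55, 0.62, 0.65, 0.75])
levels = classify_alert_levels(predictions)

# Summarize how many predictions fall into each alert level
for level, count in zip(*np.unique(levels, return_counts=True)):
    print(f"{level:<8} {count}")
```

Note that NaN inputs (for example, cloud gaps) fail every comparison and would be labelled NORMAL here, so missing values should be handled before classification.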

## 🔬 Advanced: Multi-Step Ahead Prediction - COMPLETE SOLUTION

In [None]:
def multi_step_prediction(model, initial_sequence, n_steps=3, scaler=None):
    """
    Predict multiple time steps into the future.
    
    Parameters:
    -----------
    model : keras.Model
        Trained LSTM model
    initial_sequence : np.array
        Initial sequence to start predictions from
    n_steps : int
        Number of steps to predict ahead
    scaler : MinMaxScaler
        Scaler to inverse transform predictions
    
    Returns:
    --------
    predictions : list or np.ndarray
        Scaled predictions as a list, or a 1-D array in original units
        if a scaler is provided
    """
    
    current_sequence = initial_sequence.copy()
    predictions = []
    
    for step in range(n_steps):
        # Predict next time step
        next_pred = model.predict(current_sequence.reshape(1, -1, 1), verbose=0)
        predictions.append(next_pred[0, 0])
        
        # Update sequence: remove first element, add prediction
        current_sequence = np.append(current_sequence[1:], next_pred)
    
    # Inverse transform if scaler provided
    if scaler is not None:
        predictions = scaler.inverse_transform(np.array(predictions).reshape(-1, 1)).flatten()
    
    return predictions

# SOLUTION: Test multi-step prediction
print("🔮 MULTI-STEP AHEAD PREDICTION TEST\n")
print("="*70)

# Select a test sequence from the validation set.
# Assuming stride-1 sliding windows, y_val[:n] holds the n values that follow X_val[0].
n_steps_ahead = 3
test_sequence = X_val[0]
actual_future = y_val[:n_steps_ahead]

# Make multi-step predictions (returned in original NDVI units via the scaler)
multi_predictions = multi_step_prediction(model, test_sequence, n_steps=n_steps_ahead, scaler=scaler)

# Convert the actual values back to NDVI units (reshape so 1-D targets also work)
actual_values = scaler.inverse_transform(np.asarray(actual_future).reshape(-1, 1)).flatten()

# Print results
print(f"\nPredicting {n_steps_ahead} time steps ahead (approx. {n_steps_ahead*10} days)\n")

for i, (pred, actual) in enumerate(zip(multi_predictions, actual_values), 1):
    error = abs(pred - actual)
    # Relative error is undefined when the actual NDVI is zero
    error_pct = (error / abs(actual)) * 100 if actual != 0 else float('nan')
    
    print(f"Step {i} (Day {i*10}):")
    print(f"  Predicted NDVI: {pred:.3f}")
    print(f"  Actual NDVI:    {actual:.3f}")
    print(f"  Error:          {error:.3f} ({error_pct:.1f}%)")
    
    # Check drought status
    if pred < 0.4:
        print(f"  ⚠️  Drought conditions predicted!")
    print()

# Visualize multi-step predictions
plt.figure(figsize=(12, 5))

# Historical sequence (reshape so both (seq_len,) and (seq_len, 1) inputs work)
hist_sequence = scaler.inverse_transform(np.asarray(test_sequence).reshape(-1, 1)).flatten()
time_historical = range(len(hist_sequence))
time_future = range(len(hist_sequence), len(hist_sequence) + n_steps_ahead)

plt.plot(time_historical, hist_sequence, 'b-o', label='Historical NDVI', linewidth=2)
plt.plot(time_future, multi_predictions, 'r--o', label='Predicted NDVI', linewidth=2, markersize=8)
plt.plot(time_future, actual_values, 'g-s', label='Actual NDVI', linewidth=2, markersize=8)

# Add drought threshold
plt.axhline(y=0.4, color='orange', linestyle=':', alpha=0.7, linewidth=2, label='Drought threshold')

plt.axvline(x=len(hist_sequence)-0.5, color='gray', linestyle='--', alpha=0.5)
plt.xlabel('Time Steps (10-day periods)', fontsize=11)
plt.ylabel('NDVI', fontsize=11)
plt.title('Multi-Step LSTM Forecast for Drought Monitoring', fontsize=12, fontweight='bold')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Calculate and display overall accuracy
mae_multistep = mean_absolute_error(actual_values, multi_predictions)
print(f"\n📊 Multi-step Prediction Performance:")
print(f"Mean Absolute Error: {mae_multistep:.4f}")
print("\n💡 Insight: Recursive forecasts feed each prediction back in as input, so errors compound")
print("and accuracy typically decreases with longer forecasting horizons.")
print("For operational drought early warning, 1-3 step ahead predictions are most reliable.")
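
To check the insight above on your own data, the error can be measured separately for each forecast horizon across many starting windows. The sketch below assumes a generic `predict_next` callable standing in for the trained model (here a simple persistence forecast that repeats the last value) and a synthetic seasonal series. Neither comes from the session notebook, and the helper names are hypothetical.

```python
import numpy as np

def recursive_forecast(predict_next, window, n_steps):
    """Roll a one-step predictor forward, feeding predictions back as inputs."""
    window = np.asarray(window, dtype=float).copy()
    preds = []
    for _ in range(n_steps):
        nxt = float(predict_next(window))
        preds.append(nxt)
        window = np.append(window[1:], nxt)  # drop oldest value, append prediction
    return np.array(preds)

def mae_per_horizon(predict_next, series, seq_length, n_steps):
    """Mean absolute error at each horizon, averaged over all valid start points."""
    series = np.asarray(series, dtype=float)
    errors = []
    for start in range(len(series) - seq_length - n_steps + 1):
        window = series[start:start + seq_length]
        actual = series[start + seq_length:start + seq_length + n_steps]
        preds = recursive_forecast(predict_next, window, n_steps)
        errors.append(np.abs(preds - actual))
    return np.mean(errors, axis=0)

def persistence(window):
    """Stand-in predictor (assumption): the next value equals the last value."""
    return window[-1]

# Synthetic NDVI-like series: 36 ten-day periods per year, 3 years
t = np.arange(108)
series = 0.6 + 0.15 * np.sin(2 * np.pi * t / 36)

horizon_mae = mae_per_horizon(persistence, series, seq_length=12, n_steps=3)
for step, mae in enumerate(horizon_mae, 1):
    print(f"Horizon {step}: MAE = {mae:.4f}")
```

To evaluate the trained LSTM instead, `predict_next` could wrap the model call used in `multi_step_prediction`, for example `lambda w: model.predict(w.reshape(1, -1, 1), verbose=0)[0, 0]`, applied to the scaled series.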

## 📋 Key Takeaways

### What We've Learned

1. **Time Series in EO**: Critical for monitoring environmental changes and predicting future conditions

2. **RNN Limitations**: Vanilla RNNs suffer from vanishing/exploding gradients, limiting their ability to learn long-term dependencies

3. **LSTM Architecture**: Gates (forget, input, output) and cell state enable learning of both short and long-term patterns

4. **Implementation**:
   - Prepare data with sliding windows
   - Normalize inputs, fitting the scaler on the training split only
   - Use early stopping to limit overfitting
   - Use multi-step predictions for operational forecasting, and validate each horizon

5. **Philippine Applications**: Drought monitoring in Mindanao is a critical use case with immediate practical value
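
The data preparation steps listed under Implementation can be sketched as follows. This is a minimal illustration on a synthetic series, using a hand-written min-max scaling in place of the notebook's `MinMaxScaler` so the block stays self-contained. `make_windows` and `scale` are hypothetical helpers, and the split point and sequence length are arbitrary choices.

```python
import numpy as np

def make_windows(series, seq_length):
    """Turn a 1-D series into (samples, seq_length, 1) inputs and next-step targets."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + seq_length] for i in range(len(series) - seq_length)])
    y = series[seq_length:]
    return X[..., np.newaxis], y

# Synthetic NDVI-like series (36 ten-day periods per year)
t = np.arange(100)
ndvi = 0.6 + 0.15 * np.sin(2 * np.pi * t / 36)

# Chronological split: never shuffle before splitting a time series
split = 80
train, val = ndvi[:split], ndvi[split:]

# Min-max statistics come from the training split only, to avoid leaking
# information from the validation period into training
lo, hi = train.min(), train.max()

def scale(x):
    return (x - lo) / (hi - lo)

seq_length = 12
X_train, y_train = make_windows(scale(train), seq_length)
X_val, y_val = make_windows(scale(val), seq_length)

print(X_train.shape, y_train.shape)
print(X_val.shape, y_val.shape)
```

Windowing each split separately, as done here, means no validation window contains training values. The cost is losing the first `seq_length` validation targets.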

### Best Practices

✅ **DO:**
- Normalize your data before training
- Use sufficient historical data (2+ years)
- Validate predictions against ground truth
- Consider ensemble approaches for operational systems
- Account for data gaps and cloud cover
- Implement proper error handling and alerts

❌ **DON'T:**
- Ignore seasonal patterns in your data
- Use sequences that are too short (< 6 time steps)
- Deploy without thorough validation
- Forget to retrain with new data periodically
- Over-rely on multi-step predictions without validation
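
One way to account for seasonal patterns is to compare each observation against the typical value for the same time of year rather than against a single fixed mean. The sketch below is an illustration under stated assumptions: a synthetic three-year series with 36 ten-day periods per year, a made-up drought dip in the third year, and a hypothetical `seasonal_anomaly` helper.

```python
import numpy as np

def seasonal_anomaly(series, periods_per_year):
    """Return the per-period climatology and the anomaly of each observation.

    Only complete years are used; any trailing partial year is dropped.
    """
    series = np.asarray(series, dtype=float)
    n_years = len(series) // periods_per_year
    by_year = series[:n_years * periods_per_year].reshape(n_years, periods_per_year)
    climatology = by_year.mean(axis=0)           # typical value for each period of the year
    anomalies = (by_year - climatology).ravel()  # departure from the seasonal norm
    return climatology, anomalies

periods_per_year = 36
t = np.arange(3 * periods_per_year)
ndvi = 0.6 + 0.15 * np.sin(2 * np.pi * t / periods_per_year)
ndvi[72 + 10:72 + 16] -= 0.15   # made-up drought dip in year 3

climatology, anomalies = seasonal_anomaly(ndvi, periods_per_year)
print(f"Largest negative anomaly: {anomalies.min():.2f} at step {anomalies.argmin()}")
```

Because the drought year is included in the climatology, the anomaly is smaller than the injected dip. With a longer record, the baseline could be computed from reference years only.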

### Next Steps

1. **Experiment**: Try different sequence lengths and LSTM architectures
2. **Enhance**: Add weather data and other indices (EVI, NDWI)
3. **Scale**: Apply to your area of interest using Google Earth Engine
4. **Integrate**: Connect with Philippine EO platforms (DATOS, Space+)
5. **Deploy**: Build operational systems with alert mechanisms

### 🚀 Extension Ideas

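- **Bidirectional LSTMs**: Process each input window in both forward and backward directions (for forecasting, they still see only data up to the forecast origin)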
- **Attention Mechanisms**: Focus on most important time steps
- **Multi-variate Inputs**: Combine NDVI, temperature, precipitation, SAR
- **Ensemble Models**: Combine multiple LSTM models for robust predictions
- **Real-time Integration**: Connect to Google Earth Engine for live data streams
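
The ensemble idea above can be sketched in a few lines. This is a minimal illustration, assuming each ensemble member has already produced a multi-step forecast in NDVI units. The member values below are made up, and `ensemble_forecast` is a hypothetical helper.

```python
import numpy as np

def ensemble_forecast(member_predictions):
    """Combine forecasts of shape (n_models, n_steps) into a mean and a spread."""
    preds = np.asarray(member_predictions, dtype=float)
    return preds.mean(axis=0), preds.std(axis=0)

# Made-up 3-step forecasts from three independently trained models
member_predictions = [
    [0.62, 0.58, 0.52],
    [0.60, 0.55, 0.46],
    [0.64, 0.57, 0.49],
]

mean_forecast, spread = ensemble_forecast(member_predictions)

for step, (m, s) in enumerate(zip(mean_forecast, spread), 1):
    print(f"Step {step}: mean NDVI = {m:.3f}, spread = {s:.3f}")
```

The spread across members gives a rough indication of forecast uncertainty. A large spread at a given step suggests the corresponding alert should be treated with more caution.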

## 📚 References and Further Reading

### Scientific Papers
1. Hochreiter, S., & Schmidhuber, J. (1997). "Long Short-Term Memory." Neural Computation.
2. Rußwurm, M., & Körner, M. (2020). "Self-attention for raw optical Satellite Time Series Classification." ISPRS Journal of Photogrammetry and Remote Sensing.
3. Interdonato, R., et al. (2019). "DuPLO: A DUal view Point deep Learning architecture for time series classificatiOn." ISPRS Journal of Photogrammetry and Remote Sensing.
4. Nguyen, L. H., et al. (2020). "Monitoring agriculture areas with satellite images and deep learning." Applied Soft Computing.

### Philippine EO Resources
- PhilSA Space+ Dashboard: [https://space.philsa.gov.ph](https://space.philsa.gov.ph)
- DOST-ASTI DATOS: [https://datos.asti.dost.gov.ph](https://datos.asti.dost.gov.ph)
- PAGASA Drought Monitoring: [https://www.pagasa.dost.gov.ph](https://www.pagasa.dost.gov.ph)

### Tutorials and Documentation
- TensorFlow Time Series Tutorial: [https://www.tensorflow.org/tutorials/structured_data/time_series](https://www.tensorflow.org/tutorials/structured_data/time_series)
- Understanding LSTM Networks (Colah's Blog): [https://colah.github.io/posts/2015-08-Understanding-LSTMs/](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)
- Google Earth Engine Time Series Guide: [https://developers.google.com/earth-engine/guides/reducers_reduce_region](https://developers.google.com/earth-engine/guides/reducers_reduce_region)

---

**End of Session 1: LSTMs for Earth Observation Time Series**

**INSTRUCTOR VERSION - Complete with all solutions**

Proceed to Session 2: Foundation Models and Transfer Learning 🚀