# 📘 LSTM for Biogas Prediction (Production Ready)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/benmola/OpenAD-lib/blob/main/notebooks/03_LSTM_Prediction_Updated.ipynb)

This notebook demonstrates **LSTM-based biogas prediction** using the unified OpenAD-lib API.

**⚠️ This notebook uses the updated OpenAD-lib unified API**

---

## 📚 References
- **LSTM for AD**: [Murali et al. (2025) - LAPSE](https://psecommunity.org/LAPSE:2025.0213)

## 🔬 LSTM Background

### Why LSTM for Time-Series?

Biogas production depends on **past substrate loading**, making it a time-series problem:
- **Input at t-1** affects output at **t**
- LSTM's internal memory captures these temporal dependencies

### LSTM Cell Equations

**Forget Gate** (what to forget from memory):
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

**Input Gate** (what new info to store):
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$

**Cell State Update**:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

**Output Gate**:
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
$$h_t = o_t \odot \tanh(C_t)$$
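
The gate equations above can be traced in a few lines of NumPy. This is a minimal sketch for illustration only, not the implementation inside `openad.LSTMModel`: the helper names `lstm_cell_step` and `init_params`, the random weights, and the dimensions (6 inputs, 24 hidden units) are all assumptions chosen to mirror the notebook's setup.

```python
import numpy as np

def sigmoid(z):
    """Logistic function used by the forget, input and output gates."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, params):
    """One forward step of an LSTM cell (hypothetical helper)."""
    concat = np.concatenate([h_prev, x_t])                      # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ concat + params["b_f"])       # forget gate
    i_t = sigmoid(params["W_i"] @ concat + params["b_i"])       # input gate
    c_tilde = np.tanh(params["W_C"] @ concat + params["b_C"])   # candidate state
    c_t = f_t * c_prev + i_t * c_tilde                          # cell state update
    o_t = sigmoid(params["W_o"] @ concat + params["b_o"])       # output gate
    h_t = o_t * np.tanh(c_t)                                    # hidden state
    return h_t, c_t

def init_params(input_dim, hidden_dim, seed=0):
    """Small random weights and zero biases (illustrative values only)."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in ("f", "i", "C", "o"):
        params[f"W_{gate}"] = rng.normal(
            scale=0.1, size=(hidden_dim, hidden_dim + input_dim)
        )
        params[f"b_{gate}"] = np.zeros(hidden_dim)
    return params

input_dim, hidden_dim = 6, 24          # assumed: 6 feedstocks, 24 hidden units
params = init_params(input_dim, hidden_dim)
h = np.zeros(hidden_dim)
c = np.zeros(hidden_dim)

# Feed a short random sequence through the cell, carrying h and c forward.
sequence = np.random.default_rng(1).normal(size=(5, input_dim))
for x_t in sequence:
    h, c = lstm_cell_step(x_t, h, c, params)

print(h.shape, c.shape)
```

Because $h_t$ is the product of a sigmoid output and a $\tanh$ output, every entry of `h` stays strictly between -1 and 1, while the cell state `c` is free to grow across steps.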

### Key Preprocessing: Time-Lagged Features

We use `series_to_supervised()` to create features like:
- `Maize(t-1)` → predicts `Biogas(t)`
- `Wholecrop(t-1)` → predicts `Biogas(t)`

This captures the **lag** between feeding and biogas production.
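
To make the lagging step concrete, here is a minimal pandas sketch of the idea. It is not the library's `series_to_supervised()`: the helper `make_lagged_frame`, the toy values, and the column name `Biogas` are hypothetical and only illustrate how features at `t-1` are paired with the target at `t`.

```python
import pandas as pd

def make_lagged_frame(df, feature_cols, target_col, n_in=1):
    """Pair features at t-n_in .. t-1 with the target at t (hypothetical helper)."""
    parts = []
    for lag in range(n_in, 0, -1):
        shifted = df[feature_cols].shift(lag)            # move rows down by `lag` days
        shifted.columns = [f"{col}(t-{lag})" for col in feature_cols]
        parts.append(shifted)
    target = df[[target_col]].rename(columns={target_col: f"{target_col}(t)"})
    parts.append(target)
    # The first n_in rows have no history, so they contain NaN and are dropped.
    return pd.concat(parts, axis=1).dropna().reset_index(drop=True)

# Toy data, invented for illustration only.
toy = pd.DataFrame({
    "Maize": [10.0, 12.0, 11.0, 13.0],
    "Wholecrop": [5.0, 4.0, 6.0, 5.0],
    "Biogas": [100.0, 110.0, 120.0, 115.0],
})

lagged = make_lagged_frame(toy, ["Maize", "Wholecrop"], "Biogas", n_in=1)
print(lagged)
```

With `n_in=1`, four daily rows yield three supervised samples, since the first day has no preceding feed record to pair with.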

## 1️⃣ Setup (Google Colab)

In [None]:
# Install OpenAD-lib with ML dependencies (PyTorch, etc.)
!pip install git+https://github.com/benmola/OpenAD-lib.git

import sys
import os

IN_COLAB = 'google.colab' in sys.modules

if not IN_COLAB:
    sys.path.append(os.path.join(os.getcwd(), '..', 'src'))

print(f"Running in Colab: {IN_COLAB}")

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Unified Import
import openad_lib as openad

print("✅ All imports successful!")

## 2️⃣ Load Time-Series Data

**Dataset:** `sample_LSTM_timeseries.csv`
- **424 daily samples** from a real biogas plant
- **Features:** Feedstock composition (Maize, Chicken Litter, etc.) in tonnes/day
- **Target:** Total biogas production (m³/day)

In [None]:
# Download data for Colab
if IN_COLAB:
    !wget -q https://raw.githubusercontent.com/benmola/OpenAD-lib/main/src/openad_lib/data/sample_LSTM_timeseries.csv
    data_path = 'sample_LSTM_timeseries.csv'
else:
    base_path = os.path.dirname(os.getcwd())
    data_path = os.path.join(base_path, 'src', 'openad_lib', 'data', 'sample_LSTM_timeseries.csv')

# Load and inspect (Standard pandas load)
data = pd.read_csv(data_path).dropna()
print(f"📊 Loaded {len(data)} samples")
print(f"\nColumns: {list(data.columns)}")
data.head()

## 3️⃣ Simplified Preprocessing

The `LSTMModel` now handles scaling and feature engineering internally.
We just need to define our features and target, and prepare the lag dataset.
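
The scaling that `fit()` performs internally is not shown in this notebook. As a rough illustration of what such a step typically involves, the sketch below assumes standardisation (zero mean, unit variance) with statistics taken from the training portion only. The class `SimpleStandardizer` and the random demo data are hypothetical and are not part of OpenAD-lib.

```python
import numpy as np

class SimpleStandardizer:
    """Column-wise standardisation (hypothetical stand-in for an internal scaler)."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        std = X.std(axis=0)
        # Guard against constant columns to avoid division by zero.
        self.std_ = np.where(std == 0, 1.0, std)
        return self

    def transform(self, X):
        return (np.asarray(X, dtype=float) - self.mean_) / self.std_

    def inverse_transform(self, Z):
        return np.asarray(Z, dtype=float) * self.std_ + self.mean_

# Random demo data standing in for six feedstock columns (tonnes/day).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 50, size=(100, 6))
split = int(len(X_demo) * 0.8)

# Fit on the training rows only, then reuse the same statistics for the test rows.
scaler = SimpleStandardizer().fit(X_demo[:split])
train_scaled = scaler.transform(X_demo[:split])
test_scaled = scaler.transform(X_demo[split:])

print(train_scaled.mean(axis=0).round(6))
print(train_scaled.std(axis=0).round(6))
```

Fitting the statistics on the training rows alone keeps information from the held-out period out of the model inputs, which matters for a chronological split.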

In [None]:
# Define features and target
features = ['Maize', 'Wholecrop', 'Chicken Litter', 'Lactose', 'Apple Pomace', 'Rice bran']
target = 'Total_Biogas'

# Create a temporary model instance only to access its preprocessing helper
temp_model = openad.LSTMModel(input_dim=len(features))

# Prepare data (Creates lags, splits X/y)
# Note: Returns unscaled data. Scaling is handled by fit()
X, y, dataset = temp_model.prepare_time_series_data(
    data, 
    features, 
    target, 
    n_in=1  # 1 day lag
)

print(f"Processed Data Shape: {X.shape}")
print(f"Target Shape: {y.shape}")

In [None]:
# Chronological Split (80/20)
split_idx = int(len(X) * 0.8)

train_X, train_y = X[:split_idx], y[:split_idx]
test_X, test_y = X[split_idx:], y[split_idx:]

print(f"Training samples: {len(train_X)}")
print(f"Testing samples: {len(test_X)}")

## 4️⃣ Build and Train LSTM Model

**Architecture:**
- **Input:** Set from `train_X.shape[1]` (feature count × number of lags)
- **Hidden:** 24 LSTM units
- **Output:** 1 value (biogas prediction)

**Training:**
- 50 epochs
- Adam optimizer (lr=0.001)
- MSE loss

In [None]:
# Initialize optimized LSTM model
lstm = openad.LSTMModel(
    input_dim=train_X.shape[1],
    hidden_dim=24,
    output_dim=1,
    dropout=0.1,
    learning_rate=0.001
)

# Train (handles scaling internally!)
print("🚀 Training LSTM model...\n")
lstm.fit(train_X, train_y, epochs=50, batch_size=4, verbose=True)

## 5️⃣ Evaluate Model Performance

In [None]:
# Evaluate on test data (Returns dictionary of metrics)
metrics = lstm.evaluate(test_X, test_y)

print("📊 Test Set Evaluation:")
openad.utils.metrics.print_metrics(metrics)
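
The keys of the dictionary returned by `evaluate()` are not listed here. As a point of reference, the sketch below computes three common regression metrics (RMSE, MAE and R²) by hand. The function `regression_metrics` and the sample numbers are hypothetical, and the library may report a different set of metrics.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute RMSE, MAE and R² for two equal-length sequences (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    residuals = y_true - y_pred
    ss_res = float(np.sum(residuals ** 2))                    # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))     # total sum of squares
    return {
        "rmse": float(np.sqrt(np.mean(residuals ** 2))),
        "mae": float(np.mean(np.abs(residuals))),
        # R² is undefined when the true values are constant.
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }

# Invented values in m³/day, for illustration only.
demo = regression_metrics([100, 110, 120, 130], [102, 108, 123, 129])
for name, value in demo.items():
    print(f"{name}: {value:.4f}")
```

RMSE and MAE are in the same units as the target (m³/day), so they can be read directly against typical daily production, while R² is unitless.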

## 6️⃣ Visualize Results

In [None]:
# Get predictions for plotting
y_pred = lstm.predict(test_X)

# Use unified plotting
openad.plots.plot_predictions(
    y_true=test_y,
    y_pred=y_pred,
    title="LSTM Prediction (Test Set)",
    xlabel="Time (days)",
    ylabel="Biogas Production (m³/day)",
    show=True
)

## 📝 Summary

This notebook demonstrated:

1. **Simplified API** - `openad.LSTMModel` handles low-level details
2. **Automatic Scaling** - No need for manual StandardScaler steps
3. **Data Prep Helper** - `prepare_time_series_data` handles lag creation
4. **Unified Plotting** - Consistent visuals

### Next Steps

- Compare with [Multi-Task GP](04_MTGP_Prediction_Updated.ipynb) for uncertainty quantification
- Try [ADM1](01_ADM1_Tutorial_Updated.ipynb) for process understanding