# 📘 Multi-Task Gaussian Process (Production Ready)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/benmola/OpenAD-lib/blob/main/notebooks/04_MTGP_Prediction_Updated.ipynb)

Predicting **multiple AD outputs** (SCOD, VFA, Biogas) with **uncertainty quantification** using the unified OpenAD-lib API.

**⚠️ This notebook uses the updated OpenAD-lib unified API**

---

## 📚 References
- **MTGP for AD**: [Dekhici et al. (2025) - LAPSE](https://psecommunity.org/LAPSE:2025.0155)

## 🔬 Gaussian Process Background

### What is a Gaussian Process?

A GP defines a **distribution over functions**:

$$f(x) \sim \mathcal{GP}(m(x), k(x, x'))$$

Where:
- $m(x)$ = mean function (usually 0)
- $k(x, x')$ = **kernel** function measuring similarity
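
To make "a distribution over functions" concrete, the sketch below draws three functions from a zero-mean GP prior on a 1-D grid. The squared-exponential (RBF) kernel, its lengthscale, and the grid are illustrative assumptions; they are not taken from OpenAD-lib.

```python
import numpy as np

def rbf_kernel(xa, xb, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel k(x, x') for 1-D inputs (assumed kernel choice)."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale**2)

rng = np.random.default_rng(0)
x_grid = np.linspace(0.0, 10.0, 50)

# Zero mean function m(x) = 0 and covariance matrix K[i, j] = k(x_i, x_j)
mean_vec = np.zeros_like(x_grid)
K = rbf_kernel(x_grid, x_grid)

# A small jitter on the diagonal keeps the covariance numerically positive definite
K_jittered = K + 1e-8 * np.eye(len(x_grid))

# Each row is one function drawn from the GP prior, evaluated on x_grid
prior_samples = rng.multivariate_normal(mean_vec, K_jittered, size=3)
print(prior_samples.shape)  # (3, 50)
```

A shorter lengthscale gives wigglier samples and a longer one gives smoother samples, which is how the kernel encodes the notion of similarity between inputs.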

### Why GPs for Biogas?

1. **Uncertainty Quantification** - Get confidence intervals for free!
2. **Data Efficient** - Work well with small datasets (50-200 samples)
3. **Non-parametric** - No fixed functional form is assumed; prior assumptions enter through the kernel

### Multi-Task Learning with LMC

**Problem:** Predict 3 correlated outputs (SCOD, VFA, Biogas)

**Solution:** Linear Model of Coregionalization (LMC)

$$f_t(x) = \sum_{q=1}^{Q} a_{t,q} \cdot u_q(x)$$

- $f_t$ = function for task $t$ (e.g., VFA prediction)
- $u_q$ = shared latent function $q$
- $a_{t,q}$ = weight (learned automatically)

**Key Insight:** VFA and Biogas are correlated → share information!
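
The LMC sum can be written as a single matrix product. The sketch below uses fixed sine and cosine curves as stand-ins for the latent functions $u_q$ and random mixing weights; in a trained model both would be learned, so every value here is illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
num_tasks, num_latents, num_points = 3, 2, 200  # T, Q, N (illustrative sizes)

x = np.linspace(0.0, 10.0, num_points)

# Stand-ins for the latent functions u_q(x); a real model would infer these
U = np.stack([np.sin(x), np.cos(0.5 * x)])     # shape (Q, N)

# Mixing weights a_{t,q}; learned during training in practice, random here
A = rng.normal(size=(num_tasks, num_latents))  # shape (T, Q)

# f_t(x) = sum_q a_{t,q} * u_q(x), evaluated for all tasks at once
F = A @ U                                      # shape (T, N)

# For independent latents sharing one kernel k, the LMC implies
# cov(f_t(x), f_t'(x')) = B[t, t'] * k(x, x') with B = A A^T
B = A @ A.T                                    # shape (T, T)
print(F.shape, B.shape)
```

Off-diagonal entries of `B` are what let one task borrow strength from another: a large positive or negative `B[t, t']` means tasks `t` and `t'` move together or in opposition.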

### Predictive Distribution

$$p(f_* | X_*, X, Y) = \mathcal{N}(\mu_*, \Sigma_*)$$

We get:
- **Mean prediction:** $\mu_*$
- **Uncertainty:** $\mu_* \pm 2\sigma_*$ (approximately a 95% interval, where $\sigma_*^2$ is the diagonal of $\Sigma_*$)
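
For a single-output GP with Gaussian noise, $\mu_*$ and $\Sigma_*$ have a closed form. The sketch below computes them with a Cholesky factorization on toy 1-D data. The RBF kernel, the noise variance, and the data are assumptions for illustration; OpenAD-lib's MTGP uses inducing points (`n_inducing`), so its internals will differ from this exact computation.

```python
import numpy as np

def rbf_kernel(xa, xb, lengthscale=1.0):
    """Squared-exponential kernel for 1-D inputs (assumed kernel choice)."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / lengthscale**2)

# Toy training data (not the AD dataset)
x_train = np.array([0.0, 1.0, 2.5, 4.0, 5.0])
y_train = np.sin(x_train)
x_star = np.linspace(0.0, 5.0, 26)
noise_var = 1e-4

K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
K_s = rbf_kernel(x_train, x_star)    # cross-covariance, shape (N, N*)
K_ss = rbf_kernel(x_star, x_star)    # test covariance, shape (N*, N*)

# mu_* = K_s^T K^{-1} y, solved through the Cholesky factor for stability
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
mu_star = K_s.T @ alpha

# Sigma_* = K_ss - K_s^T K^{-1} K_s
v = np.linalg.solve(L, K_s)
Sigma_star = K_ss - v.T @ v
sigma_star = np.sqrt(np.clip(np.diag(Sigma_star), 0.0, None))

# Approximate 95% band for the latent function
lower = mu_star - 2.0 * sigma_star
upper = mu_star + 2.0 * sigma_star
```

The band is narrow near the training inputs and widens between them, which is the behaviour the uncertainty plots later in the notebook rely on.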

## 1️⃣ Setup

In [None]:
# Install OpenAD-lib from GitHub (the MTGP model relies on the ML dependencies GPyTorch and PyTorch)
!pip install git+https://github.com/benmola/OpenAD-lib.git

import sys
import os

IN_COLAB = 'google.colab' in sys.modules

if not IN_COLAB:
    sys.path.append(os.path.join(os.getcwd(), '..', 'src'))

print(f"Running in Colab: {IN_COLAB}")

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Unified Import
import openad_lib as openad

print("✅ Imports successful!")

## 2️⃣ Load Multi-Output AD Data

**Dataset:** `sample_ad_process_data.csv`

**Inputs (5):**
- `time` - Day number
- `D` - Dilution rate (1/day)
- `SCODin` - Influent SCOD (g COD/L)
- `OLR` - Organic Loading Rate (g COD/L/day)
- `pH` - Reactor pH

**Outputs (3):** All correlated!
- `SCODout` - Effluent SCOD → waste
- `VFAout` - VFA concentration → process stability indicator
- `Biogas` - Biogas production → revenue

**Why predict all 3?** VFA ↑ often means Biogas ↓ (process inhibition)

In [None]:
# Download for Colab
if IN_COLAB:
    !wget -q https://raw.githubusercontent.com/benmola/OpenAD-lib/main/src/openad_lib/data/sample_ad_process_data.csv
    data_path = 'sample_ad_process_data.csv'
else:
    base_path = os.path.dirname(os.getcwd())
    data_path = os.path.join(base_path, 'src', 'openad_lib', 'data', 'sample_ad_process_data.csv')

# Load with pandas; resolving the path explicitly keeps the same code working in Colab and locally
data = pd.read_csv(data_path)
print(f"📊 Loaded {len(data)} samples")
data.head()

In [None]:
# CRITICAL: Define columns explicitly
input_cols = ['time', 'D', 'SCODin', 'OLR', 'pH']
output_cols = ['SCODout', 'VFAout', 'Biogas']

# Extract data
X = data[input_cols].values
Y = data[output_cols].values

print(f"\nInput shape: {X.shape} (5 features)")
print(f"Output shape: {Y.shape} (3 tasks)")

In [None]:
# Alternating indices (Interpolation Split)
train_indices = np.arange(1, len(X), 2)  # [1, 3, 5, 7, ...]
test_indices = np.arange(0, len(X), 2)   # [0, 2, 4, 6, ...]

X_train, X_test = X[train_indices], X[test_indices]
Y_train, Y_test = Y[train_indices], Y[test_indices]

print(f"Training samples: {len(X_train)}")
print(f"Testing samples: {len(X_test)}")
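
Because the alternating split interleaves train and test rows in time, every test point sits between two training points, so the reported metrics measure interpolation. The sketch below reproduces that index logic on a small example and contrasts it with a chronological split, which is an alternative this notebook does not use; the sample count and the 70/30 cut are arbitrary.

```python
import numpy as np

n_samples = 10  # small illustrative size, not the real dataset length

# Alternating (interpolation) split, same construction as in the notebook
train_idx = np.arange(1, n_samples, 2)   # odd rows
test_idx = np.arange(0, n_samples, 2)    # even rows

# Chronological (extrapolation) split: hypothetical alternative for comparison
cut = int(0.7 * n_samples)
train_idx_chrono = np.arange(0, cut)
test_idx_chrono = np.arange(cut, n_samples)

# Both splits should be disjoint and cover every row exactly once
for tr, te in [(train_idx, test_idx), (train_idx_chrono, test_idx_chrono)]:
    assert len(np.intersect1d(tr, te)) == 0
    assert len(np.union1d(tr, te)) == n_samples

print("alternating test rows:  ", test_idx)
print("chronological test rows:", test_idx_chrono)
```

A chronological split is usually the harder test for a GP, since predictions far from the training inputs revert toward the prior mean with wide uncertainty.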

## 4️⃣ MTGP Model Configuration

**Hyperparameters:**
- `num_tasks=3` (SCOD, VFA, Biogas)
- `num_latents=3` (number of shared latent functions $Q$ in the LMC; one per task here)
- `log_transform=True` (Positive-only outputs)
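
How `log_transform` is implemented inside OpenAD-lib is not shown in this notebook. A common approach, assumed here, is to fit the model on $\log y$ and exponentiate the predicted mean and bounds. The sketch below shows the consequence: bounds stay positive and the interval becomes asymmetric on the original scale. All numbers are made up.

```python
import numpy as np

y = np.array([0.5, 1.2, 3.4, 8.0])   # positive-only targets (made-up values)

# The model would be fitted on z = log(y) instead of y
z = np.log(y)

# Pretend the model returned this mean and standard deviation in log space
z_mean = z
z_std = np.full_like(z, 0.1)

# Map the 2-sigma band back to the original scale.
# exp(z_mean) is the median of the implied log-normal, not its mean.
y_pred = np.exp(z_mean)
y_lower = np.exp(z_mean - 2.0 * z_std)
y_upper = np.exp(z_mean + 2.0 * z_std)

for yp, lo, hi in zip(y_pred, y_lower, y_upper):
    print(f"pred={yp:.3f}  lower={lo:.3f}  upper={hi:.3f}")
```

The upper half of each interval is wider than the lower half, and the width scales with the prediction, which suits concentrations and flow rates that cannot go negative.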

In [None]:
# Initialize MTGP (Unified API)
print(f"🔧 Initializing MTGP with {Y.shape[1]} tasks...")

mtgp = openad.MultitaskGP(
    num_tasks=Y.shape[1],
    num_latents=min(3, Y.shape[1]),
    n_inducing=60,
    learning_rate=0.1,
    log_transform=True
)

print("✅ Model initialized")

In [None]:
print("🚀 Training MTGP (500 iterations)...\n")
mtgp.fit(X_train, Y_train, epochs=500, verbose=True)
print("\n✅ Training complete!")

## 6️⃣ Predict with Uncertainty

**GP provides 3 values per prediction:**
1. **Mean** (Prediction)
2. **Lower Bound** (2.5%)
3. **Upper Bound** (97.5%)
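
One way to sanity-check the 2.5% and 97.5% bounds is empirical coverage: the fraction of true values that fall inside the interval, which should be close to 0.95 if the uncertainty is well calibrated. The helper below is a hypothetical utility, not part of OpenAD-lib, and it is demonstrated on synthetic Gaussian data rather than on model output.

```python
import numpy as np

def interval_coverage(y_true, lower, upper):
    """Fraction of true values inside [lower, upper], computed per task (column)."""
    inside = (y_true >= lower) & (y_true <= upper)
    return inside.mean(axis=0)

# Synthetic check: standard-normal "observations" around a zero-mean prediction
rng = np.random.default_rng(42)
n_points, n_tasks = 2000, 3
y_true = rng.normal(loc=0.0, scale=1.0, size=(n_points, n_tasks))

mean = np.zeros((n_points, n_tasks))
std = np.ones((n_points, n_tasks))
lower = mean - 1.96 * std
upper = mean + 1.96 * std

coverage = interval_coverage(y_true, lower, upper)
print(coverage)  # each entry should be near 0.95
```

With real predictions, coverage well below 0.95 suggests overconfident intervals, and coverage near 1.0 suggests intervals that are wider than necessary.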


In [None]:
print("🔮 Predicting on test set for metrics...")
# Evaluate on test set
metrics = mtgp.evaluate(X_test, Y_test, task_names=output_cols)
print("\n📊 MTGP Test Metrics:")
for task, vals in metrics.items():
    print(f"{task:10s}: RMSE={vals['rmse']:.4f}, MAE={vals['mae']:.4f}, R²={vals['r2']:.4f}")
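
For reference, the three metrics can be computed by hand with NumPy. The helper below is a hypothetical cross-check that mirrors the dictionary layout printed above (`rmse`, `mae`, `r2` per task); it is not the implementation behind `mtgp.evaluate`, and the arrays are toy values.

```python
import numpy as np

def per_task_metrics(y_true, y_pred, task_names):
    """RMSE, MAE and R^2 for each output column."""
    results = {}
    for j, name in enumerate(task_names):
        err = y_true[:, j] - y_pred[:, j]
        ss_res = np.sum(err**2)                                    # residual sum of squares
        ss_tot = np.sum((y_true[:, j] - y_true[:, j].mean())**2)   # total sum of squares
        results[name] = {
            "rmse": float(np.sqrt(np.mean(err**2))),
            "mae": float(np.mean(np.abs(err))),
            "r2": float(1.0 - ss_res / ss_tot),
        }
    return results

# Toy data: two tasks, three samples
y_true_demo = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
y_pred_demo = np.array([[1.5, 10.0], [1.5, 20.0], [3.5, 30.0]])

demo_metrics = per_task_metrics(y_true_demo, y_pred_demo, ["task_a", "task_b"])
for task, vals in demo_metrics.items():
    print(f"{task:10s}: RMSE={vals['rmse']:.4f}, MAE={vals['mae']:.4f}, R2={vals['r2']:.4f}")
```

RMSE and MAE are in the units of each output, so they are not comparable across tasks; $R^2$ is unitless and is the easier number to compare between SCOD, VFA, and Biogas.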

## 8️⃣ Visualize Predictions with Uncertainty

We predict on the **entire dataset** to visualize both the fit at the training points and the interpolation at the held-out test points across the full timeline.

In [None]:
# Predict on FULL dataset for visualization
print("🔮 Generating full timeline predictions...")
mean_all, lower_all, upper_all = mtgp.predict(X, return_std=True)

# Visualize using Unified Plotting
openad.plots.plot_multi_output(
    y_true=Y,
    y_pred=mean_all,
    x=X[:, 0],  # Time column
    y_lower=lower_all,
    y_upper=upper_all,
    train_indices=train_indices,
    test_indices=test_indices,
    output_names=output_cols,
    title="MTGP Predictions with Uncertainty (Full Timeline)",
    xlabel="Time (days)",
    show=True
)

## 📝 Summary

This notebook demonstrated:

1. **Multi-Task GP** - Predicting 3 correlated outputs jointly
2. **Uncertainty Quantification** - 95% confidence intervals
3. **Unified API** - Simple `fit`/`predict` interface

### Next Steps
- Compare with [LSTM](03_LSTM_Prediction_Updated.ipynb)
- Apply to [MPC Control](05_MPC_Control_Updated.ipynb) with uncertainty