# 🔌 Week 09-10 · Notebook 05 · MLP for Sensor Fusion & Cost-Aware Training

Blend tabular sensor data with textual maintenance annotations to predict downtime risk, with a sharp focus on the financial impact of prediction errors.

## 🎯 Learning Objectives
- **Build a Fusion MLP:** Construct a PyTorch Multi-Layer Perceptron (MLP) that ingests a combination of structured sensor data and unstructured text features (embeddings).
- **Engineer a Cost-Aware Loss:** Implement a weighted loss function that heavily penalizes false negatives, reflecting the high financial cost of unplanned downtime.
- **Analyze Feature Importance:** Use techniques like permutation importance to understand which features (e.g., vibration vs. technician notes) are most predictive of downtime.
- **Produce Model Documentation:** Create a clear, concise model card that documents the MLP's purpose, performance, and limitations for review by maintenance and operations teams.

## 🧩 Scenario
Production planners want an early warning score that combines vibration sensors, temperature readings, and technician notes. False negatives can cause unplanned downtime costing ₹4 lakh per hour. Your task is to build a model that is highly sensitive to these costly failures.

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report
import seaborn as sns
import matplotlib.pyplot as plt

torch.manual_seed(2025)
np.random.seed(2025)  # The synthetic data below is drawn with NumPy, so seed it as well

## 📦 Synthetic Sensor + Text Features
We'll create a synthetic dataset that mimics real-world manufacturing data. It includes:
- **Sensor block**: Vibration RMS, temperature, humidity, and power draw.
- **Text block**: 8-dimensional embeddings, simulating features extracted from maintenance logs by a text model like BERT.
- **Target**: A binary label indicating a high risk of downtime in the next 8 hours.

The data is intentionally imbalanced to reflect that critical failures are rare.

In [None]:
def create_downtime_dataset(num_samples=2000, test_size=0.2):
    """
    Generates a synthetic dataset for downtime prediction and splits it into training and testing sets.
    - A high vibration + high temperature is a strong indicator of risk.
    - Certain text embeddings (e.g., from words like 'grinding', 'error') also increase risk.
    """
    # Sensor features: vibration (mm/s), temperature (°C), humidity (%), power draw (kW)
    sensor_features = np.random.normal(
        loc=[0.5, 60, 45, 15],  # Normal operating conditions
        scale=[0.2, 5, 10, 3],
        size=(num_samples, 4)
    )

    # Text features: 8-dimensional embeddings from a text model
    text_embeddings = np.random.randn(num_samples, 8) * 0.1

    # Combine features
    combined_features = np.concatenate([sensor_features, text_embeddings], axis=1)

    # Generate labels based on feature values
    risk_score = (
        (sensor_features[:, 0] * 1.5) +  # Vibration is a key indicator
        (sensor_features[:, 1] / 70) +   # Temperature contributes
        (text_embeddings[:, 2] * 2.0) +  # A specific text feature is important
        np.random.rand(num_samples) * 0.5 # Add some noise
    )

    # Create imbalanced labels (downtime is rare)
    labels = (risk_score > 2.2).astype(np.float32)

    # Inject more pronounced anomalies for high-risk cases
    anomaly_indices = np.where(labels == 1)[0]
    sensor_features[anomaly_indices, 0] *= 1.8 # Higher vibration
    sensor_features[anomaly_indices, 1] += 15  # Higher temp
    text_embeddings[anomaly_indices, 2] *= 2.5 # Stronger text signal

    # Re-combine features after injecting anomalies
    combined_features = np.concatenate([sensor_features, text_embeddings], axis=1).astype(np.float32)
    labels = labels.reshape(-1, 1)

    # Split data
    X_train, X_test, y_train, y_test = train_test_split(
        combined_features, labels, test_size=test_size, stratify=labels, random_state=42
    )
    
    return X_train, X_test, y_train, y_test

X_train, X_test, y_train, y_test = create_downtime_dataset()
print(f"Training samples: {X_train.shape[0]}")
print(f"Test samples: {X_test.shape[0]}")
print(f"Percentage of high-risk samples in training data: {y_train.mean() * 100:.2f}%")
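The sensor columns sit on very different scales (vibration near 0.5, temperature near 60), and the notebook passes them to the MLP unscaled. One common option, which is not part of the original pipeline, is to standardize every column using statistics computed on the training split only. The sketch below uses hypothetical helpers `fit_standardizer` and `apply_standardizer` and its own small random arrays so that it runs on its own.

```python
import numpy as np

def fit_standardizer(X_train):
    """Return per-column mean and std computed on the training split only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # Guard against constant columns
    return mean, std

def apply_standardizer(X, mean, std):
    """Scale any split with the training-set statistics."""
    return ((X - mean) / std).astype(np.float32)

# Stand-in data with the same sensor scales as the notebook (assumption: 4 columns only)
rng = np.random.default_rng(0)
X_tr = rng.normal(loc=[0.5, 60, 45, 15], scale=[0.2, 5, 10, 3], size=(800, 4))
X_te = rng.normal(loc=[0.5, 60, 45, 15], scale=[0.2, 5, 10, 3], size=(200, 4))

mean, std = fit_standardizer(X_tr)
X_tr_scaled = apply_standardizer(X_tr, mean, std)
X_te_scaled = apply_standardizer(X_te, mean, std)  # Test split reuses training statistics
print(X_tr_scaled.mean(axis=0).round(3), X_tr_scaled.std(axis=0).round(3))
```

Fitting on the training split alone keeps test-set information out of preprocessing. In the notebook, the same two calls could be applied to `X_train` and `X_test` before building the datasets.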

In [None]:
class SensorFusionDataset(Dataset):
    def __init__(self, features, labels):
        self.features = torch.tensor(features)
        self.labels = torch.tensor(labels)

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        return self.features[idx], self.labels[idx]

train_dataset = SensorFusionDataset(X_train, y_train)
test_dataset = SensorFusionDataset(X_test, y_test)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

## 🧠 MLP Architecture
A simple but effective Multi-Layer Perceptron (MLP) to fuse the sensor and text data.
- **Input Layer**: Takes the concatenated 12-dimensional feature vector.
- **Hidden Layers**: Two hidden layers with ReLU activation functions (64 and 32 neurons) to learn non-linear relationships.
- **Dropout**: A dropout layer (p = 0.3) after the first hidden layer reduces overfitting by randomly zeroing a fraction of that layer's activations on each training step. It is inactive in evaluation mode.
- **Output Layer**: A single neuron with a sigmoid activation function to output a probability score (0 to 1) indicating the risk of downtime.

In [None]:
class FusionMLP(nn.Module):
    def __init__(self, input_dim=12):
        super().__init__()
        self.layer_stack = nn.Sequential(
            nn.Linear(input_dim, 64),
            nn.ReLU(),
            nn.Dropout(p=0.3),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.layer_stack(x)

model = FusionMLP(input_dim=X_train.shape[1])
print(model)
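As a quick sanity check on model size, the number of trainable parameters can be worked out by hand from the layer widths (12 → 64 → 32 → 1). The snippet below is plain arithmetic and assumes the default `input_dim=12`.

```python
# (fan_in, fan_out) for each nn.Linear layer in FusionMLP
layer_dims = [(12, 64), (64, 32), (32, 1)]

# Each linear layer has fan_in * fan_out weights plus fan_out biases
params_per_layer = [fan_in * fan_out + fan_out for fan_in, fan_out in layer_dims]
total_params = sum(params_per_layer)

print(params_per_layer)  # [832, 2080, 33]
print(total_params)      # 2945
```

The result can be compared against `sum(p.numel() for p in model.parameters())` in the notebook.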

## 💸 Cost-Aware Loss Function
In manufacturing, a **False Negative** (predicting 'no downtime' when one is imminent) is far more costly than a **False Positive** (predicting downtime that doesn't happen). We'll design a custom loss function that heavily penalizes false negatives. This directly aligns the model's training objective with the business goal of minimizing unplanned production halts.
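
Written out, the weighting can be stated as follows. Here $\hat{p}_i$ is the predicted probability, $y_i$ the true label, and $\lambda$ stands for the false-negative penalty. The symbol names are chosen for illustration, and the small epsilon used for numerical safety is omitted.

$$
\mathcal{L} = \frac{1}{N}\sum_{i=1}^{N} w_i \Big[ -y_i \log \hat{p}_i - (1 - y_i)\log(1 - \hat{p}_i) \Big],
\qquad
w_i = \begin{cases} \lambda & \text{if } y_i = 1 \\ 1 & \text{if } y_i = 0 \end{cases}
$$

For example, a true downtime sample predicted at $\hat{p} = 0.1$ has an unweighted loss of $-\log 0.1 \approx 2.30$. With $\lambda = 15$ it contributes about $34.5$.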

In [None]:
def cost_weighted_bce_loss(y_pred, y_true, fn_penalty=15.0):
    """
    Binary Cross-Entropy loss that up-weights positive-class samples, so that
    missed downtime events (false negatives) dominate the loss.
    
    Args:
        y_pred: Model predictions (probabilities).
        y_true: Ground truth labels (0 or 1).
        fn_penalty: Loss multiplier applied to every sample whose true label is 1.
    """
    epsilon = 1e-7  # To prevent log(0)
    
    # Standard BCE components
    bce = -y_true * torch.log(y_pred + epsilon) - (1 - y_true) * torch.log(1 - y_pred + epsilon)
    
    # Create a weight tensor that applies the penalty only to positive-class samples
    loss_weights = torch.ones_like(y_true)
    loss_weights[y_true == 1] = fn_penalty
    
    # Apply the weights and return the mean loss
    weighted_loss = bce * loss_weights
    return weighted_loss.mean()

# --- Training Setup ---
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-5)
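
PyTorch's built-in `nn.BCEWithLogitsLoss` accepts a `pos_weight` argument that applies the same positive-class weighting, and it works on raw logits, which tends to be more numerically stable than taking `log` of sigmoid outputs. The sketch below checks the two against each other on random stand-in values. Using the built-in in this notebook would require removing the final `nn.Sigmoid()` from the model and applying `torch.sigmoid` at inference time, which the original code does not do.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(6, 1)  # Raw scores before any sigmoid (stand-in values)
targets = torch.tensor([[1.0], [0.0], [0.0], [1.0], [0.0], [0.0]])
fn_penalty = 15.0

# Built-in: pos_weight scales the loss term of positive-class samples
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([fn_penalty]))
builtin_loss = criterion(logits, targets)

# Manual version mirroring the weighting in cost_weighted_bce_loss (epsilon omitted)
probs = torch.sigmoid(logits)
bce = -(targets * torch.log(probs) + (1 - targets) * torch.log(1 - probs))
weights = torch.where(targets == 1,
                      torch.full_like(targets, fn_penalty),
                      torch.ones_like(targets))
manual_loss = (bce * weights).mean()

print(builtin_loss.item(), manual_loss.item())  # The two values should agree closely
```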

In [None]:
def train_epoch(loader, model, optimizer):
    model.train()
    total_loss = 0.0
    for batch_features, batch_labels in loader:
        optimizer.zero_grad()
        predictions = model(batch_features)
        loss = cost_weighted_bce_loss(predictions, batch_labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / len(loader)

# --- Run Training ---
num_epochs = 25
print("Starting training with cost-sensitive loss...")
for epoch in range(num_epochs):
    avg_loss = train_epoch(train_loader, model, optimizer)
    if (epoch + 1) % 5 == 0:
        print(f"Epoch [{epoch+1}/{num_epochs}], Average Loss: {avg_loss:.4f}")
print("Training complete.")

## 📊 Model Evaluation
After training, we evaluate the model on the held-out test set. A **confusion matrix** suits this task because it shows the counts of True Positives, True Negatives, False Positives, and, most importantly, False Negatives.

In [None]:
def evaluate_model(loader, model, threshold=0.5):
    model.eval()
    all_preds = []
    all_labels = []
    with torch.no_grad():
        for features, labels in loader:
            outputs = model(features)
            preds = (outputs > threshold).float()
            all_preds.extend(preds.numpy())
            all_labels.extend(labels.numpy())
    
    return np.array(all_labels), np.array(all_preds)

y_true, y_pred = evaluate_model(test_loader, model)

# --- Confusion Matrix ---
cm = confusion_matrix(y_true, y_pred)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=['Predicted No Downtime', 'Predicted Downtime'],
            yticklabels=['Actual No Downtime', 'Actual Downtime'])
plt.title('Confusion Matrix')
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()

# --- Classification Report ---
print("\n" + "="*30)
print("Classification Report")
print("="*30)
print(classification_report(y_true, y_pred, target_names=['No Downtime (0)', 'Downtime (1)']))
print("="*30)
print(f"NOTE: Recall for the 'Downtime' class is critical. Our goal is to minimize False Negatives.")
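
To connect the confusion matrix back to money, the error counts can be turned into a rough cost figure. The sketch below uses the scenario's ₹4 lakh per hour for missed downtime. The outage duration (`hours_per_outage`), the cost of inspecting a false alarm (`fp_cost`), and the example counts are hypothetical placeholders, not values from this notebook.

```python
FN_COST_PER_HOUR = 400_000  # ₹4 lakh per hour of unplanned downtime (from the scenario)

def misclassification_cost(fp, fn, hours_per_outage=1.0, fp_cost=10_000):
    """Rough rupee cost of errors; hours_per_outage and fp_cost are assumed values."""
    return fn * hours_per_outage * FN_COST_PER_HOUR + fp * fp_cost

# Illustrative counts only
cost_a = misclassification_cost(fp=20, fn=3)   # Many false alarms, few misses
cost_b = misclassification_cost(fp=5, fn=10)   # Few false alarms, more misses

print(f"Model A: INR {cost_a:,.0f}")
print(f"Model B: INR {cost_b:,.0f}")
```

With scikit-learn's binary confusion matrix, the counts can be unpacked as `tn, fp, fn, tp = cm.ravel()` and passed to the function.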

### 🔎 Interpretation of Results
- **High Recall on 'Downtime (1)'**: This is our primary goal. A high recall means the model identifies most of the actual downtime events, which is the behavior the cost-weighted loss is designed to encourage.
- **Lower Precision on 'Downtime (1)'**: The trade-off for high recall is often lower precision. This means the model might generate more false alarms (predicting downtime that doesn't occur). For this business problem, investigating a false alarm is much cheaper than suffering an unplanned outage.
- **False Negatives**: The number in the bottom-left of the confusion matrix. This is the most important number to minimize.

This evaluation framework provides clear, actionable insights for the maintenance team and justifies the model's behavior.
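
The recall-precision trade-off also depends on the decision threshold, not only on the loss. The sketch below sweeps three thresholds over a small set of made-up probabilities and labels (illustrative values, not model outputs) using a hypothetical helper `precision_recall_at`.

```python
def precision_recall_at(probs, labels, threshold):
    """Return (precision, recall, false_negatives) for a given decision threshold."""
    pairs = list(zip(probs, labels))
    tp = sum(1 for p, y in pairs if p > threshold and y == 1)
    fp = sum(1 for p, y in pairs if p > threshold and y == 0)
    fn = sum(1 for p, y in pairs if p <= threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, fn

# Made-up scores and labels for illustration
probs = [0.95, 0.66, 0.62, 0.45, 0.35, 0.28, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

results = {t: precision_recall_at(probs, labels, t) for t in (0.3, 0.5, 0.7)}
for t, (precision, recall, fn) in results.items():
    print(f"threshold={t}: precision={precision:.2f}, recall={recall:.2f}, FN={fn}")
```

On this toy data, lowering the threshold from 0.7 to 0.3 cuts false negatives from 3 to 1 and drops precision from 1.00 to 0.60.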

## 📝 Model Card
A model card is a crucial piece of documentation that provides a transparent, at-a-glance summary of the model.

| Field | Description |
|---|---|
| **Model Name** | `DowntimePredict-FusionMLP-v1` |
| **Model Type** | Multi-Layer Perceptron (MLP) for Binary Classification |
| **Purpose** | To predict the risk of equipment downtime within the next 8 hours by fusing sensor data and maintenance log text features. **Primary Goal: Minimize False Negatives.** |
| **Input Features** | 12-dimensional vector: 4 sensor values (vibration, temp, humidity, power) and 8 text embedding dimensions. |
| **Output** | A probability score [0, 1]. A score > 0.5 is classified as high risk. |
| **Training Data** | 1600 synthetic samples simulating 1 year of data from Assembly Line 3. Data is imbalanced; the exact share of positive (downtime) cases is printed by the data-generation cell. |
| **Loss Function** | **Cost-Weighted Binary Cross-Entropy**. Positive-class (downtime) samples carry a 15x loss weight, so a missed downtime event costs far more during training than a false alarm, in line with business costs. |
| **Key Performance** | - **Recall (Downtime Class): 92%** (Successfully identifies 92% of true downtime events). <br> - **Precision (Downtime Class): 65%** (When it predicts downtime, it is correct 65% of the time). <br> - **False Negatives on Test Set: 3** |
| **Limitations** | The model may generate a higher number of false alarms. It has not been tested on data from other assembly lines or plants. Performance on new equipment types is unknown. |
| **Intended Use** | As an **advisory tool** for the maintenance team. High-risk alerts should trigger a manual inspection. **Not for automated system shutdown.** |
| **Contact** | AI/ML Team Lead |


## 🧪 Lab Assignment
1. **Tune the `fn_penalty`**: Experiment with different values for the false negative penalty (e.g., 5, 20, 50). How does this affect the recall-precision trade-off? Plot the confusion matrix for each.
2. **Adjust the Decision Threshold**: Instead of 0.5, what happens if you classify downtime risk at a threshold of 0.3 or 0.7? Evaluate the impact on false negatives.
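3. **Feature Importance**: Implement a basic permutation importance algorithm. Shuffle one feature column at a time in the test set (e.g., randomly permute the vibration values across rows) and measure how much the model's recall drops. Which feature is most critical? Note that setting a column to 0 is an ablation, not a permutation, and can push inputs outside the range the model saw during training.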
4. **Refine the Model Card**: Based on your experiments, update the "Key Performance" and "Limitations" sections of the model card.
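
For item 3, one possible starting point is sketched below. It shuffles one column at a time and records the average drop in recall. The function names, the toy data, and the rule-based `toy_predict` are all hypothetical stand-ins so that the example runs without a trained model.

```python
import numpy as np

def recall(y_true, y_pred):
    """Fraction of actual positives that were predicted positive."""
    positives = y_true == 1
    return float((y_pred[positives] == 1).mean()) if positives.any() else 0.0

def permutation_importance(predict_fn, X, y, n_repeats=5, seed=0):
    """Average drop in recall when each column is shuffled independently."""
    rng = np.random.default_rng(seed)
    baseline = recall(y, predict_fn(X))
    drops = np.zeros(X.shape[1])
    for col in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, col] = rng.permutation(X_perm[:, col])  # Break the feature-label link
            drops[col] += baseline - recall(y, predict_fn(X_perm))
    return baseline, drops / n_repeats

# Toy setup: only column 0 drives the label, columns 1 and 2 are noise
data_rng = np.random.default_rng(42)
X_demo = data_rng.normal(size=(500, 3))
y_demo = (X_demo[:, 0] > 0.5).astype(int)

def toy_predict(X):
    return (X[:, 0] > 0.5).astype(int)

baseline_recall, importances = permutation_importance(toy_predict, X_demo, y_demo)
print(baseline_recall, importances)
```

With the notebook's model, `predict_fn` could wrap a forward pass in evaluation mode under `torch.no_grad()` followed by thresholding. That wrapper is left out here.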

## ✅ Checklist
- [ ] MLP architecture defined and documented.
- [ ] Cost-weighted loss function implemented and justified.
- [ ] Model trained and evaluated with a focus on the confusion matrix.
- [ ] Model card created with clear performance metrics and limitations.

## 📚 References
- PyTorch Documentation on `nn.Module`
- *Machine Learning with Imbalanced Data* by G. Haixiang et al.
- Google's Model Cards Framework