# Mamba (Selective State Space Models) for Predictive Maintenance

This notebook implements **Mamba-style State Space Models (SSMs)** for condition monitoring tasks:

1. **SSM for RUL Prediction** - Remaining Useful Life estimation
2. **SSM for Anomaly Detection** - Autoencoder-based
3. **Selective SSM for Classification** - Fault type classification

## Why Mamba/SSMs for Predictive Maintenance?

| Advantage | Description |
|-----------|-------------|
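| **Linear Complexity** | O(n) in sequence length vs O(n²) for Transformer self-attention, so long sequences stay tractable |
| **Input-Selective** | Δ, B, and C are computed from the input, so the state update adapts to content |
| **Memory Efficient** | Fixed-size recurrent state at inference (no attention cache that grows with sequence length) |
| **Long-Range Dependencies** | Designed to carry trends across very long sequences (thousands of time steps) |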

## Mamba State Space Equation

```
h_t = A_t · h_{t-1} + B_t · x_t    (State update - input dependent!)
y_t = C_t · h_t + D · x_t           (Output)
```

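The key innovation is that Δ, B, and C are **functions of the input** (A itself is learned but input-independent). Because Ā_t = exp(Δ_t · A), the discretized A_t and B_t in the update above change at every time step, which makes the model selective.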
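
The recurrence above can be traced by hand for a single channel with a scalar state. The sketch below is plain Python and is not taken from the notebook's layers; the function name `selective_scan` and the per-step values are illustrative assumptions.

```python
def selective_scan(xs, a_bars, b_bars, cs, d=0.0):
    """Run h_t = a_t*h_{t-1} + b_t*x_t and y_t = c_t*h_t + d*x_t for a scalar state."""
    h = 0.0
    ys = []
    for x, a, b, c in zip(xs, a_bars, b_bars, cs):
        h = a * h + b * x          # state update with per-step coefficients
        ys.append(c * h + d * x)   # readout plus skip connection
    return ys

xs = [1.0, 0.0, 0.0]  # a single impulse at t=0

# Time-invariant case: the impulse decays by half at every step.
ys_fixed = selective_scan(xs, a_bars=[0.5, 0.5, 0.5], b_bars=[1.0] * 3, cs=[1.0] * 3)

# Selective case: a_t = 0 at t=1 wipes the state, so the impulse is forgotten.
ys_selective = selective_scan(xs, a_bars=[0.5, 0.0, 0.5], b_bars=[1.0] * 3, cs=[1.0] * 3)
```

In the time-invariant run the impulse response is `[1.0, 0.5, 0.25]`; in the selective run it is `[1.0, 0.0, 0.0]`, which shows how an input-dependent coefficient can reset memory.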

## Architecture Overview

```
Input → Linear Projection → SSM Block(s) → Output Head
[T,F]      [T, D]            [T, D]         [1] or [C]
```

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns
import os
import json

np.random.seed(42)

# Check TensorFlow availability
try:
    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras import layers
    print(f"TensorFlow {tf.__version__} available")
    print(f"GPU: {tf.config.list_physical_devices('GPU')}")
    HAS_TF = True
except ImportError:
    print("TensorFlow not available - please install: pip install tensorflow")
    HAS_TF = False

# Output directories
DATA_DIR = '../data/simulated'
MODEL_DIR = '../models/mamba'
os.makedirs(MODEL_DIR, exist_ok=True)
os.makedirs(DATA_DIR, exist_ok=True)

print("Setup complete!")

## 1. State Space Model Building Blocks

We implement the core SSM components in TensorFlow/Keras.

### 1.1 Discretization

Continuous SSM parameters are discretized using Zero-Order Hold (ZOH):

```
Ā = exp(Δ · A)
B̄ = (Δ · A)^{-1} · (Ā - I) · Δ · B ≈ Δ · B  (simplified)
```
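
To see how close the simplified rule is to exact ZOH, the sketch below evaluates both for one scalar (diagonal) entry. The function names and the values of `a`, `b`, and `delta` are assumptions chosen for illustration.

```python
import math

def zoh_discretize(a, b, delta):
    """Exact zero-order hold for one diagonal entry (requires a != 0)."""
    a_bar = math.exp(delta * a)
    b_bar = (a_bar - 1.0) / a * b   # (delta*a)^-1 * (a_bar - 1) * delta * b
    return a_bar, b_bar

def simplified_discretize(a, b, delta):
    """Simplified rule: same a_bar, but b_bar is approximated by delta * b."""
    return math.exp(delta * a), delta * b

a, b = -0.5, 1.0
_, b_exact_small = zoh_discretize(a, b, delta=0.1)
_, b_simple_small = simplified_discretize(a, b, delta=0.1)
_, b_exact_large = zoh_discretize(a, b, delta=2.0)
_, b_simple_large = simplified_discretize(a, b, delta=2.0)

gap_small = abs(b_exact_small - b_simple_small)  # small step: nearly identical
gap_large = abs(b_exact_large - b_simple_large)  # large step: clearly different
```

The approximation is first-order in Δ·A, so it is accurate for small steps and drifts as the learned step size grows.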

In [None]:
if HAS_TF:
    
    class S4DKernel(layers.Layer):
        """
        Simplified S4D-style SSM kernel.
        
        Uses diagonal state matrix for efficiency.
        """
        
        def __init__(self, d_model, d_state=64, **kwargs):
            super().__init__(**kwargs)
            self.d_model = d_model
            self.d_state = d_state
            
        def build(self, input_shape):
            # Initialize A as a real diagonal with value -0.5 (negative for stability).
            # This is a simplified constant init, not the complex-valued S4D/HiPPO init.
            self.A_log = self.add_weight(
                name='A_log',
                shape=(self.d_model, self.d_state),
                # A = -exp(A_log), so log(0.5) yields A = -0.5 everywhere
                initializer=keras.initializers.Constant(float(np.log(0.5))),
                trainable=True
            )
            
            self.B = self.add_weight(
                name='B',
                shape=(self.d_model, self.d_state),
                initializer='glorot_uniform',
                trainable=True
            )
            
            self.C = self.add_weight(
                name='C',
                shape=(self.d_model, self.d_state),
                initializer='glorot_uniform',
                trainable=True
            )
            
            self.D = self.add_weight(
                name='D',
                shape=(self.d_model,),
                initializer='ones',
                trainable=True
            )
            
            # Delta (step size) - learned per channel
            self.log_delta = self.add_weight(
                name='log_delta',
                shape=(self.d_model,),
                initializer=keras.initializers.Constant(np.log(0.1)),
                trainable=True
            )
            
            super().build(input_shape)
            
        def call(self, inputs):
            """
            Apply SSM to input sequence.
            
            Args:
                inputs: [batch, seq_len, d_model]
                
            Returns:
                outputs: [batch, seq_len, d_model]
            """
            batch_size = tf.shape(inputs)[0]
            seq_len = tf.shape(inputs)[1]
            
            # Discretize
            delta = tf.exp(self.log_delta)  # [d_model]
            A = -tf.exp(self.A_log)  # [d_model, d_state] - ensure negative
            
            # Discretized A and B (ZOH approximation)
            A_bar = tf.exp(delta[:, None] * A)  # [d_model, d_state]
            B_bar = delta[:, None] * self.B  # [d_model, d_state]
            
            # Recurrent computation (tf.scan is sequential; an associative scan could parallelize it)
            def ssm_step(h, x):
                # h: [batch, d_model, d_state]
                # x: [batch, d_model]
                h_new = A_bar * h + B_bar * x[:, :, None]
                y = tf.reduce_sum(self.C * h_new, axis=-1) + self.D * x
                return h_new, y
            
            # Initial state
            h0 = tf.zeros((batch_size, self.d_model, self.d_state))
            
            # Transpose for scan: [seq_len, batch, d_model]
            inputs_t = tf.transpose(inputs, [1, 0, 2])
            
            # Run SSM
            _, outputs = tf.scan(
                lambda h, x: ssm_step(h, x),
                inputs_t,
                initializer=h0
            )
            
            # Transpose back: [batch, seq_len, d_model]
            outputs = tf.transpose(outputs, [1, 0, 2])
            
            return outputs
        
        def get_config(self):
            config = super().get_config()
            config.update({
                'd_model': self.d_model,
                'd_state': self.d_state,
            })
            return config
    
    print("S4DKernel layer defined")
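
Because the `S4DKernel` parameters do not depend on the input, its recurrence is linear time-invariant and can equally be written as a causal convolution with kernel `K[k] = Σ_n C[n] · Ā[n]^k · B̄[n]`. The sketch below checks that equivalence for one channel in plain Python. The helper names and numeric values are assumptions; the layer itself uses `tf.scan`.

```python
def ssm_recurrent(xs, a_bar, b_bar, c, d):
    """Diagonal LTI SSM for one channel; a_bar, b_bar, c are lists over the state dim."""
    h = [0.0] * len(a_bar)
    ys = []
    for x in xs:
        h = [a * hi + b * x for a, hi, b in zip(a_bar, h, b_bar)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)) + d * x)
    return ys

def ssm_convolution(xs, a_bar, b_bar, c, d):
    """Same output via the kernel K[k] = sum_n c[n] * a_bar[n]**k * b_bar[n]."""
    length = len(xs)
    kernel = [sum(ci * (a ** k) * b for ci, a, b in zip(c, a_bar, b_bar))
              for k in range(length)]
    ys = []
    for t in range(length):
        acc = sum(kernel[k] * xs[t - k] for k in range(t + 1))  # causal sum
        ys.append(acc + d * xs[t])
    return ys

xs = [1.0, -2.0, 0.5, 3.0]
params = dict(a_bar=[0.9, 0.5], b_bar=[0.1, 0.2], c=[1.0, -1.0], d=1.0)
y_rec = ssm_recurrent(xs, **params)
y_conv = ssm_convolution(xs, **params)
max_diff = max(abs(r - v) for r, v in zip(y_rec, y_conv))
```

This convolutional view is only available when the parameters are fixed over time; the selective layer defined next gives it up in exchange for input-dependent dynamics.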

In [None]:
if HAS_TF:
    
    class SelectiveSSM(layers.Layer):
        """
        Selective State Space Model (Mamba-style).
        
        Key difference from S4: B, C, delta are functions of the input.
        """
        
        def __init__(self, d_model, d_state=16, d_conv=4, expand=2, **kwargs):
            super().__init__(**kwargs)
            self.d_model = d_model
            self.d_state = d_state
            self.d_conv = d_conv
            self.expand = expand
            self.d_inner = d_model * expand
            
        def build(self, input_shape):
            # Input projection
            self.in_proj = layers.Dense(self.d_inner * 2, use_bias=False)
            
            # 1D convolution for local context
            self.conv1d = layers.Conv1D(
                self.d_inner, 
                kernel_size=self.d_conv,
                padding='causal',
                groups=self.d_inner
            )
            
            # Selective projections (input-dependent B, C, delta)
            self.x_proj = layers.Dense(self.d_state * 2 + 1, use_bias=False)
            
            # Input-independent A parameter (learned; diagonal, negative for stability)
            A = np.arange(1, self.d_state + 1, dtype=np.float32)
            self.A_log = self.add_weight(
                name='A_log',
                shape=(self.d_inner, self.d_state),
                initializer=keras.initializers.Constant(
                    np.log(np.tile(A, (self.d_inner, 1)))
                ),
                trainable=True
            )
            
            self.D = self.add_weight(
                name='D',
                shape=(self.d_inner,),
                initializer='ones',
                trainable=True
            )
            
            # Output projection
            self.out_proj = layers.Dense(self.d_model, use_bias=False)
            
            super().build(input_shape)
            
        def call(self, inputs, training=None):
            """
            Args:
                inputs: [batch, seq_len, d_model]
                
            Returns:
                outputs: [batch, seq_len, d_model]
            """
            batch_size = tf.shape(inputs)[0]
            seq_len = tf.shape(inputs)[1]
            
            # Project and split
            xz = self.in_proj(inputs)  # [batch, seq, d_inner*2]
            x, z = tf.split(xz, 2, axis=-1)  # each [batch, seq, d_inner]
            
            # Conv for local context
            x = self.conv1d(x)
            x = tf.nn.silu(x)
            
            # Selective parameters from input
            x_proj = self.x_proj(x)  # [batch, seq, d_state*2 + 1]
            
            # Split into B, C, delta
            B = x_proj[:, :, :self.d_state]  # [batch, seq, d_state]
            C = x_proj[:, :, self.d_state:2*self.d_state]
            delta = tf.nn.softplus(x_proj[:, :, -1:])  # [batch, seq, 1]
            
            # A is fixed (but learned), negative for stability
            A = -tf.exp(self.A_log)  # [d_inner, d_state]
            
            # Discretize (per timestep due to selective delta)
            # A_bar = exp(delta * A), B_bar = delta * B
            delta_A = delta[:, :, :, None] * A[None, None, :, :]  # [batch, seq, d_inner, d_state]
            A_bar = tf.exp(delta_A)
            
            delta_B = delta * B  # [batch, seq, d_state]
            
            # Recurrent SSM
            def selective_ssm_step(h, inputs_t):
                x_t, A_bar_t, dB_t, C_t = inputs_t
                # h: [batch, d_inner, d_state]
                # x_t: [batch, d_inner]
                # A_bar_t: [batch, d_inner, d_state]
                # dB_t: [batch, d_state]
                # C_t: [batch, d_state]
                
                h_new = A_bar_t * h + x_t[:, :, None] * dB_t[:, None, :]
                y = tf.reduce_sum(h_new * C_t[:, None, :], axis=-1)
                return h_new, y
            
            # Prepare for scan
            h0 = tf.zeros((batch_size, self.d_inner, self.d_state))
            
            scan_inputs = (
                tf.transpose(x, [1, 0, 2]),  # [seq, batch, d_inner]
                tf.transpose(A_bar, [1, 0, 2, 3]),  # [seq, batch, d_inner, d_state]
                tf.transpose(delta_B, [1, 0, 2]),  # [seq, batch, d_state]
                tf.transpose(C, [1, 0, 2]),  # [seq, batch, d_state]
            )
            
            _, y = tf.scan(
                selective_ssm_step,
                scan_inputs,
                initializer=h0
            )
            
            # y: [seq, batch, d_inner]
            y = tf.transpose(y, [1, 0, 2])  # [batch, seq, d_inner]
            
            # Add skip connection (D parameter)
            y = y + self.D * x
            
            # Gate with z
            y = y * tf.nn.silu(z)
            
            # Output projection
            return self.out_proj(y)
        
        def get_config(self):
            config = super().get_config()
            config.update({
                'd_model': self.d_model,
                'd_state': self.d_state,
                'd_conv': self.d_conv,
                'expand': self.expand,
            })
            return config
    
    print("SelectiveSSM (Mamba-style) layer defined")
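
The selective step size acts like a gate: `delta = softplus(·)` is always positive, and `A_bar = exp(delta · A)` with negative `A` lies between 0 and 1. The sketch below shows the two extremes for a single state entry. The helper `step_gate` and the chosen inputs are illustrative assumptions.

```python
import math

def softplus(z):
    """Numerically simple softplus, fine for moderate |z|."""
    return math.log1p(math.exp(z))

def step_gate(raw_delta, a=-1.0):
    """Mirror delta = softplus(raw) and a_bar = exp(delta * a) for one state entry."""
    delta = softplus(raw_delta)
    return math.exp(delta * a), delta

a_bar_keep, delta_small = step_gate(-5.0)   # tiny step: state is almost fully retained
a_bar_reset, delta_large = step_gate(5.0)   # large step: state is almost fully overwritten
```

A small `delta` keeps `a_bar` near 1 and lets little of the current input in (since `delta · B` is small); a large `delta` pushes `a_bar` toward 0 and weights the current input heavily.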

In [None]:
if HAS_TF:
    
    class MambaBlock(layers.Layer):
        """
        Complete Mamba block with normalization and residual.
        """
        
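        def __init__(self, d_model, d_state=16, d_conv=4, expand=2, dropout=0.1, **kwargs):
            super().__init__(**kwargs)
            # Keep all constructor arguments so get_config() can rebuild the layer
            self.d_model = d_model
            self.d_state = d_state
            self.d_conv = d_conv
            self.expand = expand
            self.dropout_rate = dropout
            self.norm = layers.LayerNormalization(epsilon=1e-6)
            self.ssm = SelectiveSSM(d_model, d_state, d_conv, expand)
            self.dropout = layers.Dropout(dropout)
            
        def call(self, inputs, training=None):
            # Pre-norm residual block: x + Dropout(SSM(LayerNorm(x)))
            x = self.norm(inputs)
            x = self.ssm(x, training=training)
            x = self.dropout(x, training=training)
            return inputs + x
        
        def get_config(self):
            config = super().get_config()
            config.update({
                'd_model': self.d_model,
                'd_state': self.d_state,
                'd_conv': self.d_conv,
                'expand': self.expand,
                'dropout': self.dropout_rate,
            })
            return config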
    
    print("MambaBlock defined")

## 2. Generate Training Data

Generate long sequences with degradation patterns - where Mamba excels.

In [None]:
def generate_long_degradation_sequences(n_sequences=100, seq_length=1000, n_features=8):
    """
    Generate long degradation sequences for RUL prediction.
    
    These long sequences (1000+ timesteps) are where Mamba shines
    due to its O(n) complexity vs Transformer's O(n²).
    """
    X = []
    y = []
    
    for i in range(n_sequences):
        # Random total life (in cycles)
        total_life = np.random.randint(200, max(201, seq_length - 50))
        
        # Time array
        t = np.arange(seq_length)
        
        # RUL at each timestep (capped at 0 after failure)
        rul = np.maximum(0, total_life - t)
        
        # Generate multi-sensor data with degradation
        features = np.zeros((seq_length, n_features))
        
        # Health indicator (decreases over time)
        health = 1.0 - (t / total_life).clip(0, 1)
        degradation = 1.0 - health
        
        # Feature 0: Vibration RMS (increases with wear)
        base_vib = 0.5 + 2.0 * degradation ** 1.5
        features[:, 0] = base_vib + np.random.normal(0, 0.1, seq_length)
        
        # Feature 1: Temperature (gradual increase)
        base_temp = 60 + 40 * degradation
        features[:, 1] = base_temp + np.random.normal(0, 2, seq_length)
        
        # Feature 2: Oil pressure (decreases)
        base_pressure = 5.0 - 2.0 * degradation
        features[:, 2] = base_pressure + np.random.normal(0, 0.2, seq_length)
        
        # Feature 3: Motor current (increases with friction)
        base_current = 10 + 5 * degradation ** 2
        features[:, 3] = base_current + np.random.normal(0, 0.5, seq_length)
        
        # Feature 4-7: Additional sensors with various patterns
        for f in range(4, n_features):
            noise_level = 0.1 + 0.2 * degradation
            base = np.sin(2 * np.pi * t / (100 + f * 20)) * (1 + degradation)
            features[:, f] = base + np.random.normal(0, noise_level, seq_length)
        
        X.append(features)
        y.append(rul)
    
    return np.array(X), np.array(y)

# Generate data
print("Generating long degradation sequences...")
X_long, y_long = generate_long_degradation_sequences(n_sequences=200, seq_length=500)
print(f"Generated: X={X_long.shape}, y={y_long.shape}")
print(f"Sequence length: {X_long.shape[1]} timesteps (Mamba handles this efficiently!)")
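
The label and degradation curves used by the generator are simple piecewise-linear functions of time. The sketch below reproduces them for a toy unit in plain Python. The helper names are assumptions that mirror `rul = max(0, total_life - t)` and `degradation = clip(t / total_life, 0, 1)` from the cell above.

```python
def rul_labels(total_life, seq_length):
    """Remaining useful life per timestep, floored at 0 once the unit has failed."""
    return [max(0, total_life - t) for t in range(seq_length)]

def degradation_curve(total_life, seq_length):
    """Fraction of life consumed per timestep, saturating at 1.0 after failure."""
    return [min(1.0, t / total_life) for t in range(seq_length)]

toy_rul = rul_labels(total_life=4, seq_length=6)
toy_deg = degradation_curve(total_life=4, seq_length=6)
```

With the settings used above (`seq_length=500` and `total_life` drawn below 450), every sequence reaches RUL = 0 before its final timestep.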

In [None]:
# Visualize example sequences
fig, axes = plt.subplots(2, 2, figsize=(14, 8))

for idx, ax_row in enumerate(axes):
    seq_idx = idx * 50  # Different sequences
    
    ax_row[0].plot(X_long[seq_idx, :, 0], label='Vibration', alpha=0.7)
    ax_row[0].plot(X_long[seq_idx, :, 1] / 20, label='Temp/20', alpha=0.7)
    ax_row[0].set_title(f'Sequence {seq_idx}: Sensor Data')
    ax_row[0].legend()
    ax_row[0].set_xlabel('Time Steps')
    
    ax_row[1].plot(y_long[seq_idx], 'r-', linewidth=2)
    ax_row[1].set_title(f'Sequence {seq_idx}: RUL')
    ax_row[1].set_xlabel('Time Steps')
    ax_row[1].set_ylabel('Remaining Useful Life')
    ax_row[1].axhline(y=0, color='k', linestyle='--', alpha=0.3)

plt.tight_layout()
plt.savefig(f'{MODEL_DIR}/long_sequence_examples.png', dpi=150, bbox_inches='tight')
plt.show()
print(f"Saved: {MODEL_DIR}/long_sequence_examples.png")

## 3. Build Mamba-based RUL Model

In [None]:
if HAS_TF:
    
    def build_mamba_rul_model(
        seq_length,
        n_features,
        d_model=64,
        n_layers=4,
        d_state=16,
        dropout=0.1
    ):
        """
        Build a Mamba-based model for RUL prediction.
        
        Outputs RUL at each timestep (sequence-to-sequence).
        """
        inputs = keras.Input(shape=(seq_length, n_features))
        
        # Input projection
        x = layers.Dense(d_model)(inputs)
        
        # Stack Mamba blocks
        for i in range(n_layers):
            x = MambaBlock(
                d_model=d_model,
                d_state=d_state,
                dropout=dropout,
                name=f'mamba_block_{i}'
            )(x)
        
        # Final normalization
        x = layers.LayerNormalization(epsilon=1e-6)(x)
        
        # Output: RUL at each timestep
        outputs = layers.Dense(1, activation='relu')(x)
        # Drop the trailing singleton dim with a Keras layer (works on symbolic tensors)
        outputs = layers.Reshape((seq_length,))(outputs)
        
        model = keras.Model(inputs=inputs, outputs=outputs)
        return model
    
    # Build model
    seq_length = X_long.shape[1]
    n_features = X_long.shape[2]
    
    mamba_rul_model = build_mamba_rul_model(
        seq_length=seq_length,
        n_features=n_features,
        d_model=64,
        n_layers=3,  # Fewer layers for faster training
        d_state=16
    )
    
    mamba_rul_model.summary()

In [None]:
if HAS_TF:
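    # Prepare data: standardize each sequence independently.
    # Caveat: per-sequence statistics are computed over the whole sequence
    # (including future timesteps), so this suits offline experiments only.
    X_scaled = np.zeros_like(X_long)
    for i in range(X_long.shape[0]):
        X_scaled[i] = StandardScaler().fit_transform(X_long[i])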
    
    # Normalize RUL to [0, 1) by dividing by the sequence length
    y_normalized = y_long / float(seq_length)
    
    # Train/test split
    X_train, X_test, y_train, y_test = train_test_split(
        X_scaled, y_normalized, test_size=0.2, random_state=42
    )
    
    print(f"Training: {X_train.shape}, Test: {X_test.shape}")
    
    # Compile
    mamba_rul_model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-3),
        loss='mse',
        metrics=['mae']
    )
    
    # Callbacks
    callbacks = [
        keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True),
        keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=5)
    ]
    
    # Train
    print("\nTraining Mamba RUL model...")
    history = mamba_rul_model.fit(
        X_train, y_train,
        validation_split=0.15,
        epochs=50,
        batch_size=16,
        callbacks=callbacks,
        verbose=1
    )

In [None]:
if HAS_TF:
    # Evaluate
    y_pred = mamba_rul_model.predict(X_test)
    
    # Scale back to original RUL values
    y_test_orig = y_test * seq_length
    y_pred_orig = y_pred * seq_length
    
    # Calculate metrics over all timesteps.
    # (Every simulated unit fails before the sequence ends, so the final-step
    # RUL is 0 for all sequences and final-step metrics would be degenerate.)
    rmse = np.sqrt(mean_squared_error(y_test_orig.ravel(), y_pred_orig.ravel()))
    mae = mean_absolute_error(y_test_orig.ravel(), y_pred_orig.ravel())
    r2 = r2_score(y_test_orig.ravel(), y_pred_orig.ravel())
    
    print("\nMamba RUL Model Results (All Timesteps):")
    print(f"  RMSE: {rmse:.2f} cycles")
    print(f"  MAE:  {mae:.2f} cycles")
    print(f"  R²:   {r2:.4f}")
    
    # Visualize predictions
    fig, axes = plt.subplots(1, 3, figsize=(15, 4))
    
    # Training history
    axes[0].plot(history.history['loss'], label='Train')
    axes[0].plot(history.history['val_loss'], label='Val')
    axes[0].set_xlabel('Epoch')
    axes[0].set_ylabel('Loss')
    axes[0].set_title('Training History')
    axes[0].legend()
    
    # Actual vs Predicted scatter (all timesteps)
    axes[1].scatter(y_test_orig.ravel(), y_pred_orig.ravel(), alpha=0.1, s=4)
    axes[1].plot([0, seq_length], [0, seq_length], 'r--', label='Perfect')
    axes[1].set_xlabel('Actual RUL')
    axes[1].set_ylabel('Predicted RUL')
    axes[1].set_title(f'Mamba RUL (R²={r2:.3f})')
    axes[1].legend()
    
    # Example sequence prediction
    idx = 0
    axes[2].plot(y_test_orig[idx], label='Actual', linewidth=2)
    axes[2].plot(y_pred_orig[idx], label='Predicted', linewidth=2, alpha=0.7)
    axes[2].set_xlabel('Time Step')
    axes[2].set_ylabel('RUL')
    axes[2].set_title('Sequence RUL Prediction')
    axes[2].legend()
    
    plt.tight_layout()
    plt.savefig(f'{MODEL_DIR}/mamba_rul_results.png', dpi=150, bbox_inches='tight')
    plt.show()
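
Once a unit has failed, its true RUL stays at 0, and those post-failure steps can dominate sequence-level averages. One possible complement to the metrics above is to score only the timesteps where the true RUL is still positive. The helper `masked_rmse_mae` below is a hypothetical illustration in plain Python, not part of the notebook's pipeline.

```python
import math

def masked_rmse_mae(y_true, y_pred):
    """RMSE and MAE over timesteps whose true RUL is still positive."""
    errors = [
        p - t
        for seq_true, seq_pred in zip(y_true, y_pred)
        for t, p in zip(seq_true, seq_pred)
        if t > 0  # skip post-failure steps
    ]
    if not errors:
        raise ValueError("no pre-failure timesteps to score")
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae

# Toy example: the two post-failure predictions (5, 5) are ignored.
rmse_demo, mae_demo = masked_rmse_mae([[3, 2, 1, 0, 0]], [[4, 2, 0, 5, 5]])
```

With NumPy arrays the same idea is a boolean mask, for example `mask = y_true > 0`, applied to both arrays before calling the scikit-learn metric functions.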

## 4. Mamba Anomaly Detection (Autoencoder)

Use Mamba as encoder/decoder for reconstruction-based anomaly detection.

In [None]:
if HAS_TF:
    
    def build_mamba_autoencoder(
        seq_length,
        n_features,
        d_model=32,
        latent_dim=16,
        n_layers=2
    ):
        """
        Mamba-based Autoencoder for anomaly detection.
        
        Anomalies have high reconstruction error.
        """
        inputs = keras.Input(shape=(seq_length, n_features))
        
        # Encoder
        x = layers.Dense(d_model)(inputs)
        for i in range(n_layers):
            x = MambaBlock(d_model, name=f'enc_mamba_{i}')(x)
        
        # Bottleneck (compress to latent)
        x = layers.Dense(latent_dim)(x)
        encoded = layers.GlobalAveragePooling1D()(x)
        
        # Decoder - expand latent to sequence
        x = layers.RepeatVector(seq_length)(encoded)
        x = layers.Dense(d_model)(x)
        for i in range(n_layers):
            x = MambaBlock(d_model, name=f'dec_mamba_{i}')(x)
        
        # Output
        outputs = layers.Dense(n_features)(x)
        
        model = keras.Model(inputs=inputs, outputs=outputs)
        return model
    
    # Build autoencoder
    mamba_ae = build_mamba_autoencoder(
        seq_length=250,  # Must match the 250-step windows the autoencoder is trained on
        n_features=n_features,
        d_model=32,
        latent_dim=8,
        n_layers=2
    )
    
    mamba_ae.summary()

In [None]:
if HAS_TF:
    # Train on the earlier, less-degraded part of each sequence as a proxy for "normal".
    # Degradation starts at t=0 in the simulated data, so these windows are healthier, not pristine.
    X_normal = X_scaled[:, :250, :]  # First 250 timesteps
    
    X_train_ae, X_test_ae = train_test_split(X_normal, test_size=0.2, random_state=42)
    
    mamba_ae.compile(
        optimizer=keras.optimizers.Adam(1e-3),
        loss='mse'
    )
    
    print("Training Mamba Autoencoder on 'normal' data...")
    ae_history = mamba_ae.fit(
        X_train_ae, X_train_ae,
        validation_split=0.15,
        epochs=30,
        batch_size=16,
        callbacks=[
            keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)
        ],
        verbose=1
    )

In [None]:
if HAS_TF:
    # Test anomaly detection
    # Normal data should have low reconstruction error
    # Anomalous (degraded) data should have high error
    
    X_normal_test = X_test_ae  # Held-out early windows (less degraded)
    X_anomaly_test = X_scaled[:, 250:, :]  # Late windows (degraded); 250 steps to match the model input
    
    # Compute reconstruction errors
    recon_normal = mamba_ae.predict(X_normal_test, verbose=0)
    recon_anomaly = mamba_ae.predict(X_anomaly_test, verbose=0)
    
    error_normal = np.mean((X_normal_test - recon_normal) ** 2, axis=(1, 2))
    error_anomaly = np.mean((X_anomaly_test - recon_anomaly) ** 2, axis=(1, 2))
    
    # Visualize
    fig, axes = plt.subplots(1, 2, figsize=(12, 4))
    
    axes[0].hist(error_normal, bins=30, alpha=0.7, label='Normal', color='green')
    axes[0].hist(error_anomaly, bins=30, alpha=0.7, label='Anomaly', color='red')
    axes[0].set_xlabel('Reconstruction Error (MSE)')
    axes[0].set_ylabel('Count')
    axes[0].set_title('Mamba Autoencoder: Anomaly Detection')
    axes[0].legend()
    
    # ROC-like: threshold vs detection
    thresholds = np.linspace(0, max(error_anomaly.max(), error_normal.max()), 100)
    tpr = [np.mean(error_anomaly > t) for t in thresholds]
    fpr = [np.mean(error_normal > t) for t in thresholds]
    
    axes[1].plot(fpr, tpr, 'b-', linewidth=2)
    axes[1].plot([0, 1], [0, 1], 'k--', alpha=0.3)
    axes[1].set_xlabel('False Positive Rate')
    axes[1].set_ylabel('True Positive Rate')
    axes[1].set_title('ROC Curve')
    
    plt.tight_layout()
    plt.savefig(f'{MODEL_DIR}/mamba_anomaly_detection.png', dpi=150, bbox_inches='tight')
    plt.show()
    
    # Compute AUC
    from sklearn.metrics import roc_auc_score
    labels = np.concatenate([np.zeros(len(error_normal)), np.ones(len(error_anomaly))])
    scores = np.concatenate([error_normal, error_anomaly])
    auc = roc_auc_score(labels, scores)
    print(f"\nMamba Autoencoder AUC: {auc:.4f}")
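
In deployment, the reconstruction error has to be turned into a yes/no decision. A common approach is to set the threshold at a high percentile of errors observed on normal data. The sketch below does this in plain Python. The helper names, the 95th-percentile choice, and the error values are all illustrative assumptions.

```python
def percentile_threshold(errors, q=95.0):
    """Linearly interpolated q-th percentile of a list of reconstruction errors."""
    ordered = sorted(errors)
    if not ordered:
        raise ValueError("errors must be non-empty")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def flag_anomalies(errors, threshold):
    """True where the reconstruction error exceeds the threshold."""
    return [e > threshold for e in errors]

# Made-up errors from healthy validation windows
normal_errors = [0.10, 0.12, 0.11, 0.09, 0.13, 0.10, 0.12, 0.11, 0.10, 0.14]
threshold = percentile_threshold(normal_errors, q=95.0)
flags = flag_anomalies([0.11, 0.30, 0.135], threshold)
```

A higher percentile lowers the false-alarm rate at the cost of later detection; the ROC curve above shows that trade-off across all thresholds.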

## 5. Mamba Classifier for Fault Detection

In [None]:
def generate_fault_classification_data(n_samples=500, seq_length=200, n_features=6):
    """
    Generate multi-class fault data for classification.
    """
    X = []
    y = []
    
    fault_types = ['normal', 'unbalance', 'misalignment', 'bearing_fault', 'looseness']
    
    for _ in range(n_samples):
        fault = np.random.choice(fault_types)
        
        # Base signal (rotating machinery)
        t = np.linspace(0, 1, seq_length)
        rpm = 1500 + np.random.normal(0, 50)
        f_rot = rpm / 60
        
        features = np.zeros((seq_length, n_features))
        
        if fault == 'normal':
            # Clean signal with small 1x component
            features[:, 0] = 0.5 * np.sin(2 * np.pi * f_rot * t)
            features[:, 1] = 0.2 * np.sin(4 * np.pi * f_rot * t)
            noise_level = 0.1
            
        elif fault == 'unbalance':
            # Strong 1x component
            features[:, 0] = 2.0 * np.sin(2 * np.pi * f_rot * t)
            features[:, 1] = 0.3 * np.sin(4 * np.pi * f_rot * t)
            noise_level = 0.15
            
        elif fault == 'misalignment':
            # Strong 2x component
            features[:, 0] = 0.8 * np.sin(2 * np.pi * f_rot * t)
            features[:, 1] = 1.5 * np.sin(4 * np.pi * f_rot * t)
            features[:, 2] = 0.5 * np.sin(6 * np.pi * f_rot * t)
            noise_level = 0.2
            
        elif fault == 'bearing_fault':
            # High frequency components + impulses
            features[:, 0] = 0.6 * np.sin(2 * np.pi * f_rot * t)
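            # BPFO-like frequency (illustrative ratio). At ~200 samples/s this ~180 Hz tone
            # lies above Nyquist and will alias; use a larger seq_length for a faithful spectrum.
            f_bpfo = f_rot * 7.2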
            features[:, 3] = 0.8 * np.sin(2 * np.pi * f_bpfo * t)
            # Random impulses
            impulses = np.zeros(seq_length)
            impulse_locs = np.random.choice(seq_length, size=10, replace=False)
            impulses[impulse_locs] = np.random.uniform(1, 2, 10)
            features[:, 4] = impulses
            noise_level = 0.25
            
        elif fault == 'looseness':
            # Many harmonics (1x, 2x, 3x, 4x...)
            for h in range(1, 6):
                features[:, min(h-1, n_features-1)] += (0.5 / h) * np.sin(2 * np.pi * h * f_rot * t)
            # Sub-harmonics
            features[:, 5] = 0.4 * np.sin(np.pi * f_rot * t)
            noise_level = 0.3
        
        # Add noise
        features += np.random.normal(0, noise_level, features.shape)
        
        X.append(features)
        y.append(fault)
    
    return np.array(X), np.array(y)

# Generate classification data
print("Generating fault classification data...")
X_clf, y_clf = generate_fault_classification_data(n_samples=1000, seq_length=200)
print(f"Generated: X={X_clf.shape}")
print(f"Classes: {np.unique(y_clf)}")
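
When synthesizing vibration signatures it helps to check which frequency components the sampling grid can actually represent. The sketch below assumes the generator's time axis `np.linspace(0, 1, seq_length)` and a nominal 1500 rpm. The helper names are illustrative.

```python
def sampling_rate(seq_length, duration_s=1.0):
    """Sampling rate implied by linspace(0, duration, seq_length), endpoints included."""
    return (seq_length - 1) / duration_s

def is_aliased(freq_hz, fs_hz):
    """A tone above the Nyquist frequency fs/2 cannot be represented faithfully."""
    return freq_hz > fs_hz / 2.0

rpm = 1500
f_rot = rpm / 60.0            # 25 Hz rotation frequency
fs = sampling_rate(200)       # 199 Hz for 200 samples over 1 second

components = {
    '1x': f_rot,
    '2x': 2 * f_rot,
    '3x': 3 * f_rot,
    '5x': 5 * f_rot,
    'bpfo': 7.2 * f_rot,
}
alias_report = {name: is_aliased(freq, fs) for name, freq in components.items()}
```

Under these assumptions the 1x to 3x harmonics are representable, while the 5x harmonic and the BPFO-like tone exceed Nyquist and fold back to lower frequencies.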

In [None]:
if HAS_TF:
    
    def build_mamba_classifier(
        seq_length,
        n_features,
        n_classes,
        d_model=64,
        n_layers=3
    ):
        """
        Mamba-based sequence classifier.
        """
        inputs = keras.Input(shape=(seq_length, n_features))
        
        # Project
        x = layers.Dense(d_model)(inputs)
        
        # Mamba blocks
        for i in range(n_layers):
            x = MambaBlock(d_model, name=f'clf_mamba_{i}')(x)
        
        # Global pooling
        x = layers.LayerNormalization()(x)
        x = layers.GlobalAveragePooling1D()(x)
        
        # Classification head
        x = layers.Dense(32, activation='relu')(x)
        x = layers.Dropout(0.3)(x)
        outputs = layers.Dense(n_classes, activation='softmax')(x)
        
        model = keras.Model(inputs=inputs, outputs=outputs)
        return model
    
    # Encode labels
    le = LabelEncoder()
    y_encoded = le.fit_transform(y_clf)
    n_classes = len(le.classes_)
    
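    # Split first, then fit one scaler on the training set only.
    # (Scaling each sample independently erases most of the amplitude differences
    # that separate classes such as 'normal' and 'unbalance'.)
    X_train_clf, X_test_clf, y_train_clf, y_test_clf = train_test_split(
        X_clf, y_encoded, test_size=0.2, random_state=42, stratify=y_encoded
    )
    n_feat_clf = X_clf.shape[2]
    clf_scaler = StandardScaler().fit(X_train_clf.reshape(-1, n_feat_clf))
    X_train_clf = clf_scaler.transform(
        X_train_clf.reshape(-1, n_feat_clf)).reshape(X_train_clf.shape)
    X_test_clf = clf_scaler.transform(
        X_test_clf.reshape(-1, n_feat_clf)).reshape(X_test_clf.shape)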
    
    # Build model
    mamba_clf = build_mamba_classifier(
        seq_length=X_clf.shape[1],
        n_features=X_clf.shape[2],
        n_classes=n_classes,
        d_model=48,
        n_layers=2
    )
    
    mamba_clf.compile(
        optimizer=keras.optimizers.Adam(1e-3),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )
    
    mamba_clf.summary()

In [None]:
if HAS_TF:
    print("Training Mamba Classifier...")
    clf_history = mamba_clf.fit(
        X_train_clf, y_train_clf,
        validation_split=0.15,
        epochs=40,
        batch_size=32,
        callbacks=[
            keras.callbacks.EarlyStopping(patience=8, restore_best_weights=True),
            keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=4)
        ],
        verbose=1
    )
    
    # Evaluate
    y_pred_clf = mamba_clf.predict(X_test_clf).argmax(axis=1)
    
    print("\n" + "="*50)
    print("Mamba Classifier Results:")
    print("="*50)
    print(classification_report(y_test_clf, y_pred_clf, target_names=le.classes_))

In [None]:
if HAS_TF:
    # Confusion matrix
    cm = confusion_matrix(y_test_clf, y_pred_clf)
    
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # Training curves
    axes[0].plot(clf_history.history['accuracy'], label='Train')
    axes[0].plot(clf_history.history['val_accuracy'], label='Val')
    axes[0].set_xlabel('Epoch')
    axes[0].set_ylabel('Accuracy')
    axes[0].set_title('Mamba Classifier Training')
    axes[0].legend()
    
    # Confusion matrix
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
                xticklabels=le.classes_, yticklabels=le.classes_, ax=axes[1])
    axes[1].set_xlabel('Predicted')
    axes[1].set_ylabel('Actual')
    axes[1].set_title('Confusion Matrix')
    
    plt.tight_layout()
    plt.savefig(f'{MODEL_DIR}/mamba_classifier_results.png', dpi=150, bbox_inches='tight')
    plt.show()

## 6. Comparison: Mamba vs Transformer Efficiency

Measure how Mamba inference time grows with sequence length and compare it against a theoretical O(n²) reference curve. No Transformer is benchmarked in this section.

In [None]:
if HAS_TF:
    import time
    
    # Compare inference time for different sequence lengths
    seq_lengths = [100, 250, 500, 750, 1000]
    mamba_times = []
    n_features_test = 8
    batch_size_test = 16
    
    print("Measuring Mamba inference time for different sequence lengths...")
    print(f"Batch size: {batch_size_test}")
    print("-" * 40)
    
    for seq_len in seq_lengths:
        # Build fresh model
        model = build_mamba_rul_model(
            seq_length=seq_len,
            n_features=n_features_test,
            d_model=64,
            n_layers=3
        )
        
        # Random input
        X_test_time = np.random.randn(batch_size_test, seq_len, n_features_test).astype(np.float32)
        
        # Warmup
        _ = model.predict(X_test_time, verbose=0)
        
        # Measure (perf_counter is a monotonic, high-resolution clock)
        start = time.perf_counter()
        for _ in range(10):
            _ = model.predict(X_test_time, verbose=0)
        elapsed = (time.perf_counter() - start) / 10
        
        mamba_times.append(elapsed)
        print(f"  Seq Length {seq_len:4d}: {elapsed*1000:.2f} ms")
    
    # Plot scaling
    plt.figure(figsize=(8, 5))
    plt.plot(seq_lengths, mamba_times, 'bo-', linewidth=2, markersize=8, label='Mamba (measured)')
    
    # Theoretical O(n²) for comparison
    t_base = mamba_times[0]
    theoretical_n2 = [t_base * (s / seq_lengths[0])**2 for s in seq_lengths]
    plt.plot(seq_lengths, theoretical_n2, 'r--', linewidth=2, label='Theoretical O(n²)')
    
    plt.xlabel('Sequence Length')
    plt.ylabel('Inference Time (seconds)')
    plt.title('Mamba Scaling: Linear Complexity')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.savefig(f'{MODEL_DIR}/mamba_scaling.png', dpi=150, bbox_inches='tight')
    plt.show()
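
One way to check the scaling numerically, rather than by eye, is to fit a straight line to log(time) versus log(sequence length): the slope estimates the exponent k in time ∝ n^k. The sketch below is a minimal pure-Python version. `scaling_exponent` is a hypothetical helper that is not part of this notebook, and the timings in the example are synthetic. With only five points and a fixed per-call `predict` overhead, the measured slope is a rough indication at best; constant overhead tends to pull it below the true exponent at short lengths.

```python
import math

def scaling_exponent(lengths, times):
    """Least-squares slope of log(time) vs log(length).

    A slope near 1 suggests linear scaling, near 2 quadratic.
    Hypothetical helper for illustration only.
    """
    if len(lengths) != len(times) or len(lengths) < 2:
        raise ValueError("need two or more (length, time) pairs")
    xs = [math.log(n) for n in lengths]
    ys = [math.log(t) for t in times]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var = sum((x - x_mean) ** 2 for x in xs)
    return cov / var

# Synthetic timings (not measurements): exactly linear and exactly quadratic
demo_lengths = [100, 250, 500, 750, 1000]
linear_times = [0.001 * n for n in demo_lengths]
quadratic_times = [1e-6 * n ** 2 for n in demo_lengths]
print(f"linear slope:    {scaling_exponent(demo_lengths, linear_times):.2f}")
print(f"quadratic slope: {scaling_exponent(demo_lengths, quadratic_times):.2f}")
```

In this notebook, `scaling_exponent(seq_lengths, mamba_times)` would give the exponent for the measured run.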

## 7. Save Models and Export

In [None]:
if HAS_TF:
    # Save Keras models
    mamba_rul_model.save(f'{MODEL_DIR}/mamba_rul_model.keras')
    mamba_ae.save(f'{MODEL_DIR}/mamba_autoencoder.keras')
    mamba_clf.save(f'{MODEL_DIR}/mamba_classifier.keras')
    
    # Save metadata
    metadata = {
        'model_type': 'Mamba (Selective State Space Model)',
        'models': {
            'rul': {
                'file': 'mamba_rul_model.keras',
                'seq_length': int(seq_length),
                'n_features': int(n_features),
                'metrics': {'rmse': float(rmse), 'mae': float(mae), 'r2': float(r2)}
            },
            'autoencoder': {
                'file': 'mamba_autoencoder.keras',
                'seq_length': 250,
                'auc': float(auc)
            },
            'classifier': {
                'file': 'mamba_classifier.keras',
                'classes': le.classes_.tolist()
            }
        },
        'advantages': [
            'Linear O(n) complexity for long sequences',
            'Input-selective parameters',
            'Memory efficient inference',
            'Captures long-range dependencies'
        ],
        'recommended_use_cases': [
            'Long monitoring windows (1000+ timesteps)',
            'Edge deployment with memory constraints',
            'Real-time streaming data'
        ]
    }
    
    with open(f'{MODEL_DIR}/mamba_metadata.json', 'w') as f:
        json.dump(metadata, f, indent=2)
    
    print(f"\nModels saved to {MODEL_DIR}/")
    print("Files:")
    for fname in sorted(os.listdir(MODEL_DIR)):
        print(f"  - {fname}")
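
The Node-RED function in the next section reads `scaler_means` and `scaler_stds` from flow context, but the export cell above does not include them. A minimal sketch of packaging them for export follows. `scaler_params` is a hypothetical helper, and the variable name `scaler` in the comment is an assumption about how the fitted scaler is named earlier in the notebook. For a fitted scikit-learn `StandardScaler`, the inputs would be its `mean_` and `scale_` attributes. How the resulting JSON is loaded into Node-RED flow context is left open.

```python
import json

def scaler_params(means, scales, eps=1e-12):
    """Build a JSON-serializable dict of per-feature normalization parameters.

    Scales at or near zero are replaced by 1.0 so that downstream
    division stays finite. Hypothetical helper for illustration.
    """
    means = [float(m) for m in means]
    scales = [float(s) for s in scales]
    if len(means) != len(scales):
        raise ValueError("means and scales must have the same length")
    safe_scales = [s if abs(s) > eps else 1.0 for s in scales]
    return {"scaler_means": means, "scaler_stds": safe_scales}

# Illustrative values only; in the notebook these would come from the
# fitted scaler, e.g. scaler_params(scaler.mean_, scaler.scale_)
# (the name `scaler` is an assumption)
params = scaler_params([0.5, 60.0, 3.2], [0.1, 5.0, 0.0])
print(json.dumps(params, indent=2))
```

The keys match the names the Node-RED code looks up, so the same dict can be written next to `mamba_metadata.json` with `json.dump`.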

## 8. Node-RED Integration Example

JavaScript code for integrating Mamba predictions in Node-RED.

In [None]:
node_red_code = '''
// Node-RED Function Node: Mamba RUL Prediction
// Requires: tensorflowjs converted model

// Buffer for collecting sequence data
const SEQ_LENGTH = 500;  // Mamba handles long sequences efficiently
const N_FEATURES = 8;

// Load the rolling buffer via the documented Node-RED context API (get/set)
let buffer = context.get("buffer") || [];

// Add new reading
buffer.push([
    msg.payload.vibration,
    msg.payload.temperature,
    msg.payload.pressure,
    msg.payload.current,
    msg.payload.sensor5,
    msg.payload.sensor6,
    msg.payload.sensor7,
    msg.payload.sensor8
]);

// Keep only last SEQ_LENGTH readings
if (buffer.length > SEQ_LENGTH) {
    buffer.shift();
}

// Persist the updated buffer for the next message
context.set("buffer", buffer);

// Need minimum data before prediction
if (buffer.length < 100) {
    msg.payload = {
        status: "collecting",
        samples: buffer.length,
        required: 100
    };
    return msg;
}

// Front-pad to SEQ_LENGTH by repeating the oldest reading
let sequence = buffer.slice();
while (sequence.length < SEQ_LENGTH) {
    sequence.unshift(sequence[0]);
}

// Normalize (using stored scaler params)
const means = flow.get("scaler_means") || new Array(N_FEATURES).fill(0);
const stds = flow.get("scaler_stds") || new Array(N_FEATURES).fill(1);

// Fall back to 1 when a stored std is 0 or missing, to avoid division by zero
let normalized = sequence.map(row =>
    row.map((val, i) => (val - means[i]) / (stds[i] || 1))
);

// Prepare for model
msg.payload = {
    input: [normalized],  // Batch of 1
    model: "mamba_rul",
    description: "Mamba SSM - efficient for long sequences"
};

return msg;
'''

print("Node-RED Integration Code:")
print("=" * 50)
print(node_red_code)
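
Because the JavaScript above re-implements preprocessing outside the notebook, it can drift from what the model saw during training. One way to guard against that is a small Python reference of the same three steps (keep the last `SEQ_LENGTH` readings, front-pad with the oldest reading, standardize per feature), whose output can be compared against the Node-RED payload for the same readings. `prepare_window` is a hypothetical helper. Whether front-padding matches how the training windows were built is an assumption worth checking.

```python
def prepare_window(readings, seq_length, means, stds):
    """Mirror the Node-RED steps: truncate, front-pad, standardize.

    readings: list of per-timestep feature lists, oldest first.
    Returns a list of seq_length rows. Hypothetical reference helper.
    """
    if not readings:
        raise ValueError("at least one reading is required")
    window = [list(row) for row in readings[-seq_length:]]  # keep newest
    # Front-pad by repeating the oldest retained reading
    padding = [list(window[0]) for _ in range(seq_length - len(window))]
    window = padding + window
    # Standardize per feature; a zero std falls back to 1.0
    return [
        [(val - means[i]) / (stds[i] or 1.0) for i, val in enumerate(row)]
        for row in window
    ]

# Tiny example: 2 features, 3 readings, window of 5
demo = prepare_window(
    readings=[[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]],
    seq_length=5,
    means=[1.0, 10.0],
    stds=[1.0, 10.0],
)
print(demo)
```

Feeding the same readings through both implementations and comparing the resulting arrays is a cheap regression check before deploying the flow.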

## Summary

This notebook demonstrated **Mamba (Selective State Space Models)** for Predictive Maintenance:

### Key Advantages of Mamba for PdM:

| Feature | Benefit |
|---------|--------|
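| **O(n) Complexity** | Compute grows linearly with sequence length, versus quadratically for self-attention |
| **Selective Mechanism** | SSM parameters depend on the current input, so the model can emphasize or ignore individual readings |
| **Memory Efficient** | Fixed-size recurrent state during step-by-step inference |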
| **Long-Range Dependencies** | Captures slow degradation trends |

### Models Created:

1. **Mamba RUL Model** - Sequence-to-one RUL regression (one RUL estimate per input window)
2. **Mamba Autoencoder** - Reconstruction-based anomaly detection
3. **Mamba Classifier** - Fault type classification

### When to Use Mamba over Transformers:

- Long input windows (roughly 500+ timesteps, as a rule of thumb rather than a measured threshold)
- Memory-constrained edge devices
- Real-time streaming applications
- Monitoring slow degradation processes
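
The sequence-length guideline follows from how the two costs grow. As a rough sketch under simplifying assumptions (one layer; attention counted as about 2·n²·d multiply-adds for the score and value products; a selective scan counted as about 3·n·d·N for state size N; projections, constant factors, and hardware effects ignored), the ratio of the two grows linearly with n. The function names and the default `d_state=16` are illustrative choices, not values taken from this notebook's models.

```python
def attention_macs(n, d):
    """Approximate multiply-adds for QK^T plus the attention-weighted V product."""
    return 2 * n * n * d

def ssm_scan_macs(n, d, d_state=16):
    """Approximate multiply-adds for a per-channel state update and readout."""
    return 3 * n * d * d_state

# Ratio of the two rough counts at a few sequence lengths (d = 64)
for n in [100, 500, 1000, 10000]:
    ratio = attention_macs(n, 64) / ssm_scan_macs(n, 64)
    print(f"n={n:6d}  attention/scan cost ratio ~ {ratio:.1f}")
```

These counts say nothing about wall-clock time at short lengths, where fixed overheads dominate, so the crossover point for a given deployment is best found by measurement.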