# Graph Neural Networks for Predictive Maintenance

This notebook implements **Graph Neural Networks (GNNs)** for condition monitoring of multi-component systems:

1. **Sensor Network Modeling** - Capture sensor interdependencies
2. **Graph Attention for Fault Detection** - Attention-weighted neighbor aggregation
3. **Spatio-Temporal GNN** - Combine spatial (graph) and temporal patterns
4. **Multi-Machine Health Monitoring** - Factory-level predictions

## Why GNNs for Predictive Maintenance?

| Advantage | Description |
|-----------|-------------|
| **Sensor Relationships** | Model physical/functional connections between sensors |
| **Distinguish Faults** | Separate sensor faults from system faults |
| **Scalable** | Works with a varying number of sensors/machines |
| **Explainable** | Attention weights show which connections matter |

## Graph Representation

```
Nodes = Sensors or Components
Edges = Physical connections, correlations, or proximity
Node Features = Sensor readings over time
Edge Features = Distance, connection type, correlation strength
```
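
As a minimal illustration of the mapping above, the following NumPy sketch encodes a hypothetical three-sensor system. The sensor names, edge list, and array sizes are assumptions made up for the example; they are not the pump network built later in the notebook.

```python
import numpy as np

# Hypothetical three-sensor system (names are illustrative only)
nodes = ['vibration', 'temperature', 'pressure']
edges = [(0, 1), (1, 2)]  # undirected physical connections

# Adjacency matrix: adj[i, j] = 1 when sensors i and j are connected
n = len(nodes)
adj = np.zeros((n, n), dtype=np.float32)
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

# Node features: one window of readings per sensor -> [n_nodes, time_steps, n_features]
time_steps, n_features = 50, 4
rng = np.random.default_rng(0)
node_features = rng.normal(size=(n, time_steps, n_features)).astype(np.float32)

# Edge features (optional): correlation of the first feature channel,
# kept only where an edge exists
corr = np.corrcoef(node_features[:, :, 0])  # [n_nodes, n_nodes]
edge_features = corr * adj

print(adj)
print(node_features.shape, edge_features.shape)
```

Storing the graph as a dense matrix is convenient for a dozen sensors; for much larger graphs a sparse edge list would likely be preferable.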

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
import seaborn as sns
import os
import json

np.random.seed(42)

# Check TensorFlow availability
try:
    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras import layers
    print(f"TensorFlow {tf.__version__} available")
    HAS_TF = True
except ImportError:
    print("TensorFlow not available")
    HAS_TF = False

# Output directories
DATA_DIR = '../data/gnn'
MODEL_DIR = '../models/gnn'
os.makedirs(DATA_DIR, exist_ok=True)
os.makedirs(MODEL_DIR, exist_ok=True)

print("Setup complete!")

## 1. GNN Building Blocks

We implement the GNN layers from scratch as custom Keras layers in TensorFlow, so the notebook needs no dedicated graph-learning library.

In [None]:
if HAS_TF:
    
    class GraphConvolution(layers.Layer):
        """
        Graph Convolutional Layer (GCN).
        
        Aggregates features from neighboring nodes with symmetric normalization:
        H' = σ(D^(-1/2) (A + I) D^(-1/2) · H · W)
        """
        
        def __init__(self, units, activation='relu', use_bias=True, **kwargs):
            super().__init__(**kwargs)
            self.units = units
            self.activation = keras.activations.get(activation)
            self.use_bias = use_bias
            
        def build(self, input_shape):
            # input_shape: (node_features_shape, adjacency_shape)
            feature_dim = input_shape[0][-1]
            
            self.kernel = self.add_weight(
                name='kernel',
                shape=(feature_dim, self.units),
                initializer='glorot_uniform',
                trainable=True
            )
            
            if self.use_bias:
                self.bias = self.add_weight(
                    name='bias',
                    shape=(self.units,),
                    initializer='zeros',
                    trainable=True
                )
            
            super().build(input_shape)
            
        def call(self, inputs):
            """
            Args:
                inputs: tuple of (node_features, adjacency_matrix)
                    node_features: [batch, n_nodes, features]
                    adjacency: [batch, n_nodes, n_nodes] or [n_nodes, n_nodes]
                    
            Returns:
                Updated node features: [batch, n_nodes, units]
            """
            node_features, adjacency = inputs
            
            # Normalize adjacency (add self-loops and normalize)
            adj = adjacency
            if len(adj.shape) == 2:
                # Shared adjacency for all samples
                adj = tf.expand_dims(adj, 0)  # [1, n, n]
            
            # Add self-loops
            n_nodes = tf.shape(adj)[-1]
            identity = tf.eye(n_nodes, dtype=adj.dtype)
            adj = adj + identity
            
            # Degree normalization D^(-1/2) A D^(-1/2)
            degree = tf.reduce_sum(adj, axis=-1, keepdims=True)  # [batch, n, 1]
            degree_inv_sqrt = tf.math.rsqrt(tf.maximum(degree, 1e-6))
            adj_normalized = adj * degree_inv_sqrt * tf.transpose(degree_inv_sqrt, [0, 2, 1])
            
            # Message passing: aggregate neighbor features
            # [batch, n, n] @ [batch, n, features] = [batch, n, features]
            aggregated = tf.matmul(adj_normalized, node_features)
            
            # Transform
            output = tf.matmul(aggregated, self.kernel)
            
            if self.use_bias:
                output = output + self.bias
                
            return self.activation(output)
        
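        def get_config(self):
            # Include every constructor argument so the layer can be re-created
            config = super().get_config()
            config.update({
                'units': self.units,
                'activation': keras.activations.serialize(self.activation),
                'use_bias': self.use_bias
            })
            return config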
    
    print("GraphConvolution layer defined")
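
To make the normalization step in `GraphConvolution` concrete, here is a small NumPy sketch of the same propagation rule on a three-node path graph. The helper name `normalize_adjacency` and the toy features are assumptions for illustration; the learned weight matrix and activation are left out.

```python
import numpy as np

def normalize_adjacency(adj):
    """Return D^(-1/2) (A + I) D^(-1/2) for a single [n, n] adjacency matrix."""
    adj_hat = adj + np.eye(adj.shape[0], dtype=adj.dtype)  # add self-loops
    degree = adj_hat.sum(axis=-1)                           # [n]
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-6))
    return adj_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# Path graph 0 - 1 - 2 with a 2-dimensional feature per node
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=np.float64)
h = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

adj_norm = normalize_adjacency(adj)
aggregated = adj_norm @ h  # each row mixes a node with its neighbours

print(adj_norm.round(3))
print(aggregated.round(3))
```

Entry `(i, j)` of the normalized matrix equals `1 / sqrt(d_i * d_j)` for connected nodes (degrees counted with self-loops), so high-degree nodes contribute less per edge.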

In [None]:
if HAS_TF:
    
    class GraphAttention(layers.Layer):
        """
        Graph Attention Layer (GAT).
        
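        Uses attention mechanism to weight neighbor contributions:
        α_ij = softmax_j(LeakyReLU(a · [Wh_i || Wh_j]))
        h'_i = Σ_j α_ij · W·h_j
        
        Heads are concatenated and no output activation is applied here;
        apply a nonlinearity after the layer if one is needed.
        """
        
        def __init__(self, units, n_heads=4, dropout=0.1, **kwargs):
            super().__init__(**kwargs)
            # The concatenated heads must add up to exactly `units` features
            if units % n_heads != 0:
                raise ValueError("units must be divisible by n_heads")
            self.units = units
            self.n_heads = n_heads
            self.dropout = dropout
            self.head_dim = units // n_heads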
            
        def build(self, input_shape):
            feature_dim = input_shape[0][-1]
            
            # Linear transformation for each head
            self.W = self.add_weight(
                name='W',
                shape=(self.n_heads, feature_dim, self.head_dim),
                initializer='glorot_uniform',
                trainable=True
            )
            
            # Attention weights
            self.a_src = self.add_weight(
                name='a_src',
                shape=(self.n_heads, self.head_dim, 1),
                initializer='glorot_uniform',
                trainable=True
            )
            
            self.a_dst = self.add_weight(
                name='a_dst',
                shape=(self.n_heads, self.head_dim, 1),
                initializer='glorot_uniform',
                trainable=True
            )
            
            self.dropout_layer = layers.Dropout(self.dropout)
            
            super().build(input_shape)
            
        def call(self, inputs, training=None):
            """
            Args:
                inputs: tuple of (node_features, adjacency_matrix)
                    
            Returns:
                Updated node features with attention
            """
            node_features, adjacency = inputs
            batch_size = tf.shape(node_features)[0]
            n_nodes = tf.shape(node_features)[1]
            
            # Transform features for each head
            # [batch, n_nodes, features] -> [batch, n_heads, n_nodes, head_dim]
            h = tf.einsum('bni,hio->bhno', node_features, self.W)
            
            # Compute attention scores
            # Source attention: [batch, n_heads, n_nodes, 1]
            attn_src = tf.einsum('bhni,hio->bhno', h, self.a_src)
            # Target attention: [batch, n_heads, n_nodes, 1]
            attn_dst = tf.einsum('bhni,hio->bhno', h, self.a_dst)
            
            # Pairwise attention: e_ij = attn_src_i + attn_dst_j
            # [batch, n_heads, n_nodes, 1] + [batch, n_heads, 1, n_nodes]
            attn = attn_src + tf.transpose(attn_dst, [0, 1, 3, 2])
            attn = tf.nn.leaky_relu(attn, alpha=0.2)
            
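            # Mask with adjacency (only attend to neighbors and the node itself)
            if len(adjacency.shape) == 2:
                adjacency = tf.expand_dims(adjacency, 0)
            # Self-loops keep each node's own features in the update and prevent
            # isolated nodes from falling back to uniform attention over all nodes
            adjacency = adjacency + tf.eye(n_nodes, dtype=adjacency.dtype)
            mask = tf.expand_dims(adjacency, 1)  # [batch, 1, n, n]
            attn = tf.where(mask > 0, attn, tf.ones_like(attn) * -1e9)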
            
            # Softmax attention
            attn = tf.nn.softmax(attn, axis=-1)
            attn = self.dropout_layer(attn, training=training)
            
            # Aggregate with attention weights
            # [batch, n_heads, n_nodes, n_nodes] @ [batch, n_heads, n_nodes, head_dim]
            output = tf.einsum('bhnm,bhmd->bhnd', attn, h)
            
            # Concatenate heads
            output = tf.transpose(output, [0, 2, 1, 3])  # [batch, n_nodes, n_heads, head_dim]
            output = tf.reshape(output, [batch_size, n_nodes, self.units])
            
            return output
        
        def get_config(self):
            config = super().get_config()
            config.update({
                'units': self.units,
                'n_heads': self.n_heads,
                'dropout': self.dropout
            })
            return config
    
    print("GraphAttention (GAT) layer defined")
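
The masking and softmax steps of the attention layer can be checked in isolation. The sketch below is a single-head NumPy version under simplifying assumptions: the per-node source and target scores (`src`, `dst`) are given directly rather than computed from learned weights, and the helper name `masked_attention` is hypothetical.

```python
import numpy as np

def masked_attention(scores, adj, negative_slope=0.2):
    """Softmax over neighbours only; scores and adj are both [n, n]."""
    e = np.where(scores > 0, scores, negative_slope * scores)  # LeakyReLU
    e = np.where(adj > 0, e, -1e9)                             # mask non-edges
    e = e - e.max(axis=-1, keepdims=True)                      # numerical stability
    w = np.exp(e)
    return w / w.sum(axis=-1, keepdims=True)

# Star graph: node 0 is connected to nodes 1 and 2
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]], dtype=float)

# Stand-ins for the learned per-node attention terms
src = np.array([0.5, -1.0, 2.0])
dst = np.array([1.0, 0.0, -0.5])
scores = src[:, None] + dst[None, :]  # e_ij = src_i + dst_j

alpha = masked_attention(scores, adj)
print(alpha.round(3))
```

Each row of `alpha` sums to one and is zero wherever the adjacency has no edge, which is what makes the weights readable as per-connection importances.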

In [None]:
if HAS_TF:
    
    class SpatioTemporalGNN(layers.Layer):
        """
        Spatio-Temporal Graph Neural Network.
        
        Combines:
        - Spatial: Graph convolution across sensors
        - Temporal: 1D convolution along time
        """
        
        def __init__(self, spatial_units, temporal_units, kernel_size=3, **kwargs):
            super().__init__(**kwargs)
            self.spatial_units = spatial_units
            self.temporal_units = temporal_units
            self.kernel_size = kernel_size
            
        def build(self, input_shape):
            # Spatial: Graph convolution
            self.graph_conv = GraphConvolution(self.spatial_units)
            
            # Temporal: 1D convolution
            self.temporal_conv = layers.Conv1D(
                self.temporal_units,
                kernel_size=self.kernel_size,
                padding='same',
                activation='relu'
            )
            
            self.norm = layers.LayerNormalization()
            
            super().build(input_shape)
            
        def call(self, inputs):
            """
            Args:
                inputs: tuple of (node_features, adjacency)
                    node_features: [batch, n_nodes, time_steps, features]
                    adjacency: [n_nodes, n_nodes]
            """
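            node_features, adjacency = inputs
            batch_size = tf.shape(node_features)[0]
            # Use static sizes here: Conv1D needs a known channel dimension and
            # the Python loop below needs an integer number of time steps.
            # This assumes n_nodes, time_steps and features are fixed.
            n_nodes, time_steps, features = node_features.shape[1:4]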
            
            # Reshape for temporal processing: [batch * n_nodes, time, features]
            x = tf.reshape(node_features, [-1, time_steps, features])
            x = self.temporal_conv(x)
            
            # Reshape back: [batch, n_nodes, time, temporal_units]
            x = tf.reshape(x, [batch_size, n_nodes, time_steps, self.temporal_units])
            
            # Spatial processing at each time step
            # [batch, n_nodes, time, units] -> process each time step
            outputs = []
            for t in range(time_steps):
                x_t = x[:, :, t, :]  # [batch, n_nodes, units]
                x_t = self.graph_conv([x_t, adjacency])
                outputs.append(x_t)
            
            # Stack: [batch, n_nodes, time, spatial_units]
            output = tf.stack(outputs, axis=2)
            output = self.norm(output)
            
            return output
        
        def get_config(self):
            config = super().get_config()
            config.update({
                'spatial_units': self.spatial_units,
                'temporal_units': self.temporal_units,
                'kernel_size': self.kernel_size
            })
            return config
    
    print("SpatioTemporalGNN layer defined")
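
The reshaping in `SpatioTemporalGNN` is easier to follow with concrete shapes. This NumPy sketch mirrors the two stages under stated assumptions: a fixed moving-average kernel stands in for the learned `Conv1D`, and plain mean aggregation stands in for the graph convolution (the layer above uses symmetric normalization and learned weights).

```python
import numpy as np

batch, n_nodes, time_steps, n_features = 2, 3, 6, 4
rng = np.random.default_rng(0)
x = rng.normal(size=(batch, n_nodes, time_steps, n_features))

# Temporal stage: fold nodes into the batch axis so every sensor series
# is filtered independently along time
x_flat = x.reshape(batch * n_nodes, time_steps, n_features)
kernel = np.ones(3) / 3.0  # stand-in for a learned temporal kernel
x_time_flat = np.empty_like(x_flat)
for i in range(x_flat.shape[0]):
    for f in range(n_features):
        x_time_flat[i, :, f] = np.convolve(x_flat[i, :, f], kernel, mode='same')
x_time = x_time_flat.reshape(batch, n_nodes, time_steps, n_features)

# Spatial stage: mix nodes at every time step with a row-normalised adjacency
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float) + np.eye(3)  # self-loops included
adj_norm = adj / adj.sum(axis=-1, keepdims=True)
x_space = np.einsum('nm,bmtf->bntf', adj_norm, x_time)

print(x_time.shape, x_space.shape)
```

The `einsum` applies the same node mixing to all time steps at once, which is equivalent to looping over `t` and multiplying each `[n_nodes, features]` slice by the adjacency.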

## 2. Create Sensor Network Data

Simulate a multi-sensor system with a known topology.

In [None]:
def create_sensor_network(n_sensors=12):
    """
    Create a sensor network topology representing a pump system.
    
    Sensors:
    0: Motor vibration
    1: Motor temperature
    2: Motor current
    3: Coupling vibration
    4: Pump inlet pressure
    5: Pump outlet pressure
    6: Pump vibration (bearing 1)
    7: Pump vibration (bearing 2)
    8: Pump temperature
    9: Flow rate
    10: Discharge temperature
    11: Control valve position
    """
    sensor_names = [
        'motor_vib', 'motor_temp', 'motor_current', 'coupling_vib',
        'inlet_pressure', 'outlet_pressure', 'pump_vib_1', 'pump_vib_2',
        'pump_temp', 'flow_rate', 'discharge_temp', 'valve_pos'
    ]
    
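    # The edges below use hard-coded indices 0..11, so other sizes are not supported
    if n_sensors != len(sensor_names):
        raise ValueError(f"This topology is defined for exactly {len(sensor_names)} sensors")
    
    # Adjacency matrix (physical connections)
    adj = np.zeros((n_sensors, n_sensors))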
    
    # Motor group (0, 1, 2)
    adj[0, 1] = adj[1, 0] = 1  # vib-temp
    adj[0, 2] = adj[2, 0] = 1  # vib-current
    adj[1, 2] = adj[2, 1] = 1  # temp-current
    
    # Motor to coupling
    adj[0, 3] = adj[3, 0] = 1
    
    # Coupling to pump bearings
    adj[3, 6] = adj[6, 3] = 1
    adj[3, 7] = adj[7, 3] = 1
    
    # Pump bearing group (6, 7, 8)
    adj[6, 7] = adj[7, 6] = 1
    adj[6, 8] = adj[8, 6] = 1
    adj[7, 8] = adj[8, 7] = 1
    
    # Pressure flow connections
    adj[4, 5] = adj[5, 4] = 1  # inlet-outlet
    adj[5, 9] = adj[9, 5] = 1  # outlet-flow
    adj[9, 11] = adj[11, 9] = 1  # flow-valve
    
    # Temperature propagation
    adj[8, 10] = adj[10, 8] = 1  # pump_temp-discharge
    
    # Pump to pressure
    adj[6, 4] = adj[4, 6] = 1
    adj[7, 5] = adj[5, 7] = 1
    
    return adj.astype(np.float32), sensor_names

# Create network
adj_matrix, sensor_names = create_sensor_network(n_sensors=12)

# Visualize network
fig, ax = plt.subplots(figsize=(10, 8))
sns.heatmap(adj_matrix, annot=True, fmt='.0f', cmap='Blues',
            xticklabels=sensor_names, yticklabels=sensor_names, ax=ax)
plt.title('Sensor Network Adjacency Matrix')
plt.tight_layout()
plt.savefig(f'{DATA_DIR}/sensor_network.png', dpi=150, bbox_inches='tight')
plt.show()

print(f"\nSensor network created: {len(sensor_names)} nodes, {int(adj_matrix.sum()/2)} edges")
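
Because the adjacency matrix is written by hand, a few structural checks can catch a missing or one-sided edge. The helper below is a hypothetical sketch (the name `check_adjacency` and the returned keys are assumptions) and is demonstrated on a toy graph rather than the pump network.

```python
import numpy as np

def check_adjacency(adj):
    """Return basic diagnostics for an undirected adjacency matrix."""
    adj = np.asarray(adj)
    n = adj.shape[0]
    # Graph traversal from node 0 to test connectivity
    seen, frontier = {0}, [0]
    while frontier:
        node = frontier.pop()
        for nb in np.flatnonzero(adj[node]):
            if int(nb) not in seen:
                seen.add(int(nb))
                frontier.append(int(nb))
    return {
        'symmetric': bool(np.array_equal(adj, adj.T)),
        'no_self_loops': bool(np.all(np.diag(adj) == 0)),
        'n_edges': int(adj.sum() // 2),
        'isolated_nodes': np.flatnonzero(adj.sum(axis=1) == 0).tolist(),
        'connected': len(seen) == n,
    }

# Toy graph: 0 - 1 - 2, with node 3 left unconnected on purpose
toy = np.zeros((4, 4), dtype=np.float32)
toy[0, 1] = toy[1, 0] = 1
toy[1, 2] = toy[2, 1] = 1

report = check_adjacency(toy)
print(report)
```

Calling `check_adjacency(adj_matrix)` on the sensor network would report whether every sensor is reachable from every other, which matters because information only flows along edges.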

In [None]:
def generate_sensor_data_with_faults(
    n_samples=500,
    n_sensors=12,
    time_steps=100,
    adj_matrix=None
):
    """
    Generate multi-sensor time series data with various fault conditions.
    
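    Each fault type adds hard-coded signatures to physically related sensors
    (e.g. a motor fault also raises coupling vibration). `adj_matrix` is
    accepted but not used by the generator.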
    """
    X = []  # [n_samples, n_sensors, time_steps, features]
    y = []  # Fault labels
    
    fault_types = ['normal', 'motor_fault', 'pump_bearing', 'cavitation', 'valve_stuck']
    
    for _ in range(n_samples):
        fault = np.random.choice(fault_types)
        
        # Initialize all sensors with baseline behavior
        t = np.linspace(0, 2*np.pi, time_steps)
        sensor_data = np.zeros((n_sensors, time_steps, 4))  # 4 features per sensor
        
        # Base signals for each sensor
        for s in range(n_sensors):
            # Feature 0: Main signal
            sensor_data[s, :, 0] = np.sin(t + s * 0.5) + np.random.normal(0, 0.1, time_steps)
            # Feature 1: Trend
            sensor_data[s, :, 1] = np.random.normal(0, 0.05, time_steps).cumsum() * 0.01
            # Feature 2: High frequency
            sensor_data[s, :, 2] = np.sin(10 * t) * 0.2 + np.random.normal(0, 0.05, time_steps)
            # Feature 3: Level
            sensor_data[s, :, 3] = 0.5 + np.random.normal(0, 0.02, time_steps)
        
        # Apply fault-specific patterns
        if fault == 'motor_fault':
            # Affects motor sensors (0, 1, 2) and propagates to coupling (3)
            sensor_data[0, :, 0] += 1.5 * np.sin(2 * t)  # Increased vibration
            sensor_data[1, :, 3] += 0.5 * np.linspace(0, 1, time_steps)  # Rising temp
            sensor_data[2, :, 0] += 0.8 * np.abs(np.sin(t))  # Current spikes
            sensor_data[3, :, 0] += 0.7 * np.sin(2 * t)  # Coupling vibration
            
        elif fault == 'pump_bearing':
            # Affects pump bearings (6, 7) and temperature (8)
            # Add high frequency impulses
            impulses = np.zeros(time_steps)
            impulse_locs = np.linspace(10, time_steps-10, 8).astype(int)
            impulses[impulse_locs] = np.random.uniform(1, 2, 8)
            sensor_data[6, :, 0] += impulses
            sensor_data[6, :, 2] += 0.8  # High frequency energy
            sensor_data[7, :, 0] += impulses * 0.8
            sensor_data[7, :, 2] += 0.6
            sensor_data[8, :, 3] += 0.3 * np.linspace(0, 1, time_steps)  # Temp rise
            
        elif fault == 'cavitation':
            # Affects inlet pressure (4), pump vibrations (6, 7), noise increase
            sensor_data[4, :, 0] -= 0.5  # Low inlet pressure
            sensor_data[4, :, 3] -= 0.3
            # Broadband noise on pump
            sensor_data[6, :, :] += np.random.normal(0, 0.5, (time_steps, 4))
            sensor_data[7, :, :] += np.random.normal(0, 0.4, (time_steps, 4))
            sensor_data[9, :, 0] -= 0.3  # Reduced flow (unstable)
            
        elif fault == 'valve_stuck':
            # Valve (11) doesn't respond, affects flow (9) and pressure (5)
            sensor_data[11, :, 0] = 0.7 + np.random.normal(0, 0.01, time_steps)  # Stuck
            sensor_data[11, :, 1] = 0  # No trend
            sensor_data[9, :, 0] *= 0.6  # Reduced flow
            sensor_data[5, :, 0] += 0.5  # Higher pressure
            sensor_data[5, :, 3] += 0.3
        
        X.append(sensor_data)
        y.append(fault)
    
    return np.array(X), np.array(y)

# Generate data
print("Generating sensor network data with faults...")
X_graph, y_graph = generate_sensor_data_with_faults(
    n_samples=1000,
    n_sensors=12,
    time_steps=100,
    adj_matrix=adj_matrix
)

print(f"Generated: X={X_graph.shape}")
print(f"  Shape: [samples, sensors, time_steps, features]")
print(f"Classes: {np.unique(y_graph, return_counts=True)}")

In [None]:
# Visualize different fault patterns
fig, axes = plt.subplots(2, 3, figsize=(15, 8))
fault_types = ['normal', 'motor_fault', 'pump_bearing', 'cavitation', 'valve_stuck']

for idx, fault in enumerate(fault_types):
    ax = axes.flat[idx]
    # Find example of this fault
    sample_idx = np.where(y_graph == fault)[0][0]
    
    # Plot several sensors
    for s in [0, 4, 6, 9]:  # motor_vib, inlet_p, pump_vib, flow
        ax.plot(X_graph[sample_idx, s, :, 0], label=sensor_names[s], alpha=0.7)
    
    ax.set_title(f'Fault: {fault}')
    ax.set_xlabel('Time')
    ax.set_ylabel('Signal')
    if idx == 0:
        ax.legend(fontsize=8)

axes.flat[-1].axis('off')  # Hide last empty subplot
plt.tight_layout()
plt.savefig(f'{DATA_DIR}/fault_patterns.png', dpi=150, bbox_inches='tight')
plt.show()

## 3. Build GNN Fault Classifier

In [None]:
if HAS_TF:
    
    def build_gnn_classifier(
        n_sensors,
        time_steps,
        n_features,
        n_classes,
        gcn_units=(64, 32),  # tuple avoids a mutable default argument
        use_attention=True
    ):
        """
        Build a GNN-based fault classifier.
        
        Architecture:
        1. Temporal processing per sensor (1D Conv)
        2. Graph convolution across sensors
        3. Global pooling and classification
        """
        # Inputs
        node_input = keras.Input(shape=(n_sensors, time_steps, n_features), name='node_features')
        adj_input = keras.Input(shape=(n_sensors, n_sensors), name='adjacency')
        
        # Step 1: Temporal processing per sensor
        # TimeDistributed applies the same 1D-conv encoder to every sensor's
        # [time, features] series: [batch, n_sensors, time, features] -> [batch, n_sensors, 64].
        # This avoids raw tf.reshape calls on symbolic inputs, which Keras 3 does not accept.
        temporal_encoder = keras.Sequential([
            layers.Conv1D(32, 5, padding='same', activation='relu'),
            layers.Conv1D(64, 3, padding='same', activation='relu'),
            layers.GlobalAveragePooling1D(),  # [batch, 64] for one sensor
        ])
        x = layers.TimeDistributed(temporal_encoder)(node_input)
        
        # Step 2: Graph convolutions
        if use_attention:
            x = GraphAttention(gcn_units[0], n_heads=4)([x, adj_input])
            x = layers.ReLU()(x)
            x = layers.Dropout(0.2)(x)
            x = GraphAttention(gcn_units[1], n_heads=2)([x, adj_input])
        else:
            for units in gcn_units:
                x = GraphConvolution(units)([x, adj_input])
                x = layers.Dropout(0.2)(x)
        
        # Step 3: Global pooling
        x = layers.GlobalAveragePooling1D()(x)  # Aggregate across nodes
        
        # Classification head
        x = layers.Dense(64, activation='relu')(x)
        x = layers.Dropout(0.3)(x)
        outputs = layers.Dense(n_classes, activation='softmax')(x)
        
        model = keras.Model(
            inputs=[node_input, adj_input],
            outputs=outputs
        )
        return model
    
    # Prepare data
    le = LabelEncoder()
    y_encoded = le.fit_transform(y_graph)
    n_classes = len(le.classes_)
    
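    # Standardize each (sample, sensor) window independently over time.
    # Note: this also removes constant offsets within a window (e.g. a raised level).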
    X_normalized = np.zeros_like(X_graph)
    for i in range(X_graph.shape[0]):
        for s in range(X_graph.shape[1]):
            scaler = StandardScaler()
            X_normalized[i, s] = scaler.fit_transform(X_graph[i, s])
    
    # Split
    X_train, X_test, y_train, y_test = train_test_split(
        X_normalized, y_encoded, test_size=0.2, random_state=42, stratify=y_encoded
    )
    
    # Adjacency for all samples (same structure)
    adj_train = np.tile(adj_matrix[np.newaxis, :, :], (len(X_train), 1, 1))
    adj_test = np.tile(adj_matrix[np.newaxis, :, :], (len(X_test), 1, 1))
    
    print(f"Training: {X_train.shape}")
    print(f"Test: {X_test.shape}")
    print(f"Classes: {le.classes_}")
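
The per-window scaling used in this cell standardizes each (sample, sensor) window on its own, so a constant level shift inside a window is centered away. One alternative, sketched here as an assumption rather than the notebook's method, is to compute one mean and standard deviation per (sensor, feature) from the training split and reuse them everywhere. The helper names and the synthetic arrays are hypothetical.

```python
import numpy as np

def fit_sensor_stats(X_train):
    """Mean/std per (sensor, feature), computed over training samples and time."""
    mean = X_train.mean(axis=(0, 2), keepdims=True)  # [1, n_sensors, 1, n_features]
    std = X_train.std(axis=(0, 2), keepdims=True)
    return mean, np.where(std < 1e-8, 1.0, std)      # guard against constant channels

def apply_sensor_stats(X, mean, std):
    """Scale any split with statistics fitted on the training split."""
    return (X - mean) / std

# Synthetic stand-in data: [samples, sensors, time_steps, features]
rng = np.random.default_rng(0)
X_tr = rng.normal(loc=2.0, scale=3.0, size=(40, 5, 20, 4))
X_te = rng.normal(loc=2.0, scale=3.0, size=(10, 5, 20, 4))
X_te[0, 1, :, 3] += 6.0  # simulated level shift on one sensor/feature

mean, std = fit_sensor_stats(X_tr)
X_tr_n = apply_sensor_stats(X_tr, mean, std)
X_te_n = apply_sensor_stats(X_te, mean, std)

# The shifted window keeps a clearly positive mean after scaling
print(X_te_n[0, 1, :, 3].mean())
```

With this scheme the split has to happen before the statistics are fitted, so the order of the normalization and `train_test_split` steps would need to be swapped.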

In [None]:
if HAS_TF:
    # Build model
    gnn_model = build_gnn_classifier(
        n_sensors=12,
        time_steps=100,
        n_features=4,
        n_classes=n_classes,
        gcn_units=[64, 32],
        use_attention=True
    )
    
    gnn_model.compile(
        optimizer=keras.optimizers.Adam(1e-3),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )
    
    gnn_model.summary()

In [None]:
if HAS_TF:
    print("Training GNN Classifier...")
    history = gnn_model.fit(
        [X_train, adj_train], y_train,
        validation_split=0.15,
        epochs=50,
        batch_size=32,
        callbacks=[
            keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True),
            keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=5)
        ],
        verbose=1
    )

In [None]:
if HAS_TF:
    # Evaluate
    y_pred = gnn_model.predict([X_test, adj_test]).argmax(axis=1)
    
    print("\n" + "="*50)
    print("GNN Classifier Results:")
    print("="*50)
    print(classification_report(y_test, y_pred, target_names=le.classes_))
    
    # Confusion matrix
    cm = confusion_matrix(y_test, y_pred)
    
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # Training curves
    axes[0].plot(history.history['accuracy'], label='Train')
    axes[0].plot(history.history['val_accuracy'], label='Val')
    axes[0].set_xlabel('Epoch')
    axes[0].set_ylabel('Accuracy')
    axes[0].set_title('GNN Training History')
    axes[0].legend()
    
    # Confusion matrix
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
                xticklabels=le.classes_, yticklabels=le.classes_, ax=axes[1])
    axes[1].set_xlabel('Predicted')
    axes[1].set_ylabel('Actual')
    axes[1].set_title('Confusion Matrix')
    plt.xticks(rotation=45)
    
    plt.tight_layout()
    plt.savefig(f'{MODEL_DIR}/gnn_results.png', dpi=150, bbox_inches='tight')
    plt.show()

## 4. GNN for Anomaly Detection

Detect when sensor readings deviate from expected network-wide patterns.

In [None]:
if HAS_TF:
    
    def build_gnn_autoencoder(n_sensors, time_steps, n_features, latent_dim=16):
        """
        Graph Autoencoder for anomaly detection.
        
        Learns to reconstruct normal sensor patterns.
        Anomalies have high reconstruction error.
        """
        # Inputs
        node_input = keras.Input(shape=(n_sensors, time_steps, n_features))
        adj_input = keras.Input(shape=(n_sensors, n_sensors))
        
        # Encoder: temporal encoder shared across sensors, then graph convolutions.
        # TimeDistributed/Reshape layers are used instead of raw tf.reshape calls,
        # which Keras 3 does not accept on symbolic inputs.
        temporal_encoder = keras.Sequential([
            layers.Conv1D(32, 5, padding='same', activation='relu'),
            layers.Conv1D(16, 3, padding='same', activation='relu'),
            layers.GlobalAveragePooling1D(),
        ])
        x = layers.TimeDistributed(temporal_encoder)(node_input)  # [batch, n_sensors, 16]
        
        # Graph convolution to capture network patterns
        x = GraphConvolution(32)([x, adj_input])
        x = GraphConvolution(latent_dim)([x, adj_input])
        
        # Latent representation
        latent = layers.GlobalAveragePooling1D()(x)  # [batch, latent_dim]
        
        # Decoder: Expand back to sensors
        x = layers.Dense(n_sensors * latent_dim)(latent)
        x = layers.Reshape((n_sensors, latent_dim))(x)
        
        x = GraphConvolution(32)([x, adj_input])
        x = GraphConvolution(16)([x, adj_input])
        
        # Expand to time series
        x = layers.Dense(time_steps * n_features)(x)
        outputs = layers.Reshape((n_sensors, time_steps, n_features))(x)
        
        model = keras.Model(
            inputs=[node_input, adj_input],
            outputs=outputs
        )
        return model
    
    # Train on normal data only
    normal_mask = y_graph == 'normal'
    X_normal = X_normalized[normal_mask]
    
    X_train_ae, X_val_ae = train_test_split(X_normal, test_size=0.2, random_state=42)
    adj_train_ae = np.tile(adj_matrix[np.newaxis, :, :], (len(X_train_ae), 1, 1))
    adj_val_ae = np.tile(adj_matrix[np.newaxis, :, :], (len(X_val_ae), 1, 1))
    
    # Build model
    gnn_ae = build_gnn_autoencoder(
        n_sensors=12,
        time_steps=100,
        n_features=4,
        latent_dim=8
    )
    
    gnn_ae.compile(
        optimizer=keras.optimizers.Adam(1e-3),
        loss='mse'
    )
    
    print(f"Training GNN Autoencoder on {len(X_train_ae)} normal samples...")
    gnn_ae.fit(
        [X_train_ae, adj_train_ae], X_train_ae,
        validation_data=([X_val_ae, adj_val_ae], X_val_ae),
        epochs=30,
        batch_size=32,
        callbacks=[
            keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)
        ],
        verbose=1
    )

In [None]:
if HAS_TF:
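    # Compute anomaly scores on the full dataset.
    # Note: this includes the normal samples the autoencoder was trained on,
    # so the resulting AUC is an optimistic estimate.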
    adj_all = np.tile(adj_matrix[np.newaxis, :, :], (len(X_normalized), 1, 1))
    
    reconstructions = gnn_ae.predict([X_normalized, adj_all], verbose=0)
    
    # Reconstruction error per sample
    errors = np.mean((X_normalized - reconstructions) ** 2, axis=(1, 2, 3))
    
    # Separate by class
    error_by_class = {}
    for fault in np.unique(y_graph):
        mask = y_graph == fault
        error_by_class[fault] = errors[mask]
    
    # Visualize
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
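    # Box plot by class (tick labels are set separately because the `labels`
    # keyword of boxplot is deprecated in recent Matplotlib releases)
    class_names = np.unique(y_graph)
    data_to_plot = [error_by_class[f] for f in class_names]
    bp = axes[0].boxplot(data_to_plot, patch_artist=True)
    axes[0].set_xticklabels(class_names)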
    colors = plt.cm.Set3(np.linspace(0, 1, len(data_to_plot)))
    for patch, color in zip(bp['boxes'], colors):
        patch.set_facecolor(color)
    axes[0].set_ylabel('Reconstruction Error')
    axes[0].set_title('GNN Autoencoder: Error by Fault Type')
    axes[0].tick_params(axis='x', rotation=45)
    
    # ROC curve (normal vs all faults)
    labels_binary = (y_graph != 'normal').astype(int)
    
    from sklearn.metrics import roc_curve, auc
    fpr, tpr, _ = roc_curve(labels_binary, errors)
    roc_auc = auc(fpr, tpr)
    
    axes[1].plot(fpr, tpr, 'b-', linewidth=2, label=f'AUC = {roc_auc:.3f}')
    axes[1].plot([0, 1], [0, 1], 'k--', alpha=0.3)
    axes[1].set_xlabel('False Positive Rate')
    axes[1].set_ylabel('True Positive Rate')
    axes[1].set_title('GNN Autoencoder: ROC Curve')
    axes[1].legend()
    
    plt.tight_layout()
    plt.savefig(f'{MODEL_DIR}/gnn_anomaly_detection.png', dpi=150, bbox_inches='tight')
    plt.show()
    
    print(f"\nGNN Autoencoder AUC: {roc_auc:.4f}")
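
The ROC curve summarizes all possible thresholds, but a deployed detector needs one. A common choice, shown here as a sketch with assumptions, is to take a high quantile of the reconstruction errors on held-out normal data. The 0.99 quantile, the helper names, and the synthetic error distributions are all illustrative and not taken from the notebook's results.

```python
import numpy as np

def threshold_from_normal(errors_normal, quantile=0.99):
    """Pick the error level exceeded by roughly (1 - quantile) of normal samples."""
    return float(np.quantile(errors_normal, quantile))

def flag_anomalies(errors, threshold):
    """Boolean mask: True where the reconstruction error exceeds the threshold."""
    return np.asarray(errors) > threshold

# Synthetic stand-ins for reconstruction errors
rng = np.random.default_rng(0)
errors_normal = rng.gamma(shape=2.0, scale=0.05, size=1000)
errors_fault = rng.gamma(shape=2.0, scale=0.05, size=200) + 1.0

threshold = threshold_from_normal(errors_normal, quantile=0.99)
false_alarm_rate = flag_anomalies(errors_normal, threshold).mean()
detection_rate = flag_anomalies(errors_fault, threshold).mean()

print(f"threshold={threshold:.3f}")
print(f"false alarm rate={false_alarm_rate:.3f}, detection rate={detection_rate:.3f}")
```

The quantile sets the false alarm rate on normal data directly, so it can be chosen from the number of false alerts that maintenance staff can tolerate.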

## 5. Multi-Machine Factory Monitoring

Model relationships between multiple machines in a factory.

In [None]:
def create_factory_graph(n_machines=8):
    """
    Create a factory floor layout with machine dependencies.
    
    Layout:
    [M0] -> [M1] -> [M2] -> [M3]  (Production Line 1)
             |       |
    [M4] -> [M5] -> [M6] -> [M7]  (Production Line 2)
    
    Shared utilities, proximity effects, etc.
    """
    adj = np.zeros((n_machines, n_machines))
    
    # Line 1 flow
    adj[0, 1] = adj[1, 0] = 1
    adj[1, 2] = adj[2, 1] = 1
    adj[2, 3] = adj[3, 2] = 1
    
    # Line 2 flow
    adj[4, 5] = adj[5, 4] = 1
    adj[5, 6] = adj[6, 5] = 1
    adj[6, 7] = adj[7, 6] = 1
    
    # Cross-line dependencies (shared resources)
    adj[1, 5] = adj[5, 1] = 1  # Shared utility
    adj[2, 6] = adj[6, 2] = 1  # Proximity
    
    machine_names = [f'Machine_{i}' for i in range(n_machines)]
    
    return adj.astype(np.float32), machine_names

factory_adj, machine_names = create_factory_graph(8)

# Visualize
plt.figure(figsize=(8, 6))
sns.heatmap(factory_adj, annot=True, fmt='.0f', cmap='Greens',
            xticklabels=machine_names, yticklabels=machine_names)
plt.title('Factory Machine Network')
plt.tight_layout()
plt.savefig(f'{DATA_DIR}/factory_network.png', dpi=150, bbox_inches='tight')
plt.show()
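
The classifier used in this notebook stacks two graph layers, so each machine's representation can only depend on machines at most two hops away. The sketch below computes that reach for the factory layout; the edge list is copied from `create_factory_graph`, while the helper name `k_hop_reach` is an assumption.

```python
import numpy as np

def k_hop_reach(adj, k):
    """Boolean [n, n] matrix: True where node j is within k hops of node i."""
    n = adj.shape[0]
    step = ((adj > 0) | np.eye(n, dtype=bool)).astype(int)
    reach = np.eye(n, dtype=int)
    for _ in range(k):
        reach = ((reach @ step) > 0).astype(int)
    return reach.astype(bool)

# Same edges as the factory layout: two lines plus two cross-line links
edges = [(0, 1), (1, 2), (2, 3), (4, 5), (5, 6), (6, 7), (1, 5), (2, 6)]
adj = np.zeros((8, 8), dtype=np.float32)
for i, j in edges:
    adj[i, j] = adj[j, i] = 1

reach_2 = k_hop_reach(adj, 2)
for m in range(8):
    print(f"Machine_{m} sees: {np.flatnonzero(reach_2[m]).tolist()}")
```

If a failure pattern spans machines that are further apart than the number of stacked graph layers, only the final global pooling combines their information.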

In [None]:
def generate_factory_health_data(n_samples=500, n_machines=8, time_steps=50):
    """
    Generate factory-level health monitoring data.
    
    Features per machine: [health_index, production_rate, energy, alerts]
    """
    X = []
    y = []  # Overall factory status
    
    statuses = ['healthy', 'degrading', 'bottleneck', 'failure_propagating']
    
    for _ in range(n_samples):
        status = np.random.choice(statuses, p=[0.5, 0.2, 0.15, 0.15])
        
        machine_data = np.zeros((n_machines, time_steps, 4))
        
        for m in range(n_machines):
            t = np.linspace(0, 1, time_steps)
            
            # Base healthy operation
            health = 0.9 + np.random.normal(0, 0.02, time_steps)
            production = 0.85 + np.random.normal(0, 0.05, time_steps)
            energy = 0.7 + np.random.normal(0, 0.03, time_steps)
            alerts = np.random.poisson(0.1, time_steps).astype(float)
            
            if status == 'degrading':
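                # Each machine degrades independently with probability 0.3,
                # so a 'degrading' sample can occasionally contain no degraded machine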
                if np.random.random() < 0.3:
                    health -= 0.3 * t
                    energy += 0.2 * t
                    alerts += np.random.poisson(0.5, time_steps)
                    
            elif status == 'bottleneck':
                # Machine 2 or 6 causing bottleneck
                if m in [2, 6]:
                    production *= 0.5
                    health -= 0.1
                # Downstream affected
                if m in [3, 7]:
                    production *= 0.6  # Starved
                    
            elif status == 'failure_propagating':
                # Failure at M1 propagates to M2, M5
                if m == 1:
                    health = 0.2 + np.random.normal(0, 0.05, time_steps)
                    production *= 0.1
                    alerts = np.random.poisson(2, time_steps).astype(float)
                elif m in [2, 5]:  # Connected to M1
                    health -= 0.2
                    production *= 0.4
                    alerts += np.random.poisson(0.8, time_steps)
            
            machine_data[m, :, 0] = np.clip(health, 0, 1)
            machine_data[m, :, 1] = np.clip(production, 0, 1)
            machine_data[m, :, 2] = np.clip(energy, 0, 1)
            machine_data[m, :, 3] = alerts
        
        X.append(machine_data)
        y.append(status)
    
    return np.array(X), np.array(y)

# Generate factory data
print("Generating factory health data...")
X_factory, y_factory = generate_factory_health_data(n_samples=800)
print(f"Generated: X={X_factory.shape}")
classes_f, counts_f = np.unique(y_factory, return_counts=True)
print(f"Classes: {dict(zip(classes_f.tolist(), counts_f.tolist()))}")
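
A quick sanity check on the labels: in the generator above, each machine in a `degrading` sample degrades independently with probability 0.3, so some `degrading` samples contain no degrading machine at all and look like `healthy` ones. The sketch below estimates how often that happens, assuming 8 machines (the value passed as `n_sensors` when the factory model is built). `prob_no_degradation` and `simulate_no_degradation` are hypothetical helpers, not part of the notebook.

```python
import numpy as np

def prob_no_degradation(n_machines: int = 8, p_degrade: float = 0.3) -> float:
    """Probability that none of n_machines independent draws triggers degradation."""
    return (1.0 - p_degrade) ** n_machines

def simulate_no_degradation(n_trials=20000, n_machines=8, p_degrade=0.3, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = np.random.default_rng(seed)
    # True where a machine would enter the degradation branch
    draws = rng.random((n_trials, n_machines)) < p_degrade
    # Fraction of trials in which no machine degraded
    return float((~draws.any(axis=1)).mean())

analytic = prob_no_degradation()
empirical = simulate_no_degradation()
print(f"analytic={analytic:.4f}, empirical={empirical:.4f}")
```

Under these assumptions just under 6% of `degrading` samples carry no degradation signal, which limits how cleanly any classifier can separate that class from `healthy`.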

In [None]:
if HAS_TF:
    # Build and train factory-level GNN
    le_factory = LabelEncoder()
    y_factory_enc = le_factory.fit_transform(y_factory)
    
    # Split
    X_train_f, X_test_f, y_train_f, y_test_f = train_test_split(
        X_factory, y_factory_enc, test_size=0.2, random_state=42, stratify=y_factory_enc
    )
    
    adj_train_f = np.tile(factory_adj[np.newaxis, :, :], (len(X_train_f), 1, 1))
    adj_test_f = np.tile(factory_adj[np.newaxis, :, :], (len(X_test_f), 1, 1))
    
    # Build model
    factory_gnn = build_gnn_classifier(
        n_sensors=X_train_f.shape[1],   # machines
        time_steps=X_train_f.shape[2],
        n_features=X_train_f.shape[3],
        n_classes=len(le_factory.classes_),
        gcn_units=[32, 16],
        use_attention=True
    )
    
    factory_gnn.compile(
        optimizer=keras.optimizers.Adam(1e-3),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )
    
    print("Training Factory GNN...")
    factory_gnn.fit(
        [X_train_f, adj_train_f], y_train_f,
        validation_split=0.15,
        epochs=40,
        batch_size=32,
        callbacks=[
            keras.callbacks.EarlyStopping(patience=8, restore_best_weights=True)
        ],
        verbose=1
    )
    
    # Evaluate
    y_pred_f = factory_gnn.predict([X_test_f, adj_test_f]).argmax(axis=1)
    print("\nFactory GNN Results:")
    print(classification_report(y_test_f, y_pred_f, target_names=le_factory.classes_))
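
The classification report gives per-class precision and recall, but not which classes get mistaken for each other. A minimal sketch of a row-normalized confusion matrix follows, using toy labels rather than the notebook's predictions. `confusion_counts` and `row_normalize` are hypothetical helpers written for illustration; scikit-learn's `confusion_matrix` (already imported above) with `normalize='true'` computes the same quantity.

```python
import numpy as np

def confusion_counts(y_true, y_pred, n_classes):
    """Count matrix: rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def row_normalize(cm):
    """Divide each row by its total, so the diagonal holds per-class recall."""
    totals = cm.sum(axis=1, keepdims=True)
    out = np.zeros(cm.shape, dtype=float)
    # Rows with no true samples stay at zero instead of producing NaN
    return np.divide(cm, totals, out=out, where=totals > 0)

# Toy labels for 4 classes (illustrative only)
y_true_toy = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred_toy = [0, 0, 1, 0, 2, 2, 3, 1]

cm_toy = confusion_counts(y_true_toy, y_pred_toy, n_classes=4)
cm_norm = row_normalize(cm_toy)
print(cm_norm)
```

In the notebook, the equivalent call would be `confusion_matrix(y_test_f, y_pred_f, normalize='true')`, whose result can be passed to `sns.heatmap` with `le_factory.classes_` as tick labels.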

## 6. Save Models

In [None]:
if HAS_TF:
    # Save models
    gnn_model.save(f'{MODEL_DIR}/gnn_sensor_classifier.keras')
    gnn_ae.save(f'{MODEL_DIR}/gnn_autoencoder.keras')
    factory_gnn.save(f'{MODEL_DIR}/gnn_factory_monitor.keras')
    
    # Save adjacency matrices
    np.save(f'{MODEL_DIR}/sensor_adjacency.npy', adj_matrix)
    np.save(f'{MODEL_DIR}/factory_adjacency.npy', factory_adj)
    
    # Save metadata
    metadata = {
        'model_type': 'Graph Neural Network',
        'models': {
            'sensor_classifier': {
                'file': 'gnn_sensor_classifier.keras',
                'adjacency': 'sensor_adjacency.npy',
                'n_sensors': 12,
                'sensor_names': sensor_names,
                'classes': le.classes_.tolist()
            },
            'autoencoder': {
                'file': 'gnn_autoencoder.keras',
                'auc': float(roc_auc)
            },
            'factory_monitor': {
                'file': 'gnn_factory_monitor.keras',
                'adjacency': 'factory_adjacency.npy',
                'n_machines': 8,
                'classes': le_factory.classes_.tolist()
            }
        },
        'advantages': [
            'Models sensor/machine relationships explicitly',
            'Distinguishes sensor faults from system faults',
            'Scalable to varying network sizes',
            'Attention weights provide explainability'
        ]
    }
    
    with open(f'{MODEL_DIR}/gnn_metadata.json', 'w') as f:
        json.dump(metadata, f, indent=2)
    
    print(f"\nModels saved to {MODEL_DIR}/")
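
On the consuming side, the saved metadata is what maps a model's output index back to a class name. The sketch below shows that round trip with a toy metadata file written to a temporary directory; `load_class_names` and `decode_predictions` are hypothetical helpers, and the class order assumes `LabelEncoder`'s alphabetical sorting. Loading the `.keras` files themselves is left out, since the custom GNN layers may need to be supplied when the models are reloaded.

```python
import json
import os
import tempfile

import numpy as np

def load_class_names(metadata_path, model_key):
    """Read the class list stored for one model in the metadata JSON."""
    with open(metadata_path) as f:
        metadata = json.load(f)
    return metadata["models"][model_key]["classes"]

def decode_predictions(probs, class_names):
    """Map each row of class probabilities to the name of its argmax class."""
    probs = np.asarray(probs)
    return [class_names[i] for i in probs.argmax(axis=1)]

# Toy round trip with the same layout as gnn_metadata.json
toy_metadata = {
    "models": {
        "factory_monitor": {
            "classes": ["bottleneck", "degrading", "failure_propagating", "healthy"]
        }
    }
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "gnn_metadata.json")
    with open(path, "w") as f:
        json.dump(toy_metadata, f)
    names = load_class_names(path, "factory_monitor")

labels = decode_predictions([[0.1, 0.2, 0.1, 0.6], [0.7, 0.1, 0.1, 0.1]], names)
print(labels)
```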

## 7. Node-RED Integration

In [None]:
node_red_code = '''
// Node-RED Function: GNN Sensor Network Monitoring
// Collects data from multiple sensors and predicts system health

const N_SENSORS = 12;
const TIME_WINDOW = 100;

// Sensor network topology (adjacency matrix)
// This should match the trained model's expected structure
const ADJACENCY = flow.get("sensor_adjacency") || [
    // Define your sensor connections here
    // 1 = connected, 0 = not connected
];

// Initialize buffers for each sensor
if (!context.sensorBuffers) {
    context.sensorBuffers = {};
    for (let i = 0; i < N_SENSORS; i++) {
        context.sensorBuffers[i] = [];
    }
}

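// Add reading to appropriate sensor buffer
// Drop messages whose sensorId is missing or out of range
const sensorId = parseInt(msg.payload.sensorId, 10);
if (!(sensorId >= 0 && sensorId < N_SENSORS)) {
    node.warn("Ignoring reading with invalid sensorId: " + msg.payload.sensorId);
    return null;
}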
const reading = [
    msg.payload.value,
    msg.payload.trend || 0,
    msg.payload.highFreq || 0,
    msg.payload.level || 0.5
];

context.sensorBuffers[sensorId].push(reading);

// Keep only last TIME_WINDOW readings
if (context.sensorBuffers[sensorId].length > TIME_WINDOW) {
    context.sensorBuffers[sensorId].shift();
}

// Check if all sensors have enough data
let ready = true;
for (let i = 0; i < N_SENSORS; i++) {
    if (context.sensorBuffers[i].length < TIME_WINDOW) {
        ready = false;
        break;
    }
}

if (!ready) {
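    // Report buffer fill levels instead of echoing the raw buffers
    msg.payload = {
        status: "collecting",
        samplesPerSensor: Object.values(context.sensorBuffers).map(b => b.length)
    };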
    return msg;
}

// Prepare data for GNN model
// Shape: [1, n_sensors, time_steps, features]
let nodeFeatures = [];
for (let i = 0; i < N_SENSORS; i++) {
    nodeFeatures.push(context.sensorBuffers[i]);
}

msg.payload = {
    nodeFeatures: [nodeFeatures],
    adjacency: [ADJACENCY],
    model: "gnn_sensor_classifier"
};

return msg;
'''

print("Node-RED Integration Code:")
print("=" * 50)
print(node_red_code)
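
The Node-RED function only assembles the payload; whatever service receives it still has to turn the nested lists into arrays of the shape the model expects. A minimal sketch of that step is shown below, assuming the constants used in the function above (12 sensors, a 100-step window, 4 values per reading). `parse_gnn_payload` is a hypothetical helper, and how the payload reaches Python (HTTP, MQTT, or otherwise) is not specified by the notebook.

```python
import numpy as np

N_SENSORS, TIME_WINDOW, N_FEATURES = 12, 100, 4

def parse_gnn_payload(payload):
    """Convert a Node-RED payload into (features, adjacency) arrays, checking shapes."""
    x = np.asarray(payload["nodeFeatures"], dtype=np.float32)
    adj = np.asarray(payload["adjacency"], dtype=np.float32)

    expected_x = (1, N_SENSORS, TIME_WINDOW, N_FEATURES)
    expected_adj = (1, N_SENSORS, N_SENSORS)
    if x.shape != expected_x:
        raise ValueError(f"nodeFeatures has shape {x.shape}, expected {expected_x}")
    if adj.shape != expected_adj:
        raise ValueError(f"adjacency has shape {adj.shape}, expected {expected_adj}")
    return x, adj

# A well-formed payload: zero readings and an identity adjacency (illustrative only)
good_payload = {
    "nodeFeatures": np.zeros((1, N_SENSORS, TIME_WINDOW, N_FEATURES)).tolist(),
    "adjacency": np.eye(N_SENSORS)[np.newaxis].tolist(),
}
x, adj = parse_gnn_payload(good_payload)
print(x.shape, adj.shape)
```

The shape check also catches the case where `sensor_adjacency` was never set in the flow context and the empty placeholder array is sent instead.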

## Summary

This notebook demonstrated **Graph Neural Networks** for Predictive Maintenance:

### Key Concepts:

| Component | Purpose |
|-----------|--------|
| **Graph Convolution (GCN)** | Aggregate neighbor features with mean pooling |
| **Graph Attention (GAT)** | Weighted aggregation using attention |
| **Spatio-Temporal GNN** | Combine graph + temporal patterns |
| **Graph Autoencoder** | Anomaly detection via reconstruction |
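
The difference between the first two rows can be sketched in a few lines of NumPy. This is a simplified illustration on a single time snapshot, with no learnable weight matrix and hand-set attention vectors; it is not the notebook's TensorFlow layers, and `mean_aggregate` and `attention_aggregate` are hypothetical names.

```python
import numpy as np

def mean_aggregate(H, A):
    """GCN-style mean pooling: average each node's features with its neighbors'."""
    A_hat = A + np.eye(A.shape[0])                      # add self-loops
    return (A_hat / A_hat.sum(axis=1, keepdims=True)) @ H

def attention_aggregate(H, A, a_src, a_dst):
    """GAT-style pooling: softmax-weighted average over each node's neighbors."""
    A_hat = A + np.eye(A.shape[0])                      # add self-loops
    e = (H @ a_src)[:, None] + (H @ a_dst)[None, :]     # raw score e[i, j]
    e = np.where(e > 0, e, 0.2 * e)                     # LeakyReLU
    e = np.where(A_hat > 0, e, -np.inf)                 # mask non-edges
    e = e - e.max(axis=1, keepdims=True)                # numerical stability
    w = np.exp(e)
    w = w / w.sum(axis=1, keepdims=True)                # softmax over neighbors
    return w @ H, w

# Toy graph: a 3-node chain 0 - 1 - 2 with 2 features per node
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

H_mean = mean_aggregate(H, A)
H_att, weights = attention_aggregate(H, A, a_src=np.zeros(2), a_dst=np.zeros(2))
print(H_mean)
print(weights)
```

With zero attention vectors every neighbor gets the same weight, so the attention output matches mean pooling; non-zero vectors shift weight toward some neighbors, and the `weights` matrix is the quantity inspected for explainability.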

### Use Cases:

1. **Sensor Networks** - Model physical connections between sensors
2. **Multi-Machine Monitoring** - Factory-level health prediction
3. **Fault Propagation** - Track how failures spread through the system
4. **Sensor vs System Faults** - Distinguish local vs global issues
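
For the fault propagation use case, the graph structure alone already tells which machines a failure can reach and in how many hops. The sketch below runs a breadth-first search over a directed adjacency matrix. The edge list is an assumption inferred from the comments in the data generator (M1 feeds M2 and M5, M2 feeds M3, M6 feeds M7), not the notebook's actual `factory_adj`, and `propagation_hops` is a hypothetical helper.

```python
from collections import deque

def propagation_hops(adj, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr, connected in enumerate(adj[node]):
            if connected and nbr not in hops:
                hops[nbr] = hops[node] + 1
                queue.append(nbr)
    return hops

# Assumed directed edges between 8 machines (illustrative only)
n_machines = 8
adj = [[0] * n_machines for _ in range(n_machines)]
for src, dst in [(1, 2), (1, 5), (2, 3), (6, 7)]:
    adj[src][dst] = 1

hops = propagation_hops(adj, source=1)
print(hops)
```

Machines missing from the result are unreachable from the failing machine, which is one way to separate a local fault from one that threatens the rest of the line.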

### When to Use GNNs:

- Multiple interconnected sensors/machines
- Known physical or logical relationships
- Need to explain which connections matter
- Varying number of nodes (scalable)