# 🏭 Smart Industrial Maintenance System: GPU Training Notebook

**FSE 570 Capstone** | Arizona State University

This notebook runs the complete training pipeline on Google Colab with GPU acceleration.

---

## 1. Setup Environment

In [1]:
# Check GPU availability
import torch
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    props = torch.cuda.get_device_properties(0)
    vram = getattr(props, 'total_memory', getattr(props, 'total_mem', 0))
    print(f"VRAM: {vram / 1e9:.1f} GB")

PyTorch: 2.10.0+cu128
CUDA available: True
GPU: NVIDIA L4
VRAM: 23.7 GB


In [2]:
# Install dependencies
!pip install -q xgboost lifelines shap pulp



In [3]:
# Clone your project repo
!git clone https://github.com/SivaKanth007/Capstone-Project.git
%cd Capstone-Project

Cloning into 'Capstone-Project'...
remote: Enumerating objects: 55, done.
remote: Counting objects: 100% (55/55), done.
remote: Compressing objects: 100% (37/37), done.
remote: Total 55 (delta 9), reused 54 (delta 8), pack-reused 0 (from 0)
Receiving objects: 100% (55/55), 826.59 KiB | 907.00 KiB/s, done.
Resolving deltas: 100% (9/9), done.
/content/Capstone-Project


## 2. Download & Preprocess Data

The download module tries a direct download first, so **no Kaggle credentials are needed**.
If the direct download fails, it falls back to a synthetic data generator.
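
The try-then-fall-back behaviour described above can be sketched as follows. This is a minimal illustration, not the code in `src/data/download.py`: the helper `fetch_with_fallback`, the stand-in callables, and the file path are all hypothetical, and the real module may catch different exceptions.

```python
from typing import Callable, Tuple


def fetch_with_fallback(
    direct_download: Callable[[], str],
    generate_synthetic: Callable[[], str],
) -> Tuple[str, str]:
    """Return (source, path): try the direct download, else fall back."""
    try:
        return "direct", direct_download()
    except OSError as exc:  # urllib.error.URLError is a subclass of OSError
        print(f"[DOWNLOAD] Direct download failed ({exc}); using synthetic data")
        return "synthetic", generate_synthetic()


# Stand-in callables for illustration only (hypothetical path).
def failing_download() -> str:
    raise OSError("network unreachable")


def synthetic() -> str:
    return "data/synthetic/train.csv"


source, path = fetch_with_fallback(failing_download, synthetic)
print(source, path)
```

Returning the source label alongside the path makes it easy to log which branch produced the training data, which matters when comparing results across runs.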

In [4]:
import os
import numpy as np
import config
from src.data.download import download_cmapss, load_cmapss_train
from src.data.preprocess import DataPreprocessor
from src.data.feature_engineering import FeatureEngineer
from src.data.synthetic_generator import SyntheticDataGenerator

# Download C-MAPSS dataset (direct URL, no auth)
download_cmapss()
df_train = load_cmapss_train()
print(f"\nTraining data: {df_train.shape}")
df_train.head()

[CONFIG] Using device: cuda
[CONFIG] GPU: NVIDIA L4
[CONFIG] VRAM: 23.7 GB
[DOWNLOAD] Attempting direct download (no authentication required)...
[DOWNLOAD] Downloading from: https://phm-datasets.s3.amazonaws.com/NASA/6.+Turbofan+Engine+Degradation+Simula...
[DOWNLOAD] Progress: 100% (12.4 MB)
[DOWNLOAD] Extracting...
[DOWNLOAD] Extracting nested zip: CMAPSSData.zip
[DOWNLOAD] Direct download successful!
[DOWNLOAD] Available files:
  - Damage Propagation Modeling.pdf (0.43 MB)
  - RUL_FD001.txt (0.00 MB)
  - RUL_FD002.txt (0.00 MB)
  - RUL_FD003.txt (0.00 MB)
  - RUL_FD004.txt (0.00 MB)
  - readme.txt (0.00 MB)
  - test_FD001.txt (2.23 MB)
  - test_FD002.txt (5.73 MB)
  - test_FD003.txt (2.83 MB)
  - test_FD004.txt (6.96 MB)
  - train_FD001.txt (3.52 MB)
  - train_FD002.txt (9.08 MB)
  - train_FD003.txt (4.21 MB)
  - train_FD004.txt (10.35 MB)
[DOWNLOAD] Loading training data from: /content/Capstone-Project/data/raw/train_FD001.txt
[DOWNLOAD] Loaded 20631 rows, 100 units
[DOWNLOAD] Cycl

| | unit_id | cycle | op_setting_1 | op_setting_2 | op_setting_3 | sensor_1 | sensor_2 | sensor_3 | sensor_4 | sensor_5 | ... | sensor_13 | sensor_14 | sensor_15 | sensor_16 | sensor_17 | sensor_18 | sensor_19 | sensor_20 | sensor_21 | RUL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | -0.0007 | -0.0004 | 100.0 | 518.67 | 641.82 | 1589.7 | 1400.6 | 14.62 | ... | 2388.02 | 8138.62 | 8.4195 | 0.03 | 392 | 2388 | 100.0 | 39.06 | 23.419 | 125 |
| 1 | 1 | 2 | 0.0019 | -0.0003 | 100.0 | 518.67 | 642.15 | 1591.82 | 1403.14 | 14.62 | ... | 2388.07 | 8131.49 | 8.4318 | 0.03 | 392 | 2388 | 100.0 | 39.0 | 23.4236 | 125 |
| 2 | 1 | 3 | -0.0043 | 0.0003 | 100.0 | 518.67 | 642.35 | 1587.99 | 1404.2 | 14.62 | ... | 2388.03 | 8133.23 | 8.4178 | 0.03 | 390 | 2388 | 100.0 | 38.95 | 23.3442 | 125 |
| 3 | 1 | 4 | 0.0007 | 0.0 | 100.0 | 518.67 | 642.35 | 1582.79 | 1401.87 | 14.62 | ... | 2388.08 | 8133.83 | 8.3682 | 0.03 | 392 | 2388 | 100.0 | 38.88 | 23.3739 | 125 |
| 4 | 1 | 5 | -0.0019 | -0.0002 | 100.0 | 518.67 | 642.37 | 1582.85 | 1406.22 | 14.62 | ... | 2388.04 | 8133.8 | 8.4294 | 0.03 | 393 | 2388 | 100.0 | 38.9 | 23.4044 | 125 |


In [5]:
# Generate synthetic data
gen = SyntheticDataGenerator()
logs, context, schedule = gen.generate_all(df_train)
print(f"Maintenance logs: {logs.shape}")
print(f"Operational context: {context.shape}")

Generating Synthetic Industrial Data
[SYNTHETIC] Generated 444 maintenance logs for 100 units
  Planned: 292 | Unplanned: 152
  Total cost: $4,030,800
[SYNTHETIC] Generated operational context for 100 units

[SYNTHETIC] All data saved to /content/Capstone-Project/data/synthetic
Maintenance logs: (444, 8)
Operational context: (100, 9)


In [6]:
# Feature engineering (for XGBoost)
fe = FeatureEngineer()
df_engineered = fe.engineer_features(df_train.copy())
print(f"Engineered features: {df_engineered.shape}")

Running Feature Engineering Pipeline
[FEATURES] Added cycle_norm and cycle_squared features



[FEATURES] Added 168 rolling features (3 windows × 4 stats × 14 sensors)




[FEATURES] Added 14 trend features



[FEATURES] Clustered 2 settings into 3 regimes:
  Regime 0: 5208 observations (25.2%)
  Regime 1: 5800 observations (28.1%)
  Regime 2: 9623 observations (46.6%)
[FEATURES] Added 42 lag features (3 lags × 14 sensors)
[FEATURES] Added 20 interaction features from top-5 sensors

[FEATURES] Total features: 27 → 274 (+247 engineered)
Engineered features: (20631, 274)


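
The feature counts in the `[FEATURES]` log can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers printed in the log. Two points are assumptions rather than facts from the log: that regime clustering contributes exactly one column, and that the 20 interaction features come from the 10 pairs of the top-5 sensors with two features per pair.

```python
from math import comb

n_sensors = 14                     # sensors used for rolling, trend, and lag features
cycle = 2                          # cycle_norm and cycle_squared
rolling = 3 * 4 * n_sensors        # 3 windows x 4 stats x 14 sensors
trend = n_sensors                  # one trend feature per sensor
regime = 1                         # assumption: a single operating-regime column
lag = 3 * n_sensors                # 3 lags x 14 sensors
interaction = comb(5, 2) * 2       # assumption: 10 sensor pairs, 2 features each

engineered = cycle + rolling + trend + regime + lag + interaction
total = 27 + engineered
print(engineered, total)           # 247 274
```

Under these assumptions the parts sum to the logged `+247 engineered` and the final width of 274 columns.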


In [7]:
# Preprocessing pipeline (for LSTM models)
preprocessor = DataPreprocessor()
data = preprocessor.fit_transform(df_train)
preprocessor.save()

for split_name, split_data in data.items():
    np.savez_compressed(
        os.path.join(config.PROCESSED_DATA_DIR, f"{split_name}_data.npz"),
        **split_data
    )

X_train = data['train']['X']
y_train_rul = data['train']['y_rul']
y_train_binary = data['train']['y_binary']
X_val = data['val']['X']
y_val_binary = data['val']['y_binary']

n_features = X_train.shape[2]
print(f"Sequences: {X_train.shape}, Features: {n_features}")

Running Full Preprocessing Pipeline
[PREPROCESS] Dropped 9 constant columns: ['sensor_1', 'sensor_5', 'sensor_6', 'sensor_10', 'sensor_16', 'sensor_18', 'sensor_19', 'op_setting_3', 'op_setting_2']
[PREPROCESS] No missing values detected
[PREPROCESS] Split: train=70 units (14316 rows), val=15 units (3170 rows), test=15 units (3145 rows)
[PREPROCESS] Fitted scaler on 15 features
[PREPROCESS] Created 12286 sequences of shape (30, 15)
[PREPROCESS] train: X=(12286, 30, 15), y_rul range=[0, 125], failure_rate=17.66%
[PREPROCESS] Created 2735 sequences of shape (30, 15)
[PREPROCESS] val: X=(2735, 30, 15), y_rul range=[0, 125], failure_rate=17.00%
[PREPROCESS] Created 2710 sequences of shape (30, 15)
[PREPROCESS] test: X=(2710, 30, 15), y_rul range=[0, 125], failure_rate=17.16%
[PREPROCESS] Saved preprocessor to /content/Capstone-Project/models/saved/preprocessor.pkl
Sequences: (12286, 30, 15), Features: 15
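
The sequence counts in the log are consistent with a stride-1 sliding window of length 30 applied separately to each unit. The sketch below is an illustration, not the `DataPreprocessor` implementation: `sliding_windows` and `expected_sequences` are hypothetical helpers, and the formula assumes every unit has at least 30 cycles and that windows never cross unit boundaries.

```python
from typing import List, Sequence


def sliding_windows(rows: Sequence, window: int) -> List[Sequence]:
    """Return every contiguous run of `window` rows (stride 1)."""
    return [rows[i:i + window] for i in range(len(rows) - window + 1)]


def expected_sequences(total_rows: int, n_units: int, window: int) -> int:
    """Each unit with n >= window rows yields n - window + 1 windows."""
    return total_rows - n_units * (window - 1)


# A single 35-cycle unit gives 35 - 30 + 1 = 6 windows.
assert len(sliding_windows(list(range(35)), 30)) == 6

# Row and unit counts taken from the [PREPROCESS] split line above.
for split, rows, units in [("train", 14316, 70), ("val", 3170, 15), ("test", 3145, 15)]:
    print(split, expected_sequences(rows, units, 30))
```

The three results (12286, 2735, 2710) match the logged sequence counts, which supports the stride-1, per-unit assumption.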


## 3. Train LSTM Autoencoder (Anomaly Detection)

In [8]:
from src.models.autoencoder import LSTMAutoencoder, AutoencoderTrainer

# Train on healthy data only
healthy_mask = y_train_rul > config.MAX_RUL * 0.5
X_healthy = X_train[healthy_mask]
X_val_ae = X_val[data['val']['y_rul'] > config.MAX_RUL * 0.5]

print(f"Training autoencoder on {len(X_healthy)} healthy samples")
print(f"Device: {config.DEVICE}")

autoencoder = LSTMAutoencoder(input_dim=n_features)
ae_trainer = AutoencoderTrainer(autoencoder, epochs=50)
ae_trainer.train(X_healthy, X_val_ae)
ae_trainer.save_model()

Training autoencoder on 7876 healthy samples
Device: cuda


TypeError: ReduceLROnPlateau.__init__() got an unexpected keyword argument 'verbose'

In [None]:
# Visualize training loss
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(ae_trainer.train_history, label='Train Loss', color='#3a7bd5')
if ae_trainer.val_history:
    ax.plot(ae_trainer.val_history, label='Val Loss', color='#FF6B6B')
ax.set_xlabel('Epoch')
ax.set_ylabel('MSE Loss')
ax.set_title('Autoencoder Training')
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

## 4. Train LSTM Failure Predictor

In [None]:
from src.models.lstm_predictor import LSTMPredictor, PredictorTrainer

predictor = LSTMPredictor(input_dim=n_features)
pred_trainer = PredictorTrainer(predictor, epochs=50)
pred_trainer.train(X_train, y_train_binary, X_val, y_val_binary)
pred_trainer.save_model()

In [None]:
# Visualize predictor training
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 4))

ax1.plot(pred_trainer.train_history, label='Train Loss', color='#3a7bd5')
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Loss')
ax1.set_title('Predictor Training Loss')
ax1.legend()
ax1.grid(True, alpha=0.3)

if pred_trainer.val_history:
    # Assumes validation metrics are recorded every 5 epochs
    epochs_list = range(5, len(pred_trainer.val_history) * 5 + 1, 5)
    f1s = [m.get('f1', 0) for m in pred_trainer.val_history]
    aucs = [m.get('auc', 0) for m in pred_trainer.val_history]
    ax2.plot(epochs_list, f1s, label='F1 Score', color='#44BB44', marker='o', markersize=4)
    ax2.plot(epochs_list, aucs, label='AUC', color='#FF6B6B', marker='s', markersize=4)
    ax2.set_xlabel('Epoch')
    ax2.set_ylabel('Score')
    ax2.set_title('Validation Metrics')
    ax2.legend()
    ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 5. Train XGBoost RUL Model

In [None]:
from src.models.xgboost_rul import XGBoostRUL

exclude_cols = ['unit_id', 'cycle', 'RUL']
feature_cols = [c for c in df_engineered.columns if c not in exclude_cols]

unit_ids = df_engineered['unit_id'].unique()
np.random.seed(config.RANDOM_SEED)
np.random.shuffle(unit_ids)
n = len(unit_ids)
train_units = unit_ids[:int(n * 0.7)]
val_units = unit_ids[int(n * 0.7):int(n * 0.85)]

X_train_xgb = df_engineered[df_engineered['unit_id'].isin(train_units)][feature_cols]
y_train_xgb = df_engineered[df_engineered['unit_id'].isin(train_units)]['RUL'].values
X_val_xgb = df_engineered[df_engineered['unit_id'].isin(val_units)][feature_cols]
y_val_xgb = df_engineered[df_engineered['unit_id'].isin(val_units)]['RUL'].values

xgb_model = XGBoostRUL()
xgb_model.train(X_train_xgb, y_train_xgb, X_val_xgb, y_val_xgb,
                feature_names=feature_cols)
xgb_model.evaluate(X_val_xgb, y_val_xgb)
xgb_model.save()
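
The cell above splits by engine unit rather than by row, so every cycle of a given engine lands in exactly one partition and correlated rows cannot leak between training and validation. A dependency-free sketch of the same idea follows. `split_units` is a hypothetical helper, the seed value 42 is a placeholder for `config.RANDOM_SEED`, and Python's `random` shuffles differently from NumPy, so the resulting assignment will not match the notebook's.

```python
import random
from typing import List, Sequence, Tuple


def split_units(
    unit_ids: Sequence[int], seed: int
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle unit IDs once, then cut at 70% and 85% as in the cell above."""
    ids = list(unit_ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    train_end = int(n * 0.7)
    val_end = int(n * 0.85)
    return ids[:train_end], ids[train_end:val_end], ids[val_end:]


train_units, val_units, test_units = split_units(range(1, 101), seed=42)
print(len(train_units), len(val_units), len(test_units))

# No unit appears in more than one partition.
assert not set(train_units) & set(val_units)
assert not set(train_units) & set(test_units)
assert not set(val_units) & set(test_units)
```

Because this cell shuffles on its own, its unit assignment may differ from the split made inside `DataPreprocessor`. Whether the two agree depends on that class's implementation, which is not shown here.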

## 6. Bayesian Survival Analysis

In [None]:
from src.models.bayesian_survival import BayesianSurvival

survival_features = config.ACTIVE_SENSORS + ['cycle']
survival_cols = [c for c in survival_features if c in df_train.columns] + ['RUL']

df_survival_train = df_train[df_train['unit_id'].isin(train_units)][['unit_id'] + survival_cols]

survival_model = BayesianSurvival()
survival_model.fit(df_survival_train)

df_survival_val = df_train[df_train['unit_id'].isin(val_units)][['unit_id'] + survival_cols]
survival_model.evaluate(df_survival_val)
survival_model.save()

## 7. Explainability (SHAP & Attention)

In [None]:
from src.explainability.shap_analysis import SHAPExplainer

# SHAP for XGBoost
shap_explainer = SHAPExplainer(xgb_model, model_type='xgboost')
shap_explainer.compute_shap_values(X_val_xgb)
shap_explainer.plot_global_importance(save_path='shap_importance.png')
shap_explainer.plot_beeswarm(save_path='shap_beeswarm.png')
ranking = shap_explainer.get_sensor_ranking()
ranking.head(15)

In [None]:
from src.explainability.attention_viz import AttentionVisualizer

# Attention visualization
attn_viz = AttentionVisualizer(predictor)
attn_viz.plot_attention_heatmap(data['test']['X'], save_path='attention_heatmap.png')
attn_viz.plot_average_attention(data['test']['X'], data['test']['y_binary'],
                                save_path='attention_comparison.png')

## 8. MILP Optimization & Simulation

In [None]:
from src.optimization.milp_scheduler import MaintenanceScheduler

# Get predictions for test data
failure_proba, _ = predictor.predict_proba(torch.FloatTensor(data['test']['X']).to(config.DEVICE))

# Aggregate per unit
unit_risks = {}
for uid in np.unique(data['test']['unit_ids']):
    mask = data['test']['unit_ids'] == uid
    unit_risks[int(uid)] = float(failure_proba[mask][-1])

# Run MILP optimization
scheduler = MaintenanceScheduler()
result = scheduler.create_schedule(
    machine_risks=unit_risks,
    machine_names={uid: f'Engine-{uid:03d}' for uid in unit_risks}
)
result['schedule']
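
The per-unit aggregation above keeps the prediction from the last window of each engine, on the assumption that windows are stored in chronological order within a unit. A small sketch with made-up numbers shows the effect; `last_risk_per_unit` is a hypothetical helper, not part of the project.

```python
from typing import Dict, Sequence


def last_risk_per_unit(unit_ids: Sequence[int], probs: Sequence[float]) -> Dict[int, float]:
    """Keep the probability of the final window seen for each unit."""
    risks: Dict[int, float] = {}
    for uid, p in zip(unit_ids, probs):
        risks[int(uid)] = float(p)  # later windows overwrite earlier ones
    return risks


# Illustrative values only: three windows for unit 1, two for unit 2.
unit_ids = [1, 1, 1, 2, 2]
probs = [0.05, 0.20, 0.65, 0.10, 0.30]
print(last_risk_per_unit(unit_ids, probs))  # {1: 0.65, 2: 0.3}
```

If the windows were shuffled before prediction, the last element would no longer be the most recent one, so the ordering assumption is worth checking before the risks are passed to the scheduler.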

In [None]:
# Monte Carlo simulation
from src.evaluation.simulation import MaintenanceSimulator

sim = MaintenanceSimulator(n_machines=50, n_periods=100)
sim_df, sim_summary = sim.run_comparison(n_simulations=100)
sim.plot_comparison(sim_df, save_path='simulation_comparison.png')

## 9. Download Results

Download the trained models and results to your local machine.

In [None]:
# Save all results to a zip for download
import shutil
shutil.make_archive('capstone_results', 'zip', '.', 'models/saved')

# In Colab, download the zip:
try:
    from google.colab import files
    files.download('capstone_results.zip')
except ImportError:
    print('Not in Colab. Files saved locally.')
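
In `shutil.make_archive`, the third argument (`root_dir`) is the directory that archived paths are relative to, and the fourth (`base_dir`) is the subtree that is actually archived. The sketch below shows this in a temporary directory; the file `example.pkl` is a hypothetical placeholder, not a file the pipeline is known to produce.

```python
import os
import shutil
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical stand-in for a saved model file.
    os.makedirs(os.path.join(root, "models", "saved"))
    with open(os.path.join(root, "models", "saved", "example.pkl"), "wb") as fh:
        fh.write(b"placeholder")

    # Archive only models/saved, with paths stored relative to `root`.
    archive = shutil.make_archive(
        os.path.join(root, "capstone_results"), "zip",
        root_dir=root, base_dir=os.path.join("models", "saved"),
    )
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()

print(names)
```

Listing the archive contents before downloading is a quick way to confirm that the expected model files were saved, which is useful when an earlier training cell failed.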

---
**Project**: FSE 570 Data Science Capstone | Arizona State University