# Notebook 1 – Turbofan Engine RUL Prediction (C-MAPSS)

This notebook begins the end-to-end workflow for predicting Remaining Useful Life (RUL) of turbofan engines using the NASA C-MAPSS dataset.

### 1.1 Objective, Contribution & Source Citation

---

## **Objective**

The goal of this project is to build a data-driven model that predicts the **Remaining Useful Life (RUL)** of turbofan engines using the NASA C-MAPSS simulation data. Each engine starts in normal operation with some unknown initial wear and develops a fault under varying operational conditions and realistic sensor noise. In the training set every engine runs until failure; in the test set each time series stops some time before failure. The objective is to learn a mapping from multivariate sensor measurements and operational settings to a continuous RUL value for each engine cycle. Accurate RUL estimation enables condition-based maintenance, reduces unexpected failures, improves fleet reliability, and lowers operational cost in real aerospace systems.
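Because each training engine runs to failure, a per-cycle RUL label can be derived from the cycle counter alone. The sketch below shows one common way to do this with pandas; the column names `unit` and `cycle` and the helper `add_rul_labels` are assumptions for illustration, not names defined by the dataset.

```python
import pandas as pd


def add_rul_labels(df: pd.DataFrame) -> pd.DataFrame:
    """Append a 'rul' column: cycles remaining until each unit's last recorded cycle."""
    out = df.copy()
    # For run-to-failure data, the last recorded cycle of a unit is its failure point.
    last_cycle = out.groupby("unit")["cycle"].transform("max")
    out["rul"] = last_cycle - out["cycle"]
    return out


# Toy run-to-failure frame (made-up values): unit 1 fails after 3 cycles, unit 2 after 2.
toy = pd.DataFrame({"unit": [1, 1, 1, 2, 2], "cycle": [1, 2, 3, 1, 2]})
labeled = add_rul_labels(toy)
print(labeled)
```

This labeling only applies to the training files. For the test files, the true RUL at the final recorded cycle comes from the separate RUL file supplied with each subset.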

---

## **Why ML Is Necessary (What Physics-Based Models Cannot Capture)**

Physics-based degradation models for gas turbine engines require detailed knowledge of compressor efficiency curves, turbine flow capacities, stall margins, component health parameters, and thermodynamic states. The NASA *Damage Propagation Modeling* document explicitly states that **fault progression is nonlinear, sensor-dependent, and influenced by operating conditions**. Modeling this directly would require component-level degradation equations for:

- High-Pressure Compressor (HPC) efficiency loss  
- Fan flow deterioration  
- Turbine efficiency drift  
- Variable bleed valve interactions  
- Unobservable internal wear mechanisms  

These relationships vary per engine due to manufacturing differences, usage patterns, and environmental exposure.

Furthermore, C-MAPSS introduces:

- **unobserved initial wear**,  
- **stochastic fault propagation rates**,  
- **sensor contamination and noise**,  
- **multiple operating regimes**,  
- **nonlinear sensor interactions**.

Because degradation is intentionally simulated with randomized rates and multi-regime operating conditions, building an analytical or closed-form physics model becomes impractical. Simple linear and statistical models also tend to underfit, because a single fixed functional form struggles to generalize across the 100–260 engines in each of the FD001–FD004 subsets, every one with its own starting condition and degradation trajectory (as documented in the dataset *readme*).

Machine learning, however, can:

- learn nonlinear degradation behavior directly from data,  
- fuse all **21 sensor channels and 3 operational settings** simultaneously,  
- capture multi-regime patterns,  
- infer hidden health states that are not physically measurable.

This makes ML well suited to the RUL prediction problem.

---

## **Novelty of Our Work**

In typical course-level C-MAPSS demonstrations, the workflow is limited to raw sensor values, a single baseline model, and no robustness or deployment pipeline. We extend this significantly through four main innovations:

---

### **1. Feature Engineering Beyond Standard Approaches**

We engineer features that approximate hidden health indicators such as compressor/turbine efficiency degradation by using:

- cycle-normalized degradation indicators  
- moving-window statistics that mimic health index behavior  
- operating-condition–aware normalization  
- inter-sensor ratios used in PHM research (e.g., temperature–pressure ratios)  
- sensor deltas that capture trajectory curvature  

These features allow the model to approximate nonlinear damage propagation described in the NASA *Damage Propagation Modeling* document.
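As a rough illustration of the window statistics, sensor deltas, and operating-condition-aware normalization listed above, here is a minimal pandas sketch. The column names (`unit`, `cycle`, `setting1`, `s2`), the helper names, and the idea of defining a regime by rounding the operational settings are all assumptions made for this example, not the notebook's final feature set.

```python
import numpy as np
import pandas as pd


def add_trend_features(df, sensor_cols, window=5):
    """Per-unit rolling mean/std and first differences for each sensor column."""
    out = df.sort_values(["unit", "cycle"]).reset_index(drop=True)
    for col in sensor_cols:
        out[f"{col}_mean{window}"] = out.groupby("unit")[col].transform(
            lambda s: s.rolling(window, min_periods=1).mean()
        )
        # Rolling std is undefined for a single sample; treat it as zero spread.
        out[f"{col}_std{window}"] = out.groupby("unit")[col].transform(
            lambda s: s.rolling(window, min_periods=1).std()
        ).fillna(0.0)
        # First difference within a unit; the first cycle has no predecessor.
        out[f"{col}_delta"] = out.groupby("unit")[col].diff().fillna(0.0)
    return out


def normalize_by_regime(df, sensor_cols, setting_cols, decimals=1):
    """Z-score each sensor within its operating regime (rounded settings)."""
    out = df.copy()
    keys = [out[c].round(decimals) for c in setting_cols]
    for col in sensor_cols:
        grouped = out.groupby(keys)[col]
        mean = grouped.transform("mean")
        # Guard against zero or undefined spread inside a regime.
        std = grouped.transform("std").replace(0.0, np.nan)
        out[f"{col}_regnorm"] = ((out[col] - mean) / std).fillna(0.0)
    return out


# Made-up toy data: two units, one setting with two regimes, one sensor.
toy = pd.DataFrame({
    "unit": [1, 1, 1, 2, 2],
    "cycle": [1, 2, 3, 1, 2],
    "setting1": [0.0, 10.0, 0.0, 10.0, 0.0],
    "s2": [1.0, 2.0, 4.0, 10.0, 10.0],
})
feats = add_trend_features(toy, ["s2"], window=2)
normed = normalize_by_regime(toy, ["s2"], ["setting1"])
print(feats)
print(normed)
```

In this sketch the regime statistics are computed on whatever frame is passed in. For a real train/test workflow they would need to be estimated on the training data only and then reused on new data.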

---

### **2. Multiple Competing ML Models**

Instead of relying on one baseline model, we compare several models:

- Tuned **MLPRegressor**  
- **Random Forest Regressor**  
- **Gradient Boosting / XGBoost** (if allowed)  
- **KNN Regressor**  
- Linear regression models as baselines  

All models will be tuned using **GridSearchCV or RandomizedSearchCV**, not default settings.
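A minimal sketch of such a search with scikit-learn follows. The data is synthetic and the parameter grid is deliberately tiny. Splitting folds by engine unit with `GroupKFold` is an assumption added here (it keeps cycles from one engine out of both the training and validation folds), not something the text above prescribes.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, GroupKFold

# Synthetic stand-in for engineered features: 6 hypothetical units, 20 cycles each.
rng = np.random.default_rng(0)
n_units, cycles_per_unit = 6, 20
groups = np.repeat(np.arange(n_units), cycles_per_unit)
X = rng.normal(size=(n_units * cycles_per_unit, 4))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=len(X))  # synthetic target

search = GridSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [20, 50], "max_depth": [3, None]},
    scoring="neg_root_mean_squared_error",  # higher (closer to 0) is better
    cv=GroupKFold(n_splits=3),              # folds never split a unit
)
search.fit(X, y, groups=groups)

best_params = search.best_params_
best_rmse = -search.best_score_
print(best_params, best_rmse)
```

The same pattern applies to the other regressors listed above by swapping the estimator and its grid; `RandomizedSearchCV` accepts the same `cv` and `groups` arguments when the grid becomes too large to enumerate.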

---

### **3. Robustness + Uncertainty Quantification (in Notebook #2)**

We extend the evaluation by computing:

- bootstrap-based confidence intervals,  
- repeated train/test splits to measure variance,  
- prediction spread across resamples,  
- stability of RUL predictions across engine units.

This type of robustness analysis goes beyond the single train/test score reported in typical course-level submissions.
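To make the bootstrap idea concrete, here is a small NumPy sketch of a percentile confidence interval for RMSE. The function name `bootstrap_rmse_ci`, its defaults, and the toy numbers are assumptions for illustration; Notebook #2 may use a different resampling scheme.

```python
import numpy as np


def bootstrap_rmse_ci(y_true, y_pred, n_boot=1000, alpha=0.05, seed=0):
    """Return (rmse, lower, upper) using a percentile bootstrap over samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        # Resample prediction/target pairs with replacement.
        idx = rng.integers(0, n, size=n)
        stats[b] = np.sqrt(np.mean((y_true[idx] - y_pred[idx]) ** 2))
    lower, upper = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    point = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return point, lower, upper


# Made-up targets and predictions, one value per hypothetical engine.
y_true = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
y_pred = y_true + np.array([1.0, -2.0, 3.0, -4.0, 5.0])
point, lower, upper = bootstrap_rmse_ci(y_true, y_pred)
print(point, lower, upper)
```

When several rows belong to the same engine, resampling whole engines rather than individual rows is usually the safer choice, since rows from one unit are strongly correlated.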

---

### **4. Full Deployment Pipeline (in Notebook #3)**

We construct an end-to-end pipeline that:

- uses the same fitted **StandardScaler**,  
- applies identical preprocessing and feature engineering,  
- loads the final trained ML model,  
- processes a new incoming batch from **test_FDXXX**,  
- outputs RUL predictions exactly as a real maintenance system would.

This makes the workflow **reproducible end to end**: the transformations fitted during training are reused unchanged at inference time, a step that basic C-MAPSS tutorials often omit.
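One way to keep the scaler and model together is a scikit-learn `Pipeline` persisted with `joblib`. The sketch below uses synthetic data, a placeholder Random Forest, and a hypothetical file name `rul_pipeline.joblib`; it illustrates the pattern and is not the final Notebook #3 implementation.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic training data standing in for engineered features and RUL labels.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))
y_train = 100.0 - 10.0 * X_train[:, 0] + rng.normal(scale=1.0, size=200)

# Bundling scaler and model means the scaler is fitted once, on training data only.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", RandomForestRegressor(n_estimators=30, random_state=0)),
])
pipeline.fit(X_train, y_train)

X_new = rng.normal(size=(3, 5))  # stand-in for a new incoming batch

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "rul_pipeline.joblib")  # hypothetical file name
    joblib.dump(pipeline, path)       # training side: persist the fitted pipeline
    restored = joblib.load(path)      # deployment side: load it back
    rul_pred = restored.predict(X_new)

print(rul_pred)
```

Any feature engineering applied before the pipeline has to be run identically on the new batch; wrapping it in a custom transformer step is one option for keeping that logic inside the saved object.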

---

## **Contribution (What We Actually Improve)**

- **Accuracy:** Improved performance through engineered features and tuned ML models.  
- **Model Stability:** Reduced variance and better reliability through robustness analysis.  
- **Interpretability:** Physically meaningful features help relate predictions to HPC and fan degradation.  
- **Practical Usability:** A working deployment pipeline mirrors real-world PHM workflows.  

---

## **Source Citation**

This project uses the **NASA C-MAPSS Turbofan Engine Degradation Simulation Data Sets (FD001–FD004)**, provided by the NASA Prognostics Center of Excellence (PCoE).

Primary reference:  
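A. Saxena, K. Goebel, D. Simon, and N. Eklund, *“Damage Propagation Modeling for Aircraft Engine Run-to-Failure Simulation,”* in Proc. 1st International Conference on Prognostics and Health Management (PHM08), Denver, CO, 2008.  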
(Reference document uploaded as: **Damage Propagation Modeling.pdf**)  

Dataset description obtained from the provided NASA **readme.txt**, which details operational conditions, fault modes, engine counts, and sensor layout for FD001–FD004.
