# 1. Project Overview

Modern aircraft engines generate large volumes of sensor data during operation. Monitoring this data effectively is critical for ensuring engine reliability, reducing unplanned downtime, and enabling proactive maintenance decisions. Traditional reactive maintenance approaches can lead to increased operational costs and safety risks, making predictive maintenance a key focus area in the aerospace industry.

This project focuses on building an AI-driven predictive maintenance system for aircraft engines using historical sensor data. The objective is to analyze multivariate time-series data collected from aircraft engines and develop machine learning models that can estimate the Remaining Useful Life (RUL) of an engine and detect early signs of abnormal behavior. By identifying degradation patterns before failure occurs, the system aims to support data-driven maintenance planning and risk mitigation.

The project leverages the NASA C-MAPSS turbofan engine dataset, which contains run-to-failure sensor readings across multiple engine life cycles. The workflow includes data preprocessing, feature engineering, model training and evaluation, and result interpretation. In addition to predictive modeling, the project incorporates an explainability layer based on generative AI (GenAI) that turns model outputs into human-readable maintenance insights.

Overall, this project demonstrates an end-to-end application of machine learning and AI techniques to a real-world aerospace engineering problem, highlighting skills in data analysis, predictive modeling, and AI-assisted decision support.

# 2. Business & Engineering Problem

Aircraft engines operate under complex and varying conditions and are subject to gradual degradation over time. Unplanned engine failures can lead to significant operational disruptions, increased maintenance costs, and potential safety risks. From a business perspective, aerospace organizations must balance engine reliability, maintenance efficiency, and cost control, while ensuring strict compliance with safety and regulatory standards.

Traditional maintenance strategies are often reactive or schedule-based, relying on fixed inspection intervals or component lifetimes. These approaches may result in premature maintenance, unnecessary component replacement, or delayed detection of critical issues. As a result, there is a strong business need for predictive maintenance systems that can accurately assess engine health and forecast failure risk based on actual operating data.

From an engineering standpoint, the challenge lies in analyzing high-dimensional, multivariate time-series sensor data collected across multiple engine life cycles. Each engine exhibits unique operational patterns, degradation rates, and noise characteristics, making it difficult to directly model failure behavior. Engineers must extract meaningful features, identify degradation trends, and distinguish between normal operational variability and true fault conditions.
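
Separating a degradation trend from cycle-to-cycle noise usually starts with smoothing each sensor within its own engine's history. Below is a minimal sketch of that idea. It assumes the `engine_id`, `cycle` and `sensor_*` column names defined in Section 4. The helper name `add_rolling_mean` and the window length are illustrative choices, not part of the original project.

```python
import pandas as pd


def add_rolling_mean(df: pd.DataFrame, sensors: list, window: int = 5) -> pd.DataFrame:
    """Append a per-engine rolling mean for each sensor column (hypothetical helper).

    The window is computed inside each engine_id group, so readings from one
    engine never leak into another engine's window. min_periods=1 keeps the
    first cycles of each engine instead of returning NaN for them.
    """
    out = df.sort_values(["engine_id", "cycle"]).copy()
    grouped = out.groupby("engine_id")
    for col in sensors:
        out[f"{col}_roll_mean"] = grouped[col].transform(
            lambda s: s.rolling(window, min_periods=1).mean()
        )
    return out


# Toy data with made-up values: two engines, one sensor.
demo = pd.DataFrame({
    "engine_id": [1, 1, 1, 2, 2],
    "cycle": [1, 2, 3, 1, 2],
    "sensor_2": [641.0, 643.0, 642.0, 650.0, 652.0],
})
smoothed = add_rolling_mean(demo, ["sensor_2"], window=2)
print(smoothed["sensor_2_roll_mean"].tolist())  # [641.0, 642.0, 642.5, 650.0, 651.0]
```

Sorting before grouping matters because `rolling` follows row order. The sort makes the "ordered by engine, then cycle" assumption explicit rather than relying on the file order.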

The core engineering problem addressed in this project is to develop an AI-driven approach that can:

  - Estimate the Remaining Useful Life (RUL) of aircraft engines based on historical sensor readings.

  - Detect early-stage anomalies that may indicate abnormal degradation or impending failure.

  - Provide interpretable insights that support maintenance decision-making rather than producing opaque predictions.
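
For the anomaly-detection goal, one simple baseline is a per-engine z-score. It compares each reading with statistics computed over that engine's first cycles. This technique is swapped in here for illustration and is not taken from the project. It assumes those early cycles reflect healthy operation. The helper `flag_anomalies`, the baseline length and the 3-sigma threshold are hypothetical choices.

```python
import pandas as pd


def flag_anomalies(df: pd.DataFrame, sensors: list,
                   baseline_cycles: int = 20, threshold: float = 3.0) -> pd.DataFrame:
    """Flag cycles that deviate strongly from each engine's early-life baseline.

    Assumption (hypothetical helper): the first `baseline_cycles` cycles of
    every engine represent healthy behaviour.
    """
    out = df.sort_values(["engine_id", "cycle"]).copy()
    # Rows that fall inside each engine's first `baseline_cycles` cycles.
    in_baseline = out.groupby("engine_id").cumcount() < baseline_cycles
    stats = out[in_baseline].groupby("engine_id")[sensors].agg(["mean", "std"])

    z_cols = []
    for col in sensors:
        mean = out["engine_id"].map(stats[(col, "mean")])
        # A flat baseline (std == 0) gives no usable scale, so treat it as NaN.
        std = out["engine_id"].map(stats[(col, "std")]).replace(0.0, float("nan"))
        out[f"{col}_z"] = (out[col] - mean) / std
        z_cols.append(f"{col}_z")

    # A cycle is anomalous if any sensor's |z| exceeds the threshold (NaN -> False).
    out["anomaly"] = out[z_cols].abs().gt(threshold).any(axis=1)
    return out


# Toy data with made-up values: one engine whose last reading jumps.
demo = pd.DataFrame({
    "engine_id": [1, 1, 1, 1, 1],
    "cycle": [1, 2, 3, 4, 5],
    "sensor_4": [1.0, 2.0, 3.0, 2.0, 10.0],
})
flagged = flag_anomalies(demo, ["sensor_4"], baseline_cycles=3)
print(flagged["anomaly"].tolist())  # [False, False, False, False, True]
```

Using each engine's own baseline, rather than a fleet-wide one, is meant to absorb the engine-to-engine differences in initial condition described above.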

By solving this problem, the system aims to support data-driven maintenance planning, reduce unplanned downtime, and improve overall operational efficiency, aligning engineering objectives with business outcomes in the aerospace domain.

# 3. Dataset Description

This project uses the NASA C-MAPSS (Commercial Modular Aero-Propulsion System Simulation) turbofan engine dataset, a publicly available benchmark dataset designed for predictive maintenance and remaining useful life (RUL) estimation in aerospace applications. The dataset contains simulated run-to-failure sensor data collected from multiple aircraft engines operating under varying conditions.

Each engine in the dataset is monitored over multiple operational cycles, starting from a healthy state and progressing toward system failure. At each cycle, a set of operational settings and sensor measurements is recorded, capturing the gradual degradation behavior of the engine.
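
Every training engine is observed until failure, so a per-row RUL target can be derived from the cycle counter alone. The last recorded cycle of each engine is treated as its failure point. Below is a minimal sketch, assuming the `engine_id` and `cycle` column names from Section 4. `add_train_rul` is a hypothetical helper. The optional `cap` clips early-life RUL to a ceiling, which some C-MAPSS studies do. It is an assumption, not something the dataset prescribes.

```python
import pandas as pd


def add_train_rul(train: pd.DataFrame, cap=None) -> pd.DataFrame:
    """Label each training row with its Remaining Useful Life (hypothetical helper).

    RUL = (last recorded cycle of that engine) - (current cycle), so the final
    cycle of every engine gets RUL 0.
    """
    out = train.copy()
    last_cycle = out.groupby("engine_id")["cycle"].transform("max")
    out["RUL"] = last_cycle - out["cycle"]
    if cap is not None:
        # Optional assumption: clip large early-life RUL values to a ceiling.
        out["RUL"] = out["RUL"].clip(upper=cap)
    return out


# Toy data: engine 1 fails after 3 cycles, engine 2 after 2.
demo = pd.DataFrame({"engine_id": [1, 1, 1, 2, 2], "cycle": [1, 2, 3, 1, 2]})
print(add_train_rul(demo)["RUL"].tolist())         # [2, 1, 0, 1, 0]
print(add_train_rul(demo, cap=1)["RUL"].tolist())  # [1, 1, 0, 1, 0]
```

Applied to the real data, this would be `train_df = add_train_rul(train_df)`.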

- Dataset Structure

  - The FD001 subset of the C-MAPSS dataset consists of three primary files:

    - `train_FD001.txt`:
      Contains complete engine life cycles, from start-up until failure. This data is used for model training and for learning degradation patterns.

    - `test_FD001.txt`:
      Contains partial engine life cycles that end some time before failure. This data is used for model evaluation.

    - `RUL_FD001.txt`:
      Provides the true Remaining Useful Life of each test engine at its last recorded cycle, enabling quantitative evaluation of model predictions.

- Features and Variables

  - Each record in the dataset includes:

    - Engine ID: Unique identifier for each engine.

    - Cycle: Time step representing an operational cycle.
 
    - Operational Settings (3 variables): Represent external or operational conditions under which the engine is running.

    - Sensor Measurements (21 variables): Continuous measurements capturing engine behavior such as temperature, pressure, and rotational speed.

All values (operational settings and sensor measurements) are numeric and are recorded once per operational cycle, which makes the dataset well suited to time-series analysis and machine-learning-based modeling.
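
`RUL_FD001.txt` gives the true RUL only at the last recorded cycle of each test engine. Evaluation therefore typically compares a model's prediction at that final cycle with the file's value. This sketch rests on two assumptions. First, the i-th line of the RUL file belongs to the i-th engine in ascending `engine_id` order. Second, RMSE is the error metric, although the text above does not name one. Both helper names are hypothetical.

```python
import numpy as np
import pandas as pd


def last_cycle_with_true_rul(test: pd.DataFrame, rul: pd.DataFrame) -> pd.DataFrame:
    """Pair the final observed cycle of each test engine with its true RUL.

    Assumes row i of the RUL file belongs to the i-th engine in ascending
    engine_id order (hypothetical helper).
    """
    last = (
        test.sort_values(["engine_id", "cycle"])
        .groupby("engine_id")
        .tail(1)  # last recorded cycle of every engine
        .reset_index(drop=True)
    )
    if len(last) != len(rul):
        raise ValueError(f"{len(last)} test engines but {len(rul)} RUL values")
    last["true_RUL"] = rul["RUL"].to_numpy()
    return last


def rmse(y_true, y_pred) -> float:
    """Root mean squared error between true and predicted RUL values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Toy data: two test engines observed for 3 and 2 cycles.
test_demo = pd.DataFrame({"engine_id": [1, 1, 1, 2, 2], "cycle": [1, 2, 3, 1, 2]})
rul_demo = pd.DataFrame({"RUL": [112, 98]})
last = last_cycle_with_true_rul(test_demo, rul_demo)
print(last.values.tolist())                # [[1, 3, 112], [2, 2, 98]]
print(rmse(last["true_RUL"], [110, 100]))  # 2.0 (made-up predictions)
```

With the real data, the predictions would come from the model applied to each engine's final row of `test_df`. The length check catches a mismatch between the number of test engines and RUL values early.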

- Dataset Characteristics

  - Multivariate time-series data

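  - High-dimensional: 24 features per time step (3 operational settings and 21 sensor measurements), plus the engine ID and cycle number

  - Engine-specific degradation trajectories

  - No missing values in the raw data

  - Failures are observed only in the training set; test trajectories end before failure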
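
The characteristics above can be checked directly on a parsed frame. This is also a quick way to catch parsing mistakes: for example, empty trailing columns created by a wrong delimiter would show up as missing values. The sketch below is minimal. `COLUMNS` rebuilds the same 26 names as the `columns` list in Section 4, and `describe_frame` is a hypothetical helper.

```python
import pandas as pd

# Same 26 names as the `columns` list in Section 4, built programmatically.
COLUMNS = (
    ["engine_id", "cycle"]
    + [f"op_setting_{i}" for i in range(1, 4)]
    + [f"sensor_{i}" for i in range(1, 22)]
)


def describe_frame(df: pd.DataFrame) -> dict:
    """Summarise the structural properties listed above (hypothetical helper)."""
    feature_cols = [c for c in df.columns if c not in ("engine_id", "cycle")]
    return {
        "n_engines": int(df["engine_id"].nunique()),
        "n_features": len(feature_cols),
        "has_missing": bool(df.isna().to_numpy().any()),
    }


# Toy frame: one engine, two cycles, all features set to 0.0.
demo = pd.DataFrame([[1, 1] + [0.0] * 24, [1, 2] + [0.0] * 24], columns=COLUMNS)
print(describe_frame(demo))  # {'n_engines': 1, 'n_features': 24, 'has_missing': False}
```

On the real files, `describe_frame(train_df)` should report 24 features and no missing values if parsing succeeded.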

# 4. Exploratory Data Analysis (EDA)

```python
import pandas as pd

# Names for the 26 fields in each C-MAPSS row:
# engine ID, cycle, 3 operational settings, 21 sensor measurements
columns = [
    "engine_id", "cycle",
    "op_setting_1", "op_setting_2", "op_setting_3",
    "sensor_1", "sensor_2", "sensor_3", "sensor_4", "sensor_5",
    "sensor_6", "sensor_7", "sensor_8", "sensor_9", "sensor_10",
    "sensor_11", "sensor_12", "sensor_13", "sensor_14", "sensor_15",
    "sensor_16", "sensor_17", "sensor_18", "sensor_19", "sensor_20",
    "sensor_21"
]
```

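```python
# The raw files separate fields with single spaces and end each line with
# trailing spaces. With sep=" ", those trailing spaces become two empty
# columns, pandas then uses the first two fields (engine ID and cycle) as the
# index, and every header shifts by two. Splitting on runs of whitespace
# avoids both problems.
train_df = pd.read_csv(
    "data/train_FD001.txt",
    sep=r"\s+",
    header=None,
    names=columns
)

train_df.head()
```

| engine_id | cycle | op_setting_1 | op_setting_2 | op_setting_3 | sensor_1 | sensor_2 | sensor_3 | sensor_4 | sensor_5 | sensor_6 | sensor_7 | ... | sensor_14 | sensor_15 | sensor_16 | sensor_17 | sensor_18 | sensor_19 | sensor_20 | sensor_21 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | -0.0007 | -0.0004 | 100.0 | 518.67 | 641.82 | 1589.7 | 1400.6 | 14.62 | 21.61 | 554.36 | ... | 8138.62 | 8.4195 | 0.03 | 392 | 2388 | 100.0 | 39.06 | 23.419 |
| 1 | 2 | 0.0019 | -0.0003 | 100.0 | 518.67 | 642.15 | 1591.82 | 1403.14 | 14.62 | 21.61 | 553.75 | ... | 8131.49 | 8.4318 | 0.03 | 392 | 2388 | 100.0 | 39.0 | 23.4236 |
| 1 | 3 | -0.0043 | 0.0003 | 100.0 | 518.67 | 642.35 | 1587.99 | 1404.2 | 14.62 | 21.61 | 554.26 | ... | 8133.23 | 8.4178 | 0.03 | 390 | 2388 | 100.0 | 38.95 | 23.3442 |
| 1 | 4 | 0.0007 | 0.0 | 100.0 | 518.67 | 642.35 | 1582.79 | 1401.87 | 14.62 | 21.61 | 554.45 | ... | 8133.83 | 8.3682 | 0.03 | 392 | 2388 | 100.0 | 38.88 | 23.3739 |
| 1 | 5 | -0.0019 | -0.0002 | 100.0 | 518.67 | 642.37 | 1582.85 | 1406.22 | 14.62 | 21.61 | 554.0 | ... | 8133.8 | 8.4294 | 0.03 | 393 | 2388 | 100.0 | 38.9 | 23.4044 |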


```python
test_df = pd.read_csv(
    "data/test_FD001.txt",
    sep=r"\s+",  # fields are space-separated and lines end with trailing spaces
    header=None,
    names=columns
)

test_df.head()
```

| engine_id | cycle | op_setting_1 | op_setting_2 | op_setting_3 | sensor_1 | sensor_2 | sensor_3 | sensor_4 | sensor_5 | sensor_6 | sensor_7 | ... | sensor_14 | sensor_15 | sensor_16 | sensor_17 | sensor_18 | sensor_19 | sensor_20 | sensor_21 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0.0023 | 0.0003 | 100.0 | 518.67 | 643.02 | 1585.29 | 1398.21 | 14.62 | 21.61 | 553.9 | ... | 8125.55 | 8.4052 | 0.03 | 392 | 2388 | 100.0 | 38.86 | 23.3735 |
| 1 | 2 | -0.0027 | -0.0003 | 100.0 | 518.67 | 641.71 | 1588.45 | 1395.42 | 14.62 | 21.61 | 554.85 | ... | 8139.62 | 8.3803 | 0.03 | 393 | 2388 | 100.0 | 39.02 | 23.3916 |
| 1 | 3 | 0.0003 | 0.0001 | 100.0 | 518.67 | 642.46 | 1586.94 | 1401.34 | 14.62 | 21.61 | 554.11 | ... | 8130.1 | 8.4441 | 0.03 | 393 | 2388 | 100.0 | 39.08 | 23.4166 |
| 1 | 4 | 0.0042 | 0.0 | 100.0 | 518.67 | 642.44 | 1584.12 | 1406.42 | 14.62 | 21.61 | 554.07 | ... | 8132.9 | 8.3917 | 0.03 | 391 | 2388 | 100.0 | 39.0 | 23.3737 |
| 1 | 5 | 0.0014 | 0.0 | 100.0 | 518.67 | 642.51 | 1587.19 | 1401.92 | 14.62 | 21.61 | 554.16 | ... | 8129.54 | 8.4031 | 0.03 | 390 | 2388 | 100.0 | 38.99 | 23.413 |

```python
# RUL_FD001.txt holds one integer per line, one line per test engine
rul_df = pd.read_csv(
    "data/RUL_FD001.txt",
    sep=r"\s+",
    header=None,
    names=["RUL"]
)

rul_df.head()
```

| | RUL |
|---|---|
| 0 | 112 |
| 1 | 98 |
| 2 | 69 |
| 3 | 82 |
| 4 | 91 |

```python
print("Test shape:", test_df.shape)
print("RUL shape:", rul_df.shape)

print("Unique engines in test:", test_df["engine_id"].nunique())
print("Rows in RUL:", len(rul_df))

# Every test engine needs exactly one ground-truth RUL value
assert test_df["engine_id"].nunique() == len(rul_df)
```

```text
Test shape: (13096, 26)
RUL shape: (100, 1)
Unique engines in test: 100
Rows in RUL: 100
```
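
Several columns in the `head()` previews above hold the same value in every row shown. Before modeling, it is worth checking which features are (near-)constant across the whole dataset, since such columns carry no degradation signal. This is a sketch only: `near_constant_columns` and its `tol` threshold are illustrative assumptions.

```python
import pandas as pd


def near_constant_columns(df: pd.DataFrame, tol: float = 1e-4) -> list:
    """Return feature columns whose standard deviation is at most `tol`.

    engine_id and cycle are identifiers rather than features, so they are
    excluded (hypothetical helper; `tol` is an arbitrary threshold).
    """
    stds = df.drop(columns=["engine_id", "cycle"]).std()
    return stds[stds <= tol].index.tolist()


# Toy data, values loosely based on the previews: two flat columns, one varying.
demo = pd.DataFrame({
    "engine_id": [1, 1, 1],
    "cycle": [1, 2, 3],
    "op_setting_3": [100.0, 100.0, 100.0],
    "sensor_1": [518.67, 518.67, 518.67],
    "sensor_2": [641.82, 642.15, 642.35],
})
print(near_constant_columns(demo))  # ['op_setting_3', 'sensor_1']
```

Calling `near_constant_columns(train_df)` then lists drop candidates for the FD001 training set. The threshold is best compared against each column's scale, because sensors differ by orders of magnitude.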


# 5. Data Preprocessing

# 6. Feature Engineering

# 7. Model Building

# 8. Model Evaluation

# 9. Anomaly Detection

# 10. Explainability and Insights

# 11. GenAI Assistant

# 12. Limitations and Future Scope