# <center>AI-Driven Predictive Maintenance Model Development</center>


Predictive maintenance is a proactive approach that leverages data analysis and machine learning to predict equipment failures before they occur. By anticipating maintenance needs, organizations can prevent unexpected breakdowns, optimize maintenance schedules, and reduce operational costs. One effective way to implement predictive maintenance is through the development of a classification model using AI and machine learning techniques.

# Why Use a Classification Model for Predictive Maintenance?
 
 ### Predicting Failure vs. No-Failure:

In predictive maintenance, the primary goal is to classify whether an equipment component is likely to fail within a certain timeframe. This is a classic binary classification problem where the outcomes are typically labeled as "failure" or "no failure."
By predicting these outcomes accurately, maintenance activities can be scheduled proactively, thereby avoiding unexpected downtimes and costly repairs.

### Handling Imbalanced Datasets:

In many industrial settings, the instances of failure are significantly fewer than instances of normal operation. Classification models can be tuned to handle such imbalanced datasets effectively.
Techniques such as class weighting, resampling (over-sampling and under-sampling), and advanced methods like SMOTE can be employed to ensure the model learns to detect the minority class (failures) accurately.
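
As a rough illustration of class weighting, the sketch below trains a Random Forest with `class_weight="balanced"`. It uses synthetic data from `make_classification` as a stand-in for sensor readings; the names `X_demo`, `y_demo`, and `weighted_model` are illustrative assumptions and are not part of the analysis that follows.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: roughly 10% of samples are "failures" (class 1)
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=0
)

# class_weight="balanced" weights each class inversely to its frequency
weighted_model = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=0
)
weighted_model.fit(X_tr, y_tr)

failure_recall = recall_score(y_te, weighted_model.predict(X_te))
print("Failure share in training set:", y_tr.mean())
print("Recall on the failure class:", failure_recall)
```

Whether weighting actually helps depends on the data, so it is worth comparing the failure-class recall with and without it.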

### Precision and Recall Balance:

Precision and recall are critical metrics in predictive maintenance. High precision ensures that when the model predicts a failure, it is likely correct, reducing the number of false alarms. High recall ensures that most of the actual failures are detected, minimizing the risk of unexpected breakdowns.
Classification models allow fine-tuning to achieve an optimal balance between precision and recall, ensuring both accurate and comprehensive maintenance predictions.
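
One common way to trade precision against recall is to move the probability cutoff used to flag a failure. The following is a minimal sketch on synthetic data (not the dataset used later); the cutoff values 0.5 and 0.3 and all variable names are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (about 10% positives)
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
failure_prob = clf.predict_proba(X_te)[:, 1]  # predicted probability of class 1

recall_by_cutoff = {}
for cutoff in (0.5, 0.3):
    # Flag a failure whenever the predicted probability reaches the cutoff
    y_hat = (failure_prob >= cutoff).astype(int)
    recall_by_cutoff[cutoff] = recall_score(y_te, y_hat)
    precision = precision_score(y_te, y_hat, zero_division=0)
    print(f"cutoff={cutoff}: precision={precision:.2f}, "
          f"recall={recall_by_cutoff[cutoff]:.2f}")
```

Lowering the cutoff can only add predicted failures, so recall never decreases, while precision typically drops because more false alarms are raised.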

### Interpretable Results:

Many classification models, especially tree-based models like Random Forest and Gradient Boosting, provide interpretable results through feature importance scores. This helps in understanding which factors contribute most to equipment failures.
Interpretable models facilitate better decision-making and root cause analysis, enabling maintenance teams to focus on the most critical aspects of equipment performance.
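
To show what feature importance scores look like in practice, here is a small sketch using scikit-learn's `feature_importances_` attribute on synthetic data. The feature names `sensor_1` to `sensor_6` are hypothetical placeholders, not the columns of the dataset analysed below.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature names for a synthetic dataset with 3 informative features
feature_names = [f"sensor_{i}" for i in range(1, 7)]
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=6, n_informative=3, n_redundant=0, random_state=0
)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_demo, y_demo)

# Impurity-based importances, normalized by scikit-learn to sum to 1
importances = (
    pd.Series(forest.feature_importances_, index=feature_names)
    .sort_values(ascending=False)
)
print(importances)
```

Impurity-based importances are a quick first look; they can be biased toward high-cardinality features, so they are best read as a ranking hint rather than a causal statement.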


###  Load and Explore the Data

The challenge provides four datasets; here I explore the one denoted FD001. First, I import pandas and load the training file:


In [3]:
# Import pandas for data loading and analysis
import pandas as pd

# Load the dataset
column_names = ["unit_number", "time_in_cycles", "operational_setting_1", "operational_setting_2", "operational_setting_3",
                "sensor_measurement_1", "sensor_measurement_2", "sensor_measurement_3", "sensor_measurement_4",
                "sensor_measurement_5", "sensor_measurement_6", "sensor_measurement_7", "sensor_measurement_8",
                "sensor_measurement_9", "sensor_measurement_10", "sensor_measurement_11", "sensor_measurement_12",
                "sensor_measurement_13", "sensor_measurement_14", "sensor_measurement_15", "sensor_measurement_16",
                "sensor_measurement_17", "sensor_measurement_18", "sensor_measurement_19", "sensor_measurement_20",
                "sensor_measurement_21"]

# The file is whitespace-delimited with trailing spaces, so split on any run of whitespace
data = pd.read_csv('../Data/train_FD001.txt', sep=r"\s+", header=None, names=column_names)

# Drop any columns with all NaN values
data.dropna(axis=1, how='all', inplace=True)

# Display basic information about the dataset
print(data.info())
print(data.describe())
print(data.head())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20631 entries, 0 to 20630
Data columns (total 26 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   unit_number            20631 non-null  int64  
 1   time_in_cycles         20631 non-null  int64  
 2   operational_setting_1  20631 non-null  float64
 3   operational_setting_2  20631 non-null  float64
 4   operational_setting_3  20631 non-null  float64
 5   sensor_measurement_1   20631 non-null  float64
 6   sensor_measurement_2   20631 non-null  float64
 7   sensor_measurement_3   20631 non-null  float64
 8   sensor_measurement_4   20631 non-null  float64
 9   sensor_measurement_5   20631 non-null  float64
 10  sensor_measurement_6   20631 non-null  float64
 11  sensor_measurement_7   20631 non-null  float64
 12  sensor_measurement_8   20631 non-null  float64
 13  sensor_measurement_9   20631 non-null  float64
 14  sensor_measurement_10  20631 non-null  float64
 15  se



###  Preprocess the Data



In [4]:
# Create a target variable based on a threshold of remaining useful life (RUL)
# Here, RUL is calculated as the difference between the maximum cycle and the current cycle for each unit

data['RUL'] = data.groupby('unit_number')['time_in_cycles'].transform(lambda x: x.max() - x)

# Define a binary classification target: 1 if RUL <= threshold, else 0
threshold = 30
data['label'] = (data['RUL'] <= threshold).astype(int)

# Drop the RUL column as it's no longer needed
data.drop('RUL', axis=1, inplace=True)

# Split the data into features and target
X = data.drop(['unit_number', 'time_in_cycles', 'label'], axis=1)
y = data['label']
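
To make the labelling step concrete, the sketch below applies the same RUL and label logic to a tiny hand-made frame. The toy values and the threshold of 1 are assumptions chosen so the result is easy to check by eye.

```python
import pandas as pd

# Two toy units: unit 1 runs for 4 cycles, unit 2 for 3 cycles
toy = pd.DataFrame({
    "unit_number":    [1, 1, 1, 1, 2, 2, 2],
    "time_in_cycles": [1, 2, 3, 4, 1, 2, 3],
})

# RUL = last observed cycle of the unit minus the current cycle
toy["RUL"] = toy.groupby("unit_number")["time_in_cycles"].transform(
    lambda x: x.max() - x
)

# Label a row as 1 when the unit is within toy_threshold cycles of its last cycle
toy_threshold = 1
toy["label"] = (toy["RUL"] <= toy_threshold).astype(int)
print(toy)
```

Each unit counts down to an RUL of 0 at its final recorded cycle, and only the last rows of each unit receive the positive label.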

###  Split the Data



In [5]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
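
A row-level random split places cycles from the same unit in both the training and test sets. Because neighbouring cycles of one engine tend to look alike, this may make test scores optimistic. One possible alternative, sketched here on toy data with scikit-learn's `GroupShuffleSplit`, is to hold out whole units; the frame `toy` and the column `sensor_a` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)

# Toy data: 10 units with 20 cycles each
toy = pd.DataFrame({
    "unit_number": np.repeat(np.arange(1, 11), 20),
    "sensor_a": rng.normal(size=200),
    "label": rng.integers(0, 2, size=200),
})

# Hold out 20% of the units (not 20% of the rows)
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(
    splitter.split(toy[["sensor_a"]], toy["label"], groups=toy["unit_number"])
)

train_units = set(toy.iloc[train_idx]["unit_number"])
test_units = set(toy.iloc[test_idx]["unit_number"])
print("Train units:", sorted(train_units))
print("Test units:", sorted(test_units))
```

With this kind of split no unit contributes rows to both sets, at the cost of test-set class proportions that vary with which units are drawn.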





###  Build and Train the Model


In [6]:
from sklearn.ensemble import RandomForestClassifier

# Initialize the model
model = RandomForestClassifier(n_estimators=100, random_state=42)

# Fit the model on the training data
model.fit(X_train, y_train)



###  Evaluate the Model


In [7]:
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))


Accuracy: 0.9612309183426218
Confusion Matrix:
 [[3483   60]
 [ 100  484]]
Classification Report:
               precision    recall  f1-score   support

           0       0.97      0.98      0.98      3543
           1       0.89      0.83      0.86       584

    accuracy                           0.96      4127
   macro avg       0.93      0.91      0.92      4127
weighted avg       0.96      0.96      0.96      4127




###  Optimize the Model 




In [8]:
from sklearn.model_selection import GridSearchCV

# Define the parameter grid
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10]
}

# Initialize the GridSearchCV
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=3, n_jobs=-1, verbose=2)

# Fit the GridSearchCV
grid_search.fit(X_train, y_train)

# Print the best parameters
print("Best parameters found: ", grid_search.best_params_)

# Use the best estimator to make predictions and evaluate
best_model = grid_search.best_estimator_
y_pred_best = best_model.predict(X_test)

print("Accuracy with best model:", accuracy_score(y_test, y_pred_best))
print("Confusion Matrix with best model:\n", confusion_matrix(y_test, y_pred_best))
print("Classification Report with best model:\n", classification_report(y_test, y_pred_best))


Fitting 3 folds for each of 36 candidates, totalling 108 fits
Best parameters found:  {'max_depth': 10, 'min_samples_split': 10, 'n_estimators': 200}
Accuracy with best model: 0.9622001453840562
Confusion Matrix with best model:
 [[3485   58]
 [  98  486]]
Classification Report with best model:
               precision    recall  f1-score   support

           0       0.97      0.98      0.98      3543
           1       0.89      0.83      0.86       584

    accuracy                           0.96      4127
   macro avg       0.93      0.91      0.92      4127
weighted avg       0.96      0.96      0.96      4127



### Model Explanation and Analysis

#### Best Parameters Found

These are the hyperparameters that GridSearchCV found to be optimal:

- `max_depth = 10`: the maximum depth of each tree. Restricting the depth helps prevent overfitting.
- `min_samples_split = 10`: the minimum number of samples required to split an internal node.
- `n_estimators = 200`: the number of trees in the forest. More trees can increase the model's robustness and accuracy.
           
#### Accuracy with Best Model

The model achieved an accuracy of approximately 96.22%, meaning it correctly classified 96.22% of the instances in the test set (3,971 of 4,127).

#### Confusion Matrix with Best Model

The confusion matrix provides a detailed breakdown of the model's predictions:

- True Negatives (TN): 3485 (model correctly predicted class 0)
- False Positives (FP): 58 (model incorrectly predicted class 1)
- False Negatives (FN): 98 (model incorrectly predicted class 0)
- True Positives (TP): 486 (model correctly predicted class 1)
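
The failure-class metrics in the classification report follow directly from these four counts. As a quick check, this sketch recomputes them by hand using the standard definitions of precision, recall, F1, and accuracy:

```python
# Counts taken from the confusion matrix of the tuned model
tn, fp, fn, tp = 3485, 58, 98, 486

precision = tp / (tp + fp)   # share of predicted failures that were real
recall = tp / (tp + fn)      # share of real failures that were caught
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tn + fp + fn + tp)

print(f"precision={precision:.2f} recall={recall:.2f} "
      f"f1={f1:.2f} accuracy={accuracy:.4f}")
```

The rounded values match the report for class 1 (0.89, 0.83, 0.86) and the overall accuracy of 0.9622.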

#### Classification Report with Best Model

The classification report provides detailed performance metrics for each class, along with overall averages:

| Class / Average | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Class 0 (Non-Failure) | 0.97 | 0.98 | 0.98 | 3543 |
| Class 1 (Failure) | 0.89 | 0.83 | 0.86 | 584 |
| Macro Average | 0.93 | 0.91 | 0.92 | 4127 |
| Weighted Average | 0.96 | 0.96 | 0.96 | 4127 |

Overall accuracy: 0.96

#### Interpretation

- Precision: for class 0, 0.97 indicates that 97% of instances predicted as non-failure were correct. For class 1, 0.89 indicates that 89% of instances predicted as failure were correct.
- Recall: for class 0, 0.98 indicates that 98% of actual non-failure instances were correctly identified. For class 1, 0.83 indicates that 83% of actual failure instances were correctly identified.
- F1-Score: for class 0, 0.98 indicates a high balance between precision and recall. For class 1, 0.86 indicates a reasonable balance, but lower than for class 0, reflecting the difficulty of predicting the minority class (failures).
- Support: the number of true instances of each class in the test set (3543 vs. 584), showing a class imbalance.

#### Conclusion

The model performs well overall, with high accuracy (96.22%) and strong precision, recall, and F1-scores, especially for the majority class (non-failure).
It is less effective on the minority class (failure): with an F1-score of 0.86 and a recall of 0.83, it misses 98 of the 584 failure-labeled instances in the test set.
Hyperparameter tuning brought only a marginal gain over the default model (96.22% vs. 96.12% accuracy, with the rounded per-class scores unchanged). The model is therefore a reasonable starting point for predictive maintenance, with clear room for improvement in detecting failures.

#### Potential Improvements

1. Handling Class Imbalance: Techniques such as SMOTE (Synthetic Minority Over-sampling Technique) or class weighting could be used to improve recall for the minority class.
2. Model Ensemble: Combining multiple models through ensemble techniques might improve overall performance.
3. Feature Engineering: Further analysis and creation of additional relevant features could enhance model accuracy.

Taken together, the evaluation indicates that the optimized model is effective for this predictive maintenance task, with failure recall as the main area to improve.



In [12]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, roc_auc_score, log_loss

# y_test are the true labels and y_pred are the predicted labels from the baseline model
# y_prob are the predicted probabilities of the failure class (class 1)
y_prob = model.predict_proba(X_test)[:, 1]

accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
roc_auc = roc_auc_score(y_test, y_prob)
log_loss_value = log_loss(y_test, y_prob)

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("Confusion Matrix:\n", conf_matrix)
print("ROC AUC:", roc_auc)
print("Log Loss:", log_loss_value)


### Deployment and Monitoring:

Deploy the trained model into the production environment where it can continuously monitor equipment performance and predict potential failures in real-time.
Implement a feedback loop to regularly update the model with new data and ensure it adapts to changing operational conditions.
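
A deployment could take many forms. As one minimal sketch, assuming the model is persisted with `joblib` and scored in batches, the snippet below saves a trained classifier and reloads it to flag new readings. The file name `pdm_model.joblib`, the helper `score_readings`, and the 0.5 alert cutoff are hypothetical, and the data is synthetic.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for historical training data
X_demo, y_demo = make_classification(n_samples=500, n_features=8, random_state=0)
trained = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_demo, y_demo)

# Persist the fitted model (hypothetical path in a temporary directory)
model_path = os.path.join(tempfile.mkdtemp(), "pdm_model.joblib")
joblib.dump(trained, model_path)


def score_readings(readings, path=model_path, alert_cutoff=0.5):
    """Return failure probabilities and alert flags for new sensor rows."""
    loaded = joblib.load(path)
    prob = loaded.predict_proba(readings)[:, 1]
    return prob, prob >= alert_cutoff


# Score a small batch of "new" readings
probs, alerts = score_readings(X_demo[:5])
print(probs, alerts)
```

In a feedback loop, newly labelled cycles would be appended to the training data and the saved model file replaced after retraining and re-evaluation.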

## Conclusion
Using a classification model for predictive maintenance leverages the power of AI to foresee equipment failures and schedule timely interventions. This proactive approach not only enhances equipment reliability and operational efficiency but also reduces maintenance costs and improves overall safety. By systematically developing and optimizing classification models, organizations can achieve robust predictive maintenance solutions tailored to their specific needs.

In [16]:
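import plotly.express as px
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumes train_df (with unit_number, time_cycles and RUL columns) and features
# (the list of sensor columns) were defined in an earlier cell that is not shown here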

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=2))
])
pca_components = pipeline.fit_transform(train_df[features])
for i in range(pca_components.shape[1]):
    train_df[f"pca_component_{i+1}"] = pca_components[:, i]
    
fig = px.scatter_3d(train_df, x="pca_component_1", y="pca_component_2", z="RUL", color="unit_number", size_max=0.002)
fig.update_traces(marker_size = 3)
fig.update_layout(
    title='PCA state visualization',
    title_x=0.5,
    width=800,
    height=800,
    margin=dict(l=65, r=50, b=65, t=90)
)
fig.show()

One interesting observation is that even when the rolling PCA component values of two units are the same, the corresponding RUL can vary greatly. What could be the cause of this? Is it just randomness, or are there other features explaining this behavior? I noted that unit numbers 9 and 22 differed greatly. Let’s visualize these values for the second PCA component. I’ll use the same code, except that I’ll change the color to highlight units 9 and 22 and swap PCA component 1 for PCA component 2:

In [17]:
unit_to_color = { u: u in [9, 22] for u in train_df["unit_number"].unique() }
fig = px.scatter_3d((
    train_df
    .pipe(lambda x: x[x["RUL"] <= x.groupby("unit_number")["time_cycles"].max().min()])
    .assign(
        color=train_df.unit_number.apply(unit_to_color.get)
    )
    .groupby("unit_number", group_keys=False)
    .apply(lambda x: x.assign(rolling_pca_component_2=lambda x: x.pca_component_2.rolling(window=10).mean()))), 
    x="unit_number", 
    y="rolling_pca_component_2", 
    color="color", 
    z="RUL"
)
fig.update_traces(marker_size = 3)
fig.update_layout(
    title='Side by side',
    title_x=0.5,
    width=800,
    height=800,
    margin=dict(l=65, r=50, b=65, t=90)
)
fig.show()
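
The per-unit rolling mean used in the cell above can be illustrated on a tiny frame. This sketch uses `groupby(...).transform` with a window of 2 instead of 10 so the numbers are easy to follow; the toy values are assumptions.

```python
import pandas as pd

# Two toy units with four cycles each
toy = pd.DataFrame({
    "unit_number":     [1, 1, 1, 1, 2, 2, 2, 2],
    "pca_component_2": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
})

# Rolling mean computed separately within each unit, so windows never span two units
toy["rolling_pca_component_2"] = (
    toy.groupby("unit_number")["pca_component_2"]
    .transform(lambda s: s.rolling(window=2).mean())
)
print(toy)
```

The first `window - 1` rows of every unit are `NaN` because the window is not yet full, which is why the earliest cycles of each unit have no rolling value to plot.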

As can be seen, for the second PCA component these units appear on opposite sides of the graph. It is almost as if the second PCA component represents some kind of acceleration of the first PCA component from the small values of a healthy unit to the larger values it has at failure. Let’s change the color of the earlier 3D graph for PCA component 1 to represent the second PCA component and plot both components in the same plot:

In [19]:
fig = px.scatter_3d((
    train_df
    .pipe(lambda x: x[x["RUL"] <= x.groupby("unit_number")["time_cycles"].max().min()])
    .groupby("unit_number", group_keys=False)
    .apply(lambda x: x.assign(
        rolling_pca_component_1=lambda x: x.pca_component_1.rolling(window=10).mean()
    ))), 
    x="unit_number", 
    y="rolling_pca_component_1", 
    color="pca_component_2", 
    z="RUL"
)
fig.update_traces(marker_size = 3)
fig.update_layout(
    title='Side by side',
    title_x=0.5,
    width=800,
    height=800,
    margin=dict(l=65, r=50, b=65, t=90)
)
fig.show()

### Analyse the Correlation
To finalize the analysis of their relationship, let’s take the values of PCA components 1 and 2 at the point where the RUL is 100 in every time series and check whether they are correlated:

In [20]:
train_df.query("RUL == 100")[["pca_component_1", "pca_component_2"]].corr()


                 pca_component_1  pca_component_2
pca_component_1         1.000000        -0.462607
pca_component_2        -0.462607         1.000000


Indeed there is: at this point the two components show a moderate negative correlation of about -0.46. Thus, we can conclude that keeping only the first principal component is likely a poor choice.

## Conclusion
Exploratory analysis and visualization can be rewarding in many situations to interpret and understand data. It can uncover insights that lead to better modeling, discover weaknesses, lack of information or reveal potential applications. In predictive maintenance, it can be very useful to get an understanding of how well the data describes the degradation phenomenon. In this case, there were clear relationships between many of the sensor measurements and the RUL, but this is not always the case. Sometimes extensive feature extraction is necessary and other times the data acquisition itself needs to be improved due to the lack of predictive power in the data.