

### Original Features:
1. **Type**: A categorical variable identifying the type of machine or component being monitored; in the sample rows it takes the values `L`, `M`, and `H`. Different types may have different failure characteristics, which could affect the likelihood of failure.

2. **Air temperature [K]**: This is the ambient temperature around the machine, measured in Kelvin. Higher or lower temperatures may affect machine performance or indicate different operational conditions, impacting failure probabilities.

3. **Process temperature [K]**: This is the internal operating temperature of the process within the machine, also in Kelvin. Elevated process temperatures may indicate higher stress levels on components, which can increase the likelihood of failure.

4. **Rotational speed [rpm]**: This represents the machine’s speed of rotation in revolutions per minute. Machines operating at higher speeds may be more prone to wear, especially if they exceed recommended operating ranges.

5. **Torque [Nm]**: The torque applied to the machine, measured in Newton-meters. Torque indicates the rotational force on the machine; excessive torque may lead to mechanical stress and eventual failure.

6. **Tool wear [min]**: This records the cumulative time that a tool has been in use (in minutes). Greater wear may indicate an increased probability of failure or need for replacement, especially if the tool is near the end of its service life.

### Derived (Engineered) Features:
To improve predictive accuracy, we derived three additional features from the original columns:

1. **Temperature_diff**: Calculated as the difference between the process temperature and air temperature. This feature can capture the operational stress on the machine in terms of the heat generated internally relative to ambient conditions. A high difference may indicate overheating, which could lead to failure.

2. **Torque_per_rpm**: This feature is the ratio of torque to rotational speed. It provides insight into the load placed on the machine per unit of rotation. High torque per rpm may signal strain on the machine that could increase the probability of failure.

3. **Tool_wear_rate**: Calculated as the tool wear time divided by the rotational speed plus one; the `+ 1` guards against division by zero. Despite its name, this is not a rate over elapsed time, because the columns used here include no process-time or operating-hours measurement. It is a proxy that relates accumulated tool wear to the current operating speed.
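As a quick sanity check, the three formulas can be worked through by hand for a single record. The sketch below uses plain Python and the first row of the dataset preview (air temperature 298.1 K, process temperature 308.6 K, 1551 rpm, 42.8 Nm, 0 min of tool wear). The helper name `engineered_features` is hypothetical and only mirrors the column arithmetic used in the notebook; `Tool_wear_rate` follows the notebook code, which divides by rotational speed plus one.

```python
# Values copied from the first row of the dataset preview.
row = {
    "Air temperature [K]": 298.1,
    "Process temperature [K]": 308.6,
    "Rotational speed [rpm]": 1551,
    "Torque [Nm]": 42.8,
    "Tool wear [min]": 0,
}


def engineered_features(record):
    """Return the three derived features for one record (hypothetical helper)."""
    rpm = record["Rotational speed [rpm]"]
    return {
        # Heat generated by the process relative to ambient conditions
        "Temperature_diff": record["Process temperature [K]"] - record["Air temperature [K]"],
        # Load per unit of rotation
        "Torque_per_rpm": record["Torque [Nm]"] / rpm,
        # Wear relative to speed; the + 1 avoids division by zero
        "Tool_wear_rate": record["Tool wear [min]"] / (rpm + 1),
    }


features = engineered_features(row)
for name, value in features.items():
    print(f"{name}: {value:.6f}")
```

For this row the temperature difference is about 10.5 K and the wear rate is 0, since the tool has not yet been used.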

### Prediction Goal:
The model uses these features to predict the binary `Target` column, which is `1` when a machine failure occurs and `0` otherwise. By incorporating both raw and derived features, we aim to capture not only the current state of the machine but also the operational and environmental stress factors that may influence failure.


In [12]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")
data = pd.read_csv(r"predictive_maintenance.csv")
data.head()

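|   | UDI | Product ID | Type | Air temperature [K] | Process temperature [K] | Rotational speed [rpm] | Torque [Nm] | Tool wear [min] | Target | Failure Type |
|---|-----|------------|------|---------------------|-------------------------|------------------------|-------------|-----------------|--------|--------------|
| 0 | 1 | M14860 | M | 298.1 | 308.6 | 1551 | 42.8 | 0 | 0 | No Failure |
| 1 | 2 | L47181 | L | 298.2 | 308.7 | 1408 | 46.3 | 3 | 0 | No Failure |
| 2 | 3 | L47182 | L | 298.1 | 308.5 | 1498 | 49.4 | 5 | 0 | No Failure |
| 3 | 4 | L47183 | L | 298.2 | 308.6 | 1433 | 39.5 | 7 | 0 | No Failure |
| 4 | 5 | L47184 | L | 298.2 | 308.7 | 1408 | 40.0 | 9 | 0 | No Failure |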


In [13]:

# Drop the identifier columns, which carry no predictive signal, and
# "Failure Type", which describes the outcome and would leak the target
data = data.drop(columns=["UDI", "Product ID", "Failure Type"])

In [16]:
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Feature engineering: create new features
data['Temperature_diff'] = data['Process temperature [K]'] - data['Air temperature [K]']  # Temperature difference
data['Torque_per_rpm'] = data['Torque [Nm]'] / data['Rotational speed [rpm]']  # Torque per rpm
data['Tool_wear_rate'] = data['Tool wear [min]'] / (data['Rotational speed [rpm]'] + 1)  # Tool wear relative to speed; + 1 avoids division by zero

# Prepare features and target variable
X = data.drop(columns=['Target'])  # Features
y = data['Target']                 # Target (1 = failure, 0 = no failure)


# Identify categorical and numerical columns
categorical_cols = X.select_dtypes(include=['object']).columns
numerical_cols = X.select_dtypes(exclude=['object']).columns

# Preprocessing for numerical and categorical columns
preprocessor = ColumnTransformer(
    transformers=[
        ('num', make_pipeline(SimpleImputer(strategy='mean'), StandardScaler()), numerical_cols),
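        # handle_unknown='ignore' keeps prediction from failing on a category not seen during training
        ('cat', make_pipeline(SimpleImputer(strategy='most_frequent'), OneHotEncoder(handle_unknown='ignore')), categorical_cols)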
    ]
)

# Create a logistic regression model pipeline with preprocessing
model = make_pipeline(preprocessor, LogisticRegression(max_iter=500))

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model
model.fit(X_train, y_train)

# Predict on the test set
y_pred = model.predict(X_test)

# Evaluate the model
print("Classification Report:\n", classification_report(y_test, y_pred))
print("Accuracy Score:", accuracy_score(y_test, y_pred))



Classification Report:
               precision    recall  f1-score   support

           0       0.98      1.00      0.99      1939
           1       0.65      0.28      0.39        61

    accuracy                           0.97      2000
   macro avg       0.82      0.64      0.69      2000
weighted avg       0.97      0.97      0.97      2000

Accuracy Score: 0.9735
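The accuracy figure is easier to interpret next to a trivial baseline. The short calculation below uses only the support counts and the class-1 recall from the report above; it is a back-of-the-envelope sketch, not a re-evaluation of the model, and the detected-failure count is rounded from the two-decimal recall.

```python
# Support counts taken from the classification report.
n_no_failure = 1939
n_failure = 61
n_total = n_no_failure + n_failure

# A trivial model that always predicts "no failure" gets every class-0 row right.
baseline_accuracy = n_no_failure / n_total

# A recall of 0.28 on 61 failures corresponds to roughly 17 detected failures.
detected = round(0.28 * n_failure)
missed = n_failure - detected

print(f"baseline accuracy: {baseline_accuracy:.4f}")
print(f"failures detected: {detected}, missed: {missed}")
```

With only 61 failures in 2000 test rows, always predicting "no failure" already reaches an accuracy of 0.9695, so the reported 0.9735 is a small gain. The class-1 recall and precision are the more informative numbers for this task.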


In [17]:
X_train.columns

Index(['Type', 'Air temperature [K]', 'Process temperature [K]',
       'Rotational speed [rpm]', 'Torque [Nm]', 'Tool wear [min]',
       'Temperature_diff', 'Torque_per_rpm', 'Tool_wear_rate'],
      dtype='object')

In [20]:
from sklearn.metrics import classification_report, roc_auc_score

# Predict the probability of failure (class 1) and the hard 0/1 label
y_pred_proba = model.predict_proba(X_test)[:, 1]
y_pred = model.predict(X_test)

# Evaluate the model
print("ROC AUC Score:", roc_auc_score(y_test, y_pred_proba))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

# Collect the predictions in a copy of X_test so the test set itself is not modified
results = X_test.copy()
results['Failure Probability'] = y_pred_proba
results['Predicted Failure'] = y_pred

# Display results
display(results.head(10))

ROC AUC Score: 0.9112352995882617

Classification Report:
               precision    recall  f1-score   support

           0       0.98      1.00      0.99      1939
           1       0.65      0.28      0.39        61

    accuracy                           0.97      2000
   macro avg       0.82      0.64      0.69      2000
weighted avg       0.97      0.97      0.97      2000



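|      | Type | Air temperature [K] | Process temperature [K] | Rotational speed [rpm] | Torque [Nm] | Tool wear [min] | Temperature_diff | Torque_per_rpm | Tool_wear_rate | Failure Probability | Predicted Failure |
|------|------|---------------------|-------------------------|------------------------|-------------|-----------------|------------------|----------------|----------------|---------------------|-------------------|
| 6252 | L | 300.8 | 310.3 | 1538 | 36.1 | 198 | 9.5 | 0.023472 | 0.128655 | 0.014062 | 0 |
| 4684 | M | 303.6 | 311.8 | 1421 | 44.8 | 101 | 8.2 | 0.031527 | 0.071027 | 0.03054 | 0 |
| 1731 | M | 298.3 | 307.9 | 1485 | 42.0 | 117 | 9.6 | 0.028283 | 0.078735 | 0.006707 | 0 |
| 4742 | L | 303.3 | 311.3 | 1592 | 33.7 | 14 | 8.0 | 0.021168 | 0.008788 | 0.003626 | 0 |
| 4521 | L | 302.4 | 310.4 | 1865 | 23.9 | 129 | 8.0 | 0.012815 | 0.069132 | 0.024163 | 0 |
| 6340 | H | 300.5 | 309.9 | 1397 | 45.9 | 210 | 9.4 | 0.032856 | 0.150215 | 0.03909 | 0 |
| 576 | H | 297.7 | 309.7 | 1440 | 51.1 | 191 | 12.0 | 0.035486 | 0.132547 | 0.011649 | 0 |
| 5202 | L | 303.7 | 312.7 | 1335 | 51.1 | 161 | 9.0 | 0.038277 | 0.120509 | 0.165302 | 0 |
| 6363 | M | 300.0 | 309.6 | 1618 | 36.2 | 53 | 9.6 | 0.022373 | 0.032736 | 0.00205 | 0 |
| 439 | M | 297.4 | 308.3 | 1535 | 34.6 | 51 | 10.9 | 0.022541 | 0.033203 | 0.000312 | 0 |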
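The `Predicted Failure` column comes from `model.predict`, which for a binary logistic regression labels a row as a failure only when its predicted probability exceeds 0.5. Because failures are rare, one option worth exploring is applying a lower cutoff to the `Failure Probability` column, trading precision for recall. The sketch below illustrates that trade-off in plain Python. The helper `precision_recall_at` and the toy labels and probabilities are hypothetical and are not taken from the dataset.

```python
def precision_recall_at(y_true, proba, threshold):
    """Precision and recall for class 1 when predicting 1 if proba >= threshold."""
    y_hat = [1 if p >= threshold else 0 for p in proba]
    tp = sum(1 for t, h in zip(y_true, y_hat) if t == 1 and h == 1)
    fp = sum(1 for t, h in zip(y_true, y_hat) if t == 0 and h == 1)
    fn = sum(1 for t, h in zip(y_true, y_hat) if t == 1 and h == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy labels and probabilities, chosen only for illustration.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
proba = [0.01, 0.03, 0.07, 0.15, 0.22, 0.40, 0.18, 0.35, 0.55, 0.90]

for threshold in (0.5, 0.3, 0.15):
    precision, recall = precision_recall_at(y_true, proba, threshold)
    print(f"threshold={threshold:.2f}  precision={precision:.2f}  recall={recall:.2f}")
```

On the toy data, lowering the threshold raises recall while precision falls. Any threshold for the real model would need to be chosen on validation data, not on the test set.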


In [21]:
import pickle

# Assuming 'model' is your trained pipeline (e.g., with preprocessing and logistic regression)
with open("failure_prediction_model.pkl", "wb") as f:
    pickle.dump(model, f)

print("Model saved successfully!")


Model saved successfully!
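Reloading the saved file is the mirror image of the cell above. The sketch below defines a small helper, `load_model` (a hypothetical name), and demonstrates the round trip with a stand-in dictionary written to a temporary file, so that it runs without the trained pipeline or the dataset.

```python
import pickle
import tempfile
from pathlib import Path


def load_model(path):
    """Load a pickled object from disk (hypothetical helper).

    Only unpickle files from a trusted source: loading a pickle can
    execute arbitrary code.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


# Round-trip demo with a stand-in object instead of the trained pipeline.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.pkl"
    with open(demo_path, "wb") as f:
        pickle.dump({"stand_in": "model"}, f)
    restored = load_model(demo_path)

print(restored)
```

With the file written above, the call would be `load_model("failure_prediction_model.pkl")`. The returned pipeline expects a DataFrame with the same nine columns listed by `X_train.columns`, including the three engineered features, and should be loaded with a scikit-learn version compatible with the one used for training.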
