# Predictive Maintenance for Centrifugal Pumps
## Project Description
This project focuses on developing a Predictive Maintenance System for Centrifugal Pumps used in chemical industries. By leveraging machine learning algorithms and sensor data, the goal is to predict potential failures before they occur, optimizing maintenance schedules, minimizing downtime, and reducing operational costs.

## Dataset Parameters
The dataset simulates operational data collected from centrifugal pumps. Key parameters include:

- **Air Temperature [K]:** Ambient temperature near the equipment.
- **Process Temperature [K]:** Temperature of the fluid being pumped.
- **Rotational Speed [rpm]:** Speed of the pump impeller.
- **Torque [Nm]:** Motor torque applied to drive the pump.
- **Tool Wear [min]:** Cumulative wear of critical components like bearings and impellers.
- **Target:** Binary indicator of pump failure (1 = failure, 0 = no failure).
- **Failure Type:** Categorized into:
  - No Failure
  - Power Failure
  - Tool Wear Failure
  - Overstrain Failure
  - Random Failures
  - Heat Dissipation Failure

## What Are Centrifugal Pumps?
Centrifugal pumps are mechanical devices designed to move fluids by converting rotational kinetic energy from a motor into hydrodynamic energy. They operate based on centrifugal force, where the rotation of an impeller increases the fluid's velocity and pressure.

## Uses in Chemical Industries
- **Fluid Transfer:** Transporting chemicals, solvents, and process liquids across different units.
- **Reaction Processes:** Circulating reactants in chemical reactors.
- **Cooling Systems:** Pumping cooling water in heat exchangers.
- **Filtration Systems:** Driving fluids through filtration units.

Centrifugal pumps are indispensable in chemical manufacturing, ensuring smooth and efficient operations.

## Predictive Maintenance
Predictive maintenance is a proactive strategy that uses data analysis tools and techniques to identify potential equipment failures before they occur. Unlike reactive or preventive maintenance, it optimizes maintenance schedules by predicting the actual condition of equipment.

## How It Works:
- **Data Collection:** Sensors monitor critical parameters like speed, temperature, and torque.
- **Data Analysis:** Historical data is analyzed to identify patterns and anomalies.
- **Machine Learning Models:** Algorithms predict the likelihood of failures based on sensor data.
- **Actionable Insights:** Maintenance teams are alerted to repair or replace components proactively.

## Why Is It Crucial and Beneficial?
- **Reduces Downtime:** Minimizes unexpected breakdowns.
- **Cost-Effective:** Prevents over-maintenance and reduces repair costs.
- **Improves Safety:** Avoids catastrophic failures that could endanger workers or the environment.
- **Enhances Efficiency:** Ensures optimal equipment performance.

## Current Industry Practices
Industries are increasingly adopting machine learning for predictive maintenance. Tools like anomaly detection, time-series forecasting, and classification models are integrated with IoT-enabled systems to monitor and maintain equipment health.

- **Case Studies:** Companies like GE and Siemens deploy AI-driven predictive maintenance solutions for pumps and compressors.
- **Real-Time Monitoring:** Systems continuously monitor sensor data and predict failures using cloud-based platforms.
- **Scalable Solutions:** Machine learning models adapt to different equipment and environments.




## Approach to the Problem

This is a classification problem: given a set of operating conditions and sensor readings, we predict whether a centrifugal pump is going to fail. Because we need to predict both whether a failure occurs and which type of failure it is, we will develop a two-step model:

1. The first model predicts failure or no failure.
2. The second model determines what type of failure it is.
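The two-step flow can be sketched as a small function. This is only an illustration of the control flow, not the models trained later in this notebook: `two_stage_predict` and the two stub classes are hypothetical names, the stubs stand in for trained classifiers (any object with a scikit-learn style `predict` method would do), and the cutoffs inside them are made up so the example runs.

```python
class StubFailureDetector:
    """Stand-in for the first model: returns 1 (failure) or 0 (no failure)."""

    def predict(self, rows):
        # Hypothetical rule used only so the sketch runs: flag torque above 60 Nm.
        return [1 if row[0] > 60.0 else 0 for row in rows]


class StubFailureTypeClassifier:
    """Stand-in for the second model: returns a failure-type label."""

    def predict(self, rows):
        # Hypothetical rule: blame tool wear when wear exceeds 200 min.
        return [
            "Tool Wear Failure" if row[1] > 200 else "Overstrain Failure"
            for row in rows
        ]


def two_stage_predict(detector, type_classifier, row):
    """Run step 1; only if it predicts a failure, run step 2."""
    if detector.predict([row])[0] == 0:
        return "No Failure"
    return type_classifier.predict([row])[0]


# Each toy row is (torque in Nm, tool wear in min).
detector = StubFailureDetector()
type_classifier = StubFailureTypeClassifier()
healthy = two_stage_predict(detector, type_classifier, (40.0, 50))
worn = two_stage_predict(detector, type_classifier, (65.0, 210))
print(healthy)  # No Failure
print(worn)     # Tool Wear Failure
```

The second model is only consulted when the first one raises a failure, so its errors never affect rows the first model considers healthy.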


In [2]:
import numpy as np 
import pandas as pd
import plotly.express as px
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")


sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (10, 8)
matplotlib.rcParams['figure.facecolor'] = '#00000000'

In [3]:
df = pd.read_csv('predictive_maintenance.csv')
df

Unnamed: 0,UDI,Product ID,Type,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Target,Failure Type
0,1,M14860,M,298.1,308.6,1551,42.8,0,0,No Failure
1,2,L47181,L,298.2,308.7,1408,46.3,3,0,No Failure
2,3,L47182,L,298.1,308.5,1498,49.4,5,0,No Failure
3,4,L47183,L,298.2,308.6,1433,39.5,7,0,No Failure
4,5,L47184,L,298.2,308.7,1408,40.0,9,0,No Failure
...,...,...,...,...,...,...,...,...,...,...
9995,9996,M24855,M,298.8,308.4,1604,29.5,14,0,No Failure
9996,9997,H39410,H,298.9,308.4,1632,31.8,17,0,No Failure
9997,9998,M24857,M,299.0,308.6,1645,33.4,22,0,No Failure
9998,9999,H39412,H,299.0,308.7,1408,48.5,25,0,No Failure


In [4]:
df_copy = df.copy()

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 10 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   UDI                      10000 non-null  int64  
 1   Product ID               10000 non-null  object 
 2   Type                     10000 non-null  object 
 3   Air temperature [K]      10000 non-null  float64
 4   Process temperature [K]  10000 non-null  float64
 5   Rotational speed [rpm]   10000 non-null  int64  
 6   Torque [Nm]              10000 non-null  float64
 7   Tool wear [min]          10000 non-null  int64  
 8   Target                   10000 non-null  int64  
 9   Failure Type             10000 non-null  object 
dtypes: float64(3), int64(4), object(3)
memory usage: 781.4+ KB


1. There are no null values in any column.
2. The columns use three data types: float64, int64, and object.

In [6]:
df.describe()

Unnamed: 0,UDI,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Target
count,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0
mean,5000.5,300.00493,310.00556,1538.7761,39.98691,107.951,0.0339
std,2886.89568,2.000259,1.483734,179.284096,9.968934,63.654147,0.180981
min,1.0,295.3,305.7,1168.0,3.8,0.0,0.0
25%,2500.75,298.3,308.8,1423.0,33.2,53.0,0.0
50%,5000.5,300.1,310.1,1503.0,40.1,108.0,0.0
75%,7500.25,301.5,311.1,1612.0,46.8,162.0,0.0
max,10000.0,304.5,313.8,2886.0,76.6,253.0,1.0


1. The data looks good, with reasonable minimum and maximum values.
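The mean of Target in the summary above is also the share of rows labeled as failures, which is worth turning into counts before modeling. The sketch below does that arithmetic using only the numbers printed by `df.describe()`; the variable names are illustrative.

```python
# Values read from the df.describe() output above.
n_rows = 10000
target_mean = 0.0339  # mean of a 0/1 column = share of rows equal to 1

n_failures = round(n_rows * target_mean)
n_normal = n_rows - n_failures

# Accuracy of a trivial model that always answers "no failure".
baseline_accuracy = n_normal / n_rows

print(n_failures)         # 339
print(n_normal)           # 9661
print(baseline_accuracy)  # 0.9661
```

Any accuracy reported later should therefore be read against a baseline of about 0.966, not against 0.5.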

## Exploratory Analysis and Visualization

In [7]:
fig = px.histogram(df,x='Air temperature [K]',marginal='box',nbins = 100,title='Distribution of Air temperature [K]')
fig.update_layout(bargap=0.2)
fig.show()

In [8]:
fig = px.histogram(df,x='Process temperature [K]',marginal='box',nbins = 100,title='Distribution of Process temperature [K]',color_discrete_sequence=['green'])
fig.update_layout(bargap=0.2)
fig.show()

From the plots above, we can observe the range of temperatures [K] recorded by the sensors, including the maximum values.

In [9]:
# Cast Target to string so Plotly treats it as a discrete category
# (two fixed colors instead of a continuous color scale).
df['Target'] = df['Target'].astype(str)
fig = px.scatter(df, 
                 x='Rotational speed [rpm]', 
                 y='Torque [Nm]', 
                 opacity=1, 
                 title='Rotational speed [rpm] VS Torque [Nm]', 
                 color='Target',
                 color_discrete_sequence=['blue','red'])


fig.update_traces(marker=dict(size=2))


fig.update_layout(width=1200, height=700)

fig.show()


In [10]:
df['Target'].unique()

array(['0', '1'], dtype=object)

In [11]:
# Target is kept as a string so Plotly colors it as a discrete category
df['Target'] = df['Target'].astype(str)
fig = px.scatter(df, 
                 x='Process temperature [K]', 
                 y='Torque [Nm]', 
                 opacity=1, 
                 title='Process temperature [K] VS Torque', 
                 color='Target',
                 color_discrete_sequence=['blue','red'])


fig.update_traces(marker=dict(size=2))


fig.update_layout(width=1000, height=700)

fig.show()


In [12]:
# Target is kept as a string so Plotly colors it as a discrete category
df['Target'] = df['Target'].astype(str)
fig = px.scatter(df, 
                 x='Air temperature [K]', 
                 y='Torque [Nm]', 
                 opacity=1, 
                 title='Air temperature [K] VS Torque', 
                 color='Target',
                 color_discrete_sequence=['blue','red'])


fig.update_traces(marker=dict(size=2))


fig.update_layout(width=1000, height=700)

fig.show()


In [13]:
# Color by failure type to see where each failure mode sits in the speed/torque plane.
# Target is not used in this plot, so no type conversion is needed here.
fig = px.scatter(df, 
                 x='Rotational speed [rpm]', 
                 y='Torque [Nm]', 
                 opacity=1, 
                 title='Rotational speed [rpm] VS Torque', 
                 color='Failure Type',
                 )

# Set marker size
fig.update_traces(marker=dict(size=2))

# Set figure size
fig.update_layout(width=1000, height=700)

fig.show()


## Feature Engineering


In [14]:
df

Unnamed: 0,UDI,Product ID,Type,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Target,Failure Type
0,1,M14860,M,298.1,308.6,1551,42.8,0,0,No Failure
1,2,L47181,L,298.2,308.7,1408,46.3,3,0,No Failure
2,3,L47182,L,298.1,308.5,1498,49.4,5,0,No Failure
3,4,L47183,L,298.2,308.6,1433,39.5,7,0,No Failure
4,5,L47184,L,298.2,308.7,1408,40.0,9,0,No Failure
...,...,...,...,...,...,...,...,...,...,...
9995,9996,M24855,M,298.8,308.4,1604,29.5,14,0,No Failure
9996,9997,H39410,H,298.9,308.4,1632,31.8,17,0,No Failure
9997,9998,M24857,M,299.0,308.6,1645,33.4,22,0,No Failure
9998,9999,H39412,H,299.0,308.7,1408,48.5,25,0,No Failure


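Because the first model is trained on the Target column, we will drop the Failure Type column before training; keeping it as a feature would leak the label. Before that, we add rolling-window and trend features.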

In [15]:
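# Rolling statistics of rotational speed over the last 10 rows (row order is treated as time order).
# min_periods=1 produces a value from the first row on; std and var are NaN there
# because a single sample has no spread.
df['rolling_mean_rpm'] = df['Rotational speed [rpm]'].rolling(window=10,min_periods=1).mean()
df['rolling_std_rpm'] = df['Rotational speed [rpm]'].rolling(window=10,min_periods=1).std()
df['rolling_var_rpm'] = df['Rotational speed [rpm]'].rolling(window=10,min_periods=1).var()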
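To see what these rolling features contain, here is a small self-contained sketch that applies the same `rolling(window=10, min_periods=1)` call to the first five rotational-speed readings shown in the dataset preview. The Series and the `features` frame are built only for this illustration.

```python
import pandas as pd

# First five rotational-speed readings from the dataset preview above.
rpm = pd.Series([1551, 1408, 1498, 1433, 1408], name="Rotational speed [rpm]")

# window=10 looks back over up to 10 rows; min_periods=1 lets the window
# produce a value even when fewer than 10 rows are available.
window = rpm.rolling(window=10, min_periods=1)

features = pd.DataFrame({
    "rolling_mean_rpm": window.mean(),
    "rolling_std_rpm": window.std(),  # sample std (ddof=1): NaN for a single row
    "rolling_var_rpm": window.var(),
})
print(features.round(3))
```

With fewer than 10 rows available, each value is computed over all rows seen so far, which is why the second row's mean is simply the average of the first two readings (1479.5).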


In [16]:
df

Unnamed: 0,UDI,Product ID,Type,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Target,Failure Type,rolling_mean_rpm,rolling_std_rpm,rolling_var_rpm
0,1,M14860,M,298.1,308.6,1551,42.8,0,0,No Failure,1551.000000,,
1,2,L47181,L,298.2,308.7,1408,46.3,3,0,No Failure,1479.500000,101.116270,10224.500000
2,3,L47182,L,298.1,308.5,1498,49.4,5,0,No Failure,1485.666667,72.293384,5226.333333
3,4,L47183,L,298.2,308.6,1433,39.5,7,0,No Failure,1472.500000,64.634872,4177.666667
4,5,L47184,L,298.2,308.7,1408,40.0,9,0,No Failure,1459.600000,62.970628,3965.300000
...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,9996,M24855,M,298.8,308.4,1604,29.5,14,0,No Failure,1583.200000,131.944264,17409.288889
9996,9997,H39410,H,298.9,308.4,1632,31.8,17,0,No Failure,1595.700000,129.827278,16855.122222
9997,9998,M24857,M,299.0,308.6,1645,33.4,22,0,No Failure,1610.200000,125.991887,15873.955556
9998,9999,H39412,H,299.0,308.7,1408,48.5,25,0,No Failure,1573.900000,126.805582,16079.655556


In [17]:
# Row-to-row change in torque and process temperature (NaN for the first row)
df['torque_trend'] = df['Torque [Nm]'].diff()
df['temperature_trend'] = df['Process temperature [K]'].diff()


In [18]:
df

Unnamed: 0,UDI,Product ID,Type,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Target,Failure Type,rolling_mean_rpm,rolling_std_rpm,rolling_var_rpm,torque_trend,temperature_trend
0,1,M14860,M,298.1,308.6,1551,42.8,0,0,No Failure,1551.000000,,,,
1,2,L47181,L,298.2,308.7,1408,46.3,3,0,No Failure,1479.500000,101.116270,10224.500000,3.5,0.1
2,3,L47182,L,298.1,308.5,1498,49.4,5,0,No Failure,1485.666667,72.293384,5226.333333,3.1,-0.2
3,4,L47183,L,298.2,308.6,1433,39.5,7,0,No Failure,1472.500000,64.634872,4177.666667,-9.9,0.1
4,5,L47184,L,298.2,308.7,1408,40.0,9,0,No Failure,1459.600000,62.970628,3965.300000,0.5,0.1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,9996,M24855,M,298.8,308.4,1604,29.5,14,0,No Failure,1583.200000,131.944264,17409.288889,1.6,0.1
9996,9997,H39410,H,298.9,308.4,1632,31.8,17,0,No Failure,1595.700000,129.827278,16855.122222,2.3,0.0
9997,9998,M24857,M,299.0,308.6,1645,33.4,22,0,No Failure,1610.200000,125.991887,15873.955556,1.6,0.2
9998,9999,H39412,H,299.0,308.7,1408,48.5,25,0,No Failure,1573.900000,126.805582,16079.655556,15.1,0.1


In [19]:
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
df['Failure Type Encoded'] = le.fit_transform(df['Failure Type'])
dict(zip(le.classes_, le.transform(le.classes_)))



{'Heat Dissipation Failure': np.int64(0),
 'No Failure': np.int64(1),
 'Overstrain Failure': np.int64(2),
 'Power Failure': np.int64(3),
 'Random Failures': np.int64(4),
 'Tool Wear Failure': np.int64(5)}
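The mapping above follows alphabetical order because `LabelEncoder` stores its classes sorted. As a minimal sketch (with the six failure types typed in by hand rather than read from the dataset), the same encoder can also translate codes back to names with `inverse_transform`, which avoids maintaining a separate lookup dictionary as long as the fitted encoder is kept around.

```python
from sklearn.preprocessing import LabelEncoder

failure_types = [
    "No Failure", "Power Failure", "Tool Wear Failure",
    "Overstrain Failure", "Random Failures", "Heat Dissipation Failure",
]

encoder = LabelEncoder()
encoder.fit(failure_types)

# Classes are stored in sorted order, so codes follow alphabetical order.
codes = encoder.transform(["No Failure", "Tool Wear Failure"])
names = encoder.inverse_transform([0, 3])

print(codes.tolist())  # [1, 5]
print(names.tolist())  # ['Heat Dissipation Failure', 'Power Failure']
```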

In [20]:
df = df.drop('Failure Type', axis=1)

In [21]:
df

Unnamed: 0,UDI,Product ID,Type,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Target,rolling_mean_rpm,rolling_std_rpm,rolling_var_rpm,torque_trend,temperature_trend,Failure Type Encoded
0,1,M14860,M,298.1,308.6,1551,42.8,0,0,1551.000000,,,,,1
1,2,L47181,L,298.2,308.7,1408,46.3,3,0,1479.500000,101.116270,10224.500000,3.5,0.1,1
2,3,L47182,L,298.1,308.5,1498,49.4,5,0,1485.666667,72.293384,5226.333333,3.1,-0.2,1
3,4,L47183,L,298.2,308.6,1433,39.5,7,0,1472.500000,64.634872,4177.666667,-9.9,0.1,1
4,5,L47184,L,298.2,308.7,1408,40.0,9,0,1459.600000,62.970628,3965.300000,0.5,0.1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,9996,M24855,M,298.8,308.4,1604,29.5,14,0,1583.200000,131.944264,17409.288889,1.6,0.1,1
9996,9997,H39410,H,298.9,308.4,1632,31.8,17,0,1595.700000,129.827278,16855.122222,2.3,0.0,1
9997,9998,M24857,M,299.0,308.6,1645,33.4,22,0,1610.200000,125.991887,15873.955556,1.6,0.2,1
9998,9999,H39412,H,299.0,308.7,1408,48.5,25,0,1573.900000,126.805582,16079.655556,15.1,0.1,1


In [22]:
df = df.drop(columns=['UDI', 'Product ID'])


In [23]:
df

Unnamed: 0,Type,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Target,rolling_mean_rpm,rolling_std_rpm,rolling_var_rpm,torque_trend,temperature_trend,Failure Type Encoded
0,M,298.1,308.6,1551,42.8,0,0,1551.000000,,,,,1
1,L,298.2,308.7,1408,46.3,3,0,1479.500000,101.116270,10224.500000,3.5,0.1,1
2,L,298.1,308.5,1498,49.4,5,0,1485.666667,72.293384,5226.333333,3.1,-0.2,1
3,L,298.2,308.6,1433,39.5,7,0,1472.500000,64.634872,4177.666667,-9.9,0.1,1
4,L,298.2,308.7,1408,40.0,9,0,1459.600000,62.970628,3965.300000,0.5,0.1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,M,298.8,308.4,1604,29.5,14,0,1583.200000,131.944264,17409.288889,1.6,0.1,1
9996,H,298.9,308.4,1632,31.8,17,0,1595.700000,129.827278,16855.122222,2.3,0.0,1
9997,M,299.0,308.6,1645,33.4,22,0,1610.200000,125.991887,15873.955556,1.6,0.2,1
9998,H,299.0,308.7,1408,48.5,25,0,1573.900000,126.805582,16079.655556,15.1,0.1,1


In [24]:
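# Encode the product quality type (H, L, M) as integers.
# A separate encoder name is used so the failure-type encoder `le` is not overwritten.
type_encoder = LabelEncoder()
df['Quality Type Encoded'] = type_encoder.fit_transform(df['Type'])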


In [25]:
df_1 = df.copy()

In [26]:
df

Unnamed: 0,Type,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Target,rolling_mean_rpm,rolling_std_rpm,rolling_var_rpm,torque_trend,temperature_trend,Failure Type Encoded,Quality Type Encoded
0,M,298.1,308.6,1551,42.8,0,0,1551.000000,,,,,1,2
1,L,298.2,308.7,1408,46.3,3,0,1479.500000,101.116270,10224.500000,3.5,0.1,1,1
2,L,298.1,308.5,1498,49.4,5,0,1485.666667,72.293384,5226.333333,3.1,-0.2,1,1
3,L,298.2,308.6,1433,39.5,7,0,1472.500000,64.634872,4177.666667,-9.9,0.1,1,1
4,L,298.2,308.7,1408,40.0,9,0,1459.600000,62.970628,3965.300000,0.5,0.1,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,M,298.8,308.4,1604,29.5,14,0,1583.200000,131.944264,17409.288889,1.6,0.1,1,2
9996,H,298.9,308.4,1632,31.8,17,0,1595.700000,129.827278,16855.122222,2.3,0.0,1,0
9997,M,299.0,308.6,1645,33.4,22,0,1610.200000,125.991887,15873.955556,1.6,0.2,1,2
9998,H,299.0,308.7,1408,48.5,25,0,1573.900000,126.805582,16079.655556,15.1,0.1,1,0


In [27]:
df = df.drop('Type', axis =1)

In [28]:
df

Unnamed: 0,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Target,rolling_mean_rpm,rolling_std_rpm,rolling_var_rpm,torque_trend,temperature_trend,Failure Type Encoded,Quality Type Encoded
0,298.1,308.6,1551,42.8,0,0,1551.000000,,,,,1,2
1,298.2,308.7,1408,46.3,3,0,1479.500000,101.116270,10224.500000,3.5,0.1,1,1
2,298.1,308.5,1498,49.4,5,0,1485.666667,72.293384,5226.333333,3.1,-0.2,1,1
3,298.2,308.6,1433,39.5,7,0,1472.500000,64.634872,4177.666667,-9.9,0.1,1,1
4,298.2,308.7,1408,40.0,9,0,1459.600000,62.970628,3965.300000,0.5,0.1,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,298.8,308.4,1604,29.5,14,0,1583.200000,131.944264,17409.288889,1.6,0.1,1,2
9996,298.9,308.4,1632,31.8,17,0,1595.700000,129.827278,16855.122222,2.3,0.0,1,0
9997,299.0,308.6,1645,33.4,22,0,1610.200000,125.991887,15873.955556,1.6,0.2,1,2
9998,299.0,308.7,1408,48.5,25,0,1573.900000,126.805582,16079.655556,15.1,0.1,1,0


In [29]:
# Drop the first row: its rolling std/var and trend features are NaN
df = df.drop(index=0).reset_index(drop=True)


In [30]:
df = df.drop('Failure Type Encoded', axis =1)

In [31]:
df

Unnamed: 0,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Target,rolling_mean_rpm,rolling_std_rpm,rolling_var_rpm,torque_trend,temperature_trend,Quality Type Encoded
0,298.2,308.7,1408,46.3,3,0,1479.500000,101.116270,10224.500000,3.5,0.1,1
1,298.1,308.5,1498,49.4,5,0,1485.666667,72.293384,5226.333333,3.1,-0.2,1
2,298.2,308.6,1433,39.5,7,0,1472.500000,64.634872,4177.666667,-9.9,0.1,1
3,298.2,308.7,1408,40.0,9,0,1459.600000,62.970628,3965.300000,0.5,0.1,1
4,298.1,308.6,1425,41.9,11,0,1453.833333,58.066915,3371.766667,1.9,-0.1,2
...,...,...,...,...,...,...,...,...,...,...,...,...
9994,298.8,308.4,1604,29.5,14,0,1583.200000,131.944264,17409.288889,1.6,0.1,2
9995,298.9,308.4,1632,31.8,17,0,1595.700000,129.827278,16855.122222,2.3,0.0,0
9996,299.0,308.6,1645,33.4,22,0,1610.200000,125.991887,15873.955556,1.6,0.2,2
9997,299.0,308.7,1408,48.5,25,0,1573.900000,126.805582,16079.655556,15.1,0.1,0


In [32]:
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Features are every column except the label
X = df.drop(columns=['Target'])
y = df['Target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# XGBoost rejects feature names containing [, ] or <, so strip those characters
X_train.columns = X_train.columns.str.replace(r"[\[\]<>]", "", regex=True)
X_test.columns = X_test.columns.str.replace(r"[\[\]<>]", "", regex=True)

# Target was cast to string for plotting; convert it back to integers
y_train = y_train.astype(int)
y_test = y_test.astype(int)

model = XGBClassifier(n_estimators=100, learning_rate=0.05)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))

print("\nClassification Report:")
print(classification_report(y_test, y_pred))

print("\nAccuracy Score:", accuracy_score(y_test, y_pred))


Confusion Matrix:
[[1930    6]
 [  27   37]]

Classification Report:
              precision    recall  f1-score   support

           0       0.99      1.00      0.99      1936
           1       0.86      0.58      0.69        64

    accuracy                           0.98      2000
   macro avg       0.92      0.79      0.84      2000
weighted avg       0.98      0.98      0.98      2000


Accuracy Score: 0.9835


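The overall accuracy is high, but the classes are heavily imbalanced: recall for the failure class is only 0.58, meaning 27 of the 64 failures in the test set were missed. Let's also give the model a hand-written input to see how it behaves on data it has not seen.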
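To make the report above easier to read, the per-class numbers for the failure class can be recomputed by hand from the confusion matrix. This is plain arithmetic on the four printed counts; the variable names are illustrative.

```python
# Confusion matrix printed above, laid out as [[TN, FP], [FN, TP]].
tn, fp = 1930, 6
fn, tp = 27, 37

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # of predicted failures, how many were real
recall = tp / (tp + fn)     # of real failures, how many were caught
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
# accuracy=0.9835 precision=0.86 recall=0.58 f1=0.69
```

Accuracy is dominated by the 1,930 correctly classified healthy rows, while recall shows how many actual failures slip through, which is usually the costlier error in maintenance.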

In [33]:
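# One hand-written row in training feature order: air temp, process temp, rpm,
# torque, tool wear, rolling mean/std/var of rpm, torque trend,
# temperature trend, quality type (encoded).
# Note: some values (e.g. process temperature 202.3 K, 3200 rpm) lie outside
# the ranges seen in the training data.
custom_data = np.array([[298.2, 202.3, 3200, 35.0, 2, 1300.5, 300, 5222, -0.1, 1, 1]])

custom_pred = model.predict(custom_data)

# This model predicts the binary Target (0 = no failure, 1 = failure)
print("Predicted Target (Failure or Not):", custom_pred[0])


Predicted Target (Failure or Not): 0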


Let's train a second model on the same task for comparison.

In [34]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score


X = df.drop(columns=['Target'])
y = df['Target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)


X_train.columns = X_train.columns.str.replace(r"[\[\]<>]", "", regex=True)
X_test.columns = X_test.columns.str.replace(r"[\[\]<>]", "", regex=True)


y_train = y_train.astype(int)
y_test = y_test.astype(int)


# Note: this rebinds `model`, so later cells that call model.predict use the random forest
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)


y_pred = model.predict(X_test)


print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))

print("\nClassification Report:")
print(classification_report(y_test, y_pred))

print("\nAccuracy Score:", accuracy_score(y_test, y_pred))


Confusion Matrix:
[[2416    3]
 [  42   39]]

Classification Report:
              precision    recall  f1-score   support

           0       0.98      1.00      0.99      2419
           1       0.93      0.48      0.63        81

    accuracy                           0.98      2500
   macro avg       0.96      0.74      0.81      2500
weighted avg       0.98      0.98      0.98      2500


Accuracy Score: 0.982
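Both splits above are drawn without stratification, so the number of failures that land in the test set depends on the random seed (64 of 2,000 in the first split, 81 of 2,500 in the second). One possible refinement, not used in this notebook, is to pass `stratify=y` to `train_test_split`. The sketch below shows the effect on synthetic labels with a similar failure rate; the arrays are placeholders, not the pump data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 1,000 rows, 34 failures (3.4%).
y = np.array([1] * 34 + [0] * 966)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder feature column

# stratify=y keeps the failure share (almost) equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

print(len(y_test), int(y_test.sum()))
print(round(float(y_train.mean()), 3), round(float(y_test.mean()), 3))
```

With a stratified split, metrics for the rare failure class are less sensitive to which rows happen to fall into the test set.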


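We have now trained two machine learning models, with accuracy scores of 0.9835 (XGBoost) and 0.982 (random forest). The two scores come from different test splits (20% and 25% of the data), so they are not strictly comparable, and both models miss a large share of actual failures (recall of 0.58 and 0.48 for the failure class). Our ultimate goal is to find the failure type as well, so we will train one more model: if the first model predicts a failure, the second predicts which type of failure it is.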

In [35]:
df_1

Unnamed: 0,Type,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Target,rolling_mean_rpm,rolling_std_rpm,rolling_var_rpm,torque_trend,temperature_trend,Failure Type Encoded,Quality Type Encoded
0,M,298.1,308.6,1551,42.8,0,0,1551.000000,,,,,1,2
1,L,298.2,308.7,1408,46.3,3,0,1479.500000,101.116270,10224.500000,3.5,0.1,1,1
2,L,298.1,308.5,1498,49.4,5,0,1485.666667,72.293384,5226.333333,3.1,-0.2,1,1
3,L,298.2,308.6,1433,39.5,7,0,1472.500000,64.634872,4177.666667,-9.9,0.1,1,1
4,L,298.2,308.7,1408,40.0,9,0,1459.600000,62.970628,3965.300000,0.5,0.1,1,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,M,298.8,308.4,1604,29.5,14,0,1583.200000,131.944264,17409.288889,1.6,0.1,1,2
9996,H,298.9,308.4,1632,31.8,17,0,1595.700000,129.827278,16855.122222,2.3,0.0,1,0
9997,M,299.0,308.6,1645,33.4,22,0,1610.200000,125.991887,15873.955556,1.6,0.2,1,2
9998,H,299.0,308.7,1408,48.5,25,0,1573.900000,126.805582,16079.655556,15.1,0.1,1,0


In [36]:
df_1 = df_1.drop(index=0).reset_index(drop=True)

In [37]:
df_1['Failure Type Encoded'].unique()

array([1, 3, 5, 2, 4, 0])

In [38]:
df_1.sample(30)

Unnamed: 0,Type,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Target,rolling_mean_rpm,rolling_std_rpm,rolling_var_rpm,torque_trend,temperature_trend,Failure Type Encoded,Quality Type Encoded
6477,L,300.5,309.8,1663,29.1,145,1,1533.6,147.153133,21654.044444,4.5,0.1,1,1
4111,H,301.9,310.5,1472,39.7,187,0,1541.7,177.636989,31554.9,-1.3,-0.1,1,0
9789,L,298.6,309.5,1483,43.2,92,0,1533.4,131.873003,17390.488889,14.8,-0.1,1,1
4139,L,301.7,310.2,1331,61.2,47,1,1526.7,176.893596,31291.344444,30.4,0.0,0,1
5891,L,301.4,311.1,1472,44.5,159,0,1500.5,161.799773,26179.166667,-8.8,0.1,1,1
1390,L,298.9,310.2,2737,8.8,142,1,1650.9,405.805221,164677.877778,-46.0,0.0,3,1
7701,M,300.7,311.7,1701,30.8,44,0,1501.1,196.503576,38613.655556,-7.9,0.0,1,2
2563,L,299.3,309.1,1493,38.0,151,0,1520.1,147.061174,21626.988889,0.8,0.0,1,1
2541,H,299.2,308.9,1712,26.4,95,0,1495.7,142.329079,20257.566667,-6.2,0.1,1,0
5529,M,302.3,311.8,1355,50.1,104,0,1470.0,93.260686,8697.555556,10.2,0.0,1,2


In [39]:
df_1 = df_1.drop('Type', axis =1)

In [40]:
df_1 = df_1.drop('Target', axis=1)

In [41]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score


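# Features exclude the label. Note that "No Failure" rows (class 1) are kept,
# so this model is trained on all rows, not only on failures.
X = df_1.drop(columns=['Failure Type Encoded'])
y = df_1['Failure Type Encoded']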

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)


X_train.columns = X_train.columns.str.replace(r"[\[\]<>]", "", regex=True)
X_test.columns = X_test.columns.str.replace(r"[\[\]<>]", "", regex=True)


y_train = y_train.astype(int)
y_test = y_test.astype(int)


failure_type_encoded_model = RandomForestClassifier(n_estimators=100, random_state=42)
failure_type_encoded_model.fit(X_train, y_train)


y_pred = failure_type_encoded_model.predict(X_test)


print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))

print("\nClassification Report:")
print(classification_report(y_test, y_pred))

print("\nAccuracy Score:", accuracy_score(y_test, y_pred))


Confusion Matrix:
[[  10   15    0    0    0    0]
 [   1 2415    0    0    0    0]
 [   0   13    7    1    0    0]
 [   0    5    0   16    0    0]
 [   0    4    0    0    0    0]
 [   0   12    1    0    0    0]]

Classification Report:
              precision    recall  f1-score   support

           0       0.91      0.40      0.56        25
           1       0.98      1.00      0.99      2416
           2       0.88      0.33      0.48        21
           3       0.94      0.76      0.84        21
           4       0.00      0.00      0.00         4
           5       0.00      0.00      0.00        13

    accuracy                           0.98      2500
   macro avg       0.62      0.42      0.48      2500
weighted avg       0.97      0.98      0.97      2500


Accuracy Score: 0.9792

The overall accuracy is again dominated by the No Failure class (class 1). On this test split the model does not correctly identify any Random Failures (class 4) or Tool Wear Failure (class 5) rows.


In [42]:
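# Hand-written row in training feature order: air temp, process temp, rpm,
# torque, tool wear, rolling mean/std/var of rpm, torque trend,
# temperature trend, quality type (encoded).
custom_data = np.array([[302.0,309.9, 38,57.6,197,1527.6,175.857392,30925.822222,30.4, 0.0,0]])

# `model` is the random forest trained above (it replaced the XGBoost model)
target_pred = model.predict(custom_data)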

print("Predicted Target (Failure or Not):", target_pred[0])


if target_pred[0] == 1:
   
    failure_type_pred = failure_type_encoded_model.predict(custom_data)

    
    failure_types_encoded = {
        0: "Heat Dissipation Failure",
        1: "No Failure",
        2: "Overstrain Failure",
        3: "Power Failure",
        4: "Random Failures",
        5: "Tool Wear Failure",
    }

    
    print("Predicted Failure Type:", failure_types_encoded.get(failure_type_pred[0], "Unknown Failure Type"))
else:
    print("No failure detected.")


Predicted Target (Failure or Not): 1
Predicted Failure Type: Heat Dissipation Failure


In [43]:
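# Second hand-written row, in the same training feature order as above
custom_data = np.array([[300.3,309.9,1394,46.7,210,1492.4,72.9216,5317.600000,-5.4, 0.0,0]])

# `model` is the random forest trained above (it replaced the XGBoost model)
target_pred = model.predict(custom_data)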

print("Predicted Target (Failure or Not):", target_pred[0])


if target_pred[0] == 1:
   
    failure_type_pred = failure_type_encoded_model.predict(custom_data)

    
    failure_types_encoded = {
        0: "Heat Dissipation Failure",
        1: "No Failure",
        2: "Overstrain Failure",
        3: "Power Failure",
        4: "Random Failures",
        5: "Tool Wear Failure",
    }

    
    print("Predicted Failure Type:", failure_types_encoded.get(failure_type_pred[0], "Unknown Failure Type"))
else:
    print("No failure detected.")


Predicted Target (Failure or Not): 1
Predicted Failure Type: Tool Wear Failure
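The hand-written arrays above rely on the values being typed in exactly the order of the training columns, which is easy to get wrong. A minimal sketch of a safer pattern is shown below: `FEATURE_ORDER` lists the training columns as they appear after the bracket-stripping step, and `make_feature_row` is a hypothetical helper (not part of the notebook) that builds the row from a dictionary of named values.

```python
# Column order the models were trained on (brackets stripped, as in the
# training cells). Positional arrays must follow exactly this order.
FEATURE_ORDER = [
    "Air temperature K", "Process temperature K", "Rotational speed rpm",
    "Torque Nm", "Tool wear min", "rolling_mean_rpm", "rolling_std_rpm",
    "rolling_var_rpm", "torque_trend", "temperature_trend",
    "Quality Type Encoded",
]


def make_feature_row(reading: dict) -> list:
    """Return the values of `reading` in training order, or raise on mismatch."""
    missing = [name for name in FEATURE_ORDER if name not in reading]
    extra = [name for name in reading if name not in FEATURE_ORDER]
    if missing or extra:
        raise ValueError(f"missing={missing}, unexpected={extra}")
    return [float(reading[name]) for name in FEATURE_ORDER]


# Same values as the second hand-written example above, given in arbitrary order.
reading = {
    "Torque Nm": 46.7,
    "Tool wear min": 210,
    "Air temperature K": 300.3,
    "Process temperature K": 309.9,
    "Rotational speed rpm": 1394,
    "rolling_mean_rpm": 1492.4,
    "rolling_std_rpm": 72.9216,
    "rolling_var_rpm": 5317.6,
    "torque_trend": -5.4,
    "temperature_trend": 0.0,
    "Quality Type Encoded": 0,
}
row = make_feature_row(reading)
print(row)
```

The resulting list can be wrapped as `np.array([row])` and passed to `model.predict` in place of the hand-typed array.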


## Real-Time Predictive Maintenance System: How It Works

In a real-time predictive maintenance system, the model continuously receives input from the sensors on the equipment (like pumps), processes the data, and compares it against the patterns it has learned during training. Here's a step-by-step breakdown of how it works:

### 1. Real-Time Data Collection:
- Sensors on the equipment (e.g., centrifugal pumps) collect real-time operational data like rotational speed (RPM), temperature, torque, vibration, and other relevant parameters.
- This data is continuously transmitted to a central system or cloud platform via an IoT network.

### 2. Input to the Model:
- **Preprocessing**: The raw sensor data might be preprocessed (for example, by calculating rolling means, standard deviations, or trends, as discussed earlier).
- This preprocessed data becomes the input to the machine learning model. Every time new data is collected, it serves as fresh input to the model for analysis.
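
In a streaming setting there is no DataFrame to call `rolling` on, so the same rolling statistics have to be maintained incrementally. The sketch below is one way to do that with the standard library; `RollingRpmFeatures` is a hypothetical class name, and the window of 10 simply mirrors the window used in the feature engineering cells.

```python
from collections import deque
from statistics import mean, stdev, variance


class RollingRpmFeatures:
    """Keep the last `window` RPM readings and derive features from them."""

    def __init__(self, window: int = 10):
        # A bounded deque drops the oldest reading automatically.
        self.readings = deque(maxlen=window)

    def update(self, rpm: float) -> dict:
        self.readings.append(rpm)
        values = list(self.readings)
        # Sample std/variance need at least two readings.
        enough = len(values) > 1
        return {
            "rolling_mean_rpm": mean(values),
            "rolling_std_rpm": stdev(values) if enough else None,
            "rolling_var_rpm": variance(values) if enough else None,
        }


tracker = RollingRpmFeatures(window=10)
for rpm in [1551, 1408, 1498]:
    features = tracker.update(rpm)
print(features)
```

After these three readings the values agree with the third row of the rolling features computed earlier (mean about 1485.67, standard deviation about 72.29).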

### 3. Model Comparison & Prediction:
- The trained machine learning model continuously compares the incoming real-time data to the patterns and trends it learned during the training phase.
- The model checks whether the current values (e.g., RPM, temperature, torque) match those associated with normal operation or indicate signs of impending failure.
  
  **For example**:
  - If the RPM deviates from the expected rolling mean, it might signal that the pump is operating inefficiently, which could lead to failure.
  - If the temperature is rising unusually or fluctuating, it could indicate overheating or a malfunction.
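
The RPM example can be made concrete with a simple deviation score. This is a hand-written heuristic for illustration, not what the trained classifiers compute internally: `rpm_deviation` is a hypothetical helper, and the cutoff of 3 standard deviations is an assumed value that a real system would tune on historical data.

```python
def rpm_deviation(rpm: float, rolling_mean: float, rolling_std: float) -> float:
    """How many rolling standard deviations `rpm` sits from the rolling mean."""
    if rolling_std == 0:
        return 0.0 if rpm == rolling_mean else float("inf")
    return abs(rpm - rolling_mean) / rolling_std


# Hypothetical cutoff; a real system would tune this on historical data.
Z_THRESHOLD = 3.0

# Values taken from one row of the sample output earlier:
# reading of 2737 rpm, rolling mean 1650.9 rpm, rolling std 405.805221.
z = rpm_deviation(2737, 1650.9, 405.805221)
print(round(z, 2), z > Z_THRESHOLD)
```

A fixed rule like this looks at one signal at a time, whereas the classifier combines speed, torque, temperature, and wear, so the two can disagree.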

### 4. Failure Prediction:
- Based on the comparison, the model makes a real-time prediction:
  - **"No Failure"**: If the system detects that the equipment is operating normally.
  - **"Failure Predicted"**: If the model detects signs that suggest a potential failure within a specific time window (e.g., 24 hours, 48 hours).
- The model uses its learned thresholds (like high RPM, high torque, etc.) or patterns to determine whether an alert should be triggered.

### 5. Alert/Action:
- If the model predicts a failure or detects anomalies, it alerts operators or triggers a maintenance action. The system could issue an alert like:
  - "Warning: Torque is higher than expected, indicating a potential blockage or resistance."
  - "Warning: Temperature is increasing rapidly, indicating overheating."
- Operators or maintenance teams can then take action to prevent failure, such as adjusting the pump, performing a quick inspection, or scheduling downtime for repairs.

### 6. Continuous Monitoring:
- This process happens continuously, with the system constantly comparing the latest data to the model's predictions, ensuring that the equipment is always being monitored for potential issues.
- The model is always "on," updating its predictions as new data comes in.
