                         PREDICTIVE MAINTENANCE

1. Data Loading:
Load the dataset into a pandas DataFrame.

In [63]:
import pandas as pd

# Loading the dataset
file_path = 'ai4i2020.csv'
df = pd.read_csv(file_path)

2. Data Exploration:
Display basic information about the dataset, descriptive statistics, and the first few rows.

In [64]:
# Displaying basic information about the dataset
# (df.info() prints its report itself and returns None, so it is not wrapped in print)
df.info()

# Displaying descriptive statistics
print(df.describe())

# Checking the first few rows of the dataset
print(df.head(5))

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 13 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   UID                      10000 non-null  int64  
 1   Product ID               10000 non-null  object 
 2   Type                     10000 non-null  object 
 3   Air temperature [K]      10000 non-null  float64
 4   Process temperature [K]  10000 non-null  float64
 5   Rotational speed [rpm]   10000 non-null  int64  
 6   Torque [Nm]              10000 non-null  float64
 7   Tool wear [min]          10000 non-null  int64  
 8   TWF                      10000 non-null  int64  
 9   HDF                      10000 non-null  int64  
 10  PWF                      10000 non-null  int64  
 11  OSF                      10000 non-null  int64  
 12  RNF                      10000 non-null  int64  
dtypes: float64(3), int64(8), object(2)
memory usage: 1015.8+ KB

3. Data Preprocessing:
Check for missing values and convert categorical variables into a numerical format.

In [65]:
# Checking for missing values
print(df.isnull().sum())

# Converting categorical variables into numerical format, replacing 'L' with 1, 'M' with 2 and 'H' with 3 in a new column
df['Product_quality'] = df['Product ID'].apply(lambda x: {'L': 1, 'M': 2, 'H': 3}[x[0]])
df.head(5)

UID                        0
Product ID                 0
Type                       0
Air temperature [K]        0
Process temperature [K]    0
Rotational speed [rpm]     0
Torque [Nm]                0
Tool wear [min]            0
TWF                        0
HDF                        0
PWF                        0
OSF                        0
RNF                        0
dtype: int64


,UID,Product ID,Type,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],TWF,HDF,PWF,OSF,RNF,Product_quality
0,1,M14860,M,298.1,308.6,1551,42.8,0,0,0,0,0,0,2
1,2,L47181,L,298.2,308.7,1408,46.3,3,0,0,0,0,0,1
2,3,L47182,L,298.1,308.5,1498,49.4,5,0,0,0,0,0,1
3,4,L47183,L,298.2,308.6,1433,39.5,7,0,0,0,0,0,1
4,5,L47184,L,298.2,308.7,1408,40.0,9,0,0,0,0,0,1
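
In the rows shown above, the quality letter in `Type` matches the first character of `Product ID`, so the same encoding could also be written with a vectorized `Series.map`. The following is a minimal sketch on a small stand-in frame (the name `sample` and the explicit NaN check are illustrative assumptions, not part of the original notebook):

```python
import pandas as pd

# Small stand-in frame; the values mirror the first rows shown above.
sample = pd.DataFrame({
    "Product ID": ["M14860", "L47181", "L47182"],
    "Type": ["M", "L", "L"],
})

# Ordinal encoding used in this notebook: L -> 1, M -> 2, H -> 3.
quality_map = {"L": 1, "M": 2, "H": 3}

# Series.map is vectorized and yields NaN for letters missing from the
# mapping, whereas a dict lookup inside a lambda raises KeyError.
encoded = sample["Type"].map(quality_map)

# Fail loudly if any letter was not covered by the mapping.
if encoded.isna().any():
    raise ValueError("Unexpected product type found")

sample["Product_quality"] = encoded.astype(int)
print(sample)
```

Mapping from `Type` avoids string slicing on every row; whether `Type` always agrees with the `Product ID` prefix in the full file should be checked before relying on it.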


4. Feature Engineering:
Create additional features.

In [66]:
# Calculating the difference between air and process temperature
df['Temperature_difference'] = df['Process temperature [K]'] - df['Air temperature [K]']

# Creating a binary label for machine failure (1 if any failure mode is flagged, 0 otherwise)
df['Machine failure'] = df[['TWF', 'HDF', 'PWF', 'OSF', 'RNF']].any(axis=1).astype(int)
df.head(5)

,UID,Product ID,Type,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],TWF,HDF,PWF,OSF,RNF,Product_quality,Temperature_difference,Machine failure
0,1,M14860,M,298.1,308.6,1551,42.8,0,0,0,0,0,0,2,10.5,0
1,2,L47181,L,298.2,308.7,1408,46.3,3,0,0,0,0,0,1,10.5,0
2,3,L47182,L,298.1,308.5,1498,49.4,5,0,0,0,0,0,1,10.4,0
3,4,L47183,L,298.2,308.6,1433,39.5,7,0,0,0,0,0,1,10.4,0
4,5,L47184,L,298.2,308.7,1408,40.0,9,0,0,0,0,0,1,10.5,0
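
Other derived features can be built the same way. As one possible example, mechanical power follows from torque and rotational speed via P = torque * omega, with omega = 2 * pi * rpm / 60. The sketch below is an assumption about what a useful extra feature might look like; the column name `Power [W]` and the frame `sample` are hypothetical and do not appear in the original notebook:

```python
import numpy as np
import pandas as pd

# Stand-in rows; speed and torque values mirror the first rows shown above.
sample = pd.DataFrame({
    "Rotational speed [rpm]": [1551, 1408, 1498],
    "Torque [Nm]": [42.8, 46.3, 49.4],
})

# Convert rpm to angular velocity in rad/s, then P = torque * omega (watts).
omega = sample["Rotational speed [rpm]"] * 2 * np.pi / 60
sample["Power [W]"] = sample["Torque [Nm]"] * omega

print(sample.round(1))
```

Whether such a feature helps the model is an empirical question that this sketch does not answer.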


5. Model Training and Evaluation:
Split the dataset into features (X) and target variable (y).
Split the data into training and testing sets.
Train a predictive model to predict machine failures.

In [67]:
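# Splitting the dataset into features and target variable
# Note: TWF, HDF, PWF, OSF and RNF stay in X here. 'Machine failure' was derived
# directly from these flags, so they leak the target into the features.
X = df.drop(['UID', 'Product ID', 'Type', 'Machine failure'], axis=1)
y = df['Machine failure']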

# Splitting the data into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Training a predictive model (RandomForestClassifier)
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
model.fit(X_train, y_train)

# Evaluate model performance
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:")
print(classification_report(y_test, y_pred))
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))

Accuracy: 1.0
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00      1935
           1       1.00      1.00      1.00        65

    accuracy                           1.00      2000
   macro avg       1.00      1.00      1.00      2000
weighted avg       1.00      1.00      1.00      2000

Confusion Matrix:
[[1935    0]
 [   0   65]]
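
A perfect score on all 2000 test rows deserves a second look. `Machine failure` was computed from TWF, HDF, PWF, OSF and RNF, and those columns are still part of X, so the model may simply be reading the label off its inputs. The sketch below shows one way to exclude the flag columns and to stratify the split so the rare failure class keeps the same share in both sets. It runs on synthetic random data (not the ai4i2020 file), and names such as `demo` and `failure_flags` are illustrative assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the ai4i2020 file): random sensor-like
# values plus rare failure flags, only to illustrate the column handling.
rng = np.random.default_rng(0)
n = 1000
demo = pd.DataFrame({
    "Torque [Nm]": rng.normal(40, 10, n),
    "Tool wear [min]": rng.integers(0, 250, n),
    "TWF": rng.binomial(1, 0.03, n),
    "HDF": rng.binomial(1, 0.03, n),
})
failure_flags = ["TWF", "HDF"]
demo["Machine failure"] = demo[failure_flags].any(axis=1).astype(int)

# Drop the flags the label was derived from, so they cannot leak into X.
X = demo.drop(columns=failure_flags + ["Machine failure"])
y = demo["Machine failure"]

# stratify=y keeps the rare failure class at the same rate in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test), zero_division=0))
```

The features in this synthetic frame carry no signal about the flags, so the report only demonstrates the mechanics. On the real dataset, recall and precision for class 1 are more informative than accuracy because failures are rare.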


6. Model Deployment:
Save the trained model for future use.
Integrate the model into a production environment for making predictions.

In [68]:
# Saving the trained model
import joblib
joblib.dump(model, 'predictive_maintenance_model.pkl')

# Load the model
# model = joblib.load('predictive_maintenance_model.pkl')

# Use the model to make predictions on new data
# new_data_predictions = model.predict(new_data)

['predictive_maintenance_model.pkl']
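
The commented-out loading and prediction steps can be sketched as a round trip. One possible approach, shown here as an assumption rather than the notebook's method, is to save the training column order together with the model so that new data can be aligned before prediction. The tiny training frame, the `bundle` dictionary and the temporary directory are hypothetical and used only to keep the example self-contained:

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Tiny stand-in training set (hypothetical values, not from ai4i2020).
X_train = pd.DataFrame({
    "Torque [Nm]": [42.8, 46.3, 49.4, 70.0],
    "Tool wear [min]": [0, 3, 5, 240],
})
y_train = [0, 0, 0, 1]
model = RandomForestClassifier(n_estimators=10, random_state=42)
model.fit(X_train, y_train)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "predictive_maintenance_model.pkl")
    # Store the column order next to the model so inference can enforce it.
    joblib.dump({"model": model, "columns": list(X_train.columns)}, path)
    bundle = joblib.load(path)

# New rows may arrive with columns in a different order.
new_data = pd.DataFrame({"Tool wear [min]": [2], "Torque [Nm]": [41.0]})

# Selecting by the saved list restores the training column order and
# raises KeyError if a required column is missing.
new_data = new_data[bundle["columns"]]
new_data_predictions = bundle["model"].predict(new_data)
print(new_data_predictions)
```

A pickled scikit-learn model should be loaded with a scikit-learn version compatible with the one that saved it, so recording the library version alongside the file is worth considering.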