# Machine Predictive Maintenance

[Dataset Kaggle link](https://www.kaggle.com/datasets/shivamb/machine-predictive-maintenance-classification)

In [37]:
import os
import pickle
import pandas as pd
from sklearn.svm import SVC
from imblearn.over_sampling import SMOTE
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.linear_model import LinearRegression, LogisticRegression   

In [38]:
df = pd.read_csv('predictive_maintenance.csv')
df.head()

| | UDI | Product ID | Type | Air temperature [K] | Process temperature [K] | Rotational speed [rpm] | Torque [Nm] | Tool wear [min] | Target | Failure Type |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | M14860 | M | 298.1 | 308.6 | 1551 | 42.8 | 0 | 0 | No Failure |
| 1 | 2 | L47181 | L | 298.2 | 308.7 | 1408 | 46.3 | 3 | 0 | No Failure |
| 2 | 3 | L47182 | L | 298.1 | 308.5 | 1498 | 49.4 | 5 | 0 | No Failure |
| 3 | 4 | L47183 | L | 298.2 | 308.6 | 1433 | 39.5 | 7 | 0 | No Failure |
| 4 | 5 | L47184 | L | 298.2 | 308.7 | 1408 | 40.0 | 9 | 0 | No Failure |


In [39]:
df['Failure Type'].value_counts()

Failure Type
No Failure                  9652
Heat Dissipation Failure     112
Power Failure                 95
Overstrain Failure            78
Tool Wear Failure             45
Random Failures               18
Name: count, dtype: int64

> `Failure Type` is heavily imbalanced: `No Failure` accounts for 9,652 of the 10,000 rows.
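To put numbers on that imbalance, the counts can be turned into proportions. The sketch below rebuilds the counts from the output above as a small `Series` (the name `counts` is only for illustration); on the real data, `df['Failure Type'].value_counts(normalize=True)` should give the same ratios directly.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series({
    'No Failure': 9652,
    'Heat Dissipation Failure': 112,
    'Power Failure': 95,
    'Overstrain Failure': 78,
    'Tool Wear Failure': 45,
    'Random Failures': 18,
})

# Share of each class, and how many times larger the majority class is
# than the rarest one.
proportions = counts / counts.sum()
imbalance_ratio = counts.max() / counts.min()

print(proportions.round(4))
print(f"majority/minority ratio: {imbalance_ratio:.1f}")
```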

In [40]:
df.Type.value_counts()

Type
L    6000
M    2997
H    1003
Name: count, dtype: int64

> `Type` takes three values: `L`, `M`, and `H`.

We need to one-hot encode `Type`.

In [41]:
df1 = pd.get_dummies(df, columns=['Type'])
df1[['Type_H', 'Type_L', 'Type_M']] = df1[['Type_H', 'Type_L', 'Type_M']].astype(int)
df1.head()

| | UDI | Product ID | Air temperature [K] | Process temperature [K] | Rotational speed [rpm] | Torque [Nm] | Tool wear [min] | Target | Failure Type | Type_H | Type_L | Type_M |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | M14860 | 298.1 | 308.6 | 1551 | 42.8 | 0 | 0 | No Failure | 0 | 0 | 1 |
| 1 | 2 | L47181 | 298.2 | 308.7 | 1408 | 46.3 | 3 | 0 | No Failure | 0 | 1 | 0 |
| 2 | 3 | L47182 | 298.1 | 308.5 | 1498 | 49.4 | 5 | 0 | No Failure | 0 | 1 | 0 |
| 3 | 4 | L47183 | 298.2 | 308.6 | 1433 | 39.5 | 7 | 0 | No Failure | 0 | 1 | 0 |
| 4 | 5 | L47184 | 298.2 | 308.7 | 1408 | 40.0 | 9 | 0 | No Failure | 0 | 1 | 0 |
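As a possible shortcut, `pd.get_dummies` accepts a `dtype` argument, so the integer conversion can happen in the same call. A minimal sketch on a toy frame (the frame `toy` is made up for illustration):

```python
import pandas as pd

# Toy frame standing in for the real data.
toy = pd.DataFrame({
    'Type': ['M', 'L', 'L', 'H'],
    'Torque [Nm]': [42.8, 46.3, 49.4, 39.5],
})

# dtype=int yields 0/1 columns directly instead of booleans.
encoded = pd.get_dummies(toy, columns=['Type'], dtype=int)
print(encoded)
```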


In [42]:
df1.Target.unique()

array([0, 1])

> `Target` takes the values 0 and 1 and indicates whether the machine failed.

> 1 => failed & 0 => not failed

In [43]:
df1[df1.Target == 1]['Failure Type'].value_counts()

Failure Type
Heat Dissipation Failure    112
Power Failure                95
Overstrain Failure           78
Tool Wear Failure            45
No Failure                    9
Name: count, dtype: int64
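The counts above suggest that `Target` and `Failure Type` do not agree perfectly: 9 rows with `Target == 1` are labelled `No Failure`, and `Random Failures` (18 rows overall) does not appear among the failed rows at all. A cross-tabulation makes such mismatches easy to spot. The sketch below uses a small made-up frame; on the real data the call would be `pd.crosstab(df['Target'], df['Failure Type'])`.

```python
import pandas as pd

# Made-up rows that mimic the two kinds of mismatch.
toy = pd.DataFrame({
    'Target':       [0, 0, 0, 1, 1, 1],
    'Failure Type': ['No Failure', 'No Failure', 'Random Failures',
                     'Power Failure', 'No Failure', 'Power Failure'],
})

# Rows = Target, columns = Failure Type, cells = row counts.
table = pd.crosstab(toy['Target'], toy['Failure Type'])
print(table)
```

How to treat the mismatched rows (drop them, relabel them, or keep them) is a modelling decision the notebook leaves open.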

In [44]:
df1.columns

Index(['UDI', 'Product ID', 'Air temperature [K]', 'Process temperature [K]',
       'Rotational speed [rpm]', 'Torque [Nm]', 'Tool wear [min]', 'Target',
       'Failure Type', 'Type_H', 'Type_L', 'Type_M'],
      dtype='object')

## Failed or Not

In [45]:
X = df1[['Air temperature [K]', 'Process temperature [K]',
       'Rotational speed [rpm]', 'Torque [Nm]', 'Tool wear [min]',
       'Type_H', 'Type_L', 'Type_M']]
y = df1.Target

Because the classes are imbalanced, we use the `imblearn` module to oversample the minority class and balance our data.

In [46]:
smote = SMOTE()
X_smote, y_smote = smote.fit_resample(X, y)

In [47]:
x_train, x_test, y_train , y_test = train_test_split(X_smote, y_smote, test_size=0.2)

In [48]:
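# Candidate models. Note: LinearRegression is a regressor, so its .score()
# returns R^2 rather than accuracy and is not directly comparable.
models = [LinearRegression, LogisticRegression,
          DecisionTreeClassifier, RandomForestClassifier,
          KNeighborsClassifier, GaussianNB,
          MultinomialNB, SVC]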
names = ['LinearRegression', 'LogisticRegression',
          'DecisionTreeClassifier','RandomForestClassifier',
          'KNeighborsClassifier','GaussianNB',
          'MultinomialNB','SVC']

data = []
for name,model in zip(names,models):
    print(name)
    m = model()
    m.fit(x_train, y_train)
    score = m.score(x_test, y_test)
    data.append([name, score])

LinearRegression
LogisticRegression
DecisionTreeClassifier
RandomForestClassifier


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


KNeighborsClassifier
GaussianNB
MultinomialNB
SVC
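The convergence message in the output comes from `LogisticRegression`, and the warning itself suggests raising `max_iter` or scaling the data. A minimal sketch of both, on synthetic data rather than the real dataset (the scale factors and `max_iter=1000` are arbitrary illustrative choices):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data; multiplying columns by different factors mimics
# features on very different scales (e.g. rpm vs. Nm).
X_demo, y_demo = make_classification(n_samples=500, n_features=5, random_state=0)
X_demo = X_demo * [1.0, 10.0, 100.0, 1000.0, 0.1]

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

# Standardise the features, then fit logistic regression with a larger
# iteration budget than the default.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_tr, y_tr)
print(clf.score(X_te, y_te))
```

Putting the scaler inside the pipeline means it is fitted on the training data only and then reused unchanged at prediction time.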


In [49]:
data.sort(key = lambda x: x[1], reverse=True)
pd.DataFrame(data, columns=['Model name', 'Score'])

| | Model name | Score |
|---|---|---|
| 0 | RandomForestClassifier | 0.988098 |
| 1 | DecisionTreeClassifier | 0.973351 |
| 2 | KNeighborsClassifier | 0.937904 |
| 3 | LogisticRegression | 0.872445 |
| 4 | SVC | 0.827684 |
| 5 | GaussianNB | 0.756792 |
| 6 | MultinomialNB | 0.646831 |
| 7 | LinearRegression | 0.592166 |


> Based on the table above, `RandomForestClassifier` is the best-scoring model for this problem.

Retraining `RandomForestClassifier` with more `n_estimators`, this time on the full oversampled dataset.

In [50]:
best_model = RandomForestClassifier(n_estimators=300)
best_model.fit(X_smote, y_smote)

Creating a function that takes the input features `x` and returns the failure prediction.

In [51]:
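def is_failure(x):
    # One-hot encode `Type`, then align with the training columns so a category
    # missing from `x` (e.g. a single-row input) becomes an all-zero column.
    encoded = pd.get_dummies(x, columns=['Type'])
    encoded = encoded.reindex(columns=X.columns, fill_value=0)
    encoded[['Type_H', 'Type_L', 'Type_M']] = encoded[['Type_H', 'Type_L', 'Type_M']].astype(int)
    return best_model.predict(encoded)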
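One caveat when encoding at prediction time: `pd.get_dummies` only creates columns for the categories present in its input, so a single row of type `M` yields `Type_M` but no `Type_H` or `Type_L`. A possible safeguard, sketched here with pandas only, is to reindex to the training column list (`feature_columns` below is a hypothetical name holding the columns used for `X`).

```python
import pandas as pd

# Column order used for training (copied from the definition of X above).
feature_columns = ['Air temperature [K]', 'Process temperature [K]',
                   'Rotational speed [rpm]', 'Torque [Nm]', 'Tool wear [min]',
                   'Type_H', 'Type_L', 'Type_M']

# A single new observation of type M.
row = pd.DataFrame([{
    'Type': 'M', 'Air temperature [K]': 298.1, 'Process temperature [K]': 308.6,
    'Rotational speed [rpm]': 1551, 'Torque [Nm]': 42.8, 'Tool wear [min]': 0,
}])

encoded = pd.get_dummies(row, columns=['Type'], dtype=int)
# Add the missing dummy columns as zeros and enforce the training order.
encoded = encoded.reindex(columns=feature_columns, fill_value=0)
print(encoded.iloc[0].to_dict())
```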

In [52]:
df.columns

Index(['UDI', 'Product ID', 'Type', 'Air temperature [K]',
       'Process temperature [K]', 'Rotational speed [rpm]', 'Torque [Nm]',
       'Tool wear [min]', 'Target', 'Failure Type'],
      dtype='object')

In [53]:
x = df[['Type', 'Air temperature [K]',
       'Process temperature [K]', 'Rotational speed [rpm]', 'Torque [Nm]',
       'Tool wear [min]']]
print(x.shape)
accuracy_score(df.Target, is_failure(x))  # arguments in (y_true, y_pred) order

(10000, 6)


1.0

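> Evaluated on the original data (without oversampling). These rows were also part of the training data, so the score of 1.0 is a training accuracy rather than an estimate of performance on unseen data.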
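Accuracy on the original data is worth reading against a baseline. From the counts shown earlier, 339 of the 10,000 rows have `Target == 1` (112 + 95 + 78 + 45 + 9), so a model that always predicts 0 would already reach about 0.966 accuracy. The sketch below reproduces that arithmetic with made-up label arrays and contrasts accuracy with recall and balanced accuracy, which are more sensitive to the minority class.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, recall_score

# Labels with the same class counts as Target: 9661 zeros and 339 ones.
y_true = np.array([0] * 9661 + [1] * 339)
# A trivial "model" that never predicts a failure.
y_always_zero = np.zeros_like(y_true)

print(accuracy_score(y_true, y_always_zero))           # 0.9661
print(recall_score(y_true, y_always_zero))             # 0.0
print(balanced_accuracy_score(y_true, y_always_zero))  # 0.5
```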

Saving the model

In [54]:
# Create the output directory if it does not exist yet.
os.makedirs('models', exist_ok=True)

with open('models/is_failure.pkl', 'wb') as f:
    pickle.dump(best_model, f)
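For completeness, a saved model can be read back with `pickle.load`. The sketch below does a round trip with a tiny stand-in model in a temporary directory, so it does not touch `models/is_failure.pkl`; the toy data and the file name `demo.pkl` are assumptions for illustration. Pickles should only be loaded from trusted sources, and preferably with the same scikit-learn version that wrote them.

```python
import os
import pickle
import tempfile

from sklearn.tree import DecisionTreeClassifier

# Tiny stand-in model trained on toy data.
X_toy = [[0, 0], [1, 1], [0, 1], [1, 0]]
y_toy = [0, 1, 0, 1]
model = DecisionTreeClassifier(random_state=0).fit(X_toy, y_toy)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.pkl')
    # Save ...
    with open(path, 'wb') as f:
        pickle.dump(model, f)
    # ... and load again.
    with open(path, 'rb') as f:
        restored = pickle.load(f)

# The restored model should predict exactly like the original.
print((restored.predict(X_toy) == model.predict(X_toy)).all())
```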

## Failure Type

In [55]:
X = df1[['Air temperature [K]', 'Process temperature [K]',
       'Rotational speed [rpm]', 'Torque [Nm]', 'Tool wear [min]',
       'Type_H', 'Type_L', 'Type_M']]
y = df1['Failure Type']

> `Failure Type` contains string values, so we convert them to integers.

In [56]:
labelEncoding = {j:i for i,j in enumerate(y.unique())}
inverse = {j:i for i,j in labelEncoding.items()}
y = y.map(labelEncoding)

In [57]:
y

0       0
1       0
2       0
3       0
4       0
       ..
9995    0
9996    0
9997    0
9998    0
9999    0
Name: Failure Type, Length: 10000, dtype: int64

In [58]:
labelEncoding

{'No Failure': 0,
 'Power Failure': 1,
 'Tool Wear Failure': 2,
 'Overstrain Failure': 3,
 'Random Failures': 4,
 'Heat Dissipation Failure': 5}

In [59]:
inverse

{0: 'No Failure',
 1: 'Power Failure',
 2: 'Tool Wear Failure',
 3: 'Overstrain Failure',
 4: 'Random Failures',
 5: 'Heat Dissipation Failure'}
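scikit-learn's `LabelEncoder` offers a similar mapping in both directions. Note that it assigns integers in sorted label order, so the codes would differ from the `labelEncoding` dictionary above; this is an alternative sketch, not what the notebook uses.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

labels = pd.Series(['No Failure', 'Power Failure', 'No Failure',
                    'Tool Wear Failure', 'Heat Dissipation Failure'])

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)        # strings -> integers
decoded = encoder.inverse_transform(codes)   # integers -> strings

print(list(encoder.classes_))
print(codes.tolist())
```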

In [60]:
y.map(inverse)

0       No Failure
1       No Failure
2       No Failure
3       No Failure
4       No Failure
           ...    
9995    No Failure
9996    No Failure
9997    No Failure
9998    No Failure
9999    No Failure
Name: Failure Type, Length: 10000, dtype: object

`Failure Type` is also imbalanced, so we oversample it as well.

In [61]:
smote = SMOTE()
X_smote, y_smote = smote.fit_resample(X, y)

In [62]:
x_train, x_test, y_train , y_test = train_test_split(X_smote, y_smote, test_size=0.2)

In [63]:
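# Candidate models. Note: LinearRegression is a regressor, so its .score()
# returns R^2 rather than accuracy and is not directly comparable.
models = [LinearRegression, LogisticRegression,
          DecisionTreeClassifier, RandomForestClassifier,
          KNeighborsClassifier, GaussianNB,
          MultinomialNB]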
names = ['LinearRegression', 'LogisticRegression',
          'DecisionTreeClassifier','RandomForestClassifier',
          'KNeighborsClassifier','GaussianNB',
          'MultinomialNB']

data = []
for name,model in zip(names,models):
    print(name)
    m = model()
    m.fit(x_train, y_train)
    score = m.score(x_test, y_test)
    data.append([name, score])

LinearRegression
LogisticRegression


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


DecisionTreeClassifier
RandomForestClassifier
KNeighborsClassifier
GaussianNB
MultinomialNB


In [64]:
data.sort(key = lambda x: x[1], reverse=True)
pd.DataFrame(data, columns=['Model name', 'Score'])

| | Model name | Score |
|---|---|---|
| 0 | RandomForestClassifier | 0.994561 |
| 1 | DecisionTreeClassifier | 0.991108 |
| 2 | KNeighborsClassifier | 0.948977 |
| 3 | LogisticRegression | 0.751619 |
| 4 | GaussianNB | 0.690236 |
| 5 | MultinomialNB | 0.515583 |
| 6 | LinearRegression | 0.321907 |

> Based on the table above, `RandomForestClassifier` is again the best-scoring model for this problem.

Retraining `RandomForestClassifier` with more `n_estimators`, this time on the full oversampled dataset.

In [65]:
best_model = RandomForestClassifier(n_estimators=300)
best_model.fit(X_smote, y_smote)

Creating a function that takes the input features `x` and returns the predicted failure type as an integer label.

In [66]:
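def failure_type(x):
    # One-hot encode `Type`, then align with the training columns so a category
    # missing from `x` (e.g. a single-row input) becomes an all-zero column.
    encoded = pd.get_dummies(x, columns=['Type'])
    encoded = encoded.reindex(columns=X.columns, fill_value=0)
    encoded[['Type_H', 'Type_L', 'Type_M']] = encoded[['Type_H', 'Type_L', 'Type_M']].astype(int)
    return best_model.predict(encoded)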

In [67]:
x = df[['Type', 'Air temperature [K]',
       'Process temperature [K]', 'Rotational speed [rpm]', 'Torque [Nm]',
       'Tool wear [min]']]
print(x.shape)
prediction = failure_type(x)
accuracy_score(df['Failure Type'].map(labelEncoding), prediction)

(10000, 6)


1.0

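> Evaluated on the original data (without oversampling). These rows were also part of the training data, so the score of 1.0 is a training accuracy rather than an estimate of performance on unseen data.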

Saving the `failure_type` model

In [68]:
with open('models/failure_type.pkl', 'wb') as f:
    pickle.dump(best_model, f)

Saving the `inverse` dictionary, which converts the predicted integer labels back into strings.

In [69]:
with open('models/encoding.pkl', 'wb') as f:
    pickle.dump(inverse, f)

In [73]:
df_test = pd.Series(prediction).map(inverse)

> With the code above, the predicted integer values are converted back into failure type names using `inverse`.
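The decoding step can be tried in isolation as sketched below. The `inverse` dictionary is copied from the output earlier in this section; the `prediction` array is made up for illustration instead of coming from the model.

```python
import numpy as np
import pandas as pd

# Mapping copied from the `inverse` output above.
inverse = {0: 'No Failure', 1: 'Power Failure', 2: 'Tool Wear Failure',
           3: 'Overstrain Failure', 4: 'Random Failures',
           5: 'Heat Dissipation Failure'}

# Made-up integer predictions standing in for failure_type(x).
prediction = np.array([0, 0, 5, 1, 3])

decoded = pd.Series(prediction).map(inverse)
print(decoded.tolist())
# ['No Failure', 'No Failure', 'Heat Dissipation Failure', 'Power Failure', 'Overstrain Failure']
```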