# Predicting machine failure

Predicting machine failure with machine learning supports efficient industrial operations. By analysing patterns in sensor data, ML models can forecast likely breakdowns, so maintenance can be scheduled proactively and downtime reduced. This predictive approach improves equipment reliability and helps allocate maintenance resources where they are needed.

## Scope

## Model performance metrics

## Import dependencies

In [1]:
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import random
import seaborn as sns

import sys
sys.path.insert(0, "C:\\Users\\billy\\OneDrive\\Documents\\Python Scripts\\1. Portfolio\\machine-failure\\machine-failure")
import custom_funcs as cf

from sklearn.model_selection import KFold
from mixed_naive_bayes import MixedNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import QuantileTransformer
from sklearn.metrics import precision_score, recall_score, f1_score
from sklearn.naive_bayes import GaussianNB
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_recall_curve, auc
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import make_pipeline
from sklearn.metrics import classification_report, confusion_matrix
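The absolute path passed to `sys.path.insert` above only resolves on one machine. As a minimal sketch, assuming the notebook is started from a folder that contains the module directory, the path could be resolved relative to the working directory instead. The helper name `add_module_dir` and the folder layout are hypothetical, not part of the original project.

```python
import sys
from pathlib import Path


def add_module_dir(module_dir, base=None):
    """Resolve `module_dir` against `base` and put it at the front of sys.path."""
    base = Path.cwd() if base is None else Path(base)
    resolved = str((base / module_dir).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Hypothetical layout: the working directory contains a "machine-failure" folder
# that holds custom_funcs.py.
added = add_module_dir("machine-failure")
```

After this call, `import custom_funcs as cf` would work on any machine with the same relative layout.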

## Import raw data

* **UDI:** Unique identifier ranging from 1 to 10000

* **Product ID:** A letter L, M, or H denoting the product quality variant (low: 50% of all products, medium: 30%, high: 20%), followed by a variant-specific serial number

* **air temperature [K]:** Generated using a random walk process later normalized to a standard deviation of 2 K around 300 K

* **process temperature [K]:** Generated using a random walk process normalized to a standard deviation of 1 K, added to the air temperature plus 10 K.

* **rotational speed [rpm]:** Calculated from a power of 2860 W, overlaid with normally distributed noise

* **torque [Nm]:** Torque values are normally distributed around 40 Nm with σ = 10 Nm and no negative values.

* **tool wear [min]:** The quality variants H/M/L add 5/3/2 minutes of tool wear to the used tool in the process.

* **Target:** The 'machine failure' label, indicating whether the machine failed at this data point through any of the failure modes (failure or not)

* **Failure Type:** Type of Failure
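The two temperature descriptions above can be read as follows: a random walk is rescaled to the stated standard deviation, and the process temperature is the air temperature plus 10 K plus a second rescaled walk. The sketch below is one possible interpretation only; the exact procedure used to generate the dataset is not given here, and the function name `normalised_random_walk` is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def normalised_random_walk(n, std, rng):
    """Cumulative sum of Gaussian steps, centred and rescaled to `std`."""
    walk = np.cumsum(rng.normal(size=n))
    return (walk - walk.mean()) / walk.std() * std


n = 10_000
# Air temperature: standard deviation of 2 K around 300 K.
air_temp = 300.0 + normalised_random_walk(n, 2.0, rng)
# Process temperature: air temperature plus 10 K plus a walk with 1 K spread.
process_temp = air_temp + 10.0 + normalised_random_walk(n, 1.0, rng)
```

Under this reading the process temperature stays about 10 K above the air temperature on average, which matches the roughly 10 K gap visible in the first rows of the raw data.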

In [2]:
raw_df = pd.read_csv(cf.file_directory("raw") + "predictive_maintenance.csv")
raw_df.head()

Unnamed: 0,UDI,Product ID,Type,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Target,Failure Type
0,1,M14860,M,298.1,308.6,1551,42.8,0,0,No Failure
1,2,L47181,L,298.2,308.7,1408,46.3,3,0,No Failure
2,3,L47182,L,298.1,308.5,1498,49.4,5,0,No Failure
3,4,L47183,L,298.2,308.6,1433,39.5,7,0,No Failure
4,5,L47184,L,298.2,308.7,1408,40.0,9,0,No Failure
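The `Product ID` column combines the quality letter with a serial number, so the letter should agree with the `Type` column. A small sketch of that check, using only the five rows shown above (the column names `quality` and `serial` are introduced here for illustration):

```python
import pandas as pd

# The five rows displayed by raw_df.head().
sample = pd.DataFrame({
    "Product ID": ["M14860", "L47181", "L47182", "L47183", "L47184"],
    "Type": ["M", "L", "L", "L", "L"],
})

# Split each ID into its quality letter and numeric serial part.
parts = sample["Product ID"].str.extract(r"^(?P<quality>[LMH])(?P<serial>\d+)$")
parts["serial"] = parts["serial"].astype(int)

# True when the leading letter matches the Type column on every row.
matches_type = bool((parts["quality"] == sample["Type"]).all())
```

Running the same check on the full `raw_df` would show whether `Type` is fully redundant with the first character of `Product ID`.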


## Import cleaned data

In [3]:
train_df = pd.read_csv(cf.file_directory('cleaned') + 'train_df.csv')
test_df = pd.read_csv(cf.file_directory('cleaned') + 'test_df.csv')
train_df.head()

Unnamed: 0,Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Type_H,Type_L,Type_M,Target,Failure Type_Heat,Failure Type_None,Failure Type_Overstrain,Failure Type_Power,Failure Type_Random,Failure Type_Tool
0,310.7,1454,39.4,17,0,0,1,0,0,1,0,0,0,0
1,309.7,1868,23.8,118,0,0,1,0,0,1,0,0,0,0
2,308.5,1616,30.2,34,0,1,0,0,0,1,0,0,0,0
3,312.6,1768,23.9,149,0,0,1,0,0,1,0,0,0,0
4,313.4,1624,32.1,53,0,0,1,0,0,1,0,0,0,0


In [4]:
scaled_train_df = pd.read_csv(cf.file_directory('cleaned') + 'scaled_train_df.csv')
scaled_test_df = pd.read_csv(cf.file_directory('cleaned') + 'scaled_test_df.csv')
scaled_train_df.head()

Unnamed: 0,Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Type_H,Type_L,Type_M,Target,Failure Type_Heat,Failure Type_None,Failure Type_Overstrain,Failure Type_Power,Failure Type_Random,Failure Type_Tool
0,0.617284,0.160117,0.489011,0.067194,0,0,1,0,0,1,0,0,0,0
1,0.493827,0.402933,0.274725,0.466403,0,0,1,0,0,1,0,0,0,0
2,0.345679,0.255132,0.362637,0.134387,0,1,0,0,0,1,0,0,0,0
3,0.851852,0.344282,0.276099,0.588933,0,0,1,0,0,1,0,0,0,0
4,0.950617,0.259824,0.388736,0.209486,0,0,1,0,0,1,0,0,0,0
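The scaled values lie in the range 0 to 1, which is consistent with min-max scaling, and `MinMaxScaler` is among the imports. The cleaning step itself is not shown, so the following is only a sketch of how such a file might be produced: the scaler is fitted on the training rows and then applied unchanged to the test rows. The test rows here are made up, and because only five training rows are used the outputs differ from the table above.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

num_cols = ["Process temperature [K]", "Torque [Nm]"]

# First five training rows from train_df.head(); test rows are invented.
train = pd.DataFrame({
    "Process temperature [K]": [310.7, 309.7, 308.5, 312.6, 313.4],
    "Torque [Nm]": [39.4, 23.8, 30.2, 23.9, 32.1],
})
test = pd.DataFrame({
    "Process temperature [K]": [309.0, 314.0],
    "Torque [Nm]": [25.0, 45.0],
})

# Fit on the training data only, so no test information leaks into the scaling.
scaler = MinMaxScaler().fit(train[num_cols])
train_scaled = pd.DataFrame(scaler.transform(train[num_cols]),
                            columns=num_cols, index=train.index)
test_scaled = pd.DataFrame(scaler.transform(test[num_cols]),
                           columns=num_cols, index=test.index)
```

Test values outside the training range map outside 0 to 1, which is expected when the scaler is fitted on the training split alone.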


In [5]:
qt_train_df = pd.read_csv(cf.file_directory('cleaned') + 'qt_train_df.csv')
qt_test_df = pd.read_csv(cf.file_directory('cleaned') + 'qt_test_df.csv')
qt_train_df.head()

Unnamed: 0,Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Type_H,Type_L,Type_M,Target,Failure Type_Heat,Failure Type_None,Failure Type_Overstrain,Failure Type_Power,Failure Type_Random,Failure Type_Tool
0,0.4129,-0.392492,-0.06277,-1.334066,0,0,1,0,0,1,0,0,0,0
1,-0.175278,1.669112,-1.625274,0.146048,0,0,1,0,0,1,0,0,0,0
2,-0.926176,0.691121,-0.979511,-0.943656,0,1,0,0,0,1,0,0,0,0
3,1.684464,1.374878,-1.615945,0.510631,0,0,1,0,0,1,0,0,0,0
4,2.511791,0.733156,-0.77145,-0.645631,0,0,1,0,0,1,0,0,0,0
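The `qt_` values are centred near zero with both signs, which would be consistent with a `QuantileTransformer` mapping each feature to a normal distribution. That is an assumption, since the transformation step is not shown. A minimal sketch on synthetic, skewed data:

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(0)

# Synthetic skewed feature standing in for a real sensor column.
train = rng.exponential(scale=50.0, size=(500, 1))
test = rng.exponential(scale=50.0, size=(100, 1))

# Assumed settings: normal output distribution, 100 quantiles.
qt = QuantileTransformer(n_quantiles=100, output_distribution="normal",
                         random_state=0)
train_qt = qt.fit_transform(train)  # learn the quantiles on training data
test_qt = qt.transform(test)        # reuse them for the test data
```

The transformed training feature has a median close to zero regardless of how skewed the input was.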


## Train/test prep

In [11]:
cols = ['Process temperature [K]','Rotational speed [rpm]','Torque [Nm]','Tool wear [min]', 'Type_H', 'Type_L', 'Type_M']
X_train = train_df[cols]
X_test = test_df[cols]

y_train = train_df.filter(regex=("Failure Type.*"))
y_test = test_df.filter(regex=("Failure Type.*"))
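The `filter` call above returns the target as six one-hot columns. Many multiclass estimators expect a single label per row instead, so a conversion along the following lines may be needed before fitting. This is a sketch on three made-up rows that reuse the column names from the cleaned data.

```python
import pandas as pd

target_cols = [
    "Failure Type_Heat", "Failure Type_None", "Failure Type_Overstrain",
    "Failure Type_Power", "Failure Type_Random", "Failure Type_Tool",
]

# Three illustrative one-hot rows: None, Heat, Power.
y_onehot = pd.DataFrame(
    [[0, 1, 0, 0, 0, 0],
     [1, 0, 0, 0, 0, 0],
     [0, 0, 0, 1, 0, 0]],
    columns=target_cols,
)

# String labels: name of the column holding the 1, without the prefix.
labels = y_onehot.idxmax(axis=1).str.replace("Failure Type_", "", regex=False)
# Integer codes: position of the 1 within each row.
codes = y_onehot.to_numpy().argmax(axis=1)
```

Either `labels` or `codes` can then be passed as `y` to a classifier that expects one class per sample.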

In [12]:
# Scaled data
cols = ['Process temperature [K]','Rotational speed [rpm]','Torque [Nm]','Tool wear [min]', 'Type_H', 'Type_L', 'Type_M']
X_train_scaled = scaled_train_df[cols]
X_test_scaled = scaled_test_df[cols]

y_train_scaled = scaled_train_df.filter(regex=("Failure Type.*"))
y_test_scaled = scaled_test_df.filter(regex=("Failure Type.*"))

In [13]:
# Transformed data
cols = ['Process temperature [K]','Rotational speed [rpm]','Torque [Nm]','Tool wear [min]', 'Type_H', 'Type_L', 'Type_M']
X_train_qt = qt_train_df[cols]
X_test_qt = qt_test_df[cols]

y_train_qt = qt_train_df.filter(regex=("Failure Type.*"))
y_test_qt = qt_test_df.filter(regex=("Failure Type.*"))
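The three cells above repeat the same selection for each version of the data. One way to avoid the repetition is a small helper; the name `split_features_target` is hypothetical and the data frame below is a two-row stand-in for the real files.

```python
import pandas as pd


def split_features_target(df, feature_cols, target_regex=r"Failure Type.*"):
    """Return (X, y): the feature columns and the one-hot target columns."""
    return df[feature_cols], df.filter(regex=target_regex)


# Reduced stand-in with two features and two target columns.
demo = pd.DataFrame({
    "Torque [Nm]": [39.4, 23.8],
    "Type_M": [1, 1],
    "Target": [0, 0],
    "Failure Type_None": [1, 1],
    "Failure Type_Heat": [0, 0],
})

X_demo, y_demo = split_features_target(demo, ["Torque [Nm]", "Type_M"])
```

With the real data this would be called once per data frame, for example `X_train_qt, y_train_qt = split_features_target(qt_train_df, cols)`.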

## Multiclass classification

## Model selection
* It's a multiclass classification problem.

* The following models will be tested:
    - Random Forest
    - XGBoost
    - Neural network
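As a rough sketch of how one of these candidates might be evaluated, the example below runs a random forest through stratified cross-validation and reports the macro-averaged F1 score, which weights rare failure classes equally with the majority class. The data is synthetic (`make_classification`), and the hyperparameters and class weights are placeholders rather than tuned or reported values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic imbalanced three-class problem with seven features.
X, y = make_classification(n_samples=600, n_features=7, n_informative=4,
                           n_classes=3, weights=[0.90, 0.07, 0.03],
                           random_state=0)

# class_weight="balanced" is one common way to counter class imbalance.
model = RandomForestClassifier(n_estimators=100, class_weight="balanced",
                               random_state=0)

# Stratified folds keep the class proportions similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
y_pred = cross_val_predict(model, X, y, cv=cv)

macro_f1 = f1_score(y, y_pred, average="macro")
```

The same loop could be reused for the other candidates by swapping the estimator, which keeps the comparison on identical folds.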

## Model training & evaluation

### Random forest

### XGBoost

### Neural network

### Conclusion