# Machine Predictive Maintenance Classification

This is a personal project by ~SammieKn (@GitHub) on a [Kaggle dataset](https://www.kaggle.com/datasets/shivamb/machine-predictive-maintenance-classification/code) that predicts machine failure (binary) and failure type (multiclass). It is my first project of this kind done on my own, and I am looking forward to it :D.

<img src="./images/dalle.png" width="500" height="500" />

In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

import os

In [7]:
PATH_TO_DATA = "./data/predictive_maintenance.csv"

df = pd.read_csv(PATH_TO_DATA, delimiter=",", index_col="UDI")

In [8]:
df

UDI,Product ID,Type,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Target,Failure Type
1,M14860,M,298.1,308.6,1551,42.8,0,0,No Failure
2,L47181,L,298.2,308.7,1408,46.3,3,0,No Failure
3,L47182,L,298.1,308.5,1498,49.4,5,0,No Failure
4,L47183,L,298.2,308.6,1433,39.5,7,0,No Failure
5,L47184,L,298.2,308.7,1408,40.0,9,0,No Failure
...,...,...,...,...,...,...,...,...,...
9996,M24855,M,298.8,308.4,1604,29.5,14,0,No Failure
9997,H39410,H,298.9,308.4,1632,31.8,17,0,No Failure
9998,M24857,M,299.0,308.6,1645,33.4,22,0,No Failure
9999,H39412,H,299.0,308.7,1408,48.5,25,0,No Failure


In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 10000 entries, 1 to 10000
Data columns (total 9 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Product ID               10000 non-null  object 
 1   Type                     10000 non-null  object 
 2   Air temperature [K]      10000 non-null  float64
 3   Process temperature [K]  10000 non-null  float64
 4   Rotational speed [rpm]   10000 non-null  int64  
 5   Torque [Nm]              10000 non-null  float64
 6   Tool wear [min]          10000 non-null  int64  
 7   Target                   10000 non-null  int64  
 8   Failure Type             10000 non-null  object 
dtypes: float64(3), int64(3), object(3)
memory usage: 781.2+ KB


In [10]:
df.describe()

Statistic,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Target
count,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0
mean,300.00493,310.00556,1538.7761,39.98691,107.951,0.0339
std,2.000259,1.483734,179.284096,9.968934,63.654147,0.180981
min,295.3,305.7,1168.0,3.8,0.0,0.0
25%,298.3,308.8,1423.0,33.2,53.0,0.0
50%,300.1,310.1,1503.0,40.1,108.0,0.0
75%,301.5,311.1,1612.0,46.8,162.0,0.0
max,304.5,313.8,2886.0,76.6,253.0,1.0


### About Dataset
Machine Predictive Maintenance Classification Dataset
Since real predictive maintenance datasets are generally difficult to obtain and in particular difficult to publish, we present and provide a synthetic dataset that reflects real predictive maintenance encountered in the industry to the best of our knowledge.

The dataset consists of 10 000 data points stored as rows with 14 features in columns

- UDI: unique identifier ranging from 1 to 10000
- productID: consisting of a letter L, M, or H for low (50% of all products), medium (30%), and high (20%) as product quality variants and a variant-specific serial number
- air temperature [K]: generated using a random walk process later normalized to a standard deviation of 2 K around 300 K
- process temperature [K]: generated using a random walk process normalized to a standard deviation of 1 K, added to the air temperature plus 10 K.
- rotational speed [rpm]: calculated from a power of 2860 W, overlaid with normally distributed noise
- torque [Nm]: torque values are normally distributed around 40 Nm with σ = 10 Nm and no negative values.
- tool wear [min]: the quality variants H/M/L add 5/3/2 minutes of tool wear to the used tool in the process.
- a 'machine failure' label that indicates whether the machine has failed in this particular data point for any of its failure modes.
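The described properties can be checked against the loaded data. The sketch below is only an illustration: `demo` is a stand-in built from the first four rows shown in the `df` output above, and the helper names `type_shares` and `mean_temperature_offset` are hypothetical. On the full data, the means reported by `df.describe()` (310.01 K and 300.00 K) already suggest the offset of roughly 10 K.

```python
import pandas as pd

# Stand-in for `df`: the first four rows from the notebook output above.
demo = pd.DataFrame(
    {
        "Type": ["M", "L", "L", "L"],
        "Air temperature [K]": [298.1, 298.2, 298.1, 298.2],
        "Process temperature [K]": [308.6, 308.7, 308.5, 308.6],
    }
)


def type_shares(frame: pd.DataFrame) -> pd.Series:
    """Share of each quality variant (L/M/H) in the 'Type' column."""
    return frame["Type"].value_counts(normalize=True)


def mean_temperature_offset(frame: pd.DataFrame) -> float:
    """Mean difference between process and air temperature in kelvin."""
    difference = frame["Process temperature [K]"] - frame["Air temperature [K]"]
    return float(difference.mean())


shares = type_shares(demo)
offset = mean_temperature_offset(demo)
print(shares)
print(f"mean offset: {offset:.2f} K")
```

Calling the same helpers on `df` instead of `demo` would show whether the 50/30/20 split and the 10 K offset hold for all 10 000 rows.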

**Important:** There are two targets. Do not use either of them as a feature, as that leads to leakage.

- Target: failure or not
- Failure Type: type of failure
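One way to guard against this leakage is to remove both target columns from the feature matrix in a single step. This is a minimal sketch, not necessarily how the later cells do it; the function name `split_features_and_targets` is hypothetical, and `demo` stands in for `df` using three rows from the output above.

```python
import pandas as pd

# Both label columns; neither may appear among the features.
TARGET_COLUMNS = ["Target", "Failure Type"]


def split_features_and_targets(frame: pd.DataFrame):
    """Return (features, binary target, multiclass target) without leakage."""
    features = frame.drop(columns=TARGET_COLUMNS)
    return features, frame["Target"], frame["Failure Type"]


# Stand-in for `df`: three rows from the notebook output above.
demo = pd.DataFrame(
    {
        "Type": ["M", "L", "L"],
        "Torque [Nm]": [42.8, 46.3, 49.4],
        "Tool wear [min]": [0, 3, 5],
        "Target": [0, 0, 0],
        "Failure Type": ["No Failure", "No Failure", "No Failure"],
    }
)

X, y_binary, y_multiclass = split_features_and_targets(demo)
print(list(X.columns))
```

Dropping both columns at once means the same `X` can be reused for the binary and the multiclass task.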

## Data preparation

In the cells below the data is prepared for further analysis. 
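As an illustration of what such preparation could look like, the sketch below drops the `Product ID` identifier, strips the unit suffixes from the column names, and one-hot encodes `Type`. These three steps are assumptions made for the example, not a record of the actual cells, and `prepare_features` is a hypothetical helper name.

```python
import pandas as pd


def prepare_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Drop the identifier, shorten column names, and one-hot encode 'Type'."""
    # 'Product ID' is close to unique per row, so it carries little signal.
    prepared = frame.drop(columns=["Product ID"])
    # 'Air temperature [K]' -> 'Air temperature'; names without ' [' stay as-is.
    prepared = prepared.rename(columns=lambda name: name.split(" [")[0])
    # Turn the L/M/H quality variant into 0/1 indicator columns.
    return pd.get_dummies(prepared, columns=["Type"], dtype=int)


# Stand-in for the feature columns of `df`: rows from the output above.
demo = pd.DataFrame(
    {
        "Product ID": ["M14860", "L47181", "L47182"],
        "Type": ["M", "L", "L"],
        "Air temperature [K]": [298.1, 298.2, 298.1],
        "Torque [Nm]": [42.8, 46.3, 49.4],
    }
)

prepared = prepare_features(demo)
print(prepared)
```

Some libraries reject square brackets in feature names, which is one reason to shorten the column names early.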

## Exploratory Data Analysis (EDA)

In the cells below the data is analyzed by means of visualization and statistics. The goal here is to understand the potential relationships of the data in preparation for the machine learning part. 
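A few summary statistics are a useful starting point before plotting. The sketch below computes the overall failure rate, the failure rate per quality variant, and the mean torque per outcome. The rows in `demo` are made up for illustration and are not taken from the dataset. On the real data, the `Target` mean of 0.0339 in `df.describe()` shows that only about 3.4% of rows are failures.

```python
import pandas as pd

# Made-up illustration rows (NOT from the dataset), using its column names.
demo = pd.DataFrame(
    {
        "Type": ["L", "L", "M", "H"],
        "Torque [Nm]": [42.8, 65.0, 46.3, 39.5],
        "Target": [0, 1, 0, 0],
    }
)

# Fraction of rows labelled as failure: the class balance of the binary target.
overall_rate = demo["Target"].mean()

# Failure rate per quality variant, to see whether a variant fails more often.
rate_by_type = demo.groupby("Type")["Target"].mean()

# Mean torque for failing (1) and non-failing (0) rows.
torque_by_outcome = demo.groupby("Target")["Torque [Nm]"].mean()

print(overall_rate)
print(rate_by_type)
print(torque_by_outcome)
```

Replacing `demo` with `df` gives the same three summaries for the full dataset.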

## Prediction

The cells below describe the modelling choices and how the data is split into training and test sets. Once the model is defined, it is trained on the training data and used to make predictions.
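A possible shape for the split-and-train step is sketched below with scikit-learn. Everything here is an assumption for illustration: the random forest, the 80/20 split, the helper name `train_failure_model`, and the synthetic `demo_X` and `demo_y`, which are generated from random numbers rather than taken from the dataset. The split is stratified so that the rare failure class appears in both parts.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def train_failure_model(features: pd.DataFrame, target: pd.Series, seed: int = 0):
    """Stratified 80/20 split, then fit a random forest on the training part."""
    X_train, X_test, y_train, y_test = train_test_split(
        features, target, test_size=0.2, stratify=target, random_state=seed
    )
    # class_weight="balanced" compensates for the rare failure class.
    model = RandomForestClassifier(
        n_estimators=50, class_weight="balanced", random_state=seed
    )
    model.fit(X_train, y_train)
    return model, X_test, y_test


# Synthetic stand-in data (NOT the Kaggle dataset): failure when torque is high.
rng = np.random.default_rng(0)
demo_X = pd.DataFrame(
    {
        "Torque": rng.normal(40, 10, 200),
        "Tool wear": rng.integers(0, 250, 200),
    }
)
demo_y = (demo_X["Torque"] > 50).astype(int)

model, X_test, y_test = train_failure_model(demo_X, demo_y)
predictions = model.predict(X_test)
print(predictions[:10])
```

The held-out `X_test` and `y_test` are returned so that the evaluation can use rows the model never saw.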

## Evaluation 

The trained model is now validated against the test set.
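Because only about 3.4% of the rows are failures (the `Target` mean in `df.describe()` is 0.0339), a model that always predicts "no failure" would already reach roughly 96.6% accuracy. Precision and recall on held-out data are therefore more informative. The sketch below uses scikit-learn metrics on hand-made labels; `y_true` and `y_pred` are illustrative values, not results from this project.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hand-made labels for illustration only (1 = failure, 0 = no failure).
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]

# Rows are true classes, columns are predicted classes.
matrix = confusion_matrix(y_true, y_pred)

# Precision: share of predicted failures that were real failures.
precision = precision_score(y_true, y_pred)

# Recall: share of real failures that the model caught.
recall = recall_score(y_true, y_pred)

print(matrix)
print(precision, recall)
```

In a maintenance setting a missed failure is usually the costlier error, so recall on the failure class deserves particular attention.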

## Interpretation and conclusions