<div style="color:red;background-color:lime;padding:3%;border-radius:150px 150px;font-size:2em;text-align:center">Data Analysis with LR, DT, RF and SVM (99.6% AUC)</div>

![](https://j.gifs.com/76kDrQ.gif)

# About Dataset

Machine Predictive Maintenance Classification Dataset. Since real predictive maintenance datasets are generally difficult to obtain, and in particular difficult to publish, we present and provide a synthetic dataset that, to the best of our knowledge, reflects real predictive maintenance encountered in industry.


The dataset consists of 10,000 data points stored as rows, with 10 columns.

Important:

There are two targets: "Target" (failure or not) and "Failure Type" (the type of failure). Do not use one of them as a feature when predicting the other, as this leads to target leakage.
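The sketch below makes the leakage warning concrete. It shows a feature/target split that removes both target columns from the feature matrix. The helper `split_features_targets` and the toy rows are hypothetical and made up for illustration; only the column names come from the dataset.

```python
import pandas as pd

# The two target columns named in the dataset description
TARGET_COLS = ["Target", "Failure Type"]


def split_features_targets(frame: pd.DataFrame, target: str = "Target"):
    """Return (X, y) with BOTH target columns removed from X.

    Keeping "Failure Type" as a feature while predicting "Target"
    (or vice versa) would leak the label into the inputs.
    """
    if target not in TARGET_COLS:
        raise ValueError(f"target must be one of {TARGET_COLS}")
    X = frame.drop(columns=TARGET_COLS)
    y = frame[target]
    return X, y


# Hypothetical toy rows with made-up values, only to exercise the helper
toy = pd.DataFrame({
    "Type": ["L", "M"],
    "Torque [Nm]": [40.0, 55.0],
    "Target": [0, 1],
    "Failure Type": ["No Failure", "Power Failure"],
})
X, y = split_features_targets(toy)
print(list(X.columns))  # ['Type', 'Torque [Nm]']
```

To predict the failure type instead, call `split_features_targets(df, target="Failure Type")`. In both cases X contains neither label.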

1. Type: The product quality variant, encoded as L, M or H (low, medium, high).

2. Air temperature [°C]: The temperature of the surrounding air, converted to degrees Celsius from the kelvin values in the raw CSV.

3. Process temperature [°C]: The temperature of the manufacturing process, likewise converted from kelvin to degrees Celsius.

4. Rotational speed [rpm]: The rotational speed of the tool, in revolutions per minute.

5. Torque [Nm]: The torque applied during the manufacturing process, in newton meters.

6. Tool wear [min]: The tool wear, expressed in minutes of tool use.

7. Target: Binary label: 0 if the process completed without failure, 1 if it resulted in a failure.

8. Failure Type: The type of failure that occurred during the manufacturing process, or "No Failure" if none occurred.

9. Temperature difference [°C]: A derived feature, computed in this notebook as process temperature minus air temperature, in degrees Celsius.
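Models such as logistic regression and SVMs need numeric inputs, so the letter code in "Type" must be encoded. The sketch below assumes that L, M and H are ordered quality variants (low < medium < high) and maps them to 0, 1 and 2. `TYPE_ORDER` and `encode_type` are hypothetical names and are not part of the original notebook.

```python
import pandas as pd

# Assumption: L < M < H is a meaningful order (low/medium/high quality variant)
TYPE_ORDER = {"L": 0, "M": 1, "H": 2}


def encode_type(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with 'Type' mapped to integers; unknown letters raise."""
    out = frame.copy()
    codes = out["Type"].map(TYPE_ORDER)
    if codes.isna().any():
        raise ValueError("unexpected value in 'Type'")
    out["Type"] = codes.astype(int)
    return out


toy = pd.DataFrame({"Type": ["L", "H", "M", "L"]})
print(encode_type(toy)["Type"].tolist())  # [0, 2, 1, 0]
```

If you would rather not assume an order, one-hot encoding with `pd.get_dummies(df, columns=["Type"])` is the usual alternative.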



In [1]:
import numpy as np 
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings("ignore")

In [2]:
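# Default plotting theme ('Lucida Calligraphy' must be installed; otherwise matplotlib falls back to its default font)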
sns.set_theme(palette='tab10',
              font='Lucida Calligraphy',
              font_scale=1.5,
              rc=None)

import matplotlib
matplotlib.rcParams.update({'font.size': 15})
plt.style.use('dark_background')

In [3]:
pd.set_option("display.max_columns",None)
pd.set_option("display.max_rows",None)

In [12]:
df=pd.read_csv("predictive_maintenance.csv")
df = df.drop(["UDI","Product ID"],axis=1)
df.sample(30).style.set_properties(
    **{
        'background-color': 'Brown',
        'color': 'white',
        'border-color': 'White'
    })

| | Type | Air temperature [K] | Process temperature [K] | Rotational speed [rpm] | Torque [Nm] | Tool wear [min] | Target | Failure Type |
|---|---|---|---|---|---|---|---|---|
| 9759 | L | 298.5 | 309.7 | 1613 | 36.7 | 0 | 0 | No Failure |
| 7100 | L | 300.7 | 310.2 | 1569 | 32.8 | 31 | 0 | No Failure |
| 322 | L | 297.8 | 308.5 | 1448 | 40.0 | 192 | 0 | No Failure |
| 695 | L | 297.6 | 309.0 | 1487 | 45.4 | 64 | 0 | No Failure |
| 5351 | M | 303.5 | 312.9 | 1530 | 34.5 | 110 | 0 | No Failure |
| 5560 | L | 302.4 | 311.9 | 1541 | 37.9 | 192 | 0 | No Failure |
| 8492 | M | 298.4 | 309.5 | 1514 | 38.8 | 128 | 0 | No Failure |
| 5595 | M | 302.7 | 312.0 | 1944 | 21.4 | 71 | 0 | No Failure |
| 1704 | M | 298.1 | 307.8 | 1503 | 44.7 | 54 | 0 | No Failure |
| 2743 | L | 299.6 | 309.1 | 1476 | 45.3 | 166 | 0 | No Failure |


In [5]:
# Convert temperatures from kelvin to degrees Celsius: T[°C] = T[K] - 273.15

df["Air temperature [K]"] = df["Air temperature [K]"] - 273.15
df["Process temperature [K]"] = df["Process temperature [K]"] - 273.15

# Rename the columns to reflect the new unit (°C)
df.rename(columns={"Air temperature [K]": "Air temperature [°C]",
                   "Process temperature [K]": "Process temperature [°C]"},
          inplace=True)

In [6]:
# Derived feature: process temperature minus air temperature
df["Temperature difference [°C]"] = df["Process temperature [°C]"] - df["Air temperature [°C]"]
df.sample(6)

| | Type | Air temperature [°C] | Process temperature [°C] | Rotational speed [rpm] | Torque [Nm] | Tool wear [min] | Target | Failure Type | Temperature difference [°C] |
|---|---|---|---|---|---|---|---|---|---|
| 1985 | L | 24.95 | 34.55 | 1537 | 40.8 | 169 | 0 | No Failure | 9.6 |
| 7903 | L | 27.65 | 39.15 | 1435 | 44.8 | 134 | 0 | No Failure | 11.5 |
| 4993 | L | 30.65 | 39.75 | 1819 | 23.7 | 16 | 0 | No Failure | 9.1 |
| 3976 | H | 29.15 | 38.25 | 1372 | 46.0 | 86 | 0 | No Failure | 9.1 |
| 4873 | M | 30.55 | 39.25 | 1474 | 38.8 | 139 | 0 | No Failure | 8.7 |
| 8012 | M | 27.75 | 38.85 | 1375 | 56.1 | 182 | 0 | No Failure | 11.1 |
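The kelvin-to-Celsius offset is 273.15 (0 °C = 273.15 K). Both scales use the same degree size, so a temperature difference is identical in K and °C, and the derived "Temperature difference" feature does not depend on the offset. The small sketch below checks both facts using illustrative values, not rows from the dataset. `kelvin_to_celsius` is a hypothetical helper.

```python
import pandas as pd

KELVIN_OFFSET = 273.15  # 0 °C corresponds to 273.15 K


def kelvin_to_celsius(kelvin):
    """Convert a scalar or a pandas Series from kelvin to degrees Celsius."""
    return kelvin - KELVIN_OFFSET


# Illustrative values, not rows from the dataset
air_k = pd.Series([298.1, 300.0])
process_k = pd.Series([307.7, 310.5])

air_c = kelvin_to_celsius(air_k)
process_c = kelvin_to_celsius(process_k)

# Same degree size on both scales: the offset cancels in a difference
diff_c = process_c - air_c
diff_k = process_k - air_k

print(round(kelvin_to_celsius(298.15), 2))            # 25.0
print(bool(((diff_c - diff_k).abs() < 1e-9).all()))  # True
```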


In [7]:
display(df.shape)
display(df.size)

(10000, 9)

90000

In [8]:
df.info() 

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 9 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   Type                         10000 non-null  object 
 1   Air temperature [°C]         10000 non-null  float64
 2   Process temperature [°C]     10000 non-null  float64
 3   Rotational speed [rpm]       10000 non-null  int64  
 4   Torque [Nm]                  10000 non-null  float64
 5   Tool wear [min]              10000 non-null  int64  
 6   Target                       10000 non-null  int64  
 7   Failure Type                 10000 non-null  object 
 8   Temperature difference [°C]  10000 non-null  float64
dtypes: float64(4), int64(3), object(2)
memory usage: 703.3+ KB
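These outputs are consistent with each other. `df.size` counts cells, so 10000 rows × 9 columns = 90000, and `df.info()` reports 10000 non-null values in every column, meaning there is no missing data. The minimal sketch below wraps the same checks in a reusable function (`quick_checks` is a hypothetical name) and demonstrates it on a toy frame that does contain missing values.

```python
import numpy as np
import pandas as pd


def quick_checks(frame: pd.DataFrame) -> dict:
    """Summarise what df.shape, df.size and df.info() report."""
    n_rows, n_cols = frame.shape
    return {
        "rows": n_rows,
        "cols": n_cols,
        # DataFrame.size counts cells, i.e. rows * columns
        "size_matches": frame.size == n_rows * n_cols,
        # Total number of missing cells across all columns
        "missing": int(frame.isna().sum().sum()),
    }


# Toy frame with one NaN and one None, purely for illustration
toy = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", "y", None]})
print(quick_checks(toy))  # {'rows': 3, 'cols': 2, 'size_matches': True, 'missing': 2}
```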


In [9]:
df.columns

Index(['Type', 'Air temperature [°C]', 'Process temperature [°C]',
       'Rotational speed [rpm]', 'Torque [Nm]', 'Tool wear [min]', 'Target',
       'Failure Type', 'Temperature difference [°C]'],
      dtype='object')

In [10]:
for col in ['Type', 'Target', 'Failure Type']:
    print(df[col].value_counts())
    print("****" * 8)

Type
L    6000
M    2997
H    1003
Name: count, dtype: int64
********************************
Target
0    9661
1     339
Name: count, dtype: int64
********************************
Failure Type
No Failure                  9652
Heat Dissipation Failure     112
Power Failure                 95
Overstrain Failure            78
Tool Wear Failure             45
Random Failures               18
Name: count, dtype: int64
********************************
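The two targets do not line up one-to-one. "Target" has 9661 zeros while "Failure Type" has 9652 "No Failure" rows, so some rows must carry contradictory labels. A cross-tabulation shows where the labels disagree, and a boolean filter extracts those rows. The sketch runs on made-up toy rows. On the real data you would call `pd.crosstab(df["Target"], df["Failure Type"])` and `label_mismatches(df)`, where `label_mismatches` is a hypothetical helper.

```python
import pandas as pd


def label_mismatches(frame: pd.DataFrame) -> pd.DataFrame:
    """Rows where 'Target' and 'Failure Type' disagree.

    Inconsistent rows are flagged as a failure (Target == 1) but typed
    "No Failure", or flagged as no failure (Target == 0) but given a
    failure type.
    """
    no_failure = frame["Failure Type"].eq("No Failure")
    failed = frame["Target"].eq(1)
    return frame[(failed & no_failure) | (~failed & ~no_failure)]


# Made-up toy rows: index 2 and 3 are deliberately inconsistent
toy = pd.DataFrame({
    "Target": [0, 1, 1, 0],
    "Failure Type": ["No Failure", "Power Failure", "No Failure", "Random Failures"],
})
print(pd.crosstab(toy["Target"], toy["Failure Type"]))
print(label_mismatches(toy).index.tolist())  # [2, 3]
```

The same counts also show strong class imbalance: 339 failures in 10000 rows (3.39%). A model that always predicts 0 would already reach 96.61% accuracy, which is worth keeping in mind when comparing classifiers.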
