
# Predictive Maintenance of Machines

## Introduction

Import the necessary libraries:

In [1]:
import numpy
import pandas as pd

Import the database from GitHub:

In [39]:
url = 'https://raw.githubusercontent.com/AlvaroMAlves/Pred-Maintenance-Machine/main/predictive%20maintenance%20of%20machines.csv'
df = pd.read_csv(url)

Check the created dataframe:

In [40]:
df.head()

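| | UDI | Product ID | Type | Air temperature [K] | Process temperature [K] | Rotational speed [rpm] | Torque [Nm] | Tool wear [min] | Machine failure | TWF | HDF | PWF | OSF | RNF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | M14860 | M | 298.1 | 308.6 | 1551 | 42.8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2 | L47181 | L | 298.2 | 308.7 | 1408 | 46.3 | 3 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 3 | L47182 | L | 298.1 | 308.5 | 1498 | 49.4 | 5 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 4 | L47183 | L | 298.2 | 308.6 | 1433 | 39.5 | 7 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 5 | L47184 | L | 298.2 | 308.7 | 1408 | 40.0 | 9 | 0 | 0 | 0 | 0 | 0 | 0 |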


The column analysis is as follows:

Each record has a "Product ID" and a "Type" column. "Type" is an ordinal categorical variable indicating quality, classified as Low (L), Medium (M), and High (H).

Data acquisition follows a time series. Observed data includes air temperature, process temperature, rotational speed, torque, and tool wear.

Starting from the "Machine failure" column, the remaining columns are binary flags: "Machine failure" indicates whether a failure occurred, and the columns after it specify the type of failure:

*   Tool wear failure (TWF)
*   Heat dissipation failure (HDF)
*   Power failure (PWF)
*   Overstrain failure (OSF)
*   Random failures (RNF)
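
Because the "Type" column described above is ordinal, it may help to encode its order explicitly. The following is a minimal sketch on a toy frame with made-up values (the names `toy`, `quality_order`, `at_least_medium`, and `type_codes` are illustrative and not part of the notebook), assuming the order Low < Medium < High:

```python
import pandas as pd

# Toy data (hypothetical values) mimicking the "Type" column.
toy = pd.DataFrame({"Type": ["M", "L", "L", "H", "M"]})

# Declare the quality levels in ascending order: Low < Medium < High.
quality_order = ["L", "M", "H"]
toy["Type"] = pd.Categorical(toy["Type"], categories=quality_order, ordered=True)

# Ordered categoricals support comparisons against a category label.
at_least_medium = toy["Type"] >= "M"
print(at_least_medium.tolist())

# Integer codes follow the declared order (0 = L, 1 = M, 2 = H).
type_codes = toy["Type"].cat.codes
print(type_codes.tolist())
```

An ordered categorical keeps the original labels readable while still allowing comparisons and numeric codes for later modeling.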


Rename the column "UDI" to "UID" and set "UID" as the dataframe index:

In [41]:
df.rename(columns={'UDI': 'UID'}, inplace=True)
df.set_index('UID', inplace=True)
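
The same step can also be written without `inplace=True`, since `rename` and `set_index` each return a new dataframe. A minimal sketch on a toy frame with made-up values (`toy` and `toy_indexed` are illustrative names, not part of the notebook):

```python
import pandas as pd

# Toy frame (hypothetical values) that reuses the "UDI" column name.
toy = pd.DataFrame({"UDI": [1, 2, 3], "Torque [Nm]": [42.8, 46.3, 49.4]})

# Chain the two calls; the original frame is left unchanged.
toy_indexed = toy.rename(columns={"UDI": "UID"}).set_index("UID")

print(toy_indexed.index.name)
print(list(toy.columns))
```

Chaining avoids mutating the source frame, which can make notebook cells safer to re-run out of order.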

In [42]:
df.head()

| UID | Product ID | Type | Air temperature [K] | Process temperature [K] | Rotational speed [rpm] | Torque [Nm] | Tool wear [min] | Machine failure | TWF | HDF | PWF | OSF | RNF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | M14860 | M | 298.1 | 308.6 | 1551 | 42.8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | L47181 | L | 298.2 | 308.7 | 1408 | 46.3 | 3 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | L47182 | L | 298.1 | 308.5 | 1498 | 49.4 | 5 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | L47183 | L | 298.2 | 308.6 | 1433 | 39.5 | 7 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | L47184 | L | 298.2 | 308.7 | 1408 | 40.0 | 9 | 0 | 0 | 0 | 0 | 0 | 0 |


## Exploratory Data Analysis

### Overview of the columns:

In [43]:
# df.info() prints its report directly and returns None, so no print() is needed
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10000 entries, 1 to 10000
Data columns (total 13 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Product ID               10000 non-null  object 
 1   Type                     10000 non-null  object 
 2   Air temperature [K]      10000 non-null  float64
 3   Process temperature [K]  10000 non-null  float64
 4   Rotational speed [rpm]   10000 non-null  int64  
 5   Torque [Nm]              10000 non-null  float64
 6   Tool wear [min]          10000 non-null  int64  
 7   Machine failure          10000 non-null  int64  
 8   TWF                      10000 non-null  int64  
 9   HDF                      10000 non-null  int64  
 10  PWF                      10000 non-null  int64  
 11  OSF                      10000 non-null  int64  
 12  RNF                      10000 non-null  int64  
dtypes: float64(3), int64(8), object(2)
memory usage: 1.1+ MB


We can observe that there are 10,000 rows and no null values in the dataset. Additionally, the columns "Air temperature", "Process temperature", and "Torque" are continuous numeric variables (float64), while "Rotational speed" and "Tool wear" contain discrete integer values (int64).
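
These observations can also be checked programmatically instead of being read off the `info()` report. A minimal sketch on a toy frame with made-up values (`toy`, `total_nulls`, `float_cols`, and `int_cols` are illustrative names, not part of the notebook):

```python
import pandas as pd

# Toy frame (hypothetical values) with one float, one integer and one text column.
toy = pd.DataFrame({
    "Air temperature [K]": [298.1, 298.2, 298.1],
    "Rotational speed [rpm]": [1551, 1408, 1498],
    "Type": ["M", "L", "L"],
})

# Total number of missing values across all columns.
total_nulls = int(toy.isna().sum().sum())

# Split the numeric columns into continuous (float) and discrete (integer).
float_cols = [c for c in toy.columns if pd.api.types.is_float_dtype(toy[c])]
int_cols = [c for c in toy.columns if pd.api.types.is_integer_dtype(toy[c])]

print("nulls:", total_nulls)
print("float columns:", float_cols)
print("integer columns:", int_cols)
```

Applied to `df`, the same calls would list the continuous and discrete columns without relying on a visual scan.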

### Descriptive Statistics Analysis

In [44]:
print(df.describe())

       Air temperature [K]  Process temperature [K]  Rotational speed [rpm]  \
count         10000.000000             10000.000000            10000.000000   
mean            300.004930               310.005560             1538.776100   
std               2.000259                 1.483734              179.284096   
min             295.300000               305.700000             1168.000000   
25%             298.300000               308.800000             1423.000000   
50%             300.100000               310.100000             1503.000000   
75%             301.500000               311.100000             1612.000000   
max             304.500000               313.800000             2886.000000   

        Torque [Nm]  Tool wear [min]  Machine failure           TWF  \
count  10000.000000     10000.000000     10000.000000  10000.000000   
mean      39.986910       107.951000         0.033900      0.004600   
std        9.968934        63.654147         0.180981      0.067671   
min 

The dataset doesn't seem to contain any inconsistent values: all the binary variables have a minimum value of 0 and a maximum value of 1. Let's count the number of failures and failure types:

In [45]:
column_counts = df.loc[df['Machine failure'] == 1, 'Machine failure':'RNF'].sum()
print(column_counts)

Machine failure    339
TWF                 46
HDF                115
PWF                 95
OSF                 98
RNF                  1
dtype: int64
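
Summing these columns only yields valid counts if every flag is strictly 0 or 1, so that assumption can be verified directly. A minimal sketch on a toy frame with made-up values (`toy`, `flag_cols`, `is_binary`, and `all_binary` are illustrative names, not part of the notebook):

```python
import pandas as pd

flag_cols = ["Machine failure", "TWF", "HDF", "PWF", "OSF", "RNF"]

# Toy frame (hypothetical values) with the same flag columns as the dataset.
toy = pd.DataFrame(
    [
        [0, 0, 0, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 0, 0, 1, 1, 0],
    ],
    columns=flag_cols,
)

# For each column, check that every value is either 0 or 1.
is_binary = toy[flag_cols].isin([0, 1]).all()
all_binary = bool(is_binary.all())

print(is_binary)
print("all flags binary:", all_binary)
```

Unlike a min/max check, `isin` would also catch unexpected values between 0 and 1.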


The total number of failures is 339, but the failure type columns sum to 355 (46 + 115 + 95 + 98 + 1). Let's see if there are rows with 2 or more failure types:

In [48]:
rows_with_two_failures = df[(df['Machine failure'] == 1) & (df[['TWF', 'HDF', 'PWF', 'OSF', 'RNF']].sum(axis=1) > 1)]

# rows with 2 or more failure types
selected_columns = ['Machine failure', 'TWF', 'HDF', 'PWF', 'OSF', 'RNF']
print(rows_with_two_failures[selected_columns])

# rows count
count_rows = rows_with_two_failures.shape[0]
print("\n","Rows Count:", count_rows)

      Machine failure  TWF  HDF  PWF  OSF  RNF
UID                                           
70                  1    0    0    1    1    0
1325                1    0    0    1    1    0
1497                1    0    0    1    1    0
3612                1    1    0    0    0    1
3855                1    0    0    1    1    0
3944                1    0    0    1    1    0
4255                1    0    1    1    0    0
4343                1    0    1    1    0    0
4371                1    0    1    0    1    0
4384                1    0    1    0    1    0
4418                1    0    1    1    0    0
4463                1    0    1    0    1    0
4643                1    0    1    0    1    0
4644                1    0    1    0    1    0
4730                1    0    1    0    1    0
5395                1    0    0    1    1    0
5402                1    1    0    0    1    0
5910                1    1    0    1    1    0
6249                1    0    0    1    1    0
7084         

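There are 23 rows with 2 failure types and 1 row with 3 failure types, but the numbers don't match yet: removing the double-counted flags gives 355 - 23 - 2 = 330, which is still short of 339. There must be some failure rows without any specified failure type: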

In [51]:
rows_without_failures = df[(df['Machine failure'] == 1) & (df[['TWF', 'HDF', 'PWF', 'OSF', 'RNF']].sum(axis=1) == 0)]

# rows without any specified failure types
print(rows_without_failures[selected_columns])

# rows count
count_zero = rows_without_failures.shape[0]
print("\n","Rows Count:", count_zero)

      Machine failure  TWF  HDF  PWF  OSF  RNF
UID                                           
1438                1    0    0    0    0    0
2750                1    0    0    0    0    0
4045                1    0    0    0    0    0
4685                1    0    0    0    0    0
5537                1    0    0    0    0    0
5942                1    0    0    0    0    0
6479                1    0    0    0    0    0
8507                1    0    0    0    0    0
9016                1    0    0    0    0    0

 Rows Count: 9


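The counts now reconcile. The failure type flags sum to 355, but rows with several flags are counted more than once: the 23 rows with 2 failure types contribute 23 extra flags, and the row with 3 failure types contributes 2 more. The 9 failure rows without any failure type contribute no flags at all. Removing the 25 extra flags and adding the 9 unflagged rows gives 355 - 25 + 9 = 339, which matches the total count of machine failures exactly.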
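
The same reconciliation can be obtained in one pass by counting how many failure types are flagged per failure row. A minimal sketch on a toy frame with made-up values (all variable names here are illustrative, and the toy numbers are not the dataset's):

```python
import pandas as pd

type_cols = ["TWF", "HDF", "PWF", "OSF", "RNF"]

# Toy failure rows (hypothetical values), one row per machine failure.
toy = pd.DataFrame(
    [
        [1, 0, 0, 0, 0],  # one failure type
        [0, 1, 0, 0, 0],  # one failure type
        [0, 0, 1, 1, 0],  # two failure types
        [1, 0, 1, 1, 0],  # three failure types
        [0, 0, 0, 0, 0],  # failure with no type flagged
    ],
    columns=type_cols,
)
toy["Machine failure"] = 1

failures = toy[toy["Machine failure"] == 1]
types_per_row = failures[type_cols].sum(axis=1)

# How many failure rows have 0, 1, 2, ... types flagged.
distribution = types_per_row.value_counts().sort_index()
print(distribution)

# Reconcile the flag total with the number of failure rows.
total_flags = int(types_per_row.sum())
extra_flags = int((types_per_row - 1).clip(lower=0).sum())  # flags beyond the first
unflagged = int((types_per_row == 0).sum())                 # rows with no flag
reconstructed_failures = total_flags - extra_flags + unflagged
print(reconstructed_failures == len(failures))
```

On the real dataframe, `distribution` would show the rows with 0, 2, and 3 failure types in a single table, replacing the two separate filters.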