# About Dataset
<h2> Context</h2>
This is an example data source that can be used for building predictive maintenance models. It consists of the following data:

- Machine conditions and usage: The operating conditions of a machine e.g. data collected from sensors.
- Failure history: The failure history of a machine or component within the machine.
- Maintenance history: The repair history of a machine, e.g. error codes, previous maintenance activities or component replacements.
- Machine features: The features of a machine, e.g. engine size, make and model, location.
<h2> Details</h2>
- Telemetry Time Series Data (PdM_telemetry.csv): Hourly averages of voltage, rotation, pressure, and vibration collected from 100 machines for the year 2015.

- Error (PdM_errors.csv): Errors encountered by the machines while in operating condition. Because these errors do not shut down the machines, they are not considered failures. The error dates and times are rounded to the closest hour because the telemetry data is collected at an hourly rate.

- Maintenance (PdM_maint.csv): Each record captures the replacement of a machine component. Components are replaced in two situations: (1) a technician replaces the component during a regular scheduled visit (proactive maintenance), or (2) a component breaks down and a technician performs unscheduled maintenance to replace it (reactive maintenance). The second case is considered a failure, and the corresponding data is also captured under Failures. Maintenance data contains both 2014 and 2015 records. This data is rounded to the closest hour because the telemetry data is collected at an hourly rate.

- Failures (PdM_failures.csv): Each record represents the replacement of a component due to failure. This data is a subset of the Maintenance data. It is rounded to the closest hour because the telemetry data is collected at an hourly rate.

- Metadata of Machines (PdM_machines.csv): Model type and age of each machine.
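
The relationship between the Maintenance and Failures tables described above can be used to tell reactive replacements from proactive ones. The following is a minimal sketch on hand-made rows (the sample values are hypothetical, not taken from the dataset). It assumes the column names `datetime`, `machineID`, `comp`, and `failure`, and that a failure record matches its maintenance record exactly on time, machine, and component. The helper name `label_maintenance` is illustrative.

```python
import numpy as np
import pandas as pd


def label_maintenance(maint: pd.DataFrame, failures: pd.DataFrame) -> pd.DataFrame:
    """Tag each maintenance record as 'reactive' (also a failure) or 'proactive'."""
    # Rename so both tables share the same three key columns.
    keys = failures.rename(columns={"failure": "comp"})[["datetime", "machineID", "comp"]]
    keys = keys.drop_duplicates()
    # indicator=True adds a "_merge" column: "both" means the record is also a failure.
    merged = maint.merge(keys, on=["datetime", "machineID", "comp"], how="left", indicator=True)
    merged["maint_type"] = np.where(merged["_merge"] == "both", "reactive", "proactive")
    return merged.drop(columns="_merge")


# Hypothetical sample rows, for illustration only.
maint = pd.DataFrame({
    "datetime": ["2015-01-05 06:00:00", "2015-01-20 06:00:00"],
    "machineID": [1, 1],
    "comp": ["comp4", "comp1"],
})
failures = pd.DataFrame({
    "datetime": ["2015-01-05 06:00:00"],
    "machineID": [1],
    "failure": ["comp4"],
})

labelled = label_maintenance(maint, failures)
print(labelled)
```

If the subset claim holds on the real files, the number of rows labelled `reactive` should equal the number of distinct failure records.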

<h2> Acknowledgements</h2>
This dataset was available as part of the Azure AI Notebooks for Predictive Maintenance. As of 15 October 2020 the notebook is no longer available. However, the data can still be downloaded using the following URLs:

https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_telemetry.csv
https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_errors.csv
https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_maint.csv
https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_failures.csv
https://azuremlsampleexperiments.blob.core.windows.net/datasets/PdM_machines.csv
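
As a rough sketch, the five files could also be read directly from these URLs with pandas instead of being downloaded by hand. This assumes the URLs are still reachable, which is not guaranteed. The names `BASE_URL`, `TABLES`, `urls`, and `load_table` are illustrative, and the block itself makes no network call until `load_table` is invoked.

```python
import pandas as pd

BASE_URL = "https://azuremlsampleexperiments.blob.core.windows.net/datasets"
TABLES = ["telemetry", "errors", "maint", "failures", "machines"]

# Map a short table name to its download URL, e.g. "errors" -> ".../PdM_errors.csv".
urls = {name: f"{BASE_URL}/PdM_{name}.csv" for name in TABLES}


def load_table(name: str) -> pd.DataFrame:
    """Read one table straight from its URL (requires network access)."""
    return pd.read_csv(urls[name])


for name, url in urls.items():
    print(name, url)
```

For example, `load_table("machines")` would fetch the machine metadata, provided the host still serves the file.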

<h2> Inspiration</h2>
Try to use this data to build Machine Learning models related to Predictive Maintenance.

In [1]:
# import packages
import pandas as pd
import numpy as np

In [5]:
Error_dt = pd.read_csv("Dataset/PdM_errors.csv")
Failures_dt = pd.read_csv("Dataset/PdM_failures.csv")
Machines_dt = pd.read_csv("Dataset/PdM_machines.csv")
Maint_dt = pd.read_csv("Dataset/PdM_maint.csv")
Telemetry_dt = pd.read_csv("Dataset/PdM_telemetry.csv")
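
By default `read_csv` leaves the `datetime` column as plain strings. A minimal sketch of converting it after loading, shown on a hypothetical two-row frame in the same shape as PdM_errors.csv. The format string is an assumption based on timestamps such as `2015-01-03 07:00:00`.

```python
import pandas as pd

# Hypothetical rows in the same shape as PdM_errors.csv.
errors = pd.DataFrame({
    "datetime": ["2015-01-03 07:00:00", "2015-01-03 20:00:00"],
    "machineID": [1, 1],
    "errorID": ["error1", "error3"],
})

# Convert the string timestamps to a datetime dtype.
errors["datetime"] = pd.to_datetime(errors["datetime"], format="%Y-%m-%d %H:%M:%S")

print(errors.dtypes)
print(errors["datetime"].dt.hour.tolist())
```

The same conversion would apply to the `datetime` column of the telemetry, maintenance, and failures frames. With a datetime dtype, sorting, resampling, and time-based joins behave as expected.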

In [22]:
# display the information
def info(x):
    """Display the key information about a DataFrame."""
    print("================================")
    print("Display the Info")
    print(x.info())
    print()
    print("================================")
    print("Display the columns and row number")
    row, column = x.shape
    print(f"Rows = {row}, columns = {column}")
    print()
    print("================================")
    print("Display sample")
    print(x.sample(5))
    print()
    print("================================")
    print("Display the stats")
    print(x.describe())
    print()
    print("================================")
    print("Display the null values")
    print(x.isna().sum())
    print()
    print("================================")
    print("Display the columns")
    print(x.columns)

In [23]:
# Error dataset info
info(Error_dt)

Display the Info
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3919 entries, 0 to 3918
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   datetime   3919 non-null   object
 1   machineID  3919 non-null   int64 
 2   errorID    3919 non-null   object
dtypes: int64(1), object(2)
memory usage: 92.0+ KB
None

Display the columns and row number
Rows = 3919, columns = 3

Display sample
                 datetime  machineID errorID
3761  2015-05-28 08:00:00         97  error4
3381  2015-10-23 18:00:00         87  error2
3107  2015-01-01 06:00:00         81  error1
830   2015-02-19 09:00:00         22  error1
1778  2015-07-31 13:00:00         47  error2

Display the stats
         machineID
count  3919.000000
mean     51.044654
std      28.954988
min       1.000000
25%      25.000000
50%      51.000000
75%      77.000000
max     100.000000

Display the null values
datetime     0
machineID    0
errorID      0
dtype: int64

Displ

In [24]:
# Failures dataset info
info(Failures_dt)

Display the Info
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 761 entries, 0 to 760
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   datetime   761 non-null    object
 1   machineID  761 non-null    int64 
 2   failure    761 non-null    object
dtypes: int64(1), object(2)
memory usage: 18.0+ KB
None

Display the columns and row number
Rows = 761, columns = 3

Display sample
                datetime  machineID failure
270  2015-09-04 06:00:00         36   comp2
678  2015-05-20 06:00:00         92   comp3
246  2015-04-01 06:00:00         33   comp4
34   2015-05-24 06:00:00          7   comp2
686  2015-04-16 06:00:00         93   comp1

Display the stats
        machineID
count  761.000000
mean    51.911958
std     29.515542
min      1.000000
25%     24.000000
50%     51.000000
75%     79.000000
max    100.000000

Display the null values
datetime     0
machineID    0
failure      0
dtype: int64

Display the columns
Ind

In [25]:
# Machine dataset info
info(Machines_dt)

Display the Info
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   machineID  100 non-null    int64 
 1   model      100 non-null    object
 2   age        100 non-null    int64 
dtypes: int64(2), object(1)
memory usage: 2.5+ KB
None

Display the columns and row number
Rows = 100, columns = 3

Display sample
    machineID   model  age
90         91  model4   17
34         35  model1   17
40         41  model4    9
37         38  model4   15
80         81  model4    1

Display the stats
        machineID         age
count  100.000000  100.000000
mean    50.500000   11.330000
std     29.011492    5.856974
min      1.000000    0.000000
25%     25.750000    6.750000
50%     50.500000   12.000000
75%     75.250000   16.000000
max    100.000000   20.000000

Display the null values
machineID    0
model        0
age          0
dtype: int64

Display the columns

In [26]:
# Maint dataset info
info(Maint_dt)

Display the Info
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3286 entries, 0 to 3285
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   datetime   3286 non-null   object
 1   machineID  3286 non-null   int64 
 2   comp       3286 non-null   object
dtypes: int64(1), object(2)
memory usage: 77.1+ KB
None

Display the columns and row number
Rows = 3286, columns = 3

Display sample
                 datetime  machineID   comp
2272  2015-10-17 06:00:00         69  comp4
921   2015-11-27 06:00:00         28  comp4
1841  2014-11-28 06:00:00         57  comp1
897   2014-09-29 06:00:00         28  comp2
1455  2015-09-12 06:00:00         44  comp1

Display the stats
         machineID
count  3286.000000
mean     50.284236
std      28.914478
min       1.000000
25%      25.250000
50%      50.000000
75%      75.000000
max     100.000000

Display the null values
datetime     0
machineID    0
comp         0
dtype: int64

Display the

In [27]:
# Telemetry dataset info
info(Telemetry_dt)

Display the Info
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 876100 entries, 0 to 876099
Data columns (total 6 columns):
 #   Column     Non-Null Count   Dtype  
---  ------     --------------   -----  
 0   datetime   876100 non-null  object 
 1   machineID  876100 non-null  int64  
 2   volt       876100 non-null  float64
 3   rotate     876100 non-null  float64
 4   pressure   876100 non-null  float64
 5   vibration  876100 non-null  float64
dtypes: float64(4), int64(1), object(1)
memory usage: 40.1+ MB
None

Display the columns and row number
Rows = 876100, columns = 6

Display sample
                   datetime  machineID        volt      rotate    pressure  \
230376  2015-04-19 04:00:00         27  146.060066  515.247434   98.345132   
208405  2015-10-15 20:00:00         24  145.228771  513.302316   95.970176   
577924  2015-12-19 17:00:00         66  158.366992  491.777599   89.816319   
832727  2015-01-19 06:00:00         96  163.410558  371.649443  104.088656   
1897    

In [29]:
# stack all five tables row-wise; columns missing from a table are filled with NaN
combined_data = pd.concat([Error_dt, Failures_dt, Machines_dt, Maint_dt, Telemetry_dt], ignore_index=True)

In [30]:
combined_data

Unnamed: 0,datetime,machineID,errorID,failure,model,age,comp,volt,rotate,pressure,vibration
0,2015-01-03 07:00:00,1,error1,,,,,,,,
1,2015-01-03 20:00:00,1,error3,,,,,,,,
2,2015-01-04 06:00:00,1,error5,,,,,,,,
3,2015-01-10 15:00:00,1,error4,,,,,,,,
4,2015-01-22 10:00:00,1,error4,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...
884161,2016-01-01 02:00:00,100,,,,,,179.438162,395.222827,102.290715,50.771941
884162,2016-01-01 03:00:00,100,,,,,,189.617555,446.207972,98.180607,35.123072
884163,2016-01-01 04:00:00,100,,,,,,192.483414,447.816524,94.132837,48.314561
884164,2016-01-01 05:00:00,100,,,,,,165.475310,413.771670,104.081073,44.835259
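
Because the five tables share only some columns, stacking them row-wise leaves most cells empty. One alternative, sketched here under the assumption that `datetime` and `machineID` together identify a telemetry reading, is to join the other tables onto telemetry by key. The rows are hypothetical and the name `features` is illustrative.

```python
import pandas as pd

# Hypothetical rows, for illustration only.
telemetry = pd.DataFrame({
    "datetime": pd.to_datetime(["2015-01-01 06:00:00", "2015-01-01 07:00:00"]),
    "machineID": [1, 1],
    "volt": [176.2, 162.9],
})
errors = pd.DataFrame({
    "datetime": pd.to_datetime(["2015-01-01 07:00:00"]),
    "machineID": [1],
    "errorID": ["error1"],
})
machines = pd.DataFrame({"machineID": [1], "model": ["model3"], "age": [18]})

# Left joins keep every telemetry row; unmatched hours get NaN in errorID.
features = (
    telemetry
    .merge(errors, on=["datetime", "machineID"], how="left")
    .merge(machines, on="machineID", how="left")
)
print(features)
```

If a machine logs several errors in the same hour, the left join repeats that telemetry row once per error, so the errors may need to be aggregated per hour first.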


In [35]:
# outer join of errors and failures on machineID; each error is paired with every failure of the same machine
Error_failure = pd.merge(Error_dt, Failures_dt, on="machineID", how="outer")
Error_failure

Unnamed: 0,datetime_x,machineID,errorID,datetime_y,failure
0,2015-01-03 07:00:00,1,error1,2015-01-05 06:00:00,comp4
1,2015-01-03 07:00:00,1,error1,2015-03-06 06:00:00,comp1
2,2015-01-03 07:00:00,1,error1,2015-04-20 06:00:00,comp2
3,2015-01-03 07:00:00,1,error1,2015-06-19 06:00:00,comp4
4,2015-01-03 07:00:00,1,error1,2015-09-02 06:00:00,comp4
...,...,...,...,...,...
31182,2015-12-08 06:00:00,100,error3,2015-09-10 06:00:00,comp1
31183,2015-12-08 06:00:00,100,error3,2015-12-09 06:00:00,comp2
31184,2015-12-22 03:00:00,100,error3,2015-02-12 06:00:00,comp1
31185,2015-12-22 03:00:00,100,error3,2015-09-10 06:00:00,comp1
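
Joining on `machineID` alone pairs every error with every failure of the same machine, including failures that happened earlier than the error. If the goal is to relate each error to the next failure on that machine, `pd.merge_asof` is one option. The sketch below uses hypothetical rows and assumes the `datetime` columns have already been converted to a datetime dtype. The names `failure_time` and `next_failure` are illustrative.

```python
import pandas as pd

# Hypothetical rows, for illustration only.
errors = pd.DataFrame({
    "datetime": pd.to_datetime(["2015-01-03 07:00:00", "2015-01-10 15:00:00"]),
    "machineID": [1, 1],
    "errorID": ["error1", "error4"],
})
failures = pd.DataFrame({
    "datetime": pd.to_datetime(["2015-01-05 06:00:00", "2015-03-06 06:00:00"]),
    "machineID": [1, 1],
    "failure": ["comp4", "comp1"],
})

# Keep the failure timestamp under its own name so it survives the join.
failures = failures.assign(failure_time=failures["datetime"])

# merge_asof requires both frames to be sorted by the "on" column.
next_failure = pd.merge_asof(
    errors.sort_values("datetime"),
    failures.sort_values("datetime"),
    on="datetime",
    by="machineID",
    direction="forward",  # match the first failure at or after each error
)
print(next_failure)
```

Each error row appears once, so the result has as many rows as the errors table. Errors with no later failure on the same machine get missing values in `failure` and `failure_time`.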
