# Predictive Maintenance


Real predictive maintenance datasets are generally difficult to obtain and, in particular, difficult to publish. We therefore present and provide a synthetic dataset that, to the best of our knowledge, reflects real predictive maintenance data encountered in industry.

The dataset consists of 10 000 data points stored as rows, with 14 features in columns:



*   UID (stored in the `UDI` column): unique identifier ranging from 1 to 10000
*   ProductID: consisting of a letter L, M, or H for low (50% of all products), medium (30%), and high (20%) as product quality variants, and a variant-specific serial number



*   Air temperature [K]: generated using a random walk process later normalized to a standard deviation of 2 K around 300 K
*   Process temperature [K]: generated using a random walk process normalized to a standard deviation of 1 K, added to the air temperature plus 10 K.




*   Rotational speed [rpm]: calculated from a power of 2860 W, overlaid with normally distributed noise
*   Torque [Nm]: torque values are normally distributed around 40 Nm with σ = 10 Nm and no negative values.



*   Tool wear [min]: The quality variants H/M/L add 5/3/2 minutes of tool wear to the used tool in the process.
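
The generation recipe for the temperature and torque features can be illustrated with a short sketch. This is not the dataset authors' generator: the seed, the helper name `normalized_random_walk`, and clipping torque at zero to avoid negative values are all assumptions made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
n = 10_000

def normalized_random_walk(n, std, rng):
    """Random walk rescaled to zero mean and the requested standard deviation."""
    walk = np.cumsum(rng.normal(size=n))
    return (walk - walk.mean()) / walk.std() * std

# Air temperature: random walk with a standard deviation of 2 K around 300 K
air_temp = 300.0 + normalized_random_walk(n, 2.0, rng)

# Process temperature: air temperature plus 10 K plus a second walk (std 1 K)
process_temp = air_temp + 10.0 + normalized_random_walk(n, 1.0, rng)

# Torque: normal around 40 Nm with sigma = 10 Nm; clipping at zero is an
# assumption about how negative values are avoided
torque = np.clip(rng.normal(40.0, 10.0, size=n), 0.0, None)

print(air_temp.mean(), air_temp.std())
```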

A 'machine failure' label indicates whether the machine has failed at this particular data point, i.e. whether any of the failure modes applies.
Important: there are two targets. Do not use one of them as a feature when predicting the other, as this leads to leakage.



*   Target : Failure or Not
*   Failure Type : Type of Failure
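
To make the leakage warning concrete, here is a minimal sketch on a hypothetical three-row frame. The values are made up and only the column names follow the dataset. Whichever target is being predicted, both target columns are removed from the feature matrix.

```python
import pandas as pd

# Hypothetical rows for illustration only; not taken from the dataset
df_demo = pd.DataFrame({
    "Torque [Nm]": [42.8, 46.3, 4.6],
    "Tool wear [min]": [0, 3, 143],
    "Target": [0, 0, 1],
    "Failure Type": ["No Failure", "No Failure", "Power Failure"],
})

target_columns = ["Target", "Failure Type"]

# Features must exclude BOTH targets, regardless of which one is predicted
X_demo = df_demo.drop(columns=target_columns)
y_demo = df_demo["Failure Type"]

print(list(X_demo.columns))
```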






In [45]:
import numpy as np
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.metrics import (
    precision_score, recall_score, f1_score, balanced_accuracy_score,
    confusion_matrix, matthews_corrcoef, roc_auc_score
)
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
# label_binarize lives in sklearn.preprocessing, not sklearn.metrics
from sklearn.preprocessing import LabelEncoder, label_binarize
from tabulate import tabulate

In [None]:
semilla = 7  # random seed, reused for the cross-validation splitter and SMOTE
df = pd.read_csv('predictive_maintenance.csv')
df.head()

In [None]:
LE = LabelEncoder()
df['Type'] = LE.fit_transform(df['Type'])
df['Type'].value_counts()

In [None]:
df['Failure Type'].value_counts()

In [None]:
# Separate the features from the label.
# 'Target' is dropped too: it is the second target and would leak the answer.

X = df.drop(columns = ['UDI', 'Product ID', 'Target', 'Failure Type'])
Y = df['Failure Type']

## Validation method

Stratified cross-validation

* SMOTE is applied within each fold
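
Resampling belongs inside each fold because the test fold should keep the original class balance; oversampling before splitting would let copies or interpolations of test rows reach the training data. The sketch below shows the pattern on made-up data. To stay dependency-light it uses plain random duplication of minority rows as a stand-in for SMOTE, and `oversample_minority` is a hypothetical helper, not part of imblearn.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(7)
X_demo = rng.normal(size=(100, 3))       # made-up features
y_demo = np.array([0] * 90 + [1] * 10)   # 90/10 class imbalance

def oversample_minority(X, y, rng):
    """Duplicate random rows of smaller classes until every class matches
    the largest one (a simple stand-in for SMOTE)."""
    classes, counts = np.unique(y, return_counts=True)
    idx_parts = []
    for cls, count in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        extra = rng.choice(cls_idx, size=counts.max() - count, replace=True)
        idx_parts.append(np.concatenate([cls_idx, extra]))
    idx = np.concatenate(idx_parts)
    return X[idx], y[idx]

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=7)
for train_idx, test_idx in skf.split(X_demo, y_demo):
    # Only the training fold is resampled
    X_train, y_train = oversample_minority(X_demo[train_idx], y_demo[train_idx], rng)
    y_test = y_demo[test_idx]
    # Training fold is balanced; test fold keeps the original 90/10 ratio
    print(np.bincount(y_train), np.bincount(y_test))
```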

In [None]:
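def matriz(y_test, y_pred):
  # Rows are true classes, columns are predicted classes.
  # Printing row by row works for any number of classes; unpacking
  # cm.ravel() into tn, fp, fn, tp only works for a binary target.
  cm = confusion_matrix(y_test, y_pred)
  print("-"*30)
  print("Confusion matrix:")
  for row in cm:
    print(" ".join(f"{value:>5}" for value in row))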

### Euclidean distance algorithm


In [None]:
def macro_medidas(y_true, y_pred, y_prob):
  classes = np.unique(y_true)

  recall = recall_score(y_true, y_pred, average = 'macro')
  b_acc = balanced_accuracy_score(y_true, y_pred)
  precision = precision_score(y_true, y_pred, average = 'macro')
  f1 = f1_score(y_true, y_pred, average = 'macro')
  mcc = matthews_corrcoef(y_true, y_pred)

  # Macro specificity: per-class TN / (TN + FP) from the confusion matrix.
  # recall_score ignores pos_label unless average = 'binary', so it cannot provide this.
  cm = confusion_matrix(y_true, y_pred, labels = classes)
  tp = np.diag(cm)
  fp = cm.sum(axis = 0) - tp
  fn = cm.sum(axis = 1) - tp
  tn = cm.sum() - tp - fp - fn
  specificity = np.mean(tn / (tn + fp))

  # ROC-AUC: one-vs-rest on the binarized labels, macro-averaged over classes
  y_true_bin = label_binarize(y_true, classes = classes)
  roc_auc_macro = roc_auc_score(y_true_bin, np.asarray(y_prob), average = 'macro')

  metrics_table = [
      ['Recall', recall],
      ['Specificity', specificity],
      ['Balanced Accuracy', b_acc],
      ['Precision', precision],
      ['F1 Score', f1],
      ['MCC', mcc],
      ['ROC-AUC', roc_auc_macro]
  ]

  print(tabulate(metrics_table, ['Metric', 'Value'], floatfmt = '.4f', tablefmt = 'plain'))
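
scikit-learn has no dedicated multiclass specificity function, so one option is to derive it from the confusion matrix. For class k, specificity is TN_k / (TN_k + FP_k), treating every other class as negative, and the macro value is the unweighted mean over classes. A minimal numpy sketch on a made-up 3x3 confusion matrix (rows are true classes, columns are predictions, as returned by `confusion_matrix`):

```python
import numpy as np

# Made-up confusion matrix for three classes
cm = np.array([
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
])

tp = np.diag(cm)
fp = cm.sum(axis=0) - tp        # predicted as class k but actually another class
fn = cm.sum(axis=1) - tp        # actually class k but predicted as another class
tn = cm.sum() - tp - fp - fn    # samples that neither are nor were predicted as class k

specificity_per_class = tn / (tn + fp)
macro_specificity = specificity_per_class.mean()
print(specificity_per_class, macro_specificity)
```

Here the per-class values are 18/20, 18/20 and 17/20, which differ from the per-class recalls (0.8, 0.6, 0.9), so the two metrics should not be expected to coincide.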

In [None]:
kf = StratifiedKFold(n_splits = 5, shuffle = True, random_state = semilla)
# Out-of-fold labels, predictions, and class probabilities pooled across folds
y_true_total = []
y_pred_total = []
y_proba_total = []

for train_idx, test_idx in kf.split(X, Y):
  x_train, x_test = X.iloc[train_idx], X.iloc[test_idx]
  y_train, y_test = Y.iloc[train_idx], Y.iloc[test_idx]

  # Oversample the training fold only; the test fold keeps its original class balance
  smote = SMOTE(random_state = semilla)
  x_train, y_train = smote.fit_resample(x_train, y_train)

  # 1-nearest-neighbour classifier with Euclidean distance
  model = KNeighborsClassifier(n_neighbors = 1, metric = 'euclidean')
  model.fit(x_train, y_train)
  y_pred = model.predict(x_test)
  y_prob = model.predict_proba(x_test)

  y_true_total.extend(y_test)
  y_pred_total.extend(y_pred)
  y_proba_total.extend(y_prob)

macro_medidas(y_true_total, y_pred_total, y_proba_total)