# Predictive Maintenance


Since real predictive maintenance datasets are generally difficult to obtain and in particular difficult to publish, we present and provide a synthetic dataset that reflects real predictive maintenance encountered in the industry to the best of our knowledge.

The dataset consists of 10 000 data points stored as rows with 14 features in columns



*   UID: unique identifier ranging from 1 to 10000
*   ProductID: consisting of a letter L, M, or H for low (50% of all products),
medium (30%), and high (20%) as product quality variants and a variant-specific serial number



*   Air temperature [K]: generated using a random walk process later normalized to a standard deviation of 2 K around 300 K
*   Process temperature [K]: generated using a random walk process normalized to a standard deviation of 1 K, added to the air temperature plus 10 K.




*   Rotational speed [rpm]: calculated from powepower of 2860 W, overlaid with a normally distributed noise
*   torque [Nm]: torque values are normally distributed around 40 Nm with an Ïƒ = 10 Nm and no negative values.



*   Tool wear [min]: The quality variants H/M/L add 5/3/2 minutes of tool wear to the used tool in the process.

A 'machine failure' label that indicates, whether the machine has failed in this particular data point for any of the following failure modes are true.
Important : There are two Targets - Do not make the mistake of using one of them as feature, as it will lead to leakage.



*   Target : Failure or Not
*   Failure Type : Type of Failure






In [1]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from imblearn.over_sampling import SMOTE
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import label_binarize
from sklearn.metrics import(
    precision_score, recall_score, f1_score, balanced_accuracy_score,
    confusion_matrix, matthews_corrcoef, roc_auc_score, accuracy_score
)
import numpy as np
from tabulate import tabulate
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

import tensorflow as tf
import random
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.initializers import HeNormal

2025-06-17 08:45:55.242993: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-06-17 08:45:55.251073: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-06-17 08:45:55.322248: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-06-17 08:45:55.401466: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1750171555.490086    3170 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1750171555.51

In [20]:
semilla = 7
np.random.seed(semilla)
tf.random.set_seed(semilla)
random.seed(semilla)
df = pd.read_csv('predictive_maintenance.csv')
df.head()

Unnamed: 0,UDI,Product ID,Type,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Target,Failure Type
0,1,M14860,M,298.1,308.6,1551,42.8,0,0,No Failure
1,2,L47181,L,298.2,308.7,1408,46.3,3,0,No Failure
2,3,L47182,L,298.1,308.5,1498,49.4,5,0,No Failure
3,4,L47183,L,298.2,308.6,1433,39.5,7,0,No Failure
4,5,L47184,L,298.2,308.7,1408,40.0,9,0,No Failure


In [3]:
df.shape

(10000, 10)

In [4]:
df['Failure Type'].value_counts()

Failure Type
No Failure                  9652
Heat Dissipation Failure     112
Power Failure                 95
Overstrain Failure            78
Tool Wear Failure             45
Random Failures               18
Name: count, dtype: int64

In [5]:
LE = LabelEncoder()
df['Type'] = LE.fit_transform(df['Type'])
df['Type'].value_counts()

Type
1    6000
2    2997
0    1003
Name: count, dtype: int64

In [6]:
LE = LabelEncoder()
df['Failure Type'] = LE.fit_transform(df['Failure Type'])
df['Failure Type'].value_counts()

Failure Type
1    9652
0     112
3      95
2      78
5      45
4      18
Name: count, dtype: int64

In [7]:
# Separar características de la etiqueta

X = df.drop(columns = ['UDI', 'Product ID', 'Failure Type', 'Target'])
Y = df['Failure Type']

In [11]:
X.head()

Unnamed: 0,Type,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min]
0,2,298.1,308.6,1551,42.8,0
1,1,298.2,308.7,1408,46.3,3
2,1,298.1,308.5,1498,49.4,5
3,1,298.2,308.6,1433,39.5,7
4,1,298.2,308.7,1408,40.0,9


In [8]:
# IR por clase
conteo_clase = np.bincount(Y)
mayor = conteo_clase.max()
ir_por_clases= np.round(mayor / conteo_clase, 2)

ir_por_clases

array([ 86.18,   1.  , 123.74, 101.6 , 536.22, 214.49])

## Método de validación

Validación cruzada estratificada

* Aplicación de smote para cada Fold

In [23]:
def macro_medidas(y_true, y_pred, y_prob):

  # Cálculo de medidas macro
  labels = np.unique(y_true)
  cm = confusion_matrix(y_true, y_pred, labels=labels)

  recall_por_clase = []
  precision_por_clase = []
  f1_por_clase = []
  specificity_por_clase = []

  for i, label in enumerate(labels):
      TP = cm[i, i]
      FN = cm[i, :].sum() - TP
      FP = cm[:, i].sum() - TP
      TN = cm.sum() - (TP + FN + FP)
      
      recall = TP / (TP + FN) if (TP + FN) > 0 else 0
      precision = TP / (TP + FP) if (TP + FP) > 0 else 0 
      f1 = (2 * precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
      specificity = TN / (TN + FP) if (TN + FP) > 0 else 0

      recall_por_clase.append(recall)
      precision_por_clase.append(precision)
      f1_por_clase.append(f1)
      specificity_por_clase.append(specificity)

  # Aplicables a ponderacion    
  macro_recall = np.mean(recall_por_clase)
  macro_precision = np.mean(precision_por_clase)
  macro_f1 = np.mean(f1_por_clase)
  macro_specificity = np.mean(specificity_por_clase)

  # Medidas que son generales
  accuracy = accuracy_score(y_true, y_pred)
  error_rate = 1 - accuracy
  b_acc = balanced_accuracy_score(y_true, y_pred)
  mcc = matthews_corrcoef(y_true, y_pred)

  # ROC-AUC
  classes = np.unique(y_true)
  y_true_bin = label_binarize(y_true, classes = classes)
  roc_auc_macro = roc_auc_score(y_true_bin, y_prob, average = 'macro', multi_class = 'ovr')

  metrics_table = [
      ['Accuracy', accuracy],
      ['Error Rate', error_rate],
      ['Recall (macro)', macro_recall],
      ['Specificity (macro)', macro_specificity],
      ['Balanced Accuracy', b_acc],
      ['Precision (macro)', macro_precision],
      ['F1 Score (macro)', macro_f1],
      ['MCC', mcc],
      ['ROC-AUC (macro)', roc_auc_macro]
  ]

  print(tabulate(metrics_table, ['Medidas', 'Valor'], floatfmt = '.4f', tablefmt = 'plain'))

In [13]:
def macro_medidas_2(y_true, y_pred, y_prob):

  # Cálculo de medidas macro actualizando balanced accuracy
  labels = np.unique(y_true)
  cm = confusion_matrix(y_true, y_pred, labels=labels)

  recall_por_clase = []
  precision_por_clase = []
  f1_por_clase = []
  specificity_por_clase = []

  for i, label in enumerate(labels):
      TP = cm[i, i]
      FN = cm[i, :].sum() - TP
      FP = cm[:, i].sum() - TP
      TN = cm.sum() - (TP + FN + FP)
      
      recall = TP / (TP + FN) if (TP + FN) > 0 else 0
      precision = TP / (TP + FP) if (TP + FP) > 0 else 0 
      f1 = (2 * precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
      specificity = TN / (TN + FP) if (TN + FP) > 0 else 0

      recall_por_clase.append(recall)
      precision_por_clase.append(precision)
      f1_por_clase.append(f1)
      specificity_por_clase.append(specificity)

  # Aplicables a ponderacion    
  macro_recall = np.mean(recall_por_clase)
  macro_precision = np.mean(precision_por_clase)
  macro_f1 = np.mean(f1_por_clase)
  macro_specificity = np.mean(specificity_por_clase)
  macro_b_acc = np.mean([macro_recall, macro_specificity])

  # Medidas que son generales
  accuracy = accuracy_score(y_true, y_pred)
  error_rate = 1 - accuracy
  mcc = matthews_corrcoef(y_true, y_pred)

  # ROC-AUC
  classes = np.unique(y_true)
  y_true_bin = label_binarize(y_true, classes = classes)
  roc_auc_macro = roc_auc_score(y_true_bin, y_prob, average = 'macro', multi_class = 'ovr')

  metrics_table = [
      ['Accuracy', accuracy],
      ['Error Rate', error_rate],
      ['Recall (macro)', macro_recall],
      ['Specificity (macro)', macro_specificity],
      ['Balanced Accuracy', macro_b_acc],
      ['Precision (macro)', macro_precision],
      ['F1 Score (macro)', macro_f1],
      ['MCC', mcc],
      ['ROC-AUC (macro)', roc_auc_macro]
  ]

  print(tabulate(metrics_table, ['Medidas', 'Valor'], floatfmt = '.4f', tablefmt = 'plain'))

In [15]:
def weighted_medidas(y_true, y_pred, y_prob):

    labels = np.unique(y_true)
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    conteo_clase = np.bincount(y_true)
    total = conteo_clase.sum()
    proporciones = conteo_clase/total


    recall_por_clase = []
    precision_por_clase = []
    f1_por_clase = []
    specificity_por_clase = []

    for i, label in enumerate(labels):
        TP = cm[i, i]
        FN = cm[i, :].sum() - TP
        FP = cm[:, i].sum() - TP
        TN = cm.sum() - (TP + FN + FP)
        
        recall = TP / (TP + FN) if (TP + FN) > 0 else 0
        precision = TP / (TP + FP) if (TP + FP) > 0 else 0 
        f1 = (2 * precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
        specificity = TN / (TN + FP) if (TN + FP) > 0 else 0

        recall_por_clase.append(recall)
        precision_por_clase.append(precision)
        f1_por_clase.append(f1)
        specificity_por_clase.append(specificity)

    # Aplicables a ponderacion    
    weighted_recall = np.dot(recall_por_clase, proporciones)
    weighted_precision = np.dot(precision_por_clase, proporciones)
    weighted_f1 = np.dot(f1_por_clase, proporciones)
    weighted_specificity = np.dot(specificity_por_clase, proporciones)

    # Medidas generalizadas
    accuracy = accuracy_score(y_true, y_pred)
    error_rate = 1 - accuracy
    b_acc = balanced_accuracy_score(y_true, y_pred)
    mcc = matthews_corrcoef(y_true, y_pred)

    y_true_bin = label_binarize(y_true, classes=labels)
    roc_auc = roc_auc_score(y_true_bin, y_prob, average='macro', multi_class='ovr')

    metrics_table = [
      ['Accuracy', accuracy],
      ['Error Rate', error_rate],
      ['Recall (weighted)', weighted_recall],
      ['Specificity (weighted)', weighted_specificity],
      ['Balanced Accuracy', b_acc],
      ['Precision (weighted)', weighted_precision],
      ['F1 Score (weighted)', weighted_f1],
      ['MCC', mcc],
      ['ROC-AUC', roc_auc]
    ]

    print(tabulate(metrics_table, ['Medidas', 'Valor'], floatfmt='.4f', tablefmt='plain'))

In [46]:
def weighted_medidas_2(y_true, y_pred, y_prob):

    labels = np.unique(y_true)
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    conteo_clase = np.bincount(y_true)
    total = conteo_clase.sum()
    proporciones = conteo_clase/total


    recall_por_clase = []
    precision_por_clase = []
    f1_por_clase = []
    specificity_por_clase = []

    for i, label in enumerate(labels):
        TP = cm[i, i]
        FN = cm[i, :].sum() - TP
        FP = cm[:, i].sum() - TP
        TN = cm.sum() - (TP + FN + FP)
        
        recall = TP / (TP + FN) if (TP + FN) > 0 else 0
        precision = TP / (TP + FP) if (TP + FP) > 0 else 0 
        f1 = (2 * precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
        specificity = TN / (TN + FP) if (TN + FP) > 0 else 0

        recall_por_clase.append(recall)
        precision_por_clase.append(precision)
        f1_por_clase.append(f1)
        specificity_por_clase.append(specificity)

    # Aplicables a ponderacion    
    weighted_recall = np.dot(recall_por_clase, proporciones)
    weighted_precision = np.dot(precision_por_clase, proporciones)
    weighted_f1 = np.dot(f1_por_clase, proporciones)
    weighted_specificity = np.dot(specificity_por_clase, proporciones)
    weighted_b_acc = np.mean([weighted_recall, weighted_specificity])

    # Medidas generalizadas
    accuracy = accuracy_score(y_true, y_pred)
    error_rate = 1 - accuracy
    mcc = matthews_corrcoef(y_true, y_pred)

    y_true_bin = label_binarize(y_true, classes=labels)
    roc_auc = roc_auc_score(y_true_bin, y_prob, average='macro', multi_class='ovr')

    metrics_table = [
      ['Accuracy', accuracy],
      ['Error Rate', error_rate],
      ['Recall (weighted)', weighted_recall],
      ['Specificity (weighted)', weighted_specificity],
      ['Balanced Accuracy', weighted_b_acc],
      ['Precision (weighted)', weighted_precision],
      ['F1 Score (weighted)', weighted_f1],
      ['MCC', mcc],
      ['ROC-AUC', roc_auc]
    ]

    print(tabulate(metrics_table, ['Medidas', 'Valor'], floatfmt='.4f', tablefmt='plain'))

In [None]:
def macro_medidas_nn(y_true, y_pred, y_prob):
    
    accuracy = accuracy_score(y_true, y_pred)
    error_rate = 1 - accuracy
    mcc = matthews_corrcoef(y_true, y_pred)
    
    recall = recall_score(y_true, y_pred, average='macro')
    b_acc = balanced_accuracy_score(y_true, y_pred)
    precision = precision_score(y_true, y_pred, average='macro')
    f1 = f1_score(y_true, y_pred, average='macro')
    
    
    # Cálculo manual de specificity por clase
    cm = confusion_matrix(y_true, y_pred, labels=np.unique(y_true))
    n_classes = cm.shape[0]
    
    specificities = []
    recalls = []
    b_accs = []
    precisions = []
    f1s = []


    for i in range(n_classes):
        TP = cm[i, i]
        FN = cm[i, :].sum() - TP
        FP = cm[:, i].sum() - TP
        TN = cm.sum() - (TP + FN + FP)

        specificity = TN / (TN + FP) if (TN + FP) > 0 else 0

        specificities.append(specificity)

    specificity_macro = np.mean(specificities)

    y_true_bin = label_binarize(y_true, classes=np.unique(y_true))
    roc_auc_macro = roc_auc_score(y_true_bin, y_prob, average='macro', multi_class='ovr')

    metrics_table = [
        ['Accuracy', accuracy],
        ['Error Rate', error_rate],
        ['Recall (macro)', recall],
        ['Specificity (macro)', specificity_macro],
        ['Balanced Accuracy', b_acc],
        ['Precision (macro)', precision],
        ['F1 Score (macro)', f1],
        ['MCC', mcc],
        ['ROC-AUC (macro)', roc_auc_macro]
    ]
    print(tabulate(metrics_table, ['Medidas', 'Valor'], floatfmt='.4f', tablefmt='plain'))

In [15]:
def weighted_medidas_nn(y_true, y_pred, y_prob):
    labels = np.unique(y_true)
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    conteo_clase = np.bincount(y_true)
    total = conteo_clase.sum()
    proporciones = conteo_clase / total

    recall_por_clase = []
    precision_por_clase = []
    f1_por_clase = []
    specificity_por_clase = []

    for i, label in enumerate(labels):
        TP = cm[i, i]
        FN = cm[i, :].sum() - TP
        FP = cm[:, i].sum() - TP
        TN = cm.sum() - (TP + FN + FP)

        recall = TP / (TP + FN) if (TP + FN) > 0 else 0
        precision = TP / (TP + FP) if (TP + FP) > 0 else 0
        f1 = (2 * precision * recall) / (precision + recall) if (precision + recall) > 0 else 0
        specificity = TN / (TN + FP) if (TN + FP) > 0 else 0

        recall_por_clase.append(recall)
        precision_por_clase.append(precision)
        f1_por_clase.append(f1)
        specificity_por_clase.append(specificity)

    weighted_recall = np.dot(recall_por_clase, proporciones)
    weighted_precision = np.dot(precision_por_clase, proporciones)
    weighted_f1 = np.dot(f1_por_clase, proporciones)
    weighted_specificity = np.dot(specificity_por_clase, proporciones)

    accuracy = accuracy_score(y_true, y_pred)
    error_rate = 1 - accuracy
    b_acc = balanced_accuracy_score(y_true, y_pred)
    mcc = matthews_corrcoef(y_true, y_pred)

    y_true_bin = label_binarize(y_true, classes=labels)
    roc_auc_macro = roc_auc_score(y_true_bin, y_prob, average='macro', multi_class='ovr')

    metrics_table = [
        ['Accuracy', accuracy],
        ['Error Rate', error_rate],
        ['Recall (weighted)', weighted_recall],
        ['Specificity (weighted)', weighted_specificity],
        ['Balanced Accuracy', b_acc],
        ['Precision (weighted)', weighted_precision],
        ['F1 Score (weighted)', weighted_f1],
        ['MCC', mcc],
        ['ROC-AUC (macro)', roc_auc_macro]
    ]
    print(tabulate(metrics_table, ['Medidas', 'Valor'], floatfmt='.4f', tablefmt='plain'))

### Algoritmo Euclidiano


In [47]:
kf = StratifiedKFold(n_splits = 5, shuffle = True, random_state = semilla)
y_true_total = []
y_pred_total = []
y_proba_total = []

for train_idx, test_idx in kf.split(X, Y):
  x_train, x_test = X.iloc[train_idx], X.iloc[test_idx]
  y_train, y_test = Y.iloc[train_idx], Y.iloc[test_idx]

  smote = SMOTE(random_state = semilla)
  x_train, y_train = smote.fit_resample(x_train, y_train)

  model = KNeighborsClassifier(n_neighbors = 1, metric = 'euclidean')
  model.fit(x_train, y_train)
  y_pred = model.predict(x_test)
  y_prob = model.predict_proba(x_test)

  y_true_total.extend(y_test)
  y_pred_total.extend(y_pred)
  y_proba_total.extend(y_prob)

y_proba_total = np.array(y_proba_total)

macro_medidas_2(y_true_total, y_pred_total, y_proba_total)
print('*'*31)
weighted_medidas_2(y_true_total, y_pred_total, y_proba_total)

Medidas                Valor
Accuracy              0.8315
Error Rate            0.1685
Recall (macro)        0.4996
Specificity (macro)   0.9131
Balanced Accuracy     0.7063
Precision (macro)     0.3389
F1 Score (macro)      0.3726
MCC                   0.2252
ROC-AUC (macro)       0.7063
*******************************
Medidas                   Valor
Accuracy                 0.8315
Error Rate               0.1685
Recall (weighted)        0.8315
Specificity (weighted)   0.6470
Balanced Accuracy        0.7392
Precision (weighted)     0.9594
F1 Score (weighted)      0.8884
MCC                      0.2252
ROC-AUC                  0.7063


### Euclidiano 2

In [48]:
import numpy as np

def euclidean_predictor_multiclass(x_train, y_train, x_test):
    # Obtener clases únicas
    clases = np.unique(y_train)
    
    # Calcular centroides para cada clase
    centroides = {
        clase: x_train[y_train == clase].mean().values
        for clase in clases
    }

    # Clasificar en base a distancia mínima a los centroides
    y_pred = []
    for _, row in x_test.iterrows():
        distancias = {
            clase: np.linalg.norm(row.values - centroide)
            for clase, centroide in centroides.items()
        }
        clase_predicha = min(distancias, key=distancias.get)
        y_pred.append(clase_predicha)

    return np.array(y_pred)

macro_medidas_2(y_true_total, y_pred_total, y_proba_total)
print('*'*31)
weighted_medidas_2(y_true_total, y_pred_total, y_proba_total)

Medidas                Valor
Accuracy              0.8315
Error Rate            0.1685
Recall (macro)        0.4996
Specificity (macro)   0.9131
Balanced Accuracy     0.7063
Precision (macro)     0.3389
F1 Score (macro)      0.3726
MCC                   0.2252
ROC-AUC (macro)       0.7063
*******************************
Medidas                   Valor
Accuracy                 0.8315
Error Rate               0.1685
Recall (weighted)        0.8315
Specificity (weighted)   0.6470
Balanced Accuracy        0.7392
Precision (weighted)     0.9594
F1 Score (weighted)      0.8884
MCC                      0.2252
ROC-AUC                  0.7063


### Algoritmo 1NN

In [49]:
kf = StratifiedKFold(n_splits = 5, shuffle = True, random_state = semilla)
y_true_total = []
y_pred_total = []
y_proba_total = []

for train_idx, test_idx in kf.split(X, Y):
  x_train, x_test = X.iloc[train_idx], X.iloc[test_idx]
  y_train, y_test = Y.iloc[train_idx], Y.iloc[test_idx]

  smote = SMOTE(random_state = semilla)
  x_train, y_train = smote.fit_resample(x_train, y_train)

  model = KNeighborsClassifier(n_neighbors = 1, metric = 'chebyshev')
  model.fit(x_train, y_train)
  y_pred = model.predict(x_test)
  y_prob = model.predict_proba(x_test)

  y_true_total.extend(y_test)
  y_pred_total.extend(y_pred)
  y_proba_total.extend(y_prob)

y_proba_total = np.array(y_proba_total)

macro_medidas_2(y_true_total, y_pred_total, y_proba_total)
print('*'*31)
weighted_medidas_2(y_true_total, y_pred_total, y_proba_total)

Medidas                Valor
Accuracy              0.8272
Error Rate            0.1728
Recall (macro)        0.5190
Specificity (macro)   0.9124
Balanced Accuracy     0.7157
Precision (macro)     0.3381
F1 Score (macro)      0.3728
MCC                   0.2240
ROC-AUC (macro)       0.7157
*******************************
Medidas                   Valor
Accuracy                 0.8272
Error Rate               0.1728
Recall (weighted)        0.8272
Specificity (weighted)   0.6470
Balanced Accuracy        0.7371
Precision (weighted)     0.9592
F1 Score (weighted)      0.8856
MCC                      0.2240
ROC-AUC                  0.7157


### Algoritmo 3NN

In [50]:
kf = StratifiedKFold(n_splits = 5, shuffle = True, random_state = semilla)
y_true_total = []
y_pred_total = []
y_proba_total = []

for train_idx, test_idx in kf.split(X, Y):
  x_train, x_test = X.iloc[train_idx], X.iloc[test_idx]
  y_train, y_test = Y.iloc[train_idx], Y.iloc[test_idx]

  smote = SMOTE(random_state = semilla)
  x_train, y_train = smote.fit_resample(x_train, y_train)

  model = KNeighborsClassifier(n_neighbors = 3, metric = 'chebyshev')
  model.fit(x_train, y_train)
  y_pred = model.predict(x_test)
  y_prob = model.predict_proba(x_test)

  y_true_total.extend(y_test)
  y_pred_total.extend(y_pred)
  y_proba_total.extend(y_prob)

y_proba_total = np.array(y_proba_total)

macro_medidas_2(y_true_total, y_pred_total, y_proba_total)
print('*'*31)
weighted_medidas_2(y_true_total, y_pred_total, y_proba_total)

Medidas                Valor
Accuracy              0.7574
Error Rate            0.2426
Recall (macro)        0.5640
Specificity (macro)   0.9178
Balanced Accuracy     0.7409
Precision (macro)     0.3221
F1 Score (macro)      0.3581
MCC                   0.2120
ROC-AUC (macro)       0.7560
*******************************
Medidas                   Valor
Accuracy                 0.7574
Error Rate               0.2426
Recall (weighted)        0.7574
Specificity (weighted)   0.7493
Balanced Accuracy        0.7533
Precision (weighted)     0.9617
F1 Score (weighted)      0.8425
MCC                      0.2120
ROC-AUC                  0.7560


### Algoritmo 5NN

In [51]:
kf = StratifiedKFold(n_splits = 5, shuffle = True, random_state = semilla)
y_true_total = []
y_pred_total = []
y_proba_total = []

for train_idx, test_idx in kf.split(X, Y):
  x_train, x_test = X.iloc[train_idx], X.iloc[test_idx]
  y_train, y_test = Y.iloc[train_idx], Y.iloc[test_idx]

  smote = SMOTE(random_state = semilla)
  x_train, y_train = smote.fit_resample(x_train, y_train)

  model = KNeighborsClassifier(n_neighbors = 5, metric = 'chebyshev')
  model.fit(x_train, y_train)
  y_pred = model.predict(x_test)
  y_prob = model.predict_proba(x_test)

  y_true_total.extend(y_test)
  y_pred_total.extend(y_pred)
  y_proba_total.extend(y_prob)

y_proba_total = np.array(y_proba_total)

macro_medidas_2(y_true_total, y_pred_total, y_proba_total)
print('*'*31)
weighted_medidas_2(y_true_total, y_pred_total, y_proba_total)

Medidas                Valor
Accuracy              0.7143
Error Rate            0.2857
Recall (macro)        0.5551
Specificity (macro)   0.9156
Balanced Accuracy     0.7354
Precision (macro)     0.3095
F1 Score (macro)      0.3398
MCC                   0.1965
ROC-AUC (macro)       0.7702
*******************************
Medidas                   Valor
Accuracy                 0.7143
Error Rate               0.2857
Recall (weighted)        0.7143
Specificity (weighted)   0.7796
Balanced Accuracy        0.7469
Precision (weighted)     0.9618
F1 Score (weighted)      0.8139
MCC                      0.1965
ROC-AUC                  0.7702


### Algoritmo 7NN

In [52]:
kf = StratifiedKFold(n_splits = 5, shuffle = True, random_state = semilla)
y_true_total = []
y_pred_total = []
y_proba_total = []

for train_idx, test_idx in kf.split(X, Y):
  x_train, x_test = X.iloc[train_idx], X.iloc[test_idx]
  y_train, y_test = Y.iloc[train_idx], Y.iloc[test_idx]

  smote = SMOTE(random_state = semilla)
  x_train, y_train = smote.fit_resample(x_train, y_train)

  model = KNeighborsClassifier(n_neighbors = 7, metric = 'chebyshev')
  model.fit(x_train, y_train)
  y_pred = model.predict(x_test)
  y_prob = model.predict_proba(x_test)

  y_true_total.extend(y_test)
  y_pred_total.extend(y_pred)
  y_proba_total.extend(y_prob)

y_proba_total = np.array(y_proba_total)

macro_medidas_2(y_true_total, y_pred_total, y_proba_total)
print('*'*31)
weighted_medidas_2(y_true_total, y_pred_total, y_proba_total)

Medidas                Valor
Accuracy              0.6853
Error Rate            0.3147
Recall (macro)        0.5733
Specificity (macro)   0.9182
Balanced Accuracy     0.7457
Precision (macro)     0.3011
F1 Score (macro)      0.3305
MCC                   0.1968
ROC-AUC (macro)       0.7822
*******************************
Medidas                   Valor
Accuracy                 0.6853
Error Rate               0.3147
Recall (weighted)        0.6853
Specificity (weighted)   0.8238
Balanced Accuracy        0.7546
Precision (weighted)     0.9631
F1 Score (weighted)      0.7937
MCC                      0.1968
ROC-AUC                  0.7822


### Algoritmo IB1

In [53]:
kf = StratifiedKFold(n_splits = 5, shuffle = True, random_state = semilla)
y_true_total = []
y_pred_total = []
y_proba_total = []

for train_idx, test_idx in kf.split(X, Y):
  x_train, x_test = X.iloc[train_idx], X.iloc[test_idx]
  y_train, y_test = Y.iloc[train_idx], Y.iloc[test_idx]

  smote = SMOTE(random_state = semilla)
  x_train, y_train = smote.fit_resample(x_train, y_train)

  model = KNeighborsClassifier(n_neighbors = 1, metric = 'manhattan')
  model.fit(x_train, y_train)
  y_pred = model.predict(x_test)
  y_prob = model.predict_proba(x_test)

  y_true_total.extend(y_test)
  y_pred_total.extend(y_pred)
  y_proba_total.extend(y_prob)

y_proba_total = np.array(y_proba_total)

macro_medidas_2(y_true_total, y_pred_total, y_proba_total)
print('*'*31)
weighted_medidas_2(y_true_total, y_pred_total, y_proba_total)

Medidas                Valor
Accuracy              0.8249
Error Rate            0.1751
Recall (macro)        0.5046
Specificity (macro)   0.9161
Balanced Accuracy     0.7104
Precision (macro)     0.3275
F1 Score (macro)      0.3632
MCC                   0.2274
ROC-AUC (macro)       0.7104
*******************************
Medidas                   Valor
Accuracy                 0.8249
Error Rate               0.1751
Recall (weighted)        0.8249
Specificity (weighted)   0.6719
Balanced Accuracy        0.7484
Precision (weighted)     0.9598
F1 Score (weighted)      0.8843
MCC                      0.2274
ROC-AUC                  0.7104


### Algoritmo IB3

In [54]:
kf = StratifiedKFold(n_splits = 5, shuffle = True, random_state = semilla)
y_true_total = []
y_pred_total = []
y_proba_total = []

for train_idx, test_idx in kf.split(X, Y):
  x_train, x_test = X.iloc[train_idx], X.iloc[test_idx]
  y_train, y_test = Y.iloc[train_idx], Y.iloc[test_idx]

  smote = SMOTE(random_state = semilla)
  x_train, y_train = smote.fit_resample(x_train, y_train)

  model = KNeighborsClassifier(n_neighbors = 3, metric = 'manhattan')
  model.fit(x_train, y_train)
  y_pred = model.predict(x_test)
  y_prob = model.predict_proba(x_test)

  y_true_total.extend(y_test)
  y_pred_total.extend(y_pred)
  y_proba_total.extend(y_prob)

y_proba_total = np.array(y_proba_total)

macro_medidas_2(y_true_total, y_pred_total, y_proba_total)
print('*'*31)
weighted_medidas_2(y_true_total, y_pred_total, y_proba_total)

Medidas                Valor
Accuracy              0.7616
Error Rate            0.2384
Recall (macro)        0.5411
Specificity (macro)   0.9236
Balanced Accuracy     0.7323
Precision (macro)     0.3068
F1 Score (macro)      0.3430
MCC                   0.2197
ROC-AUC (macro)       0.7664
*******************************
Medidas                   Valor
Accuracy                 0.7616
Error Rate               0.2384
Recall (weighted)        0.7616
Specificity (weighted)   0.7798
Balanced Accuracy        0.7707
Precision (weighted)     0.9624
F1 Score (weighted)      0.8451
MCC                      0.2197
ROC-AUC                  0.7664


### Algoritmo IB5

In [55]:
kf = StratifiedKFold(n_splits = 5, shuffle = True, random_state = semilla)
y_true_total = []
y_pred_total = []
y_proba_total = []

for train_idx, test_idx in kf.split(X, Y):
  x_train, x_test = X.iloc[train_idx], X.iloc[test_idx]
  y_train, y_test = Y.iloc[train_idx], Y.iloc[test_idx]

  smote = SMOTE(random_state = semilla)
  x_train, y_train = smote.fit_resample(x_train, y_train)

  model = KNeighborsClassifier(n_neighbors = 5, metric = 'manhattan')
  model.fit(x_train, y_train)
  y_pred = model.predict(x_test)
  y_prob = model.predict_proba(x_test)

  y_true_total.extend(y_test)
  y_pred_total.extend(y_pred)
  y_proba_total.extend(y_prob)

y_proba_total = np.array(y_proba_total)

macro_medidas_2(y_true_total, y_pred_total, y_proba_total)
print('*'*31)
weighted_medidas_2(y_true_total, y_pred_total, y_proba_total)

Medidas                Valor
Accuracy              0.7281
Error Rate            0.2719
Recall (macro)        0.5656
Specificity (macro)   0.9253
Balanced Accuracy     0.7455
Precision (macro)     0.2979
F1 Score (macro)      0.3329
MCC                   0.2159
ROC-AUC (macro)       0.7865
*******************************
Medidas                   Valor
Accuracy                 0.7281
Error Rate               0.2719
Recall (weighted)        0.7281
Specificity (weighted)   0.8240
Balanced Accuracy        0.7760
Precision (weighted)     0.9636
F1 Score (weighted)      0.8230
MCC                      0.2159
ROC-AUC                  0.7865


### Algoritmo IB7

In [56]:
kf = StratifiedKFold(n_splits = 5, shuffle = True, random_state = semilla)
y_true_total = []
y_pred_total = []
y_proba_total = []

for train_idx, test_idx in kf.split(X, Y):
  x_train, x_test = X.iloc[train_idx], X.iloc[test_idx]
  y_train, y_test = Y.iloc[train_idx], Y.iloc[test_idx]

  smote = SMOTE(random_state = semilla)
  x_train, y_train = smote.fit_resample(x_train, y_train)

  model = KNeighborsClassifier(n_neighbors = 7, metric = 'manhattan')
  model.fit(x_train, y_train)
  y_pred = model.predict(x_test)
  y_prob = model.predict_proba(x_test)

  y_true_total.extend(y_test)
  y_pred_total.extend(y_pred)
  y_proba_total.extend(y_prob)

y_proba_total = np.array(y_proba_total)

macro_medidas_2(y_true_total, y_pred_total, y_proba_total)
print('*'*31)
weighted_medidas_2(y_true_total, y_pred_total, y_proba_total)

Medidas                Valor
Accuracy              0.6989
Error Rate            0.3011
Recall (macro)        0.5587
Specificity (macro)   0.9209
Balanced Accuracy     0.7398
Precision (macro)     0.2882
F1 Score (macro)      0.3188
MCC                   0.2026
ROC-AUC (macro)       0.7951
*******************************
Medidas                   Valor
Accuracy                 0.6989
Error Rate               0.3011
Recall (weighted)        0.6989
Specificity (weighted)   0.8266
Balanced Accuracy        0.7628
Precision (weighted)     0.9628
F1 Score (weighted)      0.8027
MCC                      0.2026
ROC-AUC                  0.7951


### Naive Bayes

In [57]:
kf = StratifiedKFold(n_splits = 5, shuffle = True, random_state = semilla)
y_true_total = []
y_pred_total = []
y_proba_total = []

for train_idx, test_idx in kf.split(X, Y):
  x_train, x_test = X.iloc[train_idx], X.iloc[test_idx]
  y_train, y_test = Y.iloc[train_idx], Y.iloc[test_idx]

  smote = SMOTE(random_state = semilla)
  x_train, y_train = smote.fit_resample(x_train, y_train)

  model = GaussianNB()
  model.fit(x_train, y_train)
  y_pred = model.predict(x_test)
  y_prob = model.predict_proba(x_test)

  y_true_total.extend(y_test)
  y_pred_total.extend(y_pred)
  y_proba_total.extend(y_prob)

y_proba_total = np.array(y_proba_total)

macro_medidas_2(y_true_total, y_pred_total, y_proba_total)
print('*'*31)
weighted_medidas_2(y_true_total, y_pred_total, y_proba_total)

Medidas                Valor
Accuracy              0.5541
Error Rate            0.4459
Recall (macro)        0.7074
Specificity (macro)   0.9170
Balanced Accuracy     0.8122
Precision (macro)     0.2623
F1 Score (macro)      0.2768
MCC                   0.2016
ROC-AUC (macro)       0.8741
*******************************
Medidas                   Valor
Accuracy                 0.5541
Error Rate               0.4459
Recall (weighted)        0.5541
Specificity (weighted)   0.9481
Balanced Accuracy        0.7511
Precision (weighted)     0.9672
F1 Score (weighted)      0.6889
MCC                      0.2016
ROC-AUC                  0.8741


### Regresión Logística

In [58]:
kf = StratifiedKFold(n_splits = 5, shuffle = True, random_state = semilla)
y_true_total = []
y_pred_total = []
y_proba_total = []

for train_idx, test_idx in kf.split(X, Y):
  x_train, x_test = X.iloc[train_idx], X.iloc[test_idx]
  y_train, y_test = Y.iloc[train_idx], Y.iloc[test_idx]

  smote = SMOTE(random_state = semilla)
  x_train, y_train = smote.fit_resample(x_train, y_train)

  model = LogisticRegression()
  model.fit(x_train, y_train)
  y_pred = model.predict(x_test)
  y_prob = model.predict_proba(x_test)

  y_true_total.extend(y_test)
  y_pred_total.extend(y_pred)
  y_proba_total.extend(y_prob)

y_proba_total = np.array(y_proba_total)

macro_medidas_2(y_true_total, y_pred_total, y_proba_total)
print('*'*31)
weighted_medidas_2(y_true_total, y_pred_total, y_proba_total)

STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver opt

Medidas                Valor
Accuracy              0.5507
Error Rate            0.4493
Recall (macro)        0.6890
Specificity (macro)   0.9210
Balanced Accuracy     0.8050
Precision (macro)     0.2681
F1 Score (macro)      0.2799
MCC                   0.2008
ROC-AUC (macro)       0.8680
*******************************
Medidas                   Valor
Accuracy                 0.5507
Error Rate               0.4493
Recall (weighted)        0.5507
Specificity (weighted)   0.9753
Balanced Accuracy        0.7630
Precision (weighted)     0.9689
F1 Score (weighted)      0.6863
MCC                      0.2008
ROC-AUC                  0.8680


STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


### SVM

In [59]:
kf = StratifiedKFold(n_splits = 5, shuffle = True, random_state = semilla)
y_true_total = []
y_pred_total = []
y_proba_total = []

for train_idx, test_idx in kf.split(X, Y):
  x_train, x_test = X.iloc[train_idx], X.iloc[test_idx]
  y_train, y_test = Y.iloc[train_idx], Y.iloc[test_idx]

  smote = SMOTE(random_state = semilla)
  x_train, y_train = smote.fit_resample(x_train, y_train)

  model = SVC()
  model.fit(x_train, y_train)
  y_pred = model.predict(x_test)
  #y_prob = model.predict_proba(x_test)

  y_true_total.extend(y_test)
  y_pred_total.extend(y_pred)
  y_proba_total.extend(y_prob)

y_proba_total = np.array(y_proba_total)

macro_medidas_2(y_true_total, y_pred_total, y_proba_total)
print('*'*31)
weighted_medidas_2(y_true_total, y_pred_total, y_proba_total)

Medidas                Valor
Accuracy              0.3540
Error Rate            0.6460
Recall (macro)        0.6444
Specificity (macro)   0.8877
Balanced Accuracy     0.7660
Precision (macro)     0.3004
F1 Score (macro)      0.2636
MCC                   0.1558
ROC-AUC (macro)       0.5986
*******************************
Medidas                   Valor
Accuracy                 0.3540
Error Rate               0.6460
Recall (weighted)        0.3540
Specificity (weighted)   0.9722
Balanced Accuracy        0.6631
Precision (weighted)     0.9698
F1 Score (weighted)      0.4979
MCC                      0.1558
ROC-AUC                  0.5986


### Árbol de desición

In [60]:
kf = StratifiedKFold(n_splits = 5, shuffle = True, random_state = semilla)
y_true_total = []
y_pred_total = []
y_proba_total = []

for train_idx, test_idx in kf.split(X, Y):
  x_train, x_test = X.iloc[train_idx], X.iloc[test_idx]
  y_train, y_test = Y.iloc[train_idx], Y.iloc[test_idx]

  smote = SMOTE(random_state = semilla)
  x_train, y_train = smote.fit_resample(x_train, y_train)

  model = DecisionTreeClassifier()
  model.fit(x_train, y_train)
  y_pred = model.predict(x_test)
  #y_prob = model.predict_proba(x_test)

  y_true_total.extend(y_test)
  y_pred_total.extend(y_pred)
  y_proba_total.extend(y_prob)

y_proba_total = np.array(y_proba_total)

macro_medidas_2(y_true_total, y_pred_total, y_proba_total)
print('*'*31)
weighted_medidas_2(y_true_total, y_pred_total, y_proba_total)

Medidas                Valor
Accuracy              0.9428
Error Rate            0.0572
Recall (macro)        0.5733
Specificity (macro)   0.9461
Balanced Accuracy     0.7597
Precision (macro)     0.5076
F1 Score (macro)      0.5284
MCC                   0.4684
ROC-AUC (macro)       0.5986
*******************************
Medidas                   Valor
Accuracy                 0.9428
Error Rate               0.0572
Recall (weighted)        0.9428
Specificity (weighted)   0.7335
Balanced Accuracy        0.8382
Precision (weighted)     0.9747
F1 Score (weighted)      0.9581
MCC                      0.4684
ROC-AUC                  0.5986


### Random Forest

In [61]:
kf = StratifiedKFold(n_splits = 5, shuffle = True, random_state = semilla)
y_true_total = []
y_pred_total = []
y_proba_total = []

for train_idx, test_idx in kf.split(X, Y):
  x_train, x_test = X.iloc[train_idx], X.iloc[test_idx]
  y_train, y_test = Y.iloc[train_idx], Y.iloc[test_idx]

  smote = SMOTE(random_state = semilla)
  x_train, y_train = smote.fit_resample(x_train, y_train)

  model = RandomForestClassifier()
  model.fit(x_train, y_train)
  y_pred = model.predict(x_test)
  #y_prob = model.predict_proba(x_test)

  y_true_total.extend(y_test)
  y_pred_total.extend(y_pred)
  y_proba_total.extend(y_prob)

y_proba_total = np.array(y_proba_total)

macro_medidas_2(y_true_total, y_pred_total, y_proba_total)
print('*'*31)
weighted_medidas_2(y_true_total, y_pred_total, y_proba_total)

Medidas                Valor
Accuracy              0.9585
Error Rate            0.0415
Recall (macro)        0.6014
Specificity (macro)   0.9501
Balanced Accuracy     0.7757
Precision (macro)     0.5243
F1 Score (macro)      0.5549
MCC                   0.5502
ROC-AUC (macro)       0.5986
*******************************
Medidas                   Valor
Accuracy                 0.9585
Error Rate               0.0415
Recall (weighted)        0.9585
Specificity (weighted)   0.7419
Balanced Accuracy        0.8502
Precision (weighted)     0.9759
F1 Score (weighted)      0.9668
MCC                      0.5502
ROC-AUC                  0.5986


### Red Neuronal

In [62]:
semilla = 7
np.random.seed(semilla)
tf.random.set_seed(semilla)
random.seed(semilla)

# Resultados acumulados
y_true_total = []
y_pred_total = []
y_proba_total = []

kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=semilla)
n_clases = len(np.unique(Y))

for train_idx, test_idx in kf.split(X, Y):
    x_train, x_test = X.iloc[train_idx], X.iloc[test_idx]
    y_train, y_test = Y.iloc[train_idx], Y.iloc[test_idx]

    # SMOTE
    smote = SMOTE(random_state=semilla)
    x_train, y_train = smote.fit_resample(x_train, y_train)

    # Modelo multiclase
    model = Sequential([
        Dense(21, activation='relu', input_shape=(x_train.shape[1],), kernel_initializer=HeNormal(seed=semilla)),
        Dense(33, activation='relu', kernel_initializer=HeNormal(seed=semilla)),
        Dense(n_clases, activation='softmax')
    ])

    model.compile(optimizer=Adam(learning_rate=0.003),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    model.fit(x_train, y_train, epochs=50, batch_size=16, validation_split=0.2, verbose=0)

    # Predicciones
    y_prob = model.predict(x_test)  # salida (n_samples, n_classes)
    y_pred = np.argmax(y_prob, axis=1)

    y_true_total.extend(y_test)
    y_pred_total.extend(y_pred)
    y_proba_total.extend(y_prob)

# Aplanar resultados
y_true_total = np.array(y_true_total)
y_pred_total = np.array(y_pred_total)
y_proba_total = np.array(y_proba_total).reshape(-1, n_clases)

# Mostrar métricas
macro_medidas_2(y_true_total, y_pred_total, y_proba_total)
print('*' * 31)
weighted_medidas_2(y_true_total, y_pred_total, y_proba_total)


  super().__init__(activity_regularizer=activity_regularizer, **kwargs)
2025-06-17 11:51:20.393613: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)


[1m63/63[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step  


  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


[1m63/63[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step


  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


[1m63/63[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step


  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


[1m63/63[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 1ms/step  


  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


[1m63/63[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step
Medidas                Valor
Accuracy              0.6831
Error Rate            0.3169
Recall (macro)        0.5983
Specificity (macro)   0.9187
Balanced Accuracy     0.7585
Precision (macro)     0.3274
F1 Score (macro)      0.3592
MCC                   0.2173
ROC-AUC (macro)       0.8499
*******************************
Medidas                   Valor
Accuracy                 0.6831
Error Rate               0.3169
Recall (weighted)        0.6831
Specificity (weighted)   0.8292
Balanced Accuracy        0.7562
Precision (weighted)     0.9651
F1 Score (weighted)      0.7916
MCC                      0.2173
ROC-AUC                  0.8499
