# 📄 Descripción del Dataset: Wine Quality

## 📌 Fuente
[UCI Machine Learning Repository – Wine Quality Dataset](https://archive.ics.uci.edu/ml/datasets/Wine+Quality)

---

## 📊 Resumen General

| Atributo                  | Valor                            |
|--------------------------|----------------------------------|
| **Número de instancias** | 6,497 registros totales          |
| **Tipos de vino**        | Vino blanco (4,898) y tinto (1,599) |
| **Número de atributos**  | 11 características + 1 objetivo  |
| **Tipo de atributos**    | Todos numéricos continuos        |
| **Variable objetivo**    | `quality` (entero de 3 a 9)      |

---

## 🧬 Variables (entradas)

Las variables representan propiedades físico-químicas del vino:

- `fixed acidity`: Ácido tartárico
- `volatile acidity`: Ácido acético
- `citric acid`: Ácido cítrico
- `residual sugar`: Azúcar residual (g/L)
- `chlorides`: Concentración de sal
- `free sulfur dioxide`: SO₂ libre
- `total sulfur dioxide`: SO₂ total
- `density`: Densidad del vino
- `pH`: Nivel de acidez
- `sulphates`: Sulfatos (agentes conservantes)
- `alcohol`: Porcentaje de alcohol (% vol.)

---

## 🎯 Variable Objetivo: `quality`

- Representa la calidad sensorial del vino evaluada por catadores expertos.
- Es un entero entre **3 y 9** (7 clases posibles).
- Distribución **desbalanceada** (la mayoría son calidad 5 o 6).

| Clase (`quality`) | Interpretación        |
|-------------------|------------------------|
| 3–4               | Calidad muy baja       |
| 5–6               | Calidad media          |
| 7–9               | Calidad buena a alta   |

---

## 🔍 Clasificación vs Regresión

- Aunque la variable `quality` es numérica discreta, se puede usar como:
  - **Clasificación multiclase**: ✅ Recomendado para esta práctica.
  - **Regresión**: posible, pero **no se ajusta al objetivo de la práctica**.

---

## 📈 Aplicabilidad en Preprocesamiento

| Técnica                      | ¿Aplica?  | Comentario                                     |
|-----------------------------|-----------|------------------------------------------------|
| Escalado / Normalización    | ✅        | Todas las variables son numéricas              |
| Detección de Outliers       | ✅        | Algunas variables tienen valores extremos      |
| Balanceo de Clases          | ✅        | Distribución de clases desbalanceada           |
| One-hot Encoding             | ⚠️        | No tiene categóricas directas, pero se pueden crear artificialmente |
| Clasificación Multiclase    | ✅        | 7 clases disponibles (cumple con los requisitos del trabajo) |

---

## ✅ Conclusión

El dataset **Wine Quality** es una excelente opción para abordar un problema de **clasificación multiclase supervisado**, permitiendo aplicar técnicas como normalización, detección de outliers, y balanceo de clases. Su estructura simple y bien documentada lo hace ideal para proyectos de aprendizaje de máquinas.



In [116]:
%pip install pandas
%pip install scikit-learn
%pip install torch numpy
%pip install matplotlib
%pip install seaborn

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Collecting matplotlib
  Downloading matplotlib-3.10.3-cp311-cp311-win_amd64.whl.metadata (11 kB)
Collecting contourpy>=1.0.1 (from matplotlib)
  Downloading contourpy-1.3.2-cp311-cp311-win_amd64.whl.metadata (5.5 kB)
Collecting cycler>=0.10 (from matplotlib)
  Using cached cycler-0.12.1-py3-none-any.whl.metadata (3.8 kB)
Collecting fonttools>=4.22.0 (from matplotlib)
  Downloading fonttools-4.58.1-cp311-cp311-win_amd64.whl.metadata (108 kB)
Collecting kiwisolver>=1.3.1 (from matplotlib)
  Downloading kiwisolver-1.4.8-cp311-cp311-win_amd64.whl.metadata (6.3 kB)
Collecting pillow>=8 (from matplotlib)
  Using cached pillow-11.2.1-cp311-cp311-win_amd64.whl.metadata (9.1 kB)
Collecting pyparsing>=2.3.1 (from matplotlib)
  Using cached pyparsing-3.2.3-py3-none-any.whl.metadata (5.0 kB)
Downloading

In [117]:
import os
import pandas as pd

# Nombre del archivo local
archivo_local = "winequality-white.csv"
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv"

# Solo descarga si no existe localmente
if not os.path.exists(archivo_local):
    print("Descargando dataset...")
    df = pd.read_csv(url, sep=';')
    df.to_csv(archivo_local, index=False)
else:
    print("Cargando dataset local...")
    df = pd.read_csv(archivo_local)

# Mostrar primeras filas
print(df.head())


Cargando dataset local...
   fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  \
0            7.0              0.27         0.36            20.7      0.045   
1            6.3              0.30         0.34             1.6      0.049   
2            8.1              0.28         0.40             6.9      0.050   
3            7.2              0.23         0.32             8.5      0.058   
4            7.2              0.23         0.32             8.5      0.058   

   free sulfur dioxide  total sulfur dioxide  density    pH  sulphates  \
0                 45.0                 170.0   1.0010  3.00       0.45   
1                 14.0                 132.0   0.9940  3.30       0.49   
2                 30.0                  97.0   0.9951  3.26       0.44   
3                 47.0                 186.0   0.9956  3.19       0.40   
4                 47.0                 186.0   0.9956  3.19       0.40   

   alcohol  quality  
0      8.8        6  
1      9.5      

In [118]:
import pandas as pd

# Intenta cargar el archivo con separador coma
df = pd.read_csv("winequality-white.csv", sep=',')

# Limpiar espacios en nombres de columnas
df.columns = df.columns.str.strip()

# Mostrar nombres de columnas corregidos
print(df.columns.tolist())



['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar', 'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density', 'pH', 'sulphates', 'alcohol', 'quality']


Creemos el primer dataframe el cual solo tiene la conversión de la variable categorica.

In [119]:
# Crear variable categórica basada en niveles de alcohol
df["alcohol_level"] = pd.cut(df["alcohol"],
                             bins=[0, 9, 11, 13, 100],
                             labels=["bajo", "medio", "alto", "muy_alto"])

# Aplicar One-Hot Encoding a 'alcohol_level' (sin escalado aún)
df_v1 = pd.get_dummies(df, columns=["alcohol_level"], drop_first=True)

# Verificar que se creó correctamente
print(df_v1.head())
print("Forma:", df_v1.shape)


   fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  \
0            7.0              0.27         0.36            20.7      0.045   
1            6.3              0.30         0.34             1.6      0.049   
2            8.1              0.28         0.40             6.9      0.050   
3            7.2              0.23         0.32             8.5      0.058   
4            7.2              0.23         0.32             8.5      0.058   

   free sulfur dioxide  total sulfur dioxide  density    pH  sulphates  \
0                 45.0                 170.0   1.0010  3.00       0.45   
1                 14.0                 132.0   0.9940  3.30       0.49   
2                 30.0                  97.0   0.9951  3.26       0.44   
3                 47.0                 186.0   0.9956  3.19       0.40   
4                 47.0                 186.0   0.9956  3.19       0.40   

   alcohol  quality  alcohol_level_medio  alcohol_level_alto  \
0      8.8        6   

Construyamos el v2, el cual contiene el balanceoy la conversion de las categoricas.

In [120]:
from sklearn.utils import resample

# Separar clases
df_grupos = [df_v1[df_v1["quality"] == clase] for clase in df_v1["quality"].unique()]

# Encontrar la clase más grande
max_len = max(len(grupo) for grupo in df_grupos)

# Balancear todas las clases al tamaño de la más grande
df_resampled = [resample(grupo, 
                         replace=True, 
                         n_samples=max_len, 
                         random_state=42) for grupo in df_grupos]

# Combinar y mezclar
df_v2 = pd.concat(df_resampled).sample(frac=1, random_state=42).reset_index(drop=True)

# Verificación
print(df_v2["quality"].value_counts())
print("Forma:", df_v2.shape)


quality
5    2198
4    2198
3    2198
9    2198
8    2198
6    2198
7    2198
Name: count, dtype: int64
Forma: (15386, 15)


Sigamos ahora con V3, el cual esta con la conversion de categoricas y el tratamiento de outliders

In [121]:
from scipy.stats import zscore
import numpy as np

# Crear copia base desde df_v1 (ya tiene categóricas codificadas)
df_temp = df_v1.copy()

# Seleccionamos solo las columnas numéricas originales (sin dummies ni 'quality')
columnas_numericas = ['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
                      'chlorides', 'free sulfur dioxide', 'total sulfur dioxide',
                      'density', 'pH', 'sulphates', 'alcohol']

# Calcular Z-scores
z_scores = np.abs(zscore(df_temp[columnas_numericas]))

# Eliminar filas con algún Z-score > 2.0 (puedes usar 3.0 si quieres conservar más)
umbral = 2.0
filtro = (z_scores < umbral).all(axis=1)
df_v3 = df_temp[filtro].reset_index(drop=True)

# Verificación
print("Registros originales:", df_v1.shape[0])
print("Registros después de eliminar outliers:", df_v3.shape[0])
print("Clases:", df_v3['quality'].value_counts())


Registros originales: 4898
Registros después de eliminar outliers: 3368
Clases: quality
6    1577
5     928
7     651
8     122
4      79
3       7
9       4
Name: count, dtype: int64


Sigamos ahora con V4, el cual esta con la conversion de categoricas, el tratamiento de outliders y balnaceo de clases.

In [122]:
from sklearn.utils import resample

# Base: df_v3 (ya sin outliers)
df_grupos = [df_v3[df_v3["quality"] == clase] for clase in df_v3["quality"].unique()]

# Tamaño de la clase más grande
max_len = max(len(grupo) for grupo in df_grupos)

# Sobremuestreo de todas las clases
df_resampled = [
    resample(grupo, replace=True, n_samples=max_len, random_state=42)
    for grupo in df_grupos
]

# Unir y mezclar
df_v4 = pd.concat(df_resampled).sample(frac=1, random_state=42).reset_index(drop=True)

# Verificación
print(df_v4["quality"].value_counts())
print("Forma:", df_v4.shape)


quality
6    1577
7    1577
5    1577
3    1577
4    1577
8    1577
9    1577
Name: count, dtype: int64
Forma: (11039, 15)


Construyamos ahora v5 con la conversion categorica + escalado de los datos.

In [123]:
from sklearn.preprocessing import MinMaxScaler

# Base: df_v1
df_v5 = df_v1.copy()

# Columnas numéricas originales (sin 'quality' ni dummies)
columnas_numericas = ['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
                      'chlorides', 'free sulfur dioxide', 'total sulfur dioxide',
                      'density', 'pH', 'sulphates', 'alcohol']

# Aplicar escalado Min-Max
scaler = MinMaxScaler()
df_v5[columnas_numericas] = scaler.fit_transform(df_v5[columnas_numericas])

# Verificar
print(df_v5.head())
print("Forma:", df_v5.shape)



   fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  \
0       0.307692          0.186275     0.216867        0.308282   0.106825   
1       0.240385          0.215686     0.204819        0.015337   0.118694   
2       0.413462          0.196078     0.240964        0.096626   0.121662   
3       0.326923          0.147059     0.192771        0.121166   0.145401   
4       0.326923          0.147059     0.192771        0.121166   0.145401   

   free sulfur dioxide  total sulfur dioxide   density        pH  sulphates  \
0             0.149826              0.373550  0.267785  0.254545   0.267442   
1             0.041812              0.285383  0.132832  0.527273   0.313953   
2             0.097561              0.204176  0.154039  0.490909   0.255814   
3             0.156794              0.410673  0.163678  0.427273   0.209302   
4             0.156794              0.410673  0.163678  0.427273   0.209302   

    alcohol  quality  alcohol_level_medio  alcohol_level

Creemos ahora el v6 categóricas convertidas + escalado + balanceo de clases.



In [124]:
from sklearn.utils import resample

# Base: df_v5
df_grupos = [df_v5[df_v5["quality"] == clase] for clase in df_v5["quality"].unique()]

# Encontrar clase con más registros
max_len = max(len(grupo) for grupo in df_grupos)

# Sobremuestreo para balancear
df_resampled = [
    resample(grupo, replace=True, n_samples=max_len, random_state=42)
    for grupo in df_grupos
]

# Combinar y mezclar
df_v6 = pd.concat(df_resampled).sample(frac=1, random_state=42).reset_index(drop=True)

# Verificación
print(df_v6["quality"].value_counts())
print("Forma:", df_v6.shape)


quality
5    2198
4    2198
3    2198
9    2198
8    2198
6    2198
7    2198
Name: count, dtype: int64
Forma: (15386, 15)


Construyamos v7, Conversión categórica + Escalado + Eliminación de Outliers

In [125]:
from scipy.stats import zscore
import numpy as np

# Base: df_v5
df_temp = df_v5.copy()

# Columnas numéricas escaladas
columnas_numericas = ['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
                      'chlorides', 'free sulfur dioxide', 'total sulfur dioxide',
                      'density', 'pH', 'sulphates', 'alcohol']

# Calcular z-score y filtrar
z_scores = np.abs(zscore(df_temp[columnas_numericas]))
umbral = 2.0
filtro = (z_scores < umbral).all(axis=1)

# Aplicar filtro
df_v7 = df_temp[filtro].reset_index(drop=True)

# Verificación
print("Registros originales:", df_v5.shape[0])
print("Registros después de eliminar outliers:", df_v7.shape[0])
print("Distribución de clases:\n", df_v7["quality"].value_counts())


Registros originales: 4898
Registros después de eliminar outliers: 3368
Distribución de clases:
 quality
6    1577
5     928
7     651
8     122
4      79
3       7
9       4
Name: count, dtype: int64


Hagamos ahora v8 Conversión categórica + Escalado + Outliers + Balanceo.

In [126]:
from sklearn.utils import resample

# Base: df_v7
df_grupos = [df_v7[df_v7["quality"] == clase] for clase in df_v7["quality"].unique()]

# Encontrar la clase más numerosa
max_len = max(len(grupo) for grupo in df_grupos)

# Sobremuestreo
df_resampled = [
    resample(grupo, replace=True, n_samples=max_len, random_state=42)
    for grupo in df_grupos
]

# Combinar y mezclar
df_v8 = pd.concat(df_resampled).sample(frac=1, random_state=42).reset_index(drop=True)

# Verificación
print(df_v8["quality"].value_counts())
print("Forma final:", df_v8.shape)


quality
6    1577
7    1577
5    1577
3    1577
4    1577
8    1577
9    1577
Name: count, dtype: int64
Forma final: (11039, 15)


In [127]:
# Lista de versiones y nombres
versiones = {
    "df_v1": df_v1,
    "df_v2": df_v2,
    "df_v3": df_v3,
    "df_v4": df_v4,
    "df_v5": df_v5,
    "df_v6": df_v6,
    "df_v7": df_v7,
    "df_v8": df_v8
}

# Guardar cada versión como CSV
for nombre, df in versiones.items():
    df.to_csv(f"{nombre}.csv", index=False)
    print(f"{nombre}.csv guardado.")


df_v1.csv guardado.
df_v2.csv guardado.
df_v3.csv guardado.
df_v4.csv guardado.
df_v5.csv guardado.
df_v6.csv guardado.
df_v7.csv guardado.
df_v8.csv guardado.


Generacion de los split y evaluacion de modelos.


In [128]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
import torch
import torch.nn as nn
import torch.optim as optim

# Dataset versions
versiones = [f"df_v{i}.csv" for i in range(1, 9)]

# Red neuronal con regularización
class WineNet(nn.Module):
    def __init__(self, input_size, num_classes):
        super(WineNet, self).__init__()
        self.net = nn.Sequential(
            nn.Linear(input_size, 64),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, num_classes)
        )

    def forward(self, x):
        return self.net(x)

# Entrenamiento y evaluación
def entrenar_y_evaluar(df, nombre):
    print(f"\n📊 Evaluando {nombre}")

    # Separar características y etiquetas
    X = df.drop("quality", axis=1)
    y = df["quality"]

    # 🔧 Reindexar las clases para PyTorch
    unique_classes = sorted(y.unique())
    class_to_index = {c: i for i, c in enumerate(unique_classes)}
    y = y.map(class_to_index)
    num_classes = len(unique_classes)

    # Escalar características
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)

    # División de datos
    X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.3, stratify=y, random_state=42)

    # 🌳 Árbol de Decisión
    tree = DecisionTreeClassifier(random_state=42)
    tree.fit(X_train, y_train)
    y_pred_tree = tree.predict(X_test)
    print("\n🌳 Árbol de Decisión:")
    print(classification_report(y_test, y_pred_tree, zero_division=0))

    # 👥 KNN
    knn = KNeighborsClassifier(n_neighbors=5)
    knn.fit(X_train, y_train)
    y_pred_knn = knn.predict(X_test)
    print("\n👥 KNN:")
    print(classification_report(y_test, y_pred_knn, zero_division=0))

    # 💠 SVM
    svm = SVC(kernel="rbf", C=1.0)
    svm.fit(X_train, y_train)
    y_pred_svm = svm.predict(X_test)
    print("\n💠 SVM:")
    print(classification_report(y_test, y_pred_svm, zero_division=0))

    # 🧠 Red Neuronal (PyTorch)
    X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
    y_train_tensor = torch.tensor(y_train.values, dtype=torch.long)
    X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
    y_test_tensor = torch.tensor(y_test.values, dtype=torch.long)

    model = WineNet(input_size=X.shape[1], num_classes=num_classes)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)

    best_loss = np.inf
    early_stopping_counter = 0

    for epoch in range(50):
        model.train()
        optimizer.zero_grad()
        output = model(X_train_tensor)
        loss = criterion(output, y_train_tensor)
        loss.backward()
        optimizer.step()

        # Early stopping
        if loss.item() < best_loss:
            best_loss = loss.item()
            early_stopping_counter = 0
        else:
            early_stopping_counter += 1
        if early_stopping_counter >= 5:
            break

    model.eval()
    with torch.no_grad():
        y_pred_nn = model(X_test_tensor).argmax(dim=1).numpy()
    print("\n🧠 Red Neuronal (PyTorch):")
    print(classification_report(y_test, y_pred_nn, zero_division=0))

# Iterar sobre los datasets
for archivo in versiones:
    df = pd.read_csv(archivo)
    entrenar_y_evaluar(df, archivo)



📊 Evaluando df_v1.csv

🌳 Árbol de Decisión:
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         6
           1       0.28      0.22      0.25        49
           2       0.61      0.61      0.61       437
           3       0.63      0.62      0.63       660
           4       0.52      0.58      0.55       264
           5       0.47      0.51      0.49        53
           6       0.00      0.00      0.00         1

    accuracy                           0.59      1470
   macro avg       0.36      0.36      0.36      1470
weighted avg       0.59      0.59      0.59      1470


👥 KNN:
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         6
           1       0.32      0.18      0.23        49
           2       0.55      0.59      0.57       437
           3       0.58      0.64      0.61       660
           4       0.52      0.46      0.49       264
           5       0.40  

In [131]:
import pandas as pd
import re

# Texto con los resultados
texto_resultados = """[📊 Evaluando df_v1.csv

🌳 Árbol de Decisión:
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         6
           1       0.28      0.22      0.25        49
           2       0.61      0.61      0.61       437
           3       0.63      0.62      0.63       660
           4       0.52      0.58      0.55       264
           5       0.47      0.51      0.49        53
           6       0.00      0.00      0.00         1

    accuracy                           0.59      1470
   macro avg       0.36      0.36      0.36      1470
weighted avg       0.59      0.59      0.59      1470


👥 KNN:
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         6
           1       0.32      0.18      0.23        49
           2       0.55      0.59      0.57       437
           3       0.58      0.64      0.61       660
           4       0.52      0.46      0.49       264
           5       0.40      0.11      0.18        53
           6       0.00      0.00      0.00         1

    accuracy                           0.55      1470
   macro avg       0.34      0.28      0.30      1470
weighted avg       0.54      0.55      0.54      1470


💠 SVM:
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         6
           1       0.00      0.00      0.00        49
           2       0.61      0.60      0.60       437
           3       0.55      0.76      0.64       660
           4       0.54      0.25      0.35       264
           5       0.00      0.00      0.00        53
           6       0.00      0.00      0.00         1

    accuracy                           0.57      1470
   macro avg       0.24      0.23      0.23      1470
weighted avg       0.52      0.57      0.53      1470


🧠 Red Neuronal (PyTorch):
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         6
           1       0.00      0.00      0.00        49
           2       0.70      0.08      0.15       437
           3       0.46      0.96      0.62       660
           4       0.53      0.07      0.12       264
           5       0.00      0.00      0.00        53
           6       0.00      0.00      0.00         1

    accuracy                           0.47      1470
   macro avg       0.24      0.16      0.13      1470
weighted avg       0.51      0.47      0.35      1470


📊 Evaluando df_v2.csv

🌳 Árbol de Decisión:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00       659
           1       0.96      1.00      0.98       659
           2       0.88      0.88      0.88       659
           3       0.85      0.75      0.80       660
           4       0.88      0.90      0.89       660
           5       0.95      1.00      0.97       659
           6       1.00      1.00      1.00       660

    accuracy                           0.93      4616
   macro avg       0.93      0.93      0.93      4616
weighted avg       0.93      0.93      0.93      4616


👥 KNN:
              precision    recall  f1-score   support

           0       0.99      1.00      1.00       659
           1       0.87      0.99      0.93       659
           2       0.73      0.72      0.73       659
           3       0.64      0.46      0.54       660
           4       0.75      0.78      0.76       660
           5       0.88      0.99      0.94       659
           6       1.00      1.00      1.00       660

    accuracy                           0.85      4616
   macro avg       0.84      0.85      0.84      4616
weighted avg       0.84      0.85      0.84      4616


💠 SVM:
              precision    recall  f1-score   support

           0       0.97      1.00      0.99       659
           1       0.76      0.86      0.81       659
           2       0.62      0.59      0.60       659
           3       0.46      0.36      0.40       660
           4       0.59      0.56      0.58       660
           5       0.74      0.84      0.78       659
           6       1.00      1.00      1.00       660

    accuracy                           0.74      4616
   macro avg       0.73      0.74      0.74      4616
weighted avg       0.73      0.74      0.74      4616


🧠 Red Neuronal (PyTorch):
              precision    recall  f1-score   support

           0       0.43      0.50      0.47       659
           1       0.45      0.33      0.38       659
           2       0.33      0.67      0.44       659
           3       0.21      0.04      0.07       660
           4       0.44      0.08      0.13       660
           5       0.37      0.34      0.35       659
           6       0.43      0.76      0.55       660

    accuracy                           0.39      4616
   macro avg       0.38      0.39      0.34      4616
weighted avg       0.38      0.39      0.34      4616


📊 Evaluando df_v3.csv

🌳 Árbol de Decisión:
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         2
           1       0.29      0.29      0.29        24
           2       0.63      0.62      0.62       279
           3       0.64      0.64      0.64       473
           4       0.51      0.55      0.53       195
           5       0.46      0.32      0.38        37
           6       0.00      0.00      0.00         1

    accuracy                           0.60      1011
   macro avg       0.36      0.35      0.35      1011
weighted avg       0.60      0.60      0.59      1011


👥 KNN:
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         2
           1       0.00      0.00      0.00        24
           2       0.58      0.62      0.60       279
           3       0.61      0.71      0.65       473
           4       0.51      0.37      0.43       195
           5       0.33      0.14      0.19        37
           6       0.00      0.00      0.00         1

    accuracy                           0.58      1011
   macro avg       0.29      0.26      0.27      1011
weighted avg       0.55      0.58      0.56      1011


💠 SVM:
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         2
           1       0.00      0.00      0.00        24
           2       0.64      0.60      0.62       279
           3       0.56      0.82      0.67       473
           4       0.63      0.21      0.32       195
           5       0.00      0.00      0.00        37
           6       0.00      0.00      0.00         1

    accuracy                           0.59      1011
   macro avg       0.26      0.23      0.23      1011
weighted avg       0.56      0.59      0.54      1011


🧠 Red Neuronal (PyTorch):
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         2
           1       0.00      0.00      0.00        24
           2       0.55      0.37      0.44       279
           3       0.49      0.85      0.62       473
           4       0.00      0.00      0.00       195
           5       0.00      0.00      0.00        37
           6       0.00      0.00      0.00         1

    accuracy                           0.50      1011
   macro avg       0.15      0.17      0.15      1011
weighted avg       0.38      0.50      0.41      1011


📊 Evaluando df_v4.csv

🌳 Árbol de Decisión:
              precision    recall  f1-score   support

           0       0.99      1.00      1.00       473
           1       0.96      1.00      0.98       473
           2       0.90      0.90      0.90       473
           3       0.88      0.78      0.83       473
           4       0.89      0.93      0.91       473
           5       0.98      1.00      0.99       473
           6       1.00      1.00      1.00       474

    accuracy                           0.94      3312
   macro avg       0.94      0.94      0.94      3312
weighted avg       0.94      0.94      0.94      3312


👥 KNN:
              precision    recall  f1-score   support

           0       0.99      1.00      1.00       473
           1       0.86      1.00      0.93       473
           2       0.76      0.74      0.75       473
           3       0.68      0.54      0.60       473
           4       0.78      0.73      0.75       473
           5       0.89      1.00      0.94       473
           6       1.00      1.00      1.00       474

    accuracy                           0.86      3312
   macro avg       0.85      0.86      0.85      3312
weighted avg       0.85      0.86      0.85      3312


💠 SVM:
              precision    recall  f1-score   support

           0       0.99      1.00      0.99       473
           1       0.83      0.97      0.89       473
           2       0.74      0.64      0.69       473
           3       0.60      0.46      0.52       473
           4       0.64      0.68      0.66       473
           5       0.79      0.89      0.83       473
           6       1.00      1.00      1.00       474

    accuracy                           0.81      3312
   macro avg       0.80      0.81      0.80      3312
weighted avg       0.80      0.81      0.80      3312


🧠 Red Neuronal (PyTorch):
              precision    recall  f1-score   support

           0       0.46      0.29      0.36       473
           1       0.38      0.53      0.44       473
           2       0.31      0.42      0.35       473
           3       0.30      0.12      0.17       473
           4       0.30      0.17      0.22       473
           5       0.31      0.23      0.26       473
           6       0.52      1.00      0.69       474

    accuracy                           0.39      3312
   macro avg       0.37      0.39      0.36      3312
weighted avg       0.37      0.39      0.36      3312


📊 Evaluando df_v5.csv

🌳 Árbol de Decisión:
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         6
           1       0.28      0.22      0.25        49
           2       0.61      0.61      0.61       437
           3       0.63      0.62      0.63       660
           4       0.52      0.58      0.55       264
           5       0.47      0.51      0.49        53
           6       0.00      0.00      0.00         1

    accuracy                           0.59      1470
   macro avg       0.36      0.36      0.36      1470
weighted avg       0.59      0.59      0.59      1470


👥 KNN:
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         6
           1       0.32      0.18      0.23        49
           2       0.55      0.59      0.57       437
           3       0.58      0.64      0.61       660
           4       0.52      0.46      0.49       264
           5       0.40      0.11      0.18        53
           6       0.00      0.00      0.00         1

    accuracy                           0.55      1470
   macro avg       0.34      0.28      0.30      1470
weighted avg       0.54      0.55      0.54      1470


💠 SVM:
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         6
           1       0.00      0.00      0.00        49
           2       0.61      0.60      0.60       437
           3       0.55      0.76      0.64       660
           4       0.54      0.25      0.35       264
           5       0.00      0.00      0.00        53
           6       0.00      0.00      0.00         1

    accuracy                           0.57      1470
   macro avg       0.24      0.23      0.23      1470
weighted avg       0.52      0.57      0.53      1470


🧠 Red Neuronal (PyTorch):
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         6
           1       0.00      0.00      0.00        49
           2       0.56      0.50      0.53       437
           3       0.49      0.79      0.60       660
           4       0.50      0.02      0.03       264
           5       0.00      0.00      0.00        53
           6       0.00      0.00      0.00         1

    accuracy                           0.51      1470
   macro avg       0.22      0.19      0.17      1470
weighted avg       0.47      0.51      0.43      1470


📊 Evaluando df_v6.csv

🌳 Árbol de Decisión:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00       659
           1       0.96      1.00      0.98       659
           2       0.88      0.88      0.88       659
           3       0.85      0.75      0.80       660
           4       0.88      0.90      0.89       660
           5       0.95      1.00      0.97       659
           6       1.00      1.00      1.00       660

    accuracy                           0.93      4616
   macro avg       0.93      0.93      0.93      4616
weighted avg       0.93      0.93      0.93      4616


👥 KNN:
              precision    recall  f1-score   support

           0       0.99      1.00      1.00       659
           1       0.87      0.99      0.93       659
           2       0.73      0.72      0.73       659
           3       0.64      0.46      0.54       660
           4       0.75      0.78      0.76       660
           5       0.88      0.99      0.94       659
           6       1.00      1.00      1.00       660

    accuracy                           0.85      4616
   macro avg       0.84      0.85      0.84      4616
weighted avg       0.84      0.85      0.84      4616


💠 SVM:
              precision    recall  f1-score   support

           0       0.97      1.00      0.99       659
           1       0.76      0.86      0.81       659
           2       0.62      0.59      0.60       659
           3       0.46      0.36      0.40       660
           4       0.59      0.56      0.58       660
           5       0.74      0.84      0.78       659
           6       1.00      1.00      1.00       660

    accuracy                           0.74      4616
   macro avg       0.73      0.74      0.74      4616
weighted avg       0.73      0.74      0.74      4616


🧠 Red Neuronal (PyTorch):
              precision    recall  f1-score   support

           0       0.41      0.50      0.45       659
           1       0.38      0.52      0.44       659
           2       0.37      0.45      0.40       659
           3       0.26      0.08      0.12       660
           4       0.20      0.05      0.09       660
           5       0.38      0.31      0.34       659
           6       0.42      0.76      0.55       660

    accuracy                           0.38      4616
   macro avg       0.35      0.38      0.34      4616
weighted avg       0.35      0.38      0.34      4616


📊 Evaluando df_v7.csv

🌳 Árbol de Decisión:
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         2
           1       0.29      0.29      0.29        24
           2       0.63      0.62      0.62       279
           3       0.64      0.64      0.64       473
           4       0.51      0.55      0.53       195
           5       0.46      0.32      0.38        37
           6       0.00      0.00      0.00         1

    accuracy                           0.60      1011
   macro avg       0.36      0.35      0.35      1011
weighted avg       0.60      0.60      0.59      1011


👥 KNN:
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         2
           1       0.00      0.00      0.00        24
           2       0.58      0.62      0.60       279
           3       0.61      0.71      0.65       473
           4       0.51      0.37      0.43       195
           5       0.33      0.14      0.19        37
           6       0.00      0.00      0.00         1

    accuracy                           0.58      1011
   macro avg       0.29      0.26      0.27      1011
weighted avg       0.55      0.58      0.56      1011


💠 SVM:
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         2
           1       0.00      0.00      0.00        24
           2       0.64      0.60      0.62       279
           3       0.56      0.82      0.67       473
           4       0.63      0.21      0.32       195
           5       0.00      0.00      0.00        37
           6       0.00      0.00      0.00         1

    accuracy                           0.59      1011
   macro avg       0.26      0.23      0.23      1011
weighted avg       0.56      0.59      0.54      1011


🧠 Red Neuronal (PyTorch):
              precision    recall  f1-score   support

           0       0.00      0.00      0.00         2
           1       0.00      0.00      0.00        24
           2       0.79      0.15      0.25       279
           3       0.48      0.97      0.65       473
           4       0.25      0.01      0.01       195
           5       0.00      0.00      0.00        37
           6       0.00      0.00      0.00         1

    accuracy                           0.50      1011
   macro avg       0.22      0.16      0.13      1011
weighted avg       0.49      0.50      0.37      1011


📊 Evaluando df_v8.csv

🌳 Árbol de Decisión:
              precision    recall  f1-score   support

           0       0.99      1.00      1.00       473
           1       0.96      1.00      0.98       473
           2       0.90      0.90      0.90       473
           3       0.88      0.78      0.83       473
           4       0.89      0.93      0.91       473
           5       0.98      1.00      0.99       473
           6       1.00      1.00      1.00       474

    accuracy                           0.94      3312
   macro avg       0.94      0.94      0.94      3312
weighted avg       0.94      0.94      0.94      3312


👥 KNN:
              precision    recall  f1-score   support

           0       0.99      1.00      1.00       473
           1       0.86      1.00      0.93       473
           2       0.76      0.74      0.75       473
           3       0.68      0.54      0.60       473
           4       0.78      0.73      0.75       473
           5       0.89      1.00      0.94       473
           6       1.00      1.00      1.00       474

    accuracy                           0.86      3312
   macro avg       0.85      0.86      0.85      3312
weighted avg       0.85      0.86      0.85      3312


💠 SVM:
              precision    recall  f1-score   support

           0       0.99      1.00      0.99       473
           1       0.83      0.97      0.89       473
           2       0.74      0.64      0.69       473
           3       0.60      0.46      0.52       473
           4       0.64      0.68      0.66       473
           5       0.79      0.89      0.83       473
           6       1.00      1.00      1.00       474

    accuracy                           0.81      3312
   macro avg       0.80      0.81      0.80      3312
weighted avg       0.80      0.81      0.80      3312


🧠 Red Neuronal (PyTorch):
              precision    recall  f1-score   support

           0       0.53      0.58      0.55       473
           1       0.39      0.64      0.48       473
           2       0.41      0.27      0.32       473
           3       0.25      0.14      0.18       473
           4       0.22      0.02      0.04       473
           5       0.35      0.32      0.33       473
           6       0.50      1.00      0.67       474

    accuracy                           0.42      3312
   macro avg       0.38      0.42      0.37      3312
weighted avg       0.38      0.42      0.37      3312]"""

# Patrones para identificar bloques de evaluación
eval_blocks = re.split(r"\n\n(?=\ud83d\udcca Evaluando df_v\d\.csv)", texto_resultados)

# Lista para guardar los resultados
resultados = []

for bloque in eval_blocks:
    match_version = re.search(r"Evaluando (df_v\d\.csv)", bloque)
    version = match_version.group(1) if match_version else ""

    for modelo in ["\U0001f333.*?\n", "\U0001f46e.*?\n", "\ud83d\udd20.*?\n", "\U0001f9e0.*?\n"]:
        modelo_match = re.search(modelo, bloque)
        if not modelo_match:
            continue

        nombre_modelo = modelo_match.group().strip().split(' ')[-1]

        # Extraer métricas globales (accuracy, macro avg, weighted avg)
        metrics = {
            'accuracy': None,
            'precision': None,
            'recall': None,
            'f1-score': None
        }

        acc_match = re.search(r"accuracy\s+(\d\.\d+)", bloque)
        if acc_match:
            metrics['accuracy'] = float(acc_match.group(1))

        macro_match = re.search(r"macro avg\s+(\d\.\d+)\s+(\d\.\d+)\s+(\d\.\d+)", bloque)
        if macro_match:
            metrics['precision'] = float(macro_match.group(1))
            metrics['recall'] = float(macro_match.group(2))
            metrics['f1-score'] = float(macro_match.group(3))

        resultados.append({
            'dataset': version,
            'modelo': nombre_modelo,
            **metrics
        })

# Crear DataFrame
df_resultados = pd.DataFrame(resultados)
df_resultados.to_csv("metricas_completas.csv", index=False)
print(df_resultados)


     dataset      modelo  accuracy  precision  recall  f1-score
0  df_v1.csv   Decisión:      0.59       0.36    0.36      0.36
1  df_v1.csv  (PyTorch):      0.59       0.36    0.36      0.36
