# Zoo Dataset Analysis - Multiclass Classification

This notebook implements and evaluates six classification algorithms on the UCI Zoo dataset:
1. Gaussian Naive Bayes
2. Multivariate MLE (Full Bayesian Gaussian)
3. Histogram Bayes
4. Parzen Windows
5. k-NN Density Bayes
6. k-NN Rule

Dataset: 101 animals, 17 attributes (15 binary + 1 numeric + 1 class label), 7 animal classes.
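
Models 1 to 5 are plug-in Bayes classifiers: each one estimates the class-conditional density in a different way and then applies the same decision rule. As a reading aid (the notation here is an assumption, not taken from the notebook), with $\hat P(c) = n_c / n$ the prior estimated from the training counts and $\hat p(x \mid c)$ the estimated density of class $c$:

$$
\hat y(x) \;=\; \arg\max_{c}\; \hat P(c)\,\hat p(x \mid c) \;=\; \arg\max_{c}\; \bigl[\log \hat P(c) + \log \hat p(x \mid c)\bigr]
$$

Model 6 (the k-NN rule) skips the density estimate and takes a majority vote among the k nearest training points.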

## 1. Importing libraries

In [89]:
import pandas as pd
import numpy as np
import os
import warnings
from sklearn.model_selection import train_test_split, StratifiedKFold, GridSearchCV, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier, KernelDensity
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score, f1_score
from scipy.stats import multivariate_normal

# Suppress joblib/loky warnings
os.environ['LOKY_MAX_CPU_COUNT'] = '4'
warnings.filterwarnings('ignore', category=UserWarning, module='joblib')

print("✓ Libraries imported successfully")

✓ Libraries imported successfully


## 2. Loading and exploring the dataset

In [90]:
# Zoo dataset classes (1-7 -> labels for the report).
# NOTE: the UCI zoo.names file lists type 6 as insects (8 animals) and type 7 as
# invertebrates (10 animals), so the last two labels here appear to be swapped
# relative to the class counts this cell prints.
class_names = ['mammal', 'bird', 'reptile', 'fish', 'amphibian', 'invertebrate', 'insect']

print("=" * 80)
print("ZOO DATASET ANALYSIS (Multiclass Classification)")
print("=" * 80)

# Load zoo.data: col 0 is the animal name (ignored), cols 1-16 are features, col 17 is the class (1-7)
df = pd.read_csv('./zoo/zoo.data', header=None)
df.columns = ['animal'] + [f'feature_{i}' for i in range(1, 17)] + ['class']
X = df.iloc[:, 1:-1]  # features 1-16
y = df.iloc[:, -1].values - 1  # classes 0-6 for sklearn

print("\nDataset information:")
print(f"Shape: {X.shape} (instances x features)")
print(f"Classes: {len(np.unique(y))} (multiclass: {class_names})")
print("\nClass distribution:")
unique, counts = np.unique(y, return_counts=True)
for i, (cls, count) in enumerate(zip(class_names, counts)):
    print(f"Class {i+1} ({cls}): {count} samples ({count/len(y)*100:.1f}%)")

# Show the first rows
print("\nFirst 5 rows of the dataset:")
print(df.head())

ZOO DATASET ANALYSIS (Multiclass Classification)

Dataset information:
Shape: (101, 16) (instances x features)
Classes: 7 (multiclass: ['mammal', 'bird', 'reptile', 'fish', 'amphibian', 'invertebrate', 'insect'])

Class distribution:
Class 1 (mammal): 41 samples (40.6%)
Class 2 (bird): 20 samples (19.8%)
Class 3 (reptile): 5 samples (5.0%)
Class 4 (fish): 13 samples (12.9%)
Class 5 (amphibian): 4 samples (4.0%)
Class 6 (invertebrate): 8 samples (7.9%)
Class 7 (insect): 10 samples (9.9%)

First 5 rows of the dataset:
     animal  feature_1  feature_2  feature_3  feature_4  feature_5  feature_6  \
0  aardvark          1          0          0          1          0          0   
1  antelope          1          0          0          1          0          0   
2      bass          0          0          1          0          0          1   
3      bear          1          0          0          1          0          0   
4      boar          1          0          0         

## 3. Train-test split and cross-validation setup

In [91]:
# Split: 80% train / 20% test, stratified by class
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"Total samples: {len(X)}")
print(f"Train: {len(X_train)} ({len(X_train)/len(X)*100:.1f}%)")
print(f"Test: {len(X_test)} ({len(X_test)/len(X)*100:.1f}%)")

print("\nTrain distribution:")
print(pd.Series(y_train).value_counts().sort_index())
print("\nTest distribution:")
print(pd.Series(y_test).value_counts().sort_index())

# Stratified 5-fold CV on the training set
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
print("\n✓ Cross-validation setup: stratified 5-fold")

Total samples: 101
Train: 80 (79.2%)
Test: 21 (20.8%)

Train distribution:
0    33
1    16
2     4
3    10
4     3
5     6
6     8
Name: count, dtype: int64

Test distribution:
0    8
1    4
2    1
3    3
4    1
5    2
6    2
Name: count, dtype: int64

✓ Cross-validation setup: stratified 5-fold


## 4. Helper function for model evaluation

In [92]:
# Helper: predict on the test set and print a report
def evaluate_model(model, X_test, y_test, model_name):
    preds = model.predict(X_test)
    acc = accuracy_score(y_test, preds)
    f1_mac = f1_score(y_test, preds, average='macro')
    print(f"\n--- Test results ({model_name}) ---")
    print(f"Accuracy: {acc:.4f}")
    print(f"F1-macro: {f1_mac:.4f}")
    print("\nClassification report:")
    print(classification_report(y_test, preds, target_names=class_names))
    cm = confusion_matrix(y_test, preds)
    print("\nConfusion matrix:")
    print(cm)
    return acc, f1_mac, cm

print("✓ Evaluation function defined")

✓ Evaluation function defined
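
`evaluate_model` reports macro-averaged F1, which weights all seven classes equally regardless of their size. As a reading aid, the sketch below recomputes macro-F1 from a confusion matrix using F1_c = 2·TP_c / (row sum_c + column sum_c). `macro_f1_from_confusion` is a hypothetical helper, not part of the notebook; it assigns F1 = 0 to a class that is never predicted, which is the value scikit-learn falls back to (with a warning) by default.

```python
import numpy as np

def macro_f1_from_confusion(cm):
    """Macro-F1 from a confusion matrix (rows = true class, columns = predicted)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    # Row sum = TP + FN and column sum = TP + FP, so their total is 2TP + FP + FN.
    denom = cm.sum(axis=1) + cm.sum(axis=0)
    # Classes absent from both truth and predictions keep F1 = 0.
    f1 = np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)
    return float(f1.mean())

# Confusion matrix reported for Histogram Bayes in section 7:
# every test sample is predicted as class 0.
cm_hist = np.zeros((7, 7), dtype=int)
cm_hist[:, 0] = [8, 4, 1, 3, 1, 2, 2]

print(f"{macro_f1_from_confusion(cm_hist):.4f}")  # 0.0788
```

Only class 0 contributes (2·8 / (8 + 21) ≈ 0.552); averaging over seven classes gives 0.0788, which matches the F1-macro printed for that model.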


## 5. Model 1: Gaussian Naive Bayes

In [93]:
print("=" * 80)
print("1. GAUSSIAN NAIVE BAYES")
print("=" * 80)

nb = GaussianNB()
nb.fit(X_train, y_train)
nb_acc, nb_f1, nb_cm = evaluate_model(nb, X_test, y_test, "Naive Bayes")

# CV score for NB (no hyperparameters to tune)
nb_cv_scores = cross_val_score(nb, X_train, y_train, cv=skf, scoring='f1_macro')
print(f"\nCV F1-macro (mean ± std): {nb_cv_scores.mean():.4f} ± {nb_cv_scores.std():.4f}")

1. GAUSSIAN NAIVE BAYES

--- Test results (Naive Bayes) ---
Accuracy: 1.0000
F1-macro: 1.0000

Classification report:
              precision    recall  f1-score   support

      mammal       1.00      1.00      1.00         8
        bird       1.00      1.00      1.00         4
     reptile       1.00      1.00      1.00         1
        fish       1.00      1.00      1.00         3
   amphibian       1.00      1.00      1.00         1
invertebrate       1.00      1.00      1.00         2
      insect       1.00      1.00      1.00         2

    accuracy                           1.00        21
   macro avg       1.00      1.00      1.00        21
weighted avg       1.00      1.00      1.00        21


Confusion matrix:
[[8 0 0 0 0 0 0]
 [0 4 0 0 0 0 0]
 [0 0 1 0 0 0 0]
 [0 0 0 3 0 0 0]
 [0 0 0 0 1 0 0]
 [0 0 0 0 0 2 0]
 [0 0 0 0 0 0 2]]

CV F1-macro (mean ± std): 0.8505 ± 0.1357




## 6. Model 2: Multivariate MLE (Full Bayesian Gaussian)

In [94]:
print("=" * 80)
print("2. MULTIVARIATE MLE (Full Bayesian Gaussian)")
print("=" * 80)

class FullGaussianBayes:
    """Gaussian Bayes classifier with one full covariance matrix per class."""
    def __init__(self):
        self.priors = None
        self.means = None
        self.covs = None
        self.classes = None

    def fit(self, X, y):
        self.classes = np.unique(y)
        # Priors from class frequencies; assumes labels are 0..C-1 and all present
        self.priors = np.bincount(y) / len(y)
        self.means = np.array([X[y == c].mean(axis=0) for c in self.classes])
        # A small ridge on the diagonal keeps each covariance matrix invertible
        self.covs = np.array([np.cov(X[y == c].T) + 1e-6 * np.eye(X.shape[1]) for c in self.classes])
        return self

    def predict(self, X):
        n_samples = X.shape[0]
        ll = np.zeros((n_samples, len(self.classes)))
        for i, c in enumerate(self.classes):
            ll[:, i] = multivariate_normal(mean=self.means[i], cov=self.covs[i]).logpdf(X)
        # NOTE: np.exp(ll) can underflow to 0 for every class; the normalization
        # then yields NaN and np.argmax returns index 0 for that row.
        posteriors = np.exp(ll) * self.priors
        posteriors /= posteriors.sum(axis=1, keepdims=True)
        return np.argmax(posteriors, axis=1)

print("✓ FullGaussianBayes class defined")

2. MULTIVARIATE MLE (Full Bayesian Gaussian)
✓ FullGaussianBayes class defined


In [95]:
mle = FullGaussianBayes()
mle.fit(X_train.values, y_train)
mle_acc, mle_f1, mle_cm = evaluate_model(mle, X_test.values, y_test, "MLE Full")

# Custom CV loop for MLE
def cv_full_bayes(X_train, y_train, cv):
    scores = []
    for train_idx, val_idx in cv.split(X_train, y_train):
        X_tr, X_val = X_train.iloc[train_idx], X_train.iloc[val_idx]
        y_tr, y_val = y_train[train_idx], y_train[val_idx]
        model = FullGaussianBayes()
        model.fit(X_tr.values, y_tr)
        preds = model.predict(X_val.values)
        scores.append(f1_score(y_val, preds, average='macro'))
    return np.array(scores)

mle_cv_scores = cv_full_bayes(pd.DataFrame(X_train), y_train, skf)
print(f"\nCV F1-macro (mean ± std): {mle_cv_scores.mean():.4f} ± {mle_cv_scores.std():.4f}")


--- Test results (MLE Full) ---
Accuracy: 0.7143
F1-macro: 0.4563

Classification report:
              precision    recall  f1-score   support

      mammal       0.57      1.00      0.73         8
        bird       1.00      1.00      1.00         4
     reptile       0.00      0.00      0.00         1
        fish       1.00      0.67      0.80         3
   amphibian       0.00      0.00      0.00         1
invertebrate       0.00      0.00      0.00         2
      insect       1.00      0.50      0.67         2

    accuracy                           0.71        21
   macro avg       0.51      0.45      0.46        21
weighted avg       0.65      0.71      0.65        21


Confusion matrix:
[[8 0 0 0 0 0 0]
 [0 4 0 0 0 0 0]
 [1 0 0 0 0 0 0]
 [1 0 0 2 0 0 0]
 [1 0 0 0 0 0 0]
 [2 0 0 0 0 0 0]
 [1 0 0 0 0 0 1]]

CV F1-macro (mean ± std): 0.5329 ± 0.1021
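
Every misclassified sample in the confusion matrix above lands in class 0. One plausible cause, assuming the per-class log-densities are very negative under the tightly regularized covariances, is that `np.exp(ll)` underflows to zero for all classes, so the normalization divides 0 by 0 and `np.argmax` on an all-NaN row returns index 0. The sketch below contrasts that behavior with a log-domain decision on toy numbers; `log_posterior_argmax` is a hypothetical helper, not part of the notebook, and it has not been re-run on the Zoo data.

```python
import numpy as np

def log_posterior_argmax(log_lik, priors):
    """Return argmax_c [log p(x|c) + log P(c)] without exponentiating."""
    log_lik = np.asarray(log_lik, dtype=float)
    log_post = log_lik + np.log(np.asarray(priors, dtype=float))
    return np.argmax(log_post, axis=1)

# Toy log-likelihoods for two samples and three classes (illustrative values).
log_lik = np.array([[-5000.0, -4000.0, -6000.0],
                    [-10.0, -800.0, -900.0]])
priors = np.array([0.5, 0.3, 0.2])

# Naive route: exponentiate, then normalize. Row 0 underflows to all zeros,
# so the division produces NaN and argmax falls back to index 0.
with np.errstate(invalid="ignore"):
    naive = np.exp(log_lik) * priors
    naive = naive / naive.sum(axis=1, keepdims=True)
naive_pred = np.argmax(naive, axis=1)

stable_pred = log_posterior_argmax(log_lik, priors)
print(naive_pred, stable_pred)  # [0 0] [1 0]
```

Normalizing is unnecessary for the argmax; if calibrated posteriors are needed, subtracting a log-sum-exp from `log_post` gives them without leaving the log domain.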




## 7. Model 3: Histogram Bayes

In [96]:
print("=" * 80)
print("3. NONPARAMETRIC DENSITY - HISTOGRAM")
print("=" * 80)

class HistogramBayes:
    def __init__(self, bins=2):
        self.bins = bins
        self.priors = None
        self.hist_per_class = None
        self.edges = None
    
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.priors = np.bincount(y) / len(y)
        self.hist_per_class = {}
        for c in self.classes:
            X_c = X[y == c]
            hists = []
            edges_list = []
            for feat in range(X.shape[1]):
                hist, edges = np.histogram(X_c.iloc[:, feat], bins=self.bins, density=True)
                hists.append(hist)
                edges_list.append(edges)
            self.hist_per_class[c] = (np.array(hists), edges_list)
        self.edges = edges_list[0] if edges_list else None
        return self
    
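    def _density_hist(self, x, c):
        # Product of per-feature histogram densities (features treated as independent)
        hists, edges = self.hist_per_class[c]
        dens = 1.0
        for i, feat_val in enumerate(x):
            # NOTE: np.digitize maps a value equal to the last edge to index
            # len(edges) - 1, which is out of range here, so it contributes 0.
            bin_idx = np.digitize(feat_val, edges[i]) - 1
            if 0 <= bin_idx < len(hists[i]):
                dens *= hists[i][bin_idx]
            else:
                dens *= 0
        return dens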
    
    def predict(self, X):
        n_samples = len(X)
        preds = np.zeros(n_samples, dtype=int)
        for i in range(n_samples):
            posteriors = []
            for c in self.classes:
                dens = self._density_hist(X.iloc[i], c)
                post = self.priors[c] * dens
                posteriors.append(post)
            preds[i] = self.classes[np.argmax(posteriors)]
        return preds

print("✓ HistogramBayes class defined")

3. NONPARAMETRIC DENSITY - HISTOGRAM
✓ HistogramBayes class defined


In [97]:
hist_bayes = HistogramBayes(bins=2)
hist_bayes.fit(pd.DataFrame(X_train), y_train)
hist_acc, hist_f1, hist_cm = evaluate_model(hist_bayes, pd.DataFrame(X_test), y_test, "Histogram Bayes")

# Custom CV loop for Histogram Bayes
def cv_hist_bayes(X_train, y_train, cv):
    scores = []
    for train_idx, val_idx in cv.split(X_train, y_train):
        X_tr, X_val = X_train.iloc[train_idx], X_train.iloc[val_idx]
        y_tr, y_val = y_train[train_idx], y_train[val_idx]
        model = HistogramBayes()
        model.fit(X_tr, y_tr)
        preds = model.predict(X_val)
        scores.append(f1_score(y_val, preds, average='macro'))
    return np.array(scores)

hist_cv_scores = cv_hist_bayes(pd.DataFrame(X_train), y_train, skf)
print(f"\nCV F1-macro (mean ± std): {hist_cv_scores.mean():.4f} ± {hist_cv_scores.std():.4f}")


--- Test results (Histogram Bayes) ---
Accuracy: 0.3810
F1-macro: 0.0788

Classification report:
              precision    recall  f1-score   support

      mammal       0.38      1.00      0.55         8
        bird       0.00      0.00      0.00         4
     reptile       0.00      0.00      0.00         1
        fish       0.00      0.00      0.00         3
   amphibian       0.00      0.00      0.00         1
invertebrate       0.00      0.00      0.00         2
      insect       0.00      0.00      0.00         2

    accuracy                           0.38        21
   macro avg       0.05      0.14      0.08        21
weighted avg       0.15      0.38      0.21        21


Confusion matrix:
[[8 0 0 0 0 0 0]
 [4 0 0 0 0 0 0]
 [1 0 0 0 0 0 0]
 [3 0 0 0 0 0 0]
 [1 0 0 0 0 0 0]
 [2 0 0 0 0 0 0]
 [2 0 0 0 0 0 0]]



CV F1-macro (mean ± std): 0.2474 ± 0.1277
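
Every test sample is assigned to class 0 above. A likely contributor, assuming most features take only the values 0 and 1, is a mismatch between `np.histogram` and `np.digitize` at the last bin edge: `np.histogram` counts a value equal to the right-most edge in the last bin, while `np.digitize(v, edges) - 1` maps it one position past the last bin, so `_density_hist` multiplies the density by zero. The sketch below reproduces the mismatch on toy data and shows one possible fix (sending the right edge to the last bin); `bin_index` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

# Toy binary feature within one class (illustrative values).
values = np.array([0, 0, 1, 1, 1])
hist, edges = np.histogram(values, bins=2, density=True)
# edges is [0.0, 0.5, 1.0]; the three ones are counted in the last bin.

# Lookup as done in _density_hist: value 1 sits on the last edge.
raw_idx = int(np.digitize(1, edges) - 1)
print(raw_idx, len(hist))  # 2 2 -> index 2 is out of range for 2 bins

def bin_index(v, edges):
    """Bin lookup that treats the last bin as closed, like np.histogram."""
    if v == edges[-1]:
        return len(edges) - 2
    return int(np.digitize(v, edges) - 1)

print(bin_index(0, edges), bin_index(1, edges))  # 0 1
print(hist[bin_index(1, edges)])                 # density of the last bin
```

Values outside the fitted range still fall out of bounds with this lookup, so the zero-density branch in `_density_hist` remains necessary for them.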


## 8. Model 4: Parzen Windows

In [98]:
print("=" * 80)
print("4. NONPARAMETRIC DENSITY - PARZEN WINDOWS")
print("=" * 80)

class ParzenBayes:
    def __init__(self, bandwidth=0.5):
        self.bandwidth = bandwidth
        self.priors = None
        self.kdes = None
        self.classes = None
    
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.priors = np.bincount(y) / len(y)
        self.kdes = {}
        for c in self.classes:
            X_c = X[y == c].values.reshape(-1, X.shape[1])
            kde = KernelDensity(kernel='gaussian', bandwidth=self.bandwidth).fit(X_c)
            self.kdes[c] = kde
        return self
    
    def predict(self, X):
        X_val = X.values.reshape(-1, X.shape[1])
        n_samples = len(X_val)
        ll = np.zeros((n_samples, len(self.classes)))
        for i, c in enumerate(self.classes):
            ll[:, i] = np.exp(self.kdes[c].score_samples(X_val))
        posteriors = ll * self.priors
        posteriors /= posteriors.sum(axis=1, keepdims=True) + 1e-10
        return np.argmax(posteriors, axis=1)

print("✓ ParzenBayes class defined")

4. NONPARAMETRIC DENSITY - PARZEN WINDOWS
✓ ParzenBayes class defined


In [99]:
# Manual grid search over the bandwidth (h)
print("\n--- Hyperparameter search (on train) ---")
params_parzen = {'bandwidth': [0.05, 0.1, 0.5, 1.0, 1.5, 2.0]}

best_h = None
best_cv_score = -np.inf
for h in params_parzen['bandwidth']:
    model_temp = ParzenBayes(bandwidth=h)
    cv_scores_temp = []
    for train_idx, val_idx in skf.split(X_train, y_train):
        X_tr, X_val = X_train.iloc[train_idx], X_train.iloc[val_idx]
        y_tr, y_val = y_train[train_idx], y_train[val_idx]
        model_temp.fit(X_tr, y_tr)
        preds_temp = model_temp.predict(X_val)
        cv_scores_temp.append(f1_score(y_val, preds_temp, average='macro'))
    mean_score = np.mean(cv_scores_temp)
    print(f"h={h}: F1-macro CV = {mean_score:.4f}")
    if mean_score > best_cv_score:
        best_cv_score = mean_score
        best_h = h  # keep the bandwidth that achieved the best score

print(f"\n✓ Best bandwidth h: {best_h}")
print(f"✓ Best F1-macro CV (train): {best_cv_score:.4f}")


--- Hyperparameter search (on train) ---
h=0.05: F1-macro CV = 0.8343
h=0.1: F1-macro CV = 0.8648
h=0.5: F1-macro CV = 0.8648
h=1.0: F1-macro CV = 0.7911
h=1.5: F1-macro CV = 0.5696
h=2.0: F1-macro CV = 0.4309

✓ Best bandwidth h: 0.1
✓ Best F1-macro CV (train): 0.8648




In [100]:
# Train with the best h and evaluate
parzen_bayes = ParzenBayes(bandwidth=best_h)
parzen_bayes.fit(pd.DataFrame(X_train), y_train)
parzen_acc, parzen_f1, parzen_cm = evaluate_model(parzen_bayes, pd.DataFrame(X_test), y_test, "Parzen Bayes")


--- Test results (Parzen Bayes) ---
Accuracy: 1.0000
F1-macro: 1.0000

Classification report:
              precision    recall  f1-score   support

      mammal       1.00      1.00      1.00         8
        bird       1.00      1.00      1.00         4
     reptile       1.00      1.00      1.00         1
        fish       1.00      1.00      1.00         3
   amphibian       1.00      1.00      1.00         1
invertebrate       1.00      1.00      1.00         2
      insect       1.00      1.00      1.00         2

    accuracy                           1.00        21
   macro avg       1.00      1.00      1.00        21
weighted avg       1.00      1.00      1.00        21


Confusion matrix:
[[8 0 0 0 0 0 0]
 [0 4 0 0 0 0 0]
 [0 0 1 0 0 0 0]
 [0 0 0 3 0 0 0]
 [0 0 0 0 1 0 0]
 [0 0 0 0 0 2 0]
 [0 0 0 0 0 0 2]]


## 9. Model 5: k-NN Density Estimator

In [101]:
print("=" * 80)
print("5. NONPARAMETRIC DENSITY - k-NN ESTIMATOR")
print("=" * 80)

class KNNDensityBayes:
    def __init__(self, k=5):
        self.k = k
        self.priors = None
        self.kdes = None
        self.classes = None
    
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.priors = np.bincount(y) / len(y)
        self.kdes = {}
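        for c in self.classes:
            X_c = X[y == c].values.reshape(-1, X.shape[1])
            # NOTE: k only sets a fixed KDE bandwidth, sqrt(n_c / k); no neighbor
            # distances are used, so this is a Gaussian KDE rather than a k-NN density estimate.
            bandwidth = 1.0 / np.sqrt(self.k / len(X_c)) if len(X_c) > 0 else 0.5
            kde = KernelDensity(kernel='gaussian', bandwidth=bandwidth, algorithm='kd_tree').fit(X_c)
            self.kdes[c] = kde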
        return self
    
    def predict(self, X):
        X_val = X.values.reshape(-1, X.shape[1])
        n_samples = len(X_val)
        ll = np.zeros((n_samples, len(self.classes)))
        for i, c in enumerate(self.classes):
            ll[:, i] = np.exp(self.kdes[c].score_samples(X_val))
        posteriors = ll * self.priors
        posteriors /= posteriors.sum(axis=1, keepdims=True) + 1e-10
        return np.argmax(posteriors, axis=1)

print("✓ KNNDensityBayes class defined")

5. NONPARAMETRIC DENSITY - k-NN ESTIMATOR
✓ KNNDensityBayes class defined


In [102]:
# Manual grid search over k
print("\n--- Hyperparameter search (on train) ---")
params_knn_density = [3, 5, 7, 9, 11]
best_k_density = None
best_cv_score_density = -np.inf

for k in params_knn_density:
    model_temp = KNNDensityBayes(k=k)
    cv_scores_temp = []
    for train_idx, val_idx in skf.split(X_train, y_train):
        X_tr, X_val = X_train.iloc[train_idx], X_train.iloc[val_idx]
        y_tr, y_val = y_train[train_idx], y_train[val_idx]
        model_temp.fit(X_tr, y_tr)
        preds_temp = model_temp.predict(X_val)
        cv_scores_temp.append(f1_score(y_val, preds_temp, average='macro'))
    mean_score = np.mean(cv_scores_temp)
    print(f"k={k}: F1-macro CV = {mean_score:.4f}")
    if mean_score > best_cv_score_density:
        best_cv_score_density = mean_score
        best_k_density = k

print(f"\n✓ Best k: {best_k_density}")
print(f"✓ Best F1-macro CV (train): {best_cv_score_density:.4f}")


--- Hyperparameter search (on train) ---
k=3: F1-macro CV = 0.1116
k=5: F1-macro CV = 0.1616
k=7: F1-macro CV = 0.2163
k=9: F1-macro CV = 0.4823
k=11: F1-macro CV = 0.5664

✓ Best k: 11
✓ Best F1-macro CV (train): 0.5664




In [103]:
knn_density_bayes = KNNDensityBayes(k=best_k_density)
knn_density_bayes.fit(pd.DataFrame(X_train), y_train)
knn_d_acc, knn_d_f1, knn_d_cm = evaluate_model(knn_density_bayes, pd.DataFrame(X_test), y_test, "k-NN Density Bayes")


--- Test results (k-NN Density Bayes) ---
Accuracy: 0.4762
F1-macro: 0.5714

Classification report:
              precision    recall  f1-score   support

      mammal       0.00      0.00      0.00         8
        bird       1.00      1.00      1.00         4
     reptile       0.09      1.00      0.17         1
        fish       1.00      0.33      0.50         3
   amphibian       0.50      1.00      0.67         1
invertebrate       1.00      1.00      1.00         2
      insect       1.00      0.50      0.67         2

    accuracy                           0.48        21
   macro avg       0.66      0.69      0.57        21
weighted avg       0.55      0.48      0.46        21


Confusion matrix:
[[0 0 8 0 0 0 0]
 [0 4 0 0 0 0 0]
 [0 0 1 0 0 0 0]
 [0 0 2 1 0 0 0]
 [0 0 0 0 1 0 0]
 [0 0 0 0 0 2 0]
 [0 0 0 0 1 0 1]]
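
`KNNDensityBayes` fits a Gaussian KDE whose bandwidth is sqrt(n_c / k), so the class size, not the distance to neighbors, drives the estimate. For comparison, the textbook k-NN density estimate is p(x|c) ≈ k / (n_c · V_k(x)), where V_k(x) is the volume of the smallest ball around x that contains k training points of class c. The sketch below is a minimal NumPy version under stated assumptions: `knn_density_predict` is a hypothetical name, the unit-ball constant is dropped because it is identical for every class, and `eps` guards against a zero radius when a test point duplicates a training point (common with binary features). It has not been run on the Zoo data.

```python
import numpy as np

def knn_density_predict(X_train, y_train, X_test, k=3, eps=1e-12):
    """Bayes rule with k-NN class-conditional density estimates (log domain)."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    classes = np.unique(y_train)
    n, d = X_train.shape
    log_post = np.empty((len(X_test), len(classes)))
    for j, c in enumerate(classes):
        X_c = X_train[y_train == c]
        n_c = len(X_c)
        k_c = min(k, n_c)  # classes smaller than k use their farthest point
        # Euclidean distances from every test point to every class-c point.
        dists = np.linalg.norm(X_test[:, None, :] - X_c[None, :, :], axis=2)
        r_k = np.sort(dists, axis=1)[:, k_c - 1]  # radius to the k-th neighbor
        # log prior + log density, with V_k proportional to r_k ** d.
        log_post[:, j] = (np.log(n_c / n) + np.log(k_c) - np.log(n_c)
                          - d * np.log(r_k + eps))
    return classes[np.argmax(log_post, axis=1)]

# Toy data: two well-separated clusters (illustrative values).
X_toy = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
preds = knn_density_predict(X_toy, y_toy, [[0.2, 0.2], [5.5, 5.5]], k=2)
print(preds)  # [0 1]
```

With frequency priors the n_c terms cancel, so the rule reduces to choosing the class whose k-th nearest member is closest, which is why this estimator behaves much like a per-class nearest-neighbor rule.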




## 10. Model 6: k-NN Rule (Direct)

In [104]:
print("=" * 80)
print("6. K-NEAREST NEIGHBORS RULE (Direct)")
print("=" * 80)

print("\n--- Hyperparameter search (on train) ---")
print("Searching for the best k with 5-fold CV...")
params_knn = {'n_neighbors': [1, 3, 5, 7, 9, 11]}
grid_knn = GridSearchCV(
    KNeighborsClassifier(metric='euclidean'),
    params_knn,
    cv=skf,
    scoring='f1_macro',
    n_jobs=-1,
    verbose=0
)
grid_knn.fit(X_train, y_train)
print(f"\n✓ Best k: {grid_knn.best_params_['n_neighbors']}")
print(f"✓ Best F1-macro CV (train): {grid_knn.best_score_:.4f}")

6. K-NEAREST NEIGHBORS RULE (Direct)

--- Hyperparameter search (on train) ---
Searching for the best k with 5-fold CV...

✓ Best k: 1
✓ Best F1-macro CV (train): 0.8648




In [105]:
best_knn = grid_knn.best_estimator_
knn_acc, knn_f1, knn_cm = evaluate_model(best_knn, X_test, y_test, "k-NN Rule")


--- Test results (k-NN Rule) ---
Accuracy: 1.0000
F1-macro: 1.0000

Classification report:
              precision    recall  f1-score   support

      mammal       1.00      1.00      1.00         8
        bird       1.00      1.00      1.00         4
     reptile       1.00      1.00      1.00         1
        fish       1.00      1.00      1.00         3
   amphibian       1.00      1.00      1.00         1
invertebrate       1.00      1.00      1.00         2
      insect       1.00      1.00      1.00         2

    accuracy                           1.00        21
   macro avg       1.00      1.00      1.00        21
weighted avg       1.00      1.00      1.00        21


Confusion matrix:
[[8 0 0 0 0 0 0]
 [0 4 0 0 0 0 0]
 [0 0 1 0 0 0 0]
 [0 0 0 3 0 0 0]
 [0 0 0 0 1 0 0]
 [0 0 0 0 0 2 0]
 [0 0 0 0 0 0 2]]


## 11. Final comparison of all models

In [106]:
print("=" * 80)
print("FINAL MODEL COMPARISON (on Test)")
print("=" * 80)

results = {
    'Naive Bayes': (nb_acc, nb_f1),
    'MLE Full': (mle_acc, mle_f1),
    'Histogram Bayes': (hist_acc, hist_f1),
    'Parzen Bayes': (parzen_acc, parzen_f1),
    'k-NN Density Bayes': (knn_d_acc, knn_d_f1),
    f'k-NN Rule (k={grid_knn.best_params_["n_neighbors"]})': (knn_acc, knn_f1)
}

print(f"\n{'Model':<25} {'Accuracy':>10} {'F1-macro':>10}")
print("-" * 50)
for model, (acc, f1) in results.items():
    print(f"{model:<25} {acc:>10.4f} {f1:>10.4f}")
print("-" * 50)

# Best model by F1-macro (the priority metric for multiclass).
# max() returns the first entry when several models tie.
best_model = max(results, key=lambda k: results[k][1])
print(f"\n✓ Best model (by F1-macro): {best_model} (F1: {results[best_model][1]:.4f})")

print("\n" + "=" * 80)
print("✓ Full analysis complete")
print("=" * 80)

FINAL MODEL COMPARISON (on Test)

Model                       Accuracy   F1-macro
--------------------------------------------------
Naive Bayes                   1.0000     1.0000
MLE Full                      0.7143     0.4563
Histogram Bayes               0.3810     0.0788
Parzen Bayes                  1.0000     1.0000
k-NN Density Bayes            0.4762     0.5714
k-NN Rule (k=1)               1.0000     1.0000
--------------------------------------------------

✓ Best model (by F1-macro): Naive Bayes (F1: 1.0000)

✓ Full analysis complete
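
Three models reach F1-macro = 1.0000 on the 21-sample test set, and `max` over a dict returns the first key with the maximal value, which is why only Naive Bayes is reported. A small sketch that lists every tied model instead, using the test scores from the table above (`tied_best` is a hypothetical name, not part of the notebook):

```python
# (accuracy, F1-macro) on the test set, copied from the comparison table.
results = {
    "Naive Bayes": (1.0000, 1.0000),
    "MLE Full": (0.7143, 0.4563),
    "Histogram Bayes": (0.3810, 0.0788),
    "Parzen Bayes": (1.0000, 1.0000),
    "k-NN Density Bayes": (0.4762, 0.5714),
    "k-NN Rule (k=1)": (1.0000, 1.0000),
}

best_f1 = max(f1 for _, f1 in results.values())
# Keep every model that reaches the top score, in table order.
tied_best = [name for name, (_, f1) in results.items() if f1 == best_f1]
print(tied_best)  # ['Naive Bayes', 'Parzen Bayes', 'k-NN Rule (k=1)']
```

With a tie on the test set, the cross-validated F1-macro values reported in the earlier sections (0.8505 for Naive Bayes, 0.8648 for Parzen Bayes and the k-NN rule) are one possible secondary criterion.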
