# Part I.

Implement the Minimum Distance Classifier and validate it on three datasets (Iris, Wine, and Digits) using the following validation methods:

- Stratified 70/30 Hold-Out
- Stratified 10-Fold Cross-Validation
- Leave-One-Out
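To make the three protocols concrete, the following sketch counts how many samples land in the test partition under each one. It uses a synthetic label vector (three balanced classes of 50 samples, an assumption chosen only to mirror the shape of Iris) rather than any of the real datasets, and the `*_demo` names are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split, StratifiedKFold, LeaveOneOut

# Synthetic data (assumption): 150 samples, 3 balanced classes, 2 dummy features.
y_demo = np.repeat([0, 1, 2], 50)
X_demo = np.arange(300, dtype=float).reshape(150, 2)

# Stratified 70/30 Hold-Out: a single split that preserves class proportions.
_, _, _, y_te_demo = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=0
)
holdout_test_counts = np.bincount(y_te_demo)

# Stratified 10-Fold CV: ten splits, every sample is tested exactly once.
skf_demo = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
fold_test_sizes = [len(test_idx) for _, test_idx in skf_demo.split(X_demo, y_demo)]

# Leave-One-Out: one split per sample, each with a single test point.
n_loo_splits = LeaveOneOut().get_n_splits(X_demo)

print("Hold-Out test counts per class:", holdout_test_counts)
print("10-Fold test sizes:", fold_test_sizes)
print("Leave-One-Out splits:", n_loo_splits)
```

The practical trade-off is cost versus variance: Hold-Out trains one model, 10-Fold trains ten, and Leave-One-Out trains as many models as there are samples.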

1. We load the datasets using scikit-learn.
2. We implement the Minimum Distance Classifier.
3. We validate the results with the three validation methods: stratified 70/30 Hold-Out, stratified 10-Fold Cross-Validation, and Leave-One-Out.
4. We compute the performance metrics: accuracy for every method, and the confusion matrix for the Hold-Out split.
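For reference, the decision rule behind the Minimum Distance Classifier can be written as below. The notation is an assumption introduced here: $X_c$ is the set of training samples of class $c$, $\mu_c$ is its centroid, and $\hat{y}(x)$ is the predicted label, with the Euclidean norm that `np.linalg.norm` computes by default.

```latex
\mu_c = \frac{1}{|X_c|} \sum_{x_i \in X_c} x_i,
\qquad
\hat{y}(x) = \arg\min_{c} \, \lVert x - \mu_c \rVert_2
```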

In [1]:
# Import the required libraries
from sklearn.datasets import load_iris, load_wine, load_digits
from sklearn.model_selection import train_test_split, StratifiedKFold, LeaveOneOut
from sklearn.metrics import accuracy_score, confusion_matrix
import numpy as np

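# Minimum Distance Classifier: assigns each sample to the class with the nearest centroid
class MinimumDistanceClassifier:
    def fit(self, X, y):
        # Store one centroid (feature-wise mean) per class
        self.classes_ = np.unique(y)
        self.centroids_ = {cls: X[y == cls].mean(axis=0) for cls in self.classes_}
        return self  # allows chained calls such as clf.fit(X, y).predict(X)

    def predict(self, X):
        predictions = []
        for x in X:
            # Euclidean distance from the sample to every class centroid
            distances = {cls: np.linalg.norm(x - centroid) for cls, centroid in self.centroids_.items()}
            predictions.append(min(distances, key=distances.get))
        return np.array(predictions)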

# Load the datasets
datasets = {
    "Iris": load_iris(),
    "Wine": load_wine(),
    "Digits": load_digits()
}

# Validation and metrics
results = {}

for name, data in datasets.items():
    X, y = data.data, data.target
    print(f"\nDataset: {name}")
    
    # Stratified 70/30 Hold-Out
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)
    clf = MinimumDistanceClassifier()
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    cm = confusion_matrix(y_test, y_pred)
    print(f"Hold-Out Accuracy: {acc}")
    print(f"Hold-Out Confusion Matrix:\n{cm}")
    
    # Stratified 10-Fold Cross-Validation
    skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
    cv_scores = []
    for train_idx, test_idx in skf.split(X, y):
        X_train, X_test = X[train_idx], X[test_idx]
        y_train, y_test = y[train_idx], y[test_idx]
        clf.fit(X_train, y_train)
        y_pred = clf.predict(X_test)
        cv_scores.append(accuracy_score(y_test, y_pred))
    print(f"10-Fold CV Accuracy: {np.mean(cv_scores)}")
    
    # Leave-One-Out
    loo = LeaveOneOut()
    loo_scores = []
    for train_idx, test_idx in loo.split(X):
        X_train, X_test = X[train_idx], X[test_idx]
        y_train, y_test = y[train_idx], y[test_idx]
        clf.fit(X_train, y_train)
        y_pred = clf.predict(X_test)
        loo_scores.append(accuracy_score(y_test, y_pred))
    print(f"Leave-One-Out Accuracy: {np.mean(loo_scores)}")

    # Store the results
    results[name] = {
        "Hold-Out Accuracy": acc,
        "10-Fold CV Accuracy": np.mean(cv_scores),
        "Leave-One-Out Accuracy": np.mean(loo_scores),
        "Confusion Matrix (Hold-Out)": cm
    }

# Show the overall results
print("\nResultados Generales:")
for dataset, metrics in results.items():
    print(f"\nDataset: {dataset}")
    for metric, value in metrics.items():
        print(f"{metric}: {value}")



Dataset: Iris
Hold-Out Accuracy: 0.9111111111111111
Hold-Out Confusion Matrix:
[[15  0  0]
 [ 0 14  1]
 [ 0  3 12]]
10-Fold CV Accuracy: 0.9200000000000002
Leave-One-Out Accuracy: 0.92

Dataset: Wine
Hold-Out Accuracy: 0.7222222222222222
Hold-Out Confusion Matrix:
[[15  0  3]
 [ 0 14  7]
 [ 0  5 10]]
10-Fold CV Accuracy: 0.7245098039215687
Leave-One-Out Accuracy: 0.7247191011235955

Dataset: Digits
Hold-Out Accuracy: 0.8722222222222222
Hold-Out Confusion Matrix:
[[52  0  0  0  2  0  0  0  0  0]
 [ 0 36  7  0  0  1  2  0  3  6]
 [ 1  3 47  0  0  0  0  1  1  0]
 [ 0  1  0 51  0  1  0  2  0  0]
 [ 0  2  0  0 49  0  0  2  1  0]
 [ 0  0  0  0  0 45  0  0  0 10]
 [ 0  1  0  0  0  0 52  0  1  0]
 [ 0  0  0  0  0  0  0 54  0  0]
 [ 0 10  1  0  0  1  0  2 38  0]
 [ 0  0  0  0  3  0  0  3  1 47]]
10-Fold CV Accuracy: 0.9009404096834267
Leave-One-Out Accuracy: 0.9020589872008904

Resultados Generales:

Dataset: Iris
Hold-Out Accuracy: 0.9111111111111111
10-Fold CV Accuracy: 0.9200000000000002
Leave-One-Out Accuracy: 0.92
[output truncated]
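As a sanity check on the hand-written classifier, scikit-learn ships `NearestCentroid`, which with its default Euclidean metric applies the same nearest-class-mean rule. The sketch below is not part of the original assignment: it restates the rule compactly in NumPy (the helper `nearest_mean_predict` is a hypothetical name) and compares its predictions on the Iris Hold-Out split against the library estimator.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestCentroid

def nearest_mean_predict(X_train, y_train, X_test):
    """Hypothetical helper: assign each test row to the class with the closest mean."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    # Pairwise Euclidean distances, shape (n_test, n_classes), via broadcasting.
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

X_chk, y_chk = load_iris(return_X_y=True)
Xtr_chk, Xte_chk, ytr_chk, yte_chk = train_test_split(
    X_chk, y_chk, test_size=0.3, stratify=y_chk, random_state=42
)

manual_pred = nearest_mean_predict(Xtr_chk, ytr_chk, Xte_chk)
library_pred = NearestCentroid().fit(Xtr_chk, ytr_chk).predict(Xte_chk)

# Fraction of test samples on which both implementations give the same label.
agreement = np.mean(manual_pred == library_pred)
print(f"Agreement with NearestCentroid: {agreement:.3f}")
```

If the agreement ever falls short of 1.0, the likely cause is a near-tie between two centroids that is resolved differently by floating-point rounding.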

# Part II.

Implement the 1NN classifier and validate it on three datasets using the following validation methods:

- Stratified 70/30 Hold-Out
- Stratified 10-Fold Cross-Validation
- Leave-One-Out


Steps I will carry out below:

1. Implement the 1NN classifier using scikit-learn (KNeighborsClassifier with n_neighbors=1).
2. Use the Iris, Wine, and Digits datasets.
3. Validate using:
   - Stratified 70/30 Hold-Out
   - Stratified 10-Fold Cross-Validation
   - Leave-One-Out
4. Compute the metrics: accuracy and confusion matrix.
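Because the exercise asks for the classifier to be programmed, it may help to see what `KNeighborsClassifier(n_neighbors=1)` does conceptually. The following is a minimal brute-force sketch in NumPy, assuming Euclidean distance; the class name `OneNNSketch` and the toy points are hypothetical and not part of the original notebook.

```python
import numpy as np

class OneNNSketch:
    """Hypothetical brute-force 1-nearest-neighbour classifier (Euclidean)."""

    def fit(self, X, y):
        # 1NN has no training phase beyond memorising the data.
        self.X_ = np.asarray(X, dtype=float)
        self.y_ = np.asarray(y)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Squared distances between every query and every stored sample,
        # shape (n_queries, n_train); the square root is not needed for argmin.
        diffs = X[:, None, :] - self.X_[None, :, :]
        sq_dists = (diffs ** 2).sum(axis=2)
        # Each query inherits the label of its closest stored sample.
        return self.y_[sq_dists.argmin(axis=1)]

# Toy data (assumption): two well-separated clusters in 2-D.
X_toy = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.8]])
y_toy = np.array([0, 0, 1, 1])

toy_pred = OneNNSketch().fit(X_toy, y_toy).predict([[0.1, 0.3], [4.9, 5.2]])
print(toy_pred)  # [0 1]
```

The broadcasted difference array grows with the product of query count, training count, and feature count, so for larger datasets the library estimator is the more practical choice.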

In [2]:
# Import the required libraries
from sklearn.datasets import load_iris, load_wine, load_digits
from sklearn.model_selection import train_test_split, StratifiedKFold, LeaveOneOut
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.neighbors import KNeighborsClassifier
import numpy as np

# Load the datasets
datasets = {
    "Iris": load_iris(),
    "Wine": load_wine(),
    "Digits": load_digits()
}

# Validation and metrics
results = {}

for name, data in datasets.items():
    X, y = data.data, data.target
    print(f"\nDataset: {name}")
    
    # 1NN classifier
    knn = KNeighborsClassifier(n_neighbors=1)
    
    # Stratified 70/30 Hold-Out
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    cm = confusion_matrix(y_test, y_pred)
    print(f"Hold-Out Accuracy: {acc}")
    print(f"Hold-Out Confusion Matrix:\n{cm}")
    
    # Stratified 10-Fold Cross-Validation
    skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
    cv_scores = []
    for train_idx, test_idx in skf.split(X, y):
        X_train, X_test = X[train_idx], X[test_idx]
        y_train, y_test = y[train_idx], y[test_idx]
        knn.fit(X_train, y_train)
        y_pred = knn.predict(X_test)
        cv_scores.append(accuracy_score(y_test, y_pred))
    print(f"10-Fold CV Accuracy: {np.mean(cv_scores)}")
    
    # Leave-One-Out
    loo = LeaveOneOut()
    loo_scores = []
    for train_idx, test_idx in loo.split(X):
        X_train, X_test = X[train_idx], X[test_idx]
        y_train, y_test = y[train_idx], y[test_idx]
        knn.fit(X_train, y_train)
        y_pred = knn.predict(X_test)
        loo_scores.append(accuracy_score(y_test, y_pred))
    print(f"Leave-One-Out Accuracy: {np.mean(loo_scores)}")

    # Store the results
    results[name] = {
        "Hold-Out Accuracy": acc,
        "10-Fold CV Accuracy": np.mean(cv_scores),
        "Leave-One-Out Accuracy": np.mean(loo_scores),
        "Confusion Matrix (Hold-Out)": cm
    }

# Show the overall results
print("\nResultados Generales:")
for dataset, metrics in results.items():
    print(f"\nDataset: {dataset}")
    for metric, value in metrics.items():
        print(f"{metric}: {value}")



Dataset: Iris
Hold-Out Accuracy: 0.9333333333333333
Hold-Out Confusion Matrix:
[[15  0  0]
 [ 0 15  0]
 [ 0  3 12]]
10-Fold CV Accuracy: 0.9600000000000002
Leave-One-Out Accuracy: 0.96

Dataset: Wine
Hold-Out Accuracy: 0.7037037037037037
Hold-Out Confusion Matrix:
[[14  3  1]
 [ 1 15  5]
 [ 1  5  9]]
10-Fold CV Accuracy: 0.7300653594771241
Leave-One-Out Accuracy: 0.7696629213483146

Dataset: Digits
Hold-Out Accuracy: 0.987037037037037
Hold-Out Confusion Matrix:
[[54  0  0  0  0  0  0  0  0  0]
 [ 0 55  0  0  0  0  0  0  0  0]
 [ 0  0 53  0  0  0  0  0  0  0]
 [ 0  0  0 55  0  0  0  0  0  0]
 [ 0  0  0  0 54  0  0  0  0  0]
 [ 0  0  0  0  0 54  0  0  0  1]
 [ 0  0  0  0  0  0 54  0  0  0]
 [ 0  0  0  0  0  0  0 54  0  0]
 [ 0  3  0  1  0  0  0  0 48  0]
 [ 0  0  0  0  1  0  0  0  1 52]]
10-Fold CV Accuracy: 0.9894227188081937
Leave-One-Out Accuracy: 0.988313856427379

Resultados Generales:

Dataset: Iris
Hold-Out Accuracy: 0.9333333333333333
10-Fold CV Accuracy: 0.9600000000000002
Leave-One-Out Accuracy: 0.96
[output truncated]
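The confusion matrices above come from the Hold-Out split only. One possible way to obtain a confusion matrix for the cross-validated setting as well is `cross_val_predict`, which returns one out-of-fold prediction per sample. This is a sketch on Iris with 1NN, offered as an option rather than as part of the original experiment; the `*_cv` variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

X_cv, y_cv = load_iris(return_X_y=True)
skf_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

# Each sample is predicted by a model trained on the folds that exclude it.
oof_pred = cross_val_predict(
    KNeighborsClassifier(n_neighbors=1), X_cv, y_cv, cv=skf_cv
)

# Pool all out-of-fold predictions into a single confusion matrix.
cm_cv = confusion_matrix(y_cv, oof_pred)
print(f"Pooled 10-Fold accuracy: {accuracy_score(y_cv, oof_pred):.4f}")
print(cm_cv)
```

Passing `cv=LeaveOneOut()` instead would give the Leave-One-Out counterpart, at the cost of one model fit per sample.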