Find two labeled datasets. Remember to split each dataset into training and test subsets.
1. Run k-NN on these two datasets.
2. Calculate the classification error, precision, recall, and F1-score (by comparing the predicted class labels with the true labels of the test data).
3. Vary the value of k, and comment on the results.
4. Normalize the input data. Does the performance improve?
5. Repeat these questions for SVM.

In [13]:
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.svm import SVC



In [10]:
def evaluate_kun(X_train, X_test, y_train, y_test, k_values):
    """Fit a k-NN classifier for each k and return test-set metrics as a DataFrame."""
    results = []
    for k in k_values:
        model = KNeighborsClassifier(n_neighbors=k)
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)

        # Classification error is the complement of accuracy.
        error = 1 - accuracy_score(y_test, y_pred)
        # Macro averaging gives every class the same weight.
        results.append({
            'k': k,
            'error': error,
            'precision': precision_score(y_test, y_pred, average='macro'),
            'recall': recall_score(y_test, y_pred, average='macro'),
            'f1': f1_score(y_test, y_pred, average='macro')
        })
    return pd.DataFrame(results)


In [2]:
iris = datasets.load_iris()  ## Iris flower dataset
wine = datasets.load_wine()  ## Wine dataset

In [3]:
X_iris, y_iris = iris.data, iris.target
X_wine, y_wine = wine.data, wine.target

In [4]:
from sklearn.model_selection import train_test_split


In [5]:
X_iris_train, X_iris_test, y_iris_train, y_iris_test = train_test_split(X_iris, y_iris, test_size=0.2, random_state=42)
X_wine_train, X_wine_test, y_wine_train, y_wine_test = train_test_split(X_wine, y_wine, test_size=0.2, random_state=42)

In [11]:
## IRIS
k_values = [1, 3, 5, 7, 9]
knn_results_iris = evaluate_kun(X_iris_train, X_iris_test, y_iris_train, y_iris_test, k_values)
print("K-NN Results on Iris Dataset:")
print(knn_results_iris)

# Standardize the data (fit the scaler on the training split only)
scaler_iris = StandardScaler()
X_iris_train_scaled = scaler_iris.fit_transform(X_iris_train)
X_iris_test_scaled = scaler_iris.transform(X_iris_test)

knn_results_iris_scaled = evaluate_kun(X_iris_train_scaled, X_iris_test_scaled, y_iris_train, y_iris_test, k_values)
print("\nK-NN Results on Scaled Iris Dataset:")
print(knn_results_iris_scaled)

K-NN Results on Iris Dataset:
   k     error  precision    recall        f1
0  1  0.000000   1.000000  1.000000  1.000000
1  3  0.000000   1.000000  1.000000  1.000000
2  5  0.000000   1.000000  1.000000  1.000000
3  7  0.033333   0.972222  0.962963  0.965899
4  9  0.000000   1.000000  1.000000  1.000000

K-NN Results on Scaled Iris Dataset:
   k  error  precision  recall   f1
0  1    0.0        1.0     1.0  1.0
1  3    0.0        1.0     1.0  1.0
2  5    0.0        1.0     1.0  1.0
3  7    0.0        1.0     1.0  1.0
4  9    0.0        1.0     1.0  1.0
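
On Iris the test split holds only 30 samples, so the single non-zero error in the tables above (0.033333 at k=7 on the unscaled data) corresponds to one misclassified point. Differences of that size say little about which k is better. One way to get a steadier comparison is cross-validation over the same k values. The sketch below is an assumption-laden illustration, not part of the original run: the 5-fold `StratifiedKFold` setup, the pipeline, and the name `cv_errors` are choices made here.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Assumed setup: 5 stratified folds with a fixed seed for repeatability.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

cv_errors = {}
for k in [1, 3, 5, 7, 9]:
    # The pipeline refits the scaler inside each fold, so no test-fold
    # statistics leak into training.
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    cv_errors[k] = 1 - scores.mean()

for k, err in cv_errors.items():
    print(f"k={k}: mean CV error = {err:.3f}")
```

Each error here is averaged over all 150 samples instead of 30, which makes the comparison across k less sensitive to a single point.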


In [12]:
## Wine
knn_results_wine = evaluate_kun(X_wine_train, X_wine_test, y_wine_train, y_wine_test, k_values)
print("\nK-NN Results on Wine Dataset:")
print(knn_results_wine)
# Standardize the data (fit the scaler on the training split only)
scaler_wine = StandardScaler()
X_wine_train_scaled = scaler_wine.fit_transform(X_wine_train)
X_wine_test_scaled = scaler_wine.transform(X_wine_test)
knn_results_wine_scaled = evaluate_kun(X_wine_train_scaled, X_wine_test_scaled, y_wine_train, y_wine_test, k_values)
print("\nK-NN Results on Scaled Wine Dataset:")
print(knn_results_wine_scaled)


K-NN Results on Wine Dataset:
   k     error  precision    recall        f1
0  1  0.222222   0.770147  0.755952  0.760494
1  3  0.194444   0.791270  0.797619  0.789988
2  5  0.277778   0.672619  0.672619  0.672619
3  7  0.305556   0.650000  0.648810  0.647267
4  9  0.277778   0.724603  0.708333  0.702381

K-NN Results on Scaled Wine Dataset:
   k     error  precision    recall        f1
0  1  0.055556   0.940741  0.952381  0.943257
1  3  0.055556   0.940741  0.952381  0.943257
2  5  0.055556   0.940741  0.952381  0.943257
3  7  0.055556   0.940741  0.952381  0.943257
4  9  0.055556   0.940741  0.952381  0.943257
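
On Wine, standardization lowers the k-NN error from between 0.194 and 0.306 to 0.056 for every k tested. A likely explanation is that k-NN with its default Euclidean distance is dominated by whichever features have the largest numeric range. The short check below is a sketch added for illustration (the name `feature_std` is introduced here); it lists the per-feature standard deviations of the raw Wine data.

```python
import pandas as pd
from sklearn.datasets import load_wine

wine = load_wine()
X = pd.DataFrame(wine.data, columns=wine.feature_names)

# Per-feature standard deviation on the raw data, largest first.
feature_std = X.std().sort_values(ascending=False)

print("Largest spreads:")
print(feature_std.head(3))
print("\nSmallest spreads:")
print(feature_std.tail(3))
```

If one feature's spread is orders of magnitude larger than the others, unscaled distances are effectively computed on that feature alone, which would be consistent with the gap between the two tables above.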


In [14]:
def evaluate_svm(X_train, X_test, y_train, y_test, kernel_types):
    """Fit an SVC for each kernel and return test-set metrics as a DataFrame."""
    results = []
    for kernel in kernel_types:
        # Other SVC hyperparameters (C, gamma, degree) keep their defaults.
        model = SVC(kernel=kernel)
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        # Classification error is the complement of accuracy.
        error = 1 - accuracy_score(y_test, y_pred)
        results.append({
            'kernel': kernel,
            'error': error,
            'precision': precision_score(y_test, y_pred, average='macro'),
            'recall': recall_score(y_test, y_pred, average='macro'),
            'f1': f1_score(y_test, y_pred, average='macro')
        })
    return pd.DataFrame(results)

kernels = ['linear', 'rbf', 'poly']
svm_results_iris = evaluate_svm(X_iris_train_scaled, X_iris_test_scaled, y_iris_train, y_iris_test, kernels)
svm_results_wine = evaluate_svm(X_wine_train_scaled, X_wine_test_scaled, y_wine_train, y_wine_test, kernels)

print("\nSVM results on IRIS (normalized):")
print(svm_results_iris)

print("\nSVM results on WINE (normalized):")
print(svm_results_wine)



SVM results on IRIS (normalized):
   kernel     error  precision    recall        f1
0  linear  0.033333   0.972222  0.962963  0.965899
1     rbf  0.000000   1.000000  1.000000  1.000000
2    poly  0.033333   0.966667  0.969697  0.966583

SVM results on WINE (normalized):
   kernel     error  precision   recall        f1
0  linear  0.027778   0.962963  0.97619  0.968046
1     rbf  0.000000   1.000000  1.00000  1.000000
2    poly  0.027778   0.977778  0.97619  0.976160
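
The SVM results above cover only the normalized data, so they do not yet answer the "is the performance better after normalization?" question for SVM. A minimal sketch of that comparison on Wine follows, assuming the same split as the earlier cells (`test_size=0.2`, `random_state=42`) and only the RBF kernel; the name `svm_scaling_comparison` is introduced here for illustration.

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

rows = []
for scaled in (False, True):
    # With scaled=True the scaler is fit on the training split only.
    steps = [StandardScaler()] if scaled else []
    model = make_pipeline(*steps, SVC(kernel="rbf"))
    model.fit(X_train, y_train)
    error = 1 - accuracy_score(y_test, model.predict(X_test))
    rows.append({"scaled": scaled, "kernel": "rbf", "error": error})

svm_scaling_comparison = pd.DataFrame(rows)
print(svm_scaling_comparison)
```

The same loop can be extended to the other kernels and to Iris by passing the unscaled splits to `evaluate_svm`.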
