📌 Classification Using KNN — Summary

**Objective**

Predict the **wine class (0, 1, 2)** from 13 chemical features using a **K-Nearest Neighbors (KNN)** classifier.

🍷 **Dataset: Wine (from sklearn)**

* **Samples:** 178 wines
* **Features:** 13 chemical attributes (e.g., alcohol, flavanoids, color intensity)
* **Target:** `class` (categorical: 0, 1, 2, one per grape cultivar; all wines come from the same region of Italy)

🎯 **What Are We Predicting?**

The `class` column is our **response variable**. It takes 3 unique values, one for each grape cultivar the wine was made from:

* **Class 0**: Wines from the first cultivar
* **Class 1**: Wines from the second cultivar
* **Class 2**: Wines from the third cultivar

We aim to **build a model** that predicts this wine class based on its **chemical composition**.
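
Before modelling, it can help to confirm the shape of the data and how many wines fall in each class. The snippet below is a small sketch using only `load_wine` and NumPy; the variable names are illustrative and not part of the notebook.

```python
import numpy as np
from sklearn.datasets import load_wine

# Load features and target as NumPy arrays
X, y = load_wine(return_X_y=True)

n_samples, n_features = X.shape
class_counts = np.bincount(y).tolist()  # wines per class, indexed by class label

print("Samples:", n_samples)
print("Features:", n_features)
print("Wines per class:", class_counts)
```

The classes are not perfectly balanced, which is one reason the notebook reports macro-averaged precision and recall alongside accuracy.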

⚙️ **Why KNN?**

* Simple and intuitive
* Classifies based on similarity (distance)
* No explicit model fitting: training only stores the data (optionally in a search index), and each prediction is a majority vote among the `k` closest neighbors
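
To make the distance-and-vote idea concrete, here is a minimal from-scratch sketch in plain Python. It is not how scikit-learn implements `KNeighborsClassifier`; the function `knn_predict` and the toy 2-D points are made up for illustration, and Euclidean distance is assumed.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every stored training point
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest points and count their labels
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy 2-D data (illustrative only, not taken from the wine dataset)
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_points, train_labels, (1.1, 1.0), k=3))  # 0
print(knn_predict(train_points, train_labels, (4.9, 5.0), k=3))  # 1
```

Because every prediction compares the query against all stored points, features on larger scales would dominate the distance, which is why the workflow standardizes the features first.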

🔁 **Workflow Steps**

1. Load wine dataset
2. Convert to `pandas` DataFrame
3. Standardize features using `StandardScaler`
4. Split data into training and testing sets
5. Train initial KNN model (`k=3`)
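6. Evaluate on the test set using **accuracy, precision, recall** (macro-averaged)
7. Cross-validate the initial model (5 folds)
8. Use `GridSearchCV` to find the best `k` (searching 1 to 20)
9. Retrain the model using the optimal `k` and re-evaluate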
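
One variation worth knowing about: in the workflow above, the scaler is fitted on all rows before the split, so the test rows influence the scaling statistics. A common alternative is to split first and wrap the scaler and classifier in a `Pipeline`, so that scaling is re-fitted on the training portion of each fold. The sketch below is an assumption-laden variant, not the notebook's code: the step names `"scale"` and `"knn"` and the use of `stratify=y` are my choices, and its scores will not match the outputs shown later.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Split first so the test rows never influence the scaler
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=123, stratify=y
)

# Scaling and KNN are fitted together, fold by fold, inside the search
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

search = GridSearchCV(
    pipe,
    param_grid={"knn__n_neighbors": list(range(1, 21))},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

best_k = search.best_params_["knn__n_neighbors"]
test_accuracy = search.score(X_test, y_test)  # uses the refitted best pipeline
print("Best k:", best_k)
print("Held-out accuracy:", test_accuracy)
```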


In [13]:
# Import the libraries used in this notebook
import random

import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, cross_validate, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler


In [14]:


# Load the Wine dataset
wine_data = load_wine()

# Convert to DataFrame
wine_df = pd.DataFrame(wine_data.data, columns=wine_data.feature_names)

# Bind the 'class' (wine target) to the DataFrame
wine_df['class'] = wine_data.target

# Select predictors (excluding the target column)
predictors = wine_df.iloc[:, :-1]

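# Standardize the predictors (zero mean, unit variance per feature).
# Note: the scaler is fitted on all 178 rows, so the rows that later form
# the test set also contribute to the scaling statistics.
scaler = StandardScaler()
predictors_standardized = pd.DataFrame(scaler.fit_transform(predictors), columns=predictors.columns)

# Seed NumPy's global random generator for reproducibility
# (the split below also passes its own random_state)
np.random.seed(123)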



In [15]:
# Split the data into training (75%) and testing (25%) sets
X_train, X_test, y_train, y_test = train_test_split(
    predictors_standardized, wine_df["class"], test_size=0.25, random_state=123
)

# Initialize and train the KNN classifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)



In [16]:
# Predict and evaluate the model
y_pred = knn.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision (macro):", precision_score(y_test, y_pred, average='macro'))
print("Recall (macro):", recall_score(y_test, y_pred, average='macro'))




Accuracy: 0.9555555555555556
Precision (macro): 0.9484126984126985
Recall (macro): 0.9595238095238096
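
The `average='macro'` setting computes precision and recall separately for each class and then takes their unweighted mean, so every class counts equally regardless of size. The sketch below reproduces that calculation by hand from a confusion matrix. The labels are made up for illustration and are not the notebook's predictions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up labels for illustration only
y_true = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 1, 2, 2, 2, 2]

cm = confusion_matrix(y_true, y_pred)  # rows = true class, columns = predicted class
true_positives = np.diag(cm)

per_class_precision = true_positives / cm.sum(axis=0)  # TP / all predicted as that class
per_class_recall = true_positives / cm.sum(axis=1)     # TP / all truly in that class

macro_precision = per_class_precision.mean()
macro_recall = per_class_recall.mean()

print("Per-class precision:", per_class_precision)
print("Per-class recall:", per_class_recall)
print("Macro precision matches sklearn:",
      np.isclose(macro_precision, precision_score(y_true, y_pred, average="macro")))
print("Macro recall matches sklearn:",
      np.isclose(macro_recall, recall_score(y_true, y_pred, average="macro")))
```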


In [17]:
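# 5-fold cross-validation of the k=3 model on the full standardized dataset,
# scoring accuracy, macro precision, and macro recall on each fold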
cv_results = cross_validate(knn, predictors_standardized, wine_df["class"],
                            cv=5,
                            scoring=["accuracy", "precision_macro", "recall_macro"],
                            return_train_score=True)

print("Cross-validation results:")
print("Train Accuracy Mean:", np.mean(cv_results["train_accuracy"]))
print("Test Accuracy Mean:", np.mean(cv_results["test_accuracy"]))

Cross-validation results:
Train Accuracy Mean: 0.9691224268689058
Test Accuracy Mean: 0.943968253968254
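
The cell above computes three metrics per fold but prints only mean accuracy. A possible way to summarize all of them, together with their spread across folds, is sketched below. It is a variant rather than the notebook's code: it wraps scaling and KNN in `make_pipeline` so the scaler is fitted within each training fold, so its numbers may differ from those printed above.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Scaler and classifier are refitted together on each training fold
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))

scoring = ["accuracy", "precision_macro", "recall_macro"]
cv_results = cross_validate(model, X, y, cv=5, scoring=scoring)

# Mean and standard deviation of each metric across the 5 test folds
summary = {}
for metric in scoring:
    scores = cv_results[f"test_{metric}"]
    summary[metric] = (scores.mean(), scores.std())
    print(f"{metric}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Reporting the standard deviation alongside the mean gives a rough sense of how much the score depends on which rows land in each fold.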


In [18]:

# Grid Search to tune hyperparameter 'k'
param_grid = {'n_neighbors': list(range(1, 21))}
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

print("Best k value:", grid_search.best_params_["n_neighbors"])
print("Best cross-validated accuracy:", grid_search.best_score_)



Best k value: 6
Best cross-validated accuracy: 0.9849002849002849
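
`best_params_` reports only the winner, but several values of `k` may score almost identically. Inspecting `cv_results_` shows how decisive the choice really is. The sketch below rebuilds a comparable search so that it runs on its own; unlike the notebook, it fits the scaler on the training rows only, so the ranking it prints is not guaranteed to match the result above.

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=123
)

# Assumption: scale using training-set statistics only
X_train_scaled = StandardScaler().fit_transform(X_train)

search = GridSearchCV(
    KNeighborsClassifier(),
    {"n_neighbors": list(range(1, 21))},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train_scaled, y_train)

# One row per candidate k, best-ranked first
cv_table = pd.DataFrame(search.cv_results_)[
    ["param_n_neighbors", "mean_test_score", "std_test_score", "rank_test_score"]
].sort_values("rank_test_score")
print(cv_table.head(5).to_string(index=False))
```

If the top few rows differ by less than their standard deviations, the selected `k` is one of several roughly equivalent choices rather than a clear optimum.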


In [19]:
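# Train model with best k (equivalent to grid_search.best_estimator_,
# which GridSearchCV already refits on X_train by default)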
best_k = grid_search.best_params_["n_neighbors"]
knn_best = KNeighborsClassifier(n_neighbors=best_k)
knn_best.fit(X_train, y_train)
y_pred_best = knn_best.predict(X_test)

print("Final model evaluation:")
print("Accuracy:", accuracy_score(y_test, y_pred_best))
print("Precision (macro):", precision_score(y_test, y_pred_best, average='macro'))
print("Recall (macro):", recall_score(y_test, y_pred_best, average='macro'))


Final model evaluation:
Accuracy: 0.9111111111111111
Precision (macro): 0.9007936507936508
Recall (macro): 0.9119047619047619
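
Note that the tuned model (`k=6`) scores lower on the test set than the initial `k=3` model, even though it had the best cross-validated accuracy on the training data. A quick calculation shows how few wines separate the two results. The sketch assumes the test split holds 45 rows, which follows from scikit-learn rounding 0.25 × 178 up; the accuracies are copied from the outputs above.

```python
import math

n_samples = 178
n_test = math.ceil(0.25 * n_samples)  # size of the 25% test split

# Test accuracies reported by the notebook
reported = {"k=3": 0.9555555555555556, "k=6": 0.9111111111111111}

# Convert each accuracy back into a count of misclassified wines
errors = {name: round((1 - acc) * n_test) for name, acc in reported.items()}

for name, n_wrong in errors.items():
    print(f"{name}: {n_wrong} of {n_test} test wines misclassified")
```

With a test set this small, each misclassified wine moves accuracy by about 2.2 percentage points, so a gap of two wines is weak evidence that one `k` is better than the other.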
