**Programmer:** python_scripts (Abhijith Warrier)

**PYTHON SCRIPT TO _DEMONSTRATE THE IMPACT OF FEATURE SCALING ON KNN ACCURACY_. 🐍📊🤖**

This script shows how scaling features can significantly change the accuracy of distance-based algorithms like **K-Nearest Neighbors (KNN)**.
We’ll compare model accuracy **before and after scaling** using `StandardScaler`.

### 📦 Import Required Libraries

We’ll use scikit-learn for data preprocessing, modeling, and evaluation.

In [6]:
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
import pandas as pd

### 🧩 Load and Split the Dataset

We’ll use the Wine dataset and split it into training (70%) and testing (30%) sets.

In [7]:
# Load dataset
wine = load_wine()
X = wine.data
y = wine.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
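
Before training anything, it can help to check how different the feature scales are. The sketch below is one way to do that: it prints the range (max minus min) of every Wine feature, widest first. The variable names `ranges` and `widest_feature` are illustrative and not part of the original script.

```python
from sklearn.datasets import load_wine

wine = load_wine()
X = wine.data

# Range (max - min) of each feature across the full dataset
ranges = X.max(axis=0) - X.min(axis=0)

# Print features from widest to narrowest range
for name, r in sorted(zip(wine.feature_names, ranges), key=lambda t: t[1], reverse=True):
    print(f"{name:<30} {r:10.2f}")

# Name of the feature with the largest range (illustrative helper variable)
widest_feature = wine.feature_names[ranges.argmax()]
print("Widest feature:", widest_feature)
```

Features measured in the hundreds or thousands will carry far more weight in a raw Euclidean distance than features measured in fractions of a unit.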

### ⚙️ Train KNN Without Feature Scaling

We first train a KNN classifier on the raw, unscaled data.

In [8]:
# Train KNN without scaling
knn_unscaled = KNeighborsClassifier(n_neighbors=5)
knn_unscaled.fit(X_train, y_train)

# Predict and evaluate
y_pred_unscaled = knn_unscaled.predict(X_test)
acc_unscaled = accuracy_score(y_test, y_pred_unscaled)
print(f"Accuracy without scaling: {acc_unscaled:.2f}")

Accuracy without scaling: 0.74


### 🔄 Apply Feature Scaling

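We standardize features using `StandardScaler`, which rescales each feature to zero mean and unit variance so that all features contribute on a comparable scale to distance calculations. The scaler is fitted on the training set only and then reused to transform the test set, so no test-set statistics leak into training.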

In [9]:
# Apply StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
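
To make the transformation concrete, here is a minimal plain-Python sketch of standardizing a single feature: subtract the mean, then divide by the standard deviation. The values are made up for illustration (they are not taken from the Wine dataset), and `standardize` is a hypothetical helper. `StandardScaler` uses the population standard deviation, so the sketch uses `statistics.pstdev`.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

# Made-up values for one feature (illustration only)
train_feature = [500.0, 750.0, 1000.0, 1250.0, 1500.0]

z = standardize(train_feature)
print([round(v, 3) for v in z])
print("mean:", round(fmean(z), 6), "std:", round(pstdev(z), 6))
```

After the transformation the feature has mean 0 and standard deviation 1, regardless of its original units.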

### 🧠 Train KNN With Scaled Features

Now, train the same KNN classifier on scaled data and compare performance.

In [10]:
# Train KNN with scaled features
knn_scaled = KNeighborsClassifier(n_neighbors=5)
knn_scaled.fit(X_train_scaled, y_train)

# Predict and evaluate
y_pred_scaled = knn_scaled.predict(X_test_scaled)
acc_scaled = accuracy_score(y_test, y_pred_scaled)
print(f"Accuracy with scaling: {acc_scaled:.2f}")

Accuracy with scaling: 0.96
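
As an alternative sketch of the same comparison, the scaler and classifier can be bundled with scikit-learn's `make_pipeline`, so the scaler is always fitted on whatever data is passed to `fit`. This is not how the script above is written; it is one possible arrangement that reuses the same split settings (`test_size=0.3`, `random_state=42`) and `n_neighbors=5`. The names `raw_knn` and `scaled_knn` are illustrative.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# KNN on raw features
raw_knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# Scaler + KNN in one estimator; the scaler is fitted on X_train only
scaled_knn = make_pipeline(
    StandardScaler(), KNeighborsClassifier(n_neighbors=5)
).fit(X_train, y_train)

raw_acc = raw_knn.score(X_test, y_test)
scaled_acc = scaled_knn.score(X_test, y_test)
print(f"Accuracy without scaling: {raw_acc:.2f}")
print(f"Accuracy with scaling:    {scaled_acc:.2f}")
```

Because the pipeline applies `transform` to the test data automatically inside `score`, there is no separate scaled copy of the data to keep in sync.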


### ✅ Observation

In this run, accuracy rises from 0.74 to 0.96 once the features are standardized.
That’s because KNN relies on distance metrics (e.g., Euclidean). Without scaling, features with larger numeric ranges dominate the distance calculation, so the remaining features have little influence on which neighbors are chosen.
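
The effect can be shown with a tiny plain-Python sketch. The three points and the per-feature standard deviations below are made up for illustration; they are not Wine data. Each point has one small-range feature and one large-range feature, and the nearest neighbor of `query` changes once both features are put on a comparable scale.

```python
from math import dist

# Made-up samples: (small-range feature, large-range feature)
query = (2.0, 1000.0)
a = (2.1, 1100.0)  # very close on feature 1, 100 units away on feature 2
b = (4.5, 1010.0)  # far on feature 1, only 10 units away on feature 2

# Assumed per-feature standard deviations (illustrative values)
stds = (1.0, 300.0)

def rescale(point):
    # Mean subtraction cancels out in a distance, so dividing by std suffices
    return tuple(v / s for v, s in zip(point, stds))

candidates = {"a": a, "b": b}

# Nearest neighbor of `query` using raw and rescaled Euclidean distance
raw_nearest = min(candidates, key=lambda k: dist(query, candidates[k]))
scaled_nearest = min(
    candidates, key=lambda k: dist(rescale(query), rescale(candidates[k]))
)

print("Nearest on raw features:   ", raw_nearest)
print("Nearest on scaled features:", scaled_nearest)
```

On the raw values the large-range feature decides the outcome almost by itself; after rescaling, the difference on the small-range feature counts too and the other point becomes the nearest neighbor.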