1. What is K-Nearest Neighbors (KNN) and how does it work in both
classification and regression problems?

Ans:

K-Nearest Neighbors (KNN):
A simple, non-parametric, instance-based ("lazy") algorithm: it stores the training data and predicts the outcome for a new point from its k closest training points.

How it works:

Choose k (number of neighbors).

Compute the distance from the new point to every training point (e.g., Euclidean).

Select the k nearest neighbors.

Predict:

Classification: majority vote (most common class) among the k neighbors.

Regression: average (mean) of the k neighbors' target values.
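The four steps above can be sketched from scratch in NumPy. This is a minimal illustration, not how scikit-learn implements KNN internally; the function name `knn_predict` and the toy data are hypothetical.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3, task="classification"):
    """Predict one query point with plain k-nearest neighbors (illustrative sketch)."""
    # Step 2: Euclidean distance from the query to every training point
    distances = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # Step 3: indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    neighbor_targets = y_train[nearest]
    # Step 4: majority vote (classification) or mean value (regression)
    if task == "classification":
        return Counter(neighbor_targets.tolist()).most_common(1)[0][0]
    return float(neighbor_targets.mean())

# Hypothetical toy data: two clusters in 2-D
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [5.0, 5.0], [6.0, 5.5], [5.5, 6.0]])
y_class = np.array([0, 0, 1, 1, 1])
y_reg = np.array([1.0, 1.2, 5.0, 5.4, 5.6])

query = np.array([5.2, 5.1])
print(knn_predict(X_train, y_class, query, k=3))                   # majority class
print(knn_predict(X_train, y_reg, query, k=3, task="regression"))  # mean of neighbors
```

The query lies inside the second cluster, so its three nearest neighbors are the last three points: the classifier returns class 1 and the regressor returns the mean of 5.0, 5.4 and 5.6.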

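Pros: Simple, no training phase, no assumptions about the underlying data distribution, handles multi-class problems naturally.
Cons: Slow at prediction time on large datasets (each query is compared with every training point), sensitive to feature scale, struggles in high dimensions.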

2. What is the Curse of Dimensionality and how does it affect KNN
performance?


Ans:

Curse of Dimensionality:

As the number of features (dimensions) grows, the volume of the feature space grows exponentially, so a fixed number of samples covers it ever more sparsely. Distances between points become less informative and patterns are harder to detect.

Effect on KNN:

KNN relies on distance to find neighbors.

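In high dimensions, distances concentrate: the nearest and farthest training points end up at almost the same distance from a query, so the "nearest" neighbors are hardly closer than any other point and KNN's predictions become less reliable.

In practice, feature selection or dimensionality reduction (e.g., PCA) is often needed to restore performance.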
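Distance concentration can be observed with random data. The sketch below is an illustrative experiment on uniform random points (not the Wine data): it measures the relative contrast, (farthest distance - nearest distance) / nearest distance, from one random query, and this ratio should shrink as the number of dimensions grows. The helper `relative_contrast` and the sample sizes are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

def relative_contrast(n_dims, n_points=500):
    """(farthest - nearest) / nearest distance from one query to random points."""
    points = rng.random((n_points, n_dims))  # uniform in the unit hypercube
    query = rng.random(n_dims)
    d = np.linalg.norm(points - query, axis=1)
    return (d.max() - d.min()) / d.min()

contrasts = {n_dims: relative_contrast(n_dims) for n_dims in [2, 10, 100, 1000]}
for n_dims, c in contrasts.items():
    print(f"{n_dims:>5} dims: relative contrast = {c:.3f}")
```

A small contrast means the "nearest" and "farthest" neighbors are almost equally far away, which is exactly the situation in which KNN's neighbor choice carries little information.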

3. What is Principal Component Analysis (PCA)? How is it different from
feature selection?

Ans:

Principal Component Analysis (PCA):
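A dimensionality reduction technique that transforms the original (possibly correlated) features into a new set of uncorrelated features, the principal components, ordered so that the first captures the most variance in the data, the second captures the most of what remains, and so on.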

Difference from Feature Selection:

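PCA: Creates new features (linear combinations of all original features) and reduces dimensions by keeping only the top components. Some information is lost (the variance in the discarded components), and the new features are harder to interpret.

Feature Selection: Chooses a subset of the original features and keeps them unchanged, so they remain directly interpretable.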
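The contrast is easy to see on the Wine data. In this sketch both approaches reduce 13 features to 2; the ANOVA F-score (`f_classif`) used by `SelectKBest` is just one illustrative selection criterion, assumed here for the example.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler

data = load_wine()
X_scaled = StandardScaler().fit_transform(data.data)

# PCA: 2 new features, each a weighted mix of all 13 original features
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
print("PCA output shape:", X_pca.shape)
print("PC1 weights (one per original feature):", pca.components_[0].round(2))

# Feature selection: keep the 2 original features with the highest F-score
selector = SelectKBest(score_func=f_classif, k=2)
X_sel = selector.fit_transform(X_scaled, data.target)
kept = [name for name, keep in zip(data.feature_names, selector.get_support()) if keep]
print("Selected original features:", kept)
```

The selected columns are copies of two original columns, while each PCA column mixes all 13.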

4. What are eigenvalues and eigenvectors in PCA, and why are they
important?

Ans:

Eigenvectors and Eigenvalues in PCA:

Eigenvectors (of the data's covariance matrix): Directions of the new axes (principal components).

Eigenvalues: The variance of the data along each eigenvector.

Importance:

Eigenvectors define the directions in which the data varies most.

Eigenvalues tell us how much variance each component captures. Sorting them in descending order and keeping the components with the largest eigenvalues is how PCA chooses the top components.
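A sketch of this computation with NumPy, assuming standardized features: eigendecompose the covariance matrix, sort by eigenvalue, and compare with scikit-learn's `PCA` (which reaches the same result through an SVD rather than an explicit eigendecomposition).

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_wine().data)

# 13 x 13 covariance matrix of the standardized features
cov = np.cov(X_scaled, rowvar=False)

# eigh is for symmetric matrices and returns eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]     # largest variance first
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]     # columns are the principal directions

# Share of total variance captured by each component
explained_ratio = eigenvalues / eigenvalues.sum()
print("Top 3 eigenvalues:", eigenvalues[:3].round(3))
print("Explained variance ratio (top 3):", explained_ratio[:3].round(4))

# Cross-check against scikit-learn (eigenvector signs are arbitrary, so compare abs)
pca = PCA().fit(X_scaled)
print("Eigenvalues match:", np.allclose(eigenvalues, pca.explained_variance_))
print("Directions match:", np.allclose(np.abs(eigenvectors.T), np.abs(pca.components_), atol=1e-6))
```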

5. How do KNN and PCA complement each other when applied in a single
pipeline?

Ans:

KNN + PCA pipeline:

PCA reduces the number of dimensions, filters out low-variance directions (which often carry mostly noise), and keeps the directions with the most variance.

KNN then finds neighbors in this lower-dimensional space.

Benefit:

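Faster distance computations and less memory, since each point has fewer coordinates.

Often improved accuracy, because distances in the reduced space are less affected by noise and by the curse of dimensionality.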
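The same idea can be written as a single scikit-learn `Pipeline`, which refits the scaler and PCA on the training part of each cross-validation fold only. A minimal sketch; `n_components=2` and `n_neighbors=5` are illustrative, untuned values.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Scale -> project onto top principal components -> classify by nearest neighbors
pipe = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    KNeighborsClassifier(n_neighbors=5),
)

# Each fold fits the whole pipeline on its training part and scores on the rest,
# so no statistics from the held-out fold leak into scaling or PCA
scores = cross_val_score(pipe, X, y, cv=5)
print("Fold accuracies:", scores.round(3))
print(f"Mean accuracy: {scores.mean():.3f}")
```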

6. Train a KNN Classifier on the Wine dataset with and without feature
scaling. Compare model accuracy in both cases.


In [1]:
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load dataset
data = load_wine()
X, y = data.data, data.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# --- KNN without scaling ---
knn1 = KNeighborsClassifier(n_neighbors=5)
knn1.fit(X_train, y_train)
y_pred1 = knn1.predict(X_test)
acc1 = accuracy_score(y_test, y_pred1)

# --- KNN with scaling ---
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

knn2 = KNeighborsClassifier(n_neighbors=5)
knn2.fit(X_train_scaled, y_train)
y_pred2 = knn2.predict(X_test_scaled)
acc2 = accuracy_score(y_test, y_pred2)

print("Accuracy without scaling:", acc1)
print("Accuracy with scaling:", acc2)


Accuracy without scaling: 0.7222222222222222
Accuracy with scaling: 0.9444444444444444
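The accuracy gap above is consistent with how unevenly the raw Wine features are scaled. A quick sketch that prints each feature's range (max minus min), largest first:

```python
from sklearn.datasets import load_wine

data = load_wine()
ranges = data.data.max(axis=0) - data.data.min(axis=0)
feature_range = dict(zip(data.feature_names, ranges))

# Print each feature's spread, largest first
for name, r in sorted(feature_range.items(), key=lambda t: -t[1]):
    print(f"{name:>30}: range = {r:.2f}")
```

Proline spans well over a thousand units while features such as hue span roughly one unit, so without scaling the Euclidean distance is dominated by proline and the other features barely influence which neighbors are chosen. Standardizing puts every feature on a comparable scale.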


7. Train a PCA model on the Wine dataset and print the explained variance
ratio of each principal component.


In [2]:
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load dataset
data = load_wine()
X = data.data

# Scale features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Train PCA
pca = PCA()
X_pca = pca.fit_transform(X_scaled)

# Print explained variance ratio
print("Explained variance ratio of each PC:")
for i, ratio in enumerate(pca.explained_variance_ratio_):
    print(f"PC{i+1}: {ratio:.4f}")


Explained variance ratio of each PC:
PC1: 0.3620
PC2: 0.1921
PC3: 0.1112
PC4: 0.0707
PC5: 0.0656
PC6: 0.0494
PC7: 0.0424
PC8: 0.0268
PC9: 0.0222
PC10: 0.0193
PC11: 0.0174
PC12: 0.0130
PC13: 0.0080


8. Train a KNN Classifier on the PCA-transformed dataset (retain top 2
components). Compare the accuracy with the original dataset.


In [3]:
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load dataset
data = load_wine()
X, y = data.data, data.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# --- Scale features ---
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# --- KNN on original scaled data ---
knn_orig = KNeighborsClassifier(n_neighbors=5)
knn_orig.fit(X_train_scaled, y_train)
y_pred_orig = knn_orig.predict(X_test_scaled)
acc_orig = accuracy_score(y_test, y_pred_orig)

# --- PCA (top 2 components) ---
pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train_scaled)
X_test_pca = pca.transform(X_test_scaled)

# --- KNN on PCA-transformed data ---
knn_pca = KNeighborsClassifier(n_neighbors=5)
knn_pca.fit(X_train_pca, y_train)
y_pred_pca = knn_pca.predict(X_test_pca)
acc_pca = accuracy_score(y_test, y_pred_pca)

print("Accuracy on original scaled data:", acc_orig)
print("Accuracy on top-2 PCA data:", acc_pca)


Accuracy on original scaled data: 0.9444444444444444
Accuracy on top-2 PCA data: 1.0
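With test_size=0.2, the test set holds only 36 of the 178 samples, so each misclassified sample moves accuracy by about 0.028, and the gap between 0.944 and 1.0 above amounts to two samples. A sketch that compares every component count with 5-fold cross-validation instead of one split (cv=5 and n_neighbors=5 are assumptions, not part of the original task):

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Mean 5-fold accuracy for every possible number of retained components
results = {}
for n in range(1, X.shape[1] + 1):
    pipe = make_pipeline(StandardScaler(), PCA(n_components=n),
                         KNeighborsClassifier(n_neighbors=5))
    results[n] = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"{n:>2} components: mean CV accuracy = {results[n]:.3f}")
```

Averaging over five folds gives a steadier basis for choosing the number of components than a single 36-sample test set.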


9. Train a KNN Classifier with different distance metrics (euclidean,
manhattan) on the scaled Wine dataset and compare the results.


In [4]:
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load dataset
data = load_wine()
X, y = data.data, data.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# --- KNN with Euclidean distance ---
knn_euc = KNeighborsClassifier(n_neighbors=5, metric='euclidean')
knn_euc.fit(X_train_scaled, y_train)
y_pred_euc = knn_euc.predict(X_test_scaled)
acc_euc = accuracy_score(y_test, y_pred_euc)

# --- KNN with Manhattan distance ---
knn_man = KNeighborsClassifier(n_neighbors=5, metric='manhattan')
knn_man.fit(X_train_scaled, y_train)
y_pred_man = knn_man.predict(X_test_scaled)
acc_man = accuracy_score(y_test, y_pred_man)

print("Accuracy with Euclidean distance:", acc_euc)
print("Accuracy with Manhattan distance:", acc_man)


Accuracy with Euclidean distance: 0.9444444444444444
Accuracy with Manhattan distance: 0.9444444444444444
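Both metrics are special cases of the Minkowski distance, d(a, b) = (sum of |a_i - b_i|^p)^(1/p): p=1 gives Manhattan and p=2 gives Euclidean (KNeighborsClassifier's default is metric='minkowski' with p=2). A small sketch of the formula; the helper `minkowski` is hypothetical:

```python
import numpy as np

def minkowski(a, b, p):
    """Minkowski distance: (sum |a_i - b_i|^p)^(1/p)."""
    diff = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    return float(np.sum(diff ** p) ** (1.0 / p))

a, b = [0.0, 0.0], [3.0, 4.0]
print("Manhattan (p=1):", minkowski(a, b, p=1))  # 3 + 4 = 7.0
print("Euclidean (p=2):", minkowski(a, b, p=2))  # sqrt(9 + 16) = 5.0
```

Manhattan sums per-feature differences, while Euclidean squares them first and so weights a single large difference more heavily. Identical accuracy on one 36-sample split does not mean the two metrics will agree on every dataset.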
