Question 1: What is K-Nearest Neighbors (KNN) and how does it work in classification and regression?

Answer:
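KNN is a non-parametric, instance-based ("lazy") algorithm. It stores the training data and predicts the outcome for a new point from the K training points closest to it under a distance metric, typically Euclidean.
•	Classification: Predicts the majority class among the K neighbors.
•	Regression: Predicts the average of the K neighbors’ target values, optionally weighted by distance.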
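Both prediction rules can be sketched from scratch in a few lines. This is a minimal illustration, not scikit-learn's implementation. It assumes Euclidean distance and an unweighted vote or mean, and the helper name `knn_predict` and the toy data are hypothetical.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3, task="classification"):
    """Hypothetical helper: predict one query point from its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    neighbor_targets = y_train[nearest]
    if task == "classification":
        # Majority vote among the k neighbors' labels
        return Counter(neighbor_targets.tolist()).most_common(1)[0][0]
    # Regression: plain (unweighted) mean of the neighbors' values
    return float(np.mean(neighbor_targets))

# Toy data (made up for illustration)
X_toy = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0], [3.5, 5.0]])
labels = np.array([0, 0, 1, 1, 1])
values = np.array([10.0, 12.0, 30.0, 50.0, 35.0])

query = np.array([1.2, 1.5])
print("Class:", knn_predict(X_toy, labels, query, k=3))
print("Value:", knn_predict(X_toy, values, query, k=3, task="regression"))
```

For this query the three nearest points are the first three rows. The vote is therefore two labels of 0 against one label of 1, and the regression output is the mean of 10, 12 and 30.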

Question 2: What is the Curse of Dimensionality and its effect on KNN?

Answer:
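As the number of dimensions grows, the data become sparse, and the distances from a point to its nearest and farthest neighbors become nearly equal, so "nearest" carries less information. KNN accuracy therefore tends to drop unless far more training data is available, and every distance computation also becomes more expensive.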
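The "distances become less meaningful" effect can be measured directly. The sketch below assumes points drawn uniformly from the unit hypercube and uses the relative contrast (max distance − min distance) / min distance from one query point. The helper name `relative_contrast` and the chosen sizes are illustrative assumptions.

```python
import numpy as np

def relative_contrast(n_points, dim, rng):
    """(max - min) / min Euclidean distance from one random query to random points."""
    points = rng.random((n_points, dim))  # uniform in the unit hypercube
    query = rng.random(dim)
    d = np.linalg.norm(points - query, axis=1)
    return (d.max() - d.min()) / d.min()

rng = np.random.default_rng(0)
contrasts = {dim: relative_contrast(500, dim, rng) for dim in (2, 10, 100, 1000)}
for dim, c in contrasts.items():
    print(f"dim={dim:5d}  relative contrast={c:.3f}")
```

As the dimension grows, the contrast shrinks toward zero. The nearest neighbor is then barely closer than the farthest one, which is exactly the situation that weakens KNN.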

Question 3: What is PCA and how is it different from feature selection?

Answer:
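PCA reduces dimensionality by creating new features (principal components). Each component is a linear combination of all the original features, and PCA chooses them without using the labels. Feature selection instead keeps a subset of the original features unchanged, so the result stays directly interpretable.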
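The difference is visible on the Wine data used later in this notebook. As an illustrative assumption, the sketch keeps 3 dimensions either way and uses `SelectKBest` with the ANOVA F-test as the feature-selection method. Many other selectors exist.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import StandardScaler

data = load_wine()
X_std = StandardScaler().fit_transform(data.data)

# PCA (unsupervised): 3 new features, each mixing all 13 original features
pca = PCA(n_components=3).fit(X_std)
X_pca = pca.transform(X_std)

# Feature selection (uses the labels): keep the 3 original features with the highest F-score
selector = SelectKBest(score_func=f_classif, k=3).fit(X_std, data.target)
X_sel = selector.transform(X_std)
kept = [data.feature_names[i] for i in selector.get_support(indices=True)]

print("PCA output shape:", X_pca.shape)
print("PC1 weights on all 13 features:", pca.components_[0].round(2))
print("Selected original features:", kept)
```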

Question 4: What are eigenvalues and eigenvectors in PCA and why are they important?

Answer:
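PCA eigendecomposes the covariance matrix of the centered (usually standardized) data. The eigenvectors give the directions of the new axes (the principal components), and each eigenvalue is the variance of the data along its eigenvector. Sorting the components by eigenvalue ranks them by importance. The ratio eigenvalue / (sum of all eigenvalues) is the explained variance ratio, which is used to decide how many components to keep.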
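One way to see this relationship is to eigendecompose the covariance matrix by hand and compare the result with scikit-learn's `PCA`. The sketch assumes standardized Wine features. Eigenvectors are defined only up to sign, so their absolute values are compared.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_std = StandardScaler().fit_transform(load_wine().data)

# Covariance matrix of the standardized features (np.cov uses ddof=1 by default)
cov = np.cov(X_std, rowvar=False)

# eigh handles symmetric matrices and returns eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]  # sort largest first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

pca = PCA().fit(X_std)

# Eigenvalues = variance captured along each principal direction
print(np.allclose(eigenvalues, pca.explained_variance_))
# Explained variance ratio = eigenvalue / sum of eigenvalues
print(np.allclose(eigenvalues / eigenvalues.sum(), pca.explained_variance_ratio_))
# Eigenvectors = principal axes (each may differ from PCA's by a sign flip)
print(np.allclose(np.abs(eigenvectors.T), np.abs(pca.components_)))
```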

Question 5: How do KNN and PCA complement each other in a pipeline?

Answer:
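PCA reduces dimensionality and discards low-variance (often noisy) directions before KNN computes distances. This eases the curse of dimensionality, which can improve KNN accuracy, and it reduces prediction time. Both steps are sensitive to feature scale, so features are usually standardized first.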
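In scikit-learn the three steps can be chained with a `Pipeline`. Cross-validation then refits the scaler and PCA on each training fold only. The sketch below is a minimal comparison on Wine. `n_components=5` and `n_neighbors=5` are illustrative assumptions, not tuned values.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

models = {
    "scaled KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    # Scale -> project onto 5 principal components -> classify by nearest neighbors
    "scaled PCA + KNN": make_pipeline(StandardScaler(), PCA(n_components=5),
                                      KNeighborsClassifier(n_neighbors=5)),
}

results = {}
for name, model in models.items():
    # Each of the 5 folds refits every pipeline step on its own training portion
    results[name] = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {results[name].mean():.3f}")
```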


## Question 6: KNN with and without Feature Scaling (Wine Dataset)

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

X, y = load_wine(return_X_y=True)
# No random_state is set, so the split (and the accuracies printed below) varies between runs
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Without scaling
knn = KNeighborsClassifier()
knn.fit(X_train, y_train)
print("Accuracy without scaling:", accuracy_score(y_test, knn.predict(X_test)))

# With scaling
# Fit the scaler on the training split only, then apply the same transform to the test split
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

knn.fit(X_train_s, y_train)
print("Accuracy with scaling:", accuracy_score(y_test, knn.predict(X_test_s)))


Accuracy without scaling: 0.7111111111111111
Accuracy with scaling: 0.9555555555555556
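The jump in accuracy comes from the very different feature scales in Wine. For two independent samples, the expected squared Euclidean distance is 2 × (sum of the feature variances). Each feature's share of the total variance is therefore also its share of the expected squared distance that KNN uses. The sketch below computes that share on the raw data.

```python
import numpy as np
from sklearn.datasets import load_wine

data = load_wine()
variances = data.data.var(axis=0)  # per-feature variance of the raw (unscaled) data

# Share of total variance = share of expected squared Euclidean distance
share = variances / variances.sum()
top = int(np.argmax(share))
print(f"Largest-variance feature: {data.feature_names[top]} ({share[top]:.1%} of total)")
```

Without scaling, almost all of the distance comes from this single feature, so the neighbors effectively ignore the other 12. `StandardScaler` gives every feature unit variance, and each one then contributes equally in expectation.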


## Question 7: PCA Explained Variance Ratio

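from sklearn.decomposition import PCA

# X is the raw (unscaled) Wine data from Question 6. Proline values run into the
# hundreds, so that one feature dominates the total variance and PC1 alone
# captures about 99.8% of it. Standardize X first to compare components fairly.
pca = PCA()
pca.fit(X)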

print("Explained Variance Ratio:", pca.explained_variance_ratio_)

Explained Variance Ratio: [9.98091230e-01 1.73591562e-03 9.49589576e-05 5.02173562e-05
 1.23636847e-05 8.46213034e-06 2.80681456e-06 1.52308053e-06
 1.12783044e-06 7.21415811e-07 3.78060267e-07 2.12013755e-07
 8.25392788e-08]
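For comparison, the sketch below repeats the analysis on standardized features. It also finds the smallest number of components whose cumulative explained variance reaches 95%. The 95% threshold is an illustrative assumption.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_wine(return_X_y=True)
pca_std = PCA().fit(StandardScaler().fit_transform(X))

ratios = pca_std.explained_variance_ratio_
cumulative = np.cumsum(ratios)
# searchsorted gives the first index whose cumulative ratio is >= 0.95; +1 turns it into a count
n_95 = int(np.searchsorted(cumulative, 0.95) + 1)

print("First three ratios:", ratios[:3].round(3))
print("Components for 95% variance:", n_95)
```

After scaling, the variance is spread across many components instead of one. `PCA(n_components=0.95, svd_solver="full")` performs the same selection automatically.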


## Question 8: KNN on PCA-Transformed Data (2 Components)

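# Caution: PCA is fitted on all of X (train and test) before splitting, which leaks
# test-set information, and X is unscaled, so the 2 components mostly reflect the
# largest-scale features such as proline. knn is the default (k=5, Euclidean) model from Question 6.
pca = PCA(n_components=2)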
X_pca = pca.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X_pca, y)
knn.fit(X_train, y_train)

print("Accuracy with PCA:", accuracy_score(y_test, knn.predict(X_test)))

Accuracy with PCA: 0.7555555555555555
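A leakage-free variant splits the raw data first and fits the scaler and PCA on the training split only. The sketch below uses a `Pipeline` for this. `random_state=42` and `stratify=y` are added assumptions for repeatability, so its accuracy is not directly comparable to the output above.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
# Split the raw features first so the test split never influences scaling or PCA
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# fit() learns the scaler and the 2 components from the training split only;
# predict() reuses them to transform the test split
model = make_pipeline(StandardScaler(), PCA(n_components=2), KNeighborsClassifier())
model.fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
print("Accuracy with scaling + PCA (2 components):", acc)
```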


## Question 9: KNN with Different Distance Metrics

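# Reuses X_train/X_test from the previous cell. Only Manhattan (L1) distance is evaluated
# here; the default metric is Minkowski with p=2, i.e. Euclidean.
knn = KNeighborsClassifier(metric='manhattan')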
knn.fit(X_train, y_train)
print("Accuracy with Manhattan distance:", accuracy_score(y_test, knn.predict(X_test)))

Accuracy with Manhattan distance: 0.7777777777777778
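To compare several metrics under identical conditions, each one can be scored with the same cross-validation folds. The sketch assumes standardized features and 5-fold CV. The metric names are ones accepted by `KNeighborsClassifier`.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

results = {}
for metric in ["euclidean", "manhattan", "chebyshev"]:
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(metric=metric))
    # Mean accuracy over 5 folds; scaling is refit inside each fold
    results[metric] = cross_val_score(model, X, y, cv=5).mean()

for metric, acc in results.items():
    print(f"{metric:>10}: {acc:.3f}")
```

Averaging over folds makes the comparison less sensitive to one lucky or unlucky split than a single train/test evaluation.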


Question 10: Cancer Classification – PCA + KNN Pipeline

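•  PCA: Standardize the gene-expression features, then reduce thousands of them to a much smaller number of principal components.
•  Component selection: Keep the smallest number of components that explains a target share of the variance (e.g., 95%).
•  KNN: Classify patients by their nearest neighbors in the reduced space.
•  Evaluation: Accuracy and F1-score under stratified cross-validation, with scaling and PCA refit inside each training fold to avoid leakage.
•  Justification: With far more features than patients, PCA can reduce overfitting and noise, make KNN distances more meaningful, and cut computation, which often improves accuracy on high-dimensional biomedical data.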
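The pipeline above can be sketched end to end. No gene-expression data accompanies this notebook, so the sketch assumes a hypothetical synthetic stand-in from `make_classification`: 120 "patients", 5000 features, 40 of them informative. Real data would replace it, and the sample sizes, `n_neighbors=5` and the 95% threshold are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for a gene-expression matrix (patients x genes), binary labels
X, y = make_classification(n_samples=120, n_features=5000, n_informative=40,
                           n_redundant=0, n_classes=2, random_state=0)

pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95, svd_solver="full"),  # keep components explaining 95% of variance
    KNeighborsClassifier(n_neighbors=5),
)

# Stratified folds keep class proportions; every step is refit on each training fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_validate(pipeline, X, y, cv=cv, scoring=["accuracy", "f1"])
print("Mean accuracy:", scores["test_accuracy"].mean().round(3))
print("Mean F1:", scores["test_f1"].mean().round(3))
```

On data dominated by noise, a fixed 95% threshold can retain most components. In practice the number of components (or the variance threshold) is usually tuned inside the cross-validation, for example with `GridSearchCV` over the pipeline.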
