In [1]:
# 1. What is K-Nearest Neighbors (KNN), and how does it work in both classification and regression problems?
# >> K-Nearest Neighbors (KNN) is a supervised learning algorithm used for both classification and regression. It is non-parametric and instance-based: it makes no assumptions about the data distribution and does not fit an explicit model. Instead, it stores the training data and predicts from the k training points closest to the query point under a distance metric such as Euclidean distance.
# In classification:
#   1. Each of the k nearest neighbors votes for its class.
#   2. The majority class among them becomes the predicted class.
# In regression:
#   1. The algorithm averages (optionally with distance weights) the target values of the k nearest neighbors.
#   2. That average is the predicted value.
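# The voting and averaging steps above can be sketched from scratch. This is a minimal illustration, not how scikit-learn implements KNN: the helper name knn_predict, its task parameter, and the toy data are all made up for this example, and it uses plain Euclidean distance with an unweighted mean.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3, task="classification"):
    """Predict for one query point from its k nearest training points (hypothetical helper)."""
    X_train = np.asarray(X_train, dtype=float)
    # Euclidean distance from the query to every stored training point
    distances = np.linalg.norm(X_train - np.asarray(x_query, dtype=float), axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    neighbor_targets = [y_train[i] for i in nearest]
    if task == "classification":
        # Majority vote among the k neighbors
        return Counter(neighbor_targets).most_common(1)[0][0]
    # Regression: unweighted mean of the neighbors' target values
    return float(np.mean(neighbor_targets))

# Toy data, invented for illustration: two well-separated clusters
X_toy = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
y_class = ["a", "a", "a", "b", "b", "b"]
y_value = [10.0, 12.0, 11.0, 50.0, 52.0, 48.0]

predicted_class = knn_predict(X_toy, y_class, [1.1, 1.0], k=3)
predicted_value = knn_predict(X_toy, y_value, [1.1, 1.0], k=3, task="regression")
print(predicted_class, predicted_value)  # a 11.0
```

# The query point sits inside the first cluster, so its three nearest neighbors all carry class "a" and the targets 10, 12 and 11, whose mean is 11.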

In [2]:
# 2. What is the Curse of Dimensionality, and how does it affect KNN performance?
# >> The Curse of Dimensionality refers to the problems that arise as the number of features (dimensions) in the data grows.
# KNN relies on distances to find the nearest neighbors. In high dimensions:
#   - Distances between points become nearly equal, so the "nearest" neighbors are hardly closer than any other point.
#   - Predictions become less accurate, and each distance computation costs more.
#   - Far more data is needed to cover the feature space densely enough to maintain good performance.
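# The claim that distances become similar can be checked with a small simulation. This is a sketch under assumptions: points are drawn uniformly from the unit hypercube, and the helper name nearest_to_farthest_ratio, the sample size and the seed are arbitrary choices for illustration.

```python
import numpy as np

def nearest_to_farthest_ratio(n_dims, n_points=500, seed=0):
    """Ratio of the smallest to the largest distance from one random query point."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_dims))  # uniform samples in [0, 1)^n_dims
    query = rng.random(n_dims)
    distances = np.linalg.norm(points - query, axis=1)
    return distances.min() / distances.max()

# A ratio near 0 means the nearest neighbor is much closer than the farthest point;
# a ratio near 1 means all points are almost equally far away.
ratios = {d: nearest_to_farthest_ratio(d) for d in (2, 10, 100, 1000)}
for d, r in ratios.items():
    print(f"{d:>4} dims: nearest/farthest = {r:.3f}")
```

# As the dimension grows, the ratio moves toward 1, which is why "nearest" carries less information for KNN in high-dimensional data.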

In [3]:
# 3. What is Principal Component Analysis (PCA)? How is it different from feature selection?
# >> Principal Component Analysis (PCA) is a dimensionality reduction technique that transforms the original features into a new set of uncorrelated variables called principal components. The components are ordered by the variance they capture, so keeping only the first few simplifies the dataset while retaining most of its variance.
# Feature selection chooses a subset of the original features. PCA instead creates new features, each a linear combination of all the originals. In short, PCA transforms the data, while feature selection filters existing features.
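# The difference can be seen side by side on the Wine dataset. This is an illustrative sketch: SelectKBest with the f_classif score is just one possible feature selection method, chosen here as an assumption, and keeping 2 features or components is arbitrary.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

wine = load_wine()
X_scaled = StandardScaler().fit_transform(wine.data)
y = wine.target

# Feature selection: keeps 2 of the 13 original columns unchanged
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X_scaled, y)
kept = selector.get_support(indices=True)
print("Selected original features:", [wine.feature_names[i] for i in kept])

# PCA: builds 2 new columns, each a weighted sum of all 13 original columns
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
print("PC1 weights over all 13 features:", np.round(pca.components_[0], 2))
```

# Both results have two columns, but the selected columns are copies of original features, while each principal component mixes every feature through the weights in pca.components_.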

In [4]:
# 4. What are eigenvalues and eigenvectors in PCA, and why are they important?
# >> In PCA, the eigenvalues and eigenvectors of the data's covariance matrix define the principal components.
# Eigenvectors:
#   - Give the directions (axes) along which the data varies. The eigenvector with the largest eigenvalue is the direction of greatest variance.
#   - Each eigenvector corresponds to one principal component.
# Eigenvalues:
#   - Give the amount of variance captured along each eigenvector.
#   - A larger eigenvalue means that direction (component) captures more of the spread in the data, so sorting by eigenvalue ranks the components.
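# The link between the covariance matrix and the principal components can be verified numerically. The sketch below eigendecomposes the covariance matrix of the standardized Wine data with NumPy and compares the result with scikit-learn's PCA. It is a check for illustration, not a description of how PCA is computed internally.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X_scaled = StandardScaler().fit_transform(load_wine().data)

# Covariance matrix of the features (columns are variables)
cov = np.cov(X_scaled, rowvar=False)

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]  # re-sort so the largest variance comes first
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]  # column j is the j-th principal direction

# Share of the total variance carried by each direction
explained_ratio = eigenvalues / eigenvalues.sum()

pca = PCA().fit(X_scaled)
print(np.allclose(eigenvalues, pca.explained_variance_))
print(np.allclose(explained_ratio, pca.explained_variance_ratio_))
```

# The eigenvectors agree with the rows of pca.components_ only up to sign, since an eigenvector and its negation describe the same axis.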

In [5]:
# 5. How do KNN and PCA complement each other when applied in a single pipeline?
# >> PCA reduces the data's dimensionality by transforming correlated features into a smaller set of uncorrelated principal components that capture most of the variance.
# KNN relies on distance calculations, which are more meaningful in lower-dimensional spaces. By discarding low-variance directions (often noise) and redundant features, PCA can improve KNN's accuracy, reduces the cost of each distance computation, and mitigates the curse of dimensionality. The benefit is not guaranteed, because PCA ignores the labels and a low-variance direction can still separate the classes.
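# One way to combine the two steps is a scikit-learn pipeline, so that scaling and PCA are fitted on the training folds only. This is a sketch under assumptions: 2 components, k=5 and 5-fold cross-validation are illustrative choices, not tuned values.

```python
from sklearn.datasets import load_wine
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X, y = load_wine(return_X_y=True)

# Scale, project onto 2 principal components, then classify with KNN
pca_knn = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    KNeighborsClassifier(n_neighbors=5),
)

# Each fold refits the scaler and PCA on its own training part, avoiding leakage
scores = cross_val_score(pca_knn, X, y, cv=5)
print("Fold accuracies:", scores.round(3))
print("Mean accuracy:", round(scores.mean(), 3))
```

# Averaging over several folds gives a steadier estimate than a single train/test split, which matters on a dataset of only 178 samples.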

In [6]:
# 6. Dataset: use the Wine dataset from sklearn.datasets.load_wine(). Train a KNN classifier on the Wine dataset with and without feature scaling, and compare model accuracy in both cases.
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Load the Wine dataset
wine = load_wine()
X, y = wine.data, wine.target

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 1️⃣ Without Feature Scaling
knn_no_scaling = KNeighborsClassifier(n_neighbors=5)
knn_no_scaling.fit(X_train, y_train)
y_pred_no_scaling = knn_no_scaling.predict(X_test)
accuracy_no_scaling = accuracy_score(y_test, y_pred_no_scaling)
# 2️⃣ With Feature Scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

knn_scaled = KNeighborsClassifier(n_neighbors=5)
knn_scaled.fit(X_train_scaled, y_train)
y_pred_scaled = knn_scaled.predict(X_test_scaled)
accuracy_scaled = accuracy_score(y_test, y_pred_scaled)
# Results
print("Accuracy without Scaling:", round(accuracy_no_scaling, 3))
print("Accuracy with Scaling:", round(accuracy_scaled, 3))

Accuracy without Scaling: 0.722
Accuracy with Scaling: 0.944


In [7]:
# 7. Train a PCA model on the Wine dataset and print the explained variance ratio of each principal component.
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import numpy as np

# Load the Wine dataset
wine = load_wine()
X = wine.data

# Step 1: Standardize the data (important before PCA)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Step 2: Apply PCA
pca = PCA()
pca.fit(X_scaled)

# Step 3: Print the explained variance ratio
print("Explained Variance Ratio of each Principal Component:")
for i, ratio in enumerate(pca.explained_variance_ratio_):
    print(f"PC{i+1}: {ratio:.4f}")

# Optional: print cumulative variance
print("\nCumulative Explained Variance:", np.cumsum(pca.explained_variance_ratio_))

Explained Variance Ratio of each Principal Component:
PC1: 0.3620
PC2: 0.1921
PC3: 0.1112
PC4: 0.0707
PC5: 0.0656
PC6: 0.0494
PC7: 0.0424
PC8: 0.0268
PC9: 0.0222
PC10: 0.0193
PC11: 0.0174
PC12: 0.0130
PC13: 0.0080

Cumulative Explained Variance: [0.36198848 0.55406338 0.66529969 0.73598999 0.80162293 0.85098116
 0.89336795 0.92017544 0.94239698 0.96169717 0.97906553 0.99204785
 1.        ]
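# The cumulative variance printed above can be used to choose how many components to keep. The sketch below assumes a 95% variance target, which is a common but arbitrary threshold and not part of the original question. In the output above, the cumulative value first exceeds 0.95 at the tenth component.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X_scaled = StandardScaler().fit_transform(load_wine().data)

# Manual route: first position where the cumulative ratio reaches the target
cumulative = np.cumsum(PCA().fit(X_scaled).explained_variance_ratio_)
n_needed = int(np.argmax(cumulative >= 0.95)) + 1

# Built-in route: a float n_components between 0 and 1 is read as a variance target
pca_95 = PCA(n_components=0.95).fit(X_scaled)

print("Components needed (manual):", n_needed)
print("Components kept by PCA(n_components=0.95):", pca_95.n_components_)
```

# Both routes should agree, and the fitted pca_95 object can then be used to transform the data directly.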


In [8]:
# 8. Train a KNN Classifier on the PCA-transformed dataset (retain top 2 components). Compare the accuracy with the original dataset.
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Wine dataset
wine = load_wine()
X, y = wine.data, wine.target

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 1: Standardize features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Step 2: Train KNN on the original (scaled) dataset
knn_original = KNeighborsClassifier(n_neighbors=5)
knn_original.fit(X_train_scaled, y_train)
y_pred_original = knn_original.predict(X_test_scaled)
accuracy_original = accuracy_score(y_test, y_pred_original)

# Step 3: Apply PCA (retain top 2 components)
pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train_scaled)
X_test_pca = pca.transform(X_test_scaled)

# Step 4: Train KNN on the PCA-transformed dataset
knn_pca = KNeighborsClassifier(n_neighbors=5)
knn_pca.fit(X_train_pca, y_train)
y_pred_pca = knn_pca.predict(X_test_pca)
accuracy_pca = accuracy_score(y_test, y_pred_pca)

# Step 5: Compare results
print("Accuracy on Original (Scaled) Dataset:", round(accuracy_original, 3))
print("Accuracy on PCA-Transformed Dataset (2 Components):", round(accuracy_pca, 3))

Accuracy on Original (Scaled) Dataset: 0.944
Accuracy on PCA-Transformed Dataset (2 Components): 1.0


In [9]:
# 9. Train a KNN Classifier with different distance metrics (euclidean, manhattan) on the scaled Wine dataset and compare the results.
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Wine dataset
wine = load_wine()
X, y = wine.data, wine.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 1: Feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Step 2: Train KNN with Euclidean distance (same as the default metric, minkowski with p=2)
knn_euclidean = KNeighborsClassifier(n_neighbors=5, metric='euclidean')
knn_euclidean.fit(X_train_scaled, y_train)
y_pred_euclidean = knn_euclidean.predict(X_test_scaled)
accuracy_euclidean = accuracy_score(y_test, y_pred_euclidean)

# Step 3: Train KNN with Manhattan distance
knn_manhattan = KNeighborsClassifier(n_neighbors=5, metric='manhattan')
knn_manhattan.fit(X_train_scaled, y_train)
y_pred_manhattan = knn_manhattan.predict(X_test_scaled)
accuracy_manhattan = accuracy_score(y_test, y_pred_manhattan)

# Step 4: Compare results
print("Accuracy with Euclidean Distance:", round(accuracy_euclidean, 3))
print("Accuracy with Manhattan Distance:", round(accuracy_manhattan, 3))

Accuracy with Euclidean Distance: 0.944
Accuracy with Manhattan Distance: 0.944
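# For reference, the two metrics compared above differ only in how per-feature differences are combined. The two points below are made up for illustration.

```python
import numpy as np

# Two hypothetical points in a 2-D feature space
a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])

# Euclidean: square root of the sum of squared differences (straight-line distance)
euclidean = np.sqrt(np.sum((a - b) ** 2))

# Manhattan: sum of absolute differences (distance along the axes)
manhattan = np.sum(np.abs(a - b))

print(euclidean, manhattan)  # 5.0 7.0
```

# Manhattan distance is never smaller than Euclidean distance, and squaring makes Euclidean distance more sensitive to one large per-feature difference. Identical accuracy from both metrics on this test split only shows that they made the same number of correct predictions on these 36 samples.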


In [None]:
#10