Theoretical

1. What is K-Nearest Neighbors (KNN) and how does it work?
K-Nearest Neighbors (KNN) Overview
K-Nearest Neighbors (KNN) is a supervised learning algorithm used for classification and regression tasks. It is a non-parametric, instance-based (lazy learning) algorithm, meaning it does not make strong assumptions about the data and delays computations until a query is made.

How KNN Works
Training Phase:

Unlike most supervised algorithms, KNN does not explicitly learn a model during training. Instead, it just stores the training data.

Prediction Phase:

When a new data point (query) needs classification or regression, the algorithm:

Computes the distance (e.g., Euclidean, Manhattan, or Minkowski distance) between the query point and all training points.

Finds the K nearest neighbors (based on the smallest distances).

Determines the output:

For Classification: The majority class among the K neighbors is assigned to the query point (majority voting).

For Regression: The average (or weighted average) of the K neighbors' values is used as the prediction.
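The prediction steps above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a production implementation: the function names (`k_nearest`, `knn_classify`, `knn_regress`) and the toy data are made up for the example, and Euclidean distance is assumed.

```python
import math
from collections import Counter

def k_nearest(train_X, query, k):
    """Return indices of the k training points closest to query (Euclidean)."""
    distances = [(math.dist(x, query), i) for i, x in enumerate(train_X)]
    distances.sort()
    return [i for _, i in distances[:k]]

def knn_classify(train_X, train_y, query, k=3):
    """Majority vote among the k nearest neighbors."""
    neighbors = k_nearest(train_X, query, k)
    votes = Counter(train_y[i] for i in neighbors)
    return votes.most_common(1)[0][0]

def knn_regress(train_X, train_y, query, k=3):
    """Unweighted mean of the k nearest neighbors' target values."""
    neighbors = k_nearest(train_X, query, k)
    return sum(train_y[i] for i in neighbors) / k

# Toy data (hypothetical): two features per point.
X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.0), (6.5, 8.0)]
labels = ["a", "a", "a", "b", "b", "b"]
targets = [10.0, 12.0, 11.0, 50.0, 55.0, 60.0]

predicted_label = knn_classify(X, labels, (1.2, 1.4), k=3)    # "a"
predicted_value = knn_regress(X, targets, (6.4, 6.9), k=3)    # 55.0
```

Note that "training" here is nothing more than keeping `X`, `labels`, and `targets` around; all the work happens at query time.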

Key Parameters of KNN
K (Number of Neighbors):

A small K may lead to overfitting, while a large K may result in underfitting.

For binary classification, a common practice is to choose an odd K (such as 3, 5, or 7) so that the vote cannot tie. With more than two classes, ties remain possible for any K.

Distance Metric:

Euclidean Distance (default):
$$d(p, q) = \sqrt{\sum_{i} (p_i - q_i)^2}$$

Manhattan Distance:
$$d(p, q) = \sum_{i} |p_i - q_i|$$

Minkowski Distance (generalized form)
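As a rough illustration of the generalized form, the sketch below computes the Minkowski distance (sum of |p_i - q_i|^order, raised to the power 1/order) and shows that it reduces to the Manhattan and Euclidean distances. The function name and the `order` argument are chosen for this example only.

```python
def minkowski_distance(p, q, order=2):
    """Minkowski distance of the given order between equal-length points p and q."""
    if order < 1:
        raise ValueError("order must be >= 1 for a valid metric")
    return sum(abs(a - b) ** order for a, b in zip(p, q)) ** (1.0 / order)

p, q = (1.0, 2.0), (4.0, 6.0)
manhattan = minkowski_distance(p, q, order=1)   # |1-4| + |2-6| = 7
euclidean = minkowski_distance(p, q, order=2)   # sqrt(9 + 16) = 5
```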

Weighting Strategy:

Uniform weighting (each neighbor contributes equally)

Distance-weighted (closer points contribute more)
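The difference between the two weighting strategies can be seen on a tiny example. The sketch below assumes inverse-distance weights (1 / distance), which is one common choice; the helper names and neighbor list are hypothetical.

```python
from collections import defaultdict

def uniform_vote(neighbors):
    """neighbors: list of (distance, label). Every neighbor counts once."""
    counts = defaultdict(int)
    for _, label in neighbors:
        counts[label] += 1
    return max(counts, key=counts.get)

def weighted_vote(neighbors, eps=1e-12):
    """Each neighbor's vote is weighted by 1 / distance (eps avoids division by zero)."""
    scores = defaultdict(float)
    for distance, label in neighbors:
        scores[label] += 1.0 / (distance + eps)
    return max(scores, key=scores.get)

# One very close "a" neighbor and two distant "b" neighbors (hypothetical).
neighbors = [(0.1, "a"), (2.0, "b"), (2.5, "b")]
uniform_result = uniform_vote(neighbors)     # "b" wins 2 votes to 1
weighted_result = weighted_vote(neighbors)   # "a": about 10.0 vs "b": 0.5 + 0.4 = 0.9
```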

Pros and Cons
✅ Advantages:

Simple and intuitive.

Works well with small datasets.

Can handle multi-class classification.

❌ Disadvantages:

Computationally expensive for large datasets.

Sensitive to irrelevant or highly correlated features.

Performance depends on the choice of K and distance metric.


2. What is the difference between KNN Classification and KNN Regression?


Difference Between KNN Classification and KNN Regression
Feature	KNN Classification	KNN Regression
Output Type	Discrete (Class Labels)	Continuous (Numerical Values)
Prediction Strategy	Majority voting among K neighbors	Average (or weighted average) of K neighbors' values
Use Case	Used for categorical outputs (e.g., genre classification)	Used for numerical outputs (e.g., predicting track popularity)
Example	Classifying a song as Hip-hop, Pop, or R&B	Predicting a song’s popularity score based on features like duration and artist collaboration
Decision Boundary	Creates well-defined class regions	Produces smoother transitions between values
Example Scenarios
KNN Classification:

Predicting whether a song is "popular" or "not popular" based on its features.

Identifying the genre of a song based on audio characteristics.

KNN Regression:

Predicting the exact popularity score of a track.

Estimating the duration of a song based on other numerical features.


3. What is the role of the distance metric in KNN?


The distance metric in K-Nearest Neighbors (KNN) plays a crucial role in determining how the algorithm identifies the most similar data points. Since KNN relies on the proximity of data points to make predictions, the choice of distance metric directly affects classification and regression accuracy. Different metrics measure distance in unique ways, and selecting the appropriate one depends on the nature of the dataset.

The most commonly used distance metric is Euclidean distance, which calculates the straight-line distance between two points in a multi-dimensional space. It is effective when the dataset features are continuous and have similar scales. Another frequently used metric is Manhattan distance, which sums the absolute differences between feature values. This approach works well when dealing with grid-like structures or datasets where movement is restricted to one axis at a time.

Minkowski distance generalizes both of these: it has an order parameter p, and setting p = 1 gives Manhattan distance while p = 2 gives Euclidean distance. Additionally, Cosine similarity is often used when working with high-dimensional data, such as text or music recommendation systems, where the angle between vectors is more informative than their absolute distance.

The choice of distance metric can significantly impact the performance of KNN. If features are on different scales, improper metric selection can lead to biased predictions. Standardizing or normalizing data often helps mitigate these issues, ensuring that no single feature disproportionately influences the calculations.
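To make the point about angle versus absolute distance concrete, here is a small sketch with made-up vectors. It assumes nothing beyond the standard definitions of Euclidean distance and cosine similarity.

```python
import math

def euclidean(p, q):
    return math.dist(p, q)

def cosine_similarity(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)

a = (1.0, 2.0, 3.0)
b = (10.0, 20.0, 30.0)   # same direction as a, ten times the magnitude
c = (3.0, 2.0, 1.0)      # similar magnitude to a, different direction

# Euclidean distance considers c the closer point to a ...
closer_by_euclidean = "c" if euclidean(a, c) < euclidean(a, b) else "b"
# ... while cosine similarity considers b the more similar one.
closer_by_cosine = "b" if cosine_similarity(a, b) > cosine_similarity(a, c) else "c"
```

The two metrics disagree on which point is the nearest neighbor of `a`, which is why the metric should match what "similar" means for the data.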


4. What is the Curse of Dimensionality in KNN?


Curse of Dimensionality in KNN
The Curse of Dimensionality refers to the challenges that arise when working with high-dimensional data in K-Nearest Neighbors (KNN) and other machine learning algorithms. As the number of features (dimensions) increases, the performance of KNN can degrade due to the following reasons:

1. Distance Becomes Less Meaningful
In high-dimensional spaces, all data points tend to become equidistant from each other.

Since KNN relies on finding the "nearest" neighbors, if all points are almost equally distant, the algorithm struggles to distinguish relevant neighbors from irrelevant ones.

Example: In a 2D space, the difference between two points may be clear, but in a 100D space, most points appear similarly distant.
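The effect can be observed empirically. The sketch below draws uniformly random points (an assumption made purely for illustration) and measures the relative contrast (max - min) / min of the distances from a random query; the exact numbers depend on the seed, but the contrast should shrink sharply as the dimension grows.

```python
import numpy as np

def relative_contrast(n_points, n_dims, rng):
    """(max - min) / min of distances from one random query to random points."""
    points = rng.random((n_points, n_dims))
    query = rng.random(n_dims)
    distances = np.linalg.norm(points - query, axis=1)
    return (distances.max() - distances.min()) / distances.min()

rng = np.random.default_rng(0)
contrast_by_dim = {d: relative_contrast(500, d, rng) for d in (2, 10, 100, 1000)}
for d, contrast in contrast_by_dim.items():
    print(f"{d:>5} dims: relative contrast = {contrast:.2f}")
```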

2. Increased Computational Complexity
KNN must compute distances between the query point and every training point.

As the number of dimensions increases, the computational cost grows significantly.

High-dimensional datasets require more memory and time for distance calculations.

3. Sparse Data Distribution
In high dimensions, data points spread out, and meaningful clusters become harder to detect.

If data is sparse, finding K meaningful neighbors becomes difficult, leading to unreliable predictions.

4. Overfitting Risk
With many dimensions, KNN can overfit due to excessive noise and irrelevant features.

Some dimensions may contain little to no useful information, but they still contribute to distance calculations, reducing accuracy.

How to Overcome the Curse of Dimensionality in KNN?
✅ Feature Selection – Remove irrelevant or redundant features to reduce dimensionality.
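✅ Dimensionality Reduction – Apply Principal Component Analysis (PCA) to reduce the number of dimensions while preserving most of the variance. t-SNE also reduces dimensions, but it is mainly a visualization technique and does not provide a mapping for new query points, so it is less suited as a preprocessing step for KNN.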
✅ Feature Scaling – Normalize or standardize features to ensure fair distance calculations.
✅ Choose an Appropriate Distance Metric – In high-dimensional spaces, Cosine Similarity may work better than Euclidean distance.

5. How can we choose the best value of K in KNN?


How to Choose the Best Value of K in KNN?
Choosing the right K (number of neighbors) is crucial for achieving good model performance in K-Nearest Neighbors (KNN). The value of K controls the trade-off between bias and variance:

Small K (e.g., 1, 3, 5) → More flexible, low bias, high variance (risk of overfitting).

Large K (e.g., 50, 100) → More stable, high bias, low variance (risk of underfitting).

Methods to Select the Optimal K
1️⃣ Elbow Method (Using Cross-Validation)

Train KNN for different values of K.

Plot error rate (or accuracy) vs. K.

Choose K where the error starts to stabilize ("elbow point").

2️⃣ Grid Search with Cross-Validation

Use GridSearchCV (from sklearn) to test multiple K values.

Select the K with the best cross-validation accuracy.

3️⃣ Odd vs. Even K

For binary classification, choose an odd K to avoid tied votes (with more than two classes, ties are still possible).

For regression, even K values can work since averaging is used.

4️⃣ Rule of Thumb

K is often chosen as:

$$K = \sqrt{N}$$

where N is the number of training samples.

Example: Finding the Best K for a Spotify Popularity Model
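The Spotify data is not reproduced in this section, so the following is only a sketch on synthetic data generated with scikit-learn's `make_classification` as a stand-in for a "popular" vs. "not popular" label. It combines scaling and KNN in a pipeline and uses `GridSearchCV` to compare odd values of K; the feature counts and the K range are arbitrary choices for the example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for "popular" vs. "not popular" songs (not real Spotify data).
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           random_state=0)

pipeline = Pipeline([
    ("scale", StandardScaler()),          # KNN is distance-based, so scale first
    ("knn", KNeighborsClassifier()),
])

param_grid = {"knn__n_neighbors": list(range(1, 32, 2))}  # odd K from 1 to 31
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

best_k = search.best_params_["knn__n_neighbors"]
best_score = search.best_score_
print(f"best K = {best_k}, mean CV accuracy = {best_score:.3f}")
```

Plotting `search.cv_results_["mean_test_score"]` against K would give the curve used for the elbow method.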

6. What are KD Tree and Ball Tree in KNN?


KD Tree and Ball Tree in KNN
K-Nearest Neighbors (KNN) is computationally expensive because it requires calculating distances between a query point and all training points. KD Tree and Ball Tree are data structures used to speed up nearest neighbor searches, making KNN more efficient.

1️⃣ KD Tree (K-Dimensional Tree)
🔹 What is it?

A binary tree-based data structure that recursively splits data along different dimensions.

Works well when dimensions (features) are low to moderate (usually < 30).

🔹 How it Works:

Choose a splitting dimension (e.g., x, y, z for 3D data).

Select a median value along that dimension.

Recursively split data into left and right subtrees.

During query, traverse the tree to find nearest neighbors efficiently.
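The build and query steps above can be sketched as follows. This is a simplified, single-nearest-neighbor version written for illustration (library implementations use leaf buckets and return k neighbors); the `Node` tuple and function names are hypothetical.

```python
import math
from collections import namedtuple

Node = namedtuple("Node", ["point", "axis", "left", "right"])

def build_kd_tree(points, depth=0):
    """Recursively split on the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], axis,
                build_kd_tree(points[:mid], depth + 1),
                build_kd_tree(points[mid + 1:], depth + 1))

def nearest(node, query, best=None):
    """Return (distance, point) of the nearest neighbor of query."""
    if node is None:
        return best
    dist = math.dist(node.point, query)
    if best is None or dist < best[0]:
        best = (dist, node.point)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)
    # Only search the far side if the splitting plane is closer than the best match.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kd_tree(points)
query = (9, 2)
kd_result = nearest(tree, query)                                  # nearest point is (8, 1)
brute_result = min((math.dist(p, query), p) for p in points)      # same answer, all points checked
```

The pruning test on `abs(diff)` is what saves work: whole subtrees are skipped when they cannot contain a closer point.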

🔹 Best For:
✅ Low to moderate-dimensional data.
✅ Faster than brute-force search when dimensions are ≤ 30.

🔹 Limitations:
❌ Inefficient in high-dimensional spaces (curse of dimensionality).

2️⃣ Ball Tree
🔹 What is it?

A hierarchical clustering structure that groups points into hyperspherical regions (“balls”).

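Usually copes better than a KD Tree as dimensionality grows (roughly beyond 30 features, as a rule of thumb), although it too approaches brute-force cost in very high dimensions.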

🔹 How it Works:

Partition data into a tree of nested balls (spheres) based on distances.

Use triangle inequality to prune unnecessary distance computations.

When searching for neighbors, avoid checking far-away regions.
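The pruning step relies on a simple bound: by the triangle inequality, no point inside a ball can be closer to the query than (distance to the ball's center) minus (the ball's radius). The sketch below checks that bound for a single ball; the centroid-based ball, the helper names, and the data are assumptions made for the example, not how any particular library builds its tree.

```python
import math

def make_ball(points):
    """Centroid of the points and the radius needed to cover all of them."""
    dims = len(points[0])
    center = tuple(sum(p[d] for p in points) / len(points) for d in range(dims))
    radius = max(math.dist(center, p) for p in points)
    return center, radius

def min_possible_distance(query, center, radius):
    """Triangle-inequality lower bound on the distance from query to any point in the ball."""
    return max(0.0, math.dist(query, center) - radius)

cluster = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.0), (9.5, 10.2)]
center, radius = make_ball(cluster)

query = (1.0, 1.0)
best_so_far = 2.0   # distance to the closest neighbor found elsewhere (hypothetical)

lower_bound = min_possible_distance(query, center, radius)
can_skip_ball = lower_bound > best_so_far          # True: no point here can beat 2.0
actual_min = min(math.dist(query, p) for p in cluster)
```

Because `lower_bound` never exceeds `actual_min`, skipping the ball cannot miss a true nearest neighbor.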

🔹 Best For:
✅ High-dimensional data (better than KD Tree for >30 dimensions).
✅ Works well with non-axis-aligned data distributions.

🔹 Limitations:
❌ More complex to construct than KD Tree.
❌ Slower for very low-dimensional data.

Which One to Use in KNN?
Scenario	KD Tree	Ball Tree
Low-dimensional data (<30)	✅ Best choice	❌ Not optimal
High-dimensional data (>30)	❌ Becomes inefficient	✅ Works better
Large datasets	✅ Faster than brute-force	✅ Works well
Non-axis-aligned data	❌ May not work well	✅ Handles better
👉 In Scikit-Learn (sklearn.neighbors.KNeighborsClassifier and KNeighborsRegressor), the algorithm parameter selects the search method:

"auto" → Attempts to pick the most appropriate method (KD Tree, Ball Tree, or brute force) based on the data passed to fit.

"kd_tree" → Uses KD Tree.

"ball_tree" → Uses Ball Tree.

"brute" → Uses brute-force search.

7. When should you use KD Tree vs. Ball Tree?


When Should You Use KD Tree vs. Ball Tree?
Choosing between KD Tree and Ball Tree depends on the dataset's dimensionality, size, and structure. Here's a comparison to help decide:

Criteria	KD Tree ✅	Ball Tree ✅
Dimensionality	Works best for low to moderate dimensions (≤30 features).	Better for high-dimensional data (>30 features).
Data Distribution	Best for axis-aligned data (features with clear split boundaries).	Works well for non-axis-aligned data (e.g., spherical clusters).
Computational Speed	Faster than Ball Tree when dimensions are low.	Faster than KD Tree in high dimensions.
Memory Usage	Less memory-intensive than Ball Tree.	Requires more memory due to hierarchical clustering.
Training Time	Quick to construct.	Takes longer to construct but can be efficient for queries.
Query Efficiency	Slows down significantly as dimensions increase.	More efficient for nearest neighbor search in high-dimensional space.
Example Use Case	Finding similar songs based on tempo & duration (few numeric features).	Finding similar songs based on complex audio features (many dimensions).
If dimensions ≤ 30 → Use KD Tree

If dimensions > 30 → Use Ball Tree

If unsure → Use "auto" in Scikit-Learn, which picks the best one automatically.
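A quick way to see that the search structure affects speed rather than results is to fit the same classifier with each `algorithm` setting. The sketch below uses synthetic data; since all four options perform exact search, their predictions should agree (barring exact distance ties).

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 4))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)  # synthetic labels
X_test = rng.normal(size=(50, 4))

predictions = {}
for algorithm in ("auto", "kd_tree", "ball_tree", "brute"):
    model = KNeighborsClassifier(n_neighbors=5, algorithm=algorithm)
    model.fit(X_train, y_train)
    predictions[algorithm] = model.predict(X_test)

# Compare every variant against the brute-force baseline.
all_agree = all(np.array_equal(predictions["brute"], p) for p in predictions.values())
```

Timing each variant on your own data (for example with `time.perf_counter`) is the most reliable way to choose between them.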


8. What are the disadvantages of KNN?


Disadvantages of K-Nearest Neighbors (KNN)
Despite its simplicity, KNN has several drawbacks that can impact performance, especially on large or high-dimensional datasets. Here are the key disadvantages:

1️⃣ Computational Cost (Slow on Large Datasets)
KNN is a lazy learner, meaning it doesn’t learn a model during training. Instead, it stores the entire dataset and performs computations at prediction time.

Time Complexity:

Brute-force search: O(N × D) per query (where N = number of training samples, D = number of dimensions).

KD Tree / Ball Tree can speed it up, but they degrade in high dimensions.

✅ Solution: Use KD Tree / Ball Tree for speedup or approximate nearest neighbors (ANN) techniques.

2️⃣ High Memory Usage
Since KNN stores all training data, it requires large memory when dealing with big datasets.

Problem: If there are millions of data points, KNN becomes impractical.

✅ Solution: Reduce dataset size using feature selection or dimensionality reduction (PCA, t-SNE).

3️⃣ Curse of Dimensionality
As the number of features (dimensions) increases, distances become less meaningful, and all points appear similar.

KD Tree and Ball Tree also lose most of their speed advantage over brute-force search as D grows (KD Tree typically first, Ball Tree later).

✅ Solution: Use dimensionality reduction techniques (PCA, LDA) or switch to distance metrics like Cosine Similarity.

4️⃣ Sensitive to Noisy and Irrelevant Features
KNN treats all features equally, even if some are irrelevant or noisy.

Example: If predicting song popularity, "track duration" may be more important than "track ID", but KNN doesn’t know that.

✅ Solution: Apply feature selection (e.g., Mutual Information, Recursive Feature Elimination).

5️⃣ Imbalanced Data Issues
If one class is much more frequent than others, KNN may predict the majority class most of the time.

Example: If 90% of the songs in a dataset are "popular" and only 10% are "not popular," KNN will favor the majority class.

✅ Solution: Use weighted KNN, where closer neighbors have higher influence.

6️⃣ Difficult to Choose the Best K
Too small K → Overfitting (high variance, sensitive to noise).

Too large K → Underfitting (high bias, loses local patterns).

Finding the best K often requires trial and error or cross-validation.

✅ Solution: Use the Elbow Method or GridSearchCV to find the optimal K.

📌 Summary of Disadvantages & Solutions
Disadvantage	Solution
Slow on large datasets	KD Tree, Ball Tree, Approximate Nearest Neighbors (ANN)
High memory usage	Feature selection, dimensionality reduction
Curse of dimensionality	PCA, LDA, t-SNE, Cosine Similarity
Sensitive to noise & irrelevant features	Feature scaling, feature selection
Imbalanced data	Weighted KNN, SMOTE (oversampling)
Choosing K is tricky	Cross-validation, Elbow Method


9. How does feature scaling affect KNN?


How Does Feature Scaling Affect KNN?
Feature scaling is critical in K-Nearest Neighbors (KNN) because KNN is a distance-based algorithm. If features have different scales, the distance metric (e.g., Euclidean distance) will be dominated by larger-scale features, leading to biased predictions.

1️⃣ Why is Feature Scaling Important for KNN?
KNN calculates distances between points (e.g., Euclidean, Manhattan).

Unscaled features distort distance measurements, making certain features dominate.

Example:

Suppose a dataset has track duration (seconds) ranging from 100 to 500 and popularity score from 0 to 100.

Since track duration values are much larger, they will overshadow the influence of popularity in distance calculations.

KNN will end up treating "track duration" as more important than "popularity."

✅ Solution: Normalize or standardize features before applying KNN.
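The following sketch shows the nearest neighbor changing once features are scaled. The two songs and the query are invented for the example, and the feature ranges (100 to 500 seconds, 0 to 100 popularity) are taken from the description above.

```python
import math

def nearest_index(points, query):
    """Index of the point with the smallest Euclidean distance to query."""
    return min(range(len(points)), key=lambda i: math.dist(points[i], query))

def min_max_scale(point, lows, highs):
    """Scale each feature of a point to [0, 1] using known feature ranges."""
    return tuple((v - lo) / (hi - lo) for v, lo, hi in zip(point, lows, highs))

# (duration in seconds, popularity score); hypothetical songs.
songs = [(205.0, 60.0), (240.0, 89.0)]
query = (200.0, 90.0)
lows, highs = (100.0, 0.0), (500.0, 100.0)

# Raw units: song 0 wins because its duration is only 5 s away.
raw_nearest = nearest_index(songs, query)
# Scaled units: song 1 wins because its popularity is almost identical.
scaled_nearest = nearest_index([min_max_scale(s, lows, highs) for s in songs],
                               min_max_scale(query, lows, highs))
```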

2️⃣ Common Feature Scaling Methods for KNN
🔹 (1) Min-Max Scaling (Normalization)
Formula:

$$X_{\text{scaled}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$

Scales features between 0 and 1.

Best when features have different ranges but no extreme outliers.

Example (before and after scaling):

Track Duration (sec)	Popularity Score	Normalized Duration	Normalized Popularity
200	50	0.25	0.50
300	70	0.50	0.70
500	90	1.00	0.90
📌 Use when: ✔️ Features have different scales.
✔️ Data does not have extreme outliers.
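A minimal sketch of the min-max formula, assuming the feature ranges are known in advance (100 to 500 seconds for duration and 0 to 100 for popularity) rather than estimated from the three rows:

```python
def min_max_scale(values, lower, upper):
    """Map values from [lower, upper] to [0, 1]."""
    return [(v - lower) / (upper - lower) for v in values]

durations = [200, 300, 500]       # seconds; feature range assumed to be 100-500
popularity = [50, 70, 90]         # score; feature range assumed to be 0-100

scaled_durations = min_max_scale(durations, lower=100, upper=500)   # [0.25, 0.5, 1.0]
scaled_popularity = min_max_scale(popularity, lower=0, upper=100)   # [0.5, 0.7, 0.9]
```

In practice the minimum and maximum are usually estimated from the training set only and then reused for test data.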

🔹 (2) Standardization (Z-score Normalization)
Formula:

$$X_{\text{scaled}} = \frac{X - \mu}{\sigma}$$

Mean-centered with unit variance (mean = 0, std = 1).

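Helps when features have different units. It is less distorted by outliers than min-max scaling, although the mean and standard deviation are still pulled by extreme values.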

Example (z-scores computed over these three rows, using the population standard deviation):

Track Duration (sec)	Popularity Score	Standardized Duration	Standardized Popularity
200	50	-1.07	-1.22
300	70	-0.27	0.00
500	90	1.34	1.22
📌 Use when: ✔️ Data has outliers.
✔️ Features have different distributions.
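The z-score formula can be checked by hand with the standard library. This sketch treats the three durations (200, 300, 500) and the three popularity scores (50, 70, 90) as the whole sample and uses the population standard deviation, which is also the convention scikit-learn's `StandardScaler` follows.

```python
import statistics

def standardize(values):
    """Z-scores using the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

durations = [200, 300, 500]
popularity = [50, 70, 90]

z_durations = [round(z, 2) for z in standardize(durations)]     # [-1.07, -0.27, 1.34]
z_popularity = [round(z, 2) for z in standardize(popularity)]   # [-1.22, 0.0, 1.22]
```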

3️⃣ Impact of Feature Scaling on KNN Performance
Before Scaling → Distances are skewed by large-scale features.

After Scaling → All features contribute equally to distance calculations.

🔹 Without Scaling → Poor classification & wrong neighbors chosen.
🔹 With Scaling → Improved accuracy & better nearest neighbors.

📌 Key Takeaways
Aspect	Min-Max Scaling	Standardization (Z-score)
Scales Between	0 to 1	Mean = 0, Std = 1
Handles Outliers?	❌ No	⚠️ Partly (less sensitive than min-max, but extreme values still shift the mean and standard deviation)
Best For	Features with similar distributions	Features with different distributions & outliers
Used in KNN?	✅ Yes	✅ Yes

10. What is PCA (Principal Component Analysis)?


What is PCA (Principal Component Analysis)?
Principal Component Analysis (PCA) is a dimensionality reduction technique used in machine learning and data analysis. It transforms high-dimensional data into a lower-dimensional space while preserving as much variance as possible.

1️⃣ Why Use PCA?
🔹 Curse of Dimensionality → Too many features can slow down algorithms like KNN and lead to overfitting.
🔹 Feature Correlation → PCA removes redundancy by combining highly correlated features.
🔹 Visualization → Reduces complex datasets to 2D or 3D for easy visualization.
🔹 Speeds Up Computation → Reducing features improves efficiency in distance-based algorithms like KNN.

✅ Example in Your Spotify Dataset: If you have 30 audio features per song, PCA can reduce them to 5-10 key features without losing much information.

2️⃣ How Does PCA Work?
PCA finds new axes (principal components) that maximize variance while minimizing information loss.

📌 Step-by-Step Process
Standardize the Data (mean = 0, variance = 1).

Compute the Covariance Matrix → Measures how features are related.

Find Eigenvalues & Eigenvectors → Identify the most important directions in the data.

Select Top K Principal Components → Choose the top components that explain most of the variance.

Transform Data → Project original data onto the new lower-dimensional space.

3️⃣ Example: PCA on Spotify Song Features
Imagine you have these features:
🎵 Danceability, Tempo, Energy, Loudness, Duration, Popularity

Before PCA (Original Features - 6D)
Song	Danceability	Tempo	Energy	Loudness	Duration	Popularity
A	0.8	120	0.9	-5.0	210	85
B	0.6	100	0.7	-7.2	180	70
After PCA (Reduced to 2D)
Song	PC1 (Main Trend)	PC2 (Secondary Trend)
A	1.45	-0.30
B	0.92	0.12
👉 PCA captures most of the variation in just two features instead of six! 🚀

4️⃣ When to Use PCA in KNN?
✅ Before KNN if dataset has many features (e.g., 30+ features in Spotify data).
✅ If features are highly correlated (PCA removes redundancy).
✅ When reducing dimensions improves model speed without sacrificing accuracy.


11. How does PCA work?


How Does PCA (Principal Component Analysis) Work?
PCA works by transforming a high-dimensional dataset into a lower-dimensional space while preserving as much variance as possible. It does this by identifying new principal components (PCs), which are linear combinations of the original features.

📌 Step-by-Step Process of PCA
Let's break PCA down into five key steps:

1️⃣ Standardize the Data
Since PCA is affected by scale, we first normalize or standardize the dataset to ensure all features contribute equally.

$$X_{\text{scaled}} = \frac{X - \mu}{\sigma}$$

👉 This ensures each feature has mean = 0 and variance = 1.

2️⃣ Compute the Covariance Matrix
The covariance matrix captures relationships between different features.

If two features are highly correlated, PCA will combine them into one principal component.

$$\Sigma = \frac{1}{n} X^{T} X$$
where X is the standardized (mean-centered) data matrix from step 1 and n is the number of samples.
👉 This helps us find the directions where the data varies the most.

3️⃣ Compute Eigenvalues & Eigenvectors
Eigenvectors represent the direction of the new feature space (principal components).

Eigenvalues represent the importance of each principal component (variance captured).

🔹 We solve this equation:

$$\Sigma v = \lambda v$$
where:

v = Eigenvector (Principal Component Direction)

λ = Eigenvalue (Amount of Variance Captured)

👉 Larger eigenvalues = More important principal components.

4️⃣ Select the Top K Principal Components
Rank eigenvalues from highest to lowest.

Choose the top K components that explain the most variance.

The proportion of variance explained (PVE) by each component is:

$$\mathrm{PVE}_i = \frac{\lambda_i}{\sum_{j} \lambda_j}$$

🔹 Elbow Rule → Choose K where cumulative variance stabilizes (e.g., 95% variance).

5️⃣ Transform the Data into the New Feature Space
The dataset is projected onto the selected principal components using matrix multiplication:

$$X_{\text{PCA}} = X \cdot V_K$$
where:

X = Original data (standardized in step 1)

V_K = Matrix of top K eigenvectors

X_PCA = Transformed lower-dimensional data

🔹 Now, we have a dataset with K principal components instead of original features.
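The five steps can be sketched with NumPy as follows. This is an illustrative implementation under the conventions used above (standardize first, 1/n covariance); the function name `pca` and the synthetic six-feature data are made up for the example, and library implementations typically use an SVD instead of an explicit covariance matrix.

```python
import numpy as np

def pca(X, n_components):
    """PCA via eigendecomposition of the covariance matrix of standardized data."""
    # 1. Standardize each feature to mean 0 and variance 1.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # 2. Covariance matrix, using the 1/n convention.
    cov = (X_std.T @ X_std) / X_std.shape[0]
    # 3. Eigendecomposition; eigh handles symmetric matrices, ascending eigenvalues.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # 4. Sort descending and keep the top components.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    V_k = eigenvectors[:, :n_components]
    # 5. Project the standardized data onto the selected components.
    explained_ratio = eigenvalues / eigenvalues.sum()
    return X_std @ V_k, explained_ratio

# Synthetic data: six correlated features driven by two hidden factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 0.8, 0.6, 0.0, 0.2, -0.5],
                   [0.0, 0.3, -0.4, 1.0, 0.9, 0.7]])
X = latent @ mixing + 0.05 * rng.normal(size=(200, 6))

X_pca, explained_ratio = pca(X, n_components=2)
print(X_pca.shape, explained_ratio.round(3))
```

Because the six features were generated from two factors, the first two components should account for nearly all of the variance here.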

🎵 Example: PCA on a Spotify Dataset
Original Features (6D)
Song	Danceability	Tempo	Energy	Loudness	Duration	Popularity
A	0.8	120	0.9	-5.0	210	85
B	0.6	100	0.7	-7.2	180	70
After PCA (Reduced to 2D)
Song	PC1 (Main Trend)	PC2 (Secondary Trend)
A	1.45	-0.30
B	0.92	0.12
👉 PCA reduced 6 features to just 2, preserving most of the variation. 🚀

📌 When to Use PCA?
✅ High-dimensional datasets (30+ features).
✅ Features are correlated (PCA removes redundancy).
✅ Need to speed up models (like KNN, SVM).
✅ Want to visualize data in 2D/3D.


12. What is the geometric intuition behind PCA?


Geometric Intuition Behind PCA
PCA can be understood geometrically as finding the best lower-dimensional space that captures the most variance in the data. Instead of looking at individual features, PCA finds new axes (principal components) that best describe the data.

📌 Key Geometric Concepts of PCA
1️⃣ PCA Finds New Coordinate Axes
Imagine a cloud of data points in a high-dimensional space.

PCA rotates the coordinate system to find new axes that better capture the spread of data.

These new axes are the principal components (PCs).

🔹 Example (2D to 1D Reduction)

Given 2D data (e.g., Tempo & Energy of songs), PCA finds a new 1D axis that best represents the data.

The first principal component (PC1) is along the direction of maximum variance.

The second principal component (PC2) is perpendicular to PC1 and captures the remaining variance.

📌 Think of PCA as finding a tilted coordinate system that better fits the data.
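The "tilted coordinate system" can be checked numerically. The sketch below generates a synthetic 2D cloud stretched along the diagonal (standing in for two correlated features such as Tempo and Energy) and confirms that PC1 points along that diagonal and PC2 is perpendicular to it.

```python
import numpy as np

rng = np.random.default_rng(1)
t = rng.normal(size=500)
# 2D cloud stretched along the diagonal y = x, with small scatter across it.
data = np.column_stack([t + 0.1 * rng.normal(size=500),
                        t + 0.1 * rng.normal(size=500)])
centered = data - data.mean(axis=0)

cov = np.cov(centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)     # ascending eigenvalues
pc2, pc1 = eigenvectors[:, 0], eigenvectors[:, 1]   # so the last column is PC1

rotated = centered @ np.column_stack([pc1, pc2])    # coordinates in the PC axes
print("PC1 direction:", pc1.round(3))               # close to +/-(0.707, 0.707)
print("variance along PC1, PC2:", rotated.var(axis=0).round(3))
```

Dropping the second column of `rotated` is the 2D to 1D reduction described above.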

2️⃣ PCA Finds the "Best-Fit Line" (Higher-Dimensional Analogy)
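In 2D, PCA finds the best-fit line that minimizes the sum of squared perpendicular distances from the points to the line (unlike linear regression, which minimizes vertical distances).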

In 3D, PCA finds the best-fit plane.

In higher dimensions, PCA finds the best-fit hyperplane that captures most of the variance.

🔹 Analogy:

If you have a sheet of paper (2D plane) inside a 3D space, PCA finds the best way to flatten your data onto that sheet while keeping as much information as possible.

3️⃣ Principal Components = Eigenvectors of the Covariance Matrix
PCA finds eigenvectors of the covariance matrix, which represent the directions of maximum variance.

Eigenvalues tell us how much variance is captured by each eigenvector.

🔹 Think of Eigenvectors as "axes" and Eigenvalues as "importance" of those axes.

PC1 (First Principal Component) → Points in the direction of most variance.

PC2 (Second Principal Component) → Perpendicular to PC1, capturing less variance.

Higher PCs capture even less variance and can be discarded for dimensionality reduction.

📌 In essence, PCA "reorients" the data along the directions where it varies the most.

🎵 Example: PCA on a Spotify Dataset
Original Data (High Dimensional)
Imagine plotting songs based on Tempo, Energy, and Loudness (3D space).

Some songs cluster along a specific direction.

PCA finds a new 2D plane (PC1 & PC2) that captures most of the variation.

Instead of using 3D data, we now describe songs in a 2D plane, losing minimal information.

After PCA (Reduced Dimensions)
Song	PC1 (Main Trend)	PC2 (Secondary Trend)
A	1.45	-0.30
B	0.92	0.12

📌 Summary of Geometric Intuition
Concept	Geometric Meaning
Principal Components	New rotated axes capturing maximum variance
Dimensionality Reduction	Finding the best-fit lower-dimensional subspace
Eigenvectors	Directions (axes) of maximum variance
Eigenvalues	Amount of variance captured by each principal component
Flattening Data	Reducing dimensionality while preserving patterns


13. What is the difference between Feature Selection and Feature Extraction?


Feature Selection vs. Feature Extraction
Both Feature Selection and Feature Extraction aim to reduce dimensionality, but they do so in different ways.

1️⃣ Feature Selection (Choosing the Best Features)
Feature selection involves keeping a subset of the original features while discarding irrelevant or redundant ones.
👉 No transformation is applied—we simply select the most useful features.

🔹 Methods of Feature Selection:
✅ Filter Methods (Statistical tests) → Use correlation, chi-square, or mutual information to rank and select features.
✅ Wrapper Methods (Model-based) → Train models with different feature subsets and select the best-performing set (e.g., Recursive Feature Elimination).
✅ Embedded Methods (Built-in model selection) → Models like Lasso Regression automatically remove irrelevant features.

📌 Example in Your Spotify Dataset

Suppose you have 30 audio features (e.g., Tempo, Danceability, Loudness, Popularity).

Using feature selection, you might find that only 10 are relevant for predicting song popularity and drop the rest.

2️⃣ Feature Extraction (Creating New Features)
Feature extraction transforms the original features into a new set of features that better represent the data.
👉 Instead of selecting from existing features, we create new features that capture the most important information.

🔹 Methods of Feature Extraction:
✅ Principal Component Analysis (PCA) → Creates new components that combine multiple features while preserving variance.
✅ t-SNE, UMAP → Non-linear techniques for reducing high-dimensional data into 2D/3D.
✅ Autoencoders (Deep Learning) → Learn compressed feature representations automatically.

📌 Example in Your Spotify Dataset

Instead of selecting features like Tempo and Danceability, PCA might create a new feature PC1 that represents a mix of both.

This new feature captures most of the variance, reducing redundancy.

📌 Key Differences:
Aspect	Feature Selection	Feature Extraction
Definition	Selects important features from the original dataset	Creates new features from existing ones
Approach	Keeps original features, removes irrelevant ones	Transforms data into a new feature space
Examples	Filter Methods, Wrapper Methods, Embedded Methods	PCA, t-SNE, Autoencoders
Information Loss?	Minimal if done well	Possible, but aims to retain key variance
Used When	Some features are irrelevant or redundant	Features are correlated, and we need a lower-dimensional representation
💡 When to Use Each?
✅ Use Feature Selection if you have many features but only a few are useful (e.g., some audio features may be irrelevant for popularity prediction).
✅ Use Feature Extraction if features are highly correlated or redundant (e.g., PCA can reduce 30 correlated audio features into 5 meaningful ones).
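The contrast can be seen side by side on synthetic data. In this sketch the feature names, the "popularity" target, and the correlation-based filter are all assumptions chosen for illustration: selection keeps two of the original columns untouched, while extraction (PCA via SVD) builds two new columns that mix all three.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
tempo = rng.normal(size=n)
energy = 0.9 * tempo + 0.3 * rng.normal(size=n)      # correlated with tempo
noise = rng.normal(size=n)                           # unrelated to the target
X = np.column_stack([tempo, energy, noise])
y = 2.0 * tempo + energy + 0.1 * rng.normal(size=n)  # hypothetical "popularity"
feature_names = ["tempo", "energy", "noise"]

# Feature selection (filter method): keep the 2 features most correlated with y.
correlations = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
keep = sorted(np.argsort(correlations)[-2:])
X_selected = X[:, keep]                       # original columns, unchanged
selected_names = [feature_names[j] for j in keep]

# Feature extraction (PCA): build 2 new features as linear combinations of all 3.
X_centered = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
X_extracted = X_centered @ Vt[:2].T           # new columns, mixtures of the originals
```

`X_selected` stays interpretable (its columns are still tempo and energy), whereas the columns of `X_extracted` no longer correspond to any single original feature.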

14. What are Eigenvalues and Eigenvectors in PCA?


Eigenvalues and Eigenvectors in PCA
Eigenvalues and eigenvectors are fundamental to Principal Component Analysis (PCA) because they help find the principal components—the new feature axes that capture the most variance in the data.

📌 What Are Eigenvalues and Eigenvectors?
1️⃣ Eigenvectors (Direction of Data Spread)
Eigenvectors define the new axes (principal components) where the data varies the most.

They are conventionally normalized to unit length, so they encode direction only.

Each eigenvector is a linear combination of the original features.

2️⃣ Eigenvalues (Importance of Each Eigenvector)
Eigenvalues tell us how much variance (or information) each eigenvector captures.

Larger eigenvalues = More important principal component.

The sum of all eigenvalues gives the total variance in the data.

📌 Think of eigenvectors as "directions" and eigenvalues as "importance" of those directions.

📌 How Do Eigenvalues & Eigenvectors Work in PCA?
PCA transforms the data by:

Computing the Covariance Matrix ($\Sigma$) → Measures relationships between features.

Solving for Eigenvalues & Eigenvectors → Using the equation:

$$\Sigma v = \lambda v$$
where:

v = Eigenvector (new axis direction)

λ = Eigenvalue (variance along that direction)

Sorting Eigenvalues in Descending Order → The eigenvector with the largest eigenvalue becomes the first principal component (PC1).

Selecting the Top K Principal Components → Keep the eigenvectors that capture most of the variance.
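These properties are easy to verify numerically. The sketch below uses a made-up 3x3 covariance matrix and NumPy's `np.linalg.eigh` (intended for symmetric matrices) to confirm that the eigenvector equation holds, that eigenvectors have unit length, and that the eigenvalues sum to the total variance.

```python
import numpy as np

# Hypothetical covariance matrix of three standardized features.
cov = np.array([[1.0, 0.8, 0.3],
                [0.8, 1.0, 0.2],
                [0.3, 0.2, 1.0]])

eigenvalues, eigenvectors = np.linalg.eigh(cov)   # ascending order for symmetric input
order = np.argsort(eigenvalues)[::-1]             # re-sort so PC1 comes first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

pc1, lambda1 = eigenvectors[:, 0], eigenvalues[0]
residual = cov @ pc1 - lambda1 * pc1              # ~0 if cov @ v equals lambda * v
total_variance = np.trace(cov)                    # equals the sum of the eigenvalues
```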

🎵 Example: PCA on a Spotify Dataset
Original Features (Before PCA)
Song	Danceability	Tempo	Energy
A	0.8	120	0.9
B	0.6	100	0.7
C	0.9	130	1.0
Eigenvectors & Eigenvalues from PCA (illustrative values, not computed from the three songs above)
Principal Component	Eigenvector (Direction)	Eigenvalue (Variance Captured)
PC1 (Main Trend)	(0.5, 0.7, 0.5)	3.2 (Most Important)
PC2 (Secondary Trend)	(-0.6, 0.2, 0.8)	1.1
PC3 (Least Important)	(0.7, -0.7, 0.2)	0.3
👉 PC1 captures the most variance, so we might reduce the dataset to just PC1 & PC2, removing PC3.

📌 Summary of Eigenvalues & Eigenvectors in PCA
Concept	Meaning in PCA
Eigenvectors	New axes (principal components) along which data is projected
Eigenvalues	Amount of variance captured by each eigenvector
Larger Eigenvalue	More important principal component (captures more variance)
Sorting Eigenvalues	Helps select the most informative components
Dimensionality Reduction	Keep top K principal components (largest eigenvalues)


15. How do you decide the number of components to keep in PCA?


How to Decide the Number of Components to Keep in PCA?
When using PCA, we need to decide how many principal components (PCs) to keep while retaining most of the important information. The goal is to reduce dimensionality while minimizing information loss.

📌 Methods for Choosing the Optimal Number of Components
1️⃣ Explained Variance (Cumulative Variance) – The "95% Rule"
Each principal component captures a certain amount of variance (information).

We compute the cumulative variance and keep the top K components that capture at least 95% of the total variance.

✅ Steps:

Compute the variance explained by each principal component:

$$\mathrm{PVE}_i = \frac{\lambda_i}{\sum_{j} \lambda_j}$$

Sum the top K principal components until reaching ≥ 95% variance.

📌 Example (Variance Explained by Each PC):

PC	Eigenvalue	% Variance Explained	Cumulative Variance
PC1	4.4	55%	55%
PC2	2.4	30%	85%
PC3	0.8	10%	95% ✅
PC4	0.4	5%	100%
👉 Here, PC1 + PC2 + PC3 explain 95% of the variance, so we keep 3 components and drop PC4.
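The cumulative-variance rule is a short calculation. The sketch below uses a separate set of hypothetical eigenvalues, chosen only to illustrate the arithmetic, and returns the smallest K that reaches the threshold.

```python
from itertools import accumulate

def components_for_variance(eigenvalues, threshold=0.95):
    """Smallest K whose top-K eigenvalues explain at least `threshold` of the variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = list(accumulate(v / total for v in ordered))
    for k, share in enumerate(cumulative, start=1):
        if share >= threshold:
            return k, cumulative
    return len(ordered), cumulative

# Hypothetical eigenvalues: shares of 50%, 25%, 15%, 6%, 4%.
eigenvalues = [5.0, 2.5, 1.5, 0.6, 0.4]
k, cumulative = components_for_variance(eigenvalues, threshold=0.95)
# Cumulative shares are roughly 0.50, 0.75, 0.90, 0.96, 1.00, so k is 4.
```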

2️⃣ Scree Plot (Elbow Method)
A Scree Plot shows the variance explained by each principal component.

Look for the "elbow point", where the variance gain starts flattening.

✅ Steps:

Plot PC number (X-axis) vs. Explained Variance (%) (Y-axis).

Identify the elbow point, where adding more PCs gives diminishing returns.

📌 Example Scree Plot:
📉 Sharp drop → Elbow → Small gains after that

Variance (%)
  |
  |  *
  |  *  *
  |  *  *  *
  |  *  *  *  *
  |-----------------
     PC1 PC2 PC3 PC4 ...
👉 The elbow is at PC3, so we keep 3 components.

3️⃣ Cross-Validation (Performance-Based)
If PCA is used before a machine learning model (e.g., KNN, regression), test different values of K (number of components) using cross-validation to find the best balance between accuracy and efficiency.

✅ Steps:

Train a model using different K values.

Compare performance (accuracy, RMSE, etc.).

Choose the smallest K that gives the best performance.

🎵 Example: Choosing K for Your Spotify Dataset
Imagine we apply PCA to 30 audio features (Danceability, Tempo, Energy, etc.).

The Explained Variance shows 95% variance at K = 5 PCs.

The Scree Plot has an elbow at PC5.

A KNN model performs best when using 5 principal components.

📌 Final Decision: Keep 5 PCs instead of 30 original features! 🚀

📌 Summary: How to Choose the Number of Components?
Method	What It Does	When to Use
Explained Variance (95% Rule)	Keep components that explain 95% of variance	Quick, standard approach
Scree Plot (Elbow Method)	Look for the point where variance gain flattens	Visual, intuitive
Cross-Validation	Test performance of different K values	If using PCA before ML models

16. Can PCA be used for classification?


Can PCA Be Used for Classification?
Yes, PCA can be used in a classification pipeline, but not directly as a classifier. Instead, PCA helps by reducing dimensionality and improving model efficiency before applying classification algorithms.

📌 How PCA Helps in Classification?
1️⃣ Reduces Dimensionality → Helps classifiers perform better by removing redundant features.
2️⃣ Removes Noise & Correlation → Eliminates irrelevant variations in data.
3️⃣ Speeds Up Computation → Reducing features makes training classifiers faster.
4️⃣ Improves Visualization → PCA can project high-dimensional data into 2D or 3D for better insights.

🎵 Example: PCA + Classification in Your Spotify Dataset
Imagine you want to classify songs as “Hit” or “Flop” based on audio features (e.g., Tempo, Energy, Loudness).

✅ Without PCA → 30 raw features → High complexity, risk of overfitting
✅ With PCA → Reduce to 5 PCs → Faster, and often better generalization

📌 Steps:

Apply PCA to reduce 30 features → Keep top K components.

Train a classifier (e.g., KNN, SVM, Random Forest) using the reduced dataset.

Evaluate model performance → Compare accuracy before and after PCA.

📌 When Should You Use PCA in Classification?
✅ Use PCA if:

The dataset has many correlated features.

You need to speed up model training.

You want to visualize high-dimensional data (e.g., PCA to 2D before classification).

❌ Avoid PCA if:

Features are already optimized and PCA removes useful information.

You need interpretability (PCA transforms features, making them hard to interpret).

📌 Summary: PCA for Classification
Aspect	Details
Direct Classifier?	❌ No, PCA is not a classification algorithm
Used for?	✅ Feature reduction before classification
Helps With?	✅ Speed, visualization, and sometimes accuracy
Best For?	✅ High-dimensional, correlated datasets
Common Classifiers	✅ KNN, SVM, Logistic Regression, Random Forest

17. What are the limitations of PCA?


Limitations of PCA (Principal Component Analysis)
While PCA is a powerful technique for dimensionality reduction, it has several limitations that can affect its effectiveness.

📌 1️⃣ PCA Assumes Linearity
PCA assumes that the data has linear relationships between features.

If the structure in the data is nonlinear, PCA may not be effective.

❌ Example: If audio features in your Spotify dataset have complex, nonlinear interactions, PCA might not capture them well.

✅ Alternative: Use t-SNE, UMAP, or Autoencoders for nonlinear data.

📌 2️⃣ Loss of Interpretability
PCA transforms original features into new principal components, making them harder to interpret.

❌ Example: Instead of "Danceability" and "Tempo," you get PC1, PC2, etc., which don’t have direct meanings.

✅ Solution: Analyze feature contributions (e.g., loadings) to understand what each PC represents.
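
As a rough sketch of inspecting feature contributions, the rows of scikit-learn's `PCA.components_` give the weight of each original feature in each component (these unit-length weight vectors are often loosely called loadings). The Iris dataset stands in for the Spotify data here, which is an assumption for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_scaled = StandardScaler().fit_transform(iris.data)

pca = PCA(n_components=2).fit(X_scaled)

# Each row of components_ is one principal component;
# each column is the weight of one original feature in that component.
for i, component in enumerate(pca.components_, start=1):
    print(f"PC{i}:")
    for name, weight in zip(iris.feature_names, component):
        print(f"  {name:<20} {weight:+.3f}")
```

Features with large absolute weights contribute most to a component, which gives a hint about what that component represents.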

📌 3️⃣ Sensitive to Feature Scaling
PCA is affected by differences in scale—features with larger magnitudes dominate the principal components.

❌ Example: "Tempo (100-200 BPM)" may dominate "Danceability (0-1)" unless scaled.

✅ Solution: Standardize the features (e.g., with StandardScaler) before PCA whenever they are measured on different scales.
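
The effect of scale can be seen on a small synthetic example. The two features below are made-up, independent random values that only mimic the ranges of "Tempo" and "Danceability".

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two independent, made-up features on very different scales
tempo = rng.uniform(100, 200, size=500)      # BPM
danceability = rng.uniform(0, 1, size=500)   # score between 0 and 1
X = np.column_stack([tempo, danceability])

# PCA on the raw values vs. PCA on standardized values
ratio_raw = PCA().fit(X).explained_variance_ratio_
ratio_scaled = PCA().fit(StandardScaler().fit_transform(X)).explained_variance_ratio_

print("Explained variance ratio (raw):   ", np.round(ratio_raw, 4))
print("Explained variance ratio (scaled):", np.round(ratio_scaled, 4))
```

On the raw data the first component is almost entirely tempo; after standardization the two features share the variance roughly equally.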

📌 4️⃣ Can Remove Important Features
PCA removes less important dimensions, but sometimes these "low variance" features contain useful classification information.

❌ Example: A rare but important feature (e.g., a special beat pattern in your Spotify dataset) might be discarded.

✅ Solution: Compare PCA performance with and without using classification accuracy as a metric.

📌 5️⃣ Assumes Gaussian Distribution
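PCA does not strictly require normally distributed data, but it relies only on means and covariances, so it summarizes the data best when the distribution is roughly Gaussian. Heavy skew and outliers can distort the components.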

❌ Example: If your Spotify dataset has skewed distributions (e.g., Loudness or Popularity is highly skewed), PCA might not work well.

✅ Solution: Try log transformations or other dimensionality reduction techniques like ICA (Independent Component Analysis).

📌 6️⃣ Computationally Expensive for Large Datasets
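PCA requires computing the covariance matrix and its eigendecomposition, which costs roughly O(n·d² + d³) for n samples and d features. This becomes slow when d is large or when the data does not fit in memory.

❌ Example: If your dataset has millions of songs or thousands of features (e.g., raw spectrogram values), PCA might be computationally expensive.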

✅ Solution: Use Incremental PCA or Randomized PCA for large datasets.
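
A minimal sketch of Incremental PCA with scikit-learn's `IncrementalPCA`, which is fitted one batch at a time. The random matrix is a stand-in for a dataset that would normally be streamed from disk, and the batch count of 10 is an arbitrary choice.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))  # stand-in for data too large for memory

ipca = IncrementalPCA(n_components=5)

# Feed the data in 10 batches of 100 rows each
for batch in np.array_split(X, 10):
    ipca.partial_fit(batch)

X_reduced = ipca.transform(X)
print("Reduced shape:", X_reduced.shape)
```

For the randomized variant, scikit-learn's regular `PCA` accepts `svd_solver='randomized'`.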

📌 Summary: When to Be Cautious with PCA?
Limitation	Why It’s a Problem	Possible Solution
Assumes Linearity	Can't capture complex patterns	Try t-SNE, UMAP, Autoencoders
Loss of Interpretability	PCs don’t have direct meaning	Check feature contributions (loadings)
Sensitive to Scaling	Large-magnitude features dominate	Standardize data (StandardScaler)
Removes Important Features	Low-variance but useful features get dropped	Compare performance with & without PCA
Assumes Gaussian Data	Works best on normally distributed data	Use log transforms or ICA
Computational Cost	Slow on high-dimensional datasets	Use Incremental/Randomized PCA

18. How do KNN and PCA complement each other?


How Do KNN and PCA Complement Each Other?
K-Nearest Neighbors (KNN) and Principal Component Analysis (PCA) are often used together in machine learning pipelines to improve classification performance, especially when dealing with high-dimensional data.

📌 How PCA Helps KNN?
✅ 1️⃣ Reduces Dimensionality → Faster KNN Computation

KNN is computationally expensive at prediction time because it computes the distance from each query point to every training point.

PCA reduces the number of features, making distance calculations faster.

Example: If your Spotify dataset has 30 audio features, PCA can reduce it to 5 PCs, speeding up KNN.

✅ 2️⃣ Removes Noise & Correlation → Improves KNN Accuracy

PCA removes redundant & correlated features, helping KNN focus on meaningful variations.

Example: "Tempo" and "Beat Strength" might be highly correlated—PCA merges them into a single PC.

✅ 3️⃣ Avoids Curse of Dimensionality → Better Distance Metrics

KNN performs poorly when the number of dimensions is too high because distances between points become nearly equal, so the "nearest" neighbor is barely closer than the farthest one.

PCA reduces dimensions, making Euclidean distance (used in KNN) more reliable.

✅ 4️⃣ Enables Data Visualization → Better Model Understanding

PCA allows 2D/3D visualization of high-dimensional data, making it easier to analyze clusters before applying KNN.
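
The curse of dimensionality from point 3 can be illustrated with random data. This sketch measures the relative gap between the farthest and nearest point from a query; the uniform random points and the sample size of 500 are assumptions chosen only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
contrast = {}

for d in [2, 10, 100, 1000]:
    points = rng.uniform(size=(500, d))   # random "training" points
    query = rng.uniform(size=d)           # one random query point
    dist = np.linalg.norm(points - query, axis=1)

    # Relative gap between the farthest and the nearest point
    contrast[d] = (dist.max() - dist.min()) / dist.min()
    print(f"d={d:>4}: relative contrast = {contrast[d]:.2f}")
```

The contrast shrinks as the dimension grows, which is why Euclidean neighbors carry less information in very high-dimensional spaces.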

📌 How KNN Works After PCA?
Apply PCA → Reduce N features to K principal components.

Use KNN on Transformed Data → Classify data using the reduced feature set.

Evaluate Performance → Compare accuracy before & after PCA.

🎵 Example: PCA + KNN on Spotify Data (Classifying “Hit” vs. “Flop” Songs)
Without PCA:

30 audio features → Slow KNN prediction, risk of overfitting.

KNN might struggle due to irrelevant/correlated features.

With PCA:

Reduce to 5 principal components (95% variance retained).

Faster KNN predictions and often similar or better classification accuracy.

📌 When to Use PCA Before KNN?
Scenario	Should You Use PCA?
High-dimensional data (D > 10)	✅ Yes, PCA reduces dimensions for better KNN
Highly correlated features	✅ Yes, PCA removes redundancy
Small dataset with few features	❌ No, PCA may remove useful information
KNN performing poorly due to curse of dimensionality	✅ Yes, PCA can improve distance calculations

19. How does KNN handle missing values in a dataset?


How Does KNN Handle Missing Values in a Dataset?
K-Nearest Neighbors (KNN) does not inherently handle missing values, but there are effective ways to deal with them before or during KNN classification/regression.

📌 1️⃣ Common Methods to Handle Missing Values in KNN
✅ 1. Remove Rows with Missing Values

If only a few rows have missing data, you can drop them.

❌ Risk: If too many rows are removed, it may reduce dataset quality.

Example: If 5 out of 500 songs in your Spotify dataset have missing values, you can remove them without major impact.

✅ 2. Impute Missing Values Using Mean/Median/Mode

Replace missing values with the mean (for numerical data) or mode (for categorical data).

Example: If "Tempo" is missing for a song, replace it with the average tempo of other songs.

❌ Risk: If the data is skewed or contains outliers, the mean can be a poor fill value; the median is more robust in that case.

✅ 3. Use KNN Imputation (Best for Missing Data in KNN)

Instead of using the mean, find K nearest neighbors and use their values to fill in missing data.

Steps:

Find K most similar rows (excluding the missing feature).

Compute the average value of the missing feature from neighbors.

Replace the missing value with this computed value.

Example: If a song is missing its "Energy" value, KNN finds similar songs based on other features and imputes "Energy" accordingly.

✅ 4. Use a Machine Learning Model for Imputation

Train a regression model to predict missing values using other features.

Example: Use Linear Regression to predict missing "Loudness" values from correlated features like "Energy" and "Danceability."
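
Method 4 can be sketched as follows. The data is synthetic: the linear relationship between "loudness", "energy" and "danceability" and all coefficients are invented for the example and do not come from a real Spotify dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Made-up data: loudness depends roughly linearly on energy and danceability
energy = rng.uniform(0, 1, size=200)
danceability = rng.uniform(0, 1, size=200)
loudness = -20 + 12 * energy + 3 * danceability + rng.normal(0, 0.5, size=200)

# Hide 20 loudness values to simulate missing data
loudness_observed = loudness.copy()
missing_idx = rng.choice(200, size=20, replace=False)
loudness_observed[missing_idx] = np.nan

features = np.column_stack([energy, danceability])
known = ~np.isnan(loudness_observed)

# Fit on rows where loudness is known, then predict the missing rows
model = LinearRegression().fit(features[known], loudness_observed[known])
loudness_imputed = loudness_observed.copy()
loudness_imputed[~known] = model.predict(features[~known])

# Because the data is synthetic, the true hidden values are available for a check
mae = np.mean(np.abs(loudness_imputed[~known] - loudness[~known]))
print(f"Mean absolute error on imputed values: {mae:.3f}")
```

On real data the true values are unknown, so the quality of the imputation has to be judged indirectly, for example by hiding some known values and checking how well they are recovered.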

📌 2️⃣ Which Method Should You Use?
Scenario	Best Method
Few missing values (<5%)	Drop rows
Continuous numerical data	Mean/Median imputation
Categorical data	Mode imputation
Large dataset, missing at random	KNN Imputation ✅
Highly structured relationships	Machine Learning-based imputation
📌 3️⃣ Summary: Handling Missing Values in KNN
Method	Pros	Cons
Drop Rows	Simple, quick	Data loss
Mean/Median Imputation	Fast, works for numerical data	Ignores relationships between features
KNN Imputation	Captures patterns from similar data	Computationally expensive
ML-based Imputation	More accurate for complex data	Requires extra model training


20. What are the key differences between PCA and Linear Discriminant Analysis (LDA)?


Key Differences Between PCA and LDA
Both PCA (Principal Component Analysis) and LDA (Linear Discriminant Analysis) are dimensionality reduction techniques, but they serve different purposes.

📌 1️⃣ Purpose: Unsupervised vs. Supervised
Method	Type	Main Goal
PCA	Unsupervised	Maximizes variance in the data
LDA	Supervised	Maximizes class separability
✅ PCA: Reduces dimensions by capturing the directions with the most variance—ignores class labels.
✅ LDA: Finds directions that best separate classes—uses class labels.

📌 Example: If you want to reduce features in your Spotify dataset, PCA will preserve overall structure, while LDA will focus on distinguishing "Hit" vs. "Flop" songs.

📌 2️⃣ How They Work
Method	What It Finds?	Transformation
PCA	Eigenvectors of covariance matrix	Projects data onto new axes with max variance
LDA	Eigenvectors of S_W⁻¹ S_B (within-class and between-class scatter matrices)	Projects data onto axes that maximize class separation
✅ PCA Steps:

Compute the covariance matrix.

Find eigenvectors (principal components).

Project data onto the top K components.

✅ LDA Steps:

Compute between-class and within-class scatter matrices.

Solve for eigenvectors that maximize class separation.

Project data onto LDA components (≤ number of classes - 1).

📌 3️⃣ Number of Components: PCA Can Be Larger Than LDA
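PCA: Can have as many components as the number of original features (more precisely, at most min(number of samples, number of features)).

LDA: Maximum C - 1 components (where C = number of classes), and never more than the number of features.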

📌 Example:
If your dataset has 10 features and 3 classes:

PCA can have up to 10 components.

LDA can have at most 2 components (C-1 = 3-1).
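
The component limits can be checked with scikit-learn. This sketch uses the Iris dataset (4 features, 3 classes) rather than the 10-feature example above, so PCA can return up to 4 components and LDA up to 2.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)   # 4 features, 3 classes
X_scaled = StandardScaler().fit_transform(X)

# PCA ignores the labels and may keep up to 4 components here
X_pca = PCA(n_components=4).fit_transform(X_scaled)

# LDA uses the labels and is limited to C - 1 = 2 components
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X_scaled, y)

print("PCA output shape:", X_pca.shape)
print("LDA output shape:", X_lda.shape)

# Asking LDA for more than C - 1 components is rejected
try:
    LinearDiscriminantAnalysis(n_components=3).fit(X_scaled, y)
except ValueError as err:
    print("LDA with 3 components failed:", err)
```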

📌 4️⃣ When to Use PCA vs. LDA?
Scenario	Use PCA?	Use LDA?
Unlabeled data	✅ Yes	❌ No (LDA needs labels)
Dimensionality reduction for any purpose	✅ Yes	❌ No
Feature extraction for classification	✅ Yes	✅ Yes
Maximizing class separation	❌ No	✅ Yes
When features are correlated	✅ Yes	✅ Yes
📌 5️⃣ Example: PCA vs. LDA in Spotify Dataset
🔹 PCA: If you want to reduce 30 audio features to 5 principal components, PCA will find the 5 linear combinations of the features that capture the most variance.
🔹 LDA: If you want to classify songs as "Hit" or "Flop", LDA will find the single direction (C - 1 = 1 component for two classes) that best separates these two classes.


Practical

21. Train a KNN Classifier on the Iris dataset and print model accuracy?
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features for better performance
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train the KNN Classifier with k=3
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Make predictions on the test set
y_pred = knn.predict(X_test)

# Calculate and print model accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f}")

22. Train a KNN Regressor on a synthetic dataset and evaluate using Mean Squared Error (MSE)?
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

# Generate a synthetic dataset
np.random.seed(42)
X = np.sort(5 * np.random.rand(100, 1), axis=0)  # Feature: Random values between 0 and 5
y = np.sin(X).ravel() + np.random.normal(0, 0.2, X.shape[0])  # Target with noise

# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train the KNN Regressor with k=5
knn_reg = KNeighborsRegressor(n_neighbors=5)
knn_reg.fit(X_train, y_train)

# Make predictions on the test set
y_pred = knn_reg.predict(X_test)

# Calculate and print Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.4f}")

# Plot the results
plt.scatter(X_test, y_test, color='blue', label="Actual")
plt.scatter(X_test, y_pred, color='red', label="Predicted")
plt.xlabel("Feature (standardized)")
plt.ylabel("Target")
plt.legend()
plt.title("KNN Regression Results")
plt.show()


23. Train a KNN Classifier using different distance metrics (Euclidean and Manhattan) and compare accuracy?
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features for better performance
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train KNN Classifier with Euclidean distance (default)
knn_euclidean = KNeighborsClassifier(n_neighbors=3, metric='euclidean')
knn_euclidean.fit(X_train, y_train)
y_pred_euclidean = knn_euclidean.predict(X_test)
accuracy_euclidean = accuracy_score(y_test, y_pred_euclidean)

# Train KNN Classifier with Manhattan distance
knn_manhattan = KNeighborsClassifier(n_neighbors=3, metric='manhattan')
knn_manhattan.fit(X_train, y_train)
y_pred_manhattan = knn_manhattan.predict(X_test)
accuracy_manhattan = accuracy_score(y_test, y_pred_manhattan)

# Print accuracy comparison
print(f"Accuracy with Euclidean Distance: {accuracy_euclidean:.4f}")
print(f"Accuracy with Manhattan Distance: {accuracy_manhattan:.4f}")


24. Train a KNN Classifier with different values of K and visualize decision boundaries?
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Load the Iris dataset
iris = load_iris()
X = iris.data[:, :2]  # Use only the first two features for 2D visualization
y = iris.target

# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Function to plot decision boundaries
def plot_decision_boundary(knn, X, y, title):
    h = 0.02  # Step size in the mesh
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))

    # Predict class for each point in the meshgrid
    Z = knn.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    # Plot decision boundary
    plt.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.Paired)  # Same colormap as the scatter points
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k', cmap=plt.cm.Paired)
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.title(title)

# Train KNN with different values of K and plot decision boundaries
plt.figure(figsize=(12, 6))
for i, k in enumerate([1, 5, 10], 1):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)

    plt.subplot(1, 3, i)
    plot_decision_boundary(knn, X_train, y_train, f'KNN Decision Boundary (k={k})')

plt.tight_layout()
plt.show()

25. Apply Feature Scaling before training a KNN model and compare results with unscaled data?
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train KNN without feature scaling
knn_unscaled = KNeighborsClassifier(n_neighbors=3)
knn_unscaled.fit(X_train, y_train)
y_pred_unscaled = knn_unscaled.predict(X_test)
accuracy_unscaled = accuracy_score(y_test, y_pred_unscaled)

# Apply Standardization (Feature Scaling)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train KNN with feature scaling
knn_scaled = KNeighborsClassifier(n_neighbors=3)
knn_scaled.fit(X_train_scaled, y_train)
y_pred_scaled = knn_scaled.predict(X_test_scaled)
accuracy_scaled = accuracy_score(y_test, y_pred_scaled)

# Print results
print(f"Accuracy without Scaling: {accuracy_unscaled:.4f}")
print(f"Accuracy with Scaling: {accuracy_scaled:.4f}")

26. Train a PCA model on synthetic data and print the explained variance ratio for each component?
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Generate synthetic data (100 samples, 5 features)
np.random.seed(42)
X = np.random.rand(100, 5) * 10  # Random values between 0 and 10

# Standardize the data (PCA works best with scaled data)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Train PCA model (retain all components)
pca = PCA(n_components=X.shape[1])
pca.fit(X_scaled)

# Print explained variance ratio for each component
explained_variance = pca.explained_variance_ratio_
for i, var in enumerate(explained_variance, 1):
    print(f"Principal Component {i}: {var:.4f}")

# Plot explained variance
plt.figure(figsize=(8, 5))
plt.bar(range(1, len(explained_variance) + 1), explained_variance, alpha=0.7, color='b', label='Explained Variance')
plt.xlabel('Principal Component')
plt.ylabel('Variance Ratio')
plt.title('Explained Variance Ratio of PCA Components')
plt.legend()
plt.show()


27. Apply PCA before training a KNN Classifier and compare accuracy with and without PCA?
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train KNN without PCA
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train_scaled, y_train)
y_pred = knn.predict(X_test_scaled)
accuracy_without_pca = accuracy_score(y_test, y_pred)

# Apply PCA (retain 95% of variance)
pca = PCA(n_components=0.95)
X_train_pca = pca.fit_transform(X_train_scaled)
X_test_pca = pca.transform(X_test_scaled)

# Train KNN with PCA-transformed data
knn_pca = KNeighborsClassifier(n_neighbors=3)
knn_pca.fit(X_train_pca, y_train)
y_pred_pca = knn_pca.predict(X_test_pca)
accuracy_with_pca = accuracy_score(y_test, y_pred_pca)

# Print comparison results
print(f"Accuracy without PCA: {accuracy_without_pca:.4f}")
print(f"Accuracy with PCA: {accuracy_with_pca:.4f}")

# Print explained variance
print(f"Explained Variance Ratio: {pca.explained_variance_ratio_}")
print(f"Number of Principal Components Retained: {pca.n_components_}")

28. Perform Hyperparameter Tuning on a KNN Classifier using GridSearchCV?
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Define KNN model
knn = KNeighborsClassifier()

# Define hyperparameters to tune
param_grid = {
    'n_neighbors': [1, 3, 5, 7, 9, 11],  # Different values of K
    'metric': ['euclidean', 'manhattan']  # Distance metrics
}

# Perform GridSearchCV
grid_search = GridSearchCV(knn, param_grid, cv=5, scoring='accuracy', n_jobs=-1)
grid_search.fit(X_train_scaled, y_train)

# Get best model
best_knn = grid_search.best_estimator_

# Predict using the best model
y_pred_best = best_knn.predict(X_test_scaled)

# Print results
print(f"Best Parameters: {grid_search.best_params_}")
print(f"Best Cross-Validation Accuracy: {grid_search.best_score_:.4f}")
print(f"Test Set Accuracy with Best Model: {accuracy_score(y_test, y_pred_best):.4f}")


29. Train a KNN Classifier and check the number of misclassified samples?
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train KNN Classifier with k=3
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train_scaled, y_train)

# Predict on test set
y_pred = knn.predict(X_test_scaled)

# Calculate misclassified samples
misclassified_samples = (y_test != y_pred).sum()

# Print results
print(f"Total Test Samples: {len(y_test)}")
print(f"Misclassified Samples: {misclassified_samples}")
print(f"Test Set Accuracy: {accuracy_score(y_test, y_pred):.4f}")


30. Train a PCA model and visualize the cumulative explained variance.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
X = iris.data

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply PCA
pca = PCA()
pca.fit(X_scaled)

# Compute cumulative explained variance
cumulative_variance = np.cumsum(pca.explained_variance_ratio_)

# Plot cumulative explained variance
plt.figure(figsize=(8, 5))
plt.plot(range(1, len(cumulative_variance) + 1), cumulative_variance, marker='o', linestyle='--', color='b')
plt.xlabel('Number of Principal Components')
plt.ylabel('Cumulative Explained Variance')
plt.title('PCA Cumulative Explained Variance')
plt.axhline(y=0.95, color='r', linestyle='--', label='95% Variance')
plt.legend()
plt.grid(True)
plt.show()


31. Train a KNN Classifier using different values of the weights parameter (uniform vs. distance) and compare accuracy?
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train KNN with uniform weights (equal weight to all neighbors)
knn_uniform = KNeighborsClassifier(n_neighbors=5, weights='uniform')
knn_uniform.fit(X_train_scaled, y_train)
y_pred_uniform = knn_uniform.predict(X_test_scaled)
accuracy_uniform = accuracy_score(y_test, y_pred_uniform)

# Train KNN with distance-based weights (closer neighbors contribute more)
knn_distance = KNeighborsClassifier(n_neighbors=5, weights='distance')
knn_distance.fit(X_train_scaled, y_train)
y_pred_distance = knn_distance.predict(X_test_scaled)
accuracy_distance = accuracy_score(y_test, y_pred_distance)

# Print comparison results
print(f"Accuracy with 'uniform' weights: {accuracy_uniform:.4f}")
print(f"Accuracy with 'distance' weights: {accuracy_distance:.4f}")


32. Train a KNN Regressor and analyze the effect of different K values on performance?
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic dataset
np.random.seed(42)
X = np.linspace(0, 10, 100).reshape(-1, 1)  # Features
y = np.sin(X).ravel() + np.random.normal(0, 0.1, X.shape[0])  # Target with noise

# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Try different K values and evaluate performance
k_values = [1, 3, 5, 7, 10, 15]
mse_values = []

for k in k_values:
    knn_regressor = KNeighborsRegressor(n_neighbors=k)
    knn_regressor.fit(X_train_scaled, y_train)
    y_pred = knn_regressor.predict(X_test_scaled)
    mse = mean_squared_error(y_test, y_pred)
    mse_values.append(mse)
    print(f"K={k}, MSE={mse:.4f}")

# Plot MSE vs K values
plt.figure(figsize=(8, 5))
plt.plot(k_values, mse_values, marker='o', linestyle='--', color='b')
plt.xlabel('Number of Neighbors (K)')
plt.ylabel('Mean Squared Error (MSE)')
plt.title('Effect of K on KNN Regression Performance')
plt.grid(True)
plt.show()


33. Implement KNN Imputation for handling missing values in a dataset?
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Create a synthetic dataset with missing values
data = {
    'Feature1': [5, 2, np.nan, 8, 4, 7, np.nan, 6, 3, 9],
    'Feature2': [1, np.nan, 5, 7, np.nan, 6, 8, 3, 4, np.nan],
    'Feature3': [np.nan, 3, 6, 2, 9, 5, 7, np.nan, 8, 4]
}

df = pd.DataFrame(data)
print("Original Dataset with Missing Values:")
print(df)

# Apply KNN Imputation (using k=3 neighbors)
imputer = KNNImputer(n_neighbors=3)
df_imputed = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

print("\nDataset After KNN Imputation:")
print(df_imputed)

34. Train a PCA model and visualize the data projection onto the first two principal components?
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
target_names = iris.target_names

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply PCA and keep the first 2 principal components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Create a scatter plot of the first two principal components.
# Map integer labels to species names so each legend entry matches its points.
species = target_names[y]
plt.figure(figsize=(8, 6))
sns.scatterplot(x=X_pca[:, 0], y=X_pca[:, 1], hue=species, style=species, palette='viridis')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA Projection of the Iris Dataset')
plt.legend(title='Species')
plt.grid(True)
plt.show()

35. Train a KNN Classifier using the KD Tree and Ball Tree algorithms and compare performance?
import time
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Define KNN models using 'kd_tree' and 'ball_tree'
algorithms = ['kd_tree', 'ball_tree']
results = {}

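for algo in algorithms:
    start_time = time.perf_counter()  # High-resolution timer for short intervals

    knn = KNeighborsClassifier(n_neighbors=5, algorithm=algo)
    knn.fit(X_train_scaled, y_train)

    y_pred = knn.predict(X_test_scaled)
    accuracy = accuracy_score(y_test, y_pred)

    # Elapsed time covers fitting (tree construction), prediction, and scoring
    elapsed_time = time.perf_counter() - start_time
    results[algo] = {'accuracy': accuracy, 'time': elapsed_time}

# Print comparison results
# Both algorithms return the same exact neighbors, so accuracy should match;
# on a dataset as small as Iris the timing difference is mostly noise.
for algo, metrics in results.items():
    print(f"Algorithm: {algo}")
    print(f"  Accuracy: {metrics['accuracy']:.4f}")
    print(f"  Fit + Predict Time: {metrics['time']:.6f} seconds\n")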


36. Train a PCA model on a high-dimensional dataset and visualize the Scree plot?
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_digits

# Load a high-dimensional dataset (Digits dataset with 64 features)
digits = load_digits()
X = digits.data

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply PCA
pca = PCA()
pca.fit(X_scaled)

# Compute explained variance ratio
explained_variance = pca.explained_variance_ratio_

# Plot the Scree plot
plt.figure(figsize=(8, 5))
plt.plot(range(1, len(explained_variance) + 1), explained_variance, marker='o', linestyle='--', color='b')
plt.xlabel('Principal Component')
plt.ylabel('Explained Variance Ratio')
plt.title('Scree Plot of PCA on High-Dimensional Data')
plt.grid(True)
plt.show()


37. Train a KNN Classifier and evaluate performance using Precision, Recall, and F1-Score?
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train KNN Classifier with k=5
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_scaled, y_train)

# Predict on test set
y_pred = knn.predict(X_test_scaled)

# Evaluate using Precision, Recall, and F1-Score
print("Classification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))
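
To make the report's columns concrete, here is a minimal sketch that recomputes per-class precision, recall, and F1 from the confusion matrix. It repeats the setup above so it runs on its own; the names `cm` and `true_positives` are illustrative, and the division assumes every class is predicted at least once.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

scaler = StandardScaler()
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(scaler.fit_transform(X_train), y_train)
y_pred = knn.predict(scaler.transform(X_test))

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
true_positives = np.diag(cm)

# Assumes each class appears at least once among the predictions
precision = true_positives / cm.sum(axis=0)  # TP / all samples predicted as the class
recall = true_positives / cm.sum(axis=1)     # TP / all samples truly in the class
f1 = 2 * precision * recall / (precision + recall)

for name, p, r, f in zip(iris.target_names, precision, recall, f1):
    print(f"{name}: precision={p:.2f}, recall={r:.2f}, f1={f:.2f}")
```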


38. Train a PCA model and analyze the effect of different numbers of components on accuracy?
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Try different numbers of PCA components and evaluate accuracy
component_range = range(1, X.shape[1] + 1)
accuracy_scores = []

for n_components in component_range:
    # Apply PCA
    pca = PCA(n_components=n_components)
    X_train_pca = pca.fit_transform(X_train_scaled)
    X_test_pca = pca.transform(X_test_scaled)

    # Train KNN classifier
    knn = KNeighborsClassifier(n_neighbors=5)
    knn.fit(X_train_pca, y_train)

    # Evaluate accuracy
    y_pred = knn.predict(X_test_pca)
    accuracy = accuracy_score(y_test, y_pred)
    accuracy_scores.append(accuracy)
    print(f"PCA Components: {n_components}, Accuracy: {accuracy:.4f}")

# Plot accuracy vs. number of PCA components
plt.figure(figsize=(8, 5))
plt.plot(component_range, accuracy_scores, marker='o', linestyle='--', color='b')
plt.xlabel('Number of PCA Components')
plt.ylabel('KNN Accuracy')
plt.title('Effect of PCA Components on Classification Accuracy')
plt.grid(True)
plt.show()
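
A single 30-sample test split gives a noisy accuracy estimate. As a hedged alternative, the sketch below wraps scaling, PCA, and KNN in a pipeline and scores each component count with 5-fold cross-validation; the choice of `cv=5` is an assumption, not part of the original exercise.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

cv_means = {}
for n_components in range(1, X.shape[1] + 1):
    # Scaling and PCA are refit inside each fold, so no held-out data leaks in
    model = make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components),
        KNeighborsClassifier(n_neighbors=5),
    )
    scores = cross_val_score(model, X, y, cv=5)
    cv_means[n_components] = scores.mean()
    print(f"PCA Components: {n_components}, "
          f"CV Accuracy: {scores.mean():.4f} (+/- {scores.std():.4f})")
```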

39. Train a KNN Classifier with different leaf_size values and compare accuracy?
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

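# Try different leaf_size values and evaluate accuracy.
# Note: leaf_size only tunes tree build/query speed and memory; the neighbors
# found are the same, so accuracy is expected to be identical across values.
# With the default algorithm='auto', brute force may be chosen and leaf_size ignored.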
leaf_sizes = [5, 10, 20, 30, 50]
accuracy_scores = []

for leaf_size in leaf_sizes:
    knn = KNeighborsClassifier(n_neighbors=5, leaf_size=leaf_size)
    knn.fit(X_train_scaled, y_train)

    # Predict and calculate accuracy
    y_pred = knn.predict(X_test_scaled)
    accuracy = accuracy_score(y_test, y_pred)
    accuracy_scores.append(accuracy)
    print(f"Leaf Size: {leaf_size}, Accuracy: {accuracy:.4f}")

# Plot accuracy vs. leaf_size values
plt.figure(figsize=(8, 5))
plt.plot(leaf_sizes, accuracy_scores, marker='o', linestyle='--', color='b')
plt.xlabel('Leaf Size')
plt.ylabel('KNN Accuracy')
plt.title('Effect of Leaf Size on KNN Classification Accuracy')
plt.grid(True)
plt.show()
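
Because `leaf_size` changes how the tree is built rather than which neighbors are returned, the accuracy curve is expected to be flat. A minimal sketch, assuming synthetic data from `make_classification` and a forced `algorithm='kd_tree'` so that `leaf_size` is actually used, checks that predictions do not change:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic continuous data (an assumption for illustration)
X_syn, y_syn = make_classification(n_samples=2000, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.2, random_state=0)

predictions = {}
for leaf_size in [5, 30, 100]:
    # Force a tree-based search so leaf_size has an effect on the index
    knn = KNeighborsClassifier(n_neighbors=5, algorithm='kd_tree', leaf_size=leaf_size)
    knn.fit(X_tr, y_tr)
    predictions[leaf_size] = knn.predict(X_te)

reference = predictions[5]
same_everywhere = all(np.array_equal(reference, p) for p in predictions.values())
print(f"Predictions identical across leaf sizes: {same_everywhere}")
```

To see what `leaf_size` does affect, time `fit` and `predict` for each value instead of comparing accuracy.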


40. Train a PCA model and visualize how data points are transformed before and after PCA?
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target
target_names = iris.target_names

# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply PCA and reduce to 2 components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Map integer labels to species names so seaborn builds a correct legend
species = target_names[y]

# Plot original data (first two features)
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=species, style=species, palette='viridis')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.title('Original Data (First Two Features)')

# Plot transformed data (first two principal components)
plt.subplot(1, 2, 2)
sns.scatterplot(x=X_pca[:, 0], y=X_pca[:, 1], hue=species, style=species, palette='viridis')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('Data Transformed by PCA')

plt.tight_layout()
plt.show()


41. Train a KNN Classifier on a real-world dataset (Wine dataset) and print classification report?
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

# Load the Wine dataset
wine = load_wine()
X, y = wine.data, wine.target

# Split the dataset into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train a KNN Classifier with k=5
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_scaled, y_train)

# Predict on test set
y_pred = knn.predict(X_test_scaled)

# Print the classification report
print("Classification Report:")
print(classification_report(y_test, y_pred, target_names=wine.target_names))


42. Train a KNN Regressor and analyze the effect of different distance metrics on prediction error?
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error
from sklearn.datasets import fetch_california_housing  # Alternative for Boston dataset

# Load the dataset (using California Housing as a substitute for Boston)
housing = fetch_california_housing()
X, y = housing.data, housing.target

# Split the dataset (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Define distance metrics to test
metrics = ['euclidean', 'manhattan']
errors = {}

for metric in metrics:
    # Train KNN Regressor
    knn = KNeighborsRegressor(n_neighbors=5, metric=metric)
    knn.fit(X_train_scaled, y_train)

    # Predict and calculate Mean Squared Error (MSE)
    y_pred = knn.predict(X_test_scaled)
    mse = mean_squared_error(y_test, y_pred)
    errors[metric] = mse
    print(f"Distance Metric: {metric}, MSE: {mse:.4f}")

# Plot the comparison
plt.figure(figsize=(6, 4))
plt.bar(errors.keys(), errors.values(), color=['blue', 'green'])
plt.xlabel('Distance Metric')
plt.ylabel('Mean Squared Error (MSE)')
plt.title('Effect of Distance Metrics on KNN Regression Error')
plt.show()


43. Train a KNN Classifier and evaluate using ROC-AUC score?
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import roc_auc_score, roc_curve

# Load the Breast Cancer dataset
data = load_breast_cancer()
X, y = data.data, data.target  # Binary classification (0 = malignant, 1 = benign)

# Split the dataset (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train a KNN Classifier (k=5); predict_proba is always available,
# so no 'probability' argument is needed (that parameter belongs to SVC)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_scaled, y_train)

# Get predicted probabilities for the positive class
y_scores = knn.predict_proba(X_test_scaled)[:, 1]

# Compute ROC-AUC score
roc_auc = roc_auc_score(y_test, y_scores)
print(f"ROC-AUC Score: {roc_auc:.4f}")

# Compute ROC curve
fpr, tpr, _ = roc_curve(y_test, y_scores)

# Plot ROC Curve
plt.figure(figsize=(6, 5))
plt.plot(fpr, tpr, color='blue', label=f'KNN (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], linestyle='--', color='gray')  # Random guess line
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve for KNN Classifier')
plt.legend()
plt.grid(True)
plt.show()
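
With uniform weights, KNN's `predict_proba` returns the fraction of the k neighbors that belong to each class, so with k=5 the scores can take at most six values (0, 0.2, ..., 1) and the ROC curve has only a few thresholds. The sketch below, which repeats the setup so it runs on its own, counts the distinct scores and compares them with `weights='distance'` as one possible way to get finer-grained scores:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import roc_auc_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

distinct_scores = {}
for weights in ['uniform', 'distance']:
    knn = KNeighborsClassifier(n_neighbors=5, weights=weights)
    knn.fit(X_train_scaled, y_train)
    y_scores = knn.predict_proba(X_test_scaled)[:, 1]

    # Round before de-duplicating to absorb floating-point noise
    distinct_scores[weights] = np.unique(np.round(y_scores, 6))
    auc = roc_auc_score(y_test, y_scores)
    print(f"weights={weights}: {len(distinct_scores[weights])} distinct scores, "
          f"ROC-AUC={auc:.4f}")
```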

44. Train a PCA model and visualize the variance captured by each principal component?
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Standardize the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply PCA (keep all components)
pca = PCA(n_components=X.shape[1])
X_pca = pca.fit_transform(X_scaled)

# Get explained variance ratio
explained_variance = pca.explained_variance_ratio_
cumulative_variance = np.cumsum(explained_variance)

# Plot Explained Variance Ratio
plt.figure(figsize=(8, 5))
plt.bar(range(1, len(explained_variance) + 1), explained_variance, alpha=0.7, label='Explained Variance')
plt.plot(range(1, len(explained_variance) + 1), cumulative_variance, marker='o', linestyle='--', color='r', label='Cumulative Variance')

plt.xlabel('Principal Component')
plt.ylabel('Variance Explained')
plt.title('Explained Variance by Principal Components')
plt.xticks(range(1, len(explained_variance) + 1))
plt.legend()
plt.grid(True)
plt.show()


45. Train a KNN Classifier and perform feature selection before training?
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Breast Cancer dataset
data = load_breast_cancer()
X, y = data.data, data.target
feature_names = data.feature_names

# Split dataset (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Perform Feature Selection (Select Top 10 Features)
selector = SelectKBest(score_func=f_classif, k=10)
X_train_selected = selector.fit_transform(X_train_scaled, y_train)
X_test_selected = selector.transform(X_test_scaled)

# Get selected feature names
selected_features = feature_names[selector.get_support()]
print(f"Selected Features: {selected_features}")

# Train KNN Classifier (Before Feature Selection)
knn_all = KNeighborsClassifier(n_neighbors=5)
knn_all.fit(X_train_scaled, y_train)
y_pred_all = knn_all.predict(X_test_scaled)
accuracy_all = accuracy_score(y_test, y_pred_all)

# Train KNN Classifier (After Feature Selection)
knn_selected = KNeighborsClassifier(n_neighbors=5)
knn_selected.fit(X_train_selected, y_train)
y_pred_selected = knn_selected.predict(X_test_selected)
accuracy_selected = accuracy_score(y_test, y_pred_selected)

print(f"Accuracy Before Feature Selection: {accuracy_all:.4f}")
print(f"Accuracy After Feature Selection: {accuracy_selected:.4f}")

# Bar plot comparison
plt.figure(figsize=(6, 4))
plt.bar(['All Features', 'Selected Features'], [accuracy_all, accuracy_selected], color=['blue', 'green'])
plt.ylabel('Accuracy')
plt.title('KNN Accuracy Before and After Feature Selection')
plt.ylim(0.9, 1)  # Set y-axis range for better visualization
plt.grid(True)
plt.show()


46. Train a PCA model and visualize the data reconstruction error after reducing dimensions?
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_digits
from sklearn.metrics import mean_squared_error

# Load the Digits dataset
digits = load_digits()
X, y = digits.data, digits.target  # X: pixel data, y: digit labels

# Standardize the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply PCA with 10 components
n_components = 10
pca = PCA(n_components=n_components)
X_pca = pca.fit_transform(X_scaled)

# Reconstruct the data from PCA components
X_reconstructed = pca.inverse_transform(X_pca)

# Compute Reconstruction Error (Mean Squared Error)
reconstruction_error = mean_squared_error(X_scaled, X_reconstructed)
print(f"Reconstruction Error (MSE): {reconstruction_error:.4f}")

# Map the reconstruction back to the original pixel scale so both rows are comparable
X_reconstructed_pixels = scaler.inverse_transform(X_reconstructed)

# Visualizing Original vs Reconstructed Images
fig, axes = plt.subplots(2, 5, figsize=(10, 5))

for i in range(5):
    # Original Image
    axes[0, i].imshow(X[i].reshape(8, 8), cmap='gray')
    axes[0, i].axis('off')
    axes[0, i].set_title("Original")

    # Reconstructed Image (un-standardized)
    axes[1, i].imshow(X_reconstructed_pixels[i].reshape(8, 8), cmap='gray')
    axes[1, i].axis('off')
    axes[1, i].set_title("Reconstructed")

plt.suptitle(f"PCA Reconstruction with {n_components} Components", fontsize=14)
plt.tight_layout()
plt.show()
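
The example above reports the error for a single setting of 10 components. A minimal sketch of how the error changes with the number of components follows; the list `component_counts` is an arbitrary choice for illustration, and `svd_solver='full'` is set so the decomposition is exact for every count.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_digits().data)

component_counts = [2, 10, 20, 40, 64]  # 64 keeps every dimension
errors = []
for n in component_counts:
    pca = PCA(n_components=n, svd_solver='full')
    # Project down to n components, then map back to the 64-dimensional space
    X_back = pca.inverse_transform(pca.fit_transform(X_scaled))
    errors.append(mean_squared_error(X_scaled, X_back))
    print(f"{n:2d} components -> reconstruction MSE: {errors[-1]:.4f}")
```

The error shrinks as components are added and is numerically zero once all 64 are kept, since nothing is discarded.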


47. Train a KNN Classifier and visualize the decision boundary?
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Load the Iris dataset
iris = load_iris()
X, y = iris.data[:, :2], iris.target  # Select first two features for 2D visualization

# Split dataset (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train KNN Classifier
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_scaled, y_train)

# Create a mesh grid for plotting decision boundary
x_min, x_max = X_train_scaled[:, 0].min() - 1, X_train_scaled[:, 0].max() + 1
y_min, y_max = X_train_scaled[:, 1].min() - 1, X_train_scaled[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100), np.linspace(y_min, y_max, 100))

# Predict labels for each point in the mesh grid
Z = knn.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Plot the decision boundary
plt.figure(figsize=(8, 6))
plt.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.Paired)
plt.scatter(X_train_scaled[:, 0], X_train_scaled[:, 1], c=y_train, edgecolor='k', cmap=plt.cm.Paired, label="Train")
plt.scatter(X_test_scaled[:, 0], X_test_scaled[:, 1], c=y_test, cmap=plt.cm.Paired, marker='x', label="Test")  # 'x' is unfilled, so no edgecolor
plt.xlabel('Feature 1 (Standardized)')
plt.ylabel('Feature 2 (Standardized)')
plt.title('KNN Decision Boundary (k=5)')
plt.legend()
plt.grid(True)
plt.show()


48. Train a PCA model and analyze the effect of different numbers of components on data variance.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_breast_cancer

# Load the Breast Cancer dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Standardize the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Apply PCA (keep all components)
pca = PCA()
X_pca = pca.fit_transform(X_scaled)

# Compute explained variance ratio
explained_variance = pca.explained_variance_ratio_
cumulative_variance = np.cumsum(explained_variance)

# Plot Explained Variance vs. Number of Components
plt.figure(figsize=(8, 5))
plt.plot(range(1, len(explained_variance) + 1), cumulative_variance, marker='o', linestyle='--', color='b')
plt.axhline(y=0.95, color='r', linestyle='--', label='95% Variance Threshold')
plt.xlabel('Number of Principal Components')
plt.ylabel('Cumulative Explained Variance')
plt.title('Effect of PCA Components on Data Variance')
plt.grid(True)
plt.legend()
plt.show()

# Print the number of components required for 95% variance
n_components_95 = np.argmax(cumulative_variance >= 0.95) + 1
print(f"Number of components required to retain 95% variance: {n_components_95}")
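
scikit-learn's `PCA` can also pick the component count itself: a float `n_components` between 0 and 1 selects enough components to exceed that share of variance. The sketch below compares this shortcut with the manual `argmax` route used above; `svd_solver='full'` is passed explicitly as a conservative assumption so the behavior does not depend on the solver default.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_breast_cancer().data)

# Manual route: fit all components, then find the first cumulative ratio >= 0.95
full_pca = PCA(svd_solver='full').fit(X_scaled)
cumulative = np.cumsum(full_pca.explained_variance_ratio_)
n_manual = int(np.argmax(cumulative >= 0.95)) + 1

# Shortcut: a float in (0, 1) asks PCA for enough components to exceed that ratio
pca_95 = PCA(n_components=0.95, svd_solver='full').fit(X_scaled)

print(f"Manual count: {n_manual}, PCA(n_components=0.95): {pca_95.n_components_}")
print(f"Variance retained: {pca_95.explained_variance_ratio_.sum():.4f}")
```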
