*1. What is a Support Vector Machine (SVM)?*

A Support Vector Machine (SVM) is a supervised machine learning algorithm used primarily for classification tasks, though it can also be used for regression. Its main goal is to find the optimal hyperplane that best separates data points of different classes in a high-dimensional space.

Key Concepts:
Hyperplane: A decision boundary that separates classes. In 2D, it's a line; in 3D, a plane; and in higher dimensions, it's called a hyperplane.

Support Vectors: The data points that are closest to the hyperplane. These points are critical because they define the position and orientation of the hyperplane.

Margin: The distance between the hyperplane and the nearest data points from each class. SVM aims to maximize this margin for better generalization.

Linear vs Non-linear SVM:

Linear SVM: Works well when data is linearly separable.

Non-linear SVM: Uses kernel functions (like RBF, polynomial, sigmoid) to transform data into a higher-dimensional space where it becomes linearly separable.

Kernel Trick: A mathematical technique that allows SVMs to operate in a high-dimensional space without explicitly computing the coordinates. It enables SVMs to handle non-linearly separable data efficiently.

*2. What is the difference between Hard Margin and Soft Margin SVM ?*

The difference between Hard Margin and Soft Margin SVM lies in how strictly the algorithm separates the classes when training the model.

🔷 Hard Margin SVM
Assumes that the data is perfectly linearly separable.

The algorithm finds the maximum-margin hyperplane with no tolerance for misclassification.

No data points are allowed inside the margin or on the wrong side of the hyperplane.

Sensitive to noise and outliers—even a single misclassified point can make a hard margin infeasible.

Use case: Clean, linearly separable datasets without outliers.

🔶 Soft Margin SVM
Allows some misclassifications to achieve better generalization on real-world, noisy data.

Introduces a regularization parameter (C) that controls the trade-off between maximizing the margin and minimizing classification errors:

High C → less tolerance for misclassification (closer to hard margin).

Low C → more tolerance for errors, allowing a wider margin.

More robust in practice and applicable to most real-world problems.

Use case: Datasets with noise, overlap, or imperfect separability.
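
The trade-off that C controls can be made concrete by evaluating the soft-margin objective directly. The snippet below is a minimal sketch rather than a training routine: the three data points, the fixed weights `w` and `b`, and the helper names are all assumptions chosen for illustration. It computes `0.5 * ||w||^2 + C * (total hinge loss)` for a small and a large C.

```python
# Soft-margin objective: 0.5 * ||w||^2 + C * sum(max(0, 1 - y_i * (w.x_i + b)))
# Data, weights, and helper names are illustrative assumptions.

def decision_value(w, b, x):
    """Return w.x + b for one sample."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def soft_margin_objective(w, b, X, y, C):
    """Margin term plus C times the total hinge loss."""
    margin_term = 0.5 * sum(wi * wi for wi in w)
    hinge_total = sum(
        max(0.0, 1.0 - yi * decision_value(w, b, xi))
        for xi, yi in zip(X, y)
    )
    return margin_term + C * hinge_total

X = [(2.0, 2.0), (-2.0, -2.0), (0.5, 0.5)]  # the last point sits on the wrong side
y = [1, -1, -1]
w, b = (0.5, 0.5), 0.0

low_c = soft_margin_objective(w, b, X, y, C=0.1)
high_c = soft_margin_objective(w, b, X, y, C=10.0)
print(f"C=0.1  -> objective {low_c:.2f}")
print(f"C=10.0 -> objective {high_c:.2f}")
```

With a small C the single violation barely changes the objective, so a wide margin is still preferred. With a large C the same violation dominates, which pushes the optimizer toward a boundary that classifies that point correctly.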

*3. What is the mathematical intuition behind SVM ?*

The mathematical intuition behind SVM revolves around finding the optimal hyperplane that best separates data points from different classes by maximizing the margin—the distance between the hyperplane and the nearest points (support vectors) from each class.
SVMs maximize margin and use support vectors to define the decision boundary. The optimization balances margin width (simplicity) and classification error (accuracy), with extensions to non-linear problems via kernel functions.
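
The link between a wide margin and a small weight vector can be written out in one line. This is a sketch under the usual canonical scaling, which is an assumption not stated above: the closest points are taken to satisfy $y_i (w^T x_i + b) = 1$.

$$
\text{dist}(x_i, \text{hyperplane}) = \frac{|w^T x_i + b|}{\lVert w \rVert}
\;\Rightarrow\;
\text{margin} = \frac{2}{\lVert w \rVert}
\;\Rightarrow\;
\max_{w,\,b} \frac{2}{\lVert w \rVert} \;\equiv\; \min_{w,\,b} \frac{1}{2}\lVert w \rVert^2
$$

Maximizing the margin is therefore the same as minimizing $\frac{1}{2}\lVert w \rVert^2$, subject to every point lying on the correct side of its margin boundary.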

*4. What is the role of Lagrange Multipliers in SVM ?*

The role of Lagrange multipliers in SVM is to solve the constrained optimization problem efficiently using duality theory from convex optimization. This approach converts the original primal problem into a dual problem that's often easier to solve—especially when using kernels for non-linear SVMs.

🔹 Recap: The Primal Problem (Hard Margin SVM)
$$
\min_{w,\,b} \; \frac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad y_i\,(w^T x_i + b) \ge 1 \quad \forall i
$$

🔹 Step 1: Introduce Lagrange Multipliers
We incorporate the constraints using Lagrange multipliers $\alpha_i \ge 0$:

$$
L(w, b, \alpha) = \frac{1}{2}\lVert w \rVert^2 - \sum_{i=1}^{n} \alpha_i \left[ y_i\,(w^T x_i + b) - 1 \right]
$$

Here:

$L$ is the Lagrangian function

$\alpha_i$ are Lagrange multipliers, one per constraint

🔹 Step 2: Solve the Dual Problem
We take the partial derivatives of $L$ with respect to $w$ and $b$, set them to zero, and substitute back into $L$, giving the dual optimization problem:

$$
\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, x_i^T x_j
$$

subject to:

$$
\sum_{i=1}^{n} \alpha_i y_i = 0, \qquad \alpha_i \ge 0
$$

This dual form has key advantages:

Depends only on dot products of input vectors, enabling the kernel trick.

Often more computationally efficient for high-dimensional data.
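
For reference, the two stationarity conditions used in that substitution can be sketched as follows. They use the hard-margin Lagrangian with multipliers $\alpha_i$ described in Step 1:

$$
\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{n} \alpha_i y_i x_i,
\qquad
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{n} \alpha_i y_i = 0
$$

Substituting the first condition into the classifier gives the decision function

$$
f(x) = \operatorname{sign}\left( \sum_{i=1}^{n} \alpha_i y_i \, x_i^T x + b \right)
$$

which touches the training data only through dot products $x_i^T x$. That is the property the kernel trick relies on.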


*5. What are Support Vectors in SVM ?*

Support Vectors in SVM are the critical data points that lie closest to the decision boundary (the hyperplane). They are the "supporting" elements that define the position and orientation of the optimal hyperplane.

🔹 Key Characteristics:
Closest Points to the Hyperplane:

These are the data points that lie exactly on the margin boundaries in the hard-margin case. In the soft-margin case they also include points inside the margin and points on the wrong side of the hyperplane.

Non-zero Lagrange Multipliers:

In the dual formulation of SVM, only data points with non-zero $\alpha_i$ are support vectors.

These are the only points that affect the model—others are irrelevant to the final decision boundary.

Decision Boundary Depends Only on Them:

The optimal hyperplane is completely determined by the support vectors. If you remove non-support vectors, the hyperplane remains unchanged.
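
The claim that only support vectors matter can be checked on a tiny example. The sketch below assumes a hand-picked three-point dataset and hand-computed multipliers (`alpha`), not the output of a real solver. It rebuilds the weight vector from `w = sum(alpha_i * y_i * x_i)` twice, once with all points and once with the zero-multiplier point removed.

```python
# Toy hard-margin solution. The points and alpha values are chosen by hand
# for illustration; they satisfy sum(alpha_i * y_i) = 0.
X = [(1.0, 1.0), (-1.0, -1.0), (3.0, 3.0)]
y = [1, -1, 1]
alpha = [0.25, 0.25, 0.0]  # the third point is not a support vector

def weights_from_dual(X, y, alpha):
    """Compute w = sum_i alpha_i * y_i * x_i."""
    dim = len(X[0])
    return [
        sum(a * yi * xi[d] for a, yi, xi in zip(alpha, y, X))
        for d in range(dim)
    ]

w_all = weights_from_dual(X, y, alpha)

# Keep only the support vectors (alpha_i > 0) and recompute.
support = [(xi, yi, a) for xi, yi, a in zip(X, y, alpha) if a > 0]
w_sv = weights_from_dual(
    [s[0] for s in support], [s[1] for s in support], [s[2] for s in support]
)

# Bias from any support vector s: b = y_s - w.x_s
b = y[0] - sum(wi * xi for wi, xi in zip(w_all, X[0]))

print("w from all points:      ", w_all)
print("w from support vectors: ", w_sv)
print("b:", b)
```

Both weight vectors come out identical, because the point with a zero multiplier contributes nothing to the sum.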

*6. What is a Support Vector Classifier (SVC) ?*

A Support Vector Classifier (SVC) is the practical implementation of the Support Vector Machine (SVM) algorithm for classification tasks.

While "SVM" refers to the broader concept or algorithm (including regression and classification), SVC specifically refers to its use in classification, especially as implemented in tools like scikit-learn's sklearn.svm.SVC.

*7. What is a Support Vector Regressor (SVR) ?*

A Support Vector Regressor (SVR) is the regression counterpart of Support Vector Machines (SVM). Instead of classifying data into categories, SVR is used to predict continuous values.

🔹 Core Idea of SVR
SVR tries to find a function $f(x)$ that deviates from the actual target $y$ by at most $\epsilon$ for all training data, while being as flat (simple) as possible.

Think of it as fitting a tube of width $2\epsilon$ around the data. The goal is to include as many data points as possible inside this tube (i.e., within the tolerance $\epsilon$), and penalize those that fall outside.
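
The tube idea corresponds to the epsilon-insensitive loss, which is zero for errors within the tolerance and grows linearly beyond it. A minimal sketch, with the function name and sample values chosen only for illustration:

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the tube, linear in the excess error outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

epsilon = 0.5
inside = epsilon_insensitive_loss(3.0, 3.25, epsilon)   # error 0.25, within the tube
outside = epsilon_insensitive_loss(3.0, 4.5, epsilon)   # error 1.5, outside the tube

print("loss inside tube: ", inside)
print("loss outside tube:", outside)
```

Only the second prediction is penalized, and only by the amount its error exceeds the tolerance.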

*8. What is the Kernel Trick in SVM ?*

The Kernel Trick is a powerful mathematical technique in SVM that allows it to perform non-linear classification or regression without explicitly transforming the data into higher-dimensional space.

🔹 The Problem
Some datasets are not linearly separable in their original input space. To separate them with a linear hyperplane, you’d need to map them to a higher-dimensional space where they become linearly separable.

🔹 The Challenge
Explicitly mapping data to a higher-dimensional space (via a function $\phi(x)$) is:

Computationally expensive

Sometimes even infeasible, especially if the feature space is infinite-dimensional

🔹 The Solution: The Kernel Trick
Instead of computing $\phi(x)^T \phi(z)$, we use a kernel function:

$$
K(x, z) = \langle \phi(x), \phi(z) \rangle
$$

This lets SVM work implicitly in high-dimensional space without ever computing $\phi(x)$ directly.
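
A small numerical check makes this less abstract. The sketch below assumes 2-D inputs and the homogeneous degree-2 polynomial kernel $K(x, z) = (x^T z)^2$, whose explicit feature map is $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$. The function names are illustrative.

```python
import math

def phi(x):
    """Explicit degree-2 feature map for a 2-D input."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def poly2_kernel(x, z):
    """K(x, z) = (x.z)^2, computed without ever building phi(x)."""
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (3.0, 4.0)
explicit = dot(phi(x), phi(z))   # map first, then take the inner product
implicit = poly2_kernel(x, z)    # kernel applied to the raw inputs

print("explicit:", explicit)
print("implicit:", implicit)
```

Both routes give the same inner product (up to floating-point rounding), but the kernel version never constructs the 3-D vectors.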

*9. Compare Linear Kernel, Polynomial Kernel, and RBF Kernel ?*

 1. Linear Kernel
Definition:

$$
K(x, z) = x^T z
$$

Intuition:
No transformation; the decision boundary is a straight line or hyperplane.

Assumes data is linearly separable.

Pros:
Fast and simple

Works well for high-dimensional, sparse data (e.g., text classification)

No kernel hyperparameters to tune (only the regularization parameter C)

Cons:
Cannot capture non-linear relationships

Use Case:
Linearly separable data

Text data (e.g., TF-IDF vectors)


 2. Polynomial Kernel
Definition:

$$
K(x, z) = (x^T z + c)^d
$$

Intuition:
Maps input data into a higher-degree polynomial space

Captures non-linear patterns using curved decision boundaries

Pros:
More flexible than linear

Can model interactions between features

Cons:
Sensitive to degree (d) and coefficient (c)

Higher-degree polynomials can lead to overfitting

Slower than linear kernel

Use Case:
Data with non-linear but structured relationships

When interactions between features are important


 3. RBF (Radial Basis Function) Kernel
Definition:

$$
K(x, z) = \exp\left(-\gamma \lVert x - z \rVert^2\right)
$$

Intuition:
Measures similarity based on distance

Projects data into an infinite-dimensional space

Can model very complex boundaries

Pros:
Highly flexible and powerful

Can fit very complex, non-linear data

Cons:
Requires tuning of gamma and C

Can overfit if not properly regularized

Less interpretable

Use Case:
General-purpose, non-linear problems

When you don’t know the data structure well in advance
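
The three definitions translate almost directly into code. This is a minimal sketch in plain Python. The default values for `c`, `d`, and `gamma` are arbitrary choices for illustration, not recommended settings.

```python
import math

def linear_kernel(x, z):
    """K(x, z) = x.z"""
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, c=1.0, d=2):
    """K(x, z) = (x.z + c)^d"""
    return (linear_kernel(x, z) + c) ** d

def rbf_kernel(x, z, gamma=0.5):
    """K(x, z) = exp(-gamma * ||x - z||^2)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

x, z = (1.0, 2.0), (3.0, 4.0)
k_lin = linear_kernel(x, z)
k_poly = polynomial_kernel(x, z)
k_rbf = rbf_kernel(x, z)

print("linear:    ", k_lin)
print("polynomial:", k_poly)
print("rbf:       ", k_rbf)
```

The linear and polynomial values grow with the dot product, while the RBF value always lies in (0, 1] and depends only on the distance between the two points.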

*10. What is the effect of the C parameter in SVM ?*

The C parameter in SVM controls the trade-off between maximizing the margin and minimizing classification error. It is a regularization parameter that balances model complexity against training accuracy.

Effect of C:

| C Value     | Behavior                             | Model Effect                                                                    |
| ----------- | ------------------------------------ | ------------------------------------------------------------------------------- |
| **Large C** | Less tolerance for misclassification | Narrow margin, fits training data tightly → can **overfit**                     |
| **Small C** | More tolerance for misclassification | Wider margin, allows some errors → often **generalizes better**, but can **underfit** if C is too small |

*11. What is the role of the Gamma parameter in RBF Kernel SVM ?*

The Gamma (γ) parameter in the RBF kernel (Radial Basis Function kernel) plays a crucial role in controlling the spread or influence of a single training point in the decision boundary.

What Does Gamma Control?
Gamma defines how much influence a single training point has on the decision boundary.

It controls the width of the Gaussian (RBF) function used to measure the similarity between data points.

Specifically, it controls the distance within which points are considered influential in shaping the decision boundary.

Mathematically, for two points $x$ and $z$, the RBF kernel function is:

$$
K(x, z) = \exp\left(-\gamma \lVert x - z \rVert^2\right)
$$

As $\gamma$ increases, the influence of individual points becomes more localized, and the model is more sensitive to small variations in the data.

As $\gamma$ decreases, the influence of each point spreads out, leading to a smoother decision boundary that may be less sensitive to noise.

Intuitive Explanation
High Gamma:

Each training point's influence is very localized, so the decision boundary is influenced by individual points. This can lead to a complicated and overfitted model.

In this case, the model can fit the training data tightly, even capturing noise and outliers, making it less likely to generalize well on new data.

Low Gamma:

Each point's influence is spread over a larger area, resulting in a smoother decision boundary. This leads to a more generalized model that might underfit the data if the boundary is too simple for the underlying patterns.
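
To see how quickly influence falls off, the RBF kernel can be evaluated at one fixed distance for several gamma values. The distance and the gamma values below are arbitrary, picked only to show the contrast.

```python
import math

def rbf_similarity(distance, gamma):
    """RBF kernel value for two points separated by `distance`."""
    return math.exp(-gamma * distance ** 2)

distance = 1.0
gammas = [0.01, 1.0, 100.0]
similarities = {g: rbf_similarity(distance, g) for g in gammas}

for g, s in similarities.items():
    print(f"gamma={g:<6} K={s:.6f}")
```

At low gamma, two points one unit apart still look almost identical to the kernel. At high gamma their similarity is effectively zero, so each training point only shapes the boundary in its immediate neighborhood.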

*12. What is the Naïve Bayes classifier, and why is it called "Naïve" ?*

The Naïve Bayes classifier is a probabilistic machine learning algorithm based on Bayes' Theorem, which is used for classification tasks. It is particularly popular for text classification (e.g., spam filtering, sentiment analysis) due to its simplicity and efficiency.

Bayes' Theorem:
At its core, the Naïve Bayes classifier applies Bayes' Theorem to calculate the probability of each class given the features of the input data. Bayes' Theorem is:

$$
P(C_k \mid X) = \frac{P(X \mid C_k) \cdot P(C_k)}{P(X)}
$$

Where:

$P(C_k \mid X)$: The posterior probability of class $C_k$ given the features $X$.

$P(X \mid C_k)$: The likelihood of the features $X$ given class $C_k$.

$P(C_k)$: The prior probability of class $C_k$.

$P(X)$: The marginal likelihood or probability of the features (which acts as a normalization factor).

The classifier predicts the class that maximizes the posterior probability $P(C_k \mid X)$.



The "Naïve" Assumption:
The "naïve" part comes from the assumption that the features are conditionally independent given the class label. This assumption means that the presence (or absence) of a particular feature in the input data is assumed to be independent of the presence or absence of other features, given the class.

This is a strong and often unrealistic assumption, hence the term "naïve." Despite its simplicity, this assumption works surprisingly well in many practical situations, especially when the features are weakly correlated.

Mathematically, for $n$ features $X = (x_1, x_2, \ldots, x_n)$, the classifier assumes:

$$
P(X \mid C_k) = P(x_1, x_2, \ldots, x_n \mid C_k) = \prod_{i=1}^{n} P(x_i \mid C_k)
$$

This drastically simplifies the computation of the likelihood $P(X \mid C_k)$, as you only need to consider the individual likelihoods of each feature $x_i$ given the class $C_k$.

 The classifier is called "naïve" because of the strong assumption that all features are independent given the class label, which is typically not true in real-world data. For example, in text classification, the presence of one word (e.g., "free") often depends on the presence of another word (e.g., "money"), but the Naïve Bayes classifier ignores these dependencies.

Despite this unrealistic assumption, the model works quite well in practice, especially when the features are relatively independent or when dependencies are weak.
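
The whole classifier fits in a few lines once the independence assumption is accepted. The sketch below uses invented priors and per-word likelihoods (not estimated from any dataset) and scores a message containing both "free" and "money".

```python
# All probabilities are made up for illustration.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {  # P(word present | class)
    "spam": {"free": 0.8, "money": 0.6},
    "ham": {"free": 0.1, "money": 0.2},
}

def posterior(words, priors, likelihoods):
    """Multiply the prior by each word likelihood, then normalize."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for word in words:
            score *= likelihoods[label][word]  # the naive independence step
        scores[label] = score
    total = sum(scores.values())               # plays the role of P(X)
    return {label: s / total for label, s in scores.items()}

post = posterior(["free", "money"], priors, likelihoods)
prediction = max(post, key=post.get)

print(post)
print("predicted class:", prediction)
```

The product treats "free" and "money" as unrelated evidence. In real mail the two words tend to co-occur, so the resulting posterior is usually more extreme than it should be, even when the predicted class is right.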

*13. What is Bayes’ Theorem ?*

Bayes' Theorem is a fundamental theorem in probability theory that describes how to update the probability of a hypothesis (or event) based on new evidence. It provides a way to compute the posterior probability of an event, given prior knowledge and new data.

Bayes' Theorem Formula
Mathematically, Bayes' Theorem is expressed as:

$$
P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}
$$

Where:

$P(A \mid B)$ is the posterior probability: the probability of event $A$ happening given that event $B$ has occurred.

$P(B \mid A)$ is the likelihood: the probability of observing event $B$ given that event $A$ is true.

$P(A)$ is the prior probability: the initial probability of event $A$ before observing event $B$.

$P(B)$ is the marginal likelihood (or evidence): the total probability of observing event $B$ (across all possible causes).
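
A worked example helps show why the prior matters. The numbers below are hypothetical: a condition with 1% prevalence, a test that detects it 95% of the time, and a 5% false-positive rate. The helper expands $P(B)$ with the law of total probability.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(A | B), with P(B) = P(B|A)P(A) + P(B|not A)P(not A)."""
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence

# Hypothetical screening test: A = "has condition", B = "test is positive".
p_condition_given_positive = bayes_posterior(
    prior=0.01, likelihood=0.95, false_positive_rate=0.05
)
print(f"P(condition | positive) = {p_condition_given_positive:.3f}")
```

Even with an accurate test, the posterior stays well below 50% because the condition is rare. The low prior outweighs the strong likelihood.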

*14. Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes ?*

The Naïve Bayes classifier is a probabilistic model based on Bayes' Theorem. Different types of Naïve Bayes classifiers are used depending on the type of data being analyzed. The main variants of Naïve Bayes are Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes. These variants differ in the assumptions they make about the underlying data distributions.


1. Gaussian Naïve Bayes
Data Assumption:
Assumes that the features (i.e., the input data) are continuous and follow a Gaussian (Normal) distribution for each class.

Mathematical Model:
For a given feature $x_i$, the likelihood of $x_i$ given the class $C_k$ is modeled as a Gaussian distribution:

$$
P(x_i \mid C_k) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)
$$

Where:

$\mu$ is the mean of the feature $x_i$ for class $C_k$,

$\sigma^2$ is the variance of $x_i$ for class $C_k$.

Use Case:
Best suited for continuous, real-valued data, where the distribution of the features is approximately Gaussian.

Example:
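Classifying a house into a price band (e.g., low, medium, or high) based on continuous features like square footage, number of rooms, and age of the house. Naïve Bayes is a classifier, so the target must be a category rather than the price itself.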
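
The per-feature Gaussian likelihood is simple to evaluate by hand. A minimal sketch, assuming a single feature whose class-conditional mean and variance are already known (the values 5.0 and 1.0 are made up):

```python
import math

def gaussian_likelihood(x, mean, var):
    """Normal density with the given mean and variance, evaluated at x."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * var)
    return coeff * math.exp(-((x - mean) ** 2) / (2.0 * var))

at_mean = gaussian_likelihood(5.0, mean=5.0, var=1.0)
two_sd_away = gaussian_likelihood(7.0, mean=5.0, var=1.0)

print(f"density at the mean:        {at_mean:.4f}")
print(f"density two std devs away:  {two_sd_away:.4f}")
```

A value near the class mean gets a high likelihood and a value far from it gets a low one. Gaussian Naïve Bayes multiplies one such term per feature for each class.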

🔹 2. Multinomial Naïve Bayes
Data Assumption:
Assumes that the features are discrete and follow a Multinomial distribution. This is common when the features represent counts or frequencies of occurrences.

Mathematical Model:
Given a set of features $x = (x_1, x_2, \ldots, x_n)$, the likelihood of the feature vector given a class $C_k$ is modeled using the Multinomial distribution:

$$
P(x \mid C_k) = \frac{\left(\sum_{i=1}^{n} x_i\right)!}{\prod_{i=1}^{n} x_i!} \prod_{i=1}^{n} p_i^{x_i}
$$

Where:

$x_i$ is the frequency (count) of feature $i$ in a given sample,

$p_i$ is the probability of feature $i$ occurring in class $C_k$ (with $\sum_i p_i = 1$),

$\sum_i x_i$ is the total count across all features in the sample (e.g., the document length).

Use Case:
Most commonly used for document classification problems, where the features are word counts or term frequencies (TF) from text data.

Example:
Classifying a document as spam or not based on the frequency of words (e.g., “money,” “offer,” etc.).
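
In practice the multinomial likelihood is compared across classes in log space. The sketch below uses invented word probabilities and assumes equal class priors. It leaves out the factorial coefficient, which depends only on the counts and is therefore the same for every class.

```python
import math

def multinomial_log_score(counts, word_probs):
    """Sum of x_i * log(p_i); the class-independent coefficient is dropped."""
    return sum(x * math.log(word_probs[w]) for w, x in counts.items())

# Word counts for one document; per-class word probabilities are made up.
doc = {"money": 2, "offer": 1, "meeting": 0}
p_spam = {"money": 0.5, "offer": 0.4, "meeting": 0.1}
p_ham = {"money": 0.1, "offer": 0.2, "meeting": 0.7}

score_spam = multinomial_log_score(doc, p_spam)
score_ham = multinomial_log_score(doc, p_ham)

print(f"log-score spam: {score_spam:.3f}")
print(f"log-score ham:  {score_ham:.3f}")
```

The class with the higher log-score wins. Working with sums of logarithms avoids the numerical underflow that a long product of small probabilities would cause.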

🔹 3. Bernoulli Naïve Bayes
Data Assumption:
Assumes that the features are binary (i.e., they can take values 0 or 1), representing the presence or absence of a feature in a given sample.

Mathematical Model:
Given a binary feature vector $x = (x_1, x_2, \ldots, x_n)$, the likelihood of the feature vector given a class $C_k$ is modeled using the Bernoulli distribution:

$$
P(x \mid C_k) = \prod_{i=1}^{n} p_i^{x_i} (1 - p_i)^{1 - x_i}
$$

Where:

$p_i$ is the probability of feature $i$ being present in class $C_k$,

$x_i$ is a binary indicator for feature $i$ (1 if present, 0 if absent).

Use Case:
Suitable for binary or Boolean features, such as text classification problems where each feature represents the presence or absence of a particular word (often in bag-of-words models).

Example:
Classifying emails as spam or not based on the presence or absence of specific words (e.g., “free,” “discount”).
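
Unlike the multinomial model, the Bernoulli likelihood also rewards or penalizes a class for words that are absent. A minimal sketch with two features and invented probabilities:

```python
def bernoulli_likelihood(x, p):
    """Product of p_i for present features and (1 - p_i) for absent ones."""
    result = 1.0
    for xi, pi in zip(x, p):
        result *= pi if xi == 1 else (1.0 - pi)
    return result

# Feature order: ("free", "discount"). Probabilities are illustrative only.
p_spam = (0.8, 0.5)
p_ham = (0.1, 0.2)
email = (1, 0)  # contains "free", does not contain "discount"

lik_spam = bernoulli_likelihood(email, p_spam)  # 0.8 * (1 - 0.5)
lik_ham = bernoulli_likelihood(email, p_ham)    # 0.1 * (1 - 0.2)

print("P(email | spam):", lik_spam)
print("P(email | ham): ", lik_ham)
```

The missing word "discount" contributes a factor of $(1 - p_i)$ to each class, so absence counts as evidence too.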

*15. When should you use Gaussian Naïve Bayes over other variants ?*

You should consider using Gaussian Naïve Bayes (GNB) over other variants of Naïve Bayes (like Multinomial Naïve Bayes or Bernoulli Naïve Bayes) when your data exhibits the following characteristics:

1. Continuous Data:
Gaussian Naïve Bayes is ideal when your features are continuous variables (i.e., real-valued numbers).

This variant assumes that the data follows a Gaussian (Normal) distribution, so it works well when your features naturally follow this type of distribution, such as measurements like height, weight, or temperature.

Examples:

Classifying houses into price bands (e.g., low, medium, or high) based on continuous features like square footage, number of rooms, and age of the house.

Classifying medical conditions based on continuous health data like blood pressure, cholesterol level, and body mass index (BMI).

2. Feature Distribution Resembling Normal Distribution:
If you have continuous features and you suspect that each feature in a class follows a Gaussian distribution (or approximately so), then Gaussian Naïve Bayes is the right choice.

This is a reasonable starting point for many real-world measurements: quantities that arise as the sum of many small, independent effects tend to be approximately normal, as the Central Limit Theorem suggests.

Example:

Predicting whether a student will pass or fail an exam based on continuous study hours and prior performance, where the distribution of these features is likely normal.

3. Large Datasets:
GNB tends to perform well on large datasets with many continuous features because it assumes simple relationships (mean and variance) between the features and the class, which is computationally efficient.

The model is easy to train and doesn't require complex optimization, making it a good option for large-scale problems.

Example:

Classifying customer behaviors in an e-commerce store based on continuous features such as browsing time, number of items viewed, and transaction amounts.

4. Computational Efficiency:
Gaussian Naïve Bayes is computationally efficient and performs well in terms of speed, especially when the dataset contains many features. It calculates probabilities using simple parameters: mean and variance.

This makes it faster than more complex algorithms like SVM or Random Forest, which is useful when you're dealing with large datasets or require real-time predictions.

Example:

Classifying sensor data from IoT devices, where data from sensors (such as temperature or humidity) are continuous and need quick classification.

5. When Data is Well-Conditioned and Low Noise:
Gaussian Naïve Bayes assumes that features are independent and normally distributed within each class. If your data is well-conditioned (features are relatively clean, without much noise or outliers), Gaussian Naïve Bayes will perform well.

If the dataset has a lot of outliers or the features do not follow a Gaussian distribution, you may need to consider data transformation (e.g., a log or power transform to make features more Gaussian), or switch to Multinomial or Bernoulli Naïve Bayes when the features are really counts or binary indicators.

Example:

Predicting quality control in a manufacturing process where each feature (e.g., size, weight, and material properties) follows a normal distribution with few outliers.

*16. What are the key assumptions made by Naïve Bayes ?*

The Naïve Bayes classifier makes several key assumptions that are crucial to how it models data and computes probabilities. These assumptions simplify the model significantly, making it fast and effective in many scenarios—but they also introduce limitations.

Here are the key assumptions made by Naïve Bayes:

🔹 1. Feature Independence (The "Naïve" Assumption)
Assumption: All features are conditionally independent of each other given the class label.

Mathematically:

$$
P(x_1, x_2, \ldots, x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)
$$

This means the model assumes that knowing the value of one feature gives no information about any other feature, once we know the class.

Impact: This is a strong and often unrealistic assumption in real-world data (where features are often correlated), but it works surprisingly well in practice—especially when the correlations among features are similar across classes.

🔹 2. Class Conditional Independence of Features
This is an extension of the first assumption: for each class label $y$, the joint probability of the feature vector $\vec{x}$ is equal to the product of the individual feature probabilities conditioned on that class.

$$
P(\vec{x} \mid y) = P(x_1 \mid y) \cdot P(x_2 \mid y) \cdot \cdots \cdot P(x_n \mid y)
$$

Why it matters: It allows the model to compute the likelihood of a data point very efficiently, even in high dimensions.

🔹 3. Correct Model of Feature Distribution
Depending on the variant of Naïve Bayes, it makes an assumption about the distribution of features for each class:

Gaussian Naïve Bayes: Assumes each continuous feature follows a normal (Gaussian) distribution.

Multinomial Naïve Bayes: Assumes features represent count data, typically used in text classification.

Bernoulli Naïve Bayes: Assumes binary features (e.g., presence or absence of a word).

If the actual distribution deviates too much from the assumed model, the classifier’s performance may degrade.

🔹 4. All Features Contribute Equally and Independently
Each feature contributes independently and equally to the final classification. That means no feature is treated as inherently more important unless reflected in its conditional probability.

This can be limiting when some features are highly predictive and others are mostly noise.

🔹 5. No Interaction Terms or Feature Combinations
Since features are treated as independent, Naïve Bayes does not model interactions (e.g., $x_1 \times x_2$ effects).

This can be a limitation in domains where the interaction between features is critical for accurate prediction.

*17. What are the advantages and disadvantages of Naïve Bayes ?*

Advantages of Naïve Bayes
1. Simple and Fast
Training and prediction are extremely fast, especially with large datasets.

Time complexity is linear in the number of features and training examples.

2. Works Well with High-Dimensional Data
Performs well even when the number of features is very large (e.g., in text classification).

Often used in spam detection, sentiment analysis, and document categorization.

3. Effective with Small Data
Requires relatively little training data to estimate model parameters (means and variances or frequencies).

Performs well with limited training data, especially when the assumptions hold.

4. Robust to Irrelevant Features
Even if some features are irrelevant (don’t contribute meaningfully), it can still perform well.

5. Handles Both Binary and Multiclass Classification
Naturally supports multiple classes without any modification.

6. Probabilistic Output
Provides not just a classification, but also the probability of each class, which is useful for decision-making under uncertainty.

❌ Disadvantages of Naïve Bayes
1. Strong Independence Assumption
Assumes conditional independence between features, which is often violated in practice.

If features are highly correlated, performance may degrade.

2. Zero-Frequency Problem
If a feature value is not present in the training data for a given class, the model assigns it zero probability, which can lead to incorrect classifications.

This is usually handled with Laplace smoothing (adding 1 to all counts).

3. Limited Expressiveness
Doesn’t model interactions between features (e.g., "if feature A and B occur together").

Can underperform when decision boundaries are highly nonlinear or complex.

4. Not Good with Continuous, Non-Gaussian Data (Unless Transformed)
Gaussian Naïve Bayes assumes that features are normally distributed. If the actual data distribution is non-Gaussian, accuracy may drop.

May require feature scaling or transformations.

5. Overconfident Probabilities
Due to the independence assumption, the probability estimates can be unrealistically high or low, making them poorly calibrated.

*18. Why is Naïve Bayes a good choice for text classification ?*

Naïve Bayes is a popular and effective choice for text classification due to several key strengths that align well with the nature of text data:

✅ Why Naïve Bayes Works Well for Text Classification
1. High Dimensionality Compatibility
Text data often has thousands to millions of features (words or tokens).

Naïve Bayes handles this effortlessly because its computational complexity is linear with respect to the number of features.

2. Assumption of Feature Independence is Reasonable
In text classification, features (words) are typically modeled as independent given the document class.

While words in natural language aren't truly independent, this naïve assumption surprisingly works well in practice—especially with bag-of-words or TF-IDF representations.

3. Robust to Irrelevant Features
In text, many words (features) may be irrelevant to the class.

Naïve Bayes is robust to noisy and irrelevant features because it evaluates each word independently, so a few noisy words don’t drastically skew predictions.

4. Efficient with Sparse Data
Text datasets are typically sparse (most word counts are zero).

Naïve Bayes naturally handles sparse matrices well, making it memory- and speed-efficient.

5. Works Well with Small Data
Even with small training sets, Naïve Bayes can build effective models, especially when classes are well-separated by distinctive words.

6. Supports Multiclass Classification
Many real-world text problems (e.g., news topic classification, intent recognition) involve multiple classes.

Naïve Bayes naturally supports multiclass classification without any special modification.

7. Simple, Interpretable, and Fast
Naïve Bayes is easy to implement, fast to train and predict, and the learned model is interpretable (you can inspect which words are most informative for each class).

*19.  Compare SVM and Naïve Bayes for classification tasks ?*

Core Comparison: SVM vs. Naïve Bayes


| **Aspect**                   | **SVM (Support Vector Machine)**                             | **Naïve Bayes**                                                    |
| ---------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------------ |
| **Model Type**               | Discriminative (learns decision boundary)                    | Generative (learns data distribution given a class)                |
| **Assumptions**              | No specific assumptions; finds optimal separating hyperplane | Assumes conditional independence between features                  |
| **Performance**              | High accuracy, especially with clear margin separation       | Fast and performs well with simple and high-dimensional data       |
| **Speed (Training)**         | Slower, especially on large datasets                         | Extremely fast (especially with Multinomial or Bernoulli variants) |
| **Speed (Prediction)**       | Slower, especially with non-linear kernels                   | Very fast                                                          |
| **Handling High Dimensions** | Good (especially with kernel trick)                          | Excellent (text data, sparse matrices)                             |
| **Noise Sensitivity**        | Sensitive to noisy data and outliers                         | More robust to noise due to probabilistic nature                   |
| **Interpretability**         | Less interpretable (complex decision boundaries)             | More interpretable (based on feature likelihoods)                  |
| **Probabilistic Output**     | Not inherently probabilistic (but can be calibrated)         | Outputs direct class probabilities                                 |
| **Multi-class Support**      | Requires strategy (e.g., One-vs-All, One-vs-One)             | Naturally supports multi-class classification                      |


*20. How does Laplace Smoothing help in Naïve Bayes?*

Laplace Smoothing (also known as add-one smoothing) helps in Naïve Bayes by addressing the zero-frequency problem, which occurs when a categorical feature value is not observed in the training data for a given class. Without smoothing, this leads to zero probability for the entire observation—effectively causing the model to ignore otherwise relevant information.


Laplace Smoothing adds a small constant (usually 1) to each count to avoid zero probabilities.

For a categorical feature:

$$
P(x_i = w \mid y) = \frac{\text{count}(w, y) + 1}{\sum_{w'} \text{count}(w', y) + V}
$$

Where:

$\text{count}(w, y)$ = number of times word $w$ appears in class $y$

$V$ = number of possible unique feature values (e.g., vocabulary size)

Adding 1 ensures no probability is zero.

Without smoothing: unseen words/features have zero probability, ruining predictions.

With smoothing: all words/features have non-zero probabilities, so rare or unseen values are still assigned small but nonzero likelihood.
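
The formula can be tried on a three-word vocabulary. In this sketch the counts are invented and the helper name is illustrative. The `alpha` parameter generalizes the "+1" so that `alpha=1` reproduces Laplace smoothing as described above.

```python
def smoothed_probability(word, counts, vocabulary, alpha=1):
    """(count(w, y) + alpha) / (total count in class + alpha * V)."""
    total = sum(counts.get(w, 0) for w in vocabulary)
    return (counts.get(word, 0) + alpha) / (total + alpha * len(vocabulary))

vocabulary = ["free", "money", "hello"]
spam_counts = {"free": 3, "money": 1}  # "hello" never appears in spam

p_free = smoothed_probability("free", spam_counts, vocabulary)
p_hello = smoothed_probability("hello", spam_counts, vocabulary)
p_total = sum(smoothed_probability(w, spam_counts, vocabulary) for w in vocabulary)

print(f"P(free | spam)  = {p_free:.4f}")
print(f"P(hello | spam) = {p_hello:.4f}")
print(f"sum over vocab  = {p_total:.4f}")
```

The unseen word "hello" now gets probability 1/7 instead of 0, and the smoothed probabilities still sum to 1 over the vocabulary.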

In [None]:
# 21  Write a Python program to train an SVM Classifier on the Iris dataset and evaluate accuracy.

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Step 1: Load the Iris dataset
iris = datasets.load_iris()
X = iris.data  # Features
y = iris.target  # Labels

# Step 2: Split into train and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 3: Create and train an SVM classifier
svm_clf = SVC(kernel='linear')  # You can try 'rbf', 'poly', etc.
svm_clf.fit(X_train, y_train)

# Step 4: Predict on the test set
y_pred = svm_clf.predict(X_test)

# Step 5: Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"SVM Classifier Accuracy on Iris Dataset: {accuracy:.2f}")


In [None]:
# 22 Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies.

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Step 1: Load the Wine dataset
wine = datasets.load_wine()
X = wine.data
y = wine.target

# Step 2: Split dataset into training and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 3: Train SVM with Linear Kernel
svm_linear = SVC(kernel='linear')
svm_linear.fit(X_train, y_train)
y_pred_linear = svm_linear.predict(X_test)
acc_linear = accuracy_score(y_test, y_pred_linear)

# Step 4: Train SVM with RBF Kernel
svm_rbf = SVC(kernel='rbf')
svm_rbf.fit(X_train, y_train)
y_pred_rbf = svm_rbf.predict(X_test)
acc_rbf = accuracy_score(y_test, y_pred_rbf)

# Step 5: Compare and print accuracies
print(f"Accuracy with Linear Kernel: {acc_linear:.2f}")
print(f"Accuracy with RBF Kernel:    {acc_rbf:.2f}")

if acc_linear > acc_rbf:
    print("➡️ Linear kernel performed better.")
elif acc_rbf > acc_linear:
    print("➡️ RBF kernel performed better.")
else:
    print("➡️ Both kernels performed equally.")


In [None]:
# 23 Write a Python program to train an SVM Regressor (SVR) on a housing dataset and evaluate it using Mean Squared Error (MSE)

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

# Step 1: Load the California Housing dataset
data = fetch_california_housing()
X = data.data
y = data.target

# Step 2: Train-test split (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 3: Feature scaling (important for SVR)
scaler_X = StandardScaler()
scaler_y = StandardScaler()

X_train_scaled = scaler_X.fit_transform(X_train)
X_test_scaled = scaler_X.transform(X_test)

y_train_scaled = scaler_y.fit_transform(y_train.reshape(-1, 1)).ravel()
y_test_scaled = scaler_y.transform(y_test.reshape(-1, 1)).ravel()

# Step 4: Train the SVR model (with RBF kernel)
svr = SVR(kernel='rbf')
svr.fit(X_train_scaled, y_train_scaled)

# Step 5: Predict and inverse transform the target
y_pred_scaled = svr.predict(X_test_scaled)
y_pred = scaler_y.inverse_transform(y_pred_scaled.reshape(-1, 1)).ravel()

# Step 6: Evaluate using Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error (MSE) of SVR on Housing Dataset: {mse:.3f}")


In [None]:
# 24 Write a Python program to train an SVM Classifier with a Polynomial Kernel and visualize the decision boundary

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Step 1: Generate a synthetic 2D dataset
X, y = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0,
    n_clusters_per_class=1, class_sep=1.5, random_state=42
)

# Step 2: Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 3: Train SVM with a Polynomial kernel
svm_poly = SVC(kernel='poly', degree=3, C=1.0)
svm_poly.fit(X_train, y_train)

# Step 4: Create meshgrid for plotting decision boundary
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 300),
                     np.linspace(y_min, y_max, 300))
Z = svm_poly.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Step 5: Plot the decision boundary and data points
plt.figure(figsize=(8, 6))
plt.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.coolwarm)
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=plt.cm.coolwarm, edgecolors='k', label="Train")
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=plt.cm.coolwarm, marker='x', label="Test")
plt.title("SVM with Polynomial Kernel (degree=3)")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()


In [None]:
# 25  Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and evaluate accuracy

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Step 1: Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Step 2: Split into train and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 3: Train Gaussian Naïve Bayes classifier
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Step 4: Predict and evaluate
y_pred = gnb.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Accuracy of Gaussian Naïve Bayes on Breast Cancer Dataset: {accuracy:.2f}")


In [None]:
# 26 Write a Python program to train a Multinomial Naïve Bayes classifier for text classification using the 20 Newsgroups dataset.

from sklearn.datasets import fetch_20newsgroups
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report

# Step 1: Load the 20 Newsgroups dataset
newsgroups = fetch_20newsgroups(subset='all')  # Use 'all' for training and testing

X = newsgroups.data  # Text data
y = newsgroups.target  # Labels (20 classes)

# Step 2: Split dataset into training and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 3: Convert text data to feature vectors using CountVectorizer
vectorizer = CountVectorizer(stop_words='english')  # Remove common stop words
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# Step 4: Train Multinomial Naïve Bayes classifier
nb_classifier = MultinomialNB()
nb_classifier.fit(X_train_vec, y_train)

# Step 5: Predict and evaluate the model
y_pred = nb_classifier.predict(X_test_vec)

# Step 6: Evaluate accuracy and print classification report
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of Multinomial Naïve Bayes on 20 Newsgroups dataset: {accuracy:.4f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=newsgroups.target_names))


In [None]:
# 27 Write a Python program to train an SVM Classifier with different C values and compare the decision boundaries visually

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Step 1: Generate a synthetic 2D dataset
X, y = make_classification(n_samples=200, n_features=2, n_informative=2, n_redundant=0,
                            n_clusters_per_class=1, class_sep=1.5, random_state=42)

# Step 2: Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 3: Define different values for the C parameter
C_values = [0.1, 1, 10]

# Step 4: Create subplots to visualize decision boundaries
fig, axes = plt.subplots(1, 3, figsize=(18, 6))

# Step 5: Train SVM Classifiers with different C values and plot decision boundaries
for i, C in enumerate(C_values):
    # Train SVM classifier
    svm_clf = SVC(kernel='linear', C=C)
    svm_clf.fit(X_train, y_train)

    # Create meshgrid for plotting decision boundary
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 300),
                         np.linspace(y_min, y_max, 300))
    Z = svm_clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    # Plot decision boundary
    ax = axes[i]
    ax.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.coolwarm)
    ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=plt.cm.coolwarm, edgecolors='k', label="Train")
    ax.set_title(f"SVM with C={C}")
    ax.set_xlabel("Feature 1")
    ax.set_ylabel("Feature 2")

plt.tight_layout()
plt.show()


In [None]:
# 28 Write a Python program to train a Bernoulli Naïve Bayes classifier for binary classification on a dataset with binary features

from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification

# Step 1: Generate a synthetic dataset with binary features
X, y = make_classification(n_samples=1000, n_features=5, n_informative=3, n_redundant=0,
                            n_clusters_per_class=1, n_classes=2, random_state=42)

# Convert features to binary (Bernoulli Naive Bayes works with binary features)
X = (X > 0).astype(int)

# Step 2: Split dataset into training and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 3: Train Bernoulli Naïve Bayes classifier
bnb = BernoulliNB()
bnb.fit(X_train, y_train)

# Step 4: Predict and evaluate
y_pred = bnb.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

# Step 5: Print the accuracy
print(f"Accuracy of Bernoulli Naïve Bayes on Binary Feature Dataset: {accuracy:.4f}")


In [None]:
# 29.  Write a Python program to apply feature scaling before training an SVM model and compare results with unscaled data

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Step 1: Generate a synthetic 2D dataset
X, y = make_classification(n_samples=200, n_features=2, n_informative=2, n_redundant=0,
                            n_clusters_per_class=1, class_sep=1.5, random_state=42)

# Step 2: Split into training and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 3: Train SVM on unscaled data
svm_unscaled = SVC(kernel='linear')
svm_unscaled.fit(X_train, y_train)
y_pred_unscaled = svm_unscaled.predict(X_test)
accuracy_unscaled = accuracy_score(y_test, y_pred_unscaled)

# Step 4: Apply feature scaling using StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Step 5: Train SVM on scaled data
svm_scaled = SVC(kernel='linear')
svm_scaled.fit(X_train_scaled, y_train)
y_pred_scaled = svm_scaled.predict(X_test_scaled)
accuracy_scaled = accuracy_score(y_test, y_pred_scaled)

# Step 6: Print accuracies for comparison
print(f"Accuracy on unscaled data: {accuracy_unscaled:.4f}")
print(f"Accuracy on scaled data: {accuracy_scaled:.4f}")

# Step 7: Visualize the training and test points before and after scaling
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Plot for unscaled data
axes[0].set_title("SVM with Unscaled Data")
axes[0].scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=plt.cm.coolwarm, edgecolors='k')
axes[0].scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=plt.cm.coolwarm, marker='x', label="Test Data")
axes[0].set_xlabel("Feature 1")
axes[0].set_ylabel("Feature 2")
axes[0].legend()

# Plot for scaled data
axes[1].set_title("SVM with Scaled Data")
axes[1].scatter(X_train_scaled[:, 0], X_train_scaled[:, 1], c=y_train, cmap=plt.cm.coolwarm, edgecolors='k')
axes[1].scatter(X_test_scaled[:, 0], X_test_scaled[:, 1], c=y_test, cmap=plt.cm.coolwarm, marker='x', label="Test Data")
axes[1].set_xlabel("Feature 1 (Scaled)")
axes[1].set_ylabel("Feature 2 (Scaled)")
axes[1].legend()

plt.tight_layout()
plt.show()


In [None]:
# 30 Write a Python program to train a Gaussian Naïve Bayes model and compare the predictions before and after Laplace Smoothing.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Step 1: Generate a synthetic 2D dataset
X, y = make_classification(n_samples=1000, n_features=2, n_informative=2, n_redundant=0,
                            n_clusters_per_class=1, n_classes=2, random_state=42)

# Step 2: Split dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Note: Laplace (add-one) smoothing applies to count-based models such as MultinomialNB.
# GaussianNB has no Laplace smoothing; its closest analogue is var_smoothing, which adds
# a fraction of the largest feature variance to every variance for numerical stability.

# Step 3: Train Gaussian Naïve Bayes with the default smoothing (var_smoothing=1e-9)
gnb_default = GaussianNB()
gnb_default.fit(X_train, y_train)
y_pred_default = gnb_default.predict(X_test)
accuracy_default = accuracy_score(y_test, y_pred_default)

# Step 4: Train Gaussian Naïve Bayes with much stronger variance smoothing
gnb_smoothed = GaussianNB(var_smoothing=1e-1)
gnb_smoothed.fit(X_train, y_train)
y_pred_smoothed = gnb_smoothed.predict(X_test)
accuracy_smoothed = accuracy_score(y_test, y_pred_smoothed)

# Step 5: Compare accuracies and count how many predictions changed
print(f"Accuracy with default var_smoothing (1e-9): {accuracy_default:.4f}")
print(f"Accuracy with var_smoothing=1e-1: {accuracy_smoothed:.4f}")
print(f"Test predictions that differ: {np.sum(y_pred_default != y_pred_smoothed)}")


In [None]:
# 31 Write a Python program to train an SVM Classifier and use GridSearchCV to tune the hyperparameters (C, gamma, kernel)

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Step 1: Load the Iris dataset
data = load_iris()
X = data.data
y = data.target

# Step 2: Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 3: Set up the parameter grid for hyperparameter tuning
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': ['scale', 'auto', 0.01, 0.1, 1],
    'kernel': ['linear', 'rbf', 'poly']
}

# Step 4: Initialize the SVM classifier
svm = SVC()

# Step 5: Perform GridSearchCV for hyperparameter tuning
grid_search = GridSearchCV(estimator=svm, param_grid=param_grid, cv=5, n_jobs=-1)
grid_search.fit(X_train, y_train)

# Step 6: Get the best parameters and evaluate the model
best_params = grid_search.best_params_
print(f"Best hyperparameters found: {best_params}")

# Step 7: Retrieve the best model (GridSearchCV already refit it on the full training data)
best_svm = grid_search.best_estimator_

# Step 8: Evaluate the model on the test set
y_pred = best_svm.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

# Step 9: Print the accuracy of the best model
print(f"Accuracy of the tuned SVM model: {accuracy:.4f}")


In [None]:
# 32. Write a Python program to train an SVM Classifier on an imbalanced dataset, apply class weighting, and check whether it improves accuracy.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report

# Step 1: Generate an imbalanced dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=15, n_redundant=5,
                            n_classes=2, weights=[0.9, 0.1], flip_y=0, random_state=42)

# Step 2: Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 3: Train SVM Classifier without class weighting (baseline model)
svm_no_weight = SVC(kernel='linear', class_weight=None)
svm_no_weight.fit(X_train, y_train)
y_pred_no_weight = svm_no_weight.predict(X_test)
accuracy_no_weight = accuracy_score(y_test, y_pred_no_weight)

# Step 4: Train SVM Classifier with class weighting (to handle the imbalance)
svm_with_weight = SVC(kernel='linear', class_weight='balanced')
svm_with_weight.fit(X_train, y_train)
y_pred_with_weight = svm_with_weight.predict(X_test)
accuracy_with_weight = accuracy_score(y_test, y_pred_with_weight)

# Step 5: Print the results
print(f"Accuracy without class weighting: {accuracy_no_weight:.4f}")
print(f"Accuracy with class weighting: {accuracy_with_weight:.4f}")

# Step 6: Print classification reports for more detailed evaluation
print("\nClassification Report (No Weighting):")
print(classification_report(y_test, y_pred_no_weight))

print("\nClassification Report (With Class Weighting):")
print(classification_report(y_test, y_pred_with_weight))


In [None]:
# 33.  Write a Python program to implement a Naïve Bayes classifier for spam detection using email data

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report

# Step 1: Load the dataset (SMS Spam Collection, used here as a stand-in for email data)
# Dataset page: https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection
# Download and extract the archive below so that the file "SMSSpamCollection"
# is in the working directory before running this cell.

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00228/smsspamcollection.zip"

# Load the dataset
# The file is tab-separated with no header row: label, then message
df = pd.read_csv("SMSSpamCollection", sep='\t', header=None, names=['label', 'message'])

# Step 2: Preprocessing
# Convert the labels into binary form: 1 for 'spam' and 0 for 'ham'
df['label'] = df['label'].map({'ham': 0, 'spam': 1})

# Step 3: Split dataset into features (X) and labels (y)
X = df['message']
y = df['label']

# Step 4: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 5: Convert text data into numerical format using CountVectorizer (Bag of Words)
vectorizer = CountVectorizer(stop_words='english')
X_train_vectorized = vectorizer.fit_transform(X_train)
X_test_vectorized = vectorizer.transform(X_test)

# Step 6: Train the Naïve Bayes classifier
nb = MultinomialNB()
nb.fit(X_train_vectorized, y_train)

# Step 7: Make predictions on the test set
y_pred = nb.predict(X_test_vectorized)

# Step 8: Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")

# Step 9: Print a classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=['ham', 'spam']))


In [None]:
# 34. Write a Python program to train an SVM Classifier and a Naïve Bayes Classifier on the same dataset and compare their accuracy.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Step 1: Load the Iris dataset
data = load_iris()
X = data.data
y = data.target

# Step 2: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 3: Train the SVM Classifier
svm_classifier = SVC(kernel='linear')
svm_classifier.fit(X_train, y_train)
y_pred_svm = svm_classifier.predict(X_test)
svm_accuracy = accuracy_score(y_test, y_pred_svm)

# Step 4: Train the Naïve Bayes Classifier
nb_classifier = GaussianNB()
nb_classifier.fit(X_train, y_train)
y_pred_nb = nb_classifier.predict(X_test)
nb_accuracy = accuracy_score(y_test, y_pred_nb)

# Step 5: Compare the results
print(f"Accuracy of SVM Classifier: {svm_accuracy:.4f}")
print(f"Accuracy of Naïve Bayes Classifier: {nb_accuracy:.4f}")


In [None]:
# 35. Write a Python program to perform feature selection before training a Naïve Bayes classifier and compare results

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Step 1: Load the Iris dataset
data = load_iris()
X = data.data
y = data.target

# Step 2: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 3: Train the Naïve Bayes Classifier without feature selection
nb_classifier = GaussianNB()
nb_classifier.fit(X_train, y_train)
y_pred_no_selection = nb_classifier.predict(X_test)
accuracy_no_selection = accuracy_score(y_test, y_pred_no_selection)

# Step 4: Perform Feature Selection using SelectKBest
# Select top 2 features using the f_classif statistical test
selector = SelectKBest(score_func=f_classif, k=2)
X_train_selected = selector.fit_transform(X_train, y_train)
X_test_selected = selector.transform(X_test)

# Step 5: Train the Naïve Bayes Classifier on the selected features
nb_classifier_selected = GaussianNB()
nb_classifier_selected.fit(X_train_selected, y_train)
y_pred_with_selection = nb_classifier_selected.predict(X_test_selected)
accuracy_with_selection = accuracy_score(y_test, y_pred_with_selection)

# Step 6: Compare the results
print(f"Accuracy without feature selection: {accuracy_no_selection:.4f}")
print(f"Accuracy with feature selection: {accuracy_with_selection:.4f}")


In [None]:
# 36. Write a Python program to train an SVM Classifier using One-vs-Rest (OvR) and One-vs-One (OvO) strategies on the Wine dataset and compare their accuracy

import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.metrics import accuracy_score

# Step 1: Load the Wine dataset
data = load_wine()
X = data.data
y = data.target

# Step 2: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Note: SVC always trains one-vs-one internally; decision_function_shape only changes
# the shape of the decision_function output, not the predictions. Explicit wrappers
# are used here so that the two strategies are actually different.

# Step 3: Train the SVM Classifier using One-vs-Rest (OvR)
svm_ovr = OneVsRestClassifier(SVC(kernel='linear'))
svm_ovr.fit(X_train, y_train)
y_pred_ovr = svm_ovr.predict(X_test)
accuracy_ovr = accuracy_score(y_test, y_pred_ovr)

# Step 4: Train the SVM Classifier using One-vs-One (OvO)
svm_ovo = OneVsOneClassifier(SVC(kernel='linear'))
svm_ovo.fit(X_train, y_train)
y_pred_ovo = svm_ovo.predict(X_test)
accuracy_ovo = accuracy_score(y_test, y_pred_ovo)

# Step 5: Compare the results
print(f"Accuracy of SVM with One-vs-Rest: {accuracy_ovr:.4f}")
print(f"Accuracy of SVM with One-vs-One: {accuracy_ovo:.4f}")


In [None]:
# 37. Write a Python program to train an SVM Classifier using Linear, Polynomial, and RBF kernels on the Breast Cancer dataset and compare their accuracy

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Step 1: Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Step 2: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 3: Train the SVM Classifier with a Linear Kernel
svm_linear = SVC(kernel='linear')
svm_linear.fit(X_train, y_train)
y_pred_linear = svm_linear.predict(X_test)
accuracy_linear = accuracy_score(y_test, y_pred_linear)

# Step 4: Train the SVM Classifier with a Polynomial Kernel
svm_poly = SVC(kernel='poly', degree=3)  # degree=3 is a common choice for polynomial kernel
svm_poly.fit(X_train, y_train)
y_pred_poly = svm_poly.predict(X_test)
accuracy_poly = accuracy_score(y_test, y_pred_poly)

# Step 5: Train the SVM Classifier with an RBF Kernel
svm_rbf = SVC(kernel='rbf')
svm_rbf.fit(X_train, y_train)
y_pred_rbf = svm_rbf.predict(X_test)
accuracy_rbf = accuracy_score(y_test, y_pred_rbf)

# Step 6: Compare the results
print(f"Accuracy of SVM with Linear Kernel: {accuracy_linear:.4f}")
print(f"Accuracy of SVM with Polynomial Kernel: {accuracy_poly:.4f}")
print(f"Accuracy of SVM with RBF Kernel: {accuracy_rbf:.4f}")


In [None]:
# 38. Write a Python program to train an SVM Classifier using Stratified K-Fold Cross-Validation and compute the average accuracy

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Step 1: Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Step 2: Initialize Stratified K-Fold Cross-Validation (5-fold)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Step 3: Initialize the SVM Classifier
svm_classifier = SVC(kernel='linear')

# Step 4: Perform Stratified K-Fold Cross-Validation
accuracies = []
for train_index, test_index in skf.split(X, y):
    # Split data into training and testing sets for the current fold
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Train the SVM Classifier on the training data
    svm_classifier.fit(X_train, y_train)

    # Make predictions on the test data
    y_pred = svm_classifier.predict(X_test)

    # Calculate accuracy for the current fold
    fold_accuracy = accuracy_score(y_test, y_pred)
    accuracies.append(fold_accuracy)

# Step 5: Compute the average accuracy across all folds
average_accuracy = np.mean(accuracies)

# Step 6: Print the results
print(f"Accuracy for each fold: {accuracies}")
print(f"Average accuracy across all folds: {average_accuracy:.4f}")


In [None]:
# 39. Write a Python program to train a Naïve Bayes classifier using different prior probabilities and compare performance

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Step 1: Load the Iris dataset
data = load_iris()
X = data.data
y = data.target

# Step 2: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

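# Step 3: Define models with different prior probabilities (default vs custom)
# 3.1 Default priors: with priors=None, GaussianNB estimates them from the training class frequencies
nb_default = GaussianNB()

# 3.2 Custom priors (e.g., giving more weight to class 0)
priors_custom = [0.4, 0.3, 0.3]  # Must sum to 1; class 0 has a prior probability of 0.4
nb_custom = GaussianNB(priors=priors_custom)

# Step 4: Train and evaluate Naïve Bayes with the default (empirical) priors
nb_default.fit(X_train, y_train)
y_pred_default = nb_default.predict(X_test)
accuracy_default = accuracy_score(y_test, y_pred_default)

# Step 5: Train and evaluate Naïve Bayes with the custom priors
nb_custom.fit(X_train, y_train)
y_pred_custom = nb_custom.predict(X_test)
accuracy_custom = accuracy_score(y_test, y_pred_custom)

# Step 6: Compare the results
print(f"Accuracy with default priors: {accuracy_default:.4f}")
print(f"Accuracy with custom priors {priors_custom}: {accuracy_custom:.4f}")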


In [None]:
# 40. Write a Python program to perform Recursive Feature Elimination (RFE) before training an SVM Classifier and compare accuracy

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score

# Step 1: Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Step 2: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 3: Train an SVM Classifier without RFE
svm_no_rfe = SVC(kernel='linear')
svm_no_rfe.fit(X_train, y_train)
y_pred_no_rfe = svm_no_rfe.predict(X_test)
accuracy_no_rfe = accuracy_score(y_test, y_pred_no_rfe)

# Step 4: Perform Recursive Feature Elimination (RFE)
# Initialize SVM Classifier
svm_rfe = SVC(kernel='linear')

# Initialize RFE with the SVM classifier and select top 10 features
rfe = RFE(estimator=svm_rfe, n_features_to_select=10)
X_train_rfe = rfe.fit_transform(X_train, y_train)
X_test_rfe = rfe.transform(X_test)

# Train the SVM classifier with selected features from RFE
svm_rfe.fit(X_train_rfe, y_train)
y_pred_rfe = svm_rfe.predict(X_test_rfe)
accuracy_rfe = accuracy_score(y_test, y_pred_rfe)

# Step 5: Compare the results
print(f"Accuracy without RFE: {accuracy_no_rfe:.4f}")
print(f"Accuracy with RFE (top 10 features): {accuracy_rfe:.4f}")


In [None]:
# 41 Write a Python program to train an SVM Classifier and evaluate its performance using Precision, Recall, and F1-Score instead of accuracy

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import precision_score, recall_score, f1_score

# Step 1: Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Step 2: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 3: Train the SVM Classifier with a linear kernel
svm_classifier = SVC(kernel='linear')
svm_classifier.fit(X_train, y_train)

# Step 4: Make predictions on the test set
y_pred = svm_classifier.predict(X_test)

# Step 5: Evaluate the model using Precision, Recall, and F1-Score
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

# Step 6: Print the evaluation metrics
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-Score: {f1:.4f}")


In [None]:
# 42. Write a Python program to train a Naïve Bayes Classifier and evaluate its performance using Log Loss (Cross-Entropy Loss)

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import log_loss

# Step 1: Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Step 2: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 3: Train the Naïve Bayes Classifier
nb_classifier = GaussianNB()
nb_classifier.fit(X_train, y_train)

# Step 4: Make probability predictions on the test set
y_pred_prob = nb_classifier.predict_proba(X_test)

# Step 5: Compute Log Loss (Cross-Entropy Loss)
loss = log_loss(y_test, y_pred_prob)

# Step 6: Print the Log Loss
print(f"Log Loss (Cross-Entropy Loss): {loss:.4f}")


In [None]:
# 43.  Write a Python program to train an SVM Classifier and visualize the Confusion Matrix using seaborn

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

# Step 1: Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Step 2: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 3: Train the SVM Classifier with a linear kernel
svm_classifier = SVC(kernel='linear')
svm_classifier.fit(X_train, y_train)

# Step 4: Make predictions on the test set
y_pred = svm_classifier.predict(X_test)

# Step 5: Compute the confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Step 6: Visualize the confusion matrix using seaborn's heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='g', cmap='Blues', xticklabels=data.target_names, yticklabels=data.target_names)
plt.title("Confusion Matrix for SVM Classifier")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.show()


In [None]:
# 44. Write a Python program to train an SVM Regressor (SVR) and evaluate its performance using Mean Absolute Error (MAE) instead of MSE

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

# Step 1: Load the California Housing dataset
data = fetch_california_housing()
X = data.data
y = data.target

# Step 2: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 3: Train the SVM Regressor (SVR) with an RBF kernel
# SVR is sensitive to feature scale, so standardize the features first;
# the pipeline fits the scaler on the training data only.
svr = make_pipeline(StandardScaler(), SVR(kernel='rbf'))
svr.fit(X_train, y_train)

# Step 4: Make predictions on the test set
y_pred = svr.predict(X_test)

# Step 5: Evaluate the model using Mean Absolute Error (MAE)
mae = mean_absolute_error(y_test, y_pred)

# Step 6: Print the MAE
print(f"Mean Absolute Error (MAE): {mae:.4f}")
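
To see why MAE might be preferred over MSE, it helps to compute both by hand on a few values. The sketch below uses made-up targets and predictions (not the California Housing data) in which one prediction is a large miss.

```python
import numpy as np

# Toy targets and predictions; the last prediction is off by 4 units.
y_true_demo = np.array([2.0, 3.0, 4.0, 5.0])
y_pred_demo = np.array([2.5, 2.5, 4.0, 9.0])

errors = y_pred_demo - y_true_demo

mae_demo = float(np.mean(np.abs(errors)))   # average absolute error
mse_demo = float(np.mean(errors ** 2))      # average squared error
rmse_demo = float(np.sqrt(mse_demo))        # back in the units of the target

print(f"MAE:  {mae_demo:.4f}")
print(f"MSE:  {mse_demo:.4f}")
print(f"RMSE: {rmse_demo:.4f}")
```

Squaring makes the single large miss dominate MSE, while MAE weights every error in proportion to its size and stays in the same units as the target.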


In [None]:
# 45. Write a Python program to train a Naïve Bayes classifier and evaluate its performance using the ROC-AUC score.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_auc_score

# Step 1: Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Step 2: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 3: Train the Naïve Bayes classifier (GaussianNB)
nb_classifier = GaussianNB()
nb_classifier.fit(X_train, y_train)

# Step 4: Make probability predictions on the test set
y_pred_prob = nb_classifier.predict_proba(X_test)[:, 1]  # Probabilities for the positive class

# Step 5: Compute the ROC-AUC score
roc_auc = roc_auc_score(y_test, y_pred_prob)

# Step 6: Print the ROC-AUC score
print(f"ROC-AUC Score: {roc_auc:.4f}")
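
One way to interpret the ROC-AUC score is as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it that way on toy data. The helper `pairwise_auc` is hypothetical, compares every positive/negative pair, and is only meant for small inputs.

```python
import numpy as np

def pairwise_auc(y_true, scores):
    """ROC-AUC as the fraction of correctly ranked (positive, negative) pairs.

    Hypothetical helper for illustration; ties count as half a correct pair.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # Difference between every positive score and every negative score.
    diff = pos[:, None] - neg[None, :]
    correct = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(correct / (len(pos) * len(neg)))

# Toy labels and scores (not from the Breast Cancer data).
y_demo = [0, 0, 1, 1]
s_demo = [0.1, 0.4, 0.35, 0.8]

auc_demo = pairwise_auc(y_demo, s_demo)
print(f"Pairwise ROC-AUC: {auc_demo:.2f}")
```

Three of the four positive/negative pairs are ranked correctly here, so the score is 0.75. A value of 0.5 corresponds to random ranking and 1.0 to a perfect one.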


In [None]:
# 46. Write a Python program to train an SVM Classifier and visualize the Precision-Recall Curve.

import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import precision_recall_curve, auc

# Step 1: Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Step 2: Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 3: Train the SVM Classifier with a linear kernel
svm_classifier = SVC(kernel='linear', probability=True)  # Set probability=True to get predicted probabilities
svm_classifier.fit(X_train, y_train)

# Step 4: Get the predicted probabilities for the positive class
y_pred_prob = svm_classifier.predict_proba(X_test)[:, 1]

# Step 5: Compute precision, recall, and thresholds
precision, recall, thresholds = precision_recall_curve(y_test, y_pred_prob)

# Step 6: Compute the area under the Precision-Recall curve (PR AUC)
pr_auc = auc(recall, precision)

# Step 7: Plot the Precision-Recall curve
plt.figure(figsize=(8, 6))
plt.plot(recall, precision, color='blue', label=f'Precision-Recall curve (AUC = {pr_auc:.2f})')
plt.fill_between(recall, precision, color='lightblue', alpha=0.5)
plt.title('Precision-Recall Curve for SVM Classifier')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.legend(loc='best')
plt.grid(True)
plt.show()
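
Each point on the curve above corresponds to one decision threshold applied to the predicted scores. The sketch below shows how a single point might be computed by hand. The helper `precision_recall_at`, the toy labels, and the toy scores are all made up for illustration, and returning a precision of 1.0 when nothing is predicted positive is a convention assumed here.

```python
import numpy as np

def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    y_true = np.asarray(y_true)
    pred_pos = np.asarray(scores, dtype=float) >= threshold
    tp = int(np.sum(pred_pos & (y_true == 1)))
    fp = int(np.sum(pred_pos & (y_true == 0)))
    fn = int(np.sum(~pred_pos & (y_true == 1)))
    # Assumed convention: precision is 1.0 if no sample is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0
    recall = tp / (tp + fn)
    return precision, recall

# Toy labels and scores (not from the Breast Cancer data).
y_demo = [0, 0, 1, 1, 1]
s_demo = [0.2, 0.6, 0.5, 0.7, 0.9]

for t in (0.3, 0.55, 0.8):
    p, r = precision_recall_at(y_demo, s_demo, t)
    print(f"threshold={t:.2f}  precision={p:.3f}  recall={r:.3f}")
```

Raising the threshold lowers recall, while precision can move in either direction. Since `precision_recall_curve` accepts any continuous score, the output of `svm_classifier.decision_function(X_test)` could be passed in place of the predicted probabilities.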
