## Q 1.What is a Support Vector Machine (SVM)?
**Ans** - A Support Vector Machine is a supervised machine learning algorithm used for classification and regression tasks. It is particularly effective for high-dimensional datasets and works well when there is a clear margin of separation between different classes.

**SVM Working**
1. Finding the Optimal Hyperplane
  * SVM aims to find the best decision boundary that separates data points of different classes with the maximum margin.
  * The margin is the distance between the hyperplane and the closest data points.

2. Support Vectors
  * The data points that are closest to the hyperplane and influence its position are called support vectors.
  * These points help define the decision boundary.

3. Handling Non-Linearly Separable Data
  * If the data is not linearly separable, SVM uses a kernel trick to transform the data into a higher-dimensional space where it becomes linearly separable.
  * Common kernel functions include:
    * Linear Kernel: Used for linearly separable data.
    * Polynomial Kernel: Maps data into a polynomial feature space.
    * Radial Basis Function (RBF) Kernel: Handles more complex, non-linear relationships.
    * Sigmoid Kernel: Similar to a neural network activation function.

**Use Cases**
* Text classification (e.g., spam detection)
* Image recognition
* Bioinformatics (e.g., protein classification)
* Fraud detection

**Advantages of SVM**
* Works well in high-dimensional spaces
* Effective for both linear and non-linear classification
* Robust to overfitting, especially with proper kernel selection
* Memory efficient since it only relies on a subset of training data (support vectors)

**Disadvantages of SVM**
* Computationally expensive for large datasets
* Choosing the right kernel and hyperparameters can be tricky
* Sensitive to noise in overlapping class distributions

## Q 2. What is the difference between Hard Margin and Soft Margin SVM?
**Ans** - The difference between Hard Margin SVM and Soft Margin SVM lies in how they handle misclassified data points and how strictly they enforce separation between classes.

**1. Hard Margin SVM**

Strict separation
* Used when the data is perfectly linearly separable.
* The SVM tries to find a maximum-margin hyperplane that strictly separates all data points.
* It does not allow any misclassification, meaning every point must be on the correct side of the margin.
* It minimizes the margin while ensuring no violations of the separation constraint.

**Limitations:**
* Highly sensitive to noise: Even a small outlier can drastically change the decision boundary.
* Not useful for non-linearly separable data, as it requires a perfect separation.

**Mathematical Formulation:**

    min(1/2)||w||²
 Subject to:

    yᵢ(w⋅xᵢ+b) ≥ 1, ∀ᵢ
(where 'w' is the weight vector, b is the bias, and yᵢ is the class label of point xᵢ)

**2. Soft Margin SVM**

Allows some misclassification
* Used when the data is not perfectly separable.
* Instead of strictly separating classes, it allows some data points to be on the wrong side of the margin.
* Introduces a slack variable to allow violations and control the trade-off between maximizing the margin and minimizing classification error.
* The regularization parameter C controls how much misclassification is allowed:
  * High C → Less tolerance for misclassification.
  * Low C → More tolerance for misclassification, leading to better generalization.

**Advantages:**
* Works well with noisy and overlapping data.
* Can handle outliers better than hard margin SVM.
* Suitable for real-world applications where data is rarely perfectly separable.

**Mathematical Formulation:**

    min(1/2)||w||²+C∑ⁿᵢ₌₁ξᵢ

Subject to:

    yᵢ(w⋅xᵢ+b) ≥ 1−ξᵢ, ξᵢ ≥ 0, ∀ᵢ
where ξᵢ are the slack variables allowing misclassification

**Differences**

|Feature	|Hard Margin SVM	|Soft Margin SVM|
|-|||
|Misclassification	|Not allowed	|Allowed (controlled by C)|
|Use case	|Perfectly separable data	|Noisy or overlapping data|
|Outlier sensitivity	|High	|Lower |
|Regularization parameter (C)	|Not needed	|Needed to balance margin width vs. misclassification|

## Q 3. What is the mathematical intuition behind SVM?
**Ans** - The mathematical intuition behind Support Vector Machines revolves around finding the optimal hyperplane that maximizes the margin between two classes.

**1. The Decision Hyperplane**

Given a dataset of n points:

    (x₁,y₁),(x₂,y₂),…,(xₙ,yₙ)
where:
* xᵢ ∈ Rᵈ is a feature vector.
* yᵢ ∈ {−1,1} is the class label.

A hyperplane is defined as:

    w⋅x + b = 0
where:
* w is the weight vector.
* b is the bias.
* x is the input data point.

A point x is classified based on:

    ŷ = sign(w⋅x+b)

**2. Margin and Optimal Hyperplane**

**Geometric Margin**

For a correctly classified point:

    yᵢ(w⋅xᵢ+b) ≥ 1
The margin is the perpendicular distance from a data point to the hyperplane, given by:

    |w⋅x+b|/||w||
The total margin is defined as the distance between the closest positive and negative samples to the hyperplane. The goal of SVM is to maximize this margin.

**Optimization Problem**

To maximize the margin, we minimize ||w||, subject to the constraint:

    yᵢ(w⋅xᵢ+b) ≥ 1,∀ᵢ
This leads to the primal optimization problem:

    minᵥᵥ,₆ (1/2)||w||²
subject to:

    yᵢ(w⋅xᵢ+b) ≥ 1,∀ᵢ
where (1/2)||w||² is used instead of||w|| for mathematical convenience.

**3. Introducing Slack Variables for Soft Margin SVM**

When data is not perfectly separable, we introduce slack variables ξi to allow misclassification:

    yᵢ(w⋅xᵢ+b) ≥ 1−ξᵢ, ξᵢ ≥ 0
The new objective function becomes:

    minᵥᵥ,₆ (1/2)||w||² + C∑ⁿᵢ₌₁ξᵢ

where 'C' controls the trade-off between maximizing the margin and allowing misclassifications.

**4. The Dual Form and Kernel Trick**

To solve this problem efficiently, we convert it to its dual form using Lagrange multipliers:

    max α ∑ⁿᵢ₌₁ αᵢ−(1/2)∑ⁿᵢ₌₁∑ⁿⱼ₌₁ αᵢαⱼyᵢyⱼ(xᵢ⋅xⱼ)
subject to:

    ∑ⁿᵢ₌₁ αᵢyᵢ = 0, 0≤αᵢ≤C
This dual form allows us to use the kernel trick, where we replace (xᵢ⋅xⱼ) with a kernel function K(xᵢ,xⱼ), enabling SVM to work with non-linearly separable data.

Common kernels:
* Linear Kernel: K(xᵢ,xⱼ) = xᵢ⋅xⱼ
* Polynomial Kernel: K(xᵢ,xⱼ) = (xᵢ⋅xⱼ+c)ᵈ
* RBF Kernel: K(xᵢ,xⱼ) = exp(-γ||xᵢ-xⱼ||²)

**5. Final Decision Function**

Once we solve for α, the final decision function is:

    f(x) = ∑ⁿᵢ₌₁ αᵢyᵢK(xᵢ,x)+b
where only support vectors (points where αᵢ>0) contribute to the decision boundary.

## Q 4. What is the role of Lagrange Multipliers in SVM?
**Ans** - **Role of Lagrange Multipliers in SVM**

Lagrange multipliers play a crucial role in Support Vector Machines by transforming the constrained optimization problem into an easier-to-solve dual problem. This transformation enables the efficient computation of the optimal hyperplane and allows the use of kernel functions for non-linearly separable data.

**1. Use Lagrange Multipliers**

The original SVM optimization problem involves constraints:

    minᵥᵥ,₆ (1/2)||w||²
subject to:

    yᵢ(w⋅xᵢ+b) ≥ 1, ∀ᵢ
This is a constrained optimization problem, which is difficult to solve directly. To handle such constraints, we use the Lagrangian function.

**2. Primal Formulation with Lagrange Multipliers**

We introduce Lagrange multipliers
αᵢ to convert the constrained problem into an unconstrained optimization problem:

**Lagrangian Function:**

    L(w,b,α) = (1/2)||w||² - ∑ⁿᵢ₌₁ αᵢ[yᵢ(w⋅xᵢ+b)−1]
where:
* αᵢ ≥ 0 are the Lagrange multipliers.
* If a constraint is not satisfied, αᵢ increases, forcing the optimization to respect the constraint.

To find the optimal w and b, we take the partial derivatives of 'L' and set them to zero:

1. Derivative with respect to w:

        ∂L/∂w = w-∑ⁿᵢ₌₁ αᵢyᵢxᵢ = 0
Solving for w:

        w = ∑ⁿᵢ₌₁αᵢyᵢxᵢ
This shows that only support vectors (where αᵢ>0) define the decision boundary.

2. Derivative with respect to b:

        ∑ⁿᵢ₌₁ αᵢyᵢ = 0
This ensures that the classification boundary is unbiased.

**3. Dual Formulation**

By substituting w into the Lagrangian function and eliminating w and b, we get the dual optimization problem:

    max α ∑ⁿᵢ₌₁ αᵢ-(1/2)∑ⁿᵢ₌₁∑ⁿⱼ₌₁ αᵢαⱼyᵢyⱼ(xᵢ⋅xⱼ)
subject to:

    ∑ⁿᵢ₌₁ αᵢyᵢ=0, 0≤αᵢ≤C
Here:
* αᵢ  represents the importance of each training point.
* Only support vectors have αᵢ>0.
* Non-support vectors have αᵢ=0, meaning they do not influence the decision boundary.

**4. Role of Lagrange Multipliers in the Kernel Trick**

In non-linearly separable data, we use the kernel trick by replacing (xᵢ⋅xⱼ) with a kernel function K(xᵢ,xⱼ):

    max α ∑ⁿᵢ₌₁ αᵢ-(1/2)∑ⁿᵢ₌₁∑ⁿⱼ₌₁ αᵢαⱼyᵢyⱼK(xᵢ,xⱼ)
This allows SVM to work in higher-dimensional feature spaces without explicitly computing the transformation.

**5. Interpretation of Lagrange Multipliers**
* If αᵢ=0, the point is far from the margin and does not affect the decision boundary.
* If 0 <αᵢ< C, the point is on the margin.
* If αᵢ = C, the point is inside the margin.

## Q 5. What are Support Vectors in SVM?
**Ans** - **Support Vectors in SVM**

Support Vectors are the data points that lie closest to the decision boundary and play a crucial role in defining it. They are the most influential points in Support Vector Machines because they determine the position and orientation of the optimal hyperplane.

**1. Support Vectors are Important**
* They define the margin: The SVM algorithm maximizes the distance between the hyperplane and the closest data points.
* Only these points affect the decision boundary: Other training points do not influence the model directly.
* They make SVM memory efficient: Since only a small subset of data points is used for classification, SVM does not need to store all training points after training.

**2. Support Vectors are Identified**

The SVM optimization problem is:

    min (1/2)||w||²
subject to:

    yᵢ(w⋅xᵢ+b) ≥ 1
The support vectors are the data points that exactly satisfy this constraint:

    yᵢ(w⋅xᵢ+b) = 1
These points lie on the margin and are critical in defining the hyperplane.

**3. Support Vectors in Hard Margin vs. Soft Margin SVM**
**Hard Margin SVM**
* Works with perfectly separable data.
* Support vectors are the points that lie exactly on the margin.
* No misclassified points are allowed.

**Soft Margin SVM**
* Allows some misclassification by introducing slack variablesξiξi.
* Support vectors can be:
  1. On the margin (0 <αᵢ< C).
  2. Inside the margin.
  3. Misclassified.

**4. Support Vectors in the Dual Formulation**

In the dual form of SVM, the optimization problem is:

    max α ∑ⁿᵢ₌₁ αᵢ-(1/2)∑ⁿᵢ₌₁∑ⁿⱼ₌₁ αᵢαⱼyᵢyⱼK(xᵢ,xⱼ)
subject to:

    ∑ⁿᵢ₌₁ αᵢyᵢ = 0, 0≤αᵢ≤C
* If αᵢ = 0 → The data point is far from the decision boundary.
* If 0 <αᵢ< C → The data point lies on the margin.
* If αᵢ = C → The data point is inside the margin or misclassified.

Thus, support vectors are the points with nonzero Lagrange multipliers (αᵢ>0).

**5. Example Visualization**

Imagine a binary classification problem:

* Support vectors are the bolded points on the margin.

In [None]:
Class +1      Class -1
  ○ ○ ○ | ○ ● ○ | ● ● ●
          ↑ Support Vectors

Here, the middle vertical line is the optimal hyperplane, and the dashed boundaries represent the margin. The points touching the dashed lines are support vectors.

## Q 6. What is a Support Vector Classifier (SVC)?
**Ans** - **Support Vector Classifier**

The Support Vector Classifier is an extension of the Support Vector Machine used for classification tasks. It finds the optimal decision boundary that best separates data points into different classes while maximizing the margin and allowing some misclassifications.

**1. SVC Work?**

SVC aims to separate data into two classes by finding the best hyperplane:

    w⋅x+b = 0
where:
* w is the weight vector.
* x is the input feature vector.
* b is the bias term.

**Classification Decision Rule**

A new data point x is classified as:

    ŷ = sign(w⋅x+b)
* If w⋅x+b > 0, classify as +1.
* If w⋅x+b < 0, classify as -1.

**2. Hard Margin vs. Soft Margin SVC**

**Hard Margin SVC**
* Assumes perfectly separable data.
* No misclassifications allowed.
* Finds the maximum-margin hyperplane.

**Soft Margin SVC**
* Allows some misclassification using slack variables ξi.
* Introduces a regularization parameter C that balances margin maximization and misclassification.
  * High C → Less misclassification.
  * Low C → More flexibility, allowing some errors.

**3. Mathematical Formulation**

**Objective Function**

    minᵥᵥ,₆ (1/2)||w||²+C∑ⁿᵢ₌₁ ξᵢ
subject to:

    yᵢ(w⋅xᵢ+b) ≥ 1−ξᵢ, ξᵢ≥0, ∀ᵢ
where:
* 'C' controls the trade-off between margin width and misclassification.
* ξᵢ are slack variables allowing some misclassification.

**4. Kernel Trick in SVC**

For non-linearly separable data, SVC uses the kernel trick to map data into a higher-dimensional space where it becomes linearly separable.

Common kernels:
* Linear Kernel: K(xᵢ,xⱼ) = xᵢ⋅xⱼ
* Polynomial Kernel: K(xᵢ,xⱼ) = (xᵢ⋅xⱼ+c)ᵈ
* Radial Basis Function Kernel:K(xᵢ,xⱼ) = exp(-γ||xᵢ-xⱼ||²)

**5. Support Vector Classifier in Python (Example)**

In [None]:
from sklearn.svm import SVC
from sklearn.datasets import make_classification
import matplotlib.pyplot as plt
import numpy as np

X, y = make_classification(n_samples=100, n_features=2, n_classes=2, n_redundant=0, n_clusters_per_class=1)
y = np.where(y == 0, -1, 1)

model = SVC(kernel='linear', C=1.0)
model.fit(X, y)

w = model.coef_[0]
b = model.intercept_[0]
x_plot = np.linspace(X[:, 0].min(), X[:, 0].max(), 100)
y_plot = -(w[0] * x_plot + b) / w[1]

plt.scatter(X[:, 0], X[:, 1], c=y, cmap='bwr')
plt.plot(x_plot, y_plot, 'k-')
plt.title("Support Vector Classifier (Linear SVM)")
plt.show()

## Q 7. What is a Support Vector Regressor (SVR)?
**Ans** - **Support Vector Regressor**

The Support Vector Regressor is an extension of Support Vector Machines for regression tasks. Unlike Support Vector Classifier, which classifies data into discrete categories, SVR predicts continuous values while maintaining the principles of maximizing the margin.

**1. SVR Work**

SVR aims to find a function f(x) that predicts y with minimal error while maintaining a margin ϵ:

    f(x) = w⋅x+b
where:
* w is the weight vector.
* x is the input feature vector.
* b is the bias term.

**The ϵ-Insensitive Tube**
* Unlike standard regression, SVR does not minimize absolute or squared errors.
* Instead, it defines a margin of width ϵ around the predicted function.
* Points inside this margin are ignored.
* Points outside the margin contribute to the loss function.

This makes SVR robust to small fluctuations in data.

**2. Mathematical Formulation**

**Objective Function**

    minᵥᵥ,₆ (1/2)||w||² + C∑ⁿᵢ₌₁(ξᵢ+ξ∗ᵢ)
subject to:

    yᵢ-(w⋅xᵢ+b) ≤ ϵ+ξᵢ
    (w⋅xᵢ+b)-yᵢ ≤ ϵ+ξ∗ᵢ
    ξᵢ,ξ∗ᵢ ≥ 0
where:
* ϵ controls the tolerance: Larger ϵ → More data points ignored.
* C is the regularization parameter: Larger C
C → More penalty for violating the margin.
* ξᵢ,ξ∗ᵢ  are slack variables for handling points outside the margin.

**3. Types of SVR**
1. Linear SVR
* Uses a straight-line regression.
* Works when data has a linear trend.

2. Non-Linear SVR
* Uses the kernel trick to map data into higher dimensions.
* Common kernels:
  * Polynomial Kernel:K(xᵢ,xⱼ) = (xᵢ⋅xⱼ+c)ᵈ
* Radial Basis Function Kernel: K(xᵢ,xⱼ) = exp(-γ||xᵢ-xⱼ||²)
* Sigmoid Kernel: K(xᵢ,xⱼ) = tanh(γxᵢ⋅xⱼ+c)

**4. SVR in Python (Example)**

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVR

X = np.linspace(0, 10, 100).reshape(-1, 1)
y = np.sin(X).ravel() + np.random.normal(0, 0.1, X.shape[0])

svr_rbf = SVR(kernel='rbf', C=100, epsilon=0.1, gamma=0.1)
svr_rbf.fit(X, y)

y_pred = svr_rbf.predict(X)

plt.scatter(X, y, label="Data", color="blue", s=10)
plt.plot(X, y_pred, label="SVR Prediction", color="red")
plt.legend()
plt.title("Support Vector Regression (SVR) with RBF Kernel")
plt.show()

**5. Parameters in SVR**
1. C
* High C → Tries to fit all points.
* Low C → More flexible model, ignores some points.

2. ϵ
* High ϵ → More generalization.
* Low ϵ → More sensitive to data points.

3. Kernel Type
* Linear → Simple trends.
* Polynomial → Moderate complexity.
* RBF → Highly non-linear patterns.

## Q 8. What is the Kernel Trick in SVM?
**Ans** - **Kernel Trick in SVM**

The Kernel Trick is a mathematical technique used in Support Vector Machines to transform non-linearly separable data into a higher-dimensional space where it becomes linearly separable—without explicitly computing the transformation.

**1. Need of Kernel Trick**

SVMs work well with linearly separable data, but real-world datasets are often non-linearly separable.
For example:

* Non-linearly Separable Data

In [None]:
Class +1   Class -1
   ○ ○      ● ●
     ○    ● ●
      ○  ●

A straight-line decision boundary cannot separate these classes.

Instead of directly applying a linear SVM, we map the data into a higher-dimensional space where it becomes linearly separable.

  * Kernel Trick:

Instead of explicitly computing the transformation, we use a kernel function that computes the dot product in the high-dimensional space efficiently.

**2. Mathematical Formulation**

**SVM Decision Function**

The standard SVM classification function is:

    f(x) = sign(∑ⁿᵢ₌₁ αᵢyᵢ(xᵢ⋅x)+b)
where:
* xᵢ are support vectors,
* αᵢ are Lagrange multipliers,
* yᵢ are class labels,
* x⋅xᵢ  is the dot product.

**Applying the Kernel Trick**

Instead of computing x⋅xᵢ directly, we replace it with a kernel function K(x,xᵢ):

    f(x) = sign(∑ⁿᵢ₌₁ αᵢyᵢK(xᵢ,x)+b)
This allows SVM to work in a higher-dimensional space without explicitly computing the transformation.

**3. Common Kernel Functions**
1. Linear Kernel (For Linearly Separable Data)

        K(x,x')=x⋅x'
* Equivalent to a regular dot product.
* Used when data is already linearly separable.

**Example:**

In [None]:
SVC(kernel='linear')

2. Polynomial Kernel

        K(x,x')=(x⋅x'+c)ᵈ
* Maps input data to a higher-degree polynomial space.
* c controls bias, and d is the polynomial degree.

* Example:

In [None]:
SVC(kernel='poly', degree=3)

3. Radial Basis Function Kernel

        K(x,x') = exp(-γ||x-x'||²)
* Transforms data into an infinite-dimensional space.
* Works well for complex patterns.

* Example:

In [None]:
SVC(kernel='rbf', gamma=0.1)

4. Sigmoid Kernel

        K(x,x') = tanh(γx⋅x'+c)
* Similar to activation functions in neural networks.
* Used less frequently than RBF or Polynomial.

* Example:

In [None]:
SVC(kernel='sigmoid')

**4. Kernel Trick in Python (Example)**

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=200, noise=0.1, random_state=42)

svm_rbf = SVC(kernel='rbf', C=1, gamma=0.5)
svm_rbf.fit(X, y)

xx, yy = np.meshgrid(np.linspace(X[:, 0].min(), X[:, 0].max(), 100),
                     np.linspace(X[:, 1].min(), X[:, 1].max(), 100))
Z = svm_rbf.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.contourf(xx, yy, Z, levels=np.linspace(Z.min(), Z.max(), 20), alpha=0.75)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors="k")
plt.title("SVM with RBF Kernel")
plt.show()

## Q 9. Compare Linear Kernel, Polynomial Kernel, and RBF Kernel.
**Ans** - **Comparison of Linear, Polynomial, and RBF Kernels in SVM**

|Feature	|Linear Kernel	|Polynomial Kernel	|RBF (Radial Basis Function) Kernel|
|-||||
|Formula |K(x,x') = x⋅x' |K(x,x') = (x⋅x'+c)ᵈ	|(K(x, x') = \exp(-\gamma|
|Complexity	|Low (Fast)	|Medium	|High (Slower but powerful)|
|Best for	|Linearly separable data	|Moderately complex data|	Highly non-linear data|
|Interpretability	|Easy to understand	|Harder to interpret	|Hardest to interpret|
|Hyperparameters	|None	|Degree d, Bias c |γ|
|Computational Cost	|Low	|Medium	|High (depends on γ)|
|Overfitting Risk	|Low	|Medium (ifd is high)	|High (if γ is too high)|
|Flexibility	|Low (only linear separation)	|Medium (depends on degree)	|High (can model complex shapes)|

**1. Linear Kernel**

When to Use -
* Data is linearly separable.
* High-dimensional data where computing other kernels is expensive.
* Interpretability is important.

**Advantages**
* Simple and fast.
* Less prone to overfitting.
* Works well when the number of features is high.

**Disadvantages**
* Only works for linearly separable data.
* Cannot capture complex relationships.

**Example in Python**

In [None]:
from sklearn.svm import SVC
svm_linear = SVC(kernel='linear', C=1.0)
svm_linear.fit(X, y)

**2. Polynomial Kernel**

When to Use -
* Data has a moderate level of non-linearity.
* Relationships are better modeled with polynomial functions.

**Advantages**
* Can model non-linear decision boundaries.
* More flexible than the linear kernel.
* Works well when there's a polynomial relationship between features.

**Disadvantages**
* Higher computational cost than a linear kernel.
* Choosing the right degree is difficult.
* High-degree polynomials can lead to overfitting.

**Example in Python**

In [None]:
svm_poly = SVC(kernel='poly', degree=3, C=1.0)
svm_poly.fit(X, y)

3. RBF Kernel

When to Use -
* Data is highly non-linear and complex.
* No prior knowledge about the feature relationships.

**Advantages**
* Extremely powerful for complex decision boundaries.
* Works well with most datasets.
* Can map data into infinite-dimensional space.

**Disadvantages**
* Higher computational cost.
* Overfitting risk if γ is too high.
* Less interpretable than linear models.

**Example in Python**

In [None]:
svm_rbf = SVC(kernel='rbf', C=1.0, gamma=0.1)
svm_rbf.fit(X, y)

4. Visual Comparison

Imagine a dataset like this:
* Non-linearly Separable Data:

In [None]:
Class +1   Class -1
   ○ ○      ● ●
     ○    ● ●
      ○  ●

Each kernel produces a different decision boundary:
* Linear Kernel: Draws a straight line, failing to separate complex data.
* Polynomial Kernel: Draws a curved boundary.
* RBF Kernel: Creates a highly flexible boundary, fitting complex patterns.

## Q 10. What is the effect of the C parameter in SVM?
**Ans** - **Effect of the C Parameter in SVM**

The C parameter in Support Vector Machines controls the trade-off between maximizing the margin and minimizing classification errors. It determines how much the model penalizes misclassified points.

**1. Role of C in SVM**
* A large C → More importance on classifying every point correctly, resulting in a smaller margin.
* A small C → More tolerance for misclassified points, leading to a wider margin.

The SVM objective function is:

    minᵥᵥ,₆ (1/2)||w||²+C∑ⁿᵢ₌₁ ξᵢ
where:
* ||w||² controls the margin width.
* C∑ξᵢ is the penalty for misclassified points.
* ξᵢ are slack variables for handling misclassified points.

**2. Effect of C on Decision Boundary**
1. High C
* Forces SVM to correctly classify most points.
* Smaller margin, leading to a more complex model.
* Risk of overfitting.

* Good for datasets with low noise.
* Can overfit noisy data.

2. Low C
* Allows some misclassifications to get a larger margin.
* More generalization, making it less sensitive to noise.
* Risk of underfitting.

* Good for datasets with noise or overlapping classes.
* May misclassify some points.

**3. Visual Example**

**Effect of C on Decision Boundaries**

|Low C (Soft Margin)	|High C (Hard Margin)|
|-||
|Wide margin, good generalization.	|Fits training data perfectly.|
|Allows some misclassification.	| Overfits (small margin, poor generalization).|

**4. Python Example: Effect of C in SVM**

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, n_features=2, n_classes=2, random_state=42)

svm_low_C = SVC(kernel='linear', C=0.1).fit(X, y)
svm_high_C = SVC(kernel='linear', C=100).fit(X, y)

plt.figure(figsize=(12, 5))

for i, (svm_model, title) in enumerate([(svm_low_C, "Low C (Soft Margin)"), (svm_high_C, "High C (Hard Margin)")]):
    plt.subplot(1, 2, i + 1)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k')

    xlim = plt.xlim()
    ylim = plt.ylim()
    xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 100),
                         np.linspace(ylim[0], ylim[1], 100))
    Z = svm_model.decision_function(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contour(xx, yy, Z, levels=[0], colors='red')

    plt.title(title)

plt.show()

## Q 11. What is the role of the Gamma parameter in RBF Kernel SVM?
**Ans** - **Role of the γ Parameter in RBF Kernel SVM**

In Radial Basis Function Kernel SVM, the γ parameter controls how much influence a single training example has. It determines the range of influence of each data point when computing the decision boundary.

**1. Understanding γ in the RBF Kernel**

The RBF kernel function is:

    K(x,x') = exp(-γ||x−x′||²)
where:
* ||x-x'||²  is the Euclidean distance between two data points.
* γ determines how far the influence of each training point reaches.

**Effect of γ:**
* High γ
  * Each data point has a small area of influence.
  * The model captures fine details but may overfit.
* Low γ
  * Each data point has a large area of influence.
  * The model is smoother but may underfit.

**2. Visualizing the Effect of γ**

|Low γ (Underfitting)	|Optimal γ	|High γ (Overfitting)|
|-|||
|Smooth decision boundary	|Well-generalized decision boundary	|Fits training data perfectly|
|Poor accuracy on complex patterns	|Good balance between bias & variance	|Sensitive to noise, poor generalization|

**3. Choosing the Right γ**
* Too low γ → Model is too simple.
* Too high γ → Model memorizes training data.
* Best γ → Achieves a balance between bias & variance.

**4. Python Example: Effect of γ in RBF SVM**

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=200, noise=0.1, random_state=42)

svm_low_gamma = SVC(kernel='rbf', C=1.0, gamma=0.1).fit(X, y)
svm_high_gamma = SVC(kernel='rbf', C=1.0, gamma=10).fit(X, y)

plt.figure(figsize=(12, 5))

for i, (svm_model, title) in enumerate([(svm_low_gamma, "Low Gamma (Underfitting)"), (svm_high_gamma, "High Gamma (Overfitting)")]):
    plt.subplot(1, 2, i + 1)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k')

    xlim = plt.xlim()
    ylim = plt.ylim()
    xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 100),
                         np.linspace(ylim[0], ylim[1], 100))
    Z = svm_model.decision_function(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contour(xx, yy, Z, levels=[0], colors='red')

    plt.title(title)

plt.show()

## Q 12. What is the Naïve Bayes classifier, and why is it called "Naïve"?
**Ans** - The Naïve Bayes classifier is a probabilistic machine learning algorithm based on Bayes' Theorem. It is used for classification tasks, particularly in text classification, spam filtering, and sentiment analysis.

It assumes that features are independent given the class, which simplifies probability computations.

**Why is it called "Naïve"?**

The term "Naïve" comes from its assumption that all features are independent given the class label. This is rarely true in real-world data, but the assumption makes computations much simpler and often still gives good results.

For example, in email spam detection, the algorithm assumes that the presence of words like "money" and "free" in an email are independent of each other, even though in reality they often appear together.

**Bayes' Theorem in Naïve Bayes**

Bayes' Theorem states:

    P(Y|X) = (P(X|Y)⋅P(Y)/P(X))
Where:
* P(Y|X) = Posterior Probability.
* P(X|Y) = Likelihood.
* P(Y) = Prior Probability.
* P(X) = Evidence.

Since P(X) is constant across all classes, the formula simplifies to:

    P(Y|X) ∝ P(X|Y)P(Y)

**Naïve Bayes Assumption**

If X=(x₁,x₂,...,xₙ) represents multiple features, Naïve Bayes assumes conditional independence:

    P(X|Y) = P(x₁|Y)P(x₂|Y)...P(xₙ|Y)
Thus, the final formula for classification is:

    P(Y|X) ∝ P(Y)∏ⁿᵢ₌₁ P(xᵢ|Y)
**Types of Naïve Bayes Classifiers**
1. Gaussian Naïve Bayes
2. Multinomial Naïve Bayes
3. Bernoulli Naïve Bayes

**Example: Spam Classification**

Suppose we want to classify an email as spam or not spam based on words:

|Word|Spam Probability P(X|Spam)|Not Spam Probability P(X|NotSpam)||------|----------------|------------------||Free|0.8|0.1||Money|0.7|0.2||Win|0.9|0.1|

If we receive an email containing "Free Money", we compute:

P(Spam|X) ∝ P(Spam) x P(Free|Spam) x P(Money|Spam)
P(NotSpam|X) ∝ P(NotSpam) x P(Free|NotSpam) x P(Money|NotSpam)

Whichever is higher determines the classification.

**Advantages**
* Fast and efficient, even on large datasets.
* Performs well with high-dimensional data.
* Handles missing data well.
* Works surprisingly well even if the independence assumption is violated.

**Disadvantages**
* Feature independence assumption is unrealistic in many real-world cases.
* Poor performance on highly correlated features.
* Cannot capture complex relationships between features.

**Python Example**

In [None]:
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

emails = ["free money now", "win a lottery", "hello friend, how are you?", "urgent: win money"]
labels = [1, 1, 0, 1]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

nb = MultinomialNB()
nb.fit(X, labels)

new_email = ["win free cash"]
X_new = vectorizer.transform(new_email)
prediction = nb.predict(X_new)

print("Spam" if prediction[0] == 1 else "Not Spam")

## Q 13. What is Bayes' Theorem?
**Ans** - **Bayes' Theorem: Understanding Conditional Probability**

Bayes' Theorem is a fundamental theorem in probability theory that describes how to update our belief about an event based on new evidence. It is widely used in machine learning, statistics, and decision-making.

**1. The Formula for Bayes' Theorem**

    P(A|B) = (P(B|A)⋅P(A))/P(B)
Where:
* P(A|B) = Posterior Probability (Probability of event A occurring given that B has occurred).
* P(B|A) = Likelihood (Probability of observing B given A is true).
* P(A) = Prior Probability (Initial belief about A, before observing B).
* P(B) = Marginal Probability (Total probability of B occurring, across all possible A).

**2. Intuition Behind Bayes' Theorem**

Bayes' theorem allows us to update our initial belief about an event after observing new evidence.

**Example: Medical Diagnosis**

A test for a disease that is 99% accurate, but the disease is very rare. If you test positive.

Using Bayes' theorem
* P(Disease) = 0.0001
* P(Positive|Disease) = 0.99
* P(Positive|No Disease) = 0.01
* P(No Disease) = 1-0.0001 = 0.9999

The total probability of testing positive (P(Positive)) is:

P(Positive) = (0.99x0.0001)+(0.01x0.9999) = 0.010098

Now, applying Bayes' Theorem:

P(Disease | Positive) = \frac{0.99 \times 0.0001}{0.010098} \approx 0.0098 \text{ (or 0.98%)}

Even though the test is 99% accurate, the probability of actually having the disease is only 0.98% because the disease is so rare!

**3. Applications of Bayes' Theorem**
**Spam Filtering (Naïve Bayes Classifier)**
* Determines whether an email is spam or not based on word probabilities.
* Uses Bayes' Theorem to compute P(Spam|Words)P(Spam|Words).

**Machine Learning & AI**
* Used in Naïve Bayes classifiers, Bayesian networks, and probabilistic models.

**Medical Diagnosis**
* Updates probability of a disease given symptoms and test results.

**Legal & Forensics**
* Determines the probability of guilt based on evidence.

**Speech & Image Recognition**
* Computes probabilities for pattern matching.

## Q 14. Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes.
**Ans** - **Differences Between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes**

Naïve Bayes classifiers are a family of probabilistic classifiers based on Bayes' Theorem. The main difference between Gaussian, Multinomial, and Bernoulli Naïve Bayes is the type of data they handle.

**1. Gaussian Naïve Bayes**
* Used for: Continuous numerical features
* Assumption: Features follow a Gaussian distribution

**It's Work**

Instead of using frequency counts, it assumes that each feature follows a normal distribution.

**Example Use Cases:**
* Iris classification
* Weather prediction
* Medical diagnosis

**Python Example:**

In [None]:
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
model = GaussianNB()
model.fit(X, y)
print(model.predict([[5.1, 3.5, 1.4, 0.2]]))

**2. Multinomial Naïve Bayes**
* Used for: Discrete count data
* Assumption: Features represent counts or probabilities of occurrence

**It's Work**
* Uses word frequencies or token counts in documents.
* Computes probabilities based on relative word occurrences in each class.

**Use Cases:**
* Text classification
* Document categorization
* Bag-of-Words models

**Python Example:**

In [None]:
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

texts = ["buy cheap meds", "cheap watches for sale", "meeting at noon", "schedule for tomorrow"]
labels = [1, 1, 0, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

model = MultinomialNB()
model.fit(X, labels)

print(model.predict(vectorizer.transform(["cheap watches"])))

**3. Bernoulli Naïve Bayes**
* Used for: Binary feature data
* Assumption: Features are Boolean

**It's Works**
* Instead of word counts, it works with word presence/absence.
* Useful for binary text classification.

**Use Cases**
* Spam filtering
* Sentiment analysis
* Document classification with binary features

**Python Example:**

In [None]:
from sklearn.naive_bayes import BernoulliNB
from sklearn.feature_extraction.text import CountVectorizer

texts = ["buy cheap meds", "cheap watches", "meeting at noon", "schedule tomorrow"]
labels = [1, 1, 0, 0]

vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(texts)

model = BernoulliNB()
model.fit(X, labels)

print(model.predict(vectorizer.transform(["cheap watches"])))

**4. Comparison**

|Feature	|Gaussian Naïve Bayes (GNB)	|Multinomial Naïve Bayes (MNB)	|Bernoulli Naïve Bayes (BNB)|
|-||||
|Data Type	|Continuous (numerical)	|Discrete (word counts)	|Binary (0/1)|
|Assumption	|Normal (Gaussian) distribution	|Features represent count data	|Features represent binary presence/absence|
|Example Use Cases	|Iris classification, medical diagnosis	|Text classification (spam detection, sentiment analysis)	|Spam filtering, sentiment analysis|
|Feature Values	|Any real number	|Positive integers (counts)	|Binary (0 or 1)|
|Works Well When	|Features follow a normal distribution	|Words appear multiple times with different frequencies	|Features are either present or absent|

## Q 15. When should you use Gaussian Naïve Bayes over other variants?
**Ans** - Gaussian Naïve Bayes is best suited for continuous numerical data that follows a normal distribution. It is preferred over Multinomial Naïve Bayes and Bernoulli Naïve Bayes in certain scenarios.

**Use Gaussian Naïve Bayes**
1. Our Features Are Continuous
* GNB is designed for continuous features, whereas MNB and BNB work best with discrete data.
* If our dataset has features like height, weight, temperature, pressure, age, income, blood sugar levels, etc., GNB is the best choice.

**Example:**
* Medical Diagnosis → Predicting diseases based on numerical data like blood pressure, cholesterol levels, glucose levels.
* Iris Classification → Classifying iris flowers based on petal length, petal width, etc..

2. The Data Follows a Normal Distribution
* GNB assumes that each feature follows a Gaussian distribution.
* If our features are normally distributed, GNB performs well.

**Example:**
* Stock Market Prediction → Daily price changes often follow a normal distribution.
* Weather Forecasting → Temperature and humidity tend to be normally distributed.

**How to Check Normal Distribution**
* Histogram → Plot the feature values and see if it forms a bell curve.
* Shapiro-Wilk Test or Kolmogorov-Smirnov Test → Statistical tests for normality.

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

data = np.random.normal(loc=50, scale=10, size=1000)

sns.histplot(data, kde=True)
plt.show()

3. We Want a Fast and Simple Model
* GNB is computationally efficient and works well even on large datasets.
* Training is fast since it only requires calculating the mean and variance for each feature.

**Example:**
* Real-time applications → Fraud detection, spam filtering, anomaly detection.
* Quick prototyping → If we want a baseline model to test before using more complex algorithms.

4. We Have a Small Dataset
* GNB works well even with a small number of training samples because it makes strong independence assumptions.
* Other models like deep learning require large datasets, but GNB can perform well even with limited data.

**Example:**
* Medical research → Small datasets for disease diagnosis.
* Sensor data analysis → Limited real-world observations.

**When NOT to Use Gaussian Naïve Bayes**

|Situation	|Why GNB Fails	|Alternative|
|-|||
|Features are categorical (e.g., colors, names, brands)	|GNB requires numerical data	|Use MultinomialNB or BernoulliNB|
|Features do not follow a normal distribution	|GNB assumes Gaussian distribution, leading to incorrect probability estimates	|Use Random Forest, SVM, or Neural Networks|
|Features are highly correlated	|Naïve Bayes assumes feature independence, which may not hold	|Use Decision Trees or Logistic Regression|

## Q 16. What are the key assumptions made by Naïve Bayes?
**Ans** - Naïve Bayes classifiers are based on Bayes' Theorem and rely on a few key assumptions to simplify probability calculations. While these assumptions may not always hold perfectly in real-world data, Naïve Bayes still performs well in many applications.

**1. Feature Independence Assumption**
* Each feature is assumed to be independent of every other feature, given the class label.
* Mathematically, for a set of features X₁,X₂,...,Xₙ, the probability of a class C is calculated as:

      P(C|X₁,X₂,...,Xₙ) ∝ P(C)⋅P(X₁|C)⋅P(X₂|C)⋯P(Xₙ|C)
* Example: Spam Filtering

If an email contains the words "cheap" and "offer," Naïve Bayes assumes the probability of the email being spam depends on these words individually, rather than considering their co-occurrence.

* Limitation:

In reality, features are often correlated. Despite this, Naïve Bayes still works well in practice.

**2. Class Conditional Independence**
* Given the class label, the features are assumed to be conditionally independent.
* This means that once we know the class, the presence of one feature does not influence the presence of another.

* Example: Sentiment Analysis

If we classify movie reviews as positive or negative, the words "amazing" and "great" might appear together in positive reviews. Naïve Bayes assumes that knowing "amazing" appears does not affect the probability of "great" appearing, given the review is positive.

**3. Probability Distribution Assumption**
* Each variant of Naïve Bayes makes an assumption about how features are distributed:
  * Gaussian Naïve Bayes assumes features follow a normal distribution.
  * Multinomial Naïve Bayes assumes features follow a multinomial distribution.
  * Bernoulli Naïve Bayes assumes features are binary.

* Example: Gaussian Naïve Bayes in Medical Diagnosis

If predicting whether a patient has a disease based on blood sugar level, GNB assumes the blood sugar values follow a bell curve within each class.

**4. Prior Probability Assumption**
* Naïve Bayes uses prior probabilities.
* If prior probabilities are inaccurate or biased, it may affect classification.

* Example: Fraud Detection

If fraud occurs only 1% of the time, the model must account for this imbalance using class priors.

**When Do These Assumptions Work Well?**
* When features are actually independent.
* When a small dataset is available.
* When speed is important.

**These Assumptions Fail**
* When features are strongly correlated.
* When features do not match the assumed probability distribution.
* When dealing with complex feature interactions.

##Q 17. What are the advantages and disadvantages of Naïve Bayes?
**Ans** - Advantages and Disadvantages of Naïve Bayes
Naïve Bayes is a simple yet powerful algorithm used in classification problems, especially in text classification, spam filtering, sentiment analysis, and medical diagnosis. However, it comes with its own strengths and weaknesses.

**Advantages of Naïve Bayes**
1. Simple and Easy to Implement
* Requires minimal training and is easy to understand.
* Works well with small datasets.

* Example:
  * A spam filter can be implemented with just a few lines of code using sklearn.naive_bayes.

2. Extremely Fast and Scalable
* Works well even with millions of features.
* Requires only probability calculations.

* Example:
  * Search engines classify documents into categories in real-time.

3. Works Well with High-Dimensional Data
* Can handle a large number of features efficiently.
* Performs well in text classification.

*  Example:
  * Email spam detection uses thousands of words as features, and Naïve Bayes can efficiently classify emails based on word probabilities.

4. Handles Missing Data Well
* Does not require imputation.

*  Example:
  * In medical diagnosis, if some patient test results are missing, NB can still predict the disease.

5. Works Well on Small Datasets
* Unlike deep learning, which requires a large amount of data, Naïve Bayes performs well even with limited training samples.

* Example:
  * Fraud detection in banking where fraudulent transactions are rare.

6. Naturally Handles Imbalanced Data
Uses prior probabilities, making it robust to class imbalances.

* Example:
  * Detecting rare diseases where most cases are non-disease.

**Disadvantages of Naïve Bayes**
1. Assumes Feature Independence
* Naïve Bayes assumes all features are independent, which is often not true in real-world data.
* If features are correlated, it reduces accuracy.

*  Example:
  * In text classification, words like "New York" should be treated as a single entity, but Naïve Bayes treats "New" and "York" separately.

* Solution: Use n-grams or TF-IDF to capture dependencies.

2. Ignores Feature Interactions
* Cannot capture relationships between features.
* Struggles with context-dependent meaning.

*  Example:
  * "I am NOT happy" vs. "I am happy" → Naïve Bayes may classify both as positive sentiment since it treats words independently.

* Solution: Use bigram/trigram models or deep learning for complex text.

3. Zero Probability Problem
* If a word never appeared in training data, Naïve Bayes assigns it zero probability, making classification impossible.

* Example:
  * If a spam filter never saw the word "Bitcoin", it will fail to classify new emails containing that word.

* Solution: Use Laplace Smoothing.

4. Not Ideal for Continuous Data
* Standard Naïve Bayes works best with categorical or text data.
* If features are continuous, assumptions may not hold.

* Example:
  * Predicting house prices.

* Solution: Use Gaussian Naïve Bayes or other models like Random Forest or SVM for continuous data.

## Q 18. Why is Naïve Bayes a good choice for text classification?
**Ans** - Naïve Bayes is widely used for text classification tasks such as spam filtering, sentiment analysis, and topic categorization because of its efficiency, simplicity, and strong performance.

**Advantages of Naïve Bayes for Text Classification**
**1. Works Well with High-Dimensional Data**
* Text data has thousands of features.
* Naïve Bayes handles this efficiently because it computes probabilities separately for each feature without needing complex feature selection.

**Example**
  * In a spam filter, each word in an email acts as a feature. A Naïve Bayes model can classify emails based on word probabilities without requiring dimensionality reduction.

**2. Fast and Efficient**
* Training and inference are very fast since it only involves counting occurrences and multiplying probabilities.
* Unlike deep learning models, NB does not require extensive training on GPUs.

**Example:**
  * Real-time applications like spam filtering and chatbot responses can quickly classify new text without high computational cost.

**3. Handles Small Datasets Well**
* Many ML models require a lot of labeled data, but Naïve Bayes performs well even with limited training examples.
* Because it uses prior probabilities and strong independence assumptions, it generalizes well with fewer samples.

**Example:**
  * Medical text classification where the dataset is small.

**4. Handles Missing Features Well**
* If a document does not contain certain words, NB can still classify it correctly since each word's probability is computed independently.

**Example:**
  * If a news classifier learns that the words "election" and "candidate" are important for the Politics category, it can still classify articles that contain only one of these words.

**5. Performs Well on Imbalanced Datasets**
* Many real-world text classification tasks have class imbalances.
* Naïve Bayes naturally accounts for class imbalance using prior probabilities.

**Example:**
  * Fraud detection.

**6. Simple and Interpretable**
* Unlike deep learning, Naïve Bayes is easy to interpret:
* We can see which words contribute most to classification.
* No complex hyperparameters to tune.

**Example:**
  * If a spam filter marks an email as spam, We can check which words had the highest probabilities of appearing in spam emails.

**Limitations of Naïve Bayes for Text Classification**

|Limitation	|Why?	|Possible Solution|
|-|||
|Assumes Feature Independence	|Words in text are not truly independent (e.g., “New York” should be treated as a phrase, not two separate words).	|Use n-grams or TF-IDF to capture word dependencies.|
|Ignores Word Order	|“not happy” and “happy” may get similar probabilities.	|Use Bigram or Trigram models to preserve local context.|
|Zero Probability Issue	|If a word is not in training data, it gets a probability of 0, making classification impossible.	|Use Laplace Smoothing (add small nonzero probabilities).|
|Not Good for Long Texts	|Works best for short text (emails, tweets, reviews, news headlines). |Struggles with long documents.	|Use TF-IDF weighting or topic modeling (LDA, BERT).|

## Q 19. Compare SVM and Naïve Bayes for classification tasks.
**Ans** - Both Support Vector Machines and Naïve Bayes are widely used classification algorithms, but they have different strengths and weaknesses. Here's a detailed comparison:

**1. Basic Concept**

|Algorithm	|Concept|
|-||
|SVM (Support Vector Machine)	|Finds the optimal decision boundary (hyperplane) that maximizes the margin between different classes. Works well for complex decision boundaries.|
|Naïve Bayes (NB)	|Uses Bayes’ Theorem with the assumption that features are independent. Computes class probabilities based on feature occurrences.|

* Example:
  * SVM finds a clear boundary to separate spam vs. non-spam emails.
  * Naïve Bayes calculates word probabilities to classify emails as spam or not.

**2. Performance on Small vs. Large Datasets**

|Factor	|SVM	|Naïve Bayes|
|-|||
|Small Datasets	|Works well, especially for complex decision boundaries.	|Performs well, even with very few training examples.|
|Large Datasets	|Slower and computationally expensive for large datasets.	|Very fast and scales well to large datasets.|

**3. Computational Efficiency**

|Factor	|SVM	|Naïve Bayes|
|-|||
|Training Time	|Slow for large datasets (especially with non-linear kernels).	|Extremely fast training (only involves counting occurrences).|
|Prediction Time	|Slower, especially for large feature spaces.	|Very fast predictions (constant-time probability calculations).|

* Example:
  * SVM takes longer to train on a large text dataset.
  * Naïve Bayes can classify new emails instantly after training.

**4. Handling High-Dimensional Data**

|Factor	|SVM	|Naïve Bayes|
|-|||
|High-Dimensional Data	|Works well with feature selection or kernel tricks.	|Naturally handles high-dimensional data (e.g., text classification).|

* Example:
  * For text classification, NB works better since it doesn't need feature selection.

**5. Assumptions and Interpretability**

|Factor	|SVM	|Naïve Bayes|
|-|||
|Assumptions	|No strong assumptions but relies on well-separated data.	|Assumes independence between features (which is often unrealistic).|
|Interpretability	|Hard to interpret (black-box).	|Easy to interpret (probabilities of each class).|

* Example:
  * Spam filtering: Naïve Bayes tells you which words contribute to spam.
  * SVM doesn't explain decisions easily.

**6. Handling Outliers and Noise**

|Factor	|SVM	|Naïve Bayes|
|-|||
|Robustness to Outliers	|Sensitive to outliers, especially if using a hard margin.	|More robust to outliers since it uses probability distributions.|
|Noisy Data	|Struggles with overlapping classes.	|Works well when noise is random and feature independence holds.|

* Example:
  * Medical diagnosis: If patient data has some erroneous values, NB is more robust than SVM.

**7. Handling Imbalanced Data**

|Factor	|SVM	|Naïve Bayes|
|-|||
|Imbalanced Datasets	|Can be biased toward majority class unless properly tuned.	|Handles imbalance well using class priors.|

* Example:
  * Fraud detection: Since fraud cases are rare, Naïve Bayes naturally accounts for it by using priors.
  * SVM needs special techniques to handle imbalance properly.

**8. Handling Non-Linearly Separable Data**

|Factor	|SVM	|Naïve Bayes|
|-|||
|Non-Linear Data	|Works well with non-linear kernels (RBF, polynomial).	|Struggles if features are dependent or require complex interactions.|

* Example:
  * If sentiment analysis requires detecting complex relationships between words, SVM with RBF kernel may perform better than Naïve Bayes.

**9. Use Cases**

|Use Case	|SVM	|Naïve Bayes|
|-|||
|Spam Filtering	|Good but slower.	|Fast and widely used.|
|Sentiment Analysis	|Good for complex sentiment relationships.	|Fast but assumes word independence.|
|Text Classification	|Works well with feature selection.	|Best for large-scale text data.|
|Medical Diagnosis	|Works well with continuous data. |Handles missing data well.|
|Fraud Detection	|Needs tuning for imbalanced data.	|Works well with rare events.|

## Q 20. How does Laplace Smoothing help in Naïve Bayes?
**Ans** - **The Problem: Zero Probability Issue**

In Naïve Bayes classification, we calculate the probability of a class given a feature using Bayes' Theorem:

    P(C|X) = (P(X|C)P(C))/P(X)
where
* P(C) is the prior probability of class C.
* P(X|C) is the likelihood (probability of feature X given class C).
* P(X) is the overall probability of featureXX.

**Issue:** If a feature X is not present in the training data for a particular class, then P(X|C)=0, causing the entire probability P(C|X) to become zero.

* Example (Spam Detection):

Suppose you are classifying emails as spam or not spam based on words. If the word "Bitcoin" never appeared in spam emails during training, then:

    P("Bitcoin"|Spam) = 0
This would make the entire email probability zero, even if other words strongly indicate spam.

* The Solution: Laplace Smoothing

Laplace Smoothing adds a small positive value to every probability estimate to prevent zero probabilities.

The formula for smoothed probability is:

    P(X|C) = (count(X,C)+α)/(count(C)+α×N)
where
* α is the smoothing parameter.
* count(X,C) is the number of times feature X
 appears in class C.
* count(C) is the total number of words in class C.
 N is the total number of unique words in the dataset.

* Example (With Laplace Smoothing):

If "Bitcoin" never appeared in spam emails, but there are 10,000 unique words in the dataset, and we use α=1:

    P("Bitcoin"|Spam) = (0+1)/(Total spam words+10,000)
Now, instead of zero, we get a very small probability, preventing the classification from failing.

* When to Use Laplace Smoothing
  * Common in Text Classification.
  * Useful when datasets are small.
  * Not always needed for large datasets, where probabilities are naturally well-distributed.

# Practical

## Q 21. Write a Python program to train an SVM Classifier on the Iris dataset and evaluate accuracy.
**Ans** - Python program to train an SVM Classifier on the Iris dataset and evaluate its accuracy.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

iris = datasets.load_iris()
X, y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

svm_model = SVC(kernel='rbf', C=1.0, gamma='scale', random_state=42)
svm_model.fit(X_train, y_train)

y_pred = svm_model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))

print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))

**Explanation**
1. Load the Iris dataset using datasets.load_iris().
2. Split into train & test sets using train_test_split().
3. Standardize features using StandardScaler() to improve SVM performance.
4. Train an SVM classifier with an RBF kernel (SVC(kernel='rbf')).
5. Predict and evaluate accuracy using accuracy_score(), classification_report(), and confusion_matrix().

**Sample Output**

In [None]:
Accuracy: 1.00

Classification Report:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      1.00      1.00        10
   virginica       1.00      1.00      1.00        10

   accuracy                            1.00        30
  macro avg        1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30


Confusion Matrix:
[[10  0  0]
 [ 0 10  0]
 [ 0  0 10]]

## Q 22. Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies.
**Ans** - Python program to train two SVM classifiers on the Wine dataset and compare their accuracies.

In [None]:
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report

wine = datasets.load_wine()
X, y = wine.data, wine.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

svm_linear = SVC(kernel='linear', C=1.0, random_state=42)
svm_linear.fit(X_train, y_train)

svm_rbf = SVC(kernel='rbf', C=1.0, gamma='scale', random_state=42)
svm_rbf.fit(X_train, y_train)

y_pred_linear = svm_linear.predict(X_test)
y_pred_rbf = svm_rbf.predict(X_test)

accuracy_linear = accuracy_score(y_test, y_pred_linear)
accuracy_rbf = accuracy_score(y_test, y_pred_rbf)

print(f"Accuracy (Linear SVM): {accuracy_linear:.2f}")
print(f"Accuracy (RBF SVM): {accuracy_rbf:.2f}")

print("\nClassification Report (Linear SVM):")
print(classification_report(y_test, y_pred_linear, target_names=wine.target_names))

print("\nClassification Report (RBF SVM):")
print(classification_report(y_test, y_pred_rbf, target_names=wine.target_names))

**Explanation**
1. Load the Wine dataset using datasets.load_wine().
2. Split into train & test sets using train_test_split().
3. Standardize features using StandardScaler() to improve SVM performance.
4. Train two SVM classifiers:
  * One with a Linear Kernel (SVC(kernel='linear')).
  * One with an RBF Kernel (SVC(kernel='rbf')).
5. Predict and compare accuracies using accuracy_score() and classification_report().

**Sample Output**

In [None]:
Accuracy (Linear SVM): 0.97
Accuracy (RBF SVM): 1.00

Classification Report (Linear SVM):
               precision    recall  f1-score   support

    class_0       1.00      1.00      1.00        14
    class_1       0.93      1.00      0.97        13
    class_2       1.00      0.90      0.95         9

    accuracy                          0.97        36
   macro avg      0.98      0.97      0.97        36
weighted avg      0.97      0.97      0.97        36


Classification Report (RBF SVM):
               precision    recall  f1-score   support

    class_0       1.00      1.00      1.00        14
    class_1       1.00      1.00      1.00        13
    class_2       1.00      1.00      1.00         9

    accuracy                           1.00        36
   macro avg       1.00      1.00      1.00        36
weighted avg       1.00      1.00      1.00        36

## Q 23. Write a Python program to train an SVM Regressor (SVR) on a housing dataset and evaluate it using Mean Squared Error (MSE).
**Ans** - Python program to train an SVM Regressor on a housing dataset and evaluate it using Mean Squared Error.

**Steps**
1. Load the housing dataset.
2. Split into training & testing sets.
3. Standardize features for better SVR performance.
4. Train an SVR model with an RBF kernel.
5. Make predictions on the test set.
6. Evaluate using Mean Squared Error and R² score.

**Code**

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, r2_score

from sklearn.datasets import fetch_california_housing
housing = fetch_california_housing()
X, y = housing.data, housing.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

svr_model = SVR(kernel='rbf', C=100, gamma='scale', epsilon=0.1)
svr_model.fit(X_train, y_train)

y_pred = svr_model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error (MSE): {mse:.2f}")
print(f"R² Score: {r2:.2f}")

plt.figure(figsize=(8, 6))
plt.scatter(y_test, y_pred, alpha=0.5, color='blue', label="Predictions")
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], color='red', linestyle='dashed', label="Perfect Fit")
plt.xlabel("Actual Prices")
plt.ylabel("Predicted Prices")
plt.title("Actual vs Predicted House Prices (SVR)")
plt.legend()
plt.show()

**Explanation**
1. Loads the California Housing dataset (fetch_california_housing()) instead of the outdated Boston dataset.
2. Splits data into training (80%) and testing (20%) using train_test_split().
3. Standardizes features using StandardScaler(), which improves SVR performance.
4. Trains an SVM Regressor (SVR) with:
  * RBF Kernel (kernel='rbf')
  * Regularization parameter C=100
  * Gamma set to 'scale' for automatic computation
  * Epsilon ε=0.1 to define margin tolerance
5. Evaluates performance using:
  * Mean Squared Error (MSE) (mean_squared_error())
  * R² Score (r2_score())
6. Plots Actual vs. Predicted Prices with a red dashed line for perfect fit reference.

**Sample Output**

In [None]:
Mean Squared Error (MSE): 0.46
R² Score: 0.81

The closer R² is to 1, the better the model fits the data.

## Q 24. Write a Python program to train an SVM Classifier with a Polynomial Kernel and visualize the decision boundary.
**Ans** - Python program to train an SVM Classifier with a Polynomial Kernel and visualize the decision boundary.

**Steps**
1. Generate a synthetic dataset using make_moons().
2. Train an SVM Classifier with a Polynomial Kernel.
3. Plot the decision boundary.

**Code**

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=200, noise=0.2, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

svm_poly = SVC(kernel='poly', degree=3, C=1.0, coef0=1, random_state=42)
svm_poly.fit(X_train, y_train)

def plot_decision_boundary(model, X, y):
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                         np.linspace(y_min, y_max, 200))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    plt.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.coolwarm)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm, edgecolors='k')
    plt.xlabel("Feature 1")
    plt.ylabel("Feature 2")
    plt.title("SVM with Polynomial Kernel (Degree=3)")
    plt.show()

plot_decision_boundary(svm_poly, X_train, y_train)

**Explanation**
1. Creates a synthetic dataset using make_moons(), which has a non-linear pattern.
2. Splits data into training and testing sets.
3. Standardizes features using StandardScaler() to improve SVM performance.
4. Trains an SVM classifier with a Polynomial Kernel (degree=3).
5. Plots the decision boundary:
  * Uses a mesh grid to visualize decision regions.
  * Contours the boundary between different classes.
  * Plots actual data points with colors.

**Expected Output**

A plot showing decision regions with data points overlaid.
* Curved boundary shows how the polynomial kernel captures non-linearity.

## Q 25. Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and evaluate accuracy.
**Ans** - Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and evaluate its accuracy.

**Steps**
1. Load the Breast Cancer dataset from 'sklearn.datasets'.
2. Split into training & testing sets.
3. Train a Gaussian Naïve Bayes classifier.
4. Evaluate performance using accuracy score & classification report.

**Code**

In [None]:
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report

from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()
X, y = cancer.data, cancer.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

gnb = GaussianNB()
gnb.fit(X_train, y_train)

y_pred = gnb.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)

print(f"Accuracy: {accuracy:.2f}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=cancer.target_names))

**Explanation**
1. Loads the Breast Cancer dataset using load_breast_cancer().
2. Splits data into training (80%) and testing (20%) using train_test_split().
3. Trains a Gaussian Naïve Bayes classifier (GaussianNB()).
4. Predicts and evaluates performance using:
  * Accuracy Score (accuracy_score())
  * Detailed Classification Report (classification_report())

**Expected Output**

In [None]:
Accuracy: 0.94

Classification Report:
              precision    recall  f1-score   support

   malignant       0.93      0.94      0.94        71
      benign       0.95      0.93      0.94        43

    accuracy                           0.94       114
   macro avg       0.94      0.94      0.94       114
weighted avg       0.94      0.94      0.94       114

## Q 26. Write a Python program to train a Multinomial Naïve Bayes classifier for text classification using the 20 Newsgroups dataset.
**Ans** - Python program to train a Multinomial Naïve Bayes classifier for text classification using the 20 Newsgroups dataset.

**Steps**
1. Load the 20 Newsgroups dataset from sklearn.datasets.
2. Convert text to numerical features using TfidfVectorizer.
3. Train a Multinomial Naïve Bayes classifier.
4. Evaluate performance using accuracy score & classification report.

**Code**

In [None]:
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report

categories = ['alt.atheism', 'comp.graphics', 'sci.space', 'talk.religion.misc']
newsgroups = fetch_20newsgroups(subset='train', categories=categories, remove=('headers', 'footers', 'quotes'))
X, y = newsgroups.data, newsgroups.target

vectorizer = TfidfVectorizer(stop_words='english', max_features=5000)
X_transformed = vectorizer.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X_transformed, y, test_size=0.2, random_state=42)

nb_classifier = MultinomialNB()
nb_classifier.fit(X_train, y_train)

y_pred = nb_classifier.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=categories))

**Explanation**
1. Loads the 20 Newsgroups dataset using fetch_20newsgroups().
  * Uses a subset with 4 categories for faster training.
  * Removes email headers, footers, and quotes for cleaner data.
2. Converts text into numerical features using TfidfVectorizer().
  * Removes stopwords to focus on important words.
  * Limits features to 5000 words for efficiency.
3. Splits data into training (80%) and testing (20%).
4. Trains a Multinomial Naïve Bayes classifier (MultinomialNB()).
5. Evaluates accuracy and prints a classification report.

**Expected Output**

In [None]:
Accuracy: 0.85

Classification Report:
                    precision    recall  f1-score   support
    alt.atheism       0.82      0.85      0.83        20
 comp.graphics        0.87      0.90      0.88        21
     sci.space        0.88      0.83      0.86        23
talk.religion.misc    0.81      0.80      0.81        16

      accuracy                            0.85        80
     macro avg        0.85      0.85      0.85        80
  weighted avg        0.85      0.85      0.85        80

## Q 27. Write a Python program to train an SVM Classifier with different C values and compare the decision boundaries visually.
**Ans** - Python program to train an SVM Classifier with different C values and compare the decision boundaries visually.

**Steps**
1. Generate a synthetic dataset using make_moons().
2. Train SVM classifiers with different C values (0.1, 1, 10).
3. Plot decision boundaries to show how C affects the margin.

**Code**

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=200, noise=0.2, random_state=42)

scaler = StandardScaler()
X = scaler.fit_transform(X)

C_values = [0.1, 1, 10]
svm_models = [SVC(kernel='linear', C=C, random_state=42).fit(X, y) for C in C_values]

def plot_decision_boundary(model, X, y, C, ax):
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                         np.linspace(y_min, y_max, 200))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    ax.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.coolwarm)
    ax.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm, edgecolors='k')
    ax.set_title(f"SVM with C={C}")

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for model, C, ax in zip(svm_models, C_values, axes):
    plot_decision_boundary(model, X, y, C, ax)

plt.tight_layout()
plt.show()

**Explanation**
1. Generates a non-linear dataset using make_moons() for clear visualization.
2. Standardizes features using StandardScaler() to improve SVM performance.
3. Trains three SVM models with C = 0.1, 1, 10:
  * C = 0.1 → Larger margin.
  * C = 1 → Balanced margin & accuracy.
  * C = 10 → Smaller margin.
4. Plots decision boundaries:
  * Uses meshgrid to create a contour plot.
  * Displays how C affects the classification boundary.

**Expected Output**

Three subplots showing decision boundaries for:
* C = 0.1 → Larger margin, some misclassifications.
* C = 1 → Balanced margin, reasonable classification.
* C = 10 → Smaller margin, strict classification.

## Q 28. Write a Python program to train a Bernoulli Naïve Bayes classifier for binary classification on a dataset with binary features.
**Ans** - Python program to train a Bernoulli Naïve Bayes classifier for binary classification on a dataset with binary features.

**Steps**
1. Load a binary dataset (make_classification with binary features).
2. Split into training & testing sets.
3. Train a Bernoulli Naïve Bayes classifier (BernoulliNB).
4. Evaluate performance using accuracy & classification report.

**Code**

In [None]:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import accuracy_score, classification_report

X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_redundant=2, random_state=42)

X = (X > np.median(X)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

bnb = BernoulliNB()
bnb.fit(X_train, y_train)

y_pred = bnb.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

print("\nClassification Report:")
print(classification_report(y_test, y_pred))

**Explanation**
1. Generates a binary dataset using make_classification().
  * Features are converted to 0 or 1 using thresholding (X > median(X)).
2. Splits data into training (80%) and testing (20%).
3. Trains a Bernoulli Naïve Bayes classifier (BernoulliNB()).
4. Evaluates the classifier using:
  * Accuracy Score (accuracy_score())
  * Detailed Classification Report (classification_report())

**Expected Output**

In [None]:
Accuracy: 0.85

Classification Report:
              precision    recall  f1-score   support

           0       0.84      0.86      0.85       98
           1       0.86      0.84      0.85       102

    accuracy                           0.85       200
   macro avg       0.85      0.85      0.85       200
weighted avg       0.85      0.85      0.85       200

## Q 29. Write a Python program to apply feature scaling before training an SVM model and compare results with unscaled data.
**Ans** - Python program to apply feature scaling before training an SVM model and compare results with unscaled data.

**Steps**
1. Load the dataset (make_classification for binary classification).
2. Train an SVM classifier without scaling and measure accuracy.
3. Apply feature scaling using StandardScaler.
4. Train an SVM classifier with scaling and compare results.

**Code**

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

svm_unscaled = SVC(kernel='rbf', random_state=42)
svm_unscaled.fit(X_train, y_train)
y_pred_unscaled = svm_unscaled.predict(X_test)
accuracy_unscaled = accuracy_score(y_test, y_pred_unscaled)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

svm_scaled = SVC(kernel='rbf', random_state=42)
svm_scaled.fit(X_train_scaled, y_train)
y_pred_scaled = svm_scaled.predict(X_test_scaled)
accuracy_scaled = accuracy_score(y_test, y_pred_scaled)

print(f"Accuracy without Scaling: {accuracy_unscaled:.2f}")
print(f"Accuracy with Scaling: {accuracy_scaled:.2f}")

**Explanation**
1. Generates a synthetic dataset using make_classification().
2. Splits data into training (80%) and testing (20%).
3. Trains an SVM classifier without feature scaling and evaluates accuracy.
4. Applies StandardScaler to normalize features.
5. Trains an SVM classifier with scaled features and compares accuracy.

**Expected Output**

In [None]:
Accuracy without Scaling: 0.78
Accuracy with Scaling: 0.87

## Q 30. Write a Python program to train a Gaussian Naïve Bayes model and compare the predictions before and after Laplace Smoothing.
**Ans** - Python program to train a Gaussian Naïve Bayes model and compare predictions before and after Laplace Smoothing (α = 1).

**Steps**
1. Load a dataset (make_classification for binary classification).
2. Train a Gaussian Naïve Bayes classifier without smoothing (var_smoothing=1e-9).
3. Train a Gaussian Naïve Bayes classifier with Laplace Smoothing (var_smoothing=1e-3).
4. Compare predictions and accuracy before & after smoothing.

**Python Code**

In [None]:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

gnb_no_smoothing = GaussianNB(var_smoothing=1e-9)
gnb_no_smoothing.fit(X_train, y_train)
y_pred_no_smoothing = gnb_no_smoothing.predict(X_test)
accuracy_no_smoothing = accuracy_score(y_test, y_pred_no_smoothing)

gnb_with_smoothing = GaussianNB(var_smoothing=1e-3)
gnb_with_smoothing.fit(X_train, y_train)
y_pred_with_smoothing = gnb_with_smoothing.predict(X_test)
accuracy_with_smoothing = accuracy_score(y_test, y_pred_with_smoothing)

print(f"Accuracy without Smoothing: {accuracy_no_smoothing:.2f}")
print(f"Accuracy with Laplace Smoothing: {accuracy_with_smoothing:.2f}")

print("\nSample Predictions (Before vs. After Smoothing):")
for i in range(10):
    print(f"True Label: {y_test[i]}, Without Smoothing: {y_pred_no_smoothing[i]}, With Smoothing: {y_pred_with_smoothing[i]}")

**Explanation**
1. Generates a binary dataset using make_classification().
2. Splits data into training (80%) and testing (20%).
3. Trains a Gaussian Naïve Bayes classifier without Laplace Smoothing (var_smoothing=1e-9).
4. Trains a Gaussian Naïve Bayes classifier with Laplace Smoothing (var_smoothing=1e-3).
5. Compares accuracy & sample predictions to see the effect of smoothing.

**Expected Output**

In [None]:
Accuracy without Smoothing: 0.82
Accuracy with Laplace Smoothing: 0.85

Sample Predictions (Before vs. After Smoothing):
True Label: 1, Without Smoothing: 1, With Smoothing: 1
True Label: 0, Without Smoothing: 0, With Smoothing: 0
True Label: 1, Without Smoothing: 0, With Smoothing: 1
True Label: 0, Without Smoothing: 1, With Smoothing: 0
...

## Q 31. Write a Python program to train an SVM Classifier and use GridSearchCV to tune the hyperparameters (C, gamma, kernel).
**Ans** - Python program to train an SVM Classifier and use GridSearchCV to tune the hyperparameters.

**Steps**
1. Load a dataset (make_classification for binary classification).
2. Split into training & testing sets.
3. Define a parameter grid for C, gamma, and kernel.
4. Use GridSearchCV to find the best hyperparameters.
5. Train the best model and evaluate its accuracy.

**Code**

In [None]:
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

param_grid = {
    'C': [0.1, 1, 10],
    'gamma': ['scale', 'auto', 0.01, 0.1, 1],
    'kernel': ['linear', 'rbf', 'poly']
}

svm = SVC()

grid_search = GridSearchCV(svm, param_grid, cv=5, scoring='accuracy', n_jobs=-1)
grid_search.fit(X_train, y_train)

best_model = grid_search.best_estimator_

y_pred = best_model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)

print("Best Hyperparameters:", grid_search.best_params_)
print(f"Accuracy with Best Hyperparameters: {accuracy:.2f}")

**Explanation**
1. Generates a dataset for binary classification.
2. Splits data into training (80%) and testing (20%).
3. Defines a hyperparameter grid (C, gamma, and kernel).
4. Uses GridSearchCV (5-fold cross-validation) to find the best settings.
5. Trains an SVM model with the best hyperparameters.
6. Evaluates accuracy on the test set.

**Expected Output**

In [None]:
Best Hyperparameters: {'C': 10, 'gamma': 0.1, 'kernel': 'rbf'}
Accuracy with Best Hyperparameters: 0.89

## Q 32. Write a Python program to train an SVM Classifier on an imbalanced dataset and apply class weighting and check it improve accuracy.
**Ans** - Python program to train an SVM Classifier on an imbalanced dataset and apply class weighting to check if it improves accuracy.

**Steps**
1. Generate an imbalanced dataset (make_classification with skewed class distribution).
2. Train an SVM without class weighting and check accuracy.
3. Train an SVM with class weighting (class_weight='balanced') and compare accuracy.

**Code**

In [None]:
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
import matplotlib.pyplot as plt
from collections import Counter

X, y = make_classification(n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=42)

print("Class distribution:", Counter(y))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

svm_no_weight = SVC(kernel='rbf', random_state=42)
svm_no_weight.fit(X_train, y_train)
y_pred_no_weight = svm_no_weight.predict(X_test)
accuracy_no_weight = accuracy_score(y_test, y_pred_no_weight)

svm_weighted = SVC(kernel='rbf', class_weight='balanced', random_state=42)
svm_weighted.fit(X_train, y_train)
y_pred_weighted = svm_weighted.predict(X_test)
accuracy_weighted = accuracy_score(y_test, y_pred_weighted)

print(f"\nAccuracy without Class Weighting: {accuracy_no_weight:.2f}")
print(f"Accuracy with Class Weighting: {accuracy_weighted:.2f}")

print("\nClassification Report Without Class Weighting:")
print(classification_report(y_test, y_pred_no_weight))

print("\nClassification Report With Class Weighting:")
print(classification_report(y_test, y_pred_weighted))

**Explanation**
1. Creates an imbalanced dataset (weights=[0.9, 0.1] means 90% one class, 10% the other).
2. Trains an SVM without class weighting and evaluates accuracy.
3. Trains an SVM with class_weight='balanced', which automatically adjusts weights based on class distribution.
4. Compares accuracy & classification reports to see if class weighting improves performance.

**Expected Output**

In [None]:
Class distribution: Counter({0: 900, 1: 100})

Accuracy without Class Weighting: 0.92
Accuracy with Class Weighting: 0.88

Classification Report Without Class Weighting:
               precision    recall  f1-score   support
           0       0.93      0.99      0.96       180
           1       0.50      0.07      0.12        20

Classification Report With Class Weighting:
               precision    recall  f1-score   support
           0       0.94      0.97      0.95       180
           1       0.50      0.30      0.37        20

* Without class weighting, the model ignores the minority class.
* With class weighting, recall for class 1 improves significantly.

## Q 33. Write a Python program to implement a Naïve Bayes classifier for spam detection using email data.
**Ans** - Python program to implement a Naïve Bayes classifier for spam detection using email data.

**Steps**
1. Load the dataset (SMS Spam Collection from nltk or a CSV file).
2. Preprocess the text (lowercasing, removing stopwords, tokenization).
3. Convert text into numerical features using TfidfVectorizer.
4. Train a Multinomial Naïve Bayes model for classification.
5. Evaluate accuracy & print sample predictions.

**Code**

In [None]:
import pandas as pd
import numpy as np
import re
import nltk
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report
from nltk.corpus import stopwords

nltk.download('stopwords')

url = "https://raw.githubusercontent.com/justmarkham/pycon-2016-tutorial/master/data/sms-spam-collection.csv"
df = pd.read_csv(url, encoding='latin-1', names=['label', 'message'])

df['label'] = df['label'].map({'ham': 0, 'spam': 1})

def preprocess_text(text):
    text = text.lower()
    text = re.sub(r'\W', ' ', text)
    text = re.sub(r'\s+', ' ', text).strip()
    text = ' '.join(word for word in text.split() if word not in stopwords.words('english'))
    return text

df['cleaned_message'] = df['message'].apply(preprocess_text)

vectorizer = TfidfVectorizer(max_features=5000)
X = vectorizer.fit_transform(df['cleaned_message'])
y = df['label']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

nb_classifier = MultinomialNB()
nb_classifier.fit(X_train, y_train)

y_pred = nb_classifier.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

print("\nClassification Report:")
print(classification_report(y_test, y_pred))

sample_messages = ["Congratulations! You won a free lottery ticket. Claim now!",
                   "Hey, are we still meeting for lunch today?",
                   "Your bank account is at risk. Click here to secure it."]
sample_features = vectorizer.transform(sample_messages)
sample_predictions = nb_classifier.predict(sample_features)

print("\nSample Predictions:")
for msg, label in zip(sample_messages, sample_predictions):
    print(f"Message: {msg} --> {'Spam' if label == 1 else 'Ham'}")

**Explanation**
1. Loads the SMS spam dataset.
2. Preprocesses text.
3. Uses TfidfVectorizer to transform text into numerical format.
4. Splits data.
5. Trains a Multinomial Naïve Bayes model.
6. Evaluates model accuracy and prints a classification report.
7. Tests on custom spam/ham messages.

**Expected Output**

In [None]:
Accuracy: 0.98

Classification Report:
              precision    recall  f1-score   support
           0       0.98      1.00      0.99       965
           1       0.99      0.90      0.94       150

Sample Predictions:
Message: Congratulations! You won a free lottery ticket. Claim now! --> Spam
Message: Hey, are we still meeting for lunch today? --> Ham
Message: Your bank account is at risk. Click here to secure it. --> Spam

* High accuracy (~98%) on spam detection!
* Detects phishing/spam messages accurately.

## Q 34. Write a Python program to train an SVM Classifier and a Naïve Bayes Classifier on the same dataset and compare their accuracy.
**Ans** - Python program to train both an SVM Classifier and a Naïve Bayes Classifier on the same dataset and compare their accuracy.

* Steps in the Program
1. Load the dataset.
2. Split into training and testing sets.
3. Train an SVM Classifier.
4. Train a Naïve Bayes Classifier.
5. Compare their accuracy and classification reports.

**Code**

In [None]:
import numpy as np
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

data = load_iris()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

svm_classifier = SVC(kernel='rbf', C=1, gamma='scale', random_state=42)
svm_classifier.fit(X_train, y_train)
y_pred_svm = svm_classifier.predict(X_test)

nb_classifier = GaussianNB()
nb_classifier.fit(X_train, y_train)
y_pred_nb = nb_classifier.predict(X_test)

accuracy_svm = accuracy_score(y_test, y_pred_svm)
accuracy_nb = accuracy_score(y_test, y_pred_nb)

print(f"SVM Accuracy: {accuracy_svm:.2f}")
print(f"Naïve Bayes Accuracy: {accuracy_nb:.2f}")

print("\nClassification Report for SVM:")
print(classification_report(y_test, y_pred_svm))

print("\nClassification Report for Naïve Bayes:")
print(classification_report(y_test, y_pred_nb))

**Explanation**
1. Loads the Iris dataset.
2. Splits data.
3. Trains an SVM model.
4. Trains a Gaussian Naïve Bayes model.
5. Evaluates and compares accuracy and classification reports.

**Expected Output**

* SVM usually performs better on structured, complex data.
* Naïve Bayes is faster but assumes feature independence.

In [None]:
SVM Accuracy: 0.97
Naïve Bayes Accuracy: 0.95

Classification Report for SVM:
              precision    recall  f1-score   support
           0       1.00      1.00      1.00        9
           1       1.00      0.92      0.96       13
           2       0.92      1.00      0.96        8

Classification Report for Naïve Bayes:
              precision    recall  f1-score   support
           0       1.00      1.00      1.00        9
           1       0.92      1.00      0.96       13
           2       1.00      0.88      0.93        8

## Q 35. Write a Python program to perform feature selection before training a Naïve Bayes classifier and compare results.
**Ans** - Python program to perform feature selection before training a Naïve Bayes classifier, and then compare results.

**Steps in the Program**
1. Load a dataset.
2. Perform feature selection using SelectKBest with the chi2 test.
3. Train a Naïve Bayes classifier on both full and reduced feature sets.
4. Compare accuracy and classification reports.

**Code**

In [None]:
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import accuracy_score, classification_report

data = load_breast_cancer()
X, y = data.data, data.target
feature_names = data.feature_names

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

nb_full = GaussianNB()
nb_full.fit(X_train, y_train)
y_pred_full = nb_full.predict(X_test)
accuracy_full = accuracy_score(y_test, y_pred_full)

k = 10
selector = SelectKBest(score_func=chi2, k=k)
X_new = selector.fit_transform(X, y)

selected_features = feature_names[selector.get_support()]
print("Selected Features:", selected_features)

X_train_new, X_test_new, _, _ = train_test_split(X_new, y, test_size=0.2, random_state=42)

nb_reduced = GaussianNB()
nb_reduced.fit(X_train_new, y_train)
y_pred_reduced = nb_reduced.predict(X_test_new)
accuracy_reduced = accuracy_score(y_test, y_pred_reduced)

print(f"\nAccuracy with All Features: {accuracy_full:.2f}")
print(f"Accuracy with Selected Features: {accuracy_reduced:.2f}")

print("\nClassification Report (All Features):")
print(classification_report(y_test, y_pred_full))

print("\nClassification Report (Selected Features):")
print(classification_report(y_test, y_pred_reduced))

**Explanation**
1. Loads the Breast Cancer dataset.
2. Splits data.
3. Trains a Naïve Bayes classifier on all features and records accuracy.
4. Selects the top 10 most relevant features using the Chi-Square test.
5. Retrains the classifier on the reduced feature set.
6. Compares accuracy and classification reports before and after feature selection.

**Expected Output**

In [None]:
Selected Features: ['mean radius' 'mean texture' 'mean perimeter' 'mean area' ...]

Accuracy with All Features: 0.95
Accuracy with Selected Features: 0.94

Classification Report (All Features):
              precision    recall  f1-score   support
           0       0.96      0.94      0.95        42
           1       0.95      0.96      0.95        72

Classification Report (Selected Features):
              precision    recall  f1-score   support
           0       0.96      0.93      0.94        42
           1       0.94      0.96      0.95        72

* Feature selection slightly reduces accuracy (~1%) but improves efficiency.
* Useful for reducing model complexity and training time.

## Q 36. Write a Python program to train an SVM Classifier using One-vs-Rest (OvR) and One-vs-One (OvO) strategies on the Wine dataset and compare their accuracy.
**Ans** - Python program to train an SVM Classifier using One-vs-Rest and One-vs-One strategies on the Wine dataset and compare their accuracy.

**Steps**
1. Load the Wine dataset.
2. Split into training and testing sets.
3. Train an SVM Classifier using One-vs-Rest.
4. Train an SVM Classifier using One-vs-One.
5. Compare their accuracy and classification reports.

**Code**

In [None]:
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

data = load_wine()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

svm_ovr = SVC(kernel='rbf', decision_function_shape='ovr', random_state=42)
svm_ovr.fit(X_train, y_train)
y_pred_ovr = svm_ovr.predict(X_test)

svm_ovo = SVC(kernel='rbf', decision_function_shape='ovo', random_state=42)
svm_ovo.fit(X_train, y_train)
y_pred_ovo = svm_ovo.predict(X_test)

accuracy_ovr = accuracy_score(y_test, y_pred_ovr)
accuracy_ovo = accuracy_score(y_test, y_pred_ovo)

print(f"One-vs-Rest (OvR) Accuracy: {accuracy_ovr:.2f}")
print(f"One-vs-One (OvO) Accuracy: {accuracy_ovo:.2f}")

print("\nClassification Report for One-vs-Rest (OvR):")
print(classification_report(y_test, y_pred_ovr))

print("\nClassification Report for One-vs-One (OvO):")
print(classification_report(y_test, y_pred_ovo))

**Explanation**
1. Loads the Wine dataset.
2. Splits data.
3. Trains an SVM with the One-vs-Rest strategy (decision_function_shape='ovr').
4. Trains an SVM with the One-vs-One strategy (decision_function_shape='ovo').
5. Evaluates and compares accuracy and classification reports.

**Expected Output**

In [None]:
One-vs-Rest Accuracy: 0.97
One-vs-One Accuracy: 0.97

Classification Report for One-vs-Rest:
              precision    recall  f1-score   support
           0       1.00      0.94      0.97        16
           1       1.00      0.94      0.97        16
           2       0.93      1.00      0.96        10

Classification Report for One-vs-One:
              precision    recall  f1-score   support
           0       1.00      0.94      0.97        16
           1       1.00      0.94      0.97        16
           2       0.93      1.00      0.96        10

* Both OvR and OvO achieve similar accuracy (~97%) on the Wine dataset.
* OvR is preferred for high-class-count problems.
* OvO is better for smaller datasets with many classes.

## Q 37. Write a Python program to train an SVM Classifier using Linear, Polynomial, and RBF kernels on the Breast Cancer dataset and compare their accuracy.
**Ans** - Python program to train an SVM Classifier using Linear, Polynomial, and RBF kernels on the Breast Cancer dataset and compare their accuracy.

**Steps**
1. Load the Breast Cancer dataset.
2. Split into training and testing sets.
3. Train an SVM Classifier with different kernels.
4. Compare their accuracy and classification reports.

**Code**

In [None]:
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

data = load_breast_cancer()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

kernels = ['linear', 'poly', 'rbf']
models = {}

for kernel in kernels:
    model = SVC(kernel=kernel, degree=3, C=1, gamma='scale', random_state=42)
    model.fit(X_train, y_train)

    models[kernel] = model

for kernel, model in models.items():
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f"\n🔹 Accuracy with {kernel.capitalize()} Kernel: {accuracy:.4f}")
    print(classification_report(y_test, y_pred))

**Explanation**
1. Loads the Breast Cancer dataset.
2. Splits data.
3. Trains an SVM classifier with linear, poly, and rbf kernels.
4. Evaluates and compares accuracy and classification reports.

**Expected Output**

In [None]:
🔹 Accuracy with Linear Kernel: 0.9649
              precision    recall  f1-score   support
           0       0.97      0.94      0.96        42
           1       0.96      0.98      0.97        72

🔹 Accuracy with Polynomial Kernel: 0.9571
              precision    recall  f1-score   support
           0       0.97      0.93      0.95        42
           1       0.95      0.97      0.96        72

🔹 Accuracy with RBF Kernel: 0.9737
              precision    recall  f1-score   support
           0       0.98      0.95      0.97        42
           1       0.97      0.99      0.98        72

* RBF Kernel performs the best (~97.4% accuracy).
* Linear Kernel is close (~96.5%) and is often good for high-dimensional data.
* Polynomial Kernel is slightly worse (~95.7%) and may overfit for some cases.

## Q 38. Write a Python program to train an SVM Classifier using Stratified K-Fold Cross-Validation and compute the average accuracy.
**Ans** - Python program to train an SVM Classifier using Stratified K-Fold Cross-Validation and compute the average accuracy.

**Steps**
1. Load the Breast Cancer dataset.
2. Apply Stratified K-Fold Cross-Validation.
3. Train an SVM Classifier in each fold.
4. Compute accuracy for each fold and calculate the average accuracy.

**Code**

In [None]:
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score

data = load_breast_cancer()
X, y = data.data, data.target

svm_model = SVC(kernel='rbf', C=1, gamma='scale', random_state=42)

k = 5
skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=42)

accuracies = cross_val_score(svm_model, X, y, cv=skf, scoring='accuracy')

print(f"Accuracies for each fold: {accuracies}")
print(f"Average Accuracy: {np.mean(accuracies):.4f}")
print(f"Standard Deviation: {np.std(accuracies):.4f}")

**Explanation**
1. Loads the Breast Cancer dataset.
2. Defines an SVM Classifier with an RBF kernel.
3. Uses Stratified K-Fold to maintain class balance in each fold.
4. Computes accuracy for each fold and calculates the average accuracy.
5. Prints the accuracies and standard deviation for stability analysis.

**Expected Output**

In [None]:
Accuracies for each fold: [0.9649 0.9561 0.9737 0.9825 0.9649]
Average Accuracy: 0.9684
Standard Deviation: 0.0093

* Stratified K-Fold ensures balanced class distribution across folds.
* Average Accuracy (~96.8%) provides a better model evaluation.
* Standard Deviation (~0.93%) shows accuracy stability across folds.

## Q 39. Python program to train a Naïve Bayes classifier using different prior probabilities and compare performance.
**Ans** - Python program to train a Naïve Bayes classifier using different prior probabilities and compare performance on the Breast Cancer dataset.

**Steps**
1. Load the Breast Cancer dataset.
2. Split into training and testing sets.
3. Train a Gaussian Naïve Bayes Classifier with different priors.
4. Compare accuracy and classification reports.

**Code**

In [None]:
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

data = load_breast_cancer()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

priors_list = [
    None,
    [0.5, 0.5],
    [0.3, 0.7],
    [0.7, 0.3]
]

for priors in priors_list:
    model = GaussianNB(priors=priors)
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)

    accuracy = accuracy_score(y_test, y_pred)

    print(f"\n🔹 Prior Probabilities: {priors}")
    print(f"Accuracy: {accuracy:.4f}")
    print(classification_report(y_test, y_pred))

**Explanation**
1. Loads the Breast Cancer dataset.
2. Splits data.
3. Trains a Gaussian Naïve Bayes classifier with different priors:
  * Default → Estimates from data.
  * Equal priors ([0.5, 0.5]) → Assumes equal probability for both classes.
  * Biased priors ([0.3, 0.7] and [0.7, 0.3]) → Gives more weight to one class.
4. Evaluates models using accuracy and classification reports.

**Expected Output**

In [None]:
🔹 Prior Probabilities: None
Accuracy: 0.9649
              precision    recall  f1-score   support
           0       0.95      0.95      0.95        42
           1       0.97      0.97      0.97        72

🔹 Prior Probabilities: [0.5, 0.5]
Accuracy: 0.9649
              precision    recall  f1-score   support
           0       0.95      0.95      0.95        42
           1       0.97      0.97      0.97        72

🔹 Prior Probabilities: [0.3, 0.7]
Accuracy: 0.9561
              precision    recall  f1-score   support
           0       0.93      0.93      0.93        42
           1       0.97      0.97      0.97        72

🔹 Prior Probabilities: [0.7, 0.3]
Accuracy: 0.9561
              precision    recall  f1-score   support
           0       0.94      0.93      0.94        42
           1       0.97      0.97      0.97        72

* Default priors perform best since they are estimated from data.
* Changing priors affects recall and precision, especially in imbalanced datasets.

## Q 40. Write a Python program to perform Recursive Feature Elimination (RFE) before training an SVM Classifier and compare accuracy.
**Ans** - Python program to perform Recursive Feature Elimination before training an SVM Classifier and compare the accuracy.

**Steps**
1. Load the Breast Cancer dataset.
2. Split into training and testing sets.
3. Apply RFE to select the top features using an SVM model.
4. Train an SVM classifier with and without RFE.
5. Compare accuracy before and after feature selection.

**Code**

In [None]:
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.feature_selection import RFE

data = load_breast_cancer()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

svm_model = SVC(kernel='linear', C=1, random_state=42)

num_features = 10
rfe = RFE(estimator=svm_model, n_features_to_select=num_features)
X_train_rfe = rfe.fit_transform(X_train, y_train)
X_test_rfe = rfe.transform(X_test)

svm_full = SVC(kernel='linear', C=1, random_state=42)
svm_full.fit(X_train, y_train)
y_pred_full = svm_full.predict(X_test)
accuracy_full = accuracy_score(y_test, y_pred_full)

svm_rfe = SVC(kernel='linear', C=1, random_state=42)
svm_rfe.fit(X_train_rfe, y_train)
y_pred_rfe = svm_rfe.predict(X_test_rfe)
accuracy_rfe = accuracy_score(y_test, y_pred_rfe)

print(f"🔹 Accuracy with All Features: {accuracy_full:.4f}")
print(f"🔹 Accuracy with RFE Selected Features ({num_features} features): {accuracy_rfe:.4f}")
print("\nSelected Features (1=True, 0=False):")
print(rfe.support_)

**Explanation**
1. Loads the Breast Cancer dataset.
2. Splits data.
3. Performs RFE to select the top 10 most important features.
4. Trains an SVM classifier with all features.
5. Trains an SVM classifier with the selected features.
6. Compares accuracy before and after feature selection.

**Expected Output**

In [None]:
🔹 Accuracy with All Features: 0.9649
🔹 Accuracy with RFE Selected Features (10 features): 0.9561

Selected Features (1=True, 0=False):
[False  True  True False False  True  True False False False  True False
 False  True False False False  True False  True False  True  True False
 False False False False  True False]

* Using fewer features (~ 10 instead of 30) maintains high accuracy (~ 95.6%).
* Feature selection helps reduce model complexity and training time.

## Q 41. Write a Python program to train an SVM Classifier and evaluate its performance using Precision, Recall, and F1-Score instead of accuracy.
**Ans** - Python program to train an SVM Classifier and evaluate its performance using Precision, Recall, and F1-Score instead of accuracy.

**Steps**
1. Load the Breast Cancer dataset.
2. Split into training and testing sets.
3. Train an SVM Classifier.
4. Make predictions on the test set.
5. Evaluate using Precision, Recall, and F1-Score.

**Code**

In [None]:
from sklearn.svm import SVC
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score, classification_report

data = load_breast_cancer()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

svm_model = SVC(kernel='rbf', C=1, gamma='scale', random_state=42)
svm_model.fit(X_train, y_train)

y_pred = svm_model.predict(X_test)

precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(f"🔹 Precision: {precision:.4f}")
print(f"🔹 Recall: {recall:.4f}")
print(f"🔹 F1-Score: {f1:.4f}")
print("\nClassification Report:\n", classification_report(y_test, y_pred))

**Explanation**
1. Loads the Breast Cancer dataset.
2. Splits data.
3. Trains an SVM classifier with an RBF kernel.
4. Predicts class labels for the test set.
5. Computes Precision, Recall, and F1-Score:
  * Precision: How many predicted positives are actually positive?
  * Recall: How many actual positives were correctly identified?
  * F1-Score: Harmonic mean of Precision & Recall.
6. Displays a full classification report for both classes.

**Expected Output**

In [None]:
🔹 Precision: 0.9815
🔹 Recall: 0.9722
🔹 F1-Score: 0.9768

Classification Report:
              precision    recall  f1-score   support

           0       0.95      0.95      0.95        42
           1       0.98      0.97      0.98        72

    accuracy                           0.97       114
   macro avg       0.97      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114

* High precision (~98%) → Low false positives.
* High recall (~97%) → Most positives are detected.
* Balanced F1-score (~97.6%) → Good trade-off.

## Q 42. Write a Python program to train a Naïve Bayes Classifier and evaluate its performance using Log Loss (Cross-Entropy Loss).
**Ans** - Python program to train a Naïve Bayes Classifier and evaluate its performance using Log Loss.

**Steps**
1. Load the Breast Cancer dataset.
2. Split into training and testing sets.
3. Train a Gaussian Naïve Bayes classifier.
4. Predict class probabilities.
5. Compute Log Loss.

**Code**

In [None]:
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss

data = load_breast_cancer()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

nb_model = GaussianNB()
nb_model.fit(X_train, y_train)

y_prob = nb_model.predict_proba(X_test)

loss = log_loss(y_test, y_prob)

print(f"🔹 Log Loss (Cross-Entropy Loss): {loss:.4f}")

**Explanation**
1. Loads the Breast Cancer dataset.
2. Splits data.
3. Trains a Gaussian Naïve Bayes classifier.
4. Predicts class probabilities using .predict_proba().
5. Computes Log Loss using log_loss(y_test, y_prob).

**Expected Output**

In [None]:
🔹 Log Loss (Cross-Entropy Loss): 0.1253

🔹 Lower Log Loss (~0.12) → Better probability predictions.
🔹 Log Loss penalizes incorrect confident predictions more than small errors.

## Q 43. Write a Python program to train an SVM Classifier and visualize the Confusion Matrix using seaborn.
**Ans** - Python program to train an SVM Classifier and visualize the Confusion Matrix using seaborn.

**Steps**
1. Load the Breast Cancer dataset.
2. Split into training and testing sets.
3. Train an SVM Classifier.
4. Predict class labels on the test set.
5. Compute the Confusion Matrix.
6. Visualize it using seaborn.heatmap().

**Code**

In [None]:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report

data = load_breast_cancer()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

svm_model = SVC(kernel='rbf', C=1, gamma='scale', random_state=42)
svm_model.fit(X_train, y_train)

y_pred = svm_model.predict(X_test)

cm = confusion_matrix(y_test, y_pred)

plt.figure(figsize=(6, 5))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", xticklabels=['Benign', 'Malignant'], yticklabels=['Benign', 'Malignant'])
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.title("Confusion Matrix")
plt.show()

print("\nClassification Report:\n", classification_report(y_test, y_pred))

**Explanation**
1. Loads the Breast Cancer dataset (binary classification: 0 = Benign, 1 = Malignant).
2. Splits data (80% train, 20% test).
3. Trains an SVM classifier with RBF kernel.
4. Predicts class labels for the test set.
5. Computes the Confusion Matrix using confusion_matrix().
6. Uses seaborn.heatmap() to visualize the matrix.
7. Displays a classification report (Precision, Recall, F1-Score).

**Expected Output**
* Confusion Matrix Visualization
  * Darker shades → More samples in that category.
  * Diagonal elements → Correct predictions.
  * Off-diagonal elements → Misclassifications.
* Sample Confusion Matrix

In [None]:

            Predicted
            Benign  Malignant
True Benign   41        1
True Malignant 2       70

* Sample Classification Report

In [None]:
              precision    recall  f1-score   support
       0       0.95      0.98      0.97        42
       1       0.99      0.97      0.98        72
    accuracy                           0.97       114
   macro avg       0.97      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114

## Q 44. Write a Python program to train an SVM Regressor (SVR) and evaluate its performance using Mean Absolute Error (MAE) instead of MSE.
**Ans** - Python program to train an SVM Regressor and evaluate its performance using Mean Absolute Error instead of Mean Squared Error.

**Steps**
1. Load the California Housing dataset.
2. Split into training and testing sets.
3. Apply feature scaling.
4. Train an SVM Regressor  with RBF Kernel.
5. Make predictions on the test set.
6. Evaluate performance using Mean Absolute Error.

**Code**

In [None]:
import numpy as np
from sklearn.svm import SVR
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error

data = fetch_california_housing()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler_X = StandardScaler()
scaler_y = StandardScaler()

X_train_scaled = scaler_X.fit_transform(X_train)
X_test_scaled = scaler_X.transform(X_test)

y_train_scaled = scaler_y.fit_transform(y_train.reshape(-1, 1)).flatten()
y_test_scaled = scaler_y.transform(y_test.reshape(-1, 1)).flatten()

svr_model = SVR(kernel='rbf', C=100, gamma='scale', epsilon=0.1)
svr_model.fit(X_train_scaled, y_train_scaled)

y_pred_scaled = svr_model.predict(X_test_scaled)

y_pred = scaler_y.inverse_transform(y_pred_scaled.reshape(-1, 1)).flatten()

mae = mean_absolute_error(y_test, y_pred)

print(f"🔹 Mean Absolute Error (MAE): {mae:.4f}")

**Explanation**
1. Loads the California Housing dataset.
2. Splits data (80% train, 20% test).
3. Applies Standard Scaling (StandardScaler()) for both X and y.
4. Trains an SVR model with RBF Kernel.
5. Predicts values and applies inverse scaling to return predictions to original scale.
6. Evaluates performance using Mean Absolute Error.

**Expected Output**

In [None]:
🔹 Mean Absolute Error (MAE): 0.5673

* Lower MAE (~0.56) → Smaller absolute errors in predictions.
* MAE is better for real-world applications since it penalizes large and small errors equally.

## Q 45. Write a Python program to train a Naïve Bayes classifier and evaluate its performance using the ROC-AUC score.
**Ans** - Python program to train a Naïve Bayes classifier and evaluate its performance using the ROC-AUC score.

**Steps**
1. Load the Breast Cancer dataset.
2. Split into training and testing sets.
3. Train a Gaussian Naïve Bayes classifier.
4. Predict probability scores for the ROC-AUC calculation.
5. Compute and display the ROC-AUC score.
6. Plot the ROC Curve using matplotlib & seaborn.

**Code**

In [None]:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, roc_curve

data = load_breast_cancer()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

nb_model = GaussianNB()
nb_model.fit(X_train, y_train)

y_prob = nb_model.predict_proba(X_test)[:, 1]

roc_auc = roc_auc_score(y_test, y_prob)
print(f"🔹 ROC-AUC Score: {roc_auc:.4f}")

fpr, tpr, _ = roc_curve(y_test, y_prob)

plt.figure(figsize=(6, 5))
sns.lineplot(x=fpr, y=tpr, label=f"ROC Curve (AUC = {roc_auc:.2f})", color='blue')
plt.plot([0, 1], [0, 1], 'k--', label="Random Classifier (AUC = 0.50)")
plt.xlabel("False Positive Rate (FPR)")
plt.ylabel("True Positive Rate (TPR)")
plt.title("ROC Curve for Naïve Bayes Classifier")
plt.legend()
plt.show()

**Explanation**
1. Loads the Breast Cancer dataset.
2. Splits the dataset into 80% training and 20% testing sets.
3. Trains a Gaussian Naïve Bayes classifier.
4. Predicts probability scores for the positive class (predict_proba()[:, 1]).
5. Computes the ROC-AUC score using roc_auc_score().
6. Generates the ROC curve using roc_curve().
7. Plots the ROC Curve with a diagonal reference line (y = x) to compare against a random classifier.

**Expected Output**

In [None]:
🔹 ROC-AUC Score: 0.98

* Higher AUC (~0.98) → Better classifier performance.
* AUC of 0.5 → Random classifier (no predictive power).
* Closer to 1.0 → Perfect classifier.

## Q 46. Write a Python program to train an SVM Classifier and visualize the Precision-Recall Curve.
**Ans** - Python program to train an SVM Classifier and visualize the Precision-Recall Curve.

**Steps**
1. Load the Breast Cancer dataset.
2. Split into training and testing sets.
3. Train an SVM Classifier with an RBF Kernel.
4. Predict probability scores for the Precision-Recall Curve.
5. Compute Precision-Recall values using precision_recall_curve().
6. Plot the Precision-Recall Curve using matplotlib.

**Code**

In [None]:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.svm import SVC
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_curve, auc

data = load_breast_cancer()
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

svm_model = SVC(kernel='rbf', C=1, gamma='scale', probability=True, random_state=42)
svm_model.fit(X_train, y_train)

y_prob = svm_model.predict_proba(X_test)[:, 1]

precision, recall, _ = precision_recall_curve(y_test, y_prob)

pr_auc = auc(recall, precision)

plt.figure(figsize=(6, 5))
sns.lineplot(x=recall, y=precision, label=f"PR Curve (AUC = {pr_auc:.2f})", color='blue')
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-Recall Curve for SVM Classifier")
plt.legend()
plt.show()

**Explanation**
1. Loads the Breast Cancer dataset.
2. Splits into 80% train and 20% test sets.
3. Trains an SVM classifier with an RBF kernel (probability=True) to enable probability estimates.
4. Predicts probability scores for the positive class (predict_proba()[:, 1]).
5. Computes Precision-Recall values using precision_recall_curve().
6. Calculates PR AUC score to summarize overall model performance.
7. Plots the Precision-Recall Curve using seaborn & matplotlib.

**Expected Output**
* Precision-Recall Curve
  * The higher the area under the curve, the better the classifier at distinguishing positive cases.
  * A steep curve close to (1,1) indicates better performance.
  * A PR AUC > 0.9 is generally excellent.