Question 1: What is a Support Vector Machine (SVM), and how does it work?

-  Support Vector Machine (SVM)

A Support Vector Machine is a supervised machine learning algorithm used for classification (mainly) and regression tasks.

It works by finding the optimal decision boundary (hyperplane) that separates different classes in the feature space.

How It Works (Step by Step)
1. Decision Boundary (Hyperplane)

- In 2D space, the decision boundary is a line.

- In 3D space, it’s a plane.

- In higher dimensions, it’s called a hyperplane.

- The goal is to find the hyperplane that maximizes the margin (distance between the hyperplane and the nearest data points).

2. Support Vectors

- The data points closest to the hyperplane are called support vectors.

- They are critical, because if you remove them, the position of the hyperplane changes.

- Only these points determine the boundary, not the others.

3. Margin Maximization

- SVM chooses the hyperplane that maximizes the margin, i.e. the distance between the hyperplane and the nearest points of each class.

- A larger margin usually means better generalization and a lower risk of overfitting.

4. Handling Non-Linearly Separable Data

- If the data is not linearly separable, SVM uses the Kernel Trick.

- Kernel functions (like Polynomial, RBF/Gaussian, Sigmoid) implicitly map the data into a higher-dimensional space where a linear separation may be possible.

5. Soft Margin (Handling Noise & Outliers)

- Real-world data is often noisy.

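- SVM introduces a regularization parameter C that controls how heavily misclassifications are penalized:

  - Large C → strict classification, less tolerance for misclassification (risk of overfitting).

  - Small C → allows some misclassification, which often generalizes better (but can underfit if C is too small).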
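The steps above can be sketched on a tiny example. This is a minimal illustration, assuming scikit-learn's `SVC` and a made-up, linearly separable 2D dataset; the very large `C` is only there to approximate a hard margin. The margin width is computed as 2 / ‖w‖, where w is the normal vector of the fitted hyperplane.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable 2D data (made up for illustration).
X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0],
              [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard margin on separable data.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]                          # normal vector of the hyperplane w.x + b = 0
b = clf.intercept_[0]                     # offset of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)    # distance between the two margin boundaries

print("w =", w, "b =", b)
print("margin width =", margin_width)
print("support vectors:\n", clf.support_vectors_)

# Support vectors sit on the margin boundaries, where |w.x + b| is about 1.
decision_values = clf.decision_function(clf.support_vectors_)
print("decision values at support vectors:", decision_values)
```

Only the points nearest the other class show up in `support_vectors_`; moving any of the remaining points (without crossing the margin) would leave `w` and `b` unchanged.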

Question 2: Explain the difference between Hard Margin and Soft Margin SVM

-  Hard Margin SVM

- Assumes the dataset is perfectly linearly separable (no overlap, no noise).

- The SVM finds a hyperplane that separates the classes without any misclassification.

- The margin is maximized and all data points must lie outside or on the margin boundaries.

🔹 Advantages:

- Simple and works well if data is perfectly clean and separable.

🔹 Disadvantages:

- Very sensitive to outliers and noise.

- If even one point lies on the wrong side (the classes overlap), no separating hyperplane exists and the hard margin problem has no solution.
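For reference, the hard margin problem is usually written as the optimization below. The notation is an assumption, not taken from the text above: labels are encoded as y_i ∈ {−1, +1}, w is the normal vector of the hyperplane, and b is its offset. Minimizing ‖w‖ is equivalent to maximizing the margin width 2 / ‖w‖.

```latex
\min_{w,\,b} \ \frac{1}{2}\lVert w \rVert^2
\quad \text{subject to} \quad
y_i\,(w^\top x_i + b) \ge 1 \quad \text{for all } i = 1,\dots,n
```

The constraints require every point to be on the correct side and on or outside the margin, which is why the problem becomes infeasible as soon as the classes overlap.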

-  Soft Margin SVM

- Used when data is not perfectly separable (real-world cases).

- Introduces slack variables (ξ, one per training point) that allow some points to violate the margin or be misclassified.

- Adds a regularization parameter (C) to control the trade-off between a wide margin and few violations:

  - Large C → fewer misclassifications (harder margin, risk of overfitting).

  - Small C → more tolerance to misclassifications (softer margin, often better generalization).
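In the usual textbook formulation, the slack variables and C enter the objective as follows. The symbols are assumptions for illustration: labels y_i ∈ {−1, +1}, hyperplane w·x + b = 0, and one slack ξ_i per training point.

```latex
\min_{w,\,b,\,\xi} \ \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i\,(w^\top x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0
```

A point with ξ_i = 0 respects the margin, 0 < ξ_i ≤ 1 means it is inside the margin but still correctly classified, and ξ_i > 1 means it is misclassified. C sets the price of each unit of slack.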

🔹 Advantages:

- More robust to noise and outliers than hard margin SVM.

- Works well in practical datasets that are not perfectly separable.

🔹 Disadvantages:

- Needs proper tuning of C (regularization).
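The effect of C can be checked empirically. The sketch below is only an illustration: the overlapping two-cluster dataset from `make_blobs` and the two values of C are arbitrary choices, not part of the original answer.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping clusters, so no hyperplane separates them perfectly.
X, y = make_blobs(n_samples=200, centers=[[-1, -1], [1, 1]],
                  cluster_std=1.2, random_state=0)

results = {}
for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    width = 2.0 / np.linalg.norm(clf.coef_[0])   # margin width 2 / ||w||
    results[C] = {
        "margin_width": width,
        "n_support_vectors": int(clf.n_support_.sum()),
        "train_accuracy": clf.score(X, y),
    }
    print(f"C={C}: {results[C]}")
```

The small-C model should report the wider margin, with more points falling inside it and therefore counted as support vectors. In practice C is tuned with cross-validation rather than picked by hand.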

Question 3: What is the Kernel Trick in SVM? Give one example of a kernel and
explain its use case.

-  Kernel Trick in SVM

Many datasets are not linearly separable in their original feature space.

The kernel trick allows SVM to implicitly map data into a higher-dimensional space where it may become linearly separable.

Importantly, the kernel trick avoids explicitly computing the high-dimensional transformation (which could be computationally expensive).

Instead, it uses a kernel function that takes two vectors in the original space and directly returns the dot product their images would have in the higher-dimensional feature space.

👉 This way, SVM can create non-linear decision boundaries while still solving the optimization problem efficiently.
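A small numeric check makes this concrete. The sketch uses the homogeneous polynomial kernel of degree 2 rather than RBF, because its feature map is small enough to write out; the function names `phi` and `poly2_kernel` are hypothetical helpers introduced here for illustration.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2D vector (illustrative helper)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly2_kernel(a, b):
    """Homogeneous polynomial kernel of degree 2: K(a, b) = (a . b)^2."""
    return np.dot(a, b) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = np.dot(phi(x), phi(z))   # dot product computed in the 3D feature space
implicit = poly2_kernel(x, z)       # same value computed from the 2D inputs only

print(explicit, implicit)
```

Both numbers agree, but the kernel never builds the 3D vectors. For kernels such as RBF the corresponding feature space is infinite-dimensional, so the implicit route is the only practical one.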

Example: Radial Basis Function (RBF) Kernel

Formula:

$$K(x, x') = \exp\left(-\gamma \lVert x - x' \rVert^2\right)$$

where

- $x, x'$ = data points

- $\gamma$ = controls the influence of a single training example

Use Case

Suppose you have a dataset shaped like concentric circles (inner circle = class 0, outer circle = class 1).

In 2D space, this is not linearly separable.

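With an RBF kernel, SVM implicitly maps the data into a higher-dimensional space where a simple linear separation (hyperplane) is possible. Seen in the original 2D space, that boundary is a roughly circular curve between the two rings.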
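The concentric-circles case is easy to reproduce. The sketch below assumes scikit-learn's `make_circles` as a stand-in for such a dataset; the sample size, noise level, and split are arbitrary, and `make_circles` happens to label the outer circle 0 and the inner circle 1.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic concentric circles (outer ring = class 0, inner ring = class 1).
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Same data, two kernels: a straight line versus an RBF boundary.
linear_acc = SVC(kernel="linear").fit(X_train, y_train).score(X_test, y_test)
rbf_acc = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train).score(X_test, y_test)

print("Linear kernel accuracy:", linear_acc)
print("RBF kernel accuracy:", rbf_acc)
```

The linear kernel should do little better than guessing, since no straight line separates a ring from the disc inside it, while the RBF kernel should separate the two classes almost perfectly.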

Question 4: What is a Naïve Bayes Classifier, and why is it called “naïve”?

-  Naïve Bayes Classifier

- A probabilistic supervised learning algorithm based on Bayes’ Theorem.

- Commonly used for classification tasks such as spam filtering, sentiment analysis, and text classification.

- It predicts the class of a sample based on the posterior probability:

  - P(Class∣Features) = P(Features∣Class) ⋅ P(Class) / P(Features)

- Why is it called “Naïve”?

- Because it makes the naïve assumption that all features are independent of each other given the class label.

- In reality, features often correlate (e.g., in text classification, the words “free” and “offer” are not independent).

- Despite this unrealistic assumption, Naïve Bayes works surprisingly well in many domains, especially with high-dimensional data (like text).
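To see what the independence assumption buys, here is a hand computation for an email containing both "free" and "offer". All probabilities are made-up numbers for illustration, not estimates from any real dataset.

```python
# Hypothetical class priors and per-word likelihoods (illustrative values only).
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.5, "offer": 0.4},
    "ham": {"free": 0.05, "offer": 0.1},
}
words = ["free", "offer"]

# Naive assumption: P(free, offer | class) = P(free | class) * P(offer | class).
joint = {}
for cls in priors:
    p = priors[cls]
    for word in words:
        p *= likelihoods[cls][word]
    joint[cls] = p

# Normalize by P(Features), the sum of the joint scores over all classes.
evidence = sum(joint.values())
posterior = {cls: joint[cls] / evidence for cls in joint}

print(posterior)
```

Because the likelihood factorizes into one term per word, the model only needs one probability per (word, class) pair instead of one per word combination, which is what keeps it tractable on high-dimensional text.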

- Key Variants

- Gaussian Naïve Bayes → assumes features follow a normal distribution (e.g., continuous data like age, income).

- Multinomial Naïve Bayes → used for discrete count features (e.g., word counts in text classification).

- Bernoulli Naïve Bayes → used for binary features (e.g., word present or absent in a document).

Advantages

- Very fast and efficient for large datasets.

- Works well with text classification and spam detection.

- Often needs relatively little training data to estimate its parameters.

Disadvantages

- Independence assumption rarely holds in real-world data.

- Performs poorly when features are highly correlated.

Question 5: Describe the Gaussian, Multinomial, and Bernoulli Naïve Bayes variants.
When would you use each one?

1. Gaussian Naïve Bayes

- Assumes that the features follow a normal (Gaussian) distribution.

- Used for continuous data.

- For each class, it estimates the mean and variance of each feature, then applies Bayes’ theorem.

- Use case examples:

- Predicting whether a patient has a disease based on continuous features (age, blood pressure, cholesterol).

- Iris flower classification (sepal length, petal width, etc.).
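The "estimate mean and variance per class, then apply Bayes' theorem" recipe can be written out by hand and compared with scikit-learn. This is a sketch on the Iris data; `log_posterior` is a hypothetical helper, and it skips the tiny variance smoothing that `GaussianNB` adds, so values may differ in the last decimals.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

# Per-class prior, mean, and (population) variance of every feature.
priors = np.array([np.mean(y == c) for c in classes])
means = np.array([X[y == c].mean(axis=0) for c in classes])
variances = np.array([X[y == c].var(axis=0) for c in classes])

def log_posterior(X):
    """Unnormalized log P(class | x) under the Gaussian naive Bayes model."""
    # X is (n, d); means and variances are (k, d); the result is (n, k).
    diff = X[:, None, :] - means[None, :, :]
    log_likelihood = -0.5 * (
        np.log(2 * np.pi * variances)[None, :, :] + diff ** 2 / variances[None, :, :]
    ).sum(axis=2)
    return log_likelihood + np.log(priors)[None, :]

manual_pred = classes[np.argmax(log_posterior(X), axis=1)]

# Compare with scikit-learn's implementation fitted on the same data.
sk_model = GaussianNB().fit(X, y)
sk_pred = sk_model.predict(X)
agreement = np.mean(manual_pred == sk_pred)

print("Means match sklearn:", np.allclose(means, sk_model.theta_))
print("Prediction agreement:", agreement)
```

Summing the per-feature log densities is the naive independence assumption in action: each feature contributes its own one-dimensional Gaussian term.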

2. Multinomial Naïve Bayes

- Assumes features are discrete counts (like word frequencies).

- The likelihood is based on multinomial distribution.

- Very popular in text classification tasks.

- Use case examples:

- Spam filtering (count of words like “free”, “offer”).

- Sentiment analysis (word frequency in reviews).

- Document categorization (news, sports, politics).

3. Bernoulli Naïve Bayes

- Assumes features are binary (0 or 1) → whether a feature is present or absent.

- Instead of using counts, it only considers presence/absence of features.

- Use case examples:

- Text classification where features are “word present (1)” or “word absent (0)”.

- Sentiment classification with binary indicators for specific keywords.

- Simple recommendation systems (user has clicked/not clicked on an item).
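The difference between count features and presence/absence features is easiest to see on a toy corpus. The four documents and their labels below are invented for illustration (1 = spam, 0 = not spam).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Tiny made-up corpus: 1 = spam, 0 = not spam.
docs = [
    "free free offer now",
    "meeting at noon",
    "free offer free prize",
    "lunch meeting tomorrow",
]
labels = [1, 0, 1, 0]

count_vec = CountVectorizer()              # word counts, for MultinomialNB
binary_vec = CountVectorizer(binary=True)  # word present/absent, for BernoulliNB
X_counts = count_vec.fit_transform(docs)
X_binary = binary_vec.fit_transform(docs)

# Column for the word "free": counts keep repetitions, binary features do not.
free_col = count_vec.vocabulary_["free"]
free_counts = X_counts[:, free_col].toarray().ravel()
free_binary = X_binary[:, free_col].toarray().ravel()
print("counts of 'free':  ", free_counts)
print("presence of 'free':", free_binary)

mnb = MultinomialNB().fit(X_counts, labels)
bnb = BernoulliNB().fit(X_binary, labels)

new_doc = ["free prize offer"]
mnb_pred = mnb.predict(count_vec.transform(new_doc))[0]
bnb_pred = bnb.predict(binary_vec.transform(new_doc))[0]
print("MultinomialNB prediction:", mnb_pred)
print("BernoulliNB prediction:", bnb_pred)
```

One practical difference worth knowing: BernoulliNB also uses the absence of a word as evidence, whereas MultinomialNB only scores the words that actually occur in the document.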

In [1]:
# Import libraries
from sklearn.datasets import load_iris, load_breast_cancer, fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB
from sklearn.preprocessing import Binarizer
from sklearn.metrics import accuracy_score

# -------------------------------
# 1. Gaussian Naïve Bayes on Iris dataset
# -------------------------------
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)
print("GaussianNB (Iris dataset) Accuracy:", accuracy_score(y_test, y_pred))

# -------------------------------
# 2. Bernoulli Naïve Bayes on Breast Cancer dataset
# -------------------------------
cancer = load_breast_cancer()
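# Binarize continuous features (turn into 0/1 for BernoulliNB).
# Caveat: one global mean is used as the threshold for all 30 features, which
# have very different scales, and it is computed before the train/test split.
# A per-feature threshold fitted on the training set alone would be cleaner.
binarizer = Binarizer(threshold=cancer.data.mean())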
X_binarized = binarizer.fit_transform(cancer.data)

X_train, X_test, y_train, y_test = train_test_split(
    X_binarized, cancer.target, test_size=0.2, random_state=42
)

bnb = BernoulliNB()
bnb.fit(X_train, y_train)
y_pred = bnb.predict(X_test)
print("BernoulliNB (Breast Cancer dataset) Accuracy:", accuracy_score(y_test, y_pred))

# -------------------------------
# 3. Multinomial Naïve Bayes on Text Data (20 Newsgroups)
# -------------------------------
categories = ['alt.atheism', 'sci.space']  # just 2 categories for speed
newsgroups = fetch_20newsgroups(subset='train', categories=categories, shuffle=True, random_state=42)

# Convert text to word counts
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(newsgroups.data)
y = newsgroups.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

mnb = MultinomialNB()
mnb.fit(X_train, y_train)
y_pred = mnb.predict(X_test)
print("MultinomialNB (Text dataset) Accuracy:", accuracy_score(y_test, y_pred))


GaussianNB (Iris dataset) Accuracy: 1.0
BernoulliNB (Breast Cancer dataset) Accuracy: 0.8070175438596491
MultinomialNB (Text dataset) Accuracy: 0.9953488372093023


Question 6: Write a Python program to:

- Load the Iris dataset

- Train an SVM Classifier with a linear kernel

- Print the model's accuracy and support vectors.

In [2]:
# Import libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Initialize SVM with linear kernel
# (random_state only affects probability estimates, so it has no effect here)
svm_clf = SVC(kernel='linear', random_state=42)

# Train the model
svm_clf.fit(X_train, y_train)

# Predictions
y_pred = svm_clf.predict(X_test)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print("SVM Classifier Accuracy:", accuracy)

# Support vectors
print("Support Vectors:\n", svm_clf.support_vectors_)
print("Number of support vectors per class:", svm_clf.n_support_)


SVM Classifier Accuracy: 1.0
Support Vectors:
 [[4.8 3.4 1.9 0.2]
 [5.1 3.3 1.7 0.5]
 [4.5 2.3 1.3 0.3]
 [5.6 3.  4.5 1.5]
 [5.4 3.  4.5 1.5]
 [6.7 3.  5.  1.7]
 [5.9 3.2 4.8 1.8]
 [5.1 2.5 3.  1.1]
 [6.  2.7 5.1 1.6]
 [6.3 2.5 4.9 1.5]
 [6.1 2.9 4.7 1.4]
 [6.5 2.8 4.6 1.5]
 [6.9 3.1 4.9 1.5]
 [6.3 2.3 4.4 1.3]
 [6.3 2.5 5.  1.9]
 [6.3 2.8 5.1 1.5]
 [6.3 2.7 4.9 1.8]
 [6.  3.  4.8 1.8]
 [6.  2.2 5.  1.5]
 [6.2 2.8 4.8 1.8]
 [6.5 3.  5.2 2. ]
 [7.2 3.  5.8 1.6]
 [5.6 2.8 4.9 2. ]
 [5.9 3.  5.1 1.8]
 [4.9 2.5 4.5 1.7]]
Number of support vectors per class: [ 3 11 11]


Question 7: Write a Python program to:

- Load the Breast Cancer dataset

- Train a Gaussian Naïve Bayes model

- Print its classification report including precision, recall, and F1-score

In [3]:
# Import libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report

# Load Breast Cancer dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Initialize Gaussian Naïve Bayes
gnb = GaussianNB()

# Train the model
gnb.fit(X_train, y_train)

# Predictions
y_pred = gnb.predict(X_test)

# Classification report
print("Gaussian Naïve Bayes - Breast Cancer Dataset")
print(classification_report(y_test, y_pred, target_names=data.target_names))


Gaussian Naïve Bayes - Breast Cancer Dataset
              precision    recall  f1-score   support

   malignant       1.00      0.93      0.96        43
      benign       0.96      1.00      0.98        71

    accuracy                           0.97       114
   macro avg       0.98      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114

