# SVM & Naive Bayes Assignment Theory Questions

# Question 1:  What is a Support Vector Machine (SVM), and how does it work?

# Answer:


### 🔹 What is SVM?

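A **Support Vector Machine (SVM)** is a supervised learning algorithm used mainly for classification (and, in its SVR form, for regression). It finds the **optimal hyperplane** that best separates data points of different classes in the feature space, which may be high-dimensional. The goal is to **maximize the margin** between the two classes, that is, the distance between the hyperplane and the nearest data points of each class.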



### 🔹 How SVM Works:

1. **Hyperplane**:

   * In a 2D space, it's a line.
   * In a 3D space, it's a plane.
   * In higher dimensions, it's called a hyperplane.
   * The best hyperplane is the one that **maximizes the margin** between classes.

2. **Support Vectors**:

   * These are the **data points closest to the hyperplane**.
   * They are critical in defining the position and orientation of the hyperplane.
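   * Only the support vectors determine the decision boundary: removing or moving one generally changes it, while removing any other training point leaves it unchanged.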

3. **Margin**:

   * The margin is the width of the gap between the two parallel boundaries that pass through the support vectors of each class.
   * SVM tries to **maximize this margin** to improve generalization.
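
For a linear SVM with decision function $f(x) = w \cdot x + b$, the two margin boundaries are $f(x) = +1$ and $f(x) = -1$, and the margin width is $2/\lVert w \rVert$. The sketch below is a minimal illustration, assuming scikit-learn and a synthetic, well-separated two-blob dataset (`make_blobs` is an illustrative choice, not part of the assignment). It fits a linear `SVC` with a very large `C` to approximate a hard margin, computes the width from the learned weights, and checks that the support vectors sit on the boundaries.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic, well-separated two-class data (illustrative only)
X, y = make_blobs(n_samples=60, centers=[[-2, -2], [2, 2]], cluster_std=0.5, random_state=0)

# A very large C approximates a hard margin on separable data
clf = SVC(kernel="linear", C=1000).fit(X, y)

w = clf.coef_[0]                       # normal vector of the hyperplane (linear kernel only)
margin_width = 2 / np.linalg.norm(w)   # distance between the boundaries f(x) = +1 and f(x) = -1

# Decision values f(x) of the support vectors: approximately +1 or -1,
# i.e. the support vectors lie on the margin boundaries
sv_scores = clf.decision_function(clf.support_vectors_)

print("Number of support vectors:", len(clf.support_vectors_))
print("Margin width:", round(margin_width, 3))
print("Decision values of support vectors:", np.round(sv_scores, 3))
```

Maximizing $2/\lVert w \rVert$ is the same as minimizing $\lVert w \rVert$, which is how the optimization problem is usually written.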



### 🔹 SVM in Non-Linearly Separable Data:

When data is not linearly separable, SVM uses two techniques:

1. **Kernel Trick**:

   * Implicitly maps the data into a higher-dimensional space where it is more likely to be **linearly separable**, without ever computing that mapping directly.
   * Common kernels:

     * Linear
     * Polynomial
     * Radial Basis Function (RBF)
     * Sigmoid

2. **Soft Margin (C parameter)**:

   * Allows some misclassifications to avoid overfitting.
   * **C** is a regularization parameter:

     * High C → low tolerance for misclassification (less regularization).
     * Low C → more tolerance for misclassification (more regularization).
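
To see the kernel choice in code, the sketch below (assuming scikit-learn; the `make_moons` data and split settings are illustrative choices, not from this assignment) trains `SVC` with each of the four kernels listed above on a synthetic dataset that no straight line can separate, keeping the soft-margin parameter fixed at `C=1.0`. Exact accuracies depend on the data and on `C`, but the RBF kernel would typically do better than the linear one here.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two interleaving half-moons: a synthetic dataset that is not linearly separable
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Same soft-margin setting (C=1.0) for every kernel; only the kernel changes
scores = {}
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    clf = SVC(kernel=kernel, C=1.0).fit(X_train, y_train)
    scores[kernel] = clf.score(X_test, y_test)
    print(f"{kernel:>8}: test accuracy = {scores[kernel]:.2f}")
```

Swapping the kernel is a one-argument change; in practice `C` and the kernel's own parameters (such as `gamma`) would then be tuned together, for example with `GridSearchCV`.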

# Question 2: Explain the difference between Hard Margin and Soft Margin SVM.

# Answer:

###  1. **Hard Margin SVM**

* **Definition**: A **Hard Margin SVM** assumes that the data is **perfectly linearly separable** — i.e., you can draw a straight line (or hyperplane) that completely separates the two classes **without any errors**.

* **Characteristics**:

  * No misclassifications are allowed.
  * All data points must lie **outside or on** the margin boundaries.
  * Maximizes the margin **under a strict constraint**.

* **When to Use**:

  * Only when your data is **perfectly separable**.
  * Rare in real-world applications due to noise/outliers.

* **Drawbacks**:

  * **Very sensitive** to outliers and noise.
  * Not flexible — real-world data is rarely perfectly separable.
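
Formally, with labels $y_i \in \{-1, +1\}$ and a hyperplane $w \cdot x + b = 0$, the hard-margin SVM solves the problem below. Because the margin width is $2/\lVert w \rVert$, maximizing it is equivalent to minimizing $\tfrac{1}{2}\lVert w \rVert^2$, and every constraint must hold exactly:

$$
\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^2
\quad \text{subject to} \quad
y_i\,(w \cdot x_i + b) \ge 1, \qquad i = 1, \dots, n
$$

If the data are not linearly separable, no $(w, b)$ satisfies all the constraints, so the problem has no solution; this is the formal reason hard-margin SVM is impractical on noisy data.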



###  2. **Soft Margin SVM**

* **Definition**: A **Soft Margin SVM** allows some data points to **violate** the margin boundaries (i.e., be misclassified or fall inside the margin) in order to improve generalization.

* **Characteristics**:

  * Introduces a **tuning parameter** `C` (regularization parameter).

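    * Low `C`: wider margin, tolerates more margin violations and misclassifications (often generalizes better on noisy data, but too low a value can underfit).
    * High `C`: narrower margin, penalizes violations heavily and fits the training data more closely (may overfit).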
  * Balances between **maximizing margin** and **minimizing classification error**.

* **When to Use**:

  * When data is **not perfectly separable** (which is common).
  * Suitable for noisy, real-world data.

* **Advantages**:

  * More **robust** to outliers.
  * Provides a **trade-off** between bias and variance.
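
The soft-margin version adds a slack variable $\xi_i \ge 0$ for each point and minimizes $\tfrac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i$ subject to $y_i\,(w \cdot x_i + b) \ge 1 - \xi_i$, so `C` sets the price of each violation. scikit-learn's `SVC` has no separate hard-margin mode; a very large `C` approximates one. The sketch below, assuming scikit-learn and a synthetic pair of overlapping blobs (an illustrative dataset), shows how the number of support vectors (points on or inside the margin, or misclassified) changes with `C`.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping blobs, so no line separates them perfectly (synthetic data)
X, y = make_blobs(n_samples=200, centers=[[-1, -1], [1, 1]], cluster_std=1.0, random_state=42)

n_support = {}
for C in [0.01, 1, 100]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    n_support[C] = int(clf.n_support_.sum())   # total support vectors across both classes
    print(f"C={C:>6}: support vectors = {n_support[C]:>3}, training accuracy = {clf.score(X, y):.2f}")
```

A small `C` makes violations cheap, so the margin widens and many points end up on or inside it; a large `C` shrinks the margin toward the few points that define the boundary.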


# Question 3: What is the Kernel Trick in SVM? Give one example of a kernel and explain its use case.

# Answer:
### What is the Kernel Trick?

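* Instead of explicitly transforming data into a higher-dimensional space (which can be computationally expensive), the kernel trick uses a **kernel function** $K(x, x') = \phi(x) \cdot \phi(x')$ that computes the dot product of the mapped points directly from the original features, **without ever computing the mapping $\phi$ explicitly**. This works because the SVM optimization only needs dot products between pairs of data points.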
* This allows SVM to create **non-linear decision boundaries** efficiently.
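
A degree-2 polynomial kernel makes the idea concrete. For 2-D inputs, $K(x, z) = (x \cdot z)^2$ equals the ordinary dot product after the explicit mapping $\phi(x) = (x_1^2,\ \sqrt{2}\,x_1 x_2,\ x_2^2)$, yet the kernel never builds the 3-D vectors. The sketch below is a plain NumPy check with arbitrary example vectors; `phi` and `poly2_kernel` are illustrative helper names.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector (no bias term)."""
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel, computed in the original 2-D space."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(z))   # map both points to 3-D, then take the dot product
implicit = poly2_kernel(x, z)       # kernel trick: same value, no mapping needed

print(explicit, implicit)
```

scikit-learn's `kernel="poly"` computes the more general form $(\gamma\, x \cdot x' + r)^d$; with $\gamma = 1$, $r = 0$ (`coef0=0`) and $d = 2$ it reduces to the kernel above.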



### Why Use It?

* Many datasets cannot be separated by a straight line (i.e., not linearly separable).
* The kernel trick enables SVM to **learn complex, non-linear patterns** by mapping input features into **higher-dimensional feature spaces**, where a linear separator **can exist**.



### Common Kernels in SVM:

| Kernel Name        | Use Case                                               |
| ------------------ | ------------------------------------------------------ |
| **Linear**         | When data is linearly separable                        |
| **Polynomial**     | When data shows polynomial relationships               |
| **RBF (Gaussian)** | When data has circular or radial patterns (non-linear) |
| **Sigmoid**        | Mimics a neural-network tanh activation; rarely used in practice |



### Example: **Radial Basis Function (RBF) Kernel**

#### Formula:

$$
K(x, x') = \exp\left(-\gamma \|x - x'\|^2\right)
$$

* Where:

  * $x, x'$ are data points
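  * $\gamma > 0$ controls how quickly similarity decays with distance: a large $\gamma$ makes each point's influence very local (a wigglier boundary that can overfit), while a small $\gamma$ gives a smoother boundary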

#### 🔹 Use Case:

* **Non-linearly separable data**, especially when the class boundaries are **circular or irregular**.
* Very common in **image classification**, **bioinformatics**, and **anomaly detection**.

#### 🔹 Why RBF Works Well:

* It considers **distance between points**: closer points are more similar.
* Allows the SVM to create a flexible, curved decision boundary.
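
The RBF formula is easy to evaluate directly. The sketch below, assuming NumPy and scikit-learn's `sklearn.metrics.pairwise.rbf_kernel`, computes $K(x, x')$ by hand for a near and a far point at several $\gamma$ values and cross-checks one value against the library. The points are arbitrary examples and `rbf` is an illustrative helper.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def rbf(x, x_prime, gamma):
    """RBF kernel: exp(-gamma * ||x - x'||^2)."""
    return np.exp(-gamma * np.sum((x - x_prime) ** 2))

x = np.array([0.0, 0.0])
near = np.array([0.5, 0.0])   # squared distance 0.25
far = np.array([2.0, 0.0])    # squared distance 4.0

# Similarity is 1 for identical points and decays faster with distance as gamma grows
for gamma in [0.1, 1.0, 10.0]:
    print(f"gamma={gamma:>4}: K(x, near)={rbf(x, near, gamma):.4f}, K(x, far)={rbf(x, far, gamma):.4f}")

# Cross-check the hand-written formula against scikit-learn (which expects 2-D arrays)
manual = rbf(x, far, 1.0)
library = rbf_kernel(x.reshape(1, -1), far.reshape(1, -1), gamma=1.0)[0, 0]
print(np.isclose(manual, library))
```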



### Visual Analogy:

* Imagine trying to separate two classes shaped like concentric circles.
* In 2D, it's impossible with a straight line.
* The kernel trick (e.g., RBF) maps the data into a higher-dimensional space where the two circles become **separable by a plane**.
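
The concentric-circles picture can be reproduced with an explicit lift. Adding a third feature $z = x_1^2 + x_2^2$ (the squared distance from the origin) puts the inner circle at small $z$ and the outer circle at large $z$, so a flat plane separates them. The RBF kernel achieves a similar effect implicitly, although its feature space is actually infinite-dimensional. The sketch below assumes scikit-learn and its synthetic `make_circles` data.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric circles (synthetic data), one class per circle
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# 1) Linear SVM on the raw 2-D points: no straight line can separate the rings
linear_2d = SVC(kernel="linear").fit(X, y)
acc_linear_2d = linear_2d.score(X, y)

# 2) Explicit lift: append z = x1^2 + x2^2, then a plane in 3-D separates the rings
X_lifted = np.c_[X, (X ** 2).sum(axis=1)]
linear_3d = SVC(kernel="linear").fit(X_lifted, y)
acc_linear_3d = linear_3d.score(X_lifted, y)

# 3) RBF kernel on the raw 2-D points: the mapping happens implicitly
rbf_2d = SVC(kernel="rbf").fit(X, y)
acc_rbf_2d = rbf_2d.score(X, y)

print("Linear, raw 2-D    :", acc_linear_2d)
print("Linear, lifted 3-D :", acc_linear_3d)
print("RBF, raw 2-D       :", acc_rbf_2d)
```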


# Question 4: What is a Naïve Bayes Classifier, and why is it called “naïve”?

# Answer:

###  What is a Naïve Bayes Classifier?

A **Naïve Bayes Classifier** is a **supervised learning algorithm** based on **Bayes' Theorem**, used primarily for **classification tasks**. It is especially effective in **text classification**, **spam detection**, and **sentiment analysis**.



###  How It Works:

It calculates the **probability of each class** given a set of input features and assigns the class with the **highest probability**.

#### 🔹 Bayes' Theorem:

$$
P(C|X) = \frac{P(X|C) \cdot P(C)}{P(X)}
$$

Where:

* $P(C|X)$: Posterior probability of class $C$ given features $X$
* $P(X|C)$: Likelihood of features given class
* $P(C)$: Prior probability of class
* $P(X)$: Evidence, which is the same for every class, so it can be dropped when comparing classes



###  Why Is It Called **“Naïve”**?

It is called **naïve** because it **assumes that all features are independent** of each other **given the class label**.

#### Example:

* In spam classification, Naïve Bayes assumes that the presence of the word “free” is independent of the presence of the word “offer,” even though in real life, they often appear together in spam emails.

This **strong independence assumption** is often **not true**, which is why the algorithm is called "naïve."
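
Under this assumption the posterior factorizes as $P(C \mid x_1, \dots, x_n) \propto P(C) \prod_{i=1}^{n} P(x_i \mid C)$. The sketch below works through the spam example by hand with **hypothetical** priors and per-word likelihoods (made-up numbers, not estimated from real data), considering only the two words present in the email; a Bernoulli model would also multiply in factors for absent words.

```python
# Hypothetical probabilities, as if estimated from a training set (illustrative numbers only)
priors = {"spam": 0.3, "ham": 0.7}                      # P(C)
likelihoods = {                                         # P(word present | C)
    "spam": {"free": 0.40, "offer": 0.30},
    "ham": {"free": 0.05, "offer": 0.10},
}
email_words = ["free", "offer"]   # words observed in the new email

# Naive assumption: multiply per-word likelihoods as if the words were independent given the class
scores = {}
for label, prior in priors.items():
    score = prior
    for word in email_words:
        score *= likelihoods[label][word]
    scores[label] = score

# Dividing by the evidence P(X), the sum of the scores, turns them into posteriors
evidence = sum(scores.values())
posteriors = {label: s / evidence for label, s in scores.items()}
predicted = max(posteriors, key=posteriors.get)

print(posteriors)
print("Predicted class:", predicted)
```

Here the spam score is $0.3 \times 0.4 \times 0.3 = 0.036$ and the ham score is $0.7 \times 0.05 \times 0.1 = 0.0035$, so spam wins even though its prior is lower.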


### 🔹 Despite the “Naïve” Assumption, It Works Well:

* Especially in high-dimensional problems like **text classification**, where the independence assumption is often “good enough.”
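* It's also **fast** and **scalable**, and it often performs well even when the independence assumption is violated, because classification only needs the correct class to receive the highest score, not accurate probability estimates.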



### Types of Naïve Bayes Classifiers:

| Type               | Use Case                                          |
| ------------------ | ------------------------------------------------- |
| **Multinomial NB** | Text classification (word counts)                 |
| **Bernoulli NB**   | Binary/Boolean features (e.g., word presence)     |
| **Gaussian NB**    | Continuous features (assumes normal distribution) |



### 🔹 Example Use Case:

**Spam Detection**: Given features like word frequency, Naïve Bayes calculates the probability of an email being spam and classifies it accordingly.
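
A minimal sketch of this use case, assuming scikit-learn's `CountVectorizer` (word counts as features) and `MultinomialNB`. The six-email corpus is made up for illustration; a real filter would train on thousands of labeled emails.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny hypothetical corpus, for illustration only
emails = [
    "win free money now",
    "free offer click now",
    "claim your free prize",
    "meeting schedule for monday",
    "project report attached",
    "lunch with the team tomorrow",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

vectorizer = CountVectorizer()          # word counts as features
X = vectorizer.fit_transform(emails)

model = MultinomialNB()                 # additive (Laplace) smoothing, alpha=1.0, by default
model.fit(X, labels)

new_emails = ["free money offer", "project meeting on monday"]
X_new = vectorizer.transform(new_emails)   # words unseen in training (e.g. "on") are ignored
preds = model.predict(X_new)

print(preds)
print(model.classes_, model.predict_proba(X_new).round(3))   # probability columns follow classes_
```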

# Question 5: Describe the Gaussian, Multinomial, and Bernoulli Naïve Bayes variants. When would you use each one?

# Answer:
###  1. **Gaussian Naïve Bayes**

####  Use when:

* Your features are **continuous numerical values** (e.g., height, weight, temperature).
* Within each class, each feature is assumed to follow a **normal (Gaussian) distribution**.

####  How it works:

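* It models the likelihood of each feature with a **Gaussian (bell curve) distribution**: $P(x_i \mid C) = \frac{1}{\sqrt{2\pi\sigma_{C,i}^2}} \exp\left(-\frac{(x_i - \mu_{C,i})^2}{2\sigma_{C,i}^2}\right)$.
* For each class and each feature, it estimates the **mean** $\mu_{C,i}$ and **variance** $\sigma_{C,i}^2$ from the training data and plugs them into this density to compute the probability.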

####  Use case examples:

* Medical data (e.g., diagnosing based on lab results)
* Sensor data
* Iris dataset (flower classification)



###  2. **Multinomial Naïve Bayes**

####  Use when:

* Your features are **discrete counts** (e.g., word counts in text).
* Often used in **Natural Language Processing (NLP)** tasks.

####  How it works:

* Assumes that features represent **frequencies or counts** (e.g., how many times a word appears in a document).
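* For each class, estimates the probability of each word from how often it occurs in that class's training documents, with additive (Laplace) smoothing so that words unseen in a class do not get zero probability.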

#### Use case examples:

* Text classification (e.g., spam detection, topic labeling)
* Document categorization
* Sentiment analysis (based on word frequency)
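
The smoothed estimate is $P(w \mid C) = \dfrac{N_{C,w} + \alpha}{N_C + \alpha\,|V|}$, where $N_{C,w}$ is the count of word $w$ in class $C$, $N_C$ the total word count in $C$, and $|V|$ the vocabulary size. The sketch below checks this rule against scikit-learn's `MultinomialNB`, whose `feature_log_prob_` stores $\log P(w \mid C)$, using a small **hypothetical** count matrix and `alpha=1.0`.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical word-count matrix: rows are documents, columns are counts of 3 vocabulary words
X = np.array([
    [3, 0, 1],   # class 0
    [2, 1, 0],   # class 0
    [0, 4, 2],   # class 1
    [1, 3, 3],   # class 1
])
y = np.array([0, 0, 1, 1])

alpha = 1.0
model = MultinomialNB(alpha=alpha).fit(X, y)

# Manual estimate: (word count in class + alpha) / (total words in class + alpha * vocabulary size)
counts = np.array([X[y == c].sum(axis=0) for c in (0, 1)])
manual = (counts + alpha) / (counts.sum(axis=1, keepdims=True) + alpha * X.shape[1])

print(np.exp(model.feature_log_prob_).round(3))
print(manual.round(3))
```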



### 3. **Bernoulli Naïve Bayes**

####  Use when:

* Your features are **binary (0 or 1)** — i.e., feature is **present or absent**.
* Still useful in **text classification**, but using presence/absence of words instead of their frequency.

####  How it works:

* Assumes each feature follows a **Bernoulli (binary) distribution**.
* Calculates the probability that each feature is present or absent given the class; unlike Multinomial NB, the **absence** of a word also contributes to the score.

#### Use case examples:

* Spam filtering (based on whether a word appears or not)
* Classifying emails or tweets by presence of certain keywords
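
The sketch below, assuming scikit-learn's `BernoulliNB` and a **hypothetical** presence/absence matrix, shows the smoothed estimate $P(\text{word present} \mid C) = \dfrac{\text{docs in } C \text{ containing the word} + 1}{\text{docs in } C + 2}$ and how absent words still influence the prediction.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical presence/absence matrix: 1 = word appears in the email, 0 = it does not
# Columns: "free", "offer", "meeting"
X = np.array([
    [1, 1, 0],   # spam
    [1, 0, 0],   # spam
    [0, 0, 1],   # ham
    [0, 0, 0],   # ham
])
y = np.array(["spam", "spam", "ham", "ham"])

model = BernoulliNB(alpha=1.0).fit(X, y)   # binarize=0.0 by default: any value > 0 becomes 1

# Smoothed P(word present | class); rows follow model.classes_ (ham, spam)
print(dict(zip(model.classes_, np.exp(model.feature_log_prob_).round(3))))

# An email with none of the three words still gets a prediction, driven by the
# (1 - p) factors for the missing words: "free" and "offer" are rarely absent in spam
pred_empty = model.predict(np.array([[0, 0, 0]]))
print(pred_empty)
```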



# SVM & Naive Bayes Assignment Practical Questions

# Question 6:   Write a Python program to: ● Load the Iris dataset ● Train an SVM Classifier with a linear kernel ● Print the model's accuracy and support vectors.

# Answer:
# Import necessary libraries
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Step 1: Load the Iris dataset
iris = datasets.load_iris()
X = iris.data       # Feature matrix
y = iris.target     # Target labels

# Step 2: Split the dataset into training and testing sets (80/20 split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 3: Create and train the SVM model with a linear kernel
svm_model = SVC(kernel='linear')
svm_model.fit(X_train, y_train)

# Step 4: Predict on test data
y_pred = svm_model.predict(X_test)

# Step 5: Print accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f}")

# Step 6: Print support vectors
print("\nSupport Vectors:")
print(svm_model.support_vectors_)

# Optional: Print support vector indices and counts per class
print("\nIndices of Support Vectors:")
print(svm_model.support_)

print("\nNumber of Support Vectors for Each Class:")
print(svm_model.n_support_)


Model Accuracy: 1.00

Support Vectors:
[[4.8 3.4 1.9 0.2]
 [5.1 3.3 1.7 0.5]
 [4.5 2.3 1.3 0.3]
 [5.6 3.  4.5 1.5]
 [5.4 3.  4.5 1.5]
 [6.7 3.  5.  1.7]
 [5.9 3.2 4.8 1.8]
 [5.1 2.5 3.  1.1]
 [6.  2.7 5.1 1.6]
 [6.3 2.5 4.9 1.5]
 [6.1 2.9 4.7 1.4]
 [6.5 2.8 4.6 1.5]
 [6.9 3.1 4.9 1.5]
 [6.3 2.3 4.4 1.3]
 [6.3 2.5 5.  1.9]
 [6.3 2.8 5.1 1.5]
 [6.3 2.7 4.9 1.8]
 [6.  3.  4.8 1.8]
 [6.  2.2 5.  1.5]
 [6.2 2.8 4.8 1.8]
 [6.5 3.  5.2 2. ]
 [7.2 3.  5.8 1.6]
 [5.6 2.8 4.9 2. ]
 [5.9 3.  5.1 1.8]
 [4.9 2.5 4.5 1.7]]

Indices of Support Vectors:
[ 31  33  91  22  45  54  59  60  62  73  79  80 105 110   5  16  30  42
  68  81  87 101 112 113 116]

Number of Support Vectors for Each Class:
[ 3 11 11]


# Question 7:  Write a Python program to: ● Load the Breast Cancer dataset ● Train a Gaussian Naïve Bayes model ● Print its classification report including precision, recall, and F1-score.

# Answer:
# Import necessary libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report

# Step 1: Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data         # Features
y = data.target       # Labels (0 = malignant, 1 = benign)

# Step 2: Split the dataset into training and test sets (80/20 split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 3: Train a Gaussian Naïve Bayes model
model = GaussianNB()
model.fit(X_train, y_train)

# Step 4: Predict on test data
y_pred = model.predict(X_test)

# Step 5: Print the classification report
print("Classification Report:\n")
print(classification_report(y_test, y_pred, target_names=data.target_names))


Classification Report:

              precision    recall  f1-score   support

   malignant       1.00      0.93      0.96        43
      benign       0.96      1.00      0.98        71

    accuracy                           0.97       114
   macro avg       0.98      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114



# Question 8: Write a Python program to: ● Train an SVM Classifier on the Wine dataset using GridSearchCV to find the best C and gamma. ● Print the best hyperparameters and accuracy.

# Answer:
# Import necessary libraries
from sklearn import datasets
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Step 1: Load the Wine dataset
wine = datasets.load_wine()
X = wine.data       # Feature matrix
y = wine.target     # Target labels

# Step 2: Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 3: Define the SVM model
svm_model = SVC()

# Step 4: Define the parameter grid for C and gamma
param_grid = {
    'C': [0.1, 1, 10, 100],           # Regularization parameter
    'gamma': [0.001, 0.01, 0.1, 1],   # Kernel coefficient
    'kernel': ['rbf']                # Using RBF kernel
}

# Step 5: Use GridSearchCV to find the best parameters
grid_search = GridSearchCV(svm_model, param_grid, cv=5)
grid_search.fit(X_train, y_train)

# Step 6: Evaluate on test set
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

# Step 7: Print results
print("Best Hyperparameters:")
print(grid_search.best_params_)

print(f"\nTest Set Accuracy: {accuracy:.2f}")


Best Hyperparameters:
{'C': 100, 'gamma': 0.001, 'kernel': 'rbf'}

Test Set Accuracy: 0.83
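
The grid search above runs on unscaled features. The Wine columns live on very different scales (proline values run from the hundreds to over a thousand, while hue is around 1), and the RBF kernel is distance-based, so the largest-scale columns dominate the distances; this likely explains the very small best `gamma` and the modest test accuracy. A minimal sketch of the usual remedy, assuming scikit-learn's `Pipeline` and `StandardScaler`, puts scaling inside the cross-validated pipeline:

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scaling lives inside the pipeline, so each CV fold is scaled using only its own training part
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC(kernel="rbf")),
])

# Parameter names are prefixed with the pipeline step name ("svc__")
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [0.001, 0.01, 0.1, 1],
}

grid_search = GridSearchCV(pipe, param_grid, cv=5)
grid_search.fit(X_train, y_train)

y_pred = grid_search.best_estimator_.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print("Best Hyperparameters:", grid_search.best_params_)
print(f"Test Set Accuracy: {accuracy:.2f}")
```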


# Question 9: Write a Python program to: ● Train a Naïve Bayes Classifier on a synthetic text dataset (e.g. using sklearn.datasets.fetch_20newsgroups). ● Print the model's ROC-AUC score for its predictions.

# Answer:
# Import necessary libraries
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import roc_auc_score

# Step 1: Load a binary subset of the 20 Newsgroups dataset
categories = ['sci.med', 'soc.religion.christian']  # Binary classification
data = fetch_20newsgroups(subset='all', categories=categories, remove=('headers', 'footers', 'quotes'))

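# Step 2: Convert text to TF-IDF features (note: fitting on all documents before the split leaks test-set IDF statistics into training; fitting only on the training split, e.g. inside a Pipeline, avoids this)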
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(data.data)
y = data.target  # Binary labels (0 or 1)

# Step 3: Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 4: Train a Multinomial Naïve Bayes classifier
model = MultinomialNB()
model.fit(X_train, y_train)

# Step 5: Predict probabilities and compute ROC-AUC
y_probs = model.predict_proba(X_test)[:, 1]  # Probabilities for class 1
roc_auc = roc_auc_score(y_test, y_probs)

# Step 6: Print ROC-AUC score
print(f"ROC-AUC Score: {roc_auc:.4f}")


ROC-AUC Score: 0.9932


# Question 10:

Imagine you’re working as a data scientist for a company that handles email communications. Your task is to automatically classify emails as Spam or Not Spam. The emails may contain:

* Text with diverse vocabulary
* Potential class imbalance (far more legitimate emails than spam)
* Some incomplete or missing data

Explain the approach you would take to:

* Preprocess the data (e.g. text vectorization, handling missing data)
* Choose and justify an appropriate model (SVM vs. Naïve Bayes)
* Address class imbalance
* Evaluate the performance of your solution with suitable metrics

And explain the business impact of your solution.

#  Answer:

### 1. **Preprocessing the Data**

* **Text Cleaning & Normalization:**

  * Remove special characters, punctuation, and HTML tags.
  * Convert text to lowercase to ensure uniformity.
  * Optionally apply stemming or lemmatization to reduce words to their base form.

* **Handling Missing or Incomplete Data:**

  * Identify missing values (e.g., empty emails or missing fields).
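  * Fill missing text fields (e.g., an empty subject or body) with an empty string instead of dropping the email; impute missing metadata where possible, and discard incomplete entries only if they are few.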
  * For text data, empty or very short emails could be flagged or handled separately.

* **Text Vectorization:**

  * Use **TF-IDF Vectorizer** or **Count Vectorizer** to convert text into numerical feature vectors.
  * Consider n-grams (unigrams + bigrams) to capture context and phrases.
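  * Possibly use dimensionality reduction (e.g., TruncatedSVD) if the feature space is too large. SVD output can contain negative values, which Multinomial Naïve Bayes cannot accept, so this step suits SVM rather than Naïve Bayes.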
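
A minimal sketch of these preprocessing steps, assuming scikit-learn's `TfidfVectorizer` and a few **hypothetical** email records with `subject`/`body` fields (the field names and the `clean` helper are illustrative, not from any real dataset):

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical email records; None marks a missing field (illustrative only)
emails = [
    {"subject": "Win a FREE prize!!!", "body": "Click <b>here</b> to claim your free prize"},
    {"subject": None, "body": "Meeting moved to Monday"},
    {"subject": "Project update", "body": None},
]

def clean(email):
    # Missing fields become empty strings instead of dropping the whole email
    text = f"{email.get('subject') or ''} {email.get('body') or ''}".lower()
    text = re.sub(r"<[^>]+>", " ", text)      # strip HTML tags
    text = re.sub(r"[^a-z\s]", " ", text)     # keep letters and whitespace only
    return re.sub(r"\s+", " ", text).strip()  # collapse repeated whitespace

texts = [clean(e) for e in emails]

# Unigrams + bigrams, with scikit-learn's built-in English stop-word list removed
vectorizer = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
X = vectorizer.fit_transform(texts)

print(texts)
print(X.shape)
print(sorted(vectorizer.vocabulary_)[:10])
```

In a real pipeline the vectorizer would be fit on the training split only, so the test emails do not influence the vocabulary or IDF weights.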



### 2. **Choosing and Justifying an Appropriate Model**

| Model           | Pros                                                                     | Cons                                                      | Suitability for Spam Classification                             |
| --------------- | ------------------------------------------------------------------------ | --------------------------------------------------------- | --------------------------------------------------------------- |
| **Naïve Bayes** | Fast, simple, works well with text, handles high-dimensional sparse data | Assumes feature independence (naïve assumption)           | Excellent baseline; commonly used in spam filters               |
| **SVM**         | Effective with high-dimensional data, flexible with kernels              | Can be slower to train on large datasets; tuning required | Powerful if well-tuned, better with complex decision boundaries |

* **Recommended Approach:**

  * Start with **Multinomial Naïve Bayes** because it is fast, handles text data natively, and performs well even with diverse vocabulary.
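  * If Naïve Bayes performance (e.g., F1-score on the spam class) is insufficient, try an **SVM**, tuning hyperparameters via GridSearchCV. A linear kernel (e.g., `LinearSVC`) usually works well and scales better on sparse, high-dimensional TF-IDF features; an RBF kernel is much more expensive on large text datasets.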
  * Use cross-validation to compare model performance.
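
The comparison can be sketched with scikit-learn pipelines and `cross_val_score`, scoring by F1 rather than accuracy. The ten-email corpus below is **hypothetical** and far too small for meaningful numbers; it only shows the mechanics.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny hypothetical corpus (1 = spam, 0 = not spam); real data would be far larger
texts = [
    "win free money now", "free offer click now", "claim your free prize",
    "cheap loans apply today", "you won a free gift card",
    "meeting schedule for monday", "project report attached", "lunch with the team tomorrow",
    "please review the budget draft", "notes from the client call",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

models = {
    "Multinomial NB": make_pipeline(TfidfVectorizer(), MultinomialNB()),
    "Linear SVM": make_pipeline(TfidfVectorizer(), LinearSVC()),
}

# The vectorizer is refit inside each fold, so test folds never leak into the vocabulary
all_scores = {}
for name, pipeline in models.items():
    scores = cross_val_score(pipeline, texts, labels, cv=5, scoring="f1")
    all_scores[name] = scores
    print(f"{name}: mean F1 = {scores.mean():.2f}")
```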



### 3. **Addressing Class Imbalance**

* Since **spam emails are often much fewer** than legitimate ones, imbalance can bias the model toward the majority class.

**Strategies:**

* **Resampling Techniques:**

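  * **Oversampling** the minority class (e.g., SMOTE).
  * **Undersampling** the majority class.
  * Apply resampling to the training data only, so the test set keeps the real class distribution.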

* **Class Weighting:**

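  * Use model parameters to assign higher weight to the minority class (e.g., `class_weight='balanced'` in SVM). Multinomial Naïve Bayes in scikit-learn has no `class_weight` parameter, but its `fit()` accepts `sample_weight`, and its priors can be set with `class_prior`.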

* **Threshold Tuning:**

  * Adjust classification threshold based on precision-recall trade-offs rather than default 0.5.

* **Ensemble Methods:**

  * Combine multiple models to improve minority class detection.
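
scikit-learn's `class_weight='balanced'` option weights each class $c$ by $n_{\text{samples}} / (n_{\text{classes}} \cdot n_c)$. The sketch below, assuming a **hypothetical** 90/10 split between legitimate and spam emails, computes these weights with `sklearn.utils.class_weight.compute_class_weight` and expands them into per-sample weights for estimators whose `fit()` accepts `sample_weight`.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical label distribution: 90 legitimate emails (0) and 10 spam emails (1)
y = np.array([0] * 90 + [1] * 10)
classes = np.array([0, 1])

# "balanced" weight for each class = n_samples / (n_classes * number of samples in that class)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
print(dict(zip(classes.tolist(), weights.round(3).tolist())))

# One weight per email; indexing by y works because the labels are exactly 0 and 1.
# These could be passed as e.g. MultinomialNB().fit(X, y, sample_weight=sample_weight)
sample_weight = weights[y]
print(sample_weight[:3], sample_weight[-3:])
```

Each spam email counts nine times as much as a legitimate one, and the total weight still equals the number of samples.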



### 4. **Evaluating Performance with Suitable Metrics**

* **Accuracy** alone is misleading in imbalanced datasets (e.g., predicting all emails as non-spam could yield high accuracy).

**Better Metrics:**

* **Precision:** How many predicted spam emails are actually spam? (Important to avoid false positives)

* **Recall (Sensitivity):** How many actual spam emails were correctly detected? (Important to catch spam)

* **F1-Score:** Harmonic mean of precision and recall, balances both.

* **ROC-AUC:** Measures overall ability to discriminate between classes.

* **Precision-Recall Curve:** More informative than ROC for imbalanced data.

* Also, monitor **False Positive Rate (FPR)** carefully, as wrongly classifying legitimate emails as spam can hurt user trust.
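
The sketch below computes these metrics with scikit-learn on **hypothetical** labels and scores for ten emails (1 = spam). Here `y_pred` is `y_score` thresholded at 0.5, and `average_precision_score` summarizes the precision-recall curve in a single number.

```python
from sklearn.metrics import (average_precision_score, confusion_matrix,
                             f1_score, precision_score, recall_score)

# Hypothetical test labels and model outputs for 10 emails (1 = spam)
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_score = [0.05, 0.10, 0.20, 0.15, 0.30, 0.25, 0.60, 0.90, 0.80, 0.40]  # predicted P(spam)
y_pred = [1 if s >= 0.5 else 0 for s in y_score]                          # default 0.5 threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
precision = precision_score(y_true, y_pred)    # tp / (tp + fp)
recall = recall_score(y_true, y_pred)          # tp / (tp + fn)
f1 = f1_score(y_true, y_pred)
fpr = fp / (fp + tn)                           # share of legitimate emails flagged as spam
ap = average_precision_score(y_true, y_score)  # uses the scores, not the thresholded labels

print(f"Precision={precision:.3f}  Recall={recall:.3f}  F1={f1:.3f}  FPR={fpr:.3f}  AP={ap:.3f}")
```

Raising the threshold above 0.6 in this toy example would remove the one false positive at the cost of recall, which is the trade-off threshold tuning controls.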



### 5. **Business Impact**

* **Automating spam detection** reduces manual effort and improves user experience by filtering unwanted emails.
* **Accurate spam filtering** protects users from phishing, scams, and malware, improving security.
* **Minimizing false positives** ensures legitimate emails aren’t lost or delayed, maintaining customer satisfaction and communication reliability.
* **Adaptability** through continuous retraining helps cope with evolving spam tactics, reducing long-term risk.
* **Operational efficiency** improves by freeing IT resources and reducing storage/bandwidth from spam emails.

