# Question 1: What is a Support Vector Machine (SVM), and how does it work?


---

### **What is a Support Vector Machine (SVM)?**

A **Support Vector Machine (SVM)** is a **supervised machine learning algorithm** used for **classification and regression** tasks.

* It is primarily used for **binary classification**, but can be extended to multiclass problems.
* SVM tries to find the **optimal boundary (hyperplane)** that separates classes in the feature space.

---

### **How SVM Works**

1. **Separating Hyperplane:**

   * For linearly separable data, SVM finds a **hyperplane** that separates the two classes.
   * In 2D, this is a line; in 3D, it’s a plane; in higher dimensions, it’s a hyperplane.

2. **Maximum Margin:**

   * Among all possible hyperplanes, SVM selects the one that **maximizes the margin** — the distance between the hyperplane and the **closest data points** from each class (called **support vectors**).
   * Maximizing the margin tends to improve **generalization** to unseen data, because small perturbations of a point are less likely to push it across the boundary.

3. **Support Vectors:**

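   * Only the points **closest to the hyperplane** (on or inside the margin) determine its position. Moving or removing any other training point leaves the boundary unchanged.
   * These points are therefore critical in defining the decision boundary.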

4. **Non-linear Data:**

   * If the data is not linearly separable, SVM uses the **kernel trick** to implicitly map the data into a **higher-dimensional space** where it may become linearly separable.
   * Common kernels:

     * **Linear**
     * **Polynomial**
     * **Radial Basis Function (RBF / Gaussian)**

---

### **Key Advantages**

* Effective in high-dimensional spaces.
* Works well even when the number of features > number of samples.
* Less prone to overfitting when the regularization parameter $C$ (and kernel parameters such as $\gamma$) are tuned appropriately.

---

### **Intuitive Example**

* Imagine plotting two types of flowers on a 2D plane.
* SVM finds the **line** that separates the two flower types **with the largest gap**.
* Only the flowers closest to the line (support vectors) determine where the line is drawn.
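A minimal sketch of the ideas above, assuming scikit-learn and a tiny hand-made 2D dataset (the points are hypothetical). A linear `SVC` with a very large `C` behaves approximately like a hard-margin SVM, `support_vectors_` lists the points that pin down the boundary, and the margin width is $2/\|w\|$ for the learned weight vector $w$.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two small, linearly separable clusters
X = np.array([[1, 1], [2, 1], [1, 2],   # class 0
              [4, 4], [5, 4], [4, 5]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C leaves almost no room for margin violations,
# so this approximates a hard-margin linear SVM
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]        # normal vector of the hyperplane w . x + b = 0
b = clf.intercept_[0]
margin_width = 2 / np.linalg.norm(w)

print("Support vectors:\n", clf.support_vectors_)
print("Support vectors per class:", clf.n_support_)
print("w =", w, ", b =", b)
print("Margin width:", margin_width)
```

For this data the closest points are (2, 1) and (1, 2) on one side and (4, 4) on the other, so the boundary is the line $x_1 + x_2 = 5.5$ and the margin width is $5/\sqrt{2} \approx 3.54$. The other three points could be moved (without entering the margin) and the boundary would not change.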

---



# Question 2: Explain the difference between Hard Margin and Soft Margin SVM.



---

### **Hard Margin vs Soft Margin SVM**

SVM aims to find a hyperplane that separates classes, but the **approach depends on whether the data is perfectly separable or not**.

---

### **1. Hard Margin SVM**

* **Definition:**

  * Assumes that the data is **perfectly linearly separable**.
  * The SVM finds a hyperplane that **strictly separates all points** without any misclassification.
* **Characteristics:**

  * No points are allowed inside the margin.
  * Very sensitive to **outliers**: a single point lying on the wrong side of the boundary (e.g. a mislabeled sample) can make it impossible to find any separating hyperplane.
* **Use Case:**

  * Rare in real-world data because data is usually noisy.

---

### **2. Soft Margin SVM**

* **Definition:**

  * Allows some points to **violate the margin** or even be misclassified.
  * Introduces a **slack variable** ($\xi_i$) to measure the degree of violation.
* **Characteristics:**

  * Controlled by a **regularization parameter $C$**:

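    * **Large C:** Less tolerance for margin violations → narrower margin and a closer fit to the training data (higher risk of overfitting).
    * **Small C:** More tolerance → wider margin, which often generalizes better on noisy data (but too small a $C$ can underfit).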
  * More robust to **noise and outliers**.
* **Use Case:**

  * Most practical SVM applications, since real-world data is rarely perfectly separable.
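For reference, the soft-margin SVM is usually written as the following primal optimization problem, where $w$ and $b$ define the hyperplane $w \cdot x + b = 0$, $y_i \in \{-1, +1\}$ are the labels, and $n$ is the number of training points:

$$
\min_{w,\, b,\, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad y_i (w \cdot x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0
$$

Forcing every $\xi_i = 0$ (no violations allowed) recovers the hard-margin problem. In both cases the margin width is $2 / \|w\|$, so minimizing $\|w\|$ widens the margin while $C$ sets the price of each violation.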

---

### **Key Difference Table**

| Feature                 | Hard Margin                   | Soft Margin                            |
| ----------------------- | ----------------------------- | -------------------------------------- |
| Data requirement        | Must be perfectly separable   | Tolerates overlap and misclassification |
| Margin violations       | Not allowed                   | Allowed via slack variables $\xi_i$    |
| Sensitivity to outliers | Very high                     | Lower, more robust                     |
| Parameter               | No $C$ (no slack to penalize) | Trade-off controlled by $C$            |

---

**Intuition:**

* **Hard margin:** Draw a line that separates every point exactly.
* **Soft margin:** Draw a line that separates most points but allows a few “exceptions” to improve generalization.
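To see the effect of $C$ in practice, the sketch below (assuming scikit-learn; the data is synthetic, generated with `make_blobs`) fits a linear soft-margin SVM with several values of $C$ on two overlapping clusters. A smaller $C$ tolerates more violations and widens the margin, so more points end up on or inside it and become support vectors.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic data: two clusters with a large spread so that they overlap
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

n_sv = {}
for C in [0.01, 1, 100]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    n_sv[C] = int(clf.n_support_.sum())
    print(f"C={C:<6} support vectors={n_sv[C]:3d}  "
          f"training accuracy={clf.score(X, y):.3f}")
```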

---




# Question 3: What is the Kernel Trick in SVM? Give one example of a kernel and explain its use case.



---

### **What is the Kernel Trick in SVM?**

* A linear SVM can only draw a flat **hyperplane** boundary, which fails when the classes are not linearly separable.
* **Kernel Trick** allows SVM to handle **non-linear data** by implicitly mapping it into a **higher-dimensional space** without computing the coordinates explicitly.
* This makes it possible to find a **linear hyperplane in the transformed space**, which corresponds to a **non-linear boundary in the original space**.

**Mathematically:**

$$
K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)
$$

Where:

* $K$ is the kernel function
* $\phi(x)$ maps the original features into higher dimensions
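The identity $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$ can be checked numerically. As an illustrative example (not tied to any library), the degree-2 polynomial kernel $K(x, z) = (x \cdot z)^2$ on 2D inputs corresponds to the explicit map $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$. Both routes below give the same value (1.0, up to floating-point rounding), but the kernel never builds the 3D vectors.

```python
import numpy as np

def phi(v):
    # Explicit feature map for the degree-2 polynomial kernel (x . z)^2 in 2D
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly_kernel(x, z):
    # Kernel trick: the same inner product, computed in the original 2D space
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

print("Kernel value:           ", poly_kernel(x, z))
print("Explicit feature space: ", np.dot(phi(x), phi(z)))
```

Expanding $(x_1 z_1 + x_2 z_2)^2 = x_1^2 z_1^2 + 2 x_1 x_2 z_1 z_2 + x_2^2 z_2^2$ shows why the two agree. For kernels such as RBF the corresponding $\phi$ is infinite-dimensional, so only the kernel route is practical.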

---

### **Example Kernel: Radial Basis Function (RBF / Gaussian Kernel)**

* **Formula:**

$$
K(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|^2\right)
$$

* **Use Case:**

  * Widely used when the **relationship between features and target is highly non-linear**.
  * Example: classifying points arranged in a **circular pattern** in 2D space, which cannot be separated by a straight line.
  * The RBF kernel corresponds to an implicit (infinite-dimensional) feature space in which a **linear hyperplane** can separate the classes.

---

### **Other Common Kernels**

* **Linear kernel:** No transformation, used for linearly separable data.
* **Polynomial kernel:** Maps data into polynomial feature space, useful for curved boundaries.

---

**Intuition:**

* Imagine trying to separate data shaped like concentric circles.
* In 2D, you cannot draw a straight line to separate them.
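* Conceptually, the kernel trick **lifts the data into 3D** (or higher), for example by adding the squared radius $x_1^2 + x_2^2$ as a third coordinate, where a flat plane can separate the classes.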
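The concentric-circles intuition can be reproduced with a short sketch, assuming scikit-learn's synthetic `make_circles` data. A linear kernel stays well short of perfect, the RBF kernel separates the rings, and an explicit lift that adds $x_1^2 + x_2^2$ as a third feature lets even a linear SVM separate them. Accuracies are measured on the training data only, to illustrate separability rather than generalization.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: the inner ring is one class, the outer ring the other
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Straight-line boundary in the original 2D space
linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)

# RBF kernel: implicit non-linear feature space
rbf_acc = SVC(kernel="rbf", gamma="scale").fit(X, y).score(X, y)

# Explicit lift for comparison: append x1^2 + x2^2 as a third feature
X_lifted = np.column_stack([X, (X ** 2).sum(axis=1)])
lifted_acc = SVC(kernel="linear").fit(X_lifted, y).score(X_lifted, y)

print(f"Linear kernel (2D):       {linear_acc:.2f}")
print(f"RBF kernel (2D):          {rbf_acc:.2f}")
print(f"Linear kernel (lifted 3D): {lifted_acc:.2f}")
```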

---




# Question 4: What is a Naïve Bayes Classifier, and why is it called “naïve”?


---

### **What is a Naïve Bayes Classifier?**

* A **Naïve Bayes (NB) classifier** is a **probabilistic machine learning algorithm** used for **classification tasks**.
* It is based on **Bayes’ Theorem**, which calculates the **probability of a class given the features**:

$$
P(C|X) = \frac{P(X|C) \cdot P(C)}{P(X)}
$$

Where:

* $C$ = class label

* $X$ = feature vector

* $P(C|X)$ = posterior probability of class given features

* $P(X|C)$ = likelihood of features given class

* $P(C)$ = prior probability of class

* $P(X)$ = evidence (normalizing factor)

* The classifier **predicts the class with the highest posterior probability**.

---

### **Why is it called “Naïve”?**

* It assumes that **all features are conditionally independent given the class**, so the likelihood factorizes as $P(X|C) = \prod_{i} P(x_i|C)$.
* In reality, features often have correlations, so this assumption is **“naïve”**.
* Despite this strong assumption, Naïve Bayes often performs **very well in practice**, especially for text classification, spam detection, and medical diagnosis.

---

### **Key Advantages**

* Simple and fast to train.
* Works well with **high-dimensional data**.
* Performs surprisingly well even if the independence assumption is violated.

---

### **Example Use Case**

* **Spam email detection:** Each word is treated as an independent feature to predict if an email is spam or not.
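Bayes' theorem and the naïve factorization can be traced by hand for the spam example. The sketch below uses made-up priors and word likelihoods (hypothetical numbers, not estimated from any dataset). For an email containing only "free", the unnormalized scores are $0.4 \times 0.30 = 0.12$ for spam and $0.6 \times 0.03 = 0.018$ for ham; dividing by their sum (which plays the role of $P(X)$) gives $P(\text{spam}|X) = 0.12 / 0.138 \approx 0.87$.

```python
# Hypothetical priors P(C) and per-word likelihoods P(word | C), for illustration only
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.30, "meeting": 0.02},
    "ham":  {"free": 0.03, "meeting": 0.20},
}

def posterior(words):
    """Return P(C | words) for each class under the naive independence assumption."""
    scores = {}
    for c in priors:
        score = priors[c]
        for w in words:
            # Naive step: multiply per-word likelihoods as if independent given the class
            score *= likelihoods[c][w]
        scores[c] = score
    # Normalizing by the sum of scores plays the role of the evidence P(X)
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

print(posterior(["free"]))
print(posterior(["meeting"]))
```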

---



# Question 5: Describe the Gaussian, Multinomial, and Bernoulli Naïve Bayes variants. When would you use each one?



---

### **1. Gaussian Naïve Bayes**

* **Assumption:** Features are **continuous** and follow a **Gaussian (normal) distribution**.
* **Formula for likelihood:**

$$
P(x_i|C) = \frac{1}{\sqrt{2 \pi \sigma_C^2}} \exp\left(-\frac{(x_i - \mu_C)^2}{2\sigma_C^2}\right)
$$

Where $\mu_C$ and $\sigma_C^2$ are the mean and variance of feature $x_i$, estimated from the training samples of class $C$.

* **Use Case:**

  * Predicting class labels based on **continuous data** like height, weight, temperature, or medical measurements.
* **Example:** Predicting if a patient has a disease based on blood test results.

---

### **2. Multinomial Naïve Bayes**

* **Assumption:** Features are **discrete counts**, often representing **frequency of events**.

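* **Formula:** Uses the **count of each feature per class**, with additive (Laplace) smoothing $\alpha$:

$$
P(x_i|C) = \frac{N_{C,i} + \alpha}{N_C + \alpha n}
$$

Where $N_{C,i}$ is the total count of feature $i$ in the training samples of class $C$, $N_C$ is the total count of all features in class $C$, and $n$ is the number of features.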

* **Use Case:**

  * Text classification or document categorization where features are **word counts or term frequencies**.

* **Example:** Spam email detection using **word frequency vectors**.

---

### **3. Bernoulli Naïve Bayes**

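* **Assumption:** Features are **binary (0 or 1)**, representing the **presence or absence** of a feature. Unlike Multinomial NB, absent features also contribute to the likelihood: $P(x_i|C) = p_{i,C}^{\,x_i}(1 - p_{i,C})^{1 - x_i}$, where $p_{i,C}$ is the probability that feature $i$ is present in class $C$.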
* **Use Case:**

  * Text classification with **binary occurrence features** (word present or not).
* **Example:** Email spam detection using **binary bag-of-words** (1 if word exists, 0 otherwise).

---

### **Summary Table**

| Variant     | Feature Type    | Example Use Case                          |
| ----------- | --------------- | ----------------------------------------- |
| Gaussian    | Continuous      | Predicting disease from lab test values   |
| Multinomial | Discrete counts | Text classification with word counts      |
| Bernoulli   | Binary features | Spam detection with word presence/absence |

---

✅ **Intuition:**

* **Gaussian:** Continuous data → use distribution of values.
* **Multinomial:** Frequency counts → how many times each event occurs.
* **Bernoulli:** Presence/absence → whether an event occurs or not.
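A minimal sketch of which input each variant expects, assuming scikit-learn. Gaussian NB gets continuous Iris measurements; Multinomial and Bernoulli NB get word counts and word presence/absence from the same tiny corpus, which is hypothetical and only meant to show the data format.

```python
from sklearn.datasets import load_iris
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

# Gaussian NB: continuous measurements (Iris flower dimensions in cm)
X_iris, y_iris = load_iris(return_X_y=True)
gnb = GaussianNB().fit(X_iris, y_iris)
print("GaussianNB training accuracy:", gnb.score(X_iris, y_iris))

# Hypothetical toy corpus for the two text-oriented variants (1 = spam, 0 = ham)
docs = ["free prize free money", "meeting agenda attached",
        "win free money now", "project meeting tomorrow"]
labels = [1, 0, 1, 0]

# Multinomial NB: raw word counts
count_vec = CountVectorizer()
mnb = MultinomialNB().fit(count_vec.fit_transform(docs), labels)

# Bernoulli NB: binary presence/absence (binary=True caps every count at 1)
bin_vec = CountVectorizer(binary=True)
bnb = BernoulliNB().fit(bin_vec.fit_transform(docs), labels)

new_doc = ["free money meeting"]
print("MultinomialNB prediction:", mnb.predict(count_vec.transform(new_doc)))
print("BernoulliNB prediction:  ", bnb.predict(bin_vec.transform(new_doc)))
```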

---



# Question 6

**Dataset info:** You can use any suitable dataset, such as Iris, Breast Cancer, or Wine from `sklearn.datasets`, or a CSV file you have.

Write a Python program to:

* Load the Iris dataset
* Train an SVM classifier with a linear kernel
* Print the model's accuracy and support vectors

(Include your Python code and output in the code box below.)

```python
# Import libraries
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load Iris dataset
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train SVM Classifier with linear kernel
svm_model = SVC(kernel='linear', random_state=42)
svm_model.fit(X_train, y_train)

# Make predictions
y_pred = svm_model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

# Print support vectors
print("\nSupport Vectors:")
print(svm_model.support_vectors_)
print("\nNumber of Support Vectors for each class:", svm_model.n_support_)
```



```
Accuracy: 1.0

Support Vectors:
[[4.8 3.4 1.9 0.2]
 [5.1 3.3 1.7 0.5]
 [4.5 2.3 1.3 0.3]
 [5.6 3.  4.5 1.5]
 [5.4 3.  4.5 1.5]
 [6.7 3.  5.  1.7]
 [5.9 3.2 4.8 1.8]
 [5.1 2.5 3.  1.1]
 [6.  2.7 5.1 1.6]
 [6.3 2.5 4.9 1.5]
 [6.1 2.9 4.7 1.4]
 [6.5 2.8 4.6 1.5]
 [6.9 3.1 4.9 1.5]
 [6.3 2.3 4.4 1.3]
 [6.3 2.8 5.1 1.5]
 [6.3 2.7 4.9 1.8]
 [6.  3.  4.8 1.8]
 [6.  2.2 5.  1.5]
 [6.2 2.8 4.8 1.8]
 [6.5 3.  5.2 2. ]
 [7.2 3.  5.8 1.6]
 [5.6 2.8 4.9 2. ]
 [5.9 3.  5.1 1.8]
 [4.9 2.5 4.5 1.7]]

Number of Support Vectors for each class: [ 3 11 10]
```


# Question 7

Write a Python program to:

* Load the Breast Cancer dataset
* Train a Gaussian Naïve Bayes model
* Print its classification report, including precision, recall, and F1-score

(Include your Python code and output in the code box below.)

```python
# Import libraries
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report

# Load Breast Cancer dataset
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train Gaussian Naive Bayes model
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Make predictions
y_pred = gnb.predict(X_test)

# Print classification report
report = classification_report(y_test, y_pred, target_names=data.target_names)
print(report)
```


```
              precision    recall  f1-score   support

   malignant       0.93      0.90      0.92        63
      benign       0.95      0.96      0.95       108

    accuracy                           0.94       171
   macro avg       0.94      0.93      0.94       171
weighted avg       0.94      0.94      0.94       171
```



# Question 8

Write a Python program to:

* Train an SVM classifier on the Wine dataset, using GridSearchCV to find the best C and gamma
* Print the best hyperparameters and accuracy

(Include your Python code and output in the code box below.)

```python
# Import libraries
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load Wine dataset
wine = load_wine()
X = pd.DataFrame(wine.data, columns=wine.feature_names)
y = pd.Series(wine.target)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define SVM model
svm_model = SVC(kernel='rbf', random_state=42)

# Define hyperparameter grid
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [0.01, 0.1, 1, 10]
}

# Apply GridSearchCV
grid = GridSearchCV(estimator=svm_model, param_grid=param_grid, cv=5, scoring='accuracy')
grid.fit(X_train, y_train)

# Print best hyperparameters
print("Best Hyperparameters:", grid.best_params_)

# Evaluate the best model
best_model = grid.best_estimator_
y_pred = best_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy on test set:", accuracy)
```


```
Best Hyperparameters: {'C': 10, 'gamma': 0.01}
Accuracy on test set: 0.6666666666666666
```
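The test accuracy reported above (about 0.67) is most likely low because the Wine features are on very different scales (for example, `proline` takes values in the hundreds to over a thousand, while `hue` stays around 1), so distances inside the RBF kernel are dominated by a few large-valued features. A common remedy, sketched here as an alternative rather than the original solution, is to standardize the features inside a `Pipeline` so the scaler is refit on each cross-validation fold. The grid is the same as before; the parameter names gain the `svc__` prefix that `make_pipeline` assigns to the `SVC` step.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# The scaler is fit only on the training part of each CV fold, so no validation data leaks into it
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# make_pipeline names the SVC step "svc", hence the "svc__" prefix
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [0.01, 0.1, 1, 10],
}

grid = GridSearchCV(pipe, param_grid=param_grid, cv=5, scoring="accuracy")
grid.fit(X_train, y_train)

y_pred = grid.best_estimator_.predict(X_test)
scaled_accuracy = accuracy_score(y_test, y_pred)
print("Best hyperparameters:", grid.best_params_)
print("Accuracy on test set:", scaled_accuracy)
```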


# Question 9

Write a Python program to:

* Train a Naïve Bayes classifier on a synthetic text dataset (e.g. using `sklearn.datasets.fetch_20newsgroups`)
* Print the model's ROC-AUC score for its predictions

(Include your Python code and output in the code box below.)

```python
# Import libraries
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

# Load a subset of 20 Newsgroups dataset (for simplicity, choose 3 categories)
categories = ['alt.atheism', 'comp.graphics', 'sci.med']
newsgroups = fetch_20newsgroups(subset='all', categories=categories, remove=('headers','footers','quotes'))

X = newsgroups.data
y = newsgroups.target

# Convert text to TF-IDF features
vectorizer = TfidfVectorizer(stop_words='english', max_features=5000)
X_tfidf = vectorizer.fit_transform(X)

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X_tfidf, y, test_size=0.3, random_state=42)

# Train Multinomial Naive Bayes
nb = MultinomialNB()
nb.fit(X_train, y_train)

# Predict probabilities
y_pred_prob = nb.predict_proba(X_test)

# Binarize labels for multi-class ROC-AUC
y_test_bin = label_binarize(y_test, classes=[0,1,2])

# Compute ROC-AUC (macro-average for multi-class)
roc_auc = roc_auc_score(y_test_bin, y_pred_prob, average='macro', multi_class='ovr')
print("ROC-AUC score:", roc_auc)
```

```
ROC-AUC score: 0.983642101133334
```
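One caveat with the program above is that `TfidfVectorizer` is fit on all documents before the train/test split, so vocabulary and IDF statistics from the test documents leak into training. A minimal sketch of the leak-free pattern, assuming scikit-learn, wraps the vectorizer and classifier in a `Pipeline` that is fit on training documents only. To keep it runnable offline it uses a tiny hypothetical corpus standing in for the three newsgroup categories; with the real data you would split the raw `newsgroups.data` texts first and fit the pipeline on the training texts. It also shows that `roc_auc_score` accepts integer labels directly with `multi_class='ovr'`, so `label_binarize` is optional.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import roc_auc_score

# Hypothetical mini-corpus: 0 = religion, 1 = graphics, 2 = medicine
train_docs = [
    "god belief faith religion", "atheism god argument", "faith church belief",
    "image rendering pixels graphics", "graphics card driver image", "polygon rendering 3d graphics",
    "doctor patient treatment", "medicine disease patient symptoms", "treatment clinical doctor",
]
y_train = [0, 0, 0, 1, 1, 1, 2, 2, 2]

test_docs = ["belief in god and faith", "3d image rendering software", "patient disease treatment"]
y_test = [0, 1, 2]

# The vectorizer inside the pipeline only ever sees the training documents
model = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
model.fit(train_docs, y_train)

y_pred_prob = model.predict_proba(test_docs)

# One-vs-rest, macro-averaged ROC-AUC computed straight from integer labels
roc_auc = roc_auc_score(y_test, y_pred_prob, multi_class="ovr", average="macro")
print("ROC-AUC score:", roc_auc)
```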


# Question 10

Imagine you’re working as a data scientist for a company that handles email communications. Your task is to automatically classify emails as Spam or Not Spam. The emails may contain:

* Text with diverse vocabulary
* Potential class imbalance (far more legitimate emails than spam)
* Some incomplete or missing data

Explain the approach you would take to:

* Preprocess the data (e.g. text vectorization, handling missing data)
* Choose and justify an appropriate model (SVM vs. Naïve Bayes)
* Address class imbalance
* Evaluate the performance of your solution with suitable metrics

And explain the business impact of your solution. (Include your Python code and output in the code box below.)


---

## **Step 1: Preprocess the Data**

1. **Handle missing values:**

   * Replace missing (null) email bodies with an empty string, or drop those rows if they carry no usable content.
2. **Text vectorization:**

   * Convert raw emails into numerical features using **TF-IDF** or **CountVectorizer**.
3. **Optional cleaning:**

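   * Lowercase the text, remove punctuation and stop words, and tokenize (`TfidfVectorizer` lowercases and tokenizes by default; stop-word removal is enabled with `stop_words='english'`).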

---

## **Step 2: Choose Model**

* **Naïve Bayes (MultinomialNB)** is often preferred for text classification because:

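  * Its word-independence assumption keeps the model simple, so it handles very high-dimensional, sparse text features well.
  * Fast to train and performs well even with small datasets.
* **SVM** is a strong alternative: a linear SVM (e.g. `LinearSVC`) is often competitive on text, while kernel SVMs scale poorly to very large datasets and require more tuning.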

---

## **Step 3: Address Class Imbalance**

* Techniques:

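  * **Class weighting:** `class_weight='balanced'` in `SVC`/`LinearSVC`. `MultinomialNB` has no `class_weight` parameter, but its `fit()` accepts `sample_weight` (e.g. from `sklearn.utils.class_weight.compute_sample_weight('balanced', y)`).
  * **Oversampling the minority class:** e.g. SMOTE (from the `imbalanced-learn` package) or simple duplication of spam examples.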
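The missing-data and class-weighting steps can be sketched as follows, assuming scikit-learn and a small, hypothetical, deliberately imbalanced toy corpus (8 legitimate emails vs 2 spam, one with a missing body). `compute_sample_weight` gives each spam email a weight of $10 / (2 \times 2) = 2.5$ and each ham email $10 / (2 \times 8) = 0.625$, so both classes contribute equally to the Naïve Bayes fit.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.utils.class_weight import compute_sample_weight
from sklearn.metrics import f1_score, average_precision_score

# Hypothetical, deliberately imbalanced toy corpus: 8 ham (0) vs 2 spam (1)
train_emails = [
    "meeting moved to 3 pm", "quarterly report attached", "lunch with the team",
    "project deadline next week", "please review the draft", "call notes from today",
    "budget spreadsheet updated", None,  # one email with a missing body
    "win a free prize now", "cheap meds free shipping",
]
y_train = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

# Missing data: replace missing bodies with "" so the vectorizer can process every row
train_emails = ["" if email is None else email for email in train_emails]

vectorizer = TfidfVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(train_emails)

# MultinomialNB has no class_weight option, so pass "balanced" per-sample weights to fit()
weights = compute_sample_weight(class_weight="balanced", y=y_train)
nb = MultinomialNB().fit(X_train, y_train, sample_weight=weights)

# LinearSVC supports class weighting directly
svm = LinearSVC(class_weight="balanced").fit(X_train, y_train)

# Hypothetical held-out emails
test_emails = ["free prize inside", "team meeting notes", "win cheap meds now", "draft report for review"]
y_test = np.array([1, 0, 1, 0])
X_test = vectorizer.transform(test_emails)

nb_pred = nb.predict(X_test)
svm_pred = svm.predict(X_test)

print("Naive Bayes spam F1:", f1_score(y_test, nb_pred))
print("Linear SVM spam F1: ", f1_score(y_test, svm_pred))
print("Naive Bayes average precision:",
      average_precision_score(y_test, nb.predict_proba(X_test)[:, 1]))
```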

---

## **Step 4: Train Model & Evaluate**

* Metrics:

  * **Precision:** Minimize false positives (legitimate emails marked as spam).
  * **Recall:** Minimize false negatives (spam emails not detected).
  * **F1-score:** Balance precision and recall.
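  * **ROC-AUC:** Evaluates how well the model ranks spam above legitimate emails; under heavy imbalance, the precision-recall curve (average precision) is often more informative.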
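Why accuracy alone is misleading under imbalance can be shown with a small worked example, assuming scikit-learn and hypothetical labels for 8 emails (6 ham, 2 spam). The model flags one legitimate email as spam (a false positive) and misses one spam email (a false negative). Accuracy is $6/8 = 0.75$, yet precision $= 1/(1+1) = 0.5$, recall $= 1/(1+1) = 0.5$, and F1 $= 0.5$: half of the spam still reaches the inbox.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score, f1_score)

# Hypothetical labels for 8 emails: 6 ham (0), 2 spam (1)
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
# Hypothetical predictions: one ham flagged as spam (FP), one spam missed (FN)
y_pred = [0, 0, 0, 0, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)      # rows: true class, columns: predicted class
acc = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred)     # TP / (TP + FP)
rec = recall_score(y_true, y_pred)         # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)              # harmonic mean of precision and recall

print("Confusion matrix [[TN, FP], [FN, TP]]:\n", cm)
print("Accuracy :", acc)
print("Precision:", prec)
print("Recall   :", rec)
print("F1-score :", f1)
```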

---

## **Python Implementation Example**

```python
# Import libraries
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.preprocessing import LabelBinarizer

# Example synthetic dataset
data = pd.DataFrame({
    'email': [
        "Win a free iPhone now",
        "Meeting at 10 am",
        "Get cheap meds online",
        "Project deadline extended",
        "Congratulations! You've won",
        "Lunch with client tomorrow"
    ],
    'label': ["spam", "ham", "spam", "ham", "spam", "ham"]
})

# Handle missing emails
data['email'] = data['email'].fillna("")

# Text vectorization
vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(data['email'])

# Encode labels
lb = LabelBinarizer()
y = lb.fit_transform(data['label']).ravel()

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

# Train Multinomial Naive Bayes
model = MultinomialNB()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)
y_pred_prob = model.predict_proba(X_test)[:,1]

# Evaluation
print("Classification Report:\n")
print(classification_report(y_test, y_pred, target_names=['ham','spam']))

roc_auc = roc_auc_score(y_test, y_pred_prob)
print("ROC-AUC Score:", roc_auc)
```

**Sample Output:**

```
Classification Report:

              precision    recall  f1-score   support

         ham       1.00      1.00      1.00         1
        spam       1.00      1.00      1.00         1

ROC-AUC Score: 1.0
```

---

## **Business Impact**

* **Automated spam detection:** Reduces user exposure to malicious or unwanted emails.
* **Improves productivity:** Users focus on important emails.
* **Cost-effective:** Reduces need for manual filtering.
* **Customer trust:** Increases satisfaction by ensuring inbox quality.

---
