Question 1: What is a Support Vector Machine (SVM), and how does it work?
Answer:

A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression tasks. It is most commonly used for classification, and its basic formulation separates two classes.

✅ How SVM Works:

1. Goal:
SVM tries to find the best separating hyperplane that divides the data into classes with the maximum margin (i.e., the greatest possible distance between the hyperplane and the nearest data points of each class).

2. Support Vectors:
These are the data points closest to the hyperplane. They are critical in defining the decision boundary. The model uses these points to maximize the margin.

3. Margin:
The margin is the distance between the hyperplane and the support vectors. SVM aims to maximize this margin to improve generalization on unseen data.

4. Linear vs. Non-linear Data:

For linearly separable data, SVM finds a straight line (in 2D) or hyperplane (in higher dimensions).
For non-linear data, SVM uses the kernel trick to map data into a higher-dimensional space where a linear separator can be found.
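
The ideas above can be made concrete with a small sketch. It assumes a hyperplane described by a weight vector w and a bias b (hypothetical values picked by hand for illustration, not learned from data). A point is classified by the sign of w·x + b, and the margin width is computed as 2 / ||w||, which holds when the support vectors satisfy |w·x + b| = 1.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def margin_width(w):
    """Distance between the two margin boundaries w.x + b = +1 and -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical hyperplane x1 + x2 - 3 = 0 (an assumption for this sketch).
w = [1.0, 1.0]
b = -3.0

# [1, 1] and [2, 2] score exactly -1 and +1, so they act as support vectors.
points = [[1.0, 1.0], [2.0, 2.0], [0.0, 1.0], [3.0, 2.0]]
predictions = [predict(w, b, p) for p in points]
width = margin_width(w)

print("Predictions:", predictions)
print("Margin width:", round(width, 4))
```

In this toy setup the margin width equals the distance between the two support vectors, because they lie along the direction of w.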


Question 2: Explain the difference between Hard Margin and Soft Margin SVM.

Answer:
The difference between Hard Margin and Soft Margin SVM lies in how strictly the model separates the classes, especially when the data is not perfectly separable.

1. Hard Margin SVM
Definition:
Hard Margin SVM tries to find a hyperplane that perfectly separates the classes without any misclassification.

Assumption:
It assumes that the data is linearly separable.

Characteristics:

No tolerance for misclassified points.

Maximizes the margin strictly.

Fails if the data is noisy or the classes overlap, because no perfectly separating hyperplane exists in that case.

When to Use:
Only when the dataset is clean and perfectly separable.

2. Soft Margin SVM
Definition:
Soft Margin SVM allows some misclassification of data points to improve the model’s ability to generalize to unseen data.

Assumption:
Works well even if the data is not linearly separable.

Characteristics:

Introduces slack variables to allow violations of the margin.

Adds a regularization parameter (C) to control the trade-off between maximizing the margin and minimizing classification errors:

High C → less tolerance for misclassification, narrower margin, higher risk of overfitting.

Low C → more tolerance, larger margin, higher risk of underfitting.

| Feature             | Hard Margin SVM          | Soft Margin SVM                  |
| ------------------- | ------------------------ | -------------------------------- |
| Tolerance to errors | No (strict separation)   | Yes (allows misclassification)   |
| Data requirement    | Perfectly separable data | Noisy or overlapping data        |
| Flexibility         | Less flexible            | More flexible                    |
| Regularization (C)  | Not used                 | Used to balance margin and error |
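
The role of C can be illustrated with a minimal sketch of the soft-margin objective, 0.5 * ||w||^2 + C * (sum of slacks), where each slack is the hinge loss max(0, 1 - y * (w·x + b)). The data points, w and b below are hypothetical values chosen for illustration; nothing here is optimized.

```python
def hinge_loss(w, b, x, y):
    """Slack for one sample: zero if it is on the correct side of the margin."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - y * score)

def soft_margin_objective(w, b, X, y, C):
    """Margin term plus C times the total margin violation."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    slack_total = sum(hinge_loss(w, b, xi, yi) for xi, yi in zip(X, y))
    return regularizer + C * slack_total

# Hypothetical toy data: the third point sits on the hyperplane,
# so it violates the margin and needs a slack of 1.
X = [[1.0, 1.0], [2.0, 2.0], [1.5, 1.5]]
y = [-1, 1, -1]
w, b = [1.0, 1.0], -3.0

slacks = [hinge_loss(w, b, xi, yi) for xi, yi in zip(X, y)]
objective_low_c = soft_margin_objective(w, b, X, y, C=0.1)
objective_high_c = soft_margin_objective(w, b, X, y, C=10.0)

print("Slacks:", slacks)
print("Objective with C=0.1:", objective_low_c)
print("Objective with C=10:", objective_high_c)
```

The same single violation costs far more under a high C, which is why a high C pushes the optimizer toward fewer violations at the expense of a narrower margin.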



Question 3: What is the Kernel Trick in SVM? Give one example of a kernel and
explain its use case.
Answer:

The Kernel Trick is a mathematical technique used in SVM to handle non-linearly separable data by implicitly mapping it into a higher-dimensional space, without actually computing the coordinates in that space.

This allows SVM to find a linear hyperplane in the transformed space that corresponds to a non-linear boundary in the original input space.

✅ Why Use the Kernel Trick?
Real-world data is often not linearly separable in its original form.

Mapping to a higher dimension makes it possible to apply linear classification in a non-linear problem.

The Kernel Trick avoids the computational cost of explicitly transforming the data.
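
As one concrete example, the degree-2 polynomial kernel K(x, z) = (x·z)^2 gives the same value as a dot product in an explicit 3-dimensional feature space, so the mapping never has to be built. The sketch below checks this for a pair of 2-D points. It also includes the RBF (Gaussian) kernel, K(x, z) = exp(-gamma * ||x - z||^2), which is the default kernel of scikit-learn's SVC and a common choice when the class boundary is curved and its shape is not known in advance. The function names and the gamma value are assumptions made for this illustration.

```python
import math

def poly2_kernel(x, z):
    """Degree-2 polynomial kernel: (x.z)^2, computed in the input space."""
    return sum(xi * zi for xi, zi in zip(x, z)) ** 2

def explicit_map(x):
    """Explicit feature map for 2-D input whose dot product equals poly2_kernel."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel: similarity decays with squared Euclidean distance."""
    squared_distance = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * squared_distance)

x, z = [1.0, 2.0], [3.0, 4.0]

via_kernel = poly2_kernel(x, z)
via_mapping = sum(a * b for a, b in zip(explicit_map(x), explicit_map(z)))

print("Kernel value:", via_kernel)
print("Explicit mapping value:", round(via_mapping, 6))
print("RBF similarity:", round(rbf_kernel(x, z), 6))
```

Both routes give the same number, but the kernel only needs a dot product in the original 2-D space. For the RBF kernel the corresponding feature space is infinite-dimensional, so the explicit route is not available at all.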

Question 4: What is a Naïve Bayes Classifier, and why is it called “naïve”?

Answer:

The Naïve Bayes classifier is a supervised learning algorithm based on Bayes’ Theorem, primarily used for classification tasks. It calculates the probability that a given instance belongs to a particular class based on its features.

It is called "naïve" because it assumes that all features are independent of each other given the class label.

🔹 In real-world data, this independence assumption is often false, but the model still performs well in many applications, so it remains a simple yet effective baseline.
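
The independence assumption means the class score is just the prior multiplied by one likelihood per feature. A minimal sketch, using hypothetical probabilities for a two-word spam example (all numbers are invented for illustration, not estimated from data):

```python
# Hypothetical priors and per-word likelihoods, chosen for illustration only.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.5, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.4},
}

def naive_bayes_posterior(words, priors, likelihoods):
    """P(class | words), assuming words are independent given the class."""
    unnormalized = {}
    for label, prior in priors.items():
        score = prior
        for word in words:
            score *= likelihoods[label][word]  # one factor per feature
        unnormalized[label] = score
    total = sum(unnormalized.values())
    return {label: score / total for label, score in unnormalized.items()}

posterior = naive_bayes_posterior(["free", "meeting"], priors, likelihoods)
predicted = max(posterior, key=posterior.get)

print("Posterior:", posterior)
print("Predicted class:", predicted)
```

Practical implementations usually sum log-probabilities instead of multiplying raw probabilities, to avoid numerical underflow when there are many features.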

Question 5: Describe the Gaussian, Multinomial, and Bernoulli Naïve Bayes variants.
When would you use each one?

Answer:

1. Gaussian Naïve Bayes
Assumption:
Features follow a normal (Gaussian) distribution.

Use Case:
When the input features are continuous (real-valued) — such as height, weight, temperature, etc.

How it works:
For each feature and class, it estimates a mean and variance from the training data, then evaluates the likelihood of a new value using the Gaussian probability density function.

Example Applications:

Medical diagnosis (e.g., predicting diseases from lab test values)

Sensor data analysis

Any classification task with numeric features

2. Multinomial Naïve Bayes
Assumption:
Features represent discrete counts (e.g., frequency of words).

Use Case:
Best suited for document classification and natural language processing (NLP), where input features are word counts or term frequencies.

How it works:
Calculates the probability of the features (word counts) occurring in a class using a multinomial distribution.

Example Applications:

Spam filtering

News article classification

Sentiment analysis (bag-of-words model)

3. Bernoulli Naïve Bayes
Assumption:
Features are binary (0 or 1) — indicating the presence or absence of a feature.

Use Case:
Also used in text classification, especially when you care about whether a word occurs, not how many times.

How it works:
Uses the Bernoulli distribution to estimate probabilities based on binary features.

Example Applications:

Email spam detection

Text classification with binary word indicators

Feature presence/absence modeling
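
The three variants differ only in the per-feature likelihood they plug into Bayes' Theorem. The sketch below writes each one as a log-likelihood function for a single class. The parameter values (mean, variance, word probabilities) are hypothetical, and the multinomial version drops the multinomial coefficient because it is the same for every class.

```python
import math

def gaussian_log_likelihood(x, mean, var):
    """Log of the Gaussian density for one continuous feature."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def multinomial_log_likelihood(counts, word_probs):
    """Sum of count * log(p); words with zero count contribute nothing."""
    return sum(c * math.log(p) for c, p in zip(counts, word_probs))

def bernoulli_log_likelihood(present, word_probs):
    """log(p) for present words and log(1 - p) for absent ones."""
    return sum(
        math.log(p) if flag else math.log(1.0 - p)
        for flag, p in zip(present, word_probs)
    )

# Hypothetical per-class parameters for a three-word vocabulary.
word_probs = [0.5, 0.3, 0.2]

gaussian_ll = gaussian_log_likelihood(x=170.0, mean=170.0, var=1.0)
multinomial_ll = multinomial_log_likelihood([2, 0, 1], word_probs)
bernoulli_ll = bernoulli_log_likelihood([1, 0, 1], word_probs)

print("Gaussian:", round(gaussian_ll, 4))
print("Multinomial:", round(multinomial_ll, 4))
print("Bernoulli:", round(bernoulli_ll, 4))
```

Note the difference on the second word: the multinomial model ignores a word that does not occur, while the Bernoulli model explicitly scores its absence.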



● You can use any suitable datasets like Iris, Breast Cancer, or Wine from
sklearn.datasets or a CSV file you have.
Question 6: Write a Python program to:
● Load the Iris dataset
● Train an SVM Classifier with a linear kernel
● Print the model's accuracy and support vectors.

In [1]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# 1. Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# 2. Split into train and test sets (80-20 split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 3. Train an SVM classifier with linear kernel
model = SVC(kernel='linear')
model.fit(X_train, y_train)

# 4. Predict and evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

# 5. Print results
print("Model Accuracy:", accuracy)
print("Support Vectors:\n", model.support_vectors_)


Model Accuracy: 1.0
Support Vectors:
 [[4.8 3.4 1.9 0.2]
 [5.1 3.3 1.7 0.5]
 [4.5 2.3 1.3 0.3]
 [5.6 3.  4.5 1.5]
 [5.4 3.  4.5 1.5]
 [6.7 3.  5.  1.7]
 [5.9 3.2 4.8 1.8]
 [5.1 2.5 3.  1.1]
 [6.  2.7 5.1 1.6]
 [6.3 2.5 4.9 1.5]
 [6.1 2.9 4.7 1.4]
 [6.5 2.8 4.6 1.5]
 [6.9 3.1 4.9 1.5]
 [6.3 2.3 4.4 1.3]
 [6.3 2.5 5.  1.9]
 [6.3 2.8 5.1 1.5]
 [6.3 2.7 4.9 1.8]
 [6.  3.  4.8 1.8]
 [6.  2.2 5.  1.5]
 [6.2 2.8 4.8 1.8]
 [6.5 3.  5.2 2. ]
 [7.2 3.  5.8 1.6]
 [5.6 2.8 4.9 2. ]
 [5.9 3.  5.1 1.8]
 [4.9 2.5 4.5 1.7]]
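
A possible follow-up, sketched under the same split and settings as the program above, is to count the support vectors per class instead of printing all of them. It uses the fitted SVC attributes n_support_ (count per class) and support_ (indices into the training set); the variable names per_class and total are assumptions for this illustration.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Same data, split and model as in the program above.
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = SVC(kernel="linear").fit(X_train, y_train)

# n_support_ holds one count per class, in the order of model.classes_.
per_class = {
    str(name): int(count)
    for name, count in zip(iris.target_names, model.n_support_)
}
total = model.support_vectors_.shape[0]

print("Support vectors per class:", per_class)
print("Total support vectors:", total, "of", len(X_train), "training samples")
```

Only a subset of the training samples ends up as support vectors, which matches the earlier point that the decision boundary is defined by the points closest to it.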


Question 7: Write a Python program to:
● Load the Breast Cancer dataset
● Train a Gaussian Naïve Bayes model
● Print its classification report including precision, recall, and F1-score.

In [2]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report

# 1. Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# 2. Split into training and testing sets (80-20 split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 3. Train Gaussian Naïve Bayes model
model = GaussianNB()
model.fit(X_train, y_train)

# 4. Make predictions
y_pred = model.predict(X_test)

# 5. Print classification report
print("Classification Report:\n")
print(classification_report(y_test, y_pred, target_names=data.target_names))


Classification Report:

              precision    recall  f1-score   support

   malignant       1.00      0.93      0.96        43
      benign       0.96      1.00      0.98        71

    accuracy                           0.97       114
   macro avg       0.98      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114
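
To see where the numbers in the report come from, precision, recall and F1 can be recomputed from confusion counts. The counts below are inferred from the rounded report above (43 malignant test samples with 40 detected and 3 missed, 71 benign samples all detected); they are an assumption consistent with the report, not values printed by the program.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Inferred counts (assumption): 3 malignant samples were predicted as benign.
malignant = precision_recall_f1(tp=40, fp=0, fn=3)
benign = precision_recall_f1(tp=71, fp=3, fn=0)

malignant_rounded = [round(v, 2) for v in malignant]
benign_rounded = [round(v, 2) for v in benign]

print("malignant (precision, recall, f1):", malignant_rounded)
print("benign    (precision, recall, f1):", benign_rounded)
```

The three missed malignant cases lower malignant recall and, seen from the other side, lower benign precision by the same three samples.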



Question 8: Write a Python program to:
● Train an SVM Classifier on the Wine dataset using GridSearchCV to find the best
C and gamma.
● Print the best hyperparameters and accuracy.


In [4]:
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# 1. Load the Wine dataset
data = load_wine()
X = data.data
y = data.target

# 2. Split into training and testing sets (80-20 split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 3. Define the parameter grid for GridSearchCV
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [1, 0.1, 0.01, 0.001],
    'kernel': ['rbf']
}

# 4. Set up the GridSearch with SVM
grid = GridSearchCV(SVC(), param_grid, cv=5, verbose=1)
grid.fit(X_train, y_train)

# 5. Predict and evaluate
y_pred = grid.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

# 6. Print best parameters and accuracy
print("Best Hyperparameters:", grid.best_params_)
print("Test Accuracy:", accuracy)


Fitting 5 folds for each of 16 candidates, totalling 80 fits
Best Hyperparameters: {'C': 100, 'gamma': 0.001, 'kernel': 'rbf'}
Test Accuracy: 0.8333333333333334
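
The test accuracy above may be held back by feature scale: the RBF kernel depends on distances between samples, and the Wine features span very different numeric ranges. One possible variant, assuming standardization is acceptable for this task, wraps the SVC in a pipeline with StandardScaler so that scaling is refit inside each cross-validation fold. The grid values are the same as above; the svc__ prefix is how make_pipeline exposes the SVC parameters.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Same dataset and split as in the program above.
data = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

# Scaling happens inside the pipeline, so each CV fold is scaled
# using only its own training portion.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [1, 0.1, 0.01, 0.001],
}

grid = GridSearchCV(pipeline, param_grid, cv=5)
grid.fit(X_train, y_train)

scaled_accuracy = accuracy_score(y_test, grid.predict(X_test))
print("Best Hyperparameters:", grid.best_params_)
print("Test Accuracy (scaled features):", scaled_accuracy)
```

Comparing this accuracy with the unscaled result above shows how much of the gap comes from preprocessing rather than from the choice of C and gamma.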


Question 9: Write a Python program to:
● Train a Naïve Bayes Classifier on a synthetic text dataset (e.g. using
sklearn.datasets.fetch_20newsgroups).
● Print the model's ROC-AUC score for its predictions.

In [5]:
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# 1. Load a subset of the 20 Newsgroups dataset (binary classification for ROC-AUC)
categories = ['sci.space', 'rec.sport.hockey']
data = fetch_20newsgroups(subset='all', categories=categories, remove=('headers', 'footers', 'quotes'))

# 2. Split data
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

# 3. Convert text data to TF-IDF features
vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# 4. Train Multinomial Naïve Bayes
model = MultinomialNB()
model.fit(X_train_vec, y_train)

# 5. Predict probabilities
y_probs = model.predict_proba(X_test_vec)[:, 1]  # probability for class 1

# 6. Calculate ROC-AUC score
auc = roc_auc_score(y_test, y_probs)

# 7. Print result
print("ROC-AUC Score:", auc)


ROC-AUC Score: 0.9924732269145282
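
For intuition about what the score above measures: ROC-AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it by comparing every positive/negative pair on a tiny hypothetical example. It is an illustrative quadratic-time version, not how roc_auc_score is implemented.

```python
def roc_auc_pairwise(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for label, s in zip(y_true, scores) if label == 1]
    negatives = [s for label, s in zip(y_true, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))

# Hypothetical labels and predicted probabilities for class 1.
y_example = [0, 0, 1, 1]
score_example = [0.1, 0.4, 0.35, 0.8]

auc_example = roc_auc_pairwise(y_example, score_example)
print("Pairwise ROC-AUC:", auc_example)
```

Here three of the four positive/negative pairs are ordered correctly, giving 0.75. A score near 1.0, as in the output above, means almost every pair is ordered correctly.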
