# Assignment Code: DA-AG-013
# SVM & Naive Bayes | Assignment



Question 1:  What is a Support Vector Machine (SVM), and how does it work?

Ans> A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression tasks. It looks for the best boundary, called a hyperplane, that separates the classes in the data. It is especially useful for binary classification, such as spam vs. not spam or cat vs. dog.

The key idea behind the SVM algorithm is to find the hyperplane that best separates two classes by maximizing the margin between them. This margin is the distance from the hyperplane to the nearest data points (support vectors) on each side. The hyperplane that maximizes this distance is called the maximum-margin hyperplane, and its position is determined only by the support vectors.
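
The margin idea above can be sketched on a tiny dataset. This is a minimal illustration, not part of the assignment: the six points are made up, and the very large `C` is an assumption used to approximate a hard margin. For a linear SVM with weight vector `w`, the distance from the hyperplane to the nearest points is `1 / ||w||`.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: class 0 sits at x <= 0, class 1 at x >= 2
X = np.array([[0.0, 0.0], [0.0, 1.0], [-1.0, 0.5],
              [2.0, 0.0], [2.0, 1.0], [3.0, 0.5]])
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard margin (assumption for this sketch)
model = SVC(kernel="linear", C=1e6)
model.fit(X, y)

w = model.coef_[0]                 # normal vector of the hyperplane
b = model.intercept_[0]            # offset of the hyperplane
margin = 1.0 / np.linalg.norm(w)   # distance from hyperplane to nearest points

print("w =", w, "b =", b)
print("margin =", margin)
print("support vectors:\n", model.support_vectors_)
```

The nearest points of the two classes are 2 units apart along the x-axis, so the separating hyperplane should sit near x = 1 with a margin close to 1 on each side.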

Question 2: Explain the difference between Hard Margin and Soft Margin SVM?

Ans>
**Hard Margin SVM**:
* Attempts to find a hyperplane that perfectly separates the classes without any misclassification.
* Suitable when datasets are linearly separable; not robust to noise or overlapping classes.
* Maximizes the margin strictly: no data points are allowed inside or on the wrong side of the margin.
* Very sensitive to noise; a single misclassified point can make the optimization infeasible.
* No explicit hyperparameter for error trade-off; it assumes perfect separability.
* All points lie on or outside the margin boundaries; the closest points (support vectors) lie exactly on those boundaries.

**Soft Margin SVM**:
* Allows for some misclassifications to achieve a better trade-off between margin width and classification error.
* Suitable for non-linearly separable or noisy datasets; introduces flexibility to handle overlaps and outliers.
* Maximizes the effective margin while permitting some points to violate the margin to reduce overall error.
* The regularization hyperparameter C must be tuned to balance margin width against error tolerance: a small C tolerates more violations, a large C penalizes them heavily.
* Some points may lie within the margin or be misclassified, but the hyperplane is chosen to optimize overall generalization.
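
A small experiment can make the role of the trade-off hyperparameter concrete. The sketch below is only an illustration under stated assumptions: the overlapping data is synthetic (generated with `make_blobs`, with centers and spread chosen arbitrarily), and the two `C` values are picked to contrast a very soft margin with a nearly hard one.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Hypothetical overlapping two-class data (centers and spread are arbitrary)
X, y = make_blobs(n_samples=200, centers=[[0, 0], [3, 3]],
                  cluster_std=1.5, random_state=0)

n_support = {}
for C in (0.01, 100):
    model = SVC(kernel="linear", C=C).fit(X, y)
    # Points on or inside the margin (or misclassified) become support vectors
    n_support[C] = len(model.support_vectors_)
    print(f"C={C}: {n_support[C]} support vectors, "
          f"training accuracy {model.score(X, y):.3f}")
```

With a small `C` the margin is wide and many points fall inside it, so the model typically keeps far more support vectors than with a large `C`.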

Question 3: What is the Kernel Trick in SVM? Give one example of a kernel and explain its use case.

Ans> The kernel trick is a method used in SVMs to enable them to classify non-linear data using a linear classifier. By applying a kernel function, SVMs can implicitly map input data into a higher-dimensional space where a linear separator (hyperplane) can be used to divide the classes.

**Example:** The polynomial kernel, K(x, z) = (x · z + c)^d, allows an SVM to model non-linear relationships by implicitly computing all polynomial combinations of the input features up to degree d, without explicitly creating new features.

**Its use:** The polynomial kernel is particularly useful when the data exhibits curved class boundaries or interaction effects between features. For example, in a 2D classification problem where the data points form a parabolic boundary or an XOR-like (2x2 checkerboard) pattern, the polynomial kernel implicitly maps the data into a higher-dimensional space where the classes become separable by a hyperplane.
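
The word "implicitly" can be checked numerically. The sketch below assumes a degree-2 polynomial kernel with constant term 1 on 2D inputs; the helper names `poly_kernel` and `explicit_map` are hypothetical and exist only for this illustration. It shows that the kernel value equals an ordinary dot product in a 6-dimensional feature space that is never built during SVM training.

```python
import numpy as np

def poly_kernel(x, z, degree=2, coef0=1.0):
    """Polynomial kernel K(x, z) = (x . z + coef0) ** degree."""
    return (np.dot(x, z) + coef0) ** degree

def explicit_map(x):
    """Explicit degree-2 feature map for a 2D input, matching coef0 = 1."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 ** 2, x2 ** 2, s * x1 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

k_implicit = poly_kernel(x, z)                          # computed in 2D
k_explicit = np.dot(explicit_map(x), explicit_map(z))   # computed in 6D
print(k_implicit, k_explicit)
```

Both values agree (144 here, up to floating-point rounding), which is why the SVM only ever needs the kernel function and not the mapped features.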

Question 4: What is a Naïve Bayes Classifier, and why is it called “naïve”?

Ans> The Naive Bayes classifier is a simple probabilistic classifier based on Bayes' theorem. It has very few parameters, so it can be trained and can predict faster than many other classification algorithms.
It is probabilistic because it estimates the probability of each class given the observed features and predicts the most probable class. To make this estimate tractable, it treats the features as independent of one another given the class, so each feature contributes to the prediction separately.

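It is called "naive" because it assumes that, given the class, the presence or value of one feature does not affect any other feature. This assumption rarely holds exactly in real data, yet the classifier often works well in practice.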
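
The independence assumption turns the posterior into a product of per-feature likelihoods. The following sketch works through that arithmetic by hand; every probability in it is a made-up number for illustration, not something estimated from data.

```python
# Hypothetical priors and per-word likelihoods for a spam filter
prior = {"spam": 0.4, "ham": 0.6}
likelihood = {
    "spam": {"free": 0.5, "meeting": 0.1},
    "ham": {"free": 0.05, "meeting": 0.4},
}

words = ["free", "meeting"]  # words observed in the message

# Naive assumption: P(words | class) is the product of P(word | class)
score = {}
for label in prior:
    score[label] = prior[label]
    for word in words:
        score[label] *= likelihood[label][word]

# Normalize the scores so they sum to 1 (Bayes' theorem)
total = sum(score.values())
posterior = {label: s / total for label, s in score.items()}
print(posterior)
```

Here the unnormalized scores are 0.4 × 0.5 × 0.1 = 0.02 for spam and 0.6 × 0.05 × 0.4 = 0.012 for ham, so the posterior probability of spam is 0.02 / 0.032 = 0.625.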

Question 5: Describe the Gaussian, Multinomial, and Bernoulli Naïve Bayes variants. When would you use each one?

Ans>
1. Gaussian Naive Bayes: The continuous values of each feature are assumed to follow a Gaussian (normal) distribution within each class.

Use: Ideal when your dataset consists of numerical features such as height, weight, temperature, or blood pressure.
It doesn’t require feature scaling (such as normalization or standardization), because it models the mean and variance of each feature separately within each class.
2. Multinomial Naive Bayes: It is used when features represent the frequency of terms (such as word counts) in a document. It is commonly applied in text classification, where term frequencies are important.

Use: Primarily used in text classification where features are word counts or TF-IDF values.
It works well with bag-of-words or n-gram representations.
3. Bernoulli Naive Bayes: It deals with binary features, where each feature indicates whether a word appears in a document or not. It is suited to scenarios where the presence or absence of terms is more relevant than their frequency. Both the Multinomial and Bernoulli models are widely used in document classification tasks.

Use: Appropriate for datasets where you’re interested in whether a feature appears, rather than how often.
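
The three variants map directly onto three scikit-learn classes. The sketch below pairs each class with the kind of feature it expects; all of the tiny arrays are hypothetical data invented for this illustration.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 0, 0, 1, 1, 1])

# Gaussian: continuous measurements (hypothetical heights in cm)
X_cont = np.array([[150.0], [155.0], [160.0], [180.0], [185.0], [190.0]])
gauss_pred = GaussianNB().fit(X_cont, y).predict([[152.0], [188.0]])

# Multinomial: word counts per document (hypothetical two-word vocabulary)
X_counts = np.array([[3, 0], [2, 0], [4, 1], [0, 3], [0, 2], [1, 4]])
multi_pred = MultinomialNB().fit(X_counts, y).predict([[5, 0], [0, 5]])

# Bernoulli: word present (1) or absent (0)
X_binary = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]])
bern_pred = BernoulliNB().fit(X_binary, y).predict([[1, 0], [0, 1]])

print("Gaussian:   ", gauss_pred)
print("Multinomial:", multi_pred)
print("Bernoulli:  ", bern_pred)
```

In each case the first test sample resembles class 0 and the second resembles class 1, so all three models should predict `[0 1]`.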

In [2]:
# Question 6:   Write a Python program to:
# ● Load the Breast Cancer dataset
# ● Train an SVM Classifier with a linear kernel
# ● Print the model's accuracy and support vectors.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

# Load the dataset
breast_cancer = load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the StandardScaler
scaler = StandardScaler()

# Fit the scaler on the training data and transform both training and testing data
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train an SVM Classifier with a linear kernel on the scaled data
svm_model = SVC(kernel='linear')
svm_model.fit(X_train_scaled, y_train)

# Predict on the scaled test set
y_pred = svm_model.predict(X_test_scaled)

# Print the model's accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of the SVM model with scaled data: {accuracy:.4f}")

# Print the support vectors
print(f"\nNumber of support vectors with scaled data: {len(svm_model.support_vectors_)}")
# Note: Support vectors are in the scaled space
# print("Support vectors with scaled data:\n", svm_model.support_vectors_)

Accuracy of the SVM model with scaled data: 0.9766

Number of support vectors with scaled data: 36


In [3]:
# Question 7:  Write a Python program to:
# ● Load the Breast Cancer dataset
# ● Train a Gaussian Naïve Bayes model
# ● Print its classification report including precision, recall, and F1-score.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report
from sklearn.preprocessing import StandardScaler

# Load the dataset
breast_cancer = load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the StandardScaler
scaler = StandardScaler()

# Fit the scaler on the training data and transform both training and testing data
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train a Gaussian Naive Bayes model on the scaled data
gnb_model = GaussianNB()
gnb_model.fit(X_train_scaled, y_train)

# Predict on the scaled test set
y_pred = gnb_model.predict(X_test_scaled)

# Print the classification report
print("Classification Report for Gaussian Naive Bayes model with scaled data:")
print(classification_report(y_test, y_pred))

Classification Report for Gaussian Naive Bayes model with scaled data:
              precision    recall  f1-score   support

           0       0.92      0.90      0.91        63
           1       0.94      0.95      0.95       108

    accuracy                           0.94       171
   macro avg       0.93      0.93      0.93       171
weighted avg       0.94      0.94      0.94       171



In [5]:
# Question 8: Write a Python program to:
# ● Train an SVM Classifier on the Wine dataset using GridSearchCV to find the best
# C and gamma.
# ● Print the best hyperparameters and accuracy.

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

# Load the dataset
wine = load_wine()
X = wine.data
y = wine.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the StandardScaler
scaler = StandardScaler()

# Fit the scaler on the training data and transform both training and testing data
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Define the parameter grid for C and gamma
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [1, 0.1, 0.01, 0.001, 'scale', 'auto'], 'kernel': ['rbf']}

# Initialize GridSearchCV
grid_search = GridSearchCV(SVC(), param_grid, refit=True, verbose=0, cv=5)

# Fit the grid search to the scaled training data
grid_search.fit(X_train_scaled, y_train)

# Print the best hyperparameters
print("Best hyperparameters found by GridSearchCV:")
print(grid_search.best_params_)

# Predict on the scaled test set using the best model
y_pred = grid_search.predict(X_test_scaled)

# Print the accuracy of the best model
accuracy = accuracy_score(y_test, y_pred)
print(f"\nAccuracy of the best SVM model: {accuracy:.4f}")

Best hyperparameters found by GridSearchCV:
{'C': 1, 'gamma': 0.01, 'kernel': 'rbf'}

Accuracy of the best SVM model: 1.0000


In [6]:
# Question 9: Write a Python program to:
# ● Train a Naïve Bayes Classifier on a synthetic text dataset (e.g. using
# sklearn.datasets.fetch_20newsgroups).
# ● Print the model's ROC-AUC score for its predictions.

from sklearn.datasets import fetch_20newsgroups
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import roc_auc_score

# Load the dataset
# We'll load a subset of the data for simplicity and faster execution
categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics', 'sci.med']
newsgroups_train = fetch_20newsgroups(subset='train', categories=categories, shuffle=True, random_state=42)
newsgroups_test = fetch_20newsgroups(subset='test', categories=categories, shuffle=True, random_state=42)

X_train = newsgroups_train.data
y_train = newsgroups_train.target
X_test = newsgroups_test.data
y_test = newsgroups_test.target

# Vectorize the text data using TF-IDF
vectorizer = TfidfVectorizer()
X_train_vect = vectorizer.fit_transform(X_train)
X_test_vect = vectorizer.transform(X_test)

# Train a Multinomial Naive Bayes model
# Multinomial Naive Bayes is suitable for text classification with count/frequency data
nb_model = MultinomialNB()
nb_model.fit(X_train_vect, y_train)

# Predict class probabilities for every class
# ROC-AUC requires probability estimates, not hard labels
y_pred_proba = nb_model.predict_proba(X_test_vect)

# ROC-AUC is defined for binary problems. This subset has four classes, so we
# use the one-vs-rest (OvR) extension, which averages the per-class AUCs.
# With multi_class='ovr', roc_auc_score takes integer labels as y_true and a
# probability matrix of shape (n_samples, n_classes) as y_score.
if len(newsgroups_train.target_names) > 2:
    roc_auc = roc_auc_score(y_test, y_pred_proba, multi_class='ovr')
    print(f"ROC-AUC (One-vs-Rest) for Naive Bayes model: {roc_auc:.4f}")
else:
    # Binary case: score with the probability of the class at index 1
    roc_auc = roc_auc_score(y_test, y_pred_proba[:, 1])
    print(f"ROC-AUC for Naive Bayes model: {roc_auc:.4f}")

ROC-AUC (One-vs-Rest) for Naive Bayes model: 0.9810
