**Q1. What is a Support Vector Machine (SVM), and how does it work?**

**Ans.** A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression tasks. It works by finding the optimal hyperplane that best separates data points of different classes in a high-dimensional space. The optimal hyperplane is the one that maximizes the margin between the closest points of the classes, which are called support vectors.

SVM can handle both linear and non-linear classification. For linearly separable data, it finds a straight line (in 2D) or a flat hyperplane (in higher dimensions) that divides the classes. For non-linear data, SVM uses the kernel trick, which implicitly maps the inputs into a higher-dimensional feature space where a linear separation may be possible.

Common kernel functions include:

* Linear kernel: $K(x, y) = x^T y$
* Polynomial kernel: $K(x, y) = (x^T y + c)^d$
* Radial Basis Function (RBF) kernel: $K(x, y) = \exp(-\gamma \|x - y\|^2)$
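The three kernel formulas above can be checked numerically. The following is a minimal sketch, assuming NumPy and scikit-learn are installed; the helper names (`linear_k`, `poly_k`, `rbf_k`) and the sample vectors are illustrative choices, not part of the original answer.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel

def linear_k(x, y):
    # K(x, y) = x^T y
    return float(x @ y)

def poly_k(x, y, c=1.0, d=3):
    # K(x, y) = (x^T y + c)^d
    return float((x @ y + c) ** d)

def rbf_k(x, y, gamma=0.5):
    # K(x, y) = exp(-gamma * ||x - y||^2)
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

x = np.array([1.0, 2.0])
y = np.array([0.5, -1.0])

print("linear:", linear_k(x, y))
print("polynomial:", poly_k(x, y))
print("rbf:", rbf_k(x, y))

# Cross-check against scikit-learn, which expects 2D arrays (one row per sample).
# Its polynomial kernel is (gamma * x^T y + coef0)^degree, so gamma=1 matches the formula above.
sk_poly = polynomial_kernel(x[None, :], y[None, :], degree=3, gamma=1.0, coef0=1.0)[0, 0]
sk_rbf = rbf_kernel(x[None, :], y[None, :], gamma=0.5)[0, 0]
print("matches sklearn:", np.isclose(poly_k(x, y), sk_poly), np.isclose(rbf_k(x, y), sk_rbf))
```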

SVM is effective in high-dimensional spaces and is memory efficient because it uses only a subset of training points (support vectors) in the decision function.

**Q2. Explain the difference between Hard Margin and Soft Margin SVM.**

**Ans.** Hard Margin SVM is used when the data is linearly separable with no misclassification. It tries to find a hyperplane that perfectly separates the data into two classes with the maximum margin and no tolerance for any misclassified points. It assumes that a perfect separation exists, which makes it sensitive to noise and outliers.

Soft Margin SVM is used when the data is not perfectly linearly separable. It allows some misclassifications by introducing a penalty for errors in the objective function. It introduces a regularization parameter $C$ that controls the trade-off between maximizing the margin and minimizing classification error. A smaller $C$ allows a wider margin with more tolerance for errors, while a larger $C$ tries to classify all points correctly with a narrower margin.

Soft Margin SVM is more robust and practical in real-world scenarios where data may have noise or overlap between classes.
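The effect of $C$ can be observed directly. The sketch below is illustrative only: it assumes scikit-learn, uses a synthetic two-class dataset with overlapping clusters (the centers and spread are arbitrary), and measures the margin width of a linear SVM as $2 / \|w\|$.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping clusters, so a hard margin is not achievable (synthetic data).
X, y = make_blobs(
    n_samples=200,
    centers=[[-1.5, 0.0], [1.5, 0.0]],
    cluster_std=1.0,
    random_state=0,
)

def margin_and_support_count(C):
    """Fit a linear soft-margin SVM and return (margin width, number of support vectors)."""
    model = SVC(kernel="linear", C=C).fit(X, y)
    margin_width = 2.0 / np.linalg.norm(model.coef_)
    return margin_width, int(model.n_support_.sum())

for C in (0.01, 1.0, 100.0):
    width, n_sv = margin_and_support_count(C)
    print(f"C={C:<6} margin width={width:.3f} support vectors={n_sv}")

margin_small_c, _ = margin_and_support_count(0.01)
margin_large_c, _ = margin_and_support_count(100.0)
```

With a small $C$ the margin should come out wider, because margin violations are penalized less.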

**Q3. What is the Kernel Trick in SVM? Give one example of a kernel and explain its use case.**

**Ans.**
The Kernel Trick in SVM is a mathematical technique that allows the algorithm to operate in a high-dimensional feature space without explicitly transforming the data. It enables SVM to perform non-linear classification by computing the dot product between the images of the data points in the feature space, using a kernel function. This avoids the computational cost of working in a high-dimensional space directly.
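A small numerical check makes this concrete. The sketch below uses the homogeneous degree-2 polynomial kernel $K(x, y) = (x^T y)^2$ in two dimensions, chosen here only because its feature map $\phi$ is easy to write out; the vectors are arbitrary. The dot product of the explicit images equals the kernel value computed in the original space.

```python
import numpy as np

def phi(v):
    """Explicit feature map for K(x, y) = (x^T y)^2 with 2D inputs."""
    v1, v2 = v
    return np.array([v1 ** 2, np.sqrt(2.0) * v1 * v2, v2 ** 2])

def poly2_kernel(x, y):
    """The same quantity computed without leaving the 2D input space."""
    return float(x @ y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

explicit = float(phi(x) @ phi(y))  # dot product in the 3D feature space
implicit = poly2_kernel(x, y)      # kernel evaluated on the original inputs

print(explicit, implicit)
```

For the RBF kernel the corresponding feature space is infinite-dimensional, so only the implicit computation is practical.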

One common example is the **Radial Basis Function (RBF) kernel**, also known as the Gaussian kernel, defined as:
$K(x, x') = \exp(-\gamma \|x - x'\|^2)$
where $\gamma > 0$ controls the spread of the kernel: a larger $\gamma$ gives a narrower kernel, so each training point influences only its close neighbors.

**Use case**: The RBF kernel is useful when the decision boundary between classes is highly non-linear. For example, in image classification or handwriting recognition, where the data points of different classes are not linearly separable, the RBF kernel helps in mapping the data into a higher-dimensional space where a linear separator can be found.
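To illustrate a non-linear boundary on a small scale, the sketch below compares a linear and an RBF kernel on scikit-learn's synthetic concentric-circles dataset. This is a toy stand-in, not an image or handwriting task, and the dataset parameters are arbitrary.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Inner circle = one class, outer ring = the other: not linearly separable in 2D.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

linear_acc = SVC(kernel="linear").fit(X_train, y_train).score(X_test, y_test)
rbf_acc = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train).score(X_test, y_test)

print(f"Linear kernel accuracy: {linear_acc:.2f}")
print(f"RBF kernel accuracy:    {rbf_acc:.2f}")
```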

**Q4. What is a Naïve Bayes Classifier, and why is it called “naïve”?**

**Ans.**
A Naïve Bayes Classifier is a probabilistic machine learning model used for classification tasks. It is based on Bayes’ Theorem, which describes the probability of a class given certain features. The formula is:
$P(C|X) = \frac{P(X|C) \cdot P(C)}{P(X)}$
where:

* $P(C|X)$ is the posterior probability of class $C$ given features $X$,
* $P(X|C)$ is the likelihood of features $X$ given class $C$,
* $P(C)$ is the prior probability of class $C$,
* $P(X)$ is the evidence, i.e. the marginal probability of features $X$ across all classes.

The classifier predicts the class with the highest posterior probability for the given input.
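As a worked example of the formula, the sketch below applies Bayes' Theorem to a single feature (an email containing a given word). All probabilities are made-up illustrative numbers.

```python
# Hypothetical numbers: 20% of emails are spam; the word appears in
# 60% of spam emails and in 5% of legitimate ("ham") emails.
prior = {"spam": 0.2, "ham": 0.8}
likelihood = {"spam": 0.6, "ham": 0.05}

# P(X): total probability of seeing the word, summed over both classes.
evidence = sum(likelihood[c] * prior[c] for c in prior)

# P(C|X) = P(X|C) * P(C) / P(X)
posterior = {c: likelihood[c] * prior[c] / evidence for c in prior}
predicted = max(posterior, key=posterior.get)

print(posterior)   # spam is about 0.75, ham about 0.25
print(predicted)   # spam
```

Because $P(X)$ is the same for every class, it can be dropped when only the most probable class is needed.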

It is called “naïve” because it **assumes that all features are conditionally independent of each other given the class label**, which is rarely true in real-world scenarios. Despite this unrealistic assumption, the model often performs very well in practice, especially in high-dimensional datasets.

**Example**: In email spam detection, the classifier treats the presence of each word in an email as independent from the others, even though words often appear in meaningful combinations. Even with this simplification, Naïve Bayes is highly effective and computationally efficient for classifying spam vs. non-spam emails.
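The independence assumption means a class score is just the prior multiplied by one likelihood per word. The sketch below shows this with hypothetical per-word probabilities, working in log space to avoid numerical underflow when many small probabilities are multiplied.

```python
import math

# Hypothetical priors and per-word likelihoods P(word | class).
log_prior = {"spam": math.log(0.2), "ham": math.log(0.8)}
word_likelihood = {
    "spam": {"free": 0.30, "winner": 0.20, "meeting": 0.01},
    "ham": {"free": 0.02, "winner": 0.01, "meeting": 0.10},
}

def class_score(words, label):
    """log P(C) + sum of log P(word | C): each word contributes independently."""
    return log_prior[label] + sum(math.log(word_likelihood[label][w]) for w in words)

def classify(words):
    scores = {label: class_score(words, label) for label in log_prior}
    return max(scores, key=scores.get)

print(classify(["free", "winner"]))  # spam
print(classify(["meeting"]))         # ham
```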

Naïve Bayes classifiers are also widely used in sentiment analysis, document classification, and medical diagnosis due to their simplicity, speed, and relatively good performance.


**Q5. Describe the Gaussian, Multinomial, and Bernoulli Naïve Bayes variants. When would you use each one?**

**Ans.**
Naïve Bayes classifiers have different variants based on the type and distribution of the input features. The three main variants are Gaussian, Multinomial, and Bernoulli Naïve Bayes. Each is suited for different types of data and applications.

**1. Gaussian Naïve Bayes:**
This variant assumes that each continuous feature follows a normal (Gaussian) distribution within each class. It is used when the input data is numerical and continuous. The likelihood of each feature is calculated using the Gaussian probability density function, with a mean and variance estimated per class.

**Use case:**

* Used for datasets with continuous features, such as age, salary, or temperature.
* Example: Predicting whether a patient has a disease based on blood pressure and cholesterol levels.

**2. Multinomial Naïve Bayes:**
This variant is used for discrete count data, typically representing the frequency of events (e.g., word counts in documents). It assumes that features follow a multinomial distribution.

**Use case:**

* Commonly used in text classification tasks where features represent word counts or term frequencies.
* Example: Document classification, spam detection, sentiment analysis based on the frequency of words in texts.

**3. Bernoulli Naïve Bayes:**
This variant is designed for binary/boolean features. It models each feature as being present or absent (1 or 0) and assumes that features follow a Bernoulli distribution.

**Use case:**

* Suitable for binary feature datasets, such as whether a specific word exists in a document (not how many times it appears).
* Example: Email classification using a binary indicator for the presence of certain keywords.

**Summary of when to use each:**

* **Gaussian NB** → Continuous data (e.g., sensor readings)
* **Multinomial NB** → Discrete count data (e.g., word counts in NLP)
* **Bernoulli NB** → Binary features (e.g., presence/absence of words or features)
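The summary above maps onto three scikit-learn estimators. The following sketch fits each one on synthetic data of the matching type; the data-generating choices (class means, Poisson rates, the binarization cut-off) are arbitrary assumptions, and the scores are training accuracy, shown only to confirm that each model fits its data type.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

rng = np.random.default_rng(0)
y = np.array([0] * 50 + [1] * 50)

# Continuous features: class 0 centered at 0, class 1 centered at 3.
X_continuous = rng.normal(loc=y[:, None] * 3.0, scale=1.0, size=(100, 2))

# Count features: each class favors a different column (like word counts).
rates = np.where(y[:, None] == 0, [5.0, 1.0, 1.0], [1.0, 1.0, 5.0])
X_counts = rng.poisson(rates)

# Binary features: 1 if the count exceeds 2, else 0 (a rough presence/absence flag).
X_binary = (X_counts > 2).astype(int)

gaussian_acc = GaussianNB().fit(X_continuous, y).score(X_continuous, y)
multinomial_acc = MultinomialNB().fit(X_counts, y).score(X_counts, y)
bernoulli_acc = BernoulliNB().fit(X_binary, y).score(X_binary, y)

print(f"GaussianNB:    {gaussian_acc:.2f}")
print(f"MultinomialNB: {multinomial_acc:.2f}")
print(f"BernoulliNB:   {bernoulli_acc:.2f}")
```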

In [1]:
'''
Question 6: Write a Python program to:
● Load the Iris dataset
● Train an SVM Classifier with a linear kernel
● Print the model's accuracy and support vectors.
'''

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split data into training and testing sets (70% train, 30% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize SVM classifier with a linear kernel
svm_classifier = SVC(kernel='linear', random_state=42)

# Train the model
svm_classifier.fit(X_train, y_train)

# Predict on the test set
y_pred = svm_classifier.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2f}")

# Print support vectors
print("Support Vectors:")
print(svm_classifier.support_vectors_)

Model Accuracy: 1.00
Support Vectors:
[[4.8 3.4 1.9 0.2]
 [5.1 3.3 1.7 0.5]
 [4.5 2.3 1.3 0.3]
 [5.6 3.  4.5 1.5]
 [5.4 3.  4.5 1.5]
 [6.7 3.  5.  1.7]
 [5.9 3.2 4.8 1.8]
 [5.1 2.5 3.  1.1]
 [6.  2.7 5.1 1.6]
 [6.3 2.5 4.9 1.5]
 [6.1 2.9 4.7 1.4]
 [6.5 2.8 4.6 1.5]
 [6.9 3.1 4.9 1.5]
 [6.3 2.3 4.4 1.3]
 [6.3 2.8 5.1 1.5]
 [6.3 2.7 4.9 1.8]
 [6.  3.  4.8 1.8]
 [6.  2.2 5.  1.5]
 [6.2 2.8 4.8 1.8]
 [6.5 3.  5.2 2. ]
 [7.2 3.  5.8 1.6]
 [5.6 2.8 4.9 2. ]
 [5.9 3.  5.1 1.8]
 [4.9 2.5 4.5 1.7]]
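
The support vectors printed above can also be summarized per class. The sketch below refits the same linear SVM (same split and seed as the cell above) and reads the fitted attributes `n_support_` and `support_`, which hold the per-class counts and the training-set row indices of the support vectors.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, _, y_train, _ = train_test_split(X, y, test_size=0.3, random_state=42)

model = SVC(kernel="linear").fit(X_train, y_train)

# Number of support vectors for each class, in the order of model.classes_.
print("Support vectors per class:", model.n_support_)
# Indices into X_train of the rows that became support vectors.
print("First few support indices:", model.support_[:5])
print("Total support vectors:", model.support_vectors_.shape[0])
```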


In [2]:
'''
Question 7: Write a Python program to:
● Load the Breast Cancer dataset
● Train a Gaussian Naïve Bayes model
● Print its classification report including precision, recall, and F1-score.
'''

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report

# Load the Breast Cancer dataset
cancer = datasets.load_breast_cancer()
X = cancer.data
y = cancer.target

# Split data into training and testing sets (70% train, 30% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize Gaussian Naive Bayes classifier
gnb = GaussianNB()

# Train the model
gnb.fit(X_train, y_train)

# Predict on the test set
y_pred = gnb.predict(X_test)

# Print classification report
report = classification_report(y_test, y_pred, target_names=cancer.target_names)
print(report)

              precision    recall  f1-score   support

   malignant       0.93      0.90      0.92        63
      benign       0.95      0.96      0.95       108

    accuracy                           0.94       171
   macro avg       0.94      0.93      0.94       171
weighted avg       0.94      0.94      0.94       171



In [3]:
'''
Question 8: Write a Python program to:
● Train an SVM Classifier on the Wine dataset using GridSearchCV to find the best
C and gamma.
● Print the best hyperparameters and accuracy
'''

from sklearn import datasets
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Wine dataset
wine = datasets.load_wine()
X = wine.data
y = wine.target

# Split data into training and testing sets (70% train, 30% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define the parameter grid for GridSearchCV
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': ['scale', 0.001, 0.01, 0.1, 1],
    'kernel': ['rbf']
}

# Initialize SVM classifier
svm = SVC()

# Initialize GridSearchCV with 5-fold cross-validation
grid_search = GridSearchCV(estimator=svm, param_grid=param_grid, cv=5, n_jobs=-1, verbose=1)

# Train model using GridSearchCV to find best hyperparameters
grid_search.fit(X_train, y_train)

# Predict on the test set using best estimator
y_pred = grid_search.best_estimator_.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

print(f"Best Hyperparameters: {grid_search.best_params_}")
print(f"Test Set Accuracy: {accuracy:.2f}")

Fitting 5 folds for each of 20 candidates, totalling 100 fits
Best Hyperparameters: {'C': 100, 'gamma': 'scale', 'kernel': 'rbf'}
Test Set Accuracy: 0.78
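
The search above runs on unscaled features. The RBF kernel depends on distances between samples, and the Wine features have very different ranges, so standardizing them is a common step. The sketch below is one possible variant, not the assignment's required solution: it wraps `StandardScaler` and `SVC` in a `Pipeline` so the scaler is refit inside every cross-validation fold. The step names `scaler` and `svc` are arbitrary labels.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scaling lives inside the pipeline, so each CV fold is scaled using its own training part.
pipeline = Pipeline([("scaler", StandardScaler()), ("svc", SVC(kernel="rbf"))])

# Parameters of a pipeline step are addressed as "<step name>__<parameter>".
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": ["scale", 0.001, 0.01, 0.1, 1],
}

scaled_search = GridSearchCV(pipeline, param_grid, cv=5)
scaled_search.fit(X_train, y_train)
scaled_accuracy = scaled_search.score(X_test, y_test)

print(f"Best Hyperparameters: {scaled_search.best_params_}")
print(f"Test Set Accuracy (scaled features): {scaled_accuracy:.2f}")
```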


In [4]:
'''
Question 9: Write a Python program to:
● Train a Naïve Bayes Classifier on a synthetic text dataset (e.g. using
sklearn.datasets.fetch_20newsgroups).
● Print the model's ROC-AUC score for its predictions.
'''

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Load 20 newsgroups dataset (select two categories for binary classification)
categories = ['rec.sport.hockey', 'sci.space']
newsgroups = fetch_20newsgroups(subset='all', categories=categories, shuffle=True, random_state=42)

X = newsgroups.data
y = newsgroups.target

# Convert text data to TF-IDF features
vectorizer = TfidfVectorizer(stop_words='english')
X_tfidf = vectorizer.fit_transform(X)

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X_tfidf, y, test_size=0.3, random_state=42)

# Initialize Multinomial Naive Bayes classifier
nb = MultinomialNB()

# Train the model
nb.fit(X_train, y_train)

# Predict probabilities for the positive class
y_prob = nb.predict_proba(X_test)[:, 1]

# Since this is binary classification, compute ROC-AUC score
roc_auc = roc_auc_score(y_test, y_prob)

print(f"ROC-AUC Score: {roc_auc:.2f}")

ROC-AUC Score: 1.00
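
In the cell above, the TF-IDF vectorizer is fit on all documents before the train/test split, so document-frequency statistics from the test set influence the features. A stricter setup splits first and fits the vectorizer on the training documents only. The sketch below shows that pattern with a `Pipeline`; to stay self-contained it uses a tiny hand-written corpus (hypothetical sentences) rather than downloading 20 Newsgroups.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus: label 0 = hockey, label 1 = space.
hockey_docs = [
    "the hockey team scored a late goal",
    "hockey playoffs start next week",
    "the goalie made a great save in the hockey game",
    "hockey fans cheered the overtime goal",
    "the hockey coach changed the lineup",
    "a hockey player was traded yesterday",
    "hockey season tickets are on sale",
    "the hockey rink was packed tonight",
]
space_docs = [
    "the space shuttle launch was delayed",
    "space probes sent images of the planet",
    "astronauts repaired the space station",
    "the space telescope found a new galaxy",
    "a space mission to the moon is planned",
    "space agencies share orbital data",
    "the rocket reached space in minutes",
    "space debris threatens satellites",
]
texts = hockey_docs + space_docs
labels = [0] * len(hockey_docs) + [1] * len(space_docs)

# Split the raw text first; the vectorizer never sees the test documents during fit.
train_texts, test_texts, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

model = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
model.fit(train_texts, y_train)

# Column 1 of predict_proba is the probability of label 1 (space).
y_prob = model.predict_proba(test_texts)[:, 1]
roc_auc = roc_auc_score(y_test, y_prob)
print(f"ROC-AUC Score: {roc_auc:.2f}")
```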


**Q10. Imagine you’re working as a data scientist for a company that handles email communications. Your task is to automatically classify emails as Spam or Not Spam.**

The emails may contain:

* Text with diverse vocabulary
* Potential class imbalance (far more legitimate emails than spam)
* Some incomplete or missing data

Explain the approach you would take to:

* Preprocess the data (e.g. text vectorization, handling missing data)
* Choose and justify an appropriate model (SVM vs. Naïve Bayes)
* Address class imbalance
* Evaluate the performance of your solution with suitable metrics

Also explain the business impact of your solution.
(Include your Python code and output in the code box below.)

**Ans.**

**Preprocessing:**

* **Text vectorization:** Use TF-IDF vectorization to convert email text into numerical feature vectors. TF-IDF weights words by how informative they are and down-weights words that are common across all emails.
* **Handling missing data:** Remove missing or incomplete emails if there are only a few; otherwise replace missing text with a placeholder (such as an empty string) so vectorization still runs without errors.
* **Text cleaning:** Lowercasing, removing punctuation and stopwords, and stemming or lemmatization can improve feature quality.
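
The preprocessing steps above can be sketched as follows. This is a minimal illustration with a hypothetical four-email sample; it fills missing text with empty strings and relies on `TfidfVectorizer` for lowercasing, punctuation-insensitive tokenization, and English stopword removal. Stemming and lemmatization are left out because they need an extra library.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical raw emails; None stands for a missing body.
raw_emails = [
    "WIN a FREE prize now!!!",
    None,
    "Meeting moved to 3pm, see agenda attached.",
    "",
]

# Replace missing entries with empty strings so the vectorizer does not fail.
cleaned_emails = [text if isinstance(text, str) else "" for text in raw_emails]

vectorizer = TfidfVectorizer(lowercase=True, stop_words="english", strip_accents="unicode")
X = vectorizer.fit_transform(cleaned_emails)

print("Feature matrix shape:", X.shape)
print("Vocabulary:", sorted(vectorizer.vocabulary_))
# Empty or missing emails become all-zero rows.
print("Non-zero features per email:", [X[i].nnz for i in range(X.shape[0])])
```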

**Model choice:**

* Naïve Bayes is often preferred for spam classification because it is efficient, handles high-dimensional sparse data, and performs well on text with a diverse vocabulary. Its word-independence assumption is violated by real text, yet it usually classifies well anyway.
* SVM can also be used and may perform better with a well-tuned kernel, but it is computationally heavier and slower to train on large datasets.
* For a real-time or large-scale system, Naïve Bayes is typically preferred.

**Addressing class imbalance:**

* Use class weighting in the model, or resampling methods such as SMOTE to oversample spam emails or downsampling of the majority class.
* Alternatively, tune the decision threshold based on the precision-recall trade-off to favor detecting spam while minimizing false positives.
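
Two of these ideas, class weighting and threshold tuning, can be sketched without extra dependencies (SMOTE is omitted because it lives in a separate package). The label counts, probabilities, and the helper `predict_with_threshold` are hypothetical.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced labels: 90 legitimate emails (0) and 10 spam emails (1).
y = np.array([0] * 90 + [1] * 10)

# "balanced" weights are n_samples / (n_classes * class_count),
# so the rare spam class receives the larger weight.
class_weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)
print("Class weights:", dict(zip([0, 1], class_weights)))

def predict_with_threshold(spam_probabilities, threshold):
    """Label an email as spam (1) only if its predicted spam probability reaches the threshold."""
    return (np.asarray(spam_probabilities) >= threshold).astype(int)

# Hypothetical predicted spam probabilities for four emails.
spam_probabilities = [0.05, 0.40, 0.55, 0.90]
default_labels = predict_with_threshold(spam_probabilities, 0.5)
strict_labels = predict_with_threshold(spam_probabilities, 0.8)

print("Threshold 0.5:", default_labels)  # [0 0 1 1]
print("Threshold 0.8:", strict_labels)   # [0 0 0 1]
```

Raising the threshold trades recall for precision: fewer legitimate emails are flagged, at the cost of letting more spam through.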

**Evaluation metrics:**

* Accuracy alone is misleading under class imbalance. Use precision, recall, and F1-score to balance false positives and false negatives.
* ROC-AUC and Precision-Recall AUC evaluate classifier performance across thresholds.
* In spam detection, both high recall (catching most spam) and high precision (not labeling legitimate emails as spam) are critical.
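
The gap between accuracy and the other metrics shows up even on a tiny example. The sketch below uses ten hypothetical emails (two of them spam) with made-up predicted scores, and computes the metrics named above with scikit-learn; `average_precision_score` is used as the summary of the precision-recall curve.

```python
from sklearn.metrics import (
    accuracy_score,
    average_precision_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Hypothetical ground truth (1 = spam) and predicted spam probabilities.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_score = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.7, 0.9, 0.45]

# Hard predictions at the default 0.5 threshold: one spam caught, one missed, one false alarm.
y_pred = [int(score >= 0.5) for score in y_score]

accuracy = accuracy_score(y_true, y_pred)     # 0.80, looks fine in isolation
precision = precision_score(y_true, y_pred)   # 0.50
recall = recall_score(y_true, y_pred)         # 0.50
f1 = f1_score(y_true, y_pred)                 # 0.50

# Threshold-free metrics use the scores, not the hard predictions.
roc_auc = roc_auc_score(y_true, y_score)
pr_auc = average_precision_score(y_true, y_score)

print(f"Accuracy:  {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall:    {recall:.2f}")
print(f"F1-score:  {f1:.2f}")
print(f"ROC-AUC:   {roc_auc:.4f}")
print(f"PR-AUC:    {pr_auc:.4f}")
```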

**Business impact:**

* Accurate spam filtering reduces user frustration and security risks (e.g., phishing).
* Minimizing false positives preserves customer trust by avoiding the loss of legitimate emails.
* Efficient classification saves resources by reducing manual email sorting.