**Theoretical Questions**

 1. What is a Support Vector Machine (SVM)?
** A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression tasks. It works by finding the optimal hyperplane that best separates data points of different classes in a feature space. The goal is to maximize the margin between the classes, which helps improve generalization. SVM is effective in high-dimensional spaces and works well when the number of features exceeds the number of samples. It can also handle non-linear classification using the kernel trick, which implicitly maps data into a higher-dimensional space. SVMs are widely used in image recognition, bioinformatics, and text classification.


2.  What is the difference between Hard Margin and Soft Margin SVM?
** Hard Margin SVM is used when the data is perfectly linearly separable. It creates a decision boundary that does not allow any misclassification, aiming for the maximum margin between classes. However, it is sensitive to outliers and noise. Soft Margin SVM is more flexible and allows some misclassifications to achieve better generalization on unseen data. It introduces a penalty for incorrect classifications, making it suitable for real-world datasets that are not perfectly separable. Soft Margin SVM balances margin maximization and error minimization, making it more robust and widely applicable than Hard Margin SVM.
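
The difference can be made concrete with slack values. The sketch below is only illustrative: the points, labels, and the hyperplane `w`, `b` are made-up values, not the output of a trained model. A slack of 0 means a point satisfies the margin constraint; any positive slack makes the hard-margin problem infeasible for that hyperplane, while a soft-margin SVM would accept it at a cost.

```python
import numpy as np

# Hypothetical 2-D points with labels in {-1, +1}; the last point is an outlier
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0], [0.5, 0.5]])
y = np.array([1, 1, -1, -1, -1])

# A hand-picked hyperplane w.x + b = 0 (an assumption, not a fitted model)
w = np.array([0.5, 0.5])
b = 0.0

# Functional margin y_i * (w.x_i + b); the hard-margin constraint requires >= 1
margins = y * (X @ w + b)

# Slack xi_i = max(0, 1 - margin_i): how far each point violates the constraint
slack = np.maximum(0.0, 1.0 - margins)

hard_margin_feasible = bool(np.all(slack == 0))
print("margins:", margins)
print("slack:  ", slack)
print("hard-margin feasible for this hyperplane:", hard_margin_feasible)
```

Only the outlier receives a positive slack here, which is exactly the kind of point a soft margin is meant to tolerate.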


3.  What is the mathematical intuition behind SVM?
** The mathematical intuition behind SVM lies in finding the hyperplane that best separates data points of different classes by maximizing the margin between them. The margin is the distance between the hyperplane and the nearest data points from each class, known as support vectors. SVM aims to solve an optimization problem that minimizes the norm of the weight vector while ensuring correct classification of the training data. In cases where data is not linearly separable, a kernel function transforms it into a higher-dimensional space, making separation possible. This approach helps SVM achieve strong performance, even in complex classification tasks.
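
Written out, the hard-margin optimization problem described above takes the following standard form. The notation is an assumption added here: `w` is the weight vector, `b` the bias, and `(x_i, y_i)` the training points with labels `y_i` in {-1, +1}.

```latex
\min_{w,\,b}\ \frac{1}{2}\lVert w\rVert^{2}
\quad\text{subject to}\quad
y_i\,(w^{\top}x_i + b)\ \ge\ 1,\qquad i = 1,\dots,n
```

The margin width equals `2 / ||w||`, so minimizing the norm of `w` is the same as maximizing the margin.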


4.  What is the role of Lagrange Multipliers in SVM?
** Lagrange Multipliers play a key role in solving the optimization problem in SVM. They help convert the constrained optimization problem of finding the maximum margin hyperplane into a form that is easier to solve. By using the Lagrangian, constraints on the data points being correctly classified are incorporated into the objective function. This allows the problem to be expressed in its dual form, which depends only on the inner products of data points. It also helps identify the support vectors, as only data points with non-zero multipliers influence the final decision boundary, making the model efficient and effective.
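
For reference, the dual of the hard-margin problem has the following standard form. The notation is an assumption added here: `alpha_i` is the Lagrange multiplier of training point `x_i`, and `y_i` in {-1, +1} is its label.

```latex
\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i
 - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j\,y_i y_j\,x_i^{\top}x_j
\quad\text{subject to}\quad
\alpha_i \ge 0,\qquad \sum_{i=1}^{n}\alpha_i y_i = 0
```

The data appear only through the inner products `x_i^T x_j`, and the weight vector is recovered as `w = sum_i alpha_i y_i x_i`, so points with `alpha_i = 0` drop out of the solution.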


5.  What are Support Vectors in SVM?
** Support Vectors are the data points in a dataset that lie closest to the decision boundary (or hyperplane) in a Support Vector Machine (SVM) model. They are the most critical elements of the training set because they directly influence the position and orientation of the optimal hyperplane. The margin, which SVM aims to maximize, is measured based on the distance between these support vectors and the hyperplane. Only these points are used in defining the decision function; all other points have no effect. As a result, support vectors help SVM generalize well and make efficient predictions.


6.  What is a Support Vector Classifier (SVC)?
** A Support Vector Classifier (SVC) is a type of Support Vector Machine (SVM) used specifically for classification tasks. It finds the optimal hyperplane that best separates data points from different classes with the maximum possible margin. SVC can handle both linearly and non-linearly separable data using kernel functions, which map data into higher-dimensional spaces where separation is easier. It allows some misclassifications through the use of a soft margin, helping it perform well on noisy or overlapping datasets. SVC is widely used in applications like image recognition, text classification, and bioinformatics due to its accuracy and robustness.


7. What is a Support Vector Regressor (SVR)?
** A Support Vector Regressor (SVR) is a type of Support Vector Machine (SVM) used for regression tasks. Unlike classification, SVR aims to predict continuous values. It tries to find a function that approximates the target values within a certain margin of tolerance, called epsilon. Errors smaller than epsilon are ignored, so only the data points that lie on or outside this epsilon-tube, known as support vectors, influence the model. SVR balances model complexity and prediction error by minimizing the norm of the weight vector while allowing some flexibility through slack variables. It is effective in high-dimensional spaces and, with kernels, performs well even when the relationship between features and target is non-linear.
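
The tolerance idea can be sketched with the epsilon-insensitive loss. The helper name `epsilon_insensitive_loss` and the numbers below are assumptions chosen for illustration; the point is that predictions within epsilon of the target cost nothing.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Return max(0, |y_true - y_pred| - epsilon) for one prediction."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

# Hypothetical targets and predictions
targets = [1.00, 2.00, 3.00]
predictions = [1.05, 2.50, 2.95]

losses = [epsilon_insensitive_loss(t, p, epsilon=0.1)
          for t, p in zip(targets, predictions)]
print(losses)
```

Only the second prediction, which misses its target by more than epsilon, contributes a non-zero loss.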


8.  What is the Kernel Trick in SVM?
** The Kernel Trick in SVM is a mathematical technique that allows the algorithm to operate in a high-dimensional space without explicitly transforming the data. Instead of mapping data points into a higher-dimensional space directly, the kernel function computes the inner products between pairs of data points as if they were transformed. This enables SVM to learn complex, non-linear decision boundaries efficiently. Common kernel functions include linear, polynomial, radial basis function (RBF), and sigmoid. The Kernel Trick makes it possible to solve problems that are not linearly separable in the original feature space while keeping computations manageable.
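
A small numerical check makes this concrete. The sketch below assumes a degree-2 homogeneous polynomial kernel on 2-D inputs; the feature map `phi` and the sample vectors are illustrative choices, not part of any library API.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector: (v1^2, sqrt(2)*v1*v2, v2^2)."""
    return np.array([v[0] ** 2, np.sqrt(2.0) * v[0] * v[1], v[1] ** 2])

def poly2_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: (x . z)^2."""
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = float(np.dot(phi(x), phi(z)))  # inner product in the 3-D feature space
implicit = poly2_kernel(x, z)             # same value, computed in the original 2-D space
print(explicit, implicit)
```

Both routes give the same number, but the kernel never builds the higher-dimensional vectors.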


9.  Compare Linear Kernel, Polynomial Kernel, and RBF Kernel.
** The Linear Kernel is best for linearly separable data and is fast and efficient, especially with high-dimensional features. The Polynomial Kernel captures curved relationships by applying polynomial transformations, making it suitable for moderately complex data, though it can be slower and prone to overfitting. The RBF (Radial Basis Function) Kernel handles highly non-linear patterns by mapping data into infinite-dimensional space using a Gaussian function. It is very flexible but requires careful tuning of parameters like gamma. Each kernel serves different data complexities, making kernel selection crucial for optimal model performance in SVM-based tasks.
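
The three kernels can be written as short functions. This is a minimal sketch: the helper names `k_linear`, `k_poly`, `k_rbf` and the default values for `degree`, `coef0`, and `gamma` are assumptions for illustration, not scikit-learn's own functions or defaults.

```python
import numpy as np

def k_linear(x, z):
    """Linear kernel: plain inner product."""
    return float(np.dot(x, z))

def k_poly(x, z, degree=3, coef0=1.0):
    """Polynomial kernel: (x . z + coef0) ** degree."""
    return (float(np.dot(x, z)) + coef0) ** degree

def k_rbf(x, z, gamma=0.5):
    """RBF (Gaussian) kernel: exp(-gamma * ||x - z||^2)."""
    return float(np.exp(-gamma * np.sum((x - z) ** 2)))

x = np.array([1.0, 0.0])
z = np.array([0.0, 1.0])

print("linear:", k_linear(x, z))
print("poly:  ", k_poly(x, z))
print("rbf:   ", k_rbf(x, z))
```

The RBF value depends only on the distance between the two points, which is why it reaches 1 for identical points and decays toward 0 as they move apart.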


10. What is the effect of the C parameter in SVM?
** The C parameter in SVM controls the trade-off between achieving a low training error and maintaining a large margin. It acts as a regularization parameter:

* A high C value penalizes misclassifications heavily, giving more importance to correctly classifying every training example. This leads to a narrower margin and potential overfitting, as the model may become too sensitive to noise and outliers.

* A low C value allows more misclassifications but focuses on finding a wider margin. This encourages better generalization and can reduce overfitting, though it may increase training error.

In short, C balances model complexity and accuracy.
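
The trade-off can be sketched by scoring two candidate hyperplanes with the soft-margin objective `0.5 * ||w||^2 + C * sum(slack)`. The 1-D data and both candidates are hand-picked assumptions, not solver output; the example only shows which candidate the objective favors as C changes.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """0.5 * ||w||^2 + C * (sum of hinge slacks) for the hyperplane (w, b)."""
    slack = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * float(w @ w) + C * float(slack.sum())

# Hypothetical 1-D data: the positive point at x=1 sits close to the negative class
X = np.array([[5.0], [6.0], [1.0], [-3.0], [-4.0]])
y = np.array([1, 1, 1, -1, -1])

# Two hand-picked candidates (assumptions, not fitted models)
wide = (np.array([0.25]), -0.25)   # margin width 2/|w| = 8, violated by the point at x=1
narrow = (np.array([0.5]), 0.5)    # margin width 2/|w| = 4, no violations

for C in (0.01, 10.0):
    j_wide = soft_margin_objective(*wide, X, y, C)
    j_narrow = soft_margin_objective(*narrow, X, y, C)
    preferred = "wide" if j_wide < j_narrow else "narrow"
    print(f"C={C}: wide={j_wide:.4f}, narrow={j_narrow:.4f} -> prefers {preferred} margin")
```

With a small C the violation is cheap, so the wide margin scores better; with a large C the violation dominates and the narrow, violation-free margin is preferred.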


11.  What is the role of the Gamma parameter in RBF Kernel SVM?
** The Gamma parameter in an RBF (Radial Basis Function) Kernel SVM defines how far the influence of a single training point reaches:

* A high gamma value means each point has a narrow influence, creating very tight decision boundaries. This can lead to overfitting, as the model becomes too sensitive to individual data points.

* A low gamma value means each point has a wide influence, resulting in smoother, broader decision boundaries. This can lead to underfitting if the model is too simple to capture data patterns.

In essence, gamma controls the curvature of the decision boundary in RBF SVM.
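
The reach of a single point can be checked directly from the RBF formula `exp(-gamma * distance^2)`. The helper name `rbf_similarity` and the sampled distances are illustrative assumptions.

```python
import math

def rbf_similarity(distance, gamma):
    """RBF kernel value exp(-gamma * distance^2) for two points `distance` apart."""
    return math.exp(-gamma * distance ** 2)

for gamma in (0.1, 1.0, 10.0):
    values = [round(rbf_similarity(d, gamma), 4) for d in (0.5, 1.0, 2.0)]
    print(f"gamma={gamma}: similarity at distances 0.5, 1.0, 2.0 -> {values}")
```

For a fixed distance, a larger gamma gives a smaller similarity, so each training point affects only its immediate neighborhood.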


12. What is the Naïve Bayes classifier, and why is it called "Naïve"?
** The Naïve Bayes classifier is a supervised machine learning algorithm based on Bayes’ Theorem, used for classification tasks. It calculates the probability of each class given a set of features and assigns the label with the highest probability.

It is called "Naïve" because it makes a strong assumption that all features are independent of each other given the class label, which is rarely true in real-world data. Despite this simplification, Naïve Bayes often performs surprisingly well, especially in high-dimensional problems like text classification, spam detection, and sentiment analysis, due to its simplicity and speed.
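
In symbols, the independence assumption lets the class posterior factor into per-feature terms. The notation is an assumption added here: `y` is the class label and `x_1, ..., x_d` are the feature values.

```latex
P(y \mid x_1,\dots,x_d)\ \propto\ P(y)\prod_{i=1}^{d} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y}\ P(y)\prod_{i=1}^{d} P(x_i \mid y)
```

Each factor `P(x_i | y)` can be estimated separately, which is what keeps training and prediction cheap.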


13. What is Bayes’ Theorem?
** Bayes’ Theorem is a mathematical formula used to calculate the probability of an event based on prior knowledge of conditions related to that event. In machine learning, it helps update the probability of a hypothesis as more evidence or data becomes available. It combines prior probability with the likelihood of the observed data to produce a posterior probability. This theorem is the foundation of the Naïve Bayes classifier. It enables models to make predictions even with limited data and is especially useful in classification problems like spam detection, medical diagnosis, and text classification tasks.
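
A worked example helps here. The probabilities below are made-up numbers for a spam filter, used only to show how the prior and the likelihood combine into a posterior.

```python
# Hypothetical numbers for a spam filter
p_spam = 0.2               # prior P(spam)
p_word_given_spam = 0.6    # likelihood P("free" | spam)
p_word_given_ham = 0.05    # likelihood P("free" | not spam)

# Evidence P("free") via the law of total probability
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' theorem: P(spam | "free") = P("free" | spam) * P(spam) / P("free")
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))
```

Seeing the word raises the probability of spam from the prior of 0.2 to a posterior of 0.75 under these assumed numbers.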


14. Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes.
** Gaussian, Multinomial, and Bernoulli Naïve Bayes are variants suited for different data types. Gaussian Naïve Bayes is used when features are continuous and assumes data follows a normal distribution, making it ideal for tasks like medical diagnosis. Multinomial Naïve Bayes works well with discrete features such as word counts in text classification, where data represents frequency. Bernoulli Naïve Bayes is designed for binary data, assuming features are either present or absent, commonly used in email spam detection. Each variant applies Bayes' Theorem with different assumptions about the feature distribution, helping choose the right model based on data characteristics.
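
The three variants differ only in how they model the likelihood of the features given a class. The sketch below writes each likelihood as a plain function; the function names and sample values are assumptions for illustration and do not mirror scikit-learn internals.

```python
import math

def gaussian_likelihood(x, mean, var):
    """Gaussian NB: normal density of a continuous feature value."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def multinomial_log_likelihood(counts, theta):
    """Multinomial NB: sum of count_j * log(theta_j).
    The multinomial coefficient is omitted because it is the same for every class."""
    return sum(c * math.log(t) for c, t in zip(counts, theta))

def bernoulli_likelihood(x, p):
    """Bernoulli NB: product of p_j if feature j is present, (1 - p_j) if absent."""
    result = 1.0
    for xj, pj in zip(x, p):
        result *= pj if xj == 1 else (1.0 - pj)
    return result

print(gaussian_likelihood(0.0, mean=0.0, var=1.0))
print(multinomial_log_likelihood([2, 0], [0.5, 0.5]))
print(bernoulli_likelihood([1, 0], [0.8, 0.3]))
```

Note that the Bernoulli variant explicitly penalizes absent features through the `(1 - p_j)` term, whereas the multinomial variant ignores features with a count of zero.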


15. When should you use Gaussian Naïve Bayes over other variants?
** Use Gaussian Naïve Bayes when your dataset contains continuous numerical features that follow a normal (Gaussian) distribution. It is ideal for problems where input variables, like age, income, or sensor readings, are real-valued and not discrete. This variant is commonly used in fields such as medical diagnosis, image classification, and real-time predictions due to its simplicity, speed, and efficiency. Gaussian Naïve Bayes assumes each feature is normally distributed within each class. It is preferred over Multinomial or Bernoulli Naïve Bayes when the data is not count-based or binary, making it highly suitable for real-world numerical classification tasks.


16. What are the key assumptions made by Naïve Bayes?
** Naïve Bayes makes the following key assumptions. First, it assumes class-conditional independence: given the class label, the value of one feature does not affect the probability of another, so each feature contributes to the outcome independently. This is rarely true in real-world data. Second, it assumes a specific distribution for the features based on the variant used: Gaussian for continuous data, Multinomial for count-based features, and Bernoulli for binary features. These simplifying assumptions make Naïve Bayes highly efficient and scalable for classification tasks, despite its “naïve” approach to feature relationships.


17.  What are the advantages and disadvantages of Naïve Bayes?
** Naïve Bayes is a simple, fast, and efficient classification algorithm, especially suitable for high-dimensional data like text. It performs well with small datasets, handles irrelevant features, and is robust to noise. Its ease of implementation makes it a popular choice for baseline models. However, it relies on the strong assumption that all features are independent given the class label, which is rarely true in real-world scenarios. It also faces the zero-probability problem when encountering unseen features, though this can be mitigated with smoothing. Additionally, its probability estimates may be inaccurate, and it struggles with complex feature interactions.


18.  Why is Naïve Bayes a good choice for text classification?
** Naïve Bayes is a good choice for text classification due to its simplicity, efficiency, and strong performance in high-dimensional spaces. Text data often has thousands of features (words or tokens), and Naïve Bayes handles this well because it assumes feature independence, which reduces computational complexity. It performs particularly well with word frequency or presence data, making it ideal for spam detection, sentiment analysis, and document categorization. It requires less training data, trains quickly, and works effectively even when data is noisy or sparse. Despite its “naïve” assumptions, it often matches or outperforms more complex models in text tasks.


19. Compare SVM and Naïve Bayes for classification tasks.
** Naïve Bayes and SVM are popular classification algorithms with distinct strengths. Naïve Bayes is fast, simple, and works well with high-dimensional data like text. It assumes feature independence and provides probabilistic outputs, making it ideal for spam detection and sentiment analysis. In contrast, SVM finds the optimal hyperplane to separate classes and, with kernels, handles complex, non-linear data better. It is often more accurate on such data but is computationally heavier and harder to interpret. SVM does not natively output probabilities; they have to be estimated with an extra calibration step such as Platt scaling. Choose Naïve Bayes for speed and simplicity, and SVM when accuracy and handling complex boundaries are priorities.


20. How does Laplace Smoothing help in Naïve Bayes?
** Laplace Smoothing, also known as add-one smoothing, is used in Naïve Bayes to address the zero-frequency problem, which occurs when a word or feature in the test data is not present in the training data for a given class. Without smoothing, this results in a zero probability, which can invalidate the entire probability calculation. Laplace Smoothing fixes this by adding one to each word count and adjusting the denominator accordingly, ensuring no probability is ever zero. This helps the model generalize better, especially in sparse datasets like text classification, where many rare or unseen words can affect predictions.
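
The adjustment is a one-line formula. In the sketch below, the function name `smoothed_probability` and the counts are assumptions for illustration; `alpha=1` corresponds to Laplace smoothing and `alpha=0` to no smoothing.

```python
def smoothed_probability(word_count, total_count, vocab_size, alpha=1.0):
    """P(word | class) with additive smoothing; alpha=1 is Laplace smoothing."""
    return (word_count + alpha) / (total_count + alpha * vocab_size)

# Hypothetical counts: the word never appeared in this class during training
unsmoothed = smoothed_probability(word_count=0, total_count=100, vocab_size=50, alpha=0.0)
smoothed = smoothed_probability(word_count=0, total_count=100, vocab_size=50, alpha=1.0)

print("without smoothing:", unsmoothed)
print("with Laplace smoothing:", smoothed)
```

Without smoothing the unseen word gets probability 0 and wipes out the whole product; with smoothing it gets a small positive probability of 1/150.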


**Practical Questions**

21.  Write a Python program to train an SVM Classifier on the Iris dataset and evaluate accuracy.
** # Import necessary libraries
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create an SVM classifier (using RBF kernel by default)
svm_model = SVC(kernel='rbf', C=1.0, gamma='scale')

# Train the model
svm_model.fit(X_train, y_train)

# Predict on the test data
y_pred = svm_model.predict(X_test)

# Evaluate the model accuracy
accuracy = accuracy_score(y_test, y_pred)

# Print the results
print("SVM Classifier Accuracy on Iris Dataset: {:.2f}%".format(accuracy * 100))


22. Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then
compare their accuracies.
** # Import necessary libraries
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Wine dataset
wine = datasets.load_wine()
X = wine.data
y = wine.target

# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train SVM with Linear kernel
svm_linear = SVC(kernel='linear', C=1.0)
svm_linear.fit(X_train, y_train)
y_pred_linear = svm_linear.predict(X_test)
accuracy_linear = accuracy_score(y_test, y_pred_linear)

# Train SVM with RBF kernel
svm_rbf = SVC(kernel='rbf', C=1.0, gamma='scale')
svm_rbf.fit(X_train, y_train)
y_pred_rbf = svm_rbf.predict(X_test)
accuracy_rbf = accuracy_score(y_test, y_pred_rbf)

# Print accuracies
print("SVM with Linear Kernel Accuracy: {:.2f}%".format(accuracy_linear * 100))
print("SVM with RBF Kernel Accuracy: {:.2f}%".format(accuracy_rbf * 100))


24. Write a Python program to train an SVM Regressor (SVR) on a housing dataset and evaluate it using Mean
Squared Error (MSE).
** # Import necessary libraries
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

# Load the California Housing dataset
housing = fetch_california_housing()
X = housing.data
y = housing.target

# Standardize the data (important for SVR)
scaler_X = StandardScaler()
scaler_y = StandardScaler()

X_scaled = scaler_X.fit_transform(X)
y_scaled = scaler_y.fit_transform(y.reshape(-1, 1)).ravel()

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y_scaled, test_size=0.2, random_state=42)

# Train an SVR model (RBF kernel by default)
svr_model = SVR(kernel='rbf', C=100, gamma=0.1, epsilon=0.1)
svr_model.fit(X_train, y_train)

# Predict on test data
y_pred_scaled = svr_model.predict(X_test)

# Inverse transform to get predictions in original scale
y_test_orig = scaler_y.inverse_transform(y_test.reshape(-1, 1))
y_pred_orig = scaler_y.inverse_transform(y_pred_scaled.reshape(-1, 1))

# Calculate Mean Squared Error
mse = mean_squared_error(y_test_orig, y_pred_orig)

# Print the result
print("SVR Mean Squared Error on California Housing dataset: {:.4f}".format(mse))


25.  Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and
evaluate accuracy.
** # Import required libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train Gaussian Naïve Bayes classifier
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Predict on test data
y_pred = gnb.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)

# Print result
print("Gaussian Naïve Bayes Classifier Accuracy on Breast Cancer Dataset: {:.2f}%".format(accuracy * 100))


26. Write a Python program to train a Multinomial Naïve Bayes classifier for text classification using the 20
Newsgroups dataset.
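** # Import required libraries
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# Load the predefined train/test splits of the 20 Newsgroups dataset
# (downloaded on first use; headers, footers, and quotes are removed
# so the model learns from the message text itself)
remove = ('headers', 'footers', 'quotes')
train = fetch_20newsgroups(subset='train', remove=remove, random_state=42)
test = fetch_20newsgroups(subset='test', remove=remove, random_state=42)

# Convert documents to word-count vectors (vocabulary is fit on training data only)
vectorizer = CountVectorizer(stop_words='english')
X_train = vectorizer.fit_transform(train.data)
X_test = vectorizer.transform(test.data)
y_train, y_test = train.target, test.target

# Initialize and train Multinomial Naïve Bayes classifier
mnb = MultinomialNB()
mnb.fit(X_train, y_train)

# Predict on test data
y_pred = mnb.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)

# Print result
print("Multinomial Naïve Bayes Classifier Accuracy on 20 Newsgroups Dataset: {:.2f}%".format(accuracy * 100))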


27. Write a Python program to train an SVM Classifier with different C values and compare the decision
boundaries visually.
** import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Generate a synthetic 2D classification dataset
X, y = make_classification(n_samples=300, n_features=2, n_informative=2,
                           n_redundant=0, n_clusters_per_class=1, random_state=42)

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# List of C values to compare
C_values = [0.01, 0.1, 1, 10, 100]

# Set up the plot
plt.figure(figsize=(15, 10))

# Create mesh grid for plotting decision boundaries
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 500),
                     np.linspace(y_min, y_max, 500))

# Train and plot for each C value
for i, C in enumerate(C_values):
    clf = SVC(kernel='linear', C=C)
    clf.fit(X_train, y_train)
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    plt.subplot(2, 3, i + 1)
    plt.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.coolwarm)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm, edgecolors='k')
    plt.title(f"SVM Decision Boundary (C={C})")
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')

plt.tight_layout()
plt.show()


28. Write a Python program to train a Bernoulli Naïve Bayes classifier for binary classification on a dataset with
binary features.
** # Import necessary libraries
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import Binarizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import accuracy_score, classification_report

# Generate synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=10, n_redundant=0,
                           n_classes=2, random_state=42)

# Binarize features (threshold = 0.0)
binarizer = Binarizer(threshold=0.0)
X_binary = binarizer.fit_transform(X)

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_binary, y, test_size=0.2, random_state=42)

# Initialize and train Bernoulli Naïve Bayes classifier
bnb = BernoulliNB()
bnb.fit(X_train, y_train)

# Predict and evaluate
y_pred = bnb.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

# Print results
print("Bernoulli Naïve Bayes Classifier Accuracy: {:.2f}%".format(accuracy * 100))
print("\nClassification Report:\n", classification_report(y_test, y_pred))


29. Write a Python program to apply feature scaling before training an SVM model and compare results with
unscaled data.
** from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split dataset into train/test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# --------- SVM without scaling ---------
svm_unscaled = SVC(kernel='rbf')
svm_unscaled.fit(X_train, y_train)
y_pred_unscaled = svm_unscaled.predict(X_test)
accuracy_unscaled = accuracy_score(y_test, y_pred_unscaled)

# --------- Apply feature scaling ---------
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# --------- SVM with scaling ---------
svm_scaled = SVC(kernel='rbf')
svm_scaled.fit(X_train_scaled, y_train)
y_pred_scaled = svm_scaled.predict(X_test_scaled)
accuracy_scaled = accuracy_score(y_test, y_pred_scaled)

# --------- Print Results ---------
print("SVM Accuracy without Scaling: {:.2f}%".format(accuracy_unscaled * 100))
print("SVM Accuracy with Scaling: {:.2f}%".format(accuracy_scaled * 100))


30.  Write a Python program to train a Gaussian Naïve Bayes model and compare the predictions before and
after Laplace Smoothing.
** # Note: Laplace (additive) smoothing is the `alpha` parameter of MultinomialNB and BernoulliNB.
# GaussianNB has no Laplace smoothing (its closest analogue is `var_smoothing`),
# so this example uses MultinomialNB to show the effect of smoothing.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# Load the digits dataset (suitable for Multinomial NB)
digits = load_digits()
X, y = digits.data, digits.target

# Rescale pixel intensities (0-16) to integers in 0-10 so they behave like counts
X = (X / X.max()) * 10
X = X.astype(int)

# Train/Test Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Effectively without smoothing: a tiny alpha is used because alpha=0.0 can lead to
# log(0) warnings or be clipped, depending on the scikit-learn version
model_no_smooth = MultinomialNB(alpha=1e-10)
model_no_smooth.fit(X_train, y_train)
y_pred_no_smooth = model_no_smooth.predict(X_test)
acc_no_smooth = accuracy_score(y_test, y_pred_no_smooth)

# With Laplace Smoothing (alpha=1.0)
model_laplace = MultinomialNB(alpha=1.0)
model_laplace.fit(X_train, y_train)
y_pred_laplace = model_laplace.predict(X_test)
acc_laplace = accuracy_score(y_test, y_pred_laplace)

# Results
print("Accuracy WITHOUT Laplace Smoothing (alpha=1e-10): {:.2f}%".format(acc_no_smooth * 100))
print("Accuracy WITH Laplace Smoothing (alpha=1.0): {:.2f}%".format(acc_laplace * 100))
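
The question names Gaussian Naïve Bayes, which has no Laplace smoothing parameter in scikit-learn. Its closest analogue is `var_smoothing`, which adds a fraction of the largest feature variance to every variance for numerical stability. The following is a minimal sketch of that comparison; the two `var_smoothing` values are arbitrary choices for illustration, and no particular accuracy is implied.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the digits dataset and split it
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Compare the default var_smoothing (1e-9) with a larger, arbitrarily chosen value
results = {}
for vs in (1e-9, 1e-2):
    model = GaussianNB(var_smoothing=vs)
    model.fit(X_train, y_train)
    results[vs] = accuracy_score(y_test, model.predict(X_test))
    print("GaussianNB accuracy (var_smoothing={}): {:.2f}%".format(vs, results[vs] * 100))
```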


31. Write a Python program to train an SVM Classifier and use GridSearchCV to tune the hyperparameters (C,
gamma, kernel).
** from sklearn import datasets
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Train/Test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define parameter grid
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [1, 0.1, 0.01, 0.001],
    'kernel': ['linear', 'rbf', 'poly']
}

# Create and run GridSearchCV
grid = GridSearchCV(SVC(), param_grid, refit=True, verbose=1, cv=5)
grid.fit(X_train, y_train)

# Best parameters and estimator
print("Best Parameters:", grid.best_params_)

# Predict and evaluate
y_pred = grid.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Test Accuracy with Best Parameters: {:.2f}%".format(accuracy * 100))


32. Write a Python program to train an SVM Classifier on an imbalanced dataset, apply class weighting, and
check whether it improves accuracy.
** from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report, accuracy_score
import numpy as np

# Step 1: Create an imbalanced dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=2,
                           n_redundant=10, n_clusters_per_class=1,
                           weights=[0.9, 0.1], flip_y=0, random_state=42)

# Step 2: Split into training and testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 3: Train SVM without class weights
svm_default = SVC(kernel='rbf')
svm_default.fit(X_train, y_train)
y_pred_default = svm_default.predict(X_test)
print("Without Class Weighting:")
print("Accuracy:", accuracy_score(y_test, y_pred_default))
print("Classification Report:\n", classification_report(y_test, y_pred_default))

# Step 4: Train SVM with class_weight='balanced'
svm_weighted = SVC(kernel='rbf', class_weight='balanced')
svm_weighted.fit(X_train, y_train)
y_pred_weighted = svm_weighted.predict(X_test)
print("\nWith Class Weighting:")
print("Accuracy:", accuracy_score(y_test, y_pred_weighted))
print("Classification Report:\n", classification_report(y_test, y_pred_weighted))


33.  Write a Python program to implement a Naïve Bayes classifier for spam detection using email data.
** from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report, accuracy_score

# Sample email dataset (text, label)
emails = [
    ("Win a free iPhone now", "spam"),
    ("Limited time offer, click here!", "spam"),
    ("Meeting at 10am tomorrow", "ham"),
    ("Lunch plans?", "ham"),
    ("Get cash instantly, apply now", "spam"),
    ("Your invoice is attached", "ham"),
    ("Congratulations! You've been selected", "spam"),
    ("Let's catch up soon", "ham"),
    ("Earn money from home", "spam"),
    ("Don't forget the team meeting", "ham")
]

# Step 1: Separate data and labels
texts, labels = zip(*emails)

# Step 2: Convert text to feature vectors
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Step 3: Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.3, random_state=42)

# Step 4: Train Naïve Bayes classifier
model = MultinomialNB()
model.fit(X_train, y_train)

# Step 5: Predict and evaluate
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))


34. Write a Python program to train an SVM Classifier and a Naïve Bayes Classifier on the same dataset and
compare their accuracy.
** from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load dataset
data = load_iris()
X = data.data
y = data.target

# Split dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Feature Scaling for SVM (not required for Naive Bayes)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train SVM Classifier
svm_model = SVC(kernel='rbf', C=1.0)
svm_model.fit(X_train_scaled, y_train)
svm_preds = svm_model.predict(X_test_scaled)

# Train Gaussian Naïve Bayes Classifier
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)
nb_preds = nb_model.predict(X_test)

# Evaluate Accuracy
svm_accuracy = accuracy_score(y_test, svm_preds)
nb_accuracy = accuracy_score(y_test, nb_preds)

# Print results
print("SVM Classifier Accuracy: {:.2f}%".format(svm_accuracy * 100))
print("Naïve Bayes Classifier Accuracy: {:.2f}%".format(nb_accuracy * 100))


35. Write a Python program to perform feature selection before training a Naïve Bayes classifier and compare
results.
** from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import accuracy_score

# Load dataset
data = load_iris()
X = data.data
y = data.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# ---- Model without Feature Selection ----
model_full = GaussianNB()
model_full.fit(X_train, y_train)
y_pred_full = model_full.predict(X_test)
acc_full = accuracy_score(y_test, y_pred_full)

# ---- Feature Selection using chi-squared test ----
selector = SelectKBest(score_func=chi2, k=2)  # Keep top 2 features
X_train_sel = selector.fit_transform(X_train, y_train)
X_test_sel = selector.transform(X_test)

# ---- Model with Selected Features ----
model_sel = GaussianNB()
model_sel.fit(X_train_sel, y_train)
y_pred_sel = model_sel.predict(X_test_sel)
acc_sel = accuracy_score(y_test, y_pred_sel)

# ---- Output Results ----
print("Accuracy without Feature Selection: {:.2f}%".format(acc_full * 100))
print("Accuracy with Feature Selection (Top 2 features): {:.2f}%".format(acc_sel * 100))


In [None]:
36. Write a Python program to train an SVM Classifier using One-vs-Rest (OvR) and One-vs-One (OvO)
strategies on the Wine dataset and compare their accuracy.
** from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.metrics import accuracy_score

# Load the Wine dataset
data = load_wine()
X = data.data
y = data.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# SVM base classifier
svm = SVC(kernel='rbf', C=1.0, gamma='scale')

# One-vs-Rest (OvR)
ovr_clf = OneVsRestClassifier(svm)
ovr_clf.fit(X_train_scaled, y_train)
y_pred_ovr = ovr_clf.predict(X_test_scaled)
acc_ovr = accuracy_score(y_test, y_pred_ovr)

# One-vs-One (OvO)
ovo_clf = OneVsOneClassifier(svm)
ovo_clf.fit(X_train_scaled, y_train)
y_pred_ovo = ovo_clf.predict(X_test_scaled)
acc_ovo = accuracy_score(y_test, y_pred_ovo)

# Print accuracies
print("Accuracy using One-vs-Rest (OvR): {:.2f}%".format(acc_ovr * 100))
print("Accuracy using One-vs-One (OvO): {:.2f}%".format(acc_ovo * 100))
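
The two strategies differ in how many binary SVMs they train. The sketch below is a plain-Python illustration that only counts the binary sub-problems; the helper names `ovr_problems` and `ovo_problems` are hypothetical and are not part of scikit-learn. For three classes, as in Wine, both strategies happen to need three classifiers, so the 10-class case is included as an assumed contrast.

```python
from itertools import combinations

def ovr_problems(classes):
    """One binary problem per class: that class versus all the others."""
    return [(c, "rest") for c in classes]

def ovo_problems(classes):
    """One binary problem per unordered pair of classes."""
    return list(combinations(classes, 2))

wine_classes = [0, 1, 2]           # the Wine dataset has three classes
ten_classes = list(range(10))      # hypothetical 10-class problem for contrast

for name, classes in (("Wine (3 classes)", wine_classes), ("10 classes", ten_classes)):
    print(name, "-> OvR:", len(ovr_problems(classes)), "OvO:", len(ovo_problems(classes)))
```

With n classes, OvR trains n classifiers on the full training set, while OvO trains n(n-1)/2 classifiers, each on only the samples of its two classes.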


In [None]:
37. Write a Python program to train an SVM Classifier using Linear, Polynomial, and RBF kernels on the Breast
Cancer dataset and compare their accuracy.
** from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train SVM with Linear kernel
svm_linear = SVC(kernel='linear')
svm_linear.fit(X_train_scaled, y_train)
y_pred_linear = svm_linear.predict(X_test_scaled)
acc_linear = accuracy_score(y_test, y_pred_linear)

# Train SVM with Polynomial kernel
svm_poly = SVC(kernel='poly', degree=3)
svm_poly.fit(X_train_scaled, y_train)
y_pred_poly = svm_poly.predict(X_test_scaled)
acc_poly = accuracy_score(y_test, y_pred_poly)

# Train SVM with RBF kernel
svm_rbf = SVC(kernel='rbf')
svm_rbf.fit(X_train_scaled, y_train)
y_pred_rbf = svm_rbf.predict(X_test_scaled)
acc_rbf = accuracy_score(y_test, y_pred_rbf)

# Print results
print("Accuracy with Linear Kernel: {:.2f}%".format(acc_linear * 100))
print("Accuracy with Polynomial Kernel: {:.2f}%".format(acc_poly * 100))
print("Accuracy with RBF Kernel: {:.2f}%".format(acc_rbf * 100))
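
To see what the three kernels compute, here is a minimal plain-Python sketch of the kernel functions for a single pair of points. It follows the forms documented for `SVC` (linear: dot product; polynomial: `(gamma * <a, b> + coef0) ** degree`; RBF: `exp(-gamma * ||a - b||^2)`). The values `gamma=1.0` and `coef0=0.0` are assumptions chosen for readability; the program above uses scikit-learn's default `gamma='scale'`, which depends on the data.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def linear_kernel(a, b):
    return dot(a, b)

def poly_kernel(a, b, degree=3, gamma=1.0, coef0=0.0):
    # gamma and coef0 are illustrative values, not the data-dependent defaults
    return (gamma * dot(a, b) + coef0) ** degree

def rbf_kernel(a, b, gamma=1.0):
    # Similarity decays exponentially with squared Euclidean distance
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)

a, b = [1.0, 2.0], [2.0, 0.0]
print("linear:", linear_kernel(a, b))
print("poly  :", poly_kernel(a, b))
print("rbf   :", rbf_kernel(a, b))
```

The RBF kernel equals 1 for identical points and approaches 0 as points move apart, which is why it can model local, non-linear decision boundaries.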


In [None]:
38. Write a Python program to train an SVM Classifier using Stratified K-Fold Cross-Validation and compute the
average accuracy.
** from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
import numpy as np

# Load dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Set up Stratified K-Fold
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
accuracies = []

# SVM model
svm = SVC(kernel='rbf', C=1.0, gamma='scale')

# Cross-validation loop
for train_index, test_index in skf.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Fit the scaler on the training fold only, so that no
    # test-fold statistics leak into training
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    svm.fit(X_train, y_train)
    y_pred = svm.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    accuracies.append(acc)

# Print results
print("Fold Accuracies:", ["{:.2f}%".format(a * 100) for a in accuracies])
print("Average Accuracy: {:.2f}%".format(np.mean(accuracies) * 100))


In [None]:
39. Write a Python program to train a Naïve Bayes classifier using different prior probabilities and compare
performance.
** from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report
import numpy as np

# Load dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

# Define different priors
default_prior = None  # automatic priors
custom_prior_1 = [0.3, 0.7]  # Prior belief: class 0 (malignant) 30%, class 1 (benign) 70%
custom_prior_2 = [0.5, 0.5]  # Equal priors

# Function to train and evaluate
def train_and_evaluate(priors, label):
    model = GaussianNB(priors=priors)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    print(f"\n--- {label} ---")
    print(f"Accuracy: {acc * 100:.2f}%")
    print(classification_report(y_test, y_pred, target_names=data.target_names))

# Train and evaluate with different priors
train_and_evaluate(default_prior, "Default Prior (learned from data)")
train_and_evaluate(custom_prior_1, "Custom Prior [0.3, 0.7]")
train_and_evaluate(custom_prior_2, "Custom Prior [0.5, 0.5]")
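
Why the priors matter can be seen from Bayes' rule directly. The following plain-Python sketch uses made-up class-conditional likelihoods for a single sample (the numbers are assumptions, not values from the Breast Cancer dataset) and shows the posterior under two of the priors used above. The helper `posterior` is hypothetical.

```python
def posterior(priors, likelihoods):
    """Bayes' rule: posterior is proportional to prior times likelihood."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical likelihoods p(x | class) for one borderline sample
likelihoods = [0.2, 0.1]

for priors in ([0.5, 0.5], [0.3, 0.7]):
    post = posterior(priors, likelihoods)
    predicted = post.index(max(post))
    print("priors", priors, "-> posterior", [round(p, 3) for p in post],
          "-> predicted class", predicted)
```

In this assumed example the predicted class flips from 0 to 1 when the prior moves from [0.5, 0.5] to [0.3, 0.7]. Only samples whose likelihoods are close get reclassified, so changing the priors often shifts precision and recall more than overall accuracy.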


In [None]:
40. Write a Python program to perform Recursive Feature Elimination (RFE) before training an SVM Classifier and
compare accuracy.
** from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score

# Load dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

# Baseline SVM without feature selection
svm_baseline = SVC(kernel='linear', random_state=42)
svm_baseline.fit(X_train, y_train)
y_pred_baseline = svm_baseline.predict(X_test)
accuracy_baseline = accuracy_score(y_test, y_pred_baseline)

# Recursive Feature Elimination
rfe = RFE(estimator=SVC(kernel='linear'), n_features_to_select=10)
rfe.fit(X_train, y_train)

# Apply RFE to train and test sets
X_train_rfe = rfe.transform(X_train)
X_test_rfe = rfe.transform(X_test)

# Train SVM with selected features
svm_rfe = SVC(kernel='linear', random_state=42)
svm_rfe.fit(X_train_rfe, y_train)
y_pred_rfe = svm_rfe.predict(X_test_rfe)
accuracy_rfe = accuracy_score(y_test, y_pred_rfe)

# Print comparison results
print(f"Accuracy without RFE: {accuracy_baseline * 100:.2f}%")
print(f"Accuracy with RFE (10 features): {accuracy_rfe * 100:.2f}%")


In [None]:
41. Write a Python program to train an SVM Classifier and evaluate its performance using Precision, Recall, and
F1-Score instead of accuracy.
** from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Load dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Train SVM classifier
svm = SVC(kernel='linear', random_state=42)
svm.fit(X_train, y_train)

# Predict on test data
y_pred = svm.predict(X_test)

# Evaluate performance
report = classification_report(y_test, y_pred, target_names=data.target_names)
print("Classification Report:\n")
print(report)
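
The numbers in the classification report come from three counts per class: true positives, false positives, and false negatives. The sketch below recomputes them by hand for a small, made-up set of labels; `precision_recall_f1` is a hypothetical helper written for illustration, not a scikit-learn function.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Made-up labels: 4 positives, 4 negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"Precision: {precision:.3f}  Recall: {recall:.3f}  F1: {f1:.3f}")
```

Here the classifier finds only half of the positives (recall 0.5) while two out of its three positive predictions are correct (precision about 0.667), a difference that a single accuracy figure would hide.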


In [None]:
42. Write a Python program to train a Naïve Bayes Classifier and evaluate its performance using Log Loss
(Cross-Entropy Loss).
** from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import log_loss

# Load the dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Train Gaussian Naïve Bayes classifier
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)

# Predict probability scores
y_proba = nb_model.predict_proba(X_test)

# Calculate Log Loss
logloss = log_loss(y_test, y_proba)
print(f"Log Loss (Cross-Entropy Loss): {logloss:.4f}")
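
Log loss is the mean negative log of the probability the model assigned to the true class. The plain-Python sketch below computes it by hand for two made-up sets of predicted probabilities; `manual_log_loss` is a hypothetical helper, and the clipping constant `eps` is an assumption added only to avoid `log(0)`.

```python
import math

def manual_log_loss(y_true, proba, eps=1e-15):
    """Mean negative log-probability of the true class."""
    total = 0.0
    for label, row in zip(y_true, proba):
        p = min(max(row[label], eps), 1 - eps)  # clip to avoid log(0)
        total += -math.log(p)
    return total / len(y_true)

y_true = [0, 1, 1]
# Made-up probabilities: each row is [p(class 0), p(class 1)]
mostly_right = [[0.9, 0.1], [0.1, 0.9], [0.2, 0.8]]
confidently_wrong = [[0.9, 0.1], [0.1, 0.9], [0.99, 0.01]]

print("Mostly right     :", round(manual_log_loss(y_true, mostly_right), 4))
print("Confidently wrong:", round(manual_log_loss(y_true, confidently_wrong), 4))
```

A single confident mistake dominates the average. This is relevant for Naïve Bayes, whose independence assumption tends to push predicted probabilities toward 0 or 1.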


In [None]:
43. Write a Python program to train an SVM Classifier and visualize the Confusion Matrix using seaborn.
** import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target
class_names = iris.target_names

# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Train an SVM classifier
svm_model = SVC(kernel='linear', random_state=42)
svm_model.fit(X_train, y_train)

# Make predictions
y_pred = svm_model.predict(X_test)

# Generate the confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Visualize the confusion matrix
plt.figure(figsize=(6, 5))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=class_names, yticklabels=class_names)
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.title('Confusion Matrix - SVM on Iris Dataset')
plt.tight_layout()
plt.show()

# Print accuracy
print("Accuracy:", accuracy_score(y_test, y_pred))
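
The matrix passed to the heatmap is a table of counts in which rows are true labels and columns are predicted labels. A minimal plain-Python sketch of that bookkeeping, on made-up labels and with a hypothetical helper `confusion_matrix_manual`, might look like this:

```python
def confusion_matrix_manual(y_true, y_pred, n_classes):
    """Rows index the true class, columns index the predicted class."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

# Made-up labels for a 3-class problem
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

cm_manual = confusion_matrix_manual(y_true, y_pred, n_classes=3)
for row in cm_manual:
    print(row)

# Accuracy is the diagonal (correct predictions) divided by the total count
correct = sum(cm_manual[i][i] for i in range(3))
print("Accuracy:", correct / len(y_true))
```

Off-diagonal cells show which pairs of classes are confused with each other, which the heatmap makes visible at a glance.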


In [None]:
44. Write a Python program to train an SVM Regressor (SVR) and evaluate its performance using Mean Absolute
Error (MAE) instead of MSE.
** import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error
from sklearn.preprocessing import StandardScaler

# Load California Housing dataset (alternative to the Boston dataset, which was removed from scikit-learn)
data = fetch_california_housing()
X = data.data
y = data.target

# Split the dataset into train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train SVR model
svr = SVR(kernel='rbf', C=100, gamma=0.1, epsilon=0.1)
svr.fit(X_train_scaled, y_train)

# Predict on test set
y_pred = svr.predict(X_test_scaled)

# Evaluate with Mean Absolute Error (MAE)
mae = mean_absolute_error(y_test, y_pred)
print("Mean Absolute Error (MAE):", round(mae, 3))
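
The reason to prefer MAE over MSE in some settings is its lower sensitivity to large individual errors. The plain-Python sketch below compares the two on made-up predictions; the helper names are hypothetical and the numbers are not taken from the California Housing data.

```python
def mae_manual(y_true, y_pred):
    """Mean of absolute errors; each error counts in proportion to its size."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse_manual(y_true, y_pred):
    """Mean of squared errors; large errors are weighted much more heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [2.0, 3.0, 4.0, 5.0]
y_pred_even = [2.5, 2.5, 4.5, 4.5]      # every prediction off by 0.5
y_pred_outlier = [2.0, 3.0, 4.0, 9.0]   # one prediction off by 4.0

for name, preds in (("even errors", y_pred_even), ("one outlier", y_pred_outlier)):
    print(f"{name}: MAE = {mae_manual(y_true, preds)}, MSE = {mse_manual(y_true, preds)}")
```

Going from evenly spread errors to a single large one doubles the MAE (0.5 to 1.0) but multiplies the MSE by sixteen (0.25 to 4.0). MAE is also expressed in the same units as the target, which makes it easier to interpret.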


In [None]:
45. Write a Python program to train a Naïve Bayes classifier and evaluate its performance using the ROC-AUC
score.
** import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_auc_score, roc_curve
import matplotlib.pyplot as plt

# Load the dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split the dataset into train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train Gaussian Naïve Bayes classifier
nb = GaussianNB()
nb.fit(X_train, y_train)

# Predict probabilities
y_prob = nb.predict_proba(X_test)[:, 1]

# Compute ROC-AUC score
roc_auc = roc_auc_score(y_test, y_prob)
print("ROC-AUC Score:", round(roc_auc, 3))

# Plot ROC Curve
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
plt.figure(figsize=(8, 5))
plt.plot(fpr, tpr, label=f"ROC curve (AUC = {roc_auc:.3f})", color='blue')
plt.plot([0, 1], [0, 1], 'k--')  # Diagonal line
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend(loc="lower right")
plt.show()


In [None]:
46. Write a Python program to train an SVM Classifier and visualize the Precision-Recall Curve.
** import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import precision_recall_curve, average_precision_score
import matplotlib.pyplot as plt

# Load Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split into train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train SVM classifier with probability estimates
svm = SVC(kernel='rbf', probability=True, random_state=42)
svm.fit(X_train_scaled, y_train)

# Predict probabilities
y_scores = svm.predict_proba(X_test_scaled)[:, 1]

# Calculate precision, recall, and average precision score
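precision, recall, thresholds = precision_recall_curve(y_test, y_scores)
avg_precision = average_precision_score(y_test, y_scores)

# Plot the Precision-Recall curve
plt.figure(figsize=(8, 5))
plt.plot(recall, precision, label=f"Precision-Recall curve (AP = {avg_precision:.3f})")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.legend(loc="lower left")
plt.show()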
