1) What is a Support Vector Machine (SVM)?

 A supervised ML algorithm that finds the optimal hyperplane that maximizes the margin between different classes in the feature space.

2) What is the difference between Hard Margin and Soft Margin SVM?

 Hard Margin: No misclassifications allowed; works only if data is perfectly linearly separable.

 Soft Margin: Allows some misclassification by introducing a penalty; better for noisy or overlapping data.

3) What is the mathematical intuition behind SVM?

 Find a hyperplane $w \cdot x + b = 0$ that maximizes the margin $\frac{2}{\|w\|}$ while minimizing classification errors.
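
As a rough numeric illustration of the margin formula, the sketch below evaluates a hyperplane and its margin width $2/\|w\|$ in plain Python. The weight vector `w`, the bias `b`, and the helper names are made up for illustration; they do not come from a trained model.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the predicted side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def margin_width(w):
    """Distance between the margin hyperplanes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical hyperplane parameters, chosen so that ||w|| = 5
w = [3.0, 4.0]
b = -5.0

print(decision_value(w, b, [3.0, 4.0]))  # positive side of the hyperplane
print(margin_width(w))                   # 2 / ||w||
```

A smaller $\|w\|$ gives a wider margin, which is why the SVM objective minimizes $\|w\|$.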

 4) What is the role of Lagrange Multipliers in SVM?

  Used to fold the margin constraints into the objective (the Lagrangian), which leads to the dual formulation. The dual depends on the data only through dot products, which is what enables the use of kernels.
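
For reference, the standard soft-margin dual problem can be written as follows, with $\alpha_i$ the Lagrange multipliers, $y_i \in \{-1, +1\}$ the labels, and $C$ the penalty parameter (this notation is assumed here; it is not taken from the answer above):

```latex
\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j)
\quad \text{subject to} \quad
0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0
```

The training points enter only through the dot products $x_i \cdot x_j$, so swapping them for a kernel $K(x_i, x_j)$ yields a non-linear SVM.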

 5) What are Support Vectors in SVM?
  
  The data points that lie closest to the decision boundary; they define the position and orientation of the hyperplane.
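
To make this concrete, here is a small sketch that picks out the support vectors of a toy data set. It assumes the maximum-margin solution `w`, `b` is already known and scaled to canonical form, so that points on the margin satisfy $y (w \cdot x + b) = 1$; the data and the helper name `functional_margin` are invented for illustration.

```python
def functional_margin(w, b, x, y):
    """y * (w.x + b); equals 1 for points lying exactly on the margin."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)

# Toy, linearly separable data: (point, label)
data = [
    ((1.0, 0.0), +1),
    ((3.0, 1.0), +1),
    ((-1.0, 0.0), -1),
    ((-4.0, 2.0), -1),
]

# Assumed maximum-margin solution for this toy set, in canonical form
w, b = (1.0, 0.0), 0.0

# Support vectors are the points whose functional margin is exactly 1
support_vectors = [
    x for x, y in data
    if abs(functional_margin(w, b, x, y) - 1.0) < 1e-9
]
print(support_vectors)
```

Removing either of the other two points would leave the hyperplane unchanged, while moving a support vector would shift it.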

 6) What is a Support Vector Classifier (SVC)?

  An implementation of SVM for classification tasks, possibly using kernels for non-linear decision boundaries.

 7) What is a Support Vector Regressor (SVR)?
  
  An SVM variant for regression; predicts continuous values while ignoring small errors within an epsilon margin.
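
The "ignoring small errors" part corresponds to the epsilon-insensitive loss. A minimal sketch, with a hypothetical helper name and example values:

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the epsilon tube, linear in the excess error outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

print(epsilon_insensitive_loss(3.0, 3.05))  # inside the tube, no penalty
print(epsilon_insensitive_loss(3.0, 3.5))   # outside the tube, penalized
```

Only points outside the tube contribute to the loss, so only they can become support vectors of the regressor.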

  8) What is the Kernel Trick in SVM?

   Computes inner products in a higher-dimensional feature space through a kernel function, without explicitly computing the transformation, enabling non-linear classification.
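
A small check of this idea, using the homogeneous degree-2 polynomial kernel as an example: the kernel value computed in the original 2-D space matches the dot product after an explicit map into 3-D. The function names are illustrative only.

```python
import math

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: (x.z)^2."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def explicit_map(x):
    """Explicit 3-D feature map whose dot product matches poly2_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

x, z = (1.0, 2.0), (3.0, 4.0)
via_kernel = poly2_kernel(x, z)
via_map = sum(a * b for a, b in zip(explicit_map(x), explicit_map(z)))
print(via_kernel, via_map)  # the two values agree up to floating-point error
```

For kernels such as RBF the implicit feature space is infinite-dimensional, so the explicit route is not available and only the kernel evaluation is practical.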

 9) Compare Linear Kernel, Polynomial Kernel, and RBF Kernel

  Linear vs Polynomial vs RBF Kernel

Linear: Works well for linearly separable data; fast.

Polynomial: Models curved relationships; degree parameter controls flexibility.

RBF: Handles complex, non-linear boundaries; uses distance-based similarity.
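
The three kernels can be sketched as plain functions. The parameter names `gamma`, `coef0`, and `degree` follow the parameterization used by scikit-learn's `SVC`; the function names themselves are illustrative.

```python
import math

def dot(x, z):
    return sum(a * b for a, b in zip(x, z))

def linear_kernel(x, z):
    """Plain dot product."""
    return dot(x, z)

def polynomial_kernel(x, z, degree=3, gamma=1.0, coef0=0.0):
    """(gamma * x.z + coef0) ** degree."""
    return (gamma * dot(x, z) + coef0) ** degree

def rbf_kernel(x, z, gamma=1.0):
    """exp(-gamma * ||x - z||^2); equals 1 when x == z and decays with distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

x, z = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(x, z), polynomial_kernel(x, z, degree=2), rbf_kernel(x, z))
```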

10) What is the effect of the C parameter in SVM?

 High C: Less tolerance for errors; narrower margin and a higher risk of overfitting.

Low C: More tolerance for errors; wider margin that often generalizes better, but may underfit if C is too low.
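
The role of C is easiest to see in the soft-margin objective, $\frac{1}{2}\|w\|^2 + C \sum_i \max(0,\, 1 - y_i (w \cdot x_i + b))$. The sketch below evaluates it for a fixed, hypothetical `w`, `b`, and toy data set in which one point violates the margin.

```python
def soft_margin_objective(w, b, data, C):
    """0.5 * ||w||^2 + C * sum of hinge losses max(0, 1 - y * (w.x + b))."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    hinge_total = sum(
        max(0.0, 1.0 - y * (sum(wi * xi for wi, xi in zip(w, x)) + b))
        for x, y in data
    )
    return regularizer + C * hinge_total

# Toy data: the point (0.5, 0.0) lies inside the margin for the w, b below
data = [((2.0, 0.0), +1), ((0.5, 0.0), +1), ((-2.0, 0.0), -1)]
w, b = (1.0, 0.0), 0.0

for C in (0.1, 1.0, 100.0):
    print(C, soft_margin_objective(w, b, data, C))
```

With a large C the single violation dominates the objective, so the optimizer would rather shrink the margin than tolerate it; with a small C the violation is cheap and a wide margin is kept.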

11) What is the role of the Gamma parameter in RBF Kernel SVM?

 Controls influence of individual points:

High gamma: Each point has narrow influence; the boundary bends around individual points (risk of overfitting).

Low gamma: Each point has broad influence; the boundary becomes smoother (risk of underfitting).
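
A quick way to see this is to evaluate the RBF kernel, $\exp(-\gamma d^2)$, for two points a fixed distance apart while varying gamma. The helper name and the gamma values are chosen for illustration.

```python
import math

def rbf_similarity(distance, gamma):
    """RBF kernel value exp(-gamma * d^2) for two points a distance d apart."""
    return math.exp(-gamma * distance ** 2)

# Same pair of points (distance 1.0), increasingly narrow kernels
for gamma in (0.1, 1.0, 10.0):
    print(gamma, rbf_similarity(1.0, gamma))
```

As gamma grows, the similarity between the same two points drops toward zero, so each training point only affects predictions in its immediate neighbourhood.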

12) What is the Naïve Bayes classifier, and why is it called "Naïve"?

 A probabilistic classifier based on Bayes’ Theorem with the assumption of feature independence (“naïve” assumption).

13) What is Bayes’ Theorem?

 Bayes’ Theorem:

$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$$
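
A worked example may help. The sketch below applies the theorem to a diagnostic-test scenario, expanding $P(B)$ with the law of total probability. All numbers (1% prevalence, 95% sensitivity, 5% false-positive rate) are hypothetical.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(A|B) via Bayes' theorem, with P(B) expanded by total probability."""
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false positives
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(p, 4))
```

Even with an accurate test, the posterior stays well below the sensitivity because the prior is small.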
​

14) Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes:

 Gaussian vs Multinomial vs Bernoulli Naïve Bayes

Gaussian: Assumes continuous features with normal distribution.

Multinomial: Works with count-based features (e.g., word counts).

Bernoulli: Works with binary features (presence/absence).
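
The three variants differ only in how they model $P(x \mid \text{class})$. A minimal sketch of each likelihood, with illustrative function names and parameters that would normally be estimated from training data:

```python
import math

def gaussian_likelihood(x, mean, var):
    """Gaussian NB: density of a continuous feature under N(mean, var)."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def multinomial_likelihood(counts, probs):
    """Multinomial NB: product of p_i ** count_i (class-independent factor omitted)."""
    result = 1.0
    for count, p in zip(counts, probs):
        result *= p ** count
    return result

def bernoulli_likelihood(bits, probs):
    """Bernoulli NB: p_i if feature i is present, (1 - p_i) if it is absent."""
    result = 1.0
    for bit, p in zip(bits, probs):
        result *= p if bit else (1.0 - p)
    return result

print(gaussian_likelihood(0.0, 0.0, 1.0))
print(multinomial_likelihood([2, 1], [0.5, 0.5]))
print(bernoulli_likelihood([1, 0], [0.3, 0.4]))
```

Unlike the multinomial model, the Bernoulli model explicitly penalizes the absence of a feature.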

15) When should you use Gaussian Naïve Bayes over other variants?

 When features are continuous and approximately normally distributed.

16) What are the key assumptions made by Naïve Bayes?

 Features are conditionally independent given the class.

Features contribute equally to prediction.
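
Under the conditional-independence assumption, the posterior factorizes as $P(c \mid x) \propto P(c) \prod_i P(x_i \mid c)$. The sketch below computes it in log space for a made-up two-class example; the priors and per-word likelihoods are hypothetical.

```python
import math

def naive_bayes_posterior(priors, likelihoods, features):
    """Normalized P(class | features) under the conditional-independence assumption."""
    log_scores = {}
    for cls, prior in priors.items():
        # Independence turns the joint likelihood into a sum of per-feature log terms
        log_scores[cls] = math.log(prior) + sum(
            math.log(likelihoods[cls][f]) for f in features
        )
    total = sum(math.exp(s) for s in log_scores.values())
    return {cls: math.exp(s) / total for cls, s in log_scores.items()}

# Hypothetical priors and per-word likelihoods
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.30, "meeting": 0.05},
    "ham": {"free": 0.02, "meeting": 0.20},
}
print(naive_bayes_posterior(priors, likelihoods, ["free"]))
```

Working with logarithms avoids numeric underflow when many small probabilities are multiplied.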

17) What are the advantages and disadvantages of Naïve Bayes?

 Advantages of Naïve Bayes:

Simple, fast, works with small datasets.

Performs well in high-dimensional spaces.

Disadvantages:

Strong independence assumption may not hold.

Poor at capturing complex relationships.

18) Why is Naïve Bayes a good choice for text classification?

 Text data often has many sparse features (word counts). Words are not truly independent, but the independence assumption works well enough in practice, and NB handles high-dimensional sparse data efficiently.

19) Compare SVM and Naïve Bayes for classification tasks:

 SVM: Works well with complex, high-dimensional boundaries; slower training.

NB: Probabilistic, fast, works well with sparse text data.

NB is often a strong, fast baseline for text; SVM is often better for complex non-linear data.

20) How does Laplace Smoothing help in Naïve Bayes?

 Adds a small constant to all counts to avoid zero probabilities for unseen feature-class combinations.
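
In formula form, the smoothed estimate is $(\text{count} + \alpha) / (N + \alpha V)$, where $N$ is the total count for the class and $V$ the number of distinct features. A minimal sketch with a hypothetical helper name; in scikit-learn the constant corresponds to the `alpha` parameter of `MultinomialNB`.

```python
def smoothed_probability(count, class_total, vocab_size, alpha=1.0):
    """Additive (Laplace) smoothing: (count + alpha) / (class_total + alpha * vocab_size)."""
    return (count + alpha) / (class_total + alpha * vocab_size)

# A word never seen with this class: 0 occurrences out of 10, vocabulary of 5 words
print(smoothed_probability(0, 10, 5))             # small but non-zero
print(smoothed_probability(0, 10, 5, alpha=0.0))  # unsmoothed estimate is zero
```

Without smoothing, a single unseen word would force the whole product of likelihoods to zero for that class.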



In [None]:
# 21 Write a Python program to train an SVM Classifier on the Iris dataset and evaluate accuracy

# Import necessary libraries
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize features (important for SVM performance)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Create an SVM classifier with RBF kernel
svm_clf = SVC(kernel='rbf', C=1.0, gamma='scale')

# Train the model
svm_clf.fit(X_train, y_train)

# Make predictions
y_pred = svm_clf.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"SVM Classifier Accuracy: {accuracy:.2f}")

In [None]:
# 22 Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies:

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# 1. Load the Wine dataset
wine = datasets.load_wine()
X = wine.data
y = wine.target

# 2. Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# 3. Standardize features for better SVM performance
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# 4. Train SVM with Linear Kernel
svm_linear = SVC(kernel='linear', random_state=42)
svm_linear.fit(X_train, y_train)
y_pred_linear = svm_linear.predict(X_test)
acc_linear = accuracy_score(y_test, y_pred_linear)

# 5. Train SVM with RBF Kernel
svm_rbf = SVC(kernel='rbf', random_state=42)
svm_rbf.fit(X_train, y_train)
y_pred_rbf = svm_rbf.predict(X_test)
acc_rbf = accuracy_score(y_test, y_pred_rbf)

# 6. Print results
print(f"Accuracy with Linear Kernel: {acc_linear:.4f}")
print(f"Accuracy with RBF Kernel:    {acc_rbf:.4f}")

# 7. Compare
if acc_linear > acc_rbf:
    print("Linear kernel performed better.")
elif acc_rbf > acc_linear:
    print("RBF kernel performed better.")
else:
    print("Both kernels performed equally well.")

In [None]:
# 23  Write a Python program to train an SVM Regressor (SVR) on a housing dataset and evaluate it using Mean Squared Error (MSE):

# Import libraries
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

# Load California Housing dataset
housing = fetch_california_housing()
X = pd.DataFrame(housing.data, columns=housing.feature_names)
y = housing.target

# Split data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Feature scaling (important for SVR)
scaler_X = StandardScaler()
scaler_y = StandardScaler()

X_train_scaled = scaler_X.fit_transform(X_train)
X_test_scaled = scaler_X.transform(X_test)

# y needs to be reshaped before scaling
y_train_scaled = scaler_y.fit_transform(y_train.reshape(-1, 1)).ravel()

# Train the SVR model with RBF kernel
svr = SVR(kernel='rbf', C=100, gamma=0.1, epsilon=0.1)
svr.fit(X_train_scaled, y_train_scaled)

# Predict on test set
y_pred_scaled = svr.predict(X_test_scaled)

# Inverse transform predictions to original scale
y_pred = scaler_y.inverse_transform(y_pred_scaled.reshape(-1, 1)).ravel()

# Calculate Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error (MSE):", mse)

In [None]:
# 24  Write a Python program to train an SVM Classifier with a Polynomial Kernel and visualize the decision boundary:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler

# 1. Generate a synthetic dataset
X, y = make_moons(n_samples=100, noise=0.2, random_state=42)

# 2. Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# 3. Train an SVM classifier with Polynomial Kernel
svm_poly = SVC(kernel='poly', degree=3, C=1, gamma='scale')
svm_poly.fit(X_scaled, y)

# 4. Create a meshgrid for decision boundary
x_min, x_max = X_scaled[:, 0].min() - 1, X_scaled[:, 0].max() + 1
y_min, y_max = X_scaled[:, 1].min() - 1, X_scaled[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))

# 5. Predict over the grid
Z = svm_poly.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# 6. Plot the decision boundary
plt.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.coolwarm)
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=y, s=30, cmap=plt.cm.coolwarm, edgecolors='k')
plt.title("SVM with Polynomial Kernel (degree=3)")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()

In [None]:
# 25 Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and evaluate accuracy:

# Gaussian Naïve Bayes on Breast Cancer dataset

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# 1. Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# 2. Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 3. Create and train the Gaussian Naïve Bayes classifier
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# 4. Make predictions on the test set
y_pred = gnb.predict(X_test)

# 5. Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)

print("Gaussian Naïve Bayes Accuracy on Breast Cancer dataset: {:.2f}%".format(accuracy * 100))

In [None]:
# 26 Write a Python program to train a Multinomial Naïve Bayes classifier for text classification using the 20 Newsgroups dataset.

# Multinomial Naïve Bayes on 20 Newsgroups dataset

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

# 1. Load dataset
categories = None  # you can also specify a list of categories if needed
train_data = fetch_20newsgroups(subset='train', categories=categories, shuffle=True, random_state=42)
test_data = fetch_20newsgroups(subset='test', categories=categories, shuffle=True, random_state=42)

print(f"Number of training samples: {len(train_data.data)}")
print(f"Number of test samples: {len(test_data.data)}")
print(f"Target classes: {train_data.target_names}\n")

# 2. Convert text to numerical features
vectorizer = TfidfVectorizer(stop_words='english')
X_train = vectorizer.fit_transform(train_data.data)
X_test = vectorizer.transform(test_data.data)

# 3. Train Multinomial Naïve Bayes model
clf = MultinomialNB()
clf.fit(X_train, train_data.target)

# 4. Make predictions
y_pred = clf.predict(X_test)

# 5. Evaluate model
accuracy = metrics.accuracy_score(test_data.target, y_pred)
print(f"Accuracy: {accuracy:.4f}")

print("\nClassification Report:")
print(metrics.classification_report(test_data.target, y_pred, target_names=test_data.target_names))

# 6. Show some predictions
print("\nSample Predictions:")
for i in range(5):
    print(f"Text: {test_data.data[i][:100]}...")
    print(f"Predicted: {train_data.target_names[y_pred[i]]}")
    print(f"Actual: {train_data.target_names[test_data.target[i]]}\n")

In [None]:
# 27 Write a Python program to train an SVM Classifier with different C values and compare the decision boundaries visually

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# 1. Generate a synthetic dataset (only 2 features, for easy visualization)
X, y = datasets.make_classification(
    n_samples=100, n_features=2, n_informative=2,
    n_redundant=0, n_clusters_per_class=1, random_state=42
)

# 2. Standardize features
scaler = StandardScaler()
X = scaler.fit_transform(X)

# 3. Define different C values to test
C_values = [0.1, 1, 10, 100]

# 4. Create a mesh grid for plotting decision boundaries
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(
    np.linspace(x_min, x_max, 300),
    np.linspace(y_min, y_max, 300)
)

# 5. Train and plot for each C
plt.figure(figsize=(12, 8))

for i, C in enumerate(C_values, 1):
    model = SVC(kernel='linear', C=C)
    model.fit(X, y)

    # Predict for mesh grid points
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    # Plot
    plt.subplot(2, 2, i)
    plt.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.coolwarm)
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm, edgecolors='k')
    plt.title(f"SVM Decision Boundary (C={C})")
    plt.xlabel("Feature 1")
    plt.ylabel("Feature 2")

plt.tight_layout()
plt.show()

In [None]:
# 28 Write a Python program to train a Bernoulli Naïve Bayes classifier for binary classification on a dataset with binary features

# Bernoulli Naïve Bayes for Binary Classification
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
import numpy as np

# Sample binary dataset
# Features: binary (0/1)
X = np.array([
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 1]
])

# Labels: binary classification (0 or 1)
y = np.array([1, 1, 0, 0, 1, 0, 1, 0])

# Split into train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Initialize Bernoulli Naïve Bayes model
model = BernoulliNB()

# Train the model
model.fit(X_train, y_train)

# Predict on test set
y_pred = model.predict(X_test)

# Evaluate
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

In [None]:
# 29 Write a Python program to apply feature scaling before training an SVM model and compare results with unscaled data

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# 1. Load dataset
data = datasets.load_breast_cancer()
X = data.data
y = data.target

# 2. Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 3. SVM without scaling
svm_unscaled = SVC(kernel='rbf', C=1.0, gamma='scale')
svm_unscaled.fit(X_train, y_train)
y_pred_unscaled = svm_unscaled.predict(X_test)
accuracy_unscaled = accuracy_score(y_test, y_pred_unscaled)

# 4. Apply feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 5. SVM with scaling
svm_scaled = SVC(kernel='rbf', C=1.0, gamma='scale')
svm_scaled.fit(X_train_scaled, y_train)
y_pred_scaled = svm_scaled.predict(X_test_scaled)
accuracy_scaled = accuracy_score(y_test, y_pred_scaled)

# 6. Print results
print("Accuracy without scaling: {:.4f}".format(accuracy_unscaled))
print("Accuracy with scaling: {:.4f}".format(accuracy_scaled))

In [None]:
# 30 Write a Python program to train a Gaussian Naïve Bayes model and compare the predictions before and after Laplace Smoothing

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

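# GaussianNB has no Laplace (additive) smoothing; its closest analogue is var_smoothing,
# which adds a fraction of the largest feature variance to every variance.
# Model with almost no variance smoothing (very small var_smoothing)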
gnb_no_smoothing = GaussianNB(var_smoothing=1e-12)
gnb_no_smoothing.fit(X_train, y_train)
pred_no_smoothing = gnb_no_smoothing.predict(X_test)
acc_no_smoothing = accuracy_score(y_test, pred_no_smoothing)

# Model with the default amount of variance smoothing
gnb_smoothing = GaussianNB()  # default var_smoothing=1e-9
gnb_smoothing.fit(X_train, y_train)
pred_smoothing = gnb_smoothing.predict(X_test)
acc_smoothing = accuracy_score(y_test, pred_smoothing)

# Results
print("Accuracy without smoothing:", acc_no_smoothing)
print("Accuracy with smoothing:", acc_smoothing)
print("\nPredictions without smoothing:", pred_no_smoothing)
print("Predictions with smoothing:", pred_smoothing)

In [None]:
# 31 Write a Python program to train an SVM Classifier and use GridSearchCV to tune the hyperparameters (C, gamma, kernel)

from sklearn import datasets
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load dataset (Iris dataset as an example)
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Define the parameter grid for GridSearchCV
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [1, 0.1, 0.01, 0.001],
    'kernel': ['linear', 'rbf', 'poly', 'sigmoid']
}

# Create SVM model
svm = SVC()

# Create GridSearchCV object
grid = GridSearchCV(svm, param_grid, refit=True, verbose=2, cv=5, n_jobs=-1)

# Fit the model
grid.fit(X_train, y_train)

# Best parameters found
print("Best Parameters:", grid.best_params_)

# Predictions using the best model
y_pred = grid.predict(X_test)

# Accuracy score
print("Test Accuracy:", accuracy_score(y_test, y_pred))

In [None]:
# 32 Write a Python program to train an SVM Classifier on an imbalanced dataset, apply class weighting, and check whether it improves accuracy

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report, accuracy_score
import numpy as np

# Step 1: Create an imbalanced dataset
X, y = make_classification(n_samples=2000, n_features=20, n_informative=3, n_redundant=1,
                            n_clusters_per_class=1, weights=[0.9], flip_y=0, random_state=42)

# Step 2: Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Step 3: Train SVM without class weights
svm_no_weights = SVC(kernel='rbf', random_state=42)
svm_no_weights.fit(X_train, y_train)
y_pred_no_weights = svm_no_weights.predict(X_test)

# Step 4: Train SVM with class weights
svm_with_weights = SVC(kernel='rbf', class_weight='balanced', random_state=42)
svm_with_weights.fit(X_train, y_train)
y_pred_with_weights = svm_with_weights.predict(X_test)

# Step 5: Results comparison
print("Without Class Weights:")
print("Accuracy:", accuracy_score(y_test, y_pred_no_weights))
print(classification_report(y_test, y_pred_no_weights))

print("\nWith Class Weights:")
print("Accuracy:", accuracy_score(y_test, y_pred_with_weights))
print(classification_report(y_test, y_pred_with_weights))

In [None]:
# 33 Write a Python program to implement a Naïve Bayes classifier for spam detection using email data

# Naïve Bayes Spam Detection
# Using the SMS Spam Collection Dataset

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# ---- 1. Load dataset ----
# Dataset format:  label,text
# Example:
# spam,Free entry in 2 a wkly comp to win FA Cup final...
# ham,I'm going to be home soon and i don't want to talk about this stuff anymore...
url = "https://raw.githubusercontent.com/justmarkham/pycon-2016-tutorial/master/data/sms.tsv"
df = pd.read_csv(url, sep='\t', header=None, names=['label', 'message'])

# Encode labels: ham -> 0, spam -> 1
df['label'] = df['label'].map({'ham': 0, 'spam': 1})

# ---- 2. Split into train/test ----
X_train, X_test, y_train, y_test = train_test_split(
    df['message'], df['label'], test_size=0.2, random_state=42
)

# ---- 3. Convert text to numerical features ----
vectorizer = CountVectorizer(stop_words='english')
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# ---- 4. Train Naïve Bayes classifier ----
model = MultinomialNB()
model.fit(X_train_vec, y_train)

# ---- 5. Predictions ----
y_pred = model.predict(X_test_vec)

# ---- 6. Evaluation ----
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))


In [None]:
# 34 Write a Python program to train an SVM Classifier and a Naïve Bayes Classifier on the same dataset and compare their accuracy

# Import libraries
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train SVM Classifier
svm_clf = SVC(kernel='linear', random_state=42)
svm_clf.fit(X_train, y_train)
svm_preds = svm_clf.predict(X_test)

# Train Naïve Bayes Classifier
nb_clf = GaussianNB()
nb_clf.fit(X_train, y_train)
nb_preds = nb_clf.predict(X_test)

# Calculate accuracy
svm_accuracy = accuracy_score(y_test, svm_preds)
nb_accuracy = accuracy_score(y_test, nb_preds)

# Print results
print(f"SVM Classifier Accuracy: {svm_accuracy * 100:.2f}%")
print(f"Naïve Bayes Classifier Accuracy: {nb_accuracy * 100:.2f}%")

# Compare performance
if svm_accuracy > nb_accuracy:
    print("SVM performed better on this dataset.")
elif nb_accuracy > svm_accuracy:
    print("Naïve Bayes performed better on this dataset.")
else:
    print("Both classifiers performed equally well.")

In [None]:
# 35 Write a Python program to perform feature selection before training a Naïve Bayes classifier and compare results

# Feature Selection with Naïve Bayes and Comparison
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# 1. Load the dataset
categories = ['rec.sport.baseball', 'rec.sport.hockey', 'sci.med', 'sci.space']
data = fetch_20newsgroups(subset='all', categories=categories, remove=('headers', 'footers', 'quotes'))

# 2. Vectorize the text data
vectorizer = CountVectorizer(stop_words='english')
X = vectorizer.fit_transform(data.data)
y = data.target

# 3. Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# ---------------- Without Feature Selection ----------------
nb_all = MultinomialNB()
nb_all.fit(X_train, y_train)
y_pred_all = nb_all.predict(X_test)
acc_all = accuracy_score(y_test, y_pred_all)

# ---------------- With Feature Selection ----------------
# Keep top 2000 features based on chi-square score
selector = SelectKBest(chi2, k=2000)
X_train_selected = selector.fit_transform(X_train, y_train)
X_test_selected = selector.transform(X_test)

nb_selected = MultinomialNB()
nb_selected.fit(X_train_selected, y_train)
y_pred_selected = nb_selected.predict(X_test_selected)
acc_selected = accuracy_score(y_test, y_pred_selected)

# ---------------- Results ----------------
print(f"Accuracy without feature selection: {acc_all:.4f}")
print(f"Accuracy with feature selection   : {acc_selected:.4f}")

In [None]:
# 36 Write a Python program to train an SVM Classifier using One-vs-Rest (OvR) and One-vs-One (OvO) strategies on the Wine dataset and compare their accuracy

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.metrics import accuracy_score

# Load the Wine dataset
wine = datasets.load_wine()
X, y = wine.data, wine.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

# One-vs-Rest (OvR) strategy
ovr_clf = OneVsRestClassifier(SVC(kernel='linear', random_state=42))
ovr_clf.fit(X_train, y_train)
ovr_preds = ovr_clf.predict(X_test)
ovr_acc = accuracy_score(y_test, ovr_preds)

# One-vs-One (OvO) strategy
ovo_clf = OneVsOneClassifier(SVC(kernel='linear', random_state=42))
ovo_clf.fit(X_train, y_train)
ovo_preds = ovo_clf.predict(X_test)
ovo_acc = accuracy_score(y_test, ovo_preds)

# Display results
print("SVM with One-vs-Rest (OvR) Accuracy:", ovr_acc)
print("SVM with One-vs-One (OvO) Accuracy:", ovo_acc)

# Which is better?
if ovr_acc > ovo_acc:
    print("OvR performed better.")
elif ovo_acc > ovr_acc:
    print("OvO performed better.")
else:
    print("Both performed equally.")

In [None]:
# 37 Write a Python program to train an SVM Classifier using Linear, Polynomial, and RBF kernels on the Breast Cancer dataset and compare their accuracy

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Load Breast Cancer dataset
cancer = datasets.load_breast_cancer()
X = cancer.data
y = cancer.target

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature scaling for better SVM performance
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Define kernels to test
kernels = ['linear', 'poly', 'rbf']
accuracies = {}

# Train and evaluate SVM for each kernel
for kernel in kernels:
    svm = SVC(kernel=kernel, random_state=42)
    svm.fit(X_train, y_train)
    y_pred = svm.predict(X_test)
    accuracies[kernel] = accuracy_score(y_test, y_pred)

# Display accuracies
print("SVM Kernel Comparison on Breast Cancer Dataset:")
for kernel, acc in accuracies.items():
    print(f"{kernel.capitalize()} Kernel Accuracy: {acc:.4f}")

In [None]:
# 38 Write a Python program to train an SVM Classifier using Stratified K-Fold Cross-Validation and compute the average accuracy

from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold, cross_val_score
import numpy as np

# Load dataset (Breast Cancer dataset as example)
data = datasets.load_breast_cancer()
X = data.data
y = data.target

# Define the SVM Classifier
svm_model = SVC(kernel='linear', random_state=42)

# Create Stratified K-Fold object
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Perform cross-validation
scores = cross_val_score(svm_model, X, y, cv=skf, scoring='accuracy')

# Display results
print("Accuracy for each fold:", scores)
print("Average Accuracy: {:.4f}".format(np.mean(scores)))

In [None]:
# 39 Write a Python program to train a Naïve Bayes classifier using different prior probabilities and compare performance

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report
import numpy as np

# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

# List of different prior probability settings
priors_list = [
    None,               # Let the model learn from data
    [0.5, 0.5],         # Equal priors
    [0.7, 0.3],         # Class 0 more probable
    [0.3, 0.7]          # Class 1 more probable
]

print("Comparing Naïve Bayes with Different Priors:\n")
for priors in priors_list:
    model = GaussianNB(priors=priors)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    acc = accuracy_score(y_test, y_pred)
    print(f"Priors: {priors} -> Accuracy: {acc:.4f}")
    print(classification_report(y_test, y_pred, target_names=data.target_names))
    print("-" * 60)

In [None]:
# 40 Write a Python program to perform Recursive Feature Elimination (RFE) before training an SVM Classifier and compare accuracy

from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Train-Test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# -------------------
# Model with all features
# -------------------
svm = SVC(kernel="linear", random_state=42)
svm.fit(X_train, y_train)
y_pred_all = svm.predict(X_test)
accuracy_all = accuracy_score(y_test, y_pred_all)

# -------------------
# RFE for feature selection
# -------------------
n_features_to_select = 10  # You can tune this
rfe = RFE(estimator=SVC(kernel="linear"), n_features_to_select=n_features_to_select)
rfe.fit(X_train, y_train)

# Transform datasets
X_train_rfe = rfe.transform(X_train)
X_test_rfe = rfe.transform(X_test)

# Train model on selected features
svm_rfe = SVC(kernel="linear", random_state=42)
svm_rfe.fit(X_train_rfe, y_train)
y_pred_rfe = svm_rfe.predict(X_test_rfe)
accuracy_rfe = accuracy_score(y_test, y_pred_rfe)

# -------------------
# Compare results
# -------------------
print(f"Accuracy with all features ({X.shape[1]} features): {accuracy_all:.4f}")
print(f"Accuracy after RFE ({n_features_to_select} features): {accuracy_rfe:.4f}")

# Show selected features
selected_features = [data.feature_names[i] for i, selected in enumerate(rfe.support_) if selected]
print("\nSelected Features after RFE:")
print(selected_features)

In [None]:
# 41 Write a Python program to train an SVM Classifier and evaluate its performance using Precision, Recall, and F1-Score instead of accuracy

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import precision_score, recall_score, f1_score, classification_report

# Load dataset
data = datasets.load_breast_cancer()
X, y = data.data, data.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Feature scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train SVM classifier
svm_clf = SVC(kernel='rbf', random_state=42)
svm_clf.fit(X_train, y_train)

# Make predictions
y_pred = svm_clf.predict(X_test)

# Evaluate using Precision, Recall, and F1-Score
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

# Print results
print(f"Precision: {precision:.4f}")
print(f"Recall:    {recall:.4f}")
print(f"F1-Score:  {f1:.4f}")

# Detailed classification report
print("\nClassification Report:\n", classification_report(y_test, y_pred))
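
As a rough cross-check of the metrics in the cell above, precision, recall, and F1 can be derived directly from the confusion-matrix counts. The sketch below uses small hypothetical label arrays (not the breast cancer predictions), and the helper name `prf_from_counts` is made up for illustration. It assumes binary labels with 1 as the positive class and does not guard against division by zero.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Hypothetical labels, chosen only to illustrate the arithmetic
y_true_demo = np.array([1, 1, 1, 1, 0, 0, 0, 1, 0, 1])
y_pred_demo = np.array([1, 1, 0, 1, 0, 1, 0, 1, 0, 0])

def prf_from_counts(y_true, y_pred):
    """Precision, recall, and F1 for the positive class (label 1)."""
    # For binary labels, ravel() yields the counts in the order TN, FP, FN, TP
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

p_manual, r_manual, f1_manual = prf_from_counts(y_true_demo, y_pred_demo)

print(f"Manual  -> precision={p_manual:.4f}, recall={r_manual:.4f}, f1={f1_manual:.4f}")
print(f"sklearn -> precision={precision_score(y_true_demo, y_pred_demo):.4f}, "
      f"recall={recall_score(y_true_demo, y_pred_demo):.4f}, "
      f"f1={f1_score(y_true_demo, y_pred_demo):.4f}")
```

The two printed lines should agree, which is a quick way to confirm which class the scikit-learn scorers treat as positive.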

In [None]:
# 42 Write a Python program to train a Naïve Bayes Classifier and evaluate its performance using Log Loss (Cross-Entropy Loss).

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import log_loss

# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split dataset into training & testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Initialize Gaussian Naive Bayes model
nb_model = GaussianNB()

# Train the model
nb_model.fit(X_train, y_train)

# Predict probability scores for Log Loss calculation
y_proba = nb_model.predict_proba(X_test)

# Calculate Log Loss (Cross-Entropy Loss)
loss = log_loss(y_test, y_proba)

# Display result
print(f"Log Loss (Cross-Entropy Loss): {loss:.4f}")
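
To make the Log Loss value above easier to interpret, the following minimal sketch recomputes it by hand as the mean negative log of the probability assigned to each sample's true class. The labels and probabilities are hypothetical values picked for illustration, not outputs of the Naïve Bayes model.

```python
import numpy as np
from sklearn.metrics import log_loss

# Hypothetical true labels and predicted probabilities [P(class 0), P(class 1)]
y_true_demo = np.array([0, 1, 1, 0])
proba_demo = np.array([
    [0.9, 0.1],
    [0.2, 0.8],
    [0.4, 0.6],
    [0.7, 0.3],
])

# Pick out the probability the model gave to the true class of each sample
p_true_class = proba_demo[np.arange(len(y_true_demo)), y_true_demo]

# Log loss = average of -log(probability of the true class)
manual_loss = -np.mean(np.log(p_true_class))
sklearn_loss = log_loss(y_true_demo, proba_demo)

print(f"Manual log loss:  {manual_loss:.4f}")
print(f"sklearn log loss: {sklearn_loss:.4f}")
```

Confident wrong predictions (a true-class probability near 0) dominate this average, which is why Log Loss penalizes overconfident classifiers more heavily than accuracy does.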

In [None]:
# 43 Write a Python program to train an SVM Classifier and visualize the Confusion Matrix using seaborn

# Import libraries
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, accuracy_score

# Load dataset (Iris dataset as example)
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Split dataset into training & testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train SVM Classifier
model = SVC(kernel='linear', random_state=42)
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Compute confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Accuracy
acc = accuracy_score(y_test, y_pred)
print(f"Accuracy: {acc:.2f}")

# Plot confusion matrix using seaborn
plt.figure(figsize=(6,4))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=iris.target_names,
            yticklabels=iris.target_names)
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.title("Confusion Matrix - SVM Classifier")
plt.show()

In [None]:
# 44 Write a Python program to train an SVM Regressor (SVR) and evaluate its performance using Mean Absolute Error (MAE) instead of MSE.

# SVM Regressor with MAE Evaluation
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error

# Load dataset (California Housing)
data = fetch_california_housing()
X, y = data.data, data.target

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature scaling (important for SVM)
scaler_X = StandardScaler()
scaler_y = StandardScaler()

X_train_scaled = scaler_X.fit_transform(X_train)
X_test_scaled = scaler_X.transform(X_test)

y_train_scaled = scaler_y.fit_transform(y_train.reshape(-1, 1)).ravel()

# Train the SVR model
svr = SVR(kernel='rbf', C=100, gamma=0.1)
svr.fit(X_train_scaled, y_train_scaled)

# Predict and inverse transform predictions
y_pred_scaled = svr.predict(X_test_scaled)
# inverse_transform returns a 2-D column; flatten it to match y_test's shape
y_pred = scaler_y.inverse_transform(y_pred_scaled.reshape(-1, 1)).ravel()

# Evaluate model using MAE
mae = mean_absolute_error(y_test, y_pred)
print(f"Mean Absolute Error (MAE): {mae:.4f}")

In [None]:
# 45 Write a Python program to train a Naïve Bayes classifier and evaluate its performance using the ROC-AUC score

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_auc_score, roc_curve
import matplotlib.pyplot as plt

# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split into train & test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train Naïve Bayes Classifier
model = GaussianNB()
model.fit(X_train, y_train)

# Predict probabilities
y_probs = model.predict_proba(X_test)[:, 1]  # Probability for class 1

# Calculate ROC-AUC score
roc_auc = roc_auc_score(y_test, y_probs)
print(f"ROC-AUC Score: {roc_auc:.4f}")

# Plot ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_probs)
plt.plot(fpr, tpr, label=f'Naïve Bayes (AUC = {roc_auc:.4f})')
plt.plot([0, 1], [0, 1], 'k--')  # Random guess line
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve - Naïve Bayes")
plt.legend()
plt.show()

In [None]:
# 46 Write a Python program to train an SVM Classifier and visualize the Precision-Recall Curve.

import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import precision_recall_curve, average_precision_score

# Load dataset (Breast Cancer dataset for binary classification)
data = datasets.load_breast_cancer()
X = data.data
y = data.target

# Split into train/test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Feature scaling
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train SVM classifier with probability estimates
svm = SVC(kernel='rbf', probability=True, random_state=42)
svm.fit(X_train, y_train)

# Get prediction probabilities
y_scores = svm.predict_proba(X_test)[:, 1]

# Compute Precision-Recall curve and average precision
precision, recall, thresholds = precision_recall_curve(y_test, y_scores)
avg_precision = average_precision_score(y_test, y_scores)

# Plot Precision-Recall curve
plt.figure(figsize=(8, 6))
plt.plot(recall, precision, label=f'AP = {avg_precision:.2f}')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve for SVM Classifier')
plt.legend()
plt.grid(True)
plt.show()
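
A possible variation on the cell above: `precision_recall_curve` only needs a score that ranks samples, so the signed distances from `decision_function` can stand in for `predict_proba`. This avoids `probability=True`, which adds an internal cross-validated calibration step and slows down fitting. The sketch below is one way to do it under the same data split and scaling; the variable names are illustrative, and the curve is not guaranteed to match the probability-based one exactly.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import precision_recall_curve, average_precision_score

# Same dataset and split as the cell above
data = datasets.load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42
)

# Scale features using statistics from the training set only
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# No probability=True here: decision_function is available without it
svm_scores_model = SVC(kernel='rbf')
svm_scores_model.fit(X_train, y_train)

# Signed distance to the decision boundary; larger values favor class 1
decision_scores = svm_scores_model.decision_function(X_test)

pr_precision, pr_recall, pr_thresholds = precision_recall_curve(y_test, decision_scores)
ap_from_scores = average_precision_score(y_test, decision_scores)
print(f"Average precision from decision_function scores: {ap_from_scores:.4f}")
```

The resulting `pr_recall` and `pr_precision` arrays can be passed to `plt.plot` exactly as in the cell above.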