In [1]:
#1-What is a Support Vector Machine (SVM)?

#A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression.
# It finds the boundary (hyperplane) that separates the classes with the largest possible margin. With a soft margin and kernels,
# SVM also works on complex datasets where the classes are not perfectly separable.

In [2]:
#2-What is the difference between Hard Margin and Soft Margin SVM?

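#Hard Margin SVM requires every training point to lie on the correct side of the margin, so it works only when the data is perfectly separable
# and it is sensitive to outliers.

# Soft Margin SVM allows some points to violate the margin or be misclassified, at a cost controlled by the C parameter.
# This lets it handle overlapping data, making it more flexible for real-world problems.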



In [3]:
#3-What is the mathematical intuition behind SVM?

#SVM finds the boundary (hyperplane) w·x + b = 0 that maximizes the margin between the classes.

# The margin width is 2/||w||, so maximizing it is equivalent to minimizing ||w||^2 / 2 subject to y_i (w·x_i + b) >= 1 for every training point.
# The closest points, for which equality holds, are the support vectors.

#  For non-linear data, the kernel trick implicitly maps the data into a space where it becomes separable.
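
# As a rough illustration of the margin idea, the sketch below fits a linear SVC on synthetic,
# well-separated blobs (made-up data, not part of this notebook) and computes the margin width as 2/||w||.
# A large C is assumed here only to approximate a hard margin.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters (synthetic data, chosen only for illustration)
X, y = make_blobs(n_samples=100, centers=[[-2, -2], [2, 2]],
                  cluster_std=0.6, random_state=0)

# A linear SVM; a large C approximates a hard margin on separable data
clf = SVC(kernel='linear', C=1000)
clf.fit(X, y)

w = clf.coef_[0]        # normal vector of the hyperplane w.x + b = 0
b = clf.intercept_[0]   # offset of the hyperplane

# Distance between the two margin lines is 2 / ||w||
margin_width = 2 / np.linalg.norm(w)

# Signed distance of every training point from the hyperplane
distances = (X @ w + b) / np.linalg.norm(w)

print(f'Margin width: {margin_width:.3f}')
print(f'Distance of the closest point: {np.abs(distances).min():.3f}')
```

# The closest point should sit at roughly half the margin width from the hyperplane, which is what "maximizing the margin" refers to.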

In [4]:
#4-What is the role of Lagrange Multipliers in SVM?

#Lagrange Multipliers convert SVM's constrained optimization problem into its dual form, which is easier to solve
# and depends on the data only through dot products (which is what makes kernels possible).

# Each training point gets one multiplier; only points with a nonzero multiplier (the support vectors) influence the decision boundary.
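
# A small sketch of the point above, using scikit-learn's SVC on synthetic blobs (made-up data).
# SVC exposes the products y_i * alpha_i for the support vectors in dual_coef_; for a linear kernel,
# the weight vector can be rebuilt from them, which is checked below.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic two-class data, for illustration only
X, y = make_blobs(n_samples=100, centers=[[-2, -2], [2, 2]],
                  cluster_std=1.0, random_state=0)

clf = SVC(kernel='linear', C=1.0)
clf.fit(X, y)

# dual_coef_ holds y_i * alpha_i, but only for points whose multiplier is nonzero
alphas_times_y = clf.dual_coef_[0]
print(f'Training points: {len(X)}')
print(f'Points with a nonzero multiplier: {len(alphas_times_y)}')

# For a linear kernel: w = sum_i (alpha_i * y_i * x_i) over the support vectors
w_from_dual = clf.dual_coef_ @ clf.support_vectors_
print('w rebuilt from multipliers:', w_from_dual[0])
print('w reported by coef_:       ', clf.coef_[0])
```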

In [5]:
#5-What are Support Vectors in SVM?
#Support Vectors are the training points that lie on the margin (or, with a soft margin, inside it or on the wrong side of it).
#  They alone define the optimal hyperplane: removing a support vector can change the boundary,
#  whereas removing any other training point leaves it unchanged.
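
# One way to see this, sketched on synthetic blobs (made-up data): refit the model using only the
# support vectors and compare it with the model trained on all points. The two are expected to agree
# up to solver tolerance.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic, well-separated two-class data (illustration only)
X, y = make_blobs(n_samples=100, centers=[[-2, -2], [2, 2]],
                  cluster_std=0.6, random_state=0)

clf_all = SVC(kernel='linear', C=1.0).fit(X, y)

# Indices of the support vectors within the training set
sv_idx = clf_all.support_
print(f'{len(sv_idx)} of {len(X)} training points are support vectors')

# Refit using only the support vectors
clf_sv = SVC(kernel='linear', C=1.0).fit(X[sv_idx], y[sv_idx])

same_predictions = np.array_equal(clf_all.predict(X), clf_sv.predict(X))
print('Same predictions on all points:', same_predictions)
print('w (all points):           ', clf_all.coef_[0])
print('w (support vectors only): ', clf_sv.coef_[0])
```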

In [6]:
#6-What is a Support Vector Classifier (SVC)

# A Support Vector Classifier (SVC) is an SVM used for classification tasks. 
# It finds the best boundary (hyperplane) that separates different classes while allowing some misclassification (Soft Margin) for better flexibility.
#  SVC works well with both linear and non-linear data using kernel tricks.

In [7]:
#7-What is a Support Vector Regressor (SVR)?

#A Support Vector Regressor (SVR) is an SVM used for regression tasks.
#  Instead of separating classes, it fits a line (or curve) so that most points fall within a tube of half-width epsilon around it.
#  Errors inside the tube are ignored, which makes SVR tolerant of small noise, and kernels let it capture non-linear patterns.
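
# A minimal sketch of the epsilon tube, on a made-up noisy sine curve. Points that fall strictly inside
# the tube do not become support vectors, so a wider tube should leave fewer of them. The data and the
# two epsilon values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.svm import SVR

# Noisy sine curve (synthetic data)
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

# Same model, two tube widths
svr_narrow = SVR(kernel='rbf', C=1.0, epsilon=0.01).fit(X, y)
svr_wide = SVR(kernel='rbf', C=1.0, epsilon=0.5).fit(X, y)

print(f'Support vectors with epsilon=0.01: {len(svr_narrow.support_)}')
print(f'Support vectors with epsilon=0.5:  {len(svr_wide.support_)}')
```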

In [8]:
#8-What is the Kernel Trick in SVM

#The Kernel Trick allows SVM to handle complex, non-linear data by implicitly mapping it into a higher-dimensional space where it becomes easier to separate.
#  Instead of computing this transformation explicitly, a kernel function returns the dot product of two points in that space directly, which is all the SVM needs.
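
# A small worked check of this idea for the degree-2 polynomial kernel on 2-D inputs.
# The helper names phi and poly2_kernel are made up for this example: phi is the explicit feature map,
# and the kernel gives the same number without ever building the mapped vectors.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector (illustrative helper)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly2_kernel(a, b):
    """Homogeneous degree-2 polynomial kernel: (a . b)^2."""
    return np.dot(a, b) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

explicit = np.dot(phi(x), phi(z))   # dot product in the mapped 3-D space
via_kernel = poly2_kernel(x, z)     # same value, computed in the original 2-D space

print(f'Explicit mapping: {explicit:.4f}')
print(f'Kernel function:  {via_kernel:.4f}')
```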

In [9]:
#9-Compare Linear Kernel, Polynomial Kernel, and RBF Kernel

#Linear Kernel: Used when data is linearly separable; fast and simple.
#Polynomial Kernel: Captures more complex patterns by mapping data into a higher-degree polynomial space.
#RBF (Radial Basis Function) Kernel: Best for highly non-linear data; it maps points into infinite-dimensional space for better separation.
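
# For reference, the three kernels can be written out in a few lines of NumPy. The sketch below
# compares hand-written versions with scikit-learn's pairwise kernel functions; the sample points
# and the values of gamma, degree and coef0 are arbitrary.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

X = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]])
gamma, degree, coef0 = 0.5, 3, 1.0

dot = X @ X.T                                   # pairwise dot products

K_linear = dot                                  # k(x, z) = x . z
K_poly = (gamma * dot + coef0) ** degree        # k(x, z) = (gamma * x . z + coef0)^degree

# Squared Euclidean distances: ||x||^2 + ||z||^2 - 2 x . z
sq_norms = np.sum(X ** 2, axis=1)
sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * dot
K_rbf = np.exp(-gamma * sq_dists)               # k(x, z) = exp(-gamma * ||x - z||^2)

print('Linear matches:    ', np.allclose(K_linear, linear_kernel(X)))
print('Polynomial matches:', np.allclose(K_poly, polynomial_kernel(X, degree=degree, gamma=gamma, coef0=coef0)))
print('RBF matches:       ', np.allclose(K_rbf, rbf_kernel(X, gamma=gamma)))
```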

In [1]:
#10-What is the effect of the C parameter in SVM

#The C parameter in SVM controls the trade-off between a smooth decision boundary and correctly classifying training points. 

# A high C penalizes misclassification heavily and tries to classify all training points correctly (narrower margin, risk of overfitting),
# while a low C tolerates some misclassification in exchange for a wider margin (often better generalization).
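
# A quick sketch of this trade-off on overlapping synthetic blobs (made-up data): the margin width
# 2/||w|| of a linear SVC is printed for a few arbitrary C values. Smaller C should give a wider margin.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping clusters, so that some margin violations are unavoidable
X, y = make_blobs(n_samples=200, centers=[[-1, -1], [1, 1]],
                  cluster_std=1.0, random_state=0)

results = {}
for C in [0.01, 1, 100]:
    clf = SVC(kernel='linear', C=C).fit(X, y)
    margin = 2 / np.linalg.norm(clf.coef_[0])   # margin width for this C
    n_sv = int(clf.n_support_.sum())            # total number of support vectors
    results[C] = (margin, n_sv)
    print(f'C={C:<6} margin width={margin:.3f}  support vectors={n_sv}')
```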

In [2]:
#11-What is the role of the Gamma parameter in RBF Kernel SVM

#The Gamma parameter in RBF Kernel SVM controls how far a single data point's influence reaches.
#  A high gamma makes the model focus on nearby points (risk of overfitting),
#  while a low gamma considers distant points, creating a smoother decision boundary (better generalization)
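
# The effect of gamma can be sketched by comparing training and test accuracy on a noisy toy dataset.
# The dataset (make_moons) and the three gamma values are arbitrary choices; a very high gamma is
# expected to fit the training set closely while doing worse on held-out points.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Noisy two-class toy data
X, y = make_moons(n_samples=300, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

scores = {}
for gamma in [0.01, 1, 1000]:
    clf = SVC(kernel='rbf', gamma=gamma).fit(X_train, y_train)
    train_acc = clf.score(X_train, y_train)
    test_acc = clf.score(X_test, y_test)
    scores[gamma] = (train_acc, test_acc)
    print(f'gamma={gamma:<6} train accuracy={train_acc:.2f}  test accuracy={test_acc:.2f}')
```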

In [3]:
#12-What is the Naïve Bayes classifier, and why is it called "Naïve"

#Naïve Bayes is a simple and fast classification algorithm based on Bayes' theorem.
#  It assumes that all features are independent of each other given the class, which is rarely true in real data; this unrealistic assumption makes it "naïve."
#  Despite this, it works well for tasks such as text classification and spam filtering.

In [4]:
#13-What is Bayes’ Theorem

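#Bayes' Theorem calculates the probability of an event based on prior knowledge of related events. It updates beliefs when new evidence appears and is widely used in machine learning and statistics. Formula:
# P(A|B) = P(B|A) * P(A) / P(B)
# where P(A) is the prior, P(B|A) the likelihood, P(B) the evidence, and P(A|B) the posterior.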
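
# A worked example with made-up numbers: suppose 20% of emails are spam, the word "free" appears in 60%
# of spam emails and in 5% of non-spam emails. Bayes' theorem, P(A|B) = P(B|A) * P(A) / P(B), then gives
# the probability that an email containing "free" is spam.

```python
# Hypothetical probabilities for a toy spam filter
p_spam = 0.2                # P(spam): prior
p_word_given_spam = 0.6     # P("free" | spam): likelihood
p_word_given_ham = 0.05     # P("free" | not spam)

# Evidence: total probability of seeing the word in any email
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Posterior from Bayes' theorem
p_spam_given_word = p_word_given_spam * p_spam / p_word

print(f'P(spam | "free") = {p_spam_given_word:.2f}')   # prints 0.75
```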


In [5]:
#14-Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes

#Gaussian Naïve Bayes: Used for continuous data, assumes features follow a normal (Gaussian) distribution.

#Multinomial Naïve Bayes: Best for text data (e.g., word counts), works well for classification tasks like spam detection.

#Bernoulli Naïve Bayes: Used for binary data (0s and 1s), suitable for tasks like document classification based on word presence/absence

In [6]:
#15- When should you use Gaussian Naïve Bayes over other variants

#Use Gaussian Naïve Bayes when your features are continuous and approximately follow a normal (Gaussian) distribution within each class.
# It works well for datasets with numerical values like age, height, or test scores.
# If your data is categorical or text-based, other variants like Multinomial or Bernoulli Naïve Bayes are better choices.

In [7]:
#16-What are the key assumptions made by Naïve Bayes

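#Naïve Bayes makes two key assumptions:

#Class Conditional Independence – Given the class label, the features are assumed to be independent of each other. This is rarely true, but it lets the joint likelihood be written as a product of per-feature likelihoods, which simplifies calculations.

#Feature Distribution – Within each class, every feature is assumed to follow a particular distribution (Gaussian, multinomial, or Bernoulli, depending on the variant).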

#Despite these assumptions, Naïve Bayes works well in many real-world applications like spam filtering and sentiment analysis.

In [8]:
#17-What are the advantages and disadvantages of Naïve Bayes

#Advantages:
#Fast, simple, and works well with small or large data.
#Performs well on text classification (e.g., spam filtering).
#Can handle missing data and requires less training time.

#Disadvantages:
#Assumes features are independent, which is often unrealistic.
#Struggles with highly correlated features.
#Zero probability issue if a feature value never appears with a class in the training data (fixed by smoothing).

In [9]:
#18- Why is Naïve Bayes a good choice for text classification

#Naïve Bayes is great for text classification because:

#Handles High-Dimensional Data – Works well with large vocabularies in text data.

#Fast and Efficient – Requires less training time, even on big datasets.

#Performs Well with Sparse Data – Effective for documents with many zero values (e.g., word counts).

#Produces Probabilities – Outputs a probability score for each category, which can be used to rank or threshold predictions.

#It's widely used for spam detection, sentiment analysis, and topic categorization.

In [10]:
#19-Compare SVM and Naïve Bayes for classification tasks

#SVM: Best for complex, high-dimensional data, but slower.

#Naïve Bayes: Fast and great for text classification.

#SVM: Works well with non-linear patterns using kernel tricks.

#Naïve Bayes: Assumes feature independence, which is often unrealistic.

#SVM: More robust to noise, but computationally expensive.

#Naïve Bayes: Handles sparse data well, like word counts in text.

In [11]:
#20-How does Laplace Smoothing help in Naïve Bayes

#Laplace Smoothing prevents zero probability issues in Naïve Bayes when a feature value (e.g., a word) never occurs with a given class in the training data.
#  It adds a small value (usually 1) to all counts,
#  ensuring every word or feature has a nonzero probability. Without it, a single unseen word would force the whole product of probabilities to zero.
#  This helps improve model accuracy, especially for text classification.
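
# A small sketch with a made-up word-count matrix: the smoothed estimate (count + alpha) / (total + alpha * vocabulary size)
# is computed by hand and compared with what MultinomialNB stores in feature_log_prob_ for alpha=1.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy word counts: rows are documents, columns are the words ["free", "meeting", "win"]
X = np.array([[2, 0, 1],    # spam
              [1, 0, 2],    # spam
              [0, 3, 0],    # ham
              [0, 2, 0]])   # ham
y = np.array([1, 1, 0, 0])  # 1 = spam, 0 = ham

alpha = 1.0
ham_counts = X[y == 0].sum(axis=0)      # word totals in the ham class

# Without smoothing, "free" and "win" get probability 0 for ham
unsmoothed = ham_counts / ham_counts.sum()

# Laplace smoothing: add alpha to every count
smoothed = (ham_counts + alpha) / (ham_counts.sum() + alpha * X.shape[1])

clf = MultinomialNB(alpha=alpha).fit(X, y)
sklearn_probs = np.exp(clf.feature_log_prob_[0])   # row 0 corresponds to class 0 (ham)

print('Unsmoothed P(word | ham):', unsmoothed)
print('Smoothed P(word | ham):  ', smoothed)
print('MultinomialNB estimate:  ', sklearn_probs)
```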

In [12]:
#Practical

In [15]:
#21-Write a Python program to train an SVM Classifier on the Iris dataset and evaluate accuracy

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)




In [16]:
# Create and train the SVM classifier
svm_classifier = SVC(kernel='linear')  # You can change kernel to 'rbf', 'poly', etc.
svm_classifier.fit(X_train, y_train)

# Make predictions
y_pred = svm_classifier.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')


Accuracy: 1.00


In [19]:
#22-Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then
#compare their accuracies

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Wine dataset
wine = datasets.load_wine()
X, y = wine.data, wine.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)




In [20]:
# Train SVM classifier with Linear kernel
svm_linear = SVC(kernel='linear')
svm_linear.fit(X_train, y_train)
y_pred_linear = svm_linear.predict(X_test)
accuracy_linear = accuracy_score(y_test, y_pred_linear)

# Train SVM classifier with RBF kernel
svm_rbf = SVC(kernel='rbf')
svm_rbf.fit(X_train, y_train)
y_pred_rbf = svm_rbf.predict(X_test)
accuracy_rbf = accuracy_score(y_test, y_pred_rbf)

# Compare accuracies
print(f'Linear Kernel Accuracy: {accuracy_linear:.2f}')
print(f'RBF Kernel Accuracy: {accuracy_rbf:.2f}')

Linear Kernel Accuracy: 0.94
RBF Kernel Accuracy: 0.64
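
# The much lower RBF accuracy is probably caused by the Wine features having very different scales,
# which distorts the distances the RBF kernel relies on. A possible check, sketched below with the same
# split, is to standardize the features inside a pipeline before the SVC; treat it as a suggestion
# rather than a tuned model.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# RBF SVM on raw features
rbf_unscaled = SVC(kernel='rbf').fit(X_train, y_train)

# RBF SVM with standardization; the scaler is fitted on the training split only
rbf_scaled = make_pipeline(StandardScaler(), SVC(kernel='rbf')).fit(X_train, y_train)

acc_unscaled = rbf_unscaled.score(X_test, y_test)
acc_scaled = rbf_scaled.score(X_test, y_test)

print(f'RBF accuracy without scaling: {acc_unscaled:.2f}')
print(f'RBF accuracy with scaling:    {acc_scaled:.2f}')
```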


In [21]:
#23-Write a Python program to train an SVM Regressor (SVR) on a housing dataset and evaluate it using Mean
#Squared Error (MSE)

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

# Load the California housing dataset
housing = datasets.fetch_california_housing()
X, y = housing.data, housing.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [22]:

# Train SVM Regressor (SVR)
svr_model = SVR(kernel='rbf')  # You can change the kernel to 'linear', 'poly', etc.
svr_model.fit(X_train, y_train)

# Make predictions
y_pred = svr_model.predict(X_test)

# Evaluate using Mean Squared Error (MSE)
mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse:.2f}')


Mean Squared Error: 1.33


In [None]:
#24-Write a Python program to train an SVM Classifier with a Polynomial Kernel and visualize the decision
#boundary

import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
# DecisionBoundaryDisplay ships with scikit-learn (1.1+), so no extra package such as mlxtend is needed
from sklearn.inspection import DecisionBoundaryDisplay

# Load a toy dataset (for easy visualization)
X, y = datasets.make_moons(n_samples=200, noise=0.2, random_state=42)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)




In [None]:
# Train SVM classifier with Polynomial kernel
svm_poly = SVC(kernel='poly', degree=3)
svm_poly.fit(X_train, y_train)

# Visualize the decision regions and overlay the data points
fig, ax = plt.subplots(figsize=(8, 6))
DecisionBoundaryDisplay.from_estimator(svm_poly, X, response_method='predict', alpha=0.3, ax=ax)
ax.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k')
ax.set_title("SVM with Polynomial Kernel")
ax.set_xlabel("Feature 1")
ax.set_ylabel("Feature 2")
plt.show()

In [None]:
#25-Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and
#evaluate accuracy:

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the Breast Cancer dataset
cancer = datasets.load_breast_cancer()
X, y = cancer.data, cancer.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)



In [28]:
# Train Gaussian Naïve Bayes classifier
nb_classifier = GaussianNB()
nb_classifier.fit(X_train, y_train)

# Make predictions
y_pred = nb_classifier.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')


Accuracy: 0.97


In [None]:
#26-Write a Python program to train a Multinomial Naïve Bayes classifier for text classification using the 20
#Newsgroups dataset.

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# Load the 20 Newsgroups dataset
newsgroups = fetch_20newsgroups(subset='all', categories=['rec.sport.baseball', 'sci.space', 'comp.graphics'], remove=('headers', 'footers', 'quotes'))
X, y = newsgroups.data, newsgroups.target

# Convert text data into numerical features
vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(X)
tfidf_transformer = TfidfTransformer()
X_tfidf = tfidf_transformer.fit_transform(X_counts)


In [30]:
# Split the dataset into training and testing sets
# (uses X_tfidf and y from the previous cell, so that cell must be run first)
X_train, X_test, y_train, y_test = train_test_split(X_tfidf, y, test_size=0.2, random_state=42)

# Train Multinomial Naïve Bayes classifier
nb_classifier = MultinomialNB()
nb_classifier.fit(X_train, y_train)

# Make predictions
y_pred = nb_classifier.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')


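
# The cells above fit the vectorizer on all documents before splitting, so word statistics from the test
# documents leak into training. One common alternative, sketched here on a tiny made-up corpus (not the
# 20 Newsgroups data), is to put the vectorizer and classifier in a single pipeline that is fitted on
# training text only. TfidfVectorizer combines CountVectorizer and TfidfTransformer in one step.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny hypothetical training corpus
train_docs = [
    'the pitcher threw a fast ball',
    'a home run in the ninth inning',
    'the rocket reached orbit',
    'nasa launched a new satellite',
]
train_labels = ['baseball', 'baseball', 'space', 'space']

# Vectorizer and classifier are fitted together, on training documents only
text_clf = make_pipeline(TfidfVectorizer(stop_words='english'), MultinomialNB())
text_clf.fit(train_docs, train_labels)

# Raw text goes straight into predict; the pipeline applies the fitted vectorizer
prediction = text_clf.predict(['the satellite is in orbit'])[0]
print(f'Predicted category: {prediction}')
```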

In [31]:
#27-Write a Python program to train an SVM Classifier with different C values and compare the decision
#boundaries visually
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
# DecisionBoundaryDisplay ships with scikit-learn (1.1+), so no extra package such as mlxtend is needed
from sklearn.inspection import DecisionBoundaryDisplay

# Load a toy dataset
X, y = datasets.make_moons(n_samples=200, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [None]:
# Different C values to test
C_values = [0.1, 1, 10]

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, C in zip(axes, C_values):
    # Train SVM classifier with the current C value
    svm_model = SVC(kernel='linear', C=C)
    svm_model.fit(X_train, y_train)

    # Plot decision regions and overlay the data points
    DecisionBoundaryDisplay.from_estimator(svm_model, X, response_method='predict', alpha=0.3, ax=ax)
    ax.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k')
    ax.set_title(f'SVM with C={C}')
    ax.set_xlabel('Feature 1')
    ax.set_ylabel('Feature 2')

plt.tight_layout()
plt.show()

In [32]:
#28-Write a Python program to train a Bernoulli Naïve Bayes classifier for binary classification on a dataset with
#binary features
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB
from sklearn.metrics import accuracy_score
import numpy as np

# Create a synthetic dataset with binary features
X, y = make_classification(n_samples=500, n_features=10, n_informative=5, n_redundant=0, random_state=42)
X = np.where(X > 0, 1, 0)  # Convert features to binary values (0 or 1)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Bernoulli Naïve Bayes classifier
bnb_classifier = BernoulliNB()
bnb_classifier.fit(X_train, y_train)

# Make predictions
y_pred = bnb_classifier.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')


Accuracy: 0.88


In [33]:
#29-Write a Python program to apply feature scaling before training an SVM model and compare results with
#unscaled data
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target



In [34]:
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train SVM without feature scaling
svm_unscaled = SVC(kernel='linear')
svm_unscaled.fit(X_train, y_train)
y_pred_unscaled = svm_unscaled.predict(X_test)
unscaled_accuracy = accuracy_score(y_test, y_pred_unscaled)

# Apply feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train SVM with feature scaling
svm_scaled = SVC(kernel='linear')
svm_scaled.fit(X_train_scaled, y_train)
y_pred_scaled = svm_scaled.predict(X_test_scaled)
scaled_accuracy = accuracy_score(y_test, y_pred_scaled)

# Compare results
print(f'Accuracy without scaling: {unscaled_accuracy:.2f}')
print(f'Accuracy with scaling: {scaled_accuracy:.2f}')

Accuracy without scaling: 1.00
Accuracy with scaling: 0.97


In [35]:
#30-Write a Python program to train a Gaussian Naïve Bayes model and compare the predictions before and
#after Laplace Smoothing
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Note: GaussianNB has no Laplace smoothing; that applies to count-based variants such as MultinomialNB (alpha).
# Its closest analogue is var_smoothing, which adds a small fraction of the largest feature variance
# to every variance for numerical stability.

# Train Gaussian Naïve Bayes without variance smoothing
gnb_no_smoothing = GaussianNB(var_smoothing=0)  # No smoothing
gnb_no_smoothing.fit(X_train, y_train)
y_pred_no_smoothing = gnb_no_smoothing.predict(X_test)
accuracy_no_smoothing = accuracy_score(y_test, y_pred_no_smoothing)

# Train Gaussian Naïve Bayes with the default variance smoothing
gnb_smoothing = GaussianNB()  # Default var_smoothing=1e-9
gnb_smoothing.fit(X_train, y_train)
y_pred_smoothing = gnb_smoothing.predict(X_test)
accuracy_smoothing = accuracy_score(y_test, y_pred_smoothing)

# Compare results
print(f'Accuracy without smoothing (var_smoothing=0): {accuracy_no_smoothing:.2f}')
print(f'Accuracy with default smoothing (var_smoothing=1e-9): {accuracy_smoothing:.2f}')


Accuracy without smoothing (var_smoothing=0): 1.00
Accuracy with default smoothing (var_smoothing=1e-9): 1.00
