
SVM & Naive Bayes



1: What is Information Gain, and how is it used in Decision Trees?

What is Information Gain?

Information Gain (IG) measures how much uncertainty (entropy) is reduced after splitting a dataset based on a feature.

 In simple words:

Information Gain tells us which feature is the best to split the data at each node of a Decision Tree.
The feature with the highest Information Gain is chosen for the split.

 Why is Information Gain Needed?

When building a decision tree:

We want to create pure nodes (where data mostly belongs to one class).
Information Gain helps select the most informative feature that best separates the classes.

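Key Concepts
1. Entropy

Entropy measures the impurity or randomness in a dataset. It is 0 when every sample in a node belongs to the same class and largest when the classes are evenly mixed.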

How Information Gain is Used in Decision Trees

Start with the full dataset.
Calculate entropy of the target variable.

For each feature:

Split the data.
Compute entropy after the split.
Calculate Information Gain.
Choose the feature with maximum Information Gain.

Repeat recursively until:

Nodes are pure, or
Stopping criteria are met (depth, minimum samples).
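
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not how any particular library implements it: the toy data (`outlook`, `windy`, `play`) is hypothetical, and the helpers `entropy` and `information_gain` are names chosen here for the example. Entropy is computed in bits (log base 2), and Information Gain is the parent entropy minus the size-weighted entropy of the child nodes.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Parent entropy minus the weighted entropy of the child nodes."""
    total = len(labels)
    # Group the labels by the value the feature takes (one child per value).
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    weighted_child_entropy = sum(
        (len(group) / total) * entropy(group) for group in groups.values()
    )
    return entropy(labels) - weighted_child_entropy


# Hypothetical toy data: should we play outside?
outlook = ["sunny", "sunny", "rainy", "rainy", "overcast", "overcast"]
windy = ["yes", "no", "yes", "no", "yes", "no"]
play = ["no", "no", "no", "yes", "yes", "yes"]

# Compute the gain of every candidate feature and pick the largest.
gains = {
    "outlook": information_gain(outlook, play),
    "windy": information_gain(windy, play),
}
best_feature = max(gains, key=gains.get)

print(gains)
print("Best feature to split on:", best_feature)
```

In this toy data, `outlook` leaves two of its three child nodes pure, so it earns a much higher gain than `windy` and would be chosen for the split.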

 Limitations of Information Gain

Biased toward features with many unique values (e.g., an ID column).
Splitting on such features produces many tiny, pure nodes, which can lead to overfitting.

 That’s why algorithms like C4.5 use Gain Ratio instead.
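
To make the bias concrete, here is a small sketch comparing Information Gain with Gain Ratio on hypothetical data. Gain Ratio divides the Information Gain by the "split information", which is the entropy of the feature's own value distribution. The columns `row_id` and `humidity` and all helper names are invented for this illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(values).values())


def information_gain(feature_values, labels):
    """Parent entropy minus the weighted entropy of the child nodes."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    return entropy(labels) - sum(
        (len(group) / total) * entropy(group) for group in groups.values()
    )


def gain_ratio(feature_values, labels):
    """Information Gain normalized by the split information of the feature."""
    split_info = entropy(feature_values)
    if split_info == 0:
        return 0.0  # a feature with a single value cannot split anything
    return information_gain(feature_values, labels) / split_info


# Hypothetical data: row_id is unique per row, humidity is a real signal.
labels = ["no", "no", "no", "yes", "yes", "yes", "yes", "no"]
row_id = [1, 2, 3, 4, 5, 6, 7, 8]
humidity = ["high", "high", "high", "low", "low", "low", "low", "low"]

for name, column in [("row_id", row_id), ("humidity", humidity)]:
    print(
        f"{name:9s} IG={information_gain(column, labels):.3f} "
        f"GainRatio={gain_ratio(column, labels):.3f}"
    )
```

The unique `row_id` column gets the highest Information Gain because every child node contains a single row, yet its Gain Ratio is lower than that of `humidity` because its split information is large.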

2: What is the difference between Gini Impurity and Entropy?

Gini Impurity vs Entropy

Both Gini Impurity and Entropy are impurity measures used by decision tree algorithms to decide the best split at each node.

Definitions
Gini Impurity

Measures the probability that a randomly chosen element would be misclassified if it were labeled at random according to the class distribution in the node.
Used by CART (Classification and Regression Trees).

Entropy

Measures the amount of uncertainty or randomness in the data.
Used by ID3 and C4.5 algorithms.
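
As a quick illustration of the two definitions, the sketch below evaluates both measures on a few two-class distributions. It uses the standard formulas, Gini = 1 - sum(p_i^2) and Entropy = -sum(p_i * log2(p_i)); the function names are chosen here for the example.

```python
import math


def gini(probabilities):
    """Gini impurity: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in probabilities)


def entropy(probabilities):
    """Shannon entropy in bits: -sum(p_i * log2(p_i)), skipping p_i = 0."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


# Two-class nodes where the minority class has probability p.
for p in (0.1, 0.3, 0.5):
    dist = (p, 1.0 - p)
    print(f"p={p:.1f}  gini={gini(dist):.3f}  entropy={entropy(dist):.3f}")
```

Both measures are 0 for a pure node and peak at an even 50/50 mix, where Gini reaches 0.5 and entropy reaches 1 bit. They usually rank candidate splits similarly, which is why the choice between them rarely changes the tree much.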

Gini Impurity

Strengths

Faster to compute
Works well for large datasets
Default choice in many libraries (e.g., sklearn)

Weaknesses

Slightly less informative for complex distributions
Less sensitive to class imbalance

Entropy

Strengths

More theoretically sound (information theory)
More sensitive to class purity
Often produces more informative splits

Weaknesses

Slightly slower to compute (requires logarithms)
Can lead to deeper trees (overfitting risk)



3: What is Pre-Pruning in Decision Trees?



Pre-Pruning (also called Early Stopping) is a technique where the growth of a decision tree is stopped early during training to prevent the tree from becoming too complex.

 In simple words:

The tree is not allowed to grow fully; splitting stops when certain conditions are met.

 Why Pre-Pruning is Needed

Decision Trees can:

Grow very deep
Learn noise from training data
Perform poorly on unseen data (overfitting)

Pre-Pruning helps:
✔ Reduce overfitting
✔ Improve generalization
✔ Reduce training time
✔ Create simpler, interpretable trees

 How Pre-Pruning Works

Before making a split at any node, the algorithm checks predefined stopping criteria.
If any criterion is triggered (for example, the maximum depth is reached or the node has too few samples), the node is not split and becomes a leaf.

 Common Pre-Pruning Criteria

Maximum Tree Depth
 Stop splitting if the tree reaches a fixed depth.
Minimum Samples per Split
 Require a minimum number of samples to split a node.
Minimum Samples per Leaf
 Ensure each leaf has enough data points.
Minimum Impurity Decrease
 Split only if impurity reduction (Gini/Entropy) is above a threshold.
Maximum Number of Leaf Nodes
 Limit the total number of leaf nodes.
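
In scikit-learn, each of these criteria corresponds to a constructor parameter of `DecisionTreeClassifier`. The sketch below compares an unrestricted tree with a pre-pruned one on the Iris dataset; the specific threshold values are arbitrary choices for illustration, not recommended settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Unrestricted tree: grows until every leaf is pure.
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Pre-pruned tree: illustrative thresholds, one per criterion listed above.
pruned_tree = DecisionTreeClassifier(
    max_depth=3,                 # maximum tree depth
    min_samples_split=10,        # minimum samples required to split a node
    min_samples_leaf=5,          # minimum samples required in each leaf
    min_impurity_decrease=0.01,  # minimum impurity reduction for a split
    max_leaf_nodes=8,            # maximum number of leaf nodes
    random_state=42,
).fit(X_train, y_train)

for name, tree in [("full", full_tree), ("pre-pruned", pruned_tree)]:
    print(
        f"{name:10s} depth={tree.get_depth()} "
        f"leaves={tree.get_n_leaves()} "
        f"test accuracy={tree.score(X_test, y_test):.3f}"
    )
```

In practice these thresholds are usually tuned with cross-validation (for example, `GridSearchCV`) rather than set by hand.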

Limitations of Pre-Pruning

May stop too early
Risk of underfitting
Choosing optimal thresholds can be tricky

4: Write a Python program to train a Decision Tree Classifier using Gini Impurity as the criterion and print the feature importances (practical).

In [1]:
# Import required libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
import pandas as pd

# 1. Load dataset
iris = load_iris()
X = iris.data
y = iris.target

# Convert to DataFrame for better readability
feature_names = iris.feature_names
X_df = pd.DataFrame(X, columns=feature_names)

# 2. Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X_df, y, test_size=0.2, random_state=42
)

# 3. Train Decision Tree using Gini Impurity
model = DecisionTreeClassifier(
    criterion='gini',
    random_state=42
)

model.fit(X_train, y_train)

# 4. Print feature importances
print("Feature Importances (Gini Impurity):")
for feature, importance in zip(feature_names, model.feature_importances_):
    print(f"{feature}: {importance:.4f}")


Feature Importances (Gini Impurity):
sepal length (cm): 0.0000
sepal width (cm): 0.0167
petal length (cm): 0.9061
petal width (cm): 0.0772


5: What is a Support Vector Machine (SVM)?

A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression tasks.
Its main goal is to find the optimal decision boundary (hyperplane) that best separates different classes in the data.

 Key Idea Behind SVM

SVM tries to:

Separate data points into classes
Maximize the margin between the closest data points of each class

These closest points are called Support Vectors.

 Core Concepts
1. Hyperplane

A line (2D), plane (3D), or higher-dimensional boundary
Separates data points of different classes

2. Margin

Distance between the hyperplane and the nearest data points
SVM chooses the hyperplane with the maximum margin

3. Support Vectors

Data points closest to the hyperplane
They define the position of the hyperplane
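
These three concepts can be inspected directly on a fitted model. The sketch below trains a linear `SVC` on a tiny, hypothetical, linearly separable dataset and reads the hyperplane from `coef_` and `intercept_` and the support vectors from `support_vectors_`. For a linear SVM the margin width equals 2 / ||w||. The very large `C` is an assumption made here to approximate a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data (two clusters in 2D).
X = np.array(
    [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]]
)
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C leaves almost no room for margin violations (near hard margin).
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]        # normal vector of the separating hyperplane
b = clf.intercept_[0]   # offset of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)

print("Hyperplane: w =", w, ", b =", b)
print("Support vectors:")
print(clf.support_vectors_)
print("Margin width:", margin_width)
```

Only the points listed in `support_vectors_` influence the fitted hyperplane; moving any other point (without crossing the margin) would leave the boundary unchanged.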

Types of SVM
Linear SVM:

Used when data is linearly separable
Decision boundary is a straight line in 2D (a hyperplane in higher dimensions)

 Non-Linear SVM:

Used when data is not linearly separable
Uses kernel functions to map data into higher dimensions

Advantages of SVM

- Effective in high-dimensional spaces
- Works well with small and medium datasets
- Robust to overfitting (with proper kernel & C value)

 Disadvantages of SVM

- Computationally expensive for large datasets
- Sensitive to choice of kernel and parameters
- Harder to interpret than Decision Trees

Example:


In [2]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load dataset
data = datasets.load_iris()
X, y = data.data, data.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train SVM model
svm_model = SVC(kernel='linear')
svm_model.fit(X_train, y_train)

print("SVM model trained successfully")


SVM model trained successfully


6: What is the Kernel Trick in SVM?

The Kernel Trick is a technique used in Support Vector Machines (SVM) that allows the algorithm to separate non-linearly separable data by implicitly mapping it into a higher-dimensional space, without explicitly computing the transformation.

Why the Kernel Trick is Needed?

Some datasets cannot be separated by a straight line in their original feature space.

Examples:

XOR problem
Circular or spiral data patterns

Instead of manually transforming features, SVM uses a kernel function to compute inner products in a higher-dimensional space efficiently.

 Key Idea (Simple Explanation)

The kernel trick lets SVM draw non-linear decision boundaries in the original space by performing linear separation in a higher-dimensional space.

 How It Works (Conceptually)

Data is mapped from input space → higher-dimensional feature space
A linear hyperplane is found in that space
When projected back, the boundary appears non-linear
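
The "implicit mapping" can be checked numerically. The sketch below uses the degree-2 polynomial kernel K(x, z) = (x · z)^2 as an illustrative choice (it is not the RBF kernel): for 2D inputs, this kernel equals the inner product of the explicit feature map phi(x) = (x1^2, sqrt(2)·x1·x2, x2^2). The function names and sample points are chosen here for the example.

```python
import numpy as np


def phi(x):
    """Explicit degree-2 feature map for a 2D input (maps 2D -> 3D)."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])


def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: (x . z)^2."""
    return np.dot(x, z) ** 2


x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

explicit = np.dot(phi(x), phi(z))  # inner product computed in the 3D feature space
implicit = poly_kernel(x, z)       # same value, computed in the original 2D space

print("Explicit mapping:", explicit)
print("Kernel function: ", implicit)
```

Both numbers agree, but the kernel never builds the 3D vectors. For kernels such as RBF the corresponding feature space is infinite-dimensional, so this shortcut is what makes the computation feasible at all.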

Advantages of Kernel Trick:

 Handles non-linear data efficiently
 No need to explicitly compute higher-dimensional features
 Powerful and flexible decision boundaries

 Limitations:

 Kernel choice is problem-dependent
 Can be computationally expensive for large datasets
 Harder to interpret

Example:

In [3]:
from sklearn.svm import SVC
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split

# Create non-linear data (fixed seed so the run is reproducible)
X, y = make_circles(n_samples=200, noise=0.1, factor=0.2, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train SVM with RBF kernel
svm_model = SVC(kernel='rbf', gamma='scale')
svm_model.fit(X_train, y_train)

print("SVM trained using Kernel Trick (RBF kernel)")


SVM trained using Kernel Trick (RBF kernel)
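
To see what the kernel buys, a follow-up sketch can fit both a linear and an RBF kernel on the same concentric-circles data and compare test accuracy. The fixed `random_state` values are an assumption added here for reproducibility.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric circles: not separable by any straight line.
X, y = make_circles(n_samples=200, noise=0.1, factor=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scores = {}
for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel, gamma="scale")  # gamma is ignored by the linear kernel
    clf.fit(X_train, y_train)
    scores[kernel] = clf.score(X_test, y_test)

print(scores)
```

The linear kernel has no straight line that separates an inner circle from an outer ring, so it should score well below the RBF kernel on this data.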


7: Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies.

In [4]:
# Import required libraries
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# 1. Load Wine dataset
wine = load_wine()
X = wine.data
y = wine.target

# 2. Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 3. Train SVM with Linear kernel
svm_linear = SVC(kernel='linear', random_state=42)
svm_linear.fit(X_train, y_train)

# 4. Train SVM with RBF kernel
svm_rbf = SVC(kernel='rbf', random_state=42)
svm_rbf.fit(X_train, y_train)

# 5. Make predictions
y_pred_linear = svm_linear.predict(X_test)
y_pred_rbf = svm_rbf.predict(X_test)

# 6. Calculate accuracies
accuracy_linear = accuracy_score(y_test, y_pred_linear)
accuracy_rbf = accuracy_score(y_test, y_pred_rbf)

# 7. Print results
print("Accuracy with Linear Kernel SVM:", accuracy_linear)
print("Accuracy with RBF Kernel SVM:", accuracy_rbf)


Accuracy with Linear Kernel SVM: 1.0
Accuracy with RBF Kernel SVM: 0.8055555555555556
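
One likely contributor to the lower RBF score is that the Wine features are on very different numeric scales, and the RBF kernel depends on distances between samples. A common remedy is to standardize the features first. The sketch below is a suggested variation on the program above, not part of the original comparison; it wraps `StandardScaler` and `SVC` in a pipeline so the scaler is fitted on the training split only.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Standardize each feature to zero mean and unit variance, then fit the RBF SVM.
scaled_rbf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scaled_rbf.fit(X_train, y_train)

scaled_accuracy = scaled_rbf.score(X_test, y_test)
print("Accuracy with scaled RBF Kernel SVM:", scaled_accuracy)
```

Comparing this number with the unscaled RBF result gives a fairer picture of the two kernels than the raw-feature comparison alone.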


8: What is the Naïve Bayes classifier, and why is it called "Naïve"?

What is Naïve Bayes?

The Naïve Bayes classifier is a supervised machine learning algorithm based on Bayes’ Theorem.
It is mainly used for classification tasks, especially in text classification, spam detection, and sentiment analysis.

It predicts the class of a data point by calculating the posterior probability for each class and choosing the class with the highest probability.

Why is it Called “Naïve”?

It is called “Naïve” because of its strong assumption:

 All features are conditionally independent given the class label

This assumption is usually not true in real-world data, but surprisingly, the algorithm still works very well in practice.
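
A hand-worked example shows how the independence assumption simplifies the calculation. All probabilities below are hypothetical values invented for illustration. Under the naive assumption, the likelihood of seeing several words in a message is just the product of the per-word likelihoods for that class.

```python
# Hypothetical probabilities, invented purely for illustration.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.30, "meeting": 0.02},  # P(word | spam)
    "ham": {"free": 0.03, "meeting": 0.20},   # P(word | ham)
}
words = ["free", "meeting"]  # words observed in the message to classify

# Naive assumption: P(words | class) = product of P(word | class).
scores = {}
for label in priors:
    score = priors[label]
    for word in words:
        score *= likelihoods[label][word]
    scores[label] = score  # proportional to P(class | words)

# Normalize so the posteriors sum to 1 (Bayes' Theorem denominator).
evidence = sum(scores.values())
posteriors = {label: score / evidence for label, score in scores.items()}
prediction = max(posteriors, key=posteriors.get)

print(posteriors)
print("Predicted class:", prediction)
```

Without the independence assumption, the model would need an estimate of the joint probability of every word combination per class, which is rarely feasible with real data.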

Key Assumptions:

Features are conditionally independent of each other given the class
Each feature contributes equally and independently to the outcome
Correlations among predictors within a class are ignored

Advantages

 Simple and fast
 Works well with high-dimensional data
 Performs well even with small datasets
 Very effective for text classification

 Disadvantages

 Strong independence assumption
 Poor performance if features are highly correlated
 Zero probability problem (handled using smoothing)
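
The zero probability problem and its fix can be sketched with additive (Laplace) smoothing: a constant `alpha` is added to every word count so that no likelihood is exactly zero. The counts and the helper name `word_likelihood` below are hypothetical; scikit-learn's `MultinomialNB` exposes the same idea through its `alpha` parameter.

```python
def word_likelihood(count, total_words, vocab_size, alpha=1.0):
    """Smoothed estimate of P(word | class): (count + alpha) / (total + alpha * V)."""
    return (count + alpha) / (total_words + alpha * vocab_size)


# Hypothetical counts: the word never appeared in 100 spam training words,
# and the vocabulary contains 50 distinct words.
unsmoothed = word_likelihood(count=0, total_words=100, vocab_size=50, alpha=0.0)
smoothed = word_likelihood(count=0, total_words=100, vocab_size=50, alpha=1.0)

print("Without smoothing:", unsmoothed)
print("With Laplace smoothing:", smoothed)
```

Without smoothing, a single unseen word multiplies the whole class score by zero and rules that class out regardless of the other evidence; with smoothing the word only lowers the score.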

 Example:

In [5]:
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load dataset
data = load_iris()
X, y = data.data, data.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train Naïve Bayes model
model = GaussianNB()
model.fit(X_train, y_train)

print("Naïve Bayes model trained successfully")


Naïve Bayes model trained successfully


10: Breast Cancer Dataset
Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and evaluate accuracy.

In [6]:
# Import required libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# 1. Load Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# 2. Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 3. Train Gaussian Naïve Bayes model
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# 4. Make predictions
y_pred = gnb.predict(X_test)

# 5. Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)

print("Accuracy of Gaussian Naïve Bayes Classifier:", accuracy)


Accuracy of Gaussian Naïve Bayes Classifier: 0.9736842105263158
