# Supervised Classification: Decision Trees, SVM, and Naive Bayes | **Assignment**

**Instructions:** Carefully read each question. Use Google Docs, Microsoft Word, or a similar tool to create a document where you type out each question along with its answer. Save the document as a PDF, and then upload it to the LMS. Please do not zip or archive the files before uploading them. Each question carries 20 marks.

**Question 1:** What is Information Gain, and how is it used in Decision Trees?

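**Answer:** Information Gain is a measure used in Decision Trees to decide which feature (or attribute) to split on at each step of building the tree. It is the reduction in entropy achieved by a split: the entropy of the parent node minus the weighted average entropy of the child nodes. At each node, the tree picks the feature with the highest Information Gain, because that feature separates the training examples most cleanly according to their target classes.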
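As a rough illustration, Information Gain can be computed by hand on a small label set. The sketch below is not scikit-learn's internal implementation; the helper names `entropy` and `information_gain` and the toy labels are assumptions made up for this example.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Toy data (hypothetical): 4 "yes" and 4 "no" labels
parent = ["yes"] * 4 + ["no"] * 4
# A split that separates the classes perfectly
perfect_split = [["yes"] * 4, ["no"] * 4]
# A split that leaves both children as mixed as the parent
useless_split = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]

print(information_gain(parent, perfect_split))  # 1.0
print(information_gain(parent, useless_split))  # 0.0
```

A tree builder would evaluate every candidate split this way and keep the one with the largest gain.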

**Question 2:** What is the difference between Gini Impurity and Entropy?

Hint: Directly compares the two main impurity measures, highlighting strengths, weaknesses, and appropriate use cases.

**Answer:** A comparison of Gini Impurity and Entropy, the two most common impurity measures:

**Entropy:** Measures the amount of disorder or uncertainty in the data.

**Gini Impurity:** Measures the probability of incorrectly classifying a randomly chosen sample.

# Interpretation

Entropy comes from information theory — it quantifies how much “information” is needed to describe the dataset.

Gini Impurity comes from probability theory — it measures how often a randomly chosen sample would be misclassified if it were labeled randomly based on the class distribution.
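To make the comparison concrete, both measures can be evaluated on a few class distributions. This is a minimal sketch using the standard formulas (Gini = 1 - Σ p², entropy = -Σ p log₂ p); the function names and the example distributions are assumptions chosen for illustration.

```python
from math import log2

def gini(probs):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    return 1.0 - sum(p * p for p in probs)

def entropy_bits(probs):
    """Shannon entropy in bits, skipping zero-probability classes."""
    return sum(-p * log2(p) for p in probs if p > 0)

# Pure node, skewed node, and perfectly balanced node (binary case)
for probs in ([1.0, 0.0], [0.9, 0.1], [0.5, 0.5]):
    print(probs, f"gini={gini(probs):.3f}", f"entropy={entropy_bits(probs):.3f}")
```

For two classes, both measures are zero for a pure node and peak when the classes are evenly balanced (Gini at 0.5, entropy at 1 bit). Gini avoids the logarithm, so it is slightly cheaper to compute.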


**Question 3:** What is Pre-Pruning in Decision Trees?

**Answer:** Pre-pruning (also called early stopping) is a technique used to stop the growth of a Decision Tree early—before it becomes too complex and starts overfitting the training data.

# 1. Concept

Instead of letting the tree grow fully and then trimming it, pre-pruning prevents unnecessary splits during the tree-building process.

The idea is to stop splitting a node if the split doesn’t provide a significant improvement in prediction accuracy or information gain.

# 2. Common Pre-Pruning Criteria

A tree may stop splitting when:

- The information gain (or impurity reduction) is below a threshold.

- The number of samples in a node is too small.

- The maximum depth of the tree is reached.

- The accuracy improvement after a split is minimal.
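In scikit-learn, several of these stopping rules map onto constructor arguments of `DecisionTreeClassifier`. The sketch below is one possible configuration on the Iris dataset; the threshold values are arbitrary assumptions, not recommended settings. As far as I know there is no built-in "accuracy improvement" threshold, so that criterion is not shown.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Unrestricted tree: grows until every leaf is pure
full_tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# Pre-pruned tree: each argument corresponds to one stopping rule
pruned_tree = DecisionTreeClassifier(
    max_depth=3,                 # stop once the maximum depth is reached
    min_samples_split=10,        # do not split nodes with fewer than 10 samples
    min_impurity_decrease=0.01,  # require a minimum impurity reduction per split
    random_state=42,
).fit(X, y)

print("Full tree   depth:", full_tree.get_depth(), "leaves:", full_tree.get_n_leaves())
print("Pruned tree depth:", pruned_tree.get_depth(), "leaves:", pruned_tree.get_n_leaves())
```

The pruned tree is smaller by construction; whether it generalizes better depends on the data and should be checked on a held-out set.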

**Question 4:** Write a Python program to train a Decision Tree Classifier using Gini Impurity as the criterion and print the feature importances (practical).

Hint: Use criterion='gini' in DecisionTreeClassifier and access feature_importances_.

(Include your Python code and output in the code box below.)

**Answer:**

```python
# Import required libraries
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Create and train the Decision Tree Classifier using Gini Impurity
clf = DecisionTreeClassifier(criterion='gini', random_state=42)
clf.fit(X, y)

# Print the feature importances
print("Feature Importances:")
for name, importance in zip(iris.feature_names, clf.feature_importances_):
    print(f"{name}: {importance:.4f}")
```


**Question 5:** What is a Support Vector Machine (SVM)?


**Answer:** A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and sometimes regression tasks. It works by finding the hyperplane that separates the classes with the largest possible margin. The training points closest to that boundary are called support vectors, and they alone determine where the boundary lies.
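A small example may help show what the fitted boundary looks like. The sketch below trains a linear `SVC` on a made-up, linearly separable 2D dataset (the points and labels are assumptions for illustration) and inspects the support vectors and hyperplane parameters.

```python
from sklearn.svm import SVC

# Hypothetical toy data: two linearly separable groups in 2D
X = [[1, 1], [2, 1], [1, 2], [4, 4], [5, 4], [4, 5]]
y = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Only the points nearest the boundary are kept as support vectors
print("Support vectors:", clf.support_vectors_)
# For a linear kernel the hyperplane is w . x + b = 0
print("Weights w:", clf.coef_[0], "bias b:", clf.intercept_[0])
print("Predictions:", clf.predict([[0, 0], [6, 6]]))
```

Points far from the boundary could be removed without changing the fitted hyperplane, which is why only the support vectors matter.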

**Question 6:** What is the Kernel Trick in SVM?

**Answer:** The Kernel Trick in Support Vector Machines (SVM) is a mathematical technique that allows the algorithm to handle non-linear data by transforming it into a higher-dimensional space—without explicitly computing that transformation.
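The idea can be checked numerically for one simple kernel. The sketch below assumes the homogeneous degree-2 polynomial kernel K(x, z) = (x · z)² on 2D inputs, whose explicit feature map is φ(x) = (x₁², √2·x₁x₂, x₂²); the function names are hypothetical helpers for this example.

```python
from math import isclose, sqrt

def phi(v):
    """Explicit degree-2 feature map for a 2D point: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = v
    return (x1 * x1, sqrt(2) * x1 * x2, x2 * x2)

def dot(a, b):
    """Plain inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def poly_kernel(a, b):
    """Homogeneous polynomial kernel of degree 2: (a . b)^2."""
    return dot(a, b) ** 2

x, z = (1.0, 2.0), (3.0, 4.0)
explicit = dot(phi(x), phi(z))  # inner product in the 3D feature space
implicit = poly_kernel(x, z)    # computed entirely in the original 2D space
print(isclose(explicit, implicit))
```

Both routes give the same inner product, but the kernel never builds the higher-dimensional vectors. For the RBF kernel the implicit feature space is infinite-dimensional, so the explicit route is not even available.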

**Question 7:** Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies.

Hint: Use SVC(kernel='linear') and SVC(kernel='rbf'), then compare accuracy scores after fitting on the same dataset.

(Include your Python code and output in the code box below.)

**Answer:**

```python
# Import required libraries
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Wine dataset
wine = load_wine()
X = wine.data
y = wine.target

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train SVM with Linear kernel
svm_linear = SVC(kernel='linear', random_state=42)
svm_linear.fit(X_train, y_train)
y_pred_linear = svm_linear.predict(X_test)
accuracy_linear = accuracy_score(y_test, y_pred_linear)

# Train SVM with RBF kernel
svm_rbf = SVC(kernel='rbf', random_state=42)
svm_rbf.fit(X_train, y_train)
y_pred_rbf = svm_rbf.predict(X_test)
accuracy_rbf = accuracy_score(y_test, y_pred_rbf)

# Print the accuracies
print(f"Accuracy with Linear Kernel: {accuracy_linear:.4f}")
print(f"Accuracy with RBF Kernel: {accuracy_rbf:.4f}")
```


**Question 8:** What is the Naïve Bayes classifier, and why is it called "Naïve"?

**Answer:** The Naïve Bayes classifier is a probabilistic machine learning algorithm used for classification tasks. It is based on Bayes’ Theorem, which calculates the probability of a class given the observed features.

# 1. Bayes’ Theorem
$$P(C \mid X) = \frac{P(X \mid C) \cdot P(C)}{P(X)}$$

# 2. Why is it “Naïve”?

It is called naïve because it assumes all features are independent of each other, given the class label. In real-world data, features are often correlated, so this assumption is “naïve.”

Despite this strong assumption, Naïve Bayes often works surprisingly well in practice, especially for text classification (like spam detection or sentiment analysis).
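The independence assumption means the class-conditional likelihood is just a product of per-feature probabilities. The sketch below works through Bayes' Theorem by hand for a two-word spam filter; all probabilities and names are invented for illustration.

```python
# Hypothetical probabilities for a two-word spam filter (made up for illustration)
prior = {"spam": 0.4, "ham": 0.6}
likelihood = {
    "spam": {"free": 0.8, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.5},
}

def posterior(words):
    """Return P(class | words) under the naive independence assumption."""
    scores = {}
    for label in prior:
        # Unnormalized score: P(C) times the product of P(word | C)
        score = prior[label]
        for word in words:
            score *= likelihood[label][word]
        scores[label] = score
    evidence = sum(scores.values())  # P(X), the normalizing constant
    return {label: score / evidence for label, score in scores.items()}

print(posterior(["free"]))
print(posterior(["free", "meeting"]))
```

With these made-up numbers, seeing "free" alone favors spam, while adding "meeting" pulls the posterior back toward ham.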

**Question 9:** Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes.

**Answer:**
# 1. Gaussian Naïve Bayes

- Use case: Continuous numerical data (e.g., height, weight, temperature).

- Assumption: Features follow a Gaussian (normal) distribution.

- How it works: Estimates the likelihood of a feature using the Gaussian probability density function:

$$P(x_i \mid C) = \frac{1}{\sqrt{2\pi\sigma_C^2}} \exp\left(-\frac{(x_i - \mu_C)^2}{2\sigma_C^2}\right)$$

- Example: Predicting whether a person has a disease based on continuous measurements like blood pressure or cholesterol levels.
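The Gaussian likelihood can be evaluated directly once a mean and variance have been estimated for each class. The sketch below uses hypothetical per-class statistics for a single feature; the numbers are invented and only meant to show the calculation.

```python
from math import exp, pi, sqrt

def gaussian_pdf(x, mean, var):
    """Gaussian density for value x given a class mean and variance."""
    return exp(-((x - mean) ** 2) / (2 * var)) / sqrt(2 * pi * var)

# Hypothetical (mean, variance) of one feature per class, e.g. systolic blood pressure
stats = {"healthy": (120.0, 100.0), "disease": (150.0, 225.0)}

x = 140.0
for label, (mean, var) in stats.items():
    print(label, gaussian_pdf(x, mean, var))
```

With several features, the classifier multiplies one such density per feature (or sums their logarithms) and combines the result with the class prior.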

# 2. Multinomial Naïve Bayes

- Use case: Discrete count data, commonly used for text classification.

- Assumption: Features represent counts or frequencies (like word counts in a document).

- How it works: Computes probability based on the frequency of each feature in the class.

- Example: Classifying emails as spam or not based on word counts.
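A tiny example with scikit-learn's `MultinomialNB` shows the count-based model in action. The word-count matrix, vocabulary, and labels below are made up for illustration.

```python
from sklearn.naive_bayes import MultinomialNB

# Hypothetical word-count matrix; columns count ["free", "win", "meeting", "report"]
X = [
    [3, 2, 0, 0],  # spam
    [2, 3, 0, 1],  # spam
    [0, 0, 2, 3],  # ham
    [0, 1, 3, 2],  # ham
]
y = ["spam", "spam", "ham", "ham"]

clf = MultinomialNB(alpha=1.0)  # alpha=1.0 applies Laplace (add-one) smoothing
clf.fit(X, y)

# Classify two new documents by their word counts
print(clf.predict([[4, 1, 0, 0], [0, 0, 1, 2]]))
```

Smoothing matters here: without it, a word never seen in a class would force that class's probability to zero.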

# 3. Bernoulli Naïve Bayes

- Use case: Binary/Boolean features (0 or 1).

- Assumption: Features are present or absent, not counts.

- How it works: Models each feature as a Bernoulli random variable:

$$P(x_i \mid C) = p_i^{x_i} (1 - p_i)^{1 - x_i}$$

- Example: Spam detection using binary indicators like “contains word ‘free’ or not.”
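The Bernoulli likelihood can be computed by hand to see how absent features contribute. In this sketch, `bernoulli_likelihood` and the probabilities in `p_spam` are hypothetical and chosen only for illustration.

```python
def bernoulli_likelihood(x, p):
    """P(x | C) for binary features x, given per-feature presence probabilities p."""
    result = 1.0
    for x_i, p_i in zip(x, p):
        # p_i if the feature is present (x_i = 1), 1 - p_i if it is absent (x_i = 0)
        result *= p_i ** x_i * (1 - p_i) ** (1 - x_i)
    return result

# Hypothetical P(word present | spam) for the words ["free", "meeting"]
p_spam = [0.8, 0.1]

# "free" is present, "meeting" is absent
print(bernoulli_likelihood([1, 0], p_spam))
```

Unlike the multinomial model, the absence of a word contributes a factor of (1 - p_i), so missing words count as evidence too.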

**Question 10:** **Breast Cancer Dataset** Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and evaluate accuracy.

Hint: Use GaussianNB() from sklearn.naive_bayes and the Breast Cancer dataset from sklearn.datasets.

(Include your Python code and output in the code box below.)

**Answer:**

```python
# Import required libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create and train Gaussian Naive Bayes classifier
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Make predictions
y_pred = gnb.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of Gaussian Naive Bayes on Breast Cancer dataset: {accuracy:.4f}")
```
