Question 1: What is Information Gain, and how is it used in Decision Trees?

Information Gain is a metric used in decision trees to determine which feature provides the most useful split at each step. It measures how much uncertainty (called entropy) in the target variable is reduced after splitting the data based on a particular feature.

A decision tree calculates the entropy of the parent node, then evaluates how each candidate feature would divide the data into child nodes. The difference between the parent entropy and the weighted average entropy of the children (each child weighted by its share of the parent's samples) is called Information Gain.

The feature that results in the highest Information Gain is chosen for the split because it produces the purest and most informative partitions of the data. This process continues recursively, helping the decision tree build a structure that best separates the classes and improves prediction accuracy.
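
The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not how any particular library implements it; the helper names `entropy` and `information_gain` and the toy labels are assumptions made for the example.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the sample-weighted entropy of the child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical toy node: 4 "yes" and 4 "no" labels.
parent = ["yes"] * 4 + ["no"] * 4
# Candidate split A separates the classes perfectly.
split_a = [["yes"] * 4, ["no"] * 4]
# Candidate split B leaves both children as mixed as the parent.
split_b = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]

gain_a = information_gain(parent, split_a)
gain_b = information_gain(parent, split_b)
print(f"Gain A: {gain_a:.3f}, Gain B: {gain_b:.3f}")  # Gain A: 1.000, Gain B: 0.000
```

Split A removes all uncertainty, so a tree using Information Gain would prefer it over split B.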

Question 2: What is the difference between Gini Impurity and Entropy?
Hint: Directly compares the two main impurity measures, highlighting strengths,
weaknesses, and appropriate use cases.

1. Meaning

Gini Impurity: Measures how often a sample would be misclassified if labels were
assigned randomly according to class proportions.

Entropy: Measures the level of uncertainty or randomness in the node.


2. Strengths

Gini Impurity

Faster to compute (no log function).

Usually performs on par with entropy, so the cheaper computation costs little in accuracy.

Tends to isolate the most frequent class in its own branch of the tree.

Entropy

More theoretically grounded (Information Theory).

Tends to produce slightly more balanced trees than Gini.

Is the measure on which Information Gain is defined.

3. Weaknesses

Gini Impurity

Less interpretable from an information-theory perspective.

May be biased toward features with many categories, a tendency it shares with entropy-based Information Gain.

Entropy

Slightly slower due to logarithms.

Can exaggerate uncertainty in near-balanced splits.

4. When to Use Each (Use Cases)

Use Gini Impurity when:

You want faster training (default in CART/scikit-learn).

Simplicity and performance matter more than theory.

Use Entropy when:

You want splits based strictly on Information Gain (ID3, C4.5).

You want node uncertainty expressed in information-theoretic terms (bits).
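
To make the comparison concrete, here is a small sketch that evaluates both measures for a two-class node. The function names `binary_gini` and `binary_entropy` are chosen for this example only; `p` is the share of one class in the node.

```python
from math import log2

def binary_gini(p):
    """Gini impurity of a two-class node where p is the share of class 1."""
    return 1.0 - (p ** 2 + (1.0 - p) ** 2)

def binary_entropy(p):
    """Entropy in bits of a two-class node; a pure node has zero entropy."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1.0 - p) * log2(1.0 - p))

# Both measures are 0 for a pure node and peak at an even 50-50 mix.
for p in (0.0, 0.1, 0.3, 0.5):
    print(f"p={p:.1f}  gini={binary_gini(p):.3f}  entropy={binary_entropy(p):.3f}")
```

Both curves have the same shape; Gini peaks at 0.5 and entropy at 1 bit, which is one reason the two criteria usually rank candidate splits similarly.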

Question 3: What is Pre-Pruning in Decision Trees?

Pre-pruning, also called early stopping, is a technique used to stop a decision tree from growing too deep and becoming overly complex.

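Instead of allowing the tree to fully grow and then trimming it, pre-pruning puts rules in place during tree construction to decide when to stop splitting a node. Typical rules are a maximum depth, a minimum number of samples required to split a node or to form a leaf, and a minimum impurity decrease.

It reduces overfitting by stopping the algorithm early when further splits are unlikely to improve performance on unseen data.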
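
As a rough illustration, scikit-learn exposes pre-pruning through constructor parameters of `DecisionTreeClassifier`. The sketch below compares an unrestricted tree with a pre-pruned one on the Iris dataset; the specific limits are illustrative assumptions, not tuned values.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Fully grown tree: no stopping rules, so it splits until every leaf is pure.
full_tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# Pre-pruned tree: growth stops early when any of these limits is reached.
# The values below are illustrative assumptions, not tuned settings.
pruned_tree = DecisionTreeClassifier(
    max_depth=3,
    min_samples_split=10,
    min_samples_leaf=5,
    random_state=42,
).fit(X, y)

print("Full tree   -> depth:", full_tree.get_depth(), "leaves:", full_tree.get_n_leaves())
print("Pruned tree -> depth:", pruned_tree.get_depth(), "leaves:", pruned_tree.get_n_leaves())
```

In practice the limits would be chosen by cross-validation rather than fixed by hand.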

In [2]:
'''
Question 4:Write a Python program to train a Decision Tree Classifier using Gini
Impurity as the criterion and print the feature importances (practical).
Hint: Use criterion='gini' in DecisionTreeClassifier and access .feature_importances_.
(Include your Python code and output in the code box below.)
'''

# Decision Tree Classifier using Gini Impurity
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Load dataset (Iris dataset for demonstration)
iris = load_iris()
X = iris.data
y = iris.target
feature_names = iris.feature_names

# Create and train the model using Gini Impurity
clf = DecisionTreeClassifier(criterion='gini', random_state=42)
clf.fit(X, y)

# Print feature importances
importances = clf.feature_importances_

# Display results clearly
for name, importance in zip(feature_names, importances):
    print(f"{name}: {importance:.4f}")


sepal length (cm): 0.0133
sepal width (cm): 0.0000
petal length (cm): 0.5641
petal width (cm): 0.4226


Question 5: What is a Support Vector Machine (SVM)?

A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression, but it is mainly popular for classification tasks.

SVM works by finding the best possible boundary (called a hyperplane) that separates different classes in the data.

It chooses this boundary in such a way that the margin — the distance between the hyperplane and the nearest data points — is as large as possible.

These closest data points are called support vectors, and they determine the position of the boundary.
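
The idea of a maximum-margin boundary and its support vectors can be seen on a tiny example. The six points below are hypothetical toy data chosen so the classes are linearly separable; the large `C` value is an assumption used to approximate a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two small, linearly separable clusters.
X = np.array([[0, 0], [1, 1], [0, 1], [3, 3], [4, 4], [3, 4]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1000.0).fit(X, y)

w = clf.coef_[0]                         # normal vector of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)   # distance between the two margin lines

print("Support vectors:\n", clf.support_vectors_)
print("Margin width:", round(margin_width, 3))
```

Only the points nearest the boundary appear in `support_vectors_`; moving any of the other points (without crossing the margin) would leave the hyperplane unchanged.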

Question 6: What is the Kernel Trick in SVM?

The Kernel Trick is a smart shortcut used in SVM to deal with data that cannot be separated with a straight line.

Imagine your data is mixed up in 2D and you cannot draw a line between the classes.
But if you could lift the data into 3D, suddenly it becomes separable.

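Instead of actually computing the higher-dimensional coordinates (which is slow and can be infeasible), the kernel trick uses a kernel function that returns, directly from the original inputs, the inner product the points would have in that higher-dimensional space.
Because the SVM only needs these inner products, it can draw complex, curved boundaries without ever building the high-dimensional vectors.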
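
A small numeric check makes the shortcut visible. The sketch below uses the degree-2 polynomial kernel as an example (an assumption for illustration; the RBF kernel follows the same principle, but its feature space is infinite-dimensional). The names `phi` and `poly_kernel` are hypothetical helpers.

```python
from math import isclose, sqrt

def phi(x):
    """Explicit map from 2D to 3D for the degree-2 polynomial kernel."""
    x1, x2 = x
    return (x1 * x1, sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(x, z):
    """Kernel computed directly on the original 2D inputs: (x . z)^2."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

x, z = (1.0, 2.0), (3.0, 0.5)

explicit = sum(a * b for a, b in zip(phi(x), phi(z)))  # inner product in 3D
implicit = poly_kernel(x, z)                           # no 3D vectors are built

print(explicit, implicit)
print("Equal up to rounding:", isclose(explicit, implicit))
```

Both routes give the same number, but the kernel never constructs the 3D vectors, which is what keeps the computation cheap when the implied space is very large.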

In [3]:
'''
Question 7: Write a Python program to train two SVM classifiers with Linear and RBF
kernels on the Wine dataset, then compare their accuracies.
Hint: Use SVC(kernel='linear') and SVC(kernel='rbf'), then compare accuracy scores after fitting
on the same dataset.
(Include your Python code and output in the code box below.)
'''
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load and split data
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train both models
linear_clf = SVC(kernel='linear').fit(X_train, y_train)
rbf_clf = SVC(kernel='rbf').fit(X_train, y_train)

# Print accuracy scores
print("Linear Kernel Accuracy:", accuracy_score(y_test, linear_clf.predict(X_test)))
print("RBF Kernel Accuracy   :", accuracy_score(y_test, rbf_clf.predict(X_test)))


Linear Kernel Accuracy: 0.9814814814814815
RBF Kernel Accuracy   : 0.7592592592592593
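
A likely reason for the gap is that the Wine features are on very different scales, and the RBF kernel depends on distances between samples, so large-valued features dominate. The sketch below, which is an addition and not part of the original exercise, standardizes the features inside a pipeline and reuses the same split; no specific accuracy is claimed here.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# RBF kernel on raw features, as in the cell above.
raw_rbf = SVC(kernel="rbf").fit(X_train, y_train)

# Same kernel, but features are standardized inside a pipeline so the
# scaler is fit on the training split only.
scaled_rbf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_train, y_train)

raw_acc = raw_rbf.score(X_test, y_test)
scaled_acc = scaled_rbf.score(X_test, y_test)
print("RBF accuracy, raw features   :", raw_acc)
print("RBF accuracy, scaled features:", scaled_acc)
```

Comparing kernels on unscaled data therefore says more about preprocessing than about the kernels themselves.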


Question 8: What is the Naïve Bayes classifier, and why is it called "Naïve"?

The Naïve Bayes classifier is a probabilistic machine learning algorithm that uses Bayes’ Theorem to predict the class of a given data point.

It is widely used in real-world applications like spam email detection, sentiment analysis, document classification, and recommendation systems because it is simple, fast, and effective, even with large datasets.

Why is it called “Naïve”?

It is called naïve because it makes the strong assumption that all features are independent of each other given the class. In reality, features are often correlated, but Naïve Bayes ignores these relationships to simplify calculations.
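
The effect of the independence assumption is that the class score becomes a simple product of per-feature probabilities. The sketch below works through this by hand; all probabilities are hypothetical numbers chosen for illustration.

```python
# Hypothetical probabilities, chosen only for illustration.
prior = {"spam": 0.4, "ham": 0.6}
# P(word appears | class), with words treated as independent given the class.
likelihood = {
    "spam": {"free": 0.8, "offer": 0.6},
    "ham": {"free": 0.1, "offer": 0.2},
}

def posterior(words):
    """Class probabilities for a message containing the given words."""
    scores = {}
    for label in prior:
        score = prior[label]
        for word in words:
            # The "naive" step: multiply one term per feature.
            score *= likelihood[label][word]
        scores[label] = score
    total = sum(scores.values())  # normalizing constant from Bayes' Theorem
    return {label: score / total for label, score in scores.items()}

result = posterior(["free", "offer"])
print(result)
```

Without the assumption, the model would need the joint probability of every word combination per class, which is rarely feasible to estimate.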

Question 9: Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve
Bayes, and Bernoulli Naïve Bayes

1. Gaussian Naïve Bayes (GNB)

Data Type: Continuous (numeric) data.

Assumption: Features follow a normal (Gaussian) distribution.

Example Use Case: Predicting a person’s risk based on continuous measurements like height, weight, or blood pressure.

How it works: Calculates probabilities using the Gaussian (bell-curve) formula.

2. Multinomial Naïve Bayes (MNB)

Data Type: Discrete count data (frequencies).

Assumption: Features represent counts of events (like word counts in text).

Example Use Case: Text classification such as spam detection or document categorization.

How it works: Calculates probabilities based on the frequency of each feature in each class.

3. Bernoulli Naïve Bayes (BNB)

Data Type: Binary features (0 or 1, yes/no, present/absent).

Assumption: Features follow a Bernoulli distribution.

Example Use Case: Text classification where we only care about whether a word appears or not, not how many times.

How it works: Uses presence/absence of features to calculate probabilities.
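
The three variants map onto three scikit-learn classes, each paired with the kind of data it expects. The tiny arrays below are hypothetical and serve only to show which input type goes with which model.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 0, 1, 1])  # hypothetical labels for four samples

# Continuous measurements -> GaussianNB
X_continuous = np.array([[5.1, 60.2], [4.9, 58.7], [6.8, 80.1], [7.0, 82.4]])
# Word counts per document -> MultinomialNB
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
# Word present (1) or absent (0) -> BernoulliNB
X_binary = (X_counts > 0).astype(int)

models = {
    "GaussianNB": (GaussianNB(), X_continuous),
    "MultinomialNB": (MultinomialNB(), X_counts),
    "BernoulliNB": (BernoulliNB(), X_binary),
}

predictions = {}
for name, (model, X) in models.items():
    predictions[name] = model.fit(X, y).predict(X)
    print(name, predictions[name])
```

The binary matrix is derived from the count matrix, which mirrors the difference between the Multinomial and Bernoulli views of the same text.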



In [4]:
'''
Question 10: Breast Cancer Dataset
Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer
dataset and evaluate accuracy.
Hint: Use GaussianNB() from sklearn.naive_bayes and the Breast Cancer dataset from
sklearn.datasets.
(Include your Python code and output in the code box below.)
'''

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score


# Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Hold out 30% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train the Gaussian Naive Bayes classifier
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Evaluate accuracy on the test set
y_pred = gnb.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy of Gaussian Naive Bayes:", round(accuracy, 4))


Accuracy of Gaussian Naive Bayes: 0.9415
