# Question 1: What is Information Gain, and how is it used in Decision Trees?
# ANSWER
Information Gain (IG) measures how much “information” a feature gives about the target variable.
It is the reduction in entropy (uncertainty) achieved by splitting a dataset based on a feature.
Decision Trees use Information Gain to decide which feature to split on at each node — the feature with the highest IG is chosen.

High IG → good split (reduces uncertainty)

Low IG → poor split (less informative)
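
As a minimal sketch of the definition above, the following computes entropy and Information Gain for a toy split. The helper names `entropy` and `information_gain` and the toy labels are illustrative assumptions, not part of any library.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return float(-np.sum(probs * np.log2(probs)))

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

# Toy parent node: 4 positives and 4 negatives.
parent = np.array([1, 1, 1, 1, 0, 0, 0, 0])

# A perfect split separates the two classes completely.
perfect = [parent[:4], parent[4:]]
# An uninformative split leaves each child with the same 50/50 mix.
useless = [np.array([1, 1, 0, 0]), np.array([1, 1, 0, 0])]

ig_perfect = information_gain(parent, perfect)
ig_useless = information_gain(parent, useless)
print("IG (perfect split):", ig_perfect)
print("IG (useless split):", ig_useless)
```

A tree builder would evaluate this quantity for every candidate feature (and threshold) at a node and keep the split with the largest value.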

# Question 2: What is the difference between Gini Impurity and Entropy?
# Hint: Directly compares the two main impurity measures, highlighting strengths, weaknesses, and appropriate use cases.
# ANSWER
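# Gini Impurity
. Formula: $1 - \sum_i p_i^2$
. Measures the probability of misclassifying a randomly chosen sample if it were labeled according to the class distribution of the node
. Faster to compute (no logarithm)
# Entropy
. Formula: $-\sum_i p_i \log_2(p_i)$
. Measures the level of disorder (uncertainty) in the class distribution
. Preferred when an information-theoretic interpretation is needed
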
**Key Points:**
Both measure impurity (lower is better).
Gini is simpler and faster → used in CART (Classification and Regression Trees).
Entropy is used in ID3 and C4.5 decision tree algorithms.
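
To make the comparison concrete, here is a small sketch that evaluates both measures on a few two-class distributions. The function names `gini` and `entropy_bits` are hypothetical helpers written for this illustration.

```python
import numpy as np

def gini(probs):
    """Gini impurity: 1 - sum(p_i^2)."""
    probs = np.asarray(probs, dtype=float)
    return float(1.0 - np.sum(probs ** 2))

def entropy_bits(probs):
    """Shannon entropy in bits: -sum(p_i * log2(p_i)), skipping zero-probability classes."""
    probs = np.asarray(probs, dtype=float)
    nonzero = probs[probs > 0]
    return float(-np.sum(nonzero * np.log2(nonzero)))

# From a pure node to a perfectly mixed node.
for dist in ([1.0, 0.0], [0.9, 0.1], [0.5, 0.5]):
    print(dist, "gini =", round(gini(dist), 3), "entropy =", round(entropy_bits(dist), 3))
```

Both measures are 0 for a pure node and reach their maximum for a uniform distribution; for two classes that maximum is 0.5 for Gini and 1 bit for entropy.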

# Question 3: What is Pre-Pruning in Decision Trees?
# ANSWER
Pre-pruning (also called early stopping) stops the growth of the Decision Tree before it becomes overly complex.
It sets conditions to halt splitting if:
A node’s information gain is below a threshold.
The number of samples in a node is too small.
The maximum depth of the tree is reached.
✅ Purpose: Prevent overfitting and improve generalization.
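
The stopping conditions above map fairly directly onto scikit-learn's `DecisionTreeClassifier` parameters. The sketch below is one possible configuration; the specific threshold values are illustrative assumptions and have not been tuned.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Unrestricted tree: keeps splitting until the leaves are pure.
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Pre-pruned tree: the threshold values below are illustrative, not tuned.
pruned_tree = DecisionTreeClassifier(
    max_depth=3,                 # stop when the maximum depth is reached
    min_samples_split=10,        # do not split nodes with fewer than 10 samples
    min_impurity_decrease=0.01,  # require a minimum impurity reduction per split
    random_state=42,
).fit(X_train, y_train)

print("Full tree   depth:", full_tree.get_depth(), "leaves:", full_tree.get_n_leaves())
print("Pruned tree depth:", pruned_tree.get_depth(), "leaves:", pruned_tree.get_n_leaves())
print("Pruned test accuracy:", pruned_tree.score(X_test, y_test))
```

Note that `min_impurity_decrease` is measured with whichever criterion the tree uses (Gini by default), so it corresponds to an Information Gain threshold only when `criterion='entropy'` is set.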

# Question 4: Write a Python program to train a Decision Tree Classifier using Gini Impurity as the criterion and print the feature importances (practical).
# ANSWER



In [1]:
# Decision Tree Classifier using Gini Impurity
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train model
clf = DecisionTreeClassifier(criterion='gini', random_state=42)
clf.fit(X_train, y_train)

# Predictions
y_pred = clf.predict(X_test)

# Output
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Feature Importances:")
for name, importance in zip(iris.feature_names, clf.feature_importances_):
    print(f"{name}: {importance:.3f}")


Accuracy: 1.0
Feature Importances:
sepal length (cm): 0.000
sepal width (cm): 0.019
petal length (cm): 0.893
petal width (cm): 0.088


# Question 5: What is a Support Vector Machine (SVM)?
# ANSWER
SVM is a supervised learning algorithm used for classification and regression.
It finds the optimal hyperplane that maximizes the margin, i.e. the distance between the hyperplane and the nearest data points of each class.
The data points nearest to the hyperplane are called Support Vectors.

✅ Advantages:
>>Works well in high-dimensional spaces
>>Relatively robust to overfitting, since maximizing the margin acts as a form of regularization
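
As a rough illustration of the hyperplane, margin, and support vectors, the sketch below fits a linear SVM to four hand-picked points. The toy data and the large `C` value (used here to approximate a hard margin) are assumptions made for this example.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data: class 0 sits on the line x=0, class 1 on the line x=2.
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([0, 0, 1, 1])

# A large C approximates a hard-margin SVM on separable data.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]                        # normal vector of the separating hyperplane
margin_width = 2.0 / np.linalg.norm(w)  # distance between the two margin boundaries

print("Support vectors:\n", clf.support_vectors_)
print("Margin width:", margin_width)
```

Only the points stored in `support_vectors_` determine the hyperplane; moving any other training point (without crossing the margin) would leave the solution unchanged.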

# Question 6: What is the Kernel Trick in SVM?
# ANSWER
The Kernel Trick allows SVM to perform classification on non-linearly separable data by transforming it into a higher-dimensional space without explicitly computing the transformation.
# Common Kernels:
**Linear:** Simple, when data is linearly separable
**Polynomial:** For curved decision boundaries
**RBF (Radial Basis Function):** Most popular for non-linear data
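
A small numeric check can make the "without explicitly computing the transformation" part concrete. The sketch below uses a degree-2 homogeneous polynomial kernel on 2-D vectors; the feature map `phi` is one possible explicit mapping, written out only for comparison.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector (one possible choice)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly_kernel(a, b):
    """Homogeneous polynomial kernel of degree 2: (a . b)^2."""
    return float(np.dot(a, b) ** 2)

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])

explicit = float(np.dot(phi(a), phi(b)))  # inner product in the 3-D feature space
implicit = poly_kernel(a, b)              # same quantity, computed in the original 2-D space

print("Explicit mapping:", explicit)
print("Kernel function: ", implicit)
```

The two values agree, which is why an SVM can work with the kernel function alone and never needs to build the higher-dimensional vectors.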

# Question 7: Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies.


In [2]:
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load data
data = load_wine()
X, y = data.data, data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train SVM with Linear kernel
svm_linear = SVC(kernel='linear', random_state=42)
svm_linear.fit(X_train, y_train)
linear_acc = accuracy_score(y_test, svm_linear.predict(X_test))

# Train SVM with RBF kernel
svm_rbf = SVC(kernel='rbf', random_state=42)
svm_rbf.fit(X_train, y_train)
rbf_acc = accuracy_score(y_test, svm_rbf.predict(X_test))

print("Linear Kernel Accuracy:", linear_acc)
print("RBF Kernel Accuracy:", rbf_acc)


Linear Kernel Accuracy: 0.9814814814814815
RBF Kernel Accuracy: 0.7592592592592593
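
The lower RBF accuracy reported above is plausibly a consequence of the Wine features being on very different scales, since the RBF kernel depends on Euclidean distances between samples. A possible follow-up, which is not part of the original answer, is to standardize the features first. The pipeline below is a sketch under that assumption.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize features (statistics fitted on the training split only), then apply the RBF SVM.
scaled_rbf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scaled_rbf.fit(X_train, y_train)
scaled_rbf_acc = accuracy_score(y_test, scaled_rbf.predict(X_test))

print("RBF Kernel Accuracy (scaled features):", scaled_rbf_acc)
```

Wrapping the scaler and the classifier in one pipeline keeps the test split out of the scaling statistics, so the comparison with the unscaled models stays fair.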


# Question 8: What is the Naïve Bayes classifier, and why is it called "Naïve"?
# ANSWER
Naïve Bayes is a probabilistic classifier based on Bayes’ Theorem:

$$P(C \mid X) = \frac{P(X \mid C) \cdot P(C)}{P(X)}$$

It assumes all features are independent given the class. This is the “naïve” assumption, which simplifies computation but is rarely true in reality.

✅ Advantages:
>>Fast and efficient for large datasets
>>Works well for text classification and spam filtering
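
The calculation behind the classifier can be sketched by hand. All probabilities below are hypothetical numbers chosen only for illustration; the point is how the independence assumption turns the likelihood into a simple product.

```python
# Hypothetical class priors P(C).
priors = {"spam": 0.4, "ham": 0.6}

# Hypothetical P(word present | class), treated as independent given the class.
likelihoods = {
    "spam": {"free": 0.8, "meeting": 0.1},
    "ham": {"free": 0.2, "meeting": 0.5},
}

observed_words = ["free", "meeting"]  # both words appear in the message

# Numerator of Bayes' theorem: P(C) * product of P(x_i | C).
scores = {}
for label, prior in priors.items():
    score = prior
    for word in observed_words:
        score *= likelihoods[label][word]
    scores[label] = score

# P(X) is the sum of the numerators over all classes.
evidence = sum(scores.values())
posteriors = {label: score / evidence for label, score in scores.items()}

print(posteriors)
```

Because `P(X)` is the same for every class, a classifier only needs to compare the numerators; the normalization is required only when calibrated probabilities are wanted.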

# Question 9: Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes
# ANSWER
# 1. Gaussian Naïve Bayes (GNB)
**Theoretical Basis:**
Assumes that features follow a normal (Gaussian) distribution within each class.
It calculates the probability of a feature value using the probability density function (PDF) of a Gaussian distribution.

$$P(x_i \mid C_k) = \frac{1}{\sqrt{2\pi\sigma_k^2}} \exp\left(-\frac{(x_i - \mu_k)^2}{2\sigma_k^2}\right)$$

**Used For:**
Continuous features (real numbers like height, weight, temperature, etc.).
**Example:**
Classifying patients based on blood pressure and age.
**Advantages:**
Works well when data is continuous and roughly follows a bell-curve distribution.
**Limitations:**
Not suitable for text or count data, which is discrete.
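
As a sketch of how the Gaussian likelihood is evaluated, the snippet below estimates the mean and variance of one feature within one class and plugs them into the PDF. The blood-pressure readings and the helper `gaussian_likelihood` are hypothetical, introduced only for this example.

```python
import numpy as np

def gaussian_likelihood(x, mean, var):
    """Gaussian PDF value for x given a class-conditional mean and variance."""
    return float(np.exp(-((x - mean) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var))

# Hypothetical blood-pressure readings for the patients of one class.
class_values = np.array([118.0, 122.0, 120.0, 125.0, 115.0])
mu_k = class_values.mean()
var_k = class_values.var()  # population variance (ddof=0)

likelihood_121 = gaussian_likelihood(121.0, mu_k, var_k)
print("mean:", mu_k, "variance:", var_k)
print("P(x=121 | C_k):", likelihood_121)
```

The classifier repeats this for every feature and class, multiplies the per-feature likelihoods with the class prior, and picks the class with the largest product.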

# 2. Multinomial Naïve Bayes (MNB)

**Theoretical Basis:**
Assumes that features represent discrete counts or frequencies (how many times an event occurs).
It models the data using a multinomial distribution, which is suitable for count vectors.

$$P(X \mid C_k) = \frac{\left(\sum_i x_i\right)!}{\prod_i x_i!} \prod_i P(x_i \mid C_k)^{x_i}$$

**Used For:**
Text classification, spam detection, document categorization, etc.
(e.g., number of times a word appears in a document).
**Example:**
Classifying emails as spam based on word frequencies.
**Advantages:**
Performs well on document classification and text mining tasks.
**Limitations:**
Designed for count features, so it is not appropriate for arbitrary continuous data (fractional counts such as TF-IDF weights can also work in practice).
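
A minimal sketch of the spam example using scikit-learn's `MultinomialNB` follows. The four-word vocabulary and the count matrix are hypothetical data invented for this illustration.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical vocabulary: ["free", "win", "meeting", "report"]
X_counts = np.array([
    [3, 2, 0, 0],  # spam
    [2, 3, 0, 0],  # spam
    [4, 1, 0, 1],  # spam
    [0, 0, 3, 2],  # ham
    [0, 1, 2, 3],  # ham
    [1, 0, 2, 2],  # ham
])
y = np.array([1, 1, 1, 0, 0, 0])  # 1 = spam, 0 = ham

mnb = MultinomialNB()  # default Laplace smoothing (alpha=1.0)
mnb.fit(X_counts, y)

new_email = np.array([[2, 2, 0, 0]])  # "free" twice, "win" twice
print("Predicted class:", mnb.predict(new_email)[0])
print("Class probabilities:", mnb.predict_proba(new_email)[0])
```

In a real pipeline the count matrix would typically come from a vectorizer such as `CountVectorizer` rather than being typed by hand.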

# 3. Bernoulli Naïve Bayes (BNB)

**Theoretical Basis:**
Assumes binary features: each feature can take only two values, 1 (present/true) or 0 (absent/false).
Uses the Bernoulli distribution to model feature probabilities.

$$P(x_i \mid C_k) = P_i^{x_i} (1 - P_i)^{1 - x_i}$$

**Used For:**
Binary/Boolean data, where features indicate presence or absence.
**Example:**
Whether a document contains a specific word (yes/no).
**Advantages:**
Works well for binary features and short text classification tasks.
**Limitations:**
Loses information about frequency — it only checks if a feature exists, not how many times.
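
The frequency limitation can be seen directly with scikit-learn's `BernoulliNB`. In the sketch below, the vocabulary and word counts are hypothetical; `binarize=0.0` maps every count greater than zero to 1, so two messages that use the same words receive identical predictions regardless of how often each word occurs.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical vocabulary: ["free", "win", "meeting", "report"]
X_counts = np.array([
    [3, 2, 0, 0],  # spam
    [2, 3, 0, 0],  # spam
    [4, 1, 0, 1],  # spam
    [0, 0, 3, 2],  # ham
    [0, 1, 2, 3],  # ham
    [1, 0, 2, 2],  # ham
])
y = np.array([1, 1, 1, 0, 0, 0])  # 1 = spam, 0 = ham

# binarize=0.0 converts counts to presence/absence before fitting and predicting.
bnb = BernoulliNB(binarize=0.0)
bnb.fit(X_counts, y)

once = np.array([[1, 1, 0, 0]])  # each spam word appears once
many = np.array([[5, 7, 0, 0]])  # same words, repeated many times

proba_once = bnb.predict_proba(once)[0]
proba_many = bnb.predict_proba(many)[0]
print("Probabilities (words appear once):", proba_once)
print("Probabilities (words repeated):   ", proba_many)
```

If word frequency is expected to carry signal, the multinomial variant is usually the more natural choice.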

# Question 10: Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and evaluate its accuracy.


In [3]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train model
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Predictions
y_pred = gnb.predict(X_test)

# Evaluate
print("Accuracy:", accuracy_score(y_test, y_pred))


Accuracy: 0.9415204678362573
