1. What is Information Gain, and how is it used in Decision Trees?


Ans: Information Gain in Decision Trees


Information Gain is a splitting criterion (the one used by the classic ID3 algorithm) that determines which feature should be used to split the data at each node of a decision tree. It measures the reduction in Entropy (uncertainty about the class label) achieved by partitioning a dataset on a specific attribute.


How It Is Used

 1. Calculate Parent Entropy: The algorithm first calculates the entropy of the current dataset (the "parent" node) to determine how mixed the target classes are.
 2. Evaluate Potential Splits: For every available feature, the algorithm simulates a split and calculates the weighted average entropy of the resulting "child" nodes.
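 3. Measure Gain: It subtracts the weighted child entropy from the parent entropy:$$IG(S, A) = H(S) - \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v), \qquad H(S) = -\sum_{c} p_c \log_2 p_c$$where $S_v$ is the subset of $S$ in which feature $A$ takes the value $v$, and $p_c$ is the proportion of class $c$ in a set.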
 4. Select the Best Feature: The feature that results in the highest Information Gain (the largest reduction in uncertainty) is selected as the decision node.
 5. Recursion: This process repeats for each branch until the data is perfectly classified or a stopping criterion is met.
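The steps above can be sketched in plain Python. This is a minimal illustration, assuming categorical features and a hypothetical eight-sample toy dataset; the helper names `entropy` and `information_gain` are introduced here for illustration and are not part of any library.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    # p * log2(1/p) equals -p * log2(p); this form avoids printing -0.0
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def information_gain(labels, feature_values):
    """Parent entropy minus the size-weighted entropy of the child nodes
    created by splitting on every distinct value of one categorical feature."""
    n = len(labels)
    children = {}
    for label, value in zip(labels, feature_values):
        children.setdefault(value, []).append(label)
    weighted_child_entropy = sum(len(ch) / n * entropy(ch) for ch in children.values())
    return entropy(labels) - weighted_child_entropy


# Hypothetical toy data: 8 samples, a binary target, two candidate features
labels = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
outlook = ["sunny", "sunny", "sunny", "rain", "rain", "rain", "rain", "sunny"]
windy = ["no", "no", "no", "no", "yes", "yes", "yes", "yes"]

gains = {"outlook": information_gain(labels, outlook),
         "windy": information_gain(labels, windy)}
print(f"Parent entropy: {entropy(labels):.4f}")
for name, gain in gains.items():
    print(f"IG({name}) = {gain:.4f}")
print("Best split:", max(gains, key=gains.get))
```

Here `windy` separates the classes perfectly, so its children are pure and its gain equals the full parent entropy; the final `max(...)` call plays the role of step 4.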

2. What is the difference between Gini Impurity and Entropy?


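Ans: Both Gini Impurity and Entropy measure how mixed the classes in a node are. For a node with class proportions $p_1, \dots, p_K$:

$$\text{Gini} = 1 - \sum_{k=1}^{K} p_k^2 \qquad \text{Entropy} = -\sum_{k=1}^{K} p_k \log_2 p_k$$

Both are zero for a pure node and largest when the classes are evenly mixed.

Key Differences at a Glance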

1. Computational Efficiency


 - Gini: Does not require calculating logarithms, so it is somewhat cheaper to compute. This is an advantage when dealing with very large datasets or many features.
 - Entropy: Logarithms are more expensive for a processor to evaluate, so in large-scale applications training can take slightly longer.


2. Sensitivity

 - Gini: Tends to isolate the most frequent class in its own branch.
 - Entropy: Tends to produce more balanced trees. Because the log function "punishes" small probabilities more harshly than the squared function in Gini, Entropy is slightly more sensitive to changes in class probabilities.
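To see how the two measures behave, both formulas ($1 - \sum_k p_k^2$ for Gini and $-\sum_k p_k \log_2 p_k$ for entropy) can be evaluated on a few two-class distributions. This is a small NumPy sketch; the helper names `gini` and `entropy` are illustrative only.

```python
import numpy as np


def gini(p):
    """Gini impurity: 1 - sum(p_k^2). Only squares, no logarithm."""
    p = np.asarray(p, dtype=float)
    return float(1.0 - np.sum(p ** 2))


def entropy(p):
    """Shannon entropy in bits, with the convention 0 * log(0) = 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # drop empty classes so log2 is never evaluated at 0
    return float(np.sum(p * np.log2(1.0 / p)))


# Class proportions for a two-class node, from pure to evenly mixed
for dist in ([1.0, 0.0], [0.9, 0.1], [0.7, 0.3], [0.5, 0.5]):
    print(f"p={dist}: gini={gini(dist):.3f}  entropy={entropy(dist):.3f}")
```

Both curves peak at the 50/50 split (Gini at 0.5, entropy at 1 bit). A split criterion only compares candidate splits against each other, so the different scales do not matter.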


3. Practical Impact


In most real-world scenarios, the choice between the two does not drastically change a model's performance, because the two measures are highly correlated. Gini is the default criterion in scikit-learn's DecisionTreeClassifier, and its lower computational cost is a common reason to prefer it.
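One way to check the practical impact on a given dataset is to cross-validate the same tree with each criterion. A minimal sketch using scikit-learn's `criterion` parameter on the Iris dataset (the dataset choice and `cv=5` are assumptions for illustration); scores will vary with the data and the random seed.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

results = {}
for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(criterion=criterion, random_state=42)
    # 5-fold cross-validated accuracy for each splitting criterion
    results[criterion] = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{criterion:>7}: mean CV accuracy = {results[criterion]:.4f}")
```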

3. What is Pre-Pruning in Decision Trees?


Ans: Pre-pruning is a technique used to prevent overfitting in decision trees by halting the tree's growth before it perfectly fits the training data. It is often referred to as early stopping.


Instead of allowing the tree to grow until every leaf is "pure," pre-pruning applies specific constraints at each node. If a node does not meet these criteria, it stops splitting and becomes a terminal leaf.


Common Pre-Pruning Criteria

 - Maximum Depth: Setting a limit on how many levels the tree can grow.
 - Minimum Samples per Split: Requiring a minimum number of data points to be present in a node before it is allowed to branch out.
 - Minimum Samples per Leaf: Ensuring that every final leaf contains at least a specific number of observations.
 - Information Gain Threshold: Only allowing a split if it reduces node impurity (e.g., Gini Impurity or Entropy) by at least a predefined minimum amount.
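In scikit-learn, each criterion above maps to a `DecisionTreeClassifier` constructor parameter: `max_depth`, `min_samples_split`, `min_samples_leaf`, and `min_impurity_decrease` (a threshold on the weighted impurity decrease). The sketch below compares an unpruned tree with a pre-pruned one; the dataset and the specific parameter values are arbitrary assumptions chosen for illustration, not recommended settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Grown until leaves are pure (no pre-pruning)
full = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Same model with pre-pruning constraints (illustrative values)
pruned = DecisionTreeClassifier(
    max_depth=4,                 # Maximum Depth
    min_samples_split=20,        # Minimum Samples per Split
    min_samples_leaf=10,         # Minimum Samples per Leaf
    min_impurity_decrease=0.01,  # minimum weighted impurity decrease per split
    random_state=42,
).fit(X_train, y_train)

for name, model in (("unpruned", full), ("pre-pruned", pruned)):
    print(f"{name:>10}: depth={model.get_depth()}, leaves={model.get_n_leaves()}, "
          f"test accuracy={model.score(X_test, y_test):.4f}")
```

Whether the smaller tree scores better or worse on the test split depends on the data; the constraint values are usually tuned with cross-validation.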


Key Trade-offs

 - Advantage: It is computationally efficient because it reduces the time and memory needed to build the tree.
 - Disadvantage: It carries the risk of underfitting. It may stop a split that seems insignificant now but could have led to important patterns deeper in the tree (the "Horizon Effect").

4. Write a Python program to train a Decision Tree Classifier using Gini
Impurity as the criterion and print the feature importances (practical).

Ans: Training a Decision Tree Classifier using scikit-learn is a straightforward process. The Gini impurity criterion helps the tree decide how to split the data by measuring the impurity of a node: the probability that a randomly chosen sample from the node would be misclassified if it were labeled at random according to the node's class distribution.


Here is a practical implementation using the built-in Iris dataset.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Initialize and train the Decision Tree Classifier
# Using criterion='gini' as requested
clf = DecisionTreeClassifier(criterion='gini', random_state=42)
clf.fit(X, y)

# Access .feature_importances_
importances = clf.feature_importances_

# Print the results
print("Decision Tree Feature Importances (Gini):")
for name, importance in zip(iris.feature_names, importances):
    print(f"{name}: {importance:.4f}")

"""
Output:
Decision Tree Feature Importances (Gini):
sepal length (cm): 0.0133
sepal width (cm): 0.0000
petal length (cm): 0.5641
petal width (cm): 0.4226
"""

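scikit-learn describes `feature_importances_` as the normalized total reduction of the criterion brought by each feature (also called Gini importance). The sketch below recomputes it from the fitted tree's low-level `tree_` arrays (`children_left`, `children_right`, `feature`, `impurity`, `weighted_n_node_samples`) to show where the printed numbers come from; treat it as an illustration rather than a replacement for the built-in attribute.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(criterion="gini", random_state=42).fit(iris.data, iris.target)
tree = clf.tree_
w = tree.weighted_n_node_samples  # (weighted) number of samples reaching each node

importances = np.zeros(iris.data.shape[1])
for node in range(tree.node_count):
    left, right = tree.children_left[node], tree.children_right[node]
    if left == -1:  # leaf node: no split, so no impurity decrease
        continue
    # Weighted Gini decrease produced by this split
    decrease = (w[node] * tree.impurity[node]
                - w[left] * tree.impurity[left]
                - w[right] * tree.impurity[right])
    importances[tree.feature[node]] += decrease

importances /= importances.sum()  # normalize so the importances sum to 1
print(np.round(importances, 4))
print(np.allclose(importances, clf.feature_importances_))
```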
5. What is a Support Vector Machine (SVM)?

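Ans: A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression. For classification, its goal is to find the hyperplane that maximizes the margin, i.e. the distance between the hyperplane and the closest training points of each class. Those closest points are the support vectors, and they alone determine where the boundary lies. Problems with more than two classes are handled by combining several binary classifiers (for example, one-vs-one).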
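A minimal sketch of the margin idea with scikit-learn's `SVC`, assuming a synthetic, roughly separable 2-D dataset generated by `make_blobs` (not part of the original). A large `C` approximates a hard margin; for a linear kernel the hyperplane is $\mathbf{w} \cdot \mathbf{x} + b = 0$ and the margin width is $2 / \lVert \mathbf{w} \rVert$.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated 2-D clusters (synthetic data, for illustration only)
X, y = make_blobs(n_samples=60, centers=2, cluster_std=0.8, random_state=0)

# A large C penalizes margin violations heavily, approximating a hard-margin SVM
clf = SVC(kernel="linear", C=1000).fit(X, y)

w = clf.coef_[0]                # normal vector of the separating hyperplane
b = clf.intercept_[0]
margin = 2 / np.linalg.norm(w)  # width of the band between the two classes

print("support vectors:\n", clf.support_vectors_)
print(f"hyperplane: {w[0]:.3f}*x1 + {w[1]:.3f}*x2 + {b:.3f} = 0")
print(f"margin width: {margin:.3f}")
```

Only the handful of points in `support_vectors_` define the boundary; moving any other training point (without crossing the margin) would leave the hyperplane unchanged.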

6. What is the Kernel Trick in SVM?


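Ans: In Support Vector Machines (SVM), the Kernel Trick is a mathematical shortcut that lets the algorithm learn non-linear decision boundaries without explicitly transforming the data into a higher-dimensional space. The SVM optimization only needs dot products between pairs of samples, so a kernel function $K(\mathbf{x}, \mathbf{z}) = \phi(\mathbf{x}) \cdot \phi(\mathbf{z})$ can return the dot product in the transformed space directly from the original inputs, without ever computing the mapping $\phi$. Common choices are the polynomial kernel and the RBF (Gaussian) kernel.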
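A small numeric check of the idea, assuming the degree-2 homogeneous polynomial kernel $K(\mathbf{x}, \mathbf{z}) = (\mathbf{x} \cdot \mathbf{z})^2$ on 2-D inputs, whose explicit feature map is $\phi(\mathbf{x}) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$. The function names `phi` and `poly_kernel` are hypothetical helpers for this illustration.

```python
import numpy as np


def phi(v):
    """Explicit degree-2 feature map for a 2-D vector: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])


def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: K(x, z) = (x . z)^2."""
    return np.dot(x, z) ** 2


x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(z))  # map both points to 3-D, then take the dot product
via_kernel = poly_kernel(x, z)     # stay in 2-D and never build phi

print(explicit, via_kernel)
```

Both routes give $(1 \cdot 3 + 2 \cdot 4)^2 = 121$ (up to floating-point rounding on the explicit route). For the RBF kernel the implicit feature space is infinite-dimensional, so the shortcut is the only practical option.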

7. Write a Python program to train two SVM classifiers with Linear and RBF
kernels on the Wine dataset, then compare their accuracies.


Ans: Training Support Vector Machines (SVM) on the Wine dataset is a classic way to see how different kernels handle multi-class classification. The Linear kernel works best when the data is (approximately) linearly separable, while the RBF (Radial Basis Function) kernel can model non-linear boundaries.

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

# Load the Wine dataset
wine = datasets.load_wine()
X, y = wine.data, wine.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize features (Recommended for SVM)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train Linear SVM
linear_svm = SVC(kernel='linear')
linear_svm.fit(X_train, y_train)
linear_acc = accuracy_score(y_test, linear_svm.predict(X_test))

# Train RBF SVM
rbf_svm = SVC(kernel='rbf')
rbf_svm.fit(X_train, y_train)
rbf_acc = accuracy_score(y_test, rbf_svm.predict(X_test))

# Output results
print(f"Linear Kernel Accuracy: {linear_acc:.4f}")
print(f"RBF Kernel Accuracy:    {rbf_acc:.4f}")

# Comparison logic
if linear_acc > rbf_acc:
    print("Linear kernel performed better.")
elif rbf_acc > linear_acc:
    print("RBF kernel performed better.")
else:
    print("Both kernels achieved the same accuracy.")

Linear Kernel Accuracy: 0.9815
RBF Kernel Accuracy:    0.9815
Both kernels achieved the same accuracy.
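With only 54 test samples, a single split can easily produce a tie. A sketch of a steadier comparison uses 5-fold cross-validation, with `StandardScaler` placed inside a `make_pipeline` so that scaling is re-fit on each training fold only. The fold count is an assumption for illustration; the resulting numbers are not taken from the original run.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

results = {}
for kernel in ("linear", "rbf"):
    # The scaler is fit inside each training fold, so test folds never leak into it
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    scores = cross_val_score(model, X, y, cv=5)
    results[kernel] = scores
    print(f"{kernel:>6}: {scores.mean():.4f} +/- {scores.std():.4f}")
```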


8. What is the Naïve Bayes classifier, and why is it called "Naïve"?


Ans: The Naïve Bayes classifier is a probabilistic machine learning model used for classification tasks. It is built on Bayes' Theorem, which gives the probability of a hypothesis (a class) given the observed evidence (the features) by combining the class's prior probability with the likelihood of the features under that class:

$$P(y \mid x_1, \dots, x_n) \propto P(y)\, P(x_1, \dots, x_n \mid y)$$


The algorithm computes this posterior probability for each class and selects the class with the highest value.


Why is it called "Naïve"?


It is called "Naïve" because it makes a strong, simplifying assumption: Conditional Independence.


Specifically, it assumes that, once the class is known, the value of each feature is unrelated to the value of every other feature, so the joint likelihood factorizes into a product:

$$P(x_1, \dots, x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)$$


 - In Reality: Features are often linked. For example, in an email, the word "Money" often appears near the word "Transfer."
 - The "Naïve" Assumption: The algorithm treats "Money" and "Transfer" as if they were unrelated given the class, estimating each word's probability separately.


Despite this "naïve" disregard for feature correlation, the classifier often works well in practice, particularly for text tasks such as spam filtering and sentiment analysis, and it is very fast to train.
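The naïve product can be computed by hand for the "Money"/"Transfer" example. This is a simplified sketch: the priors and per-word likelihoods below are hypothetical numbers chosen only for illustration, and only the words present in the email are scored.

```python
# Hypothetical probabilities, chosen only for illustration
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {  # P(word appears | class)
    "spam": {"money": 0.30, "transfer": 0.20},
    "ham": {"money": 0.02, "transfer": 0.05},
}
email = ["money", "transfer"]

scores = {}
for cls, prior in priors.items():
    score = prior
    for word in email:
        # Naive step: multiply per-word likelihoods as if the words were independent
        score *= likelihoods[cls][word]
    scores[cls] = score

# Normalize the unnormalized scores into posterior probabilities
total = sum(scores.values())
posteriors = {cls: s / total for cls, s in scores.items()}
print(posteriors)
print("Predicted class:", max(posteriors, key=posteriors.get))
```

The unnormalized scores are $0.4 \times 0.30 \times 0.20 = 0.024$ for spam and $0.6 \times 0.02 \times 0.05 = 0.0006$ for ham, so spam wins. Real implementations sum log-probabilities instead of multiplying, to avoid numerical underflow with many features.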

9. Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve
Bayes, and Bernoulli Naïve Bayes.


Ans: The primary difference between these three algorithms is the probability distribution they assume for the input features.


1. Gaussian Naïve Bayes


Used when features are continuous (e.g., decimals, measurements). It assumes that the data for each class follows a Normal (Gaussian) distribution.


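 - Assumption: Within each class, every feature follows a bell curve (normal distribution).
 - Data Example: Predicting weather using temperature ($22.4$°C) and humidity ($65.2$%).
 - Formula Logic: It estimates a mean ($\mu$) and variance ($\sigma^2$) for every feature within each class, then scores a value with the normal density $P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\left(-\frac{(x_i - \mu_y)^2}{2\sigma_y^2}\right)$.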


2. Multinomial Naïve Bayes


Used when features represent discrete counts. It is the standard for text classification where you focus on word frequency.


 - Assumption: Within each class, the feature values are counts drawn from a multinomial distribution.
 - Data Example: Word counts in an email (the word "offer" appears $5$ times).
 - Key Behavior: It accounts for how many times a feature appears; $10$ occurrences carry more weight than $1$.



3. Bernoulli Naïve Bayes


Used when features are binary (0 or 1). It is similar to Multinomial but focuses on whether a feature exists or not.


 - Assumption: Features are independent booleans.
 - Data Example: A "bag of words" where you only mark if a word is present ($1$) or absent ($0$), regardless of how many times it appears.
 - Key Behavior: It explicitly penalizes the non-occurrence of a feature, which can be useful in short-text classification.
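The difference between count and presence features can be sketched with scikit-learn's `MultinomialNB` and `BernoulliNB` on the same word-count matrix from `CountVectorizer`. The tiny corpus and labels below are hypothetical, for illustration only. `BernoulliNB` binarizes its input by default (`binarize=0.0`, so any count above zero becomes 1), which means repeated words make no difference to it.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Tiny hypothetical corpus, for illustration only (1 = spam, 0 = not spam)
docs = [
    "win money now", "money transfer offer offer", "claim your free offer",
    "meeting at noon", "project update attached", "lunch at noon tomorrow",
]
labels = [1, 1, 1, 0, 0, 0]

vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(docs)  # word counts per document

multi = MultinomialNB().fit(X_counts, labels)  # uses the counts directly
bern = BernoulliNB().fit(X_counts, labels)     # counts > 0 are treated as 1 (present)

new = vectorizer.transform(["free money offer offer offer"])
print("Multinomial:", multi.predict(new), multi.predict_proba(new).round(3))
print("Bernoulli:  ", bern.predict(new), bern.predict_proba(new).round(3))
```

For a message like this both models agree, but the multinomial model is pushed further by the three occurrences of "offer", while the Bernoulli model also factors in which spam-associated words are absent.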

10. Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer
dataset and evaluate accuracy.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the Breast Cancer dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize and train the Gaussian Naïve Bayes classifier
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Make predictions and evaluate accuracy
y_pred = gnb.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Accuracy: {accuracy:.4f}")

Accuracy: 0.9737
