**Question 1: What is Information Gain, and how is it used in Decision Trees?**

Answer:- Information Gain is a metric used in decision trees to measure how much uncertainty (entropy) is reduced after splitting a dataset on a particular feature. It is based on the concept of entropy, which quantifies the impurity or randomness in the data. Information Gain is calculated as the difference between the entropy of the parent node and the weighted average entropy of the child nodes after the split. In decision trees, at each node, the feature that gives the highest Information Gain is selected for splitting because it results in the most homogeneous (pure) subsets of data. This process is repeated recursively, allowing the decision tree to efficiently classify or predict outcomes.
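As a rough illustration of the calculation described above, the following sketch computes entropy and Information Gain for one candidate split. The toy labels and the helper names `entropy` and `information_gain` are made up for this example and are not taken from any library.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical toy node: 5 "yes" and 5 "no" labels, split into two children.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

gain = information_gain(parent, [left, right])
print(f"Parent entropy: {entropy(parent):.3f}")  # 1.000
print(f"Information gain: {gain:.3f}")           # 0.278
```

A tree builder would evaluate this quantity for every candidate feature (and threshold) at a node and keep the split with the largest value.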

**Question 2: What is the difference between Gini Impurity and Entropy?**
**Hint: Directly compares the two main impurity measures, highlighting strengths, weaknesses, and appropriate use cases.**

Answer:- Gini Impurity and Entropy are two commonly used measures of node impurity in decision tree algorithms, but they differ in definition, computation, and practical use. Gini Impurity measures the probability that a randomly chosen data point would be incorrectly classified if it were labeled at random according to the class distribution of the node. It needs no logarithms, so it is slightly cheaper to compute, and it is the default criterion in CART and in scikit-learn's DecisionTreeClassifier and RandomForestClassifier. Entropy, on the other hand, measures the amount of uncertainty or disorder in the data using logarithms and is used to compute Information Gain in algorithms such as ID3 and C4.5. For a two-class node, Gini Impurity ranges from 0 to 0.5 and Entropy (in bits) from 0 to 1; both are 0 for a pure node and largest when the classes are evenly mixed. In practice the two measures usually produce very similar splits. Gini Impurity is a reasonable default when training speed matters, whereas Entropy is the natural choice when an information-theoretic interpretation (impurity measured in bits) is wanted.
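To make the comparison concrete, here is a small sketch that evaluates both measures on a few class distributions. The helper names `gini` and `entropy` and the example probabilities are chosen only for illustration.

```python
from math import log2

def gini(probs):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    return 1.0 - sum(p * p for p in probs)

def entropy(probs):
    """Shannon entropy in bits; classes with zero probability contribute nothing."""
    return sum(-p * log2(p) for p in probs if p > 0)

# Illustrative two-class distributions: pure, mostly pure, evenly mixed.
for probs in ([1.0, 0.0], [0.9, 0.1], [0.5, 0.5]):
    print(probs, "gini =", round(gini(probs), 3), "entropy =", round(entropy(probs), 3))
```

Both functions rise and fall together as the node becomes more or less mixed, which is why the two criteria tend to select similar splits.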

**Question 3: What is Pre-Pruning in Decision Trees?**

Answer:- Pre-pruning in Decision Trees is a technique used to stop the growth of the tree at an early stage in order to prevent overfitting. Instead of allowing the tree to grow until all leaves are pure, pre-pruning applies stopping conditions while the tree is being built, such as limiting the maximum depth of the tree, setting a minimum number of samples required to split a node, or requiring a minimum information gain for a split. If a candidate split would violate any of these conditions, the node is not split and becomes a leaf. By controlling tree complexity early, pre-pruning helps improve generalization, reduces training time, and avoids creating overly complex trees that fit noise in the training data.
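The stopping conditions listed above map onto constructor parameters of scikit-learn's `DecisionTreeClassifier`. The sketch below compares an unrestricted tree with a pre-pruned one on the Iris dataset; the specific threshold values are arbitrary choices for illustration, and `min_impurity_decrease` is scikit-learn's weighted impurity-decrease threshold, which plays the role of the minimum-gain condition.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# No stopping conditions: the tree grows until its leaves are pure.
full_tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# Pre-pruned tree; the threshold values are illustrative, not tuned.
pruned_tree = DecisionTreeClassifier(
    max_depth=3,                 # limit the depth of the tree
    min_samples_split=10,        # a node needs at least 10 samples to be split
    min_impurity_decrease=0.01,  # a split must reduce impurity by at least this much
    random_state=42,
).fit(X, y)

print("Unrestricted: depth", full_tree.get_depth(), "leaves", full_tree.get_n_leaves())
print("Pre-pruned:   depth", pruned_tree.get_depth(), "leaves", pruned_tree.get_n_leaves())
```

In practice these thresholds would normally be chosen by cross-validation rather than fixed by hand.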

**Question 4: Write a Python program to train a Decision Tree Classifier using Gini Impurity as the criterion and print the feature importances (practical).**
**Hint: Use criterion='gini' in DecisionTreeClassifier and access .feature_importances_.**
**(Include your Python code and output in the code box below.)**

In [1]:
# Import required libraries
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
import pandas as pd

# Load the Iris dataset
data = load_iris()
X = data.data
y = data.target
feature_names = data.feature_names

# Train Decision Tree using Gini Impurity
dt = DecisionTreeClassifier(criterion='gini', random_state=42)
dt.fit(X, y)

# Get feature importances
importances = dt.feature_importances_

# Display feature importances
feature_importance_df = pd.DataFrame({
    "Feature": feature_names,
    "Importance": importances
})

print("Feature Importances using Gini Impurity:")
print(feature_importance_df)


Feature Importances using Gini Impurity:
             Feature  Importance
0  sepal length (cm)    0.013333
1   sepal width (cm)    0.000000
2  petal length (cm)    0.564056
3   petal width (cm)    0.422611


**Question 5: What is a Support Vector Machine (SVM)?**

Answer:- A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression tasks. It works by finding an optimal decision boundary (called a hyperplane) that best separates data points of different classes. The key idea of SVM is to maximize the margin, which is the distance between the hyperplane and the nearest data points from each class, known as support vectors. By maximizing this margin, SVM achieves better generalization on unseen data. SVM can also handle non-linearly separable data using kernel functions that transform the data into a higher-dimensional space, making separation possible.
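The idea of a maximum-margin hyperplane can be checked on a tiny example. The sketch below fits a linear `SVC` to four made-up points and reads the hyperplane and support vectors back from the fitted model. The toy data and the large `C` value (used here to approximate a hard margin) are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: class 0 on the line x = 0, class 1 on the line x = 2.
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([0, 0, 1, 1])

# A very large C penalizes margin violations heavily (close to a hard margin).
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]       # normal vector of the separating hyperplane
b = clf.intercept_[0]  # offset of the hyperplane
distance_to_nearest = 1.0 / np.linalg.norm(w)  # hyperplane to closest point

print("Support vectors:\n", clf.support_vectors_)
print("Distance from hyperplane to nearest points:", round(distance_to_nearest, 3))
```

For this data the separating line sits halfway between the two classes, so the distance to the nearest points should come out close to 1.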

**Question 6: What is the Kernel Trick in SVM?**

Answer:- The Kernel Trick in Support Vector Machines (SVM) is a technique that allows SVM to handle non-linearly separable data by implicitly mapping the input features into a higher-dimensional feature space without explicitly computing the transformation. The SVM optimization only needs inner products between pairs of data points, so instead of transforming the points and then taking inner products in the higher dimension, a kernel function returns the value of that inner product directly from the original inputs. This makes the computation efficient while enabling SVM to find a linear separating hyperplane in the transformed space, which corresponds to a non-linear decision boundary in the original space. Common kernel functions include linear, polynomial, and radial basis function (RBF) kernels.
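A small numeric check can make this less abstract. For 2-D inputs, the degree-2 polynomial kernel k(x, z) = (x · z)² equals the ordinary inner product of the explicit feature map φ(x) = (x₁², √2·x₁x₂, x₂²). The function names `poly_kernel` and `phi` and the sample points below are made up for this sketch.

```python
from math import sqrt

def dot(a, b):
    """Plain inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def poly_kernel(x, z):
    """Degree-2 polynomial kernel, computed entirely in the original 2-D space."""
    return dot(x, z) ** 2

def phi(x):
    """Explicit feature map into 3-D that corresponds to poly_kernel."""
    x1, x2 = x
    return (x1 * x1, sqrt(2) * x1 * x2, x2 * x2)

x, z = (1.0, 2.0), (3.0, 4.0)

kernel_value = poly_kernel(x, z)      # no mapping needed
explicit_value = dot(phi(x), phi(z))  # map first, then take the inner product

print(kernel_value, explicit_value)   # both are 121 up to floating-point rounding
```

The kernel gives the same number without ever building the 3-D vectors, which is what keeps the computation cheap when the implicit feature space is very large (or infinite-dimensional, as with the RBF kernel).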

**Question 7: Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies.**

**Hint: Use SVC(kernel='linear') and SVC(kernel='rbf'), then compare accuracy scores after fitting on the same dataset.**

**(Include your Python code and output in the code box below.)**

This program trains two SVM classifiers, one with a linear kernel and one with an RBF kernel, on the same train/test split of the Wine dataset and compares their test accuracies. In this run the linear kernel performs substantially better (about 0.98 versus 0.76). A likely reason is that the features are not standardized: the RBF kernel depends on distances between samples, so features with large numeric ranges dominate it.

In [2]:
# Import required libraries
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load the Wine dataset
X, y = load_wine(return_X_y=True)

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train SVM with Linear Kernel
svm_linear = SVC(kernel='linear')
svm_linear.fit(X_train, y_train)
linear_pred = svm_linear.predict(X_test)
linear_acc = accuracy_score(y_test, linear_pred)

# Train SVM with RBF Kernel
svm_rbf = SVC(kernel='rbf')
svm_rbf.fit(X_train, y_train)
rbf_pred = svm_rbf.predict(X_test)
rbf_acc = accuracy_score(y_test, rbf_pred)

# Print accuracies
print("Linear Kernel SVM Accuracy:", linear_acc)
print("RBF Kernel SVM Accuracy:", rbf_acc)


Linear Kernel SVM Accuracy: 0.9814814814814815
RBF Kernel SVM Accuracy: 0.7592592592592593
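One way to probe the gap between the two kernels is to standardize the features before fitting, since the Wine features have very different numeric ranges. The sketch below reuses the same split and wraps each classifier in a pipeline with `StandardScaler`; this is an added experiment under that assumption, not part of the original exercise, and the dictionary name `scaled_acc` is made up here.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

scaled_acc = {}
for kernel in ("linear", "rbf"):
    # The scaler is fitted on the training data only, then applied to the test data.
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    model.fit(X_train, y_train)
    scaled_acc[kernel] = model.score(X_test, y_test)

print("Linear kernel (scaled):", scaled_acc["linear"])
print("RBF kernel (scaled):", scaled_acc["rbf"])
```

Putting the scaler inside the pipeline avoids leaking test-set statistics into training.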


**Question 8: What is the Naïve Bayes classifier, and why is it called "Naïve"?**

Answer:- The Naïve Bayes classifier is a probabilistic machine learning algorithm based on Bayes’ Theorem, used mainly for classification tasks. It predicts the class of a data point by calculating the probability of each class given the input features and selecting the class with the highest probability. It is called “Naïve” because it makes a strong simplifying assumption that all features are conditionally independent of each other given the class label, which is rarely true in real-world data. Despite this unrealistic assumption, Naïve Bayes often performs very well in practice, especially on large datasets and text classification problems, due to its simplicity, speed, and efficiency.
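The independence assumption can be written out explicitly. Using notation introduced here for illustration (class label $y$, features $x_1, \dots, x_n$), Bayes' Theorem combined with conditional independence gives:

$$
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$

Without the assumption, the model would need the joint likelihood $P(x_1, \dots, x_n \mid y)$, which is much harder to estimate from limited data than $n$ separate one-dimensional distributions.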

**Question 9: Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes.**

Answer:- Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes are three variants of the Naïve Bayes classifier, each designed for different types of data based on how feature probabilities are modeled. Gaussian Naïve Bayes assumes that continuous features follow a normal (Gaussian) distribution within each class, making it suitable for real-valued data such as measurements in medical or scientific datasets. Multinomial Naïve Bayes is used for discrete count-based features, such as word counts in text classification problems; although it is defined for counts, it often also works in practice with fractional values such as TF-IDF weights. Bernoulli Naïve Bayes, on the other hand, is designed for binary features, where each feature represents the presence or absence of an attribute, such as whether a word appears in a document or not. The key difference among them lies in the assumptions about the feature distributions, which determine their appropriate use cases.
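The following sketch pairs each variant with the kind of input it expects, using scikit-learn's `GaussianNB`, `MultinomialNB`, and `BernoulliNB`. The tiny arrays are invented purely for illustration and are far too small for a real model.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 0, 1, 1])  # the same hypothetical labels for all three examples

# Continuous measurements -> Gaussian Naive Bayes
X_real = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.0], [4.1, 4.2]])
gaussian_pred = GaussianNB().fit(X_real, y).predict([[1.1, 2.0]])

# Word counts per document -> Multinomial Naive Bayes
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
multinomial_pred = MultinomialNB().fit(X_counts, y).predict([[2, 0, 1]])

# Word present (1) or absent (0) -> Bernoulli Naive Bayes
X_binary = (X_counts > 0).astype(int)
bernoulli_pred = BernoulliNB().fit(X_binary, y).predict([[1, 0, 0]])

print(gaussian_pred, multinomial_pred, bernoulli_pred)
```

All three share the same `fit`/`predict` interface; only the per-feature likelihood model differs.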

**Question 10: Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and evaluate its accuracy.**
**Hint: Use GaussianNB() from sklearn.naive_bayes and the Breast Cancer dataset from sklearn.datasets.**
**(Include your Python code and output in the code box below.)**

In [3]:
# Import required libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the Breast Cancer dataset
X, y = load_breast_cancer(return_X_y=True)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train Gaussian Naïve Bayes classifier
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Make predictions
y_pred = gnb.predict(X_test)

# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)

# Print accuracy
print("Gaussian Naïve Bayes Accuracy:", accuracy)


Gaussian Naïve Bayes Accuracy: 0.9415204678362573
