**Question 1**: What is Information Gain, and how is it used in Decision Trees?

**Answer**:
Information Gain is a metric used in Decision Trees to decide which feature should be used to split the data at each node. It measures how much uncertainty (entropy) is reduced after splitting the dataset based on a feature.

It is based on Entropy, which measures randomness in the data.

The feature with the highest Information Gain is chosen for splitting.

Formula:

$$
\text{Information Gain} = \text{Entropy(parent)} - \sum_{\text{children}} \left( \frac{\text{Samples in child}}{\text{Samples in parent}} \times \text{Entropy(child)} \right)
$$


**Usage** in Decision Trees:

ID3 selects splits by Information Gain; C4.5 uses Gain Ratio, a normalized form of Information Gain.

Helps create purer child nodes after each split.
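
The formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code used inside any particular library; the function names `entropy` and `information_gain` and the toy labels are assumptions made for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical toy node: 4 "yes" and 4 "no" labels.
parent = ["yes"] * 4 + ["no"] * 4

# Split A separates the classes perfectly; split B leaves both children mixed.
split_a = [["yes"] * 4, ["no"] * 4]
split_b = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]

gain_a = information_gain(parent, split_a)
gain_b = information_gain(parent, split_b)
print("Gain for split A:", gain_a)
print("Gain for split B:", gain_b)
```

Split A removes all uncertainty and split B removes none, so a tree that maximizes Information Gain would choose split A.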

**Question 2**: What is the difference between Gini Impurity and Entropy?

**Answer**:
Gini Impurity and Entropy are both measures used in Decision Trees to determine how well a feature splits the data, but they differ in how they calculate impurity and how they are used.

Gini Impurity measures the probability that a randomly selected data point would be incorrectly classified if it were labeled at random according to the class distribution of the node: $1 - \sum_i p_i^2$. It involves no logarithms, so it is slightly cheaper to compute. Gini is the default criterion in the CART (Classification and Regression Trees) algorithm.

Entropy measures the level of randomness or disorder in the data using a logarithmic function: $-\sum_i p_i \log_2 p_i$. For two classes it ranges from 0 to 1, whereas Gini ranges from 0 to 0.5; both are zero for a pure node and maximal when the classes are evenly mixed. Entropy is the impurity measure behind Information Gain in ID3 and C4.5.

In summary, the two criteria usually produce very similar trees. Gini Impurity is marginally faster to compute and widely used in practice, while Entropy has a direct information-theoretic interpretation.
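
To make the comparison concrete, the sketch below evaluates both measures on a few two-class distributions. The helper names `gini_impurity` and `entropy_bits` and the example probabilities are assumptions chosen for illustration.

```python
import math

def gini_impurity(probs):
    """Gini impurity: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in probs)

def entropy_bits(probs):
    """Shannon entropy in bits, written as sum(p_i * log2(1 / p_i)); zero probabilities are skipped."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Hypothetical class distributions for a two-class node, from pure to evenly mixed.
distributions = [(1.0, 0.0), (0.9, 0.1), (0.7, 0.3), (0.5, 0.5)]

for probs in distributions:
    print(probs,
          "gini =", round(gini_impurity(probs), 3),
          "entropy =", round(entropy_bits(probs), 3))
```

Both measures rise together as the node becomes more mixed, which is why the choice between them rarely changes the resulting tree much.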



**Question 3**: What is Pre-Pruning in Decision Trees?

**Answer**:
Pre-Pruning is a technique used to stop the growth of a decision tree early to prevent overfitting.

**Common Pre-Pruning Methods**:

Limiting tree depth (max_depth)

Minimum samples required to split (min_samples_split)

Minimum samples per leaf (min_samples_leaf)

Maximum number of leaf nodes (max_leaf_nodes)

**Benefits**:

Reduces overfitting

Improves generalization

Faster training
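
As a rough illustration of these settings, the sketch below trains one unrestricted tree and one pre-pruned tree with scikit-learn's `DecisionTreeClassifier` on the Iris data and compares their sizes. The specific limit values are assumptions picked for the example, not recommended defaults.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Fully grown tree: no limits on depth or node size.
full_tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# Pre-pruned tree: growth stops as soon as any limit is reached.
# The values below are illustrative assumptions, not tuned settings.
pruned_tree = DecisionTreeClassifier(
    max_depth=3,
    min_samples_split=10,
    min_samples_leaf=5,
    max_leaf_nodes=6,
    random_state=42,
).fit(X, y)

print("Full tree   -> depth:", full_tree.get_depth(), "leaves:", full_tree.get_n_leaves())
print("Pruned tree -> depth:", pruned_tree.get_depth(), "leaves:", pruned_tree.get_n_leaves())
```

In practice these limits would normally be chosen by cross-validation rather than fixed by hand.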

**Question 4**: Decision Tree using Gini Impurity (Practical)

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Load dataset
data = load_iris()
X = data.data
y = data.target

# Train Decision Tree using Gini Impurity
model = DecisionTreeClassifier(criterion='gini', random_state=42)
model.fit(X, y)

# Print feature importances
print("Feature Importances:", model.feature_importances_)
```


**Question 5**: What is a Support Vector Machine (SVM)?

**Answer**:
Support Vector Machine (SVM) is a supervised learning algorithm used for classification and regression.

It finds an optimal hyperplane that maximizes the margin between classes.

The support vectors are the training points closest to the decision boundary; they alone determine where the boundary lies.

It is effective in high-dimensional spaces.
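
The sketch below fits a linear SVM on a small hand-made dataset and inspects the support vectors and the margin. The data points and the large value of `C` (used to approximate a hard margin) are assumptions made for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data: two clusters in 2-D.
X = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
              [4.0, 4.0], [4.0, 5.0], [5.0, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A large C approximates a hard margin on separable data.
clf = SVC(kernel="linear", C=1000.0).fit(X, y)

# Only the points nearest the boundary are kept as support vectors.
print("Support vectors:\n", clf.support_vectors_)

# For a linear kernel the hyperplane is w.x + b = 0 and the margin width is 2 / ||w||.
w = clf.coef_[0]
b = clf.intercept_[0]
margin = 2.0 / np.linalg.norm(w)
print("w =", w, "b =", b)
print("Margin width:", margin)
```

Removing any point that is not a support vector and refitting would leave the hyperplane unchanged, which is what makes the support vectors special.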

**Question 6**: What is the Kernel Trick in SVM?

**Answer**:
The Kernel Trick allows SVMs to solve non-linear problems by implicitly mapping data into a higher-dimensional space: a kernel function computes inner products in that space directly from the original inputs, so the mapping itself never has to be computed.

**Common Kernels**:

Linear

Polynomial

RBF (Radial Basis Function)

Sigmoid

**Benefit**:
Efficiently handles complex decision boundaries.
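
A small numerical check can make the idea tangible. For the degree-2 polynomial kernel $K(a, b) = (a \cdot b)^2$ on 2-D inputs, the sketch below compares the kernel value with the inner product under an explicit 3-D feature map. The names `phi` and `poly_kernel` and the sample vectors are assumptions made for this example.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector (x1, x2)."""
    x1, x2 = v
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

def poly_kernel(a, b):
    """Homogeneous polynomial kernel of degree 2: K(a, b) = (a . b)^2."""
    return np.dot(a, b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])

explicit = np.dot(phi(a), phi(b))   # inner product in the 3-D feature space
implicit = poly_kernel(a, b)        # same value, computed in the original 2-D space

print("Explicit mapping:", explicit)
print("Kernel function: ", implicit)
```

The two numbers agree, but the kernel never builds the 3-D vectors. For kernels such as RBF the corresponding feature space is infinite-dimensional, so the implicit route is the only practical one.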

**Question 7**: SVM with Linear and RBF Kernels (Practical)

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load dataset
data = load_wine()
X = data.data
y = data.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Linear Kernel SVM
svm_linear = SVC(kernel='linear')
svm_linear.fit(X_train, y_train)
y_pred_linear = svm_linear.predict(X_test)

# RBF Kernel SVM
svm_rbf = SVC(kernel='rbf')
svm_rbf.fit(X_train, y_train)
y_pred_rbf = svm_rbf.predict(X_test)

# Accuracy
print("Linear Kernel Accuracy:", accuracy_score(y_test, y_pred_linear))
print("RBF Kernel Accuracy:", accuracy_score(y_test, y_pred_rbf))
```


**Question 8**: What is the Naïve Bayes classifier, and why is it called "Naïve"?

**Answer**:
Naïve Bayes is a probabilistic classifier based on Bayes’ Theorem.

It is called “Naïve” because it assumes that all features are conditionally independent of one another given the class, which is rarely true in real-world data.
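
Written out, the independence assumption lets the class posterior factor into one term per feature. Here $y$ denotes the class and $x_1, \dots, x_n$ the feature values; this notation is introduced only for illustration.

$$
P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$

The predicted class is the $y$ that maximizes the right-hand side.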

**Advantages**:

Simple and fast

Performs well on large datasets

Works well for text classification

**Question 9**: Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes

**Answer**:
Gaussian Naïve Bayes is used when the features are continuous and follow a normal (Gaussian) distribution. It assumes that each feature value is drawn from a Gaussian distribution for each class. This variant is commonly applied in medical, scientific, and numerical datasets where feature values are real numbers.

Multinomial Naïve Bayes is suitable for discrete count-based data. It is widely used in text classification problems such as spam detection and document categorization, where features represent word counts or term frequencies. It works well when the data consists of non-negative integer values.

Bernoulli Naïve Bayes is designed for binary or boolean features. Instead of counting how many times a feature appears, it only considers whether a feature is present or absent. This makes it useful for applications like spam filtering where the presence or absence of a word matters more than its frequency.

In summary, Gaussian Naïve Bayes works best with continuous data, Multinomial Naïve Bayes is ideal for count-based features, and Bernoulli Naïve Bayes is best suited for binary-valued features.
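
The sketch below pairs each variant with the kind of data it expects, using scikit-learn's `GaussianNB`, `MultinomialNB`, and `BernoulliNB`. The tiny arrays are hypothetical and exist only to show the input shapes.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 0, 1, 1])  # hypothetical labels shared by all three toy datasets

# Continuous measurements -> GaussianNB
X_continuous = np.array([[1.2, 3.4], [0.9, 3.1], [4.8, 0.7], [5.1, 0.9]])
gnb = GaussianNB().fit(X_continuous, y)

# Word counts per document -> MultinomialNB
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
mnb = MultinomialNB().fit(X_counts, y)

# Word presence/absence -> BernoulliNB (here derived by thresholding the counts)
X_binary = (X_counts > 0).astype(int)
bnb = BernoulliNB().fit(X_binary, y)

pred_gnb = gnb.predict([[1.0, 3.0]])
pred_mnb = mnb.predict([[2, 0, 1]])
pred_bnb = bnb.predict([[1, 0, 0]])

print("GaussianNB:   ", pred_gnb)
print("MultinomialNB:", pred_mnb)
print("BernoulliNB:  ", pred_bnb)
```

The same count matrix feeds both the Multinomial and Bernoulli models; the only difference is whether the frequencies are kept or reduced to presence/absence.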

**Question 10**: Gaussian Naïve Bayes on Breast Cancer Dataset

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train Gaussian Naive Bayes
model = GaussianNB()
model.fit(X_train, y_train)

# Prediction
y_pred = model.predict(X_test)

# Accuracy
print("Accuracy:", accuracy_score(y_test, y_pred))
```
