Question 1: What is Information Gain, and how is it used in Decision Trees?

Answer:
Information Gain is a metric used to measure how well a feature splits data into distinct classes. It is based on entropy, which quantifies impurity.

Information Gain = Entropy(before split) - Weighted Entropy(after split)

Here the weights are the fractions of samples that fall into each child node. In Decision Trees, the feature (and threshold) with the highest Information Gain is chosen as the split at each node, because it gives the largest reduction in impurity at that step.
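
As a worked illustration of the formula above, the following sketch computes Information Gain for a toy split. The helper names `entropy` and `information_gain` and the toy labels are illustrative assumptions, not part of any library.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Toy node: 5 "yes" and 5 "no" samples, split in two by a hypothetical feature.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

gain = information_gain(parent, [left, right])
print(f"Parent entropy: {entropy(parent):.4f}")
print(f"Information gain: {gain:.4f}")
```

A split that left both children as mixed as the parent would give a gain of zero, which is why the tree prefers the feature with the highest value.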

Question 2: What is the difference between Gini Impurity and Entropy?

Answer:

| Metric            | Formula                   | Interpretation                                                                                             | When Useful                                    |
| ----------------- | ------------------------- | ---------------------------------------------------------------------------------------------------------- | ---------------------------------------------- |
| **Gini Impurity** | $1 - \sum_i p_i^2$        | Probability of mislabeling a random sample if it is labeled according to the node's class distribution     | Default in CART; avoids computing logarithms   |
| **Entropy**       | $-\sum_i p_i \log_2 p_i$  | Average uncertainty (in bits) about the class of a sample in the node                                      | Used by ID3 and C4.5 through Information Gain  |

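Key differences:

- Gini is slightly cheaper to compute because it needs no logarithm, and it is the default criterion in many implementations.
- Entropy is sometimes said to produce slightly more balanced trees, but in practice the two criteria usually yield very similar trees.
- Both measure impurity and both equal zero for a pure node; the choice rarely changes accuracy much.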
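
To make the comparison concrete, here is a minimal sketch that evaluates both formulas on a few two-class distributions. The function names `gini` and `entropy` are chosen for illustration only.

```python
import math

def gini(probs):
    """Gini impurity: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in probs)

def entropy(probs):
    """Shannon entropy in bits: -sum(p_i * log2(p_i)), skipping zero probabilities."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# From a pure node to a perfectly mixed node.
for probs in [(1.0, 0.0), (0.9, 0.1), (0.7, 0.3), (0.5, 0.5)]:
    print(f"p={probs}: gini={gini(probs):.4f}, entropy={entropy(probs):.4f}")
```

Both measures are zero for the pure node and peak at the 50/50 node (0.5 for Gini, 1 bit for entropy), so they rank candidate splits similarly.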

Question 3: What is Pre-Pruning in Decision Trees?

Answer:
Pre-Pruning (also called early stopping) prevents a decision tree from growing too large by stopping the splitting process early when further splits are unlikely to improve performance.

Common stopping criteria:

- Minimum number of samples required to split a node
- Maximum depth of the tree
- Minimum leaf size
- Minimum impurity decrease

Pre-pruning reduces overfitting and usually improves generalization, although limits that are too strict can cause underfitting.
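
The stopping criteria listed above correspond to constructor arguments of scikit-learn's `DecisionTreeClassifier`. The sketch below compares an unrestricted tree with a pre-pruned one on the Iris data; the specific threshold values are illustrative assumptions, not tuned settings.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Unrestricted tree: keeps splitting until every leaf is pure.
full_tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# Pre-pruned tree: each argument is one stopping criterion.
# The values below are assumptions chosen only for illustration.
pruned_tree = DecisionTreeClassifier(
    max_depth=3,                 # maximum depth of the tree
    min_samples_split=10,        # minimum samples required to split a node
    min_samples_leaf=5,          # minimum leaf size
    min_impurity_decrease=0.01,  # minimum impurity decrease
    random_state=42,
).fit(X, y)

print("Full tree:   depth", full_tree.get_depth(), "leaves", full_tree.get_n_leaves())
print("Pruned tree: depth", pruned_tree.get_depth(), "leaves", pruned_tree.get_n_leaves())
```

In practice these values would be chosen by cross-validation rather than fixed by hand.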

Question 4: Python program (Decision Tree with Gini Impurity)

Answer:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Load dataset
data = load_iris()
X, y = data.data, data.target

# Train Decision Tree with Gini
clf = DecisionTreeClassifier(criterion='gini', random_state=42)
clf.fit(X, y)

# Print feature importances
print("Feature Importances:")
for name, importance in zip(data.feature_names, clf.feature_importances_):
    print(f"{name}: {importance:.4f}")
```

```text
Feature Importances:
sepal length (cm): 0.0133
sepal width (cm): 0.0000
petal length (cm): 0.5641
petal width (cm): 0.4226
```


Question 5: What is a Support Vector Machine (SVM)?

Answer:
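SVM is a supervised learning algorithm used for classification and regression. For classification, it finds the hyperplane that separates the classes with the largest possible margin.

Key idea: maximize the margin, i.e. the distance between the separating hyperplane and the closest training points of each class. Those closest points are the support vectors, and they alone determine the hyperplane.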
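
As a small illustration of the margin idea, the sketch below fits a linear SVM on six hand-picked, linearly separable points and reads off the support vectors and the margin width, computed as 2 / ||w||. The toy data and the large `C` value (used to approximate a hard margin) are assumptions made for this example.

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable toy clusters (illustrative data).
X = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
              [4.0, 4.0], [4.0, 5.0], [5.0, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard-margin SVM on separable data.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]       # normal vector of the hyperplane w.x + b = 0
b = clf.intercept_[0]  # offset of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)  # distance between the two margin boundaries

print("Support vectors:\n", clf.support_vectors_)
print("Margin width:", round(margin_width, 4))
```

Only the points closest to the boundary appear in `support_vectors_`; moving any other point (without crossing the margin) would leave the hyperplane unchanged.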

Question 6: What is the Kernel Trick in SVM?

Answer:
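The Kernel Trick lets an SVM separate data that is not linearly separable in the original feature space. A kernel function returns the inner product of two points in a higher-dimensional space directly from the original inputs, so the mapping to that space never has to be computed explicitly.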

Popular kernels:

- Linear
- Polynomial
- RBF (Gaussian)

This lets an SVM learn non-linear decision boundaries in the original space while still fitting a linear separator in the mapped space.
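
The equivalence behind the kernel trick can be checked numerically. The sketch below uses the homogeneous degree-2 polynomial kernel on 2-D inputs, for which an explicit feature map is known; the function names `phi` and `poly_kernel` are illustrative assumptions.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector (maps into 3-D)."""
    x1, x2 = v
    return np.array([x1 * x1, np.sqrt(2) * x1 * x2, x2 * x2])

def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: (x . z)^2."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(z))  # inner product computed in the 3-D feature space
implicit = poly_kernel(x, z)       # same value computed from the 2-D inputs only

print(explicit, implicit)
```

Both numbers agree, but the kernel never builds the 3-D vectors. For the RBF kernel the corresponding feature space is infinite-dimensional, so the implicit route is the only practical one.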

Question 7: Python program (Linear vs RBF SVM on Wine Dataset)

Answer:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load data
data = load_wine()
X, y = data.data, data.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Linear SVM
clf_linear = SVC(kernel='linear')
clf_linear.fit(X_train, y_train)
acc_linear = accuracy_score(y_test, clf_linear.predict(X_test))

# RBF SVM
clf_rbf = SVC(kernel='rbf')
clf_rbf.fit(X_train, y_train)
acc_rbf = accuracy_score(y_test, clf_rbf.predict(X_test))

print("Linear Kernel Accuracy:", acc_linear)
print("RBF Kernel Accuracy:", acc_rbf)
```

```text
Linear Kernel Accuracy: 1.0
RBF Kernel Accuracy: 0.8055555555555556
```
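
The gap between the two kernels above is plausibly a feature-scaling effect rather than a weakness of the RBF kernel itself: the Wine features span very different ranges, and RBF distances are dominated by the largest ones. The following sketch, which is an addition and not part of the original program, repeats the RBF experiment with standardized features using a scikit-learn pipeline.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize each feature (statistics are fit on the training split only),
# then train the RBF SVM on the standardized values.
scaled_rbf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scaled_rbf.fit(X_train, y_train)

acc_scaled_rbf = scaled_rbf.score(X_test, y_test)
print("RBF Kernel Accuracy (standardized features):", acc_scaled_rbf)
```

Putting the scaler inside the pipeline keeps test-set statistics out of training, which matters when the comparison is later moved to cross-validation.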


Question 8: What is the Naïve Bayes classifier, and why is it called "Naïve"?

Answer:
Naïve Bayes is a probabilistic classifier based on Bayes’ Theorem.
It assumes that all features are independent given the class label.

It is called "naïve" because this independence assumption rarely holds in real data, yet the classifier often performs well in practice.
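
The independence assumption is what makes the computation simple: the score for each class is the prior multiplied by one likelihood term per feature. The sketch below shows this for a hypothetical two-word spam filter. All probabilities are made up for illustration, and for brevity it only multiplies in the words that are present (a simplification compared with a full Bernoulli model, which also accounts for absent words).

```python
# Hypothetical class priors and per-word likelihoods P(word present | class).
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.8, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.5},
}

def posterior(observed_words):
    """Return P(class | words), assuming words are independent given the class."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for word in observed_words:
            score *= likelihoods[cls][word]  # the "naive" step: one factor per feature
        scores[cls] = score
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}

result = posterior(["free"])
print(result)
```

Real implementations work with log-probabilities instead of raw products to avoid numerical underflow when there are many features.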

Question 9: Differences between Gaussian, Multinomial, and Bernoulli Naïve Bayes

Answer:
| Type               | Data Type                                                           | Use Case                                           |
| ------------------ | ------------------------------------------------------------------- | -------------------------------------------------- |
| **Gaussian NB**    | Continuous features (assumed normally distributed within each class) | Medical data, sensor data                          |
| **Multinomial NB** | Non-negative count features                                         | NLP, word counts                                   |
| **Bernoulli NB**   | Binary features (0/1)                                               | Document classification with word presence/absence |
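
To show how each variant pairs with its data type, the sketch below fits all three scikit-learn classes on small synthetic datasets. The data is randomly generated under assumed distributions (Gaussian clusters, Poisson counts, and thresholded counts) purely for illustration.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

rng = np.random.default_rng(0)
y = np.array([0] * 50 + [1] * 50)

# Continuous features: two Gaussian clusters with different means.
X_cont = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(3.0, 1.0, (50, 2))])
# Count features: counts drawn with different rates per class.
X_counts = np.vstack([rng.poisson([5, 1], (50, 2)), rng.poisson([1, 5], (50, 2))])
# Binary features: a crude presence/absence flag derived from the counts.
X_binary = (X_counts > 2).astype(int)

models = {
    "GaussianNB": (GaussianNB(), X_cont),
    "MultinomialNB": (MultinomialNB(), X_counts),
    "BernoulliNB": (BernoulliNB(), X_binary),
}
scores = {}
for name, (model, X) in models.items():
    model.fit(X, y)
    scores[name] = model.score(X, y)  # training accuracy, for illustration only
    print(f"{name}: training accuracy = {scores[name]:.2f}")
```

The accuracies here are measured on the training data, so they only confirm that each model fits its matching data type; they say nothing about generalization.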


Question 10: Python program (Gaussian NB on Breast Cancer Dataset)

Answer:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Gaussian NB model
clf = GaussianNB()
clf.fit(X_train, y_train)

# Evaluate accuracy
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print("Accuracy:", accuracy)
```

```text
Accuracy: 0.9736842105263158
```
