# DA-AG-009 — Supervised Classification: Decision Trees, SVM, and Naive Bayes

## Q1. What is Information Gain, and how is it used in Decision Trees?

**Answer:**  
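Information Gain (IG) measures how much the entropy of a dataset \(S\) drops when it is split on an attribute \(A\).  
At each node, the tree splits on the attribute with the highest IG.

\[ IG(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v) \]

Here \(S_v\) is the subset of \(S\) in which attribute \(A\) takes the value \(v\), and \(Entropy(S) = -\sum_i p_i \log_2 p_i\) over the class proportions \(p_i\).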
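As a quick numeric check of the formula above, the following minimal sketch computes entropy and information gain by hand. The helper names and the four-sample toy data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, attribute_values):
    """IG(S, A): parent entropy minus the size-weighted entropy of each subset S_v."""
    n = len(labels)
    subsets = {}
    for label, value in zip(labels, attribute_values):
        subsets.setdefault(value, []).append(label)
    weighted = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

# Hypothetical toy data: 4 samples and two candidate attributes
labels = ["yes", "yes", "no", "no"]
attr_perfect = ["a", "a", "b", "b"]  # separates the classes completely
attr_useless = ["a", "b", "a", "b"]  # each subset is still a 50/50 mix

ig_perfect = information_gain(labels, attr_perfect)
ig_useless = information_gain(labels, attr_useless)
print("IG (perfect split):", ig_perfect)
print("IG (useless split):", ig_useless)
```

The attribute that yields pure subsets recovers the full parent entropy of 1 bit, while the attribute that leaves every subset evenly mixed gains nothing, so a tree would split on the first.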


## Q2. What is the difference between Gini Impurity and Entropy?

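| Measure | Formula | Range (K classes) | Remarks |
|----------|----------|--------|----------|
| **Gini Impurity** | 1 - Σ p_i² | 0 to 1 - 1/K (0.5 for two classes) | No logarithm, so slightly cheaper to compute; default in sklearn |
| **Entropy** | -Σ p_i log₂ p_i | 0 to log₂ K (1 for two classes) | From information theory; measured in bits |

Both measure node impurity: each is 0 for a pure node and largest when the classes are evenly mixed. **Gini** is simpler and slightly faster, and in practice the two criteria usually produce similar trees.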
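To make the two measures concrete, here is a small sketch that evaluates both on a few class distributions. The function names and the example distributions are illustrative assumptions, not part of the original notebook.

```python
import math

def gini_from_probs(probs):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    return 1.0 - sum(p * p for p in probs)

def entropy_from_probs(probs):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical class distributions at a node
distributions = {
    "pure": [1.0, 0.0],
    "two classes, even": [0.5, 0.5],
    "three classes, even": [1 / 3, 1 / 3, 1 / 3],
}

for name, probs in distributions.items():
    print(f"{name}: gini={gini_from_probs(probs):.3f}, entropy={entropy_from_probs(probs):.3f}")
```

Both measures are zero for the pure node and peak at the even mix. With three evenly mixed classes the maxima rise to 1 - 1/3 for Gini and log₂ 3 for entropy, which is why the upper bounds depend on the number of classes.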


## Q3. What is Pre-Pruning in Decision Trees?

**Answer:**  
Pre-pruning stops tree growth early using parameters like `max_depth`, `min_samples_split`, or `min_impurity_decrease` to prevent overfitting.


In [None]:
from sklearn.tree import DecisionTreeClassifier
clf = DecisionTreeClassifier(max_depth=3, min_samples_split=5)
clf
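To see what these limits do in practice, the sketch below fits an unrestricted tree and a pre-pruned tree and compares their size. Using the Iris data and `random_state=42` here is an assumption made for illustration; the exact depths and leaf counts depend on the dataset.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Unrestricted tree: grows until every leaf is pure
full = DecisionTreeClassifier(random_state=42).fit(X, y)

# Pre-pruned tree: growth stops at depth 3 or when a node has fewer than 5 samples
pruned = DecisionTreeClassifier(max_depth=3, min_samples_split=5, random_state=42).fit(X, y)

for name, tree in [("unrestricted", full), ("pre-pruned", pruned)]:
    print(name, "depth:", tree.get_depth(), "leaves:", tree.get_n_leaves())
```

A smaller tree fits the training data less closely, which is the trade-off pre-pruning makes to reduce overfitting.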

## Q4. Train a Decision Tree using Gini Impurity and print feature importances

In [None]:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
import pandas as pd

data = load_iris()
X, y = data.data, data.target

model = DecisionTreeClassifier(criterion='gini', random_state=42)
model.fit(X, y)

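# Feature importances: each feature's share of the total impurity reduction (values sum to 1)
importances = pd.DataFrame({'Feature': data.feature_names, 'Importance': model.feature_importances_})
importances.sort_values('Importance', ascending=False)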

## Q5. What is a Support Vector Machine (SVM)?

**Answer:**  
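An SVM finds the hyperplane that separates the classes with the maximum margin, i.e. the largest distance to the nearest training points of each class (the support vectors). It tends to work well on high-dimensional data.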
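The margin can be inspected directly on a tiny example. The sketch below is illustrative only: the four 2-D points are hypothetical, and a very large `C` is used as an approximation of a hard-margin SVM. For a linear SVM with weight vector `w`, the margin width is `2 / ||w||`.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two classes separated by the vertical line x = 1
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([0, 0, 1, 1])

# A very large C approximates a hard-margin SVM on separable data
svm = SVC(kernel="linear", C=1e6)
svm.fit(X, y)

w = svm.coef_[0]                          # normal vector of the separating hyperplane
margin_width = 2.0 / np.linalg.norm(w)    # distance between the two margin boundaries

print("Support vectors:\n", svm.support_vectors_)
print("Margin width:", margin_width)
```

The two classes sit at x = 0 and x = 2, so the widest possible separating band has width 2, and the fitted margin should come out close to that value.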

## Q6. What is the Kernel Trick in SVM?

**Answer:**  
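The kernel trick lets an SVM separate data that is not linearly separable in the original space. A kernel function (RBF, polynomial, etc.) computes inner products in a higher-dimensional feature space without ever constructing that space explicitly, so a linear separator in the transformed space corresponds to a non-linear boundary in the original one.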
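A small numeric check makes this concrete. The sketch below assumes a homogeneous degree-2 polynomial kernel on 2-D inputs; the function names and the two sample vectors are hypothetical. It compares the kernel value with the inner product of an explicit feature map.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: (x . z)^2."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(z))  # inner product computed in the 3-D feature space
implicit = poly_kernel(x, z)       # same quantity computed in the original 2-D space

print("Explicit mapping:", explicit)
print("Kernel function: ", implicit)
```

Both routes give the same number, but the kernel never builds the 3-D vectors. For the RBF kernel the corresponding feature space is infinite-dimensional, so the explicit route is not available at all.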

## Q7. Train two SVMs (Linear and RBF) on the Wine dataset and compare accuracies

In [None]:
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.3, random_state=42)

svm_linear = SVC(kernel='linear')
svm_rbf = SVC(kernel='rbf')

svm_linear.fit(X_train, y_train)
svm_rbf.fit(X_train, y_train)

acc_linear = accuracy_score(y_test, svm_linear.predict(X_test))
acc_rbf = accuracy_score(y_test, svm_rbf.predict(X_test))

print('Linear Kernel Accuracy:', acc_linear)
print('RBF Kernel Accuracy:', acc_rbf)

Linear Kernel Accuracy: 0.9814814814814815
RBF Kernel Accuracy: 0.7592592592592593
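The gap between the two kernels above is likely due in large part to feature scaling: the Wine features span very different ranges, and the RBF kernel depends on raw distances between samples. The following sketch, which is an addition and not part of the original comparison, standardizes the features in a pipeline before fitting each SVM, using the same split as above.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=42
)

scaled_acc = {}
for kernel in ("linear", "rbf"):
    # The scaler is fit on the training split only, then applied to the test split
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    model.fit(X_train, y_train)
    scaled_acc[kernel] = model.score(X_test, y_test)
    print(f"{kernel} kernel accuracy (scaled):", scaled_acc[kernel])
```

Putting the scaler inside the pipeline keeps test-set statistics out of training, which would not be the case if the whole dataset were scaled before splitting.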


## Q8. What is the Naïve Bayes classifier, and why is it called 'Naïve'?

**Answer:**  
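Naïve Bayes applies Bayes’ Theorem under the assumption that features are **conditionally independent given the class** (hence “naïve”). Real features are rarely independent, but the classifier often works well anyway.  
It is fast, simple, and effective for text classification tasks such as spam detection.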

\[ P(C|X) = \frac{P(X|C) P(C)}{P(X)} \]
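Spelled out for a feature vector \(X = (x_1, \dots, x_n)\), the independence assumption turns the likelihood \(P(X|C)\) into a product of per-feature terms. Since \(P(X)\) is the same for every class, it can be dropped when comparing classes:

\[ P(C|x_1, \dots, x_n) \propto P(C) \prod_{i=1}^{n} P(x_i|C), \qquad \hat{C} = \arg\max_{C} \; P(C) \prod_{i=1}^{n} P(x_i|C) \]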


## Q9. Differences between Gaussian, Multinomial, and Bernoulli Naïve Bayes

| Type | Data Type | Example Use |
|------|------------|--------------|
| **GaussianNB** | Continuous (assumed normally distributed within each class) | Iris, medical measurements |
| **MultinomialNB** | Non-negative counts | Text, word frequencies |
| **BernoulliNB** | Binary (0/1) features | Word presence or absence in a document |
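The three variants share the same scikit-learn interface and differ only in the feature distribution they assume. The sketch below fits each one on a tiny hypothetical dataset of the matching type; all values are made up for illustration.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 0, 0, 1, 1, 1])

# Continuous measurements -> GaussianNB
X_cont = np.array([[1.0], [1.2], [0.9], [5.0], [5.2], [4.9]])
pred_gauss = GaussianNB().fit(X_cont, y).predict([[1.1], [5.1]])

# Word counts for a two-word vocabulary -> MultinomialNB
X_counts = np.array([[3, 0], [2, 1], [4, 0], [0, 3], [1, 2], [0, 4]])
pred_multi = MultinomialNB().fit(X_counts, y).predict([[5, 0], [0, 5]])

# Presence/absence of the same two words -> BernoulliNB
X_binary = (X_counts > 0).astype(int)
pred_bern = BernoulliNB().fit(X_binary, y).predict([[1, 0], [0, 1]])

print("GaussianNB:   ", pred_gauss)
print("MultinomialNB:", pred_multi)
print("BernoulliNB:  ", pred_bern)
```

Each model assigns the first query to class 0 and the second to class 1. The point of the comparison is that the input representation (real values, counts, or 0/1 flags) determines which variant is appropriate.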


## Q10. Train Gaussian Naïve Bayes on Breast Cancer dataset and evaluate accuracy

In [None]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.3, random_state=42)

model = GaussianNB()
model.fit(X_train, y_train)
pred = model.predict(X_test)

print('Accuracy:', accuracy_score(y_test, pred))

Accuracy: 0.9415204678362573
