# Supervised Classification: Decision Trees, SVM, and Naive Bayes Assignment

**Question 1. What is Information Gain, and how is it used in Decision Trees?**
- Information Gain measures how much uncertainty (or impurity) in the target variable is reduced after splitting the data on a feature.

- Decision trees use Information Gain to decide:

  - which feature to split on at each node.

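- How it works (conceptually):
  1. Compute the entropy of the parent node before the split.
  2. Compute the entropy of each child node after the split, then take their weighted average (each child weighted by its share of the samples).
  3. Subtract the weighted child entropy from the parent entropy.

- Information Gain = Entropy(parent) − Σ (n_child / n_parent) × Entropy(child)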

- Higher Information Gain means:

  - better separation of classes
  - more useful feature

- So, at each node the decision tree splits on the feature (and threshold) with the highest Information Gain, then repeats the process on each child node.
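The formula above can be made concrete with a short NumPy sketch. It computes entropy and Information Gain for a hypothetical 10-sample node (5 "yes", 5 "no") split in two made-up ways: one that separates the classes perfectly and one that leaves both children mixed.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    # sum of p * log2(1/p); a pure node gives exactly 0.0
    return float(np.sum(p * np.log2(1.0 / p)))

def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical parent node: 5 "yes" and 5 "no" labels.
parent = np.array(["yes"] * 5 + ["no"] * 5)

# Split A separates the classes perfectly; split B leaves both children mixed (3:2 and 2:3).
split_a = [parent[:5], parent[5:]]
split_b = [np.array(["yes", "yes", "yes", "no", "no"]),
           np.array(["yes", "yes", "no", "no", "no"])]

print(entropy(parent))                                # 1.0
print(information_gain(parent, split_a))              # 1.0
print(round(information_gain(parent, split_b), 3))    # 0.029
```

Split A removes all uncertainty, so its gain equals the parent entropy; a tree builder would choose it over split B.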

**Question 2. What is the difference between Gini Impurity and Entropy?**
- Both are impurity measures used to decide splits in decision trees.

- Definitions:
  - Gini Impurity:
    Measures how often a randomly chosen sample would be misclassified if it were labeled randomly according to the class distribution in the node: Gini = 1 − Σ p_k², where p_k is the proportion of class k.

  - Entropy:
    Measures the disorder or unpredictability of the class labels in the node: Entropy = −Σ p_k log₂ p_k.

- Now the comparison:

  - Gini Impurity:
    - Faster to compute
    - Default criterion in scikit-learn's `DecisionTreeClassifier`, which implements an optimized version of CART
    - Tends to favor splits that isolate the most frequent class

  - Entropy:
    - Based on information theory (logarithmic)
    - Gives more weight to rare classes
    - Slightly slower to compute than Gini because of the logarithm

- When to use:

  - Use Gini when:
    speed and simplicity matter.

  - Use Entropy when:
    you care more about capturing uncertainty and want more balanced splits.

- Both usually produce very similar trees in practice.
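A quick way to see why the two criteria usually agree is to evaluate both formulas on a few two-class distributions. The sketch below is a minimal NumPy implementation of the definitions given above; the distributions are arbitrary examples.

```python
import numpy as np

def gini(p):
    """Gini impurity: 1 - sum(p_k^2) for a vector of class proportions."""
    p = np.asarray(p, dtype=float)
    return float(1.0 - np.sum(p ** 2))

def entropy(p):
    """Shannon entropy in bits; classes with zero probability contribute nothing."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(np.sum(p * np.log2(1.0 / p)))

# From a pure node to a perfectly mixed 50/50 node.
for dist in ([1.0, 0.0], [0.9, 0.1], [0.7, 0.3], [0.5, 0.5]):
    print(dist, "gini =", round(gini(dist), 3), "entropy =", round(entropy(dist), 3))
```

Both measures are 0 for a pure node and reach their maximum for a 50/50 split (0.5 for Gini, 1.0 for entropy with two classes), and both rise monotonically in between, which is why they tend to rank candidate splits similarly.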

**Question 3. What is Pre-Pruning in Decision Trees?**
- Pre-pruning (also called early stopping) means:

  - stopping the tree from growing too deep while it is being built.

- Instead of allowing the tree to grow fully and pruning later, we limit growth using rules like:

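  - maximum depth (`max_depth`)
  - minimum samples required to split a node (`min_samples_split`)
  - minimum samples in a leaf (`min_samples_leaf`)
  - maximum number of leaf nodes (`max_leaf_nodes`)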

- Purpose:

  - prevent overfitting
  - improve generalization
  - reduce training time

- Example settings in scikit-learn:

  - `DecisionTreeClassifier(max_depth=5, min_samples_split=10, min_samples_leaf=5)`
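The example settings can be tried directly. The sketch below compares a fully grown tree with a pre-pruned one that uses those same parameters; the Breast Cancer dataset and the 70/30 split are assumptions chosen only for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Fully grown tree: no limits on depth or leaf size.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Pre-pruned tree: growth stops early because of the constraints below.
pruned_tree = DecisionTreeClassifier(
    max_depth=5, min_samples_split=10, min_samples_leaf=5, random_state=0
).fit(X_train, y_train)

for name, tree in [("full", full_tree), ("pre-pruned", pruned_tree)]:
    print(f"{name:>10}: depth={tree.get_depth()}, leaves={tree.get_n_leaves()}, "
          f"test accuracy={tree.score(X_test, y_test):.3f}")
```

The pre-pruned tree is guaranteed to be at most 5 levels deep with at least 5 training samples in every leaf; whether that improves test accuracy depends on the data, so it is worth checking rather than assuming.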

**Question 4. Write a Python program to train a Decision Tree Classifier using Gini Impurity as the criterion and print the feature importances (practical).**

```python
# Decision Tree Classifier with Gini Impurity

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
import pandas as pd

# Load dataset
data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# Train model
model = DecisionTreeClassifier(criterion='gini', random_state=0)
model.fit(X, y)

# Print feature importances
print("Feature Importances:")
for name, score in zip(X.columns, model.feature_importances_):
    print(name, ":", score)
```

Output:

```
Feature Importances:
sepal length (cm) : 0.0
sepal width (cm) : 0.013333333333333329
petal length (cm) : 0.06405595813204505
petal width (cm) : 0.9226107085346216
```


**Question 5. What is a Support Vector Machine (SVM)?**
- Support Vector Machine is a classification algorithm that:

  - finds the best boundary (hyperplane) that separates classes with the maximum margin.

- The boundary is determined only by the support vectors, the training points closest to it (on or inside the margin); the remaining samples do not affect it.

- Used for:

  - classification
  - regression
  - outlier detection

- Works well on high-dimensional data.
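To see that only the support vectors matter, the sketch below fits a linear SVM with scikit-learn's `SVC` on synthetic, separable 2-D blobs (dataset, seed, and `C=1000` are illustrative assumptions), then refits using only the support vectors and compares the resulting hyperplanes.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic, linearly separable 2-D data (illustrative choice of dataset and seed).
X, y = make_blobs(n_samples=40, centers=2, random_state=6)

# A linear SVM with a large C behaves close to a hard-margin classifier.
clf = SVC(kernel="linear", C=1000).fit(X, y)
print("training samples:", len(X))
print("support vectors per class:", clf.n_support_)

# For a linear SVM the margin width is 2 / ||w||.
w = clf.coef_[0]
print("margin width:", 2 / np.linalg.norm(w))

# Refit on the support vectors only: the hyperplane should stay (almost) the same,
# because the other points never constrained it.
clf_sv = SVC(kernel="linear", C=1000).fit(X[clf.support_], y[clf.support_])
print("w, b (all points):     ", clf.coef_[0], clf.intercept_[0])
print("w, b (support vectors):", clf_sv.coef_[0], clf_sv.intercept_[0])
```

Small differences between the two fits come from the solver's numerical tolerance, not from the discarded points.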

**Question 6. What is the Kernel Trick in SVM?**
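- The Kernel Trick lets an SVM learn non-linear boundaries by implicitly mapping the data into a higher-dimensional feature space: a kernel function computes inner products in that space directly from the original features, so the mapping itself is never computed.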

- Instead, it uses kernel functions such as:

  - linear
  - polynomial
  - RBF (Gaussian)

- So a problem that is not linearly separable in the original space can become linearly separable in the feature space.

- Example:

  - Linear data → linear kernel
  - Curved boundary → RBF kernel
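The "without computing the transformation" part can be checked numerically. For 2-D inputs, the degree-2 polynomial kernel K(a, b) = (a · b)² equals the ordinary inner product after the explicit mapping φ(x) = (x₁², √2·x₁x₂, x₂²). The sketch below compares both for two arbitrary vectors.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly_kernel(a, b):
    """Homogeneous polynomial kernel of degree 2: K(a, b) = (a . b)^2."""
    return float(np.dot(a, b)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = float(np.dot(phi(x), phi(z)))  # inner product after mapping to 3-D
kernel = poly_kernel(x, z)                # same value, computed in the original 2-D space

print(explicit, kernel)  # both approximately 121.0
```

In scikit-learn, `SVC(kernel="poly", degree=2, gamma=1, coef0=0)` uses this same kernel, since its polynomial kernel is (gamma·⟨x, x'⟩ + coef0)^degree. The RBF kernel corresponds to an infinite-dimensional feature space, where the explicit map cannot be computed at all, which is why the trick matters.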

**Question 7. Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies.**

```python
# Compare SVM Linear vs RBF on Wine dataset

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load dataset
data = load_wine()
X = data.data
y = data.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Linear SVM
linear_svm = SVC(kernel='linear')
linear_svm.fit(X_train, y_train)
linear_acc = accuracy_score(y_test, linear_svm.predict(X_test))

# RBF SVM
rbf_svm = SVC(kernel='rbf')
rbf_svm.fit(X_train, y_train)
rbf_acc = accuracy_score(y_test, rbf_svm.predict(X_test))

print("Linear SVM Accuracy:", linear_acc)
print("RBF SVM Accuracy:", rbf_acc)
```

Output:

```
Linear SVM Accuracy: 0.9814814814814815
RBF SVM Accuracy: 0.7777777777777778
```
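The low RBF score above is most likely a scaling issue rather than a weakness of the kernel: the RBF kernel depends on Euclidean distances, and the Wine features differ by orders of magnitude (proline is in the hundreds, hue is around 1), so a few large-valued features dominate. A minimal sketch that standardizes the features inside a scikit-learn pipeline, reusing the same split, is below; its exact accuracies are not reproduced here.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for kernel in ("linear", "rbf"):
    # The scaler is fit on the training split only, as part of the pipeline.
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    model.fit(X_train, y_train)
    results[kernel] = accuracy_score(y_test, model.predict(X_test))

for kernel, acc in results.items():
    print(f"{kernel} SVM accuracy (standardized features): {acc:.3f}")
```

Putting the scaler inside the pipeline keeps test-set statistics out of training, which also matters once cross-validation is used.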


**Question 8. What is the Naïve Bayes classifier, and why is it called "Naïve"?**
- Naïve Bayes is a probabilistic classifier based on Bayes’ Theorem.

- It assumes:

  - all features are conditionally independent of each other given the class, which is often not true in real data.

- Because of this unrealistic independence assumption, it is called “Naïve.”

- Still, it works surprisingly well for:

  - spam filtering
  - document classification
  - sentiment analysis

- Fast, simple, and effective.
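The "naive" step is easiest to see in a hand computation. The sketch below uses made-up priors and word probabilities for a hypothetical spam filter with two binary features. Under the independence assumption, each class score is the prior times one factor per feature (p if the word is present, 1 − p if absent), and the scores are then normalized into posteriors.

```python
# Hypothetical priors and per-word likelihoods (made-up numbers for illustration).
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.7, "win": 0.5},   # P(word present | spam)
    "ham":  {"free": 0.1, "win": 0.05},  # P(word present | ham)
}

def posterior(present):
    """Naive Bayes posterior for a message, given the set of words it contains."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for word, p in likelihoods[cls].items():
            # Naive assumption: each word is an independent factor given the class.
            score *= p if word in present else 1 - p
        scores[cls] = score
    total = sum(scores.values())
    return {cls: s / total for cls, s in scores.items()}

print(posterior({"free", "win"}))  # spam ≈ 0.979
print(posterior({"free"}))         # spam ≈ 0.711
```

Without the independence assumption, the model would need the joint probability of every word combination per class, which grows exponentially with the vocabulary; the product of per-word factors is what keeps Naïve Bayes fast.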

**Question 9. Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes.**

**1. Gaussian Naïve Bayes**

- Used when features are continuous and assumed to follow a normal (Gaussian) distribution within each class.

- Example:
  - height, age, temperature.

- scikit-learn class: `sklearn.naive_bayes.GaussianNB`

**2. Multinomial Naïve Bayes**

- Used for count-based data.

- Example:
  - word counts in text (Bag-of-Words); fractional counts such as TF-IDF values often work in practice as well.

- scikit-learn class: `sklearn.naive_bayes.MultinomialNB`

**3. Bernoulli Naïve Bayes**

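- Used for binary features (0/1). Unlike Multinomial Naïve Bayes, it explicitly accounts for features that are absent.

- Example:
  - word present = 1, word absent = 0.

- scikit-learn class: `sklearn.naive_bayes.BernoulliNB`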

**Quick Summary:**

  - Gaussian → continuous numeric
  - Multinomial → word counts
  - Bernoulli → binary presence/absence
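To see which input each variant expects, the sketch below fits the three scikit-learn classes on tiny hand-made datasets (all values hypothetical): continuous measurements for `GaussianNB`, word counts for `MultinomialNB`, and the same counts binarized to presence/absence for `BernoulliNB`.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Hypothetical labels shared by all three toy datasets.
y = np.array([0, 0, 0, 1, 1, 1])

# Continuous measurements -> GaussianNB
X_cont = np.array([[1.0, 2.1], [1.2, 1.9], [0.9, 2.0],
                   [3.1, 4.0], [2.9, 4.2], [3.0, 3.9]])

# Word counts per document (columns = three vocabulary words) -> MultinomialNB
X_counts = np.array([[3, 0, 1], [2, 0, 0], [4, 1, 0],
                     [0, 3, 2], [0, 4, 1], [1, 2, 3]])

# Word present (1) / absent (0) -> BernoulliNB
X_binary = (X_counts > 0).astype(int)

gnb = GaussianNB().fit(X_cont, y)
mnb = MultinomialNB().fit(X_counts, y)
bnb = BernoulliNB().fit(X_binary, y)

print("GaussianNB:   ", gnb.predict([[1.0, 2.0], [3.0, 4.0]]))  # [0 1]
print("MultinomialNB:", mnb.predict([[5, 0, 0], [0, 5, 0]]))    # [0 1]
print("BernoulliNB:  ", bnb.predict([[1, 0, 0], [0, 1, 1]]))    # [0 1]
```

The models differ only in the per-feature likelihood they assume (normal density, multinomial word frequencies, or Bernoulli presence probabilities); the Bayes-rule combination with the class prior is the same in all three.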

**Question 10. Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and evaluate accuracy.**

```python
# Gaussian Naive Bayes on Breast Cancer Dataset

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Model
model = GaussianNB()

# Train
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)

print("Accuracy:", accuracy)
```

Output:

```
Accuracy: 0.9239766081871345
```
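A single train/test split gives one accuracy value that depends on which samples land in the test set. As a sketch of a more stable estimate, the same model can be scored with 5-fold cross-validation using scikit-learn's `cross_val_score` (the choice of 5 folds is an assumption); for classifiers it uses stratified folds by default.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)

# One accuracy score per fold; each fold serves once as the held-out set.
scores = cross_val_score(GaussianNB(), X, y, cv=5, scoring="accuracy")

print("Fold accuracies:", scores.round(3))
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

Reporting the mean together with the spread across folds shows how much the single-split figure could vary.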
