# Supervised Classification: Decision Trees, SVM, and Naive Bayes

1. What is Information Gain, and how is it used in Decision Trees?
 - Information Gain (IG) is a metric that measures how much “uncertainty” in the dataset is reduced by splitting on a feature. Decision Trees use Information Gain to choose the best attribute for splitting. It is calculated as the difference between the entropy before a split and the weighted average entropy of the child nodes after the split, where each child is weighted by its share of the samples. The feature with the highest Information Gain is selected at each node because it gives the purest child nodes. IG helps the tree grow in a way that best separates the classes. It is mainly used in the ID3 algorithm; C4.5 uses a normalized variant called gain ratio.
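
The calculation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not how any particular library implements it; the helper names `entropy` and `information_gain` and the toy labels are assumptions made up for the example.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D collection of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(labels, feature_values):
    """Entropy before the split minus the weighted entropy of the children."""
    labels = np.asarray(labels)
    feature_values = np.asarray(feature_values)
    weighted_child_entropy = 0.0
    for value in np.unique(feature_values):
        mask = feature_values == value
        # mask.mean() is the fraction of samples that fall into this child
        weighted_child_entropy += mask.mean() * entropy(labels[mask])
    return entropy(labels) - weighted_child_entropy

# Hypothetical toy data: four samples and two candidate categorical features
y = ["yes", "yes", "no", "no"]
perfect = ["a", "a", "b", "b"]   # separates the classes exactly
useless = ["a", "b", "a", "b"]   # carries no information about the class

print("IG of perfect feature:", information_gain(y, perfect))
print("IG of useless feature:", information_gain(y, useless))
```

The feature that separates the classes exactly recovers the full parent entropy (1 bit here), while the uninformative one gains nothing, so a tree using IG would split on the first.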

2. What is the difference between Gini Impurity and Entropy?
 - Gini Impurity measures how often a randomly chosen sample would be mislabeled if it were labeled at random according to the class distribution in the node. Entropy measures the level of uncertainty or disorder in the data. Gini avoids the logarithm, so it is slightly cheaper to compute, and it is the criterion used in CART decision trees.

    Entropy requires a logarithm, which makes it marginally more expensive. Both measure impurity: each is zero for a pure node and largest when the classes are evenly mixed. In practice the two criteria usually pick similar splits and produce similar trees, so the choice rarely changes accuracy much.
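
To make the comparison concrete, here is a small sketch that evaluates both measures on a few two-class distributions. The function names `gini` and `entropy_bits` and the example distributions are assumptions chosen for illustration.

```python
import numpy as np

def gini(p):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    p = np.asarray(p, dtype=float)
    return float(1.0 - np.sum(p ** 2))

def entropy_bits(p):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# Hypothetical class distributions, from nearly pure to evenly mixed
for dist in ([0.9, 0.1], [0.7, 0.3], [0.5, 0.5]):
    print(dist, "gini =", round(gini(dist), 3), "entropy =", round(entropy_bits(dist), 3))
```

Both values grow as the node becomes more mixed; for two classes Gini tops out at 0.5 and entropy at 1 bit, so the numbers differ in scale but rank these nodes the same way.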

3. What is Pre-Pruning in Decision Trees?
 - Pre-pruning stops the tree from growing too deep by applying constraints during training. Common techniques include limiting the maximum depth, requiring a minimum number of samples to split a node, requiring a minimum number of samples in each leaf, or requiring a minimum impurity decrease per split. It reduces overfitting by refusing splits that rest on too few samples or improve purity too little. This tends to keep the tree generalizable to unseen data, although constraints that are too strict can cause underfitting. It also reduces training time and model complexity.
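
As a rough illustration, the constraints above correspond to constructor parameters of scikit-learn's `DecisionTreeClassifier`. The specific values below (`max_depth=3`, `min_samples_split=10`, `min_samples_leaf=5`) are arbitrary assumptions for the sketch, not recommended settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Unconstrained tree: grows until every leaf is pure
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Pre-pruned tree: growth stops early when a constraint is hit
pruned_tree = DecisionTreeClassifier(
    max_depth=3,           # limit on tree depth
    min_samples_split=10,  # a node needs at least 10 samples to be split
    min_samples_leaf=5,    # every leaf must keep at least 5 samples
    random_state=42,
).fit(X_train, y_train)

print("Full tree:   depth", full_tree.get_depth(), "leaves", full_tree.get_n_leaves())
print("Pruned tree: depth", pruned_tree.get_depth(), "leaves", pruned_tree.get_n_leaves())
print("Pruned tree test accuracy:", pruned_tree.score(X_test, y_test))
```

Comparing depth and leaf count shows how much structure the constraints remove; in practice the values would be tuned with cross-validation.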




In [1]:
'''
4. Write a Python program to train a Decision Tree Classifier using Gini
Impurity as the criterion and print the feature importances (practical).

'''

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
import pandas as pd

# Load dataset
data = load_iris()
X = data.data
y = data.target

# Train Decision Tree with Gini criterion
model = DecisionTreeClassifier(criterion='gini')
model.fit(X, y)

# Print feature importances
feature_importances = pd.Series(model.feature_importances_, index=data.feature_names)
print("Feature Importances:\n", feature_importances)


Feature Importances:
 sepal length (cm)    0.026667
sepal width (cm)     0.000000
petal length (cm)    0.050723
petal width (cm)     0.922611
dtype: float64


5. What is a Support Vector Machine (SVM)?
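 - SVM is a supervised machine learning algorithm used for classification and regression. It finds the boundary (hyperplane) that maximizes the margin between classes; only the training points closest to that boundary, the support vectors, determine it. SVM works well in high-dimensional spaces. With a soft margin, controlled by the regularization parameter C, it can tolerate some noisy or overlapping points, though it is not immune to outliers. It can handle linear as well as non-linear decision boundaries using kernel functions. SVM is popular for text classification, image recognition, and bioinformatics.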

6. What is the Kernel Trick in SVM?
 - The Kernel Trick allows SVM to solve non-linear problems by working implicitly in a higher-dimensional feature space without explicitly computing the transformation. A kernel function returns the inner product of two points in that space directly from their original coordinates, so the mapped vectors never have to be built. Common kernels are Linear, Polynomial, and RBF. It helps SVM draw complex boundaries between classes. Without the kernel trick, SVM would be limited to linear boundaries unless the features were expanded by hand.
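
A small numeric check can make this less abstract. The sketch below assumes a homogeneous degree-2 polynomial kernel, k(x, z) = (x · z)^2, on 2-D inputs; for this kernel one explicit feature map is phi(x) = (x1^2, sqrt(2)·x1·x2, x2^2). The function names and sample points are made up for the example.

```python
import numpy as np

def phi(v):
    """Explicit feature map for the degree-2 homogeneous polynomial kernel in 2-D."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly_kernel(x, z):
    """Kernel value computed directly in the original 2-D space."""
    return np.dot(x, z) ** 2

# Hypothetical sample points
x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(z))  # map to 3-D first, then take the inner product
implicit = poly_kernel(x, z)       # same quantity without ever building phi

print("Explicit mapping:", explicit)
print("Kernel shortcut: ", implicit)
```

Both routes give the same number up to floating-point rounding. For kernels such as RBF the implied feature space is infinite-dimensional, so only the shortcut is practical.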


In [2]:
'''
7. Write a Python program to train two SVM classifiers with Linear and RBF
kernels on the Wine dataset, then compare their accuracies.

'''
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load dataset
data = load_wine()
X = data.data
y = data.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Linear Kernel SVM
svm_linear = SVC(kernel='linear')
svm_linear.fit(X_train, y_train)
acc_linear = accuracy_score(y_test, svm_linear.predict(X_test))

# RBF Kernel SVM
svm_rbf = SVC(kernel='rbf')
svm_rbf.fit(X_train, y_train)
acc_rbf = accuracy_score(y_test, svm_rbf.predict(X_test))

print("Linear Kernel Accuracy:", acc_linear)
print("RBF Kernel Accuracy:", acc_rbf)


Linear Kernel Accuracy: 1.0
RBF Kernel Accuracy: 0.8055555555555556
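
The gap between the two kernels above is plausibly an effect of feature scale rather than of the kernel itself: the Wine features span very different ranges, and the RBF kernel depends on Euclidean distances, which the largest-valued features dominate. A hedged sketch of the usual remedy, standardizing the features inside a pipeline, follows; the variable names `scaled_rbf` and `acc_scaled_rbf` are assumptions for this example, and the split matches the cell above.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The scaler is fit on the training data only, then applied to the test data
scaled_rbf = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
scaled_rbf.fit(X_train, y_train)
acc_scaled_rbf = scaled_rbf.score(X_test, y_test)

print("Scaled RBF Kernel Accuracy:", acc_scaled_rbf)
```

Putting the scaler in the pipeline keeps test-set statistics out of training, which would otherwise leak information into the model.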


8. What is the Naive Bayes classifier, and why is it called "Naive"?
 - Naive Bayes is a probabilistic classifier based on Bayes’ Theorem. It assumes that all features are conditionally independent of each other given the class label. This assumption rarely holds in real datasets, which is why the model is called “naive.” Despite this simplification, Naive Bayes often performs well, especially for text classification. It is fast, scalable, and works well with high-dimensional data.

9. Explain the differences between Gaussian Naive Bayes, Multinomial Naive Bayes, and Bernoulli Naive Bayes.
 - Gaussian NB: Used when features are continuous and are assumed to follow a normal distribution within each class (e.g., age, height).
 - Multinomial NB: Used for count data such as word frequencies in NLP (bag-of-words model).
 - Bernoulli NB: Used when features are binary (0/1), such as the presence or absence of a word.

    Each variant uses Bayes’ theorem but applies a different likelihood function depending on the data type.
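
The three variants map onto three scikit-learn classes. The sketch below fits each one on a tiny, made-up dataset of the matching type; the arrays are hypothetical and far too small to say anything about real performance.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Hypothetical toy data: four samples, two classes
y = np.array([0, 0, 1, 1])
X_cont = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.0], [4.1, 4.2]])  # continuous
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])    # word counts
X_bin = (X_counts > 0).astype(int)                                    # word present or absent

gnb = GaussianNB().fit(X_cont, y)       # per-class normal likelihood for each feature
mnb = MultinomialNB().fit(X_counts, y)  # multinomial likelihood over counts
bnb = BernoulliNB().fit(X_bin, y)       # Bernoulli likelihood over 0/1 features

print("Gaussian NB:   ", gnb.predict([[1.1, 2.0]]))
print("Multinomial NB:", mnb.predict([[2, 0, 1]]))
print("Bernoulli NB:  ", bnb.predict([[1, 0, 0]]))
```

Each query point resembles the class-0 training rows, so all three models should assign it to class 0; the point is that the likelihood model, not the Bayes rule, is what changes between variants.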


In [3]:
'''
10. Breast Cancer Dataset
Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer
dataset and evaluate accuracy.

'''

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Gaussian NB model
model = GaussianNB()
model.fit(X_train, y_train)

# Predictions & accuracy
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print("Gaussian Naive Bayes Accuracy:", accuracy)


Gaussian Naive Bayes Accuracy: 0.9736842105263158
