# Theory Assignment

## Question 1: What is Information Gain, and how is it used in Decision Trees?
  -> Information Gain is a metric derived from information theory used to quantify how much “information” (i.e., reduction in uncertainty) is obtained by splitting a dataset by a particular attribute (feature).

  How is Information Gain used in Decision Trees?

In the context of decision tree learning (e.g., algorithms like ID3, C4.5), Information Gain is used as a splitting criterion to choose which attribute to split on at each node. Here is how:

1. At a given node in the tree, you have a subset of the training examples.
2. For each candidate attribute (one that hasn’t already been used on the path to this node), compute the Information Gain (IG) of splitting on that attribute.
3. Choose the attribute with the highest Information Gain, i.e., the one that gives the largest reduction in entropy (the purest, most homogeneous child subsets) for that node.
4. Create child nodes corresponding to the values of that attribute, and repeat recursively on each child subset until a stopping criterion is met (e.g., all examples in the subset belong to one class, or no attributes remain).

Because each split ideally reduces uncertainty (increases class homogeneity), the tree becomes more confident in its predictions as you go down.
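The selection step can be sketched in a few lines of plain Python. This is a minimal illustration, not the ID3 or C4.5 implementation: it assumes categorical attributes, uses the standard definitions $H(S) = -\sum_k p_k \log_2 p_k$ and $IG(S, A) = H(S) - \sum_v \frac{|S_v|}{|S|} H(S_v)$, and the toy rows, labels, and function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, labels, attribute):
    """Parent entropy minus the size-weighted entropy of the child subsets
    produced by splitting on `attribute` (a key into each row dict)."""
    total = len(labels)
    # Group the class labels by the value the attribute takes in each row.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    weighted_child_entropy = sum(
        (len(group) / total) * entropy(group) for group in groups.values()
    )
    return entropy(labels) - weighted_child_entropy

# Hypothetical toy data, made up for illustration only.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["yes", "yes", "no", "no"]

# Compute IG for every candidate attribute and pick the best one.
gains = {attr: information_gain(rows, labels, attr) for attr in ("outlook", "windy")}
best_attribute = max(gains, key=gains.get)
print(gains)
print("Split on:", best_attribute)
```

In this toy set, `outlook` separates the classes perfectly while `windy` leaves each child as mixed as the parent, so the split falls on `outlook`.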

## Question 2: What is the difference between Gini Impurity and Entropy?
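  -> Both measure how mixed the classes are at a node. For class proportions $p_k$, Gini impurity is $1 - \sum_k p_k^2$ and entropy is $-\sum_k p_k \log_2 p_k$; both are 0 for a pure node and largest when the classes are evenly mixed.

Gini is slightly cheaper to compute (no logarithms), and some implementations use it by default (scikit-learn's `DecisionTreeClassifier`, for example).

Entropy is more sensitive to changes in class probabilities (especially small ones), but in practice the difference in tree performance is often minimal.

The choice between them rarely makes a big practical difference; tuning other parameters (pruning, depth) often matters more.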
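To make the comparison concrete, here is a small sketch that evaluates both measures on a few two-class distributions. The helper names `gini` and `entropy` are illustrative and not taken from any library.

```python
import math

def gini(probabilities):
    """Gini impurity: 1 - sum(p_k^2)."""
    return 1.0 - sum(p * p for p in probabilities)

def entropy(probabilities):
    """Shannon entropy in bits: -sum(p_k * log2(p_k)), skipping p_k = 0."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Move from an evenly mixed node towards an almost pure one.
for p in (0.5, 0.9, 0.99):
    dist = (p, 1 - p)
    print(f"p={p:<4}  gini={gini(dist):.4f}  entropy={entropy(dist):.4f}")
```

Both measures shrink as the node becomes purer; they differ mainly in scale and in how quickly they fall off near the extremes.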

## Question 3: What is Pre-Pruning in Decision Trees?
  -> Pre-pruning means stopping the growth of the decision tree before it becomes fully grown (i.e., before every possible split is made) by using certain stopping criteria. In other words, while building the tree you check whether further splitting is justified; if not, you stop and make the current node a leaf.
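As a rough illustration, scikit-learn exposes several stopping criteria as constructor parameters of `DecisionTreeClassifier`. The sketch below compares a fully grown tree with a pre-pruned one on the Iris data; the specific limits (depth 3, 10 samples to split, 5 samples per leaf) are arbitrary values chosen for the example, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Fully grown tree: splitting continues until the leaves are pure.
full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Pre-pruned tree: growth stops as soon as one of the limits is hit.
pruned_tree = DecisionTreeClassifier(
    max_depth=3,           # no node deeper than 3 levels
    min_samples_split=10,  # do not split nodes with fewer than 10 samples
    min_samples_leaf=5,    # every leaf must keep at least 5 samples
    random_state=0,
).fit(X, y)

print("Full tree : depth", full_tree.get_depth(), "leaves", full_tree.get_n_leaves())
print("Pre-pruned: depth", pruned_tree.get_depth(), "leaves", pruned_tree.get_n_leaves())
```

The pre-pruned tree is smaller by construction; whether it generalises better depends on the data and should be checked on held-out examples.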

## Question 4: Write a Python program to train a Decision Tree Classifier using Gini Impurity as the criterion and print the feature importances (practical).


```python
# 1. Imports
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 2. Load sample data (you can replace this with your own)
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target

# 3. Split into train / test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# 4. Build classifier with Gini Impurity criterion
clf = DecisionTreeClassifier(criterion='gini', random_state=42)
clf.fit(X_train, y_train)

# 5. Get feature importances
importances = clf.feature_importances_

# 6. Print feature importances alongside feature names
feat_imp = pd.Series(importances, index=X.columns)
feat_imp = feat_imp.sort_values(ascending=False)
print("Feature importances (descending):")
print(feat_imp)

# 7. (Optional) Evaluate accuracy
print("\nTest set accuracy: {:.3f}".format(clf.score(X_test, y_test)))
```

```
Feature importances (descending):
petal length (cm)    0.893264
petal width (cm)     0.087626
sepal width (cm)     0.019110
sepal length (cm)    0.000000
dtype: float64

Test set accuracy: 1.000
```


## Question 5: What is a Support Vector Machine (SVM)?
  -> A Support Vector Machine (SVM) is a supervised machine-learning algorithm that finds the best boundary (hyperplane) between classes by maximising the margin between the closest data points of different classes.


Key points:

- It works for classification (and can be extended to regression).
- If the data isn’t linearly separable, it uses a kernel function to implicitly map the data into a higher-dimensional space where separation may be possible.
- The “support vectors” are the training points closest to the decision boundary; they determine the position of the boundary.
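The role of the support vectors can be seen in a short sketch using scikit-learn's `SVC` with a linear kernel. The six 2-D points below are hypothetical and chosen only so that the two classes are clearly separable.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two well-separated clusters in 2-D.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
              [5.0, 5.0], [5.5, 6.0], [6.0, 5.5]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Only the points nearest the boundary are kept as support vectors.
print("Number of training points:", len(X))
print("Support vectors:\n", clf.support_vectors_)

# The hyperplane is w . x + b = 0, and the margin width is 2 / ||w||.
w, b = clf.coef_[0], clf.intercept_[0]
margin_width = 2 / np.linalg.norm(w)
print("Margin width:", margin_width)
```

Removing a point that is not a support vector and refitting would leave the boundary unchanged, which is why only those few points matter.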


## Question 6: What is the Kernel Trick in SVM?
 -> The kernel trick is a technique used in SVMs to enable them to handle non-linearly separable data by mapping it into a higher-dimensional feature space — but without ever explicitly computing that mapping.


In simpler terms: you replace the usual dot-product operations in the SVM algorithm with a kernel function $K(x_i, x_j)$ that computes the dot product of the features in the transformed (higher-dimensional) space:

$$K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$$

where $\phi(\cdot)$ is the (possibly very high-dimensional) mapping function.
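A quick numerical check makes the identity tangible. The sketch below assumes one particular kernel, the homogeneous degree-2 polynomial kernel $K(x, z) = (x \cdot z)^2$ on 2-D inputs, whose explicit feature map is $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$; the function names and sample vectors are illustrative.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-D input (illustrative choice)."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2) * x1 * x2, x2 * x2])

def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: K(x, z) = (x . z)^2."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(z))  # dot product computed in the 3-D feature space
implicit = poly_kernel(x, z)       # same quantity computed in the 2-D input space

print(explicit, implicit)
```

Both routes give the same number, but the kernel never builds $\phi(x)$. For kernels such as RBF, where the feature space is infinite-dimensional, the implicit route is the only practical one.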

## Question 7: Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies.



```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# 1. Load data
data = load_wine()
X = data.data
y = data.target

# 2. Split into train/test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# 3. Feature scale (important for SVMs)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled  = scaler.transform(X_test)

# 4. Train SVM with linear kernel
svm_linear = SVC(kernel='linear', random_state=42)
svm_linear.fit(X_train_scaled, y_train)
y_pred_lin = svm_linear.predict(X_test_scaled)
acc_lin = accuracy_score(y_test, y_pred_lin)

# 5. Train SVM with RBF kernel
svm_rbf = SVC(kernel='rbf', random_state=42)
svm_rbf.fit(X_train_scaled, y_train)
y_pred_rbf = svm_rbf.predict(X_test_scaled)
acc_rbf = accuracy_score(y_test, y_pred_rbf)

# 6. Print results
print(f"Accuracy with Linear kernel   : {acc_lin:.4f}")
print(f"Accuracy with RBF kernel      : {acc_rbf:.4f}")
```

```
Accuracy with Linear kernel   : 0.9630
Accuracy with RBF kernel      : 0.9815
```


## Question 8: What is the Naïve Bayes classifier, and why is it called "Naïve"?
 -> The Naïve Bayes classifier is a probabilistic supervised learning algorithm used for classification. It is based on Bayes’ theorem, which relates the probability of a class given the features to the prior probability of the class and the likelihood of the features given the class. For each class $C_k$ and a feature vector $x = (x_1, x_2, \ldots, x_n)$, it computes:

$$P(C_k \mid x) \propto P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k)$$

and assigns the class with the highest posterior probability.

Why is it called “Naïve”?
It is called “naïve” because of a strong simplifying assumption: it assumes that all features (the $x_i$) are conditionally independent of each other given the class label. In reality, features are often correlated, but the classifier “naïvely” ignores those inter-dependencies.
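The product form of the posterior can be worked through by hand on a tiny example. The sketch below is an illustration under stated assumptions: the five training rows are made up, the features are binary, and Laplace smoothing (`alpha=1.0`) is added so that an unseen feature value does not zero out the whole product.

```python
from collections import Counter

# Hypothetical toy training set: (binary features, class label).
train = [
    ({"contains_offer": 1, "contains_hello": 0}, "spam"),
    ({"contains_offer": 1, "contains_hello": 1}, "spam"),
    ({"contains_offer": 0, "contains_hello": 1}, "ham"),
    ({"contains_offer": 0, "contains_hello": 1}, "ham"),
    ({"contains_offer": 1, "contains_hello": 1}, "ham"),
]

# Count classes and (class, feature, value) co-occurrences.
class_counts = Counter(label for _, label in train)
value_counts = Counter(
    (label, feature, value)
    for features, label in train
    for feature, value in features.items()
)

def posterior(x, alpha=1.0, n_values=2):
    """Return P(C_k | x) for every class, using the naive product rule."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(C_k)
        for feature, value in x.items():
            # Smoothed likelihood P(x_i | C_k); each feature is multiplied in
            # as if it were independent of the others given the class.
            score *= (value_counts[(label, feature, value)] + alpha) / (count + alpha * n_values)
        scores[label] = score
    norm = sum(scores.values())
    return {label: score / norm for label, score in scores.items()}

post = posterior({"contains_offer": 1, "contains_hello": 0})
prediction = max(post, key=post.get)
print(post, "->", prediction)
```

Each feature contributes its own factor regardless of the values of the other features, which is exactly the independence assumption described above.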


## Question 9: Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes
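 -> The three variants differ in the distribution they assume for each feature given the class, $P(x_i \mid C_k)$:

Use Gaussian NB when your features are real-valued and roughly normally distributed within each class; it fits a per-class mean and variance for every feature.

Use Multinomial NB when your features are counts (how many times something occurred), such as word counts in text.

Use Bernoulli NB when your features are binary indicators (whether something happened or not); unlike Multinomial NB, it also accounts for a feature being absent.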
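The matching between feature type and variant can be sketched with scikit-learn's three classes. The tiny arrays below are hypothetical and exist only to show which kind of input each estimator expects.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 0, 1, 1])

# Real-valued measurements -> GaussianNB
X_real = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.2], [4.1, 3.9]])
# Non-negative counts (e.g. word counts) -> MultinomialNB
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
# Binary presence/absence indicators -> BernoulliNB
X_binary = (X_counts > 0).astype(int)

gnb = GaussianNB().fit(X_real, y)
mnb = MultinomialNB().fit(X_counts, y)
bnb = BernoulliNB().fit(X_binary, y)

pred_real = gnb.predict([[1.1, 2.0]])
pred_counts = mnb.predict([[2, 0, 1]])
pred_binary = bnb.predict([[0, 1, 1]])
print(pred_real, pred_counts, pred_binary)
```

Feeding counts to `GaussianNB` or real-valued data to `MultinomialNB` may still run (or raise an error for negative values), but the assumed likelihood would no longer match the data.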

## Question 10: Breast Cancer Dataset
Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and evaluate accuracy.
Hint: Use GaussianNB() from sklearn.naive_bayes and the Breast Cancer dataset from sklearn.datasets.


```python
# 1. Imports
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# 2. Load the data
data = load_breast_cancer()
X   = data.data
y   = data.target

# 3. Split into train/test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.3,     # 30% test size
    random_state=42,   # for reproducibility
    stratify=y         # maintain class balance in split
)

# 4. Initialize the Gaussian Naive Bayes classifier
clf = GaussianNB()

# 5. Train (fit) the classifier
clf.fit(X_train, y_train)

# 6. Predict on the test set
y_pred = clf.predict(X_test)

# 7. Compute accuracy
acc = accuracy_score(y_test, y_pred)
print(f"Accuracy of GaussianNB on Breast Cancer dataset: {acc:.4f}")
```
