# Decision Trees, SVM, and Naive Bayes

Q 1. What is Information Gain, and how is it used in Decision Trees?

Answer: Information Gain (IG) is a metric used in Decision Trees to measure how much uncertainty (entropy) is reduced after splitting a dataset on a particular feature.

It is based on the concept of Entropy, which measures impurity in a dataset.

$$\text{Entropy}(S) = -\sum_{i} p_i \log_2(p_i)$$

Where:

-  $p_i$ = probability (proportion) of class i in S

Information Gain is calculated as:

$$IG(S, A) = \text{Entropy}(S) - \sum_{v} \frac{|S_v|}{|S|}\,\text{Entropy}(S_v)$$

Where:

-  S = dataset
-  A = feature

-  $S_v$ = subset of S in which feature A takes the value v

How it is used:

-  For every feature, Information Gain is calculated.

-  The feature with the highest Information Gain is selected for splitting.

-  This process repeats recursively.
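
The two formulas above can be checked with a short sketch. The helper names `entropy` and `information_gain` and the toy labels are illustrative assumptions, not part of any library; the code only covers categorical features.

```python
import numpy as np

def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i) over the classes present in `labels`."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(labels, feature):
    """IG(S, A): entropy of S minus the size-weighted entropy of each subset S_v."""
    labels = np.asarray(labels)
    feature = np.asarray(feature)
    weighted = 0.0
    for v in np.unique(feature):
        mask = feature == v                      # rows where A takes the value v
        weighted += mask.mean() * entropy(labels[mask])
    return entropy(labels) - weighted

# Hypothetical toy data: four samples, two classes.
y = [1, 1, 0, 0]
perfect_feature = ["a", "a", "b", "b"]   # separates the classes completely
useless_feature = ["a", "b", "a", "b"]   # each subset is still a 50/50 mix

print(information_gain(y, perfect_feature))  # gain equals the full entropy of y (1 bit)
print(information_gain(y, useless_feature))  # no reduction in entropy
```

A tree builder would compute this value for every candidate feature and split on the one with the largest gain.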

Importance:

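-  Guides the greedy construction of a compact tree (the result is not guaranteed to be globally optimal).

-  Favors splits that produce purer child nodes.

-  Used by ID3; C4.5 uses the closely related gain ratio, which normalizes Information Gain.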

Q 2. Difference Between Gini Impurity and Entropy.

Answer:


| Basis          | Gini Impurity                                           | Entropy                           |
| -------------- | ------------------------------------------------------- | --------------------------------- |
| Formula        | $1 - \sum_i p_i^2$                                      | $-\sum_i p_i \log_2 p_i$          |
| Range          | 0 to 0.5 (binary)                                       | 0 to 1 (binary)                   |
| Speed          | Faster to compute                                       | Slightly slower (log calculation) |
| Interpretation | Probability of misclassifying a randomly labeled sample | Measure of information disorder   |
| Used in        | CART Algorithm                                          | ID3, C4.5                         |

Key Differences:

-  Gini is slightly cheaper to compute because it avoids the logarithm.

-  Entropy has a stronger theoretical foundation (Information Theory).

-  In practice, both usually lead to very similar trees.
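
To make the ranges in the table concrete, here is a minimal sketch that evaluates both measures for a two-class node. The function names `gini` and `entropy` are illustrative helpers written for this example.

```python
import numpy as np

def gini(p):
    """Gini impurity: 1 - sum_i p_i^2."""
    p = np.asarray(p, dtype=float)
    return float(1.0 - np.sum(p ** 2))

def entropy(p):
    """Entropy in bits: -sum_i p_i * log2(p_i), treating 0 * log2(0) as 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# Sweep the share of the positive class in a binary node.
for p1 in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    probs = [p1, 1.0 - p1]
    print(f"p1={p1:.1f}  gini={gini(probs):.3f}  entropy={entropy(probs):.3f}")
```

Both measures are zero for a pure node and peak at a 50/50 mix, where Gini reaches 0.5 and entropy reaches 1.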

Q 3: What is Pre-Pruning in Decision Trees?

Answer:

Pre-Pruning (Early Stopping) is a technique used to stop the growth of a Decision Tree before it becomes too complex.

Purpose:

To prevent overfitting.

**Common Pre-Pruning Techniques:**

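-  Setting `max_depth` (maximum depth of the tree)

-  Setting `min_samples_split` (minimum samples required to split a node)

-  Setting `min_samples_leaf` (minimum samples required in a leaf)

-  Setting a minimum information gain threshold (in scikit-learn, `min_impurity_decrease`)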
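
As a rough illustration of these settings, the sketch below trains an unrestricted tree and a pre-pruned tree on Iris with scikit-learn. The specific threshold values are arbitrary assumptions chosen for the example, not recommended defaults.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Baseline: the tree grows until every leaf is pure.
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Pre-pruned: growth stops as soon as any limit is hit (values are illustrative).
pruned_tree = DecisionTreeClassifier(
    max_depth=3,
    min_samples_split=10,
    min_samples_leaf=5,
    min_impurity_decrease=0.01,
    random_state=42,
).fit(X_train, y_train)

for name, tree in [("full", full_tree), ("pre-pruned", pruned_tree)]:
    print(
        f"{name}: depth={tree.get_depth()}, leaves={tree.get_n_leaves()}, "
        f"test accuracy={tree.score(X_test, y_test):.3f}"
    )
```

Comparing depth, leaf count, and test accuracy side by side shows how much complexity the limits remove and whether generalization suffers.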

Advantages:

-  Reduces model complexity

-  Faster training

-  Better generalization

Disadvantages:

-  May stop too early (underfitting)

Q 4. Write a Python program to train a Decision Tree Classifier using Gini
Impurity as the criterion and print the feature importances.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load dataset
data = load_iris()
X = data.data
y = data.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train Decision Tree with Gini
model = DecisionTreeClassifier(criterion='gini', random_state=42)
model.fit(X_train, y_train)

# Print feature importances
print("Feature Importances:")
for name, importance in zip(data.feature_names, model.feature_importances_):
    print(f"{name}: {importance:.4f}")
```


```text
Feature Importances:
sepal length (cm): 0.0000
sepal width (cm): 0.0191
petal length (cm): 0.8933
petal width (cm): 0.0876
```


Q 5. What is a Support Vector Machine (SVM)?

Answer:

Support Vector Machine (SVM) is a supervised learning algorithm used for classification and regression.

Core Idea:

Find the optimal hyperplane that separates classes with the maximum margin.

Key Concepts:

-  Hyperplane

-  Margin

-  Support Vectors (data points closest to the hyperplane)

Advantages:

-  Works well with high-dimensional data

-  Effective when the classes are separated by a clear margin, including non-linear boundaries when a kernel is used
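
The three concepts can be inspected on a fitted model. The sketch below uses a hypothetical six-point dataset and a large `C` to approximate a hard margin; for a linear SVM the margin width is 2 / ||w||.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data (two clusters in 2-D).
X = np.array([[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# A large C penalizes margin violations heavily, approximating a hard margin.
clf = SVC(kernel="linear", C=1000.0).fit(X, y)

w = clf.coef_[0]          # normal vector of the separating hyperplane
b = clf.intercept_[0]     # offset of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)

print("Hyperplane: w =", w, ", b =", b)
print("Support vectors:\n", clf.support_vectors_)
print("Margin width:", margin_width)
```

Only the points listed in `support_vectors_` determine the hyperplane; moving any other point slightly would leave the solution unchanged.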

Q 6. What is the Kernel Trick in SVM?

Answer:

The Kernel Trick allows SVM to perform classification in higher-dimensional space without explicitly computing coordinates in that space.

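Instead of transforming the data explicitly, a kernel function computes the inner product (similarity) of two points in the higher-dimensional space directly from their original coordinates.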

Common Kernels:

-  Linear

-  Polynomial

-  Radial Basis Function (RBF)

-  Sigmoid

Benefit:

-  Allows SVM to solve non-linear problems efficiently.
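
A small numeric check can make the idea concrete. This sketch assumes the degree-2 homogeneous polynomial kernel K(x, y) = (x · y)² on 2-D inputs, whose explicit feature map is φ(x) = (x₁², √2·x₁x₂, x₂²). The function names are illustrative.

```python
import numpy as np

def phi(x):
    """Explicit 3-D feature map for the degree-2 homogeneous polynomial kernel."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly_kernel(x, y):
    """Kernel value computed entirely in the original 2-D space: (x . y)^2."""
    return float(np.dot(x, y) ** 2)

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

explicit = float(np.dot(phi(x), phi(y)))  # map first, then take the inner product
implicit = poly_kernel(x, y)              # never builds the 3-D vectors

print(explicit, implicit)  # the two values agree up to floating-point rounding
```

The kernel reaches the same number without ever constructing φ(x). That saving matters for kernels such as RBF, whose implicit feature space is infinite-dimensional.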

Q 7. Write a Python program to train two SVM classifiers with Linear and RBF
kernels on the Wine dataset, then compare their accuracies.



```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load dataset
data = load_wine()
X = data.data
y = data.target

# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Linear SVM
linear_svm = SVC(kernel='linear')
linear_svm.fit(X_train, y_train)
linear_pred = linear_svm.predict(X_test)
linear_acc = accuracy_score(y_test, linear_pred)

# RBF SVM
rbf_svm = SVC(kernel='rbf')
rbf_svm.fit(X_train, y_train)
rbf_pred = rbf_svm.predict(X_test)
rbf_acc = accuracy_score(y_test, rbf_pred)

print("Linear Kernel Accuracy:", linear_acc)
print("RBF Kernel Accuracy:", rbf_acc)
```


```text
Linear Kernel Accuracy: 0.9814814814814815
RBF Kernel Accuracy: 0.7592592592592593
```
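
The lower RBF score is probably an effect of feature scale rather than of the kernel itself. The Wine features span very different ranges, and the RBF kernel depends on Euclidean distance, so large-valued features dominate. A possible follow-up, sketched here as an optional extension to the exercise, is to standardize the features inside a pipeline before fitting.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# The scaler is fitted on the training split only, then applied to the test split.
scaled_rbf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scaled_rbf.fit(X_train, y_train)

scaled_rbf_acc = scaled_rbf.score(X_test, y_test)
print("Scaled RBF Kernel Accuracy:", scaled_rbf_acc)
```

Putting the scaler in the pipeline keeps test-set statistics out of training, so the comparison with the unscaled models stays fair.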


Q 8. What is the Naïve Bayes classifier, and why is it called "Naïve"?

Answer: Naïve Bayes is a probabilistic classification algorithm based on Bayes' Theorem.

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$


It assumes that all features are conditionally independent of each other given the class.

**Why "Naïve"?**

-  Because it makes a strong independence assumption, which is rarely true in real-world data.
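
Written out for a class $y$ and features $x_1, \dots, x_n$ (symbols introduced here for illustration), the independence assumption lets the likelihood factor into per-feature terms. A sketch of the resulting decision rule:

$$P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y), \qquad \hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)$$

The denominator $P(x_1, \dots, x_n)$ from Bayes' Theorem is the same for every class, so it can be dropped when comparing classes.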

**Advantages:**

-  Fast

-  Works well with large datasets

-  Good for text classification

Q 9. Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve
Bayes, and Bernoulli Naïve Bayes.

Answer:

| Type           | Typical Use                            | Feature Assumption                          |
| -------------- | -------------------------------------- | ------------------------------------------- |
| Gaussian NB    | Continuous measurements                | Normally distributed within each class      |
| Multinomial NB | Text classification with word counts   | Discrete, non-negative counts               |
| Bernoulli NB   | Binary presence/absence features       | 0/1 features                                |

**Gaussian NB**

-  Assumes each feature follows a normal distribution within each class.

**Multinomial NB**

-  Used for word counts or frequencies (for example, bag-of-words features).

**Bernoulli NB**

-  Used for binary presence/absence features.
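
The sketch below fits each variant on the kind of data it is meant for. The four-document corpus, the labels, and the numeric measurements are made-up toy data for illustration only.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Hypothetical toy corpus: 1 = spam, 0 = not spam.
docs = [
    "free prize money now",
    "win money free",
    "meeting schedule today",
    "project meeting notes",
]
labels = np.array([1, 1, 0, 0])

# Word counts feed MultinomialNB directly.
counts = CountVectorizer().fit_transform(docs)
mnb = MultinomialNB().fit(counts, labels)

# BernoulliNB binarizes its input (binarize=0.0 by default), so any count > 0
# becomes "word present".
bnb = BernoulliNB().fit(counts, labels)

# GaussianNB expects continuous features; these measurements are invented.
measurements = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.2], [4.1, 3.9]])
gnb = GaussianNB().fit(measurements, labels)

print("MultinomialNB:", mnb.predict(counts))
print("BernoulliNB:  ", bnb.predict(counts))
print("GaussianNB:   ", gnb.predict(measurements))
```

The same labels are used throughout, so the only thing that changes between the three models is the assumed distribution of the features.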

Q 10. Breast Cancer Dataset:
Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer
dataset and evaluate accuracy.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train model
model = GaussianNB()
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)

print("Accuracy:", accuracy)
```


```text
Accuracy: 0.9415204678362573
```
