Question 1: What is Information Gain, and how is it used in Decision Trees?
- Information Gain measures how much uncertainty (impurity) is reduced after splitting a dataset on a feature.
In Decision Trees, the feature with the highest Information Gain is selected for splitting at each node.

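- Formula (based on Entropy):
  
    Information Gain = Entropy(parent) − ∑ (nᵢ / n) · Entropy(childᵢ)

  where n is the number of samples in the parent node and nᵢ the number in child i. Each child's entropy is weighted by its share of the samples.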

- Use in Decision Trees:

  - Higher Information Gain → better feature to split

  - Used mainly in the ID3 algorithm. C4.5 uses Gain Ratio, which is Information Gain normalized by the split's intrinsic information.
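Here is a minimal NumPy sketch of Information Gain, with each child's entropy weighted by its share of the parent's samples. The helper names `entropy` and `information_gain` and the toy labels are illustrative assumptions. They are not part of any library.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (base 2) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    # sum of p * log2(1/p); only classes that actually occur are counted
    return float(np.sum(p * np.log2(1.0 / p)))


def information_gain(parent, children):
    """Entropy(parent) minus the sample-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical node: 5 positive (1) and 5 negative (0) samples
parent = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

# Split A leaves both children mixed; split B separates the classes perfectly
split_a = [np.array([1, 1, 1, 1, 0]), np.array([1, 0, 0, 0, 0])]
split_b = [np.array([1, 1, 1, 1, 1]), np.array([0, 0, 0, 0, 0])]

gain_a = information_gain(parent, split_a)
gain_b = information_gain(parent, split_b)
print(f"Split A gain: {gain_a:.3f}")  # 0.278
print(f"Split B gain: {gain_b:.3f}")  # 1.000
```

A tree using this criterion would choose split B, because it has the higher gain.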

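Question 2: What is the difference between Gini Impurity and Entropy?
| Aspect            | Gini Impurity                                                                          | Entropy                                                    |
| ----------------- | -------------------------------------------------------------------------------------- | ---------------------------------------------------------- |
| Formula           | 1 − ∑ pᵢ²                                                                              | −∑ pᵢ log₂ pᵢ                                              |
| Range (2 classes) | 0 to 0.5                                                                               | 0 to 1                                                     |
| Speed             | Slightly faster (no logarithm)                                                         | Slightly slower                                            |
| Used in           | CART (default criterion in scikit-learn)                                               | ID3, C4.5                                                  |
| Interpretation    | Probability of misclassifying a random sample labeled by the node's class distribution | Expected information (in bits) to identify a sample's class |
| Results           | Usually very similar trees and accuracy                                                | Usually very similar trees and accuracy                    |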
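Here is a quick NumPy sketch comparing the two measures on a few two-class distributions. The function names are illustrative. Both measures are 0 for a pure node and peak at a 50/50 split: 0.5 for Gini and 1 bit for entropy.

```python
import numpy as np


def gini(p):
    """Gini impurity: 1 - sum(p_i^2) for a vector of class probabilities."""
    p = np.asarray(p, dtype=float)
    return float(1.0 - np.sum(p ** 2))


def entropy(p):
    """Entropy in bits: -sum(p_i * log2 p_i), treating 0 * log2(0) as 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(np.sum(p * np.log2(1.0 / p)))


# Class distributions from perfectly mixed to pure (two classes)
for dist in ([0.5, 0.5], [0.9, 0.1], [1.0, 0.0]):
    print(dist, "gini =", round(gini(dist), 3), "entropy =", round(entropy(dist), 3))
```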



Question 3: What is Pre-Pruning in Decision Trees?
- Pre-pruning is a technique where tree growth is stopped early to prevent overfitting.

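- Common pre-pruning parameters (names as in scikit-learn's DecisionTreeClassifier):

    max_depth → maximum depth of the tree

    min_samples_split → minimum number of samples a node needs before it can be split

    min_samples_leaf → minimum number of samples required in each leaf

    max_features → number of features considered when searching for each split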

- Benefit:

    Reduces overfitting and can improve generalization. Limits that are too strict can cause underfitting.
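Here is a sketch of pre-pruning with scikit-learn. It assumes the Breast Cancer dataset (used later in Question 10) and arbitrary, illustrative limits. Whether test accuracy improves depends on the data, so no numbers are claimed here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fully grown tree: splits until leaves are pure (or cannot be split further)
full = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Pre-pruned tree: growth stops early because of the limits below (illustrative values)
pruned = DecisionTreeClassifier(
    max_depth=3, min_samples_split=10, min_samples_leaf=5, random_state=42
).fit(X_train, y_train)

for name, tree in [("unpruned", full), ("pre-pruned", pruned)]:
    print(
        f"{name}: depth={tree.get_depth()}, leaves={tree.get_n_leaves()}, "
        f"train acc={tree.score(X_train, y_train):.3f}, "
        f"test acc={tree.score(X_test, y_test):.3f}"
    )
```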

# Question 4: Write a Python program to train a Decision Tree Classifier using Gini Impurity as the criterion and print the feature importances.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
import pandas as pd

# Load the Iris dataset as feature matrix X and labels y
X, y = load_iris(return_X_y=True)

# Train a decision tree that chooses splits by Gini Impurity
model = DecisionTreeClassifier(criterion='gini', random_state=42)
model.fit(X, y)

# Impurity-based importances (they sum to 1 across features)
importance = model.feature_importances_

# Tabulate the importance of each feature by column index
df = pd.DataFrame({
    "Feature_Index": range(len(importance)),
    "Importance": importance
})

print(df)



   Feature_Index  Importance
0              0    0.013333
1              1    0.000000
2              2    0.564056
3              3    0.422611
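The output above lists importances by column index only. Here is a small variant that labels them with the Iris feature names and sorts them. It uses the same data, model settings, and random_state as the program above.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
import pandas as pd

iris = load_iris()
model = DecisionTreeClassifier(criterion="gini", random_state=42)
model.fit(iris.data, iris.target)

# Attach feature names to the importances and sort from most to least important
importances = pd.Series(model.feature_importances_, index=iris.feature_names)
importances = importances.sort_values(ascending=False)
print(importances)
```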


Question 5: What is a Support Vector Machine (SVM)?
- Support Vector Machine (SVM) is a supervised ML algorithm used for classification and regression.
It finds the optimal hyperplane that separates data points of different classes with maximum margin.

- Key concepts:

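    Support Vectors → the training points closest to the decision boundary; they alone determine where the hyperplane lies

    Margin → the gap between the decision boundary and the nearest points of each class; SVM chooses the hyperplane that maximizes this gap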

    Works well with high-dimensional data
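Here is a sketch of these ideas with scikit-learn's SVC on four made-up, linearly separable 2-D points. A very large C is an assumption used to approximate a hard margin. For a linear SVM with weight vector w, the full margin width is 2 / ‖w‖. That is the distance between the two boundaries passing through each class's support vectors.

```python
import numpy as np
from sklearn.svm import SVC

# Class 0 sits at x1 = 0, class 1 at x1 = 2 (made-up toy data)
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([0, 0, 1, 1])

# Very large C: (almost) no margin violations allowed
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]
margin = 2.0 / np.linalg.norm(w)  # full margin width between the two classes
print("support vectors:\n", clf.support_vectors_)
print("w =", w, "b =", clf.intercept_[0])
print(f"margin width = {margin:.3f}")
```

Here, the widest separating gap is between x1 = 0 and x1 = 2. The printed margin width should therefore be close to 2.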


Question 6: What is the Kernel Trick in SVM?
- The Kernel Trick lets SVM learn non-linear decision boundaries. A kernel function K(x, z) = φ(x) · φ(z) computes inner products in a higher-dimensional feature space directly, so the mapping φ never has to be computed explicitly.

- Common Kernels:

    Linear

    Polynomial

    RBF (Gaussian)

    Sigmoid

- Example:

    Linear kernel → linearly separable data

    RBF kernel → complex, non-linear data
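Here is a small NumPy check of the idea for the degree-2 polynomial kernel K(x, z) = (x · z)². For 2-D inputs, its explicit feature map is φ(x) = (x₁², √2·x₁x₂, x₂²). The kernel produces the same inner product without ever building φ. The two vectors are arbitrary examples.

```python
import numpy as np


def phi(v):
    """Explicit degree-2 feature map for a 2-D vector."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])


def poly_kernel(a, b):
    """Degree-2 polynomial kernel computed in the original 2-D space."""
    return float(np.dot(a, b) ** 2)


x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = float(np.dot(phi(x), phi(z)))  # map to 3-D, then take the dot product
implicit = poly_kernel(x, z)              # never leaves 2-D
print(f"{explicit:.4f} {implicit:.4f}")   # 121.0000 121.0000
```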

# Question 7: Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies.
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = load_wine(return_X_y=True)

# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Linear-kernel SVM (features are used unscaled here)
linear_svm = SVC(kernel='linear')
linear_svm.fit(X_train, y_train)
y_pred_linear = linear_svm.predict(X_test)

# RBF (Gaussian) kernel SVM with default C and gamma
rbf_svm = SVC(kernel='rbf')
rbf_svm.fit(X_train, y_train)
y_pred_rbf = rbf_svm.predict(X_test)

# Compare test-set accuracies
print("Linear SVM Accuracy:", accuracy_score(y_test, y_pred_linear))
print("RBF SVM Accuracy:", accuracy_score(y_test, y_pred_rbf))


Linear SVM Accuracy: 1.0
RBF SVM Accuracy: 0.8055555555555556
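The RBF result above was obtained on unscaled features. The Wine features have very different ranges: proline values run from the hundreds to over a thousand, while most other features are single-digit values. Because the RBF kernel is distance-based, the accuracy gap is likely due at least partly to scaling.

Here is a sketch that standardizes the features inside a pipeline before each SVM, using the same split. Accuracies are not reproduced here; run it to compare.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

results = {}
for kernel in ("linear", "rbf"):
    # The scaler is fit on the training data only, inside the pipeline
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    clf.fit(X_train, y_train)
    results[kernel] = accuracy_score(y_test, clf.predict(X_test))
    print(f"Scaled {kernel} SVM accuracy: {results[kernel]:.3f}")
```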


Question 8: What is the Naïve Bayes classifier, and why is it called "Naïve"?
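- Naïve Bayes is a probabilistic classifier based on Bayes’ Theorem. It predicts the class c with the highest posterior P(c | x) ∝ P(x | c) · P(c).

- It is called “Naïve” because it assumes that all features are conditionally independent of each other given the class, so P(x | c) becomes the product ∏ P(xᵢ | c). This assumption is rarely true in real data.

- Why is it still effective?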

    Fast

    Works well with small datasets

    Performs well in text classification
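To make the independence assumption concrete, here is a sketch that scores each class as P(c) · ∏ P(xᵢ | c) by hand. It uses a tiny, made-up binary dataset and assumes Laplace smoothing with alpha = 1. It then checks the result against scikit-learn's `BernoulliNB`, which applies the same smoothing formula.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Tiny made-up dataset: 6 samples, 3 binary features, 2 classes
X = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
])
y = np.array([1, 1, 0, 0, 1, 0])
x_new = np.array([1, 0, 1])
alpha = 1.0  # Laplace smoothing (assumed)

log_scores = []
for c in np.unique(y):
    Xc = X[y == c]
    log_prior = np.log(len(Xc) / len(X))
    # P(x_i = 1 | c) for each feature, smoothed so no probability is 0 or 1
    p = (Xc.sum(axis=0) + alpha) / (len(Xc) + 2 * alpha)
    # Naive assumption: log P(x | c) is a plain sum of per-feature terms
    log_lik = np.sum(x_new * np.log(p) + (1 - x_new) * np.log(1 - p))
    log_scores.append(log_prior + log_lik)

# Normalize the class scores into posterior probabilities
log_scores = np.array(log_scores)
posterior = np.exp(log_scores - log_scores.max())
posterior /= posterior.sum()

sk_posterior = BernoulliNB(alpha=alpha).fit(X, y).predict_proba(x_new.reshape(1, -1))[0]
print("by hand:     ", posterior.round(4))
print("BernoulliNB: ", sk_posterior.round(4))
```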

Question 9: Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes.
- 1. Gaussian Naïve Bayes

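      Used for continuous features

      Assumes each feature follows a normal (Gaussian) distribution within each class

- 2. Multinomial Naïve Bayes

      Used for count data (e.g., word counts)

      Common in text classification; in practice it is also applied to TF-IDF weights

- 3. Bernoulli Naïve Bayes

      Used for binary features (0/1), e.g., whether a word appears in a document

      Explicitly accounts for absent features (0s), not only present ones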
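Here is a sketch that matches each variant to the kind of feature it models, using synthetic data generated with NumPy. The class-dependent means and Poisson rates are arbitrary assumptions, and the scores are training accuracies for illustration only.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

rng = np.random.default_rng(0)
n = 300
y = rng.integers(0, 2, size=n)  # two classes, 0 and 1

# Continuous features: class 1 is shifted by +2 in every feature
X_cont = rng.normal(loc=2.0 * y[:, None], scale=1.0, size=(n, 3))

# Count features (think word counts): class-dependent Poisson rates
rates = np.where(y[:, None] == 1, [5.0, 1.0, 1.0], [1.0, 1.0, 5.0])
X_counts = rng.poisson(lam=rates)

# Binary features (think word present / absent), derived from the counts
X_bin = (X_counts > 0).astype(int)

scores = {}
for name, model, X in [
    ("GaussianNB", GaussianNB(), X_cont),
    ("MultinomialNB", MultinomialNB(), X_counts),
    ("BernoulliNB", BernoulliNB(), X_bin),
]:
    model.fit(X, y)
    scores[name] = model.score(X, y)  # training accuracy, for illustration only
    print(f"{name}: {scores[name]:.3f}")
```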


# Question 10: Breast Cancer Dataset: Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and evaluate accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit Gaussian Naive Bayes (per-class mean and variance for each feature)
model = GaussianNB()
model.fit(X_train, y_train)

# Predict on the held-out set and report accuracy
y_pred = model.predict(X_test)

print("Gaussian Naive Bayes Accuracy:", accuracy_score(y_test, y_pred))


Gaussian Naive Bayes Accuracy: 0.9736842105263158
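A single 80/20 split gives one accuracy estimate, and that estimate depends on which samples land in the test set. Here is a sketch of a 5-fold cross-validated estimate for the same model, using scikit-learn's `cross_val_score`. No fold scores are claimed here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)

# 5 folds; for classifiers an integer cv uses stratified folds
scores = cross_val_score(GaussianNB(), X, y, cv=5, scoring="accuracy")
print("Fold accuracies:", scores.round(3))
print(f"Mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```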
