Que 1. What is Information Gain, and how is it used in Decision Trees?
- Information Gain (IG) measures how much splitting on a feature reduces uncertainty (entropy) about the class label. It is the entropy of the parent node minus the size-weighted average entropy of the child nodes. Decision tree algorithms such as ID3 use it to select the attribute for each split: the feature with the highest IG is chosen because it produces the purest subsets, which tends to give shorter paths to the leaf nodes. The procedure is:
    - At the root, calculate the Information Gain for every available feature.
    
    - The feature with the highest Information Gain is chosen as the root node because it best separates the data into purer groups.
    - This process is repeated for subsequent nodes (branches). For each child node, the algorithm again finds the feature that offers the maximum IG for that specific subset of data.
    - The tree grows by selecting features that maximally reduce uncertainty at each step, leading to leaf nodes that ideally contain instances of a single class.
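
The calculation behind these steps can be sketched in a few lines. This is a minimal illustration, not a full tree learner: the helper names `entropy` and `information_gain` and the toy `outlook`/`windy`/`play` data are made up for the example.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(feature, labels):
    """Parent entropy minus the size-weighted entropy of the child subsets."""
    feature = np.asarray(feature)
    labels = np.asarray(labels)
    weighted_child_entropy = 0.0
    for value in np.unique(feature):
        mask = feature == value
        # mask.mean() is the fraction of rows that fall into this child node
        weighted_child_entropy += mask.mean() * entropy(labels[mask])
    return entropy(labels) - weighted_child_entropy

# Hypothetical toy data (assumption for illustration only)
outlook = np.array(["sunny", "sunny", "rain", "rain", "sunny", "rain"])
windy = np.array(["yes", "no", "yes", "no", "yes", "no"])
play = np.array(["yes", "yes", "no", "no", "yes", "no"])

ig_outlook = information_gain(outlook, play)
ig_windy = information_gain(windy, play)
print(f"IG(outlook) = {ig_outlook:.3f}")  # 1.000: outlook separates the classes perfectly
print(f"IG(windy)   = {ig_windy:.3f}")    # 0.082: windy says little about the label
```

Here `outlook` would be chosen for the split because its Information Gain is higher.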

Que 2. What is the difference between Gini Impurity and Entropy?
- Gini Impurity and Entropy are both impurity measures for decision trees, and for both, lower values indicate purer nodes. Gini is computed from squared class probabilities (1 - Σ p_i²) and ranges from 0 to 0.5 for two classes, or up to 1 - 1/k for k classes. Entropy is computed with logarithms (-Σ p_i log2 p_i) and ranges from 0 to 1 for two classes, or up to log2(k) for k classes.

- Gini is slightly cheaper to compute because it avoids the logarithm, which is one reason it is often preferred for large datasets; it is also the default criterion in scikit-learn's DecisionTreeClassifier. Entropy is sometimes reported to give slightly more balanced trees, but in practice both work well and usually give similar results.
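
To make the two formulas and their ranges concrete, here is a small sketch that evaluates both measures on a few class-probability vectors. The function names `gini` and `entropy_bits` are chosen for this example only.

```python
import numpy as np

def gini(p):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    p = np.asarray(p, dtype=float)
    return float(1.0 - np.sum(p ** 2))

def entropy_bits(p):
    """Shannon entropy in bits; zero-probability classes contribute nothing."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # treat 0 * log2(0) as 0
    return float(-np.sum(p * np.log2(p)))

# A pure node, a skewed node, a balanced binary node, and a balanced 3-class node
for probs in ([1.0, 0.0], [0.9, 0.1], [0.5, 0.5], [1 / 3, 1 / 3, 1 / 3]):
    print(probs, "gini =", round(gini(probs), 3), "entropy =", round(entropy_bits(probs), 3))
```

The balanced binary node gives the binary maxima (Gini 0.5, entropy 1), while the balanced three-class node exceeds them, which is why the 0-0.5 and 0-1 ranges apply only to two-class problems.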


Que 3. What is Pre-Pruning in Decision Trees?
- Pre-pruning (or early stopping) halts the growth of a decision tree during training so that it does not become too complex and overfit the data. Typical stopping criteria are a maximum tree depth, a minimum number of samples per leaf, or a minimum impurity decrease required for a split. It is more computationally efficient than post-pruning (which grows the full tree and prunes it afterwards), but it risks underfitting if growth stops too early: a split that looks weak now may have enabled a strong split further down, a problem known as the "horizon effect".
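
In scikit-learn these criteria map onto the `max_depth`, `min_samples_leaf`, and `min_impurity_decrease` parameters of `DecisionTreeClassifier`. The sketch below compares an unconstrained tree with a pre-pruned one on Iris; the specific threshold values are arbitrary choices for illustration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Unconstrained tree: grows until every leaf is pure
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Pre-pruned tree: growth stops early (threshold values are illustrative assumptions)
pruned_tree = DecisionTreeClassifier(
    max_depth=3,                 # never grow deeper than 3 levels
    min_samples_leaf=5,          # every leaf must keep at least 5 training samples
    min_impurity_decrease=0.01,  # skip splits that barely reduce impurity
    random_state=42,
).fit(X_train, y_train)

for name, tree in [("full", full_tree), ("pre-pruned", pruned_tree)]:
    print(name, "depth:", tree.get_depth(), "leaves:", tree.get_n_leaves(),
          "test accuracy:", round(tree.score(X_test, y_test), 3))
```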


In [7]:
# Que 4. Write a Python program to train a Decision Tree Classifier using Gini
# Impurity as the criterion and print the feature importances (practical).

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the Iris data into a DataFrame so the feature names are kept
data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

# Hold out 30% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train the tree with Gini Impurity as the split criterion
classifier = DecisionTreeClassifier(criterion='gini', random_state=42)
classifier.fit(X_train, y_train)

# Pair each importance with its feature name and sort from most to least important
feature_importances_series = pd.Series(
    classifier.feature_importances_, index=data.feature_names
).sort_values(ascending=False)

print("Feature Importances (Gini Impurity):")
print(feature_importances_series)

Feature Importances (Gini Impurity):
petal length (cm)    0.893264
petal width (cm)     0.087626
sepal width (cm)     0.019110
sepal length (cm)    0.000000
dtype: float64


Que 5. What is a Support Vector Machine (SVM)?
- A Support Vector Machine (SVM) is a powerful supervised machine learning algorithm used for classification and regression tasks. Its primary objective is to find the optimal decision boundary, known as a hyperplane, that maximally separates data points of different classes in a multi-dimensional space.
- The fundamental idea behind an SVM involves several key components:
    - **Hyperplane:** The decision boundary that separates different classes of data. In a two-dimensional space, this is a line; in a three-dimensional space, it is a plane; and in higher dimensions, it is a hyperplane.

    - **Margin:** The distance between the decision boundary and the closest data points from each class. The SVM algorithm aims to maximize this distance to ensure the best possible separation and generalization to new data.
    - **Support Vectors:** The data points that lie closest to the hyperplane and directly influence its position and orientation. Only these critical points are used to define the decision boundary, making the algorithm memory-efficient.
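
These three components can be inspected on a fitted model. The following sketch uses a hypothetical, linearly separable 2-D dataset and a very large `C` to approximate a hard margin; for a linear kernel, `coef_` and `intercept_` give the hyperplane `w . x + b = 0`, and the full margin width is `2 / ||w||`.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable 2-D points (assumption for illustration)
X = np.array([[1, 1], [2, 1], [1, 2], [4, 4], [5, 4], [4, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C leaves almost no room for margin violations (near hard margin)
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]       # normal vector of the separating hyperplane
b = clf.intercept_[0]  # offset of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)  # distance between the two margin boundaries

print("hyperplane: w =", w, "b =", b)
print("margin width:", round(margin_width, 3))
print("support vectors:\n", clf.support_vectors_)
```

Only the points nearest the boundary appear in `support_vectors_`; moving any of the other points slightly would not change the fitted hyperplane.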

Que 6. What is the Kernel Trick in SVM?
- **Kernel Trick:** The Kernel Trick lets an SVM handle data that is not linearly separable by implicitly mapping it into a higher-dimensional space, where a linear separator may exist, without ever computing the coordinates in that space. Instead of transforming the data points, a kernel function returns the dot product (a similarity score) that two points would have in the higher-dimensional space. Because the SVM optimization depends on the data only through these dot products, it can learn a complex, non-linear decision boundary efficiently.
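
A small numeric check makes the idea concrete. For the degree-2 polynomial kernel `k(x, z) = (x . z)^2` on 2-D inputs, one explicit feature map is `phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)`. The sketch below (helper names `phi` and `poly2_kernel` are made up for this example) shows that the kernel gives the same value as the dot product in the mapped space.

```python
import numpy as np

def phi(x):
    """Explicit 3-D feature map matching the kernel k(x, z) = (x . z)^2 for 2-D x."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly2_kernel(x, z):
    """Degree-2 polynomial kernel, computed entirely in the original 2-D space."""
    return float(np.dot(x, z) ** 2)

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = float(np.dot(phi(x), phi(z)))  # map first, then take the dot product
implicit = poly2_kernel(x, z)             # never leaves the 2-D input space

print(explicit, implicit)  # both are 121 (up to floating-point rounding)
```

For this kernel the saving is small, but the RBF kernel corresponds to an infinite-dimensional feature space, where the explicit route is not possible at all.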


In [30]:
# Que 7. Write a Python program to train two SVM classifiers with Linear
# and RBF kernels on the Wine dataset, then compare their accuracies.

import pandas as pd
from sklearn.datasets import load_wine
data = load_wine()
df = pd.DataFrame(data.data, columns = data.feature_names)
X = df
y = data.target

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)

from sklearn.svm import SVC
clf1 = SVC(kernel='linear')
clf1.fit(X_train, y_train)
y_pred1 = clf1.predict(X_test)

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
print("Accuracies of SVC kernel linear :\n")
print(f'accuracy_score : {accuracy_score(y_test, y_pred1)}')
print(f'confusion_matrix :\n {confusion_matrix(y_test, y_pred1)}')
print(f'\nclassification_report :\n {classification_report(y_test, y_pred1)}')

clf2 = SVC(kernel='rbf')
clf2.fit(X_train, y_train)
y_pred2 = clf2.predict(X_test)

print("Accuracies of SVC kernel RBF :\n")
print(f'accuracy_score : {accuracy_score(y_test, y_pred2)}')
print(f'confusion_matrix :\n {confusion_matrix(y_test, y_pred2)}')
print(f'\nclassification_report :\n {classification_report(y_test, y_pred2)}')

Accuracies of SVC kernel linear :

accuracy_score : 0.9629629629629629
confusion_matrix :
 [[23  0  0]
 [ 1 18  0]
 [ 0  1 11]]

classification_report :
               precision    recall  f1-score   support

           0       0.96      1.00      0.98        23
           1       0.95      0.95      0.95        19
           2       1.00      0.92      0.96        12

    accuracy                           0.96        54
   macro avg       0.97      0.95      0.96        54
weighted avg       0.96      0.96      0.96        54

Accuracies of SVC kernel RBF :

accuracy_score : 0.6851851851851852
confusion_matrix :
 [[19  0  4]
 [ 1 15  3]
 [ 0  9  3]]

classification_report :
               precision    recall  f1-score   support

           0       0.95      0.83      0.88        23
           1       0.62      0.79      0.70        19
           2       0.30      0.25      0.27        12

    accuracy                           0.69        54
   macro avg       0.62      0.62      0.6
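
One likely reason for the much lower RBF accuracy above is that the Wine features are on very different scales, and the RBF kernel is distance-based, so large-valued features dominate. The sketch below is an optional follow-up, not part of the original question: it reuses the same split and wraps the RBF classifier in a `StandardScaler` pipeline for comparison.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)

# RBF kernel on the raw features, as in the cell above
raw_rbf = SVC(kernel="rbf").fit(X_train, y_train)

# Same classifier, but each feature is standardized first; the scaler is fit on
# the training data only, so no test information leaks into preprocessing
scaled_rbf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X_train, y_train)

acc_raw = raw_rbf.score(X_test, y_test)
acc_scaled = scaled_rbf.score(X_test, y_test)
print("RBF accuracy without scaling:", acc_raw)
print("RBF accuracy with scaling   :", acc_scaled)
```

A fair comparison between the linear and RBF kernels would apply the same scaling to both.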

Que 8. What is the Naive Bayes classifier, and why is it called Naive?
- The Naive Bayes classifier is a simple yet effective probabilistic machine learning algorithm based on Bayes' Theorem. It predicts the class with the highest posterior probability, assuming that the features are conditionally independent of each other given the class.
- It's called "Naive" because this independence assumption is rarely true in real-world data, where features are usually correlated. The assumption greatly simplifies the calculations, and although it can hurt accuracy when features are strongly correlated, the classifier is fast and works well in text classification (spam filters, sentiment analysis).
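
Written out, the independence assumption turns Bayes' Theorem into a simple product. With class $c$ and features $x_1, \dots, x_n$ (notation introduced here for illustration), the predicted class is:

$$
P(c \mid x_1, \dots, x_n) \propto P(c) \prod_{i=1}^{n} P(x_i \mid c),
\qquad
\hat{y} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
$$

Each factor $P(x_i \mid c)$ is estimated separately from the training data, which is what makes the method cheap to train.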


Que 9. Explain the differences between Gaussian Naive Bayes, Multinomial Naive Bayes, and Bernoulli Naive Bayes.
- **Gaussian Naive Bayes (GNB):** Used when features are continuous numerical values that are assumed to follow a normal distribution within each class (e.g., height, weight, temperature). It estimates the mean and variance of each feature within each class and uses the Gaussian probability density function as the likelihood.
    - Use Case: Classification from continuous measurements, such as medical diagnosis (e.g., from blood pressure readings).

- **Multinomial Naive Bayes (MNB):** Used when features are discrete counts, most commonly text data (e.g., the number of times each word appears in a document). It models the probability of observing the counts of the different features (words) for a given class.
    - Use Case: Text classification (spam detection, topic classification) based on word frequencies.

- **Bernoulli Naive Bayes (BNB):** Used when features are binary (presence or absence, 0 or 1) and follow a Bernoulli distribution. It models the probability of each feature being present or absent in a given class (e.g., whether a word occurs in a document).
    - Use Case: Text classification where features are just whether a word is present or not (not its frequency).
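
The practical difference is mainly the kind of feature matrix each variant expects. The sketch below fits all three scikit-learn classes on tiny, made-up datasets (the numbers are hypothetical and chosen only to match each feature type).

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 0, 1, 1])  # two samples per class

# Continuous features (hypothetical height in cm, weight in kg) -> GaussianNB
X_continuous = np.array([[150.0, 50.0], [155.0, 52.0], [180.0, 80.0], [185.0, 85.0]])

# Count features (hypothetical word counts for a 3-word vocabulary) -> MultinomialNB
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])

# Binary features (word present or absent) -> BernoulliNB
X_binary = (X_counts > 0).astype(int)

gnb = GaussianNB().fit(X_continuous, y)
mnb = MultinomialNB().fit(X_counts, y)
bnb = BernoulliNB().fit(X_binary, y)

pred_g = gnb.predict([[152.0, 51.0]])  # close to the class-0 measurements
pred_m = mnb.predict([[0, 2, 1]])      # dominated by the word typical of class 1
pred_b = bnb.predict([[1, 0, 0]])      # only the word typical of class 0 is present
print(pred_g, pred_m, pred_b)  # [0] [1] [0]
```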

In [40]:
# Question 10: Breast Cancer Dataset
# Write a Python program to train a Gaussian Naive Bayes classifier on the Breast Cancer dataset and evaluate accuracy.

from sklearn.datasets import load_breast_cancer
X, y = load_breast_cancer(return_X_y=True)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

from sklearn.naive_bayes import GaussianNB
clf = GaussianNB()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred)

0.9415204678362573