# Supervised Classification: Decision Trees, SVM, and Naive Bayes | Assignment

Question 1: What is Information Gain, and how is it used in Decision Trees?
  - Information Gain is a metric used in decision trees to choose the best attribute for splitting the data at each node. It quantifies the reduction in entropy (uncertainty) about the target variable after a split: the entropy of the parent node minus the size-weighted average entropy of the child nodes. The algorithm calculates Information Gain for each candidate attribute and selects the one with the largest reduction in entropy. This process repeats recursively on each child node, building a tree that classifies or predicts the target variable.
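
The calculation described above can be sketched in a few lines. This is a minimal illustration, not how any particular library implements it; the helper names `entropy` and `information_gain` and the toy labels are assumptions made for the example.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

# Hypothetical toy node: 4 positives and 4 negatives.
parent = np.array([1, 1, 1, 1, 0, 0, 0, 0])

# Split A separates the classes perfectly.
split_a = [parent[:4], parent[4:]]
# Split B leaves each child with the same 50/50 mix as the parent.
split_b = [parent[[0, 1, 4, 5]], parent[[2, 3, 6, 7]]]

print("Gain of split A:", information_gain(parent, split_a))
print("Gain of split B:", information_gain(parent, split_b))
```

Split A removes all uncertainty (a gain of 1 bit), while split B removes none, so a tree using this criterion would pick split A.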

Question 2: What is the difference between Gini Impurity and Entropy?

-   Gini Impurity:
    -   Measures the probability of misclassifying a randomly chosen element if it were labeled at random according to the class distribution in the node.
    -   Calculated as: 1 - Σ p(i)^2.
    -   Computationally less expensive because it involves no logarithms.
    -   Tends to isolate the most frequent class in its own branch.
-   Entropy:
    -   Measures the average information (in bits) needed to identify the class of an element.
    -   Calculated as: - Σ p(i) * log2(p(i)).
    -   Slightly more sensitive to changes in class probabilities.
    -   Tends to favor splits that result in more balanced branches.
-   Use Cases:
    -   Gini Impurity is often preferred for its efficiency.
    -   Entropy may be chosen for a more balanced tree structure.
    -   In practice, the two criteria usually produce very similar trees.
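
To make the two formulas concrete, the sketch below evaluates both on a few two-class distributions. The function names `gini_impurity` and `entropy_bits` and the chosen distributions are illustrative assumptions.

```python
import numpy as np

def gini_impurity(p):
    """Gini impurity: 1 - sum(p_i^2)."""
    p = np.asarray(p, dtype=float)
    return float(1.0 - np.sum(p ** 2))

def entropy_bits(p):
    """Entropy in bits: -sum(p_i * log2(p_i)), skipping zero-probability classes."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# From a pure node to a perfectly mixed two-class node.
for dist in ([1.0, 0.0], [0.9, 0.1], [0.7, 0.3], [0.5, 0.5]):
    print(dist,
          "gini =", round(gini_impurity(dist), 4),
          "entropy =", round(entropy_bits(dist), 4))
```

Both measures are zero for a pure node and peak at the 50/50 mix, where Gini reaches 0.5 and entropy reaches 1 bit.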


Question 3: What is Pre-Pruning in Decision Trees?

  - Pre-pruning, also known as early stopping, is a method used in decision trees to prevent overfitting. It sets criteria that stop the growth of the tree before it perfectly fits the training data. Typical criteria include a maximum tree depth, a minimum number of samples required in a leaf node, or a minimum impurity decrease required for a split. The goal is a simpler tree that generalizes better to unseen data.
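
As a rough sketch of how these stopping criteria map onto scikit-learn, the example below compares a fully grown tree with a pre-pruned one on the Iris data. The specific limits (`max_depth=3`, `min_samples_leaf=5`, `min_impurity_decrease=0.01`) are illustrative values chosen for the example, not tuned settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fully grown tree: no stopping criteria beyond the defaults.
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Pre-pruned tree: a split is only made if it respects every limit below.
pruned_tree = DecisionTreeClassifier(
    max_depth=3,                 # cap on tree depth
    min_samples_leaf=5,          # each leaf must keep at least 5 samples
    min_impurity_decrease=0.01,  # a split must reduce impurity by this much
    random_state=42,
).fit(X_train, y_train)

for name, tree in [("full", full_tree), ("pre-pruned", pruned_tree)]:
    print(name,
          "depth:", tree.get_depth(),
          "leaves:", tree.get_n_leaves(),
          "test accuracy:", tree.score(X_test, y_test))
```

In practice these limits are usually chosen by cross-validation rather than fixed by hand.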





Question 4: Write a Python program to train a Decision Tree Classifier using Gini Impurity as the criterion and print the feature importances (practical).

In [1]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

dtree = DecisionTreeClassifier(criterion='gini', random_state=42)

dtree.fit(X_train, y_train)

print("Feature Importances:")
for i, importance in enumerate(dtree.feature_importances_):
    print(f"{iris.feature_names[i]}: {importance}")


Feature Importances:
sepal length (cm): 0.0
sepal width (cm): 0.01911001911001911
petal length (cm): 0.8932635518001373
petal width (cm): 0.08762642908984374


Question 5: What is a Support Vector Machine (SVM)?

- A Support Vector Machine (SVM) is a supervised machine learning model used for classification and regression tasks. It works by finding the optimal hyperplane that separates data points of different classes with the widest possible margin. SVMs are effective in high-dimensional spaces and can use kernel functions to handle non-linearly separable data.
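
The idea of a maximum-margin hyperplane can be illustrated on a tiny, linearly separable dataset. The six points below are hypothetical, and the very large `C` is an assumption used to approximate a hard-margin SVM; for a linear SVM the margin width is 2 / ||w||.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data in 2-D.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A large C approximates a hard-margin SVM on separable data.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]        # normal vector of the separating hyperplane
b = clf.intercept_[0]   # offset of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)

print("w =", w, "b =", b)
print("support vectors:")
print(clf.support_vectors_)
print("margin width:", margin_width)
```

Only the support vectors (the points lying on the margin) determine the hyperplane; moving any other point slightly would leave the solution unchanged.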

Question 6: What is the Kernel Trick in SVM?
  - The kernel trick is a technique used in Support Vector Machines (SVMs) to implicitly map data into a higher-dimensional space without explicitly calculating the transformation. This allows SVMs to perform non-linear classification by using kernel functions, such as the radial basis function (RBF) or polynomial kernels, which compute the dot product of data points in the transformed space.
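
A small numerical check makes the "without explicitly calculating the transformation" part concrete. The sketch below uses the degree-2 homogeneous polynomial kernel on 2-D vectors; the feature map `phi` and the example vectors are assumptions chosen for illustration.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector (x1, x2)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly_kernel(a, b):
    """Homogeneous polynomial kernel of degree 2: (a . b)^2."""
    return np.dot(a, b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])

explicit = np.dot(phi(a), phi(b))  # dot product in the 3-D feature space
implicit = poly_kernel(a, b)       # same value, computed in the original 2-D space

print(explicit, implicit)
```

Both expressions evaluate to 121 (up to floating-point rounding), but the kernel never constructs the 3-D vectors. For kernels such as RBF, the corresponding feature space is infinite-dimensional, so the implicit route is the only practical one.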



Question 7: Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies.
Hint: Use SVC(kernel='linear') and SVC(kernel='rbf'), then compare accuracy scores after fitting on the same dataset.

In [2]:
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score

wine = load_wine()
X = wine.data
y = wine.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

linear_svm = SVC(kernel='linear')
linear_svm.fit(X_train, y_train)
linear_svm_predictions = linear_svm.predict(X_test)
linear_svm_accuracy = accuracy_score(y_test, linear_svm_predictions)

rbf_svm = SVC(kernel='rbf')
rbf_svm.fit(X_train, y_train)
rbf_svm_predictions = rbf_svm.predict(X_test)
rbf_svm_accuracy = accuracy_score(y_test, rbf_svm_predictions)


print("Linear SVM Accuracy:", linear_svm_accuracy)
print("RBF SVM Accuracy:", rbf_svm_accuracy)

Linear SVM Accuracy: 0.9814814814814815
RBF SVM Accuracy: 0.7592592592592593
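
One likely reason for the gap between the two kernels is feature scale. The Wine features span very different ranges (proline values are in the hundreds or more, while hue is around 1), and the RBF kernel depends on distances between raw feature vectors. The sketch below repeats the comparison with standardized features; wrapping the model in a pipeline with `StandardScaler` is an addition to the original program, not part of the question.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

scores = {}
for kernel in ("linear", "rbf"):
    # The scaler is fit on the training split only, inside the pipeline,
    # so no information from the test split leaks into preprocessing.
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    model.fit(X_train, y_train)
    scores[kernel] = model.score(X_test, y_test)
    print(f"{kernel} SVM accuracy with scaling: {scores[kernel]:.4f}")
```

If the scaled RBF model closes most of the gap, the lower accuracy above is better read as a preprocessing issue than as a weakness of the RBF kernel itself.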


Question 8: What is the Naïve Bayes classifier, and why is it called "Naïve"?
  - The Naive Bayes classifier is a probabilistic machine learning algorithm based on Bayes' theorem, used for classification tasks. It is called "naive" because it assumes that all features are conditionally independent of each other given the class label. In other words, once the class is known, the value of one feature is assumed to carry no information about the value of any other feature. This assumption reduces the joint likelihood to a product of per-feature likelihoods, which simplifies the calculations but is often unrealistic in real-world data. Despite this strong assumption, Naive Bayes classifiers often perform surprisingly well, especially in text classification and spam filtering.
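
The independence assumption means the score for each class is just the prior multiplied by one likelihood per feature. The sketch below works this out by hand for a two-word spam filter; every probability in it is a made-up value for illustration, and `posterior` is a hypothetical helper, not a library function.

```python
# Hypothetical probabilities for a two-word spam filter (illustrative only).
prior = {"spam": 0.4, "ham": 0.6}
likelihood = {
    "spam": {"free": 0.5, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.4},
}

def posterior(words):
    """Return P(class | words) under the naive independence assumption."""
    scores = {}
    for label in prior:
        # Unnormalized score: P(class) * product of P(word | class).
        score = prior[label]
        for word in words:
            score *= likelihood[label][word]
        scores[label] = score
    total = sum(scores.values())
    # Normalize so the class probabilities sum to 1.
    return {label: score / total for label, score in scores.items()}

print(posterior(["free"]))
print(posterior(["free", "meeting"]))
```

With these numbers, a message containing only "free" leans towards spam, while adding "meeting" tips the result towards ham, because each word contributes its own independent factor.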

Question 9: Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes.

-   Gaussian Naive Bayes: This variant assumes that the features follow a Gaussian (normal) distribution. It's typically used when the features are continuous. The classifier calculates the mean and standard deviation of each feature for each class and uses these parameters to estimate the probability of a data point belonging to a particular class.

-   Multinomial Naive Bayes: This is commonly used for text classification. It assumes that the features represent the frequency of words in a document. The features are counts (e.g., term frequency), and the model calculates the probability of each word given a class, using these counts. It's suitable for discrete data, especially for text where the frequency of words matters.

-   Bernoulli Naive Bayes: This variant is used when features are binary (0 or 1). It's suitable for data where the presence or absence of a feature is important. It models the probability of each feature being present (1) or absent (0) given the class. It's often used for text classification with binary features indicating the presence or absence of words.
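
The three variants differ mainly in the kind of feature values they expect. The sketch below fits each scikit-learn class on a tiny matching dataset; all of the data is made up for illustration.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 0, 1, 1])

# Continuous measurements -> GaussianNB.
X_continuous = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.2], [4.1, 3.9]])
gnb = GaussianNB().fit(X_continuous, y)

# Non-negative counts (e.g. word frequencies) -> MultinomialNB.
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
mnb = MultinomialNB().fit(X_counts, y)

# Binary presence/absence indicators -> BernoulliNB.
X_binary = (X_counts > 0).astype(int)
bnb = BernoulliNB().fit(X_binary, y)

print("GaussianNB:   ", gnb.predict([[1.1, 2.0]]))
print("MultinomialNB:", mnb.predict([[2, 0, 1]]))
print("BernoulliNB:  ", bnb.predict([[1, 0, 1]]))
```

Note that `MultinomialNB` uses how often a word occurs, whereas `BernoulliNB` only uses whether it occurs at all, and also explicitly penalizes the absence of a word that is typical for a class.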


In [7]:
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score

cancer = load_breast_cancer()

X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, test_size=0.3, random_state=42)

gnb = GaussianNB()


gnb.fit(X_train, y_train)

y_pred = gnb.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)

print(f"Accuracy: {accuracy:.4f}")



Accuracy: 0.9415
