## ASSIGNMENT

## Supervised Classification: Decision Trees, SVM, and Naive Bayes

### Q1.  What is Information Gain, and how is it used in Decision Trees?

#### A1. Information Gain (IG) is a metric used in decision trees to determine the effectiveness of a feature in splitting the dataset. It measures the reduction in entropy (uncertainty) of the target variable when a feature is used for splitting.

#### Information Gain is a crucial concept in decision tree algorithms. It's used to determine which attribute (or feature) is the best to split the data at each node of the tree. The goal is to maximize information gain, which effectively minimizes the entropy (or impurity) of the resulting child nodes. 
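
#### The calculation can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular library: the function names `entropy` and `information_gain` and the toy "yes"/"no" labels are assumptions made for the example. Information gain is taken as the parent node's entropy minus the size-weighted average entropy of the child nodes.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical node holding 5 "yes" and 5 "no" labels.
parent = ["yes"] * 5 + ["no"] * 5

# Candidate split A separates the two classes completely.
split_a = [["yes"] * 5, ["no"] * 5]

# Candidate split B leaves each child with the same 50/50 mix as the parent.
split_b = [["yes"] * 3 + ["no"] * 3, ["yes"] * 2 + ["no"] * 2]

ig_a = information_gain(parent, split_a)
ig_b = information_gain(parent, split_b)
print(f"Information gain of split A: {ig_a:.3f}")
print(f"Information gain of split B: {ig_b:.3f}")
```

#### A tree builder would prefer split A here, because it removes all of the uncertainty in the parent node, while split B removes none.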

### Q2. What is the difference between Gini Impurity and Entropy? 

#### A2. Gini Impurity: It measures how often a randomly chosen sample would be misclassified if it were labeled at random according to the class distribution of the node. Gini impurity and entropy play the same role in a Decision Tree, since both score candidate splits, but they are computed differently. Gini impurity is 1 - sum(p_i^2), a quadratic measure that needs no logarithms and is therefore slightly cheaper to compute. Its range is [0, 1 - 1/C], where C is the number of classes. For binary classification this is [0, 0.5], where 0 indicates perfect purity and 0.5 indicates maximum impurity. Gini impurity is typically used in CART (Classification and Regression Trees) algorithms.

#### Entropy: It measures the amount of uncertainty or randomness in a set and is computed as -sum(p_i * log2(p_i)). The range of entropy is [0, log2(C)], where C is the number of classes. The range becomes [0, 1] for binary classification. Entropy is a logarithmic measure. It is typically used in ID3 and C4.5 algorithms.
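
#### To make the two ranges concrete, the sketch below evaluates both measures on a few class distributions. The helper names `gini` and `entropy` and the chosen probabilities are assumptions made for illustration only.

```python
import math


def gini(probs):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    return 1.0 - sum(p * p for p in probs)


def entropy(probs):
    """Shannon entropy in bits; classes with zero probability contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


# Binary case: both measures are 0 for a pure node and peak at p = 0.5.
for p in (0.0, 0.1, 0.3, 0.5):
    dist = [p, 1 - p]
    print(f"p={p:.1f}  gini={gini(dist):.3f}  entropy={entropy(dist):.3f}")

# Three equally likely classes: the maxima are 1 - 1/3 and log2(3).
uniform3 = [1 / 3] * 3
gini_max3 = gini(uniform3)
entropy_max3 = entropy(uniform3)
print(f"3 classes  gini={gini_max3:.3f}  entropy={entropy_max3:.3f}")
```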

### Q3. What is Pre-Pruning in Decision Trees?

#### A3. Pre-pruning halts the growth of the decision tree during its construction to prevent it from becoming overly complex. This is achieved by setting constraints or thresholds that the tree must satisfy while splitting nodes. Common techniques include limiting the maximum depth of the tree, setting a minimum number of samples per leaf or split, and restricting the number of features considered for splitting. Pre-pruning results in a simpler tree that is less likely to overfit the training data. It is computationally efficient because it avoids unnecessary splits, which makes it suitable for larger datasets. However, it may stop the tree's growth prematurely and miss important patterns in the data.
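
#### In scikit-learn these constraints are passed as constructor arguments to `DecisionTreeClassifier`. The sketch below compares an unconstrained tree with a pre-pruned one. The specific threshold values and the use of the Iris dataset are assumptions chosen for illustration, not recommended settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

# No constraints: the tree grows until every leaf is pure.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Pre-pruned: growth stops as soon as any constraint would be violated.
pruned_tree = DecisionTreeClassifier(
    max_depth=3,           # limit the maximum depth
    min_samples_split=10,  # a node needs at least 10 samples to be split
    min_samples_leaf=5,    # every leaf must keep at least 5 samples
    max_features=2,        # consider only 2 features at each split
    random_state=0,
).fit(X_train, y_train)

for name, tree in (("full", full_tree), ("pre-pruned", pruned_tree)):
    print(
        f"{name}: depth={tree.get_depth()}, leaves={tree.get_n_leaves()}, "
        f"test accuracy={tree.score(X_test, y_test):.3f}"
    )
```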

### Q4. Write a Python program to train a Decision Tree Classifier using Gini Impurity as the criterion and print the feature importances (practical).

```python
# A4.
# Import necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics

# Load dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Create Decision Tree classifier object
clf = DecisionTreeClassifier(criterion="gini", max_depth=3)

# Train Decision Tree Classifier
clf = clf.fit(X_train, y_train)

# Predict the response for test dataset
y_pred = clf.predict(X_test)

# Model Accuracy
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
```

```
Accuracy: 0.9555555555555556
```


#### This example imports the necessary libraries, loads the Iris dataset, and splits it into training and test sets with the train_test_split function. It then creates a DecisionTreeClassifier object with the criterion set to "gini" and the maximum depth set to 3, trains it with the fit method, and makes predictions on the test set with the predict method. Finally, it calculates the accuracy of the model with the accuracy_score function.
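
#### The question also asks for the feature importances. A fitted scikit-learn tree exposes them through the `feature_importances_` attribute, so one way to print them is sketched below. The block retrains the same kind of classifier so that it runs on its own. Fixing `random_state=0` on the classifier and sorting the output are additions made here for reproducibility and readability.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=1
)

# Same settings as above; random_state is an added assumption.
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
clf.fit(X_train, y_train)

# feature_importances_ holds one normalized score per input feature.
ranked = sorted(
    zip(iris.feature_names, clf.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

#### Each score is the total reduction in Gini impurity contributed by that feature, normalized so that all scores sum to 1. A score of 0 means the tree never split on that feature.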

### Q5. What is a Support Vector Machine (SVM)? 

#### A5. Support Vector Machines (SVM) are supervised machine learning algorithms used for classification, regression, and outlier detection. The core idea of SVM is to find the optimal hyperplane that separates data points of different classes with the maximum margin. This margin is the distance between the hyperplane and the nearest data points, known as support vectors.
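
#### The hyperplane, margin, and support vectors can be inspected on a fitted model. The sketch below uses six hypothetical 2-D points and a very large `C` to approximate a hard-margin SVM. The data and the value of `C` are assumptions made for illustration. For a linear SVM with weight vector w, the width of the margin between the two classes is 2 / ||w||.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data: three points per class.
X = np.array(
    [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [4.0, 4.0], [5.0, 4.0], [4.0, 5.0]]
)
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C leaves almost no room for margin violations.
svm = SVC(kernel="linear", C=1e6).fit(X, y)

w = svm.coef_[0]       # normal vector of the separating hyperplane
b = svm.intercept_[0]  # offset of the hyperplane
margin_width = 2 / np.linalg.norm(w)

print("Hyperplane normal w:", w)
print("Intercept b:", b)
print("Margin width:", margin_width)
print("Support vectors:")
print(svm.support_vectors_)
```

#### Only the points returned in `support_vectors_` determine the hyperplane. Moving any other point without crossing the margin would leave the solution unchanged.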

### Q6.  What is the Kernel Trick in SVM?

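#### A6. Kernel Trick: A method that lets an SVM separate data that is not linearly separable in its original space. A kernel function computes the inner product of two points as if they had been mapped into a higher-dimensional space where the classes become linearly separable, without ever computing that mapping explicitly. Common kernels include linear, polynomial, and radial basis function (RBF).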
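
#### A small numeric check makes the idea concrete. For 2-D inputs, the degree-2 polynomial kernel k(x, z) = (x . z)^2 equals the ordinary inner product of the explicit 3-D feature vectors (x1^2, sqrt(2)*x1*x2, x2^2). The function names `phi` and `poly_kernel` and the sample points are assumptions made for this sketch.

```python
import numpy as np


def phi(x):
    """Explicit feature map into 3-D space for the degree-2 polynomial kernel."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])


def poly_kernel(x, z):
    """Degree-2 polynomial kernel, evaluated in the original 2-D space."""
    return np.dot(x, z) ** 2


x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

explicit = np.dot(phi(x), phi(z))  # map both points first, then take the inner product
implicit = poly_kernel(x, z)       # never leaves the original 2-D space

print("Inner product after explicit mapping:", explicit)
print("Kernel value in the original space: ", implicit)
```

#### Both computations give the same value, but the kernel needs only a 2-D dot product. For kernels such as RBF the implicit feature space is infinite-dimensional, so the explicit route is not available at all.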

### Q7.  Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies. 

```python
# A7.
# Import necessary libraries
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Load the Wine dataset
wine = datasets.load_wine()
X = wine.data
y = wine.target

# Standardize the dataset for better performance
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train an SVM classifier with a Linear kernel
svm_linear = SVC(kernel='linear', random_state=42)
svm_linear.fit(X_train, y_train)
y_pred_linear = svm_linear.predict(X_test)

# Train an SVM classifier with an RBF kernel
svm_rbf = SVC(kernel='rbf', random_state=42)
svm_rbf.fit(X_train, y_train)
y_pred_rbf = svm_rbf.predict(X_test)

# Calculate and compare accuracies
accuracy_linear = accuracy_score(y_test, y_pred_linear)
accuracy_rbf = accuracy_score(y_test, y_pred_rbf)

print(f"Accuracy with Linear Kernel: {accuracy_linear:.2f}")
print(f"Accuracy with RBF Kernel: {accuracy_rbf:.2f}")
```

```
Accuracy with Linear Kernel: 0.98
Accuracy with RBF Kernel: 0.98
```
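
#### The program above standardizes the full dataset before splitting it, so statistics from the test samples influence the scaler. A possible variation, sketched here under the assumption that the scaler should see training data only, wraps the scaler and the classifier in a scikit-learn `Pipeline` built with `make_pipeline`. The resulting accuracies may differ slightly from the ones reported above.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

# Split first, so the test set stays unseen during scaling.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

accuracies = {}
for kernel in ("linear", "rbf"):
    # fit() learns the scaling from X_train only; score() reuses it on X_test.
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    model.fit(X_train, y_train)
    accuracies[kernel] = model.score(X_test, y_test)
    print(f"Accuracy with {kernel} kernel: {accuracies[kernel]:.2f}")
```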


### Q8. What is the Naïve Bayes classifier, and why is it called "Naïve"? 

#### A8. The Naive Bayes Classifier is a probabilistic algorithm based on Bayes' Theorem. It is called "naive" because it assumes that all features are conditionally independent of each other given the class label, an assumption that rarely holds exactly in real data. Under this assumption the likelihood P(X|y) factorizes into the product of the per-feature likelihoods P(xi|y), which makes the probabilities simple to estimate.
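
#### The factorization can be worked through by hand. The sketch below scores two classes for a message using made-up priors and per-word likelihoods. Every number, the class names, and the `posterior` helper are hypothetical and serve only to show the arithmetic.

```python
# Hypothetical class priors P(y).
priors = {"spam": 0.4, "ham": 0.6}

# Hypothetical per-word likelihoods P(word present | y).
likelihoods = {
    "spam": {"free": 0.5, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.4},
}


def posterior(observed_words):
    """Return P(y | words) by multiplying the prior with each word's likelihood."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for word in observed_words:
            # Naive assumption: each word contributes independently given the class.
            score *= likelihoods[label][word]
        scores[label] = score
    total = sum(scores.values())  # plays the role of P(X), the normalizing constant
    return {label: score / total for label, score in scores.items()}


result = posterior(["free"])
for label, prob in result.items():
    print(f"P({label} | 'free') = {prob:.3f}")
```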

### Q9. Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes. 

#### A9. All three variants apply Bayes' Theorem, which calculates the conditional probability of an event given prior knowledge, together with the naive independence assumption. They differ in the distribution they assume for each feature given the class. Bayes' Theorem is expressed as:
#### P(A|B) = [P(B|A) * P(A)] / P(B)

#### Bernoulli Naive Bayes is a specific implementation of the Naive Bayes classifier designed for binary data. It assumes:
#### 1. Features are binary (e.g., presence or absence of a word in text classification).
#### 2. Features are conditionally independent given the class label.
#### The Bernoulli model uses the Bernoulli distribution, which calculates the probability of success (1) or failure (0) for each feature. The likelihood of a feature is modeled as:
#### P(xi|y) = p(i|y)^xi * (1 - p(i|y))^(1 - xi)

#### Multinomial Naive Bayes (MNB) is designed for features that represent discrete counts or frequencies of events, such as word counts in documents. This makes it particularly effective for text classification and other natural language processing (NLP) tasks.
#### The term "multinomial" refers to the distribution assumed by the model. The features in text classification are typically word counts or term frequencies, and the multinomial distribution is used to estimate the likelihood of seeing a specific set of word counts in a document of a given class.

#### Gaussian Naive Bayes is designed for continuous features. It assumes that, within each class, every feature follows a normal (Gaussian) distribution, and it estimates a mean and a variance per feature and class from the training data.
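
#### In scikit-learn the three variants are available as `GaussianNB`, `MultinomialNB`, and `BernoulliNB`. The sketch below pairs each one with the kind of feature it expects. The tiny arrays are hypothetical data invented for illustration, far too small for real training.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 0, 1, 1])

# Continuous measurements suit GaussianNB.
X_continuous = np.array([[5.1, 3.5], [4.9, 3.0], [6.7, 3.1], [6.3, 2.5]])

# Non-negative counts (e.g. word counts) suit MultinomialNB.
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 2, 2]])

# Presence/absence flags suit BernoulliNB.
X_binary = (X_counts > 0).astype(int)

experiments = {
    "GaussianNB": (GaussianNB(), X_continuous),
    "MultinomialNB": (MultinomialNB(), X_counts),
    "BernoulliNB": (BernoulliNB(), X_binary),
}

predictions = {}
for name, (model, X) in experiments.items():
    model.fit(X, y)
    predictions[name] = model.predict(X)
    print(f"{name}: predictions on training data = {predictions[name]}")
```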

### Q10. Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and evaluate accuracy. 

```python
# A10.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the Breast Cancer dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize the Gaussian Naïve Bayes classifier
gnb = GaussianNB()

# Train the classifier
gnb.fit(X_train, y_train)

# Make predictions on the test set
y_pred = gnb.predict(X_test)

# Evaluate the accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of Gaussian Naïve Bayes classifier: {accuracy:.2f}")
```

```
Accuracy of Gaussian Naïve Bayes classifier: 0.94
```
