# Supervised Classification: Decision Trees, SVM, and Naive Bayes Assignment

Q 1. What is Information Gain, and how is it used in Decision Trees?

->  Information Gain is a measure from information theory of how well a feature separates the data into classes. Formally, it is the reduction in entropy obtained by splitting the dataset on that feature: the entropy of the parent node minus the size-weighted average entropy of the child nodes. A high Information Gain means the split produces subsets that are purer (less mixed) than the original data.

In Decision Trees, Information Gain is used to select the best attribute at each step of tree construction. The algorithm computes the Information Gain of every available feature, and the feature with the highest gain becomes the root or the next internal node. The process repeats on each resulting subset until a stopping condition is reached. Choosing the most informative split at each step tends to produce compact trees that separate the classes well.
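
The calculation can be illustrated with a minimal sketch. The helper names `entropy` and `information_gain` and the toy arrays are hypothetical, chosen only for illustration; they are not part of scikit-learn.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return float(-np.sum(probs * np.log2(probs)))

def information_gain(labels, feature_values):
    """Parent entropy minus the size-weighted entropy of the child subsets."""
    labels = np.asarray(labels)
    feature_values = np.asarray(feature_values)
    weighted_child_entropy = 0.0
    for value in np.unique(feature_values):
        mask = feature_values == value
        # mask.mean() is the fraction of samples that fall into this child
        weighted_child_entropy += mask.mean() * entropy(labels[mask])
    return entropy(labels) - weighted_child_entropy

# Hypothetical toy data: 8 samples, a binary label, two candidate features
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])
perfect_feature = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
useless_feature = np.array(["a", "b", "a", "b", "a", "b", "a", "b"])

gain_perfect = information_gain(y, perfect_feature)
gain_useless = information_gain(y, useless_feature)
print("Gain (perfect split):", gain_perfect)
print("Gain (useless split):", gain_useless)
```

The first feature splits the classes cleanly and recovers the full 1 bit of parent entropy, while the second leaves both children as mixed as the parent and gains nothing, so a tree builder would pick the first.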

Q 2. What is the difference between Gini Impurity and Entropy?

->   Gini Impurity and Entropy are two main impurity measures used in Decision Trees. Both measure how mixed the data is, but they work in slightly different ways.

Gini Impurity is the probability that a randomly chosen element would be misclassified if it were labeled at random according to the class distribution in the node. It needs no logarithm, so it is slightly cheaper to compute, which can matter on large datasets.

Entropy comes from information theory and measures the amount of randomness or disorder in the data. It involves a logarithm, so it is somewhat more expensive to compute. In practice the two criteria usually select very similar splits.

Gini Impurity is the default criterion in the CART algorithm, while ID3 and C4.5 use entropy-based criteria (information gain and gain ratio, respectively).
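
To see how the two measures behave, the sketch below evaluates both on a few binary class distributions. The function names `gini` and `entropy` are hypothetical helpers written for this illustration.

```python
import numpy as np

def gini(probs):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    probs = np.asarray(probs, dtype=float)
    return float(1.0 - np.sum(probs ** 2))

def entropy(probs):
    """Shannon entropy in bits, treating 0 * log2(0) as 0."""
    probs = np.asarray(probs, dtype=float)
    nonzero = probs[probs > 0]
    return float(-np.sum(nonzero * np.log2(nonzero)))

# Binary node where a fraction p of the samples belongs to class 1
for p in (0.1, 0.3, 0.5):
    dist = [p, 1.0 - p]
    print(f"p={p:.1f}  gini={gini(dist):.3f}  entropy={entropy(dist):.3f}")
```

Both measures are zero for a pure node and largest for a 50/50 node; for two classes Gini peaks at 0.5 and entropy at 1 bit.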

Q 3. What is Pre-Pruning in Decision Trees?

->  Pre-Pruning is a technique that stops a Decision Tree from growing too large during training. Instead of letting the tree expand fully and cutting unnecessary branches afterwards (post-pruning), pre-pruning stops splitting a node as soon as a stopping criterion is met, for example when a further split would not reduce impurity by a minimum amount.

Common pre-pruning criteria include a maximum depth, a minimum number of samples required to split a node, and a minimum number of samples per leaf. These limits reduce overfitting and yield a simpler, more general model. Pre-pruning also shortens training time and often improves performance on unseen data, although limits that are too strict can cause underfitting.
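
In scikit-learn these limits correspond to the `max_depth`, `min_samples_split`, and `min_samples_leaf` parameters of `DecisionTreeClassifier`. The sketch below compares an unrestricted tree with a pre-pruned one; the Iris dataset, the 70/30 split, and the specific limit values are assumptions chosen only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Unrestricted tree: grows until every leaf is pure
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Pre-pruned tree: the limits below are illustrative, not tuned
pruned_tree = DecisionTreeClassifier(
    max_depth=3,
    min_samples_split=10,
    min_samples_leaf=5,
    random_state=42,
).fit(X_train, y_train)

for name, tree in (("Full", full_tree), ("Pre-pruned", pruned_tree)):
    print(
        f"{name}: depth={tree.get_depth()}, leaves={tree.get_n_leaves()}, "
        f"test accuracy={tree.score(X_test, y_test):.3f}"
    )
```

Suitable limits depend on the dataset, so in practice they are usually chosen by cross-validation rather than fixed by hand.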

Q 4. Write a Python program to train a Decision Tree Classifier using Gini 
Impurity as the criterion and print the feature importances (practical). 
    
Hint: Use criterion='gini' in DecisionTreeClassifier and access .feature_importances_. 

->  

# python Code
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Load dataset
data = load_iris()
X = data.data
y = data.target

# Train Decision Tree using Gini
model = DecisionTreeClassifier(criterion='gini')
model.fit(X, y)

# Print feature importances
print("Feature Importances:")
print(model.feature_importances_)

Feature Importances:
[0.01333333 0.01333333 0.55072262 0.42261071]


Q 5. What is a Support Vector Machine (SVM)? 

->  A Support Vector Machine is a supervised machine learning model used for classification and regression. Its main idea is to find the boundary, called a hyperplane, that separates the classes with the maximum margin. The data points closest to the boundary are called support vectors, and they alone determine where the decision boundary lies.

SVM works well even when the number of features is large, and with kernels it can handle both linearly and non-linearly separable data. It often achieves high accuracy on real-world classification problems, although results depend on feature scaling and on the choice of kernel and regularization parameters.

Q 6. What is the Kernel Trick in SVM? 

->  The Kernel Trick is a mathematical method that allows SVM to learn non-linear decision boundaries. Instead of explicitly transforming the data into a higher-dimensional space, a kernel function computes the inner product between pairs of points as if they had been mapped into that space, without ever constructing the mapping.

Common kernels include Linear, Polynomial, and RBF (Gaussian). With the kernel trick, SVM can create complex decision boundaries without paying the cost of working in the high-dimensional space directly. This makes SVM flexible and powerful.
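
A small numerical check makes the idea concrete. For 2-D inputs, the degree-2 polynomial kernel k(x, z) = (x · z)² equals the ordinary dot product of the explicit features (x₁², √2·x₁x₂, x₂²). The function names and sample vectors below are hypothetical, chosen only for illustration.

```python
import numpy as np

def explicit_feature_map(v):
    """Map a 2-D vector into the 3-D space implied by the kernel (x . z)^2."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly_kernel(x, z):
    """Degree-2 polynomial kernel computed directly in the original 2-D space."""
    return float(np.dot(x, z) ** 2)

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

# Explicit route: transform both points, then take the dot product
explicit = float(np.dot(explicit_feature_map(x), explicit_feature_map(z)))

# Kernel route: never leaves the original space
implicit = poly_kernel(x, z)

print("Explicit mapping:", explicit)
print("Kernel function: ", implicit)
```

Both routes give the same value (121, up to floating-point rounding), but the kernel route never builds the higher-dimensional vectors. For the RBF kernel the implied space is infinite-dimensional, so the explicit route is not even available.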

Q 7. Write a Python program to train two SVM classifiers with Linear and RBF 
kernels on the Wine dataset, then compare their accuracies. 

Hint: Use SVC(kernel='linear') and SVC(kernel='rbf'), then compare accuracy scores after fitting 
on the same dataset.

->  Python Code :

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load dataset
data = load_wine()
X = data.data
y = data.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Linear SVM
svm_linear = SVC(kernel='linear')
svm_linear.fit(X_train, y_train)
acc_linear = accuracy_score(y_test, svm_linear.predict(X_test))

# RBF SVM
svm_rbf = SVC(kernel='rbf')
svm_rbf.fit(X_train, y_train)
acc_rbf = accuracy_score(y_test, svm_rbf.predict(X_test))

print("Linear Kernel Accuracy:", acc_linear)
print("RBF Kernel Accuracy:", acc_rbf)

Linear Kernel Accuracy: 0.9814814814814815
RBF Kernel Accuracy: 0.7592592592592593
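
The gap between the two kernels is likely due in large part to feature scaling rather than to the kernels themselves: the Wine features span very different ranges (proline is in the hundreds while most others are small), and the RBF kernel is sensitive to such differences. The sketch below repeats the comparison with standardized features, which is an addition to the original exercise; it uses the same split as above and scikit-learn's `StandardScaler` inside a pipeline.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

scaled_scores = {}
for kernel in ("linear", "rbf"):
    # The scaler is fitted on the training data only, inside the pipeline
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    clf.fit(X_train, y_train)
    scaled_scores[kernel] = clf.score(X_test, y_test)
    print(f"{kernel} kernel accuracy (scaled features): {scaled_scores[kernel]:.4f}")
```

Keeping the scaler inside the pipeline avoids leaking test-set statistics into training.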


Q 8. What is the Naïve Bayes classifier, and why is it called "Naïve"?

->  Naïve Bayes is a probabilistic classification algorithm based on Bayes' Theorem. It predicts the class of a sample by computing the probability of each class given the input features and choosing the most probable one. It is widely used for text classification, spam filtering, and sentiment analysis.

It is called “Naïve” because it assumes that all features are conditionally independent of each other given the class, which is rarely true in real data. Even with this simplifying assumption, Naïve Bayes often performs surprisingly well and is fast to train.
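
Written out, the independence assumption turns Bayes' Theorem into a simple product. In the formula below, the symbols are notation introduced here for illustration: $c$ is a class, $x_1, \dots, x_n$ are the feature values of a sample, and $\hat{y}$ is the predicted class.

```latex
P(c \mid x_1, \dots, x_n) \;\propto\; P(c) \prod_{i=1}^{n} P(x_i \mid c),
\qquad
\hat{y} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

Each factor $P(x_i \mid c)$ is estimated separately from the training data, which is why training is fast.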

Q 9. Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve 
Bayes, and Bernoulli Naïve Bayes.

->  Gaussian Naïve Bayes: Used when features are continuous and are assumed to follow a normal distribution within each class. Works well on datasets like Iris and medical measurements.

Multinomial Naïve Bayes: Used for discrete counts, especially in text data such as word frequency or term counts in documents.

Bernoulli Naïve Bayes: Used when features are binary (0 or 1). Useful in cases like “word present or not present” in text data.

Each type fits different situations based on the nature of the input features.
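
The three variants are available in scikit-learn as `GaussianNB`, `MultinomialNB`, and `BernoulliNB`. The sketch below pairs each one with the kind of input it expects; the tiny arrays are hypothetical toy data, not drawn from any real dataset.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Hypothetical toy data: 4 samples, 2 classes
y = np.array([0, 0, 1, 1])
X_continuous = np.array([[5.1, 3.5], [4.9, 3.0], [6.7, 3.1], [6.3, 2.9]])  # measurements
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])          # word counts
X_binary = (X_counts > 0).astype(int)                                       # word present or not

models = {
    "GaussianNB": (GaussianNB(), X_continuous),
    "MultinomialNB": (MultinomialNB(), X_counts),
    "BernoulliNB": (BernoulliNB(), X_binary),
}

predictions = {}
for name, (model, X) in models.items():
    model.fit(X, y)
    predictions[name] = model.predict(X)
    print(name, "predictions on the training samples:", predictions[name])
```

Predicting on the training samples is only a smoke test here; a real evaluation would use held-out data.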

Q 10. Breast Cancer Dataset 

Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer 
dataset and evaluate accuracy. 
    
Hint: Use GaussianNB() from sklearn.naive_bayes and the Breast Cancer dataset from 
sklearn.datasets.

->  
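
A minimal sketch of such a program, following the hint. The 70/30 train-test split and `random_state=42` mirror the settings used in Q 7 and are assumptions; the question does not specify them.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Load dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Train-test split (assumed 70/30, as in Q 7)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train Gaussian Naive Bayes
model = GaussianNB()
model.fit(X_train, y_train)

# Evaluate accuracy on the held-out test set
accuracy = accuracy_score(y_test, model.predict(X_test))
print("Gaussian Naive Bayes Accuracy:", accuracy)
```

Gaussian Naïve Bayes fits here because all 30 features in this dataset are continuous measurements.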