Supervised Classification: Decision Trees, SVM, and Naive Bayes

In [None]:
'''
Question 1 : What is Information Gain, and how is it used in Decision Trees?
Answer:      Information Gain measures how much uncertainty (entropy) about the target variable is removed by splitting the data on a particular feature.
             It is computed as the entropy of the parent node minus the size-weighted average entropy of the child nodes. In Decision Trees, the feature
             with the highest Information Gain, i.e. the one that best separates the classes, is chosen for the split at each node, leading to purer child nodes.

Question 2: What is the difference between Gini Impurity and Entropy?
Answer:     Both Gini Impurity and Entropy measure how mixed the classes are within a node. Entropy uses logarithms and measures information content,
            while Gini measures the probability of misclassifying a randomly chosen sample if it were labelled at random according to the node's class distribution.
            Gini avoids the logarithm, so it is slightly cheaper to compute and is the usual choice in CART, while Entropy is used in ID3/C4.5.
            In practice the two criteria usually produce very similar trees.

Question 3: What is Pre-Pruning in Decision Trees?
Answer:    Pre-pruning in Decision Trees is a technique used to stop the tree from growing too large during its construction. 
           It sets conditions such as maximum depth, minimum number of samples required to split a node, or minimum Information Gain. 
           This prevents overfitting by halting further splits that do not significantly improve model accuracy, resulting in a simpler 
           and more generalizable tree.           
'''           
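
The definitions in Questions 1 and 2 can be checked by hand on a small example. The sketch below is a minimal illustration, not how scikit-learn computes these quantities internally; the helper names (entropy, gini, information_gain) and the toy labels are assumptions made up for this example.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def gini(labels):
    """Gini impurity of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

# Hypothetical toy node: 5 samples of class 0 and 5 of class 1
parent = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
# Hypothetical split: left gets four 0s and one 1, right gets one 0 and four 1s
left = np.array([0, 0, 0, 0, 1])
right = np.array([0, 1, 1, 1, 1])

print("Parent entropy:", entropy(parent))
print("Parent Gini:", gini(parent))
print("Information gain of the split:", information_gain(parent, [left, right]))
```

A perfectly balanced two-class node has the maximum entropy (1 bit) and the maximum Gini impurity (0.5); any split that makes the children purer yields a positive Information Gain.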

In [4]:
'''
Question 4: Write a Python program to train a Decision Tree Classifier using Gini
Impurity as the criterion and print the feature importances (practical).
Answer:
'''
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

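# Load the Iris dataset (150 samples, 4 features, 3 classes)
iris = load_iris()
X, y = iris.data, iris.target

# Hold out 30% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train a decision tree that uses Gini impurity as the split criterion
model = DecisionTreeClassifier(criterion='gini', random_state=42)
model.fit(X_train, y_train)

# Accuracy on the held-out test set
accuracy = model.score(X_test, y_test)
print("Accuracy:", accuracy)

# Impurity-based importance of each feature (the values sum to 1)
for name, importance in zip(iris.feature_names, model.feature_importances_):
    print(f"{name}: {importance:.4f}")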

Accuracy: 1.0
sepal length (cm): 0.0000
sepal width (cm): 0.0191
petal length (cm): 0.8933
petal width (cm): 0.0876
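
The pre-pruning conditions described in Question 3 correspond to constructor parameters of DecisionTreeClassifier (max_depth, min_samples_split, min_impurity_decrease). The sketch below reuses the same Iris split as above; the specific threshold values are arbitrary assumptions chosen for illustration, not tuned settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Baseline: the tree grows until every leaf is pure
full_tree = DecisionTreeClassifier(criterion='gini', random_state=42).fit(X_train, y_train)

# Pre-pruned tree: growth stops early when any condition is hit
# (illustrative values, assumed for this example)
pruned_tree = DecisionTreeClassifier(
    criterion='gini',
    max_depth=3,                 # never grow deeper than 3 levels
    min_samples_split=10,        # do not split nodes with fewer than 10 samples
    min_impurity_decrease=0.01,  # skip splits that barely reduce impurity
    random_state=42,
).fit(X_train, y_train)

for label, tree in [("Unpruned", full_tree), ("Pre-pruned", pruned_tree)]:
    print(f"{label}: depth={tree.get_depth()}, leaves={tree.get_n_leaves()}, "
          f"test accuracy={tree.score(X_test, y_test):.4f}")
```

Comparing depth, leaf count, and test accuracy side by side shows how much simpler the constrained tree is and whether that simplicity costs any accuracy on this split.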


In [None]:
'''
Question 5: What is a Support Vector Machine (SVM)?
Answer:     A Support Vector Machine (SVM) is a supervised learning algorithm used for classification and regression tasks. 
            It works by finding the optimal hyperplane that best separates data points of different classes with the maximum margin. 
            The data points closest to this boundary are called support vectors, as they directly influence the position of the hyperplane. 
            SVMs are effective in high-dimensional spaces and can handle both linear and non-linear data using kernel functions.

Question 6: What is the Kernel Trick in SVM?
Answer:     The Kernel Trick in SVM is a mathematical technique that allows the algorithm to handle non-linear data by implicitly mapping 
            it into a higher-dimensional space without explicitly performing the transformation. This helps the SVM find a linear separating 
            hyperplane in that transformed space. Common kernel functions include linear, polynomial, and RBF (Radial Basis Function) kernels, 
            which enable SVMs to model complex, non-linear decision boundaries efficiently.
'''            
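
The "implicit mapping" in Question 6 can be made concrete with a degree-2 polynomial kernel on 2-D inputs. The sketch below is a minimal illustration: the function names (poly_kernel, explicit_map) and the two sample vectors are assumptions made up for this example. It shows that the kernel value computed in the original 2-D space equals the dot product after an explicit mapping into 3-D space.

```python
import numpy as np

def poly_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: K(x, z) = (x . z)^2."""
    return float(np.dot(x, z) ** 2)

def explicit_map(x):
    """Explicit feature map for the same kernel on 2-D inputs:
    phi(x) = [x1^2, sqrt(2) * x1 * x2, x2^2]."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

# Two hypothetical 2-D points
x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

# Kernel trick: work only with the original 2-D vectors
kernel_value = poly_kernel(x, z)

# Explicit route: map both points to 3-D, then take the dot product
explicit_value = float(np.dot(explicit_map(x), explicit_map(z)))

print("Kernel value:        ", kernel_value)
print("Explicit dot product:", explicit_value)
```

Both routes give the same number, but the kernel never builds the higher-dimensional vectors. For the RBF kernel the corresponding feature space is infinite-dimensional, so the kernel is the only practical route.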

In [2]:
'''
Question 7: Write a Python program to train two SVM classifiers with Linear and RBF
kernels on the Wine dataset, then compare their accuracies.
Answer:
'''
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

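# Load the Wine dataset (178 samples, 13 numeric features, 3 classes)
wine = load_wine()
X, y = wine.data, wine.target

# Stratified 75/25 split keeps the class proportions the same in both sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42, stratify=y)

# Standardize the features; fit the scaler on the training set only so no test statistics leak in
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# SVM with a linear kernel
svm_linear = SVC(kernel='linear', random_state=42)
svm_linear.fit(X_train, y_train)
linear_acc = svm_linear.score(X_test, y_test)

# SVM with an RBF kernel (default C and gamma)
svm_rbf = SVC(kernel='rbf', random_state=42)
svm_rbf.fit(X_train, y_train)
rbf_acc = svm_rbf.score(X_test, y_test)

# Compare the test-set accuracies of the two kernels
print("Linear Kernel Accuracy:", linear_acc)
print("RBF Kernel Accuracy:", rbf_acc)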

Linear Kernel Accuracy: 0.9555555555555556
RBF Kernel Accuracy: 0.9777777777777777


In [None]:
'''
Question 8: What is the Naïve Bayes classifier, and why is it called "Naïve"?
Answer:     The Naïve Bayes classifier is a probabilistic machine learning algorithm based on Bayes’ Theorem, which predicts
            the class of a data point by calculating the probability of each class given the input features. It is called “Naïve” 
            because it assumes that all features are independent of each other given the class label — an assumption that is rarely 
            true in real-world data but simplifies computation. Despite this simplification, Naïve Bayes often performs remarkably well, 
            especially in text classification and spam detection.

Question 9: Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes.
Answer:     The three types of Naïve Bayes classifiers differ based on the type of data they handle. Gaussian Naïve Bayes is used for
            continuous data and assumes that features follow a normal (Gaussian) distribution. Multinomial Naïve Bayes is suitable for 
            count-based features, such as word frequencies in text classification. Bernoulli Naïve Bayes is designed for binary features, 
            where each feature represents the presence or absence of a particular attribute (e.g., whether a word appears in a document).
'''
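
The three variants from Question 9 are available in sklearn.naive_bayes as GaussianNB, MultinomialNB, and BernoulliNB. The sketch below pairs each one with the kind of input it expects; the tiny arrays are hypothetical toy data invented for this example, far too small to say anything about real accuracy.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

# Hypothetical labels for four samples (two per class)
y = np.array([0, 0, 1, 1])

# Continuous measurements -> GaussianNB
X_continuous = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.2], [4.1, 3.9]])

# Non-negative counts (e.g. word frequencies) -> MultinomialNB
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])

# Binary presence/absence indicators -> BernoulliNB
X_binary = (X_counts > 0).astype(int)

models = {
    "GaussianNB": (GaussianNB(), X_continuous),
    "MultinomialNB": (MultinomialNB(), X_counts),
    "BernoulliNB": (BernoulliNB(), X_binary),
}

predictions = {}
for name, (model, X) in models.items():
    model.fit(X, y)
    # Predict on the training data only to show that the API is identical
    predictions[name] = model.predict(X)
    print(f"{name}: {predictions[name]}")
```

All three classes share the same fit/predict interface; what differs is the per-feature likelihood each one assumes, so the choice follows from how the features are encoded.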

In [3]:
'''
Question 10: Breast Cancer Dataset
Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer
dataset and evaluate accuracy.
Answer:
'''
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

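# Load the Breast Cancer Wisconsin dataset (30 numeric features, binary target)
data = load_breast_cancer()
# Hold out 25% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.25, random_state=42)
# Fit Gaussian Naive Bayes, which models each feature as normally distributed within each class
model = GaussianNB().fit(X_train, y_train)

# Accuracy on the held-out test set
print("Accuracy:", model.score(X_test, y_test))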

Accuracy: 0.958041958041958
