# **SVM and Naive Bayes Assignment**

Q1. What is Information Gain, and how is it used in Decision Trees?

Ans1. Information Gain (IG) is a measure from information theory that quantifies how much “information” (in the sense of reduction in uncertainty) you get about a target variable by knowing the value of some attribute. Information Gain in a decision tree is used as a criterion to pick which feature to split on at each node: you compute the decrease in **uncertainty (entropy)** about the target class when you split the data based on a particular feature, and choose the feature that yields the **largest drop** (i.e. the highest information gain).
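As a rough illustration of the computation described above, the sketch below takes information gain to be the parent node's entropy minus the size-weighted entropy of its children. The helper names (`entropy_of_labels`, `information_gain`) and the toy "yes"/"no" labels are made up for this example and are not part of any library.

```python
import math
from collections import Counter


def entropy_of_labels(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the children."""
    total = len(parent)
    weighted_child_entropy = sum(
        len(child) / total * entropy_of_labels(child) for child in children
    )
    return entropy_of_labels(parent) - weighted_child_entropy


# Hypothetical node with 5 "yes" and 5 "no" examples (entropy = 1 bit).
parent = ["yes"] * 5 + ["no"] * 5

# Two hypothetical candidate splits of the same node.
split_a = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]          # fairly pure children
split_b = [["yes"] * 3 + ["no"] * 2, ["yes"] * 2 + ["no"] * 3]  # still mixed children

ig_a = information_gain(parent, split_a)  # about 0.278 bits
ig_b = information_gain(parent, split_b)  # about 0.029 bits
print(f"IG(split A) = {ig_a:.3f}, IG(split B) = {ig_b:.3f}")
```

A tree builder using this criterion would prefer split A, because it removes more uncertainty about the class.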

Q2. What is the difference between Gini Impurity and Entropy?

Ans2. Gini Impurity measures how often a randomly chosen item from the node would be misclassified if it were labeled at random according to the node's class proportions (lower is purer). Entropy, drawn from information theory, measures the amount of uncertainty or “disorder” in the class distribution. Both are zero when all items belong to the same class and largest when the classes are evenly mixed (0.5 for Gini and 1 bit for Entropy with two classes). Gini is computationally simpler (no logarithms) and is therefore slightly cheaper to evaluate, whereas Entropy responds somewhat more strongly to changes in the class probabilities; in practice the two criteria usually select very similar splits.
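To make the comparison concrete, here is a small sketch that evaluates both measures on a few two-class distributions. The function names `gini_impurity` and `shannon_entropy` are illustrative only.

```python
import math


def gini_impurity(probs):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    return 1.0 - sum(p * p for p in probs)


def shannon_entropy(probs):
    """Entropy in bits; classes with zero probability contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# p is the proportion of the first class in a two-class node.
for p in (0.0, 0.1, 0.3, 0.5):
    dist = (p, 1.0 - p)
    print(
        f"p={p:.1f}  gini={gini_impurity(dist):.3f}  "
        f"entropy={shannon_entropy(dist):.3f}"
    )
```

Both measures rise from zero for a pure node to their maximum at a 50/50 mix, which is why they tend to rank candidate splits in a similar order.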

Q3. What is Pre-Pruning in Decision Trees?

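Ans3. Pre-Pruning (also called early stopping) is a technique used during the construction of a decision tree: instead of letting the tree grow until every leaf is “pure,” you impose stopping criteria (for example a maximum depth, a minimum number of samples required to split a node, or a minimum number of samples per leaf), so a branch stops growing as soon as a further split would violate one of them. This limits the size of the tree and reduces the risk of overfitting.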
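One possible way to try pre-pruning with scikit-learn is through the constructor parameters `max_depth`, `min_samples_split`, and `min_samples_leaf` of `DecisionTreeClassifier`. The values chosen below (3, 10, 5) are arbitrary assumptions for illustration, not tuned settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Unconstrained tree: grows until leaves are pure (or cannot be split further).
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Pre-pruned tree: growth stops early when any constraint would be violated.
pruned_tree = DecisionTreeClassifier(
    max_depth=3,           # never grow deeper than 3 levels
    min_samples_split=10,  # a node needs at least 10 samples to be split
    min_samples_leaf=5,    # every leaf must keep at least 5 samples
    random_state=42,
).fit(X_train, y_train)

for name, tree in [("unconstrained", full_tree), ("pre-pruned", pruned_tree)]:
    print(
        f"{name}: depth={tree.get_depth()}, leaves={tree.get_n_leaves()}, "
        f"test accuracy={tree.score(X_test, y_test):.3f}"
    )
```

Comparing depth, leaf count, and test accuracy of the two trees gives a quick sense of how much complexity the constraints remove.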



In [1]:
#Q4. Write a Python program to train a Decision Tree Classifier using Gini Impurity as the criterion and print the feature importances (practical).
#Ans4.

# Example: Decision Tree Classifier (Gini) + Feature Importances

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 1. Load dataset (for demonstration, using Iris dataset)
data = load_iris()
X, y = data.data, data.target
feature_names = data.feature_names

# 2. Split into train and test (optional but a good practice)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# 3. Create and train the classifier with Gini impurity criterion
clf = DecisionTreeClassifier(criterion='gini', random_state=42)
clf.fit(X_train, y_train)

# 4. Print feature importances
importances = clf.feature_importances_
print("Feature importances (using Gini):")
for name, importance in zip(feature_names, importances):
    print(f"{name}: {importance:.4f}")

# Optional: show some model details
print("\nTree depth:", clf.get_depth())
print("Number of leaves:", clf.get_n_leaves())


Feature importances (using Gini):
sepal length (cm): 0.0000
sepal width (cm): 0.0191
petal length (cm): 0.8933
petal width (cm): 0.0876

Tree depth: 6
Number of leaves: 10


Q5. What is a Support Vector Machine (SVM)?

Ans5. A Support Vector Machine is a supervised machine-learning algorithm that finds a hyperplane (in a space defined by the features) that best separates data points of different classes. It chooses the hyperplane that maximizes the “margin” — the distance between the hyperplane and the nearest data points of any class (the support vectors), which helps SVM generalize well to unseen data.
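The margin idea can be sketched without any library. The snippet below assumes a hyperplane written as w·x + b = 0 and compares the geometric margin of two hand-picked candidate hyperplanes on a made-up four-point dataset; neither candidate is claimed to be the optimal SVM solution.

```python
import math


def geometric_margin(w, b, points, labels):
    """Smallest signed distance y * (w.x + b) / ||w|| over all points.

    A negative value means at least one point is on the wrong side.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        label * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, label in zip(points, labels)
    )


# Toy 2-D data (hypothetical): two points per class, labels are +1 / -1.
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (1.0, 0.0)]
labels = [1, 1, -1, -1]

# Two candidate separating hyperplanes, chosen by hand.
margin_a = geometric_margin((1.0, 1.0), -2.5, points, labels)  # line x1 + x2 = 2.5
margin_b = geometric_margin((0.0, 1.0), -1.0, points, labels)  # line x2 = 1

print(f"margin A = {margin_a:.3f}, margin B = {margin_b:.3f}")
```

Both lines separate the classes, but an SVM would favor the one whose nearest points lie farther away, which here is candidate A.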

Q6. What is the Kernel Trick in SVM?

Ans6. The Kernel Trick is a technique used by Support Vector Machine to handle data that are **not linearly separable** in the original feature space. Instead of explicitly transforming the data into a higher-dimensional space (which could be computationally expensive or even infinite-dimensional), SVM uses a **kernel function** that computes the dot product of the data points as if they were in that higher-dimensional space — all without explicitly doing the transformation. This “trick” allows the SVM to define a **linear hyperplane in a transformed (higher-dimensional) space**, which when projected back corresponds to a **non-linear decision boundary in the original space** — enabling SVMs to classify complex, non-linear data efficiently.
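A small numeric check can make the idea tangible. The sketch below uses the degree-2 polynomial kernel k(x, z) = (x·z)² for 2-D inputs, whose explicit feature map is φ(x) = (x₁², √2·x₁x₂, x₂²). The function names are illustrative; the point is only that the kernel value equals the dot product in the mapped space.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def phi(x):
    """Explicit degree-2 feature map for a 2-D point."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(x, z):
    """Degree-2 polynomial kernel, computed in the original 2-D space."""
    return dot(x, z) ** 2


x, z = (1.0, 2.0), (3.0, 0.5)

explicit = dot(phi(x), phi(z))  # map to 3-D first, then take the dot product
implicit = poly_kernel(x, z)    # never leaves the 2-D input space

print(f"explicit = {explicit:.6f}, implicit = {implicit:.6f}")
```

For kernels such as RBF the mapped space is infinite-dimensional, so only the implicit route is available, which is what makes the trick useful.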



In [2]:
#Q7. Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies.
#Ans7.

# SVM on Wine dataset: compare Linear vs RBF kernel
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# 1. Load the data
wine = load_wine()
X, y = wine.data, wine.target

# 2. Split into train/test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y
)

# 3. Feature scaling — important for SVM
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 4. Train SVM with Linear kernel
svm_linear = SVC(kernel='linear', C=1.0, random_state=42)
svm_linear.fit(X_train_scaled, y_train)
y_pred_linear = svm_linear.predict(X_test_scaled)
acc_linear = accuracy_score(y_test, y_pred_linear)

# 5. Train SVM with RBF kernel
svm_rbf = SVC(kernel='rbf', C=1.0, gamma='scale', random_state=42)
svm_rbf.fit(X_train_scaled, y_train)
y_pred_rbf = svm_rbf.predict(X_test_scaled)
acc_rbf = accuracy_score(y_test, y_pred_rbf)

# 6. Print accuracies
print(f"Accuracy (Linear kernel): {acc_linear * 100:.2f}%")
print(f"Accuracy (RBF kernel):    {acc_rbf * 100:.2f}%")


Accuracy (Linear kernel): 96.30%
Accuracy (RBF kernel):    98.15%


Q8. What is the Naïve Bayes classifier, and why is it called "Naïve"?

Ans8. The Naive Bayes classifier is a simple probabilistic classification algorithm that uses Bayes' Theorem to compute the probability that a given example belongs to each class, based on its features. It is called “naïve” because it makes a strong simplifying assumption: all features are conditionally independent of each other, given the class label. That is, it treats each feature as contributing on its own to the probability of the class, even though in reality features are often correlated. This assumption is often unrealistic (hence “naïve”), but it makes the model very simple and computationally efficient, and Naive Bayes still works well in many real-world tasks (especially those with many features, like text classification).
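The independence assumption means the posterior for each class is proportional to the class prior multiplied by the per-feature likelihoods. The sketch below applies that rule by hand; the classes, words, and all probability values are invented purely for illustration.

```python
import math

# Hypothetical class priors P(class).
priors = {"spam": 0.4, "ham": 0.6}

# Hypothetical per-word likelihoods P(word appears | class).
likelihoods = {
    "spam": {"free": 0.8, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.5},
}


def posterior(words):
    """Naive Bayes posterior: prior times the product of word likelihoods,
    normalized so the class probabilities sum to 1."""
    scores = {
        cls: priors[cls] * math.prod(likelihoods[cls][w] for w in words)
        for cls in priors
    }
    total = sum(scores.values())
    return {cls: score / total for cls, score in scores.items()}


post_free = posterior(["free"])             # spam is about 0.842
post_both = posterior(["free", "meeting"])  # spam is about 0.516

print(post_free)
print(post_both)
```

Each word multiplies in its own likelihood independently of the others, which is exactly the "naïve" step.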

Q9. Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes.

Ans9. Gaussian NB assumes the features are continuous and follow a normal (Gaussian) distribution within each class, so it works best with real-valued numeric data. Multinomial NB expects discrete features representing counts or frequencies (e.g. word counts in documents) and models them with a multinomial distribution. Bernoulli NB, by contrast, assumes binary features (presence/absence of an attribute), making it suited to data where you only care whether something occurs, not how often.
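As a rough side-by-side, the sketch below fits scikit-learn's `GaussianNB`, `MultinomialNB`, and `BernoulliNB` on tiny made-up arrays, each shaped like the kind of input that variant expects. The data are assumptions chosen only to show the input types, not a benchmark.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 0, 1, 1])

# Continuous measurements -> Gaussian NB
X_continuous = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.0], [4.1, 4.2]])

# Non-negative counts (e.g. word counts) -> Multinomial NB
X_counts = np.array([[3, 0], [2, 1], [0, 4], [1, 3]])

# Presence/absence flags derived from the counts -> Bernoulli NB
X_binary = (X_counts > 0).astype(int)

gaussian = GaussianNB().fit(X_continuous, y)
multinomial = MultinomialNB().fit(X_counts, y)
bernoulli = BernoulliNB().fit(X_binary, y)

print("GaussianNB:   ", gaussian.predict(X_continuous))
print("MultinomialNB:", multinomial.predict(X_counts))
print("BernoulliNB:  ", bernoulli.predict(X_binary))
```

Note that binarizing the counts throws away how often a feature occurs, so the Bernoulli model sees two identical rows with different labels here and has less to work with than the multinomial one.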

In [3]:
#Q10. Breast Cancer Dataset: Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and evaluate accuracy.
#Ans10.

# Gaussian Naive Bayes on Breast Cancer dataset

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# 1. Load data
data = load_breast_cancer()
X, y = data.data, data.target

# 2. Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y
)

# 3. Create and train GaussianNB classifier
gnb = GaussianNB()
gnb.fit(X_train, y_train)

# 4. Make predictions on test set
y_pred = gnb.predict(X_test)

# 5. Evaluate model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy on test data: {accuracy * 100:.2f}%\n")

print("Classification report:")
print(classification_report(y_test, y_pred, target_names=data.target_names))

print("Confusion matrix:")
print(confusion_matrix(y_test, y_pred))


Accuracy on test data: 94.74%

Classification report:
              precision    recall  f1-score   support

   malignant       0.97      0.89      0.93        64
      benign       0.94      0.98      0.96       107

    accuracy                           0.95       171
   macro avg       0.95      0.94      0.94       171
weighted avg       0.95      0.95      0.95       171

Confusion matrix:
[[ 57   7]
 [  2 105]]
