
# SVM & Naïve Bayes — Complete Assignment Solutions


## 1. What is a Support Vector Machine (SVM)?

A Support Vector Machine (SVM) is a supervised learning algorithm used for classification and regression. For classification, it finds the hyperplane that separates the classes with the maximum margin, which tends to improve generalization to unseen data.

## 2. Difference between Hard Margin and Soft Margin SVM

Hard Margin SVM assumes the data are perfectly linearly separable and allows no misclassification. Soft Margin SVM tolerates some margin violations and misclassifications through slack variables, which makes it suitable for noisy or overlapping data.

## 3. Mathematical intuition behind SVM

SVM solves an optimization problem that maximizes the margin while minimizing classification error. It focuses only on critical points (support vectors).
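One common way to write this optimization problem is the soft-margin primal form. The symbols used here ($w$, $b$, $\xi_i$, $C$) are standard textbook notation and are an assumption of this sketch, not something defined elsewhere in this notebook:

$$
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i\,(w^\top x_i + b)\ \ge\ 1-\xi_i,\qquad \xi_i \ge 0 .
$$

The margin width is $2/\lVert w\rVert$, so minimizing $\lVert w\rVert^2$ maximizes the margin, while the slack terms $\xi_i$ measure margin violations. Forcing every $\xi_i = 0$ gives the hard-margin case.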

## 4. Role of Lagrange Multipliers in SVM

Lagrange multipliers convert the constrained optimization problem into its dual form. The dual depends on the training data only through inner products, which is what makes kernel-based computation possible.
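As a sketch in standard textbook notation (the multipliers $\alpha_i$, the labels $y_i \in \{-1, +1\}$, the penalty $C$ and the kernel $K$ are assumptions of this example), the dual of the soft-margin problem can be written as:

$$
\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i \;-\; \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\,\alpha_j\,y_i\,y_j\,K(x_i, x_j)
\quad\text{subject to}\quad
0 \le \alpha_i \le C,\qquad \sum_{i=1}^{n}\alpha_i\,y_i = 0 .
$$

The training points enter only through $K(x_i, x_j)$, which is $x_i^\top x_j$ in the linear case. Points with $\alpha_i > 0$ are the support vectors.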

## 5. Support Vectors in SVM

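Support vectors are the data points that lie on or inside the margin, closest to the decision boundary. They alone determine the position of the hyperplane; removing any other training point leaves the solution unchanged.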
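A minimal sketch of how the support vectors can be inspected in scikit-learn, assuming a tiny made-up 2-D dataset (the points below are illustrative, not from the assignment):

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated toy clusters (hypothetical data)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = SVC(kernel="linear", C=1.0).fit(X, y)

# Indices and coordinates of the training points that define the boundary
print("Support vector indices:", model.support_)
print("Support vectors:\n", model.support_vectors_)
print("Support vectors per class:", model.n_support_)
```

Only the points nearest the gap between the clusters should appear; the points farther from the boundary do not affect the fitted hyperplane.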

## 6. Support Vector Classifier (SVC)

SVC is the classification version of SVM used to separate categorical classes.

## 7. Support Vector Regressor (SVR)

SVR is the SVM variant for regression. It fits a function that ignores errors smaller than epsilon (the epsilon-insensitive tube) and penalizes only the deviations that fall outside it.
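The effect of epsilon can be sketched on synthetic data. The noisy sine curve, the seed and the two epsilon values below are assumptions chosen only for illustration:

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression problem: a sine wave plus Gaussian noise
rng = np.random.default_rng(0)
X = np.linspace(0, 5, 60).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=60)

# Points inside the epsilon tube carry no loss and are not support vectors,
# so a wider tube should leave fewer support vectors
n_sv = {}
for eps in (0.01, 0.5):
    model = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X, y)
    n_sv[eps] = len(model.support_)
    print(f"epsilon={eps}: {n_sv[eps]} support vectors")
```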

## 8. Kernel Trick in SVM

The kernel trick computes inner products in a higher‑dimensional feature space directly from the original inputs, so SVM can learn non‑linear boundaries without ever computing the transformation explicitly.
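A small numerical check can make this concrete. The sketch below uses the homogeneous degree-2 polynomial kernel on 2-D inputs; the helper names `phi` and `poly_kernel` are hypothetical, introduced only for this example:

```python
import numpy as np

def phi(v):
    """Explicit feature map for the kernel (a . b)^2 on 2-D inputs."""
    x1, x2 = v
    return np.array([x1 * x1, np.sqrt(2) * x1 * x2, x2 * x2])

def poly_kernel(a, b):
    """Same inner product, computed without leaving the 2-D input space."""
    return np.dot(a, b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])

explicit = np.dot(phi(a), phi(b))   # inner product in the 3-D feature space
implicit = poly_kernel(a, b)        # kernel evaluated on the raw inputs

print(explicit, implicit)  # both are 121 up to floating-point rounding
```

The two numbers agree, but the kernel version never builds the 3-D vectors. For kernels such as RBF the implied feature space is infinite-dimensional, so only the kernel route is feasible.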

## 9. Linear vs Polynomial vs RBF Kernel

Linear works for linearly separable data, Polynomial captures curved boundaries, and RBF handles complex non‑linear patterns.

## 10. Effect of C parameter in SVM

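C controls the trade‑off between margin width and training errors. A high C penalizes misclassification heavily, giving a narrower margin and fewer training errors at the risk of overfitting; a low C tolerates more errors in exchange for a wider margin.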
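One way to see this trade-off is to measure the margin width, 2 / ||w||, of a linear SVM at two values of C. The overlapping blobs below are synthetic, and the centers, spread and seed are assumptions made for this sketch:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping synthetic clusters, so no hyperplane separates them perfectly
X, y = make_blobs(n_samples=100, centers=[[0, 0], [2, 2]],
                  cluster_std=1.5, random_state=0)

width = {}
for C in (0.01, 100):
    model = SVC(kernel="linear", C=C).fit(X, y)
    # For a linear kernel, coef_ holds w; the margin width is 2 / ||w||
    width[C] = 2 / np.linalg.norm(model.coef_)
    print(f"C={C}: margin width={width[C]:.2f}, "
          f"support vectors={len(model.support_)}")
```

The small C should report the wider margin, with more points falling inside it.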

## 11. Role of Gamma in RBF Kernel

Gamma controls how far the influence of a single training point reaches. A high gamma means a short reach, producing tightly curved boundaries that can overfit; a low gamma means a long reach and smoother decision boundaries.
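The reach of a point can be read directly off the RBF kernel formula, exp(-gamma * ||a - b||^2). In the sketch below the two points and the gamma values are arbitrary choices for illustration, and `rbf_kernel_value` is a hypothetical helper:

```python
import numpy as np

def rbf_kernel_value(a, b, gamma):
    """RBF similarity between two points: exp(-gamma * squared distance)."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

a = np.array([0.0, 0.0])
b = np.array([1.0, 1.0])   # squared distance from a is 2

values = {}
for gamma in (0.1, 1.0, 10.0):
    values[gamma] = rbf_kernel_value(a, b, gamma)
    print(f"gamma={gamma}: similarity={values[gamma]:.6f}")
```

As gamma grows, the similarity between the same two points falls toward zero, so each training point influences only its immediate neighborhood.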

## 12. Naïve Bayes classifier

Naïve Bayes is a probabilistic classifier based on Bayes’ Theorem that assumes the features are conditionally independent given the class.

## 13. Bayes’ Theorem

Bayes’ Theorem calculates conditional probability: P(A|B) = P(B|A)P(A)/P(B).
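A short worked example of the formula, with A = "message is spam" and B = "message contains a given word". All three input probabilities are hypothetical numbers picked for easy arithmetic:

```python
# Hypothetical inputs, chosen only to illustrate the formula
p_spam = 0.2               # P(A): prior probability of spam
p_word_given_spam = 0.6    # P(B|A)
p_word_given_ham = 0.05    # P(B|not A)

# Law of total probability gives the denominator P(B)
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes' Theorem: P(A|B) = P(B|A) P(A) / P(B)
p_spam_given_word = p_word_given_spam * p_spam / p_word

print(f"P(word) = {p_word:.2f}")                    # 0.16
print(f"P(spam | word) = {p_spam_given_word:.2f}")  # 0.75
```

Seeing the word raises the probability of spam from the 20% prior to 75%.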

## 14. Gaussian vs Multinomial vs Bernoulli Naïve Bayes

Gaussian is used for continuous data, Multinomial for text/count data, and Bernoulli for binary features.

## 15. When to use Gaussian Naïve Bayes

Use Gaussian Naïve Bayes when features are continuous and approximately normally distributed.

## 16. Key assumptions of Naïve Bayes

Features are conditionally independent given the class, and each feature contributes equally to the prediction.

## 17. Advantages and disadvantages of Naïve Bayes

Advantages: fast, simple, works well with large data. Disadvantages: the strong independence assumption rarely holds exactly, so correlated features can hurt accuracy.

## 18. Why Naïve Bayes is good for text classification

It handles high‑dimensional sparse data efficiently and performs well with word frequencies.
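A minimal sketch of this workflow, assuming a made-up four-document corpus. It uses `CountVectorizer` and `MultinomialNB` from scikit-learn, which the cells in this notebook do not import:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus: 1 = spam, 0 = not spam
docs = ["win free prize now", "free cash offer",
        "meeting at noon", "project meeting notes"]
labels = [1, 1, 0, 0]

# Bag-of-words counts are stored as a sparse matrix
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

model = MultinomialNB().fit(X, labels)

new_docs = ["free prize offer", "meeting notes"]
pred = model.predict(vectorizer.transform(new_docs))
print("Vocabulary size:", len(vectorizer.vocabulary_))
print("Predictions:", pred)
```

Each word contributes an independent per-class likelihood, so training is a single pass of counting regardless of vocabulary size.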

## 19. Compare SVM and Naïve Bayes

SVM often achieves higher accuracy on complex decision boundaries but is more expensive to train, while Naïve Bayes is much faster but less expressive.

## 20. Laplace Smoothing in Naïve Bayes

Laplace smoothing avoids zero probabilities by adding a small constant to feature counts.
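The arithmetic can be sketched with a three-word vocabulary. The word counts are hypothetical, and `smoothed_likelihoods` is an illustrative helper, not a library function:

```python
def smoothed_likelihoods(counts, alpha=1.0):
    """Estimate P(word | class) with additive (Laplace) smoothing."""
    total = sum(counts.values())
    vocab_size = len(counts)
    return {word: (count + alpha) / (total + alpha * vocab_size)
            for word, count in counts.items()}

# Hypothetical word counts observed in the spam class
spam_counts = {"free": 3, "win": 2, "meeting": 0}

unsmoothed = smoothed_likelihoods(spam_counts, alpha=0.0)
smoothed = smoothed_likelihoods(spam_counts, alpha=1.0)

print("Unsmoothed:", unsmoothed)  # 'meeting' gets probability 0.0
print("Smoothed:  ", smoothed)    # 'meeting' gets 1/8 = 0.125
```

Without smoothing, a single unseen word would zero out the whole product of likelihoods for that class. With alpha = 1, every word keeps a small non-zero probability.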

## 27. SVM with different C values (decision boundary)

In [1]:

from sklearn.datasets import load_iris
from sklearn.svm import SVC

# Keep only the first two features (sepal length and sepal width)
X, y = load_iris(return_X_y=True)
X = X[:, :2]

# Fit a linear SVM for each C and report training accuracy
for C in [0.1, 1, 10]:
    model = SVC(C=C, kernel='linear')
    model.fit(X, y)
    print(f"C={C}, Accuracy={model.score(X, y):.2f}")


C=0.1, Accuracy=0.80
C=1, Accuracy=0.82
C=10, Accuracy=0.82


## 28. Bernoulli Naïve Bayes for binary classification

In [2]:

from sklearn.datasets import load_breast_cancer
from sklearn.naive_bayes import BernoulliNB

X, y = load_breast_cancer(return_X_y=True)
# Binarize: 1 where a value exceeds the global mean taken over all features
X = (X > X.mean()).astype(int)

# Fit on the binary features and report training accuracy
model = BernoulliNB()
model.fit(X, y)
print("Accuracy:", model.score(X, y))


Accuracy: 0.7996485061511424


## 29. Feature scaling before SVM

In [3]:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
# No random_state is set, so the split (and the scores below) vary between runs
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Same classifier with and without standardization
svm_unscaled = SVC().fit(X_train, y_train)
svm_scaled = Pipeline([('scaler', StandardScaler()), ('svm', SVC())]).fit(X_train, y_train)

print("Unscaled:", svm_unscaled.score(X_test, y_test))
print("Scaled:", svm_scaled.score(X_test, y_test))


Unscaled: 0.9210526315789473
Scaled: 0.8947368421052632
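The two scores above differ by a single test sample (35 vs 34 correct out of 38), and Iris features already share similar ranges, so this split says little about scaling. A sketch of a more informative comparison, assuming the Wine dataset (whose features span very different ranges) and 5-fold cross-validation instead of one random split:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

# Mean accuracy over 5 stratified folds, without and with standardization
unscaled_cv = cross_val_score(SVC(), X, y, cv=5).mean()
scaled_cv = cross_val_score(make_pipeline(StandardScaler(), SVC()), X, y, cv=5).mean()

print(f"Unscaled CV accuracy: {unscaled_cv:.3f}")
print(f"Scaled CV accuracy:   {scaled_cv:.3f}")
```

Placing the scaler inside the pipeline means it is refit on each training fold, so no statistics leak from the held-out fold.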


## 30. Gaussian NB before and after Laplace smoothing

In [4]:

from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# GaussianNB has no Laplace (additive count) smoothing. Its var_smoothing
# parameter adds a fraction of the largest feature variance to every variance
# for numerical stability; 1e-9 is the default.
model = GaussianNB(var_smoothing=1e-9)
X, y = load_iris(return_X_y=True)
model.fit(X, y)
print("Predictions:", model.predict(X[:5]))


Predictions: [0 0 0 0 0]
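For a before-and-after comparison, one option is to fit `GaussianNB` with two `var_smoothing` values. The values 1e-9 (the default) and 1e-1 are arbitrary choices for this sketch, and the accuracy is measured on the training data:

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

results = {}
for vs in (1e-9, 1e-1):
    model = GaussianNB(var_smoothing=vs).fit(X, y)
    # epsilon_ is the absolute amount added to every feature variance
    results[vs] = (model.epsilon_, model.score(X, y))
    print(f"var_smoothing={vs}: added variance={model.epsilon_:.2e}, "
          f"training accuracy={results[vs][1]:.3f}")
```

A larger `var_smoothing` widens every per-class Gaussian, which pulls the class likelihoods closer together and usually makes the model less sensitive to small feature differences.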


## 31. GridSearchCV for SVM

In [5]:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Search over C and kernel with 5-fold cross-validation
params = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
grid = GridSearchCV(SVC(), params, cv=5)
X, y = load_iris(return_X_y=True)
grid.fit(X, y)
print("Best Params:", grid.best_params_)


Best Params: {'C': 1, 'kernel': 'linear'}


## 32. SVM on imbalanced data with class weights

In [6]:

from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
# 'balanced' weights each class inversely to its frequency in y
model = SVC(class_weight='balanced')
model.fit(X, y)
print("Accuracy:", model.score(X, y))


Accuracy: 0.9103690685413005


## 33. Naïve Bayes spam detection (demo)

In [7]:

from sklearn.naive_bayes import BernoulliNB

# Toy binary word-presence features; label 1 = spam, 0 = not spam
X = [[1, 0, 1], [0, 1, 0], [1, 1, 1]]
y = [1, 0, 1]
model = BernoulliNB()
model.fit(X, y)
print("Prediction:", model.predict([[1, 0, 0]]))


Prediction: [1]


## 34. Compare SVM and Naïve Bayes accuracy

In [8]:

from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Both scores are training accuracy (fit and scored on the same data)
X, y = load_iris(return_X_y=True)
print("SVM:", SVC().fit(X, y).score(X, y))
print("NB:", GaussianNB().fit(X, y).score(X, y))


SVM: 0.9733333333333334
NB: 0.96


## 36. OvR vs OvO on Wine dataset

In [9]:

from sklearn.datasets import load_wine
from sklearn.svm import SVC

# SVC always trains one-vs-one internally; decision_function_shape only changes
# the shape of decision_function's output, so both scores are identical.
X, y = load_wine(return_X_y=True)
print("OvR:", SVC(decision_function_shape='ovr').fit(X, y).score(X, y))
print("OvO:", SVC(decision_function_shape='ovo').fit(X, y).score(X, y))


OvR: 0.7078651685393258
OvO: 0.7078651685393258


## 37. Kernel comparison on Breast Cancer dataset

In [10]:

from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC

# Training accuracy for each kernel on the unscaled features
X, y = load_breast_cancer(return_X_y=True)
for k in ['linear', 'poly', 'rbf']:
    print(k, SVC(kernel=k).fit(X, y).score(X, y))


linear 0.9666080843585237
poly 0.9138840070298769
rbf 0.9226713532513181


## 38. Stratified K‑Fold CV

In [11]:

from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# 5 folds, each preserving the class proportions of the full dataset
X, y = load_iris(return_X_y=True)
scores = cross_val_score(SVC(), X, y, cv=StratifiedKFold(5))
print("Average Accuracy:", scores.mean())


Average Accuracy: 0.9666666666666666


## 41. Precision‑Recall & F1 evaluation

In [12]:

from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC

# Note: this cell reports training accuracy only
X, y = load_breast_cancer(return_X_y=True)
model = SVC()
model.fit(X, y)
print("Accuracy:", model.score(X, y))


Accuracy: 0.9226713532513181
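The cell above prints accuracy only. A sketch of how precision, recall and F1 could be computed for the same model, assuming a stratified 75/25 hold-out split with an arbitrary seed (neither appears in the original cell):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set so the metrics reflect unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

model = SVC().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Metrics for the positive class (label 1, benign in this dataset)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(f"Precision: {precision:.3f}  Recall: {recall:.3f}  F1: {f1:.3f}")
print(classification_report(y_test, y_pred, target_names=["malignant", "benign"]))
```

F1 is the harmonic mean of precision and recall, so it drops sharply when either one is low, which accuracy alone can hide on imbalanced data.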
