1. What is a Support Vector Machine (SVM)?

- A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression tasks. It works by finding an optimal hyperplane that separates data points of different classes with the maximum margin. The main objective of SVM is to maximize the distance between the hyperplane and the nearest data points from each class, which are called support vectors. By maximizing this margin, SVM improves generalization and reduces overfitting, making it effective in high-dimensional spaces.

2. Difference Between Hard Margin and Soft Margin SVM

- Hard Margin SVM assumes that the dataset is perfectly linearly separable and does not allow any misclassification. It strictly maximizes the margin while ensuring all data points are correctly classified. However, it is highly sensitive to noise and outliers. Soft Margin SVM, on the other hand, allows some misclassifications by introducing slack variables and a penalty parameter (C). This makes Soft Margin SVM more practical for real-world datasets where perfect separation is not possible.

3. Mathematical Intuition Behind SVM

- Mathematically, SVM attempts to find a hyperplane defined by $w \cdot x + b = 0$ that maximizes the margin between classes. The margin is defined as $\frac{2}{\|w\|}$. The optimization problem minimizes $\frac{1}{2}\|w\|^2$ subject to classification constraints. In soft margin SVM, slack variables and a penalty parameter C are added to allow misclassification while still maximizing the margin.
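
For reference, the two optimization problems described above can be written out in full. This is the standard textbook form, given as a sketch: the labels $y_i \in \{-1, +1\}$, the slack variables $\xi_i$, and the sample count $n$ are notation assumed here rather than taken from the answer above.

```latex
% Hard margin: every point must lie on or outside the margin.
\min_{w,\,b} \ \frac{1}{2}\|w\|^2
\quad \text{subject to} \quad
y_i\,(w \cdot x_i + b) \ge 1, \qquad i = 1, \dots, n

% Soft margin: slack \xi_i measures how far point i violates the margin,
% and C sets the price of each unit of violation.
\min_{w,\,b,\,\xi} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i\,(w \cdot x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0
```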

4. Role of Lagrange Multipliers in SVM

- Lagrange multipliers are used to convert the constrained optimization problem of SVM into a dual optimization problem. This allows the solution to be expressed in terms of dot products between data points. The dual formulation enables the use of kernel functions and simplifies computation, especially in high-dimensional spaces.
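
The link between the dual solution and the hyperplane can be checked numerically. The sketch below assumes a synthetic two-cluster dataset from `make_blobs` (not part of the original notebook) and rebuilds the weight vector of a linear `SVC` from its dual coefficients, which scikit-learn documents as the multipliers already multiplied by the targets.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two compact synthetic clusters, for illustration only.
X, y = make_blobs(n_samples=60, centers=2, cluster_std=0.6, random_state=0)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# dual_coef_ holds alpha_i * y_i for the support vectors only;
# every other training point has alpha_i = 0 and drops out of the sum.
w_from_dual = clf.dual_coef_ @ clf.support_vectors_

print("w from dual coefficients:", w_from_dual.ravel())
print("w reported by sklearn:   ", clf.coef_.ravel())
print("support vectors used:", len(clf.support_), "of", len(X))
```

Because the weight vector is a sum over support vectors only, predictions need dot products with those points alone, which is what makes the kernel substitution possible.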

5. What are Support Vectors?

- Support vectors are the training points closest to the decision boundary (hyperplane); in the soft-margin case they also include points that fall inside the margin or on the wrong side of it. These points alone determine the position and orientation of the hyperplane. Removing non-support vectors does not change the model, but removing a support vector can shift the decision boundary.
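
One way to see this is to refit the model on the support vectors alone and compare the resulting hyperplane. This is a minimal sketch on assumed synthetic data; `tol` is tightened only so that both fits converge closely enough to compare.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic two-cluster data, for illustration only.
X, y = make_blobs(n_samples=80, centers=2, cluster_std=0.6, random_state=0)

full = SVC(kernel="linear", C=10.0, tol=1e-6).fit(X, y)

# Discard every training point that is not a support vector, then refit.
sv_idx = full.support_
reduced = SVC(kernel="linear", C=10.0, tol=1e-6).fit(X[sv_idx], y[sv_idx])

print("training points:", len(X), "| support vectors:", len(sv_idx))
print("w (all points):          ", full.coef_.ravel())
print("w (support vectors only):", reduced.coef_.ravel())
```

The two weight vectors should agree up to solver tolerance, even though the second model saw only a fraction of the data.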

6. What is Support Vector Classifier (SVC)?

- Support Vector Classifier (SVC) is the classification implementation of SVM. It finds an optimal hyperplane that separates classes with maximum margin. It supports different kernels to handle both linear and non-linear classification problems.

7. What is Support Vector Regressor (SVR)?

- Support Vector Regressor (SVR) is the regression variant of SVM. Instead of maximizing the margin between classes, SVR attempts to fit a function within a specified error tolerance (epsilon). It minimizes prediction error while keeping the model as flat as possible.
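
The "error tolerance" idea corresponds to the epsilon-insensitive loss: residuals smaller than epsilon cost nothing, and larger ones are charged only for the excess. The helper `epsilon_insensitive_loss` and the sample values below are hypothetical, written here as a sketch rather than taken from any library.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the epsilon tube, linear in the excess outside it."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, residual - epsilon)

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.05, 2.5, 2.0, 4.0])

# Residuals are 0.05, 0.5, 1.0 and 0.0; only the two larger ones exceed epsilon.
losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
print(losses)
```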

8. What is the Kernel Trick?

- The Kernel Trick allows SVM to operate in high-dimensional feature spaces without explicitly computing transformations. It replaces dot products with kernel functions such as Linear, Polynomial, or RBF, enabling non-linear classification efficiently.
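
A small worked case makes the trick concrete. For 2-D inputs, the homogeneous degree-2 polynomial kernel $(a \cdot b)^2$ equals an ordinary dot product after an explicit 3-D feature map. The helpers `phi` and `poly2_kernel` are hypothetical names used only for this sketch.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D input."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly2_kernel(a, b):
    """Homogeneous degree-2 polynomial kernel: (a . b)^2."""
    return np.dot(a, b) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = np.dot(phi(x), phi(z))   # dot product in the 3-D feature space
implicit = poly2_kernel(x, z)       # same value, computed in the 2-D input space

print(explicit, implicit)
```

Both routes give the same number, but the kernel never builds the higher-dimensional vectors, which is what keeps RBF (whose feature space is infinite-dimensional) computable at all.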

9. Compare Linear, Polynomial, and RBF Kernel

- The Linear kernel is used when data is (approximately) linearly separable and is the cheapest to compute. The Polynomial kernel maps data into a higher-degree polynomial feature space and is useful when relationships are non-linear but structured. The RBF (Radial Basis Function) kernel corresponds to an infinite-dimensional feature space and is highly flexible, making it effective for complex, non-linear patterns, provided the features are scaled and gamma is tuned.
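
The three kernels differ only in the formula applied to a pair of points. The sketch below evaluates each formula by hand and compares it with the corresponding function in `sklearn.metrics.pairwise`; the points and the values of `gamma`, `degree`, and `coef0` are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

x = np.array([1.0, 2.0])
z = np.array([2.0, 0.0])
gamma, degree, coef0 = 0.5, 3, 1.0   # illustrative values, not tuned

dot = x @ z                       # inner product of the two points
sq_dist = np.sum((x - z) ** 2)    # squared Euclidean distance

by_hand = {
    "linear": dot,
    "poly": (gamma * dot + coef0) ** degree,
    "rbf": np.exp(-gamma * sq_dist),
}

# The pairwise functions expect 2-D arrays of shape (n_samples, n_features).
X2, Z2 = x.reshape(1, -1), z.reshape(1, -1)
from_sklearn = {
    "linear": linear_kernel(X2, Z2)[0, 0],
    "poly": polynomial_kernel(X2, Z2, degree=degree, gamma=gamma, coef0=coef0)[0, 0],
    "rbf": rbf_kernel(X2, Z2, gamma=gamma)[0, 0],
}

for name in by_hand:
    print(f"{name:7s} by hand={by_hand[name]:.4f}  sklearn={from_sklearn[name]:.4f}")
```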

10. Effect of C Parameter

- The C parameter controls the trade-off between maximizing the margin and minimizing classification error on the training set. A large C penalizes misclassification heavily, resulting in a smaller margin and possible overfitting. A small C allows a larger margin but tolerates more misclassification, which often improves generalization but can underfit if C is too small.
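
The effect on the margin can be measured directly for a linear kernel, where the margin width is $2 / \|w\|$. This is a sketch on assumed synthetic, overlapping clusters; the three C values are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping synthetic clusters, so some margin violations are unavoidable.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)

margins = {}
for C in [0.01, 1, 100]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margins[C] = 2.0 / np.linalg.norm(clf.coef_)   # margin width = 2 / ||w||
    print(f"C={C:<6} margin width={margins[C]:.3f}  support vectors={len(clf.support_)}")
```

The smallest C should report the widest margin, and the margin should not grow as C increases.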

11. Role of Gamma in RBF Kernel

- Gamma defines how far the influence of a single training example reaches. A high gamma makes the decision boundary more complex and sensitive to individual points. A low gamma creates smoother decision boundaries.
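
The "reach" of a training example follows from the RBF formula $\exp(-\gamma \, d^2)$, where $d$ is the distance between two points. The helper `rbf_similarity` is a hypothetical name for this sketch; it shows how quickly similarity decays with distance for several gamma values.

```python
import numpy as np

def rbf_similarity(distance, gamma):
    """RBF kernel value for two points separated by `distance`."""
    return np.exp(-gamma * distance ** 2)

distances = np.array([0.5, 1.0, 2.0])
for gamma in [0.1, 1.0, 10.0]:
    print(f"gamma={gamma:<5}", np.round(rbf_similarity(distances, gamma), 4))
```

With a high gamma the similarity is close to zero at even moderate distances, so each point influences only its immediate neighbourhood; with a low gamma, distant points still count as similar and the boundary is smoother.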

NAÏVE BAYES
12. What is Naïve Bayes and Why is it Called Naïve?

- Naïve Bayes is a probabilistic classification algorithm based on Bayes’ Theorem. It assumes conditional independence between features given the class label. It is called “naïve” because this independence assumption is often unrealistic, yet the classifier performs well in many real-world tasks.

13. What is Bayes’ Theorem?

- Bayes’ Theorem states:

$$P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)}$$

- It calculates the posterior probability of a class given the features using prior probability and likelihood.
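
A short numeric example shows how the prior and likelihood combine. All probabilities below are hypothetical values for an imagined spam filter, where X is "the message contains a particular word" and C is "spam".

```python
# Hypothetical numbers; none of them come from a real dataset.
p_spam = 0.2                 # prior P(C = spam)
p_word_given_spam = 0.6      # likelihood P(X | spam)
p_word_given_ham = 0.05      # likelihood P(X | not spam)

# Evidence P(X) via the law of total probability.
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Posterior P(spam | X) from Bayes' Theorem.
posterior_spam = p_word_given_spam * p_spam / p_word
print(f"P(spam | word) = {posterior_spam:.2f}")
```

Here the evidence is 0.12 + 0.04 = 0.16, so the posterior is 0.12 / 0.16 = 0.75: seeing the word raises the spam probability from the 20% prior to 75%.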

14. Gaussian vs Multinomial vs Bernoulli Naïve Bayes

- Gaussian Naïve Bayes assumes features follow a normal distribution and is used for continuous data. Multinomial Naïve Bayes is used for discrete count data such as word frequencies in text classification. Bernoulli Naïve Bayes is used for binary features representing presence or absence.

15. When to Use Gaussian Naïve Bayes?

- Gaussian Naïve Bayes is suitable when features are continuous and approximately normally distributed, such as medical or sensor data.

16. Key Assumptions of Naïve Bayes

- The main assumption is conditional independence of features given the class label. Each variant also assumes a specific form for the per-feature distribution within a class: Gaussian for continuous features, multinomial for counts, and Bernoulli for binary features.

17. Advantages and Disadvantages

- Advantages include simplicity, speed, and effectiveness in high-dimensional problems. Disadvantages include the unrealistic independence assumption and poor performance when features are highly correlated.

18. Why is Naïve Bayes Good for Text Classification?

- Naïve Bayes performs well on high-dimensional sparse data like text. Word occurrences are often treated as independent, making Multinomial NB particularly effective for spam detection and document classification.
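
A toy version of the spam-detection case fits in a few lines. The four training messages and their labels are made up for illustration; the sketch chains `CountVectorizer` and `MultinomialNB` with `make_pipeline`.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up training messages, for illustration only.
docs = [
    "win money now",
    "free money offer",
    "meeting schedule today",
    "project meeting notes",
]
labels = ["spam", "spam", "ham", "ham"]

# CountVectorizer turns each message into sparse word counts;
# MultinomialNB then models those counts per class.
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(docs, labels)

prediction = clf.predict(["free money"])[0]
print(prediction)
```

Both words in the query occur only in the spam messages, so their smoothed likelihoods are higher under the spam class and the message is labelled spam.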

19. Compare SVM and Naïve Bayes

- SVM is a discriminative model that maximizes margin and often achieves higher accuracy in complex datasets. Naïve Bayes is a generative model that estimates probabilities and is computationally faster. SVM performs better with complex decision boundaries, while Naïve Bayes is a fast, strong baseline for text classification and small datasets.

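20. How Does Laplace Smoothing Help?

- Laplace smoothing adds 1 to every feature count before the counts are normalized into probabilities. This avoids zero probabilities when a feature does not appear in the training data for a class; without it, a single unseen feature would zero out the entire product of likelihoods for that class.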
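
The arithmetic is simple enough to do by hand. The helper `smoothed_probs` and the word counts below are hypothetical; the formula is the usual additive-smoothing estimate, with `alpha=1` giving Laplace smoothing.

```python
import numpy as np

def smoothed_probs(counts, alpha=1.0):
    """P(word | class) = (count + alpha) / (total + alpha * vocabulary_size)."""
    counts = np.asarray(counts, dtype=float)
    return (counts + alpha) / (counts.sum() + alpha * counts.size)

# Hypothetical word counts for one class; the middle word never appeared.
counts = [3, 0, 1]

no_smoothing = smoothed_probs(counts, alpha=0.0)
laplace = smoothed_probs(counts, alpha=1.0)

print("alpha=0:", no_smoothing)
print("alpha=1:", laplace)
```

Without smoothing the unseen word gets probability 0; with `alpha=1` the estimates become 4/7, 1/7 and 2/7, so no word can veto a class on its own.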

In [5]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn import datasets
from sklearn.model_selection import train_test_split, GridSearchCV, StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (accuracy_score, confusion_matrix, classification_report,
                             mean_squared_error, mean_absolute_error,
                             precision_recall_curve, roc_auc_score,
                             log_loss)

from sklearn.svm import SVC, SVR
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB
from sklearn.feature_selection import RFE
from sklearn.feature_extraction.text import CountVectorizer

In [6]:
# 2. SVM Classifier on Iris Dataset
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

model = SVC(kernel='linear')
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))

Accuracy: 1.0


In [7]:
# 3. Linear vs RBF Kernel (Wine Dataset)
wine = datasets.load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=42
)

linear = SVC(kernel='linear')
rbf = SVC(kernel='rbf')

linear.fit(X_train, y_train)
rbf.fit(X_train, y_train)

print("Linear Accuracy:", accuracy_score(y_test, linear.predict(X_test)))
print("RBF Accuracy:", accuracy_score(y_test, rbf.predict(X_test)))

Linear Accuracy: 0.9814814814814815
RBF Accuracy: 0.7592592592592593


In [8]:
# 4. Polynomial Kernel SVM
poly = SVC(kernel='poly', degree=3)
poly.fit(X_train, y_train)

print("Polynomial Accuracy:", accuracy_score(y_test, poly.predict(X_test)))

Polynomial Accuracy: 0.7592592592592593


In [9]:
# 5. SVM with Different C Values
for c in [0.1, 1, 10]:
    model = SVC(C=c, kernel='linear')
    model.fit(X_train, y_train)
    print(f"C={c}, Accuracy:", accuracy_score(y_test, model.predict(X_test)))

C=0.1, Accuracy: 1.0
C=1, Accuracy: 0.9814814814814815
C=10, Accuracy: 1.0


In [10]:
# 6. Feature Scaling Comparison
scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model_unscaled = SVC()
model_scaled = SVC()

model_unscaled.fit(X_train, y_train)
model_scaled.fit(X_train_scaled, y_train)

print("Unscaled Accuracy:", accuracy_score(y_test, model_unscaled.predict(X_test)))
print("Scaled Accuracy:", accuracy_score(y_test, model_scaled.predict(X_test_scaled)))

Unscaled Accuracy: 0.7592592592592593
Scaled Accuracy: 0.9814814814814815


In [11]:
# 7. GridSearchCV Hyperparameter Tuning
param_grid = {
    'C': [0.1, 1, 10],
    'gamma': [0.01, 0.1, 1],
    'kernel': ['linear', 'rbf']
}

grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X_train, y_train)

print("Best Parameters:", grid.best_params_)
print("Best Score:", grid.best_score_)

Best Parameters: {'C': 0.1, 'gamma': 0.01, 'kernel': 'linear'}
Best Score: 0.9276666666666668
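
The search above runs on unscaled features, and `gamma` has no effect when the kernel is linear. One possible variant, sketched below under those observations, puts `StandardScaler` and `SVC` in a pipeline so scaling is learned inside each CV fold, and splits the grid so `gamma` is searched only for the RBF kernel. The stratified split and `random_state` are choices made for this sketch, so scores will not match the output above.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# The scaler is refit inside every CV fold, so validation folds never
# leak into the scaling statistics.
pipe = make_pipeline(StandardScaler(), SVC())

# Step names come from make_pipeline: the SVC step is called "svc".
param_grid = [
    {"svc__kernel": ["linear"], "svc__C": [0.1, 1, 10]},
    {"svc__kernel": ["rbf"], "svc__C": [0.1, 1, 10], "svc__gamma": [0.01, 0.1, 1]},
]

grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X_train, y_train)

print("Best parameters:", grid.best_params_)
print("Best CV score:", grid.best_score_)
print("Test accuracy:", grid.score(X_test, y_test))
```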


In [12]:
# 8. SVM on Imbalanced Dataset
from sklearn.datasets import make_classification

X, y = make_classification(n_classes=2, weights=[0.9, 0.1],
                           n_samples=1000, random_state=42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

model = SVC(class_weight='balanced')
model.fit(X_train, y_train)

print("Accuracy with class weighting:",
      accuracy_score(y_test, model.predict(X_test)))

Accuracy with class weighting: 0.9066666666666666
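
On a 90/10 dataset, accuracy alone says little: always predicting the majority class already scores close to 0.9. The sketch below compares a majority-class baseline with weighted and unweighted `SVC` using recall on the minority class. The stratified split, `random_state`, and the `DummyClassifier` baseline are additions for illustration, so the numbers will differ from the output above.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_classes=2, weights=[0.9, 0.1],
                           n_samples=1000, random_state=42)

# stratify=y keeps the class ratio the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

models = {
    "majority baseline": DummyClassifier(strategy="most_frequent"),
    "SVC": SVC(),
    "SVC balanced": SVC(class_weight="balanced"),
}

results = {}
for name, model in models.items():
    pred = model.fit(X_train, y_train).predict(X_test)
    results[name] = (accuracy_score(y_test, pred),
                     recall_score(y_test, pred, pos_label=1))
    print(f"{name:18s} accuracy={results[name][0]:.3f}  "
          f"minority recall={results[name][1]:.3f}")
```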


In [13]:
# 9. Stratified K-Fold Cross Validation
skf = StratifiedKFold(n_splits=5)
scores = []

for train_index, test_index in skf.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    model = SVC()
    model.fit(X_train, y_train)
    scores.append(accuracy_score(y_test, model.predict(X_test)))

print("Average Accuracy:", np.mean(scores))

Average Accuracy: 0.929
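
The manual loop above can also be written with `cross_val_score`, which accepts the same `StratifiedKFold` splitter. The sketch below runs both on a smaller synthetic dataset (300 samples, chosen only to keep it quick) and prints the per-fold scores side by side.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_classes=2, weights=[0.9, 0.1],
                           n_samples=300, random_state=42)
skf = StratifiedKFold(n_splits=5)

# Manual loop: fit on each training fold, score on the held-out fold.
manual = []
for train_idx, test_idx in skf.split(X, y):
    model = SVC().fit(X[train_idx], y[train_idx])
    manual.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))

# Same folds and metric in one call.
auto = cross_val_score(SVC(), X, y, cv=skf, scoring="accuracy")

print("manual loop:    ", np.round(manual, 3))
print("cross_val_score:", np.round(auto, 3))
```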


In [14]:
# 10. SVM Regression (SVR)
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
    housing.data, housing.target, test_size=0.3
)

svr = SVR()
svr.fit(X_train, y_train)

pred = svr.predict(X_test)

print("MSE:", mean_squared_error(y_test, pred))
print("MAE:", mean_absolute_error(y_test, pred))

MSE: 1.3533388997978366
MAE: 0.8642367230952747


In [None]:
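# 11. Confusion Matrix Visualization
# Step 10 overwrote X_test/y_test with regression data; rebuild a classification split from step 8's X, y.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)
model = SVC(class_weight='balanced').fit(X_train, y_train)
cm = confusion_matrix(y_test, model.predict(X_test))
sns.heatmap(cm, annot=True, fmt='d')
plt.title("Confusion Matrix")
plt.show()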

In [16]:
# 13. Gaussian Naïve Bayes (Breast Cancer)
data = datasets.load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3
)

gnb = GaussianNB()
gnb.fit(X_train, y_train)

pred = gnb.predict(X_test)

print("Accuracy:", accuracy_score(y_test, pred))
print("ROC-AUC:", roc_auc_score(y_test, gnb.predict_proba(X_test)[:,1]))
print("Log Loss:", log_loss(y_test, gnb.predict_proba(X_test)))

Accuracy: 0.9473684210526315
ROC-AUC: 0.9867284937639911
Log Loss: 0.5154800967208021


In [17]:
# 14. Multinomial Naïve Bayes (20 Newsgroups)
from sklearn.datasets import fetch_20newsgroups

news = fetch_20newsgroups(subset='all')
vectorizer = CountVectorizer()

X = vectorizer.fit_transform(news.data)
y = news.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

mnb = MultinomialNB()
mnb.fit(X_train, y_train)

print("Accuracy:", accuracy_score(y_test, mnb.predict(X_test)))

Accuracy: 0.8461266360099045


In [18]:
# 15. Bernoulli Naïve Bayes (Binary Features)
bnb = BernoulliNB()
bnb.fit(X_train, y_train)

print("Bernoulli Accuracy:",
      accuracy_score(y_test, bnb.predict(X_test)))

Bernoulli Accuracy: 0.6729748850371419


In [19]:
# 16. Laplace Smoothing Effect
mnb_no_smooth = MultinomialNB(alpha=0)
mnb_smooth = MultinomialNB(alpha=1)

mnb_no_smooth.fit(X_train, y_train)
mnb_smooth.fit(X_train, y_train)

print("Without Smoothing:",
      accuracy_score(y_test, mnb_no_smooth.predict(X_test)))

print("With Laplace Smoothing:",
      accuracy_score(y_test, mnb_smooth.predict(X_test)))


Without Smoothing: 0.16643084541917227
With Laplace Smoothing: 0.8461266360099045
