Support Vector Machine (SVM)
1: What is a Support Vector Machine (SVM)?
SVM is a supervised machine learning algorithm used for classification and regression tasks. It finds the best hyperplane that separates data points of different classes with the maximum margin.

2: What is the difference between Hard Margin and Soft Margin SVM?

Hard Margin SVM: Assumes the data is linearly separable and allows no misclassifications. Because every point must fall on the correct side of the margin, it is strict and sensitive to outliers.

Soft Margin SVM: Allows some misclassifications by introducing slack variables, with a penalty parameter (C) that balances margin size against classification error. Better suited to noisy, real-world data.

3: What are Support Vectors in SVM?
These are the data points closest to the hyperplane. They lie on the margin boundaries (or, in the soft-margin case, inside the margin or on the wrong side of it) and are the only points that determine the hyperplane.
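
As a small illustration of why support vectors matter, the sketch below fits a linear SVC on a hand-made toy dataset (the six points and the large `C` value are assumptions chosen for illustration), prints the support vectors, and then refits after dropping a point that is not a support vector. The weight vector should stay essentially the same.

```python
import numpy as np
from sklearn.svm import SVC

# Two small, linearly separable clusters (toy data, assumed for illustration)
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# A large C approximates a hard margin on separable data
clf = SVC(kernel="linear", C=1e3).fit(X, y)
print("Support vectors:\n", clf.support_vectors_)
print("Indices of support vectors:", clf.support_)

# Drop one point that is NOT a support vector and refit
non_sv = [i for i in range(len(X)) if i not in clf.support_][0]
keep = np.ones(len(X), dtype=bool)
keep[non_sv] = False
clf2 = SVC(kernel="linear", C=1e3).fit(X[keep], y[keep])

# The hyperplane is defined by the support vectors, so w barely changes
print("w before:", clf.coef_, "w after:", clf2.coef_)
```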

4: What is a Support Vector Classifier (SVC)?
It is the implementation of SVM for classification tasks. Scikit-learn's SVC is a popular example.

5: What is a Support Vector Regressor (SVR)?
SVR is the regression counterpart of SVM. Instead of maximizing the margin, it fits data within an epsilon-insensitive tube and penalizes deviations outside it.
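
The epsilon-insensitive idea can be sketched in a few lines of NumPy. The function name `epsilon_insensitive_loss` and the sample values are made up for illustration. Errors smaller than epsilon cost nothing, and larger errors are penalized linearly by the amount they exceed the tube.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the tube |y - f(x)| <= epsilon, linear outside it."""
    residual = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return np.maximum(0.0, residual - epsilon)

y_true = np.array([1.00, 1.00, 1.00, 1.00])
y_pred = np.array([1.05, 0.92, 1.30, 0.50])

# The first two predictions fall inside the tube, the last two outside it
losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
print(losses)  # approximately [0, 0, 0.2, 0.4]
```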

6: What is the Kernel Trick in SVM?
It allows SVMs to learn non-linear decision boundaries by implicitly working in a higher-dimensional feature space, without explicitly computing the transformation. Common kernel functions include:

Linear

Polynomial

Radial Basis Function (RBF)

Sigmoid
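
To make "without explicitly computing transformations" concrete, here is a minimal sketch for the homogeneous degree-2 polynomial kernel in two dimensions. The helper names `phi` and `poly2_kernel` are hypothetical. The kernel evaluated in the original 2D space gives the same number as a dot product in the explicit 3D feature space.

```python
import numpy as np

def phi(x):
    """Explicit feature map for the kernel K(x, z) = (x . z)^2 in 2D."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly2_kernel(x, z):
    """Same quantity computed directly in the original 2D space."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = np.dot(phi(x), phi(z))   # dot product after mapping to 3D
implicit = poly2_kernel(x, z)       # kernel trick: no mapping needed
print(explicit, implicit)           # both are 1.0 up to floating-point rounding
```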


10: What is the effect of the C parameter in SVM?

C controls the trade-off between margin width and misclassification.

Large C → less tolerance for error (hard margin-like).

Small C → more tolerance, wider margin.
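
A rough way to see this trade-off numerically is to compare the margin width, 2 / ||w||, of a linear SVC at several values of C. The synthetic blobs and the specific C values below are assumptions picked for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping clusters so that a perfect separation is not available
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)

results = {}
for C in [0.01, 1, 100]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin_width = 2.0 / np.linalg.norm(clf.coef_)  # geometric margin width
    results[C] = (margin_width, len(clf.support_))
    print(f"C={C}: margin width={margin_width:.3f}, support vectors={len(clf.support_)}")
```

Smaller C should report a wider margin, usually with more support vectors, because more points are allowed to sit inside it.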

11: What is the role of the Gamma parameter in RBF Kernel SVM?

Gamma defines how far the influence of a single training point reaches.

Low gamma → far reach → smoother boundaries.

High gamma → close reach → complex boundaries (can overfit).
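
The "reach" of a training point can be read directly off the RBF kernel formula, K(x, z) = exp(-gamma * ||x - z||^2). The sketch below (the helper name `rbf_kernel_value` is hypothetical) evaluates it for one fixed pair of points at three gamma values.

```python
import numpy as np

def rbf_kernel_value(x, z, gamma):
    """RBF kernel: similarity decays with squared distance, scaled by gamma."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

x = np.array([0.0, 0.0])
z = np.array([1.0, 1.0])  # squared distance between x and z is 2

values = {}
for gamma in [0.01, 1.0, 100.0]:
    values[gamma] = rbf_kernel_value(x, z, gamma)
    print(f"gamma={gamma}: K(x, z)={values[gamma]:.3g}")
```

With low gamma the two points still look similar (value near 1), while with high gamma the similarity is effectively zero, so each training point only influences its immediate neighborhood.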

Naïve Bayes
12: What is the Naïve Bayes classifier, and why is it called "Naïve"?
Naïve Bayes classifier is a probabilistic machine learning algorithm based on Bayes’ Theorem, used primarily for classification tasks. It assumes independence between features, which is why it is termed "naïve."

Classifier: It assigns a class label to a given input based on probability.

Why "Naïve"? It assumes that all features are conditionally independent given the class label — which is rarely true in real-world data. However, despite this unrealistic assumption, it often performs surprisingly well in practice.
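
The independence assumption is what lets the class score factor into a product of per-feature terms. The sketch below uses made-up probabilities for a two-word spam filter (all numbers and the helper `naive_score` are hypothetical) to show that factorization.

```python
import math

# Hypothetical probabilities, chosen only to illustrate the factorization
prior = {"spam": 0.4, "ham": 0.6}
likelihood = {
    "spam": {"free": 0.30, "meeting": 0.02},
    "ham":  {"free": 0.03, "meeting": 0.20},
}

def naive_score(words, label):
    """Unnormalized posterior: P(label) * product of P(word | label)."""
    return prior[label] * math.prod(likelihood[label][w] for w in words)

words = ["free", "meeting"]
scores = {label: naive_score(words, label) for label in prior}

# Normalize so the class posteriors sum to 1
total = sum(scores.values())
posterior = {label: s / total for label, s in scores.items()}
print(posterior)
```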

13: What is Bayes’ Theorem?
Bayes’ Theorem describes the probability of an event, based on prior knowledge of conditions that might be related to the event.

$$P(C \mid X) = \frac{P(X \mid C) \cdot P(C)}{P(X)}$$

Where:

$P(C \mid X)$: Posterior probability (class given features)

$P(X \mid C)$: Likelihood (features given class)

$P(C)$: Prior probability of the class

$P(X)$: Evidence (total probability of features)
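
A worked numeric example may help tie the four terms together. The figures below are hypothetical: 20% of messages are spam, the word "free" appears in 50% of spam and in 5% of non-spam. The helper `bayes_posterior` is an illustrative name, not a library function.

```python
def bayes_posterior(prior_c, lik_x_given_c, lik_x_given_not_c):
    """P(C | X) for a binary class, expanding the evidence P(X) by total probability."""
    evidence = lik_x_given_c * prior_c + lik_x_given_not_c * (1.0 - prior_c)
    return lik_x_given_c * prior_c / evidence

# Hypothetical inputs: P(spam)=0.2, P("free"|spam)=0.5, P("free"|ham)=0.05
p_spam_given_free = bayes_posterior(0.2, 0.5, 0.05)
print(round(p_spam_given_free, 4))  # 0.1 / (0.1 + 0.04), about 0.7143
```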




14: Differences between Gaussian, Multinomial, and Bernoulli Naïve Bayes:

| Variant | Suitable For | Distribution Assumption |
|---|---|---|
| Gaussian | Continuous data | Normal distribution |
| Multinomial | Discrete data (e.g., word counts) | Multinomial distribution |
| Bernoulli | Binary features (e.g., word presence) | Bernoulli distribution (0 or 1) |

15: When should you use Gaussian Naïve Bayes over other variants?
Use it when your features are continuous and approximately normally distributed within each class, e.g., sensor readings or physical measurements.

16: What are the key assumptions made by Naïve Bayes?

Feature independence: All features contribute independently to the class.

Equal importance: Each feature has equal influence (unless weighted differently).

17: What are the advantages and disadvantages of Naïve Bayes?

Advantages:

Simple and fast

Works well with high-dimensional data

Effective for text classification

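Can tolerate missing values in principle, since a missing feature can be left out of the probability product (library implementations may still require imputation)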

Disadvantages:

Assumes feature independence

Can be outperformed by complex models

Poor performance with correlated features

18: Why is Naïve Bayes a good choice for text classification?

Text data is high-dimensional and sparse

Word occurrences can be treated as independent features

Fast training and prediction

Performs well in spam detection, sentiment analysis

19: Compare SVM and Naïve Bayes for classification tasks:

| Aspect | SVM | Naïve Bayes |
|---|---|---|
| Speed (Training) | Slower | Very fast |
| Performance | High (especially with tuning) | Competitive in many cases |
| Assumptions | Margin maximization | Feature independence |
| Works well for | Complex boundaries | Text, high-dimensional data |
| Probabilistic Output | Not inherently | Yes |

20: How does Laplace Smoothing help in Naïve Bayes?
In Naïve Bayes, when calculating probabilities, we often encounter a situation where a particular feature value does not occur with a certain class in the training data.
This leads to:

$$P(x_i \mid y) = 0$$
And since Naïve Bayes multiplies probabilities, this zero makes the entire posterior probability zero, which is undesirable — especially if the event is rare but possible.

✅ Solution: Laplace Smoothing (a.k.a. Add-One Smoothing)
Laplace Smoothing adjusts probability estimates to avoid zero probabilities by adding a small constant (usually 1) to each count.

Formula without smoothing:
$$P(w_i \mid y) = \frac{\text{count}(w_i, y)}{\sum_j \text{count}(w_j, y)}$$

With Laplace Smoothing (add-1):
$$P(w_i \mid y) = \frac{\text{count}(w_i, y) + 1}{\sum_j \text{count}(w_j, y) + V}$$

Where:

$w_i$: a word or feature

$y$: the class label

$V$: total number of unique features (e.g., vocabulary size in text classification)

📌 Example (Text Classification):
Imagine you're classifying messages as spam or not spam, and you're using word frequencies. If a word like "lottery" hasn't appeared in any spam messages during training, you'll get:

$$P(\text{lottery} \mid \text{spam}) = 0$$
Without smoothing, the entire spam probability becomes 0 for any message containing "lottery".
With Laplace smoothing:

$$P(\text{lottery} \mid \text{spam}) = \frac{0 + 1}{\text{total word count in spam} + V}$$

Now, the model assigns a small, non-zero probability instead of zero — preventing incorrect elimination of a possible class.
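
The same calculation can be sketched in plain Python. The word counts and four-word vocabulary below are invented for illustration, and `smoothed_word_probs` is a hypothetical helper. Setting `alpha=0` reproduces the unsmoothed estimate, while `alpha=1` gives add-one smoothing.

```python
from collections import Counter

def smoothed_word_probs(word_counts, vocabulary, alpha=1.0):
    """P(w | class) with additive smoothing: (count + alpha) / (total + alpha * V)."""
    total = sum(word_counts[w] for w in vocabulary)
    vocab_size = len(vocabulary)
    return {w: (word_counts[w] + alpha) / (total + alpha * vocab_size)
            for w in vocabulary}

vocabulary = ["free", "money", "offer", "lottery"]
# Hypothetical spam counts; "lottery" never appeared in spam during training
spam_counts = Counter({"free": 5, "money": 3, "offer": 2})

unsmoothed = smoothed_word_probs(spam_counts, vocabulary, alpha=0.0)
smoothed = smoothed_word_probs(spam_counts, vocabulary, alpha=1.0)

print(unsmoothed["lottery"])  # 0.0, which would zero out the whole product
print(smoothed["lottery"])    # 1 / (10 + 4), small but non-zero
```

In scikit-learn, the count-based variants (`MultinomialNB`, `BernoulliNB`) expose this constant as the `alpha` parameter.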

🟢 Benefits of Laplace Smoothing:
Prevents zero probability errors

Enables the model to handle unseen features

Makes the model more robust to sparse data


In [None]:
#21.
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train SVM Classifier
model = SVC(kernel='linear')
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print("Iris Dataset - Accuracy:", accuracy_score(y_test, y_pred))


In [None]:
#22
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load dataset
wine = load_wine()
X, y = wine.data, wine.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Linear Kernel
linear_svm = SVC(kernel='linear')
linear_svm.fit(X_train, y_train)
linear_pred = linear_svm.predict(X_test)
linear_acc = accuracy_score(y_test, linear_pred)

# RBF Kernel
rbf_svm = SVC(kernel='rbf')
rbf_svm.fit(X_train, y_train)
rbf_pred = rbf_svm.predict(X_test)
rbf_acc = accuracy_score(y_test, rbf_pred)

print(f"Linear SVM Accuracy: {linear_acc}")
print(f"RBF SVM Accuracy: {rbf_acc}")


In [None]:
#23
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

# Load dataset
data = fetch_california_housing()
X, y = data.data, data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train SVR; the RBF kernel is distance-based, so standardize the features first
model = make_pipeline(StandardScaler(), SVR(kernel='rbf'))
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print("SVR - Mean Squared Error:", mean_squared_error(y_test, y_pred))


In [None]:
#24
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Generate synthetic data
X, y = make_classification(n_features=2, n_redundant=0, n_informative=2,
                           n_clusters_per_class=1, n_samples=100, random_state=42)

# Train SVM with polynomial kernel
model = SVC(kernel='poly', degree=3)
model.fit(X, y)

# Plot decision boundary
def plot_decision_boundary(clf, X, y):
    h = .02
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.3)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k')
    plt.title("SVM with Polynomial Kernel")
    plt.show()

plot_decision_boundary(model, X, y)


In [None]:
#25
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Gaussian NB
model = GaussianNB()
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print("Breast Cancer Dataset - Accuracy:", accuracy_score(y_test, y_pred))



In [None]:
#26
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset
newsgroups = fetch_20newsgroups(subset='all')
X, y = newsgroups.data, newsgroups.target

# Vectorize text data
vectorizer = CountVectorizer()
X_vec = vectorizer.fit_transform(X)

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X_vec, y, test_size=0.2, random_state=42)

# Train Multinomial NB
model = MultinomialNB()
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
print("20 Newsgroups - Accuracy:", accuracy_score(y_test, y_pred))


In [None]:
#27
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.svm import SVC

X, y = datasets.make_classification(n_features=2, n_redundant=0, n_informative=2,
                                    n_clusters_per_class=1, n_samples=100, random_state=42)

C_vals = [0.01, 1, 100]

plt.figure(figsize=(12, 4))
for i, C in enumerate(C_vals):
    model = SVC(kernel='linear', C=C)
    model.fit(X, y)

    plt.subplot(1, 3, i+1)
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                         np.linspace(y_min, y_max, 200))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    plt.contourf(xx, yy, Z, alpha=0.3)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k')
    plt.title(f'C = {C}')
plt.tight_layout()
plt.show()


In [None]:
#28
from sklearn.datasets import make_classification
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=20, n_informative=15,
                           n_classes=2, random_state=42)
X_binary = (X > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X_binary, y, test_size=0.2)

model = BernoulliNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Bernoulli NB Accuracy:", accuracy_score(y_test, y_pred))


In [None]:
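#29
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Wine features span very different ranges, so scaling matters for the RBF kernel
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Without scaling
model_raw = SVC()
model_raw.fit(X_train, y_train)
print("Without Scaling Accuracy:", model_raw.score(X_test, y_test))

# With scaling (the pipeline fits the scaler on the training split only)
model_scaled = make_pipeline(StandardScaler(), SVC())
model_scaled.fit(X_train, y_train)
print("With Scaling Accuracy:", model_scaled.score(X_test, y_test))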


In [None]:
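#30
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Laplace (additive) smoothing applies to the count-based variants via `alpha`.
# GaussianNB has no counts; its closest analogue is `var_smoothing`, which adds
# a fraction of the largest feature variance to every variance for stability.
gnb_default = GaussianNB()  # var_smoothing defaults to 1e-9
gnb_default.fit(X_train, y_train)
pred_before = gnb_default.predict(X_test)

# Stronger variance smoothing
gnb_smoothed = GaussianNB(var_smoothing=1e-2)
gnb_smoothed.fit(X_train, y_train)
pred_after = gnb_smoothed.predict(X_test)

print("Accuracy with default var_smoothing:", accuracy_score(y_test, pred_before))
print("Accuracy with var_smoothing=1e-2:", accuracy_score(y_test, pred_after))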


In [None]:
#31
from sklearn.model_selection import GridSearchCV

param_grid = {
    'C': [0.1, 1, 10],
    'gamma': ['scale', 'auto'],
    'kernel': ['linear', 'rbf']
}

grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X_train, y_train)

print("Best Parameters:", grid.best_params_)
print("Best Score:", grid.best_score_)


In [None]:
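#32
from sklearn.datasets import make_classification
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Create an imbalanced binary dataset (roughly 90% majority class)
X_imb, y_imb = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_imb, y_imb, test_size=0.2,
                                          stratify=y_imb, random_state=42)

# class_weight='balanced' scales C inversely to class frequency
model = SVC(class_weight='balanced')
model.fit(X_tr, y_tr)

# Evaluate on held-out data with a metric that is not dominated by the majority class
print("Balanced Accuracy with Class Weight:", balanced_accuracy_score(y_te, model.predict(X_te)))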


In [None]:
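#33
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

emails = ["Free offer just for you", "Hi friend, let's meet tomorrow", "Claim your free money now", "How are you doing today"]
labels = [1, 0, 1, 0]  # 1=Spam, 0=Ham

# binary=True records word presence/absence, which is what BernoulliNB models
vec = CountVectorizer(binary=True)
X_emails = vec.fit_transform(emails)

model = BernoulliNB()
model.fit(X_emails, labels)

# Predictions on the training emails only confirm that the model fits them
print("Predictions:", model.predict(X_emails))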


In [None]:
#34
from sklearn.naive_bayes import GaussianNB

svm = SVC()
nb = GaussianNB()

svm.fit(X_train, y_train)
nb.fit(X_train, y_train)

print("SVM Accuracy:", svm.score(X_test, y_test))
print("Naïve Bayes Accuracy:", nb.score(X_test, y_test))


In [None]:
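#35
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# chi2 requires non-negative features; the breast cancer features satisfy this
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the selector on the training split only to avoid leaking test information
selector = SelectKBest(score_func=chi2, k=10)
X_train_fs = selector.fit_transform(X_train, y_train)
X_test_fs = selector.transform(X_test)

nb = GaussianNB()
nb.fit(X_train_fs, y_train)

print("Naïve Bayes Accuracy after Feature Selection:", nb.score(X_test_fs, y_test))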


In [None]:
#36
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

X_train, X_test, y_train, y_test = train_test_split(*load_wine(return_X_y=True), test_size=0.2)

model_ovr = OneVsRestClassifier(SVC())
model_ovo = OneVsOneClassifier(SVC())

model_ovr.fit(X_train, y_train)
model_ovo.fit(X_train, y_train)

print("OvR Accuracy:", model_ovr.score(X_test, y_test))
print("OvO Accuracy:", model_ovo.score(X_test, y_test))


In [None]:
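#37
from sklearn.datasets import load_wine
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Load the data in this cell so it does not depend on variables left by earlier cells
X, y = load_wine(return_X_y=True)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(SVC(), X, y, cv=skf)

print("Stratified K-Fold Accuracy (mean):", scores.mean())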


In [None]:
#38
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
skf = StratifiedKFold(n_splits=5)

model = SVC(kernel='linear')
scores = cross_val_score(model, X, y, cv=skf)

print("Average Accuracy (Stratified K-Fold):", scores.mean())


In [None]:
#39
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

priors = [[0.5, 0.5], [0.7, 0.3], [0.3, 0.7]]
for p in priors:
    model = GaussianNB(priors=p)
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"Accuracy with priors {p}: {acc:.4f}")


In [None]:
#40
from sklearn.feature_selection import RFE
from sklearn.svm import SVC
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)

# RFE with 5 selected features
svm = SVC(kernel='linear')
selector = RFE(estimator=svm, n_features_to_select=5)
selector.fit(X_train, y_train)

model = svm.fit(selector.transform(X_train), y_train)
y_pred = model.predict(selector.transform(X_test))

print("Accuracy after RFE:", accuracy_score(y_test, y_pred))


In [None]:
#41
from sklearn.metrics import precision_score, recall_score, f1_score

model = SVC()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Precision:", precision_score(y_test, y_pred, average='macro'))
print("Recall:", recall_score(y_test, y_pred, average='macro'))
print("F1 Score:", f1_score(y_test, y_pred, average='macro'))


In [None]:
#42
from sklearn.metrics import log_loss
from sklearn.naive_bayes import GaussianNB

model = GaussianNB()
model.fit(X_train, y_train)
y_proba = model.predict_proba(X_test)

print("Log Loss:", log_loss(y_test, y_proba))


In [None]:
#43
from sklearn.metrics import confusion_matrix
from sklearn.svm import SVC
import seaborn as sns
import matplotlib.pyplot as plt

# Refit an SVC here: `model` from the previous cell is a GaussianNB
svm_model = SVC()
svm_model.fit(X_train, y_train)
y_pred = svm_model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)

sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title("SVM Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()


In [None]:
#44
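from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Standardize features before the RBF-kernel SVR, which is sensitive to feature scale
model = make_pipeline(StandardScaler(), SVR())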
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Mean Absolute Error (SVR):", mean_absolute_error(y_test, y_pred))


In [None]:
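#45
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# ROC-AUC here needs a binary target, so load a binary classification dataset
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

model = GaussianNB()
model.fit(X_train, y_train)
y_proba = model.predict_proba(X_test)

# Score with the predicted probability of the positive class (label 1)
print("ROC-AUC Score:", roc_auc_score(y_test, y_proba[:, 1]))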


In [None]:
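#46
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import precision_recall_curve, PrecisionRecallDisplay
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# A precision-recall curve needs a binary target
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

model = SVC()
model.fit(X_train, y_train)
# The curve only needs a ranking score, so decision_function values are enough
y_score = model.decision_function(X_test)

precision, recall, _ = precision_recall_curve(y_test, y_score)
disp = PrecisionRecallDisplay(precision=precision, recall=recall)
disp.plot()
plt.title("SVM Precision-Recall Curve")
plt.show()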


