**Theoretical**

#### **1. What is a Support Vector Machine (SVM)?**
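SVM is a supervised learning algorithm used for classification and regression. For classification, it finds the hyperplane that separates the classes with the largest possible margin, meaning the greatest distance to the nearest training points of each class.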

#### **2. What is the difference between Hard Margin and Soft Margin SVM?**
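- **Hard Margin**: No misclassification or margin violation is allowed, so it requires perfectly linearly separable data and is very sensitive to outliers.
- **Soft Margin**: Allows some points to violate the margin or be misclassified, using slack variables whose total penalty is controlled by the C parameter. This makes it usable on noisy or overlapping data.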

#### **3. What is the mathematical intuition behind SVM?**
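SVM maximizes the margin between the classes. For a hyperplane $w \cdot x + b = 0$ scaled so that the closest points satisfy $|w \cdot x + b| = 1$, the margin width is $2 / \lVert w \rVert$. Maximizing it is therefore equivalent to minimizing $\tfrac{1}{2}\lVert w \rVert^2$ subject to $y_i (w \cdot x_i + b) \ge 1$ for every training point, with labels $y_i \in \{-1, +1\}$. This constrained quadratic optimization problem is solved using Lagrange multipliers.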
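
A minimal scikit-learn sketch of this geometry, assuming synthetic, well-separated data from `make_blobs` (the cluster centres and the large `C=1e3`, used to approximate a hard margin, are illustrative choices). It reads the learned $w$ and $b$ from `coef_` and `intercept_`, computes the margin width $2 / \lVert w \rVert$, and checks that the support vectors sit on the supporting hyperplanes where $|w \cdot x + b| = 1$.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Illustrative data: two well-separated 2-D clusters, so the classes are linearly separable
X, y = make_blobs(n_samples=60, centers=[[-2, -2], [2, 2]], cluster_std=0.5, random_state=0)

# A very large C leaves almost no room for margin violations, approximating a hard margin
clf = SVC(kernel="linear", C=1e3).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

# Distance between the supporting hyperplanes w.x + b = +1 and w.x + b = -1
margin_width = 2 / np.linalg.norm(w)

# Support vectors lie on those hyperplanes, so |w.x + b| is (approximately) 1 for each
sv_scores = clf.support_vectors_ @ w + b

# Constraint check: y_i (w.x_i + b) >= 1 with labels mapped to -1 / +1
y_signed = np.where(y == clf.classes_[1], 1, -1)
functional_margins = y_signed * (X @ w + b)

print("margin width:", round(margin_width, 3))
print("|w.x + b| at the support vectors:", np.round(np.abs(sv_scores), 3))
print("smallest y_i (w.x_i + b):", round(functional_margins.min(), 3))
```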

#### **4. What is the role of Lagrange Multipliers in SVM?**
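Introducing one multiplier $\alpha_i \ge 0$ per constraint (bounded above by C in the soft-margin case) turns the constrained primal problem into its dual. The dual is easier to solve and depends on the data only through dot products $x_i \cdot x_j$, which is what makes the kernel trick possible. Its solution gives $w = \sum_i \alpha_i y_i x_i$, subject to $\sum_i \alpha_i y_i = 0$, and only points with $\alpha_i > 0$ contribute to $w$.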
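
The relationship between the multipliers and the weight vector can be checked in scikit-learn, whose `dual_coef_` attribute stores $\alpha_i y_i$ for each support vector. This is a sketch on illustrative, slightly overlapping `make_blobs` data; it rebuilds $w = \sum_i \alpha_i y_i x_i$ from the multipliers and verifies the dual constraints $\sum_i \alpha_i y_i = 0$ and $0 \le \alpha_i \le C$.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Illustrative, slightly overlapping data, so some multipliers can reach the upper bound C
X, y = make_blobs(n_samples=100, centers=[[-1, -1], [1, 1]], cluster_std=1.0, random_state=1)
C = 1.0
clf = SVC(kernel="linear", C=C).fit(X, y)

# dual_coef_ holds alpha_i * y_i for each support vector (labels encoded as +1 / -1)
alpha_y = clf.dual_coef_[0]

# Primal weights rebuilt from the multipliers: w = sum_i alpha_i * y_i * x_i
w_from_dual = alpha_y @ clf.support_vectors_

print("w from coef_:          ", clf.coef_[0])
print("w from the multipliers:", w_from_dual)
print("sum of alpha_i * y_i:  ", alpha_y.sum())  # dual equality constraint, close to 0
print("largest alpha_i:       ", np.abs(alpha_y).max())  # never exceeds C
```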

#### **5. What are Support Vectors in SVM?**
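Support vectors are the training points that lie on or inside the margin (or are misclassified, in a soft-margin SVM). They are the only points with non-zero Lagrange multipliers, so they alone define the decision boundary: removing any other training point leaves the model unchanged.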
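
One way to see that only the support vectors matter is to refit the model on them alone. A sketch with scikit-learn, assuming illustrative `make_blobs` data; the tight `tol=1e-6` is an assumption added only so that solver tolerance does not blur the comparison. The two fitted hyperplanes should agree.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Illustrative two-class data in 2-D
X, y = make_blobs(n_samples=100, centers=[[-2, -2], [2, 2]], cluster_std=1.0, random_state=0)

clf = SVC(kernel="linear", C=1.0, tol=1e-6).fit(X, y)
print("support vectors:", len(clf.support_), "of", len(X), "training points")

# Refit on the support vectors only: every other point had a zero multiplier,
# so dropping them should leave the hyperplane unchanged (up to solver tolerance)
sv_idx = clf.support_
clf_sv = SVC(kernel="linear", C=1.0, tol=1e-6).fit(X[sv_idx], y[sv_idx])

print("w, b from all points:     ", clf.coef_[0], clf.intercept_[0])
print("w, b from support vectors:", clf_sv.coef_[0], clf_sv.intercept_[0])
```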

#### **6. What is a Support Vector Classifier (SVC)?**
An SVC is an SVM model used for classification tasks.

#### **7. What is a Support Vector Regressor (SVR)?**
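An SVR is an SVM used for regression. Instead of separating classes, it fits a function that keeps as many training points as possible inside a tube of half-width ε around the prediction. Errors smaller than ε are ignored, and only points on or outside the tube become support vectors.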
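
The ε-tube can be observed directly with scikit-learn's `SVR`. This sketch assumes an illustrative noisy sine curve as data; the `epsilon` values and the tightened `tol` are illustrative choices. Widening the tube reduces the number of support vectors, and every point that is not a support vector lies inside the tube.

```python
import numpy as np
from sklearn.svm import SVR

# Illustrative 1-D regression data: a noisy sine curve
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 5, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

n_support = {}
for epsilon in [0.05, 0.3]:
    # tol is tightened only so the tube check is not blurred by solver tolerance
    svr = SVR(kernel="rbf", C=1.0, epsilon=epsilon, tol=1e-6).fit(X, y)
    residuals = np.abs(y - svr.predict(X))
    non_sv = np.setdiff1d(np.arange(len(X)), svr.support_)  # points that are not support vectors
    n_support[epsilon] = len(svr.support_)
    print(f"epsilon={epsilon}: {n_support[epsilon]} support vectors; "
          f"largest residual among the other points = {residuals[non_sv].max(initial=0.0):.3f}")
```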

#### **8. What is the Kernel Trick in SVM?**
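The kernel trick replaces every dot product $x_i \cdot x_j$ in the SVM's dual problem with a kernel function $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$. This lets the SVM learn a linear boundary in a high-dimensional feature space $\phi(x)$, which is a non-linear boundary in the original space, without ever computing $\phi(x)$ explicitly.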
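
A small numeric check of the trick for the degree-2 polynomial kernel $K(x, z) = (x \cdot z)^2$ on 2-D inputs. The explicit feature map `phi` below is written out by hand for illustration; the same value is also computed with scikit-learn's `polynomial_kernel` (with `gamma=1`, `coef0=0`).

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

def phi(v):
    # Explicit degree-2 feature map for a 2-D point (x1, x2): (x1^2, sqrt(2)*x1*x2, x2^2)
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = phi(x) @ phi(z)   # dot product computed in the 3-D feature space
kernel = (x @ z) ** 2        # the same value computed directly in the 2-D input space
sk = polynomial_kernel(x.reshape(1, -1), z.reshape(1, -1), degree=2, gamma=1, coef0=0)[0, 0]

print(explicit, kernel, sk)  # all three equal 121 (up to floating-point rounding)
```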

#### **9. Compare Linear Kernel, Polynomial Kernel, and RBF Kernel.**
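- **Linear Kernel** ($K(x, z) = x \cdot z$): Best for linearly separable data and for very high-dimensional data such as text.
- **Polynomial Kernel** ($K(x, z) = (\gamma\, x \cdot z + r)^d$): Models feature interactions up to degree $d$; it has more hyperparameters to tune.
- **RBF Kernel** ($K(x, z) = \exp(-\gamma \lVert x - z \rVert^2)$): Produces flexible, smooth non-linear boundaries and is a common default for non-linearly separable data.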
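
To make the three formulas concrete, the sketch below evaluates each one by hand for a single pair of points and compares it with scikit-learn's pairwise kernel functions. The points and the values of `gamma`, `coef0` ($r$) and `degree` ($d$) are arbitrary illustrative choices, and the parameter names follow scikit-learn's conventions.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

x = np.array([[1.0, 2.0]])
z = np.array([[2.0, 0.5]])
gamma, coef0, degree = 0.5, 1.0, 3

# Each kernel written out by hand
manual = {
    "linear": (x @ z.T)[0, 0],
    "poly": ((gamma * (x @ z.T) + coef0) ** degree)[0, 0],
    "rbf": np.exp(-gamma * np.sum((x - z) ** 2)),
}
# The same kernels computed by scikit-learn
library = {
    "linear": linear_kernel(x, z)[0, 0],
    "poly": polynomial_kernel(x, z, degree=degree, gamma=gamma, coef0=coef0)[0, 0],
    "rbf": rbf_kernel(x, z, gamma=gamma)[0, 0],
}
for name in manual:
    print(f"{name:6s} manual={manual[name]:.4f}  sklearn={library[name]:.4f}")
```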

#### **10. What is the effect of the C parameter in SVM?**
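C sets the penalty for margin violations. Higher C values penalize violations heavily, giving a narrower margin that fits the training data more closely (lower bias, higher variance, more risk of overfitting). Lower C values tolerate more violations, giving a wider margin and stronger regularization.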
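
A quick way to see this trade-off is to fit a linear SVM for several C values and record the margin width $2 / \lVert w \rVert$ and the number of support vectors. This is a sketch on illustrative, partly overlapping `make_blobs` data; the C values are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Illustrative, partly overlapping two-class data
X, y = make_blobs(n_samples=200, centers=[[-1, -1], [1, 1]], cluster_std=1.0, random_state=0)

margin, n_sv = {}, {}
for C in [0.01, 1, 100]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin[C] = 2 / np.linalg.norm(clf.coef_)
    n_sv[C] = len(clf.support_)
    # Small C: wide margin with many points inside it; large C: narrower margin
    print(f"C={C:>6}: margin width={margin[C]:.3f}, support vectors={n_sv[C]}, "
          f"train accuracy={clf.score(X, y):.3f}")
```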

#### **11. What is the role of the Gamma parameter in RBF Kernel SVM?**
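Gamma controls how far the influence of a single training point reaches in $K(x, z) = \exp(-\gamma \lVert x - z \rVert^2)$. Higher gamma values make each point's influence more local, producing more complex, wiggly boundaries and increasing the risk of overfitting; lower gamma values give smoother boundaries.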
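
The overfitting effect of a large gamma shows up as a gap between training and test accuracy. A sketch assuming noisy `make_moons` data and illustrative gamma values (C is left at its default of 1):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Illustrative noisy, non-linear data
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

train_acc = {}
for gamma in [0.01, 1, 100]:
    clf = SVC(kernel="rbf", gamma=gamma).fit(X_train, y_train)
    train_acc[gamma] = clf.score(X_train, y_train)
    # A large train/test gap at high gamma signals overfitting
    print(f"gamma={gamma:>6}: train={train_acc[gamma]:.3f}, test={clf.score(X_test, y_test):.3f}")
```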

#### **12. What is the Naïve Bayes classifier, and why is it called "Naïve"?**
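Naïve Bayes is a probabilistic classifier based on Bayes' theorem. It is called "naïve" because it assumes that the features are conditionally independent of each other given the class, an assumption that rarely holds exactly in real data.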

#### **13. What is Bayes’ Theorem?**
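Bayes’ Theorem gives the probability of an event $A$ after observing evidence $B$, combining prior knowledge $P(A)$ with the likelihood of the evidence: $P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$. In classification, $A$ is the class and $B$ is the observed features.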
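
A worked example with hypothetical numbers for a diagnostic test (1% prevalence, 99% sensitivity, 5% false-positive rate; these figures are illustrative only). The evidence $P(B)$ comes from the law of total probability, and the posterior turns out much lower than the test's sensitivity because the prior is small.

```python
# Hypothetical numbers for a diagnostic test (illustrative only)
p_disease = 0.01            # prior P(D)
p_pos_given_disease = 0.99  # likelihood P(+ | D), the test's sensitivity
p_pos_given_healthy = 0.05  # false-positive rate P(+ | not D)

# Evidence P(+) by the law of total probability
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes' theorem: P(D | +) = P(+ | D) P(D) / P(+)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(f"P(disease | positive test) = {p_disease_given_pos:.3f}")  # prints 0.167
```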

#### **14. Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes.**
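- **Gaussian Naïve Bayes**: For continuous features; models each feature within each class as a normal (Gaussian) distribution.
- **Multinomial Naïve Bayes**: For discrete counts, such as word counts in text classification.
- **Bernoulli Naïve Bayes**: For binary features, such as whether a word is present or absent; unlike Multinomial NB, it explicitly penalizes the absence of a feature that is indicative of a class.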

#### **15. When should you use Gaussian Naïve Bayes over other variants?**
When the features are continuous, real-valued measurements that are roughly normally distributed within each class, rather than counts or binary indicators.

#### **16. What are the key assumptions made by Naïve Bayes?**
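- Features are conditionally independent of each other given the class.
- Each feature contributes to the prediction independently: its likelihood is multiplied into the posterior with no interaction terms between features.
- Within each class, every feature follows the assumed distribution (Gaussian, multinomial, or Bernoulli).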

#### **17. What are the advantages and disadvantages of Naïve Bayes?**
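- **Advantages:** Fast to train and predict, simple, and works well with small datasets and many features.
- **Disadvantages:** Assumes feature independence, which rarely holds exactly; its probability estimates are often poorly calibrated even when its class predictions are accurate.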

#### **18. Why is Naïve Bayes a good choice for text classification?**
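It handles high-dimensional, sparse word-count data well, needs relatively little training data, and is computationally efficient, since training amounts to counting word frequencies per class.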

#### **19. Compare SVM and Naïve Bayes for classification tasks.**
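- **SVM**: Can learn complex decision boundaries with kernels, but kernel SVM training time grows at least quadratically with the number of samples, and it needs feature scaling and hyperparameter tuning.
- **Naïve Bayes**: Trains much faster and works well with high-dimensional data such as text, but its independence assumption can limit accuracy when features are strongly correlated.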

#### **20. How does Laplace Smoothing help in Naïve Bayes?**
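Without smoothing, a word never seen with a class in training gets probability zero, which zeroes out the whole product of likelihoods for that class. Smoothing prevents this by adding a pseudo-count $\alpha$ to every count: $P(w \mid c) = \dfrac{N_{c,w} + \alpha}{N_c + \alpha \lvert V \rvert}$, where $N_{c,w}$ is the count of word $w$ in class $c$, $N_c$ is the total word count of class $c$, and $\lvert V \rvert$ is the vocabulary size. Laplace smoothing is the case $\alpha = 1$.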
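
A small worked example on a hypothetical 4-document, 3-word count matrix. It computes the smoothed estimates by hand with NumPy and checks them against scikit-learn's `MultinomialNB(alpha=1.0)`, whose `feature_log_prob_` attribute stores the log of these smoothed probabilities.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical word-count matrix: 4 documents x 3 vocabulary words, two classes
X = np.array([[2, 1, 0],
              [3, 0, 0],
              [0, 1, 4],
              [0, 2, 3]])
y = np.array([0, 0, 1, 1])
alpha = 1.0  # alpha = 1 is classic Laplace (add-one) smoothing

# Per-class word counts N_{c,w}, then (N_{c,w} + alpha) / (N_c + alpha * |V|)
counts = np.stack([X[y == c].sum(axis=0) for c in (0, 1)])
smoothed = (counts + alpha) / (counts.sum(axis=1, keepdims=True) + alpha * X.shape[1])
unsmoothed = counts / counts.sum(axis=1, keepdims=True)

print("unsmoothed P(w | c):\n", unsmoothed)  # the third word gets probability 0 in class 0
print("smoothed P(w | c):\n", smoothed)

mnb = MultinomialNB(alpha=alpha).fit(X, y)
print("matches MultinomialNB:", np.allclose(np.log(smoothed), mnb.feature_log_prob_))
```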

**Practical**

Write a Python program to train an SVM Classifier on the Iris dataset and evaluate accuracy:

In [None]:
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)
clf = SVC()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(accuracy_score(y_test, y_pred))

Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then
compare their accuracies:

In [None]:
from sklearn.datasets import load_wine

wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.2, random_state=42)

clf_linear = SVC(kernel='linear')
clf_rbf = SVC(kernel='rbf')

clf_linear.fit(X_train, y_train)
clf_rbf.fit(X_train, y_train)

y_pred_linear = clf_linear.predict(X_test)
y_pred_rbf = clf_rbf.predict(X_test)

print("Linear Kernel Accuracy:", accuracy_score(y_test, y_pred_linear))
print("RBF Kernel Accuracy:", accuracy_score(y_test, y_pred_rbf))

Write a Python program to train an SVM Regressor (SVR) on a housing dataset and evaluate it using Mean
Squared Error (MSE):

In [None]:
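from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error
from sklearn.datasets import fetch_california_housing
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(housing.data, housing.target, test_size=0.2, random_state=42)

# SVR is sensitive to feature scale, so standardize the features inside a pipeline
regressor = make_pipeline(StandardScaler(), SVR())
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)

print("MSE:", mean_squared_error(y_test, y_pred))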

Write a Python program to train an SVM Classifier with a Polynomial Kernel and visualize the decision
boundary:

In [None]:
import numpy as np
import matplotlib.pyplot as plt

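rng = np.random.RandomState(42)  # fixed seed for reproducibility
X = rng.randn(200, 2)
y = np.logical_xor(X[:, 0] > 0, X[:, 1] > 0)  # XOR pattern: not linearly separable

# coef0=1 keeps the lower-degree terms; with the default coef0=0 a degree-3 kernel contains
# only cubic terms and cannot represent the degree-2 XOR boundary
svm_poly = SVC(kernel='poly', degree=3, coef0=1)
svm_poly.fit(X, y)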

xx, yy = np.meshgrid(np.linspace(-3, 3, 100), np.linspace(-3, 3, 100))
Z = svm_poly.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contourf(xx, yy, Z, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k')
plt.show()

Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and
evaluate accuracy:

In [None]:
from sklearn.naive_bayes import GaussianNB
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.2, random_state=42)

gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred = gnb.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))

Write a Python program to train a Multinomial Naïve Bayes classifier for text classification using the 20
Newsgroups dataset.

In [None]:
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.datasets import fetch_20newsgroups

newsgroups = fetch_20newsgroups(subset='all', categories=['sci.space', 'rec.sport.baseball'])
# Separate names keep the numeric X_train/X_test splits from earlier cells from being overwritten by text
text_train, text_test, label_train, label_test = train_test_split(newsgroups.data, newsgroups.target, test_size=0.2, random_state=42)

vectorizer = CountVectorizer()
X_train_counts = vectorizer.fit_transform(text_train)
X_test_counts = vectorizer.transform(text_test)

mnb = MultinomialNB()
mnb.fit(X_train_counts, label_train)
y_pred = mnb.predict(X_test_counts)

print("Accuracy:", accuracy_score(label_test, y_pred))

Write a Python program to train an SVM Classifier with different C values and compare the decision
boundaries visually.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC

rng = np.random.RandomState(42)
X = rng.randn(200, 2)
y = np.logical_xor(X[:, 0] > 0, X[:, 1] > 0)

# RBF kernel: a linear kernel cannot fit XOR data, so its boundaries would not show the effect of C
xx, yy = np.meshgrid(np.linspace(-3, 3, 100), np.linspace(-3, 3, 100))

for C in [0.1, 1, 10]:
    clf = SVC(kernel='rbf', C=C)
    clf.fit(X, y)
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.3)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k')
    plt.title(f"RBF SVM with C={C}")
    plt.show()

Write a Python program to train a Bernoulli Naïve Bayes classifier for binary classification on a dataset with
binary features.

In [None]:
from sklearn.naive_bayes import BernoulliNB
from sklearn.datasets import make_classification

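X, y = make_classification(n_samples=500, n_features=10, n_classes=2, random_state=42)
X = (X > 0).astype(int)  # convert the continuous features into binary (0/1) features

# Evaluate on held-out data rather than on the training set
Xb_train, Xb_test, yb_train, yb_test = train_test_split(X, y, test_size=0.2, random_state=42)
bnb = BernoulliNB()
bnb.fit(Xb_train, yb_train)
y_pred = bnb.predict(Xb_test)
print("Accuracy:", accuracy_score(yb_test, y_pred))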

Write a Python program to apply feature scaling before training an SVM model and compare results with
unscaled data.

In [None]:
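from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Recreate the Breast Cancer split so this cell does not depend on whichever data an earlier cell left in X_train
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.2, random_state=42)

clf_unscaled = SVC(kernel='linear').fit(X_train, y_train)
print("Accuracy without Scaling:", clf_unscaled.score(X_test, y_test))
# The pipeline fits the scaler on the training data only, then applies it to the test data
clf = make_pipeline(StandardScaler(), SVC(kernel='linear'))
clf.fit(X_train, y_train)
print("Accuracy with Scaling:", clf.score(X_test, y_test))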

Write a Python program to train a Gaussian Naïve Bayes model and compare the predictions before and
after Laplace Smoothing.

In [None]:
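# GaussianNB has no Laplace (count) smoothing; its analogue is var_smoothing, which adds a fraction
# of the largest feature variance to all variances for numerical stability.
# Laplace smoothing proper is the alpha parameter of MultinomialNB / BernoulliNB.
gnb_default = GaussianNB(var_smoothing=1e-9)   # scikit-learn's default value
gnb_smoothed = GaussianNB(var_smoothing=1e-5)  # heavier smoothing
gnb_default.fit(X_train, y_train)
gnb_smoothed.fit(X_train, y_train)
print("Accuracy with default var_smoothing:", gnb_default.score(X_test, y_test))
print("Accuracy with var_smoothing=1e-5:", gnb_smoothed.score(X_test, y_test))
print("Predictions that changed:", (gnb_default.predict(X_test) != gnb_smoothed.predict(X_test)).sum())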

Write a Python program to train an SVM Classifier and use GridSearchCV to tune the hyperparameters (C,
gamma, kernel).

In [None]:
from sklearn.model_selection import GridSearchCV

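# Scaling inside the pipeline means each CV fold is scaled using only its own training part;
# the 'svc__' prefix routes each parameter to the SVC step (gamma is ignored by the linear kernel)
param_grid = {'svc__C': [0.1, 1, 10], 'svc__gamma': ['scale', 0.01, 0.1], 'svc__kernel': ['linear', 'rbf']}
grid = GridSearchCV(make_pipeline(StandardScaler(), SVC()), param_grid, cv=5)
grid.fit(X_train, y_train)
print("Best parameters:", grid.best_params_)
print("Best cross-validation accuracy:", grid.best_score_)
print("Test accuracy:", grid.score(X_test, y_test))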

Write a Python program to train an SVM Classifier on an imbalanced dataset, apply class weighting, and
check whether it improves accuracy.

In [None]:
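from sklearn.datasets import make_classification
from sklearn.metrics import recall_score

# Build a deliberately imbalanced binary dataset (about 90% class 0, 10% class 1)
X_imb, y_imb = make_classification(n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=42)
Xi_train, Xi_test, yi_train, yi_test = train_test_split(X_imb, y_imb, test_size=0.2, stratify=y_imb, random_state=42)

for cw in [None, 'balanced']:  # 'balanced' weights classes inversely to their frequencies
    yi_pred = SVC(class_weight=cw).fit(Xi_train, yi_train).predict(Xi_test)
    # Accuracy can look good while the minority class is ignored, so also report its recall
    print(f"class_weight={cw}: accuracy={accuracy_score(yi_test, yi_pred):.3f}, minority recall={recall_score(yi_test, yi_pred):.3f}")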

Write a Python program to implement a Naïve Bayes classifier for spam detection using email data.

In [None]:
from sklearn.datasets import fetch_openml

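# OpenML's "spambase" dataset: word and character frequency features extracted from emails
email_data = fetch_openml(name='spambase', version=1, as_frame=False)
X_train, X_test, y_train, y_test = train_test_split(email_data.data, email_data.target, test_size=0.2, random_state=42)
gnb = GaussianNB()
gnb.fit(X_train, y_train)
print("Spam Detection Accuracy:", gnb.score(X_test, y_test))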

Write a Python program to train an SVM Classifier and a Naïve Bayes Classifier on the same dataset and
compare their accuracy.

In [None]:
svm_clf = SVC()
svm_clf.fit(X_train, y_train)
print("SVM Accuracy:", svm_clf.score(X_test, y_test))
print("Naive Bayes Accuracy:", gnb.score(X_test, y_test))

Write a Python program to perform feature selection before training a Naïve Bayes classifier and compare
results.

In [None]:
from sklearn.feature_selection import SelectKBest, chi2

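# Fit the selector on the training data only, then apply the same 10 features to the test data
# (chi2 requires non-negative feature values)
selector = SelectKBest(chi2, k=10).fit(X_train, y_train)
X_train_sel, X_test_sel = selector.transform(X_train), selector.transform(X_test)

print("Accuracy with all features:", GaussianNB().fit(X_train, y_train).score(X_test, y_test))
print("Accuracy after feature selection:", GaussianNB().fit(X_train_sel, y_train).score(X_test_sel, y_test))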

Write a Python program to train an SVM Classifier using One-vs-Rest (OvR) and One-vs-One (OvO)
strategies on the Wine dataset and compare their accuracy.

In [None]:
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

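# Reload Wine into separate variables: X_train no longer holds the Wine data at this point
Xw_train, Xw_test, yw_train, yw_test = train_test_split(wine.data, wine.target, test_size=0.2, random_state=42)

ovr_svm = OneVsRestClassifier(SVC())
ovo_svm = OneVsOneClassifier(SVC())

ovr_svm.fit(Xw_train, yw_train)
ovo_svm.fit(Xw_train, yw_train)

print("OvR Accuracy:", ovr_svm.score(Xw_test, yw_test))
print("OvO Accuracy:", ovo_svm.score(Xw_test, yw_test))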

Write a Python program to train an SVM Classifier using Linear, Polynomial, and RBF kernels on the Breast
Cancer dataset and compare their accuracy.

In [None]:
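# Recreate the Breast Cancer split so this cell does not depend on earlier cells' data
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.2, random_state=42)

for kernel in ['linear', 'poly', 'rbf']:
    # Standardize first: SVM kernels are sensitive to feature scale
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    clf.fit(X_train, y_train)
    print(f"{kernel} Kernel Accuracy:", clf.score(X_test, y_test))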

Write a Python program to train an SVM Classifier using Stratified K-Fold Cross-Validation and compute the
average accuracy.

In [None]:
from sklearn.model_selection import StratifiedKFold, cross_val_score

cv = StratifiedKFold(n_splits=5)
scores = cross_val_score(SVC(), X_train, y_train, cv=cv)
print("Cross-validation scores:", scores)
print("Average accuracy:", scores.mean())

Write a Python program to train a Naïve Bayes classifier using different prior probabilities and compare
performance.

In [None]:
priors = [[0.7, 0.3], [0.5, 0.5]]
for prior in priors:
    gnb = GaussianNB(priors=prior)
    gnb.fit(X_train, y_train)
    print(f"Prior {prior} Accuracy:", gnb.score(X_test, y_test))

Write a Python program to perform Recursive Feature Elimination (RFE) before training an SVM Classifier and
compare accuracy.

In [None]:
from sklearn.feature_selection import RFE

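# Standardize with training-set statistics so the linear SVM converges quickly
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# RFE ranks features by the magnitude of coef_, so it needs a linear-kernel SVM
rfe = RFE(SVC(kernel='linear'), n_features_to_select=5).fit(X_train_s, y_train)
svm_all = SVC(kernel='linear').fit(X_train_s, y_train)
svm_rfe = SVC(kernel='linear').fit(rfe.transform(X_train_s), y_train)
print("Accuracy with all features:", svm_all.score(X_test_s, y_test))
print("Accuracy after RFE:", svm_rfe.score(rfe.transform(X_test_s), y_test))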

Write a Python program to train an SVM Classifier and evaluate its performance using Precision, Recall, and
F1-Score instead of accuracy.

In [None]:
from sklearn.metrics import classification_report

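# Train an SVM on all features so this cell does not depend on models fitted in earlier cells
clf = make_pipeline(StandardScaler(), SVC())
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)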
print(classification_report(y_test, y_pred))

Write a Python program to train a Naïve Bayes Classifier and evaluate its performance using Log Loss
(Cross-Entropy Loss).

In [None]:
from sklearn.metrics import log_loss

y_prob = gnb.predict_proba(X_test)
print("Log Loss:", log_loss(y_test, y_prob))

Write a Python program to train an SVM Classifier and visualize the Confusion Matrix using seaborn.

In [None]:
import seaborn as sns
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt="d")
plt.show()

Write a Python program to train an SVM Regressor (SVR) and evaluate its performance using Mean Absolute
Error (MAE) instead of MSE.

In [None]:
from sklearn.metrics import mean_absolute_error

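# X_test no longer holds the housing data; recreate that split under new names
# (the same random_state reproduces the test set used in the MSE cell)
_, X_test_h, _, y_test_h = train_test_split(housing.data, housing.target, test_size=0.2, random_state=42)
y_pred_h = regressor.predict(X_test_h)
print("MAE:", mean_absolute_error(y_test_h, y_pred_h))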

Write a Python program to train a Naïve Bayes classifier and evaluate its performance using the ROC-AUC
score.

In [None]:
from sklearn.metrics import roc_auc_score

y_prob = gnb.predict_proba(X_test)[:, 1]
print("ROC-AUC Score:", roc_auc_score(y_test, y_prob))

Write a Python program to train an SVM Classifier and visualize the Precision-Recall Curve.

In [None]:
from sklearn.metrics import precision_recall_curve

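# Score the test set with an SVM (y_prob from the previous cell holds Naive Bayes probabilities)
svm_clf = make_pipeline(StandardScaler(), SVC()).fit(X_train, y_train)
y_score = svm_clf.decision_function(X_test)  # larger values mean more confidently class 1
precision, recall, _ = precision_recall_curve(y_test, y_score)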
plt.plot(recall, precision, marker='.')
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-Recall Curve")
plt.show()