
# SVM & Naive Bayes – Assignment Q&A

### Q1. What is a Support Vector Machine (SVM)?

**Answer:**
A Support Vector Machine is a supervised learning model used for classification and regression. For classification, it finds the hyperplane that separates the classes with the largest margin, i.e. the largest distance to the nearest training points of each class.

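### Q2. What is the difference between Hard Margin and Soft Margin SVM?

**Answer:**
Hard Margin SVM requires every training point to be classified correctly and to lie outside the margin, so it only has a solution when the data is linearly separable, and it is very sensitive to outliers. Soft Margin SVM introduces slack variables that let some points fall inside the margin or be misclassified, with the penalty set by C; this usually generalizes better on noisy data.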

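### Q3. What is the mathematical intuition behind SVM?

**Answer:**
SVM looks for the separating hyperplane $\mathbf{w} \cdot \mathbf{x} + b = 0$ with the widest margin. If the hyperplane is scaled so that the closest points satisfy $|\mathbf{w} \cdot \mathbf{x} + b| = 1$, the margin width is $2/\lVert\mathbf{w}\rVert$, so maximizing the margin is equivalent to minimizing $\lVert\mathbf{w}\rVert$ while keeping every point on the correct side. The soft-margin version additionally penalizes points that violate the margin.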
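For reference, the intuition above corresponds to the standard textbook primal problems below, written with labels $y_i \in \{-1, +1\}$, slack variables $\xi_i$ and the penalty $C$ from Q10 (standard notation, not taken from the original assignment):

$$\min_{\mathbf{w},\,b}\ \frac{1}{2}\lVert\mathbf{w}\rVert^2 \quad \text{s.t.}\quad y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1,\quad i = 1,\dots,n \qquad \text{(hard margin)}$$

$$\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\ \frac{1}{2}\lVert\mathbf{w}\rVert^2 + C\sum_{i=1}^{n}\xi_i \quad \text{s.t.}\quad y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0 \qquad \text{(soft margin)}$$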

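### Q4. What is the role of Lagrange Multipliers in SVM?

**Answer:**
They turn the constrained optimization problem into its dual. Each training point $\mathbf{x}_i$ gets a multiplier $\alpha_i \ge 0$ (also bounded above by $C$ in the soft-margin case), and the solution is $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$. Points with $\alpha_i > 0$ are exactly the support vectors, and because the dual depends on the data only through dot products $\mathbf{x}_i \cdot \mathbf{x}_j$, it can be kernelized.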
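A minimal sketch of the dual solution in scikit-learn, assuming a synthetic two-cluster dataset from `make_blobs`: `SVC.dual_coef_` stores $\alpha_i y_i$ for each support vector, so multiplying it by `support_vectors_` should rebuild the weight vector `coef_`, and the dual constraints $\sum_i \alpha_i y_i = 0$ and $0 < \alpha_i \le C$ can be checked directly.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic two-class data, chosen only for illustration
X, y = make_blobs(n_samples=60, centers=2, cluster_std=0.8, random_state=0)

C = 1.0
clf = SVC(kernel="linear", C=C).fit(X, y)

# dual_coef_ holds alpha_i * y_i for each support vector (y_i in {-1, +1})
alpha_y = clf.dual_coef_.ravel()

# Rebuild w = sum_i alpha_i * y_i * x_i from the dual solution
w_from_dual = alpha_y @ clf.support_vectors_

print("w from coef_:      ", clf.coef_.ravel())
print("w from dual form:  ", w_from_dual)
print("sum of alpha_i*y_i:", alpha_y.sum())  # dual equality constraint, should be ~0
print("0 < alpha_i <= C:  ", np.all((np.abs(alpha_y) > 0) & (np.abs(alpha_y) <= C + 1e-8)))
```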

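### Q5. What are Support Vectors in SVM?

**Answer:**
Support vectors are the training points with nonzero Lagrange multipliers: the points lying on the margin boundaries or, in a soft-margin SVM, inside the margin or misclassified. They alone determine the position and orientation of the hyperplane; removing any other training point leaves the solution unchanged.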
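To illustrate that only support vectors matter, here is a small sketch on a hand-made, linearly separable toy dataset (an assumption for illustration). A very large `C` approximates a hard margin; refitting on the support vectors alone should give the same hyperplane $x_1 = 0$, i.e. $\mathbf{w} \approx (1, 0)$ and $b \approx 0$.

```python
import numpy as np
from sklearn.svm import SVC

# Hand-made separable toy data: class -1 on the left, class +1 on the right
X = np.array([[-1, 0], [-2, 0], [-3, 1], [1, 0], [2, 0], [3, -1]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates a hard-margin SVM
clf = SVC(kernel="linear", C=1e6).fit(X, y)
print("support vectors:\n", clf.support_vectors_)  # should be the inner points (-1, 0) and (1, 0)
print("w =", clf.coef_.ravel(), " b =", clf.intercept_)

# Refit using only the support vectors: the hyperplane should not change
sv = clf.support_
clf_sv = SVC(kernel="linear", C=1e6).fit(X[sv], y[sv])
print("w (support vectors only) =", clf_sv.coef_.ravel(), " b =", clf_sv.intercept_)
```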

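### Q6. What is a Support Vector Classifier (SVC)?

**Answer:**
SVC is the SVM formulation used for classification. In scikit-learn it is implemented as `sklearn.svm.SVC`, with the kernel chosen through its `kernel` parameter.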

### Q7. What is a Support Vector Regressor (SVR)?

**Answer:**
SVR is the SVM variant for regression. It fits a function surrounded by a tube of half-width epsilon: errors smaller than epsilon are ignored, and only points on or outside the tube become support vectors.
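A minimal sketch of the epsilon-tube, assuming synthetic noisy linear data $y = 2x + 1 + \text{noise}$: widening `epsilon` leaves more points inside the tube, so fewer of them become support vectors.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic data (assumed for illustration): y = 2x + 1 + Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 2 * X.ravel() + 1 + rng.normal(scale=0.5, size=200)

n_sv = {}
for eps in (0.1, 1.0):
    svr = SVR(kernel="linear", C=1.0, epsilon=eps).fit(X, y)
    # Points strictly inside the tube get zero dual weight and are not support vectors
    n_sv[eps] = len(svr.support_)
    print(f"epsilon={eps}: {n_sv[eps]} support vectors, slope={svr.coef_[0, 0]:.2f}")
```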

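### Q8. What is the Kernel Trick in SVM?

**Answer:**
The kernel trick replaces every dot product $\mathbf{x}_i \cdot \mathbf{x}_j$ in the SVM dual with a kernel function $K(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i) \cdot \phi(\mathbf{x}_j)$, which equals a dot product in a higher-dimensional feature space. The SVM can therefore find a linear boundary in that space (a nonlinear one in the original space) without ever computing the mapping $\phi$ explicitly.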
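A small numerical check of the idea, using the degree-2 homogeneous polynomial kernel $K(a, b) = (a \cdot b)^2$ on 2-D vectors as an example: its explicit feature map is $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$, and the kernel gives the same value without building $\phi$.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly2_kernel(a, b):
    """Homogeneous polynomial kernel of degree 2: K(a, b) = (a . b)^2."""
    return np.dot(a, b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])

explicit = np.dot(phi(a), phi(b))  # dot product computed in the 3-D feature space
via_kernel = poly2_kernel(a, b)    # same quantity computed entirely in 2-D
print(explicit, via_kernel)        # both 121 (up to floating-point rounding)
```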

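### Q9. Compare Linear Kernel, Polynomial Kernel, and RBF Kernel.

**Answer:**
- Linear, $K(x, z) = x \cdot z$: fastest; best when the data is (close to) linearly separable or very high-dimensional, as with text
- Polynomial, $K(x, z) = (\gamma\, x \cdot z + r)^d$: maps input features to a polynomial feature space, modelling feature interactions up to degree $d$
- RBF, $K(x, z) = \exp(-\gamma \lVert x - z \rVert^2)$: a distance-based similarity that decays with distance, able to capture complex nonlinear boundaries; a common default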
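A quick comparison sketch on synthetic concentric circles from `make_circles` (data and settings are assumptions for illustration, with `degree=2` picked for the polynomial kernel). The classes cannot be split by a straight line, so the linear kernel should do much worse than the RBF kernel.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric circles: not linearly separable in the original 2-D space
X, y = make_circles(n_samples=400, noise=0.1, factor=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

scores = {}
for kernel, params in [("linear", {}), ("poly", {"degree": 2}), ("rbf", {})]:
    clf = SVC(kernel=kernel, **params).fit(X_train, y_train)
    scores[kernel] = clf.score(X_test, y_test)
    print(f"{kernel:>6}: test accuracy = {scores[kernel]:.2f}")
```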

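### Q10. What is the effect of the C parameter in SVM?

**Answer:**
C controls the trade-off between a wide margin and few training errors. A small C tolerates more margin violations (wider margin, more support vectors, smoother boundary, risk of underfitting); a large C penalizes violations heavily (narrower margin, fewer support vectors, risk of overfitting).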
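A minimal sketch of this trade-off, assuming two overlapping synthetic clusters from `make_blobs`: with a small C the wide margin contains many points, so the number of support vectors should shrink as C grows.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping clusters (synthetic), so some training errors are unavoidable
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

n_sv = {}
for C in (0.01, 1, 100):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    n_sv[C] = len(clf.support_)
    print(f"C={C:>6}: {n_sv[C]} support vectors, training accuracy = {clf.score(X, y):.2f}")
```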

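### Q11. What is the role of the Gamma parameter in RBF Kernel SVM?

**Answer:**
Gamma sets how far the influence of a single training example reaches, because the RBF kernel is $K(x, z) = \exp(-\gamma \lVert x - z \rVert^2)$. A low gamma means each point influences a wide region, giving smooth decision boundaries; a high gamma makes the influence very local, giving wiggly boundaries that can overfit.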
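To make "far" versus "close" influence concrete, this sketch evaluates scikit-learn's `rbf_kernel` between the origin and points at distances 0.5, 1 and 2 for a few illustrative gamma values. With `gamma=10`, a point at distance 1 already has similarity $e^{-10} \approx 4.5 \times 10^{-5}$.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

origin = np.array([[0.0, 0.0]])
points = np.array([[0.5, 0.0], [1.0, 0.0], [2.0, 0.0]])  # distances 0.5, 1 and 2 from the origin

sims = {}
for gamma in (0.1, 1.0, 10.0):
    # rbf_kernel computes exp(-gamma * ||x - z||^2)
    sims[gamma] = rbf_kernel(origin, points, gamma=gamma).ravel()
    print(f"gamma={gamma:>4}: similarity at distances 0.5, 1, 2 ->", np.round(sims[gamma], 4))
```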

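### Q12. What is the Naïve Bayes classifier, and why is it called 'Naïve'?

**Answer:**
Naïve Bayes is a probabilistic classifier based on Bayes’ theorem: it predicts the class with the highest posterior probability given the features. It is called "naïve" because it assumes that all features are conditionally independent given the class, an assumption that rarely holds exactly in real data.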

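### Q13. What is Bayes’ Theorem?

**Answer:**
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Here $P(A \mid B)$ is the posterior, $P(B \mid A)$ the likelihood, $P(A)$ the prior and $P(B)$ the evidence. In Naïve Bayes, $A$ is the class and $B$ the observed features.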
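A worked example of the formula with hypothetical medical-test numbers (1% prevalence, 99% sensitivity, 5% false-positive rate, all assumed purely for illustration). $P(B)$ comes from the law of total probability, and the posterior turns out to be only about 1 in 6.

```python
# Hypothetical numbers, chosen only to illustrate Bayes' theorem
p_disease = 0.01             # P(A): prior probability of having the disease
p_pos_given_disease = 0.99   # P(B|A): test sensitivity
p_pos_given_healthy = 0.05   # P(B|not A): false-positive rate

# P(B): total probability of a positive test (law of total probability)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# P(A|B): probability of disease given a positive test
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | positive) = {p_disease_given_pos:.3f}")  # 0.167
```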

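### Q14. Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes.

**Answer:**
- Gaussian: for continuous features, each modelled as normally distributed within each class
- Multinomial: for non-negative count data, such as word counts in documents
- Bernoulli: for binary (present/absent) features; scikit-learn's `BernoulliNB` binarizes its inputs at a threshold (0.0 by default)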
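A small sketch contrasting two of the variants on a made-up word-count matrix: `MultinomialNB` uses the counts themselves, while `BernoulliNB` with its default `binarize=0.0` only sees whether each count is above zero, so it should behave exactly like `BernoulliNB` fitted on an explicit 0/1 matrix.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Tiny synthetic word-count matrix: rows = documents, columns = vocabulary terms
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 2, 3]])
y = np.array([0, 0, 1, 1])

# MultinomialNB models how often each term occurs
multi = MultinomialNB().fit(X_counts, y)

# BernoulliNB models presence/absence: with binarize=0.0 any count > 0 becomes 1
bern_counts = BernoulliNB(binarize=0.0).fit(X_counts, y)
# binarize=None tells BernoulliNB the input is already binary
bern_binary = BernoulliNB(binarize=None).fit((X_counts > 0).astype(int), y)

x_new = np.array([[1, 0, 5]])
print("Multinomial P(y|x):", multi.predict_proba(x_new).round(3))
print("Bernoulli   P(y|x):", bern_counts.predict_proba(x_new).round(3))
print("same as binary input:",
      np.allclose(bern_counts.predict_proba(x_new),
                  bern_binary.predict_proba((x_new > 0).astype(int))))
```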

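### Q15. When should you use Gaussian Naïve Bayes over other variants?

**Answer:**
When the features are continuous and approximately normally distributed within each class (for example, physical measurements). Multinomial Naïve Bayes expects non-negative counts and Bernoulli Naïve Bayes expects binary features.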

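### Q16. What are the key assumptions made by Naïve Bayes?

**Answer:**
Features are conditionally independent given the class label, so the class-conditional likelihood factorizes as $P(x_1, \dots, x_n \mid y) = \prod_{j=1}^{n} P(x_j \mid y)$. Each variant also assumes a particular per-feature distribution (Gaussian, multinomial or Bernoulli).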
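The factorization can be seen in a from-scratch Gaussian Naïve Bayes, sketched below on the bundled iris dataset. Under the independence assumption the log-likelihood is just a sum of per-feature Gaussian log-densities. The small variance term mirrors scikit-learn's default `var_smoothing=1e-9`, so the result should match `GaussianNB.predict_proba`.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

# Per-class priors, means and variances; eps mirrors GaussianNB's var_smoothing=1e-9
eps = 1e-9 * X.var(axis=0).max()
priors = np.array([np.mean(y == c) for c in classes])
means = np.array([X[y == c].mean(axis=0) for c in classes])
variances = np.array([X[y == c].var(axis=0) + eps for c in classes])

def log_joint(x):
    """log P(y) + sum_j log N(x_j | mean_yj, var_yj): the independence assumption
    turns the joint likelihood into a sum of per-feature log-densities."""
    log_lik = -0.5 * np.sum(np.log(2 * np.pi * variances) + (x - means) ** 2 / variances, axis=1)
    return np.log(priors) + log_lik

def predict_proba(x):
    """Normalize the joint log-likelihoods into P(y | x) with a stable softmax."""
    lj = log_joint(x)
    p = np.exp(lj - lj.max())
    return p / p.sum()

x = X[75]
ours = predict_proba(x)
sk = GaussianNB().fit(X, y).predict_proba(x.reshape(1, -1))[0]
print("from scratch:", ours.round(4))
print("scikit-learn:", sk.round(4))
```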

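### Q17. What are the advantages and disadvantages of Naïve Bayes?

**Answer:**
**Advantages**: simple, fast to train and predict, works well with small training sets and with high-dimensional sparse data such as text.

**Disadvantages**: the strong independence assumption rarely holds, so correlated features are effectively double-counted and predicted probabilities are often poorly calibrated; without smoothing, feature values unseen in training produce zero probabilities.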

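### Q18. Why is Naïve Bayes a good choice for text classification?

**Answer:**
Word occurrences are not truly independent, but Naïve Bayes still tends to rank classes well on text. It trains in a single pass over the word counts, scales to very large vocabularies, and handles sparse, high-dimensional bag-of-words features well.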
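A minimal text-classification sketch, assuming a made-up six-document spam/ham corpus: `CountVectorizer` builds the sparse bag-of-words counts and `MultinomialNB` (with Laplace smoothing, `alpha=1.0`) classifies them.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy corpus, made up for illustration
docs = ["win money now", "free money offer", "win a free prize",
        "meeting at noon", "lunch with the team", "project meeting tomorrow"]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Bag-of-words counts feed straight into MultinomialNB
model = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1.0))
model.fit(docs, labels)

print(model.predict(["free money", "team meeting tomorrow"]))  # ['spam' 'ham']
```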

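### Q19. Compare SVM and Naïve Bayes for classification tasks.

**Answer:**
SVMs, especially with nonlinear kernels, are often more accurate on complex, moderately sized datasets, but they train more slowly, need feature scaling, and need C and gamma to be tuned. Naïve Bayes trains much faster, needs little data and tuning, and is a strong baseline for text classification, but its independence assumption limits accuracy when features are strongly correlated.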

### Q20. How does Laplace Smoothing help in Naïve Bayes?

**Answer:**
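Without smoothing, a feature value that never occurs with a class in the training data gets probability zero, which makes the whole product of likelihoods zero. Laplace smoothing adds a constant $\alpha$ ($\alpha = 1$ for classic Laplace smoothing) to every count: $P(x_i \mid y) = \dfrac{N_{yi} + \alpha}{N_y + \alpha n}$, where $N_{yi}$ is the count of feature $i$ in class $y$, $N_y$ the total count in class $y$ and $n$ the number of features.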
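A small check of the smoothing formula on a made-up three-word count matrix: the manual estimate should match `MultinomialNB`'s `feature_log_prob_`, and the word never seen in class 0 gets a probability of 1/7 instead of 0.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Synthetic word counts: two class-0 documents and one class-1 document, 3-word vocabulary
X = np.array([[2, 1, 0],
              [1, 0, 0],
              [0, 1, 3]])
y = np.array([0, 0, 1])
alpha = 1.0

# Manual Laplace-smoothed estimate: (N_yi + alpha) / (N_y + alpha * n)
counts_class0 = X[y == 0].sum(axis=0)  # [3, 1, 0]
manual = (counts_class0 + alpha) / (counts_class0.sum() + alpha * X.shape[1])
print("manual  P(w|class 0):", manual)  # [4/7, 2/7, 1/7]: the third word is no longer 0

nb = MultinomialNB(alpha=alpha).fit(X, y)
print("sklearn P(w|class 0):", np.exp(nb.feature_log_prob_[0]))
```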

## Practical Implementation of SVM & Naïve Bayes

### Q21. Import libraries and load dataset

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, SVR
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("your_dataset.csv")  # placeholder path: use a CSV that has a 'target' column
df.head()

### Q22. Explore dataset structure and check for null values

In [None]:
df.info()  # prints its summary itself and returns None, so no print() is needed
print(df.isnull().sum())

### Q23. Split the data into features and target variable

In [None]:
X = df.drop('target', axis=1)
y = df['target']

### Q24. Train-test split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

### Q25. Train an SVM with a linear kernel

In [None]:
svm_linear = SVC(kernel='linear')
svm_linear.fit(X_train, y_train)

### Q26. Evaluate SVM with linear kernel

In [None]:
y_pred = svm_linear.predict(X_test)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

### Q27. Train an SVM with RBF kernel

In [None]:
svm_rbf = SVC(kernel='rbf')
svm_rbf.fit(X_train, y_train)

### Q28. Train an SVM with polynomial kernel

In [None]:
svm_poly = SVC(kernel='poly', degree=3)
svm_poly.fit(X_train, y_train)

### Q29. Compare accuracies of SVM models

In [None]:
print('Linear:', accuracy_score(y_test, svm_linear.predict(X_test)))
print('RBF:', accuracy_score(y_test, svm_rbf.predict(X_test)))
print('Poly:', accuracy_score(y_test, svm_poly.predict(X_test)))

### Q30. Train a Gaussian Naïve Bayes classifier

In [None]:
nb_gauss = GaussianNB()
nb_gauss.fit(X_train, y_train)

### Q31. Evaluate Gaussian Naïve Bayes

In [None]:
y_pred_nb = nb_gauss.predict(X_test)
print(confusion_matrix(y_test, y_pred_nb))
print(classification_report(y_test, y_pred_nb))

### Q32. Train a Multinomial Naïve Bayes model (for text/count data)

In [None]:
# MultinomialNB raises an error on negative values: use it only if all features are non-negative counts or frequencies
nb_multi = MultinomialNB()
nb_multi.fit(X_train, y_train)

### Q33. Train a Bernoulli Naïve Bayes model (for binary features)

In [None]:
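# BernoulliNB binarizes each feature at 0.0 by default; pass binarize=<threshold> for another cut-off
nb_bern = BernoulliNB()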
nb_bern.fit(X_train, y_train)

### Q34. Plot confusion matrix for best model

In [None]:
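# y_pred_nb holds the Gaussian NB predictions from Q31; substitute the predictions of whichever model scored best in Q29/Q35
sns.heatmap(confusion_matrix(y_test, y_pred_nb), annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted label')
plt.ylabel('True label')
plt.title('Confusion Matrix (Gaussian Naïve Bayes)')
plt.show()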

### Q35. Compare performance of SVM and Naïve Bayes

In [None]:
print('SVM (RBF, unscaled) accuracy:', accuracy_score(y_test, svm_rbf.predict(X_test)))
print('Gaussian NB accuracy:', accuracy_score(y_test, nb_gauss.predict(X_test)))

### Q36. Apply feature scaling and retrain SVM

In [None]:
scaler = StandardScaler()
# Fit the scaler on the training split only and reuse it on the test split,
# so no test-set statistics leak into training
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
svm_scaled = SVC(kernel='rbf')
svm_scaled.fit(X_train_scaled, y_train)

### Q37. Evaluate scaled SVM model

In [None]:
y_pred_scaled = svm_scaled.predict(X_test_scaled)
print(classification_report(y_test, y_pred_scaled))

### Q38. Use GridSearchCV to find best SVM parameters

In [None]:
from sklearn.model_selection import GridSearchCV
params = {'C': [0.1, 1, 10], 'gamma': [1, 0.1, 0.01], 'kernel': ['rbf']}
grid = GridSearchCV(SVC(), param_grid=params, refit=True, verbose=0)
grid.fit(X_train_scaled, y_train)
print('Best Params:', grid.best_params_)
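Because the scaler above is fitted on the whole training split, each cross-validation fold inside `GridSearchCV` still sees statistics from its own validation part. A common remedy, sketched here, puts the scaler and the SVC in a `Pipeline` so scaling is refitted per fold; the bundled breast-cancer dataset stands in for `your_dataset.csv` (an assumption, to keep the example runnable), and distinct variable names avoid overwriting the notebook's own variables.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Bundled dataset used as a stand-in for your_dataset.csv
X_bc, y_bc = load_breast_cancer(return_X_y=True)
Xb_train, Xb_test, yb_train, yb_test = train_test_split(
    X_bc, y_bc, test_size=0.2, random_state=42, stratify=y_bc)

# The scaler is refitted inside every CV fold, using only that fold's training part
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC(kernel="rbf"))])
# Step name + "__" + parameter name addresses parameters inside the pipeline
pipe_params = {"svc__C": [0.1, 1, 10], "svc__gamma": [1, 0.1, 0.01]}

grid_pipe = GridSearchCV(pipe, param_grid=pipe_params, cv=5)
grid_pipe.fit(Xb_train, yb_train)
print("Best params:", grid_pipe.best_params_)
print("Test accuracy:", grid_pipe.score(Xb_test, yb_test))
```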

### Q39. Evaluate best model from GridSearchCV

In [None]:
best_model = grid.best_estimator_
y_pred_best = best_model.predict(X_test_scaled)
print('Best model test accuracy:', accuracy_score(y_test, y_pred_best))

### Q40. Train an SVR model for regression task

In [None]:
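# SVR expects a continuous numeric target: with the classification labels used above it only
# runs if they are numeric, and it then treats the class labels as real values
svr_model = SVR(kernel='rbf')
svr_model.fit(X_train_scaled, y_train)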

### Q41. Evaluate SVR model performance

In [None]:
print('SVR R² score:', svr_model.score(X_test_scaled, y_test))  # .score() returns R² for regressors
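Since the notebook's target is a classification label, here is a self-contained regression sketch on synthetic data, assuming $y = \sin(x) + \text{noise}$ and illustrative settings `C=10`, `epsilon=0.1`. The scaler sits in a pipeline, and `.score()` again reports $R^2$.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic regression data (assumed for illustration): y = sin(x) + Gaussian noise
rng = np.random.default_rng(42)
x_reg = rng.uniform(0, 5, size=(200, 1))
y_reg = np.sin(x_reg).ravel() + rng.normal(scale=0.1, size=200)

xr_train, xr_test, yr_train, yr_test = train_test_split(
    x_reg, y_reg, test_size=0.2, random_state=42)

svr_demo = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10, epsilon=0.1))
svr_demo.fit(xr_train, yr_train)

# For regressors, .score() returns the coefficient of determination R^2
print("SVR R^2 on test data:", svr_demo.score(xr_test, yr_test))
```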

### Q42. Plot SVM decision boundary (2D example)

In [None]:
# Only works when the model was trained on exactly 2 (scaled) features and the class labels are numeric
import numpy as np
h = .02
x_min, x_max = X_train_scaled[:, 0].min() - 1, X_train_scaled[:, 0].max() + 1
y_min, y_max = X_train_scaled[:, 1].min() - 1, X_train_scaled[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = best_model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)
plt.scatter(X_train_scaled[:, 0], X_train_scaled[:, 1], c=y_train, cmap=plt.cm.coolwarm)
plt.title('SVM Decision Boundary')
plt.show()

### Q43. Save trained SVM model

In [None]:
import joblib
joblib.dump(best_model, 'svm_best_model.pkl')

### Q44. Load and test saved SVM model

In [None]:
loaded_model = joblib.load('svm_best_model.pkl')
print('Loaded model score:', loaded_model.score(X_test_scaled, y_test))

### Q45. Save trained Naïve Bayes model

In [None]:
joblib.dump(nb_gauss, 'naive_bayes_model.pkl')

### Q46. Load and test saved Naïve Bayes model

In [None]:
loaded_nb = joblib.load('naive_bayes_model.pkl')
print('Loaded NB model score:', loaded_nb.score(X_test, y_test))