In [None]:
'''

**What is a Support Vector Machine (SVM)?**
SVM is a supervised machine learning algorithm used for classification and regression tasks. For classification, it finds the hyperplane that separates the classes with the largest possible margin, that is, the largest distance between the hyperplane and the nearest training points of each class.

**What is the difference between Hard Margin and Soft Margin SVM?**
- **Hard Margin SVM**: Used when the data is perfectly linearly separable. It maximizes the margin while requiring every training point to be correctly classified and to lie on or outside the margin, so it is sensitive to outliers.
- **Soft Margin SVM**: Used when the data is not perfectly separable or contains outliers. It allows some points to violate the margin or be misclassified, and penalizes those violations through the C parameter to improve generalization.

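**What is the mathematical intuition behind SVM?**
SVM aims to maximize the margin between two classes with labels \( y_i \in \{-1, +1\} \). It finds the hyperplane defined as:
\[
w^T x + b = 0
\]
by minimizing \( \frac{1}{2} ||w||^2 \), which is equivalent to maximizing the margin width \( 2 / ||w|| \), subject to every training point being classified correctly: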
\[
y_i (w^T x_i + b) \geq 1
\]
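
As a rough numerical check of the constraint and margin above, the sketch below fits a linear SVM on a tiny, linearly separable toy dataset. The data points and the very large `C=1e6` (used to approximate a hard margin) are illustrative assumptions, not part of the original notes.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (illustrative values only)
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([-1, -1, 1, 1])

# A very large C approximates a hard-margin SVM
clf = SVC(kernel='linear', C=1e6).fit(X, y)

w = clf.coef_[0]
b = clf.intercept_[0]
functional_margins = y * (X @ w + b)     # y_i (w^T x_i + b) for every point
margin_width = 2.0 / np.linalg.norm(w)   # distance between the two margin lines

print('w =', w, 'b =', b)
print('y_i (w^T x_i + b) =', functional_margins)
print('margin width =', margin_width)
```

For this data the separating line sits halfway between x = 0 and x = 2, so the margin width should come out close to 2 and every constraint value close to 1, up to solver tolerance.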

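**What is the role of Lagrange Multipliers in SVM?**
Lagrange multipliers convert the constrained primal problem of SVM into its dual form, a quadratic program with one multiplier \( \alpha_i \) per training point. The dual is still constrained, but it depends on the data only through inner products \( x_i^T x_j \), which is what makes kernels possible, and only points with \( \alpha_i > 0 \) (the support vectors) affect the solution.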
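
For the hard-margin problem, the standard derivation can be sketched as follows (the multipliers are written \( \alpha_i \); this notation is an assumption, not taken from the original notes). The Lagrangian is:
\[
L(w, b, \alpha) = \frac{1}{2} ||w||^2 - \sum_i \alpha_i \left[ y_i (w^T x_i + b) - 1 \right], \quad \alpha_i \geq 0
\]
Setting the derivatives with respect to \( w \) and \( b \) to zero gives \( w = \sum_i \alpha_i y_i x_i \) and \( \sum_i \alpha_i y_i = 0 \). Substituting these back yields the dual problem:
\[
\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j x_i^T x_j \quad \text{subject to} \quad \alpha_i \geq 0, \; \sum_i \alpha_i y_i = 0
\]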

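**What are Support Vectors in SVM?**
Support vectors are the training points that lie on the margin boundary or, in the soft-margin case, inside the margin or on the wrong side of it. They alone determine the position and orientation of the optimal hyperplane; removing any other training point leaves the solution unchanged.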
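
A minimal sketch of how to inspect the support vectors of a fitted scikit-learn model, assuming the Iris dataset and a linear kernel purely for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel='linear').fit(X, y)

# n_support_: number of support vectors per class
# support_: indices of the support vectors in the training data
# support_vectors_: the support vectors themselves
print('Support vectors per class:', clf.n_support_)
print('First support vector indices:', clf.support_[:10])
print('Support vector array shape:', clf.support_vectors_.shape)
print('Training set size:', len(X))
```

Only a subset of the training points typically ends up as support vectors; the rest do not influence the fitted boundary.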

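**What is a Support Vector Classifier (SVC)?**
SVC is the classification version of SVM, used to separate data into two or more classes using a decision boundary. SVM is inherently a binary classifier, so multi-class problems are handled by combining several binary classifiers (scikit-learn's `SVC` uses a one-vs-one scheme).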

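**What is a Support Vector Regressor (SVR)?**
SVR is the regression version of SVM. It fits a function that is as flat as possible while keeping predictions within a tolerance (epsilon) of the targets. Errors inside this epsilon-tube are ignored, and only deviations beyond it are penalized.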
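
The "within a margin" idea corresponds to the epsilon-insensitive loss. The sketch below is a hand-written illustration of that loss, not scikit-learn's internal code; the function name and the sample values are hypothetical.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero loss inside the epsilon tube, linear loss outside it."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.05, 2.5, 2.0])

# Absolute errors are 0.05, 0.5, and 1.0; only the parts beyond epsilon count
losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
print(losses)
```

The first prediction falls inside the tube and incurs no loss, which is why small errors do not affect the fitted SVR.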

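**What is the Kernel Trick in SVM?**
The Kernel Trick allows SVM to handle non-linearly separable data. A kernel function computes the inner product that two points would have in a higher-dimensional feature space, where the classes may become linearly separable, without ever computing the mapping explicitly.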
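
To make this concrete, the sketch below compares an explicit degree-2 feature map with the corresponding polynomial kernel for 2-D inputs. The helper names `phi` and `poly2_kernel` and the sample vectors are illustrative assumptions.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector."""
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

def poly2_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: (x . z)^2."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(z))  # inner product after mapping to 3-D
implicit = poly2_kernel(x, z)      # same value, computed in the original 2-D space
print(explicit, implicit)
```

Both values equal 121 (up to floating-point rounding), but the kernel never builds the 3-D vectors. For the RBF kernel the implicit space is infinite-dimensional, so the explicit route is not even available.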

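**Compare Linear Kernel, Polynomial Kernel, and RBF Kernel:**
- **Linear Kernel**: Suited to data that is (approximately) linearly separable; the fastest option and a common choice for high-dimensional, sparse data such as text.
- **Polynomial Kernel**: Captures polynomial feature interactions up to a chosen degree; can become computationally expensive and numerically unstable at high degrees.
- **RBF Kernel**: Most commonly used; corresponds to an infinite-dimensional feature space and is effective for complex, non-linear patterns, but requires tuning gamma.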
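
One way to see the difference is to try all three kernels on data that no straight line can separate. The sketch below uses `make_circles` with default kernel settings; the dataset and its parameters are assumptions chosen for illustration, and results on other data may rank the kernels differently.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Two concentric circles: not linearly separable in the original 2-D space
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

scores = {}
for kernel in ['linear', 'poly', 'rbf']:
    # Mean 5-fold cross-validated accuracy for each kernel
    scores[kernel] = cross_val_score(SVC(kernel=kernel), X, y, cv=5).mean()
    print(f'{kernel:>6}: {scores[kernel]:.2f}')
```

On this kind of data the RBF kernel should clearly outperform the linear kernel, which can do little better than chance.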

**What is the effect of the C parameter in SVM?**
The **C parameter** controls the trade-off between maximizing the margin and minimizing training misclassification. A high C penalizes margin violations heavily, giving a narrower margin that fits the training data closely and may overfit. A low C tolerates more violations, giving a wider margin and a smoother decision boundary that may underfit.
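
The effect on the margin can be observed directly with a linear kernel, where the margin width is \( 2 / ||w|| \). The sketch below assumes two overlapping synthetic clusters from `make_blobs` and three arbitrary C values.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping clusters, so some margin violations are unavoidable
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)

results = {}
for C in [0.01, 1, 100]:
    clf = SVC(kernel='linear', C=C).fit(X, y)
    margin = 2.0 / np.linalg.norm(clf.coef_[0])  # margin width for a linear SVM
    results[C] = (margin, clf.n_support_.sum())
    print(f'C={C:<6} margin width={margin:.2f} support vectors={clf.n_support_.sum()}')
```

The smallest C should give the widest margin, and the margin should shrink as C grows.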

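**What is the role of the Gamma parameter in RBF Kernel SVM?**
Gamma determines how far the influence of a single training point reaches. A high gamma means each point influences only its close neighbors, producing a highly curved boundary that can overfit. A low gamma gives each point a far reach, producing a smoother boundary that captures broader patterns.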
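
The RBF kernel value between two points is \( \exp(-\gamma ||x - z||^2) \), so gamma sets how quickly similarity decays with distance. The sketch below evaluates it by hand; the function name, distances, and gamma values are illustrative assumptions.

```python
import numpy as np

def rbf_kernel_value(x, z, gamma):
    """RBF kernel: exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

x = np.array([0.0, 0.0])
for distance in [0.5, 1.0, 3.0]:
    z = np.array([distance, 0.0])
    low = rbf_kernel_value(x, z, gamma=0.1)
    high = rbf_kernel_value(x, z, gamma=10.0)
    print(f'distance={distance}: gamma=0.1 -> {low:.4f}, gamma=10 -> {high:.4f}')
```

With a high gamma the similarity drops to nearly zero even at moderate distances, so only very close points affect a prediction.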


**What is the Naïve Bayes classifier, and why is it called "Naïve"?**
Naïve Bayes is a probabilistic classifier based on Bayes' Theorem. It is called "Naïve" because it assumes that features are conditionally independent of each other given the class, which is often not the case in real-world data.
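
Under the independence assumption, the posterior for a class \( y \) given features \( x_1, \dots, x_n \) factorizes as follows (standard formulation; the symbols are notation assumed here, not taken from the original notes):
\[
P(y | x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i | y)
\]
The predicted class is the one that maximizes this product.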

**What is Bayes’ Theorem?**
Bayes’ Theorem states:
\[
P(A|B) = \frac{P(B|A) P(A)}{P(B)}
\]
It gives the posterior probability \( P(A|B) \) of event A after observing B, in terms of the likelihood \( P(B|A) \), the prior \( P(A) \), and the evidence \( P(B) \).
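
A short worked example may help. The numbers below (a 1% disease rate, a test with 95% sensitivity and a 5% false-positive rate) are made up for illustration.

```python
p_disease = 0.01            # prior P(A)
p_pos_given_disease = 0.95  # likelihood P(B|A)
p_pos_given_healthy = 0.05  # false-positive rate P(B|not A)

# Law of total probability gives the evidence P(B)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes' Theorem: P(A|B) = P(B|A) P(A) / P(B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f'P(disease | positive test) = {p_disease_given_pos:.3f}')
```

Even with an accurate test, the posterior is only about 0.161 because the prior is so low.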

**Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes:**
- **Gaussian Naïve Bayes**: Used for continuous features, assuming each feature follows a normal distribution within each class.
- **Multinomial Naïve Bayes**: Used for count features, such as word frequencies in text classification.
- **Bernoulli Naïve Bayes**: Used for binary features (0/1), such as whether a word is present or absent in a spam classifier.
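
The sketch below pairs each variant with the kind of feature it expects, using tiny hand-made arrays (all values are hypothetical and chosen only to show the input types).

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 0, 1, 1])

X_continuous = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.0], [4.1, 4.2]])  # real-valued measurements
X_counts = np.array([[3, 0], [2, 1], [0, 4], [1, 3]])                      # word counts
X_binary = (X_counts > 0).astype(int)                                      # word present / absent

pred_gaussian = GaussianNB().fit(X_continuous, y).predict([[1.1, 2.0]])
pred_multinomial = MultinomialNB().fit(X_counts, y).predict([[4, 0]])
pred_bernoulli = BernoulliNB().fit(X_binary, y).predict([[1, 0]])

print(pred_gaussian, pred_multinomial, pred_bernoulli)
```

Each query resembles the class 0 examples for its feature type, so all three models should predict class 0.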

**When should you use Gaussian Naïve Bayes over other variants?**
Gaussian Naïve Bayes is best suited when the dataset consists of continuous features that are approximately normally distributed within each class.

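**What are the key assumptions made by Naïve Bayes?**
- Features are conditionally independent of each other given the class.
- All features contribute equally to the outcome; no feature weights are learned.
- The class priors and per-feature distributions estimated from the training data reflect the true ones (for example, normally distributed features in Gaussian Naïve Bayes).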

**What are the advantages and disadvantages of Naïve Bayes?**
**Advantages**:
- Fast and efficient.
- Works well with high-dimensional data.
- Requires a small amount of training data.

**Disadvantages**:
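- Assumes feature independence, which is rarely true, so it cannot model interactions between features.
- Predicted probabilities can be poorly calibrated when features are correlated, even if the predicted class is correct.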

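**Why is Naïve Bayes a good choice for text classification?**
Naïve Bayes performs well in text classification because it handles high-dimensional, sparse word features efficiently and needs relatively little training data. Words are not truly independent, but classification only requires the correct class to receive the highest score, so the independence assumption is less problematic in practice.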

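**Compare SVM and Naïve Bayes for classification tasks:**
- **SVM**: Works well for high-dimensional, complex datasets with clear margins, but training is slower on large datasets and it does not produce class probabilities directly.
- **Naïve Bayes**: Fast to train and naturally probabilistic; works especially well for text classification and spam filtering, but is limited by its independence assumption.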

**How does Laplace Smoothing help in Naïve Bayes?**
Laplace Smoothing (additive smoothing) adds a small constant (e.g., 1) to every feature count. This prevents zero probabilities, so a feature value never seen with a class during training does not force that class's entire probability to zero.
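
The arithmetic can be sketched by hand. The counts below are hypothetical word counts for a single class over a four-word vocabulary.

```python
import numpy as np

# Hypothetical word counts for one class over a 4-word vocabulary;
# the last word never appeared with this class in training.
counts = np.array([3, 2, 5, 0])
alpha = 1.0  # Laplace smoothing constant

unsmoothed = counts / counts.sum()
smoothed = (counts + alpha) / (counts.sum() + alpha * len(counts))

print('Unsmoothed:', unsmoothed)  # last entry is exactly 0
print('Smoothed:  ', smoothed)    # every entry is strictly positive
```

Without smoothing, any document containing the fourth word would get probability zero for this class, no matter what other words it contains. In scikit-learn's `MultinomialNB` the constant is set with the `alpha` parameter.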

Here are the interview-style questions along with Python-based answers:

---

**Write a Python program to train an SVM Classifier on the Iris dataset and evaluate accuracy**
```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

model = SVC(kernel='linear')
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
```

---

**Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies**
```python
from sklearn.datasets import load_wine
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(wine.data, wine.target, test_size=0.2, random_state=42)

linear_svm = SVC(kernel='linear').fit(X_train, y_train)
rbf_svm = SVC(kernel='rbf').fit(X_train, y_train)

print("Linear Kernel Accuracy:", accuracy_score(y_test, linear_svm.predict(X_test)))
print("RBF Kernel Accuracy:", accuracy_score(y_test, rbf_svm.predict(X_test)))
```

---

**Write a Python program to train an SVM Regressor (SVR) on a housing dataset and evaluate it using Mean Squared Error (MSE)**
```python
from sklearn.datasets import fetch_california_housing
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(housing.data, housing.target, test_size=0.2, random_state=42)

# SVR is sensitive to feature scale, so standardize the features before fitting
model = make_pipeline(StandardScaler(), SVR(kernel='rbf'))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
print(f'Mean Squared Error: {mse:.2f}')
```

---

**Write a Python program to train an SVM Classifier with a Polynomial Kernel and visualize the decision boundary**
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Fix the random seed so the generated data and the plot are reproducible
X, y = make_moons(n_samples=100, noise=0.1, random_state=42)
clf = SVC(kernel='poly', degree=3).fit(X, y)

plt.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm')

ax = plt.gca()
xlim, ylim = ax.get_xlim(), ax.get_ylim()
xx, yy = np.meshgrid(np.linspace(xlim[0], xlim[1], 50), np.linspace(ylim[0], ylim[1], 50))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.contour(xx, yy, Z, levels=[0], linewidths=2, colors='black')
plt.show()
```

---

**Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and evaluate accuracy**
```python
from sklearn.datasets import load_breast_cancer
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, test_size=0.2, random_state=42)

model = GaussianNB().fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
```

---

**Write a Python program to train a Multinomial Naïve Bayes classifier for text classification using the 20 Newsgroups dataset**
```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

news = fetch_20newsgroups(subset='train', categories=['rec.autos', 'sci.space'])
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(news.data)
X_train, X_test, y_train, y_test = train_test_split(X, news.target, test_size=0.2, random_state=42)

model = MultinomialNB().fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
```

---

**Write a Python program to train an SVM Classifier and use GridSearchCV to tune the hyperparameters (C, gamma, kernel)**
```python
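from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Assumption: the Iris dataset is used because the original snippet did not define X_train
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

param_grid = {'C': [0.1, 1, 10], 'gamma': [0.001, 0.01, 0.1], 'kernel': ['linear', 'rbf']}
grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X_train, y_train)

print(f'Best parameters: {grid.best_params_}')
print(f'Best cross-validated accuracy: {grid.best_score_:.2f}')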
```

---

**Write a Python program to train an SVM Classifier and a Naïve Bayes Classifier on the same dataset and compare their accuracy**
```python
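from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Assumption: the Breast Cancer dataset is used because the original snippet did not define its data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

svm_model = SVC(kernel='rbf').fit(X_train, y_train)
nb_model = GaussianNB().fit(X_train, y_train)

svm_acc = accuracy_score(y_test, svm_model.predict(X_test))
nb_acc = accuracy_score(y_test, nb_model.predict(X_test))

print(f'SVM Accuracy: {svm_acc:.2f}')
print(f'Naïve Bayes Accuracy: {nb_acc:.2f}')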
```

---

**Write a Python program to train a Naïve Bayes classifier and evaluate its performance using the ROC-AUC score**
```python
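from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Assumption: binary Breast Cancer data is used because the original snippet relied on an undefined nb_model
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
nb_model = GaussianNB().fit(X_train, y_train)
y_prob = nb_model.predict_proba(X_test)[:, 1]  # probability of the positive class
roc_auc = roc_auc_score(y_test, y_prob)
print(f'ROC-AUC Score: {roc_auc:.2f}')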
```
