

### 1) What is a Support Vector Machine (SVM)

**Answer:**
A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression tasks. It finds the optimal hyperplane that best separates the data into classes by maximizing the margin between the closest points of the classes, called support vectors.

---

### 2) What is the difference between Hard Margin and Soft Margin SVM

**Answer:**

* **Hard Margin SVM** assumes the data is perfectly linearly separable and doesn't allow any misclassifications.
* **Soft Margin SVM** allows some data points to be misclassified by introducing a penalty for violations. It is more robust in real-world scenarios where data is noisy or overlapping.
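
The soft-margin idea is usually written with slack variables. The form below is the standard textbook formulation rather than something stated in these notes: $\xi_i$ (slack) measures how far point $x_i$ violates the margin, and $C$ is the penalty weight, while $w$, $b$, and $y_i \in \{-1, +1\}$ are the weight vector, bias, and class labels.

$$
\min_{w,\, b,\, \xi} \ \frac{1}{2} \|w\|^2 + C \sum_{i} \xi_i \quad \text{subject to } y_i(w \cdot x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0
$$

Forcing every $\xi_i = 0$ recovers the hard-margin problem, which has a solution only when the data is linearly separable.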

---

### 3) What is the mathematical intuition behind SVM

**Answer:**
SVM maximizes the margin between two classes. For linearly separable data (the hard-margin case), this is the optimization problem:

$$
\min_{w,\, b} \ \frac{1}{2} \|w\|^2 \quad \text{subject to } y_i(w \cdot x_i + b) \geq 1 \ \text{ for all } i
$$

Here, $w$ is the weight vector, $b$ is the bias, and $y_i \in \{-1, +1\}$ is the class label of point $x_i$. The margin width is $2 / \|w\|$, so minimizing $\|w\|^2$ is the same as finding the hyperplane with the largest separation margin.
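
As a rough numerical check of the constraint above, the sketch below fits a linear SVM on synthetic, well-separated data and inspects $y_i(w \cdot x_i + b)$ for every point. The dataset, the very large `C` (used here to approximate a hard margin), and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated synthetic clusters, so a (near) hard-margin solution exists.
X, y01 = make_blobs(n_samples=60, centers=[[-3, -3], [3, 3]],
                    cluster_std=0.8, random_state=0)
y = np.where(y01 == 0, -1, 1)  # relabel to {-1, +1} to match the formulation

# A very large C approximates the hard-margin problem (assumption for this sketch).
clf = SVC(kernel="linear", C=1e6).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]

functional_margins = y * (X @ w + b)     # y_i (w . x_i + b) for every training point
margin_width = 2.0 / np.linalg.norm(w)   # distance between the two margin planes

print("smallest y_i(w.x_i + b):", functional_margins.min())
print("margin width 2/||w||:", margin_width)
```

The smallest value should sit close to 1, attained by the points lying on the margin.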

---

### 4) What is the role of Lagrange Multipliers in SVM

**Answer:**
Lagrange multipliers are used to convert the constrained optimization problem into its dual form, making it easier to solve and enabling the use of the **kernel trick**. Non-zero Lagrange multipliers identify the **support vectors**—critical points that define the decision boundary.
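
A small sketch of how this shows up in scikit-learn, assuming a linear kernel and synthetic data: `dual_coef_` stores $\alpha_i y_i$ for the support vectors only, so the weight vector can be rebuilt as $w = \sum_i \alpha_i y_i x_i$ and compared with `coef_`.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=80, centers=2, cluster_std=1.5, random_state=1)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

# dual_coef_ holds alpha_i * y_i for the support vectors only;
# every other training point has alpha_i = 0 and is not stored.
alpha_times_y = clf.dual_coef_[0]
w_from_dual = alpha_times_y @ clf.support_vectors_   # w = sum_i alpha_i y_i x_i

print("support vectors:", len(clf.support_), "of", len(X), "training points")
print("w from dual:", w_from_dual)
print("w reported :", clf.coef_[0])
```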

---

### 5) What are Support Vectors in SVM

**Answer:**
Support vectors are the training points that lie on the margin or, in the soft-margin case, inside it or on the wrong side of the decision boundary. They alone determine the position and orientation of the separating hyperplane; removing any other training point leaves the solution unchanged.
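
One way to see this influence is to refit the model on the support vectors alone and compare the result with the fit on all points. The sketch below does that on synthetic data; the dataset and settings are assumptions, and small numerical differences from the solver are expected.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=[[-2, -2], [2, 2]],
                  cluster_std=1.0, random_state=0)
full = SVC(kernel="linear", C=1.0).fit(X, y)

# Keep only the support vectors and refit on that subset.
sv_idx = full.support_
reduced = SVC(kernel="linear", C=1.0).fit(X[sv_idx], y[sv_idx])

print("training points:", len(X), "| support vectors:", len(sv_idx))
print("full    w, b:", full.coef_[0], full.intercept_[0])
print("reduced w, b:", reduced.coef_[0], reduced.intercept_[0])
```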

---

### 6) What is a Support Vector Classifier (SVC)

**Answer:**
A Support Vector Classifier (SVC) is the classification implementation of SVM. It classifies data into categories by finding the optimal separating hyperplane and supports various kernel functions for non-linear decision boundaries.

---

### 7) What is a Support Vector Regressor (SVR)

**Answer:**
Support Vector Regressor (SVR) is the regression counterpart of SVM. It fits a function so that most training targets fall within a tolerance band of half-width ε around the prediction. Points inside the band incur no loss; points outside it are penalized in proportion to how far outside they fall.
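
To illustrate the role of ε, the sketch below fits `SVR` on a synthetic noisy sine curve with three values of `epsilon` and counts the support vectors. The data, `C=10.0`, and the chosen ε values are assumptions for illustration only.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(120, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=120)   # noisy sine curve (synthetic)

sv_counts = {}
for eps in [0.01, 0.1, 0.5]:
    svr = SVR(kernel="rbf", C=10.0, epsilon=eps).fit(X, y)
    # Points strictly inside the epsilon-tube incur no loss and are not support vectors.
    sv_counts[eps] = len(svr.support_)
    print(f"epsilon={eps}: {sv_counts[eps]} support vectors out of {len(X)}")
```

A wider tube leaves more points unpenalized, so fewer of them end up as support vectors.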

---

### 8) What is the Kernel Trick in SVM

**Answer:**
The Kernel Trick allows SVM to perform non-linear classification by mapping input features into a higher-dimensional space without explicitly computing the transformation. It replaces dot products with kernel functions like linear, polynomial, or RBF to make computations efficient.
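
A minimal worked example, assuming a homogeneous degree-2 polynomial kernel on 2-D inputs: the kernel value computed directly from the inputs matches the dot product after an explicit feature map. The helper names `phi` and `poly2_kernel` are hypothetical.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D input (illustrative helper)."""
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly2_kernel(a, b):
    """Homogeneous polynomial kernel of degree 2: K(a, b) = (a . b)^2."""
    return np.dot(a, b) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(z))   # dot product computed in the 3-D feature space
implicit = poly2_kernel(x, z)       # same value computed from the 2-D inputs only

print(explicit, implicit)
```

Both values equal 121 (up to floating-point rounding), but the kernel never builds the higher-dimensional vectors.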

---

### 9) Compare Linear Kernel, Polynomial Kernel, and RBF Kernel

**Answer:**

* **Linear Kernel**: Suitable for linearly separable data. Fast and interpretable.
* **Polynomial Kernel**: Adds interaction terms, allowing curved boundaries. Degree parameter controls complexity.
* **RBF (Radial Basis Function)**: Handles highly non-linear data well. Uses a Gaussian similarity function, $K(x, x') = \exp(-\gamma \|x - x'\|^2)$, which corresponds to an implicit infinite-dimensional feature space. A common default choice.

---

### 10) What is the effect of the C parameter in SVM

**Answer:**
The **C** parameter controls the trade-off between maximizing the margin and minimizing classification error.

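* High **C**: Weak regularization. The model tries to classify every training point correctly, which narrows the margin (may overfit).
* Low **C**: Strong regularization. The model tolerates more margin violations in exchange for a wider margin (often generalizes better, but may underfit if C is too low).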
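
The trade-off can be observed by sweeping `C` on noisy data and watching the number of support vectors and the train/test accuracy. This is a sketch on a synthetic dataset; the C values, the label-noise level, and the split are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, deliberately overlapping classes (flip_y adds label noise).
X, y = make_classification(n_samples=400, n_features=2, n_redundant=0, n_informative=2,
                           n_clusters_per_class=1, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

results = {}
for C in [0.01, 1, 100]:
    clf = SVC(kernel="rbf", C=C).fit(X_train, y_train)
    results[C] = {
        "n_support": int(clf.n_support_.sum()),   # total support vectors over both classes
        "train_acc": clf.score(X_train, y_train),
        "test_acc": clf.score(X_test, y_test),
    }
    print(C, results[C])
```

With a low `C`, most training points tend to become support vectors because many margin violations are tolerated.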

---

### 11) What is the role of the Gamma parameter in RBF Kernel SVM

**Answer:**
Gamma defines how far the influence of a single training example reaches:

* High gamma: Short-range influence, tight decision boundaries (may overfit).
* Low gamma: Long-range influence, smoother decision boundaries (may underfit).
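
A quick way to see both extremes is to compare training and test accuracy across gamma values. The sketch below uses synthetic two-moons data; the gamma values and noise level are assumptions chosen to exaggerate the effect.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.3, random_state=0)   # synthetic two-moons data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

scores = {}
for gamma in [0.01, 1, 1000]:
    clf = SVC(kernel="rbf", gamma=gamma).fit(X_train, y_train)
    # Store (training accuracy, test accuracy) for each gamma.
    scores[gamma] = (clf.score(X_train, y_train), clf.score(X_test, y_test))
    print(f"gamma={gamma}: train={scores[gamma][0]:.3f}, test={scores[gamma][1]:.3f}")
```

A large gap between training and test accuracy at high gamma is the usual sign of overfitting.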

---

### 12) What is the Naïve Bayes classifier, and why is it called "Naïve"

**Answer:**
Naïve Bayes is a probabilistic classifier based on Bayes’ Theorem. It is called "naïve" because it assumes all features are **conditionally independent** given the class label. This strong assumption rarely holds exactly, yet the classifier often works well in practice.

---

### 13) What is Bayes’ Theorem

**Answer:**
Bayes’ Theorem is a fundamental formula to update probabilities based on new evidence:

$$
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
$$

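It gives the probability of event A given that event B has occurred (assuming $P(B) > 0$): the prior $P(A)$ is reweighted by the likelihood $P(B|A)$ and normalized by the evidence $P(B)$.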
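
A small worked example may help. The numbers below are hypothetical (a diagnostic test with 1% prevalence, 95% sensitivity, and a 5% false-positive rate), and `posterior` is an illustrative helper, not a library function.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(A|B) from P(A), P(B|A) and P(B|not A), via Bayes' theorem."""
    # P(B) by the law of total probability.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false-positive rate.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(f"P(disease | positive test) = {p:.3f}")   # prints 0.161
```

Even with an accurate test, the low prior keeps the posterior at roughly 16%.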

---

### 14) Explain the differences between Gaussian Naïve Bayes, Multinomial Naïve Bayes, and Bernoulli Naïve Bayes

**Answer:**

* **Gaussian Naïve Bayes**: For continuous data assuming a normal (Gaussian) distribution.
* **Multinomial Naïve Bayes**: For count-based data (e.g., word counts in text).
* **Bernoulli Naïve Bayes**: For binary/boolean features (e.g., word presence/absence).

---

### 15) When should you use Gaussian Naïve Bayes over other variants

**Answer:**
Use Gaussian Naïve Bayes when your features are **continuous** and approximately **normally distributed** within each class. It is common for numerical datasets such as sensor data or medical measurements.

---

### 16) What are the key assumptions made by Naïve Bayes

**Answer:**

* All features are **conditionally independent** given the class label.
* Each feature contributes to the class score **independently**, with no learned feature weights or interaction terms.

These assumptions simplify computation but are rarely true in practice.
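
Written out, the independence assumption lets the class posterior factor into per-feature terms. This is the standard Naïve Bayes decision rule, with $y$ the class label and $x_1, \dots, x_n$ the features:

$$
P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y), \qquad \hat{y} = \arg\max_{y} \, P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$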

---

### 17) What are the advantages and disadvantages of Naïve Bayes

**Advantages:**

* Simple and fast to train
* Works well with high-dimensional data
* Performs well on text classification problems

**Disadvantages:**

* Assumes feature independence
* May perform poorly when features are highly correlated

---

### 18) Why is Naïve Bayes a good choice for text classification

**Answer:**
Naïve Bayes works well for text because:

* Text data is high-dimensional, which NB handles efficiently.
* Bag-of-words representations treat each word as a separate feature. Words are not truly independent, but the predicted class is often still correct.
* Fast training and prediction even on large datasets.
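
A minimal sketch of such a text classifier, assuming a tiny made-up corpus and a bag-of-words pipeline built from `CountVectorizer` and `MultinomialNB`:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus, for illustration only.
texts = [
    "win free money now", "free prize claim now", "cheap loans win cash",
    "meeting at noon tomorrow", "lunch with the team", "project report due tomorrow",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = make_pipeline(CountVectorizer(), MultinomialNB())   # bag-of-words counts -> NB
model.fit(texts, labels)

predictions = model.predict(["free money prize", "team meeting tomorrow"])
print(predictions)
```

On this toy data the first message is labeled spam and the second ham, since each contains only words seen in one class.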

---

### 19) Compare SVM and Naïve Bayes for classification tasks

**Answer:**

| Feature         | SVM                          | Naïve Bayes                  |
| --------------- | ---------------------------- | ---------------------------- |
| Type            | Discriminative               | Generative                   |
| Accuracy        | Often higher                 | Competitive in text          |
| Training Speed  | Slower                       | Much faster                  |
| Feature Scaling | Strongly recommended         | Not required                 |
| Assumption      | No distributional assumption | Conditional independence     |
| Best Use Case   | Complex boundaries           | High-dimensional sparse data |

---

### 20) How does Laplace Smoothing help in Naïve Bayes?

**Answer:**
Laplace Smoothing (also called additive smoothing) prevents zero probabilities for words or features not seen in training data. It adds 1 (or a small value) to all feature counts, ensuring that every possible feature has a non-zero probability.
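
The arithmetic can be sketched as follows. The counts are hypothetical and `smoothed_probability` is an illustrative helper; in scikit-learn the same idea is exposed as the `alpha` parameter of `MultinomialNB` and `BernoulliNB`.

```python
def smoothed_probability(count, class_total, vocab_size, alpha=1.0):
    """P(feature | class) with additive smoothing; alpha=1 is Laplace smoothing."""
    return (count + alpha) / (class_total + alpha * vocab_size)

# Hypothetical counts: a word never seen in a class that has 10 tokens, vocabulary of 5 words.
unsmoothed = 0 / 10
smoothed = smoothed_probability(count=0, class_total=10, vocab_size=5)

print(unsmoothed)   # 0.0, which would zero out the whole product of likelihoods
print(smoothed)     # (0 + 1) / (10 + 5) = 1/15, small but non-zero
```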




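#1) Write a Python program to train an SVM Classifier on the Iris dataset and evaluate accuracy:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
X, y = load_iris(return_X_y=True)
# Fixed random_state so the split (and the reported accuracy) is reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = SVC()  # default RBF kernel
clf.fit(X_train, y_train)
print("Accuracy (Iris SVM):", clf.score(X_test, y_test))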

#2) Write a Python program to train two SVM classifiers with Linear and RBF kernels on the Wine dataset, then compare their accuracies:
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Features are left unscaled here, which tends to hurt the RBF kernel more than the linear one
svc_linear = SVC(kernel='linear').fit(X_train, y_train)
svc_rbf = SVC(kernel='rbf').fit(X_train, y_train)
print("Linear Kernel Accuracy:", svc_linear.score(X_test, y_test))
print("RBF Kernel Accuracy:", svc_rbf.score(X_test, y_test))

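#3) Write a Python program to train an SVM Regressor (SVR) on a housing dataset and evaluate it using Mean Squared Error (MSE):
from sklearn.datasets import fetch_california_housing
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
X, y = fetch_california_housing(return_X_y=True)  # downloads the data on first use
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Note: kernel SVR on ~16k unscaled rows is slow; scaling the features usually lowers the error
svr = SVR()
svr.fit(X_train, y_train)
y_pred = svr.predict(X_test)
print("SVR MSE:", mean_squared_error(y_test, y_pred))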

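#4) Write a Python program to train an SVM Classifier with a Polynomial Kernel and visualize the decision boundary:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.svm import SVC
X, y = make_classification(n_features=2, n_redundant=0, n_informative=2, n_clusters_per_class=1, random_state=42)
clf = SVC(kernel='poly', degree=3).fit(X, y)
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 300), np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 300))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)  # predicted class at each grid point
plt.contourf(xx, yy, Z, alpha=0.3, cmap='coolwarm')  # shaded decision regions
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='coolwarm')
plt.title("SVM with Polynomial Kernel")
plt.show()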

#5) Write a Python program to train a Gaussian Naïve Bayes classifier on the Breast Cancer dataset and evaluate accuracy:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)  # default 75/25 split
gnb = GaussianNB()
gnb.fit(X_train, y_train)
print("Accuracy (Gaussian NB):", gnb.score(X_test, y_test))

#6) Write a Python program to train a Multinomial Naïve Bayes classifier for text classification using the 20 Newsgroups dataset:
from sklearn.datasets import fetch_20newsgroups_vectorized
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
data = fetch_20newsgroups_vectorized()  # pre-vectorized sparse features; downloads on first use
X, y = data.data, data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
mnb = MultinomialNB()
mnb.fit(X_train, y_train)
print("Accuracy (Multinomial NB):", mnb.score(X_test, y_test))

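#7) Write a Python program to train an SVM Classifier with different C values and compare the decision boundaries visually:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.svm import SVC
X, y = make_classification(n_features=2, n_redundant=0, n_informative=2, n_clusters_per_class=1, random_state=42)
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 300), np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 300))
for c in [0.1, 1, 10]:
    model = SVC(C=c).fit(X, y)
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)  # predicted class at each grid point
    plt.contourf(xx, yy, Z, alpha=0.3, cmap='bwr')  # shaded decision regions for this C
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap='bwr', alpha=0.6)
    plt.title(f'SVM Decision Boundary with C={c}')
    plt.show()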

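#8) Write a Python program to train a Bernoulli Naïve Bayes classifier for binary classification on a dataset with binary features:
import numpy as np
from sklearn.naive_bayes import BernoulliNB
rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(100, 10))  # random binary features
y = rng.integers(0, 2, size=(100,))     # random labels, so there is no real signal to learn
bnb = BernoulliNB()
bnb.fit(X, y)
# This is training accuracy; with random labels it says nothing about generalization
print("Accuracy (Bernoulli NB):", bnb.score(X, y))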

#9) Write a Python program to apply feature scaling before training an SVM model and compare results with unscaled data:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
svc = SVC()
svc.fit(X_train, y_train)
acc_unscaled = svc.score(X_test, y_test)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit the scaler on training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics on the test set
svc.fit(X_train_scaled, y_train)
acc_scaled = svc.score(X_test_scaled, y_test)
print("Unscaled Accuracy:", acc_unscaled)
print("Scaled Accuracy:", acc_scaled)

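#10) Write a Python program to train a Gaussian Naïve Bayes model and compare the predictions before and after Laplace Smoothing:
# Note: GaussianNB has no Laplace smoothing. Laplace (additive) smoothing applies to count-based
# models and is the `alpha` parameter of MultinomialNB and BernoulliNB. The closest GaussianNB
# setting is `var_smoothing`, which adds a fraction of the largest feature variance to all variances.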
