## 🧠 PART 1: SUPPORT VECTOR MACHINES (SVM)

---

### **1. What is a Support Vector Classifier (SVC), and how does it differ from other classifiers?**

- An SVC is a **supervised machine learning algorithm** used for **binary or multiclass classification**. It aims to find the **optimal hyperplane** that separates classes with the **maximum margin**.

**Key differences:**

* Focuses on **maximizing the margin** rather than only minimizing training error.
* Only **support vectors** (critical boundary points) influence the model.
* Works well for **high-dimensional and non-linear data** via kernels.

---

### **2. How does an SVC find the optimal hyperplane?**

- SVC finds a hyperplane $w^T x + b = 0$ such that:

 * The margin between classes is maximized.
 * For linearly separable data:

$$
\min_{w,b} \frac{1}{2} \|w\|^2 \quad \text{s.t. } y_i(w^T x_i + b) \ge 1
$$

For data that is not linearly separable, a **soft margin** with **slack variables** is used instead.
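
As a quick numerical check of the hard-margin formulation, the sketch below fits a linear `SVC` on a tiny separable dataset and verifies the constraint $y_i(w^T x_i + b) \ge 1$. The data points are hypothetical, and the very large `C` is an assumption used to approximate a hard margin, since scikit-learn only exposes the soft-margin solver.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (hypothetical values chosen for illustration).
X = np.array([[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C makes the soft-margin solver behave almost like hard margin.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]
b = clf.intercept_[0]

# Every training point should satisfy y_i (w^T x_i + b) >= 1 (up to solver tolerance).
functional_margins = y * (X @ w + b)
margin_width = 2.0 / np.linalg.norm(w)

print("w =", w, "b =", b)
print("min functional margin =", functional_margins.min())
print("margin width =", margin_width)
```

For this data the analytic solution is $w = (0.4, 0.4)$, $b = -1.4$, so the printed values should land close to those, with a margin width of about 3.54.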

---

### **3. Compare SVC with Decision Tree Classifier**

| Feature          | SVC                               | Decision Tree                |
| ---------------- | --------------------------------- | ---------------------------- |
| Boundary         | Linear / non-linear (via kernels) | Axis-aligned partitions      |
| Overfitting      | Less prone                        | More prone (without pruning) |
| Interpretability | Lower                             | High                         |
| Feature scaling  | Required                          | Not needed                   |
| Small datasets   | Performs well                     | Performs well                |
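
The "Feature scaling" row can be sketched in code: an SVC is usually wrapped in a pipeline with a scaler, while a tree is fitted on the raw features. This is a minimal illustration, assuming scikit-learn's bundled iris dataset as stand-in data; the hyperparameters are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): the iris dataset bundled with scikit-learn.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The scaler is fitted on the training split only, then reused at predict time.
svc_pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
svc_pipeline.fit(X_train, y_train)

# Trees split on one feature at a time, so no scaling step is needed.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

svc_accuracy = svc_pipeline.score(X_test, y_test)
tree_accuracy = tree.score(X_test, y_test)
print(f"SVC (scaled): {svc_accuracy:.3f}, tree: {tree_accuracy:.3f}")
```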

---

### **4. What is the significance of support vectors?**

- Support vectors are the **data points closest to the hyperplane**. They:

* **Define the margin**

* Influence the model boundary

* Are the only samples used in the decision function

---

### **5. Difference between hard margin and soft margin in SVM**

| Type        | Description                                     | Use Case                   |
| ----------- | ----------------------------------------------- | -------------------------- |
| Hard Margin | No misclassifications allowed                   | Perfectly separable data   |
| Soft Margin | Allows some misclassifications using slack vars | Real-world, noisy datasets |

---

### **6. When would you prefer soft margin over hard margin?**

- Use **soft margin** when:

* Data is **not linearly separable**

* There’s **noise or outliers**

* Slight misclassifications improve **generalization**

---

### **7. How does soft margin SVM adjust for non-linearity?**

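- Soft margin SVM introduces **slack variables $\xi_i$** that let individual points violate the margin. This handles overlapping or noisy classes; genuinely non-linear boundaries come from kernels (Q12):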

$$
y_i(w^T x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0
$$

- Minimizes:

$$
\frac{1}{2} \|w\|^2 + C \sum \xi_i
$$
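
To make the slack variables concrete, the sketch below computes the smallest feasible $\xi_i$ for each point and the resulting objective value. The hyperplane `w`, `b`, the value of `C`, and the four points are all hypothetical, chosen only to illustrate the formulas.

```python
import numpy as np

# Hypothetical hyperplane and points, chosen only to illustrate the formulas.
w = np.array([1.0, 1.0])
b = -3.0
C = 1.0

X = np.array([[3.0, 2.0],   # well on the positive side
              [2.0, 1.5],   # positive, but inside the margin
              [1.0, 1.0],   # positive label on the wrong side
              [0.5, 0.5]])  # negative, well on the negative side
y = np.array([1, 1, 1, -1])

# Smallest slack satisfying y_i (w^T x_i + b) >= 1 - xi_i with xi_i >= 0.
slack = np.maximum(0.0, 1.0 - y * (X @ w + b))

objective = 0.5 * (w @ w) + C * slack.sum()
print("slack:", slack)
print("objective:", objective)
```

The second point gets a slack of 0.5 (inside the margin but correctly classified) and the third a slack of 2 (misclassified), giving an objective of $\tfrac{1}{2}\cdot 2 + 1 \cdot 2.5 = 3.5$.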

---

### **8. Impact of margin size on generalization**

* **Larger margin** ⇒ Better generalization (low variance).

* **Smaller margin** ⇒ Overfitting (sensitive to noise).

SVM maximizes margin to **improve generalization**.

---

### **9. Mathematical formulation for maximizing margin**

$$
\min_{w,b} \frac{1}{2} \|w\|^2 \quad \text{s.t. } y_i(w^T x_i + b) \ge 1
$$

This convex optimization problem finds the separating hyperplane with the **widest margin**.

---

### **10. Lagrange multiplier method in SVM**

- We form the **Lagrangian**:

$$
L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum \alpha_i [y_i(w^T x_i + b) - 1]
$$

The multipliers satisfy $\alpha_i \ge 0$, and the solution is characterized by the **Karush-Kuhn-Tucker (KKT)** conditions.
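
As a short derivation sketch using the same symbols as the Lagrangian, setting its partial derivatives to zero gives the stationarity conditions:

$$
\frac{\partial L}{\partial w} = w - \sum_i \alpha_i y_i x_i = 0 \;\Rightarrow\; w = \sum_i \alpha_i y_i x_i,
\qquad
\frac{\partial L}{\partial b} = -\sum_i \alpha_i y_i = 0
$$

Together with complementary slackness,

$$
\alpha_i \left[ y_i(w^T x_i + b) - 1 \right] = 0, \quad \alpha_i \ge 0,
$$

this means $\alpha_i > 0$ only for points lying exactly on the margin (the support vectors). Substituting $w = \sum_i \alpha_i y_i x_i$ back into $L$ yields $\sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^T x_j$, which is the dual objective.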

---

### **11. Quadratic programming and dual problem**

- The dual form:

$$
\max_{\alpha} \sum \alpha_i - \frac{1}{2} \sum \alpha_i \alpha_j y_i y_j x_i^T x_j
$$

- Subject to:

$$
0 \le \alpha_i \le C, \quad \sum \alpha_i y_i = 0
$$

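- This is a **quadratic programming** problem. The upper bound $\alpha_i \le C$ comes from the soft margin; the hard-margin dual only requires $\alpha_i \ge 0$.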
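
The dual solution can be inspected on a fitted model. The sketch below relies on scikit-learn's `dual_coef_` attribute, which stores $\alpha_i y_i$ for the support vectors, and checks the box constraint, the equality constraint, and the relation $w = \sum_i \alpha_i y_i x_i$. The dataset is randomly generated and purely illustrative.

```python
import numpy as np
from sklearn.svm import SVC

# Small hypothetical two-class dataset with some overlap between the classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, size=(20, 2)),
               rng.normal(1.0, 1.0, size=(20, 2))])
y = np.array([-1] * 20 + [1] * 20)

C = 1.0
clf = SVC(kernel="linear", C=C).fit(X, y)

# scikit-learn stores alpha_i * y_i for the support vectors in dual_coef_.
signed_alphas = clf.dual_coef_[0]
alphas = np.abs(signed_alphas)

# Primal weights recovered from the dual solution: w = sum_i alpha_i y_i x_i.
w_from_dual = signed_alphas @ clf.support_vectors_

print("max alpha:", alphas.max(), "(bounded by C =", C, ")")
print("sum of alpha_i * y_i:", signed_alphas.sum())
print("w from dual:", w_from_dual, "coef_:", clf.coef_[0])
```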

---

### **12. What is the kernel trick?**

- It maps input data to **higher dimensions** without computing the mapping explicitly.

$$
K(x_i, x_j) = \phi(x_i)^T \phi(x_j)
$$

E.g., RBF kernel:

$$
K(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right)
$$
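
A small worked example may help here. For the degree-2 polynomial kernel $K(x, z) = (x^T z)^2$ on 2-D inputs, one valid explicit map is $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$. The sketch below (helper names are hypothetical) checks that both routes give the same number and also evaluates the RBF formula.

```python
import math

def phi(x):
    """Explicit degree-2 feature map for a 2-D input (x1, x2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poly2_kernel(x, z):
    """Homogeneous polynomial kernel K(x, z) = (x^T z)^2."""
    return dot(x, z) ** 2

def rbf_kernel(x, z, sigma=1.0):
    """RBF kernel with bandwidth sigma: exp(-||x - z||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

x, z = (1.0, 2.0), (3.0, 4.0)
explicit = dot(phi(x), phi(z))   # inner product in the 3-D feature space
implicit = poly2_kernel(x, z)    # same value, computed in the 2-D input space
print(explicit, implicit)
print(rbf_kernel(x, z, sigma=2.0))
```

Both routes give 121 (up to floating-point rounding). Note that scikit-learn parameterizes the RBF kernel with `gamma`, which corresponds to $1/(2\sigma^2)$ in the formula used here.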

---

### **13. How does the SVM objective change in soft margin?**

- Hard margin:

$$
\min \frac{1}{2}\|w\|^2
$$

- Soft margin adds penalty:

$$
\min \frac{1}{2} \|w\|^2 + C \sum \xi_i
$$

---

### **14. Dual form of SVM optimization**

$$
\max_\alpha \sum \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j)
$$
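
Once the dual is solved, predictions use only kernel evaluations against the support vectors: $f(x) = \sum_i \alpha_i y_i K(x_i, x) + b$. The sketch below reproduces scikit-learn's `decision_function` by hand for a binary RBF model. The data is randomly generated, and `gamma` is fixed explicitly so the manual kernel matches the fitted one.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# Hypothetical non-linear toy problem: label by distance from the origin.
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 2))
y = np.where((X ** 2).sum(axis=1) > 1.0, 1, -1)

gamma = 0.5  # fixed explicitly so the manual kernel matches the fitted model
clf = SVC(kernel="rbf", C=1.0, gamma=gamma).fit(X, y)

X_new = rng.normal(size=(5, 2))

# f(x) = sum_i alpha_i y_i K(x_i, x) + b, summed over support vectors only.
K = rbf_kernel(clf.support_vectors_, X_new, gamma=gamma)
manual_scores = (clf.dual_coef_ @ K).ravel() + clf.intercept_[0]

print(manual_scores)
print(clf.decision_function(X_new))
```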

---

### **15. Why maximize the margin?**

- Wider margin ⇒ Better generalization. Mathematically, the margin width is:

$$
\text{Margin} = \frac{2}{\|w\|}
$$

---

### **16. How does soft margin SVM handle misclassified points?**

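- Each point gets a slack variable $\xi_i \ge 0$. A point with $0 < \xi_i \le 1$ lies inside the margin but is still correctly classified; a point with $\xi_i > 1$ is misclassified. The penalty term $C \sum \xi_i$ tolerates such errors at a cost.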

---

### **17. Role of regularization parameter C**

* **High C**: Low bias, high variance (fit tightly)

* **Low C**: High bias, low variance (simpler margin)

---

### **18. Trade-off between margin and error**

- Controlled by $C$:

* Large margin + more errors (low C)

* Small margin + fewer errors (high C)
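
The trade-off can be observed directly by fitting the same data with a small and a large `C` and comparing the margin width $2/\|w\|$. This is a minimal sketch on randomly generated, overlapping classes; the two `C` values are arbitrary.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical overlapping classes, so some margin violations are unavoidable.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-1.0, 1.2, size=(50, 2)),
               rng.normal(1.0, 1.2, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

results = {}
for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin_width = 2.0 / np.linalg.norm(clf.coef_[0])
    results[C] = (margin_width, int(clf.n_support_.sum()))
    print(f"C={C}: margin width={margin_width:.3f}, "
          f"support vectors={results[C][1]}")
```

The low-`C` model should report the wider margin, typically with more support vectors, because more points are allowed to sit inside it.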

---

### **19. Slack variables influence decision boundary**

- They allow violations of margin, shifting the hyperplane to **balance margin and misclassifications**.

---

## 🧪 SVM IMPLEMENTATION (20–31)

---

### **20. Implement SVM Classifier in Python**

```python
from sklearn.svm import SVC
clf = SVC(kernel='linear', C=1.0)
clf.fit(X_train, y_train)
```

---

### **21. Hyperparameter tuning via GridSearch**

```python
from sklearn.model_selection import GridSearchCV
params = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
grid = GridSearchCV(SVC(), param_grid=params, cv=5)
grid.fit(X_train, y_train)
```

---

### **22. Cross-validation for SVM**

```python
from sklearn.model_selection import cross_val_score
cross_val_score(SVC(), X, y, cv=5)
```

---

### **23. Compare SVM vs Logistic Regression via ROC-AUC**

```python
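from sklearn.metrics import roc_auc_score
# svm_model and lr_model are assumed to be already fitted on the same training data.
# decision_function returns signed distances, not probabilities; ROC-AUC only needs a ranking.
svm_scores = svm_model.decision_function(X_test)
lr_probs = lr_model.predict_proba(X_test)[:, 1]
print("SVM ROC-AUC:", roc_auc_score(y_test, svm_scores))
print("LR ROC-AUC: ", roc_auc_score(y_test, lr_probs))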
```

---

### **24. What is SVR & how does it differ from Linear Regression?**

**Support Vector Regression (SVR)**:

* Tries to fit the data **within an epsilon tube**

* Penalizes deviations beyond $\epsilon$

* Focuses on **support vectors**, not all data points

---

### **25. What is the epsilon-tube in SVR?**

- A **tolerance zone** where predictions within $\epsilon$ of actual values are not penalized.
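
The tube corresponds to the epsilon-insensitive loss $\max(0, |y - \hat{y}| - \epsilon)$. A minimal sketch in plain Python, with hypothetical targets and predictions:

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Per-sample SVR loss: zero inside the tube, linear outside it."""
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]

# Hypothetical targets and predictions.
y_true = [1.00, 2.00, 3.00, 4.00]
y_pred = [1.05, 2.30, 3.00, 3.50]

losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
print(losses)
```

The first and third predictions fall inside the tube and cost nothing; the second and fourth are penalized only for the part of the error beyond $\epsilon$ (about 0.2 and 0.4).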

---

### **26. Impact of kernel in SVR**

* **Linear kernel**: Assumes linear relationship

* **RBF/Poly**: Captures non-linearity

* Impacts both **fit and generalization**
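
The points above can be illustrated by fitting two kernels to a clearly non-linear target. This is a sketch on synthetic data (one period of a noisy sine wave); the values of `C` and `epsilon` are arbitrary, and the score is computed on the training set for brevity.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.svm import SVR

# Hypothetical non-linear target: one period of a sine wave with mild noise.
rng = np.random.default_rng(0)
X = np.linspace(0.0, 2.0 * np.pi, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.05, size=200)

scores = {}
for kernel in ("linear", "rbf"):
    model = SVR(kernel=kernel, C=10.0, epsilon=0.1).fit(X, y)
    scores[kernel] = r2_score(y, model.predict(X))  # training-set R^2
    print(f"{kernel}: R^2 = {scores[kernel]:.3f}")
```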

---

### **27. SVR vs Ridge Regression**

| Feature         | SVR                      | Ridge Regression           |
| --------------- | ------------------------ | -------------------------- |
| Support vectors | Sparse model             | All data points contribute |
| Flexibility     | Nonlinear via kernel     | Linear only                |
| Bias-variance   | Controlled via epsilon/C | Controlled via alpha       |

---

### **28. SVR implementation in Python**

```python
from sklearn.svm import SVR
svr = SVR(kernel='rbf', C=1.0, epsilon=0.1)
svr.fit(X_train, y_train)
```

---

### **29. Hyperparameter tuning for SVR using Random Search**

```python
from sklearn.model_selection import RandomizedSearchCV
params = {'C': [0.1, 1, 10], 'epsilon': [0.1, 0.2], 'kernel': ['rbf', 'linear']}
search = RandomizedSearchCV(SVR(), param_distributions=params, cv=5, random_state=42)  # fixed seed for reproducible sampling
search.fit(X_train, y_train)
```

---

### **30. Evaluate SVR using R² and MSE**

```python
from sklearn.metrics import r2_score, mean_squared_error
y_pred = svr.predict(X_test)
r2_score(y_test, y_pred), mean_squared_error(y_test, y_pred)
```

---

### **31. Cross-validation performance comparison**

- Use `cross_val_score()` with different models:

```python
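from sklearn.linear_model import Ridge  # SVR and cross_val_score are imported in earlier snippets
svr_scores = cross_val_score(SVR(), X, y, cv=5)
ridge_scores = cross_val_score(Ridge(), X, y, cv=5)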
```

---

## 🔄 SVM KERNELS (32–39)

---

### **32. What is a kernel in SVM?**

- A kernel is a function that calculates the **dot product in high-dimensional space**, enabling non-linear classification.

---

### **33. Linear vs Polynomial vs RBF**

| Kernel     | Description                     | Use Case                      |
| ---------- | ------------------------------- | ----------------------------- |
| Linear     | Straight hyperplane             | Linearly separable data       |
| Polynomial | Polynomial boundaries           | Interactions between features |
| RBF        | Gaussian-based decision surface | Most general-purpose          |

---

### **34. Kernel trick explanation**

- Instead of transforming $x$ explicitly, we compute:

$$
K(x_i, x_j) = \phi(x_i)^T \phi(x_j)
$$

---

### **35. Linear vs RBF efficiency**

* **Linear**: Fastest, less memory

* **RBF**: Slower and more flexible, but costly on large datasets, because kernel SVM training time grows faster than linearly with the number of samples

---

### **36. Why margin maximization matters**

- Wider margins ⇒ better generalization ⇒ lower variance.

---

### **37. Significance of support vectors**

- They determine the **decision boundary**. Removing non-support vectors and retraining doesn’t change the model.
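
This claim can be checked empirically: fit once on all points, refit on the support vectors alone, and compare the two hyperplanes. The sketch uses a hypothetical six-point dataset and a large `C` (an assumption, to approximate a hard margin).

```python
import numpy as np
from sklearn.svm import SVC

# Toy separable data (hypothetical values).
X = np.array([[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

full = SVC(kernel="linear", C=1e6).fit(X, y)
sv_idx = full.support_  # indices of the support vectors in X

# Refit using only the support vectors; the other points are dropped.
reduced = SVC(kernel="linear", C=1e6).fit(X[sv_idx], y[sv_idx])

print("support vector indices:", sv_idx)
print("full model:   ", full.coef_[0], full.intercept_[0])
print("reduced model:", reduced.coef_[0], reduced.intercept_[0])
```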

---

### **38. SVM vs Logistic Regression**

| Feature          | SVM                    | Logistic Regression         |
| ---------------- | ---------------------- | --------------------------- |
| Margin-based     | Yes                    | No                          |
| Probabilistic    | No (unless calibrated) | Yes                         |
| Performance      | Better with small data | Competitive with large data |
| Interpretability | Low                    | High                        |

---

### **39. SVM vs Decision Tree for classification**

* **SVM**: Better on high-dimensional data

* **Trees**: Better interpretability, works well with categorical data

---

## 🧠 NAIVE BAYES (40–57)

---

### **40. What is Naive Bayes? Why "naive"?**

- It’s a probabilistic classifier based on **Bayes’ Theorem**. It is called "naive" because it assumes the features are **conditionally independent** given the class.

---

### **41. Independence assumption implications**

* Not always true in real-world data

* Surprisingly effective even when violated

---

### **42. Performance with highly correlated features**

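- Accuracy and probability estimates can degrade when features are highly correlated: the independence assumption breaks, so correlated features are effectively counted more than once and the posterior becomes overconfident.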
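
A small worked example of the double-counting effect, using hypothetical likelihoods: a single feature that mildly favours class A is entered three times, as would happen with three perfectly correlated copies.

```python
def posterior_a(likelihoods_a, likelihoods_b, prior_a=0.5):
    """Naive Bayes posterior P(A | features) for two classes A and B."""
    score_a, score_b = prior_a, 1.0 - prior_a
    for p_a, p_b in zip(likelihoods_a, likelihoods_b):
        score_a *= p_a
        score_b *= p_b
    return score_a / (score_a + score_b)

# Hypothetical likelihoods: one feature mildly favours class A.
single = posterior_a([0.6], [0.4])

# The same evidence entered three times, as with perfectly correlated copies.
tripled = posterior_a([0.6] * 3, [0.4] * 3)

print(round(single, 3), round(tripled, 3))
```

The posterior moves from 0.6 to roughly 0.77 even though no new information was added.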

---

### **43. Naive Bayes vs SVM vs Random Forest**

| Model         | Speed     | Accuracy | Scalability       |
| ------------- | --------- | -------- | ----------------- |
| Naive Bayes   | Very fast | Moderate | Excellent         |
| SVM           | Slow      | High     | Poor (large data) |
| Random Forest | Fast      | High     | Good              |

---

### **44. Kernel functions in SVM for high-dimensional mapping**

- See Q12 and Q34 for math-based explanations.

---

### **45. Choosing an appropriate kernel**

* Try **linear** first

* Use **RBF** for general non-linear cases

* Use **grid search** to tune kernel parameters

---

### **46. Polynomial vs RBF on complex data**

* **RBF**: Better for complex, smooth boundaries

* **Polynomial**: Better when data has clear polynomial trends

---

### **47. Key assumptions of Naive Bayes**

* Feature independence

* Equal importance of features

* All features contribute independently to class probability

---

### **48. Categorical vs Continuous features**

* **Categorical**: Use **Categorical NB** (or **Multinomial/Bernoulli NB** for count and binary features)

* **Continuous**: Use **Gaussian NB**

---

### **49. Why is it called “naive”?**

- Because of the **naive assumption of independence** between features.

---

### **50. Bayes’ Theorem**

$$
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
$$
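
A worked arithmetic example, with made-up numbers for a spam filter where $A$ is "the message is spam" and $B$ is "the message contains the word 'free'":

```python
# Hypothetical numbers for a spam filter (A = "is spam", B = "contains 'free'").
p_spam = 0.2                 # P(A)
p_word_given_spam = 0.6      # P(B|A)
p_word_given_ham = 0.05      # P(B|not A)

# Law of total probability gives the denominator P(B).
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1.0 - p_spam)

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 4))  # prints 0.75
```

Here $P(B) = 0.12 + 0.04 = 0.16$, so the posterior is $0.12 / 0.16 = 0.75$: seeing the word raises the spam probability from 20% to 75%.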

---

### **51. Naive Bayes Posterior Calculation**

$$
P(y|x_1, ..., x_n) \propto P(y) \prod_{i=1}^n P(x_i|y)
$$
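
The proportionality can be turned into actual probabilities by normalizing over the classes. The sketch below uses hypothetical priors and word likelihoods, considers only the observed words (a simplification), and works in log space, which is a common way to avoid underflow when many features are multiplied.

```python
import math

# Hypothetical model parameters for two classes and two word features.
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.5, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.4},
}

def posterior(observed_words):
    """Return P(y | observed words) under the naive independence assumption."""
    # Work in log space so products of many small numbers do not underflow.
    log_scores = {
        label: math.log(priors[label])
        + sum(math.log(likelihoods[label][word]) for word in observed_words)
        for label in priors
    }
    # Normalise: the proportionality constant is the sum over classes.
    max_log = max(log_scores.values())
    unnormalised = {k: math.exp(v - max_log) for k, v in log_scores.items()}
    total = sum(unnormalised.values())
    return {k: v / total for k, v in unnormalised.items()}

result = posterior(["free", "meeting"])
print(result)
```

The unnormalized scores are $0.4 \cdot 0.5 \cdot 0.1 = 0.02$ for spam and $0.6 \cdot 0.1 \cdot 0.4 = 0.024$ for ham, so the posteriors are about 0.455 and 0.545.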

---

### **52. Handling missing data**

* Ignore missing features in calculation

* Use imputation

* Some versions can handle missing values probabilistically

---

### **53. Gaussian vs Multinomial Naive Bayes**

| Type        | Use Case                  |
| ----------- | ------------------------- |
| Gaussian    | Continuous data           |
| Multinomial | Text (counts/frequencies) |

---

### **54. Types of Naive Bayes**

* **Gaussian**: Continuous features (assumes normal distribution)

* **Multinomial**: Count-based data

* **Bernoulli**: Binary features (e.g., 0/1)

---

### **55. Differences between Gaussian, Multinomial, Bernoulli**

| Type        | Feature Type | Distribution Assumed |
| ----------- | ------------ | -------------------- |
| Gaussian    | Continuous   | Normal               |
| Multinomial | Counts       | Multinomial          |
| Bernoulli   | Binary       | Bernoulli (0 or 1)   |

---

### **56. Text Classification with Naive Bayes**

```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

X = ["spam message", "important email", ...]
y = [1, 0, ...]

vec = CountVectorizer()
X_vec = vec.fit_transform(X)

clf = MultinomialNB()
clf.fit(X_vec, y)
```

---

### **57. Gaussian Naive Bayes for continuous data**

```python
from sklearn.naive_bayes import GaussianNB
clf = GaussianNB()
clf.fit(X_train, y_train)
```

---




```python
# Assuming you have your full dataset in variables X and y
# If you need to load data, use pandas or another library here
# Example:
# import pandas as pd
# data = pd.read_csv('your_data.csv')
# X = data.drop('target_column', axis=1)
# y = data['target_column']

from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
import numpy as np # Import numpy for example data

# Create some example data if you don't have any loaded
# In a real scenario, replace this with your actual data loading
if 'X' not in locals() or 'y' not in locals():
    print("Creating example data. Replace this with your actual data loading.")
    X = np.random.rand(100, 10) # 100 samples, 10 features
    y = np.random.randint(0, 2, 100) # 100 labels (0 or 1)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Now you can fit the SVC model
clf = SVC(kernel='linear', C=1.0)
clf.fit(X_train, y_train)

print("Model fitted successfully.")
```

```
Creating example data. Replace this with your actual data loading.
Model fitted successfully.
```


```python
from sklearn.model_selection import GridSearchCV
params = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
grid = GridSearchCV(SVC(), param_grid=params, cv=5)
grid.fit(X_train, y_train)
```


```python
from sklearn.metrics import roc_auc_score
from sklearn.linear_model import LogisticRegression # Import Logistic Regression

# Assuming 'clf' from the previous cell is the intended SVM model
svm_model = clf

# Train a Logistic Regression model for comparison
lr_model = LogisticRegression()
lr_model.fit(X_train, y_train) # Fit the Logistic Regression model

# Calculate decision function for SVM and probabilities for Logistic Regression
svm_probs = svm_model.decision_function(X_test)
lr_probs = lr_model.predict_proba(X_test)[:,1]

# Calculate and print ROC AUC scores
svm_roc_auc = roc_auc_score(y_test, svm_probs)
lr_roc_auc = roc_auc_score(y_test, lr_probs)

print(f"SVM ROC AUC: {svm_roc_auc}")
print(f"Logistic Regression ROC AUC: {lr_roc_auc}")
```

```
SVM ROC AUC: 0.5256410256410257
Logistic Regression ROC AUC: 0.5448717948717949
```


```python
from sklearn.svm import SVR
svr = SVR(kernel='rbf', C=1.0, epsilon=0.1)
svr.fit(X_train, y_train)
```


```python
from sklearn.model_selection import RandomizedSearchCV
params = {'C': [0.1, 1, 10], 'epsilon': [0.1, 0.2], 'kernel': ['rbf', 'linear']}
search = RandomizedSearchCV(SVR(), param_distributions=params, cv=5)
search.fit(X_train, y_train)
```


```python
from sklearn.metrics import r2_score, mean_squared_error
y_pred = svr.predict(X_test)
r2_score(y_test, y_pred), mean_squared_error(y_test, y_pred)
```

```
(-0.13232019305482012, 0.28262712018648317)
```

```python
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import Ridge # Import Ridge model
from sklearn.svm import SVR # Re-imported so the cell runs on its own

# Assuming X and y are defined from previous cells
# Only the last expression's value is displayed, so the output below is the Ridge scores
cross_val_score(SVR(), X, y, cv=5)
cross_val_score(Ridge(), X, y, cv=5)
```

```
array([ 0.11640046,  0.09648436, -0.33739306, -0.04597496, -0.18294413])
```

```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

# Replace the ellipsis with actual example data
X = ["spam message", "important email", "another spam example", "clean email message"]
y = [1, 0, 1, 0] # Corresponding labels for the example data

vec = CountVectorizer()
X_vec = vec.fit_transform(X)

clf = MultinomialNB()
clf.fit(X_vec, y)

print("Multinomial Naive Bayes model fitted successfully with example text data.")
```

```
Multinomial Naive Bayes model fitted successfully with example text data.
```


```python
from sklearn.naive_bayes import GaussianNB
clf = GaussianNB()
clf.fit(X_train, y_train)
```
