# Q1. What is the mathematical formula for a linear SVM?

A linear SVM separates two classes with a hyperplane. Its decision boundary is given by:

\[
w \cdot x + b = 0
\]

Where:
- \( w \) represents the weight vector,
- \( x \) is an input data point (feature vector),
- \( b \) is the bias term.

### Objective:
The goal is to find the hyperplane (decision boundary) that separates the two classes in the feature space while maximizing the margin between them.

# Q2. What is the objective function of a linear SVM?

A soft-margin linear SVM minimizes the following objective function, with labels \( y_i \in \{-1, +1\} \):
\[
\min_{w, b} \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \max(0, 1 - y_i (w \cdot x_i + b))
\]

where:
- \( \frac{1}{2} \|w\|^2 \) is the regularization term that penalizes large values of \( w \); since the margin width is \( 2 / \|w\| \), a smaller \( \|w\| \) means a wider margin.
- \( C \) is a regularization parameter that controls the trade-off between maximizing the margin and minimizing margin violations on the training data.
- \( \max(0, 1 - y_i (w \cdot x_i + b)) \) is the hinge loss, which is zero for points classified correctly and outside the margin, and grows linearly for points inside the margin or misclassified.
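As a minimal sketch, the objective above can be evaluated directly with NumPy. The function name `svm_objective`, the toy data, and the parameter values are made up for illustration; labels are assumed to be in \( \{-1, +1\} \).

```python
import numpy as np

def svm_objective(w, b, X, y, C=1.0):
    """Return (objective value, per-sample hinge losses) for labels y in {-1, +1}."""
    margins = y * (X @ w + b)                # y_i * (w . x_i + b)
    hinge = np.maximum(0.0, 1.0 - margins)   # hinge loss per sample
    return 0.5 * np.dot(w, w) + C * hinge.sum(), hinge

# Illustrative data and parameters (not taken from a trained model)
X = np.array([[1.0, 1.0],    # on the negative marginal plane -> loss 0
              [3.0, 3.0],    # on the positive marginal plane -> loss 0
              [2.5, 2.5],    # negative point on the wrong side -> loss 1.5
              [2.5, 2.0]])   # positive point inside the margin -> loss 0.75
y = np.array([-1, 1, -1, 1])
w = np.array([0.5, 0.5])
b = -2.0

value, hinge = svm_objective(w, b, X, y, C=1.0)
print(hinge)   # per-sample hinge losses
print(value)   # 0.5 * ||w||^2 + C * sum(hinge) = 0.25 + 2.25
```

Only the last two points contribute to the sum, which illustrates that the hinge loss ignores points that are already outside the margin on the correct side.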

# Q3. What is the kernel trick in SVM?

The kernel trick lets an SVM operate in a higher-dimensional feature space without explicitly computing the mapping function \( \phi \) into that space. A kernel function returns the inner product of two mapped points directly from the original inputs, so the SVM can perform non-linear classification while only ever working in the original input space. Common kernel functions include:
- Linear Kernel: \( K(x, y) = x \cdot y \)
- Polynomial Kernel: \( K(x, y) = (x \cdot y + 1)^d \)
- Radial Basis Function (RBF) Kernel: \( K(x, y) = e^{-\gamma \|x - y\|^2} \)

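The kernel trick avoids the need to explicitly compute the feature-space inner product \( \phi(x_i) \cdot \phi(x_j) \), where \( \phi \) is the mapping into the higher-dimensional space; evaluating \( K(x_i, x_j) \) on the original inputs gives the same value. This keeps SVMs tractable even when the feature space is very high-dimensional or, as with the RBF kernel, infinite-dimensional.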
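To make this concrete, here is a small sketch for the degree-2 polynomial kernel on 2-D inputs. The explicit feature map `phi` below is one standard choice that reproduces \( (x \cdot y + 1)^2 \); the function names and sample vectors are illustrative, and the second argument is called `z` in the code to avoid confusion with class labels.

```python
import numpy as np

def poly_kernel(x, z, d=2):
    """Polynomial kernel K(x, z) = (x . z + 1)^d, computed in the input space."""
    return (np.dot(x, z) + 1.0) ** d

def phi(x):
    """Explicit 6-dimensional feature map matching the degree-2 kernel for 2-D inputs."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 ** 2, x2 ** 2, s * x1 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

k_implicit = poly_kernel(x, z)          # one dot product in 2-D
k_explicit = np.dot(phi(x), phi(z))     # dot product in the 6-D feature space
print(k_implicit, k_explicit)           # both equal 4.0
```

The kernel needs a single 2-D dot product, while the explicit route first builds two 6-D vectors; the gap widens quickly as the input dimension or the degree grows.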

# Q4. What is the role of support vectors in SVM? Explain with example.

Support vectors are the training points that lie closest to the decision boundary (hyperplane) in an SVM. They alone determine the position and orientation of the boundary: the remaining points could be moved (without crossing the margin) or removed, and the boundary would stay the same.

Example:
Consider a simple binary classification scenario with two classes (e.g., "+" and "−"):
1. The SVM positions the hyperplane so that the gap between it and the nearest "+" and "−" points is as large as possible.
2. Those nearest points are the support vectors; they lie on the marginal planes and directly fix the hyperplane's location.
3. Moving a support vector shifts the hyperplane and changes the margin, whereas moving a point far from the boundary has no effect.
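The example can be sketched numerically with scikit-learn. The six toy points are invented for illustration, and a very large `C` is used to approximate a hard margin; `support_`, `coef_`, and `intercept_` are attributes of a fitted `SVC` with a linear kernel.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data: three "-" points and three "+" points (illustrative values)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 1.0],
              [3.0, 3.0], [4.0, 3.0], [4.0, 4.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C approximates a hard margin
w, b = clf.coef_[0], clf.intercept_[0]

# Geometric distance of every training point to the hyperplane
distances = np.abs(X @ w + b) / np.linalg.norm(w)
support_idx = clf.support_                    # indices of the support vectors

for i, (point, dist) in enumerate(zip(X, distances)):
    tag = "support vector" if i in support_idx else ""
    print(point, f"distance={dist:.3f}", tag)
```

The points flagged as support vectors are the ones with the smallest distance to the hyperplane; all other points are farther away and do not affect the fit.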

# Q5. Illustrate with examples and graphs of Hyperplane, Marginal plane, Soft margin, and Hard margin in SVM.

**Hyperplane**:
- A hyperplane is a flat decision boundary that divides the space into two regions.
- In SVM, this is the line or plane where \( w \cdot x + b = 0 \).

**Marginal Plane**:
- The two hyperplanes parallel to the decision boundary and equidistant from it, \( w \cdot x + b = +1 \) and \( w \cdot x + b = -1 \), are called the "marginal planes".
- They pass through the support vectors on either side of the decision boundary.
- They define the margin, which is the distance between them, \( 2 / \|w\| \).

**Soft Margin**:
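- With a soft margin, some points may fall inside the margin or on the wrong side of the decision boundary, but each such violation is penalized.
- The margin constraints are relaxed by giving each point some "slack".
- The trade-off is controlled by the regularization parameter \( C \): a large \( C \) penalizes violations heavily and gives a narrower margin, while a small \( C \) tolerates more violations in exchange for a wider margin.
- Example: a point that is correctly classified but lies inside the margin, or an outlier on the wrong side of the boundary, is tolerated at a cost proportional to how far it violates the margin.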

**Hard Margin**:
- The hard margin solution is the special case in which no margin violations are allowed.
- Every training point must be classified correctly and lie on or outside the marginal planes.
- A solution therefore exists only when the data are linearly separable.
- In practice, hard margins can lead to overfitting on noisy datasets, because a single outlier can drastically change the hyperplane.
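The effect of \( C \) on the margin can be sketched with scikit-learn on synthetic, overlapping data. The data, the seed, and the two `C` values are arbitrary choices for illustration; because the classes overlap, a true hard margin has no solution here, so the large `C` only approximates one.

```python
import numpy as np
from sklearn.svm import SVC

# Two overlapping Gaussian clusters (synthetic data, fixed seed)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, size=(30, 2)),
               rng.normal(+1.0, 1.0, size=(30, 2))])
y = np.array([-1] * 30 + [1] * 30)

margin_width = {}
for C in [0.01, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # The marginal planes are w.x + b = +/-1, so they are 2 / ||w|| apart
    margin_width[C] = 2.0 / np.linalg.norm(clf.coef_[0])
    print(f"C={C:g}: margin width={margin_width[C]:.2f}, "
          f"support vectors={clf.support_.size}")
```

The small `C` produces the wider (softer) margin, while the large `C` narrows it in an attempt to reduce violations.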

# Q6. SVM Implementation through Iris Dataset

**Steps to Perform**:
1. **Load the Iris Dataset** from the scikit-learn library and split it into a training set and a testing set.
2. **Train a linear SVM classifier** on the training set and predict the labels for the testing set.
3. **Compute the accuracy of the model** on the testing set.
4. **Plot the decision boundaries** of the trained model using two of the features.
5. **Try different values of the regularization parameter \( C \)** and see how it affects the performance of the model.
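A minimal sketch of steps 1 to 3 and 5 is shown here (plotting is left out). The list of `C` values, the 70/30 split, and the `random_state` are arbitrary choices, not prescribed by the task.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Step 1: load the data and hold out 30% for testing
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Steps 2, 3 and 5: train a linear SVM for several C values and record test accuracy
results = {}
for C in [0.01, 0.1, 1.0, 10.0, 100.0]:
    model = SVC(kernel="linear", C=C).fit(X_train, y_train)
    results[C] = accuracy_score(y_test, model.predict(X_test))
    print(f"C={C:g}: test accuracy={results[C]:.3f}")
```

Comparing the printed accuracies shows how sensitive the model is to \( C \) on this split; for a more reliable comparison, cross-validation would be a natural next step.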

### Bonus Task:
**Implement a Linear SVM Classifier from Scratch** using Python and compare its performance with the scikit-learn implementation.



## Q1. What is the mathematical formula for a linear SVM?
The mathematical formula for a linear SVM classifier is:

\[
f(x) = \text{sign}(w^T x + b)
\]

Where:
- \( w \) is the weight vector,
- \( x \) is the input feature vector,
- \( b \) is the bias term.

The decision boundary is defined as:

\[
w^T x + b = 0
\]
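As a small illustration, the classifier can be evaluated with NumPy. The weight vector and bias below are made-up values standing in for a trained model, and points exactly on the boundary are assigned to the positive class by convention.

```python
import numpy as np

# Hypothetical parameters of an already-trained linear SVM (illustrative values)
w = np.array([0.5, 0.5])   # weight vector
b = -2.0                   # bias term

def decision_function(X):
    """Signed score w^T x + b for each row of X."""
    return X @ w + b

def predict(X):
    """Class label in {-1, +1}; a score of exactly 0 is mapped to +1 here."""
    return np.where(decision_function(X) >= 0, 1, -1)

X_new = np.array([[1.0, 1.0],   # score -1.0 -> class -1
                  [3.0, 3.0],   # score +1.0 -> class +1
                  [2.0, 2.0]])  # score  0.0 -> lies on the decision boundary
scores = decision_function(X_new)
labels = predict(X_new)
print(scores, labels)
```

The magnitude of the score indicates how far a point is from the boundary (in units of \( 1 / ||w|| \)), while its sign gives the predicted class.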

---

## Q2. What is the objective function of a linear SVM?
The objective of a linear SVM is to maximize the margin between the decision boundary and the closest data points (the support vectors). For linearly separable data (hard margin), this is achieved by solving:

\[
\text{Minimize: } \frac{1}{2} ||w||^2
\]

Subject to the constraints:

\[
y_i (w^T x_i + b) \geq 1, \forall i
\]

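For a soft-margin SVM, slack variables \( \xi_i \) allow margin violations, and their sum is penalized:

\[
\text{Minimize: } \frac{1}{2} ||w||^2 + C \sum_{i=1}^n \xi_i
\]

Subject to the constraints:

\[
y_i (w^T x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0, \forall i
\]

Where \( \xi_i \) are slack variables measuring how far point \( i \) violates the margin, and \( C \) is the regularization parameter.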
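
A short sketch of why minimizing \( \frac{1}{2} ||w||^2 \) maximizes the margin, using the marginal planes \( w^T x + b = \pm 1 \) implied by the hard-margin constraints: the distance from a point \( x_0 \) to the hyperplane is

\[
\frac{|w^T x_0 + b|}{||w||}
\]

For any point on a marginal plane, \( |w^T x_0 + b| = 1 \), so each marginal plane lies \( 1 / ||w|| \) from the decision boundary and the margin width is:

\[
\frac{1}{||w||} + \frac{1}{||w||} = \frac{2}{||w||}
\]

Maximizing \( 2 / ||w|| \) is therefore equivalent to minimizing \( ||w|| \), and hence \( \frac{1}{2} ||w||^2 \), which is differentiable and more convenient to optimize.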

---

## Q3. What is the kernel trick in SVM?
The kernel trick allows SVM to classify data that is not linearly separable by implicitly mapping the input features into a higher-dimensional space where a linear separation is possible. Common kernels include:
- **Linear Kernel**: \( K(x, x') = x^T x' \)
- **Polynomial Kernel**: \( K(x, x') = (x^T x' + c)^d \), where \( c \geq 0 \) is a constant and \( d \) is the degree
- **Radial Basis Function (RBF) Kernel**: \( K(x, x') = \exp(-\gamma ||x - x'||^2) \)
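
As an illustrative sketch, the RBF kernel can be evaluated for every pair of points at once to form the kernel (Gram) matrix that an SVM solver works with. The helper name `rbf_gram`, the sample points, and `gamma=0.5` are assumptions made for the example.

```python
import numpy as np

def rbf_gram(X, gamma=0.5):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2) for the rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, computed for all pairs
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)   # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)

X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 2.0]])
K = rbf_gram(X, gamma=0.5)
print(np.round(K, 4))
```

Each entry lies in \( (0, 1] \): identical points give 1, and the value decays toward 0 as the points move apart, at a rate set by \( \gamma \).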

---

## Q4. What is the role of support vectors in SVM? 
Support vectors are the data points closest to the decision boundary (hyperplane). They determine the position and orientation of the hyperplane and influence the SVM model's performance.

**Example:**
Consider a binary classification problem with two classes (blue and red). The support vectors are the points from each class closest to the decision boundary. Removing non-support vector points does not affect the hyperplane.
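
The last claim can be checked with a small sketch: fit a linear `SVC` on all points, refit on the support vectors only, and compare the two hyperplanes. The six points are invented for illustration (class 0 standing in for blue, class 1 for red), and a very large `C` approximates a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],    # blue class (label 0)
              [2.0, 2.0], [3.0, 2.0], [2.0, 3.0]])   # red class (label 1)
y = np.array([0, 0, 0, 1, 1, 1])

# Fit on all points, then refit using only the support vectors
full = SVC(kernel="linear", C=1e6).fit(X, y)
sv = full.support_
reduced = SVC(kernel="linear", C=1e6).fit(X[sv], y[sv])

same_hyperplane = (np.allclose(full.coef_, reduced.coef_, atol=1e-2)
                   and np.allclose(full.intercept_, reduced.intercept_, atol=1e-2))
print("support vector indices:", sv)
print("same hyperplane after dropping the other points:", same_hyperplane)
```

The comparison uses a small tolerance because the solver stops at a numerical tolerance rather than at the exact optimum.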

---

## Q5. Illustrate with Examples and Graphs:
1. **Hyperplane**: The decision boundary separating the classes.
2. **Marginal Plane**: Parallel lines to the hyperplane marking the margin boundary.
3. **Soft Margin**: Allows some misclassification with a penalty.
4. **Hard Margin**: Assumes perfect separability with no misclassification.
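
A possible way to draw the first two items is sketched here for the 2-D case. The helper names `margin_lines` and `plot_margins` and the example values of \( w \) and \( b \) are hypothetical; in practice \( w \) and \( b \) would come from a fitted linear model (for example `coef_[0]` and `intercept_[0]` of a linear `SVC`).

```python
import numpy as np

def margin_lines(w, b, x1):
    """x2-coordinates of the hyperplane and both marginal planes (needs w[1] != 0)."""
    lines = {}
    for name, offset in [("negative margin", -1.0),
                         ("hyperplane", 0.0),
                         ("positive margin", 1.0)]:
        # Solve w[0] * x1 + w[1] * x2 + b = offset for x2
        lines[name] = (offset - b - w[0] * x1) / w[1]
    return lines

def plot_margins(w, b, X, y):
    """Scatter the data and overlay the hyperplane (solid) and marginal planes (dashed)."""
    import matplotlib.pyplot as plt   # imported lazily; only needed for plotting
    x1 = np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 50)
    styles = {"negative margin": "k--", "hyperplane": "k-", "positive margin": "k--"}
    for name, x2 in margin_lines(w, b, x1).items():
        plt.plot(x1, x2, styles[name], label=name)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors="k")
    plt.legend()
    plt.show()

# Illustrative parameters; evaluate the three lines at x1 = 1 and x1 = 3
w = np.array([0.5, 0.5])
b = -2.0
lines = margin_lines(w, b, np.array([1.0, 3.0]))
print(lines)
```

Calling `plot_margins(w, b, X, y)` with 2-D training data draws the figure. Fitting once with a small \( C \) and once with a large \( C \) and plotting both results gives a visual comparison of soft and near-hard margins.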

---

## Q6. SVM Implementation through Iris Dataset

### Steps:
1. **Load the Dataset**:
   Use scikit-learn's `load_iris` function to load the Iris dataset and split it into training and testing sets.

2. **Train the Model**:
   Use `SVC` from scikit-learn with a linear kernel to train the SVM model.

3. **Evaluate the Model**:
   Compute accuracy on the test set.

4. **Plot Decision Boundaries**:
   Visualize decision boundaries for two features of the dataset.

5. **Experiment with C**:
   Train the model with different \( C \) values and observe performance changes.

### Code Example:
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X = iris.data[:, :2]  # Using only the first two features for visualization
y = iris.target

# Binary classification (for simplicity)
X = X[y != 2]
y = y[y != 2]

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train SVM
model = SVC(kernel='linear', C=1.0)
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

# Plot decision boundary
def plot_decision_boundary(X, y, model):
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                         np.arange(y_min, y_max, 0.01))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.8)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k')
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.title('SVM Decision Boundary')
    plt.show()

plot_decision_boundary(X, y, model)
```

## Bonus Task: Implementing a Linear SVM from Scratch

### Overview:
Implementing a linear Support Vector Machine (SVM) from scratch involves:

1. **Calculating the Optimal Weights (\( w \)) and Bias (\( b \))**:
   - The objective is to find the optimal hyperplane that maximizes the margin between two classes while minimizing classification errors.
   - The optimal weights and bias can be calculated using methods like **gradient descent** or **quadratic programming**.

2. **Predicting Labels Using the Decision Boundary**:
   - Once the optimal weights and bias are found, we use the decision function:
     \[
     f(x) = w \cdot x + b
     \]
     - If \( f(x) \geq 0 \), predict class 1; otherwise, predict class 0.

### Steps:

#### 1. **Calculate Optimal Weights and Bias**:
   - The model aims to minimize the regularized hinge loss, with labels \( y_i \in \{-1, +1\} \):
     \[
     L(w, b) = \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \max(0, 1 - y_i(w \cdot x_i + b))
     \]
     - Here, \( C \) is a regularization parameter that controls the trade-off between maximizing the margin and minimizing misclassification.

   - **Gradient Descent** is used to iteratively update the weights \( w \) and bias \( b \):
     - Compute the gradient of the loss with respect to \( w \) and \( b \).
     - Update the weights and bias using the learning rate.
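
As a sketch of the update step (the learning-rate symbol \( \eta \) is introduced here for illustration), a subgradient of \( L(w, b) \) sums only over the points that violate the margin, i.e. those with \( y_i(w \cdot x_i + b) < 1 \):

\[
\frac{\partial L}{\partial w} = w - C \sum_{i:\, y_i(w \cdot x_i + b) < 1} y_i x_i, \qquad
\frac{\partial L}{\partial b} = -C \sum_{i:\, y_i(w \cdot x_i + b) < 1} y_i
\]

Each iteration then moves against the subgradient:

\[
w \leftarrow w - \eta \frac{\partial L}{\partial w}, \qquad b \leftarrow b - \eta \frac{\partial L}{\partial b}
\]

Points with \( y_i(w \cdot x_i + b) \geq 1 \) contribute nothing to the sums, so for them only the regularization term acts, shrinking \( w \) slightly.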

#### 2. **Predict Labels**:
   - Once the model is trained, we predict the labels by evaluating the decision function:
     \[
     \hat{y} = \text{sign}(w \cdot x + b)
     \]
     - The sign is \( +1 \) or \( -1 \); it is mapped back to the original labels, with \( +1 \) corresponding to class 1 and \( -1 \) to class 0.

#### 3. **Comparison with Scikit-learn**:
   - The custom implementation from scratch is compared to the scikit-learn SVM (`SVC`) implementation.
   - By evaluating both models on the same train/test split (e.g., of the Iris dataset), we can compare performance metrics such as test accuracy and, if desired, training time.

### Conclusion:
- Implementing a linear SVM from scratch provides insights into optimization nuances and helps in understanding how hyperparameters (like \( C \)) and the choice of optimization methods (gradient descent vs quadratic programming) affect model performance.
- This task also highlights the trade-offs between custom implementations and highly optimized library functions like those in scikit-learn.


### Code:
```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

# Load the Iris dataset
iris = load_iris()
X = iris.data[:, :2]  # Use only the first two features for simplicity
y = iris.target

# Binary classification (classes 0 and 1)
X = X[y != 2]
y = y[y != 2]

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# SVM from scratch
class LinearSVM:
    def __init__(self, learning_rate=0.001, lambda_param=0.01, n_iters=1000):
        self.lr = learning_rate
        self.lambda_param = lambda_param
        self.n_iters = n_iters
        self.w = None
        self.b = None

    def fit(self, X, y):
        n_samples, n_features = X.shape
        y_ = np.where(y == 0, -1, 1)  # Map labels {0, 1} to {-1, 1}
        self.w = np.zeros(n_features)
        self.b = 0

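        for _ in range(self.n_iters):
            for idx, x_i in enumerate(X):
                # Per-sample objective: lambda * ||w||^2 + max(0, 1 - y_i * (w . x_i + b))
                margin_ok = y_[idx] * (np.dot(x_i, self.w) + self.b) >= 1
                if margin_ok:
                    # Only the regularization term contributes to the gradient
                    self.w -= self.lr * (2 * self.lambda_param * self.w)
                else:
                    # Hinge term is active: d/dw = -y_i * x_i and d/db = -y_i
                    self.w -= self.lr * (2 * self.lambda_param * self.w - y_[idx] * x_i)
                    self.b += self.lr * y_[idx]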

    def predict(self, X):
        linear_output = np.dot(X, self.w) + self.b
        return np.where(linear_output >= 0, 1, 0)

# Train custom SVM
custom_svm = LinearSVM()
custom_svm.fit(X_train, y_train)
y_pred_custom = custom_svm.predict(X_test)

# Accuracy of custom SVM
custom_accuracy = accuracy_score(y_test, y_pred_custom)
print(f"Custom SVM Accuracy: {custom_accuracy * 100:.2f}%")

# Compare with scikit-learn SVM
sklearn_svm = SVC(kernel='linear', C=1.0)
sklearn_svm.fit(X_train, y_train)
y_pred_sklearn = sklearn_svm.predict(X_test)

# Accuracy of scikit-learn SVM
sklearn_accuracy = accuracy_score(y_test, y_pred_sklearn)
print(f"Scikit-learn SVM Accuracy: {sklearn_accuracy * 100:.2f}%")
```

---