### Q1. Mathematical Formula for a Linear SVM

A linear Support Vector Machine (SVM) looks for the hyperplane that separates two classes with the maximum margin. Its decision function is:

\[ f(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + b \]

where:
- \(\mathbf{w}\) is the weight vector (normal to the hyperplane),
- \(\mathbf{x}\) is the feature vector,
- \(b\) is the bias term.

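The hyperplane is the set of points where the decision function is zero:

\[ \mathbf{w}^T \mathbf{x} + b = 0 \]

A point is assigned the label \(\hat{y} = \operatorname{sign}(f(\mathbf{x})) \in \{-1, +1\}\), and its distance to the hyperplane is \(|f(\mathbf{x})| / \|\mathbf{w}\|\).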
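As a minimal sketch, the NumPy code below evaluates these formulas for hypothetical parameters \(\mathbf{w} = (2, -1)\) and \(b = -1\) (picked by hand, not learned from data). It computes \(f(\mathbf{x})\), the predicted label, and each point's distance to the hyperplane. Mapping \(f(\mathbf{x}) = 0\) to \(+1\) is an arbitrary tie-breaking convention chosen here.

```python
import numpy as np

# Hypothetical parameters for a 2-D linear SVM (chosen by hand, not trained)
w = np.array([2.0, -1.0])
b = -1.0

def decision_function(X, w, b):
    """f(x) = w^T x + b for every row of X."""
    return X @ w + b

X = np.array([[1.0, 0.0],   # f = 2 - 0 - 1 = 1
              [0.0, 1.0],   # f = 0 - 1 - 1 = -2
              [0.5, 0.0]])  # f = 1 - 0 - 1 = 0, i.e. on the hyperplane

scores = decision_function(X, w, b)
labels = np.where(scores >= 0, 1, -1)            # sign(f(x)), ties mapped to +1
distances = np.abs(scores) / np.linalg.norm(w)   # |f(x)| / ||w||
print(scores, labels, distances)
```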

### Q2. Objective Function of a Linear SVM

The objective of a linear SVM is to maximize the margin between the two classes while minimizing classification error. For linearly separable data (the hard-margin case), with labels \(y_i \in \{-1, +1\}\), the problem is:

**Maximize**: \(\frac{2}{\|\mathbf{w}\|}\) subject to \(y_i (\mathbf{w}^T \mathbf{x_i} + b) \geq 1\) for all \(i\)
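The quantity \(2/\|\mathbf{w}\|\) is the distance between the two marginal planes \(\mathbf{w}^T\mathbf{x} + b = \pm 1\). A short derivation, taking any \(\mathbf{x}_+\) on the first plane and any \(\mathbf{x}_-\) on the second:

\[ \mathbf{w}^T \mathbf{x}_+ + b = 1, \qquad \mathbf{w}^T \mathbf{x}_- + b = -1 \;\Rightarrow\; \mathbf{w}^T (\mathbf{x}_+ - \mathbf{x}_-) = 2 \]

\[ \text{margin} = \frac{\mathbf{w}^T}{\|\mathbf{w}\|} (\mathbf{x}_+ - \mathbf{x}_-) = \frac{2}{\|\mathbf{w}\|} \]

Projecting \(\mathbf{x}_+ - \mathbf{x}_-\) onto the unit normal \(\mathbf{w}/\|\mathbf{w}\|\) gives the perpendicular distance no matter which two points are picked, so maximizing the margin means maximizing \(2/\|\mathbf{w}\|\).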

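Maximizing \(2/\|\mathbf{w}\|\) is equivalent to minimizing \(\frac{1}{2}\|\mathbf{w}\|^2\). Because real data is rarely perfectly separable, the soft-margin SVM adds slack variables and a penalty term: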

**Minimize**: \(\frac{1}{2} \|\mathbf{w}\|^2 + C \sum_{i=1}^n \xi_i\)

where:
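- \(\|\mathbf{w}\|^2\) is the squared norm of the weight vector; minimizing it maximizes the margin,
- \(C > 0\) is the penalty on margin violations: a large \(C\) tolerates few violations (approaching the hard margin), while a small \(C\) accepts more violations in exchange for a wider margin (stronger regularization),
- \(\xi_i\) are slack variables: \(\xi_i = 0\) for points on or outside their marginal plane, \(0 < \xi_i \leq 1\) for points inside the margin but on the correct side, and \(\xi_i > 1\) for misclassified points.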

The constraint for each data point \( (\mathbf{x_i}, y_i) \) is:

\[ y_i (\mathbf{w}^T \mathbf{x_i} + b) \geq 1 - \xi_i \]
\[ \xi_i \geq 0 \]
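At the optimum each slack variable takes its smallest feasible value, \(\xi_i = \max(0,\, 1 - y_i(\mathbf{w}^T\mathbf{x_i} + b))\), which is the hinge loss. The sketch below uses hypothetical parameters \(\mathbf{w} = (0.5, 0.5)\), \(b = -1\) and five hand-picked points to show the three slack regimes and evaluate the soft-margin objective with \(C = 1\).

```python
import numpy as np

def slack_variables(X, y, w, b):
    """Smallest feasible slack for each point: xi_i = max(0, 1 - y_i (w^T x_i + b))."""
    return np.maximum(0.0, 1.0 - y * (X @ w + b))

def soft_margin_objective(X, y, w, b, C):
    """0.5 * ||w||^2 + C * sum_i xi_i, with each xi_i at its smallest feasible value."""
    return 0.5 * np.dot(w, w) + C * slack_variables(X, y, w, b).sum()

# Hypothetical parameters and toy points (labels in {-1, +1})
w = np.array([0.5, 0.5])
b = -1.0
X = np.array([[2.0, 2.0],   # y=+1, y*f(x) = 1    -> on the margin,      xi = 0
              [3.0, 3.0],   # y=+1, y*f(x) = 2    -> outside the margin, xi = 0
              [1.0, 1.0],   # y=+1, y*f(x) = 0    -> on the hyperplane,  xi = 1
              [0.0, 0.0],   # y=-1, y*f(x) = 1    -> on the margin,      xi = 0
              [2.0, 1.0]])  # y=-1, y*f(x) = -0.5 -> misclassified,     xi = 1.5
y = np.array([1, 1, 1, -1, -1])

xi = slack_variables(X, y, w, b)
objective = soft_margin_objective(X, y, w, b, C=1.0)  # 0.5 * 0.5 + 1.0 * 2.5 = 2.75
print(xi, objective)
```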

### Q3. Kernel Trick in SVM

The kernel trick is a technique used in SVM to handle non-linearly separable data. The SVM's dual problem and its decision function depend on the training data only through inner products \(\mathbf{x_i}^T \mathbf{x_j}\). Replacing each inner product with a kernel function \(K(\mathbf{x_i}, \mathbf{x_j})\) lets the SVM operate in a higher-dimensional feature space without ever transforming the data explicitly, because the kernel returns the inner product in that space directly:

\[ K(\mathbf{x_i}, \mathbf{x_j}) = \phi(\mathbf{x_i})^T \phi(\mathbf{x_j}) \]

where \(\phi(\cdot)\) is a mapping function to the higher-dimensional space. Common kernel functions include:

- **Linear Kernel**: \(K(\mathbf{x_i}, \mathbf{x_j}) = \mathbf{x_i}^T \mathbf{x_j}\)
- **Polynomial Kernel**: \(K(\mathbf{x_i}, \mathbf{x_j}) = (\mathbf{x_i}^T \mathbf{x_j} + c)^d\), with constant \(c \geq 0\) and degree \(d\)
- **Radial Basis Function (RBF) Kernel**: \(K(\mathbf{x_i}, \mathbf{x_j}) = \exp\left(-\frac{\|\mathbf{x_i} - \mathbf{x_j}\|^2}{2\sigma^2}\right)\), with bandwidth \(\sigma > 0\)
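The identity \(K(\mathbf{x_i}, \mathbf{x_j}) = \phi(\mathbf{x_i})^T \phi(\mathbf{x_j})\) can be checked numerically. For the degree-2 polynomial kernel with \(c = 0\) in 2-D, one explicit map is \(\phi(\mathbf{x}) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)\). The sketch below compares the implicit and explicit computations for two arbitrary points, and also checks the RBF formula against scikit-learn's `rbf_kernel`, which parameterizes the kernel as \(\exp(-\gamma\|\mathbf{x_i} - \mathbf{x_j}\|^2)\), so \(\gamma = 1/(2\sigma^2)\). The points and \(\sigma = 1.5\) are illustrative choices.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def poly2_feature_map(x):
    """Explicit feature map for the degree-2 polynomial kernel with c = 0 in 2-D."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly_kernel(x, z, c=0.0, d=2):
    """Polynomial kernel (x^T z + c)^d, computed without the feature map."""
    return (np.dot(x, z) + c) ** d

def rbf(x, z, sigma=1.0):
    """RBF kernel exp(-||x - z||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma**2))

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

k_implicit = poly_kernel(x, z)  # (1*3 + 2*(-1))^2 = 1
k_explicit = np.dot(poly2_feature_map(x), poly2_feature_map(z))

sigma = 1.5
k_rbf = rbf(x, z, sigma)
k_rbf_sklearn = rbf_kernel(x.reshape(1, -1), z.reshape(1, -1),
                           gamma=1.0 / (2 * sigma**2))[0, 0]
print(k_implicit, k_explicit, k_rbf, k_rbf_sklearn)
```

The implicit version needs one 2-D dot product, while the explicit one builds 3-D vectors first; for higher degrees and dimensions (or the infinite-dimensional RBF map) that gap is what makes the kernel trick practical.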

### Q4. Role of Support Vectors in SVM

Support vectors are the training points with nonzero dual coefficients \(\alpha_i\). They lie on the marginal planes or, with a soft margin, inside the margin or on the wrong side of the hyperplane. Because the weight vector can be written as \(\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x_i}\), support vectors are the only points that determine the position and orientation of the decision boundary.

**Example:**

In a two-class dataset, the support vectors are the points from each class closest to the decision boundary. Removing or moving any other point leaves the boundary unchanged, while removing a support vector generally shifts it.
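The sketch below tests this claim with scikit-learn's `SVC` on six hypothetical, linearly separable points. The closest cross-class pair is \((0, 0)\) and \((2, 2)\), so the maximum-margin hyperplane is their perpendicular bisector \(x_1 + x_2 = 2\), i.e. \(\mathbf{w} = (0.5, 0.5)\), \(b = -1\). A large `C = 1000` is used here as a stand-in for a hard margin.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data: three points per class, labels in {-1, +1}
X = np.array([[0.0, 0.0], [-1.0, -1.0], [-2.0, 0.0],   # class -1
              [2.0, 2.0], [3.0, 3.0], [2.0, 4.0]])    # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

# A large C approximates the hard-margin SVM on separable data
clf = SVC(kernel='linear', C=1000).fit(X, y)
print("support vector indices:", clf.support_)          # the closest pair: points 0 and 3
print("w =", clf.coef_[0], "b =", clf.intercept_[0])    # approximately [0.5, 0.5] and -1

# Removing a non-support vector (index 1) leaves the hyperplane unchanged
keep = np.arange(len(X)) != 1
clf_no_1 = SVC(kernel='linear', C=1000).fit(X[keep], y[keep])

# Removing a support vector (index 0) moves the hyperplane
keep = np.arange(len(X)) != 0
clf_no_0 = SVC(kernel='linear', C=1000).fit(X[keep], y[keep])
print("w without point 0 =", clf_no_0.coef_[0])        # approximately [1/3, 1/3]
```

Without \((0, 0)\), the closest cross-class pair becomes \((-1, -1)\) and \((2, 2)\), so the margin widens and \(\mathbf{w}\) shrinks to about \((1/3, 1/3)\).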

### Q5. Illustrations: Hyperplane, Marginal Plane, Soft Margin, and Hard Margin

**1. Hyperplane:** The decision boundary that separates the two classes. In 2D, it's a line; in 3D, it's a plane.

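**2. Marginal Plane:** The two planes \(\mathbf{w}^T \mathbf{x} + b = +1\) and \(\mathbf{w}^T \mathbf{x} + b = -1\), parallel to the hyperplane and each at distance \(1/\|\mathbf{w}\|\) from it, so the full margin width is \(2/\|\mathbf{w}\|\). Support vectors lie on these planes (with a soft margin, also between them or beyond the hyperplane).

**3. Hard Margin:** Used when the data is linearly separable; no misclassification or margin violation is allowed. The margin is maximized subject to every point lying on or outside its marginal plane, so no data points fall inside the margin.

**4. Soft Margin:** Introduces slack variables \(\xi_i\) so that some points may fall inside the margin or be misclassified. The parameter \(C\) sets the trade-off between a wide margin and few violations; a very large \(C\) approaches the hard margin.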

**Visualizations:**

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load dataset
iris = datasets.load_iris()
X = iris.data[:, :2]  # Use only the first two features for visualization
y = iris.target

# Use only two classes for binary classification example
X = X[y != 2]
y = y[y != 2]

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

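# Train SVM with linear kernel; a very large C approximates a hard margin
clf = SVC(kernel='linear', C=1000)
clf.fit(X_train, y_train)

# Plot class regions, the hyperplane, the marginal planes, and the support vectors
def plot_decision_boundary(clf, X, y, ax):
    h = .02  # step size in the mesh
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    grid = np.c_[xx.ravel(), yy.ravel()]
    # Shade the predicted class of each mesh point
    Z = clf.predict(grid).reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.coolwarm)
    # f(x) = 0 is the hyperplane; f(x) = -1 and f(x) = +1 are the marginal planes
    F = clf.decision_function(grid).reshape(xx.shape)
    ax.contour(xx, yy, F, levels=[-1, 0, 1], colors='k', linestyles=['--', '-', '--'])
    ax.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', marker='o', s=50, cmap=plt.cm.coolwarm)
    # Circle the support vectors
    sv = clf.support_vectors_
    ax.scatter(sv[:, 0], sv[:, 1], s=150, facecolors='none', edgecolors='k', linewidths=1.5)
    ax.set_xlim(x_min, x_max)
    ax.set_ylim(y_min, y_max)
    ax.set_xticks(())
    ax.set_yticks(())

fig, ax = plt.subplots(figsize=(8, 6))
plot_decision_boundary(clf, X_train, y_train, ax)
plt.title('Decision Boundary with (Near-)Hard Margin, C = 1000')
plt.show()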
```

For a soft margin, lower `C`: margin violations are penalized less, so the margin widens and more points typically become support vectors:

```python
# Train SVM with soft margin
clf_soft = SVC(kernel='linear', C=0.1)
clf_soft.fit(X_train, y_train)

# Plot decision boundary for soft margin
fig, ax = plt.subplots(figsize=(8, 6))
plot_decision_boundary(clf_soft, X_train, y_train, ax)
plt.title('Decision Boundary with Soft Margin')
plt.show()
```
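To put numbers on the difference between the two plots, the sketch below refits on the same two classes and features for several values of `C` and reports the margin width \(2/\|\mathbf{w}\|\) and the number of support vectors. As a simplification it uses all 100 standardized setosa/versicolor samples with no train/test split, and the `C` values are arbitrary choices (`C = 1000` standing in for a hard margin). At the exact optimum \(\|\mathbf{w}\|\) cannot decrease as \(C\) grows, so the margin should widen as \(C\) shrinks.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Setosa vs. versicolor, first two features, standardized
iris = datasets.load_iris()
mask = iris.target != 2
X = StandardScaler().fit_transform(iris.data[mask, :2])
y = iris.target[mask]

results = {}
for C in (1000, 1.0, 0.1):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    # Distance between the marginal planes f(x) = -1 and f(x) = +1
    margin_width = 2 / np.linalg.norm(clf.coef_[0])
    results[C] = (margin_width, len(clf.support_))
    print(f"C={C:>6}: margin width = {margin_width:.3f}, support vectors = {len(clf.support_)}")
```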

### Q6. SVM Implementation on Iris Dataset

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train a linear SVM classifier; SVC handles the three classes internally with a one-vs-one scheme
svm = SVC(kernel='linear', C=1.0)
svm.fit(X_train, y_train)

# Predict on the test set
y_pred = svm.predict(X_test)

# Compute accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
```
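With only 150 samples, the accuracy from a single 70/30 split can shift noticeably with `random_state`. A hedged alternative is 5-fold cross-validation, sketched below; it also standardizes the features inside a pipeline so the scaler is fit only on each training fold. The pipeline and `cv=5` are choices made here, not part of the solution above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Scaling inside the pipeline avoids leaking test-fold statistics into training
model = make_pipeline(StandardScaler(), SVC(kernel='linear', C=1.0))

# For classifiers, an integer cv uses stratified k-fold splits
scores = cross_val_score(model, X, y, cv=5)
print(f"CV accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```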

### Bonus Task: Implement Linear SVM from Scratch

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

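class LinearSVM:
    """Soft-margin linear SVM trained with stochastic subgradient descent on
    0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i (w^T x_i + b))."""

    def __init__(self, learning_rate=0.001, epochs=1000, C=1.0):
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.C = C

    def fit(self, X, y):
        num_samples, num_features = X.shape
        # The hinge loss needs labels in {-1, +1}; map the two original classes onto them
        self.classes_ = np.unique(y)
        y_signed = np.where(y == self.classes_[1], 1.0, -1.0)
        self.weights = np.zeros(num_features)
        self.bias = 0.0

        for _ in range(self.epochs):
            for x_i, y_i in zip(X, y_signed):
                # Per-sample objective: (1 / (2n)) ||w||^2 + C * max(0, 1 - y_i f(x_i))
                grad_w = self.weights / num_samples
                grad_b = 0.0
                if y_i * (np.dot(x_i, self.weights) + self.bias) < 1:
                    # Margin violated: add the hinge-loss subgradient
                    grad_w = grad_w - self.C * y_i * x_i
                    grad_b = -self.C * y_i
                self.weights -= self.learning_rate * grad_w
                self.bias -= self.learning_rate * grad_b
        return self

    def decision_function(self, X):
        return np.dot(X, self.weights) + self.bias

    def predict(self, X):
        # Map the sign of f(x) back to the original class labels
        return np.where(self.decision_function(X) >= 0, self.classes_[1], self.classes_[0])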

# Load the iris dataset
iris = load_iris()
X = iris.data[:, :2]  # Use only the first two features so the decision boundary can be plotted in 2D
y = iris.target

# Use only two classes for binary classification example
X = X[y != 2]
y = y[y != 2]

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train the custom linear SVM
linear_svm = LinearSVM()
linear_svm.fit(X_train, y_train)

# Predict on the test set
y_pred = linear_svm.predict(X_test)

# Compute accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy of custom SVM: {accuracy:.2f}')

# Compare with scikit-learn SVM
from sklearn.svm import SVC

# Train scikit-learn linear SVM
sklearn_svm = SVC(kernel='linear', C=1.0)
sklearn_svm.fit(X_train, y_train)

# Predict on the test set
y_pred_sklearn = sklearn_svm.predict(X_test)

# Compute accuracy
accuracy_sklearn = accuracy_score(y_test, y_pred_sklearn)
print(f'Accuracy of scikit-learn SVM: {accuracy_sklearn:.2f}')

# Plot decision boundaries of the custom linear SVM
def plot_decision_boundary(model, X, y, ax):
    h = .02
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=0.8)
    ax.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', marker='o', s=50, cmap=plt.cm.coolwarm)
    ax.set_xlim(x_min, x_max)
    ax.set_ylim(y_min, y_max)
    ax.set_xticks(())
    ax.set_yticks(())

fig, ax = plt.subplots(figsize=(8, 6))
plot_decision_boundary(linear_svm, X_train, y_train, ax)
plt.title('Decision Boundary of Custom Linear SVM')
plt.show()
```

### Summary

This guide covers the mathematical foundation and implementation details of Support Vector Machines (SVM), including the objective function, kernel trick, and role of support vectors. It also illustrates concepts like hyperplanes, margins, and the difference between hard and soft margins. The final part involves implementing a linear SVM from scratch, comparing its performance with scikit-learn's SVM, and visualizing decision boundaries.