### Q1: Mathematical Formula for a Linear SVM

In a linear Support Vector Machine (SVM), the goal is to find the optimal hyperplane that separates the classes with the maximum margin. 

The decision function of a linear SVM is:

\[ f(\mathbf{x}) = \mathbf{w}^T \mathbf{x} + b \]

where:
- \(\mathbf{w}\) is the weight vector.
- \(\mathbf{x}\) is the input feature vector.
- \(b\) is the bias term.

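The separating hyperplane is the set of points where the decision function is zero:

\[ \mathbf{w}^T \mathbf{x} + b = 0 \]

A new point is assigned to class \(+1\) if \(f(\mathbf{x}) > 0\) and to class \(-1\) if \(f(\mathbf{x}) < 0\), i.e. \(\hat{y} = \operatorname{sign}(f(\mathbf{x}))\).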
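As a small illustration of these formulas, the sketch below evaluates \(f(\mathbf{x})\) for hand-picked, hypothetical values \(\mathbf{w} = (0.5, 0.5)\) and \(b = -1\) (not learned from data). It classifies each point by the sign of \(f\) and computes its geometric distance to the hyperplane, \(|f(\mathbf{x})| / \|\mathbf{w}\|\).

```python
import numpy as np

# Hypothetical parameters chosen by hand for illustration (not trained)
w = np.array([0.5, 0.5])
b = -1.0

def decision_function(X, w, b):
    """f(x) = w^T x + b for each row of X."""
    return X @ w + b

X = np.array([[3.0, 3.0],   # on the positive side
              [1.0, 1.0],   # exactly on the hyperplane
              [0.0, 0.0]])  # on the negative side

f = decision_function(X, w, b)           # f = [2.0, 0.0, -1.0]
labels = np.sign(f)                      # predicted class; 0 means "on the hyperplane"
distances = np.abs(f) / np.linalg.norm(w)  # approx. [2.83, 0.0, 1.41]

print(f)
print(labels)
print(distances)
```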

### Q2: Objective Function of a Linear SVM

The objective of a linear SVM is to maximize the margin between the two classes. When the data are linearly separable (the hard-margin case), this is formulated as a convex optimization problem:

\[ \text{Minimize} \quad \frac{1}{2} \|\mathbf{w}\|^2 \]

Subject to:

\[ y_i (\mathbf{w}^T \mathbf{x}_i + b) \geq 1 \quad \text{for all } i \]

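where \(y_i \in \{-1, +1\}\) is the class label of sample \(\mathbf{x}_i\). The constraints require every training point to lie on the correct side of the hyperplane, on or beyond the margin boundaries \(\mathbf{w}^T \mathbf{x} + b = \pm 1\). The distance between these two boundaries is \(2 / \|\mathbf{w}\|\), so minimizing \(\|\mathbf{w}\|^2\) maximizes the margin.

To also tolerate classification errors (the soft-margin case), slack variables \(\xi_i \geq 0\) are added and their sum is penalized by the regularization parameter \(C\):

\[ \text{Minimize} \quad \frac{1}{2} \|\mathbf{w}\|^2 + C \sum_{i=1}^N \xi_i \quad \text{subject to} \quad y_i (\mathbf{w}^T \mathbf{x}_i + b) \geq 1 - \xi_i, \; \xi_i \geq 0 \]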
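To make the margin argument concrete, here is a check on a toy, hypothetical dataset of two points: \((2, 2)\) with label \(+1\) and \((0, 0)\) with label \(-1\). For two points, the max-margin hyperplane is their perpendicular bisector, \(x_1 + x_2 = 2\). Scaling it so both constraints hold with equality gives \(\mathbf{w} = (0.5, 0.5)\), \(b = -1\). The sketch verifies the constraints and shows that \(2/\|\mathbf{w}\|\) equals the distance between the two points.

```python
import numpy as np

# Toy, hypothetical data: one point per class
X = np.array([[2.0, 2.0], [0.0, 0.0]])
y = np.array([1, -1])

# Max-margin solution for these two points (bisector scaled so that y_i f(x_i) = 1)
w = np.array([0.5, 0.5])
b = -1.0

functional_margins = y * (X @ w + b)           # y_i (w^T x_i + b); both equal 1
constraints_ok = np.all(functional_margins >= 1 - 1e-12)
margin_width = 2 / np.linalg.norm(w)           # distance between w^T x + b = -1 and +1
point_distance = np.linalg.norm(X[0] - X[1])   # the points sit on opposite boundaries

print(functional_margins, constraints_ok)
print(margin_width, point_distance)            # both approx. 2.828
print(0.5 * w @ w)                             # objective 1/2 ||w||^2 = 0.25
```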

### Q3: Kernel Trick in SVM

The kernel trick extends SVMs to data that are not linearly separable: the input features are implicitly mapped into a higher-dimensional space, where a linear separating hyperplane may exist.

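For a kernelized SVM, the decision function is:

\[ f(\mathbf{x}) = \sum_{i=1}^N \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b \]

where \(\alpha_i \geq 0\) are the Lagrange multipliers of the dual optimization problem (nonzero only for the support vectors) and \(K(\mathbf{x}_i, \mathbf{x})\) is a kernel function, such as the polynomial or radial basis function (RBF) kernel. The kernel returns the inner product \(K(\mathbf{x}_i, \mathbf{x}) = \phi(\mathbf{x}_i)^T \phi(\mathbf{x})\) of the two points after a feature map \(\phi\) into a higher-dimensional space, without computing \(\phi\) explicitly. With the linear kernel \(K(\mathbf{x}_i, \mathbf{x}) = \mathbf{x}_i^T \mathbf{x}\), this reduces to the linear decision function from Q1 with \(\mathbf{w} = \sum_{i=1}^N \alpha_i y_i \mathbf{x}_i\).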
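A minimal numeric sketch of the kernel trick, assuming 2-D inputs: the degree-2 polynomial kernel \(K(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^T \mathbf{z})^2\) equals an ordinary dot product after the explicit feature map \(\phi(\mathbf{x}) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)\), yet the kernel never builds \(\phi\). An RBF kernel, \(K(\mathbf{x}, \mathbf{z}) = \exp(-\gamma \|\mathbf{x} - \mathbf{z}\|^2)\), is included for comparison; its feature space is infinite-dimensional, so only the kernel form is computable. The value \(\gamma = 0.5\) and the two input points are arbitrary choices.

```python
import numpy as np

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: (x^T z)^2."""
    return np.dot(x, z) ** 2

def phi(x):
    """Explicit feature map matching poly2_kernel for 2-D x."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian RBF kernel exp(-gamma * ||x - z||^2); gamma is an arbitrary choice here."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

x = np.array([1.0, 2.0])
z = np.array([3.0, 1.0])

k_implicit = poly2_kernel(x, z)       # computed in the original 2-D space: (3 + 2)^2 = 25
k_explicit = np.dot(phi(x), phi(z))   # computed in the 3-D feature space: 9 + 12 + 4 = 25
print(k_implicit, k_explicit)

print(rbf_kernel(x, z))               # exp(-0.5 * 5)
```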

### Q4: Role of Support Vectors in SVM

Support vectors are the training points that lie closest to the hyperplane; they alone determine its position and orientation.

**Example:**
- In the hard-margin case, the support vectors are the points lying exactly on the margin boundaries, where \(y_i(\mathbf{w}^T \mathbf{x}_i + b) = 1\). In the soft-margin case, points inside the margin or misclassified are support vectors as well.
- They are the only points with \(\alpha_i > 0\), so they are the only ones that appear in the decision function. Removing or moving any other training point (without crossing the margin) leaves the hyperplane unchanged.
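The sketch below uses a toy, hypothetical dataset to identify support vectors as the points with \(y_i(\mathbf{w}^T \mathbf{x}_i + b) = 1\). For this data the max-margin solution \(\mathbf{w} = (0.5, 0.5)\), \(b = -1\) was worked out by hand, not by a solver: along the direction \((1, 1)\), the closest points of the two classes are \((2, 2)\) and \((0, 0)\). The sketch then checks that the dual-form decision function from Q3 reproduces \(f(\mathbf{x})\) exactly. It uses a linear kernel and only the two support vectors, with \(\alpha = 0.25\) each, derived from \(\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i\) and \(\sum_i \alpha_i y_i = 0\). The other four points play no role.

```python
import numpy as np

# Toy, hypothetical, linearly separable data
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 4.0],      # class +1
              [0.0, 0.0], [-1.0, 0.0], [0.0, -2.0]])   # class -1
y = np.array([1, 1, 1, -1, -1, -1])

# Hard-margin solution for this data, derived by hand
w = np.array([0.5, 0.5])
b = -1.0

# Support vectors sit exactly on the margin boundaries: y_i f(x_i) = 1
functional_margins = y * (X @ w + b)
sv_idx = np.flatnonzero(np.isclose(functional_margins, 1.0))
print(sv_idx)  # indices 0 and 3, i.e. (2, 2) and (0, 0)

# Dual form with a linear kernel, using only the support vectors
alpha = np.array([0.25, 0.25])   # alpha_i > 0 only for support vectors
X_sv, y_sv = X[sv_idx], y[sv_idx]

def f_dual(x_new):
    """f(x) = sum_i alpha_i y_i K(x_i, x) + b with K(x_i, x) = x_i^T x."""
    return np.sum(alpha * y_sv * (X_sv @ x_new)) + b

x_new = np.array([1.5, -0.5])
print(f_dual(x_new), x_new @ w + b)        # same value (-0.5) from dual and primal forms
print(alpha @ (y_sv[:, None] * X_sv))      # recovers w = (0.5, 0.5)
```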

### Q5: Illustrations of SVM Components

**Hyperplane:** The decision boundary that separates the classes.

**Margin:** The distance between the hyperplane and the closest data points of each class. The full width between the margin boundaries \(\mathbf{w}^T \mathbf{x} + b = \pm 1\) is \(2 / \|\mathbf{w}\|\), and the SVM maximizes it.

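**Soft Margin:** Allows some data points to lie inside the margin or even be misclassified, to handle noisy data or data that are not linearly separable. The trade-off is controlled by the regularization parameter \(C\): a smaller \(C\) tolerates more violations and gives a wider margin, while a larger \(C\) penalizes violations more heavily.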

**Hard Margin:** Assumes the data are linearly separable. No points are allowed inside the margin or on the wrong side of the hyperplane. With scikit-learn's `SVC`, a hard margin can be approximated by a very large \(C\).
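A minimal sketch of how \(C\) trades margin width against violations, assuming a fixed, hypothetical hyperplane \(\mathbf{w} = (0.5, 0.5)\), \(b = -1\). Each point's slack is \(\xi_i = \max(0, 1 - y_i(\mathbf{w}^T \mathbf{x}_i + b))\). It is 0 outside the margin, between 0 and 1 inside the margin on the correct side, and above 1 when the point is misclassified. The soft-margin objective \(\frac{1}{2}\|\mathbf{w}\|^2 + C \sum_i \xi_i\) weights the total slack by \(C\). The three points are made up for illustration.

```python
import numpy as np

w = np.array([0.5, 0.5])   # hypothetical fixed hyperplane, not trained
b = -1.0

X = np.array([[3.0, 3.0],    # label +1, outside the margin
              [1.5, 1.0],    # label +1, inside the margin but correctly classified
              [1.5, 1.5]])   # label -1, on the wrong side of the hyperplane
y = np.array([1, 1, -1])

slack = np.maximum(0.0, 1 - y * (X @ w + b))   # hinge loss = slack xi_i: [0, 0.75, 1.5]
print(slack)

for C in [0.1, 10]:
    objective = 0.5 * w @ w + C * slack.sum()
    print(f"C={C}: objective = {objective:.3f}")  # 0.475 for C=0.1, 22.750 for C=10
```

With a large \(C\), the violations dominate the objective, so an optimizer would rather shrink the margin (grow \(\|\mathbf{w}\|\)) to reduce them. With a small \(C\), a wide margin stays cheap even if some points violate it.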

**Graphs:**

1. **Hyperplane and Margin:**
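   ```python
   import numpy as np
   import matplotlib.pyplot as plt
   from sklearn import datasets
   from sklearn.svm import SVC
   from sklearn.preprocessing import StandardScaler
   from sklearn.model_selection import train_test_split

   # Load and preprocess the data
   iris = datasets.load_iris()
   X = iris.data[:, :2]  # Only use the first two features (sepal length, sepal width) for easy visualization
   y = iris.target
   X = X[y != 2]  # Binary classification: Setosa (0) vs. Versicolor (1)
   y = y[y != 2]

   X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
   scaler = StandardScaler()
   X_train = scaler.fit_transform(X_train)
   X_test = scaler.transform(X_test)

   # Train the SVM model
   clf = SVC(kernel='linear', C=1)
   clf.fit(X_train, y_train)

   # Build the plotting mesh in the scaled feature space the model was trained on
   h = .02  # step size in the mesh
   x_min, x_max = X_train[:, 0].min() - 1, X_train[:, 0].max() + 1
   y_min, y_max = X_train[:, 1].min() - 1, X_train[:, 1].max() + 1
   xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
   grid = np.c_[xx.ravel(), yy.ravel()]
   Z = clf.predict(grid).reshape(xx.shape)
   # Signed decision values f(x) = w^T x + b, used to draw the hyperplane and margins
   F = clf.decision_function(grid).reshape(xx.shape)

   plt.contourf(xx, yy, Z, alpha=0.3)
   # Solid line: hyperplane f(x) = 0; dashed lines: margin boundaries f(x) = -1, +1
   plt.contour(xx, yy, F, levels=[-1, 0, 1], colors='k', linestyles=['--', '-', '--'])
   plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, edgecolors='k', marker='o')
   # Circle the support vectors
   plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], s=150,
               facecolors='none', edgecolors='r', label='Support vectors')
   plt.title('Linear SVM Decision Boundary and Margin (C=1)')
   plt.xlabel('Sepal length (standardized)')
   plt.ylabel('Sepal width (standardized)')
   plt.legend()
   plt.show()
   ```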

2. **Soft Margin vs. Hard Margin:**
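   ```python
   # Compare a soft margin (small C) with an approximate hard margin (very large C).
   # Reuses X_train, y_train, xx, yy, np, plt and SVC from the previous snippet.
   grid = np.c_[xx.ravel(), yy.ravel()]
   fig, axes = plt.subplots(1, 2, figsize=(10, 4))
   for ax, C in zip(axes, [0.1, 1000]):
       model = SVC(kernel='linear', C=C)
       model.fit(X_train, y_train)
       Z = model.predict(grid).reshape(xx.shape)
       F = model.decision_function(grid).reshape(xx.shape)

       ax.contourf(xx, yy, Z, alpha=0.3)
       # Solid line: hyperplane f(x) = 0; dashed lines: margin boundaries f(x) = -1, +1
       ax.contour(xx, yy, F, levels=[-1, 0, 1], colors='k', linestyles=['--', '-', '--'])
       ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, edgecolors='k', marker='o')
       ax.set_title(f'C={C}: {len(model.support_)} support vectors')
       ax.set_xlabel('Feature 1')
       ax.set_ylabel('Feature 2')
   plt.tight_layout()
   plt.show()
   ```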

### Q6: SVM Implementation with the Iris Dataset

1. **Load and Split Data**:
   ```python
   import numpy as np
   from sklearn.datasets import load_iris
   from sklearn.model_selection import train_test_split
   from sklearn.svm import SVC
   from sklearn.metrics import accuracy_score
   import matplotlib.pyplot as plt

   # Load Iris dataset
   iris = load_iris()
   X = iris.data
   y = iris.target

   # For binary classification, we will use only two classes
   X = X[y != 2]
   y = y[y != 2]

   # Split data
   X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

   # Train SVM model
   clf = SVC(kernel='linear', C=1.0)
   clf.fit(X_train, y_train)

   # Predict and evaluate
   y_pred = clf.predict(X_test)
   accuracy = accuracy_score(y_test, y_pred)
   print(f'Accuracy: {accuracy:.2f}')

   # Plot decision boundaries
   X2D = X[:, :2]  # Use only the first two features
   X_train2D, X_test2D, y_train2D, y_test2D = train_test_split(X2D, y, test_size=0.3, random_state=42)
   clf2D = SVC(kernel='linear', C=1.0)
   clf2D.fit(X_train2D, y_train2D)

   h = .02  # Step size
   x_min, x_max = X2D[:, 0].min() - 1, X2D[:, 0].max() + 1
   y_min, y_max = X2D[:, 1].min() - 1, X2D[:, 1].max() + 1
   xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
   Z = clf2D.predict(np.c_[xx.ravel(), yy.ravel()])
   Z = Z.reshape(xx.shape)

   plt.contourf(xx, yy, Z, alpha=0.8)
   plt.scatter(X_train2D[:, 0], X_train2D[:, 1], c=y_train2D, edgecolors='k', marker='o')
   plt.title('SVM Decision Boundary with Linear Kernel')
   plt.xlabel('Feature 1')
   plt.ylabel('Feature 2')
   plt.show()
   ```

2. **Varying Regularization Parameter \(C\)**:
   ```python
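   # Smaller C = softer margin (more violations tolerated, typically more support vectors)
   C_values = [0.01, 0.1, 1, 10, 100]
   for C in C_values:
       clf = SVC(kernel='linear', C=C)
       clf.fit(X_train, y_train)
       y_pred = clf.predict(X_test)
       accuracy = accuracy_score(y_test, y_pred)
       print(f'C={C} - Accuracy: {accuracy:.2f}, support vectors: {len(clf.support_)}')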
   ```

### Bonus Task: Implementing Linear SVM from Scratch

1. **Implement Linear SVM**:
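   ```python
   import numpy as np
   from sklearn.svm import SVC
   from sklearn.metrics import accuracy_score

   class LinearSVM:
       """Linear SVM trained by stochastic (per-sample) subgradient descent on
       ||w||^2 + C * max(0, 1 - y_i (w^T x_i + b)).

       Because the regularizer is applied at every per-sample step, this C is not
       on the same scale as scikit-learn's C.
       """

       def __init__(self, C=1.0, learning_rate=0.001, num_iterations=1000):
           self.C = C
           self.learning_rate = learning_rate
           self.num_iterations = num_iterations

       def fit(self, X, y):
           # The hinge loss needs labels in {-1, +1}; keep the original binary labels for predict()
           self.classes_ = np.unique(y)
           y_signed = np.where(y == self.classes_[1], 1, -1)

           self.W = np.zeros(X.shape[1])
           self.b = 0.0
           m = X.shape[0]

           for _ in range(self.num_iterations):
               for i in range(m):
                   if y_signed[i] * (np.dot(X[i], self.W) + self.b) < 1:
                       # Margin violated: regularizer gradient plus hinge-loss subgradient
                       self.W -= self.learning_rate * (2 * self.W - self.C * y_signed[i] * X[i])
                       self.b += self.learning_rate * self.C * y_signed[i]
                   else:
                       # Margin satisfied: only the regularizer contributes
                       self.W -= self.learning_rate * 2 * self.W
           return self

       def predict(self, X):
           # Map the sign of f(x) = w^T x + b back to the original class labels
           scores = np.dot(X, self.W) + self.b
           return np.where(scores >= 0, self.classes_[1], self.classes_[0])

   # Train the scratch model (reuses X_train, X_test, y_train, y_test from Q6)
   svm_scratch = LinearSVM(C=1.0)
   svm_scratch.fit(X_train, y_train)
   y_pred_scratch = svm_scratch.predict(X_test)
   accuracy_scratch = accuracy_score(y_test, y_pred_scratch)
   print(f'Scratch SVM Accuracy: {accuracy_scratch:.2f}')

   # Compare with scikit-learn SVM
   clf_sklearn = SVC(kernel='linear', C=1.0)
   clf_sklearn.fit(X_train, y_train)
   y_pred_sklearn = clf_sklearn.predict(X_test)
   accuracy_sklearn = accuracy_score(y_test, y_pred_sklearn)
   print(f'Scikit-Learn SVM Accuracy: {accuracy_sklearn:.2f}')
   ```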
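As an alternative design, here is a sketch of a full-batch, vectorized subgradient method. It minimizes the soft-margin objective from Q2 directly, \(\frac{1}{2}\|\mathbf{w}\|^2 + C \sum_i \max(0, 1 - y_i(\mathbf{w}^T \mathbf{x}_i + b))\), so its \(C\) has the same meaning as in that formula. The class name `BatchLinearSVM`, the learning rate, and the iteration count are assumptions. The toy data are hypothetical, and labels are assumed to already be in \(\{-1, +1\}\). With a constant step size, the iterates settle into a small neighborhood of the optimum rather than converging exactly.

```python
import numpy as np

class BatchLinearSVM:
    """Hypothetical full-batch subgradient descent on
    1/2 ||w||^2 + C * sum_i max(0, 1 - y_i (w^T x_i + b)), with y_i in {-1, +1}."""

    def __init__(self, C=1.0, learning_rate=0.01, num_iterations=2000):
        self.C = C
        self.learning_rate = learning_rate
        self.num_iterations = num_iterations

    def fit(self, X, y):
        self.w = np.zeros(X.shape[1])
        self.b = 0.0
        for _ in range(self.num_iterations):
            margins = y * (X @ self.w + self.b)
            violated = margins < 1                     # points with nonzero hinge loss
            # Subgradient of the objective over the whole training set
            grad_w = self.w - self.C * (y[violated][:, None] * X[violated]).sum(axis=0)
            grad_b = -self.C * y[violated].sum()
            self.w -= self.learning_rate * grad_w
            self.b -= self.learning_rate * grad_b
        return self

    def predict(self, X):
        return np.where(X @ self.w + self.b >= 0, 1, -1)

# Toy, hypothetical, linearly separable data
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 4.0],
              [0.0, 0.0], [-1.0, 0.0], [0.0, -2.0]])
y = np.array([1, 1, 1, -1, -1, -1])

model = BatchLinearSVM(C=1.0).fit(X, y)
print(model.w, model.b)                  # expected to land near w = (0.5, 0.5), b = -1
print((model.predict(X) == y).mean())    # training accuracy
```

Vectorizing over the whole batch avoids the Python inner loop of the per-sample version, at the cost of one full pass over the data per update.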