I'd be happy to help you with your assignment questions about Support Vector Machines (SVMs). Let's go through them one by one:

Q1. What is the mathematical formula for a linear SVM?

The linear SVM formula for classification can be expressed as:

f(x) = w^T x + b

Where:
- w is the weight vector
- x is the input vector
- b is the bias term
- f(x) is the decision function

The classification rule is:
- If f(x) ≥ 0, classify as positive class
- If f(x) < 0, classify as negative class
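
As a minimal sketch of the decision rule above, the snippet below evaluates f(x) for a few points and applies the sign rule. The weights `w`, bias `b`, and the points in `X_demo` are made-up values chosen for illustration, not parameters learned from data.

```python
import numpy as np

# Hypothetical parameters, chosen by hand for illustration (not fitted to data)
w = np.array([2.0, -1.0])
b = -0.5

def decision_function(X, w, b):
    """Return f(x) = w^T x + b for each row of X."""
    return X @ w + b

def classify(X, w, b):
    """Apply the rule above: +1 if f(x) >= 0, otherwise -1."""
    return np.where(decision_function(X, w, b) >= 0, 1, -1)

X_demo = np.array([
    [1.0, 0.0],    # f = 2*1.0 - 0.0 - 0.5 = 1.5   (positive side)
    [0.0, 1.0],    # f = 0.0 - 1.0 - 0.5 = -1.5    (negative side)
    [0.25, 0.0],   # f = 0.5 - 0.5 = 0.0           (exactly on the boundary)
])

scores = decision_function(X_demo, w, b)
labels = classify(X_demo, w, b)
print(scores)
print(labels)
```

The third point lies on the hyperplane itself, so the "≥ 0" convention assigns it to the positive class.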





Q2. What is the objective function of a linear SVM?

The objective function for a linear SVM aims to maximize the margin while minimizing classification errors. It can be written as:

minimize: (1/2)||w||^2 + C * Σ ξᵢ

subject to: yᵢ(w^T xᵢ + b) ≥ 1 - ξᵢ and ξᵢ ≥ 0 for all i

Where:
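- ||w||^2 is the squared L2 norm of the weight vector; minimizing it maximizes the margin, whose width is 2/||w||
- C is the regularization parameter: a larger C penalizes margin violations more heavily, while a smaller C tolerates more violations in exchange for a wider margin
- ξᵢ are slack variables that measure how far each point violates the margin, so the problem remains solvable when the data are not linearly separable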
- yᵢ are the class labels (+1 or -1)
- xᵢ are the input vectors
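
For fixed w and b, the smallest slack that satisfies both constraints is ξᵢ = max(0, 1 - yᵢ(w^T xᵢ + b)), which is the hinge loss. The sketch below uses that fact to evaluate the objective for a given (w, b). The toy data and parameter values are assumptions made up for illustration; nothing here is optimized.

```python
import numpy as np

def svm_primal_objective(w, b, X, y, C):
    """Evaluate (1/2)||w||^2 + C * sum(xi), using the smallest feasible slack
    xi_i = max(0, 1 - y_i (w^T x_i + b)) for each sample."""
    margins = y * (X @ w + b)
    slack = np.maximum(0.0, 1.0 - margins)
    objective = 0.5 * np.dot(w, w) + C * slack.sum()
    return objective, slack

# Hypothetical toy data and parameters (not a fitted model)
X_toy = np.array([[2.0, 0.0],    # correctly classified, outside the margin
                  [0.5, 0.0],    # correctly classified, but inside the margin
                  [-2.0, 0.0],   # correctly classified, outside the margin
                  [0.5, 1.0]])   # misclassified
y_toy = np.array([1, 1, -1, -1])
w_toy = np.array([1.0, 0.0])
b_toy = 0.0

objective, slack = svm_primal_objective(w_toy, b_toy, X_toy, y_toy, C=1.0)
print(slack)      # per-sample slack values
print(objective)  # 0.5 * ||w||^2 + C * sum(slack)
```

A slack of 0 means the point satisfies the margin, a value between 0 and 1 means it is inside the margin but on the correct side, and a value above 1 means it is misclassified.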

### Q3: Kernel Trick in SVM
The kernel trick allows SVM to classify data that is not linearly separable by implicitly mapping the input features into a higher-dimensional space. The kernel function computes the dot product in this higher-dimensional space without explicitly transforming the data, which makes the computation efficient.

Common kernel functions include:
- Linear kernel: \( K(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i \cdot \mathbf{x}_j \)
- Polynomial kernel: \( K(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i \cdot \mathbf{x}_j + 1)^d \)
- Radial Basis Function (RBF) kernel: \( K(\mathbf{x}_i, \mathbf{x}_j) = \exp(-\gamma \|\mathbf{x}_i - \mathbf{x}_j\|^2) \)
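
To make the "implicit mapping" concrete, the sketch below checks numerically that the degree-2 polynomial kernel on 2-D inputs equals an ordinary dot product after an explicit 6-dimensional feature map. The function names (`poly_kernel`, `phi`, `rbf_kernel`) and the sample vectors are illustrative choices, not part of any library.

```python
import numpy as np

def poly_kernel(x, z, d=2):
    """Polynomial kernel K(x, z) = (x . z + 1)^d."""
    return (np.dot(x, z) + 1.0) ** d

def phi(x):
    """Explicit feature map matching poly_kernel with d=2 on 2-D inputs."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 ** 2, x2 ** 2, s * x1 * x2])

def rbf_kernel(x, z, gamma=0.5):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    diff = x - z
    return np.exp(-gamma * np.dot(diff, diff))

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

k_implicit = poly_kernel(x, z)     # computed in the original 2-D space
k_explicit = np.dot(phi(x), phi(z))  # computed in the 6-D feature space
print(k_implicit, k_explicit)
print(rbf_kernel(x, x), rbf_kernel(x, z))
```

Both polynomial values agree, but the kernel needs only one 2-D dot product. For the RBF kernel the corresponding feature space is infinite-dimensional, so the explicit route is not available at all, and the kernel value is 1 whenever the two inputs coincide.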

### Q4: Role of Support Vectors in SVM
Support vectors are the data points that lie closest to the decision boundary (hyperplane). These points are crucial because they define the position and orientation of the hyperplane. In other words, the support vectors are the critical elements of the dataset that affect the shape of the margin.

Example:
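Consider a simple 2D dataset with two classes. In the separable (hard-margin) case, the support vectors are the data points that lie exactly on the margin boundaries; with a soft margin, points inside the margin or on the wrong side of it are support vectors as well. If you move any of these support vectors, the optimal hyperplane will generally change. Moving or removing any other data point has no effect on the hyperplane, provided the point stays outside the margin on its own side.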
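
One way to check this claim is to refit the model on the support vectors alone and compare the result with the model trained on all points. The sketch below does that with scikit-learn on a synthetic blob dataset. The variable names and the tight solver tolerance (`tol=1e-6`) are choices made here for illustration, and the two fits are expected to agree only up to numerical precision.

```python
import numpy as np
from sklearn import svm, datasets

# Synthetic, well-separated 2-D data (an assumption made for this demo)
X_sv, y_sv = datasets.make_blobs(n_samples=50, centers=2, random_state=6)

# Fit on the full dataset; a tight tolerance makes the comparison cleaner
full = svm.SVC(kernel='linear', C=1, tol=1e-6).fit(X_sv, y_sv)

# Discard every non-support vector and refit on what remains
sv_idx = full.support_
reduced = svm.SVC(kernel='linear', C=1, tol=1e-6).fit(X_sv[sv_idx], y_sv[sv_idx])

print(f'Support vectors used: {len(sv_idx)} of {len(X_sv)} points')
print('Full fit:    w =', full.coef_[0], ' b =', full.intercept_[0])
print('Reduced fit: w =', reduced.coef_[0], ' b =', reduced.intercept_[0])

# Both models should label every original point the same way
same_predictions = np.array_equal(full.predict(X_sv), reduced.predict(X_sv))
print('Identical predictions on all points:', same_predictions)
```

Only a handful of points carry the solution; the remaining points could be dropped from training without changing the fitted boundary.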

### Q5: Illustrations with Examples and Graphs
Let's create examples and graphs to illustrate the concepts of hyperplane, marginal plane, soft margin, and hard margin in SVM.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets

# Generate a simple 2D dataset
X, y = datasets.make_blobs(n_samples=50, centers=2, random_state=6)

# Fit the model. scikit-learn's SVC has no exact hard-margin mode; a large C
# penalizes slack so heavily that it approximates one on well-separated data.
clf = svm.SVC(kernel='linear', C=1000)
clf.fit(X, y)

# Plot the decision boundary, support vectors, and margin
plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap=plt.cm.Paired)
plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], s=100,
            facecolors='none', edgecolors='k', marker='o', label='Support Vectors')

# Plot the decision boundary and margins
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()
xx = np.linspace(xlim[0], xlim[1], 30)
yy = np.linspace(ylim[0], ylim[1], 30)
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
Z = clf.decision_function(xy).reshape(XX.shape)

ax.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1], alpha=0.5,
           linestyles=['--', '-', '--'])
plt.title('SVM with (Approximately) Hard Margin (C=1000)')
plt.legend()
plt.show()

# Soft margin example with C=0.1 (a small C tolerates points inside the margin or misclassified)
clf_soft = svm.SVC(kernel='linear', C=0.1)
clf_soft.fit(X, y)

plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap=plt.cm.Paired)
plt.scatter(clf_soft.support_vectors_[:, 0], clf_soft.support_vectors_[:, 1], s=100,
            facecolors='none', edgecolors='k', marker='o', label='Support Vectors')

# Plot the decision boundary and margins
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()
xx = np.linspace(xlim[0], xlim[1], 30)
yy = np.linspace(ylim[0], ylim[1], 30)
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
Z = clf_soft.decision_function(xy).reshape(XX.shape)

ax.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1], alpha=0.5,
           linestyles=['--', '-', '--'])
plt.title('SVM with Soft Margin (C=0.1)')
plt.legend()
plt.show()
```
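
The effect of C can also be read off numerically: the distance between the two dashed margin lines is 2/||w||. The sketch below refits a linear SVM for several values of C and reports that width along with the number of support vectors. The helper `margin_width` and the particular C values are assumptions made for illustration.

```python
import numpy as np
from sklearn import svm, datasets

# Synthetic 2-D blobs, generated independently so this block runs on its own
X_m, y_m = datasets.make_blobs(n_samples=50, centers=2, random_state=6)

def margin_width(clf):
    """Distance between the lines f(x) = -1 and f(x) = +1, i.e. 2 / ||w||."""
    return 2.0 / np.linalg.norm(clf.coef_)

results = {}
for C in (1000, 1, 0.1, 0.01):
    clf_c = svm.SVC(kernel='linear', C=C).fit(X_m, y_m)
    results[C] = (margin_width(clf_c), len(clf_c.support_))
    print(f'C={C:<6} margin width={results[C][0]:.3f}  '
          f'support vectors={results[C][1]}')
```

Lowering C never shrinks the margin. In practice it usually widens it and pulls more points into the set of support vectors, which is the soft-margin behavior the plots illustrate.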

### Q6: SVM Implementation Through Iris Dataset

```python
# Load Iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2]  # Only take the first two features for visualization purposes
y = iris.target

# Only consider binary classification (class 0 and class 1)
X = X[y != 2]
y = y[y != 2]

# Train the SVM model
clf = svm.SVC(kernel='linear', C=1)
clf.fit(X, y)

# Plot decision boundary
plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap=plt.cm.Paired)
plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], s=100,
            facecolors='none', edgecolors='k', marker='o', label='Support Vectors')

# Plot the decision boundary and margins
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()
xx = np.linspace(xlim[0], xlim[1], 30)
yy = np.linspace(ylim[0], ylim[1], 30)
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
Z = clf.decision_function(xy).reshape(XX.shape)

ax.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1], alpha=0.5,
           linestyles=['--', '-', '--'])
plt.title('SVM on Iris Dataset')
plt.legend()
plt.show()
```

### Bonus Task: Implementing a Linear SVM Classifier from Scratch

```python
class LinearSVM:
    def __init__(self, learning_rate=0.001, lambda_param=0.01, n_iters=1000):
        self.learning_rate = learning_rate
        self.lambda_param = lambda_param
        self.n_iters = n_iters
        self.w = None
        self.b = None

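    def fit(self, X, y):
        n_samples, n_features = X.shape
        # Map labels to {-1, +1}; any label <= 0 is treated as the negative class
        y_ = np.where(y <= 0, -1, 1)

        self.w = np.zeros(n_features)
        self.b = 0.0

        # Stochastic sub-gradient descent on lambda * ||w||^2 + hinge loss
        for _ in range(self.n_iters):
            for idx, x_i in enumerate(X):
                # Sign convention used by this class: f(x) = w^T x - b
                condition = y_[idx] * (np.dot(x_i, self.w) - self.b) >= 1
                if condition:
                    # Margin satisfied: only the regularization term contributes
                    self.w -= self.learning_rate * (2 * self.lambda_param * self.w)
                else:
                    # Margin violated: include the hinge-loss sub-gradient
                    self.w -= self.learning_rate * (2 * self.lambda_param * self.w - y_[idx] * x_i)
                    self.b -= self.learning_rate * y_[idx]

    def predict(self, X):
        # Returns labels in {-1, +1} (0 only for a point exactly on the boundary)
        approx = np.dot(X, self.w) - self.b
        return np.sign(approx)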

# Load Iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2]  # Only take the first two features for visualization purposes
y = iris.target

# Only consider binary classification (class 0 and class 1)
X = X[y != 2]
y = y[y != 2]

# Train the Linear SVM model from scratch
linear_svm = LinearSVM()
linear_svm.fit(X, y)
y_pred_scratch = linear_svm.predict(X)

# Train the scikit-learn SVM model
clf = svm.SVC(kernel='linear', C=1)
clf.fit(X, y)
y_pred_sklearn = clf.predict(X)

# Compare training-set accuracy. LinearSVM.predict returns labels in {-1, +1},
# so map the original {0, 1} labels the same way before comparing.
y_signed = np.where(y <= 0, -1, 1)
accuracy_scratch = np.mean(y_pred_scratch == y_signed)
accuracy_sklearn = np.mean(y_pred_sklearn == y)
print(f'Accuracy (from scratch): {accuracy_scratch}')
print(f'Accuracy (scikit-learn): {accuracy_sklearn}')
```