The mathematical formula for a linear Support Vector Machine (SVM) is typically represented as:

f(x) = w^T * x + b

Where:
- f(x) is the decision function; its sign gives the predicted class,
- w is the weight vector perpendicular to the separating hyperplane,
- x is the input vector,
- b is the bias term.
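
As a minimal sketch in plain Python (no libraries), the decision function and the resulting class prediction can be written as follows. The weights, bias, and input points are hypothetical values chosen only for illustration. Assigning f(x) = 0 to the positive class is an arbitrary tie-breaking assumption.

```python
from typing import Sequence

def decision_function(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """f(x) = w^T x + b, computed as a plain dot product plus the bias."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Predicted label: +1 if f(x) >= 0, otherwise -1 (ties go to +1 by assumption)."""
    return 1 if decision_function(w, b, x) >= 0 else -1

# Hypothetical parameters, chosen only for illustration.
w, b = [2.0, -1.0], -1.0
print(decision_function(w, b, [3.0, 1.0]))  # 2*3 - 1*1 - 1 = 4.0
print(predict(w, b, [0.0, 2.0]))            # f = 0 - 2 - 1 = -3.0, so -1
```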

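The objective function of a linear SVM balances two goals: maximizing the margin between the two classes and penalizing training points that violate it (points that fall inside the margin or are misclassified). In its soft-margin (hinge-loss) form it is typically written as: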

minimize: 1/2 * ||w||^2 + C * Σ_i max(0, 1 - y_i * (w^T * x_i + b))

Where:
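- 1/2 * ||w||^2 is the regularization term; since the margin width is 2/||w||, minimizing ||w||^2 maximizes the margin,
- C > 0 is the regularization parameter, which trades margin width against violations (a larger C penalizes violations more heavily),
- (w^T * x_i + b) is the decision function evaluated at x_i,
- y_i is the true class label of the i-th training sample, encoded as -1 or +1,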
- x_i is the i-th training sample,
- The summation is over all training samples.
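
The objective can be evaluated directly to see how its two terms interact. This sketch assumes labels encoded as -1/+1 and uses a tiny hypothetical dataset. It only computes the objective for a given w and b and does not train the model; in practice a solver (for example a quadratic-programming or subgradient method) searches for the w and b that minimize it.

```python
from typing import Sequence

def hinge_loss(w: Sequence[float], b: float, x: Sequence[float], y: int) -> float:
    """max(0, 1 - y * (w^T x + b)) for one sample; y must be -1 or +1."""
    f = sum(wj * xj for wj, xj in zip(w, x)) + b
    return max(0.0, 1.0 - y * f)

def svm_objective(w: Sequence[float], b: float,
                  X: Sequence[Sequence[float]], Y: Sequence[int], C: float) -> float:
    """1/2 * ||w||^2 + C * (sum of hinge losses over all training samples)."""
    reg = 0.5 * sum(wj * wj for wj in w)
    return reg + C * sum(hinge_loss(w, b, x, y) for x, y in zip(X, Y))

# Tiny hypothetical dataset with labels in {-1, +1}.
X = [[2.0, 2.0], [0.5, 0.0], [-1.0, -1.0]]
Y = [1, 1, -1]
w, b = [1.0, 1.0], -1.0
print(svm_objective(w, b, X, Y, C=1.0))   # 1.0 + 1.0 * 1.5 = 2.5
print(svm_objective(w, b, X, Y, C=10.0))  # 1.0 + 10.0 * 1.5 = 16.0
```

Only the second sample violates the margin here (f = -0.5, hinge loss 1.5), so raising C from 1 to 10 raises its contribution from 1.5 to 15. This is why a larger C pushes the optimizer toward fewer violations at the cost of a narrower margin.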

The kernel trick in SVM is a technique for handling data that is not linearly separable by implicitly mapping the input features into a higher-dimensional space. Instead of constructing this higher-dimensional representation, the kernel function computes the dot product between the mapped feature vectors directly from the original inputs, which is usually far cheaper. This lets an SVM find a separating hyperplane in the higher-dimensional space even when the classes are not linearly separable in the original feature space. Popular kernel functions include the linear, polynomial, radial basis function (RBF), and sigmoid kernels.
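
To make the kernel trick concrete, the sketch below compares the homogeneous degree-2 polynomial kernel K(x, z) = (x · z)^2 with an explicit dot product in its 3-D feature space, and also defines an RBF kernel. The choice of these two kernels, the example vectors, and the default gamma = 1.0 are illustrative assumptions.

```python
import math
from typing import Sequence

def poly2_kernel(x: Sequence[float], z: Sequence[float]) -> float:
    """Homogeneous degree-2 polynomial kernel K(x, z) = (x . z)^2."""
    return sum(a * b for a, b in zip(x, z)) ** 2

def phi(x: Sequence[float]) -> list[float]:
    """Explicit 2-D -> 3-D feature map for (x . z)^2: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float = 1.0) -> float:
    """RBF kernel exp(-gamma * ||x - z||^2); gamma = 1.0 is a hypothetical default."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

x, z = [1.0, 2.0], [3.0, -1.0]
implicit = poly2_kernel(x, z)                          # (1*3 + 2*(-1))^2 = 1.0
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))  # same value via 3-D features
print(implicit, math.isclose(implicit, explicit))      # 1.0 True
print(rbf_kernel(x, x))                                # identical points -> 1.0
```

The kernel needs only one 2-D dot product, whereas the explicit route builds 3-D vectors first. For higher degrees and input dimensions the explicit feature space grows quickly, and for the RBF kernel it is infinite-dimensional, so that kernel can only be used implicitly.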


Sure, let's illustrate these concepts with examples and graphs:

1. Hyperplane: The hyperplane is the decision boundary that separates the classes in an SVM, i.e. the set of points where f(x) = w^T * x + b = 0. In a binary classification problem, it's a line in 2D, a plane in 3D, and a hyperplane in higher dimensions. Here's an example of a hyperplane separating two classes in 2D:

   ![Hyperplane](https://i.imgur.com/6sfD8rW.png)

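2. Margin: The margin is the distance between the hyperplane and the closest data points (the support vectors). With the usual scaling, in which those points satisfy |f(x)| = 1, this distance is 1/||w|| on each side, so the full margin width is 2/||w||. Both hard and soft margin SVMs maximize the margin: a hard margin SVM requires every point to lie on or outside it, while a soft margin SVM allows some points to violate it in exchange for a better overall fit. Here's an example of a margin in a hard margin SVM: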

   ![Margin](https://i.imgur.com/6z5iWM0.png)

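3. Soft Margin: In a soft margin SVM, some data points may violate the margin (lie inside it, or even on the wrong side of the hyperplane), which gives more flexibility in handling noisy data or overlapping classes. The parameter C controls how heavily these violations are penalized. Here's an example of a soft margin SVM with some data points within the margin: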

   ![Soft Margin](https://i.imgur.com/9bDfZt4.png)

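4. Hard Margin: In a hard margin SVM, no data points are allowed to violate the margin, so a solution exists only when the classes are linearly separable. This also makes the classifier more sensitive to outliers and noisy data than a soft margin SVM, since a single point can shift the boundary. Here's an example of a hard margin SVM where all data points lie on or outside the margin: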

   ![Hard Margin](https://i.imgur.com/ueyARH3.png)
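
The hyperplane, margin, soft margin, and hard margin ideas can be checked numerically. This sketch assumes the usual canonical scaling, in which points on the margin boundary satisfy y_i * f(x_i) = 1, so the margin width is 2/||w|| and a point's soft-margin slack is max(0, 1 - y_i * f(x_i)), the same hinge term as in the objective. The hyperplane, the points, and the helper names (`margin_width`, `functional_margin`, `slack`, `region`) are hypothetical.

```python
import math
from typing import Sequence

def margin_width(w: Sequence[float]) -> float:
    """Distance 2 / ||w|| between the planes f(x) = -1 and f(x) = +1."""
    return 2.0 / math.sqrt(sum(wj * wj for wj in w))

def functional_margin(w: Sequence[float], b: float, x: Sequence[float], y: int) -> float:
    """y * f(x): positive when x lies on its correct side of the hyperplane."""
    return y * (sum(wj * xj for wj, xj in zip(w, x)) + b)

def slack(w: Sequence[float], b: float, x: Sequence[float], y: int) -> float:
    """Soft-margin slack max(0, 1 - y * f(x)); zero means no margin violation."""
    return max(0.0, 1.0 - functional_margin(w, b, x, y))

def region(w: Sequence[float], b: float, x: Sequence[float], y: int) -> str:
    """Describe where a labeled point sits relative to the margin."""
    m = functional_margin(w, b, x, y)
    if math.isclose(m, 1.0):
        return "on margin boundary"
    if m > 1.0:
        return "outside margin"
    if m > 0.0:
        return "inside margin"
    return "at or past the hyperplane"

# Hypothetical hyperplane x1 = 0 (w = [1, 0], b = 0) and labeled points.
w, b = [1.0, 0.0], 0.0
points = [([2.0, 0.0], 1), ([1.0, 3.0], 1), ([0.5, 0.0], 1), ([0.5, 1.0], -1)]

print(margin_width(w))  # 2.0
for x, y in points:
    print(x, y, region(w, b, x, y), slack(w, b, x, y))

# A hard margin requires zero slack for every point; two points violate it here.
print(all(slack(w, b, x, y) == 0.0 for x, y in points))  # False
```

The first two points would be acceptable under a hard margin; the third sits inside the margin (slack 0.5) and the fourth is misclassified (slack 1.5), so only a soft margin SVM could accept this boundary, paying C times the total slack.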

These visualizations should help clarify the concepts of hyperplane, margin, soft margin, and hard margin in SVM.