<a href="https://colab.research.google.com/github/MaralAminpour/ML-BME-Course-UofA-Fall-2023/blob/main/kernel_trick.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

The "kernel trick" is a technique used in machine learning, especially in the context of support vector machines (SVMs) and other kernelized models, to operate in a high-dimensional feature space without explicitly computing the coordinates of the data in that space. Essentially, it allows algorithms to become more flexible and capable of separating data that isn't linearly separable in its original space.

Let's break this down with a simple explanation:

**Kernel Functions:** These are mathematical functions that take two inputs and output a single number. The function is designed to quantify some form of similarity between the inputs.

**High-Dimensional Space:** Sometimes, data that is not linearly separable in its original space can become separable when it is mapped to a higher-dimensional space.

**Computational Efficiency:** Working directly in a high-dimensional space can be computationally intensive because it might require dealing with a very large number of features. The kernel trick avoids this cost: many algorithms, SVMs included, only need inner products between pairs of data points, and a kernel function returns the inner product two points would have in the high-dimensional space while computing only with their original coordinates.
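As a small illustration of this point, the sketch below compares an explicit feature map with the corresponding kernel for 2-D inputs. The helper names `phi` and `k_poly2` and the sample vectors are assumptions made for this example; the kernel used is the degree-2 polynomial kernel with no constant term.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D input (illustrative helper)."""
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def k_poly2(x, y):
    """Degree-2 polynomial kernel (c = 0), evaluated in the original space."""
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

explicit = np.dot(phi(x), phi(y))  # inner product in the 3-D feature space
implicit = k_poly2(x, y)           # same value without ever building phi

print(explicit, implicit)
print(np.isclose(explicit, implicit))
```

Both numbers agree (up to floating-point round-off), but the kernel version never constructs the 3-D vectors. For higher degrees or more input features, the explicit map grows quickly while the kernel remains a single dot product followed by a power.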

**Support Vector Machines (SVMs):** SVMs are a kind of machine learning model often used for classification tasks. SVMs can use the kernel trick to find the optimal hyperplane in the high-dimensional space, which separates different classes in the data in such a way that the margin between them is maximized.

**Common Kernels:** Some commonly used kernel functions include the linear kernel, polynomial kernel, and radial basis function (RBF) kernel, each having its own way of measuring similarity between data points.

For example, imagine you have a dataset with two features that, when plotted, cannot be separated by a straight line. If you add a suitable third dimension (making the plot 3D), the points may become perfectly separable by a plane. The kernel trick lets the SVM exploit this kind of higher-dimensional separation, and so find complex boundaries between classes, without explicitly performing the computationally expensive transformation on the data.
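The geometric idea in this example can be sketched with synthetic data. The code below assumes two classes lying on concentric rings and lifts them to 3D explicitly by adding the squared distance from the origin as a third coordinate. The explicit lifting is done here only to make the geometry visible; a kernelized SVM would work with such a space implicitly. The ring radii, the noise level, and the threshold `4.0` are choices made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two classes on concentric rings (synthetic data, assumed for illustration).
n = 100
angles = rng.uniform(0.0, 2.0 * np.pi, size=2 * n)
radii = np.concatenate([np.full(n, 1.0), np.full(n, 3.0)])
radii = radii + rng.normal(0.0, 0.1, size=2 * n)
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
labels = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])

# Lift to 3-D: the new coordinate is z = x1^2 + x2^2.
z = np.sum(X**2, axis=1)
X_lifted = np.column_stack([X, z])

# In the lifted space, the horizontal plane z = 4 separates the two rings.
predicted = (X_lifted[:, 2] > 4.0).astype(int)
accuracy = np.mean(predicted == labels)
print(accuracy)
```

No straight line in the original 2D plane separates an inner ring from an outer one, but in the lifted space a flat plane does.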

Let's focus on the **mathematical aspect** of some common kernel functions:

### 1. Linear Kernel
The linear kernel is the simplest kernel: it is just the standard dot product in the input space, so no implicit mapping takes place. Given two vectors $x$ and $y$, it is defined as:
$$
K(x, y) = x^T y
$$
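A minimal NumPy sketch of this definition, computing the kernel for every pair of samples at once (the Gram matrix). The function name `linear_kernel` and the sample data are assumptions for illustration.

```python
import numpy as np

def linear_kernel(X, Y):
    """Gram matrix of pairwise dot products; rows of X and Y are samples."""
    return X @ Y.T

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

K = linear_kernel(X, X)
print(K)  # K[i, j] is the dot product of sample i with sample j
```

The resulting matrix is symmetric, and its diagonal holds the squared norms of the samples.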

### 2. Polynomial Kernel
The polynomial kernel allows one to classify data that is separable by a polynomial decision boundary. It is computed as:
$$
K(x, y) = (x^T y + c)^d
$$
where:
- $c$ is a user-defined constant (typically $\geq 0$)

- $d$ is the degree of the polynomial
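The formula translates directly into NumPy. In this sketch the function name `polynomial_kernel`, the default values `c=1.0` and `d=2`, and the sample data are assumptions for illustration.

```python
import numpy as np

def polynomial_kernel(X, Y, c=1.0, d=2):
    """Gram matrix K[i, j] = (X[i] . Y[j] + c) ** d."""
    return (X @ Y.T + c) ** d

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])

K = polynomial_kernel(X, X, c=1.0, d=2)
print(K)
```

Setting `c=0.0` and `d=1` recovers the linear kernel, which is one way to sanity-check an implementation.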


### 3. Radial Basis Function (RBF) or Gaussian Kernel
The RBF kernel is a popular choice. It corresponds to an inner product in an infinite-dimensional feature space, which makes it a powerful tool for separating data that is not linearly separable. It is defined as:
$$
K(x, y) = \exp\left( -\frac{\|x - y\|^2}{2\sigma^2} \right)
$$
or equivalently using parameter $ \gamma = \frac{1}{2\sigma^2} $:
$$
K(x, y) = \exp\left( -\gamma \|x - y\|^2 \right)
$$
where:
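- $ \sigma $ is the kernel width, or bandwidth (a user-defined parameter); a larger $ \sigma $, i.e. a smaller $ \gamma $, makes the similarity decay more slowly with distance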
- $ \|x - y\|^2 $ is the squared Euclidean distance between $ x $ and $ y $
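A possible NumPy sketch of the $\gamma$ form is shown below. The function name `rbf_kernel`, the sample points, and the value `sigma = 1.0` are assumptions for illustration. The squared distances are computed with the identity $\|x - y\|^2 = \|x\|^2 + \|y\|^2 - 2x^T y$.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """Gram matrix K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2)."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * (X @ Y.T)
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)

X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 2.0]])

sigma = 1.0
gamma = 1.0 / (2.0 * sigma**2)  # the two parameterizations are equivalent
K = rbf_kernel(X, X, gamma=gamma)
print(np.round(K, 4))
```

Every diagonal entry is 1 because each point has zero distance to itself, and the off-diagonal entries shrink toward 0 as points move apart.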

### 4. Sigmoid Kernel
The sigmoid kernel is defined as:
$$
K(x, y) = \tanh(\alpha x^T y + c)
$$
where:
- $ \alpha $ is a scaling parameter
- $ c $ is a constant
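The sketch below implements this formula in NumPy. The function name `sigmoid_kernel`, the sample data, and the parameter values are assumptions for illustration. It also checks the eigenvalues of the Gram matrix, because the sigmoid kernel, unlike the three kernels above, is not positive semi-definite for every choice of $\alpha$ and $c$.

```python
import numpy as np

def sigmoid_kernel(X, Y, alpha=1.0, c=0.0):
    """Gram matrix K[i, j] = tanh(alpha * X[i] . Y[j] + c)."""
    return np.tanh(alpha * (X @ Y.T) + c)

X = np.array([[0.5, 0.0],
              [0.0, 1.0]])

# Parameter values chosen for illustration only.
K = sigmoid_kernel(X, X, alpha=1.0, c=-1.0)
print(np.round(K, 4))

# A valid (positive semi-definite) kernel matrix has no negative eigenvalues.
eigenvalues = np.linalg.eigvalsh(K)
print(eigenvalues.min() < 0)  # negative eigenvalue present for these parameters
```

With these particular parameters the Gram matrix has a negative eigenvalue, so it helps to treat $\alpha$ and $c$ with some care when using this kernel.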

Each kernel has its own characteristics and can be chosen based on the problem at hand. It's important to note that choosing a good kernel and tuning the parameters correctly is essential for building a successful SVM model.
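To see how much the choice of kernel can matter, here is a minimal sketch that assumes scikit-learn is installed. It trains `SVC` models with different kernels on a synthetic concentric-circles dataset. The dataset parameters and the `C` and `gamma` settings are illustrative choices, not tuned recommendations.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data: an inner circle (class 1) inside an outer circle (class 0).
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

scores = {}
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    model = SVC(kernel=kernel, gamma="scale", C=1.0)
    model.fit(X_train, y_train)
    scores[kernel] = model.score(X_test, y_test)  # test-set accuracy

for kernel, score in scores.items():
    print(f"{kernel:8s} test accuracy: {score:.2f}")
```

On this kind of data the RBF kernel should separate the classes almost perfectly while the linear kernel cannot. In practice, parameters such as `C`, `gamma`, and the polynomial degree are usually tuned with cross-validation rather than fixed by hand.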