## Chapter 12: Kernel methods

# 12.3 Kernels as similarity measures

If we look back at the definitions of the polynomial, Fourier, and RBF kernels given in the previous Section, we can see that in each instance the kernel is a function defined *on pairs of input points*. For example, studying the RBF kernel

\begin{equation}
\mathbf{H}_{ij} =e^{-\beta\left\Vert \mathbf{x}_{i}-\mathbf{x}_{j}\right\Vert _{2}^{2}}
\end{equation}

we can see that it is a function of two inputs $\mathbf{x}_i$ and $\mathbf{x}_j$. More specifically, it measures the *similarity* between these two inputs. The closer they are in the input space the larger $\mathbf{H}_{ij}$ becomes, attaining its maximum value of $1$ when $\mathbf{x}_i = \mathbf{x}_j$. Conversely, the further apart the two inputs the smaller $\mathbf{H}_{ij}$ becomes, approaching $0$ as the distance between them grows without bound. In other words, the RBF kernel can be interpreted as a *similarity measure* that describes how closely two inputs resemble each other.
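The behavior described above can be checked numerically. The following is a minimal sketch, assuming the input points are stored one per row of a NumPy array; the function name `rbf_kernel_matrix` and the three sample points are hypothetical choices made for illustration only.

```python
import numpy as np

def rbf_kernel_matrix(X, beta=10.0):
    """Return the matrix H with H[i, j] = exp(-beta * ||x_i - x_j||_2^2).

    X is assumed to have shape (P, N): P input points of dimension N,
    one point per row (an illustrative convention, not the text's).
    """
    # Pairwise differences via broadcasting, shape (P, P, N)
    diffs = X[:, None, :] - X[None, :, :]
    # Squared Euclidean distance between every pair of points, shape (P, P)
    sq_dists = np.sum(diffs ** 2, axis=-1)
    return np.exp(-beta * sq_dists)

# Three illustrative points: the second is near the first, the third is far from it
X = np.array([[0.5, 0.5],
              [0.5, 0.6],
              [0.0, 1.0]])
H = rbf_kernel_matrix(X, beta=10.0)
print(H)
```

With these sample points every diagonal entry equals $1$, the entry for the nearby pair is roughly $0.905$, and the entry for the distant pair is roughly $0.007$, matching the interpretation of $\mathbf{H}_{ij}$ as a similarity score.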

Indeed most kernels can be interpreted this way - as *similarity measures* - including the polynomial and Fourier kernels described previously. Moreover, while they encode similarity in different ways, at a gross level they all typically assign a high value to points near each other and a low value to those far apart.

In the figure below we visualize our three exemplar kernels - polynomial, Fourier, and RBF - as similarity measures by fixing a point $\mathbf{x}_i = \begin{bmatrix}0.5 \\ 0.5 \end{bmatrix}$ and plotting $\mathbf{H}_{ij}$ over the range $\mathbf{x}_j\in\left[0,1\right]^{2}$, producing a color-coded surface showing how each kernel treats points near $\mathbf{x}_{i}$. Analyzing this Figure we can judge more generally how the three kernels define 'similarity' between points. Firstly, we can see that a polynomial kernel treats data points $\mathbf{x}_i$ and $\mathbf{x}_j$ as similar if their inner product is high or, in other words, if they correlate highly with each other. Likewise the points are treated as dissimilar when they are orthogonal to one another. On the other hand, the Fourier kernel treats points as similar if they lie close together, but their similarity oscillates like a “sinc” function as their distance from each other grows. Finally, an RBF kernel provides a smooth similarity between points. If they are close to each other in a Euclidean sense they are highly similar; however, as the distance between them grows their similarity decays rapidly toward zero.

<figure>
  <img src= '../../mlrefined_images/kernel_images/Fig_7_2.png' width="80%"/>
  <figcaption> 
      <strong>Figure 2:</strong> 
      <em> 
Surfaces generated by polynomial, Fourier, and RBF kernels centered at $\mathbf{x}_i = \begin{bmatrix}0.5 \\ 0.5 \end{bmatrix}$ with the surfaces color-coded based on their similarity to $\mathbf{x}_i$. (left panel) A degree 2 polynomial kernel, (middle panel) a degree 3 Fourier kernel, and (right panel) an RBF kernel with $\beta = 10$. See text for further details.
      </em>
  </figcaption>
</figure>
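The comparison in the Figure can be approximated numerically by evaluating each kernel between the fixed point $\mathbf{x}_i$ and a grid of points $\mathbf{x}_j$ over $\left[0,1\right]^{2}$. The sketch below is hedged in two ways. First, the polynomial and Fourier kernel formulas are not restated in this Section, so the forms used here, $\left(1+\mathbf{x}_i^T\mathbf{x}_j\right)^D - 1$ and $\prod_{n}\frac{\sin\left(\left(2D+1\right)\pi\left(x_{i,n}-x_{j,n}\right)\right)}{\sin\left(\pi\left(x_{i,n}-x_{j,n}\right)\right)} - 1$, are assumptions that should be checked against the definitions in the previous Section. Second, all function names and the grid resolution are hypothetical choices.

```python
import numpy as np

def polynomial_kernel(xi, Xj, D=2):
    # Assumed form: (1 + xi^T xj)^D - 1, evaluated for every row xj of Xj
    return (1.0 + Xj @ xi) ** D - 1.0

def fourier_kernel(xi, Xj, D=3):
    # Assumed form: product over input dimensions of
    # sin((2D+1)*pi*d) / sin(pi*d), minus 1, where d = xj_n - xi_n
    diff = Xj - xi
    num = np.sin((2 * D + 1) * np.pi * diff)
    den = np.sin(np.pi * diff)
    # Where the denominator vanishes the ratio takes its limiting value 2D + 1
    near_zero = np.isclose(den, 0.0)
    safe_den = np.where(near_zero, 1.0, den)
    ratio = np.where(near_zero, 2 * D + 1, num / safe_den)
    return np.prod(ratio, axis=1) - 1.0

def rbf_kernel(xi, Xj, beta=10.0):
    # exp(-beta * ||xi - xj||_2^2), evaluated for every row xj of Xj
    return np.exp(-beta * np.sum((Xj - xi) ** 2, axis=1))

# Fixed point and a 21 x 21 grid of comparison points over [0, 1]^2
xi = np.array([0.5, 0.5])
grid = np.linspace(0.0, 1.0, 21)
g1, g2 = np.meshgrid(grid, grid)
Xj = np.column_stack([g1.ravel(), g2.ravel()])

# One similarity surface per kernel, each with the same shape as the grid
poly_vals = polynomial_kernel(xi, Xj, D=2).reshape(g1.shape)
fourier_vals = fourier_kernel(xi, Xj, D=3).reshape(g1.shape)
rbf_vals = rbf_kernel(xi, Xj, beta=10.0).reshape(g1.shape)

# Grid location at which each kernel reports the greatest similarity to xi
for name, vals in [("polynomial", poly_vals),
                   ("Fourier", fourier_vals),
                   ("RBF", rbf_vals)]:
    row, col = np.unravel_index(np.argmax(vals), vals.shape)
    print(name, "kernel peaks at", (g1[row, col], g2[row, col]))
```

Under these assumptions the Fourier and RBF surfaces peak at $\mathbf{x}_j = \mathbf{x}_i$, while the polynomial surface peaks at the corner of the grid where the inner product with $\mathbf{x}_i$ is largest, which reflects the inner-product notion of similarity discussed above. Each array can be passed to a plotting routine such as matplotlib's `contourf` to produce color-coded surfaces like those in the Figure.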