In [1]:
%matplotlib inline

## [Pairwise Utilities](https://scikit-learn.org/stable/modules/metrics.html)
- A set of metrics & kernels that are used to evaluate pairwise distances or affinities between sets of samples.
- **Distances** are functions ```d(a,b)``` such that ```d(a,b) < d(a,c)``` if ```a``` and ```b``` are more similar than ```a``` and ```c```. Two identical objects have a distance of zero. Euclidean distance is the most common metric.
- **Kernels** are measures of similarity. ```s(a,b) > s(a,c)``` if ```a``` and ```b``` are more similar than ```a``` and ```c```. Kernels must be positive semi-definite.
- [pairwise distances](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html#sklearn.metrics.pairwise_distances) - computes the distance matrix between the row vectors of _X_ and _Y_. If _Y_ is omitted, distances are computed between the rows of _X_ itself.
- [pairwise kernels](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_kernels.html#sklearn.metrics.pairwise.pairwise_kernels) - computes the kernel (similarity) matrix between the row vectors of _X_ and _Y_ for a chosen kernel function, such as ```'linear'```.

In [3]:
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.metrics.pairwise import pairwise_kernels

X = np.array([[2, 3], [3, 5], [5, 8]])
Y = np.array([[1, 0], [2, 1]])

# Printed in order: Manhattan distances X vs. Y, Manhattan distances X vs. X,
# and the linear kernel of X vs. Y
print(pairwise_distances(X, Y, metric='manhattan'), "\n\n",
      pairwise_distances(X, metric='manhattan'), "\n\n",
      pairwise_kernels(X, Y, metric='linear'))

[[ 4.  2.]
 [ 7.  5.]
 [12. 10.]] 

 [[0. 3. 8.]
 [3. 0. 5.]
 [8. 5. 0.]] 

 [[ 2.  7.]
 [ 3. 11.]
 [ 5. 18.]]
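
As a sanity check, the Manhattan and linear-kernel matrices printed above can be reproduced with plain NumPy broadcasting. This is only a sketch; the names `manhattan_manual` and `linear_manual` are illustrative and not part of scikit-learn.

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.metrics.pairwise import pairwise_kernels

X = np.array([[2, 3], [3, 5], [5, 8]])
Y = np.array([[1, 0], [2, 1]])

# Manhattan distance: sum of absolute coordinate differences
# for every (row of X, row of Y) pair.
manhattan_manual = np.abs(X[:, None, :] - Y[None, :, :]).sum(axis=2)

# Linear kernel: dot product between every pair of rows.
linear_manual = X @ Y.T

print(np.allclose(manhattan_manual, pairwise_distances(X, Y, metric='manhattan')))
print(np.allclose(linear_manual, pairwise_kernels(X, Y, metric='linear')))
```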


## [Cosine Similarity](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_similarity.html#sklearn.metrics.pairwise.cosine_similarity)
- Returns the dot product of two vectors, L2-normalized.
- Defined as $k(x, y) = \frac{x y^\top}{\|x\| \|y\|}$
- Euclidean (L2) normalization projects the vectors onto a unit sphere. Their dot product is the *cosine of the angle between the points* denoted by the vectors.
- Popular for measuring similarity of documents modeled as tf-idf vectors.
- Accepts scipy.sparse matrices.
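
The definition above can be checked with a minimal sketch: L2-normalize the rows, then take dot products. The arrays reuse the small `X` and `Y` from earlier; `S_manual` and `S_sparse` are illustrative names.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.preprocessing import normalize

X = np.array([[2., 3.], [3., 5.], [5., 8.]])
Y = np.array([[1., 0.], [2., 1.]])

S = cosine_similarity(X, Y)

# Same result by hand: project each row onto the unit sphere, then dot products.
S_manual = normalize(X) @ normalize(Y).T

# Sparse inputs are accepted as well.
S_sparse = cosine_similarity(csr_matrix(X), csr_matrix(Y))

print(np.allclose(S, S_manual), np.allclose(S, S_sparse))
```

Because every value is a cosine, the entries lie in $[-1, 1]$, and for non-negative data such as tf-idf vectors they lie in $[0, 1]$.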

## [Linear Kernels](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.linear_kernel.html#sklearn.metrics.pairwise.linear_kernel)
- A special case of a polynomial kernel with ```degree=1``` and ```coef0=0``` (and ```gamma=1```).
- Defined as $k(x, y) = x^\top y$
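
A short sketch of the special-case relationship, reusing the small `X` and `Y` arrays from earlier. Note that `gamma=1` is passed explicitly here, since `polynomial_kernel` otherwise defaults to `1 / n_features`.

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel

X = np.array([[2., 3.], [3., 5.], [5., 8.]])
Y = np.array([[1., 0.], [2., 1.]])

K_lin = linear_kernel(X, Y)

# Polynomial kernel reduced to the linear case.
K_poly = polynomial_kernel(X, Y, degree=1, gamma=1, coef0=0)

print(np.allclose(K_lin, X @ Y.T), np.allclose(K_lin, K_poly))
```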

## [Polynomial Kernels](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.polynomial_kernel.html#sklearn.metrics.pairwise.polynomial_kernel)
- Computes the degree-```d``` polynomial kernel between two vectors. It measures similarity not only within the same dimension but also across dimensions, which accounts for feature interactions.
- Defined as $k(x, y) = (\gamma x^\top y +c_0)^d$ where ```d``` is the kernel "degree". Described as "homogeneous" when $c_0 = 0$.
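
To make the "across dimensions" point concrete, here is a minimal sketch for the homogeneous degree-2 case in two dimensions. The helper `phi` is a hypothetical explicit feature map written for this illustration (it is not a scikit-learn function); its cross term `x1 * x2` is the feature interaction that the kernel computes implicitly.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

X = np.array([[2., 3.], [3., 5.], [5., 8.]])
Y = np.array([[1., 0.], [2., 1.]])

gamma, coef0, degree = 1.0, 0.0, 2   # homogeneous case: c_0 = 0

K = polynomial_kernel(X, Y, degree=degree, gamma=gamma, coef0=coef0)

# The formula applied directly.
K_manual = (gamma * X @ Y.T + coef0) ** degree

def phi(A):
    """Hypothetical explicit feature map for the homogeneous degree-2 kernel in 2-D."""
    x1, x2 = A[:, 0], A[:, 1]
    return np.column_stack([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

# A plain dot product in the expanded feature space gives the same kernel values.
K_features = phi(X) @ phi(Y).T

print(np.allclose(K, K_manual), np.allclose(K, K_features))
```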

## [Sigmoid Kernels](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.sigmoid_kernel.html#sklearn.metrics.pairwise.sigmoid_kernel)
- Also known as the *hyperbolic tangent* or *Multilayer Perceptron* kernel, because the same function is commonly used as a neuron activation function.
- Defined as $k(x, y) = \tanh( \gamma x^\top y + c_0)$ where $\gamma$ is the slope and $c_0$ is the intercept.
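
A minimal sketch comparing `sigmoid_kernel` with the formula above. The values `gamma=0.1` and `coef0=0.0` are arbitrary choices for this illustration, picked so that `tanh` does not saturate on the small example arrays.

```python
import numpy as np
from sklearn.metrics.pairwise import sigmoid_kernel

X = np.array([[2., 3.], [3., 5.], [5., 8.]])
Y = np.array([[1., 0.], [2., 1.]])

gamma, coef0 = 0.1, 0.0   # slope and intercept (illustrative values)

K = sigmoid_kernel(X, Y, gamma=gamma, coef0=coef0)

# The formula applied directly.
K_manual = np.tanh(gamma * X @ Y.T + coef0)

print(np.allclose(K, K_manual))
```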

## [Radial Basis Function (RBF) Kernels](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.rbf_kernel.html#sklearn.metrics.pairwise.rbf_kernel)
- Returns the RBF kernel between two vectors.
- Defined as $k(x, y) = \exp( -\gamma \| x-y \|^2)$
- If $\gamma = \sigma^{-2}$, this is known as a *Gaussian kernel* of variance $\sigma^2$.
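
The following sketch checks `rbf_kernel` against the formula, setting $\gamma = \sigma^{-2}$ as in the last bullet. The value `sigma = 2.0` is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, euclidean_distances

X = np.array([[2., 3.], [3., 5.], [5., 8.]])
Y = np.array([[1., 0.], [2., 1.]])

sigma = 2.0            # illustrative value
gamma = sigma ** -2    # gamma = sigma^{-2}

K = rbf_kernel(X, Y, gamma=gamma)

# The formula applied directly: exp(-gamma * squared Euclidean distance).
K_manual = np.exp(-gamma * euclidean_distances(X, Y, squared=True))

# Each sample has similarity 1 with itself.
K_self = rbf_kernel(X, gamma=gamma)

print(np.allclose(K, K_manual), np.allclose(np.diag(K_self), 1.0))
```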

## [Laplace Kernels](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.laplacian_kernel.html#sklearn.metrics.pairwise.laplacian_kernel)
- A variant of the RBF kernel.
- Defined as $k(x, y) = \exp( -\gamma \| x-y \|_1)$ where $\|x-y\|_1$ is the Manhattan distance between input vectors.
- Often used with noiseless data.
- See [Machine learning for quantum mechanics in a nutshell (Wiley)](https://onlinelibrary.wiley.com/doi/full/10.1002/qua.24954)
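
A minimal sketch relating `laplacian_kernel` to the Manhattan distances computed by `pairwise_distances`. The value `gamma = 0.5` is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.metrics.pairwise import laplacian_kernel

X = np.array([[2., 3.], [3., 5.], [5., 8.]])
Y = np.array([[1., 0.], [2., 1.]])

gamma = 0.5   # illustrative value

K = laplacian_kernel(X, Y, gamma=gamma)

# The formula applied directly: exp(-gamma * Manhattan distance).
K_manual = np.exp(-gamma * pairwise_distances(X, Y, metric='manhattan'))

print(np.allclose(K, K_manual))
```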

## [Chi-Square Kernels](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.chi2_kernel.html#sklearn.metrics.pairwise.chi2_kernel)
- Most commonly used on histograms (bags) of visual words.
- Defined as $k(x, y) = \exp \left (-\gamma \sum_i \frac{(x[i] - y[i]) ^ 2}{x[i] + y[i]} \right )$
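
The formula can be checked by hand with NumPy broadcasting, as in the sketch below. It assumes non-negative inputs (as with histograms) and skips terms where $x[i] + y[i] = 0$ to avoid dividing by zero, which matches the result `chi2_kernel` returns on this data. The names `diff2`, `total` and `K_manual` are illustrative.

```python
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel

A = np.array([[0., 1.], [1., 0.], [.2, .8], [.7, .3]])   # non-negative "histograms"
gamma = 0.5

K = chi2_kernel(A, gamma=gamma)

# Pairwise numerators and denominators of the chi-square sum.
diff2 = (A[:, None, :] - A[None, :, :]) ** 2
total = A[:, None, :] + A[None, :, :]

# Divide only where the denominator is non-zero; other terms contribute 0.
terms = np.divide(diff2, total, out=np.zeros_like(diff2), where=total != 0)
K_manual = np.exp(-gamma * terms.sum(axis=2))

print(np.allclose(K, K_manual))
```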

In [4]:
# Common use case: a precomputed chi-square kernel for training non-linear SVMs (computer vision)
from sklearn.svm import SVC
from sklearn.metrics.pairwise import chi2_kernel

X = [[0, 1], [1, 0], [.2, .8], [.7, .3]]
y = [ 0,      1,      0,        1]

K   = chi2_kernel(X, gamma=.5)
svm = SVC(kernel='precomputed').fit(K, y)
svm.predict(K)

array([0, 1, 0, 1])
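
The cell above predicts on the training data itself. With `kernel='precomputed'`, predicting on new samples requires the kernel between the new samples and the training samples, with shape `(n_new, n_train)`. A minimal sketch, where `X_new` is hypothetical data invented for this illustration:

```python
from sklearn.svm import SVC
from sklearn.metrics.pairwise import chi2_kernel

X_train = [[0, 1], [1, 0], [.2, .8], [.7, .3]]
y_train = [0, 1, 0, 1]
X_new = [[.1, .9], [.9, .1]]   # hypothetical unseen histograms

# Train on the (n_train, n_train) kernel matrix.
K_train = chi2_kernel(X_train, gamma=.5)
svm = SVC(kernel='precomputed').fit(K_train, y_train)

# Predict with the (n_new, n_train) kernel matrix, using the same gamma.
K_new = chi2_kernel(X_new, X_train, gamma=.5)
pred = svm.predict(K_new)

print(K_new.shape, pred)
```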