<a href="https://colab.research.google.com/github/deltorobarba/machinelearning/blob/master/divergence.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Statistical Distance & Divergence**

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

#### **Metric vs Statistical Distance & Divergences**

* In statistics and information geometry, a divergence (or contrast function) is a function that establishes the **"distance" from one probability distribution to another** on a statistical manifold.

* The **divergence is a weaker notion than that of the distance**, in particular the divergence need not be symmetric (that is, in general the divergence from p to q is not equal to the divergence from q to p), and **need not satisfy the triangle inequality**.
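The asymmetry mentioned above can be seen numerically. The following is a minimal sketch, assuming two hand-picked discrete distributions `p` and `q` and an illustrative helper `kl_divergence` (neither is part of the original notebook); it evaluates the Kullback–Leibler divergence in both directions.

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL divergence D(p || q) in nats.

    Illustrative helper; assumes q_i > 0 wherever p_i > 0.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p_i = 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Hypothetical example distributions over three outcomes
p = np.array([0.5, 0.4, 0.1])
q = np.array([0.1, 0.3, 0.6])

d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)
print(f"D(p || q) = {d_pq:.4f}")
print(f"D(q || p) = {d_qp:.4f}")
```

Both values are non-negative, but they differ, so this divergence cannot be a metric.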

In statistics, probability theory, and information theory, **a statistical distance** quantifies the distance between two statistical objects, which can be

* two random variables,
* two probability distributions,
* two samples, or
* an individual sample point and a population or a wider sample of points.

A distance between populations can be interpreted as measuring the distance between two probability distributions, so such distances are essentially distances between probability measures.

* When a statistical distance compares random variables, the variables may be statistically dependent, so the distance is not directly a distance between probability measures.

* A measure of distance between random variables may therefore reflect the extent of dependence between them, rather than their individual values.

* **Statistical distance measures are mostly not metrics** and they need not be symmetric. **Some types of distance measures are referred to as (statistical) divergences**.

**Properties of Distances as Metrics** (if they fulfill all 4 criteria)

1. $d(x, y) \geq 0 \quad$ (non-negativity)
2. $d(x, y)=0$ if and only if $x=y$ (identity of indiscernibles; conditions 1 and 2 together give positive definiteness)
3. $d(x, y)=d(y, x)$ (symmetry)
4. $d(x, z) \leq d(x, y)+d(y, z)$ (subadditivity / triangle inequality).
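The four conditions can be tested empirically on a finite set of points. The sketch below is only a sanity check under stated assumptions: `check_metric_axioms` is a hypothetical helper written for this illustration, and passing on a random sample does not prove that a function is a metric (a failure, however, does disprove it).

```python
import itertools
import numpy as np

def check_metric_axioms(points, d, tol=1e-9):
    """Empirically test the four metric axioms on a finite list of points."""
    results = {"non_negativity": True, "identity": True,
               "symmetry": True, "triangle": True}
    for x, y in itertools.product(points, repeat=2):
        dxy = d(x, y)
        if dxy < -tol:
            results["non_negativity"] = False
        # d(x, y) should be zero exactly when x and y are the same point
        if (dxy <= tol) != np.array_equal(x, y):
            results["identity"] = False
        if abs(dxy - d(y, x)) > tol:
            results["symmetry"] = False
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:
            results["triangle"] = False
    return results

def euclidean(x, y):
    return float(np.linalg.norm(x - y))

rng = np.random.default_rng(0)
points = [rng.normal(size=3) for _ in range(8)]

print(check_metric_axioms(points, euclidean))
```

The Euclidean distance passes all four checks on this sample, as expected for a true metric.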

**Many statistical distances are not metrics**, because they lack one or more properties of proper metrics. For example, 

* [pseudometrics](https://en.m.wikipedia.org/wiki/Pseudometric_space) violate the identity of indiscernibles (2), so distinct objects can be at distance zero;

* [quasimetrics](https://en.m.wikipedia.org/wiki/Metric_(mathematics)#Quasimetrics) violate the symmetry property (3); and

* semimetrics violate the triangle inequality (4).

* Statistical distances that satisfy (1) and (2) are referred to as divergences.
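To make the three relaxations concrete, here is a small sketch with one toy function per case. All three functions (`first_coordinate_distance`, `one_way_distance`, `squared_euclidean`) are illustrative assumptions chosen for brevity, not functions from any library.

```python
import numpy as np

# Pseudometric example: only the first coordinate is compared,
# so two distinct points can be at distance zero.
def first_coordinate_distance(x, y):
    return abs(x[0] - y[0])

a = np.array([1.0, 0.0])
b = np.array([1.0, 5.0])
print("pseudometric, d(a, b):", first_coordinate_distance(a, b))

# Quasimetric example on the real line: moving "down" costs the gap,
# moving "up" costs a flat 1, so d(x, y) and d(y, x) can differ.
def one_way_distance(x, y):
    return x - y if x >= y else 1.0

print("quasimetric, d(3, 1) vs d(1, 3):",
      one_way_distance(3.0, 1.0), one_way_distance(1.0, 3.0))

# Semimetric example: squared Euclidean distance breaks the triangle inequality.
def squared_euclidean(x, y):
    return float(np.sum((np.asarray(x) - np.asarray(y)) ** 2))

direct = squared_euclidean(0.0, 2.0)
detour = squared_euclidean(0.0, 1.0) + squared_euclidean(1.0, 2.0)
print("semimetric, direct vs detour:", direct, detour)
```

For the points 0, 1, 2 on the real line, the direct squared distance exceeds the sum over the detour, which is exactly the failure of condition (4).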



https://en.m.wikipedia.org/wiki/Distance_(graph_theory)

https://en.m.wikipedia.org/wiki/Distance

* In statistics and information geometry, there are many kinds of statistical distances, notably divergences, especially Bregman divergences and f-divergences. These include and generalize many of the notions of "difference between two probability distributions", and allow them to be studied geometrically, as statistical manifolds. 

* The most elementary is the squared Euclidean distance, which forms the basis of least squares; this is the most basic Bregman divergence. The most important in information theory is the relative entropy (Kullback–Leibler divergence), which allows one to analogously study maximum likelihood estimation geometrically; this is the most basic f-divergence, and is also a Bregman divergence (and is the only divergence that is both). 

* Statistical manifolds corresponding to Bregman divergences are flat manifolds in the corresponding geometry, allowing an analog of the Pythagorean theorem (which is traditionally true for squared Euclidean distance) to be used for linear inverse problems in inference by optimization theory.

* Other important statistical distances include the Mahalanobis distance, the energy distance, and many others.
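As one example of the distances named above, the Mahalanobis distance between a point $x$ and a distribution with mean $\mu$ and covariance $\Sigma$ is $\sqrt{(x-\mu)^\top \Sigma^{-1} (x-\mu)}$. The sketch below is a NumPy-only illustration; the helper `mahalanobis_distance` and the synthetic sample are assumptions made for this example.

```python
import numpy as np

def mahalanobis_distance(x, mean, cov):
    """sqrt((x - mean)^T cov^{-1} (x - mean)) for a single point x."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    # Solve cov @ z = diff rather than forming the explicit inverse.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

# Synthetic sample: wide spread along axis 0, narrow spread along axis 1
rng = np.random.default_rng(0)
true_cov = np.array([[4.0, 0.0], [0.0, 0.25]])
sample = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=2000)

mean = sample.mean(axis=0)
cov = np.cov(sample, rowvar=False)

d_wide = mahalanobis_distance([2.0, 0.0], mean, cov)
d_narrow = mahalanobis_distance([0.0, 2.0], mean, cov)
print(f"along the wide axis:   {d_wide:.2f}")
print(f"along the narrow axis: {d_narrow:.2f}")
```

Both test points are equally far from the origin in Euclidean terms, but the one lying along the low-variance axis is much further away in Mahalanobis terms. With an identity covariance the two notions coincide.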

https://en.m.wikipedia.org/wiki/Information_geometry

https://en.m.wikipedia.org/wiki/Statistical_distance

https://en.m.wikipedia.org/wiki/Divergence_(statistics)

#### **List of Distance Types**

* 'braycurtis': hdbscan.dist_metrics.BrayCurtisDistance
* 'canberra': hdbscan.dist_metrics.CanberraDistance
* 'chebyshev': hdbscan.dist_metrics.ChebyshevDistance
* 'cityblock': hdbscan.dist_metrics.ManhattanDistance
* 'dice': hdbscan.dist_metrics.DiceDistance
* 'euclidean': hdbscan.dist_metrics.EuclideanDistance
* 'hamming': hdbscan.dist_metrics.HammingDistance
* 'haversine': hdbscan.dist_metrics.HaversineDistance
* 'infinity': hdbscan.dist_metrics.ChebyshevDistance
* 'jaccard': hdbscan.dist_metrics.JaccardDistance
* 'kulsinski': hdbscan.dist_metrics.KulsinskiDistance
* 'l1': hdbscan.dist_metrics.ManhattanDistance
* 'l2': hdbscan.dist_metrics.EuclideanDistance
* 'mahalanobis': hdbscan.dist_metrics.MahalanobisDistance
* 'manhattan': hdbscan.dist_metrics.ManhattanDistance
* 'matching': hdbscan.dist_metrics.MatchingDistance
* 'minkowski': hdbscan.dist_metrics.MinkowskiDistance
* 'p': hdbscan.dist_metrics.MinkowskiDistance
* 'pyfunc': hdbscan.dist_metrics.PyFuncDistance
* 'rogerstanimoto': hdbscan.dist_metrics.RogersTanimotoDistance
* 'russellrao': hdbscan.dist_metrics.RussellRaoDistance
* 'seuclidean': hdbscan.dist_metrics.SEuclideanDistance
* 'sokalmichener': hdbscan.dist_metrics.SokalMichenerDistance
* 'sokalsneath': hdbscan.dist_metrics.SokalSneathDistance
* 'wminkowski': hdbscan.dist_metrics.WMinkowskiDistance

https://hdbscan.readthedocs.io/en/latest/basic_hdbscan.html

https://reference.wolfram.com/language/guide/DistanceAndSimilarityMeasures.html
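Several of the metric names listed above are also accepted by SciPy. The following sketch uses `scipy.spatial.distance.pdist` rather than hdbscan itself (an assumption made so the example stays lightweight) to compare three of them on a tiny hand-made data set.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Three hypothetical points on a line through the origin
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0]])

matrices = {}
for metric in ["euclidean", "cityblock", "chebyshev"]:
    # pdist returns the condensed upper triangle; squareform expands it
    matrices[metric] = squareform(pdist(X, metric=metric))
    print(metric)
    print(matrices[metric])
```

For the first two points, the Euclidean distance is 5, the city-block (Manhattan, L1) distance is 7, and the Chebyshev (L-infinity) distance is 4, which shows how strongly the choice of metric can change the neighbourhood structure seen by a clustering algorithm.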

#### **f-Divergence**

https://en.m.wikipedia.org/wiki/F-divergence

The squared Hellinger distance is an f-divergence.

https://en.m.wikipedia.org/wiki/Hellinger_distance
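For discrete distributions, an f-divergence has the form $D_f(P \parallel Q)=\sum_i q_i\, f(p_i/q_i)$ with $f$ convex and $f(1)=0$. The sketch below assumes strictly positive probability vectors and uses an illustrative helper `f_divergence`; the generator functions follow one common convention, in which the squared Hellinger distance is $\tfrac{1}{2}\sum_i(\sqrt{p_i}-\sqrt{q_i})^2$.

```python
import numpy as np

def f_divergence(p, q, f):
    """Discrete f-divergence sum_i q_i * f(p_i / q_i).

    Illustrative helper; assumes all entries of p and q are strictly positive.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(q * f(p / q)))

# Generator functions (each is convex with f(1) = 0)
generators = {
    "kullback_leibler": lambda t: t * np.log(t),
    "squared_hellinger": lambda t: 0.5 * (np.sqrt(t) - 1.0) ** 2,
    "total_variation": lambda t: 0.5 * np.abs(t - 1.0),
}

p = np.array([0.5, 0.4, 0.1])
q = np.array([0.1, 0.3, 0.6])

values = {name: f_divergence(p, q, f) for name, f in generators.items()}
for name, value in values.items():
    print(f"{name}: {value:.4f}")
```

Swapping the generator is the only change needed to move between these divergences, which is what makes the f-divergence family a convenient common framework.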

#### **Bregman Divergence**

https://en.m.wikipedia.org/wiki/Bregman_divergence

The squared Euclidean distance is a Bregman divergence (corresponding to the function x<sup>2</sup>), but not an f-divergence.

https://en.m.wikipedia.org/wiki/Euclidean_distance#Squared_Euclidean_distance
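A Bregman divergence is generated by a strictly convex, differentiable function $F$ via $D_F(p, q)=F(p)-F(q)-\langle\nabla F(q),\, p-q\rangle$. The following minimal sketch, with the hypothetical helper `bregman_divergence`, checks that $F(x)=\lVert x\rVert^2$ reproduces the squared Euclidean distance.

```python
import numpy as np

def bregman_divergence(F, grad_F, p, q):
    """D_F(p, q) = F(p) - F(q) - <grad F(q), p - q>."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(F(p) - F(q) - np.dot(grad_F(q), p - q))

# Generator F(x) = ||x||^2 and its gradient 2x
def squared_norm(x):
    return float(np.dot(x, x))

def grad_squared_norm(x):
    return 2.0 * x

p = np.array([1.0, 2.0, 3.0])
q = np.array([0.0, 4.0, 1.0])

d_bregman = bregman_divergence(squared_norm, grad_squared_norm, p, q)
d_squared = float(np.sum((p - q) ** 2))
print(d_bregman, d_squared)
```

Geometrically, the divergence is the gap at `p` between $F$ and its tangent plane taken at `q`; for the squared norm that gap is exactly the squared distance.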

#### **Kullback–Leibler divergence**

The only divergence that is both an f-divergence and a Bregman divergence is the Kullback–Leibler divergence

https://en.m.wikipedia.org/wiki/Kullback–Leibler_divergence
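The dual role of the Kullback–Leibler divergence can be checked numerically. The sketch below, assuming two strictly positive example distributions that each sum to 1, computes it three ways: directly, as an f-divergence with $f(t)=t\log t$, and as a Bregman divergence generated by the negative entropy $F(x)=\sum_i x_i\log x_i$.

```python
import numpy as np

# Hypothetical example distributions (strictly positive, each sums to 1)
p = np.array([0.5, 0.4, 0.1])
q = np.array([0.1, 0.3, 0.6])

# 1. Direct definition: sum_i p_i * log(p_i / q_i)
kl_direct = float(np.sum(p * np.log(p / q)))

# 2. As an f-divergence with generator f(t) = t * log(t)
def f(t):
    return t * np.log(t)

kl_as_f = float(np.sum(q * f(p / q)))

# 3. As a Bregman divergence generated by the negative entropy
def neg_entropy(x):
    return float(np.sum(x * np.log(x)))

def grad_neg_entropy(x):
    return np.log(x) + 1.0

kl_as_bregman = (neg_entropy(p) - neg_entropy(q)
                 - float(np.dot(grad_neg_entropy(q), p - q)))

print(kl_direct, kl_as_f, kl_as_bregman)
```

The Bregman form equals $\sum_i p_i\log(p_i/q_i)-\sum_i p_i+\sum_i q_i$ in general, so the three values agree only because both vectors are normalized.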

#### **Jensen–Shannon divergence**

https://en.m.wikipedia.org/wiki/Jensen–Shannon_divergence
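The Jensen–Shannon divergence symmetrizes the Kullback–Leibler divergence by comparing both distributions with their mixture $m=\tfrac{1}{2}(p+q)$: $\mathrm{JSD}(p, q)=\tfrac{1}{2}D_{KL}(p \parallel m)+\tfrac{1}{2}D_{KL}(q \parallel m)$. The following is a minimal NumPy sketch with illustrative helper names, using natural logarithms.

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL divergence in nats; assumes q_i > 0 wherever p_i > 0."""
    mask = p > 0  # terms with p_i = 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js_divergence(p, q):
    """Jensen-Shannon divergence in nats."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)  # mixture; positive wherever p or q is positive
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

p = np.array([0.5, 0.4, 0.1])
q = np.array([0.1, 0.3, 0.6])

print("JSD(p, q):", js_divergence(p, q))
print("JSD(q, p):", js_divergence(q, p))
print("upper bound log(2):", np.log(2))
print("disjoint supports:", js_divergence([1.0, 0.0], [0.0, 1.0]))
```

Unlike the Kullback–Leibler divergence, this quantity is symmetric, always finite, and bounded above by $\log 2$ (reached when the supports are disjoint). SciPy's `scipy.spatial.distance.jensenshannon` returns the square root of this divergence, which is a metric.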