<a href="https://colab.research.google.com/github/deltorobarba/machinelearning/blob/master/geometry.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Differential (Information) Geometry**

In [0]:
import tensorflow as tf
import seaborn as sns
import matplotlib.pyplot as plt 
import numpy as np
import pandas as pd
print(tf.__version__)

2.2.0


# **Distance & Divergence**

**Conditions**

1. d(x, y) ≥ 0     (non-negativity)
2. d(x, y) = 0   if and only if   x = y     (identity of indiscernibles. Note that condition 1 and 2 together produce positive definiteness)
3. d(x, y) = d(y, x)     (symmetry)
4. d(x, z) ≤ d(x, y) + d(y, z)     (subadditivity / triangle inequality).

**Distances**

For continuous data:

* Euclidean Distance
* Manhattan Distance
* Canberra Distance
* Bray Curtis Distance
* Cosine Distance
* Correlation Distance

**Divergences**

* is a (contrast) function which establishes the "distance" of one probability distribution to the other on a statistical manifold. 
* divergence is a weaker notion than that of the distance, in particular the divergence need not be symmetric (that is, in general the divergence from p to q is not equal to the divergence from q to p), and need not satisfy the triangle inequality.
* The two most important divergences are the relative entropy (Kullback–Leibler divergence, KL divergence) and the squared Euclidean distance.
* Minimizing these two divergences is the main way that linear inverse problem are solved, via the principle of maximum entropy and least squares, notably in logistic regression and linear regression.
* The two most important classes of divergences are the f-divergences and Bregman divergences; however, other types of divergence functions are also encountered in the literature. The only divergence that is both an f-divergence and a Bregman divergence is the Kullback–Leibler divergence; the squared Euclidean divergence is a Bregman divergence (corresponding to the function x<sup>2</sup>), but not an f-divergence.

## **Find the similarity between two probability distributions**

Using Jensen Shannon Divergence to build a tool to find the distance between probability distributions using Python.

I was on a mission to find a good measure of difference between two probability distributions. After doing a lot of research online, taking feedback from my colleagues, and validating various methods, I found one that does a really good job.

My problem statement could be solved by calculating the statistical distance between the two probability distributions. To do this, I found out that Jensen Shannon Distance can be used.

Jensen-Shannon Divergence (JSD)is a metric derived from another measure of statistical distance called the Kullback-Leiber Divergence(KLD). The reason why I couldn’t use the KLD is that it’s an asymmetrical function. Since there might have been a lot of distance calculations required, it posed a risk.

JSD, on the other hand, is a symmetrical function and the square root of JSD gives the Jensen-Shannon Distance. A measure that we can use to find the similarity between the two probability distributions. 0 indicates that the two distributions are the same, and 1 would indicate that they are nowhere similar.

Where P & Q are the two probability distribution, M = (P+Q)/2, and D(P ||M) is the KLD between P and M. Similarly D(Q||M) is the KLD between Q and M.
Implementation in Python
Now that we know the formula, it’s time to implement it. First of all, we need to calculate M and also, the KLD between P&M and Q&M.
Scipy is a phenomenal Python Library for scientific computing and it has lots of statistical measures in-built. It turns out that the entropy measure in scipy is implemented using the KLD. Just what we want.

(I found it to be quite simple to implement it with python and I got really good results when I tested it with a few distributions.)


In [0]:
 # Create test data
p = np.random.rayleigh(3,3)
q = np.random.weibull(3,3)
p, q

(array([3.84848271, 4.18906706, 5.61567569]),
 array([1.036487  , 0.9192782 , 0.63485667]))

In [0]:
# https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.entropy.html
# Calculate the entropy of a distribution for given probability values
entropy([p, q], base=None)

array([0.51682914, 0.471327  , 0.32851553])

In [0]:
 m = (p + q) / 2

    # compute Jensen Shannon Divergence
divergence = (scipy.stats.entropy(p, m))

divergence

array([inf, inf, inf])

In [0]:
# Create function to compute distance
from scipy.stats import entropy
def jensen_shannon_distance(p, q):
    """
    method to compute the Jenson-Shannon Distance 
    between two probability distributions
    """

    # convert the vectors into numpy arrays in case that they aren't
    # p = np.array(p)
    # q = np.array(q)

    # calculate m
    m = (p + q) / 2

    # compute Jensen Shannon Divergence
    divergence = (scipy.stats.entropy(p, m) + scipy.stats.entropy(q, m)) / 2

    # compute the Jensen Shannon Distance
    distance = np.sqrt(divergence)

    return distance

In [0]:
print(jensen_shannon_distance(eins,zwei))

[inf inf inf]


https://en.wikipedia.org/wiki/Gromov%E2%80%93Hausdorff_convergence