# Distance Metrics

Despite its virtues, correlation (and, by extension, the covariance matrix) suffers from some critical limitations as a measure of codependence.

How do we overcome these limitations?
- Information theory in general, and Shannon's entropy in particular, has useful applications in finance.
- Entropy quantifies the amount of uncertainty associated with a random variable.
- Information theory is also essential to ML, because the primary goal of many ML algorithms is to reduce the amount of uncertainty involved in the solution to a problem.
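
To make "amount of uncertainty" concrete, here is a minimal sketch of Shannon's entropy for a discrete distribution. The helper name `shannon_entropy` and the coin examples are illustrative assumptions, not part of the referenced book's code.

```python
import numpy as np

def shannon_entropy(probs, base=2.0):
    """Shannon entropy of a discrete distribution (hypothetical helper).

    With base=2 the result is expressed in bits.
    """
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]  # by convention, outcomes with zero probability contribute nothing
    return float(-np.sum(p * np.log(p)) / np.log(base))

fair_coin = shannon_entropy([0.5, 0.5])    # maximal uncertainty for two outcomes
biased_coin = shannon_entropy([0.9, 0.1])  # less uncertain than the fair coin
certain = shannon_entropy([1.0, 0.0])      # no uncertainty at all

print(fair_coin, biased_coin, certain)
```

The more concentrated the distribution, the lower its entropy; a degenerate distribution carries no uncertainty.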

The distance metrics explained here are useful for:
1. defining the objective function in decision tree learning
2. defining the loss function for classification problems
3. evaluating the distance between two random variables
4. comparing clusters
5. feature selection

For the math, we refer to the book *Machine Learning for Asset Managers* by Marcos López de Prado.

## Marginal, Joint, Conditional Entropies, and Mutual Information

In [None]:
import numpy as np,scipy.stats as ss
from sklearn.metrics import mutual_info_score

x = np.random.normal(0,1,1000)
y = np.random.normal(0,1,1000)
bins = 100
cXY = np.histogram2d(x,y,bins)[0]

# entropy
hX = ss.entropy(np.histogram(x,bins)[0]) # marginal x
hY = ss.entropy(np.histogram(y,bins)[0]) # marginal y

iXY = mutual_info_score(None,None,contingency=cXY) # mutual information
iXYn = iXY / min(hX,hY) # normalized mutual information

hXY = hX + hY - iXY # joint entropy

hX_Y = hXY - hY # conditional entropy H(X|Y)
hY_X = hXY - hX # conditional entropy H(Y|X)


In [None]:
# ss.entropy and mutual_info_score use natural logarithms by default, so the unit is nats
print('H(X): \t\t\t %.3f nats' % hX, "\t Marginal Entropy")
print('H(Y): \t\t\t %.3f nats' % hY, "\t Marginal Entropy")
print('H(X,Y): \t\t %.3f nats' % hXY, "\t Joint Entropy")
print('H(X|Y): \t\t %.3f nats' % hX_Y, "\t Conditional Entropy")
print('H(Y|X): \t\t %.3f nats' % hY_X, "\t Conditional Entropy")
print('I(X,Y): \t\t %.3f nats' % iXY, "\t Mutual Information")
print('I(X,Y)/min(H(X),H(Y)): \t %.3f' % iXYn, "\t\t Normalized Mutual Information")
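
A note on units: `scipy.stats.entropy` and `mutual_info_score` both use the natural logarithm by default, so the quantities computed above are in nats; dividing by ln 2 converts them to bits. The sketch below, which uses its own sample and hypothetical variable names, checks that conversion and also confirms that the joint entropy obtained from `H(X) + H(Y) - I(X,Y)` agrees with the entropy of the flattened joint histogram.

```python
import numpy as np
import scipy.stats as ss
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)  # illustrative sample, independent of the cells above
x_demo = rng.normal(size=1000)
y_demo = rng.normal(size=1000)
n_bins = 10

c_xy = np.histogram2d(x_demo, y_demo, n_bins)[0]
h_x = ss.entropy(np.histogram(x_demo, n_bins)[0])       # marginal entropy, nats
h_y = ss.entropy(np.histogram(y_demo, n_bins)[0])       # marginal entropy, nats
i_xy = mutual_info_score(None, None, contingency=c_xy)  # mutual information, nats

h_xy = h_x + h_y - i_xy                # joint entropy via the identity
h_xy_direct = ss.entropy(c_xy.ravel()) # joint entropy from the joint histogram

# Convert nats to bits, and compare with asking scipy for base 2 directly
h_x_bits = h_x / np.log(2.0)
h_x_bits_direct = ss.entropy(np.histogram(x_demo, n_bins)[0], base=2)

print('H(X,Y): %.6f vs %.6f nats' % (h_xy, h_xy_direct))
print('H(X): %.6f nats = %.6f bits' % (h_x, h_x_bits))
```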

## Mutual Information, Variation of Information, and Normalized Variation of Information

In [None]:
import numpy as np,scipy.stats as ss
from sklearn.metrics import mutual_info_score
#- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
def varInfo(x,y,bins,norm=False):
    # variation of information
    cXY=np.histogram2d(x,y,bins)[0]
    iXY=mutual_info_score(None,None,contingency=cXY)
    hX=ss.entropy(np.histogram(x,bins)[0]) # marginal
    hY=ss.entropy(np.histogram(y,bins)[0]) # marginal
    vXY=hX+hY-2*iXY # variation of information
    if norm:
        hXY=hX+hY-iXY # joint
        vXY/=hXY # normalized variation of information
    return vXY

In [None]:
print('VI(X,Y): \t %.3f nats' % varInfo(x,y,bins), "\t Variation of Information")
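
Variation of information behaves like a distance: it is zero between a variable and itself, it is symmetric, and its normalized form lies in [0, 1]. The sketch below checks these properties numerically on an illustrative sample. The function `vi_sketch` is a hypothetical stand-in that mirrors the fixed-bin `varInfo` above under a different name, so it does not overwrite anything in the notebook.

```python
import numpy as np
import scipy.stats as ss
from sklearn.metrics import mutual_info_score

def vi_sketch(a, b, n_bins, norm=False):
    """Fixed-bin variation of information (hypothetical mirror of varInfo)."""
    c_ab = np.histogram2d(a, b, n_bins)[0]
    i_ab = mutual_info_score(None, None, contingency=c_ab)
    h_a = ss.entropy(np.histogram(a, n_bins)[0])  # marginal entropy of a
    h_b = ss.entropy(np.histogram(b, n_bins)[0])  # marginal entropy of b
    v_ab = h_a + h_b - 2 * i_ab
    if norm:
        v_ab /= h_a + h_b - i_ab  # divide by the joint entropy
    return v_ab

rng = np.random.default_rng(0)
a = rng.normal(size=1000)
b = rng.normal(size=1000)

vi_same = vi_sketch(a, a, 10)            # distance of a variable to itself
vi_ab = vi_sketch(a, b, 10)
vi_ba = vi_sketch(b, a, 10)              # arguments swapped
nvi_ab = vi_sketch(a, b, 10, norm=True)  # normalized variant

print(vi_same, vi_ab, vi_ba, nvi_ab)
```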

## Variation of Information on Discretized Continuous Random Variables

This version of variation of information incorporates the optimal number of bins derived by the function `numBins`.

In [None]:
def numBins(nObs,corr=None):
    # Optimal number of bins for discretization
    if corr is None: # univariate case
        z=(8+324*nObs+12*(36*nObs+729*nObs**2)**.5)**(1/3.)
        b=round(z/6.+2./(3*z)+1./3)
    else: # bivariate case
        b=round(2**-.5*(1+(1+24*nObs/(1.-corr**2))**.5)**.5)
    return int(b)

def varInfo(x,y,norm=False):
    # variation of information
    bXY=numBins(x.shape[0],corr=np.corrcoef(x,y)[0,1])
    cXY=np.histogram2d(x,y,bXY)[0]
    iXY=mutual_info_score(None,None,contingency=cXY)
    hX=ss.entropy(np.histogram(x,bXY)[0]) # marginal
    hY=ss.entropy(np.histogram(y,bXY)[0]) # marginal
    vXY=hX+hY-2*iXY # variation of information
    if norm:
        hXY=hX+hY-iXY # joint
        vXY/=hXY # normalized variation of information
    return vXY

In [None]:
print('VI(X,Y): \t %.3f nats' % varInfo(x,y), "\t Modified Variation of Information")
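
To get a feel for the bin counts these formulas produce, the sketch below re-implements them under the hypothetical name `num_bins_sketch` (same arithmetic as `numBins`, restated so the block is self-contained) and evaluates them for a few sample sizes and correlations chosen purely for illustration.

```python
def num_bins_sketch(n_obs, corr=None):
    """Same formulas as numBins above, restated for a standalone illustration."""
    if corr is None:  # univariate case
        z = (8 + 324 * n_obs + 12 * (36 * n_obs + 729 * n_obs ** 2) ** 0.5) ** (1 / 3.0)
        b = round(z / 6.0 + 2.0 / (3 * z) + 1.0 / 3)
    else:  # bivariate case
        b = round(2 ** -0.5 * (1 + (1 + 24 * n_obs / (1.0 - corr ** 2)) ** 0.5) ** 0.5)
    return int(b)

uni_1000 = num_bins_sketch(1000)          # univariate, 1000 observations
biv_1000_rho0 = num_bins_sketch(1000, 0.0)  # bivariate, uncorrelated
biv_1000_rho9 = num_bins_sketch(1000, 0.9)  # bivariate, strongly correlated

print(uni_1000, biv_1000_rho0, biv_1000_rho9)
```

In the bivariate formula the bin count grows as the absolute correlation approaches 1, and the formula is undefined at exactly |corr| = 1 because of the division by `1 - corr**2`.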

## Correlation and Normalized Mutual Information of Two Independent Gaussian Random Variables

In [None]:
def mutualInfo(x,y,norm=False):
    # mutual information
    bXY=numBins(x.shape[0],corr=np.corrcoef(x,y)[0,1])
    cXY=np.histogram2d(x,y,bXY)[0]
    iXY=mutual_info_score(None,None,contingency=cXY)
    if norm:
        hX=ss.entropy(np.histogram(x,bXY)[0]) # marginal
        hY=ss.entropy(np.histogram(y,bXY)[0]) # marginal
        iXY/=min(hX,hY) # normalized mutual information
    return iXY

size,seed=5000,0
np.random.seed(seed)
x=np.random.normal(size=size)
e=np.random.normal(size=size)


In [None]:
import plotly.graph_objects as go

y=0*x+e
nmi=mutualInfo(x,y,True)
corr=np.corrcoef(x,y)[0,1]

fig=go.Figure()
fig.add_trace(go.Scatter(x=x,y=y,mode='markers',marker=dict(color='black',size=3)))
fig.update_layout(title='No Relationship (Corr: %.3f, NMI: %.3f)' % (corr,nmi),height=600,width=600)
fig.show()

Results: Correlation and normalized mutual information (NMI) are both close to 0, so we conclude that the two random variables are unrelated.

In [None]:
import plotly.graph_objects as go

y=100*x+e
nmi=mutualInfo(x,y,True)
corr=np.corrcoef(x,y)[0,1]

fig=go.Figure()
fig.add_trace(go.Scatter(x=x,y=y,mode='markers',marker=dict(color='black',size=3)))
fig.update_layout(title='Linear Relationship (Corr: %.3f, NMI: %.3f)' % (corr,nmi),height=600,width=600)
fig.show()

Results: Correlation is approximately 1, but NMI is only about 0.9, so NMI is more sensitive to the degree of uncertainty associated with `e`.

In [None]:
import plotly.graph_objects as go

y=100*abs(x)+e
nmi=mutualInfo(x,y,True)
corr=np.corrcoef(x,y)[0,1]

fig=go.Figure()
fig.add_trace(go.Scatter(x=x,y=y,mode='markers',marker=dict(color='black',size=3)))
fig.update_layout(title='Nonlinear Relationship (Corr: %.3f, NMI: %.3f)' % (corr,nmi),height=600,width=600)
fig.show()

Results: There is clearly a strong relationship between the two random variables, but the correlation is still near 0 and thus fails to recognize it. NMI is not 1, but it is significantly higher than 0, and thus recognizes that the two random variables share a substantial amount of information. In fact, it is not 1 because there are two alternative values of `x` associated with each value of `y`.