<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Distance-Metrics" data-toc-modified-id="Distance-Metrics-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Distance Metrics</a></span><ul class="toc-item"><li><span><a href="#Manhattan-Distance" data-toc-modified-id="Manhattan-Distance-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Manhattan Distance</a></span><ul class="toc-item"><li><span><a href="#Formula" data-toc-modified-id="Formula-1.1.1"><span class="toc-item-num">1.1.1&nbsp;&nbsp;</span>Formula</a></span></li><li><span><a href="#Code-distance" data-toc-modified-id="Code-distance-1.1.2"><span class="toc-item-num">1.1.2&nbsp;&nbsp;</span>Code distance</a></span></li></ul></li><li><span><a href="#Euclidean-Distance-(Pythagorean-Distance)" data-toc-modified-id="Euclidean-Distance-(Pythagorean-Distance)-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Euclidean Distance (Pythagorean Distance)</a></span><ul class="toc-item"><li><span><a href="#Formula" data-toc-modified-id="Formula-1.2.1"><span class="toc-item-num">1.2.1&nbsp;&nbsp;</span>Formula</a></span></li><li><span><a href="#Code-distance" data-toc-modified-id="Code-distance-1.2.2"><span class="toc-item-num">1.2.2&nbsp;&nbsp;</span>Code distance</a></span></li></ul></li><li><span><a href="#Minkowski-Distance" data-toc-modified-id="Minkowski-Distance-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Minkowski Distance</a></span><ul class="toc-item"><li><span><a href="#Formula" data-toc-modified-id="Formula-1.3.1"><span class="toc-item-num">1.3.1&nbsp;&nbsp;</span>Formula</a></span></li><li><span><a href="#Code-distance" data-toc-modified-id="Code-distance-1.3.2"><span class="toc-item-num">1.3.2&nbsp;&nbsp;</span>Code distance</a></span><ul class="toc-item"><li><span><a href="#Comparing-previous-points-with-previous-distance-metrics" data-toc-modified-id="Comparing-previous-points-with-previous-distance-metrics-1.3.2.1"><span class="toc-item-num">1.3.2.1&nbsp;&nbsp;</span>Comparing 
previous points with previous distance metrics</a></span></li><li><span><a href="#Comparing-with-distance-metrics-with-new-points" data-toc-modified-id="Comparing-with-distance-metrics-with-new-points-1.3.2.2"><span class="toc-item-num">1.3.2.2&nbsp;&nbsp;</span>Comparing with distance metrics with new points</a></span></li></ul></li></ul></li></ul></li><li><span><a href="#Note-on-Sklearn's-Metrics" data-toc-modified-id="Note-on-Sklearn's-Metrics-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Note on Sklearn's Metrics</a></span></li></ul></div>

In [None]:
import numpy as np

# Distance Metrics

A distance metric describes the "closeness" of data points, which serves as a proxy for similarity: the closer two points are, the more similar we treat them.

## Manhattan Distance

Imagine a city grid where you can only travel along the grid lines (no diagonal shortcuts).

> Does it matter what path we take along the grid?

![(From curriculum)](images/distance_manhattan.png)
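One way to explore the question above is to measure a few routes directly. The sketch below is illustrative only: the helper `path_length` and the example routes are hypothetical and not part of the curriculum. It suggests that any route that never backtracks has the same total length, while a detour is longer.

```python
import numpy as np

def path_length(points):
    """Total length of a route made of axis-aligned segments (hypothetical helper)."""
    steps = np.diff(np.asarray(points), axis=0)  # step vector between consecutive corners
    return int(np.abs(steps).sum())              # each step's length is |dx| + |dy|

start, end = (0, 0), (3, 2)

# Three routes that only ever move toward the destination
route_x_first = [start, (3, 0), end]
route_y_first = [start, (0, 2), end]
route_staircase = [start, (1, 0), (1, 1), (2, 1), (2, 2), end]

lengths = [path_length(r) for r in (route_x_first, route_y_first, route_staircase)]

# A route that overshoots and has to come back
detour = path_length([start, (4, 0), (4, 2), end])

print(lengths, detour)  # [5, 5, 5] 7
```

The shared length of 5 equals $|3 - 0| + |2 - 0|$, which is what the formula in the next section computes.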

### Formula

$$dist(A,B) = \sum_{k=1}^{N} |a_k - b_k| $$

### Code distance

We could use a for-loop, but vectorized NumPy operations are usually much faster.

In [None]:
a = np.array([2,-3,5])
b = np.array([1,1,3])

display(a)
display(b)

In [None]:
diffs = a - b
print('A - B')
display(diffs)

In [None]:
print('|A - B|')
abs_diff = np.abs(diffs)
display(abs_diff)

In [None]:
dist = np.sum(abs_diff)
print('sum(|A-B|)')
display(dist)
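For comparison, here is a minimal sketch of the same Manhattan distance computed both ways: once with an explicit loop and once as a single vectorized expression. The variable names `loop_dist` and `vec_dist` are introduced only for this illustration.

```python
import numpy as np

a = np.array([2, -3, 5])
b = np.array([1, 1, 3])

# Explicit loop: accumulate |a_k - b_k| one coordinate at a time
loop_dist = 0
for a_k, b_k in zip(a, b):
    loop_dist += abs(a_k - b_k)

# Vectorized: the same sum as one NumPy expression
vec_dist = np.sum(np.abs(a - b))

print(loop_dist, vec_dist)  # 7 7
```

Both versions give the same answer; the vectorized form pushes the loop into NumPy's compiled code, which tends to matter more as the arrays grow.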

## Euclidean Distance (Pythagorean Distance)

The straight-line distance between two points, best known from the Pythagorean theorem.

> _"As the crow flies"_

<img src='images/distance_euclidean.png' width = 50%/>

### Formula

$$dist(A,B) = \sqrt{ \sum_{k=1}^{N} (a_k - b_k)^2 } $$

### Code distance

In [None]:
a = np.array([2,3,5])
b = np.array([1,-1,3])

display(a)
display(b)

In [None]:
diffs = a - b
print('A - B')
display(diffs)

In [None]:
print('(A - B)^2')
sq_diffs = diffs * diffs
display(sq_diffs)

In [None]:
print('sum[(A - B)^2]')
sq_sum = np.sum(sq_diffs)
display(sq_sum)

In [None]:
dist = np.sqrt(sq_sum)
print('√sum[(A - B)^2]')
display(dist)
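The steps above can also be collapsed into one line. As a sketch, the block below checks that the step-by-step result matches `np.linalg.norm`, which by default returns the Euclidean (L2) norm of a vector. The names `step_by_step` and `norm_dist` are introduced here for illustration.

```python
import numpy as np

a = np.array([2, 3, 5])
b = np.array([1, -1, 3])

# Same computation as the cells above, in a single expression
step_by_step = np.sqrt(np.sum((a - b) ** 2))

# NumPy's built-in vector norm applied to the difference vector
norm_dist = np.linalg.norm(a - b)

print(step_by_step, norm_dist)
print(np.isclose(step_by_step, norm_dist))
```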

## Minkowski Distance

A family of distances defined on a normed vector space, parameterized by an order $c$.

The Manhattan and Euclidean distances above are special cases of the Minkowski distance.

### Formula

$$dist(A,B) = \left( \sum_{k=1}^{N} |a_k - b_k|^c \right)^{\frac{1}{c}} $$
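
To make the special cases explicit, substituting particular values of $c$ into the formula recovers the earlier metrics. The limiting case $c \rightarrow \infty$ is not covered elsewhere in this notebook; it is included as a standard result, usually called the Chebyshev distance.

$$c = 1: \quad dist(A,B) = \sum_{k=1}^{N} |a_k - b_k| \quad \text{(Manhattan)}$$

$$c = 2: \quad dist(A,B) = \sqrt{ \sum_{k=1}^{N} (a_k - b_k)^2 } \quad \text{(Euclidean)}$$

$$c \rightarrow \infty: \quad dist(A,B) = \max_{k} |a_k - b_k| \quad \text{(Chebyshev)}$$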

### Code distance

#### Comparing previous points with previous distance metrics

In [None]:
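def minkowski(A, B, c=2):
    """Minkowski distance of order c between arrays A and B."""
    abs_diffs = np.abs(A - B)            # |a_k - b_k|
    pow_diffs = np.power(abs_diffs, c)   # |a_k - b_k|^c
    sum_diff = np.sum(pow_diffs)         # sum over all coordinates
    dist = np.power(sum_diff, 1 / c)     # c-th root of the sum
    return dist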

In [None]:
a = np.array([2,3,5])
b = np.array([1,-1,3])

display(a)
display(b)

In [None]:
# Manhattan Distance
minkowski(a,b,1)

In [None]:
# Euclidean Distance
minkowski(a,b,2)

In [None]:
# Higher Order Distance
minkowski(a,b,5)

#### Comparing with distance metrics with new points

In [None]:
a = np.array([2,3,10])
b = np.array([1,-1,3])

display(a)
display(b)

In [None]:
# Manhattan Distance
minkowski(a,b,1)

In [None]:
# Euclidean Distance
minkowski(a,b,2)

In [None]:
# Higher Order Distance
minkowski(a,b,5)
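
As the order grows, the largest single coordinate difference increasingly dominates the result. The sketch below illustrates this on the same points. It uses a hypothetical helper, `minkowski_float`, which is a float-casting variant of the function above; this is an assumption made here so that large powers do not overflow integer arrays.

```python
import numpy as np

def minkowski_float(A, B, c):
    """Minkowski distance of order c, computed in floating point (hypothetical helper)."""
    abs_diffs = np.abs(np.asarray(A, dtype=float) - np.asarray(B, dtype=float))
    return np.sum(abs_diffs ** c) ** (1 / c)

a = np.array([2, 3, 10])
b = np.array([1, -1, 3])

orders = [1, 2, 5, 20, 50]
dists = [minkowski_float(a, b, c) for c in orders]

# Largest absolute coordinate difference (the Chebyshev distance)
chebyshev = np.max(np.abs(a - b))

for c, d in zip(orders, dists):
    print(f"c={c:>2}: {d:.4f}")
print("max |a_k - b_k|:", chebyshev)
```

The distances shrink from the Manhattan value toward `chebyshev` as `c` increases, which is one reason the choice of order affects how much a single outlying feature influences "closeness".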

# Note on Sklearn's Metrics

> We can use other metrics or define our own distance metrics

https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.DistanceMetric.html
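
As a rough sketch of what this looks like in practice, the block below builds a few metrics with `DistanceMetric.get_metric` and evaluates them on the points used earlier. The import fallback assumes that newer scikit-learn releases expose the class under `sklearn.metrics`. The custom metric `weighted_manhattan` and its weights are hypothetical, chosen only to show the `"pyfunc"` option for user-defined distances.

```python
import numpy as np

try:
    from sklearn.metrics import DistanceMetric    # newer scikit-learn releases
except ImportError:
    from sklearn.neighbors import DistanceMetric  # location given in the link above

# Each row is one point; DistanceMetric works on 2D arrays
X = np.array([[2.0, 3.0, 10.0],
              [1.0, -1.0, 3.0]])

manhattan = DistanceMetric.get_metric("manhattan")
euclidean = DistanceMetric.get_metric("euclidean")
minkowski_5 = DistanceMetric.get_metric("minkowski", p=5)

def weighted_manhattan(u, v):
    """Hypothetical custom metric: the last coordinate counts double."""
    weights = np.array([1.0, 1.0, 2.0])
    return np.sum(weights * np.abs(u - v))

custom = DistanceMetric.get_metric("pyfunc", func=weighted_manhattan)

results = {}
for name, metric in [("manhattan", manhattan), ("euclidean", euclidean),
                     ("minkowski (p=5)", minkowski_5), ("custom", custom)]:
    # pairwise(X) returns the full distance matrix; entry [0, 1] is row 0 vs row 1
    results[name] = metric.pairwise(X)[0, 1]
    print(name, results[name])
```

Note that scikit-learn calls the Minkowski order `p`, where this notebook uses `c`.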