This task reviews three common distance metrics: **Euclidean**, **Manhattan**, and **Minkowski**.
The three are logically connected (the first two are special cases of the third), so we take them in exactly this order.

## Euclidean Distance

**Euclidean distance** is the straight-line distance between two points (vectors) in 2D or higher-dimensional space.

$d(p,q)=\sqrt{\sum _{i=1}^{n}(q_{i}-p_{i})^2}$

**Usage:** Measuring the direct distance between any two points. It is commonly used as the distance measure in *K-Means* clustering.

**Example:** Measure the distance between the two points $a(1, 2)$ and $b(-1, -3)$ that have no obstacles between them.
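
Worked by hand with the formula above, as a quick check on the computed results:

$d(a,b)=\sqrt{(-1-1)^2+(-3-2)^2}=\sqrt{4+25}=\sqrt{29}\approx 5.385$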

In [8]:
# Euclidean distance - vanilla calculation.
from math import sqrt

a = (1, 2)
b = (-1, -3)
# Square root of the sum of squared coordinate differences.
print(sqrt(sum((x - y) ** 2 for x, y in zip(a, b))))

5.385164807134504


In [6]:
# Euclidean distance - SciPy.
from scipy.spatial import distance

print(distance.euclidean(a, b))

5.385164807134504
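
When SciPy is not available, the standard library can do the same calculation. A minimal sketch, assuming Python 3.8 or newer (where `math.dist` was added); the variable name `d` is only illustrative:

```python
from math import dist, isclose, sqrt

a = (1, 2)
b = (-1, -3)

# math.dist returns the Euclidean distance between two points
# given as equal-length sequences of coordinates.
d = dist(a, b)
print(d)

# Cross-check against the hand-written formula.
assert isclose(d, sqrt(sum((x - y) ** 2 for x, y in zip(a, b))))
```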


## Manhattan (Cityblock) Distance

**Manhattan distance** is the distance between two points (vectors) measured as the sum of the absolute differences of their Cartesian coordinates. This is the $p=1$ distance metric *(where $p$ is the order parameter of the Minkowski formula)*.

Manhattan distance is also called *cityblock distance* or *taxicab distance*, because it reflects how a taxi in a city cannot drive from point to point in a straight line and must instead follow the street grid, turning at corners.

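$d(p,q)=\sum_{i=1}^{n}|q_{i}-p_{i}|$

In 2D, for the points $(x_1, y_1)$ and $(x_2, y_2)$, this reduces to $d = |x_1 - x_2| + |y_1 - y_2|$.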

**Usage:**
- Measuring the distance when the path consists of many right-angle turns. This often approximates travel distance on a real-life street map.
- Being cheaper to compute than the *Euclidean* distance (no squaring and no square root), it can be used for distance calculations in *K-Means*-style algorithms when the number of dimensions is high and the computation is costly. Here some *precision* is sacrificed for the sake of *performance*.

For this reason, the $p=1$ distance metric (Manhattan distance) is often preferred in high-dimensional applications.
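
How much the choice of metric matters can be sketched with the assignment step of a K-Means-style algorithm, in which every point is attached to its nearest centroid. The points, the centroids, and the helper names `manhattan` and `assign_to_nearest` are made up for illustration, and the sketch covers only the assignment step, not a full clustering implementation:

```python
from math import dist  # Euclidean distance, Python 3.8+


def manhattan(p, q):
    # Sum of absolute coordinate differences.
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def assign_to_nearest(points, centroids, metric):
    # For each point, return the index of the closest centroid under `metric`.
    return [
        min(range(len(centroids)), key=lambda k: metric(pt, centroids[k]))
        for pt in points
    ]


# Hypothetical data chosen so that the two metrics disagree on one point.
points = [(0, 0), (4, 4), (6, -1)]
centroids = [(3, 3), (5, 0)]

labels_l1 = assign_to_nearest(points, centroids, manhattan)
labels_l2 = assign_to_nearest(points, centroids, dist)
print(labels_l1)  # [1, 0, 1]
print(labels_l2)  # [0, 0, 1]
```

Only the first point changes cluster here, which suggests that swapping the metric can alter results as well as running time.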

**Example:** Measure the distance between the two points $a(1, 2)$ and $b(-1, -3)$.
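
Worked by hand, the sum of absolute coordinate differences is:

$d(a,b)=|1-(-1)|+|2-(-3)|=2+5=7$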

In [10]:
# Manhattan distance - vanilla calculation.
a = (1, 2)
b = (-1, -3)
print(sum(abs(x - y) for x, y in zip(a, b)))

7


In [11]:
# Manhattan distance - SciPy.
print(distance.cityblock(a, b))

7


## Minkowski Distance

**Minkowski distance** is a distance/similarity measure between two points (vectors) in a normed vector space (N-dimensional real space). It generalizes the **Euclidean** and **Manhattan** distances by adding a parameter, called the *order* or $p$, that selects which distance measure is calculated.

$d=(\sum_{i=1}^n|x_i-y_i|^p)^\frac{1}{p}$

**Usage:** Used as a generalized distance metric. When $p = 1$, it is the **Manhattan** distance. When $p = 2$, it is the **Euclidean** distance. In the limit $p \to \infty$, it becomes the **Chebyshev** distance (the largest absolute coordinate difference).
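
These special cases can be checked numerically. The following is a minimal sketch in plain Python; the function names `minkowski` and `chebyshev` are illustrative and are not the SciPy functions, and a large finite $p$ stands in for the limit:

```python
def minkowski(x, y, p):
    # p-th root of the sum of |x_i - y_i|^p.
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1 / p)


def chebyshev(x, y):
    # Largest absolute coordinate difference.
    return max(abs(xi - yi) for xi, yi in zip(x, y))


a = (1, 2)
b = (-1, -3)

# As p grows, the result moves from the Manhattan value (p = 1),
# through the Euclidean value (p = 2), toward the Chebyshev value.
for p in (1, 2, 3, 10, 50):
    print(p, minkowski(a, b, p))
print("chebyshev", chebyshev(a, b))
```

For these two points, the values fall from 7 at $p = 1$ toward 5, the Chebyshev distance. With float coordinates and very large $p$, the intermediate powers can overflow, so the `max` form is the safer choice for the limiting case.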

It is common to use **Minkowski** distance when implementing a machine learning algorithm that relies on distance measures, because it gives control over the type of distance used for real-valued vectors via a hyperparameter $p$ that can be tuned.
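
Tuning $p$ might look like the sketch below: a 1-nearest-neighbour classifier is scored by leave-one-out accuracy for a few candidate values of $p$. The four labelled points are invented for illustration, and the names `minkowski` and `loo_accuracy` are hypothetical helpers, not part of any library:

```python
def minkowski(x, y, p):
    # p-th root of the sum of |x_i - y_i|^p.
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1 / p)


# Toy labelled data, invented for this sketch.
data = [((0, 0), "A"), ((2, 2), "A"), ((3, 0), "B"), ((5, 0), "B")]


def loo_accuracy(data, p):
    # Leave-one-out accuracy of a 1-nearest-neighbour classifier.
    hits = 0
    for i, (point, label) in enumerate(data):
        others = data[:i] + data[i + 1:]
        # Predict the label of the closest remaining point.
        _, predicted = min(others, key=lambda item: minkowski(point, item[0], p))
        hits += predicted == label
    return hits / len(data)


scores = {p: loo_accuracy(data, p) for p in (1, 2, 3)}
best_p = max(scores, key=scores.get)
print(scores)  # {1: 0.5, 2: 0.75, 3: 0.75}
print(best_p)  # 2
```

On real data the candidate values of $p$ would normally be compared with proper cross-validation; the toy set only shows that the score can depend on $p$.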

**Example:** Measure the distance between the two points $a(1, 2)$ and $b(-1, -3)$.
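
Worked by hand for $p = 3$, the order used in this example:

$d(a,b)=\left(|1-(-1)|^3+|2-(-3)|^3\right)^{\frac{1}{3}}=(8+125)^{\frac{1}{3}}=133^{\frac{1}{3}}\approx 5.104$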

In [14]:
# Minkowski distance - vanilla calculation.
def nth_root(value, n):
    # n-th root of a non-negative value, rounded to 3 decimal places.
    return round(value ** (1 / n), 3)

def minkowski_distance(x, y, p):
    # Sum |x_i - y_i|^p over all coordinates, then take the p-th root.
    return nth_root(sum(abs(xi - yi) ** p for xi, yi in zip(x, y)), p)

a = (1, 2)
b = (-1, -3)
print(minkowski_distance(a, b, 3))

5.104


In [15]:
# Minkowski distance - SciPy.
print(distance.minkowski(a, b, 3))

5.104468722001463
