# Calculating Distances

### Representing Points

There are three different ways to define the distance between two points:

1. Euclidean Distance
2. Manhattan Distance
3. Hamming Distance

Note: we can only find the distance between two points if they have the same number of dimensions.

In [1]:
two_d = [10, 2]
four_d = [12, 14, 22, 8]
five_d = [30, -1, 50, 0, 2]
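
The same-dimension requirement can be checked before any distance is computed. The following is a minimal sketch; `check_same_dimensions` is a hypothetical helper name introduced here for illustration and is not used by the functions later in this notebook.

```python
def check_same_dimensions(pt1, pt2):
    """Raise ValueError if the two points differ in number of dimensions."""
    if len(pt1) != len(pt2):
        raise ValueError(f"dimension mismatch: {len(pt1)} vs {len(pt2)}")

two_d = [10, 2]
four_d = [12, 14, 22, 8]

check_same_dimensions(two_d, [3, 7])  # same length, so no error is raised

try:
    check_same_dimensions(two_d, four_d)
except ValueError as err:
    print(err)  # dimension mismatch: 2 vs 4
```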

#### Euclidean Distance
To find the Euclidean distance between two points, we first calculate the squared difference in each dimension and add those squared differences together. We then take the square root of this sum.

In [2]:
def euclidean_distance(pt1, pt2):
    distance = 0
    for i in range(len(pt1)):
        distance += (pt1[i] - pt2[i]) ** 2
    return distance ** 0.5

print(euclidean_distance([1,2], [4,0]))
print(euclidean_distance([5,4,3],[1,7,9]))

3.605551275463989
7.810249675906654
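
As a cross-check, the same calculation can be written with `zip` and compared against the standard library. This is a sketch, assuming Python 3.8 or later for `math.dist`; the name `euclidean_zip` is hypothetical.

```python
import math

def euclidean_zip(pt1, pt2):
    # Sum the squared per-dimension differences, then take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pt1, pt2)))

# Worked by hand for [1, 2] and [4, 0]:
# (1 - 4)**2 + (2 - 0)**2 = 9 + 4 = 13, and sqrt(13) is about 3.6056
print(euclidean_zip([1, 2], [4, 0]))
print(math.dist([1, 2], [4, 0]))  # standard-library equivalent (Python 3.8+)
```

Note that `zip` stops at the shorter input, so points of unequal length should be rejected before calling a function written this way.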


#### Manhattan Distance

Rather than summing the squared differences in each dimension, we sum the absolute values of those differences.

It’s similar to how you might navigate when walking city blocks. You cannot cut diagonally through buildings to get from point A to point B; you need to walk along the sidewalks.

In [3]:
def manhattan_distance(pt1, pt2):
    distance = 0
    for i in range(len(pt1)):
        distance += abs(pt1[i] - pt2[i])
    return distance

print(manhattan_distance([1, 2], [4, 0]))
print(manhattan_distance([5, 4, 3], [1, 7, 9]))

5
13
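
The city-block intuition can be checked numerically: walking along the grid is never shorter than the straight line. A small sketch, with `manhattan_zip` as a hypothetical name for a compact version of the function above:

```python
def manhattan_zip(pt1, pt2):
    # Sum the absolute per-dimension differences.
    return sum(abs(a - b) for a, b in zip(pt1, pt2))

# Worked by hand for [1, 2] and [4, 0]: |1 - 4| + |2 - 0| = 3 + 2 = 5
blocks = manhattan_zip([1, 2], [4, 0])

# Straight-line (Euclidean) distance between the same two points.
straight_line = ((1 - 4) ** 2 + (2 - 0) ** 2) ** 0.5

print(blocks)
print(blocks >= straight_line)
```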


#### Hamming Distance

Instead of finding the difference of each dimension, Hamming distance only cares about whether the dimensions are exactly equal. When finding the Hamming distance between two points, add one for every dimension that has different values.

Hamming distance is used in spell checking algorithms. For example, the Hamming distance between the word “there” and the typo “thete” is one. Each letter is a dimension, and each dimension has the same value except for one.

In [4]:
def hamming_distance(pt1, pt2):
    distance = 0
    for i in range(len(pt1)):
        if pt1[i] != pt2[i]:
            distance += 1
    return distance

print(hamming_distance([1,2], [1,100]))
print(hamming_distance([5,4,9],[1,7,9]))

1
2
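
The spell-checking example can be reproduced by treating each character position as one dimension. This is a sketch only; `hamming_distance_str` is a hypothetical helper, and it assumes both words have the same length.

```python
def hamming_distance_str(word1, word2):
    # Each character position is one dimension; count the positions that differ.
    if len(word1) != len(word2):
        raise ValueError("words must have the same length")
    return sum(c1 != c2 for c1, c2 in zip(word1, word2))

print(hamming_distance_str("there", "thete"))  # 1 (only the fourth letter differs)
```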


### SciPy
SciPy's implementations differ from the functions above in two ways.

First, the SciPy implementation of Manhattan distance is called cityblock(). Remember, computing Manhattan distance is like asking how many blocks away you are from a point.

Second, the SciPy implementation of Hamming distance always returns a number between 0 and 1. Rather than returning the number of dimensions that differ, it divides that count by the total number of dimensions. For example, in the implementation above, the Hamming distance between [1, 2, 3] and [7, 2, -10] is 2. In SciPy's version, it is 2/3.

In [6]:
from scipy.spatial import distance

In [7]:
print(distance.euclidean([1,2],[4,0]))
print(distance.cityblock([1,2],[4,0]))
print(distance.hamming([1,2],[4,0]))

3.605551275463989
5
1.0
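
The relationship between the two Hamming conventions can be sketched in plain Python. `normalized_hamming` is a hypothetical helper that mirrors the normalization described in this section; it is not SciPy's own code.

```python
def normalized_hamming(pt1, pt2):
    # Count the dimensions that differ, then divide by the number of dimensions.
    mismatches = sum(a != b for a, b in zip(pt1, pt2))
    return mismatches / len(pt1)

fraction = normalized_hamming([1, 2, 3], [7, 2, -10])
print(fraction)  # two of the three dimensions differ, so about 0.667

# Multiplying by the number of dimensions recovers the raw count.
print(round(fraction * 3))  # 2
```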
