# Distance Formula


**Representing points**

We will use lists to represent points, so a 2D point would be `[2,3]` and a 5D point would be `[1,2,3,4,5]`. Two points must have the same number of dimensions before we can find the distance between them.
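The distance functions below loop over `range(len(pt1))`, so a shorter `pt2` raises an `IndexError`, and a longer `pt2` has its extra dimensions silently ignored. A minimal sketch of checking the same-dimension requirement up front, assuming we prefer to fail early with a clear message. The helper name `check_same_dimensions` is hypothetical and not part of the original notebook.

```python
def check_same_dimensions(pt1, pt2):
    """Hypothetical helper: raise ValueError if two list-based points differ in dimension."""
    if len(pt1) != len(pt2):
        raise ValueError(
            f"points have different dimensions: {len(pt1)} vs {len(pt2)}"
        )

point_2d = [2, 3]
point_5d = [1, 2, 3, 4, 5]

check_same_dimensions(point_2d, [4, 0])  # passes silently: both points are 2D
try:
    check_same_dimensions(point_2d, point_5d)
except ValueError as err:
    print(err)  # points have different dimensions: 2 vs 5
```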


There are a number of ways to determine the distance between two points:

**Euclidean Distance**

The most commonly used distance formula:
 - First, calculate the squared difference in each dimension
 - Then, add up all of these squared differences and take the square root
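Written as a formula for two $n$-dimensional points $p$ and $q$, the two steps above are:

$$
d_{\text{Euclidean}}(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
$$

For example, for $[1, 2]$ and $[4, 0]$: $\sqrt{(1-4)^2 + (2-0)^2} = \sqrt{9 + 4} = \sqrt{13} \approx 3.606$.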

```python
# calculate the Euclidean distance between two points
def euclidean_distance(pt1, pt2):
    distance = 0
    for i in range(len(pt1)):
        distance += (pt1[i] - pt2[i]) ** 2
    return distance ** 0.5  # square root

print(euclidean_distance([1, 2], [4, 0]))
print(euclidean_distance([5, 4, 3], [1, 7, 9]))
```

```
3.605551275463989
7.810249675906654
```
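As a cross-check, Python's standard library computes the same quantity with `math.dist`, which requires both points to have the same number of dimensions. This sketch assumes Python 3.8 or later, the version that added `math.dist`.

```python
import math

# math.dist returns the Euclidean distance between two points of equal dimension
print(math.dist([1, 2], [4, 0]))
print(math.dist([5, 4, 3], [1, 7, 9]))
```

The results should agree with `euclidean_distance` up to floating-point rounding.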


**Manhattan Distance**

It is called Manhattan distance because it resembles walking city blocks: you can only travel along one street (one axis) at a time, never diagonally through a building. It is always greater than or equal to the Euclidean distance.
 - Sum the absolute values of the differences in each dimension
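As a formula, for two $n$-dimensional points $p$ and $q$:

$$
d_{\text{Manhattan}}(p, q) = \sum_{i=1}^{n} \lvert p_i - q_i \rvert
$$

For $[1, 2]$ and $[4, 0]$ this gives $\lvert 1-4 \rvert + \lvert 2-0 \rvert = 3 + 2 = 5$, compared with a Euclidean distance of $\sqrt{13} \approx 3.606$.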


```python
def manhattan_distance(pt1, pt2):
    distance = 0
    for i in range(len(pt1)):
        distance += abs(pt1[i] - pt2[i])
    return distance

print(manhattan_distance([1, 2], [4, 0]))
print(manhattan_distance([5, 4, 3], [1, 7, 9]))
```

```
5
13
```
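The claim that Manhattan distance is never smaller than Euclidean distance can be spot-checked numerically. This is a sketch, not a proof: it redefines both functions compactly with `zip` so the block runs on its own, and tests them on random integer points. The number of samples, the value ranges, and the seed are arbitrary choices.

```python
import random

def euclidean_distance(pt1, pt2):
    # square root of the sum of squared per-dimension differences
    return sum((a - b) ** 2 for a, b in zip(pt1, pt2)) ** 0.5

def manhattan_distance(pt1, pt2):
    # sum of absolute per-dimension differences
    return sum(abs(a - b) for a, b in zip(pt1, pt2))

random.seed(0)  # fixed seed so the run is reproducible
violations = 0
for _ in range(1000):
    dims = random.randint(1, 6)
    p = [random.randint(-10, 10) for _ in range(dims)]
    q = [random.randint(-10, 10) for _ in range(dims)]
    if manhattan_distance(p, q) < euclidean_distance(p, q):
        violations += 1

print(violations)  # 0 if the inequality held on every sample
```

The two distances are equal only when the points differ in at most one dimension; otherwise Manhattan distance is strictly larger.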


**Hamming Distance**

Hamming distance only cares about whether the values in each dimension are exactly equal. To find the Hamming distance between two points, add one for every dimension where their values differ.

It is used in spell-checking algorithms. For example, the Hamming distance between the word "there" and the typo "thete" is one: each letter is a dimension, and every dimension has the same value except one.

```python
def hamming_distance(pt1, pt2):
    distance = 0
    for i in range(len(pt1)):
        if pt1[i] != pt2[i]:
            distance += 1
    return distance

print(hamming_distance([1, 2], [4, 0]))
print(hamming_distance([5, 4, 3], [1, 7, 9]))
```

```
2
3
```
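Because the function only indexes and compares elements, the same idea works on strings, where each character is a dimension. A sketch of the spell-checking example, assuming all words have the same length (Hamming distance is only defined for equal-length sequences). This version adds an equal-length check and uses `zip`, but otherwise counts the same way as the loop above; the candidate word list is made up for illustration.

```python
def hamming_distance(pt1, pt2):
    # count the positions (dimensions) where the two sequences differ
    if len(pt1) != len(pt2):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(1 for a, b in zip(pt1, pt2) if a != b)

print(hamming_distance("there", "thete"))  # 1: only the 4th letter differs

# pick the closest word to a typo from a small, made-up candidate list
candidates = ["there", "three", "where"]
typo = "thete"
best = min(candidates, key=lambda word: hamming_distance(word, typo))
print(best)  # there
```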


The `scipy.spatial.distance` module of the Python SciPy library has an implementation of each of these methods:
 - Euclidean distance: `distance.euclidean()`
 - Manhattan distance: `distance.cityblock()`
 - Hamming distance: `distance.hamming()`

Note:

SciPy's implementation of Hamming distance always returns a number between 0 and 1. Rather than returning the number of dimensions that differ, it divides that count by the total number of dimensions. For example, the Hamming distance between `[1, 2, 3]` and `[7, 2, -10]` is 2; SciPy's version returns 2/3.
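The normalization described in the note is the mismatch count divided by the number of dimensions. A pure-Python sketch that reproduces the 2/3 from the example; the name `normalized_hamming_distance` is hypothetical, and this is not SciPy's own code.

```python
def normalized_hamming_distance(pt1, pt2):
    # fraction of dimensions whose values differ, always between 0 and 1
    # (assumes non-empty points with the same number of dimensions)
    mismatches = sum(1 for a, b in zip(pt1, pt2) if a != b)
    return mismatches / len(pt1)

print(normalized_hamming_distance([1, 2, 3], [7, 2, -10]))  # 2 of 3 dimensions differ
# multiplying by the number of dimensions recovers the raw count
print(round(normalized_hamming_distance([1, 2, 3], [7, 2, -10]) * 3))  # 2
```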

```python
from scipy.spatial import distance

print(distance.euclidean([1, 2], [4, 0]))
print(distance.cityblock([1, 2], [4, 0]))
print(distance.hamming([5, 4, 9], [1, 7, 9]))
```

```
3.605551275463989
5
0.6666666666666666
```
