# Distance function basics

* A metric measures the distance between two items
* A norm measures the size of a single item
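
The two ideas are connected: applying a norm to the difference of two vectors gives a distance between them. A minimal sketch, where `vecNorm`, `p` and `q` are hypothetical names used only for illustration:

```r
# Hypothetical helper: Euclidean (L2) norm of a single vector
vecNorm <- function(v) sqrt(sum(v^2))

p <- c(1, 2)
q <- c(5, 9)

print(vecNorm(p))      # size of p on its own
print(vecNorm(q - p))  # distance between p and q: the norm of their difference
```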

## Euclidean distance

The most basic distance function in a vector space, based on the Pythagorean theorem. For two points in the plane:

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

In [4]:
A <- c(1,2)
B <- c(5,9)
dims <- c("x", "y")

m <- rbind(A, B)
colnames(m) <- dims
print(m)

d <- sqrt((5-1)^2 + (9-2)^2)
print(d)

  x y
A 1 2
B 5 9
[1] 8.062258


When applied to 3-dimensional space, the distance can be calculated as such:

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}$$

Thus, the generalized formula for Euclidean distance in n-dimensional space can be defined as follows:

$$ d(a, b) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2} $$

When implemented in R, the code should look something like this:

In [5]:
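# Euclidean distance between two numeric vectors of equal length
myEucledian <- function(A, B) {
  total <- 0  # running sum of squared differences (avoids shadowing base::sum)
  for (i in seq_along(A)) {
    total <- total + (A[i] - B[i])^2
  }
  return(sqrt(total))
}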

> Note that all examples in this file, such as the one above, use explicit `for` loops to mirror the formulas. Loops like this are slow and unidiomatic in R, where vectorized operations are preferred (and the author should feel bad).
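
As a sketch of the vectorized style the note refers to, the same distance can be written without a loop, because arithmetic on R vectors is element-wise. The names `euclideanVec`, `p` and `q` are hypothetical and not part of the original notes:

```r
# Hypothetical vectorized counterpart of the loop-based function above
euclideanVec <- function(a, b) sqrt(sum((a - b)^2))

p <- c(1, 2)
q <- c(5, 9)

print(euclideanVec(p, q))
```

This assumes both vectors have the same length; R would otherwise recycle the shorter one, which is rarely what is wanted here.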

In [6]:
d <- myEucledian(A, B)
print(d)

[1] 8.062258


In [16]:
A <- c(2,7,4)
B <- c(3,4,5)
dims <- c("x", "y", "z")

m <- rbind(A, B)
colnames(m) <- dims
print(m)

d <- myEucledian(A, B)
print(d)

  x y z
A 2 7 4
B 3 4 5
[1] 3.316625


## Manhattan distance

Sometimes the most direct path from point A to point B is not a straight line. Think of a taxicab that has to drive around buildings: the distance travelled is the sum of the absolute differences along each axis.

$$ d(a, b) = \sum_{i=1}^{n} \lvert a_i - b_i \rvert $$

In [17]:
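# Manhattan distance between two numeric vectors of equal length
myManhattan <- function(A, B) {
  total <- 0  # running sum of absolute differences (avoids shadowing base::sum)
  for (i in seq_along(A)) {
    total <- total + abs(A[i] - B[i])
  }
  return(total)
}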

In [18]:
d <- myManhattan(A, B)
print(d)

[1] 5
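
For comparison, here is a loop-free sketch of the same calculation, cross-checked against base R's `dist()`, which computes distances between the rows of a matrix. The names `manhattanVec`, `p` and `q` are hypothetical:

```r
# Hypothetical vectorized Manhattan distance
manhattanVec <- function(a, b) sum(abs(a - b))

p <- c(2, 7, 4)
q <- c(3, 4, 5)

print(manhattanVec(p, q))

# Cross-check: dist() from the stats package, applied to a two-row matrix
print(as.numeric(dist(rbind(p, q), method = "manhattan")))
```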


## Chebyshev distance

Also known as "chessboard distance": the distance between two points is the largest absolute difference along any single coordinate. On a chessboard, it is the minimum number of moves a king needs to get from one square to another.

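Formally, it is the limit of the order-$k$ generalization of the previous two distances as $k$ grows:

$$ d(a, b) = \lim_{k \to \infty} \left( \sum_{i=1}^{n} \lvert a_i - b_i \rvert^k \right)^{1/k} $$

which reduces to the largest coordinate-wise difference:

$$ d(a, b) = \max_i \lvert a_i - b_i \rvert $$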
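
The limit can be checked numerically. The sketch below uses a hypothetical helper `minkowski` for the order-k sum; with k = 1 it gives the Manhattan distance, with k = 2 the Euclidean distance, and for larger k the value shrinks towards the largest coordinate difference:

```r
# Hypothetical helper: order-k distance (the expression inside the limit)
minkowski <- function(a, b, k) sum(abs(a - b)^k)^(1 / k)

p <- c(2, 7, 4)
q <- c(3, 4, 5)

# Distances for increasing k, followed by the largest coordinate difference
print(sapply(c(1, 2, 5, 20, 50), function(k) minkowski(p, q, k)))
print(max(abs(p - q)))
```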

Given two 3-dimensional vectors, the distance can be calculated as such:

$$ d = \max(\lvert x_2 - x_1 \rvert, \lvert y_2 - y_1 \rvert, \lvert z_2 - z_1 \rvert) $$

In [36]:
print(m)
d <- max( abs(3 - 2), abs(4 - 7), abs(5 - 4) )
print(d)

  x y z
A 2 7 4
B 3 4 5
[1] 3


In [37]:
# implement your own R function here
myCheb <- function(A, B) {
    dist <- 0
    return(dist)
}

d <- myCheb(A, B)
print(d)

[1] 0
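
One possible way to fill in the exercise, offered as a sketch rather than the intended answer: take the element-wise absolute differences and return the largest. The name `chebyshevVec` is hypothetical, chosen so it does not overwrite `myCheb`:

```r
# Hypothetical solution sketch: largest absolute coordinate difference
chebyshevVec <- function(a, b) max(abs(a - b))

p <- c(2, 7, 4)
q <- c(3, 4, 5)

print(chebyshevVec(p, q))
```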


## Canberra distance

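Canberra distance is a weighted version of Manhattan distance: each coordinate's absolute difference is divided by the sum of the absolute values of the two coordinates. It is often used for comparing ranked lists.

$$ d(a, b) = \sum_{i=1}^{n} \frac{\lvert a_i - b_i \rvert}{\lvert a_i \rvert + \lvert b_i \rvert} $$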
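
A minimal sketch of the formula in R follows. The name `canberraVec` is hypothetical, and the treatment of coordinates where both values are zero (the term is counted as 0 rather than 0/0) is an assumption; other implementations may handle that case differently.

```r
# Hypothetical helper: Canberra distance between two numeric vectors
canberraVec <- function(a, b) {
  num <- abs(a - b)
  den <- abs(a) + abs(b)
  # Assumption: a coordinate where both values are 0 contributes 0
  terms <- ifelse(den == 0, 0, num / den)
  sum(terms)
}

p <- c(2, 7, 4)
q <- c(3, 4, 5)

print(canberraVec(p, q))  # 1/5 + 3/11 + 1/9
```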