Various distance metrics are used in both supervised learning (e.g. KNN) and unsupervised learning (e.g. K-means).
Below are the most commonly used distance metrics.
### 1. Minkowski distance
- Minkowski distance is a generalized distance metric: its order parameter p selects which specific distance is computed.
![image-10.png](attachment:image-10.png)

Some common values of p are:
- p = 1, Manhattan Distance
- p = 2, Euclidean Distance
- p = infinity, Chebyshev Distance
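
As a minimal sketch of how these special cases fall out of one definition, the pure-Python function below computes the Minkowski distance for any order p. The name `minkowski_distance` and the sample points are illustrative assumptions, not a library API; p = infinity is treated as the limiting case, which is the largest coordinate difference.

```python
import math


def minkowski_distance(u, v, p):
    """Minkowski distance of order p between two equal-length sequences."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same length")
    diffs = [abs(a - b) for a, b in zip(u, v)]
    if math.isinf(p):
        # Limiting case p -> infinity: the largest coordinate difference.
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1 / p)


a, b = [3, 5], [4, 2]  # illustrative points
print("p = 1   (Manhattan):", minkowski_distance(a, b, 1))
print("p = 2   (Euclidean):", minkowski_distance(a, b, 2))
print("p = inf (Chebyshev):", minkowski_distance(a, b, math.inf))
```

Only the exponent changes between the three calls; the coordinate differences are the same in each case.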

### 2. Manhattan Distance (city block distance)
- Manhattan distance is often preferred over Euclidean distance for high-dimensional data.
- We can get the equation for Manhattan distance by substituting p = 1 in the Minkowski distance formula. The formula is:
![image-3.png](attachment:image-3.png)
![image-6.png](attachment:image-6.png)

### 3. Euclidean Distance
- Euclidean distance is the straight-line distance between two data points.
- Euclidean distance is not scale-invariant, which means that the computed distances can be skewed by the units of the features.
- Typically, one needs to normalize the data before using Euclidean distance.
- It works best for low-dimensional data, since it is more sensitive to the curse of dimensionality than Manhattan distance.
- We can get the equation for Euclidean distance by substituting p = 2 in the Minkowski distance formula. The formula is:
![image-8.png](attachment:image-8.png)
![image-7.png](attachment:image-7.png)
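
The effect of feature units can be sketched with a small, made-up example. The data points (height in metres, weight in kilograms) and the helper names `euclidean` and `min_max_scale` are assumptions for illustration only. Min-max scaling is used here as one possible normalization; other schemes such as standardization would serve the same purpose.

```python
import math


def euclidean(u, v):
    """Straight-line distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def min_max_scale(points):
    """Rescale every feature (column) to the range [0, 1].

    Assumes each feature takes at least two distinct values.
    """
    columns = list(zip(*points))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((x - lo) / (hi - lo) for x, lo, hi in zip(p, lows, highs))
        for p in points
    ]


# Hypothetical data: (height in metres, weight in kilograms)
query = (1.80, 70.0)
p1 = (1.50, 71.0)  # very different height, similar weight
p2 = (1.79, 75.0)  # similar height, somewhat different weight

# Raw units: weight has the larger numeric range, so it dominates.
raw_d1, raw_d2 = euclidean(query, p1), euclidean(query, p2)
print("raw:    nearest is", "p1" if raw_d1 < raw_d2 else "p2")

# After scaling, both features contribute on the same footing.
s_query, s_p1, s_p2 = min_max_scale([query, p1, p2])
scaled_d1, scaled_d2 = euclidean(s_query, s_p1), euclidean(s_query, s_p2)
print("scaled: nearest is", "p1" if scaled_d1 < scaled_d2 else "p2")
```

With these particular numbers the nearest neighbour changes once the features are scaled, which is why normalization matters for distance-based methods such as KNN.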

### 4. Hamming Distance
- Hamming distance is a metric for comparing two binary data strings: it counts the positions at which the two strings differ.
- It can only be used for binary strings of equal length.
- Suppose there are two strings, 11011001 and 10011101. Their bitwise XOR is 11011001 ⊕ 10011101 = 01000100. Since this result contains two 1s, the Hamming distance is d(11011001, 10011101) = 2.
![image-11.png](attachment:image-11.png)
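
The worked example above can be checked with a short sketch. The function name `hamming_distance` is an illustrative assumption; the second half repeats the calculation with a bitwise XOR, counting the 1 bits in the result.

```python
def hamming_distance(s, t):
    """Number of positions at which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("strings must have the same length")
    return sum(c1 != c2 for c1, c2 in zip(s, t))


a, b = "11011001", "10011101"
print(hamming_distance(a, b))

# Same result via XOR: a bit is 1 exactly where the two strings differ.
xor = int(a, 2) ^ int(b, 2)
print(format(xor, "08b"), "contains", bin(xor).count("1"), "ones")
```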

### 5. Cosine Distance & Cosine Similarity
- Cosine distance and cosine similarity are mainly used to measure how similar two data points are.
- As the cosine distance between the data points increases, the cosine similarity decreases, and vice versa.
- Cosine similarity is often used to counteract Euclidean distance's problems with high-dimensional data. It is simply the cosine of the angle between two vectors.
- We often use cosine similarity when we have high-dimensional data and the magnitude of the vectors is not important.
- Cosine similarity = cos θ
- Cosine distance = 1 - cos θ
- The example below shows how cosine distance varies with similarity.
![image-9.png](attachment:image-9.png)
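
As a rough sketch of the two definitions, the functions below compute cos θ from the dot product and the vector norms. The names `cosine_similarity` and `cosine_distance` are illustrative assumptions, and the last pair of vectors is made up to show that rescaling a vector does not change the result.

```python
import math


def cosine_similarity(u, v):
    """cos(theta) between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """1 - cos(theta)."""
    return 1 - cosine_similarity(u, v)


print(cosine_distance([0, 6], [6, 0]))  # perpendicular vectors
print(cosine_distance([2, 0], [6, 0]))  # same direction
print(cosine_distance([3, 5], [4, 2]))  # somewhere in between

# Magnitude does not matter: [10, 20] is [1, 2] scaled by 10.
print(cosine_distance([1, 2], [10, 20]))
```

Perpendicular vectors give the largest distance in this set and vectors pointing the same way give a distance of zero (up to floating-point rounding), regardless of their lengths.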

### 6. Chebyshev Distance
- Chebyshev distance is defined as the greatest difference between two vectors along any coordinate dimension. In other words, it is simply the maximum distance along a single axis.
![image-13.png](attachment:image-13.png)
![image-12.png](attachment:image-12.png)
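
A minimal pure-Python sketch of this definition follows. The function name `chebyshev_distance` and the sample points are illustrative assumptions; SciPy also provides `scipy.spatial.distance.chebyshev` for the same calculation.

```python
def chebyshev_distance(u, v):
    """Largest absolute coordinate difference between two sequences."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same length")
    return max(abs(a - b) for a, b in zip(u, v))


# Illustrative points
print(chebyshev_distance([0, 6], [6, 0]))
print(chebyshev_distance([2, 0], [6, 0]))
print(chebyshev_distance([3, 5], [4, 2]))
```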


### 1. Minkowski Distance

In [8]:
# Values taken from the examples in section 5 (Cosine Distance & Cosine Similarity); p=2 gives the Euclidean distance
from scipy.spatial import distance
print("figure-1 ==> ",distance.minkowski([0,6],[6,0],p=2))
print("figure-2 ==> ",distance.minkowski([2,0],[6,0],p=2))
print("figure-3 ==> ",distance.minkowski([3,5],[4,2],p=2))

figure-1 ==>  8.48528137423857
figure-2 ==>  4.0
figure-3 ==>  3.1622776601683795


### 2. Manhattan Distance

In [9]:
# Values taken from the examples in section 5 (Cosine Distance & Cosine Similarity)
from scipy.spatial import distance
print("figure-1 ==> ",distance.cityblock([0,6],[6,0]))
print("figure-2 ==> ",distance.cityblock([2,0],[6,0]))
print("figure-3 ==> ",distance.cityblock([3,5],[4,2]))

figure-1 ==>  12
figure-2 ==>  4
figure-3 ==>  4


### 3. Euclidean Distance

In [11]:
# Values taken from the examples in section 5 (Cosine Distance & Cosine Similarity)
from scipy.spatial import distance
print("figure-1 ==> ",distance.euclidean([0,6],[6,0]))
print("figure-2 ==> ",distance.euclidean([2,0],[6,0]))
print("figure-3 ==> ",distance.euclidean([3,5],[4,2]))

figure-1 ==>  8.48528137423857
figure-2 ==>  4.0
figure-3 ==>  3.1622776601683795


### 4. Hamming Distance

In [14]:
# Note: scipy's hamming returns the fraction of positions that differ, not the count
from scipy.spatial import distance
print(distance.hamming([1,1],[1,1]))
print(distance.hamming([1,1],[1,0]))
print(distance.hamming([1,1],[0,1]))

0.0
0.5
0.5


### 5. Cosine Distance & Cosine Similarity

In [15]:
# Values taken from the examples in section 5 (Cosine Distance & Cosine Similarity)
from scipy.spatial import distance
print("figure-1 ==> ",distance.cosine([0,6],[6,0]))
print("figure-2 ==> ",distance.cosine([2,0],[6,0]))
print("figure-3 ==> ",distance.cosine([3,5],[4,2]))

figure-1 ==>  1.0
figure-2 ==>  0
figure-3 ==>  0.15633851226789253


### Summary of important distance metrics
![image.png](attachment:image.png)

##### References:
- https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa
- https://medium.com/@kunal_gohrani/different-types-of-distance-metrics-used-in-machine-learning-e9928c5e26c7
- https://pub.towardsai.net/5-most-commonly-used-distance-metrics-in-machine-learning-97c27527b011