# Euclidean Distance

![image-2.png](attachment:image-2.png)

![image.png](attachment:image.png)

![image.png](attachment:image.png)

In [5]:
# importing the library
from scipy.spatial import distance

# defining the points
point_1 = (1, 2, 3)
point_2 = (4, 5, 6)
print(point_1, point_2)
euclidean_distance = distance.euclidean(point_1,point_2)
print(euclidean_distance)

(1, 2, 3) (4, 5, 6)
5.196152422706632
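
The same value can be reproduced without SciPy, which makes the formula explicit. The following is a minimal sketch using only the standard library; the names `squared_differences` and `manual_euclidean` are introduced here for illustration and are not part of the original notebook.

```python
import math

point_1 = (1, 2, 3)
point_2 = (4, 5, 6)

# Square each coordinate difference, add them up, then take the square root
squared_differences = [(p - q) ** 2 for p, q in zip(point_1, point_2)]
manual_euclidean = math.sqrt(sum(squared_differences))
print(manual_euclidean)
```

Each coordinate differs by 3, so the result is the square root of 27, which agrees with the SciPy output above.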


# Manhattan Distance
![image.png](attachment:image.png)

Formula for Manhattan Distance
Since the above representation is 2 dimensional, to calculate Manhattan Distance, we will take the sum of absolute distances in both the x and y directions. So, the Manhattan distance in a 2-dimensional space is given as:

![image.png](attachment:image.png)

![image.png](attachment:image.png)

Here, n is the number of dimensions, and pi and qi are the i-th coordinates of the two data points p and q.

Manhattan Distance is also known as city block distance, which is why the SciPy function used here is named cityblock.

In [2]:
# computing the manhattan distance
manhattan_distance = distance.cityblock(point_1, point_2)
print('Manhattan Distance b/w', point_1, 'and', point_2, 'is: ', manhattan_distance)

Manhattan Distance b/w (1, 2, 3) and (4, 5, 6) is:  9
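
As a cross-check, the sum of absolute differences can be written out by hand. This is a small sketch in plain Python; `manual_manhattan` is a name chosen here for illustration.

```python
point_1 = (1, 2, 3)
point_2 = (4, 5, 6)

# Sum the absolute coordinate differences: |1-4| + |2-5| + |3-6|
manual_manhattan = sum(abs(p - q) for p, q in zip(point_1, point_2))
print(manual_manhattan)  # 9
```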


# Minkowski Distance

![image.png](attachment:image.png)

In [3]:
# computing the minkowski distance
minkowski_distance = distance.minkowski(point_1, point_2, p=3)
print('Minkowski Distance b/w', point_1, 'and', point_2, 'is: ', minkowski_distance)

Minkowski Distance b/w (1, 2, 3) and (4, 5, 6) is:  4.3267487109222245


The p parameter of SciPy's Minkowski Distance function is the order of the norm. When the order (p) is 1, the formula above reduces to Manhattan Distance, and when the order is 2, it reduces to Euclidean Distance.

In [4]:
# minkowski and manhattan distance
minkowski_distance_order_1 = distance.minkowski(point_1, point_2, p=1)
print('Minkowski Distance of order 1:',minkowski_distance_order_1, '\nManhattan Distance: ',manhattan_distance)

Minkowski Distance of order 1: 9.0 
Manhattan Distance:  9


Here, you can see that when the order is 1, both Minkowski and Manhattan Distance are the same. Let’s verify the Euclidean Distance as well:

In [6]:
# minkowski and euclidean distance
minkowski_distance_order_2 = distance.minkowski(point_1, point_2, p=2)
print('Minkowski Distance of order 2:',minkowski_distance_order_2, '\nEuclidean Distance: ',euclidean_distance)

Minkowski Distance of order 2: 5.196152422706632 
Euclidean Distance:  5.196152422706632


When the order is 2, we can see that Minkowski and Euclidean distances are the same.
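
To see why a single formula covers both cases, it can help to write the Minkowski Distance out directly. The helper below, `minkowski_sketch`, is a hypothetical name used only for this illustration, and it assumes both inputs are numeric sequences of the same length.

```python
def minkowski_sketch(u, v, p):
    """Minkowski distance of order p between two equal-length sequences."""
    # Raise each absolute difference to the power p, sum, then take the p-th root
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1 / p)

point_1 = (1, 2, 3)
point_2 = (4, 5, 6)

# Order 1 matches Manhattan Distance, order 2 matches Euclidean Distance
for order in (1, 2, 3):
    print(order, minkowski_sketch(point_1, point_2, order))
```

With p=1 the powers and the root disappear, leaving the sum of absolute differences; with p=2 the expression becomes the square root of the sum of squares.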

So far, we have covered the distance metrics that are used when we are dealing with continuous or numerical variables. But what if we have categorical variables? How can we decide the similarity between categorical variables? This is where we can make use of another distance metric called Hamming Distance.

# Hamming Distance

Hamming Distance measures the similarity between two strings of the same length. The Hamming Distance between two strings of the same length is the number of positions at which the corresponding characters are different.

Let’s understand the concept using an example. Let’s say we have two strings:

“euclidean” and “manhattan”

Since the length of these strings is equal, we can calculate the Hamming Distance. We will go character by character and compare the strings. The first characters of the two strings (e and m, respectively) are different. Similarly, the second characters (u and a) are different, and so on.

Look carefully: seven characters are different, whereas the last two characters (a and n) are the same:
![image.png](attachment:image.png)

Hence, the Hamming Distance here will be 7. Note that the larger the Hamming Distance between two strings, the more dissimilar those strings will be (and vice versa).

In [7]:
# defining two strings
string_1 = 'euclidean'
string_2 = 'manhattan'

# computing the hamming distance
hamming_distance = distance.hamming(list(string_1), list(string_2))*len(string_1)
print('Hamming Distance b/w', string_1, 'and', string_2, 'is: ', hamming_distance)

Hamming Distance b/w euclidean and manhattan is:  7.0
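
SciPy's hamming function returns the proportion of positions that differ, which is why the cell above multiplies the result by the string length. A plain-Python sketch of the same count is shown below; `mismatches` is a name chosen here for illustration, and the snippet assumes the two strings have equal length.

```python
string_1 = 'euclidean'
string_2 = 'manhattan'

# Count the positions where the characters differ (assumes equal lengths)
mismatches = sum(c1 != c2 for c1, c2 in zip(string_1, string_2))
print(mismatches)                  # 7
print(mismatches / len(string_1))  # fraction of positions that differ
```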


In [8]:
# strings of different lengths
new_string_1 = 'data'
new_string_2 = 'science'
len(new_string_1), len(new_string_2)

(4, 7)

Computing the Hamming Distance between these two strings, as in the cell below, throws an error saying that the lengths of the arrays must be the same. Hence, Hamming Distance only works when we have strings or arrays of the same length. These are some of the most commonly used similarity measures, or distance metrics, in machine learning.

In [9]:
# computing the hamming distance
hamming_distance = distance.hamming(list(new_string_1), list(new_string_2))

ValueError: The 1d arrays must have equal lengths.
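
A hand-written version needs its own length check, because zip silently stops at the end of the shorter input instead of raising an error. The sketch below shows one way to guard against that; `hamming_count` is a hypothetical helper, not a SciPy function.

```python
def hamming_count(a, b):
    """Return the number of positions at which a and b differ."""
    # Refuse unequal lengths explicitly, since zip would otherwise truncate
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    return sum(x != y for x, y in zip(a, b))

try:
    hamming_count('data', 'science')
except ValueError as err:
    print('Cannot compare:', err)
```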

# Key Takeaways
1. Distance metrics are used in supervised and unsupervised learning to calculate the similarity between data points.
2. Choosing an appropriate metric can improve the performance of both classification and clustering models.
3. The four distance metrics covered here are Euclidean Distance, Manhattan Distance, Minkowski Distance, and Hamming Distance.

# Conclusion
Distance metrics are a key part of several machine learning algorithms. They are used in both supervised and unsupervised learning, generally to calculate the similarity between data points, so understanding them matters more than you might expect. Take k-NN, for example, a technique often used for supervised learning: by default, it often uses Euclidean distance to decide which training points count as the nearest neighbors of a query point.

By grasping the concept of distance metrics and their mathematical properties, data scientists can make informed decisions in selecting the appropriate metric for their specific problem.

# Frequently Asked Questions
Q1. What are the L1 and L2 distance metrics?
A. The L1 distance is the sum of the absolute differences between the coordinates of two vectors (Manhattan Distance). The L2 distance is the square root of the sum of the squared differences (Euclidean Distance).

Q2. What distance metrics are used in KNN?
A. Euclidean distance, cosine similarity, Minkowski distance, correlation, and Chi-square can all be used in the k-NN classifier.

Q3. What is a distance metric in clustering?
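A. A distance metric is the function a clustering algorithm such as K-Means uses to measure how far apart two data points are, and therefore which points end up in the same cluster. Classifiers such as k-NN rely on distance metrics in the same way to find the nearest neighbors.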