## FIVE MOST POPULAR SIMILARITY MEASURES IMPLEMENTATION IN PYTHON

## Euclidean Distance:

#### Euclidean distance implementation in python:

Euclidean distance is the ordinary straight-line distance, often called simply "distance". It is a common default proximity measure when the data is dense or continuous.

The Euclidean distance between two points is the length of the straight line segment connecting them. The Pythagorean theorem gives this distance.
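
Written out for two n-dimensional points x and y (the symbols are chosen here for illustration and do not come from the original text), the Pythagorean form is:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

For the points [0, 3, 4, 5] and [7, 6, 3, -1], the coordinate differences are -7, -3, 1 and 6, so d = sqrt(49 + 9 + 1 + 36) = sqrt(95) ≈ 9.747.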

In [2]:
from math import sqrt

def euclidean_distance(x, y):
    # Square root of the summed squared differences, coordinate by coordinate
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

print(euclidean_distance([0, 3, 4, 5], [7, 6, 3, -1]))

9.746794344808963
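
As a cross-check, the standard library function `math.dist` (available from Python 3.8) computes the same quantity. The sketch below compares it with a hand-written version; the length check is an addition made here, on the assumption that mismatched inputs should be rejected rather than silently truncated by `zip`.

```python
import math

def euclidean_distance(x, y):
    """Square root of the summed squared coordinate differences."""
    # Assumption: inputs of different length are an error, not something to truncate
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

p = [0, 3, 4, 5]
q = [7, 6, 3, -1]

hand_written = euclidean_distance(p, q)
built_in = math.dist(p, q)  # standard library, Python 3.8+

print(hand_written)
print(math.isclose(hand_written, built_in))
```

If both agree, the hand-written function can be swapped for `math.dist` wherever Python 3.8 or later is available.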


## Manhattan Distance:

The Manhattan distance between two points is the sum of the absolute differences of their coordinates, that is, the distance measured along axes at right angles.

In a plane with p1 at (x1, y1) and p2 at (x2, y2):

Manhattan distance = |x1 - x2| + |y1 - y2|

This Manhattan distance metric is also known as Manhattan length, rectilinear distance, L1 distance or L1 norm, Minkowski’s L1 distance, taxi-cab metric, or city block distance.

#### Manhattan distance implementation in python:

In [3]:
def manhattan_distance(x, y):
    # Sum of the absolute coordinate differences
    return sum(abs(a - b) for a, b in zip(x, y))

print(manhattan_distance([10, 20, 10], [10, 20, 20]))

10
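
To make "measured along axes at right angles" concrete, here is a small sketch that contrasts the Manhattan distance with the straight-line distance for two points in a plane. The points `p1` and `p2` are illustrative values chosen here, and `math.dist` requires Python 3.8 or later.

```python
import math

def manhattan_distance(x, y):
    """Sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

p1 = (0, 0)
p2 = (3, 4)

along_axes = manhattan_distance(p1, p2)  # |0 - 3| + |0 - 4|
straight_line = math.dist(p1, p2)        # length of the direct segment

print(along_axes)     # 7
print(straight_line)  # 5.0
```

The path along the axes is never shorter than the direct segment, which is why the Manhattan distance is always at least as large as the Euclidean distance.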


## Minkowski distance:

The Minkowski distance is a generalized metric form of Euclidean distance and Manhattan distance.

$$d^{MKD}(i, j) = \left( \sum_{k=1}^{n} \left| y_{i,k} - y_{j,k} \right|^{\lambda} \right)^{1/\lambda}$$

In the equation, d^MKD is the Minkowski distance between data records i and j, k is the index of a variable, n is the total number of variables y, and λ is the order of the Minkowski metric. Although it is defined for any λ > 0 (and is a true metric only for λ ≥ 1), it is rarely used for values other than 1, 2 and ∞.

Synonyms of Minkowski:
Different names for the Minkowski distance or Minkowski metric arise from the order:

* λ = 1 is the Manhattan distance. Synonyms are L1-Norm, Taxicab or City-Block distance. For two vectors of ranked ordinal variables, the Manhattan distance is sometimes called footrule distance.
* λ = 2 is the Euclidean distance. Synonyms are L2-Norm or Ruler distance. For two vectors of ranked ordinal variables, the Euclidean distance is sometimes called Spearman distance.
* λ = ∞ is the Chebyshev distance. Synonyms are Lmax-Norm or Chessboard distance.

#### Minkowski distance implementation in python:

In [4]:
from decimal import Decimal

def nth_root(value, n_root):
    # Raise value to the power 1/n_root and round to three decimal places
    root_value = 1 / float(n_root)
    return round(Decimal(value) ** Decimal(root_value), 3)

def minkowski_distance(x, y, p_value):
    # Sum of |a - b|^p over all coordinates, then the p-th root
    return nth_root(sum(pow(abs(a - b), p_value) for a, b in zip(x, y)), p_value)

print(minkowski_distance([0, 3, 4, 5], [7, 6, 3, -1], 3))

8.373
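
The special cases listed above (λ = 1, 2 and ∞) can be checked with a plain-float variant of the function. This is a minimal sketch, not the implementation used above: it skips `Decimal` and rounding, and it treats `float("inf")` as the signal for the Chebyshev case, which is a convention assumed here.

```python
def minkowski_distance(x, y, p):
    """Minkowski distance of order p; p = float('inf') is handled as Chebyshev."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == float("inf"):
        # Limit case: the largest single coordinate difference
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

x = [0, 3, 4, 5]
y = [7, 6, 3, -1]

manhattan = minkowski_distance(x, y, 1)             # 7 + 3 + 1 + 6
euclidean = minkowski_distance(x, y, 2)             # sqrt(49 + 9 + 1 + 36)
chebyshev = minkowski_distance(x, y, float("inf"))  # max(7, 3, 1, 6)

print(manhattan, euclidean, chebyshev)
```

The values shrink as the order grows, since higher orders give more weight to the largest coordinate difference until only that one remains.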


## Cosine similarity:

The cosine similarity metric is the normalized dot product of two vectors. Computing it amounts to finding the cosine of the angle between the two vectors. The cosine of 0° is 1, and it is less than 1 for any other angle.

It is thus a judgement of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90° have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude.

Cosine similarity is particularly used in positive space, where the outcome is neatly bounded in [0,1]. One of the reasons for the popularity of cosine similarity is that it is very efficient to evaluate, especially for sparse vectors.

#### Cosine similarity implementation in python:

In [1]:
from math import sqrt

def square_rooted(x):
    # Euclidean norm of x; left unrounded so the final result is not skewed
    return sqrt(sum(a * a for a in x))

def cosine_similarity(x, y):
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = square_rooted(x) * square_rooted(y)
    return round(numerator / denominator, 3)

print(cosine_similarity([3, 45, 7, 2], [2, 54, 13, 15]))

0.972
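
The claim that cosine similarity judges orientation and not magnitude can be checked on three small cases. The sketch below is an unrounded variant written for this purpose; the vectors are illustrative, and raising `ValueError` for a zero vector is an assumption made here, since the angle is undefined in that case.

```python
from math import sqrt

def cosine_similarity(x, y):
    """Dot product of x and y divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = sqrt(sum(a * a for a in x))
    norm_y = sqrt(sum(b * b for b in y))
    # Assumption: a zero vector has no direction, so reject it explicitly
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)

v = [1, 2, 3]
same_direction = cosine_similarity(v, [10, 20, 30])  # scaled copy of v
orthogonal = cosine_similarity([1, 0], [0, 1])       # vectors at 90 degrees
opposite = cosine_similarity(v, [-1, -2, -3])        # v reversed

print(round(same_direction, 6), round(orthogonal, 6), round(opposite, 6))
```

Scaling `v` by 10 leaves the similarity at 1 up to floating-point error, which is the magnitude independence described above.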


## Jaccard similarity:

For Jaccard similarity, the objects being compared are sets. The Jaccard similarity measures the similarity between finite sample sets and is defined as the cardinality of the intersection of the sets divided by the cardinality of their union. For two sets A and B, it is the ratio of the cardinality of A ∩ B to the cardinality of A ∪ B.

Jaccard Similarity J(A, B) = |A ∩ B| / |A ∪ B|

#### Jaccard similarity implementation:

In [12]:
def jaccard_similarity(x, y):
    # Convert both inputs to sets so duplicates are ignored
    set_x, set_y = set(x), set(y)
    intersection_cardinality = len(set_x & set_y)
    union_cardinality = len(set_x | set_y)
    return intersection_cardinality / float(union_cardinality)

print(jaccard_similarity([0, 1, 2, 5, 6], [0, 2, 3, 5, 7, 9]))

0.375
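
Because the inputs are treated as sets, the same function works on any hashable items, such as words. The sketch below uses two made-up sentences for illustration and adds a guard for the case where both inputs are empty; returning 1.0 there is a convention assumed here, not something the definition above prescribes.

```python
def jaccard_similarity(a, b):
    """|A ∩ B| / |A ∪ B| for the sets built from a and b."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumption: two empty sets are treated as identical
        return 1.0
    return len(set_a & set_b) / len(union)

# Illustrative inputs: word lists from two short sentences
doc1 = "the quick brown fox".split()
doc2 = "the lazy brown dog".split()

word_overlap = jaccard_similarity(doc1, doc2)  # 2 shared words out of 6 distinct
print(word_overlap)
```

Without the guard, two empty inputs would raise `ZeroDivisionError`, since the union would have cardinality 0.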
