# Hamming Distance

#### How many places two strings differ

+ The Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols differ
+ In other words, it measures the minimum number of substitutions required to change one string into the other, or the minimum number of errors that could have transformed one string into the other
+ It is used for error detection or error correction
+ It is used to quantify the similarity of DNA sequences
+ It is one kind of edit distance, restricted to substitutions
 + an edit distance quantifies how dissimilar two strings are by counting the minimum number of operations required to transform one string into the other
 + another widely used edit distance is the Levenshtein distance

#### Levenshtein distance

+ The Levenshtein distance was introduced in 1965 by the Russian mathematician Vladimir Levenshtein (1935-2017)
+ The distance value describes the minimal number of deletions, insertions, or substitutions that are required to transform one string (the source) into another (the target)
+ Unlike the Hamming distance, the Levenshtein distance also works on strings of unequal length
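
To make the definition concrete, here is a minimal pure-Python sketch of the usual dynamic-programming approach. The function name `levenshtein_distance` is an illustrative assumption and is not part of the `python-Levenshtein` package used later in this notebook, which should be preferred for real workloads.

```python
def levenshtein_distance(source, target):
    """Minimum number of insertions, deletions, or substitutions
    needed to turn `source` into `target` (illustrative sketch)."""
    # previous[j] = distance between the symbols of source processed so far
    # (initially none) and the first j symbols of target.
    previous = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        # Turning the first i symbols of source into "" takes i deletions.
        current = [i]
        for j, t in enumerate(target, start=1):
            current.append(min(
                previous[j] + 1,             # delete s from source
                current[j - 1] + 1,          # insert t into source
                previous[j - 1] + (s != t),  # substitute s by t (free if equal)
            ))
        previous = current
    return previous[-1]


levenshtein_distance("ACTAT", "ACTTA")  # 2
levenshtein_distance("ACTAT", "ACTT")   # 1 (unequal lengths are fine)
```

Only two rows of the table are kept at a time, so memory grows with the length of `target` rather than with the product of both lengths.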

In [1]:
# Install the Levenshtein package if it is missing:
# !pip install python-Levenshtein

In [2]:
from Bio.Seq import Seq

### Hamming Distance

In [3]:
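# Hamming Distance
def hamming_distance(lhs, rhs):
    # Count the positions at which the paired symbols differ.
    # Note: zip() stops at the shorter input, so extra trailing symbols are ignored.
    return sum(x != y for x, y in zip(lhs, rhs))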

In [4]:
seq1 = Seq("ACTAT")
seq2 = Seq("ACTTA")
seq3 = Seq("ACTT")

In [5]:
hamming_distance(seq1,seq2)

2

In [6]:
# The Hamming distance between a sequence and itself should be 0
hamming_distance(seq1,seq1)

0

In [7]:
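# seq1 and seq3 differ in length, so zip() compares only the first four positions;
# strictly speaking, the Hamming distance is undefined for unequal lengths
hamming_distance(seq1,seq3)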

1
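
The result of 1 for sequences of different lengths comes from `zip()` silently truncating to the shorter input. If that behaviour is unwanted, one option is a stricter variant that rejects unequal lengths. The sketch below is an assumption about how this could be done; the name `hamming_distance_strict` is hypothetical and not part of the original notebook.

```python
def hamming_distance_strict(lhs, rhs):
    """Hamming distance that refuses inputs of unequal length (illustrative sketch)."""
    if len(lhs) != len(rhs):
        raise ValueError(
            "Hamming distance is only defined for sequences of equal length"
        )
    # Count the positions at which the paired symbols differ.
    return sum(x != y for x, y in zip(lhs, rhs))


hamming_distance_strict("ACTAT", "ACTTA")  # 2

try:
    hamming_distance_strict("ACTAT", "ACTT")
except ValueError as err:
    print(err)
```

Plain strings are used here to keep the example self-contained; `Seq` objects should work the same way, since they support `len()` and iteration.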

### Levenshtein Distance

In [8]:
from Levenshtein import distance

In [10]:
distance(str(seq1),str(seq2))

2

In [11]:
print("Hamming Distance: ", hamming_distance(seq1,seq2))
print("Levenshtein Distance: ", distance(str(seq1), str(seq2)))

Hamming Distance:  2
Levenshtein Distance:  2


# Well Done!