# Hamming Distance

The Hamming distance between two strings of equal length is the minimum number of symbol substitutions required to transform one string into the other. For strings $s_1$ and $s_2$, we write the Hamming distance between them as $d_{\mathrm{H}}(s_1, s_2)$.

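Equivalently, the Hamming distance is the number of positions at which the corresponding symbols of the two strings differ. A substitution changes exactly one position, so each mismatched position needs one substitution and matching positions need none. For short strings, the distance can therefore be found by visual inspection.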
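Written out as a formula, for two strings of common length $n$, with $s[i]$ denoting the $i$-th symbol of $s$ and $\mathbf{1}[\cdot]$ the indicator that equals 1 when its condition holds and 0 otherwise (notation introduced here for illustration):

$$
d_{\mathrm{H}}(s_1, s_2) = \sum_{i=1}^{n} \mathbf{1}\left[s_1[i] \neq s_2[i]\right]
$$

For example, for the standard illustrative pair `karolin` and `kathrin`, only positions 3, 4 and 5 differ (r/t, o/h, l/r), so $d_{\mathrm{H}} = 0 + 0 + 1 + 1 + 1 + 0 + 0 = 3$.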

[More information](https://en.wikipedia.org/wiki/Hamming_distance)
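The counting rule described above translates into a short Python function. This is a minimal sketch: the name `hamming_distance` is our own, and raising `ValueError` for unequal lengths is an assumed design choice, since the distance is only defined for equal-length strings.

```python
def hamming_distance(s1: str, s2: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(s1) != len(s2):
        # The Hamming distance is undefined for strings of different lengths
        raise ValueError("strings must have the same length")
    # zip pairs up symbols position by position; each mismatch contributes 1
    return sum(a != b for a, b in zip(s1, s2))


print(hamming_distance("karolin", "kathrin"))                      # 3
print(hamming_distance("GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT"))  # 7
```

Summing the comparisons works because `True` counts as 1 in Python arithmetic.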

## Counting Point Mutations

```python
# Importing libraries
# termcolor provides ANSI color formatting for terminal output
from termcolor import cprint
```

```python
# Sequence 01
seq1 = 'GAGCCTACTAACGGGAT'

# Sequence 02
seq2 = 'CATCGTAATGACGGCCT'
```

```python
# The Hamming distance is only defined for sequences of equal length
assert len(seq1) == len(seq2), "sequences must have the same length"

# Counters for mutated and non-mutated nucleotides
non_mutate = mutate = 0

# Sequence length
sequence_length = len(seq1)

# Index positions (0-based) of mutated nucleotides
mutate_position = list()

for i in range(sequence_length):
    if seq1[i] != seq2[i]:
        mutate += 1
        mutate_position.append(i)
    else:
        non_mutate += 1

print(f"Point-mutation: {mutate}")
```

```
Point-mutation: 7
```
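The same counts can be obtained without manual index bookkeeping by pairing the sequences with `zip` and numbering the pairs with `enumerate`. This is a minimal sketch of an alternative to the loop above, not the notebook's original code; the variable names mirror the ones used there.

```python
seq1 = 'GAGCCTACTAACGGGAT'
seq2 = 'CATCGTAATGACGGCCT'

# Collect the 0-based index of every position where the nucleotides differ
mutate_position = [i for i, (a, b) in enumerate(zip(seq1, seq2)) if a != b]

mutate = len(mutate_position)      # point mutations (the Hamming distance)
non_mutate = len(seq1) - mutate    # positions where the sequences agree

print(f"Point-mutation: {mutate}")  # Point-mutation: 7
print(f"Unchanged: {non_mutate}")   # Unchanged: 10
```

Note that `zip` silently stops at the end of the shorter sequence, so it is worth checking that the lengths match first. On Python 3.10 and later, `zip(seq1, seq2, strict=True)` raises `ValueError` on a length mismatch instead.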


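```python
# Percentage of point-mutated positions; multiplying by 100 before rounding
# keeps two decimal places and avoids float artifacts from scaling a rounded value
per_mutate = round(mutate / sequence_length * 100, 2)

print(f"{per_mutate} % of the total sequence is point mutated.")
```

```
41.18 % of the total sequence is point mutated.
```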
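The fraction of differing sites computed here is known in molecular evolution as the p-distance. As a sketch, Python's `%` format type can handle the scaling by 100 and the rounding in one step; the helper name `p_distance` is our own.

```python
def p_distance(s1: str, s2: str) -> float:
    """Proportion of aligned positions at which two equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have the same length")
    mismatches = sum(a != b for a, b in zip(s1, s2))
    return mismatches / len(s1)


p = p_distance('GAGCCTACTAACGGGAT', 'CATCGTAATGACGGCCT')
# The '%' format type multiplies by 100 and appends a percent sign
print(f"{p:.2%} of the total sequence is point mutated.")
# 41.18% of the total sequence is point mutated.
```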


```python
mutate_position
```

```
[0, 2, 4, 7, 9, 14, 15]
```
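The index list shows where the sequences differ but not how. A small extension, sketched here as our own addition rather than part of the original notebook, pairs each mismatched position with the nucleotide found in each sequence. Because Python indexes from 0, adding 1 gives the 1-based positions usually used for biological sequences.

```python
seq1 = 'GAGCCTACTAACGGGAT'
seq2 = 'CATCGTAATGACGGCCT'

# (1-based position, nucleotide in seq1, nucleotide in seq2) for each mismatch
substitutions = [
    (i + 1, a, b)
    for i, (a, b) in enumerate(zip(seq1, seq2))
    if a != b
]

for position, base1, base2 in substitutions:
    print(f"{position}: {base1} -> {base2}")
```

For the two sequences above, this prints `1: G -> C`, `3: G -> T`, `5: C -> G`, `8: C -> A`, `10: A -> G`, `15: G -> C` and `16: A -> C`, one per line.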

## Visualising point mutations

```python
# Print both sequences, coloring mutated positions red and matching positions grey
mutated = set(mutate_position)  # set membership tests are O(1)

for seq in (seq1, seq2):
    for i, nucleotide in enumerate(seq):
        cprint(nucleotide, 'red' if i in mutated else 'grey', end='')
    # End the line and leave a blank line after each sequence
    print("\n")
```

```
GAGCCTACTAACGGGAT

CATCGTAATGACGGCCT
```

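The Hamming distance between these two sequences is 7. In the terminal output, the mismatched nucleotides (positions 0, 2, 4, 7, 9, 14 and 15) are printed in red and the matching ones in grey.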
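Where ANSI colors are unavailable (for example, in a plain log file or a rendered page), mismatches can instead be marked with a caret line under the sequences. This is a dependency-free sketch; `mark_mismatches` is a hypothetical helper name and not part of termcolor.

```python
def mark_mismatches(s1: str, s2: str) -> str:
    """Return a line with '^' under each mismatched position and ' ' elsewhere."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have the same length")
    return ''.join('^' if a != b else ' ' for a, b in zip(s1, s2))


seq1 = 'GAGCCTACTAACGGGAT'
seq2 = 'CATCGTAATGACGGCCT'

print(seq1)
print(seq2)
print(mark_mismatches(seq1, seq2))
```

For the sequences above, the output is:

```
GAGCCTACTAACGGGAT
CATCGTAATGACGGCCT
^ ^ ^  ^ ^    ^^
```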