# Finding a most likely common ancestor

The **common ancestor** of a set of organisms is the most recent individual from which all organisms in the set have descended.

If we have several homologous strands of DNA, we can construct an average-case strand that represents their most likely common ancestor. This strand is called the **consensus sequence**.

## Profile matrix

![image.png](attachment:image.png)

The **profile matrix** of a collection of DNA strings of length n is a 4×n matrix P in which P<sub>i,j</sub>, with i ∈ {A, C, G, T}, is the number of strings that have nucleotide **i** at the j-th position.
We use the profile matrix to extract the **consensus** sequence.
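
The definition can also be written in symbols. This is only a sketch in notation introduced here, not taken from the text above: it assumes the collection holds m strings s<sub>1</sub>, ..., s<sub>m</sub>, writes s<sub>k</sub>[j] for the j-th symbol of s<sub>k</sub>, and uses [·] for the indicator that equals 1 when the condition holds and 0 otherwise.

$$
P_{i,j} = \sum_{k=1}^{m} \bigl[\, s_k[j] = i \,\bigr], \qquad i \in \{A, C, G, T\},\quad 1 \le j \le n
$$

Because every string contributes exactly one nucleotide to each position, every column of P sums to m.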



## Consensus
![image.png](attachment:image.png)
A **consensus sequence** is a sequence of length n formed from the profile matrix by taking the most common symbol at each position; the j-th symbol of the consensus sequence therefore corresponds to the symbol with the maximum value in the j-th column of the profile matrix. There may be more than one most common symbol at a position, in which case several consensus strings are possible.

In [1]:
sequences = ["ATCCAGCT",
             "GGGCAACT",
             "ATGGATCT",
             "AAGCAACC",
             "TTGGAACT",
             "ATGCCATT",
             "ATGGCACT"]

n = len(sequences[0])  # length of each sequence (all are assumed to be equally long)

profile_matrix = {
    'A': [0]*n,
    'C': [0]*n,
    'G': [0]*n,
    'T': [0]*n
    }

for dna in sequences:
    for position, nucleotide in enumerate(dna):
        profile_matrix[nucleotide][position] += 1
profile_matrix

{'A': [5, 1, 0, 0, 5, 5, 0, 0],
 'C': [0, 0, 1, 4, 2, 0, 6, 1],
 'G': [1, 1, 6, 3, 0, 1, 0, 0],
 'T': [1, 5, 0, 0, 0, 1, 1, 6]}
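
The same counts can be produced more compactly by walking over the columns instead of the rows. The following is a minimal sketch, not the notebook's own method: the helper name `build_profile`, its `alphabet` parameter, and the variable names `example` and `profile` are assumptions introduced for illustration. It also rejects input whose strings differ in length, which the cell above silently assumes never happens.

```python
from collections import Counter

def build_profile(sequences, alphabet="ACGT"):
    """Return {nucleotide: [count at each position]} for equal-length strings."""
    # Hypothetical helper: the name and signature are illustrative only.
    lengths = {len(s) for s in sequences}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    # zip(*sequences) yields one tuple per position (one column of the alignment).
    columns = [Counter(column) for column in zip(*sequences)]
    # A Counter returns 0 for symbols it never saw, so missing nucleotides are fine.
    return {nuc: [col[nuc] for col in columns] for nuc in alphabet}

example = ["ATCCAGCT", "GGGCAACT", "ATGGATCT", "AAGCAACC",
           "TTGGAACT", "ATGCCATT", "ATGGCACT"]
profile = build_profile(example)
```

For the seven sequences used above, `profile` holds the same four lists as `profile_matrix`.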

In [2]:
result = []  # nucleotide with the highest count in each column (position)

for position in range(n):
    max_count = 0
    max_nucleotide = None
    for nucleotide in ['A', 'C', 'G', 'T']:
        count = profile_matrix[nucleotide][position]
        if count > max_count:  # strict '>' keeps the first of A, C, G, T when counts tie
            max_count = count
            max_nucleotide = nucleotide
    result.append(max_nucleotide)

consensus = ''.join(result)
consensus

'ATGCAACT'
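
The loop above returns a single consensus string even when a column has a tie. As noted in the Consensus section, ties lead to several possible consensus strings, and one way to list all of them is sketched below. The function name `all_consensus` and the small `tied_profile` are hypothetical and introduced only for illustration; `tied_profile` is the profile of the two toy strings "ACA" and "ACC", not data from the notebook.

```python
from itertools import product

def all_consensus(profile, alphabet="ACGT"):
    """Return every consensus string that a profile matrix allows."""
    # Hypothetical helper: the name and signature are illustrative only.
    n = len(profile[alphabet[0]])
    choices = []
    for j in range(n):
        best = max(profile[nuc][j] for nuc in alphabet)
        # Keep every nucleotide that reaches the column maximum.
        choices.append([nuc for nuc in alphabet if profile[nuc][j] == best])
    # One symbol per column, in every possible combination.
    return ["".join(combo) for combo in product(*choices)]

# Profile from the notebook cell above: no column has a tie.
notebook_profile = {
    "A": [5, 1, 0, 0, 5, 5, 0, 0],
    "C": [0, 0, 1, 4, 2, 0, 6, 1],
    "G": [1, 1, 6, 3, 0, 1, 0, 0],
    "T": [1, 5, 0, 0, 0, 1, 1, 6],
}
unique = all_consensus(notebook_profile)   # ['ATGCAACT']

# Toy profile of "ACA" and "ACC": the last column is tied between A and C.
tied_profile = {
    "A": [2, 0, 1],
    "C": [0, 2, 1],
    "G": [0, 0, 0],
    "T": [0, 0, 0],
}
tied = all_consensus(tied_profile)         # ['ACA', 'ACC']
```

The number of results is the product of the tie sizes across columns, so it can grow quickly when many positions are tied.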