# Problem 10: Consensus and Profile
A matrix is a rectangular table of values arranged in rows and columns. An m×n matrix has m rows and n columns. Given a matrix A, we write $A_{i,j}$ for the value at the intersection of row i and column j.

Say that we have a collection of DNA strings, all of the same length n. Their profile matrix is a 4×n matrix P in which $P_{1,j}$ is the number of strings that have 'A' in the jth position, $P_{2,j}$ is the number that have 'C' in the jth position, and so on for 'G' and 'T' (see below).

A consensus string c is a string of length n formed from our collection by taking the most common symbol at each position; the jth symbol of c is therefore the symbol with the maximum value in the jth column of the profile matrix. Of course, a position may have more than one most common symbol, leading to multiple possible consensus strings.

DNA Strings
>A T C C A G C T \
G G G C A A C T  \
A T G G A T C T \
A A G C A A C C \
T T G G A A C T \
A T G C C A T T \
A T G G C A C T

Profile
>A   5 1 0 0 5 5 0 0 \
C   0 0 1 4 2 0 6 1 \
G   1 1 6 3 0 1 0 0 \
T   1 5 0 0 0 1 1 6 

Consensus
>A T G C A A C T
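
The definitions above can be transcribed almost literally into code. The following is a minimal sketch, not the notebook's solution: the names `BASES`, `profile_matrix`, and `consensus_from_profile` are made up for illustration, and the row order A, C, G, T is taken from the profile shown above.

```python
BASES = "ACGT"  # assumed row order of the profile matrix: A, C, G, T

def profile_matrix(strings):
    """Return a 4 x n list of lists; P[k][j] counts BASES[k] at position j."""
    columns = list(zip(*strings))  # transpose: one tuple of symbols per position
    return [[column.count(base) for column in columns] for base in BASES]

def consensus_from_profile(profile):
    """Take the base with the largest count in each column (first base wins ties)."""
    n = len(profile[0])
    return "".join(
        max(BASES, key=lambda base: profile[BASES.index(base)][j])
        for j in range(n)
    )

# The seven example strings from the text above.
dna = ["ATCCAGCT", "GGGCAACT", "ATGGATCT", "AAGCAACC",
       "TTGGAACT", "ATGCCATT", "ATGGCACT"]

P = profile_matrix(dna)
for base, row in zip(BASES, P):
    print(base, *row)
print(consensus_from_profile(P))
```

Because `max` returns the first maximal item it sees, a tie in a column is resolved in favour of the earlier base in `BASES`; any other tie-breaking rule would be equally valid for this problem.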

Given: A collection of at most 10 DNA strings of equal length (at most 1 kbp) in FASTA format.

Return: A consensus string and profile matrix for the collection. (If several possible consensus strings exist, then you may return any one of them.)
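
Since any one consensus string is accepted, it can help to see what the full set looks like when ties occur. The sketch below is an illustration only (the function name `all_consensus_strings` and the small tied example are assumptions, not part of the problem): it collects the tied bases at each position and combines them with `itertools.product`.

```python
from itertools import product

BASES = "ACGT"

def all_consensus_strings(strings):
    """Yield every consensus string, one per combination of tied bases."""
    choices = []
    for column in zip(*strings):  # one tuple of symbols per position
        counts = {base: column.count(base) for base in BASES}
        best = max(counts.values())
        # Keep every base that reaches the maximum count in this column.
        choices.append([base for base in BASES if counts[base] == best])
    for combo in product(*choices):
        yield "".join(combo)

# Hypothetical input with ties in the first two positions.
tied = ["AAC", "AGC", "TGC", "TAC"]
print(list(all_consensus_strings(tied)))

# The sample collection has no ties, so it has exactly one consensus string.
sample = ["ATCCAGCT", "GGGCAACT", "ATGGATCT", "AAGCAACC",
          "TTGGAACT", "ATGCCATT", "ATGGCACT"]
print(list(all_consensus_strings(sample)))
```

The number of consensus strings is the product of the number of tied bases at each position, so it can grow quickly; that is one reason the problem asks for only one of them.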

Sample Dataset
>Rosalind_1 \
ATCCAGCT \
>Rosalind_2 \
GGGCAACT \
>Rosalind_3 \
ATGGATCT \
>Rosalind_4 \
AAGCAACC \
>Rosalind_5 \
TTGGAACT \
>Rosalind_6 \
ATGCCATT \
>Rosalind_7 \
ATGGCACT

Sample Output
>ATGCAACT \
A: 5 1 0 0 5 5 0 0 \
C: 0 0 1 4 2 0 6 1 \
G: 1 1 6 3 0 1 0 0 \
T: 1 5 0 0 0 1 1 6 
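
The input is given in FASTA format, and in the full dataset used later in this notebook each sequence is wrapped over several lines. A minimal sketch of a reader for that layout is shown here; `parse_fasta` is a hypothetical helper name, and the sketch assumes every record starts with a `>` label line followed by one or more sequence lines.

```python
def parse_fasta(text):
    """Return a list of (label, sequence) pairs from FASTA-formatted text."""
    records = []
    label, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new label closes the previous record, if there was one.
            if label is not None:
                records.append((label, "".join(chunks)))
            label, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence lines are joined without separators
    if label is not None:
        records.append((label, "".join(chunks)))
    return records

example = ">Rosalind_1\nATCC\nAGCT\n>Rosalind_2\nGGGCAACT\n"
print(parse_fasta(example))
```

Joining the wrapped lines before any counting is done keeps position j of every string aligned with column j of the profile matrix.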

In [78]:
from collections import Counter

BASES = "ACGT"  # row order of the profile matrix

def consensus(fastaString):
    # Parse FASTA: each record is a label line followed by one or more sequence lines.
    # The label line is discarded whole, so a trailing comma after the ID is harmless.
    sequences = []
    for record in fastaString.split(">"):
        lines = record.strip().splitlines()
        if lines:
            sequences.append("".join(line.strip() for line in lines[1:]))

    # Count the bases at each position; zip(*sequences) yields one column per position.
    # A Counter returns 0 for a base that never occurs in a column.
    counts = [Counter(column) for column in zip(*sequences)]

    # Profile matrix: one row of per-position counts for each base.
    count_matrix = {base: [c[base] for c in counts] for base in BASES}

    # Consensus: the most common base at each position (ties go to the first of A, C, G, T).
    count_profile = [max(BASES, key=lambda base: c[base]) for c in counts]

    # Put the results in the desired output format.
    returnString = "".join(count_profile) + "\n"
    for base in BASES:
        returnString += base + ": " + " ".join(str(x) for x in count_matrix[base]) + "\n"

    print(returnString)


In [82]:
testFasta = """>Rosalind_1,
ATCCAGCT
>Rosalind_2,
GGGCAACT
>Rosalind_3,
ATGGATCT
>Rosalind_4,
AAGCAACC
>Rosalind_5,
TTGGAACT
>Rosalind_6,
ATGCCATT
>Rosalind_7,
ATGGCACT"""

In [83]:
consensus(testFasta)

ATGCAACT
A: 5 1 0 0 5 5 0 0
C: 0 0 1 4 2 0 6 1
G: 1 1 6 3 0 1 0 0
T: 1 5 0 0 0 1 1 6
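
A quick sanity check on output of this shape: every profile column should sum to the number of input strings, and each consensus symbol should have the largest count in its column. The sketch below is an assumption-laden helper (the name `check_profile_output` is hypothetical) that works on the printed text rather than on the function's internals.

```python
def check_profile_output(output, n_sequences):
    """Return True if the text is a consistent consensus line plus four 'X: ...' rows."""
    lines = output.strip().splitlines()
    consensus_line, rows = lines[0], lines[1:]

    # Parse rows such as "A: 5 1 0 0 5 5 0 0" into {"A": [5, 1, ...], ...}.
    profile = {}
    for row in rows:
        base, counts = row.split(":")
        profile[base.strip()] = [int(x) for x in counts.split()]

    # Every row must have one count per consensus position.
    if any(len(counts) != len(consensus_line) for counts in profile.values()):
        return False

    for j, symbol in enumerate(consensus_line):
        column = {base: counts[j] for base, counts in profile.items()}
        if sum(column.values()) != n_sequences:
            return False  # a column must account for every input string
        if column.get(symbol, -1) != max(column.values()):
            return False  # the consensus symbol must be a most common one
    return True

printed = """ATGCAACT
A: 5 1 0 0 5 5 0 0
C: 0 0 1 4 2 0 6 1
G: 1 1 6 3 0 1 0 0
T: 1 5 0 0 0 1 1 6
"""
print(check_profile_output(printed, 7))
```

The check accepts any valid consensus string, so it stays correct when ties are broken differently.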



In [84]:
fastaString = """>Rosalind_5236,
CCACTCCGCGTGTATCGTACAATCTTACCTCCAGATGCATAAAGGACGCCGTTCGTTCAT
AACGTGGAAGCGTCGCAATGGGTTGTCAGGTTGTTTACAGAAGACATCGACGTCAGAAAC
CCGCCTACGAGATGCGCGCCACCCCTATGACCGTTTAAATGAGTTGGTTCTACTTAGACT
GTAGTTTCCCTGTGTGGCTGTTCAGGTTTTACCCATATGGTGCCTAGTTCGAAAACTATT
TACTTATTGCTGGTGTATCTGGGACTCGCTTCTCAGATGCACATAGTGTAAAATTACGGG
CGTAACCTAAGATTAACCGAATCATATATTACTGGCTCTTCTCCTTGTGGCAGTCGGCAC
CCTGTTCTAATTCGGATTTGTAGGTGATTTCCCTCTGTCTGTACGTAACGTACTCTCTGA
ACGGATGAGACGCCCGCCTGGCCCAAAAAAAAACTAGGGGTTGTTCGAACTCATGGTCTT
ACCCATAACCCCTCTCTGGCGTCGACCAACGACGTTAACTCACTGCTCAGTGTGCCGAAT
GCCCATGTTAAGTCCAACCCGGGCTAGCCCGGAATTAGATGTTGAAACAAAATATGATTG
GTTGTTTTCACAACGCGCCCCTCTCCCCCGTCGGCCTTATTCTCGCATGCGCATGCTATA
AACTCATCCAGCGCGCTCGCAGGATGAGCCATGGACACGGCTACCTCCCACCCCCACGAC
CGACGTCCGATGGTCGTCCATAAGCCGCTCTGGGTACTTATTGACGTTTCTAAGAATTAT
AGGGCATGCGGCAGTCAAACCGTAGCGTTAACTCCCCTCCTGTAAATCCATGGCCATCTC
ACAGATCCATGCTATCCGGTTTCTGATTATTCGCGAGGGATCGTGATCAGTCAGCTGCCG
CACGCATGGGGATAAGTCGGGAATGTACACCTCTACTGAACCTTATGGGCAGTTCCCAGT
TCAGCCTTA
>Rosalind_5512,
CGCTGTTTTTAGTTTATTCCCGAGCGCACAAATCAATAGTTTTACTTCGCACAAGTTGAC
AGGTCTTATCCTCCAGCCGATGGGGCGCCTCACTCATATTGCTGTCAGCTAAGACCGTTG
AGCAACAAGTTAGGTGCTCCAACAGCCAGACGGACAAAAAATAAATTTTATCATAGGGGG
CGACATCGCAGCTATAACTCGCTAAACCTATTTTAGCTCGTGAGGGAATGTAGAGTATCT
TGCATCAGTCTTAGGGCTGATCCAGCCGTGAGTAGACAGTTGATGGTCATGAGGCGCTAG
GTGAGCTTTTAGTCCATGTCAAACTTTTGCTTACTCCCTCAAAGGACTCGGACAGGCAGT
GGGGCACGCAGGCCGCAAAGAATAGGGTGCGTCAATGGTTGTCCCAGCGCCTATACGGTT
CCAAGTAGGGGTATGATGCACTGTGTAAGTGAGGCGCGCCTTCAGAAAGTCATTTATTCC
GTGTTAATCCCCTATTCACATATCCGGCATCGTACGTCGTTCATGACGACGATCGGAACA
TGTACGACATAGAACGTCCAGTCAGTAACAAGGACAGTTTTCTTTAGAGGAACATGCAAG
CCTTGGTGTCGCTGTTACGAGGGTGGGTAACACCGGATGAGGATACGAGTCACAGTACGT
GTGCTCGGCAACGATGGTCATATTTAGGTTTTCTGTGTCAATCGTAATATCTGAGCCCCG
AGAGTCGTACGGGCTTGACATCAAATCCAATTGCTCCGTCTAAGCAGGGCCGAATACCGA
CAGTGACCTTGTGTTAGGGATGTATGAATGACACGTAGCAGACACGTCCCGCCAATTATT
GACCAAACGGAGAGGTTGGCCCATCATGCGACCTACACGAGGCTCAAGATTACTTAAACT
TGCGAAGGCCCGTTATCAACTCAAGCAACGACAGGATTCTACGTGGATTCACGTTCCAAG
GCTAGCAGC
>Rosalind_8921,
GGTTGAAGAATTTGGACGGTTCGCCCGGATATACGGGTATTAGTTCGGGGGTCTATCTAA
TCTGTTTCGTATAGAGAGCGAGAGGGTACAACGTTCGACGAATATGGTAGGTCACACCAC
CCCGACCAGCGATGGTAGGGTAGCGTTTGGCGCGATACACGATAGATTCATTCAACAAGA
TTCACATGTTAGAAGAAACACTCAAACGTGCGTGTTCTATCAGCGCCCTGTCATTCGACA
TGTTTACGGGATCATGCGTAACCAGACTTCGGAGGAAGCAGGTTAGAACCGTACACAGTT
CGCCCCGTCGGCAACAAGATCCCCAGGTCATGAGAGCCGATGAGGCCGCTACCTTTGGCT
GAGCTGTTAATTGAAGAGCTGGCATATCGATCCCGACCCACGCAGACATTTGGAAGGACG
GCGTACGACTACATATATCGGCTCAAACCGGGTAACATAGCGCTGATGCCTTTGAAAATA
CGTAGGGTGGTGGCCCTAGCTCGAAACCCCTAATAGGACCCTGACCACATGATGGTGTTG
ACGTAGGTGTTAATGGAGCATTTCCGCCGTATAGGCCAGGCATCCATATTAACTGTGGGG
TGTACGCAGGGTGTACAGGCTCAGTTATAATGTCAGACCACCTTAAGATCGTGTTGGCAT
GTGTTGGTCCGAAAGTCATACCACTTAGGCTACGTCCAACTGCACCCACACCAATTACTG
ATAGGCGCCTTTAGTCAAACTATAGCTTGTTAGTGCAGGGAATTAGAGACGAGGTGGAAC
TGGTTGAGGGCTGCAGAGCCCGAGAGGCGCGGAGGTTCTATCTATATGGAGGAATCGGAG
GTTGTTAACGACGCTGTGCCCTCGGCGAAGGCGCGCCAAGACTCTACTGAGAGCCGCTTA
AATTGGCCGAGCGCCCGGAAATATCTTGAACATGCTGGGGGATCCGCGACTCTGGTATGG
GCGCCCTTG
>Rosalind_6028,
CAATGAGGAGGATTAAGTGATTCAATCGTTCGTCAACTTGCCCAGTACCGCCGCGTGCAT
GCTGGTTGCTCAACGCCGTAGCTGCGAAACGGAAACCATACTGATAACACCATGCCACCA
GCGGGCAAGCAAATGGACGAAAGCAGGACTCCGGCAACATTATCGCGTACGCCGCGATTT
GCCACATGGACTTCTGCCTAGCCCCGAAACTAAAATTTTCCTCACTACGCCAATACATCA
AAGTTCCCGTGCAACGAAAGTATTATAATAAACGAAATTAAAGCATTATCAGGTTATACA
ACGCGATGGGCCACGGGCATGCTGCGAGTTTAGGTGCGTGATATGAGGAAAATCTGTCAA
TGCCCCACCACCGTCGATTGGGCCAAGCAATAGGTCTTACATCTTCTGCCCTGCTTGACT
CGCGTTGGAGGGTTATTATGACGTACAAGAAGTCCCCACTGAGCTGTGGTGGATCAGCCC
AACGTGAGCCCTCACCGCCCGTCTCGACTCTTGCTTGAAACGTAATGACCTAGATTCATG
TTCTGTCACTTGTGGCTAATCGGGTACAGGATAAAAAATCAGCATATCAACCATTAGAAA
TCGCGGCGTGGACCAGCTATGGGGTTCCCCCACTGGTGTAGTCTAATGTGTGAACTCTGG
GCTGATCTTAACATTGCTTTAATTGGAAGTCCCCGATCTAATCGGACTGAATGCTATCTC
GTTAATAGGTCAAGGTGGATTTTTCGATGCGACCACGGTGAGCGGTGTGTACGCTAATAG
ACGCTGGGAACTTGAACAAATTAACCTAATCAGATTGGACAACGGCTGCAAGACCAAGGT
TGTGAAACGTTAGTTTCTGAGTATCCTAGTCGTGTTGCGCAACTGGTTGACTCTTGAGTA
GATTTCGGCACAGGACATGAAACTTGTACATTTCTCGAAGCGTGGTTGGCTATCCCGGGT
AGCGAGTAA
>Rosalind_3565,
TAAGCTAAATCGTGTGCCCCTCGAATAGTAAATAGTGGATTTTCTATCAGGAGTTGTGGG
TGATACACACTGTGTGGAGGTAGTCAGCAACGAGAATCCGTTTCCCAGGCGATTCGCGCA
CGACACGATCCCTCCTCTCGCAAATGTTCGTCAAAAATTCGGTGGGCGTCACGACGATCC
CTAGCCAGATAGTAAAGTGCCGGACAAGCGCGTCCAGATGCCCACACCTGTCGGCCTCGA
TTTGCTCCTGCGTTATATGTCTGGCTCGAATCGATTGACCACTCCACTATGAACGGTTTC
ATGCAAGGTCTGCTAAGTCTACCGGCTAGATGAGGAGCAGAGAATCAAAGAAATTCAGTA
ACTTGACATATCCGTTCTTAGCAGCGAACGTTGTATAGCACCCCGCGGTGCACCCGGTAA
GGCTGATCGCATCAATAGTAATCATCTAGAACTTAGTGTATCCGCTCTGAGTTTATTCGT
CGTACAAGGTAAATGTCAACAACAATTACCGTGTCCCGGTCCCAAGCGATTCAAGAATCT
TTCTACAAGAAAAGGCGAAACACGCTTCAGGTTAAGTGCTTAAATGGCCGTATCCGCGTC
CGGACTCAGTAAGATCCTGAGCGGATTTCCCCCTTAACTCGATTTTTACAGAATCGTTTA
TATCAGATGAGGAGGCAAGGAAGTCTTTAACGCGAATCTATTTTCGATTCGCGATTCCTC
GCACGATGCACTATTGTGCTATCAGATCAATCTCGTTCACGGGCCGTACGGTTGTATAGT
GATCTATTGCAATCGCAGCGATTAGTCTAAGTCCACACCCCGGCATGGCTTAACTCCTTC
GTTCTAATACCGCAGTGGAACAATAACGGTAACATGTGGACGCGTGCTCTATCGGTGAAT
GTTCACTATTATGGAGATGGCTAAAGGTATAGATTCATCTCGTACCCGGAACTGGTAGTT
TTAAGTGCT
>Rosalind_9233,
AAGACAGAGTTGTGCATGGGAGACTGCACTACATGCGAATCGTACACTGGTATCCAAATT
TCCGGCGGTCGCCTGTACAGTGCCAGACGTGTTCGGTGCCTCTTACCGTGCTAATGTACA
GGCGTACAGCATATTCCAAGATGTGGTTTCCGGGCCCAACCAGCGCACTCGATAAGGCAA
TCTCGTTATTATCGTAGCCAGATATTCGTTGACATGGAGGCGGACGTAGATTCGGGTTCC
ATCTGCACTTAGTACGGTCATCCAGCTCGTCCTTCCGAAAATGATAATGCACTATGGTGA
CATGTAGTAACACTGATGGACCTAGCATCAGCGGTCTTTCGCTAGCTCCCCCAGTCCTTG
CATGAGTTTACGGGCGGGTCGTGCGGCACTACACAACAGCCCGTCCGACGTGCCGGCCGA
GGAGTCAGTATCGCAGACGTCTGTATTCGTGATCCCAACGGGTAGGCCCACCCCCGCACT
ACCCCGTTGATTGCATGGTCTTCGTAACTAGTAATATGGCACTCAATAACAGGCTCCTAC
TAGCATTGCATTAAGTTCATTGACTCGAGTGGTATTTCAATCATCATCCTATTGTCTAAC
AAAGGTTGGGTAGTGGGTTTCTCACACTTCCACTGCCGAACTGGTTATACAGTCCTTATT
TAGTGCTCCCCCCTCCCAACTTGTTTGCCATTAATTAGTGGATACGACAAAGGAAGTCAC
CCGTGGGACAAAGCAGGGAAGTATCGGGCCCTATATGCAGGCGACACACTTATACCTCTT
CTCAAAATGTGCGCTTTATTAGATCGGATAAGGGGATGTCTAGCTTTCATTCGCGGAACG
ACTTATTACTGGGTATCAGCCCGTGGCAAAAGTTCTGTGCCCCTTGGGCGTGGATACTCG
GTTCCCCCGGCAAGAACGCGATGATTGGCTTTGATTTCGTGGTCGGTGTGCAATTTAGAT
GTACGAGGT
>Rosalind_9713,
CAGTTTTACACAGTAGTTTTCCGGTCATATATGAGCGACAAAGAGGGTTACTAAGATCTT
TTGCGACGGAAAGGAGGGCGGAGTAGCTCTACGTAGCTGCGTTCACTCACACACGCTGGA
GCGTGGCATTGGGAATGGTGCCAATTGTACGACCGAGTGAGACAGCCACAGACAGACCCT
GTCGTCAAGGCGCATAACCTTTCGTCTATAATGCAACAACGACAACAGAGATCATTCTAG
TAGTGAGCCCGTGATAGGTCCGCTATATTGCACGCGCGTGGTGCGGGTCAGGAAGATCGG
CACTTGCGAGCGAGGGGACGTATCTTAGGTACAAACGCCACAGCGCCCGGCGCGCAAGCA
CCTATGACAAGTGAATCACTACTCTATAGTCCGAGCAACTCCCACACGCGTGATAGAGAT
ACGACAAAAATTAAACGAGGCAAATCAAAAGTACTCGGCTCGAATGGGGTTTACGGGGAT
GTGACTCATTCATTCGTGCGAAGCAACGAGAGGTAACAATAACCATTGCATTTCCACAGA
GGAGACATGCGTTGTACGCCCGGTACTTGCTTGGAAGCAAATAATTACCAGCGCCACAAC
ACCTGACGCGACTCACAGGGTGGCGTAGTGCCATATCATTGTGCTGACCCAACTACAAGC
GAGTCCCCAGGTCACGGTATCGTTAATTTACATATGGGGACGCGTCACTAAAGTAGATCC
GAGACGGTCAATGGCACCCGTAGCTATAATACACCACTGGGTGCTCTGAGCGTCGTCCAT
GTAGATACTCTCGTCTGCCTCCTATTGTCGGTCGGGACGCTGGTGGTCCAAAGACAACGG
ACGTGGCAGGAGAGCGCATCGGCCATGGTGGGAATCGTAAACCTCCATACCGCCGGATGG
GTGCTTCGACGGAGCTGTACTCACGTGGGGTATCATTCGCGTGTAGCTGTCGTTGACCCG
TCGTAAGTT
>Rosalind_0661,
GAGAATCGCCCCAGGCAATATGCTTGACCGACGGTCTGTCTGAGCATAGATATTCCACGA
GCCGGTGTAGCGTCGATAAGCCGACAAGTAGGTTTTAGAGCTTCCTGCCATCCAAAGCTC
ACCCGCATGTCCTCCGTAAGACCACGCATGAGCGACATCGGCTTTTCACATATGGGCGCG
CAGAAACAAAGCGACGTCTCAGGCCAATGTTCCTGGATCGTTGCGACCCGTATGGGAGGT
TGGGTGCTTCACACAGAGTCCGGAGCAAACTCTCCTTTTCCGACTTCTTGTGCCTGCGTC
GCTTAAATTATATTTTTCTTGACACATTCTTACCGGCTGGACAGACAACCTTTTACCGGG
GGCCCGTAACGTCGAACTGGAACTAGATAGGACAAGGACGGGTACCAATTTATCCGCTAG
CCAGCGCTAAGTCTCTATGTCTGCAGGTAGCGAGACGGTTTCTGCAATCTGCCTTTCGAG
CCTCGCCTTCGTGAAAGCATCCGCCTAATCATTTGTGCATCCTTTCAACGTTAATACCCA
AGCCGATCTACTGAATCAGCGTATGGGCGTTGCCATATGCTACTTAGAACGTAATGAGGC
TTACGTGAAAGAGCAAGCCCCTCGTCGAATAGGCCCCCAATGATGACTGATCATCACGGT
AAAGCATGGTTGGATTCAATCCGTGAGCTATACCCCGGCTGATTGTCACAAGAGCGGCCC
GGTGGCAAATCGAGACCTTGATTTGCCTATATTGATGATTGGTGACAACTGCATCGGACC
CTATACTCTTGCCTGATCGGATAGCGCTGACGGCGTCGAGCCTCTGACTCATAACGAGAA
AAGAGATAAAAAGGGTGTCAAAGATGCCTTAATTCTGGCGACCTCATGTATACCGCCCCA
GGTCACGAACGATATTCCCTGGAACCACCGCCACATAGAGGTAGATGTTGGTCTCGCATC
ATACGAAGG
>Rosalind_8821,
AGCCCCTTGGAATTCTGCATAGGGGTGAGCCTTAATGCGACGTAATCTAATAATACCGAT
GCACTGCTTAAGCTTACAACTAATTTTTACGGTCCATTGTCTTATCATGTCGCAATTTTC
TATTCGATGCGAGAGATTAGATAGCGACTATCACCTATGCTCTGTCGATAGCGCAGGGCG
CGTCCCGAGAGCCCTGTAACTCGAGACTATCCATCCGAGTGCCTTCCTTCTGCGGAACGC
TGCCCAGCCACATGATAGGTGCGTCGGTGGAGTAGCCTCTAATACGCCTCGCCGTTAGGA
GATTTTCGTAGCTAGATTTTTGCTCATTGTGCCATTGAGCTGATTCCACACCGAGTTCAA
TTAAATGGTCGGAGATTACGTGGTAATACTCCTAGGATCTGTAGCGTGTAATAGGGAGTA
GAGACCAAAGGCCGAAGTCGCCATGTCGCACACTAATAGGGTAAGTCACACGGCATCTCT
TTGGGTCACTGTGGAGGTGGTAAAGACCCTCGTCACTCCATCTCTAGTCGTCGTGTCAAC
GATTGTTTTATTTAGACTCACCTCATATCACTAAGATGTATTCAGTGCTCCTTGCGCCGG
GATTGTATCGAAAGAACAGAACGGGAGTCAGTCCTCCAGTGACGCTATGAAATTCACATC
CGTCGATTTATCAATACGCGTTGGCTCTAAGTGGCATTCAATTTCCTGTAATTACACGGG
TATGGCGAGGGTTATATATACCCCGTTGCTTCCCTGTGTTTCCTTCGATGGTAGTTGCGA
GAGCCTGTATGTTGCGGCCGCTTATGTTTGGGCTGAAGGCCCATCTTAGCTAAGGCTATG
AGTGAGTGAAGCCTGTCCGTGCAATCTGCCTAATATGATTACCATCGAACGATGAGTGTC
CTCTGGTGCTTATGGGCAATTCAGACTAGACTCCAACAGATTCGGTGTCTCAGGGACTCG
CGTATTACC
>Rosalind_7321,
GTTATATCTTCCCGGGTTACTGTTAAGCACCGTCACCGGAAAATTTGAATAGGTGTGCGG
CACTCAGATAATTCACGCGGCAGCGAAACCTAGGAAGCACAAGAATCGTTATGGTACGTA
CTTGACGGAGATAATATACCGTGATGAATACACGCTCTGTACCTCAAAATCATTAAACAT
CCTCGACAAAAAATGTGTCCATTTGAGTTCGTAGCCTGCGACGTTAACCCGGGTTAAGTT
AGATCCGGGAGAGCAGAACTAACCCGGACTAATGCAAACCCTTGTGGCGCCACGCGGGCA
TAGTGTAGTACCCACACGCACGAAAGGCATGAATAGGGGGCGAAGCTACCACCTCGATAT
GCCCGAATCACCTCTCAAAACGCGGTAGCCCGCTCACATGCTTCGGGAGAAATTGAGGGG
CTTGGTCGCAAAAACCGTCTATTGAGTCGAAGGTATTTATGGGACACAGGGCACAAAAGC
AGGGCTTGACGTTCGAACTTCGTCTTCACTCTTGTACGGTATACAGTAGCATGAAATCTC
CGTAACATCGAGGTAATCCTTCTGTCGACAACGTTGTTAAAATGTTACACGCTCAGGCGA
CCCGGGCCTGAGGTGGTGTCAGGCTTCACACCGTCCGTCGTTAGAGTGGTTCTACCAGTG
TGTGCTGCGTTTGGGGAAAGCAGCTACATTACCCCAATGCAGAGTAGATCGTGCGCGACT
ACGCAGCTACGGACCGCGGCTCTAGAATATAATTTTGATCGTGAGTGACACTCTAAAACC
CGGGGGTTAATTCTAGCGTTAGAGAGCCATGAACGTTGTACCTGTCTGGTATGAAATATC
AAAAGTTATAAATTGCGCTGCTGACTGTTCTTTGCACTCACCCGCTGCTGCACTAACTAG
AAGAATCCACAGCCGAACCTGGCTAAAGGGGCGATTGAAGGTGCGCGCGTCTTGCGCTAA
GAAGCTGCC"""


In [85]:
consensus(fastaString)

CAATCATGATCGTGGATTACTGGCTTAACTACTCACGAATTATACACCGGGAATGTTCATTCCGGTGATAAGTCAGAAAGTAGTGGAACAGGGTAATAAGATTATCACACCACACACCTACCCGACAAGCGATACGCACGAAGACGATTACCCGCAATACGATAGCCATAGACAAGACCTCTACCATAAAAGTATAGCCCGTCACACTTTCCACAGCTCGCCCACAACTGTAAGGCATCTTGCTTACCGCAGAAAGAGCTCCCACTCATGACTGCAAACCAGTCAGCTTCGAACTGCGGACAGTAACGTACCTTGATGCTACCACATTGTTCAGTCCCGGAGAAGCCACGAACTTGAGAAGCTCCGATAAGTCGAGAATGGGCCAGAACTCCCAAAAACTCTCCCCGACGTAACAGGGAACCGGCTAAAAGTATATATCGCTGTACAAGAAATCACGGCGGGCAGACAGTGCATAACACTACGACTATCCCTGCACGACCTACCAACCCCCTTTTACAGTCCTCAATAACTAGAGACAAATGCTATATCAAGAAGATCCACGGCTCGAGAATAAAATGAATATATAGCAAAATATGCAACCCTGGTCGCGAAGCACACGCCGGGTTCTCACCCTCCCCAAGTATAAATGCAAATCCAATTGAGTCATCCAGCAAGGCAAGCAGTTAAGTATTCGCAACCAATCGCAAATAATGACACCCCGCAGGCGACACGAGTGCGCATTTAGATTATTAGCTTGGTGGGGACCGACCGAAGTAGAATCAGCAATTATGTGTAAAGCGAGAACGGTTAGGACGTAGCCCCTATGTCCAAAAACAAATGAATGAAAAAAAGGTGTCGGCCTATCATGATAATTCTGGGAACCTCAGTAATACCGGCTCGGATCACCGACGATGAGCCAGACAAATAGCGCTACATTAAGGTTCGGGGGCCATTCCCAAGGCAAGAGCC
A: 2 5 3 3 1 4 2 3 3 2 2 3 1 1