# Problem

For positive integers $a$ and $n$, $a$ modulo $n$ (written $a \bmod n$ in shorthand) is the remainder when $a$ is divided by $n$. For example, $29 \bmod 11 = 7$ because $29 = 11 \times 2 + 7$.
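The example above can be checked directly in Python, where `%` gives the remainder for positive integers and the built-in `divmod` returns quotient and remainder together. This is only a minimal sketch; the variable names are illustrative.

```python
# Remainder of 29 divided by 11, using Python's built-in divmod and %.
a, n = 29, 11
quotient, remainder = divmod(a, n)

# Reconstruct a from the quotient and remainder: 29 = 11 * 2 + 7.
assert a == n * quotient + remainder
assert a % n == remainder

print(remainder)
```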

Modular arithmetic is the study of addition, subtraction, multiplication, and division with respect to the modulo operation. We say that $a$ and $b$ are congruent modulo $n$ if $a \bmod n = b \bmod n$; in this case, we use the notation $a \equiv b \bmod n$.

Two useful facts in modular arithmetic are that if $a \equiv b \bmod n$ and $c \equiv d \bmod n$, then $a + c \equiv b + d \bmod n$ and $a \times c \equiv b \times d \bmod n$. To check your understanding of these rules, you may wish to verify these relationships for $a = 29$, $b = 73$, $c = 10$, $d = 32$, and $n = 11$.
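The suggested check can be done by hand or in a few lines of Python. The sketch below first confirms the two premises for the values given in the text, then tests the sum and product rules; the names `sum_ok` and `product_ok` are illustrative.

```python
# Values suggested in the text.
a, b, c, d, n = 29, 73, 10, 32, 11

# Premises: a is congruent to b, and c is congruent to d, modulo n.
assert a % n == b % n   # both leave remainder 7
assert c % n == d % n   # both leave remainder 10

# Conclusions: sums and products are congruent as well.
sum_ok = (a + c) % n == (b + d) % n          # 39 and 105 both leave 6
product_ok = (a * c) % n == (b * d) % n      # 290 and 2336 both leave 4

print(sum_ok and product_ok)  # True
```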

As you will see in this exercise, some Rosalind problems will ask for a (very large) integer solution modulo a smaller number to avoid the computational pitfalls that arise with storing such large numbers.
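The product rule above is what makes this practical: a long product can be reduced modulo $n$ after every multiplication, so the intermediate value never grows beyond $n^2$. The following is a sketch under assumptions, not part of the original solution; `product_mod` and the list of factors are hypothetical. Python integers have arbitrary precision, so the full product is computed here only to confirm that both routes agree.

```python
MODULUS = 1000000

def product_mod(factors, modulus=MODULUS):
    """Multiply the factors, reducing modulo `modulus` after every step."""
    result = 1
    for factor in factors:
        result = (result * factor) % modulus
    return result

# Hypothetical factors, chosen only to make the full product very large.
factors = [6, 4, 2, 3] * 50

# Reference value: build the full product, then reduce once at the end.
full_product = 1
for factor in factors:
    full_product *= factor

print(product_mod(factors) == full_product % MODULUS)  # True
```

In languages with fixed-width integers, the step-by-step reduction is what keeps the computation from overflowing.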

**Given**: A protein string of length at most 1000 aa.

**Return**: The total number of different RNA strings from which the protein could have been translated, modulo 1,000,000. (Don't neglect the importance of the stop codon in protein translation.)
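As a worked instance of the count, take the two-residue protein MA. In the standard genetic code, M is encoded by one codon (AUG), A by four (GCU, GCC, GCA, GCG), and translation must end with one of three stop codons (UAA, UAG, UGA), so the number of candidate RNA strings is

$$
\underbrace{1}_{\text{M}} \times \underbrace{4}_{\text{A}} \times \underbrace{3}_{\text{stop}} = 12, \qquad 12 \bmod 1{,}000{,}000 = 12.
$$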

Sample Dataset

In [5]:
DNA_codon_table = {
    'TTT': 'F',     'CTT': 'L',     'ATT': 'I',     'GTT': 'V',
    'TTC': 'F',     'CTC': 'L',     'ATC': 'I',     'GTC': 'V',
    'TTA': 'L',     'CTA': 'L',     'ATA': 'I',     'GTA': 'V',
    'TTG': 'L',     'CTG': 'L',     'ATG': 'M',     'GTG': 'V',
    'TCT': 'S',     'CCT': 'P',     'ACT': 'T',     'GCT': 'A',
    'TCC': 'S',     'CCC': 'P',     'ACC': 'T',     'GCC': 'A',
    'TCA': 'S',     'CCA': 'P',     'ACA': 'T',     'GCA': 'A',
    'TCG': 'S',     'CCG': 'P',     'ACG': 'T',     'GCG': 'A',
    'TAT': 'Y',     'CAT': 'H',     'AAT': 'N',     'GAT': 'D',
    'TAC': 'Y',     'CAC': 'H',     'AAC': 'N',     'GAC': 'D',
    'TAA': '-',     'CAA': 'Q',     'AAA': 'K',     'GAA': 'E',
    'TAG': '-',     'CAG': 'Q',     'AAG': 'K',     'GAG': 'E',
    'TGT': 'C',     'CGT': 'R',     'AGT': 'S',     'GGT': 'G',
    'TGC': 'C',     'CGC': 'R',     'AGC': 'S',     'GGC': 'G',
    'TGA': '-',     'CGA': 'R',     'AGA': 'R',     'GGA': 'G',
    'TGG': 'W',     'CGG': 'R',     'AGG': 'R',     'GGG': 'G'
}
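def enumerateRNA_protein(aa_sequence):
    # One table entry per codon, so counting a letter gives its number of codons.
    AAs = list(DNA_codon_table.values())  # list() so .count() also works on Python 3
    possibilities = 1
    for i in aa_sequence:
        possibilities = possibilities * AAs.count(i)
    # Any of the three stop codons ('-' in the table) can end translation.
    possibilities = possibilities * 3
    return possibilities % 1000000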

In [6]:
enumerateRNA_protein("MA")

12

In [7]:
test = "MAQWDLYLNIWITHVHQLLQGPYWQTAKSQICNYAVINLKAWVTKRWDSLKWFTWHNNPECMSERDQHDEQITCMVHINFKPFKCFYKDHALSYYRYCFKWWFKMLHMFLVCKMGFCIDQTRPQEKCPMCITCVNRHGEGENTQWNVACGHMMNTRWFQVLPFHAHMEGDVPTPYTIRENGPSSLMYWKQRNYQKTKRDLRCYMPQYSNIQLCSVCYDLLPGDKVNKAKITYVPCQMYYDFSYAPCGQVFNVFNVFVPAEHNYRGEAHSYHYASDQKPIRTQQRTAYYMYIKRRSNSVVRKKWWEMGAHPQENEHYHDYGSCGFMSSIFWALMFLMFRSCMDWMVTYAGDHQTVIYWYSFFWVWLCMTRATRNCGGEPSPPEYIHCLVNIKYMPVNSWDPMVHERHCNPVMTQTGFHFQVMDDCPSTAMEWWRFNWTKVFSNNCNIHGWLMNHHVIKFHVEDKFCGLGFKTTCARPERKNVIINWDESAHHNEHMKGSLWVATGTRNYTRWDHWTGPMRQTHGPGLFASVWHHQCTWPIIYCPDTWRYAKVEFIPSYANYNRCTDGQTSRHLCHFYSNPRFVACRATHKLTMMGTGRYDKWGYEQSIDDFIYACLHSDLSWHIPISTYDDLGVGDENWNVTYTAIKGKVEMQWDSSARKYPPQDYHYSWINYWTFWKNWKHRKGRAQWLSMFDHGVACTHKFYFAVTAFQWTFCEDWYHITCLPNWYRYWVGCQAYGQYWFNWDIKLDSHEIWLLEIGARIQHSNDVPLNWSGANIRHWYEWGKQYQWDINRFGLKGAKRRIPDKCEWHRHTKHEVFIWRHTHITTWCGVTCYMQYKALHPAWTLHGHDKQYTGNAYETEFWTRNGFYFAFEIAPNPLERHVDFYVVPPCFQDAVTWPWINIHDFMATTKLRVHIQECMPDYIFGDALYQKGRVAQCYDATHVRTPAQAIHLSCGMQLCFMVNQKVSNHWSFDKPQTPHEFAQQKTCTFWIADAIVQY"

In [9]:
print(enumerateRNA_protein(test))

570176
