# Problem 17: Inferring mRNA from Protein
Problem
For positive integers a and n, a modulo n (written a mod n in shorthand) is the remainder when a is divided by n. For example, 29 mod 11 = 7 because 29 = 11 × 2 + 7.

Modular arithmetic is the study of addition, subtraction, multiplication, and division with respect to the modulo operation. We say that a and b are congruent modulo n if a mod n = b mod n; in this case, we use the notation a ≡ b mod n.

Two useful facts in modular arithmetic are that if a ≡ b mod n and c ≡ d mod n, then a + c ≡ b + d mod n and a × c ≡ b × d mod n. To check your understanding of these rules, you may wish to verify these relationships for a = 29, b = 73, c = 10, d = 32, and n = 11.
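The suggested check can be run directly. This is a minimal sketch using the values from the paragraph above; the names `sum_residues` and `product_residues` are illustrative only.

```python
# Values suggested in the problem statement.
a, b, c, d, n = 29, 73, 10, 32, 11

# Premises: a ≡ b (mod n) and c ≡ d (mod n).
assert a % n == b % n == 7
assert c % n == d % n == 10

# Consequences: the sums and the products are congruent as well.
sum_residues = ((a + c) % n, (b + d) % n)
product_residues = ((a * c) % n, (b * d) % n)
print(sum_residues)      # (6, 6)
print(product_residues)  # (4, 4)
```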

As you will see in this exercise, some Rosalind problems will ask for a (very large) integer solution modulo a smaller number to avoid the computational pitfalls that arise with storing such large numbers.
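The multiplication rule above means a long product can be reduced after every step instead of once at the end. The sketch below compares the two approaches; the function names and the list of factors are hypothetical and chosen only for illustration.

```python
import math

MOD = 1_000_000

def product_mod_at_end(factors, mod=MOD):
    """Multiply everything first, then reduce once (the intermediate value grows)."""
    return math.prod(factors) % mod

def product_mod_each_step(factors, mod=MOD):
    """Reduce after every multiplication so the running value stays below mod."""
    result = 1
    for f in factors:
        result = (result * f) % mod
    return result

# Hypothetical factors: 1000 small integers.
factors = [6, 4, 2, 3] * 250
assert product_mod_at_end(factors) == product_mod_each_step(factors)
```

Python integers have arbitrary precision, so both versions give the same answer here; in a language with fixed-width integers, only the step-by-step version avoids overflow.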

Given: A protein string of length at most 1000 aa.

Return: The total number of different RNA strings from which the protein could have been translated, modulo 1,000,000. (Don't neglect the importance of the stop codon in protein translation.)

Sample Dataset
>MA

Sample Output
>12
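The sample output can be checked by hand. In the standard genetic code, M is encoded only by AUG, A is encoded by four codons (GCU, GCC, GCA, GCG), and translation can end on any of three stop codons (UAA, UAG, UGA), so the count is the product of these choices:

$$
\underbrace{1}_{\text{M}} \times \underbrace{4}_{\text{A}} \times \underbrace{3}_{\text{Stop}} = 12, \qquad 12 \bmod 1{,}000{,}000 = 12.
$$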

In [33]:
codons = """UUU:F\nCUU:L\nAUU:I\nGUU:V
UUC:F\nCUC:L\nAUC:I\nGUC:V
UUA:L\nCUA:L\nAUA:I\nGUA:V
UUG:L\nCUG:L\nAUG:M\nGUG:V
UCU:S\nCCU:P\nACU:T\nGCU:A
UCC:S\nCCC:P\nACC:T\nGCC:A
UCA:S\nCCA:P\nACA:T\nGCA:A
UCG:S\nCCG:P\nACG:T\nGCG:A
UAU:Y\nCAU:H\nAAU:N\nGAU:D
UAC:Y\nCAC:H\nAAC:N\nGAC:D
UAA:Stop\nCAA:Q\nAAA:K\nGAA:E
UAG:Stop\nCAG:Q\nAAG:K\nGAG:E
UGU:C\nCGU:R\nAGU:S\nGGU:G
UGC:C\nCGC:R\nAGC:S\nGGC:G
UGA:Stop\nCGA:R\nAGA:R\nGGA:G
UGG:W\nCGG:R\nAGG:R\nGGG:G"""

# Map each amino acid (or "Stop") to the list of codons that encode it.
codon_dict = {}
for x in codons.split("\n"):
    codon, amino_acid = x.split(":")
    codon_dict.setdefault(amino_acid, []).append(codon)

In [34]:
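def inferRNA(proteinSeq):
    # Count the RNA strings that translate to proteinSeq, modulo 1,000,000.
    possibleSeq = 1
    for aa in proteinSeq:
        # Each residue multiplies the count by its number of synonymous codons;
        # reducing at every step keeps the running product small.
        possibleSeq = possibleSeq * len(codon_dict[aa]) % 1000000
    # The RNA string must end with one of the stop codons (three in this table).
    return possibleSeq * len(codon_dict["Stop"]) % 1000000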
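For comparison, the same count can be computed without a hand-typed codon table. This is a self-contained sketch, not the notebook's method: it assumes the standard genetic code written as a 64-letter string in U, C, A, G codon order, and the names `BASES`, `AMINO_ACIDS`, `codon_table`, `codon_counts`, and `count_rna_strings` are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, one letter per codon, with codons enumerated in
# U, C, A, G order at each position; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Pair each of the 64 codons with its amino acid letter.
codon_table = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons that encode each amino acid (and the stop signal).
codon_counts = Counter(codon_table.values())

def count_rna_strings(protein, mod=1_000_000):
    """Count RNA strings that translate to `protein`, reduced modulo `mod`."""
    total = codon_counts["*"] % mod  # choices for the terminating stop codon
    for aa in protein:
        total = total * codon_counts[aa] % mod
    return total

print(count_rna_strings("MA"))  # 12
```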

In [35]:
inferRNA("MA")

12

In [36]:
actualInput = "MQSPWAWYSMKELVGHILNPSEVGYRYCALYDCCRMYVGAHFLVAITWESQHEKNLYKGLWPWQLICHASSQQMMGFFPEPVDVFVEDQQDEMHAPHNENALLMGTWSSNQYVTNKLWDALMHKDQIKPPMWPGPEWFDWWASEGCMMFQSFLKQMMRWVRSPWIENLIYPLHRIWANDLCCFRMHWKGDERDPDFMLDTRWSYYVARLLNMKCKTTLWNCWRGTRCYWFNHPVMVRYYRHRLPHPDIQRGDKMCDYTPRAAWYASKHEFMRCEAGMYPLCYMKPYDWSTHDWRDWTSVDLCVGWQRHHHTNELYESGCQTEWRSPTLFDYNCPVYSNPFNFGPKACKKEWDTRHQRNWWPFRDNIGRGTCDFSFVGSRDRMWGCCMTSMIWVFPDPWCGTVFMMKTQFYQTSGFCKENYRKMDLMLGYHLTYPASCMGVYFHVEYPTPAHRRKMWAMAYTHCFQTKNLVVLHWENSGKVAPLHPVADFVCLELPDVQETLEPIARTCHMMFILKFGCNMTNDTPSIRWEEIAIYDIRWKFGMVTFQIPIYLPGAPPVEDSYMRMRFLPYKHEDQYPKVFSGKMQSKMHSMQVGWGEFVHVWCHWRKPMMAKHAGYAEAATGLNWMFYQKRPHFIFIQLEAADKEGCAPCLNLCYYGVYPCDYMGGWWHVMSEQWEVKKFRQDQLMEPKMKKENWDHHHMGHGWAWQCICEWPANDGEARKMDFGFPYTYWWIFKYYILKASHNHKGAITFFNIQCADHKDCIHCQHSLPAEHYLMHVKIQRSGQSNVMQDNIKLPLHYDHMINACYFAMPYCGDTFRMADPWCDDEMVISGEDQDIWYGQNMPIMDRMMKNPDKCGKGSCCIFHDFMTVACHNNRHQYNNTRDMLPVNHTHMHAHTKEHPHRFYFHPDVHMCCEPHDSCKSNALELHNDHTMMGINEIQLFGAMLILWLIPGLKSCCNQYGYGSMIYCEDIDGQLATATWMVTWGMPYQEDIPWMPE"

In [37]:
inferRNA(actualInput)

715776