# Creating transcription and translation functions

In this notebook you will learn how to create a function that takes a DNA sequence and outputs the corresponding RNA sequence (transcription) and a function that takes an RNA sequence and outputs the corresponding peptide sequence (translation).

### *Pair programming: Choose a typing partner and a guiding partner for the first function!*

In the "Central Dogma" game you only transcribed and translated the first 10 codons of each protein. For this coding exercise you will use the entire DNA sequences (400-1,000 base pairs!). 
<br>
<br>
The sequences are saved as text files, so you will need to use the open() function to open each file and assign the resulting file object to a variable, one variable per gene. 

In [1]:
# Create a variable for each gene and use the open() function 
# to open each text file and assign its contents to a variable
HisDNAFile = open("Histamine_receptor_DNA.txt")
KerDNAFile = open("Keratin_DNA.txt")
ColDNAFile = open("Collagen_DNA.txt")

# print out the type of each variable
print(type(HisDNAFile))
print(type(KerDNAFile))
print(type(ColDNAFile))

<class '_io.TextIOWrapper'>
<class '_io.TextIOWrapper'>
<class '_io.TextIOWrapper'>


You need to "read" the sequence from each file object you just created and store it in a new variable so that you have a string you can work with. 

In [2]:
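# Read the sequences into new variables. Hint: you will need to use the .read() method.
HisDNA = HisDNAFile.read()
KerDNA = KerDNAFile.read()
ColDNA = ColDNAFile.read()

# Close the files now that their contents are stored as strings
HisDNAFile.close()
KerDNAFile.close()
ColDNAFile.close()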

# print out the type of each new variable
print(type(HisDNA))
print(type(KerDNA))
print(type(ColDNA))

<class 'str'>
<class 'str'>
<class 'str'>
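
A `with` block is another common way to open a file: Python closes the file automatically when the block ends. The sketch below is only an illustration. It creates a small demo file (the name `demo_DNA.txt` and its contents are made up) in a temporary folder so that it runs on its own; in the notebook you would open one of the gene files instead.

```python
import os
import tempfile

# Hypothetical demo file so this example runs on its own;
# in the notebook you would open "Histamine_receptor_DNA.txt" instead.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_DNA.txt")
with open(demo_path, "w") as demo_file:
    demo_file.write("TACCGT\n")

# The file is closed automatically when the "with" block ends
with open(demo_path) as dna_file:
    demo_DNA = dna_file.read()

print(type(demo_DNA))    # <class 'str'>
print(dna_file.closed)   # True
print(repr(demo_DNA))    # repr() makes the newline at the end of the file visible
```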


Now that we've assigned our sequences to variables, we can get started on the first function, "transcription". This function will take a DNA sequence as an argument and return the corresponding RNA sequence.
<br>
<br>
Remember the base-pairing rules: A -> U, T -> A, G -> C, and C -> G.

In [3]:
# Create your transcription function here. Hint: You will need a "for loop" and conditionals.
def transcription(DNAseq):
    RNA = ""
    for base in DNAseq:
        if base == "C":
            RNA += "G"
        elif base == "G":
            RNA += "C"
        elif base == "A":
            RNA += "U"
        elif base == "T":
            RNA += "A"
        else:
            # Skip anything that is not A, T, G, or C (such as the newline at the end of the file)
            continue
    return RNA

In [4]:
# Call your function for the first sequence and print out the results
print(transcription(HisDNA))

AUGGCACCCAAUGGCACAGCCUCUUCCUUUUGCCUGGACUCUACCGCAUGCAAGAUCACCAUCACCGUGGUCCUUGCGGUCCUCAUCCUCAUCACCGUUGCUGGCAAUGUGGUCGUCUGUCUGGCCGUGGGCUUGAACCGCCGGCUCCGCAACCUGACCAAUUGUUUCAUCGUGUCCUUGGCUAUCACUGACCUGCUCCUCGGCCUCCUGGUGCUGCCCUUCUCUGCCAUCUACCAGCUGUCCUGCAAGUGGAGCUUUGGCAAGGUCUUCUGCAAUAUCUACACCAGCCUGGAUGUGAUGCUCUGCACAGCCUCCAUUCUUAACCUCUUCAUGAUCAGCCUCGACCGGUACUGCGCUGUCAUGGACCCACUGCGGUACCCUGUGCUGGUCACCCCAGUUCGGGUCGCCAUCUCUCUGGUCUUAAUUUGGGUCAUCUCCAUUACCCUGUCCUUUCUGUCUAUCCACCUGGGGUGGAACAGCAGGAACGAGACCAGCAAGGGCAAUCAUACCACCUCUAAGUGCAAAGUCCAGGUCAAUGAAGUGUACGGGCUGGUGGAUGGGCUGGUCACCUUCUACCUCCCGCUACUGAUCAUGUGCAUCACCUACUACCGCAUCUUCAAGGUCGCCCGGGAUCAGGCCAAGAGGAUCAAUCACAUUAGCUCCUGGAAGGCAGCCACCAUCAGGGAGCACAAAGCCACAGUGACACUGGCCGCCGUCAUGGGGGCCUUCAUCAUCUGCUGGUUUCCCUACUUCACCGCGUUUGUGUACCGUGGGCUGAGAGGGGAUGAUGCCAUCAAUGAGGUGUUAGAAGCCAUCGUUCUGUGGCUGGGCUAUGCCAACUCAGCCCUGAACCCCAUCCUGUAUGCUGCGCUGAACAGAGACUUCCGCACCGGGUACCAACAGCUCUUCUGCUGCAGGCUGGCCAACCGCAACUCCCACAAAACUUCUCUGAGGUCCAACGCCUCUCAGCUGUCCAGGACCCAAAGCCGAGAACCCAGGC
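
A full gene is too long to check by eye, so it can help to try the function on a sequence short enough to transcribe by hand. The sketch below repeats the same transcription logic so that it runs on its own; the 6-base test sequence is made up for illustration.

```python
# Copy of the transcription logic above, repeated so this example runs on its own
def transcription(DNAseq):
    RNA = ""
    for base in DNAseq:
        if base == "C":
            RNA += "G"
        elif base == "G":
            RNA += "C"
        elif base == "A":
            RNA += "U"
        elif base == "T":
            RNA += "A"
    return RNA

# Hypothetical 6-base test sequence, short enough to transcribe by hand
test_DNA = "TACCGT"
test_RNA = transcription(test_DNA)
print(test_RNA)  # AUGGCA

# A newline at the end of the sequence is skipped, so the result is the same
print(transcription("TACCGT\n") == test_RNA)  # True
```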

You can save your results as a text file! First, open a new text file with whatever name you would like to give your results. Include the "w" argument to open the file for writing.

In [5]:
# Example: f = open("HisRNA.txt", "w")
f = open("HisRNA.txt", "w")

Next, use the .write() method with your function call as the argument to write the RNA sequence to the file, then close the file.

In [6]:
# Write a new text file!
f.write(transcription(HisDNA))
f.close()

Check your downloads folder (or wherever you've put this Jupyter Notebook file) to see if you've created a new text file.
<br>
<br>
Now create text files for the other two RNA sequences (you'll need to call your transcription function with a new argument each time).

In [7]:
f = open("KerRNA.txt", "w")
f.write(transcription(KerDNA))
f.close()

f = open("ColRNA.txt", "w")
f.write(transcription(ColDNA))
f.close()
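
The open, write, close steps repeat for every file, so they can also be sketched as a loop. This is only one possible approach: the short RNA strings and the `_demo` file names below are made-up stand-ins for your real results, and the files go into a temporary folder so that nothing you have already saved is overwritten.

```python
import os
import tempfile

# Hypothetical short RNA strings standing in for transcription(HisDNA), etc.
results = [("HisRNA_demo.txt", "AUGGCA"),
           ("KerRNA_demo.txt", "AUGACC"),
           ("ColRNA_demo.txt", "AUGUUC")]

# Write into a temporary folder so the demo does not overwrite your real files
out_dir = tempfile.mkdtemp()

for filename, sequence in results:
    path = os.path.join(out_dir, filename)
    with open(path, "w") as f:   # the file is closed automatically after the block
        f.write(sequence)

print(sorted(os.listdir(out_dir)))
```

Each item in the list is a pair, and `for filename, sequence in results` unpacks the pair into two variables on every pass through the loop.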

### *Pair programming swap: Typing partner becomes the guiding partner and vice versa*

Congratulations! You've transcribed DNA sequences and created mRNA. Now it's time to translate that message and find the amino acid sequence of the protein.
<br>
<br>
Here's a chart to help you out:

<img src="https://archive.manylabs.org/file/lessonMedia/69/geneticCode.png" width="500px" height="500px" align="left" />

You need to create a function that takes an RNA sequence as its argument and returns the correct amino acid sequence.

Here are some hints:
- `x % y` finds the remainder of x divided by y
- You can check for more than one option in a conditional like this: `if x < 3 or x > 6:`
- You will need a counter, a "for loop", and conditionals
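
Here is a minimal sketch of how the counter and `%` hints fit together to split a sequence into codons. The 9-base sequence and the variable names are made up for illustration, and this is only one way to organize the loop.

```python
# Hypothetical 9-base RNA sequence (three codons)
demo_RNA = "AUGGCACCC"

counter = 0
codon = ""
codons = []
for base in demo_RNA:
    counter += 1
    codon += base
    if counter % 3 == 0:      # remainder 0 means a third base was just added
        codons.append(codon)
        codon = ""            # start collecting the next codon

print(codons)  # ['AUG', 'GCA', 'CCC']

# Checking more than one option in a single conditional with "or"
if codons[1] == "GCA" or codons[1] == "GCC":
    print("A")  # both codons code for alanine
```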

In [8]:
# Create your function here

def translation(RNAseq):
    c = 0
    prot = ""
    codon = ""
    for base in RNAseq:
        c += 1
        if c % 3 == 0:
            codon += base
            if codon == "UUU" or codon == "UUC":
                prot += "F"
            elif codon == "UUA" or codon == "UUG" or codon == "CUU" or codon == "CUC" or codon == "CUA" or codon == "CUG":
                prot += "L"
            elif codon == "AUU" or codon == "AUC" or codon == "AUA":
                prot += "I"
            elif codon == "AUG":
                prot += "M"
            elif codon == "GUU" or codon == "GUC" or codon == "GUA" or codon == "GUG":
                prot += "V"
            elif codon == "UCU" or codon == "UCC" or codon == "UCA" or codon == "UCG":
                prot += "S"
            elif codon == "CCU" or codon == "CCC" or codon == "CCA" or codon == "CCG":
                prot += "P"
            elif codon == "ACU" or codon == "ACC" or codon == "ACA" or codon == "ACG":
                prot += "T"
            elif codon == "GCU" or codon == "GCC" or codon == "GCA" or codon == "GCG":
                prot += "A"
            elif codon == "UAU" or codon == "UAC":
                prot += "Y"
            elif codon == "UAA" or codon == "UAG" or codon == "UGA":
                prot += "STOP"
            elif codon == "CAU" or codon == "CAC":
                prot += "H"
            elif codon == "CAA" or codon == "CAG":
                prot += "Q"
            elif codon == "AAU" or codon == "AAC":
                prot += "N"
            elif codon == "AAA" or codon == "AAG":
                prot += "K"
            elif codon == "GAU" or codon == "GAC":
                prot += "D"
            elif codon == "GAA" or codon == "GAG":
                prot += "E"
            elif codon == "UGU" or codon == "UGC":
                prot += "C"
            elif codon == "UGG":
                prot += "W"
            elif codon == "CGU" or codon == "CGC" or codon == "CGA" or codon == "CGG":
                prot += "R"
            elif codon == "AGU" or codon == "AGC":
                prot += "S"
            elif codon == "AGA" or codon == "AGG":
                prot += "R"
            elif codon == "GGU" or codon == "GGC" or codon == "GGA" or codon == "GGG":
                prot += "G"
            # Reset for the next codon (an unrecognized codon adds nothing to prot)
            codon = ""
            c = 0
        else:
            codon += base
    return prot


Call your function for one of your RNA files (You can either read in one of the text files you just created or just call your first function to get an RNA sequence). Print out the results!

In [9]:
print(translation(transcription(HisDNA)))

MAPNGTASSFCLDSTACKITITVVLAVLILITVAGNVVVCLAVGLNRRLRNLTNCFIVSLAITDLLLGLLVLPFSAIYQLSCKWSFGKVFCNIYTSLDVMLCTASILNLFMISLDRYCAVMDPLRYPVLVTPVRVAISLVLIWVISITLSFLSIHLGWNSRNETSKGNHTTSKCKVQVNEVYGLVDGLVTFYLPLLIMCITYYRIFKVARDQAKRINHISSWKAATIREHKATVTLAAVMGAFIICWFPYFTAFVYRGLRGDDAINEVLEAIVLWLGYANSALNPILYAALNRDFRTGYQQLFCCRLANRNSHKTSLRSNASQLSRTQSREPRQQEEKPLKLQVWSGTEVTAPQGATDRSTOP


Now create text files for each of your amino acid sequences.

In [10]:
f = open("Hisprot.txt", "w")
f.write(translation(transcription(HisDNA)))
f.close()

f = open("Kerprot.txt", "w")
f.write(translation(transcription(KerDNA)))
f.close()

f = open("Colprot.txt", "w")
f.write(translation(transcription(ColDNA)))
f.close()

### *Bonus challenge 1: choose a partner to type and one to guide*

Not all DNA codes for a protein. The molecular machinery inside a cell uses a special codon called a "start" codon to know when to start translating the mRNA into a protein. For most genes this codon is "AUG".

Can you modify your translation function to scan through your RNA sequence until it finds an "AUG" codon before it starts creating an amino acid string?

Copy and paste your translation function below and make any necessary changes.

In [None]:
# Hint: != means "does not equal"

def starttranslation(RNAseq):
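    c = 0
    prot = ""
    codon = ""
    started = False  # becomes True once the first "AUG" codon has been found
    for base in RNAseq:
        c += 1
        if c % 3 == 0:
            codon += base
            if started == False and codon != "AUG":
                # Still scanning for the start codon, so discard this codon
                codon = ""
            else:
                started = True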
                if codon == "UUU" or codon == "UUC":
                    prot += "F"
                elif codon == "UUA" or codon == "UUG" or codon == "CUU" or codon == "CUC" or codon == "CUA" or codon == "CUG":
                    prot += "L"
                elif codon == "AUU" or codon == "AUC" or codon == "AUA":
                    prot += "I"
                elif codon == "AUG":
                    prot += "M"
                elif codon == "GUU" or codon == "GUC" or codon == "GUA" or codon == "GUG":
                    prot += "V"
                elif codon == "UCU" or codon == "UCC" or codon == "UCA" or codon == "UCG":
                    prot += "S"
                elif codon == "CCU" or codon == "CCC" or codon == "CCA" or codon == "CCG":
                    prot += "P"
                elif codon == "ACU" or codon == "ACC" or codon == "ACA" or codon == "ACG":
                    prot += "T"
                elif codon == "GCU" or codon == "GCC" or codon == "GCA" or codon == "GCG":
                    prot += "A"
                elif codon == "UAU" or codon == "UAC":
                    prot += "Y"
                elif codon == "UAA" or codon == "UAG" or codon == "UGA":
                    prot += "STOP"
                elif codon == "CAU" or codon == "CAC":
                    prot += "H"
                elif codon == "CAA" or codon == "CAG":
                    prot += "Q"
                elif codon == "AAU" or codon == "AAC":
                    prot += "N"
                elif codon == "AAA" or codon == "AAG":
                    prot += "K"
                elif codon == "GAU" or codon == "GAC":
                    prot += "D"
                elif codon == "GAA" or codon == "GAG":
                    prot += "E"
                elif codon == "UGU" or codon == "UGC":
                    prot += "C"
                elif codon == "UGG":
                    prot += "W"
                elif codon == "CGU" or codon == "CGC" or codon == "CGA" or codon == "CGG":
                    prot += "R"
                elif codon == "AGU" or codon == "AGC":
                    prot += "S"
                elif codon == "AGA" or codon == "AGG":
                    prot += "R"
                elif codon == "GGU" or codon == "GGC" or codon == "GGA" or codon == "GGG":
                    prot += "G"
                else:
                    codon = ""
                codon = ""
                c = 0
        else:
            codon += base
    return prot
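
The function above looks for "AUG" one codon at a time, counting from the first base, so it can miss a start codon that does not line up with that grouping. A different technique, sketched here under the assumption that you want the first "AUG" at any position, uses the string method `.find()`. The name `find_start` and the test sequences are made up for illustration.

```python
def find_start(RNAseq):
    # .find() returns the index of the first "AUG", or -1 if there is none
    start = RNAseq.find("AUG")
    if start == -1:
        return ""
    # Keep everything from the start codon onward
    return RNAseq[start:]

print(find_start("CCAUGGCAUAA"))    # AUGGCAUAA
print(find_start("CCCGGG") == "")   # True
```

The string it returns begins with "AUG", so it could then be passed to a translation function.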

### *Bonus challenge 2: switch typing partner and guiding partner*

For this challenge, you will be learning about a new data type: dictionaries.

Dictionaries are kind of like lists, but instead of storing single items they store pairs of items called "keys" and "values". The first item in each pair is the key and the second one is the value.
As you know, lists are defined with []. Dictionaries are defined with {}.

Here's a sample dictionary:
Ages = {"Jane":13, "Maria":15, "Arya":14, "Nicolette":15}
<br>
In this dictionary, a string of each girl's name is the key and an integer of her age is the value.

You can pull out a single value by putting its key in square brackets. So print(Ages["Jane"]) will print out 13.
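
A short sketch of a few basic dictionary operations, using the sample dictionary above. The extra entry "Sam" is made up for illustration.

```python
Ages = {"Jane": 13, "Maria": 15, "Arya": 14, "Nicolette": 15}

print(Ages["Jane"])        # 13
print("Maria" in Ages)     # True: "in" checks whether a key exists
print("Sam" in Ages)       # False

# Add a new key/value pair (hypothetical extra entry)
Ages["Sam"] = 12
print(len(Ages))           # 5
```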

Try to create a new translation function using a dictionary. The code will be much simpler!

In [None]:
def dicttranslate(RNAseq):
    codonDict = {"UUU":"F","UUC":"F","UUA":"L","UUG":"L", "CUU":"L","CUC":"L",
    "CUA":"L","CUG":"L","AUU":"I","AUC":"I","AUA":"I","AUG":"M","GUU":"V",
    "GUC":"V","GUA":"V","GUG":"V","UCU":"S","UCC":"S","UCA":"S","UCG":"S",
    "CCU":"P","CCC":"P","CCA":"P","CCG":"P","ACU":"T","ACC":"T","ACA":"T",
    "ACG":"T","GCU":"A","GCC":"A","GCA":"A","GCG":"A","UAU":"Y","UAC":"Y",
    "UAA":"STOP","UAG":"STOP","UGA":"STOP","CAU":"H","CAC":"H","CAA":"Q",
    "CAG":"Q","AAU":"N","AAC":"N","AAA":"K","AAG":"K","GAU":"D","GAC":"D",
    "GAA":"E","GAG":"E","UGU":"C","UGC":"C","UGG":"W","CGU":"R","CGC":"R",
    "CGA":"R","CGG":"R","AGU":"S","AGC":"S","AGA":"R","AGG":"R","GGU":"G",
    "GGC":"G","GGA":"G","GGG":"G"}
    c = 0
    prot = ""
    codon = ""
    for base in RNAseq:
        c += 1
        if c % 3 == 0:
            codon += base
            prot += codonDict[codon]
            codon = ""
            c = 0
        else:
            codon += base
    return prot
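
The same lookup idea can also be applied to transcription. The sketch below is one possible version, not part of the original exercise; the names `dicttranscribe` and `baseDict` are made up for illustration.

```python
def dicttranscribe(DNAseq):
    # Hypothetical helper: same base pairing as the transcription function
    baseDict = {"A": "U", "T": "A", "G": "C", "C": "G"}
    RNA = ""
    for base in DNAseq:
        if base in baseDict:      # skip anything that is not A, T, G, or C
            RNA += baseDict[base]
    return RNA

print(dicttranscribe("TACCGT\n"))  # AUGGCA
```

Checking `base in baseDict` before the lookup avoids a KeyError on characters such as the newline at the end of a text file.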