# Transcribe and translate genetic information with Python!

In this notebook you will write code that takes a DNA sequence and outputs the corresponding RNA sequence (transcription), and code that takes an RNA sequence and outputs the corresponding peptide sequence (translation).

### *Pair programming: Make sure the typing partner is typing and the guiding partner is guiding on the same computer!*

In the "Central Dogma" game you only transcribed and translated the first 10 codons of each protein. For this coding exercise you will use the entire DNA sequences (400-1,000 base pairs!). 
<br>
<br>
The sequences are saved as text files, so for each gene you will use the open() function to read the file and assign its sequence to a variable.

In [5]:
# Create a variable for each gene and read that file into the variable as a string
# You will need the open() function and the .read() method

HisDNA = open("Histamine_receptor_DNA.txt").read()

# Now open and read the sequence files for keratin and collagen:

kerDNA = open("Keratin_DNA.txt").read()
colDNA = open("Collagen_DNA.txt").read()
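As a side note, a file opened with open() stays open until it is closed. A minimal sketch of the same read step using a `with` block, which closes the file automatically, is shown here. The file name `example_DNA.txt` and its short sequence are made up for illustration so the block runs on its own; `.strip()` is an optional extra that removes a trailing newline if the file has one.

```python
import os
import tempfile

# Hypothetical stand-in for a sequence file (not one of the notebook's real files)
folder = tempfile.mkdtemp()
path = os.path.join(folder, "example_DNA.txt")
with open(path, "w") as f:
    f.write("TACCGTGGG\n")

# Read the file; the with block closes it automatically when the block ends
with open(path) as f:
    exampleDNA = f.read().strip()  # .strip() drops the trailing newline

print(exampleDNA)
print(len(exampleDNA))
print(type(exampleDNA))
```

Whether to strip the newline is a design choice: leaving it in changes the length that len() reports by one character.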

Print out the data stored in the HisDNA variable:

In [6]:
print(HisDNA)
print(len(HisDNA))

TACCGTGGGTTACCGTGTCGGAGAAGGAAAACGGACCTGAGATGGCGTACGTTCTAGTGGTAGTGGCACCAGGAACGCCAGGAGTAGGAGTAGTGGCAACGACCGTTACACCAGCAGACAGACCGGCACCCGAACTTGGCGGCCGAGGCGTTGGACTGGTTAACAAAGTAGCACAGGAACCGATAGTGACTGGACGAGGAGCCGGAGGACCACGACGGGAAGAGACGGTAGATGGTCGACAGGACGTTCACCTCGAAACCGTTCCAGAAGACGTTATAGATGTGGTCGGACCTACACTACGAGACGTGTCGGAGGTAAGAATTGGAGAAGTACTAGTCGGAGCTGGCCATGACGCGACAGTACCTGGGTGACGCCATGGGACACGACCAGTGGGGTCAAGCCCAGCGGTAGAGAGACCAGAATTAAACCCAGTAGAGGTAATGGGACAGGAAAGACAGATAGGTGGACCCCACCTTGTCGTCCTTGCTCTGGTCGTTCCCGTTAGTATGGTGGAGATTCACGTTTCAGGTCCAGTTACTTCACATGCCCGACCACCTACCCGACCAGTGGAAGATGGAGGGCGATGACTAGTACACGTAGTGGATGATGGCGTAGAAGTTCCAGCGGGCCCTAGTCCGGTTCTCCTAGTTAGTGTAATCGAGGACCTTCCGTCGGTGGTAGTCCCTCGTGTTTCGGTGTCACTGTGACCGGCGGCAGTACCCCCGGAAGTAGTAGACGACCAAAGGGATGAAGTGGCGCAAACACATGGCACCCGACTCTCCCCTACTACGGTAGTTACTCCACAATCTTCGGTAGCAAGACACCGACCCGATACGGTTGAGTCGGGACTTGGGGTAGGACATACGACGCGACTTGTCTCTGAAGGCGTGGCCCATGGTTGTCGAGAAGACGACGTCCGACCGGTTGGCGTTGAGGGTGTTTTGAAGAGACTCCAGGTTGCGGAGAGTCGACAGGTCCTGGGTTTCGGCTCTTGGGTCCG

Print out the datatype of this variable:

In [7]:
print(type(HisDNA))

<class 'str'>


Now that we've assigned our sequences to variables, we can get started on the code to transcribe that DNA sequence into an RNA sequence. 

**Store the histamine receptor RNA sequence in its own variable and print it out to check your results**

Remember, A -> U, T -> A, G -> C, and C -> G

In [9]:
# Hint: You will need a "for loop" and conditionals. 
# You will also need an empty string

HisRNA = ""

for base in HisDNA:
    if base == "A":
        HisRNA += "U"
    elif base == "T":
        HisRNA += "A"
    elif base == "G":
        HisRNA += "C"
    elif base == "C":
        HisRNA += "G"

print(HisRNA)

AUGGCACCCAAUGGCACAGCCUCUUCCUUUUGCCUGGACUCUACCGCAUGCAAGAUCACCAUCACCGUGGUCCUUGCGGUCCUCAUCCUCAUCACCGUUGCUGGCAAUGUGGUCGUCUGUCUGGCCGUGGGCUUGAACCGCCGGCUCCGCAACCUGACCAAUUGUUUCAUCGUGUCCUUGGCUAUCACUGACCUGCUCCUCGGCCUCCUGGUGCUGCCCUUCUCUGCCAUCUACCAGCUGUCCUGCAAGUGGAGCUUUGGCAAGGUCUUCUGCAAUAUCUACACCAGCCUGGAUGUGAUGCUCUGCACAGCCUCCAUUCUUAACCUCUUCAUGAUCAGCCUCGACCGGUACUGCGCUGUCAUGGACCCACUGCGGUACCCUGUGCUGGUCACCCCAGUUCGGGUCGCCAUCUCUCUGGUCUUAAUUUGGGUCAUCUCCAUUACCCUGUCCUUUCUGUCUAUCCACCUGGGGUGGAACAGCAGGAACGAGACCAGCAAGGGCAAUCAUACCACCUCUAAGUGCAAAGUCCAGGUCAAUGAAGUGUACGGGCUGGUGGAUGGGCUGGUCACCUUCUACCUCCCGCUACUGAUCAUGUGCAUCACCUACUACCGCAUCUUCAAGGUCGCCCGGGAUCAGGCCAAGAGGAUCAAUCACAUUAGCUCCUGGAAGGCAGCCACCAUCAGGGAGCACAAAGCCACAGUGACACUGGCCGCCGUCAUGGGGGCCUUCAUCAUCUGCUGGUUUCCCUACUUCACCGCGUUUGUGUACCGUGGGCUGAGAGGGGAUGAUGCCAUCAAUGAGGUGUUAGAAGCCAUCGUUCUGUGGCUGGGCUAUGCCAACUCAGCCCUGAACCCCAUCCUGUAUGCUGCGCUGAACAGAGACUUCCGCACCGGGUACCAACAGCUCUUCUGCUGCAGGCUGGCCAACCGCAACUCCCACAAAACUUCUCUGAGGUCCAACGCCUCUCAGCUGUCCAGGACCCAAAGCCGAGAACCCAGGC
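The pairing rule above can also be written as a lookup table instead of a chain of conditionals. The sketch below uses Python's built-in str.maketrans() and .translate() string methods on a short sequence; the names `pairing`, `exampleDNA`, and `exampleRNA` are illustrative, not part of the exercise. Note that the string method .translate() swaps characters and has nothing to do with biological translation.

```python
# Lookup table for the pairing rule: A -> U, T -> A, G -> C, C -> G
pairing = str.maketrans("ATGC", "UACG")

exampleDNA = "TACCGTGGG"  # first nine bases of the sequence printed above
exampleRNA = exampleDNA.translate(pairing)

print(exampleRNA)  # AUGGCACCC
```

Unlike the loop, this version leaves any character outside A, T, G, and C (such as a newline) unchanged rather than dropping it.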

### *Pair programming swap: Typing partner becomes the guiding partner and vice versa*

Congratulations! You've transcribed DNA sequences and created mRNA. Now it's time to translate that message and find the amino acid sequence of the protein.
<br>
<br>
Here's a chart to help you out:

<img src="https://archive.manylabs.org/file/lessonMedia/69/geneticCode.png" width="500px" height="500px" align="left" />

**Goal:** You need to write code that takes an RNA sequence and returns the correct amino acid sequence. Use the one-letter code for amino acids (e.g. L for Leucine). Stop codons can be represented as "STOP".

**You may find it helpful to write some pseudocode to help you solve this challenge**

Here are some hints:
- you will need a dictionary, a for loop, and conditionals
- you can keep track of where you are in the sequence using a "counter": Define a variable as 0 and then add to it in the loop.
- x % y finds the remainder of x divided by y
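To see how the counter and % hints fit together, here is a small sketch on a made-up nine-base sequence. It only records where each codon ends and does not translate anything, so it is a starting point rather than a solution. The names `exampleRNA` and `codon_ends` are illustrative.

```python
exampleRNA = "AUGGCACCC"  # made-up short sequence for illustration

counter = 0
codon_ends = []  # positions at which a codon is complete

for base in exampleRNA:
    counter += 1  # count the base just read
    if counter % 3 == 0:  # remainder 0 means another three bases have been read
        codon_ends.append(counter)

print(codon_ends)  # [3, 6, 9]
print(7 % 3)       # 1, because 7 = 2 * 3 + 1
```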

Create your dictionary first:

In [10]:
codon_dict = {"UUU":"F","UUC":"F","UUA":"L","UUG":"L", "CUU":"L","CUC":"L", 
             "CUA":"L","CUG":"L","AUU":"I","AUC":"I","AUA":"I","AUG":"M",
             "GUU":"V","GUC":"V","GUA":"V","GUG":"V","UCU":"S","UCC":"S",
             "UCA":"S","UCG":"S","CCU":"P","CCC":"P","CCA":"P","CCG":"P",
             "ACU":"T","ACC":"T","ACA":"T","ACG":"T","GCU":"A","GCC":"A",
             "GCA":"A","GCG":"A","UAU":"Y","UAC":"Y","UAA":"STOP","UAG":"STOP",
             "UGA":"STOP","CAU":"H","CAC":"H","CAA":"Q","CAG":"Q","AAU":"N",
             "AAC":"N","AAA":"K","AAG":"K","GAU":"D","GAC":"D","GAA":"E",
             "GAG":"E","UGU":"C","UGC":"C","UGG":"W","CGU":"R","CGC":"R",
             "CGA":"R","CGG":"R","AGU":"S","AGC":"S","AGA":"R","AGG":"R",
             "GGU":"G","GGC":"G","GGA":"G","GGG":"G"}

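Typing 64 entries by hand makes typos easy to miss. One way to double-check a dictionary like this is to build the same table programmatically and compare the two. The sketch below assumes the standard genetic code written in the common U, C, A, G ordering, with `*` marking stop codons; the names `bases`, `amino_acids`, and `generated_dict` are illustrative.

```python
from itertools import product

bases = "UCAG"
# Standard genetic code, one letter per codon in UCAG order (UUU, UUC, UUA, UUG, UCU, ...)
amino_acids = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

generated_dict = {}
for (first, second, third), letter in zip(product(bases, repeat=3), amino_acids):
    codon = first + second + third
    generated_dict[codon] = "STOP" if letter == "*" else letter

print(len(generated_dict))    # 64
print(generated_dict["AUG"])  # M
print(generated_dict["UGA"])  # STOP
```

In the notebook, comparing `generated_dict == codon_dict` is one way to check the hand-typed version.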
Now write the rest of your code:

In [11]:
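aa = ""
codon = ""

for i in HisRNA:
    if len(codon) == 3:
        aa += codon_dict[codon] # Looks up the amino acid for the completed codon and adds it to aa
        codon = "" # Clears the old codon to start building the next one
        codon += i # Starts the new codon with the current base (A, U, C, or G)
    else:
        codon += i # Keeps adding bases until the codon has three
        
# Note: a codon is only translated once the next base arrives, so a complete codon at the very end of the sequence is not added to aa
print(aa)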

MAPNGTASSFCLDSTACKITITVVLAVLILITVAGNVVVCLAVGLNRRLRNLTNCFIVSLAITDLLLGLLVLPFSAIYQLSCKWSFGKVFCNIYTSLDVMLCTASILNLFMISLDRYCAVMDPLRYPVLVTPVRVAISLVLIWVISITLSFLSIHLGWNSRNETSKGNHTTSKCKVQVNEVYGLVDGLVTFYLPLLIMCITYYRIFKVARDQAKRINHISSWKAATIREHKATVTLAAVMGAFIICWFPYFTAFVYRGLRGDDAINEVLEAIVLWLGYANSALNPILYAALNRDFRTGYQQLFCCRLANRNSHKTSLRSNASQLSRTQSREPRQQEEKPLKLQVWSGTEVTAPQGATDR
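Another way to walk through the sequence is to step three bases at a time with range() and slice out each codon. The sketch below uses a made-up 12-base sequence and a tiny dictionary holding only the codons it needs, so it runs on its own; in the notebook you would use the full codon_dict instead. The names `exampleRNA`, `small_dict`, `peptide`, and `last_start` are illustrative.

```python
exampleRNA = "AUGGCACCCAAU"  # made-up short sequence: AUG GCA CCC AAU

# Only the four codons needed here; the full codon_dict has all 64
small_dict = {"AUG": "M", "GCA": "A", "CCC": "P", "AAU": "N"}

peptide = ""
# Stop before any incomplete codon at the end of the sequence
last_start = len(exampleRNA) - len(exampleRNA) % 3

for start in range(0, last_start, 3):
    codon = exampleRNA[start:start + 3]  # slice out three bases
    peptide += small_dict[codon]

print(peptide)  # MAPN
```

Because each codon is looked up as soon as it is sliced out, the final codon is translated like all the others.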


### What about the other genes?
Write code that prints out the RNA and protein sequences for keratin and collagen:

# Make Functions for Translation and Transcription

### How to Make a New Function

1. "def"
2. Function name
3. Arguments in parentheses
4. Colon
5. Code (changing, altering, or returning the input)
6. Call the function
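The six parts above can be seen together in one small function. The function `count_base` below is a made-up example, separate from the exercise answers; the comments number each part to match the steps.

```python
# 1. def   2. function name   3. arguments in parentheses   4. colon
def count_base(sequence, base):
    # 5. Code: work with the inputs and return a result
    count = 0
    for letter in sequence:
        if letter == base:
            count += 1
    return count

# 6. Call the function
print(count_base("AUGGCACCC", "C"))  # 4
```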

Make a transcription function

In [12]:

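def transcribe(DNAseq):
    """Return the RNA sequence transcribed from a DNA sequence."""
    RNA = ""
    
    for base in DNAseq:
        if base == "A":
            RNA += "U"
        elif base == "T":
            RNA += "A"
        elif base == "G":
            RNA += "C"
        elif base == "C":
            RNA += "G"
        else:
            continue  # Skip anything that is not A, T, G, or C (such as a newline)
    return RNA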

transcribe("ATC")

'UAG'

Make a translation function

In [14]:
def translate(RNAseq):
    protein = ""
    codon = ""

    codon_dict = {"UUU":"F","UUC":"F","UUA":"L","UUG":"L", "CUU":"L","CUC":"L", 
             "CUA":"L","CUG":"L","AUU":"I","AUC":"I","AUA":"I","AUG":"M",
             "GUU":"V","GUC":"V","GUA":"V","GUG":"V","UCU":"S","UCC":"S",
             "UCA":"S","UCG":"S","CCU":"P","CCC":"P","CCA":"P","CCG":"P",
             "ACU":"T","ACC":"T","ACA":"T","ACG":"T","GCU":"A","GCC":"A",
             "GCA":"A","GCG":"A","UAU":"Y","UAC":"Y","UAA":"STOP","UAG":"STOP",
             "UGA":"STOP","CAU":"H","CAC":"H","CAA":"Q","CAG":"Q","AAU":"N",
             "AAC":"N","AAA":"K","AAG":"K","GAU":"D","GAC":"D","GAA":"E",
             "GAG":"E","UGU":"C","UGC":"C","UGG":"W","CGU":"R","CGC":"R",
             "CGA":"R","CGG":"R","AGU":"S","AGC":"S","AGA":"R","AGG":"R",
             "GGU":"G","GGC":"G","GGA":"G","GGG":"G"}
    
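    for i in RNAseq:
        codon += i  # Add the next base to the current codon
        if len(codon) == 3:
            # Translate each codon as soon as its third base arrives, so the final codon is included
            protein += codon_dict[codon]
            codon = ""  # Reset to start collecting the next codon
    return protein
        
translate("AUGGCACCCAAUGGCACAGCCUCUUCCUUUUGCCUGGACUCUACCGCAUGCAAGAUCACCAUC")

'MAPNGTASSFCLDSTACKITI'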
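In a cell, the ribosome stops at the first stop codon, whereas the function above appends the text "STOP" and keeps going. A possible variant that ends the peptide at the first stop codon is sketched below. The name `translate_until_stop` and the tiny codon table are made up for illustration; a fuller version would use the complete codon dictionary.

```python
def translate_until_stop(RNAseq, table):
    """Translate codon by codon, ending at the first stop codon."""
    protein = ""
    # Step through complete codons only, ignoring leftover bases at the end
    for start in range(0, len(RNAseq) - len(RNAseq) % 3, 3):
        amino_acid = table[RNAseq[start:start + 3]]
        if amino_acid == "STOP":
            break  # stop reading; the stop codon adds no amino acid
        protein += amino_acid
    return protein

# Tiny made-up table with only the codons used below
small_table = {"AUG": "M", "GCA": "A", "UAG": "STOP", "CCC": "P"}

print(translate_until_stop("AUGGCAUAGCCC", small_table))  # MA
```

Passing the table in as an argument is a design choice that keeps the sketch short; defining the dictionary inside the function, as above, works just as well.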