# Day 10 Lab

*Adapted from Bioinformatics Algorithms Chapter 4.*

## Part A - Central Dogma of Molecular Biology 

#### (1) Write a function that will take a sequence of DNA and transcribe it to RNA.

```python
from Bio.Seq import Seq

def dnaToRna(seqDNA):
    """Transcribe DNA to RNA by replacing every thymine (T) with uracil (U)."""
    seqRNA = Seq("")
    for base in seqDNA:
        if base == 'T':
            seqRNA += 'U'
        else:
            seqRNA += base
    return seqRNA

print(dnaToRna(Seq("TTAGACCT")))
```

```
UUAGACCU
```
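Transcription only swaps T for U, so one string method call does the same job. The sketch below is a minimal pure-Python version. The name `dna_to_rna` is hypothetical, and it works on a plain `str` rather than a `Seq`. Biopython's own `Seq.transcribe()` method performs the same T-to-U substitution on a coding-strand sequence.

```python
def dna_to_rna(dna: str) -> str:
    """Transcribe a coding-strand DNA string by replacing every T with U."""
    return dna.replace("T", "U")

print(dna_to_rna("TTAGACCT"))  # UUAGACCU
```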


#### (2) Write a function that will translate an RNA string into an amino acid string. 

`Translate("AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA")` should yield `"MAMAPRTEINSTRING"`

You can use the codon map below.

```python
{"UUU":"F", "UUC":"F", "UUA":"L", "UUG":"L",
 "UCU":"S", "UCC":"S", "UCA":"S", "UCG":"S",
 "UAU":"Y", "UAC":"Y", "UAA":"STOP", "UAG":"STOP",
 "UGU":"C", "UGC":"C", "UGA":"STOP", "UGG":"W",
 "CUU":"L", "CUC":"L", "CUA":"L", "CUG":"L",
 "CCU":"P", "CCC":"P", "CCA":"P", "CCG":"P",
 "CAU":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
 "CGU":"R", "CGC":"R", "CGA":"R", "CGG":"R",
 "AUU":"I", "AUC":"I", "AUA":"I", "AUG":"M",
 "ACU":"T", "ACC":"T", "ACA":"T", "ACG":"T",
 "AAU":"N", "AAC":"N", "AAA":"K", "AAG":"K",
 "AGU":"S", "AGC":"S", "AGA":"R", "AGG":"R",
 "GUU":"V", "GUC":"V", "GUA":"V", "GUG":"V",
 "GCU":"A", "GCC":"A", "GCA":"A", "GCG":"A",
 "GAU":"D", "GAC":"D", "GAA":"E", "GAG":"E",
 "GGU":"G", "GGC":"G", "GGA":"G", "GGG":"G"}
```

```python
def rnaToAminoAcid(seqRNA):
    """Translate an RNA string codon by codon, stopping at the first stop codon."""
    seqAminoAcid = Seq("")
    # Named codonTable rather than dict to avoid shadowing the built-in dict.
    codonTable = {"UUU":"F", "UUC":"F", "UUA":"L", "UUG":"L",
        "UCU":"S", "UCC":"S", "UCA":"S", "UCG":"S",
        "UAU":"Y", "UAC":"Y", "UAA":"STOP", "UAG":"STOP",
        "UGU":"C", "UGC":"C", "UGA":"STOP", "UGG":"W",
        "CUU":"L", "CUC":"L", "CUA":"L", "CUG":"L",
        "CCU":"P", "CCC":"P", "CCA":"P", "CCG":"P",
        "CAU":"H", "CAC":"H", "CAA":"Q", "CAG":"Q",
        "CGU":"R", "CGC":"R", "CGA":"R", "CGG":"R",
        "AUU":"I", "AUC":"I", "AUA":"I", "AUG":"M",
        "ACU":"T", "ACC":"T", "ACA":"T", "ACG":"T",
        "AAU":"N", "AAC":"N", "AAA":"K", "AAG":"K",
        "AGU":"S", "AGC":"S", "AGA":"R", "AGG":"R",
        "GUU":"V", "GUC":"V", "GUA":"V", "GUG":"V",
        "GCU":"A", "GCC":"A", "GCA":"A", "GCG":"A",
        "GAU":"D", "GAC":"D", "GAA":"E", "GAG":"E",
        "GGU":"G", "GGC":"G", "GGA":"G", "GGG":"G"}
    # Step through complete codons only; a trailing partial codon is ignored
    # instead of raising a KeyError.
    for i in range(0, len(seqRNA) - 2, 3):
        aminoAcid = codonTable[seqRNA[i:i+3]]
        if aminoAcid == "STOP":
            break
        seqAminoAcid += aminoAcid
    return seqAminoAcid

rna = "AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA"
print(rnaToAminoAcid(rna))
```

```
MAMAPRTEINSTRING
```
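The codon map is systematic. List the codons with each position running through U, C, A, G in that order, and the amino acids read off as one 64-character string. The sketch below rebuilds the same table that way, using `'*'` for stop codons in place of `"STOP"`, and translates one complete codon at a time. `CODON_TABLE` and `translate_rna` are hypothetical names used only for illustration.

```python
from itertools import product

# Codons enumerated with each position cycling through U, C, A, G,
# matching the layout of the codon map above; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_rna(rna: str) -> str:
    """Translate codon by codon, stopping at the first stop codon.
    A trailing partial codon (fewer than 3 bases) is ignored."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate_rna("AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA"))
# MAMAPRTEINSTRING
```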


#### (4) Peptide Encoding Problem: Find substrings of a genome encoding a given amino acid sequence.

We say that a DNA string Pattern <b>encodes</b> an amino acid string Peptide if the RNA string transcribed from either Pattern or its reverse complement translates into Peptide.

`PeptideEncoding("ATGGCCATGGCCCCCAGAACTGAGATCAATAGTACCCGTATTAACGGGTGA", "MA")` outputs something like `["ATGGCC","GGCCAT", "ATGGCC"]`

*Tip:* Iterate through your DNA string, grabbing segments of length `3*len(peptide)`. Transcribe, then translate each segment. If the translation matches your peptide, add that segment to your results. Do the same for the reverse complement of each segment.

```python
def peptideEncoding(seqDNA: Seq, peptide: Seq):
    """Return every substring of seqDNA that encodes peptide on either strand."""
    length = len(peptide) * 3
    finalList = []
    # Slide a window of 3*len(peptide) bases across the genome.
    for i in range(len(seqDNA) - length + 1):
        temp = seqDNA[i:i + length]
        tempReverse = temp.reverse_complement()
        # Transcribe and translate the window and its reverse complement.
        tempProtein = rnaToAminoAcid(dnaToRna(temp))
        tempReverseProtein = rnaToAminoAcid(dnaToRna(tempReverse))
        if tempProtein == peptide:
            finalList.append(temp)
        if tempReverseProtein == peptide:
            # Record the genome substring itself, not its reverse complement.
            finalList.append(temp)
    return finalList

genome = Seq("ATGGCCATGGCCCCCAGAACTGAGATCAATAGTACCCGTATTAACGGGTGA")
print(peptideEncoding(genome, Seq("MA")))
```

```
[Seq('ATGGCC'), Seq('GGCCAT'), Seq('ATGGCC')]
```


#### (5) Download the *Bacillus brevis* genome from Moodle. Search the bacterium's genome for DNA segments that encode Tyrocidine B1. How many such segments exist?

```python
TyrocidineB1 = 'VKLFPWFNQY'
# Read the genome file and join its whitespace-separated chunks into one string.
with open("Bacillus_brevis.txt", 'r') as handle:
    bacillus = "".join(handle.read().split())
print(peptideEncoding(Seq(bacillus), Seq(TyrocidineB1)))
```

```
KeyboardInterrupt:
```
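The window-by-window approach transcribes and translates every 30-base window separately, so each base is translated about 30 times. That is a likely reason the cell above had to be interrupted. One way around this is to translate each of the six reading frames (three per strand) only once, then search the translated strings for the peptide with `str.find`. The sketch below does this and assumes the genome is a plain string of A, C, G, T. Codons containing other characters become `'X'`. The codon table is keyed on DNA codons (T instead of U), so transcription is folded into translation. All names are hypothetical.

```python
from itertools import product

# DNA codon table (T instead of U), bases enumerated in T, C, A, G order;
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def translate_frame(dna: str, frame: int) -> str:
    """Translate dna from offset `frame` without stopping at stop codons.
    Codons containing anything other than A/C/G/T become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(frame, len(dna) - 2, 3))


def find_all(text: str, pattern: str):
    """Yield every start index of pattern in text, overlaps included."""
    start = text.find(pattern)
    while start != -1:
        yield start
        start = text.find(pattern, start + 1)


def peptide_encoding_six_frames(genome: str, peptide: str) -> list[str]:
    genome = genome.upper()
    n, k = len(genome), 3 * len(peptide)
    rc = reverse_complement(genome)
    hits = []
    for frame in range(3):
        # Forward strand: amino acid j of this frame starts at base frame + 3*j.
        for j in find_all(translate_frame(genome, frame), peptide):
            start = frame + 3 * j
            hits.append(genome[start:start + k])
        # Reverse strand: rc[p:p+k] is the reverse complement of genome[n-p-k:n-p].
        for j in find_all(translate_frame(rc, frame), peptide):
            p = frame + 3 * j
            hits.append(genome[n - p - k:n - p])
    return hits


sample = "ATGGCCATGGCCCCCAGAACTGAGATCAATAGTACCCGTATTAACGGGTGA"
print(peptide_encoding_six_frames(sample, "MA"))
# ['ATGGCC', 'ATGGCC', 'GGCCAT']
```

This returns the same segments as `peptideEncoding`, grouped by reading frame rather than in genome order. With the string loaded in the cell above, `peptide_encoding_six_frames(bacillus, TyrocidineB1)` would run the same search.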

***
We were unable to find any 30-mers in the bacterial genome that encode for Tyrocidine B1. 

As it turns out, Tyrocidine is a cyclic peptide.

#### (6) How many different linear representations of Tyrocidine exist? 
*A cyclic peptide has its amino acids joined in a circle. Breaking the bond between one adjacent pair of amino acids makes the peptide linear. How many ways are there to do this?*
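A cyclic peptide of length n has n peptide bonds. Cutting any one of them leaves a linear peptide that starts at the residue just after the cut, so each cut corresponds to one rotation of the sequence. The small sketch below lists those rotations. The function name is hypothetical.

```python
def linear_representations(cyclic_peptide: str) -> list[str]:
    """Return every rotation of the cyclic peptide, one per broken bond."""
    n = len(cyclic_peptide)
    return [cyclic_peptide[i:] + cyclic_peptide[:i] for i in range(n)]


for linear in linear_representations("VKLFPWFNQY"):
    print(linear)
```

For Tyrocidine B1 this yields 10 strings, all distinct, because `VKLFPWFNQY` does not repeat with a shorter period.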

#### (7) Search the *Bacillus brevis* genome for each of the linear representations of Tyrocidine. Did you find any?
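Here is one possible sketch for this search, assuming the genome is loaded as a plain string as in part (5). It translates the genome in all six reading frames once, then counts occurrences of every rotation of the cyclic peptide in those translations. Reusing the translated frames avoids repeating the expensive translation step for each of the ten rotations. All function names are hypothetical. The codon table is written in DNA form (T instead of U), so transcription is folded into translation.

```python
from itertools import product

# DNA codon table (T instead of U), bases enumerated in T, C, A, G order;
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def six_frame_translations(genome: str) -> list[str]:
    """Translate both strands in all three frames; unknown codons become 'X'."""
    genome = genome.upper()
    rc = genome.translate(COMPLEMENT)[::-1]
    return ["".join(CODON_TABLE.get(strand[i:i + 3], "X")
                    for i in range(frame, len(strand) - 2, 3))
            for strand in (genome, rc) for frame in range(3)]


def count_overlapping(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, overlaps included."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def count_linear_hits(genome: str, cyclic_peptide: str) -> dict[str, int]:
    """Count genome segments encoding each linear representation of a cyclic peptide."""
    frames = six_frame_translations(genome)  # translated once, reused for every rotation
    counts = {}
    for i in range(len(cyclic_peptide)):
        linear = cyclic_peptide[i:] + cyclic_peptide[:i]
        counts[linear] = sum(count_overlapping(f, linear) for f in frames)
    return counts


# Toy genome that encodes MAW on the forward strand.
print(count_linear_hits("ATGGCCTGG", "MAW"))
# {'MAW': 1, 'AWM': 0, 'WMA': 0}
```

With the genome string from part (5), `count_linear_hits(bacillus, 'VKLFPWFNQY')` would report one count per linear representation of Tyrocidine B1.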