# Week 3 - Implementing the tasks from week 2

In this workshop, we will revisit the transcription and translation activity from last week. We will write code to perform all the tasks we completed by hand in week 2. The workshop has two parts: implementing methods from scratch, and using existing implementations and data structures to store and process sequences.

If you need to review coding in Python or the use of Jupyter notebooks, there are guides to help you in the Modules tab on the LMS.

## Task 1 - Computing the reverse complement

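Here, we will write a script to determine the reverse complement of a given sequence. We begin by creating a dictionary that maps each base to its complement. The cells below compute the complement only; reversing that result (see the slicing note in Task 3) gives the reverse complement.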

In [None]:
complement_dict = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
complement_dict['C']

In [None]:
dna_seq = 'GATCTTCGGGTCTAGTTCAGGTTAACC'
complement_seq = ''

for base in dna_seq:
    complement_seq += complement_dict[base]
    
print(complement_seq)

In [None]:
# Note: we do not modify the original DNA sequence. This allows it to be reused in other places.
dna_seq

The above script can be written as a function, which makes it reusable.

In [None]:
def complement(seq):
    """
    Compute the complement of a given DNA sequence (the result is not reversed)
    """
    
    complement_dict = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
    complement_seq = ''

    for nt in seq:
        complement_seq += complement_dict[nt]
    return complement_seq

In [None]:
print(complement('AAAAA')) # should give 'TTTTT'
print(complement(dna_seq))
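
The function above returns the complement without reversing it. A minimal sketch of the full reverse complement follows, assuming the same four-letter alphabet; the name `reverse_complement` is a hypothetical helper introduced here for illustration and is not part of the workshop code.

```python
# Sketch only: reverse_complement is a hypothetical helper name.
COMPLEMENT = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}

def reverse_complement(seq):
    """Complement every base, then reverse the complemented string."""
    complemented = ''.join(COMPLEMENT[base] for base in seq)
    return complemented[::-1]

print(reverse_complement('AAAAC'))  # GTTTT
```

Any base outside `A`, `C`, `G`, `T` raises a `KeyError` in this sketch, just as it does in `complement`.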

## Task 2 - Transcribing a DNA sequence

Here, we transcribe a DNA sequence into an RNA sequence. Write a function to transcribe a given DNA sequence.

In [None]:
def transcribe(dna):
    """
    Compute the transcript resulting from a DNA sequence
    """
    
    # put your code here
    

In [None]:
print(transcribe('ATAT')) # should give 'AUAU'
print(transcribe('ATGCCCCAACTAAATACTACCGTATGGCCCACCATAATTACC'))
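
If you want something to compare your attempt against, one possible approach is sketched below. It assumes, as the `'ATAT'` to `'AUAU'` example suggests, that the input is the coding strand, so transcription amounts to replacing every `T` with `U`. The name `transcribe_sketch` is hypothetical and chosen so that it does not overwrite your own `transcribe`.

```python
# Sketch only: assumes the input is the coding strand of the DNA.
def transcribe_sketch(dna):
    """Return the RNA transcript by replacing thymine (T) with uracil (U)."""
    return dna.replace('T', 'U')

print(transcribe_sketch('ATAT'))  # AUAU
```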

## Task 3 - Translate a DNA sequence

As with task 1, we will need a dictionary, this time to map codons to their respective amino acids. We first form the dictionary using information provided in lab 1.

In [None]:
# Note: * represents the stop codon and M the start codon
base1 = 'TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG'
base2 = 'TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG'
base3 = 'TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG'
aa = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'

codon_map = {} # build a codon map from the four strings above (codons as keys, amino acids as values)
# your code here
codon_map
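
The four strings are aligned column by column: position `i` of `base1`, `base2` and `base3` spells one codon, and position `i` of `aa` is its amino acid. One way to exploit that alignment, sketched here with `zip`, is shown below. The name `codon_map_sketch` is hypothetical, so it will not overwrite your own `codon_map`.

```python
# Sketch only: pair up the aligned columns of the four strings with zip.
base1 = 'TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG'
base2 = 'TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG'
base3 = 'TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG'
aa = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'

codon_map_sketch = {
    b1 + b2 + b3: amino
    for b1, b2, b3, amino in zip(base1, base2, base3, aa)
}

print(codon_map_sketch['ATG'])  # M
print(codon_map_sketch['TAA'])  # *
```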

Now, use your dictionary to get the amino acid sequence for the first reading frame (no offset on the sequence). You can use the `dict.get` method to return a default value when a key, such as an incomplete codon at the end of the sequence, does not exist in the dictionary.
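
As a small illustration of `dict.get`, the sketch below uses a two-entry toy dictionary (`toy_map` is a hypothetical name, not the full codon map) and `'X'` as an assumed default for anything that is not a known codon.

```python
# Sketch only: toy_map is a deliberately tiny stand-in for the real codon map.
toy_map = {'ATG': 'M', 'TAA': '*'}

print(toy_map.get('ATG', 'X'))  # M  (key exists, so its value is returned)
print(toy_map.get('AT', 'X'))   # X  (key missing, so the default is returned)
```

Unlike `toy_map['AT']`, which raises a `KeyError`, `toy_map.get('AT', 'X')` quietly falls back to the default.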

In [None]:
def translate(dna, codon_dict):
    """
    Translate a DNA sequence from the first reading frame, given a codon mapping dictionary
    Codons are keys and amino acids are values in this dictionary
    """
    
    #your code here
    

In [None]:
dna_seq = 'ACTATTAAACCCATATAACCTCCCCCAAAATTCAGAATAATAAC'
print(translate('ATGATGA', codon_map)) # should give MM or MMX where X represents an incomplete codon
print(translate(dna_seq, codon_map))

Now, write a function that uses the above function to get the amino acid sequence of all 6 reading frames. Note: three of the reading frames come from the reverse complement strand. You may use `dna_seq[::-1]` to reverse a sequence. For this 44-base sequence, it is a shorter way to write `dna_seq[44::-1]`, which means start at the end of the sequence (the last base is at position 43; a start index past the end is clamped to it), go all the way back to the beginning (position 0 inclusive), and move with a step of -1 (step backwards).

In [None]:
dna_seq[::-1]

In [None]:
dna_seq[44::-1]
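
The equivalence of the two slices can be checked directly. The sketch below uses a local copy of the sequence under the hypothetical name `seq`, so it does not depend on the state of `dna_seq` in your notebook.

```python
# Sketch only: seq is a local copy of the workshop sequence.
seq = 'ACTATTAAACCCATATAACCTCCCCCAAAATTCAGAATAATAAC'
last_index = len(seq) - 1  # index of the final base

print(len(seq))                           # 44
print(seq[::-1] == seq[last_index::-1])   # True
print(seq[::-1] == seq[44::-1])           # True: a start past the end is clamped
```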

In [None]:
def six_rfs(dna, codon_dict):
    """
    Get the amino acid sequence from all six reading frames of a sequence.
    This function should use the translate function implemented earlier
    Return the result as a list of size 6
    """
    
    #your code here
    

In [None]:
# should give: (with or without the X)
# 'TIKPI*PPPKFRIIX'
# 'LLNPYNLPQNSE**X'
# 'Y*THITSPKIQNNN'
# 'VIILNFGGGYMGLIX'
# 'LLF*ILGEVIWV**X'
# 'YYSEFWGRLYGFNS'

six_rfs(dna_seq, codon_map)
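
Once you have attempted the task, you may want a reference to compare against. The sketch below is one possible approach, not the only one. It assumes that incomplete codons are reported as `'X'` and that the frames are returned in the order forward offsets 0, 1, 2 followed by reverse complement offsets 0, 1, 2, which matches the expected output listed above. All names ending in `_sketch`, along with `CODONS` and `COMPLEMENT`, are hypothetical and chosen to avoid overwriting your own definitions.

```python
from itertools import product

# Sketch only: rebuild the codon table in the same T, C, A, G order as base1/base2/base3.
AA = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
CODONS = {''.join(codon): amino for codon, amino in zip(product('TCAG', repeat=3), AA)}
COMPLEMENT = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}

def translate_sketch(dna, codon_dict):
    """Translate from the first base; an incomplete trailing codon becomes 'X'."""
    protein = ''
    for start in range(0, len(dna), 3):
        protein += codon_dict.get(dna[start:start + 3], 'X')
    return protein

def six_rfs_sketch(dna, codon_dict):
    """Return translations of three forward and three reverse complement frames."""
    reverse_complement = ''.join(COMPLEMENT[base] for base in reversed(dna))
    return [
        translate_sketch(strand[offset:], codon_dict)
        for strand in (dna, reverse_complement)
        for offset in range(3)
    ]

seq = 'ACTATTAAACCCATATAACCTCCCCCAAAATTCAGAATAATAAC'
for frame in six_rfs_sketch(seq, CODONS):
    print(frame)
```

Slicing the strand with `strand[offset:]` shifts the reading frame by dropping the first `offset` bases before translation.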

## Task 4 - Using the scikit-bio library to manage sequences

All of the above tasks can be performed using functions in the scikit-bio library. It provides functions to read and parse some popular file formats, and functions to store and modify sequences.

It is already installed for you on the lab server.

To install it on your personal computer, use the command:
conda install -c conda-forge scikit-bio

In [None]:
#import the library
import skbio

scikit-bio, like many Python libraries, uses an object-oriented programming paradigm. As an example, a DNA string is treated as an object. All objects have properties and behaviours. Properties could be metadata such as the sequence ID of a DNA sequence or its quality. Behaviours could be transcribing or translating the sequence. Properties and behaviours are referred to as *attributes* and *methods* in Python.
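
To make the distinction concrete, here is a minimal sketch in plain Python. `ToySequence` is a hypothetical class written only for this illustration; it is not how scikit-bio implements its `DNA` class.

```python
# Sketch only: ToySequence is a hypothetical class, unrelated to scikit-bio's internals.
class ToySequence:
    def __init__(self, bases, seq_id):
        # Attributes: data stored on the object.
        self.bases = bases
        self.seq_id = seq_id

    def transcribe(self):
        # Method: a behaviour that works on the object's own data.
        return self.bases.replace('T', 'U')

toy = ToySequence('ATAT', 'example-1')
print(toy.seq_id)        # attribute access, no parentheses
print(toy.transcribe())  # method call, with parentheses
```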

In [None]:
dna_seq = skbio.sequence.DNA('ACTATTAAACCCATATAACCTCCCCCAAAATTCAGAATAATAAC')
dna_seq

In [None]:
# the alphabet used to encode a DNA sequence is an attribute of the DNA object from skbio
dna_seq.alphabet

## Task 5 - Behaviours/Methods of a DNA object

We now load the sequence of the dnaA gene from a fasta file in the data folder using the `skbio.io.read` function. Type `?skbio.io.read` in a code cell to get to the help page of this function.

In [None]:
dnaA = skbio.io.read('data/dnaA.fa', format = 'fasta', into = skbio.sequence.DNA)
dnaA

The above DNA object holds attributes such as a description and an ID. We can get the complement of this sequence, transcribe it and translate it using methods from the scikit-bio library. For more information on all the functions and classes (DNA, RNA, etc.) the library provides, read the documentation page http://scikit-bio.org/docs/0.5.1/index.html.

In [None]:
dnaA.complement()

In [None]:
dnaA.transcribe()

In [None]:
dnaA.translate()

In [None]:
list(dnaA.translate_six_frames())

Thank you to Dr. Dieter Bulach and Dharmesh Bhuva for developing the tutorial material. Updated by Steven Morgan.