# 1. From DNA to protein: the transcribe and translate methods.

First, we create a new Python virtual environment following these instructions:

https://uoa-eresearch.github.io/eresearch-cookbook/recipe/2014/11/20/conda/

We call the new environment `biopython_env`.

Then, from the command line, we activate the environment:

`source activate biopython_env`

Next, we install Biopython (http://biopython.org/); the options are listed on the downloads page (http://biopython.org/wiki/Download). Since we are running Jupyter notebooks with Anaconda, we use the conda package (http://biopython.org/wiki/Packages). From the command line, within the `biopython_env` environment, we run:

`conda install -c conda-forge biopython`

Then we launch a Jupyter notebook and select `biopython_env` as the kernel. In recent versions of Anaconda, a newly created environment may not appear in the kernel list. The fix is explained here:

https://stackoverflow.com/questions/39604271/conda-environments-not-showing-up-in-jupyter-notebook

In short, we need to install Jupyter inside the `biopython_env` environment from the command line:

`(biopython_env) $ conda install jupyter`

Now we can deactivate the environment from the command line (`source deactivate`), open a Jupyter notebook, and select the kernel we need (`biopython_env`).

In [1]:
import random
import Bio
from Bio.Seq import Seq
# Note: Bio.Alphabet was removed in Biopython 1.78, so this notebook assumes an earlier release.
from Bio.Alphabet import IUPAC, generic_dna

In [2]:
# Sequence object, which is a DNA fragment.
cdna = Seq("ATGTTACACTCCCGATGA", IUPAC.unambiguous_dna)

In [3]:
cdna

Seq('ATGTTACACTCCCGATGA', IUPACUnambiguousDNA())

In [4]:
mrna = cdna.transcribe()

In [5]:
mrna

Seq('AUGUUACACUCCCGAUGA', IUPACUnambiguousRNA())

In [6]:
protein = mrna.translate()

In [7]:
protein

Seq('MLHSR*', HasStopCodon(IUPACProtein(), '*'))

The amino acid sequence is `MLHSR`, followed by the stop codon `*`.
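To make explicit what the two calls above do, here is a minimal plain-Python sketch that does not depend on Biopython. The `CODON_TABLE` dictionary and the helper names `transcribe` and `translate` are hypothetical and cover only the six codons in this example; Biopython's own methods use the full standard genetic code.

```python
# Partial codon table: only the six codons that occur in this example (assumption).
CODON_TABLE = {
    "AUG": "M", "UUA": "L", "CAC": "H",
    "UCC": "S", "CGA": "R", "UGA": "*",
}

def transcribe(dna):
    """Coding-strand DNA -> mRNA: replace every T with U."""
    return dna.replace("T", "U")

def translate(mrna):
    """Read the mRNA three bases at a time and look up each codon."""
    usable = len(mrna) - len(mrna) % 3  # ignore a trailing partial codon
    codons = [mrna[i:i + 3] for i in range(0, usable, 3)]
    return "".join(CODON_TABLE[codon] for codon in codons)

mrna_str = transcribe("ATGTTACACTCCCGATGA")
protein_str = translate(mrna_str)
print(mrna_str)     # AUGUUACACUCCCGAUGA
print(protein_str)  # MLHSR*
```

The codons are `AUG UUA CAC UCC CGA UGA`, which is why the protein has five residues plus the stop symbol.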

# 2. DNA mutation, DNA slicing, DNA concatenation.

In [8]:
# Random 20-nucleotide DNA sequence (the output below comes from one particular run).
dna1 = Seq(''.join(random.choice('AGTC') for x in range(20)), generic_dna)

In [9]:
dna1

Seq('TCTAACATAAGTGCCTGGGT', DNAAlphabet())
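Because the sequence is random, rerunning the cell gives a different result each time. If reproducibility matters, one option is to draw from a seeded generator. The sketch below uses only the standard library; the helper name `random_dna` and the seed value are illustrative assumptions, not part of the original notebook.

```python
import random

def random_dna(length, seed=None):
    """Return a random DNA string; the same seed always gives the same string."""
    rng = random.Random(seed)  # private generator, leaves the global state alone
    return "".join(rng.choice("ACGT") for _ in range(length))

seq_a = random_dna(20, seed=42)
seq_b = random_dna(20, seed=42)
print(seq_a == seq_b)  # True
print(len(seq_a))      # 20
```

The resulting string can be passed to `Seq(...)` in the same way as the random string in the cell above.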

In [10]:
# Slicing DNA and storing the sequence.
slice1 = dna1[0:6]

In [11]:
slice1

Seq('TCTAAC', DNAAlphabet())

In [12]:
slice2 = dna1[11:20]

In [13]:
slice2

Seq('TGCCTGGGT', DNAAlphabet())

In [14]:
# Concatenating the slices to have a new DNA sequence.
dna2 = slice1 + slice2

In [15]:
dna2

Seq('TCTAACTGCCTGGGT', DNAAlphabet())
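Slicing a `Seq` uses the same half-open index convention as slicing a Python string: the start index is included and the end index is excluded. As a quick check of which bases were kept and which were dropped, here is the same operation on a plain string (a sketch that uses the sequence from the run shown above):

```python
dna = "TCTAACATAAGTGCCTGGGT"

head = dna[0:6]      # indices 0..5
tail = dna[11:20]    # indices 11..19
removed = dna[6:11]  # the five bases left out by the two slices

joined = head + tail
print(head, tail, removed)  # TCTAAC TGCCTGGGT ATAAG
print(joined)               # TCTAACTGCCTGGGT
print(len(head) + len(tail) == len(dna) - len(removed))  # True
```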

In [16]:
# How many nucleotides in the new sequence?
len(dna2)

15

In [17]:
# How many Guanines?
dna2.count("G")

4

In [18]:
# Where does the first "GC" dinucleotide occur?
dna2.find("GC")

7

It starts at index 7, i.e. at the 8th nucleotide (remember, indexing starts at 0).
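`Seq.find` follows the behaviour of Python's `str.find`, so the indexing can be checked on a plain string. The sketch below also computes the GC content, which is a separate quantity from the position of a "GC" motif: it is the fraction of bases that are G or C. The variable names are illustrative.

```python
dna = "TCTAACTGCCTGGGT"

index = dna.find("GC")        # zero-based index of the first match
print(index, index + 1)       # 7 8  -> index 7 is the 8th nucleotide
print(dna.find("AAAA"))       # -1 when the motif is absent

# GC content: the fraction of bases that are G or C.
gc_fraction = (dna.count("G") + dna.count("C")) / len(dna)
print(round(gc_fraction, 3))  # 0.533
```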

In [19]:
# Let's mutate the DNA sequence.
dna3 = dna2.tomutable()

In [20]:
dna3

MutableSeq('TCTAACTGCCTGGGT', DNAAlphabet())

In [21]:
# Let's implement some changes.
# For example, replace the base at index 1 (the 2nd nucleotide, a C here) with A.
dna3[1] = 'A'

In [22]:
dna3

MutableSeq('TATAACTGCCTGGGT', DNAAlphabet())

In [23]:
# Replacing the first 3 bases (slice 0:3) with four Gs, so the sequence grows by one base.
dna3[0:3] = 'GGGG'

In [24]:
dna3

MutableSeq('GGGGAACTGCCTGGGT', DNAAlphabet())
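Note that the sequence is now 16 bases long instead of 15: the slice `0:3` covers three bases, and four were assigned in their place. Slice assignment on a Python list behaves the same way, which the following sketch uses to contrast a length-changing replacement with a length-preserving one (the variable names are illustrative):

```python
bases = list("TATAACTGCCTGGGT")  # a list is mutable, like MutableSeq

grown = bases.copy()
grown[0:3] = "GGGG"   # 3 bases out, 4 in: length goes from 15 to 16
print("".join(grown), len(grown))  # GGGGAACTGCCTGGGT 16

same = bases.copy()
same[0:4] = "GGGG"    # 4 bases out, 4 in: length stays 15
print("".join(same), len(same))    # GGGGACTGCCTGGGT 15
```

To overwrite the first four bases without changing the length, the slice would therefore need to be `0:4`.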

In [25]:
# Converting the DNA back to an immutable sequence.
dna4 = dna3.toseq()

In [26]:
dna4

Seq('GGGGAACTGCCTGGGT', DNAAlphabet())