### Getting ready
Download the FASTA sequence of the human lactase gene, as in the previous recipe, using the Entrez interface:

In [1]:
from Bio import Entrez, Seq, SeqIO
from Bio.Alphabet import IUPAC  # Bio.Alphabet was removed in Biopython 1.78; this recipe needs an older release

In [2]:
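Entrez.email = "kakyung.kim@gmail.com"  # NCBI asks for a contact address
hdl = Entrez.efetch(db='nucleotide', id=['NM_002299'], rettype='fasta')  # Lactase gene
# Uncomment to inspect the raw FASTA text instead of parsing it:
#for line in hdl:
#    print(line)
seq = SeqIO.read(hdl, 'fasta')  # SeqIO.read expects exactly one record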

### How to do it…
#### 1. Save the sequence of interest to a FASTA file on the local disk:

In [3]:
w_seq = seq[11:5795]
w_seq

SeqRecord(seq=Seq('ATGGAGCTGTCTTGGCATGTAGTCTTTATTGCCCTGCTAAGTTTTTCATGCTGG...TGA', SingleLetterAlphabet()), id='gi|32481205|ref|NM_002299.2|', name='gi|32481205|ref|NM_002299.2|', description='gi|32481205|ref|NM_002299.2| Homo sapiens lactase (LCT), mRNA', dbxrefs=[])

In [4]:
with open('example.fasta', 'w') as w_hdl:
    # SeqIO.write accepts a list (or any iterable) of records; here a single record is wrapped in a list.
    # For many records, pass an iterator rather than building a list in memory.
    SeqIO.write([w_seq], w_hdl, 'fasta')
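
The list idiom in the cell above is fine for one record, but with many records an iterator avoids holding them all in memory at once. The following is a plain-Python sketch of that idea, not Biopython code: `write_fasta`, `toy_records`, and the `(header, sequence)` tuples are hypothetical names introduced for illustration, and the 60-column line width is an assumption about layout.

```python
import io


def write_fasta(records, handle, width=60):
    """Write (header, sequence) pairs to handle in FASTA format.

    `records` can be any iterable, including a generator, so the
    sequences never need to be held in memory together.
    Returns the number of records written.
    """
    count = 0
    for header, sequence in records:
        handle.write(">" + header + "\n")
        # Split the sequence into fixed-width lines.
        for i in range(0, len(sequence), width):
            handle.write(sequence[i:i + width] + "\n")
        count += 1
    return count


def toy_records():
    # Hypothetical records, produced one at a time.
    yield "rec1 first toy sequence", "ATGGAGCTGTCT"
    yield "rec2 second toy sequence", "TCTTCATTCTGA"


buffer = io.StringIO()  # stands in for a file opened with open(..., "w")
written = write_fasta(toy_records(), buffer)
print(written)
print(buffer.getvalue())
```

Biopython's `SeqIO.write` likewise accepts an iterator of `SeqRecord` objects in place of the list.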

#### 2. In most situations, the sequence will already be on disk, so you will want to read it:

In [5]:
recs = SeqIO.parse('example.fasta', 'fasta')
for rec in recs: #standard iteration for multiple records.
    print(type(rec))
    seq = rec.seq
    print(rec.description)
    print(seq[:10])
    print(seq.alphabet)

<class 'Bio.SeqRecord.SeqRecord'>
gi|32481205|ref|NM_002299.2| Homo sapiens lactase (LCT), mRNA
ATGGAGCTGT
SingleLetterAlphabet()
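
`SeqIO.parse` returns an iterator, so records are read one at a time and the iterator is exhausted after a single pass. The sketch below imitates that behaviour in plain Python; `read_fasta` is a hypothetical helper, not part of Biopython, and it assumes well-formed input.

```python
import io


def read_fasta(handle):
    """Yield (description, sequence) pairs lazily from a FASTA handle."""
    description, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if description is not None:
                yield description, "".join(chunks)
            description, chunks = line[1:], []
        else:
            chunks.append(line)
    # Emit the last record in the file.
    if description is not None:
        yield description, "".join(chunks)


text = ">rec1 toy\nATGGAG\nCTGTCT\n>rec2 toy\nTCTTCA\n"
records = read_fasta(io.StringIO(text))
first_pass = list(records)
second_pass = list(records)  # the generator is already exhausted
print(first_pass)
print(second_pass)
```

If the records are needed more than once, either parse the file again or collect them into a list on the first pass.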


#### 3. We will now change the alphabet of our sequence:

In [6]:
seq = Seq.Seq(str(seq), IUPAC.unambiguous_dna)
seq

Seq('ATGGAGCTGTCTTGGCATGTAGTCTTTATTGCCCTGCTAAGTTTTTCATGCTGG...TGA', IUPACUnambiguousDNA())

#### 4. Now that we have unambiguous DNA, we can transcribe it as follows:

In [7]:
print((seq[:12], seq[-12:]))
rna = seq.transcribe()
rna

(Seq('ATGGAGCTGTCT', IUPACUnambiguousDNA()), Seq('TCTTCATTCTGA', IUPACUnambiguousDNA()))


Seq('AUGGAGCUGUCUUGGCAUGUAGUCUUUAUUGCCCUGCUAAGUUUUUCAUGCUGG...UGA', IUPACUnambiguousRNA())
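
Biopython's `transcribe` treats the sequence as the coding strand, so the result is the same sequence with thymine replaced by uracil; no complementing is involved. A plain-Python equivalent for the first 12 bases printed above, given only as an illustration:

```python
dna_start = "ATGGAGCTGTCT"  # first 12 bases of the sequence printed above

# Coding-strand transcription: swap T for U, keep everything else.
rna_start = dna_start.replace("T", "U")
print(rna_start)  # AUGGAGCUGUCU
```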

#### 5. Finally, we can translate our gene into a protein:
The result has a protein alphabet, annotated to show that it contains a stop codon (so our
protein is complete).

In [8]:
prot = seq.translate()
prot

Seq('MELSWHVVFIALLSFSCWGSDWESDRNFISTAGPLTNDLLHNLSGLLGDQSSNF...SF*', HasStopCodon(IUPACProtein(), '*'))
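
Translation reads the sequence three bases at a time and looks each codon up in a genetic code table. The sketch below does this by hand for the first and last 12 bases printed in step 4, using only the handful of standard-code codons those bases need. `MINI_TABLE` and `translate_codons` are hypothetical names, not Biopython's implementation, and a full table has 64 entries.

```python
# Standard genetic code entries for the codons used below only.
MINI_TABLE = {
    "ATG": "M", "GAG": "E", "CTG": "L", "TCT": "S",
    "TCA": "S", "TTC": "F", "TGA": "*",  # "*" marks a stop codon
}


def translate_codons(dna):
    """Translate a DNA string whose length is a multiple of three."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return "".join(MINI_TABLE[codon] for codon in codons)


start = translate_codons("ATGGAGCTGTCT")
end = translate_codons("TCTTCATTCTGA")
print(start, end)
```

The two fragments agree with the start (`MELS`) and end (`SF*`) of the protein shown above.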

### There's more…
A few points that you should be aware of:
- When you perform an RNA translation to get your protein, be sure to use the correct
genetic code.
- Biopython's Seq object is much more flexible than is shown here. For some good
examples, refer to the Biopython tutorial. However, this recipe will be enough
for the work we need to do with FASTQ files (see the next recipe).
- To deal with strand-related issues, there are sequence functions such as reverse_complement.
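
Two of these points can be illustrated in plain Python. The names below (`CODON_MEANING`, `reverse_complement` as a free function) are hypothetical and this is not Biopython's implementation. The first part shows three codons that are read differently under NCBI table 1 (standard) and table 2 (vertebrate mitochondrial); the second builds the opposite strand of a short unambiguous DNA string.

```python
# A few codons whose meaning differs between two NCBI genetic codes.
# Only these three codons are listed; a full table has 64 entries.
CODON_MEANING = {
    1: {"TGA": "*", "AGA": "R", "ATA": "I"},  # table 1: standard
    2: {"TGA": "W", "AGA": "*", "ATA": "M"},  # table 2: vertebrate mitochondrial
}

for codon in ("TGA", "AGA", "ATA"):
    print(codon, CODON_MEANING[1][codon], CODON_MEANING[2][codon])

# Reverse complement: complement every base, then reverse the string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand of an unambiguous DNA string, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


opposite = reverse_complement("ATGGAGCTGTCT")
print(opposite)
```

In Biopython, the genetic code is selected through the `table` argument of `translate`, and `reverse_complement` is a method of the `Seq` object.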

### See also
- Genetic codes known to Biopython are the ones specified by NCBI: http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi
- Biopython tutorial: http://biopython.org/DIST/docs/tutorial/Tutorial.html
- Biopython SeqIO page: http://biopython.org/wiki/SeqIO and API documentation: http://biopython.org/static/DIST/docs/_api_163/Bio.SeqIO-module.html