# Lab 7: BioPython

This script reads the MAOA GenBank file that we saw previously in Lab 5. Here, you will use Biopython to answer some of the questions you answered in the previous lab. You must use Biopython to answer the questions and output the answers where indicated at the end of this Notebook. You may submit just the Python script (.py file) rather than a Notebook if you prefer.

In [1]:
from Bio import SeqIO # to parse Seq data
# from Bio.Seq import Seq # for Seq

with open('MAOA.gb.txt') as handle :

    # gets an iterator that allows you to go through each entry
    sequence_iter = SeqIO.parse(handle, "genbank")

    # gets next sequence (which is the 1st one here)
    seq_record = next(sequence_iter)

    print('Accession #: ', seq_record.id)
    print('Description: ', seq_record.description)
    print('seq length = ', len(seq_record))

Accession #:  NG_008957.2
Description:  Homo sapiens monoamine oxidase A (MAOA), RefSeqGene on chromosome X
seq length =  97661


In [2]:
# loop through each of the features
exons = 0 # running count of exon features (3)
for feature in seq_record.features:

  # save and print out the chromosome (1)
  if feature.type == 'source':
    chromosome = feature.qualifiers['chromosome'][0]
    print('=================== Chromosome ==================')
    print(chromosome)
    print()

  # print out gene feature
  if feature.type == 'gene':
    print('=================== Gene ==================')
    print(feature)

  # count every exon, and print out exon 13 as a spot check
  if feature.type == 'exon':
    exons += 1
    if feature.qualifiers.get('number', [''])[0] == '13':
      print('=================== Exon 13 ==================')
      print(feature)
      print()

  # print out the last part of the CDS location
  if feature.type == 'CDS':
    print('=================== CDS location ==================')
    end_cds_position = feature.location.parts[-1]
    print("CDS end position:", end_cds_position)
    print()

X

type: gene
location: [3746:95664](+)
qualifiers:
    Key: db_xref, Value: ['GeneID:4128', 'HGNC:HGNC:6833', 'MIM:309850']
    Key: gene, Value: ['MAOA']
    Key: gene_synonym, Value: ['BRNRS; MAO-A']
    Key: note, Value: ['monoamine oxidase A']

CDS end position: [93206:93353](+)

type: exon
location: [92633:92745](+)
qualifiers:
    Key: gene, Value: ['MAOA']
    Key: gene_synonym, Value: ['BRNRS; MAO-A']
    Key: inference, Value: ['alignment:Splign:2.0.8']
    Key: number, Value: ['13']




Answer the questions below by using appropriate Python code to analyze the MAOA GenBank entry, and output the answers. Your code must be generic so that it will work for any GenBank entry that contains information for a single gene.

Notes: for (5), use Python slicing to extract the nucleotides from the `seq_record` sequence object. This will give you a `Bio.Seq.Seq` object. Then use the `translate` method of that object to translate the sequence. In answering the questions, you may assume that all exons in this entry are for the gene MAOA (this is true here, but some GenBank entries contain multiple genes), and that the last part of the CDS contains at least 3 codons.
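
The slicing-then-translating step in the notes can be sketched on a small made-up record. This is only an illustration under stated assumptions: the toy sequence, the `TOY_0001` id, and the helper name `last_three_codons` are hypothetical, the CDS is assumed to lie on the forward strand, and the last CDS part is assumed to hold at least 3 codons, as the notes allow.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature, FeatureLocation, CompoundLocation
from Bio.SeqRecord import SeqRecord

# Hypothetical toy sequence (not MAOA):
#   [0:2]   "GG"            upstream flank
#   [2:8]   "ATGGCC"        CDS part 1
#   [8:11]  "TTT"           intron
#   [11:23] "AAACGTAGCTAA"  CDS part 2 (ends with a stop codon)
#   [23:26] "CCC"           downstream flank
toy_seq = Seq("GG" + "ATGGCC" + "TTT" + "AAACGTAGCTAA" + "CCC")

cds_location = CompoundLocation([
    FeatureLocation(2, 8, strand=1),
    FeatureLocation(11, 23, strand=1),
])
toy_record = SeqRecord(
    toy_seq,
    id="TOY_0001",
    features=[SeqFeature(cds_location, type="CDS")],
)


def last_three_codons(record):
    """Return the 9 nucleotides that end the first CDS feature.

    Assumes a forward-strand CDS whose last part has at least 9 bases.
    """
    for feature in record.features:
        if feature.type == "CDS":
            # parts[-1] is the last segment of the (possibly joined) CDS
            cds_end = int(feature.location.parts[-1].end)
            # slice the record sequence; the result is still a Seq object
            return record.seq[cds_end - 9:cds_end]
    raise ValueError("record has no CDS feature")


toy_codons = last_three_codons(toy_record)
toy_amino_acids = toy_codons.translate()  # '*' marks a stop codon
print("last 3 codons:", toy_codons)
print("they code for:", toy_amino_acids)
```

Anchoring the slice to the end of the last CDS part, rather than to the end of the whole record, keeps the result independent of how much flanking sequence the entry carries after the gene.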

For example, to answer (1), note that the *source* feature contains the chromosome, which is accessed via `feature.qualifiers['chromosome'][0]`. Save this value to a variable, and then output its value in (1) below.
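
As a rough sketch of that pattern, the loop below saves the chromosome from the *source* feature and counts exon features without hard-coding any exon number. The toy record, its `TOY_0002` id, and the helper name `summarize_features` are hypothetical and exist only so the example runs without the MAOA file.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature, FeatureLocation
from Bio.SeqRecord import SeqRecord


def summarize_features(record):
    """Return (chromosome, number of exon features) for a single-gene record."""
    chromosome = None
    exon_count = 0
    for feature in record.features:
        if feature.type == "source":
            # qualifier values are lists, so take the first entry
            chromosome = feature.qualifiers.get("chromosome", [None])[0]
        elif feature.type == "exon":
            exon_count += 1
    return chromosome, exon_count


# Hypothetical toy record with one source feature and two exons
toy_features = [
    SeqFeature(FeatureLocation(0, 20, strand=1), type="source",
               qualifiers={"chromosome": ["X"]}),
    SeqFeature(FeatureLocation(2, 6, strand=1), type="exon",
               qualifiers={"number": ["1"]}),
    SeqFeature(FeatureLocation(10, 15, strand=1), type="exon",
               qualifiers={"number": ["2"]}),
]
toy_record = SeqRecord(Seq("ACGTACGTACGTACGTACGT"), id="TOY_0002",
                       features=toy_features)

toy_chromosome, toy_exon_count = summarize_features(toy_record)
print("chromosome:", toy_chromosome)
print("exons:", toy_exon_count)
```

Counting features of type `exon` relies on the assumption stated above that every exon in the entry belongs to the one gene of interest.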

In [3]:
dna = seq_record.seq
nucleotides = dna[:5] # first 5 nucleotides of the record (2)
rna = dna.transcribe()
# last 9 nucleotides of the full record (not anchored to the CDS end) (4)
last_codons = rna[-9:]
amino_acids = last_codons.translate() # (5)

print('1. The gene is on chromosome:', chromosome)
print('2. The first 5 nucleotides are:', nucleotides)
print('3. The number of exons contained is:', exons)
print('4. The last 3 codons of the protein are:', last_codons)
print('5. The last 3 codons code for:', amino_acids)

1. The gene is on chromosome: X
2. The first 5 nucleotides are: TAAAC
3. The number of exons contained is: 15
4. The last 3 codons of the protein are: AAUGAUGUA
5. The last 3 codons code for: NDV
