# What is Biopython?

The Biopython Project is an international association of developers of freely available Python (http://www.python.org) tools for computational molecular biology. Python is an object-oriented, interpreted, flexible language that is becoming increasingly popular for scientific computing. It is easy to learn, has a very clear syntax, and can easily be extended with modules written in C, C++ or FORTRAN.

Source: http://biopython.org/DIST/docs/install/Installation.html

## What are the uses of Biopython?


The main Biopython releases have lots of functionality, including:

1. The ability to parse bioinformatics files into data structures usable from Python, including support for the following formats:
    * Blast output – both from standalone and WWW Blast
    * Clustalw
    * FASTA
    * GenBank
    * PubMed and Medline
    * ExPASy files, like Enzyme and Prosite
    * SCOP, including ‘dom’ and ‘lin’ files
    * UniGene
    * SwissProt
2. Code to deal with popular online bioinformatics destinations such as:
    * NCBI – Blast, Entrez and PubMed services
3. Tools for performing common operations on sequences, such as translation, transcription and weight calculations.
4. And much more, since Python rocks.


# Installation:

Set up your environment and install the required packages; refer to requirements.txt.

# Sequence objects:


In [5]:
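from Bio.Seq import Seq
# Bio.Alphabet was removed in Biopython 1.78; this cell needs an older release
from Bio.Alphabet import IUPAC

# Build a sequence tagged with the unambiguous DNA alphabet
my_seq = Seq('AGTACACTGGT', IUPAC.unambiguous_dna)
my_seq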


Seq('AGTACACTGGT', IUPACUnambiguousDNA())

In [8]:
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC

# The same letters tagged as a protein sequence (A, G, T, C are also amino acid codes)
my_seq = Seq('AGTACACTGGT', IUPAC.protein)
my_seq

Seq('AGTACACTGGT', IUPACProtein())

In [9]:
my_seq = Seq("GATCG", IUPAC.unambiguous_dna)
for letter in my_seq:
    print(letter)

G
A
T
C
G


In [10]:
my_seq = Seq("GATCG", IUPAC.unambiguous_dna)
for index, letter in enumerate(my_seq):
    print(index, letter)

0 G
1 A
2 T
3 C
4 G


In [11]:
print(my_seq[2])

T


In [16]:
print(my_seq[1:3])

AT


In [17]:
print(my_seq[-1])

G
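
Indexing and slicing follow the usual Python string conventions, so a stride can pick out every third letter (the first, second or third codon positions) and a negative stride reverses the sequence. The sketch below uses a plain `str` and a longer example sequence to stay dependency-free; that substitution is an assumption made for illustration, on the basis that `Seq` objects accept the same slice syntax.

```python
# Plain-string stand-in for a Seq object (assumption: same slice semantics)
sequence = "GATCGATGGGCCTATATAGGATCGAAAATCGC"

# Every third letter, starting at offsets 0, 1 and 2
first_codon_positions = sequence[0::3]
second_codon_positions = sequence[1::3]
third_codon_positions = sequence[2::3]

# A stride of -1 walks the sequence backwards
reversed_sequence = sequence[::-1]

print(first_codon_positions)
print(second_codon_positions)
print(third_codon_positions)
print(reversed_sequence)
```

Note that reversing with `[::-1]` only flips the letter order; it does not complement the bases.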


## Playing around with sequences as strings:

In [18]:
'GATCGATGGGCCTATATAGGATCGAAAATCGC'.count("AA")

2

In [20]:
from Bio.Seq import Seq
Seq('GATCGATGGGCCTATATAGGATCGAAAATCGC').count("AA")

2
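
Both calls return 2 even though the run `AAAA` contains three overlapping occurrences of `AA`, because `count` only reports non-overlapping matches. If overlapping matches matter, one option is a small helper such as the hypothetical `count_overlapping` below. It is a plain-Python sketch, not part of Biopython.

```python
def count_overlapping(sequence, pattern):
    """Count occurrences of pattern in sequence, allowing overlaps."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    count = 0
    start = sequence.find(pattern)
    while start != -1:
        count += 1
        # Advance by one position only, so overlapping matches are found
        start = sequence.find(pattern, start + 1)
    return count


sequence = "GATCGATGGGCCTATATAGGATCGAAAATCGC"
print(sequence.count("AA"))               # non-overlapping count
print(count_overlapping(sequence, "AA"))  # overlapping count
```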

In [21]:
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC
my_seq = Seq('GATCGATGGGCCTATATAGGATCGAAAATCGC', IUPAC.unambiguous_dna)
len(my_seq)

32

In [22]:
my_seq.count("G")

9

### Calculate GC%

In [23]:
100 * float(my_seq.count("G") + my_seq.count("C")) / len(my_seq)

46.875
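
The same arithmetic can be wrapped in a small reusable function. The sketch below is plain Python with a hypothetical name, `gc_percent`. Treating lower-case letters like upper-case ones and returning 0.0 for an empty sequence are assumptions made here, not behavior taken from Biopython.

```python
def gc_percent(sequence):
    """Return the percentage of G and C letters in sequence."""
    letters = str(sequence).upper()
    if not letters:
        # Assumption: report 0.0 rather than dividing by zero
        return 0.0
    gc_count = letters.count("G") + letters.count("C")
    return 100 * gc_count / len(letters)


print(gc_percent("GATCGATGGGCCTATATAGGATCGAAAATCGC"))
```

Because the function converts its argument with `str()`, it should accept either a plain string or a sequence object whose string form is the raw letters.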

In [24]:
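# Newer Biopython releases replace GC with gc_fraction, which returns a fraction rather than a percentage
from Bio.SeqUtils import GC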
