# BioPython
### A bioinformatics toolbox
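### Description
Biopython is a freely available, open-source collection of Python tools for computational biology and bioinformatics. It covers sequence manipulation and parsers for common biological file formats.
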
### Resources
* [**Official Page**](https://biopython.org/)
* [**Official Tutorial**](http://biopython.org/DIST/docs/tutorial/Tutorial.html)

### Installation

```
conda install -c anaconda biopython
```

```
pip install biopython
```
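A quick way to confirm that the installation worked is to import the package and print its version. This is a minimal check and assumes only that the `Bio` package is importable in the current environment.

```python
import Bio

# Print the installed Biopython version to confirm the package is importable
print(Bio.__version__)
```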

### Features
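* Biological file reading (e.g. FASTA, GB, MSA, PDB)
* Sequence manipulation (complement, reverse complement, transcription, translation)
* Access to sequence annotations (features and their locations)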


In [17]:
from Bio import Seq

snippet = Seq.Seq('tatggctgtgcaggtcgtaaatcactgcata\
attcgtgtcgctcaaggcgcac')

snippet

Seq('tatggctgtgcaggtcgtaaatcactgcataattcgtgtcgctcaaggcgcac')

In [20]:
print('Complement:\t',snippet.complement())
print('Reverse Complement:\t',snippet.reverse_complement())
print('Transcription:\t',snippet.transcribe())
print('Translation ORF 1:\t',snippet.translate())
print('Translation ORF 2:\t',snippet[1:].translate())

Complement:	 ataccgacacgtccagcatttagtgacgtattaagcacagcgagttccgcgtg
Reverse Complement:	 gtgcgccttgagcgacacgaattatgcagtgatttacgacctgcacagccata
Transcription:	 uauggcugugcaggucguaaaucacugcauaauucgugucgcucaaggcgcac
Translation ORF 1:	 YGCAGRKSLHNSCRSRR
Translation ORF 2:	 MAVQVVNHCIIRVAQGA
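The two translations above correspond to the first two forward reading frames. As a rough sketch, the same idea extends to all six frames (three per strand). The helper name `six_frame_translations` is hypothetical and not part of Biopython; the trailing partial codon is trimmed so that each frame has a length divisible by three.

```python
from Bio.Seq import Seq


def six_frame_translations(seq):
    """Return {frame_label: protein string} for three forward and three reverse frames."""
    frames = {}
    for strand, template in (("+", seq), ("-", seq.reverse_complement())):
        for offset in range(3):
            sub = template[offset:]
            # Drop the trailing partial codon so the length is a multiple of three
            sub = sub[: len(sub) - len(sub) % 3]
            frames[f"{strand}{offset + 1}"] = str(sub.translate())
    return frames


snippet = Seq("tatggctgtgcaggtcgtaaatcactgcataattcgtgtcgctcaaggcgcac")
frames = six_frame_translations(snippet)
for label, protein in frames.items():
    print(label, protein)
```

Frames `+1` and `+2` reproduce the two translations shown above; the remaining four frames are additional.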


In [85]:
from Bio import SeqIO

fasta = SeqIO.parse('pBbe1k-RFP.fasta', format='fasta')
fasta  # SeqIO.parse returns a lazy iterator: records are read only when iterated over

<generator object FastaIterator at 0x7fb8c2de7dd0>
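Because `SeqIO.parse` is lazy, records can be consumed one at a time instead of being loaded all at once, which helps with large files. The sketch below illustrates this with hypothetical in-memory FASTA text (the records `seq1` and `seq2` are made up) wrapped in `StringIO` in place of a file on disk.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical two-record FASTA text standing in for a file on disk
fasta_text = ">seq1 first example\nACGTACGT\n>seq2 second example\nGGGCCC\n"

records = SeqIO.parse(StringIO(fasta_text), "fasta")

# Pull a single record; the rest of the input has not been parsed yet
first = next(records)
print(first.id, len(first.seq))

# Iterating further continues from where the previous call stopped
remaining = [record.id for record in records]
print(remaining)
```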

In [86]:
fasta = list(fasta)  # consume the iterator into a list of SeqRecord objects
print(len(fasta))  # number of records in the file
plasmid = fasta[0]  # first (and here only) record
plasmid

1


SeqRecord(seq=Seq('gacgtcgacaccatcgaatggtgcaaaacctttcgcggtatggcatgatagcgc...tcc', SingleLetterAlphabet()), id='pBbE1k-RFP', name='pBbE1k-RFP', description=' pBbE1k-RFP sequence 4206 bps', dbxrefs=[])
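When a file is expected to hold exactly one record, as in this case, `SeqIO.read` is an alternative to `list(...)[0]`: it returns the record directly and raises a `ValueError` if the input contains zero or several records. The example below is a sketch using hypothetical in-memory FASTA text, not the plasmid file itself.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical single-record FASTA text standing in for a file on disk
fasta_text = ">toy_plasmid example description\nGACGTCGACACC\n"

record = SeqIO.read(StringIO(fasta_text), "fasta")
print(record.id)
print(record.description)
print(len(record.seq))

# With more than one record, SeqIO.read refuses to guess and raises ValueError
two_records = fasta_text + ">second\nACGT\n"
try:
    SeqIO.read(StringIO(two_records), "fasta")
    raised = False
except ValueError:
    raised = True
print(raised)
```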

In [87]:
print(plasmid.description)

 pBbE1k-RFP sequence 4206 bps


In [88]:
# Parse the annotated GenBank version of the plasmid and keep its single record
plasmid_annotated = SeqIO.parse('pBbe1k-RFP.gbk', format='gb')
plasmid_annotated = list(plasmid_annotated)[0]
plasmid_annotated

SeqRecord(seq=Seq('GACGTCGACACCATCGAATGGTGCAAAACCTTTCGCGGTATGGCATGATAGCGC...TCC', IUPACAmbiguousDNA()), id='.', name='Exported', description='synthetic circular DNA', dbxrefs=[])

In [91]:
print(plasmid_annotated.description)
print(plasmid_annotated.id)

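# List every annotated feature with its type and location
# (0-based, end-exclusive coordinates; strand in parentheses)
for feature in plasmid_annotated.features:
    print(feature.type, feature.location)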

synthetic circular DNA
.
source [0:4206](+)
promoter [6:84](+)
CDS [84:1167](+)
primer_bind [103:123](-)
promoter [1398:1428](+)
protein_bind [1435:1452](+)
CDS [1492:2170](+)
terminator [2202:2274](+)
terminator [2289:2317](+)
rep_origin [2474:3063](-)
primer_bind [2554:2574](-)
terminator [3150:3245](+)
CDS [3275:4070](-)
primer_bind [3386:3406](-)
primer_bind [3996:4016](+)
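Features can also be filtered by type and their sequence pulled out with `SeqFeature.extract`, which takes the strand into account. The following is a minimal sketch on a hypothetical toy record built in memory (the sequence, coordinates, and the `toy` id are invented for illustration), not on the plasmid above.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Hypothetical toy record: one CDS on each strand, separated by filler bases
cds = Seq("ATGGCCTAA")
sequence = Seq("AA") + cds + Seq("GGG") + cds.reverse_complement() + Seq("CC")
record = SeqRecord(sequence, id="toy")
record.features = [
    SeqFeature(FeatureLocation(0, 2, strand=1), type="promoter"),
    SeqFeature(FeatureLocation(2, 11, strand=1), type="CDS"),
    SeqFeature(FeatureLocation(14, 23, strand=-1), type="CDS"),
]

# Keep only the coding sequences
cds_features = [f for f in record.features if f.type == "CDS"]

proteins = []
for feature in cds_features:
    # extract() reverse-complements features located on the minus strand
    coding = feature.extract(record.seq)
    proteins.append(str(coding.translate()))
    print(feature.location, coding, proteins[-1])
```

Using `extract` avoids hard-coding slice coordinates and remembering which features need a reverse complement.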


In [101]:
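# Slice out the three CDS features listed above using their coordinates
cds1 = plasmid_annotated[84:1167]
cds2 = plasmid_annotated[1492:2170]
# The third CDS lies on the minus strand, so reverse-complement it to read it 5'->3'
cds3 = plasmid_annotated[3275:4070].reverse_complement()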

In [108]:
unknown_protein = cds1.translate()
unknown_protein += 'GSGHHHHH'  # append a His-tag
unknown_protein

SeqRecord(seq=Seq('VKPVTLYDVAEYAGVSYQTVSRVVNQASHVSAKTREKVEAAMAELNYIPNRVAQ...HHH', HasStopCodon(ExtendedIUPACProtein(), '*')), id='<unknown id>', name='<unknown name>', description='<unknown description>', dbxrefs=[])
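By default `translate()` keeps stop codons as `*`, so if a CDS slice includes its stop codon, a tag appended to the translation ends up after the stop. One possible way around this, sketched here on a hypothetical toy CDS rather than on the plasmid, is to translate with `to_stop=True` (or `cds=True`, which additionally validates the start codon, length, and terminal stop) before appending the tag.

```python
from Bio.Seq import Seq

# Hypothetical toy CDS: Met-Ala-Lys followed by a stop codon
cds = Seq("ATGGCCAAATAA")

with_stop = cds.translate()  # stop codon kept as '*'
without_stop = cds.translate(to_stop=True)  # translation ends before the first stop
validated = cds.translate(cds=True)  # checks that this is a complete CDS, drops the stop

# Append a linker plus His-tag to the stop-free protein
tagged = without_stop + "GSGHHHHHH"

print(with_stop)
print(without_stop)
print(tagged)
```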