Biotite project

Biotite is your Swiss army knife for bioinformatics. Whether you want to identify homologous sequence regions in a protein family or find disulfide bonds in a protein structure, Biotite has the right tool for you. This package bundles popular tasks in computational molecular biology into a uniform Python library. It can handle a major part of the typical workflow for sequence and biomolecular structure data:

Searching and fetching data from biological databases

Reading and writing popular sequence/structure file formats

Analyzing and editing sequence/structure data

Visualizing sequence/structure data

Interfacing external applications for further analysis

Biotite internally stores most of the data as NumPy ndarray objects, enabling

fast C-accelerated analysis,

intuitive usability through NumPy-like indexing syntax,

extensibility through direct access to the internal NumPy arrays.

As a result, the user can skip writing code for basic functionality (like file parsers) and can focus on what makes their code unique, from small analysis scripts to entire bioinformatics software packages.

If you use Biotite in a scientific publication, please cite:

Kunzmann, P. & Hamacher, K. BMC Bioinformatics (2018) 19:346.
https://doi.org/10.1186/s12859-018-2367-z

Installation

Biotite requires the following packages:

numpy

requests

msgpack

networkx

Some functions require additional packages:

mdtraj - Required for trajectory file I/O operations.

matplotlib - Required for plotting purposes.

Biotite can be installed via Conda...

$ conda install -c conda-forge biotite

... or pip

$ pip install biotite

Usage

Here is a small example that downloads two protein sequences from the NCBI Entrez database and aligns them:

import biotite.sequence.align as align
import biotite.sequence.io.fasta as fasta
import biotite.database.entrez as entrez

# Download FASTA file for the sequences of avidin and streptavidin
file_name = entrez.fetch_single_file(
    uids=["CAC34569", "ACL82594"], file_name="sequences.fasta",
    db_name="protein", ret_type="fasta"
)

# Parse the downloaded FASTA file
# and create 'ProteinSequence' objects from it
fasta_file = fasta.FastaFile.read(file_name)
avidin_seq, streptavidin_seq = fasta.get_sequences(fasta_file).values()

# Align sequences using the BLOSUM62 matrix with affine gap penalty
matrix = align.SubstitutionMatrix.std_protein_matrix()
alignments = align.align_optimal(
    avidin_seq, streptavidin_seq, matrix,
    gap_penalty=(-10, -1), terminal_penalty=False
)
print(alignments[0])

MVHATSPLLLLLLLSLALVAPGLSAR------KCSLTGKWDNDLGSNMTIGAVNSKGEFTGTYTTAV-TA
-------------------DPSKESKAQAAVAEAGITGTWYNQLGSTFIVTA-NPDGSLTGTYESAVGNA

TSNEIKESPLHGTQNTINKRTQPTFGFTVNWKFS----ESTTVFTGQCFIDRNGKEV-LKTMWLLRSSVN
ESRYVLTGRYDSTPATDGSGT--ALGWTVAWKNNYRNAHSATTWSGQYV---GGAEARINTQWLLTSGTT

DIGDDWKATRVGINIFTRLRTQKE---------------------
-AANAWKSTLVGHDTFTKVKPSAASIDAAKKAGVNNGNPLDAVQQ

More documentation, including a tutorial, an example gallery and the API reference, is available at https://www.biotite-python.org/.

Contribution

Interested in improving Biotite? Have a look at the contribution guidelines. Feel free to join our community chat on Discord.
