Incubator for useful bioinformatics code, primarily in Python and R

Collection of useful code related to biological analysis. Much of this is discussed, with examples, at Blue Collar Bioinformatics.

Some projects which may be especially interesting:

  • CloudBioLinux -- An automated environment to install useful biological software and libraries. It is used to bootstrap blank machines, such as those you'd find at cloud providers like Amazon, into ready-to-go analysis workstations. See the CloudBioLinux effort for more details. This project moved to its own repository at
  • gff -- A GFF parsing library in Python, intended for inclusion in Biopython.
  • nextgen -- A Python toolkit providing best-practice pipelines for fully automated high-throughput sequencing analysis. This project has moved into its own repository:
  • distblast -- A distributed BLAST analysis for identifying best hits across a wide variety of organisms for downstream phylogenetic analyses. The code is generalized to run on both local multiprocessor machines and distributed Hadoop clusters.
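To give a feel for what a GFF parser has to handle, here is a minimal, standalone sketch of reading GFF3 feature lines. It is not the API of the gff library in this repository; the names `GFFFeature` and `parse_gff3` are hypothetical. It assumes GFF3 input (nine tab-separated columns, percent-encoded `key=value` attributes, comma-separated multiple values) and converts the 1-based inclusive coordinates to the 0-based half-open convention Biopython uses for feature locations.

```python
import io
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, TextIO
from urllib.parse import unquote


@dataclass
class GFFFeature:
    """Hypothetical container for one GFF3 feature line."""
    seqid: str
    source: str
    type: str
    start: int  # 0-based, converted from GFF's 1-based inclusive start
    end: int    # exclusive end; numerically equal to GFF's inclusive end
    score: Optional[float]
    strand: Optional[int]  # 1, -1, or None for "." / "?"
    phase: Optional[int]
    attributes: Dict[str, List[str]] = field(default_factory=dict)


def _parse_attributes(text: str) -> Dict[str, List[str]]:
    """Split 'ID=a;Note=x,y' into {'ID': ['a'], 'Note': ['x', 'y']}, unescaping %XX."""
    attrs: Dict[str, List[str]] = {}
    if text == ".":
        return attrs
    for pair in text.strip().split(";"):
        if not pair:
            continue  # tolerate trailing or doubled semicolons
        key, _, value = pair.partition("=")
        attrs[unquote(key)] = [unquote(v) for v in value.split(",")]
    return attrs


def parse_gff3(handle: TextIO) -> Iterator[GFFFeature]:
    """Yield features from a GFF3 handle, skipping comments and stopping at ##FASTA."""
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence section; not handled in this sketch
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
        seqid, source, ftype, start, end, score, strand, phase, attrs = cols
        yield GFFFeature(
            seqid=unquote(seqid),
            source=source,
            type=ftype,
            start=int(start) - 1,
            end=int(end),
            score=None if score == "." else float(score),
            strand={"+": 1, "-": -1}.get(strand),
            phase=None if phase == "." else int(phase),
            attributes=_parse_attributes(attrs),
        )
```

For example, parsing `chr1\tdemo\tgene\t1000\t2000\t.\t+\t.\tID=gene1;Name=ABC%3B1` from an `io.StringIO` yields one feature with `start == 999`, `end == 2000`, and the escaped semicolon restored in `Name`. A full library would additionally build parent/child feature hierarchies from `Parent` attributes, which this sketch leaves out.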
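The "best hit per organism" step that distblast feeds into phylogenetic analyses can be illustrated with a small reduction over BLAST+ tabular output (`-outfmt 6`, default 12 columns). This is a hedged sketch, not distblast's actual code: the function name `best_hits`, the e-value cutoff, and the convention that a subject ID's organism is the prefix before `|` (for example `hsa|P12345`) are all assumptions made for illustration. Ranking by bit score is a common criterion but may differ from what distblast uses.

```python
import csv
from typing import Dict, Iterable, Tuple

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(rows: Iterable[str], max_evalue: float = 1e-5,
              organism_sep: str = "|") -> Dict[Tuple[str, str], Tuple[str, float]]:
    """Return {(query, organism): (subject, bitscore)} keeping the top-scoring hit.

    Assumes (hypothetically) that the organism is encoded as the prefix of the
    subject ID before `organism_sep`. Hits above `max_evalue` are discarded.
    """
    best: Dict[Tuple[str, str], Tuple[str, float]] = {}
    for rec in csv.DictReader(rows, fieldnames=OUTFMT6_FIELDS, delimiter="\t"):
        if float(rec["evalue"]) > max_evalue:
            continue
        organism = rec["sseqid"].split(organism_sep, 1)[0]
        key = (rec["qseqid"], organism)
        score = float(rec["bitscore"])
        # Keep only the highest bit score seen for this query/organism pair.
        if key not in best or score > best[key][1]:
            best[key] = (rec["sseqid"], score)
    return best
```

Because the reduction is keyed per query, it splits naturally across workers: each process or Hadoop task can handle a subset of queries and the per-key results can be merged afterwards.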