A collection of handy scripts for performing a protein cluster-restricted BLAST search. Cluster-restricted BLASTp is much faster than the standard implementation of BLASTp for medium, large, and very large databases, with results that are 99% identical to standard BLAST.

Welcome to RUBBLE, a pipeline that lets you perform BLAST searches 10-20X faster while closely reproducing standard BLAST results: precision = 98% (+/- 2%); recall = 98% (+/- 2%).

RUBBLE is most useful when your subject BLAST database is large (e.g. UniRef100).

  1. Downloading RUBBLE

To download, simply clone the RUBBLE repository from GitHub:

git clone git@github.com:dnasko/rubble

And RUBBLE will be cloned to your working directory.

  2. Installing RUBBLE and its Dependencies

Once you have cloned the repository you should see 3 files and 3 directories:

  • LICENSE: the GPL version 2.
  • README.md: this read me!
  • ./getting_started: a directory containing some additional information to help you get started.
  • ./images: a directory with images, logos, etc. No need to worry about any of this.
  • rubble.pl: a symbolic link to the rubble.pl script. Lets you run RUBBLE after you have databases built.
  • ./scripts: the scripts directory, which has all of the important bits.

RUBBLE has one external dependency: NCBI BLAST+. Before you can run this pipeline, make sure that all of its executables (especially blastp, makeblastdb, and blastdbcmd) are installed on your machine and included in your PATH. The latest BLAST+ binaries are available from NCBI.
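A quick way to confirm the PATH requirement above is a small shell check. This is only a sketch: the `check_tools` helper is a hypothetical name used for illustration and is not part of RUBBLE; it relies solely on the standard `command -v` builtin.

```shell
#!/bin/sh
# check_tools is a hypothetical helper (not shipped with RUBBLE).
# It prints one line per tool and returns non-zero if any tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# The three BLAST+ executables named above.
check_tools blastp makeblastdb blastdbcmd \
    || echo "Install BLAST+ and add its bin directory to your PATH."
```

If any line reads `missing:`, install BLAST+ or extend your PATH before continuing.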

Perl modules needed: threads, which is likely not installed on most systems by default. It can be installed very easily (with admin privileges) via cpanminus:

sudo cpanm threads
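To see whether the threads module is already usable before installing anything, you can ask Perl to load it. This is a minimal sketch; the `threads_status` variable is just an illustrative name.

```shell
#!/bin/sh
# perl -Mthreads loads the threads module and runs a no-op program.
# It fails if the module is absent or if this perl was built
# without thread support.
if perl -Mthreads -e 1 2>/dev/null; then
    threads_status="available"
else
    threads_status="unavailable"
fi
echo "Perl threads: $threads_status"
```

If this reports `unavailable` even after installing the module, the Perl build itself may lack thread support, in which case a threaded Perl build is needed.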

  3. Using RUBBLE

Before you can BLAST a set of query sequences against a set of subject sequences you must create an indexed database. The same is true for RUBBLE, but with an additional requirement: not only do we need a BLAST database of your subject sequences, we also need a BLAST database of your clustered subject sequences. Below I will briefly detail how RUBBLE databases are created and then how RUBBLE can be run.
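As a rough illustration of the two-database requirement, the sketch below builds two protein databases with plain `makeblastdb`. It is not RUBBLE's own build procedure (see the scripts directory for that). The file names, the database names, the `build_protein_db` helper, and the `DRY_RUN` switch are all hypothetical, and it assumes you already have a FASTA file of cluster sequences produced by whatever clustering method you use. The `-parse_seqids` flag is included on the assumption that sequences will later be retrieved by ID with blastdbcmd.

```shell
#!/bin/sh
# Hypothetical input files; substitute your own.
SUBJECT_FASTA="subjects.fasta"           # all subject proteins
CLUSTER_FASTA="subject_clusters.fasta"   # clustered subject sequences

# DRY_RUN=1 (the default here) only prints the commands.
# Set DRY_RUN=0 to actually run makeblastdb.
DRY_RUN="${DRY_RUN:-1}"

# build_protein_db is a hypothetical helper.
# $1 = input FASTA, $2 = output database name
build_protein_db() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "makeblastdb -in $1 -dbtype prot -parse_seqids -out $2"
    else
        makeblastdb -in "$1" -dbtype prot -parse_seqids -out "$2"
    fi
}

build_protein_db "$SUBJECT_FASTA" "subjects_db"
build_protein_db "$CLUSTER_FASTA" "subject_clusters_db"
```

Printing the commands first makes it easy to check paths and names before committing to a long indexing run on a large database.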

Creating RUBBLE databases

Acknowledgements

Support from the University of Delaware Center for Bioinformatics and Computational Biology Core Facility and use of the BIOMIX compute cluster was made possible through funding from Delaware INBRE (NIGMS GM103446) and the Delaware Biotechnology Institute.