MassBLAST

Command line application to perform BLAST queries from multiple files against different databases at once.

A pre-print of the manuscript describing this application is available at bioRxiv and can be accessed here.

General description of the MassBlast workflow:

Install

Download BLAST+ and MassBlast from the links in the table below.
Install BLAST+ and make sure it is available from the command line
- check by running the command: blastn -version
Decompress MassBlast; it is then ready to be used through the mass-blast script.
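
The availability check from the steps above can also be scripted. The snippet below is a minimal sketch and not part of MassBlast: the helper name tool_version is hypothetical, and it only assumes that the tool accepts the -version flag, as blastn does.

```ruby
require 'open3'

# Returns the first line printed by `<tool> -version`, or nil when the tool
# cannot be started (for example, when it is not on the PATH) or exits
# with an error.
def tool_version(tool)
  output, status = Open3.capture2e(tool, '-version')
  status.success? ? output.lines.first.to_s.strip : nil
rescue Errno::ENOENT, Errno::EACCES
  nil
end

if __FILE__ == $PROGRAM_NAME
  version = tool_version('blastn')
  puts(version ? "Found #{version}" : 'blastn was not found on the PATH')
end
```

A nil result suggests that BLAST+ is either not installed or not reachable from the command line.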

Software name	Windows	Mac OS X	Linux
MassBlast downloads	Download	Download	Download
BLAST+ (prerequisite)	v2.2.30 (32-bit)	v2.6.0	v2.6.0

BLAST+ important notes

Must be installed before MassBlast is run
Windows users
1. Must install 32-bit version v2.2.30 of BLAST+
2. In case of an error in the first run:
  - Delete the ncbi.ini file located in a subdirectory of the AppData folder in the user directory
  - If the problem persists, submit an issue.
Mac OS X and Linux users
- MassBlast was tested with version 2.6.0, but it may also work with more recent versions, or with older ones down to v2.2.30

note: Ruby and all other requirements are included in the package files, so there is no need to install them when using the packaged version.

How to use MassBlast?

Place FASTA files with queries in the db_and_queries/queries folder.
- You can have as many files as needed; see below for an example of a nucleic-acid query
Place BLAST databases in the db_and_queries/db folder.
- Check "How to setup a Blast database for a transcriptome" below for more information on creating a BLAST database.
Edit the user.yml file to change the options and the BLAST engine to be used; check user.yml.example for more information.
Run the mass-blast script (either double-click it on Windows or run it as a command in the command line).

Example of a nucleic-acid query file that could be placed in db_and_queries/queries folder:

>Example01
attgggaatttactgcaactcaaggagaagaaaccctaccagacttttacaaggtgggct
gaggagt
>Example03
attgggaatttactgcaactcaaggagaagaaaccctaccagactttt
>Example02
attgggaatttactgcaactcaaggagaagaaaccctaccagacttttacaaggtgggct
gaggagtatttactgcaactcaaggagaagaaaccctaccagacttttacaaggtggtgg
gcaactcaagcaactcaagcaactcaagcaactcaa
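
Before running a search it can help to confirm that a query file is well formed. The sketch below is not part of MassBlast; the helpers parse_fasta and nucleic_acid? are hypothetical names. It reads FASTA text, joins sequences that wrap over several lines, and checks that a sequence only uses nucleotide letters.

```ruby
# Parses FASTA text into a hash that maps each header to its sequence.
# Sequence lines that wrap over several lines are joined together.
def parse_fasta(text)
  records = {}
  current = nil
  text.each_line do |line|
    line = line.strip
    next if line.empty?

    if line.start_with?('>')
      current = line[1..-1]
      records[current] = String.new
    elsif current
      records[current] << line
    end
  end
  records
end

# True when the sequence only uses nucleotide letters (IUPAC codes included).
def nucleic_acid?(sequence)
  sequence.match?(/\A[acgtunrykmswbdhv]+\z/i)
end

if __FILE__ == $PROGRAM_NAME
  sample = ">Example03\nattgggaatttactgcaactcaaggagaagaaaccctaccagactttt\n"
  parse_fasta(sample).each do |name, sequence|
    puts "#{name}: #{sequence.length} bases, nucleic acid: #{nucleic_acid?(sequence)}"
  end
end
```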

Install and usage (from source code)

We do not recommend installing from source unless you plan to develop MassBlast further. The package available already has all dependencies pre-packaged and is ready to be used.

Requirements:

Ruby interpreter
Bundler gem
run bundle install at the root directory
Options are configurable via config/user.yml file
- Change 'db_parent' and 'query_parent' to specify the parent directories for blast databases and queries
- Change 'dbs' and 'folder_queries' to specify the databases that should be used and which query folders should be crawled
$ ruby script.rb
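
As an illustration of the options above, the sketch below loads a configuration and reports which of the four named options are missing. It is a guess at a flat layout, not the actual contents of config/user.yml: only the key names db_parent, query_parent, dbs and folder_queries come from the text, every value is a placeholder, and the helper missing_options is hypothetical.

```ruby
require 'yaml'

# Hypothetical contents for config/user.yml; only the four key names come
# from the README, every value is a placeholder.
EXAMPLE_CONFIG = <<~YAML
  db_parent: db_and_queries/db
  query_parent: db_and_queries/queries
  dbs:
    - my_transcriptome
  folder_queries:
    - enzymes
YAML

# Returns the names of the expected options that are missing from the config.
def missing_options(config, expected = %w[db_parent query_parent dbs folder_queries])
  expected.reject { |key| config.key?(key) }
end

if __FILE__ == $PROGRAM_NAME
  config = YAML.safe_load(EXAMPLE_CONFIG)
  puts "Missing options: #{missing_options(config).inspect}"
end
```

Check user.yml.example in the repository for the real layout before relying on this structure.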

External data

The test BLAST database and the taxonomy database are no longer kept in the git tree. To get this auxiliary data, run the command below or call mass-blast via script.rb

$ rake bootstrap.rb

If you need to include it in your code use:

require_relative 'src/download'

ExternalData.download(path_to_db_parent)

How to test it

$ rake spec

Type of BLAST methods available

The method is defined in the file user.yml

BLASTn: Nucleic-acid sequences against a nucleic-acid database
TBLASTn: Protein sequences against a nucleic-acid database (dynamically translated to amino-acid sequences in all six reading frames)
TBLASTx: Nucleic-acid sequences against nucleic-acid database, where both query and database are dynamically translated to amino-acid sequences into all six reading frames
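
The choice between the three types follows from the kind of query and from whether the nucleic-acid query should be translated. The helper below is a hypothetical sketch of that rule, not code taken from MassBlast; it returns the lowercase names of the corresponding BLAST+ programs, and it assumes the database is always a nucleic-acid one, as in the list above.

```ruby
# Picks a BLAST type for a search against a nucleic-acid database.
# query_type is :nucleic_acid or :protein. translate_both only matters for
# nucleic-acid queries and selects the translated search.
def blast_type(query_type, translate_both: false)
  case query_type
  when :protein then 'tblastn'
  when :nucleic_acid then translate_both ? 'tblastx' : 'blastn'
  else raise ArgumentError, "unknown query type: #{query_type.inspect}"
  end
end

if __FILE__ == $PROGRAM_NAME
  puts blast_type(:nucleic_acid)
  puts blast_type(:protein)
  puts blast_type(:nucleic_acid, translate_both: true)
end
```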

Methods available

Every BLAST type implements two methods, blast and blast_folders

blast(qfile, db, out_file, query_parent=nil, db_parent=nil)
- qfile: query file path - string
- db: database name - string
- out_file: output file path (can be relative) - string
- query_parent: parent directory of query (optional) - string
- db_parent: parent directory of database (optional) - string

notes: 'qfile' and 'db' arguments can be relative to 'query_parent' and 'db_parent' (respectively).

blast_folders( folders=nil, query_parent=nil, db_parent=nil )
- folders: list of folders (optional) - array of strings
- query_parent: parent directory of folders (optional) - string
- db_parent: parent directory of database (optional) - string

notes: 'folders' argument can be relative to 'query_parent'. All optional parameters must be set in the config.yml file
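
The notes above say that some arguments can be relative to a parent directory. The sketch below shows one plausible reading of that rule. It is an assumption for illustration only and is not MassBlast's implementation: the helper resolve_against is hypothetical, and the real methods may resolve paths differently.

```ruby
require 'pathname'

# Joins `path` onto `parent`, unless no parent is given or `path` is
# already absolute, in which case `path` is returned unchanged.
def resolve_against(path, parent = nil)
  return path if parent.nil? || Pathname.new(path).absolute?

  File.join(parent, path)
end

if __FILE__ == $PROGRAM_NAME
  # Placeholder file and folder names.
  puts resolve_against('enzymes/query.fasta', 'db_and_queries/queries')
  puts resolve_against('query.fasta')
end
```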

How to setup a Blast database for a transcriptome

Using the makeblastdb command that comes bundled with BLAST+

Open the command line in your operating system
Navigate to the directory that has the fasta file with the assembly
Run the makeblastdb command in that directory
- nucleic-acid database
$ makeblastdb -in <filename> -dbtype nucl -out "<blast_db_new_name>" -title "<blast_db_new_name>"
- protein database
$ makeblastdb -in <filename> -dbtype prot -out "<blast_db_new_name>" -title "<blast_db_new_name>"

note: do not use spaces in <blast_db_new_name>
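
When several databases have to be created, the command line can be generated rather than typed. The sketch below is not part of MassBlast and the helper makeblastdb_command is hypothetical. It only builds the command string, using the same makeblastdb flags as above, accepts nucl or prot as the database type, and rejects names that contain spaces.

```ruby
require 'shellwords'

# Builds a makeblastdb command line as a string; it does not run it.
# dbtype must be 'nucl' (nucleic acid) or 'prot' (protein).
def makeblastdb_command(fasta, name, dbtype)
  raise ArgumentError, "dbtype must be 'nucl' or 'prot'" unless %w[nucl prot].include?(dbtype)
  raise ArgumentError, 'database name must not contain spaces' if name =~ /\s/

  ['makeblastdb', '-in', fasta, '-dbtype', dbtype, '-out', name, '-title', name].shelljoin
end

if __FILE__ == $PROGRAM_NAME
  # Placeholder file and database names.
  puts makeblastdb_command('assembly.fasta', 'my_transcriptome', 'nucl')
end
```

The resulting string can be pasted into a terminal opened in the directory that holds the fasta file.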

Quickly setup databases

Place the fasta files for the database in db_and_queries/import_dbs directory and run the appropriate script.

You also need to specify whether it is a nucleic-acid or protein-based fasta file.

For Linux and Mac OS X run the import_fastas.sh script

$ cd db_and_queries/import_dbs
# for nucleic-acid
$ sh import_fastas.sh nucl
# for protein
$ sh import_fastas.sh prot

For Windows run the import_fastas.bat script

$ cd db_and_queries/import_dbs
# for nucleic-acid
$ import_fastas.bat nucl
# for protein
$ import_fastas.bat prot

Related Tools

ORF-Finder: Finds the longest Open Reading Frame from a nucleic-acid sequence
BioRuby: Open source bioinformatics library for Ruby
Gene Extractor: can be used to extract genes from Kegg2 and GenBank using keyword search
MassBlast package bundler: Creates a package that can be easily used in all main Operating Systems without having to install Ruby or any Ruby dependencies

Acknowledgements

MassBlast was developed primarily by André Veríssimo, Jean-Etienne Bassard and Susana Vinga

A pre-print of the manuscript is available at bioRxiv and can be accessed here

This work was supported by:

European Union Framework Program 7, Project BacHBERRY (FP7-613793);
FCT, through IDMEC, under LAETA, projects (UID/EMS/50022/2013);
- Susana Vinga acknowledges support by program Investigador FCT (IF/00653/2012) from FCT, co-funded by the European Social Fund (ESF) through the Operational Program Human Potential (POPH);
- André Veríssimo acknowledges support from FCT (SFRH/BD/97415/2013).

We would like to thank Cathie Martin and Philippe Vain for reading the manuscript and providing us with important comments and insights. We would also like to thank Aldo Ricardo Almeida Robles and Nuno Mira for testing MassBlast.
