Tools for working with SAM/BAM/CRAM data

sambamba

Introduction

Sambamba is a fast, robust, and highly parallel tool (and library), written in the D programming language, for working with SAM and BAM files. Because of its efficiency, it is an important workhorse in many sequencing centres around the world.

Current functionality is an important subset of samtools functionality, including view, index, sort, markdup, and depth. Most tools support piping: just specify /dev/stdin or /dev/stdout as filenames. When we started writing sambamba (in 2012), the main advantage over samtools was parallelized BAM reading and writing. In March 2017 samtools 1.4 was released, reaching parity on this. A recent performance comparison shows that sambamba holds its ground and can do better in some configurations. Some comparison metrics: for flagstat, sambamba is 1.4x faster than samtools; for index they are similar; for markdup sambamba is almost 6x faster and for view 4x faster. For sort, samtools is generally faster, though sambamba is up to 2x faster on large RAM machines.
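
The piping convention can be sketched as follows: SAM arrives on standard input and BAM leaves on standard output. The function name `sam_to_bam` and the file names are placeholders, and the `-S` (SAM input), `-f bam` (output format), and `-o` (output file) options of `sambamba view` should be checked against the help text of your installed version.

```shell
# Convert SAM read from stdin into BAM written to stdout.
# /dev/stdin and /dev/stdout are passed as ordinary filenames.
sam_to_bam() {
  sambamba view -S -f bam -o /dev/stdout /dev/stdin
}

# Example with hypothetical files; skipped unless sambamba and the input exist.
if command -v sambamba >/dev/null 2>&1 && [ -f input.sam ]; then
  sam_to_bam < input.sam > output.bam
fi
```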

In addition sambamba has a few interesting features to offer, in particular

  • faster large machine sort, see performance
  • automatic index creation when writing any coordinate-sorted file
  • view -L <bed file> utilizes BAM index to skip unrelated chunks
  • depth measures base, sliding window, or region coverage
    • Chanjo builds upon this and gets you to exon/gene levels of abstraction
  • markdup, a fast implementation of the Picard algorithm
  • slice quickly extracts a region into a new file, tweaking only first/last chunks
  • and more
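
A few of the features listed above might be invoked roughly as shown here. This is a sketch rather than a reference: the file names, the region, and the `run` helper are made up for illustration, and the option spellings (`-L`, `-o`, `-f`) should be confirmed by calling each command without arguments. By default the helper only prints the commands, so nothing is executed until `RUN=1` is set.

```shell
# Placeholder inputs; substitute your own coordinate-sorted, indexed BAM and a BED file.
BAM=${BAM:-sample.bam}
BED=${BED:-regions.bed}

# Print each command; set RUN=1 to execute it instead.
run() {
  if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Keep only reads overlapping the BED regions (uses the BAM index).
run sambamba view -f bam -L "$BED" -o on_target.bam "$BAM"
# Report coverage per BED region.
run sambamba depth region -L "$BED" -o coverage.txt "$BAM"
# Extract a single region into a new BAM file.
run sambamba slice -o region.bam "$BAM" chr1:1000000-2000000
# Mark duplicate reads.
run sambamba markdup "$BAM" marked.bam
```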

Even though Sambamba started out as a samtools clone, we are now adding new functionality, also in the BioD project. The D language is extremely suitable for high performance computing. At this point we think that the BAM format is here to stay for processing sequencing data, and we aim to make it easy to parse and process BAM files.

Sambamba is free and open source software, licensed under GPLv2+. See the online manual pages to learn what is available and how to use it.

For more information on Sambamba contact the mailing list (see below).

Binary installation

Install stable release

For those not in the mood to learn/install new package managers, there are GitHub source and binary releases. Simply download the tarball, unpack it and run it. For example:

wget https://github.com/biod/sambamba/releases/download/v0.6.8/sambamba_v0.6.8_linux.tar.bz2
tar xvjf sambamba_v0.6.8_linux.tar.bz2
./sambamba_v0.6.8

    sambamba 0.6.8

        Usage: sambamba [command] [args...]

        Available commands: 'view', 'index', 'merge', 'sort',
                             'flagstat', 'slice', 'markdup', 'depth', 'mpileup'
        To get help on a particular command, just call it without args.

Bioconda install

With Conda use the bioconda channel.
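
A minimal sketch of a Conda install, assuming `conda` is already set up. The function name is illustrative, and listing `conda-forge` alongside `bioconda` follows the usual Bioconda channel setup rather than anything stated in this README.

```shell
# Install sambamba from the bioconda channel
# (conda-forge is listed because bioconda packages commonly depend on it).
install_sambamba_conda() {
  conda install -y -c conda-forge -c bioconda sambamba
}

if command -v conda >/dev/null 2>&1; then
  echo "conda found; call install_sambamba_conda to install"
else
  echo "conda not found; use a binary release instead"
fi
```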

GNU Guix install

A GNU Guix package for sambamba is available. The development version is packaged here.

Debian GNU/Linux install

Debian: see Debian packages.

Homebrew install

Users of Homebrew can also use the formula from homebrew-science.

Getting help

Sambamba has a mailing list for installation help and general discussion.

Reporting a sambamba bug or issue

Before posting an issue, search the issue tracker and the mailing list first. It is likely that someone has encountered something similar. Also try running the latest version of sambamba to make sure the problem has not been fixed already. Support and installation questions should go to the mailing list. The issue tracker is for development issues around the software itself. When reporting an issue, include the output of the program and the contents of the output directory.

Check list:

  1. I have found an issue with sambamba
  2. I have searched for it on the issue tracker (also check closed issues)
  3. I have searched for it on the mailing list
  4. I have tried the latest release of sambamba
  5. I have read and agreed to the code of conduct below
  6. If it is a support/install question I have posted it to the mailing list
  7. If it is software development related I have posted a new issue on the issue tracker or added to an existing one
  8. In the message I have included the output of my sambamba run
  9. In the message I have included the relevant files in the output directory
  10. I have made available the data to reproduce the problem (optional)
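
One way to collect the program output asked for in the check list is a small wrapper like the one below. It is only a sketch: the function name `capture_report` and the log file name are made up for illustration. It relies on sambamba printing its version and usage when called without arguments, as shown in the installation section.

```shell
# Run a command and save its output, its exit status, and the sambamba
# version banner to one log file that can be attached to a report.
capture_report() {
  log=$1
  shift
  {
    echo "# command: $*"
    echo "# date: $(date)"
    # Called without arguments, sambamba prints its version and usage.
    if command -v sambamba >/dev/null 2>&1; then
      sambamba 2>&1 | head -n 3
    fi
    "$@" 2>&1
    echo "# exit status: $?"
  } > "$log"
}
```

For example, `capture_report sambamba-report.log sambamba flagstat sample.bam` writes everything to `sambamba-report.log`.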

To find bugs, the sambamba developers may ask you to install a development version of the software. They may also ask you for your data and will treat it confidentially. Please always remember that sambamba is written and maintained by volunteers with good intentions. Our time is valuable too. By helping us as much as possible, you make it possible for us to provide this tool for everyone to use.

Code of conduct

By using sambamba and communicating with its community you implicitly agree to abide by the code of conduct as published by the Software Carpentry initiative.

Compiling Sambamba

Note: in general there is no need to compile sambamba. You can use a recent binary install as listed above.

The preferred method for compiling Sambamba is with the LDC compiler, which targets LLVM. LLVM version 6 is faster than earlier versions.

Compilation dependencies

  • git (to check out the repo)
  • gcc compiler 4.9 or later (for htslib)
  • D compiler 1.7.0 or later (ldc2, see below)
  • python2 (parses D-compiler header for version info)
  • zlib (library)
  • lz4 (library)
  • htslib (submodule)
  • BioD (source)
  • undeaD (source)
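
Before building, it can help to confirm that the command-line tools from the list above are on the PATH. The helper below is a sketch (the name `check_tools` is made up); it does not check versions, libraries such as zlib and lz4, or the submodules.

```shell
# Report which of the given command-line tools are on PATH.
# Returns the number of missing tools (0 means all were found).
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok       $tool"
    else
      echo "missing  $tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

check_tools git gcc ldc2 python2 || echo "install the missing tools before running make"
```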

Compiling for Linux

The LDC compiler's GitHub repository provides binary images. The current preferred release for sambamba is LDC, the LLVM D compiler (>= 1.6.1). Install LDC from https://github.com/ldc-developers/ldc/releases/ and then build sambamba with, for example:

cd
wget https://github.com/ldc-developers/ldc/releases/download/v1.7.0/ldc2-1.7.0-linux-x86_64.tar.xz
tar xvJf ldc2-1.7.0-linux-x86_64.tar.xz
export PATH=$HOME/ldc2-1.7.0-linux-x86_64/bin:$PATH
export LIBRARY_PATH=$HOME/ldc2-1.7.0-linux-x86_64/lib
git clone --recursive https://github.com/biod/sambamba.git
cd sambamba
make

To build a development/debug version run

make clean && make debug

To run the tests, fetch shunit2 from https://github.com/kward/shunit2 and put it in the path so you can run

make check
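
Fetching shunit2 and putting it on the PATH could look like the sketch below. The function name and the default directory are assumptions made for illustration; the clone is skipped when the directory already exists.

```shell
# Clone shunit2 into the given directory (skipped if it already exists)
# and prepend that directory to PATH so `make check` can find the script.
setup_shunit2() {
  dir=${1:-$HOME/shunit2}
  if [ ! -d "$dir" ]; then
    git clone https://github.com/kward/shunit2 "$dir" || return 1
  fi
  PATH="$dir:$PATH"
  export PATH
}
```

Typical use from the sambamba source directory would be `setup_shunit2 && make check`.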

GNU Guix

The LDC compiler needed to build sambamba is also available in GNU Guix:

guix package -i ldc

Compiling for Mac OS X

Note: the Makefile does not work on Mac OS X. Contributions that fix it, using Makefile.old as a starting point, are welcome. See also https://github.com/biod/sambamba/issues/338.

    brew install ldc
    git clone --recursive https://github.com/biod/sambamba.git
    cd sambamba
    git clone https://github.com/dlang/undeaD
    make sambamba-ldmd2-64

Development

Sambamba development and the issue tracker are on GitHub. Developer documentation can be found in the source code and the development documentation.

Debugging and troubleshooting

Segfaults on certain Intel Xeons

Important note: some popular Xeon processors segfault under heavy hyper threading - which Sambamba utilizes. Please read this when encountering seemingly random crashes.

Dump core

When it crashes, sambamba can dump a core file. To enable this, set

ulimit -c unlimited

and run your command. Send us the core file so we can reproduce the state at the time of the segfault.
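
To avoid changing the limit for the whole session, the two steps can be wrapped in a subshell, as in this sketch. The function name is made up, and where the core file ends up depends on the system configuration (on Linux, the `kernel.core_pattern` setting).

```shell
# Run a command with core dumps enabled. The body is a subshell, so the
# raised limit does not leak into the calling shell.
run_with_core() (
  ulimit -c unlimited 2>/dev/null ||
    echo "warning: could not raise the core size limit" >&2
  "$@"
)
```

For example: `run_with_core ./build/sambamba command`.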

Use catchsegv

Another option is to use catchsegv:

catchsegv ./build/sambamba command

This shows the program state on stdout, which can be sent to us.

Using gdb

In case of crashes it's helpful to have GDB stacktraces (bt command). A full stacktrace for all threads:

thread apply all backtrace full

Note that GDB should be made aware of the D garbage collector, which uses the SIGUSR1 and SIGUSR2 signals to suspend and resume threads:

handle SIGUSR1 SIGUSR2 nostop noprint

A relocatable binary install of sambamba with debug information and all dependencies can be fetched from the binary link above. Unpack the tarball and run the contained install.sh script with a target directory as its argument:

./install.sh ~/sambamba-test

Run sambamba in gdb with

gdb -ex 'handle SIGUSR1 SIGUSR2 nostop noprint' \
  --args ~/sambamba-test/sambamba-*/bin/sambamba view --throw-error
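
For a report it is often easier to get the backtrace non-interactively. The sketch below combines the gdb commands from this section using gdb's `-batch` mode; the function name and the `GDB` override variable are assumptions made for illustration.

```shell
# Run a program under gdb without interaction. If it stops on a crash,
# gdb prints a full backtrace for every thread and then exits.
gdb_backtrace() {
  "${GDB:-gdb}" -batch \
    -ex 'handle SIGUSR1 SIGUSR2 nostop noprint' \
    -ex run \
    -ex 'thread apply all backtrace full' \
    --args "$@"
}
```

Used as `gdb_backtrace ~/sambamba-test/sambamba-*/bin/sambamba view --throw-error > backtrace.txt 2>&1`, this produces a text file that can be attached to an issue.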

License

Sambamba is distributed under the GNU General Public License v2+.

Credit

If you are using Sambamba in your research and want to support future work on Sambamba, please cite the following publication:

A. Tarasov, A. J. Vilella, E. Cuppen, I. J. Nijman, and P. Prins. Sambamba: fast processing of NGS alignment formats. Bioinformatics, 2015.


@article{doi:10.1093/bioinformatics/btv098,
  author = {Tarasov, Artem and Vilella, Albert J. and Cuppen, Edwin and Nijman, Isaac J. and Prins, Pjotr},
  title = {Sambamba: fast processing of NGS alignment formats},
  journal = {Bioinformatics},
  volume = {31},
  number = {12},
  pages = {2032-2034},
  year = {2015},
  doi = {10.1093/bioinformatics/btv098},
  URL = {http://dx.doi.org/10.1093/bioinformatics/btv098}