This repository was archived by the owner on Aug 22, 2025. It is now read-only.
bamliquidator
John DiMatteo edited this page Feb 5, 2014
### Overview

* bamliquidator is a suite of tools for efficiently analyzing the density of short DNA sequence read alignments in the BAM file format
* the read counts across many genomes are grouped, normalized, graphed in interactive html files, and summarized
* for an interactive graph example, see this [summary](http://jdimatteo.github.io/Meta-Analysis/summary.html) and this [breakdown for a single chromosome](http://jdimatteo.github.io/Meta-Analysis/chr20.html)
* a whole genome can be processed and analyzed in less than 20 seconds (on modern hardware)
* a BAM file is a binary sequence alignment map -- see [SAMtools](http://samtools.sourceforge.net/) for more info
* the read counts and summaries are stored in HDF5 format, where they can be read efficiently via Python [PyTables](http://www.pytables.org) or the [HDF5 C APIs](http://www.hdfgroup.org/HDF5/)
* see here for a simple Python script example using the summary to show the hot spots and cold spots in a genome (TODO)
* the HDF5 files can be viewed directly with the cross-platform tool [HDFView](http://www.hdfgroup.org/products/java/hdf-java-html/hdfview/)
* there is also a simple command line utility that counts the number of reads in a specified portion of a chromosome and prints the count to the console

### Download

* the latest release can be downloaded here (TODO)
* you can also [build from source yourself](#Developer)

### Usage

#### bamliquidator_batch.py

#### bamliquidator

bamliquidator is run from the command line with 7 required positional arguments:

```
$ bamliquidator
[ bamliquidator ] output to stdout
1. bam file (.bai file has to be at same location)
2. chromosome
3. start
4. stop
5. strand +/-, use dot (.) for both strands
6. number of summary points
7. extension length
```

Example counting the number of reads on both strands from base pair 100 to 200 (inclusive) on chromosome 1, with 1 summary point and an extension length of 0:

```
$ bamliquidator 04032013_D1L57ACXX_4.TTAGGC.hg18.bwt.sorted.bam chr1 100 200 . 1 0
120
$
```

(TODO: add examples with summary points > 1, and explain what extension length does)

### Developer Getting Started Check List

#### Dependencies: SAMtools, HDF5, boost, C++11 (clang/libc++), tcmalloc, PyTables

#### Checkout, build the code, and verify runs

```
$ git clone git@github.com:BradnerLab/pipeline.git
$ cd pipeline/bamliquidator_internal
$ make
$ ./bamliquidator_batch
usage: ./bamliquidator_batch cell_type bin_size ucsc_chrom_size_path bam_file_path hdf5_file
e.g. ./bamliquidator_batch mm1s 100000 /grail/annotations/ucsc_chromSize.txt /ifs/labs/bradner/bam/hg18/mm1s/04032013_D1L57ACXX_4.TTAGGC.hg18.bwt.sorted.bam
note that this application is intended to be run from bamliquidator_batch.py -- see https://github.com/BradnerLab/pipeline/wiki for more information
$ ../bamliquidator
[ bamliquidator ] output to stdout
1. bam file (.bai file has to be at same location)
2. chromosome
3. start
4. stop
5. strand +/-, use dot (.) for both strands
6. number of summary points
7. extension length
$
```
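
Once the build is verified, the bamliquidator command line utility can also be driven from a script. The following is a minimal Python sketch, not part of the pipeline itself: the helper names `build_bamliquidator_args` and `run_bamliquidator` are hypothetical, and it assumes the executable is on `PATH` and that the `.bai` index sits next to the BAM file. It only arranges the seven positional arguments in the order listed under Usage and returns the raw stdout, since the output layout for more than one summary point is not documented on this page.

```python
import subprocess


def build_bamliquidator_args(bam_path, chromosome, start, stop,
                             strand=".", summary_points=1, extension=0,
                             executable="bamliquidator"):
    """Return the argv list for one bamliquidator call (hypothetical helper).

    The order follows the seven positional arguments listed under Usage:
    bam file, chromosome, start, stop, strand, summary points, extension.
    """
    if strand not in ("+", "-", "."):
        raise ValueError("strand must be '+', '-' or '.' (both strands)")
    return [executable, bam_path, chromosome, str(start), str(stop),
            strand, str(summary_points), str(extension)]


def run_bamliquidator(*args, **kwargs):
    """Run bamliquidator and return whatever it printed to stdout.

    Assumption: the executable is on PATH. The output is returned unparsed.
    """
    argv = build_bamliquidator_args(*args, **kwargs)
    completed = subprocess.run(argv, check=True,
                               capture_output=True, text=True)
    return completed.stdout.strip()


if __name__ == "__main__":
    # Same query as the Usage example: both strands, chr1 from 100 to 200.
    print(build_bamliquidator_args(
        "04032013_D1L57ACXX_4.TTAGGC.hg18.bwt.sorted.bam", "chr1", 100, 200))
```

Keeping the argument builder separate from the subprocess call means the argument order can be checked without a BAM file or a built executable at hand.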