SPAN Peak Analyzer
+----------------------------------+
|SPAN Semi-supervised Peak Analyzer|
+----------------------------|/----+
           ,        ,
      __.-'|'-.__.-'|'-.__
    ='=====|========|====='=
    ~_^~-^~~_~^-^~-~~^_~^~^~^
SPAN Peak Analyzer is a multipurpose peak caller capable of processing a broad range of ChIP-seq, ATAC-seq, and
single-cell ATAC-seq datasets.
In semi-supervised mode, it can robustly handle multiple replicates and noise by leveraging a limited amount of manual annotation.
Open Access Paper: https://doi.org/10.1093/bioinformatics/btab376
Citation: Shpynov O, Dievskii A, Chernyatchik R, Tsurinov P, Artyomov MN. Semi-supervised peak calling with SPAN and JBR Genome Browser. Bioinformatics. 2021 May 21.
Latest release
Version 0.13.5244 released on Aug 12th, 2020.
Download and install Java 8.
To analyze a single (possibly replicated) biological condition, use the
analyze command. For details, run:
$ java -jar span.jar analyze --help
Peaks are reported in the ENCODE broadPeak (BED 6+3) format:
<chromosome> <peak start offset> <peak end offset> <peak_name> <score> . <coverage or fold change> <-log p-value> <-log Q-value>
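As a rough illustration of the column layout above, the following Python sketch splits one peak line into named fields. The `Peak` record, its field names, and the `parse_peak_line` helper are hypothetical and not part of SPAN. The sketch assumes whitespace-separated columns and an integer score, and the example line is made up.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    chromosome: str
    start: int          # peak start offset
    end: int            # peak end offset
    name: str
    score: int          # assumed to be an integer
    strand: str         # the "." placeholder column
    signal: float       # coverage or fold change
    neg_log_p: float    # -log p-value
    neg_log_q: float    # -log Q-value


def parse_peak_line(line: str) -> Peak:
    """Split one whitespace-separated peak line into typed fields."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    chrom, start, end, name, score, strand, signal, p, q = fields
    return Peak(chrom, int(start), int(end), name, int(score), strand,
                float(signal), float(p), float(q))


# Made-up example line, not real SPAN output.
example = "chr1\t1000\t1600\tpeak_1\t500\t.\t12.5\t8.2\t6.1"
peak = parse_peak_line(example)
print(peak.chromosome, peak.end - peak.start)
```

A record type like this makes downstream filtering (for example, by peak length or by the Q-value column) easier to read than indexing into raw column lists.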
- Regular peak calling
java -Xmx4G -jar span.jar analyze -t ChIP.bam -c Control.bam --cs Chrom.sizes -p Results.peak
- Semi-supervised peak calling
java -Xmx4G -jar span.jar analyze -t ChIP.bam -c Control.bam --cs Chrom.sizes -l Labels.bed -p Results.peak
- Model fitting only
java -Xmx4G -jar span.jar analyze -t ChIP.bam -c Control.bam --cs Chrom.sizes
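When SPAN is driven from a script, the three invocations above differ only in whether -l and -p are present. The Python sketch below assembles the argument list accordingly. The `span_analyze_command` helper and its parameter names are hypothetical; only the flags shown in the examples above are used.

```python
from typing import List, Optional, Sequence


def span_analyze_command(
    treatment: Sequence[str],
    control: Sequence[str],
    chrom_sizes: str,
    peaks: Optional[str] = None,
    labels: Optional[str] = None,
    jar: str = "span.jar",
    max_heap: str = "4G",
) -> List[str]:
    """Assemble the argument list for `java -jar span.jar analyze`."""
    cmd = ["java", f"-Xmx{max_heap}", "-jar", jar, "analyze",
           "-t", ",".join(treatment),   # replicates are comma-separated
           "-c", ",".join(control),
           "--cs", chrom_sizes]
    if labels is not None:              # semi-supervised peak calling
        cmd += ["-l", labels]
    if peaks is not None:               # omit to run model fitting only
        cmd += ["-p", peaks]
    return cmd


cmd = span_analyze_command(["ChIP.bam"], ["Control.bam"], "Chrom.sizes",
                           peaks="Results.peak", labels="Labels.bed")
print(" ".join(cmd))
# To run it (requires Java and span.jar on disk):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.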
Differential peak calling
To compare two (possibly replicated) biological conditions, use the
compare command. See help for details:
$ java -jar span.jar compare --help
SPAN Command line options
-t, --treatment TREATMENT
Required. ChIP-seq treatment file. Supported formats: BAM, BED, BED.gz, or bigWig. Multiple files should be separated by commas, for example
-t A,B,C. Multiple files are
processed as replicates at the model level.
-c, --control CONTROL
Control file. Multiple files should be separated by commas. Either a single control file or a separate control file for each treatment file is required. Follow the instructions for -t, --treatment.
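The pairing rule for control files can be checked before launching a long run. The sketch below is a minimal Python illustration, not SPAN's own logic. The `pair_controls` helper is hypothetical. It assumes that the i-th control belongs to the i-th treatment when several controls are given, and that omitting the control altogether is allowed because the option is not marked as required.

```python
from typing import List, Optional, Tuple


def pair_controls(treatment_arg: str,
                  control_arg: Optional[str]) -> List[Tuple[str, Optional[str]]]:
    """Pair each treatment file with its control, following the -t / -c rules."""
    treatments = treatment_arg.split(",")
    if not control_arg:
        # Assumption: no control given, every treatment runs without one.
        return [(t, None) for t in treatments]
    controls = control_arg.split(",")
    if len(controls) == 1:
        # A single control file is shared by all treatment files.
        return [(t, controls[0]) for t in treatments]
    if len(controls) == len(treatments):
        # Assumption: controls are matched to treatments by position.
        return list(zip(treatments, controls))
    raise ValueError(
        f"expected 1 or {len(treatments)} control files, got {len(controls)}")


print(pair_controls("A.bam,B.bam,C.bam", "Input.bam"))
```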
--cs, --chrom.sizes CHROMOSOMES_SIZES
Required. Chromosome sizes file for the genome build used in TREATMENT and CONTROL files. Can be downloaded at UCSC.
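For reference, a chromosome sizes file as distributed by UCSC is a plain-text table with one chromosome per line: name, then length in base pairs. The Python sketch below reads such a file into a dictionary. The `read_chrom_sizes` helper is hypothetical and is only meant for sanity-checking the file before passing it to SPAN.

```python
import io
from typing import Dict, TextIO


def read_chrom_sizes(handle: TextIO) -> Dict[str, int]:
    """Read `<chromosome> <length>` lines into a name -> length mapping."""
    sizes: Dict[str, int] = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        name, length = line.split()[:2]   # ignore any extra columns
        sizes[name] = int(length)
    return sizes


# Small in-memory example; a real file would be opened with open(path).
example = io.StringIO("chr1\t248956422\nchr2\t242193529\nchrM\t16569\n")
sizes = read_chrom_sizes(example)
print(len(sizes), sizes["chrM"])
```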
--fragment
Fragment size. If provided, reads are shifted appropriately. If not provided, the shift is estimated from the data.
The --fragment 0 argument is necessary for ATAC-seq data processing.
--keep-dup
Keep duplicates. By default, SPAN filters out redundant reads aligned at the same genomic position.
The --keep-dup argument is necessary for single-cell ATAC-seq data processing.
-b, --bin BIN_SIZE
Peak analysis is performed on read coverage tiled into consecutive bins of configurable size. (default: 200)
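To make the tiling concrete, here is a minimal Python sketch that counts reads per fixed-size bin along one chromosome. It is an illustration only and not SPAN's implementation. The `bin_coverage` helper is hypothetical, and each read is assumed to be represented by a single 0-based position.

```python
from typing import Iterable, List


def bin_coverage(read_positions: Iterable[int],
                 chrom_length: int,
                 bin_size: int = 200) -> List[int]:
    """Count reads in consecutive bins of `bin_size` bp covering the chromosome."""
    n_bins = (chrom_length + bin_size - 1) // bin_size   # last bin may be shorter
    counts = [0] * n_bins
    for pos in read_positions:
        if 0 <= pos < chrom_length:                      # ignore out-of-range reads
            counts[pos // bin_size] += 1
    return counts


# Six made-up read positions on a 1000 bp chromosome.
print(bin_coverage([0, 150, 199, 200, 450, 999], chrom_length=1000))
# prints [3, 1, 1, 0, 1]
```

A smaller bin size gives finer peak boundaries at the cost of noisier per-bin counts, while a larger one smooths the signal.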
-f, --fdr FDR
Minimum FDR cutoff to call significant regions. (default: 0.05)
-g, --gap GAP
Gap size to merge spatially close peaks. Useful for wide histone modifications. (default: 3)
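The effect of the gap parameter can be sketched as follows. This Python example merges enriched bins into peaks whenever they are separated by at most `gap` non-enriched bins. It is an illustration under stated assumptions, not SPAN's algorithm: the `merge_enriched_bins` helper is hypothetical, and the gap is assumed to be measured in bins.

```python
from typing import Iterable, List, Tuple


def merge_enriched_bins(enriched: Iterable[int],
                        gap: int = 3,
                        bin_size: int = 200) -> List[Tuple[int, int]]:
    """Merge enriched bin indices into (start, end) peaks in base pairs.

    Two enriched bins join the same peak when at most `gap`
    non-enriched bins lie between them (assumption: gap is in bins).
    """
    runs: List[List[int]] = []          # [first_bin, last_bin] per peak
    for b in sorted(set(enriched)):
        if runs and b - runs[-1][1] - 1 <= gap:
            runs[-1][1] = b             # extend the current peak
        else:
            runs.append([b, b])         # start a new peak
    return [(first * bin_size, (last + 1) * bin_size) for first, last in runs]


print(merge_enriched_bins([2, 3, 5, 10, 11, 20], gap=3, bin_size=200))
# prints [(400, 1200), (2000, 2400), (4000, 4200)]
```

With a larger gap, the fragmented enrichment typical of wide histone marks collapses into fewer, longer peaks.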
-l LABELS
Labels BED file. Used in semi-supervised peak calling.
-p, --peaks PEAKS
Resulting peaks file in ENCODE broadPeak (BED 6+3) format. If omitted, only the model fitting step is performed.
-m, --model MODEL
Path to the SPAN model file. If not provided, the model name is composed from the input names and other arguments.
-w, --workdir PATH
Path to the working directory (stores coverage and model caches).
Peaks computation method.
Use 'islands' to merge consecutive blocks of enriched bins with relaxed gaps, or 'simple' to merge FDR-enriched HMM bins with gap into peaks (the previous method). (default: 'islands')
Configures the parallelism level. SPAN utilizes both multithreading and specialized processor extensions like SSE2, AVX, etc.
Number of multi-start runs using different model initializations. Use 0 to disable (default: 5)
Number of iterations for each multi-start run (default: 2)
Maximum number of iterations for EM algorithm. (default: 20)
Convergence threshold for the EM algorithm. Use the
--debug option to see detailed info. (default: 1)
--debug
Print all debug information. Used for troubleshooting.
Turn off output.
A step-by-step example with a test dataset is available here.
Build from sources
Clone bioinf-commons library under the project root.
git clone git@github.com:JetBrains-Research/bioinf-commons.git
Launch the following command line to build SPAN jar:
The SPAN jar file will be generated in the folder
- Q: What is the average running time?
A: SPAN is capable of processing a single ChIP-Seq track in less than 1 hour on an average laptop (MacBook Pro 2015).
- Q: Which operating systems are supported?
A: SPAN is written in Kotlin and can be executed on any platform supported by Java.
- Q: Where did you get this lovely span picture?
A: From ascii.co.uk, the original author goes by the name jgs.
Use GitHub issues to suggest new features or report bugs.