SPAN Peak Analyzer
+----------------------------------+
|SPAN Semi-supervised Peak Analyzer|
+----------------------------|/----+
, ,
__.-'|'-.__.-'|'-.__
='=====|========|====='=
~_^~-^~~_~^-^~-~~^_~^~^~^
SPAN Peak Analyzer is a multipurpose peak caller capable of processing a broad range of ChIP-seq, ATAC-seq, and single-cell ATAC-seq datasets.
In semi-supervised mode, it is capable of robustly handling multiple replicates and noise by leveraging limited manual annotation information.
Open Access Paper: https://doi.org/10.1093/bioinformatics/btab376
Citation: Shpynov O, Dievskii A, Chernyatchik R, Tsurinov P, Artyomov MN. Semi-supervised peak calling with SPAN and JBR Genome Browser. Bioinformatics. 2021 May 21.
Latest release
See the releases section for up-to-date information.
Requirements
Download and install Java 8.
Peak calling
To analyze a single (possibly replicated) biological condition, use the analyze command. For details, run:
$ java -jar span.jar analyze --help
The <output.bed> file will contain predicted, FDR-controlled peaks in the ENCODE broadPeak (BED 6+3) format:
<chromosome> <peak start offset> <peak end offset> <peak_name> <score> . <coverage or fold change> <-log p-value> <-log Q-value>
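The column layout above is enough to load SPAN results for downstream analysis. The Python sketch below is a hypothetical helper (`BroadPeak`, `read_broadpeaks` and `stricter_subset` are illustrative names, not part of SPAN) that parses the nine columns and optionally keeps only peaks below a stricter Q-value than the one used for peak calling. It assumes the last column holds -log10(Q), following the ENCODE broadPeak convention; if your output uses a different logarithm base, adjust `stricter_subset` accordingly.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class BroadPeak:
    chrom: str
    start: int          # peak start offset (BED: 0-based)
    end: int            # peak end offset (BED: exclusive)
    name: str
    score: float
    strand: str         # "." in the format shown above
    signal: float       # coverage or fold change
    neg_log_p: float    # -log p-value
    neg_log_q: float    # -log Q-value


def parse_broadpeak_line(line: str) -> BroadPeak:
    """Parse one whitespace-separated broadPeak (BED 6+3) record."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    chrom, start, end, name, score, strand, signal, p, q = fields
    return BroadPeak(chrom, int(start), int(end), name, float(score), strand,
                     float(signal), float(p), float(q))


def read_broadpeaks(lines: Iterable[str]) -> List[BroadPeak]:
    """Parse all records, skipping blank lines, comments and track/browser headers."""
    return [parse_broadpeak_line(line) for line in lines
            if line.strip() and not line.startswith(("#", "track", "browser"))]


def stricter_subset(peaks: Iterable[BroadPeak], max_q: float) -> List[BroadPeak]:
    """Keep peaks with Q <= max_q, ASSUMING the last column stores -log10(Q)."""
    min_score = -math.log10(max_q)
    return [p for p in peaks if p.neg_log_q >= min_score]
```

Typical use is `with open("Results.peak") as fh: peaks = read_broadpeaks(fh)`. A post-hoc filter like this only makes sense for thresholds stricter than the FDR the file was produced with.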
Examples:
- Regular peak calling
java -Xmx4G -jar span.jar analyze -t ChIP.bam -c Control.bam --cs Chrom.sizes -p Results.peak
- Semi-supervised peak calling
java -Xmx4G -jar span.jar analyze -t ChIP.bam -c Control.bam --cs Chrom.sizes -l Labels.bed -p Results.peak
- Model fitting only
java -Xmx4G -jar span.jar analyze -t ChIP.bam -c Control.bam --cs Chrom.sizes
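When SPAN is driven from a pipeline script, the invocations above can be assembled programmatically. The sketch below is a hypothetical Python helper (`span_analyze_argv` is an illustrative name) that uses only flags documented in this README: replicates are joined with commas, and controls must be either a single file or one file per treatment, as described for -c, --control. The `-Xmx4G` heap size mirrors the examples and is not a requirement.

```python
from typing import List, Optional, Sequence


def span_analyze_argv(treatments: Sequence[str], chrom_sizes: str,
                      controls: Sequence[str] = (),
                      labels: Optional[str] = None,
                      peaks: Optional[str] = None,
                      jar: str = "span.jar", heap: str = "4G") -> List[str]:
    """Build the argument list for `java -jar span.jar analyze` (hypothetical helper)."""
    if not treatments:
        raise ValueError("at least one treatment file is required (-t)")
    # Controls: none, a single shared file, or exactly one per treatment replicate.
    if controls and len(controls) not in (1, len(treatments)):
        raise ValueError("pass one control file, or one control per treatment file")
    argv = ["java", f"-Xmx{heap}", "-jar", jar, "analyze",
            "-t", ",".join(treatments)]
    if controls:
        argv += ["-c", ",".join(controls)]
    argv += ["--cs", chrom_sizes]
    if labels is not None:
        argv += ["--labels", labels]   # semi-supervised peak calling
    if peaks is not None:
        argv += ["-p", peaks]          # omit -p to run model fitting only
    return argv
```

The resulting list can be passed to `subprocess.run(argv, check=True)`, which raises `CalledProcessError` if SPAN exits with a non-zero status.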
Differential peak calling
To compare two (possibly replicated) biological conditions, use the compare command. See help for details:
$ java -jar span.jar compare --help
SPAN Command line options
-t, --treatment TREATMENT
Required. ChIP-seq treatment file. Supported formats: BAM, BED, BED.gz, or bigWig. Multiple files should be separated by commas (-t A,B,C); they are treated as replicates and processed as such at the model level.
-c, --control CONTROL
Control file. Multiple files should be separated by commas. Provide either a single control file or one control file per treatment file. Formats and syntax are the same as for -t, --treatment TREATMENT.
-cs, --chrom.sizes CHROMOSOMES_SIZES
Required. Chromosome sizes file for the genome build used in the TREATMENT and CONTROL files. Can be downloaded from UCSC.
--fragment FRAGMENT
Fragment size. If provided, reads are shifted accordingly. If not provided, the shift is estimated from the data. The --fragment 0 argument is necessary for ATAC-seq data processing.
-k, --keep-dup
Keep duplicates. By default, SPAN filters out redundant reads aligned at the same genomic position. The --keep-dup argument is necessary for single-cell ATAC-seq data processing.
-b, --bin BIN_SIZE
Peak analysis is performed on read coverage tiled into consecutive bins of configurable size, in base pairs (default: 200).
-f, --fdr FDR
FDR cutoff for calling significant regions (default: 0.05).
-g, --gap GAP
Gap size for merging spatially close peaks. Useful for broad histone modifications (default: 3).
--labels LABELS
Labels BED file. Used in semi-supervised peak calling.
-p, --peaks PEAKS
Resulting peaks file in ENCODE broadPeak (BED 6+3) format. If omitted, only the model fitting step is performed.
-m, --model MODEL
Path to the SPAN model file. If not provided, the model name is composed of the input file names and other arguments.
--noclip
Do not perform the additional peak clipping step, which increases read density within peaks.
-w, --workdir PATH
Path to the working directory (stores coverage and model caches).
--threads THREADS
Configures the parallelism level. SPAN uses both multithreading and specialized processor extensions such as SSE2 and AVX.
-i, --iterations
Maximum number of iterations for the EM algorithm (default: 20).
--threshold, --tr
Convergence threshold for the EM algorithm (default: 1). Use the --debug option to see detailed information.
-d, --debug
Print all debug information; useful for troubleshooting.
-q, --quiet
Turn off output.
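The --keep-dup and --fragment options act on individual reads before coverage is computed. The following Python sketch illustrates the two ideas under stated assumptions; it is not SPAN's code. Duplicates are assumed to be reads sharing chromosome, 5' end and strand, and the shift is assumed to move each read's 5' end half a fragment downstream, toward the fragment center. All names (`Read`, `drop_duplicates`, `shifted_position`) are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# (chromosome, start, end, strand) with 0-based, half-open coordinates
Read = Tuple[str, int, int, str]


def five_prime_end(read: Read) -> int:
    """Position of the read's 5' end: start on '+', last base otherwise ('-')."""
    _, start, end, strand = read
    return start if strand == "+" else end - 1


def drop_duplicates(reads: Iterable[Read]) -> List[Read]:
    """Keep the first read per (chromosome, 5' end, strand); ASSUMED criterion."""
    seen: Set[Tuple[str, int, str]] = set()
    kept: List[Read] = []
    for read in reads:
        key = (read[0], five_prime_end(read), read[3])
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


def shifted_position(read: Read, fragment: int) -> int:
    """Move the 5' end half a fragment downstream (ASSUMED shifting rule).

    With fragment=0, as recommended above for ATAC-seq, the 5' end is unchanged.
    """
    pos = five_prime_end(read)
    shift = fragment // 2
    return pos + shift if read[3] == "+" else max(0, pos - shift)
```

With single-cell ATAC-seq, reads from different cells can land on the same position, which is presumably why deduplication of this kind is switched off there with --keep-dup.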
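The --cs, --bin and --gap options together describe how the genome is tiled and how nearby enriched bins become peaks. The Python sketch below illustrates that flow in simplified form: read a UCSC-style chrom.sizes file (one chromosome name and length per line), count read positions in fixed-size bins, and merge enriched bins separated by short gaps. Measuring the gap in bins is an assumption made for illustration, and the function names are hypothetical; SPAN's own peak construction is more involved.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def parse_chrom_sizes(lines: Iterable[str]) -> Dict[str, int]:
    """Read a chrom.sizes file: '<chromosome> <length>' per line (tab-separated)."""
    sizes: Dict[str, int] = {}
    for line in lines:
        if line.strip():
            chrom, length = line.split()[:2]
            sizes[chrom] = int(length)
    return sizes


def bin_counts(positions: Sequence[int], chrom_length: int,
               bin_size: int = 200) -> List[int]:
    """Count read positions in consecutive bins; the last bin may be partial."""
    counts = [0] * ((chrom_length + bin_size - 1) // bin_size)
    for pos in positions:
        if 0 <= pos < chrom_length:
            counts[pos // bin_size] += 1
    return counts


def merge_bins(enriched: Iterable[int], gap: int = 3) -> List[Tuple[int, int]]:
    """Merge enriched bin indices into (first, last) runs, bridging up to `gap`
    non-enriched bins between runs (ASSUMED meaning of --gap)."""
    runs: List[Tuple[int, int]] = []
    for b in sorted(set(enriched)):
        if runs and b - runs[-1][1] - 1 <= gap:
            runs[-1] = (runs[-1][0], b)
        else:
            runs.append((b, b))
    return runs


def runs_to_intervals(runs: Iterable[Tuple[int, int]],
                      bin_size: int = 200) -> List[Tuple[int, int]]:
    """Convert bin runs to 0-based, half-open base-pair intervals.

    Intervals are not clipped to the chromosome end.
    """
    return [(first * bin_size, (last + 1) * bin_size) for first, last in runs]
```

For example, with gap=3 the enriched bins 1, 2, 3, 7, 8 and 20 collapse into two runs, (1, 8) and (20, 20), because only three empty bins separate bin 3 from bin 7.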
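The --iterations and --threshold options bound an EM fit: it stops after a maximum number of rounds or once it has converged. The Python sketch below shows that stopping logic on a deliberately simple toy model, a two-state Poisson mixture (background vs. enriched bins). It is not SPAN's model. Treating the threshold as a minimum gain in log-likelihood between rounds is an assumption, and `fit_two_state_em` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple


def poisson_logpmf(k: int, lam: float) -> float:
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def fit_two_state_em(counts: Sequence[int], iterations: int = 20,
                     threshold: float = 1.0) -> Tuple[float, float, float, int]:
    """Toy EM for a background/enriched Poisson mixture over bin counts.

    Stops after `iterations` rounds, or once the log-likelihood gain between
    consecutive rounds drops below `threshold` (ASSUMED convergence criterion).
    Returns (enriched_weight, background_rate, enriched_rate, rounds_run).
    """
    mean = sum(counts) / len(counts)
    w, lam0, lam1 = 0.5, max(mean / 2, 1e-3), max(mean * 2, 1e-3)
    prev_ll = -math.inf
    rounds = 0
    for rounds in range(1, iterations + 1):
        # E-step: posterior probability that each bin is enriched
        resp, ll = [], 0.0
        for k in counts:
            a = math.log(1 - w) + poisson_logpmf(k, lam0)
            b = math.log(w) + poisson_logpmf(k, lam1)
            m = max(a, b)
            log_total = m + math.log(math.exp(a - m) + math.exp(b - m))
            resp.append(math.exp(b - log_total))
            ll += log_total
        # M-step: re-estimate the mixture weight and both rates
        r_sum = sum(resp)
        w = min(max(r_sum / len(counts), 1e-6), 1 - 1e-6)
        lam1 = max(sum(r * k for r, k in zip(resp, counts)) / max(r_sum, 1e-12), 1e-3)
        lam0 = max(sum((1 - r) * k for r, k in zip(resp, counts))
                   / max(len(counts) - r_sum, 1e-12), 1e-3)
        # Stop when the log-likelihood (of this round's starting parameters) stalls
        if ll - prev_ll < threshold:
            break
        prev_ll = ll
    return w, lam0, lam1, rounds
```

On clearly separated data, such as 80 bins with one read and 20 bins with twenty reads, the fit converges within a few rounds, well before the default limit of 20.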
Example
A step-by-step example with a test dataset is available here.
Galaxy
SPAN is available as a tool in the official ToolShed for Galaxy. You can ask your Galaxy administrator to install it.
Build from sources
Clone the bioinf-commons library into the project root:
git clone git@github.com:JetBrains-Research/bioinf-commons.git
Run the following command to build the SPAN jar:
./gradlew shadowJar
The SPAN jar file will be generated in the build/libs folder.
FAQ
- Q: What is the average running time?
A: SPAN can process a single ChIP-seq track in less than 1 hour on an average laptop (MacBook Pro 2015).
- Q: Which operating systems are supported?
A: SPAN is written in the modern Kotlin programming language and can run on any platform supported by Java.
- Q: Where did you get this lovely span picture?
A: From ascii.co.uk; the original author goes by the name jgs.
Errors Reporting
Use GitHub issues to suggest new features or report bugs.