SPAN Peak Analyzer

+----------------------------------+
|SPAN Semi-supervised Peak Analyzer|
+----------------------------|/----+
           ,        ,
      __.-'|'-.__.-'|'-.__
    ='=====|========|====='=
    ~_^~-^~~_~^-^~-~~^_~^~^~^

SPAN Peak Analyzer is a multipurpose peak caller capable of processing a broad range of ChIP-seq, ATAC-seq, and single-cell ATAC-seq datasets.
In semi-supervised mode, it robustly handles multiple replicates and noise by leveraging limited manual annotation.

Open Access Paper: https://doi.org/10.1093/bioinformatics/btab376

Citation: Shpynov O, Dievskii A, Chernyatchik R, Tsurinov P, Artyomov MN. Semi-supervised peak calling with SPAN and JBR Genome Browser. Bioinformatics. 2021 May 21.

Latest release

See the releases section for up-to-date information.

Requirements

Download and install Java 8.

Peak calling

To analyze a single (possibly replicated) biological condition, use the analyze command. For details, run:

$ java -jar span.jar analyze --help

The <output.bed> file will contain predicted and FDR-controlled peaks in the ENCODE broadPeak (BED 6+3) format:

<chromosome> <peak start offset> <peak end offset> <peak_name> <score> . <coverage or fold/change> <-log p-value> <-log Q-value>

Examples:

  • Regular peak calling
    java -Xmx4G -jar span.jar analyze -t ChIP.bam -c Control.bam --cs Chrom.sizes -p Results.peak
  • Semi-supervised peak calling
    java -Xmx4G -jar span.jar analyze -t ChIP.bam -c Control.bam --cs Chrom.sizes -l Labels.bed -p Results.peak
  • Model fitting only
    java -Xmx4G -jar span.jar analyze -t ChIP.bam -c Control.bam --cs Chrom.sizes
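
The peaks file described above can be post-processed with standard command-line tools. The following is a minimal sketch, not part of SPAN: `filter_peaks_by_q` is a hypothetical helper, the demo rows are made up, and it assumes the file is tab-separated and that column 9 holds a base-10 negative logarithm of the Q-value, as in the ENCODE broadPeak specification.

```shell
# Hypothetical helper (not part of SPAN): keep peaks whose column 9
# (-log10 Q-value, assuming ENCODE broadPeak conventions) meets a threshold.
filter_peaks_by_q() {
    # $1: tab-separated peaks file
    # $2: minimum -log10(Q), e.g. 2 corresponds to Q <= 0.01
    awk -F'\t' -v min_q="$2" '$9 + 0 >= min_q + 0' "$1"
}

# Made-up rows for demonstration only; this is not real SPAN output.
demo_peaks=$(mktemp)
printf 'chr1\t100\t500\tpeak_1\t900\t.\t12.5\t8.1\t6.3\n' >  "$demo_peaks"
printf 'chr1\t900\t1300\tpeak_2\t300\t.\t3.2\t2.0\t1.1\n' >> "$demo_peaks"

# Print only the peaks with Q <= 0.01.
filter_peaks_by_q "$demo_peaks" 2

rm -f "$demo_peaks"
```

With a real result, the call would look like `filter_peaks_by_q Results.peak 2 > Results.q01.peak`, where the output file name is arbitrary.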

Differential peak calling

To compare two (possibly replicated) biological conditions, use the compare command. See help for details:

$ java -jar span.jar compare --help

SPAN Command line options

-t, --treatment TREATMENT
Required. ChIP-seq treatment file. Supported formats: BAM, BED, BED.gz, or bigWig. Multiple files should be separated by commas (-t A,B,C); they are treated as replicates and processed together at the model level.

-c, --control CONTROL
Control file. Multiple files should be separated by commas. Provide either a single control file or a separate control file for each treatment file. See -t, --treatment TREATMENT for details.

-cs, --chrom.sizes CHROMOSOMES_SIZES
Required. Chromosome sizes file for the genome build used in the TREATMENT and CONTROL files. Can be downloaded from UCSC.

--fragment FRAGMENT
Fragment size. If provided, reads are shifted appropriately. If not provided, the shift is estimated from the data. The --fragment 0 argument is required for ATAC-seq data processing.

-k, --keep-dup
Keep duplicates. By default, SPAN filters out redundant reads aligned at the same genomic position. The --keep-dup argument is required for single-cell ATAC-seq data processing.

-b, --bin BIN_SIZE
Peak analysis is performed on read coverage tiled into consecutive bins of configurable size. (default: 200)

-f, --fdr FDR
Minimum FDR cutoff to call significant regions. (default: 0.05)

-g, --gap GAP
Gap size to merge spatially close peaks. Useful for wide histone modifications. (default: 3)

--labels LABELS
Labels BED file. Used in semi-supervised peak calling.

-p, --peaks PEAKS
Resulting peaks file in ENCODE broadPeak (BED 6+3) format. If omitted, only the model fitting step is performed.

-m, --model MODEL
Path to the SPAN model file. If not provided, the model name is composed from the input names and other arguments.

--noclip
Do not perform additional peak clipping to increase read density.

-w, --workdir PATH
Path to the working directory (stores coverage and model caches).

--threads THREADS
Configures the parallelism level. SPAN utilizes both multithreading and specialized processor extensions like SSE2, AVX, etc.

-i, --iterations
Maximum number of iterations for EM algorithm. (default: 20)

--threshold, --tr
Convergence threshold for the EM algorithm. Use the --debug option to see detailed information. (default: 1)

-d, --debug
Print all the debug information, used for troubleshooting.

-q, --quiet
Turn off output.
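
The options above can be combined as in the following sketch. The file names are placeholders, `join_replicates` is a hypothetical helper rather than part of SPAN, and the commands are printed with echo instead of being executed so the snippet runs without span.jar or input data. Omitting -c for ATAC-seq is an assumption, based only on the control option not being marked as required.

```shell
# Hypothetical helper: join file names with commas, the separator that
# -t and -c expect for replicates. The body runs in a subshell so the
# changed IFS does not leak into the calling shell.
join_replicates() (
    IFS=,
    printf '%s\n' "$*"
)

treatment=$(join_replicates rep1.bam rep2.bam rep3.bam)

# Replicated ChIP-seq with a single shared control file.
echo java -Xmx4G -jar span.jar analyze -t "$treatment" -c Control.bam \
    --cs Chrom.sizes -p Results.peak

# ATAC-seq: pass --fragment 0, as the option description requires.
echo java -Xmx4G -jar span.jar analyze -t ATAC.bam \
    --cs Chrom.sizes --fragment 0 -p Results.peak

# Single-cell ATAC-seq: additionally keep duplicate reads.
echo java -Xmx4G -jar span.jar analyze -t scATAC.bam \
    --cs Chrom.sizes --fragment 0 --keep-dup -p Results.peak
```

Dropping the leading echo from a line turns it into the actual SPAN invocation.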

Example

A step-by-step example with a test dataset is available here.

Galaxy

SPAN is available as a tool in the official ToolShed for Galaxy. You can ask your Galaxy administrator to install it.

Build from sources

Clone bioinf-commons library under the project root.

git clone git@github.com:JetBrains-Research/bioinf-commons.git

Launch the following command line to build SPAN jar:

./gradlew shadowJar

The SPAN jar file will be generated in the folder build/libs.

FAQ

  • Q: What is the average running time?
    A: SPAN is capable of processing a single ChIP-Seq track in less than 1 hour on an average laptop (MacBook Pro 2015).
  • Q: Which operating systems are supported?
    A: SPAN is written in Kotlin and runs on any platform supported by Java.
  • Q: Where did you get this lovely span picture?
    A: From ascii.co.uk, the original author goes by the name jgs.

Error reporting

Use GitHub issues to suggest new features or report bugs.

Authors

JetBrains Research BioLabs