Skip to content
HMMRATAC peak caller for ATAC-seq data
Java Shell
Branch: master
Clone or download
Latest commit 2df2ff4 Aug 16, 2019
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
ThirdPartyJARS outdated jars for versions 1.1 and lower Jan 20, 2019
src remove duplicate reads optional Aug 14, 2019
HMMRATAC_Guide.md No header on genome.info Apr 20, 2019
Make_HMMRATAC_Files.sh rename script for creating input files Apr 4, 2019
README.md Update README.md Aug 16, 2019
_config.yml Set theme jekyll-theme-slate Aug 14, 2019

README.md

HMMRATAC

Quick Start

Assume that you have a BAM file from aligner such as bwa mem named ATACseq.bam.

  1. Sort the BAM file to get a ATACseq.sorted.bam file:

    samtools sort ATACseq.bam -o ATACseq.sorted.bam

  2. Make index from the BAM file to get a ATACseq.sorted.bam.bai file:

    samtools index ATACseq.sorted.bam ATACseq.sorted.bam.bai

  3. Make genome information (chromosome sizes) from the BAM file to get a genome.info file:

    samtools view -H ATACseq.sorted.bam| perl -ne 'if(/^@SQ.*?SN:(\w+)\s+LN:(\d+)/){print $1,"\t",$2,"\n"}' > genome.info

  4. Run HMMRATAC on the sorted BAM ATACseq.sorted.bam, the BAM index file ATACseq.sorted.bam.bai, and the genome information file genome.info:

    java -jar HMMRATAC_V1.2.4_exe.jar -b ATACseq.sorted.bam -i ATACseq.sorted.bam.bai -g genome.info

  5. Filter HMMRATAC output by the score, if desired. Score threshold will depend on dataset, score type and user preference. A threshold of 10 would be:

    awk -v OFS="\t" '$13>=10 {print}' NAME_peaks.gappedPeak > NAME.filteredPeaks.gappedPeak

    To filter the summit file by the same threshold:

    awk -v OFS="\t" '$5>=10 {print}' NAME_summits.bed > NAME.filteredSummits.bed

    NOTE: HMMRATAC will report all peaks that match the structure defined by the model, including weak peaks. Filtering by score can be used to retain stronger peaks. Lower score = higher sensitivity and lower precision, Higher score = lower sensitivity and higher precision.

Samtools can be downloaded here: http://www.htslib.org/download/

Be sure to run HMMRATAC using the executable file, found here: https://github.com/LiuLabUB/HMMRATAC/releases For details on HOW to run HMMRATAC, see HMMRATAC_Guide.md, which contains a thorough runthrough of all parameters, output files and input requirements and troubleshooting.

HMMRATAC requires paired-end data. Single-end data will not work. HMMRATAC is designed to process ATAC-seq data that hasn't undergone any size selection, either physical or in silico. This should be standard practice for any ATAC-seq analysis. Size selected data can be processed by HMMRATAC (see HMMRATAC_Guide.md on --trim option).

If you use HMMRATAC in your research, please cite the following paper:

You can’t perform that action at this time.