Among many methods of measuring chromatin accessibility, Assay for Transposase Accessible Chromatin (ATAC-seq) is widely used. As like many other types of next generation sequencing (NGS), researchers include multi-sample replication in ATAC-seq experimental designs to compensate for lack of consistency. In epigenome analysis, researchers should measure subtle changes in peak considering read depth of individual samples. It is very important to determine whether peaks of each replication have integrative meaning for interesting region observed during multi-sample integration. Therefore, we developed MESIA that integrates replication with high representative & reproducibility in multi sample replication and determine optimal peak. Based on IDR, our methods identify reproducibility between all replications while derives distance from beginning of peak to point and Q value. Afterwards, multi samples determined as representative replication are integrated using the distance and Q value. When verifying performance based on expression value of gene with a peak formed in promoter, MESIA detected 8.61 times more peaks than previously used method, and value (median of gene expression) of the peaks was 1.32 times higher. These results indicate that MSIA is less likely to determine false positive peak than previously used method. MESIA is a shell script-based open source code that provides researchers involved in epigenome with comprehensive insight into peak and strategies for multi-sample integrates.
MESIA requires following packages:
- shuf
- bedtools
- samtools >= 1.6
- macs2
- featureCounts
- idr
micromamba install --allow-downgrade -y -c conda-forge -c bioconda bedtools
micromamba install --allow-downgrade -y -c conda-forge -c bioconda samtools
micromamba install --allow-downgrade -y -c conda-forge -c bioconda macs2
micromamba install --allow-downgrade -y -c conda-forge -c bioconda subread
micromamba install --allow-downgrade -y -c conda-forge -c bioconda idr
git clone https://github.com/ERASMUSlab/MESIA.git
cd MESIA/MESIA/Input_NF
cp "Your Input file path" .
cd ../Program
bash MESIA.sh "Project Name" "BAM file1" "BAM file2" "BAM file3" "BAM file4" "Threads" "mappable genome size"
example
bash MESIA.sh test DAY0_rep1 DAY0_rep2 DAY0_rep3 30 mm
mappable genome size section can be hs, mm or others
- General_info.txt
- Display experiment title, input data, number of threads and genome size.
============================================
=========== General information ============
============================================
Experiment name : test
sample list : DAY0_rep1 DAY0_rep2 DAY0_rep3
core : 30
size : mm
============================================
- Result_MESIA_stage1.txt
- Result matrix of MESIA stage1.
===================================
Result of Self-consistence analysis
===================================
s1 s2 s3
s1 ~ O O
s2 O ~ O
s3 O O ~
===================================
Result of Rescue analysis
===================================
s1 s2 s3
s1 ~ O X
s2 O ~ X
s3 X X ~
===================================
Result of MESIA stage1 analysis
===================================
s1 s2 s3
s1 ~ O X
s2 O ~ X
s3 X X ~
Passed replication set is
DAY0_rep1-DAY0_rep2
-
MESIA stage1 Self-consistence
- Files of MESIA stage1 Self-consistence analysis
-
MESIA stage1 rescue
- Files of MESIA stage1 rescue analysis
-
MESIA_optimal.narrowPeak
- chrom - Name of the chromosome (or contig, scaffold, etc.).
- chromStart - The starting position of the feature in the chromosome or scaffold.
- chromEnd - The ending position of the feature in the chromosome or scaffold.
- name - Name given to a region (preferably unique). Use "." if no name is assigned.
- score - Indicates how dark the peak will be displayed in the browser.
- strand - +/- to denote strand or orientation (whenever applicable). Use "." if no orientation is assigned.
- signalValue - Measurement of overall (usually, average) enrichment for the region.
- pValue - Measurement of statistical significance (-log10).
- qValue - Measurement of statistical significance using false discovery rate (-log10).
- peak - Point-source called for this peak; 0-based offset from chromStart.
chr1 4688502 4688766 . 864.75 . 43.566975 110.47625 107.238775 176
chr1 4688963 4689167 . 891.5 . 46.87355 120.97355 117.6737 91
chr1 4771118 4771440 . 1000 . 43.736 112.307 109.21 231
chr1 4776272 4776804 . 1472.25 . 95.801275 287.73925 283.984 192.75
chr1 4844967 4846600 . 1011.666667 . 69.51251333 201.3756166 197.9333833 691.5
chr1 4896692 4896898 . 1210.75 . 75.159125 215.3315 211.74 104.25
chr1 5083128 5083590 . 1018 . 53.2404 140.4385 137.073 91
chr1 7169526 7171356 . 938 . 28.4947 65.6063 62.7625 259
chr1 9545419 9547400 . 843.25 . 40.806025 101.765175 98.569575 189.25
chr1 9745452 9748276 . 727 . 31.1453 73.4112 70.5175 454