Skip to content
master
Switch branches/tags
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.

BatMeth2: An Integrated Package for Bisulfite DNA Methylation Data Analysis with Indel-sensitive Mapping.

This is a README file for the usage of Batmeth2.

BatMeth2 tutotial: https://batmeth2-docs.readthedocs.io

REQUIREMENTS

  1. gcc (v4.8) , gsl library, zlib

  2. samtools >= v1.3.1

  3. fastp, raw reads as input need

INSTALL

a) Download 1)

b) unzip 1)

c) Change directory into the top directory of b) "BatMeth2/"

d) Type

  • ./configure
  • make
  • make install

e) The binary of BatMeth2 will be created in bin/

Building index

a) Have a fasta-formatted reference file ready

b) Type "BatMeth2 build_index GENOME.fa" for WGBS or BatMeth2 build_index rrbs GENOME.fa for RRBS to make the neccessary pairing data-structure based on FM-index.

c) Run "BatMeth2" to see information on usage.

Usage of BatMeth2

Example Data

You can download the test data on https://drive.google.com/open?id=1SEpvJbkjwndYcpkd39T11lrBytEq_MaC Or https://pan.baidu.com/s/1mliGjbn_33wlQLieqy5YOQ with extraction code: kr32.

Example data contain files:

  • input fastq.gz (paired end)
  • genome file
  • usage code and details
  • gene annotation file

Citation:

Zhou Q, Lim J-Q, Sung W-K, Li G: An integrated package for bisulfite DNA methylation data analysis with Indel-sensitive mapping. BMC Bioinformatics 2019, 20:47.

BatMeth2 pipeline

An easy-to-use, auto-run package for DNA methylation analyses

​ In order to complete the DNA methylation data analysis more conveniently, we packaged all the functions to complete an easy-to-use, auto-run package for DNA methylation analysis. During the execution of BatMeth2 Tool, an html report is generated about statistics of the sample.

The usage is here:

Raw reads:

BatMeth2 pipel --fastp ~/location/to/fastp -1 Raw_reads_1.fq.gz -2 Raw_read_2.fq.gz -g ./batmeth2index/genome.fa -o meth -p 8 --gff ./gene.gff

Or clean reads:

BatMeth2 pipel -1 Clean_reads_1.fq.gz -2 Clean_read_2.fq.gz -g ./batmeth2index/genome.fa -o meth -p 8 --gff ./gene.gff

BatMeth2 pipeline parameters

BatMeth2 [mode][paramaters]

mode: build_index, pipel, align, calmeth, annoation, batDMR

[build_index]
Usage: (must run this step first)

  1. BatMeth2 build_index genomefile.

  2. BatMeth2 build_index rrbs genomefile.

[pipel (Contains: align, calmeth, annoation, methyPlot, mkreport)]
[fastp location]
   --fastp fastp program location.
   If --fastp is not defined, the input file should be clean data.
[select aligner]
--aligner    BatMeth2(default), bwa-meth, bsmap, bismark2, no (exit output_prefix.sam file, no need align again)
[other aligners paramaters]
--go     Name of the genome, contaion index build by aligner. (bwa-meth/bismark2)
[main paramaters]
--config   [config file].    When we run pipel function in batches datasets,
     please fill in the specified configuration file.
     And there is a sample file (multirun.onf) in the BatMeth2 directory.
--mp [4]   When batch processing data, we set the number of samples to run at a time (-mp, default is 4), and each sample needs six threads (- P parameter) by default.
-o   Name of output file prefix
-O   Output of result file to specified folder, default output to current folder (./)
[alignment paramaters]
-i    Name of input file, if paired-end. please use -1, -2, input files can be separated by commas
-1    Name of input file left end, if single-end. please use -i
-2    Name of input file left end
-g   Name of the genome mapped against
-n   maximum mismatches allowed due to seq. errors
-p   Launch threads
[calmeth paramaters]
--Qual     calculate the methratio while read QulityScore >= Q. default:10
--redup    REMOVE_DUP, 0 or 1, default 0
--region   Bins for DMR calculate , default 1000bp .
-f    for sam format outfile contain methState. [0 or 1], default: 0 (dont output this file).
[calmeth and annoation paramaters]
--coverage    >= coverage. default:5
--binCover    >= nCs per region. default:3
--chromstep   Chromosome using an overlapping sliding window of 100000bp at a step of 50000bp. default step: 50000(bp)
[annoation paramaters]
--gtf/--gff/--bed   Gtf or gff file / bed file
--distance    DNA methylation level distributions in body and -bp flanking sequences. The distance of upstream and downstream. default:2000
--step    Gene body and their flanking sequences using an overlapping sliding window of 5% of the sequence length at a step of 2.5% of the sequence length. So default step: 0.025 (2.5%)
-C    <= coverage. default:1000
[mkreport paramaters]
Make a batmeth2 html report, can see the detail in BatMeth2_Report/ directory.
-o [outprefix]
-h|--help usage

Output files

Output file format and details see "https://github.com/GuoliangLi-HZAU/BatMeth2/blob/master/output_details.pdf".

Output report details see "https://www.dna-asmdb.com/download/batmeth2.html" .

Functions in BatMeth2

BatMeth2 has the following main features:

  1. Batmeth2 has efficient and accurate alignment performance.
  2. Batmeth2 can calculate DNA methylation level of base site
  3. BatMeth2 also can caculate and annotation DNA methylation level on chromosome region or gene/TE etc. functional region.
  4. By integrating BS-Seq data visualization (DNA methylation distribution on chromosome and gene etc) and BatMeth2 can show the results of the DNA methylation data more clearly.
  5. BatMeth2 can perform effective DNA methylation differential regions analysis based on the number of input samples and user requirements. And BatMeth2 provide differential methylation annotation ability.

1. BatMeth2 alignment

Program: BatMeth2 align

Version: v1.0

[ Single-end-reads ]

Usage: batmeth2 -g INDEX -i INPUT -o OUTPUT

Example: batmeth2 -g /data/index/hg19/hg19.fa -i Read.fq -o outPrefix -p 10

[ Paired-end-reads ]

Usage: batmeth2 -g INDEX -i INPUT_left -i INPUT_right -o OUTPUT

Example: batmeth2 -g /data/index/hg19/hg19.fa -i Read_R1_left.fq -i Read_R2_right.fq -o outPrefix -p 10

Parameters :

​ --inputfile | -i Name of input file

​ --genome | -g Name of the genome mapped against

​ --outputfile | -o Name of output file prefix

​ --indelsize indel size

​ --non_directional Alignments to all four bisulfite strands will be reported. Default: OFF.

​ --insertsize | -s inital insert size, default 800

​ --std | -d standard deviatiion of reads distribution

​ --flanksize | -f size of flanking region for Smith-Waterman

​ --swlimit | try at most sw extensions

​ --threads | -p Launch threads

​ --NoInDels | -I not to find the indels result

​ --help | -h Print help

Note: To use BatMeth2, you need to first index the genome with `build_all genome'.

2. BatMeth2 calmeth

Command Format : calmeth [options] -g GENOME -i/-b <Samfile/Bamfile> -m -p 6

Usage:

-g|--genome Genome

​ -i|--input Sam format file

​ -b|--binput Bam format file

​ -p|--threads the number of threads.

​ -n|--Nmismatch Number of mismatches

​ -m|--methratio [MethFileNamePrefix] Predix of methratio output file

​ -Q [int] caculate the methratio while read QulityScore >= Q. default:10

​ -c|--coverage >= coverage. default:5

​ -nC >= nCs per region. default:5

​ -R |--Regions Bins for DMR caculate , default 1kb .

​ --binsfile DNA methylation level distributions in chrosome, default output file: {methratioPrefix}.methBins.txt

​ -s|--step Chrosome using an overlapping sliding window of 100000bp at a step of 50000bp. default step: 50000(bp)

​ -r|--remove_dup REMOVE_DUP, default:false

​ -f|--sam [outfile] f for sam format outfile contain methState. default: sam format.

​ --sam-seq-beforeBS Converting BS read to the genome sequences.

​ -h|--help

3. BatMeth2 annotation

BatMeth2: MethyGff v1.0

Command Format : methyGff [options] -o <OUT_PREFIX> -G GENOME -gff /-gtf /-b -m [-B][-P]

Usage:

​ -o|--out Output file prefix

​ -G|--genome Genome

​ -m|--methratio Methratio output file.

​ -c|--coverage >= coverage. default:5

​ -C <= coverage. default 600.

​ -nC >= Cs per bins or genes. default:5

​ -gtf|-gff Gtf/gff file

​ -b|--BED Bed file, chrom start end (strand)

​ -d|--distance DNA methylation level distributions in body and -bp flanking sequences. The distance of upstream and downstream. default:2000

​ -B|--body For different analysis input format, gene/TEs body methylation level. [Different Methylation Gene(DMG/DMT...)]

​ -P|--promoter For different analysis input format.[Different Methylation Promoter(DMP)]

​ --TSS Caculate heatmap for TSS. [Outfile: outPrefix.TSS.cg.n.txt]

​ --TTS Caculate heatmap for TTS. [Outfile: outPrefix.TTS.cg.n.txt]

​ --GENE Caculate heatmap for GENE and flank 2k. [Outfile: outPrefix.GENE.cg.n.txt]

​ -s|--step Gene body and their flanking sequences using an overlapping sliding window of 5% of the sequence length at a step of 2.5% of the sequence length. So default step: 0.025 (2.5%)

​ -S|--chromStep Caculate the density of genes/TEs in chromsome using an overlapping sliding window of 100000bp at a step of 50000bp, must equal "-s" in Split.. default step: 50000(bp)

​ -h|--help

4. BatMeth2 methyPlot

Usage1:

  • methyPlot chromsome.bins.txt chrosome.methy.distri.pdf stepdefault:0.025 Infile1.from.batmeth2:methyGff out1.pdf starLabel endLabel Infile2 out2.pdf

eg: methyPlot chromsome.bins.txt chrosome.methy.distri.pdf 0.025 gene.meth.Methylevel.1.txt methlevel.pdf TSS TTS gene.meth.AverMethylevel.1.txt elements.pdf

**Usage2: **

  • methyPlot chromsome.bins.txt chrosome.methy.distri.pdf 0.025 gene.meth.Methylevel.1.txt methlevel.pdf TSS TTS gene.meth.AverMethylevel.1.txt elements.pdf test.annoDensity.1.txt test.density.pdf sampleElmentName test.mCdensity.txt test.mCdensity.pdf test.mCcatero.txt test.mCcatero.pdf 0.8 0.1 0.1

Visulization case 2: two more samples

Contains:

  • The density of gene, transposon elements (TE) and the level of DNA methylation in the different samples of the whole genome.
    $ Rscript density_plot_with_methyl.r inputFile1 input2 genedensityFile TEdensity output.pdf label1 label2

    *example: Rscript ~/software/batmeth2/src/density_plot_with_methyl.r WT.methChrom.Rd.txt Mutant.methChrom.Rd.txt WT.noRd.Gff.gffDensity.1.txt WT.noRd.TE.gffDensity.1.txt density.Out.pdf WT mutant *

  • DNA methylation level distribution across genes/TEs in different samples.

    $ Rscript methylevel.elements.r step(default:0.025) Input.Sample1.from.Batmeth2:methyGff Input.Sample2 outfilePrefix xLab1 xLab2 Sample1Prefix Sample2Prefix

    *example: Rscript methylevel.elements.compare.r 0.025 sample1.gene.meth.Methylevel.1.txt sample2.gene.meth.Methylevel.1.txt methlevel TSS TTS mutant WT *

5. BatMeth2 DMC or DMR/DMG

dmr-batmeth2.png

BatMeth2: DMR

Command Format : DMR [options] -g genome.fa -o_dm <DM_result> -1 [Sample1-methy ..] -2 [sample2-methy ..]

Usage:

​ -o_dm output file

​ -o_dmr when use auto detect by dmc

​ -g|--genome Genome

​ -1 sample1 methy files, sperate by space.

​ -2 sample2 methy files, sperate by space.

​ -FDR adjust pvalue cutoff default : 0.05

​ -methdiff the cutoff of methylation differention. default: 0.25 [CpG]

​ -element caculate gene or TE etc function elements.

​ -L predefinded regions

​ -h|--help


Make sure all index files reside in the same directory.

Built with BatMeth2 build_index Genome.fa

=-=-=-=-=-=-=-=-=-=

GNU automake v1.11.1, GNU autoconf v2.63, gcc v4.4.7.

Tested on Red Hat 4.4.7-11 Linux

Thank you for your patience.

About

BS-seq analysis pipeline

Resources

Releases

No releases published

Packages

No packages published