GitHub - dphansti/mango: chia pet analysis software

Mango

ChIA-PET Analysis Software

Citation

Phanstiel DH, Boyle AP, Heidari N, Snyder MP. Mango: a bias-correcting ChIA-PET analysis pipeline. Bioinformatics. 2015 Oct 1;31(19):3092-8. doi:10.1093/bioinformatics/btv336. Epub 2015 Jun 1. PubMed PMID: 26034063; PubMed Central PMCID: PMC4592333.

Mango Installation

Mango depends on the following R packages.

hash

Rcpp

optparse

readr

They can be installed through CRAN. For example, to install all four packages, open R and type the following:

install.packages('hash')
install.packages('Rcpp')
install.packages('optparse')
install.packages('readr')

Mango depends on the following software packages, which should be installed and included in the system PATH prior to using Mango.

Bowtie (http://bowtie-bio.sourceforge.net)

Bedtools >= 2.20.0 (https://github.com/arq5x/bedtools2)

MACS2 (https://github.com/taoliu/MACS)
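
Before installing Mango, it can help to confirm that these tools are actually visible on the PATH. The following is a minimal sketch: `missing_tools` is an illustrative helper and not part of Mango, and the executable names `bowtie`, `bedtools`, and `macs2` are assumed to be the names these packages install by default.

```shell
# Print the name of every requested executable that is not found on PATH.
# missing_tools is a hypothetical helper, not part of Mango.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Executable names are assumed to be the defaults installed by each package.
missing=$(missing_tools bowtie bedtools macs2)
if [ -n "$missing" ]; then
  echo "Not found on PATH:" $missing
else
  echo "All external dependencies found."
fi
```

If a tool is installed but reported as missing, its full path can instead be passed to Mango through the bedtoolspath, macs2path, or bowtiepath parameters.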

Once the dependencies are installed, Mango can be installed from the command line using the following commands.

git clone https://github.com/dphansti/mango.git
R CMD INSTALL --no-multiarch --with-keep.source mango

If installing on a cluster where you have limited permissions, you may need to set up a local library. An example is shown below.

git clone https://github.com/dphansti/mango.git
mkdir -p "$HOME/R/library"
export R_LIBS="$R_LIBS:$HOME/R/library"
R CMD INSTALL --no-multiarch --with-keep.source --library="$HOME/R/library" mango

We recommend adding the export R_LIBS line above to your ~/.bash_profile as well.

Features

Mango uses FASTQ files generated by Illumina sequencers to call peaks and interactions from ChIA-PET experiments. Arguments can be passed to Mango by a configuration file, through the command line, or a combination of both. In cases where an argument is supplied both through the command line and a configuration file, the value passed on the command line takes precedence.

Usage of Mango

Rscript mango.R [-options]

Example for regular interaction calling

Rscript mango.R --fastq1 samplename_1.fastq --fastq2 samplename_2.fastq --prefix samplename --argsfile argsfile.txt \
   --chromexclude chrM,chrY --stages 1:5

Example of an argsfile

bowtieref         = /path/to/hg19
bedtoolsgenome    = /path/to/human.hg19.genome
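
The rule that command-line values override the argsfile can be illustrated with a small sketch. It assumes the `key = value` layout shown above; `argsfile_value` and `resolve_param` are hypothetical helpers written for illustration, and they do not reproduce how Mango itself parses its arguments.

```shell
# Look up one parameter in an argsfile of "key = value" lines.
# argsfile_value is a hypothetical helper, not part of Mango.
argsfile_value() {
  # $1 = argsfile path, $2 = parameter name
  awk -v key="$2" '
    {
      i = index($0, "=")
      if (i == 0) next                      # skip lines without "="
      k = substr($0, 1, i - 1)
      v = substr($0, i + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)        # trim whitespace around the name
      gsub(/^[ \t]+|[ \t]+$/, "", v)        # trim whitespace around the value
      if (k == key) { print v; exit }
    }
  ' "$1"
}

# Resolve a parameter: a non-empty command-line value wins over the argsfile.
resolve_param() {
  # $1 = argsfile path, $2 = parameter name, $3 = command-line value (may be empty)
  if [ -n "$3" ]; then
    printf '%s\n' "$3"
  else
    argsfile_value "$1" "$2"
  fi
}

# Demo with a throwaway argsfile; the values are placeholders.
demo=$(mktemp)
printf 'bowtieref         = /path/to/hg19\nprefix = from_file\n' > "$demo"
resolve_param "$demo" prefix ""           # no command-line value: argsfile is used
resolve_param "$demo" prefix samplename   # command-line value takes precedence
rm -f "$demo"
```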

!! Note if using tagmentation-generated libraries !!

If the libraries were generated using tagmentation instead of MmeI digestion and adapter ligation, we recommend the following settings:

--keepempty TRUE
--shortreads FALSE
--maxlength 1000

Parameters

ALL STAGES

argsfile: The full path to a file containing any of the following parameters. See above for example.
stages: stages of the pipeline to execute. This can be either a single stage (e.g. 1) or a range of stages (e.g. 1:5). default = 1:5
prefix: prefix for all output files. default = mango
outdir: The output directory. default = NULL
bowtieref: genome reference file for bowtie
bedtoolsgenome: bedtools genome file
chrominclude: comma separated list of chromosomes to use (e.g. chr1,chr2,chr3,...). Only these chromosomes will be processed. If NULL all chromosomes will be processed. default = NULL
chromexclude: comma separated list of chromosomes to exclude (e.g. chrM,chrY). If NULL all chromosomes will be processed. !!chrM should always be excluded due to its extremely short length!! default = NULL
bedtoolspath: full path to bedtools (only required if not found in system PATH). default = NULL
macs2path: full path to macs2 (only required if not found in system PATH). default = NULL
bowtiepath: full path to bowtie (only required if not found in system PATH). default = NULL

STAGE 1 PARAMETERS

linkerA: linker sequence to look for. default = GTTGGATAAG
linkerB: linker sequence to look for. default = GTTGGAATGT
singlelinker: Was only a single linker used? If TRUE Mango will only look for linkerA and linkerB will be ignored. default = FALSE
minlength: min length of reads after linker trimming. default = 15
maxlength: max length of reads after linker trimming. If libraries were generated via tagmentation this should be set to a value greater than the read length (e.g. 1000). default = 25
keepempty: Should reads with no linker be kept (TRUE or FALSE). If libraries were generated via tagmentation this should be set to TRUE. default = FALSE

STAGE 2 PARAMETERS

shortreads: should bowtie alignments be done using parameters for very short reads (~20 bp). If libraries were generated via tagmentation this should be set to FALSE. default = TRUE
threads: number of threads to be used for bowtie alignment. default = 1 (!! This option is currently disabled due to errors. We are working on a solution !!)

STAGE 3 PARAMETERS

npets4dist: the number of PETs to use to plot the PET distance distribution. default = 1000000 (use -1 for all PETs).

STAGE 4 PARAMETERS

MACS_qvalue: q-value cutoff for peak calling in MACS2. default = 0.05
MACS_shiftsize: MACS shift size. NULL allows MACS to determine it
peakslop: Number of base pairs to extend peaks on both sides. default = 500
peakinput: Name of user supplied peaks file. If NULL Mango will use peaks determined from MACS2 analysis. default = NULL
blacklist: BED file of regions to remove from MACS peaks
gsize: mappable genome size or effective genome size for MACS2. default = 'hs'

STAGE 5 PARAMETERS

distcutrangemin: When Mango determines the self-ligation cutoff this is the minimum distance it will consider. Changing this setting is not recommended. default = 1000
distcutrangemax: When Mango determines the self-ligation cutoff this is the maximum distance it will consider. Changing this setting is not recommended. default = 100000
biascut: Mango excludes very short distance PETs since they tend to arise from self-ligation of a single DNA fragment as opposed to inter-ligation of two interacting fragments. To determine this distance cutoff, Mango determines the fraction of PETs at each distance that come from self-ligation and sets the cutoff at the point where the fraction is less than or equal to biascut. default = 0.05
FDR: FDR cutoff for significant interactions. default = 0.05
numofbins: number of bins to use for binomial p-value calculations. default = 50
corrMethod: Method to use for correction for multiple hypothesis testing. See (http://stat.ethz.ch/R-manual/R-devel/library/stats/html/p.adjust.html) for more details. default = BH
maxinteractingdist: The maximum distance (in base pairs) considered for interaction. Optimum sensitivity is generally achieved at values of 1000000-2000000. default = 1000000
extendreads: how many bp to extend reads towards peak. default = 120
minPETS: The minimum number of PETs required for an interaction (applied after FDR filtering). default = 2
reportallpairs: Should all pairs be reported or just significant pairs (TRUE or FALSE). default = FALSE
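
The thresholding step described for biascut can be illustrated with a short sketch. It covers only the final comparison: how Mango estimates the self-ligation fraction at each distance is not shown here, `selfligation_cutoff` is a hypothetical helper, and the input numbers are made up for illustration.

```shell
# Given "distance fraction_self_ligated" rows sorted by increasing distance,
# print the first distance inside [distcutrangemin, distcutrangemax] whose
# self-ligation fraction is <= biascut.
# selfligation_cutoff is a hypothetical helper, not part of Mango.
selfligation_cutoff() {
  # $1 = biascut, $2 = distcutrangemin (default 1000), $3 = distcutrangemax (default 100000)
  awk -v biascut="$1" -v dmin="${2:-1000}" -v dmax="${3:-100000}" '
    $1 + 0 >= dmin + 0 && $1 + 0 <= dmax + 0 && $2 + 0 <= biascut + 0 { print $1; exit }
  '
}

# Made-up numbers for illustration only; they are not Mango output.
printf '%s\n' '1000 0.62' '2000 0.31' '5000 0.09' '8000 0.04' '12000 0.01' |
  selfligation_cutoff 0.05
```

With these illustrative rows and biascut = 0.05, the sketch prints 8000, the first distance at which the self-ligation fraction drops to 0.05 or below.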

Intermediate Files

...same.fastq

These files contain FASTQ-formatted sequences after linkers have been detected and removed. Only pairs of reads with the same linker sequence on both ends of the PET are reported in these files. These are the only reads used for subsequent steps.

...chim.fastq

These files contain FASTQ-formatted sequences after linkers have been detected and removed. Only pairs of reads with different linker sequences on the two ends of the PET are reported in these files. These reads are NOT used for subsequent steps.

...bedpe

This file describes all aligned PETs. The columns are (chromosome1, start1, end1, chromosome2, start2, end2, readname, score, strand1, strand2)

...rmdup.bedpe

This file describes all aligned PETs after removal of duplicate PETs. The columns are (chromosome1, start1, end1, chromosome2, start2, end2, readname, score, strand1, strand2)
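
Duplicate removal on a file with these columns can be sketched as follows. How Mango defines a duplicate is not spelled out here, so this sketch assumes that two PETs are duplicates when both ends share the same chromosome, start, end, and strand; it also assumes tab-delimited input without a header. `rmdup_bedpe` is a hypothetical helper, not Mango's implementation.

```shell
# Keep only the first PET seen for each combination of coordinates and strands.
# ASSUMPTION: duplicates are PETs identical in columns 1-6 and 9-10.
# rmdup_bedpe is a hypothetical helper, not part of Mango.
rmdup_bedpe() {
  awk -F'\t' '!seen[$1, $2, $3, $4, $5, $6, $9, $10]++'
}

# Made-up rows for illustration: read2 duplicates read1 and is dropped.
printf 'chr1\t100\t120\tchr1\t5000\t5020\tread1\t255\t+\t-\n' > demo.bedpe
printf 'chr1\t100\t120\tchr1\t5000\t5020\tread2\t255\t+\t-\n' >> demo.bedpe
printf 'chr1\t300\t320\tchr1\t9000\t9020\tread3\t255\t+\t+\n' >> demo.bedpe
rmdup_bedpe < demo.bedpe
rm -f demo.bedpe
```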

...tagAlign

This file describes all reads (PETs are split into two lines in this file) in standard tagAlign format. The columns are (chromosome, start, end, readname, score, strand)
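
The relationship between the bedpe and tagAlign column layouts can be sketched with awk. This assumes tab-delimited input without a header and that both ends keep the PET's read name and score unchanged; `bedpe_to_tagalign` is a hypothetical helper, and Mango's own conversion may differ in detail.

```shell
# Split each PET (10 bedpe columns) into two tagAlign lines (6 columns each).
# bedpe_to_tagalign is a hypothetical helper, not part of Mango.
bedpe_to_tagalign() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         print $1, $2, $3, $7, $8, $9     # first end:  chromosome1, start1, end1, strand1
         print $4, $5, $6, $7, $8, $10    # second end: chromosome2, start2, end2, strand2
       }'
}

# Made-up PET for illustration only.
printf 'chr1\t100\t120\tchr1\t5000\t5020\tread1\t255\t+\t-\n' | bedpe_to_tagalign
```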

...slopPeak

This file describes peaks after peak calling, addition of a user-defined number of base pairs (peakslop), and merging of overlapping peaks. The columns are (chromosome, start, end, peakname).
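
The extend-then-merge step can be sketched in awk to show what peakslop does to the peak coordinates. This is an illustration under stated assumptions, not Mango's implementation: input is tab-delimited and sorted by chromosome and start, peak names are not carried over, and extension is clipped at 0 but not at the chromosome end (which would require the bedtools genome file). `slop_and_merge` is a hypothetical helper.

```shell
# Extend every peak by a fixed number of bp on both sides, then merge
# peaks that overlap or touch after extension.
# slop_and_merge is a hypothetical helper, not part of Mango.
slop_and_merge() {
  # $1 = peakslop in bp; sorted chromosome/start/end rows on standard input
  awk -v slop="$1" 'BEGIN { FS = OFS = "\t" }
    {
      s = $2 - slop; if (s < 0) s = 0
      e = $3 + slop
      if (NR > 1 && $1 == chrom && s <= end) {
        if (e > end) end = e              # overlaps the current merged peak
      } else {
        if (NR > 1) print chrom, start, end
        chrom = $1; start = s; end = e    # start a new merged peak
      }
    }
    END { if (NR > 0) print chrom, start, end }'
}

# Made-up peaks for illustration; with peakslop = 500 the first two merge.
printf 'chr1\t1000\t1200\nchr1\t1900\t2100\nchr1\t5000\t5100\n' | slop_and_merge 500
```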

Output Files

...interactions.fdr.mango

This file contains all significant interactions. The columns are (chromosome1, start1, end1, chromosome2, start2, end2, PETs supporting the interaction, the adjusted P-value of the interaction)*
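
Given this column layout, the interactions can be filtered further after a run. The sketch below assumes the default eight-column, tab-delimited output without a header line (that is, 'verboseoutput' is not TRUE); `filter_interactions` is a hypothetical helper and the input rows are made up.

```shell
# Keep interactions with at least a minimum number of supporting PETs
# and an adjusted P-value at or below a maximum.
# filter_interactions is a hypothetical helper, not part of Mango.
filter_interactions() {
  # $1 = minimum PET count, $2 = maximum adjusted P-value; rows on standard input
  awk -F'\t' -v minpets="$1" -v maxp="$2" '$7 + 0 >= minpets + 0 && $8 + 0 <= maxp + 0'
}

# Made-up rows for illustration only; they are not Mango output.
printf 'chr1\t100\t600\tchr1\t90000\t90500\t5\t1e-06\n' > demo.interactions
printf 'chr2\t100\t600\tchr2\t90000\t90500\t2\t0.04\n' >> demo.interactions
filter_interactions 3 0.01 < demo.interactions
rm -f demo.interactions
```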

...interactions.all.mango

This file contains all tested interactions and is only generated if 'reportallpairs' is TRUE. The columns are the same as those for the file above.

*More columns with column headers will be output if 'verboseoutput' is TRUE.
