The EpiDiverse Bisulfite SNP-Calling and Methylation Clustering Pipeline, implemented with Nextflow

EpiDiverse-SNP Pipeline

EpiDiverse/snp is a bioinformatics analysis pipeline for calling single nucleotide polymorphisms (SNPs) from bisulfite sequencing data and/or for clustering samples (e.g. environmental plant samples) according to their methylation profiles while masking genomic variation.

The workflow pre-processes a collection of BAM files from the EpiDiverse/WGBS pipeline using samtools, then masks genomic and/or bisulfite variation relative to the reference using custom scripts. Genomic-masked alignments are then extracted into FASTQ format and tested for k-mer diversity using kWIP for clustering into groups. Bisulfite-masked alignments are taken forward for variant calling with Freebayes, followed by post-call filtering with bcftools.
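To illustrate why bisulfite variation needs masking before variant calling, the sketch below flags reference/alternate base pairs that bisulfite conversion alone could explain: an unmethylated C reads as T, which appears as G>A on the opposite strand. The helper name `is_bisulfite_ambiguous` is hypothetical, and this is not the pipeline's own masking script, only a minimal illustration of the idea.

```shell
#!/bin/sh
# Illustrative only: decide whether a mismatch against the reference could be
# explained by bisulfite conversion rather than a true SNP.
# is_bisulfite_ambiguous is a hypothetical helper, not part of EpiDiverse/snp.
is_bisulfite_ambiguous() {
  # $1 = reference base, $2 = observed base (case-insensitive)
  case "$(printf '%s>%s' "$1" "$2" | tr '[:lower:]' '[:upper:]')" in
    'C>T'|'G>A') return 0 ;;  # conversion of an unmethylated C, seen from either strand
    *)           return 1 ;;  # any other substitution cannot come from conversion
  esac
}

# Classify a few example substitutions.
for pair in C:T G:A A:G C:A; do
  ref=${pair%%:*}
  alt=${pair##*:}
  if is_bisulfite_ambiguous "$ref" "$alt"; then
    echo "$ref>$alt: ambiguous (possible bisulfite conversion)"
  else
    echo "$ref>$alt: not explained by bisulfite conversion"
  fi
done
```

Substitutions in the ambiguous class are the ones a standard variant caller would otherwise misreport as SNPs on unmasked bisulfite alignments.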

See the output documentation for more details of the results.

The pipeline is built using Nextflow, a workflow tool for running tasks portably across multiple compute infrastructures. It comes with Docker containers, making installation trivial and results highly reproducible.

Quick Start

  1. Install nextflow

  2. Install one of docker, singularity or conda

  3. Start running your own analysis!

NXF_VER=20.07.1 nextflow run epidiverse/snp -profile <docker|singularity|conda> \
--input /path/to/wgbs/bam --reference /path/to/reference.fa
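As a convenience, the profile can be chosen from whichever of the three supported engines is installed. The sketch below is one possible wrapper, not part of the pipeline: `pick_profile` and `print_snp_cmd` are hypothetical helper names, and the script only prints the Quick Start command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: echo the first of docker, singularity or conda found on
# PATH, for use as the -profile value. Returns non-zero if none is installed.
pick_profile() {
  for engine in docker singularity conda; do
    if command -v "$engine" >/dev/null 2>&1; then
      echo "$engine"
      return 0
    fi
  done
  return 1
}

# Hypothetical helper: print (not run) the Quick Start command.
# $1 = profile, $2 = input path to the WGBS BAM files, $3 = reference FASTA
print_snp_cmd() {
  printf 'NXF_VER=20.07.1 nextflow run epidiverse/snp -profile %s --input %s --reference %s\n' \
    "$1" "$2" "$3"
}

if profile=$(pick_profile); then
  print_snp_cmd "$profile" /path/to/wgbs/bam /path/to/reference.fa
else
  echo "No docker, singularity or conda found on PATH" >&2
fi
```

Printing the command first makes it easy to check the chosen profile and paths before launching a run.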

See the usage documentation for all of the available options when running the pipeline.

Test data

A minimal example dataset for testing purposes can be found in the EpiDiverse/datasets repository. You can either download the files manually and run the pipeline as shown above, or run the pipeline directly with the test profile, which downloads the data for you automatically:

NXF_VER=20.07.1 nextflow run epidiverse/snp -profile test,<docker|singularity|conda>

Wiki Documentation

The EpiDiverse/snp pipeline is part of the EpiDiverse Toolkit, a best-practice suite of tools intended for the study of Ecological Plant Epigenetics. Links to general guidelines and pipeline-specific documentation can be found below:

  1. Installation
  2. Pipeline configuration
  3. Running the pipeline
  4. Understanding the results
  5. Runtime and memory usage guidelines
  6. Troubleshooting

Credits

These scripts were originally written for use by the EpiDiverse European Training Network, by Adam Nunn (@bio15anu).

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 764965.

Citation

If you use epidiverse/snp for your analysis, please cite it using the following DOI:
