Epigenomics genotyping pipeline

Nextflow pipeline for genotyping from epigenomics data

Requirements

Nextflow (https://www.nextflow.io/)
samtools (http://www.htslib.org/)
bcftools (http://www.htslib.org/)
pyfaidx (https://github.com/mdshw5/pyfaidx)

Pipeline overview

Sample BAM files are merged per individual and then passed to a bcftools-based genotyping pipeline. Genetic relatedness is calculated using plink2.

Usage

[sabramov@dev0 ~]$ nextflow run genotyping.nf -profile Altius
[sabramov@dev0 ~]$ nextflow run clustering.nf -profile Altius

Input

Sample file [--samples_file]

A tab-delimited file containing information about each sample. The file must contain a header and the following columns (other columns are permitted and ignored):

indiv_id: Individual identifier for each sample; many samples can refer to one individual
bam_file: Absolute path to the BAM-formatted file
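As an illustration of the expected layout, the following minimal Python sketch reads a samples file, checks for the two required columns, and groups BAM paths by individual. It is not part of the pipeline: the function name `group_bams_by_individual`, the extra `tissue` column, and the example paths are hypothetical.

```python
import csv
import io
from collections import defaultdict

REQUIRED_COLUMNS = ("indiv_id", "bam_file")


def group_bams_by_individual(handle):
    """Return {indiv_id: [bam_file, ...]} from a tab-delimited samples file."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"samples file is missing columns: {missing}")
    groups = defaultdict(list)
    for row in reader:
        # Extra columns (here: tissue) are simply ignored.
        groups[row["indiv_id"]].append(row["bam_file"])
    return dict(groups)


# Hypothetical example content; real files would be opened with open(path).
example = io.StringIO(
    "indiv_id\tbam_file\ttissue\n"
    "INDIV_1\t/data/sample_a.bam\tliver\n"
    "INDIV_1\t/data/sample_b.bam\tlung\n"
    "INDIV_2\t/data/sample_c.bam\tliver\n"
)
groups = group_bams_by_individual(example)
for indiv_id, bam_files in groups.items():
    print(indiv_id, bam_files)
```

Here two samples map to INDIV_1, so their BAM files would be merged before genotyping.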

Genome reference [--genome_fasta_file]

dbSNP reference [--dbsnp_file]

Ancestral genome [--genome_ancestral_fasta_file]

Encode blacklisted regions [--encode_blacklist_regions]

Additional Parameters:

Chunk size [--chunksize 5000000]

Specifies the size (in base pairs) to use when dividing the genome into chunks for parallel processing.
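The chunking idea can be sketched in Python as below. This is only an illustration under stated assumptions: the function name `chunk_regions` and the contig length are hypothetical, the region strings use the 1-based inclusive `contig:start-end` convention accepted by samtools and bcftools, and the pipeline itself may split the genome differently.

```python
def chunk_regions(contig_lengths, chunksize=5_000_000):
    """Yield 1-based inclusive region strings covering each contig."""
    for name, length in contig_lengths.items():
        for start in range(0, length, chunksize):
            end = min(start + chunksize, length)
            yield f"{name}:{start + 1}-{end}"


# Hypothetical contig of 12 Mb; the last chunk is shorter than chunksize.
regions = list(chunk_regions({"chrA": 12_000_000}, chunksize=5_000_000))
for region in regions:
    print(region)
```

Smaller chunks give more parallel jobs at the cost of more scheduling overhead.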

SNP quality [--min_SNPQ 10]

Filter out variants with SNP quality below this value.

Genotype quality [--min_GQ 50]

Set the genotype for an individual to ./. (missing) when the genotype quality score (FORMAT/GQ) is less than this value.

Sequencing depth [--min_DP 12]

Minimum sequencing depth per individual to call heterozygous sites.

Per-allele depth [--min_AD 4]

Minimum sequencing depth at each allele per individual to call heterozygous sites.
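How the genotype quality, depth, and per-allele depth thresholds might combine for one individual at one site is sketched below in Python. This is a hedged illustration, not the pipeline's implementation (which is bcftools-based): the function name `filter_genotype` is hypothetical, and setting a heterozygous call that fails the depth checks to missing is an assumption, since the text above only states the thresholds.

```python
def filter_genotype(gt, gq, dp, ad, min_GQ=50, min_DP=12, min_AD=4):
    """Return the genotype string, or "./." if it fails the thresholds.

    gt: genotype such as "0/1"; gq: FORMAT/GQ; dp: total depth;
    ad: per-allele depths, indexed by allele number (REF first).
    """
    if gq < min_GQ:
        return "./."
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "./."
    is_het = len(set(alleles)) > 1
    if is_het:
        # Assumption: a heterozygous call needs enough total depth and
        # enough reads supporting each of its alleles.
        if dp < min_DP or any(ad[int(a)] < min_AD for a in alleles):
            return "./."
    return gt


print(filter_genotype("0/1", gq=60, dp=20, ad=[10, 10]))
print(filter_genotype("0/1", gq=60, dp=20, ad=[18, 2]))
```

In this sketch the first call is kept, while the second is set to missing because only 2 reads support the alternate allele.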

Hardy-Weinberg equilibrium [--hwe_cutoff 0.01]

Filter out variants that deviate from Hardy-Weinberg equilibrium (p-value threshold).

Output directory [--outdir output]

Specify the output directory.

Output

The pipeline outputs a single VCF-formatted file containing the called and filtered genotypes for each distinct individual in the samples file. Each variant is annotated with the following extra information:

ID field: dbSNP rs number
INFO/CAF: 1000 genomes project allele frequency (from dbSNP annotation file)
INFO/TOPMED: TOPMED project allele frequency (from dbSNP annotation file)
INFO/AA: Inferred ancestral allele from EPO/PECAN alignments (see "Input" for information about how this is obtained)
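For downstream use, these annotations can be read from the ID and INFO columns of each VCF record. The Python sketch below parses a single record with the standard library only; the record itself, including the rs number and all values, is made up for illustration, and `parse_info` is a hypothetical helper rather than part of the pipeline.

```python
def parse_info(info):
    """Parse a VCF INFO string into a dict; flags without '=' map to True."""
    fields = {}
    if info in ("", "."):
        return fields
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        fields[key] = value if sep else True
    return fields


# Hypothetical record: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO.
record = "chr1\t10500\trs0000000\tG\tA\t60\tPASS\tCAF=0.9,0.1;TOPMED=0.88,0.12;AA=G"
columns = record.split("\t")
rsid = columns[2]
info = parse_info(columns[7])
print(rsid, info["AA"], info["CAF"], info["TOPMED"])
```

For real files, a dedicated VCF reader is likely a better choice than splitting lines by hand.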
