smoove-nf

Nextflow implementation of the smoove toolset (and some others) focused on reliably calling SVs in your data.

The workflow

The workflow consists of a number of steps, each of which generally writes to its own results directory.

Call genotypes

smoove call is run on individual bam or cram alignment files. Output is written to $outdir/smoove-called and includes $sample-smoove.genotyped.vcf.gz and an index.

Merge genotypes

Next, we collect all SVs across samples into a single, merged (union) VCF using smoove merge. Results are written to $outdir/smoove-merged and include the file $project.sites.vcf.gz.

Genotype all samples

Using the union of SVs across all samples, we genotype each sample at those sites using smoove genotype with duphold for depth annotations. Output is written to $outdir/smoove-genotyped/$sample-smoove.genotyped.vcf.gz.

Square and annotate VCF

All single-sample genotyped VCFs are pasted into a single, square, joint-called file using smoove paste. The variants are then annotated with smoove annotate, using the annotation supplied via --gff. Results are written to:

$outdir/smoove-squared/$project.smoove.square.anno.vcf.gz
- Annotated and indexed VCF for all SVs across all samples.
$outdir/bpbio/svvcf.html
- A report of SV counts per sample by SV type.

Coverage profiling

Using indexcov, estimate coverage across the genome per sample and perform coverage-based quality control. The full output of goleft indexcov is written to $outdir/indexcov, and its HTML report is written to $outdir/indexcov/index.html.

Workflow report

Logs and output of various steps are aggregated and summarized into one report written to $outdir/smoove-nf.html.

Cumulative chromosome coverage is available in $outdir/covviz_report.html.
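
The output locations listed in this section can be collected into a small helper for checking a finished run. This is only a sketch built from the paths named above; the function names are hypothetical, and it assumes that $sample is the sample name used in the per-sample file names.

```python
from pathlib import Path


def expected_outputs(outdir, project, samples):
    """Return the result paths described in the workflow section for one run."""
    base = Path(outdir)
    # Run-level files, one per workflow execution.
    paths = [
        base / "smoove-merged" / f"{project}.sites.vcf.gz",
        base / "smoove-squared" / f"{project}.smoove.square.anno.vcf.gz",
        base / "bpbio" / "svvcf.html",
        base / "indexcov" / "index.html",
        base / "smoove-nf.html",
        base / "covviz_report.html",
    ]
    # Per-sample files from the call and genotype steps.
    for sample in samples:
        name = f"{sample}-smoove.genotyped.vcf.gz"
        paths.append(base / "smoove-called" / name)
        paths.append(base / "smoove-genotyped" / name)
    return paths


def missing_outputs(outdir, project, samples):
    """List the expected paths that do not exist on disk."""
    return [p for p in expected_outputs(outdir, project, samples) if not p.exists()]
```

For example, missing_outputs('./results', 'sites', ['sampleA']) would list whichever of these files are absent under the default --outdir and --project values; sampleA is a placeholder name.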

Usage

A Docker container is maintained in parallel with this workflow (https://hub.docker.com/r/brentp/smoove) and will be pulled by Nextflow before data processing begins. Apart from Nextflow and either Docker or Singularity, there are no dependencies to download or install.

nextflow run brwnj/smoove-nf -latest [nextflow options] [smoove-nf options]

Running this using provided containers can be accomplished using the docker profile:

nextflow run brwnj/smoove-nf -latest -profile docker [nextflow options] [smoove-nf options]

Required parameters

--bams
- Aligned sequences in .bam and/or .cram format. Indexes (.bai/.crai) must be present.
- Use wildcards like 'SRP1234/alignments/*.cram' to specify your alignment files.

Note: Nextflow handles wildcard expansion in this case, so it's critical to quote the string so that the shell does not expand it first:

nextflow run brwnj/smoove-nf -latest \
	--bams '~/SRP1234/alignments/*.cram'

--fasta
- File path to reference fasta. Index (.fai) must be present.
- GRCh38 is available at: ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/GRCh38_reference_genome
--gff
- Annotation GFF used to annotate variants.
- GRCh38 reference is available at: ftp://ftp.ensembl.org/pub/release-95/gff3/homo_sapiens/Homo_sapiens.GRCh38.95.chr.gff3.gz
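
Since --bams requires an index next to every alignment file, it can help to preview what a wildcard matches before launching. The following is a minimal sketch, not part of the workflow: the function name is hypothetical, and it assumes an index is named either by appending the index extension (sample.cram.crai) or by replacing the alignment extension (sample.crai).

```python
import glob
import os


def alignments_missing_index(pattern):
    """Return .bam/.cram files matching the pattern that have no index beside them."""
    index_ext_for = {".bam": ".bai", ".cram": ".crai"}
    missing = []
    for path in sorted(glob.glob(os.path.expanduser(pattern))):
        stem, ext = os.path.splitext(path)
        index_ext = index_ext_for.get(ext)
        if index_ext is None:
            continue  # not an alignment file (for example, an index itself)
        # Assumed naming conventions: sample.bam.bai or sample.bai.
        candidates = (path + index_ext, stem + index_ext)
        if not any(os.path.exists(c) for c in candidates):
            missing.append(path)
    return missing
```

An empty list from alignments_missing_index('SRP1234/alignments/*.cram') suggests every matched file has an index under one of the two assumed naming schemes.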

Optional parameters

--outdir
- The base results directory for output
- default: './results'
--bed
- File path to bed of exclude regions for smoove call.
- Exclude regions for b37 and GRCh38 are made available by the Hall lab under speedseq.
--exclude
- regular expression of chromosomes to skip
- You should escape '$', e.g. "~random\$,~_alt\$"
- default: "~^HLA,~^hs,~:,~^GL,~M,~EBV,~^NC,~^phix,~decoy,~random\$,~Un,~hap,~_alt\$"
--project
- Acts as the file prefix for merged and squared sites
- default: 'sites'
--sexchroms
- Comma delimited names of the sex chromosome(s) used to infer sex, e.g. --sexchroms 'chrX,chrY'
- default: 'X,Y'
--sensitive
- Retains more variants that would otherwise be filtered throughout the workflow
- default: false
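
To see which chromosome names the default --exclude value would skip, the patterns can be tried out in isolation. This sketch assumes that each comma-separated entry starting with '~' is a regular expression searched anywhere in the name and that any other entry is an exact name; that interpretation and the function name are assumptions, not taken from the workflow's code. The backslash before '$' is dropped because Python strings do not need it.

```python
import re

# Default value of --exclude from the list above, without the shell escaping.
DEFAULT_EXCLUDE = "~^HLA,~^hs,~:,~^GL,~M,~EBV,~^NC,~^phix,~decoy,~random$,~Un,~hap,~_alt$"


def is_excluded(chrom, exclude=DEFAULT_EXCLUDE):
    """Return True if the chromosome name matches any exclude entry."""
    for entry in exclude.split(","):
        if entry.startswith("~"):
            # Assumed: '~' marks a regular expression; search, not full match.
            if re.search(entry[1:], chrom):
                return True
        elif entry == chrom:
            return True
    return False
```

Under these assumptions, is_excluded('chr1') is False while is_excluded('chrM') and is_excluded('chr1_KI270706v1_random') are True. Unanchored entries such as ~M match any name containing that text.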

covviz params

--zthreshold
- a sample must be greater than this many standard deviations in order to be found significant
- default: 3.5
--distancethreshold
- consecutive significant points must span this distance in order to pass this filter
- default: 150000
--slop
- leading and trailing segments added to significant regions to make them more visible
- default: 500000
--minsamples
- Show all traces when analyzing this few samples; ignores z-threshold, distance-threshold, and slop
- default: 8
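
One way to picture how --zthreshold, --distancethreshold, and --slop interact is the sketch below. It is an illustration of the parameter descriptions only, not covviz's actual implementation: the function name is hypothetical, and it assumes that per-position z-scores for one sample are already available and that deviations in either direction count as significant.

```python
def significant_regions(positions, zscores, zthreshold=3.5,
                        distancethreshold=150000, slop=500000):
    """Return (start, end) regions of consecutive significant points, padded by slop."""
    regions = []
    run_start = None
    run_end = None

    def close_run():
        # Keep a run only if it spans at least the distance threshold.
        if run_start is not None and run_end - run_start >= distancethreshold:
            regions.append((max(0, run_start - slop), run_end + slop))

    for pos, z in zip(positions, zscores):
        # Assumed: a point is significant if it deviates in either direction.
        if abs(z) > zthreshold:
            if run_start is None:
                run_start = pos
            run_end = pos
        else:
            close_run()
            run_start = None
    close_run()  # handle a run that reaches the last position
    return regions
```

With the defaults, a run of significant points from 100000 to 300000 would be reported as (0, 800000), while a single isolated significant point would be dropped.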

somalier params

--knownsites
- optional, but required in order to run somalier quality control
- VCF of known polymorphic sites -- download links can be found at https://github.com/brentp/somalier/releases, but any set of common variants will work
- default: false
--ped
- optional, but required in order to run somalier relate and generate somalier's HTML report
- sample relationship definitions
- default: false

Updating

To pull changes made to the workflow and ensure you're running the latest version, use:

nextflow pull brwnj/smoove-nf

That will either pull any changes or confirm you're at the latest version.

Alternatively, when you run the workflow simply use:

nextflow run brwnj/smoove-nf -latest
