
smoove-nf

Nextflow implementation of the smoove toolset (and some others) focused on reliably calling SVs in your data.

The workflow

The workflow consists of a number of steps, each generally writing its output to its own result directory.

Call genotypes

smoove call is run on individual bam or cram alignment files. Output is written to $outdir/smoove-called and includes $sample-smoove.genotyped.vcf.gz and an index.

Merge genotypes

Next, we collect all SVs across samples into a single, merged (union) VCF using smoove merge. Results are written to $outdir/smoove-merged and include the file $project.sites.vcf.gz.

Genotype all samples

Using the union of SVs across all samples, we genotype each sample at those sites using smoove genotype with duphold for depth annotations. Output is written to $outdir/smoove-genotyped/$sample-smoove.genotyped.vcf.gz.

Square and annotate VCF

All single-sample genotyped VCFs are pasted into a single, square, joint-called file using smoove paste. The variants are then annotated with smoove annotate, using the annotation supplied via --gff. Results are written to:

  • $outdir/smoove-squared/$project.smoove.square.anno.vcf.gz
    • Annotated and indexed VCF for all SVs across all samples.
  • $outdir/bpbio/svvcf.html
    • A report of SV counts per sample by SV type.

Coverage profiling

Using indexcov, estimate coverage across the genome per sample and perform coverage-based quality control. The full output of goleft indexcov is written to $outdir/indexcov, with its HTML report at $outdir/indexcov/index.html.

Workflow report

Logs and output of various steps are aggregated and summarized into one report written to $outdir/smoove-nf.html.

Cumulative chromosome coverage is available in $outdir/covviz_report.html.
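
The output locations described above can be summarized in a short Python sketch. This is only an illustration of the directory layout listed in this README, not code from the workflow itself; the function name expected_outputs and the sample and project names in the usage lines are hypothetical.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def expected_outputs(outdir: str, project: str, samples: list[str]) -> dict[str, list[str]]:
    """Map each workflow step to the output paths named in this README.

    Hypothetical helper: it only mirrors the documented layout and does not
    check that the files exist.
    """
    base = PurePosixPath(outdir)
    return {
        "call": [str(base / "smoove-called" / f"{s}-smoove.genotyped.vcf.gz") for s in samples],
        "merge": [str(base / "smoove-merged" / f"{project}.sites.vcf.gz")],
        "genotype": [str(base / "smoove-genotyped" / f"{s}-smoove.genotyped.vcf.gz") for s in samples],
        "square": [
            str(base / "smoove-squared" / f"{project}.smoove.square.anno.vcf.gz"),
            str(base / "bpbio" / "svvcf.html"),
        ],
        "indexcov": [str(base / "indexcov" / "index.html")],
        "report": [str(base / "smoove-nf.html"), str(base / "covviz_report.html")],
    }


# Hypothetical project and sample names, for illustration only.
for step, paths in expected_outputs("results", "trio", ["sampleA", "sampleB"]).items():
    print(step, *paths, sep="\n  ")
```

A listing like this can be handy for checking that a finished run produced every file you expect before moving on to downstream analysis.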

Usage

A Docker container is maintained in parallel with this workflow (https://hub.docker.com/r/brentp/smoove) and will be pulled by Nextflow before data processing begins. There's no need to download and install dependencies outside of Docker or Singularity and Nextflow.

nextflow run brwnj/smoove-nf -latest [nextflow options] [smoove-nf options]

Running this using provided containers can be accomplished using the docker profile:

nextflow run brwnj/smoove-nf -latest -profile docker [nextflow options] [smoove-nf options]

Required parameters

  • --bams
    • Aligned sequences in .bam and/or .cram format. Indexes (.bai/.crai) must be present.
    • Use wildcards like 'SRP1234/alignments/*.cram' to specify your alignment files.

Note: Nextflow handles wildcard expansion in this case, so it is critical to quote the string, for example:

nextflow run brwnj/smoove-nf -latest \
	--bams '~/SRP1234/alignments/*.cram'
  • --fasta
    • File path to reference fasta. Index (.fai) must be present.
    • GRCh38 is available at: ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/GRCh38_reference_genome
  • --gff
    • Annotation GFF used to annotate variants.
    • GRCh38 reference is available at: ftp://ftp.ensembl.org/pub/release-95/gff3/homo_sapiens/Homo_sapiens.GRCh38.95.chr.gff3.gz
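
If you launch the workflow from a script, the three required parameters can be assembled as in the Python sketch below. It is a minimal illustration under stated assumptions: build_command is a hypothetical helper and the file paths are placeholders. Only the flags themselves (--bams, --fasta, --gff, --outdir, -profile, -latest) come from this README. shlex.join quotes the wildcard so the shell passes it to Nextflow unexpanded.

```python
from __future__ import annotations

import shlex


def build_command(bams: str, fasta: str, gff: str,
                  profile: str | None = "docker",
                  extra: dict[str, object] | None = None) -> str:
    """Return a shell-safe smoove-nf command line (hypothetical helper)."""
    cmd = ["nextflow", "run", "brwnj/smoove-nf", "-latest"]
    if profile:
        cmd += ["-profile", profile]
    # Required parameters documented above.
    cmd += ["--bams", bams, "--fasta", fasta, "--gff", gff]
    # Any optional parameters, e.g. {"outdir": "./results"}.
    for flag, value in (extra or {}).items():
        cmd += [f"--{flag}", str(value)]
    # shlex.join wraps arguments containing shell metacharacters such as '*'
    # in single quotes, so the glob reaches Nextflow as one string.
    return shlex.join(cmd)


# Placeholder paths, for illustration only.
print(build_command(
    bams="SRP1234/alignments/*.cram",
    fasta="ref/GRCh38.fa",
    gff="ref/GRCh38.gff3.gz",
    extra={"outdir": "./results"},
))
```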

Optional parameters

  • --outdir
    • The base results directory for output
    • default: './results'
  • --bed
    • File path to bed of exclude regions for smoove call.
    • Exclude regions for b37 and GRCh38 are made available by the Hall lab under speedseq.
  • --exclude
    • regular expression of chromosomes to skip
    • You should escape '$', e.g. "~random\$,~_alt\$"
    • default: "~^HLA,~^hs,~:,~^GL,~M,~EBV,~^NC,~^phix,~decoy,~random\$,~Un,~hap,~_alt\$"
  • --project
    • Acts as the file prefix for merged and squared sites
    • default: 'sites'
  • --sexchroms
    • Comma delimited names of the sex chromosome(s) used to infer sex, e.g. --sexchroms 'chrX,chrY'
    • default: 'X,Y'
  • --sensitive
    • Preserves more variants from being filtered throughout the workflow
    • default: false
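
To get a feel for which contigs the default --exclude value skips, the following Python sketch applies the patterns to a few chromosome names. It rests on assumptions not stated in this README: that each comma-separated entry beginning with ~ is a regular expression searched anywhere in the chromosome name, that entries without ~ must match the name exactly, and that Python's re module behaves like the workflow's own regex engine for these simple patterns. The is_excluded helper is hypothetical.

```python
import re

# Default from this README. The backslash before '$' is only needed on the
# command line; inside a Python string the pattern is written without it.
DEFAULT_EXCLUDE = "~^HLA,~^hs,~:,~^GL,~M,~EBV,~^NC,~^phix,~decoy,~random$,~Un,~hap,~_alt$"


def is_excluded(chrom: str, exclude: str = DEFAULT_EXCLUDE) -> bool:
    """Return True if chrom matches any exclude entry (assumed semantics)."""
    for entry in exclude.split(","):
        if entry.startswith("~"):
            # Assumption: '~' marks the entry as a regular expression.
            if re.search(entry[1:], chrom):
                return True
        elif entry == chrom:
            # Assumption: plain entries are exact chromosome names.
            return True
    return False


for name in ["chr1", "chrX", "chrM", "chrUn_KI270302v1", "chr19_KI270866v1_alt"]:
    print(name, "skipped" if is_excluded(name) else "kept")
```

Under these assumptions the unanchored ~M entry skips any contig whose name contains an uppercase M, which is worth keeping in mind when working with non-human references.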

covviz params

  • --zthreshold
    • a sample must be greater than this many standard deviations in order to be found significant
    • default: 3.5
  • --distancethreshold
    • consecutive significant points must span this distance in order to pass this filter
    • default: 150000
  • --slop
    • leading and trailing segments added to significant regions to make them more visible
    • default: 500000
  • --minsamples
    • Show all traces when analyzing this few samples; ignores z-threshold, distance-threshold, and slop
    • default: 8
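
The way --zthreshold, --distancethreshold, and --slop interact can be illustrated with the Python sketch below. It is not covviz's implementation. It assumes that z-scores are computed per position across samples using the population standard deviation, that deviations in either direction count, and that a run of consecutive significant points is kept when its first and last positions are at least the distance threshold apart. The function name significant_regions and the input layout are hypothetical.

```python
from __future__ import annotations

from statistics import mean, pstdev


def significant_regions(positions: list[int],
                        coverage: dict[str, list[float]],
                        sample: str,
                        zthreshold: float = 3.5,
                        distancethreshold: int = 150_000,
                        slop: int = 500_000) -> list[tuple[int, int]]:
    """Return (start, end) regions where `sample` is a coverage outlier."""
    # Step 1: flag positions where the sample's z-score exceeds the threshold.
    flagged = []
    for i in range(len(positions)):
        column = [values[i] for values in coverage.values()]
        sd = pstdev(column)
        z = 0.0 if sd == 0 else (coverage[sample][i] - mean(column)) / sd
        flagged.append(abs(z) > zthreshold)

    # Step 2: keep runs of consecutive flagged points that span enough
    # distance, then pad each kept run with `slop` on both sides.
    regions = []
    start = None
    for i, hit in enumerate(flagged + [False]):  # sentinel closes a trailing run
        if hit and start is None:
            start = i
        elif not hit and start is not None:
            first, last = positions[start], positions[i - 1]
            if last - first >= distancethreshold:
                regions.append((max(0, first - slop), last + slop))
            start = None
    return regions


# Synthetic data: 20 samples, 10 bins spaced 100 kb apart, one sample elevated
# across three consecutive bins.
positions = [i * 100_000 for i in range(10)]
coverage = {f"s{k}": [1.0] * 10 for k in range(20)}
for i in (3, 4, 5):
    coverage["s0"][i] = 2.0
print(significant_regions(positions, coverage, "s0"))
```

With a population standard deviation, a single outlier among n otherwise identical samples has a z-score of sqrt(n - 1), so a threshold of 3.5 cannot be reached in very small cohorts. That is consistent with --minsamples showing all traces when few samples are analyzed.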

somalier params

  • --knownsites
  • --ped
    • optional, but required in order to run somalier relate and generate somalier's HTML report
    • sample relationship definitions
    • default: false

Updating

To pull changes made to the workflow and ensure you're running the latest version, use:

nextflow pull brwnj/smoove-nf

That will either pull any changes or confirm you're at the latest version.

Alternatively, when you run the workflow simply use:

nextflow run brwnj/smoove-nf -latest
