parallel slivar
On whole genome cohorts of many families or trios, `slivar expr` can take some time to run. To speed the (iterative) analysis of large and small cohorts, we provide `pslivar`, which runs `slivar expr` in parallel across regions of the genome. Using this, we can do the rare-disease pipeline in ~2 minutes for a VCF with 150 exome trios and about 30 minutes for 150 WGS trios using 32 CPUs.
To run `pslivar`, a user should first get a `slivar expr` command that runs without error. Converting that `slivar` command to `pslivar` is then as simple as changing `slivar expr` to `pslivar`, adding `--fasta $reference_fasta`, and capturing the VCF output from STDOUT instead of writing it with `-o`. (`$reference_fasta` is the fasta sequence associated with the genome build used for aligning and calling variants in the cohort.) By default `pslivar` will use all available cores. This can be adjusted by adding, for example, `--processes 12`.
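On a shared machine it can be useful to leave a few cores free instead of relying on the default. The following is a minimal sketch, assuming a POSIX shell and that `getconf _NPROCESSORS_ONLN` reports the core count (as it does on Linux and macOS). `pick_processes` is a hypothetical helper written for this illustration and is not part of slivar; only the `--processes` flag comes from the text above.

```shell
# pick_processes TOTAL RESERVE
# Hypothetical helper (not part of slivar): prints TOTAL minus RESERVE,
# but never less than 1.
pick_processes() {
    n=$(( $1 - $2 ))
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    echo "$n"
}

# Count online cores; fall back to 1 if getconf cannot report it.
total=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Leave 2 cores free for other work (an arbitrary choice for illustration).
processes=$(pick_processes "$total" 2)

# Print the flag to append to the pslivar command line.
echo "--processes $processes"
```

The printed flag can then be added to the `pslivar` command in place of a hard-coded value such as `--processes 12`.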
Here is a slivar command:
slivar expr --vcf vcfs/$cohort.annotated.bcf --ped data-links/$cohort.ped \
--exclude /uufs/chpc.utah.edu/common/HIPAA/u6000771/Data/LCR-hs38.bed.gz \
--pass-only \
--js $js \
--trio "denovo:denovo(kid, mom, dad) && INFO.gnomad_popmax_af < 0.001" \
-o vcfs/$cohort$name.vcf
and the corresponding `pslivar` command:
pslivar expr --vcf vcfs/$cohort.annotated.bcf --ped data-links/$cohort.ped \
--exclude /uufs/chpc.utah.edu/common/HIPAA/u6000771/Data/LCR-hs38.bed.gz \
--pass-only \
--js $js \
--trio "denovo:denovo(kid, mom, dad) && INFO.gnomad_popmax_af < 0.001" \
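--fasta $reference_fasta \
> vcfs/$cohort$name.vcf
# NOTE: `--fasta $reference_fasta` is added, and `-o` is changed to `>`. The output can also be piped to bgzip.
# (Comments are kept off the continued lines: a `#` after a trailing `\` would end the command early.)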
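Since `pslivar` writes the VCF to STDOUT, it can be compressed on the fly instead of being redirected to a plain file. The sketch below shows one way to do that, assuming `bgzip` (from htslib) is on the `PATH`. The cohort name, file paths, and `js` file are hypothetical placeholders for illustration, and the `command -v` guard is there only so the snippet exits cleanly where the tools are not installed.

```shell
# Hypothetical placeholder values; replace with your own project paths.
cohort=example
js=slivar-functions.js
reference_fasta=reference.fa
processes=12
out="vcfs/$cohort.denovo.vcf.gz"

# Wrap the pslivar call in a function so the pipeline below stays readable.
run_pslivar() {
    pslivar expr \
        --vcf "vcfs/$cohort.annotated.bcf" \
        --ped "data-links/$cohort.ped" \
        --pass-only \
        --js "$js" \
        --trio "denovo:denovo(kid, mom, dad) && INFO.gnomad_popmax_af < 0.001" \
        --fasta "$reference_fasta" \
        --processes "$processes"
}

# Only run when both tools are installed; otherwise report and skip.
if command -v pslivar >/dev/null 2>&1 && command -v bgzip >/dev/null 2>&1; then
    # bgzip -c reads the VCF from STDIN and writes compressed data to STDOUT.
    run_pslivar | bgzip -c > "$out"
else
    echo "pslivar or bgzip not found on PATH; skipping" >&2
fi
```

Quoting the variables guards against paths that contain spaces. None of the continued lines carries a trailing comment, so every `\` is the last character on its line.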
NOTE that this only works for `slivar expr` and not for other commands like `compound-hets`.