PipeVar is a pathogenic variant prioritization workflow for undiagnosed rare diseases. It uses tools developed by WGLab, along with other software, to call structural variants, single-nucleotide variants, indels, and repeat expansions, and to prioritize both potential and known pathogenic variants.
PipeVar is implemented in Nextflow and can be run using Docker or Singularity. Support for running with Conda is in development, but for the best consistency, run PipeVar with either Docker or Singularity. Slurm is currently the only supported cluster scheduler; support for other cluster systems is in progress.

PipeVar requires either Docker or Singularity to run. If your system does not have Singularity installed as a module, you can try installing it with conda, either in a new environment with
conda create -n singularity singularity
or by installing Singularity into your current conda environment with
conda install conda-forge::singularity
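Before launching the workflow, it can help to confirm that one of the two container runtimes is actually on your PATH. The following is a minimal sketch of such a check; the function name detect_container_runtime is illustrative and is not part of PipeVar.

```shell
# Illustrative helper (not part of PipeVar): print the first container
# runtime found on PATH, or "none" if neither is available.
detect_container_runtime() {
    for runtime in docker singularity; do
        # command -v succeeds only if the name resolves to a command
        if command -v "$runtime" >/dev/null 2>&1; then
            echo "$runtime"
            return 0
        fi
    done
    echo "none"
    return 1
}
```

Note that finding docker on PATH only shows the client is installed; it does not show that the Docker daemon is running or that you have permission to use it.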
To use PipeVar, a setup stage is required for two pieces of software. The first one is ANNOVAR.
To download ANNOVAR, go to https://www.openbioinformatics.org/annovar/annovar_download_form.php
Once ANNOVAR is downloaded, run setup.sh in the ANNOVAR folder to download the files ANNOVAR needs.
setup.sh also downloads the necessary PhenoSV model files.
After downloading the PhenoSV model files, run setup_config.sh in your folder. It modifies nextflow.config to contain the required PhenoSV settings and downloads the datasets needed to run ANNOVAR.
nextflow run main.nf --bam
nextflow run main.nf --vcf
REQUIRED PARAMETERS:
--bam Path to input BAM file. Cannot be used with VCF option. Must be full path. Requires .bai index file.
--vcf Path to input VCF file. Cannot be used with BAM option. Must be full path.
--ref_fa Reference genome in FASTA format. Must be full path.
--out_prefix Prefix for output files
--note Clinical note text file, in a format of VCF. Used for HPO term extraction. Only needed if HPO terms are not available.
--hpo HPO ID file, one ID per line as in the example below; a note file can be used instead.
HP:0001250
HP:0000750
HP:0001257
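A malformed HPO file is easy to catch before starting a long run. The sketch below assumes that --hpo expects exactly one ID per line with no header, as in the example above, and uses the standard HPO identifier form (HP: followed by seven digits). The function name check_hpo_file is illustrative, and PipeVar's own parsing may be more lenient than this check.

```shell
# Illustrative helper (not part of PipeVar): print every line of an HPO file
# that is not a bare HPO ID, prefixed with its line number.
check_hpo_file() {
    if [ ! -s "$1" ]; then
        echo "HPO file is missing or empty: $1" >&2
        return 2
    fi
    # -v selects non-matching lines, -n prefixes line numbers, -E enables
    # extended regular expressions. grep exits 0 when it printed something.
    if grep -Evn '^HP:[0-9]{7}$' "$1"; then
        return 1
    fi
    return 0
}
```

The function returns 0 when every line is a well-formed ID. Blank lines, trailing spaces, and Windows line endings are all reported as malformed, which is deliberate in this sketch.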
OPTIONAL PARAMETERS:
--input_directory Directory containing input files
--output_directory Path to output directory (default: current dir)
--mode <sv|snp> Variant type to analyze (required with --vcf or --bam)
--type <short|long> Input data type: short or long reads (required with --bam)
--light <yes|no> Use lightweight PhenoSV model and NanoCaller (faster, lower memory)
--gq Minimum genotype quality [default: 20]
--ad Minimum allelic depth [default: 15]
--gnomad Max gnomAD allele frequency [default: 0.0001]
--help Print this help message and exit
EXAMPLES:
1. Long-read full pipeline (SV + SNP + STR):
nextflow run main.nf \
--bam /data/sample.bam \
--ref_fa /refs/hg38.fa \
--out_prefix patient1 \
--hpo /data/hpo.txt \
--type long
2. Short-read full pipeline:
nextflow run main.nf \
--bam /data/sample.bam \
--ref_fa /refs/hg38.fa \
--out_prefix patient1 \
--hpo /data/hpo.txt \
--type short
3. Short-read with lightweight model:
nextflow run main.nf \
--bam /data/sample.bam \
--ref_fa /refs/hg38.fa \
--out_prefix patient1 \
--hpo /data/hpo.txt \
--type short \
--light yes
4. Variant re-annotation using VCF (SV mode):
nextflow run main.nf \
--vcf /data/sample.vcf \
--ref_fa /refs/hg38.fa \
--out_prefix patient_sv \
--hpo /data/hpo.txt \
--mode sv
5. Auto-extract HPO from clinical notes:
nextflow run main.nf \
--bam /data/sample.bam \
--ref_fa /refs/hg38.fa \
--out_prefix patient1 \
--note /data/note.txt \
--type long
NOTES:
- At least one of --hpo or --note must be provided.
- If --note is used, --hpo is auto-generated via phenotagger.
- --mode must be specified for VCF input, and helps direct SNV vs SV flow.
- --type is required for BAM input to specify sequencing technology.
- All file paths must be absolute or relative to --input_directory.
- --light yes uses faster, resource-friendly software such as haplotypecaller, NanoCaller and PhenoSV-light.
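The rules in these notes can be checked before submitting a job. The following is a rough sketch of such a pre-flight check, not PipeVar's own validation: the function name check_pipevar_args is illustrative, and it follows the notes and examples in requiring --mode only for VCF input and --type only for BAM input.

```shell
# Illustrative pre-flight check (not part of PipeVar). It only inspects the
# flags named in the NOTES and ignores everything else.
check_pipevar_args() {
    local bam="" vcf="" hpo="" note="" mode="" type="" errors=0
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --bam)  bam=${2-} ;;
            --vcf)  vcf=${2-} ;;
            --hpo)  hpo=${2-} ;;
            --note) note=${2-} ;;
            --mode) mode=${2-} ;;
            --type) type=${2-} ;;
        esac
        shift
    done
    # --bam and --vcf are mutually exclusive, and one of them is needed
    if [ -n "$bam" ] && [ -n "$vcf" ]; then
        echo "error: --bam and --vcf cannot be used together" >&2
        errors=$((errors + 1))
    elif [ -z "$bam" ] && [ -z "$vcf" ]; then
        echo "error: one of --bam or --vcf is required" >&2
        errors=$((errors + 1))
    fi
    # phenotype information must come from --hpo or --note
    if [ -z "$hpo" ] && [ -z "$note" ]; then
        echo "error: provide --hpo or --note" >&2
        errors=$((errors + 1))
    fi
    # VCF input needs a valid --mode
    if [ -n "$vcf" ]; then
        case "$mode" in
            sv|snp) ;;
            *) echo "error: --vcf requires --mode sv or --mode snp" >&2
               errors=$((errors + 1)) ;;
        esac
    fi
    # BAM input needs a valid --type
    if [ -n "$bam" ]; then
        case "$type" in
            short|long) ;;
            *) echo "error: --bam requires --type short or --type long" >&2
               errors=$((errors + 1)) ;;
        esac
    fi
    [ "$errors" -eq 0 ]
}
```

Passing the same flags you intend to give to nextflow run main.nf makes the function return 0 when the combination is consistent with the notes, and print one message per problem otherwise.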
PIPELINE MODULES:
SNV CALLING
- clair3 : Deep learning SNP caller (long-read)
- nanocaller : Lightweight long-read SNP caller
- haplotypecaller: Short-read SNP caller (GATK)
- deepvariant : Deep learning short-read SNP caller
SV CALLING
- cuteSV : Long-read SV caller
- sniffles : Long-read SV caller
- Manta : Short-read SV caller
- truvari : SV comparison/benchmarking
- SURVIVOR : SV merging
STR DETECTION
- NanoRepeat : Long-read STR caller
- ExpansionHunter: Short-read STR detection
PHENOTYPING
- Phen2gene : HPO-to-gene mapping
- phenotagger : NLP-based clinical note to HPO term conversion
- PhenoGPT2 : To be implemented
VARIANT RANKING
- ANNOVAR : SNV/SV annotation
- RankVar : Final SNV ranking
- Rankscore_analysis: Additional ranking analysis
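Read together with the --type and --light options, the SNV CALLING entries above suggest which caller applies to which input. The sketch below writes that reading down as a lookup. It is an interpretation of this README only: the pairing of the lightweight callers with --light yes comes from the notes, while assigning clair3 and deepvariant to the non-light case is an assumption made by elimination. The actual selection logic lives in main.nf and may differ, and the function name pick_snv_caller is illustrative.

```shell
# Illustrative mapping (not PipeVar's actual logic): choose an SNV caller
# name from the read type (short|long) and the --light setting (yes|no).
pick_snv_caller() {
    case "$1:$2" in
        long:yes)  echo "nanocaller" ;;       # lightweight long-read caller
        long:no)   echo "clair3" ;;           # assumed long-read default
        short:yes) echo "haplotypecaller" ;;  # lightweight short-read caller
        short:no)  echo "deepvariant" ;;      # assumed short-read default
        *) echo "unknown combination: type=$1 light=$2" >&2
           return 1 ;;
    esac
}
```

For example, pick_snv_caller long yes prints nanocaller, and any unrecognized value such as a type of ont returns a non-zero status.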
All output is stored in the output folder, or in the launch folder if no output folder is given. The list of outputs is as follows:
To be added: a LongPhase process, ACMG guideline support, and PhenoGPT2 for note direction.