PanVariants Usage Documentation

Overview

PanVariants is a highly accurate variant detection pipeline that integrates pangenome references and an artificial intelligence (AI) model to identify single nucleotide variants (SNVs), INDELs (<50 bp), copy number variants (CNVs), short tandem repeats (STRs), and other structural variants (SVs, ≥50 bp). It runs on both local machines and Sun Grid Engine (SGE) clusters.

Quick Start

Simple Run Command

python3 PanVariant.py --samplelist samplelist --executor local

Install

git clone https://github.com/MGI-EU/PanVariants.git

Database download

The example below downloads the databases for the GRCh38 reference genome; the process for other reference genomes is similar.

sh PanVariants_required_files.sh

Singularity .sif files download

Download all .sif files from this URL: https://zenodo.org/records/19848582 and place them in the "sifs" folder.
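After downloading, it can help to confirm that the images actually landed in the folder. The snippet below is a minimal sketch, not part of PanVariants: the function name `list_sif_files` is hypothetical, and it assumes the "sifs" folder is resolved relative to the current working directory. It only lists the `.sif` files it finds; this README does not enumerate the expected image names, so no completeness check is attempted.

```python
from pathlib import Path


def list_sif_files(folder="sifs"):
    """Return the sorted names of all .sif files directly inside `folder`.

    Hypothetical helper: returns an empty list if the folder does not exist.
    """
    directory = Path(folder)
    if not directory.is_dir():
        return []
    return sorted(path.name for path in directory.glob("*.sif"))


if __name__ == "__main__":
    found = list_sif_files()
    print(f"{len(found)} .sif file(s) found in 'sifs':")
    for name in found:
        print("  " + name)
```

An empty result usually means the download went to a different directory or the folder name differs from "sifs".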

Required Software

Software	Version	Installation
Nextflow	23.10.0	Download from `https://github.com/nextflow-io/nextflow/releases/download/v23.10.0/nextflow-23.10.0-all`, rename the file to `nextflow`, and copy it to the software directory
Java	≥ 17	Required for Nextflow execution
Singularity	≥ 3.8	Required for containerized execution
Python	≥ 3.9.13	Required for running the Python wrapper script
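The requirements above can be pre-checked before the first run. The following is a rough sketch under stated assumptions: the executable names `nextflow`, `java`, and `singularity` are assumed to be how the tools appear on `PATH` (the table names the software, not the binaries), and the helper names are hypothetical. It verifies only the Python version and the presence of the executables; it does not check the Nextflow, Java, or Singularity versions.

```python
import shutil
import sys

# Minimum Python version taken from the Required Software table.
MIN_PYTHON = (3, 9, 13)

# Assumed executable names; adjust if your installation uses different ones.
REQUIRED_TOOLS = ["nextflow", "java", "singularity"]


def python_is_supported(version=None):
    """Return True if a (major, minor, micro) tuple meets MIN_PYTHON.

    Defaults to the version of the running interpreter.
    """
    if version is None:
        version = tuple(sys.version_info[:3])
    return tuple(version) >= MIN_PYTHON


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("Python version OK:", python_is_supported())
    print("Tools not found on PATH:", missing_tools())
```

If Nextflow was copied to a software directory that is not on `PATH`, it will be reported as missing here even though the pipeline may still locate it.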

Optional Software

Software	Description
SGE (Sun Grid Engine)	Optional cluster management system for distributed computing (required only for `blc` executor)

Arguments Description

Required Arguments

Argument	Short	Description
`--samplelist`	`-s`	Sample information file. Must contain a single line with 3 tab-separated columns: `sample_name`, `read1_path`, `read2_path`
`--executor`	`-ex`	Execution engine. Options: `blc` (run the pipeline on a Sun Grid Engine cluster) or `local` (run the pipeline on the local machine)
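To make the sample list format concrete, here is a small sketch that writes and checks such a file. The helper names `write_samplelist` and `check_samplelist` are hypothetical and not part of PanVariants, and the sample name and read paths in the usage note are placeholders. The check covers only the layout described above (one line, three tab-separated columns, no empty column); it does not verify that the FASTQ files exist.

```python
from pathlib import Path


def write_samplelist(path, sample_name, read1_path, read2_path):
    """Write a single tab-separated line with no header row."""
    line = "\t".join([sample_name, read1_path, read2_path])
    Path(path).write_text(line + "\n")


def check_samplelist(path):
    """Return a list of format problems; an empty list means the layout looks right."""
    problems = []
    # Ignore blank lines so a trailing newline is not counted as a record.
    lines = [line for line in Path(path).read_text().splitlines() if line.strip()]
    if len(lines) != 1:
        problems.append(f"expected 1 line, found {len(lines)}")
    for number, line in enumerate(lines, start=1):
        fields = line.split("\t")
        if len(fields) != 3:
            problems.append(f"line {number}: expected 3 columns, found {len(fields)}")
        elif any(not field.strip() for field in fields):
            problems.append(f"line {number}: empty column")
    return problems
```

For example, `write_samplelist("samples.tsv", "sampleA", "/data/sampleA_1.fq.gz", "/data/sampleA_2.fq.gz")` produces a file that `check_samplelist("samples.tsv")` accepts. A file whose columns are separated by spaces instead of tabs is reported as having one column.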

Optional Arguments

Argument	Short	Default	Description
`--output`	`-o`	`result`	Output directory path
`--reference`	`-ref`	`GRCh38`	Reference genome version. Options: `GRCh38` or `CHM13`
`--run_cnv`	`-rc`	`yes`	Enable CNV detection (`yes` / `no`)
`--run_sv`	`-rs`	`yes`	Enable SV detection (`yes` / `no`)
`--run_str`	`-rstr`	`yes`	Enable STR detection (`yes` / `no`)
`--mark_dup`	`-md`	`no`	Mark duplicate reads (`yes` / `no`)
`--fq_filter`	`-ft`	`no`	Run FastQ filtering (`yes` / `no`)
`--split_bam`	`-sb`	`no`	Split BAM files by chromosome and run DeepVariant per chromosome (`yes` / `no`)
`--adapter1`	`-ad1`	`AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA`	Adapter 1 sequence
`--adapter2`	`-ad2`	`AAGTCGGATCGTAGCCATGTCGTTCTGTGAGCCAAGGAGTTG`	Adapter 2 sequence
`--re_alignment`	`-ra`	`no`	Perform re-alignment (`yes` / `no`)
`--nf_config`	`-c`	-	Path to Nextflow configuration file
`--db`	`-d`	-	Database path
`--script`	`-sp`	-	Script path
`--sif`	`-f`	-	Path to Singularity container image (`.sif`)
`--scratch_tmp`	`-scr`	-	Path for Nextflow scratch directive
`--queue`	`-q`	-	Queue name for the `qsub -q` parameter. Note: Only effective when `--executor` is set to `blc`.
`--project`	`-pj`	-	Project name for the `qsub -P` parameter. Note: Only effective when `--executor` is set to `blc`.
`--ansi_log`	`-an`	`false`	Enable ANSI logging for Nextflow (`true` / `false`)
`--help`	`-h`	-	Show help message and exit
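When the pipeline is launched from another script, the options above can be assembled programmatically. The following is an illustrative sketch only: `build_command` is a hypothetical helper, not part of PanVariants. It uses the long option names from the tables above, checks that the yes/no switches receive `yes` or `no`, and, as a design choice of this sketch, rejects `--queue` and `--project` unless the executor is `blc` (the pipeline itself is documented only as ignoring them in that case).

```python
# Options documented above as taking `yes` or `no`.
YES_NO_OPTIONS = {
    "run_cnv", "run_sv", "run_str", "mark_dup",
    "fq_filter", "split_bam", "re_alignment",
}


def build_command(samplelist, executor, queue=None, project=None, **options):
    """Return the PanVariant.py invocation as an argument list.

    Extra keyword arguments are passed through as `--name value` pairs,
    e.g. run_cnv="no" becomes ["--run_cnv", "no"].
    """
    if executor not in ("local", "blc"):
        raise ValueError("executor must be 'local' or 'blc'")
    if executor != "blc" and (queue or project):
        # Stricter than the documented behavior: fail instead of ignoring.
        raise ValueError("--queue and --project only take effect with --executor blc")

    command = ["python3", "PanVariant.py",
               "--samplelist", samplelist, "--executor", executor]
    if queue:
        command += ["--queue", queue]
    if project:
        command += ["--project", project]
    for name, value in options.items():
        if name in YES_NO_OPTIONS and value not in ("yes", "no"):
            raise ValueError(f"--{name} must be 'yes' or 'no'")
        command += [f"--{name}", str(value)]
    return command
```

The returned list can be handed to `subprocess.run(command, check=True)`; passing a list rather than a shell string avoids quoting problems with paths that contain spaces.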

Example Commands

Run Locally with All Variant Types Enabled

python3 PanVariant.py \
  --samplelist samples.tsv \
  --executor local \
  --output results \
  --reference GRCh38 \
  --run_cnv yes \
  --run_sv yes \
  --run_str yes

Run on SGE Cluster with Specific Queue and Project

python3 PanVariant.py \
  --samplelist samples.tsv \
  --executor blc \
  --queue high_mem \
  --project my_project \
  --output cluster_results

Notes

The `--queue` and `--project` parameters take effect only when `--executor` is set to `blc`.
The sample list file must strictly follow the format: three tab-separated columns with no header row.
If using Singularity containers, ensure the correct `.sif` file path is provided via `-f`.
Duplicate marking and FastQ filtering are disabled by default. Set them explicitly to `yes` if required.
