Options
-h | --help show this help message and exit
Display help message
Required
-i, -bam ... BAM file(s)
More than one BAM file can be passed to SV2, separated by spaces.
$ sv2 -i HG00096.bam NA12878.bam ...
SV2 can take multiple files containing SV predictions as input. BED and VCF files are supported.
-b, -bed ... BED file(s) of SVs
Multiple BED files can be passed to SV2, separated by spaces.
$ sv2 -i HG00096.bam -b del.bed dup.bed
BED files are either space or tab delimited, formatted as `CHROM START END SVTYPE`.
Details on the required BED format
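As an illustration of the four-column layout described above, here is a minimal sketch of a BED line parser. It is not SV2's own reader; the `SVRecord` type, the `parse_bed_line` helper, and the sample coordinates and `DEL`/`DUP` labels are assumptions made for the example.

```python
from typing import NamedTuple


class SVRecord(NamedTuple):
    """One SV call read from a BED line (hypothetical container)."""
    chrom: str
    start: int
    end: int
    svtype: str


def parse_bed_line(line: str) -> SVRecord:
    """Parse one 'CHROM START END SVTYPE' line, space or tab delimited."""
    fields = line.split()  # splits on any run of spaces or tabs
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 columns, got {len(fields)}: {line!r}")
    chrom, start, end, svtype = fields[:4]
    return SVRecord(chrom, int(start), int(end), svtype)


if __name__ == "__main__":
    # Made-up example lines: one tab delimited, one space delimited.
    for raw in ["chr1\t1000\t5000\tDEL", "chr2 20000 45000 DUP"]:
        print(parse_bed_line(raw))
```

Splitting on arbitrary whitespace lets the same function accept both delimiters without a separate code path.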
-v, -vcf ... VCF file(s) of SVs
Multiple VCF files can be passed to SV2, separated by spaces.
VCF files are tab delimited; `END=` and `SVTYPE=` are required in the `INFO` column.
Details on the required VCF format
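To check whether a VCF record carries the two required `INFO` keys, something like the following sketch could be used before running SV2. The helper names `parse_info` and `sv_from_vcf_line` are hypothetical, and the code assumes the standard VCF layout in which `INFO` is the eighth tab-delimited column; it is not SV2's parser.

```python
def parse_info(info: str) -> dict:
    """Turn a VCF INFO string such as 'END=5000;SVTYPE=DEL' into a dict."""
    parsed = {}
    for entry in info.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        parsed[key] = value if sep else True  # flag entries have no '='
    return parsed


def sv_from_vcf_line(line: str):
    """Return (chrom, pos, end, svtype) from one tab-delimited VCF data line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        raise ValueError("a VCF data line needs at least 8 columns")
    info = parse_info(cols[7])  # INFO is the 8th column
    missing = [key for key in ("END", "SVTYPE") if key not in info]
    if missing:
        raise ValueError(f"INFO is missing required keys: {missing}")
    return cols[0], int(cols[1]), int(info["END"]), info["SVTYPE"]


if __name__ == "__main__":
    # Made-up record for illustration only.
    record = "chr1\t1000\t.\tN\t<DEL>\t.\tPASS\tEND=5000;SVTYPE=DEL;IMPRECISE"
    print(sv_from_vcf_line(record))
```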
-snv ... SNV VCF file(s)
SNV calls are required to genotype duplications with imprecise breakpoints. For such variants, genotyping considers both coverage and the heterozygous allele ratio.
VCF files must be compressed with bgzip and indexed with tabix.
Multiple VCF files can be passed to SV2, separated by spaces.
-p, -ped ... PED file(s)
The PED format is defined by PLINK.
Multiple PED files can be passed to SV2, separated by spaces.
-g, -genome STR Reference genome build [hg19, hg38, mm10]. Default: hg19
Accepted reference genome builds for SV2 are hg19 (GRCh37), hg38 (GRCh38), or mm10. Accepted command line argument strings are `hg19`, `hg38`, or `mm10`.
-pcrfree GC content normalization for PCRfree libraries
SV2 performs a GC content normalization for coverage estimates, adapted from CNVnator. Supply this flag if the samples in the sample information list were sequenced with PCR-free chemistries.
By default this flag is off and SV2 assumes samples were sequenced with PCR protocols.
-M bwa mem -M compatibility. Split-reads flagged as secondary instead of supplementary
SV2 can accommodate legacy alignments with chimeric reads flagged as secondary. Pass the `-M` flag if samples in the sample information file were aligned with `bwa mem -M`.
By default SV2 assumes chimeric reads are flagged as supplementary (`-M` is off).
-merge Merge SV after genotyping
SV2 can merge SVs whose breakpoints reciprocally overlap by at least 80% (the default threshold). Merging is repeated recursively until no more SVs can be merged. The SV position with the maximum ALT genotype likelihood is retained.
By default SV2 does not merge breakpoints.
-min-ovr FLOAT Minimum reciprocal overlap for merging SVs [0.8]
Users can define the minimum reciprocal overlap required for merging SVs after genotyping. The `-merge` flag is not required if the `-min-ovr` option is passed.
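For readers unfamiliar with the term, reciprocal overlap is commonly computed as the shared length divided by each interval's own length, taking the smaller of the two fractions. The sketch below uses that common definition with half-open coordinates; whether SV2 computes it in exactly this way is an assumption, and the function names are hypothetical.

```python
def reciprocal_overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> float:
    """Smaller of the two overlap fractions between intervals A and B."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0  # disjoint, merely touching, or zero-length intervals
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def can_merge(a: tuple, b: tuple, min_ovr: float = 0.8) -> bool:
    """True when two (start, end) intervals meet the minimum reciprocal overlap."""
    return reciprocal_overlap(a[0], a[1], b[0], b[1]) >= min_ovr


if __name__ == "__main__":
    # 900 shared bases out of 1000 in each interval: mergeable at 0.8.
    print(can_merge((1000, 2000), (1100, 2100)))
    # 500 shared bases, but the second interval is 1500 long: not mergeable.
    print(can_merge((1000, 2000), (1500, 3000)))
```

Taking the minimum of the two fractions means a small SV nested inside a much larger one is not merged, since the overlap covers only a small share of the larger interval.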
-no-anno Genotype without annotating
Skip variant annotation with the `-no-anno` flag. By default, SV2 will annotate each variant.
-pre PATH Preprocessing output directory. Skips preprocessing
Users can skip preprocessing by passing the path of the `sv2_preprocessing/` directory to the `-pre` argument. This instructs SV2 to load the values in `sv2_preprocessing/` instead of running the preprocessing step.
Skipping preprocessing is useful when genotyping a different set of variants. Example
-feats PATH Feature output directory. Skips feature extraction
Passing the path of the `sv2_features/` directory to the `-feats` argument will skip feature extraction.
Skipping feature extraction does not require BAM or SNV files. Additionally, multiple samples can be passed to square off a genotype matrix.
Skipping feature extraction example
-load-clf PATH Add custom classifiers. `-load-clf <clf.JSON>`
SV2 can incorporate new classifiers for genotyping. Packaged with SV2 is a guide on training new classifiers. The output of this guide is a JSON file containing paths to the new classifier.
Pass the JSON file to the `-load-clf` argument to add more classifiers to SV2.
Details for training
-clf STR Specify classifiers for genotyping [default]
After loading a new classifier, specify the name of the classifier in the `-clf` argument to genotype variants with that classifier. The original classifier from SV2 is named `default`, and this is the default classifier.
Download the required resource files
$ sv2 -download
Follow the instructions when prompted. You will be asked to download a zipped file of about 250 MB. It contains files SV2 uses for filtering and annotation. By default, these files are placed in the SV2 install location.
Before genotyping, users have to supply the full path to FASTA files for SV2. At least one FASTA file is required for SV2 to run. Configuration only needs to be run once, and again only if the FASTA paths change.
-hg19 PATH hg19 FASTA file
`-hg19` takes the full path to a faidx-indexed FASTA file for the hg19 (GRCh37) reference build.
-hg38 PATH hg38 FASTA file
`-hg38` takes the full path to a faidx-indexed FASTA file for the hg38 (GRCh38) reference build.
-mm10 PATH mm10 FASTA file
`-mm10` takes the full path to a faidx-indexed FASTA file for the mm10 reference build.
-L, -log PATH log file for standard error messages
Error messages and warnings are printed to a log file. By default, the log file is written to $WORKING_DIR/sv2.err
-T, -tmp-dir PATH directory for temporary files
SV2 generates temporary files that are placed by default in $WORKING_DIR/sv2_tmp/
-s, -seed INT Random seed for genome shuffling in preprocessing [42]
During preprocessing, SV2 randomly selects reads from each chromosome to generate basic alignment statistics. The random seed defaults to 42.
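The practical effect of a fixed seed is that the random selection is repeatable between runs. The sketch below shows the general idea with Python's standard `random` module; it does not reproduce SV2's actual sampling, and `sample_positions` and the chromosome length used are hypothetical.

```python
import random


def sample_positions(chrom_length: int, n: int, seed: int = 42) -> list:
    """Draw n distinct positions from a chromosome, repeatably for a given seed."""
    rng = random.Random(seed)  # private generator, leaves global state untouched
    return sorted(rng.sample(range(chrom_length), n))


if __name__ == "__main__":
    first = sample_positions(1_000_000, 5)
    second = sample_positions(1_000_000, 5)
    # Same seed, same draw: the two lists are identical.
    print(first == second)
```

Changing the seed changes which positions are drawn, which is useful for checking that downstream statistics are not sensitive to one particular sample.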
-O, -odir PATH output path, location for sv2 output directories [default: working directory]
Path to SV2 output directories. Default is current working directory.
-o, -out STR Output name
Prefix for the output files in sv2_genotypes/