Fritz Sedlazeck edited this page Jul 6, 2021 · 7 revisions

Sniffles provides two output formats: VCF and BEDPE. This page describes each format and the information it contains.

VCF format

The VCF format is a standard format for SNPs that was recently extended to structural variations (SVs). It is a tab-delimited file with the following fields for each SV:

chromosome The chromosome name where the SVs occurred
position Starting breakpoint position of the SVs.
ID The ID of the SV. SVs with the same ID belong together; they are indicated by _X, where X is an increasing number.
Ref The reference sequence; unless otherwise specified, only an N.
Alt The type of the SV. Sniffles reports deletions (DEL), duplications (DUP), insertions (INS), inversions (INV) and translocations (TRA) as the standard types, and additionally inverted duplications (INVDUP). When the type of an SV is not certain, Sniffles reports a combined type, e.g. DEL/INV. This field can also contain just the sequence representing the insertion or deletion.
Quality This is currently not indicated.
Filter Currently always set to PASS or UNRESOLVED. For insertions, UNRESOLVED means that the length could not be resolved.
Info Provides a list of information (see below)
FORMAT Describes the entries of the sample information column that follows.
Sample information Depending on how Sniffles was run, colon-separated: genotype estimate, reads supporting the reference, reads supporting the variant.
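
The column layout above can be read with a few lines of Python. The following is a minimal sketch, not part of Sniffles: the function name `parse_vcf_record`, the dictionary keys and the example record are made up for illustration, and the FORMAT keys `GT:DR:DV` are an assumption about what a given run emits. Because the sample values are paired with whatever keys the FORMAT column lists, the sketch does not depend on those exact names.

```python
# Column names follow the field list above; the lowercase keys are our own choice.
VCF_COLUMNS = ["chrom", "pos", "id", "ref", "alt", "qual", "filter", "info", "format"]


def parse_vcf_record(line):
    """Split one tab-delimited VCF data line into a dictionary (illustrative helper)."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_COLUMNS, fields[:9]))
    record["pos"] = int(record["pos"])
    # FORMAT names the colon-separated entries of the sample column that follows it.
    keys = record["format"].split(":")
    values = fields[9].split(":")
    record["sample"] = dict(zip(keys, values))
    return record


# Hypothetical record, invented for this example (not real Sniffles output).
example_line = "\t".join([
    "chr1", "10000", "1", "N", "<DEL>", ".", "PASS",
    "PRECISE;SVTYPE=DEL;CHR2=chr1;END=10500;RE=12",
    "GT:DR:DV", "0/1:8:12",
])

record = parse_vcf_record(example_line)
print(record["chrom"], record["pos"], record["alt"], record["filter"])
print(record["sample"])
```

Header lines (starting with `#`) would need to be skipped before calling the helper. For production use, an established VCF parser is likely a better choice than hand-rolled splitting.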

Info field description

Sniffles reports several pieces of information in the Info field. The entries are delimited by ;.

IMPRECISE/PRECISE Indicates the confidence of the exact breakpoint positions (bp).
CHR2= The chromosome of the second breakpoint of the SV reported.
END= The position (bp) of the second breakpoint of the SV reported.
ZMW= For PacBio reads, the number of ZMWs (zero-mode waveguides) that support the SV.
SVTYPE= The type of the SV. (see Alt field above)
SUPTYPE= Indicates what evidence supports the SVs (SR: Split Reads, AL: Alignment, NR: Noisy Region).
STD_quant_start= The standard deviation of the start breakpoints.
STD_quant_stop= The standard deviation of the stop breakpoints.
RNAMES= A comma-separated list of the names of the reads that support the SV event. Controlled by the -n parameter.
SVLEN= Indicates the length of SVs.
STRANDS= Strand information at both breakpoints.
RE= Number of reads supporting the variant.
AF= Allele frequency (only if run with --genotype).
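
Since the Info entries are delimited by `;` and most take the form `KEY=value`, the field can be turned into a dictionary. The sketch below assumes that flag-style entries such as IMPRECISE/PRECISE carry no `=`; the helper name `parse_info` and the example string are invented for illustration.

```python
def parse_info(info):
    """Turn a ';'-delimited Info string into a dictionary (illustrative helper)."""
    parsed = {}
    for entry in info.split(";"):
        if not entry:
            continue  # tolerate a trailing ';'
        key, sep, value = entry.partition("=")
        # Entries without '=' (e.g. PRECISE) are stored as boolean flags.
        parsed[key] = value if sep else True
    return parsed


# Hypothetical Info string, invented for this example.
info = parse_info("IMPRECISE;SVTYPE=DEL;CHR2=chr1;END=10500;RE=2;RNAMES=read1,read2")

read_names = info["RNAMES"].split(",")  # RNAMES is itself comma-separated
support = int(info["RE"])               # values stay strings until converted
print(info["SVTYPE"], info["CHR2"], info["END"], support, read_names)
```

Values are kept as strings on purpose, so the caller decides which entries to convert to numbers.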

BEDPE format

The BEDPE format follows the BED file format and is also tab-delimited. Each line contains the following columns:

chromosome The chromosome name where the SVs occurred
start Leftmost position of the starting breakpoint of the SV
stop Rightmost position of the starting breakpoint of the SV
chromosome The chromosome name of the end breakpoint
start Leftmost position of the end breakpoint of the SV
stop Rightmost position of the end breakpoint of the SV
variant_name/ID ID of the SVs
score Score of the variant, currently not supported (-1).
strand1 Strand information of the starting breakpoint.
strand2 Strand information of the end breakpoint.
type Type of the SV (see the Alt field in the VCF section above for further explanation)
number_of_reads Number of reads supporting the SVs
best_chr1 Chromosome of the start position
best_start Sniffles' best estimate of the exact start position
best_chr2 Chromosome of the end position
best_stop Sniffles' best estimate of the exact stop position
predicted_length Estimated length of the SV
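
The column order listed above can be mapped onto named fields as follows. This is a sketch under assumptions: the field names `chrom1`, `start1`, and so on are our own labels for the columns (the two chromosome/start/stop triplets share names in the list above), the numeric columns are assumed to hold integers, and the example line is made up.

```python
from collections import namedtuple

# One name per column, in the order of the list above (names are illustrative).
BedpeRecord = namedtuple("BedpeRecord", [
    "chrom1", "start1", "stop1",
    "chrom2", "start2", "stop2",
    "variant_id", "score", "strand1", "strand2",
    "type", "number_of_reads",
    "best_chr1", "best_start", "best_chr2", "best_stop",
    "predicted_length",
])

# Columns assumed to hold integers.
INT_FIELDS = ("start1", "stop1", "start2", "stop2", "score",
              "number_of_reads", "best_start", "best_stop", "predicted_length")


def parse_bedpe_line(line):
    """Parse one tab-delimited BEDPE line into a BedpeRecord (illustrative helper)."""
    values = line.rstrip("\n").split("\t")
    width = len(BedpeRecord._fields)
    if len(values) < width:
        raise ValueError("expected at least %d columns, got %d" % (width, len(values)))
    record = BedpeRecord(*values[:width])
    # Convert the numeric columns from strings to integers.
    return record._replace(**{name: int(getattr(record, name)) for name in INT_FIELDS})


# Hypothetical line, invented for this example.
bedpe_line = "\t".join([
    "chr1", "9990", "10010", "chr1", "10490", "10510",
    "1", "-1", "+", "-", "DEL", "12",
    "chr1", "10000", "chr1", "10500", "500",
])

bedpe = parse_bedpe_line(bedpe_line)
print(bedpe.type, bedpe.best_chr1, bedpe.best_start, bedpe.best_stop, bedpe.predicted_length)
```

A namedtuple keeps the positional order of the file while allowing access by name, which helps because several columns have similar meanings.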