
VCF Requirements

Mark Woon edited this page Oct 6, 2021 · 11 revisions

PharmCAT expects the incoming VCF files to follow the official VCF spec.

In addition, PharmCAT expects incoming VCF files to have the following properties:

  1. The data must be aligned to the GRCh38 assembly (aka b38, hg38, etc.).
  2. Any position not in the input VCF is assumed to be a "no call"; missing positions will not be interpreted as reference. You must include every position in the input VCF that you want PharmCAT to consider.
  3. Variants must use a parsimonious, left-aligned representation.
  4. Insertions and deletions must be normalized to the expected representation.
  5. The CHROM field must be in the format chr## (e.g. chr10).
  6. The QUAL and FILTER columns are not interpreted. It is up to the user to remove data that does not meet quality criteria before passing it to PharmCAT.
  7. The file should contain data for a single sample. If it is a multi-sample VCF file, only the first sample is used.
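Some of these properties can be checked mechanically before running PharmCAT. The following Python sketch uses a hypothetical helper, check_vcf_lines (it is not part of PharmCAT), that scans the lines of an uncompressed VCF and flags a header with more than one sample, CHROM values without the chr prefix, and records whose FILTER value is neither PASS nor "." (a reminder that PharmCAT will not filter them for you). It cannot verify the genome build, the variant representation, or whether all positions of interest are present.

```python
def check_vcf_lines(lines):
    """Return human-readable problems found in the lines of a VCF.

    Hypothetical helper: covers only the single-sample, CHROM-format and
    FILTER requirements, not the genome build or normalization.
    """
    problems = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Columns after FORMAT (index 8) are sample names.
            samples = fields[9:]
            if len(samples) != 1:
                problems.append(f"line {line_no}: {len(samples)} samples found; "
                                "PharmCAT only uses the first one")
            continue
        chrom = fields[0]
        if not chrom.startswith("chr"):
            problems.append(f"line {line_no}: CHROM '{chrom}' should use the chr## format")
        # FILTER is column 7 (index 6); PharmCAT does not interpret it.
        if len(fields) > 6 and fields[6] not in ("PASS", "."):
            problems.append(f"line {line_no}: FILTER is '{fields[6]}'; "
                            "remove failing records yourself")
    return problems
```

Passing an open file object works as well as a list of strings, since iterating over a file yields its lines.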

Variant Representation Format

To avoid ambiguity in variant representation, PharmCAT uses a parsimonious, left-aligned variant representation format (as discussed in Unified Representation of Genetic Variants by Tan, Abecasis, and Kang).
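To make "parsimonious, left-aligned" concrete, here is a minimal Python sketch of the normalization procedure described in that paper: repeatedly trim a shared last base and, whenever an allele becomes empty, extend all alleles one base to the left; then trim shared leading bases while every allele keeps at least one base. It assumes the reference is available as an in-memory string (ref_seq, with position 1 at index 0) and that REF differs from every ALT; the function name normalize is illustrative, not a PharmCAT API.

```python
def normalize(pos, ref, alts, ref_seq):
    """Left-align and trim a variant (after Tan, Abecasis & Kang).

    pos is 1-based; ref_seq[0] is reference position 1. REF is assumed to
    match ref_seq at pos and to differ from every ALT allele.
    """
    alleles = [ref] + list(alts)
    changed = True
    while changed:
        changed = False
        # Trim a shared last base from every allele.
        if all(alleles) and len({a[-1] for a in alleles}) == 1:
            alleles = [a[:-1] for a in alleles]
            changed = True
        # If any allele is now empty, extend all alleles one base to the left.
        if not all(alleles):
            if pos == 1:
                raise ValueError("cannot extend past the start of the reference")
            pos -= 1
            base = ref_seq[pos - 1]
            alleles = [base + a for a in alleles]
            changed = True
    # Trim shared leading bases while every allele keeps at least one base.
    while all(len(a) >= 2 for a in alleles) and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1
    return pos, alleles[0], alleles[1:]
```

For example, with the toy reference "GGCACACAT", the right-shifted deletion (6, "ACA", ["A"]) normalizes to (2, "GCA", ["G"]). In practice, whole files are usually normalized with a tool such as bcftools norm run against a GRCh38 reference FASTA; the sketch only illustrates what the target representation looks like.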

Insertions & Deletions

Deletions

PharmCAT expects deletions to include an "anchoring" base: the reference base immediately preceding the deleted sequence is the first base of REF, and that same base also appears in ALT. For example, the following shows a deletion of AGAAATGGAA:

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	SAMPLE
chr10	94942212	.	AAGAAATGGAA	A	.	PASS	desired-deletion-format	GT	0/1

as opposed to the unwanted format:

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	SAMPLE
chr10	94942212	.	AGAAATGGAA	.	.	PASS	do-not-want	GT	0/1

If REF is a single base (meaning no variant was found), it is safe to replace it with the appropriate nucleotide string.
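The following Python sketch shows how a deletion known only by its first deleted position and its deleted bases could be rewritten into the anchored form above. The function name anchored_deletion and the in-memory ref_seq string (position 1 at index 0) are illustrative assumptions; a real pipeline would read bases from a GRCh38 FASTA file. The sketch adds the anchor base but does not left-align the result.

```python
def anchored_deletion(first_deleted_pos, deleted_seq, ref_seq):
    """Return (POS, REF, ALT) for a deletion with a leading anchor base.

    first_deleted_pos is the 1-based position of the first deleted base;
    ref_seq[0] is reference position 1 (hypothetical in-memory reference).
    """
    if first_deleted_pos < 2:
        raise ValueError("no reference base available to anchor the deletion")
    start = first_deleted_pos - 1  # 0-based index of the first deleted base
    # Sanity check: the deleted bases must match the reference.
    if ref_seq[start:start + len(deleted_seq)] != deleted_seq:
        raise ValueError("deleted sequence does not match the reference")
    anchor_pos = first_deleted_pos - 1
    anchor = ref_seq[anchor_pos - 1]  # base immediately before the deletion
    return anchor_pos, anchor + deleted_seq, anchor
```

With a toy reference "CAAGAAATGGAAT" (not the real chr10 sequence), anchored_deletion(3, "AGAAATGGAA", ...) yields REF AAGAAATGGAA and ALT A, matching the shape of the desired row above.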

Insertions

Similarly, PharmCAT expects insertions to include the anchoring reference base at the start of both REF and ALT (e.g. REF="A", ALT="ATCT"). For example, here is an insertion of A:

#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	SAMPLE
chr7	99652770	rs41303343	T	TA	.	PASS	desired-insertion-format	GT	0/1
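A minimal Python sketch of the insertion case, plus a quick check that a REF/ALT pair has the anchored shape described for both insertions and deletions. Both function names are hypothetical, and ref_seq is again an assumed in-memory reference string with position 1 at index 0. The check only looks at shape (different lengths, shared first base); it does not confirm that the representation is left-aligned.

```python
def anchored_insertion(anchor_pos, inserted_seq, ref_seq):
    """Return (POS, REF, ALT) for bases inserted right after anchor_pos (1-based)."""
    anchor = ref_seq[anchor_pos - 1]  # reference base the insertion hangs off
    return anchor_pos, anchor, anchor + inserted_seq


def is_anchored_indel(ref, alt):
    """True if REF/ALT look like an anchored indel: both present,
    different lengths, and sharing the same first (anchor) base."""
    if ref in ("", ".") or alt in ("", "."):
        return False
    return len(ref) != len(alt) and ref[0] == alt[0]
```

Applied to the rows on this page, is_anchored_indel accepts T/TA and AAGAAATGGAA/A, and rejects the unwanted AGAAATGGAA/"." form.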

More Information

Every PharmCAT release includes a pharmcat_positions.vcf file that contains all positions of interest to PharmCAT.
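Because positions absent from the input are treated as no calls, it can help to compare your VCF against this file before running PharmCAT. The Python sketch below is a hypothetical helper (missing_positions is not a PharmCAT tool) that compares CHROM/POS pairs only, ignoring REF/ALT, and assumes both files are uncompressed, plain-text VCFs.

```python
def positions(lines):
    """Collect (CHROM, POS) pairs from the data lines of a VCF."""
    result = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta-information and header lines
        chrom, pos = line.split("\t", 2)[:2]
        result.add((chrom, int(pos)))
    return result


def missing_positions(reference_lines, sample_lines):
    """Positions listed in the reference VCF but absent from the sample VCF.

    PharmCAT would treat these as "no call", not as reference.
    """
    return sorted(positions(reference_lines) - positions(sample_lines))
```

Both arguments can be open file objects, for example the release's pharmcat_positions.vcf and your own (hypothetically named) sample.vcf; any positions returned are ones you would need to add to your VCF if you want PharmCAT to consider them.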

For more details on meeting these requirements, read the Preparing VCF Files page.

See Preprocessing VCF Files for PharmCAT for a script to automate some of these steps.