
Amplicon SARS-CoV-2 genotyping

Hannes Pétur Eggertsson edited this page Apr 21, 2020 · 2 revisions

Here I list my currently suggested pipeline for genotyping SARS-CoV-2 amplicon sequencing data with graphtyper.

Pipeline

Get the latest graphtyper binary (v2.3+ is required) and run:

./graphtyper genotype "${COVID_REF}" --sams="${BAMLIST}" --region=NC_045512.2 --advanced \
  --no_filter_on_read_bias --no_filter_on_strand_bias --no_filter_on_coverage \
  --impurity_threshold=1.0 --primer_bedpe="${PRIMERS_BEDPE}" \
  --genotype_aln_min_support_ratio=0.6 --genotype_aln_min_support=4 \
  --is_discovery_only_for_paired_reads --no_filter_on_begin_pos --is_only_cigar_discovery \
  --no_asterisks --is_all_biallelic

where ${COVID_REF} is a FASTA reference genome (I use https://www.ncbi.nlm.nih.gov/assembly/GCF_009858895.2/), ${BAMLIST} is a text file listing the SAM/BAM/CRAM files to genotype, one path per line, and ${PRIMERS_BEDPE} is a BEDPE-formatted file containing all pairs of primers used during sequencing.
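A minimal sketch of preparing the BAM list and the primer BEDPE file with POSIX shell and awk. The helper names write_bam_list and primer_bed_to_bedpe are hypothetical, and the sketch assumes your primers come as a BED file whose fourth column holds names of the form <amplicon>_LEFT and <amplicon>_RIGHT. The output column order (left primer chrom/start/end, right primer chrom/start/end, amplicon name, following the standard BEDPE layout) is also an assumption; check it against what graphtyper's --primer_bedpe option actually expects before relying on it.

```shell
# Hypothetical helpers for preparing graphtyper inputs (a sketch, not part of graphtyper).

# Write each remaining argument (a SAM/BAM/CRAM path) on its own line to the file named by $1.
write_bam_list() {
  list_file=$1
  shift
  printf '%s\n' "$@" > "$list_file"
}

# Pair primers from a BED file into BEDPE lines.
# ASSUMPTION: column 4 is the primer name, formatted <amplicon>_LEFT or <amplicon>_RIGHT.
# Output columns: chrom, start, end of the left primer; chrom, start, end of the right
# primer; amplicon name. Amplicons missing either primer are skipped.
primer_bed_to_bedpe() {
  awk 'BEGIN { OFS = "\t" }
    $4 ~ /_LEFT$/ {
      id = $4; sub(/_LEFT$/, "", id)
      if (!(id in left)) order[++n] = id   # remember first-seen order of amplicons
      left[id] = $1 OFS $2 OFS $3
    }
    $4 ~ /_RIGHT$/ {
      id = $4; sub(/_RIGHT$/, "", id)
      right[id] = $1 OFS $2 OFS $3
    }
    END {
      for (i = 1; i <= n; i++) {
        id = order[i]
        if (id in right) print left[id], right[id], id
      }
    }' "$@"
}
```

For example, write_bam_list bamlist.txt sample1.bam sample2.bam writes the BAM list and primer_bed_to_bedpe primers.bed > primers.bedpe writes the primer pairs (file names are placeholders). Primer names with any other suffix, such as alternative primers, do not match either pattern and are dropped, so compare the output against your primer scheme.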

Additionally, if you know of specific variants that graphtyper does not discover, you can try adding them as priors with "--prior_vcf=prior.vcf.gz" (the VCF must be bgzipped and tabix-indexed).
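A prior VCF can be compressed and indexed with bgzip and tabix from htslib. This is a minimal sketch, assuming both tools are on your PATH and that the input VCF is already sorted by position, since tabix refuses unsorted input; the function name prepare_prior_vcf and the file names are hypothetical.

```shell
# Compress a plain-text VCF with bgzip and build a tabix index (.tbi) next to it.
# ASSUMPTION: $1 is sorted by chromosome and position; tabix fails on unsorted input.
prepare_prior_vcf() {
  in_vcf=$1    # e.g. prior.vcf (plain text)
  out_vcf=$2   # e.g. prior.vcf.gz
  bgzip -c "$in_vcf" > "$out_vcf" && tabix -f -p vcf "$out_vcf"
}
```

Running prepare_prior_vcf prior.vcf prior.vcf.gz produces prior.vcf.gz and prior.vcf.gz.tbi, after which "--prior_vcf=prior.vcf.gz" can be appended to the genotype command.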

Limitations

  • primers are assumed to match the reference sequence you use.
  • graphtyper makes diploid calls in the GT and PL fields, but you can obtain haploid calls by applying a threshold of your choice to the AD field.
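Because the GT calls are diploid, haploid calls have to be derived from the AD (allelic depth) field. This is a minimal sketch, assuming every record is biallelic (as requested by --is_all_biallelic), so that AD holds exactly two values: reference depth, then alternative depth. The function name haploid_calls_from_ad and the example threshold of 0.8 are hypothetical choices, not graphtyper defaults. A sample is called 1 if the alternative allele makes up at least the threshold fraction of its total AD, 0 if the reference allele does, and . otherwise (including missing or zero depth).

```shell
# Print CHROM, POS, REF, ALT, sample name and a haploid call (0, 1 or .) per sample.
# $1 = minimum allele fraction (hypothetical example: 0.8); further args = VCF files,
# or the VCF is read from stdin when none are given.
# ASSUMPTION: records are biallelic, so AD is "ref_depth,alt_depth".
haploid_calls_from_ad() {
  min_frac=$1
  shift
  awk -v thr="$min_frac" 'BEGIN { FS = OFS = "\t"; thr += 0 }
    /^##/ { next }                                    # skip meta-information lines
    /^#CHROM/ { for (i = 10; i <= NF; i++) name[i] = $i; next }
    {
      ad = 0
      nf = split($9, fmt, ":")                        # locate AD within FORMAT
      for (k = 1; k <= nf; k++) if (fmt[k] == "AD") ad = k
      if (ad == 0) next
      for (i = 10; i <= NF; i++) {
        split($i, vals, ":")
        call = "."
        if (split(vals[ad], depth, ",") == 2 && depth[1] ~ /^[0-9]+$/ && depth[2] ~ /^[0-9]+$/) {
          total = depth[1] + depth[2]
          if (total > 0 && depth[2] / total >= thr) call = "1"
          else if (total > 0 && depth[1] / total >= thr) call = "0"
        }
        print $1, $2, $4, $5, name[i], call
      }
    }' "$@"
}
```

For a bgzipped output VCF, something like gzip -dc your_output.vcf.gz | haploid_calls_from_ad 0.8 would print one tab-separated line per sample and site (the file name is a placeholder). Raising the threshold leaves more sites uncalled in mixed or low-quality samples; lowering it toward 0.5 forces a call at more sites.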