Very high false positive rate #68

dancooke · 2018-07-02T12:37:42Z

Hi,

I'm currently evaluating VarDict on a synthetic tumour dataset similar to the ICGC DREAM set. I'm using VarDict version 1.5.2 with the following commands:

Call variants:

$ VarDict \
    -G hs37d5.fa \
    -f 0.01 \
    -N NA12878.TUMOUR \
    -b "NA12878.TUMOUR.60x.skin.bwa-mem.b37.bam|NA12878.NORMAL.30x.bwa.b37.bam" \
    -z 0 -c 1 -S 2 -E 3 \
    hs37d5.chromosomes.bed \
    | testsomatic.R \
    | var2vcf_paired.pl \
      -N "NA12878.TUMOUR|NA12878.NORMAL" \
      -f 0.01 \
      | bgzip \
      > vardict.NA12878.syntumour.skin.b37.bwa.vcf.gz
$ tabix vardict.NA12878.syntumour.skin.b37.bwa.vcf.gz

Filter somatics:

$ bcftools view \
    -i 'INFO/STATUS="StrongSomatic"' \
    -f PASS \
    -Oz -o vardict.NA12878.syntumour.skin.b37.bwa.somatic.PASS.vcf.gz \
    vardict.NA12878.syntumour.skin.b37.bwa.vcf.gz
$ tabix vardict.NA12878.syntumour.skin.b37.bwa.somatic.PASS.vcf.gz

Although VarDict achieves high sensitivity, the false positive rate is very high (FDR 0.25 compared with < 0.02 for all other callers tested). Am I doing something wrong?

Thanks,
Dan
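
For readers comparing numbers: the FDR quoted above is presumably the usual false discovery rate, FP / (TP + FP), computed on the PASS somatic calls against the truth set. That definition is an assumption, since the thread does not spell it out. A minimal sketch of the arithmetic follows; the `fdr` helper and the counts are hypothetical and only chosen to reproduce the two ratios mentioned.

```shell
# fdr TP FP: print FP / (TP + FP) to four decimal places.
# Prints "NA" when there are no calls at all, to avoid dividing by zero.
fdr() {
    awk -v tp="$1" -v fp="$2" 'BEGIN {
        if (tp + fp == 0) { print "NA"; exit }
        printf "%.4f\n", fp / (tp + fp)
    }'
}

# Hypothetical counts, not results from this thread.
fdr 7500 2500   # prints 0.2500
fdr 9800 200    # prints 0.0200
```

Note that this is not the same quantity as the false positive rate in the issue title (FP / (FP + TN)), which is rarely reported for variant calling because the number of true negatives is enormous.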

chapmanb · 2018-07-11T15:03:11Z

Dan;
Thanks much for starting this discussion. More validation sets are always welcome and it's great you're working on this. Are you able to share the truth sets and outcomes you're seeing? We're actively working on benchmarking VarDict on harder inputs (tumor-only, FFPE, low frequency) with truth sets:

https://github.com/bcbio/bcbio_validation_workflows#somatic-low-frequency-variants

and here are the current in-progress validations:

https://github.com/bcbio/bcbio_validations/tree/master/somatic-lowfreq

It would be useful to see how your results compare.

In practice, we apply additional filters after calling, which help improve specificity. The gory details are here:

https://github.com/bcbio/bcbio-nextgen/blob/ed98597efe7d18ff684a8ec64bd45bd39b647bfd/bcbio/variation/vardict.py#L270

and we wrote up some details about the filters here:

http://bcb.io/2016/04/04/vardict-filtering/

Looking forward to coordinating more to help improve your specificity.
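
As a rough illustration of what a post-calling filter of this kind can look like outside bcbio, here is a sketch using `bcftools filter`. It is not the bcbio filter: the thresholds, the `LowSupport` filter name and the `soft_filter_low_support` helper are all hypothetical, and it assumes the tumour is the first sample and that `VD` (variant depth) and `MQ` (mean mapping quality) are per-sample FORMAT fields in the `var2vcf_paired.pl` output. Check the VCF header before relying on those names, and see the linked source for the real expressions.

```shell
# Hypothetical thresholds for illustration only; NOT the values bcbio uses.
MIN_ALT_READS=6
MIN_MAPQ=55

# soft_filter_low_support IN.vcf.gz OUT.vcf.gz
# Tags (rather than drops) calls whose tumour sample has both few
# variant-supporting reads and low mean mapping quality.
# Assumes sample index 0 is the tumour and VD/MQ are FORMAT fields.
soft_filter_low_support() {
    bcftools filter \
        -e "FMT/VD[0] < ${MIN_ALT_READS} && FMT/MQ[0] < ${MIN_MAPQ}" \
        -s LowSupport \
        -m + \
        -Oz -o "$2" \
        "$1"
}

# Example (file names are placeholders):
#   soft_filter_low_support calls.vcf.gz calls.filtered.vcf.gz
```

Because `-s` together with `-m +` only appends a FILTER label, the tagged records remain in the file, and a later `bcftools view -f PASS` step like the one in the original post would then exclude them.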

dancooke · 2018-07-11T16:00:18Z

Thanks Brad, I'll have a go with these additional filters. I'm not able to release our validation set yet. Hopefully this will be forthcoming soon, along with the Octopus paper.

chapmanb · 2018-07-11T17:09:46Z

Dan -- sounds good, please let me know if you run into any problems. I know the filtering code is not easy to run outside of bcbio.

Is it worth running octopus on the tumor-only low frequency samples I mentioned above to provide a baseline comparison? The docs mention tumor/normal as preferred, but I'm not sure whether you'd expect decent calls for tumor-only.

Looking forward to the octopus paper and working more on the validation set you're using when it's available.

dancooke · 2018-07-11T17:36:39Z

@chapmanb I'm considering looking at tumour-only; however, tumour-normal is definitely the priority. The main issue I have with tumour-only is to what extent somatic status is relevant. From the little testing I've done with tumour-only, octopus will happily call somatic variants, but often misclassifies them as germline (especially if the VAF isn't far from 50%). I wouldn't be surprised if this is the case for any tumour-only caller. The question then is how important this actually is; from a clinical perspective it may not be relevant at all. How to deal with this when composing a validation framework is not obvious to me.

chapmanb · 2018-07-12T15:05:53Z

Dan -- thanks for this. Misclassifying germline as somatic is definitely a problem across callers, and is a limitation of not having the normal for effectively doing this. I'll definitely work on including octopus in the comparisons as an additional data point you can use for octopus versus VarDict in terms of specificity (at least for these harder cases). Thanks again.

chapmanb · 2018-07-18T12:29:30Z

Dan;
Thanks again for the suggestions about Octopus tumor-only. To provide an additional data point for this discussion with the 0.5% validations we've been doing, I included octopus in bcbio and added it to the comparisons. Unfortunately I'm not seeing much detection at this low frequency:

https://github.com/bcbio/bcbio_validations/tree/master/somatic-lowfreq#low-frequency-umi-tagged-tumor-only-samples

so I probably need to tweak additional parameters for this type of high depth low frequency detection. Here's the implementation right now, which is pretty vanilla other than changing --min-credible-somatic-frequency:

https://github.com/bcbio/bcbio-nextgen/blob/master/bcbio/variation/octopus.py#L68

I'd be happy to tweak and improve to get octopus running on par with the other callers and provide a fair comparison. Thanks again for all this helpful discussion.

dancooke · 2018-07-18T16:11:48Z

Thanks for the feedback, Brad. I've opened an issue on Octopus' GitHub page regarding this; perhaps we should continue this discussion there?

dancooke mentioned this issue Jul 18, 2018

Poor sensitivity for somatic mutations with VAF below 1% luntergroup/octopus#29

Closed

ahwanpandey mentioned this issue Nov 22, 2018

VarDictJava 1.5.5 and 1.5.6: testsomatic.R (line X did not have 55 elements) AstraZeneca-NGS/VarDictJava#151

Closed

dancooke mentioned this issue Aug 24, 2019

I have true sets, which vardict call high FP in tumor-only luntergroup/octopus#76

Closed
