How does DNAscope improve variant candidate detection? #10

Open
leedchou opened this issue Aug 31, 2023 · 5 comments

Comments

@leedchou

leedchou commented Aug 31, 2023

Hi @DonFreed

I found that VCFs called by DNAscope contain fewer FNs, which is impressive. Running GATK HaplotypeCaller with sensitive settings did not perform as well as DNAscope in terms of FN variants.

So can I conclude that DNAscope is not just a re-implementation of GATK HaplotypeCaller with its parameters set for high sensitivity?

@DonFreed
Contributor

We have a preprint describing DNAscope's methodology at https://www.biorxiv.org/content/10.1101/2022.05.20.492556v1. DNAscope is very similar to Sentieon's Haplotyper (Sentieon's implementation of GATK's HaplotypeCaller), but has three major improvements:

  • An improved approach for active region detection
  • An improved local assembler implementation
  • ML-based genotyping and variant filtering

The improved local assembler allows DNAscope to call variants that Haplotyper cannot call due to limitations of its local assembler. The ML-based genotyping can also correct genotype calls that the Bayesian statistical model gets wrong, which further helps reduce FNs.

Besides the assembly and genotyping improvements, DNAscope performs a very sensitive first pass of variant calling followed by a filtering step. This also helps to greatly reduce FNs.
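
As a rough illustration of the "sensitive first pass, then filter" idea, here is a toy sketch in Python. It is not DNAscope's implementation: the `Candidate` record, the `alt_fraction` and `model_score` fields, and both thresholds are hypothetical and chosen only to show the control flow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate variant record (not a Sentieon data structure)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    alt_fraction: float  # assumed feature: fraction of reads supporting the alt allele
    model_score: float   # assumed feature: score in [0, 1] from some trained model


def sensitive_first_pass(candidates, min_alt_fraction=0.05):
    """Keep anything with even weak evidence, so true variants are rarely lost."""
    return [c for c in candidates if c.alt_fraction >= min_alt_fraction]


def filter_pass(candidates, min_score=0.5):
    """Remove candidates the (hypothetical) model scores as likely artifacts."""
    return [c for c in candidates if c.model_score >= min_score]


candidates = [
    Candidate("chr1", 100, "A", "G", 0.48, 0.99),  # well supported
    Candidate("chr1", 250, "C", "T", 0.08, 0.91),  # weak support, high model score
    Candidate("chr1", 400, "G", "A", 0.07, 0.10),  # weak support, low model score
    Candidate("chr1", 900, "T", "C", 0.01, 0.05),  # below even the sensitive threshold
]

first_pass = sensitive_first_pass(candidates)
final_calls = filter_pass(first_pass)

for call in final_calls:
    print(call.chrom, call.pos, call.ref, call.alt)
```

In this toy, the weakly supported candidate at position 250 survives only because the first pass is permissive; a stricter first-pass threshold would have turned it into an FN before the filter ever saw it.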

Best,
Don

@leedchou
Author

leedchou commented Sep 1, 2023

Thanks for your kind reply.

I do have a follow-up question regarding the three major improvements. Have you tested the performance of these components individually? Could you please share some statistical results on that, if possible?

Best,
Leed

@DonFreed
Contributor

DonFreed commented Sep 6, 2023

We've studied the impact of the improved local assembler quite extensively. The new local assembler rescues some variants that were not previously called due to errors in the assembly. However, the new assembler also calls some additional false-positive variants, mostly arising from chimeric/spurious alignments. The combined effect is that the new assembler has higher recall but lower precision (relative to the original assembler) when compared to the GIAB truthset.
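
For readers less familiar with the metrics, the recall/precision trade-off can be sketched with a small Python example. The variant sets below are invented purely for illustration and are not Sentieon's results; the `benchmark` helper is hypothetical, and exact-match comparison of `(chrom, pos, ref, alt)` tuples is a simplification, since real benchmarking tools also reconcile different representations of the same variant.

```python
def benchmark(calls, truth):
    """Compare a call set to a truth set by exact (chrom, pos, ref, alt) match."""
    tp = len(calls & truth)   # called and in the truth set
    fp = len(calls - truth)   # called but not in the truth set
    fn = len(truth - calls)   # in the truth set but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}


# Toy truth set (invented for illustration).
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr1", 400, "G", "A"),
    ("chr2", 50, "T", "C"),
    ("chr2", 75, "G", "GA"),
}

# Toy "original assembler" calls: misses two true variants, no false positives.
original_calls = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr1", 400, "G", "A"),
}

# Toy "new assembler" calls: rescues one missed variant, adds one spurious call.
new_calls = original_calls | {
    ("chr2", 50, "T", "C"),   # rescued true variant
    ("chr3", 10, "A", "T"),   # spurious call, e.g. from a chimeric alignment
}

original_metrics = benchmark(original_calls, truth)
new_metrics = benchmark(new_calls, truth)
print("original:", original_metrics)
print("new:     ", new_metrics)
```

In this toy data the new call set has fewer FNs (higher recall) but one extra FP (lower precision), which is the pattern described above.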

The improved local assembler (improving sensitivity) and the ML model for genotyping/filtering (improving precision) are both necessary for obtaining the highest variant calling accuracy.

I can share some slides with some specific variants and overall metrics relative to the GIAB truthset through email, if you'd like to reach out to me at support@sentieon.com.

@leedchou
Author

leedchou commented Sep 8, 2023

It would be great if you could share the slides. Much appreciated!

@DonFreed
Contributor

DonFreed commented Sep 8, 2023

Slides are sent!
