Should we use the BAQ model when detecting variant sites? #106

Closed
biozzq opened this issue Sep 19, 2017 · 4 comments

Comments

@biozzq

biozzq commented Sep 19, 2017

Dear all,

To compare with GATK, I also called variants with ANGSD using the command below, which can report sites that are homozygous for the alternative allele. However, ANGSD excluded many sites that GATK called with high quality. When I ran ANGSD without the BAQ model, the variants detected by ANGSD and GATK overlapped to a high degree. So I would like to know when we should use BAQ when running ANGSD.
angsd -bam $list -only_proper_pairs 1 -uniqueOnly 1 -remove_bads 1 -minQ 20 -minMapQ 30 -C 50 -ref reference.fa -baq 1 -r $chr -out $chr -doMaf 1 -minInd $min -skipTriallelic 1 -doMajorMinor 4 -GL 1 -setMinDepth $mindepth -setMaxDepth $maxdepth -doCounts 1 -P 5 -doGlf 2 -SNP_pval 1e-6

Best
zhuqing

@claudiuskerth

Hi Zhuqing,

I think it is advisable to apply BAQ when using ANGSD. GATK no longer applies BAQ because it performs local re-assembly around candidate variable sites, which handles false-positive SNPs around indels better than BAQ does. Also note that ANGSD provides two versions of BAQ: standard with -baq 1 and extended with -baq 2, the latter being more permissive than the former. Some more details can be found in this thread: #97.
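
To see how the BAQ setting changes the call set, one option is to run the same command once per setting and compare the outputs. The following is only a sketch under stated assumptions: it reuses the command from the original post, expects the variables list, chr, min, mindepth and maxdepth to be set as in that post, assumes that -baq 0 means "BAQ off", and uses a made-up .baqN output prefix. It only prints the commands (dry run) rather than running ANGSD.

```shell
# Hypothetical helper: print the ANGSD command from the original post
# with a chosen BAQ setting (assumed: 0 = off, 1 = standard, 2 = extended).
angsd_cmd() {
    baq=$1
    # The ".baqN" output prefix is an illustrative naming choice.
    echo angsd -bam "$list" -only_proper_pairs 1 -uniqueOnly 1 -remove_bads 1 \
        -minQ 20 -minMapQ 30 -C 50 -ref reference.fa -baq "$baq" -r "$chr" \
        -out "${chr}.baq${baq}" -doMaf 1 -minInd "$min" -skipTriallelic 1 \
        -doMajorMinor 4 -GL 1 -setMinDepth "$mindepth" -setMaxDepth "$maxdepth" \
        -doCounts 1 -P 5 -doGlf 2 -SNP_pval 1e-6
}

# Dry run: print one command per BAQ setting.
for baq in 0 1 2; do
    angsd_cmd "$baq"
done
```

Once the printed commands look right, they can be piped to sh to actually run them, so that each setting writes to its own output prefix.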

claudius

@biozzq
Author

biozzq commented Nov 4, 2017

Dear @claudiuskerth

Many thanks. I think -baq 1 should improve the accuracy of SNP discovery more, but with reduced sensitivity compared with -baq 2. Is this right?

Best
Zhuqing

@claudiuskerth

Yes, Zhuqing, I think you are right. That's what I have read in forum posts by Heng Li (the author of BAQ). BAQ 2 (or "extended BAQ"), as far as I understand, should be more sensitive at the cost of accuracy: more detected SNPs, but also more false-positive SNPs compared to BAQ 1. I currently use BAQ 2 because my impression has been that BAQ 1 could lead to too many false negatives. I also get a much better-looking site frequency spectrum with BAQ 2 than with BAQ 1.
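
A rough way to put numbers on this trade-off is to count the sites shared between two call sets (for example BAQ 1 vs BAQ 2, or ANGSD vs GATK). The sketch below is an illustration, not part of ANGSD: site_overlap is a hypothetical helper that expects two plain-text files with one "chromosome<TAB>position" pair per line, and the file names in the comments are made up.

```shell
# Hypothetical helper: compare two site lists ("chrom<TAB>pos" per line)
# and report how many sites are shared or unique to each list.
site_overlap() {
    a_sorted=$(mktemp) || return 1
    b_sorted=$(mktemp) || return 1
    # comm needs sorted input; -u also drops duplicate sites.
    sort -u "$1" > "$a_sorted"
    sort -u "$2" > "$b_sorted"
    # $(( ... )) strips the padding some wc implementations add.
    shared=$(( $(comm -12 "$a_sorted" "$b_sorted" | wc -l) ))
    only_a=$(( $(comm -23 "$a_sorted" "$b_sorted" | wc -l) ))
    only_b=$(( $(comm -13 "$a_sorted" "$b_sorted" | wc -l) ))
    rm -f "$a_sorted" "$b_sorted"
    echo "shared=$shared only_a=$only_a only_b=$only_b"
}

# Possible ways to build the site lists (file names are assumptions):
#   ANGSD .mafs.gz (first two columns are chromo and position, after a header):
#     gzip -dc chr1.mafs.gz | awk 'NR > 1 { print $1 "\t" $2 }' > angsd.sites
#   VCF from GATK (skip header lines, keep CHROM and POS):
#     grep -v '^#' gatk.vcf | cut -f1,2 > gatk.sites
#   Then: site_overlap angsd.sites gatk.sites
```

This only compares positions, not alleles or genotypes, so it is a first check on how much two call sets agree rather than a full concordance analysis.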

claudius

@biozzq
Author

biozzq commented Nov 6, 2017

Dear @claudiuskerth
Thanks for your information.

Best
Zhuqing

biozzq closed this as completed Nov 6, 2017