bcftools norm -m +any gives incorrect AN value when SNP and INDEL entries are merged #2137

Open
Fan-iX opened this issue Mar 24, 2024 · 0 comments

When I join SNP and INDEL entries using bcftools norm -m +any, one of the AN ("Total number of alleles in called genotypes") values is discarded.

Here is a reproducible example:

1.vcf (merged from two vcf files using bcftools merge --no-index part1.vcf part2.vcf, see below)

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##contig=<ID=Chr1,length=100>
##ALT=<ID=*,Description="Represents allele(s) other than observed.">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Number of high-quality bases">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##bcftools_mergeVersion=1.19+htslib-1.19
##bcftools_mergeCommand=merge --no-index part1.vcf part2.vcf; Date=Sun Mar 24 15:29:30 2024
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	A001	A002	A003	A004	A005	A006	A007	A008	A009	A010	A011	A012	A013	C001	C002	C003	C004	C005	C006	C007	C008	C009	C010	C011	C012	C013	C014	C015	C016	C017	C018	C019	C020	C021	C022	C023	C024	C025	C026	C027	C028	C029	C030	C031	C032	C033	C034	C035	C036	C037	C038	C039	C040	C041	C042	C043	C044	C045	C046	C047	C048	C049	C050	C051	C052	C053	C054	C055	C056	C057	C058	C059	C060
Chr1	1	.	T	A	228.246	PASS	AN=24;AC=8	GT:DP	1/1:20	0/0:2	0/1:8	1/1:6	./.:0	0/0:8	0/0:2	0/0:5	0/0:1	0/0:1	1/1:17	0/0:2	0/1:13	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.
Chr1	1	.	T	TAAAAA,TAAA,TAA,TAAAA,TA	228.401	PASS	INDEL;AN=120;AC=28,43,10,8,30	GT:DP	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	./.:.	1/1:11	2/3:16	4/4:11	4/1:11	2/2:18	2/5:29	5/5:23	1/1:35	2/2:36	0/3:14	3/5:19	3/2:15	2/2:18	3/2:14	2/2:17	2/2:9	1/2:16	5/1:15	5/2:17	1/1:11	2/1:9	5/5:5	5/5:44	1/5:21	2/2:18	5/3:19	1/1:19	5/1:19	2/1:48	2/5:31	1/5:23	2/2:12	2/4:11	1/1:20	2/4:10	1/2:9	1/2:14	1/1:17	5/2:10	3/2:12	2/2:16	1/2:14	1/5:55	1/2:41	5/3:47	1/4:39	5/5:13	5/2:11	5/2:37	3/5:43	2/1:27	5/5:30	4/2:30	5/5:35	4/2:12	2/1:10	5/5:13	2/3:23	5/2:14	2/2:12

Output after running bcftools norm -m +any 1.vcf:

...
##bcftools_normVersion=1.19+htslib-1.19
##bcftools_normCommand=norm -m +any c.vcf; Date=Sun Mar 24 15:32:45 2024
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	A001	A002	A003	A004	A005	A006	A007	A008	A009	A010	A011	A012	A013	C001	C002	C003	C004	C005	C006	C007	C008	C009	C010	C011	C012	C013	C014	C015	C016	C017	C018	C019	C020	C021	C022	C023	C024	C025	C026	C027	C028	C029	C030	C031	C032	C033	C034	C035	C036	C037	C038	C039	C040	C041	C042	C043	C044	C045	C046	C047	C048	C049	C050	C051	C052	C053	C054	C055	C056	C057	C058	C059	C060
Chr1	1	.	T	A,TAAAAA,TAAA,TAA,TAAAA,TA	228.401	PASS	AN=24;AC=8,28,43,10,8,30	GT:DP	1/1:20	0/0:2	0/1:8	1/1:6	./.:0	0/0:8	0/0:2	0/0:5	0/0:1	0/0:1	1/1:17	0/0:2	0/1:13	2/2:.	3/4:.	5/5:.	5/2:.	3/3:.	3/6:.	6/6:.	2/2:.	3/3:.	./4:.	4/6:.	4/3:.	3/3:.	4/3:.	3/3:.	3/3:.	2/3:.	6/2:.	6/3:.	2/2:.	3/2:.	6/6:.	6/6:.	2/6:.	3/3:.	6/4:.	2/2:.	6/2:.	3/2:.	3/6:.	2/6:.	3/3:.	3/5:.	2/2:.	3/5:.	2/3:.	2/3:.	2/2:.	6/3:.	4/3:.	3/3:.	2/3:.	2/6:.	2/3:.	6/4:.	2/5:.	6/6:.	6/3:.	6/3:.	4/6:.	3/2:.	6/6:.	5/3:.	6/6:.	5/3:.	3/2:.	6/6:.	3/4:.	6/3:.	3/3:.

As you can see, the AN value for the normalized entry is 24 instead of the correct 144 (120+24).
This leads to an error when I run bcftools norm -m +any 1.vcf | bcftools view -q 0.1:nonmajor:

[E::bcf_calc_ac] Incorrect AN/AC counts at Chr1:1
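
For reference, the AC values in the normalized record sum to 127, which exceeds its AN of 24. That is presumably the inconsistency being reported; the exact check performed inside htslib is an assumption here. A minimal Python sketch of such a sanity check on the INFO column (the helper names `parse_info` and `an_ac_consistent` are hypothetical, not part of bcftools):

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flags map to an empty string."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def an_ac_consistent(info):
    """Return True if the ALT allele counts do not exceed the total AN.

    Assumption: this mirrors the spirit of the AN/AC check, not its
    actual implementation in htslib.
    """
    fields = parse_info(info)
    an = int(fields["AN"])
    ac = [int(x) for x in fields["AC"].split(",")]
    return sum(ac) <= an


# INFO from the bcftools norm -m +any output (sum of AC is 127, AN is 24)
print(an_ac_consistent("AN=24;AC=8,28,43,10,8,30"))          # False
# INFO from the bcftools merge -m any output (AN is 144)
print(an_ac_consistent("INDEL;AN=144;AC=8,28,43,10,8,30"))   # True
```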

On the other hand, bcftools merge -m any --no-index part1.vcf part2.vcf gives the correct AN value:

...
##bcftools_mergeVersion=1.19+htslib-1.19
##bcftools_mergeCommand=merge --no-index -m any part1.vcf part2.vcf; Date=Sun Mar 24 15:39:19 2024
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	A001	A002	A003	A004	A005	A006	A007	A008	A009	A010	A011	A012	A013	C001	C002	C003	C004	C005	C006	C007	C008	C009	C010	C011	C012	C013	C014	C015	C016	C017	C018	C019	C020	C021	C022	C023	C024	C025	C026	C027	C028	C029	C030	C031	C032	C033	C034	C035	C036	C037	C038	C039	C040	C041	C042	C043	C044	C045	C046	C047	C048	C049	C050	C051	C052	C053	C054	C055	C056	C057	C058	C059	C060
Chr1	1	.	T	A,TAAAAA,TAAA,TAA,TAAAA,TA	228.401	PASS	INDEL;AN=144;AC=8,28,43,10,8,30	GT:DP	1/1:20	0/0:2	0/1:8	1/1:6	./.:0	0/0:8	0/0:2	0/0:5	0/0:1	0/0:1	1/1:17	0/0:2	0/1:13	2/2:11	3/4:16	5/5:11	5/2:11	3/3:18	3/6:29	6/6:23	2/2:35	3/3:36	0/4:14	4/6:19	4/3:15	3/3:18	4/3:14	3/3:17	3/3:9	2/3:16	6/2:15	6/3:17	2/2:11	3/2:9	6/6:5	6/6:44	2/6:21	3/3:18	6/4:19	2/2:19	6/2:19	3/2:48	3/6:31	2/6:23	3/3:12	3/5:11	2/2:20	3/5:10	2/3:9	2/3:14	2/2:17	6/3:10	4/3:12	3/3:16	2/3:14	2/6:55	2/3:41	6/4:47	2/5:39	6/6:13	6/3:11	6/3:37	4/6:43	3/2:27	6/6:30	5/3:30	6/6:35	5/3:12	3/2:10	6/6:13	3/4:23	6/3:14	3/3:12

part1.vcf and part2.vcf

part1.vcf

##fileformat=VCFv4.2
##contig=<ID=Chr1,length=100>
##ALT=<ID=*,Description="Represents allele(s) other than observed.">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Number of high-quality bases">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	A001	A002	A003	A004	A005	A006	A007	A008	A009	A010	A011	A012	A013
Chr1	1	.	T	A	228.246	PASS	AN=24;AC=8	GT:DP	1/1:20	0/0:2	0/1:8	1/1:6	./.:0	0/0:8	0/0:2	0/0:5	0/0:1	0/0:1	1/1:17	0/0:2	0/1:13

part2.vcf

##fileformat=VCFv4.2
##contig=<ID=Chr1,length=100>
##ALT=<ID=*,Description="Represents allele(s) other than observed.">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Number of high-quality bases">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	C001	C002	C003	C004	C005	C006	C007	C008	C009	C010	C011	C012	C013	C014	C015	C016	C017	C018	C019	C020	C021	C022	C023	C024	C025	C026	C027	C028	C029	C030	C031	C032	C033	C034	C035	C036	C037	C038	C039	C040	C041	C042	C043	C044	C045	C046	C047	C048	C049	C050	C051	C052	C053	C054	C055	C056	C057	C058	C059	C060
Chr1	1	.	T	TAAAAA,TAAA,TAA,TAAAA,TA	228.401	PASS	INDEL;AN=120;AC=28,43,10,8,30	GT:DP	1/1:11	2/3:16	4/4:11	4/1:11	2/2:18	2/5:29	5/5:23	1/1:35	2/2:36	0/3:14	3/5:19	3/2:15	2/2:18	3/2:14	2/2:17	2/2:9	1/2:16	5/1:15	5/2:17	1/1:11	2/1:9	5/5:5	5/5:44	1/5:21	2/2:18	5/3:19	1/1:19	5/1:19	2/1:48	2/5:31	1/5:23	2/2:12	2/4:11	1/1:20	2/4:10	1/2:9	1/2:14	1/1:17	5/2:10	3/2:12	2/2:16	1/2:14	1/5:55	1/2:41	5/3:47	1/4:39	5/5:13	5/2:11	5/2:37	3/5:43	2/1:27	5/5:30	4/2:30	5/5:35	4/2:12	2/1:10	5/5:13	2/3:23	5/2:14	2/2:12
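
The AN and AC values in the input files can be cross-checked against the GT column. A minimal Python sketch, using the 13 genotypes from part1.vcf and assuming AN counts every non-missing allele while AC counts each ALT allele separately (the function name `count_alleles` is hypothetical):

```python
def count_alleles(genotypes, n_alt):
    """Recompute AN and per-ALT AC from a list of GT strings such as '0/1'."""
    an = 0
    ac = [0] * n_alt
    for gt in genotypes:
        # Treat phased ('|') and unphased ('/') genotypes alike.
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing alleles do not contribute to AN
            an += 1
            idx = int(allele)
            if idx > 0:
                ac[idx - 1] += 1  # allele 1 is the first ALT
    return an, ac


# GT values of samples A001..A013 in part1.vcf
part1_gts = "1/1 0/0 0/1 1/1 ./. 0/0 0/0 0/0 0/0 0/0 1/1 0/0 0/1".split()
an, ac = count_alleles(part1_gts, n_alt=1)
print(an, ac)  # 24 [8]
```

Applying the same count to the 60 fully called diploid samples in part2.vcf gives the AN=120 stated there, so a joined record covering both sample sets should report AN=144.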