AS_FilterStatus uses an incorrect number of fields for some records #6857
Comments
@ldgauthier It seems that we might be violating the VCF spec with the way values are delimited in the |
@droazen Are you thinking String with count=1 and then DIY parsing? Despite years of discussions, we never made any progress on lists of lists in the VCF spec. |
It looks like that's just what Don did in #6858 |
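A rough sketch of the "String with count=1 and DIY parsing" idea discussed above. It assumes the delimiters described in the bug report (a pipe between alleles, a comma between filters within one allele). The helper name `parse_as_filter_status` and the sample value are hypothetical illustrations, not GATK code.

```python
from typing import List


def parse_as_filter_status(value: str) -> List[List[str]]:
    """Split a raw AS_FilterStatus string into one filter list per allele.

    Assumes '|' separates alleles and ',' separates the filters of a
    single allele, as described in this issue. Hypothetical helper.
    """
    return [allele.split(",") for allele in value.split("|")]


# Illustrative value only: two alleles, the first one failing two filters.
example = "weak_evidence,strand_bias|SITE"
print(parse_as_filter_status(example))
```

With the field declared as a single string, a downstream tool never splits the value itself, so the nested structure is left entirely to code like this.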
I am facing the same problem when trying to merge Mutect2 VCF files (from several patients) into one. I tried bcftools merge, but I got the same error. I used Mutect2 from GATK 4.1.9.0. Since which version has Mutect2 had this error? I am thinking of downgrading GATK to 4.1.7.0 or an older version. |
Is this issue fixed yet? @TnakaNY I am using |
I am running |
I came across this problem when using
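Sharing my code below in case it is useful for others:
# Extract the header and change the AS_FilterStatus declaration from Number=A to Number=.
bcftools head my.vcf \
| perl -nle "if (/ID=AS_FilterStatus,/){ s/Number=A/Number=./ } print" > my.header.txt
# Apply the edited header, then compress and index the result
bcftools reheader -h my.header.txt my.vcf \
| bgzip > my.vcf.gz
tabix my.vcf.gz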
I have the same problem and I use
Best, |
Yes we made a workflow on Terra to debug this: https://dockstore.org/workflows/github.com/broadinstitute/depmap_omics/omics_mutect2:dev?tab=files |
Hello! I wrote a script to circumvent this error:
|
Hello! I have the same problem with a comma in the Description field of the VCF header. I wrote an awk script to prevent errors in downstream tools that cannot deal with this format: fix_vcf_header_with_comma_in_description.awk
Usage:
|
A faster awk alternative to the above Python script to swap the characters in the variant rows:
awk '{
    # find the filter status field; stop at ";" or a tab so the match
    # also works (and stays inside INFO) when the field comes last
    match($0, /AS_FilterStatus=[^;\t]+/);
    if (RSTART != 0) {
        # the line matches
        before = substr($0, 1, RSTART - 1);
        match_str = substr($0, RSTART, RLENGTH);
        after = substr($0, RSTART + RLENGTH);
        # temporarily replace "|" with a non-printing char to swap "|" and "," chars
        gsub(/\|/, "\x1e", match_str);
        gsub(/,/, "|", match_str);
        gsub(/\x1e/, ",", match_str);
        # print modified line
        print before match_str after;
    } else {
        # no match
        print $0;
    }
}' your.vcf > fixed.vcf
Bug Report
Affected tool
Mutect2
Affected versions
Description
Mutect2’s header defines AS_FilterStatus as follows:
AS_FilterStatus uses the pipe character | for per-allele concatenation and a comma , for filter concatenation. This causes records to have an incorrect number of values at sites with multiple filters or multiple alleles. Some examples:
A quick fix would be to define Number=1 for AS_FilterStatus in the VCF header. Alternatively, using a pipe for filter concatenation and a comma for per-allele concatenation might be more compliant with the VCF specification.
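To make the mismatch concrete, here is a minimal sketch, assuming the header declares one value per alternate allele (Number=A, as the reheader workaround in the comments suggests) and that a spec-following parser splits INFO values on commas. The function names and the sample values are hypothetical illustrations.

```python
def comma_field_count(value: str) -> int:
    """Number of values a parser sees when it splits the field on ','."""
    return len(value.split(","))


def allele_entry_count(value: str) -> int:
    """Number of per-allele entries under the current '|' convention."""
    return len(value.split("|"))


def swap_delimiters(value: str) -> str:
    """Swap '|' and ',' so alleles are comma-separated (the second proposal)."""
    return value.translate(str.maketrans("|,", ",|"))


# One ALT allele failing two filters: a comma-splitting parser sees 2 values, not 1.
multi_filter = "weak_evidence,strand_bias"
# Two ALT alleles with one filter each: a comma-splitting parser sees 1 value, not 2.
multi_allele = "SITE|weak_evidence"

for value in (multi_filter, multi_allele):
    swapped = swap_delimiters(value)
    print(value, comma_field_count(value), allele_entry_count(value),
          swapped, comma_field_count(swapped))
```

After the swap, the comma-separated count equals the number of per-allele entries in both cases, which is what a per-allele declaration expects.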