Updating filter syntax and threshold ? #2101

Open
davidyuyuan opened this issue Feb 18, 2024 · 1 comment

Comments

@davidyuyuan

I could be very wrong, but the syntax and threshold values in the example under "Filtering variants" on https://samtools.github.io/bcftools/howtos/variant-calling.html might be outdated:

bcftools filter -sLowQual -g3 -G10 \
    -e'%QUAL<10 || (RPB<0.1 && %QUAL<15) || (AC<2 && %QUAL<15) || %MAX(DV)<=3 || %MAX(DV)/%MAX(DP)<=0.3' \
    calls.vcf.gz

I am using the following with bcftools 1.18 (bcftools_filterVersion=1.18+htslib-1.18) on a public GRCh38 dataset from PacBio:

bcftools filter -sLowQual -g3 -G10 \
    -e 'QUAL<100 || (RPBZ<0.1 && QUAL<150) || (AC<2 && QUAL<150) || VDB<1.0e-04' \
    "${input_vcf}" -Oz -o "${output_dir}/filtered.vcf.gz" --write-index

The other examples in "Filtering variants" might be outdated as well.

P.S. I am going to try the filter on the G1K dataset next to see whether it is generic enough.

@pd3
Member

pd3 commented Feb 19, 2024

The thresholds are extremely unlikely to work as is for different datasets, that's certain. The example is intended as an illustration only.
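
Since the thresholds depend on the dataset, one rough way to pick a starting point is to look at how an annotation such as QUAL is distributed in your own calls. The sketch below is only an illustration: `percentile` is a hypothetical helper (not part of bcftools) that prints a nearest-rank percentile of numbers read from stdin, and the file name `calls.vcf.gz` in the usage comment is an assumption.

```shell
# Hypothetical helper (not part of bcftools): print the p-th percentile
# (nearest-rank method) of the numbers read from stdin, one per line.
# Lines containing only "." (missing values in VCF) are skipped.
percentile() {
    p="$1"
    grep -v '^\.$' | sort -n | awk -v p="$p" '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            # nearest rank: smallest k such that k/NR >= p/100 (integer p)
            k = int((p * NR + 99) / 100)
            if (k < 1) k = 1
            if (k > NR) k = NR
            print v[k]
        }'
}

# Usage sketch, assuming bcftools is installed and calls.vcf.gz exists:
#   bcftools query -f '%QUAL\n' calls.vcf.gz | percentile 10
```

The value printed for a low percentile is only a candidate cutoff to inspect further, not a recommended threshold.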
