Scripts to extract variant allele frequency (VAF) #234

Open
maximus3219 opened this issue Oct 18, 2023 · 1 comment


@maximus3219

Variant allele frequency (VAF), allele depth (AD), and depth (DP) are fundamental to interpreting NGS data, but unfortunately they are not readily available in the outputs from Strelka.
If there is no plan to add these values to the outputs, can you provide a bash script that extracts them into separate columns, or that filters the variants directly on the values of VAF, AD, and DP?
bcftools can filter on such values directly when they are present in the INFO or FORMAT field,
e.g. bcftools filter -i 'FORMAT/AF[1] > 0.05' input.vcf.gz

But unfortunately, extracting this information from Strelka output is complicated, as stated in the manual:
refCounts = Value of FORMAT column $REF + “U” (e.g. if REF="A" then use the value in FORMAT/AU)
altCounts = Value of FORMAT column $ALT + “U” (e.g. if ALT="T" then use the value in FORMAT/TU)
tier1RefCounts = First comma-delimited value from $refCounts
tier1AltCounts = First comma-delimited value from $altCounts
Somatic allele frequency is $tier1AltCounts / ($tier1AltCounts + $tier1RefCounts)

How exactly can I implement the above pseudocode in a bash script with bcftools or other tools?

I have searched hundreds of web pages, and no one gives a solution or even discusses it!
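
The pseudocode quoted from the manual can be sketched in a few lines of Python rather than bash. This is a minimal sketch for SNV records only, not an official Strelka tool. The function name `snv_vaf` and the example record are hypothetical, and the code assumes a single ALT allele and that the sample columns appear in the order NORMAL, TUMOR.

```python
def snv_vaf(ref, alt, fmt, sample):
    """Tier-1 somatic allele frequency for one sample column of an SNV record.

    Follows the manual's pseudocode: look up FORMAT/<REF>U and FORMAT/<ALT>U,
    take the first comma-delimited value of each, and divide.
    """
    # Pair each FORMAT key with this sample's value, e.g. {"DP": "40", "AU": "30,32", ...}
    values = dict(zip(fmt.split(":"), sample.split(":")))
    tier1_ref = int(values[ref + "U"].split(",")[0])
    tier1_alt = int(values[alt + "U"].split(",")[0])
    total = tier1_ref + tier1_alt
    # Return None when there are no tier-1 reads, to avoid dividing by zero
    return tier1_alt / total if total else None


# Hypothetical record, made up for illustration (columns 10 and 11 assumed NORMAL, TUMOR)
line = ("chr1\t12345\t.\tA\tT\t.\tPASS\tSOMATIC\tDP:AU:CU:GU:TU"
        "\t30:30,31:0,0:0,0:0,0\t40:30,32:0,0:0,0:10,11")
fields = line.split("\t")
ref, alt, fmt = fields[3], fields[4], fields[8]
normal_vaf = snv_vaf(ref, alt, fmt, fields[9])
tumor_vaf = snv_vaf(ref, alt, fmt, fields[10])
print(normal_vaf, tumor_vaf)  # 0.0 0.25
```

The resulting values could be written to an extra column or compared against a threshold, which is the filtering step bcftools cannot do directly because the FORMAT key depends on the REF and ALT of each record.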

@juliawiggeshoff

@maximus3219 I don't know if you are still interested, but I had the same problem last week. I wrote a Python script to calculate VAF for indels and SNVs from the somatic VCF files. I couldn't get it done with bcftools either, but here is the script. It calculates the VAF for each variant and includes this information for the normal and tumour samples in the final output VCF. Usage instructions are in the README.md.
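
For readers who only want the idea, the following is a rough sketch of how one pass over a somatic VCF could cover both SNVs and indels. It is not the script linked above. It assumes that indel records carry `TAR` and `TIR` FORMAT fields whose first comma-delimited values are the tier-1 reference and alternate counts, as the Strelka user guide describes, and that each record has a single ALT allele. The function names and the example record are hypothetical.

```python
import gzip


def tier1_counts(ref, alt, values):
    """Return (tier1RefCounts, tier1AltCounts) for one sample."""
    if "TAR" in values and "TIR" in values:
        # Assumed indel record: counts are stored in TAR (ref) and TIR (alt)
        ref_field, alt_field = values["TAR"], values["TIR"]
    else:
        # SNV record: counts are stored in <REF>U and <ALT>U
        ref_field, alt_field = values[ref + "U"], values[alt + "U"]
    return int(ref_field.split(",")[0]), int(alt_field.split(",")[0])


def record_vafs(line):
    """Parse one VCF data line and return (chrom, pos, ref, alt, [vaf per sample])."""
    fields = line.rstrip("\n").split("\t")
    ref, alt, keys = fields[3], fields[4], fields[8].split(":")
    vafs = []
    for sample in fields[9:]:
        values = dict(zip(keys, sample.split(":")))
        ref_n, alt_n = tier1_counts(ref, alt, values)
        total = ref_n + alt_n
        vafs.append(alt_n / total if total else None)
    return fields[0], int(fields[1]), ref, alt, vafs


def iter_vafs(path):
    """Yield record_vafs() for every data line of a plain or gzipped VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.startswith("#"):
                yield record_vafs(line)


# Hypothetical indel record, made up for illustration
indel = "chr2\t500\t.\tAT\tA\t.\tPASS\tSOMATIC\tDP:TAR:TIR\t20:20,21:0,0\t30:15,16:5,6"
print(record_vafs(indel))
```

Calling `iter_vafs` on a file path would give one tuple per variant, which can then be filtered on VAF or written out as a table.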
