Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Varscan generates a REF allele that is not compatible with GATK. #25

Closed
lindenb opened this issue Mar 13, 2021 · 4 comments · Fixed by #26
Closed

Varscan generates a REF allele that is not compatible with GATK. #25

lindenb opened this issue Mar 13, 2021 · 4 comments · Fixed by #26
Labels
bug Something isn't working

Comments

@lindenb
Copy link

lindenb commented Mar 13, 2021

Hi again (sorry)

I got this new GATK error with var scan ouput

htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 49313350: unparsable vcf record with allele R, for input source: /SCRATCH-BIRD/users/lindenbaum-p/work/NEXTFLOW/20210312.varCA/20210312.ATAC/callers/D02_i_WT8288_C03_p20_20/varscan/varscan.vcf.gz

the problem is that a REF allele contains an allele that is not compatible with gatk:

gunzip -c  /SCRATCH-BIRD/users/lindenbaum-p/work/NEXTFLOW/20210312.varCA/20210312.ATAC/callers/A12_CM_WT8288_C03_p32_12/varscan/varscan.vcf.gz | awk '($4=="R")'
hs37d5	31982609	.	R	.	.	PASS	ADP=7;WT=0;HET=0;HOM=0;NC=1	GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR	./.:.:7
hs37d5	32894833	.	R	.	.	PASS	ADP=1;WT=0;HET=0;HOM=0;NC=1	GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR	./.:.:1
@aryarm
Copy link
Owner

aryarm commented Mar 17, 2021

Hi @lindenb,
No need to apologize! Thank you for reporting this and tracking down the source of the problem!
I think this issue originates with varscan's mpileup2cns outputting REF alleles from the reference genome that GATK cannot handle. Based on this, it looks like the creators of VarScan have already identified this issue and are working on incorporating it into a new release of their software. There is also a Github issue for it here.
In the meantime, since there should be no call at these sites, one potential solution is to just discard records where this occurs. The VarCA pipeline will detect the absence of a variant at discarded sites and automatically label these sites as "no call", as they should be.
Here's a patch for the callers/varscan script. I just deleted the last line and replaced it with an awk command that should remove the problematic records.

--- a/callers/varscan
+++ b/callers/varscan
@@ -13,4 +13,4 @@ output_dir="$4"

 samtools mpileup -l "$peaks" -f "$genome" -B "$bam" | \
 varscan mpileup2cns --p-value 1 --strand-filter 0 --output-vcf > "$output_dir/varscan.vcf" && \
-bgzip -f "$output_dir/varscan.vcf" && tabix -p vcf -f "$output_dir/varscan.vcf.gz"
+awk -F $"\t" -v 'OFS=\t' '/^#/ || $4 !~ /^(R|Y|M|W|S|K|V|H|D|B|N)$/' "$output_dir/varscan.vcf" | \
+bgzip > "$output_dir/varscan.vcf.gz" && tabix -p vcf -f "$output_dir/varscan.vcf.gz"

If you have a chance, can you try this out and let me know if it resolves the problem?

@aryarm
Copy link
Owner

aryarm commented Mar 27, 2021

I've created pull request #26 for this patch. I can merge it into master and publish a new release once we are sure that it fixes the problem.
In the meantime, if anybody else runs into this issue, feel free to clone the patch/varscan-iupac branch instead of master. Please also leave a comment if this solves your problem!

@aryarm aryarm added the bug Something isn't working label Jun 30, 2021
@aryarm
Copy link
Owner

aryarm commented Jul 1, 2021

Personal correspondence with another VarCA user who ran into the same problem has indicated that the patch successfully resolves this issue. So I'll be merging this into master soon.

@aryarm aryarm closed this as completed in #26 Jul 1, 2021
@davidroad
Copy link

Hi, I found vardict-java will have the same bug. I edited the pipeline myself via this thread.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants