Truvari phab fails to read VCF headers from output file and crashes #166

Closed
TimD1 opened this issue Aug 22, 2023 · 4 comments

@TimD1

TimD1 commented Aug 22, 2023

Version :
v4.1.0

Describe the bug :
Truvari crashes, failing to read the VCF headers of cmrg-sv.vcf (file attached below). No such error is encountered with this file using command-line bcftools. Manual inspection shows no missing sample names or trailing spaces/tabs. It looks like this error actually occurs with the output file, not the input.

Expected behavior :
This is the error I get:

2023-08-22 11:31:07,055 [INFO] Truvari v4.1.0
2023-08-22 11:31:07,055 [INFO] Command /home/timdunn/truvari/venv3.10/bin/truvari phab -r /home/timdunn/vcfdist/comparison/beds/bench.bed -b /home/timdunn/vcfdist/comparison/vcfs/norm/bench.vcf.gz -c /home/timdunn/vcfdist/comparison/vcfs/norm/cmrg-sv.vcf.gz --bSamples HG002 --cSamples HG002 -f /home/timdunn/vcfdist/data/refs/GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta --align wfa -t 64 -o /home/timdunn/vcfdist/comparison/vcfs/phab-wfa/cmrg-sv.vcf.gz
2023-08-22 11:31:07,056 [INFO] Preparing regions
2023-08-22 11:31:07,072 [INFO] Extracting haplotypes
2023-08-22 11:31:07,380 [INFO] Harmonizing variants
[E::bcf_hdr_add_sample_len] Empty sample name: trailing spaces/tabs in the header line?
Traceback (most recent call last):
  File "/home/timdunn/truvari/venv3.10/bin/truvari", line 8, in <module>
    sys.exit(main())
  File "/home/timdunn/truvari/venv3.10/lib/python3.10/site-packages/truvari/__main__.py", line 102, in main
    TOOLS[args.cmd](args.options)
  File "/home/timdunn/truvari/venv3.10/lib/python3.10/site-packages/truvari/phab.py", line 426, in phab_main
    phab(all_regions, args.base, args.reference, args.output, args.bSamples, args.buffer,
  File "/home/timdunn/truvari/venv3.10/lib/python3.10/site-packages/truvari/phab.py", line 314, in phab
    harmonize_variants(harm_jobs, mafft_params, base_vcf,
  File "/home/timdunn/truvari/venv3.10/lib/python3.10/site-packages/truvari/phab.py", line 267, in harmonize_variants
    truvari.compress_index_vcf(output_fn[:-len(".gz")], output_fn)
  File "/home/timdunn/truvari/venv3.10/lib/python3.10/site-packages/truvari/utils.py", line 434, in compress_index_vcf
    out_hdlr.write(bcftools.sort(fn))
  File "/home/timdunn/truvari/venv3.10/lib/python3.10/site-packages/pysam/utils.py", line 83, in __call__
    raise SamtoolsError(
pysam.utils.SamtoolsError: 'bcftools returned with error -1: stdout=, stderr=Writing to /tmp/bcftools.6e0zyu\nCould not read VCF/BCF headers from /home/timdunn/vcfdist/comparison/vcfs/phab-wfa/cmrg-sv.vcf\nCleaning\n'

As you can see, the error occurs when reading the VCF cmrg-sv.vcf, which I've attached below. I get the same error when running with the --align mafft flag, but only after a few hundred copies of the following lines:

2023-08-22 11:31:12,095 [ERROR] Unable to run MAFFT
2023-08-22 11:31:12,095 [ERROR]
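
The manual header inspection mentioned above can also be done programmatically. What follows is a minimal sketch, not part of Truvari or bcftools: the names `open_vcf`, `header_sample_problems`, and `FIXED_COLUMNS` are hypothetical, and the check only approximates the condition behind the "Empty sample name" message by looking for empty sample columns or trailing whitespace on the #CHROM line. It uses only the Python standard library.

```python
import gzip

# A VCF with genotypes has 9 fixed columns (#CHROM through FORMAT);
# every column after those is a sample name.
FIXED_COLUMNS = 9


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def header_sample_problems(path):
    """Return a list of problems found on the #CHROM header line (empty if none)."""
    with open_vcf(path) as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                raw = line.rstrip("\n")
                break
            if not line.startswith("#"):
                # Reached the records without ever seeing the column header.
                return ["no #CHROM line before the first record"]
        else:
            return ["no #CHROM line found"]

    problems = []
    if raw != raw.rstrip():
        problems.append("trailing whitespace on the #CHROM line")
    for index, name in enumerate(raw.split("\t")[FIXED_COLUMNS:], start=1):
        if name.strip() == "":
            problems.append(f"sample column {index} has an empty name")
    return problems
```

Running `header_sample_problems` on the uncompressed output file named in the bcftools message (the one `bcftools sort` could not read), and on both input VCFs, would show whether the malformed header is in the inputs or only in the file phab writes.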

Example Data :
cmrg-sv.vcf.gz

Additional context :
I tried both building Truvari v4.1.0 from source and using pip install truvari==4.1.0. Both result in the same error. Everything is run in a Python 3.10 virtual environment.

@ACEnglish
Owner

ACEnglish commented Aug 22, 2023

The error is sort of pointing to a problem with /home/timdunn/vcfdist/comparison/vcfs/norm/bench.vcf.gz. But I suspect that VCF won't have an issue, either. Instead, I'm thinking there's a problem with something around --bSamples and --cSamples. I can't confirm this without having access to the bench.vcf.gz.

Could you provide the bench.vcf.gz?

Side note: the ability of phab to harmonize two VCFs into one is an almost vestigial function that's largely been superseded by refine. Depending on how this ticket goes, I might need to remove it.

ACEnglish added a commit that referenced this issue Aug 22, 2023
possible change for #166
@ACEnglish
Owner

I'm marking this ticket as abandoned. You can reopen if there are any updates.

@TimD1
Author

TimD1 commented Oct 10, 2023

Hi Adam,

So it seems like the intended method for running Truvari at the moment is bench followed by refine, instead of phab followed by bench. Is that correct?

In that case, is there any supported method for refining the counts without modifying any of the variant representations? I'm looking to compare the results of Truvari on a per-variant basis with other tools, and so it seems like the only supported method of doing so is to first call Truvari phab, and then evaluate with multiple tools on that variant callset.

Thanks,
Tim

@ACEnglish
Owner

Truvari phab was not deprecated. My speculation based on "depending on how this ticket goes" was answered when the issue was fixed.
