Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can LiftoverVcf TAGS_TO_REVERSE be made more automatic? #1149

Closed
1 task
bbimber opened this issue Apr 1, 2018 · 4 comments
Closed
1 task

Can LiftoverVcf TAGS_TO_REVERSE be made more automatic? #1149

bbimber opened this issue Apr 1, 2018 · 4 comments

Comments

@bbimber
Copy link
Contributor

bbimber commented Apr 1, 2018

Bug Report

Affected tool(s)

LiftoverVcf

Affected version(s)

  • [ x] Latest public release version [version?]
  • Latest development/master branch as of [date of test?]

Description

You recently added code to automatically try to reverse the REF/ALT alleles if the lifted variant is reverse complemented. As you recognized, this would sometimes invalidate INFO field tags. You special-cased AF and MAX_AF and added TAGS_TO_REVERSE; however, this is still pretty manual and easy for less sophisticated users to get wrong.

Can you actually inspect the header and use tags of type VCFHeaderLineType.A and VCFHeaderLineType.R to more automatically determine what needs to get reversed? This will not help with the MAX_AF situation, but it would help considerably over the current code.

Steps to reproduce

Expected behavior

Actual behavior


Feature request

Tool(s) involved

Tool name(s), special parameters?

Description

Specify whether you want a modification of an existing behavior or addition of a new capability.
Provide examples, screenshots, where appropriate.


Documentation request

Tool(s) involved

Tool name(s), parameters?

Description

Describe what needs to be added or modified.


@bbimber bbimber changed the title Liftover Performance Regression? Can LiftoverVcf TAGS_TO_REVERSE be made more automatic? Apr 1, 2018
@bbimber
Copy link
Contributor Author

bbimber commented Apr 1, 2018

For example, AF is special-cased, but AC is also more or less a built-in field that is also allele level, but it does not get this special treatment.

@yfarjoun
Copy link
Contributor

yfarjoun commented Apr 1, 2018

AC is a reasonable tag to add to the default, but I'm not sure how to automate and at the same time allow for over-ride. I'm happy to discuss possibilities or review a PR if you have a proposed implementation.

@bbimber
Copy link
Contributor Author

bbimber commented Apr 2, 2018

The tool cant reasonably be expected to get everything right automatically. However, if a field is either A (per allele) or R (per allele plus ref), there is near certainty that swapping REF/ALT is going to leave them invalid. I understand the goal of lifting the maximum number of variants, but VCF file format is complex, and the current default behavior of this tool is quite likely to leave the user with invalid data, in a way that is pretty cryptic and hard to realize.

It's useful to special-case AF (which is an admittedly important, but also annotation use-case), but this seem like an instance where this tool special-cases whatever narrow scenario drove the addition of the feature, but isnt doing an adequate job of thinking about the broader space.

For solutions:

  • As you saw, I put in a separate PR to add an argument to disable this behavior, since at least in some of our cases the problems this introduced for annotations is greater than the additional variants.

  • I put in my PR with a default value of true (preserving status quo), but I would pretty strongly argue this feature should be opt-in (default false), given the fact the user needs to think carefully about the implications for the annotations of their specific VCF.

  • As far as more automatic fixing: it's probable VCFHeaderLineCount.R fields could be reversed safely; however, I dont know what to do with VCFHeaderLineCount.A other than null them out. Maybe there could be an argument to automatically swap and clear INFO fields on these types?

@yfarjoun
Copy link
Contributor

fixed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants