Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

'str()' fails with 'HGVSUnsupportedOperationError' for variants containing an uncertain position #532

Open
dhauge opened this issue Nov 1, 2018 · 3 comments
Labels
bug Something isn't working

Comments

@dhauge
Copy link

dhauge commented Nov 1, 2018

If a variant is constructed with a position for which uncertain is True, then calling str() on it fails with a HGVSUnsupportedOperationError (A detailed stack trace is included below). This is caused by the Interval.format function comparing its start and end, but __eq__ raises that error if either is uncertain (which itself seems a violation of Python equality semantics).

This same problem is also manifested in the validate() method, where trying to validate a variant with an uncertain position produces

In [4]: print(var_1.validate())
(<ValidationLevel.WARNING: 2>, 'Cannot compare coordinates of uncertain positions')

Here's a hgvs-shell session that demonstrates the problems.

############################################################################
hgvs-shell -- interactive hgvs
hgvs version: 1.2.4
data provider url: postgresql://anonymous:anonymous@uta.biocommons.org/uta/uta_20161216
schema_version: 1.1
data_version: uta_20161216
sequences source: remote (bioutils.seqfetcher)
...
In [1]: var_1 = hp.parse_hgvs_variant('NM_000540.2:c.7615_7926del')

In [2]: var_1.posedit.pos.start.uncertain = True

In [3]: print(str(var_1))
---------------------------------------------------------------------------
HGVSUnsupportedOperationError             Traceback (most recent call last)
/Users/dhauge/venv/hgvs-py2.7/lib/python2.7/site-packages/hgvs/shell.pyc in <module>()
----> 1 print(str(var_1))

/Users/dhauge/venv/hgvs-py2.7/lib/python2.7/site-packages/hgvs/sequencevariant.pyc in format(self, conf)
     33         """
     34         if self.posedit is not None:
---> 35             posedit = self.posedit.format(conf)
     36         else:
     37             posedit = "?"

/Users/dhauge/venv/hgvs-py2.7/lib/python2.7/site-packages/hgvs/posedit.pyc in format(self, conf)
     27             rv = str(self.edit.format(conf))
     28         else:
---> 29             rv = "{pos}{edit}".format(pos=self.pos.format(conf), edit=self.edit.format(conf))
     30 
     31         if self.uncertain:

/Users/dhauge/venv/hgvs-py2.7/lib/python2.7/site-packages/hgvs/location.pyc in format(self, conf)
    313         if self.start is None:
    314             return ""
--> 315         if self.end is None or self.start == self.end:
    316             return self.start.format(conf)
    317         iv = self.start.format(conf) + "_" + self.end.format(conf)

/Users/dhauge/venv/hgvs-py2.7/lib/python2.7/site-packages/hgvs/location.pyc in __eq__(lhs, rhs)
    169         assert type(lhs) == type(rhs), "Cannot compare coordinates of different representations"
    170         if lhs.uncertain or rhs.uncertain:
--> 171             raise HGVSUnsupportedOperationError("Cannot compare coordinates of uncertain positions")
    172         return lhs.datum == rhs.datum and lhs.base == rhs.base and lhs.offset == rhs.offset
    173 

HGVSUnsupportedOperationError: Cannot compare coordinates of uncertain positions

In [4]: print(var_1.validate())
(<ValidationLevel.WARNING: 2>, 'Cannot compare coordinates of uncertain positions')

@dhauge dhauge changed the title str() fails with HGVSUnsupportedOperationError for variants containing an uncertain position 'str()' fails with 'HGVSUnsupportedOperationError' for variants containing an uncertain position Nov 1, 2018
@reece
Copy link
Member

reece commented Nov 1, 2018

@dhauge: Your very clear report is a model for others to follow. Thanks!

I'm inclined to agree with you: objects should be compared exclusively on their members. (Is that what you meant by "violation of Python equality semantics"? If so, this interpretation of equality is well-beyond Python.)

I think it should suffice to drop the uncertain comparison and instead add the equality to the other conditionals in the return statement.

In retrospect, I think we can drop the eq method altogether because attr's default eq is to compare objects by type and members, which is exactly what we want. (I think.)

Would you care to work up a PR?

@dhauge
Copy link
Author

dhauge commented Nov 1, 2018

Yes, I can create a PR. I wasn't sure how you felt about the similar uncertain checks in the other comparison operators; I'm not sure exactly how I'd want them to behave in all circumstances. My current thoughts are along the lines of

  • ignore the uncertain property for __lt__ and __gt__, and have the object ordered the same with and without the uncertainty.
  • Perhaps instead of raising an error if the two positions are not comparable, returning False instead (for both __lt__ and __gt__, and if they are not equivalent also __le__ and __ge__)
  • If the types are not comparable, instead of an assert return a NotImplemented (or raise NotImplementedError) so Python raises a TypeError)

@reece
Copy link
Member

reece commented Nov 1, 2018

Thanks for your willingness to contribute.

I don't recall at the moment where or why __lt__ and __gt__ are used, so I can't assess whether there will be subtle gotchas with changing the semantics. My instinct is that attr-based object comparisons (i.e., member comparison) will be the most transparent and least surprising implementation.

Two options: 1) Just try it and see what tests say. 2) Profile to get call traces and the contexts in which eq, lt, and gt are used. (1) seems likely to succeed, can be done iteratively, and is likely faster. I'd go that way.

Specific comments:

  • Include uncertain in __lt__ and __eq__ (__gt__ is via the @total_ordering decorator). For __lt__, put it at the end so that positions that differ only by uncertainty are ordered adjacent.
  • I agree that raising an exception is a poor choice for the ordering.
  • Go ahead with TypeError instead of assertions. (FWIW, the assertions were intended because they indicate circumstances that should be impossible by construction. There shouldn't be any code path that leads to a comparison of unalike types, regardless of what the user typed.)

Thanks!

@reece reece added the bug Something isn't working label Jun 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants