
provide graded variant validation that includes uncertainty/severity #365

Closed
reece opened this issue Sep 20, 2016 · 4 comments


reece commented Sep 20, 2016

Originally reported by: Reece Hart (Bitbucket: reece, GitHub: reece)


Dependents: #366, #315

Currently, validation is essentially binary: variants are either valid or invalid. For intermediate states, such as when the reference sequence cannot be checked, an HGVSUnsupportedOperationError is raised. This makes it difficult to support two common use cases simultaneously: 1) Is this variant definitely valid (conversely, possibly invalid)? 2) Is this variant definitely invalid (conversely, possibly valid)?

This issue proposes three result states: invalid, uncertain, and valid.
In addition, there would be two modes: strict and permissive.

  • In strict mode, invalid and uncertain variants would raise errors; only known-valid variants would pass.

  • In permissive mode, only invalid variants would raise errors; uncertain and known-valid variants would pass.
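The two modes over three states could be sketched roughly as follows; `ValidationState`, `HGVSValidationError`, and this `validate` function are illustrative names for this proposal, not part of the hgvs package:

```python
from enum import Enum


class ValidationState(Enum):
    VALID = "valid"
    UNCERTAIN = "uncertain"
    INVALID = "invalid"


class HGVSValidationError(Exception):
    """Stand-in for the package's validation error (illustrative)."""


def validate(state, strict=True):
    # Strict mode: INVALID and UNCERTAIN raise; only VALID passes.
    # Permissive mode: only INVALID raises; UNCERTAIN and VALID pass.
    if state is ValidationState.INVALID:
        raise HGVSValidationError("variant is invalid")
    if strict and state is ValidationState.UNCERTAIN:
        raise HGVSValidationError("variant could not be fully validated")
    return True
```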

Here are some examples:

| variant | strict? | permissive? | comment |
|---|---|---|---|
| NC_123.4:g.100A>T | pass | pass | valid variant |
| NC_123.4:g.100_101insT | pass | pass | valid variant |
| NC_123.4:g.100insT | fail | fail | ins requires range |
| NM_456.7:c.100+22A>T | fail | pass | can't validate intronic sequence |
| NM_003777.3:c.13552_*36del57 | fail | pass | can't validate del length across intron |

The last two cases provide the motivation for this issue. Both of those variants raise HGVSUnsupportedOperationError, which means that they can't be validated: there is no known error, but they are also not definitively valid. Issue #366 proposes to validate variants prior to projection. However, that issue can't be implemented now because validating these example variants would raise exceptions, even though they could be projected to genomic sequence reliably.

The interface could be:

```python
hv = hgvs.validator.Validator(hdp, strict=True)
hv.validate(var, strict=False)
```

That is, the strict argument in the initializer might be overridden by the method.
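A minimal stand-in for that override pattern might look like this (this `Validator` is illustrative, not the real `hgvs.validator.Validator`):

```python
class Validator:
    """Illustrative sketch: a constructor default with a per-call override."""

    def __init__(self, hdp=None, strict=True):
        self.hdp = hdp
        self.strict = strict

    def validate(self, var, strict=None):
        # A per-call strict argument, when given, overrides the
        # instance default set in the constructor.
        effective_strict = self.strict if strict is None else strict
        # ... validation would happen here; return the mode used for clarity.
        return effective_strict
```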



reece commented Jan 20, 2017

Original comment by Meng Wang (Bitbucket: icebert, GitHub: icebert):


Hi Reece,

Based on our last discussion, I think we can add a function to the validator class that provides graded validation. It might be called graded_validate or get_validate_info or something else (I am not sure which name is better). The original validate function would keep its current behavior so that the API would not change.

In the graded validation, a tuple of (res, msg) would be returned. res is an enum of the error level, which could be VALID, WARNING, or ERROR; msg is the error or warning message. When res is VALID, msg is None.
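A minimal sketch of this proposed (res, msg) shape, with purely illustrative string checks standing in for real validation logic (`ValidationLevel` and `graded_validate` are proposed names from this thread, not existing API):

```python
from enum import Enum


class ValidationLevel(Enum):
    VALID = 0
    WARNING = 1
    ERROR = 2


def graded_validate(var):
    """Return (res, msg); msg is None when res is VALID. Illustrative only."""
    posedit = var.split(":", 1)[1]
    # Toy check mirroring the examples above: an ins without a range is invalid.
    if "ins" in posedit and "_" not in posedit:
        return (ValidationLevel.ERROR, "insertion requires a range")
    # Toy check: intronic/UTR-spanning positions can't be fully validated.
    if "+" in posedit or "*" in posedit:
        return (ValidationLevel.WARNING, "cannot validate intronic sequence")
    return (ValidationLevel.VALID, None)
```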

Looking forward to your comments.

Meng


reece commented Feb 15, 2017

Original comment by Reece Hart (Bitbucket: reece, GitHub: reece):


Hi Meng- Your general proposal looks right to me.

What do you think about the following ideas?

  • use the enum34 package. e.g., ValidationLevel = OrderedEnum("ValidationLevel", "VALID WARNING ERROR"). See https://docs.python.org/3/library/enum.html#orderedenum

  • add a strict=True argument to the current validate methods and raise Exception appropriately (to match current behavior)
    (True should be default because that will match current behavior)

  • I think we should move intrinsic validation to the models themselves (see consider redesign of validation mechanisms #249). Each model would validate its own instance (or peek into children in some cases), then recursively validate children. That way, validation would be at the right scope and local to the model. IntrinsicValidator would then merely call var.validate(). We'd have to move the ValidationLevels to a separate module, but that's okay.

  • generate an iterator over test results (instances of <result, message> or perhaps dict instances of {"result": ..., "message": ...}). See below.

Rationale for an iterator: Evaluating all criteria for every variant could be expensive. It will be common to want to exit on the first ERROR (or WARNING if strict). Using an iterator will enable a structure like this (untested):

```python
def run_validations(self, var):
    methods = [self._start_lte_end, self._ins_length_is_one, self._del_length]
    return (method(var) for method in methods)
```

Then, in validate(), something like:

```python
fail_level = ValidationLevel.WARNING if strict else ValidationLevel.ERROR
filtered_results = (res for res in self.run_validations(var) if res["result"] >= fail_level)
if any(filtered_results):
    raise ...
```
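The `res["result"] >= fail_level` comparison above needs an ordered enum. The OrderedEnum recipe from the Python enum documentation could be applied to the proposed ValidationLevel like this (the names come from this thread, not from shipped API):

```python
import enum


class OrderedEnum(enum.Enum):
    """Enum whose members compare by value (recipe from the Python enum docs)."""

    def __ge__(self, other):
        if self.__class__ is other.__class__:
            return self.value >= other.value
        return NotImplemented

    def __gt__(self, other):
        if self.__class__ is other.__class__:
            return self.value > other.value
        return NotImplemented

    def __le__(self, other):
        if self.__class__ is other.__class__:
            return self.value <= other.value
        return NotImplemented

    def __lt__(self, other):
        if self.__class__ is other.__class__:
            return self.value < other.value
        return NotImplemented


class ValidationLevel(OrderedEnum):
    VALID = 1
    WARNING = 2
    ERROR = 3
```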

Does that make sense?


reece commented Feb 16, 2017

Original comment by Meng Wang (Bitbucket: icebert, GitHub: icebert):


Hi Reece,

I agree with these ideas. I'll try to implement the graded validator based on these ideas.


reece commented Mar 1, 2017

Original comment by Reece Hart (Bitbucket: reece, GitHub: reece):


Merged in pull request #58.

@reece reece added this to the 0.5.0 milestone Mar 9, 2017
@reece reece closed this as completed Mar 9, 2017