Skip to content

Add local validation framework for GEMD objects#161

Merged
kroenlein merged 9 commits intomainfrom
feature/improve-validation
Jan 6, 2022
Merged

Add local validation framework for GEMD objects#161
kroenlein merged 9 commits intomainfrom
feature/improve-validation

Conversation

@kroenlein
Copy link
Copy Markdown
Collaborator

Right now, all validation is performed server side with no constraints or feedback to users client side when types or values are inconsistent with templates. This PR does two things:

  1. It adds automatic validation code at every level of value assignment so that a user can receive feedback immediately when a bound is violated.
  2. It adds the ability for the user to control whether the inconsistencies with bounds are ignored, are a warning, or are fatal. The default here is Warning, which will make existing scripts noisier but will not break existing workflows. It is proposed that we consider changing the default to Fatal when we release gemd-python 2.0.

Note that there is no guarantee that an object can be validated since any template could just be referenced by LinkByUID, in which case we have no local knowledge of its bounds.

@sfriedowitz
Copy link
Copy Markdown

I am going to start looking at this one now @kroenlein

@bfolie
Copy link
Copy Markdown

bfolie commented Jan 3, 2022

From a quick glance, the logic looks fine. I think that the relevant discussion here is not how to do local validation but whether to do it at all.

There are good reasons to only do validation server-side: it localizes the validation logic in one place and the server has access to all of the relevant information, while the client may only know about link-by-uids. So if we're going to add client-side validation code then there should be a clear and compelling reason. What's the relevant user story for which this is a significant benefit? Is there a JIRA ticket? And if there is a relevant user story, could be solve the problem with a faster failure and/or more clear error message from the server?

@kroenlein
Copy link
Copy Markdown
Collaborator Author

kroenlein commented Jan 3, 2022

From a quick glance, the logic looks fine. I think that the relevant discussion here is not how to do local validation but whether to do it at all.

There are good reasons to only do validation server-side: it localizes the validation logic in one place and the server has access to all of the relevant information, while the client may only know about link-by-uids. So if we're going to add client-side validation code then there should be a clear and compelling reason. What's the relevant user story for which this is a significant benefit? Is there a JIRA ticket? And if there is a relevant user story, could be solve the problem with a faster failure and/or more clear error message from the server?

Sorry, I should have included this in the initial description:
https://citrine.atlassian.net/browse/PLA-5806

Comment thread gemd/entity/bounds_validation.py Outdated
Comment thread gemd/entity/object/has_parameters.py Outdated
@parameters.setter
def parameters(self, parameters):
self._parameters = validate_list(parameters, Parameter)
def _template_check(x: Parameter) -> Parameter:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This internal method _template_check is roughly identical to the above for conditions. Can we abstract it out and pass the accept condition as a lambda function maybe?

Copy link
Copy Markdown
Collaborator Author

@kroenlein kroenlein Jan 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The methods are very similar, but would require a closure for construction since (ignoring the type checking) they depend on HasXXXXTemplates and validate_XXXX. We can't refactor those into the same namespace since the same object can have multiple attribute types. There's also no mixin parent class here, though I can certainly build one (or just create a local util package) if DRY suggests this is the wiser route.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've created a centralized _template_check closure generator in HasTemplates, and then has HasSpec piggy back on it since I didn't want to implement it twice.

@bfolie
Copy link
Copy Markdown

bfolie commented Jan 3, 2022

Sorry, I should have included this in the initial description: https://citrine.atlassian.net/browse/PLA-5806

One of the last comments on that ticket is "This card was iced because this is best accomplished in the backend ("client validation is no validation")". Is this actually best accomplished in the client or are you doing it because nobody is picking up the backend work?

Copy link
Copy Markdown

@sfriedowitz sfriedowitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't study the tests in excruciating detail, but the validation logic seems okay with me.

I think it would be worth while to find a way to reduce code duplication on those internal methods. I also think validation_level makes more sense as the context manager if you feel so inclined to make that change.

Comment thread gemd/entity/object/material_spec.py Outdated
@kroenlein
Copy link
Copy Markdown
Collaborator Author

Sorry, I should have included this in the initial description: https://citrine.atlassian.net/browse/PLA-5806

One of the last comments on that ticket is "This card was iced because this is best accomplished in the backend ("client validation is no validation")". Is this actually best accomplished in the client or are you doing it because nobody is picking up the backend work?

We now have register_all w/ a validation check (dry_run) and most of its behaviors around large checks have been resolved with the batching work. This is better than what existed at the initial time of request. In order to do better, we would need to increase the batch size to arbitrarily large (10k?). So you are still left with a situation where an ingestor must run for hours before you get feedback that your cell was misformatted (a test that could be done in 10 minutes for a large dataset). As well, the information returned by the platform is well-separated from the actual data error, making it much harder to understand what actually needs fixing.

tldr; I'd say it's better accomplished in the client, and we've pushed server side to as good as it can get.

@kroenlein kroenlein requested a review from sfriedowitz January 4, 2022 15:55
Copy link
Copy Markdown

@sfriedowitz sfriedowitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the inheritance structure is much improved with this iteration.

I am requesting some additional comments on the critical method so that the next reader has an easier time with this.

Comment thread gemd/entity/object/has_template_check_generator.py
Comment thread gemd/entity/object/has_template_check_generator.py
@kroenlein kroenlein requested a review from sfriedowitz January 6, 2022 18:16
Comment thread gemd/entity/object/has_template_check_generator.py
Copy link
Copy Markdown

@sfriedowitz sfriedowitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kroenlein kroenlein merged commit f884247 into main Jan 6, 2022
@kroenlein kroenlein deleted the feature/improve-validation branch January 6, 2022 19:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants