Add local validation framework for GEMD objects by kroenlein · Pull Request #161 · CitrineInformatics/gemd-python

kroenlein · 2022-01-03T18:13:22Z

Right now, all validation is performed server-side, with no client-side constraints or feedback to users when types or values are inconsistent with templates. This PR does two things:

1. It adds automatic validation code at every level of value assignment, so that a user receives feedback immediately when a bound is violated.
2. It adds the ability for the user to control whether inconsistencies with bounds are ignored, raise a warning, or are fatal. The default here is Warning, which will make existing scripts noisier but will not break existing workflows. It is proposed that we consider changing the default to Fatal when we release gemd-python 2.0.

Note that there is no guarantee that an object can be validated since any template could just be referenced by LinkByUID, in which case we have no local knowledge of its bounds.
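For readers skimming the thread, the behavior described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: `Level`, `Link`, `RangeTemplate`, and `check_value` are hypothetical stand-ins, and the real gemd classes and function names may differ.

```python
import warnings
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional, Union


class Level(IntEnum):
    """Hypothetical reporting levels: ignore, warn, or fail."""
    IGNORE = 0
    WARNING = 1
    FATAL = 2


@dataclass
class Link:
    """Stand-in for a LinkByUID: names a template but carries no bounds."""
    scope: str
    id: str


@dataclass
class RangeTemplate:
    """Stand-in for a locally available template with real-valued bounds."""
    name: str
    lower: float
    upper: float


def check_value(value: float,
                template: Union[RangeTemplate, Link, None],
                level: Level = Level.WARNING) -> Optional[bool]:
    """Return True/False if the check ran, or None if it could not be run."""
    if template is None or isinstance(template, Link):
        return None  # bounds are not known locally, so nothing can be checked
    ok = template.lower <= value <= template.upper
    if not ok:
        msg = (f"{value} is outside [{template.lower}, {template.upper}] "
               f"for {template.name!r}")
        if level == Level.FATAL:
            raise ValueError(msg)
        if level == Level.WARNING:
            warnings.warn(msg)
    return ok
```

With a `RangeTemplate` in hand, an out-of-bounds value warns by default and raises under `Level.FATAL`; with only a `Link`, the function returns `None` because no local check is possible.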

sfriedowitz · 2022-01-03T20:56:42Z

I am going to start looking at this one now @kroenlein

bfolie · 2022-01-03T21:07:35Z

From a quick glance, the logic looks fine. I think that the relevant discussion here is not how to do local validation but whether to do it at all.

There are good reasons to only do validation server-side: it localizes the validation logic in one place and the server has access to all of the relevant information, while the client may only know about link-by-uids. So if we're going to add client-side validation code then there should be a clear and compelling reason. What's the relevant user story for which this is a significant benefit? Is there a JIRA ticket? And if there is a relevant user story, could we solve the problem with a faster failure and/or a clearer error message from the server?

kroenlein · 2022-01-03T21:18:58Z

> From a quick glance, the logic looks fine. I think that the relevant discussion here is not how to do local validation but whether to do it at all.
>
> There are good reasons to only do validation server-side: it localizes the validation logic in one place and the server has access to all of the relevant information, while the client may only know about link-by-uids. So if we're going to add client-side validation code then there should be a clear and compelling reason. What's the relevant user story for which this is a significant benefit? Is there a JIRA ticket? And if there is a relevant user story, could we solve the problem with a faster failure and/or a clearer error message from the server?

Sorry, I should have included this in the initial description:
https://citrine.atlassian.net/browse/PLA-5806

sfriedowitz · 2022-01-03T21:37:31Z

    @parameters.setter
    def parameters(self, parameters):
-        self._parameters = validate_list(parameters, Parameter)
+        def _template_check(x: Parameter) -> Parameter:


This internal method _template_check is roughly identical to the above for conditions. Can we abstract it out and pass the accept condition as a lambda function maybe?

The methods are very similar, but abstracting them would require a closure, since (ignoring the type checking) they depend on HasXXXXTemplates and validate_XXXX. We can't refactor those into the same namespace, since the same object can have multiple attribute types. There's also no mixin parent class here, though I can certainly build one (or just create a local util package) if DRY suggests this is the wiser route.

I've created a centralized _template_check closure generator in HasTemplates, and then had HasSpec piggyback on it, since I didn't want to implement it twice.
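As a rough illustration of the closure-generator idea discussed in this thread (not the implementation that was merged), the per-attribute differences can be captured as arguments to a factory that returns the checker. Everything here is hypothetical: `make_template_check`, `SpecSketch`, the `(lower, upper)` bounds tuples, and the `problems` list are stand-ins chosen to keep the sketch self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple, Type

Bounds = Tuple[float, float]


@dataclass
class Attribute:
    name: str
    value: float


class Condition(Attribute):
    pass


class Parameter(Attribute):
    pass


def make_template_check(
    attr_type: Type[Attribute],
    find_bounds: Callable[[Attribute], Optional[Bounds]],
    report: Callable[[str], None],
) -> Callable[[Attribute], Attribute]:
    """Build a checker; only the captured arguments differ per attribute type."""

    def _template_check(x: Attribute) -> Attribute:
        if not isinstance(x, attr_type):
            raise TypeError(f"Expected {attr_type.__name__}, got {type(x).__name__}")
        bounds = find_bounds(x)  # None means no local bounds are known
        if bounds is not None and not (bounds[0] <= x.value <= bounds[1]):
            report(f"{x.name}={x.value} violates bounds {bounds}")
        return x

    return _template_check


class SpecSketch:
    """Hypothetical object that holds both conditions and parameters."""

    def __init__(self, condition_bounds: Dict[str, Bounds],
                 parameter_bounds: Dict[str, Bounds]):
        self.problems: List[str] = []
        self._check_condition = make_template_check(
            Condition, lambda x: condition_bounds.get(x.name), self.problems.append)
        self._check_parameter = make_template_check(
            Parameter, lambda x: parameter_bounds.get(x.name), self.problems.append)
        self._conditions: List[Attribute] = []
        self._parameters: List[Attribute] = []

    @property
    def conditions(self) -> List[Attribute]:
        return self._conditions

    @conditions.setter
    def conditions(self, conditions: List[Attribute]) -> None:
        self._conditions = [self._check_condition(c) for c in conditions]

    @property
    def parameters(self) -> List[Attribute]:
        return self._parameters

    @parameters.setter
    def parameters(self, parameters: List[Attribute]) -> None:
        self._parameters = [self._check_parameter(p) for p in parameters]
```

The two setters share one checker body, so adding another attribute type in this sketch would only need another call to the factory.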

bfolie · 2022-01-03T21:44:42Z

> Sorry, I should have included this in the initial description: https://citrine.atlassian.net/browse/PLA-5806

One of the last comments on that ticket is "This card was iced because this is best accomplished in the backend ("client validation is no validation")". Is this actually best accomplished in the client or are you doing it because nobody is picking up the backend work?

sfriedowitz

I didn't study the tests in excruciating detail, but the validation logic seems okay to me.

I think it would be worthwhile to find a way to reduce code duplication in those internal methods. I also think validation_level makes more sense as the context manager, if you feel so inclined to make that change.
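To make the context-manager suggestion concrete, here is a minimal sketch of what a `validation_level` context manager could look like. Only the name `validation_level` comes from this review; `Level`, `get_level`, and the module-level `_current_level` are assumptions for illustration, and the signature in the PR may differ.

```python
import contextlib
from enum import IntEnum
from typing import Iterator


class Level(IntEnum):
    """Hypothetical reporting levels."""
    IGNORE = 0
    WARNING = 1
    FATAL = 2


# Module-level state; WARNING mirrors the default proposed in the description.
_current_level = Level.WARNING


def get_level() -> Level:
    """Return the level currently in effect."""
    return _current_level


@contextlib.contextmanager
def validation_level(level: Level) -> Iterator[None]:
    """Temporarily switch the level, restoring the old one on exit."""
    global _current_level
    previous = _current_level
    _current_level = level
    try:
        yield
    finally:
        # Restore even if the body raised, so a failure does not leak state.
        _current_level = previous
```

A caller could then write `with validation_level(Level.FATAL): ...` around an ingestion step and fall back to the previous level afterwards, including when the block exits with an exception.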

kroenlein · 2022-01-03T21:54:31Z

> > Sorry, I should have included this in the initial description: https://citrine.atlassian.net/browse/PLA-5806
>
> One of the last comments on that ticket is "This card was iced because this is best accomplished in the backend ("client validation is no validation")". Is this actually best accomplished in the client or are you doing it because nobody is picking up the backend work?

We now have register_all with a validation check (dry_run), and most of its problems with large checks have been resolved by the batching work. This is better than what existed when the ticket was first filed. To do better, we would need to increase the batch size to something arbitrarily large (10k?). So you are still left with a situation where an ingestor must run for hours before you get feedback that a cell was misformatted (a test that could be done in 10 minutes for a large dataset). Also, the information returned by the platform is well separated from the actual data error, which makes it much harder to understand what actually needs fixing.

tl;dr: I'd say it's better accomplished in the client, and we've pushed the server side to as good as it can get.

sfriedowitz

I think the inheritance structure is much improved with this iteration.

I am requesting some additional comments on the critical method so that the next reader has an easier time with this.

sfriedowitz

LGTM

kroenlein and others added 2 commits December 29, 2021 09:18

Added value validation logic with option reporting levels

b81c26e

validate_attribute now handles LinkByUIDs properly

7eecd07

kroenlein requested review from bfolie and sfriedowitz January 3, 2022 18:13

sfriedowitz reviewed Jan 3, 2022

Comment thread gemd/entity/bounds_validation.py Outdated

sfriedowitz reviewed Jan 3, 2022

Comment thread gemd/entity/object/material_spec.py Outdated

kroenlein and others added 3 commits January 3, 2022 17:07

Merge branch 'main' into feature/improve-validation

627c7b3

Merge branch 'main' into feature/improve-validation

647171c

Centralize template_check generator

c0a7464

kroenlein requested a review from sfriedowitz January 4, 2022 15:55

kroenlein added 3 commits January 5, 2022 15:01

Abstracted template_check_generator for cleaner inheritence

eb9f0a6

flake8 compliance

a93b7cc

Simplify inheritence for template check

3e1f3fc

sfriedowitz suggested changes Jan 6, 2022

Comment thread gemd/entity/object/has_template_check_generator.py

Comment thread gemd/entity/object/has_template_check_generator.py

Improve description of _generate_template_check

9673a20

kroenlein requested a review from sfriedowitz January 6, 2022 18:16

sfriedowitz reviewed Jan 6, 2022

Comment thread gemd/entity/object/has_template_check_generator.py

sfriedowitz approved these changes Jan 6, 2022

kroenlein merged commit f884247 into main Jan 6, 2022

kroenlein deleted the feature/improve-validation branch January 6, 2022 19:21
