Add local validation framework for GEMD objects#161
Conversation
|
I am going to start looking at this one now @kroenlein |
|
From a quick glance, the logic looks fine. I think that the relevant discussion here is not how to do local validation but whether to do it at all. There are good reasons to only do validation server-side: it localizes the validation logic in one place and the server has access to all of the relevant information, while the client may only know about link-by-uids. So if we're going to add client-side validation code then there should be a clear and compelling reason. What's the relevant user story for which this is a significant benefit? Is there a JIRA ticket? And if there is a relevant user story, could be solve the problem with a faster failure and/or more clear error message from the server? |
Sorry, I should have included this in the initial description: |
| @parameters.setter | ||
| def parameters(self, parameters): | ||
| self._parameters = validate_list(parameters, Parameter) | ||
| def _template_check(x: Parameter) -> Parameter: |
There was a problem hiding this comment.
This internal method _template_check is roughly identical to the above for conditions. Can we abstract it out and pass the accept condition as a lambda function maybe?
There was a problem hiding this comment.
The methods are very similar, but would require a closure for construction since (ignoring the type checking) they depend on HasXXXXTemplates and validate_XXXX. We can't refactor those into the same namespace since the same object can have multiple attribute types. There's also no mixin parent class here, though I can certainly build one (or just create a local util package) if DRY suggests this is the wiser route.
There was a problem hiding this comment.
I've created a centralized _template_check closure generator in HasTemplates, and then has HasSpec piggy back on it since I didn't want to implement it twice.
One of the last comments on that ticket is "This card was iced because this is best accomplished in the backend ("client validation is no validation")". Is this actually best accomplished in the client or are you doing it because nobody is picking up the backend work? |
sfriedowitz
left a comment
There was a problem hiding this comment.
I didn't study the tests in excruciating detail, but the validation logic seems okay with me.
I think it would be worth while to find a way to reduce code duplication on those internal methods. I also think validation_level makes more sense as the context manager if you feel so inclined to make that change.
We now have register_all w/ a validation check (dry_run) and most of its behaviors around large checks have been resolved with the batching work. This is better than what existed at the initial time of request. In order to do better, we would need to increase the batch size to arbitrarily large (10k?). So you are still left with a situation where an ingestor must run for hours before you get feedback that your cell was misformatted (a test that could be done in 10 minutes for a large dataset). As well, the information returned by the platform is well-separated from the actual data error, making it much harder to understand what actually needs fixing. tldr; I'd say it's better accomplished in the client, and we've pushed server side to as good as it can get. |
sfriedowitz
left a comment
There was a problem hiding this comment.
I think the inheritance structure is much improved with this iteration.
I am requesting some additional comments on the critical method so that the next reader has an easier time with this.
Right now, all validation is performed server side with no constraints or feedback to users client side when types or values are inconsistent with templates. This PR does two things:
Note that there is no guarantee that an object can be validated since any template could just be referenced by LinkByUID, in which case we have no local knowledge of its bounds.