Fix deserialization bug when JSON is not as nested as expected#325
Fix deserialization bug when JSON is not as nested as expected#325
Conversation
| if isinstance(value, dict): | ||
| value = value.get(field, self.default) | ||
| else: | ||
| value = self.default |
There was a problem hiding this comment.
In this instance we're deserializing the field 'report.descriptors'. If report key does not exist in the JSON then it takes the default value (edit: for the descriptors and models fields), which is an empty list in both cases. But it then tries to extract the descriptors key from that object, and an empty list does not have a get method.
The assumption that was being made here is that while the JSON might have missing terminal fields its basic structure is intact. That's not true in all cases.
There was a problem hiding this comment.
Why is the default value of "report" an empty list? I agree with @asantas93 that this fix is smelly.
There was a problem hiding this comment.
It's not. The default value of report.descriptors is an empty list. report isn't a field we deserialize.
There was a problem hiding this comment.
If report key does not exist in the JSON then it takes the default value, which is an empty list.
That's what I was commenting on.
Which of the three added tests was the one that was failing?
There was a problem hiding this comment.
I updated the comment with better prepositions.
They were all failing. report=None corresponds to what the backend returns for still-validating predictors.
| fields = self.serialization_path.split('.') | ||
| for field in fields: | ||
| value = value.get(field, self.default) | ||
| if isinstance(value, dict): |
There was a problem hiding this comment.
This check seems a little vague to me. Do we really want to ignore all non-dict values here? What if this is a string or a list? I would expect us to raise an exception in that case since the field was specified but (presumably) incorrect. Would value is None be a sufficient check for this fix?
There was a problem hiding this comment.
This code is only used for serialization paths that are more than one level deep, in which case we do want to ignore all non-dict values. Any time we get to a non-dict value at this point, it means that the JSON did not have the expected structure and we should return the default value.
I actually think I should have made it more strict: any time value isn't a dict or it is a dict but doesn't have key field, we should set value = self.default and break out of the loop.
There was a problem hiding this comment.
Any time we get to a non-dict value at this point, it means that the JSON did not have the expected structure and we should return the default value.
It's not obvious to me that it's appropriate to use a default value when the structure isn't correct. It makes sense to use a default value when an optional field is missing, but when the value is present and wrong, it probably indicates a bug that should be fixed rather than suppressed.
In the event that graceful recovery is appropriate (I would probably argue this is only when an optional field is missing), it does seem correct to break after falling back on the default value.
There was a problem hiding this comment.
In practice, in what case is it a string or a list? Aren’t we always expecting a dict? Agree on narrowing the scope of the check, in that throwing an exception is better so we don’t silently fail. Saying that, it is a programming error if it’s anything other than a dict or nothing, or am I missing something?
There was a problem hiding this comment.
when the value is present and wrong, it probably indicates a bug that should be fixed rather than suppressed.
I agree, and that would throw an exception. But in this case the value isn't present and wrong, it's just not present.
Let's say the serialization path 'foo.bar' holds an integer which we want to deserialize as the field 'baz' and the default value is 0. This would be encoded as baz = properties.Integer(serialization_path='foo.bar', default=0).
A typical JSON looks like:
{
'foo': {
'bar': 17
}
}
What if we get a non-integer as the value?
{
'foo': {
'bar': 'a string'
}
}
That's an error and this PR doesn't change that.
What if the field bar is missing?
{
'foo': {
}
}
We would use the default value. That was the behavior before and is still the behavior.
What if the field foo is missing?
{
}
Previously this would result in an error, but now we'll use the default value.
There was a problem hiding this comment.
I'm saying that you should use is None to check for absence. Otherwise you are allowing incorrect data to be treated the same as absent data.
There was a problem hiding this comment.
I think Andy be right that correct usage of this function will never lead to strings or lists here. I wouldn't necessarily try to write code that handles contrived scenarios -- I am just advocating for doing precisely the check that you want to do, which seems to be key absence based on what I've seen in this conversation so far.
There was a problem hiding this comment.
Also, to be clear, had I submitted this review before merge (which I thought I was doing at the time), I would have just left this as a comment. This wasn't intended as blocking feedback.
There was a problem hiding this comment.
In your example @bfolie I am specifically worried about a case where we try to extract foo.bar but we have
{"foo":"whyisthisnotadict"}
There was a problem hiding this comment.
I agree that it should always be a dict, and I agree that {"foo":"whyisthisnotadict"} should lead to an error. I've opened another PR and I'll include a test to check that that is the case.
Citrine Python PR
Description
Predictor report deserialization was failing because pending reports return an empty
reportsfield, and our deserialization code assumes that the JSON includes sufficiently-nested entries.PR Type:
Adherence to team decisions