This PR fixes a regression caused by #13375 where null ORC inputs would be processed into {} instead of null as expected.
The cause of the regression was allowing the nested types to be returned during conversion to support nested ingestion, which exposed an underlying oddity that explains why the values were ending up as empty maps instead of null.
The ORC json provider isMap method looks like this:
@Override
public boolean isMap(final Object o)
{
  return o == null || o instanceof Map || o instanceof OrcStruct;
}
which is a bit strange, but is consistent with the implementations for other nested formats. This means toPlainJavaObject treats null as a map for most types, which produces an empty map when converting to Java objects. I haven't quite discovered why these are implemented like this (if it was me, I cannot remember 😅), but to avoid changing the behavior here, toMap now checks for a null response from toPlainJavaObject and returns an empty map in that case, so that toPlainJavaObject itself no longer needs to translate null into an empty map.
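The null handling described above can be sketched roughly as follows. This is a minimal illustration and not the actual Druid code: the class name `NullAsMapSketch` is hypothetical, `OrcStruct` is left out so the example stays self-contained, and the exact placement of the null check is an assumption.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a format-specific JSON provider; not Druid's actual class.
class NullAsMapSketch
{
  // Keeps the quirk from the PR description: null is reported as a map.
  static boolean isMap(final Object o)
  {
    return o == null || o instanceof Map;
  }

  // Assumption: null is passed through before the isMap branch can turn it into {}.
  static Object toPlainJavaObject(final Object o)
  {
    if (o == null) {
      return null;
    }
    if (isMap(o)) {
      final Map<String, Object> converted = new LinkedHashMap<>();
      for (Map.Entry<?, ?> entry : ((Map<?, ?>) o).entrySet()) {
        converted.put(String.valueOf(entry.getKey()), toPlainJavaObject(entry.getValue()));
      }
      return converted;
    }
    return o;
  }

  // toMap keeps its old contract: a null conversion result becomes an empty map.
  @SuppressWarnings("unchecked")
  static Map<String, Object> toMap(final Object o)
  {
    final Object converted = toPlainJavaObject(o);
    if (converted == null) {
      return Collections.emptyMap();
    }
    if (converted instanceof Map) {
      return (Map<String, Object>) converted;
    }
    throw new IllegalArgumentException("not a map: " + converted.getClass().getName());
  }
}
```

In this sketch, callers of `toMap` still never see null, while values converted through `toPlainJavaObject` stay null instead of becoming empty maps.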
While writing a test for this I noticed that toPlainJavaObject could still leak format-specific types, since the fall-through value was not 'finalized' the way the values inside of maps and lists are. For example, processing a null input row could return the json NullNode, which would cause the sampler to explode. I'm unsure how common this case is, but it seems safer to finalize the fall-through value as well.
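A rough sketch of what finalizing the fall-through value means is shown below. All names here are hypothetical: `FormatNull` stands in for a format-specific type such as a JSON null node, and `finalizeConversion` is an assumed helper name, so treat this as an illustration of the idea and not as the code in this PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not Druid's actual conversion code.
class FinalizeFallThroughSketch
{
  // Stand-in for a format-specific null representation.
  static final class FormatNull
  {
    static final FormatNull INSTANCE = new FormatNull();

    private FormatNull()
    {
    }
  }

  // Translates format-specific values into plain Java values.
  static Object finalizeConversion(final Object o)
  {
    return o instanceof FormatNull ? null : o;
  }

  static Object toPlainJavaObject(final Object o)
  {
    if (o instanceof Map) {
      final Map<String, Object> converted = new LinkedHashMap<>();
      for (Map.Entry<?, ?> entry : ((Map<?, ?>) o).entrySet()) {
        converted.put(String.valueOf(entry.getKey()), toPlainJavaObject(entry.getValue()));
      }
      return converted;
    }
    if (o instanceof List) {
      final List<Object> converted = new ArrayList<>();
      for (Object element : (List<?>) o) {
        converted.add(toPlainJavaObject(element));
      }
      return converted;
    }
    // Fall-through case: finalize here too, so a top-level FormatNull cannot leak out.
    return finalizeConversion(o);
  }
}
```

Without the final `finalizeConversion` call, a top-level `FormatNull` would be returned unchanged, which is the kind of leak described above.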
This PR has:
been self-reviewed.
added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
LGTM
Although, due to the old behavior, would the sampler show an empty map for null values in top-level nested structures? They would still be ingested as nulls, though.