-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API, Flink: StructProjection returns null projection object for null nested struct value #7517
API, Flink: StructProjection returns null projection object for null nested struct value #7517
Conversation
…struct value. Adjusted the same behavior in Flink RowDataProjection.
bcc2c35
to
fd4bd67
Compare
@rdblue @aokolnychyi @RussellSpitzer @Reo-LEI @chenjunjiedada can you please take a look? I think it is a bit cleaner. not much benefit to have a projection object wrapping a null nested struct value. |
LGTM, Thanks @stevenzwu for digging and fixing! |
if (nestedProjections[pos] != null) { | ||
return javaClass.cast(nestedProjections[pos].wrap(struct.get(structPos, StructLike.class))); | ||
StructLike nestedStruct = struct.get(structPos, StructLike.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a potential npe here if "struct" is null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good question. this change avoided the wrapping null at nested level, which is the main reason we added the lines from 192-196 that are removed in this PR.
I guess it can potentially happen if we use StructProjection
to wrap a null struct
at the root level. If it is a valid scenario, we need to bring back line 192-196. If it is not a valid scenario, we should add Preconditions
check in the wrap method.
public StructProjection wrap(StructLike newStruct) {
this.struct = newStruct;
return this;
}
My take is that StructProjection shouldn't wrap a null root value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if we need to bring back line 192-196, it may still be good to return null projection for nested null struct
Struct(1, null)
->
before this change: StructProjection(1, StructProjection(null))
after this change: StructProjection(1, null)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree about the behavior to return nested nulls without wrapping but I am not sure it is a good idea to remove lines 192-196. This API is widely used and it does not seem like a totally weird use case to wrap top-level nulls. Otherwise, every caller would have to check if the passed value is null. If someone has to do that check, better hide it inside this class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This API is widely used and it does not seem like a totally weird use case to wrap top-level nulls.
that is a fair point. will bring back the null check. will update comments to reflect the latest state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RussellSpitzer @aokolnychyi I have a second thought. The behaviors of wrapping null root value seems a little odd. Note that StructProjection
is also a StructLike
.
struct = null
projection = StructProjection(null)
struct == null
would be true. but projection == null
would be false. StructProjection(null)
behaves more like an empty struct.
line 192-196 were originally added to deal with wrapping nested null struct (issue #2738 and PR #3240). the wrapping of nested null struct is avoided in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a closer look, you are probably right. Let me check usages but seems reasonable to remove that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any blockers. Let's remove those lines and get this in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with this change. The only issue is what happens when struct is null. That could still happen if wrap
is never called.
I think I would favor the original null check (rather than the precondition) and the new behavior to return null instead of a StructProjection
that wraps null. That way if somehow a null slips in, we at least just produce null. That seems to fit the intent as well, which is to return null
when there is no value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only issue is what happens when struct is null. That could still happen if wrap is never called.
This is a good point. will revert to the original null check then.
api/src/main/java/org/apache/iceberg/util/StructProjection.java
Outdated
Show resolved
Hide resolved
… even though this change prevented nested null struct from being wrapped.
…k for null struct, since it is avoided in this PR.
@@ -174,6 +174,7 @@ private StructProjection(StructType structType, StructType projection, boolean a | |||
} | |||
|
|||
public StructProjection wrap(StructLike newStruct) { | |||
Preconditions.checkState(newStruct != null, "Invalid root struct for wrapping: null"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: Can this cause any performance degradation? Can we check places where this is invoked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel the removal of if-check and adding Preconditions check should go hand-to-hand. Either we do both or don't change it all. Otherwise, we can get NPE for wrapping root null object.
One Preconditions check shouldn't cause performance degradation. Also, we removed one if check and added a new check in Preconditions. So it is net neutral in the end too.
Here are the usages on the StructProjection#wrap(StructLike)
inside Iceberg repo
- core: transform on
CloseableIterable<StructLike>
for manifest or file entries - data: delete filter on record
- flink: always wrap non-null
RowDataWrapper
object for EqualityFieldKeySelector - spark: always wrap non-null
InternalRowWrapper
object for wrapping partition data
@RussellSpitzer @rdblue do you see any potential null root values for the core and data module usage of wrap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The struct
field is defaulted to null
. Plus, a check here would create a failure rather than gracefully handling a null struct. If the struct is somehow null, then the projection should be as well right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rdblue you are saying wrapping a null root struct should be allowed (not throwing exception), right?
StructProjection projection = StructProjection.create(schema, projectedSchema);
projection.wrap(null); // allowed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. We probably want to double check places that rely on this class and see if the new precondition may cause any performance degradation in a critical path. If yes, I'd probably drop it. Up to you, @stevenzwu.
I am going to merge this PR. the only change on handling the nested null struct is everyone have agreed upon. We have kept the old behavior regarding the null root struct and null check. |
…merged. added Preconditions check for non-null root oject, as RowDataProjection never allowed it.
Adjusted the same behavior in Flink RowDataProjection.
closes issue #7507