Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API, Flink: StructProjection returns null projection object for null nested struct value #7517

Merged
merged 5 commits into from
May 12, 2023

Conversation

stevenzwu
Copy link
Contributor

Adjusted the same behavior in Flink RowDataProjection.

closes issue #7507

…struct value.

Adjusted the same behavior in Flink RowDataProjection.
@stevenzwu stevenzwu force-pushed the issue-7507-null-nested-struct branch from bcc2c35 to fd4bd67 Compare May 3, 2023 20:24
@stevenzwu
Copy link
Contributor Author

stevenzwu commented May 3, 2023

@rdblue @aokolnychyi @RussellSpitzer @Reo-LEI @chenjunjiedada can you please take a look? I think it is a bit cleaner. not much benefit to have a projection object wrapping a null nested struct value.

@chenjunjiedada
Copy link
Collaborator

LGTM, Thanks @stevenzwu for digging and fixing!

if (nestedProjections[pos] != null) {
return javaClass.cast(nestedProjections[pos].wrap(struct.get(structPos, StructLike.class)));
StructLike nestedStruct = struct.get(structPos, StructLike.class);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a potential npe here if "struct" is null?

Copy link
Contributor Author

@stevenzwu stevenzwu May 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question. this change avoided the wrapping null at nested level, which is the main reason we added the lines from 192-196 that are removed in this PR.

I guess it can potentially happen if we use StructProjection to wrap a null struct at the root level. If it is a valid scenario, we need to bring back line 192-196. If it is not a valid scenario, we should add Preconditions check in the wrap method.

  public StructProjection wrap(StructLike newStruct) {
    this.struct = newStruct;
    return this;
  }

My take is that StructProjection shouldn't wrap a null root value.

Copy link
Contributor Author

@stevenzwu stevenzwu May 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if we need to bring back line 192-196, it may still be good to return null projection for nested null struct

Struct(1, null)
->
before this change: StructProjection(1, StructProjection(null))
after this change:    StructProjection(1, null)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree about the behavior to return nested nulls without wrapping but I am not sure it is a good idea to remove lines 192-196. This API is widely used and it does not seem like a totally weird use case to wrap top-level nulls. Otherwise, every caller would have to check if the passed value is null. If someone has to do that check, better hide it inside this class.

Copy link
Contributor Author

@stevenzwu stevenzwu May 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API is widely used and it does not seem like a totally weird use case to wrap top-level nulls.

that is a fair point. will bring back the null check. will update comments to reflect the latest state.

Copy link
Contributor Author

@stevenzwu stevenzwu May 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RussellSpitzer @aokolnychyi I have a second thought. The behaviors of wrapping null root value seems a little odd. Note that StructProjection is also a StructLike.

struct = null
projection = StructProjection(null)

struct == null would be true. but projection == null would be false. StructProjection(null) behaves more like an empty struct.

line 192-196 were originally added to deal with wrapping nested null struct (issue #2738 and PR #3240). the wrapping of nested null struct is avoided in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a closer look, you are probably right. Let me check usages but seems reasonable to remove that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any blockers. Let's remove those lines and get this in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with this change. The only issue is what happens when struct is null. That could still happen if wrap is never called.

I think I would favor the original null check (rather than the precondition) and the new behavior to return null instead of a StructProjection that wraps null. That way if somehow a null slips in, we at least just produce null. That seems to fit the intent as well, which is to return null when there is no value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only issue is what happens when struct is null. That could still happen if wrap is never called.

This is a good point. will revert to the original null check then.

@stevenzwu stevenzwu changed the title API: StructProjection returns null projection object for null nested struct value API, Flink: StructProjection returns null projection object for null nested struct value May 5, 2023
… even though this change prevented nested null struct from being wrapped.
…k for null struct, since it is avoided in this PR.
@@ -174,6 +174,7 @@ private StructProjection(StructType structType, StructType projection, boolean a
}

public StructProjection wrap(StructLike newStruct) {
Preconditions.checkState(newStruct != null, "Invalid root struct for wrapping: null");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Can this cause any performance degradation? Can we check places where this is invoked?

Copy link
Contributor Author

@stevenzwu stevenzwu May 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel the removal of if-check and adding Preconditions check should go hand-to-hand. Either we do both or don't change it all. Otherwise, we can get NPE for wrapping root null object.

One Preconditions check shouldn't cause performance degradation. Also, we removed one if check and added a new check in Preconditions. So it is net neutral in the end too.

Here are the usages on the StructProjection#wrap(StructLike) inside Iceberg repo

  • core: transform on CloseableIterable<StructLike> for manifest or file entries
  • data: delete filter on record
  • flink: always wrap non-null RowDataWrapper object for EqualityFieldKeySelector
  • spark: always wrap non-null InternalRowWrapper object for wrapping partition data

@RussellSpitzer @rdblue do you see any potential null root values for the core and data module usage of wrap.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The struct field is defaulted to null. Plus, a check here would create a failure rather than gracefully handling a null struct. If the struct is somehow null, then the projection should be as well right?

Copy link
Contributor Author

@stevenzwu stevenzwu May 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rdblue you are saying wrapping a null root struct should be allowed (not throwing exception), right?

StructProjection projection = StructProjection.create(schema, projectedSchema);
projection.wrap(null); // allowed

Copy link
Contributor

@aokolnychyi aokolnychyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. We probably want to double check places that rely on this class and see if the new precondition may cause any performance degradation in a critical path. If yes, I'd probably drop it. Up to you, @stevenzwu.

@stevenzwu
Copy link
Contributor Author

I am going to merge this PR. the only change on handling the nested null struct is everyone have agreed upon. We have kept the old behavior regarding the null root struct and null check.

@stevenzwu stevenzwu merged commit b88951b into apache:master May 12, 2023
42 checks passed
stevenzwu added a commit to stevenzwu/iceberg that referenced this pull request May 12, 2023
…merged. added Preconditions check for non-null root oject, as RowDataProjection never allowed it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants