Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HUDI-3936] Fix projection for a nested field as pre-combined key #5379

Merged

Conversation

yihua
Copy link
Contributor

@yihua yihua commented Apr 21, 2022

What is the purpose of the pull request

This PR fixes the projection logic around a nested field which is used as the pre-combined key field. The fix is to only check and append the root level field for projection, i.e., "a", for a nested field "a.b.c" in the mandatory columns.

Brief change log

  • Changes the logic to check and append the root level field for a required nested field in the mandatory columns in HoodieBaseRelation.appendMandatoryColumns
  • Adds tests to guard around the behavior for a nested field used as pre-combined key.

Verify this pull request

This change adds tests in TestHoodieAvroUtils and TestMORDataSourceStorage. TestMORDataSourceStorage contains tests that use nested field "fare.currency" as the pre-combined key. Before this change, the tests with nest fields fail. After this PR, the tests pass.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

// For a nested field in mandatory columns, we should first get the root-level field, and then
// check for any missing column, as the requestedColumns should only contain root-level fields
// We should only append root-level field as well
val missing = mandatoryColumns.map(col => HoodieAvroUtils.getRootLevelFieldName(col))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do this filtering when we assign it (name mandatoryColumns is misleading otherwise)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand it correctly, do you mean to say we can do the filtering of mandatoryColumns upon initialization of the class? That's not possible since we need to do on-the-fly filtering based on the passed-in requestedColumns which may vary when buildScan is called.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think Alexey meant a readability improvement

@apache apache deleted a comment from hudi-bot Apr 21, 2022
@apache apache deleted a comment from hudi-bot Apr 21, 2022
@xushiyan xushiyan force-pushed the HUDI-3936-fix-nested-field-precombine-key branch from dd8db55 to d24bcd6 Compare April 21, 2022 12:02
Copy link
Member

@xushiyan xushiyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI failure is unrelated. LGTM

@apache apache deleted a comment from hudi-bot Apr 21, 2022
@yihua yihua force-pushed the HUDI-3936-fix-nested-field-precombine-key branch from d24bcd6 to 8d5d576 Compare April 21, 2022 20:00
@nsivabalan nsivabalan force-pushed the HUDI-3936-fix-nested-field-precombine-key branch from 8d5d576 to 5aae2f9 Compare April 21, 2022 20:53
@nsivabalan
Copy link
Contributor

@hudi-bot run azure

1 similar comment
@alexeykudinkin
Copy link
Contributor

@hudi-bot run azure

@hudi-bot
Copy link

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@nsivabalan nsivabalan merged commit c4bc2de into apache:master Apr 22, 2022
nsivabalan pushed a commit that referenced this pull request Apr 22, 2022
)

This PR fixes the projection logic around a nested field which is used as the pre-combined key field. The fix is to only check and append the root level field for projection, i.e., "a", for a nested field "a.b.c" in the mandatory columns.

- Changes the logic to check and append the root level field for a required nested field in the mandatory columns in HoodieBaseRelation.appendMandatoryColumns
xushiyan pushed a commit that referenced this pull request Apr 22, 2022
)

This PR fixes the projection logic around a nested field which is used as the pre-combined key field. The fix is to only check and append the root level field for projection, i.e., "a", for a nested field "a.b.c" in the mandatory columns.

- Changes the logic to check and append the root level field for a required nested field in the mandatory columns in HoodieBaseRelation.appendMandatoryColumns
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants