[HUDI-3936] Fix projection for a nested field as pre-combined key #5379
hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/HoodieSparkUtils.scala
```scala
// For a nested field in mandatory columns, we should first get the root-level field, and then
// check for any missing column, as the requestedColumns should only contain root-level fields
// We should only append root-level field as well
val missing = mandatoryColumns.map(col => HoodieAvroUtils.getRootLevelFieldName(col))
```
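For illustration, the root-level lookup described in the comment amounts to taking everything before the first dot of a dot-separated field path. The sketch below is a hypothetical Scala helper (`rootLevelFieldName` in an illustrative `FieldPaths` object), not the actual `HoodieAvroUtils.getRootLevelFieldName`, and it assumes nested fields are always written as dot-separated paths such as "a.b.c".

```scala
object FieldPaths {
  // Hypothetical helper, assuming nested fields are dot-separated paths like "a.b.c".
  // Returns the top-level column that contains the (possibly nested) field.
  def rootLevelFieldName(fieldPath: String): String = {
    val dot = fieldPath.indexOf('.')
    // A name without a dot is already a root-level field.
    if (dot < 0) fieldPath else fieldPath.substring(0, dot)
  }
}
```

Under these assumptions, `FieldPaths.rootLevelFieldName("fare.currency")` yields "fare", while a flat name such as "ts" is returned unchanged.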
Let's do this filtering when we assign it (the name `mandatoryColumns` is misleading otherwise).
If I understand correctly, do you mean that we can filter `mandatoryColumns` when the class is initialized? That's not possible, because the filtering has to happen on the fly against the passed-in `requestedColumns`, which may differ each time `buildScan` is called.
I think Alexey meant a readability improvement.
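One way to reconcile the two comments is to split the work: map the mandatory columns to root-level names once, when they are assigned, and keep only the comparison against `requestedColumns` per `buildScan` call, since that set changes between calls. The sketch below is a hypothetical, Spark-free Scala model of that split; the names `MandatoryColumns`, `mandatoryRootFields`, and `appendTo` are illustrative and not taken from the Hudi code base, and it assumes dot-separated nested paths.

```scala
// Hypothetical model: mandatory columns may be nested paths such as "fare.currency".
final class MandatoryColumns(mandatoryColumns: Seq[String]) {
  // Computed once at assignment: root-level names, de-duplicated,
  // e.g. Seq("_hoodie_record_key", "fare.currency") => Seq("_hoodie_record_key", "fare").
  val mandatoryRootFields: Seq[String] =
    mandatoryColumns.map(c => c.takeWhile(_ != '.')).distinct

  // Computed on every call: append only the root-level fields that the
  // caller did not already request, preserving the requested order.
  def appendTo(requestedColumns: Seq[String]): Seq[String] = {
    val missing = mandatoryRootFields.filterNot(f => requestedColumns.contains(f))
    requestedColumns ++ missing
  }
}
```

With this split, the name `mandatoryRootFields` states what the value holds, while the per-call method still sees whichever `requestedColumns` the current scan passes in.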
Updated from dd8db55 to d24bcd6
CI failure is unrelated. LGTM
Updated from d24bcd6 to 8d5d576
Updated from 8d5d576 to 5aae2f9
@hudi-bot run azure
This PR fixes the projection logic around a nested field which is used as the pre-combined key field. The fix is to only check and append the root-level field for projection, i.e., "a", for a nested field "a.b.c" in the mandatory columns.
- Changes the logic to check and append the root-level field for a required nested field in the mandatory columns in HoodieBaseRelation.appendMandatoryColumns
What is the purpose of the pull request
This PR fixes the projection logic for a nested field that is used as the pre-combined key field. The fix is to check and append only the root-level field for projection, i.e., "a" for a nested field "a.b.c" in the mandatory columns.
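To make the failure mode concrete, the hypothetical sketch below contrasts checking the full nested path against the requested columns with checking its root-level field. It assumes, as the review comment on `HoodieSparkUtils.scala` states, that `requestedColumns` holds only root-level names; it models columns as plain strings rather than Spark or Avro schemas, and the `ProjectionCheck` object is illustrative, not the actual Hudi code.

```scala
object ProjectionCheck {
  // Full-path variant: compares "a.b.c" against root-level names, so it never
  // matches and reports "a.b.c" itself as a missing top-level column.
  def missingByFullPath(requested: Seq[String], mandatory: Seq[String]): Seq[String] =
    mandatory.filterNot(c => requested.contains(c))

  // Root-field variant: reduces each mandatory field to its root-level name ("a")
  // before checking, so only genuinely absent root fields are reported.
  def missingByRootField(requested: Seq[String], mandatory: Seq[String]): Seq[String] =
    mandatory.map(_.takeWhile(_ != '.')).distinct.filterNot(c => requested.contains(c))
}
```

In this model, with requested columns "a" and "id" and the mandatory field "a.b.c", the full-path check reports "a.b.c" as missing even though its root column "a" is already requested, while the root-field check reports nothing; if "a" is not requested, only "a" is reported.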
Brief change log
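- Changes the logic to check and append the root-level field for a required nested field in the mandatory columns in `HoodieBaseRelation.appendMandatoryColumns`.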
Verify this pull request
This change adds tests in `TestHoodieAvroUtils` and `TestMORDataSourceStorage`. `TestMORDataSourceStorage` contains tests that use the nested field "fare.currency" as the pre-combined key. Before this change, the tests with nested fields fail; after this PR, they pass.
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.