
[HUDI-7908] Hotfix: HoodieFileGroupReader fails if preCombine and partition fields are the same #11656

Merged
codope merged 1 commit into apache:master from wombatu-kun:HUDI-7908_bugfix on Jul 20, 2024

@wombatu-kun (Contributor) commented Jul 19, 2024

Change Logs

From previous PR:
HoodieFileGroupReader failed with IllegalArgumentException: Field: ts does not exist in the table schema when the preCombine field and the partition field were the same. The reader requires precombineField, but it had been filtered out of dataSchema along with the other partition fields.

To fix this, I changed HoodieFileGroupReaderBasedParquetFileFormat so that it does not filter the partition column out of dataSchema when that column is the same as the preCombine field.
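The intended filtering rule can be illustrated with a minimal Java sketch. It is a simplified, hypothetical model, not the code in HoodieFileGroupReaderBasedParquetFileFormat: a schema is represented as a plain list of field names, and the class and method names (DataSchemaFilterSketch, dataSchemaFields) are invented for illustration. A field stays in the data schema unless it is a partition column, with an exception for the preCombine field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch (not Hudi's code): decide which table fields stay in
 * the data schema handed to the file-group reader.
 */
public class DataSchemaFilterSketch {

    // Keep a field if it is not a partition column, or if it is the
    // precombine field, which the merge logic still needs to read.
    static List<String> dataSchemaFields(List<String> tableFields,
                                         Set<String> partitionColumns,
                                         String precombineField) {
        List<String> kept = new ArrayList<>();
        for (String field : tableFields) {
            boolean isPartition = partitionColumns.contains(field);
            boolean isPrecombine = field.equals(precombineField); // false if precombineField is null
            if (!isPartition || isPrecombine) {
                kept.add(field);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Partition column and precombine field are both "ts": "ts" is kept.
        System.out.println(dataSchemaFields(List.of("id", "name", "ts"), Set.of("ts"), "ts"));
        // A partition column that is not the precombine field is still filtered out.
        System.out.println(dataSchemaFields(List.of("id", "dt", "ts"), Set.of("dt"), "ts"));
    }
}
```

Running main prints [id, name, ts] followed by [id, ts].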

However, that first fix was implemented incorrectly, as pointed out in this review discussion: https://github.com/apache/hudi/pull/11473/files#r1681098941

There were two mistakes:

  • the filtering condition used when evaluating dataSchema was wrong;
  • the options did not contain precombineField.
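The second mistake can be illustrated with another hypothetical Java sketch: the reader options should carry the precombine field, and if that field has been filtered out of the data schema, the lookup fails with the error message reported above. The class, the method names and the option key are assumptions made for illustration; the PR text does not say which key or code path the actual fix uses.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Hudi's code): ensure the reader options carry the
 * precombine field, and fail fast if the data schema lacks it.
 */
public class ReaderOptionsSketch {

    // Assumed option key, for illustration only.
    static final String PRECOMBINE_KEY = "hoodie.datasource.write.precombine.field";

    // Return a copy of the options with the precombine field added if it is missing.
    static Map<String, String> withPrecombineField(Map<String, String> options,
                                                   String precombineField) {
        Map<String, String> copy = new HashMap<>(options);
        if (precombineField != null && !precombineField.isEmpty()) {
            copy.putIfAbsent(PRECOMBINE_KEY, precombineField);
        }
        return copy;
    }

    // Reproduce the reported failure: the precombine field must be in the data schema.
    static void requireField(List<String> dataSchemaFields, String field) {
        if (!dataSchemaFields.contains(field)) {
            throw new IllegalArgumentException(
                "Field: " + field + " does not exist in the table schema");
        }
    }

    public static void main(String[] args) {
        Map<String, String> opts = withPrecombineField(Map.of(), "ts");
        String precombine = opts.get(PRECOMBINE_KEY);
        requireField(List.of("id", "name", "ts"), precombine); // passes
        try {
            // "ts" was filtered out of the data schema as a partition field
            requireField(List.of("id", "name"), precombine);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main prints Field: ts does not exist in the table schema. putIfAbsent is used so that a precombine field already present in the caller's options is not overridden.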

This PR fixes both.

Impact

The precombineField and the partition field can now be the same; this works both with local Spark and on a cluster.

Risk level (write none, low, medium or high below)

none

Documentation Update

none

  • The config description must be updated if new configs are added or the default values of configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the instructions to make changes to the website.

Contributor's checklist

  • Read through the contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

github-actions bot added the size:S (PR with lines of changes in (10, 100]) label Jul 19, 2024
@wombatu-kun wombatu-kun marked this pull request as draft July 19, 2024 13:54
@wombatu-kun force-pushed the HUDI-7908_bugfix branch 3 times, most recently from df61d27 to 459a581, July 19, 2024 22:14
@wombatu-kun wombatu-kun marked this pull request as ready for review July 19, 2024 22:16
@hudi-bot (Collaborator)

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@codope (Member) left a comment


Thanks for the fix!

@codope merged commit e4b2067 into apache:master Jul 20, 2024

3 participants