[HUDI-6217] Spark reads should skip record with delete operation metadata #10219

Merged
danny0405 merged 2 commits into apache:master from beyond1920:HUDI-6217
Dec 2, 2023

@beyond1920
Contributor

Change Logs

Currently, Spark reads return deleted records whose `_hoodie_operation` is `D`, which is unexpected.
See HUDI-6217 for more detail.
This PR fixes the bug.
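
As a rough illustration of the intended read behavior, here is a minimal sketch in plain Scala. It does not use Hudi or Spark APIs: `SketchRow`, `isDelete`, and `visibleRows` are hypothetical names, and the only fact taken from the description above is that `D` marks a delete.

```scala
object SkipDeleteSketch {
  // Hypothetical stand-in for a row read from a table; NOT Hudi's actual row type.
  final case class SketchRow(key: String, value: Int, hoodieOperation: String)

  // Assumption taken from the description above: "D" marks a delete.
  def isDelete(row: SketchRow): Boolean = row.hoodieOperation == "D"

  // Keep only the rows a reader should expose, dropping delete-marked ones.
  def visibleRows(rows: Seq[SketchRow]): Seq[SketchRow] = rows.filterNot(isDelete)
}
```

With this sketch, any row whose operation flag is `D` is dropped and every other row passes through unchanged.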

Impact

None

Risk level (write none, low, medium or high below)

None

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

  } else {
    val mergedRecordOpt = merge(curRow, updatedRecordOpt.get)
-   if (mergedRecordOpt.isEmpty) {
+   if (mergedRecordOpt.orNull == null) {
Contributor

Unnecessary change?

Contributor Author

@beyond1920 beyond1920 Dec 1, 2023

In the previous commit, `Some(null)` could appear here for a delete operation.
For better readability, I updated the PR in the latest commit to change the `map` call on the option to `flatMap`, which avoids `Some(null)`.
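
The `Some(null)` pitfall mentioned here can be reproduced with the Scala standard library alone. The following is a minimal sketch, not the PR's code: `Record`, `projectOrNull`, `viaMap`, and `viaFlatMap` are hypothetical names, and the null-returning projection only mimics the shape of the lambda under review.

```scala
object SomeNullDemo {
  // Hypothetical stand-in for a merged record; not a Hudi type.
  final case class Record(key: String, deleted: Boolean)

  // Hypothetical projection that returns null for deletes.
  def projectOrNull(r: Record): String = if (r.deleted) null else r.key

  // Option.map wraps whatever the function returns, so a null result becomes Some(null).
  def viaMap(opt: Option[Record]): Option[String] = opt.map(projectOrNull)

  // Option(...) turns null into None, so flatMap never yields Some(null).
  def viaFlatMap(opt: Option[Record]): Option[String] =
    opt.flatMap(r => Option(projectOrNull(r)))
}
```

For a deleted record, `viaMap` yields `Some(null)`, so an `isEmpty` check does not catch it and a check such as `orNull == null` is needed instead. `viaFlatMap` yields `None`, so a plain `isEmpty` check is sufficient.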

- projection.apply(r.getLeft.getData.asInstanceOf[InternalRow])
+ val data = r.getLeft.getData.asInstanceOf[InternalRow]
+ if (isDeleteData(data)) {
+   null
Contributor

Isn't the return type an `Option`?

Contributor Author

@beyond1920 beyond1920 Dec 1, 2023

It's the return type of the function passed to `map` on the option.
For better readability, I updated the PR in the latest commit to change the `map` call on the option to `flatMap`, which avoids `Some(null)`.

@hudi-bot
Collaborator

hudi-bot commented Dec 1, 2023

CI report:

Bot commands

@hudi-bot supports the following commands:
  • `@hudi-bot run azure`: re-run the last Azure build

@xuzifu666
Member

xuzifu666 commented Jan 18, 2024

@beyond1920 Hi, thanks for the PR. We have also run into this problem. Does the query issue only occur in the delete-operation scenario? In other words, inserted or updated data does not go missing?

@beyond1920
Contributor Author

Hi @xuzifu666, sorry, I didn't understand. What's the question?

4 participants