@codope codope commented Apr 3, 2025

Change Logs

The USP of partial updates is that users do not have to specify every field in the MERGE INTO command. However, with a global index and partition path updates, MERGE INTO fails because HoodieIndexUtils::mergeIncomingWithExistingRecordWithExpressionPayload expects the full record to be provided. This PR attempts to fix that by doing a partial merge and then building the full record, i.e. taking the merged record and filling in the missing fields from the existing record. The PR still uses the record merger API in both the above method and HoodieMergedReadHandle#doMergedRead. Ideally, we would use the file group reader in HoodieMergedReadHandle instead of the record merger. That is a larger refactoring; for now, an alternative is to simply error out with a message that MERGE INTO partial updates are not supported with a global index.
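
As a rough illustration of the "fill in the missing fields from the existing record" step, here is a minimal sketch that models records as plain maps instead of Avro records. The class `PartialMergeSketch`, the method `fillMissingFields`, and the sample field names are hypothetical and are not part of the Hudi API or of this PR's diff; the actual change operates on Avro `IndexedRecord`s and schemas.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Hudi code: records are modeled as field-name -> value maps.
public class PartialMergeSketch {

  /**
   * Builds a full record in schema order. Fields present in the partial
   * update take precedence; every other field is copied from the existing record.
   */
  public static Map<String, Object> fillMissingFields(
      List<String> schemaFields,
      Map<String, Object> existing,
      Map<String, Object> partialUpdate) {
    Map<String, Object> full = new LinkedHashMap<>();
    for (String field : schemaFields) {
      // containsKey (rather than a null check) lets an explicit null in the
      // update still override the existing value.
      Object value = partialUpdate.containsKey(field)
          ? partialUpdate.get(field)
          : existing.get(field);
      full.put(field, value);
    }
    return full;
  }

  public static void main(String[] args) {
    // Assumed example schema; "partition" stands in for the partition path field.
    List<String> schema = List.of("id", "name", "price", "partition");

    Map<String, Object> existing = new LinkedHashMap<>();
    existing.put("id", 1);
    existing.put("name", "a1");
    existing.put("price", 10.0);
    existing.put("partition", "2025-01-01");

    // The partial update only carries the key and the new partition path.
    Map<String, Object> update = new LinkedHashMap<>();
    update.put("id", 1);
    update.put("partition", "2025-01-02");

    System.out.println(fillMissingFields(schema, existing, update));
  }
}
```

In this sketch the update moves the record to a new partition while `name` and `price` are carried over from the existing record, which is the full record a global index needs when it writes the row under the new partition path.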

Impact

Fix Merge Into Partial Updates with Global Index

Risk level (write none, low, medium or high below)

low

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

  • The config description must be updated if new configs are added or the default values of configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instructions to make changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:M PR with lines of changes in (100, 300] label Apr 3, 2025
}

// At this point, result.getData() contains a partial record update.
IndexedRecord existingRecord = existing.toIndexedRecord(existingSchema, config.getProps())
Contributor

Let's hold off on any changes to the non-conventional merging logic that goes through the merger or otherwise bypasses the file group reader. We can simplify that logic when we unify the reader path.

@yihua yihua left a comment

MERGE INTO with partial updates now goes through the file group reader, which works with a global index; this was mainly fixed by #13600. Closing this PR as it is no longer needed.

@yihua yihua closed this Dec 17, 2025