[HUDI-8513] Fix equals in HoodieRecordGlobalLocation #12255

Merged
yihua merged 1 commit into apache:master from linliu-code:fix_record_location
Nov 15, 2024
Conversation

@linliu-code
Collaborator

Change Logs

This PR removes position from the equals function of HoodieRecordGlobalLocation. With position included, the workload profile can grow very large and cause an OOM issue.
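To illustrate the concern, here is a minimal sketch. It assumes the workload profile keeps one map entry per distinct location, and it uses a hypothetical, simplified `Loc` class rather than the real HoodieRecordGlobalLocation. The `comparePosition` flag exists only in this sketch, to switch between the old and new equals behavior.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class LocationGroupingSketch {

  // Hypothetical stand-in for a global record location; not the Hudi class.
  static final class Loc {
    final String partitionPath;
    final String instantTime;
    final String fileId;
    final long position;
    final boolean comparePosition; // sketch-only switch: old (true) vs. new (false) equals

    Loc(String partitionPath, String instantTime, String fileId,
        long position, boolean comparePosition) {
      this.partitionPath = partitionPath;
      this.instantTime = instantTime;
      this.fileId = fileId;
      this.position = position;
      this.comparePosition = comparePosition;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof Loc)) {
        return false;
      }
      Loc other = (Loc) o;
      boolean sameFile = Objects.equals(partitionPath, other.partitionPath)
          && Objects.equals(instantTime, other.instantTime)
          && Objects.equals(fileId, other.fileId);
      return comparePosition ? sameFile && position == other.position : sameFile;
    }

    @Override
    public int hashCode() {
      // position is excluded from the hash in both modes
      return Objects.hash(partitionPath, instantTime, fileId);
    }
  }

  // Groups `records` updates that all target the same file and returns the number of map keys.
  static int countGroups(int records, boolean comparePosition) {
    Map<Loc, Long> counts = new HashMap<>();
    for (int i = 0; i < records; i++) {
      Loc loc = new Loc("2024/11/14", "20241114000000", "file-1", i, comparePosition);
      counts.merge(loc, 1L, Long::sum);
    }
    return counts.size();
  }

  public static void main(String[] args) {
    System.out.println(countGroups(1000, true));  // one key per record
    System.out.println(countGroups(1000, false)); // one key per file
  }
}
```

With position compared, every updated record becomes its own key, so the map grows with the record count instead of the file count.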

Impact

Better performance when using global indexes.

Risk level (write none, low, medium or high below)

Low.

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the instruction to make changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:S PR with lines of changes in (10, 100] label Nov 14, 2024
   && Objects.equals(instantTime, otherLoc.instantTime)
-  && Objects.equals(fileId, otherLoc.fileId)
-  && Objects.equals(position, otherLoc.position);
+  && Objects.equals(fileId, otherLoc.fileId);
Member

Are there any side effects wrt positional deletes if we remove position from equals? Also, don't we need to keep the same fields in hashCode as well?

Collaborator Author

The hashCode function does not use position in the first place. I do not see any use case that needs it. HoodieRecordLocation has the same logic.
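The point about hashCode can be checked with a small sketch. The `Loc` class below is a hypothetical simplification, not the actual Hudi class. It assumes equals and hashCode both use only the partition path, instant time, and file ID, as described in this thread.

```java
import java.util.Objects;

public class EqualsHashCodeSketch {

  // Hypothetical stand-in: position is stored but takes no part in equality or hashing.
  static final class Loc {
    final String partitionPath;
    final String instantTime;
    final String fileId;
    final long position;

    Loc(String partitionPath, String instantTime, String fileId, long position) {
      this.partitionPath = partitionPath;
      this.instantTime = instantTime;
      this.fileId = fileId;
      this.position = position;
    }

    long getPosition() {
      return position;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof Loc)) {
        return false;
      }
      Loc other = (Loc) o;
      return Objects.equals(partitionPath, other.partitionPath)
          && Objects.equals(instantTime, other.instantTime)
          && Objects.equals(fileId, other.fileId);
    }

    @Override
    public int hashCode() {
      return Objects.hash(partitionPath, instantTime, fileId);
    }
  }

  // The Object contract: objects that are equal must report the same hash code.
  static boolean contractHolds(Loc a, Loc b) {
    return !a.equals(b) || a.hashCode() == b.hashCode();
  }

  public static void main(String[] args) {
    Loc a = new Loc("p", "001", "f1", 10L);
    Loc b = new Loc("p", "001", "f1", 42L);
    System.out.println(a.equals(b));         // same file, different position
    System.out.println(contractHolds(a, b));
    System.out.println(a.getPosition() + " " + b.getPosition()); // positions stay readable
  }
}
```

A hashCode that covers fewer fields than equals is still legal in Java, though it causes more collisions. Aligning the two on the same fields avoids those collisions.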

Contributor

If the position is not even used anywhere, we had better remove it from the class. The only usage is from HoodieKeyLocationFetchHandle.globalLocations, where getHoodieKeyIterator should be used instead of fetchRecordKeysWithPositions.

Collaborator Author

I meant the position is used from this class, but there should not be any cases where we need to group records at position level.

Contributor

I didn't see any usage of the position outside this class. Can you confirm that?

Contributor

HoodieGlobalSimpleIndex#tagLocationInternal uses fetchRecordGlobalLocations, which puts the position of records into the HoodieRecordGlobalLocation instances. Such an instance is later put into the corresponding HoodieRecord instance if the record is an update.
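The flow described in this comment can be sketched roughly as follows. All class and method names here (`GlobalLoc`, `Rec`, `tag`) are hypothetical stand-ins, not Hudi APIs. The sketch assumes existing locations are looked up by record key and that an incoming record with a match is treated as an update.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TagLocationSketch {

  // Hypothetical location that carries the record's position within its file.
  static final class GlobalLoc {
    final String partitionPath;
    final String fileId;
    final long position;

    GlobalLoc(String partitionPath, String fileId, long position) {
      this.partitionPath = partitionPath;
      this.fileId = fileId;
      this.position = position;
    }
  }

  // Hypothetical incoming record; a null location means it is an insert.
  static final class Rec {
    final String key;
    GlobalLoc currentLocation;

    Rec(String key) {
      this.key = key;
    }

    boolean isUpdate() {
      return currentLocation != null;
    }
  }

  // Attaches the existing location (position included) to each incoming key that has one.
  static List<Rec> tag(List<String> incomingKeys, Map<String, GlobalLoc> existing) {
    List<Rec> tagged = new ArrayList<>();
    for (String key : incomingKeys) {
      Rec rec = new Rec(key);
      rec.currentLocation = existing.get(key);
      tagged.add(rec);
    }
    return tagged;
  }

  public static void main(String[] args) {
    Map<String, GlobalLoc> existing = new HashMap<>();
    existing.put("k1", new GlobalLoc("p1", "file-1", 7L));

    List<Rec> tagged = tag(List.of("k1", "k2"), existing);
    for (Rec rec : tagged) {
      String pos = rec.isUpdate() ? String.valueOf(rec.currentLocation.position) : "n/a";
      System.out.println(rec.key + " update=" + rec.isUpdate() + " position=" + pos);
    }
  }
}
```

In this sketch the position travels with the record as plain data, so it stays available after tagging whether or not equals compares it.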

@linliu-code
Collaborator Author

Re-trigger the CI.

@yihua yihua changed the title from "[HUDI-8513] Fix a bug in HoodieRecordGlobalLocation" to "[HUDI-8513] Fix equals in HoodieRecordGlobalLocation" Nov 15, 2024
@yihua yihua merged commit 1c2ca4a into apache:master Nov 15, 2024