[MINOR] Add rowId field to HoodieRecordIndexInfo in metadata payload#9286

Closed
codope wants to merge 3 commits into apache:master from codope:rli-schema-rowid

@codope
Member

@codope codope commented Jul 26, 2023

Change Logs

Add a rowId field of type long to HoodieRecordIndexInfo in the metadata payload. The rowId is not read or written yet; its default is 0L. In the future, we want the record index to map a record to a file and a rowId (indicating the offset within a page in a column chunk), at which point this field will be useful.

Impact

None. Just an addition of a long field which should be backwards compatible.
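
The compatibility claim rests on Avro's schema-resolution rule: when the reader's schema has a field that the writer's data lacks, the reader fills in that field's default. Below is a minimal sketch of that rule as a toy model in plain Python. It does not use the Avro library or the actual Hudi schema; the schemas are simplified to (name, default) pairs, and every field name other than rowId is a placeholder chosen for illustration.

```python
# Toy model of the reader-default rule, not the Avro library itself.
# Each "schema" is a list of (field name, default) pairs; NO_DEFAULT marks
# a field that has no default value.
NO_DEFAULT = object()

# Placeholder fields standing in for the existing record index schema.
OLD_SCHEMA = [("fileIndex", NO_DEFAULT), ("instantTime", NO_DEFAULT)]
# The newer schema appends rowId with a default of 0.
NEW_SCHEMA = OLD_SCHEMA + [("rowId", 0)]


def resolve(record, reader_schema):
    """Project a decoded record onto the reader's schema.

    - A field present in the record is copied as-is.
    - A field missing from the record takes the reader's default.
    - A missing field without a default is an error.
    - Fields the reader does not know about are dropped.
    """
    resolved = {}
    for name, default in reader_schema:
        if name in record:
            resolved[name] = record[name]
        elif default is not NO_DEFAULT:
            resolved[name] = default
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return resolved


# A record written before rowId existed, read with the newer schema.
old_record = {"fileIndex": 3, "instantTime": 20230726061000}
upgraded = resolve(old_record, NEW_SCHEMA)

# A record written with the newer schema, read with the older schema.
new_record = {"fileIndex": 1, "instantTime": 2, "rowId": 7}
downgraded = resolve(new_record, OLD_SCHEMA)
```

In this model, old records gain rowId = 0 when read with the newer schema, and older readers simply ignore rowId on new records. That is why a new field with a default can be added without rewriting existing data.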

Risk level (write none, low, medium or high below)

none

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@codope codope marked this pull request as ready for review July 26, 2023 06:10
@danny0405
Contributor

If the addition of a long field is backwards compatible, can we introduce it once the rowId is actually used?
Can we also add some tests for it?

@codope
Member Author

codope commented Jul 27, 2023

@danny0405 We will probably be using rowId in this release itself for positional deletes. cc @yihua
I've added a simple test to read record index payload with older schema using newer schema.

@codope codope force-pushed the rli-schema-rowid branch from 8ab387b to f5e1525 Compare July 27, 2023 09:58
@hudi-bot
Collaborator

CI report:

Bot commands

@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@SteNicholas
Member

@codope, could the rowId field be added as a meta column?

@codope
Member Author

codope commented Aug 2, 2023

@codope, could the rowId field be added as a meta column?

The intention behind adding this to the metadata table was to efficiently locate the offset within a page. If we add it as a meta column, we still need to add it separately to the metadata table. Is there a particular reason or use case where you think this field would be useful as a meta column?

@SteNicholas
Member

@codope, append mode deduplication may use this field.

Contributor

@danny0405 danny0405 left a comment

We might need to think this through before we move in this direction.

@nsivabalan nsivabalan closed this Aug 4, 2023