[HUDI-7665] Support rolling upgrade to table version 8#12250

Closed
codope wants to merge 2 commits into apache:master from codope:hudi-7665-table-props

@codope
Member

@codope codope commented Nov 13, 2024

Change Logs

  • Migrate table properties, including partition fields, key generators, payload type and bootstrap index type, handling both upgrade and downgrade.
  • Migrate the timeline to the new layout: a) convert the archived timeline to the LSM timeline layout, b) read both JSON and Avro commit metadata, c) rename instants (including the clustering action). These are all done for upgrade. For downgrade, I need to write an LSM-to-legacy archived timeline (v1) writer.
  • Fully compact the table to get rid of log files, for both upgrade and downgrade.
  • Drop version 7.
  • Add some tests for the above.

TODO:

  • LSM-to-legacy archived timeline writer for use in downgrade.
  • Migration path for CDC and incremental queries.
  • Handle differences between 0.x and 1.x for operations needed during upgrade, e.g. compaction (need to compact older file slices) and rollback (marker differences between 0.14 and 0.15). If we compact and delete any leftover markers, it might be okay, though. Need to test these scenarios.

Impact

Support rolling upgrade to table version 8.

Risk level (write none, low, medium or high below)

high

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the instruction to make changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@github-actions github-actions bot added the size:L PR with lines of changes in (300, 1000] label Nov 13, 2024
String instantTime, SupportsUpgradeDowngrade upgradeDowngradeHelper) {
HoodieTable table = upgradeDowngradeHelper.getTable(config, context);
HoodieTableMetaClient metaClient = table.getMetaClient();
HoodieTableConfig tableConfig = metaClient.getTableConfig();
Contributor

We can trigger a rollback for pending instants first, then explicitly remove any remaining log file markers.
This way we can clean up the table as much as possible before any other steps.

Member Author

Ack, changed the code to roll back and compact in one step.
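
The cleanup order discussed in this thread (roll back pending instants, remove leftover markers, then compact) can be sketched as follows. This is only an illustration under stated assumptions: `TableCleanupOps` and `PreUpgradeCleanup` are hypothetical names introduced here and are not Hudi APIs; the real change in this PR goes through Hudi's own table and upgrade/downgrade helper classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical abstraction over the table operations named in the review; not a Hudi API. */
interface TableCleanupOps {
  List<String> pendingInstants();

  void rollback(String instantTime);

  void deleteLeftoverMarkers();

  void compact();
}

/** Hypothetical helper that runs the cleanup steps in the order suggested in the review. */
final class PreUpgradeCleanup {
  private PreUpgradeCleanup() {
  }

  /** Runs the steps in order and returns a log of what was executed. */
  static List<String> run(TableCleanupOps ops) {
    List<String> steps = new ArrayList<>();
    // 1. Roll back every pending instant first (copy the list in case rollback mutates it).
    for (String instant : new ArrayList<>(ops.pendingInstants())) {
      ops.rollback(instant);
      steps.add("rollback:" + instant);
    }
    // 2. Explicitly remove any markers that the rollbacks left behind.
    ops.deleteLeftoverMarkers();
    steps.add("delete-markers");
    // 3. Compact only after the table has been cleaned up.
    ops.compact();
    steps.add("compact");
    return steps;
  }
}
```

Keeping the order in one place makes it easy to check in a unit test that compaction never runs before the rollbacks.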

throw new HoodieException(e);
}
};
lsmTimelineWriter.write(Collections.singletonList(ActiveAction.fromInstants(archivedTimeline.getInstants())),
Contributor

The list of instants in the archived timeline is huge. We need to split it into small batches and write to the LSM timeline batch by batch. By default, just use 10 instants per batch, which is in line with the current behavior.

Contributor

Another concern is that the legacy archived timeline may contain an enormous number of instants (several GBs of Avro logs), so loading the whole legacy archived timeline would be very time-consuming. Maybe we should load only the latest Avro log, which should be enough for file slicing.

Member Author

Yes, I thought about it and I agree. I wanted to quickly do some local testing to validate the upgrade path. I will do the batching soon.
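
The batching suggested in this thread could look roughly like the sketch below. It is generic Java, not the code from this PR: `InstantBatcher` and its methods are hypothetical, and the `Consumer` stands in for whatever per-batch write to the LSM timeline the upgrade path ends up using. The batch size of 10 is the default mentioned in the review.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper, not part of Hudi: splits a list into fixed-size batches. */
final class InstantBatcher {
  // Batch size of 10 follows the default mentioned in the review comment.
  static final int DEFAULT_BATCH_SIZE = 10;

  private InstantBatcher() {
  }

  /** Returns consecutive copies of sublists with at most batchSize elements, in the original order. */
  static <T> List<List<T>> partition(List<T> items, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<List<T>> batches = new ArrayList<>();
    for (int start = 0; start < items.size(); start += batchSize) {
      int end = Math.min(start + batchSize, items.size());
      batches.add(new ArrayList<>(items.subList(start, end)));
    }
    return batches;
  }

  /** Hands each batch to the writer in turn and returns the number of batches written. */
  static <T> int writeInBatches(List<T> items, int batchSize, Consumer<List<T>> batchWriter) {
    List<List<T>> batches = partition(items, batchSize);
    batches.forEach(batchWriter);
    return batches.size();
  }
}
```

Note that this only bounds the size of each write. It assumes the instants are already in memory, so it does not address the separate concern about the cost of loading the whole legacy archived timeline.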


// Migrate the LSM timeline back to the old archived timeline format
try {
// TODO: Convert instants from the LSM format to the old format
Contributor

We need timeline archiver v1 from Balaji's PR. cc @bvaradar

Member Author

Ack. For now, I've ported over some legacy archiver code (copied from #11923) for local testing. Once that PR lands, I will rebase.

@codope codope force-pushed the hudi-7665-table-props branch from af2c110 to 4cfaa0a Compare November 14, 2024 15:27
@github-actions github-actions bot added size:XL PR with lines of changes > 1000 and removed size:L PR with lines of changes in (300, 1000] labels Nov 14, 2024
@codope codope force-pushed the hudi-7665-table-props branch 3 times, most recently from 12271aa to 9d9683f Compare November 16, 2024 14:05
}

@Override
public int archiveInstants(HoodieEngineContext context, List<HoodieInstant> instantsToArchive, boolean acquireLock) throws IOException {
Contributor

It looks like this acquireLock is always false.

Member Author

It is true in the UpgradeDowngradeUtils methods upgradeToLSMTimeline and downgradeFromLSMTimeline.

Contributor

Why do we need a lock for upgrade/downgrade? Is there any concurrency here?
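
For readers following the `acquireLock` discussion, this is a minimal sketch of what such a flag usually controls. It is an assumption-laden illustration, not the archiver from this PR: `GuardedArchiver` is a hypothetical class, `java.util.concurrent.locks.Lock` stands in for Hudi's own locking mechanism, and the "archiving" is reduced to counting instants.

```java
import java.util.List;
import java.util.concurrent.locks.Lock;

/** Hypothetical stand-in for an archiver whose caller decides whether to take the lock. */
final class GuardedArchiver {
  private final Lock lock;

  GuardedArchiver(Lock lock) {
    this.lock = lock;
  }

  /** Archives the given instants, taking the lock only when the caller asks for it. */
  int archiveInstants(List<String> instantsToArchive, boolean acquireLock) {
    if (acquireLock) {
      lock.lock();
    }
    try {
      // Placeholder for the real archiving work: report how many instants were handled.
      return instantsToArchive.size();
    } finally {
      // Release only what this call acquired.
      if (acquireLock) {
        lock.unlock();
      }
    }
  }
}
```

Whether the upgrade and downgrade paths need to pass `true` depends on whether another writer can touch the timeline at the same time, which is the open question in this thread.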

@codope codope force-pushed the hudi-7665-table-props branch from 9d9683f to f5b1aa8 Compare November 17, 2024 15:31
@codope codope force-pushed the hudi-7665-table-props branch from f5b1aa8 to 56bac12 Compare November 20, 2024 17:10
@github-actions github-actions bot added size:L PR with lines of changes in (300, 1000] and removed size:XL PR with lines of changes > 1000 labels Nov 20, 2024
@hudi-bot
Collaborator

CI report:


@codope
Member Author

codope commented Nov 24, 2024

Closing in favor of #12327

@codope codope closed this Nov 24, 2024