[HUDI-6652] Implement basePath-level synchronization in runHoodieMetaSync#9374

Merged
codope merged 1 commit into apache:master from codope:concurrent-meta-sync
Aug 6, 2023

codope (Member) commented Aug 5, 2023

Change Logs

This PR introduces a targeted synchronization mechanism based on the targetBasePath in the runHoodieMetaSync method. The previous class-level synchronization has been replaced with finer-grained locking, allowing concurrent processing for different base paths. Added a unit test to check that two syncs can run concurrently when they target different base paths.
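For readers unfamiliar with the pattern, the per-base-path locking described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class `PerPathSync` and the methods `lockFor` and `runExclusive` are hypothetical names, and only the `TABLE_LOCKS.computeIfAbsent(...)` idiom is taken from the diff under review.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper (not Hudi code): serializes work per base path only,
// so callers with different base paths do not block each other.
final class PerPathSync {
  // One lock per base path. Entries are never evicted in this sketch.
  private static final ConcurrentHashMap<String, Lock> TABLE_LOCKS = new ConcurrentHashMap<>();

  private PerPathSync() {}

  // Returns the same lock instance for the same base path.
  static Lock lockFor(String targetBasePath) {
    return TABLE_LOCKS.computeIfAbsent(targetBasePath, k -> new ReentrantLock());
  }

  // Runs the action while holding only the lock for this base path.
  static void runExclusive(String targetBasePath, Runnable syncAction) {
    Lock tableLock = lockFor(targetBasePath);
    tableLock.lock();
    try {
      syncAction.run();
    } finally {
      // Always release, even if the action throws.
      tableLock.unlock();
    }
  }
}
```

Under these assumptions, two calls to `runExclusive` with the same base path run one after the other, while calls with different base paths can proceed in parallel.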

Impact

With this change, meta sync for one table no longer blocks meta sync for other tables in the multi-table deltastreamer.

Risk level (write none, low, medium or high below)

low

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

codope added the component:catalog-sync, priority:blocker, and release-0.14.0 labels on Aug 5, 2023
}

// Get or create a lock for the specific table
Lock tableLock = TABLE_LOCKS.computeIfAbsent(targetBasePath, k -> new ReentrantLock());
Contributor

What are your thoughts on keying off of the table base path and the syncToolClassName to allow concurrent updates to multiple meta syncs?

Member Author

I noticed that for the same table, different sync tools run one after the other, so I did not see the need to concatenate the tool class to the key. In the future, if we plan to run everything concurrently, we can add the tool class to the key.

Contributor

yeah, we can always extend it with more sophisticated requests.
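
The extension discussed in this thread, keying the lock on both the base path and the sync tool class, might look something like the sketch below. It is not part of this PR, which keys on the base path alone; `PerToolSyncLocks` and `lockFor` are hypothetical names. A two-element list is used as the key instead of string concatenation so that no separator character can cause accidental key collisions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical variant (not Hudi code): one lock per (base path, sync tool class) pair.
final class PerToolSyncLocks {
  // List equality and hashing are element-wise, so the pair works as a composite key.
  private static final ConcurrentHashMap<List<String>, Lock> LOCKS = new ConcurrentHashMap<>();

  private PerToolSyncLocks() {}

  static Lock lockFor(String targetBasePath, String syncToolClassName) {
    List<String> key = Arrays.asList(targetBasePath, syncToolClassName);
    return LOCKS.computeIfAbsent(key, k -> new ReentrantLock());
  }
}
```

With such a key, different sync tools for the same table would no longer serialize on a shared lock, which only helps if the caller actually runs them concurrently.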

@apache apache deleted a comment from hudi-bot Aug 6, 2023
@codope codope merged commit cecd79e into apache:master Aug 6, 2023