[HUDI-2862] Table upgrade and metadata table bootstrap under the table lock #4124

manojpec
Contributor

What is the purpose of the pull request

Today, the need for a table upgrade is detected by the first write client
that runs after the upgrade. If a write concurrency mode is configured, that
client grabs the transaction lock to protect the table upgrade from other
concurrent writers. However, the follow-on metadata table creation and the
initial bootstrapping need similar global protection to avoid races between
inflight commits and the metadata table bootstrapping process. This change
performs the table upgrade, the follow-on metadata table creation, and
thereby the initial bootstrapping under the table-level lock to avoid
potential races with concurrent writers and other async table services.
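
A minimal sketch of the idea described above, assuming a plain `ReentrantLock` as a stand-in for the table-level transaction lock. `TableInitSketch` and its step names are hypothetical and are not Hudi classes; the sketch only illustrates that the upgrade and the metadata bootstrap run inside the same critical section when a concurrency mode is configured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; this is not Hudi's write client.
class TableInitSketch {
    // Stand-in for the table-level transaction lock (assumption).
    private final ReentrantLock tableLock = new ReentrantLock();
    private final boolean concurrencyEnabled;

    TableInitSketch(boolean concurrencyEnabled) {
        this.concurrencyEnabled = concurrencyEnabled;
    }

    // Runs the upgrade and the metadata bootstrap as one unit.
    // Each recorded step notes whether the lock was held while it ran.
    List<String> upgradeAndBootstrap() {
        List<String> steps = new ArrayList<>();
        if (concurrencyEnabled) {
            tableLock.lock();
        }
        try {
            steps.add("upgrade:locked=" + tableLock.isHeldByCurrentThread());
            steps.add("bootstrap:locked=" + tableLock.isHeldByCurrentThread());
        } finally {
            if (concurrencyEnabled) {
                tableLock.unlock();
            }
        }
        return steps;
    }

    boolean isLocked() {
        return tableLock.isLocked();
    }
}
```

With `concurrencyEnabled` set, both steps report the lock as held, and the lock is released afterwards even if a step throws.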

Brief change log

(for example:)

  • Modify AnnotationLocation checkstyle rule in checkstyle.xml

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added HoodieClientWriteTest to verify the change.
  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@manojpec manojpec changed the title [HUDI-2862] Table upgrade and metadata table bootstrap under the table lock [WIP] [HUDI-2862] Table upgrade and metadata table bootstrap under the table lock Nov 26, 2021
@manojpec manojpec force-pushed the fix/HUDI-2862-table-upgrade-metadata-table-bootstrap-under-table-lock branch from 8f6c03b to bd1b6ab Compare November 26, 2021 09:17
@manojpec
Contributor Author

@nsivabalan @vinothchandar :

This PR is based off #4114. The only commit to review here is the last one, bd1b6ab.

@manojpec manojpec changed the title [WIP] [HUDI-2862] Table upgrade and metadata table bootstrap under the table lock [HUDI-2862] Table upgrade and metadata table bootstrap under the table lock Nov 26, 2021
Contributor

@nsivabalan nsivabalan left a comment

LGTM.

Member

@vinothchandar vinothchandar left a comment

1 corner case

@manojpec manojpec force-pushed the fix/HUDI-2862-table-upgrade-metadata-table-bootstrap-under-table-lock branch from bd1b6ab to 4be51b1 Compare November 26, 2021 18:35
@manojpec manojpec force-pushed the fix/HUDI-2862-table-upgrade-metadata-table-bootstrap-under-table-lock branch 2 times, most recently from 3e0ca30 to ec2388b Compare November 26, 2021 19:39
manojpec and others added 7 commits November 26, 2021 11:41
…ites

  - Across upgrades, the metadata table may have to be re-bootstrapped from
    the data table. Usually this is detected by the first write client: the
    metadata table is fully removed and an attempt is made to bootstrap it.
    This bootstrapping is deferred while there are pending actions on the
    timeline, until they are resolved.

  - Today, the Spark write client gets the metadata writer for every write.
    The metadata writer initialization is attempted only once, to avoid the
    expensive fs exists check on every write path. If the metadata
    bootstrapping fails the first time due to pending actions (which is
    usually the case after an upgrade), the write client never tries to
    create it again, so it does not update the metadata table even after
    other writers have bootstrapped it.

  - The fix is to retry metadata table creation until it succeeds, either
    through the client's own bootstrapping or through concurrent table
    services or other writers.
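
The retry behaviour in the last bullet can be sketched as a holder that re-attempts initialization on every call until one attempt succeeds, then caches the result. This is only an illustration under stated assumptions: `RetryingInitializer` is a hypothetical name, and the supplier stands in for whatever actually creates the metadata writer.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper, not a Hudi class: keeps retrying an initialization
// attempt on each get() until it succeeds, then reuses the cached value.
class RetryingInitializer<T> {
    private final Supplier<Optional<T>> attempt;
    private T value;      // null until an attempt succeeds
    private int attempts; // number of initialization attempts made so far

    RetryingInitializer(Supplier<Optional<T>> attempt) {
        this.attempt = attempt;
    }

    synchronized Optional<T> get() {
        if (value == null) {
            // A one-shot design would give up after the first failure;
            // here every call retries until the value exists.
            attempts++;
            value = attempt.get().orElse(null);
        }
        return Optional.ofNullable(value);
    }

    synchronized int attempts() {
        return attempts;
    }
}
```

The trade-off is that a failed attempt is paid again on the next write, in exchange for the writer eventually picking up a metadata table that became available later.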
 - The SparkRDDWriteClient constructor no longer creates the metadata table
   writer, since that happens only once in continuous mode and can also race
   with the same metadata table bootstrapping by other concurrent writers.

 - HoodieTable#getMetadataWriter() now always creates the metadata table and
   thereby attempts bootstrapping if needed, so that continuous writers do not
   miss the metadata table update.

 - SparkRDDWriteClient#getTableAndInitCtx() also attempts to create the
   metadata table just after the table upgrade, to bootstrap it if needed.
 - When getting the metadata writer, the inflight instant timestamp is passed
   in so that the MetadataTableWriter can ignore the inflight action when
   bootstrapping the table if needed.
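
A rough sketch of the check this commit message implies, assuming instants are represented as plain timestamp strings. `BootstrapGate` and `canBootstrap` are hypothetical names for illustration, not Hudi APIs: bootstrapping is allowed only when no pending action remains other than the caller's own inflight instant.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical illustration; Hudi's real timeline types are not used here.
class BootstrapGate {
    // Returns true when every pending instant is the caller's own inflight
    // instant, i.e. nothing else on the timeline blocks the bootstrap.
    static boolean canBootstrap(List<String> pendingInstants,
                                Optional<String> ownInflightInstant) {
        return pendingInstants.stream()
            .allMatch(ts -> ownInflightInstant.isPresent()
                && ownInflightInstant.get().equals(ts));
    }
}
```

Without the own-instant exemption, a writer that has already started its commit would always see at least one pending action and could never bootstrap.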
 - Incorporating review comments on the variable naming and code style
… table lock

 - Today, the need for a table upgrade is detected by the first write client
   that runs after the upgrade. If a write concurrency mode is configured,
   that client grabs the transaction lock to protect the table upgrade from
   other concurrent writers. However, the follow-on metadata table creation
   and the initial bootstrapping need similar global protection to avoid
   races between inflight commits and the metadata table bootstrapping
   process. This change performs the table upgrade, the follow-on metadata
   table creation, and thereby the initial bootstrapping under the
   table-level lock to avoid potential races with concurrent writers and
   other async table services.
… table lock

 - Moving the check for whether an upgrade is needed under the table lock,
   so that concurrent writers don't redundantly redo the upgrade work on a
   race
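
A small sketch of why the check belongs inside the lock, assuming integer table versions and a `ReentrantLock` in place of the table-level lock. `UpgradeOnceSketch` is a hypothetical class for illustration only. Because each writer re-reads the version after acquiring the lock, only the first writer performs the upgrade.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration; versions and names are assumptions.
class UpgradeOnceSketch {
    private final ReentrantLock tableLock = new ReentrantLock();
    private final AtomicInteger upgradesRun = new AtomicInteger();
    private final int targetVersion;
    private int tableVersion; // only read or written while holding tableLock

    UpgradeOnceSketch(int currentVersion, int targetVersion) {
        this.tableVersion = currentVersion;
        this.targetVersion = targetVersion;
    }

    void upgradeIfNeeded() {
        tableLock.lock();
        try {
            // The check happens under the lock, so a writer that lost the
            // race sees the already-upgraded version and skips the work.
            if (tableVersion < targetVersion) {
                tableVersion = targetVersion;
                upgradesRun.incrementAndGet();
            }
        } finally {
            tableLock.unlock();
        }
    }

    int upgradesRun() {
        return upgradesRun.get();
    }

    // Starts the given number of writer threads and reports how many
    // of them actually performed the upgrade.
    static int runConcurrently(int writers) {
        UpgradeOnceSketch table = new UpgradeOnceSketch(1, 2);
        Thread[] threads = new Thread[writers];
        for (int i = 0; i < writers; i++) {
            threads[i] = new Thread(table::upgradeIfNeeded);
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return table.upgradesRun();
    }
}
```

If the version check ran before taking the lock, several writers could all observe the old version and each repeat the upgrade after acquiring the lock in turn.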
@manojpec manojpec force-pushed the fix/HUDI-2862-table-upgrade-metadata-table-bootstrap-under-table-lock branch from ec2388b to 0aed0f2 Compare November 26, 2021 19:48
@hudi-bot

CI report:

  • 27c4ff2ffb00f6e1fb73081d52004007544faab0 UNKNOWN
  • 3e0ca308a88b536aaee9d9ad68205b96c682ef23 UNKNOWN
  • 0aed0f2 Azure: FAILURE
Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@manojpec
Contributor Author

@nsivabalan I have opened #4134 in place of this to work around rebase/merge issues.

@manojpec
Contributor Author

This change is pulled into #4114. Closing this.

@manojpec manojpec closed this Nov 27, 2021
4 participants