[HUDI-6117] Parallelize the initial creation of file groups for a new MDT partition. #8527

Merged
nsivabalan merged 1 commit into apache:master from prashantwason:pw_parallel_filegroup_creation on May 8, 2023

@prashantwason
Member

[HUDI-6117] Parallelize the initial creation of file groups for a new MDT partition.

Change Logs

File group creation is parallelized using engineContext.foreach.
Previous leftover files in the MDT partition are deleted before creation.
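
The two changes above can be illustrated with a minimal, self-contained sketch. It does not use Hudi's actual code: a Java parallel stream stands in for engineContext.foreach, a local directory stands in for the MDT partition, and the class name, method names, and file naming scheme are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class FileGroupInitSketch {

    // Hypothetical naming scheme for illustration only; not Hudi's real file names.
    static String fileGroupName(String partition, int index, String instantTime) {
        return String.format(Locale.ROOT, "%s-%04d_%s.log", partition, index, instantTime);
    }

    // Step 1: delete whatever a previous (possibly failed) attempt left behind.
    // Assumes a flat directory that contains only regular files.
    static void deleteLeftovers(Path partitionDir) throws IOException {
        if (!Files.exists(partitionDir)) {
            Files.createDirectories(partitionDir);
            return;
        }
        try (Stream<Path> files = Files.list(partitionDir)) {
            for (Path p : files.collect(Collectors.toList())) {
                Files.delete(p);
            }
        }
    }

    // Step 2: create one empty file per file group, in parallel.
    // The parallel stream is a stand-in for an engine-level parallel foreach.
    static void createFileGroups(Path partitionDir, String partition,
                                 int count, String instantTime) {
        IntStream.range(0, count).parallel().forEach(i -> {
            try {
                Files.createFile(partitionDir.resolve(fileGroupName(partition, i, instantTime)));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    // Runs cleanup, then creation, and returns the sorted file names now present.
    static List<String> initialize(Path partitionDir, String partition,
                                   int count, String instantTime) throws IOException {
        deleteLeftovers(partitionDir);
        createFileGroups(partitionDir, partition, count, instantTime);
        try (Stream<Path> files = Files.list(partitionDir)) {
            return files.map(p -> p.getFileName().toString())
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("mdt_partition");
        // Simulate an earlier attempt with a different instant time and file group count.
        initialize(dir, "files", 3, "001");
        // The retry cleans up first, so only files from the second attempt remain.
        System.out.println(initialize(dir, "files", 2, "002"));
    }
}
```

Deleting before creating means a retry does not need to know which instant time or file group count the earlier attempt used; it simply starts from an empty partition.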

Impact

Faster file group creation when an MDT partition has a large number of file groups.
Fixes an issue where a previously failed initialization could leave behind partially or fully written log files with a different instant time.

Risk level (write none, low, medium or high below)

None

Documentation Update

None

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

… MDT partition.

Any existing or leftover file groups in that partition are also cleaned up, to handle previous failed initialization attempts that may have used a different instant time or a different number of file groups.
@danny0405 danny0405 self-assigned this Apr 21, 2023
@danny0405 danny0405 added the metadata and area:performance (Performance optimizations) labels Apr 21, 2023
@hudi-bot
Collaborator

CI report:

Bot commands
@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@nsivabalan nsivabalan self-assigned this May 2, 2023
@nsivabalan nsivabalan added the release-0.14.0 and priority:blocker (Production down; release blocker) labels May 2, 2023
@nsivabalan nsivabalan removed their assignment May 2, 2023
@nsivabalan nsivabalan merged commit 45b7936 into apache:master May 8, 2023