[HUDI-5261] Use Proper Parallelism for Engine Context APIs #7449

Closed
jonvex wants to merge 1 commit into apache:master from jonvex:use_proper_parallelism

jonvex commented Dec 13, 2022

Change Logs

Many call sites pass the number of items as the parallelism, which hurts performance. Parallelism should instead be based on the number of cores available in the cluster and be set by the user via the parallelism configs.
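As a rough illustration of the idea (not the actual change in this PR), the sketch below bounds the parallelism by a user-configured value instead of using the item count directly. The class name, the helper `boundedParallelism`, and the numbers in `main` are hypothetical and only serve the example.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelismSketch {

  // Hypothetical helper: never use more partitions than the configured
  // parallelism, and never more than there are items to process.
  static int boundedParallelism(int numItems, int configuredParallelism) {
    if (configuredParallelism < 1) {
      throw new IllegalArgumentException("configuredParallelism must be >= 1");
    }
    // Clamp to at least 1 so an empty input still yields a valid value.
    return Math.max(1, Math.min(numItems, configuredParallelism));
  }

  public static void main(String[] args) {
    // Made-up workload: 100,000 items and a configured parallelism of 200.
    List<Integer> items = IntStream.range(0, 100_000).boxed().collect(Collectors.toList());
    int configured = 200;

    int naive = items.size(); // what "number of items as parallelism" amounts to
    int bounded = boundedParallelism(items.size(), configured);

    System.out.println("naive=" + naive + " bounded=" + bounded);
  }
}
```

With the item count as parallelism, the example would request 100,000 partitions; with the bound it requests 200, which the user can tune to match the cores in the cluster.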

Impact

Better, more tunable performance.

Risk level (write none, low, medium or high below)

low

Documentation Update

N/A

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

jonvex commented Dec 13, 2022

FileSystemBackedTableMetadata has the config DEFAULT_LISTING_PARALLELISM = 1500, which seems pretty high.
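To make "pretty high" concrete, here is a back-of-the-envelope sketch of what 1500 listing tasks mean on a small cluster. Only the value 1500 comes from the comment above; the 64-core cluster, the 3000 paths, and the class and method names are assumptions made up for illustration.

```java
class ListingParallelismSketch {

  // Value quoted in the comment above.
  static final int DEFAULT_LISTING_PARALLELISM = 1500;

  // Ceiling division: how many rounds ("waves") of tasks are needed when
  // only `cores` tasks can run at the same time.
  static int waves(int tasks, int cores) {
    return (tasks + cores - 1) / cores;
  }

  // Ceiling division: how many paths each task lists when `numPaths`
  // paths are spread over `tasks` tasks.
  static int pathsPerTask(int numPaths, int tasks) {
    return (numPaths + tasks - 1) / tasks;
  }

  public static void main(String[] args) {
    int cores = 64;      // hypothetical cluster size
    int numPaths = 3000; // hypothetical number of partition paths

    System.out.println("waves=" + waves(DEFAULT_LISTING_PARALLELISM, cores));
    System.out.println("pathsPerTask=" + pathsPerTask(numPaths, DEFAULT_LISTING_PARALLELISM));
  }
}
```

Under these assumptions each task lists only 2 paths, while the 64 cores need 24 waves to get through all 1500 tasks, so much of the cost may be task scheduling rather than listing.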

jonvex commented Dec 13, 2022

TimelineServerPerf has numExecutors with a default of 10, but it also has numCoresPerExecutor, which also defaults to 10.

Something seems off here. Maybe it is supposed to be executors per core? Whatever it is, those configs seem to conflict.

@hudi-bot

CI report:

Bot commands

@hudi-bot supports the following commands:

  • @hudi-bot run azure: re-run the last Azure build
