Make getTeamByServers O(1) in time #12938
Merged
spraza merged 2 commits into apple:release-7.3 from Apr 9, 2026
Contributor
Result of foundationdb-pr-clang-arm on Linux CentOS 7
gxglass
previously approved these changes
Apr 7, 2026
bool DDTeamCollection::removeTeam(Reference<TCTeamInfo> team) {
    TraceEvent("RemovedServerTeam", distributorId).detail("Team", team->getDesc());
    if (teamsByServerIDs.find(team->getServerIDsStr()) != teamsByServerIDs.end()) {
Collaborator
auto it = teamsByServerIDs.find(...);
if (it != teamsByServerIDs.end()) {
    teamsByServerIDs.erase(it);
}
Collaborator
Author
done, reads better and avoids the redundant hash + lookup
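For readers following the thread, here is a minimal sketch of the difference being discussed. It assumes the original branch body erased by key, which the snippet above does not show; `TeamIndex`, `removeTwoLookups`, and `removeOneLookup` are hypothetical names, and the map's value type is a placeholder rather than the real team reference type.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the team index; key and value types are assumptions.
using TeamIndex = std::unordered_map<std::string, int>;

// Two lookups: find() hashes the key, then erase(key) hashes and probes again.
inline bool removeTwoLookups(TeamIndex& index, const std::string& key) {
    if (index.find(key) != index.end()) {
        index.erase(key);
        return true;
    }
    return false;
}

// One lookup: keep the iterator from find() and erase through it.
inline bool removeOneLookup(TeamIndex& index, const std::string& key) {
    auto it = index.find(key);
    if (it != index.end()) {
        index.erase(it);
        return true;
    }
    return false;
}
```

Both functions leave the map in the same state; the second simply avoids rehashing the key on the erase path.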
Contributor
Result of foundationdb-pr-cluster-tests on Linux RHEL 9
Contributor
Result of foundationdb-pr-clang on Linux RHEL 9
Contributor
Result of foundationdb-pr on Linux RHEL 9
Contributor
Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x
Contributor
Result of foundationdb-pr-macos on macOS Ventura 13.x
gxglass
approved these changes
Apr 7, 2026
neethuhaneesha
approved these changes
Apr 8, 2026
alecgrieser
approved these changes
Apr 8, 2026
Collaborator
Author
Thanks for the reviews. I'm doing some additional perf testing against this change. Hoping to merge later this week or early next week once I have positive signal from the perf testing.
Collaborator
Based on experience in production so far, I am comfortable merging.
spraza added a commit to spraza/foundationdb that referenced this pull request Apr 16, 2026
* Make getTeamByServers O(1) in time
* address feedback
Under certain workloads (large data movement, storage migration, etc.) with SHARD_ENCODE_LOCATION_METADATA enabled, DD can get stuck indefinitely at initialization time, because the getTeamByServers function saturates the CPU and starves other critical DD actors, preventing them from completing (symptom: txn_too_old).
This PR is a performance optimization that makes getTeamByServers O(1) instead of O(teams). Previously, each lookup performed expensive CPU operations for every team.
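To illustrate the idea, here is a minimal sketch of replacing a per-call scan over all teams with a hash index keyed by a canonical string of server IDs. This is not the actual DDTeamCollection code: `Team`, `TeamCollection`, `makeKey`, `findByScan`, and `findByIndex` are hypothetical names, server IDs are modeled as plain strings, and the comma separator assumes IDs never contain a comma.

```cpp
#include <algorithm>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical team record; server IDs are modeled as plain strings.
struct Team {
    std::vector<std::string> servers; // kept sorted
};

// Canonical key: sorted IDs joined by ','. Assumes IDs contain no commas.
inline std::string makeKey(std::vector<std::string> servers) {
    std::sort(servers.begin(), servers.end());
    std::string key;
    for (const auto& s : servers) {
        key += s;
        key += ',';
    }
    return key;
}

class TeamCollection {
public:
    std::shared_ptr<Team> addTeam(std::vector<std::string> servers) {
        std::sort(servers.begin(), servers.end());
        auto team = std::make_shared<Team>(Team{ servers });
        teams.push_back(team);
        byServers[makeKey(team->servers)] = team; // maintain the index on insert
        return team;
    }

    // O(teams): compares the request against every team on each call.
    std::shared_ptr<Team> findByScan(std::vector<std::string> servers) const {
        std::sort(servers.begin(), servers.end());
        for (const auto& t : teams) {
            if (t->servers == servers) {
                return t;
            }
        }
        return nullptr;
    }

    // O(1) on average in the number of teams: one hash lookup.
    std::shared_ptr<Team> findByIndex(const std::vector<std::string>& servers) const {
        auto it = byServers.find(makeKey(servers));
        if (it == byServers.end()) {
            return nullptr;
        }
        return it->second;
    }

private:
    std::vector<std::shared_ptr<Team>> teams;
    std::unordered_map<std::string, std::shared_ptr<Team>> byServers;
};
```

Building the key still costs time proportional to the team size, but that size is small and fixed relative to the number of teams, so the per-call cost no longer grows with the team count. The trade-off is that the index must be kept in sync whenever a team is added or removed.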
500K: 20260407-065022-praza-6a689f78cbedc591 compressed=True data_size=35201181 duration=15433678 ended=500000 fail_fast=50 max_runs=500000 pass=500000 priority=100 remaining=0 runtime=2:28:17 sanity=False started=500000 stopped=20260407-091839 submitted=20260407-065022 timeout=5400 username=praza
ctests passed:
Post feedback 500K: 20260407-232634-praza-193a7d8cc8dd5b43 compressed=True data_size=35200768 duration=14012243 ended=500000 fail=2 fail_fast=50 max_runs=500000 pass=499998 priority=100 remaining=0 runtime=22:58:12 sanity=False started=500000 stopped=20260408-222446 submitted=20260407-232634 timeout=5400 username=praza.
The two failures are in the ConfigureStorageMigration restart test and do not seem related to the additional feedback commit.