Skip to content

Reduce address comparisons for network topology replica calculation#532

Closed
mpenick wants to merge 8 commits intomasterfrom
v2.15.2-large-rf
Closed

Reduce address comparisons for network topology replica calculation#532
mpenick wants to merge 8 commits intomasterfrom
v2.15.2-large-rf

Conversation

@mpenick
Copy link
Copy Markdown
Contributor

@mpenick mpenick commented Aug 15, 2022

This uses a DenseHashSet to keep prevent duplicate replicas instead of doing a linear scan through the existing replicas. I'm seeing around a 4.5x speed up for larger replication factors (rf = 54).

Michael Penick added 2 commits August 8, 2022 09:56
This uses a `DenseHashSet` to keep prevent duplicate replicas instead of doing a linear scan through the existing replicas. I'm seeing around a 4.5x speed up for larger replication factors (rf = 54).
size_t replication_factor = 3;
size_t total_replicas = std::min(num_hosts, replication_factor) * num_dcs;
size_t replication_factor = 54;
size_t total_replicas = replication_factor;
Copy link
Copy Markdown
Contributor Author

@mpenick mpenick Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should likely be reverted. This is a pathological use case though.


static size_t size_of(const String& value) { return value.size(); }

static size_t size_of(const Address& value) { char buf[16]; return value.to_inet(buf); }
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes fix test warnings.

skipped_endpoints_this_dc.push_back(curr_token_it);
} else {
if (add_replica(replicas, Host::Ptr(host))) {
if (replicas_set.insert(host).second) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just embed the changes in the add_replica()?
another option is to try keep replicas sorted and use binary search? i.e. use
std::sort(), or even std::stable_sort() that should be faster for sorted or, almost sorted data, then just use std::lower_bound() with custom Comparator to check if the host needs to be added.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The token order of replicas matters.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still could embed the logic in to add_replica()

@absurdfarce absurdfarce deleted the v2.15.2-large-rf branch October 29, 2025 20:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants