
rgw: add shard reduction ability to dynamic resharding #57538

Merged: 4 commits merged into ceph:main on Jun 3, 2024

Conversation

@ivancich (Member) commented May 18, 2024:

Previously, dynamic resharding could only increase the number of bucket index shards for a given bucket. This adds the ability to also reduce the number of shards.

In addition to the existing trigger of 100,000 entries per shard (the current default) for an increase, there is a new trigger of 10,000 entries per shard for a decrease.

However, for buckets whose object counts rise and fall regularly, we don't want to reshard up and down repeatedly to chase the number of objects. So shard reduction is also subject to a time delay (default 5 days). Once an entry for a reduction is added to the reshard queue (log), processing the queue will not perform the reduction until this delay has elapsed; only when the entry is processed after the delay can it carry out the shard reduction.

Additionally, if the entry is processed at any point during the delay and there are no longer few enough entries to trigger a shard reduction, the queue entry is discarded.

Using the defaults, this effectively means the bucket must stay below the reduction threshold for 5 consecutive days before the reshard takes place.

Fixes: https://tracker.ceph.com/issues/66104
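The decision logic can be pictured with a minimal C++ sketch. All names, constants wiring, and the exact comparison semantics below are assumptions for illustration; the actual logic lives in RGW's dynamic resharding code (rgw_reshard.cc) and differs in detail.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>

// Default triggers from the description above (illustrative names).
constexpr uint64_t grow_trigger   = 100000; // entries/shard to grow
constexpr uint64_t shrink_trigger = 10000;  // entries/shard to shrink
constexpr auto reduction_wait = std::chrono::hours(5 * 24); // 5 days

enum class ReshardAction { None, Grow, Shrink };

// Which direction, if any, does a bucket's current fill suggest?
ReshardAction suggested_action(uint64_t num_entries, uint32_t num_shards) {
  const uint64_t per_shard = num_entries / std::max<uint32_t>(num_shards, 1);
  if (per_shard >= grow_trigger) {
    return ReshardAction::Grow;
  }
  // only a multi-shard index can shrink
  if (num_shards > 1 && per_shard < shrink_trigger) {
    return ReshardAction::Shrink;
  }
  return ReshardAction::None;
}

enum class QueueDecision { Discard, Defer, Execute };

// Processing a queued reduction entry: discard it if the bucket no
// longer qualifies, defer it while the wait period is still running,
// and only execute the reduction once the delay has elapsed.
QueueDecision process_reduction_entry(
    std::chrono::steady_clock::time_point queued_at,
    uint64_t num_entries, uint32_t num_shards) {
  if (suggested_action(num_entries, num_shards) != ReshardAction::Shrink) {
    return QueueDecision::Discard;
  }
  if (std::chrono::steady_clock::now() - queued_at < reduction_wait) {
    return QueueDecision::Defer;
  }
  return QueueDecision::Execute;
}
```

Because any processing pass that finds the bucket back above the shrink trigger discards the entry, only a bucket that stays small for the whole wait period ever reaches Execute.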


@cbodley (Contributor) left a comment:

👍

Review thread on the new config option definition:

```yaml
desc: Whether dynamic resharding can reduce the number of shards
long_desc: If true, RGW's dynamic resharding ability is allowed to
  reduce the number of shards if it appears there are too many.
default: false
```
Contributor:

is there a reason this shouldn't default to true?

@nhoad:

Where do we sit with changes in default behaviour between releases? Asking because when users upgrade their clusters, this feature would be disabled by default, which is preferable, IMO - that way clusters will continue to behave as they did prior to the upgrade, and users can opt in if they choose to by enabling the new feature.

Contributor:

thanks @nhoad. conversely, if we invest in general enhancements that aren't enabled by default, the majority of users won't benefit because they don't know to enable them. ceph has too many knobs already and can be difficult to tune correctly. we really should strive for optimal behavior by default

i think all clusters should benefit from appropriately-sized bucket indices. if this change adds new risks to existing deployments, then i'd like to explore those and try to resolve them at the design level

ivancich force-pushed the wip-shrinky-dink branch 4 times, most recently from fc59dd4 to 70fc0a7 (May 23, 2024 13:25)
Comment on lines 1050 to 1055:

```diff
   if (reshard_log) {
-    ret = reshard_log->update(dpp, bucket_info, y);
+    ret = reshard_log->update(dpp, bucket_info, initiator, y);
     if (ret < 0) {
       return ret;
     }
```
@ivancich (Member, Author):

@cbodley: I'm trying to think of why, having decided to execute a reshard, we would update the reshard queue (log). In testing I've commented out this logic and seen no adverse impact, although maybe I haven't brushed up against the reason it's there. Do you have any thoughts?

ivancich requested review from cbodley and nhoad (May 29, 2024 18:42)

@ivancich (Member, Author) commented:

@cbodley: Following your insight during the first review, I've added a commit that now tracks the initiator of each entry on the reshard queue (log). It adds a field to the at-rest data structure. Valid values are Admin and Dynamic, and since it's an enum class, other values can be added in the future. There are also logic updates that depend on this setting.
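As a rough sketch of the initiator tracking described in the comment above: the values Admin and Dynamic come from the comment, but the type name, underlying values, and the policy helper are assumptions, not the PR's actual code.

```cpp
#include <cstdint>

// Initiator of a reshard queue (log) entry; an enum class so further
// initiators can be added later.
enum class ReshardInitiator : uint8_t {
  Admin   = 1, // queued by an administrator
  Dynamic = 2, // queued by dynamic resharding
};

// Hypothetical policy helper: an admin may overwrite an existing
// queue entry at will, while dynamic resharding must not, so that a
// pending reduction's wait period is not disturbed.
inline bool may_overwrite_entry(ReshardInitiator incoming) {
  return incoming == ReshardInitiator::Admin;
}
```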

@mattbenjamin (Contributor) left a comment:

I approve the content and the name of this PR

Adds the ability to prevent overwriting a reshard queue (log) entry
for a given bucket with a newer entry. This adds a flag to the op, so
it will either CREATE or make no changes. If an entry already exists
when this flag is set, -EEXIST will be returned.

This is a preparatory step to adding shard reduction to dynamic
resharding.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
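Conceptually, "CREATE or make no changes" matches librados' exclusive-create semantics, sketched below. The PR implements this as a flag on the reshard-log op itself rather than with this exact call, so the helper here is only an analogy.

```cpp
#include <rados/librados.hpp>
#include <string>

// Write an object only if it does not already exist; if it does,
// librados fails the whole operation and returns -EEXIST.
int put_exclusive(librados::IoCtx& ioctx, const std::string& oid,
                  librados::bufferlist& bl) {
  librados::ObjectWriteOperation op;
  op.create(true /* exclusive */);
  op.write_full(bl);
  return ioctx.operate(oid, &op);
}
```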
@ivancich (Member, Author) commented:
jenkins test make check

Previously, dynamic resharding could only *increase* the number of
bucket index shards for a given bucket. This adds the ability to also
*reduce* the number of shards.

In addition to the existing trigger of 100,000 entries per shard (the
current default) for an increase, there is a new trigger of 10,000
entries per shard for a decrease.

However, for buckets whose object counts rise and fall regularly, we
don't want to reshard up and down repeatedly to chase the number of
objects. So shard reduction is also subject to a time delay (default
5 days). Once an entry for a reduction is added to the reshard queue
(log), processing the queue will not perform the reduction until this
delay has elapsed; only when the entry is processed after the delay
can it carry out the shard reduction.

Additionally, if the entry is processed at any point during the delay
and there are *not* few enough entries to trigger a shard reduction,
the queue entry is discarded.

Using the defaults, this effectively means the bucket must stay below
the reduction threshold for 5 consecutive days before the reshard
takes place.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Adds a config option, rgw_reshard_debug_interval, that lets the
resharding algorithms run on a faster schedule by simulating one day
with a set number of seconds.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
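The intended effect can be illustrated with a small sketch, under the assumption that the option simply rescales how many real seconds count as one day; the helper names and scaling here are illustrative, not the PR's code.

```cpp
#include <cstdint>

// If rgw_reshard_debug_interval is N > 0, one simulated day passes
// every N real seconds; otherwise a day is a real 86,400 seconds.
uint64_t simulated_day_secs(uint64_t debug_interval_secs) {
  return debug_interval_secs > 0 ? debug_interval_secs : 24 * 60 * 60;
}

// e.g. the default 5-day reduction wait with a 10-second debug
// interval elapses in 5 * 10 = 50 real seconds
uint64_t wait_in_real_secs(uint64_t wait_days, uint64_t debug_interval_secs) {
  return wait_days * simulated_day_secs(debug_interval_secs);
}
```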
The logic for managing the reshard queue (log) can vary depending on
whether the entry was added by an admin or by dynamic resharding. For
example, if it's a reshard reduction, dynamic resharding won't
overwrite the queue entry so as not to disrupt the reduction wait
period. On the other hand, an admin should be able to overwrite the
entry at will.

So we now track the initiator of each entry on the queue. This adds
another field to the at-rest data structure, and it updates the logic
to make use of it.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
@ivancich (Member, Author) commented Jun 1, 2024:

jenkins test api

@ivancich (Member, Author) commented Jun 1, 2024:

jenkins test windows

ivancich merged commit 087aaa2 into ceph:main on Jun 3, 2024
10 of 11 checks passed
@cbodley (Contributor) commented Jun 5, 2024:

seeing a new test_rgw_reshard.py failure on main that i suspect is related: https://tracker.ceph.com/issues/66367

@cbodley (Contributor) commented Jun 5, 2024:

p.s. this change deserves a mention in PendingReleaseNotes
