@cbodley
Contributor

@cbodley cbodley commented Aug 30, 2024

allow clients to request more than the default 1000 keys per request

listing large buckets (with many shards) can be more efficient with larger batches. raising max-keys also helps the balls-into-bins algorithm (see #30853), whose per-shard request size we artificially clamp to a minimum of 8 keys

with the default max-keys=1000, we reach this min=8 at around 350 shards. above that shard count, we always request more than the optimal number of entries per shard, which wastes i/o and bandwidth
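
a rough sketch of the clamping behavior described above. it assumes the per-shard estimate follows the formula quoted later in this description, with max-keys in place of each 1000. the function names and the `MIN_PER_SHARD` constant are illustrative and not taken from the rgw source, and any integer rounding the real code may do is ignored here.

```python
import math

MIN_PER_SHARD = 8  # the artificial minimum mentioned above (name is illustrative)


def estimated_per_shard(max_keys: int, shards: int) -> float:
    """Balls-into-bins estimate of entries to request from each shard.

    Assumption: same shape as the formula in this description, with
    max-keys substituted for every occurrence of 1000.
    """
    return 1 + max_keys / shards + math.sqrt(
        2 * max_keys * math.log10(max_keys) / shards
    )


def requested_per_shard(max_keys: int, shards: int) -> float:
    """Estimate after applying the minimum of 8 entries per shard."""
    return max(MIN_PER_SHARD, estimated_per_shard(max_keys, shards))


if __name__ == "__main__":
    for shards in (100, 350, 1000, 1999):
        est = estimated_per_shard(1000, shards)
        req = requested_per_shard(1000, shards)
        # once est drops below 8, req stays at 8 and the surplus is wasted work
        print(f"shards={shards:5d} estimate={est:6.2f} requested={req:6.2f}")
```

under these assumptions, the estimate for max-keys=1000 falls below 8 somewhere near 350 shards, and every larger shard count is served by the clamp rather than by the estimate.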

as we raise the max-keys, we push this minimum to higher shard counts:

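| max-keys | shard count where min=8 is reached |
|----------|------------------------------------|
|     1000 |                                350 |
|     2000 |                                720 |
|     3000 |                               1100 |
|     4000 |                               1500 |
|     5000 |                               1900 |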

this table was calculated by solving the formula below for x (the shard count), substituting each value of max-keys for 1000:

solve 8 = 1 + (1000/x + sqrt(2 * 1000 * log10(1000) / x))

ex. https://www.wolframalpha.com/input?i=solve+8+%3D+1+%2B+%281000%2Fx+%2B+sqrt%282+*+1000+*+log10%281000%29+%2F+x%29%29
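
the same calculation can be sketched numerically instead of through wolframalpha. this is a minimal sketch, assuming max-keys replaces every 1000 in the formula (including the one inside log10); the helper names `estimate` and `crossover_shards` are hypothetical, and bisection is just one convenient way to find the root.

```python
import math


def estimate(max_keys: float, shards: float) -> float:
    """Right-hand side of the formula above, with max-keys in place of 1000."""
    return 1 + max_keys / shards + math.sqrt(
        2 * max_keys * math.log10(max_keys) / shards
    )


def crossover_shards(max_keys: float, floor: float = 8.0) -> float:
    """Shard count x where estimate(max_keys, x) == floor.

    The estimate decreases strictly as the shard count grows, so plain
    bisection over a wide bracket is enough.
    """
    lo, hi = 1.0, 1e6  # estimate(lo) > floor > estimate(hi) for these inputs
    for _ in range(100):
        mid = (lo + hi) / 2
        if estimate(max_keys, mid) > floor:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for max_keys in (1000, 2000, 3000, 4000, 5000):
        print(max_keys, round(crossover_shards(max_keys)))
```

under these assumptions the solved shard counts come out within a couple of percent of the table, which looks like it was rounded to two significant figures.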

raising max-keys does increase the total memory usage of each bucket listing request, which gets into megabytes (tens to hundreds?). so i chose 5000 as a cutoff that covers most of the range up to our current rgw_max_dynamic_shards=1999

Signed-off-by: Casey Bodley <cbodley@redhat.com>
@cbodley cbodley requested review from a team and ivancich August 30, 2024 14:27
@cbodley cbodley added the rgw label Aug 30, 2024
@adamemerson
Contributor

jenkins test make check

@cbodley
Contributor Author

cbodley commented Sep 9, 2024

@cbodley cbodley merged commit 03f572d into ceph:main Sep 9, 2024
@cbodley cbodley deleted the wip-rgw-listing-max-entries branch September 9, 2024 12:51