Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rgw: address crash and race in RGWIndexCompletionManager #45882

Merged
merged 1 commit into from Apr 30, 2022

Conversation

ivancich
Copy link
Member

@ivancich ivancich commented Apr 12, 2022

An atomic int was used in a modulo operator to distribute contention
among a set of locks and to track completions. Because it was an int,
enough increments would cause it to go negative (due to
twos-complement encoding and overflow) thereby causing a
crash. Additionally, even though it was atomic, the read and increment
were separate operations, leading to a race.

This commit addresses both of these issues.

Signed-off-by: J. Eric Ivancich ivancich@redhat.com

Fixes: https://tracker.ceph.com/issues/55131

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@mattbenjamin mattbenjamin self-requested a review April 14, 2022 14:27
@mattbenjamin
Copy link
Contributor

@ivancich which downstream branches do we need this fix on?

@adamemerson adamemerson removed their assignment Apr 14, 2022
Copy link
Contributor

@mattbenjamin mattbenjamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you hack the int decl?

@@ -774,7 +774,7 @@ struct complete_op_data {

class RGWIndexCompletionManager {
RGWRados* const store;
const int num_shards;
const uint num_shards;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would prefer uint32_t or at worst, "unsigned int"

// used to distribute the completions and the locks they use across
// their respective vectors; it will get incremented and can wrap
// around back to 0 without issue
std::atomic<uint> cur_shard {0};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

int result = cur_shard % num_shards;
cur_shard++;
return result;
uint next_shard() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup

@ivancich ivancich force-pushed the wip-index-completion-mgr-crash branch from c650ddf to 4c8ec26 Compare April 26, 2022 21:49
An atomic int was used in a modulo operator to distribute contention
among a set of locks and to track completions. Because it was an int,
enough increments would cause it to go negative (due to
twos-complement encoding and overflow) thereby causing a
crash. Additionally, even though it was atomic, the read and increment
were separate operations, leading to a race.

This commit addresses both of these issues.

Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
@ivancich ivancich force-pushed the wip-index-completion-mgr-crash branch from 4c8ec26 to 41f4e83 Compare April 26, 2022 22:37
@ivancich ivancich changed the title rgw: address crash and race in index completion mgr rgw: address crash and race in RGWIndexCompletionManager Apr 26, 2022
@ivancich ivancich added the wip-eric-testing-1 for ivancich testing label Apr 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants