rgw: address crash and race in RGWIndexCompletionManager #45882
@ivancich which downstream branches do we need this fix on?
could you hack the int decl?
src/rgw/rgw_rados.cc
@@ -774,7 +774,7 @@ struct complete_op_data {

 class RGWIndexCompletionManager {
   RGWRados* const store;
-  const int num_shards;
+  const uint num_shards;
would prefer uint32_t or at worst, "unsigned int"
src/rgw/rgw_rados.cc
  // used to distribute the completions and the locks they use across
  // their respective vectors; it will get incremented and can wrap
  // around back to 0 without issue
  std::atomic<uint> cur_shard {0};
ditto
src/rgw/rgw_rados.cc
    int result = cur_shard % num_shards;
    cur_shard++;
    return result;
  uint next_shard() {
yup
An atomic int was used in a modulo operator to distribute contention
among a set of locks and to track completions. Because it was an int,
enough increments would cause it to go negative (due to
twos-complement encoding and overflow) thereby causing a
crash. Additionally, even though it was atomic, the read and increment
were separate operations, leading to a race.
This commit addresses both of these issues.
Signed-off-by: J. Eric Ivancich <ivancich@redhat.com>
Fixes: https://tracker.ceph.com/issues/55131
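
The two problems described above can be illustrated with a minimal sketch. This is not the code from the commit: `ShardPicker` and `demo_wraparound` are hypothetical names used only for illustration, and `uint32_t` follows the reviewer's stated preference. The sketch assumes the fix combines an unsigned counter, so that overflow wraps to 0 instead of going negative, with a single atomic read-and-increment, so that no other thread can slip in between the read and the increment.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the shard-selection logic; not the Ceph class.
class ShardPicker {
  const uint32_t num_shards;
  std::atomic<uint32_t> cur_shard;

public:
  explicit ShardPicker(uint32_t shards, uint32_t start = 0)
    : num_shards(shards), cur_shard(start) {}

  // fetch_add increments the counter and returns the previous value as
  // one atomic step, so the read and the increment cannot be separated
  // by another thread. The counter is unsigned, so it wraps to 0 on
  // overflow and the modulo result is always in [0, num_shards).
  uint32_t next_shard() {
    return cur_shard.fetch_add(1) % num_shards;
  }
};

// Exercises the wrap-around case by starting just below the maximum.
inline void demo_wraparound() {
  ShardPicker picker(7, std::numeric_limits<uint32_t>::max() - 1);
  for (int i = 0; i < 4; ++i) {
    assert(picker.next_shard() < 7);  // always a valid index
  }
}
```

With a signed `int` counter, the same sequence would eventually yield a negative remainder, which is not a valid index into a vector of locks. The post-increment operator on `std::atomic` integral types is also a single atomic read-modify-write, so `cur_shard++ % num_shards` would behave the same way as the `fetch_add` call here.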