New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
crimson: fix incorrect interval check in PG::submit_transaction #49935
Conversation
These have been handy in imposing uniformity on seastore logging, let's start using them in PG/OSD code. Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Samuel Just <sjust@redhat.com>
These two state variables duplicate checks that *should* already be handled by the IOInterruptCondition. None of the stopping checks should ever trigger because the caller would be in an interruptible future context which already performed that check. Moreover, peering doesn't really work -- it relies on the callback firing prior the call to on_activate_complete(), and there isn't any guarantee that will happen. Storing the epoch from when the callback was created as we do in IOInterruptCondition would be required. Signed-off-by: Samuel Just <sjust@redhat.com>
It's required and we don't check for null. Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Samuel Just <sjust@redhat.com>
Signed-off-by: Samuel Just <sjust@redhat.com>
https://pulpito.ceph.com/sjust-2023-01-30_19:56:26-crimson-rados-wip-sjust-58486-distro-default-smithi/ -- failures due to rbd issues |
Modifies common/dout.h, but I'm disinclined to do a rados run since there's a large backlog of tests pending and it's a pretty trivial change more likely to cause a build issue than a runtime one. |
…sction It's entirely fine for the map_epoch to change while the op is processed as long as none of the intervening epochs caused an interval change. In general, the correct way to check for an interval change is with has_reset_since. We don't need to do this check here at all because IOInterruptCondition will already have performed it against the captured epoch when the continuation resumed. Introduced: 3141802 Fixes: https://tracker.ceph.com/issues/58486 Signed-off-by: Samuel Just <sjust@redhat.com>
52194ee
to
ab91b7c
Compare
@athanatos I think this PR is laying the ground work for fixing https://tracker.ceph.com/issues/58486, am I right? |
@xxhdx1985126 The last commit should fix it. |
Ah, yes, pg's acting/up set may be the same in different osd maps, am I right? |
@xxhdx1985126 Precisely. For any specific PG, the vast majority of map changes do not affect the up or acting set. Many map changes contain only changes to things like snapshot metadata and therefore affect no pg's up/acting set. For a specific PG, we call a contiguous sequence of map epochs with no changes to the up/acting set (or a few other things) an interval. See osd_types.h PastIntervals::is_new_interval() for the precise definition of when two consecutive maps constitute an interval change. |
Removes incorrect interval check in PG::submit_transaction as well as a few other checks which duplicate IOInterruptCondition. This PR also adds logging improvements that were helpful in tracking down the bug.
Fixes: https://tracker.ceph.com/issues/58486
Show available Jenkins commands
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox
jenkins test windows