crimson: fix incorrect interval check in PG::submit_transaction #49935

athanatos · 2023-01-31T04:27:51Z

Removes incorrect interval check in PG::submit_transaction as well as a few other checks which duplicate IOInterruptCondition. This PR also adds logging improvements that were helpful in tracking down the bug.

Fixes: https://tracker.ceph.com/issues/58486

These two state variables duplicate checks that *should* already be handled by the IOInterruptCondition. None of the stopping checks should ever trigger because the caller would be in an interruptible future context which already performed that check. Moreover, peering doesn't really work -- it relies on the callback firing prior to the call to on_activate_complete(), and there isn't any guarantee that will happen. Storing the epoch from when the callback was created as we do in IOInterruptCondition would be required. Signed-off-by: Samuel Just <sjust@redhat.com>
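To illustrate the point about storing the epoch at callback creation, here is a minimal, self-contained sketch. It is not crimson's IOInterruptCondition: `ToyPeeringState`, `ToyInterruptCondition`, and `guard` are hypothetical names invented for this example, and the real code runs in seastar's interruptible future machinery rather than `std::function`.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

using epoch_t = std::uint32_t;

// Hypothetical stand-in for the PG's peering state (assumption, not Ceph code).
struct ToyPeeringState {
  epoch_t current_epoch = 0;       // newest map epoch the PG has processed
  epoch_t last_peering_reset = 0;  // epoch at which the current interval began
};

// Hypothetical analogue of an interrupt condition: it remembers the epoch
// at construction time and later asks whether an interval change happened
// after that epoch.
class ToyInterruptCondition {
  const ToyPeeringState& ps;
  epoch_t captured;
public:
  explicit ToyInterruptCondition(const ToyPeeringState& ps)
    : ps(ps), captured(ps.current_epoch) {}
  bool interrupted() const { return captured < ps.last_peering_reset; }
};

// Wrap a callback so that it is skipped if the interval changed between
// the moment the callback was created and the moment it fires.
// Returns true if the callback ran, false if it was interrupted.
inline std::function<bool()> guard(const ToyPeeringState& ps,
                                   std::function<void()> fn) {
  ToyInterruptCondition cond(ps);  // epoch is captured here, at creation
  return [cond, fn = std::move(fn)] {
    if (cond.interrupted()) {
      return false;
    }
    fn();
    return true;
  };
}
```

Because the condition compares against the epoch captured at creation, the result does not depend on whether the callback fires before or after some other state transition, which is the ordering hazard described above.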

athanatos · 2023-01-31T04:29:06Z

https://pulpito.ceph.com/sjust-2023-01-30_19:56:26-crimson-rados-wip-sjust-58486-distro-default-smithi/ -- failures due to rbd issues

athanatos · 2023-01-31T04:32:24Z

Modifies common/dout.h, but I'm disinclined to do a rados run since there's a large backlog of tests pending and it's a pretty trivial change more likely to cause a build issue than a runtime one.

…sction It's entirely fine for the map_epoch to change while the op is processed as long as none of the intervening epochs caused an interval change. In general, the correct way to check for an interval change is with has_reset_since. We don't need to do this check here at all because IOInterruptCondition will already have performed it against the captured epoch when the continuation resumed. Introduced: 3141802 Fixes: https://tracker.ceph.com/issues/58486 Signed-off-by: Samuel Just <sjust@redhat.com>
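The difference between comparing epochs directly and checking for an interval change can be sketched with a toy model. This is an illustration under stated assumptions, not the code touched by the PR: `ToyPG`, `advance_map`, and `epoch_unchanged` are hypothetical, and `has_reset_since` here is a simplified stand-in that only compares against the epoch of the last interval change.

```cpp
#include <cassert>
#include <cstdint>

using epoch_t = std::uint32_t;

// Toy stand-in for a PG's peering state; not the real crimson PG class.
struct ToyPG {
  epoch_t map_epoch = 0;           // newest OSD map epoch the PG has seen
  epoch_t last_peering_reset = 0;  // epoch at which the current interval began

  // Advance to a new map; interval_changed says whether this map altered
  // the up/acting set (or anything else that starts a new interval).
  void advance_map(epoch_t e, bool interval_changed) {
    map_epoch = e;
    if (interval_changed) {
      last_peering_reset = e;
    }
  }

  // Simplified interval check: did an interval change happen after e?
  bool has_reset_since(epoch_t e) const { return e < last_peering_reset; }
};

// The overly strict style of check: treats any map change as fatal to the op.
inline bool epoch_unchanged(const ToyPG& pg, epoch_t captured) {
  return pg.map_epoch == captured;
}
```

In this model, an op that captured epoch 10 and resumes at epoch 11 within the same interval fails `epoch_unchanged` but passes `!has_reset_since(10)`, which matches the behavior the commit message describes as correct.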

xxhdx1985126 · 2023-01-31T05:27:28Z

@athanatos I think this PR is laying the ground work for fixing https://tracker.ceph.com/issues/58486, am I right?

athanatos · 2023-01-31T15:23:47Z

@xxhdx1985126 The last commit should fix it.

xxhdx1985126 · 2023-02-01T01:58:19Z

> @xxhdx1985126 The last commit should fix it.

Ah, yes, pg's acting/up set may be the same in different osd maps, am I right?

athanatos · 2023-02-01T03:14:40Z

@xxhdx1985126 Precisely. For any specific PG, the vast majority of map changes do not affect the up or acting set. Many map changes contain only changes to things like snapshot metadata and therefore affect no pg's up/acting set. For a specific PG, we call a contiguous sequence of map epochs with no changes to the up/acting set (or a few other things) an interval. See osd_types.h PastIntervals::is_new_interval() for the precise definition of when two consecutive maps constitute an interval change.
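As a rough sketch of the interval idea, the following groups a sequence of map epochs into intervals for one PG. It is deliberately simplified: `ToyMapView`, `toy_is_new_interval`, and `count_intervals` are hypothetical names, and the comparison looks only at the up and acting sets, whereas the real PastIntervals::is_new_interval() also considers the "few other things" mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One PG's view of a single OSD map epoch (hypothetical, for illustration).
struct ToyMapView {
  unsigned epoch;
  std::vector<int> up;      // OSD ids in the up set
  std::vector<int> acting;  // OSD ids in the acting set
};

// Simplified rule: two consecutive maps start a new interval only if the
// up or acting set differs. The real definition checks more than this.
inline bool toy_is_new_interval(const ToyMapView& prev, const ToyMapView& next) {
  return prev.up != next.up || prev.acting != next.acting;
}

// Count how many intervals a contiguous run of map epochs spans.
inline std::size_t count_intervals(const std::vector<ToyMapView>& maps) {
  if (maps.empty()) {
    return 0;
  }
  std::size_t intervals = 1;
  for (std::size_t i = 1; i < maps.size(); ++i) {
    if (toy_is_new_interval(maps[i - 1], maps[i])) {
      ++intervals;
    }
  }
  return intervals;
}
```

With four consecutive epochs where only the last one replaces an OSD in the up/acting set, this yields two intervals: the first three epochs share one, and the fourth starts another.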

athanatos added 8 commits January 28, 2023 01:20

crimson/common: move log macros to log.h from os/seastore/logging.h

3ae7e4d

These have been handy in imposing uniformity on seastore logging, let's start using them in PG/OSD code. Signed-off-by: Samuel Just <sjust@redhat.com>

crimson/osd/replicated_backend: use new logging macros

10ea3fe

Signed-off-by: Samuel Just <sjust@redhat.com>

crimson/osd/pg_interval_interrupt_condition: use new logging macros

6b99bb0

Signed-off-by: Samuel Just <sjust@redhat.com>

crimson/osd/object_context_loader: take backend by reference

edf4f3e

It's required and we don't check for null. Signed-off-by: Samuel Just <sjust@redhat.com>
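The rationale for taking a required dependency by reference can be sketched as follows. The types are hypothetical (`ToyBackend`, `LoaderByPointer`, `LoaderByReference`) and are not the actual object_context_loader code; they only contrast the two signatures.

```cpp
#include <cassert>
#include <string>

// Hypothetical backend type, for illustration only.
struct ToyBackend {
  std::string name;
};

// Pointer version: the type permits nullptr, so an unchecked dereference
// is a latent bug even if callers always pass a valid object.
class LoaderByPointer {
  ToyBackend* backend;
public:
  explicit LoaderByPointer(ToyBackend* b) : backend(b) {}
  // Undefined behavior if backend is null; nothing here checks for it.
  const std::string& backend_name() const { return backend->name; }
};

// Reference version: the signature itself documents that a backend is
// required, and callers cannot pass "no backend" without extra effort.
class LoaderByReference {
  ToyBackend& backend;
public:
  explicit LoaderByReference(ToyBackend& b) : backend(b) {}
  const std::string& backend_name() const { return backend.name; }
};
```

Both versions behave the same when given a valid backend; the reference form simply moves the "must not be null" requirement into the type.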

crimson/osd/object_context_loader: convert to log macros

4abbbc5

Signed-off-by: Samuel Just <sjust@redhat.com>

crimson/osd/object_context_loader: add logging for notify_on_change

0355c52

Signed-off-by: Samuel Just <sjust@redhat.com>

crimson/osd/osd_operations/client_request: clarify get_obc stage message

a389c31

Signed-off-by: Samuel Just <sjust@redhat.com>

athanatos assigned rzarzynski Jan 31, 2023

athanatos requested a review from a team as a code owner January 31, 2023 04:27

athanatos assigned xxhdx1985126 Jan 31, 2023

github-actions bot added common crimson labels Jan 31, 2023

athanatos force-pushed the sjust/wip-58486 branch from 52194ee to ab91b7c Compare January 31, 2023 04:35

xxhdx1985126 approved these changes Jan 31, 2023

athanatos requested a review from rzarzynski January 31, 2023 22:33

athanatos merged commit 8a095e9 into ceph:main Feb 2, 2023

athanatos mentioned this pull request Feb 8, 2023

crimson/net: refactors to encapsulate ProtocolV2 message read and write paths and event dispatching #49420

Merged

14 tasks
