
osd/scrub: tag replica scrub messages to identify stale events #42684

Merged
2 commits merged from wip-ronenf-scrub-token on Sep 12, 2021

Conversation

@ronen-fr (Contributor) commented Aug 5, 2021

A lot can happen while a replica is waiting for the backend
to collect the map data. The Primary might, for example, abort
the scrub and start a new one (following no-scrub commands).

The 'token' introduced here tags 'replica scrub resched'
messages with an index value that is modified on each 'release
scrub resources' request from the Primary.

Fixes: https://tracker.ceph.com/issues/52012
Signed-off-by: Ronen Friedman rfriedma@redhat.com
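
As a rough illustration of the mechanism described above (a minimal, self-contained sketch; the class and member names below are hypothetical, not the actual PgScrubber/OSD code): the replica keeps a per-PG counter, bumps it on every 'release scrub resources' request from the Primary, stamps the counter into each locally queued 'replica scrub resched' event, and drops any event whose stamp no longer matches.

```cpp
// Sketch only: models the tagging and the staleness check, with invented names.
#include <cstdint>
#include <deque>
#include <iostream>

using act_token_t = std::uint32_t;  // hypothetical name, mirroring the PR's "token"

struct ReplicaReschedEvent {
  act_token_t token;  // value of the replica's counter when the event was queued
  int chunk;          // which part of the scrub-map work this event refers to
};

class ReplicaScrubModel {
  act_token_t current_token{0};
  std::deque<ReplicaReschedEvent> queue;

 public:
  // Primary sent 'release scrub resources': anything still queued is now stale.
  void on_release_from_primary() { ++current_token; }

  // Queue a resched event tagged with the current token.
  void queue_resched(int chunk) { queue.push_back({current_token, chunk}); }

  // Process queued events, dropping those that predate the last release.
  void drain() {
    while (!queue.empty()) {
      auto ev = queue.front();
      queue.pop_front();
      if (ev.token != current_token) {
        std::cout << "discarding stale resched (chunk " << ev.chunk << ")\n";
        continue;
      }
      std::cout << "handling resched (chunk " << ev.chunk << ")\n";
    }
  }
};

int main() {
  ReplicaScrubModel replica;
  replica.queue_resched(1);           // queued for the first scrub
  replica.on_release_from_primary();  // Primary aborted / released resources
  replica.queue_resched(2);           // queued for the new scrub
  replica.drain();                    // chunk 1 is discarded, chunk 2 is handled
}
```

The sketch only models the tagging and the staleness check, not how the events are actually queued and delivered inside the OSD.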

@github-actions bot added the core label (Aug 5, 2021)
@ronen-fr requested a review from neha-ojha (August 6, 2021 07:32)
@ronen-fr marked this pull request as ready for review (August 6, 2021 07:32)
@ronen-fr requested reviews from athanatos, a team, and amathuria (August 6, 2021 07:33)
Resolved (outdated) review threads: src/osd/OSD.h (two threads), src/osd/PG.cc, src/osd/scrub_machine.h
@athanatos (Contributor):

I don't quite understand the token lifecycle. It doesn't look like it gets encoded in any primary->replica messages, so it's purely used for the replica to cancel any outstanding local events when its reservation is canceled?

@ronen-fr (Contributor, Author):

> I don't quite understand the token lifecycle. It doesn't look like it gets encoded in any primary->replica messages, so it's purely used for the replica to cancel any outstanding local events when its reservation is canceled?

Yes. (And, even internal to the replica, I started by adding it to some of the events, but found out later that it's not needed there.)

@ronen-fr (Contributor, Author):

> I don't quite understand the token lifecycle. It doesn't look like it gets encoded in any primary->replica messages, so it's purely used for the replica to cancel any outstanding local events when its reservation is canceled?

Yes.
(We have discussed this issue once, some months ago. At that time I was under the false assumption that if the Primary stops and restarts the scrub, the replica will notice a new interval. So there are two separate issues to handle:

  • how can a replica know that a scrub request is new? and
  • when handling a new request, what enables us to discard the stale internal events?

A sketch of that check follows this comment.)
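
A rough sketch of that relevance check (hypothetical names and values, not the actual scrubber code), assuming the replica compares both the interval and the token when deciding whether a queued internal event still belongs to the current scrub session:

```cpp
// Sketch only: interval comparison alone cannot distinguish an
// "abort + restart within the same interval", so a token is compared as well.
#include <cassert>
#include <cstdint>

using epoch_t = std::uint32_t;
using act_token_t = std::uint32_t;

struct ReplicaEvent {
  epoch_t queued_interval;  // interval the event was queued in
  act_token_t token;        // replica's counter value at queue time
};

struct ReplicaScrubState {
  epoch_t same_interval_since{100};  // unchanged by an abort+restart of the scrub
  act_token_t current_token{7};      // bumped on every 'release scrub resources'
};

// Stale if it belongs to an older interval OR to a scrub session that the
// Primary has already released, even within the same interval.
bool is_event_relevant(const ReplicaScrubState& st, const ReplicaEvent& ev) {
  return ev.queued_interval == st.same_interval_since &&
         ev.token == st.current_token;
}

int main() {
  ReplicaScrubState st;
  ReplicaEvent from_old_session{100, 6};  // same interval, but pre-release
  ReplicaEvent from_new_session{100, 7};
  assert(!is_event_relevant(st, from_old_session));
  assert(is_event_relevant(st, from_new_session));
}
```

Interval comparison alone handles the "new interval" case; the token is what covers an abort-and-restart within the same interval.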

Resolved (outdated) review thread: src/osd/pg_scrubber.h
@ronen-fr force-pushed the wip-ronenf-scrub-token branch 2 times, most recently from 504b683 to aa5fe4c (August 11, 2021 09:15)
@neha-ojha (Member):

> > I don't quite understand the token lifecycle. It doesn't look like it gets encoded in any primary->replica messages, so it's purely used for the replica to cancel any outstanding local events when its reservation is canceled?
>
> Yes. (We have discussed this issue once, some months ago. At that time I was under the false assumption that if the Primary stops and restarts the scrub, the replica will notice a new interval. So there are two separate issues to handle: how can a replica know that a scrub request is new, and, when handling a new request, what enables us to discard the stale internal events.)

@ronen-fr how was the pre-refactor code handling this? It seems like https://tracker.ceph.com/issues/52012#note-1 was possible even earlier?

@github-actions:

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@ronen-fr (Contributor, Author):

> @ronen-fr how was the pre-refactor code handling this? It seems like https://tracker.ceph.com/issues/52012#note-1 was possible even earlier?

(Before an actual analysis of the code:) I think so. It would require a combination of a 'scrub abort' and osd_max_scrubs > 1.

@github-actions:

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

The second commit's message ("…ir scrubs"):

Previously, after-repair scrubs were started without waiting for either the local or the remote OSDs' scrub resources. The tagging of scrub sessions by the replicas is based on monitoring the replica-request and replica-release messages from the Primary, and scrub-map requests arriving without any reservation interfere with this mechanism. The benefits of this fast-track were limited at best and did not justify the complexity of a solution that accommodates both the bypass and the tagging.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
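
To illustrate the interference described in that commit message (a simplified sketch under stated assumptions; the names below are hypothetical), the replica's session boundary only advances on the reserve/release pair, so a scrub-map request that never went through a reservation has no session to be tagged against:

```cpp
// Sketch only: invented names, not the actual Ceph reservation code.
#include <cstdint>
#include <iostream>

using act_token_t = std::uint32_t;

class ReplicaReservation {
  bool reserved{false};
  act_token_t token{0};

 public:
  void handle_reserve_request() { reserved = true; }
  void handle_release()         { reserved = false; ++token; }  // session boundary

  void handle_scrub_map_request() {
    if (!reserved) {
      // Before this change, an after-repair scrub could legitimately land here;
      // with the bypass removed, every scrub reserves first.
      std::cout << "map request without a reservation - no session to tag\n";
      return;
    }
    std::cout << "building scrub map for session token " << token << '\n';
  }
};

int main() {
  ReplicaReservation r;
  r.handle_scrub_map_request();  // the problematic, reservation-less path
  r.handle_reserve_request();
  r.handle_scrub_map_request();  // normal, reserved path
  r.handle_release();
}
```

With the bypass removed, every scrub-map request is preceded by a reservation, so the reserved path is the only one expected.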
@ronen-fr (Contributor, Author):

Initial set of 72 tests is clean:
http://pulpito.front.sepia.ceph.com/?branch=wip-ronenf-scrub-token

@ronen-fr (Contributor, Author):

Runs:
original: yuriw-2021-09-09_21:20:54-rados-wip-yuri-testing-2021-09-09-1227-distro-basic-smithi (9 failures, 16 dead jobs)
rerun: yuriw-2021-09-10_14:25:47-rados-wip-yuri-testing-2021-09-09-1227-distro-basic-smithi (4 failures, 1 dead job)

Failures:

#6382926: "cephadm/test_dashboard_e2e.sh": a dashboard-related failure; seems to be https://tracker.ceph.com/issues/52417
#6382927: mgr/rook; might be https://tracker.ceph.com/issues/52321
#6382940: ceph_test_objectstore crash; opened https://tracker.ceph.com/issues/52576
#6382942: seems to be the same as #6382926

The dead job: a test-environment issue (reimaging).
