
osd/scrub: tag replica scrub messages to identify stale events #42684

Merged
2 commits merged from wip-ronenf-scrub-token on Sep 12, 2021

Conversation

@ronen-fr (Contributor) commented Aug 5, 2021

A lot can happen while a replica is waiting for the backend
to collect the map data. The Primary might, for example, abort
the scrub and start a new one (following no-scrub commands).

The 'token' introduced here tags 'replica scrub resched'
messages with an index value that is modified on each 'release
scrub resources' request from the Primary.

Fixes: https://tracker.ceph.com/issues/52012
Signed-off-by: Ronen Friedman rfriedma@redhat.com
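
As a rough illustration of the mechanism described above (a minimal, self-contained sketch; the class and member names below are hypothetical, not the actual PgScrubber/OSD code): the replica keeps a per-PG counter, bumps it on every 'release scrub resources' request from the Primary, stamps the counter into each locally queued 'replica scrub resched' event, and drops any event whose stamp no longer matches.

```cpp
// Sketch only: models the tagging and the staleness check, with invented names.
#include <cstdint>
#include <deque>
#include <iostream>

using act_token_t = std::uint32_t;  // hypothetical name, mirroring the PR's "token"

struct ReplicaReschedEvent {
  act_token_t token;  // value of the replica's counter when the event was queued
  int chunk;          // which part of the scrub-map work this event refers to
};

class ReplicaScrubModel {
  act_token_t current_token{0};
  std::deque<ReplicaReschedEvent> queue;

 public:
  // Primary sent 'release scrub resources': anything still queued is now stale.
  void on_release_from_primary() { ++current_token; }

  // Queue a resched event tagged with the current token.
  void queue_resched(int chunk) { queue.push_back({current_token, chunk}); }

  // Process queued events, dropping those that predate the last release.
  void drain() {
    while (!queue.empty()) {
      auto ev = queue.front();
      queue.pop_front();
      if (ev.token != current_token) {
        std::cout << "discarding stale resched (chunk " << ev.chunk << ")\n";
        continue;
      }
      std::cout << "handling resched (chunk " << ev.chunk << ")\n";
    }
  }
};

int main() {
  ReplicaScrubModel replica;
  replica.queue_resched(1);           // queued for the first scrub
  replica.on_release_from_primary();  // Primary aborted / released resources
  replica.queue_resched(2);           // queued for the new scrub
  replica.drain();                    // chunk 1 is discarded, chunk 2 is handled
}
```

The sketch only models the tagging and the staleness check, not how the events are actually queued and delivered inside the OSD.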

@github-actions bot added the core label (Aug 5, 2021)
@ronen-fr requested a review from neha-ojha (August 6, 2021 07:32)
@ronen-fr marked this pull request as ready for review (August 6, 2021 07:32)
@ronen-fr requested reviews from athanatos, a team, and amathuria (August 6, 2021 07:33)
Resolved (outdated) review threads: src/osd/OSD.h (two threads), src/osd/PG.cc, src/osd/scrub_machine.h
@athanatos (Contributor):

I don't quite understand the token lifecycle. It doesn't look like it gets encoded in any primary->replica messages, so it's purely used for the replica to cancel any outstanding local events when its reservation is canceled?

@ronen-fr (Contributor, Author):

> I don't quite understand the token lifecycle. It doesn't look like it gets encoded in any primary->replica messages, so it's purely used for the replica to cancel any outstanding local events when its reservation is canceled?

Yes. (And, even internal to the replica, I started by adding it to some of the events, but found out later that it's not needed there.)

@ronen-fr (Contributor, Author):

> I don't quite understand the token lifecycle. It doesn't look like it gets encoded in any primary->replica messages, so it's purely used for the replica to cancel any outstanding local events when its reservation is canceled?

Yes.
(We have discussed this issue once, some months ago. At that time I was under the false assumption that if the Primary stops and restarts the scrub, the replica will notice a new interval. So there are two separate issues to handle:

  • how can a replica know that a scrub request is new? and
  • when handling a new request, what enables us to discard the stale internal events?

A sketch of that check follows this comment.)
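
A rough sketch of that relevance check (hypothetical names and values, not the actual scrubber code), assuming the replica compares both the interval and the token when deciding whether a queued internal event still belongs to the current scrub session:

```cpp
// Sketch only: interval comparison alone cannot distinguish an
// "abort + restart within the same interval", so a token is compared as well.
#include <cassert>
#include <cstdint>

using epoch_t = std::uint32_t;
using act_token_t = std::uint32_t;

struct ReplicaEvent {
  epoch_t queued_interval;  // interval the event was queued in
  act_token_t token;        // replica's counter value at queue time
};

struct ReplicaScrubState {
  epoch_t same_interval_since{100};  // unchanged by an abort+restart of the scrub
  act_token_t current_token{7};      // bumped on every 'release scrub resources'
};

// Stale if it belongs to an older interval OR to a scrub session that the
// Primary has already released, even within the same interval.
bool is_event_relevant(const ReplicaScrubState& st, const ReplicaEvent& ev) {
  return ev.queued_interval == st.same_interval_since &&
         ev.token == st.current_token;
}

int main() {
  ReplicaScrubState st;
  ReplicaEvent from_old_session{100, 6};  // same interval, but pre-release
  ReplicaEvent from_new_session{100, 7};
  assert(!is_event_relevant(st, from_old_session));
  assert(is_event_relevant(st, from_new_session));
}
```

Interval comparison alone handles the "new interval" case; the token is what covers an abort-and-restart within the same interval.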

Resolved (outdated) review thread: src/osd/pg_scrubber.h
@ronen-fr force-pushed the wip-ronenf-scrub-token branch 2 times, most recently from 504b683 to aa5fe4c (August 11, 2021 09:15)
@neha-ojha (Member):

> > I don't quite understand the token lifecycle. It doesn't look like it gets encoded in any primary->replica messages, so it's purely used for the replica to cancel any outstanding local events when its reservation is canceled?
>
> Yes. (We have discussed this issue once, some months ago. At that time I was under the false assumption that if the Primary stops and restarts the scrub, the replica will notice a new interval. So there are two separate issues to handle: how can a replica know that a scrub request is new, and, when handling a new request, what enables us to discard the stale internal events.)

@ronen-fr how was the pre-refactor code handling this? It seems like https://tracker.ceph.com/issues/52012#note-1 was possible even earlier?

@github-actions:

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@ronen-fr (Contributor, Author):

> @ronen-fr how was the pre-refactor code handling this? It seems like https://tracker.ceph.com/issues/52012#note-1 was possible even earlier?

(Before an actual analysis of the code:) I think so. It would require a combination of a 'scrub abort' and osd_max_scrubs > 1.

@github-actions:

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

The second commit's message ("…ir scrubs"):

Previously, after-repair scrubs were started without waiting for either the local or the remote OSDs' scrub resources. The tagging of scrub sessions by the replicas is based on monitoring the replica-request and replica-release messages from the Primary, and scrub-map requests arriving without any reservation interfere with this mechanism. The benefits of this fast-track were limited at best and did not justify the complexity of a solution that accommodates both the bypass and the tagging.

Signed-off-by: Ronen Friedman <rfriedma@redhat.com>
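
To illustrate the interference described in that commit message (a simplified sketch under stated assumptions; the names below are hypothetical), the replica's session boundary only advances on the reserve/release pair, so a scrub-map request that never went through a reservation has no session to be tagged against:

```cpp
// Sketch only: invented names, not the actual Ceph reservation code.
#include <cstdint>
#include <iostream>

using act_token_t = std::uint32_t;

class ReplicaReservation {
  bool reserved{false};
  act_token_t token{0};

 public:
  void handle_reserve_request() { reserved = true; }
  void handle_release()         { reserved = false; ++token; }  // session boundary

  void handle_scrub_map_request() {
    if (!reserved) {
      // Before this change, an after-repair scrub could legitimately land here;
      // with the bypass removed, every scrub reserves first.
      std::cout << "map request without a reservation - no session to tag\n";
      return;
    }
    std::cout << "building scrub map for session token " << token << '\n';
  }
};

int main() {
  ReplicaReservation r;
  r.handle_scrub_map_request();  // the problematic, reservation-less path
  r.handle_reserve_request();
  r.handle_scrub_map_request();  // normal, reserved path
  r.handle_release();
}
```

With the bypass removed, every scrub-map request is preceded by a reservation, so the reserved path is the only one expected.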
@ronen-fr (Contributor, Author):

Initial set of 72 tests is clean:
http://pulpito.front.sepia.ceph.com/?branch=wip-ronenf-scrub-token

@ronen-fr (Contributor, Author):

Runs:
original: yuriw-2021-09-09_21:20:54-rados-wip-yuri-testing-2021-09-09-1227-distro-basic-smithi (9 failures, 16 dead jobs)
rerun: yuriw-2021-09-10_14:25:47-rados-wip-yuri-testing-2021-09-09-1227-distro-basic-smithi (4 failures, 1 dead job)

Failures:

#6382926: "cephadm/test_dashboard_e2e.sh": a dashboard-related failure; seems to be https://tracker.ceph.com/issues/52417
#6382927: mgr/rook; might be https://tracker.ceph.com/issues/52321
#6382940: ceph_test_objectstore crash; opened https://tracker.ceph.com/issues/52576
#6382942: seems to be the same as #6382926

The dead job: a test-environment issue (reimaging).
