rbd-mirror: stop local journal replayer first during shut down #35348

Merged
2 commits merged on Jun 4, 2020

dillaman (Author) commented Jun 2, 2020

Shutting down the local journal replayer flushes any in-flight IO and cancels
any ops that may be stuck waiting for a FinishOp event. This restores the
pre-Octopus behavior, from before the refactoring to support snapshot mirroring.

Fixes: https://tracker.ceph.com/issues/45714
Signed-off-by: Jason Dillaman <dillaman@redhat.com>

Checklist

  • References tracker ticket
  • Updates documentation if necessary
  • Includes tests for new functionality or reproducer for bug

trociny (Contributor) commented Jun 3, 2020

@dillaman Don't you think this will reintroduce [1]? The order was changed because the replay shutdown could be initiated while a replay flush was in progress, and the flush also performs a shutdown.

[1] https://tracker.ceph.com/issues/45409

dillaman (Author) commented Jun 3, 2020

Forgot about that. We need to keep this order, but I suppose that just means we need to add tracking for in-progress flushes.

Jason Dillaman added 2 commits June 3, 2020 07:50
…l replay"

This reverts commit aeccb03.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>

If a journal replay flush is in progress when the ImageReplayer is stopped,
the two can race and trigger an assertion failure, because both attempt to
shut down the same journal replay state machine.

Fixes: https://tracker.ceph.com/issues/45409
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
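
Editorial illustration: the race described in the commit message above can be avoided by counting in-flight flushes and deferring the shutdown until the last one completes. The following is only a minimal sketch of that idea, not the code from this PR. The class `ReplayShutdownTracker` and all of its methods are hypothetical names invented for the example and do not correspond to anything in rbd-mirror.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical helper (not Ceph's API): guarantees that the replay state
// machine is shut down exactly once, even if a flush is still running
// when the shutdown is requested.
class ReplayShutdownTracker {
public:
  // Record that a flush is starting. Returns false if a shutdown has
  // already been requested, in which case the flush must not start.
  bool start_flush() {
    std::lock_guard<std::mutex> lock(m_lock);
    if (m_shutdown_requested) {
      return false;
    }
    ++m_in_flight_flushes;
    return true;
  }

  // Record that a flush finished. If a shutdown was deferred while
  // flushes were running and this was the last one, run it now.
  void finish_flush() {
    std::function<void()> on_shutdown;
    {
      std::lock_guard<std::mutex> lock(m_lock);
      assert(m_in_flight_flushes > 0);
      --m_in_flight_flushes;
      if (m_in_flight_flushes == 0 && m_shutdown_requested && !m_shut_down) {
        m_shut_down = true;
        on_shutdown = std::move(m_on_shutdown);
      }
    }
    // Invoke the callback outside the lock to avoid re-entrancy deadlocks.
    if (on_shutdown) {
      on_shutdown();
    }
  }

  // Request a shutdown. Runs the callback immediately when no flush is
  // in flight; otherwise defers it until the last flush completes.
  // Repeated requests are ignored, so the callback runs at most once.
  void shut_down(std::function<void()> on_shutdown) {
    {
      std::lock_guard<std::mutex> lock(m_lock);
      if (m_shutdown_requested) {
        return;
      }
      m_shutdown_requested = true;
      if (m_in_flight_flushes > 0) {
        m_on_shutdown = std::move(on_shutdown);
        return;
      }
      m_shut_down = true;
    }
    if (on_shutdown) {
      on_shutdown();
    }
  }

  bool is_shut_down() const {
    std::lock_guard<std::mutex> lock(m_lock);
    return m_shut_down;
  }

private:
  mutable std::mutex m_lock;
  int m_in_flight_flushes = 0;
  bool m_shutdown_requested = false;
  bool m_shut_down = false;
  std::function<void()> m_on_shutdown;
};
```

In this sketch, a caller would pair every successful `start_flush()` with a `finish_flush()`, and a stop request would call `shut_down()` with the callback that tears down the replay state machine. Running the callback outside the lock is a design choice made here so that the callback may safely call back into the tracker.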

dillaman (Author) commented Jun 3, 2020

Tweaked

trociny (Contributor) left a comment

LGTM
