Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rbd-mirror: generally skip replay/resync if remote image is not primary #46759

Merged
merged 6 commits into from
Jun 22, 2022

Conversation

idryomov
Copy link
Contributor

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

Initialize local_promotion_state and remote_promotion_state to UNKNOWN
instead of counterintuitive PRIMARY and NON_PRIMARY -- half the time the
final values are flipped.  Then is_local_primary() and is_linked() can
be strengthened as a non-existent image should stay in UNKNOWN.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Replay and resync should generally be skipped if the remote image is
not primary.

If this is not done for replay, snapshot-based mirroring can run into
a livelock if the primary image is demoted while a mirror snapshot is
being synced.  On the demote site, rbd-mirror would pick up the just
demoted image, grab the exclusive lock on it and idle waiting for a new
mirror snapshot to be created.  On the (still) non-primary site,
rbd-mirror would eventually finish syncing that mirror snapshot and
attempt to unlink from it on the demote site.  These attempts would
fail with EROFS due to exclusive lock being held in the "refuse proxied
maintenance operations" mode, blocking forward progress (syncing of the
demotion snapshot so that the non-primary image can be orderly promoted
to primary, etc).

If this is not done for resync, data loss can ensue as the just demoted
image would be immediately trashed, underneath the non-primary site that
is still syncing.

Currently this is done in PrepareReplayRequest only for journal-based
mirroring.  Note that it is conditional: if the local image is linked
to the remote image, proceeding is desirable.

Generalize this check, consolidate it with a related check in
PrepareRemoteImageRequest and move the result to BootstrapRequest to
cover both "local image does not exist" and "local image is unlinked"
cases for both modes.

Fixes: https://tracker.ceph.com/issues/54448
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Now unused (and if it was used, the entire StateBuilder is passed in
anyway).

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
It was botched in commit 2bca9ee ("rbd-mirror: consolidate
prepare local/remote image steps to bootstrap") and went unnoticed
because currently no special handling is needed for disconnected
clients -- is_disconnected() check happens to be the last step
and it doesn't generate an error.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
There is a difference: non-primary means NON_PRIMARY promotion state,
while "not primary" can refer to any of NON_PRIMARY, ORPHAN or UNKNOWN
promotion states.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
@idryomov
Copy link
Contributor Author

jenkins test windows

Copy link
Member

@MrFreezeex MrFreezeex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems right! I also did some changes in this part of the code in #43359 and I think it should be a bit cleaner with this new approach for the multipeer work. It might even (I have little hope though as I also have some on the journal mode) fix some bugs I have there.

Copy link
Contributor

@trociny trociny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@ideepika ideepika left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tested with reliable reproducer https://tracker.ceph.com/attachments/download/6024/test_many.sh it resolves the issue manifesting as read-only filesystem error, eventually leading to failure in peer unlinking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants