mimic: rbd-mirror: optionally support active/active replication #22105

Merged
merged 25 commits into ceph:mimic from wip-rbd-mirror-policy-mimic
May 21, 2018


dillaman

No description provided.

Jason Dillaman added 25 commits May 19, 2018 08:16
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 53b87b9)
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 06e1244)
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit bce5328)
If the leader role is manually released, upon failback the instance
will have removed its local instance object, preventing RPC
messaging.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit ce97430)
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 524f08c)
Also avoid attempting to send status using an invalid librados::IoCtx
handle due to a deleted pool.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit e2a0088)
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit d7cb5db)
…tartup

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit a9d335d)
In an active/active scenario, if the leader was offline while mirroring for
a remote image was disabled, the assigned replayer instance may not detect
the image removal.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit b3acc56)
… down

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 4525b7a)
A recent code change associated with a librbd cleanup incorrectly started
using the remote parent image's snapshot id.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 02afa2a)
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 88436b5)
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 9a1fcef)
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit af42984)
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 2534e5e)
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit c1e111e)
…ok command

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 15a197e)
…thology

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 1bb6d4f)
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit a6a8c8a)
…mons

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 3086f29)
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 02e32fe)
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 917f8a0)
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit e2d5f84)
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 61e94e6)
Previously, the image map would only return a maximum of 64 mappings.

Signed-off-by: Jason Dillaman <dillaman@redhat.com>
(cherry picked from commit 3cbe0ce)
@trociny
Contributor

trociny commented May 21, 2018

@dillaman, the teuthology result contains one rbd_mirror_fsx_compare.sh failure. It failed while waiting for the snapshot creation to propagate to the slave, because the replay was in an error state at that time due to this error:

2018-05-20 12:44:45.338 7fd88d7fa700 -1 librbd::SnapshotUnprotectRequest: cannot unprotect: at least 1 child(ren) [108323436314] in pool 'mirror'
2018-05-20 12:44:45.338 7fd88d7fa700 -1 librbd::SnapshotUnprotectRequest: encountered error: (16) Device or resource busy
2018-05-20 12:44:45.338 7fd88d7fa700 -1 librbd::SnapshotUnprotectRequest: 0x7fd7ec1a10b0 should_complete_error: ret_val=-16
2018-05-20 12:44:45.338 7fd88d7fa700 -1 librbd::SnapshotUnprotectRequest: 0x7fd7ec1a10b0 should_complete_error: ret_val=-16
2018-05-20 12:44:45.338 7fd88d7fa700 -1 librbd::deep_copy::SnapshotCopyRequest: 0x7fd7ec13fc20 handle_snap_unprotect: failed to unprotect snapshot 'snap': (16) Device or resource busy
2018-05-20 12:44:45.338 7fd88d7fa700 -1 librbd::DeepCopyRequest: 0x7fd7ec575020 handle_copy_snapshots: failed to copy snapshot metadata: (16) Device or resource busy
2018-05-20 12:44:45.338 7fd88d7fa700 10 rbd::mirror::ImageSync: 0x7fd7b410fb20 handle_copy_image: r=-16
2018-05-20 12:44:45.338 7fd88d7fa700 -1 rbd::mirror::ImageSync: 0x7fd7b410fb20 handle_copy_image: failed to copy image: (16) Device or resource busy

Do you think we can still proceed and merge this to mimic?

@dillaman
Author

@trociny Yeah, I think it's fine, since it looks like it's just this known issue [1].

[1] http://tracker.ceph.com/issues/24140

@trociny trociny merged commit 063fb45 into ceph:mimic May 21, 2018
@dillaman dillaman deleted the wip-rbd-mirror-policy-mimic branch May 21, 2018 12:45
@luoguanzzz

What caused this error?

[root@ceph-mon-node1 lgrbd0]# rbd snap unprotect rbd1/lgrbd2@lgrbd2.snap
2019-11-14 10:31:46.409 7f07f8ff9700 -1 librbd::SnapshotUnprotectRequest: cannot unprotect: at least 1 child(ren) [25a4e6b8b4567] in pool 'rbd1'
2019-11-14 10:31:46.412 7f07f8ff9700 -1 librbd::SnapshotUnprotectRequest: encountered error: (16) Device or resource busy
2019-11-14 10:31:46.412 7f07f8ff9700 -1 librbd::SnapshotUnprotectRequest: 0x563503d4c380 should_complete_error: ret_val=-16
2019-11-14 10:31:46.419 7f07f8ff9700 -1 librbd::SnapshotUnprotectRequest: 0x563503d4c380 should_complete_error: ret_val=-16
rbd: unprotecting snap failed: (16) Device or resource busy
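
For readers hitting the same message: return value -16 is EBUSY, and the first log line names the reason, namely that at least one clone (child image) in the listed pool still depends on the protected snapshot. The following is only a minimal sketch for pulling those details out of such log text; the function name `summarize_unprotect_failure` is hypothetical, the regular expressions are inferred solely from the log lines quoted in this thread, and the comma-separated handling of multiple child ids is an assumption.

```python
import errno
import re

# Pattern inferred from the log lines quoted in this thread; the exact
# wording may differ in other Ceph releases (assumption).
CHILD_RE = re.compile(
    r"cannot unprotect: at least (?P<count>\d+) child\(ren\) "
    r"\[(?P<ids>[^\]]*)\] in pool '(?P<pool>[^']*)'"
)
RET_RE = re.compile(r"ret_val=(?P<ret>-?\d+)")


def summarize_unprotect_failure(log_text):
    """Extract the pool, child image ids and errno name from librbd log text."""
    summary = {"pool": None, "child_ids": [], "errno_name": None}
    for line in log_text.splitlines():
        child = CHILD_RE.search(line)
        if child:
            summary["pool"] = child.group("pool")
            # Assumption: multiple child ids, if present, are comma-separated.
            summary["child_ids"] = [
                part.strip()
                for part in child.group("ids").split(",")
                if part.strip()
            ]
        ret = RET_RE.search(line)
        if ret and summary["errno_name"] is None:
            # Map the negative return value to its symbolic errno name.
            summary["errno_name"] = errno.errorcode.get(abs(int(ret.group("ret"))))
    return summary


if __name__ == "__main__":
    sample = (
        "2019-11-14 10:31:46.409 7f07f8ff9700 -1 librbd::SnapshotUnprotectRequest: "
        "cannot unprotect: at least 1 child(ren) [25a4e6b8b4567] in pool 'rbd1'\n"
        "2019-11-14 10:31:46.412 7f07f8ff9700 -1 librbd::SnapshotUnprotectRequest: "
        "0x563503d4c380 should_complete_error: ret_val=-16\n"
    )
    print(summarize_unprotect_failure(sample))
```

With the pool and child id in hand, `rbd children rbd1/lgrbd2@lgrbd2.snap` lists the dependent clones, and a clone generally has to be flattened (`rbd flatten`) or removed before the snapshot can be unprotected.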
