rbd-mirror: clean-up unnecessary non-primary snapshots #34496
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
This will allow a remote rbd-mirror process to have a snapshot to use for delta sync operations during failover. Signed-off-by: Jason Dillaman <dillaman@redhat.com>
…mage A pending refresh could occur after setting the non-primary feature flag but before the creation of the demotion snapshot. This would prevent the snapshot from being created and would leave the image in a half-primary state. Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
Snapshot-based mirroring may need to delete a demotion snapshot during the unlink process. Previously, these snapshots were left in place and the resulting read-only error was ignored. Signed-off-by: Jason Dillaman <dillaman@redhat.com>
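The commit above needs unlink to be able to delete a demotion snapshot even though the demoted image is read-only. The following is a minimal sketch of that idea under stated assumptions. It assumes unlink removes the peer's UUID from the snapshot and deletes the snapshot once no peers remain. The names `MirrorSnap`, `Image`, `ReadOnlyError` and the `allow_read_only` flag are hypothetical and are not librbd's API.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


class ReadOnlyError(Exception):
    """Hypothetical stand-in for the read-only error on a non-primary image."""


@dataclass
class MirrorSnap:
    snap_id: int
    demoted: bool  # hypothetical: True for a demotion snapshot
    peer_uuids: Set[str] = field(default_factory=set)


@dataclass
class Image:
    read_only: bool
    snaps: Dict[int, MirrorSnap] = field(default_factory=dict)

    def remove_snap(self, snap_id: int, allow_read_only: bool = False) -> None:
        # Normal removals are rejected while the image is read-only.
        if self.read_only and not allow_read_only:
            raise ReadOnlyError(snap_id)
        del self.snaps[snap_id]


def unlink_peer(image: Image, snap_id: int, peer_uuid: str) -> bool:
    """Unlink a peer from a snapshot; return True if the snapshot was deleted."""
    snap = image.snaps[snap_id]
    snap.peer_uuids.discard(peer_uuid)
    if snap.peer_uuids:
        return False  # other peers still need this snapshot
    # Assumption: demotion snapshots may be deleted despite the read-only state.
    image.remove_snap(snap_id, allow_read_only=snap.demoted)
    return True
```

In this sketch, a non-demotion snapshot on a read-only image still raises `ReadOnlyError`, so only the demotion snapshot case is relaxed.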
Previously, only newly created user snapshots were included in the non-primary snapshot's snap-seq mapping table. However, we need to retain the full history of the mapping table if we want to be able to prune non-primary snapshots. Failovers are a special case: there will be no valid snap-seq mapping, so it will need to be rebuilt. Luckily, both sides should have been read-only in the previous state, so we can use the snapshot names to find matches. Signed-off-by: Jason Dillaman <dillaman@redhat.com>
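The commit above rebuilds the snap-seq mapping after a failover by matching snapshot names on both sides. The following is a minimal sketch of that name-based matching. It assumes each snapshot is represented only by an ID and a name. `Snapshot` and `rebuild_snap_seq_map` are hypothetical illustrations, not rbd-mirror code.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Snapshot:
    snap_id: int
    name: str


def rebuild_snap_seq_map(remote_snaps: Iterable[Snapshot],
                         local_snaps: Iterable[Snapshot]) -> Dict[int, int]:
    """Map remote snapshot IDs to local snapshot IDs by matching names.

    Hypothetical helper: assumes both images were read-only before the
    failover, so snapshots present on both sides share the same name.
    """
    local_by_name = {s.name: s.snap_id for s in local_snaps}
    mapping: Dict[int, int] = {}
    for remote in remote_snaps:
        local_id = local_by_name.get(remote.name)
        if local_id is not None:  # remote snapshots with no local match are skipped
            mapping[remote.snap_id] = local_id
    return mapping
```

Consider remote snapshots `snap1` (ID 10) and `snap2` (ID 12) and local copies with IDs 3 and 4. The rebuilt table maps 10 to 3 and 12 to 4.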
Once a non-primary snapshot is no longer required for syncing, delete it from the image. Fixes: https://tracker.ceph.com/issues/44105 Signed-off-by: Jason Dillaman <dillaman@redhat.com>
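The following is a minimal sketch of how "no longer required for syncing" might be decided. It assumes the newest fully synced non-primary snapshot is kept as the base for the next delta sync. It also assumes any incomplete snapshot is an in-progress sync target and must be kept. Every older complete snapshot becomes a pruning candidate. `NonPrimarySnap`, its `complete` field, and `snaps_to_prune` are hypothetical and only illustrate the policy.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class NonPrimarySnap:
    snap_id: int
    complete: bool  # hypothetical: True once the sync into this snapshot finished


def snaps_to_prune(snaps: Iterable[NonPrimarySnap]) -> List[int]:
    """Return IDs of non-primary snapshots that are no longer needed for syncing.

    Keeps the newest complete snapshot (delta-sync base) and never prunes
    incomplete snapshots; all older complete snapshots are returned.
    """
    ordered = sorted(snaps, key=lambda s: s.snap_id)
    complete = [s for s in ordered if s.complete]
    if not complete:
        return []
    newest_complete = complete[-1].snap_id
    return [s.snap_id for s in complete if s.snap_id != newest_complete]
```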
If a previous remote snapshot was synced but the unlink failed, ensure we retry the unlink so that the remote can clean up the unused snapshot. Signed-off-by: Jason Dillaman <dillaman@redhat.com>
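The following is a minimal sketch of the retry behaviour described above. It assumes the replayer records a failed unlink and retries it on its next pass. It also assumes `unlink` is a callable that returns `True` on success. `ReplayState`, `unlink_after_sync`, and `retry_pending_unlink` are hypothetical names. For simplicity the sketch tracks at most one pending unlink.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ReplayState:
    # Hypothetical bookkeeping: remote snapshot that was synced but not yet unlinked.
    pending_unlink: Optional[int] = None


def unlink_after_sync(state: ReplayState, remote_snap_id: int,
                      unlink: Callable[[int], bool]) -> None:
    """Try to unlink from a fully synced remote snapshot; remember failures."""
    if not unlink(remote_snap_id):
        state.pending_unlink = remote_snap_id  # retried on the next replay pass


def retry_pending_unlink(state: ReplayState,
                         unlink: Callable[[int], bool]) -> bool:
    """Retry a previously failed unlink; return True if nothing is left pending."""
    if state.pending_unlink is None:
        return True
    if unlink(state.pending_unlink):
        state.pending_unlink = None
        return True
    return False
```

Without the retry, a single failed unlink would leave the remote snapshot linked indefinitely, so the remote side could never clean it up.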
Example of failover/failback image prior to changes:
Example of same image after improvements:
Note that ".mirror.primary.31ecfa9a-aa92-4bcd-a14f-34fc61296309.4be93599-88ef-4709-8812-3505fc749e17" should also be pruned, but that requires a larger refactoring of rbd-mirror since the replayer never restarts after it is promoted to primary. If it fails over again, it will get cleaned up.
jenkins test make check
The coredumps were caused by an unrelated race condition, now tracked here [1]. The other failures were related to the "multiple mirror peers not supported" race, which has since been fixed.
@dillaman I think it is unrelated to this PR, but I still see rbd-mirror crashes on There were 4 crashes, and 3 of them looked the same as the one I reported earlier, i.e. to me it looked like /a/trociny-2020-04-13_17:36:28-rbd-wip-mgolub-testing-distro-basic-smithi/4951849/remote/smithi122/log/cluster1-client.mirror.1.30013.log.gz And one of them was for journal-based mirroring. It seemed like
[1] http://pulpito.ceph.com/jdillaman-2020-04-09_09:42:22-rbd-wip-jd-testing-distro-basic-smithi/
@trociny It's related to that linked tracker ticket above. It's an existing issue that will need to be fixed and backported.
Signed-off-by: Jason Dillaman <dillaman@redhat.com>
LGTM