
mon: handle bad snapshot removal reqs gracefully #20835

Merged
merged 1 commit on Apr 26, 2018

Conversation

@emmericp (Contributor) commented Mar 11, 2018

Snapshot deletion requests on snap ids larger than the snap_seq of
the pool will leave the pool in a state with snap_seq being less
than max(removed_snaps).

This is bad because further deletion requests to a pool in this state
might crash the mon in some cases: the deletion also inserts the new
snap_seq into the removed_snaps set -- which might already exist
in this case and trigger an assert.

Such bad requests will be generated by rbd clients without a fix for
issue #21567.

The change in OSDMonitor prevents pools from getting into this state
and may prevent old broken clients from incorrectly deleting snaps.
The change in osd_types avoids a crash if a pool is already in this
state.

Fixes https://tracker.ceph.com/issues/18746

Signed-off-by: Paul Emmerich <paul.emmerich@croit.io>
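
To make the failure mode concrete, here is a rough, self-contained sketch of the pre-fix behaviour described above. It is not Ceph code: the Pool struct, insert_removed(), and the std::set standing in for interval_set<snapid_t> are simplifications for illustration only.

    #include <cassert>
    #include <cstdint>
    #include <set>

    struct Pool {
      uint64_t snap_seq = 0;            // highest snap id handed out so far
      std::set<uint64_t> removed_snaps; // stand-in for interval_set<snapid_t>

      void insert_removed(uint64_t s) {
        // interval_set::insert() asserts if the id is already present;
        // this check mimics that behaviour.
        assert(removed_snaps.count(s) == 0);
        removed_snaps.insert(s);
      }

      // Pre-fix removal as described in the commit message: every request is
      // accepted, and the *new* snap_seq is unconditionally recorded as
      // removed as well.
      void remove_unmanaged_snap(uint64_t s) {
        insert_removed(s);
        snap_seq = snap_seq + 1;
        insert_removed(snap_seq);
      }
    };

    int main() {
      Pool p;
      p.snap_seq = 5;
      p.remove_unmanaged_snap(100); // bad request: snap id 100 was never allocated
      // Now snap_seq == 6 but max(removed_snaps) == 100, i.e. the broken state.
      // Once later removals walk snap_seq up to an id that is already in
      // removed_snaps, insert_removed() fires its assert -- the mon crash
      // this change avoids.
    }

The OSDMonitor change rejects such a request up front; the osd_types change makes the second insert conditional, so a pool that is already in this state no longer hits the assert.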

@gregsfortytwo (Member) left a comment


Hmm.

Unfortunately, this isn't safe on its own: the MDS does not use the monitor to allocate snap IDs, so its deleted snapids won't necessarily be lower than the pool's snap_seq.

I am...really not sure what the best way to resolve this is. Perhaps a tool that goes through and allocates snapids in the (ec) data pool that match those used in the (replicated) rbd metadata pool? And then does the deletes?

@ukernel (Contributor) commented Mar 27, 2018

Why not update snap_seq, like OSDMonitor::prepare_remove_snaps() does?

@emmericp (Contributor, Author)

So just adding the second change:

 +  if (!removed_snaps.contains(get_snap_seq())) { 

would at least fix the crash if you already have a broken pool (or, apparently, a pool that's being used by both cephfs and rbd?)
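
In terms of the toy Pool model sketched in the description above, that guarded removal would look roughly like this (a sketch, not the actual pg_pool_t::remove_unmanaged_snap()):

      // Only record the new snap_seq as removed if it is not already in the
      // set, so a pool with snap_seq < max(removed_snaps) no longer trips
      // the interval_set assert.
      void remove_unmanaged_snap(uint64_t s) {
        insert_removed(s);
        snap_seq = snap_seq + 1;
        if (removed_snaps.count(snap_seq) == 0) { // !removed_snaps.contains(get_snap_seq())
          insert_removed(snap_seq);
        }
      }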

Can you point me to the relevant code in the MDS? I'm not familiar with cephfs snapshots :/

Also, somewhat related: if you have a pool that was used by both 12.2.1 and newer clients, it's quite tricky to safely handle all the edge cases when cleaning up the mess that was left behind...
In our case we just bumped the snap_seq and then wrote a tool to delete all snap ids from both the pool and the rbd images, since we luckily didn't need the snaps.

@ukernel (Contributor) commented Mar 27, 2018

See SnapServer::check_osd_map() in src/mds/SnapServer.cc.

@tchaikov self-requested a review on Mar 27, 2018
@batrick added the cephfs (Ceph File System) and rbd labels on Mar 28, 2018
@gregsfortytwo added the needs-qa label and removed the cephfs (Ceph File System) label on Apr 10, 2018
@gregsfortytwo (Member)

@ukernel pointed out to me that CephFS uses a completely different snapshot removal pathway so this is actually fine for them. Which means it looks good to me!

@dillaman, this okay with you?

@dillaman left a comment

lgtm

@dillaman added this to the mimic milestone on Apr 20, 2018
@yuriw (Contributor) commented Apr 23, 2018
