osd: don't crash on empty snapset #21058
Conversation
@liewegas For one of our customers we observed OSD crashes due to an unexpectedly empty snapset returned for some objects in the cache tier pool, and the proposed patch helped to make their cluster functional again. This looks related to https://tracker.ceph.com/issues/21557, and although the root cause is unknown, wouldn't it be a good idea to have these guards upstream so this type of inconsistency does not bring the cluster down? Or do you have other suggestions for how it could be addressed?
I think the best approach would be to assert if the debug config option is true, so that we can catch it in QA. Although we have failed to do that so far.. it's unclear what the root cause is :( See #20040. Ideally we'd make some attempt to fix the inconsistency during scrub...
Force-pushed from 0a073c3 to 61c546a.
Thank you! Updated. I cherry-picked your 618f549 from #20040 to make it build. I will rebase when #20040 is merged.
Interesting. I think it could be done as a separate PR. Do you have an idea what info could be used to check/restore a snapset?
Force-pushed from 61c546a to 9f0ccc1.
Rebased now that #20040 is merged.
-    assert(!out->snaps.empty());
+    if (out->snaps.empty()) {
+      dout(1) << __func__ << " " << oid << " empty snapset" << dendl;
+      assert(!cct->_conf->osd_debug_verify_snaps);
no return -ENOENT here?
Yes, it was intentional: we don't want SnapMapper::get_snaps
to fail here, just return an empty snapset.
> Interesting. I think it could be done as a separate PR. Do you have an idea what info could be used to check/restore a snapset?
It might not require anything special, actually: perhaps an empty SnapSet
will result in any/all clones getting removed (as stray clones) and we'd
be done with it. I would give it a try by injecting the corruption
(perhaps removing it directly via the fuse mountpoint or via
ceph-objectstore-tool) to verify that the I/O path now behaves and that
scrub does too.
Force-pushed from e137211 to 8e18306.
@liewegas I have temporarily added a DNM commit to this PR that shows how I tested this. The test contains a small tool to clear the snapset in an object's snaps blob. It is used together with ceph-objectstore-tool to inject the empty snapset inconsistency. The objects belong to an rbd image with snapshots. I tested that after injecting the corruption I have failed to trigger "empty snapset" in
@liewegas ping
Looks good to me; let's drop the DNM commit and then queue for testing?
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
Signed-off-by: Mykola Golub <mgolub@suse.com>
Force-pushed from 8e18306 to 3996c0a.
@liewegas thanks! Rebased.
retest this please.
Fixes: http://tracker.ceph.com/issues/23851
Signed-off-by: Igor Fedotov <ifedotov@suse.com>