osd: filestore: fast abort if statfs encounters ENOENT #7703

xiexingguo · 2016-02-19T04:45:47Z

This is a supplementary patch for #6869.

The original case: if the root directory of an OSD is removed by accident (or something similar) while the OSD is idle, the OSD process can survive for a long time before some consumer finally takes it down. In the meantime this may confuse other OSDs and reduce the reliability of the cluster.

This patch tries to detect that case as early as possible: if statfs on the basedir returns ENOENT, which means the path no longer exists, we abort immediately.
For other, forgivable errors such as EINTR, we still want to update the heartbeat peers as usual.
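
For illustration, a minimal sketch of the check described above. It assumes Linux `statfs(2)` from `<sys/vfs.h>`; `is_fatal_statfs_error` and `checked_statfs` are hypothetical names used only here, not the actual `FileStore::statfs` code.

```cpp
#include <sys/vfs.h>   // statfs(2); Linux-specific header (assumption)
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <iostream>
#include <string>

// Hypothetical helper: only ENOENT (the basedir is gone) is treated as fatal.
// Other errors, e.g. EINTR, are left for the caller to handle.
bool is_fatal_statfs_error(int err) {
  return err == ENOENT;
}

// Hypothetical wrapper: returns 0 on success or a negative errno.
// Aborts the process if the path no longer exists.
int checked_statfs(const std::string& basedir, struct statfs* buf) {
  assert(buf != nullptr);
  if (::statfs(basedir.c_str(), buf) < 0) {
    int err = errno;
    if (is_fatal_statfs_error(err)) {
      std::cerr << "path(" << basedir << ") no longer exists, aborting..."
                << std::endl;
      std::abort();
    }
    return -err;  // forgivable error: report it and let the caller continue
  }
  return 0;
}
```

With this shape, a caller that gets a negative return value can log it and carry on, while a missing basedir takes the process down right away.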

Verified locally, and below is part of the result:

    -1> 2016-02-19 19:47:57.640730 7f69bebd7700  0 filestore(/home/xxg/ceph-0.94.5/src/dev/osd0)  path(/home/xxg/ceph-0.94.5/src/dev/osd0) no longer exists, aborting...
     0> 2016-02-19 19:47:57.645429 7f69bebd7700 -1 os/FileStore.cc: In function 'virtual int FileStore::statfs(statfs*)' thread 7f69bebd7700 time 2016-02-19 19:47:57.640781
    os/FileStore.cc: 660: FAILED assert(0)

liewegas · 2016-02-22T18:19:51Z

src/osd/OSD.cc

@@ -673,6 +673,8 @@ void OSDService::update_osd_stat(vector<int>& hb_peers)
  int r = osd->store->statfs(&stbuf);
  if (r < 0) {
    derr << "statfs() failed: " << cpp_strerror(r) << dendl;
+    osd_stat.hb_in.swap(hb_peers);
+    osd_stat.hb_out.clear();
    return;


better to just move the block below to before the statfs call
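
A rough sketch of that reordering, using simplified stand-ins rather than the real Ceph types: `OsdStat`, the `kb` field, and the `statfs_fn` hook are assumptions made for illustration only. The point is that the heartbeat peers are recorded before the statfs call, so the early return on failure cannot skip them.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>
#include <iostream>
#include <vector>

// Simplified stand-in for osd_stat_t (assumption, not the real struct).
struct OsdStat {
  std::vector<int> hb_in;
  std::vector<int> hb_out;
  long long kb = 0;
};

// statfs_fn is a hypothetical hook: it returns 0 or a negative errno and
// fills in the total size in KB on success.
void update_osd_stat(OsdStat& osd_stat, std::vector<int>& hb_peers,
                     const std::function<int(long long*)>& statfs_fn) {
  assert(statfs_fn);  // the hook must be callable

  // Update heartbeat peers first, before anything that can fail.
  osd_stat.hb_in.swap(hb_peers);
  osd_stat.hb_out.clear();

  long long kb = 0;
  int r = statfs_fn(&kb);
  if (r < 0) {
    std::cerr << "statfs() failed: " << r << std::endl;
    return;  // space stats are left untouched, peers are already updated
  }
  osd_stat.kb = kb;
}
```

This avoids duplicating the `hb_in`/`hb_out` update inside the error branch, as the hunk above does.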

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>

xiexingguo · 2016-02-23T03:16:00Z

Fixed

osd: filestore: fast abort if statfs encounters ENOENT Reviewed-by: Sage Weil <sage@redhat.com>

xiexingguo force-pushed the xxg-wip-statfs branch from 402911e to dfa4c86 Compare February 19, 2016 05:48

ghost added the core label Feb 19, 2016

liewegas reviewed Feb 22, 2016
liewegas self-assigned this Feb 22, 2016

liewegas added the cleanup label Feb 22, 2016

xiexingguo force-pushed the xxg-wip-statfs branch from dfa4c86 to 27ba6ff Compare February 23, 2016 00:48

os/filestore: fast abort when basedir no more exists

144fa29

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>

xiexingguo force-pushed the xxg-wip-statfs branch from 27ba6ff to e1bc3fe Compare February 23, 2016 00:54

OSD: update heartbeat peers if unable to statfs

46da33b

Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>

xiexingguo force-pushed the xxg-wip-statfs branch from e1bc3fe to 46da33b Compare February 23, 2016 00:57

liewegas added needs-qa wip-sage-testing labels Feb 24, 2016

liewegas changed the title ~~os/filestore: fast abort if statfs encounters ENOENT~~ osd: filestore: fast abort if statfs encounters ENOENT Mar 1, 2016

liewegas added a commit that referenced this pull request Mar 1, 2016

Merge pull request #7703 from xiexingguo/xxg-wip-statfs

ef59733

osd: filestore: fast abort if statfs encounters ENOENT Reviewed-by: Sage Weil <sage@redhat.com>

liewegas merged commit ef59733 into ceph:master Mar 1, 2016

xiexingguo deleted the xxg-wip-statfs branch March 1, 2016 13:54
