
ceph-disk: do not activate device that is not ready #9943

Merged
1 commit merged into master from wip-ceph-disk-systemd on Aug 26, 2016

Conversation

Contributor

@b-ranto commented Jun 27, 2016

If the journal (or data) device is not ready when we are activating the
data (or journal) device, just print an info message and exit with 0 so
that the ceph-disk systemd service won't fail in this case.

Fixes: http://tracker.ceph.com/issues/15990
Signed-off-by: Boris Ranto branto@redhat.com
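
As a rough illustration of the intent (a sketch only, not the actual patch; the predicate, message text, and function names here are assumptions):

# Minimal sketch: exit with status 0 instead of raising, so the ceph-disk
# systemd service does not report a failure when the peer (journal or data)
# device has not appeared yet.
import sys

def activate_or_defer(peer_device_ready, activate):
    # 'peer_device_ready' and 'activate' are hypothetical callables standing
    # in for the real readiness check and the real activation path.
    if not peer_device_ready():
        print('INFO: peer device not ready yet; '
              'it will trigger activation once it appears')
        sys.exit(0)   # exit 0: systemd treats the unit run as successful
    activate()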

Contributor Author

b-ranto commented Jun 27, 2016

@dachary what do you think? IMO, this could work and we would not just be ignoring all the errors.

BTW: we already do roughly the same thing in ceph-osd-prestart.sh.

EDIT: I have not tested this just yet; I'm waiting for the gitbuilders to build it so that I can test.

@@ -3311,6 +3311,12 @@ def main_activate(args):
         else:
             raise Error('%s is not a directory or block device' % args.path)
 
+        # exit with 0 if the journal device is not up, yet
+        # journal device will do the activation
+        if not os.access('{path}/journal'.format(path=osd_data), os.F_OK):
Review comment:

Not all OSDs have a journal.
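
To make the concern concrete: with a bare existence test, an OSD whose data directory never gains a journal entry would always take the early-exit branch and never be activated. A small sketch of that failure mode (the path and message are illustrative assumptions, not taken from the patch):

# Sketch only: shows why a plain existence check is not enough on its own.
import os
import sys

osd_data = '/var/lib/ceph/osd/ceph-0'   # hypothetical OSD data directory
journal = '{path}/journal'.format(path=osd_data)

if not os.access(journal, os.F_OK):
    # This is true both when a journal device simply has not shown up yet
    # (the case we want to defer) and when the OSD has no journal at all
    # (the case the review comment raises) -- the latter would be deferred
    # forever.
    print('INFO: journal not present, deferring activation')
    sys.exit(0)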


ghost commented Jun 27, 2016

It looks good, modulo the no-journal case :-)


ghost commented Jun 27, 2016

And modulo the flake8 / test cases that need to be adjusted accordingly.

@b-ranto force-pushed the wip-ceph-disk-systemd branch 2 times, most recently from d69dbea to 1183ab8 on June 28, 2016 06:56
Contributor Author

b-ranto commented Jun 28, 2016

Updated and hopefully fixed. I used a similar approach to ceph-osd-prestart.sh: the patch checks whether the journal is a symlink and only then tries to follow it.
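
A rough sketch of that symlink-based approach (names and messages are illustrative; this is not the exact patch):

import os

def journal_device_missing(osd_data):
    # Only a symlinked journal points at a separate device we may have to
    # wait for; a regular file or a missing entry means there is nothing
    # to wait for, so activation should proceed as usual.
    journal = '{path}/journal'.format(path=osd_data)
    if not os.path.islink(journal):
        return False
    # Follow the link; if the target device has not appeared yet, report it.
    target = os.path.realpath(journal)
    return not os.path.exists(target)

# Possible use inside activation (hypothetical wiring):
# if journal_device_missing(osd_data):
#     print('INFO: journal device not ready; the journal side will activate')
#     sys.exit(0)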


ghost commented Jun 28, 2016

@b-ranto I suspect the test times out. If you run it locally, you'll get more information about why it fails.

Contributor Author

b-ranto commented Jun 28, 2016

I've tested my local mock build and it fixed the original issue (tracker #15990) for me. I no longer see any ceph-disk failures while the cluster is being installed.

@dachary: I ran ./run-make-check.sh manually and got two failures and no time-outs; all three tests that failed in Jenkins passed on my machine. The two failures I did get don't seem to be related to this patch:

  • unittest_librbd seg-faulted
  • failure in unittest_chain_xattr [1]

[1] test/objectstore/chain_xattr.cc:371: Failure
Value of: get_xattrs(fd).size()
Actual: 2
Expected: 1UL
Which is: 1

Contributor Author

b-ranto commented Jul 26, 2016

FYI: after the last rebase, this passed the Jenkins build.

Contributor Author

b-ranto commented Aug 5, 2016

@dachary any objections to merging this?


ghost commented Aug 5, 2016

@b-ranto the ceph-disk suite should be run to validate this change. The make check tests are not enough.

@@ -3334,14 +3334,21 @@ def main_activate(args):
         else:
             raise Error('%s is not a directory or block device' % args.path)
 
+        # exit with 0 if the journal device is not up, yet
+        # journal device will do the activation
+        osd_journal = '{path}/journal'.format(path=osd_data)
Review comment:

That won't work for bluestore, because it does not have a journal file.

Contributor Author:

To be fair, it is not a regression in this respect either -- ceph-disk passed the --osd-journal argument with this path even before this change (old line 3344). If the file does not exist, the next if will simply fail (it is not a link if it does not exist) and that osd_journal path will be passed to the ceph-osd --osd-journal flag just like before.
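
The branch outcomes being described, as a small illustrative sketch (not the patch itself):

import os

def journal_action(osd_journal):
    if not os.path.islink(osd_journal):
        # No symlink: covers both a plain journal file and no journal at
        # all (e.g. bluestore).  Behaviour is unchanged -- osd_journal is
        # still handed to ceph-osd via --osd-journal exactly as before.
        return 'activate as before'
    if not os.path.exists(osd_journal):
        # Broken symlink: the journal device has not come up yet, so bail
        # out with status 0 and let the journal side trigger activation.
        return 'defer (exit 0)'
    # Healthy symlink: the journal device is present, proceed normally.
    return 'activate as before'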

Contributor Author

b-ranto commented Aug 5, 2016

@dachary are we talking about the teuthology ceph-disk suite?


ghost commented Aug 5, 2016

@b-ranto yes, the teuthology ceph-disk suite

Commit message (the single commit in this pull request):

If the journal (or data) device is not ready when we are activating the
data (or journal) device, just print an info message and exit with 0 so
that the ceph-disk systemd service won't fail in this case.

Fixes: http://tracker.ceph.com/issues/15990
Signed-off-by: Boris Ranto <branto@redhat.com>
@tchaikov
Contributor

ceph-disk test run failed, see #10135 (comment)

@tchaikov
Contributor

http://pulpito.ceph.com/kchai-2016-08-25_02:27:35-ceph-disk-wip-kefu-testing---basic-vps/ fails; it has #10824, #10825, #9943, and #10135. @dachary does this mean that #9943 and/or #10135 is broken?


ghost commented Aug 25, 2016

@tchaikov #10135 (comment) is failing.

@tchaikov
Contributor

@dachary thanks. rescheduling without #10135.


@tchaikov merged commit 577f027 into master on Aug 26, 2016
@tchaikov deleted the wip-ceph-disk-systemd branch on August 26, 2016 01:37