
ceph-disk: do not activate device that is not ready #9943

Merged
1 commit merged into master from wip-ceph-disk-systemd on Aug 26, 2016

Conversation

Contributor

@b-ranto commented Jun 27, 2016

If the journal (or data) device is not ready when we are activating the
data (or journal) device, just print an info message and exit with 0 so
that the ceph-disk systemd service won't fail in this case.

Fixes: http://tracker.ceph.com/issues/15990
Signed-off-by: Boris Ranto branto@redhat.com
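
As a rough illustration of the intent (a sketch only, not the actual patch; the predicate, message text, and function names here are assumptions):

# Minimal sketch: exit with status 0 instead of raising, so the ceph-disk
# systemd service does not report a failure when the peer (journal or data)
# device has not appeared yet.
import sys

def activate_or_defer(peer_device_ready, activate):
    # 'peer_device_ready' and 'activate' are hypothetical callables standing
    # in for the real readiness check and the real activation path.
    if not peer_device_ready():
        print('INFO: peer device not ready yet; '
              'it will trigger activation once it appears')
        sys.exit(0)   # exit 0: systemd treats the unit run as successful
    activate()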

Contributor Author

b-ranto commented Jun 27, 2016

@dachary what do you think? IMO, this could work and we would not just be ignoring all the errors.

BTW: we already do roughly the same thing in ceph-osd-prestart.sh.

EDIT: I have not tested this just yet; I'm waiting for the gitbuilders to build it so that I can test.

@@ -3311,6 +3311,12 @@ def main_activate(args):
         else:
             raise Error('%s is not a directory or block device' % args.path)
 
+        # exit with 0 if the journal device is not up, yet
+        # journal device will do the activation
+        if not os.access('{path}/journal'.format(path=osd_data), os.F_OK):
Review comment:

Not all OSDs have a journal.
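
To make the concern concrete: with a bare existence test, an OSD whose data directory never gains a journal entry would always take the early-exit branch and never be activated. A small sketch of that failure mode (the path and message are illustrative assumptions, not taken from the patch):

# Sketch only: shows why a plain existence check is not enough on its own.
import os
import sys

osd_data = '/var/lib/ceph/osd/ceph-0'   # hypothetical OSD data directory
journal = '{path}/journal'.format(path=osd_data)

if not os.access(journal, os.F_OK):
    # This is true both when a journal device simply has not shown up yet
    # (the case we want to defer) and when the OSD has no journal at all
    # (the case the review comment raises) -- the latter would be deferred
    # forever.
    print('INFO: journal not present, deferring activation')
    sys.exit(0)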


ghost commented Jun 27, 2016

It looks good, modulo the no-journal case :-)


ghost commented Jun 27, 2016

And modulo the flake8 / test cases that need to be adjusted accordingly.

@b-ranto force-pushed the wip-ceph-disk-systemd branch 2 times, most recently from d69dbea to 1183ab8 on June 28, 2016 06:56
Contributor Author

b-ranto commented Jun 28, 2016

Updated and hopefully fixed. I used a similar approach to ceph-osd-prestart.sh: the patch checks whether the journal is a symlink and only then tries to follow it.
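
A rough sketch of that symlink-based approach (names and messages are illustrative; this is not the exact patch):

import os

def journal_device_missing(osd_data):
    # Only a symlinked journal points at a separate device we may have to
    # wait for; a regular file or a missing entry means there is nothing
    # to wait for, so activation should proceed as usual.
    journal = '{path}/journal'.format(path=osd_data)
    if not os.path.islink(journal):
        return False
    # Follow the link; if the target device has not appeared yet, report it.
    target = os.path.realpath(journal)
    return not os.path.exists(target)

# Possible use inside activation (hypothetical wiring):
# if journal_device_missing(osd_data):
#     print('INFO: journal device not ready; the journal side will activate')
#     sys.exit(0)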


ghost commented Jun 28, 2016

@b-ranto I suspect the test times out. If you run it locally, you'll get more information about why it fails.

Contributor Author

b-ranto commented Jun 28, 2016

I've tested my local mock build and it fixed the original issue (tracker #15990) for me. I no longer see any ceph-disk failures while the cluster is being installed.

@dachary: I ran ./run-make-check.sh manually and got two failures and no time-outs; all three tests that failed in Jenkins passed on my machine. The two failures I did get don't seem to be related to this patch:

  • unittest_librbd seg-faulted
  • failure in unittest_chain_xattr [1]

[1] test/objectstore/chain_xattr.cc:371: Failure
Value of: get_xattrs(fd).size()
Actual: 2
Expected: 1UL
Which is: 1

Contributor Author

b-ranto commented Jul 26, 2016

FYI: after the last rebase, this passed the Jenkins build.

Contributor Author

b-ranto commented Aug 5, 2016

@dachary any objections to merging this?


ghost commented Aug 5, 2016

@b-ranto the ceph-disk suite should be run to validate this change. The make check tests are not enough.

@@ -3334,14 +3334,21 @@ def main_activate(args):
         else:
             raise Error('%s is not a directory or block device' % args.path)
 
+        # exit with 0 if the journal device is not up, yet
+        # journal device will do the activation
+        osd_journal = '{path}/journal'.format(path=osd_data)
Review comment:

That won't work for bluestore, because it does not have a journal file.

Contributor Author:

To be fair, it is not a regression in this respect either -- ceph-disk passed the --osd-journal argument with this path even before this change (old line 3344). If the file does not exist, the next if will simply fail (it is not a link if it does not exist) and that osd_journal path will be passed to the ceph-osd --osd-journal flag just like before.
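
The branch outcomes being described, as a small illustrative sketch (not the patch itself):

import os

def journal_action(osd_journal):
    if not os.path.islink(osd_journal):
        # No symlink: covers both a plain journal file and no journal at
        # all (e.g. bluestore).  Behaviour is unchanged -- osd_journal is
        # still handed to ceph-osd via --osd-journal exactly as before.
        return 'activate as before'
    if not os.path.exists(osd_journal):
        # Broken symlink: the journal device has not come up yet, so bail
        # out with status 0 and let the journal side trigger activation.
        return 'defer (exit 0)'
    # Healthy symlink: the journal device is present, proceed normally.
    return 'activate as before'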

Contributor Author

b-ranto commented Aug 5, 2016

@dachary are we talking about the teuthology ceph-disk suite?


ghost commented Aug 5, 2016

@b-ranto yes, the teuthology ceph-disk suite

Commit message (the single commit in this pull request):

If the journal (or data) device is not ready when we are activating the
data (or journal) device, just print an info message and exit with 0 so
that the ceph-disk systemd service won't fail in this case.

Fixes: http://tracker.ceph.com/issues/15990
Signed-off-by: Boris Ranto <branto@redhat.com>
@tchaikov
Contributor

ceph-disk test run failed, see #10135 (comment)

@tchaikov
Contributor

http://pulpito.ceph.com/kchai-2016-08-25_02:27:35-ceph-disk-wip-kefu-testing---basic-vps/ fails; it has #10824, #10825, #9943, and #10135. @dachary does this mean that #9943 and/or #10135 is broken?


ghost commented Aug 25, 2016

@tchaikov #10135 (comment) is failing.

@tchaikov
Contributor

@dachary thanks. rescheduling without #10135.


@tchaikov merged commit 577f027 into master on Aug 26, 2016
@tchaikov deleted the wip-ceph-disk-systemd branch on August 26, 2016 01:37