ceph-disk: do not activate device that is not ready #9943
@dachary what do you think? IMO, this could work, and we would not just ignore all the errors. BTW, we already do roughly the same in ceph-osd-prestart.sh. EDIT: I have not tested this yet; I'm waiting for the gitbuilders to build it so that I can test.
72ed75a to d2c2db6
@@ -3311,6 +3311,12 @@ def main_activate(args):
    else:
        raise Error('%s is not a directory or block device' % args.path)

    # exit with 0 if the journal device is not up, yet
    # journal device will do the activation
    if not os.access('{path}/journal'.format(path=osd_data), os.F_OK):
Not all OSDs have a journal.
It looks good, modulo the no-journal case :-)
And the flake8 / test cases need to be adjusted accordingly.
d69dbea to 1183ab8
Updated and hopefully fixed. I used an approach similar to ceph-osd-prestart.sh: the patch checks whether the journal is a symlink and only then tries to follow it.
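For illustration only, the check described in this comment could look roughly like the sketch below. This is not the code from the patch: the helper name `journal_is_ready` is hypothetical, and the sketch assumes the journal lives at `<osd_data>/journal` as in the diff above. An OSD without a journal symlink is treated as ready, which covers the no-journal case raised in the review.

```python
import os


def journal_is_ready(osd_data):
    """Hypothetical helper: report whether activation can proceed.

    Returns False only when <osd_data>/journal is a symlink whose
    target does not exist yet (the journal device is not up).
    Returns True when there is no journal symlink at all, so OSDs
    without a separate journal are not blocked.
    """
    osd_journal = os.path.join(osd_data, 'journal')
    if os.path.islink(osd_journal):
        # Follow the link only when it really is a symlink.
        target = os.path.realpath(osd_journal)
        return os.path.exists(target)
    # Not a symlink (regular file or missing): nothing to wait for.
    return True
```

A caller could log an info message and return exit status 0 when this returns False, leaving the activation to the journal device once it shows up.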
@b-ranto I suspect the test times out. If you run it locally you'll get more information about why it fails.
I've tested my local mock build and it fixed the original issue (ct#15990) for me. I no longer see any ceph-disk failures when the cluster is being installed. @dachary: I ran ./run-make-check.sh manually and got two failures and no time-out; all three tests that failed in jenkins passed on my machine. The two failures I got don't seem to be related to this patch:
[1] test/objectstore/chain_xattr.cc:371: Failure
1183ab8 to 3639be6
3639be6 to 6861743
6861743 to c4dc6f2
FYI: After the last rebase, this passed the jenkins build.
@dachary any objections to merging this?
@b-ranto the ceph-disk suite should be run to validate this change. The make check tests are not enough.
@@ -3334,14 +3334,21 @@ def main_activate(args):
    else:
        raise Error('%s is not a directory or block device' % args.path)

    # exit with 0 if the journal device is not up, yet
    # journal device will do the activation
    osd_journal = '{path}/journal'.format(path=osd_data)
That won't work for bluestore, because it does not have a journal file.
To be fair, it is not a regression in this respect either: ceph-disk has passed the --osd-journal argument with this path even before this change (old line 3344). If the file does not exist, the next `if` will simply evaluate to false (a path that does not exist is not a link), and that osd_journal path will be passed to the ceph-osd --osd-journal flag just like before.
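The argument above rests on `os.path.islink` returning False for a path that does not exist. A small sketch to make that concrete; the helper name `describe_journal` is hypothetical and is not part of ceph-disk, it only mirrors the `osd_journal` expression from the diff:

```python
import os


def describe_journal(osd_data):
    """Hypothetical helper: return (osd_journal, is_link).

    osd_journal is built the same way as in the diff above.
    is_link is False both for a regular file and for a missing
    path, so in either case the symlink branch would be skipped
    and the path would be used unchanged.
    """
    osd_journal = '{path}/journal'.format(path=osd_data)
    return osd_journal, os.path.islink(osd_journal)
```

For a data directory with no `journal` entry, as in the bluestore case mentioned above, this returns the unchanged path together with False.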
@dachary are we talking about the teuthology ceph-disk suite?
@b-ranto yes, the teuthology ceph-disk suite
If the journal (or data) device is not ready when we are activating the
data (or journal) device, just print an info message and exit with 0 so
that the ceph-disk systemd service won't fail in this case.
Fixes: http://tracker.ceph.com/issues/15990
Signed-off-by: Boris Ranto <branto@redhat.com>
c4dc6f2 to 73a7a65
ceph-disk test run failed, see #10135 (comment)
@tchaikov #10135 (comment) is failing.