init-ceph: make crush update on osd start time out
If the monitor is not currently available, this CRUSH update would block forever, preventing the OSD (and potentially the rest of the system) from starting up. Instead, make it time out after 10 seconds and then abort startup, so that an OSD is not started if we failed to update its CRUSH position.

In fact, do not start the OSD if the CRUSH update fails for any reason, not just a timeout.

Works-around: #5612
Signed-off-by: Sage Weil <sage@inktank.com>
177e2ab
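The behaviour described in the commit message can be sketched roughly as follows, assuming GNU coreutils timeout(1) is available. This is not the actual init-ceph change: the function names, the CRUSH_UPDATE_TIMEOUT variable and the "starting osd" placeholder are hypothetical, and the real CRUSH update command is passed in as arguments rather than spelled out.

```shell
#!/bin/sh
# Hypothetical sketch: bound the CRUSH update with a timeout and refuse to
# start the OSD if the update fails for any reason (timeout or error).
# Names here are illustrative, not taken from init-ceph.

CRUSH_UPDATE_TIMEOUT=10

# Run the given CRUSH update command under timeout(1).
# timeout exits with status 124 if the command ran too long; otherwise it
# passes through the command's own exit status. Note that timeout(1) can
# only run external programs, not shell functions.
crush_update_or_abort() {
    timeout "$CRUSH_UPDATE_TIMEOUT" "$@"
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "crush update timed out after ${CRUSH_UPDATE_TIMEOUT}s; not starting osd" >&2
        return 1
    elif [ "$rc" -ne 0 ]; then
        echo "crush update failed (exit $rc); not starting osd" >&2
        return 1
    fi
    return 0
}

# Start the OSD only if the CRUSH update succeeded.
# The echo is a placeholder for actually launching the daemon.
start_osd() {
    crush_update_or_abort "$@" || return 1
    echo "starting osd"
}
```

Treating a timeout and an ordinary error the same way matches the commit's stated rule that the OSD must not start if the CRUSH update fails for any reason.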
This commit appears to break mkcephfs, as described in http://tracker.ceph.com/issues/6720.
Reverting the changes allows mkcephfs from the next branch to function properly.
The workaround is to put 'osd crush update on start = false' in the config.
I think I'm just going to make mkcephfs fail if that option is not defined in ceph.conf.
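One possible shape for the proposed mkcephfs guard is sketched below. It is hypothetical: require_crush_update_disabled is an illustrative name, and the check is a plain grep rather than a real config parser, so it ignores which section the option appears in and only accepts the literal value false. It allows either spaces or underscores between the words of the option name.

```shell
#!/bin/sh
# Hypothetical sketch: refuse to proceed unless ceph.conf explicitly sets
# "osd crush update on start = false". This is a text match, not a config
# parser: sections, includes and comments are not interpreted.

require_crush_update_disabled() {
    conf=$1
    # -E: extended regex, -i: case-insensitive, -q: quiet (status only)
    if grep -Eiq '^[[:space:]]*osd[ _]crush[ _]update[ _]on[ _]start[[:space:]]*=[[:space:]]*false[[:space:]]*$' "$conf"; then
        return 0
    fi
    echo "mkcephfs: set 'osd crush update on start = false' in $conf" >&2
    return 1
}
```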
Actually, I think the problem isn't mkcephfs but the calling script, which should start the mons before starting any OSDs.
Yes, I was thinking that mkcephfs wasn't properly creating the osdmap, but I'm not sure I was correctly parsing what 'ceph osd dump' was telling me. So when you say that the mons should start before any OSDs, I'm confused. Don't we need to accept any startup order, since mon servers could be starting up in a data center before OSD servers? Am I missing something?
In any event, adding the "|| :" back basically fixes things as well.
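A minimal sketch of why "|| :" changes the outcome, with a hypothetical fake_crush_update function standing in for the real update. ':' is the shell's null builtin and always exits 0, so "cmd || :" turns any failure of cmd, including a timeout, into success. Adding it back presumably lets the OSD start even when the monitors are not yet reachable, at the cost of the commit's guarantee that an OSD never starts without an updated CRUSH position.

```shell
#!/bin/sh
# Sketch contrasting strict and lenient handling of a failed CRUSH update.
# fake_crush_update is a hypothetical stand-in for the real update command.

fake_crush_update() {
    return 124   # pretend timeout(1) killed the update
}

# Strict: a failed update aborts startup (the behaviour this commit adds).
strict_start() {
    fake_crush_update || return 1
    echo "starting osd"
}

# Lenient: "|| :" swallows the failure, so startup always continues.
lenient_start() {
    fake_crush_update || :
    echo "starting osd"
}
```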