init-ceph: make crush update on osd start time out
If the monitor is not currently available, this crush update would block
forever, preventing the OSD and (potentially) the rest of the system
from starting up.  Instead, make it time out after 10 seconds and then
abort startup.  This prevents startup of an OSD if we failed to update
the CRUSH position for some reason.

In fact, do not start up the OSD if the CRUSH update fails for any
reason--not just a timeout!

Works-around: #5612
Signed-off-by: Sage Weil <sage@inktank.com>
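
The two behaviours the message relies on can be sketched in plain shell. This is an illustration rather than the init script itself: `sleep` stands in for a `ceph` call that blocks because no monitor answers, `false` stands in for one that fails outright, and `run_with_timeout` is a hypothetical helper name. GNU coreutils `timeout` exits with status 124 when it has to kill the command and otherwise passes the command's own exit status through.

```shell
#!/bin/sh
# Hypothetical helper: run a command under a time limit and report how it ended.
run_with_timeout() {
    limit=$1
    shift
    timeout "$limit" "$@"
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${limit}s"
    elif [ "$rc" -ne 0 ]; then
        echo "failed with status $rc"
    else
        echo "ok"
    fi
    return "$rc"
}

# Stand-in for a crush update that hangs because no monitor is reachable.
hang_rc=0
hang_result=$(run_with_timeout 1 sleep 5) || hang_rc=$?

# Stand-in for a crush update that fails for some other reason.
fail_rc=0
fail_result=$(run_with_timeout 1 false) || fail_rc=$?

echo "$hang_result (status $hang_rc)"
echo "$fail_result (status $fail_rc)"
```

Either non-zero status is enough for a caller to refuse to continue, which is the behaviour the message asks for.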
Sage Weil committed Oct 28, 2013
1 parent ac8dcdb commit 177e2ab
Showing 1 changed file (src/init-ceph.in) with 2 additions and 3 deletions.
@@ -324,7 +324,7 @@ for name in $what; do
 get_conf osd_weight "" "osd crush initial weight"
 defaultweight="$(do_cmd "df $osd_data/. | tail -1 | awk '{ d= \$2/1073741824 ; r = sprintf(\"%.2f\", d); print r }'")"
 get_conf osd_keyring "$osd_data/keyring" "keyring"
-do_cmd "$BINDIR/ceph \
+do_cmd "timeout 10 $BINDIR/ceph \
 --name=osd.$id \
 --keyring=$osd_keyring \
 osd crush create-or-move \
@@ -333,8 +333,7 @@ for name in $what; do
 ${osd_weight:-${defaultweight:-1}} \
 root=default \
 host=$host \
-$osd_location \
-|| :"
+$osd_location"
 fi
 fi
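
The practical effect of dropping `|| :` is on the exit status the caller sees. A minimal sketch, assuming the quoted string handed to `do_cmd` ends up being run by a shell (the quoting in the diff suggests this, but `do_cmd` itself is not shown here), with `false` standing in for a failed `ceph osd crush create-or-move` call. `:` is the shell's no-op builtin and always succeeds, so `cmd || :` reports success whatever `cmd` did.

```shell
#!/bin/sh
# Old form: the trailing "|| :" replaces any failure with success.
old_rc=0
sh -c 'false || :' || old_rc=$?

# New form: the failure status reaches the caller.
new_rc=0
sh -c 'false' || new_rc=$?

echo "with '|| :': $old_rc, without: $new_rc"
```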


5 comments on commit 177e2ab

@markhpc markhpc commented on 177e2ab Nov 7, 2013

This commit appears to be breaking mkcephfs as described in http://tracker.ceph.com/issues/6720

Reverting the change allows mkcephfs from the next branch to function properly.

@liewegas

A workaround is to put 'osd crush update on start = false' in the config.

I think I'm just going to make mkcephfs fail if that option is not defined in ceph.conf.
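
A rough sketch of what such a check could look like; it is not how mkcephfs was actually changed. The function name `has_crush_update_option`, the temporary file, and the placement of the option under `[osd]` are all assumptions made for illustration. The pattern accepts spaces, underscores, or hyphens between the words of the option name.

```shell
#!/bin/sh
# Hypothetical check: succeed only if the option is set somewhere in the file.
# It looks only for the key, not for a particular value or section.
has_crush_update_option() {
    grep -Eq '^[[:space:]]*osd[ _-]crush[ _-]update[ _-]on[ _-]start[[:space:]]*=' "$1"
}

conf=$(mktemp)

# A config that carries the workaround from the comment above.
printf '[osd]\n    osd crush update on start = false\n' > "$conf"
defined_rc=0
has_crush_update_option "$conf" || defined_rc=$?

# A config that does not mention the option at all.
printf '[osd]\n    osd journal size = 1000\n' > "$conf"
missing_rc=0
has_crush_update_option "$conf" || missing_rc=$?

rm -f "$conf"
echo "defined: $defined_rc, missing: $missing_rc"
```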

@liewegas

Actually, I think the problem isn't mkcephfs but the calling script, which should start the mons before starting any OSDs.
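
One way a calling script could enforce that ordering is to poll for a reachable monitor before it starts any OSD. The sketch below is only an assumption about how that might be done: `wait_until` is a hypothetical helper, and the thread does not name a probe command, so `true` and `false` stand in for a monitor that answers and one that does not.

```shell
#!/bin/sh
# Hypothetical helper: retry a probe command until it succeeds
# or the given number of attempts is used up.
wait_until() {
    attempts=$1
    shift
    while [ "$attempts" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep 1
    done
    return 1
}

# A real caller might probe with something like: timeout 10 ceph mon stat
# (an assumption, not taken from the thread).
ready_rc=0
wait_until 3 true || ready_rc=$?

down_rc=0
wait_until 2 false || down_rc=$?

echo "monitor up: $ready_rc, monitor down: $down_rc"
```

Only when the probe succeeds would the script go on to start OSDs; a non-zero result gives it a clear point at which to stop or report the problem.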

@markhpc markhpc commented on 177e2ab Nov 7, 2013

Yes, I was thinking that mkcephfs wasn't properly creating the osdmap, but I don't know that I was properly parsing what ceph osd dump was telling me. So when you say that the mons should start before starting any OSDs, I'm confused. Don't we need to accept any ordering as mon servers could be starting up in a data center before OSD servers? Am I missing something?

@markhpc markhpc commented on 177e2ab Nov 7, 2013

In any event, adding the "|| :" back basically fixes things as well.
