ceph-disk: enable --runtime ceph-osd systemd units #12241
Conversation
systemd/ceph-osd@.service
@@ -19,6 +19,7 @@ TasksMax=infinity
 Restart=on-failure
 StartLimitInterval=30min
 StartLimitBurst=3
+RestartSec=5min
Maybe a 5-minute delay is too long in a production environment?
@idealguo maybe it is. Do you have a specific scenario in mind?
I see the default value is 100ms (https://www.freedesktop.org/software/systemd/man/systemd.service.html)
and "osd_heartbeat_grace" is 20s by default, so maybe 20s or thereabouts would be an option.
In the scenario I have in mind, at boot time, ceph-disk@.service may run 5 to 10 minutes after ceph-osd@.service attempted to run. There is no doubt that waiting minutes is useful. What I'm not 100% sure about yet is whether this can cause problems.
My concern is when the OSD service exits accidentally, for example because it is killed by someone. If we wait 5 minutes before restarting, that will be too long.
you're right. What about
RestartSec=20s
StartLimitBurst=30
That will account for both cases. At boot time it will retry every 20s for about 10 minutes. And at runtime it will restart within 20 seconds, with no risk of the OSD being marked down. I don't think anything at runtime will behave differently if the OSD restarts after 20 seconds instead of after 100ms. What do you think?
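For illustration, the relevant [Service] settings with these values would read roughly as follows (a sketch; StartLimitInterval is the 30min already present in the unit file, and systemd unit files only allow comments on their own lines):

[Service]
Restart=on-failure
# pause 20s between restart attempts instead of the 100ms default
RestartSec=20s
# window in which failed start attempts are counted
StartLimitInterval=30min
# 30 attempts x 20s = about 10 minutes of retries at boot
StartLimitBurst=30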
Yes, 100ms is too sensitive, it will hit the "StartLimitBurst" limit quickly.
@dachary Would "ceph-disk: do not enable ceph-osd systemd units" be a better title for this PR (and for the second commit)?
@smithfarm indeed, thanks :-)
Instead of the default 100ms pause before trying to restart an OSD, wait 20 seconds instead and retry 30 times instead of 3. There is no scenario in which restarting an OSD almost immediately after it failed would get a better result.

It is possible that a failure to start is due to a race with another systemd unit at boot time. For instance, if ceph-disk@.service is delayed, it may start after the OSD that needs it. A long pause may give the racing service enough time to complete, and the next attempt to start the OSD may succeed.

This is not a sound alternative to resolve a race, it only makes the OSD boot process less sensitive. In the example above, the proper fix is to enable --runtime ceph-osd@.service so that it cannot race at boot time.

The wait delay should not be minutes, to preserve the current runtime behavior. For instance, if an OSD is killed or fails and restarts after 10 minutes, it will be marked down by the ceph cluster. This is not a change that could break things, but it is significant and should be avoided.

Refs: http://tracker.ceph.com/issues/17889

Signed-off-by: Loic Dachary <loic@dachary.org>
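As a quick sanity check after the unit file change, the effective values can be read back with systemctl show (a sketch; osd id 3 and the exact output format are illustrative):

systemctl show ceph-osd@3.service --property=RestartUSec --property=StartLimitBurst
# expected output, roughly:
#   RestartUSec=20s
#   StartLimitBurst=30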
@liewegas I'll figure something out to fix existing installations on upgrade.
If ceph-osd@.service is enabled for a given device (say /dev/sdb1 for osd.3), the ceph-osd@3.service will race with ceph-disk@dev-sdb1.service at boot time. Enabling ceph-osd@3.service is not necessary at boot time because ceph-disk@dev-sdb1.service calls

    ceph-disk activate /dev/sdb1

which calls

    systemctl start ceph-osd@3

The systemctl enable/disable ceph-osd@.service called by ceph-disk activate is changed to add the --runtime option so that ceph-osd units are lost after a reboot. They are recreated when ceph-disk activate is called at boot time, so that

    systemctl stop ceph

knows which ceph-osd@.service to stop when a script or sysadmin wants to stop all ceph services.

Before enabling ceph-osd@.service (which happens at every boot time), make sure the permanent enablement in /etc/systemd is removed, so that only the one added by systemctl enable --runtime in /run/systemd remains. This is useful to upgrade an existing cluster without creating a situation that is even worse than before, because ceph-disk@.service would race against two ceph-osd@.service (one in /etc/systemd and one in /run/systemd).

Fixes: http://tracker.ceph.com/issues/17889

Signed-off-by: Loic Dachary <loic@dachary.org>
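In shell terms, the sequence described above amounts to something like the following (a sketch, assuming osd.3; the real calls are made internally by ceph-disk activate):

# drop any permanent enablement left by a previous version
# (removes the symlink under /etc/systemd/system)
systemctl disable ceph-osd@3

# enable for the current boot only: the symlink lives under
# /run/systemd/system, disappears at reboot, and is recreated
# by ceph-disk activate at the next boot
systemctl enable --runtime ceph-osd@3

systemctl start ceph-osd@3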
[
    'systemctl',
    'disable',
    'ceph-osd@{osd_id}'.format(osd_id=osd_id),
any reason why the trailing ".service" was omitted? It is the default, but we don't have 100% certainty that it will always be this way, and IMO it's better to specify the full form of the unit name instead of relying on systemd to fill in the blank.
unless you think that's important and likely to happen, I'd rather have that in a separate cleanup commit to keep this one minimal. I'm under the impression that systemd will keep that naming convention to avoid backward-compatibility problems.
of course, it's not likely to happen and that kind of change falls into the "cleanup" category.
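For the record, the cleanup discussed here would simply spell out the suffix in the argument list quoted above (a sketch, not an actual commit from this PR):

[
    'systemctl',
    'disable',
    'ceph-osd@{osd_id}.service'.format(osd_id=osd_id),
    # ...rest of the invocation unchanged
]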
jenkins test this please (test_objectstore_memstore.sh)
teuthology-suite -k distro --verbose --suite ceph-disk --suite-branch master --ceph wip-17889-systemd-order --machine-type vps --priority 101 machine_types/vps.yaml ~/shaman.yaml
ceph-disk is responsible for enabling the unit file if needed. Actually, since ceph/ceph#12241 it seems that it's not even needed. In the event of a restart, udev rules will be triggered and they will run ceph-disk activate on the device too, so the 'enabled' state is not needed.

Closes: #1142

Signed-off-by: Sébastien Han <seb@redhat.com>