
ceph-disk: enable --runtime ceph-osd systemd units #12241

Merged: 2 commits merged into ceph:master on Dec 2, 2016

Conversation

@ghost (author) commented Nov 30, 2016

@ghost added this to the kraken milestone Nov 30, 2016
@ghost changed the title from "build/ops: long ceph-osd@.service pause between failures" to "DNM: build/ops: long ceph-osd@.service pause between failures" Nov 30, 2016
@ghost changed the title from "DNM: build/ops: long ceph-osd@.service pause between failures" to "DNM: ceph-disk: systemctl must not enable OSD units" Nov 30, 2016
@idealguo left a review comment on the following hunk:

@@ -19,6 +19,7 @@ TasksMax=infinity
 Restart=on-failure
 StartLimitInterval=30min
 StartLimitBurst=3
+RestartSec=5min

Maybe a 5 minute delay is too long in a production environment?

@ghost (author) replied:

@idealguo maybe it is. Do you have a specific scenario in mind?

@idealguo replied:

I see the default value is 100ms (https://www.freedesktop.org/software/systemd/man/systemd.service.html), and "osd_heartbeat_grace" is 20s by default, so maybe 20s or thereabouts is an option.

@ghost (author) replied on Dec 1, 2016:

In the scenario I have in mind, at boot time, ceph-disk@.service may run 5 to 10 minutes after ceph-osd@.service attempted to start. There is no doubt that waiting minutes is useful. What I'm not 100% sure of yet is whether this can cause problems.

@idealguo replied:

My concern is when the OSD service exits unexpectedly, for example because it is killed by someone. If we wait 5 minutes before restarting, that will be too long.

@ghost (author) replied:

You're right. What about

RestartSec=20s
StartLimitBurst=30

That accounts for both cases. At boot time it will retry every 20s for about 10 minutes, and at runtime it will restart within 20 seconds with no risk of the OSD being marked down. I don't think anything at runtime will behave differently if the OSD restarts after 20 seconds instead of after 100ms. What do you think?
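For illustration only, a sketch of how the [Service] settings quoted in the hunk above might read with these proposed values (the surrounding lines are assumed from the hunk; the values actually merged may differ):

Restart=on-failure
StartLimitInterval=30min
# 30 attempts spaced 20s apart give roughly 10 minutes of retries
# before the start limit is reached
StartLimitBurst=30
RestartSec=20s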

@idealguo replied:

Yes, 100ms is too sensitive; it will hit the "StartLimitBurst" limit quickly.

@smithfarm (Contributor) commented Dec 1, 2016

@dachary Would "ceph-disk: do not enable ceph-osd systemd units" be a better title for this PR (and for the second commit)?

@ghost changed the title from "DNM: ceph-disk: systemctl must not enable OSD units" to "DNM: ceph-disk: do not enable ceph-osd systemd units" Dec 1, 2016
@ghost (author) commented Dec 1, 2016

@smithfarm indeed, thanks :-)

One of the two commit messages reads:

Instead of the default 100ms pause before trying to restart an OSD, wait
20 seconds and retry 30 times instead of 3. There is no scenario in which
restarting an OSD almost immediately after it failed gets a better result.

It is possible that a failure to start is due to a race with another
systemd unit at boot time. For instance, if ceph-disk@.service is
delayed, it may start after the OSD that needs it. A long pause may give
the racing service enough time to complete, and the next attempt to start
the OSD may then succeed.

This is not a sound way to resolve a race; it only makes the OSD boot
process less sensitive to one. In the example above, the proper fix is to
enable --runtime ceph-osd@.service so that it cannot race at boot time.

The delay should not be minutes, in order to preserve the current runtime
behavior: if an OSD is killed or fails and only restarts after 10 minutes,
it will be marked down by the Ceph cluster. That would not break things,
but it is a significant behavior change and should be avoided.

Refs: http://tracker.ceph.com/issues/17889

Signed-off-by: Loic Dachary <loic@dachary.org>
@ghost changed the title from "DNM: ceph-disk: do not enable ceph-osd systemd units" to "DNM: ceph-disk: enable --runtime ceph-osd systemd units" Dec 1, 2016
@ghost changed the title from "DNM: ceph-disk: enable --runtime ceph-osd systemd units" to "ceph-disk: enable --runtime ceph-osd systemd units" Dec 1, 2016
@ghost assigned tchaikov Dec 1, 2016
@ghost (author) commented Dec 1, 2016

@liewegas I'll figure something out to fix existing installations on upgrade.

@ghost changed the title from "ceph-disk: enable --runtime ceph-osd systemd units" to "DNM: ceph-disk: enable --runtime ceph-osd systemd units" Dec 1, 2016
@ghost assigned smithfarm Dec 1, 2016
The other commit message reads:

If ceph-osd@.service is enabled for a given device (say /dev/sdb1 for
osd.3), ceph-osd@3.service will race with ceph-disk@dev-sdb1.service
at boot time.

Enabling ceph-osd@3.service is not necessary at boot time because

   ceph-disk@dev-sdb1.service

calls

   ceph-disk activate /dev/sdb1

which calls

   systemctl start ceph-osd@3

The systemctl enable/disable ceph-osd@.service called by ceph-disk
activate is changed to add the --runtime option so that ceph-osd units
are lost after a reboot. They are recreated when ceph-disk activate is
called at boot time so that:

   systemctl stop ceph

knows which ceph-osd@.service to stop when a script or sysadmin wants
to stop all ceph services.

Before enabling ceph-osd@.service (which happens at every boot), make
sure the permanent enablement in /etc/systemd is removed, so that only
the one added by systemctl enable --runtime in /run/systemd remains.
This matters when upgrading an existing cluster: otherwise the situation
would be even worse than before, with ceph-disk@.service racing against
two ceph-osd@.service enablements (one in /etc/systemd and one in
/run/systemd).

Fixes: http://tracker.ceph.com/issues/17889

Signed-off-by: Loic Dachary <loic@dachary.org>
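For illustration, a rough shell equivalent of the sequence described above for osd.3 (a sketch of the described behavior, not the actual ceph-disk code):

   # remove any permanent enablement left under /etc/systemd by an older version
   systemctl disable ceph-osd@3
   # enable transiently under /run/systemd, so the enablement disappears at reboot
   systemctl enable --runtime ceph-osd@3
   # start the OSD now
   systemctl start ceph-osd@3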
[
'systemctl',
'disable',
'ceph-osd@{osd_id}'.format(osd_id=osd_id),
A contributor commented on this hunk:

Any reason why the trailing ".service" was omitted? It is the default, but we don't have 100% certainty that it will always be this way, and IMO it's better to specify the full form of the unit name instead of relying on systemd to fill in the blank.

@ghost (author) replied:

Unless you think that's important and likely to happen, I'd rather have that in a separate cleanup commit to keep this one minimal. I'm under the impression that systemd will keep that naming convention to avoid backward-incompatibility problems.

The contributor replied:

Of course; it's not likely to happen, and that kind of change falls into the "cleanup" category.
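For reference, both spellings discussed here address the same unit on current systemd versions, since systemctl assumes the .service suffix when no unit suffix is given; for example:

   # equivalent today: systemctl appends .service when no unit suffix is given
   systemctl disable ceph-osd@3
   systemctl disable ceph-osd@3.service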

@ghost (author) commented Dec 1, 2016

jenkins test this please (test_objectstore_memstore.sh)

@ghost (author) commented Dec 1, 2016

teuthology-suite -k distro --verbose --suite ceph-disk --suite-branch master --ceph wip-17889-systemd-order --machine-type vps --priority 101 machine_types/vps.yaml ~/shaman.yaml

@ghost changed the title from "DNM: ceph-disk: enable --runtime ceph-osd systemd units" to "ceph-disk: enable --runtime ceph-osd systemd units" Dec 1, 2016
@ghost merged commit bbd97ce into ceph:master Dec 2, 2016
leseb added a commit to ceph/ceph-ansible that referenced this pull request Jul 26, 2017
ceph-disk is responsible for enabling the unit file if needed. Actually,
since ceph/ceph#12241 it seems that this is not even needed: on a restart,
udev rules will be triggered and they will ceph-disk activate the device,
so the 'enabled' is not needed.

Closes: #1142
Signed-off-by: Sébastien Han <seb@redhat.com>
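As a rough way to check that an OSD unit on such a node is only runtime-enabled (osd.3 is an arbitrary example; the exact output is an assumption and may vary by systemd version):

   # expected to print "enabled-runtime" when the unit is enabled only under
   # /run/systemd, i.e. via systemctl enable --runtime
   systemctl is-enabled ceph-osd@3.service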