Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ceph-disk: resolve race conditions #12136

Merged
merged 2 commits into from Nov 23, 2016
Merged

ceph-disk: resolve race conditions #12136

merged 2 commits into from Nov 23, 2016

Conversation

ghost
Copy link

@ghost ghost commented Nov 22, 2016

A ceph udev action may be triggered before the local file systems are
mounted because there is no ordering in udev. The ceph udev action
delegates asynchronously to systemd via ceph-disk@.service which will
fail if (for instance) the LVM partition required to mount /var/lib/ceph
is not available yet. The systemd unit will retry a few times but will
eventually fail permanently. The sysadmin can systemctl reset-fail at a
later time and it will succeed.

Add a dependency to ceph-disk@.service so that it waits until the local
file systems are mounted:

After=local-fs.target

Since local-fs.target depends on lvm, it will wait until the lvm
partition (as well as any dm devices) is ready and mounted before
attempting to activate the OSD. It may still fail because the
corresponding journal/data partition is not ready yet (which is
expected) but it will no longer fail because the lvm/filesystems/dm are
not ready.

Fixes: http://tracker.ceph.com/issues/17889

Signed-off-by: Loic Dachary <loic@dachary.org>
The udev rules that set the owner/group of the OSD devices are racing
with 50-udev-default.rules and depending on which udev event fires last,
ownership may not be as expected.

Since ceph-disk trigger --sync runs as root, always happens after
dm/lvm/filesystem units are complete and before activation, it is a good
time to set the ownership of the device.

It does not eliminate all races: a script running after systemd
local-fs.target and firing a udev event may create a situation where the
permissions of the device are temporarily reverted while the activation
is running.

Fixes: http://tracker.ceph.com/issues/17813

Signed-off-by: Loic Dachary <loic@dachary.org>
@ghost ghost added bug-fix core labels Nov 22, 2016
@ghost ghost added this to the kraken milestone Nov 22, 2016
@ghost
Copy link
Author

ghost commented Nov 22, 2016

jenkins test this please (http://tracker.ceph.com/issues/17991)

@ghost ghost changed the title resolve ceph-disk race conditions ceph-disk: resolve race conditions Nov 22, 2016
@ghost
Copy link
Author

ghost commented Nov 22, 2016

teuthology-suite -k distro --verbose --suite ceph-disk --suite-branch master --ceph wip-17889-systemd-order --machine-type vps --priority 101 machine_types/vps.yaml

@@ -1,5 +1,7 @@
[Unit]
Description=Ceph disk activation: %f
After=local-fs.target
Wants=local-fs.target
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nitpick, but shouldn't this be Requires=?
You want this unit to fail if the local fs can't be mounted (although, if that happens, you have bigger problems).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would make sense to me also. I chose to go with the same kind of dependency as the other .service and .target defined by Ceph. As you say: if that is not present...we're in trouble big time ;-)

@ghost ghost assigned tchaikov Nov 22, 2016
@ghost
Copy link
Author

ghost commented Nov 22, 2016

@tchaikov would you have time to take a look ?

Copy link
Contributor

@tchaikov tchaikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.

@tchaikov tchaikov merged commit bd1eb7a into ceph:master Nov 23, 2016
@rubenk
Copy link
Contributor

rubenk commented Nov 25, 2016

The first patch might actually have no effect, since this is the default behaviour.
Lennart describes this in https://lists.freedesktop.org/archives/systemd-devel/2016-November/037924.html:

Note that After=local-fs.target is redundant for regular
services. ("Regular" being defined here as service which do not set
DefaultDependencies=no). This is because if DefaultDependencies=yes is
set (which is the implied default for all services), then they gain an
ordering dep on basic.target which in turn is ordered against
local-fs.target. Hence, because ordering deps are transitive all
regular services are indirectly also ordered against local-fs.target.

Not that it can do any harm to set this explicitly, but it might not be the fix for the race condition we're looking for.

@ghost
Copy link
Author

ghost commented Nov 25, 2016

@rubenk that would be really surprising because the observed behavior can only be explained by ceph-disk trying to activate an OSD before a local file system mounted on a LVM is available. Thanks a lot for the link, I'll explore that further.

@rubenk
Copy link
Contributor

rubenk commented Nov 25, 2016

Indeed, it surprised me as well, but I just checked (note this is without lvm and without this PR):

$ systemctl list-dependencies --after ceph-disk@dev-sda1.service
ceph-disk@dev-sda1.service
● ├─system-ceph\x2ddisk.slice
● ├─systemd-journald.socket
● └─basic.target
●   ├─rhel-import-state.service
●   ├─systemd-ask-password-plymouth.path
●   ├─paths.target
●   │ ├─brandbot.path
●   │ ├─systemd-ask-password-console.path
●   │ └─systemd-ask-password-wall.path
●   ├─slices.target
●   │ ├─-.slice
●   │ ├─system.slice
●   │ └─user.slice
●   ├─sockets.target
●   │ ├─dbus.socket
●   │ ├─sshd.socket
●   │ ├─syslog.socket
●   │ ├─systemd-initctl.socket
●   │ ├─systemd-journald.socket
●   │ ├─systemd-shutdownd.socket
●   │ ├─systemd-udevd-control.socket
●   │ └─systemd-udevd-kernel.socket
●   └─sysinit.target
●     ├─auditd.service
●     ├─dev-hugepages.mount
●     ├─dev-mqueue.mount
●     ├─emergency.service
●     ├─kmod-static-nodes.service
●     ├─plymouth-read-write.service
●     ├─proc-sys-fs-binfmt_misc.automount
●     ├─rhel-autorelabel-mark.service
●     ├─rhel-autorelabel.service
●     ├─rhel-loadmodules.service
●     ├─sys-fs-fuse-connections.mount
●     ├─sys-kernel-config.mount
●     ├─sys-kernel-debug.mount
●     ├─systemd-binfmt.service
●     ├─systemd-firstboot.service
●     ├─systemd-hwdb-update.service
●     ├─systemd-journal-catalog-update.service
●     ├─systemd-journald.service
●     ├─systemd-machine-id-commit.service
●     ├─systemd-modules-load.service
●     ├─systemd-random-seed.service
●     ├─systemd-readahead-collect.service
●     ├─systemd-readahead-replay.service
●     ├─systemd-sysctl.service
●     ├─systemd-tmpfiles-setup-dev.service
●     ├─systemd-tmpfiles-setup.service
●     ├─systemd-udev-trigger.service
●     ├─systemd-udevd.service
●     ├─systemd-update-done.service
●     ├─systemd-update-utmp.service
●     ├─systemd-vconsole-setup.service
●     ├─cryptsetup.target
●     ├─emergency.target
●     │ ├─emergency.service
●     │ ├─rhel-import-state.service
●     │ └─rhel-readonly.service
●     ├─local-fs.target
●     │ ├─-.mount
●     │ ├─boot-efi.mount
●     │ ├─rhel-readonly.service
●     │ ├─run-user-10001.mount
●     │ ├─systemd-fsck-root.service
●     │ ├─systemd-remount-fs.service
●     │ ├─tmp.mount
●     │ ├─var-lib-ceph-osd-ams1\x2dceph01\x2d108.mount
●     │ ├─var-lib-ceph-osd-ams1\x2dceph01\x2d109.mount
●     │ ├─var-lib-ceph-osd-ams1\x2dceph01\x2d110.mount
●     │ ├─var-lib-ceph-osd-ams1\x2dceph01\x2d111.mount
●     │ ├─var-lib-ceph-osd-ams1\x2dceph01\x2d112.mount
●     │ ├─var-lib-ceph-osd-ams1\x2dceph01\x2d113.mount
●     │ ├─var-lib-ceph-osd-ams1\x2dceph01\x2d114.mount
●     │ ├─var-lib-ceph-osd-ams1\x2dceph01\x2d115.mount
●     │ ├─var-lib-ceph-osd-ams1\x2dceph01\x2d116.mount
●     │ ├─var-lib-ceph-osd-ams1\x2dceph01\x2d117.mount
●     │ ├─var-lib-ceph-osd-ams1\x2dceph01\x2d118.mount
●     │ ├─var-lib-ceph-osd-ams1\x2dceph01\x2d119.mount
●     │ ├─var.mount
●     │ └─local-fs-pre.target
●     │   ├─systemd-remount-fs.service
●     │   └─systemd-tmpfiles-setup-dev.service
●     └─swap.target
●       └─dev-disk-by\x2duuid-7bbd0383\x2d1882\x2d403d\x2db126\x2dc98722da63b9.swap

@ghost
Copy link
Author

ghost commented Nov 25, 2016

@rubenk thanks, that's very helpful :-) I'll resume investigations on monday since it seems the problem is still there.

@ghost
Copy link
Author

ghost commented Nov 25, 2016

I reopened http://tracker.ceph.com/issues/17813

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants