ceph-disk: fix failures when preparing disks with udev > 214 #6926
Conversation
Signed-off-by: Loic Dachary <loic@dachary.org>
Should parted output fail to parse, it is useful to get the full output when running in verbose mode. Signed-off-by: Loic Dachary <loic@dachary.org>
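A minimal sketch of what that verbose-mode fallback could look like, in ceph-disk's Python style; the function name and the parted output layout handled here are illustrative assumptions, not the actual ceph-disk code:

```python
import logging

LOG = logging.getLogger('ceph_disk')

def first_partition_number(parted_output):
    # Hypothetical parser for 'parted --machine' style output: the
    # first data line after the header carries 'number:start:end:...'.
    lines = parted_output.splitlines()
    try:
        return int(lines[2].split(':')[0])
    except (IndexError, ValueError):
        # Parsing failed: dump the full raw output so the verbose log
        # alone is enough to see what parted actually printed.
        LOG.debug('weird parted output: %r', parted_output)
        raise
```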
Signed-off-by: Loic Dachary <loic@dachary.org>
So that reading the teuthology log is enough in most cases to figure out the cause of the error. Signed-off-by: Loic Dachary <loic@dachary.org>
The default of 120 seconds may be exceeded when the disk is very slow, which can happen in cloud environments. Increase it to 600 seconds instead. The partprobe command may fail for the same reason, but it does not have a timeout parameter; instead, try a few times before failing. The udevadm settle calls guarding partprobe are not strictly necessary because partprobe already does the same, but partprobe does not provide a way to control the timeout. Running one udevadm settle right after another is a noop most of the time and does not add any delay; it matters when the udevadm settle run by partprobe fails with a timeout, because partprobe silently ignores that failure. http://tracker.ceph.com/issues/14080 Fixes: #14080 Signed-off-by: Loic Dachary <loic@dachary.org>
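A minimal sketch of that logic, in ceph-disk's Python style; the helper name, retry count, and sleep interval are illustrative assumptions:

```python
import subprocess
import time

def command_check_call(args):
    # Thin wrapper in the style ceph-disk uses for shelling out.
    subprocess.check_call(args)

def update_partition(dev, retries=5):
    # Raise udevadm settle's timeout from the 120-second default to
    # 600 seconds for very slow (e.g. cloud) disks.
    command_check_call(['udevadm', 'settle', '--timeout=600'])
    # partprobe has no timeout knob, so retry it a few times instead.
    for attempt in range(retries):
        try:
            command_check_call(['partprobe', dev])
            break
        except subprocess.CalledProcessError:
            if attempt == retries - 1:
                raise
            time.sleep(60)
    # A second settle only matters when the settle run internally by
    # partprobe timed out and the failure was silently ignored; it is
    # a cheap noop otherwise.
    command_check_call(['udevadm', 'settle', '--timeout=600'])
```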
The behavior of partprobe or sgdisk may be subtly different when given a symbolic link to a device instead of the actual device. The debug output is also more confusing when the symlink is shown instead of the device it points to. Always dereference the symlink before running destroy and zap. Signed-off-by: Loic Dachary <loic@dachary.org>
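Dereferencing the symlink can be as simple as the following sketch; the helper name is an assumption:

```python
import os

def dereference(dev):
    # Resolve e.g. /dev/disk/by-id/virtio-... to the real device node
    # such as /dev/vdb, so partprobe and sgdisk always operate on (and
    # log) the device itself rather than a symlink to it.
    return os.path.realpath(dev)
```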
sgdisk -i 1 /dev/vdb opens /dev/vdb in write mode, which indirectly triggers a BLKRRPART ioctl from udev (version 214 and up) when the device is closed (see the udev release note below). The kernel's implementation of this ioctl (even on old kernels) removes all partitions and adds them again (similar to what partprobe does explicitly). The side effects of partitions disappearing while ceph-disk is running are devastating.

sgdisk is therefore replaced by blkid, which only opens the device in read mode and will not trigger this unexpected behavior. The problem does not show on Ubuntu 14.04, which runs udev < 214, but shows on CentOS 7, which runs udev > 214.

git clone git://anonscm.debian.org/pkg-systemd/systemd.git

systemd/NEWS, CHANGES WITH 214:

* As an experimental feature, udev now tries to lock the disk device node (flock(LOCK_SH|LOCK_NB)) while it executes events for the disk or any of its partitions. Applications like partitioning programs can lock the disk device node (flock(LOCK_EX)) and claim temporary device ownership that way; udev will entirely skip all event handling for this disk and its partitions. If the disk was opened for writing, the close will trigger a partition table rescan in udev's "watch" facility, and if needed synthesize "change" events for the disk and all its partitions. This is now unconditionally enabled, and if it turns out to cause major problems, we might turn it on only for specific devices, or might need to disable it entirely. Device Mapper devices are excluded from this logic.

http://tracker.ceph.com/issues/14094 Fixes: #14094 Signed-off-by: Ilya Dryomov <idryomov@redhat.com> Signed-off-by: Loic Dachary <loic@dachary.org>
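A sketch of a read-only replacement for sgdisk -i, assuming the blkid from util-linux that supports low-level probing (-p) and udev-style output (-o udev); the function name is hypothetical:

```python
import subprocess

def get_partition_type(partition):
    # 'blkid -p' probes the device directly instead of the cache, and
    # '-o udev' prints KEY=value pairs. Unlike 'sgdisk -i', blkid opens
    # the device read-only, so closing it does not trigger udev's
    # BLKRRPART partition rescan (udev >= 214).
    out = subprocess.check_output(
        ['blkid', '-p', '-o', 'udev', partition]).decode()
    for line in out.splitlines():
        key, _, value = line.partition('=')
        if key == 'ID_PART_ENTRY_TYPE':
            return value
    return None
```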
When ceph-disk prepares a disk, it triggers udev events, and each of them runs ceph-disk activate. If systemctl stop ceph-osd@2 happens while ceph-disk activate calls are still in flight, the stop may be cancelled by the systemctl enable issued by one of the pending ceph-disk activate runs. This only matters in a test environment where disks are destroyed shortly after they are activated. Signed-off-by: Loic Dachary <loic@dachary.org>
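One way a test harness can avoid that race, sketched under the assumption that waiting for udev to settle is enough to drain the pending ceph-disk activate runs; the helper name is hypothetical and the unit name follows the ceph-osd@ template mentioned above:

```python
import subprocess

def stop_osd(osd_id):
    # Drain pending udev events (and the ceph-disk activate runs they
    # spawn) first, so a trailing 'systemctl enable' issued by one of
    # them cannot cancel or undo the stop below.
    subprocess.check_call(['udevadm', 'settle', '--timeout=600'])
    subprocess.check_call(['systemctl', 'stop', 'ceph-osd@%d' % osd_id])
```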
cw ceph-qa-suite --verbose --ceph-qa-suite-git-url http://github.com/dachary/ceph-qa-suite --suite-branch master --ceph-git-url http://github.com/dachary/ceph --ceph wip-14080-ceph-disk-udevadm --suite ceph-disk --upload
Merge message: ceph-disk: fix failures when preparing disks with udev > 214

On CentOS 7.1 and other operating systems with a version of udev greater than or equal to 214, running ceph-disk prepare triggered unexpected removal and addition of partitions on the disk being prepared. That created problems ranging from the OSD not being activated to failures because /dev/sdb1 does not exist although it should.

Reviewed-by: Sage Weil <sage@redhat.com>
command_check_call(['udevadm', 'settle'])
command_check_call(['partprobe', dev])
command_check_call(['udevadm', 'settle'])
partprobe_ok = False
This fixes ceph-disk create on a raw disk device under CentOS 7.2 for me.
I downloaded 10.0.0.3 from the gitbuilder repo today, which contains that fix.
According to http://tracker.ceph.com/issues/14080, a backport to all current releases is planned; that would be nice. Will this pull request get into 0.94.6?
It did not make it into 0.94.6.