
ceph-disk: partprobe should block udev induced BLKRRPART #9330

Merged
merged 1 commit into ceph:master on Jul 4, 2016

Conversation


@ghost ghost commented May 25, 2016

@ghost ghost added bug-fix core labels May 25, 2016
@idryomov
Contributor

I would suggest a shorter sleep and more iterations, at least 50/5 or 100/2-3. Given that this race is so easy to hit that you thought the device had stayed busy, 25 might not cut it. I also think it's worth printing an info message every 10th or 20th iteration.
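For illustration, a shell sketch of the retry shape being suggested (the real change lives in ceph-disk; /dev/vdc is a stand-in device):

    # sketch only: up to 100 attempts, 2 second sleep, progress note every 20th try
    for i in $(seq 100) ; do
        if partprobe /dev/vdc > /dev/null 2>&1 ; then
            break
        fi
        if [ $((i % 20)) -eq 0 ] ; then
            echo "INFO: partprobe /dev/vdc still failing after $i attempts"
        fi
        sleep 2
    done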

@ghost
Author

ghost commented May 25, 2016

@idryomov thanks for your quick review :-) Is this better?

@idryomov
Contributor

idryomov commented May 25, 2016

Looks good to me. Did you try it on that box?

@ghost
Author

ghost commented May 25, 2016

@idryomov not yet, I will

@ghost
Author

ghost commented May 26, 2016

I ran

while : ; do
    for i in $(seq 50) ; do
        if partprobe /dev/vdc > /dev/null 2>&1 ; then
            echo " $i"
            break
        else
            echo -n .
            sleep 5
        fi
    done
done

to verify, on the machine where the problem was originally observed, that retrying partprobe up to 50 times is enough. The highest number of retries so far is 14 (which is really high...).

@ghost
Author

ghost commented May 26, 2016

A better option could be to block the udev induced BLKRRPART with flock -s /dev/vdd partprobe /dev/vdd.
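To see why even a shared lock is enough (a sketch with flock(1); as established below, udev only issues BLKRRPART after taking a non-blocking exclusive lock, which any concurrently held lock defeats):

    # hold a shared lock on the device for a minute
    flock -s /dev/vdd sleep 60 &

    # meanwhile a non-blocking exclusive lock, as udev attempts it, fails
    flock -x -n /dev/vdd true || echo "flock(LOCK_EX|LOCK_NB) failed: udev would skip BLKRRPART"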

@ghost
Author

ghost commented May 26, 2016

@idryomov instead of retrying, partprobe is now wrapped in flock (b7fff6a). Do you think this is sane? I wrote a long explanation with pointers in the b7fff6a commit message. I've been running

[root@dhcp-126-126 ~]# while : ; do flock -s /dev/vdc partprobe /dev/vdc || break ; done

in a loop and saw no failure so far. The idea was suggested by Marius Vollmer and Brian C. Lane. It's simple and elegant and I think it fits well with how ceph-disk is currently using partprobe and sgdisk.

I have also verified that the flock in udev was there from the start (i.e. udev 214, a tag that includes 02ba8fb3357daf57f6120ac512fb464a4c623419, which adds the flock).

@ghost ghost changed the title from "ceph-disk: try partprobe more and more often" to "ceph-disk: partprobe should block udev induced BLKRRPART" on May 26, 2016
@idryomov
Contributor

I had flock on my mind too - that's what I meant by "udev backs off" in my reply on the tracker, but the problem is that udev started doing flock not too long ago. If we care about distros with an older udev (< 214), flock won't work. One such distro is Ubuntu 14.04, with parted 2.3 and a pre-systemd udev. Have you ever observed failures there, or is this specific to parted 3.1?
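For reference, a quick way to check which side of that line a given distro is on (udevadm --version prints the udev version number):

    udevadm --version    # < 214 predates the flock behavior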

@ghost
Author

ghost commented May 26, 2016

I had flock on my mind too - that's what I meant by "udev backs off" in my reply on the tracker

I was too uneducated at the time to understand what you meant, sorry about that :-(

If we care about distros with an older udev (< 214), flock won't work.

Prior to udev 214, no BLKRRPART was issued when a device opened in write mode was closed. Therefore I don't think we need to worry about that. Or am I missing something?

@ghost
Author

ghost commented May 26, 2016

test this please

@idryomov
Contributor

In that case, I'd call this resolved ;)

(I lost track of all the udev-related problems, see http://tracker.ceph.com/issues/14737 for an example.)

@ghost
Author

ghost commented May 26, 2016

ceph-workbench ceph-qa-suite --verbose --simultaneous-jobs 9 --ceph-qa-suite-git-url http://github.com/dachary/ceph-qa-suite --suite-branch wip-ceph-disk --ceph-git-url http://github.com/dachary/ceph --ceph wip-15176-partprobe --suite ceph-disk --upload --filter ubuntu_14

http://167.114.237.66:8081/ubuntu-2016-05-26_15:45:06-ceph-disk-wip-15176-partprobe---basic-openstack/

@ghost
Author

ghost commented May 27, 2016

The bluestore part of the ceph-disk suite fails on master, and therefore also fails with this patch. The rest of the suite passes on Ubuntu 14.04. Now trying with CentOS 7.2.

@ghost
Author

ghost commented May 27, 2016

It also passes the ceph-disk suite on CentOS 7.2 (except for the bluestore and multipath tests, which also fail on master).

@idealguo

idealguo commented Jul 2, 2016

@dachary, I tested the patch several times on jewel 10.2.2 and it resolves bug #15176. Do you have any plans to merge it into the jewel branch?

@ghost
Author

ghost commented Jul 2, 2016

I'm waiting for someone to review this pull request. I should ask explicitly now. @tchaikov would you have time for that?

@tchaikov tchaikov self-assigned this Jul 2, 2016
@tchaikov
Contributor

tchaikov commented Jul 2, 2016

sure, lemme take a look at it.

@tchaikov
Contributor

tchaikov commented Jul 4, 2016

Note to myself, from trying to understand the race here:

  • ceph-disk: write(/dev/sdc), close(/dev/sdc) => inotify =>
    • systemd/udev: with flock(/dev/sdc, exclusive=True): ioctl(BLKRRPART)
    • ceph-disk: partprobe: BLKPG commands

Guarding partprobe with flock() fixes this.
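In shell terms, the shape of the fix (a sketch, with /dev/sdc as the example device):

    # Hold a shared lock for the duration of partprobe. udev needs
    # flock(LOCK_EX|LOCK_NB) on the device before it issues BLKRRPART,
    # so it backs off while partprobe sends its BLKPG commands.
    flock -s /dev/sdc partprobe /dev/sdc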

@tchaikov
Contributor

tchaikov commented Jul 4, 2016

@dachary could you please add "Fixes: http://tracker.ceph.com/issues/15176" to your commit message? Then it is good to merge.

@tchaikov tchaikov assigned ghost and unassigned tchaikov Jul 4, 2016
Wrap partprobe with flock to stop udev from issuing BLKRRPART because
this is racy and frequently fails with a message like:

    Error: Error informing the kernel about modifications to partition
    /dev/vdc1 -- Device or resource busy.  This means Linux won't know about
    any changes you made to /dev/vdc1 until you reboot -- so you shouldn't
    mount it or use it in any way before rebooting.

Opening a device (/dev/vdc for instance) in write mode indirectly
triggers a BLKRRPART ioctl from udev (starting version 214 and up)
when the device is closed (see below for the udev release note).

However, if udev fails to acquire an exclusive lock (with
flock(fd, LOCK_EX|LOCK_NB); ) the BLKRRPART ioctl is not issued.

https://github.com/systemd/systemd/blob/045e00cf16c47bc516c0823d059b7548f3ce9c7c/src/udev/udevd.c#L1042

Acquiring an exclusive lock before running the process that opens the
device in write mode is therefore an effective way to control this
behavior.

git clone git://anonscm.debian.org/pkg-systemd/systemd.git
systemd/NEWS:
CHANGES WITH 214:

  * As an experimental feature, udev now tries to lock the
       disk device node (flock(LOCK_SH|LOCK_NB)) while it
       executes events for the disk or any of its partitions.
       Applications like partitioning programs can lock the
       disk device node (flock(LOCK_EX)) and claim temporary
       device ownership that way; udev will entirely skip all event
       handling for this disk and its partitions. If the disk
       was opened for writing, the close will trigger a partition
       table rescan in udev's "watch" facility, and if needed
       synthesize "change" events for the disk and all its partitions.
       This is now unconditionally enabled, and if it turns out to
       cause major problems, we might turn it on only for specific
       devices, or might need to disable it entirely. Device Mapper
       devices are excluded from this logic.

Fixes: http://tracker.ceph.com/issues/15176

Signed-off-by: Marius Vollmer <marius.vollmer@redhat.com>
Signed-off-by: Loic Dachary <loic@dachary.org>
@ghost
Author

ghost commented Jul 4, 2016

@tchaikov thanks for your review. I amended the commit message and will merge as soon as make check returns.

@tchaikov tchaikov merged commit 042c0aa into ceph:master Jul 4, 2016
@tchaikov tchaikov assigned tchaikov and unassigned ghost Jul 4, 2016
@pcahyna pcahyna mentioned this pull request Oct 18, 2017