Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mimic: krbd: avoid udev netlink socket overrun and retry on transient errors from udev_enumerate_scan_devices() #31322

Merged
merged 10 commits into from Nov 6, 2019

Conversation

adamemerson and others added 10 commits Oct 22, 2019
The thread may well no longer exist by the time we try to set the
name, so have the thread set its own name first thing.

Thanks to Ilya Dryomov <idryomov@gmail.com> for pointing it out.

Signed-off-by: Adam C. Emerson <aemerson@redhat.com>
(cherry picked from commit 2bd106e)
This timeout was added as a (very poor) workaround for an issue
addressed in commit 42dd1ea ("krbd: fix rbd map hang due to udev
return subsystem unordered").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit ffb66ff)

Conflicts:
	src/krbd.cc [ ceph_abort_msgf() not in mimic ]
Move event processing into UdevMapHandler and UdevUnmapHandler
functors and replace wait_for_udev_{add,remove}() with a single
wait_for_mapping() template.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit c84f9e2)

Conflicts:
	src/krbd.cc [ krbd_spec not in mimic ]
This also exposes errors from udev_monitor_receive_device() which were
previously ignored.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 53aab34)
Because the event(s) we are interested in can be deliveled while we are
still in the kernel finishing map or unmap, we start listening for udev
events before going into the kernel.  However, if (un)mapping takes its
time, udev netlink socket can be fairly easily overrun -- the filtering
is done on the listener side, so we get to process everything, not just
rbd events.  If any of the events of interest get dropped (ENOBUFS), we
hang in poll().

Go into the kernel in a separate thread and leave the main thread to
run the event loop.  The return value is communicated to the reactor
though a pipe.

Fixes: https://tracker.ceph.com/issues/41404
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 5444a11)

Conflicts:
	src/krbd.cc [ krbd_spec, ceph_abort_msgf() not in mimic ]
Even though with the previous commit we no longer block between binding
the socket and starting handling events, we still want a larger receive
buffer to accommodate for scheduling delays.  Since the filtering is
done in the listener, an estimate focused on just rbd is not accurate,
but anyway: a pair of "rbd" and "block" events for "rbd map" take 2048
bytes in the receive buffer.  This allows for roughly a thousand of
them ("rbd map" and "rbd unmap" require root and libudev makes use of
SO_RCVBUFFORCE so rmem_max limit is ignored).

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 1c6cac1)
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit 898c113)
udev_enumerate_scan_devices() doesn't handle disappearing devices well.
If called while some devices are being removed, it sometimes propagates
ENOENT and ENODEV errors encountered operating on directory entries in
/sys that no longer exist.  Some of these errors are suppressed, but
this isn't reliable and varies across versions.  In particular, systemd
239 suppresses ENODEV from sd_device_new_from_syspath() but doesn't
suppress ENODEV from sd_device_get_devnum().  In systemd 243 the call
to sd_device_get_devnum() has been moved, but it still leaks ENOENT
from sd_device_get_is_initialized() (referring to the body of
FOREACH_DIRENT_ALL loop in enumerator_scan_dir_and_add_devices()).

Assume that all ENOENT and ENODEV errors are transient and retry the
call to udev_enumerate_scan_devices().  Don't limit the number, but log
each retry.

Fixes: https://tracker.ceph.com/issues/41036
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit e5921ef)

Conflicts:
	src/krbd.cc [ rbd namespaces not in mimic ]
systemd 219 doesn't have the issue that is worked around in the
previous commit, but has a different one: udev_enumerate_scan_devices()
always succeeds, but sometimes returns an empty list when the device is
actually there.  This happens rarely and at random so I haven't been
able to get to the bottom of it yet, but it looks like another similar
race condition in libudev.

Since an empty list is expected if the device isn't there, retry just
twice with a small sleep in-between.  This appears to be enough: I got
7 occurrences per 600000 "rbd unmap" invocations, all of which needed
a single retry:

  rbd: udev enumerate missed a device, tries = 1

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit bd37a72)

Conflicts:
	src/krbd.cc [ krbd_spec not in mimic ]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
(cherry picked from commit b7a0e2a)
@yuriw

This comment has been minimized.

Copy link
Contributor

yuriw commented Nov 4, 2019

@smithfarm

This comment has been minimized.

Copy link
Contributor

smithfarm commented Nov 5, 2019

Thanks, @idryomov

@smithfarm

This comment has been minimized.

Copy link
Contributor

smithfarm commented Nov 6, 2019

Sorry, @yuriw - I broke the rbd suite in mimic. Just merged the fix #31424 so it should work again now.

@idryomov

This comment has been minimized.

Copy link
Contributor Author

idryomov commented Nov 6, 2019

Note that this should be tested with krbd suite, not rbd.

Copy link
Contributor

dillaman left a comment

👍

@smithfarm smithfarm added krbd and removed rbd labels Nov 6, 2019
@smithfarm

This comment has been minimized.

Copy link
Contributor

smithfarm commented Nov 6, 2019

OK, I made a new "krbd" label.

@yuriw yuriw merged commit aa6f46e into ceph:mimic Nov 6, 2019
4 checks passed
4 checks passed
Docs: build check OK - docs built
Details
Signed-off-by all commits in this PR are signed
Details
Unmodified Submodules submodules for project are unmodified
Details
make check make check succeeded
Details
@idryomov idryomov deleted the idryomov:wip-krbd-udev-fixes-mimic branch Nov 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.