cope with block devices with no MAJOR in udev #86
https://bugs.launchpad.net/subiquity/+bug/1868109 reports a crash stemming from a block device that has no MAJOR in udev. I'm not sure what that represents, but crashing probably isn't the correct response. Ignore such devices instead.
probert/filesystem.py
Outdated
@@ -34,7 +34,7 @@ def probe(context=None):
     for device in context.list_devices(subsystem='block'):
         # Ignore block major=1 (ramdisk) and major=7 (loopback)
         # these won't ever be used in recreating storage on target systems.
-        if device['MAJOR'] not in ["1", "7"]:
+        if device.get('MAJOR') not in ["1", "7", None]:
I think it would be clearer if we spelled out what we're doing here; these are two separate checks and I'd prefer if they weren't conflated:
if 'MAJOR' in device and device['MAJOR'] not in ["1", "7"]:
Should we be updating the comment above?
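To make the comparison concrete, here is a minimal sketch of the two spellings side by side. Plain dicts stand in for pyudev devices (an assumption for illustration only), and the helper names `compact` and `explicit` are hypothetical. Both forms keep the same devices; the explicit one just separates "has no MAJOR" from "is a ramdisk or loopback device".

```python
# Plain dicts stand in for pyudev devices here; this is an assumption
# made for illustration, not how probert obtains its devices.
devices = [
    {'DEVNAME': '/dev/sda', 'MAJOR': '8'},
    {'DEVNAME': '/dev/ram0', 'MAJOR': '1'},    # ramdisk
    {'DEVNAME': '/dev/loop0', 'MAJOR': '7'},   # loopback
    # No MAJOR at all, like the device from the bug report.
    {'DEVPATH': '/devices/pci0000:17/0000:17:00.0/0000:18:00.0/nvme/nvme0/nvme0c0n1'},
]


def compact(device):
    # One membership test covers both "missing" and "ramdisk/loopback".
    return device.get('MAJOR') not in ["1", "7", None]


def explicit(device):
    # Two separate checks: MAJOR must exist, and must not be 1 or 7.
    return 'MAJOR' in device and device['MAJOR'] not in ["1", "7"]


kept_compact = [d for d in devices if compact(d)]
kept_explicit = [d for d in devices if explicit(d)]
```

The behaviour is identical, so the choice between them is purely about readability.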
FYI, the "device" that triggered this bug seems to be the one below (extract from udevadm info -e):
P: /devices/pci0000:17/0000:17:00.0/0000:18:00.0/nvme/nvme0/nvme0c0n1
L: 0
E: DEVPATH=/devices/pci0000:17/0000:17:00.0/0000:18:00.0/nvme/nvme0/nvme0c0n1
E: SUBSYSTEM=block
E: DEVTYPE=disk
E: USEC_INITIALIZED=762755939
E: MPATH_SBIN_PATH=/sbin
E: DM_MULTIPATH_DEVICE_PATH=0
E: ID_SERIAL_SHORT=S4CCNE0M300015
E: ID_WWN=eui.344343304d3000150025384500000004
E: ID_MODEL=SAMSUNG MZPLL3T2HAJQ-00005
E: ID_REVISION=GPJA0B3Q
E: ID_SERIAL=SAMSUNG MZPLL3T2HAJQ-00005_S4CCNE0M300015
E: ID_PATH=pci-0000:18:00.0-nvme-1
E: ID_PATH_TAG=pci-0000_18_00_0-nvme-1
E: TAGS=:systemd:
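For anyone who wants to look for similar devices in their own `udevadm info -e` dump, here is a rough sketch of a parser that flags block-subsystem records with no MAJOR property. The function names are hypothetical, and the second record in the sample (an `nvme0n1` entry with `MAJOR=259`) is invented for contrast; only the first record is taken from the extract above.

```python
def parse_udev_export(text):
    """Split `udevadm info -e` style text into one dict of E: properties per record."""
    records = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record.
            if current:
                records.append(current)
                current = {}
            continue
        prefix, _, rest = line.partition(': ')
        if prefix == 'E':
            key, _, value = rest.partition('=')
            current[key] = value
    if current:
        records.append(current)
    return records


def block_devices_without_major(records):
    """Return records in the block subsystem that carry no MAJOR property."""
    return [r for r in records
            if r.get('SUBSYSTEM') == 'block' and 'MAJOR' not in r]


SAMPLE = """\
P: /devices/pci0000:17/0000:17:00.0/0000:18:00.0/nvme/nvme0/nvme0c0n1
L: 0
E: DEVPATH=/devices/pci0000:17/0000:17:00.0/0000:18:00.0/nvme/nvme0/nvme0c0n1
E: SUBSYSTEM=block
E: DEVTYPE=disk

P: /devices/virtual/nvme-subsystem/nvme-subsys0/nvme0n1
E: DEVPATH=/devices/virtual/nvme-subsystem/nvme-subsys0/nvme0n1
E: SUBSYSTEM=block
E: MAJOR=259
E: MINOR=0
"""

suspects = block_devices_without_major(parse_udev_export(SAMPLE))
```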
I somehow think this might be a kernel bug on Power systems.
But this is on amd64.
Not sure why my NVMe drive is also called nvme0c0n1, which does not exist on disk.
Things look very odd.
OK, so it has multipath enabled.
It is as if we only see the multipath NVMe device, but not the virtual NVMe subsystem block namespace (nvme0n1).
So it is as if we need to ask pyudev to give us virtual NVMe drives too.
Managed to configure multipath on that NVMe drive, but it seems to be single path only. I do wonder if we need to support assembling NVMe multipath devices somehow.
A subsystem='block' device such as nvme0c0n1 without a MAJOR number is, in this case, a multipath control node and not a block device at all. It has no /dev/nvme* node and no DEVLINKS. It is not something we can install anything onto, so it should be ignored and not returned by the function at all.
Given the above processing by multipathd, maybe nvme0c0n1 is a single path to the nvme0n block device, i.e. we should only care about nvme0n1.
    if 'MAJOR' not in device:
        continue
should be added before checking that MAJOR is not in 1, 7.
OK, I updated it to be clearer and less cute.
That feels like a bug. I'm not a fan of this "partially hiding stuff" in the kernel. Why isn't the control node a char device, since you can't actually open or interact with it as a block device? See the previous discussion a while back on the curtin side for more background.
probert/filesystem.py
Outdated
for device in context.list_devices(subsystem='block'):
    if "MAJOR" not in device:
        # Shouldn't happen but apparently does! (LP: #1868109)
        continue
    # Ignore block major=1 (ramdisk) and major=7 (loopback)
    # these won't ever be used in recreating storage on target systems.
    if device['MAJOR'] not in ["1", "7"]:
I wonder if we should have a filtering helper that takes a context and yields only the devices that pass these two checks; then both files can call the same helper to enumerate the devices they need to operate on?
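A rough sketch of what such a helper could look like follows. The name `sane_block_devices` matches the one patched in the test diff later in this thread, but the body here is a guess at its shape, not the code that was merged. `FakeContext` is a hypothetical stand-in for `pyudev.Context` so the example runs without udev.

```python
def sane_block_devices(context):
    """Yield block devices worth probing (sketch; the body is an assumption)."""
    for device in context.list_devices(subsystem='block'):
        if 'MAJOR' not in device:
            # No MAJOR in udev, e.g. the nvme0c0n1 node from LP: #1868109.
            continue
        if device['MAJOR'] in ("1", "7"):
            # major=1 is ramdisk, major=7 is loopback; never install targets.
            continue
        yield device


class FakeContext:
    """Hypothetical stand-in for pyudev.Context, backed by plain dicts."""

    def __init__(self, devices):
        self._devices = devices

    def list_devices(self, subsystem=None):
        return [d for d in self._devices if d.get('SUBSYSTEM') == subsystem]


context = FakeContext([
    {'SUBSYSTEM': 'block', 'DEVNAME': '/dev/sda', 'MAJOR': '8'},
    {'SUBSYSTEM': 'block', 'DEVNAME': '/dev/loop0', 'MAJOR': '7'},
    {'SUBSYSTEM': 'block', 'DEVTYPE': 'disk'},  # no MAJOR
    {'SUBSYSTEM': 'net', 'INTERFACE': 'eth0'},
])

names = [d['DEVNAME'] for d in sane_block_devices(context)]
```

Writing it as a generator keeps call sites unchanged: each caller can swap `context.list_devices(subsystem='block')` for `sane_block_devices(context)` inside its existing `for` loop.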
> That feels like a bug. I'm not a fan of this "partially hiding stuff" in the kernel. Why isn't the control node a char device since you can't actually open or interact with it as a block device?
Agreed, but I'm even less of a fan of block probe failures :)
Linux block devices, not even once.
> Previous discussion a while back on the curtin side for more background.
I thought I'd seen some discussion of this sort of thing before.
Unittest needs fixing:
diff --git a/probert/tests/test_lvm.py b/probert/tests/test_lvm.py
index 548bd74..c14f375 100644
--- a/probert/tests/test_lvm.py
+++ b/probert/tests/test_lvm.py
@@ -264,12 +264,13 @@ class TestLvm(testtools.TestCase):
     @mock.patch('probert.lvm.read_sys_block_size_bytes')
     @mock.patch('probert.lvm.activate_volgroups')
     @mock.patch('probert.lvm.lvm_scan')
-    @mock.patch('probert.lvm.pyudev.Context.list_devices')
+    @mock.patch('probert.lvm.sane_block_devices')
     @mock.patch('probert.lvm.probe_vgs_report')
-    def test_probe(self, m_vgs, m_pyudev, m_scan, m_activate, m_size, m_run):
+    def test_probe(self, m_vgs, m_blockdevs, m_scan, m_activate, m_size,
+                   m_run):
         size = 1000
         m_size.return_value = size
-        m_pyudev.return_value = CONTEXT
+        m_blockdevs.return_value = CONTEXT
         m_vgs.return_value = VGS_REPORT
         expected_result = {
@@ -297,13 +298,13 @@ class TestLvm(testtools.TestCase):
     @mock.patch('probert.lvm.read_sys_block_size_bytes')
     @mock.patch('probert.lvm.activate_volgroups')
     @mock.patch('probert.lvm.lvm_scan')
-    @mock.patch('probert.lvm.pyudev.Context.list_devices')
+    @mock.patch('probert.lvm.sane_block_devices')
     @mock.patch('probert.lvm.probe_vgs_report')
-    def test_probe_skip_dupes(self, m_vgs, m_pyudev, m_scan, m_activate,
+    def test_probe_skip_dupes(self, m_vgs, m_blockdevs, m_scan, m_activate,
                               m_size, m_run):
         size = 1000
         m_size.return_value = size
-        m_pyudev.return_value = CONTEXT_DUPES
+        m_blockdevs.return_value = CONTEXT_DUPES
         m_vgs.return_value = VGS_REPORT_DUPES
         expected_result = {
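The renamed mock stays in second position because stacked `mock.patch` decorators are applied bottom-up, so the decorator closest to the function supplies the first mock argument. Here is a small self-contained sketch of that ordering. `fake_lvm` is a hypothetical stand-in for a module like `probert.lvm`, and `run_probe` is an invented function, not part of the test suite.

```python
import types
from unittest import mock

# Hypothetical stand-in for a module such as probert.lvm.
fake_lvm = types.SimpleNamespace(
    sane_block_devices=lambda context: [],
    probe_vgs_report=lambda: {},
)


@mock.patch.object(fake_lvm, 'sane_block_devices')   # applied last  -> 2nd arg
@mock.patch.object(fake_lvm, 'probe_vgs_report')     # applied first -> 1st arg
def run_probe(m_vgs, m_blockdevs):
    m_vgs.return_value = {'vg0': {}}
    m_blockdevs.return_value = [{'DEVNAME': '/dev/dm-0'}]
    # Inside the decorated call, both attributes are replaced by the mocks.
    return fake_lvm.probe_vgs_report(), fake_lvm.sane_block_devices(None)


vgs, devices = run_probe()
```

Once `run_probe` returns, the patches are undone and `fake_lvm` has its original functions back.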