Use device_create to ensure /dev nodes are created correctly. #547

arnej27959 · 2023-08-16T14:05:05Z

After following the CUDA installation instructions on RHEL 8.8, I ran into problems later on. Debugging with system call tracing showed that the cause was missing device nodes: some of them, such as /dev/nvidia-uvm and /dev/nvidiactl, did not exist. There are tips in
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#device-node-verification
for how to fix this manually, but that should not really be necessary.
Currently, rules in /usr/lib/udev/rules.d/60-nvidia.rules create these nodes using "mknod", but "journalctl" showed that they fail randomly:

Aug 16 09:35:34 gpu-test-arnej-1 sudo[6801]: arnej_yahooinc_com : TTY=pts/0 ; PWD=/home/arnej_yahooinc_com ; USER=root ; COMMAND=/bin/nvidia-modprobe
Aug 16 09:35:34 gpu-test-arnej-1 sudo[6801]: pam_unix(sudo:session): session opened for user root by arnej_yahooinc_com(uid=0)
Aug 16 09:35:35 gpu-test-arnej-1 kernel: nvidia: module license 'NVIDIA' taints kernel.
Aug 16 09:35:35 gpu-test-arnej-1 kernel: Disabling lock debugging due to kernel taint
Aug 16 09:35:35 gpu-test-arnej-1 systemd-udevd[614]: Network interface NamePolicy= disabled on kernel command line, ignoring.
Aug 16 09:35:35 gpu-test-arnej-1 kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 240
Aug 16 09:35:35 gpu-test-arnej-1 kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  535.86.10  Wed Jul 26 23:20:03 UTC 2023
Aug 16 09:35:35 gpu-test-arnej-1 systemd-udevd[6804]: Process '/usr/bin/bash -c '/usr/bin/mknod -Z -m 666 /dev/nvidiactl c $(grep nvidia-frontend /proc/devices | cut -d \  -f 1) 255'' failed with exit code 1.
Aug 16 09:35:35 gpu-test-arnej-1 systemd-udevd[6812]: Process '/usr/bin/bash -c '/usr/bin/mknod -Z -m 666 /dev/nvidiactl c $(grep nvidia-frontend /proc/devices | cut -d \  -f 1) 255'' failed with exit code 1.
Aug 16 09:35:35 gpu-test-arnej-1 systemd-udevd[6804]: Process '/usr/bin/bash -c 'for i in $(cat /proc/driver/nvidia/gpus/*/information | grep Minor | cut -d \  -f 4); do /usr/bin/mknod -Z -m 666 /dev/nvidia${i} c $(grep nvidia-frontend /proc/devices | cut -d \  -f 1) ${i}; done'' failed with exit code 1.
Aug 16 09:35:35 gpu-test-arnej-1 kernel: nvidia-uvm: Loaded the UVM driver, major device number 238.
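
For illustration, the major-number lookup that these rules perform with grep and cut can be sketched in user-space C. This is only a sketch under assumptions: `find_major` is a hypothetical helper, not part of the driver or of this PR, and it matches the entry name exactly, whereas grep matches a substring. It returns -1 when the entry is absent. In the shell pipeline the same situation leaves the command substitution empty, so mknod is called with one argument too few. Whether that is what happened in the log above is not something the log confirms; it only reports exit code 1.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan text in the format of /proc/devices and return
 * the major number registered under `name`, or -1 if no such entry exists.
 * Header lines such as "Character devices:" do not start with a number and
 * are skipped.
 */
static int find_major(const char *devices_text, const char *name)
{
    const char *line = devices_text;

    while (*line != '\0') {
        int major;
        char entry[64];

        /* Each entry line looks like "<major> <name>". */
        if (sscanf(line, "%d %63s", &major, entry) == 2 &&
            strcmp(entry, name) == 0)
            return major;

        /* Advance to the start of the next line, if there is one. */
        const char *nl = strchr(line, '\n');
        if (nl == NULL)
            break;
        line = nl + 1;
    }
    return -1;
}
```

A caller would read /proc/devices into a buffer, pass it to `find_major` together with "nvidia-frontend", and skip node creation when the result is -1 instead of invoking mknod with incomplete arguments.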

Best practice, however, is for the device driver to trigger node creation directly by calling the device_create() kernel function. Using "mknod" in udev rules is not the usual way to solve this.

This PR calls device_create() as needed, and device_destroy() to clean up when a module or device is detached. I tested it with the udev rules disabled, using "rmmod" and "modprobe" to unload and load the modules, and also verified that it works across a reboot.

kanashimia · 2023-10-21T00:23:43Z

The original justification for not using device_create is probably that it is marked as EXPORT_SYMBOL_GPL and as such can't be used in a proprietary module. That is not a problem in the open kernel module, but it adds an additional difference between the two.

CLAassistant · 2024-06-06T06:32:26Z

All committers have signed the CLA.

Use device_create to ensure /dev nodes are created correctly.

fb89dc3
