ipmi_msghandler doesnt let the unload nvidia driver. #173

Closed
onsc opened this issue Jun 30, 2018 · 11 comments

@onsc

onsc commented Jun 30, 2018

Hi.
My laptop has an NVIDIA 950M, so it uses Optimus technology.
As the title says, ipmi_msghandler prevents the nvidia module from unloading. I think many people suffer from this.

lsmod
nvidia              14045184  12
ipmi_msghandler        57344  1 nvidia
....
sudo rmmod nvidia
rmmod: ERROR: Module nvidia is in use
sudo rmmod ipmi_msghandler 
rmmod: ERROR: Module ipmi_msghandler is in use by: nvidia

I tried to blacklist ipmi_msghandler, but it didn't work.
My blacklist.conf:

install nouveau /usr/bin/false
#install nvidia /usr/bin/false
#blacklist nvidia
#blacklist nouveau
#remove nvidia modprobe -r --ignore-remove nvidia-modeset nvidia-uvm nvidia
install ipmi_si /usr/bin/false
install ipmi_devintf /usr/bin/false
install ipmi_msghandler /usr/bin/false

I ran mkinitcpio -p linux after every change to blacklist.conf and then rebooted.
The remove line didn't work, so I commented it out. I removed nvidia with pacman, and then no ipmi modules were loaded. But after installing the nvidia driver again, ipmi_msghandler loads. I also installed bumblebee, so all nvidia drivers are blacklisted. Output of modprobe -c:

blacklist nvidia
blacklist nvidia_drm
blacklist nvidia_modeset
blacklist nvidia_uvm
blacklist nouveau
blacklist nouveau
install nouveau /usr/bin/false
install ipmi_si /usr/bin/false
install ipmi_devintf /usr/bin/false
install ipmi_msghandler /usr/bin/false
...

I use Arch Linux.
kernel 4.17.3-1
nvidia 396.24-13

I also tried with nvidia-dkms; nothing changed.

I also blacklisted nvidia with an install line, so neither the nvidia module nor the ipmi modules were loaded. When I then tried to modprobe nvidia, it was denied because of blacklist.conf. Without restarting my laptop, I commented out the nvidia line in blacklist.conf. Then I tried bbswitch, and it works. If I force-remove the nvidia driver, bbswitch can switch to OFF, but it cannot switch back to ON: the laptop freezes.

I tried to detect whether my laptop has IPMI. There is no BIOS entry and there are no /dev/ipmi* devices. I also checked with the freeipmi tools (ipmi-detect, ipmi-ping), dmesg, etc. All IPMI tools are now uninstalled, and there is no systemd service related to IPMI.

Thank you. (Sorry about my English.)
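
For readers comparing the two kinds of lines in the modprobe -c output above: per modprobe.d(5), a blacklist line only makes modprobe ignore the module's aliases (explicit or dependency loads still work), while an install line makes modprobe run the given command instead of inserting the module. A minimal sketch that summarizes which rule applies to which module follows. The helper name classify_modprobe_rules is hypothetical, and the demo input is a handful of lines copied from the output above.

```shell
#!/bin/sh
# Sketch: summarize modprobe configuration lines read from stdin
# (the format printed by `modprobe -c`).
# classify_modprobe_rules is a hypothetical helper, not a standard tool.
classify_modprobe_rules() {
    awk '
        # blacklist <module>: aliases are ignored, explicit loads still work
        $1 == "blacklist" { print $2 ": blacklisted (alias autoload ignored)" }
        # install <module> <command>: the command runs instead of the load;
        # only the "false" commands used in this thread are flagged here
        $1 == "install" && ($3 == "/usr/bin/false" || $3 == "/bin/false") {
            print $2 ": install override (modprobe load fails)"
        }
    ' | LC_ALL=C sort -u
}

# Demo on lines copied from the modprobe -c output above.
classify_modprobe_rules <<'EOF'
blacklist nvidia
blacklist nouveau
blacklist nouveau
install nouveau /usr/bin/false
install ipmi_msghandler /usr/bin/false
EOF
```

On a live system the same function could be fed with modprobe -c | classify_modprobe_rules.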

@randombk

I have the exact same issue and symptoms, though I'm unsure if IPMI is the issue (it looks like IPMI is a dependency of Nvidia, not the other way around so I don't think that is the reason why the module can't be unloaded). Switching to nouveau seems to be the only viable workaround for VFIO users.
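
The direction of the dependency can be read straight from the lsmod columns (module, size, use count, modules holding it). Below is a small sketch that prints the use count and holders for one module. The helper name module_usage is hypothetical, and the demo input is the two lsmod lines from the original report.

```shell
#!/bin/sh
# Sketch: show use count and holding modules for one module,
# reading lsmod-formatted text from stdin.
# module_usage is a hypothetical helper name.
module_usage() {
    awk -v mod="$1" '
        $1 == mod {
            # the 4th column lists modules that depend on this one, if any
            holders = ($4 == "" ? "none" : $4)
            printf "%s: refcount=%s, held by modules: %s\n", $1, $3, holders
        }
    '
}

# Demo with the lines posted in the original report.
sample='nvidia              14045184  12
ipmi_msghandler        57344  1 nvidia'

printf '%s\n' "$sample" | module_usage nvidia
printf '%s\n' "$sample" | module_usage ipmi_msghandler
```

In that sample, ipmi_msghandler is held by nvidia, while nvidia has a use count of 12 with no holding modules listed. That suggests the references on nvidia come from somewhere other than a kernel module, for example processes with the device files open.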

@gsgxnet

gsgxnet commented Oct 28, 2018

Same here. I need the NVIDIA GPU for CUDA only, so normally there is no need for bbswitch etc. My working setup in the past was an /etc/modprobe.d/50-nvidia-az.conf file blacklisting all nvidia drivers:

blacklist nvidia-nvlink
blacklist nvidia-modeset
blacklist nvidia-uvm
blacklist nvidia-drm
blacklist nvidia

When the GPU is needed for CUDA, I just manually modprobe the modules. Command used for that: nvidia-modprobe -c 0

Now, after installing the 410 drivers, this setup does not work any more. All drivers are loaded at boot, despite the blacklist. Same dependency on
ipmi_msghandler 65536 2 ipmi_devintf,nvidia.
So modprobe -r does not succeed for nvidia either.

Does anybody have a clue?

@mysticaltech

Same problem here! Anyone?

@mysticaltech

@gsgxnet Got it. List all processes using nvidia and kill them.

lsof | grep /dev/nvidia

Now kill all the processes you see using nvidia. They are chained, so killing just the 3 or so parent processes will work.

kill 1234

Then:

modprobe -f -r nvidia_drm
modprobe -f -r nvidia_modeset
modprobe -f -r nvidia

This will successfully unload nvidia, so the installation can proceed. If there are other errors, checking the installer log file is useful. In my case it somehow detected that X was running, so I also had to kill the process mentioned in the log and remove /tmp/.X1-lock, but I think this may be particular to my machine.
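
The steps above can be sketched as two small helpers that only print what they would act on, so nothing is killed or unloaded by accident. Both helper names are hypothetical, and the lsof lines in the demo are invented sample data. The plan also includes nvidia_uvm (it appears in the blacklists earlier in this thread) and omits -f, which is a deliberate deviation from the commands above because forced removal is risky.

```shell
#!/bin/sh
# Sketch: dry-run helpers for the "kill users, then unload" procedure.
# nvidia_pids and print_unload_plan are hypothetical helper names.

# Extract the unique PIDs of processes that have /dev/nvidia* open,
# reading `lsof` output from stdin (PID is column 2, NAME is the last column).
nvidia_pids() {
    awk '$NF ~ /^\/dev\/nvidia/ { print $2 }' | sort -un
}

# Print (without running) the unload commands in dependency order:
# the modules that depend on nvidia go first, nvidia itself goes last.
print_unload_plan() {
    for m in nvidia_drm nvidia_modeset nvidia_uvm nvidia; do
        echo "modprobe -r $m"
    done
}

# Demo with invented lsof lines.
nvidia_pids <<'EOF'
Xorg       1234 root  12u  CHR  195,0    0t0  500 /dev/nvidia0
Xorg       1234 root  13u  CHR  195,255  0t0  501 /dev/nvidiactl
gnome-she  2345 user  10u  CHR  195,0    0t0  500 /dev/nvidia0
EOF
print_unload_plan
```

On a real machine the first helper would be fed with lsof | nvidia_pids, and the printed modprobe commands would be run as root only once that list is empty.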

@onsc
Author

onsc commented Nov 20, 2018

I haven't tried killing the processes before switching. Can you switch graphics now?

@mysticaltech

Sadly, I don't know. I'm just using this technique to update the Nvidia driver; I'm not actually using bumblebee.

@randombk

Unfortunately, killing all processes won't help with more advanced use cases (VFIO is the one I care most about). Both X and Wayland hold references to the Nvidia driver and must be killed before unloading the driver, effectively killing hotplugging functionality.

@abditag2

Any progress on this one? I have the same problem.

@onsc
Author

onsc commented May 23, 2019

In the end, I managed to get it running.

First of all, I followed the instructions at this site: https://antergos.com/wiki/hardware/bumblebee-for-nvidia-optimus/

There were some instructions about a kernel parameter in Bumblebee-Project/Bumblebee#764 (comment)

So I added this: acpi_osi=! acpi_osi="Windows 2009"

My laptop is an "MSI GL62 6QD",

using nvidia driver 430 and kernel 5.1.

I have done many things, but IMHO what was missing is the acpi_osi=! parameter.

I hope this helps some people.
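
To confirm that the parameter actually reached the running kernel after a reboot, the kernel command line can be checked. A minimal sketch, assuming the parameter shows up as its own space-separated token; the helper name has_kernel_param is hypothetical and the demo command line is invented.

```shell
#!/bin/sh
# Sketch: test whether an exact token is present in a kernel command line.
# has_kernel_param is a hypothetical helper name.
#   $1: the command line text, $2: the token to look for
has_kernel_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;   # token found as a whole word
        *)        return 1 ;;
    esac
}

# Demo with an invented command line.
cmdline='root=/dev/sda2 rw quiet acpi_osi=!'
if has_kernel_param "$cmdline" 'acpi_osi=!'; then
    echo "acpi_osi=! is set"
else
    echo "acpi_osi=! is missing"
fi
```

On a real system the first argument would be "$(cat /proc/cmdline)". How the quoted acpi_osi="Windows 2009" part appears there can vary with the bootloader, so this sketch only checks the acpi_osi=! token.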

@colapsnux

colapsnux commented Jul 26, 2019

Try adding the following line between the last install nvidia.... line and the remove nvidia.... line in the /etc/modprobe.d/nvidia.conf file:

install nvidia /bin/false

then

sudo update-initramfs -u

It should take effect after a reboot.

It's a little hack, but it works for me!

@onsc onsc closed this as completed Jul 26, 2019
@EricTheMagician

I found this issue on google.
Posting here for posterity.

I had the nvidia-persistenced service running.
I stopped it with sudo systemctl stop nvidia-persistenced.service
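
As far as I understand, nvidia-persistenced keeps the driver state loaded by holding the device open, which would explain why it counts as a user of the module. A small sketch that stops the service only when it is active follows; the function name stop_persistenced is hypothetical, and it assumes a systemd system with sudo available.

```shell
#!/bin/sh
# Sketch: stop nvidia-persistenced only if systemd reports it as active.
# stop_persistenced is a hypothetical helper name.
stop_persistenced() {
    if systemctl is-active --quiet nvidia-persistenced.service; then
        sudo systemctl stop nvidia-persistenced.service
    else
        echo "nvidia-persistenced is not active"
    fi
}
```

Calling stop_persistenced before modprobe -r nvidia would cover this case. The service starts again on the next boot unless it is also disabled.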
