problems with unloading nvidia module on fedora 18 with 3.9 kernel #433

Closed
gsgatlin opened this Issue Jun 3, 2013 · 33 comments

gsgatlin commented Jun 3, 2013

Hello. Here are some logs:

messages log: http://pastebin.com/yvAPqJNL

xorg log: http://pastebin.com/haJ8LbAz

When a fedora 18 box with nvidia drivers 319.23 is booted, the nvidia card is off.

However, after one use, the card stays on until the next reboot.

The logs seem to indicate that the nvidia module cannot be unloaded. I thought maybe this issue was related to #420. However, if I blacklist nvidia in grub2, it still doesn't fix the issue. (Maybe blacklisting in grub only works for nouveau and not nvidia?)

I am getting random emails from people on the internet asking me to upgrade the nvidia drivers on fedora (because of newer laptops, I guess). But this seems unwise if the card will never shut off without a reboot, so I'm not sure how I should proceed. Thanks for any ideas anyone has. I messed around with nvidia drivers all day yesterday and most problems seemed solvable (SELinux issues, etc.), but this one has me thinking it is a show stopper for doing any upgrade.

Please note this is only a problem in fedora 17/18/19. It is not showing up for me on CentOS with kernel 2.6.32-358.6.2 and 319.23; bbswitch turns off the card fine there. But it is for sure an issue with kernel 3.9.4-200 and 319.23.

Thanks so much for any ideas anyone might have.

amonakov Jun 3, 2013

Contributor

Looks like there's a process still using the card? sudo fuser -v /dev/nvidia*

gsgatlin Jun 3, 2013

Here is the output after a fresh boot. I have two terminals: one where I am root and one where I launch optirun as a regular user.

[root@localhost ~]# ls -al /dev/nvidia*
ls: cannot access /dev/nvidia*: No such file or directory
[root@localhost ~]# fuser -v /dev/nvidia*
Specified filename /dev/nvidia* does not exist.

(now user gsgatlin launches optirun -b primus glxgears and leaves it running)

[root@localhost ~]# fuser -v /dev/nvidia*
                     USER        PID ACCESS COMMAND
/dev/nvidia0:        root       2102 F...m Xorg
                     gsgatlin   2106 F...m glxgears
/dev/nvidiactl:      root       2102 F.... Xorg
                     gsgatlin   2106 F.... glxgears

(now gsgatlin closes the glxgears window and optirun exits.)

[root@localhost ~]# fuser -v /dev/nvidia*
[root@localhost ~]# ls -al /dev/nvidia*
crw-rw-rw-. 1 root root 195, 0 Jun 3 17:26 /dev/nvidia0
crw-rw-rw-. 1 root root 195, 255 Jun 3 17:26 /dev/nvidiactl

[root@localhost ~]# cat /proc/acpi/bbswitch
0000:01:00.0 ON

It seems there is no output from fuser -v /dev/nvidia* the third time after optirun has been launched and exited.
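
For anyone repeating this check in a script, the state reported by bbswitch can be read directly. This is only a minimal sketch, assuming the single-line "PCI address, then ON or OFF" format shown above; the function name bbswitch_state and its optional path argument are made up for illustration.

```shell
#!/bin/sh
# Print the power state (ON or OFF) reported by bbswitch.
# The optional argument lets the function read a saved copy of the file;
# it defaults to the real interface shown in the session above.
bbswitch_state() {
    file=${1:-/proc/acpi/bbswitch}
    # The file holds one line such as "0000:01:00.0 ON"; the state is field 2.
    awk 'NR == 1 { print $2 }' "$file"
}
```

On a working setup, calling bbswitch_state right after optirun exits should print OFF; in the session above it would print ON.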

hadrons123 Jun 4, 2013

Is this issue in any way related to this one?
Bumblebee-Project/bbswitch#46

Lekensteyn Jun 4, 2013

Owner

@hadrons123 Unless Fedora ships with an old version of bbswitch, no.

The current recommended version for Linux 3.9 is bbswitch 0.7. Version 0.5 probably does not work, and 0.6 shows a warning that some people mistake for an Oops or panic.

@gsgatlin For some reason Xorg segfaults (signal 11), bumblebeed is unable to rmmod the nvidia driver, and as a result bbswitch is not used to disable the card.

Jun  3 16:26:27 localhost bumblebeed[627]: [  118.543106] [INFO]Stopping X server
Jun  3 16:26:27 localhost bumblebeed[627]: [  118.655549] [DEBUG]Process with PID 1993 terminated with 11
Jun  3 16:26:27 localhost bumblebeed[627]: [  118.655966] [INFO]Unloading nvidia driver
Jun  3 16:26:27 localhost bumblebeed[627]: [  118.656255] [DEBUG]Process rmmod started, PID 2002.
Jun  3 16:26:27 localhost bumblebeed[627]: rmmod: ERROR: Module nvidia is in use
Jun  3 16:26:27 localhost bumblebeed[627]: [  118.661445] [DEBUG]Process with PID 2002 returned code 1
Jun  3 16:26:27 localhost abrt[2001]: Can't read '/proc/1993/status': No such file or directory
Jun  3 16:26:30 localhost bumblebeed[627]: [  121.673658] [ERROR]Unloading nvidia driver timed out.
Jun  3 16:26:30 localhost bumblebeed[627]: [  121.673774] [DEBUG]Drivers are still loaded, unable to disable card
Jun  3 16:26:30 localhost bumblebeed[627]: [  121.673814] [DEBUG][XORG] Server terminated successfully (0). Closing log file.
Jun  3 16:26:30 localhost bumblebeed[627]: [  121.673835] [ERROR][XORG] (EE)
Jun  3 16:26:30 localhost bumblebeed[627]: [  121.673846] [ERROR][XORG] (EE) Backtrace:
Jun  3 16:26:30 localhost bumblebeed[627]: [  121.673855] [ERROR][XORG] (EE) 0: Xorg (OsLookupColor+0x139) [0x472509]
Jun  3 16:26:30 localhost bumblebeed[627]: [  121.673865] [ERROR][XORG] (EE) 1: /usr/lib64/libpthread.so.0 (__restore_rt+0x0) [0x3478c0efff]
Jun  3 16:26:30 localhost bumblebeed[627]: [  121.673875] [ERROR][XORG] (EE) 2: /usr/lib64/libc.so.6 (malloc_usable_size+0x15) [0x3478480375]
Jun  3 16:26:30 localhost bumblebeed[627]: [  121.673886] [ERROR][XORG] (EE) 3: /usr/lib64/nvidia-bumblebee/libGL.so.1 (glXCreateNewContext+0x31a64) [0x7ff9982846f8]
Jun  3 16:26:30 localhost bumblebeed[627]: [  121.673896] [ERROR][XORG] (EE)
Jun  3 16:26:30 localhost bumblebeed[627]: [  121.673904] [ERROR][XORG] (EE) Segmentation fault at address 0x0
Jun  3 16:26:30 localhost bumblebeed[627]: [  121.673912] [DEBUG][XORG] Fatal server error:
Jun  3 16:26:30 localhost bumblebeed[627]: [  121.673919] [DEBUG][XORG] Caught signal 11 (Segmentation fault). Server aborting
Jun  3 16:26:30 localhost bumblebeed[627]: [  121.673928] [ERROR][XORG] (EE)
Jun  3 16:26:30 localhost bumblebeed[627]: [  121.673942] [DEBUG][XORG] Please consult the Fedora Project support
Jun  3 16:26:30 localhost bumblebeed[627]: [  121.673952] [DEBUG][XORG] #011 at http://wiki.x.org
Jun  3 16:26:30 localhost bumblebeed[627]: [  121.673963] [DEBUG][XORG]  for help.
Jun  3 16:26:30 localhost bumblebeed[627]: [  121.673971] [ERROR][XORG] (EE) Please also check the log file at "/var/log/Xorg.8.log" for additional information.
Jun  3 16:26:30 localhost bumblebeed[627]: [  121.673980] [ERROR][XORG] (EE)
Jun  3 16:26:33 localhost dbus-daemon[661]: dbus[661]: [system] Activating service name='net.reactivated.Fprint' (using servicehelper)
Jun  3 16:26:33 localhost dbus[661]: [system] Activating service name='net.reactivated.Fprint' (using servicehelper)
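
The two lines that matter in a log like this are the rmmod failure and the unload timeout. Here is a rough sketch for pulling them out of a saved log, assuming the message texts are exactly as shown above; the function name unload_failures is hypothetical.

```shell
#!/bin/sh
# Print the bumblebeed log lines that show the nvidia module failing to unload.
# $1 is a log file, for example /var/log/messages or a saved copy of it.
unload_failures() {
    grep -E 'rmmod: ERROR: Module nvidia is in use|Unloading nvidia driver timed out' "$1"
}
```

If this prints nothing for a run where the card stayed on, the unload itself probably succeeded and the problem is elsewhere.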

gsgatlin Jun 4, 2013

I upgraded bbswitch to version 0.7. It did not help. But it did not hurt either. So I am going to push that out right now in my yum repo. :)

As for the issue with X crashing, when I downgrade my video drivers back to version 310.32, bbswitch can turn the card off again. (X stops crashing)

I wonder if there is a newer nvidia driver I could try that is not version 319.23. I will experiment some this afternoon to see if I can find a newish driver that does not crash X. Perhaps 319.17 will work better...

Thanks so much for all your help. I will update this issue with whatever I find out.

gsgatlin Jun 4, 2013

Hello. Still crashes and leaves the nvidia card always on after just one use with 319.17...

messages log:
http://pastebin.com/wJCcKy60

xorg log:
http://pastebin.com/1kpKHdaw

Could the problem be with my xorg.conf.nvidia file? It's the only other thing I can think of.

Here is a copy of my xorg.conf.nvidia file:

http://pastebin.com/fsw0PMYe

As I mentioned, this worked fine with 310.32 but not with 319.17 or 319.23. Do you think I should file a bug report with nvidia? Do they have some kind of bug report web app? Or maybe that would be hopeless, since they (nvidia) don't support optimus technology on Linux...

Thanks for any ideas anyone has about this problem.

gsgatlin Jun 4, 2013

Since this works with RHEL 6 and not with fedora, it must be some kind of fedora issue... Like maybe X is too new or something. I will try some newer and older versions to get a better handle on what is going on. Cheers.

gsgatlin Jun 6, 2013

Ok. I discovered a bit more information.

It turns out that X crashing is only a problem on fedora 18 and the fedora 19 beta. It is not an issue on fedora 17 or RHEL 6. And it's only an issue for versions of the closed source driver newer than version 313.30. So versions 319.12, 319.17, and 319.27 all do not work well with newer versions of fedora. (They work, but a reboot is required to turn off your card after a single use...)

I was not sure if 313.30 is a "long lived version" so I did not make rpms for that. But it remains a possibility to create that version of a package.

So what I have decided to do based on this information is to update the bumblebee-nvidia driver package to version 310.51 on fedora 19 and fedora 18. On fedora 17 and RHEL 6 I have updated the driver rpm to version 319.23, since there are no problems on those distro versions.

I have created 319.23 rpms for fedora 18 and 19. But I did not put them in my repo yet. If any fedora folks would like to help troubleshoot this issue, I have made copies of the packages at:

http://install.linux.ncsu.edu/pub/yum/itecs/public/bumblebee-nonfree/issue433/

I am guessing it has something to do with the version of X in fedora 18/19.

I see there is a place to sort of report bugs at:

https://devtalk.nvidia.com/default/board/98/linux/1/

However, I am unsure how receptive they would be to any bumblebee issues like this one. Alternatively, it could be an X windows bug, but we need to narrow down exactly which version broke things before a bugzilla could be created...

Just wanted to let folks know what I have discovered about the issue so far. Cheers,

gsgatlin Jul 16, 2013

Ok. I made a post at:

https://devtalk.nvidia.com/default/topic/556881/linux/bumblebee-issue-in-fedora-18-19-with-nvidia-drivers-newer-than-310-51-and-xorg-x11-server-xorg-1-14-/

I guess we'll just have to wait and see if they are willing to try to fix it. At least I tried...

MichaelAquilina Jul 16, 2013

Every post counts, gsgatlin! The more nvidia sees that optimus needs linux support, the more likely they'll update their drivers for us :)

tcitworld Jul 24, 2013

Version 319.32 has something in changelog that might interest you and might work. Sorry I can't test right now (away from internet).

vixus0 Aug 6, 2013

This isn't just a Fedora issue, I'm having it on my Arch system too.

bbswitch version: 0.7
bumblebee version: 3.2.1-3
nvidia version: 319.32-4

aaronp24 Aug 26, 2013

From your log:

 [   105.190] (II) LoadModule: "glamoregl"
 [   105.191] (II) Loading /usr/lib64/xorg/modules/libglamoregl.so

Does the problem go away if you disable loading this module?

gsgatlin Aug 27, 2013

Hello. Thanks so much for looking at that. Unfortunately I don't think it fully worked. There is still something keeping it from unloading from the kernel.

I tried downloading the latest "long lived branch" 319.49.

I added

Section "Module"
Disable "libglamoregl"
EndSection

to the xorg.conf.nvidia file and in the logs it now says

[ 128.186] "libglamoregl" will not be loaded unless you've specified it to be loaded elsewhere.

It doesn't crash/segfault now (Yay!) but I get this error message in /var/log/messages:

Aug 27 11:18:50 y470c bumblebeed[566]: rmmod: ERROR: Module nvidia is in use
Aug 27 11:18:53 y470c bumblebeed[566]: [ 146.805464] [ERROR]Unloading nvidia driver timed out.

Trying to rmmod the module by hand also fails with the same error message: "ERROR: Module nvidia is in use".

fuser -v /dev/nvidia*

doesn't show any output.
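
When fuser shows nothing but rmmod still refuses, the kernel's reference count for the module can at least confirm that something is holding it. This is a minimal sketch, assuming the kernel was built with module unloading so that /sys/module/nvidia/refcnt exists; module_refcnt and its second argument are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Print the kernel's reference count for a loaded module.
# $1: module name; $2: sysfs module directory (default /sys/module).
module_refcnt() {
    refcnt_file="${2:-/sys/module}/$1/refcnt"
    if [ -r "$refcnt_file" ]; then
        cat "$refcnt_file"
    else
        echo "module $1 is not loaded (or has no refcnt file)" >&2
        return 1
    fi
}
```

If module_refcnt nvidia prints a value above 0 while fuser lists no process, the holder is probably not going through /dev/nvidia* at all. The same count is shown in the "Used by" column of lsmod.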

I uploaded new logs to:

http://people.engr.ncsu.edu/gsgatlin/issue433/1/nvidia-bug-report.log.gz

http://people.engr.ncsu.edu/gsgatlin/issue433/1/Xorg.8.log

http://people.engr.ncsu.edu/gsgatlin/issue433/1/messages

Please let me know if there is anything else I should try. Thanks again for taking the time to look at this problem.

neupsh Sep 15, 2013

I have the same issue with opensuse 12.3 64 bit, nvidia 325.15, bumblebee 3.2.1, bbswitch 0.7. The card is off initially, but after it is used once, it never turns off. I mostly suspend my laptop and rarely restart it, but after a suspend I cannot use the nvidia card again unless the machine is restarted.

ex0hunt Sep 19, 2013

Same issue on Gentoo (x86_64) 13.0 with bumblebee 3.2.1, nvidia 325.15, bbswitch 0.7. The card turns off only after a forced unload: rmmod -f nvidia. fuser -v /dev/nvidia* doesn't show any output.

gsgatlin Oct 1, 2013

Hello. It seems the nvidia guys need more people who are experiencing this bug to run nvidia-bug-report.sh as root and upload the results to the devtalk forum at https://devtalk.nvidia.com/default/topic/556881/linux/bumblebee-issue-in-fedora-18-19-with-nvidia-drivers-newer-than-310-51-and-xorg-x11-server-xorg-1-14-/

This may help them reproduce the issue in-house for investigation.

hadrons123 Oct 27, 2013

I still have this issue in Fedora 19. Arch Linux works perfectly with nvidia 325.

gsgatlin Oct 27, 2013

Nvidia has acknowledged this issue. It is marked in their internal bug tracker as bug 1380042. So maybe a future nvidia driver will fix the issue in fedora, or perhaps their engineers will suggest an addition to the /etc/bumblebee/xorg.conf.nvidia file that will fix it. I have not had a chance to test the fedora 20 alpha yet.

moondrakegit Nov 2, 2013

I doubt this has much to do with nvidia. I traced the kernel calls of the nvidia module, and the lock on the module is actually held by the Xorg that is using the intel driver.

What happens is that udev notices there is a new dri device (usually named card1), and the running X instance picks this up:
[ 68287.919] removing GPU device /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1 52328608
[ 68287.919] config/udev: Adding drm device (/dev/dri/card1)
[ 68287.919] config/udev: Adding drm device (/dev/dri/card1)
[ 68287.920] LoadModule: "modesetting"
[ 68287.920] Loading /usr/lib64/xorg/modules/drivers/modesetting_drv.so
[ 68287.920] Module modesetting: vendor="X.Org Foundation"
[ 68287.920] compiled for 1.13.0, module version = 0.6.0
[ 68287.920] Module class: X.Org Video Driver
[ 68287.920] ABI class: X.Org Video Driver, version 13.0
[ 68287.920] modesetting(G0): using drv /dev/dri/card1
[ 68287.920] modesetting(G0): Depth 24, (--) framebuffer bpp 32
[ 68287.920] modesetting(G0): RGB weight 888
[ 68287.920] modesetting(G0): Default visual is TrueColor
[ 68287.920] modesetting(G0): ShadowFB: preferred NO, enabled NO
[ 68287.920] modesetting(G0): KMS doesn't support dumb interface
[ 68287.920] modesetting(G0): KMS setup failed
[ 68287.920] hotplugged device 0 didn't configure
[ 68287.920] UnloadModule: "modesetting"
[ 68287.920] xf86: found device 1

After bumblebee finishes, lsof /dev/dri/card1 still shows that the original X instance (not the nvidia one) is holding this file open. (Funny that we all missed checking the obvious dri device...)
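
The same check can be done without lsof by walking /proc. This is a rough sketch and not what was actually run here; the function name holders is made up, and it needs root to see the file descriptors of other users' processes.

```shell
#!/bin/sh
# List the PIDs that hold a given path open, by walking /proc/<pid>/fd.
# $1: absolute path of the file or device node, e.g. /dev/dri/card1.
holders() {
    target=$1
    for fd in /proc/[0-9]*/fd/*; do
        # readlink fails for descriptors we are not allowed to inspect.
        link=$(readlink "$fd" 2>/dev/null) || continue
        if [ "$link" = "$target" ]; then
            pid=${fd#/proc/}
            echo "${pid%%/*}"
        fi
    done | sort -un
}
```

Running holders /dev/dri/card1 as root after optirun has exited should, on an affected system, print the PID of the primary X server.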

I could not find a way to prevent Xorg from automatically opening devices (which seems like it would be a good config option to have).

Instead, I tried playing with permissions to keep the Xorg instance that uses the intel driver from accessing the device, but they seem to be ignored (which is bad if it is really so, but that is another issue; I am not a udev expert anyway).

This seems to work fine though and solves the issue for me:
SUBSYSTEM=="drm", KERNEL=="card1", RUN+="/bin/rm /dev/dri/card1"

So I guess the dri node is not really needed by optimus anyway.

Note that a generic solution probably needs better attribute matching, to be sure that card1 really is the nvidia card.
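
One way to check which card node belongs to the nvidia GPU, before relying on the card1 name, is to read the PCI vendor ID from sysfs. This is a sketch, assuming the usual /sys/class/drm/cardN/device/vendor layout for PCI devices and NVIDIA's PCI vendor ID 0x10de; nvidia_drm_cards and its argument are hypothetical names.

```shell
#!/bin/sh
# Print the names of DRM card nodes whose PCI vendor ID is NVIDIA's (0x10de).
# $1: sysfs drm class directory (default /sys/class/drm).
nvidia_drm_cards() {
    drm_dir=${1:-/sys/class/drm}
    for card_path in "$drm_dir"/card[0-9]*; do
        card=${card_path##*/}
        # Skip connector entries such as card0-LVDS-1.
        case $card in *-*) continue ;; esac
        vendor_file=$card_path/device/vendor
        [ -r "$vendor_file" ] || continue
        if [ "$(cat "$vendor_file")" = "0x10de" ]; then
            echo "$card"
        fi
    done
}
```

A more specific udev match would likely key on the same vendor attribute instead of the kernel name, but that part is untested here.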

amonakov Nov 2, 2013

Contributor

> I could not find a way to prevent Xorg to auto open devices (which seems like it would be a good config option to have).

"AutoAddGPU" "false"
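
For context, AutoAddGPU is an xorg.conf ServerFlags option. Below is a sketch of a script that drops it into a config snippet for the primary (intel) X server; the file name 10-no-auto-gpu.conf and the function name are made up, and the default directory assumes a distro whose X server reads /etc/X11/xorg.conf.d.

```shell
#!/bin/sh
# Write an xorg.conf.d snippet that stops the X server from automatically
# adding GPU devices reported by udev.
# $1: target directory (default /etc/X11/xorg.conf.d); needs root there.
write_no_auto_gpu() {
    conf_dir=${1:-/etc/X11/xorg.conf.d}
    cat > "$conf_dir/10-no-auto-gpu.conf" <<'EOF'
Section "ServerFlags"
    Option "AutoAddGPU" "false"
EndSection
EOF
}
```

The primary X server has to be restarted before the option takes effect.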

amonakov Nov 2, 2013

Contributor

Here I'm running Xorg 1.14.1 on Gentoo, and while the primary X server also tries to open the device, it doesn't seem to keep the drm node open (it doesn't prevent the module from unloading). However, my situation might be different, as I don't have the modesetting driver installed. So far the issue has been specific to some of the rpm-world distros. Are they carrying a patch that causes the server to keep the device node open even though loading the (modesetting) driver failed?

moondrakegit Nov 2, 2013

Good suggestion: I think it is indeed the modesetting driver (removing it prevents the issue). A quick glance at the code suggests that it forgets to check for and close the device on failure.

If I get some more time this week, I will try to patch and test it.

iserlohn Nov 3, 2013

A new udev rule in /etc/udev/rules.d/ as moondrakegit suggested fixes the problem for me.

aaronp24 Nov 4, 2013

Thanks for diagnosing this, moondrakegit! I can't believe it didn't occur to me either that open instances of /dev/dri/card* could be the problem. It does look like the xf86-video-modesetting driver is failing like it's supposed to ("KMS doesn't support dumb interface"), but the code to close the file descriptor clearly isn't working. I'll try to reproduce the problem and see if I can get it fixed upstream.
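
To see which processes still hold a node such as /dev/dri/card1 open, something like the following sketch may help. It scans the /proc/<pid>/fd symlinks; the function name and the optional proc-root argument are invented for illustration. `sudo fuser -v /dev/dri/card*` should give similar information.

```shell
#!/bin/sh
# Hypothetical helper: list the PIDs that hold a given device node open.
# $1 is the device path; $2 optionally overrides the proc root (testing only).
# Run as root to see file descriptors of other users' processes.
pids_holding() {
    target="$1"
    proc_root="${2:-/proc}"
    for fd in "$proc_root"/[0-9]*/fd/*; do
        # Skip the unexpanded glob and anything that is not a symlink.
        [ -L "$fd" ] || continue
        if [ "$(readlink "$fd")" = "$target" ]; then
            # .../<pid>/fd/<n> -> <pid>
            basename "$(dirname "$(dirname "$fd")")"
        fi
    done | sort -un
}

# Example: pids_holding /dev/dri/card1
```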

moondrakegit Nov 9, 2013

I think the problem is very likely fixed in newer versions of the modesetting driver, since the following excerpt from FreeRec(ScrnInfoPtr pScrn) should do the trick:

    if (ms->fd > 0) {
        int ret;

        if (ms->pEnt->location.type == BUS_PCI)
            ret = drmClose(ms->fd);
        else
            ret = close(ms->fd);

I patched my own 0.6.0 modesetting driver with this and it worked (I only tried this small patch and did not test the latest modesetting driver from git). This code has been in the driver since modesetting v0.7.0.

People with earlier versions could just use the udev rule I posted above.
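
A quick way to decide whether the udev workaround is still needed is to compare the installed driver version against 0.7.0. This is only a sketch: it assumes GNU sort (for the -V option), and the function name is made up. On Fedora the installed version could presumably be read with `rpm -q --qf '%{VERSION}' xorg-x11-drv-modesetting`.

```shell
#!/bin/sh
# Return success if version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example with the version numbers mentioned in this thread:
# version_ge "0.6.0" "0.7.0" || echo "old driver: use the udev rule"
```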

gsgatlin Nov 9, 2013

Just to be clear: in order to test this, I must patch xorg-x11-drv-modesetting-0.6.0-7.fc19.src.rpm in, say, Fedora 19?

If I can verify that this works, I will file a bug in Bugzilla and supply a patch in the report. If the maintainer doesn't want to fix it for some odd reason, I would consider adding a udev rule as a workaround...

hadrons123 Nov 9, 2013

Yes, it works. For anyone wanting a patched xorg-x11-drv-modesetting:
https://dl.dropboxusercontent.com/u/106654446/xorg-x11-drv-modesetting-0.8.0-6.fc19.x86_64.rpm

gsgatlin Nov 11, 2013

Ok. Created new bugzilla as bug 1028845.

Link to bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1028845

gsgatlin Nov 26, 2013

It looks like this was fixed in Fedora 19 very recently by upgrading the modesetting driver to version 0.8.0. The update does not appear to be in testing for Fedora 18, so if you still need to use that version of Fedora you may want to look at the package I made here:

http://install.linux.ncsu.edu/pub/yum/itecs/public/bumblebee-nonfree/issue433/

Thanks so much for figuring this out, @moondrakegit.

gsgatlin closed this on Nov 26, 2013

QkiZMR Mar 22, 2016

I have the same issue.

  • Kernel 4.2.0-35-lowlatency,
  • Ubuntu 15.10,
  • Nvidia 840M and Intel,
  • Nvidia 361.28 driver,
  • Bumblebee 3.2.1-9,
  • bbswitch-dkms 0.7-2ubuntu1.

After system boot the nvidia graphics card is still active:

$ cat /proc/acpi/bbswitch 
0000:01:00.0 ON

With some console magic I can turn off the nvidia card:

# rmmod nvidia_modeset
# rmmod nvidia_uvm (sometimes not necessary)
# rmmod nvidia
# tee /proc/acpi/bbswitch <<< OFF

and the nvidia card is turned off. My power button has an LED that shows which graphics card is in use: blue for Intel, orange for Nvidia. Normally the blue LED is lit once the system finishes booting. Now it is always orange because the nvidia card is still active. After I disable the card by hand from the command line, the LED turns blue.
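
The manual steps above could be wrapped roughly like this. It is a sketch, not something shipped with Bumblebee or bbswitch: the function name is invented, and it only knows about the three module names listed in this comment. It reads /proc/modules-style text from stdin and prints the loaded nvidia modules in an order that is safe to unload (dependants first, the core nvidia module last).

```shell
#!/bin/sh
# Hypothetical helper: print the loaded NVIDIA modules in unload order.
# Reads /proc/modules-style text from stdin so it can be tried offline.
nvidia_unload_order() {
    # The first column of /proc/modules is the module name.
    loaded="$(awk '{print $1}')"
    for mod in nvidia_modeset nvidia_uvm nvidia; do
        if printf '%s\n' "$loaded" | grep -qx "$mod"; then
            echo "$mod"
        fi
    done
}

# Usage as root (mirrors the manual steps above):
# for m in $(nvidia_unload_order < /proc/modules); do rmmod "$m" || exit 1; done
# echo OFF | tee /proc/acpi/bbswitch
```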

ArchangeGabriel May 6, 2016

Owner

@QkiZMR What you’re seeing is #719.
