Black X11 Screen and partial lockup when upgraded to 515.76 on RTX3060 #380

Closed
chripell opened this issue Sep 27, 2022 · 26 comments

@chripell

NVIDIA Open GPU Kernel Modules Version

515.76

Does this happen with the proprietary driver (of the same version) as well?

Yes

Operating System and Version

Description: Arch Linux

Kernel Release

Linux eren 5.19.10-arch1-1 #1 SMP PREEMPT_DYNAMIC Tue, 20 Sep 2022 15:17:59 +0000 x86_64 GNU/Linux

Hardware: GPU

GPU 0: NVIDIA GeForce RTX 3060 (UUID: GPU-e54c64bb-4893-74a7-ab52-e7131045d25a)
GPU 1: NVIDIA GeForce RTX 3060 (UUID: GPU-5f6a0370-1efa-8e2c-1ca0-4a8f0e558957)

Describe the bug

Black X11 Screen and partial lockup when upgraded to 515.76 and dual RTX3060

After upgrading to 515.76 on my system (AMD CPU, ASUS motherboard, 2x RTX3060; see nvidia-bug-report.log.gz for the detailed configuration), I get a blank screen when I run startx. I can log in remotely and generate an nvidia-bug-report (although it takes a long time to finish), but reboot hangs with the last message “kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000c67d:0:0:1119”, so I suspect a kernel-level problem.
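To check whether a machine hits the same error, the kernel log can be filtered for nvidia-modeset error lines. The following is a minimal sketch; the helper name `nvidia_modeset_errors` is made up for illustration, and the `journalctl` invocation in the comment assumes a systemd-based system with a persistent journal.

```shell
#!/bin/sh
# Keep only nvidia-modeset error lines from a kernel log read on stdin.
# (Hypothetical helper name; it is just a fixed-string grep.)
nvidia_modeset_errors() {
    # "|| true" so that "no matches" is not treated as a failure.
    grep -F 'nvidia-modeset: ERROR' || true
}

# Typical use, reading the previous boot's kernel messages:
#   journalctl -k -b -1 | nvidia_modeset_errors
```

Reading the previous boot (`-b -1`) is useful here because the message appears while the hung system is shutting down.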

Things I tried:

  • Downgrading to 515.65.01 DOES solve the problem.
  • Disabling the AMD pstate driver does NOT solve the problem.
  • Disabling IOMMU/PCI denylisting for a normal 2x GPU configuration does NOT solve the problem.
  • Downgrading to Linux LTS 5.15.70 does NOT solve the problem.

Let me know if you need more information,

Thanks!

To Reproduce

Just run startx (or directly xinit)

Bug Incidence

Always

nvidia-bug-report.log.gz

nvidia-bug-report.log.gz

More Info

@chripell chripell added the bug Something isn't working label Sep 27, 2022
@AstroBarker

Exact same issue here, started with the update. Can't even set up a fresh system. Unable to use the system completely after boot.

@broconut123

Exact same issue here, started with the update. Can't even set up a fresh system. Unable to use the system completely after boot.

Exactly this issue: just a black screen with nothing. Can't even change to a TTY.
It's weird, because one time it did boot into GNOME, but on the next reboot it was the same black screen.

@quangIO

quangIO commented Sep 28, 2022

Changing to tty doesn't work if gdm / lightdm / sddm is enabled. KMSCON doesn't work.

Starting Hyprland before starting sddm makes the problem go away. Perhaps this has something to do with DRM modeset.

@bananana

Same issue. Black screen, TTY not working. Had to chroot from a live USB and revert back to 515.65.01

OS: Solus
Kernel: 5.15.68-218
GPU: NVIDIA GeForce RTX 3060 Ti

@Pickzelle

Pickzelle commented Sep 30, 2022

I've been experiencing the black screen at boot with the 515.76 driver as well but I can get around it by having my monitor off during the boot process/having the HDMI unplugged and then starting it after the session is up. No clue if it works for everyone else but would be nice to know if it works or not.

@tcoopman

@PixeL-se that doesn't seem to work for me.

@amrit1711
Collaborator

Please set kernel parameter nvidia-drm.modeset=1 and see if it fixes the issue.
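Before and after changing the boot configuration, it can help to confirm what the kernel actually received. The sketch below classifies a kernel command line; the function name `modeset_state` is hypothetical, and it accepts both the dash and underscore spellings because the kernel treats them as equivalent in module parameter names.

```shell
#!/bin/sh
# Report whether a kernel command line enables nvidia-drm modesetting.
# The command line is passed as an argument so any string can be checked.
modeset_state() {
    case " $1 " in
        *" nvidia-drm.modeset=1 "*|*" nvidia_drm.modeset=1 "*) echo enabled ;;
        *" nvidia-drm.modeset=0 "*|*" nvidia_drm.modeset=0 "*) echo disabled ;;
        *) echo unset ;;
    esac
}

# Inspect the command line of the running system:
modeset_state "$(cat /proc/cmdline 2>/dev/null)"

# With the module loaded, the effective value should also be readable from
# /sys/module/nvidia_drm/parameters/modeset (Y or N).
```

This only reports the command line; a value set through a modprobe.d options file would show up in the sysfs parameter instead.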

@amrit1711 amrit1711 self-assigned this Sep 30, 2022
@quangIO

quangIO commented Sep 30, 2022

Please set kernel parameter nvidia-drm.modeset=1 and see if it fixes the issue.

Already did. Neither 1 nor 0 works. I also ran mkinitcpio with the nvidia stuff configured.

EDIT: Suspend also makes the issue appear. I have already run:

sudo systemctl enable nvidia-suspend.service nvidia-hibernate.service nvidia-resume.service

@broconut123

broconut123 commented Sep 30, 2022

I've been experiencing the black screen at boot with the 515.76 driver as well but I can get around it by having my monitor off during the boot process/having the HDMI unplugged and then starting it after the session is up. No clue if it works for everyone else but would be nice to know if it works or not.

Plugging in the HDMI cable after the system is up and running actually worked.

@stkain

stkain commented Sep 30, 2022

I've been experiencing the black screen at boot with the 515.76 driver as well but I can get around it by having my monitor off during the boot process/having the HDMI unplugged and then starting it after the session is up. No clue if it works for everyone else but would be nice to know if it works or not.

It almost worked for me. In addition, I had to restart sddm.service (logged in remotely).
Which display manager are you using?

@Pickzelle

Which display manager are you using?

I'm using SDDM but I didn't have to restart sddm.service or anything. What I usually do is that I just unplug the HDMI cable, start my system, wait for the POST to give me the white light on my motherboard (for VGA), then I plug it back in and wait for it to go through my bootloader (systemd-boot), and then I start the monitor when the display manager loads. It has worked for me each and every time I've done it (about 7 times now) but I guess it doesn't work for everyone?

@chripell
Author

chripell commented Oct 1, 2022

Thanks for the suggestion! This actually works for me on 515.76:

  1. I have a system with a RTX3060 connected to a HDMI monitor through a KVM switch (work monitor) and a RTX3060 connected directly to a DP monitor (calibrated for graphics work).
  2. I switch the KVM to another system, not the one with the RTX3060.
  3. I boot my system. Now the POST/Linux console is on the DP monitor; usually it is on the HDMI one. I log in and run startx.
  4. I switch the KVM back to the RTX3060 system and I have my usual dual display / GPU correctly working.

So it looks like there is something in the console initialization code specific to HDMI.
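To see which outputs the kernel considers connected at a given moment (for example before and after switching the KVM), the DRM connector status files in sysfs can be listed. This is a rough sketch that assumes the driver exposes its connectors under /sys/class/drm, which for the NVIDIA driver generally requires nvidia-drm with modesetting enabled; the function name `list_connectors` is made up for illustration.

```shell
#!/bin/sh
# Print "<connector> <status>" for every DRM connector under a sysfs root.
# The root defaults to /sys/class/drm; another directory can be passed in.
list_connectors() {
    root="${1:-/sys/class/drm}"
    for status_file in "$root"/card*-*/status; do
        # Skip the unexpanded glob when no connector directories exist.
        [ -r "$status_file" ] || continue
        connector="$(basename "$(dirname "$status_file")")"
        printf '%s %s\n' "$connector" "$(cat "$status_file")"
    done
}

list_connectors
```

Run over SSH while the screen is black, this shows whether the HDMI connector is reported as connected at all.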

@jkrhu

jkrhu commented Oct 2, 2022

I can confirm that keeping your HDMI-connected monitor turned off during boot mitigates this issue. Still not sure if the culprit is only the 515.76 driver or the recent 5.19.12 kernel as well, which has its own share of issues with black screens or flickering on some systems. It might be just a strange coincidence that both of these stable releases are breaking displays.

@chaseleif

chaseleif commented Oct 2, 2022

Arch Linux
Kernel: 5.19.12-arch1-1
GPU Driver: NVIDIA 515.76

The aforementioned "workaround" does not work for me.
3090 card, pc has bios password enabled.
I do not use a window manager.

I can use the computer like normal (minus the GUI) if I don't start X.

I have tried leaving the TV off (hdmi->hdmi) after entering bios password as well as for complete boot (tv off before boot, entering bios password, waiting, logging in, starting x, waiting, tv on).

Keyboard unresponsive after $ exec startx /usr/bin/startxfce4 (numlock doesn't change, can't change tty).

I can only recover from this bad state by rebooting over ssh, and after I run $ sudo shutdown -r now it takes ~10 minutes 36 seconds (by a stopwatch) from the reboot command until my TV says there is no signal. Then, within a couple of seconds, the BIOS password prompt comes up.

@tim77

tim77 commented Oct 3, 2022

Same issue with new NVIDIA 515.76 driver on Fedora 36.

  • GPU: NVIDIA GeForce RTX 3060/PCIe/SSE2
  • CPU: AMD Ryzen 3 3300X
  • Connected: HDMI

RPM Fusion bugtracker.

@Arcitec

Arcitec commented Oct 3, 2022

Same here on a fully upgraded Fedora Workstation 36.

  • GPU: 3090
  • Port: HDMI (cannot try DP)
  • Driver that broke: 515.76

Here is a Fedora discussion about it:

https://www.reddit.com/r/Fedora/comments/xu5lco/linux_nvidia_driver_51576_causes_black_screen/

I am gonna have to learn how to compile my own NVIDIA driver packages for emergencies like this, because I cannot figure out how to downgrade the official Fedora packages.

@amrit1711
Collaborator

We have filed bug 3817621 internally for tracking purposes.
We will try to reproduce the issue locally and post further updates.
Also, since this issue is not restricted to the open GPU kernel modules, I will close this thread; further updates will follow on the threads below.

https://forums.developer.nvidia.com/t/515-76-nvidia-drivers/229132
https://forums.developer.nvidia.com/t/bug-report-black-x11-screen-and-partial-lockup-when-upgraded-to-515-76-and-dual-rtx3060/228912/21

@Arcitec

Arcitec commented Oct 3, 2022

@amrit1711 Thank you. So far I have been reading every discussion thread in the list posted here:

https://www.reddit.com/r/Fedora/comments/xu5lco/comment/iqu4bqi/

Across distros the common denominators appear to be that it (only?) happens to the RTX 30xx series, and mostly (only?) affects people with HDMI ports as their output.

Several people reported success by using DisplayPort instead.

One person said that a DisplayPort to HDMI cable adapter didn't work.

Several people reported that if the machine is started without any HDMI connected, and you then SSH into the machine to run some command (startx was mentioned), you can plug in the HDMI afterwards to get an image. A tedious method, of course, but it points towards an HDMI initialization issue.

At least one person mentioned logs saying that EDID failed to be read via HDMI for the connected displays.

Some people with 30x0-series GPUs have working HDMI ports. So perhaps it is an issue with specific displays with specific data in their EDID that trips things up. Which would explain why the bug wasn't discovered by you during testing.

I have tried different kernel versions. All of these are confirmed NOT working with 515.76 driver: 5.19.8, 5.19.11, 5.19.12.

The only thing that worked for me was downgrading to NVIDIA driver 515.65.01, and I am on kernel 5.19.12 with no issues.

@aaronp24
Member

aaronp24 commented Oct 5, 2022

I tracked down a problem with modeset sequencing for RTX 30 series GPUs when the boot display is on HDMI. If you're affected, could you please give this commit a try?

@solbjorn

solbjorn commented Oct 8, 2022

@aaronp24, that commit fixes the problem for me on an RTX 3090 and Linux 6.0.

@broconut123

broconut123 commented Oct 9, 2022

@aaronp24 So I tried that commit, but after putting "options nvidia NVreg_OpenRmEnableUnsupportedGpus=1" in /etc/modprobe.d/nvidia.conf it boots to the same black screen.

I followed the install guide:
sh ./NVIDIA-Linux-[...].run --no-kernel-modules
git clone https://github.com/aaronp24/open-gpu-kernel-modules.git
make modules -j$(nproc)
sudo make modules_install -j$(nproc)

This is on 5.19.4-arch1-1, 3080ti with KDE and SDDM. Maybe I did something wrong, not sure.

@jonathonf

git clone https://github.com/aaronp24/open-gpu-kernel-modules.git

You built from the main branch, rather than using the mentioned commit.
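Building a specific commit instead of the branch tip means checking it out after cloning. A minimal sketch follows; `checkout_commit` is a hypothetical helper, and `COMMIT` stands for the hash from the linked comment, which is not reproduced here.

```shell
#!/bin/sh
# Check out one specific commit in an existing clone and confirm HEAD is there.
# $1: path to the clone, $2: commit hash to test.
checkout_commit() {
    git -C "$1" checkout --quiet --detach "$2" &&
    [ "$(git -C "$1" rev-parse HEAD)" = "$(git -C "$1" rev-parse "$2^{commit}")" ]
}

# Example, with COMMIT set to the hash being tested:
#   git clone https://github.com/aaronp24/open-gpu-kernel-modules.git
#   checkout_commit open-gpu-kernel-modules "$COMMIT"
#   cd open-gpu-kernel-modules
#   make modules -j"$(nproc)"
#   sudo make modules_install -j"$(nproc)"
```

The function returns non-zero if the hash is unknown to the clone, which catches the case where the commit lives on a branch that was not fetched.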

@broconut123

git clone https://github.com/aaronp24/open-gpu-kernel-modules.git

You built from the main branch, rather than using the mentioned commit.

I figured out how to do it and it works.
Thanks a lot!

@evdcush

evdcush commented Nov 20, 2022

I can only access grub because of this issue.

How can I bypass this Nvidia bug through grub so I can at least downgrade my Nvidia drivers?

@lexivanx

This solved my problem, using either a second monitor or a TTY:
cd /etc/X11
sudo rm xorg.conf
sudo cp xorg.conf.nvidia-xconfig-original xorg.conf
sudo reboot
