egl needs an early out to prevent waking the dGPU unnecessarily #89

Open
flukejones opened this issue Sep 29, 2023 · 37 comments

@flukejones

Over the last two or three years, hybrid laptops, notably those with an Nvidia RTX 20xx or newer dGPU, tend to have a better/deeper suspend function that puts the dGPU into a very low-power state when unused.

Combined with glvnd, this introduces a lag of 1-2 seconds while the dGPU wakes in response to queries, even if it then remains unused and the iGPU is used instead. For example, opening the Nautilus file manager is delayed 1-2 s while the dGPU wakes. For a lot of apps that use glvnd, this ends up being bad UX.

A lot of folks are working around this with __EGL_VENDOR_LIBRARY_FILENAMES=/usr/share/glvnd/egl_vendor.d/50_mesa.json.

I reported this here some time ago
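The __EGL_VENDOR_LIBRARY_FILENAMES workaround can be scoped to a single launch instead of being set globally. This is a minimal sketch, assuming Mesa's vendor file sits at the path quoted in this thread (override MESA_EGL_VENDOR_JSON otherwise); run_with_mesa_egl is a hypothetical helper name, not part of glvnd.

```shell
# Path from the thread; override if your distribution installs it elsewhere.
MESA_EGL_VENDOR_JSON=${MESA_EGL_VENDOR_JSON:-/usr/share/glvnd/egl_vendor.d/50_mesa.json}

# run_with_mesa_egl (hypothetical helper): launch one command with glvnd
# restricted to the Mesa EGL vendor, so the NVIDIA EGL library is not loaded.
run_with_mesa_egl() {
    if [ ! -r "$MESA_EGL_VENDOR_JSON" ]; then
        echo "run_with_mesa_egl: $MESA_EGL_VENDOR_JSON not found" >&2
        return 1
    fi
    # The variable is placed only in the launched command's environment.
    __EGL_VENDOR_LIBRARY_FILENAMES=$MESA_EGL_VENDOR_JSON "$@"
}
```

For example, run_with_mesa_egl nautilus. Every EGL client in that process then gets Mesa, including any headless EGLDisplay it creates.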

@CosmicFusion

Yeah, it hurts battery life too, having the GPU wake up and blast its fans every time an app is opened.

@erik-kz
Collaborator

erik-kz commented Oct 18, 2023

This should be fixed by ba6c38a

@flukejones
Author

> This should be fixed by ba6c38a

Seems a bit hit and miss, but this is likely due to how some apps (like Firefox, VS Code, Geary, Evolution) handle GPU stuff. These apps still wake the GPU, but other apps like Nautilus no longer do.

@FiestaLake

FiestaLake commented Oct 20, 2023

  • Nautilus 45 still opens the GPU with the latest egl-wayland

@Gert-dev

> Seems a bit hit and miss, but this is likely due to how some apps (like Firefox, VS Code, Geary, Evolution) handle GPU stuff. These apps still wake the GPU, but other apps like Nautilus no longer do.

> Nautilus 45 still opens the GPU with the latest egl-wayland

I see the same behaviour as the first quote with applications such as VSCode (even when using the Wayland backend), but not the second: GTK4 apps that were previously problematic, such as Nautilus, no longer wake the GPU or show the noticeable spin-up delay. I confirmed this by monitoring the dGPU state using watch cat /sys/class/drm/card*/device/power_state.
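The power_state check mentioned above can be wrapped in a small function that labels each card. This is a sketch under assumptions: the layout is /sys/class/drm/cardN/device/power_state as in the watch command, connector entries such as card1-eDP-1 are skipped by name, and gpu_power_states is a hypothetical helper, not an existing tool.

```shell
# gpu_power_states (hypothetical helper): print "cardN STATE" for every DRM
# card that exposes a PCI power_state, e.g. "card1 D3cold".
# The sysfs root is a parameter so the function can be run against a copy.
gpu_power_states() {
    drm_root=${1:-/sys/class/drm}
    for state_file in "$drm_root"/card*/device/power_state; do
        # Skip an unmatched glob and entries without a readable power_state.
        [ -r "$state_file" ] || continue
        card=${state_file#"$drm_root"/}
        card=${card%%/*}
        # Connector entries (card1-eDP-1, ...) are not GPUs.
        case $card in *-*) continue ;; esac
        printf '%s %s\n' "$card" "$(cat "$state_file")"
    done
}
```

Running gpu_power_states before and shortly after launching an app shows whether the dGPU left D3cold; the watch command above does the same continuously.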

It might be worth mentioning, for completeness, that if the app in question runs in Flatpak, it's not fixed yet, likely because the newest release of this library hasn't landed in the base runtimes.

@FiestaLake

> Nautilus 45 still opens the GPU with the latest egl-wayland

https://youtu.be/gKYoFEvtUJ4

@kbrenneman
Collaborator

Yeah, anything with Flatpak would need an update to its runtime environment to pick up an updated egl-wayland library.

It might be possible to work around that by using flatpak override --filesystem to map the host's copy of libnvidia-egl-wayland.so.1 through to the container, though at that point it's probably easier to just use the __EGL_VENDOR_LIBRARY_FILENAMES workaround instead.

For other applications, if the app itself (or some other library) tries to call eglQueryDevicesEXT on its own, then it would run into the same problem. Firefox might do that, but I couldn't say for sure -- I think the last time I looked at Firefox's GL code was before Wayland even existed. It would surprise me if something like Geary or Evolution did that, though.

@kbrenneman
Collaborator

Now that I think about it, if an application calls eglGetDisplay(NULL), or eglGetPlatformDisplay with EGL_PLATFORM_DEVICE_EXT or EGL_PLATFORM_SURFACELESS_MESA then that would also cause the NVIDIA GPU to wake up.

All of those would produce a headless EGLDisplay, without a windowing system associated with it. And without a windowing system, the driver has no way to know which device is driving the desktop.

@Gert-dev

Gert-dev commented Oct 20, 2023

> https://youtu.be/gKYoFEvtUJ4

That's indeed weird: for me it doesn't bring the dGPU out of the D3cold state. Since I'm assuming this isn't the experimental Flatpak version of Nautilus, could it be that you have some specific configuration in place that makes the NVIDIA GPU your primary (card0) one? I notice that for me the NVIDIA dGPU is card1 and the Intel iGPU is card0. Not sure if this has an impact anywhere.

> For other applications, if the app itself (or some other library) tries to call eglQueryDevicesEXT on its own, then it would run into the same problem. ...

That indeed makes sense. I assume in these cases we'd need to create the relevant issue reports for those projects separately, since this is out of egl-wayland's hands?

Firefox and Electron make some sense because IIRC they also handle some iGPU/dGPU 'placement' for things such as WebGL, so it wouldn't surprise me if the underlying code is also querying the available GPUs for that.

I'm also wondering, though, if these specific remaining issues are then also a problem for hybrid GPU setups with an AMD or even Intel dGPU? I have none to test currently, but it might be interesting to mention in upstream reports and make it more testable for developers.

@kbrenneman
Collaborator

> That indeed makes sense. I assume in these cases we'd need to create the relevant issue reports for those projects separately, since this is out of egl-wayland's hands?

Most likely, yes. If an app actually does just need to do offscreen rendering, though, then there isn't really a good way to do that without running into this. Either it calls something like eglGetDisplay(NULL) and lets the implementation pick a device (which would result in the NVIDIA driver waking up a GPU), or it uses EGL_EXT_platform_device or EGL_EXT_explicit_device, which would require calling eglQueryDevicesEXT anyway.

> I'm also wondering, though, if these specific remaining issues are then also a problem for hybrid GPU setups with an AMD or even Intel dGPU? I have none to test currently, but it might be interesting to mention in upstream reports and make it more testable for developers.

Hard to say. If the driver for the dGPU is Mesa, then it would depend on how Mesa handles device enumeration and selection internally.

@kbrenneman
Collaborator

I wonder if the GPU offloading configuration proposal for libglvnd could help with this?

Most of the design for that would be about right, but I'll have to think about whether I could tweak that interface to avoid unnecessary internal eglQueryDevicesEXT calls.

@FiestaLake

FiestaLake commented Oct 21, 2023

>> https://youtu.be/gKYoFEvtUJ4
>
> That's indeed weird: for me it doesn't bring the dGPU out of the D3cold state. Since I'm assuming this isn't the experimental Flatpak version of Nautilus, could it be that you have some specific configuration in place that makes the NVIDIA GPU your primary (card0) one? I notice that for me the NVIDIA dGPU is card1 and the Intel iGPU is card0. Not sure if this has an impact anywhere.

Yes, it's the native nautilus package from Arch. In my case, most of the time the NVIDIA dGPU is card0 and the AMD iGPU is card1, though sometimes the order is reversed. I haven't made any changes.

@kbrenneman
Collaborator

It just occurred to me that the NVIDIA GBM library has the same problem of calling eglQueryDevicesEXT right away to try to find a matching device, so anything that tries to use EGL_KHR_platform_gbm would run into this as well. I'd be surprised if any application actually used both EGL_KHR_platform_gbm and EGL_KHR_platform_wayland, though.

But, disabling one or both of the wayland and GBM platform libraries would be a way to determine if the application is doing something directly to access an NVIDIA device, or if that's still coming from one of the platform libraries.

The __EGL_EXTERNAL_PLATFORM_CONFIG_DIRS and __EGL_EXTERNAL_PLATFORM_CONFIG_FILENAMES environment variables can control which platform libraries get loaded, like so:

# Disable all platform libraries
__EGL_EXTERNAL_PLATFORM_CONFIG_DIRS=/some/nonexistent/path /path/to/program
# Only load the GBM platform library
__EGL_EXTERNAL_PLATFORM_CONFIG_FILENAMES=/usr/share/egl/egl_external_platform.d/15_nvidia_gbm.json /path/to/program
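The two commands above generalize into a small helper for narrowing down which component wakes the GPU. This is a sketch: egl_platform_variant is a hypothetical name, the GBM path comes from the thread, and the Wayland file name 10_nvidia_wayland.json is an assumption, so check the contents of /usr/share/egl/egl_external_platform.d/ first.

```shell
EGL_PLATFORM_DIR=${EGL_PLATFORM_DIR:-/usr/share/egl/egl_external_platform.d}
NVIDIA_GBM_JSON=${NVIDIA_GBM_JSON:-$EGL_PLATFORM_DIR/15_nvidia_gbm.json}
# Assumed file name; verify on your system.
NVIDIA_WAYLAND_JSON=${NVIDIA_WAYLAND_JSON:-$EGL_PLATFORM_DIR/10_nvidia_wayland.json}

# egl_platform_variant (hypothetical helper): run PROGRAM with none, one, or
# all of the external platform libraries; returns the program's exit status.
egl_platform_variant() {
    if [ $# -lt 2 ]; then
        echo "usage: egl_platform_variant none|gbm|wayland|all PROGRAM [ARGS...]" >&2
        return 2
    fi
    variant=$1
    shift
    case $variant in
        none)
            # Point the platform search path at a directory that does not exist.
            __EGL_EXTERNAL_PLATFORM_CONFIG_DIRS=/some/nonexistent/path "$@" ;;
        gbm)
            __EGL_EXTERNAL_PLATFORM_CONFIG_FILENAMES=$NVIDIA_GBM_JSON "$@" ;;
        wayland)
            __EGL_EXTERNAL_PLATFORM_CONFIG_FILENAMES=$NVIDIA_WAYLAND_JSON "$@" ;;
        all)
            # Default search path, environment unchanged.
            "$@" ;;
        *)
            echo "egl_platform_variant: unknown variant '$variant'" >&2
            return 2 ;;
    esac
}
```

If the dGPU still wakes with egl_platform_variant none /path/to/program, the access is coming from the application (or another library) rather than from a platform library.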

@erik-kz
Collaborator

erik-kz commented Oct 21, 2023

> I'd be surprised if any application actually used both EGL_KHR_platform_gbm and EGL_KHR_platform_wayland, though.

I believe recent versions of WebKit will do this. The web process uses GBM while the GUI process uses Wayland or X11.

@marcinx64

marcinx64 commented Oct 24, 2023

Hi,
I've noticed that some apps break when I apply the ICD JSON file-order workaround. Some don't open at all:
[Screenshot: qflipper]

Others are partially broken, with some UI elements not displayed:
[Screenshot: egl_wa]

Temporarily removing the workaround makes everything work again (except that the NVIDIA GPU wakes up):
[Screenshot: egl_no_wa]

Is this related to those apps or the Flatpak runtime, or is it also a bug in EGL?

@kbrenneman
Collaborator

> Is this related to those apps or the Flatpak runtime, or is it also a bug in EGL?

That depends -- what's the contents of that egl_vendor.d directory?

@marcinx64

marcinx64 commented Oct 24, 2023

Right now it looks like this (these are copies from the default directory on the host):

$ ls ~/.local/usr/share/glvnd/egl_vendor.d/
50_mesa.json  60_nvidia.json

It makes no difference whether I use __EGL_VENDOR_LIBRARY_FILENAMES and list the Mesa ICD JSON file first, or use __EGL_VENDOR_LIBRARY_DIRS and point it at another directory where the NVIDIA file has been renamed (10_nvidia.json -> 60_nvidia.json); the issue is the same.
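The directory-based variant of the workaround can be scripted. This is a sketch under assumptions: the host files are named 50_mesa.json and 10_nvidia.json as in this thread, glvnd tries vendor files in file-name order (which is why the rename changes the priority), and make_mesa_first_vendor_dir is a hypothetical helper, not a glvnd tool.

```shell
# make_mesa_first_vendor_dir DEST [SRC] (hypothetical helper): copy the
# vendor files into DEST so that Mesa (50_) sorts before NVIDIA (60_).
make_mesa_first_vendor_dir() {
    if [ $# -lt 1 ]; then
        echo "usage: make_mesa_first_vendor_dir DEST [SRC]" >&2
        return 2
    fi
    dest=$1
    src=${2:-/usr/share/glvnd/egl_vendor.d}
    mkdir -p "$dest" || return 1
    cp "$src/50_mesa.json" "$dest/50_mesa.json" || return 1
    # Only the file name changes; the JSON contents stay the same.
    cp "$src/10_nvidia.json" "$dest/60_nvidia.json" || return 1
}
```

For example, run make_mesa_first_vendor_dir ~/.local/usr/share/glvnd/egl_vendor.d once, then start the app with __EGL_VENDOR_LIBRARY_DIRS=$HOME/.local/usr/share/glvnd/egl_vendor.d.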

@kbrenneman
Collaborator

I'd need to know more about what the application is trying to do to be sure, but my best guess is that it's using an offscreen EGLDisplay, but there's something in Mesa that it can't cope with. Calling something like eglGetDisplay(NULL) will generally hand back an EGLDisplay from whatever vendor library is first.

If you use __EGL_VENDOR_LIBRARY_FILENAMES to limit it to only load Mesa, do you get the same problem?

@marcinx64

marcinx64 commented Oct 27, 2023

> If you use __EGL_VENDOR_LIBRARY_FILENAMES to limit it to only load Mesa, do you get the same problem?

Tried it; unfortunately the behaviour is the same as with __EGL_VENDOR_LIBRARY_DIRS or a "reversed" __EGL_VENDOR_LIBRARY_FILENAMES.

> I'd need to know more about what the application is trying to do to be sure

I can help with this if I know what you want me to check. Is there any specific command output you need?
My system is:
Fedora Silverblue 39
Kernel 6.5.6
Nvidia driver 535.113.01
egl-wayland 1.1.12

@kbrenneman
Collaborator

> Tried it; unfortunately the behaviour is the same as with __EGL_VENDOR_LIBRARY_DIRS or a "reversed" __EGL_VENDOR_LIBRARY_FILENAMES.

That's enough to confirm my guess: With Mesa as the first (or only) vendor library, the application ends up using Mesa, and something in Mesa is either failing, missing, or behaving in a way that the application can't cope with. It's probably either a simple app bug or some feature that the app needs which Mesa doesn't have.

Either way, though, that means the problem is outside egl-wayland or the nvidia driver.

@jrelvas-ipc

Using the search functionality in GNOME Shell wakes the GPU up. I kid you not.

[Video: lmao.webm]

The sudden spikes in power consumption I kept experiencing might be explained by this...

@kbrenneman
Collaborator

> Using the search functionality in GNOME Shell wakes the GPU up. I kid you not.

Is that with the current version of egl-wayland?

It wouldn't surprise me if the search function spawned a new wayland client process, and if that's all it is, then commit ba6c38a should fix it.

@jrelvas-ipc

jrelvas-ipc commented Oct 31, 2023

egl-wayland package is version 1.1.12-3.fc39. Is this the latest version?

@kbrenneman
Collaborator

No, 1.1.13 is the one that has the fix for this:
https://github.com/NVIDIA/egl-wayland/releases/tag/1.1.13
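To check whether a system already has the fixed release, a version comparison along these lines may help. It is a sketch: version_at_least is a hypothetical helper and it relies on GNU sort's -V (version sort) option. Flatpak apps use the copy in their runtime, not the host's, so a host check says nothing about them.

```shell
# version_at_least A B (hypothetical helper): succeed when version A >= B.
# Version-sorts both strings and checks that B comes first (or they tie).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

On Fedora, for example: version_at_least "$(rpm -q --qf '%{VERSION}\n' egl-wayland)" 1.1.13 && echo "has the fix".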

@Gert-dev

Gert-dev commented Oct 31, 2023

I can confirm that 1.1.13 doesn't stop GNOME Shell 45 search from waking the dGPU for me. However, since GNOME uses search providers (GNOME Characters, Nautilus, ...), it seems likely that one or more of those providers contribute to the problem by hitting one of the paths mentioned above, either directly or through underlying library code.

@jrelvas-ipc

Using the search functionality of GNOME Shell no longer wakes up the GPU for me with egl-wayland-1.1.13-1.fc39.

Fix appears to work as advertised. @kbrenneman

@jrelvas-ipc

jrelvas-ipc commented Dec 12, 2023

I've reported the wake up issue on Flatpak programs to upstream: https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/issues/1683

@retrixe

retrixe commented Dec 21, 2023

>> I'm also wondering, though, if these specific remaining issues are then also a problem for hybrid GPU setups with an AMD or even Intel dGPU? I have none to test currently, but it might be interesting to mention in upstream reports and make it more testable for developers.
>
> Hard to say. If the driver for the dGPU is Mesa, then it would depend on how Mesa handles device enumeration and selection internally.

For me, nouveau behaves the same as the NVIDIA proprietary driver here (I see wakeups with Chromium and Chromium-based apps, neofetch, and the GNOME Settings -> About panel), so it's worth noting that this is an issue on that side of the fence as well.

@jrelvas-ipc

jrelvas-ipc commented Jan 4, 2024

https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/issues/1683#note_1713305231

Freedesktop upstream says that they don't ship egl-wayland separately; the binary provided by the NVIDIA driver package is used, and that is currently still at 1.1.12.

This is why flatpak programs continue to be affected by this bug.

@jrelvas-ipc

jrelvas-ipc commented Jan 12, 2024

@erik-kz Is egl-wayland 1.1.13 going to be included with the next nvidia driver major release? If not, is there any timeline to do so? Asking to see if it's worth the trouble for freedesktop's runtime to package it separately.

@erik-kz
Collaborator

erik-kz commented Jan 12, 2024

> Is egl-wayland 1.1.13 going to be included with the next nvidia driver major release?

Yes, it will.

@jrelvas-ipc

@erik-kz Did some testing with flathub/org.freedesktop.Platform.GL.nvidia#229 and confirmed that the outdated egl-wayland release was the issue: the updated lib in the 550.40.07 beta driver fixes the wake-up issue in Flatpak programs!

[Video: screen recording from 2024-01-24 23-13-08]

@Hobbyist11

I'm still encountering this with certain Electron software like Foliate (an EPUB reader): opening Foliate itself doesn't turn the dGPU on, but opening an e-book does.
egl-wayland 1.1.13
Nvidia-dkms 550.67-1
kernel 6.8.2

@retrixe

retrixe commented Apr 3, 2024

Foliate isn't an electron app, it's a GTK app which uses a WebView for rendering e-books in particular

My assumption is opening an e-book initialises WebKit2GTK, which probes GPUs to use and ends up initialising the NVIDIA GPU

@jrelvas-ipc

> Foliate isn't an electron app, it's a GTK app which uses a WebView for rendering e-books in particular
>
> My assumption is opening an e-book initialises WebKit2GTK, which probes GPUs to use and ends up initialising the NVIDIA GPU

As a side note, if the program uses Vulkan, even just to get a list of available GPUs, that would wake up the NVIDIA dGPU due to a similar issue with NVIDIA's Vulkan implementation. I reported it here: https://forums.developer.nvidia.com/t/550-67-nvidia-vulkan-icd-wakes-up-dgpu-on-initialization-and-exit/288095

@jrelvas-ipc

CC @erik-kz, since that particular bug is similar to this egl one, but it's with vulkan instead.

@Hobbyist11

> Foliate isn't an electron app, it's a GTK app which uses a WebView for rendering e-books in particular
>
> My assumption is opening an e-book initialises WebKit2GTK, which probes GPUs to use and ends up initialising the NVIDIA GPU

Oh sorry my bad!


10 participants