mesa: 21.0.3 -> 21.1.1 #119558

Merged
merged 1 commit into from May 24, 2021

Member

@primeos primeos commented Apr 15, 2021

Motivation for this change
Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

Member Author

primeos commented Apr 23, 2021

Tried to launch Sway with this (to see if there are similar issues as for #120325) and it failed:

00:00:00.000 [INFO] [sway/main.c:346] Sway version 1.6
00:00:00.004 [INFO] [sway/main.c:154] Linux quorra 5.10.30 #1-NixOS SMP Wed Apr 14 06:42:14 UTC 2021 x86_64 GNU/Linux
00:00:00.005 [INFO] [sway/main.c:170] Contents of /etc/os-release:
00:00:00.005 [INFO] [sway/main.c:154] NAME=NixOS
00:00:00.005 [INFO] [sway/main.c:154] ID=nixos
00:00:00.005 [INFO] [sway/main.c:154] VERSION="21.05.git.d235056d6d6 (Okapi)"
00:00:00.005 [INFO] [sway/main.c:154] VERSION_CODENAME=okapi
00:00:00.005 [INFO] [sway/main.c:154] VERSION_ID="21.05.git.d235056d6d6"
00:00:00.005 [INFO] [sway/main.c:154] PRETTY_NAME="NixOS 21.05 (Okapi)"
00:00:00.005 [INFO] [sway/main.c:154] LOGO="nix-snowflake"
00:00:00.005 [INFO] [sway/main.c:154] HOME_URL="https://nixos.org/"
00:00:00.005 [INFO] [sway/main.c:154] DOCUMENTATION_URL="https://nixos.org/learn.html"
00:00:00.005 [INFO] [sway/main.c:154] SUPPORT_URL="https://nixos.org/community.html"
00:00:00.005 [INFO] [sway/main.c:154] BUG_REPORT_URL="https://github.com/NixOS/nixpkgs/issues"
00:00:00.005 [INFO] [sway/main.c:142] LD_LIBRARY_PATH=
00:00:00.005 [INFO] [sway/main.c:142] LD_PRELOAD=
00:00:00.005 [INFO] [sway/main.c:142] PATH=/run/wrappers/bin:/home/michael/.nix-profile/bin:/etc/profiles/per-user/michael/bin:/nix/var/nix/profiles/default/bin:/run/current-system/sw/bin
00:00:00.005 [INFO] [sway/main.c:142] SWAYSOCK=
00:00:00.005 [DEBUG] [sway/server.c:49] Preparing Wayland server initialization
00:00:00.005 [INFO] [wlr] [backend/session/logind.c:572] Selecting session from XDG_SESSION_ID: 19
00:00:00.014 [INFO] [wlr] [backend/session/logind.c:706] Successfully loaded logind session
00:00:00.016 [INFO] [wlr] [backend/backend.c:167] Found 1 GPUs
00:00:00.016 [INFO] [wlr] [backend/drm/backend.c:155] Initializing DRM backend for /dev/dri/card0 (radeon)
00:00:00.016 [DEBUG] [wlr] [backend/drm/drm.c:65] Atomic modesetting unsupported, using legacy DRM interface
00:00:00.016 [DEBUG] [wlr] [backend/drm/drm.c:82] ADDFB2 modifiers unsupported
00:00:00.016 [INFO] [wlr] [backend/drm/drm.c:244] Found 6 DRM CRTCs
00:00:00.016 [INFO] [wlr] [backend/drm/drm.c:171] Found 6 DRM planes
libGL: Can't open configuration file /etc/drirc: No such file or directory.
libGL: Can't open configuration file /home/michael/.drirc: No such file or directory.
libGL: Can't open configuration file /etc/drirc: No such file or directory.
libGL: Can't open configuration file /home/michael/.drirc: No such file or directory.
libGL: Can't open configuration file /etc/drirc: No such file or directory.
libGL: Can't open configuration file /home/michael/.drirc: No such file or directory.
libGL: Can't open configuration file /etc/drirc: No such file or directory.
libGL: Can't open configuration file /home/michael/.drirc: No such file or directory.
libGL: Can't open configuration file /etc/drirc: No such file or directory.
libGL: Can't open configuration file /home/michael/.drirc: No such file or directory.
libGL: Can't open configuration file /etc/drirc: No such file or directory.
libGL: Can't open configuration file /home/michael/.drirc: No such file or directory.
libGL: Can't open configuration file /etc/drirc: No such file or directory.
libGL: Can't open configuration file /home/michael/.drirc: No such file or directory.
libGL: Can't open configuration file /etc/drirc: No such file or directory.
libGL: Can't open configuration file /home/michael/.drirc: No such file or directory.
libEGL warning: MESA-LOADER: failed to open @@j: /run/opengl-driver/lib/dri/@@j_dri.so: cannot open shared object file: No such file or directory (search paths /run/opengl-driver/lib/dri)

00:00:00.077 [ERROR] [wlr] [EGL] command: eglInitialize, error: EGL_NOT_INITIALIZED (0x3001), message: "DRI2: failed to load driver"
libEGL warning: MESA-LOADER: failed to open @@j: /run/opengl-driver/lib/dri/@@j_dri.so: cannot open shared object file: No such file or directory (search paths /run/opengl-driver/lib/dri)

00:00:00.078 [ERROR] [wlr] [EGL] command: eglInitialize, error: EGL_NOT_INITIALIZED (0x3001), message: "DRI2: failed to load driver"
00:00:00.078 [ERROR] [wlr] [EGL] command: eglInitialize, error: EGL_NOT_INITIALIZED (0x3001), message: "eglInitialize"
00:00:00.078 [ERROR] [wlr] [render/egl.c:217] Failed to initialize EGL
00:00:00.078 [ERROR] [wlr] [EGL] command: eglMakeCurrent, error: EGL_BAD_DISPLAY (0x3008), message: "Invalid display (nil)"
00:00:00.078 [ERROR] [wlr] [render/wlr_renderer.c:263] Could not initialize EGL
00:00:00.078 [ERROR] [wlr] [backend/drm/renderer.c:34] Failed to create EGL/WLR renderer
00:00:00.078 [ERROR] [wlr] [backend/drm/backend.c:201] Failed to initialize renderer
00:00:00.079 [ERROR] [wlr] [backend/backend.c:174] Failed to create DRM backend
00:00:00.079 [ERROR] [wlr] [backend/backend.c:312] Failed to open any DRM device
00:00:00.082 [ERROR] [sway/server.c:55] Unable to create backend

Something's really broken :o

Member

vcunat commented Apr 23, 2021

@@j_dri.so: that looks like some bad substitution during patching.

Member Author

primeos commented Apr 23, 2021

Yeah, I wish... :) Unfortunately the @@ is likely just a coincidence. I forgot to include a hexdump, but when I viewed that file with Vim inside a virtual terminal I got some additional "garbage" (very strange/exotic characters). So I've got the feeling that something is going terribly wrong (memory corruption?) and some "random garbage" ends up being used as the driver name (I also forgot / didn't have enough time to check whether it's a constant value or whether it changes every time, but something's definitely very broken).

Member Author

primeos commented Apr 29, 2021

I did briefly re-test it with RC3 and the driver name does indeed change every time, e.g.:

  • C
  • (empty)
  • 9`L
  • /h
  • k2
  • ?
  • o9޺
  • ?we
  • {

So yeah, unfortunately something's seriously broken (on a VT/tty the output is even more scary). Likely a memory corruption bug that ends up as random garbage as the driver name.
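To make the "changes every time" observation easier to reproduce, the driver names could be pulled out of saved logs with a small filter. This is only a sketch: the function name `mesa_failed_drivers` and the log file name `sway.log` are hypothetical, and the pattern assumes the exact `MESA-LOADER: failed to open ...` message format shown in the log above. The C locale is forced so that invalid byte sequences in the garbage names still match.

```shell
# Hypothetical helper: read a Sway/wlroots log on stdin and print each
# distinct driver name that the Mesa loader reported it could not open.
# LC_ALL=C makes sed and sort treat the input as raw bytes, which matters
# because the corrupted names are often not valid UTF-8.
mesa_failed_drivers() {
  LC_ALL=C sed -n 's|^libEGL warning: MESA-LOADER: failed to open \(.*\): /.*_dri\.so: cannot open shared object file.*|\1|p' \
    | LC_ALL=C sort -u
}

# Example (assumes the log of one failed start was saved to sway.log):
#   mesa_failed_drivers < sway.log
```

Running this over the logs of several failed starts and comparing the output would show whether the name is constant or differs per run.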

Member Author

primeos commented May 6, 2021

@GrahamcOfBorg test sway

Member

vcunat commented May 6, 2021

21.1.1 should be available in two weeks, but so far the issues don't make it sound like this will make it to NixOS 21.05.

Member Author

primeos commented May 6, 2021

From my perspective it doesn't seem like a good idea for NixOS 21.05.

Interestingly the VM test for Sway (https://github.com/NixOS/nixpkgs/pull/119558/checks?check_run_id=2519144774) did succeed and reports:

GL_VERSION: 3.1 Mesa 21.1.0
GL_RENDERER: llvmpipe (LLVM 11.1.0, 256 bits)
GL_VENDOR: Mesa/X.org

So I gave it another go on my system and glinfo as well as glxgears worked as expected in my running Sway session (using Mesa 21.1.0) but after quitting Sway I couldn't launch it anymore (with the same errors as above). I could really use some help on this as I'm out of ideas (don't even know a good way to debug/analyze this without a lot of effort).

If someone in this thread could test this update as well that would be helpful (doesn't have to be Sway - it'd be useful to see if other Wayland/X11 software is affected by this as well).

I'm using something like hardware.opengl.package = (import /srv/nixpkgs/mesa { }).pkgs.mesa.drivers; to test updates where /srv/nixpkgs/mesa points to a Git worktree with the 4 commits of this PR applied on top of nixos-unstable. So far that has worked fine for testing.

Edit:
I've just noticed that the wlroots error now mentions DRI3 instead of DRI2 (likely simply due to the recent update to wlroots 0.13):

libEGL warning: MESA-LOADER: failed to open V: /run/opengl-driver/lib/dri/V_dri.so: cannot open shared object file: No such file or directory (search paths /run/opengl-driver/lib/dri)

00:00:00.070 [wlr] [EGL] command: eglInitialize, error: EGL_NOT_INITIALIZED (0x3001), message: "DRI3: failed to load driver"
libEGL warning: MESA-LOADER: failed to open V: /run/opengl-driver/lib/dri/V_dri.so: cannot open shared object file: No such file or directory (search paths /run/opengl-driver/lib/dri)

00:00:00.070 [wlr] [EGL] command: eglInitialize, error: EGL_NOT_INITIALIZED (0x3001), message: "DRI3: failed to load driver"
00:00:00.070 [wlr] [EGL] command: eglInitialize, error: EGL_NOT_INITIALIZED (0x3001), message: "eglInitialize"
00:00:00.070 [wlr] [render/egl.c:217] Failed to initialize EGL
00:00:00.070 [wlr] [EGL] command: eglMakeCurrent, error: EGL_BAD_DISPLAY (0x3008), message: "Invalid display (nil)"
00:00:00.070 [wlr] [render/wlr_renderer.c:263] Could not initialize EGL
00:00:00.070 [wlr] [backend/headless/backend.c:127] Failed to create renderer
00:00:00.071 [wlr] [backend/backend.c:245] failed to start backend 'headless'
00:00:00.071 [sway/server.c:55] Unable to create backend

I haven't really paid attention to the DRI part before but now I'm actually a bit confused why DRI is even used here. I thought the Direct Rendering Infrastructure (DRI) would only be relevant for X11 and that Wayland would use EGL directly.

But I guess EGL and DRI are just not that clearly separated for function names, messages, etc. (possibly due to function overlap but I'm not a graphics expert by any means so this is beyond me anyway).

Member Author

primeos commented May 6, 2021

Huh, "interesting", it seems like hardware.opengl.package = (import /srv/nixpkgs/mesa { }).pkgs.mesa.drivers; isn't usable for testing anymore... I finally decided to rebuild Sway based on /srv/nixpkgs/mesa and launching that binary works as expected.


I guess the problems come from egl-wayland (I wasn't aware of that package so far).
Edit: Or not, that seems to be an Nvidia thing for their EGLStreams... ("This is a work-in-progress implementation of a EGL External Platform library to add client-side Wayland support to EGL on top of EGLDevice and EGLStream families of extensions." So that shouldn't be relevant at all on my AMD-based test setup; it should be pulled in by XWayland then.)

$ nix-build -A sway
these derivations will be built:
  /nix/store/0h8zj58ygv9hfk9zy4dfnrnwri5z92wr-fix-paths.patch.drv
  /nix/store/mfsp9vzsyaj1x17bgx5bnfp494z8im58-egl-wayland-1.1.6.drv
  /nix/store/47s0jgqnrcdqh4k43f8g0zqidkngl5v3-xwayland-21.1.1.drv
  /nix/store/bl0kzyvl4nar9rf4yvpy7gavjnlxgd5a-wlroots-0.13.0.drv
  /nix/store/9hdn8910hbgxgs215l6pg1cklkddphxa-sway-unwrapped-1.6.drv
  /nix/store/dy1sfcy3cj5kx29cwzc9y9n2bwqvl2di-sway.drv
  /nix/store/yi99w995cgr1gaajljjsb4dznkrygbg3-sway-1.6.drv

@primeos primeos mentioned this pull request May 6, 2021
10 tasks
primeos added a commit to primeos/nixpkgs that referenced this pull request May 20, 2021
Note: The update to Mesa 21.0.2 was reverted (25ae1fd) because it
caused major issues with Sway (segfault on startup [0]).
This is still the case and might affect all packages that directly
depend on "mesa" (for libgbm or libglapi) but it only causes issues when
the package depends on a "mesa" version that differs from "mesa.drivers"
used for "/run/opengl-driver/". I've noticed this while testing Mesa
updates with the NixOS option "hardware.opengl.package" (as usual)
instead of rebuilding my whole system (which would work). Unfortunately
this can/will likely also cause issues when mixing different channels,
using Flakes/Overlays, etc.

The cause of this should be similar to [1] ("mesa" updates now cause the
same issues that "glibc" updates already do, maybe triggered by certain
Mesa changes) and some additional discussions is in [2],[3].

Note: Don't backport this to NixOS 21.05, at least not without careful
consideration.

[0]: NixOS#118753 (comment)
[1]: NixOS#95808
[2]: NixOS#120325
[3]: NixOS#119558
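The mismatch described in this commit message (an application built against one "mesa" while "/run/opengl-driver/" comes from another) can be spotted by comparing store paths. The filter below is a rough sketch: the function name `mesa_versions` is hypothetical, and the commented usage lines assume that `nix-store --query --requisites` resolves the given paths into the Nix store, which is worth verifying on the system in question.

```shell
# Hypothetical helper: read Nix store paths on stdin and print the distinct
# "mesa-<version>[-output]" names among them (e.g. mesa-21.1.1-drivers).
# Packages such as mesa-demos are skipped because a digit must follow "mesa-".
mesa_versions() {
  LC_ALL=C sed -n 's|^/nix/store/[a-z0-9]\{32\}-\(mesa-[0-9][^/]*\).*$|\1|p' \
    | LC_ALL=C sort -u
}

# Possible usage (not run here; requires Nix and a NixOS system):
#   nix-store --query --requisites "$(command -v sway)" | mesa_versions
#   nix-store --query --requisites /run/opengl-driver   | mesa_versions
```

If the two listings show different Mesa versions, the setup matches the situation the commit message warns about.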
@primeos primeos marked this pull request as ready for review May 24, 2021 12:41
@primeos primeos changed the title from "mesa: 21.0.X -> 21.1.1" to "mesa: 21.0.1 -> 21.1.1" May 24, 2021
Member Author

primeos commented May 24, 2021

@vcunat are 1001-2500 rebuilds on Darwin and Linux still fine for master (#118479 (comment), #118753 (comment)) and is it ok to merge all Mesa updates via master or should we merge some of them via staging?

@ofborg test sway

Member

vcunat commented May 24, 2021

I think the rebuild amount is OK-ish to push directly to master, though it's a bit more than I originally thought (maybe due to some of the later changes). And we're in a period where I expect to have higher Hydra load continuously... perhaps staging-next as a compromise?

Member Author

primeos commented May 24, 2021

@vcunat sounds good, that's also what I thought, thanks for the reply!

If I have time for it I'll try to investigate if we can avoid some of the rebuilds (AFAIK the build capacity on Darwin is more limited and it's also strange that there are more rebuilds on Darwin than on Linux - not sure if the Mesa dependencies are even needed on Darwin).
Edit: Accidentally rebased against staging, sorry about the incoming pings... :o
@ofborg test sway
Current rebuild numbers:

$ ./maintainers/scripts/rebuild-amount.sh HEAD~
Estimating rebuild amount by counting changed Hydra jobs.
   1471 x86_64-darwin
    966 x86_64-linux

@primeos primeos changed the base branch from master to staging-next May 24, 2021 16:40
Note: This update likely causes some issues when running an application
that has a direct dependency on Mesa (e.g. Sway and XWayland) and was
compiled against a different Nixpkgs revision. See 7106fca for more
details regarding that issue.
@primeos primeos changed the title from "mesa: 21.0.1 -> 21.1.1" to "mesa: 21.0.3 -> 21.1.1" May 24, 2021
Member

vcunat commented May 24, 2021

darwin is because libGL is "intentionally" affected in there; see pkgs/development/libraries/mesa/stubs.nix.

Member

vcunat commented May 24, 2021

I don't dare trying to improve that.

And x86_64-darwin is the main reason why I suggested not using master now, as it's quite loaded on Hydra now and probably will remain so for at least a week. On staging-next most builds haven't finished yet (especially on "slower" platforms), so most should be saved.

Member

vcunat commented May 24, 2021

This still can't be merged as-is (to staging-next) due to containing extra commits from staging.

Member Author

primeos commented May 24, 2021

This still can't be merged as-is (to staging-next) due to containing extra commits from staging.

Oh boy, I really messed that one up... :o First I forgot to specify staging-next when rebasing (and my remote tracking / upstream branch was staging), and when I noticed this I accidentally re-pushed to primeos:staging-next instead of primeos:mesa-next (which I didn't realize, and which explains why the extra commits were still in there). Seems like I wasn't paying enough attention...

Anyway, sorry about that, it's fixed now.
@ofborg test sway

@primeos primeos merged commit 0c40fb5 into NixOS:staging-next May 24, 2021
fabaff pushed a commit to fabaff/nixpkgs that referenced this pull request Jun 11, 2021
vcunat pushed a commit that referenced this pull request Jul 25, 2021