Segment fault on Xorg 1.20 #201

Closed
AlynxZhou opened this Issue May 28, 2018 · 29 comments

@AlynxZhou

AlynxZhou commented May 28, 2018

I am using Arch Linux. Since I upgraded xorg-server to 1.20, primusrun always gets a segmentation fault while optirun does not. Here is my journal:

May 21 15:46:47 pendragon kernel: bbswitch: enabling discrete graphics
May 21 15:46:47 pendragon kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 240
May 21 15:46:47 pendragon kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  396.24  Thu Apr 26 00:10:09 PDT 2018 (using threaded interrupts)
May 21 15:46:47 pendragon kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  396.24  Wed Apr 25 23:54:18 PDT 2018
May 21 15:46:47 pendragon kernel: nvidia-modeset: Allocated GPU:0 (GPU-2ffa395f-25a1-eaab-d6f5-1a2531e2cda8) @ PCI:0000:01:00.0
May 21 15:46:47 pendragon kernel: nvidia-modeset: Freed GPU:0 (GPU-2ffa395f-25a1-eaab-d6f5-1a2531e2cda8) @ PCI:0000:01:00.0
May 21 15:46:47 pendragon bumblebeed[438]: [   46.065467] [WARN][XORG] (WW) Open ACPI failed (/var/run/acpid.socket) (No such file or directory)
May 21 15:46:47 pendragon bumblebeed[438]: [   46.065482] [WARN][XORG] (WW) Warning, couldn't open module mouse
May 21 15:46:47 pendragon bumblebeed[438]: [   46.065491] [WARN][XORG] (WW) NVIDIA(0): Unable to get display device for DPI computation.
May 21 15:46:47 pendragon bumblebeed[438]: [   46.065496] [WARN][XORG] (WW) NVIDIA(0): Option "NoLogo" is not used
May 21 15:46:47 pendragon bumblebeed[438]: [   46.065504] [WARN][XORG] (WW) Warning, couldn't open module mouse
May 21 15:46:47 pendragon bumblebeed[438]: [   46.065509] [ERROR][XORG] (EE) PreInit returned 2 for "<default pointer>"
May 21 15:46:47 pendragon bumblebeed[438]: [   46.065513] [ERROR][XORG] (EE) PreInit returned 2 for "<default keyboard>"
May 21 15:46:47 pendragon kernel: nvidia-modeset: Allocated GPU:0 (GPU-2ffa395f-25a1-eaab-d6f5-1a2531e2cda8) @ PCI:0000:01:00.0
May 21 15:46:47 pendragon kernel: nvidia-modeset: Freed GPU:0 (GPU-2ffa395f-25a1-eaab-d6f5-1a2531e2cda8) @ PCI:0000:01:00.0
May 21 15:46:47 pendragon kernel: glxgears[1432]: segfault at 74 ip 00007f5404112151 sp 00007f5400623b20 error 4 in i965_dri.so[7f5403f3a000+874000]
May 21 15:46:47 pendragon systemd[1]: Started Process Core Dump (PID 1434/UID 0).
May 21 15:46:48 pendragon kernel: nvidia-modeset: Unloading
May 21 15:46:48 pendragon systemd-coredump[1435]: Process 1416 (glxgears) of user 1000 dumped core.
                                                  
                                                  Stack trace of thread 1432:
                                                  #0  0x00007f5404112151 n/a (i965_dri.so)
                                                  #1  0x00007f5404314f8d n/a (i965_dri.so)
                                                  #2  0x00007f5408c09425 n/a (libGL.so.1)
                                                  #3  0x00007f5407f3c075 start_thread (libpthread.so.0)
                                                  #4  0x00007f540824b53f __clone (libc.so.6)
                                                  
                                                  Stack trace of thread 1416:
                                                  #0  0x00007f5407f44856 do_futex_wait.constprop.1 (libpthread.so.0)
                                                  #1  0x00007f5407f44958 __new_sem_wait_slow.constprop.0 (libpthread.so.0)
                                                  #2  0x00007f5408c0a4ac glXSwapBuffers (libGL.so.1)
                                                  #3  0x000056472e673a27 n/a (glxgears)
                                                  #4  0x00007f540817606b __libc_start_main (libc.so.6)
                                                  #5  0x000056472e67408a n/a (glxgears)
                                                  
                                                  Stack trace of thread 1433:
                                                  #0  0x00007f5407f44856 do_futex_wait.constprop.1 (libpthread.so.0)
                                                  #1  0x00007f5407f44958 __new_sem_wait_slow.constprop.0 (libpthread.so.0)
                                                  #2  0x00007f5408c0aa7c n/a (libGL.so.1)
                                                  #3  0x00007f5407f3c075 start_thread (libpthread.so.0)
                                                  #4  0x00007f540824b53f __clone (libc.so.6)
May 21 15:46:48 pendragon kernel: nvidia-nvlink: Unregistered the Nvlink Core, major device number 240
May 21 15:46:48 pendragon kernel: bbswitch: disabling discrete graphics
May 21 15:46:48 pendragon kernel: pci 0000:01:00.0: Refused to change power state, currently in D0

Thanks if anyone can help.
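
Editor's note: the kernel line in the journal gives both the faulting instruction pointer and the base address where i965_dri.so was mapped, so the offset of the crash inside the library can be worked out by subtraction. The sketch below does just that; `fault_offset` is a hypothetical helper name, and the two hex values are copied from the log above.

```shell
#!/bin/sh
# Compute the offset of a faulting instruction inside a shared object.
# Arguments are hex strings without the 0x prefix, as the kernel prints them:
#   $1: instruction pointer ("ip ...")
#   $2: mapping base (the number before "+" in "lib.so[base+size]")
fault_offset() {
    ip=$1
    base=$2
    printf '0x%x\n' "$(( 0x$ip - 0x$base ))"
}

# From: segfault at 74 ip 00007f5404112151 ... in i965_dri.so[7f5403f3a000+874000]
fault_offset 00007f5404112151 7f5403f3a000   # prints 0x1d8151
```

With a build of the driver that has debug symbols, an offset like this could be passed to a symbolizer such as `addr2line -e` to find the source location; the stack trace above shows `n/a` precisely because those symbols were not available.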

@NerosTie

NerosTie commented Jun 8, 2018

Same issue with xorg-server 1.20.0-6 on Arch.

@CrafterSvK

CrafterSvK commented Jun 9, 2018

Same issue on Arch

@turboasm123

turboasm123 commented Jun 9, 2018

I have fixed this issue by downgrading Mesa to 18.0.4; that works just fine. I suppose we should wait for a newer release of Mesa to fix this mess...

@NerosTie

NerosTie commented Jun 9, 2018

@turboasm123 But are they aware of this bug?

@turboasm123

turboasm123 commented Jun 9, 2018

@NerosTie Yes, they are. There is already a bug reported on bugs.archlinux.org

https://bugs.archlinux.org/task/58933

@CrafterSvK

CrafterSvK commented Jun 17, 2018

What if Mesa is not the problem? Primus is 2 years old. Shouldn't we update it to match the latest Mesa, rather than wait for them?

Remove incomplete GLX_SGIX_swap_barrier stubs from the Xlib libGL
Remove incomplete GLX_SGIX_swap_group stubs from the Xlib libGL

They removed these stubs from libGL, which may cause problems.
Probably not (seems unlikely)

@juliotux

juliotux commented Jun 18, 2018

Xorg 1.20 now has:
https://www.phoronix.com/scan.php?page=news_item&px=X.Org-Server-1.20-Features

Server-side GLVND "GLXVND" for allowing different OpenGL drivers to back different X screens. This should help in some multi-GPU setups and other combinations.

Wouldn't it be time to update primusrun to use this feature?

@turboasm123

turboasm123 commented Jun 22, 2018

@juliotux I'm sure it is used by nvidia-xrun, but it's not better than primus, since you have to start an X-session and switch every time you want to run anything.

@juliotux

juliotux commented Jun 22, 2018

@turboasm123 nvidia-xrun does not use GLXVND. It just starts a new X session with the nvidia xorg config. As announced, GLXVND runs in the same X session, but on different screens. The challenge would be to bring the rendering of this second screen to the primary one. Another option is to wrap around GLVND, as suggested by other users, but I couldn't find any good documentation about it.

@knyzorg

knyzorg commented Jun 23, 2018

Confirming "solution" from linked arch bug tracker. Downgrading mesa fixes the issue.

So... Probably not a primus issue. Mesa is definitely a culprit though.

@turboasm123

turboasm123 commented Jun 23, 2018

I have also reported the bug on the Mesa bug tracker. Not much response, but still, there is some activity.
Here is the link: https://bugs.freedesktop.org/show_bug.cgi?id=106910

@knyzorg

knyzorg commented Jul 2, 2018

It's frustrating how slowly this is all moving. It feels like one of those issues which will be left broken for years.

On the bright side, Optimus seems to be working fine.

@CrafterSvK

CrafterSvK commented Jul 8, 2018

Yes, on the bright side. But there are certain applications which won't work because they need LD_PRELOAD or something like that. Optimus won't pass these arguments, and the application will crash. Mount & Blade: Warband is one of them. That's why I like primus a lot more.

@knyzorg

knyzorg commented Jul 11, 2018

I have received a response via email which (for some reason?) has been removed.

Reposting:

@chriscjs: PRIMUS_UPLOAD=1 primusrun glxspheres64
solves this issue for me. I did not have to downgrade xorg or mesa.

I am confirming that it seems to work yet I have trouble finding any documentation as to what "PRIMUS_UPLOAD" does.
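
Editor's note: for anyone who wants the workaround applied by default, a small shell function can pin the variable before calling primusrun. This is only a sketch: `primus_pinned` and the `PRIMUSRUN` override are hypothetical names introduced here, not part of primus, and the default of 1 is simply the value reported to work in this thread.

```shell
#!/bin/sh
# Run a program through primusrun with the upload method pinned, so the
# autodetect setting (0) is never used.
# PRIMUSRUN is an illustrative hook for substituting the launcher (e.g. in
# tests); it is not something primus itself reads.
primus_pinned() {
    # Keep an explicit non-zero PRIMUS_UPLOAD from the caller; otherwise use 1.
    method="${PRIMUS_UPLOAD:-1}"
    if [ "$method" = "0" ]; then
        method=1
    fi
    PRIMUS_UPLOAD="$method" "${PRIMUSRUN:-primusrun}" "$@"
}
```

Usage would look like `primus_pinned glxspheres64`, or `PRIMUS_UPLOAD=2 primus_pinned glxgears` to pick the other method.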

@AlynxZhou

Author

AlynxZhou commented Jul 11, 2018

@knyzorg It seems to work for me...

@chriscjs

chriscjs commented Jul 11, 2018

@knyzorg
I had removed the post after someone else posted on https://bugs.archlinux.org/task/58933
that it caused performance degradation compared to optirun, although it does not on my system. Thanks for reposting it. The following text is from the primusrun script in /usr/bin which might explain it a bit:

# Upload/display method
# 0: autodetect, 1: textures, 2: PBO/glDrawPixels (needs Mesa-10.1+)
export PRIMUS_UPLOAD=${PRIMUS_UPLOAD:-0}

A Google search on glDrawPixels says it has been removed from OpenGL 3.2 and above, and maybe that's one reason for this issue.
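
Editor's note: the `${PRIMUS_UPLOAD:-0}` form in that excerpt is ordinary shell parameter expansion. It keeps a value the user already exported and falls back to 0 (autodetect) when the variable is unset or empty. A minimal demonstration, with `show_upload` as a hypothetical helper name:

```shell
#!/bin/sh
# Print the value the primusrun excerpt would end up exporting.
show_upload() {
    echo "${PRIMUS_UPLOAD:-0}"
}

unset PRIMUS_UPLOAD
show_upload          # unset  -> 0 (autodetect)

PRIMUS_UPLOAD=''
show_upload          # empty  -> 0 as well, because of the colon in ":-"

PRIMUS_UPLOAD=1
show_upload          # set    -> 1 (textures)
```

This is why prefixing the command with `PRIMUS_UPLOAD=1` is enough: the script's default only applies when nothing was set.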

@AlynxZhou

Author

AlynxZhou commented Jul 11, 2018

@chriscjs I have tested them on my PC: =1 gives the best performance, =2 works but is not as good as =1, and optirun is the worst. =0 gets a segmentation fault.

Here are results:

[alynx@pendragon:~] % vblank_mode=0 PRIMUS_UPLOAD=0 primusrun glxgears
ATTENTION: default value of option vblank_mode overridden by environment.
ATTENTION: default value of option vblank_mode overridden by environment.
zsh: segmentation fault (core dumped)  vblank_mode=0 PRIMUS_UPLOAD=0 primusrun glxgears
[alynx@pendragon:~]! % vblank_mode=0 PRIMUS_UPLOAD=2 primusrun glxgears
ATTENTION: default value of option vblank_mode overridden by environment.
ATTENTION: default value of option vblank_mode overridden by environment.
22144 frames in 5.0 seconds = 4428.668 FPS
18780 frames in 5.0 seconds = 3755.991 FPS
XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
      after 32 requests (32 known processed) with 0 events remaining.
X Error of failed request:  BadWindow (invalid Window parameter)
  Major opcode of failed request:  147 ()
  Minor opcode of failed request:  1
  Resource id in failed request:  0x3a00002
  Serial number of failed request:  48813
  Current serial number in output stream:  48814
^[[A^C
[alynx@pendragon:~]! % vblank_mode=0 PRIMUS_UPLOAD=1 primusrun glxgears
ATTENTION: default value of option vblank_mode overridden by environment.
ATTENTION: default value of option vblank_mode overridden by environment.
23965 frames in 5.0 seconds = 4792.905 FPS
26534 frames in 5.0 seconds = 5306.707 FPS
26583 frames in 5.0 seconds = 5316.538 FPS
XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
      after 32 requests (32 known processed) with 0 events remaining.
X Error of failed request:  BadWindow (invalid Window parameter)
  Major opcode of failed request:  147 ()
  Minor opcode of failed request:  1
  Resource id in failed request:  0x3a00002
  Serial number of failed request:  83881
  Current serial number in output stream:  83882
primus: warning: dropping a frame to avoid deadlock
^C
[alynx@pendragon:~]! % vblank_mode=0 optirun glxgears
ATTENTION: default value of option vblank_mode overridden by environment.
12713 frames in 5.0 seconds = 2542.409 FPS
13112 frames in 5.0 seconds = 2622.357 FPS
13046 frames in 5.0 seconds = 2609.111 FPS
[VGL] ERROR: in readback--
[VGL]    254: Window has been deleted by window manager
[alynx@pendragon:~]! %

@jebrosen

jebrosen commented Jul 11, 2018

I can confirm that both PRIMUS_UPLOAD=1 and PRIMUS_UPLOAD=2 work and are significantly faster than optirun on my machine as well.
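
Editor's note: to compare runs like the ones posted above with a single number each, the FPS readings that glxgears prints can be averaged. The sketch below assumes the output format shown in this thread; `mean_fps` is a hypothetical helper name.

```shell
#!/bin/sh
# Average the FPS figures found in glxgears output read from stdin.
# Matching lines look like: "22144 frames in 5.0 seconds = 4428.668 FPS"
mean_fps() {
    awk '/ frames in .* seconds = / { sum += $(NF-1); n++ }
         END { if (n) printf "%.1f\n", sum / n; else print "no samples" }'
}

# Sample input: the two PRIMUS_UPLOAD=2 readings from the results above.
printf '%s\n' \
    '22144 frames in 5.0 seconds = 4428.668 FPS' \
    '18780 frames in 5.0 seconds = 3755.991 FPS' | mean_fps   # prints 4092.3
```

Lines that are not FPS readings (the ATTENTION and X error messages) are ignored, so a saved terminal log can be fed in unchanged.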

The fact that PRIMUS_UPLOAD=0 means "autodetect" but both choices are working, and the following code:

primus.dispmethod = test_drawpixels_fast(ddpy, context, *dconfigs) ? 2 : 1;

suggests that it's the autodetection itself (test_drawpixels_fast) that's at fault. Note that the PRIMUS_UPLOAD=2 method is itself implemented using glDrawPixels, so there is probably something special about test_drawpixels_fast that causes it to fail.
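
Editor's note: the reasoning here can be sketched as a small decision function. This is not the primus source (which is C++); `pick_dispmethod` and its probe argument are illustrative names, and the assumption that an explicit setting bypasses the probe is inferred from the behaviour reported in this thread.

```shell
#!/bin/sh
# Sketch of the upload-method selection, under the assumptions stated above.
#   $1:    requested method (0 autodetect, 1 textures, 2 PBO/glDrawPixels)
#   $2...: probe command standing in for test_drawpixels_fast
pick_dispmethod() {
    requested=$1
    shift
    if [ "$requested" != "0" ]; then
        # Explicit choice: the probe is never run.
        echo "$requested"
    elif "$@"; then
        # Probe reports that glDrawPixels is fast.
        echo 2
    else
        echo 1
    fi
}

pick_dispmethod 1 false   # prints 1, probe skipped
pick_dispmethod 0 true    # prints 2
pick_dispmethod 0 false   # prints 1
```

In this shell model a failing probe just returns non-zero. In primus the probe runs inside the application's own process, so a crash there takes the whole program down, which matches the segfault seen only with PRIMUS_UPLOAD=0.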

@kmahyyg

kmahyyg commented Jul 11, 2018

I just found that this project is so outdated; the last commit was about 3 years ago.
I'm the one who filed the bug report on the Arch Linux bug tracker.
Using any of the methods above will work, but runs slowly.

So I strongly suggest you downgrade your mesa or check the Arch Linux Bug Tracker.

@PhdTrollSlayer

PhdTrollSlayer commented Aug 6, 2018

Downgrading to mesa-18.0.4-1 did work for me. But yeah, this repo needs some updates.

@damian01w

damian01w commented Aug 8, 2018

This is a very frustrating bug, because primus is a very popular open-source project and now seems not to be working at all with recent Xorg/Mesa updates. Dear @amonakov, are you still involved in the project? Can you help solve this bug?

@kmahyyg

kmahyyg commented Aug 9, 2018

ribalda added a commit to ribalda/primus that referenced this issue Aug 11, 2018

liglfork.cpp: Fix for amonakov#201
Without this patch current mesa crases during upload_mode detection.

Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>

@ribalda

ribalda commented Aug 11, 2018

I have just sent a pull request that fixes the bug for me.

@CrafterSvK

CrafterSvK commented Aug 11, 2018

Should we make a mirror of this repo and make it official if @amonakov is no longer involved?

@knyzorg

knyzorg commented Aug 11, 2018

Arch users can specify which git repo to clone from when using the AUR. However, we can go forward with a long term solution if anybody wants to become the new maintainer of primus and contact the repository maintainers to transfer ownership.

@amonakov

Owner

amonakov commented Aug 12, 2018

Reports indicate that this is the result of a regression in Xorg/Mesa that has to do with "DRI3 modifiers", so please try to engage more actively with Mesa/i965 developers about the issue.

Did anyone inform them of the bisection result shown on Arch forum?
https://bbs.archlinux.org/viewtopic.php?pid=1789470#p1789470

In the meantime, running with PRIMUS_UPLOAD=2 should serve as a workaround as it skips the autodetection path.

@CrafterSvK

CrafterSvK commented Aug 19, 2018

All right. primusrun is working without any workaround on Mesa 18.1.6 on Arch Linux.

@damian01w

damian01w commented Aug 20, 2018

Seems to be working fine now on ArchLinux and Debian testing up-to-date. Thanks!

@knyzorg

knyzorg commented Aug 20, 2018

Confirming as fixed.

@amonakov amonakov closed this Aug 23, 2018
