Segment fault on Xorg 1.20 #201

Closed
AlynxZhou opened this issue May 28, 2018 · 29 comments

Comments


@AlynxZhou AlynxZhou commented May 28, 2018

I am using Arch Linux. Since I upgraded xorg-server to 1.20, primusrun always segfaults while optirun does not. Here is my journal:

May 21 15:46:47 pendragon kernel: bbswitch: enabling discrete graphics
May 21 15:46:47 pendragon kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 240
May 21 15:46:47 pendragon kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  396.24  Thu Apr 26 00:10:09 PDT 2018 (using threaded interrupts)
May 21 15:46:47 pendragon kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  396.24  Wed Apr 25 23:54:18 PDT 2018
May 21 15:46:47 pendragon kernel: nvidia-modeset: Allocated GPU:0 (GPU-2ffa395f-25a1-eaab-d6f5-1a2531e2cda8) @ PCI:0000:01:00.0
May 21 15:46:47 pendragon kernel: nvidia-modeset: Freed GPU:0 (GPU-2ffa395f-25a1-eaab-d6f5-1a2531e2cda8) @ PCI:0000:01:00.0
May 21 15:46:47 pendragon bumblebeed[438]: [   46.065467] [WARN][XORG] (WW) Open ACPI failed (/var/run/acpid.socket) (No such file or directory)
May 21 15:46:47 pendragon bumblebeed[438]: [   46.065482] [WARN][XORG] (WW) Warning, couldn't open module mouse
May 21 15:46:47 pendragon bumblebeed[438]: [   46.065491] [WARN][XORG] (WW) NVIDIA(0): Unable to get display device for DPI computation.
May 21 15:46:47 pendragon bumblebeed[438]: [   46.065496] [WARN][XORG] (WW) NVIDIA(0): Option "NoLogo" is not used
May 21 15:46:47 pendragon bumblebeed[438]: [   46.065504] [WARN][XORG] (WW) Warning, couldn't open module mouse
May 21 15:46:47 pendragon bumblebeed[438]: [   46.065509] [ERROR][XORG] (EE) PreInit returned 2 for "<default pointer>"
May 21 15:46:47 pendragon bumblebeed[438]: [   46.065513] [ERROR][XORG] (EE) PreInit returned 2 for "<default keyboard>"
May 21 15:46:47 pendragon kernel: nvidia-modeset: Allocated GPU:0 (GPU-2ffa395f-25a1-eaab-d6f5-1a2531e2cda8) @ PCI:0000:01:00.0
May 21 15:46:47 pendragon kernel: nvidia-modeset: Freed GPU:0 (GPU-2ffa395f-25a1-eaab-d6f5-1a2531e2cda8) @ PCI:0000:01:00.0
May 21 15:46:47 pendragon kernel: glxgears[1432]: segfault at 74 ip 00007f5404112151 sp 00007f5400623b20 error 4 in i965_dri.so[7f5403f3a000+874000]
May 21 15:46:47 pendragon systemd[1]: Started Process Core Dump (PID 1434/UID 0).
May 21 15:46:48 pendragon kernel: nvidia-modeset: Unloading
May 21 15:46:48 pendragon systemd-coredump[1435]: Process 1416 (glxgears) of user 1000 dumped core.
                                                  
                                                  Stack trace of thread 1432:
                                                  #0  0x00007f5404112151 n/a (i965_dri.so)
                                                  #1  0x00007f5404314f8d n/a (i965_dri.so)
                                                  #2  0x00007f5408c09425 n/a (libGL.so.1)
                                                  #3  0x00007f5407f3c075 start_thread (libpthread.so.0)
                                                  #4  0x00007f540824b53f __clone (libc.so.6)
                                                  
                                                  Stack trace of thread 1416:
                                                  #0  0x00007f5407f44856 do_futex_wait.constprop.1 (libpthread.so.0)
                                                  #1  0x00007f5407f44958 __new_sem_wait_slow.constprop.0 (libpthread.so.0)
                                                  #2  0x00007f5408c0a4ac glXSwapBuffers (libGL.so.1)
                                                  #3  0x000056472e673a27 n/a (glxgears)
                                                  #4  0x00007f540817606b __libc_start_main (libc.so.6)
                                                  #5  0x000056472e67408a n/a (glxgears)
                                                  
                                                  Stack trace of thread 1433:
                                                  #0  0x00007f5407f44856 do_futex_wait.constprop.1 (libpthread.so.0)
                                                  #1  0x00007f5407f44958 __new_sem_wait_slow.constprop.0 (libpthread.so.0)
                                                  #2  0x00007f5408c0aa7c n/a (libGL.so.1)
                                                  #3  0x00007f5407f3c075 start_thread (libpthread.so.0)
                                                  #4  0x00007f540824b53f __clone (libc.so.6)
May 21 15:46:48 pendragon kernel: nvidia-nvlink: Unregistered the Nvlink Core, major device number 240
May 21 15:46:48 pendragon kernel: bbswitch: disabling discrete graphics
May 21 15:46:48 pendragon kernel: pci 0000:01:00.0: Refused to change power state, currently in D0

Thanks if anyone can help.


@NerosTie NerosTie commented Jun 8, 2018

Same issue with xorg-server 1.20.0-6 on Arch.


@CrafterSvK CrafterSvK commented Jun 9, 2018

Same issue on Arch


@thenationalsanya thenationalsanya commented Jun 9, 2018

I fixed this issue by downgrading Mesa to 18.0.4, which works just fine. I suppose we should wait for a newer Mesa release to fix this mess...


@NerosTie NerosTie commented Jun 9, 2018

@turboasm123 But are they aware of this bug?


@thenationalsanya thenationalsanya commented Jun 9, 2018

@NerosTie Yes, they are. There is already a bug reported on bugs.archlinux.org:

https://bugs.archlinux.org/task/58933


@CrafterSvK CrafterSvK commented Jun 17, 2018

What if Mesa is not the problem? Primus is 2 years old. Shouldn't we update it to match the latest Mesa rather than wait for them?

Remove incomplete GLX_SGIX_swap_barrier stubs from the Xlib libGL
Remove incomplete GLX_SGIX_swap_group stubs from the Xlib libGL

They removed these stubs from libGL, which may cause problems.
Probably not (seems unlikely)


@juliotux juliotux commented Jun 18, 2018

Xorg 1.20 now has this feature:
https://www.phoronix.com/scan.php?page=news_item&px=X.Org-Server-1.20-Features

Server-side GLVND "GLXVND" for allowing different OpenGL drivers to back different X screens. This should help in some multi-GPU setups and other combinations.

Wouldn't this be the time to update primusrun to use this feature?


@thenationalsanya thenationalsanya commented Jun 22, 2018

@juliotux I'm sure it is used by nvidia-xrun, but it's not better than primus, since you have to start an X-session and switch every time you want to run anything.


@juliotux juliotux commented Jun 22, 2018

@turboasm123 nvidia-xrun does not use GLXVND. It just starts a new X session with the nvidia Xorg config. As announced, GLXVND runs in the same X session, but on different screens. The challenge would be bringing the rendering of this second screen to the primary one. Another option is to wrap around GLVND, as suggested by other users, but I couldn't find any good documentation about it.


@knyzorg knyzorg commented Jun 23, 2018

Confirming "solution" from linked arch bug tracker. Downgrading mesa fixes the issue.

So... Probably not a primus issue. Mesa is definitely a culprit though.


@thenationalsanya thenationalsanya commented Jun 23, 2018

I have also reported the bug on the Mesa bug tracker. Not much response, but still, there is some activity.
Here is the link: https://bugs.freedesktop.org/show_bug.cgi?id=106910


@knyzorg knyzorg commented Jul 2, 2018

It's frustrating how slowly this is all moving. It feels like one of those issues which will be left broken for years.

On the bright side, Optimus seems to be working fine.


@CrafterSvK CrafterSvK commented Jul 8, 2018

Yes, on the bright side. There are certain applications which won't work because they need LD_PRELOAD or something like that. Optimus won't pass these arguments and the application will crash. Mount & Blade: Warband is one of them. That's why I like primus a lot more.


@knyzorg knyzorg commented Jul 11, 2018

I have received a response via email which (for some reason?) has been removed.

Reposting:

@chriscjs: PRIMUS_UPLOAD=1 primusrun glxspheres64
solves this issue for me. I did not have to downgrade xorg or mesa.

I am confirming that it seems to work yet I have trouble finding any documentation as to what "PRIMUS_UPLOAD" does.


@AlynxZhou AlynxZhou commented Jul 11, 2018

@knyzorg Seems to work for me...


@chriscjs chriscjs commented Jul 11, 2018

@knyzorg
I had removed the post after someone else posted on https://bugs.archlinux.org/task/58933
that it caused performance degradation compared to optirun, although it does not on my system. Thanks for reposting it. The following text is from the primusrun script in /usr/bin which might explain it a bit:

# Upload/display method
# 0: autodetect, 1: textures, 2: PBO/glDrawPixels (needs Mesa-10.1+)
export PRIMUS_UPLOAD=${PRIMUS_UPLOAD:-0}

A Google search on glDrawPixels says it has been removed from OpenGL 3.2 and above, and maybe that's one reason for this issue.
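
For readers unfamiliar with the `${PRIMUS_UPLOAD:-0}` expansion in that excerpt: it means "use the caller's value if it is set and non-empty, otherwise 0". The sketch below illustrates only that defaulting. `describe_upload_mode` is a hypothetical helper written for this illustration, not part of primus, and its labels are taken from the script comment quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of primus): map a PRIMUS_UPLOAD value to the
# label used in the primusrun script comment.
describe_upload_mode() {
    # Same default expansion as in the script: unset or empty falls back to 0.
    mode="${1:-0}"
    case "$mode" in
        0) echo "autodetect" ;;
        1) echo "textures" ;;
        2) echo "PBO/glDrawPixels" ;;
        *) echo "unknown" ;;
    esac
}

# Print the label for whatever PRIMUS_UPLOAD is in the current environment
# (an unset variable is treated as 0, i.e. autodetect).
describe_upload_mode "${PRIMUS_UPLOAD:-}"
```

Running a program as `PRIMUS_UPLOAD=1 primusrun ...` therefore overrides the default of 0 for that one invocation only.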


@AlynxZhou AlynxZhou commented Jul 11, 2018

@chriscjs I have tested them on my PC: =1 gives the best performance, =2 works but is not as good as =1, and optirun is the worst. =0 gets a segmentation fault.

Here are results:

[alynx@pendragon:~] % vblank_mode=0 PRIMUS_UPLOAD=0 primusrun glxgears
ATTENTION: default value of option vblank_mode overridden by environment.
ATTENTION: default value of option vblank_mode overridden by environment.
zsh: segmentation fault (core dumped)  vblank_mode=0 PRIMUS_UPLOAD=0 primusrun glxgears
[alynx@pendragon:~]! % vblank_mode=0 PRIMUS_UPLOAD=2 primusrun glxgears
ATTENTION: default value of option vblank_mode overridden by environment.
ATTENTION: default value of option vblank_mode overridden by environment.
22144 frames in 5.0 seconds = 4428.668 FPS
18780 frames in 5.0 seconds = 3755.991 FPS
XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
      after 32 requests (32 known processed) with 0 events remaining.
X Error of failed request:  BadWindow (invalid Window parameter)
  Major opcode of failed request:  147 ()
  Minor opcode of failed request:  1
  Resource id in failed request:  0x3a00002
  Serial number of failed request:  48813
  Current serial number in output stream:  48814
^[[A^C
[alynx@pendragon:~]! % vblank_mode=0 PRIMUS_UPLOAD=1 primusrun glxgears
ATTENTION: default value of option vblank_mode overridden by environment.
ATTENTION: default value of option vblank_mode overridden by environment.
23965 frames in 5.0 seconds = 4792.905 FPS
26534 frames in 5.0 seconds = 5306.707 FPS
26583 frames in 5.0 seconds = 5316.538 FPS
XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
      after 32 requests (32 known processed) with 0 events remaining.
X Error of failed request:  BadWindow (invalid Window parameter)
  Major opcode of failed request:  147 ()
  Minor opcode of failed request:  1
  Resource id in failed request:  0x3a00002
  Serial number of failed request:  83881
  Current serial number in output stream:  83882
primus: warning: dropping a frame to avoid deadlock
^C
[alynx@pendragon:~]! % vblank_mode=0 optirun glxgears
ATTENTION: default value of option vblank_mode overridden by environment.
12713 frames in 5.0 seconds = 2542.409 FPS
13112 frames in 5.0 seconds = 2622.357 FPS
13046 frames in 5.0 seconds = 2609.111 FPS
[VGL] ERROR: in readback--
[VGL]    254: Window has been deleted by window manager
[alynx@pendragon:~]! % 
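
To compare runs like these with a single number, the per-interval FPS lines can be averaged. This is a minimal sketch, assuming glxgears keeps printing lines of the form "N frames in T seconds = X FPS"; `mean_fps` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: read glxgears output on stdin and print the mean FPS.
# Only lines of the form "N frames in T seconds = X FPS" are counted, so the
# ATTENTION and X error lines are ignored.
mean_fps() {
    awk '/ frames in .* FPS/ { sum += $(NF - 1); n++ }
         END { if (n > 0) printf "%.3f\n", sum / n }'
}

# The three PRIMUS_UPLOAD=1 samples from the run above.
printf '%s\n' \
    '23965 frames in 5.0 seconds = 4792.905 FPS' \
    '26534 frames in 5.0 seconds = 5306.707 FPS' \
    '26583 frames in 5.0 seconds = 5316.538 FPS' | mean_fps
```

For these three samples the mean is 5138.717 FPS. The same helper gives about 2591 FPS for the three optirun samples, roughly half of that.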

@jebrosen jebrosen commented Jul 11, 2018

I can confirm that both PRIMUS_UPLOAD=1 and PRIMUS_UPLOAD=2 work and are significantly faster than optirun on my machine as well.

The fact that PRIMUS_UPLOAD=0 means "autodetect" but both choices are working, and the following code:

primus.dispmethod = test_drawpixels_fast(ddpy, context, *dconfigs) ? 2 : 1;

suggests that it's the autodetection itself (test_drawpixels_fast) that's at fault. Note that the PRIMUS_UPLOAD=2 method is itself implemented using glDrawPixels, so there is probably something special about test_drawpixels_fast that causes it to fail.


@kmahyyg kmahyyg commented Jul 11, 2018

I just found that this project is quite outdated; the last commit was about 3 years ago.
I'm the one who filed the bug report on the Arch Linux bug tracker.
Any of the methods above will work, but they run very slowly.

So I strongly suggest you downgrade your mesa or check the Arch Linux Bug Tracker.


@hellbound22 hellbound22 commented Aug 6, 2018

Downgrading to mesa-18.0.4-1 did work for me. But yeah, this repo needs some updates.


@damian01w damian01w commented Aug 8, 2018

This is a very frustrating bug, because primus is a very popular open-source project and now seems not to be working at all with recent Xorg/Mesa updates. Dear @amonakov, are you still involved in the project? Can you help solve this bug?


@kmahyyg kmahyyg commented Aug 9, 2018

ribalda added a commit to ribalda/primus that referenced this issue Aug 11, 2018
Without this patch current mesa crashes during upload_mode detection.

Signed-off-by: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com>

@ribalda ribalda commented Aug 11, 2018

I have just sent a pull request that fixes the bug for me.


@CrafterSvK CrafterSvK commented Aug 11, 2018

Should we make a mirror of this repo and make it official if @amonakov is no longer involved?


@knyzorg knyzorg commented Aug 11, 2018

Arch users can specify which git repo to clone from when using the AUR. However, we can go forward with a long term solution if anybody wants to become the new maintainer of primus and contact the repository maintainers to transfer ownership.

Owner

@amonakov amonakov commented Aug 12, 2018

Reports indicate that this is the result of a regression in Xorg/Mesa that has to do with "DRI3 modifiers", so please try to engage more actively with Mesa/i965 developers about the issue.

Did anyone inform them of the bisection result shown on Arch forum?
https://bbs.archlinux.org/viewtopic.php?pid=1789470#p1789470

In the meantime, running with PRIMUS_UPLOAD=2 should serve as a workaround as it skips the autodetection path.
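
One way to apply this workaround without typing the variable every time is a small shell wrapper. The following is only a sketch under stated assumptions: `effective_upload_mode` and `primusrun_noauto` are hypothetical names made up for this example, and the fallback of 2 simply follows the suggestion above (1 was also reported to work earlier in this thread).

```shell
#!/bin/sh
# Hypothetical helper: choose the PRIMUS_UPLOAD value to run with.
# An unset, empty, or 0 value would trigger autodetection, so it is replaced
# by the fallback (default 2); any other value passes through unchanged.
effective_upload_mode() {
    requested="${1:-0}"
    fallback="${2:-2}"
    if [ "$requested" = "0" ]; then
        echo "$fallback"
    else
        echo "$requested"
    fi
}

# Hypothetical wrapper around primusrun; call it as: primusrun_noauto glxgears
primusrun_noauto() {
    PRIMUS_UPLOAD="$(effective_upload_mode "${PRIMUS_UPLOAD:-}")" primusrun "$@"
}
```

An explicit `PRIMUS_UPLOAD=1 primusrun_noauto glxgears` still takes effect, because only the autodetect value 0 is replaced.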


@CrafterSvK CrafterSvK commented Aug 19, 2018

All right. primusrun is working without any workaround on Mesa 18.1.6 on Arch Linux.


@damian01w damian01w commented Aug 20, 2018

Seems to be working fine now on ArchLinux and Debian testing up-to-date. Thanks!


@knyzorg knyzorg commented Aug 20, 2018

Confirming as fixed.

@amonakov amonakov closed this Aug 23, 2018