Missing/corrupted image with Mesa-10.1 #138

Closed
amonakov opened this issue Apr 3, 2014 · 44 comments

Comments

@amonakov
Owner

amonakov commented Apr 3, 2014

It appears that some people get a blank or garbage window unless PRIMUS_UPLOAD=1 is in effect. The issue seems limited to Intel Haswell and mesa-10.1 (a dedicated fast path for PBO glDrawPixels appeared only in 10.1).

I've been unable to reproduce the issue so far, but it looks like a bug in Mesa with window buffers being mishandled somewhere. Please try to run a standalone test mimicking primus' behavior: https://gist.github.com/amonakov/8192522b71e3350857e4

cd /tmp
wget -O drawpixtest.c https://gist.githubusercontent.com/amonakov/8192522b71e3350857e4/raw/9e2e56254cf5ef0b45ac998a72937a389bf9feba/drawpixtest.c
bash -x drawpixtest.c

You should see a 1024x512 window with a texture moving to the left. It's hardcoded to render 1000 frames; time vblank_mode=0 /tmp/r should finish in about a second.

Please report if the test works for you, your Intel GPU model, Mesa version, and provide /var/log/Xorg.0.log via a pastebin or a Github gist. Please let me know if you're up to git-bisecting Mesa.

@floe

floe commented Apr 3, 2014

First run of test shows garbage, subsequent runs show a black window.
Time with vblank_mode=0 is about 1.8s, without vblank_mode it's about 16.6s.
GPU:
00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09) (prog-if 00 [VGA controller])
Subsystem: Dell Device 0534
Mesa:
10.1.0-4ubuntu1 (from Ubuntu 14.04 final beta)
Xorg.0.log:
http://pastebin.com/RUfJFfit
Bisecting:
generally yes, time permitting :-)
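
The gap between the two timings is what vsync throttling would predict: when buffer swaps wait for vblank, 1000 frames cannot finish faster than 1000 divided by the refresh rate. A minimal sketch of that arithmetic, assuming a 60 Hz display (the refresh rate is not stated in the thread) and using a hypothetical helper named expected_seconds:

```shell
# Hypothetical helper: lower bound on run time when every frame waits for vblank.
# $1 = number of frames (1000 is hardcoded in the test)
# $2 = display refresh rate in Hz (60 is an assumption, not taken from the thread)
expected_seconds() {
    LC_ALL=C awk -v frames="$1" -v hz="$2" 'BEGIN { printf "%.1f\n", frames / hz }'
}

expected_seconds 1000 60   # prints 16.7
```

A run close to this figure suggests the test was vsync-limited, while a run of one or two seconds suggests vblank_mode=0 took effect.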

@amonakov
Owner Author

amonakov commented Apr 3, 2014

Thanks. So the issue is not limited to Haswell.

I'll try to grab binary libraries from Ubuntu and see if I can reproduce the problem with those. In the meantime, more test reports are welcome (if the test works fine for you on Mesa-10.1, leave a report as well).

@nlooije

nlooije commented Apr 3, 2014

I have the same result as @floe
00:02.0 VGA compatible controller: Intel Corporation 4th Gen Core Processor Integrated Graphics Controller (rev 06) (prog-if 00 [VGA controller])
Subsystem: CLEVO/KAPOK Computer Device 0230
Mesa: 10.1.0-4ubuntu1
Xorg.0.log: http://pastebin.com/37XKEmJv
Bisecting: I have never done this, but I am willing to try if necessary

@ArchangeGabriel

I’ve got a strange error when entering the last of the three lines: http://pastebin.com/xt4G3E2N

@amonakov
Owner Author

amonakov commented Apr 3, 2014

You have cat in your sbin (why?), that was not expected. Adjust the file locally, for example change /*bin to /bin.

@ArchangeGabriel

Hum, that’s because in Arch, /bin and /sbin are symlinks to /usr/bin. Going to fix that and report (with UXA and SNA).
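
The failure follows from how the shell expands the pattern: /*bin/cat matches every top-level directory whose name ends in bin, so when /bin and /sbin both resolve to a directory containing cat, the pattern expands to two paths rather than one. The sketch below reproduces that in a scratch directory instead of the real filesystem root; the directory layout and the helper name count_args are assumptions made for illustration.

```shell
# Hypothetical helper: report how many words a glob expanded to.
count_args() {
    echo "$#"
}

# Scratch tree standing in for /: bin holds "cat", and sbin is a symlink to bin
# (a simplification of the Arch layout, where both point at /usr/bin).
root=$(mktemp -d)
mkdir "$root/bin"
touch "$root/bin/cat"
ln -s bin "$root/sbin"

# With both directories present, the pattern yields two paths.
match_count=$(count_args "$root"/*bin/cat)
echo "matches: $match_count"

# Spelling the directory out, as suggested above, yields exactly one.
exact_count=$(count_args "$root"/bin/cat)
echo "exact:   $exact_count"
```

On a system where only /bin contains cat, the pattern expands to a single path, which is presumably why the test ran unmodified for the other reporters.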

@ArchangeGabriel

Using UXA: everything works as expected; time returns around 16.6s without vblank_mode=0 and 1s with it. Output looks like this:
[image: drawpixtest]

X.org log: https://gist.github.com/ArchangeGabriel/9956653

SNA incoming.

@ArchangeGabriel

SNA, same results.

X.org log: https://gist.github.com/ArchangeGabriel/9956890

Might the kernel command line have implications? In my log you can see that I have:

i915.i915_enable_fbc=0
i915.i915_enable_rc6=1
i915.lvds_downclock=1
i915.semaphores=1
drm.vblankoffdelay=1

Also, I might try on my old laptop (but I need to go home for this), which is currently running Ubuntu 14.04-devel+xorg-edgers, but on which I can set up anything you want and try to bisect Mesa if you guide me a little. It has an Arrandale chip and was affected by a segfault without PRIMUS_UPLOAD=1, but I haven’t used it since, so I don’t know what the status of Primus is on this machine.

@karolherbst
Contributor

for reference: this is my i915 kernel stuff (it works for me)
i915.i915_enable_rc6=7
i915.lvds_downclock=1

intel HD 4600 (i7-4700MQ)
xorg-server: 1.15.0
xf86-video-intel: 2.99.911 (SNA)
mesa: master, 10.1
window manager: kwin-4.11.8 with OpenGL 3.1 and "Full scene repaints"
http://bpaste.net/show/197260/

@akien-mga

Tested here, I don't see anything in the 1024x512 window.
It takes about 2s with vblank_mode=0, 17s without.

My Intel GPU is: Intel 810 and later: Intel Corporation|3rd Gen Core processor Graphics Controller
Here is the output of Xorg.0.log: http://pastebin.com/X2maGLyH

@amonakov
Owner Author

amonakov commented Apr 4, 2014

Could not reproduce it with Ubuntu 14.04 Mesa/dri/drm libraries.

Please mention what window manager and compositor (if any) you're running (also, whether disabling it has any effect). I've tested on xfwm4 with and without compton/glx.

Try changing the hardcoded number of frames to a small value (say, from 1000 to 3) and collecting a log with INTEL_DEBUG=all.

@karolherbst
Contributor

It may be weird, but I noticed that everybody with the splash kernel argument gets a garbage/black window and everybody without it does not. I don't think this is actually the cause, but it is a strange coincidence :p

@jmmL

jmmL commented Apr 4, 2014

I'm sure I've missed something trivial, but this test doesn't run for me

drawpixtest.c: line 51: warning: here-document at line 1 delimited by end-of-file (wanted `EOF')
+ /bin/cat
+ gcc -O /tmp/r.c -lGL -lX11 -o /tmp/r
/tmp/r.c:3:20: fatal error: GL/glx.h: No such file or directory
 #include <GL/glx.h>
                    ^
compilation terminated.

@amonakov
Owner Author

amonakov commented Apr 5, 2014

@jmmL: You need to install mesa-common-dev

@jmmL

jmmL commented Apr 5, 2014

@amonakov Thanks. I now get another error when trying to run the last line

drawpixtest.c: line 51: warning: here-document at line 1 delimited by end-of-file (wanted `EOF')
+ /bin/cat
+ gcc -O /tmp/r.c -lGL -lX11 -o /tmp/r
/usr/bin/ld: cannot find -lGL
collect2: error: ld returned 1 exit status

Do I need to symlink /usr/lib/i386-linux-gnu/mesa/libGL.so.1 somewhere else?

@amonakov
Owner Author

amonakov commented Apr 5, 2014

No, just additionally install libgl1-mesa-dev

@jmmL

jmmL commented Apr 5, 2014

Okay, it compiles properly now.

The test does not work for me, producing just a blank black window.
Xorg.0.log: http://pastebin.com/4T79vean

$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)
01:00.0 VGA compatible controller: NVIDIA Corporation GK107M [GeForce GT 640M] (rev ff)
$ apt-cache policy mesa-common-dev
mesa-common-dev:
  Installed: 10.1.0-4ubuntu1
$ apt-cache policy nvidia-331
nvidia-331:
  Installed: 331.38-0ubuntu7

I don't think I'd have the knowledge or time to do any git-bisecting, but happy to help in other, simpler ways if possible.

@nlooije

nlooije commented Apr 5, 2014

I don't know if it is at all useful, but forcing UXA on my system by specifying the following in /etc/X11/xorg.conf:

Section "Device"
Identifier "intel"
Driver "intel"
Option "AccelMethod" "uxa"
EndSection

results in glxgears and some (but not all) games showing output. Also the drawpixtest case now shows output.

@psi29a

psi29a commented Apr 7, 2014

I tried it but got corrupted output. Here is my xorg log:
http://pastebin.com/raw.php?i=S8P3eJNK

Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09) (prog-if 00 [VGA controller])

MESA: 10.1.0-4ubuntu

@deveee

deveee commented Apr 11, 2014

The same bug with:
Ubuntu 14.04 + gnome 3.10
Intel ivybridge hd4000
nvidia gt 635m

I upgraded yesterday; I haven't checked any workarounds yet.

@psi29a

psi29a commented Apr 11, 2014

I've tried both workarounds (not at the same time) and they both worked.

I'm currently telling the Intel driver to use "uxa" and now we are back to normal, but it would be nice to not have to do this. :)

@rohandhruva

It would be great if someone could do a mesa git bisect and figure out what caused the problem: https://bugs.freedesktop.org/show_bug.cgi?id=75295

I am trying to do it myself but I get the feeling I am not proficient enough!

@rohandhruva

@psi29a what is the second workaround? The only one I know is to force AccelMethod to UXA.

@deveee

deveee commented Apr 11, 2014

The second is setting the PRIMUS_UPLOAD=1 environment variable.
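
Because PRIMUS_UPLOAD is read from the environment, it can be set for a single command instead of the whole session. A minimal sketch, assuming a hypothetical wrapper named with_primus_upload; the sh -c command is only a stand-in so the effect is visible without primus installed:

```shell
# Hypothetical wrapper: run the given command with the upload workaround enabled,
# without changing the environment of the calling shell.
with_primus_upload() {
    env PRIMUS_UPLOAD=1 "$@"
}

# Stand-in child process that reports what it sees in its environment.
with_primus_upload sh -c 'echo "PRIMUS_UPLOAD=${PRIMUS_UPLOAD:-unset}"'

# Intended use (requires primus to be installed):
#   with_primus_upload primusrun glxgears
```

Scoping the variable to one command keeps the workaround from leaking into unrelated programs started from the same shell.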

@floe

floe commented Apr 12, 2014

Just for the record, both workarounds work for me, too. Using UXA, the test is actually a bit faster at 1.3s with vblank_mode=0.

However, when I run glxspheres as an ad-hoc benchmark, optirun/VirtualGL is still significantly faster than primusrun/Primus (175 MPixels/s vs 63 MPixels/s). I assume the speed bonus from Primus will only happen on SNA?

@amonakov
Owner Author

If you're affected by the problem, please grab an updated test:

cd /tmp
wget https://gist.githubusercontent.com/amonakov/8192522b71e3350857e4/raw/5e9e7dbd6817e9c236aafb703622368f136e9567/drawpixtest.c
bash -x drawpixtest.c

You can verify it still fails with /tmp/r 1000. If so, please do

export INTEL_DEBUG=all
/tmp/r 2>/tmp/intel-debug.txt
/tmp/r 3 2>>/tmp/intel-debug.txt

and gist/pastebin the resulting intel-debug.txt file.

Please also check if booting without splash makes a difference.

@amonakov
Owner Author

@floe, no, SNA vs UXA should not matter for PBO glDrawPixels. Regarding low performance, did you export vblank_mode=0 for primus testing? 63 Mpix/s sounds close to what you'd get with vsync enabled.

@brknkfr

brknkfr commented Apr 12, 2014

I'm affected by the problem. Here's the intel-debug.txt. Booting without splash doesn't make a difference. http://dpaste.com/hold/1777383/

@jmmL

jmmL commented Apr 12, 2014

@amonakov Here's my intel-debug.txt (with up-to-date 14.04). I will try without splash later on. http://pastebin.com/DeZj4PSE

@amonakov
Owner Author

Ah, sorry, a last-minute edit was a bit wrong. Please do:

export INTEL_DEBUG=all
/tmp/r &>/tmp/intel-debug.txt
/tmp/r 3 &>>/tmp/intel-debug.txt

(note the ampersand & rather than 2).

(haven't you noticed that some output was spewed on the console?)
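
The difference matters because 2> captures only standard error, and some of the debug output evidently goes to standard output. The sketch below shows the two redirections side by side; noisy is a hypothetical stand-in for /tmp/r, and the portable >file 2>&1 spelling is used in place of bash's &> shorthand, which is equivalent.

```shell
# Hypothetical stand-in for /tmp/r: one line on stdout, one on stderr.
noisy() {
    echo "line on stdout"
    echo "line on stderr" >&2
}

log_dir=$(mktemp -d)

# 2> redirects only stderr; the stdout line still reaches the console.
noisy 2>"$log_dir/stderr-only.txt"

# >file 2>&1 redirects both streams (bash also accepts: noisy &>file).
noisy >"$log_dir/both.txt" 2>&1

# Appending works the same way (bash also accepts: noisy &>>file).
noisy >>"$log_dir/both.txt" 2>&1

wc -l "$log_dir/stderr-only.txt" "$log_dir/both.txt"
```

The first log ends up with a single line, while the second collects all four, which mirrors the incomplete logs posted earlier in the thread.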

@brknkfr

brknkfr commented Apr 12, 2014

Ok, here is the complete log: http://dpaste.com/hold/1777421/

@jmmL

jmmL commented Apr 12, 2014

Here's my complete log: http://pastebin.com/y502YcZP

edited to include the same test without the splash boot param. http://pastebin.com/TzBhF8UY

@amonakov
Owner Author

My log looks the same; perhaps the issue is entirely on the SNA side.

I've filed a bug against the Intel driver: https://bugs.freedesktop.org/show_bug.cgi?id=77368

Please follow that bug report (add yourself to CC list there).

@karolherbst
Contributor

@amonakov Today I noticed that with 'primusrun glxgears' the window is black for maybe half a second, but not with 'optirun glxgears'. Do you think this might be related or is it something totally different?

@amonakov
Owner Author

Unless you export non-zero PRIMUS_UPLOAD, primus spends 0.2 seconds at startup to determine if glDrawPixels is fast. It should be the same for primusrun and optirun.

I wonder if glxgears is blocked when it's black. Try:

primusrun gdb glxgears
b glXSwapBuffers
commands 1
bt
end
r

and see if it spews while the window is black. Note that swapbuffers from primus' upload method check will also be there, so try with non-zero PRIMUS_UPLOAD as well.

@karolherbst
Contributor

I see. Also there are no messages on the console while in the breakpoint, the window is just black.

@amonakov
Owner Author

Oh, my instructions were incomplete; gdb should resume the program after dumping backtrace on breakpoint:

primusrun gdb glxgears
b glXSwapBuffers
commands 1
bt
c
end
r

(note added c after bt).

@jmmL

jmmL commented Apr 13, 2014

@amonakov I tried your instructions above. The glxgears window was black in both cases.
The output was:
without PRIMUS_UPLOAD=1 http://pastebin.com/XLSM29U8
with PRIMUS_UPLOAD=1 http://pastebin.com/eXY5WQpr

@karolherbst
Contributor

@amonakov with your corrected commands only backtraces are printed, and the picture stops until I hit enter (---Type <return> to continue, or q <return> to quit---)

@amonakov
Owner Author

OK, the problem was investigated. The black screen is due to glDrawPixels not working with MSAA, and MSAA is enabled where it shouldn't be due to an Xorg bug. The next Xorg releases should correct the issue.

@amonakov
Owner Author

@karolherbst, sorry, I forgot about that; try

primusrun gdb glxgears
set height 0
b glXSwapBuffers
commands 1
c
end
r

@jmmL

jmmL commented Apr 14, 2014

An updated xserver-xorg-core (and associated packages) containing the "glx: Clear new FBConfig attributes to 0 by default" fix has just been pushed to trusty-proposed.

This issue is now resolved for me.

@floe

floe commented Apr 14, 2014

Can confirm that this is now fixed on the current Ubuntu 14.04 beta. Thanks everyone!

@th0br0

th0br0 commented Sep 30, 2014

Sadly, this is not limited to Intel Haswell.
I'm running into this bug on an Intel(R) Core(TM) i7-2720QM CPU @ 2.20GHz with a NVIDIA Quadro 1000M (ThinkPad W520). Also running mesa 10.1 (mesa-dri-drivers-10.1.5-1.20140607.fc20.x86_64)
