performance in bumblebee 3.0.1 #241

Closed
rockorequin opened this Issue · 33 comments

6 participants

@rockorequin

I did some benchmarking with glxspheres in Ubuntu 12.10 using "optirun -c [transport] glxspheres":

proxy: 120 fps
yuv: 160 fps
jpg: 175 fps

Unfortunately, wine doesn't work with yuv or jpg (I get a timeout error 'Could not connect to VGL client.').

However, I found that passing --vgl-options '-fps 60' to optirun makes both proxy and yuv also perform at 175 fps.

So by adding --vgl-options '-fps 60' to optirun and using the default proxy transport I also get better performance under wine.

Can anyone else replicate these results?

@wonhyo
@ArchangeGabriel

For pure glxspheres results I get this (fps without -fps 60, fps with it, CPU usage, and image quality relative to the others; quality is always "correct"):

  • proxy: 119, 135, medium, perfect
  • yuv: 148, 165, quite high, low
  • jpeg: 155, 159, very high, medium (should be high with compression set to 100 instead of 95=default, will try again)
  • rgb: 155, 175, high, perfect
  • xv: 84, 87, medium, low

By the way, this is the sort of thing I want to set in #139.

@rockorequin

Interesting that your rgb results are so much better than mine - I only get 82 fps (without -fps) and 87 fps (with -fps 60) with rgb. And yes, it would be good to be able to set defaults as per #139.

Do you know why using transport mechanisms other than proxy works with glxspheres but not with wine?

@ArchangeGabriel

For your last question, yes. proxy forwards directly through the X11 protocol, while the others use a separate mechanism for forwarding, and that is where compression happens. That may be what causes problems for wine and other apps. I can dig up the details if you want, but everything about this is in the VirtualGL documentation.

That's why we defaulted to proxy: although it doesn't have the best performance, it has the best quality and "stability".

@Lekensteyn
Owner

Today I had the chance to test https://lists.launchpad.net/bumblebee/msg00155.html
I noticed a minimal performance improvement, not as gigantic as reported in the message. This may be related to the fact that VGL_READBACK=pbo does not double my performance as it did for someone else.

Driver: nvidia blob 304.43 (extracted, created some symlinks)
Xorg: 1.12.4
Discrete: Nvidia GT 425M
Integrated: Intel HD Graphics (1st gen) on i5-460M
Using KDE 4.9 with desktop effects enabled.
VirtualGL: 2.3.1 (unpatched)

Transport plugin was just created by running make when extracting the plugin. Adjust the LD_LIBRARY_PATH variable below if you run it somewhere else.

X was started with:

sudo LD_LIBRARY_PATH=$PWD X :8 -config /etc/bumblebee/xorg.conf.nvidia -sharevts -nolisten tcp -noreset -verbose 3 -isolateDevice PCI:01:00:00 -modulepath $PWD,/usr/lib/xorg/modules

VirtualGL was run like this (proxy substituted, VGL_READBACK/VGL_TRANSPORT uncommented as needed):

export LD_LIBRARY_PATH=$PWD
# export VGL_TRANSPORT=optirun
# export VGL_READBACK=pbo
vglrun -c proxy -d :8 +v glxspheres

VGL_TRANSPORT=optirun VGL_READBACK=
73.622034 frames/sec - 76.319546 Mpixels/sec
74.568683 frames/sec - 77.300880 Mpixels/sec
74.923787 frames/sec - 77.668994 Mpixels/sec
74.986752 frames/sec - 77.734267 Mpixels/sec
75.054572 frames/sec - 77.804572 Mpixels/sec
74.966318 frames/sec - 77.713084 Mpixels/sec

VGL_TRANSPORT=optirun VGL_READBACK=pbo
74.864436 frames/sec - 77.607469 Mpixels/sec
74.058901 frames/sec - 76.772419 Mpixels/sec
74.133249 frames/sec - 76.849491 Mpixels/sec
74.187917 frames/sec - 76.906162 Mpixels/sec
74.566359 frames/sec - 77.298470 Mpixels/sec
74.020245 frames/sec - 76.732346 Mpixels/sec

(yuv/rgb/proxy/xv does not seem to matter)

Compare this to the stock VGL transport:
VGL_READBACK=
-c xv (with pbo it drops by about 1 fps)
56.221720 frames/sec - 58.281684 Mpixels/sec
59.427867 frames/sec - 61.605304 Mpixels/sec
60.811085 frames/sec - 63.039203 Mpixels/sec
60.157946 frames/sec - 62.362134 Mpixels/sec
60.746437 frames/sec - 62.972187 Mpixels/sec
-c rgb (with pbo minimal positive difference)
65.037506 frames/sec - 67.420480 Mpixels/sec
66.110678 frames/sec - 68.532974 Mpixels/sec
66.104293 frames/sec - 68.526355 Mpixels/sec
65.768096 frames/sec - 68.177839 Mpixels/sec
66.428777 frames/sec - 68.862728 Mpixels/sec
-c jpeg (with pbo performance is stabler and about 5 fps improvement)
68.350256 frames/sec - 70.854610 Mpixels/sec
67.546118 frames/sec - 70.021008 Mpixels/sec
71.201972 frames/sec - 73.810812 Mpixels/sec
68.135823 frames/sec - 70.632320 Mpixels/sec
-c proxy (with pbo about 2fps more)
61.066646 frames/sec - 63.304128 Mpixels/sec
65.169793 frames/sec - 67.557614 Mpixels/sec
63.926600 frames/sec - 66.268870 Mpixels/sec
64.120791 frames/sec - 66.470177 Mpixels/sec
-c yuv (pbo gives 3-4 more fps)
70.231243 frames/sec - 72.804516 Mpixels/sec
70.079844 frames/sec - 72.647569 Mpixels/sec
70.456350 frames/sec - 73.037871 Mpixels/sec
70.039259 frames/sec - 72.605497 Mpixels/sec

DISPLAY=:8 glxspheres was capped at 60fps, I could not quickly find an option to disable vsync.
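For anyone repeating runs like the ones above, the frames/sec lines can be averaged into one figure per transport. A minimal sketch, assuming the glxspheres output of a run has been saved to a file; the function name avg_fps and any log file names are hypothetical and not part of VirtualGL or Bumblebee:

```shell
#!/bin/sh
# avg_fps: read glxspheres output on stdin and print the mean of the
# "frames/sec" figures (first field of each matching line).
avg_fps() {
    awk '/frames\/sec/ { sum += $1; n++ }
         END {
             if (n) printf "%.2f\n", sum / n
             else   print "no samples"
         }'
}
```

Fed the four -c yuv lines listed above (for example with `avg_fps < yuv.log`, where yuv.log is a hypothetical capture of that run), this prints 70.20.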

For my setup there were two clear choices:

  • VGL_TRANSPORT=optirun with VGL_READBACK=pbo. The compression method does not seem to matter.
  • VGL_READBACK=pbo VGL_COMPRESS=yuv

Does the said transport plugin improve your performance?

@rockorequin

I tried the command 'env vblank_mode=0 glxspheres' suggested in the linked email to disable vsync on the Intel GPU, and I get about 145 fps, which is faster than nvidia through proxy on vgl. However, I noticed that if I have something else running using up CPU (eg a virtualbox VM running updates), nvidia through proxy on vgl goes up to around 150 fps. Odd.

I built the optirun plugin fine, but when I try running X with the given command line, it either bombs out saying it can't load glx or stops at the line where it says "loading glx". Any ideas?

@rockorequin

But if I just run "VGL_TRANSPORT=optirun optirun glxspheres" from the folder containing the libvgltrans_optirun.so file I get around 165-175 fps, which is a definite improvement over the default 120-130 fps. So for me, the optirun plugin is offering a good improvement (up to 40%) in speed.

I also tried running with -c yuv and -c jpg, and they run at about the same speed as proxy when using VGL_TRANSPORT=optirun. Also, limiting the fps via --vgl-options makes no difference.

@rockorequin

I wanted to test the plugin with wine. libvgltrans_optirun.so didn't work with wine, so I made a 32 bit version:


ldd libvgltrans_optirun32.so
linux-gate.so.1 => (0xf778b000)
libpthread.so.0 => /lib/i386-linux-gnu/libpthread.so.0 (0xf7747000)
libdl.so.2 => /lib/i386-linux-gnu/libdl.so.2 (0xf7742000)
libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf759c000)
/lib/ld-linux.so.2 (0xf778c000)

but this doesn't work either - it says libvgltrans_optirun32.so not found. I tried setting LD_LIBRARY_PATH correctly, and even putting the 32 bit library in /usr/lib32 and /usr/lib/i386-linux-gnu. Does anyone know why?

@rockorequin

Partially answering my own question: if I use VGL_TRANSPORT=optirun32, it doesn't work with wine, but if I add --vgl-options='-trans optirun32', it at least loads the plugin.

But then I get a new error: ERROR: in transport plugin-- RRTransInit: dlopen of libX11.so failed.

@rockorequin

And the full answer is that I have to use the 64 bit version of the plugin even though I am running wine, and put it in /usr/lib/x86_64-linux-gnu, and then it works with either VGL_TRANSPORT=optirun or via the -trans optirun option.

@rockorequin

Actually, scratch that. It isn't working now. I did have something running but must have not enabled the plugin correctly.

@Lekensteyn
Owner

If X complains about the location of libglx.so, add it to the -modulepath option (I had $PWD in place of it).
The option VGL_TRANSPORT=optirun causes VGL to search for "libvgltrans_optirun.so" in the library path. Is your Wine 64-bit or 32-bit? A 32-bit transport plugin needs a 32-bit libX11.so.
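The lookup described above can be checked by hand. A rough sketch, assuming the plugin is expected in one of the LD_LIBRARY_PATH directories; find_vgl_plugin is a hypothetical helper, and the real dynamic linker also searches its default directories and cache, which this does not cover:

```shell
#!/bin/sh
# find_vgl_plugin NAME [SEARCH_PATH]
# Print the first libvgltrans_NAME.so found in the colon-separated
# SEARCH_PATH (defaults to $LD_LIBRARY_PATH). Returns 1 if none is found.
# The body runs in a subshell so the IFS and globbing changes stay local.
find_vgl_plugin() (
    name=$1
    search=${2:-${LD_LIBRARY_PATH:-}}
    IFS=:
    set -f
    for dir in $search; do
        [ -n "$dir" ] || continue
        if [ -e "$dir/libvgltrans_$name.so" ]; then
            printf '%s\n' "$dir/libvgltrans_$name.so"
            exit 0
        fi
    done
    exit 1
)
```

Running `find_vgl_plugin optirun` before starting the application shows which file would be picked up from LD_LIBRARY_PATH. It does not check whether the file is 32-bit or 64-bit; the `file` command can be used on the printed path for that.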

@rockorequin

The wine problem (it's 32 bit wine, hence why I compiled a 32 bit plugin) was that there was a libX11.so.6 file in /usr/lib/i386-linux-gnu, but no libX11.so. And the plugin only looks for libX11.so. (And the same issue existed for libXext.so.) I symlinked them and it now works.
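The symlink fix can be sketched as a small helper. This is only an illustration, assuming a versioned library such as libX11.so.6 is present in the directory; link_unversioned is a hypothetical name, and writing to a system library directory needs root:

```shell
#!/bin/sh
# link_unversioned DIR LIB
# If DIR/LIB.so is missing, create it as a symlink to the first
# versioned DIR/LIB.so.* that exists. Returns 1 if none is found.
link_unversioned() {
    dir=$1
    lib=$2
    # Nothing to do if the unversioned name already exists.
    [ -e "$dir/$lib.so" ] && return 0
    for f in "$dir/$lib".so.*; do
        if [ -e "$f" ]; then
            # Relative link, e.g. libX11.so -> libX11.so.6
            ln -s "$(basename "$f")" "$dir/$lib.so"
            return 0
        fi
    done
    echo "no versioned $lib.so.* found in $dir" >&2
    return 1
}
```

For the case described here it would be called as `link_unversioned /usr/lib/i386-linux-gnu libX11` and again with libXext.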

Now when I try the X server command it tells me it can't run in framebuffer mode. But isn't the main thing whether I get better frame rates using the optirun plugin, which I showed was so by running "VGL_TRANSPORT=optirun optirun glxspheres"?

@rockorequin

Just some feedback on wine: CoDMW3 ran fine with the plugin, but Crysis2 fails when it first starts a level (ie when you resume a game and it finishes loading). There's not much info as to why, though:

[VGL] glDrawBuffer (mode=0x00000405 pbw->_dirty=0 pbw->_rdirty=0 pbw->getglxdrawable()=0x0080000d ) 0.000000 ms
[VGL] glXSwapBuffers (dpy=0x7d93f038(:0) drawable=0x04e00002 pbw->getglxdrawable()=0x0080000d ) 0.000000 ms
[VGL] glDrawBuffer (mode=0x00000000 pbw->_dirty=0 pbw->_rdirty=0 pbw->getglxdrawable()=0x0080000d ) 0.000000 ms
trace:d3d:context_apply_draw_buffers glDrawBuffer() call ok context.c / 1896
trace:d3d:context_check_fbo_status FBO complete
trace:d3d:context_apply_clear_state glEnable GL_SCISSOR_TEST call ok context.c / 2237
trace:d3d:device_clear_render_targets glClearStencil call ok device.c / 726
trace:d3d:device_clear_render_targets glClearDepth call ok device.c / 739
trace:d3d:device_clear_render_targets glScissor call ok device.c / 775
trace:d3d:device_clear_render_targets glClear call ok device.c / 777
trace:d3d:context_release Releasing context 0x1bfb08, level 1.
trace:d3d:wined3d_device_set_render_state device 0x1a9698, state WINED3D_RS_COLORWRITEENABLE (0xa8), value 0.
trace:d3d:wined3d_device_set_render_state device 0x1a9698, state WINED3D_RS_COLORWRITEENABLE1 (0xbe), value 0.
trace:d3d:wined3d_device_set_render_state device 0x1a9698, state WINED3D_RS_COLORWRITEENABLE2 (0xbf), value 0.
trace:d3d:wined3d_device_set_render_state device 0x1a9698, state WINED3D_RS_COLORWRITEENABLE3 (0xc0), value 0.
trace:d3d:wined3d_device_set_render_state device 0x1a9698, state WINED3D_RS_ZENABLE (0x7), value 0x1.
trace:d3d:wined3d_device_set_render_state device 0x1a9698, state WINED3D_RS_SLOPESCALEDEPTHBIAS (0xaf), value 0x3ecccccd.
trace:d3d:wined3d_device_set_render_state device 0x1a9698, state WINED3D_RS_ZWRITEENABLE (0xe), value 0x1.
trace:d3d:wined3d_device_set_pixel_shader device 0x1a9698, shader 0xd4d4e7a0.
trace:d3d:wined3d_device_set_render_state device 0x1a9698, state WINED3D_RS_CULLMODE (0x16), value 0x2.
[VGL] dlopen (filename=/usr/local/bin/../lib/wine/faultrep.dll.so flag=2 retval=0xdf3be6e0)
fixme:faultrep:ReportFault 0xc55cba0 0x0 stub
[VGL] XQueryExtension (dpy=0x7d453680(:0) name=XFIXES *major_opcode=138 *first_event=87 *first_error=140 ) 0.139952 ms

And turning off sync to vblank in nvidia-settings may have fixed that (Crysis2 now is running).

@Lekensteyn
Owner

Yes, the main thing is changing optirun. The X command is normally run by bumblebeed, but I mentioned it in case someone wants to test it quickly without bumblebeed (for completeness).

@rockorequin

Ah, ok, thanks. That X command actually will help me out with issue #237 (crashing nvidia driver) if I can get it working.

@rockorequin

There's an opengl benchmark that might be of interest at http://www.ozone3d.net/benchmarks/fur/ (you need wine and a 32 bit version of the optirun plugin). I don't see too much difference between the optirun plugin for VGL transport and the default transport, but optirun in both cases is around 4.5 times faster than the intel card.

@karolherbst

I have a little question: where can I get libvgltrans_optirun.so from? Or how can I build it myself?

@Lekensteyn
Owner

@karolherbst See the attachment in the linked mailing list message (https://lists.launchpad.net/bumblebee/msg00155.html)

@karolherbst

Hmm, thanks, but the improvements are not as high as expected (VGL_READBACK=pbo).

VGL_TRANSPORT= optirun -c yuv
glxspheres
179.403653 frames/sec - 200.214477 Mpixels/sec
178.046045 frames/sec - 198.699386 Mpixels/sec
sauerbraten - lowest: around 105 fps
sauerbraten - highest: around 58-62 fps
alanwake(32bit) - menu: 102-110
alanwake(32bit) - ingame: 5+
trine 2(32bit) - menu: 39fps
trine 2(32bit) - ingame 1. : 30-39 fps

VGL_TRANSPORT=optirun optirun -c yuv
glxspheres
Polygons in scene: 62464
Visual ID of window: 0x20
Context is Direct
OpenGL Renderer: GeForce GT 630M/PCIe/SSE2
181.338519 frames/sec - 202.373787 Mpixels/sec
181.046682 frames/sec - 202.048097 Mpixels/sec
sauerbraten - lowest: around 120 fps
sauerbraten - highest: around 56-60 fps
alanwake(32bit) - menu: 38-52
alanwake(32bit) - ingame: 4-5
trine 2(32bit) - menu: 35fps
trine 2(32bit) - ingame 1. : 25-32 fps

PS: some compression tests:
VGL_TRANSPORT= optirun -c proxy
alanwake(32bit) - menu: 38-52
alanwake(32bit) - ingame: 5-6

So there are no speed-ups with VGL_TRANSPORT=optirun over -c yuv.
Compared with -c proxy, VGL_TRANSPORT=optirun gives big improvements, but -c yuv is also faster than -c proxy.

Same benchmark as in the mailing list message:
1.: env vblank_mode=0 glxspheres
2.: env optirun -c proxy glxspheres
3.: env VGL_TRANSPORT=optirun optirun -c proxy glxspheres
Additional test:
4.: env optirun -c yuv glxspheres

  1. 191 fps
  2. 148 fps
  3. 181 fps
  4. 177 fps

Also, while running some games with optirun, the performance is much better than with the integrated card. It just seems that Bumblebee loses performance at high fps values (glxgears on the integrated card: 6000 fps, with optirun: 1300 fps).
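One way to read the glxgears figures is as time per frame. The sketch below only does that arithmetic; frame_ms is a hypothetical helper, and the idea that the bridge adds a roughly constant cost per frame is an assumption, not something measured in this thread:

```shell
#!/bin/sh
# frame_ms FPS: print the time spent on one frame, in milliseconds.
frame_ms() {
    awk -v fps="$1" 'BEGIN { printf "%.3f\n", 1000 / fps }'
}

frame_ms 6000   # integrated glxgears: prints 0.167
frame_ms 1300   # optirun glxgears:    prints 0.769
```

Under that assumption the difference is roughly 0.6 ms per frame, which is large for a frame that takes well under a millisecond but small next to the 33 ms frame of a game running at 30 fps.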

@amonakov
Collaborator

Now there's also an alternative faster transport mechanism called primus. Arch users can use AUR packages, linked from this thread. To use it, invoke primusrun instead of optirun.

@rockorequin

I tried primus on Ubuntu 12.10 and got good results with glxspheres, although it complained that glXUseXFont isn't implemented, so it perhaps wasn't an entirely fair comparison. Note that I needed to export vblank_mode=0 to get results above 60 fps.

First, note that optirun performance under the latest Ubuntu 12.10 (using unity) is significantly worse than before - previously I was getting 120-130 fps with "optirun glxspheres", and now it struggles to get 100 fps (which is similar to using the intel driver). Using "optirun --vgl-options '-fps 60' glxspheres" I currently get results of around 125 fps, compared to 175 fps in earlier versions of Ubuntu 12.10.

Using primus in this (slower) environment, glxspheres managed an impressive 200-225 fps, ie more than double the rate of optirun with default settings.

It took me a while to figure out how to modify the vars in primusrun to get it working (eg I found that $LIB wasn't being interpreted correctly by the dynamic linker), so for the record here are the settings I used (note that I compiled primus' libGL.so.1 into lib/x86_64-linux-gnu):

export PRIMUS_libGLa=${PRIMUS_libGLa:-'/usr/lib/nvidia-current/libGL.so.1'}
export PRIMUS_libGLd=${PRIMUS_libGLd:-'/usr/lib/x86_64-linux-gnu/mesa/libGL.so.1'}
export PRIMUS_libGL=/home/src/primus/lib/x86_64-linux-gnu
export PRIMUS_LOAD_GLOBAL=${PRIMUS_LOAD_GLOBAL:-'/usr/lib/x86_64-linux-gnu/libglapi.so.0'}
export LD_LIBRARY_PATH=${PRIMUS_libGL}${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
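The last export above relies on the ${VAR:+...} form so that no stray colon is left behind when LD_LIBRARY_PATH is empty; an empty entry in the library search path is treated as the current directory. A small sketch of the same idiom, with prepend_path as a hypothetical helper name:

```shell
#!/bin/sh
# prepend_path NEW EXISTING
# Print NEW alone if EXISTING is empty, otherwise NEW:EXISTING.
# ${existing:+:$existing} expands to ":$existing" only when existing
# is set and non-empty, and to nothing otherwise.
prepend_path() {
    new=$1
    existing=$2
    printf '%s\n' "${new}${existing:+:$existing}"
}

prepend_path /home/src/primus/lib/x86_64-linux-gnu ""
prepend_path /home/src/primus/lib/x86_64-linux-gnu /usr/lib/nvidia-current
```

The first call prints just the primus directory; the second prints the two directories joined by a single colon.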

So it looks like primus is using the X server set up by bumblebee to do this, is that correct? And what is needed to make it a 'full' solution, eg to implement glXUseXFont and advertise GLX extensions back to the client?

Update: Once I figured out that I needed g++-multilib installed to build the 32 bit version of primus, I was also able to try CoDMW3 under wine, and primus plays it nicely. Its profiling reported 26-40 fps (typically around 28 fps) on the 'hit and run' level at 1600x900 resolution, but I don't have any fps output from optirun to compare it to.

@rockorequin

I tried using primus with CoDMW1, and CoD's self-reported frame rate was only around 15 fps in the 'War Pig' level, compared to optirun which gets around 30 fps.

@amonakov
Collaborator
@amonakov
Collaborator
@rockorequin

I got the latest primus from git, rebuilt it, and tried prepending the PRIMUS_SYNC settings when running CoDMW, but it didn't make any difference. (I'm running the app 'full-screen' with 'unredirect full screen windows' turned on in compiz; perhaps that is why.)

Primus profiling isn't saying anything at all. It does for glxspheres, but not for the wine app.

Also, I notice occasional video freezing (typically for 1-5 seconds) while the game runs in the background.

@amonakov
Collaborator
@rockorequin

Just to correct my earlier mistake: I exported the wrong vars for the 32 bit wine environment with CoDMW, so it wasn't using primus at all. That explains the lack of profiling output as well as the slow framerates.

Using primus correctly, I get around 40 fps compared to 30 fps with optirun.

@rockorequin

Well, I've used primus for a couple of weeks now and it is great. I would certainly recommend it to be at least an option in bumblebee (perhaps the default option?) until nvidia gets prime working, which doesn't look close at this stage given the difficulty they are having with dma-buf licence issues in the kernel.

@Lekensteyn
Owner

We're currently discussing this. This week we'll probably have a discussion on #bumblebee-dev if time permits.

@ArchangeGabriel

We will be releasing Bumblebee 3.1 with primus support today, since primus has made a lot of progress lately.

@rockorequin

Super! I've got bumblebee 3.1.2 installed from the ubuntu PPA. Does it use primus automatically or do I need to configure it?

@amonakov
Collaborator

See the release notes. You can use optirun -b primus, or edit the config file (Bridge variable), or use primusrun like before.
