
Replace VGL with a Compositing Window Manager. #37

Closed

Thulinma opened this issue Jan 16, 2012 · 22 comments
@Thulinma (Member) commented Jan 16, 2012

So, this is my idea for a possible v4.0 or something:

I'm sure you guys have heard of and/or know about compositing window managers.
Any application (one at a time, max) can register itself with an X server as such a window manager.
Usually this stuff is used for fancy 3D effects and all that crap we don't really care about; think Compiz, for example.

The cool part is that in this "mode", all windows get their own framebuffer, off-screen. We can obviously still do this on the nvidia card even though there is no real desktop active.

I want to connect to :8 as such a window manager, and instead of drawing all windows to :8, draw them to :0.
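
For reference, registering as the compositing manager on :8 boils down to claiming the _NET_WM_CM_S0 selection and redirecting windows off-screen. A minimal, untested sketch with libXcomposite (the display name and event loop are just placeholders):

```c
/* Minimal sketch: claim the compositing-manager role on :8 and redirect
 * all top-level windows to off-screen pixmaps.
 * Build: gcc compositor.c -o compositor -lX11 -lXcomposite
 */
#include <stdio.h>
#include <stdlib.h>
#include <X11/Xlib.h>
#include <X11/Xatom.h>
#include <X11/extensions/Xcomposite.h>

int main(void)
{
    Display *dpy = XOpenDisplay(":8");   /* the headless nvidia server */
    if (!dpy) {
        fprintf(stderr, "cannot open display :8\n");
        return EXIT_FAILURE;
    }

    int ev_base, err_base;
    if (!XCompositeQueryExtension(dpy, &ev_base, &err_base)) {
        fprintf(stderr, "Composite extension missing on :8\n");
        return EXIT_FAILURE;
    }

    /* Only one client may own _NET_WM_CM_S<screen>; owning it is what
     * makes us "the" compositing manager for that screen. */
    Atom cm = XInternAtom(dpy, "_NET_WM_CM_S0", False);
    if (XGetSelectionOwner(dpy, cm) != None) {
        fprintf(stderr, "another compositor already owns :8\n");
        return EXIT_FAILURE;
    }
    Window owner = XCreateSimpleWindow(dpy, DefaultRootWindow(dpy),
                                       0, 0, 1, 1, 0, 0, 0);
    XSetSelectionOwner(dpy, cm, owner, CurrentTime);

    /* From now on, every child of the root window is rendered into an
     * off-screen pixmap instead of onto the (headless) screen. */
    XCompositeRedirectSubwindows(dpy, DefaultRootWindow(dpy),
                                 CompositeRedirectManual);
    XSync(dpy, False);

    /* ...event loop: track MapNotify/damage events and grab each
     * window's buffer via XCompositeNameWindowPixmap()... */
    return EXIT_SUCCESS;
}
```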

To compare:

  • VirtualGL works by capturing the window first, encoding it (to JPEG, for example), transferring it over a socket, decoding it, then finally drawing it to the screen. Input goes similarly, but in the reverse direction. That means the screen image data is copied at least SIX times - twice in socket buffers, twice for encoding, twice for decoding. More if they didn't bother doing this efficiently, and since VirtualGL is meant to be used over a network, its biggest bottleneck is probably network speed, not drawing speed.
  • windump works by capturing the window, transferring it over a socket, then drawing it to the screen. No encoding/decoding here, so it's obviously quite a bit faster.
  • My proposed method would not do any capturing (after all, the framebuffer is readily available) and would directly copy the window framebuffer to the other screen. One copy operation (a rough proof-of-concept copy loop is sketched below). In theory that's at least twice as fast as anything else out there; in practice it will most likely be somewhere around 10-100 times as fast. Proper framerates might even be a possibility here! ;-)
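
To show what I mean by the copy loop (not real code, just an illustration; XGetImage/XPutImage round-trips through the client, so a real implementation would have to keep the copy GPU-side, e.g. via shared memory or texture-from-pixmap):

```c
/* Proof-of-concept frame copy: pull a redirected window's pixmap from :8
 * and push it to a window on :0. The window IDs are hypothetical and
 * assumed to be set up elsewhere; depths and sizes are assumed to match.
 * Build: gcc copyframe.c -o copyframe -lX11 -lXcomposite
 */
#include <X11/Xlib.h>
#include <X11/extensions/Xcomposite.h>

void copy_frame(Display *src, Window src_win,        /* on :8 */
                Display *dst, Window dst_win, GC gc, /* on :0 */
                unsigned int w, unsigned int h)
{
    /* The off-screen buffer the compositing redirect gave us. */
    Pixmap pix = XCompositeNameWindowPixmap(src, src_win);

    /* Slow path: fetch the pixels into the client... */
    XImage *img = XGetImage(src, pix, 0, 0, w, h, AllPlanes, ZPixmap);
    if (img) {
        /* ...and push them to the Intel server. */
        XPutImage(dst, dst_win, gc, img, 0, 0, 0, 0, w, h);
        XDestroyImage(img);
    }
    XFreePixmap(src, pix);
    XFlush(dst);
}
```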

Note: this all sounds great, but the amount of work should not be underestimated. We should plan and divide the tasks properly if we want this to work well.

Thoughts?

@Lekensteyn (Member) commented Jan 16, 2012

Perhaps we can ask a developer from Compiz/KWin or another compositing WM to help us with this? Those projects should have documentation available. At the moment I don't know how framebuffers and such work, so some docs would be useful here. If we go in this direction, we'll probably need more developers once the task list is ready.

@ArchangeGabriel (Member) commented Jan 16, 2012

Indeed, that's the big future change we need to provide in Bumblebee. We've always referred to it as the alternative backend.

Maybe an intermediate step would be to clean up the VGL code to keep only what we really need, since we're not making standard use of that project.

@starks (Member) commented Jan 18, 2012

Would VDPAU or VAAPI be possible with such a setup?

@Thulinma (Member, Author) commented Jan 18, 2012

Yeah, that's the main advantage: in theory everything would run EXACTLY like it would on "native" nvidia hardware, the only exception being that there will most likely be a (tiny) delay and/or (some) dropped frames between the two X servers. We will have to test to know for sure, of course.

@Samsagax (Member) commented Jan 19, 2012

Sounds great on paper. I wonder how we can implement this? Do we have any knowledge of WM internals?

@smspillaz commented Jan 19, 2012

Compositing window managers don't know about framebuffers on different GPUs. They only know about the pixmaps the X server tells them are available on the screen, and there is only one screen per GPU.

It sounds like what you want is PRIME: http://airlied.livejournal.com/71734.html

PRIME creates a "slave" screen which is effectively not managed by the GPU, but provides an entry point for gpu2 to do its rendering and provides the backing pixmap to the server, which the compositing window manager can then render on screen using gpu1 via tfp (texture-from-pixmap), thanks to changes in the DRI protocol.
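
(For those who haven't seen tfp before: GLX_EXT_texture_from_pixmap lets a GL client bind an X pixmap as a texture. Roughly, with fbconfig selection and context setup elided, it is used like this; treat it as an untested sketch:)

```c
/* Sketch of GLX_EXT_texture_from_pixmap: bind a window's backing pixmap
 * as a GL texture so the compositor can draw it with the other GPU.
 * cfg must be an fbconfig supporting GLX_BIND_TO_TEXTURE_RGBA_EXT, and a
 * current GL context is assumed. Build: gcc tfp.c -lGL -lX11
 */
#include <GL/glx.h>

typedef void (*BindTexImageFn)(Display *, GLXDrawable, int, const int *);

GLXPixmap bind_window_pixmap(Display *dpy, GLXFBConfig cfg,
                             Pixmap pix, GLuint tex)
{
    const int attrs[] = {
        GLX_TEXTURE_TARGET_EXT, GLX_TEXTURE_2D_EXT,
        GLX_TEXTURE_FORMAT_EXT, GLX_TEXTURE_FORMAT_RGBA_EXT,
        None
    };
    /* Wrap the X pixmap in a GLX pixmap that can back a texture. */
    GLXPixmap glxpix = glXCreatePixmap(dpy, cfg, pix, attrs);

    BindTexImageFn bindTexImage = (BindTexImageFn)
        glXGetProcAddress((const GLubyte *)"glXBindTexImageEXT");

    glBindTexture(GL_TEXTURE_2D, tex);
    bindTexImage(dpy, glxpix, GLX_FRONT_EXT, NULL);
    /* The pixmap's contents are now sampleable as GL_TEXTURE_2D. */
    return glxpix;
}
```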

@Thulinma (Member, Author) commented Jan 20, 2012

@smspillaz Yes, that's pretty much what we're doing here, too. Bumblebee also works by creating a "slave" screen, except we would be doing the transfer from Nvidia -> Intel in userspace instead of in the drivers themselves. Also, gpu2 does have control over the screen in our case.
The idea here is to connect to both X servers at the same time and copy data from one to the other, using a compositing window manager to read out the framebuffers.

@starks (Member) commented Jan 23, 2012

Just linking my previous research for reference: Bumblebee-Project/Bumblebee-old#38

As far as I'm concerned, Xpra was a good idea but dead in the water. It simply can't render fast enough.

@starks (Member) commented Feb 17, 2012

Now that we have capable C coders, perhaps a closer examination of windump is in order.
https://github.com/harp1n/hybrid-windump

From a cursory glance, based on my C knowledge, there's really not much that needs to be done aside from implementing a sane windowing scheme that isn't based wholly on the window's hex ID (a rough lookup-by-name sketch is at the end of this comment) and using a compositor like xcompmgr by default (necessary for VDPAU).

Just strip out windump's key assumptions:

  • Parallel Intel/Nvidia X servers instead of a nested Nvidia one. Windump could be used in place of vglrun.
  • Giving the Nvidia screen its own cursor
  • Dumping the root window unless explicitly told otherwise
  • Dumping a child window, but not its children
  • Dumped window cannot exceed framebuffer resolution

As things stand, windump works with unmodified Bumblebee xorg.conf files. The only addition I'd recommend is a virtual screen that matches the largest resolution supported by the Intel EDID.
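
As a starting point for the hex-ID problem, walking the window tree and matching on WM_NAME is only a few lines of Xlib. Untested sketch (a real version should also check _NET_WM_NAME and handle duplicate names):

```c
/* Sketch: find a window by its WM_NAME instead of a raw hex ID.
 * Usage: find_window_by_name(dpy, DefaultRootWindow(dpy), "glxspheres")
 * ("glxspheres" is just an example name). Build: gcc findwin.c -lX11
 */
#include <string.h>
#include <X11/Xlib.h>

Window find_window_by_name(Display *dpy, Window win, const char *name)
{
    /* Check this window's WM_NAME first. */
    char *wm_name = NULL;
    if (XFetchName(dpy, win, &wm_name) && wm_name) {
        int match = (strcmp(wm_name, name) == 0);
        XFree(wm_name);
        if (match)
            return win;
    }

    /* Otherwise recurse into its children. */
    Window root, parent, *children = NULL;
    unsigned int nchildren = 0;
    Window found = None;
    if (XQueryTree(dpy, win, &root, &parent, &children, &nchildren)) {
        for (unsigned int i = 0; i < nchildren && found == None; i++)
            found = find_window_by_name(dpy, children[i], name);
        if (children)
            XFree(children);
    }
    return found;
}
```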

@starks (Member) commented Feb 20, 2012

Btw, here are some benchmarks.

Kernel 3.3:

[eric@kingfisher ~]$ optirun glxspheres
Polygons in scene: 62464
Visual ID of window: 0x21
Context is Direct
OpenGL Renderer: Gallium 0.4 on NVC3
21.640438 frames/sec - 24.150729 Mpixels/sec
20.795395 frames/sec - 23.207661 Mpixels/sec
21.295474 frames/sec - 23.765749 Mpixels/sec
21.557028 frames/sec - 24.057644 Mpixels/sec

[eric@kingfisher ~]$ DISPLAY=:8 glxspheres
Polygons in scene: 62464
Visual ID of window: 0x281
Context is Direct
OpenGL Renderer: Gallium 0.4 on NVC3
136.879122 frames/sec - 152.757100 Mpixels/sec
134.316217 frames/sec - 149.896898 Mpixels/sec
139.144567 frames/sec - 155.285337 Mpixels/sec
145.671430 frames/sec - 162.569315 Mpixels/sec

Kernel 3.2:

[eric@kingfisher ~]$ optirun glxspheres
Polygons in scene: 62464
Visual ID of window: 0x21
[VGL] WARNING: The OpenGL rendering context obtained on X display
[VGL] :8 is indirect, which may cause performance to suffer.
[VGL] If :8 is a local X display, then the framebuffer device
[VGL] permissions may be set incorrectly.
Context is Indirect
OpenGL Renderer: Gallium 0.4 on NVC3
17.082702 frames/sec - 19.064295 Mpixels/sec
16.375704 frames/sec - 18.275285 Mpixels/sec
16.448742 frames/sec - 18.356796 Mpixels/sec
16.182621 frames/sec - 18.059805 Mpixels/sec

[eric@kingfisher ~]$ DISPLAY=:8 glxspheres
Polygons in scene: 62464
Visual ID of window: 0x281
Context is Direct
OpenGL Renderer: Gallium 0.4 on NVC3
146.582285 frames/sec - 163.585830 Mpixels/sec
142.126530 frames/sec - 158.613207 Mpixels/sec
154.573314 frames/sec - 172.503819 Mpixels/sec
145.024831 frames/sec - 161.847711 Mpixels/sec

[eric@kingfisher ~]$ optirun glxspheres
Polygons in scene: 62464
Visual ID of window: 0x21
Context is Direct
OpenGL Renderer: GeForce GT 555M/PCIe/SSE2
39.047472 frames/sec - 43.576979 Mpixels/sec
38.053408 frames/sec - 42.483171 Mpixels/sec
36.833305 frames/sec - 41.139118 Mpixels/sec
38.421349 frames/sec - 42.912805 Mpixels/sec

[eric@kingfisher ~]$ DISPLAY=:8 glxspheres
Polygons in scene: 62464
Visual ID of window: 0x27
Context is Indirect
OpenGL Renderer: GeForce GT 555M/PCIe/SSE2
1112.614035 frames/sec - 1241.677263 Mpixels/sec
1102.819365 frames/sec - 1230.746412 Mpixels/sec
1104.817853 frames/sec - 1232.976724 Mpixels/sec
1105.103203 frames/sec - 1233.295175 Mpixels/sec

@Lekensteyn (Member) commented Feb 20, 2012

I've seen those differences before; with nouveau I got a difference of about 4x, and with nvidia about 10x, IIRC.

@ArchangeGabriel (Member) commented Jul 18, 2012

Could Weston, the Wayland compositing window manager, be of any help here?

@Thulinma (Member, Author) commented Jul 20, 2012

That looks like a great starting point, @ArchangeGabriel - but now that proper support in the kernel is so close, do we even need to finish this work anymore?

@ArchangeGabriel (Member) commented Jul 21, 2012

I'm not sure it is that close; I don't expect everything to be user-ready for another 6-8 months. Also, that's only for nouveau, not nvidia, which may take longer. And with no power management, but that's another point we could fix differently.

If I had the knowledge, I would try to do it myself; while I'm no longer short on time, I still have no knowledge of C or anything else relevant, so there is not a lot I can do here. But I do think we should all discuss the future of Bumblebee together, including this compositor feature and the future once PRIME has landed. I should be available almost any time next week on IRC.

@Thulinma (Member, Author) commented Jul 21, 2012

Sure. I'm on IRC almost all the time, so nearly any time is good for me, too. I'm in the central US timezone for the next month or so (unlike the usual CET).

@ArchangeGabriel (Member) commented Jul 21, 2012

OK, I will try to catch you on Monday or Tuesday. @Lekensteyn seems to be quite busy and without internet access, and there's no news from @Samsagax, but we can still discuss things already and get some ideas and points down.

@Samsagax (Member) commented Jul 21, 2012

I can be there for the discussion, but please pick a day; I can be on IRC on Monday.

@ArchangeGabriel (Member) commented Jul 21, 2012

OK for Monday then. I will probably be connected all day long, so you can pick whichever hour you prefer (but I still live in France, so don't choose 19:00 PDT...).

@starks (Member) commented Nov 27, 2012

So I did a little experimenting with upstream Xpra and Bumblebee/Primus today.

I'm somewhat happy with the results.

The frame rate is now perfect, but two major problems remain: CPU overhead is still quite high, and tearing is still present.

@Thulinma (Member, Author) commented Nov 27, 2012

You sure? With primus everything works fine for me - even in fullscreen there's no visible tearing or any graphical glitches of any kind... Are you sure hardware acceleration is enabled on "both sides"?

@ArchangeGabriel (Member) commented Mar 31, 2013

I'm closing this issue, considering primus does the job.

@Thulinma (Member, Author) commented Mar 31, 2013

Yeah, agreed.
