
Use Wayland instead of X11 to increase performance #3366

Open
artemist opened this Issue Dec 5, 2017 · 9 comments


artemist commented Dec 5, 2017

Although this is not a security issue, thanks to the security model of guid (the Qubes GUI daemon), there are several advantages to using Wayland instead of X11:

Advantages

Higher performance

If allocations are on page boundaries, then we can use xc_map_foreign_range (or the equivalent in the HAL) to map framebuffer pages directly from the client in the AppVM to the compositor in the GuiVM.
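A minimal sketch of what the GuiVM-side mapping could look like with libxenctrl, assuming the AppVM client sends the frame numbers of its page-aligned buffer over the protocol. The function name is made up, xc_map_foreign_pages is used here because a framebuffer's pages need not be contiguous, and real code would likely go through the HAL instead:

/* Sketch: map an AppVM client's framebuffer pages into the GuiVM compositor.
 * `pfns` is the list of page frame numbers received over the GUI protocol. */
#include <sys/mman.h>
#include <xenctrl.h>

void *map_client_framebuffer(xc_interface *xch, uint32_t appvm_domid,
                             const xen_pfn_t *pfns, int num_pages)
{
    /* One contiguous, read-only mapping of the client's framebuffer;
     * returns NULL on failure, unmap later with munmap(). */
    return xc_map_foreign_pages(xch, appvm_domid, PROT_READ, pfns, num_pages);
}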

Lower memory usage

Since framebuffers are mapped instead of copied, the proxy Wayland compositor should use less memory than Xorg. (On a VM that currently has 800 MB of RAM and two windows, Xorg is using about one sixth of physical memory.)

Easier GPU acceleration support

AFAIR, a lot of OpenGL operations are performed within the X server through the X OpenGL extensions (GLX indirect rendering). Simply forwarding these commands to the GuiVM would be dangerous, so we would need to process them within the Xorg server and then send the display list at some point before the end of processing and rendering. With Wayland, graphics processing happens within the context of the application, and only a framebuffer is shared with the compositor. This means that we can simply attach GVT-g or comparable hardware graphics virtualization to VMs without complex modifications to guid.

Multiple DPI support

Wayland allows one to attach multiple displays with different pixel densities, which is important for people with HiDPI laptops who want to use external displays. We can simply forward screen update events to the client, although we have to deal with anonymity for anon-whonix, where the layout of multiple displays could be very revealing.
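For reference, this is roughly how an ordinary Wayland client learns a per-output scale factor today (plain wayland-client code, not Qubes code); this is the kind of information a Qubes proxy would have to decide whether, and how, to forward:

/* Sketch: listening for the compositor's per-output scale factor. */
#include <stdint.h>
#include <wayland-client.h>

static void on_geometry(void *data, struct wl_output *output, int32_t x, int32_t y,
                        int32_t phys_w, int32_t phys_h, int32_t subpixel,
                        const char *make, const char *model, int32_t transform) {}
static void on_mode(void *data, struct wl_output *output, uint32_t flags,
                    int32_t width, int32_t height, int32_t refresh) {}
static void on_done(void *data, struct wl_output *output) {}

static void on_scale(void *data, struct wl_output *output, int32_t factor)
{
    /* 1 on a normal-DPI display, 2 (or more) on HiDPI -- per output */
    *(int32_t *)data = factor;
}

static const struct wl_output_listener output_listener = {
    .geometry = on_geometry,
    .mode = on_mode,
    .done = on_done,
    .scale = on_scale,
};

/* usage: wl_output_add_listener(output, &output_listener, &scale_storage); */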

Method

Wayland has two communication channels: commands over a Unix socket, and shared memory buffers passed as file descriptors and mapped with mmap. Commands, including shared memory setup and keyboard input, should be proxied through a client in the GuiVM and a stub compositor in the AppVM. However, wl_shm::create_pool and wl_shm events should be intercepted so that the stub compositor and the GuiVM Wayland client each create file descriptors in their own VMs, and the GuiVM maps a foreign range (or asks dom0 to do so; I'm not sure quite how that would work) to link together the contents of those two memory ranges.
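To make the interception point concrete, this is roughly what a client does today to set up a shared-memory buffer (standard wayland-client code, not proxy code; error handling trimmed). The fd passed to wl_shm_create_pool is exactly what the stub compositor and the GuiVM client would replace with their own, linked memory:

/* Sketch: client-side wl_shm buffer setup. */
#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>
#include <wayland-client.h>

struct wl_buffer *create_shm_buffer(struct wl_shm *shm, int width, int height,
                                    void **pixels_out)
{
    int stride = width * 4;                       /* WL_SHM_FORMAT_ARGB8888 */
    int size = stride * height;

    int fd = memfd_create("wl-framebuffer", MFD_CLOEXEC);  /* name is arbitrary */
    if (fd < 0 || ftruncate(fd, size) < 0)
        return NULL;

    /* The client renders into this mapping... */
    *pixels_out = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    /* ...and this call sends the fd over the Unix socket -- the point where
     * the stub compositor would intercept and set up the cross-VM mapping. */
    struct wl_shm_pool *pool = wl_shm_create_pool(shm, fd, size);
    struct wl_buffer *buffer = wl_shm_pool_create_buffer(
        pool, 0, width, height, stride, WL_SHM_FORMAT_ARGB8888);

    wl_shm_pool_destroy(pool);                    /* buffer keeps the pool alive */
    close(fd);                                    /* mapping and pool stay valid */
    return buffer;
}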

Doing this

I am starting work on forwarding Wayland between VMs. I would be interested in working on this for Google Summer of Code if the Qubes project decides to join.


jpouellet (Contributor) commented Dec 7, 2017

Not to rain on the wayland parade, but I'm not convinced the potential benefit over the current system is as large as you portray.

If allocations are on page boundaries, then we can use xc_map_foreign_range (or the equivalent in the HAL) to map framebuffer pages directly from the client in the AppVM to the compositor in the GuiVM.

The current GUI protocol/implementation already has guests blit directly to a shared-memory framebuffer, with no copying between VMs required. What exactly would Wayland improve about this?

This means that we can simply attach GVT-g or comparable hardware graphics virtualization to VMs without complex modifications to guid.

I believe this is highly unlikely to happen. The security risk is just too high IMO.

All rendering in the guests happens in software, and IMO that's very unlikely to change unless GPUs get proper memory protection so that e.g. shaders can be mutually isolated in different address spaces, enforced in hardware.

  • The GVT-g approach of "just try to arbitrate everything in software" strongly recalls Xen paravirtualization, which we've moved away from in R4 because it proved too hard to get right and became a liability.
  • Other approaches which somehow result in at least some kind of indirect hardware acceleration, like Virgil 3D (translate/emulate shader IL), are a graphics analog of QEMU (in full instruction-emulation mode, no less!), which Qubes has explicitly architected around not trusting.

IMO it's way too complex to be even worth considering from a security standpoint.

Just yesterday, an OS X security advisory included 3 new CVEs for the Intel graphics driver interface, allowing sandbox escapes and privilege escalation. I haven't seen any technical write-ups yet, but I'm willing to bet there are still plenty more holes in that interface.

I would be interested in working on this for Google Summer of Code if the Qubes project decides to join.

And I am interested in being a GSoC mentor for Qubes again. I'm definitely in no position to make any promises about this project, but I look forward to seeing a proposal and your patches in general :)


marmarek (Member) commented Jan 12, 2018

As @jpouellet said, the benefits may not be that large, but this could still be a useful thing to do. Xorg and the X11 protocol in general are quite complex, and from time to time we hit strange interactions between different toolkits and our GUI. Wayland could make things easier here. So, 👍 from me, including GSoC 2018 (we will apply this year too).


artemist commented Jan 12, 2018

Thanks! Even with the problems @jpouellet mentioned, I think there could still be some advantages.

A few thoughts I wanted to write down so I don't forget:

The main reason I wanted to start this in the first place was multiple DPI support, and that could be useful, although we have to deal with privacy concerns.

I think we could still reduce RAM usage by sharing the same memory for the framebuffer between the client in the AppVM, the stub compositor in the AppVM, the stub client in the GuiVM, and the real compositor in the GuiVM. It may also be possible to do this in X11 with proper proxying of MIT-SHM, but I can't find any code doing it, and doing so may increase complexity significantly. (I may also just be misunderstanding X display lists.) Shared memory does open us up to easy cache attacks, but I can't think of any attack one could mount based on a framebuffer, especially since one does not generally draw directly onto it because of double buffering, IIRC. Nevertheless, I will have to look into how much the GuiVM is trusted, and whether cache attacks originating from it would be a concern.
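As a toy illustration of the intra-VM half of that sharing (not Qubes code; in real Wayland the fd travels over the socket via SCM_RIGHTS rather than fork, and cross-VM sharing would still need grant tables or foreign mapping), two processes can back their "framebuffer" with the same memfd so that no copy ever happens:

/* Toy example: one memfd-backed buffer, two processes, zero copies. */
#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    const size_t size = 4096;
    int fd = memfd_create("toy-framebuffer", MFD_CLOEXEC);
    if (fd < 0 || ftruncate(fd, size) < 0)
        return 1;

    if (fork() == 0) {
        /* "client": renders into the shared buffer */
        uint8_t *fb = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        memset(fb, 0xab, size);
        return 0;
    }
    wait(NULL);

    /* "compositor": sees the client's pixels through its own mapping */
    uint8_t *fb = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    printf("first byte written by the client: 0x%02x\n", fb[0]);
    return 0;
}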

We can remove GVT-g from the picture: I thought it used newer isolation features, since my laptop didn't support it, but I guess not; further research shows it is basically PV. However, Wayland may still make graphics acceleration with GPU passthrough easier, as there is no need to mess with X11 graphics extensions, only the OpenGL/OpenCL libraries. It looks like NVIDIA and AMD also have some interesting isolation features (SR-IOV for AMD) for fancier GPUs, although those seem really expensive and only easily available on certain servers.


jpouellet (Contributor) commented Jan 12, 2018

It may also be possible to do this in X11 with proper proxying of MIT-SHM

It is my understanding that that is already how things are done. I refer you to https://www.qubes-os.org/doc/gui/#window-content-updates-implementation

but I can't find any code doing it

Some pointers:

Nevertheless, I will have to look into how much the GuiVM is trusted

IIUC it is ultimately trusted by necessity


jpouellet (Contributor) commented Jan 12, 2018

Nevertheless, I will have to look into how much the GuiVM is trusted

IIUC it is ultimately trusted by necessity

That is to say, the GuiVM is obviously necessarily in the TCB of any VM which it controls input to / sees output from. Currently we only have one GuiVM (dom0) which must already be ultimately trusted and already has full access to everything anyway. However, down the road it is desirable to move the window manager out of dom0 and remove its ability to control dom0 (and in certain use cases perhaps also remove its ability to control some other VMs managed by an external admin).


blacklight447 commented Jan 12, 2018

Wouldn't using Wayland increase the security of xscreensaver too?



artemist commented Jan 12, 2018

@blacklight447 Yes, screen lockers are harder to crash in Wayland.

However, that reminds me of another problem: screen lockers, like the rest of the compositor, are part of the same window manager process. This means that we may have to make significant changes to each desktop environment; at minimum, we would need coloured window decorations. I think KDE, GNOME, and Sway (an i3 clone) support server-side decorations, so it shouldn't be too bad.



marmarek (Member) commented Jan 12, 2018

I think KDE, GNOME, and Sway (i3 clone) support server-side decorations, so it shouldn't be too bad.

I hope that is true, but at least for GNOME there is a big push toward client-side decorations, so I'm not so sure about it.

That is to say, the GuiVM is obviously necessarily in the TCB of any VM which it controls input to / sees output from.

Clarification: theoretically, the GuiVM may not have full control over input; it may be reduced to controlling only input focus. But in the first version it will probably have full control.


DemiMarie commented Apr 8, 2018

As far as graphics acceleration goes, modern GPUs do have an MMU that can enforce page protection. The problem is arbitrating access to it between VMs. I can think of a few solutions:

  1. Do not expose the MMU to VMs — attempts to modify the MMU from a VM are trapped and ignored.

  2. Trap-and-emulate (shadow page tables). Too complex? Seems to me to be similar to virtualizing a CPU without SLAT.

  3. Paravirtualization. We only need to handle rendering commands (nothing else makes sense for a VM to do). My understanding is that this is just buffer management — everything else is handled in hardware.

    This seems simple — not more complicated than Xen’s own management of CPU memory, or a kernel’s management of mmap’d buffers. Linux has had many vulnerabilities, but none in the mmap code, if I understand correctly.

  4. On twin-GPU systems, where one GPU is not connected to any display, we can give that GPU to a VM entirely, relying on the IOMMU to prevent access to GPU-internal registers and firmware. This presumes that those are not in the GPU’s address space.

    While obviously suboptimal, this approach works fantastically in one (very important, IMO) use case: gaming.

Of these, 3 and 4 seem the most promising to me. The API for 3 sounds (deceptively?) small:

#include <stdint.h>

// A handle to a GPU buffer
typedef int gpu_buffer_t;

// The mapping mode
typedef enum {
    RO, RW, WO,
} gpu_mode_t;

// Allocate a buffer of the given size; returns a handle, or -1 on error
gpu_buffer_t gpu_mmap(uint64_t size);

// Map the buffer, returning its GPU address in *addr; 0 on success
int gpu_map(gpu_mode_t mode, gpu_buffer_t handle, uint64_t *addr);

// Unmap the buffer
int gpu_unmap(gpu_buffer_t handle);

// Destroy the buffer
int gpu_free(gpu_buffer_t handle);
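For concreteness, a hypothetical caller of the interface sketched above might look like this (the names and return conventions are those of the sketch only, not an existing API):

/* Hypothetical usage: allocate, map, use, and release a GPU buffer. */
void render_once(void)
{
    uint64_t gpu_addr;
    gpu_buffer_t buf = gpu_mmap(1 << 20);        /* ask for a 1 MiB buffer */
    if (buf < 0)
        return;                                  /* allocation failed */

    if (gpu_map(RW, buf, &gpu_addr) == 0) {      /* map read-write */
        /* ... reference gpu_addr from the VM's rendering command stream ... */
        gpu_unmap(buf);
    }
    gpu_free(buf);
}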

Of course, these are just ideas, and I could be completely and utterly wrong.
