buildFHSEnv: binaries with capabilities cannot make use of it #217119

Open
Scrumplex opened this issue Feb 19, 2023 · 15 comments

Comments

@Scrumplex
Member

Describe the bug

SteamVR has the capability to use asynchronous reprojection to increase the perceived frame rate in VR applications.
To achieve this, it needs to request a VkDevice with a high priority queue.
AMDGPU requires applications to have the CAP_SYS_NICE capability [1], which is usually requested when starting SteamVR for the first time.
I verified that ~/.steam/steam/steamapps/common/SteamVR/bin/linux64/vrcompositor-launcher has the necessary capability:

$ getcap ~/.steam/steam/steamapps/common/SteamVR/bin/linux64/vrcompositor-launcher
/home/scrumplex/.steam/steam/steamapps/common/SteamVR/bin/linux64/vrcompositor-launcher cap_sys_nice=eip

But SteamVR still fails to acquire a high-priority queue and disables asynchronous reprojection.

Steps To Reproduce

Steps to reproduce the behavior:

  1. Install SteamVR
  2. Make sure vrcompositor-launcher has CAP_SYS_NICE: run setcap cap_sys_nice=eip ~/.steam/steam/steamapps/common/SteamVR/bin/linux64/vrcompositor-launcher as root, then verify with getcap ~/.steam/steam/steamapps/common/SteamVR/bin/linux64/vrcompositor-launcher
  3. Start SteamVR

Expected behavior

SteamVR is able to acquire a high priority queue and continues to use async reprojection.

Logs

vrcompositor.log:

Fri Feb 03 2023 11:18:22.618439 - Attempting to enable async support...
Fri Feb 03 2023 11:18:22.618446 - Enabling async support!
Fri Feb 03 2023 11:18:22.618708 - Insufficient permission to create high priority queue.
Fri Feb 03 2023 11:18:22.618720 - Failed to create VkDevice with high priority queue.
Fri Feb 03 2023 11:18:22.618729 - Disabling async support and retrying.
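
A quick way to check whether a given compositor log shows this failure is to search it for the permission error. This is only a sketch: the helper name async_reprojection_failed and the VRCOMPOSITOR_LOG variable are made up for illustration and are not part of SteamVR, and the log path is an assumption that has to be adjusted per installation.

```shell
# Hypothetical helper: succeeds (exit status 0) if the given log file
# contains the high priority queue permission error quoted above.
async_reprojection_failed() {
  grep -qF 'Insufficient permission to create high priority queue' "$1"
}

# Example usage. VRCOMPOSITOR_LOG is an assumed variable for this sketch;
# point it at wherever your SteamVR installation writes its compositor log.
log="${VRCOMPOSITOR_LOG:-vrcompositor.log}"
if [ -r "$log" ] && async_reprojection_failed "$log"; then
  echo "async reprojection was disabled: no high priority queue"
fi
```

The check only matches the exact message text from the log excerpt, so it would need updating if SteamVR changes the wording.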

Additional context

My hardware supports this feature, as I have been using SteamVR with async reprojection on Arch Linux before.

Notify maintainers

@mkg20001 @jagajaga

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

[user@system:~]$ nix-shell -p nix-info --run "nix-info -m"

 - system: `"x86_64-linux"`
 - host os: `Linux 6.1.12-zen1, NixOS, 23.05 (Stoat), 23.05.20230218.5f4e07d`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.13.2`
 - channels(scrumplex): `""`
 - channels(root): `"nixos"`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`
@Scrumplex
Member Author

Perhaps related? #42117

Scrumplex added a commit to Scrumplex/flake that referenced this issue Mar 19, 2023
This patch allows ANY application to request high priority queues,
including SteamVR running in a bwrap container.

See NixOS/nixpkgs#217119

Signed-off-by: Sefa Eyeoglu <contact@scrumplex.net>
@Scrumplex
Member Author

Scrumplex commented Mar 19, 2023

I have patched the relevant kernel code to allow ANY application to acquire high priority queues and added it to my NixOS configuration here: https://codeberg.org/Scrumplex/flake/commit/d6dc803d5cbb4a4dfd388489873bf446f0f56e34

Feel free to use this workaround until we find a way to allow CAP_SYS_NICE in Steam's bwrap container.

Edit: I have switched my NixOS config to use boot.kernelPatches instead. See https://codeberg.org/Scrumplex/flake/commit/3ec4940bb61812d3f9b4341646e8042f83ae1350

Scrumplex added a commit to Scrumplex/community-patches that referenced this issue Mar 21, 2023
Tk-Glitch pushed a commit to Frogging-Family/community-patches that referenced this issue Mar 21, 2023
Scrumplex mentioned this issue Jul 23, 2023
@soupglasses
Member

soupglasses commented Nov 28, 2023

Is this still a problem in NixOS 23.11?

@Scrumplex
Member Author

> Is this still a problem in NixOS 23.11?

Yes. This is a general issue with any binary that wants to use its capabilities but is sandboxed with bwrap.

Atemu changed the title from "SteamVR fails to acquire high priority queue for asynchronous reprojection" to "buildFHSEnv: binaries with capabilities cannot make use of it" on Nov 28, 2023
@mkg20001
Member

https://jvns.ca/blog/2022/06/28/some-notes-on-bubblewrap/

$ bwrap --ro-bind / / --unshare-all --uid 0 --cap-add cap_net_bind_service nc -l 80
(no output, success!!!)

it seems possible to add capabilities with bwrap, we could add a caps parameter to buildFHSUserEnv

@Scrumplex
Member Author

Scrumplex commented Nov 29, 2023

> it seems possible to add capabilities with bwrap, we could add a caps parameter to buildFHSUserEnv

While that is true, these are ambient capabilities that'll apply to all processes inside the FHS environment. In the case of Steam, this would mean that all games would have CAP_SYS_NICE, not just SteamVR.

I also remember that Steam does not like ambient capabilities at all, but that might have been bwrap itself. Not sure.

Edit: I am also pretty sure that --cap-add itself will require root privileges. Having to run wrappers for apps like Steam with root permissions is obviously not ideal.

@corngood
Contributor

corngood commented Feb 10, 2024

I may have made some progress on this. I built steam-run with

  steamPackages = recurseIntoAttrs (callPackage ../games/steam { buildFHSEnv = buildFHSEnvChroot; });

And then tried to start .local/share/Steam/steamapps/common/SteamVR/bin/vrstartup.sh with it.

I removed the STEAM_RUNTIME error return from vrstartup.sh, so it would run, but vrcompositor-launcher would fail to load libcap.so.2 whenever it had CAP_SYS_NICE set (presumably due to LD_LIBRARY_PATH being ignored).

So I did a patchelf --set-rpath /lib64 ~/.steam/steam/steamapps/common/SteamVR/bin/linux64/vrcompositor-launcher and now it seems to work without complaining about caps.

However, I still can't get the compositor to do async. I'm getting "Async support disabled by user setting".

Edit: I had to set "enableLinuxVulkanAsync" : true in vrsettings, and now I get:

Sat Feb 10 2024 02:13:51.352798 [Info] - Enabling async support!
Sat Feb 10 2024 02:13:51.353222 [Error] - Insufficient permission to create high priority queue.
Sat Feb 10 2024 02:13:51.353240 [Error] - Failed to create VkDevice with high priority queue.
Sat Feb 10 2024 02:13:51.353258 [Error] - Disabling async support and retrying.

So, possibly no better than with bwrap.

pscap shows

1     151086 [me]       vrcompositor *      sys_nice @ +

@Atemu
Member

Atemu commented Mar 17, 2024

Good news first: I've made some progress on this.

To get access to certain (or all) caps, you can simply put the --cap-add $CAP bwrap arg into steam's extraBwrapArgs.

For easily testing whether caps work, we can use capsh:

cp `which capsh` .
sudo setcap cap_sys_nice+ep ./capsh
steam-run ./capsh --print

If the cap is there, it'll print:

Current: cap_wake_alarm=i cap_sys_nice+ep

When I comment out the SLR check in vrstartup.sh, I can get SteamVR to start in steam-run without the error message:

vrcompositor-launcher.sh[662777]: exec /Volumes/Games/SteamVR/bin/linux64/vrcompositor-launcher
Using vrcompositor capability proxy
Launching /Volumes/Games/SteamVR/bin/linux64/vrcompositor

Hooray!

Bad news: This breaks Steam.

On startup, you get the generic "your system does not support userns" popup and this error message in the log:

bwrap: Unexpected capabilities but not setuid, old file caps config?

Soooo how exactly are we supposed to add caps to the env if pressure-vessel errors out when it gets any access to caps?

Pinging @smcv because you might know how this is intended to work.

@Atemu
Member

Atemu commented Mar 18, 2024

I celebrated too soon. In the vrcompositor.txt log it says:

Mon Mar 18 2024 10:43:17.834390 [Info] - Enabling async support!
Mon Mar 18 2024 10:43:17.834729 [Error] - Insufficient permission to create high priority queue.
Mon Mar 18 2024 10:43:17.834742 [Error] - Failed to create VkDevice with high priority queue.
Mon Mar 18 2024 10:43:17.834752 [Error] - Disabling async support and retrying.

So it appears while it has and recognises the cap inside bwrap, it doesn't actually have it from the kernel's perspective. Ugh.

@smcv

smcv commented Mar 18, 2024

> So it appears while it has and recognises the cap inside bwrap, it doesn't actually have it from the kernel's perspective.

Yes. Capabilities are namespaced according to a user namespace: see user_namespaces(7). High-priority queues in AMDGPU require CAP_SYS_NICE in the initial user namespace (the one where your init system ran).

bubblewrap can never give you capabilities in the initial user namespace, because each process can only ever have capabilities in the innermost user namespace that is applicable to it. The practical result is that nothing in NixOS' FHS environment will ever be able to have elevated capabilities in the initial user namespace. This is a kernel-imposed limitation, so there is nothing that user-space can do to solve it.

SteamVR developers have attempted to avoid this kernel limitation by making AMDGPU use a more user-namespace-friendly check for whether to allow high-priority queues, but unfortunately there were concerns about this opening up new denial-of-service attacks, because of how the direct rendering manager interacts with memory management.

> Soooo how exactly are we supposed to add caps to the env if pressure-vessel errors out when it gets any access to caps?

You can't. This error message is essentially bubblewrap saying: it looks as though I've been installed incorrectly, and I can't tell whether continuing would be a root security vulnerability, so I'm going to stop here.

(Because bubblewrap has historically been installed setuid root, or occasionally setcap CAP_SYS_ADMIN which is root-equivalent, it has to be extra-paranoid about whether it is about to cause a security vulnerability.)

@Atemu
Member

Atemu commented Mar 18, 2024

> bubblewrap can never give you capabilities in the initial user namespace, because each process can only ever have capabilities in the innermost user namespace that is applicable to it. The practical result is that nothing in NixOS' FHS environment will ever be able to have elevated capabilities in the initial user namespace. This is a kernel-imposed limitation, so there is nothing that user-space can do to solve it.

We do have access to the outside world though, so isn't there anything we could do there?

We already use elevated privileges in the "root" namespace to give the vrcompositor binary caps; is it not possible to pass this privilege through to the userns somehow?
We could trivially run a daemon that has cap_sys_nice in the root namespace too for instance.

> SteamVR developers have attempted to avoid this kernel limitation by making AMDGPU use a more user-namespace-friendly check for whether to allow high-priority queues, but unfortunately there were concerns about this opening up new denial-of-service attacks, because of how the direct rendering manager interacts with memory management.

Thanks for the link!

I can understand the worry about DoS, but SteamVR being able to DoS my system is not part of my threat model, so I don't see why the user shouldn't be able to declare that to be the case via a limit or some other privileged mechanism.

It's sad to need a kernel patch for an issue like this :/

> This error message is essentially bubblewrap saying: it looks as though I've been installed incorrectly, and I can't tell whether continuing would be a root security vulnerability, so I'm going to stop here.
>
> (Because bubblewrap has historically been installed setuid root, or occasionally setcap CAP_SYS_ADMIN which is root-equivalent, it has to be extra-paranoid about whether it is about to cause a security vulnerability.)

Would it not be possible to have a build variant with a --in-know-what-im-doing-this-is-not-a-vuln build-time flag that disables this check? We don't ever install bwrap with suid or cap_sys_admin and I don't think SteamRT/pressure-vessel does either.
(In fact: We couldn't if we wanted to; a user would have to explicitly configure it in their system to add a wrapper and at that point, it's their own responsibility.)

@smcv

smcv commented Mar 18, 2024

> We do have access to the outside world though, so isn't there anything we could do there?

At the moment, you'll see that SteamVR uses an IPC call via steam-runtime-launch-client --alongside-steam to "escape" from the Steam Linux Runtime 3.0 (sniper) container and run vrcompositor alongside Steam. On a more normal Linux distribution, this means it ends up running in the initial user namespace, where setcap can be effective.

It would in principle be possible to patch that so that it somehow(?) detects that it's in a nested user namespace (or detects that it's on NixOS, or something), and if yes, uses steam-runtime-launch-client --host instead. As currently implemented, --host requires the flatpak-session-helper from Flatpak, although in principle it could be made to talk to a non-Flatpak-specific remote command execution service with a similar API if someone wants to provide one.
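
One possible way to do the "detects that it's in a nested user namespace" part is to look at the uid map of the current process. This is a heuristic sketch, not something the SteamVR scripts are known to do: the initial user namespace maps the full uid range onto itself, so its /proc/self/uid_map reads "0 0 4294967295", while an unprivileged bwrap sandbox typically maps a single uid. The function name looks_like_initial_userns is made up for this example, and a privileged container with an identical full mapping would fool it.

```shell
# Heuristic sketch: report whether a uid_map looks like that of the initial
# user namespace, which maps the full uid range onto itself.
# $1 is the uid_map file to inspect (defaults to the current process's map).
looks_like_initial_userns() {
  # The first line has three fields: uid inside, uid outside, range length.
  read -r inside outside count < "${1:-/proc/self/uid_map}" || return 1
  [ "$inside" = 0 ] && [ "$outside" = 0 ] && [ "$count" = 4294967295 ]
}

if looks_like_initial_userns; then
  echo "probably in the initial user namespace"
else
  echo "probably in a nested user namespace"
fi
```

A launcher script could branch on this result to choose between the two steam-runtime-launch-client modes mentioned above, but that wiring is left out here because it has not been prototyped.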

If someone successfully prototypes this by hacking the SteamVR scripts, we could ask the SteamVR developers about making that official. I do not have access to VR hardware or a NixOS system, so someone else will have to lead that.

> is it not possible to pass this privilege through to the userns somehow?

No. The design of how capabilities(7) interact with user_namespaces(7) is that each process can only have caps in at most one userns: whichever one is the most deeply-nested. You cannot simultaneously be in a deeply-nested userns, and have caps in a "larger" userns. This is a kernel design decision which we do not get to change from user-space.

This is why /proc/$$/status only needs a single field each for CapEff and so on. If you think about it, if it was possible to have more complicated capabilities, then /proc/$$/status would need one capabilities set for each level of nested userns that exists.
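
As a rough illustration of that single CapEff field, the sketch below reads it for the current shell and tests one bit. The helper name cap_mask_has_bit is made up for this example; the bit number 23 for CAP_SYS_NICE comes from linux/capability.h. A set bit only means the process holds the capability relative to its own user namespace, which is why a process inside bwrap can show sys_nice here and still be refused by AMDGPU.

```shell
# CAP_SYS_NICE is capability number 23 in <linux/capability.h>.
CAP_SYS_NICE_BIT=23

# Succeeds if bit $2 is set in the hexadecimal capability mask $1
# (the format used by the Cap* fields in /proc/<pid>/status).
cap_mask_has_bit() {
  [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Read the effective capability set of the current shell; stays empty if
# /proc is unavailable.
cap_eff=$(awk '/^CapEff:/ { print $2 }' "/proc/$$/status" 2>/dev/null || true)
if [ -n "$cap_eff" ] && cap_mask_has_bit "$cap_eff" "$CAP_SYS_NICE_BIT"; then
  echo "CapEff=$cap_eff includes CAP_SYS_NICE (in this process's own user namespace)"
else
  echo "CapEff=$cap_eff does not include CAP_SYS_NICE"
fi
```

Running this both outside and inside the FHS environment is one way to compare what each side reports, keeping in mind that the two results refer to different user namespaces.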

I have already had this discussion at exhaustive length with the SteamVR developers, and if there was a simple solution, we would be using it already.

I still think that the long-term answer to this has to be some version of "don't use capabilities(7)", because capabilities(7) are just not a good match for anything that needs to be able to run unprivileged.

> Would it not be possible to have a build variant with a --in-know-what-im-doing-this-is-not-a-vuln build-time flag that disables this check? We don't ever install bwrap with suid or cap_sys_admin and I don't think SteamRT/pressure-vessel does either.

Even if you patched out that check, processes inside the bwrap sandbox will never have CAP_SYS_NICE in the initial user namespace, because they exist in a nested user namespace (that's how bwrap can do its job); so ambient capabilities from a higher-level user namespace do not apply.

Also, because of bubblewrap's history as being optionally-setuid and therefore being trusted by sysadmins as being safe-to-be-setuid, I would not be comfortable with providing that, even as an opt-in. I have too many responsibilities already, without opening myself up to being held responsible for new root privilege escalation CVEs. If you think I'm wrong about that, you will have to ask my bubblewrap co-maintainers to overrule me and take responsibility for any CVEs that result from it. Unfortunately my bubblewrap co-maintainers seem to have mostly disappeared (they also have too many responsibilities!) so if you go that route, you are likely to be waiting a while.

@Scrumplex
Member Author

From what I can tell, it's not actually namespaces that prevent capabilities from working. I am currently working on a bare-bones bubblewrap replacement for use in Nixpkgs FHSEnv wrappers and while reading bubblewrap's source code I have noticed that it mounts its sandboxed root with the MS_NOSUID option, and it sets no_new_privs.

I first thought it might be preferable to just strip out all the security-related code from bubblewrap, but after looking at the complexity, I have opted to write my own wrapper. You can find the current draft here: https://codeberg.org/Scrumplex/ancientwrap

It can already set up a simple sandbox and mount things inside. I am currently working on implementing the options used by buildFHSEnvBubblewrap as well as providing a simple way to test it using Steam. I haven't tested SteamVR yet.

@smcv

smcv commented Mar 18, 2024

> it mounts its sandboxed root with the MS_NOSUID option

This defangs setuid binaries, but even if it didn't, they wouldn't work in a user namespace, because the kernel will only allow unprivileged users to create a user namespace with one uid (your own), and all other users including root get mapped to the overflow uid (which appears inside the container as nobody, but should be read as "not me" in this case). So setuid-root would become effectively setuid-nobody.

> and it sets no_new_privs

I'm surprised if it works without this. Last time I looked, this was a kernel requirement, without which the kernel would not allow unprivileged users to create a user namespace. (But perhaps newer kernels relax that restriction?)
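
Whether a given process actually runs with no_new_privs can be checked from its status file, which may help when comparing bubblewrap against an alternative wrapper. This is a small sketch: the function name no_new_privs_value is made up for this example, and it relies on the NoNewPrivs field that Linux exposes in /proc/<pid>/status since kernel 4.10.

```shell
# Prints the NoNewPrivs value (0 or 1) found in a status file.
# $1 defaults to the current shell's /proc status; prints nothing if the
# field is missing (kernels older than 4.10 do not expose it).
no_new_privs_value() {
  awk '/^NoNewPrivs:/ { print $2 }' "${1:-/proc/$$/status}"
}

echo "NoNewPrivs for this shell: $(no_new_privs_value)"
```

Run inside a sandbox, a value of 1 means that executing a binary with file capabilities or a setuid bit will not grant it extra privileges, independently of the user namespace question.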

@smcv

smcv commented Mar 18, 2024

> From what I can tell, it's not actually namespaces that prevent capabilities from working

I would recommend reading capabilities(7) and user_namespaces(7) before spending a lot of time on implementing something that could turn out to be a dead end.
