New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

containers: automatically use user namespaces on supporting systems #28425

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
None yet
10 participants
@evujumenuk
Copy link
Contributor

evujumenuk commented Aug 20, 2017

On kernels that support user namespaces, -U enables their use. This is the default if systemd-nspawn@.service is used (which it isn't (yet) in containers.nix).

Basically, a 16-bit subset (that is not 0-65535) of the 32-bit UID and GID spaces is chosen and used as the hosts' UID and GID segments the container's UID and GID ranges map to. The container's directory will be chowned recursively to ensure that this works.

Motivation for this change
Things done
  • Tested using sandboxing
    (nix.useSandbox on NixOS,
    or option build-use-sandbox in nix.conf
    on non-NixOS)
  • Built on platform(s)
    • NixOS
    • macOS
    • Linux
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nox --run "nox-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Fits CONTRIBUTING.md.

containers: automatically use user namespaces on supporting systems
On kernels that support user namespaces, `-U` enables their use. This is the default if `systemd-nspawn@.service` is used (which it isn't (yet) in `containers.nix`).

Basically, a 16-bit subset (that is not 0-65535) of the 32-bit UID and GID spaces is chosen and used as the hosts' UID and GID segments the container's UID and GID ranges map to. The container's directory will be `chown`ed recursively to ensure that this works.
@mention-bot

This comment has been minimized.

Copy link

mention-bot commented Aug 20, 2017

@evujumenuk, thanks for your PR! By analyzing the history of the files in this pull request, we identified @wavewave, @edolstra and @kampfschlaefer to be potential reviewers.

@fpletz fpletz added this to the 17.09 milestone Aug 22, 2017

@fpletz

This comment has been minimized.

Copy link
Member

fpletz commented Aug 22, 2017

Sounds good! Have you tested this?

@evujumenuk

This comment has been minimized.

Copy link
Contributor

evujumenuk commented Aug 22, 2017

Nope, not at all :) Right now, I don't have (access to) a system that is set up correctly to test this.

I also don't know whether separate UIDs and/or separate GIDs for processes and/or files break anyone's workflow. Comments welcome! Maybe this isn't such a great idea after all.

Also, we'd get this change for free if containers.nix were rewritten to make use of the systemd-nspawn template.

@fpletz

This comment has been minimized.

Copy link
Member

fpletz commented Aug 22, 2017

Thanks for the info. I'll give this a spin soon!

@@ -123,7 +123,7 @@ let
EXIT_ON_REBOOT=1 \
exec ${config.systemd.package}/bin/systemd-nspawn \
--keep-unit \
-M "$INSTANCE" -D "$root" $extraFlags \
-M "$INSTANCE" -D "$root" -U $extraFlags \

This comment has been minimized.

@Mic92

Mic92 Aug 26, 2017

Contributor

I also attempted to do this, but I have at the moment several problems with systemd related services running in nspawn in unprivileged mode. A different problem is, that you cannot setup setuid wrappers in a user namespaces.

This comment has been minimized.

@xeji

xeji Feb 25, 2018

Contributor

User namespaces are unlikely to work for all applications. For instance, bind-mounting a host data directory into a user-namespaced container may lead to ownership and permission issues. I prefer to use user namespaces on a case-by-case basis. #35541 proposes an extraFlags option to pass additional flags like -U to systemd-nspawn.

This comment has been minimized.

@Mic92

Mic92 Feb 26, 2018

Contributor

systemd-nspawn does not even start for me at the moment. An upstream patch is required.

This comment has been minimized.

@Mic92

This comment has been minimized.

@Mic92

Mic92 Feb 26, 2018

Contributor

Bind mouting would require shiftfs to be upstreamed in the linux kernel.

This comment has been minimized.

@xeji

xeji Feb 26, 2018

Contributor

Guess we'll have to wait then...

@fpletz fpletz modified the milestones: 18.03, 17.09 Aug 30, 2017

@matthewbauer matthewbauer modified the milestones: 18.03, 18.09 Oct 1, 2018

@samueldr samueldr modified the milestones: 18.09, 19.03 Oct 6, 2018

@nixos-discourse

This comment has been minimized.

Copy link

nixos-discourse commented Jan 6, 2019

This pull request has been mentioned on Nix community. There might be relevant details there:

https://discourse.nixos.org/t/nixos-container-limitations/1835/4

@asbachb

This comment has been minimized.

Copy link

asbachb commented Jan 7, 2019

systemd-nspawn does not even start for me at the moment. An upstream patch is required.

@Mic92 do you have a test case for that? I added the option extraFlags = [ "-U" ]; which seems to work.

@Mic92

This comment has been minimized.

Copy link
Contributor

Mic92 commented Jan 8, 2019

@asbachb Could have been fixed in the meantime. A different problem for enabling that by default are our setuid/setcap wrappers. In usernamespaces those are not allowed, which can break tools like sudo or services like postfix.

@asbachb

This comment has been minimized.

Copy link

asbachb commented Jan 8, 2019

@Mic92 I see. Basically I wanted to know if there are known problems with user namespaces enabled.

I wonder if the feature is stable enough or if it's useful to assign it a dedicated configuration value like containers.<name>.enableUserNamespaces.

@Mic92

This comment has been minimized.

Copy link
Contributor

Mic92 commented Jan 8, 2019

@asbachb I'd say an option would be better to allow users to manage that them-self.
Also given that lxcfs is still a things makes me wonder if systemd still needs it for some cgroup actions or if everything can be covered by cgroup namespaces these days.

@danbst

This comment has been minimized.

Copy link
Contributor

danbst commented Jan 14, 2019

This is an old PR, and #35541 added a generic workaround (extraFlags option).

@evujumenuk if you still want to push this change, please add it as containers.<name>.security.userNamespaces = false by default; with a nice description, so newcomers do understand whether they want to enable this feature or not. We can't change defaults without extensive testing (only in critical situations. Is it critical?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment