Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/systemd: make services with DynamicUser depend on nscd #105354

Closed
wants to merge 1 commit into from

Conversation

@symphorien
Copy link
Member

@symphorien symphorien commented Nov 29, 2020

otherwise their username does not resolve to their uid
Fixes #105353

Things done

Tested with the sslh nixos test

  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.
otherwise their username does not resolve to their uid
Fixes NixOS#105353
@aanderse aanderse requested review from flokli, Mic92, andir and arianvp Nov 29, 2020
@flokli
Copy link
Contributor

@flokli flokli commented Nov 29, 2020

sslh is a special beast here - we have some iptables commands that run as a ExecStartPre script, so we require name lookups immediately at startup.

I think this qualifies it for adding a After=nss-user-lookup.target as per https://www.freedesktop.org/software/systemd/man/systemd.special.html#nss-user-lookup.target:

All services for which the availability of the full user/group database is essential should be ordered after this target, but not pull it in.

systemd.services.nscd.wantedBy contains a nss-user-lookup.target, so this should be sufficient.

I don't think adding a dependency to nscd for all services started with DynamicUser=true is correct - while it's true nss_systemd is a non-glibc nss module, and we provide non-glibc nss modules via nscd, there's nothing saying a service started will try to lookup its username at startup time.

On the other hand, services not making use of DynamicUser=yes can still depend on a fully functional user/group database at startup - imagine configuring nginx.conf to run as a uid which is provided via LDAP.

Let's add After=nss-user-lookup.target to sslh.service.

@Mic92 Mic92 changed the title nixos/systemd: make serivces with DynamicUser depend on nscd nixos/systemd: make services with DynamicUser depend on nscd Nov 29, 2020
@Mic92
Copy link
Member

@Mic92 Mic92 commented Nov 29, 2020

Ah. I think I saw this behavior in other places too and I was always wondering why.

@Mic92
Copy link
Member

@Mic92 Mic92 commented Nov 29, 2020

sslh is a special beast here - we have some iptables commands that run as a ExecStartPre script, so we require name lookups immediately at startup.

I think this qualifies it for adding a After=nss-user-lookup.target as per freedesktop.org/software/systemd/man/systemd.special.html#nss-user-lookup.target:

All services for which the availability of the full user/group database is essential should be ordered after this target, but not pull it in.

systemd.services.nscd.wantedBy contains a nss-user-lookup.target, so this should be sufficient.

I don't think adding a dependency to nscd for all services started with DynamicUser=true is correct - while it's true nss_systemd is a non-glibc nss module, and we provide non-glibc nss modules via nscd, there's nothing saying a service started will try to lookup its username at startup time.

On the other hand, services not making use of DynamicUser=yes can still depend on a fully functional user/group database at startup - imagine configuring nginx.conf to run as a uid which is provided via LDAP.

Let's add After=nss-user-lookup.target to sslh.service.

This might be true but it is also a bit of a foot gun. This is not what you would expect looking at the code.

@flokli
Copy link
Contributor

@flokli flokli commented Nov 29, 2020

Hm? Not sure I understand what you mean is the footgun here :-)

The fact we need to pipe all non-glibc nss calls via nscd should have been somewhat better documented as part of #86940, #87016 and #86010. I agree it's annoying, but nscd is the somewhat stable interface. Feel free to carry that discussion over here: #55276

If this service requires being able to resolve dynamic users on startup, it needs to add After=nss-user-lookup.target (maybe add a comment why).
Of course, it's ugly, but probably needed, as we don't know (and thus can't pass in statically) the user id in that iptables invocation (so need the lookup).

I'm against pulling in nscd for all other service that have DynamicUser enabled. It's really two different concepts here.

Services might perfectly fine work without being able to resolve their own username, and DynamicUser itself also happily is able to allocate temporary user ids without nss_systemd being available in the system at all - it's just resolving these user ids won't work.

@Mic92
Copy link
Member

@Mic92 Mic92 commented Nov 30, 2020

Hm? Not sure I understand what you mean is the footgun here :-)

I mean it is not easy to see if a service needs to resolve its own username at startup or not. It might be in binary itself and not just in startup code. And since its a race condition during upgrade when restarting both nscd and the service it also won't be a problem when testing a single service.

@arianvp
Copy link
Member

@arianvp arianvp commented Nov 30, 2020

Note that also from the docs of nss-user-lookup.target

system users and groups are required to be resolvable during earliest boot already, and hence do not need any special ordering against this target.

I don't think this is the case at the moment so this is really an us bug. I suggest starting nscd during early boot (WantedBy=sysinit.target , Default dependencies=no) . DynamicUsers are system users and per systemds contract they should be resolvable as soon as sysinit.target is reached which is currently not the case.

This basically has the same effect as what this PR is doing but a bit more elegantly. I'm thus in favor of this PR- with some tweaks.

nss-lookup.target really is meant for things like ldap and friends. And specifically only for non-system users.

When you have a named DynamicUser in a Systemd service it should immediately be usable .that's the contract Systemd provides and that we break.

@Mic92
Copy link
Member

@Mic92 Mic92 commented Nov 30, 2020

Note that also from the docs of nss-user-lookup.target

system users and groups are required to be resolvable during earliest boot already, and hence do not need any special ordering against this target.

I don't think this is the case at the moment so this is really an us bug. I suggest starting nscd during early boot (WantedBy=sysinit.target , Default dependencies=no) . DynamicUsers are system users and per systemds contract they should be resolvable as soon as sysinit.target is reached which is currently not the case.

I think this problem also appears when nscd is restarted. So we would need not to restart it.

This basically has the same effect as what this PR is doing but a bit more elegantly. I'm thus in favor of this PR- with some tweaks.

nss-lookup.target really is meant for things like ldap and friends. And specifically only for non-system users.

When you have a named DynamicUser in a Systemd service it should immediately be usable .that's the contract Systemd provides and that we break.

@arianvp
Copy link
Member

@arianvp arianvp commented Nov 30, 2020

I think this problem also appears when nscd is restarted. So we would need not to restart it.

We could mark it "never restart" just like dbus. Though that's not very nice.

Another option would be to make nscd socket-activatable. Systemd would then buffer the requests during restarts. But I don't think that code exists upstream yet and quite frankly the nscd upstream code is quite messy and I'm very afraid to touch it.

@flokli
Copy link
Contributor

@flokli flokli commented Nov 30, 2020

Let's move nscd earlier into boot then. We still need to restart it whenever our enabled nss modules are updated, because the LD_LIBRARY_PATH it's run with is restarted, at last as long as we keep providing them like that.

Not picking up changes there is probably worse than an eventual blip while restarting nscd.

@arianvp
Copy link
Member

@arianvp arianvp commented Nov 30, 2020

@Mic92 if nscd is ordered before sysinit.target and thus before all other late-boot services and all services in the activation script are restarted in the same transaction; then it would always be restarted before any services that might blip because of it during startup no? Running services might still get a blip but i don't know how to avoid that.

@Mic92
Copy link
Member

@Mic92 Mic92 commented Nov 30, 2020

@Mic92 if nscd is ordered before sysinit.target and thus before all other late-boot services and all services in the activation script are restarted in the same transaction; then it would always be restarted before any services that might blip because of it during startup no? Running services might still get a blip but i don't know how to avoid that.

I see. That might work.

@symphorien
Copy link
Member Author

@symphorien symphorien commented Dec 1, 2020

I did some experiments with this approach

diff --git a/nixos/modules/services/system/nscd.nix b/nixos/modules/services/system/nscd.nix
index d720f254b81..5703fd04421 100644
--- a/nixos/modules/services/system/nscd.nix
+++ b/nixos/modules/services/system/nscd.nix
@@ -79,6 +79,16 @@ in
                 "${nscd}/sbin/nscd --invalidate hosts"
               ];
           };
+
+        # Services with DynamicUser=true require nscd to be fully started to
+        # resolve their own username. We add a dependency of sysinit.target to nscd to ensure
+        # these units are started after nscd is fully started.
+        unitConfig.DefaultDependencies = false;
+        wantedBy = [ "sysinit.target" "nss-lookup.target" "nss-user-lookup.target" ];
+        before = [ "sysinit.target" "shutdown.target" ];
+        conflicts = [ "shutdown.target" ];
+        wants = [ "local-fs.target" ];
+        after = [ "local-fs.target" ];
       };
 
   };

It looks like it does not fully solve the problem because once sysinit.target is started, restarting sslh and nscd together will happen in any order. sslh only depends on sysinit.target but it is already up. So I'm afraid we need a direct dependency from services to nscd without mediation through a target unit.

@arianvp
Copy link
Member

@arianvp arianvp commented Dec 1, 2020

I think the problem is that systemctl restart doesnt actually add units to the same transaction and thus does not take indirect ordering into account so yeh you seem to be right. This is a bit unfortunate. This part about systemd always confuses me.

@arianvp
Copy link
Member

@arianvp arianvp commented Dec 1, 2020

The way I expected systemctl restart to behave seems to be exactly described in : systemd/systemd#2607

For now; perhaps we should make sure in our activation script that nscd is always restarted before any other services; instead of adding this direct dependency everywhere? It's also not a very attractive solution. I wish systemd would just be able to do the right thing for us here.

I don't know which of the two options is best. I'm ok with adding the direct dependency to all services but I'd prefer if our activation script would understand the entire dependency graph but that's probably too much to ask for now. inplace system activation is just a bit buggy and racey in its current shape =(

@symphorien
Copy link
Member Author

@symphorien symphorien commented Dec 2, 2020

For now; perhaps we should make sure in our activation script that nscd is always restarted before any other services; instead of adding this direct dependency everywhere? It's also not a very attractive solution. I wish systemd would just be able to do the right thing for us here.

I very much prefer making systemd aware of the specificities of our setup by adding explicit dependencies on nscd to working around systemd by hacking the activation script.

@Mic92
Copy link
Member

@Mic92 Mic92 commented Dec 7, 2020

What is actually the security advantage of DynamicUser over users created on demand with no fixed uid? We could actually avoid a lot of hassle if we would not use this feature at all.

@arianvp
Copy link
Member

@arianvp arianvp commented Dec 7, 2020

Not managing UIDs does not work. UID ownership is sticky. Say you deploy postgres and then in the next deploy remove postgres and then deploy nginx. Suddenly your nginx instance has read/write access to all of postgres's data as it recycled the UID.

The alternative is managing a list of global UIDs and make sure you never reuse any of them but that's also not nice (but is how we coped in NixOS before this feature)

DynamicUser is a neat way of both not having to think of uids but still protect again UID reuse attacks. When it's enabled the unit can only write to files in StateDirectory= and systemd makes sure to manage this directory in such a way that UID reuse attacks can not occur.

From the manpage:

           If DynamicUser= is used in conjunction with StateDirectory=, the
           logic for CacheDirectory= and LogsDirectory= is slightly altered:
           the directories are created below /var/lib/private,
           /var/cache/private and /var/log/private, respectively, which are
           host directories made inaccessible to unprivileged users, which
           ensures that access to these directories cannot be gained through
           dynamic user ID recycling. Symbolic links are created to hide this
           difference in behaviour. Both from perspective of the host and from
           inside the unit, the relevant directories hence always appear
           directly below /var/lib, /var/cache and /var/log.

@arianvp
Copy link
Member

@arianvp arianvp commented Dec 7, 2020

I think this PR is good to merge. But I still think it's a good idea to move nscd to before sysinit.target as system users should be resolveable after sysinit.target is reached and doing so solves race conditions during bootup. (e.g. #106127)

We could add that patch in a separate PR or add it to this PR as well. I'm ok with either.

@Mic92
Copy link
Member

@Mic92 Mic92 commented Dec 7, 2020

Not managing UIDs does not work. UID ownership is sticky. Say you deploy postgres and then in the next deploy remove postgres and then deploy nginx. Suddenly your nginx instance has read/write access to all of postgres's data as it recycled the UID.

The alternative is managing a list of global UIDs and make sure you never reuse any of them but that's also not nice (but is how we coped in NixOS before this feature)

DynamicUser is a neat way of both not having to think of uids but still protect again UID reuse attacks. When it's enabled the unit can only write to files in StateDirectory= and systemd makes sure to manage this directory in such a way that UID reuse attacks can not occur.

From the manpage:

           If DynamicUser= is used in conjunction with StateDirectory=, the
           logic for CacheDirectory= and LogsDirectory= is slightly altered:
           the directories are created below /var/lib/private,
           /var/cache/private and /var/log/private, respectively, which are
           host directories made inaccessible to unprivileged users, which
           ensures that access to these directories cannot be gained through
           dynamic user ID recycling. Symbolic links are created to hide this
           difference in behaviour. Both from perspective of the host and from
           inside the unit, the relevant directories hence always appear
           directly below /var/lib, /var/cache and /var/log.

NixOS actually has /var/lib/nixos/uid-map and similar for groups to not reuse uids for different users.

@Mic92
Copy link
Member

@Mic92 Mic92 commented Dec 7, 2020

How about starting nscd before the rest? Nscd will be ready after systemctl start because it forks and only exits the parent once the socket has been opened.

diff --git a/nixos/modules/system/activation/switch-to-configuration.pl b/nixos/modules/system/activation/switch-to-configuration.pl
index b82d69b3bb8..a41fa70b62c 100644
--- a/nixos/modules/system/activation/switch-to-configuration.pl
+++ b/nixos/modules/system/activation/switch-to-configuration.pl
@@ -349,6 +349,7 @@ sub filterUnits {
 my @unitsToStopFiltered = filterUnits(\%unitsToStop);
 my @unitsToStartFiltered = filterUnits(\%unitsToStart);
 
+my $startNscd = delete $unitsToStart{nscd.service};
 
 # Show dry-run actions.
 if ($action eq "dry-activate") {
@@ -359,6 +360,7 @@ sub filterUnits {
     print STDERR "would restart systemd\n" if $restartSystemd;
     print STDERR "would restart the following units: ", join(", ", sort(keys %unitsToRestart)), "\n"
         if scalar(keys %unitsToRestart) > 0;
+    print STDERR "would start nscd\n" if $startNscd;
     print STDERR "would start the following units: ", join(", ", @unitsToStartFiltered), "\n"
         if scalar @unitsToStartFiltered;
     print STDERR "would reload the following units: ", join(", ", sort(keys %unitsToReload)), "\n"
@@ -434,6 +436,13 @@ sub filterUnits {
     unlink($restartListFile);
 }
 
+
+# Services depending on DynamicUser needs nscd to resolve users/groups
+if $startNscd {
+    print STDERR "starting the nscd\n";
+    system("@systemd@/bin/systemctl", "start", "nscd.service") == 0 or $res = 4;
+}
+
 # Start all active targets, as well as changed units we stopped above.
 # The latter is necessary because some may not be dependencies of the
 # targets (i.e., they were manually started).  FIXME: detect units

@arianvp
Copy link
Member

@arianvp arianvp commented Dec 7, 2020

This is what I proposed originally and personally I think combined with making nscd an early-boot service (order before sysinit.target) it would solve all the issues.

For now; perhaps we should make sure in our activation script that nscd is always restarted before any other services; instead of adding this direct dependency everywhere? It's also not a very attractive solution. I wish systemd would just be able to do the right thing for us here.

But @flokli and @symphorien seem to disagree it's the right approach.

I still think it's the better approach though. The direct dependency on nscd.service feels wrong to me because nscd will always be started up early enough if we have it ordered before sysinit.target . that systemctl stop/start does not take into account those indirect dependencies is a shame; but it's a bug in systemd and/or our activation script; not in the systemd units.

I'm ok with making this change to switch-to-configuration.pl combined with the above-mentioned patch to add nscd to sysinit.target

Mic92 added a commit to Mic92/nixpkgs that referenced this issue Dec 8, 2020
Services that have dynamic users require nscd to resolve users
via pam_systemd. Those services might not even create
their own dynamic users itself i.e. iptables.
To make sure nscd is always started when this is happening we move
nscd to sysinit.target and make sure that it is always started before
starting/reloading/restarting any other service.

may fix NixOS#106127
fixes NixOS#105354
@Mic92
Copy link
Member

@Mic92 Mic92 commented Dec 8, 2020

@symphorien I think your solution fixes services that directly depend on their own dynamic user. However it seems that we also need the same users for other services i.e. the firewall service: #106127
I created #106336 as an alternative. I have not checked yet if it actually would fix the firewall uid issue.

@arianvp arianvp mentioned this pull request Dec 8, 2020
10 tasks
@arianvp arianvp added this to In Progress in systemd Dec 8, 2020
@symphorien
Copy link
Member Author

@symphorien symphorien commented Dec 8, 2020

I created #106336 as an alternative

I don't think it is an alternative but complementary. We have 2 problems:

  • sometimes nscd is not started early enough: #106336
  • when restarting nscd and a dynamic user services at the same time, it fails: either this PR or hacking nixos-rebuild (which is only a partial solution, as nixos-rebuild is not the only thing to restart services)

@symphorien
Copy link
Member Author

@symphorien symphorien commented Dec 8, 2020

oh I had not seen this PR does also hack nixos-rebuild. Fine then. Again I would prefer making systemd aware of dependencies rather than working around it, because then bare systemctl restart would work, but do as you wish.

@flokli
Copy link
Contributor

@flokli flokli commented Dec 9, 2020

Let's close this in favor of #106336.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
systemd
  
In Progress
Linked issues

Successfully merging this pull request may close these issues.

4 participants