Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NixOS doesn't detect UID/GID collisions #112647

Closed
matthiasbeyer opened this issue Feb 10, 2021 · 24 comments
Closed

NixOS doesn't detect UID/GID collisions #112647

matthiasbeyer opened this issue Feb 10, 2021 · 24 comments
Labels
0.kind: bug 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md 6.topic: nixos

Comments

@matthiasbeyer
Copy link
Contributor

Describe the bug

NixOS rolls a dice on system service user IDs.
This caused a very weird bug on my nextcloud deployment: #112640

The issue was, in short, that the nextcloud user got an UID (1001) which I assigned to a new user later on.
This resulted, after a reboot, that the phpfpm-nextcloud.service was started as user (bob) and nothing worked.
The fix felt like digging in mud, coming from the clean and shiny world of NixOS.

But, maybe not so shiny: I blame nixos here.

To Reproduce
Steps to reproduce the behavior:

  1. run nextcloud, or redis,... or any other service that gets a random UID
  2. add a new user with said UID, because you don't know and don't expect the UID to already be assigned - because, to be honest: Who in the world would think that a system service gets UID 1001 or 1002?
  3. Profit!

Expected behavior

User IDs MUST be hardcoded by the administrator / author of configuration.nix


I know that there are predefined UIDs, but for some services there are not (and redis even got removed!). Apparently, this would have prevented the issue.
Also, a hard fail when I first tried to add the new user would have prevented that from happening.


The title of this issue might be a bit misleading. Feel free to suggest something better.

@symphorien
Copy link
Member

The issue looks like the user for nextcloud is not labeled as system user (which would cause it to be assigned an id below 1000).
I wonder if we should make explicitely specifying wether a user is system or normal mandatory (by not providing a default value to the option).

@primeos
Copy link
Member

primeos commented Feb 10, 2021

I wonder if we should make explicitely specifying wether a user is system or normal mandatory (by not providing a default value to the option).

That would be great for Nixpkgs IMO (I assume isSystemUser should always be true for Nixpkgs modules but since the default is false this is currently pretty error-prone). And since isNormalUser sets isSystemUser to false this might not even require that many changes outside of Nixpkgs, at least for normal user configurations (out-of-tree NixOS modules would likely still be impacted).

NixOS rolls a dice on system service user IDs.

IIRC I once heard that they're assigned incrementally and tracked via /var/lib/nixos/{uid,gid}-map but I've never looked into that myself.
Edit: See allocId in nixos/modules/config/update-users-groups.pl.

The other thing that surprised me is that UID clashes are apparently not detected: #112640 (comment)
Due to users.mutableUsers defaulting to true this is unfortunately not something that can detect any collision but adding new users outside of the NixOS configuration is likely pretty uncommon (I assume users.mutableUsers=true is usually only required for setting and changing passwords securely outside of the NixOS configuration / world-readable Nix store).
Edit: Checking for UID collisions at evaluation time should be possible by parsing users-groups.json:

spec = pkgs.writeText "users-groups.json" (builtins.toJSON {
(and ID collisions from useradd/groupadd could emit a warning/error via during the activation - nixos/modules/config/update-users-groups.pl already seems to emit some warnings if necessary).

@matthiasbeyer
Copy link
Contributor Author

matthiasbeyer commented Feb 10, 2021 via email

@primeos primeos changed the title NixOS UIDs are assigned "randomly" NixOS doesn't detect UID/GID collisions Feb 10, 2021
@primeos
Copy link
Member

primeos commented Feb 10, 2021

The title of this issue might be a bit misleading. Feel free to suggest something better.

I assume "NixOS doesn't detect UID/GID collisions now" is the main issue (please correct me if I'm wrong).

@kira-bruneau
Copy link
Contributor

I just ran into this problem too. I tried to create a new "builder" user for distributed builds, but I didn't realize that a user with the same UID (1001) was already automatically assigned for localtimed.

@xaverdh
Copy link
Contributor

xaverdh commented Feb 11, 2021

So what would the expected behavior be, should update-user-groups emit a warning?

@primeos
Copy link
Member

primeos commented Feb 11, 2021

So what would the expected behavior be, should update-user-groups emit a warning?

Yes, that would be helpful.
And more importantly UID/GID collisions in users-groups.json should IMO be a hard error during the evaluation (see #112647 (comment) - though we'd obviously have to detect the collision based on the sources for generating users-groups.json to throw an error during the evaluation; hopefully this won't cause much overhead).

@matthiasbeyer
Copy link
Contributor Author

matthiasbeyer commented Feb 11, 2021 via email

@xaverdh
Copy link
Contributor

xaverdh commented Feb 11, 2021

No, it should break my build entirely.
To not break backwards-compatibility, tho, I would argue that a warning for now,
and as soon as the next nixos release is made, this should be turned into a hard
error.

I think the issue is that this can only be detected at run time, because the previously used ids are state in /etc/passwd and /var/lib/nixos/* respectively. So the build can not fail depending on this information.
At best it could fail when switching to the configuration (or on boot).

@symphorien
Copy link
Member

Failing on generation activation does not look like a good idea because then affecting a new user imperatively can make previous generation unbootable.

@matthiasbeyer
Copy link
Contributor Author

matthiasbeyer commented Feb 11, 2021 via email

@primeos
Copy link
Member

primeos commented Feb 11, 2021

I feel like we're running a bit in circles here... :o
I thought my edits in #112647 (comment) would explain what could be done during the evaluation, build, or activation phase (but tbh only linking to the relevant source-code places was likely too lazy of me).

I think the issue is that this can only be detected at run time

No, AFAIK this is only true for some cases that are hopefully pretty uncommon (useradd/groupadd, s. #112647 (comment)).

Failing on generation activation does not look like a good idea because then affecting a new user imperatively can make previous generation unbootable.

That's a good point. And additionally I assume that errors in activation scripts should never occur because that could put the system in an "unknown/undefined" state (if only some of the activation steps where executed).

I guess the best way would be to disallow services without UID configuration.
This way, the admin must provide a UID for each service they configure.

Forcing everyone to manually assign all UIDs and GIDs sounds like a bad idea to me (though we could make this optional if that isn't already possible by e.g. setting/overriding the default UID/GID values). And it would still be error prone (currently collisions should only occur if the admin assigns a UID/GID that's already in use, so IMO there might be no or only very minor advantages depending on some implementation details).

@matthiasbeyer
Copy link
Contributor Author

matthiasbeyer commented Feb 11, 2021 via email

@primeos
Copy link
Member

primeos commented Feb 11, 2021

AFAIK, some services have predefined UIDs/GIDs (see
nixos/modules/misc/ids.nix).

Yes and (IIRC) we started to run out of UIDs/GIDs in that range. There should be good explanations on GitHub and/or the Git history and if you open nixos/modules/misc/ids.nix you should see a comment at the top that hints at this (but doesn't explain why we're avoid hardcoding UIDs/GIDs).

If most services would have a UID/GID configured, and all that have none fail
without the user assigning one, that would result in a bit of effort on the
users side,

I think this would result in quite some effort (especially since the UIDs/GIDs must remain unchanged or one has to manually update file owners/groups) and affect (almost) all users. In any case I think that's a bad idea.

but of course with a lot better security considering discussed
issue, am I right?

I don't see any security advantage. You should've only gotten that UID collision because you didn't check if it was already in use (or maybe it's also possible if you add multiple users at the same time - but ideally the code should only dynamically assign new UIDs after all new manually set UIDs are "tracked", so this is hopefully unlikely / would be a bug).
(Edit: And the other issue was that the UID of the Nextcloud user was in the wrong range (at least I assume NixOS modules should only create system users).)

@xaverdh
Copy link
Contributor

xaverdh commented Feb 11, 2021

So for services, we should aim to use DynamicUser anyway.
What we can check is that the same declarative configuration does not hand out some uid multiple times.

But collisions with either current or prior state are not that easy to handle.. and its not just about mutable users, you might just as well have some leftover state from an earlier configuration and the same issue can happen (because explicitly specifying a uid in your configuration will always be, and have to be, honored by setup-users-groups).
edit: in pratice the situation is not that bad though, since then you likely just have an "ordinary" user confronted with some leftover state from a service rather than reverse.

@xaverdh
Copy link
Contributor

xaverdh commented Feb 11, 2021

as a side note: Its quite possible (the shadow tooling supports this, and apparently people do, or at least did use it as well) to have two users share the same uid, so if we just reject such configurations as invalid, we should probably have a switch for allowing it anyway.

@xaverdh
Copy link
Contributor

xaverdh commented Feb 11, 2021

in pratice the situation is not that bad though, since then you likely just have an "ordinary" user confronted with some leftover state from a service rather than reverse.

I take that back. There is simply no way to detect that collision without accessing the prior state.
The same configuration (nextcloud with uid allocated by setup-users-groups and an ordinary user with declarative and explicit uid 1001) will run perfectly fine when built on a fresh machine, because setup-users-groups will now hand out a different uid to nextcloud (because it knows that 1001 is in use).

@aanderse
Copy link
Member

Sorry this is a long thread and I haven't read it all but just in case it provides some context: https://github.com/NixOS/rfcs/blob/master/rfcs/0052-dynamic-ids.md

@xaverdh
Copy link
Contributor

xaverdh commented Feb 12, 2021

I gave this some thought, since user / group creation for services is somewhat error prone manual work currently, maybe nixpkgs would benefit from some helper functions, to be used in nixos modules?
These would ensure that e.g. the user is marked as a system user, that if an explicit uid is used, a gid is given as well (otherwise it defaults to nogroup which is almost never a good idea) and potentially more..

@symphorien
Copy link
Member

Your suggestions (which I support) are best served by the module system

  • the user is marked as a system user, <-- make the option isSystemUser mandatory (no default)
  • if an explicit uid is used, a gid is given as well <-- this could be a nixos module assertion

With a helper function, there is always the possibility of forgetting to use it :/

@matthiasbeyer
Copy link
Contributor Author

matthiasbeyer commented Feb 14, 2021 via email

@matthiasbeyer
Copy link
Contributor Author

WTF? I didn't post this!

@symphorien
Copy link
Member

I wonder if we should make explicitely specifying wether a user is system or normal mandatory (by not providing a default value to the option).

That would be great for Nixpkgs IMO (I assume isSystemUser should always be true for Nixpkgs modules but since the default is false this is currently pretty error-prone). And since isNormalUser sets isSystemUser to false this might not even require that many changes outside of Nixpkgs, at least for normal user configurations (out-of-tree NixOS modules would likely still be impacted).

I implemented this in #115332 but by requesting users to be explicit instead of changing the default.

symphorien added a commit to symphorien/nixpkgs that referenced this issue Apr 14, 2021
As the only consequence of isSystemUser is that if the uid is null then
it's allocated below 500, if a user has uid = something below 500 then
we don't require isSystemUser to be set.

Motivation: NixOS#112647
@stale
Copy link

stale bot commented Sep 6, 2021

I marked this as stale due to inactivity. → More info

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Sep 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0.kind: bug 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md 6.topic: nixos
Projects
None yet
Development

No branches or pull requests

7 participants