[RFC-0059]: Systemd Service Secrets #59

Open: wants to merge 7 commits into base: master

d-goldin commented Nov 16, 2019

Hi,

This is a first draft for an RFC about a possible approach to secrets management for systemd services.

Curious to hear your thoughts.

@d-goldin d-goldin changed the title rfc-0057: systemd service secrets [RFC-0058]: Systemd Service Secrets Nov 16, 2019
@d-goldin d-goldin force-pushed the d-goldin:master branch from 538f402 to 00c0dd3 Nov 16, 2019
turion left a comment

I think this is a great idea overall, and one that would bring NixOS a step closer to being a pain-free choice for DevOps.

Side note: This approach does not prevent one service from reading the secrets of another, e.g. through privilege escalation or remote code execution. But I think that issue is beyond the scope of this RFC, and should better be handled with something like containers.

rfcs/0058-secrets-for-services.md (outdated, resolved)
## Rotating secrets

Right now, secrets rotation is not done automatically. When new secrets are
pushed, it is the responsibility of the user to restart the services affected.

turion Nov 17, 2019

Is it possible to at least create a warning or suggestion at the end of e.g. nixos-rebuild, with the correct systemctl restart foo.service command?

d-goldin Nov 17, 2019

Author

For this we'd have to look at some metadata of the secrets, like a digest or at least the mtime. Right now this is not done. It would probably not be needed if the consensus ends up at automatic restarts (unless we add a flag to disable automatic restarts for some cases).
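
To make the "look at the mtime" idea a bit more concrete, here is a minimal sketch that uses a systemd path unit to notice when a secret file changes and logs a restart suggestion. It is only an illustration, not part of the RFC or its POC: the unit name `foo-secret-changed`, the service `foo.service`, and the file `/etc/secrets/secret1` are assumptions, and the hint ends up in the journal rather than at the end of `nixos-rebuild`.

```nix
{ ... }:
{
  # Hypothetical watcher: triggers whenever the (assumed) secret file changes.
  systemd.paths.foo-secret-changed = {
    wantedBy = [ "multi-user.target" ];
    pathConfig = {
      PathChanged = "/etc/secrets/secret1";
      Unit = "foo-secret-changed.service";
    };
  };

  # Only logs a hint; restarting foo.service stays the user's responsibility.
  systemd.services.foo-secret-changed = {
    serviceConfig.Type = "oneshot";
    script = ''
      echo "secret1 changed; consider running: systemctl restart foo.service"
    '';
  };
}
```

Whether a path unit fires reliably for every way a secret might be replaced (for example an atomic rename by a deployment tool) would need to be checked before relying on it.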

# Drawbacks
[drawbacks]: #drawbacks

I can't really think of a serious drawback right now, but hopefully the

turion Nov 17, 2019

How does the secret store deal with system/main config upgrades? When I uninstall an old service and install a new one that asks for a secret of the same name, does the new service get access to the old secret?
When I uninstall a lot of services, are old secrets garbage collected? Do downgrades always work seamlessly?

Can you outline the possible pitfalls regarding that topic in an extra section?

d-goldin Nov 17, 2019

Author

In its current shape, no such management would happen. The secrets would just remain in the secrets store, and if a config author decides to re-use them, then so be it. I can include that. I was initially thinking of drawbacks as "something that would be worse when adding this", but I guess it depends on the reference point. Definitely good points to add, even if they would remain unhandled.

turion Nov 18, 2019

Well, it's maybe a strawman, but if you put a secret in /nix/store, it's going to vanish upon garbage collection when I uninstall the package. If I don't care whether it's readable by root or by all users (e.g. inside a container), this is a drawback now because no such garbage collection would take place.

As in "something that would be worse when adding this", you could also say that people possibly expect that NixOS doesn't do any state management by default and that downgrading will bring their machine back into the exact state, but that's not the case anymore.

turion Nov 18, 2019

you could also say that people possibly expect that NixOS doesn't do any state management by default

The difference being that before this RFC, I'd manually handle the secret state and remember to handle it. After all, I'm deleting the line saying service.foo.secretsFile = "/foo/bar";, and when I delete the line, I'll delete the file. This is now not the case anymore. Instead, I need to remember how this semiautomatic secret handling works.

globin Nov 18, 2019

Member

if you put a secret in /nix/store

It is exactly the purpose of this RFC not to have secrets in the nix store. This is possible without any problems and if they are referenced in the nixos config they would not be garbage collected as currently is the case. And this can still be achieved with this proposal as before.

If I don't care whether it's readable by root or by all users (e.g. inside a container)

Actually currently the nix store of (nixos) containers is shared with the host and all other containers, so that might be an issue nonetheless.

With regard to secret state handling, a lot of services offer a passwordFile, secretFile, etc., but there is no abstraction for handing out permissions to secrets correctly, which is why something like the proposal in this RFC is needed. Currently you not only have to drop the files on the machine yourself in the correct destination (same with this RFC, as long as you are using a fetcher that requires that), but also set the correct permissions for the respective service. That can be a chicken-and-egg problem: the service's user is only created by enabling the service, while the service already requires the secret.

[unresolved]: #unresolved-questions

* Is it sufficient to put responsibility on restarting services after key changes
onto the user or would an automated mechanism be better?

turion Nov 17, 2019

I think typically an automated mechanism would be better. Most services just restart gracefully, so an additional unexpected restart is better than a confusing error message and downtime.

d-goldin Nov 17, 2019

Author

This very much depends on the service. If I'm dealing with a very heavy service, for instance an ElasticSearch node, where a restart can cause re-replication or similar, it can be very useful to tightly coordinate restarts in a larger setup. So I'm sympathetic to the average case, where an automatic restart is nice and stress-free, but I think if we do it, we should make it configurable.
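
A configurable restart could look roughly like the following sketch. None of this is in the RFC: the option name `services.foo.restartOnSecretChange`, the unit names, and the secret path `/etc/secrets/secret1` are made up for illustration, and a systemd path unit is only one possible way to detect the change.

```nix
{ config, lib, ... }:
let
  cfg = config.services.foo;
in
{
  # Hypothetical switch so that heavy services can opt out of automatic restarts.
  options.services.foo.restartOnSecretChange = lib.mkOption {
    type = lib.types.bool;
    default = true;
    description = "Whether to restart foo.service when its secret file changes.";
  };

  config = lib.mkIf cfg.restartOnSecretChange {
    # Watch the (assumed) secret location.
    systemd.paths.foo-secret-restart = {
      wantedBy = [ "multi-user.target" ];
      pathConfig = {
        PathChanged = "/etc/secrets/secret1";
        Unit = "foo-secret-restart.service";
      };
    };

    # try-restart only restarts foo.service if it is currently running.
    systemd.services.foo-secret-restart = {
      serviceConfig.Type = "oneshot";
      script = ''
        ${config.systemd.package}/bin/systemctl try-restart foo.service
      '';
    };
  };
}
```

With the option set to `false`, neither helper unit is generated, so an operator of something like an ElasticSearch cluster can keep coordinating restarts by hand.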

* When using a scope with multiple services, ideally only the secrets
referenced in the services definition should be made available to each
service. Right now all the secrets of the scope are blindly copied.
* Transition of most critical services to use proposed approach

turion Nov 17, 2019

I think it would be better to transition some non-critical services first, to gain some experience without breaking anything. If this works for 10+ regularly used non-critical e.g. webapps, then transition everything to this approach.

d-goldin Nov 17, 2019

Author

I will fix the wording on that one. Parts of this section are maybe still a bit too "note to self"-like. What I intended was not for "inclusion into upstream" but more to hypothetically verify this is covering all the most important aspects needed for the most critical services. Your version is definitely better for an actual slow inclusion into upstream step.

d-goldin and others added 4 commits Nov 17, 2019
Typo fix
Co-Authored-By: Manuel Bärenz <programming@manuelbaerenz.de>
Co-Authored-By: Manuel Bärenz <programming@manuelbaerenz.de>
Typo fix
Co-Authored-By: Manuel Bärenz <programming@manuelbaerenz.de>
Co-Authored-By: Manuel Bärenz <programming@manuelbaerenz.de>
Member

Ericson2314 commented Nov 18, 2019

The detailed design is quite long. I think it's better to clean up the Nix secrets story before moving downstream, and I suspect trying to do everything at once will leave us stuck.

Member

globin commented Nov 18, 2019

@Ericson2314 I think this proposal is orthogonal to the secrets-in-nix-store issue, and that it is quite a nice way to move forward without getting in the way of that, as this RFC abstracts over where the secrets are fetched from. If nix store secrets get implemented, one could easily add a fetcher to use them, but also keep the abstraction in place for remote secret management systems if for some reason you are required to use those.

But obviously I'd be very happy if #5 or similar would be continued!

Member

globin commented Nov 18, 2019

Side note: This approach does not prevent one service from reading the secrets of another, e.g. through privilege escalation or remote code execution. But I think that issue is beyond the scope of this RFC, and should better be handled with something like containers

@turion I'm not sure what you mean by this. As far as I can see, this is possible with this RFC, as you can also use non-FS fetcher functions. And obviously it always depends on how far you want to go with isolation.

turion commented Nov 18, 2019

Side note:

It was really just a comment that all secrets are still root-readable, and that this RFC is orthogonal to more isolation/virtualization.

aanderse commented Nov 18, 2019

From reading this over it seems once a module uses this proposal there is no opt out for the user, correct?

I'd say the drawback is that this adds more complexity to the system. When I was new to nix I always enjoyed being able to read a module's source and usually understand what is happening... translate things into what a regular distro does. This is an entirely new abstraction that has no equivalent in a regular distro, and after reading through it twice I still feel very uncomfortable thinking about how I can explain/justify this mechanism to my co-workers who manage nixos servers with me, but have not dived into nixos enough to understand all the nuances yet. I foresee having to answer many questions to puzzled looks from people wondering why we can't continue to keep doing what we're doing, and wondering why, as non "nix people", they no longer understand module source code.

I also wonder if a product similar to vault should be put in the alternatives section. I'm under the impression that while this may not be an entirely solved problem, there are existing solutions out there that can fit.

Member

globin commented Nov 18, 2019

I also wonder if a product similar to vault should be put in the alternatives section. I'm under the impression that while this may not be an entirely solved problem, there are existing solutions out there that can fit.

I think that this rather allows us to consistently use something like vault or some otherwise created secrets folder etc. and not having to depend on every module implementing support for it. It does create a further abstraction but by that allows all modules to have a consistent way of passing secrets and not a different implementation for each of them.

aanderse commented Nov 18, 2019

@globin I meant a single solution like vault could be chosen by the sysadmin for every secret on a machine and then module code could continue as is, unmodified. This provides greater flexibility, though currently at the cost of upfront work for the sysadmin. That being said I concede that the open issues regarding secrets in the nix store have been open for years without resolution, and no one has provided a nice to use module for something like vault that integrates nicely into nixos yet.

I guess some sort of system/module that keeps this logic out of the module system would have been my desired outcome.

I hope my opinion has provided at least minimal value as a differing point of view.

Member

globin commented Nov 18, 2019

The thing is that this is not possible currently.

For all modules that only allow passing secrets by string (secret content) you'd have to modify it to pass in the secret differently as it would otherwise be added to the nix store nonetheless.

This change should happen anyway, even without this proposal, and a lot of modules now at least allow passing by file, where one can choose to use a non-nix-store file with more specific permissions, or just a pkgs.writeText if one does not care. For the ability to use something like vault one would still have to create new services, similar to the proposal here. That's why I think it would be nice to have an abstraction so that we can standardise all services handling secrets to have a consistent interface.

aanderse commented Nov 18, 2019

@globin I entirely agree that all modules need to replace any password options with passwordFile (or the like) regardless of this RFC, and that is something I (along with many others) have slowly been working toward. I also agree that pkgs.writeText in conjunction with passwordFile is the proper justification for entirely eliminating any password-like options, as I proposed when I removed the dbPassword option from the zabbixServer module.
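
As a rough illustration of why a passwordFile option can subsume a plain password option, consider the sketch below. The module `services.foo` is hypothetical and declared inline so that the example is self-contained; the path `/etc/secrets/foo-password` is likewise an assumption.

```nix
{ config, lib, pkgs, ... }:
let
  cfg = config.services.foo;
in
{
  # Hypothetical module option, declared here only to keep the sketch complete.
  options.services.foo.passwordFile = lib.mkOption {
    type = lib.types.path;
    example = "/etc/secrets/foo-password";
    description = "File containing the password, read when the service starts.";
  };

  config = {
    # Variant 1: a string pointing outside the store; permissions are managed by hand.
    # (Keep it a quoted string: an unquoted path literal would be copied into the store.)
    services.foo.passwordFile = "/etc/secrets/foo-password";

    # Variant 2, for people who do not care that the store is world-readable:
    # services.foo.passwordFile = pkgs.writeText "foo-password" "not-a-real-secret";

    systemd.services.foo = {
      wantedBy = [ "multi-user.target" ];
      script = ''
        # The password is read at runtime and never appears in the unit file.
        password="$(cat ${cfg.passwordFile})"
        test -n "$password"
      '';
    };
  };
}
```

The `script` body is a placeholder; a real module would hand the file or its contents to the daemon at that point.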

Author

d-goldin commented Nov 18, 2019

@aanderse:

From reading this over it seems once a module uses this proposal there is no opt out for the user, correct?

Do you mean for a module author or a user? If a module is switched to this, then no: without "more involved" fiddling, a user wouldn't be able to change that unless the author has created multiple avenues. We could of course add a "secret store" that is just the nix store, as a fallback for scenarios where people don't care, so they can at least go and add their secrets to the nix store as before.

I'd say the drawback is that this adds more complexity to the system. [...]. I foresee having to answer many questions to puzzled looks from people wondering why we can't continue to keep doing what we're doing, and wondering why, as non "nix people", they no longer understand module source code.

This is of course a matter of perception, but for me NixOS is already quite an opinionated distribution that does a lot of things quite differently and brings a few abstractions that no other distro has.
Imho pretty much anything new of this sort will have the issue of change/surprise/added complexity.

I wonder, why does this particular code/interface seem more complex than for instance all the module evaluation and merging logic in https://github.com/NixOS/nixpkgs/blob/master/lib/modules.nix? It is of course possible to simplify and document to help people understand it more easily, once we understand what points of confusion exist.

I also wonder if a product similar to vault should be put in the alternatives section. I'm under the impression that while this may not be an entirely solved problem, there are existing solutions out there that can fit.

Well, yes. I took it as a given that people can still go and use vault directly if they want to and their software supports it. If everything we are dealing with supported something like vault, that would be great. But as @globin said, we'd still want the ability to plug in another service to manage secrets if need be, so some abstraction would likely emerge.

I guess some sort of system/module that keeps this logic out of the module system would have been my desired outcome.

Could you please explain a little bit what exactly you mean? Something like this, but being just some library living separately from NixOS in general, or is this more about how such an approach would be integrated with the module system? For instance, instead of the modules using this interface directly, the user of a module calling some functions on it?

Thanks, I will keep the points in mind when I do the next pass.

typo fix
Co-Authored-By: Lassulus <github@lassul.us>
DynamicUser = true;
};
};
};

edolstra Nov 20, 2019

Member

This seems unnecessarily verbose. I would do something like this:

systemd.services.foo = {
  needsSecrets = [ "secret1" ];
  ...
};

and then our systemd module can generate whatever helper units are necessary.

It also avoids imperative-sounding function names like mkSecretsScope which suggest that they allocate a unique new scope, but that's not the case, e.g. in

secretsScope1 = mkSecretsScope {
  loadSecrets = [ "secret1" "secret2" ];
  type = "folder";
};
secretsScope2 = mkSecretsScope {
  loadSecrets = [ "secret1" "secret2" ];
  type = "folder";
};

secretsScope1 and secretsScope2 are actually the same scope.
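
The point about the two scopes being identical follows from Nix functions being pure, and can be checked with a toy stand-in. The `mkSecretsScope` below is not the RFC's implementation, just a hypothetical pure function with the same argument shape; the derived `name` attribute is made up.

```nix
let
  # Hypothetical stand-in: the result depends only on the arguments.
  mkSecretsScope = { loadSecrets, type }: {
    inherit loadSecrets type;
    name = "secrets-${type}-${builtins.concatStringsSep "-" loadSecrets}";
  };

  secretsScope1 = mkSecretsScope {
    loadSecrets = [ "secret1" "secret2" ];
    type = "folder";
  };
  secretsScope2 = mkSecretsScope {
    loadSecrets = [ "secret1" "secret2" ];
    type = "folder";
  };
in
  # Attribute sets compare structurally, so this evaluates to true.
  secretsScope1 == secretsScope2
```

Saved to a file, this can be evaluated with `nix-instantiate --eval`. Two calls with equal arguments cannot allocate two distinct scopes unless some distinguishing argument (such as a name) is passed in.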

d-goldin Nov 20, 2019

Author

This seems unnecessarily verbose. I would do something like this: [...]

I agree, it's more compact. Why I initially decided against it and for something that modifies the resulting structure separately was to reduce changes to the existing systemd modules. At the same time I thought it would be nice to be able to get the secrets as arguments, which I thought made it nicer to deal with in code. In the needsSecrets case, if I understand it correctly, the user would be requesting a secret by its name, like in the scope creation, but then would have to possibly deal with paths (as strings) to pass the secret as an argument to the service, or load it into an env-var, because there would not be an automatic mapping mechanism anymore. Unless we pack it up into a shell variable or so.

It also avoids imperative-sounding function names like mkSecretsScope which suggest that they allocate a unique new scope, but that's not the case, e.g. ...

I do not necessarily perceive mk* as an imperative terminology, but that depends on the reader. We have a lot of pure mk* functions that construct some structure. But I'm not at all attached to the naming here, so we can change it to whatever seems more suitable if they're still around further down the road.

Edit: In fact, I'm not attached to most of the terms in the RFC, such as "side-car" and similar. I merely picked them because I thought they would be easily understood. So if there are suggestions to rename things, I'm up for it.

d-goldin Dec 1, 2019

Author

@edolstra: Did this sufficiently address your remarks? I'm not super familiar with the inner workings of the process, so for now I just left those as discussion comments here, but I'm willing to incorporate some of the things pointed out into "alternatives" or the core section, if there is some consensus around that.

with secrets
* "Side-car" service: A privileged systemd service running the fetcher
function to retrieve the secret, and initially create the service
namespace
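
A minimal sketch of how such a side-car arrangement might be expressed in a NixOS configuration follows. It is an assumption-laden illustration rather than the RFC's POC: the unit names `foo` and `foo-secrets`, the secret name `secret1`, and the use of `JoinsNamespaceOf=` to share the private `/tmp` are choices made here for the example.

```nix
{ pkgs, ... }:
{
  # Privileged helper ("side-car"): runs as root, copies the secret from the
  # root-only store into its own private /tmp and stays active afterwards.
  systemd.services.foo-secrets = {
    serviceConfig = {
      Type = "oneshot";
      RemainAfterExit = true;
      PrivateTmp = true;
    };
    script = ''
      install -m 0444 /etc/secrets/secret1 /tmp/secret1
    '';
  };

  # Unprivileged service: joins the helper's /tmp namespace and reads the copy.
  systemd.services.foo = {
    wantedBy = [ "multi-user.target" ];
    requires = [ "foo-secrets.service" ];
    after = [ "foo-secrets.service" ];
    unitConfig.JoinsNamespaceOf = "foo-secrets.service";
    serviceConfig = {
      DynamicUser = true;
      PrivateTmp = true;
      # Placeholder command: only checks that the secret is readable.
      ExecStart = "${pkgs.coreutils}/bin/test -r /tmp/secret1";
    };
  };
}
```

The copy is world-readable only inside the shared private `/tmp`, which other services do not see. Whether the namespace of a finished oneshot unit stays joinable in all systemd versions should be verified before building on this.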

edolstra Nov 20, 2019

Member

It's not clear to me why the helper unit is needed. Can't the keys be fetched using an ExecStartPre=+... command?

More generally, instead of creating our own mechanism for passing keys to services, maybe the kernel keyring mechanism can be used for this? Units would call keyctl request/search/read/... to fetch keys. These keys would either be preloaded into the keyring or produced on demand using the request-key program.

d-goldin Nov 20, 2019

Author

I briefly tried ExecStartPre, but unfortunately it does not seem to run within the mount namespace, so it doesn't have access to the PrivateTmp we want. I am not sure if this is intentional or not.

Regarding kernel keyring mechanism - I agree, it might be a good default backend. The directory based thing was just the dumbest proof-of-concept case I came up with (given that similar approaches are used in nixops and krops/stockholm). Part of the intention is to have a somewhat agnostic interface.

arianvp Dec 16, 2019

Member

Kernel keyring mechanism sounds overkill to me. Its usecase is not to communicate files between userland processes, but between userland and kernel drivers. Files are a perfectly sufficient abstraction for passing secrets around in userland.

flokli Jan 21, 2020

I have mixed feelings about this.

Using files usually implies having to worry about ACLs and who's allowed to access them. We can cheat by mounting them in a private namespace, but it's still a bit cumbersome.
Sometimes you want to have "use once" properties and provide a new key on every read / issue tokens on access etc. We could cheat again by providing these key files by a fuse filesystem, but then it just gets more complicated.

The kernel keyring might be a good abstraction over all this; it's just not widely adopted currently and lacks real-world usage. Reading keyrings(7) looks promising. In addition to thread/process/session keyrings, there's also an upcall feature, bouncing back to userspace to request secrets, which could be a request to whatever credentials provider is used.

I'd love to experiment with that a bit, or see some real-world examples. Anybody aware of these?

poettering Jan 21, 2020

kernel keyring is very much intended for userspace keys too. See the kerberos stuff that has been ported to use it for that, or systemd's cryptsetup.

I think the kernel keyring has deficiencies (upcalls, yuck! also no namespacing for containers, …), but it probably is the right approach in the long run.

@edolstra edolstra mentioned this pull request Nov 28, 2019
@globin globin changed the title [RFC-0058]: Systemd Service Secrets [RFC-0059]: Systemd Service Secrets Nov 28, 2019
@shlevy shlevy mentioned this pull request Dec 5, 2019
@Mic92 Mic92 mentioned this pull request Dec 12, 2019
Mic92 commented Dec 12, 2019

I would like to nominate @Lassulus, maintainer of krops.

dhess commented Dec 12, 2019

I'm grateful for the nomination as shepherd. However, I don't think I meet the requirements as stated by the RFC process, namely: "This team should be people who are very familiar with the main components touched by the RFC."

I think the project would be better served by someone who has, e.g., hacked on NixOps' key distribution code, or worked with the Linux kernel's keyrings facility, both of which seem very relevant to the goal of this RFC.

How about this: I'll agree to shepherd the RFC, despite my lack of expertise, if a better candidate doesn't step up in a reasonable timeframe.

Member

arianvp commented Dec 16, 2019

I'm not sure if this RFC actually solves the problem it's trying to solve. But please correct me if I am wrong.

In the case documented in this PR, the fetcher is responsible for copying the file from /run/secrets into the service's mount namespace. But the only way I see this working is to either make the secret world-readable (in which case the service might as well read it from /run/secrets directly, as the service has access to that anyway), or to statically allocate a uid and gid for the user and chown the secret to that user; but in that case DynamicUser= will disable itself automatically and you lose the benefit of using it in the first place:

If a statically allocated user or group of the configured name already exists, it is used and no dynamic user/group is allocated.
Note that if User= is specified and the static group with the name exists, then it is required that the static user with the name already exists. Similarly, if Group= is specified and the static user with the name exists, then it is required that the static group with the name already exists.

The reason is that the sidecar service has no idea a priori what the uid and gid of the DynamicUser are going to be (they're dynamically allocated on service startup), so it cannot chown the secret in the service's private /tmp to make it readable by the service. To mitigate this problem you can either make the secret world-readable (in which case the service can just read directly from /run/secrets) or allocate the uid upfront (in which case the DynamicUser= mechanism does not allocate a uid automatically anymore).

If a service wants to access secrets, I advise using a statically allocated user and/or group, and then using filesystem permissions to scope the secret to the specific service. This is what filesystem permissions are for. /run/secrets/mykey.key can then be owned by the nginx group, making sure that only nginx can access it.

We can still keep DynamicUser=true as it will still default to a whole bunch of useful isolation features even if the uid for the provided User= is statically allocated; but I think both using dynamic uid allocation and getting the secret into the container with the right permissions at the same time is impossible.
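
The static-user variant described here might look like the sketch below. The names are assumptions: a hypothetical `foo` user, group, and service are used instead of nginx to keep the example free of the real nginx module, and the tmpfiles `z` rule is just one way to set ownership of a secret that was placed on the machine out of band.

```nix
{ pkgs, ... }:
{
  # Statically allocated user and group for the service.
  users.groups.foo = { };
  users.users.foo = {
    isSystemUser = true;
    group = "foo";
  };

  # Adjust mode and ownership of the secret if it exists when tmpfiles runs.
  # /run is a tmpfs, so the file itself has to be re-provisioned after a reboot.
  systemd.tmpfiles.rules = [
    "z /run/secrets/mykey.key 0440 root foo - -"
  ];

  systemd.services.foo = {
    wantedBy = [ "multi-user.target" ];
    serviceConfig = {
      User = "foo";
      Group = "foo";
      # The static user exists, so no uid is allocated dynamically, but the
      # sandboxing defaults implied by DynamicUser= still apply.
      DynamicUser = true;
      # Placeholder command: only checks that the secret is readable.
      ExecStart = "${pkgs.coreutils}/bin/test -r /run/secrets/mykey.key";
    };
  };
}
```

Note that the ownership rule only takes effect if the secret is already present when systemd-tmpfiles runs, which is part of the out-of-config state the user has to keep consistent.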

@Mic92 Mic92 mentioned this pull request Dec 19, 2019
Member

domenkozar commented Dec 19, 2019

@dhess I'd like you to reconsider, not everyone is expected to know everything. We need people with security mindset around.

@Lassulus, @globin and @aanderse are accepted.

We need the leader - do you mind being one @aanderse (30min cap is perfect).

And final word from @dhess.

dhess commented Dec 19, 2019

@domenkozar OK, I accept!

flokli commented Dec 20, 2019

Possibly relevant: systemd/systemd#14264

aanderse commented Dec 22, 2019

@domenkozar sounds good. I'll wait to hear back from @d-goldin, once the outstanding comments have been addressed.

Author

d-goldin commented Dec 22, 2019

@aanderse: Which comments in particular would you like to have addressed?

aanderse commented Dec 22, 2019

Oh I wasn't sure if you wanted to address the most recent comment by @arianvp or not. I'm happy to schedule a call whenever you are ready. Just give the word.

Author

d-goldin commented Dec 22, 2019

@aanderse: Alright. I'll ask for some clarification on that one and we can probably get in next week or so. Will the committee help finding a co-author or some other proponents for this issue going forward, or how does this usually work?

Author

d-goldin commented Dec 24, 2019

@arianvp: I tried to address a few parts of your argument, but please clarify further where necessary. The core suggestion I think I'm seeing here is to not do any of this and leave things as they are with static service users and somehow set up permissions, but I don't really see how one hurts the other and why just static users is better than static users+additional solutions. This "classical" approach is by the way listed in alternatives.

I'm not sure if this RFC actually solves the problem it's trying to solve. But please correct me if I am wrong.

There are roughly two things this RFC tries to propose:

  • An API for how secrets could be described and referenced in Nix for use with systemd services
  • Rough mechanics of how those secrets can be made available to services

Does it solve neither?

In the case documented in this PR, the fetcher is responsible for copying the file from /run/secrets into the service's mount namespace. But the only way I see this working is to either make the secret world-readable (in which case the service might as well read it from /run/secrets directly, as the service has access to that anyway)

Right now the "store" is described like this:

Secrets store: a secure file-system based location, in this document /etc/secrets, only accessible to root.

It is mostly assumed that the service itself, running as a service user or especially DynamicUser does not have access to this store directly.

In case of a known service user, this is possible to solve by the config author ensuring correct permissions (likely manually?), but as you correctly noticed further below, not in the case of DynamicUser as things stand right now (unless systemd solves this for us, like in the systemd issue linked by @flokli).

The fetcher runs as a privileged user though and has no issues accessing the store, which is the whole purpose of the side-car service (be it a folder as a secrets store, or vault).

[...]
The reason is that the sidecar service has no idea a priori what the uid and gid of the DynamicUser are going to be (they're dynamically allocated on service startup), so it cannot chown the secret in the service's private /tmp to make it readable by the service. To mitigate this problem you can either make the secret world-readable (in which case the service can just read directly from /run/secrets) or allocate the uid upfront (in which case the DynamicUser= mechanism does not allocate a uid automatically anymore).

Maybe this aspect in the RFC doc itself is not sufficiently spelled out yet, but DynamicUser is specifically called out as something that this aims to solve, like:

With the introduction of Systemd's DynamicUser, the more traditional approaches of manually managing permissions of some out-of-store files could become cumbersome or slow down the adoption of DynamicUser and other sandboxing features throughout the nixpkgs modules.

The simple POC implementation of the fetcher does set the secret to world-readable within the service's private mount namespace (https://github.com/d-goldin/nix-svc-secrets/blob/master/secretslib.nix#L19) to deal with the DynamicUser part. One key advantage here is that less additional out-of-config state, such as permissions, needs to be kept consistent with the config by the user, because the fetchers can make the secrets accessible based on the nix config and nothing else.

What I don't seem to understand in this argument is why you equate world-readable within the private mount namespace with the ability to read from the secrets store directly (/run/secrets, as you call it here).

If a service wants to access secrets, I advise using a statically allocated user and/or group, and then using filesystem permissions to scope the secret to the specific service. This is what filesystem permissions are for. /run/secrets/mykey.key can then be owned by the nginx group, making sure that only nginx can access it.

While there is not much on the case of static users + secrets accessible with that user in the RFC right now, I think it should still map well enough interface wise but would require adjustments in how the solution resolves the location of the secret for such a service, which could be done rather transparently.

Which specific adjustments do you think are necessary to address your concerns, or do you generally think that there is no problem to be solved here?

I do see the need to be a bit more explicit in the document about how the dynamic UID/GID is handled, as it currently relies on the reader to actually look into the POC, but before I make further changes, I'd like to see this clarified first.

@dhess


dhess commented Dec 24, 2019

My role here as shepherd will be to play the naïve user, asking the dumb questions.

First off, I'm unfamiliar with DynamicUser. I infer from reading this RFC that DynamicUser is important to the future of NixOS module design. Can someone briefly explain to me what benefits DynamicUser provides for NixOS module design, and why DynamicUser doesn't work well with the current best-practices approach for out-of-store secrets, as exemplified by NixOps?

Also, I infer from reading this RFC that it exists primarily because of the needs of DynamicUser services; but in my opinion, there are some comparatively straightforward features that a robust systemd secrets service should also provide, which are either not addressed by this RFC, or are mentioned only in passing. For example, many services need not be given read access to any secrets-containing files at all, if only NixOS's systemd facilities had a standard, automated way to set environment variables containing those secrets in the target process's environment -- consider, for example, a process that requires AWS credentials. This mechanism would work irrespective of DynamicUser, so long as the application can read secrets from environment variables.

That's just one example, but there are several others I can think of just off the top of my head. There are some real low-hanging fruit with respect to the handling of secrets in NixOS. It would be a shame not to address at least some of those features in this RFC, in addition to whatever is required by DynamicUser. Perhaps those other features are out of scope for this particular RFC. If so, that's fair, but in that case, I think that the RFC should be re-titled to indicate that it is specifically addressing the needs of DynamicUser modules.

@d-goldin

Author

d-goldin commented Dec 25, 2019

@dhess:

First off, I'm unfamiliar with DynamicUser. I infer from reading this RFC that DynamicUser is important to the future of NixOS module design.

I did not necessarily want to evoke this impression, but it's a useful feature that is currently a bit hard to use for things that also require access to secrets: those are often managed using regular permissions, which then stop working. It should also not be read as "this is the only way things should be done from now on", but rather as an attempt to provide some additional, optional tooling.

Can someone briefly explain to me what benefits DynamicUser provides for NixOS module design, and why DynamicUser doesn't work well with the current best-practices approach for out-of-store secrets, as exemplified by NixOps?

  • Removes need to explicitly manage users and groups for a service. This is nice not only because it saves some typing in the module definition, but because it also avoids problems such as UID/GID re-use.
  • Additional sandboxing, such as mostly read-only file-system hierarchy
  • Private, cleaned up /tmp
  • Cleanup of IPC objects

As was pointed out by @arianvp, it is possible to achieve similar isolation using static users by adding a few settings (see link below). In this case we are back to the usual things that need to happen outside of nix: ship the secrets, ensure correct permissions, and reference secrets loosely as path-strings, which is imho a gap that would be nice to narrow. I also have some hope that it should be possible to provide an abstraction that allows providing secrets from various stores, such as vault, to services that do not have direct support.
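As a rough illustration of the "static user plus a few settings" variant, here is a minimal sketch. It assumes a hypothetical service and user named `example`, and uses `pkgs.hello` purely as a placeholder binary. The sandboxing options listed are the ones that systemd documents as being implied by `DynamicUser=`; whether this set is sufficient for a given service is a judgment call, not something the RFC specifies.

```nix
{ pkgs, ... }:
{
  # Statically allocated user and group (names are hypothetical).
  users.groups.example = { };
  users.users.example = {
    isSystemUser = true;
    group = "example";
  };

  systemd.services.example = {
    wantedBy = [ "multi-user.target" ];
    serviceConfig = {
      User = "example";
      Group = "example";

      # Settings that DynamicUser=yes would otherwise imply.
      ProtectSystem = "strict";    # mostly read-only file-system hierarchy
      ProtectHome = "read-only";
      PrivateTmp = true;           # private /tmp, cleaned up on stop
      RemoveIPC = true;            # clean up IPC objects on stop
      NoNewPrivileges = true;
      RestrictSUIDSGID = true;

      # Placeholder command; substitute the real daemon here.
      ExecStart = "${pkgs.hello}/bin/hello";
    };
  };
}
```

With this variant, shipping the secret and setting its ownership to the `example` user or group still has to happen outside of the nix config.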

DynamicUser is mentioned so much in the RFC because it seems to me that if we solve this for DynamicUser, the solution should also work for static user scenarios, with the added benefit of avoiding additional out-of-config permission management and hopefully giving secrets some representation in the config that is more than just paths-as-strings.

Here is a more exhaustive write-up by Poettering himself on the motivations behind DynamicUser; http://0pointer.net/blog/dynamic-users-with-systemd.html

Also, I infer from reading this RFC that it exists primarily because of the needs of DynamicUser services;

As mentioned above, I don't think this should be the case. It also exists because there is perpetual confusion about how to manage secrets, how to use them in modules, and how to avoid them being stored in the nix-store, which leaves NixOS with a usability gap and some footguns in that regard.

but in my opinion, there are some comparatively straightforward features that a robust systemd secrets service should also provide, which are either not addressed by this RFC, or are mentioned only in passing. For example, many services need not be given read access to any secrets-containing files at all, if only NixOS's systemd facilities had a standard, automated way to set environment variables containing those secrets in the target process's environment -- consider, for example, a process that requires AWS credentials. This mechanism would work irrespective of DynamicUser, so long as the application can read secrets from environment variables.

Systemd does have EnvironmentFile, which does exactly this. It is also processed with systemd's privileges, so it can load files the service user does not have direct permission to access (but again, something needs to make the secret accessible to the service user, be it systemd in this case or a side-car service in the other case). This works well as long as a prepared environment file can be provided. As a mechanism it has a few shortcomings though, one being environment variable size limits, another being cumbersome escaping.

Further, if I'm not mistaken, it is currently not possible to provide multiple EnvironmentFile directives in a NixOS systemd service config, which is problematic when entries from user-supplied data and module settings need to be combined. That would be an issue that could be fixed independently [Edit: I just double-checked, multiple env files work just fine]. Either way, we have no way to pass a secret as a string in a "safe" way, so if we pass a path, we're back to needing to load that path as the service user.
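For reference, a minimal sketch of the EnvironmentFile approach with more than one file. The service name, both file paths, and the `pkgs.hello` placeholder are assumptions for illustration. The paths are written as strings rather than nix path literals so that the files themselves are not copied into the nix store.

```nix
{ pkgs, ... }:
{
  systemd.services.example = {
    wantedBy = [ "multi-user.target" ];
    serviceConfig = {
      DynamicUser = true;

      # A list is rendered as one EnvironmentFile= line per entry.
      # systemd reads these files itself, so the service user does
      # not need read permission on them.
      EnvironmentFile = [
        "/run/secrets/example-module.env"  # hypothetical: set by the module
        "/run/secrets/example-user.env"    # hypothetical: supplied by the user
      ];

      # Placeholder command; substitute the real daemon here.
      ExecStart = "${pkgs.hello}/bin/hello";
    };
  };
}
```

Each file is expected to contain plain `KEY=value` lines, which is where the escaping and size concerns mentioned above come in.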

The POC linked in the RFC has an ugly example of loading a secret file into the environment, so it's not entirely unmentioned. Nicer mechanics for this would be welcome though, such as some wrapper or similar.

I opted for the ugly example rather than just using EnvironmentFile because supporting X different kinda-working-for-some-cases approaches was not something I was aiming for. As soon as a file can be made accessible to the service process, most ways of passing secrets, such as environment variables, CLI arguments, and config-file templates, can be handled. This requires fewer moving parts in the processing of the service definition and a less complex API.
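To make the "file first, everything else derived from it" idea concrete, here is a small sketch of a wrapper that turns a readable secret file into an environment variable at start-up. This is not the POC's implementation. The secret path, the variable name `EXAMPLE_TOKEN`, and the `pkgs.hello` placeholder are all hypothetical, and it assumes something else has already made the file readable by the service process.

```nix
{ pkgs, ... }:
let
  # Hypothetical location of the secret as seen by the service.
  # A string, not a path literal, so the file is not copied to the store.
  secretPath = "/run/secrets/example-token";

  # Wrapper script: only the path ends up in the nix store, the secret
  # itself is read when the service starts.
  start = pkgs.writeShellScript "example-start" ''
    EXAMPLE_TOKEN="$(cat ${secretPath})"
    export EXAMPLE_TOKEN
    # Placeholder command; substitute the real daemon here.
    exec ${pkgs.hello}/bin/hello
  '';
in
{
  systemd.services.example = {
    wantedBy = [ "multi-user.target" ];
    serviceConfig.ExecStart = "${start}";
  };
}
```

The same wrapper shape would work for passing the secret as a CLI argument or for rendering a config-file template before `exec`.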

Ultimately, a module author should still be able to choose which mechanism is the easiest to use.

That's just one example, but there are several others I can think of just off the top of my head. There are some real low-hanging fruit with respect to the handling of secrets in NixOS.

Which additional cases are you thinking of that are still unexplained/unaddressed? A list would be generally useful.

While I do think we should address as much low-hanging fruit as possible, I don't think we should support every possible mechanism under the sun, for maintainability and complexity reasons.

Hope that clarifies a bit.

@7c6f434c

Member

7c6f434c commented Dec 25, 2019

It should also not be read as "this is the only way things should be done from now on" and rather as an attempt to provide some additional, optional tooling.

Note, though, that by the (accepted) RFC #52 DynamicUser seems to be advised as the default solution whenever feasible.

@globin

Member

globin commented Jan 13, 2020

Note that a rather annoying bug in systemd concerning DynamicUser is fixed in the next release (systemd/systemd#14532).

@Mic92


Mic92 commented Jan 14, 2020

I know that encryption is not a goal yet of the current RFC but I would like to propose a concept that would unify the secret management between all deployment solutions and just make it part of NixOS itself.

  1. Encrypt the secret with a public key encryption like gpg (with pass as a frontend)
  2. Add it to the nix store during the normal build process
  3. During activation phase/or in a service the key gets decrypted by the secret service using the key stored on the machine.

This approach has the following advantages:

  • We know when we have to restart services because the encrypted secret changed
  • We can encrypt secrets for both the users and the servers that should read them -> this also makes it easy to manage them with a VCS (e.g. git)
  • We no longer need to implement an extra step of deploying secrets in each deploy tool, which makes it likely faster and less error prone
  • Rollbacks will now correctly pick up the old version of the secrets from the nix store

We could support both encrypted and plain secrets side by side; however, plain secrets would not be added to the nix store, just as before. For unencrypted secrets we could use nix builtins to calculate the checksum to see if we need to restart services.
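The checksum idea could be sketched roughly as follows, assuming NixOS's existing `restartTriggers` option and `builtins.hashFile`. The secret path, service name, and `pkgs.hello` placeholder are hypothetical. Note that the hash is computed at evaluation time on the machine doing the evaluation, and that the digest (not the file content) becomes part of the generated unit.

```nix
{ pkgs, ... }:
let
  # Hypothetical secret file, relative to this configuration.
  secretFile = ./secrets/example.gpg;
in
{
  systemd.services.example = {
    wantedBy = [ "multi-user.target" ];

    # When the file content changes, the digest changes, the unit
    # changes, and switching configurations restarts the service.
    restartTriggers = [ (builtins.hashFile "sha256" secretFile) ];

    # Placeholder command; substitute the real daemon here.
    serviceConfig.ExecStart = "${pkgs.hello}/bin/hello";
  };
}
```

For a low-entropy plain secret, a world-readable digest can itself leak information, so this fits the encrypted case more comfortably than the plain one.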

Errata:

  • @flokli just noted the use case of using vault for encryption; this could be implemented by taking a key id instead of an encrypted secret and decrypting/downloading that at runtime.
@flokli


flokli commented Jan 14, 2020

@Mic92 this would however make secrets part of your system configuration, so rotating these secrets regularly becomes complicated or impossible, and going back to older generations will also roll back to (expired) secrets. See @zimbatm's comment NixOS/nixpkgs#24288 (comment), which goes into more detail about why they should not be configuration.

@Mic92


Mic92 commented Jan 14, 2020

@flokli rolling back to old secrets is exactly my use case. I am not an enterprise user using vault, and for my small deployment I want to store my secrets in git. However, I could also imagine extending the concept above to support taking a key id which, instead of being decrypted at runtime from the nix store, would be downloaded from vault.

@flokli flokli mentioned this pull request Jan 21, 2020
3 of 10 tasks complete