Elixir and Erlang RELEASE_COOKIE: let's reach consensus on what to do to fix the current mess #166229

Open
picnoir opened this issue Mar 29, 2022 · 16 comments

Comments

@picnoir
Member

picnoir commented Mar 29, 2022

Some Context

Since Elixir 1.13, the absence of a release cookie at startup leads to a failure in the mix-release-generated start script. We've been hit pretty badly by this issue for the Pleroma package.

First, @kloenk approached this by injecting a random cookie into the service start script in #149368 via a systemd-provided env variable.

While it did fix the Pleroma service startup, it did not fix the situation for the interactive binaries such as pleroma_ctl. Trying to fix that issue, I opened #164398, which wraps all the $out/bin binaries with the previously mentioned release cookie env variable containing a dummy release-cookie.

Sadly, this PR broke some existing setups which led @yu-re-ka to open #164965 .

At this point, @yu-re-ka, @kloenk and I started to discuss on Matrix how to fix this once and for all. During this discussion, we realized that most of the single-node BEAM packages are likely to suffer from the same issue. We agreed the proper fix shouldn't live in the pleroma derivation but rather in the mix-release routine in charge of generating the startup scripts.

Status Quo

How could we solve this unfortunate situation?

I personally can see 3 options:

  1. Leave everything as it is and rely on each leaf package to set RELEASE_COOKIE correctly. In that case, we should probably write some documentation about it and streamline the wrapper introduced at nixos/pleroma: create cookie if not existing #149368.
  2. Patch the mix-release script generator to use a static cookie. We'd first generate a static release cookie in the Nix store and then patch the script generator to set RELEASE_COOKIE to this static value. In practice, it would mean we'd stop deleting the statically generated release cookie in the mix-release postFixup script. This would solve the release cookie issue for single-node deployments. Of course, we'd provide a way to override this and set a custom cookie for multi-node deployments.
  3. Patch the mix-release script generator to use a cookie located at $(PWD)/cookie at runtime instead of something located in $RELEASE_ROOT, i.e. in the release's Nix store path.

I'd personally be in favor of moving forward [1] with solution 2. I'd assume that if a user is advanced enough to set up a multi-node Erlang cluster, they are advanced enough to override the static dummy cookie in their Nix config with something sensible.

I don't like the idea of leaving things as they are (i.e. solution 1): we know for sure that any binary produced by beamPackages.mixRelease will fail at startup unless it gets patched.

Dear @NixOS/beam maintainers, what do you think? Do you see any option besides the ones listed above? Which one do you personally favor? Why?

Cc: @yu-re-ka @kloenk @happysalada @NixOS/beam

Footnotes

  1. Meaning: I'm up to implement that if we all agree it's the way to go.

@picnoir changed the title from "Elixir and Erlang RELEASE_COOKIE: let's reach consensus what to do to fix the current mess" to "Elixir and Erlang RELEASE_COOKIE: let's reach consensus on what to do to fix the current mess" on Mar 29, 2022
@DianaOlympos
Contributor

I do not like the idea of patching that script. If we want to patch it, it would be better to find a way forward to upstream the change to the script to make it easier to work with.

I am trying to understand what the problem is here. So mix release generates a cookie (or, hopefully, has one injected for reproducibility) when it generates the release. We package that and a user downloads it.

Is the problem that we expect a different package to use the same cookie ?

@picnoir
Member Author

picnoir commented Mar 29, 2022

I am trying to understand what the problem is here. So mix release generates a cookie (or, hopefully, has one injected for reproducibility) when it generates the release.

That's how upstream mix-release is meant to work.

However, in our case:

  1. mix-release generates a cookie in $out/releases/COOKIE.
  2. mix-release generates a shell startup script for each Elixir binary, pointing to $out/releases/COOKIE.
  3. We delete the previously generated $out/releases/COOKIE cookie in the postFixup hook:
    if [ -e $out/releases/COOKIE ]; then # absent in special cases, i.e. elixir-ls

Meaning, when a user tries to launch a binary, the mix-release startup script will fail when trying to load the non-existing $out/releases/COOKIE cookie.
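
The failure mode in the three steps above can be reproduced without Elixir. What follows is only a minimal sketch, assuming the generated start script uses RELEASE_COOKIE when it is set and otherwise reads releases/COOKIE under the release root. The resolve_cookie function and the temporary directory are hypothetical stand-ins, not code taken from mix release or Nixpkgs.

```shell
#!/bin/sh
# Hypothetical stand-in for the cookie lookup in a generated start script:
# use RELEASE_COOKIE if it is set, otherwise read the file under the release root.
resolve_cookie() {
  release_root="$1"
  if [ -n "${RELEASE_COOKIE:-}" ]; then
    printf '%s\n' "$RELEASE_COOKIE"
  else
    cat "$release_root/releases/COOKIE"
  fi
}

# Fake release root standing in for $out (assumption: any writable temp dir).
fake_out="$(mktemp -d)"
mkdir -p "$fake_out/releases"
printf 'build-time-cookie' > "$fake_out/releases/COOKIE"

# Steps 1 and 2: the cookie file exists, so the lookup succeeds.
unset RELEASE_COOKIE
with_file="$(resolve_cookie "$fake_out")"

# Step 3: the cookie is deleted, as in postFixup; the same lookup now fails.
rm "$fake_out/releases/COOKIE"
if resolve_cookie "$fake_out" 2>/dev/null; then
  after_delete="ok"
else
  after_delete="failed"
fi

# Setting the variable explicitly bypasses the missing file.
with_env="$(RELEASE_COOKIE=from-env resolve_cookie "$fake_out")"

echo "with file: $with_file, after delete: $after_delete, with env: $with_env"
```

The last call shows why every workaround discussed in this thread ends up setting RELEASE_COOKIE in the environment: it is the only input left once the file is gone.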


Here's a practical example of a CLI utility (pleroma_ctl) crashing at runtime because of the lack of this release cookie.

/tmp/tmp.JRFJp10kpj » cat pleroma-without-cookie.nix
let
  # Nixpkgs pin **before** the commit in which we wrap the pleroma binaries with a dummy COOKIE.
  pkgs = import (builtins.fetchTarball {
        url = "https://github.com/NixOS/nixpkgs/archive/1098fc92217ac27746ec8004a87ca742b1408795.tar.gz";
        sha256  = "sha256:0p6bva4jw7nfr57ds21mi6mj8axx1parpj8xcg1hkk36hhcs8lhj";
    }) {};
in pkgs.pleroma

/tmp/tmp.JRFJp10kpj » pleroma=$(nix-build pleroma-without-cookie.nix)

/tmp/tmp.JRFJp10kpj » "${pleroma}"/bin/pleroma_ctl create
cat: /nix/store/5bl9vq8acxf7h6s0rdxwm08g6p32sqs8-pleroma-2.4.2/releases/COOKIE: No such file or directory

[Edit]: I realize my OP wasn't clear at all and led to some confusion.

By:

Patch the mix-release script generator to use a static cookie. We'd first generate a static release cookie in the Nix store and then patch the script generator to set RELEASE_COOKIE to this static value. In practice, it would mean we'd stop deleting the statically generated release cookie in the mix-release postFixup script.

I basically mean:

  • Either we stop deleting the cookie generated by mix-release in the postFixup phase.
  • Or we continue deleting the cookie but also patch the startup script to prevent it from pointing to a non-existent path.

@happysalada
Contributor

happysalada commented Mar 29, 2022

hey, thanks for starting this!

let me try to summarize the problem.
When we build a mix release, a cookie is created. In Nix we remove that cookie in the build script. The fear is that, since by default all releases run in epmd mode, one person with the cookie for a particular piece of Elixir software in Nix could connect to every other Nix instance of that software whose owner forgot to replace that cookie.
The problem at present is that with Elixir 1.13, a release will simply refuse to start if it is not provided a cookie. Since we remove the default one, if the person packaging the software for Nix doesn't provide an option to replace the cookie, the software won't start at all.

I have a couple of questions.
If we make sure that in the module we make for software, the cookie has to be defined, would that not solve our problem ? Are you saying that each user packaging a software needs to make sure to add that cookie option to the module they create and that it makes it harder ? I'm not sure yet what is the problem with enforcing that the cookie is set in the nixos module.
To be sure, here is what I am advocating for
https://github.com/NixOS/nixpkgs/blob/nixos-unstable/nixos/modules/services/web-apps/plausible.nix#L237
I noticed that you are adding a wrapper for the cookie directly in the package, I think it should be done in the module if possible.
I understand that the problem with that approach is for binaries other than the service. For example, for your pleroma_ctl you would have to do something like
RELEASE_COOKIE=$(cat cookie_path) "${pleroma}"/bin/pleroma_ctl create
if you want to run it from the CLI. Are you saying that would be too inconvenient? (just want to clarify).
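
As an illustration of that manual invocation, here is a small sketch of a helper that reads the cookie from a file and passes it to a single command. The name run_with_cookie is hypothetical, and a plain `sh -c` call stands in for pleroma_ctl so the example runs anywhere. Compared with the bare one-liner, it reports a missing cookie file explicitly instead of surfacing a raw `cat` error.

```shell
#!/bin/sh
# Hypothetical helper: run a command with RELEASE_COOKIE read from a file.
# Intended use (paths are placeholders):
#   run_with_cookie /path/to/cookie "${pleroma}"/bin/pleroma_ctl create
run_with_cookie() {
  cookie_file="$1"
  shift
  if [ ! -r "$cookie_file" ]; then
    echo "cookie file '$cookie_file' is missing or unreadable" >&2
    return 1
  fi
  # The cookie is passed in the environment of this one command.
  RELEASE_COOKIE="$(cat "$cookie_file")" "$@"
}

# Demo with a throwaway cookie file and a stand-in command that prints
# the cookie the child process sees.
cookie_file="$(mktemp)"
printf 'secret-from-file' > "$cookie_file"
seen="$(run_with_cookie "$cookie_file" sh -c 'printf "%s" "$RELEASE_COOKIE"')"
echo "child saw: $seen"
```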

@picnoir
Member Author

picnoir commented Mar 29, 2022

Individually answering the questions.

If we make sure that in the module we make for software, the cookie has to be defined, would that not solve our problem ?

👍 It does for the binaries meant to be used as long running services.

I'm not sure yet what is the problem with enforcing that the cookie is set in the nixos module.

I don't see any either for binaries meant to be used as long running services.

To be sure, here is what I am advocating for
https://github.com/NixOS/nixpkgs/blob/nixos-unstable/nixos/modules/services/web-apps/plausible.nix#L237
I noticed that you are adding a wrapper for the cookie directly in the package, I think it should be done in the module if possible.

👍

if you want to run it from the cli. Are you saying that would be too inconvenient ? (just want to clarify).

Yes. This whole story started with a user pinging me on IRC about this exact error (not being able to run pleroma_ctl from the CLI). They were rightfully confused by the following error message:

cat: /nix/store/5bl9vq8acxf7h6s0rdxwm08g6p32sqs8-pleroma-2.4.2/releases/COOKIE: No such file or directory

My knowledge of Erlang and Elixir is very limited, so I took this opportunity to dig a bit more into the Erlang VM.

I now realize this whole story boils down to the threat model you had in mind when writing this. Since I'm a novice here, I'm going to describe what I see as the current threat model. Could you confirm this is what you had in mind, just to make sure we're on the same page here?

From what I can tell, Elixir uses the RELEASE_DISTRIBUTION value to determine whether epmd will be disabled, listening on localhost or listening for external connections. It defaults to sname, which, quoting the doc here, "allows access only within the current system". As far as I can tell, there's currently no way to override this variable in the current Nixpkgs BEAM infra, meaning there's no way for an Erlang node living on another host to connect to the node. In conclusion, regardless of where we store the cookie (in the store, in a protected folder), we won't be carelessly exposing the Erlang node on the internet.

Now, let's consider the local users. If we were to store the release cookie in the Nix store, any user with access to the machine running the Erlang node could get interactive access to it, provided they can find the world-readable cookie in the Nix store. This is the unacceptable part that leads us not to keep the running cookie in the store.

^ Is this threat model correct or am I (once again) missing something?


In the end, I think this situation boils down to a single question: how common is it to find Elixir binaries meant to be both short-lived and used interactively? [1]

  • If it's fairly common, we should find a clever way to set up a dummy cookie that could potentially live in the store, given the short-lived nature of the node using it.
  • If it's uncommon, then it's probably more sensible to leave things as they are and manually patch pleroma_ctl in the end.

Footnotes

  1. i.e. is pleroma_ctl a really weird and uncommon use case of Elixir?

@DianaOlympos
Contributor

I would say that in general, yes, short-lived Elixir command line tools are "relatively" rare, except for server-side tools managing an Elixir application, and in that case operators are expected to know enough to handle the work needed to make it work. Erlang systems are not really meant for short-lived stuff.

@happysalada
Contributor

We had a brief discussion offline with NinjaTrappeur.

The action item here would be to make a PR documenting the threat model and our choice.
The default choice of shipping a cookie is surprising with Nix, since it could enable any user to access the node (an attacker could get access to the node without being root).

The additional detail that was new to me is that commands like pleroma_ctl don't actually need the "real" cookie to connect; a dummy cookie works (I could have sworn that wasn't possible). Probably the best option for these commands is to add a wrapper with a dummy cookie.
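
A wrapper of that kind could look roughly like the sketch below. It is an assumption-laden illustration, not the wrapper used in Nixpkgs: write_cookie_wrapper and the dummy-cookie value are hypothetical, and `env` stands in for the real CLI binary so that the effect on the environment is visible.

```shell
#!/bin/sh
# Hypothetical helper: write a wrapper script around a release binary that
# supplies a dummy cookie unless the caller already set RELEASE_COOKIE.
write_cookie_wrapper() {
  real_bin="$1"
  wrapper="$2"
  # Unquoted heredoc: $real_bin is expanded now, escaped variables are
  # written literally and evaluated when the wrapper runs.
  cat > "$wrapper" <<EOF
#!/bin/sh
# Keep a caller-provided cookie; otherwise fall back to a dummy value.
export RELEASE_COOKIE="\${RELEASE_COOKIE:-dummy-cookie}"
exec "$real_bin" "\$@"
EOF
  chmod +x "$wrapper"
}

# Demo: wrap `env` (stand-in for a binary such as pleroma_ctl) and inspect
# the cookie it receives.
real_bin="$(command -v env)"
workdir="$(mktemp -d)"
write_cookie_wrapper "$real_bin" "$workdir/ctl_wrapped"

unset RELEASE_COOKIE
default_seen="$(sh "$workdir/ctl_wrapped" | grep '^RELEASE_COOKIE=')"
custom_seen="$(RELEASE_COOKIE=real-cookie sh "$workdir/ctl_wrapped" | grep '^RELEASE_COOKIE=')"
echo "$default_seen / $custom_seen"
```

Using a default rather than an unconditional assignment keeps the override path open for anyone who does need the node's real cookie.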

The last issue we haven't talked about is getting an iex shell on the running node. IIRC you need the right cookie to connect. I think though that people who need to get iex on a production node working correctly won't be newbies, so we can let them figure it out.

Thank you again for starting the discussion!

@lambdadog
Contributor

lambdadog commented Apr 28, 2022

It may be worthwhile to add both a dummyCookie and a cookieFilePath argument to the mixRelease builder.

defaults:

  • dummyCookie: false
  • cookieFilePath: null

asserts:

  • If dummyCookie is true, then cookieFilePath must be null.
  • If dummyCookie is false, then cookieFilePath must be a string.
    • cookieFilePath MUST be a string or null, rather than a Nix path (as Nix paths are copied into the Nix store).

This means that one of the two values must be set. To my understanding, no mixRelease build will function without a cookie (absent fixups), and they always expect the cookie to be in the Nix store, which obviously isn't acceptable. With these arguments every case is covered cleanly; it just requires input from the derivation author.

The dummy cookie would just be a cookie stored in /tmp, created if it doesn't already exist by wrappers around all the binaries.
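
The "create it if it doesn't exist" part of such a wrapper might be sketched as follows. This is an assumption about how it could work, not an existing implementation: ensure_cookie is a hypothetical name, the cookie format (64 hex characters from /dev/urandom) is an arbitrary choice, and the demo writes into a fresh temporary directory rather than a fixed /tmp path.

```shell
#!/bin/sh
# Hypothetical helper: create a random cookie file on first use and reuse it
# afterwards. Prints the cookie on stdout.
ensure_cookie() {
  cookie_file="$1"
  if [ ! -s "$cookie_file" ]; then
    # Subshell so the restrictive umask does not leak to the caller; the
    # file is created readable by its owner only.
    (
      umask 077
      head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n' > "$cookie_file"
    )
  fi
  cat "$cookie_file"
}

# Demo: the first call generates the cookie, the second call reuses it.
cookie_dir="$(mktemp -d)"
first="$(ensure_cookie "$cookie_dir/cookie")"
second="$(ensure_cookie "$cookie_dir/cookie")"
echo "cookie length: ${#first}"
```

A wrapper would then export the result, for example `RELEASE_COOKIE="$(ensure_cookie "$some_path")"`, before executing the real binary.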

@picnoir
Member Author

picnoir commented May 4, 2022

It may be worthwhile to add both a dummyCookie and a cookieFilePath argument to the mixRelease builder.

Sounds like a sensible alternative approach. It would probably mean either patching mix on the Elixir side, as described here #166229 (comment), or wrapping all the mix release builds.

@lambdadog would you be up for implementing that?

@lambdadog
Contributor

I see no compelling reason why we shouldn't simply wrap all mix release builds. A patch means more maintenance, and wrapping a program is cheap enough that we don't need to worry about the cost on any NixOS target.

And absolutely, I'll start work on it.

@picnoir
Member Author

picnoir commented May 5, 2022

I see no compelling reason why we shouldn't simply wrap all mix release builds.

We potentially could add a CLI flag to mix release and try to upstream that.


Adding a wrapper is also fine by me.

@yu-re-ka
Contributor

yu-re-ka commented May 6, 2022

Okay I haven't commented on this so far.

The fact that mix release generates a cookie file during the release step is really not great. It also means that many other distros and official installation guides have users end up with a well-known default cookie.
This problem is not NixOS-specific. This is something that should be fixed long-term in upstream.

I think it's fine if we put a well-known default cookie in the Nix store, because that is how the mix release process works for now.

Instead, I think the proper solution here is to completely disable the distribution features of the BEAM VM by default in all the NixOS modules. This can be done by setting RELEASE_DISTRIBUTION=none (at runtime), which prevents the BEAM process from binding to an additional port for distribution features and from starting epmd.
It will still require a cookie to run, but it could use the default cookie from the nix store since it is not really used for anything.

If someone wants to set up an installation with distribution features and do that securely, it's their task to properly secure access to those ports and set a different cookie.
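
In a start script, this proposal could look roughly like the sketch below. It only illustrates the environment handling: start_release and the well-known-default value are hypothetical, and a `sh -c` call stands in for the release binary. Both variables are defaults, so a deployment that wants distribution can still override them.

```shell
#!/bin/sh
# Hypothetical launcher: disable BEAM distribution unless explicitly enabled.
# The function body is a subshell, so the exports do not leak to the caller.
start_release() (
  # Per the proposal above: "none" means no distribution port and no epmd.
  RELEASE_DISTRIBUTION="${RELEASE_DISTRIBUTION:-none}"
  # A cookie is still required to boot, so fall back to a well-known default.
  RELEASE_COOKIE="${RELEASE_COOKIE:-well-known-default}"
  export RELEASE_DISTRIBUTION RELEASE_COOKIE
  exec "$@"
)

# Demo with a stand-in command that prints what the release would see.
unset RELEASE_DISTRIBUTION RELEASE_COOKIE
print_env='printf "%s/%s" "$RELEASE_DISTRIBUTION" "$RELEASE_COOKIE"'
default_env="$(start_release sh -c "$print_env")"
override_env="$(RELEASE_DISTRIBUTION=sname RELEASE_COOKIE=secret start_release sh -c "$print_env")"
echo "default: $default_env, override: $override_env"
```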

@afontaine
Contributor

Yeah I'm leaning towards agreeing with @yu-re-ka here as well.

That being said... I would like to keep the remote_iex functionality intact, at least for localhost connections.

It's probably more correct to disable distribution stuff by default, and let users explicitly enable them though.

Does the BEAM VM require a cookie? If one isn't provided, does that mean anyone can connect? If not, then I would say remove the cookie.

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Nov 12, 2022
@zoedsoupe

Any updates or new thoughts on this?

@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Mar 30, 2023
@benbot

benbot commented Dec 2, 2023

This seems to be preventing me from running my phoenix app :(

What are we supposed to do?

alejandro-angulo added a commit to alejandro-angulo/dotfiles that referenced this issue Dec 18, 2023
- Need to provide RELEASE_COOKIE environment variable when running the
  app (NixOS/nixpkgs#166229)

- Deploy script has an output directory hardcoded that doesn't play nice
  with nix. I made a change and generated a patch file with `git diff` in
  my local copy of the repo. I also had to make sure to change the
  filepaths in the patchfile to remove the `assets/` prefix. The
  contents of this directory must be moved to `priv/static/assets`.

- Have to manually install the phoenix node dependencies (these aren't
  fetched from npm, but from the repo itself).
@SecretVal

I am also having problems with this. Any updates?

@lambdadog
Contributor

I can say that I'm no longer really available to work on this issue (and apologies for falling through on it previously), but as far as I'm aware the implementation I discussed should still be sound.

I am curious if anything has come from mentions of upstreaming some changes. I get the feeling that Elixir and Erlang applications weren't really designed to be distributed via a system package manager and that's the core of the issue, but I'm not sure if the teams upstream would be interested in including and maintaining changes that enable this kind of package manager distribution a bit better.
