
Services depending on keys.target can cause hanging boots on NixOS containers #67265

Ma27 opened this issue Aug 22, 2019 · 5 comments


@Ma27 Ma27 commented Aug 22, 2019

Describe the bug

When starting an imperative NixOS container that is deployed using the container backend from NixOps, with several secrets uploaded using the deployment.keys module and a dovecot2 unit from services.dovecot installed, the boot times out and the container fails: it waits indefinitely for keys.target (a systemd target that indicates whether all keys from NixOps were successfully uploaded).

This happens because several modules from nixpkgs (including dovecot) wait for keys.target by default, but are wanted by multi-user.target, which causes the system to wait until the unit has started up (which is only supposed to happen once keys.target is reached).
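For reference, the coupling at issue looks roughly like this in the affected modules (a sketch of the pattern, not the exact module code; the surrounding option set varies per module):

``` nix
# Sketch: how an affected module ties itself to the NixOps keys.target.
systemd.services.dovecot2 = {
  wantedBy = [ "multi-user.target" ];
  # The service is ordered after keys.target and pulls it in,
  # so boot blocks until all NixOps keys have been uploaded.
  wants = [ "keys.target" ];
  after = [ "keys.target" ];
};
```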

The problem with NixOS containers is that they don't have a proper uplink until container@name.service has completely started when using scripted networking, as the ve-<name> interface on the host side is only configured after the container has started up:

``` nix
postStartScript = (cfg:
  let
    ipcall = cfg: ipcmd: variable: attribute:
      if cfg.${attribute} == null then
        ''
          if [ -n "${variable}" ]; then
            ${ipcmd} add ${variable} dev $ifaceHost
          fi
        ''
      else
        ''${ipcmd} add ${cfg.${attribute}} dev $ifaceHost'';

    renderExtraVeth = name: cfg:
      if cfg.hostBridge != null then
        ''
          # Add ${name} to bridge ${cfg.hostBridge}
          ip link set dev ${name} master ${cfg.hostBridge} up
        ''
      else
        ''
          echo "Bring ${name} up"
          ip link set dev ${name} up
          # Set IPs and routes for ${name}
          ${optionalString (cfg.hostAddress != null) ''
            ip addr add ${cfg.hostAddress} dev ${name}
          ''}
          ${optionalString (cfg.hostAddress6 != null) ''
            ip -6 addr add ${cfg.hostAddress6} dev ${name}
          ''}
          ${optionalString (cfg.localAddress != null) ''
            ip route add ${cfg.localAddress} dev ${name}
          ''}
          ${optionalString (cfg.localAddress6 != null) ''
            ip -6 route add ${cfg.localAddress6} dev ${name}
          ''}
        '';
  in
  ''
    if [ -n "$HOST_ADDRESS" ]  || [ -n "$LOCAL_ADDRESS" ] ||
       [ -n "$HOST_ADDRESS6" ] || [ -n "$LOCAL_ADDRESS6" ]; then
      if [ -z "$HOST_BRIDGE" ]; then
        ip link set dev $ifaceHost up
        ${ipcall cfg "ip addr" "$HOST_ADDRESS" "hostAddress"}
        ${ipcall cfg "ip -6 addr" "$HOST_ADDRESS6" "hostAddress6"}
        ${ipcall cfg "ip route" "$LOCAL_ADDRESS" "localAddress"}
        ${ipcall cfg "ip -6 route" "$LOCAL_ADDRESS6" "localAddress6"}
      fi
    fi
    ${concatStringsSep "\n" (mapAttrsToList renderExtraVeth cfg.extraVeths)}

    # Get the leader PID so that we can signal it in
    # preStop. We can't use machinectl there because D-Bus
    # might be shutting down. FIXME: in systemd 219 we can
    # just signal systemd-nspawn to do a clean shutdown.
    machinectl show "$INSTANCE" | sed 's/Leader=\(.*\)/\1/;t;d' > "/run/containers/$"
  '');
```

With the container being unreachable until start-up is done, it's impossible to send keys to containers on an unattended reboot to ensure that keys.target is properly reached (which makes the system wait for dovecot2.service, as it currently depends on keys.target). The timeout of dovecot2 keeps the container from starting up properly.

In my case the issue wouldn't exist if dovecot2.service didn't depend on keys.target, as I currently only deploy secrets for services.borgbackup, so it's completely unnecessary for dovecot2.service to wait for that target.

My current workaround looks like this:

``` nix
{ lib, ... }: {
  systemd.services.dovecot2 = {
    wants = lib.mkForce [ ];
    after = lib.mkForce [ "" ];
  };
}
```

To Reproduce
Steps to reproduce the behavior:

  1. Deploy a container with deployment.targetEnv = "container";
  2. Deploy several secrets with deployment.keys and a dovecot instance using services.dovecot
  3. Try to reboot the container

Expected behavior

I originally expected that no service would wait for the keys on its own without explicitly being configured to do so, as the NixOps manual recommends using the <key-name>-key.service units and explicitly adding those to the units in question.
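A minimal sketch of that explicitly-wired variant (assuming a NixOps key named `secret`, which yields a `secret-key.service` unit; both the key name and the service are illustrative):

``` nix
{
  systemd.services.dovecot2 = {
    # Wait only for the one key this service actually needs,
    # instead of the global keys.target.
    after = [ "secret-key.service" ];
    wants = [ "secret-key.service" ];
  };
}
```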

However, one might argue as well that the actual issue is the broken uplink for NixOS containers at boot, so I'd like to gather some opinions before filing a patch :)

Maintainer information:

# a list of nixpkgs attributes affected by the problem
# a list of nixos modules affected by the problem
  - systemd
  - services.dovecot
  - services.httpd
  - services.nsd
  - services.strongswan
  - services.strongswan-swanctl

CCing @edolstra @hrdinka (for dovecot2) and @lheckemann (as we talked about this earlier that day)


@hrdinka hrdinka commented Aug 23, 2019


Thanks for the detailed write-up. I added the dependency to dovecot a while back. My problem at the time was that dovecot (and with it the whole system) would not start because it was missing the key files. In fact, every service could possibly depend on a key deployed by NixOps, so a solution that fixes this problem for all services would be preferable.

While it would be great to have a proper replacement, finding one isn't easy for the reasons described above. Since NixOps does cover this in its documentation now (it didn't back then), I am in favor of removing the dependency from all services. We should, however, add this to the NixOS release notes and wait for NixOS 19.09 before porting this to stable.


@lheckemann lheckemann commented Aug 23, 2019

Absolutely agree that this shouldn't go into 19.03, but yeah I'm also in favour of making the change on master before the feature freeze (7th September) :)


@Ma27 Ma27 commented Aug 23, 2019

Thanks for the feedback! I'll open a PR tomorrow which removes the dependencies on keys.target from modules in <nixpkgs/nixos>.

However, I'd keep this issue open after that until we've discussed whether keys.target should be declared in a module in NixOps.

Ma27 added a commit to Ma27/nixpkgs that referenced this issue Aug 27, 2019
The `keys.target` unit is used to indicate whether all NixOps keys were
successfully uploaded on an unattended reboot. However, this can cause
startup issues e.g. with NixOS containers (see NixOS#67265) and can block
boots even though this might not be needed (e.g. with a dovecot2
instance running that doesn't need any of the NixOps keys).

As described in the NixOps manual[1], dependencies on keys should now be
defined like this:

``` nix
{ systemd.services.<service-name> = {
    after = [ "secret-key.service" ];
    wants = [ "secret-key.service" ];
  };
}
```

However, I'd leave the issue open until it's discussed whether or not to
keep `keys.target` in `nixpkgs`.


@Ma27 Ma27 commented Apr 16, 2020

The actual issue has been fixed for 19.09 already, so this should be closable now.

@Ma27 Ma27 closed this Apr 16, 2020

@nixos-discourse nixos-discourse commented Aug 4, 2020

This issue has been mentioned on NixOS Discourse. There might be relevant details there.
