nixos-container tool is broken if $NIXOS_CONFIG environment variable is set #22948

Closed
peti opened this issue Feb 18, 2017 · 13 comments

@peti
Member

peti commented Feb 18, 2017

I tried to create and run a container on NixOS 17.03pre100896 (4450327). Unfortunately, this doesn't work:

# nixos-container create foo 
host IP is 10.233.1.1, container IP is 10.233.1.2

# nixos-container start foo 
Job for container@foo.service failed because a timeout was exceeded.
See "systemctl status container@foo.service" and "journalctl -xe" for details.
/run/current-system/sw/bin/nixos-container: failed to start container

The relevant bit of the journal seems to be:

[...]
Feb 18 19:09:24 container foo[5465]:          Starting Networking Setup...
Feb 18 19:09:24 container foo[5465]: [  OK  ] Started Networking Setup.
Feb 18 19:10:54 systemd[1]: container@foo.service: Start operation timed out. Terminating.
Feb 18 19:10:54 container foo[5465]: [185B blob data]
Feb 18 19:10:54 container foo[5465]: [ TIME ] Timed out waiting for device dev-disk-by\x2dlabel-boot.device.
Feb 18 19:10:54 container foo[5465]: [DEPEND] Dependency failed for /boot.
Feb 18 19:10:54 container foo[5465]: [DEPEND] Dependency failed for Local File Systems.
Feb 18 19:10:54 container foo[5465]: [ TIME ] Timed out waiting for device sys-subsystem-net-devices-wlp2s0.device.
[...]

Apparently, the container cannot find its boot device.
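
For anyone reading a journal like the one above, here is a small sketch that pulls out the device units systemd gave up waiting for. The function name timed_out_devices is made up for illustration, and the pattern only assumes the "Timed out waiting for device ..." wording shown in this log.

```shell
# Hypothetical helper: print the device units named in
# "Timed out waiting for device <unit>." lines read from stdin.
timed_out_devices() {
  sed -n 's/.*Timed out waiting for device \(.*\.device\)\.$/\1/p'
}

# Possible usage on the host (not run here):
#   journalctl -u container@foo.service | timed_out_devices
```

Devices that belong to the host (a boot partition, a wireless interface) showing up in this list suggest that the container is being built from the wrong configuration.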

@peti peti added the 0.kind: bug Something is broken label Feb 18, 2017
@globin globin added the 1.severity: blocker This is preventing another PR or issue from being completed label Feb 18, 2017
@globin globin added this to the 17.03 milestone Feb 18, 2017
@globin
Member

globin commented Feb 18, 2017

Could you try again with the current master? I've merged a systemd patch fixing docker and lxc.

@peti
Member Author

peti commented Feb 19, 2017

I tried again using today's master branch at b322271, but it makes no difference:

Feb 19 10:28:38 latitude container foo[3214]: Spawning container foo on /var/lib/containers/foo.
Feb 19 10:28:38 latitude container foo[3214]: Press ^] three times within 1s to kill container.
Feb 19 10:28:38 latitude container foo[3214]: /etc/localtime does not point into /usr/share/zoneinfo/, not updating container timezone.
Feb 19 10:28:38 latitude container foo[3214]: <<< NixOS Stage 2 >>>
Feb 19 10:28:38 latitude container foo[3214]: tee: /proc/self/fd/10: No such device or address
Feb 19 10:28:39 latitude container foo[3214]: starting systemd...
Feb 19 10:28:39 latitude container foo[3214]: systemd 232 running in system mode. (+PAM +AUDIT -SELINUX +IMA +APPARMOR -SMACK -SYSVINIT +UTMP -LIBCRYPTSETUP +GCRYPT -GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID -ELFUTILS +KMOD -IDN)
Feb 19 10:28:39 latitude container foo[3214]: Detected virtualization systemd-nspawn.
Feb 19 10:28:39 latitude container foo[3214]: Detected architecture x86-64.
Feb 19 10:28:39 latitude container foo[3214]: [1B blob data]
Feb 19 10:28:39 latitude container foo[3214]: Welcome to NixOS 17.03pre101267.a9584c9 (Gorilla)!
Feb 19 10:28:39 latitude container foo[3214]: [1B blob data]
Feb 19 10:28:39 latitude container foo[3214]: Set hostname to <latitude>.
Feb 19 10:28:39 latitude container foo[3214]: Initializing machine ID from container UUID.
Feb 19 10:28:39 latitude container foo[3214]: Failed to install release agent, ignoring: No such file or directory
Feb 19 10:28:39 latitude container foo[3214]: console-getty.service: Cannot add dependency job, ignoring: Unit console-getty.service is masked.
Feb 19 10:28:39 latitude container foo[3214]: [  OK  ] Created slice User and Session Slice.
Feb 19 10:28:39 latitude container foo[3214]: [  OK  ] Listening on udev Kernel Socket.
Feb 19 10:28:39 latitude container foo[3214]: [  OK  ] Reached target Swap.
Feb 19 10:28:39 latitude container foo[3214]: [  OK  ] Listening on /dev/initctl Compatibility Named Pipe.
Feb 19 10:28:39 latitude container foo[3214]: [  OK  ] Started Dispatch Password Requests to Console Directory Watch.
Feb 19 10:28:39 latitude container foo[3214]: [  OK  ] Started Forward Password Requests to Wall Directory Watch.
Feb 19 10:28:39 latitude container foo[3214]: [  OK  ] Listening on Journal Socket (/dev/log).
Feb 19 10:28:39 latitude container foo[3214]: [UNSUPP] Starting of proc-sys-fs-binfmt_misc.automount not supported.
Feb 19 10:28:39 latitude container foo[3214]: [  OK  ] Listening on Journal Socket.
Feb 19 10:28:39 latitude container foo[3214]: [  OK  ] Reached target Paths.
Feb 19 10:28:39 latitude container foo[3214]: [  OK  ] Reached target Remote File Systems.
Feb 19 10:28:39 latitude container foo[3214]: [  OK  ] Listening on udev Control Socket.
Feb 19 10:28:39 latitude container foo[3214]: [  OK  ] Created slice System Slice.
Feb 19 10:28:39 latitude container foo[3214]:          Starting udev Coldplug all Devices...
Feb 19 10:28:39 latitude container foo[3214]:          Starting Journal Service...
Feb 19 10:28:39 latitude container foo[3214]: [  OK  ] Reached target Slices.
Feb 19 10:28:39 latitude container foo[3214]: [  OK  ] Created slice system-getty.slice.
Feb 19 10:28:39 latitude container foo[3214]:          Mounting Huge Pages File System...
Feb 19 10:28:39 latitude container foo[3214]:          Starting Firewall...
Feb 19 10:28:39 latitude container foo[3214]:          Mounting POSIX Message Queue File System...
Feb 19 10:28:39 latitude container foo[3214]:          Starting udev Kernel Device Manager...
Feb 19 10:28:39 latitude container foo[3214]:          Starting Apply Kernel Variables...
Feb 19 10:28:39 latitude container foo[3214]: [  OK  ] Reached target Local File Systems (Pre).
Feb 19 10:28:39 latitude container foo[3214]:          Starting Update UTMP about System Boot/Shutdown...
Feb 19 10:28:39 latitude container foo[3214]: [  OK  ] Started Update UTMP about System Boot/Shutdown.
Feb 19 10:28:39 latitude container foo[3214]: [  OK  ] Started Apply Kernel Variables.
Feb 19 10:28:39 latitude container foo[3214]: [  OK  ] Mounted Huge Pages File System.
Feb 19 10:28:39 latitude container foo[3214]: [  OK  ] Mounted POSIX Message Queue File System.
Feb 19 10:28:39 latitude container foo[3214]: [  OK  ] Started Journal Service.
Feb 19 10:28:39 latitude container foo[3214]:          Starting Flush Journal to Persistent Storage...
Feb 19 10:28:39 latitude container foo[3214]: [  OK  ] Started udev Kernel Device Manager.
Feb 19 10:28:39 latitude container foo[3214]: [  OK  ] Started Flush Journal to Persistent Storage.
Feb 19 10:28:39 latitude container foo[3214]: [  OK  ] Started udev Coldplug all Devices.
Feb 19 10:28:39 latitude container foo[3214]:          Starting udev Wait for Complete Device Initialization...
Feb 19 10:28:39 latitude container foo[3214]: [  OK  ] Started udev Wait for Complete Device Initialization.
Feb 19 10:28:39 latitude container foo[3214]: [  OK  ] Started Firewall.
Feb 19 10:28:39 latitude container foo[3214]: [  OK  ] Reached target Network (Pre).
Feb 19 10:28:39 latitude container foo[3214]: [  OK  ] Reached target All Network Interfaces (deprecated).
Feb 19 10:28:39 latitude container foo[3214]:          Starting Networking Setup...
Feb 19 10:28:39 latitude container foo[3214]: [  OK  ] Started Networking Setup.
Feb 19 10:28:56 latitude container foo[3214]: [2.0K blob data]
Feb 19 10:29:08 latitude container foo[3214]: [2.0K blob data]
Feb 19 10:29:20 latitude container foo[3214]: [2.0K blob data]
Feb 19 10:29:32 latitude container foo[3214]: [2.0K blob data]
Feb 19 10:29:44 latitude container foo[3214]: [2.0K blob data]
Feb 19 10:29:56 latitude container foo[3214]: [2.0K blob data]
Feb 19 10:30:08 latitude container foo[3214]: [2.0K blob data]
Feb 19 10:30:09 latitude container foo[3214]: [185B blob data]
Feb 19 10:30:09 latitude container foo[3214]: [ TIME ] Timed out waiting for device sys-subsystem-net-devices-wlp2s0.device.
Feb 19 10:30:09 latitude container foo[3214]: [DEPEND] Dependency failed for WPA Supplicant.
Feb 19 10:30:09 latitude container foo[3214]: [ TIME ] Timed out waiting for device dev-disk-by\x2dlabel-boot.device.
Feb 19 10:30:09 latitude container foo[3214]: [DEPEND] Dependency failed for /boot.
Feb 19 10:30:09 latitude container foo[3214]: [DEPEND] Dependency failed for Local File Systems.
Feb 19 10:30:09 latitude container foo[3214]:          Starting Create Volatile Files and Directories...
Feb 19 10:30:09 latitude container foo[3214]: [  OK  ] Reached target Host and Network Name Lookups.
Feb 19 10:30:09 latitude container foo[3214]: [  OK  ] Reached target Login Prompts.
Feb 19 10:30:09 latitude container foo[3214]: [  OK  ] Reached target Timers.
Feb 19 10:30:09 latitude container foo[3214]: [  OK  ] Reached target Sockets.
Feb 19 10:30:09 latitude container foo[3214]: [  OK  ] Started Emergency Shell.
Feb 19 10:30:09 latitude container foo[3214]: [  OK  ] Reached target Emergency Mode.
Feb 19 10:30:09 latitude container foo[3214]:          Starting Rebuild Journal Catalog...
Feb 19 10:30:09 latitude container foo[3214]: [  OK  ] Reached target User and Group Name Lookups.
Feb 19 10:30:09 latitude container foo[3214]: [  OK  ] Reached target Network.
Feb 19 10:30:09 latitude container foo[3214]: [  OK  ] Started Rebuild Journal Catalog.
Feb 19 10:30:09 latitude container foo[3214]: [  OK  ] Started Create Volatile Files and Directories.
Feb 19 10:30:09 latitude container foo[3214]: [  OK  ] Reached target System Time Synchronized.
Feb 19 10:30:09 latitude container foo[3214]: You are in emergency mode. After logging in, type "journalctl -xb" to view
Feb 19 10:30:09 latitude container foo[3214]: system logs, "systemctl reboot" to reboot, "systemctl default" or ^D to
Feb 19 10:30:09 latitude container foo[3214]: try again to boot into default mode.
Feb 19 10:30:09 latitude container foo[3214]: [1B blob data]
Feb 19 10:30:09 latitude container foo[3214]: Cannot open access to console, the root account is locked.
Feb 19 10:30:09 latitude container foo[3214]: See sulogin(8) man page for more details.
Feb 19 10:30:09 latitude container foo[3214]: [1B blob data]
Feb 19 10:30:09 latitude container foo[3214]: Press Enter to continue.

@zimbatm
Member

zimbatm commented Feb 19, 2017

@globin can you point me to the patch?

For docker the workaround is to add this in the configuration.nix:

{
  boot.kernelParams = [
    "systemd.legacy_systemd_cgroup_controller=1"
  ];
}

See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=843509

@zimbatm
Member

zimbatm commented Feb 19, 2017

See also moby/moby#16238

@globin
Member

globin commented Feb 20, 2017

See 31ff2ac for the bump in nixpkgs and the cherry-picked commit: systemd/systemd@843d5ba

I don't think this is actually relevant to this issue, though, and I still haven't been able to reproduce it.

@matthiasbeyer
Contributor

Cannot reproduce (17.03pre101636.183eeb3 (Gorilla)):

 0 │ sudo nixos-container create foo
[sudo] Passwort für m: 
host IP is 10.233.1.1, container IP is 10.233.1.2
these derivations will be built:
  /nix/store/2id8ndm2454297vi4g15nqgbzqvn3gd2-system-path.drv
  /nix/store/48j19yxivrkvgrgl0a28d14mmn5f3kg2-dbus-1.drv
  /nix/store/m98654zdy53739dcvn9ic13yszb1l072-unit-polkit.service.drv
  /nix/store/w5rbjwbv473bvwli7igjaa9l4pycah35-unit-dbus.service.drv
  /nix/store/b5cir4s84dkp3mjcsgv3qxkf0kkbbbxh-system-units.drv
  /nix/store/plrc7pnrwyf4aygh3xhjvfh8mv270hjb-user-units.drv
  /nix/store/vlqzkl24xg4sdc2b2cq12x61szynnv3k-etc-hostname.drv
  /nix/store/7sgcd4bsj78mhpid23rwcqzh41l32kv1-etc.drv
  /nix/store/b5s6943hlbi3gq87jvachf92jmz3555b-users-groups.json.drv
  /nix/store/q7b8iralxi12jc91pq1mb5dq4qndjv35-nixos-system-foo-17.03pre101636.183eeb3.drv
building path(s) ‘/nix/store/8pwypxrpvq5pmgwynfzmrbw80h1mgypj-users-groups.json’
building path(s) ‘/nix/store/89nvwqbdy2ql0qagwrd6353c3zh24gbr-etc-hostname’
building path(s) ‘/nix/store/vkb974w01q2fc8wvd79lrn5as6lkmrng-system-path’
created 4113 symlinks in user environment
install-info: warning: no info dir entry in `/nix/store/vkb974w01q2fc8wvd79lrn5as6lkmrng-system-path/share/info/time.info'
building path(s) ‘/nix/store/vpsg13xa75b6ysm6nvb95snnizmjmnpy-dbus-1’
building path(s) ‘/nix/store/4nnca1j56bi7rwcn5vkcd6n99sa5ikzg-unit-polkit.service’
building path(s) ‘/nix/store/9lsd7pwys3q8xw6fgyi9zni7009kg04w-unit-dbus.service’
building path(s) ‘/nix/store/zp0js20av3c6cq6jb47gcnbns5zmbidj-system-units’
building path(s) ‘/nix/store/2b9gjh3jgrnsp0x7x4ngzy89655pw0hw-user-units’
building path(s) ‘/nix/store/x13llsl0vca2mq879rbpljjdr7394gz5-etc’
building path(s) ‘/nix/store/nfn3szjfjrv1zfmlrd16qli7ajncnc79-nixos-system-foo-17.03pre101636.183eeb3’
 0 │ sudo nixos-container start foo
 0 │ sudo nixos-container status foo
up

@fpletz
Member

fpletz commented Feb 25, 2017

I'm also not able to reproduce this.

But why would the container even wait for a boot device or the wireless interface? I also noticed:

Feb 19 10:28:39 latitude container foo[3214]: Set hostname to <latitude>.

@peti Is latitude the hostname of your host system and thus the host config used in the container somehow?

@peti
Member Author

peti commented Feb 27, 2017

@fpletz, yes, latitude is the name of the host system. I don't think that the host's config should be used inside the container. I created an empty container by running nixos-container create foo, which gets me the error message I cited above. Then I tried creating a container with a running ssh daemon, nixos-container create foo --config 'services.openssh.enable = true; users.extraUsers.root.openssh.authorizedKeys.keys = ["ssh-rsa AAAA..."];', but this behaves exactly the same way. I can reproduce this issue reliably on two different machines, both of which run a very recent NixOS master, i.e. revision a9584c9 from today.

@Mic92
Member

Mic92 commented Feb 27, 2017

@peti it most likely fails because it cannot mount something (Dependency failed for Local File Systems). Can you log in to emergency mode and check which .mount unit fails? Just run systemctl and then skim the list for failing .mount units.

Update: After reading your logs, I saw that it tries to mount the disk dev-disk-by\x2dlabel-boot.device. This should not be in the list of filesystems of a container. Can you spot it in your configuration?
Otherwise, as a workaround, adding nofail should prevent systemd from refusing to start:

  fileSystems = [
    { mountPoint = "/boot";
      device = "/dev/disk/by-label/boot";
      options = [ "nofail" ];
    }
  ];
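
To see how the unit name from the log relates to the device path in the snippet above, here is a rough sketch that decodes a systemd device unit name back into a path. The function name device_unit_to_path is hypothetical, and the sketch only handles the \x2d escape (an escaped "-") that appears in this log; systemd's own systemd-escape --unescape --path covers the general case.

```shell
# Hypothetical helper: turn a device unit name such as
# dev-disk-by\x2dlabel-boot.device into the path it stands for.
# Assumption: only the \x2d escape occurs in the name.
device_unit_to_path() {
  printf '%s\n' "$1" | sed \
    -e 's/\.device$//' \
    -e 's|-|/|g' \
    -e 's|\\x2d|-|g' \
    -e 's|^|/|'
}

device_unit_to_path 'dev-disk-by\x2dlabel-boot.device'
# prints /dev/disk/by-label/boot
```

The order of the substitutions matters: the plain "-" separators are turned into "/" first, and only then is the escaped \x2d turned back into a literal "-".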

@peti
Member Author

peti commented Feb 27, 2017

I added the nofail option, and it does indeed make a difference. I can now see that the system attempts to start all kinds of crazy services which definitely belong to the host system, not to the container. Here's just a snippet of the output:

Feb 27 21:25:33 latitude kernel: bbswitch: discrete card already disabled
Feb 27 21:25:33 latitude container foo[9074]: [  OK  ] Started S.M.A.R.T. Daemon.
Feb 27 21:25:33 latitude container foo[9074]:          Starting Extra networking commands....
Feb 27 21:25:33 latitude container foo[9074]: [  OK  ] Started Disable NVIDIA Card.
Feb 27 21:25:33 latitude container foo[9074]: [FAILED] Failed to start Configure I/O Scheduler.
Feb 27 21:25:33 latitude container foo[9074]: See 'systemctl status set-io-scheduler.service' for details.
Feb 27 21:25:33 latitude container foo[9074]: [  OK  ] Started Extra networking commands..
Feb 27 21:25:33 latitude container foo[9074]: [  OK  ] Started Cleanup old Diffie-Hellman parameters.
Feb 27 21:25:33 latitude container foo[9074]: [  OK  ] Started Name Service Cache Daemon.
Feb 27 21:25:33 latitude container foo[9074]: [  OK  ] Reached target User and Group Name Lookups.
Feb 27 21:25:33 latitude container foo[9074]:          Starting Login Service...
Feb 27 21:25:33 latitude container foo[9074]: [  OK  ] Reached target Host and Network Name Lookups.
Feb 27 21:25:33 latitude container foo[9074]: [  OK  ] Started D-Bus System Message Bus.
Feb 27 21:25:33 latitude container foo[9074]: [  OK  ] Started Login Service.
Feb 27 21:25:33 latitude container foo[9074]:          Starting X11 Server...
Feb 27 21:25:33 latitude container foo[9074]: [  OK  ] Started X11 Server.
Feb 27 21:25:34 latitude container foo[9074]: [  OK  ] Started SSH Host Key Generation.
Feb 27 21:25:34 latitude container foo[9074]: [  OK  ] Started SSH Daemon.
Feb 27 21:25:34 latitude container foo[9074]: [  OK  ] Stopped X11 Server.
Feb 27 21:25:34 latitude container foo[9074]:          Starting X11 Server...
Feb 27 21:25:34 latitude container foo[9074]: [  OK  ] Started X11 Server.
Feb 27 21:25:36 latitude container foo[9074]: [  OK  ] Stopped X11 Server.
Feb 27 21:25:36 latitude container foo[9074]:          Starting X11 Server...
Feb 27 21:25:36 latitude container foo[9074]: [  OK  ] Started X11 Server.
Feb 27 21:25:37 latitude container foo[9074]: [  OK  ] Stopped X11 Server.
Feb 27 21:25:37 latitude container foo[9074]: [FAILED] Failed to start X11 Server.
Feb 27 21:25:37 latitude container foo[9074]: See 'systemctl status display-manager.service' for details.
[  *** ] (1 of 3) A start job is running for…
Feb 27 21:27:03 latitude systemd[1]: container@foo.service: Start operation timed out. Terminating.
[ TIME ] Timed out waiting for device dev-disk-by\x2dlabel-root.device.

Notice how the container even attempts to set up a display-manager.service for X11!

So clearly this container is created using the host's configuration.nix file, not the one from /var/lib/containers/foo/etc/nixos/configuration.nix, which looks genuine.

@globin
Member

globin commented Mar 22, 2017

@peti have you had any more insights into what could be causing this? I've been using containers a lot and haven't had any issues that are remotely similar to this.

@peti
Member Author

peti commented Mar 22, 2017

The problem is that I have the environment variable $NIXOS_CONFIG set in my shell:

# declare -p NIXOS_CONFIG 
declare -x NIXOS_CONFIG="/etc/nixos/configuration-latitude.nix"

With that variable defined, container operations fail. If I remove the variable, containers work fine.
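
Until a fix lands, one way to keep the variable for everything else is to drop it only for the container commands. This is a minimal workaround sketch, not the fix that was eventually merged; the function name without_nixos_config is made up for illustration.

```shell
# Hypothetical wrapper: run a command with NIXOS_CONFIG removed from
# its environment. The subshell keeps the caller's own environment
# (including NIXOS_CONFIG) untouched.
without_nixos_config() {
  ( unset NIXOS_CONFIG; "$@" )
}

# Possible usage (not run here):
#   without_nixos_config nixos-container create foo
#   without_nixos_config nixos-container start foo
```

Because the unset happens in a subshell, declare -p NIXOS_CONFIG in the calling shell still shows the original value afterwards.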

@peti peti changed the title from "nixos-container broken in current nixos-unstable" to "nixos-container tool is broken if $NIXOS_CONFIG environment variable is set" Mar 22, 2017
@globin
Member

globin commented Mar 22, 2017

Ah thanks, working on it 🚧

@globin globin self-assigned this Mar 22, 2017
globin added a commit to mayflower/nixpkgs that referenced this issue Mar 22, 2017
globin added a commit that referenced this issue Mar 22, 2017
@globin globin closed this as completed in 9b9416c Mar 22, 2017
adrianpk added a commit to adrianpk/nixpkgs that referenced this issue May 31, 2024