Imperative containers broken on 18.03/nixos-unstable #40355

Closed
cillianderoiste opened this issue May 11, 2018 · 17 comments · Fixed by #47917

Comments

@cillianderoiste
Member

cillianderoiste commented May 11, 2018

Issue description

I upgraded an imperative container from 17.03 to 18.03 with nixos-unstable on the host: 18.09pre139319.1d9330d63a5 (Jellyfish), and now I can no longer run any nix commands:

$ nixos-rebuild -v switch 
error: opening lock file '/nix/var/nix/db/big-lock': Read-only file system
building Nix...
error: opening lock file '/nix/var/nix/db/big-lock': Read-only file system
error: opening lock file '/nix/var/nix/db/big-lock': Read-only file system
error: opening lock file '/nix/var/nix/db/big-lock': Read-only file system
warning: don't know how to get latest Nix
error: opening lock file '/nix/var/nix/db/big-lock': Read-only file system
building the system configuration...
error: opening lock file '/nix/var/nix/db/big-lock': Read-only file system
$ nix-env -iA nixos.hello 
error: opening lock file '/nix/var/nix/db/big-lock': Read-only file system

I also tried creating a fresh container and it breaks in the same way.

Technical details

nix-info on the container fails with:
error: opening lock file '/nix/var/nix/db/big-lock': Read-only file system
system: 0, multi-user?: no, error: opening lock file '/nix/var/nix/db/big-lock': Read-only file system
version: 0, error: opening lock file '/nix/var/nix/db/big-lock': Read-only file system

nix-info on the host system:
system: "x86_64-linux", multi-user?: yes, version: nix-env (Nix) 2.0.2, channels(goibhniu): "nixos-18.03pre120540.b8f7027360", channels(root): "nixos-18.09pre139319.1d9330d63a5", nixpkgs: /nix/var/nix/profiles/per-user/root/channels/nixos/nixpkgs

@arianvp
Member

arianvp commented May 14, 2018

I can confirm this issue on a clean 18.03 install.

To reproduce:

nixos-container create test
nixos-container start test
nixos-container root-login test
# nixos-rebuild switch

@cillianderoiste changed the title from "Imperative container breaks nix commands after upgrade from 17.03 to 18.03 (big-lock)" to "Imperative containers broken on 18.03/nixos-unstable" on May 14, 2018
@arianvp
Member

arianvp commented May 18, 2018

This issue seems related to NixOS/nix#2134

@Profpatsch
Member

error: opening lock file '/nix/var/nix/db/big-lock': Read-only file system

That’s the error message you get when you try to use nix 1.11 together with 2.0.
I suspect a different version might be running inside the imperative containers than outside?

@arianvp
Member

arianvp commented May 27, 2018

This doesn't seem to be the issue. The container appears to be running the same nix version:

[arian@nixos:~]$ sudo nixos-container root-login lol3
[sudo] password for arian: 

[root@lol3:~]# nix-build --version
nix-build (Nix) 2.0.2

[root@lol3:~]# nix --version
nix (Nix) 2.0.2

[root@lol3:~]# nix-env --version
error: opening lock file '/nix/var/nix/db/big-lock': Read-only file system

[root@lol3:~]# readlink $(which nix-env)
/nix/store/j4di8j9awar03dfz2c91hd0yrdw427v1-nix-2.0.2/bin/nix-env

Interestingly enough, even the command nix-env --version already fails.

@arianvp
Member

arianvp commented May 27, 2018

Aha, I think I found the culprit

/nix/var/nix/db is mounted read-only into the container, whereas in 17.09 I think it would have been read-write. (How else would an imperative container modify the nix store?)
However, I do not have a 17.09 install at hand on which I could try this.

[root@lol3:/nix/var/nix/db]# cat /proc/mounts 
/dev/sda5 / btrfs rw,relatime,ssd,space_cache,subvolid=5,subvol=/var/lib/containers/lol3 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev 0 0
tmpfs /sys tmpfs ro,nosuid,nodev,noexec,relatime,mode=755 0 0
tmpfs /dev tmpfs rw,nosuid,size=822424k,mode=755 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev,size=8224204k 0 0
devtmpfs /dev/net/tun devtmpfs rw,nosuid,size=822424k,nr_inodes=2046098,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=3,mode=620,ptmxmode=666 0 0
devpts /dev/console devpts rw,nosuid,noexec,relatime,gid=3,mode=620,ptmxmode=666 0 0
tmpfs /run tmpfs rw,nosuid,nodev,size=4112104k,mode=755 0 0
tmpfs /run/systemd/nspawn/incoming tmpfs ro,size=4112104k,mode=755 0 0
/dev/sda5 /nix/store btrfs ro,relatime,ssd,space_cache,subvolid=5,subvol=/nix/store 0 0
/dev/sda5 /nix/var/nix/daemon-socket btrfs ro,relatime,ssd,space_cache,subvolid=5,subvol=/nix/var/nix/daemon-socket 0 0
/dev/sda5 /nix/var/nix/db btrfs ro,relatime,ssd,space_cache,subvolid=5,subvol=/nix/var/nix/db 0 0
/dev/sda5 /nix/var/nix/gcroots btrfs rw,relatime,ssd,space_cache,subvolid=5,subvol=/nix/var/nix/gcroots/per-container/lol3 0 0
/dev/sda5 /nix/var/nix/profiles btrfs rw,relatime,ssd,space_cache,subvolid=5,subvol=/nix/var/nix/profiles/per-container/lol3 0 0
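
The read-only flag in the listing above can also be checked programmatically. The following is a minimal sketch, not part of nixos-container: the function name `mount_mode` is hypothetical, and it assumes the standard /proc/mounts field layout (mount point in field 2, options in field 4).

```shell
#!/usr/bin/env bash
# mount_mode MOUNTPOINT [MOUNTS_FILE]
# Prints "ro" or "rw" for the last entry matching MOUNTPOINT in a
# /proc/mounts-style file. Returns 1 if the mount point is not listed.
# (Hypothetical helper for illustration only.)
mount_mode() {
    local target=$1 mounts=${2:-/proc/mounts}
    awk -v target="$target" '
        $2 == target {
            # Field 4 is the comma-separated option list; ro/rw is one entry.
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro" || opts[i] == "rw") mode = opts[i]
        }
        END { if (mode == "") exit 1; print mode }
    ' "$mounts"
}
```

Inside the container, `mount_mode /nix/var/nix/db` would be expected to report the same mode as the /nix/var/nix/db line in the listing.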

@arianvp
Member

arianvp commented May 27, 2018

In nix 1.12, this lock was never acquired, so it was not a problem that /nix/var/nix/db was mounted ro:
https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/virtualisation/containers.nix#L129

However, acquiring the lock on a read-only mount fails with EROFS (Read-only file system),
and in Nix 2.0 the following patch was introduced:
NixOS/nix@9bdd949

Previously, the function would return on EROFS, but now the exception is actually raised... So this seems to be an issue with nix 2.0 not working with a readonly db folder.
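
To check whether the open itself is what fails, independently of any nix command, the lock file can be opened by hand. This is only a shell approximation under stated assumptions, not Nix's actual code path: the function name `can_open_lock` is hypothetical, and it opens the file for appending without writing any data.

```shell
#!/usr/bin/env bash
# can_open_lock LOCK_FILE
# Tries to open LOCK_FILE for appending (creating it if it is missing)
# without writing anything. Returns 0 if the open succeeds and non-zero
# if it fails, for example because the directory is on a read-only mount.
# (Hypothetical helper; an approximation, not what Nix does internally.)
can_open_lock() {
    local lock=$1
    # ':' is a no-op, so only the open performed by the redirection is
    # exercised. The subshell contains the failure and hides the message.
    ( : >> "$lock" ) 2>/dev/null
}

# Example, to be run inside the container:
#   can_open_lock /nix/var/nix/db/big-lock || echo "cannot open lock file"
```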

@arianvp
Member

arianvp commented May 27, 2018

I'm not very familiar with the nix source code, or with why not throwing on EROFS was not an issue in the past. But I have a feeling this is a nix bug and should be filed in that repository as well.

@cillianderoiste
Member Author

You can work around this for now by remounting /nix/var/nix/db as read/write within the container:
mount -o remount,rw /nix/var/nix/db

@arianvp
Member

arianvp commented Jul 13, 2018 via email

@LnL7
Member

LnL7 commented Jul 13, 2018

Making the host nix database writable in the container is a very bad idea, e.g.:

<goibhniu>	LnL: in case you missed it ... I ran `nix-collect-garbage -d` in a container, but it garbage collected everything from the host system :/
<LnL>	euh...
<goibhniu>	I get this error when I try to run nix-channel, or nix-env commands that download stuff. Both in the container and on the host.
<LnL>	like gcroots on the host?
<goibhniu>	yep, I only have access to the programs that are still running, and what's in the store for the container
<LnL>	ok #1 please create an issue for that, this really shouldn't be possible
<goibhniu>	at least I can install stuff from the store that the container uses
<ben>	oh i guess the nix daemon isnt seeing your local env vars
<goibhniu>	I can't look up the issue now, but it's probably because I changed /nix/var/nix/db (or something like that) to rw, to work around an issue.
<ben>	is that how it works
<goibhniu>	oh!
<LnL>	hrm, that might have enabled this yes
<LnL>	containers should only talk to the daemon, not access the db directly AFAIK
goibhniu	thinks there's some env var that can get nix to not use the daemon
<LnL>	well if you make the db writable it'll try to use that directly unless you explicitly set NIX_REMOTE=daemon
<LnL>	container -> nix-cli -> host -> nix-daemon -> db would know about your host's roots, container -> nix-cli -> db won't

@flokli
Contributor

flokli commented Jul 17, 2018

So how should the container connect to the host's nix-daemon? The docs read like it's only possible to connect over ssh, or could you expose it as a file-based socket too?

@LnL7
Member

LnL7 commented Jul 17, 2018

Yes. It's a unix domain socket; nix commands run as an unprivileged user also use it to communicate with the daemon. By default it's located at /nix/var/nix/daemon-socket/socket, but you can customize it with the --store flag or the NIX_REMOTE environment variable if you want another path inside the container.

$ nix-store -r /nix/store/kmwd1hq55akdb9sc7l3finr175dajlby-hello-2.10 --store unix://foo/socket
these paths will be fetched (0.04 MiB download, 0.19 MiB unpacked):
  /nix/store/kmwd1hq55akdb9sc7l3finr175dajlby-hello-2.10
copying path '/nix/store/kmwd1hq55akdb9sc7l3finr175dajlby-hello-2.10' from 'https://cache.nixos.org'...
warning: you did not specify '--add-root'; the result might be removed by the garbage collector
/nix/store/kmwd1hq55akdb9sc7l3finr175dajlby-hello-2.10
$ /nix/store/kmwd1hq55akdb9sc7l3finr175dajlby-hello-2.10/bin/hello
Hello, world!

@arianvp
Member

arianvp commented Aug 2, 2018

Is there anyone who knows how this stuff works, and thinks they can help me get it fixed before 18.09?

Otherwise I'd suggest removing containers from the documentation for the 18.09 release, as they currently do not work at all anymore, and bringing them back in the 19.03 release.

cc @vcunat @samueldr

@reinhardt
Contributor

I'm afraid I don't know about the internals, but containers do work for me as long as I stick with e.g. nixos-container update mycontainer instead of logging in and trying nixos-rebuild switch.

arianvp added a commit to arianvp/nixpkgs that referenced this issue Sep 28, 2018
Nix commands inside the container have been broken since 18.03,
and no fix is yet in sight.  Let's remove from the documentation
that this is a usecase that we support, as it doesn't seem
likely that this will be fixed before 18.09 either.

See NixOS#40355
@arianvp
Member

arianvp commented Sep 28, 2018

@reinhardt I've updated the docs to reflect this

Mic92 pushed a commit that referenced this issue Sep 29, 2018
Nix commands inside the container have been broken since 18.03,
and no fix is yet in sight.  Let's remove from the documentation
that this is a usecase that we support, as it doesn't seem
likely that this will be fixed before 18.09 either.

See #40355

(cherry picked from commit f309440)
@arianvp
Member

arianvp commented Oct 5, 2018

Okay, after some digging, I have found out what is going wrong, and how we can fix it.
The fact that this worked before seems to have been pure coincidence.

Because we are user root within the container namespace, the nix commands assume single-user mode and try to modify the store directly. However, the root inside the user namespace is not the same as the root that owns /nix/store. Hence we can't modify the store at all, and the nix command crashes.

The solution is to force the root user to talk to the host nix daemon instead, which we can do as follows:

$ sudo nixos-container root-login 
# export NIX_REMOTE=daemon
# nixos-rebuild switch

When we are not the root user in the container, nix commands already work as expected...

[arian@t430s:~]$ sudo nixos-container create test --config 'users.users.arian = { isNormalUser = true; createHome = true; };'

[arian@t430s:~]$ sudo nixos-container start test

[arian@t430s:~]$ sudo nixos-container root-login test
# nix-channel --add https://nixos.org/channels/nixos-18.03 nixpkgs
# nix-channel --update
# su arian
$ <all nix commands now work>

We should add the NIX_REMOTE=daemon environment variable to the root-login command, and then everything should work as expected...
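
A minimal sketch of how such a default could look as a shell profile fragment, assuming the container's login shell sources it. The function name `default_to_nix_daemon` is hypothetical, and respecting an already-set NIX_REMOTE is a design choice of this sketch, not something decided in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical profile fragment: make nix talk to the daemon by default,
# unless the user has already picked a store via NIX_REMOTE.
default_to_nix_daemon() {
    if [ -z "${NIX_REMOTE:-}" ]; then
        # Same value as the manual `export NIX_REMOTE=daemon` workaround.
        export NIX_REMOTE=daemon
    fi
}

default_to_nix_daemon
```

Leaving an explicit NIX_REMOTE untouched keeps the option of pointing nix at another store for debugging.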

@reinhardt
Contributor

Nice catch!

@arianvp arianvp mentioned this issue Oct 5, 2018
9 tasks
arianvp added a commit to arianvp/nixpkgs that referenced this issue Oct 5, 2018
When logging into a container by using
  nixos-container root-login
all nix-related commands in the container would fail, as they
tried to modify the nix db and nix store, which are mounted
read-only in the container.  We want nixos-container to not
try to modify the nix store at all, but instead delegate
any build commands to the nix daemon of the host operating system.

This already works for non-root users inside a nixos-container,
as it doesn't 'own' the nix-store, and thus defaults
to talking to the daemon socket at /nix/var/nix/daemon-socket/,
which is bind-mounted to the host daemon-socket, causing all nix
commands to be delegated to the host.

However, when we are the root user inside the container, we have the
same uid as the nix store owner, even though it's not actually
the same root user (due to user namespaces). Nix gets confused,
and is convinced it's running in single-user mode, and tries
to modify the nix store directly instead.

By setting `NIX_REMOTE=daemon` in `/etc/profile`, we force nix
to operate in multi-user mode, so that it will talk to the host
daemon instead, which will modify the nix store for the container.

This fixes NixOS#40355
arianvp added a commit to arianvp/nixpkgs that referenced this issue Oct 8, 2018
samueldr pushed a commit that referenced this issue Oct 8, 2018
When logging into a container by using
  nixos-container root-login
all nix-related commands in the container would fail, as they
tried to modify the nix db and nix store, which are mounted
read-only in the container.  We want nixos-container to not
try to modify the nix store at all, but instead delegate
any build commands to the nix daemon of the host operating system.

This already works for non-root users inside a nixos-container,
as it doesn't 'own' the nix-store, and thus defaults
to talking to the daemon socket at /nix/var/nix/daemon-socket/,
which is bind-mounted to the host daemon-socket, causing all nix
commands to be delegated to the host.

However, when we are the root user inside the container, we have the
same uid as the nix store owner, even though it's not actually
the same root user (due to user namespaces). Nix gets confused,
and is convinced it's running in single-user mode, and tries
to modify the nix store directly instead.

By setting `NIX_REMOTE=daemon` in `/etc/profile`, we force nix
to operate in multi-user mode, so that it will talk to the host
daemon instead, which will modify the nix store for the container.

This fixes #40355

(cherry picked from commit 3624bb5)
samueldr pushed a commit that referenced this issue Oct 8, 2018