/var/lib/nixos/uid-map corrupted when using nixos-rebuild build-vm many times #97305

davidak · 2020-09-06T17:33:01Z

Describe the bug
I was using nixos-rebuild build-vm to test a PR I was working on. At some point I was no longer able to log in and many services didn't start.

...
running activation script...
malformed JSON string, neither array, object, number, string or atom, at character offset 0 (before "\x{0}\x{0}\x{0}\x{0}...") at /nix/store/z9a0mg0qg4xhlih0wix950xgq285fbzh-update-users-groups.pl line 11.
Activation script snippet 'users' failed (2)
setting up /etc...
removing obsolete symlink ‘/etc/resolv.conf’...
removing obsolete symlink ‘/etc/systemd/resolved.conf’...
chown: invalid user: 'root:root'
Activation script snippet 'var' failed (1)
chown: invalid user: 'root.messagebus'
chown: invalid user: 'root.root'
chown: invalid user: 'root.root'
chown: invalid user: 'root.root'
chown: invalid user: 'root.root'
chown: invalid user: 'root.root'
chown: invalid user: 'root.root'
chown: invalid user: 'root.root'
chown: invalid user: 'root.root'
chown: invalid user: 'root.root'
chown: invalid user: 'root.root'
chown: invalid user: 'root.root'
chown: invalid user: 'root.root'
chown: invalid user: 'root.root'
chown: invalid user: 'root.nogroup'
Activation script snippet 'wrappers' failed (1)
warning: the group 'nixbld' specified in 'build-users-group' does not exist
starting systemd...
...

nix run nixpkgs.libguestfs-with-appliance
mktemp -d
sudo guestmount -a ./nixos.qcow2 -m /dev/sda --ro /tmp/tmp.1F7pugMFFJ
[root@gaming:/tmp/tmp.1F7pugMFFJ]# hexdump -n 2 var/lib/nixos/uid-map
0000000 0000
0000002

/var/lib/nixos/uid-map and /etc/shadow contain only zeros
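
For anyone who wants to check a mounted image for the same symptom without reading hexdump output, here is a minimal Python sketch. The helper name `is_all_nul` is hypothetical and not part of any NixOS tooling; it only reports whether a file is non-empty and consists entirely of zero bytes.

```python
from pathlib import Path


def is_all_nul(path):
    """Return True if the file is non-empty and every byte is 0x00.

    Hypothetical helper for illustration; not part of NixOS.
    """
    data = Path(path).read_bytes()
    # Stripping NUL bytes from both ends leaves nothing only if
    # the whole file was NUL bytes.
    return len(data) > 0 and data.strip(b"\x00") == b""
```

It could be pointed at var/lib/nixos/uid-map and etc/shadow below the guestmount directory shown above.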

Related to #69365, #26788, #61755, #82755

To Reproduce
Steps to reproduce the behavior:

minimal config:

{ config, pkgs, ... }:

{
  users.extraUsers.root.password = "root";
  documentation.enable = false;
}

nixos-rebuild build-vm -I nixpkgs=~/code/nixpkgs/ -I nixos-config='/home/davidak/root'
start vm: /nix/store/js7vf96xzsvj6h23p3jcbixlx0qyvmhq-nixos-vm/bin/run-nixos-vm
stop vm when booted
build the VM again with a different config... and repeat

Workaround:

remove disk image file:
rm ./nixos.qcow2

now it boots:

[davidak@gaming:~/code/nixpkgs]$ /nix/store/js7vf96xzsvj6h23p3jcbixlx0qyvmhq-nixos-vm/bin/run-nixos-vm
Formatting '/home/davidak/code/nixpkgs/nixos.qcow2', fmt=qcow2 cluster_size=65536 compression_type=zlib size=536870912 lazy_refcounts=off refcount_bits=16

After removing it, I can't reproduce the problem anymore.

Expected behavior
NixOS boots into a working system.

Mic92 · 2020-09-23T08:53:54Z

should be fixed by #98544

stale · 2021-03-26T11:28:08Z

I marked this as stale due to inactivity. → More info

ElvishJerricco · 2021-06-27T21:13:24Z

I still get something similar on unstable if I kill the machine while it's booting. Start a blank NixOS VM and kill the VM when systemd takes over in stage 2, and that VM will not have valid users upon reboot.

Mic92 · 2021-06-28T06:34:52Z

> I still get something similar on unstable if I kill the machine while it's booting. Start a blank NixOS VM and kill the VM when systemd takes over in stage 2, and that VM will not have valid users upon reboot.

What does the file look like in this case?

stale · 2022-01-09T17:07:05Z

I marked this as stale due to inactivity. → More info

Melkor333 · 2022-04-29T05:27:06Z

Something like this happened to me on my workstation. It may also have been more like #61755 or something completely different, I can't tell. What happened is that I rebuilt my local system multiple times with nixos-rebuild switch, and after that I couldn't reboot my system anymore. I was playing with services.xss-lock and home-manager's screen-locker modules, so that definitely shouldn't have had any effect.

Initially I had an error saying something like mv: /bin/sh is already the same as bin/.sh.tmp. But in reality /bin/sh was a broken symlink pointing to nothing (-> '') while /bin/.sh.tmp was a proper symlink into the nix store. When I removed /bin/sh, it told me that the same was the case for /usr/bin/env and /usr/bin/.env.tmp. After removing /usr/bin/env too, I got the error above. Removing uid-map, gid-map and later on also auto-subuid-map seems to have solved the problem, as the machine regenerated the files properly (I actually renamed the whole /etc/ -> /etc.old and /var -> /var.old and ran nixos-install from a live ISO). Weirdly, the home-manager service for my user then failed with the following error for a lot of files:

Apr 28 22:05:10 afonil hm-activate-samuelh[1443]: cmp: /home/samuelh/.config/dunst/dunstrc: Is a directory
Apr 28 22:05:10 afonil hm-activate-samuelh[1430]: Existing file '/home/samuelh/.config/dunst/dunstrc' is in the way of '/nix/store/6dszxk7vkdwayk2msir9rgycglwfhyq2-home-manager-files/.config/dunst/dunstrc'
Apr 28 22:05:10 afonil hm-activate-samuelh[1445]: cmp: /home/samuelh/.config/environment.d/10-home-manager.conf: Is a directory
Apr 28 22:05:10 afonil hm-activate-samuelh[1430]: Existing file '/home/samuelh/.config/environment.d/10-home-manager.conf' is in the way of '/nix/store/6dszxk7vkdwayk2msir9rgycglwfhyq2-home-manager-files/.config/environment.d/10-home-manager.conf'
Apr 28 22:05:10 afonil hm-activate-samuelh[1447]: cmp: /home/samuelh/.config/git/config: Is a directory
Apr 28 22:05:10 afonil hm-activate-samuelh[1430]: Existing file '/home/samuelh/.config/git/config' is in the way of '/nix/store/6dszxk7vkdwayk2msir9rgycglwfhyq2-home-manager-files/.config/git/config'

I solved this with the following magic. It takes the files from the journal and moves them to the folder bad. After that I could restart the service just fine:

mkdir bad
mv -t bad/ $(journalctl -u home-manager-samuelh.service | grep 'Existing file' | awk '{ print $8 }' | tr -d "'")
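
The awk step above relies on the path being the eighth whitespace-separated field, so a path containing spaces would be cut off. As a rough alternative, assuming the journal lines have the same "Existing file '...' is in the way of" shape as the ones quoted earlier, a small Python sketch can pull the paths out with a regular expression. The function name `conflicting_paths` is made up for this example.

```python
import re

# Matches the message format seen in the hm-activate log lines above.
EXISTING_FILE = re.compile(r"Existing file '(.+?)' is in the way of '")


def conflicting_paths(journal_lines):
    """Return the paths reported as 'Existing file', in order, without duplicates."""
    paths = []
    for line in journal_lines:
        match = EXISTING_FILE.search(line)
        if match and match.group(1) not in paths:
            paths.append(match.group(1))
    return paths
```

The input would be the lines printed by journalctl -u home-manager-samuelh.service; moving the returned paths aside is left to the caller.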

The files in the folder bad are corrupted in the same way as the /bin/sh and /usr/bin/env files were:

[samuelh@afonil:~]$ ls -lah bad/
total 4.0K
drwxr-xr-x 1 samuelh users 1.1K Apr 28 22:25 .
drwx------ 1 samuelh users  872 Apr 29 06:51 ..
lrwxrwxrwx 1 samuelh users    0 Apr 28 12:51 10-home-manager.conf -> ''
lrwxrwxrwx 1 samuelh users    0 Apr 28 12:51 {446900e4-71c2-419f-a6a7-df9c091e268b}.xpi -> ''
lrwxrwxrwx 1 samuelh users    0 Apr 28 12:51 87677a2c52b84ad3a151a4a72f5bd3c4@jetpack.xpi -> ''
lrwxrwxrwx 1 samuelh users    0 Apr 28 12:51 blueman-applet.service -> ''
lrwxrwxrwx 1 samuelh users    0 Apr 28 12:51 config -> ''
lrwxrwxrwx 1 samuelh users    0 Apr 28 12:51 config.nix -> ''
lrwxrwxrwx 1 samuelh users    0 Apr 28 12:51 {d7742d87-e61d-4b78-b8a1-b469842139fa}.xpi -> ''
lrwxrwxrwx 1 samuelh users    0 Apr 28 12:51 dunstrc -> ''
lrwxrwxrwx 1 samuelh users    0 Apr 28 12:51 dunst.service -> ''
lrwxrwxrwx 1 samuelh users    0 Apr 28 12:51 flameshot.service -> ''

I assume this is some error during activation, maybe even a problem with btrfs?

It might make sense to have some kind of check in the activation which makes sure that such broken symlinks and broken/unreadable /var/lib/nixos/uid-map files are properly removed/recreated (or are the *-map files usually never recreated?). Of course, fixing the reason this behaviour happens would be even better, but I can imagine that there is always some edge case where e.g. a cold reset of a system that is in the middle of activating can leave such broken symlinks, which then don't get overwritten properly.
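
The kind of check suggested here could be sketched roughly as follows. This is only an illustration in Python, not how the NixOS activation scripts work, and the function name `find_broken_symlinks` is hypothetical. It walks a directory tree and reports symlinks whose target cannot be resolved, which should include links with an empty target like the ones listed above.

```python
import os


def find_broken_symlinks(root):
    """Return a sorted list of symlinks under root whose target does not resolve."""
    broken = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() looks at the link itself; exists() follows it,
            # so a link that is present but does not resolve is broken.
            if os.path.islink(path) and not os.path.exists(path):
                broken.append(path)
    return sorted(broken)
```

The sketch only reports; whether such links should be removed or recreated automatically is a separate decision.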

brianmcgee · 2022-11-15T12:18:46Z

I had a similar experience to @Melkor333 today; in the end I had to blow away auto-subuid-map to get things working again.

MakiseKurisu · 2023-04-11T15:41:05Z

I also encountered this issue. I run NixOS in a VM running multiple podman containers. One of them is qBittorrent, and it filled the entire rootfs. After enlarging the partition I still could not log into the system, which led me to investigate and find this issue.

I'm using btrfs single inside and dup outside of the VM disk, so that might be why it was corrupted.

Checking the UID and GID maps shows the same symptom:

root@pve11:~# hexdump /mnt/@/var/lib/nixos/uid-map
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0000320 0000 0000 0000 0000 0000 0000          
000032b
root@pve11:~# hexdump /mnt/@/var/lib/nixos/gid-map
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0000200 0000 0000 0000 0000 0000 0000 0000     
000020d

This is why update-users-groups.pl failed, as shown in the OP's error message. The script is responsible for creating /etc/passwd, and because it failed, the system cannot load the user database.
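
To illustrate the failure mode: a file of NUL bytes is not valid JSON, so any strict decoder rejects it at offset 0. The Python sketch below is not the actual Perl script. It assumes the map files hold a JSON object (as the decode error in the original report suggests), and the function name `load_id_map` and the fall-back-to-empty policy are assumptions made for this example.

```python
import json


def load_id_map(path):
    """Load a JSON object from path, returning {} if it is missing or unreadable as JSON.

    Hypothetical tolerant loader; the real script may need a different policy.
    """
    try:
        with open(path, "rb") as f:
            raw = f.read()
    except FileNotFoundError:
        return {}
    try:
        data = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        # A zero-filled or empty file ends up here.
        return {}
    return data if isinstance(data, dict) else {}
```

Falling back to an empty map trades a boot failure for the loss of whatever the map recorded, so restoring the files from a backup, as described below, remains preferable when a backup exists.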

I restored those 2 files from my backup and was able to log into my system again.

Baitinq · 2023-10-30T21:50:31Z

This is still a problem. I ran into it today and had to delete uid-map.

davidak added 0.kind: bug 1.severity: blocker 6.topic: nixos labels Sep 6, 2020

davidak mentioned this issue Sep 6, 2020

nixos/config: add defaultPackages option #97171

Merged

10 tasks

davidak self-assigned this Sep 6, 2020

davidak removed 1.severity: blocker 6.topic: nixos labels Sep 6, 2020

davidak changed the title ~~NixOS broken on master~~ nixos-rebuild build-vm fails to boot when starting it a second time Sep 6, 2020

davidak changed the title ~~nixos-rebuild build-vm fails to boot when starting it a second time~~ /var/lib/nixos/uid-map corrupted when using nixos-rebuild build-vm many times Sep 6, 2020

davidak removed their assignment Sep 6, 2020

cole-h mentioned this issue Sep 7, 2020

"/dev/fd/62: No such file or directory" in init script on nixos-unstable(-small) #97383

Closed

veprbl added the 6.topic: nixos label Sep 8, 2020

domust mentioned this issue Jan 13, 2021

[package request]: NordVPN #101864

Open

bjornfor mentioned this issue May 1, 2023

Corrupted /var/lib/nixos/gid-map preventing configuration switch #229194

Closed
