nixos.ova build fails on Hydra #25901

Closed
vcunat opened this issue May 18, 2017 · 22 comments

@vcunat
Member

vcunat commented May 18, 2017

Issue description

The ova job has been failing for the past week, blocking the nixos-unstable channel. I don't know what the exact problem is.

Steps to reproduce

I'm unable to reproduce the failure locally.

@vcunat vcunat changed the title from "nixos.ova builid fails on Hydra" to "nixos.ova build fails on Hydra" May 18, 2017
@clefru
Contributor

clefru commented May 19, 2017

Differences between succeeding and failing job:

+copying closure closure to /build/root...
-copying closure closure to /tmp/nix-build-nixos-ova-17.09pre107278.42bf19cc04-x86_64-linux.drv-0/root...
+error: changing ownership of path ‘/build/root/nix/store’: Invalid argument

@matthewbauer
Member

Here's the line from nix giving the error:

https://github.com/NixOS/nix/blob/62d476c7ee5dbb79fb435895e0cda3fac8f53ba3/src/libstore/local-store.cc#L89

It's definitely related to NixOS/nix@eba840c.

@dezgeg
Contributor

dezgeg commented May 20, 2017

Probably the same reason as explained in 6cfb3b6.

No idea on how to fix, though.

@clefru
Contributor

clefru commented May 21, 2017

This issue seems to have no owner (just stating facts, no offense intended). Is there a process to identify the offending commit and roll it back?

@vcunat
Member Author

vcunat commented May 21, 2017

Apparently it wasn't triggered by a nixpkgs commit but by a nix change. That's why most people won't reproduce it. I don't think a process really exists (for this).

/cc @edolstra for the option to roll that change back on the build farm for now, as nobody knows how to fix it properly and the channel is already stuck on a commit that is ~11 days old.

@pbogdan
Member

pbogdan commented May 23, 2017

Seems it's now timing out while building the ibus package?

@vcunat
Member Author

vcunat commented May 23, 2017

ibus was updated on master (since the last ova failure) and it seems to build fine on Hydra now, though the ova job doesn't really show that (yet).

@vcunat
Member Author

vcunat commented May 24, 2017

Now the job built successfully, though I can't see why. AFAIK it's possible some build slaves still use an older version of nix, or something...

@vcunat
Member Author

vcunat commented May 24, 2017

Right, probably only the packet machine can succeed.

I managed to make Hydra build the tested job successfully now, after two weeks, but this issue remains a channel blocker IMO.

@vcunat
Member Author

vcunat commented Jun 12, 2017

I guess the packet machine was updated so now the job will never succeed...

@grahamc
Member

grahamc commented Jun 16, 2017

The problem seems to be that we're running the install commands for the ova inside the sandbox, which prevents setuid/setgid. However, we do use setuid/setgid in some places.

For example, in nixos-prepare-root:

mkdir -m 1775 -p $mountPoint/nix/store

@grahamc
Member

grahamc commented Jun 16, 2017

Here are some additional places:

grahamc@Morbo> rg  -g '!*.xml' nix/store | grep -E "[0-7]{4}"
nixos/modules/virtualisation/qemu-vm.nix:          mkdir -p 0755 $targetRoot/nix/.rw-store/store $targetRoot/nix/.rw-store/work $targetRoot/nix/store
nixos/modules/system/boot/stage-2-init.sh:chmod -f 1775 /nix/store
nixos/modules/installer/tools/nixos-prepare-root.sh:mkdir -m 1775 -p $mountPoint/nix/store
pkgs/tools/package-management/nix/nix/nix.spec.in:chmod 1775 /nix/store

@vcunat
Member Author

vcunat commented Jun 16, 2017

Right, /nix/store seems to use 1775 root nixbld on standard NixOS, but that's only the sticky bit, not set(u/g)id. If I read the seccomp code right, it only attempts to disallow those two bits.
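
To illustrate the distinction, here is a minimal shell sketch that decodes an octal mode into its special bits. The `special_bits` helper is hypothetical (it is not part of Nix or nixpkgs); it only shows that 1775 carries the sticky bit and neither setuid nor setgid.

```shell
# special_bits MODE: hypothetical helper, prints which special permission
# bits (setuid, setgid, sticky) are set in the octal MODE, e.g. 1775.
special_bits() {
    mode=$(( 0$1 ))   # the leading 0 makes the shell parse MODE as octal
    out=""
    if [ $(( mode & 04000 )) -ne 0 ]; then out="$out setuid"; fi
    if [ $(( mode & 02000 )) -ne 0 ]; then out="$out setgid"; fi
    if [ $(( mode & 01000 )) -ne 0 ]; then out="$out sticky"; fi
    echo "${out# }"   # strip the leading space, if any
}

special_bits 1775   # prints "sticky"
special_bits 6755   # prints "setuid setgid"
```

So a seccomp filter that rejects only the setuid and setgid bits would not be expected to reject `mkdir -m 1775` or `chmod 1775`.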

@pbogdan
Member

pbogdan commented Jun 16, 2017

Please bear in mind that I don't know anything about the low-level details involved. I was able to reproduce the failure in a VM and, while trying to debug it and connect some breadcrumbs, I ended up with the patch below, although my reasoning here may well be flawed. Hopefully it is of some help and doesn't add to the confusion.


Applying the following:

$ git status --short
 M pkgs/tools/system/fakeroot/default.nix
?? pkgs/tools/system/fakeroot/einval.patch
$ git diff
diff --git a/pkgs/tools/system/fakeroot/default.nix b/pkgs/tools/system/fakeroot/default.nix
index 5286b6b2cb..a3b858db2d 100644
--- a/pkgs/tools/system/fakeroot/default.nix
+++ b/pkgs/tools/system/fakeroot/default.nix
@@ -10,7 +10,7 @@ stdenv.mkDerivation rec {
   };

   # patchset from brew
-  patches = stdenv.lib.optionals stdenv.isDarwin [
+  patches = [ ./einval.patch ] ++ (stdenv.lib.optionals stdenv.isDarwin [
     (fetchpatch {
       name = "0001-Implement-openat-2-wrapper-which-handles-optional-ar.patch";
       url = "https://bugs.debian.org/cgi-bin/bugreport.cgi?msg=5;filename=0001-Implement-openat-2-wrapper-which-handles-optional-ar.patch;att=1;bug=766649";
@@ -26,7 +26,7 @@ stdenv.mkDerivation rec {
       url = "https://bugs.debian.org/cgi-bin/bugreport.cgi?att=2;bug=766649;filename=fakeroot-always-pass-mode.patch;msg=20";
       sha256 = "0i3zaca1v449dm9m1cq6wq4dy6hc2y04l05m9gg8d4y4swld637p";
     })
-    ];
+    ]);

   buildInputs = [ getopt ]
     ++ stdenv.lib.optional (!stdenv.isDarwin) libcap
$ cat pkgs/tools/system/fakeroot/einval.patch
diff --git a/libfakeroot.c b/libfakeroot.c
index 68a95fb..70da8bc 100644
--- a/libfakeroot.c
+++ b/libfakeroot.c
@@ -792,7 +792,7 @@ int chown(const char *path, uid_t owner, gid_t group){
     r=next_lchown(path,owner,group);
   else
     r=0;
-  if(r&&(errno==EPERM))
+  if(r&&(errno==EPERM||errno==EINVAL))
     r=0;

   return r;
@@ -819,7 +819,7 @@ int lchown(const char *path, uid_t owner, gid_t group){
     r=next_lchown(path,owner,group);
   else
     r=0;
-  if(r&&(errno==EPERM))
+  if(r&&(errno==EPERM||errno==EINVAL))
     r=0;

   return r;
@@ -843,7 +843,7 @@ int fchown(int fd, uid_t owner, gid_t group){
   else
     r=0;

-  if(r&&(errno==EPERM))
+  if(r&&(errno==EPERM||errno==EINVAL))
     r=0;

   return r;
@@ -870,7 +870,7 @@ int fchownat(int dir_fd, const char *path, uid_t owner, gid_t group, int flags)
   else
     r=0;

-  if(r&&(errno==EPERM))
+  if(r&&(errno==EPERM||errno==EINVAL))
     r=0;

   return r;

on top of:

  • System: 17.09pre108553.0011f9065a (Hummingbird)
  • Nix version: nix-env (Nix) 1.12pre5350_7689181e
  • Nixpkgs version: 17.09pre108553.0011f9065a
  • Sandboxing enabled: build-use-sandbox = true

makes the test pass for me.

Patch and ideas stolen from #10496

@grahamc
Member

grahamc commented Jun 16, 2017

@pbogdan I'm in no position to evaluate whether your patch is the best solution (I have no idea what I'm doing here), but I definitely appreciate your great digging and the patch. Thank you!

@Mic92
Member

Mic92 commented Jun 17, 2017

@grahamc you can safely skip the sticky bit when creating the image. It will be fixed by stage-2 automatically: https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/system/boot/stage-2-init.sh#L62
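
A minimal sketch of that ordering, using a throwaway directory in place of the real image root. The helper names `create_store_dir`, `fix_store_mode` and `store_perms`, and the temporary `$mountPoint`, are assumptions for illustration; only the `mkdir ... $mountPoint/nix/store` and `chmod -f 1775` commands mirror lines quoted earlier in this thread.

```shell
# Image creation step (hypothetical variant): create the store directory
# without the sticky bit.
create_store_dir() {
    mkdir -m 0755 -p "$1/nix/store"
}

# Boot-time step: the same chmod that stage-2-init.sh runs on /nix/store.
fix_store_mode() {
    chmod -f 1775 "$1/nix/store"
}

# Print the type and permission characters of the store directory.
store_perms() {
    ls -ld "$1/nix/store" | cut -c1-10
}

mountPoint=$(mktemp -d)
create_store_dir "$mountPoint"
store_perms "$mountPoint"      # drwxr-xr-x
fix_store_mode "$mountPoint"
store_perms "$mountPoint"      # drwxrwxr-t
rm -rf "$mountPoint"
```

This only covers the mode bits. Ownership of the store (root and nixbld on a standard NixOS system) is a separate concern that the sketch does not touch.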

@vcunat
Member Author

vcunat commented Jun 17, 2017

Oh, I didn't realize this isn't reproducible with the latest stable nix. According to the log from the ova job, stable nix still builds in /tmp rather than /build, and the build simply succeeds.
EDIT: I wonder why that is, as that change seems to generate more secure binaries when that daemon is used to build them.

@vcunat vcunat self-assigned this Jun 17, 2017
@vcunat vcunat closed this as completed in 0d4431c Jun 17, 2017
@vcunat
Member Author

vcunat commented Jun 17, 2017

This solution seems OK. I think I now understand. @pbogdan: thank you a lot for finding a solution; I designated you as the author of the modified commit :-)

@grahamc
Member

grahamc commented Jun 17, 2017

@vcunat
Member Author

vcunat commented Jun 17, 2017

Yes, I do believe we'll finally get a channel bump within several hours 🎉 There are just some heavy packages left, e.g. webkitgtk...

@grahamc
Member

grahamc commented Jun 17, 2017

nixos-unstable bumped!

@edolstra
Member

Maybe we should just get rid of that fakeroot stuff? I mean, the goal was to get rid of the QEMU VM, but that's still being used, so fakeroot just seems an unnecessary complication...

@ius ius mentioned this issue Feb 6, 2022
14 tasks