Unexpected remote builder behavior; builders-use-substitutes = true not working? #8101

Open
stuser81 opened this issue Mar 23, 2023 · 9 comments
@stuser81

I have two machines on my LAN:

  • Laptop (not powerful). Debian 11 with Nix 2.13.3 multi-user install. 64-bit Intel CPU.
  • Desktop (powerful multi-core machine). NixOS 22.11. 64-bit Intel CPU. LAN IP address: 192.168.1.80

My goals:

  • The laptop should never do Nix builds; builds should always be forwarded to the desktop.
  • The desktop should still grab things from cache.nixos.org when possible.

This is what happens on the laptop without remote building setup (i.e. without any ~/.config/nix/nix.conf and without any modifications to /etc/nix/nix.conf). I have truncated the output, but in fact everything is already available on cache.nixos.org so it gets downloaded from there.

user@laptop:~$ nix-collect-garbage
finding garbage collector roots...
deleting garbage...
deleting unused links...
note: currently hard linking saves -0.00 MiB
0 store paths deleted, 0.00 MiB freed
user@laptop:~$ nix-shell -p ghc
these 46 paths will be fetched (233.20 MiB download, 2585.36 MiB unpacked):
  /nix/store/09ybh8g6bhhq7h3lrq98rmvjcdw94iz8-gmp-with-cxx-stage4-6.2.1
  /nix/store/1zsc48wwlplpkzms83m7zr94xnfalq2q-glibc-2.35-224-dev
  /nix/store/31an24ard60kip3iaxkmd1rnflb55zfp-binutils-2.40-lib
  /nix/store/45ncc133v5ncn8ivb1lkfv0wzfab9lx2-gawk-5.2.1
  /nix/store/4vq879kpg8b3ni6awk3dphzsipkf5vdm-ghc-9.2.7-doc
  ...
copying path '/nix/store/fa0byasfkms4k570jm47b7sb9lkrj73v-linux-headers-6.2' from 'https://cache.nixos.org'...
copying path '/nix/store/y5kskikzjzv84169aa81kg6b24qq1q5p-libunistring-1.1' from 'https://cache.nixos.org'...
copying path '/nix/store/izcs0br6mfbx7rqs5ngmg47fwpwycbl1-libidn2-2.3.2' from 'https://cache.nixos.org'...
copying path '/nix/store/8bmp6r3a0xfha3wj36phlc47clh9w81l-glibc-2.35-224' from 'https://cache.nixos.org'...
copying path '/nix/store/pcslyy22s9piz2n3pckqia0k5i4ysi12-attr-2.5.1' from 'https://cache.nixos.org'...
...

Let's now set up remote building:

  • As requested by the documentation (for multi-user Nix setups), I make sure that the root user on the laptop has access to ssh root@192.168.1.80 without password (using the normal SSH key stuff).
  • This is my laptop's modified /etc/nix/nix.conf (only the first line was there by default):
build-users-group = nixbld

extra-experimental-features = nix-command flakes
builders-use-substitutes = true
trusted-substituters = ssh://192.168.1.80
substituters = ssh://192.168.1.80
max-jobs = 0
builders = ssh://192.168.1.80 - - 10

builders-use-substitutes = true asks the desktop to grab from cache.nixos.org if possible, max-jobs = 0 makes sure no building takes place on the laptop, and builders = ssh://192.168.1.80 - - 10 makes sure 10 cores are used on the desktop.

Note: On my laptop I ran sudo systemctl stop nix-daemon.service and sudo systemctl start nix-daemon.service but I'm not sure if that was needed. However, I noticed I indeed had to modify /etc/nix/nix.conf on my laptop. Modifying ~/.config/nix/nix.conf on my laptop wasn't causing the remote builder to get registered at all. (This is possibly a separate bug altogether.)

The problem:

Check out what happens now. As you can see, it's building things on the desktop machine! This is unexpected because as we already saw, cache.nixos.org should already have everything needed. (I have not modified the default substituters on the desktop NixOS machine.) It seems that builders-use-substitutes = true is not working properly.

user@laptop:~$ nix-collect-garbage 
finding garbage collector roots...
deleting garbage...
deleting '/nix/store/2ymzqhzn0bayy8sgvppw38dqffk5yxx3-shell.drv'
...
user@laptop$ nix-shell -p ghc
these 655 derivations will be built:
  /nix/store/5syi8n4h7dn97ldhrjw1vbpmr2prypcp-glibc-2.35.tar.xz.drv
  /nix/store/1zpsgbsjy7spp9b2y1gmycp0yvq6mp59-linux-6.2.tar.xz.drv
  /nix/store/20d5pi1a5i9jj041i0gvr9zcs7bjbw46-binutils-2.40.tar.bz2.drv
  /nix/store/pg6zn3qqhy869kh4gwhpfyf36q0d121z-zlib-1.2.13.tar.gz.drv
  /nix/store/4lcnmk8h9jkhiqa815716rvagnli57j7-zlib-1.2.13.drv
...
copying path '/nix/store/v28dv6l0qk3j382kp40bksa1v6h7dx9p-bash-5.2.tar.gz' from 'ssh://192.168.1.80'...
building '/nix/store/a68j9bys24cr3m1bixy4bz92q27bmx7k-bash52-005.drv' on 'ssh://192.168.1.80'...
building '/nix/store/f9hs49y4q8bvg4ffdiycbafd5r1gb13r-bash52-008.drv' on 'ssh://192.168.1.80'...
building '/nix/store/sjlm8agj6m3cpglc5v11d40cj7j6kin2-fix-static.patch.drv' on 'ssh://192.168.1.80'...
warning: ignoring substitute for '/nix/store/dyamyflq6pvvnhzsj5ldzpwf31g4r5vm-bootstrap-stage0-stdenv-linux' from 'ssh://192.168.1.80', as it's not signed by any of the keys in 'trusted-public-keys'
warning: ignoring substitute for '/nix/store/8znp434sp8m46633mfcvql5czccakcsx-bootstrap-stage2-stdenv-linux' from 'ssh://192.168.1.80', as it's not signed by any of the keys in 'trusted-public-keys'
building '/nix/store/jxhwnvkpxrbl5rrf01zf3cf2p7v215pb-findutils-4.9.0.tar.xz.drv' on 'ssh://192.168.1.80'...
building '/nix/store/5syi8n4h7dn97ldhrjw1vbpmr2prypcp-glibc-2.35.tar.xz.drv' on 'ssh://192.168.1.80'...
building '/nix/store/6v6ld5igw5f9rw3gdc7w7s14cfpfq63c-gzip-1.12.tar.xz.drv' on 'ssh://192.168.1.80'...
building '/nix/store/4h6k2a3b62nkgsfjf8s53dqlay7kywwx-libidn2-2.3.2.tar.gz.drv' on 'ssh://192.168.1.80'...
...
waiting for a machine to build '/nix/store/8yhywccj73wdxv579rz5lk45dpjfgn2w-tar-1.34.tar.xz.drv'...
warning: ignoring substitute for '/nix/store/59cfn9z8i5230r5gr5wh46ki67lrvl7s-bootstrap-stage1-stdenv-linux' from 'ssh://192.168.1.80', as it's not signed by any of the keys in 'trusted-public-keys'
warning: ignoring substitute for '/nix/store/93yhirq3wghn0hp4d6yn17l4i4wgf6q8-expand-response-params' from 'ssh://192.168.1.80', as it's not signed by any of the keys in 'trusted-public-keys'
waiting for a machine to build '/nix/store/rwkpw9zf33qgwciyrg6yd4kmrcg5k4wj-gcc-12.2.0.tar.xz.drv'...
waiting for a machine to build '/nix/store/a5alapfd1s4rf9dbwb1h0mhlvanl3qvz-gmp-6.2.1.tar.bz2.drv'...
waiting for a machine to build '/nix/store/x026yqw2ch0lhcyd548qra2i6gxi2whp-libunistring-1.1.tar.gz.drv'...
...
copying 0 paths...
copying 0 paths...
copying 0 paths...
copying 1 paths...
copying path '/nix/store/1dydp86d00qzjbncpi80sdsndf33lc5j-fix-static.patch' from 'ssh://192.168.1.80'...
copying 0 paths...
copying 0 paths...
copying 0 paths...
...
copying path '/nix/store/lwjp1v1f0dry15flc4klvag2asx9sn5f-Python-3.10.10.tar.xz' from 'ssh://192.168.1.80'...
copying path '/nix/store/y3yiminrckvhf35fh9q42vjwi0npznji-acl-2.3.1.tar.gz' from 'ssh://192.168.1.80'...
copying path '/nix/store/mz6mc8s7mrvvzjpl9322agmsq00cyrw5-attr-2.5.1.tar.gz' from 'ssh://192.168.1.80'...
...
copying 1 paths...
copying path '/nix/store/447hvnlzzi9myri1iq3bijxgx6v6b592-patchelf-0.15.0.tar.bz2' from 'ssh://192.168.1.80'...
copying path '/nix/store/slpdqm3wlhwbkzyijjz3xpifa219ac0x-bzip2-1.0.8.tar.gz' from 'ssh://192.168.1.80'...
building '/nix/store/kg5bghxkplbz38wgavcl9gd3c46bz14b-bootstrap-stage0-stdenv-linux.drv' on 'ssh://192.168.1.80'...
building '/nix/store/fk7xz02i1l0rsjrpvv94gcj0qvs6w0pp-bootstrap-stage1-stdenv-linux.drv' on 'ssh://192.168.1.80'...
building '/nix/store/sl3ypk7flwfdb6630whq2slfa11k0cs1-bootstrap-stage2-stdenv-linux.drv' on 'ssh://192.168.1.80'...
copying 1 paths...
copying path '/nix/store/33hyq1pcdw473p8r4fyqp6h9n8r6lxvj-linux-6.2.tar.xz' from 'ssh://192.168.1.80'...
copying 0 paths...
copying 1 paths...
copying path '/nix/store/dyamyflq6pvvnhzsj5ldzpwf31g4r5vm-bootstrap-stage0-stdenv-linux' from 'ssh://192.168.1.80'...
copying 0 paths...
copying 1 paths...
copying path '/nix/store/59cfn9z8i5230r5gr5wh46ki67lrvl7s-bootstrap-stage1-stdenv-linux' from 'ssh://192.168.1.80'...
copying path '/nix/store/p769cp9mdy7yswdhhiwdq75y03x14199-bootstrap-stage0-glibc-bootstrap' from 'ssh://192.168.1.80'...
copying path '/nix/store/f45dpx8vxhfckg2mbbns9dy1l82i74jz-coreutils-9.1.tar.xz' from 'ssh://192.168.1.80'...
copying 0 paths...
copying 1 paths...
copying path '/nix/store/8znp434sp8m46633mfcvql5czccakcsx-bootstrap-stage2-stdenv-linux' from 'ssh://192.168.1.80'...
building '/nix/store/xlijxhdmvypilgznjg1mz46a9mkzfzw1-bootstrap-stage0-glibc-iconv-bootstrap.drv' on 'ssh://192.168.1.80'...
building '/nix/store/4lwbbznxlz8didx8103ljafarii97p5q-python-setup-hook.sh.drv' on 'ssh://192.168.1.80'...
copying 1 paths...
copying path '/nix/store/jn9f2mr2jdm9yn5hi0pws44nbfrah8d3-bash52-008' from 'ssh://192.168.1.80'...
copying 0 paths...
copying 1 paths...
copying path '/nix/store/z9a8wa9j4z2fk5f3hp2wcyl1dywnmz17-bootstrap-stage0-glibc-iconv-bootstrap' from 'ssh://192.168.1.80'...
'/nix/store/z9a8wa9j4z2fk5f3hp2wcyl1dywnmz17-bootstrap-stage0-glibc-iconv-bootstrap/include/iconv.h' -> '/nix/store/p769cp9mdy7yswdhhiwdq75y03x14199-bootstrap-stage0-glibc-bootstrap/include/iconv.h'
copying 0 paths...
copying path '/nix/store/m8mylclf924bhpsv529hy11llq30psyb-bootstrap-stage0-binutils-wrapper-' from 'ssh://192.168.1.80'...
copying path '/nix/store/qa2hk6c245wq0lzwm3g4k6j26125xg5r-diffutils-3.9.tar.xz' from 'ssh://192.168.1.80'...
copying 1 paths...
copying 1 paths...
copying path '/nix/store/iwddjcddxwrrxjq3476fzig56ln37awj-python-setup-hook.sh' from 'ssh://192.168.1.80'...
copying path '/nix/store/z76vsdh69cvwkwhwg69k7d1znwjmx6hf-bash52-005' from 'ssh://192.168.1.80'...
copying path '/nix/store/5ghhrws7rqx4bfk5wpv3gz44vg4arqcm-bootstrap-stage1-gcc-wrapper-' from 'ssh://192.168.1.80'...
...
copying 0 paths...
copying 1 paths...
copying path '/nix/store/93yhirq3wghn0hp4d6yn17l4i4wgf6q8-expand-response-params' from 'ssh://192.168.1.80'...
copying path '/nix/store/jjp9cm8wkglic54jk52kfv27d6233afp-gawk-5.2.1.tar.xz' from 'ssh://192.168.1.80'...
copying path '/nix/store/xgaqv1fn6jdrskv7kwkcfimpsv7h1kbs-gnum4-1.4.19' from 'ssh://192.168.1.80'...
...
copying 1 paths...
copying path '/nix/store/y954pl28vm03qfhvqrgyspwwv28b5lyi-findutils-4.9.0.tar.xz' from 'ssh://192.168.1.80'...
copying 1 paths...
copying path '/nix/store/0avnvyc7pkcr4pjqws7hwpy87m6wlnjc-make-4.4.1.tar.gz' from 'ssh://192.168.1.80'...
copying 1 paths...
building '/nix/store/3yar2pnvz7ll79z3jlzx09qnhrsi7zj5-automake-1.16.5.tar.xz.drv' on 'ssh://192.168.1.80'...
building '/nix/store/20d5pi1a5i9jj041i0gvr9zcs7bjbw46-binutils-2.40.tar.bz2.drv' on 'ssh://192.168.1.80'...
building '/nix/store/xviwx1gm25j77g6fr3crfp4m0a3jggd1-curl-7.88.1.tar.bz2.drv' on 'ssh://192.168.1.80'...
...
waiting for a machine to build '/nix/store/x026yqw2ch0lhcyd548qra2i6gxi2whp-libunistring-1.1.tar.gz.drv'...
waiting for a machine to build '/nix/store/5ggyc87vq92i999462r1qm5l1myi9sqr-libxcrypt-4.4.33.tar.xz.drv'...
waiting for a machine to build '/nix/store/mxiibcd5b5v63fas7pfg2zavgp5bi5fk-lzip-1.23.tar.gz.drv'...
...
copying path '/nix/store/506rq7p13pk2v7a63wmsv11l0ir21ab2-krb5-1.20.1.tar.gz' from 'ssh://192.168.1.80'...
copying 0 paths...
unpacking sources
unpacking source archive /nix/store/9vm1ihdg1ysmrjdbb80g834iizzxb4yk-bison-3.8.2.tar.gz
source root is bison-3.8.2
setting SOURCE_DATE_EPOCH to timestamp 1632561040 of file bison-3.8.2/src/parse-gram.output
patching sources
configuring
configure flags: --disable-dependency-tracking --prefix=/nix/store/d99iw47x8x3kfb5x4laic5a1raswyqdr-bison-3.8.2 --build=x86_64-unknown-linux-gnu --host=x86_64-unknown-linux-gnu
checking for a BSD-compatible install... /nix/store/370ldp1qzc2zfl0kspcp137dvjmdhpsh-bootstrap-tools/bin/install -c
checking whether build environment is sane... yes
checking for a race-free mkdir -p... /nix/store/370ldp1qzc2zfl0kspcp137dvjmdhpsh-bootstrap-tools/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
...
@stuser81 stuser81 added the bug label Mar 23, 2023
@SuperSandro2000
Member

Note: On my laptop I ran sudo systemctl stop nix-daemon.service and sudo systemctl start nix-daemon.service but I'm not sure if that was needed.

If you change the nix.conf file you need to restart the daemon.

builders = ssh://192.168.1.80 - - 10 makes sure 10 cores are used on the desktop.

The 10 there is for maxJobs. I would recommend using https://search.nixos.org/options?channel=unstable&from=0&size=50&sort=relevance&type=packages&query=nix.buildMachine to avoid confusion.
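To make the field order concrete, here is a minimal sketch that labels the whitespace-separated fields of a single builders entry. The order used (URI, system, SSH key, max jobs, speed factor, supported features, mandatory features, public host key) is the one I understand the Nix manual to document; the function name `describe_builder` and the output labels are made up for illustration, and "-" is printed for fields that are omitted.

```shell
#!/bin/sh
# Label the fields of one `builders` entry (hypothetical helper, not part of Nix).
# The body runs in a subshell so the variables do not leak into the caller.
describe_builder() (
    # `read` splits on runs of whitespace; missing trailing fields stay empty.
    read -r uri system ssh_key max_jobs speed_factor supported mandatory host_key <<EOF
$1
EOF
    printf 'uri=%s\n' "${uri:--}"
    printf 'system=%s\n' "${system:--}"
    printf 'ssh_key=%s\n' "${ssh_key:--}"
    printf 'max_jobs=%s\n' "${max_jobs:--}"
    printf 'speed_factor=%s\n' "${speed_factor:--}"
    printf 'supported_features=%s\n' "${supported:--}"
    printf 'mandatory_features=%s\n' "${mandatory:--}"
    printf 'public_host_key=%s\n' "${host_key:--}"
)

# The entry from the original post: the fourth field is max_jobs, not a core count.
describe_builder 'ssh://192.168.1.80 - - 10'
```

Read this way, the 10 limits how many builds run in parallel on the builder; how many cores each build may use is, as far as I know, governed by the builder's own settings.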

Modifying ~/.config/nix/nix.conf on my laptop wasn't causing the remote builder to get registered at all. (This is possibly a separate bug altogether.)

No, it is not. The builders are read by the daemon, which only reads /etc/nix/nix.conf.

This is unexpected because as we already saw, cache.nixos.org should already have everything needed. (I have not modified the default substituters on the desktop NixOS machine.) It seems that builders-use-substitutes = true is not working properly.

There is likely something else happening, too. Is there any log indicating that the substitution failed? What are the substituters on the build machine? Is the command working as expected if you run it directly on the build machine?

@stuser81
Author

What are the substituters on the build machine? Is the command working as expected if you run it directly on the build machine?

@SuperSandro2000 I have not changed the default substituters on the build machine (it's still cache.nixos.org there). Yes, nix-shell -p ghc works fine on the desktop build machine (just like it did on the laptop before I made the nix.conf changes).

I did a bit more digging:

  • I made the laptop's (Debian) Nix also use the exact same channel as the NixOS machine (the nixos-22.11 channel), for simplicity, to keep both completely the same.

  • Then I noticed interesting things:

    • (After Nix garbage collecting both machines) If I have already run nix-shell -p ghc on the desktop build machine, subsequent nix-shell -p ghc on the laptop works fine and copies the GHC from the desktop to the laptop because it's already available on the desktop (it's the same GHC since we're on the same channel on both machines now).
    • (After Nix garbage collecting both machines) If I run nix-shell -p ghc on the laptop (without doing anything on the desktop build machine), the desktop build machine starts building things - instead of grabbing things from cache.nixos.org. This is the problem identified in my original post. Any ideas what could be causing this?

Here is a wild guess (take everything below with a grain of salt because it's just a guess):

  • Documented builders-use-substitutes behavior: "In practical terms, this means that remote hosts will fetch as many build dependencies as possible from their own substitutes (e.g, from cache.nixos.org), instead of waiting for this host to upload them all."
  • To me, this sort of suggests the "intention" of this setting: Only if the derivation is already built on the laptop does the desktop build machine even consider grabbing it from cache.nixos.org, the idea being it's adding a new "competitor" to the laptop.
  • If the above wild guess is true, this explains the behavior but also begs the question: Why not make the desktop build machine always try to grab things from cache.nixos.org?

Does this guess have any merit?

@SuperSandro2000
Member

  • Why not make the desktop build machine always try to grab things from cache.nixos.org?

I recently misconfigured my substituters setting and, as a result, built everything on remote builders, which in fact downloaded the derivations from cache.nixos.org.

  • Any ideas what could be causing this?

Not really. How are you installing Nix? Are you using the installer? Can you try it with a NixOS machine?

@stuser81
Author

stuser81 commented Mar 24, 2023

Can you try it with a NixOS machine?

@SuperSandro2000 I just tried. I noticed that on NixOS, nix.settings.substituters = ["ssh://192.168.1.80"] actually results in https://cache.nixos.org/ getting appended to the end automatically (as you can see with nix show-config). This does not happen with my Nix multi-user install on Debian.

So I suspect you were hitting https://cache.nixos.org/ through the laptop machine, not the desktop build machine.

So I believe we've been comparing apples and oranges.

Is there any way to get NixOS to not do this strange automatic appending? Related: NixOS/nixpkgs#158356 It seems people want this automatic appending, which feels strange to me. What about people who also want to offload the cache downloading to the build machine?
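For comparing what each machine actually ends up with, a small filter over the `nix show-config` output can help. This is only a sketch: `setting_value` is a made-up helper name, and the sample input below is invented to mirror the appended-cache situation described above, not copied from a real machine.

```shell
#!/bin/sh
# Print the value of one setting from "name = value" lines on stdin,
# the format that `nix show-config` prints (hypothetical helper).
setting_value() {
    # Match only lines that start with "<key> = ", so that
    # "trusted-substituters" does not match a query for "substituters".
    awk -v key="$1" 'index($0, key " = ") == 1 { print substr($0, length(key) + 4) }'
}

# Typical use (needs Nix installed):
#   nix show-config | setting_value substituters

# Self-contained demo with invented sample output:
setting_value substituters <<'EOF'
max-jobs = 0
substituters = ssh://192.168.1.80 https://cache.nixos.org/
trusted-substituters = ssh://192.168.1.80
EOF
```

Running the same filter on the build machine (for example over SSH, assuming `nix` is on the PATH of a non-interactive shell there) shows whether the builder has any substituters of its own to fetch from.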

@stuser81
Author

stuser81 commented Mar 25, 2023

I finally got it to work. These are the main changes I made since last time:

  • Instead of giving the client root user SSH access to the build server root user, I gave it access to the ordinary non-root user instead. I must have misunderstood the documentation when they talked about "root". I thought everything had to be root across the board, but in fact only the client root user is relevant.
  • I added nix.settings.trusted-users = ["user"]; to the build server's configuration.nix. This appears to be important for reasons covered here: distributed builds require a trusted remote user #2789
  • I added nix.extraOptions = "builders-use-substitutes = true" also to the build server, but I have no idea if (kinda doubt) this did anything useful. Mentioning this for completeness.
  • I added require-sigs = false and changed builders in /etc/nix/nix.conf:
build-users-group = nixbld

extra-experimental-features = nix-command flakes
builders-use-substitutes = true
trusted-substituters = ssh://192.168.1.80
require-sigs = false
substituters = ssh://192.168.1.80
max-jobs = 0
builders = ssh://192.168.1.80  x86_64-linux  -  10  2  benchmark,big-parallel,kvm,nixos-test  -  -

Things started working after that. (I can't guarantee I did nothing else important but I doubt it.) I don't think the root vs. regular user thing was the real issue. The real issue was maybe that I didn't have any trusted-users before (which would have been nix.settings.trusted-users = ["root"]; while I was still accessing root) but I'm too lazy to verify it now. Also require-sigs = false avoided some warnings about untrusted substituters, which was possibly very important. (The builders change was likely only a minor change to fix an error about big-parallel being missing for one of the packages.)

TODO room for improvements: Replace require-sigs = false with a thing that only trusts cache.nixos.org public key (as the NixOS build server, or any Nix installation, does by default. It somehow makes sense that downstream machines, the laptop client machine in my case, also need to trust it if they want packages from it).
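As a first step toward that TODO, one could check whether the laptop already trusts the cache's signing key before dropping require-sigs = false. The sketch below assumes the key is listed under the name cache.nixos.org-1 in trusted-public-keys; the helper name `trusts_cache_nixos_org` is made up, and PLACEHOLDER stands in for the real key material.

```shell
#!/bin/sh
# Succeed if the "name = value" settings on stdin list a trusted public key
# named cache.nixos.org-1 (hypothetical helper, not part of Nix).
trusts_cache_nixos_org() {
    grep -Eq '^trusted-public-keys = (.* )?cache\.nixos\.org-1:'
}

# Typical use on the laptop (needs Nix installed):
#   nix show-config | trusts_cache_nixos_org && echo "cache key is trusted"

# Self-contained demo; PLACEHOLDER is not a real key.
if printf 'trusted-public-keys = cache.nixos.org-1:PLACEHOLDER\n' | trusts_cache_nixos_org; then
    echo "cache key is trusted"
fi
```

Note the likely limit of this approach: paths the desktop builds itself are not signed by the cache's key, so the "not signed by any of the keys" warnings from the original post may come back for those paths unless the desktop signs them with a key the laptop trusts.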

Worth noting:

When the client machine shows you tons of lines like

building '/nix/store/zsydnl4207d2vaa9n8kzksccqvv37npq-foo-3.0.0.drv' on 'ssh://192.168.1.80'...

it will not display if the build server is actually building it or if it's downloading from a substituter (e.g. cache.nixos.org). To find out which, I kept bmon (network monitor) and top (CPU monitor) open at the same time on the build server. I was indeed noticing downloads (no top activity) when I was grabbing something cached online and real building (top activity and build machine fan noise) when building something custom.
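When the client output runs to thousands of lines, a tally by message type gives a quick overview. The sketch below only counts the message shapes that appear in the logs in this thread; `summarize_nix_log` is a made-up name, and the tally cannot tell a real build from a substitution on the builder, for the reason given above.

```shell
#!/bin/sh
# Count the kinds of client-side messages in a saved nix-shell log on stdin
# (hypothetical helper; patterns are taken from the log lines in this thread).
summarize_nix_log() {
    awk '
        /^building .* on /                  { building++ }
        /^copying path .* from /            { copied++ }
        /^waiting for a machine to build /  { waiting++ }
        /^warning: ignoring substitute for / { ignored++ }
        END {
            printf "building=%d copied=%d waiting=%d ignored=%d\n",
                   building, copied, waiting, ignored
        }
    '
}

# Demo on a few lines quoted from the original post:
summarize_nix_log <<'EOF'
copying path '/nix/store/v28dv6l0qk3j382kp40bksa1v6h7dx9p-bash-5.2.tar.gz' from 'ssh://192.168.1.80'...
building '/nix/store/a68j9bys24cr3m1bixy4bz92q27bmx7k-bash52-005.drv' on 'ssh://192.168.1.80'...
warning: ignoring substitute for '/nix/store/dyamyflq6pvvnhzsj5ldzpwf31g4r5vm-bootstrap-stage0-stdenv-linux' from 'ssh://192.168.1.80', as it's not signed by any of the keys in 'trusted-public-keys'
waiting for a machine to build '/nix/store/8yhywccj73wdxv579rz5lk45dpjfgn2w-tar-1.34.tar.xz.drv'...
copying 0 paths...
EOF
```

A possible way to use it is to save the output with something like `nix-shell -p ghc 2>&1 | tee build.log` and then run `summarize_nix_log < build.log`.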

The inability of NixOS clients to leave cache.nixos.org out of substituters (which I can on Nix on Debian) remains a real issue - but it's a separate issue.

@colemickens
Member

colemickens commented Apr 10, 2023

Yep, just started doing remote builds again recently and I'm seeing this again too.

edit: I don't mean to be too whiny, but after years, it's disappointing how little confidence I have in many scenarios around remote building.

@colemickens
Member

colemickens commented Apr 10, 2023

And now I can't tell (after adding a regular "trusted-user") if it's working or not because it's still copying sources up. It would be great if there was a way to do a remote build with NIX_STORE=ssh-ng:// that acted like a local build and didn't do any copying.

EDIT: this is probably the result of me copying derivations to the remote, and the fact that some of my config uses IFD ?

@Animeshz

Animeshz commented Jul 19, 2023

I'm still not able to get the remote builder to pull packages from the cache directly; it always goes through my local machine. Are there exact steps, if anybody got it right?

EDIT: Turns out I didn't have substituters set up on the remote build machine; adding a nix-channel and updating fixed it.

@teto
Member

teto commented Apr 23, 2024

I see this as well. This makes remote builds longer than building locally :s I've checked my local and remote configs and everything looks sensible to me; I have no idea why I have to push stuff from my machine :s It doesn't help that the nix-daemon is very quiet. Is there any flag to make it more verbose (--help is not that helpful either)?
Also the nix client says:
copying /nix/store/.... to ssh://builder
which is a bit ambiguous since you don't know if it's copying from cache or locally. bandwhich showed me that my machine was uploading to the builder.
