Bad file descriptor when building content addressed derivation #6516

Open
aciceri opened this issue May 10, 2022 · 17 comments
Labels
bug ca-derivations Derivations with content addressed outputs


@aciceri
Member

aciceri commented May 10, 2022

Describe the bug

When I try to build CA derivations, I get sporadic "Bad file descriptor" errors. Sometimes the derivations build and sometimes they don't.

Steps To Reproduce

$ nix build --impure --expr '(import <nixpkgs> { config.contentAddressedByDefault = true; }).htop'
error (ignored): error: closing file descriptor 39: Bad file descriptor
error (ignored): error: closing file descriptor 25: Bad file descriptor
error: substitution of 'sha256:7abd5b35fa87ed09127c7f67000ef79752dbc9d33da6ee334a48f2bc7158eb38!out': read failed: Bad file descriptor

Running the same command again gives different output:

error (ignored): error: cannot unlink '/tmp/nix-build-boost-build-boost-1.69.0.drv-0/boost_1_69_0/boost': Directory not empty
error (ignored): error: cannot unlink '/tmp/nix-build-audit-2.8.5.drv-0/audit-2.8.5': Directory not empty
error: closing file descriptor 23: Bad file descriptor

I've built Nix from master (the latest commit), but I had the same problem with the Nix from current nixos-unstable (2.8) and even with nixUnstable on stable (2.5).

$ nix-env --version
nix-env (Nix) 2.9.0
$ nix show-config
accept-flake-config = false
access-tokens = 
allow-dirty = true
allow-import-from-derivation = true
allow-new-privileges = false
allow-symlinked-store = false
allow-unsafe-native-code-during-evaluation = false
allowed-impure-host-deps = 
allowed-uris = 
allowed-users = @wheel
auto-optimise-store = false
bash-prompt = 
bash-prompt-suffix = 
build-hook = /nix/store/7xc0d4pkxm2mly6glp5svqyp8qvfva7b-nix-2.8.0/libexec/nix/build-remote
build-poll-interval = 5
build-users-group = nixbld
builders = 
builders-use-substitutes = false
commit-lockfile-summary = 
compress-build-log = true
connect-timeout = 0
cores = 0
diff-hook = 
download-attempts = 5
enforce-determinism = true
eval-cache = true
experimental-features = ca-derivations flakes nix-command
extra-platforms = aarch64-linux i686-linux
fallback = false
filter-syscalls = true
flake-registry = https://github.com/NixOS/flake-registry/raw/master/flake-registry.json
fsync-metadata = true
gc-reserved-space = 8388608
hashed-mirrors = 
http-connections = 25
http2 = true
ignored-acls = security.selinux system.nfs4_acl
impersonate-linux-26 = false
keep-build-log = true
keep-derivations = true
keep-env-derivations = false
keep-failed = false
keep-going = false
keep-outputs = true
log-lines = 10
max-build-log-size = 0
max-free = 18446744073709551615
max-jobs = 4
max-silent-time = 0
min-free = 0
min-free-check-interval = 5
nar-buffer-size = 33554432
narinfo-cache-negative-ttl = 3600
narinfo-cache-positive-ttl = 2592000
netrc-file = /etc/nix/netrc
nix-path = nixpkgs=/nix/store/ni03h8nnld654lcqrs594lq1g1md818q-source nixos-config=/nix/store/317ij093b6ih587p35fk85sc6pkp58ms-nixos home-manager=/nix/store/33ybk6hkdwfd8w9xglbb42bzad7k3fpn-source
plugin-files = 
post-build-hook = 
pre-build-hook = 
preallocate-contents = false
print-missing = true
pure-eval = true
repeat = 0
require-sigs = true
restrict-eval = false
run-diff-hook = false
sandbox = true
sandbox-build-dir = /build
sandbox-dev-shm-size = 50%
sandbox-fallback = false
sandbox-paths = /bin/sh=/nix/store/z7hbia18niwzbaxydjyainfbnwc90xgv-busybox-static-x86_64-unknown-linux-musl-1.35.0/bin/busybox /nix/store/d5djp7cj3jsqy4m3lg4nvhfkka9pkhlm-qemu-6.1.1 /nix/store/l5smdvmw3nim4bndhddbxikaz9mfjpwj-bash-5.1-p8 /run/binfmt
secret-key-files = 
show-trace = false
stalled-download-timeout = 300
store = auto
substitute = true
substituters = https://nrdxp.cachix.org https://nixpkgs-wayland.cachix.org https://nix-community.cachix.org https://hydra.iohk.io https://cache.ngi0.nixos.org/ https://arm.cachix.org https://aciceri-fleet.cachix.org https://cache.nixos.org/ https://cache.nixos.org/ https://nrdxp.cachix.org https://nix-community.cachix.org
sync-before-registering = false
system = x86_64-linux
system-features = benchmark big-parallel kvm nixos-test
tarball-ttl = 3600
timeout = 0
trace-function-calls = false
trusted-public-keys = nrdxp.cachix.org-1:Fc5PSqY2Jm1TrWfm88l6cvGWwz3s93c6IOifQWnhNW4= nixpkgs-wayland.cachix.org-1:3lwxaILxMRkVhehr5StQprHdEo4IrE8sRho9R9HOLYA= nix-community.cachix.org-1:mB9FSh9qf2dCimDSUo8Zy7bkq5CX+/rkCWyvRCYg3Fs= hydra.iohk.io:f/Ea+s+dFdN+3Y/G+FDgSq+a5NEWhJGzdjvKNGv0/EQ= cache.ngi0.nixos.org-1:KqH5CBLNSyX184S9BKZJo1LxrxJ9ltnY2uAs5c/f1MA= arm.cachix.org-1:K3XjAeWPgWkFtSS9ge5LJSLw3xgnNqyOaG7MDecmTQ8= aciceri-fleet.cachix.org-1:WiHJIK4UFTdfvWx0lG3mCR4EddyYsRhIuMGSje3/YGI= cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY= nrdxp.cachix.org-1:Fc5PSqY2Jm1TrWfm88l6cvGWwz3s93c6IOifQWnhNW4= nix-community.cachix.org-1:mB9FSh9qf2dCimDSUo8Zy7bkq5CX+/rkCWyvRCYg3Fs=
trusted-substituters = 
trusted-users = root @wheel hydra-queue-runner
use-case-hack = false
use-registries = true
use-sqlite-wal = true
user-agent-suffix = 
warn-dirty = true

Additional context

Sometimes, but not always, "core dumped" shows up in the nix-daemon logs.

@aciceri aciceri added the bug label May 10, 2022
@aciceri
Member Author

aciceri commented May 12, 2022

@thufschmitt You mentioned this same problem almost two years ago. Could you please confirm that this is still a bug and that it doesn't depend on my particular configuration or derivations?

Forgive me for pinging you directly, but I need to be sure that this is currently broken.

Moreover, if I repeat nix build several times, my derivation eventually gets built. While waiting for a fix, how bad is this? Does it imply a "rotten" output, or is the output correct despite the errors?

I would like to help, but this is my first time reading Nix's source code and I fear this bug isn't easy to fix. I can't even imagine what is causing it.

@thufschmitt
Member

@aciceri I hit these occasionally (possibly not from the same cause as the original one; it had mostly disappeared until a couple of months ago), but I couldn’t manage to reproduce it in a deterministic-ish way :(

> Moreover, if I repeat nix build several times in the end I get my derivation built. Waiting for a fix, how bad is this? Does it implies a "rotten" output or is as it should be despite errors?

I don’t think so. I think the error is “just” that some fds get closed too early or too late (in a totally nondeterministic fashion), but things seem to work fine when that doesn’t happen.
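To make "closed too early or too late" concrete at the POSIX level, here is a minimal, hypothetical sketch (not Nix code, and not a claim about where Nix's bug is). It assumes a POSIX system and single-threaded execution, so that open() hands back the lowest free descriptor number. A stale close() of a number that has since been reused silently closes someone else's file, and the real owner then gets EBADF, the same "Bad file descriptor" seen in the logs.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Outcome of the simulated "stale close" scenario.
struct StaleCloseResult {
    int firstFd;     // fd obtained (and closed) by the first owner
    int secondFd;    // fd obtained afterwards by a second owner
    int staleRc;     // return value of the stale close(firstFd)
    int ownerRc;     // return value of the second owner's close(secondFd)
    int ownerErrno;  // errno after the second owner's close
};

// Hypothetical illustration: one component closes a descriptor number it
// no longer owns, and the legitimate owner later fails with EBADF.
StaleCloseResult simulateStaleClose() {
    StaleCloseResult r{};
    r.firstFd = open("/dev/null", O_RDONLY);
    assert(r.firstFd >= 0);
    close(r.firstFd);  // first owner is done with it

    // POSIX open() returns the lowest unused descriptor number, so in a
    // single-threaded program this reuses firstFd's number.
    r.secondFd = open("/dev/null", O_RDONLY);

    // A stale copy of firstFd is closed "too late": this actually closes
    // the second owner's file.
    r.staleRc = close(r.firstFd);

    // The legitimate owner now fails with "Bad file descriptor".
    errno = 0;
    r.ownerRc = close(r.secondFd);
    r.ownerErrno = errno;
    return r;
}
```

In a multithreaded process, another thread's open() can grab the reused number between the two steps, which is one reason bugs of this shape tend to show up nondeterministically.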

@Mindavi
Contributor

Mindavi commented Jun 8, 2022

I'm getting this error quite regularly. I have the ngi0 cache enabled, which I think may be causing this issue to occur more often.

I also see this at one place in the journal:

nix-daemon[3456058]: terminate called after throwing an instance of 'nix::SysError'
nix-daemon[3456058]:   what():  error: closing file descriptor 7762532: Bad file descriptor

nix-daemon[3457576]: corrupted double-linked list
nix-daemon[3508170]: corrupted size vs. prev_size while consolidating

Is there a way to enable more debug logging to catch more of what's happening internally? I get this error quite often (especially now that I'm building a lot of things), so I'd like to help figure out where the issue lies.

nix --version
nix (Nix) 2.9.0pre20220512_d354fc3
A possibly related coredump:
Jun 08 21:04:28 nixos-asus nix-daemon[4142576]: nix-daemon: src/libutil/callback.hh:39: void nix::Callback<T>::rethrow(const std::__exception_ptr::exception_ptr&) [with T = std::shared_ptr<const nix::Realisation>]: Assertion `!prev' failed.
Jun 08 21:04:28 nixos-asus systemd[1]: Started Process Core Dump (PID 4148031/UID 0).
░░ Subject: A start job for unit systemd-coredump@136-4148031-0.service has finished successfully
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░ 
░░ A start job for unit systemd-coredump@136-4148031-0.service has finished successfully.
░░ 
░░ The job identifier is 195372.
Jun 08 21:04:31 nixos-asus systemd-coredump[4148041]: [🡕] Process 4142576 (nix-daemon) of user 0 dumped core.
                                                      
  Module linux-vdso.so.1 with build-id c4a82d529d4b2545e4713e225e47f30f8db0bd8a
  Module libnss_dns.so.2 with build-id b97e1f67518c08996cd53db45db04b03da0c4b4a
  Module libattr.so.1 without build-id.
  Module libresolv.so.2 with build-id a34d5b36b6ca2ae656d70f2d5ce85c4910e572c3
  Module libkeyutils.so.1 without build-id.
  Module libkrb5support.so.0 without build-id.
  Module libcom_err.so.3 without build-id.
  Module libk5crypto.so.3 without build-id.
  Module libkrb5.so.3 without build-id.
  Module libunistring.so.2 without build-id.
  Module libxml2.so.2 without build-id.
  Module libbz2.so.1 without build-id.
  Module liblzma.so.5 without build-id.
  Module libacl.so.1 without build-id.
  Module libbrotlicommon.so.1 without build-id.
  Module libaws-c-common.so.1 without build-id.
  Module libaws-checksums.so.1.0.0 without build-id.
  Module libaws-c-sdkutils.so.1.0.0 without build-id.
  Module libaws-c-cal.so.1.0.0 without build-id.
  Module libaws-c-compression.so.1.0.0 without build-id.
  Module libs2n.so without build-id.
  Module libaws-c-io.so.1.0.0 without build-id.
  Module libaws-c-http.so.1.0.0 without build-id.
  Module libaws-c-auth.so.1.0.0 without build-id.
  Module libaws-c-s3.so.0unstable without build-id.
  Module libaws-c-event-stream.so.1.0.0 without build-id.
  Module libaws-c-mqtt.so.1.0.0 without build-id.
  Module libaws-crt-cpp.so without build-id.
  Module libzstd.so.1 without build-id.
  Module libgssapi_krb5.so.2 without build-id.
  Module libssl.so.1.1 with build-id 9d817747495563f73cfc4c6d372c9bd0f836da49
  Module libssh2.so.1 without build-id.
  Module libidn2.so.0 without build-id.
  Module libnghttp2.so.14 without build-id.
  Module libz.so.1 without build-id.
  Module librt.so.1 with build-id 81dcc5b293a378ea5ee6ff8ba1e67b8f55381439
  Module libcpuid.so.15 without build-id.
  Module libarchive.so.13 without build-id.
  Module libbrotlidec.so.1 without build-id.
  Module libbrotlienc.so.1 without build-id.
  Module libseccomp.so.2 without build-id.
  Module libaws-cpp-sdk-core.so without build-id.
  Module libaws-cpp-sdk-s3.so without build-id.
  Module libaws-cpp-sdk-transfer.so without build-id.
  Module libcurl.so.4 with build-id c4c2c0869f5133fbacd8b8191c79e5cfa9029b1d
  Module libsqlite3.so.0 with build-id b6353da50d2d06c6f59459f63c8e7877c1fbd156
  Module libcrypto.so.1.1 with build-id 6823d8bff0118bd827d506ec1a5358388dbcac43
  Module libboost_context.so.1.77.0 without build-id.
  Module ld-linux-x86-64.so.2 with build-id 2d4e3d041d24aa7a72377e8cc41e9336abd77ffb
  Module libc.so.6 with build-id c512f38583c48b03cc0011d4583d15cea2e94d03
  Module libgcc_s.so.1 without build-id.
  Module libm.so.6 with build-id ab70ae196025b3056f0fc0ed85a43390cda0ea4a
  Module libstdc++.so.6 without build-id.
  Module libnixcmd.so with build-id f538586a445a781fc4c2202d154d20bca9e7f17b
  Module libnixutil.so with build-id aed8c0f26c81d8841983133cefec84414d64e060
  Module libnixstore.so with build-id 3d31b662c0fa9584645a3ec8013cc661a8243864
  Module libnixfetchers.so with build-id 93fc1793a75c2b3fb8e424f09a6ef680f162a28d
  Module libnixmain.so with build-id ab529116f6b89ac4edd22a7937808653374f2e93
  Module libdl.so.2 with build-id a4bfa88c56b19d9a13f5191cfe661e9b9f32140a
  Module libpthread.so.0 with build-id bff749b993405063a72406b65328ce34b1d20b2b
  Module libgc.so.1 with build-id 403c4ba7c8033c2a5e02672e93d50117ceb1b928
  Module libnixexpr.so with build-id e1013311c515c44248014891e43672bf4361acc6
  Module liblowdown.so.1 without build-id.
  Module libeditline.so.1 without build-id.
  Module libsodium.so.23 with build-id a2a3c8381c73e1058d450cc33715a46ba849d69a
  Module nix with build-id 19be8d9a11d8985077ba0a4f176d5052202a5544
  Stack trace of thread 4146833:
  #0  0x00007f87b3289c1f __pthread_kill_implementation (libc.so.6 + 0x87c1f)
  #1  0x00007f87b323f042 raise (libc.so.6 + 0x3d042)
  #2  0x00007f87b322a49c abort (libc.so.6 + 0x2849c)
  #3  0x00007f87b322a3d5 __assert_fail_base.cold.0 (libc.so.6 + 0x283d5)
  #4  0x00007f87b3238062 __assert_fail (libc.so.6 + 0x36062)
  #5  0x00007f87b392fd4a _ZN3nix8CallbackISt10shared_ptrIKNS_11RealisationEEE7rethrowERKNSt15__exception_ptr13exception_ptrE (libnixstore.so + 0x12fd4a)
  #6  0x00007f87b38d7c2c _ZZN3nix16BinaryCacheStore24queryRealisationUncachedERKNS_9DrvOutputENS_8CallbackISt10shared_ptrIKNS_11RealisationEEEEENKUlSt6futureISt8optionalINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEE_clESJ_.cold (libnixstore.so + 0xd7c2c)
  #7  0x00007f87b3926230 _ZNSt17_Function_handlerIFvSt6futureISt8optionalINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEEZN3nix16BinaryCacheStore24queryRealisationUncachedERKNSB_9DrvOutputENSB_8CallbackISt10shared_ptrIKNSB_11RealisationEEEEEUlS9_E_E9_M_invokeERKSt9_Any_dataOS9_ (libnixstore.so + 0x126230)
  #8  0x00007f87b3930660 _ZN3nix8CallbackISt8optionalINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEE7rethrowERKNSt15__exception_ptr13exception_ptrE (libnixstore.so + 0x130660)
  #9  0x00007f87b3a572bd _ZZN3nix20HttpBinaryCacheStore7getFileERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS_8CallbackISt8optionalIS6_EEEENKUlSt6futureINS_18FileTransferResultEEE_clESF_ (libnixstore.so + 0x2572bd)
  #10 0x00007f87b3a573af _ZNSt17_Function_handlerIFvSt6futureIN3nix18FileTransferResultEEEZNS1_20HttpBinaryCacheStore7getFileERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS1_8CallbackISt8optionalISB_EEEEUlS3_E_E9_M_invokeERKSt9_Any_dataOS3_ (libnixstore.so + 0x2573af)
  #11 0x00007f87b3a1e640 _ZN3nix8CallbackINS_18FileTransferResultEE7rethrowERKNSt15__exception_ptr13exception_ptrE (libnixstore.so + 0x21e640)
  #12 0x00007f87b3a22ec6 _ZN3nix16curlFileTransfer12TransferItemD1Ev (libnixstore.so + 0x222ec6)
  #13 0x0000564368bcb8ca _ZNSt16_Sp_counted_baseILN9__gnu_cxx12_Lock_policyE2EE10_M_releaseEv (nix + 0xbc8ca)
  #14 0x00007f87b3a1db47 _ZNSt8_Rb_treeIPvSt4pairIKS0_St10shared_ptrIN3nix16curlFileTransfer12TransferItemEEESt10_Select1stIS8_ESt4lessIS0_ESaIS8_EE8_M_eraseEPSt13_Rb_tree_nodeIS8_E.isra.0 (libnixstore.so + 0x21db47)
  #15 0x00007f87b3a2701a _ZN3nix16curlFileTransfer16workerThreadMainEv (libnixstore.so + 0x22701a)
  #16 0x00007f87b3a2770f _ZN3nix16curlFileTransfer17workerThreadEntryEv (libnixstore.so + 0x22770f)
  #17 0x00007f87b34db2e4 n/a (libstdc++.so.6 + 0xdb2e4)
  #18 0x00007f87b3287ff2 start_thread (libc.so.6 + 0x85ff2)
  #19 0x00007f87b330abfc __clone3 (libc.so.6 + 0x108bfc)
  
  Stack trace of thread 4142576:
  #0  0x00007f87b3284cb5 __futex_abstimed_wait_common (libc.so.6 + 0x82cb5)
  #1  0x00007f87b32899e3 __pthread_clockjoin_ex (libc.so.6 + 0x879e3)
  #2  0x00007f87b34db357 _ZNSt6thread4joinEv (libstdc++.so.6 + 0xdb357)
  #3  0x00007f87b3a1badd _ZNSt23_Sp_counted_ptr_inplaceIN3nix16curlFileTransferESaIS1_ELN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv (libnixstore.so + 0x21badd)
  #4  0x00007f87b3a1bd5a _ZN3nix3refINS_16curlFileTransferEED2Ev (libnixstore.so + 0x21bd5a)
  #5  0x00007f87b3241205 __run_exit_handlers (libc.so.6 + 0x3f205)
  #6  0x00007f87b324138a exit (libc.so.6 + 0x3f38a)
  #7  0x0000564368c75cdc _ZNSt17_Function_handlerIFvvEZL10daemonLoopvEUlvE_E9_M_invokeERKSt9_Any_data (nix + 0x166cdc)
  #8  0x00007f87b3fb2e2d _ZNSt17_Function_handlerIFvvEZN3nix12startProcessESt8functionIS0_ERKNS1_14ProcessOptionsEEUlvE_E9_M_invokeERKSt9_Any_data (libnixutil.so + 0xf8e2d)
  #9  0x00007f87b3face09 _ZN3nixL6doForkEbSt8functionIFvvEE (libnixutil.so + 0xf2e09)
  #10 0x00007f87b3fb36e9 _ZN3nix12startProcessESt8functionIFvvEERKNS_14ProcessOptionsE (libnixutil.so + 0xf96e9)
  #11 0x0000564368c76f67 _ZL9runDaemonb (nix + 0x167f67)
  #12 0x0000564368c776fe _ZL15main_nix_daemoniPPc (nix + 0x1686fe)
  #13 0x0000564368cf54ec _ZN3nix11mainWrappedEiPPc (nix + 0x1e64ec)
  #14 0x00007f87b43fde89 _ZN3nix16handleExceptionsERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt8functionIFvvEE (libnixmain.so + 0x34e89)
  #15 0x0000564368bbc846 main (nix + 0xad846)
  #16 0x00007f87b322b237 __libc_start_call_main (libc.so.6 + 0x29237)
  #17 0x00007f87b322b2f5 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x292f5)
  #18 0x0000564368bc0e71 _start (nix + 0xb1e71)
  ELF object binary architecture: AMD x86-64

Another coredump that happened while trying to build CA derivations:
Jun 08 21:34:11 nixos-asus nix-daemon[638948]: corrupted double-linked list

  Module linux-vdso.so.1 with build-id c4a82d529d4b2545e4713e225e47f30f8db0bd8a
  Module libnss_dns.so.2 with build-id b97e1f67518c08996cd53db45db04b03da0c4b4a
  Module libattr.so.1 without build-id.
  Module libresolv.so.2 with build-id a34d5b36b6ca2ae656d70f2d5ce85c4910e572c3
  Module libkeyutils.so.1 without build-id.
  Module libkrb5support.so.0 without build-id.
  Module libcom_err.so.3 without build-id.
  Module libk5crypto.so.3 without build-id.
  Module libkrb5.so.3 without build-id.
  Module libunistring.so.2 without build-id.
  Module libxml2.so.2 without build-id.
  Module libbz2.so.1 without build-id.
  Module liblzma.so.5 without build-id.
  Module libacl.so.1 without build-id.
  Module libbrotlicommon.so.1 without build-id.
  Module libaws-c-common.so.1 without build-id.
  Module libaws-checksums.so.1.0.0 without build-id.
  Module libaws-c-sdkutils.so.1.0.0 without build-id.
  Module libaws-c-cal.so.1.0.0 without build-id.
  Module libaws-c-compression.so.1.0.0 without build-id.
  Module libs2n.so without build-id.
  Module libaws-c-io.so.1.0.0 without build-id.
  Module libaws-c-http.so.1.0.0 without build-id.
  Module libaws-c-auth.so.1.0.0 without build-id.
  Module libaws-c-s3.so.0unstable without build-id.
  Module libaws-c-event-stream.so.1.0.0 without build-id.
  Module libaws-c-mqtt.so.1.0.0 without build-id.
  Module libaws-crt-cpp.so without build-id.
  Module libzstd.so.1 without build-id.
  Module libgssapi_krb5.so.2 without build-id.
  Module libssl.so.1.1 with build-id 9d817747495563f73cfc4c6d372c9bd0f836da49
  Module libssh2.so.1 without build-id.
  Module libidn2.so.0 without build-id.
  Module libnghttp2.so.14 without build-id.
  Module libz.so.1 without build-id.
  Module librt.so.1 with build-id 81dcc5b293a378ea5ee6ff8ba1e67b8f55381439
  Module libcpuid.so.15 without build-id.
  Module libarchive.so.13 without build-id.
  Module libbrotlidec.so.1 without build-id.
  Module libbrotlienc.so.1 without build-id.
  Module libseccomp.so.2 without build-id.
  Module libaws-cpp-sdk-core.so without build-id.
  Module libaws-cpp-sdk-s3.so without build-id.
  Module libaws-cpp-sdk-transfer.so without build-id.
  Module libcurl.so.4 with build-id c4c2c0869f5133fbacd8b8191c79e5cfa9029b1d
  Module libsqlite3.so.0 with build-id b6353da50d2d06c6f59459f63c8e7877c1fbd156
  Module libcrypto.so.1.1 with build-id 6823d8bff0118bd827d506ec1a5358388dbcac43
  Module libboost_context.so.1.77.0 without build-id.
  Module ld-linux-x86-64.so.2 with build-id 2d4e3d041d24aa7a72377e8cc41e9336abd77ffb
  Module libc.so.6 with build-id c512f38583c48b03cc0011d4583d15cea2e94d03
  Module libgcc_s.so.1 without build-id.
  Module libm.so.6 with build-id ab70ae196025b3056f0fc0ed85a43390cda0ea4a
  Module libstdc++.so.6 without build-id.
  Module libnixcmd.so with build-id f538586a445a781fc4c2202d154d20bca9e7f17b
  Module libnixutil.so with build-id aed8c0f26c81d8841983133cefec84414d64e060
  Module libnixstore.so with build-id 3d31b662c0fa9584645a3ec8013cc661a8243864
  Module libnixfetchers.so with build-id 93fc1793a75c2b3fb8e424f09a6ef680f162a28d
  Module libnixmain.so with build-id ab529116f6b89ac4edd22a7937808653374f2e93
  Module libdl.so.2 with build-id a4bfa88c56b19d9a13f5191cfe661e9b9f32140a
  Module libpthread.so.0 with build-id bff749b993405063a72406b65328ce34b1d20b2b
  Module libgc.so.1 with build-id 403c4ba7c8033c2a5e02672e93d50117ceb1b928
  Module libnixexpr.so with build-id e1013311c515c44248014891e43672bf4361acc6
  Module liblowdown.so.1 without build-id.
  Module libeditline.so.1 without build-id.
  Module libsodium.so.23 with build-id a2a3c8381c73e1058d450cc33715a46ba849d69a
  Module nix with build-id 19be8d9a11d8985077ba0a4f176d5052202a5544
  Stack trace of thread 676248:
  #0  0x00007f87b3289c1f __pthread_kill_implementation (libc.so.6 + 0x87c1f)
  #1  0x00007f87b323f042 raise (libc.so.6 + 0x3d042)
  #2  0x00007f87b322a49c abort (libc.so.6 + 0x2849c)
  #3  0x00007f87b327e3f8 __libc_message (libc.so.6 + 0x7c3f8)
  #4  0x00007f87b329329a malloc_printerr (libc.so.6 + 0x9129a)
  #5  0x00007f87b329375c unlink_chunk.isra.0 (libc.so.6 + 0x9175c)
  #6  0x00007f87b3294d5b _int_free (libc.so.6 + 0x92d5b)
  #7  0x00007f87b3297491 free (libc.so.6 + 0x95491)
  #8  0x00007f87b3a1dab2 _ZNSt8_Rb_treeIPvSt4pairIKS0_St10shared_ptrIN3nix16curlFileTransfer12TransferItemEEESt10_Select1stIS8_ESt4lessIS0_ESaIS8_EE8_M_eraseEPSt13_Rb_tree_nodeIS8_E.isra.0 (libnixstore.so + 0x21dab2)
  #9  0x00007f87b3a2701a _ZN3nix16curlFileTransfer16workerThreadMainEv (libnixstore.so + 0x22701a)
  #10 0x00007f87b3a2770f _ZN3nix16curlFileTransfer17workerThreadEntryEv (libnixstore.so + 0x22770f)
  #11 0x00007f87b34db2e4 n/a (libstdc++.so.6 + 0xdb2e4)
  #12 0x00007f87b3287ff2 start_thread (libc.so.6 + 0x85ff2)
  #13 0x00007f87b330abfc __clone3 (libc.so.6 + 0x108bfc)
  
  Stack trace of thread 638948:
  #0  0x00007f87b3284cb5 __futex_abstimed_wait_common (libc.so.6 + 0x82cb5)
  #1  0x00007f87b32899e3 __pthread_clockjoin_ex (libc.so.6 + 0x879e3)
  #2  0x00007f87b34db357 _ZNSt6thread4joinEv (libstdc++.so.6 + 0xdb357)
  #3  0x00007f87b3a1badd _ZNSt23_Sp_counted_ptr_inplaceIN3nix16curlFileTransferESaIS1_ELN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv (libnixstore.so + 0x21badd)
  #4  0x00007f87b3a1bd5a _ZN3nix3refINS_16curlFileTransferEED2Ev (libnixstore.so + 0x21bd5a)
  #5  0x00007f87b3241205 __run_exit_handlers (libc.so.6 + 0x3f205)
  #6  0x00007f87b324138a exit (libc.so.6 + 0x3f38a)
  #7  0x0000564368c75cdc _ZNSt17_Function_handlerIFvvEZL10daemonLoopvEUlvE_E9_M_invokeERKSt9_Any_data (nix + 0x166cdc)
  #8  0x00007f87b3fb2e2d _ZNSt17_Function_handlerIFvvEZN3nix12startProcessESt8functionIS0_ERKNS1_14ProcessOptionsEEUlvE_E9_M_invokeERKSt9_Any_data (libnixutil.so + 0xf8e2d)
  #9  0x00007f87b3face09 _ZN3nixL6doForkEbSt8functionIFvvEE (libnixutil.so + 0xf2e09)
  #10 0x00007f87b3fb36e9 _ZN3nix12startProcessESt8functionIFvvEERKNS_14ProcessOptionsE (libnixutil.so + 0xf96e9)
  #11 0x0000564368c76f67 _ZL9runDaemonb (nix + 0x167f67)
  #12 0x0000564368c776fe _ZL15main_nix_daemoniPPc (nix + 0x1686fe)
  #13 0x0000564368cf54ec _ZN3nix11mainWrappedEiPPc (nix + 0x1e64ec)
  #14 0x00007f87b43fde89 _ZN3nix16handleExceptionsERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt8functionIFvvEE (libnixmain.so + 0x34e89)
  #15 0x0000564368bbc846 main (nix + 0xad846)
  #16 0x00007f87b322b237 __libc_start_call_main (libc.so.6 + 0x29237)
  #17 0x00007f87b322b2f5 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x292f5)
  #18 0x0000564368bc0e71 _start (nix + 0xb1e71)
  ELF object binary architecture: AMD x86-64

@aciceri
Member Author

aciceri commented Jun 8, 2022

@Mindavi To be honest, I can't see any connection with the ngi0 cache; what do you mean? These errors happen while building derivations, so the more Nix can fetch from the cache, the lower the chance of hitting them.

However, I would really like CA derivations to work too, and I'm available to work on this, but this is the first time I've put my hands on the Nix source itself. @thufschmitt What do you recommend? What preliminary reading would help me really understand how CA derivations work at a low level? Which source files are involved? Is there a way to better debug what is happening?

@dasJ
Member

dasJ commented Jul 4, 2022

It also seems to happen without the ngi0 cache: https://github.com/helsinki-systems/harmonia/runs/7183636987?check_suite_focus=true

@fogti
Contributor

fogti commented Jul 4, 2022

This also seems to happen without using content-addressed derivations.

@thufschmitt thufschmitt added the ca-derivations Derivations with content addressed outputs label Aug 3, 2022
@L-as
Member

L-as commented Oct 6, 2022

I can reliably reproduce this when using ca-derivations. Is there any way I can debug this? This is quite annoying.

@fogti
Contributor

fogti commented Oct 6, 2022

This now happens to me repeatedly when building without CA derivations. It is really annoying, but probably difficult to debug, as there is no information about which kind of file descriptor has this "use-after-free"-like problem. Maybe such a diagnostic should be added.
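A minimal sketch of the kind of diagnostic suggested here, assuming Linux (it reads the /proc/self/fd/&lt;n&gt; symlinks). describeFd, closeWithDiagnostics and openAndDescribe are hypothetical names, not Nix APIs. Because a descriptor can no longer be inspected once it has been closed, the owner records what it points to at open time, and the close wrapper includes that record in its error message.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: ask Linux what a descriptor currently refers to via
// the /proc/self/fd/<n> symlink. Only meaningful while the fd is still open.
std::string describeFd(int fd) {
    std::string link = "/proc/self/fd/" + std::to_string(fd);
    char buf[4096];
    ssize_t n = readlink(link.c_str(), buf, sizeof(buf) - 1);
    if (n < 0) return std::string("<unknown: ") + std::strerror(errno) + ">";
    return std::string(buf, static_cast<std::size_t>(n));
}

// Hypothetical close wrapper: if close() fails, the message names the fd and
// the target its owner recorded when opening it.
// Returns an empty string on success, otherwise the error message.
std::string closeWithDiagnostics(int fd, const std::string & recordedTarget) {
    if (close(fd) == 0) return "";
    int err = errno;
    return "closing file descriptor " + std::to_string(fd) +
           " (opened as " + recordedTarget + "): " + std::strerror(err);
}

// Example owner: open a file read-only and record its target for later.
std::string openAndDescribe(const char * path, int & fdOut) {
    fdOut = open(path, O_RDONLY | O_CLOEXEC);
    assert(fdOut >= 0);
    return describeFd(fdOut);
}
```

Closing the same fd twice with this wrapper yields a message like "closing file descriptor N (opened as /dev/null): Bad file descriptor", which would at least say what kind of file was involved.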

@L-as
Member

L-as commented Oct 6, 2022

Just removing close calls one by one could tell you at least which one it is.

@fogti
Contributor

fogti commented Oct 6, 2022

That will just cause Nix to quickly run out of file descriptors, I suppose (given that it regularly opens thousands of them in mere seconds).

@L-as
Member

L-as commented Oct 26, 2022

Note: I've successfully worked around this by running ulimit -n $((1024 * 1024)). Note that you also need to increase the hard limit in your NixOS config, like so (IIRC):

{
    systemd.extraConfig = "DefaultLimitNOFILE=1048576";
}

It seems the default of 1024 is causing issues, but I'm not sure it's worth fixing if simply increasing the limit works around it.
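For reference, a hedged sketch (not Nix code) of the in-process equivalent of the ulimit -n workaround: an unprivileged process may raise its soft RLIMIT_NOFILE only up to the current hard limit, which is why the hard limit has to be raised separately through the system configuration. raiseNofileSoftLimit is a hypothetical name; it uses the standard POSIX getrlimit/setrlimit calls.

```cpp
#include <cassert>
#include <sys/resource.h>

// Hypothetical helper: raise the soft open-file limit to the hard limit.
// Returns the resulting soft limit, or -1 on error.
long long raiseNofileSoftLimit() {
    rlimit rl{};
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) return -1;

    rlim_t target = rl.rlim_max;
    // Guard against an "unlimited" hard limit, which is not a usable
    // open-file count; keep the current soft limit in that case.
    if (target == RLIM_INFINITY) target = rl.rlim_cur;

    if (target > rl.rlim_cur) {
        rlimit raised = rl;
        raised.rlim_cur = target;  // hard limit stays unchanged
        if (setrlimit(RLIMIT_NOFILE, &raised) != 0) return -1;
    }

    // Re-read to report what the kernel actually applied.
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) return -1;
    return static_cast<long long>(rl.rlim_cur);
}
```

Raising the limit only makes descriptor exhaustion less likely; it would not fix a descriptor being closed twice, which is consistent with the follow-up below that it does not solve the problem entirely.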

@L-as
Member

L-as commented Oct 26, 2022

@thufschmitt

@L-as
Member

L-as commented Oct 26, 2022

Unfortunately, it doesn't seem to solve the problem entirely. Perhaps the limit needs to be very high to avoid it.

@fogti
Contributor

fogti commented Oct 27, 2022

I can confirm that, even outside of this bug, Nix frequently runs into problems with a low ulimit -n.

@Mindavi
Contributor

Mindavi commented Nov 22, 2022

Still seeing this. Maybe it helps that I'm now building with UBSan enabled.

The crash from nix-daemon
Nov 22 20:55:43 nixos-asus nix-daemon[133090]: /nix/store/1gf2flfqnpqbr1b4p4qz2f72y42bs56r-gcc-11.3.0/include/c++/11.3.0/future:498:43: runtime error: member call on address 0x555d809ed190 which does not point to an object of type '_Result'
Nov 22 20:55:43 nixos-asus nix-daemon[133090]: 0x555d809ed190: note: object has invalid vptr
Nov 22 20:55:43 nixos-asus nix-daemon[133090]:  00 00 00 00  7d 19 42 d5 58 55 00 00  00 00 00 00 00 00 00 00  c8 dc 03 44 27 7f 00 00  40 d3 03 44
Nov 22 20:55:43 nixos-asus nix-daemon[133090]:               ^~~~~~~~~~~~~~~~~~~~~~~
Nov 22 20:55:43 nixos-asus nix-daemon[133090]:               invalid vptr
Nov 22 20:55:43 nixos-asus nix-daemon[133090]: /nix/store/1gf2flfqnpqbr1b4p4qz2f72y42bs56r-gcc-11.3.0/include/c++/11.3.0/future:258:4: runtime error: member access within address 0x555d809ed190 which does not point to an object of type '_Result'
Nov 22 20:55:43 nixos-asus nix-daemon[133090]: 0x555d809ed190: note: object has invalid vptr
Nov 22 20:55:43 nixos-asus nix-daemon[133090]:  00 00 00 00  7d 19 42 d5 58 55 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
Nov 22 20:55:43 nixos-asus nix-daemon[133090]:               ^~~~~~~~~~~~~~~~~~~~~~~
Nov 22 20:55:43 nixos-asus nix-daemon[133090]:               invalid vptr
Nov 22 20:55:43 nixos-asus kernel: traps: nix-daemon[136604] general protection fault ip:7f27c62b3aaa sp:7f27997f80a0 error:0 in libstdc++.so.6.0.29[7f27c62a1000+f8000]
Nov 22 20:55:44 nixos-asus systemd-coredump[136861]: [🡕] Process 133090 (nix-daemon) of user 0 dumped core.
                                                     
  Module nix with build-id 83e802aa55fb0ba3f6995330f514aa9f785bb6c9
  Stack trace of thread 136604:
  #0  0x00007f27c62b3aaa __dynamic_cast (libstdc++.so.6 + 0xb3aaa)
  #1  0x00007f27c580fe4b _ZN7__ubsan16checkDynamicTypeEPvS0_m (libubsan.so.1 + 0xfe4b)
  #2  0x00007f27c580e8f3 _ZL26HandleDynamicTypeCacheMissPN7__ubsan24DynamicTypeCacheMissDataEmmNS_13ReportOptionsE (libubsan.so.1 + 0xe8f3)
  #3  0x00007f27c580efe5 __ubsan_handle_dynamic_type_cache_miss (libubsan.so.1 + 0xefe5)
  #4  0x00007f27c87437d5 _ZNSt17_Function_handlerIFSt10unique_ptrINSt13__future_base12_Result_baseENS2_8_DeleterEEvENS1_13_State_baseV27_SetterISt10shared_ptrIKN3nix11RealisationEEOSC_EEE9_M_invokeERKSt9_Any_data (libnixstore.so + 0xd437d5)
  #5  0x00007f27c868f945 _ZNSt13__future_base13_State_baseV29_M_do_setEPSt8functionIFSt10unique_ptrINS_12_Result_baseENS3_8_DeleterEEvEEPb (libnixstore.so + 0xc8f945)
  #6  0x00007f27c86f118b _ZZNSt9once_flag18_Prepare_executionC4IZSt9call_onceIMNSt13__future_base13_State_baseV2EFvPSt8functionIFSt10unique_ptrINS3_12_Result_baseENS7_8_DeleterEEvEEPbEJPS4_SC_SD_EEvRS_OT_DpOT0_EUlvE_EERSI_ENKUlvE_clEv (libnixstore.so + 0xcf118b)
  #7  0x00007f27c86f1273 _ZZNSt9once_flag18_Prepare_executionC4IZSt9call_onceIMNSt13__future_base13_State_baseV2EFvPSt8functionIFSt10unique_ptrINS3_12_Result_baseENS7_8_DeleterEEvEEPbEJPS4_SC_SD_EEvRS_OT_DpOT0_EUlvE_EERSI_ENUlvE_4_FUNEv (libnixstore.so + 0xcf1273)
  #8  0x00007f27c548e047 __pthread_once_slow (libc.so.6 + 0x8e047)
  #9  0x00007f27c86beba8 _ZSt9call_onceIMNSt13__future_base13_State_baseV2EFvPSt8functionIFSt10unique_ptrINS0_12_Result_baseENS4_8_DeleterEEvEEPbEJPS1_S9_SA_EEvRSt9once_flagOT_DpOT0_ (libnixstore.so + 0xcbeba8)
  #10 0x00007f27c86bed4d _ZNSt13__future_base13_State_baseV213_M_set_resultESt8functionIFSt10unique_ptrINS_12_Result_baseENS3_8_DeleterEEvEEb (libnixstore.so + 0xcbed4d)
  #11 0x00007f27c89909b5 _ZNSt17_Function_handlerIFvSt6futureISt10shared_ptrIKN3nix11RealisationEEEEZNS2_25DrvOutputSubstitutionGoal7tryNextEvEUlS6_E_E9_M_invokeERKSt9_Any_dataOS6_ (libnixstore.so + 0xf909b5)
  #12 0x00007f27c87161b5 _ZN3nix8CallbackISt10shared_ptrIKNS_11RealisationEEEclEOS4_ (libnixstore.so + 0xd161b5)
  #13 0x00007f27c9222942 _ZNSt17_Function_handlerIFvSt6futureISt10shared_ptrIKN3nix11RealisationEEEEZNS2_5Store16queryRealisationERKNS2_9DrvOutputENS2_8CallbackIS5_EEEUlS6_E_E9_M_invokeERKSt9_Any_dataOS6_.lto_priv.0 (libnixstore.so + 0x1822942)
  #14 0x00007f27c87161b5 _ZN3nix8CallbackISt10shared_ptrIKNS_11RealisationEEEclEOS4_ (libnixstore.so + 0xd161b5)
  #15 0x00007f27c86cffb3 _ZZN3nix16BinaryCacheStore24queryRealisationUncachedERKNS_9DrvOutputENS_8CallbackISt10shared_ptrIKNS_11RealisationEEEEENKUlSt6futureISt8optionalINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEE_clESJ_.lto_priv.0 (libnixstore.so + 0xccffb3)
  #16 0x00007f27c8749331 _ZNSt17_Function_handlerIFvSt6futureISt8optionalINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEEZN3nix16BinaryCacheStore24queryRealisationUncachedERKNSB_9DrvOutputENSB_8CallbackISt10shared_ptrIKNSB_11RealisationEEEEEUlS9_E_E9_M_invokeERKSt9_Any_dataOS9_.lto_priv.0 (libnixstore.so + 0xd49331)
  #17 0x00007f27c86c18fa _ZN3nix8CallbackISt8optionalINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEclEOS8_ (libnixstore.so + 0xcc18fa)
  #18 0x00007f27c8e4e886 _ZNSt17_Function_handlerIFvSt6futureIN3nix18FileTransferResultEEEZNS1_20HttpBinaryCacheStore7getFileERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS1_8CallbackISt8optionalISB_EEEEUlS3_E_E9_M_invokeERKSt9_Any_dataOS3_ (libnixstore.so + 0x144e886)
  #19 0x00007f27c8ce609c _ZN3nix8CallbackINS_18FileTransferResultEE7rethrowERKNSt15__exception_ptr13exception_ptrE (libnixstore.so + 0x12e609c)
  #20 0x00007f27c8d13f6a _ZN3nix16curlFileTransfer16workerThreadMainEv (libnixstore.so + 0x1313f6a)
  #21 0x00007f27c8d48246 _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN3nix16curlFileTransferC4EvEUlvE_EEEEE6_M_runEv (libnixstore.so + 0x1348246)
  #22 0x00007f27c62df3d4 execute_native_thread_routine (libstdc++.so.6 + 0xdf3d4)
  #23 0x00007f27c5488e86 start_thread (libc.so.6 + 0x88e86)
  #24 0x00007f27c550fc60 __clone3 (libc.so.6 + 0x10fc60)
     
Stack trace of thread 133090:
  #0  0x00007f27c54857d5 __futex_abstimed_wait_common (libc.so.6 + 0x857d5)
  #1  0x00007f27c548a953 __pthread_clockjoin_ex (libc.so.6 + 0x8a953)
  #2  0x00007f27c62df447 _ZNSt6thread4joinEv (libstdc++.so.6 + 0xdf447)
  #3  0x00007f27c8cebe65 _ZN3nix16curlFileTransferD2Ev (libnixstore.so + 0x12ebe65)
  #4  0x00007f27c8d2789c _ZNSt23_Sp_counted_ptr_inplaceIN3nix16curlFileTransferESaIS1_ELN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv (libnixstore.so + 0x132789c)
  #5  0x00007f27c8cd7291 _ZN3nix3refINS_16curlFileTransferEED2Ev (libnixstore.so + 0x12d7291)
  #6  0x00007f27c54400c5 __run_exit_handlers (libc.so.6 + 0x400c5)
  #7  0x00007f27c544024e exit (libc.so.6 + 0x4024e)
  #8  0x0000555d7f0e2243 _ZNSt17_Function_handlerIFvvEZL10daemonLoopvEUlvE_E9_M_invokeERKSt9_Any_data (nix + 0xcb0243)
  #9  0x00007f27c73d52e3 _ZNKSt8functionIFvvEEclEv (libnixutil.so + 0x5d52e3)
  #10 0x00007f27c764f0ef _ZNSt17_Function_handlerIFvvEZN3nix12startProcessESt8functionIS0_ERKNS1_14ProcessOptionsEEUlvE_E9_M_invokeERKSt9_Any_data.lto_priv.0 (libnixutil.so + 0x84f0ef)
  #11 0x00007f27c73d52e3 _ZNKSt8functionIFvvEEclEv (libnixutil.so + 0x5d52e3)
  #12 0x00007f27c760ae04 _ZN3nixL6doForkEbSt8functionIFvvEE (libnixutil.so + 0x80ae04)
  #13 0x00007f27c762ef0a _ZN3nix12startProcessESt8functionIFvvEERKNS_14ProcessOptionsE (libnixutil.so + 0x82ef0a)
  #14 0x0000555d7f0e95e6 _ZL9runDaemonb (nix + 0xcb75e6)
  #15 0x0000555d7f0e9e23 _ZL15main_nix_daemoniPPc (nix + 0xcb7e23)
  #16 0x0000555d7ed3feec _ZNSt17_Function_handlerIFviPPcEPFiiS1_EE9_M_invokeERKSt9_Any_dataOiOS1_ (nix + 0x90deec)
  #17 0x0000555d7f353764 _ZN3nix11mainWrappedEiPPc (nix + 0xf21764)
  #18 0x0000555d7f358d38 _ZNSt17_Function_handlerIFvvEZ4mainEUlvE_E9_M_invokeERKSt9_Any_data (nix + 0xf26d38)
  #19 0x00007f27ca98d03c _ZN3nix16handleExceptionsERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESt8functionIFvvEE (libnixmain.so + 0x18d03c)
  #20 0x0000555d7f336750 main (nix + 0xf04750)
  #21 0x00007f27c542924e __libc_start_call_main (libc.so.6 + 0x2924e)
  #22 0x00007f27c5429309 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x29309)
  #23 0x0000555d7ed3f6d5 _start (nix + 0x90d6d5)
  ELF object binary architecture: AMD x86-64

I demangled some of the symbols, and they seem to point somewhere around here:

queryRealisationUncached(
    id,
    { [this, id, callbackPtr](
          std::future<std::shared_ptr<const Realisation>> fut) {
        try {
            auto info = fut.get();
            if (diskCache) {
                if (info)
                    diskCache->upsertRealisation(getUri(), *info);
                else
                    diskCache->upsertAbsentRealisation(getUri(), id);
            }
            (*callbackPtr)(std::shared_ptr<const Realisation>(info));
        } catch (...) {
            callbackPtr->rethrow();
        }
    } });
}
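A hedged sketch of one hypothesis that would be consistent with the "Assertion `!prev' failed" message from callback.hh in the first coredump. Nothing in this thread confirms it. In the snippet above, catch (...) wraps both fut.get() and the call (*callbackPtr)(...). If the downstream consumer throws, callbackPtr->rethrow() then delivers a second result on a callback that has already been used. OnceCallback and deliver are hypothetical stand-ins, not the real nix::Callback; judging by the log, the real class asserts on a second delivery instead of counting it.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical single-shot callback, loosely modelled on the `!prev` assertion.
class OnceCallback {
    std::function<void(const std::string &)> onValue;
    bool done = false;

public:
    int invocations = 0;       // delivery attempts (value or error)
    int doubleDeliveries = 0;  // deliveries after the first one

    explicit OnceCallback(std::function<void(const std::string &)> f)
        : onValue(std::move(f)) {}

    // Deliver a value; the consumer may throw.
    void operator()(const std::string & value) {
        markDelivered();
        onValue(value);
    }

    // Deliver an error instead of a value.
    void rethrow() { markDelivered(); }

private:
    void markDelivered() {
        ++invocations;
        if (done) ++doubleDeliveries;  // where an assert(!prev) would fire
        done = true;
    }
};

// Same shape as the snippet: deliver the value, and on *any* exception
// (including one thrown by the consumer itself) deliver an error too.
void deliver(OnceCallback & cb, const std::string & value) {
    try {
        cb(value);
    } catch (...) {
        cb.rethrow();
    }
}
```

With a well-behaved consumer, deliver() records exactly one delivery. With a consumer that throws, it records two, which is the situation such an assertion is meant to catch.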

@Radvendii
Contributor

I have been working on debugging this, but haven't been able to reliably reproduce it. I got the error myself quite a few times when I first enabled CA derivations, but now it's not happening at all. Anyone who is still consistently getting the error: can you make a small example flake that (somewhat) reliably triggers it (presumably with nix-store --delete between invocations)?

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/content-addressed-nix-call-for-testers/12881/217


8 participants