"Bad file descriptor" and "Too many open files" #101459

Closed
FRidh opened this issue Oct 23, 2020 · 20 comments
Labels
0.kind: bug Something is broken

Comments

@FRidh
Member

FRidh commented Oct 23, 2020

Describe the bug
Switching NixOS to cfed29b fails with

# nixos-rebuild switch
building the system configuration...
error (ignored): error: --- SysError ------------------------------------------------------------------- nix
opening directory '/tmp/nix-build-users-groups.json.drv-2': Too many open files
error (ignored): error: --- SysError ------------------------------------------------------------------- nix
opening directory '/nix/store/vyynl4q8f5ap34x1i3amfrh9wwh5j8m6-users-groups.json.drv.chroot': Bad file descriptor
error: --- SysError ------------------------------------------------------------------- nix
opening directory '/nix/store/74asavs7xp33immp3yxa9kbjgsmq67z3-texlive-robustindex-49877': Too many open files

Not including texlive.combined.scheme-full worked around the issue.

Metadata
Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

Maintainer information:

# a list of nixpkgs attributes affected by the problem
attribute:
# a list of nixos modules affected by the problem
module:

Probably related to #100281.

@FRidh FRidh added the 0.kind: bug Something is broken label Oct 23, 2020
@andir
Member

andir commented Oct 23, 2020

Just to mention the workaround until we get to actually tackle the problem:

$ sudo mount -o remount,nr_inodes=1M /tmp

We are running out of inodes on /tmp: since systemd v246 the default limit there is 400k, which isn't really enough when some of our larger builds (chromium, firefox, …) are in the mix.
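To check whether /tmp is actually close to its inode limit, the numbers that `df -i /tmp` prints can also be read programmatically. A minimal Python sketch (the helper name `inode_usage` is made up for illustration): for a tmpfs mount, `total` reflects its `nr_inodes` setting, so a total near the 400k default mentioned above combined with high usage would point at this problem rather than at file descriptors.

```python
import os


def inode_usage(path: str = "/tmp") -> tuple[int, int]:
    """Return (used, total) inodes on the filesystem that holds `path`.

    Roughly what `df -i` reports. Filesystems without a fixed inode
    table (btrfs, for example) may report a total of 0.
    """
    st = os.statvfs(path)
    total = st.f_files          # inode capacity (nr_inodes on tmpfs)
    used = total - st.f_ffree   # capacity minus free inodes
    return used, total


if __name__ == "__main__":
    used, total = inode_usage("/tmp")
    if total:
        print(f"/tmp: {used}/{total} inodes used ({100 * used / total:.1f}%)")
    else:
        print("/tmp does not report a fixed inode count")
```

If the usage is close to the total, the `mount -o remount,nr_inodes=1M /tmp` workaround above raises the capacity until the next reboot.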

The line that is causing the problem is systemd/systemd@7d85383#diff-ff9de8d0d78a6434e92fcd42eca4801d682efdc8c0539f901022b4374bd94a13R25.

One of the solutions that I'd like to see is a) reporting upstream & b) creating our own (customizable) mount unit instead of just importing this. The limit of 400k seems rather arbitrary and probably varies in usefulness between environments.

EDIT: the "Too many open files" error is actually a bit different from the error I've been observing, but it might have the same source? @FRidh can you try with a stable version of Nix? Is Nix perhaps leaking FDs?
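One rough way to look into whether Nix is leaking FDs on Linux is to sample how many descriptors the nix-daemon (or a build process) holds over time; a count that keeps growing during a build would hint at a leak. A minimal sketch, assuming Linux procfs (`/proc/<pid>/fd`) and enough privileges to inspect the target process; `count_open_fds` is a hypothetical helper, not part of Nix.

```python
import os
import sys


def count_open_fds(pid: int) -> int:
    """Count the open file descriptors of `pid` via Linux procfs.

    When `pid` is the current process, the listing also contains the
    descriptor os.listdir opens to read the directory, so treat the
    value as approximate.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


if __name__ == "__main__":
    # Hypothetical invocation: sudo python3 count_fds.py "$(pgrep -o nix-daemon)"
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    print(f"pid {pid}: {count_open_fds(pid)} open file descriptors")
```

Running it repeatedly during a large build (for example one pulling in texlive.combined.scheme-full) and comparing the counts is what would distinguish a leak from simply hitting a low limit.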

@FRidh
Member Author

FRidh commented Oct 23, 2020

Using a configurable mount unit is #14777. I recall there was some discussion on that topic, where some said we should use upstream unit files to keep up with changes they make.

@xaverdh
Contributor

xaverdh commented Oct 23, 2020

For me just building texlive in isolation (with nix build) worked. But that's more of a workaround really.

@aaronjanse
Member

I also have this issue. Possibly related: I switched to flakes today. I also hit it while building texlive, and the exact texlive package that fails appears to be random.

@jtojnar
Member

jtojnar commented Oct 29, 2020

For me, building texlive on Ubuntu also caused this error, and I was able to fix it by increasing the soft limit on the number of open file descriptors using ulimit -n 500000 (1024 by default). But I am using a single-user install on my Ubuntu box, so the fix will likely not apply to NixOS.
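The `ulimit -n` change above only affects the shell it is run in and the processes started from it. The same soft-limit bump can be done from inside a process; a minimal Python sketch using the standard `resource` module (the helper name `raise_nofile_soft_limit` is hypothetical). An unprivileged process can only raise its soft limit up to the hard limit, and on macOS the kernel may reject values above `kern.maxfilesperproc` even when the hard limit reports as unlimited, so the sketch leaves the limit unchanged if `setrlimit` fails.

```python
import resource


def raise_nofile_soft_limit(target: int) -> tuple[int, int]:
    """Try to raise the soft RLIMIT_NOFILE to `target`; return (soft, hard) afterwards."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Without privileges the soft limit can only go up to the hard limit.
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft == resource.RLIM_INFINITY or target <= soft:
        return soft, hard  # already high enough, nothing to do
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    except (ValueError, OSError):
        # CPython surfaces setrlimit failures as ValueError (EINVAL/EPERM)
        # or OSError; keep the current limits in that case.
        pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    # 500000 mirrors the `ulimit -n 500000` value from the comment above.
    print(raise_nofile_soft_limit(500_000))
```

Note that on a multi-user install the limit that matters is the one of the nix-daemon process, not of the shell or client that starts the build.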

@FRidh
Member Author

FRidh commented Nov 3, 2020

See NixOS/nix#4046 regarding leaking file descriptors.

@smaret
Member

smaret commented Nov 14, 2020

I get a similar error:

error: --- SysError ------------------------------------------------------------------- nix
opening directory '/nix/store/d8sbz0scgbq1mj281cf2nvlwkqhc6h7q-ketcindy.r54074.tar.xz': Too many open files

on macOS 11.0 when installing TeXLive with flakes. Is this related?

@aanderse
Member

I just ran into this as well...

@bqv
Contributor

bqv commented Dec 15, 2020

Ditto

@DieracDelta
Member

DieracDelta commented Dec 29, 2020

Also on flakes, also ran into this on texlive.

@fricklerhandwerk
Contributor

Just encountered this when deploying to some random KVM guest with NixOps. You can increase the limits as proposed here:

  security.pam.loginLimits = [{
    domain = "*";
    type = "soft";
    item = "nofile";
    value = "4096";
  }];

@FRidh FRidh added this to the 21.05 milestone Feb 20, 2021
@siraben
Member

siraben commented Apr 17, 2021

I ran into this issue today while reinstalling Nix on macOS: home-manager switch triggered a "Too many open files" error.

@Artturin
Member

Fixed in 56d7e74#diff-13a08bcc3b07f25e2a556c8a2bc57422cebb5657df91cb222e3d0b9645e99174

We probably can't do anything about the macOS issue.

@schickling

We probably can't do anything about the macOS issue

Are there (at least) some known workarounds for macOS?

@mstone
Contributor

mstone commented Mar 18, 2022

Some notes on a possible workaround on macOS, based on some of the other comments/issues linked above:

(based on https://discussions.apple.com/thread/253001317?answerId=255632520022#255632520022)

  1. Create a file at /Library/LaunchDaemons/limit.maxfiles.plist
<?xml version="1.0" encoding="UTF-8"?>  
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"  
        "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">  
  <dict>
    <key>Label</key>
    <string>limit.maxfiles</string>
    <key>ProgramArguments</key>
    <array>
      <string>launchctl</string>
      <string>limit</string>
      <string>maxfiles</string>
      <string>64000</string>
      <string>524288</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>ServiceIPC</key>
    <false/>
  </dict>
</plist>
  2. Make sure the file is owned by root:wheel.

  3. Activate the config change by running

sudo launchctl load -w /Library/LaunchDaemons/limit.maxfiles.plist
  4. (If you like) test the change by running
launchctl limit maxfiles
  5. Finally, due to a race condition at startup, you’ll need to restart nix-daemon manually after every reboot (or until someone improves this outline to fix the race) by running something like:
sudo launchctl kickstart -k system/org.nixos.nix-daemon

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/bad-file-descriptor-during-nix-build-at-clean-git-nix-33-11/18848/1

@tobiasBora
Contributor

I am seeing this again on NixOS when installing texlive with the documentation (#171218).

@nh2
Contributor

nh2 commented Jun 8, 2022

I got this today during a nixops deployment (the deploying and the deployed machines both run NixOS 21.11):

example-worker-1......> copying path '/nix/store/3js38vjcks04xpdmwzz22rxn5k62cgjf-python3.9-zope.testing-4.9' from 'http://nixos-cache.example.com'...
example-worker-1......> creating file '/nix/store/3js38vjcks04xpdmwzz22rxn5k62cgjf-python3.9-zope.testing-4.9/lib/python3.9/site-packages/zope/testing/__pycache__/__init__.cpython-39.pyc': Too many open files
example-worker-1......> copying path '/nix/store/5a4swf17523g4v41w067gg97mm40cinm-dbus-1.12.20-lib' from 'http://nixos-cache.example.com'...
example-worker-1......> warning: unable to download 'http://cache.nixos.org/3js38vjcks04xpdmwzz22rxn5k62cgjf.narinfo': Couldn't resolve host name (6); retrying in 265 ms
example-worker-1......> warning: unable to download 'http://cache.nixos.org/3js38vjcks04xpdmwzz22rxn5k62cgjf.narinfo': Couldn't resolve host name (6); retrying in 534 ms
example-worker-1......> warning: unable to download 'http://cache.nixos.org/3js38vjcks04xpdmwzz22rxn5k62cgjf.narinfo': Couldn't resolve host name (6); retrying in 1111 ms
example-worker-1......> warning: unable to download 'http://cache.nixos.org/3js38vjcks04xpdmwzz22rxn5k62cgjf.narinfo': Couldn't resolve host name (6); retrying in 2181 ms
example-worker-1......> creating file '/nix/store/5a4swf17523g4v41w067gg97mm40cinm-dbus-1.12.20-lib/bin/dbus-launch': Too many open files
example-worker-1......> copying path '/nix/store/kqr525b69pq0kvsa53576mwb8xdjf8pi-libusb-1.0.23' from 'http://nixos-cache.example.com'...
example-worker-1......> copying path '/nix/store/s68ma0q8sr8gmpfwjjj91j41rwsyny5z-polkit-0.116' from 'http://nixos-cache.example.com'...
example-worker-1......> opening lock file '/nix/store/kqr525b69pq0kvsa53576mwb8xdjf8pi-libusb-1.0.23.lock': Too many open files
example-worker-1......> opening lock file '/nix/store/s68ma0q8sr8gmpfwjjj91j41rwsyny5z-polkit-0.116.lock': Too many open files
example-worker-1......> copying path '/nix/store/5a4swf17523g4v41w067gg97mm40cinm-dbus-1.12.20-lib' from 'http://cache.nixos.org'...
example-worker-1......> copying path '/nix/store/3js38vjcks04xpdmwzz22rxn5k62cgjf-python3.9-zope.testing-4.9' from 'http://cache.nixos.org'...
example-worker-1......> opening directory '/nix/store/5a4swf17523g4v41w067gg97mm40cinm-dbus-1.12.20-lib': Too many open files
example-worker-1......> warning: build of ...lots of stuff... failed

After this, nixops immediately continues with

example-worker-1......> copying 339 paths...

despite the failure.

https://discourse.nixos.org/t/bad-file-descriptor-during-nix-build-at-clean-git-nix-33-11/18848/2 says

So this appears to be from corrupt cache or download. Bad file descriptor error in this case most likely due to corrupt download of the cache.

Not sure what is going on.

@FRidh FRidh reopened this Jun 13, 2022
@FRidh
Member Author

FRidh commented Jun 13, 2022

Should be fixed with #176558.

@FRidh FRidh closed this as completed Jun 13, 2022
@mstone
Contributor

mstone commented Dec 3, 2022

BTW: I ran into this again on macOS (showing that the workaround I proposed in a comment above was insufficient), but this time I found Artturin/nix@2320a2f / NixOS/nix#6645, which I think did what was really necessary.

Thanks @Artturin!
