Fetching paths from cache.nixos.org can get 'stuck' #160289

Closed
raboof opened this issue Feb 16, 2022 · 26 comments

@raboof
Member

raboof commented Feb 16, 2022

Describe the bug

Sometimes, nix gets stuck fetching paths:

$ nix-shell -p sbt
these 2 paths will be fetched (408.08 MiB download, 694.33 MiB unpacked):
  /nix/store/0kg8zm25k6av8hgf177kwnp9q5j6yrf3-openjdk-17.0.1+12
  /nix/store/wj7x8ik896h0k9r6fk0bns21xw5c18yh-sbt-1.6.2
copying path '/nix/store/0kg8zm25k6av8hgf177kwnp9q5j6yrf3-openjdk-17.0.1+12' from 'https://cache.nixos.org'...

Stalling without any network traffic.

Interestingly this seems to be somehow tied to the specific path: nix can download other paths just fine, but always gets stuck on (in this case) /nix/store/0kg8zm25k6av8hgf177kwnp9q5j6yrf3-openjdk-17.0.1+12.

It has something to do with the system state: rebooting the machine makes it possible to fetch the path again, but killing all nix-daemon processes without rebooting doesn't seem to help.

edit: it does seem to be something around nix-daemon though, because when rebooting I see systemd-shutdown waiting for several nix-daemon processes to shut down which takes quite a while.

If anyone can suggest ways to further diagnose the problem I'd be happy to try things out!
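One way to narrow this down is to run the fetch for a single path under a deadline, so that a hang shows up as an exit code rather than a frozen terminal. This is a minimal sketch, assuming GNU coreutils `timeout` is available; the helper name `fetch_with_deadline` and the 300-second limit are illustrative and not part of Nix.

```shell
#!/bin/sh
# Hypothetical helper: run a command under a deadline (in seconds) and
# report whether it hung. GNU coreutils `timeout` exits with status 124
# when the deadline expires.
fetch_with_deadline() {
    deadline="$1"
    shift
    timeout "$deadline" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${deadline}s: $*" >&2
    fi
    return "$status"
}

# Example, using the store path from the report above:
# fetch_with_deadline 300 nix-store --realise \
#     /nix/store/0kg8zm25k6av8hgf177kwnp9q5j6yrf3-openjdk-17.0.1+12
```

An exit status of 124 for one path while other paths fetch normally would match the path-specific behaviour described above.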

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

[user@system:~]$ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 5.16.5, NixOS, 22.05 (Quokka)`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.6.0`
 - channels(root): `"nixos-20.03pre194293.2436c27541b"`
 - nixpkgs: `/home/aengelen/nixpkgs`

(current profile is at 9f697d6, nixpkgs is at 48d63e9)

@kenranunderscore
Contributor

I'm seeing the same just now. For me lots of builds worked today, but I get stuck at /nix/store/062izp7vafcszxbip77mwya90rzkcg19-mbrola-3.3 specifically, also after killing the nix-daemon processes.

 - system: `"x86_64-linux"`
 - host os: `Linux 5.10.95, NixOS, 22.05 (Quokka)`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.7.0pre20220127_558c4ee`
 - nixpkgs: `/nix/var/nix/profiles/per-user/root/channels/nixos`

@06kellyjac
Member

It seems to be related to remote builds. I have to sudo kill -9 the processes

https://discourse.nixos.org/t/remote-building-experienence-massively-degraded-since-2-3-months/16950

@raboof
Member Author

raboof commented Feb 16, 2022

> It seems to be related to remote builds.

Hmm, I don't use remote builds though

> I have to sudo kill -9 the processes

Which processes? killall -9 nix-daemon did kill/restart them, but didn't solve the problem for me.

@06kellyjac
Member

if you run that as a normal user it'll ignore you

@kenranunderscore
Contributor

kenranunderscore commented Feb 16, 2022

I ran it with sudo pkill -9 nix-daemon (or some such) and it didn't fix my problem either. Also no remote builds here.
Even a reboot didn't fix it at first.

After the reboot and failed attempt I tried to nix-collect-garbage -d && sudo nix-collect-garbage -d, after which it started working again.

@raboof
Member Author

raboof commented Feb 16, 2022

> if you run that as a normal user it'll ignore you

(I sudo'ed, and verified in ps aux that the processes were really gone/replaced)

@06kellyjac
Member

Ohh, you did kill them. I misread.

Well, after I ran into this problem a couple of times and removed my remote builders, I haven't encountered it again.
I also noticed during those periods that pinging cache.nixos.org hung for 10 seconds or more.
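Because the hang coincided with slow responses from cache.nixos.org, a quick probe of the cache over HTTPS can help separate a network problem from a Nix problem. The following is a sketch, assuming `curl` is installed; `probe_cache` is a hypothetical helper name, and the 10-second default simply mirrors the delay mentioned above. `/nix-cache-info` is a small metadata file served by binary caches.

```shell
#!/bin/sh
# Hypothetical helper: time a small request against a binary cache.
probe_cache() {
    url="${1:-https://cache.nixos.org}"
    limit="${2:-10}"
    # --max-time bounds the whole transfer, -f turns HTTP errors into a
    # non-zero exit status, and %{time_total} prints the elapsed seconds.
    if elapsed=$(curl -fsS --max-time "$limit" -o /dev/null \
            -w '%{time_total}' "$url/nix-cache-info"); then
        echo "ok: $url answered in ${elapsed}s"
    else
        echo "failed: no successful response from $url within ${limit}s" >&2
        return 1
    fi
}

# Example:
# probe_cache https://cache.nixos.org 10
```

If the probe is slow or fails while Nix is stuck, the network path to the cache is a more likely culprit than the daemon.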

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/stuck-at-copying-path-pkg-path-from-https-cache-nixos-org-even-after-turning-off-network/17564/5

@Artturin
Member

I can repro

@edolstra @roberth

@roberth
Member

roberth commented Feb 17, 2022

@Artturin

> I can repro

How?

Here's a fix for a deadlock involving CurlDownloader and the interrupt handlers: https://github.com/NixOS/nix/pull/6052. It may or may not be related. Does it still reproduce with this fix applied?

@Artturin
Member

> @Artturin
>
> > I can repro
>
> How?
>
> Here's a fix for a deadlock involving CurlDownloader and the interrupt handlers. May or may not be related. Does it repro with this fix applied?
>
> * [Fix deadlocked nix-daemon zombies on darwin #3294 nix#6052](https://github.com/NixOS/nix/pull/6052)

Sorry I should have said that I have the same issue.

I'll apply that pr and see if I get this issue anymore

@Artturin
Member

I've been running Nix with this overlay for the past 5 days and haven't hit the problem again:

      (self: super: {
        nixUnstable = super.nixUnstable.overrideAttrs (old: {
          src = super.fetchFromGitHub {
            owner = "nixos";
            repo = "nix";
            rev = "a768e85e2fb3b0500829bc42cdc137176481bedf";
            sha256 = "sha256-XtjveEPLGhHPh95+yequcLajtyCskSUTx+XZsP4UqA0=";
          };
          patches = (old.patches or [ ]) ++ [
            (super.fetchpatch {
              url = "https://github.com/NixOS/nix/commit/c3b942e0fc4777f9033f614b6b1f462c0f8c473e.patch";
              sha256 = "sha256-LQ5zkXwv1/3DoZz6CevyipdHTdxX7XlXv3nbmgGsngA=";
            })
          ];
        });
      })

I've updated #158455 to the newest Nix commit so that it includes the PR.

@miallo
Contributor

miallo commented Feb 23, 2022

@Artturin Just a warning that you might simply have been lucky: I also ran into the stuck downloads once, and after following the instructions from @kenranunderscore (nix-collect-garbage) I could do the updates. For the last 7 days it has also worked for me without the overlay.

Did anyone figure out how to reproduce this bug?

@raboof
Member Author

raboof commented Feb 23, 2022

I also haven't seen stuck downloads for the past week or so...

r-burns mentioned this issue Feb 26, 2022

@r-burns
Contributor

r-burns commented Feb 26, 2022

I'm still getting this on NixOS unstable with Nix 2.6.1. Is there a workaround that doesn't require collecting garbage? I have a couple compute-expensive derivations on my hard drive I'd rather not have to rebuild....

@Artturin
Member

Artturin commented Feb 26, 2022

Reboot, and optionally add GC roots for the derivations you want to keep before collecting garbage.
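For the GC-root suggestion, one option is `nix-store --add-root`, which registers a symlink as a garbage-collector root so that its target survives `nix-collect-garbage`. This is a sketch; `pin_store_path` and the `~/gcroots` directory are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: register a store path as a GC root via a symlink.
pin_store_path() {
    store_path="$1"
    root_dir="${2:-$HOME/gcroots}"
    mkdir -p "$root_dir" || return 1
    # The symlink is named after the store path's base name. Realising a
    # path that is already valid does no work beyond creating the root.
    # Assumption: older Nix versions may additionally need --indirect.
    nix-store --add-root "$root_dir/$(basename "$store_path")" \
        --realise "$store_path"
}

# Example (hypothetical store path):
# pin_store_path /nix/store/<hash>-my-expensive-build
```

Removing the symlink later makes the path eligible for garbage collection again.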

@raboof
Member Author

raboof commented May 15, 2022

I haven't seen this for a while now. Let's close for now and re-open when we see evidence it's still a problem.

raboof closed this as completed May 15, 2022

@sheldonneuberger-sc
Contributor

sheldonneuberger-sc commented Aug 28, 2022

I've hit this issue a few times in the past week. It always happens on the first package that Nix tries to copy from cache.nixos.org, which for me is /nix/store/8pil82i9qfww8l752158kqsa8dc2fzfl-python3.9-pyyaml-6.0. I haven't found any more info on it yet. host os: Darwin 21.5.0, macOS 10.16, nix-env (Nix) 2.10.3

@raboof
Member Author

raboof commented Aug 29, 2022

Interesting. The problem hasn't returned for me, and I think in my case it wasn't the first package, so it might not be the same problem. Perhaps it would be better to add a new issue (and link to this one for possible background) rather than re-open this?

@uuuvn

uuuvn commented Jul 19, 2023

I was hit by a similar problem (but when copying files from the host to a VDS via a remote nixos-rebuild). Running nix-collect-garbage on both the host and the VDS did nothing, but upgrading the host's system solved it.

@sheldonneuberger-sc
Contributor

In my case, this was due to a bad routing configuration between T-Mobile and the Fastly CDN (which hosts cache.nixos.org): traffic to cache.nixos.org was being routed from LAX to Argentina, as traceroute showed. Fastly and T-Mobile fixed this specific routing misconfiguration, but I have since moved to my own substituter on CloudFront, so I can't say whether the issue has occurred again.

@kquick
Contributor

kquick commented Feb 27, 2024

Encountered a similar issue: nix operations would just hang when trying to download (the perl package in my case). Was able to resolve with:

$ systemctl stop nix-daemon.socket
$ systemctl stop nix-daemon
$ nix store gc
$ systemctl start nix-daemon.socket
$ systemctl start nix-daemon

plus killing all active/hung nix attempts.

Running Nix 2.18.1 on NixOS 23.11 (Tapir)

piotr-semenov added a commit to piotr-semenov/parigp-lang that referenced this issue May 10, 2024
@jkarni
Contributor

jkarni commented Jun 27, 2024

I also have seen this a few times. Killing the daemon did the trick.

@nebez

nebez commented Aug 2, 2024

I've been having this issue in my Docker containers. Builds using FROM nixos/nix:latest have been failing for a few weeks now on a fresh machine with no Docker cache; I didn't think much of it and relied on the Docker cache.

After losing the docker cache, I have been unable to rebuild my images at all. Stuck at copying path '/nix/store/[...]' from 'https://cache.nixos.org'... indefinitely.

Pinning my tags to FROM nixos/nix:2.20.0 fixed it. Will debug more later.

@Shados
Member

Shados commented Aug 7, 2024

Saw this on a system today. I think in this particular instance it was triggered through some odd interaction with one remote builder being unavailable; removing it from the builders list, kill -9'ing the stuck nix-daemon processes and restarting the daemon overall seemed to fix the issue. Kinda surprised that systemctl stop nix-daemon.service reported success despite there being zombie nix-daemon processes still running -- shouldn't systemd have been able to pick up on those through its cgroup shenanigans?
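To check whether any daemon processes outlived `systemctl stop`, the process table can be inspected directly. This is a sketch, assuming `pgrep` from procps is available and that the processes are named `nix-daemon`; `leftover_daemons` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: report nix-daemon processes that are still alive.
leftover_daemons() {
    # pgrep -x matches the exact process name and exits with status 1
    # when nothing matches.
    if pids=$(pgrep -x nix-daemon); then
        # $pids is left unquoted so the PIDs are joined onto one line.
        echo "nix-daemon still running with PIDs:" $pids
        return 1
    fi
    echo "no nix-daemon processes found"
}

# Example:
# sudo systemctl stop nix-daemon.socket nix-daemon.service
# leftover_daemons
```

A non-zero return after stopping the unit would confirm that some processes were not cleaned up along with the service.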

@ocharles
Contributor

I just ran into this. The fix for me was to kill all stuck nix-store processes, then systemctl restart nix-daemon.
