DNS broke for most programs after recent update on the nixos-unstable channel #107537

Closed
poscat0x04 opened this issue Dec 24, 2020 · 14 comments · Fixed by #107572
Labels
0.kind: bug Something is broken 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS

Comments

@poscat0x04
Contributor

Describe the bug
Most of my programs are unable to resolve DNS after I updated my system to this commit (my previous update of nixos-unstable was approximately 14 days ago). This includes curl, chrome, resolvectl and nix, which basically makes my system unusable. One really weird exception is the drill tool, which is actually able to query both a remote server (such as 8.8.8.8) and the local systemd-resolved server to resolve DNS (which indicates that this is probably not the fault of systemd). There's also an error message in systemd-resolved's log that says "Failed to escape hostname: Invalid argument". I also tried editing my /etc/resolv.conf file to change the nameserver to 8.8.8.8, and that didn't change anything.

Here is an example of the errors returned by those programs:

$ resolvectl query google.com
resolve call failed: Operation not supported

To Reproduce
I don't really know how to reproduce this issue.

Expected behavior
DNS should be working properly in all programs.

Additional context
Add any other context about the problem here.

Notify maintainers

Metadata
Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

Maintainer information:

# a list of nixpkgs attributes affected by the problem
attribute:
# a list of nixos modules affected by the problem
module:
@poscat0x04 poscat0x04 added the 0.kind: bug Something is broken label Dec 24, 2020
@Emantor
Member

Emantor commented Dec 24, 2020

The broken programs use the libc calls to resolve hostnames, and it looks like the nss resolve module from systemd-resolved can't resolve the hostname either. This looks like some kind of failure within systemd-resolved, since systemd-resolve on the CLI fails with Operation not permitted. We should also open an upstream bug unless we find some evidence that this is caused by NixOS.
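
The split between the two lookup paths can be checked with a rough sketch like the one below. It assumes that getent and drill are installed: `getent ahosts` resolves through libc and nsswitch (and therefore through nss-resolve when that module is enabled), while drill sends the query to the given server itself. The function name check_paths is made up for this example, and a drill "ok" only means that drill exited successfully, not that the answer was useful.

```shell
# Compare the libc/NSS lookup path with a direct DNS query.
# $1 = hostname to look up, $2 = DNS server address for the direct query.
check_paths() {
  host="$1"
  server="$2"

  # getent ahosts calls getaddrinfo(), i.e. the same libc path that
  # curl, browsers and nix use.
  if getent ahosts "$host" >/dev/null 2>&1; then
    echo "libc/NSS: ok"
  else
    echo "libc/NSS: FAILED"
  fi

  # drill builds and sends the DNS query itself, bypassing NSS entirely.
  if drill "$host" "@$server" >/dev/null 2>&1; then
    echo "direct DNS: ok"
  else
    echo "direct DNS: FAILED"
  fi
}
```

For instance, `check_paths google.com 127.0.0.53` would query the local systemd-resolved stub listener directly. On a system showing the symptoms from this issue, one would expect the first line to report a failure and the second to succeed.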

@veprbl veprbl added the 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS label Dec 24, 2020
@tfmoraes
Contributor

This bug affects me too. Same error and same systemd-resolved logs.

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/unknown-network-problem/10669/2

@ghost

ghost commented Dec 24, 2020

Same bug on my end.

@flokli flokli mentioned this issue Dec 24, 2020
10 tasks
@flokli
Contributor

flokli commented Dec 24, 2020

I think I figured out what's the problem here. I opened an upstream issue: systemd/systemd#18078

Can you try if the following patch works around the issue?

From 7e5a4e90f8e3048fa2c054b4451e89e296030053 Mon Sep 17 00:00:00 2001
From: Florian Klink <flokli@flokli.de>
Date: Fri, 25 Dec 2020 00:13:53 +0100
Subject: [PATCH] nixos/systemd: provide libidn2 for systemd-resolved

---
 nixos/modules/system/boot/resolved.nix | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/nixos/modules/system/boot/resolved.nix b/nixos/modules/system/boot/resolved.nix
index 84bc9b78076..7fe8f4dfb7e 100644
--- a/nixos/modules/system/boot/resolved.nix
+++ b/nixos/modules/system/boot/resolved.nix
@@ -1,4 +1,4 @@
-{ config, lib, ... }:
+{ config, pkgs, lib, ... }:
 
 with lib;
 let
@@ -150,6 +150,9 @@ in
       wantedBy = [ "multi-user.target" ];
       aliases = [ "dbus-org.freedesktop.resolve1.service" ];
       restartTriggers = [ config.environment.etc."systemd/resolved.conf".source ];
+      # Upstream bug: https://github.com/systemd/systemd/issues/18078
+      # systemd-resolved without libidn2 is broken
+      environment.LD_LIBRARY_PATH = "${lib.getLib pkgs.libidn2}/lib";
     };
 
     environment.etc = {
-- 
2.29.2

@poscat0x04
Contributor Author

@flokli that worked! (though I have to manually edit the service file since automatic GC kicked in and I cannot rebuild my system :facepalm:)

Also, I feel like there should be a basic network connectivity test to prevent bugs like this from being merged into the unstable channel.

@xaverdh
Contributor

xaverdh commented Dec 25, 2020

@flokli that worked! (though I have to manually edit the service file since automatic GC kicked in and I cannot rebuild my system :facepalm:)

Since we now have a workaround, maybe we should pin this issue, so ppl with broken DNS can find it easily?

And thanks for the quick action @flokli !

Also, I feel like there should be a basic network connectivity test to prevent bugs like this from being merged into the unstable channel.

I agree, but I thought we actually have them for unstable?

@jpotier
Contributor

jpotier commented Dec 25, 2020

I've added the following to my configuration.nix:

{ config, pkgs, lib, ...}:
{
#...

  # 2020-12-25 Bug in systemd-resolved, workaround:
  systemd.services.systemd-resolved.environment = with lib; {
    LD_LIBRARY_PATH = "${getLib pkgs.libidn2}/lib";
  };

}

and I can confirm systemd-resolved works again.

Now, how do you rebuild your system if, like @poscat0x04, you don't have a working system to roll back to? I did (as root):

systemctl stop systemd-resolved.service
echo "nameserver 1.1.1.1" > /etc/resolv.conf
nixos-rebuild test

to force using that file in order to be able to run nixos-rebuild.
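
A slightly more defensive variant of the middle step, as a sketch only: it keeps a copy of the existing file before overwriting it, so the previous contents can still be inspected afterwards. The function name write_fallback_resolv_conf and the .bak suffix are invented for this example. The `systemctl stop` and `nixos-rebuild test` steps stay exactly as above.

```shell
# Point a resolv.conf-style file at a fallback resolver, keeping a copy
# of whatever was there before.
# $1 = path of the file to write, $2 = nameserver address.
write_fallback_resolv_conf() {
  target="$1"
  nameserver="$2"

  # cp -L follows symlinks, so the backup holds the real contents even
  # when the target is a symlink into another directory.
  if [ -e "$target" ]; then
    cp -L "$target" "$target.bak" || return 1
  fi

  printf 'nameserver %s\n' "$nameserver" > "$target"
}
```

Run as root after stopping systemd-resolved, the equivalent of the echo line above would be `write_fallback_resolv_conf /etc/resolv.conf 1.1.1.1`.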

@poscat0x04
Contributor Author

aw man I did not know that if you stop systemd-resolved it will resolve normally. I removed the symlink /etc/systemd and created a new directory there, copied everything from /etc/static/systemd to there, manually edited systemd-resolved.service, and ran systemctl daemon-reload and after that restarted systemd-resolved.

@jpotier's method is definitely superior.

@vcunat
Member

vcunat commented Dec 25, 2020

So... for now we push this workaround to nixpkgs master?

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/unknown-network-problem/10669/3

@jpotier
Contributor

jpotier commented Dec 25, 2020

So... for now we push this workaround to nixpkgs master?

I'd say yes, especially if there's no quick upstream fix in sight. Hopefully this problem only affects users following unstable.

@vcunat
Member

vcunat commented Dec 25, 2020

Stable doesn't get systemd updates, so it shouldn't be affected (based on flokli's analysis).

flokli added a commit to flokli/nixpkgs that referenced this issue Dec 25, 2020
systemd started using dlopen() for some of their "optional"
dependencies.

Apparently, `libidn2` isn't so optional, and systemd-resolved doesn't
work without libidn2 present, breaking DNS resolution.

Fixes NixOS#107537

Upstream bug: systemd/systemd#18078
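
To illustrate why setting LD_LIBRARY_PATH helps here: a library opened by name with dlopen() is searched for in, among other places, the directories listed in LD_LIBRARY_PATH. The sketch below imitates only that one part of the search (it ignores RPATH/RUNPATH, the ld.so cache and the default directories). The function name find_in_library_path is invented for this example, and the soname libidn2.so.0 in the usage line is an assumption rather than something stated in this thread.

```shell
# Print the first match for a shared object in a colon-separated list of
# directories, the way LD_LIBRARY_PATH is laid out.
# $1 = file name of the library, $2 = colon-separated search path.
# Returns 1 when no listed directory contains the file.
find_in_library_path() {
  soname="$1"
  search_path="$2"

  # Split the path on colons only, then restore the previous IFS.
  old_ifs="$IFS"
  IFS=:
  for dir in $search_path; do
    if [ -n "$dir" ] && [ -e "$dir/$soname" ]; then
      IFS="$old_ifs"
      printf '%s\n' "$dir/$soname"
      return 0
    fi
  done
  IFS="$old_ifs"
  return 1
}
```

For example, `find_in_library_path libidn2.so.0 "$LD_LIBRARY_PATH"` checks the current shell's value. The value that the service itself receives can be read with `systemctl show systemd-resolved.service -p Environment`.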
@flokli
Contributor

flokli commented Dec 25, 2020

PR at #107572.


9 participants