Enable networkd by default #202488

Closed
wants to merge 11 commits into from

Conversation

lheckemann
Member

@lheckemann lheckemann commented Nov 23, 2022

Description of changes

Enables networkd by default and adjusts docs and tests accordingly.

Things done
  • Make sure all the tests still work
    • fix eval on containers-restart-networking
    • Fix tests known to be broken by this PR
    • Work out which other tests break (the following are still unknown)
      • blocky
      • cage
      • cassandra_3_0
      • cassandra_3_11
      • containers-nested
      • engelsystem
      • enlightenment
      • geth
      • installed-tests.flatpak
      • kea
      • libvirtd
      • musescore
      • ndppd
      • plotinus
      • prosody
      • prosody-mysql
      • samba-wsdd
      • sourcehut
      • systemd-networkd-dhcpserver
      • systemd-networkd-ipv6-prefix-delegation
      • trafficserver
  • Check for any networking.* options that are ignored when using networkd
  • Talk about useDHCP and anyInterface some more -- do the defaults make sense?
  • Check whether things make sense when networkmanager is also enabled
  • Determine whether DNS server modules should warn when resolved is enabled, or even disable it themselves
  • Fits CONTRIBUTING.md.

This results in a more sensible behaviour for common setups, such as
workstations with both WiFi and Ethernet interfaces where only one
needs to be connected for online status to be reached -- which is
exactly the scenario useDHCP is for.

This also allows simplifying the config generated by the installer: we
don't need per-interface declarations at all anymore.

The example is outdated, as NixOS's networking options have supported
static IPv6 configuration for a long time now.
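
For illustration, a minimal sketch of static IPv6 configuration through the regular networking options rather than `networking.localCommands`. The interface name `eth0`, the documentation-prefix address, and the link-local gateway are placeholder assumptions, not values from this PR.

```nix
{
  # Static IPv6 address declared through the module system.
  # "eth0" and the 2001:db8::/32 documentation prefix are placeholders.
  networking.interfaces.eth0.ipv6.addresses = [
    { address = "2001:db8::10"; prefixLength = 64; }
  ];

  # Default IPv6 route; a link-local gateway needs the interface spelled out.
  networking.defaultGateway6 = {
    address = "fe80::1";
    interface = "eth0";
  };
}
```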

With networkd being enabled by default, networking.localCommands loses
further relevance and should only be used in extremely niche use cases
-- and certainly not encouraged for use cases as common as setting
IPv6 addresses.

Since networkd is now enabled by default, we no longer need to enable
it explicitly.
@@ -74,6 +74,7 @@ in
enable = true;
}
(mkIf cfg.useDHCP {
wait-online.anyInterface = lib.mkDefault true;
Member

I already frequently see problems with wait-online timing out because it tracks too many interfaces. Wouldn't this make it worse?

Contributor

No, this does the exact opposite. It just waits for any one of the interfaces to come online.
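
As a rough sketch of what the hunk above amounts to from a user's point of view, the same behaviour can be set explicitly. The option names are taken from the diff and from the existing NixOS options; combining them in a standalone configuration like this is an illustration, not part of the PR.

```nix
{ lib, ... }:
{
  networking.useNetworkd = true;
  networking.useDHCP = true;

  # Treat the system as online as soon as any one managed interface is
  # online, instead of waiting for every interface (e.g. both WiFi and
  # Ethernet on a laptop).
  systemd.network.wait-online.anyInterface = lib.mkDefault true;
}
```

Because the value is set with `lib.mkDefault`, a plain assignment elsewhere in the configuration still overrides it.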

@danielbarter
Contributor

danielbarter commented Mar 16, 2023

super stoked for this!

I have been using the networkd backend for a while now and it is rock solid. It might be worth adding some docs about how networking.interfaces.<interface> and systemd.network.networks.<interface> are related. I found this confusing at first. I would be happy to write something, unless there is any objection.
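
To make the relationship a bit more concrete, here is a hedged sketch showing the two layers side by side. The interface names, the unit name `50-eth1`, and the addresses are placeholders chosen for the example. With `useNetworkd` enabled, the high-level `networking.interfaces` options are translated into networkd units by the module, while `systemd.network.networks` defines such units directly.

```nix
{
  networking.useNetworkd = true;

  # High-level option: the NixOS module generates a networkd .network unit
  # for eth0 from this declaration.
  networking.interfaces.eth0.ipv4.addresses = [
    { address = "192.0.2.10"; prefixLength = 24; }
  ];

  # Low-level option: a networkd .network unit written out by hand.
  # "50-eth1" is an arbitrary unit name picked for this sketch.
  systemd.network.networks."50-eth1" = {
    matchConfig.Name = "eth1";
    address = [ "198.51.100.20/24" ];
  };
}
```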

@lheckemann
Member Author

@danielbarter yes, that would be great! If you want to pair at some point to get this moving again let me know, I'm @linus:schreibt.jetzt (you can find me in most of the big Nix-related channels) on Matrix :)

@danielbarter
Contributor

👍. Should have some time next weekend.

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/california-bay-area-meetup-nix-20th/26553/3

@infinisil infinisil added the significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc. label Apr 19, 2023
@onny
Contributor

onny commented Jun 18, 2023

Really looking forward to having this in 23.11. I've had a much better experience with networking.useNetworkd = true; regarding IPv6 support.

@mweinelt
Member

The tests for kea and networkd-prefix-delegation rely on networkd only, so I expect them to be fine.

@@ -7,6 +7,9 @@ let
{ ... }:
{ services.cjdns.enable = true;

# Occupies port 53 otherwise
services.resolved.enable = false;
Member

CJDNS is actually unrelated to DNS (at least in a direct sense), so this shouldn't be necessary.

@Majiir
Contributor

Majiir commented Oct 7, 2023

I'm also excited for networkd, but to temper the enthusiasm above: I have been trying to switch my systems to networkd for about a week now and haven't yet succeeded. I will keep opening PRs and issues as I discover details, but to briefly list what I've been running into:

  • If I switch or test from a scripted configuration to one with useNetworkd, I lose all interface connectivity on the target host (for some hosts with non-trivial network configs). This can happen even when a clean boot into the new config works. I haven't yet pinned down the root cause, but I found that some permutation of enabling resolved or systemd.network.enable (without useNetworkd) triggered the issue. It might also have something to do with dhcpcd. Why is this an issue? Because it makes switching more difficult to pull off on systems where lots of rebooting creates too much downtime (like a router or server).

  • I found some of my networks were configured nonsensically with useNetworkd. See nixos/network-interfaces-systemd: don't set network-level domains #258677 and nixos/network-interfaces-systemd: support and require defaultGateway.interface #258695 for fixes.

  • I ran into (still undiagnosed) issues with NixOS container connectivity when enabling useNetworkd.

  • The default services enabled with useNetworkd can clash with an existing configuration in unexpected ways. For example, enabling useNetworkd on my router broke DNS on my network because the router runs dnsmasq, which conflicted with the ports bound by the now-enabled-by-default resolved.

Some of these are issues we can fix in nixpkgs, while others are footguns that users should be made aware of prominently in the release notes.

Since there could be myriad problems for existing users, would it make sense to start by enabling useNetworkd for new installations (through a stateVersion check and/or default config) and see what happens with a broader audience? I suspect most current useNetworkd users are competent early adopters.
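
A minimal sketch of what such a stateVersion gate could look like, written as ordinary configuration. The cutoff `"23.11"` is only an assumed example release, and a real change in nixpkgs would more likely live in the option's default than in user config.

```nix
{ config, lib, ... }:
{
  # Enable networkd only for installations whose stateVersion is at least
  # the (assumed) cutoff release; older installations keep the scripted
  # backend unless they opt in explicitly.
  networking.useNetworkd =
    lib.mkDefault (lib.versionAtLeast config.system.stateVersion "23.11");
}
```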

@lheckemann
Member Author

switching doesn't work: this is a known issue and I don't think it's something we should invest much effort into improving (though it might be worth detecting this and cancelling a switch). Systems should be rebooted on a regular basis to ensure they're running recent kernels anyway, and upgrading to a new release is generally expected to require a reboot.

silly configuration: some good catches there, thanks!

containers: maybe @Ma27 knows something relevant?

services on port 53: these problems are exposed by many of the tests, which is why I put a big checklist in the description and some discussion on here about what the best way to deal with that is.

@Majiir
Contributor

Majiir commented Oct 7, 2023

switching doesn't work: this is a known issue and I don't think it's something we should invest much effort into improving (though it might be worth detecting this and cancelling a switch).

Yes, detecting and cancelling would help if there aren't plans to support it. It's not only an issue for release (where as you say, a reboot is expected) but also in case users want to make the switch sometime before or after the release. The issue with a required reboot isn't one reboot, but the potentially many reboots and loss-of-connectivity events that come with troubleshooting the switch to networkd.

services on port 53: these problems are exposed by many of the tests, which is why I put a big checklist in the description and some discussion on here about what the best way to deal with that is.

👍 I don't see a dnsmasq test there but I can find or make one. It seems adding --bind-interfaces to dnsmasq options is the generally accepted way to make it play nicely with resolved (and I found this to work).
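
For reference, a small sketch of the `--bind-interfaces` approach expressed through the dnsmasq module. It assumes a nixpkgs version that has the freeform `services.dnsmasq.settings` option; on older releases the equivalent line would go into the module's extra configuration instead.

```nix
{
  services.dnsmasq = {
    enable = true;
    # Equivalent to passing --bind-interfaces: bind to the individual
    # interface addresses instead of the wildcard address, so dnsmasq does
    # not collide with the resolved stub listener on 127.0.0.53:53.
    settings.bind-interfaces = true;
  };
}
```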

@Majiir
Contributor

Majiir commented Oct 7, 2023

Failing test for dnsmasq+resolved: #259644

@Majiir
Contributor

Majiir commented Oct 8, 2023

Ohh, here's a fun one! I switched a machine to networkd and its DHCP-assigned IPv4 address changed even though this machine's MAC address has a static address reservation configured on the DHCP server (Kea). This happened because the DHCP Client ID changed when switching to networkd. I don't think we can do anything to fix this, but we should mention it in the upgrade instructions when networkd becomes the default.
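
One possible mitigation that was not discussed in this thread is to have networkd derive the client ID from the MAC address. The sketch below assumes an interface named `eth0` and a hand-written unit named `50-lan`; whether it restores the old lease depends on what the previous DHCP client sent and on how the server matches reservations, so treat it as an untested assumption.

```nix
{
  systemd.network.networks."50-lan" = {
    matchConfig.Name = "eth0";
    networkConfig.DHCP = "ipv4";
    # Send a client identifier based on the MAC address rather than
    # networkd's default DUID-based identifier.
    dhcpV4Config.ClientIdentifier = "mac";
  };
}
```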


[EDIT to avoid notification spam] I think my container issues are caused by enabling systemd-resolved on the host. These are declarative containers with the default networking.useHostResolvConf value (true). Previously, resolv.conf pointed to nameservers on the network that the container could reach. It now points at 127.0.0.53, but resolved is running on the host, not in the container.

@Ma27
Member

Ma27 commented Oct 9, 2023

re containers: if it's about DNS, you'll most likely need something like networking.useHostResolvConf = !hostConfig.networking.useNetworkd;.

"issues with NixOS container connectivity" is rather vague, so that's just a wild guess.

Over the last few months I discovered that even the existing container subsystem works surprisingly well with networkd.

EDIT: OK, seems as if the issue was found already. The message appeared right after I submitted, thanks GitHub 🙃
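
A sketch of how the useHostResolvConf suggestion above might be wired into a declarative container. The container name `example` is a placeholder, and `hostConfig` is simply an alias for the host's `config` introduced for this example.

```nix
{ config, ... }:
let
  # Alias for the host's configuration, so it stays distinguishable from
  # the container's own module arguments.
  hostConfig = config;
in
{
  containers.example = {
    autoStart = true;
    config = { ... }: {
      # Only reuse the host's resolv.conf when the host is not on networkd;
      # with resolved on the host it would point at 127.0.0.53, which is
      # not reachable from inside the container.
      networking.useHostResolvConf = !hostConfig.networking.useNetworkd;
    };
  };
}
```

When the host resolv.conf is not reused, the container needs its own resolver configuration, which this sketch leaves out.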

@Ma27
Member

Ma27 commented Oct 23, 2023

Ah I thought I had sent this comment last week, but apparently I didn't. Anyways, here we go:

Fix tests known to be broken by this PR

I haven't looked through all of them, but openssh, for example, is only broken because of the underscores in the hostnames. I had totally forgotten about this, but given that resolved prohibits such hostnames, we should drop underscores in hostnames from the testing framework entirely.

@lheckemann
Member Author

Superseded by #264967

@lheckemann lheckemann closed this Nov 2, 2023
caspervk pushed a commit to caspervk/nixos that referenced this pull request Jun 11, 2024