Undeprecate networking.useDHCP #75515

Closed
bjornfor opened this issue Dec 11, 2019 · 26 comments

Comments

@bjornfor
Contributor

bjornfor commented Dec 11, 2019

Describe the bug
Using networking.useDHCP is deprecated since e862dd6.

But:

  1. Why is networking.useDHCP discouraged/deprecated?
  2. Why does using networkd mean we can no longer have a global default for whether or not to use DHCP?

It seems like a step back to me, having to list machine specific network interfaces in configuration.nix
instead of being able to say "use DHCP (or not) for any interface you see". Also ref. #73595.

I think/hope networking.useDHCP = true can be mapped to networkd with something like this (untested):

# /etc/systemd/network/10-nixos-dhcp.network
[Match]
Name=*

[Network]
DHCP=yes
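For illustration, the unit above could be written through the NixOS module system roughly as follows. This is a sketch that assumes the existing systemd.network.networks options; the unit name "10-nixos-dhcp" is only illustrative, and like the snippet above it is untested. Note that a Name=* match also covers lo and virtual interfaces.

```nix
{
  # Sketch only: a hand-written catch-all DHCP network for systemd-networkd.
  systemd.network.enable = true;

  # Rendered as a .network unit under /etc/systemd/network/
  # (the attribute name is illustrative).
  systemd.network.networks."10-nixos-dhcp" = {
    matchConfig.Name = "*";      # matches every interface, including lo
    networkConfig.DHCP = "yes";  # DHCPv4 and DHCPv6
  };
}
```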

CC @globin.

@bjornfor bjornfor changed the title from "Undeprecated networking.useDHCP" to "Undeprecate networking.useDHCP" Dec 11, 2019
@bjornfor
Contributor Author

@florianjacob: Why the thumbs down? I'm curious why using networking.useDHCP is suddenly a bad idea. Because of networkd somehow? Please explain.

@florianjacob
Contributor

About a year ago I banged my head against the global useDHCP and related global options, which caused a lot of problems with networkd and are / were enforced through a 99-main.network that matches all interfaces, like your code snippet. I can't explain the problems off the top of my head anymore, though, as I have since disabled all of that and configure systemd-networkd directly, without networking.interfaces, because it works / worked so badly with systemd-networkd. The deprecation itself exists precisely because the 99-main.network catch-all is removed in 20.03:

The <literal>99-main.network</literal> file was removed. Maching all

If I remember correctly, one main cause is that systemd-networkd only applies the first network file that matches an interface and ignores all others, which doesn't fit how the networking.interfaces module and the global options are designed.

(Thumbs down just because I remember it's a good idea not to have that, thumbs up for documenting and explaining why that decision was made and what the problems were.)

@bjornfor
Contributor Author

@florianjacob: Thank you. I read the linked issues and have a better understanding of the problem now.

If some interfaces must be excluded from networkd control, and a whitelist is preferred, how about this:

networking.useDHCP = [ "en*" "wl*" ];

This could be the new default and have low priority so that per interface settings win. This should be machine agnostic AFAICT.
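The list-valued networking.useDHCP above is a proposal, not an existing option type. With the dhcpcd backend, a similar whitelist can be sketched today using the existing networking.dhcpcd.allowInterfaces option; the patterns below assume predictable interface names.

```nix
{
  # Global switch; with the default (non-networkd) backend this enables dhcpcd.
  networking.useDHCP = true;

  # Only run the DHCP client on interfaces matching these shell globs.
  # Interfaces that match none of them are left alone.
  networking.dhcpcd.allowInterfaces = [ "en*" "wl*" ];
}
```

Because nothing here names a concrete interface, the fragment stays machine agnostic as long as the naming assumption holds.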

@bjornfor
Contributor Author

Well, if the above whitelist works, there is actually no need to change the API: useDHCP must simply change from matching all interfaces to the ones starting with "en*" and "wl*".

@mkg20001
Member

IMO there should be a way to override the whitelist: the boolean value true would select the default whitelist, while a list would override it.

Also, some machines have interfaces named eth0, eth1, ..., so relying on just "en*" might not work; it should be "e*".

@bjornfor
Contributor Author

I remember now I even have a setup with "wan0" and "lan0" interfaces (via udev rule), so even "e*" is not correct/enough.

Is there a way to match real hardware interfaces?

@mkg20001
Member

In NetworkManager, at least, there seems to be a "hw" flag, but I suppose this flag could be recreated with the right values from /proc/.

The only problem is that this would likely have to happen at runtime, since the config could be built on any machine.

screenshot

@bjornfor
Contributor Author

How is the live CD going to be configured without a global networking.useDHCP? One cannot know the names of the network interfaces beforehand, so there must be some wildcard/global match in NixOS somewhere. If the live CD can be made to work with networkd (I guess that's the plan), surely we can make the installed NixOS too?

@Moredread
Contributor

@bjornfor wouldn't a "*" work for the live iso?

@bjornfor
Contributor Author

I think the point was to not match certain interfaces, like the loopback interface ('lo'). But I didn't pick up all the details of the above linked issues, so I could be wrong.

@bjornfor bjornfor added this to the 20.03 milestone Dec 29, 2019
@fpletz fpletz self-assigned this Jan 7, 2020
@fpletz
Member

fpletz commented Jan 7, 2020

So there are two related problems at play here.

Networkd wants to configure all interfaces it's configured to manage

If there is no carrier or no DHCP response, the interfaces will stay in the configuring state and will delay network-online.target until either all interfaces are configured or a timeout is reached. Then network-online.target fails and thus all services depending on it, even if one link has a configured and working connection.

This is not something we want to happen to new users or on NixOS upgrades that might switch to networkd by default.

Note that most other distributions I'm familiar with don't do DHCP on all interfaces by default; instead, their installer generates a sensible networking config through some kind of autodetection and by asking the user. We're just using dhcpcd cleverly. I think this behaviour only makes sense on install media, where one cannot assume anything about the target. But on install media, IMHO, we should rather let the user use NetworkManager and nmtui for ad-hoc configuration.

Specifically, I think the networking configuration should rather be a conscious decision by the user. Either statically via configuration or dynamically via tools like NetworkManager. Users can still configure dhcpcd or networkd explicitly to run DHCP on whitelisted/blacklisted interfaces they deem useful for the job. I was also thinking of allowing wildcards/globs for networking.interfaces.<ifname> since both our DHCP clients would support it to simplify this.

There is a way though to exclude interfaces from the network-online.target status checks: Network units can set RequiredForOnline=false. But setting this for the catchall DHCP networks would also break network-online.target for services that really rely on a working internet connection on start.

After researching for my response here again, I noticed that systemd-networkd-wait-online now has an --any option which would result in the same behaviour we have for dhcpcd in principle. Except that if we couple static configurations with DHCP on all (other) interfaces, the one statically configured interface without a default route would activate network-online.target which is also not the behaviour we want, strictly speaking.

Also note that this way we have a kind of race condition with both dhcpcd and networkd anyway because acquiring an IP via DHCP on one interface does not necessarily mean we also get a default route that might come from another interface.
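The two knobs discussed above can be sketched in NixOS terms as follows. This assumes the systemd.network.networks options and, for the commented alternative, the systemd.network.wait-online.anyInterface option found in recent NixOS releases; the unit name and the interface patterns are illustrative only, and neither variant avoids the trade-offs described above.

```nix
{
  systemd.network.enable = true;

  systemd.network.networks."40-dhcp-uplinks" = {
    matchConfig.Name = "en* wl* ww*";
    networkConfig.DHCP = "yes";
    # A missing carrier or DHCP lease on these links no longer holds back
    # network-online.target, but the target also stops implying that one
    # of them is actually up.
    linkConfig.RequiredForOnline = "no";
  };

  # Alternative to RequiredForOnline (use one or the other): pass --any to
  # systemd-networkd-wait-online so that a single configured link suffices.
  # systemd.network.wait-online.anyInterface = true;
}
```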

How to match all "uplink" interfaces?

Matching en* wl* ww* would potentially be enough, but only if predictable interface names are enabled (see man systemd.net-naming-scheme). If predictable interface names are disabled, we cannot assume anything, since the interface names are then defined by the kernel/drivers and could be something like usb0 for USB network cards.

Furthermore, udev exposes the DEVTYPE property which can be accessed via networkd units for matching via Type=. This would be ideal because we could match for ethernet and wifi cards individually. After looking at some hardware, this property is unfortunately not set on some hardware even though the interface is from a physical network card. Not sure if this is a kernel, driver or udev problem.
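A Type= based match might look like the sketch below. It assumes the device types reported by networkctl list (ether and wlan here) and an illustrative unit name; on hardware where the type is not reported as described above, such a unit would simply not match, and some virtual devices may report ether as well.

```nix
{
  systemd.network.enable = true;

  systemd.network.networks."40-dhcp-physical" = {
    # Match by device type instead of by interface name.
    matchConfig.Type = "ether wlan";
    networkConfig.DHCP = "yes";
  };
}
```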

But: Even though we might have a sensible selector that works with predictable interface names enabled, we have not yet solved the first problem.

Conclusion

It was more sensible for us to remove networking.useDHCP because we aren't sure how to implement a correct solution in networkd via either config or code. Moreover, though our current implementation with dhcpcd is working well for most cases, it is also a source of trouble for others, and it has bugs. And it is enabled by default!

@bjornfor Does this explain our rationale in a way that makes sense to you? What do you think?

@globin
Member

globin commented Jan 21, 2020

Closing, as there has been no further reaction and I think @fpletz's comment is an adequate answer to the issue. Feel free to reopen if there are further questions!

@globin globin closed this as completed Jan 21, 2020
@bjornfor
Contributor Author

@globin: My lack of response was mostly due to lack of time, not because I think this issue is not relevant anymore. In fact, I don't have a lot of time now either, so sorry for being brief.

@fpletz: Thank you for the detailed post. Here is my response, as an end user who doesn't know all the details:

It was more sensible for us to remove networking.useDHCP because we aren't sure how to implement a correct solution in networkd via either config or code.

That sounds like a perfectly good reason for why things are like they are with networkd, but IMHO not so much for deprecating networking.useDHCP. It sounds like there are issues with the networkd integration in NixOS (and some upstream projects?), not the idea of networking.useDHCP itself, and that networkd is not ready yet to be the default NixOS networking backend.

The move to networkd feels kind of rushed, ref. this issue and #73595.

@bjornfor bjornfor reopened this Jan 22, 2020
@fpletz fpletz removed this from the 20.03 milestone Jan 23, 2020
@fpletz
Member

fpletz commented Jan 23, 2020

@bjornfor Sorry that I didn't make my point clear enough and that I was stressing the move to networkd too much.

networking.useDHCP should be removed because it's currently

  • buggy (see the edge cases I described)
  • does not do what its documentation states ("Whether to use DHCP to obtain an IP address and other configuration for all network interfaces that are not manually configured.") because the dhcpcd blacklist will still be applied silently.

If you still disagree about the removal, please come up with a sensible implementation instead. We can then also use that logic with networkd.

@bjornfor
Contributor Author

networking.useDHCP should be removed because it's currently

  • buggy (see the edge cases I described)

I only saw bugs / edge cases mentioned for the combination of networkd and networking.useDHCP. For networking.useDHCP alone, what's the problem?

  • does not do what its documentation states ("Whether to use DHCP to obtain an IP address and other configuration for all network interfaces that are not manually configured.") because the dhcpcd blacklist will still be applied silently.

Do you mean the networking.dhcpcd.denyInterfaces option + hardcoded list of ignored interfaces from nixos/modules/services/networking/dhcpcd.nix (lo peth* vif* tap* tun* virbr* vnet* vboxnet* sit*)? I guess I always assumed the option was about hardware interfaces, so I don't feel bad when now seeing that list of blacklisted interfaces. We can add in the word "hardware" before "network interfaces" too, to make the docstring more accurate. Does the current implementation cause any problems?

@edolstra
Member

I don't see a reason to remove networking.useDHCP. It's an "abstract" option not tied to any particular implementation. Whether it enables dhcpcd or systemd's DHCP client is an implementation detail.

@bjornfor
Contributor Author

When #73595 gets fixed, I guess the plan is to run nixos-generate-config when adding/removing network interfaces? (Well, not my plan, but it seems we're heading that way.)

I tried nixos-generate-config on my machine and got this:

  networking.useDHCP = false;
  networking.interfaces.docker0.useDHCP = true;   # wrong
  networking.interfaces.enp2s0.useDHCP = true;
  networking.interfaces.tun0.useDHCP = true;      # wrong
  networking.interfaces.vboxnet0.useDHCP = true;  # wrong
  networking.interfaces.wlp3s0.useDHCP = true;

So the thinking is that networking.useDHCP should be removed because it has a (hidden) blacklist, whereas without a blacklist you get behaviour like the above? I don't think that's an improvement.
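Hand-edited, the generated fragment would shrink to something like the following. Which interfaces count as real uplinks is taken purely from the "# wrong" annotations above; this is a sketch of the manual cleanup, not output of nixos-generate-config.

```nix
{
  networking.useDHCP = false;

  # Keep DHCP only on the physical Ethernet and Wi-Fi interfaces; the
  # docker0, tun0 and vboxnet0 entries are dropped because those interfaces
  # are configured by other programs.
  networking.interfaces.enp2s0.useDHCP = true;
  networking.interfaces.wlp3s0.useDHCP = true;
}
```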

@davidak
Member

davidak commented Apr 9, 2020

If there is no carrier or no DHCP response, the interfaces will stay in the configuring state and will delay network-online.target until either all interfaces are configured or a timeout is reached. Then network-online.target fails and thus all services depending on it, even if one link has a configured and working connection.

@fpletz isn't the normal behavior that the system tries to get an IP via DHCP and, when it doesn't get one, assigns itself a link-local address?

Link local addresses allow machines to automatically have an IP address on a network if they haven't been manually configured or automatically configured by a special server on the network (DHCP). Before an address is chosen from that range, the machine sends out a special message (using ARP which stands for address resolution protocol) to the machines on the network around it (assuming that they also haven't been assigned an address manually or automatically) to find out if 169.254.1.1 is free. If it is, then the machine assigns that address to its network card. If that address is already in use by another machine on the same network, then it tries the next IP 169.254.1.2 and so on, until it finds a free address.

Source: https://serverfault.com/a/118329

So, can we get that behavior with networkd?
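As a starting point for experiments, networkd has a LinkLocalAddressing= setting in the [Network] section. The sketch below enables it next to DHCP, with an illustrative unit name and interface pattern. Whether the link-local address is assigned alongside DHCP or only after DHCP fails, and whether that is enough for network-online.target, is not established here and would need checking against the systemd version in use.

```nix
{
  systemd.network.enable = true;

  systemd.network.networks."40-dhcp-uplinks" = {
    matchConfig.Name = "en*";
    networkConfig = {
      DHCP = "yes";
      # Enable IPv4 and IPv6 link-local autoconfiguration on matching links.
      LinkLocalAddressing = "yes";
    };
  };
}
```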

I'm always for sane defaults. Do what the user expects. So we can implement a blacklist logic for interfaces that are configured automatically by other programs, like docker0, tun0, vboxnet0.

Could someone ask the systemd developers whether the features we need are supported now, or whether they will ever be implemented? With that information, we can make an informed decision on how to proceed here to finish the release.

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/networking-usenetworkd-and-usedhcp/4352/2

bjornfor referenced this issue Apr 29, 2020
…s to bridges by default

This is a backward-incompatible change from upstream dhcpcd [0], as
this could have easily locked me out of my box.

As dhcpcd doesn't allow to use only a blacklist (denyinterfaces in
dhcpcd.conf) of devices and use all remaining devices, while explicitly
allowing some interfaces like bridges, I think the best option would be
to not change anything about it and just educate the users here about
that edge case and how to solve it.

[0] https://roy.marples.name/archives/dhcpcd-discuss/0002621.html

(cherry picked from commit eeeb2bf)
@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/networking-usenetworkd-and-usedhcp/4352/3

@Ericson2314
Member

@fpletz When you mention network manager, are you saying that the global useDHCP = true isn't needed when network manager is used?

@nh2
Contributor

nh2 commented Jan 14, 2021

Another possible issue to consider:

#109389 (comment) (Using Docker on AWS EC2 breaks EC2 metadata route because of DHCP)

@veprbl veprbl removed this from the 20.03 milestone May 31, 2021
@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/persistent-network-interfaces-nixos-and-usb-wifi-dongles-how-can-i-get-all-three-to-play-nicely/13587/3

@KizzyCode

KizzyCode commented Feb 2, 2022

There are two problems for me with deprecating (and subsequently removing) networking.useDHCP:

  • networkd does not work within initrd which is required for remote unlocking of LUKS/ZFS (at least I was unable to make it work using the builtin options 😅)
  • I've had bad experiences with DHCP-auto-configuration for newly added network interfaces (e.g. if you have a headless server and replace the network card).

IMO there is no full replacement for networking.useDHCP yet, so I also disagree with its deprecation (even if I understand most of the reasons).
Maybe as a compromise: Set the default value for networking.useDHCP to false and add a warning about oddments and quirks, especially when used with networkd?

Specifically, I think the networking configuration should rather be a conscious decision by the user.

Well I see the point but I don't fully agree – the basic idea behind DHCP is zero-config, so it should be possible to fully opt-in to auto-configuration for all physical interfaces. Then, if I add a new ethernet-card/WiFi-dongle, it should have full auto-configuration (even within initrd if networking is enabled there). And if I remove/unplug the interfaces, they should be "deconfigured" automatically.

AFAIK this does not yet work reliably with networkd (see all the complaints about USB-ethernet or WiFi dongles not connecting out-of-box or unplugged dongles blocking network-dependent services on boot).

@bjornfor
Contributor Author

Looks like this issue is about to be solved: #167327

@bjornfor
Contributor Author

bjornfor commented May 7, 2022

Thank you, @lheckemann! 🎉
