Boot problems on Debian #30

Open
vlada-dudr opened this issue Sep 22, 2017 · 22 comments
Comments

@vlada-dudr

vlada-dudr commented Sep 22, 2017

Hello,

using ifupdown2 on Debian (Stretch and Jessie), I needed to add an override to the provided systemd service in order to pull in network.target. This is normally done by Debian's default networking service, and nothing else does it. As a result, some services (like ssh) crash on binding: they are ordered After= network.target (not Wants=), network.target is never started, and so they are started before the network is available.

Also, the startup script doesn't wait for DAD (IPv6 duplicate address detection): systemd starts services that depend on usable interfaces right away, and they fail to bind because the addresses are still in the tentative state. For now I solved it the dirty way by adding ExecStartPost=/bin/sleep 2 to my overrides, but I would love to see a more elegant option. Maybe adding an interface option that makes ifup wait for DAD could do the trick.

PS. I basically modeled the drop-in on Debian's default networking.service file; I can open a PR if you want.
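
The override described in this report could look roughly like the sketch below. It is not the reporter's actual file: the file name pull-network-target.conf is a hypothetical choice, and the Before=network.target line is an assumption that mirrors the ordering in Debian's stock ifupdown unit.

```shell
#!/bin/sh
# Write a drop-in for networking.service into the directory given as $1.
# The file name is hypothetical; any *.conf name in the drop-in dir works.
write_dropin() {
    dir="$1"
    mkdir -p "$dir" || return 1
    cat > "$dir/pull-network-target.conf" <<'EOF'
[Unit]
# Pull network.target into the boot transaction (assumed to match
# what Debian's stock networking.service does).
Wants=network.target
Before=network.target

[Service]
# Crude workaround from the report: give DAD time to finish.
ExecStartPost=/bin/sleep 2
EOF
}

# Usage (as root), followed by a daemon reload:
#   write_dropin /etc/systemd/system/networking.service.d
#   systemctl daemon-reload
```

The sleep only hides the race with DAD; it does not remove it.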

@daveolson53
Contributor

The problem here is that network.target isn't well defined in what it means. In systemd v213, recognizing this, systemd made /etc/init.d LSB services that depended on $network depend on network-online.target, rather than network.target.
I'm having trouble finding the reference, but it is a known issue that 'network.target' is not made active on debian jessie (systemd v215), and that therefore anything depending on network.target will start earlier than is desired.
In general, what most people want is for the service that configures networking to have completed. For ifupdown2 systems, this is networking.service, so you can either depend on networking.service (from ifupdown2) or on network-online.target.
You can either edit the problematic unit files, or you can add extensions in /etc/systemd/system/SERVICENAME.service.d/netoverride.conf with something similar to:

[Unit]
After=network-online.target

You can check if my theory is correct by running
systemctl status network.target

If it shows "inactive", then it was never reached, and systemd will not block any service that has
After=network.target
because systemd "knows" that network.target can't become active (and systemd will not block on dependencies that can't be reached with "After", only with "Requisite" or "Requires").

I don't think this is fixable in ifupdown2 itself.

Also see:
https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/
for a long discussion on network.target, and that it's described as primarily for shutdown, not boot up.
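
The status check above can be extended to all three network-related targets. A small sketch, assuming a systemd host; the function name and the SYSTEMCTL override (present only so the logic can be exercised without systemd) are illustrative.

```shell
#!/bin/sh
# Report whether each network-related target is currently active.
# SYSTEMCTL can be overridden for dry runs; it defaults to systemctl.
report_targets() {
    for t in network-pre.target network.target network-online.target; do
        if "${SYSTEMCTL:-systemctl}" is-active --quiet "$t"; then
            echo "$t: active"
        else
            echo "$t: not active"
        fi
    done
}

# Usage:
#   report_targets
```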

@daveolson53
Contributor

Oh, and for what it's worth, the Cumulus Linux distribution changed almost all of the network-dependent services to depend on network-online.target.
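
A per-service override of that kind could be generated as sketched below. This is an illustration, not what Cumulus Linux ships: the function name and the optional second argument (an alternate root directory, handy for a dry run) are assumptions. Wants= is included alongside After= because the systemd NetworkTarget page recommends both for services that need a configured network.

```shell
#!/bin/sh
# write_online_override SERVICE [ROOT]
# Create ROOT/SERVICE.service.d/netoverride.conf so that SERVICE is
# ordered after, and pulls in, network-online.target.
write_online_override() {
    service="$1"
    root="${2:-/etc/systemd/system}"
    dir="$root/$service.service.d"
    mkdir -p "$dir" || return 1
    cat > "$dir/netoverride.conf" <<'EOF'
[Unit]
Wants=network-online.target
After=network-online.target
EOF
}

# Usage (as root), followed by a daemon reload:
#   write_online_override ssh
#   systemctl daemon-reload
```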

@vlada-dudr
Author

As I wrote, I managed to solve the issue by making networking.service pull in network.target, which seems to be the one Debian services usually depend on. It is also more effective to pull network.target in, because then it is not necessary to fix other units that depend on it rather than on network-online.target. What you said means I should probably file a bug report with the Debian packaging team to make ifupdown's and ifupdown2's networking services consistent.

What do you think about the tentative addresses issue?

@daveolson53
Contributor

daveolson53 commented Sep 24, 2017

What, exactly, do you mean by "pull in network.target"? From everything that I can think of, or find in the systemd docs, or turn up in several hours of googling, I'm unable to find anything I can do with or to networking.service that affects network.target at all. I've tried Wants, WantedBy, After (and Before), with both network.target and network-pre.target (and others as well).

Debian jessie, both systemd v215, and the v230 backports.

On your "tentative state" question, I don't know what that means to you. I do recommend that, for anything that needs networking to actually be functional, you use network-online.target (as does the systemd discussion that I mentioned).

@julienfortin
Contributor

julienfortin commented Jan 22, 2018

Fix from @vlada-dudr

I managed to solve the issue by making networking.service pull in network.target,
which seems to be the one Debian services usually depend on.
It is also more effective to pull network.target in, because then it is not necessary to fix other units that depend on it rather than on network-online.target.

@daveolson53
Contributor

daveolson53 commented Jan 22, 2018 via email

@julienfortin
Contributor

julienfortin commented Jan 22, 2018 via email

@vlada-dudr
Author

Hi,

I found a solution by creating a drop-in for the systemd service which:

  1. has to want network.target
  2. sleeps 3 seconds after starting ifupdown2

Sleeping is necessary because ifupdown2 doesn't wait for DAD; without it, services binding to IPv6 addresses fail and die with "cannot bind to socket".

The problem with Debian is that it is not very consistent about what depends on network.target and what depends on network-online.target. Some services don't have such dependencies at all (like munin-node, which crashed without a correct drop-in). There is still going to be a problem with tentative IPv6 addresses, as network-online.target can be reached too early. I think there are two possible solutions:
either make ifup synchronously wait until all addresses are usable, or provide a way to signal systemd that it can proceed with the other networking targets.
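
The first option, waiting synchronously, can be approximated outside ifup by polling for tentative addresses instead of sleeping for a fixed time. A minimal sketch, assuming iproute2's ip command is available; the function names and the IP_CMD override (there only so the loop can be exercised without touching real interfaces) are illustrative, and this is not how ifupdown or ifupdown2 implement it.

```shell
#!/bin/sh
# Print IPv6 addresses that are still tentative (DAD not finished).
# IP_CMD can be overridden for dry runs; it defaults to ip.
list_tentative() {
    "${IP_CMD:-ip}" -6 -o addr show tentative
}

# wait_for_dad [TIMEOUT_SECONDS]
# Return 0 once no address is tentative, 1 if the timeout expires first.
# An address whose DAD failed may stay flagged tentative, hence the timeout.
wait_for_dad() {
    remaining="${1:-10}"
    while [ -n "$(list_tentative)" ]; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Usage:
#   wait_for_dad 10 || echo "DAD still pending after 10s" >&2
```

Unlike a fixed sleep, this returns as soon as the addresses are usable and fails visibly when they are not.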

@roopa-prabhu roopa-prabhu reopened this Jan 24, 2018
@roopa-prabhu
Contributor

Thanks for the details. We are looking to see if we can detect the debian version and create the override ourselves during ifupdown2 install.

We are also looking at the DAD issue: what you suggest seem to be good options. We can make it conditional on whether DAD is enabled on an interface. We are also exploring other options.

@daveolson53
Contributor

Vlada-dudr, do you mean "WantedBy = network.target", or "Wants = network.target"? I'm assuming you mean WantedBy, since Wants has the wrong semantics.

Sleeping after starting a service is OK as a short-term workaround, but it's not a viable solution; it will break randomly. We'll have to figure out something in ifupdown2 or some related service.

Basically, DAD working isn't really part of networking startup, just as DHCP on ethernet isn't part of it.

That is, both are part of network-online.target; that is, that the network is (at least minimally) usable, as opposed to configured.

Since network-online.target has to be "after" network.target, we can't just make the dependency work the other way; we'll have to add another oneshot service or equivalent that is After=networking.service and WantedBy=network.target and that waits for DHCP (if configured) and DAD (if configured). That is, we end up defining network.target to be functionally the same as network-online.target. That's rather ugly, but if the dependencies are wrong, I think it's about all that we can do.

@vlada-dudr
Author

vlada-dudr commented Jan 24, 2018

It is Wants=, because by default it is networking.service from ifupdown (the default one) that pulls in network.target; just check Debian's default networking.service file. So the Debian dependencies go like this:

networking.service is WantedBy=multi-user.target and Wants=network.target

This means that enabling networking.service gets the whole thing running. With ifupdown2, network.target is skipped, so services that have WantedBy=network.target don't start. Services that have something like After=network.target and WantedBy=multi-user.target (a common setup in Debian) just crash because the network is not ready: systemd has no reason to order them correctly.

Using DHCP means that you should not rely on which address you get, so it is not wise to bind to the IP address you expect to receive. DAD, on the other hand, is required by IPv6 and is by default performed every time an address is assigned. That makes it part of address assignment, and the configuration tool should wait for it.

Put simply, when I tell a network configuration tool "configure my network", I expect a usable network afterwards. When it runs during boot, I expect it to have a mechanism that keeps the boot process predictable. Right now, setting an IPv6 address is not predictable, because ifupdown2 exits successfully before the network is usable and has no mechanism to announce network readiness.

The best solution from my point of view (and, I guess, the simplest to implement) is:

  1. make ifup wait until DHCP and DAD finish
  2. make ifupdown2's networking.service file more or less the same as Debian's default one

Doing 1) makes things easy: I can trust ifup to exit only when I can use my network. For 2), I should maybe file a bug with the Debian package maintainers, as I believe it is their problem more than yours (unless you maintain the packages too).

@daveolson53
Contributor

OK, thanks for letting us know what worked for you.

In my opinion, we can't have Wants=network.target for networking.service, because networking.service is what provides the network.target, and that creates a circular dependency.

I understand why you say that you want everything working when networking.service completes, but that's not the way the systemd team defined things (that's why they have a separate network-online.target).

It's also not what happens with init.d or upstart-based /etc/init.d/network. When that script completes, eth0 (or whatever) is typically not yet up and running, if it uses dhcp (or wifi). I suspect, but don't know for sure, that the same is true for DAD, for the same reasons; it requires action from outside the system that is booting up (or that is having networking restarted).

So while 1 may be desirable, and at some level "possible", it's a significant change from the past, and from the systemd design.

Trying to change all of that longstanding history and design in ifupdown2 is not something that I think is a good idea.

No, the maintainer of ifupdown2 is not a maintainer for debian systemd or other networking-related packages. You should submit a debian bug if you want to change the behavior, but you'll need to have pretty strong reasons to change historical practice, as part of your bug, if it's going to be acted upon.

@vlada-dudr
Author

In my opinion, we can't have Wants=network.target for networking.service, because networking.service is what provides the network.target, and that creates a circular dependency.

Not really, because the only thing that starts network.target is networking.service, as I posted above.

I understand why you say that you want everything working when networking.service completes, but that's not the way the systemd team defined things (that's why they have a separate network-online.target).

Debian seems to have a different understanding of network.service and network-online.target, completely ignoring the latter in its config files.

It's also not what happens with init.d or upstart-based /etc/init.d/network. When that script completes, eth0 (or whatever) is typically not yet up and running, if it uses dhcp (or wifi). I suspect, but don't know for sure, that the same is true for DAD, for the same reasons; it requires action from outside the system that is booting up (or that is having networking restarted).

Just try running ifup from the original ifupdown: you will see that it waits for DAD, and I believe for DHCP as well (but I am not sure at the moment).

So while 1 may be desirable, and at some level "possible", it's a significant change from the past, and from the systemd design.

The systemd way would be: networking.service runs and starts network.target and a "notifier.service"; then, when the network is ready, notifier.service starts network-online.target. As Debian seems to ignore network-online.target, synchronous operation is needed. Alternatively, the notifier could start both targets. Maybe one could use D-Bus or some other mechanism to start network-online.target, but I am not skilled enough in systemd to say.

Trying to change all of that longstanding history and design in ifupdown2 is not something that I think is a good idea.

At the moment, ifupdown2 is not able to tell the init system when the network is ready, which makes services that depend on this crash.

No, the maintainer of ifupdown2 is not a maintainer for debian systemd or other networking-related packages. You should submit a debian bug if you want to change the behavior, but you'll need to have pretty strong reasons to change historical practice, as part of your bug, if it's going to be acted upon.

The systemd service file is part of the ifupdown2 package, so I meant the ifupdown2 package maintainers. The point is that, at the moment, ifupdown2 is not a drop-in replacement for ifupdown, as it behaves differently on boot.

@vlada-dudr
Author

By the way, this is the reason why NetworkManager supplies NetworkManager-wait-online.service...
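
For illustration, a helper in the same spirit as NetworkManager-wait-online.service might look like the unit printed below. This is only a sketch: the unit layout, the /usr/local/sbin/wait-for-dad script, and the 10-second timeout are hypothetical, and nothing like it is shipped by ifupdown2.

```shell
#!/bin/sh
# Print a oneshot unit that holds back network-online.target until a
# (hypothetical) wait-for-dad script has returned.
print_wait_unit() {
    cat <<'EOF'
[Unit]
Description=Wait for IPv6 DAD to complete (hypothetical helper)
After=networking.service
Before=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/wait-for-dad 10
RemainAfterExit=yes

[Install]
WantedBy=network-online.target
EOF
}

# Usage (as root); the unit file name is also a hypothetical choice:
#   print_wait_unit > /etc/systemd/system/ifupdown2-wait-online.service
#   systemctl daemon-reload && systemctl enable ifupdown2-wait-online.service
```

Only services that order themselves after network-online.target would be delayed by such a unit; those that use After=network.target would still start early.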

@vlada-dudr
Author

vlada-dudr commented Jan 25, 2018

A quick check shows that both the systemd-networkd and NetworkManager services have Wants=network.target (on Arch).

@daveolson53
Contributor

You have a misunderstanding of systemd and services vs targets, I think. Targets don't start anything. They are milestones, progress points.

network.target is not "started", by anything. It's a progress report that a certain point in the boot sequence has been reached.

NetworkManager Wants=network.target because it wants networking to have started prior to NetworkManager.

On your other points, running 'ifup' is not the same as /etc/init.d/networking (old style ifupdown) at boot.

We agree that you are seeing differences between ifupdown and ifupdown2, and are trying to figure out how best to solve those issues for all the distros where it can be used. I'm not arguing about that, I'm just trying to clarify the issues and environment that we are dealing with.

And yes, of course the networking.service unit file is part of ifupdown2, so it's maintained by the same folks.

NetworkManager and its wait-online service is one answer, but it can't be the only answer, because not everybody runs NetworkManager.

NetworkManager-wait-online.service is not waiting for networking to be up and active and all ports/NICs to be usable. It's waiting for NetworkManager to be usable (and it gives up after 30 seconds in the standard ubuntu 16.04 configuration). When everything comes up quickly, that can be almost the same thing as you want, but it's not guaranteed. I don't see a way to "guarantee" what you want.

For example, do you block boot forever if you have a cabled ethernet connection and it's not connected, or the other end is down? No, you don't want that, for most use cases, I think we can all agree.

@vlada-dudr
Author

Sorry, I just didn't find a good word to describe a target being pulled in and scheduled to be reached; next time I will just use "pull in".

NetworkManager pulls network.target in because it signals that the network configuration tool has started. For services that don't bind to specific addresses (or are able to retry later), it is enough to start after this point. For the others, network-online.target is the point after which they should start. It is very important to say that allowing network-online.target to be reached without the network being online is wrong.

NetworkManager-wait-online.service waits until all interfaces report either success or failure; that is how I understand the nm-online man page.

Of course, I expect timeouts, even though for some servers blocking forever is the same thing as a broken service.

@daveolson53
Contributor

NetworkManager does not "pull in" network.target. That's not what Wants does. But we are arguing over fine semantic details.

At least in ubuntu 16, the NetworkManager-wait-online.service does not wait for all ports. It waits for dynamic addressing on ipv4, which isn't quite the same thing. Again, a fine semantic difference, and also not relevant for ifupdown2.

The only thing that I can see us doing here, is to be somewhat equivalent to what ifupdown v1 did.

For both ubuntu 16 and debian stretch (ifupdown v1), the start action is 'ifup -q', and they added
WantedBy=network-online.target
to networking.service in the Install section, and
Wants=network.target
in the Unit section. The latter is what you are arguing for, Vlada-dudr, but I still believe that is incorrect from the systemd perspective.

In any case, we've agreed, we'll do our best to make this work correctly, but I don't think we can guarantee the "working interface" part. It's going to be a best effort, which means it is going to fail at least some of the time, probably with an explicit timeout value, so it can be tweaked.

@vlada-dudr
Author

From systemd.unit man page:

Wants=
A weaker version of Requires=. Units listed in this option will be started if the configuring unit is. However, if the listed units fail to start or cannot be added to the transaction, this has no impact on the validity of the transaction as a whole. This is the recommended way to hook start-up of one unit to the start-up of another unit.

From https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/ :

Note that network.target is a passive unit: you cannot start it directly and it is not pulled in by any services that want to make use of the network. Instead, it is pulled in by the network management service itself. Services using the network should hence simply place an After=network.target dependency in their unit files, and avoid any Wants=network.target or even Requires=network.target.

@vlada-dudr
Author

vlada-dudr commented Jan 26, 2018

me@archlinux~ $ systemctl cat NetworkManager.service
# /usr/lib/systemd/system/NetworkManager.service
[Unit]
Description=Network Manager
Documentation=man:NetworkManager(8)
Wants=network.target
After=network-pre.target dbus.service
Before=network.target

The same is found in the Ubuntu 16.04 package.

@vlada-dudr
Author

vlada-dudr commented Jan 26, 2018

Good! We got somewhere.

The simple thing is to make the unit files behave the same. That is very important.

Well, when the network breaks, it is expected that services and configuration will break as well. A timeout is what I actually expect.

@kokel
Contributor

kokel commented Feb 6, 2018

Hello, I have had problems, too. Some services that rely on interfaces being up at boot didn't come up.

In Debian Stretch, ifupdown v1 solved this with the following unit:

[Unit]
Description=Raise network interfaces
Documentation=man:interfaces(5)
DefaultDependencies=no
Wants=network.target
After=local-fs.target network-pre.target apparmor.service systemd-sysctl.service systemd-modules-load.service
Before=network.target shutdown.target network-online.target
Conflicts=shutdown.target

[Install]
WantedBy=multi-user.target
WantedBy=network-online.target

[Service]
Type=oneshot
EnvironmentFile=-/etc/default/networking
ExecStartPre=-/bin/sh -c '[ "$CONFIGURE_INTERFACES" != "no" ] && [ -n "$(ifquery --read-environment --list --exclude=lo)" ] && udevadm settle'
ExecStart=/sbin/ifup -a --read-environment
ExecStop=/sbin/ifdown -a --read-environment --exclude=lo
RemainAfterExit=true
TimeoutStartSec=5min

5 participants