This repository has been archived by the owner on Oct 16, 2020. It is now read-only.

Unit with After=network-online.target starts up before an IP is on the interface #1966

Closed
MohammadKarimi23 opened this issue May 14, 2017 · 4 comments

Comments

@MohammadKarimi23

Bug

Container Linux Version

NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1298.7.0
VERSION_ID=1298.7.0
BUILD_ID=2017-03-31-0215
PRETTY_NAME="Container Linux by CoreOS 1298.7.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"

Environment

HP gen9 server

Bug Description

I'm using CoreOS matchbox to network boot and provision my private cloud.
matchbox uses systemd units to boot the system, and I'm following the matchbox examples at https://github.com/coreos/matchbox/blob/v0.6.0/examples/ignition/install-reboot.yaml

installer.service contains Requires=network-online.target and After=network-online.target, but the service fails to curl the Ignition file needed by the installer and returns a 7/failed_to_connect error. When I curl the file manually after boot, the curl succeeds.

@crawford
Contributor

The underlying issue is that network-online.target isn't a silver bullet for detecting if the network is up. It's intended to be used with legacy applications that don't handle network changes properly. This example should be smart enough to retry the network operation and therefore drop the network online dependency.
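As a rough illustration of the retry approach described above, a bounded retry wrapper in POSIX shell might look like the following. This is only a sketch: the function name retry, the attempt count, the delay, and the URL and output path in the commented usage line are all hypothetical and are not taken from the matchbox example.

```shell
#!/bin/sh
# retry MAX DELAY CMD...
# Hypothetical helper: runs CMD until it succeeds or MAX attempts have been made,
# sleeping DELAY seconds between attempts. Returns 0 on success, 1 on giving up.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}

# Example usage (URL and output path are placeholders):
# retry 30 1 curl --fail --silent --show-error -o /tmp/ignition.json http://matchbox.example.com/ignition
```

Unlike an unbounded loop, this gives up after a fixed number of attempts, so a unit using it fails visibly instead of hanging forever when the network never comes up.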

@MohammadKarimi23
Author

I actually solved the problem by using an until ping loop as an 'ExecStartPre', like below:

[Service]
        ExecStartPre=/bin/sh -c 'until ping -c1 google.com; do sleep 1; done;'
        ExecStartPre=/usr/bin/mkdir -p /etc/kubernetes/ssl
        ExecStart=/usr/bin/bash -c "[ -f /etc/kubernetes/ssl/%i ] ||  curl {{.k8s_cert_endpoint}}/tls/%i -o /etc/kubernetes/ssl/%i"

but I would appreciate it if you could show me a way to do that using systemd units

@crawford
Contributor

That seems like a fine solution to me.

@euank
Contributor

euank commented Jun 30, 2017

I opened poseidon/matchbox#596 to hopefully fix this. I think discussion can be moved there.

@euank euank closed this as completed Jun 30, 2017
kthommandra added a commit to aristanetworks/telegraf that referenced this issue Aug 30, 2018
When telegraf starts with localhost.localdomain as the hostname,
it sticks with that hostname even after the hostname changes later on.
This happens mostly during server startup. Posting points with
localhost.localdomain messes up a lot of monitoring aspects.

On systemd-networkd based systems, the following link
explains how to make a systemd service start after the interface
is fully set up (see
https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/).
However, repeated experiments confirmed that those steps
do not work reliably. There are reports from other users about
network-online.target not behaving as expected. See
https://community.getchannels.com/t/wait-for-networking-on-reboot-ubuntu-network-online-target/936

So we are using a simple yet guaranteed workaround suggested in
coreos/bugs#1966

Repeated testing on ce114 confirmed that this method is completely reliable.

This issue is avoided on dhclient based systems since we rely on the
dhclient helper to delay starting telegraf. However, we could use the
same change on dhclient based systems too if needed.