installation (ignition) has a race with systemd-networkd, starts too early #2532

Open
DirkTheDaring opened this Issue Dec 9, 2018 · 3 comments

DirkTheDaring commented Dec 9, 2018

Issue Report

Ignition cannot load its configuration from a matchbox service because it races with network device setup.
The race is visible in the log dump below. This setup uses slower, older machines running under VMware.

Bug

Dec 09 13:52:54 localhost kernel: audit: type=1130 audit(1544363573.877:7): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-networkd comm="systemd" exe="/usr/lib64/systemd/systemd" hostname=? addr=? terminal=? res=success'
Dec 09 13:52:53 localhost audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-networkd comm="systemd" exe="/usr/lib64/systemd/systemd" hostname=? addr=? terminal=? res=success'
Dec 09 13:52:53 localhost systemd-networkd[230]: Enumeration completed
Dec 09 13:52:54 localhost ignition[266]: GET error: Get https://matchbox.fritz.box/ignition?uuid=efbb4d56-4198-4819-ba75-8e059437150f&mac=00-0c-29-37-15-0f&spin=worker1&channel=stable&arch=amd64-usr&flavor=kubespray&domain=fritz.box&stage=boot: dial tcp: lookup matchbox.fritz.box on 192.168.178.250:53: dial udp 192.168.178.250:53: connect: network is unreachable
Dec 09 13:52:54 localhost ignition[266]: GET error: Get https://matchbox.fritz.box/ignition?uuid=efbb4d56-4198-4819-ba75-8e059437150f&mac=00-0c-29-37-15-0f&spin=worker1&channel=stable&arch=amd64-usr&flavor=kubespray&domain=fritz.box&stage=boot: dial tcp: lookup matchbox.fritz.box on 192.168.178.250:53: dial udp 192.168.178.250:53: connect: network is unreachable
Dec 09 13:52:54 localhost ignition[266]: GET error: Get https://matchbox.fritz.box/ignition?uuid=efbb4d56-4198-4819-ba75-8e059437150f&mac=00-0c-29-37-15-0f&spin=worker1&channel=stable&arch=amd64-usr&flavor=kubespray&domain=fritz.box&stage=boot: dial tcp: lookup matchbox.fritz.box on 192.168.178.250:53: dial udp 192.168.178.250:53: connect: network is unreachable
Dec 09 13:52:54 localhost systemd-networkd[230]: lo: Configured
Dec 09 13:52:55 localhost systemd-networkd[230]: eth0: IPv6 successfully enabled
Dec 09 13:52:55 localhost ignition[266]: GET error: Get https://matchbox.fritz.box/ignition?uuid=efbb4d56-4198-4819-ba75-8e059437150f&mac=00-0c-29-37-15-0f&spin=worker1&channel=stable&arch=amd64-usr&flavor=kubespray&domain=fritz.box&stage=boot: dial tcp: lookup matchbox.fritz.box on 192.168.178.250:53: dial udp 192.168.178.250:53: connect: network is unreachable
Dec 09 13:52:57 localhost ignition[266]: GET error: Get https://matchbox.fritz.box/ignition?uuid=efbb4d56-4198-4819-ba75-8e059437150f&mac=00-0c-29-37-15-0f&spin=worker1&channel=stable&arch=amd64-usr&flavor=kubespray&domain=fritz.box&stage=boot: dial tcp: lookup matchbox.fritz.box on 192.168.178.250:53: dial udp 192.168.178.250:53: connect: network is unreachable
Dec 09 13:52:57 localhost systemd-networkd[230]: eth0: Gained carrier
Dec 09 13:52:57 localhost systemd-networkd[230]: eth0: DHCPv4 address 192.168.178.49/24 via 192.168.178.1
Dec 09 13:52:58 localhost systemd-networkd[230]: eth0: Gained IPv6LL
Dec 09 13:53:10 localhost systemd-networkd[230]: eth0: Configured
Dec 09 13:53:54 localhost systemd-networkd[230]: lo: Lost carrier
Dec 09 13:53:54 localhost systemd-networkd[230]: eth0: Lost carrier

Container Linux Version

VERSION=1911.4.0

Environment

What hardware/cloud provider/hypervisor is being used to run Container Linux?
VMware
2 GB
IPv4 network

Expected Behavior

Ignition loads the config from coreos.config.url as before and the machine is installed.

Actual Behavior

The install fails: the network devices come up a few seconds after Ignition starts, so Ignition's attempts to load the config from the URL fail.

Reproduction Steps

  1. Get a 2 GB VMware VM and try to PXE-boot CoreOS with a coreos.config.url pointing to a service
  2. ...

Other Information

dm0- (Member) commented Dec 9, 2018

There is a suggested workaround in #2527 for config timeouts.

DirkTheDaring commented Dec 9, 2018

This "workaround" left me more confused. Is there an example or a step-by-step description of how to apply it?

lucab (Member) commented Dec 9, 2018

@DirkTheDaring can you please attach the subsequent log entries (i.e. for the whole boot)? In the default configuration, Ignition keeps retrying until it is able to fetch the userdata. Racing with network stabilization is expected, but a boot failure is likely due to some other problem.

lucab added a commit to lucab/ignition that referenced this issue Dec 10, 2018

internal/exec: increase default config fetch timeout
This bumps default config timeout to 2 minutes in order to be more
resilient in environments where network stabilization may take a long
time.

Ref: coreos/bugs#2527
Ref: coreos/bugs#2532