multiple OEM-GCE container get started on GCE when using non-Google DNS #2601

HeikoOnnebrink · 2019-07-08T06:28:48Z

Bug

Instead of a single, continuously running oem-gce rkt container, a new oem-gce container instance spins up every 1-2 minutes until the system is running too many of them and eventually runs out of memory.

Container Linux Version

NAME="Container Linux by CoreOS"
ID=coreos
VERSION=2135.5.0
VERSION_ID=2135.5.0
BUILD_ID=2019-07-01-1959
PRETTY_NAME="Container Linux by CoreOS 2135.5.0 (Rhyolite)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"

Environment

We have run CoreOS on Google Cloud VMs for several years. The cloud is connected to our corporate network via a site-to-site VPN.
In our setup we use our corporate nameserver in resolv.conf and added the entry "169.254.169.254 metadata.google.internal" to the hosts file so that the metadata server name can be resolved.
For some time now (apologies, I cannot name the exact CoreOS version in which the problems started) this no longer works and we get the symptoms described above.

Expected Behavior

Only one oem-gce container should be started after boot, and it should stay running.

Actual Behavior

Every 1-2 minutes a new oem-gce rkt container instance spins up and stays running until we hit OOM issues.

UUID            APP     IMAGE NAME                      STATE           CREATED         STARTED         NETWORKS
065952d0        oem-gce coreos.com/oem-gce:2135.5.0     running         6 minutes ago   6 minutes ago
755867b3        oem-gce coreos.com/oem-gce:2135.5.0     running         1 minute ago    1 minute ago
92878e0b        oem-gce coreos.com/oem-gce:2135.5.0     exited garbage  4 days ago      4 days ago
bdb4d4e6        oem-gce coreos.com/oem-gce:2135.5.0     running         4 minutes ago   4 minutes ago
bea7cb1d        oem-gce coreos.com/oem-gce:2135.5.0     running         3 minutes ago   3 minutes ago
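To spot the pile-up early, the listing above can be checked with a short script. This is only a sketch: it assumes the text comes from `rkt list`, that each pod appears on one row, and that the UUID, app, and image columns contain no spaces. The function name `count_running` is made up for illustration.

```python
def count_running(rkt_list_output, app="oem-gce"):
    """Count rows whose APP column equals `app` and whose STATE is 'running'."""
    running = 0
    for line in rkt_list_output.splitlines():
        tokens = line.split()
        # Skip blank lines, short rows, and the header row.
        if len(tokens) < 4 or tokens[0] == "UUID":
            continue
        # Columns: UUID, APP, IMAGE NAME, STATE, ...
        if tokens[1] == app and tokens[3] == "running":
            running += 1
    return running


# Listing copied from this report.
SAMPLE = """\
UUID            APP     IMAGE NAME                      STATE           CREATED         STARTED         NETWORKS
065952d0        oem-gce coreos.com/oem-gce:2135.5.0     running         6 minutes ago   6 minutes ago
755867b3        oem-gce coreos.com/oem-gce:2135.5.0     running         1 minute ago    1 minute ago
92878e0b        oem-gce coreos.com/oem-gce:2135.5.0     exited garbage  4 days ago      4 days ago
bdb4d4e6        oem-gce coreos.com/oem-gce:2135.5.0     running         4 minutes ago   4 minutes ago
bea7cb1d        oem-gce coreos.com/oem-gce:2135.5.0     running         3 minutes ago   3 minutes ago
"""

if __name__ == "__main__":
    print(count_running(SAMPLE))
```

Anything above one running instance matches the symptom described here.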

Reproduction Steps

Deploy a VM with the latest CoreOS image on Google Cloud.
Configure resolv.conf to use a corporate, non-Google DNS server.
Remove any "nameserver 169.254.169.254" entry from resolv.conf.
Add "169.254.169.254 metadata.google.internal" to the hosts file.
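The resulting configuration can be sanity-checked with a small parser for the resolver configuration (`/etc/resolv.conf`) and the hosts file. This is a minimal sketch, not part of any CoreOS tooling: the helper names `nameservers` and `hosts_lookup` are invented here, and `10.0.0.53` stands in for a corporate nameserver whose real address is not given in this report.

```python
def nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf text, in order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        tokens = line.split("#", 1)[0].split()  # drop trailing comments
        if len(tokens) >= 2 and tokens[0] == "nameserver":
            servers.append(tokens[1])
    return servers


def hosts_lookup(hosts_text, name):
    """Return the first address mapped to `name` in hosts-file text, or None."""
    for line in hosts_text.splitlines():
        tokens = line.split("#", 1)[0].split()
        if len(tokens) >= 2 and name in tokens[1:]:
            return tokens[0]
    return None


# Hypothetical file contents matching the steps above.
RESOLV_CONF = "nameserver 10.0.0.53\n"
HOSTS = "127.0.0.1 localhost\n169.254.169.254 metadata.google.internal\n"

if __name__ == "__main__":
    print(nameservers(RESOLV_CONF))
    print(hosts_lookup(HOSTS, "metadata.google.internal"))
```

On a real machine the two strings would be read from `/etc/resolv.conf` and `/etc/hosts`. The reproduction state is reached when the metadata address is absent from the nameserver list but present in the hosts lookup.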

Other Information

The journal contains these log entries, which seem related to the issue:

instance-setup[2400]: ERROR GET request error retrieving metadata. <urlopen error [Errno -2] Name or service not known>.
google-accounts[913]: ERROR GET request error retrieving metadata. <urlopen error [Errno -2] Name or service not known>.
google-networking[915]: ERROR GET request error retrieving metadata. <urlopen error [Errno -2] Name or service not known>.
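The `[Errno -2] Name or service not known` text is what Python reports when a `getaddrinfo` lookup fails, so the same lookup can be tried directly to see whether a given environment can resolve the metadata name. The sketch below is a diagnostic aid only; the function name `resolve` is made up, and it assumes a Python interpreter is available wherever it is run (for example on the host and, separately, inside the container).

```python
import socket


def resolve(hostname):
    """Resolve `hostname` through the system resolver.

    Returns (sorted list of addresses, None) on success,
    or (None, socket.gaierror) when the lookup fails.
    """
    try:
        infos = socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        return None, exc
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos}), None


if __name__ == "__main__":
    addresses, error = resolve("metadata.google.internal")
    if error is not None:
        print("lookup failed:", error)
    else:
        print("resolved to:", addresses)
```

If the lookup succeeds on the host but fails in the environment where the Google daemons run, the hosts file entry is not being honored there.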

As a workaround, we found that the problem disappears once "nameserver 169.254.169.254" is added as the first entry in resolv.conf, before our corporate nameserver.
But this is not a solution, as it disables name resolution for our internal machines.

It looks like something has changed inside the oem-gce container, so that adding the metadata.google.internal entry to the hosts file is no longer sufficient for the container to start properly, even though this configuration worked fine for years.

In older versions 1576 and 1855 the problem did not exist. It even looks like the latest CoreOS version does not show the problem as long as the oem-gce container version is old. We found this on one machine that was deployed a long time ago and has been updated continuously. Apparently the oem-gce container was not updated during these updates.

bgilbert · 2020-05-17T10:00:13Z

This should be fixed in all channels this week.

HeikoOnnebrink mentioned this issue Aug 31, 2019

oem-gce.service crashlooping on version 2191.4.1 #2608

Closed

HeikoOnnebrink mentioned this issue Jan 27, 2020

multiple OEM-GCE container get started on GCE when using non-Google DNS flatcar/Flatcar#14

Closed

pothos mentioned this issue Feb 11, 2020

Fix name resolution in GCE OEM container coreos/coreos-overlay#3879

Merged

bgilbert closed this as completed in coreos/coreos-overlay#3879 May 17, 2020
