This repository has been archived by the owner on Oct 16, 2020. It is now read-only.
Instead of a single, continuously running oem-gce rkt container, we found that a new oem-gce container instance spins up every 1-2 minutes until the system is running too many of them and eventually runs out of memory.
Container Linux Version
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=2135.5.0
VERSION_ID=2135.5.0
BUILD_ID=2019-07-01-1959
PRETTY_NAME="Container Linux by CoreOS 2135.5.0 (Rhyolite)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"
BUG_REPORT_URL="https://issues.coreos.com"
Environment
We have been running CoreOS on Google Cloud VMs for several years. The cloud is connected to our corporate network via a site-to-site VPN.
In our setup, we use our corporate nameserver in resolv.conf and have added 169.254.169.254 metadata.google.internal to the hosts file so that the metadata server name can be resolved.
However, for some time now (apologies that I cannot name the exact CoreOS version in which the problems started), this no longer works and we get the symptoms described above.
Expected Behavior
Only one oem-gce container should be started after boot, and it should stay running.
Actual Behavior
Every 1-2 minutes a new oem-gce rkt container instance spins up and keeps running until we run into OOM issues.
UUID      APP      IMAGE NAME                   STATE           CREATED         STARTED         NETWORKS
065952d0  oem-gce  coreos.com/oem-gce:2135.5.0  running         6 minutes ago   6 minutes ago
755867b3  oem-gce  coreos.com/oem-gce:2135.5.0  running         1 minute ago    1 minute ago
92878e0b  oem-gce  coreos.com/oem-gce:2135.5.0  exited garbage  4 days ago      4 days ago
bdb4d4e6  oem-gce  coreos.com/oem-gce:2135.5.0  running         4 minutes ago   4 minutes ago
bea7cb1d  oem-gce  coreos.com/oem-gce:2135.5.0  running         3 minutes ago   3 minutes ago
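To watch the leak as it happens, a small script can count how many oem-gce pods are in the running state. This is a hypothetical helper, not part of CoreOS or rkt; it assumes the whitespace-separated column layout shown in the listing above (UUID, APP, IMAGE NAME, STATE, ...) and that the listing text is captured separately, for example from `rkt list`.

```python
def count_running_pods(listing: str, app: str = "oem-gce") -> int:
    """Count pods of `app` whose STATE column is 'running'.

    Assumes the layout shown above: UUID, APP, IMAGE NAME, STATE, ...
    separated by whitespace, with a single header line first.
    """
    count = 0
    for line in listing.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        # fields[1] is APP; fields[3] is the first word of STATE
        if len(fields) >= 4 and fields[1] == app and fields[3] == "running":
            count += 1
    return count


# The listing from this issue, used as sample input.
SAMPLE = """\
UUID      APP      IMAGE NAME                   STATE           CREATED         STARTED         NETWORKS
065952d0  oem-gce  coreos.com/oem-gce:2135.5.0  running         6 minutes ago   6 minutes ago
755867b3  oem-gce  coreos.com/oem-gce:2135.5.0  running         1 minute ago    1 minute ago
92878e0b  oem-gce  coreos.com/oem-gce:2135.5.0  exited garbage  4 days ago      4 days ago
bdb4d4e6  oem-gce  coreos.com/oem-gce:2135.5.0  running         4 minutes ago   4 minutes ago
bea7cb1d  oem-gce  coreos.com/oem-gce:2135.5.0  running         3 minutes ago   3 minutes ago
"""

if __name__ == "__main__":
    print(f"running oem-gce pods: {count_running_pods(SAMPLE)}")  # prints: running oem-gce pods: 4
```

On a healthy machine the count should stay at 1; for the listing above it is 4.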
Reproduction Steps
1. Deploy a VM with the latest CoreOS image on Google Cloud.
2. Configure resolv.conf to use a corporate, non-Google DNS server.
3. Remove any nameserver 169.254.169.254 entry from resolv.conf.
4. Add 169.254.169.254 metadata.google.internal to the hosts file.
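Whether a machine matches the DNS setup from the last three reproduction steps can be checked with a short script. This is a hypothetical sketch that works on file contents passed in as strings (read them from /etc/resolv.conf and /etc/hosts); the address 10.0.0.53 is a placeholder for the corporate nameserver, not a real value from our setup.

```python
from typing import List, Optional

METADATA_HOST = "metadata.google.internal"
METADATA_IP = "169.254.169.254"


def nameservers(resolv_conf: str) -> List[str]:
    """Return the nameserver addresses in the order they appear in resolv.conf text."""
    servers = []
    for line in resolv_conf.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def hosts_lookup(hosts: str, name: str) -> Optional[str]:
    """Return the address mapped to `name` in hosts-file text, or None if absent."""
    for line in hosts.splitlines():
        fields = line.split("#", 1)[0].split()  # strip comments, then split columns
        if len(fields) >= 2 and name in fields[1:]:
            return fields[0]
    return None


def matches_repro_setup(resolv_conf: str, hosts: str) -> bool:
    """True if the metadata server is NOT a nameserver but IS mapped in the hosts file."""
    return (
        METADATA_IP not in nameservers(resolv_conf)
        and hosts_lookup(hosts, METADATA_HOST) == METADATA_IP
    )


if __name__ == "__main__":
    # Hypothetical file contents; 10.0.0.53 stands in for the corporate nameserver.
    resolv = "search corp.example\nnameserver 10.0.0.53\n"
    hosts = "127.0.0.1 localhost\n169.254.169.254 metadata.google.internal\n"
    print(matches_repro_setup(resolv, hosts))  # prints True
```

Whether the hosts file is consulted at all is governed by the hosts: line in /etc/nsswitch.conf, and processes inside the oem-gce container may see different copies of these files than the host does, so it may be worth running the same check against the container's view as well.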
Other Information
From the journal, we got these logs, which seem to be related to the issue:
instance-setup[2400]: ERROR GET request error retrieving metadata. <urlopen error [Errno -2] Name or service not known>.
google-accounts[913]: ERROR GET request error retrieving metadata. <urlopen error [Errno -2] Name or service not known>.
google-networking[915]: ERROR GET request error retrieving metadata. <urlopen error [Errno -2] Name or service not known>.
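The `<urlopen error ...>` wrapper suggests the guest agents use Python's urllib, and `[Errno -2] Name or service not known` is glibc's getaddrinfo error EAI_NONAME: the name metadata.google.internal could not be resolved at all, so no HTTP request was ever sent. A minimal sketch to reproduce the lookup, assuming Python 3 is available wherever it is run; `build_metadata_request` and `probe` are hypothetical helpers, not part of the guest agents. The Metadata-Flavor: Google header is required by the metadata server's v1 endpoint.

```python
import socket
import urllib.request

# Base URL of the GCE metadata server's v1 API.
METADATA_URL = "http://metadata.google.internal/computeMetadata/v1/"


def build_metadata_request(path: str = "") -> urllib.request.Request:
    """Build a GET request for the metadata server.

    The v1 endpoint rejects requests that lack the Metadata-Flavor: Google header.
    """
    return urllib.request.Request(METADATA_URL + path, headers={"Metadata-Flavor": "Google"})


def probe(host: str = "metadata.google.internal") -> str:
    """Resolve `host` with getaddrinfo, the lookup urlopen performs before connecting."""
    try:
        infos = socket.getaddrinfo(host, 80, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # On glibc, errno -2 is EAI_NONAME ("Name or service not known"),
        # the same error the guest agents logged.
        return f"resolution failed: [Errno {exc.errno}] {exc.strerror}"
    return f"resolves to {infos[0][4][0]}"


if __name__ == "__main__":
    print(probe())
    req = build_metadata_request("instance/hostname")
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            print(resp.read().decode())
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        print(f"request failed: {exc}")
```

If `probe()` fails while the hosts entry is present, the process doing the lookup is evidently not consulting that hosts file.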
As a workaround, we found that once we add nameserver 169.254.169.254 as the first entry in resolv.conf, before our corporate nameserver, the problem disappears.
However, this is not a real solution, because it breaks name resolution for our internal machines.
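Why putting the metadata server first breaks internal names can be illustrated with a toy model. glibc's stub resolver queries the nameservers in resolv.conf order and only moves on to the next one when the current server does not answer (or fails with an error such as SERVFAIL); an NXDOMAIN answer is returned to the application as-is. The sketch below is a simplified model of that rule, not resolver code. It assumes the metadata server answers NXDOMAIN for our internal names, uses hypothetical names and addresses (10.0.0.53, build01.corp.example), and ignores search domains, options such as rotate, and the hosts file.

```python
from typing import Dict, List


def resolve(name: str, servers: List[str], known: Dict[str, Dict[str, str]]) -> str:
    """Simplified model of how a glibc stub resolver walks the nameserver list.

    `known` maps each reachable server to the names it can answer; a server
    missing from `known` is treated as not replying. Returns an address,
    "NXDOMAIN", or "TIMEOUT".
    """
    for server in servers:  # resolv.conf order
        if server not in known:
            continue  # no reply: fall back to the next nameserver
        if name in known[server]:
            return known[server][name]
        return "NXDOMAIN"  # a negative answer is final; later servers are not asked
    return "TIMEOUT"


# Hypothetical zones: the metadata server does not know corporate names,
# and the corporate server does not know metadata.google.internal.
KNOWN = {
    "169.254.169.254": {"metadata.google.internal": "169.254.169.254"},
    "10.0.0.53": {"build01.corp.example": "10.1.2.3"},
}

if __name__ == "__main__":
    workaround = ["169.254.169.254", "10.0.0.53"]
    original = ["10.0.0.53"]
    print(resolve("build01.corp.example", workaround, KNOWN))    # NXDOMAIN
    print(resolve("build01.corp.example", original, KNOWN))      # 10.1.2.3
    print(resolve("metadata.google.internal", original, KNOWN))  # NXDOMAIN
```

In this model either ordering loses one set of names, which is why the hosts-file entry for metadata.google.internal is needed alongside the corporate nameserver.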
It looks like something has changed inside the oem-gce container, so that adding the metadata.google.internal entry to the hosts file is no longer sufficient for the container to start properly, even though this configuration worked fine for years.
The problem did not exist in the older versions 1576 and 1855. It even seems that the latest CoreOS version does not show the problem as long as the oem-gce container version is old: we found this on one machine that was deployed a long time ago and has been updated continuously. During these updates, the oem-gce container evidently was not updated.