oem-gce.service crashlooping on version 2191.4.1 #2608
This seems to be pushing the
Hiya. Completely ignoring the subject of how this affected all three channels (alpha, beta, and stable), I'd like to know why steps haven't been taken to remove the broken images from circulation?
Currently, anyone launching a stable CoreOS image on Google Compute Engine will be unable to SSH to their instances, because the affected service is responsible for retrieving SSH keys from GCP Project Metadata. Additionally, the constant CPU thrashing caused by systemd trying to start the service every 5 seconds starves small (1 vCPU) instances of resources, so they cannot support their intended function.
Why have the affected images on GCP not been marked as deprecated and the previous known working images marked as the active member of the image families?
It turns out that the problem was introduced in alpha 2163.0.0. So the issue has been present in the alpha channel since June 4 and the beta channel since June 25. No one has reported it before now, and obviously our CI didn't catch it either.
Upgrades aren't affected, because the agent is in the OEM partition which is not updated. As a workaround, you can launch 2135.6.0 and allow it to update normally.
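For anyone auditing a fleet, here is a rough sketch of a check for whether an instance was launched from an affected image. It assumes that version numbers compare the same way across the alpha, beta, and stable channels, and that no fixed release exists yet (so there is no upper bound). The function names are made up for illustration. Note that it is the version the instance was originally launched from that matters, not the version it is running now, since updates don't touch the OEM partition.

```python
from typing import Tuple

# First release reported in this thread to ship the broken GCE OEM agent.
FIRST_AFFECTED = (2163, 0, 0)


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a version string such as '2191.4.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def launched_from_affected_image(launch_version: str) -> bool:
    """Return True if the launch image is at or after the first affected release.

    Assumption: no fixed release has shipped yet, so every later version
    is treated as affected.
    """
    return parse_version(launch_version) >= FIRST_AFFECTED


if __name__ == "__main__":
    # 2135.6.0 is the known-good stable release; 2191.4.1 is the one in the title.
    for v in ("2135.6.0", "2163.0.0", "2191.4.1"):
        print(v, launched_from_affected_image(v))
```

Comparing tuples of integers avoids the string-comparison trap where "999.0.0" would sort after "2163.0.0".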
As a policy, we don't remove released artifacts. We'll revert the coreos-stable image family to 2135.6.0, but the alpha and beta channels have progressed too far to revert. We're working on tracking this down and hope to have a fixed release soon.