Error getting config status, workload certificates may not be configured since 20230202.00 #200

Closed
rivershah opened this issue Feb 3, 2023 · 16 comments


@rivershah

Since the 20230202.00 release I have been seeing these errors. I use Terraform to create a docker+machine executor for GitLab. This had been working flawlessly for months before the most recent release.

Feb  3 11:08:08 gitlab-ci-runner google_metadata_script_runner: startup-script: Installing Docker...
Feb  3 11:10:43 gitlab-ci-runner systemd: Starting GCE Workload Certificate refresh...
Feb  3 11:10:43 gitlab-ci-runner gce_workload_cert_refresh: 2023/02/03 11:10:43: Error getting config status, workload certificates may not be configured: HTTP 404
Feb  3 11:10:43 gitlab-ci-runner gce_workload_cert_refresh: 2023/02/03 11:10:43: Done
Feb  3 11:10:43 gitlab-ci-runner systemd: Started GCE Workload Certificate refresh.
@rally-dimi

rally-dimi commented Apr 24, 2023

Is anyone looking into this?
This query is executed in the Go code and returns HTTP 404:

curl http://169.254.169.254/computeMetadata/v1/instance/workload-certificates-config-status -H "Metadata-Flavor: Google"

@dorileo @vorakl
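
To make the query above easier to reproduce, here is a minimal sketch that reports only the HTTP status the metadata server returns. It assumes `curl` is installed on the VM; the URL and header are the ones quoted above, while the function names `mds_status_code` and `explain_status` are hypothetical helpers, not part of the guest agent.

```shell
#!/bin/sh
# Sketch: report how the metadata server answers the workload-certificate
# status query. URL and header come from this thread; the function names
# are hypothetical and exist only for illustration.

MDS_URL="http://169.254.169.254/computeMetadata/v1/instance/workload-certificates-config-status"

# Print only the HTTP status code returned by the metadata server.
# curl prints 000 when no HTTP response was received at all.
mds_status_code() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
    -H 'Metadata-Flavor: Google' "$MDS_URL"
}

# Turn a status code into a short human-readable verdict.
explain_status() {
  case "$1" in
    200) echo "endpoint present" ;;
    404) echo "endpoint missing: workload certificates may not be configured on this VM" ;;
    000) echo "no response: metadata server unreachable" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}

# Usage on a GCE VM:
#   explain_status "$(mds_status_code)"
```

On an affected VM the usage line would be expected to print the 404 verdict, matching the `HTTP 404` in the log excerpt at the top of this issue.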

@vorakl
Contributor

vorakl commented Apr 24, 2023

Could you please share some background with us?
For instance, how does this issue affect your workflow, and what exactly (in the bigger context) doesn't work? If, as you noticed, this functionality hasn't worked for a couple of months, what is your workaround?

As you also noticed, the problem is related to a metadata endpoint that no longer exists. It is not a Guest Agent issue that could be fixed in its code; Guest Agent acts as a client in its interactions with the metadata server.

We'll try to figure out what is wrong, but we need to know how urgent it is and what exactly it breaks on the customer's side, considering how long it has not been working as expected.

@rally-dimi

@vorakl is this feature optional, or would it stop the agent from functioning?
Thank you.

@vorakl
Contributor

vorakl commented May 3, 2023

Guest Agent can definitely work without this feature, no doubt. Guest Agent bundles a number of features, and if one doesn't work as expected, the others keep working.

This particular feature is special. It seems there was a feature request some time ago that was implemented and shipped but left undocumented. I don't see any mention of its functionality or purpose in the documentation, and I haven't found anyone who is aware of the details. That's why I asked what exactly this issue breaks in your workflow, why you need this feature, and, if it's really needed, what workaround you found, given that it has not been working for months.

However, if you just noticed this error message in the logs and nothing actually affects your workflow, meaning all documented behavior works fine, then I don't see any reason to worry about it.

@kfsone

kfsone commented Jun 28, 2023

When you are looking at a potentially certificate-related SSL issue, this message naturally attracts your attention. I don't believe it has anything to do with the issue I'm root-causing, but since I don't know what it's telling me, I can't know that for sure.

@derhally

Our logs are being polluted by these errors. We are seeing them on Ubuntu 22.04 minimal.

@timdrysdale

Same for us (log pollution) with ubuntu-2004-focal-lts.

@KeithBush

Hopefully this can help others still dealing with this issue.

We do not use Google "workload", and from what I can tell it looks like part of the Kubernetes and IAM integration.
Most of you who end up here are probably in a similar situation: these messages fill the logs and you have been waiting for an official fix that doesn't seem to come.

We've accepted that this won't be fixed, as the message is probably useful to someone who is trying to use workloads and having an issue with that service. Since we do not use Kubernetes, we've disabled the workload refresh to prevent error log spam. If you're interested, here are the commands you'll need.

We are on Ubuntu; if you are using something else, the timer/service names are probably similar if not the same. You need to stop and disable them.

echo "## Stopping GCE cert refresh... (stops currently running instances)"
systemctl stop gce-workload-cert-refresh.timer
systemctl stop gce-workload-cert-refresh.service

echo "## Disable GCE cert refresh... (prevents instance from coming back after restart)"
systemctl disable gce-workload-cert-refresh.timer
systemctl disable gce-workload-cert-refresh.service

We have many servers and this cut out a lot of spam from our unified view. We have experienced no negative side effects from disabling these two services.

@oxytis

oxytis commented Oct 31, 2023

thanks... works!

@ei-grad

ei-grad commented Nov 16, 2023

This should be enough:

systemctl disable --now gce-workload-cert-refresh.timer
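
For anyone rolling this out across many machines, the one-liner can be wrapped so it is a no-op where the timer is not installed. This is only a sketch: the unit name comes from this thread, while the function name `disable_cert_refresh_timer` and the `SYSTEMCTL` override variable are hypothetical and exist only to make the function easy to dry-run.

```shell
#!/bin/sh
# Sketch: disable the workload-certificate refresh timer if it is installed.
# The unit name is taken from this thread; the function name and the
# SYSTEMCTL override are assumptions made for illustration.

TIMER_UNIT="gce-workload-cert-refresh.timer"
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

disable_cert_refresh_timer() {
  # "systemctl cat" exits non-zero when the unit file cannot be found,
  # so use it as an "is this unit installed?" probe.
  if ! "$SYSTEMCTL" cat "$TIMER_UNIT" >/dev/null 2>&1; then
    echo "$TIMER_UNIT not installed, nothing to do"
    return 0
  fi
  # --now stops the running timer in addition to disabling it at boot.
  "$SYSTEMCTL" disable --now "$TIMER_UNIT" >/dev/null 2>&1 \
    && echo "$TIMER_UNIT disabled"
}

# Usage (as root on the VM):
#   disable_cert_refresh_timer
```

The function only touches the timer, on the assumption (consistent with the comment above) that the service runs only when the timer triggers it.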

@ei-grad

ei-grad commented Nov 17, 2023

@vorakl it doesn't seem right to enable by default a timer that fails every 10 minutes on all (non-GKE?) machines in Google Cloud. I guess it adds up to tens of GBs of log spam a day in Google Cloud Monitoring overall. Not a big deal, but what about carbon emissions? :-)

Could it be enabled by some startup script only for instances where it is needed?

Or, at least quit silently if there are no credentials to update?

Also, is this timer really needed? Having similar logic in google_guest_agent makes this look like a reasonable question. Related errors in the logs from the google_guest_agent service:

Skipping scheduling credential generation job, failed to reach client credentials endpoint(instance/credentials/certs) with error: error connecting to metadata server, status code: 404
ERROR scheduler.go:177 Failed to schedule job MTLS_MDS_Credential_Boostrapper with error: ShouldEnable() returned false, cannot schedule job MTLS_MDS_Credential_Boostrapper

@ei-grad

ei-grad commented Nov 17, 2023

Oh! It looks like this has been fixed in fresh guest-agent versions.

  • 1:20231115.00-g1 (version in current debian bullseye images) is not affected
  • 20230426.00-0ubuntu2~22.04.0 (latest available in ubuntu 22.04 repos) - affected

Filed a bug for Ubuntu: https://bugs.launchpad.net/ubuntu/+source/google-guest-agent/+bug/2043788
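
To check a fleet against the two data points above, a rough sketch like the following can classify a guest-agent package version by its leading date stamp. The cut-offs are assumptions drawn only from this thread (20230202.00 and 20230426.00 reported affected, 20231115.00 reported not affected); anything in between is reported as unknown, and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: classify a guest-agent package version using only the versions
# mentioned in this thread. The thresholds are assumptions, not an official
# list of fixed releases.

# Extract the 8-digit date stamp, dropping an optional Debian epoch,
# e.g. "1:20231115.00-g1" -> 20231115.
agent_datestamp() {
  printf '%s\n' "$1" | sed -n 's/^\([0-9]*:\)\{0,1\}\([0-9]\{8\}\)\..*$/\2/p'
}

classify_agent_version() {
  stamp=$(agent_datestamp "$1")
  if [ -z "$stamp" ]; then
    echo "unknown"                 # version string not in the expected form
  elif [ "$stamp" -ge 20231115 ]; then
    echo "reported-fixed"          # at or after the version reported unaffected
  elif [ "$stamp" -ge 20230202 ] && [ "$stamp" -le 20230426 ]; then
    echo "reported-affected"       # within the range reported affected here
  else
    echo "unknown"                 # no data point in this thread
  fi
}

# Usage on Debian/Ubuntu:
#   classify_agent_version "$(dpkg-query -W -f='${Version}' google-guest-agent)"
```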

@olalofgren

Any idea when the guest-agent will be upgraded for Ubuntu?

@foxmadr

foxmadr commented Jan 30, 2024

I'm also receiving this error on an Ubuntu 20.04 image.

It may not be related, but there is also an error on the Load Balancer: it has status code 502 and statusDetails: "config_not_found". It is strange that the config is not found. Isn't this the same error on GCE load balancer machines?

@ChaitanyaKulkarni28
Member

Sorry for the late response. I believe this issue is already fixed in Guest Agent, and Ubuntu will also have an update with this fix soon.

@ei-grad

ei-grad commented Feb 8, 2024

Does anybody know a proper way to ping relevant Ubuntu maintainers?
