Support workload identity #315
Hi, I too had trouble with CMSA and GKE Workload Identity on GKE v1.15.7-gke.23. The error messages at startup differed though:
The log would then be spammed by:
I've managed to make it work with GKE Workload Identity by adding

It works because of the following documented limitation: see https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#limitations
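The limitation referenced there is that pods running with hostNetwork: true bypass the GKE metadata server and use the node's GCE metadata server instead. A minimal sketch of that workaround, assuming the stock Deployment name and namespace from the adapter manifest:

```shell
# Sketch: run the adapter on the host network so it authenticates via the
# node's GCE metadata server rather than the Workload Identity one.
# Deployment name and namespace are assumed from the stock adapter.yaml.
kubectl patch deployment custom-metrics-stackdriver-adapter \
  --namespace custom-metrics \
  --type merge \
  --patch '{"spec":{"template":{"spec":{"hostNetwork":true}}}}'
```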
/cc @kawych
@pdecat, in your case, running CMSA on a node with Workload Identity (WI) enabled probably broke it because the Google Service Account (GSA) associated with WI that CMSA was running as didn't have the required permissions.

When you changed CMSA to run in host network mode, it probably worked because CMSA was then using the GKE node's default GSA, which is different from the WI-related GSA. That default node GSA probably has the necessary permissions.
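To see which GSA the adapter falls back to in host network mode, you can inspect the node pool's service account. A sketch, with hypothetical pool, cluster, and zone names:

```shell
# Prints the GSA the node pool's VMs run as; "default" means the
# Compute Engine default service account. All names are placeholders.
gcloud container node-pools describe default-pool \
  --cluster my-cluster \
  --zone us-central1-a \
  --format="value(config.serviceAccount)"
```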
Workload Identity with CMSA 0.10.2 seems to work for me. I'm seeing these logs which are the same as the ones when it wasn't using WI.
@davidxia do you have Horizontal Pod Autoscalers based on external Stackdriver metrics?
Yes
Seeing the same issue.
Seems to work for me as well using Workload Identity. I'm getting a never-ending stream of these logs, though:
Same; it would be great if these could be silenced or moved to a lower logging level.
@davidxia, @JacobSMoller: what role are you using for the Google Service Account associated with the Workload Identity that CMSA is running as? I'm using
roles/monitoring.viewer |
roles/monitoring.admin |
We're facing the same issue.
I managed to make it work with WI using the following approach:

```shell
gcloud iam service-accounts create custom-metrics-sd-adapter --project "$GCP_PROJECT_ID"

gcloud projects add-iam-policy-binding "$GCP_PROJECT_ID" \
  --member "serviceAccount:custom-metrics-sd-adapter@$GCP_PROJECT_ID.iam.gserviceaccount.com" \
  --role "roles/monitoring.editor"

gcloud iam service-accounts add-iam-policy-binding \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:$GCP_PROJECT_ID.svc.id.goog[custom-metrics/custom-metrics-stackdriver-adapter]" \
  "custom-metrics-sd-adapter@$GCP_PROJECT_ID.iam.gserviceaccount.com"

kubectl create -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter.yaml

kubectl annotate serviceaccount custom-metrics-stackdriver-adapter \
  "iam.gke.io/gcp-service-account=custom-metrics-sd-adapter@$GCP_PROJECT_ID.iam.gserviceaccount.com" \
  --namespace custom-metrics
```
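One caveat with steps like these: the adapter pod reads its credentials at startup, so if the annotation is applied after the Deployment is created, the pod needs to be restarted. A verification sketch, assuming the stock Deployment name and the standard custom/external metrics API groups:

```shell
# Restart the adapter so it picks up the newly annotated service account.
kubectl -n custom-metrics rollout restart deployment custom-metrics-stackdriver-adapter

# A non-error JSON response here indicates the adapter can reach the
# Stackdriver API with the bound GSA.
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1"
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1"
```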
I have the same issue. @aubm's steps do not work either. It fails with WI on the adapter with errors:
@viniciusccarvalho did you try
Yes, I deleted everything, even the namespace; it still won't work. I'm running 1.16.11-gke.5 on my cluster. Still no luck.
This worked for me. I am using 1.15.12-gke.2.
Running 1.17.14-gke.1600, I ran into this issue. I followed the steps described in the README; the instructions are the same as @aubm's. Actually, the first time I configured it, CMSA worked with WI. Then I needed to replace the GSA, so I re-annotated the K8s service account. My new GSA has the same roles as the original working GSA, and I don't see any misconfiguration. The new config is outputting these errors:
After waiting a while, I do see my CMSA and WI working fine. I'm not sure why GSA/WI/CMSA doesn't work right away; maybe it took some time for GCP IAM to propagate.
There appear to be a few issues here. One is that the GKE metadata server does not support all of the endpoints that the GCE metadata server does. So if you don't run this workload with host networking enabled in a cluster with Workload Identity enabled, it fails immediately and gets thrown into a crash loop with the following error:

This makes sense, seeing that there is no

The second issue is that, when not directly using Workload Identity, and instead setting
The prometheus-to-sd DaemonSet that comes with GKE as part of the core tooling appears to use host networking to bypass the GKE metadata server and use the GCE metadata server. If there are issues with this project functioning with Workload Identity enabled, or with host networking turned off, perhaps some documentation would help.
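The endpoint difference can be observed from inside a pod. This is a sketch assuming a Workload Identity-enabled cluster, where the GKE metadata server implements only a subset of the GCE metadata API:

```shell
# The service-account token endpoint is served by both the GKE and GCE
# metadata servers, so this works either way:
curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token"

# Endpoints outside that subset return 404 from the GKE metadata server,
# while the same request succeeds from a hostNetwork pod talking to the
# node's GCE metadata server.
```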
I have finally gotten past the stage where it says permission denied, by cleaning up all the services and other resources the other adapter YAML config creates. I have run these commands to create the service account, bind the correct roles, and create the services and deployment. It does, however, look like I am getting a new error, and I will post the logs below. I thought that if I applied this adapter to my Google project/GKE cluster, I would be able to scale based on request_per_second to the pods, something like the first example for this custom metrics adapter.
I tried every trick in the book: created the namespace ahead of time, created the needed K8s SA (custom-metrics-stackdriver-adapter), annotated the K8s SA with the GCP SA (which already has the Monitoring Editor role), and then used https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml
My GKE version is 1.24.3-gke.200. Without this feature, the HPA simply doesn't work with custom metrics; you need to rely on downloading a SA key, which goes against security best practices.
Hey, I'm getting this on 1.22.12-gke.300 as well, trying to use WIF. I'm also getting this error:
However, when I run
Can you confirm what service account is used by default by the metrics adapter? Is it the node's GSA? I assign my own GSA to the node pools:

Can you confirm that this GSA will need access? I don't need the adapter deployment to use Workload Identity itself; I'm OK with giving the node pool account this access. I granted my node pool GSA the
Sounds like you're missing this piece:
I'm also getting the same error in the custom-stackdriver pod.
However, when I run kubectl proxy --port=8080 and go to http://127.0.0.1:8080/apis/custom.metrics.k8s.io/v1beta2 and http://127.0.0.1:8080/apis/custom.metrics.k8s.io/v1beta1, the response is not nil and arrives almost instantaneously.
I have the same error as above:
When I do

Does anybody have solutions?
I have also encountered this situation.
My cluster version is: v1.25.5-gke.2000 |
It's a shame this adapter isn't included by default. I'm struggling to resolve this on my end: permission errors with Workload Identity, and even with
Any updates?
Happens to me as well.
We're trying to set up the custom metrics so we can scale pods off of messages in a Pub/Sub queue. Currently we can't get past this error in the logs for the

```
E0912 19:48:53.983876 1 provider.go:320] Failed request to stackdriver api: googleapi: Error 403: Permission monitoring.metricDescriptors.list denied (or the resource may not exist)., forbidden
E0912 19:48:53.984071 1 writers.go:117] apiserver was unable to write a JSON response: http: Handler timeout
E0912 19:48:53.984139 1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http: Handler timeout"}: http: Handler timeout
E0912 19:48:53.985476 1 writers.go:130] apiserver was unable to write a fallback JSON response: http: Handler timeout
E0912 19:48:53.986869 1 timeout.go:135] post-timeout activity - time-elapsed: 9m10.432424127s, GET "/apis/custom.metrics.k8s.io/v1beta1" result: <nil>
```

I've tried every combination of service account I can think of, but it appears that all of our nodes and all of our pods have the

It's kind of wild to me that this appears to be all over the place. It's also kind of annoying that I just followed the documentation and now we're here.
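When chasing a 403 like the one above, it can help to confirm which roles are actually bound to the GSA the adapter runs as. A sketch with a hypothetical project variable and service account name (roles/monitoring.viewer includes monitoring.metricDescriptors.list):

```shell
# Lists the project-level roles bound to the GSA so you can confirm a
# monitoring role is actually attached. Names are placeholders.
gcloud projects get-iam-policy "$GCP_PROJECT_ID" \
  --flatten="bindings[].members" \
  --filter="bindings.members:serviceAccount:custom-metrics-sd-adapter@$GCP_PROJECT_ID.iam.gserviceaccount.com" \
  --format="table(bindings.role)"
```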
Same issue on our end. We followed every step in the GCP README and the alternative methods here as well, and are still getting 403 errors. I can get the 403 errors resolved when using GKE cluster version 1.24.15-gke.1700.

EDIT: Finally managed to get it working. The crash loop I mentioned above was an OOMKill, so I had to increase the resources for the custom-metrics Deployment. Final working steps:
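For the OOMKill part, a sketch of bumping the adapter's memory limit in place; the container index and value are assumptions, so adjust them to your manifest:

```shell
# Raise the memory limit on the adapter's first container to avoid the
# OOMKill; 512Mi is an arbitrary example value.
kubectl -n custom-metrics patch deployment custom-metrics-stackdriver-adapter \
  --type json \
  --patch '[{"op":"replace","path":"/spec/template/spec/containers/0/resources/limits/memory","value":"512Mi"}]'
```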
I was still getting 403 exceptions using @aubm's instructions until I added this:

Once the IAM permissions propagated, it started working.
Just to be clear, is monitoring.editor necessary? You'd have thought monitoring.viewer would be enough, and that's what the README says. Although it's somewhat academic in my case, as I'm getting:

either way.
After applying the suggestion here, the permissions issue is fixed, but I still get loads of this sort of thing in the logs:
... and either I was mistaken, or the issue has resurfaced. I still see this sort of thing:

even though the relevant service account has the monitoring.viewer role.
When deploying the Stackdriver custom metrics adapter inside a GKE cluster with Workload Identity enabled, the adapter (v0.10.2) fails to start.
Steps taken:
Adapter deployment log: