
custom-metrics-stackdriver-adapter not working - auth problems? #498

Open
rbarberop opened this issue Nov 23, 2022 · 13 comments

@rbarberop

Hi, I have created a new GCP project and a GKE cluster inside it.

I've followed the instructions in the README...

However, Logs Explorer shows errors from the adapter... it looks like an authentication problem?

  {
    "insertId": "0zumoy6o5n39r4ul",
    "jsonPayload": {
      "pid": "1",
      "message": "apiserver received an error that is not an metav1.Status: &errors.errorString{s:\"http2: stream closed\"}: http2: stream closed"
    },
    "resource": {
      "type": "k8s_container",
      "labels": {
        "pod_name": "custom-metrics-stackdriver-adapter-848d58c7c7-h4s5d",
        "cluster_name": "scheduled-autoscaler",
        "namespace_name": "custom-metrics",
        "container_name": "pod-custom-metrics-stackdriver-adapter",
        "location": "us-central1-f",
        "project_id": "rbarbero-gke-freshness-3"
      }
    },
    "timestamp": "2022-11-23T20:10:54.385935713Z",
    "severity": "ERROR",
    "labels": {
      "compute.googleapis.com/resource_name": "gke-scheduled-autoscaler-default-pool-28d0994f-85hh",
      "k8s-pod/run": "custom-metrics-stackdriver-adapter",
      "k8s-pod/k8s-app": "custom-metrics-stackdriver-adapter",
      "k8s-pod/pod-template-hash": "848d58c7c7",
      "k8s-pod/kubernetes_io/cluster-service": "true"
    },
    "logName": "projects/rbarbero-gke-freshness-3/logs/stderr",
    "sourceLocation": {
      "file": "status.go",
      "line": "71"
    },
    "receiveTimestamp": "2022-11-23T20:10:57.325326887Z"
  },
  {
    "insertId": "7crfx9uy4t13qxty",
    "jsonPayload": {
      "pid": "1",
      "message": "post-timeout activity - time-elapsed: 108.975022ms, GET \"/apis/custom.metrics.k8s.io/v1beta1\" result: <nil>"
    },
    "resource": {
      "type": "k8s_container",
      "labels": {
        "container_name": "pod-custom-metrics-stackdriver-adapter",
        "cluster_name": "scheduled-autoscaler",
        "pod_name": "custom-metrics-stackdriver-adapter-848d58c7c7-h4s5d",
        "location": "us-central1-f",
        "project_id": "rbarbero-gke-freshness-3",
        "namespace_name": "custom-metrics"
      }
    },
    "timestamp": "2022-11-23T20:10:54.389245350Z",
    "severity": "ERROR",
    "labels": {
      "k8s-pod/run": "custom-metrics-stackdriver-adapter",
      "k8s-pod/k8s-app": "custom-metrics-stackdriver-adapter",
      "k8s-pod/pod-template-hash": "848d58c7c7",
      "k8s-pod/kubernetes_io/cluster-service": "true",
      "compute.googleapis.com/resource_name": "gke-scheduled-autoscaler-default-pool-28d0994f-85hh"
    },
    "logName": "projects/rbarbero-gke-freshness-3/logs/stderr",
    "sourceLocation": {
      "file": "timeout.go",
      "line": "135"
    },
    "receiveTimestamp": "2022-11-23T20:10:57.325326887Z"
  },
  {
    "insertId": "484cm7yu3ea24ke2",
    "jsonPayload": {
      "message": "apiserver was unable to write a fallback JSON response: http2: stream closed",
      "pid": "1"
    },
    "resource": {
      "type": "k8s_container",
      "labels": {
        "pod_name": "custom-metrics-stackdriver-adapter-848d58c7c7-h4s5d",
        "location": "us-central1-f",
        "namespace_name": "custom-metrics",
        "container_name": "pod-custom-metrics-stackdriver-adapter",
        "project_id": "rbarbero-gke-freshness-3",
        "cluster_name": "scheduled-autoscaler"
      }
    },
    "timestamp": "2022-11-23T20:10:54.391067637Z",
    "severity": "ERROR",
    "labels": {
      "k8s-pod/pod-template-hash": "848d58c7c7",
      "k8s-pod/run": "custom-metrics-stackdriver-adapter",
      "k8s-pod/k8s-app": "custom-metrics-stackdriver-adapter",
      "k8s-pod/kubernetes_io/cluster-service": "true",
      "compute.googleapis.com/resource_name": "gke-scheduled-autoscaler-default-pool-28d0994f-85hh"
    },
    "logName": "projects/rbarbero-gke-freshness-3/logs/stderr",
    "sourceLocation": {
      "file": "writers.go",
      "line": "130"
    },
    "receiveTimestamp": "2022-11-23T20:10:57.325326887Z"
  },
  {
    "insertId": "vku7ud5xhug364zm",
    "jsonPayload": {
      "message": "post-timeout activity - time-elapsed: 193.853136ms, GET \"/apis/custom.metrics.k8s.io/v1beta1\" result: <nil>",
      "pid": "1"
    },
    "resource": {
      "type": "k8s_container",
      "labels": {
        "namespace_name": "custom-metrics",
        "location": "us-central1-f",
        "container_name": "pod-custom-metrics-stackdriver-adapter",
        "cluster_name": "scheduled-autoscaler",
        "pod_name": "custom-metrics-stackdriver-adapter-848d58c7c7-h4s5d",
        "project_id": "rbarbero-gke-freshness-3"
      }
    },
    "timestamp": "2022-11-23T20:10:54.486030132Z",
    "severity": "ERROR",
    "labels": {
      "k8s-pod/kubernetes_io/cluster-service": "true",
      "k8s-pod/k8s-app": "custom-metrics-stackdriver-adapter",
      "k8s-pod/run": "custom-metrics-stackdriver-adapter",
      "compute.googleapis.com/resource_name": "gke-scheduled-autoscaler-default-pool-28d0994f-85hh",
      "k8s-pod/pod-template-hash": "848d58c7c7"
    },
    "logName": "projects/rbarbero-gke-freshness-3/logs/stderr",
    "sourceLocation": {
      "file": "timeout.go",
      "line": "135"
    },
    "receiveTimestamp": "2022-11-23T20:10:57.325326887Z"
  },
  {
    "insertId": "u55i6pz9rfhil94j",
    "jsonPayload": {
      "message": "timeout or abort while handling: method=GET URI=\"/apis/custom.metrics.k8s.io/v1beta1\" audit-ID=\"3e30bad8-85bc-46e4-9a9b-0680dbfe47f8\"",
      "pid": "1"
    },
    "resource": {
      "type": "k8s_container",
      "labels": {
        "container_name": "pod-custom-metrics-stackdriver-adapter",
        "location": "us-central1-f",
        "cluster_name": "scheduled-autoscaler",
        "namespace_name": "custom-metrics",
        "project_id": "rbarbero-gke-freshness-3",
        "pod_name": "custom-metrics-stackdriver-adapter-848d58c7c7-h4s5d"
      }
    },
    "timestamp": "2022-11-23T20:10:54.502749250Z",
    "severity": "ERROR",
    "labels": {
      "k8s-pod/run": "custom-metrics-stackdriver-adapter",
      "k8s-pod/pod-template-hash": "848d58c7c7",
      "k8s-pod/kubernetes_io/cluster-service": "true",
      "compute.googleapis.com/resource_name": "gke-scheduled-autoscaler-default-pool-28d0994f-85hh",
      "k8s-pod/k8s-app": "custom-metrics-stackdriver-adapter"
    },
    "logName": "projects/rbarbero-gke-freshness-3/logs/stderr",
    "sourceLocation": {
      "file": "wrap.go",
      "line": "54"
    },
    "receiveTimestamp": "2022-11-23T20:10:57.325326887Z"
  },
  {
    "insertId": "foryseqqz48yqfbg",
    "jsonPayload": {
      "pid": "1",
      "message": "apiserver was unable to close cleanly the response writer: http: Handler timeout"
    },
    "resource": {
      "type": "k8s_container",
      "labels": {
        "project_id": "rbarbero-gke-freshness-3",
        "pod_name": "custom-metrics-stackdriver-adapter-848d58c7c7-h4s5d",
        "namespace_name": "custom-metrics",
        "cluster_name": "scheduled-autoscaler",
        "location": "us-central1-f",
        "container_name": "pod-custom-metrics-stackdriver-adapter"
      }
    },
    "timestamp": "2022-11-23T20:10:54.502797138Z",
    "severity": "ERROR",
    "labels": {
      "compute.googleapis.com/resource_name": "gke-scheduled-autoscaler-default-pool-28d0994f-85hh",
      "k8s-pod/kubernetes_io/cluster-service": "true",
      "k8s-pod/run": "custom-metrics-stackdriver-adapter",
      "k8s-pod/k8s-app": "custom-metrics-stackdriver-adapter",
      "k8s-pod/pod-template-hash": "848d58c7c7"
    },
    "logName": "projects/rbarbero-gke-freshness-3/logs/stderr",
    "sourceLocation": {
      "file": "writers.go",
      "line": "111"
    },
    "receiveTimestamp": "2022-11-23T20:10:57.325326887Z"
  },
  {
    "insertId": "ic9b2l6ofwbvvdx0",
    "jsonPayload": {
      "message": "post-timeout activity - time-elapsed: 4.947211ms, GET \"/apis/custom.metrics.k8s.io/v1beta1\" result: <nil>",
      "pid": "1"
    },
    "resource": {
      "type": "k8s_container",
      "labels": {
        "pod_name": "custom-metrics-stackdriver-adapter-848d58c7c7-h4s5d",
        "project_id": "rbarbero-gke-freshness-3",
        "namespace_name": "custom-metrics",
        "location": "us-central1-f",
        "cluster_name": "scheduled-autoscaler",
        "container_name": "pod-custom-metrics-stackdriver-adapter"
      }
    },
    "timestamp": "2022-11-23T20:10:54.502805932Z",
    "severity": "ERROR",
    "labels": {
      "k8s-pod/run": "custom-metrics-stackdriver-adapter",
      "compute.googleapis.com/resource_name": "gke-scheduled-autoscaler-default-pool-28d0994f-85hh",
      "k8s-pod/pod-template-hash": "848d58c7c7",
      "k8s-pod/k8s-app": "custom-metrics-stackdriver-adapter",
      "k8s-pod/kubernetes_io/cluster-service": "true"
    },
    "logName": "projects/rbarbero-gke-freshness-3/logs/stderr",
    "sourceLocation": {
      "file": "timeout.go",
      "line": "135"
    },
    "receiveTimestamp": "2022-11-23T20:10:57.325326887Z"
  }
@vschettino-asimov

Looks like I have the same issue:

E1205 21:46:49.429171       1 timeout.go:135] post-timeout activity - time-elapsed: 3.304368ms, GET "/apis/custom.metrics.k8s.io/v1beta2" result: <nil>
E1205 21:46:49.523578       1 wrap.go:54] timeout or abort while handling: method=GET URI="/apis/custom.metrics.k8s.io/v1beta2" audit-ID="71f72960-a980-4afb-ad1d-03e1b6bec66f"
E1205 21:46:49.523649       1 writers.go:117] apiserver was unable to write a JSON response: http2: stream closed
E1205 21:46:49.523716       1 wrap.go:54] timeout or abort while handling: method=GET URI="/apis/custom.metrics.k8s.io/v1beta2" audit-ID="79f73bf9-dc54-4676-936a-aa819a77194e"
E1205 21:46:49.525281       1 writers.go:111] apiserver was unable to close cleanly the response writer: http: Handler timeout
E1205 21:46:49.525327       1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http2: stream closed"}: http2: stream closed
E1205 21:46:49.526374       1 writers.go:130] apiserver was unable to write a fallback JSON response: http2: stream closed
E1205 21:46:49.528673       1 timeout.go:135] post-timeout activity - time-elapsed: 4.796544ms, GET "/apis/custom.metrics.k8s.io/v1beta2" result: <nil>

@sethTrustle

I have the same issue
post-timeout activity - time-elapsed: 109.781784ms, GET "/apis/custom.metrics.k8s.io/v1beta1" result: <nil>

@wyardley

wyardley commented Jan 2, 2023

I have a high volume of the same / similar errors. That said, the adapter itself works, AFAIK.

@clarkmcc

clarkmcc commented Feb 3, 2023

Same issue; I also had to add this resource to my cluster in order to get it to start up:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: custom-metrics-extension-apiserver-authentication-reader
  namespace: kube-system
subjects:
  - kind: ServiceAccount
    name: custom-metrics-stackdriver-adapter
    namespace: custom-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader

@PaulRudin

PaulRudin commented Feb 8, 2023

Same issue, also I had to add this resource to my cluster in order to get it to startup

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: custom-metrics-extension-apiserver-authentication-reader
  namespace: kube-system
subjects:
  - kind: ServiceAccount
    name: custom-metrics-stackdriver-adapter
    namespace: custom-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader

I tried this but nothing changed. When you say "get it to startup", what exactly does "it" refer to? The stackdriver adapter pod? In my case the pod is running, but it has loads of log entries like:

E0208 17:56:12.201768 1 provider.go:320] Failed request to stackdriver api: googleapi: Error 403: Permission monitoring.metricDescriptors.list denied (or the resource may not exist)., forbidden

Maybe it's a different issue?

@clarkmcc

clarkmcc commented Feb 8, 2023

@PaulRudin just spitballing here, but that error looks like your cluster's service account doesn't have permission to call the GCP monitoring API.
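If that is the cause, granting the Monitoring Viewer role to whichever Google service account the adapter's node (or pod) runs as usually clears that 403. A minimal sketch, assuming a node pool's default service account; the project ID and service account email below are placeholders, substitute your own:

```shell
# Placeholders - replace with your actual project and service account email.
PROJECT_ID="my-project"
SA_EMAIL="my-nodepool-sa@my-project.iam.gserviceaccount.com"

# Grant the Monitoring Viewer role so the adapter can list metric descriptors
# and read time series from the Cloud Monitoring API.
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
  --member="serviceAccount:${SA_EMAIL}" \
  --role="roles/monitoring.viewer"
```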

@PaulRudin

Yeah, but nothing has changed recently as far as I know. Maybe I inadvertently modified something while changing something unrelated.

@morphalus

morphalus commented Feb 9, 2023

Hi!

I have exactly the same issue, and it appeared suddenly around the same time.

I thought too about a permission or scope issue but the node where the adapter is running has cloud-platform scope and the service account has Monitoring Viewer permission.

I also don't think it is a permission or scope issue, since these are timeout errors.

I am still investigating. Maybe something has changed on GCP side.
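For anyone wanting to double-check the scope angle, the node pool's OAuth scopes and attached service account can be inspected directly. A sketch; the pool, cluster, and zone names below are taken from the logs in this thread and may differ in your setup:

```shell
# Show the OAuth scopes attached to the node pool running the adapter.
gcloud container node-pools describe default-pool \
  --cluster=scheduled-autoscaler --zone=us-central1-f \
  --format="value(config.oauthScopes)"

# Show which Google service account the node pool's VMs use.
gcloud container node-pools describe default-pool \
  --cluster=scheduled-autoscaler --zone=us-central1-f \
  --format="value(config.serviceAccount)"
```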

@PaulRudin

OK - so in my case I had inadvertently changed the service account, so the permission denied problem has been fixed. But I do still see messages similar to those reported by others:

E0209 10:59:56.474287       1 timeout.go:135] post-timeout activity - time-elapsed: 15.70529ms, GET "/apis/custom.metrics.k8s.io/v1beta1" result: <nil>
E0209 10:59:56.554654       1 writers.go:111] apiserver was unable to close cleanly the response writer: http: Handler timeout
E0209 10:59:56.555141       1 writers.go:130] apiserver was unable to write a fallback JSON response: http2: stream closed
E0209 10:59:56.556250       1 writers.go:130] apiserver was unable to write a fallback JSON response: http: Handler timeout
E0209 10:59:56.558605       1 timeout.go:135] post-timeout activity - time-elapsed: 93.570353ms, GET "/apis/custom.metrics.k8s.io/v1beta1" result: <nil>
E0209 10:59:56.559753       1 timeout.go:135] post-timeout activity - time-elapsed: 102.936145ms, GET "/apis/custom.metrics.k8s.io/v1beta1" result: <nil>
E0209 10:59:56.560900       1 timeout.go:135] post-timeout activity - time-elapsed: 102.266556ms, GET "/apis/custom.metrics.k8s.io/v1beta2" result: <nil>

@wyardley

wyardley commented Feb 10, 2023

Sorry to slightly hijack, but I'm also curious what folks who are using Workload Identity do: do you create a GCP service account for the custom metrics adapter and bind it to the Kube service account? I assume the cluster role bindings don't provide Google-API-level access, and with Workload Identity, I don't think the pod will implicitly have the creds of the node pool's service account either.

#315 I guess kind of covers this
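For reference, the usual Workload Identity wiring looks something like the sketch below. The Google service account name and project ID are placeholders; the Kubernetes namespace and service account names match the manifests used earlier in this thread:

```shell
# 1. Create a dedicated Google service account and give it monitoring read access.
gcloud iam service-accounts create custom-metrics-adapter --project=my-project
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:custom-metrics-adapter@my-project.iam.gserviceaccount.com" \
  --role="roles/monitoring.viewer"

# 2. Allow the Kubernetes service account to impersonate the Google one.
gcloud iam service-accounts add-iam-policy-binding \
  custom-metrics-adapter@my-project.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:my-project.svc.id.goog[custom-metrics/custom-metrics-stackdriver-adapter]"

# 3. Annotate the Kubernetes service account so GKE injects the credentials.
kubectl annotate serviceaccount custom-metrics-stackdriver-adapter \
  --namespace=custom-metrics \
  iam.gke.io/gcp-service-account=custom-metrics-adapter@my-project.iam.gserviceaccount.com
```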

@sosimon

sosimon commented Mar 31, 2023

We're seeing the same error messages but the adapter appears to be functional. It would be nice to understand what the errors mean and what changes we need to make to reduce the noise.

@matiasah

@sosimon How do you test to know it is functional? Are you reading a metric with kubectl?
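One way to check (a sketch): query the custom metrics API group directly, then read a concrete metric through the adapter. The metric name below is a placeholder; note that Stackdriver metric paths use "|" in place of "/" when passed through the Kubernetes API:

```shell
# Confirm the custom.metrics.k8s.io API group is registered and serving.
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1"

# Read a specific metric through the adapter (metric name is a placeholder).
kubectl get --raw \
  "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/custom.googleapis.com|my-metric"
```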

@sosimon

sosimon commented Jul 25, 2023

@matiasah I think our HPA is working. Some of the logs shown here are the same as the ones in #510. Not the auth errors; those should be, and need to be, resolved by giving the service account the right permissions.

Development

No branches or pull requests

9 participants