log spamming with horizontal pod autoscaler and custom-metrics-stackdriver-adapter #318

HugoTigre · 2020-04-02T18:37:36Z

I'm currently using the Horizontal Pod Autoscaler (in Google Cloud) with custom metrics, so custom-metrics-stackdriver-adapter is installed from here.

The problem is that it's generating more than 10 log messages per second with the following errors:

jsonPayload: {
  message: "apiserver was unable to write a JSON response: http2: stream closed"   
  pid: "1"   
  source: "writers.go:172"   
 }

and

jsonPayload: {
  message: "apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http2: stream closed"}"   
  pid: "1"   
  source: "status.go:71"   
 }

The HPA is working as expected, so the volume of errors is very strange. I couldn't find a reason for it, nor could I find documentation on how to change this, or even how to change how often requests are made, either in the HPA or in this adapter.

HPA is configured as follows:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: xxx
  namespace: xxx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: xxx
  minReplicas: 2
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 40

Kubernetes version is: 1.15

Is there any reason for this? It looks like a bug.

Also this issue seems to be related: #303


JBodkin-LH · 2020-04-14T07:42:35Z

We've been seeing the same issue when using this adapter and autoscaling based on Pub/Sub undelivered messages.

msgongora · 2021-01-15T18:59:35Z

We're facing this issue as well; same use case.

lechen26 · 2021-02-14T11:37:30Z

Same here.

We have HPA on most of our services, based on a custom (external) metric.
GKE version v1.17.15-gke.800 and gcr.io/google-containers/custom-metrics-stackdriver-adapter:v0.8.0

It is working, but we get a lot of GKE events of this kind:
unable to fetch metrics from external metrics API: the server is currently unable to handle the request

The custom metrics adapter's log is pretty useless, as it's just FULL of the following:

apiserver was unable to write a JSON response: http2: stream closed
apiserver received an error that is not an metav1.Status: http2: stream closed

I've noticed that when custom-metrics-stackdriver-adapter is evicted and restarted, we get the "unable to handle the request" error. But even while it is just running, we get the errors every few hours or minutes. The HPA works, but I suspect it is not working as efficiently as it used to.

BTW, the same happens on another cluster:
GKE version v1.17.14-gke.1600 and gcr.io/google-containers/custom-metrics-stackdriver-adapter:v0.10.2

Any idea what's going on?
Thanks

lechen26 · 2021-02-24T10:42:55Z

Anything here?

trucolo · 2021-06-30T20:16:25Z

Same issue here.

We are trying to use HPA with the same metric as @JBodkin-LH and I'm getting a lot of those errors. The metrics seem to work fine, but that volume of error logs might hide other issues...

rajithavk · 2021-07-19T14:12:01Z

Is there a fix for this, or a way to silence these logs?
We've already run into a surge in costs due to this log spam.
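To put the reported rate in perspective: a steady 10 entries per second, as in the original post, is 25,920,000 entries per 30-day month. The sketch below does that arithmetic. The 30-day month and the average entry size are assumptions (the size in particular is a hypothetical input you would measure from your own entries), and it deliberately does not model Cloud Logging pricing.

```python
# Rough log-volume arithmetic for a constant error rate (a sketch, not a billing tool).
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30  # simplifying assumption: a 30-day month


def entries_per_month(entries_per_second: float) -> float:
    """Log entries written in one 30-day month at a constant rate."""
    return entries_per_second * SECONDS_PER_DAY * DAYS_PER_MONTH


def gib_per_month(entries_per_second: float, avg_entry_bytes: float) -> float:
    """Approximate volume in GiB per 30-day month.

    avg_entry_bytes is a hypothetical input: measure it from your own
    log entries rather than trusting any default.
    """
    return entries_per_month(entries_per_second) * avg_entry_bytes / 2**30


if __name__ == "__main__":
    rate = 10  # entries per second, as reported at the top of this issue
    print(f"{entries_per_month(rate):,.0f} entries per month")
    # 1 KiB per entry is an illustrative guess, not a measured value.
    print(f"{gib_per_month(rate, 1024):.1f} GiB per month at 1 KiB/entry")
```

With these inputs it prints 25,920,000 entries and about 24.7 GiB per month; swap in your own rate and measured entry size.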

masterlog80 · 2021-07-20T13:32:18Z

For Google Cloud, it's possible to set a log exclusion filter for a specific pattern:
https://cloud.google.com/logging/docs/exclusions
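Building on that suggestion, here is a sketch that assembles an exclusion filter for the two messages quoted in this issue, plus the `gcloud logging sinks update _Default --add-exclusion=...` command that would attach it to the `_Default` sink. Assumptions: the adapter runs in the `custom-metrics` namespace (as in the upstream manifest; managed products such as Cloud Composer may differ), the messages land in `jsonPayload.message` as in the payloads above, and your gcloud version supports `--add-exclusion` (check `gcloud logging sinks update --help`). The exclusion name is hypothetical, and the script only prints the command so you can review the filter before running it, since matching entries will no longer be stored in the default bucket.

```python
from __future__ import annotations

import shlex

# Message prefixes copied from the adapter errors quoted in this issue.
NOISY_MESSAGES = [
    "apiserver was unable to write a JSON response: http2: stream closed",
    "apiserver received an error that is not an metav1.Status",
]


def build_exclusion_filter(namespace: str) -> str:
    """Logging query matching the noisy adapter messages in one namespace.

    The namespace is an assumption ("custom-metrics" in the upstream
    manifest); managed products may deploy the adapter elsewhere.
    """
    # ':' is the Logging query "has" (substring) operator.
    matches = " OR ".join(f'jsonPayload.message:"{m}"' for m in NOISY_MESSAGES)
    return (
        'resource.type="k8s_container" '
        f'AND resource.labels.namespace_name="{namespace}" '
        f"AND ({matches})"
    )


def exclusion_command(name: str, log_filter: str) -> list[str]:
    """gcloud argv that adds an exclusion filter to the _Default sink."""
    # gcloud reads this flag's value as comma-separated key=value pairs,
    # so this sketch refuses filters that contain commas.
    if "," in log_filter:
        raise ValueError("filter must not contain commas in this sketch")
    return [
        "gcloud", "logging", "sinks", "update", "_Default",
        f"--add-exclusion=name={name},filter={log_filter}",
    ]


if __name__ == "__main__":
    cmd = exclusion_command(
        "drop-adapter-stream-closed",  # hypothetical exclusion name
        build_exclusion_filter("custom-metrics"),
    )
    print(shlex.join(cmd))  # review before running it yourself
```

Restricting the filter to the adapter's namespace keeps the exclusion from hiding "http2: stream closed" messages that other workloads might emit.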

bboykk1234 · 2021-07-26T11:53:59Z

Same issue here.

We set up the HPA following this guide: https://cloud.google.com/kubernetes-engine/docs/tutorials/autoscaling-metrics

Any idea what's going on?

shpml · 2021-12-17T00:57:21Z

Any update?

Gwojda · 2022-01-10T18:40:37Z

Update?

eric3chang · 2022-01-21T21:59:47Z

I'm also running into this issue :-(

asychev · 2022-01-26T12:29:19Z

Same for us. Any reaction from maintainers?

brianpham · 2022-02-02T21:16:18Z

I am seeing the same issue as well, following the guide found here: https://github.com/GoogleCloudPlatform/k8s-stackdriver/tree/master/custom-metrics-stackdriver-adapter. Has anyone figured out a way to fix the error messages above, or is this something we can ignore?

Running v1.21.5-gke.1302 for the control plane and nodes, with Workload Identity enabled.

stenicke · 2022-04-22T13:53:56Z

Same for me. Any update?

naxo8628 · 2022-10-04T09:43:36Z

+1

muscovitebob · 2023-01-09T08:39:49Z

We're getting surprise Cloud Logging bills from this log spam, except that in our case the adapter is auto-deployed as part of Cloud Composer.

kwiesmueller · 2023-01-09T15:43:13Z

@muscovitebob please reach out to cloud support for any issues caused by a managed product and related billing issues.

In general, when managing this component yourself, check your adapter's memory utilization. If it is running close to the memory limit, this can be a symptom.
Also check the resources provided to the adapter in general and see whether increasing them reduces the frequency of these errors (feel free to share learnings here).

If you are not seeing any data reaching the apiserver from the component, checking your networking rules/firewalls can also help find what is causing traffic to get lost. Often these errors just mean the adapter cannot respond in time, or at all.
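The first check above, whether the adapter runs close to its memory limit, can be scripted. This is a hypothetical helper, not part of the adapter or of kubectl: it compares a usage value (for example the MEMORY column of `kubectl top pod` in the adapter's namespace) with the container's `resources.limits.memory`. The 0.9 threshold is an arbitrary stand-in for "close to the memory limit", and the parser handles only the common subset of Kubernetes quantity suffixes.

```python
import re

# Multipliers for the memory suffixes most often seen in pod specs and
# `kubectl top` output. This is only a subset of the Kubernetes quantity
# format (no exponents, no milli-units).
_MULTIPLIERS = {
    "": 1,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
}
_QUANTITY = re.compile(r"^(\d+(?:\.\d+)?)(Ki|Mi|Gi|Ti|k|M|G|T)?$")


def parse_memory(quantity: str) -> float:
    """Convert a memory quantity such as '200Mi' or '1G' to bytes."""
    match = _QUANTITY.match(quantity.strip())
    if match is None:
        raise ValueError(f"unsupported memory quantity: {quantity!r}")
    number, suffix = match.groups()
    return float(number) * _MULTIPLIERS[suffix or ""]


def near_memory_limit(usage: str, limit: str, threshold: float = 0.9) -> bool:
    """True when usage is at or above `threshold` (a fraction) of the limit."""
    return parse_memory(usage) / parse_memory(limit) >= threshold


if __name__ == "__main__":
    # Example values only: take usage from `kubectl top pod` and the limit
    # from the adapter container's resources.limits.memory.
    print(near_memory_limit("185Mi", "200Mi"))
```

With the example values (185Mi used of a 200Mi limit, 92.5%) it prints `True`. If the check trips regularly, raising the adapter's memory limit is the experiment suggested above.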

alina-bylkova · 2023-01-19T11:30:53Z

Same issue with adapter version gcr.io/gke-release/custom-metrics-stackdriver-adapter:v0.13.1-gke.0 and Kubernetes version 1.23.14-gke.401.

don-toptal · 2023-03-29T17:04:10Z

+1

Ture2019 · 2023-05-25T14:28:45Z

Hi,
we experience the same issue in two different environments. This produces ~10,000 error messages per hour, which drowns out any useful error message and causes higher than necessary costs. It's quite an important issue, so to speak. It's quite disappointing to see that it has not been solved in 2½ years and is not prioritised higher!
Our workaround is to go back to Composer version 1. We are happy to provide more information if anybody is willing to take on this issue.

"old prod env":
image version: composer-2.0.7-airflow-2.2.3
k8s version: 1.24.11-gke.1000
stackdriver version: gcr.io/gke-release/custom-metrics-stackdriver-adapter:v0.12.2-gke.0

"new prod env":
image version: composer-2.2.0-airflow-2.5.1
k8s version: 1.25.8-gke.500
stackdriver version: gcr.io/gke-release/custom-metrics-stackdriver-adapter:v0.13.1-gke.0

Steps to reproduce the issue:

Grant access:

gcloud projects add-iam-policy-binding kolumbus-atl-prod \
    --member=serviceAccount:service-123456789@cloudcomposer-accounts.iam.gserviceaccount.com \
    --role=roles/composer.ServiceAgentV2Ext

Create environment:

gcloud composer environments create kolumbus-composer5 \
  --location=europe-west1 \
  --image-version=composer-2.2.0-airflow-2.5.1 \
  --environment-size=small \
  --maintenance-window-start='2023-05-25T17:30:00Z' \
  --maintenance-window-end='2023-05-25T21:30:00Z' \
  --maintenance-window-recurrence='FREQ=DAILY'

Open Logs Explorer and filter by severity Error.

davidxia · 2023-05-25T16:55:32Z

I don't think GCP teams look at, are notified of, or maybe even care about GitHub.com comments and issues. The most effective way to get them to fix things is to create a partner issue on their internal tracking system, or a GCP support case if you're a paying GCP customer (paying more money correlates with faster response times), and link back to this issue.

rajithavk · 2023-05-25T17:20:09Z

Well, I have bad news: they don't care even if you pay them 🤧 GCP seems to be losing to the other two key players.


Ture2019 · 2023-05-26T11:21:37Z

I added a note to the corresponding composer bug report:
https://issuetracker.google.com/issues/159171905
Please upvote and comment there, too.

HugoTigre changed the title ~~log spamming~~ log spamming with horizontal pod autoscaler and custom-metrics-stackdriver-adapter Apr 2, 2020

serathius mentioned this issue May 4, 2020: custom metrics not work(http2: stream closed) prometheus-operator/kube-prometheus#383 (open)

rhodgkins mentioned this issue Sep 7, 2021: API the server is currently unable to handle the request #405 (open)

asychev mentioned this issue Jan 26, 2022: Problem with GKE inside private VPC #404 (open)

sosimon mentioned this issue Jul 26, 2023: Custom metrics adapter spewing errors "apiserver was unable to write a fallback JSON response: http2: stream closed" #510 (open)
