log spamming with horizontal pod autoscaler and custom-metrics-stackdriver-adapter #318

Open
HugoTigre opened this issue Apr 2, 2020 · 23 comments

Comments

@HugoTigre

HugoTigre commented Apr 2, 2020

I'm currently using the Horizontal Pod Autoscaler (in Google Cloud) with custom metrics, so custom-metrics-stackdriver-adapter is installed from here

The problem is that it's generating more than 10 log messages a second with the following errors:

jsonPayload: {
  message: "apiserver was unable to write a JSON response: http2: stream closed"   
  pid: "1"   
  source: "writers.go:172"   
 }

and

jsonPayload: {
  message: "apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http2: stream closed"}"   
  pid: "1"   
  source: "status.go:71"   
 }

The HPA is working as expected, so the number of errors is very strange. I couldn't find a reason for it, nor could I find documentation on how to change this, or even how to change the request frequency, either in the HPA or in this adapter.

HPA is configured as follows:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: xxx
  namespace: xxx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: xxx
  minReplicas: 2
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 40

Kubernetes version is: 1.15

Is there any reason for this? It looks like a bug.

Also this issue seems to be related: #303

@HugoTigre changed the title from "log spamming" to "log spamming with horizontal pod autoscaler and custom-metrics-stackdriver-adapter" on Apr 2, 2020

@JBodkin-LH

JBodkin-LH commented Apr 14, 2020

We've been seeing the same issue when using this adapter and autoscaling based on pubsub undelivered messages

@msgongora

We are facing this issue as well, same use case.

@lechen26

lechen26 commented Feb 14, 2021

same here.

we have HPA on most of our services based on custom metric (external).
GKE version v1.17.15-gke.800 and gcr.io/google-containers/custom-metrics-stackdriver-adapter:v0.8.0

It is working, but we get a lot of errors in GKE events of this kind:
unable to fetch metrics from external metrics API: the server is currently unable to handle the request

The custom metrics adapter log is not very useful, as it is just FULL of the following:

apiserver was unable to write a JSON response: http2: stream closed
apiserver received an error that is not an metav1.Status: http2: stream closed

I've noticed that once the custom-metrics-stackdriver-adapter was evicted and restarted, we got the "unable to handle the request" error. But even when it is just running, we get the errors every few hours or minutes. The HPA works, but I suspect it is not working as efficiently as it used to.

BTW, same happens on another cluster
GKE version v1.17.14-gke.1600 and gcr.io/google-containers/custom-metrics-stackdriver-adapter:v0.10.2

any idea what's going on?
thanks

@lechen26

anything here?

@trucolo

trucolo commented Jun 30, 2021

same issue here

We are trying to use HPA with the same metric as @JBodkin-LH and I'm getting a lot of those errors. It seems the metrics are working fine, but that amount of error logs might hide other issues...

@rajithavk

Is there a fix for this or a way to silence these logs?
we've already run into a surge in costs due to this spamming issue.

@masterlog80

For Google Cloud, it's possible to set Logs Exclusion for a specific pattern:
https://cloud.google.com/logging/docs/exclusions
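
A minimal sketch of what such an exclusion pattern might look like, assuming the adapter's logs arrive as `k8s_container` entries with the two messages quoted earlier in this thread in `jsonPayload.message`. The container name used below is an assumption and is not confirmed anywhere in this thread, so check your own deployment. The helper `build_exclusion_filter` is hypothetical; it only assembles a filter string in the Logging query language, which could then be pasted into an exclusion.

```python
# Sketch: build a Cloud Logging filter matching only the noisy adapter messages.
# ASSUMPTION: the container name below is a guess; replace it with the name
# used in your own adapter deployment.
ADAPTER_CONTAINER = "pod-custom-metrics-stackdriver-adapter"

# Substrings taken from the two messages reported in this issue.
# They contain no double quotes, so no escaping is done here.
NOISY_SUBSTRINGS = [
    "apiserver was unable to write a JSON response: http2: stream closed",
    "apiserver received an error that is not an metav1.Status",
]


def build_exclusion_filter(container_name, substrings):
    """Return a Logging query string that matches the given message substrings."""
    # ':' is the substring ("has") operator in the Logging query language.
    message_clauses = " OR ".join(
        f'jsonPayload.message:"{s}"' for s in substrings
    )
    return (
        'resource.type="k8s_container" AND '
        f'resource.labels.container_name="{container_name}" AND '
        f"({message_clauses})"
    )


if __name__ == "__main__":
    print(build_exclusion_filter(ADAPTER_CONTAINER, NOISY_SUBSTRINGS))
```

An exclusion like this only keeps the matching entries out of log storage; it does not address whatever is closing the streams.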

@bboykk1234

same issue here

We set up the HPA following this guide: https://cloud.google.com/kubernetes-engine/docs/tutorials/autoscaling-metrics

any idea what's going on?

@shpml

shpml commented Dec 17, 2021

Any update?

@Gwojda

Gwojda commented Jan 10, 2022

Update ?

@eric3chang

I'm also running into this issue :-(

@asychev

asychev commented Jan 26, 2022

Same for us. Any reaction from maintainers?

@brianpham

I am seeing the same issue as well after following the guide found here: https://github.com/GoogleCloudPlatform/k8s-stackdriver/tree/master/custom-metrics-stackdriver-adapter. Has anyone figured out a way to fix the error messages above, or is this something we can ignore?

Running v1.21.5-gke.1302 for control plane and nodes with workload identity enabled.

@stenicke

Same for me. Any update?

@naxo8628

naxo8628 commented Oct 4, 2022

+1

@muscovitebob

Getting surprise spam cloud logging bills from this issue, except this is autodeployed as part of Cloud Composer.

@kwiesmueller
Contributor

@muscovitebob please reach out to cloud support for any issues caused by a managed product and related billing issues.

In general, when managing this component yourself, check your adapter's memory utilization. If it is running close to the memory limit, this can be a symptom.
Also check the resources provided to the adapter in general and see if increasing them reduces the frequency of these errors (feel free to share learnings here).

If you are not seeing any data reaching the apiserver from the component, checking your networking rules/firewalls can also help to find what is causing traffic to get lost. Often these errors just mean the adapter cannot respond in time or at all.
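
A rough sketch of the memory check suggested above, assuming you have the adapter's current memory usage (for example as reported by `kubectl top pod`) and its configured memory limit as Kubernetes quantity strings. The 0.9 threshold, the function names, and the sample values are illustrative assumptions, and the parser handles only a subset of quantity suffixes.

```python
# Sketch: flag an adapter pod whose memory usage is close to its limit.
# The threshold and sample values are illustrative, not recommendations.

# Subset of the binary and decimal suffixes used by Kubernetes memory quantities.
_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3,
    "k": 1000, "M": 1000**2, "G": 1000**3,
}


def parse_memory(quantity):
    """Convert a quantity such as '200Mi' or '512M' to bytes (simplified)."""
    # Try two-character suffixes before one-character ones.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * _SUFFIXES[suffix])
    return int(quantity)  # no suffix: plain bytes


def memory_utilization(usage, limit):
    """Return usage as a fraction of the limit."""
    return parse_memory(usage) / parse_memory(limit)


def near_limit(usage, limit, threshold=0.9):
    """True when usage is at or above the given fraction of the limit."""
    return memory_utilization(usage, limit) >= threshold


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(near_limit("190Mi", "200Mi"))
    print(near_limit("100Mi", "200Mi"))
```

If the check trips regularly, raising the adapter's memory limit and watching whether the error frequency drops is one way to test the suggestion.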

@alina-bylkova

alina-bylkova commented Jan 19, 2023

Same issue with stackdriver version gcr.io/gke-release/custom-metrics-stackdriver-adapter:v0.13.1-gke.0 and k8s version 1.23.14-gke.401

@don-toptal

+1

@Ture2019

Hi,
we experience the same issue in two different environments. This produces ~10,000 error messages per hour. This drowns out any useful error message and causes higher than necessary costs. Quite an important issue, so to say. It is quite disappointing to see that it has not been solved in 2 1/2 years and has not been given higher priority!
The workaround for our application is to go back to Composer version 1. We are happy to provide more information if anybody is willing to take on this issue.
"old prod env":

  • image version: composer-2.0.7-airflow-2.2.3
  • k8s version: 1.24.11-gke.1000
  • stackdriver version: gcr.io/gke-release/custom-metrics-stackdriver-adapter:v0.12.2-gke.0

"new prod env":

  • image version: composer-2.2.0-airflow-2.5.1
  • k8s version: 1.25.8-gke.500
  • stackdriver version: gcr.io/gke-release/custom-metrics-stackdriver-adapter:v0.13.1-gke.0

Steps to reproduce the issue:

  1. Grant access:
gcloud projects add-iam-policy-binding kolumbus-atl-prod \
    --member=serviceAccount:service-123456789@cloudcomposer-accounts.iam.gserviceaccount.com \
    --role=roles/composer.ServiceAgentV2Ext
  2. Create environment:
gcloud composer environments create kolumbus-composer5 \
  --location=europe-west1 \
  --image-version=composer-2.2.0-airflow-2.5.1 \
  --environment-size=small \
  --maintenance-window-start='2023-05-25T17:30:00Z' \
  --maintenance-window-end='2023-05-25T21:30:00Z' \
  --maintenance-window-recurrence='FREQ=DAILY'
  3. Look at Logs Explorer, filter by Error.

@davidxia

I don't think GCP teams look at or are notified of GitHub.com comments and issues, or maybe they just don't care about them. The most effective way to get them to fix things is to create a partner issue on their internal tracking system, or a GCP support case if you're a paying GCP customer (paying more money correlates with faster response time), and to link back to this issue.

@rajithavk

rajithavk commented May 25, 2023 via email

@Ture2019

I added a note to the corresponding composer bug report:
https://issuetracker.google.com/issues/159171905
Please upvote and comment there, too.
