Metrics become very large over time on Kubernetes platform #157
Comments
cc @adinhodovic
Hi, you can always provide custom relabeling configs. Here's an example and quick hotfix that drops the hostname label:

```yaml
serviceMonitor:
  enabled: true
  relabelings:
    - action: "labeldrop"
      regex: "hostname"
```

We could maybe provide some regex that rewrites the hostname label to remove the randomly generated pod suffix: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config
Hi @adinhodovic! I am not sure whether I should create a separate issue, but I would like to discuss your reply. As you mentioned, there is a workaround to reduce cardinality using relabelings, but what happens to metrics like celery_worker_up if we drop the hostname label? I tried to reproduce this behaviour using metric_relabel_configs in a local environment, and as far as I can see we fetch either 1 or 0 as the value of celery_worker_up, since the metric is no longer unique per worker. Here is a config I used:
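A minimal sketch of such a scrape config, assuming the exporter is scraped directly (9808 is assumed as the exporter's default port; adjust if yours differs):

```yaml
scrape_configs:
  - job_name: "celery-exporter"
    static_configs:
      - targets: ["localhost:9808"]
    metric_relabel_configs:
      # Drop the hostname label from all scraped samples.
      - action: "labeldrop"
        regex: "hostname"
```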
As far as I can see, one possible way to handle this issue is to implement some logic that clears outdated metrics stored in the exporter's memory (in my opinion, when a worker goes offline, all of its metrics become outdated and useless to scrape), i.e. when a worker goes offline, we remove all metrics whose hostname label equals that worker's hostname. Does this solution make any sense from your perspective?
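A minimal sketch of that pruning idea, assuming a Celery app whose event stream we can consume and a prometheus_client metric; the broker URL and the gauge are placeholders, not the exporter's actual internals:

```python
from celery import Celery
from prometheus_client import Gauge, start_http_server

# Placeholder broker URL - point this at your real broker.
app = Celery(broker="redis://localhost:6379/0")
worker_up = Gauge("celery_worker_up", "Worker online status", ["hostname"])

def on_worker_alive(event):
    # Mark the worker as online on worker-online/heartbeat events.
    worker_up.labels(hostname=event["hostname"]).set(1)

def on_worker_offline(event):
    hostname = event["hostname"]
    try:
        # Drop the whole label set so the stale series stops being exposed,
        # instead of merely setting it to 0 and keeping the series alive.
        worker_up.remove(hostname)
    except KeyError:
        pass  # never saw this worker online

if __name__ == "__main__":
    start_http_server(9808)  # expose /metrics
    with app.connection() as connection:
        receiver = app.events.Receiver(connection, handlers={
            "worker-online": on_worker_alive,
            "worker-heartbeat": on_worker_alive,
            "worker-offline": on_worker_offline,
        })
        receiver.capture(limit=None, timeout=None)
```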
Yep, metric relabelings were just a quick hotfix with downsides - I think other metrics get squashed as well. In general we'd lean towards the solution you mentioned. Maybe we could introduce a flag like Flower has: https://flower.readthedocs.io/en/latest/config.html#purge-offline-workers I guess the best temporary workaround is to create a StatefulSet, which usually has fixed host names (celery-worker-0, celery-worker-1, celery-worker-2).
Use StatefulSets, which recycle hostnames: celery-worker-0, celery-worker-1, celery-worker-2, and so on.
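A minimal sketch of such a StatefulSet; the image, command, and app module are placeholders. Pods get stable names (celery-worker-0, celery-worker-1, ...), so worker hostnames survive restarts:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: celery-worker
spec:
  clusterIP: None  # headless Service, required by the StatefulSet
  selector:
    app: celery-worker
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: celery-worker
spec:
  serviceName: celery-worker
  replicas: 3
  selector:
    matchLabels:
      app: celery-worker
  template:
    metadata:
      labels:
        app: celery-worker
    spec:
      containers:
        - name: worker
          image: myapp:latest  # placeholder image
          # Placeholder Celery app module; the stable pod hostname becomes
          # part of the worker name, e.g. celery@celery-worker-0.
          command: ["celery", "-A", "myapp", "worker"]
```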
Why would a new hostname be generated for a worker if a Celery task is called 🤔?
@kittywaresz @iqbalaydrus The newest release will prune metrics for a worker that goes offline, after 10 minutes by default (adjustable). This should result in far fewer active time series.
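For reference, enabling the pruning might look roughly like this; the flag name, value, and image tag below are assumptions, so verify against the exporter's `--help` and the chart's values after upgrading:

```yaml
containers:
  - name: celery-exporter
    image: danihodovic/celery-exporter:latest  # tag is an assumption
    args:
      - "--broker-url=redis://redis:6379/0"   # placeholder broker
      - "--purge-offline-worker-metrics=600"  # assumed flag name; prune after 10 min
```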
Installed on Kubernetes via the Helm chart provided in this repo.
I see the metrics endpoint also has a hostname label. The thing with Kubernetes is, the hostname has a randomly generated suffix if you use a Deployment resource, so every restart/update of the pods generates a new hostname.
I'm also using Kubernetes' CronJob to call Celery tasks. This also generates a new hostname every time a job is called.
And now the Grafana dashboard load time worsens as time goes by. Do you know any approach I can take to tackle this issue?
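A quick way to watch the churn this causes, using the celery_worker_up metric mentioned above - count how many distinct hostname values Prometheus is tracking:

```promql
count(count by (hostname) (celery_worker_up))
```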