No alerts are triggered when scraping job times out #126

Open
przemeklal opened this issue Dec 5, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

@przemeklal
Member

In a real-life deployment, scraping hardware-exporter may take more than 10s, e.g.

time curl localhost:10000/metrics
...
real	0m25.534s
user	0m0.008s
sys	0m0.000s

By default, grafana-agent's scrape_timeout is 10s. If a scrape takes longer than that, it fails silently: no alert fires for the missing metrics, and any alerts that were already firing get resolved, because grafana-agent pushes nothing and Prometheus has no metrics to evaluate.

As a reliability engineer, I didn't know this was happening until I ran queries in Prometheus to check whether the metrics were there.

The workaround is to bump global_scrape_timeout in the grafana-agent juju config to something like 60s, so that the scrape jobs actually complete instead of timing out.
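To pick a value for the timeout, it may help to time a few scrapes and compare them against the candidate setting, much like the curl measurement above. A minimal sketch, assuming the exporter serves metrics at the URL from the example; `fetch_metrics`, `scrape_duration`, and `exceeds_timeout` are hypothetical helper names, not part of grafana-agent or the exporter:

```python
import time
import urllib.request
from typing import Callable


def fetch_metrics(url: str, read_timeout_s: float = 120.0) -> bytes:
    """Fetch the metrics page, waiting well beyond the agent's scrape timeout."""
    with urllib.request.urlopen(url, timeout=read_timeout_s) as resp:
        return resp.read()


def scrape_duration(fetch: Callable[[], bytes]) -> float:
    """Return how long one call to fetch() takes, in seconds."""
    start = time.monotonic()
    fetch()
    return time.monotonic() - start


def exceeds_timeout(duration_s: float, scrape_timeout_s: float = 10.0) -> bool:
    """True if a scrape of this duration would be cut off at the given timeout.

    The 10s default mirrors the default scrape_timeout mentioned above.
    """
    return duration_s > scrape_timeout_s
```

For instance, `exceeds_timeout(scrape_duration(lambda: fetch_metrics("http://localhost:10000/metrics")))` would report whether that scrape fits within the default 10s. The 25.5s scrape measured above would not fit, but it would fit within 60s.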

@Pjack added the enhancement and bug labels and removed the enhancement label on Dec 5, 2023
@jneo8
Contributor

jneo8 commented Dec 8, 2023

We can try adding a timeout mechanism to the exporter, so that it exposes a metric when collection times out.
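A rough sketch of what such a mechanism could look like, assuming the exporter gathers its values through a single callable. The names `collect_with_timeout` and `collector_timed_out` are hypothetical and not taken from hardware-exporter's code:

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Dict


def collect_with_timeout(
    collect: Callable[[], Dict[str, float]], timeout_s: float
) -> Dict[str, float]:
    """Run collect() with a time budget and always return a timeout flag.

    On success the collected values are returned with collector_timed_out=0.
    If the budget is exceeded, only collector_timed_out=1 is returned, so the
    scrape still succeeds and an alert can be written against the flag.
    """
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(collect)
    try:
        metrics = dict(future.result(timeout=timeout_s))
        metrics["collector_timed_out"] = 0.0
    except FutureTimeout:
        # The worker thread is not killed; it keeps running in the background.
        metrics = {"collector_timed_out": 1.0}
    finally:
        # Do not block the response on a collection that is still running.
        executor.shutdown(wait=False)
    return metrics
```

With a budget below the agent's scrape_timeout, the scrape would still return in time and the flag could be alerted on. The sketch does not cancel a slow collection, so a real implementation would need to deal with overlapping runs.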

@Pjack added the enhancement label and removed the bug label on Dec 26, 2023