Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add scrapeTimeout as global configurable parameter #3250

Merged
merged 1 commit into from
Jun 9, 2020
Merged

Add scrapeTimeout as global configurable parameter #3250

merged 1 commit into from
Jun 9, 2020

Conversation

carlosedp
Copy link
Contributor

Add global configurable scrapeTimeout parameter to allow monitoring
targets on clusters consisting of slower hosts like Raspberry Pi and
many ARM boards used for labs.

Many times the 10s default timeout is not enough for targets like kube-controller or kube-scheduler so some targets get a "Context deadline exceeded" error.

Signed-off-by: Carlos de Paula me@carlosedp.com

@carlosedp
Copy link
Contributor Author

Works fine on my ARM cluster:

Jsonnet:

image

Config:

image

@brancz
Copy link
Contributor

brancz commented May 27, 2020

lgtm on green

@carlosedp
Copy link
Contributor Author

Tests seem flaky. Fails seems not related to the change as I looked into the PR test runs and lots fail around TestAllNS, maybe it's timing-out. Any idea?

@brancz
Copy link
Contributor

brancz commented May 29, 2020

hmm I kicked the jobs a couple of times, I think there is actually something not working

@pgier
Copy link
Contributor

pgier commented Jun 2, 2020

It seems be consistently failing on TestAllNS/y/PromGetAuthSecret. Maybe try increasing the scrape timeout for this test?

Copy link
Contributor

@lilic lilic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! One comment, otherwise can you adjust the test as Paul said, thanks!

@@ -177,6 +177,11 @@ func (cg *configGenerator) generateConfig(
scrapeInterval = p.Spec.ScrapeInterval
}

scrapeTimeout := "10s"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whats the reason for the 10seconds here, might have missed something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the default timeout. Even changing to 30s the same test fails. I don't think it is related to the change.

@carlosedp
Copy link
Contributor Author

I've reverted the timeout to 10s (the default Prometheus scrape_timeout) and added a 60s timeout for the failing test.

Add global configurable scrapeTimeout parameter to allow monitoring
targets on clusters consisting of slower hosts like Raspberry Pi and
many ARM boards used for labs.

Signed-off-by: Carlos de Paula <me@carlosedp.com>
@carlosedp
Copy link
Contributor Author

Now we're good to go. Better and cleaner implementation, also Travis is green!
Thanks for the patience!

Copy link
Member

@paulfantom paulfantom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@brancz brancz merged commit 0cc5289 into prometheus-operator:master Jun 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants