Prometheus raises out of bounds error for all targets after resuming the Linux system from suspend #8243
Comments
This sounds like time getting mixed up inside of Go. Generally, if the system clock changes, things are going to break and there's not much we can do about it.
It is not; I think we only change the valid end times in the TSDB when we reload the head. It is relative to the head, not to time.Now().
That's not right: the head only has a minimum timestamp for bounds checking, no maximum. The maximum is a safety check in the scrape code.
Can you share the history of prometheus_tsdb_head_max_time_seconds and prometheus_tsdb_head_min_time_seconds, and give a timeline of the suspend? Can you confirm that this doesn't just affect one scrape after the suspend, but is ongoing?
FWIW, same issue on macOS: I'm experimenting with Prometheus in a container on my Mac, so it goes to sleep at night, and when it wakes up, I get this error too. What's even more problematic is that the issue persists even after stopping and restarting the container: some information seems to be corrupted in storage and survives restarts.
I am having the exact same issue:
The hosts Below are the docker logs from the container:
Same issue when running in minikube on macOS. It would be nice to fix this, as it would make it easier to experiment with the Prometheus stack when developing in local clusters.
Same issue experienced on Ubuntu 20.04 running Prometheus in Docker for a development environment that is suspended outside office hours. A restart of the container seems to have fixed it this time.
I experience the same issue (in various environments). It looks like restarting Prometheus helps, but it's not clear how big the time jump must be to trigger the problem.
I'm hopeful the linked PR #8601 will fix this. (Note: if you want to test it, you likely need to suspend for >10 minutes.)
@dgl Thanks
This will be released in the Prometheus 2.26 series. A workaround if you run an earlier version is to use the
I have changed my mind and released v2.25.2 with this fix. Thanks all!
I'm on 2.26.0 and got that issue on Windows 10 after updating time.
If you shift time backwards, this is expected. I don't think there is a reasonable way for us to deal with that.
Yes, it was a backward change (a BIOS update for some reason kicked the time 4h ahead). Once the real time passed the incorrectly set time, data started coming in normally again.
What did you do?
After suspending the system and resuming it, Prometheus reports the following error and cannot scrape any new metrics unless the Prometheus service is restarted.
What did you expect to see?
Prometheus should continue to scrape new metrics.
What did you see instead? Under which circumstances?
Check the log above. In the web UI, I got the following:
Environment