vmagent: drop corrupted data blocks on start up #1030
Comments
Hi @jelmd! The number of bytes |
Hi @hagen1778, well, the users on the affected machines run experimental software which sometimes consumes all available memory (I installed vmagent exactly to monitor this ;-)), and the machine then needs to be rebooted because no process can make any progress anymore. Not sure what vmagent does in such situations, but at least I can say that the data series have gaps. The filesystem is ZFS, so it should always be consistent thanks to its transaction-based design.
Are these "normal" reboots (with SIGTERM sent to all processes) or "power-off" reboots?
So more like hitting the reset button of the machine.
Hm, while VM itself is resistant for |
Ah, ok. It would be nice to have as much data as possible to be able to analyze "frozen" boxes like this. But making it too busy is bad as well ;-). What is much more annoying is that vmagent seems to never retry unavailable prom scrape targets. E.g. if the machine gets rebooted and poor systemd starts vmagent before e.g. node-exporter, it seems to do nothing forever. So I need to restart vmagent manually to get any data again. But I guess it is better to open another issue for that?
Yes, please open another issue. Thanks!
Hmm, it really, really, really sucks that vmagent is not able to recover properly. It is of no use if we have to restart it manually after each reboot. Data collectors especially should be very robust to avoid losing data. vmagent seems to be very fragile =:-((((
As a workaround, have you tried adding a startup script that drops the vmagent queue after a reboot?
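For illustration, such a startup script might look like the minimal sketch below. It assumes the queue lives in a `persistent-queue` subdirectory of the directory passed to vmagent via `-remoteWrite.tmpDataPath`, and that this directory is `vmagent-remotewrite-data` unless overridden; check both against your own setup. The function name `drop_persistent_queue` is hypothetical. Note that this throws away any samples still buffered on disk, so it trades data for a clean start.

```python
import shutil
import sys
from pathlib import Path


def drop_persistent_queue(tmp_data_path: str) -> bool:
    """Delete the on-disk queue directory under tmp_data_path.

    Returns True if a queue directory existed and was removed,
    False if there was nothing to remove.
    """
    # ASSUMPTION: the queue is stored in "<tmp_data_path>/persistent-queue".
    queue_dir = Path(tmp_data_path) / "persistent-queue"
    if not queue_dir.is_dir():
        return False
    shutil.rmtree(queue_dir)
    return True


if __name__ == "__main__":
    # ASSUMPTION: default directory name; pass your own path as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "vmagent-remotewrite-data"
    removed = drop_persistent_queue(path)
    print(f"{path}: queue {'removed' if removed else 'not found'}")
```

It would have to run before vmagent starts, for example from the same unit or init script that launches vmagent, and only after an unclean shutdown if the buffered data matters.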
Hmm, not sure what you mean. IMHO good apps handle errors and do not just panic and put all the burden on the user, who actually (at least in my case) has no clue whether it is really time to run away and panic as well ... ;-)
Agreed. We need to figure out how to handle the error properly. A temporary solution is to run |
The related issue - #687
There seems to be a problem with how vmagent v1.52.0 handles cached data:
The vmagent scrapes:
so 63720000 data pairs a day (~ 26 MB?).
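As a quick sanity check on those figures, the daily total can be converted into a per-second rate and an implied size per data pair. This is only arithmetic on the two numbers quoted above; it assumes "MB" means 10^6 bytes and takes the "~ 26 MB" estimate at face value.

```python
SAMPLES_PER_DAY = 63_720_000       # data pairs per day, from the report
SECONDS_PER_DAY = 24 * 60 * 60     # 86400

# Average ingestion rate implied by the daily total.
samples_per_second = SAMPLES_PER_DAY / SECONDS_PER_DAY

# Implied size per data pair if a day really takes about 26 MB.
# ASSUMPTION: MB = 10**6 bytes.
bytes_per_sample = 26 * 10**6 / SAMPLES_PER_DAY

print(f"{samples_per_second} samples/s, {bytes_per_sample:.2f} bytes/sample")
```

That works out to 737.5 data pairs per second and roughly 0.41 bytes per pair.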