einhorn should monitor errors on upgrade #2
Comments
@gdb: At least from a monitoring perspective, what do you think about adding a last_upgraded element to einhorn's state? That way you can have a monitoring script that warns if any child isn't the current version once Doesn't solve the underlying problem of getting in this stuck state, but at least gives you an easy way to automatically detect when you should be worried about it.
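The monitoring check suggested above could look roughly like the following. This is only a sketch: the state layout (a `:version` key, a `:last_upgraded` Time, and a `:children` hash keyed by pid) and the `grace` period are assumptions made for illustration, not einhorn's confirmed schema.

```ruby
# Sketch of an external check for children stuck on an old version.
# ASSUMPTION: `state` is a Hash shaped like
#   { version: Integer, last_upgraded: Time or nil,
#     children: { pid => { version: Integer } } }
# This layout is hypothetical; last_upgraded is only proposed in this thread.
def stale_children(state, now: Time.now, grace: 60)
  upgraded_at = state[:last_upgraded]
  # No upgrade recorded, or the upgrade is still within its grace period.
  return [] if upgraded_at.nil? || now - upgraded_at < grace

  # Any child not on the current version after the grace period is suspect.
  state[:children].select { |_pid, child| child[:version] != state[:version] }.keys
end
```

A cron job or Nagios-style probe could call `stale_children` on the fetched state and alert whenever the returned list of pids is non-empty.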
Yeah, I like the last_upgraded idea. I think adding some sort of flapping detection would also be good, though it's unclear to me exactly what the semantics should be and how it should be reported. I guess you could do something along the lines of watching for workers dying pre-ACK and then putting the result in your state. Depends on exactly what we want to defend against, I guess.
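One possible reading of "watching for workers dying pre-ACK" is a sliding-window counter. The sketch below is not einhorn code; the class name, the threshold, and the window length are all hypothetical, and the semantics are exactly the part this comment calls unclear.

```ruby
# Sketch of flapping detection: count worker deaths that happen before
# the worker ACKs, within a sliding time window.
# ASSUMPTION: the class name, threshold, and window are illustrative only.
class FlapDetector
  def initialize(threshold: 5, window: 60)
    @threshold = threshold # pre-ACK deaths needed to count as flapping
    @window = window       # seconds of history to consider
    @deaths = []
  end

  # Call whenever a worker exits; only pre-ACK deaths are recorded.
  def record_death(acked:, at: Time.now)
    return if acked
    @deaths << at
    # Drop deaths that have aged out of the window.
    @deaths.reject! { |t| at - t > @window }
  end

  def flapping?
    @deaths.size >= @threshold
  end
end
```

The result of `flapping?` could then be stored in the state so that an external monitor sees it alongside last_upgraded.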
What's the right way for me to get the state in a machine-inspectable way over the control socket? The 'state' command gives me a string from .inspect. Am I expected to eval that or something? (That would sketch me out, but it doesn't matter, because it also doesn't seem to work: Time objects, for example, aren't eval'able.)
I can see a few possible answers:
I think the third one is superior, though it has the disadvantage of not exposing type information (e.g. symbol keys vs string keys) as well as .inspect does. That's mostly useful for debugging, and so maybe there should be an inspect_state command or something.
Hmm, JSON on its own doesn't seem to be sufficient here. In
We could pre- and post-process the state dict to coerce timestamps to I think I'd lean towards using YAML, and probably hide it under a |
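For comparison, the JSON pre-processing route might look like the sketch below. The target representation (ISO 8601 strings for Time, plain strings for symbols) is an assumption chosen for illustration, and `jsonable` is a hypothetical helper name. It also shows the cost mentioned earlier in the thread: symbol keys and Time values lose their type on the way out.

```ruby
require 'json'
require 'time'

# Sketch: recursively coerce a state Hash into JSON-friendly values.
# ASSUMPTION: Time becomes an ISO 8601 UTC string and Symbols become
# Strings; the thread does not settle on a representation.
def jsonable(value)
  case value
  when Hash   then value.each_with_object({}) { |(k, v), h| h[k.to_s] = jsonable(v) }
  when Array  then value.map { |v| jsonable(v) }
  when Time   then value.utc.iso8601
  when Symbol then value.to_s
  else value
  end
end
```

A reader would need a matching post-processing step (for example `Time.iso8601` on known timestamp keys) to get Time objects back.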
I've actually been meaning to get rid of JSON from the Einhorn protocol. It's the only gem dependency. If YAML allows arbitrary code execution, I'd rather not use it. If however it's safe, then we could switch to it.
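On the safety question: in current Ruby versions the bundled Psych library provides `YAML.safe_load`, which refuses to instantiate any class not explicitly permitted. A minimal sketch of a round trip that keeps symbol keys and Time values follows, assuming a Ruby recent enough to accept the `permitted_classes:` keyword. The helper and constant names are hypothetical.

```ruby
require 'yaml'

# Only these non-core-scalar classes may be materialised when loading.
# ASSUMPTION: Symbol and Time are the only extra types the state needs.
STATE_YAML_CLASSES = [Symbol, Time].freeze

def dump_state(state)
  YAML.dump(state)
end

def load_state(text)
  # safe_load raises Psych::DisallowedClass for anything not permitted,
  # so tagged arbitrary objects in the input are rejected.
  YAML.safe_load(text, permitted_classes: STATE_YAML_CLASSES)
end
```

Unlike the JSON route, this preserves symbol keys and timestamps without a separate coercion step, at the cost of trusting the permitted-class list to stay small.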
…ails Helps with #2, though it doesn't do much besides stop trying to spin up tons of processes.
This makes it possible to write external monitoring (cf #2)
Ran into a problem when einhorn was upgrading its child processes to a new version and the new children errored out immediately (due to a bug). Because they error pre-ACK, the old children stick around and handle requests (yay!)
However, einhorn continuously tries to start the new children. It should possibly back off, and almost certainly notify somehow. If I hadn't happened to be doing maintenance on the server in question, who knows how long we would have gone without noticing.
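The "back off" part could be as simple as a capped exponential delay between spawn attempts. The sketch below is an illustration, not einhorn's actual behaviour; the function name and the default `base` and `cap` values are made up.

```ruby
# Sketch: capped exponential backoff for respawning failed children.
# ASSUMPTION: `consecutive_failures` counts children that died pre-ACK
# in a row; the defaults are illustrative only.
def respawn_delay(consecutive_failures, base: 0.1, cap: 30.0)
  return 0.0 if consecutive_failures <= 0
  # Double the delay for each further failure, but never exceed the cap.
  [base * (2**(consecutive_failures - 1)), cap].min
end
```

The master would sleep for `respawn_delay(n)` seconds before the next spawn attempt and reset `n` to zero once a child ACKs.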