Reconnect if there's a DRB error in IPC #39
Conversation
Hi, my guess is that in case of a systematic error occurring during the write operation on this file stream, this would result in an infinite loop? Did I get that right? Cheers from Hamburg
Update: A simple job with a long runtime does not seem to trigger the broken pipe error.
Thanks for the update and for testing it! We're currently wrapping up work on a new release that will replace the IPC system, so I'm happy that we have a non-invasive fix :-). Do I understand correctly that this fix works for you?
No, not yet. The `DRb::DRbConnError` still happens regularly in production, on a job that handles a lot of data and runs for a long time.
I think the fix should work since the listener lives on the filesystem and the reconnect is a full reconnect. Let's see :-). |
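The reconnect-on-error pattern being discussed could look roughly like the sketch below. This is a hypothetical illustration, not the gem's actual code: the module name, `SOCKET_URI`, and the `enqueue` method are all invented for the example.

```ruby
require "drb"
require "drb/unix" # needed for the "drbunix:" protocol
require "tmpdir"

# Hypothetical sketch of the reconnect-on-DRb-error pattern discussed
# above; names are illustrative, not the actual Appsignal::IPC API.
module SketchClient
  SOCKET_URI = "drbunix:#{Dir.tmpdir}/appsignal-sketch-#{Process.pid}.sock"

  def self.server
    @server ||= DRbObject.new_with_uri(SOCKET_URI)
  end

  def self.reconnect!
    @server = nil
    server
  end

  def self.enqueue(transaction)
    server.enqueue(transaction)
  rescue DRb::DRbConnError
    # The listener socket lives on the filesystem, so a full reconnect
    # re-creates the proxy; then retry the call once.
    reconnect!
    server.enqueue(transaction)
  end
end
```

Because `DRbObject.new_with_uri` only connects lazily on the first method call, resetting the cached proxy is enough to force a fresh connection on the retry.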
Hey thijs,
I'm trying to figure out what is wrong with the socket peer, but it seems that any write into the stream from the `Appsignal::IPC::Client` side fails with `Broken pipe` (`Errno::EPIPE`).
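The failure mode described here can be reproduced in a few lines: once the peer end of a UNIX domain socket is closed, writes from the other side raise `Errno::EPIPE`. This is only a minimal stand-in for the real client/agent pair, not the gem's code.

```ruby
require "socket"

# Minimal reproduction of the "Broken pipe" failure mode: close one end
# of a UNIX domain socket pair, then write from the other end.
client, server = UNIXSocket.pair
server.close # simulate the agent end going away

error = begin
  client.write("payload")
  client.write("payload") # at the latest, the second write fails
  nil
rescue Errno::EPIPE => e
  e
end
puts "write failed with #{error.class}"
```

Ruby ignores `SIGPIPE` by default, which is why the failed write surfaces as a catchable `Errno::EPIPE` exception rather than killing the process.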
A bit more information:
Sockets 24 and 21 were the ones used by the problem jobs that failed with the broken pipe, and we could verify that other jobs were afterwards processed successfully by the same workers on the same sockets (24 and 21). Mmh, that doesn't narrow it down much; I have to dig into the two problem jobs again. Perhaps it's something about serializing the messages into the socket stream buffer. I'll keep investigating. Any ideas from your side? Cheers
I'm starting to think your issue might not be a disconnection, but something that cannot be serialized by DRb. Are you using any hashes with default options, for example?
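The serialization suspicion is easy to check: DRb marshals arguments and return values with `Marshal`, and a `Hash` with a default *proc* cannot be dumped. A minimal demonstration:

```ruby
# DRb serializes messages with Marshal. A Hash with a default proc is
# one of the objects Marshal refuses to dump:
hash = Hash.new { |h, k| h[k] = [] }

error = begin
  Marshal.dump(hash)
  nil
rescue TypeError => e
  e
end
puts error.message # => "can't dump hash with default proc"
```

Note that a plain default value is fine (`Marshal.dump(Hash.new(0))` succeeds); only a default proc triggers the `TypeError`, which DRb would surface when such a hash is sent over the wire.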
Moin, as to your hint about hashes with default options: I'll give it a try with a job that runs a lot of queries and see whether I can find out if it depends on the pure size of the transaction object. Cheers
Hey,
As we don't do anything special in the job, we are quite certain it's simply the size of the transaction object, which keeps track of all queries done in the job. Is it too large for the UDS stream?
I'm going to run some tests on this in the morning, will get back to you then. |
I've been doing some testing locally and haven't hit any size limits so far. Still not exactly sure what's going wrong in this situation. We're getting close to a beta of the new gem version, which completely overhauls this. Are you interested in participating in the beta? If so, I'll ping you via e-mail with the details.
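A local size test of the kind mentioned here could be sketched as below: push a large marshaled payload through a UNIX domain socket pair and read it back. The "transaction" shape (`:events` with `:sql`/`:duration` entries) is invented for illustration and not AppSignal's actual data format.

```ruby
require "socket"

# Rough sketch of a local size test: round-trip a large marshaled
# payload through a UNIX domain socket pair.
payload = Marshal.dump(
  events: Array.new(10_000) { |i| { sql: "SELECT #{i}", duration: i } }
)

reader, writer = UNIXSocket.pair
sender = Thread.new do
  writer.write([payload.bytesize].pack("N")) # 4-byte length prefix
  writer.write(payload)
  writer.close
end

size = reader.read(4).unpack1("N")
transaction = Marshal.load(reader.read(size))
sender.join
puts "round-tripped #{size} bytes, #{transaction[:events].size} events"
```

The writer runs in a thread because the socket pair's kernel buffer is smaller than the payload; the concurrent reader drains it, so neither side blocks forever. A test like this passing locally supports the observation that there is no hard size limit on the UDS stream itself.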
Hi, mmh, strange.
You're welcome. I'll be in touch. |