Unix Domain Socket: peer-peer connections appear to hang #1349
Comments
Thanks for the report and for the reproducer. Here's what I see when running on OS X (what's your platform?): Server Just one of these:
Client Lots of these:
|
Sorry, I should have specified the platform. It is macOS 10.13.6. Yes, I found the behaviour varies a bit from run to run. For me, the client usually stops receiving after a number of "HELLO WORLD <n>" from the server. The server keeps on receiving and printing "hello world <n>". But I have also seen the reverse as you report. For my latest run for example, I get up to
on the client, and then it stops. The server keeps going -- when I stopped it it was at
But always one side or the other seems to stop. (sorry for the inconsistent output format between client and server) |
Thanks for the additional info. I’m in the middle of a prod rollout this week and next, so time will be tight. However, I’ll look at it ASAP. My first move will be to create a unit test that reproduces yours, so if you feel like raising a PR for that then it’d save me some time. No worries if you can’t though. |
Thanks so much! No rush for this on my side. Time is short for me for the next couple of weeks, but I will try to produce a unit test PR if I get cycles before you do. |
Again, sorry for the delay. I will take a look at this now. |
Thanks Christopher! I have not been able to focus on it either, so no worries.
Thanks for your support!
Doug
|
Hey Doug - I'm unsure if this will help, but I've finally been able to get a long-outstanding PR complete: #1297. I'm now going to try it with some code I have, but perhaps you could build it locally and take it for a spin with your test case. I'm not convinced that it'll fix things, but there's an outside chance. |
UPDATE: I just tried your sample project @djm1329 - it seems to work for me... and I didn't update anything. :-( It'd be good to learn of your results. |
Hey Christopher,
I tried with #1297 … it still fails for me: one side or the other stops receiving messages after some variable length of time, usually within 5s or so.
Thanks,
Doug
|
Ok, I’ll dig in more. Sounds like a race condition. |
Actually, I did manage to reproduce locally - I just didn't realise what the output should have been in your examples. :-) I've spent another day on this and ended up with another commit to that PR: b0b1617. However, I don't think I've been able to fix the issue here. You'll see that there's a new test that is now ignored. This test is designed to reproduce the condition you see and it does indeed fail by continuing forever.

The crux of this issue is that the code wants to write some data and registers interest in doing so with NIO. However, NIO doesn't appear to honour this request and never permits the writing of the data. My next move is to abandon JNR in favour of providing my own JNA implementation that calls out to LibC - in the same way that Nailgun does: https://github.com/facebook/nailgun/blob/master/nailgun-server/src/main/java/com/facebook/nailgun/NGUnixDomainSocketLibrary.java. I'd also have to abstract

Naturally, there could well be more bugs associated with the code I have here. But I've been through this code with a fine-tooth comb quite a few times now, and so I'm increasingly suspicious of the code that we depend on.

Also, unfortunately, I've no more time to spend on this project during the week and so it will be best-effort in my own time. That said, it bothers me that the code doesn't work fully as advertised so I'm motivated to fix it. :-) |
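For readers unfamiliar with the pattern described in the comment above (register write interest with NIO, wait until the channel is reported writable, then write), here is a minimal sketch in plain JDK NIO. It is an illustration only and makes several assumptions: it uses `java.nio.channels.Pipe` as a stand-in for a domain socket channel, it does not involve JNR, JNA, or the connector's actual code, and the class and method names (`WriteInterestSketch`, `writeWhenReady`) are hypothetical. The timeout branch corresponds to the symptom reported here, where writability is apparently never signalled.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

public class WriteInterestSketch {

    /**
     * Writes all of data to the sink, waiting for the selector to report
     * OP_WRITE readiness before each write. Returns the number of select()
     * wake-ups that were needed. (Hypothetical helper, not connector code.)
     */
    public static int writeWhenReady(Pipe.SinkChannel sink, ByteBuffer data, long timeoutMs)
            throws IOException {
        sink.configureBlocking(false); // selectors only work with non-blocking channels
        try (Selector selector = Selector.open()) {
            // Register interest in writing; the selector should report readiness.
            SelectionKey key = sink.register(selector, SelectionKey.OP_WRITE);
            int wakeUps = 0;
            while (data.hasRemaining()) {
                if (selector.select(timeoutMs) == 0) {
                    // Write interest was registered but never honoured.
                    throw new IOException("channel never became writable");
                }
                wakeUps++;
                if (key.isWritable()) {
                    sink.write(data); // may be a partial write; the loop retries
                }
                // The selected-key set must be cleared, or later selects return 0.
                selector.selectedKeys().clear();
            }
            key.interestOps(0); // nothing left to write, so drop write interest
            return wakeUps;
        }
    }

    public static void main(String[] args) throws IOException {
        Pipe pipe = Pipe.open();
        ByteBuffer out = ByteBuffer.wrap("hello world 1".getBytes(StandardCharsets.UTF_8));
        writeWhenReady(pipe.sink(), out, 1000);

        // Read back what was written to show the data actually went through.
        ByteBuffer in = ByteBuffer.allocate(64);
        pipe.source().read(in);
        in.flip();
        System.out.println(StandardCharsets.UTF_8.decode(in));
    }
}
```

A channel that stays unwritable past the timeout throws here instead of looping forever, which may be a convenient way to turn a hang into a test failure.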
Great analysis. We really appreciate the work you put into this! |
With #1297 merged and 1.0-RC1 released, is this still an issue? |
I’d say it is still a problem. My view is to replace JNR. |
I have sample code for a peer-peer app that sends messages once every 10ms in both directions. With TCP the code runs "forever", but with a domain socket, one side (usually the "client") almost always stops receiving after a short while. The reproducer code is in https://github.com/djm1329/AkkaStreamSocketTests .
The sample is run by starting a server, then a client. The client connects to the server over a domain socket, then both sides start sending messages to each other. After a short while, one side (usually the client) apparently stops receiving messages. Replacing the domain socket with a TCP socket (details on this are in the repo) seems to work fine.
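The traffic pattern described above can be sketched in a few lines of plain Java. This is not the reproducer from the repository and it does not use Akka Streams or a domain socket: it assumes ordinary JDK TCP sockets on the loopback interface (the transport reported to work), and the names `PeerExchangeSketch`, `exchange`, and `run` are hypothetical. Each side sends a numbered message every `intervalMs` milliseconds while counting what it receives, so a side that stops receiving shows up as a count lower than expected.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class PeerExchangeSketch {

    /** Sends count lines on a background thread and returns how many lines were received. */
    public static int exchange(Socket socket, String prefix, int count, long intervalMs)
            throws IOException, InterruptedException {
        Thread sender = new Thread(() -> {
            try {
                PrintWriter out = new PrintWriter(
                        new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true);
                for (int i = 1; i <= count; i++) {
                    out.println(prefix + " " + i);
                    Thread.sleep(intervalMs);
                }
                socket.shutdownOutput(); // signal end-of-stream to the peer
            } catch (IOException | InterruptedException e) {
                // Close the socket so the peer's reader is not left blocked.
                try { socket.close(); } catch (IOException ignored) { }
            }
        });
        sender.start();

        int received = 0;
        BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
        while (in.readLine() != null) {
            received++; // the peer is still delivering messages
        }
        sender.join();
        return received;
    }

    /** Runs a server and a client in one process; returns {serverReceived, clientReceived}. */
    public static int[] run(int count, long intervalMs) throws Exception {
        int[] result = new int[2];
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> {
                try (Socket s = server.accept()) {
                    result[0] = exchange(s, "HELLO WORLD", count, intervalMs);
                } catch (IOException | InterruptedException e) {
                    throw new RuntimeException(e);
                }
            });
            serverThread.start();
            try (Socket c = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort())) {
                result[1] = exchange(c, "hello world", count, intervalMs);
            }
            serverThread.join();
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        int[] counts = run(100, 10);
        System.out.println("server received " + counts[0] + ", client received " + counts[1]);
    }
}
```

With a healthy transport both counts should equal the number of messages sent. A unit test for the domain socket case could apply the same check, with the caveat that a stalled side would block in `readLine()` unless a read timeout is added.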