-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sending a message right after handshake #21
Comments
Hi @madsheep , Thanks for opening this issue 🙏🏻 It's intriguing and I wish to understand it more before I try to reproduce it. If you have an example ready up your sleeve, that would be great for tracking the issue down. For example...:
As for your guess and question:
Unless you wrote something explicit, than this shouldn't be the case. The C level uses locks to set the protocol object practically atomically. The Ruby bridge doesn't set any callbacks, it sets a whole object as a callback object (the object implements the callbacks)... so there's no "in between" time. It could be (in theory) that the HTTP protocol object is receiving the first message. But this shouldn't be the case unless the first message was sent before the handshake was complete.
There's a theoretical race condition between the I stress tested it using the websocket shootout client and at the moment it's solved by using the HTTP protocol's lock to call the The lock prevents new network "read" events from being processed, so the message event should be waiting until the "upgrade" task is complete (including the |
I tried reproducing the issue with no success. Could you send me example client code as well... I'm not sure it's a server-side issue. I've been using the following JavaScript edited to add some concurrency (the test might take a long while, depending on the client you use, mostly because client concurrency is limited): var count = 0;
var max_count = 65536;
// change this to test pub/sub at: ws://localhost:3000/
var ws_uri = "ws://localhost:3000/no_pubsub";
var fresh = true;
function attempt_small_msg() {
count++
ws = new WebSocket(ws_uri);
ws.attmpt_num = count;
ws.onmessage = function(e) {
if(!e.target.timeout)
console.log("SUCCESS for attempt " + e.target.attmpt_num + " (misreported).");
else
clearTimeout(e.target.timeout);
e.target.close();
if(count < max_count) { attempt_small_msg(); }
else if(fresh) { fresh = false; console.log("Couldn't reproduce issue."); }
};
ws.onopen = function(e) {
e.target.timeout = setTimeout(function (ws) {
fresh = false;
ws.timeout = false;
console.log("attempt " + ws.attmpt_num + " FAILED to get initial message in time.");
}, 2000, e.target);
e.target.send("hi");
e.target.test_flag = true;
};
ws.onclose = function(e) {
if(!e.target.test_flag)
console.log("Attempt " + ws.attmpt_num + " failed to connnect(!)");
}
}
// go get coffee
count = 0;
for (var i = 0; i < 128; i++) {
attempt_small_msg();
} With the following Ruby code: require 'iodine'
# A simple router - Checks for Websocket Upgrade and answers HTTP.
module MyHTTPRouter
# This is the HTTP response object according to the Rack specification.
HTTP_RESPONSE = [200, { 'Content-Type' => 'text/html',
'Content-Length' => '32' },
['Please connect using websockets.']]
WS_RESPONSE = [0, {}, []]
# this is function will be called by the Rack server (iodine) for every request.
def self.call env
# check if this is an upgrade request.
if(env['upgrade.websocket?'.freeze])
if env['PATH_INFO'] == '/'.freeze
env['upgrade.websocket'.freeze] = WSHandler.new
else
env['upgrade.websocket'.freeze] = WSHandler2.new
end
return [0, {}, []]
return WS_RESPONSE
end
# simply return the RESPONSE object, no matter what request was received.
HTTP_RESPONSE
end
end
# A simple Websocket Callback Object.
class WSHandler
# seng a message to new clients.
def on_open
subscribe channel: "chat"
# let everyone know we arrived
# publish channel: "chat", message: "#{@name} entered the chat."
end
# send a message, letting the client know the server is suggunt down.
def on_shutdown
write "Server shutting down. Goodbye."
end
# perform the echo using a channel, so it fails if `on_open` didn't happen yet
def on_message data
publish channel: "chat", message: "#{@name}: #{data}"
end
def on_close
# let everyone know we left
publish channel: "chat", message: "#{@name} left the chat."
# we don't need to unsubscribe, subscriptions are cleared automatically once the connection is closed.
end
end
# A simple Websocket Callback Object.
class WSHandler2
# just echo.
def on_message data
write data
end
end
Iodine::Rack.app = MyHTTPRouter
Iodine.threads = 8
Iodine.start |
hi! Answering your questions quickly:
A colleague of mine (@nilclass) worked further on narrowing this down and found an easy way to reproduce this. He replaced the write call here: https://github.com/boazsegev/iodine/blob/master/examples/echo.ru#L44 with and started spamming with a client like so:
In our case, we do some initial setup in the websocket handler object in on_open (i.e. lookup database for relevant objects, do rabbitmq rpc calls here and there). We think that's what causing it. |
Btw - we also figured out a workaround - we moved the initial setup and heavy lifting to the initialize method of our handler - so now it happens before browser gets the handshake reply. |
@madsheep and @nilclass - Thanks for the example and for exposing the issue! I can replicate the issue coding in Ruby and using the I'm trying to figure this out, but it's an elusive one... I checked the flow and noticed that the So I tried replicating the issue in the C level (where the I'm sure I'm overlooking something super simple and stupid, but it might take me a while to figure out what that is 🙄 I'll keep you posted, |
Update: I've managed to recreate the issue using the C layer. The difference was the number of threads (Iodine has a dedicated IO thread that makes sure the socket buffer is cleared, but the C layer doesn't actually send the response until it returns from This is very good news. Now that I can replicate the issue, I have it's scent, I'm on the hunt, and this bug is going down! 👍🏻 |
I fixed it... I think. Could you install from Github to test? I want to make sure the next release solves your issue. |
HTTP/1.1 was running in a loop, except for pipelined requests, meaning that any incoming data that was received during processing (not pipelined) was processed within the loop. By having the `on_open` delay the review of the loop, the loop resumed after new data entered the buffer. This was the root cause and this is an actual fix and not just a stop-gag that masks the issue. This fix implements a `facil_force_event` for looping, so if the protocol was changed mid-stream, the new protocol will receive the new data.
P.S. I released the fixed versions (0.4.7) and I'm closing the issue. Thanks again for opening the issue 🙏🏻 If anything else pops up, please let me know 👍🏻 |
Hi @madsheep , I just wanted to drop a note to let you know I released an update to fix a minor logging issue that popped into the patch I implemented for this issue. When experimenting with the issue you exposed, I updated iodine's facil.io library to use the edge version... However, the edge version included a testing message for an experiment Date string caching feature. I just realized it today as I was working on refactoring the HTTP/1.x parser and reviewed the logs. I apologize for any inconvenience. I've released an update (v. 0.4.8) with both a fix, an updated HTTP parser and the Date caching feature (which you probably won't use, since Rack often writes a Date header. I hope everything works as expected after this update. Kindly, |
Hi @boazsegev, I've tested v0.4.8 just now, and it works like a charm! One tiny thing i want to mention, because i keep seeing it in the examples and in the code you added above:
one of those suffices 🙊 cheers! |
Hi @nilclass , Thanks for keeping me posted 🙏🏻 I'm very happy everything's working and you're enjoying iodine
😂😂😂 I guess it shows how highly skilled I am at Copy&Paste, that I didn't even notice I kept pasting this... 😂 Thanks 👍🏻 Have a good one! |
Hi again!
We have been struggling with a very weird issue recently (that's also something that triggered us to update iodine).
In our websocket use case, we send an initial message to the server right after the handshake is finished. The message is relatively small, something like
{"init": true}
(we ask server for some initial data).Every now and then we don't get anything back as if the message didn't make it at all. It happens more often if we are in the same location as the server (i.e. it's a local machine connection, or local network connection).
In our logs we also don't see anything happening with the
on_message
callback. If we repeat the message second later we get the reply back normally.Our educated guess (I don't know C well enough) was that the callbacks for messages are set after the websocket handshake is finished - and hence if we are very quick we can squeeze the initial message on the socket before there is anything listening for it.
Could you please take a look if that's the case? Or maybe there is some other race condition in there?
The text was updated successfully, but these errors were encountered: