
Socket.io 400 errors #5393

Closed
henrikekblad opened this issue Jan 23, 2017 · 5 comments

@henrikekblad

commented Jan 23, 2017

After spending a few days debugging a problem on our forum/nginx setup, I've finally found a solution to the massive number of HTTP 400 errors we've seen when loading the NodeBB forum. These problems normally only manifest themselves under high load; in our case, more than 500 connected clients.

In the browser console you will see responses like this (for the failed connections):
{"code":1,"message":"Session ID unknown"}

Background

  • When it receives a 4xx error, the nginx proxy will by default take the errant upstream out of rotation for 10 seconds
  • While upstream-A is unavailable, ip_hash routes all of A's requests to upstream-B instead
  • Unfortunately, when upstream-B gets these new requests, it (correctly) responds with 4xx errors, because the SID is not found in this.clients
  • That gets upstream-B taken out of rotation as well, and its requests are routed to upstream-C
  • and so on...
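
To make the defaults in this chain visible, here is a minimal sketch of an upstream block with nginx's documented default values for max_fails (1) and fail_timeout (10s) written out explicitly. The upstream name and ports are borrowed from the example further down in this issue; the explicit parameters are added only for illustration and are not part of the original configuration.

```nginx
# Sketch only: equivalent to omitting both parameters, since these are the nginx defaults.
upstream io_nodes {
    ip_hash;                                               # pin each client IP to one backend
    server 127.0.0.1:4567 max_fails=1 fail_timeout=10s;    # one failed attempt marks the server unavailable for 10s
    server 127.0.0.1:4568 max_fails=1 fail_timeout=10s;
    server 127.0.0.1:4569 max_fails=1 fail_timeout=10s;
}
```

With these values a single failed attempt is enough to push a server's clients onto its neighbour, which is what sets off the cascade described above. Which responses nginx counts as failed attempts is governed by proxy_next_upstream.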

The Solution

Set max_fails on each upstream server to something higher than the default (1).

Example:

upstream io_nodes {
   ip_hash;
   server 127.0.0.1:4567 max_fails=50;
   server 127.0.0.1:4568 max_fails=50;
   server 127.0.0.1:4569 max_fails=50;
}

I suggest that someone update the NodeBB documentation to include this in the nginx examples.
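
As a rough idea of what such a documentation example might look like, the sketch below puts the upstream block from above next to a proxying server block. Everything outside the upstream block is an assumption for illustration: the server_name is hypothetical, and the proxy headers are the usual ones for passing WebSocket upgrades through nginx, not settings quoted from this issue.

```nginx
# Sketch only; both blocks belong inside the http { } context.
upstream io_nodes {
    ip_hash;
    server 127.0.0.1:4567 max_fails=50;
    server 127.0.0.1:4568 max_fails=50;
    server 127.0.0.1:4569 max_fails=50;
}

server {
    listen 80;
    server_name forum.example.com;    # hypothetical host name

    location / {
        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # Needed for WebSocket upgrades through the proxy
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        proxy_pass http://io_nodes;
    }
}
```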

@pitaj added the documentation label Jan 23, 2017

@pitaj changed the title from "Socket.io 400 errors (Documentation issue)" to "Socket.io 400 errors" Jan 23, 2017

@julianlam

Member

commented Jan 24, 2017

😬 oh dear... this is a duplicate of #5295

Upstream issue: socketio/engine.io#458, so I hope you didn't waste your time 😦

But on the upside, now you really know how socket.io works! Perhaps there is a newer socket.io version we can upgrade to that resolves this issue... on our instances, we use max_fails=0, which is a terrible terrible solution, since if a NodeBB instance goes down, nginx won't even bother to route requests to another NodeBB instance.
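
For comparison, a minimal sketch of the max_fails=0 variant mentioned here. The upstream name and ports are reused from the original post and are assumptions as far as these instances are concerned.

```nginx
# Sketch only: max_fails=0 disables the accounting of failed attempts,
# so nginx never marks these servers as unavailable.
upstream io_nodes {
    ip_hash;
    server 127.0.0.1:4567 max_fails=0;
    server 127.0.0.1:4568 max_fails=0;
    server 127.0.0.1:4569 max_fails=0;
}
```

This stops the 400-error cascade, but at the cost described above: a backend that has actually gone down stays in rotation.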

@julianlam

Member

commented Jan 24, 2017

Setting a high max_fails is a band-aid, because in our specific case, we had 1000+ connections over 4 servers (8 NodeBBs in total). Despite setting a max_fails of 10, there eventually came a case where all of the NodeBBs would send back that 400 error at roughly the same time (+/- 10 seconds), causing nginx to report no available upstreams, which is in and of itself quite annoying.

@pitaj

Contributor

commented Jan 24, 2017

Yep, looks like socket.io@1.8.2 or socket.io@2.0.0 would fix this issue.

@henrikekblad

Author

commented Jan 24, 2017

Great!

I suggest we bump socket.io version ASAP.

barisusakli added a commit that referenced this issue Jan 24, 2017

@henrikekblad

Author

commented Jan 24, 2017

I'm running this in production now with good results!
