I have a tricky bug that is currently making it a nightmare to maintain a site we launched a month or two ago. We ran several tests with multiple users simulating a real environment, but didn't catch this.
The problem is that after some time the now server stops answering and needs to be restarted. One way to see this is to run curl against the now.js file:
curl: (56) Recv failure: Connection reset by peer
Earlier I got:
curl: (55) Send failure: Broken pipe
The server keeps an object that holds data about our users: a link between their connection to now and their user profile in a CMS.
It provides some utility functions to add and remove users from the list when they connect or disconnect. We use setTimeout to handle the disconnect; the timeout is cleared if the user reconnects within a set time limit. The setTimeout IDs are also stored on this object.
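A minimal sketch of the kind of bookkeeping described above. It is not the actual code from the site: the names (`createUserRegistry`, `connect`, `disconnect`, `reset`) and the 30-second grace period are hypothetical, and the client id is assumed to be a plain string.

```javascript
// Hypothetical sketch: all names and the grace period are assumptions.
const GRACE_MS = 30 * 1000;

function createUserRegistry(graceMs = GRACE_MS) {
  const users = new Map();  // clientId -> CMS profile id
  const timers = new Map(); // clientId -> pending setTimeout handle

  // Cancel and forget a pending removal timer, if one exists.
  function cancelTimer(clientId) {
    if (timers.has(clientId)) {
      clearTimeout(timers.get(clientId));
      timers.delete(clientId);
    }
  }

  return {
    // On connect: cancel any pending removal, then (re)register the user.
    connect(clientId, profileId) {
      cancelTimer(clientId);
      users.set(clientId, profileId);
    },
    // On disconnect: remove the user only if they stay away for graceMs.
    disconnect(clientId) {
      cancelTimer(clientId); // never keep two timers for the same client
      const handle = setTimeout(() => {
        timers.delete(clientId);
        users.delete(clientId);
      }, graceMs);
      timers.set(clientId, handle);
    },
    // Replace the state: clear every pending timer and drop all entries.
    reset() {
      for (const handle of timers.values()) clearTimeout(handle);
      timers.clear();
      users.clear();
    },
    userCount: () => users.size,
    pendingCount: () => timers.size,
  };
}
```

One design point worth checking in code like this is that a repeated disconnect for the same client cancels the earlier timer before storing a new one. Otherwise the old timer ID is overwritten and can no longer be cleared.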
After some hours of use by about 25-100 users, problems start to arise. How long it takes to trigger the problem, which deadlocks the server, varies a lot. Restarting the server only seems to partly solve the problem, as it usually falls back into the deadlocked state more quickly. If, however, we replace the object mentioned above (we have a function for that), the server seems to stabilize completely. The site opens and closes the now.js functionality, which is used to manage a help desk, and it is open for about 4 hours at a time. If the site admins clear the server data minutes before opening, the server can sometimes last the 4-hour open time, but not always.
We have checked the server to see if memory leaks could cause the problem, but we're not seeing any spikes or anything else alarming.
I know this is not something you can just reproduce and tell me how to fix. I've tried many things to figure out what could be wrong. It's possible we messed up ourselves, or maybe we have found a rare edge case.
Any hints, ideas, or suggestions would be most welcome.
This is a serious issue and we're looking into it now.
Much appreciated, let me know if you need more detail.