New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved handling of unexpected disconnects #228
Conversation
Interestingly enough, a "WebSocket error, and a whichServer error" are both 'wserror'. If they both cause the bot to disconnect from the room, why change it? Event documentation doesn't say anything about how you should name your messages, that's up to you. However, once you add the proxy thing into the mix, I kind of think 'error' and 'wserror' are not descriptive enough anymore. Now to make it intuitive you should probably emit a 'disconnected' message instead for all three cases. But if you do that, does it make sense to emit that from the .onError() function anymore? I don't think so. If you know certain errors cause a disconnect, then we should probably just log the error, and call .onDisconnected() which emits a 'disconnected' message. |
I see zero value in propagating the number of events we emit that all mean the same thing: "you're not talking to Turntable, pal." If you truly care about the precise reason for this you simply inspect the Error object and behave appropriately. Thanks for pulling out the onError() straw man. Why didn't you call it onWsError()? Does it make sense to emit a wserror event as a result of handling an onerror callback, which itself had also seperately emitted a wserror event? (This is in fact all nonsense, and I have no idea why we're using callbacks instead of handling the events that the WebSocket object emits. It's certainly not a performance penalty, no matter what your intuition might be telling you.) I'd love to hear what Alain thinks about this change, in any event. |
"If there is no listener for it, then the default action is to print a stack trace and exit the program." Bad idea... as soon as you start emitting 'error' you are going to crash everyone's bots that are not "handling" this message, which they won't be because it's not obvious to do. You'd have to update ALL of the examples to add this I named it onError() to mimic the way the websocket handlers were done, just like Alain did... but then I felt it was more appropriate to call it exactly what it was at the bot level... a websocket error 'wserror'. |
Of course it'll crash. That's the point. Do you somehow not understand that's already the lazy coder's auto-reconnect? I don't know anybody that writes perfect code that never crashes. I know a lot of people using a variety of methods to keep their bots online, regardless. This is hardly as bad as you make it out to be, and in fact is more than likely to be a boon to many. |
console.log("At this point you're acting counter-productively, pal...");
process.exit(1); //ragequit |
First, I'm sorry. My frustration is no excuse for implying that you're stupid. I was wrong. I would appreciate it if you would assume that this is a very carefully considered and tested proposal, because that's exactly what it is. I've been working on it on and off since a couple of weeks before you submitted your pull request. I get that you don't like this change - and that's okay, really - but this should be an opportunity for a thoughtful discussion of the merits. |
So, this brings up an interesting point. @DubbyTT feels like we should be catching errors and then handling them appropriately within the framework, but so does @gizmotronic, in a different sense, from multiple angles. I don't know if this is really what we should be doing, honestly. If the script disconnects in any way, and throws an error, this is obviously the right thing to do. If the person using the framework is not handling it, let it error out. In most cases, people's scripts are useless when not connected anyways. Now, if we are arguing that we should auto-reconnect the bot on this error, then that's a discussion that can have many outcomes. Should we really be handling this on our end? Different bot creators may want their bots to do different things when their bot is disconnected. What if the bot is permanently banned from turntable.fm? It will be continuous loop of connecting and disconnecting. We don't want this either, and possibly some logic could be implemented to prevent this. Either way, the best part about this framework is it's simplicity in just redirecting WebSocket events through an easy-to-follow API so people can do with the calls what they want. The more we try and add to this the less versatile it becomes. The only real thing I would suggest doing is implementing functions that bot creators CAN use if they feel like it, but not make it mandatory. Discuss. |
I think you have it right when you say that the best part is that the framework just redirects (or maybe "reflects") WebSocket events through the API. That's quite explicitly my goal in emitting events to expose the error conditions in whichServer. The fail-safe timeout is likewise designed to detect an unusual, but fatal, condition and emit an event when that happens. Connection problems are a special case, in general. It's one thing for most of the API to fail and return that result via the callback. It's another thing entirely when the connection drops, since the bot won't do anything meaningful after that. This is why I advocate throwing an 'error' event so that either the bot stops running - it's dead anyway - or the bot creator handles (and preferably responds to) the event. I actually did spend a bunch of time doing reconnects right in the library. For all of the reasons you mention, as well as the added complexity of a mechanism to ensure that there's only one connection attempt at a time (this turns out to be a bit trickier than it seems), I think you've got it exactly right. The bot creator should be given the control. |
So I just took a look myself at your changes. Unfortunately, you don't have any backwards compatibility in place. Anyone who might be using the I think your changes would be OK if you included schemes for both wserror and error alike. @DubbyTT said basically the same thing. |
I was hoping to avoid this given that it's a very recent change, but point taken. Thanks for looking at it. |
Understandable, but it is a pretty big deal 😄 |
standard error event. Inspect the provided Error object if it's important to know which condition caused the error.
event, and we also haven't heard from the server in a couple of minutes. This can happen in cases where a malfunctioning or misconfigured application-aware proxy/firewall sits between the client and server.
Because this does change the API in a way that's not backwards compatible, and because there is no way to bridge the gap in the interim, this change should not be merged as part of a point release. If you agree that emitting an 'error' event is more correct (which I readily admit is a primary point of contention), then there will be a flag day situation if and when that happens. It would be better to have it sooner than later in that case. This minimizes the number of people who would be affected. |
Seems fine to me. Unless people pull this copy from here and copy it to there npm directory, it won't get in there until Alain publishes to npm. So, I'm fine with it: the throwing of error instead of wserror, that is. |
so the only real difference is that error will now be thrown instead of wserror? does this remove the 'alive' event as well? I personally am against the bot trying to do anything other than give the developer events and functions to work with. like in my bot personally i try to pair having the bot detect whether its physically in the correct room or not with the wserror event (if the bot gets kicked to the lobby or something its still technically connected to turntable). i need to know when that event is thrown because without an internet connection my other method of detection does nothing. and i don't wanna have clashes going on between them etc. |
i'm also not a fan of it stopping trying to reconnect after a while. that should be optional if implemented. what if someone's bot get's kicked from the room twice and is banned for 15 minutes or something but they themselves are not the runner of their bot? if its allowed to try to reconnect the bot will be back in 15 minutes... otherwise you'll have to wait till you can get in contact with the bot runner etc. |
The error event would be emitted instead of the wserror event, but for connection errors in addition to just the WebSocket errors. For example:
I did not remove the alive event. I don't think it's strictly necessary, but there's also nothing wrong with it, and I don't see any reason to remove it. Reconnect attempts aren't done within the framework. It's up to the bot creator to handle disconnections and take action. |
@Izzmo @samuri51 @alaingilbert @gizmotronic I'm going to outline this again so everyone knows what's good and bad about these changes:
|
So, at its core, this is the disagreement (I think):
My reasoning is simple. Many people are already using external mechanisms to restart their bot when it crashes unexpectedly. Throwing an error means that these bots would reconnect without any additional work. |
I would argue that most people that run turntable bots do not use process monitors. It is not an intuitive thing to do. When I got one up and running, the process monitor itself would crash. If we have a mechanism for reconnecting the bot (which we do), let's promote that! Not let people's bots crash and have them wonder what went wrong. Will they even come here to ask how to fix it? Do we want them coming here and asking "why is my bot crashing?" When you restart the process in that way to reconnect the bot, the state of it's memory is reset. There is no telling what affect that has on the bot. For example, say you have a queue loaded... but no way to reload it on boot. If you just reconnect, everything is a-ok still running. |
I wonder how people handle unexpected crashes currently. At least some hosting providers give you a process monitor whether you want it or not. But, even if you run it yourself it's simple. When I ran my bot at home I used a batch file that looped unless it got an exit code that meant "no rly, just shut down". Am I weird for thinking that this is not a hard problem to solve? |
I will just say this: NPM packages should never crash. (npm says it right on their website when publishing and making packages.) With that in mind, then we should not ever throw an unhandled exception, and therefor should not stray from using the |
I'll be honest I haven't read the whole issue. When I was an inexperienced bot dev, if my bot crashed because of something in the API, I would have submitted that as a bug. I agree with @Izzmo, NPM packages should not cause a script to crash. They should be sent back to the script for the script to handle if the developer feels it's needed. Otherwise it should be able to be safely ignored and not affect the script as a whole. |
I'm not sure where that rule came from. Packages like redis, which is fundamental to a lot of applications that use a key-value store, emit error events that will cause a crash if left unhandled. [edited] If redis doesn't mean anything to you, how about the mysql package? or mongodb? These do the same thing. |
Well, let's be clear, there is a difference between throwing an exception and gracefully handling it and then throwing an exception because the application cannot move forward and it must crash. I assume this is what you are talking about when speaking on REDIS and other large frameworks. |
The redis client emits an error event when there is a connection error. It does not throw exceptions. You are able to intercept the error event and recover in any way you see fit. Similarly, this pull request includes code to emit an error event when there is a connection error of any kind. It does not throw exceptions. You are able to intercept the error event and recover in any way you see fit. I have probably been guilty of saying "throws an error" when I meant "emits an error event" a few times. They are not the same thing and I have been trying to be careful to say exactly what I mean. I'll say it again: an error will be thrown only if the bot creator doesn't include an error event handler. In case you're wondering, by the way, the appropriate response to a truly unrecoverable error is to either throw an exception or to simply exit. There's no point in emitting an error event if you can't recover from the condition. |
Whether some other package do it that way or NPM says you shouldn't, think about the impact of doing it here... It's not intuitive to most coders that use this API ... even amongst us that contribute to the TTAPI, let alone the likes of many inexperienced javascript coders. And we have a way to handle the disconnects... let's call it what it is and not confuse anyone further. It's not an error, it's a disconnection. When we call it what it is, it's easy to talk about and explain and most importantly it doesn't automatically throw an exception and crash your app if you do not handle it. |
Please also look at the first reply of this thread, we keep saying the same thing over and over... Let's bring it to a vote:
I vote OPTION 2. |
What you say my code does is not what my code does. I'm sorry if this offends you but I'm not concerned with how inexperienced coders might view this. Do it the right way and give experience to inexperienced coders. We could easily update the docs to say "handle this event". That's already the case for the ready event, though it wasn't even documented until yesterday. Redis is hardly the only example of how to do it the right way. Spend 30 minutes looking for examples and you'll come up with dozens of them. It is unwise to create our own way to tell the user about errors when other well-known frameworks have already established one. Ask yourself what happens with a TTAPI-based bot when there's any sort of interruption right now. If it doesn't handle the wserror event, what does it do? Nothing. At all. It just sits there until you kill it. Please don't complain about having to handle an event. You currently have to handle an event to do anything meaningful, anyway. At least in the case of the error event something happens that you can throw into Google to search for. |
And I think you mean to say it is an I've been calling it a message, because that's how I think of it and being transmitted from the TTAPI to your bot code. You are correct that it is an It still crashes the bot though if you don't provide a |
I'm looking at the whole topic right now... |
@DubbyTT, I'd rather you just say "I don't like this change" instead of making appeals to emotion. @alaingilbert, it's a special case of the events framework. If you don't handle the error event, an exception is thrown. I deliberately chose this because
|
@alaingilbert http://nodejs.org/api/events.html#events_class_events_eventemitter
@gizmotronic not understanding you again. I don't like this change because of the way it functions. |
Honestly, I think I prefer when the bot crash. |
FYI, to get the error message:
The |
@alaingilbert the reason I don't like the bot to simply crash is because we have a way to prevent that, the autoreconnect.js example shows this. If we call the error "error", then you have to drill into the messages and try to do some matching crap and if it matches one of three different error messages then you would try to reconnect. If we call it 'disconnected' and log the message regardless of the debug flag, it would be the best case... you get a nice message telling you the bot disconnected. If you come to the TTAPI docs and look for that, you'll see that you should be handling the .on('disconnected', function(data) { } ); event, with a nice example to go with it. If some time in the future you start collecting things that really are errors we can't do anything about, then go ahead and use the 'error' event for that ... but right now we are talking about three things that are all "disconnected" events that we know how to handle... so why would you want to crash the bot for that? |
@DubbyTT, I am proposing emitting an error event only in the case where there's a connection error. This is a common convention. You do not need to drill into it unless you want to. The intent is that the appropriate response to an error event would be to reconnect. It's misleading to call it "disconnected" because at least in some cases where you'd want to retry the connection, you may have never been connected in the first place (specifically errors on the HTTP request in whichServer). But, as I've said before, the end result is the same: it's a reason to retry the connection. |
I would rather have option 2. The reason? I can handle disconnects separately from other errors. Unless, the error event tells me it is a disconnect/not connected/banned error. Then I don't give a fuck. |
You will get a |
Wether or not you were ever connected, if you are not connected... 'disconnected' is a state... and it implies NOT CONNECTED :) It seems almost wreckless to bottle up all connection style errors into the 'error' event. How exactly is it implied that you have a connection error? |
I think (I'm really unsure) that the following code would be best: @emit 'error', new TurntableError 'Error parsing server response'
@emit 'error', new HttpError "whichServer error: #{e}"
@emit 'error', new TTApiError 'No response from server' So it would be possible to make a distinction between each errors. bot.on('error', function(err) {
if (err instanceof TTApiError) {
...
}
}); We could also make a distinct error for each ttapi errors. ( I am pretty sure that it is a good thing for the bot to crash when it lost the connection for whatever reason. @DubbyTT The example you made to reconnect the bot is cool, but we shouldn't force everyone to implement it. Your thoughts ? |
All of those errors are going to result in disconnections... so to get the bot reconnected you need to do something like this: bot.on('error', function(err) {
if (err instanceof TurntableError || err instanceof HttpError || err instanceof TTApiError) {
// run your reconnect code
}
}); as the TTAPI gets updated and new reasons for disconnects are found, more errors will be added each time requiring the bot creators to figure out what the new errors are. My proposal is still to bot.on('disconnected', function(data) {
// run your reconnect code
}); if the TTAPI adds new reasons to emit 'disconnected' events, everyone's bot code is futureproof. Process sitting idle is not actually a bad thing in my opinion... unless you have run your process with a way to pause it on crashing, the window will close (on windows) and you won't even know why. And if you log that the bot was 'DISCONNECTED FROM TURNTABLE' it will be pretty apparent what happened. We don't have to force anyone to implement my autoreconnect code, only suggest it somewhere in the documentation. Emitting 'error' forces everyone to implement the .on('error', function(data) { } ); handler though... if they don't want their bot crashing unexpectedly. |
«Emitting 'error' forces everyone to implement the .on('error', function(data) { } ); handler though... if they don't want their bot crashing unexpectedly.» But not doing it, means everyone's bot are running for nothing. (idle forever) |
@alaingilbert yes, custom error objects would be a huge further improvement. I had originally done something like this but rejected it because the only way I see to do it involves adding and exporting new classes. (The TTApiError object would have to exist in the user's namespace, too.) I can resurrect my changes if you like, and push them into this request. |
Assuming you have forgotten that you have a bot running, up in the cloud let's say, most cloud based nodejs websites will terminate your process if it is idle for a period of time. If you are actually USING your bot, it's going to be apparent pretty quickly that it's not running anymore, and you'll go check and see a log message that it was "DISCONNECTED FROM TURNTABLE". Even if you implement the 'error' listenter in your bot code, the bot can hang idle if you don't stay up to date with what all of the errors are that you should be trying to reconnect on. One way to do the 'disconnected' event AND make the bot crash ... is add another test case to that inactivity timer that says if the inactivity is greater than 6 hours (give your internet, power or turntable some time to come back online), then you can Edit: That's all I have to say about this... have fun with whatever you come up with... |
@DubbyTT Let say you run 20-30 bots at the same time (ttdashboard / ttstats...) You can't know for sure that they are working or not. |
Okay... let's all take a step back for a minute. Get your heads out of the details of HOW it works and look at HOW a beginner might come into this.
Is self-documenting. When looking through the API documentation and I see that, I will make a note to myself to look into that more. It would appear like I would need that if my bot disconnections for some reason. Now let's look at the
When I glance at that, I think that is handling various exceptions that might occur out of the API's control. It might be disconnects, but more so I would NOT be think about disconnects. In fact, I would assume that is handled automatically and not add it in my bot... if I wasn't reading the API docs in detail. I know all about digging into the details of how it works, but sometimes you need to step back and look at it from another perspective and think about usability. I don't have a bone in this other than being able to detect when a bot of mine is offline and bring it back online easily. That is the ultimate goal right? |
I completely agree with you @MikeWills . Except that every error received make the reconnection needed. bot.on('disconnected', function(data) {
// run your reconnect code
}); vs bot.on('error', function(data) {
// run your reconnect code
}); |
@alaingilbert Actually, @Izzmo probably does know when his bots are offline since he's written a master bot monitoring process and modified the TTAPI just like I did and @yayramen did. I still don't want my monitor process to constantly be restarting the bot when there is a way to handle it properly and gracefully. @alaingilbert you just emit 'disconnected' when you know you've done something to disconnect the bot... websocket error, whichServer and transparent proxy issue. Other than that, there's really no other errors that the TTAPI has right now.. besides maybe that bad roomid error, which is already handled in place. |
@DubbyTT Please stop complaining about the problems you can have with other members of the community. And keep your comments constructifs. That being said, the way we throw an error when the roomid is invalid is not good either. |
All types of error that might be considered a connection error emit a "disconnected" event, instead. If there are no listeners for this event an error will be throw.
event instead of throwing an error. Per discussion with @alaingilbert.
Improved handling of unexpected disconnects
I should pay more attention to github. I'm not handling errors at all, really. Woo! And my wrapper reboots my bot if they're disconnected from TT...but disconnected seems like it could come in handy. To the code! |
This change builds on @DubbyTT's previous work, adding an event trigger for the case where Turntable is down (responding with an 302 error instead of JSON), for a name lookup or other connection issue in whichServer, and for the case where a firewall/proxy issue causes the WebSocket to remain up, but we are nevertheless not seeing valid server responses.
This change also renames the "wserror" event to "error" for two reasons:
(I'm deliberately not including a compiled version in this request. Please either use a CoffeeScript test harness or compile the source using the project Makefile.)