Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Error: write EPIPE (crash version 1.2.4) #65

Closed
benjaminbarbe opened this Issue · 43 comments

5 participants

@benjaminbarbe

TLS: reading from clearOut
TLS: reading from clearOut
TLS: cleartext emit "data" with 6 bytes
apn Notification 0 caused an error: 8 +69ms
apn Raising error: +0ms 8 { }
errorCallback code:8 { }
apn Buffering 50 notifications +1ms
apn Destroying connection +0ms
TLS: reading from clearOut
TLS: encrypted._push
NET: destroy
NET: close
apn Restarting connection +1ms
apn Notification queue has 689 items, resending the first +0ms
apn Initialising connection +0ms

events.js:48
throw arguments[1]; // Unhandled 'error' event
^
Error: write EPIPE
at errnoException (net.js:670:11)
at Object.afterWrite [as oncomplete] (net.js:503:19)

Thanks

@argon
Owner

Sorry for the delay in getting back to you. Which version of node are you using? It seems like this is possibly a node problem. Possibly also a race condition, leaving this here for reference: http://stackoverflow.com/questions/12329816/error-write-epipe-when-piping-node-output-to-head

@benjaminbarbe

Hi,

Node.js v0.6.18... It moves so fast. I try the last. Thank you for the answer.

@benjaminbarbe

Same problem with Node.js v0.8.11

events.js:68
throw arguments[1]; // Unhandled 'error' event
^
Error: write EPIPE
at errnoException (net.js:769:11)
at Object.afterWrite (net.js:593:19)

@argon
Owner

Have you got a set of steps I can use to reproduce the error? Without that I'm not sure there's much I can do because it's quite a generic message.

@benjaminbarbe

It often happens after an 255 error code :

apn Sending notification from buffer +0ms
apn Sending notification +0ms
apn Clearing notification 6 from the cache +0ms
apn Notification 0 caused an error: 8 +22ms
apn Raising error: +0ms 255 null
apn Buffering 100 notifications +0ms
apn Destroying connection +0ms
apn Restarting connection +1ms
apn Notification queue has 1667 items, resending the first +0ms
apn Initialising connection +0ms

events.js:68
throw arguments[1]; // Unhandled 'error' event
^
Error: write EPIPE
at errnoException (net.js:769:11)
at Object.afterWrite (net.js:593:19)

I will increase the cacheLength value. But maybe it's a hint?

@davethehat

We're also seeing this intermittently - exactly the same symptoms and result. Any progress or new thoughts on this?

@argon
Owner

I have been thinking about this a lot but without a reliable case to reproduce the problem there isn't much I can do for the moment. I have some time next week when I might be able to make some progress.

@davethehat

Thanks! Actually, we're seeing it on some (but not all) error code 8s we're getting. We're not seeing 255 errors at all, which suggests that the cacheLength default of 100 is fine. I'm looking at it now, will let you know if I spot anything.

@argon
Owner
@benjaminbarbe

I confirm that it happens very often after an error occurred. But not all the time.

Thanks Andrew

@argon
Owner

I have spent an afternoon trying to debug this and was unable to reproduce the exact error. I did stumble across some similar problems which I have attempted to fix. If possible could you please try the latest version on the master branch and let me know if you see the problems still?

@davethehat

I'll do that - thanks

D

@argon
Owner

If you still experience the problem with the latest code in 'master' then try the code in 'develop' I have made some more changes relating to the socket handling. I'm not yet ready to move it to the master branch until some more people have tested it but it may solve the EPIPE error problem

@benjaminbarbe

I am trying the code in the 'master' branch. No problem since two days. Fingers crossed! Thanks :-)

@davethehat
@benjaminbarbe

Hi,

It seems to work ;-)

Thanks argon !

@davethehat
@d-a-n

I'm still seeing this error with the latest dev version. It happens all the time if I have a lot of wrong push tokens in my db.

{ [Error: write EPIPE] code: 'EPIPE', errno: 'EPIPE', syscall: 'write' }
null


APNS ERROR:
8
null
[TypeError: Cannot read property 'device' of null]

/home/dan/dev/node_modules/apn/lib/connection.js:339
                        var differentialSize = this.cachedNotifications[0]['_uid'] - notification['
                                                                          ^
TypeError: Cannot read property '_uid' of undefined
    at Connection.handleTransmissionError (/home/dan/dev/node_modules/apn/lib/connection.js:339:54)
    at CleartextStream.EventEmitter.emit (events.js:93:17)
    at CleartextStream.CryptoStream._push (tls.js:522:12)
    at SecurePair.cycle (tls.js:880:20)
    at EncryptedStream.CryptoStream.write (tls.js:267:13)
    at Socket.ondata (stream.js:38:26)
    at Socket.EventEmitter.emit (events.js:93:17)
    at TCP.onread (net.js:396:14)

@argon
Owner

This isn't the same problem. The problem this issue was linked again caused the library to crash because it was an uncaught error. The EPIPE you're seeing is caught and the socket is restarted automatically - the correct behaviour. The error is only there for your information. However, what is happening directly below is in fact a problem I have introduced with the new changes so I will try to fix that, thanks.

@d-a-n

Update:

@argon Thanks for the quick reply. You are right, the above error was an application error. The notification object of the error handler was null. I have fixed it by checking the object. But now I'm still seeing this error with the latest dev branch:

APNS ERROR:
Status code: 8
Notification: null

events.js:68
        throw arguments[1]; // Unhandled 'error' event
                       ^
Error: write EPIPE
    at errnoException (net.js:769:11)
    at Object.afterWrite (net.js:593:19)

If you need I can give you access to my server and a working sample of this error.

@argon
Owner

That would actually be really helpful as I have yet to recreate this problem in my own (very limited) test environment so I'm mostly guessing at present.

That error you posted with the "Cannot read property '_uid' of undefined" really shouldn't have happened either so I would like to look into that. Feel free to email me at the address on my profile and we can discuss further.

@soplwang

We have same problem...

After a bit hack, we find the point of this issue..., in restartConnection() function, there missing a call to force destroy the socket.

When the connection closed by Apple, node-apn will received 'end' event of the socket..., and trigger restartConnection() callback. In this function, this.socket.removeAllListeners() removed error event listener. According Node.js' document, half-closed socket will continue send buffered data. But, at this situation, Apple forced closed the connection before..., then EPIPE happens and, there has no error event listener and..., throw exception...

FIX:
Add this.socket.destroy(); below this.socket.removeAllListeners() line in restartConnection() function.

@soplwang soplwang referenced this issue from a commit in HoopCHINA/node-apn
@soplwang soplwang Fix `EPIPE` error of #65 b208439
@d-a-n

@soplwang This doesn't work for me. I'm still getting the same error.

@argon I have send you a mail with login data to a server. I have prepared a simple test that will reproduce this error every time. I hope this will help to fix this bug.

@argon
Owner

The problem was of course that the problem was happening upon connection, not disconnection. I have delivered a fix to the "develop" branch, please let me know if this fixes it.

@d-a-n thanks for letting me use the server, hopefully I've fixed it now.

@soplwang

@d-a-n You're right. I'm getting same error too.

Exactly, My first hack is add this.socket.on("error", function () {}); below this.socket.removeAllListeners() and seems fix the problem... and lately, I'm change this line to this.socket.destroy(); to try dispose socket object...

I've changed this back to this.socket.on("error", function () {}); and to re-observe on it...

@soplwang

@argon I reviewed connect(2) manpage, on connect there shouldn't have EPIPE error code. And, on connecting, first thing should be dns resolve (i.e. 'gateway.push.apple.com'), this is an async process. So in your 'develop' branch place on('error', ...) above connect() may be no help to solve this problem? ...

@soplwang soplwang referenced this issue from a commit in HoopCHINA/node-apn
@soplwang soplwang Re-Fix `EPIPE` error of #65 357022c
@d-a-n

@argon Thanks for the fix. Now node won't crash anymore but I'm still getting no notification. (My device is one of the last push token in my test array.) Another strange thing is, that the notification is null in the error callback.

APNS ERROR: 
8
null

Update:

I'm still getting the error sometimes:

APNS ERROR: 
8
{ encoding: 'utf8',
  payload: { aps: { badge: 1, sound: 'ping.aiff', alert: 'debug' } },
  expiry: 0,
  identifier: 0,
  alert: 'debug',
  badge: 1,
  sound: 'ping.aiff',
  newsstandAvailable: undefined,
  device: { token:  },
  _uid: 0 }

node.js:201
        throw e; // process.nextTick error, or 'error' event on first tick
              ^
Error: write EPIPE
    at errnoException (net.js:646:11)
    at Object.afterWrite [as oncomplete] (net.js:480:18)

@soplwang I don't get your proposed fix. How does overwriting the error handler with an empty function make any sense?

@argon
Owner

@soplwang I placed log statements throughout the connect code in the module and the error was happening before the tls.connect statement returned so your evaluation is incorrect. I believe that if the DNS Resolve is cached connection will proceed immediately, this is certainly the behaviour I was seeing.

@argon
Owner

@d-a-n does the error still cause node to crash or does it continue processing? The notification will be null if the notification which caused the error is no longer in the cache. You need to up your cacheLength (or enable automatic cache length adjustment) If the notification for your device is purged from the cache before an error comes through it wont get resent.

@d-a-n

@argon Node will crash when the EPIPE error occurs. Do you see the error on my test system?

@argon argon referenced this issue from a commit
@argon Removed erroneous #destroyConnection call
This is hopefully the cause of all the pain in #65
b1e3d6a
@argon
Owner

I think I have finally figured out the cause of the race condition, the code in 'develop' should fix it. I was looking in the wrong place earlier.

@soplwang

@d-a-n Overwrite error handler with empty function to suppress EventEmitter throw on 'error'..., that cause Node crash.

@argon I think your latest fix may cause the problem more serious..., when Apple reply error they'll close the connection..., then destroy the connection is more comfort.

Actually, our server still get EPIPE logs, but Node does not crash today after apply our fix. Let me observe more time.

@argon
Owner

The key point there is that Apple will close the connection. Calling destroy causes the 'close' event to be emitted immediately even if there are other socket errors queued, what then happens is the errors are emitted after the listeners have been removed. Adding a new 'error' handler after removing all listeners may work but it is exactly as you describe, a hack. It says in the node doc that destroy should only be called in the case of errors, which this situation is not, it is much safer to let the socket close cleanly rather than pre-empt what will happen and break things.

@soplwang

@argon If you don't destroy, 'end' event will still be emitted next tick and you still removed ALL listeners too quick...

Other hand, Node.js' 'socket.destroy()' scheduled 'close' event on next tick. So if there has error queued before, they should emit before 'close'...

@argon
Owner

But the errors will also be scheduled on next tick and they will be added to the queue after the 'close' so the problem still happens. The point being that node will by emit the errors before emitting a close, automatically, so the errors will be handled before the listeners are removed because things will be scheduled in the right order instead of the module trying to be clever and pre-empting the socket destruction. It says clearly in the documentation that destroy() should only be used in error situations. Receiving an error from Apple is not the same as a socket error occurring.

@soplwang

Ok. Try imagine, if we fill up the socket write buffer, at least one packet still wrote to Socket stream and buffered at userspace. If Apple disconnect at this point..., we got 'end' event and try restart connection and removed all listeners of old socket. But documents says Socket stream still try send this buffered message..., EPIPE happens and there has no 'error' listener...

@d-a-n

@argon Great, no more crashes so far but why is the notification null after some errors have occurred? I want to remove those bad device tokens, but if the notification object is null, I don't have an reference to this device.

Another problem is, that the push message won't get sent to those devices with an valid push token.

@argon
Owner
@d-a-n

After some more testing with current dev branch on the live system I'm still seeing a lot of this error:

{ [Error: write EPIPE] code: 'EPIPE', errno: 'EPIPE', syscall: 'write' }

And push notifications are not send to the devices.

@argon
Owner

It looks like the .removeAllListeners() call isn't necessary at all. I'm going to remove it in 1.3.0 as this will ensure the listeners are available to be fired and it shouldn't have a detrimental effect on the control flow of the module.

@argon
Owner

@d-a-n You're seeing null notifications returned because your cache isn't big enough so notifications are being removed before Apple has a chance to flag the error. This is documented in the README.md so I would encourage you take a look. There is also an autoAdjustCache option that will try to adjust the cache as necessary during the session to account for future errors.

@argon
Owner

This should be fixed in d4cb530

@argon argon closed this
@argon argon referenced this issue from a commit
@argon Fixed connection for 0.11 and older versions
Fixes #122.

node-0.11 changes how TLS connections handle sockets when connecting.

Discovered in #65, it's necessary to attach the error handler before the socket establishes a connection. In some cases the socket errors before the TLS#connect method returns, so the error handler hasn't been attached yet and an exception is raised.
68de3b9
@soplwang

@argon After one-year later on this issue, I think I should give some feedbacks.

My last hack on this issue, is make send 64 notifications per-tick and all troubles gone. The reason is, when EPIPE occurs, Node.js(0.8.x)' tls.write will always can write and never return error (and never buffer full, it has no chance...), then all queued notifications will be sent out and lost (after cache overflow).

When only send little notifications and schedule next send on next-loop, Node has chance to emit data to indicate Invalid token or emit error of EPIPE.

refs: https://github.com/HoopCHINA/node-notific/blob/master/lib/apns.js#L298

regards.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.