-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Notifications not sent, event when NotificationSent delegate is called (requeue involved) #157
Comments
After a bit more testing - I am experiencing exactly the same thing as #85. After a while, no notifications are sent, and I also start getting MaxConnectionAttempts exceeded. |
You might want to keep a lid on the number of parallel processes being On Tue, Apr 16, 2013 at 9:04 PM, omardelrio notifications@github.comwrote:
|
I am having the same issues and using a single thread to queue the notifications. The exact same frequencies and settings don't cause any issues in the previous releases, if you keep having issues I suggest downgrading the PushSharp version. |
@omerkirk please tell me which release you are on currently, and which 'previous' release you are referring to by version, and I can see what the changes made between them to see if I can find a regression. |
@Redth The previous files we used are version 1.1.1.0, which seems to work great for us, but as soon as I upgrade the versions to 2.0.4.0 which is I think the final version on NuGet we start receiving a lot of MaxSendAttemptsReached or similar. I am not really sure about the name. Another stat that might help you is in the older version our ratio of errors to successful notifications was 1 to 1000 however on the latest release we nearly get a 1 to 1 ratio. We use a single thread to send our notifications so it can't be about parallel connections. |
The way we resolved this was simply restarting the whole service after 15 minutes of inactivity. The trick is to keep a "last notification sent timestamp" and restart everything after some time passes; an additional optimization keeps the service from starting if there is no notification pending (some lazy initialization going on there). I also removed additional parallelization to avoid possible synchronization issues until I have more time to create an appropriate test case that consistently reproduces the first issue. I believe my "real" issue has to do with the channel being closed by either APNS servers or intermediary networking equipment; PushSharp, which uses a standard socket, does not find out about this until a few notifications have been considered as "sent", even when they never reached Apple. It appears that the sweet spot for our case was 20 minutes, which seems to be consistent with other reports on this kind of issue. BTW - the TCP keep-alive option used in the connection does not help, since that only kicks in after 2 hours. |
We're having issues with this on 2.0.4 also - we were getting 80-90% of notifications from our AppHarbor worker fail with MaxSendAttemptsReached, although exactly the same code works fine locally. Regularly spinning up a new PushBroker seems to be a workaround. Looking through the code, it looks like the connection is not properly re-established if it has problems (suspect we are repeatedly hitting ApplePushChannel.cs#136). We'll add some logging to the channel and try to narrow down further. |
@samritchie did you ever find anything more interesting in your logging? |
Not much, because we're restarting the broker now we're not hitting the error. We did get a couple of logs out of it a few weeks ago - some IOExceptions: "Authentication failed because the remote party closed the transport stream" |
@samritchie Any stack traces you could provide? |
No, the logs have been purged since then, sorry. We'll switch back to a singleton lifecycle and wait for the issue to recur if that would be helpful. |
@samritchie if you can, that would be much appreciated, I'd love to get to the bottom of this! |
This is what we seem to be getting the most of at the moment: System.IO.IOException: Unable to write data to the transport connection: An established connection was aborted by the software in your host machine. ---> System.Net.Sockets.SocketException: An established connection was aborted by the software in your host machine This seems to happen repeatedly and requeue the notifications until there's a MaxSendAttemptsReachedException. As before, we're seeing a handful of notifications going through, but most are failing. There doesn't appear to be a pattern. |
I can confirm this issue is happening even with the latest-beta build 2.1.1, in my error log I have this error repeated most of the time [The maximum number of Send attempts to send the notification was reached!] and by repeated I'm talking about thousands,most of the users aren't getting any pushes on ios and android in my code I send in bulks for android [1K of tokens per push] and for ios I send one by one |
I can also confirm this issue has happened to us as well with the latest stable version (2.0.4) We pushed the latest beta to production and will see if anything resolves. |
I've seen similar issues, and had implemented a similar fix (i.e. tearing down and re-establishing the connection after a period of "no pushes") in my fork of the older version. I used 30 minutes in my implementation. I've since updated to the latest version of PushSharp in the hope that the keepalive/canwrite checking would fix the issue. I'd vote for adding a 20-30 minute idle check before sending a push, and just closing and re-opening the socket if it's been more than that time since the last push in the channel itself. It worked wonders in the old version... Actually, after poking around a bit, it may be possible to change the 2 hour default on the keepalives using an IOControl call....might be simpler to just bump the keepalive time down to 20 minutes and call it a day. I'm working through some other issues in APNS in my fork so I'd be willing to take a stab at rolling this logic back in as a part of that effort. I'm going to open a separate issue for my other fixes since they seem unrelated. |
I was thinking about implementing an external rate per second monitor and then indeed kill the connection and retry it, why would it make sense to be idled for a period like 20-30 minutes before pushes start going through? |
@aviromascanu I like the idea of implementing some sort of mechanism to kill idle connections. Apple requests that we keep connections open to them as long as possible, but in reality, killing an idle connection after say 5 minutes or so is not going to get anyone banned from APNS (trust me, I've done MUCH worse to no ill effect). I've created a new issue: #231 to track this. @bmatson if you want to add something to your fork I'd be more than happy to bring it in. Thanks guys. |
…nutes to compensate for crappy NAT devices.
…nutes to compensate for crappy NAT devices.
Added logging facade, additional log messages in APNS. Added debug level to allow for repetitive messages. Fixed re-entrancy issues on timers. Corrected potential infinite loop in Dispose(). Possible fix for PushSharp Redth#157, dropped keepalive time to 20 minutes to compensate for crappy NAT devices.
All, please try version 2.1.2-beta or newer. It is on NuGet as a Pre-release and it's also in master now on the repository. Feel free to re-open this if you continue to have issues! |
FWIW we've been running commit dc37756 in our hosting environment for 2 weeks with no problems - unsure if the underlying issue was fixed somewhere after 2.0.4, or we haven't had a recurrence of the network behaviour within AppHarbor. I've upgraded to 2.1.2-beta, I'll report back if I see the it happen again. |
@samritchie thanks for the feedback. I have a feeling 2.1.2 should work much better in general for APNS.. There's a number of improvements now, one of which is the ApplePushService will scale down the number of channels to 0 after being Idle for more than 5 minutes (this idle timeout is configurable in the ApplePushChannelSettings)... Lots of other improvements that should prevent notifications from being lost, etc. |
Setup:
Singleton of PushBroker configured through Autofac. All default settings.
From time to time, we get events for requeuing - and then notification sent. However, sometimes we also see channels being created and destroyed (but sometimes we never see that, only notifications being sent).
Is this expected behavior? We are not even stressing the broker and get what seems apparently low throughput.
For example, this one is a sequence for a single message (although the payload is built every time) sent to only three devices. The notification DID NOT get to the device (the one that was requeued).
All this is called through async methods (a web api async controller). I am wondering if this the right usage pattern. I included the timestamps in the messages since I thought they might be useful.
Here's the code sending the notifications:
And this is how it is being called:
Is this the expected usage pattern?
Any help or pointers would be very appreciated. I am using the latest version of the code.
The text was updated successfully, but these errors were encountered: