Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Notifications not sent, event when NotificationSent delegate is called (requeue involved) #157

Closed
omardelrio opened this issue Apr 16, 2013 · 21 comments

Comments

@omardelrio
Copy link

Setup:
Singleton of PushBroker configured through Autofac. All default settings.

From time to time, we get events for requeuing - and then notification sent. However, sometimes we also see channels being created and destroyed (but sometimes we never see that, only notifications being sent).

Is this expected behavior? We are not even stressing the broker and get what seems apparently low throughput.

For example, this one is a sequence for a single message (although the payload is built every time) sent to only three devices. The notification DID NOT get to the device (the one that was requeued).

15:46:29.8495 DEBUG api.Core.Dispatcher.iosDispatcher Sending notification... 
15:46:34.2485 DEBUG api.Core.Dispatcher.iosDispatcher Sending notification... 
15:46:34.3265 TRACE api.Core.Dispatcher.iosDispatcher Sending notification to <<devicetoken>>, ringin,<<otherdata>> 
15:46:34.3265 TRACE api.Core.Dispatcher.iosDispatcher Sending notification to <<anotherdevicetoken>>, ding, <<otherdata>>
15:46:34.3265 TRACE api.Core.Dispatcher.iosDispatcher Sending notification to <<anothertoken>>, EmergencyNotification, <<otherdata>
15:46:34.5605 ERROR api.Core.Dispatcher.PushService A notification has requested to requeue - {"aps":{"alert":"New Notification","badge":0,"sound":"ding.mp3"},"MessageId":536884044,"Type":"Incident","Subscriber":"SomeName","SubscriberId":"Someid"}  
15:46:36.8068 WARN api.Core.Dispatcher.PushService Notification channel has been created: PushSharp.Apple.ApplePushService 
15:46:38.3512 INFO api.Core.Dispatcher.PushService Notification sent q:0  d:4/16/2013 15:46:34 PM c:PushSharp.Apple.ApplePushService - {"aps":{"alert":"New Notification","badge":0,"sound":"ringin.mp3"},"MessageId":536884044,"Type":"Incident","Subscriber":"SomeName","SubscriberId":"SomeId"} 
15:46:39.3651 INFO api.Core.Dispatcher.PushService Notification sent q:0  d:4/16/2013 15:46:34 PM c:PushSharp.Apple.ApplePushService - {"aps":{"alert":"New Notification","badge":0,"sound":"EmergencyNotification.mp3"},"MessageId":536884044,"Type":"Incident","Subscriber":"SomeName","SubscriberId":"SomeId"} 
15:46:39.3651 INFO api.Core.Dispatcher.PushService Notification sent q:1  d:4/16/2013 15:46:34 PM c:PushSharp.Apple.ApplePushService - {"aps":{"alert":"New Notification","badge":0,"sound":"ding.mp3"},"MessageId":536884044,"Type":"Incident","Subscriber":"SomeName","SubscriberId":"SomeId"} 
15:46:51.8133 INFO PushSharp.Core.Log PushSharp.Apple.ApplePushService -> Destroying Channel 

All this is called through async methods (a web api async controller). I am wondering if this the right usage pattern. I included the timestamps in the messages since I thought they might be useful.

Here's the code sending the notifications:

var payload = new AppleNotificationPayload
                              {
                                  Sound = string.IsNullOrWhiteSpace(deviceToken.Item2) ? "" : deviceToken.Item2 + ".mp3",
                                  Badge = 0,
                                  Alert = 
                                      new AppleNotificationAlert
                                          {
                                              Body = !string.IsNullOrWhiteSpace(message.Subject)
                                                        ? message.Subject.Substring(0, 
                                                            message.Subject.Length > 50 
                                                                ? 50 : message.Subject.Length) : "Subject Text"
                                          }
                              };

            payload.AddCustom("MessageId", new object[] {message.MessageId});
            payload.AddCustom("Type", new object[]{ message.Type.ToString()});
            payload.AddCustom("<<someproperty>>", new object[] { deviceToken.Item3});

            payload.AddCustom("<<anotherproperty>>", subs);

            if(_logger.IsTraceEnabled) _logger.Trace("Sending notification to {0}, {1}, {2}", deviceToken.Item1, deviceToken.Item2, deviceToken.Item3);


            _pushService.Service.QueueNotification
                (new AppleNotification().ForDeviceToken(deviceToken.Item1)                    
                    .WithPayload(payload));

And this is how it is being called:

 recipientDevices.AsParallel().ForAll(
                    deviceToken => SendPayloadToSubscribers(message, subs,deviceToken));

Is this the expected usage pattern?

Any help or pointers would be very appreciated. I am using the latest version of the code.

@omardelrio
Copy link
Author

After a bit more testing - I am experiencing exactly the same thing as #85.

After a while, no notifications are sent, and I also start getting MaxConnectionAttempts exceeded.

@normanhh3
Copy link

You might want to keep a lid on the number of parallel processes being
spawned to see if too many threads are being run.

On Tue, Apr 16, 2013 at 9:04 PM, omardelrio notifications@github.comwrote:

After a bit more testing - I am experiencing exactly the same thing as #85#85.

After a while, no notifications are sent, and I also start getting
MaxConnectionAttempts exceeded.


Reply to this email directly or view it on GitHubhttps://github.com//issues/157#issuecomment-16481277
.

@omerkirk
Copy link
Contributor

omerkirk commented May 9, 2013

I am having the same issues and using a single thread to queue the notifications. The exact same frequencies and settings don't cause any issues in the previous releases, if you keep having issues I suggest downgrading the PushSharp version.

@Redth
Copy link
Owner

Redth commented May 9, 2013

@omerkirk please tell me which release you are on currently, and which 'previous' release you are referring to by version, and I can see what the changes made between them to see if I can find a regression.

@omerkirk
Copy link
Contributor

omerkirk commented May 9, 2013

@Redth The previous files we used are version 1.1.1.0, which seems to work great for us, but as soon as I upgrade the versions to 2.0.4.0 which is I think the final version on NuGet we start receiving a lot of MaxSendAttemptsReached or similar. I am not really sure about the name.

Another stat that might help you is in the older version our ratio of errors to successful notifications was 1 to 1000 however on the latest release we nearly get a 1 to 1 ratio.

We use a single thread to send our notifications so it can't be about parallel connections.

@omardelrio
Copy link
Author

The way we resolved this was simply restarting the whole service after 15 minutes of inactivity. The trick is to keep a "last notification sent timestamp" and restart everything after some time passes; an additional optimization keeps the service from starting if there is no notification pending (some lazy initialization going on there).

I also removed additional parallelization to avoid possible synchronization issues until I have more time to create an appropriate test case that consistently reproduces the first issue.

I believe my "real" issue has to do with the channel being closed by either APNS servers or intermediary networking equipment; PushSharp, which uses a standard socket, does not find out about this until a few notifications have been considered as "sent", even when they never reached Apple. It appears that the sweet spot for our case was 20 minutes, which seems to be consistent with other reports on this kind of issue.

BTW - the TCP keep-alive option used in the connection does not help, since that only kicks in after 2 hours.

@samritchie
Copy link

We're having issues with this on 2.0.4 also - we were getting 80-90% of notifications from our AppHarbor worker fail with MaxSendAttemptsReached, although exactly the same code works fine locally. Regularly spinning up a new PushBroker seems to be a workaround.

Looking through the code, it looks like the connection is not properly re-established if it has problems (suspect we are repeatedly hitting ApplePushChannel.cs#136). We'll add some logging to the channel and try to narrow down further.

@Redth
Copy link
Owner

Redth commented Jun 5, 2013

@samritchie did you ever find anything more interesting in your logging?

@samritchie
Copy link

Not much, because we're restarting the broker now we're not hitting the error. We did get a couple of logs out of it a few weeks ago - some IOExceptions: "Authentication failed because the remote party closed the transport stream"

@Redth
Copy link
Owner

Redth commented Jun 5, 2013

@samritchie Any stack traces you could provide?

@samritchie
Copy link

No, the logs have been purged since then, sorry. We'll switch back to a singleton lifecycle and wait for the issue to recur if that would be helpful.

@Redth
Copy link
Owner

Redth commented Jun 5, 2013

@samritchie if you can, that would be much appreciated, I'd love to get to the bottom of this!

@samritchie
Copy link

This is what we seem to be getting the most of at the moment:

System.IO.IOException: Unable to write data to the transport connection: An established connection was aborted by the software in your host machine. ---> System.Net.Sockets.SocketException: An established connection was aborted by the software in your host machine
at System.Net.Sockets.Socket.Send(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags)
at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size)
--- End of inner exception stack trace ---
at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size)
at System.Net.Security._SslStream.StartWriting(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security._SslStream.ProcessWrite(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)
at System.Net.Security.SslStream.Write(Byte[] buffer, Int32 offset, Int32 count)
at PushSharp.Apple.ApplePushChannel.SendNotification(INotification notification, SendNotificationCallbackDelegate callback)

This seems to happen repeatedly and requeue the notifications until there's a MaxSendAttemptsReachedException. As before, we're seeing a handful of notifications going through, but most are failing. There doesn't appear to be a pattern.

@alaa9jo
Copy link
Contributor

alaa9jo commented Jul 15, 2013

I can confirm this issue is happening even with the latest-beta build 2.1.1, in my error log I have this error repeated most of the time [The maximum number of Send attempts to send the notification was reached!] and by repeated I'm talking about thousands,most of the users aren't getting any pushes on ios and android

in my code I send in bulks for android [1K of tokens per push] and for ios I send one by one

@aviromascanu
Copy link

I can also confirm this issue has happened to us as well with the latest stable version (2.0.4)
Were using the library to send notifications to hundreds of applications daily, and were hitting those socket issues with APNS only, no notifications arrive what so ever for some of these applications whitest their certificate is valid and everything else is ok.
We have attempted to disable the auto-scaling and use a single channel, nothing improved though.

We pushed the latest beta to production and will see if anything resolves.

@bmatson
Copy link

bmatson commented Jul 17, 2013

I've seen similar issues, and had implemented a similar fix (i.e. tearing down and re-establishing the connection after a period of "no pushes") in my fork of the older version. I used 30 minutes in my implementation. I've since updated to the latest version of PushSharp in the hope that the keepalive/canwrite checking would fix the issue. I'd vote for adding a 20-30 minute idle check before sending a push, and just closing and re-opening the socket if it's been more than that time since the last push in the channel itself. It worked wonders in the old version...

Actually, after poking around a bit, it may be possible to change the 2 hour default on the keepalives using an IOControl call....might be simpler to just bump the keepalive time down to 20 minutes and call it a day.

I'm working through some other issues in APNS in my fork so I'd be willing to take a stab at rolling this logic back in as a part of that effort. I'm going to open a separate issue for my other fixes since they seem unrelated.

@aviromascanu
Copy link

I was thinking about implementing an external rate per second monitor and then indeed kill the connection and retry it, why would it make sense to be idled for a period like 20-30 minutes before pushes start going through?

@Redth
Copy link
Owner

Redth commented Jul 17, 2013

@aviromascanu I like the idea of implementing some sort of mechanism to kill idle connections. Apple requests that we keep connections open to them as long as possible, but in reality, killing an idle connection after say 5 minutes or so is not going to get anyone banned from APNS (trust me, I've done MUCH worse to no ill effect).

I've created a new issue: #231 to track this. @bmatson if you want to add something to your fork I'd be more than happy to bring it in. Thanks guys.

bmatson pushed a commit to bmatson/PushSharp that referenced this issue Jul 17, 2013
…nutes to compensate for crappy NAT devices.
bmatson pushed a commit to bmatson/PushSharp that referenced this issue Jul 17, 2013
…nutes to compensate for crappy NAT devices.
bmatson pushed a commit to bmatson/PushSharp that referenced this issue Jul 17, 2013
Added logging facade, additional log messages in APNS.

Added debug level to allow for repetitive messages.

Fixed re-entrancy issues on timers.

Corrected potential infinite loop in Dispose().

Possible fix for PushSharp Redth#157, dropped keepalive time to 20 minutes to compensate for crappy NAT devices.
@Redth
Copy link
Owner

Redth commented Aug 8, 2013

All, please try version 2.1.2-beta or newer. It is on NuGet as a Pre-release and it's also in master now on the repository. Feel free to re-open this if you continue to have issues!

@Redth Redth closed this as completed Aug 8, 2013
@samritchie
Copy link

FWIW we've been running commit dc37756 in our hosting environment for 2 weeks with no problems - unsure if the underlying issue was fixed somewhere after 2.0.4, or we haven't had a recurrence of the network behaviour within AppHarbor.

I've upgraded to 2.1.2-beta, I'll report back if I see the it happen again.

@Redth
Copy link
Owner

Redth commented Aug 9, 2013

@samritchie thanks for the feedback. I have a feeling 2.1.2 should work much better in general for APNS.. There's a number of improvements now, one of which is the ApplePushService will scale down the number of channels to 0 after being Idle for more than 5 minutes (this idle timeout is configurable in the ApplePushChannelSettings)... Lots of other improvements that should prevent notifications from being lost, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants