APNSP8 : a very few amount of ios rpush notifications are stuck in processing mode #648

nicolas-entourage · 2022-11-18T13:36:01Z

System configuration:

ruby 2.6.6
rails 5.2.6
rpush 7.0.1
sidekiq 6.2.1
heroku-20 stack
apnsp8 for ios notifs

Issue

A very few amount of IOS rpush notifications (using apnsp8) are stuck in busy mode on sidekiq. The status of this rpush remains in processing in rpush_notifications table.

Context

Having 5 workers, plenty IOS rpush notifs succeed. But i consistently have until 4 workers that are in busy mode, with IOS rpush notifs.

No matter how many workers i have (from 3 to 10), all of them but one eventualy fall in "busy mode" for hours with IOS rpush notifs.

I moved recently from APNS to APNSP8 and did not have this issue before.

Expected behavior

Either the job launching rpush notification should fail and go to retry, either it should succeed and be moved from "busy" to "processed job". But it should not stay in busy mode for hours or more.

code

Here is the code i use to send ios notifications :

notification = Rpush::Apnsp8::Notification.new
notification.badge = badge if badge
notification.app = Rpush::Apnsp8::App.first
notification.device_token = device_token.to_s
notification.alert = {
  title: object,
  body: content
}
notification.data = { sender: sender, object: object, content: { message: content, extra: extra } }
notification.save!

Rpush.push

The text was updated successfully, but these errors were encountered:

jiehanzheng · 2023-04-01T00:33:47Z

Just ran in to the same issue! Although I looked at the code but don't understand why this is happening.

Here is the error message:

jiehan@Jiehans-MBP web % rpush start -f
* Booting Rails 'development' environment... ✔
[2023-03-31 05:12:08][my_app] Starting 1 dispatcher... ✔
[2023-03-31 05:12:08]Rpush operational.
// ...
[2023-03-31 05:13:02][my_app] 4 sent to b23523fa113370b0e7771c128ea14eb0b9ba71ef1329d74045e0a69fcc2029b7
[2023-03-31 16:43:46][ERROR] Errno::ECONNRESET, Connection reset by peer
/Users/jiehan/.rvm/rubies/ruby-3.2.1/lib/ruby/3.2.0/openssl/buffering.rb:214:in `sysread_nonblock'
/Users/jiehan/.rvm/rubies/ruby-3.2.1/lib/ruby/3.2.0/openssl/buffering.rb:214:in `read_nonblock'
/Users/jiehan/.rvm/gems/ruby-3.2.1/gems/net-http2-0.18.4/lib/net-http2/client.rb:145:in `block in socket_loop'
/Users/jiehan/.rvm/gems/ruby-3.2.1/gems/net-http2-0.18.4/lib/net-http2/client.rb:142:in `loop'
/Users/jiehan/.rvm/gems/ruby-3.2.1/gems/net-http2-0.18.4/lib/net-http2/client.rb:142:in `socket_loop'
/Users/jiehan/.rvm/gems/ruby-3.2.1/gems/net-http2-0.18.4/lib/net-http2/client.rb:114:in `block (2 levels) in ensure_open'
[2023-03-31 17:11:10]Shutting down... ✔

And this notification number 5 is stuck in processing:

3.2.1 :006 > Rpush::Notification.last
  Rpush::Client::ActiveRecord::Notification Load (0.4ms)  SELECT "rpush_notifications".* FROM "rpush_notifications" ORDER BY "rpush_notifications"."id" DESC LIMIT ?  [["LIMIT", 1]]
 =>                                                                        
#<Rpush::Client::ActiveRecord::Apnsp8::Notification:0x000000010ab72ef0     
 id: 5,                                                                    
 //...                                                
 delivered: false,                                                         
 delivered_at: nil,                                                        
 failed: false,                                                            
 failed_at: nil,                                                           
 error_code: nil,                                                          
 error_description: nil,
 deliver_after: nil,
 created_at: Fri, 31 Mar 2023 23:43:44.487971000 UTC +00:00,
 updated_at: Fri, 31 Mar 2023 23:43:46.109256000 UTC +00:00,
 alert_is_json: false,
 type: "Rpush::Client::ActiveRecord::Apnsp8::Notification",
 collapse_key: nil,
 delay_while_idle: false,
 registration_ids: nil,
 app_id: 1,
 retries: 0,
 uri: nil,
 fail_after: nil,
 processing: true,
 priority: nil,
 url_args: nil,
 category: nil,
 content_available: false,
 notification: nil,
 mutable_content: false,
 external_device_id: nil,
 thread_id: nil,
 dry_run: false,
 sound_is_json: false>

When I looked at the code, I do see there's ensure block that should have marked it as completed. Not very sure why it's not being executed as expected:

rpush/lib/rpush/daemon/apnsp8/delivery.rb

Lines 21 to 41 in eb48a90

    
           def perform 
        
             @batch.each_notification do |notification| 
        
               prepare_async_post(notification) 
        
             end 
        
             # Send all preprocessed requests at once 
        
             @client.join(timeout: CLIENT_JOIN_TIMEOUT) 
        
           rescue NetHttp2::AsyncRequestTimeout => error 
        
             mark_batch_retryable(Time.now + 10.seconds, error) 
        
             @client.close 
        
             raise 
        
           rescue Errno::ECONNREFUSED, SocketError, HTTP2::Error::StreamLimitExceeded => error 
        
             # TODO restart connection when StreamLimitExceeded 
        
             mark_batch_retryable(Time.now + 10.seconds, error) 
        
             raise 
        
           rescue StandardError => error 
        
             mark_batch_failed(error) 
        
             raise 
        
           ensure 
        
             @batch.all_processed 
        
           end

paulcarpenter · 2023-04-13T15:32:18Z

I'm also seeing a number of notifications stuck in processing on my RPush workers. Seems to be similar behavior to this, which was previously resolved: #494

OskarAtJoint · 2023-09-10T17:01:22Z

Still an issue, anyone resolved this? Also catching Errno::ECONNRESET and retry?

OskarAtJoint · 2023-09-11T06:29:09Z

seems like a "healthcheck" should be done: "Reuse a connection as long as possible. In most cases, you can reuse a connection for many hours to days. If your connection is mostly idle, you may send a HTTP2 PING frame after an hour of inactivity. Reusing a connection often results in less bandwidth and CPU consumption." https://developer.apple.com/documentation/usernotifications/setting_up_a_remote_notification_server/sending_notification_requests_to_apns#3634319 also discussed here node-apn/node-apn#454

paulcarpenter · 2023-09-11T12:11:09Z

We ended up creating a job that we run regularly using Sidekiq scheduler that just retries recently failed notifications. It's not elegant but it does the job:

if since_id = Rpush::Apns2::Notification.where(['created_at < ?', 5.minutes.ago]).order(id: :desc).first&.id

      processing_notifications = Rpush::Apns2::Notification.where(['id > ?', since_id]).where(processing: true).each do |processing_notification|
        processing_notification.processing = false
        processing_notification.save!
      end

      failed_notifications = Rpush::Apns2::Notification.where(['id > ?', since_id]).where(failed: true, error_description: "Connection reset by peer").where(['failed_at > ?', 5.minutes.ago]).each do |failed_notification|
        failed_notification.failed_at = nil
        failed_notification.failed = false
        failed_notification.save!
      end
    end

nicolas-entourage changed the title ~~APNSP8 : a very few amount of ios rpush notifications are stuck in busy mode~~ APNSP8 : a very few amount of ios rpush notifications are stuck in processing mode Nov 28, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

APNSP8 : a very few amount of ios rpush notifications are stuck in processing mode #648

APNSP8 : a very few amount of ios rpush notifications are stuck in processing mode #648

nicolas-entourage commented Nov 18, 2022 •

edited

jiehanzheng commented Apr 1, 2023

paulcarpenter commented Apr 13, 2023

OskarAtJoint commented Sep 10, 2023

OskarAtJoint commented Sep 11, 2023

paulcarpenter commented Sep 11, 2023

APNSP8 : a very few amount of ios rpush notifications are stuck in processing mode #648

APNSP8 : a very few amount of ios rpush notifications are stuck in processing mode #648

Comments

nicolas-entourage commented Nov 18, 2022 • edited

jiehanzheng commented Apr 1, 2023

paulcarpenter commented Apr 13, 2023

OskarAtJoint commented Sep 10, 2023

OskarAtJoint commented Sep 11, 2023

paulcarpenter commented Sep 11, 2023

nicolas-entourage commented Nov 18, 2022 •

edited