Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

APNSP8 : a very few amount of ios rpush notifications are stuck in processing mode #648

Open
nicolas-entourage opened this issue Nov 18, 2022 · 5 comments

Comments

@nicolas-entourage
Copy link

nicolas-entourage commented Nov 18, 2022

System configuration:

  • ruby 2.6.6
  • rails 5.2.6
  • rpush 7.0.1
  • sidekiq 6.2.1
  • heroku-20 stack
  • apnsp8 for ios notifs

Issue

A very few amount of IOS rpush notifications (using apnsp8) are stuck in busy mode on sidekiq. The status of this rpush remains in processing in rpush_notifications table.

Context

Having 5 workers, plenty IOS rpush notifs succeed. But i consistently have until 4 workers that are in busy mode, with IOS rpush notifs.

No matter how many workers i have (from 3 to 10), all of them but one eventualy fall in "busy mode" for hours with IOS rpush notifs.

I moved recently from APNS to APNSP8 and did not have this issue before.

Expected behavior

Either the job launching rpush notification should fail and go to retry, either it should succeed and be moved from "busy" to "processed job". But it should not stay in busy mode for hours or more.

code

Here is the code i use to send ios notifications :

notification = Rpush::Apnsp8::Notification.new
notification.badge = badge if badge
notification.app = Rpush::Apnsp8::App.first
notification.device_token = device_token.to_s
notification.alert = {
  title: object,
  body: content
}
notification.data = { sender: sender, object: object, content: { message: content, extra: extra } }
notification.save!

Rpush.push
@nicolas-entourage nicolas-entourage changed the title APNSP8 : a very few amount of ios rpush notifications are stuck in busy mode APNSP8 : a very few amount of ios rpush notifications are stuck in processing mode Nov 28, 2022
@jiehanzheng
Copy link

Just ran in to the same issue! Although I looked at the code but don't understand why this is happening.

Here is the error message:

jiehan@Jiehans-MBP web % rpush start -f
* Booting Rails 'development' environment... ✔
[2023-03-31 05:12:08][my_app] Starting 1 dispatcher... ✔
[2023-03-31 05:12:08]Rpush operational.
// ...
[2023-03-31 05:13:02][my_app] 4 sent to b23523fa113370b0e7771c128ea14eb0b9ba71ef1329d74045e0a69fcc2029b7
[2023-03-31 16:43:46][ERROR] Errno::ECONNRESET, Connection reset by peer
/Users/jiehan/.rvm/rubies/ruby-3.2.1/lib/ruby/3.2.0/openssl/buffering.rb:214:in `sysread_nonblock'
/Users/jiehan/.rvm/rubies/ruby-3.2.1/lib/ruby/3.2.0/openssl/buffering.rb:214:in `read_nonblock'
/Users/jiehan/.rvm/gems/ruby-3.2.1/gems/net-http2-0.18.4/lib/net-http2/client.rb:145:in `block in socket_loop'
/Users/jiehan/.rvm/gems/ruby-3.2.1/gems/net-http2-0.18.4/lib/net-http2/client.rb:142:in `loop'
/Users/jiehan/.rvm/gems/ruby-3.2.1/gems/net-http2-0.18.4/lib/net-http2/client.rb:142:in `socket_loop'
/Users/jiehan/.rvm/gems/ruby-3.2.1/gems/net-http2-0.18.4/lib/net-http2/client.rb:114:in `block (2 levels) in ensure_open'
[2023-03-31 17:11:10]Shutting down... ✔

And this notification number 5 is stuck in processing:

3.2.1 :006 > Rpush::Notification.last
  Rpush::Client::ActiveRecord::Notification Load (0.4ms)  SELECT "rpush_notifications".* FROM "rpush_notifications" ORDER BY "rpush_notifications"."id" DESC LIMIT ?  [["LIMIT", 1]]
 =>                                                                        
#<Rpush::Client::ActiveRecord::Apnsp8::Notification:0x000000010ab72ef0     
 id: 5,                                                                    
 //...                                                
 delivered: false,                                                         
 delivered_at: nil,                                                        
 failed: false,                                                            
 failed_at: nil,                                                           
 error_code: nil,                                                          
 error_description: nil,
 deliver_after: nil,
 created_at: Fri, 31 Mar 2023 23:43:44.487971000 UTC +00:00,
 updated_at: Fri, 31 Mar 2023 23:43:46.109256000 UTC +00:00,
 alert_is_json: false,
 type: "Rpush::Client::ActiveRecord::Apnsp8::Notification",
 collapse_key: nil,
 delay_while_idle: false,
 registration_ids: nil,
 app_id: 1,
 retries: 0,
 uri: nil,
 fail_after: nil,
 processing: true,
 priority: nil,
 url_args: nil,
 category: nil,
 content_available: false,
 notification: nil,
 mutable_content: false,
 external_device_id: nil,
 thread_id: nil,
 dry_run: false,
 sound_is_json: false> 

When I looked at the code, I do see there's ensure block that should have marked it as completed. Not very sure why it's not being executed as expected:

def perform
@batch.each_notification do |notification|
prepare_async_post(notification)
end
# Send all preprocessed requests at once
@client.join(timeout: CLIENT_JOIN_TIMEOUT)
rescue NetHttp2::AsyncRequestTimeout => error
mark_batch_retryable(Time.now + 10.seconds, error)
@client.close
raise
rescue Errno::ECONNREFUSED, SocketError, HTTP2::Error::StreamLimitExceeded => error
# TODO restart connection when StreamLimitExceeded
mark_batch_retryable(Time.now + 10.seconds, error)
raise
rescue StandardError => error
mark_batch_failed(error)
raise
ensure
@batch.all_processed
end

@paulcarpenter
Copy link

I'm also seeing a number of notifications stuck in processing on my RPush workers. Seems to be similar behavior to this, which was previously resolved: #494

@OskarAtJoint
Copy link

Still an issue, anyone resolved this? Also catching Errno::ECONNRESET and retry?

@OskarAtJoint
Copy link

seems like a "healthcheck" should be done: "Reuse a connection as long as possible. In most cases, you can reuse a connection for many hours to days. If your connection is mostly idle, you may send a HTTP2 PING frame after an hour of inactivity. Reusing a connection often results in less bandwidth and CPU consumption." https://developer.apple.com/documentation/usernotifications/setting_up_a_remote_notification_server/sending_notification_requests_to_apns#3634319 also discussed here node-apn/node-apn#454

@paulcarpenter
Copy link

We ended up creating a job that we run regularly using Sidekiq scheduler that just retries recently failed notifications. It's not elegant but it does the job:

if since_id = Rpush::Apns2::Notification.where(['created_at < ?', 5.minutes.ago]).order(id: :desc).first&.id

      processing_notifications = Rpush::Apns2::Notification.where(['id > ?', since_id]).where(processing: true).each do |processing_notification|
        processing_notification.processing = false
        processing_notification.save!
      end

      failed_notifications = Rpush::Apns2::Notification.where(['id > ?', since_id]).where(failed: true, error_description: "Connection reset by peer").where(['failed_at > ?', 5.minutes.ago]).each do |failed_notification|
        failed_notification.failed_at = nil
        failed_notification.failed = false
        failed_notification.save!
      end
    end

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants