Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Freezes puma on code change #95

Closed
sj26 opened this issue Aug 25, 2020 · 6 comments · Fixed by #97
Closed

Freezes puma on code change #95

sj26 opened this issue Aug 25, 2020 · 6 comments · Fixed by #97
Labels
bug Something isn't working
Projects

Comments

@sj26
Copy link
Contributor

sj26 commented Aug 25, 2020

When running GoodJob async in a puma process, if I save a file on the filesystem and then make a new puma request then the whole process locks up irrecoverably, and I have to kill -9 the process. I'm using Rails 6.0.3.2, puma 4.3.5 and good_job 1.2.1, in the default configuration, i.e with zeitwerk and listen, on macOS.

@bensheldon bensheldon added this to Inbox in Backlog Aug 25, 2020
@sj26
Copy link
Contributor Author

sj26 commented Aug 25, 2020

I got a big fat stack of backtraces via sigdump.

Here's my puma thread trying to unload:

  Thread #<Thread:0x00007f87bd12ef70@puma threadpool 003 gems/puma-4.3.5/lib/puma/thread_pool.rb:89 sleep_forever> status=sleep priority=0
      monitor.rb:108:in `sleep'
      monitor.rb:108:in `wait'
      monitor.rb:108:in `wait_for_cond'
      monitor.rb:108:in `wait'
      monitor.rb:116:in `wait_while'
      gems/activesupport-6.0.3.2/lib/active_support/concurrency/share_lock.rb:220:in `wait_for'
      gems/activesupport-6.0.3.2/lib/active_support/concurrency/share_lock.rb:83:in `block (2 levels) in start_exclusive'
      gems/activesupport-6.0.3.2/lib/active_support/concurrency/share_lock.rb:187:in `yield_shares'
      gems/activesupport-6.0.3.2/lib/active_support/concurrency/share_lock.rb:82:in `block in start_exclusive'
      monitor.rb:202:in `synchronize'
      monitor.rb:202:in `mon_synchronize'
      gems/activesupport-6.0.3.2/lib/active_support/concurrency/share_lock.rb:77:in `start_exclusive'
      gems/activesupport-6.0.3.2/lib/active_support/dependencies/interlock.rb:27:in `start_unloading'
      gems/activesupport-6.0.3.2/lib/active_support/reloader.rb:100:in `require_unload_lock!'
      gems/activesupport-6.0.3.2/lib/active_support/reloader.rb:119:in `class_unload!'
      gems/railties-6.0.3.2/lib/rails/application/finisher.rb:231:in `block (2 levels) in <module:Finisher>'
      gems/activesupport-6.0.3.2/lib/active_support/callbacks.rb:428:in `instance_exec'
      gems/activesupport-6.0.3.2/lib/active_support/callbacks.rb:428:in `block in make_lambda'
      gems/activesupport-6.0.3.2/lib/active_support/callbacks.rb:200:in `block (2 levels) in halting'
      gems/activesupport-6.0.3.2/lib/active_support/callbacks.rb:605:in `block (2 levels) in default_terminator'
      gems/activesupport-6.0.3.2/lib/active_support/callbacks.rb:604:in `catch'
      gems/activesupport-6.0.3.2/lib/active_support/callbacks.rb:604:in `block in default_terminator'
      gems/activesupport-6.0.3.2/lib/active_support/callbacks.rb:201:in `block in halting'
      gems/activesupport-6.0.3.2/lib/active_support/callbacks.rb:513:in `block in invoke_before'
      gems/activesupport-6.0.3.2/lib/active_support/callbacks.rb:513:in `each'
      gems/activesupport-6.0.3.2/lib/active_support/callbacks.rb:513:in `invoke_before'
      gems/activesupport-6.0.3.2/lib/active_support/callbacks.rb:134:in `run_callbacks'
      gems/activesupport-6.0.3.2/lib/active_support/execution_wrapper.rb:111:in `run!'
      gems/activesupport-6.0.3.2/lib/active_support/reloader.rb:114:in `run!'
      gems/activesupport-6.0.3.2/lib/active_support/execution_wrapper.rb:73:in `block in run!'
      gems/activesupport-6.0.3.2/lib/active_support/execution_wrapper.rb:70:in `tap'
      gems/activesupport-6.0.3.2/lib/active_support/execution_wrapper.rb:70:in `run!'
      gems/activesupport-6.0.3.2/lib/active_support/reloader.rb:63:in `run!'
      gems/actionpack-6.0.3.2/lib/action_dispatch/middleware/executor.rb:12:in `call'
      gems/actionpack-6.0.3.2/lib/action_dispatch/middleware/actionable_exceptions.rb:17:in `call'
      gems/actionpack-6.0.3.2/lib/action_dispatch/middleware/debug_exceptions.rb:32:in `call'
      gems/web-console-4.0.4/lib/web_console/middleware.rb:132:in `call_app'
      gems/web-console-4.0.4/lib/web_console/middleware.rb:28:in `block in call'
      gems/web-console-4.0.4/lib/web_console/middleware.rb:17:in `catch'
      gems/web-console-4.0.4/lib/web_console/middleware.rb:17:in `call'
      gems/actionpack-6.0.3.2/lib/action_dispatch/middleware/show_exceptions.rb:33:in `call'
      gems/railties-6.0.3.2/lib/rails/rack/logger.rb:37:in `call_app'
      gems/railties-6.0.3.2/lib/rails/rack/logger.rb:26:in `block in call'
      gems/activesupport-6.0.3.2/lib/active_support/tagged_logging.rb:80:in `block in tagged'
      gems/activesupport-6.0.3.2/lib/active_support/tagged_logging.rb:28:in `tagged'
      gems/activesupport-6.0.3.2/lib/active_support/tagged_logging.rb:80:in `tagged'
      gems/railties-6.0.3.2/lib/rails/rack/logger.rb:26:in `call'
      gems/actionpack-6.0.3.2/lib/action_dispatch/middleware/remote_ip.rb:81:in `call'
      gems/actionpack-6.0.3.2/lib/action_dispatch/middleware/request_id.rb:27:in `call'
      gems/rack-2.2.3/lib/rack/method_override.rb:24:in `call'
      gems/rack-2.2.3/lib/rack/runtime.rb:22:in `call'
      gems/activesupport-6.0.3.2/lib/active_support/cache/strategy/local_cache_middleware.rb:29:in `call'
      gems/actionpack-6.0.3.2/lib/action_dispatch/middleware/executor.rb:14:in `call'
      gems/actionpack-6.0.3.2/lib/action_dispatch/middleware/static.rb:126:in `call'
      gems/rack-2.2.3/lib/rack/sendfile.rb:110:in `call'
      gems/actionpack-6.0.3.2/lib/action_dispatch/middleware/host_authorization.rb:82:in `call'
      gems/webpacker-4.3.0/lib/webpacker/dev_server_proxy.rb:23:in `perform_request'
      gems/rack-proxy-0.6.5/lib/rack/proxy.rb:57:in `call'
      gems/railties-6.0.3.2/lib/rails/engine.rb:527:in `call'
      gems/puma-4.3.5/lib/puma/configuration.rb:228:in `call'
      gems/puma-4.3.5/lib/puma/server.rb:713:in `handle_request'
      gems/puma-4.3.5/lib/puma/server.rb:472:in `process_client'
      gems/puma-4.3.5/lib/puma/server.rb:328:in `block in run'
      gems/puma-4.3.5/lib/puma/thread_pool.rb:134:in `block in spawn_thread'

It seems to be competing with several good job schedulers:


  Thread #<Thread:0x00007f87bf057a50@GoodJob::Scheduler(queues=* max_threads=9 poll_interval=1)-worker-2 gems/concurrent-ruby-1.1.7/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:332 sleep_forever> status=sleep priority=0
      monitor.rb:108:in `sleep'
      monitor.rb:108:in `wait'
      monitor.rb:108:in `wait_for_cond'
      monitor.rb:108:in `wait'
      monitor.rb:116:in `wait_while'
      gems/activesupport-6.0.3.2/lib/active_support/concurrency/share_lock.rb:220:in `wait_for'
      gems/activesupport-6.0.3.2/lib/active_support/concurrency/share_lock.rb:125:in `block in start_sharing'
      monitor.rb:202:in `synchronize'
      monitor.rb:202:in `mon_synchronize'
      gems/activesupport-6.0.3.2/lib/active_support/concurrency/share_lock.rb:115:in `start_sharing'
      gems/activesupport-6.0.3.2/lib/active_support/dependencies/interlock.rb:35:in `start_running'
      gems/railties-6.0.3.2/lib/rails/application/finisher.rb:148:in `run'
      gems/activesupport-6.0.3.2/lib/active_support/execution_wrapper.rb:28:in `before'
      gems/activesupport-6.0.3.2/lib/active_support/callbacks.rb:428:in `block in make_lambda'
      gems/activesupport-6.0.3.2/lib/active_support/callbacks.rb:200:in `block (2 levels) in halting'
      gems/activesupport-6.0.3.2/lib/active_support/callbacks.rb:605:in `block (2 levels) in default_terminator'
      gems/activesupport-6.0.3.2/lib/active_support/callbacks.rb:604:in `catch'
      gems/activesupport-6.0.3.2/lib/active_support/callbacks.rb:604:in `block in default_terminator'
      gems/activesupport-6.0.3.2/lib/active_support/callbacks.rb:201:in `block in halting'
      gems/activesupport-6.0.3.2/lib/active_support/callbacks.rb:513:in `block in invoke_before'
      gems/activesupport-6.0.3.2/lib/active_support/callbacks.rb:513:in `each'
      gems/activesupport-6.0.3.2/lib/active_support/callbacks.rb:513:in `invoke_before'
      gems/activesupport-6.0.3.2/lib/active_support/callbacks.rb:134:in `run_callbacks'
      gems/activesupport-6.0.3.2/lib/active_support/execution_wrapper.rb:111:in `run!'
      gems/activesupport-6.0.3.2/lib/active_support/execution_wrapper.rb:73:in `block in run!'
      gems/activesupport-6.0.3.2/lib/active_support/execution_wrapper.rb:70:in `tap'
      gems/activesupport-6.0.3.2/lib/active_support/execution_wrapper.rb:70:in `run!'
      gems/activesupport-6.0.3.2/lib/active_support/execution_wrapper.rb:86:in `wrap'
      gems/good_job-1.2.1/lib/good_job/scheduler.rb:142:in `block in create_thread'
      gems/concurrent-ruby-1.1.7/lib/concurrent-ruby/concurrent/executor/safe_task_executor.rb:24:in `block in execute'
      gems/concurrent-ruby-1.1.7/lib/concurrent-ruby/concurrent/synchronization/mutex_lockable_object.rb:41:in `block in synchronize'
      gems/concurrent-ruby-1.1.7/lib/concurrent-ruby/concurrent/synchronization/mutex_lockable_object.rb:41:in `synchronize'
      gems/concurrent-ruby-1.1.7/lib/concurrent-ruby/concurrent/synchronization/mutex_lockable_object.rb:41:in `synchronize'
      gems/concurrent-ruby-1.1.7/lib/concurrent-ruby/concurrent/executor/safe_task_executor.rb:19:in `execute'
      gems/concurrent-ruby-1.1.7/lib/concurrent-ruby/concurrent/ivar.rb:169:in `safe_execute'
      gems/concurrent-ruby-1.1.7/lib/concurrent-ruby/concurrent/future.rb:55:in `block in execute'
      gems/concurrent-ruby-1.1.7/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:363:in `run_task'
      gems/concurrent-ruby-1.1.7/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:352:in `block (3 levels) in create_worker'
      gems/concurrent-ruby-1.1.7/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:335:in `loop'
      gems/concurrent-ruby-1.1.7/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:335:in `block (2 levels) in create_worker'
      gems/concurrent-ruby-1.1.7/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:334:in `catch'
      gems/concurrent-ruby-1.1.7/lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:334:in `block in create_worker'

@sj26
Copy link
Contributor Author

sj26 commented Aug 25, 2020

It looks like this is too broad:

future = Concurrent::Future.new(args: [@performer], executor: @pool) do |performer|
output = nil
Rails.application.executor.wrap { output = performer.next }
output
end

It should not be wrapping until the future is fulfilled. Otherwise it goes to sleep holding the lock and prevents any critical exclusive sections from running, i.e. deadlocks.

@sj26
Copy link
Contributor Author

sj26 commented Aug 25, 2020

... I'm not even sure you should be using Rails.application.executor.wrap, ActiveJob does it for you:

https://github.com/rails/rails/blob/master/activejob/lib/active_job/railtie.rb#L41-L45

@sj26
Copy link
Contributor Author

sj26 commented Aug 25, 2020

Here too — going to sleep (wait_for_notify) while holding a lock is going to lead to a deadlock, even while allowing concurrent loads (the unloads will block):

Rails.application.reloader.wrap do
conn = ActiveRecord::Base.connection.raw_connection
ActiveSupport::Notifications.instrument("notifier_listen.good_job") do
conn.async_exec "LISTEN #{CHANNEL}"
end
begin
ActiveSupport::Dependencies.interlock.permit_concurrent_loads do
while pool.running?
listening.make_true
conn.wait_for_notify(WAIT_INTERVAL) do |channel, _pid, payload|
listening.make_false
next unless channel == CHANNEL
ActiveSupport::Notifications.instrument("notifier_notified.good_job", { payload: payload })
parsed_payload = JSON.parse(payload, symbolize_names: true)
recipients.each do |recipient|
target, method_name = recipient.is_a?(Array) ? recipient : [recipient, :call]
target.send(method_name, parsed_payload)
end
end
listening.make_false
end
end
rescue StandardError => e
ActiveSupport::Notifications.instrument("notifier_notify_error.good_job", { error: e })
raise
ensure
@listening.make_false
ActiveSupport::Notifications.instrument("notifier_unlisten.good_job") do
conn.async_exec "UNLISTEN *"
end
end
end

@bensheldon
Copy link
Owner

Thanks @sj26 for digging into this. This is great work ✨ I did a lot of deadlock tracing when #82 landed, there's additional debugging documentation there.

To dig into your comments:

  • Scheduler:
    • re: comment: Calling Performer#next performs an ActiveReord query to fetch/lock the next job. So afaik it needs a Rails.application.executor.wrap around the whole thing to check out the ActiveRecord connection.
    • re: comment The block passed to Concurrent::Future.new(args: [@performer], executor: @pool) do |performer| ... only executes when the future is executed.
  • Notifier:
    • re: comment. The long-running polling process is the piece I'm least confident in. It's currently wrapped in a reloader, but having sunk hours into working on deadlocks I'm still not confident in my understanding of the Rails lazy/hot reloading, and whether the reloader is more necessary than an executor from a theoretical standpoint, regardless of the actual behavior.

I'm fairly certain the long-running Notifier is the troublemaker here. And that all of these troubles are due to lazy/hot reloading in development (I have knowledge of one person other than me successfully running :async in production).

Is this something you'd be able to dig into more deeply? If you have the time for synchronous work, I really like pair-programming and could host a zoom/tuple.

@bensheldon bensheldon added the bug Something isn't working label Aug 25, 2020
@sj26
Copy link
Contributor Author

sj26 commented Aug 25, 2020

Nice, I'll read through that a little more closely soon. And re: pairing, I'm just about to head off for the night, but another time for sure.

For now, I think you want something a little more like this — explicitly managing ActiveRecord connection pool usage a la ActionCable, without being trapped by the ActiveSupport::Dependencies.interlock:
main...sj26:less-interlocking

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Backlog
  
Done
Development

Successfully merging a pull request may close this issue.

2 participants