
Question: what's the correct way to handle database connection pool size when using cron #443

Closed
jgrau opened this issue Nov 1, 2021 · 3 comments

jgrau (Contributor) commented Nov 1, 2021

Hi @bensheldon

I'm using good_job in my Kubernetes setup. I run 2 worker pods that are started with bundle exec good_job start. The relevant configuration is:

# application.rb
    config.good_job.enable_cron = true
    config.good_job.cron = {
      complete_completed_bookings: {
        cron: 'at midnight',
        class: 'CompleteCompletedBookingsJob',
        description:
          'Every midnight: update finished bookings: Set state to completed',
      },
      import_content_translations: {
        cron: 'every 15 minutes',
        class: 'ImportContentTranslationsJob',
        description:
          'Every 15 minutes: import content translations from crowdin',
      },
      purge_unattached_active_storage_blobs: {
        cron: 'at midnight',
        class: 'Maintenance::PurgeUnattachedActiveStorageBlobsJob',
        description: 'Every midnight: purge unattached active storage blobs',
      },
    }

# production.rb
  config.x.good_job = {
    execution_mode: :external,
    username: 'landfolk',
    password: Rails.application.credentials.good_job.fetch(:password),
  }

# database.yml
default: &default
  adapter: postgresql
  encoding: unicode
  # For details on connection pooling, see Rails configuration guide
  # https://guides.rubyonrails.org/configuring.html#database-pooling
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>

production:
  primary:
    <<: *default
    username: <%= ENV['PGUSER'] %>
    password: <%= ENV['PGPASSWORD'] %>
    database: <%= ENV['PGDATABASE'] %>
    host: <%= ENV['PGHOST'] %>
    port: <%= ENV['PGPORT'] %>
  primary_replica:
    <<: *default
    username: <%= ENV['REPLICA_PGUSER'] %>
    password: <%= ENV['REPLICA_PGPASSWORD'] %>
    database: <%= ENV['REPLICA_PGDATABASE'] %>
    host: <%= ENV['REPLICA_PGHOST'] %>
    port: <%= ENV['REPLICA_PGPORT'] %>
    replica: true

I've been seeing ActiveRecord::ConnectionTimeoutError recently. Here's a full log entry:

E, [2021-11-01T11:15:57.005688 #1] ERROR -- : [ActiveJob] Failed enqueuing ImportContentTranslationsJob to GoodJob(production_default): ActiveRecord::ConnectionTimeoutError (could not obtain a connection from the pool within 5.000 seconds (waited 5.002 seconds); all pooled connections were in use)

which got me wondering if the database connection pool setup is correct.

Does the "cron" process use a connection from the pool?

I should add that I was seeing the errors at a time when a lot of jobs were being retried (using wait: :exponentially_longer), but nevertheless I wouldn't expect this error.
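
For reference, the retry configuration I mean looks roughly like this (the attempt count here is just illustrative):

class ImportContentTranslationsJob < ApplicationJob
  # Each retry is re-enqueued with an exponentially growing wait; every
  # enqueue needs a database connection, so a burst of retries adds pool pressure.
  retry_on StandardError, wait: :exponentially_longer, attempts: 10

  def perform
    # import content translations from crowdin ...
  end
end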

Also: I considered whether it would be better to run a separate scheduler pod, limited to exactly 1 replica and started with --enable-cron, but I then realised that good_job start runs the cron process AND the worker process, so I figured that was not the intended setup :)

bensheldon added this to Inbox in Backlog Nov 1, 2021

jgrau commented Nov 1, 2021

I should also add that I have studied the README, but as I understood it, the complexity around connection pool size only applied to execution_mode: :async.

bensheldon (Owner) commented

Does the "cron" process use a connection from the pool?

Yes. Cron runs in background thread(s), and each thread will consume a database connection from the pool.

The CronManager creates Concurrent::ScheduledTasks, which run on Concurrent Ruby's global thread pool; that pool is implicitly sized to the number of CPU cores.

Note: It's possible to change the CronManager implementation to run on a dedicated and explicitly sized thread pool, which would give a bit more certainty about the number of database connections consumed.
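
As a sketch only (this is not what CronManager does today; the pool size and the scheduled job are assumptions), a dedicated pool could look like:

require 'concurrent'

# Concurrent Ruby derives its default pool sizing from the detected CPU count:
Concurrent.processor_count # => e.g. 8

# A dedicated, explicitly sized pool bounds the number of cron threads (and
# therefore the database connections they can hold) to a known value:
cron_pool = Concurrent::FixedThreadPool.new(2)

Concurrent::ScheduledTask.execute(15 * 60, executor: cron_pool) do
  ImportContentTranslationsJob.perform_later
end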

At the moment, I would recommend that your database pool size = [Web/Puma threads if async] + [GoodJob execution threads] + [1 GoodJob LISTEN/NOTIFY thread] + [2 GoodJob cron threads] + 20% margin
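
Plugging your setup into that formula (execution_mode: :external, so no web threads count; assuming GoodJob's default of 5 execution threads):

# Back-of-the-envelope pool sizing per worker process:
web_threads       = 0  # Puma threads only count with execution_mode: :async
execution_threads = 5  # GoodJob job-execution threads (the default)
listen_notify     = 1  # GoodJob LISTEN/NOTIFY thread
cron_threads      = 2  # headroom for cron's scheduled-task threads

base = web_threads + execution_threads + listen_notify + cron_threads # => 8
pool = (base * 1.2).ceil                                              # => 10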

GoodJob will check in database connections at the end of each operation (e.g. executing a job, enqueuing a cron entry), so it's possible to run with fewer database connections than threads, but that leaves open the possibility that you will hit the limit under load, which will result in those database connection timeouts.
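
If you want to confirm what's happening under load, ActiveRecord exposes live pool statistics (ConnectionPool#stat, available since Rails 5.1):

# Inspect pool usage from a console or a periodic log line:
ActiveRecord::Base.connection_pool.stat
# => {size: 5, connections: 5, busy: 5, dead: 0, idle: 0, waiting: 2, checkout_timeout: 5}

A waiting count above zero means threads are blocked on checkout, which is exactly when those ConnectionTimeoutErrors appear.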

jgrau commented Nov 2, 2021

Thank you @bensheldon! That makes a lot of sense, and with that calculation I needed to double my database pool size. I'm deploying that change now and feel pretty confident it will resolve my issue! :) ❤️

jgrau closed this as completed Nov 2, 2021
Backlog automation moved this from Inbox to Done Nov 2, 2021