
Remove pg_advisory_unlock_all() after job is run; only verify blank finished_at (and not lock presence) before performing job #1113

Merged: 1 commit into main from remove_unlock_all, Oct 19, 2023

Conversation

@bensheldon (Owner) commented on Oct 18, 2023

The background of this change arises from trying to overcome the performance limitations of the lock verification as described in #896.

Before this change, the sequence when dequeuing a job to perform was (both variants are sketched in SQL after the second list):

  1. Do the advisory lock query, and return the record
  2. Verify that the record is advisory locked and finished_at is not set
  3. Execute the job if the previous condition is ok
  4. Blanket-unlock all advisory locks held by the session (pg_advisory_unlock_all)

After this change:

  1. Do the advisory lock query, and return the record
  2. Verify that the record's finished_at is not set
  3. Execute the job if the previous condition is ok
  4. Unlock the specific record that was advisory locked in the first step
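
A rough sketch of both cleanup strategies in plain SQL, with a simplified dequeue query; the lock-key expression here is only illustrative, not necessarily the exact derivation GoodJob uses:

  -- Step 1 (simplified): pick one unfinished candidate row and advisory-lock it
  SELECT * FROM good_jobs
  WHERE id = (
    SELECT id FROM good_jobs
    WHERE finished_at IS NULL
    ORDER BY created_at
    LIMIT 1
  )
  AND pg_try_advisory_lock(('x' || substr(md5(id::text), 1, 16))::bit(64)::bigint);

  -- Step 4, before this change: release every advisory lock the session holds
  SELECT pg_advisory_unlock_all();

  -- Step 4, after this change: release only the lock taken for the dequeued row
  SELECT pg_advisory_unlock(('x' || substr(md5(id::text), 1, 16))::bit(64)::bigint)
  FROM good_jobs WHERE id = :dequeued_job_id;  -- :dequeued_job_id is a placeholder for the row returned in step 1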

Background / Analysis

Briefly: GoodJob's advisory locking strategy went through a lot of churn in mid-2021.

Using pg_advisory_unlock_all, instead of unlocking the specifically returned record, was introduced in #285 to address unexpectedly advisory-locked records caused by "an indeterminate reason" (😞). In the fullness of time, I'm fairly certain that it was caused by a FOR UPDATE SKIP LOCKED that was introduced in #273 (mistakenly, perhaps: that PR says it was extracted to #274, but it wasn't), and the SKIP LOCKED was removed in #288.

For the record, the problem wasn't inherently the FOR UPDATE SKIP LOCKED but its placement in the query. It was placed in the same subselect as the advisory-lock function, meaning the function could sometimes advisory-lock a record that would then be skip-locked out of the results but remain advisory locked: #285 (comment)
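
For illustration only (a simplified shape, not the literal GoodJob query at the time), the failure mode looks roughly like this when the locking function and SKIP LOCKED share a subselect:

  -- Simplified illustration of the footgun described above
  SELECT * FROM good_jobs
  WHERE id IN (
    SELECT id FROM good_jobs
    WHERE finished_at IS NULL
      AND pg_try_advisory_lock(('x' || substr(md5(id::text), 1, 16))::bit(64)::bigint)  -- advisory lock acquired here
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED  -- ...but the row can be skipped here, so the advisory lock is held for a record that is never returned
  );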

Or, the issue could be that at the time I was using Rails.application.executor.wrap instead of Rails.application.reloader.wrap, which was also identified in #389 as a source of WARNING: you don't own a lock of type ExclusiveLock errors. (There's a little more conversation in #99 too, which is when the executor.wrap was last reviewed.)

Or both!

@bensheldon changed the title from "Remove pg_advisory_unlock_all() after job is run; only verify mutable finished_at is blank before performing job" to "Remove pg_advisory_unlock_all() after job is run; only verify blank finished_at (and not lock presence) before performing job" on Oct 18, 2023
@bensheldon marked this pull request as ready for review on October 19, 2023 at 00:08
@bensheldon (Owner, Author) commented:

I tried benchmarking this with queue-shootout and I don't see a meaningful performance difference:

Main Branch

❯ JOB_COUNT=100_000 bundle exec rake --backtrace
Benchmarking good_job
  QUIET = false
  ITERATIONS = 5
  DATABASE_URL = postgres://postgres:@localhost/shootout
  JOB_COUNT = 100000
  TEST_PERIOD = 0.2
  WARMUP_PERIOD = 0.2
  SYNCHRONOUS_COMMIT = on

Iteration #1:
good_job: 1 => 29.8, 2 => 49.7, 3 => 59.6, 4 => 79.7, 5 => 99.3, 6 => 119.3, 7 => 139.1, 8 => 158.8, 9 => 183.2, 10 => 201.3, 11 => 198.0, 12 => 202.3, 13 => 202.3, 14 => 197.5, 15 => 202.7, 16 => 168.7, 17 => 187.8, 18 => 193.1, 19 => 213.2, 20 => 213.0, 21 => 207.3, 22 => 200.4, 23 => 194.3, 24 => 199.7
good_job: Peaked at 19 workers with 213.2 jobs/second

Iteration #2:
good_job: 1 => 29.8, 2 => 59.8, 3 => 74.5, 4 => 79.3, 5 => 99.4, 6 => 119.3, 7 => 139.1, 8 => 159.0, 9 => 203.2, 10 => 207.3, 11 => 208.0, 12 => 225.6, 13 => 221.5, 14 => 221.5, 15 => 218.0, 16 => 208.9, 17 => 210.9
good_job: Peaked at 12 workers with 225.6 jobs/second

Iteration #3:
good_job: 1 => 29.7, 2 => 44.5, 3 => 59.6, 4 => 99.4, 5 => 99.3, 6 => 119.0, 7 => 139.1, 8 => 158.8, 9 => 208.5, 10 => 198.1, 11 => 203.9, 12 => 207.7, 13 => 196.4, 14 => 216.2, 15 => 205.8, 16 => 219.9, 17 => 203.2, 18 => 215.8, 19 => 214.0, 20 => 191.4, 21 => 164.6
good_job: Peaked at 16 workers with 219.9 jobs/second

Iteration #4:
good_job: 1 => 19.8, 2 => 44.8, 3 => 44.7, 4 => 59.5, 5 => 74.9, 6 => 89.3, 7 => 104.3, 8 => 119.0, 9 => 133.8, 10 => 160.7, 11 => 145.4, 12 => 177.7, 13 => 181.0, 14 => 167.9, 15 => 176.4, 16 => 174.6, 17 => 167.3, 18 => 186.8, 19 => 186.7, 20 => 165.2, 21 => 145.3, 22 => 161.5, 23 => 146.4
good_job: Peaked at 18 workers with 186.8 jobs/second

Iteration #5:
good_job: 1 => 19.8, 2 => 29.8, 3 => 29.9, 4 => 59.6, 5 => 74.9, 6 => 89.4, 7 => 104.3, 8 => 119.3, 9 => 133.5, 10 => 168.6, 11 => 168.4, 12 => 170.7, 13 => 168.0, 14 => 148.9, 15 => 178.1, 16 => 160.2, 17 => 167.9, 18 => 179.2, 19 => 166.6, 20 => 177.6, 21 => 175.9, 22 => 143.3, 23 => 159.1
good_job: Peaked at 18 workers with 179.2 jobs/second

good_job jobs per second: avg = 204.9, max = 225.6, min = 179.2, stddev = 20.7

Total runtime: 56.1 seconds

This PR

❯ JOB_COUNT=100_000 bundle exec rake --backtrace
Benchmarking good_job
  QUIET = false
  ITERATIONS = 5
  DATABASE_URL = postgres://postgres:@localhost/shootout
  JOB_COUNT = 100000
  TEST_PERIOD = 0.2
  WARMUP_PERIOD = 0.2
  SYNCHRONOUS_COMMIT = on

Iteration #1:
good_job: 1 => 29.8, 2 => 59.2, 3 => 74.5, 4 => 79.5, 5 => 124.2, 6 => 119.3, 7 => 139.3, 8 => 159.6, 9 => 216.7, 10 => 221.7, 11 => 218.0, 12 => 210.1, 13 => 231.2, 14 => 223.6, 15 => 211.1, 16 => 212.3, 17 => 221.8, 18 => 231.4, 19 => 208.5, 20 => 206.1, 21 => 211.4, 22 => 220.5, 23 => 177.2
good_job: Peaked at 18 workers with 231.4 jobs/second

Iteration #2:
good_job: 1 => 34.6, 2 => 49.6, 3 => 74.5, 4 => 79.5, 5 => 124.2, 6 => 149.0, 7 => 139.1, 8 => 158.6, 9 => 193.5, 10 => 208.7, 11 => 217.7, 12 => 210.3, 13 => 222.3, 14 => 221.7, 15 => 217.9, 16 => 214.7, 17 => 204.8, 18 => 219.7
good_job: Peaked at 13 workers with 222.3 jobs/second

Iteration #3:
good_job: 1 => 24.8, 2 => 49.7, 3 => 59.6, 4 => 59.5, 5 => 72.9, 6 => 118.9, 7 => 139.1, 8 => 198.7, 9 => 213.1, 10 => 198.2, 11 => 215.4, 12 => 182.0, 13 => 209.7, 14 => 216.0, 15 => 216.9, 16 => 223.6, 17 => 202.8, 18 => 227.9, 19 => 232.3, 20 => 209.9, 21 => 222.7, 22 => 204.1, 23 => 200.7, 24 => 202.1
good_job: Peaked at 19 workers with 232.3 jobs/second

Iteration #4:
good_job: 1 => 24.8, 2 => 49.6, 3 => 74.6, 4 => 79.5, 5 => 97.0, 6 => 119.2, 7 => 139.1, 8 => 158.2, 9 => 211.6, 10 => 202.7, 11 => 216.8, 12 => 211.0, 13 => 218.1, 14 => 231.7, 15 => 204.0, 16 => 221.6, 17 => 232.0, 18 => 221.6, 19 => 196.0, 20 => 236.7, 21 => 231.1, 22 => 206.1, 23 => 223.2, 24 => 189.3, 25 => 198.4
good_job: Peaked at 20 workers with 236.7 jobs/second

Iteration #5:
good_job: 1 => 29.8, 2 => 59.5, 3 => 59.6, 4 => 99.4, 5 => 124.2, 6 => 119.2, 7 => 139.0, 8 => 158.8, 9 => 193.8, 10 => 207.2, 11 => 215.0, 12 => 195.7, 13 => 235.3, 14 => 202.8, 15 => 237.3, 16 => 225.7, 17 => 185.4, 18 => 233.5, 19 => 227.7, 20 => 213.0
good_job: Peaked at 15 workers with 237.3 jobs/second

good_job jobs per second: avg = 232.0, max = 237.3, min = 222.3, stddev = 6.0

Total runtime: 55.0 seconds

@bensheldon merged commit 51942f7 into main on Oct 19, 2023
20 checks passed
@bensheldon deleted the remove_unlock_all branch on October 19, 2023
@bensheldon added the refactor label (Code changes that do not introduce new features) on Oct 19, 2023