
Remove pg_advisory_unlock_all() after job is run; only verify blank finished_at (and not lock presence) before performing job #1113

Merged: 1 commit into main from remove_unlock_all, Oct 19, 2023

Conversation

@bensheldon (Owner) commented on Oct 18, 2023

The background of this change arises from trying to overcome the performance limitations of the lock verification as described in #896.

Before this change, the sequence when dequeuing a job to perform was (both variants are sketched in SQL after the second list):

  1. Do the advisory lock query, and return the record
  2. Verify that the record is advisory locked and finished_at is not set
  3. Execute the job if the previous condition is ok
  4. Blanket-unlock all advisory locks held by the session (pg_advisory_unlock_all)

After this change:

  1. Do the advisory lock query, and return the record
  2. Verify that the record's finished_at is not set
  3. Execute the job if the previous condition is ok
  4. Unlock the specific record that was advisory locked in the first step
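
A rough sketch of both cleanup strategies in plain SQL, with a simplified dequeue query; the lock-key expression here is only illustrative, not necessarily the exact derivation GoodJob uses:

  -- Step 1 (simplified): pick one unfinished candidate row and advisory-lock it
  SELECT * FROM good_jobs
  WHERE id = (
    SELECT id FROM good_jobs
    WHERE finished_at IS NULL
    ORDER BY created_at
    LIMIT 1
  )
  AND pg_try_advisory_lock(('x' || substr(md5(id::text), 1, 16))::bit(64)::bigint);

  -- Step 4, before this change: release every advisory lock the session holds
  SELECT pg_advisory_unlock_all();

  -- Step 4, after this change: release only the lock taken for the dequeued row
  SELECT pg_advisory_unlock(('x' || substr(md5(id::text), 1, 16))::bit(64)::bigint)
  FROM good_jobs WHERE id = :dequeued_job_id;  -- :dequeued_job_id is a placeholder for the row returned in step 1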

Background / Analysis

Briefly: GoodJob's advisory locking strategy went through a lot of churn in mid-2021.

Using pg_advisory_unlock_all, instead of unlocking the specifically returned record, was introduced in #285 to address unexpectedly advisory-locked records caused by "an indeterminate reason" (😞). In the fullness of time, I'm fairly certain that it was caused by a FOR UPDATE SKIP LOCKED that was introduced in #273 (mistakenly, perhaps: that PR says it was extracted to #274, but it wasn't), and the SKIP LOCKED was removed in #288.

For the record, the problem wasn't inherently the FOR UPDATE SKIP LOCKED but its placement in the query. It was placed in the same subselect as the advisory-lock function, meaning the function could sometimes advisory-lock a record that would then be skip-locked out of the results but remain advisory locked: #285 (comment)
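
For illustration only (a simplified shape, not the literal GoodJob query at the time), the failure mode looks roughly like this when the locking function and SKIP LOCKED share a subselect:

  -- Simplified illustration of the footgun described above
  SELECT * FROM good_jobs
  WHERE id IN (
    SELECT id FROM good_jobs
    WHERE finished_at IS NULL
      AND pg_try_advisory_lock(('x' || substr(md5(id::text), 1, 16))::bit(64)::bigint)  -- advisory lock acquired here
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED  -- ...but the row can be skipped here, so the advisory lock is held for a record that is never returned
  );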

Or, the issue could be that at the time I was using Rails.application.executor.wrap instead of Rails.application.reloader.wrap, which was also identified in #389 as a source of WARNING: you don't own a lock of type ExclusiveLock errors. (There's a little more conversation in #99 too, which is when the executor.wrap was last reviewed.)

Or both!

@bensheldon changed the title from "Remove pg_advisory_unlock_all() after job is run; only verify mutable finished_at is blank before performing job" to "Remove pg_advisory_unlock_all() after job is run; only verify blank finished_at (and not lock presence) before performing job" on Oct 18, 2023
@bensheldon marked this pull request as ready for review on October 19, 2023 at 00:08
@bensheldon (Owner, Author) commented:

I tried benchmarking this with queue-shootout and I don't see a meaningful performance difference:

Main Branch

❯ JOB_COUNT=100_000 bundle exec rake --backtrace
Benchmarking good_job
  QUIET = false
  ITERATIONS = 5
  DATABASE_URL = postgres://postgres:@localhost/shootout
  JOB_COUNT = 100000
  TEST_PERIOD = 0.2
  WARMUP_PERIOD = 0.2
  SYNCHRONOUS_COMMIT = on

Iteration #1:
good_job: 1 => 29.8, 2 => 49.7, 3 => 59.6, 4 => 79.7, 5 => 99.3, 6 => 119.3, 7 => 139.1, 8 => 158.8, 9 => 183.2, 10 => 201.3, 11 => 198.0, 12 => 202.3, 13 => 202.3, 14 => 197.5, 15 => 202.7, 16 => 168.7, 17 => 187.8, 18 => 193.1, 19 => 213.2, 20 => 213.0, 21 => 207.3, 22 => 200.4, 23 => 194.3, 24 => 199.7
good_job: Peaked at 19 workers with 213.2 jobs/second

Iteration #2:
good_job: 1 => 29.8, 2 => 59.8, 3 => 74.5, 4 => 79.3, 5 => 99.4, 6 => 119.3, 7 => 139.1, 8 => 159.0, 9 => 203.2, 10 => 207.3, 11 => 208.0, 12 => 225.6, 13 => 221.5, 14 => 221.5, 15 => 218.0, 16 => 208.9, 17 => 210.9
good_job: Peaked at 12 workers with 225.6 jobs/second

Iteration #3:
good_job: 1 => 29.7, 2 => 44.5, 3 => 59.6, 4 => 99.4, 5 => 99.3, 6 => 119.0, 7 => 139.1, 8 => 158.8, 9 => 208.5, 10 => 198.1, 11 => 203.9, 12 => 207.7, 13 => 196.4, 14 => 216.2, 15 => 205.8, 16 => 219.9, 17 => 203.2, 18 => 215.8, 19 => 214.0, 20 => 191.4, 21 => 164.6
good_job: Peaked at 16 workers with 219.9 jobs/second

Iteration #4:
good_job: 1 => 19.8, 2 => 44.8, 3 => 44.7, 4 => 59.5, 5 => 74.9, 6 => 89.3, 7 => 104.3, 8 => 119.0, 9 => 133.8, 10 => 160.7, 11 => 145.4, 12 => 177.7, 13 => 181.0, 14 => 167.9, 15 => 176.4, 16 => 174.6, 17 => 167.3, 18 => 186.8, 19 => 186.7, 20 => 165.2, 21 => 145.3, 22 => 161.5, 23 => 146.4
good_job: Peaked at 18 workers with 186.8 jobs/second

Iteration #5:
good_job: 1 => 19.8, 2 => 29.8, 3 => 29.9, 4 => 59.6, 5 => 74.9, 6 => 89.4, 7 => 104.3, 8 => 119.3, 9 => 133.5, 10 => 168.6, 11 => 168.4, 12 => 170.7, 13 => 168.0, 14 => 148.9, 15 => 178.1, 16 => 160.2, 17 => 167.9, 18 => 179.2, 19 => 166.6, 20 => 177.6, 21 => 175.9, 22 => 143.3, 23 => 159.1
good_job: Peaked at 18 workers with 179.2 jobs/second

good_job jobs per second: avg = 204.9, max = 225.6, min = 179.2, stddev = 20.7

Total runtime: 56.1 seconds

This PR

❯ JOB_COUNT=100_000 bundle exec rake --backtrace
Benchmarking good_job
  QUIET = false
  ITERATIONS = 5
  DATABASE_URL = postgres://postgres:@localhost/shootout
  JOB_COUNT = 100000
  TEST_PERIOD = 0.2
  WARMUP_PERIOD = 0.2
  SYNCHRONOUS_COMMIT = on

Iteration #1:
good_job: 1 => 29.8, 2 => 59.2, 3 => 74.5, 4 => 79.5, 5 => 124.2, 6 => 119.3, 7 => 139.3, 8 => 159.6, 9 => 216.7, 10 => 221.7, 11 => 218.0, 12 => 210.1, 13 => 231.2, 14 => 223.6, 15 => 211.1, 16 => 212.3, 17 => 221.8, 18 => 231.4, 19 => 208.5, 20 => 206.1, 21 => 211.4, 22 => 220.5, 23 => 177.2
good_job: Peaked at 18 workers with 231.4 jobs/second

Iteration #2:
good_job: 1 => 34.6, 2 => 49.6, 3 => 74.5, 4 => 79.5, 5 => 124.2, 6 => 149.0, 7 => 139.1, 8 => 158.6, 9 => 193.5, 10 => 208.7, 11 => 217.7, 12 => 210.3, 13 => 222.3, 14 => 221.7, 15 => 217.9, 16 => 214.7, 17 => 204.8, 18 => 219.7
good_job: Peaked at 13 workers with 222.3 jobs/second

Iteration #3:
good_job: 1 => 24.8, 2 => 49.7, 3 => 59.6, 4 => 59.5, 5 => 72.9, 6 => 118.9, 7 => 139.1, 8 => 198.7, 9 => 213.1, 10 => 198.2, 11 => 215.4, 12 => 182.0, 13 => 209.7, 14 => 216.0, 15 => 216.9, 16 => 223.6, 17 => 202.8, 18 => 227.9, 19 => 232.3, 20 => 209.9, 21 => 222.7, 22 => 204.1, 23 => 200.7, 24 => 202.1
good_job: Peaked at 19 workers with 232.3 jobs/second

Iteration #4:
good_job: 1 => 24.8, 2 => 49.6, 3 => 74.6, 4 => 79.5, 5 => 97.0, 6 => 119.2, 7 => 139.1, 8 => 158.2, 9 => 211.6, 10 => 202.7, 11 => 216.8, 12 => 211.0, 13 => 218.1, 14 => 231.7, 15 => 204.0, 16 => 221.6, 17 => 232.0, 18 => 221.6, 19 => 196.0, 20 => 236.7, 21 => 231.1, 22 => 206.1, 23 => 223.2, 24 => 189.3, 25 => 198.4
good_job: Peaked at 20 workers with 236.7 jobs/second

Iteration #5:
good_job: 1 => 29.8, 2 => 59.5, 3 => 59.6, 4 => 99.4, 5 => 124.2, 6 => 119.2, 7 => 139.0, 8 => 158.8, 9 => 193.8, 10 => 207.2, 11 => 215.0, 12 => 195.7, 13 => 235.3, 14 => 202.8, 15 => 237.3, 16 => 225.7, 17 => 185.4, 18 => 233.5, 19 => 227.7, 20 => 213.0
good_job: Peaked at 15 workers with 237.3 jobs/second

good_job jobs per second: avg = 232.0, max = 237.3, min = 222.3, stddev = 6.0

Total runtime: 55.0 seconds

@bensheldon merged commit 51942f7 into main on Oct 19, 2023
20 checks passed
@bensheldon deleted the remove_unlock_all branch on October 19, 2023
@bensheldon added the refactor label (Code changes that do not introduce new features) on Oct 19, 2023