I was linked to Que, a job queue in Ruby that claims superior performance by using advisory locks. It's true that advisory locks are faster, though using them would require some work (since you're essentially implementing your own concurrency control). However, there are several questions left to answer:
What happens to advisory locks in a failover scenario? It seems like they're entirely lost (since they are held in shared memory and never flushed to disk). We could use a notification here, but notifications aren't completely durable either. I still see a row update needing to happen otherwise (which negates the benefits).
Advisory locks appear to live in the same shared memory pool that PostgreSQL uses for regular locks, which is sized based on the number of connections. To quote the docs:
Care must be taken not to exhaust this memory or the server will be unable to grant any locks at all. This imposes an upper limit on the number of advisory locks grantable by the server, typically in the tens to hundreds of thousands depending on how the server is configured.
Between the number of connections and the locks for a potentially large job queue, this could be catastrophic: a big enough backlog of jobs could create so many advisory locks that normal operations needing locks can't go through.
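For a rough sense of scale (this is my own back-of-the-envelope arithmetic based on the PostgreSQL documentation for max_locks_per_transaction, not a measurement): the shared lock table holds roughly max_locks_per_transaction * (max_connections + max_prepared_transactions) entries, and advisory locks come out of that same pool as regular object locks.

```python
# Rough lock table capacity, assuming stock PostgreSQL defaults (adjust to your config).
max_locks_per_transaction = 64   # default
max_connections = 100            # default
max_prepared_transactions = 0    # default

capacity = max_locks_per_transaction * (max_connections + max_prepared_transactions)
print(capacity)  # 6400 with these defaults -- a deep backlog of per-job locks could approach this
```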
Now, none of these are necessarily deal breakers (perhaps we can offer an "unsafe" mode of execution that uses advisory locks for the extra performance gain, and document the pitfalls there). That being said, research needs to be done in order to determine the best way to handle these scenarios.
FWIW, I talked to Chris, the lead developer of Que, and got some answers based on his experience:
1. You’re right that the locks would get cleared in a failover scenario, and the workers would start from scratch when they connected to the new database. But then, that’s another reason to make jobs idempotent. I guess you could maybe also run into a situation where new workers spin up against the new database while the old workers are still working jobs through the old one, so you could have some jobs being worked more than once simultaneously in that case?
2. The advisory locks are taken only when the job is locked to be worked, so the limiting factor would be the number of lockers/workers, not the number of jobs (at my last job the job queue got backed up to around a million records once, and it was fine). And if you need that many advisory locks I believe there are configuration options you can use to bump up the limits.
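To make his second point concrete, here is a minimal sketch of the lock-while-working pattern (this is not Que's actual implementation; the connection string, table, and column names are hypothetical): the advisory lock is held only while a job is being worked, so the number of held locks tracks busy workers, not queue depth.

```python
import psycopg2

conn = psycopg2.connect("dbname=jobs")  # hypothetical connection string
conn.autocommit = True

def try_work_one(job_id, run):
    """Lock a single job by id, run it, then delete the row and release the lock."""
    with conn.cursor() as cur:
        # Session-level advisory lock keyed on the job id; held only while working.
        cur.execute("SELECT pg_try_advisory_lock(%s)", (job_id,))
        (locked,) = cur.fetchone()
        if not locked:
            return False  # another worker already holds this job
        try:
            run(job_id)
            cur.execute("DELETE FROM jobs WHERE id = %s", (job_id,))  # hypothetical table
        finally:
            cur.execute("SELECT pg_advisory_unlock(%s)", (job_id,))
        return True
```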
A couple ideas I think we can take away from the points above:
This could potentially be mitigated by having each worker place a "placeholder" advisory lock: one that never actually guards anything, but whose absence signals that a failover has occurred out from under the application. The scheduler loop can then re-lock any job IDs the worker is currently working on (see the sketch below).
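A minimal sketch of that placeholder idea (the key namespace and helper names are hypothetical; this assumes the worker's connection gets transparently re-established against the new primary, which would drop any advisory locks the old session held):

```python
SENTINEL_CLASS = 42  # hypothetical classid namespace reserved for sentinel locks

def place_sentinel(cur, worker_id):
    # Two-key form (classid, objid); the lock is held for the life of this session.
    cur.execute("SELECT pg_try_advisory_lock(%s, %s)", (SENTINEL_CLASS, worker_id))

def sentinel_still_held(cur, worker_id):
    """Return True if this backend still holds its sentinel lock.

    If the connection was silently re-established against a failed-over primary,
    the sentinel is gone and the worker should re-lock its in-flight job IDs.
    """
    cur.execute(
        """
        SELECT EXISTS (
            SELECT 1 FROM pg_locks
            WHERE locktype = 'advisory'
              AND pid = pg_backend_pid()
              AND classid = %s AND objid = %s AND objsubid = 2
        )
        """,
        (SENTINEL_CLASS, worker_id),
    )
    return cur.fetchone()[0]
```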
Another possibility is using an unlogged table: each worker records its node ID there, and since unlogged tables come back empty after a crash or a failover to a replica, a missing row means you've failed over and should re-lock jobs (sketch below).
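A sketch of the unlogged-table variant (table and column names are hypothetical). Unlogged tables are reset to empty after a crash and their contents don't exist on replicas, so after a failover the worker's row will be gone:

```python
def register_presence(cur, node_id):
    # UNLOGGED: not WAL-logged, so the contents vanish on crash recovery or failover.
    cur.execute(
        "CREATE UNLOGGED TABLE IF NOT EXISTS worker_presence (node_id text PRIMARY KEY)"
    )
    cur.execute(
        "INSERT INTO worker_presence (node_id) VALUES (%s) ON CONFLICT DO NOTHING",
        (node_id,),
    )

def still_registered(cur, node_id):
    """False after a failover: the table comes back empty, so re-lock in-flight jobs."""
    cur.execute(
        "SELECT EXISTS (SELECT 1 FROM worker_presence WHERE node_id = %s)", (node_id,)
    )
    return cur.fetchone()[0]
```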
Advisory lock buildup doesn't seem to be too big of a concern, even amongst our biggest worker fleets at Instructure, so I don't think this is something we have to worry much about. However, we should check how Postgres performs when a large number of nodes and/or a large number of locks exist (also to check: how PgBouncer handles these). A monitoring query for the lock counts is sketched below.
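For watching lock counts, pg_locks exposes granted advisory locks directly; a minimal sketch of such a check follows. On the PgBouncer question, one thing to verify is that session-level advisory locks generally don't work under transaction pooling (the server session can change between statements), so a pooled setup would likely need the pg_advisory_xact_lock variants or session pooling.

```python
def advisory_lock_counts(cur):
    """Count currently granted advisory locks, grouped by the backend holding them."""
    cur.execute(
        """
        SELECT pid, count(*) AS held
        FROM pg_locks
        WHERE locktype = 'advisory' AND granted
        GROUP BY pid
        ORDER BY held DESC
        """
    )
    return cur.fetchall()
```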