
question about running many queues for webservers and workers #96

Closed
dylanjha opened this issue Jan 19, 2018 · 7 comments · Fixed by #97


@dylanjha
Contributor

This seems to be a common pattern with background workers in a web application, but I haven't seen it explicitly documented so I want to open this up as a question. If necessary, I'm happy to provide a PR with a documentation update.

Let's say I have a web app and I have 30 different background jobs that get run. For example:

  1. send an email
  2. after a user authenticates with facebook, queue a job to fetch new data from the facebook api and update the user's profile
  3. after receiving a webhook from stripe, queue a job to update a customer's subscription data
  4. after some action happens, queue a job to send an api call to zapier
  5. after receiving a webhook from zapier, queue a job to process it

The way bee-queue (and bull) are set up, each of these 30 background jobs would have its own "queue". Below is an example showing the first two queues.

As one might imagine, with 30 different background jobs I will have 30 different instances of Queue on each webserver and 30 different instances of Queue on each worker server.

From the docs:

> Queues are very lightweight — the only significant overhead is connecting to Redis — so if you need to handle different types of jobs, just instantiate a queue for each:

My Questions

  1. Is there a way to re-use the Redis connection so that each webserver and each worker server only maintains 1 connection? Or is that something I need to worry about?
  2. Is there a better recommended pattern that I am missing?
  3. Are there other gotchas or things to watch out for if I have 30+ different queues?

Webserver

const Queue = require('bee-queue')
const emailQueue = new Queue('EMAIL_DELIVERY', {
  redis: process.env.REDIS_URL,
  isWorker: false,
  getEvents: false
})

const facebookUpdateQueue = new Queue('FACEBOOK_UPDATE', {
  redis: process.env.REDIS_URL,
  isWorker: false,
  getEvents: false
})

function sendEmail (messageData) {
  const job = emailQueue.createJob(messageData)
  return job.save()
}

function updateFacebook (data) {
  const job = facebookUpdateQueue.createJob(data)
  return job.save()
}
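
For example, a request handler would just call one of these and get back a promise for the saved job (hypothetical payloads):

sendEmail({to: 'user@example.com', template: 'welcome'})
  .then((job) => console.log(`enqueued email job ${job.id}`))

updateFacebook({userId: '42'})
  .then((job) => console.log(`enqueued facebook update job ${job.id}`))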

Worker server

const Queue = require('bee-queue')
const emailQueue = new Queue('EMAIL_DELIVERY', {redis: process.env.REDIS_URL})
const facebookUpdateQueue = new Queue('FACEBOOK_UPDATE', {redis: process.env.REDIS_URL})

emailQueue.process((job) => {
  return Email.deliver(job.data)
})

facebookUpdateQueue.process((job) => {
  return FacebookUpdater.process(job.data)
})
@LewisJEllis
Member

LewisJEllis commented Jan 19, 2018

Spot-on use case and analysis of the situation. You're right to be concerned, but fortunately we can mostly keep it under control.

The Redis server allows something like min(system_max_file_descriptors, 10000) connections, so we can get away with this one-connection-per-queue pattern for small deployments with lots of queues or large deployments with few queues. Otherwise, connections add up quickly: a medium deployment of 20 servers, 4 processes/server, 30 queues/process, and 2 connections/queue (the typical case) already uses 20 × 4 × 30 × 2 = 4800 connections, halfway to our limit!

Quick breakdown: queues can have 3 kinds of Redis clients:

  • general client used for most stuff
  • event client
    • used for pub/sub - mostly for producers finding out their job is finished
    • not used if getEvents and activateDelayedJobs are false (and the latter is false by default)
  • blocking client
    • used by workers waiting for a job to come in (brpoplpush)
    • not used if isWorker is false

Every queue that needs an event or blocking client gets its own, but queues can share the general client (they don't by default). This means that if you have 30 producer queues and none of them need to receive completion events, they can all share a single connection, whereas 30 worker queues need 1 shared general connection plus 30 blocking clients. Just make sure the settings flags match the role of each Queue instance so you avoid these secondary connections when you don't need them - it looks like you've already got that sorted on the producers in your example, but you should be able to do getEvents: false on the workers as well.
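
To make the role-to-settings mapping concrete, a quick sketch (the variable names here are just for illustration):

// Producer (e.g. on a webserver): only enqueues jobs, so it needs
// neither the blocking client nor the event client.
const producerSettings = {isWorker: false, getEvents: false}

// Worker: isWorker defaults to true, so it keeps its blocking client,
// but it can still skip the event client.
const workerSettings = {getEvents: false}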

To share the general client, you can pass a node_redis RedisClient instance as the redis argument of the Queue settings; make one, pass it to all of your queues. You can see the implementation here and we briefly mention this in the queue config docs, although I just noticed the explanation there kind of trails off - would definitely accept a PR to improve that.

So your above worker example could look something like:

const redis = require('redis')
const Queue = require('bee-queue')

const sharedConfig = {
  getEvents: false,
  redis: redis.createClient(process.env.REDIS_URL)
}
  
const emailQueue = new Queue('EMAIL_DELIVERY', sharedConfig)
const facebookUpdateQueue = new Queue('FACEBOOK_UPDATE', sharedConfig)

emailQueue.process((job) => {
  return Email.deliver(job.data)
})

facebookUpdateQueue.process((job) => {
  return FacebookUpdater.process(job.data)
})
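
And the producer (webserver) side can share a single connection the same way; a sketch along the lines of your original example:

const redis = require('redis')
const Queue = require('bee-queue')

// One shared node_redis client for all producer queues.
const sharedConfig = {
  isWorker: false,
  getEvents: false,
  redis: redis.createClient(process.env.REDIS_URL)
}

const emailQueue = new Queue('EMAIL_DELIVERY', sharedConfig)
const facebookUpdateQueue = new Queue('FACEBOOK_UPDATE', sharedConfig)

function sendEmail (messageData) {
  return emailQueue.createJob(messageData).save()
}

function updateFacebook (data) {
  return facebookUpdateQueue.createJob(data).save()
}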

Fully implemented, if our original 20 servers were half producers and half workers and we don't use getEvents, we can have just 40 total producer connections plus 40 × 31 = 1240 worker connections, around a quarter of where we started. Of course, I doubt all 40 worker processes need to process all 30 job types, so you could probably reduce it much further.

@bradvogel I recall you mentioned that you run a pretty good number of separate queues, but I'm not sure how split up those are between services or how many total connections your Redis instances typically have open. Have you paid much attention to total connections by any one server/process or to total redis connection count, or taken any specific measures to help keep it under control?

@dylanjha
Contributor Author

@LewisJEllis Thank you for your thoughtful and thorough reply; I had a feeling something like this would be possible.

I'll give this a try today and see how it goes. I'll happily take a stab at improving the docs, specifically the parts related to the shared RedisClient.

@dylanjha
Contributor Author

@LewisJEllis thanks, I've tested this out and it seems to be working as expected. I submitted a PR to add some clarifying documentation and an example to the README.

Please let me know if anything in there is inaccurate or could be explained in a better way. Thank you!

@skeggse
Member

skeggse commented Jul 10, 2018

> I recall you mentioned that you run a pretty good number of separate queues, but I'm not sure how split up those are between services or how many total connections your Redis instances typically have open.

We have some redis clusters with upwards of 5000 open connections.

> Have you paid much attention to total connections by any one server/process or to total redis connection count, or taken any specific measures to help keep it under control?

In the past we've consolidated to common redis command clients (as opposed to event/blocking connections), but haven't made it a high priority.

So basically, yes: consolidating command clients is how we've handled many connections internally, and it's what we'd recommend for others.

@skeggse
Member

skeggse commented Jul 10, 2018

Closing as this issue seems resolved (but useful for historical context 😄). Feel free to reopen.

@gask

gask commented Oct 15, 2020

> In the past we've consolidated to common redis command clients (as opposed to event/blocking connections), but haven't made it a high priority.

@skeggse I'm not sure what you meant by consolidated to common redis command clients when we have a blocking client (aka a Worker). Can you help me understand how I would make the worker run scheduled jobs without the blocking client?

@alam38

alam38 commented May 22, 2024

@skeggse

Hey, I wanted to ask the same question that Gask was asking above.

> In the past we've consolidated to common redis command clients (as opposed to event/blocking connections), but haven't made it a high priority.

I'm not sure what you meant by consolidated to common redis command clients when we have a blocking client (aka a Worker). Can you help me understand how I would make the worker run scheduled jobs without the blocking client?

For further context on my specific use case: I have multiple docker containers running the same worker queues. Any help consolidating the number of redis connections would be greatly appreciated.

Adding @LewisJEllis for extra visibility. Please let me know if any extra context would help.
