
question about running many queues for webservers and workers #96

Closed
dylanjha opened this issue Jan 19, 2018 · 7 comments · Fixed by #97


@dylanjha
Contributor

This seems to be a common pattern with background workers in a web application, but I haven't seen it explicitly documented so I want to open this up as a question. If necessary, I'm happy to provide a PR with a documentation update.

Let's say I have a web app and I have 30 different background jobs that get run. For example:

  1. send an email
  2. after a user authenticates with facebook, queue a job to fetch new data from the facebook api and update the user's profile
  3. after receiving a webhook from stripe, queue a job to update a customer's subscription data
  4. after some action happens, queue a job to send an api call to zapier
  5. after receiving a webhook from zapier, queue a job to process it

The way bee-queue (and bull) are set up, each of these 30 background jobs would have its own "queue". Below is an example showing the first two queues.

As one might imagine, with 30 different background jobs I will have 30 different instances of Queue on each webserver and 30 different instances of Queue on each worker server.

From the docs:

> Queues are very lightweight — the only significant overhead is connecting to Redis — so if you need to handle different types of jobs, just instantiate a queue for each:

My Questions

  1. Is there a way to re-use the Redis connection so that each webserver and each worker server only maintains 1 connection? Or is that something I need to worry about?
  2. Is there a better recommended pattern that I am missing?
  3. Are there other gotchas or things to watch out for if I have 30+ different queues?

Webserver

const Queue = require('bee-queue')
const emailQueue = new Queue('EMAIL_DELIVERY', {
  redis: process.env.REDIS_URL,
  isWorker: false,
  getEvents: false
})

const facebookUpdateQueue = new Queue('FACEBOOK_UPDATE', {
  redis: process.env.REDIS_URL,
  isWorker: false,
  getEvents: false
})

function sendEmail (messageData) {
  const job = emailQueue.createJob(messageData)
  return job.save()
}

function updateFacebook (data) {
  const job = facebookUpdateQueue.createJob(data)
  return job.save()
}
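
For example, a request handler would just call one of these and get back a promise for the saved job (hypothetical payloads):

sendEmail({to: 'user@example.com', template: 'welcome'})
  .then((job) => console.log(`enqueued email job ${job.id}`))

updateFacebook({userId: '42'})
  .then((job) => console.log(`enqueued facebook update job ${job.id}`))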

Worker server

const Queue = require('bee-queue')
const emailQueue = new Queue('EMAIL_DELIVERY', {redis: process.env.REDIS_URL})
const facebookUpdateQueue = new Queue('FACEBOOK_UPDATE', {redis: process.env.REDIS_URL})

emailQueue.process((job) => {
  return Email.deliver(job.data)
})

facebookUpdateQueue.process((job) => {
  return FacebookUpdater.process(job.data)
})
@LewisJEllis
Member

LewisJEllis commented Jan 19, 2018

Spot-on use case and analysis of the situation. You're right to be concerned, but fortunately we can mostly keep it under control.

The Redis server allows something like min(system_max_file_descriptors, 10000) connections, so we can get away with this one-connection-per-queue pattern for small deployments with lots of queues or large deployments with few queues. Otherwise, connections add up quickly: a medium deployment of 20 servers, 4 processes/server, 30 queues/process, and 2 connections/queue (the typical case) already uses 20 × 4 × 30 × 2 = 4800 connections, halfway to our limit!

Quick breakdown: queues can have 3 kinds of Redis clients:

  • general client used for most stuff
  • event client
    • used for pub/sub - mostly for producers finding out their job is finished
    • not used if getEvents and activateDelayedJobs are false (and the latter is false by default)
  • blocking client
    • used by workers waiting for a job to come in (brpoplpush)
    • not used if isWorker is false

Every queue that needs an event or blocking client gets its own, but queues can share the general client (they don't by default). This means that if you have 30 producer queues and none of them need to receive completion events, they can all share a single connection, whereas 30 worker queues need 1 shared general connection plus 30 blocking clients. Just make sure the settings flags match the role of each Queue instance so you avoid these secondary connections when you don't need them - it looks like you've already got that sorted on the producers in your example, but you should be able to do getEvents: false on the workers as well.
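
To make the role-to-settings mapping concrete, a quick sketch (the variable names here are just for illustration):

// Producer (e.g. on a webserver): only enqueues jobs, so it needs
// neither the blocking client nor the event client.
const producerSettings = {isWorker: false, getEvents: false}

// Worker: isWorker defaults to true, so it keeps its blocking client,
// but it can still skip the event client.
const workerSettings = {getEvents: false}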

To share the general client, you can pass a node_redis RedisClient instance as the redis argument of the Queue settings; make one, pass it to all of your queues. You can see the implementation here and we briefly mention this in the queue config docs, although I just noticed the explanation there kind of trails off - would definitely accept a PR to improve that.

So your above worker example could look something like:

const redis = require('redis')
const Queue = require('bee-queue')

const sharedConfig = {
  getEvents: false,
  redis: redis.createClient(process.env.REDIS_URL)
}
  
const emailQueue = new Queue('EMAIL_DELIVERY', sharedConfig)
const facebookUpdateQueue = new Queue('FACEBOOK_UPDATE', sharedConfig)

emailQueue.process((job) => {
  return Email.deliver(job.data)
})

facebookUpdateQueue.process((job) => {
  return FacebookUpdater.process(job.data)
})
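
And the producer (webserver) side can share a single connection the same way; a sketch along the lines of your original example:

const redis = require('redis')
const Queue = require('bee-queue')

// One shared node_redis client for all producer queues.
const sharedConfig = {
  isWorker: false,
  getEvents: false,
  redis: redis.createClient(process.env.REDIS_URL)
}

const emailQueue = new Queue('EMAIL_DELIVERY', sharedConfig)
const facebookUpdateQueue = new Queue('FACEBOOK_UPDATE', sharedConfig)

function sendEmail (messageData) {
  return emailQueue.createJob(messageData).save()
}

function updateFacebook (data) {
  return facebookUpdateQueue.createJob(data).save()
}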

Fully implemented, if our original 20 servers were half producers and half workers and we don't use getEvents, we can have just 40 total producer connections plus 40 × 31 = 1240 worker connections, around a quarter of where we started. Of course, I doubt all 40 worker processes need to process all 30 job types, so you could probably reduce it much further.

@bradvogel I recall you mentioned that you run a pretty good number of separate queues, but I'm not sure how split up those are between services or how many total connections your Redis instances typically have open. Have you paid much attention to total connections by any one server/process or to total redis connection count, or taken any specific measures to help keep it under control?

@dylanjha
Contributor Author

@LewisJEllis Thank you for your thoughtful and thorough reply; I had a feeling something like this would be possible.

I'll give this a try today and see how it goes. I'll happily take a stab at improving the docs, specifically the parts related to the shared RedisClient.

@dylanjha
Contributor Author

@LewisJEllis thanks, I've tested this out and it seems to be working as expected. I submitted a PR to add some clarifying documentation and an example to the README.

Please let me know if anything in there is inaccurate or could be explained in a better way. Thank you!

@skeggse
Member

skeggse commented Jul 10, 2018

> I recall you mentioned that you run a pretty good number of separate queues, but I'm not sure how split up those are between services or how many total connections your Redis instances typically have open.

We have some redis clusters with upwards of 5000 open connections.

> Have you paid much attention to total connections by any one server/process or to total redis connection count, or taken any specific measures to help keep it under control?

In the past we've consolidated to common redis command clients (as opposed to event/blocking connections), but haven't made it a high priority.

So basically, yes: consolidating command clients is how we've handled many connections internally, and it's what we'd recommend for others.

@skeggse
Member

skeggse commented Jul 10, 2018

Closing as this issue seems resolved (but useful for historical context 😄). Feel free to reopen.

@gask

gask commented Oct 15, 2020

> In the past we've consolidated to common redis command clients (as opposed to event/blocking connections), but haven't made it a high priority.

@skeggse I'm not sure what you meant by consolidated to common redis command clients when we have a blocking client (aka a Worker). Can you help me understand how I would make the worker run scheduled jobs without the blocking client?

@alam38

alam38 commented May 22, 2024

@skeggse

Hey, I wanted to ask the same question that Gask was asking above.

> In the past we've consolidated to common redis command clients (as opposed to event/blocking connections), but haven't made it a high priority.

I'm not sure what you meant by consolidated to common redis command clients when we have a blocking client (aka a Worker). Can you help me understand how I would make the worker run scheduled jobs without the blocking client?

For further context on my specific use case: I have multiple docker containers running the same worker queues. Any help consolidating the number of redis connections would be greatly appreciated.

Adding @LewisJEllis for extra visibility. Please let me know if any extra context would help.
