Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

why is queue.close() too slow.. and why is it not empty? #2372

Closed
snawaz opened this issue Jun 9, 2022 · 11 comments · Fixed by #2380
Closed

why is queue.close() too slow.. and why is it not empty? #2372

snawaz opened this issue Jun 9, 2022 · 11 comments · Fixed by #2380
Labels

Comments

@snawaz
Copy link

snawaz commented Jun 9, 2022

queue.close() takes almost 0.5 secs to complete and debugging further, I found that line #1269 is being slow (inside whenCurrentJobsFinished which is being called by .close()).

bull/lib/queue.js

Lines 1265 to 1270 in 5649689

const forcedReconnection = redisClientDisconnect(this.bclient).then(() => {
return this.bclient.connect();
});
return Promise.all(this.processing).then(() => forcedReconnection);
};

I splitted the line to measure the time elapsed as follows:

const t1 = new Date().getTime();
await Promise.all(this.processing);  
console.log("elapsed: ", (new Date().getTime() - t1)/1000);  // this reports almost 0.5 seconds!
return forcedReconnection;

Interestingly, my service does not add any job to the queue, yet this.processing.length is 4. Why is it not 0?

I"m using v3.22.0 but this code has not changed since then. Not sure if the other part of codebase makes this efficient in the recent version. I'm planning to use bullmq (nestjs/bullmq) instead. Just want to know if this slowness is known and acceptable?

@snawaz snawaz changed the title why queue.close() is too slow.. and why is it not empty? why is queue.close() too slow.. and why is it not empty? Jun 9, 2022
@manast
Copy link
Member

manast commented Jun 10, 2022

Worker.close() by default waits for current active jobs to complete. As for your specific case: isolate in the smallest possible case that you can write to reproduce the issue and we can look into it.

@snawaz
Copy link
Author

snawaz commented Jun 10, 2022

Worker.close() by default waits for current active jobs to complete.

Yes. I have understood that myself. But the question is, why are there jobs to begin with? I expect the queues to be empty, because I did not add any jobs to the queues.

Also, what jobs are there? I tried to print job in processJob() and all of the jobs are undefined and returns immediately as Promise.resolve(). In that case, even if there are jobs which are undefined, and returns immediately, why is it taking 0.5s then? That is so strange!

As for your specific case: isolate in the smallest possible case that you can write to reproduce the issue and we can look into it.

My backend is written in nestjs and it's huge. I can still try to write a small program to reproduce the issue.

@manast
Copy link
Member

manast commented Jun 10, 2022

As mentioned, isolate in a minimal case, if it is a bug in Bull then we can look into it, otherwise, it may be due to your or nestjs specific code.

@snawaz
Copy link
Author

snawaz commented Jun 10, 2022

As mentioned, isolate in a minimal case, if it is a bug in Bull then we can look into it, otherwise, it may be due to your or nestjs specific code.

This code reproduces the queue.close() being too slow. It uses the latest version (v4.8.3) as well.

const Queue = require('bull');

const time = () => new Date().getTime() / 1000;

async function closeBeingSlowIssue() {
  const videoQueue = new Queue('video transcoding', 'redis://127.0.0.1:6379');

  const t1 = time();

  await videoQueue.close();

  console.log("time elapsed: ", time() - t1); // logs almost 0.5s
}

closeBeingSlowIssue();

I have created this repo (https://github.com/snawaz/bull-close) which you can use to run it yourself.

@snawaz
Copy link
Author

snawaz commented Jun 11, 2022

OK. Investigating further using the above code, I found that this is the code (from redisClientDisconnect()) that takes too much time than necessary when closing the queue:

bull/lib/queue.js

Lines 546 to 553 in 41ec58e

pTimeout(
client.quit().catch(err => {
if (err.message !== 'Connection is closed.') {
throw err;
}
}),
500
)

Can we use smaller values than 500?

BTW, why this 0.5s is a problem is because, in a large codebase, each test is a clean start and shutdown of the service. So if there are 15 queues only in my service, then each test will take 15 * 0.5 = 7.5s to close the queues only, and then if there are 200 tests, almost 25 mins (200 * 7.5 = 1500s) would take to close the queues + running time of the tests. 200 tests are not many, so increasing the number of tests slows down the tests further.

For that reason, .close() needs to be quick.. ideally less than 0.05s latency.

@manast
Copy link
Member

manast commented Jun 13, 2022

These are the results I get when running your test above:
image

Btw, we close the queues in every test for the bull test suite and we do not get this issue 🤔

@snawaz
Copy link
Author

snawaz commented Jun 13, 2022

@manast

These are the results I get when running your test above: <...>

Btw, we close the queues in every test for the bull test suite and we do not get this issue 🤔

Thanks for sharing the results you get on your system. That's indeed interesting.

Could you please also try running the code without running any Redis instance at all? I've tried both scenarios, with or without Redis instance. I get the same results.

@manast
Copy link
Member

manast commented Jun 13, 2022

If there is no instance at all then timing changes as the client will spend time timing out from a nonexisting redis host.

@snawaz
Copy link
Author

snawaz commented Jun 14, 2022

If there is no instance at all then timing changes as the client will spend time timing out from a nonexisting redis host.

That makes sense.

I further investigated. Now I found the issue. The code shared above isn't enough to reproduce the problem. Here is the change. The comment explains what exactly is happening and what causes the delay!

image

See the commit: snawaz/bull-close@d812f4c

Please note that the call videoQueue.getNextJob() does not necessarily have to make sense. But in the real scenario, at some point, queue.bclient will be initialized and queue.bclientInitialized will be true and so on.

@snawaz
Copy link
Author

snawaz commented Jun 15, 2022

@manast

I think one fix, or rather improvement, could be as follows..

Replace this pattern:

//
// Force reconnection of blocking connection to abort blocking redis call immediately.
//
const forcedReconnection = redisClientDisconnect(this.bclient).then(() => {
   return this.bclient.connect();
});

with this pattern:

void redisClientDisconnect(this.bclient); // no await here
this.bclient = <create-new-client-with-new-connection>;

I mean why do we need to wait for redisClientDisconnect() to complete before reconnecting ? I guess that is only because we use the same client? What if we recreate the client itself? That will enable us to run the slow-step i.e redisClientDisconnect() in the background.

Please let me know your thoughts. If acceptable, I might create a PR for this. 🤔

@manast
Copy link
Member

manast commented Jun 16, 2022

🎉 This issue has been resolved in version 4.8.4 🎉

The release is available on:

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants