
removeJobs took 30 minutes to finish #2127

Closed

kietheros opened this issue Aug 7, 2021 · 7 comments

Comments


kietheros commented Aug 7, 2021

Description

I have a queue

const queue = new Bull('queue', {
    redis: {
        ...
    },
    limiter: {
        max: 5,
        duration: 10000,
        groupKey: 'sourceId'
    },
});

Job data:

{
    file: '..',
    sourceId: '..'
}

I have 300 sources, and each source has 10k to 30k jobs. In total I have pushed 3 million jobs into the queue. My workers will take 5 to 7 days to process them all. Sometimes I want to remove all jobs of one source using removeJobs.

Example: sourceId is "abc".
I call queue.removeJobs("*:abc"), and it takes 30 minutes to finish (very, very slow). I would like removing jobs to take 1 or 2 minutes at most.

How can I optimize removeJobs in Bull?
If that is not possible, could you recommend a design that removes jobs faster? Thanks!
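
For context on how the pattern matches: removeJobs works on job ids, so "*:abc" only finds jobs that were added with custom ids ending in the sourceId. A minimal sketch of that setup, awaited inside an async function (the items variable and its fields are illustrative, not taken from the report above):

const items = [/* { file, sourceId } records for one source */];

await queue.addBulk(
  items.map((item) => ({
    data: { file: item.file, sourceId: item.sourceId },
    opts: { jobId: `${item.file}:${item.sourceId}` }, // id ends in the sourceId
  }))
);

// Later, remove every job whose id matches the glob pattern:
await queue.removeJobs('*:abc');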


Bull version

3.27.0

Additional information

Redis server v=6.2.5


manast commented Aug 7, 2021

How many jobs were there in the queue when the removal took 30 minutes?
Btw, are you aware of the "removeOnComplete" option?
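
For reference, that option is set per job (or queue-wide via defaultJobOptions) and makes Bull delete completed and failed jobs from Redis automatically; the job data below is illustrative and assumes an async context:

await queue.add(
  { file: 'a.csv', sourceId: 'abc' },
  { removeOnComplete: true, removeOnFail: true }
);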


kietheros commented Aug 8, 2021

I push 3M jobs into the queue with removeOnComplete and removeOnFail. My workers are slow; they can only process 12 jobs/s.
A worker processes the jobs of a source looking for certain data; once it is found, the remaining jobs of that source no longer need to be processed.

In my test, a source (id: "abc") has 30k jobs. The workers had processed only 300 jobs of this source when they found the result, so roughly 30k jobs remained. queue.removeJobs("*:abc") took 30 minutes to remove them.

I checked in Redis: the waiting LIST has 2.5M elements. When removing 30k jobs, I think the running time is ~O(30k * 2.5M), which is why it is so slow.


manast commented Aug 8, 2021

I think you should consider using the queue method "empty" (https://github.com/OptimalBits/bull/blob/develop/REFERENCE.md#queueempty) for your use case, so it will just clean the jobs that are in the wait list. Of course, for this to work you need a different queue for every "source"; otherwise it will be as you say: it needs to scan the whole Redis key space until it finds the keys that match the pattern.
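
A minimal sketch of that per-source layout (the queue naming scheme and Redis config are illustrative):

const Bull = require('bull');

const queues = new Map();

function getSourceQueue(sourceId) {
  if (!queues.has(sourceId)) {
    queues.set(
      sourceId,
      new Bull(`source:${sourceId}`, { redis: { host: '127.0.0.1', port: 6379 } })
    );
  }
  return queues.get(sourceId);
}

// Dropping the pending jobs of one source becomes a single call on a short
// list instead of a scan over the whole key space:
async function dropSource(sourceId) {
  await getSourceQueue(sourceId).empty();
}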


manast commented Aug 8, 2021

Btw, it would only need to scan the list once, so 2.5M checks, not 30k * 2.5M.

kietheros commented

I have 300 sources, and each source has the same kind of jobs.
If I used a different queue for each source, that would be 300 queues, each with N rate-limited workers.
That makes 300 * N workers, but my server should only run 16 workers in parallel. With this design I can't control the concurrency level of the overall system.


manast commented Aug 8, 2021

In that case it is going to be difficult to make it much faster than it is now. One thing you can try is the internal method "getRanges" (you will need to check the source code to understand how it should be called): fetch the "waiting" job ids in batches, maybe a couple of hundred per batch (getRanges supports pagination), and then remove the jobs that match the pattern manually. You can delete many jobs in parallel to accelerate it. Maybe this works faster, I don't know.
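
A sketch of that batching idea, using the public getWaiting(start, end) helper rather than the internal getRanges (same pagination principle; the batch size and index handling are assumptions, not a tested recipe):

async function removeJobsOfSource(queue, sourceId, batchSize = 200) {
  let start = 0;
  for (;;) {
    const batch = await queue.getWaiting(start, start + batchSize - 1);
    if (batch.length === 0) break;

    const matching = batch.filter((job) => job.data.sourceId === sourceId);
    // Delete the matches in parallel to speed up removal.
    await Promise.all(matching.map((job) => job.remove()));

    // Removals (and active workers) shift the waiting list, so only advance
    // past the jobs we kept; this is approximate under concurrent consumption.
    start += batch.length - matching.length;
  }
}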

kietheros commented

I intend to change the code as follows.
The crawler gets the list of items from a source (lightweight, a JSON array) and pushes the items into a per-source crawl-queue.
A separate process pops items from all the crawl-queues and pushes them into the process-queue. I will try to adjust the speed of pushing jobs so the process-queue stays small.
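
A sketch of this two-tier design, assuming the per-source crawl-queues are also Bull queues (all names and the pending limit are illustrative):

const Bull = require('bull');

const processQueue = new Bull('process-queue', { redis: { host: '127.0.0.1', port: 6379 } });
const MAX_PENDING = 1000; // keep the process-queue small so removals stay cheap

// Move items from the crawl-queues into the process-queue, topping it up
// only when it has drained below the limit.
async function feed(crawlQueues) {
  let budget = MAX_PENDING - (await processQueue.getWaitingCount());

  for (const crawlQueue of crawlQueues) {
    if (budget <= 0) break;
    const items = await crawlQueue.getWaiting(0, budget - 1);
    for (const item of items) {
      await processQueue.add(item.data);
      await item.remove();
      budget--;
    }
  }
}

// e.g. run the feeder periodically:
// setInterval(() => feed(allCrawlQueues).catch(console.error), 5000);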
