Repeatable job never gets processed sometimes #1739

Closed
malisetti opened this issue May 16, 2020 · 23 comments

Comments

@malisetti

malisetti commented May 16, 2020

I have observed that a repeatable job sometimes never gets processed. When I call getRepeatableJobs on the queue, sometimes (seemingly at random) the next value stays below the current timestamp, and both getDelayedCount and getPausedCount return 0.

Minimal, Working Test code to reproduce the issue


const Queue = require('bull');

const helloQueue = new Queue('hello-queue', 'redis://127.0.0.1:6379');

(async () => {
    // Callback-style processor: print, then signal completion
    helloQueue.process((job, done) => {
        console.log('hello world');
        done();
    });

    // Repeat the job every 2 minutes
    await helloQueue.add(null, {
        repeat: { cron: '*/2 * * * *' },
    });

    // Every 10 seconds, log the queue state
    setInterval(() => {
        console.log('current timestamp:', new Date().getTime());
        helloQueue.getRepeatableJobs().then(ji => console.log(ji));
        helloQueue.getDelayedCount().then(j => console.log('getDelayedCount:', j));
        helloQueue.getPausedCount().then(j => console.log('getPausedCount:', j));
    }, 10 * 1000);
})();

Output:

current timestamp: 1589657543148
[
  {
    key: '__default__::::*/2 * * * *',
    name: '__default__',
    id: null,
    endDate: null,
    tz: null,
    cron: '*/2 * * * *',
    every: null,
    next: 1589657520000
  }
]
getDelayedCount: 0
getPausedCount: 0

The goal is a "hello world" cron job that prints to the console every 2 minutes.

Bull version: "bull": "^3.13.0"

Additional information

The next timestamp is lower than the current timestamp, and "hello world" is not printed every 2 minutes.

@malisetti
Author

This sometimes happens with the every repeat option too:

1589659117238
[
  {
    key: '__default__:::120000',
    name: '__default__',
    id: null,
    endDate: null,
    tz: null,
    cron: null,
    every: 120000,
    next: 1589659080000
  }
]
0
0

@malisetti
Author

The same issue also occurs with https://github.com/taskforcesh/bullmq.

@rmzoni

rmzoni commented Jun 12, 2020

I am facing the same problem. At some point the delay counter is negative and the repeatable job is no longer scheduled.

@malisetti
Author

One observation: if the current run didn't handle any jobs, restarting the worker process picks them up.

@ronnn

ronnn commented Jul 2, 2020

Maybe this was fixed by 10a9eae in version 3.14.0+?

@malisetti
Author

I tried with "bull": "^3.15.0"; it does not seem to be fixed.

@ronnn

ronnn commented Jul 6, 2020

We're using Bull for repeated jobs only. About 1-2 weeks ago I saw the behavior you described. Jobs were working fine for some time but suddenly processing stopped. Since we upgraded to 3.15.0 we didn't see this behavior anymore.

I set up a minimal example with your code, Node 12.18.1 and version 3.15.0 of Bull locally. Until now (about 3 hours) everything is working fine. How long do you usually have to wait until "hello world" isn't printed anymore?

@malisetti
Author

The issue I observed is not about jobs that stop running after some time. The issue is that when I restart the worker process, it may not pick up any jobs to process, so each start of the worker process behaves differently. I hope that is clear; let me know your thoughts.

@niftylettuce

niftylettuce commented Jul 7, 2020

Same issue here. I also experience this (with repeatable jobs) when I try to clear the queue of them before adding them again: #1792.

@fspoettel

Ran into this issue as well. IMO the every option, as well as the other repeat options, is completely unusable because of this. I had to resort to my own scheduler with a job-in-queue check 😕
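A minimal sketch of the kind of workaround described above (a plain timer plus a job-in-queue check). It assumes a Bull queue dedicated to this single task, so that queue-wide counts are meaningful; `addIfIdle` is a hypothetical helper name, and it relies only on `queue.getJobCounts()` and `queue.add()`.

```javascript
// Hypothetical helper: enqueue a one-off job only if the queue has nothing
// waiting, active or delayed. Assumes the queue is dedicated to this task,
// because getJobCounts() reports counts for the whole queue.
async function addIfIdle(queue, data, opts = {}) {
  const counts = await queue.getJobCounts();
  const pending =
    (counts.waiting || 0) + (counts.active || 0) + (counts.delayed || 0);
  if (pending > 0) {
    return null; // something is already queued; skip this tick
  }
  return queue.add(data, opts);
}

// Usage with a real Bull queue (not executed here):
// const Queue = require('bull');
// const q = new Queue('my-task', 'redis://127.0.0.1:6379');
// setInterval(() => addIfIdle(q, { task: 'sync' }, { removeOnComplete: true }), 2 * 60 * 1000);
```

The count check and the add are two separate Redis round trips, so two scheduler processes running this at the same moment could still both enqueue a job.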

OptimalBits deleted a comment from niftylettuce on Jul 17, 2020
@manast
Member

manast commented Jul 17, 2020

@fspoettel Can you be more explicit? What is "completely unusable"? Can you provide a code example that reproduces the issue?

@manast
Member

manast commented Jul 17, 2020

Guys, if you can provide a use case that reproduces the issue, I can look into it; as the issue currently stands, it is not reproducible. Also note that repeatable jobs work fine for many people, so this must be some edge case. I would love to look into it, but I need some code I can use to reproduce it.

@fspoettel

@fspoettel can you be more explicit, what is "completely unusable" ? can you provide a code example that reproduces the issue?

Yes, I can produce a test case for this, give me some time to isolate it from the project I'm using bull in. I'll try to provide a repo with an isolated example.

I ran into this problem not too long ago. It boils down to the queue not respecting the order of repeatable tasks when a number of tasks are registered within a very short time frame. Some tasks are executed multiple times in a row while others are "stuck" in the delayed state. The problem gets worse if the rate-limiting option is involved.

@manast
Member

manast commented Jul 18, 2020

Great, I will look into it as soon as you provide the test case.

@malisetti
Author

@manast Hi, thanks for this library. I hope the issue is clear; I provided sample code to reproduce it in the first comment.

@fspoettel

fspoettel commented Jul 20, 2020

@manast This is a minimal example of what I ran into (edit: fixed the example and added the rate limiter, which is what actually causes the issue):

const Queue = require('bull');
(async () => {
  const testQueue = new Queue('test-queue', {
    redis: 'redis://127.0.0.1:6379',
    limiter: {
      max: 1,
      duration: 1000,
    }
  });
  await testQueue.empty();

  testQueue.process((job, done) => {
    console.log(job.data.id);
    done();
  });

  await testQueue.add({ id: 'foo' }, {
    repeat: { every: 1000 },
    jobId: 'foo-id'
  });

  console.log(`queued foo`);

  setTimeout(async () => {
    await testQueue.add({ id: 'bar' }, {
      repeat: { every: 1000 },
      jobId: 'bar-id'
    });

    console.log(`queued bar`);
  }, 100);
})();

If I comment out the foo task, it prints:

➜  bull-test-case node index
queued bar
bar
bar
bar
bar

If I comment out the bar task, it prints:

➜  bull-test-case node index
queued foo
foo
foo
foo
foo

When both are present, it prints:

➜  bull-test-case node index
queued foo
foo
queued bar
foo
foo
foo
bar
bar
foo
foo
foo
foo
foo
foo

whereas it should print foo and bar alternately.

@manast
Member

manast commented Jul 20, 2020

@fspoettel OK, so this example works as designed. To achieve what you want, you need to remove the previous repeatable job first, i.e. you cannot update an existing repeatable job. If you add "bar" with a setting other than every: 1000, it will be added and you will have 2 different repeatable jobs.
To delete a repeatable job you can either use https://github.com/OptimalBits/bull/blob/develop/REFERENCE.md#queueremoverepeatable, specifying the same repeat options, or https://github.com/OptimalBits/bull/blob/develop/REFERENCE.md#queueremoverepeatablebykey, in which case you need https://github.com/OptimalBits/bull/blob/develop/REFERENCE.md#queuegetrepeatablejobs to get the keys.
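For the key-based route, a minimal sketch that lists the repeatable jobs with `getRepeatableJobs()` and removes the matching ones with `removeRepeatableByKey()`. The helper name `removeRepeatablesWhere` and the predicate are illustrative assumptions; the entry fields (`key`, `cron`, `every`, ...) are the ones shown in the outputs earlier in this thread.

```javascript
// Hypothetical helper: remove every repeatable job whose entry matches `predicate`.
// Each entry looks like { key, name, id, endDate, tz, cron, every, next }.
async function removeRepeatablesWhere(queue, predicate) {
  const repeatables = await queue.getRepeatableJobs();
  const removedKeys = [];
  for (const entry of repeatables) {
    if (predicate(entry)) {
      await queue.removeRepeatableByKey(entry.key);
      removedKeys.push(entry.key);
    }
  }
  return removedKeys;
}

// Usage with a real Bull queue (not executed here): drop the 2-minute cron schedule.
// await removeRepeatablesWhere(helloQueue, (e) => e.cron === '*/2 * * * *');
```

Filtering on the returned entries avoids having to reproduce the exact repeat options that removeRepeatable needs.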

@manast
Member

manast commented Jul 20, 2020

@mseshachalam, regarding your issue: if the next timestamp is older than the current timestamp, that means that, for some reason, the delayed job waiting for the next repetition has been removed from the delayed set. Did you perhaps remove the delayed set, or call the "empty" function (which also removes the delayed jobs and effectively breaks the repetitions)?
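Based on that explanation, a hedged diagnostic sketch: flag repeatable entries (as returned by `getRepeatableJobs()`) whose `next` is already in the past. `findStaleRepeatables` and its 5-second default tolerance are assumptions; the tolerance is there because `next` may briefly lag while a repetition is being processed.

```javascript
// Hypothetical diagnostic: return the repeatable entries whose `next` run time
// is more than `toleranceMs` in the past. Per the explanation above, this
// suggests the delayed job for the next repetition is missing (e.g. after empty()).
// The default tolerance is an assumption: `next` may lag briefly while a
// repetition is being processed.
function findStaleRepeatables(repeatables, now = Date.now(), toleranceMs = 5000) {
  return repeatables.filter((entry) => entry.next + toleranceMs < now);
}

// Usage with a real Bull queue (not executed here):
// const stale = findStaleRepeatables(await helloQueue.getRepeatableJobs());
// if (stale.length > 0) console.warn('stale repeatables:', stale.map((e) => e.key));
```

Applied to the two outputs posted at the top of the issue (next 1589657520000 at 1589657543148, and next 1589659080000 at 1589659117238), both entries would be flagged, being roughly 23 s and 37 s overdue.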

@fspoettel

@manast Thanks for your helpful comment. Apologies, I messed up the example and forgot to assign separate jobId params 🙈
With that fixed, the test case works with correct ordering until I introduce a limiter on the queue, which causes the order to be lost. I updated the test case, but I see there are already bugs tracking the limiter elsewhere. Sorry for commenting on the wrong issue; it seems my problem actually lies with the limiter (also sorry, OP!)
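A back-of-the-envelope check on the limiter test case, offered as a hedged aside: with two repeatable jobs every 1000 ms and a limiter of max: 1 per duration: 1000, repetitions are produced twice as fast as the limiter lets jobs through. `limiterBacklog` is a hypothetical helper that only does that arithmetic and ignores processing time.

```javascript
// Hypothetical helper: over a window of `elapsedMs`, compare how many repetitions
// `repeatables` jobs (each repeating every `everyMs`) produce with how many jobs
// a limiter of { max, duration: durationMs } lets through in the same window.
function limiterBacklog({ repeatables, everyMs, max, durationMs, elapsedMs }) {
  const produced = repeatables * Math.floor(elapsedMs / everyMs);
  const allowed = max * Math.floor(elapsedMs / durationMs);
  return { produced, allowed, backlog: Math.max(0, produced - allowed) };
}

// Values from the test case: foo and bar every 1000 ms, limiter { max: 1, duration: 1000 }.
console.log(limiterBacklog({ repeatables: 2, everyMs: 1000, max: 1, durationMs: 1000, elapsedMs: 10000 }));
// → { produced: 20, allowed: 10, backlog: 10 }
```

Under these assumptions the backlog grows by about one job per second, so repetitions inevitably run later than scheduled; how the limiter then orders the held-back jobs is the separate limiter question mentioned above.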

@manast
Member

manast commented Jul 20, 2020

No problem!

@malisetti
Author

Yes @manast, I call .empty() on the job queues. My code has the following structure:

customSchedulerQueue.process(async (job, done) => {});
// NOTE: empty() also removes delayed jobs, including the one holding the next repetition
await customSchedulerQueue.empty();
await customSchedulerQueue.add(null, { repeat: { cron: '*/2 * * * *' } });
customSchedulerQueue.on('completed', (job, result) => {
    // Job completed with output result!
    logger.log('custom event scheduling completed', result);
});

@malisetti
Author

@manast

Instead of emptying the queue, I am using

await customSchedulerQueue.removeRepeatable({
    cron: eventSchedulingCronExp,
    jobId: 'csq',
});

and my process is working as expected. Thanks.
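A minimal sketch of that start-up pattern: instead of empty() (which also deletes the delayed job holding the next repetition), remove only the repeatable schedule and add it back. `resetRepeatable` is a hypothetical name; removeRepeatable(repeatOpts) and add(data, { repeat }) are the Bull calls already used in this thread. It assumes no custom jobId was used when adding.

```javascript
// Hypothetical helper: re-register a repeatable job on worker start-up without
// calling queue.empty(). `repeat` must be the same options used when the job was
// first added. Assumes no custom jobId; if one was used, it has to be passed to
// removeRepeatable as well (as in the snippet above), which is not handled here.
async function resetRepeatable(queue, data, repeat) {
  await queue.removeRepeatable(repeat); // drop the old schedule and its pending repetition
  return queue.add(data, { repeat }); // schedule it again from now
}

// Usage with a real Bull queue (not executed here):
// await resetRepeatable(customSchedulerQueue, null, { cron: '*/2 * * * *' });
```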

@PlkMarudny

I also have a similar problem: after some time, repeatable jobs no longer start. I have around 10k jobs, and they always stay in the "delayed" state. Because the job definitions are kept in a separate database and are created and deleted in bulk, I use .obliterate() quite often to quickly remove and recreate jobs when a bulk operation is involved. But after some time the queue stops working, and no jobs are picked up any more. The workaround I found is to create a new queue with a different name and recreate the jobs; then everything works again.

I did not investigate this much, but while watching the broken queue in Bull Dashboard I noticed that, after adding jobs, when their launch time comes they are moved to the "waiting" state for a moment (but not picked up by a worker) and then land in "delayed" again, with no further activity afterwards. I can promote a job manually and it is processed.
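Since promoting a stuck job by hand gets it processed, a hedged stop-gap sketch that automates that step: fetch the delayed jobs with queue.getDelayed(), keep those whose scheduled time has passed, and call job.promote() on each. `promoteOverdue` is a hypothetical name, and computing the scheduled time as job.timestamp + job.opts.delay is an assumption about how the delayed jobs were created. It treats the symptom rather than the cause, and with ~10k jobs, fetching the whole delayed set on every run may be expensive.

```javascript
// Hypothetical stop-gap: promote delayed jobs whose scheduled time has passed.
// Assumes the scheduled time is job.timestamp + job.opts.delay.
async function promoteOverdue(queue, now = Date.now()) {
  const delayed = await queue.getDelayed(); // all delayed jobs (default range)
  const overdue = delayed.filter(
    (job) => job.timestamp + (job.opts.delay || 0) <= now
  );
  for (const job of overdue) {
    await job.promote(); // moves the job from "delayed" to "waiting"
  }
  return overdue.map((job) => job.id);
}

// Usage with a real Bull queue (not executed here):
// setInterval(() => promoteOverdue(queue).catch(console.error), 60 * 1000);
```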



7 participants