Remove old jobs? #58

Closed
ybogdanov opened this issue Aug 12, 2011 · 31 comments

Comments

@ybogdanov

How to do some "garbage collection" by removing old completed jobs, for example, those which are older than 1 hour?
Thanks!

@dinedal

dinedal commented Aug 18, 2011

You should be able to use the queue's 'job complete' event to call kue.Job.get(job_id) and then use the returned job's remove() method.

But that deletes each job as soon as it completes.

I too would like something that calls EXPIRE with a given time on the relevant keys, but I'm not sure how it would handle delaying the removal of the jobs from the lists...

@julesjanssen

You could let the job expire after x seconds by using Redis' EXPIRE command: http://redis.io/commands/expire

@dinedal

dinedal commented Nov 3, 2011

Expiring the job's key does not remove it from the list of jobs.

@mypark

mypark commented Mar 14, 2012

I'd like to have this feature as well. Perhaps a command that deletes all completed jobs, which I could run periodically? I suppose I could make the request directly in Redis, but I'm a bit unfamiliar with the structure of the data that kue uses.

@sebicas

sebicas commented Sep 8, 2012

+1 for having Jobs deleted from Completed Queue

@jwchang0206

+1 for periodic removal of completed jobs at a configurable interval

@jharlap

jharlap commented Sep 11, 2012

Could someone explain why they prefer a delayed removal of completed jobs rather than just using something like:

jobs.on('job complete', function(id) {
  kue.Job.get(id, function(err, job) {
    if (err) return; // the job could not be loaded, nothing to remove
    job.remove();
  });
});

@sebicas

sebicas commented Sep 11, 2012

I prefer delayed removal, because that way I can be certain that the job completed (since I can see it in the queue in the complete state). If you remove the job right away, it feels as if the job vanished from the queue, and you cannot be certain whether it completed or not.

@Qard
Contributor

Qard commented Jan 8, 2013

This would indeed be handy. When you process 1000+ jobs per minute, the keyspace gets flooded rather quickly. I'd like to keep stuff in the complete log for at least a few minutes, so we can watch it for odd behaviour when necessary.

@OliverJAsh

+1

@mikemoser

+1 for periodic removal of completed jobs at a configurable interval

@wemakeweb

+1

@anthonywebb

+1. I fear millions of completed jobs aren't going to do the UI or the Redis queue any favors.

@aventuralabs

It seems that Redis is just now getting to the point where this can be handled asynchronously.

When a job is completed, we add it to a q:expiring sorted set, with EXPIRE. Meanwhile, we subscribe to the expired event and handle removal appropriately thereafter. See http://redis.io/topics/notifications

Alternatively, we can create a sorted set containing the job ids, sorted by seconds since epoch and periodically run a "clean up" command.
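
A rough sketch of that second technique follows. It is only an illustration under stated assumptions: `client` is assumed to be a node_redis-style connection with callback versions of zadd, zrangebyscore and zrem, and the key name `q:completed-at` and both helper names are hypothetical, not something kue defines.

```javascript
// Hypothetical bookkeeping key; kue itself does not define it.
const COMPLETED_KEY = 'q:completed-at';

// Record when a job finished: score = completion time in ms since epoch.
function recordCompletion(client, jobId, nowMs, cb) {
  client.zadd(COMPLETED_KEY, nowMs, String(jobId), cb);
}

// Find ids that completed more than maxAgeMs ago, hand each one to
// removeJob(id, cb), and forget an id only after its job was removed.
function cleanUp(client, maxAgeMs, nowMs, removeJob, cb) {
  client.zrangebyscore(COMPLETED_KEY, 0, nowMs - maxAgeMs, function (err, ids) {
    if (err) return cb(err);
    let pending = ids.length;
    const removed = [];
    if (pending === 0) return cb(null, removed);
    ids.forEach(function (id) {
      removeJob(id, function (removeErr) {
        if (removeErr) return finish(); // keep the id and retry on the next run
        client.zrem(COMPLETED_KEY, id, function (zremErr) {
          if (!zremErr) removed.push(id);
          finish();
        });
      });
    });
    function finish() {
      if (--pending === 0) cb(null, removed);
    }
  });
}
```

recordCompletion would be called from a 'job complete' handler, and cleanUp from a timer or a cron-style process, with removeJob wrapping kue.Job.get(id, ...) followed by job.remove().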

@aventuralabs

I submitted a pull request that addresses this problem using the second technique I discussed. I do not think that garbage collection should be the primary means by which things are cleared out; it should be a secondary one.

@niravmehta

@brandoncarl Really appreciate your contribution and your recent pull request. Hopefully that gets accepted soon.

Here's something I did today to clean up old jobs. Modified from an example found on Stack Exchange.

https://gist.github.com/niravmehta/6112330

HTH.

@aventuralabs

Thanks @niravmehta - great work on the exponential backoff pull request.

It seems that this repo really needs a solid set of tests. It seems that #202 is a good start.

The forums are pretty rife with pull requests... they are just not being tested/accepted quickly enough.

@niravmehta

@brandoncarl I'd like to have the changes tested, but I do not have expertise in test suite generation / maintenance. Maybe @visionmedia can take the lead and combine #202 with additional tests so that future contributions can be handled better.

But until then, it would be great to have @LearnBoost or @visionmedia look at these changes and accept them!

@yi

yi commented Nov 7, 2013

We are using kue to process millions of jobs every hour, and found that the best way to remove completed jobs is to set up a standalone sweeper service, so I wrote a small tool to do that. Here are my reasons:

  1. When processing a large number of jobs, job:search keys take up significant memory in Redis even if you remove the job later, because Redis is lazy about reclaiming memory.
  2. When processing a large number of jobs, if the service that handles job completion goes down, completed jobs pile up in kue and will take down Redis quickly.
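
For readers who want to see the general shape of such a sweeper, here is a minimal sketch. It is not the tool mentioned above. It assumes `Job` is kue.Job (passed in so the sweeper can live in its own process), relies on Job.rangeByState and job.remove as used elsewhere in this thread, and treats job.updated_at as an approximation of the completion time, which is an assumption. The function name and option names are hypothetical.

```javascript
// Walk the 'complete' list in batches and remove jobs older than maxAgeMs.
// Calls done(err, removedCount) when the whole list has been examined.
function sweepCompleted(Job, opts, done) {
  const maxAgeMs = opts.maxAgeMs;
  const batchSize = opts.batchSize;
  const nowMs = opts.nowMs;
  let offset = 0;   // number of jobs we looked at and decided to keep
  let removed = 0;

  function nextBatch() {
    Job.rangeByState('complete', offset, offset + batchSize - 1, 'asc', function (err, batch) {
      if (err) return done(err, removed);
      if (batch.length === 0) return done(null, removed);
      let pending = batch.length;
      batch.forEach(function (job) {
        // Assumption: updated_at is a millisecond timestamp set on completion.
        const age = nowMs - Number(job.updated_at);
        if (!(age > maxAgeMs)) { offset++; return settle(); }
        job.remove(function (removeErr) {
          // A removed job leaves the list, so only kept jobs advance the offset.
          if (removeErr) offset++; else removed++;
          settle();
        });
      });
      function settle() {
        if (--pending === 0) nextBatch();
      }
    });
  }
  nextBatch();
}
```

Run from a separate process on a timer, this keeps working even if the service that handles 'job complete' events is down, which is the failure mode described in point 2.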

@niravmehta

Good work @yi Thank you!

@behrad closed this as completed Nov 29, 2013

@hypesystem

+1 for this entire thread!

@andyyou

andyyou commented Sep 3, 2014

Sorry everyone, I'm new to kue and have a question: why not just use a function like this to clean up completed jobs?

var kue = require('kue');
var Job = kue.Job;
var CLEANUP_TIME = 0.5 * 60 * 1000;     // remove jobs older than 30 seconds
var CLEANUP_INTERVAL = 0.1 * 60 * 1000; // run the check every 6 seconds
function performCleanup() {
  var now = new Date().getTime();
  Job.rangeByType('someType', 'complete', 0, -1, 'asc', function (err, selectedJobs) {
    if (err) return console.error(err);
    selectedJobs.forEach(function (job) {
      var created = parseInt(job.created_at, 10); // millisecond timestamp
      if (now - created > CLEANUP_TIME) {
        job.remove();
      }
    });
  });
}
setInterval(performCleanup, CLEANUP_INTERVAL);

Is this a bad way to solve this problem?

@psi-4ward

I simply added a Redis EXPIRE to the jobs:

var job = kueJobs.create('myjob', {}).save(function(err) {
   if(err) return console.error(err);
   kueJobs.client.expire(kueJobs.client.getKey('job:' + job.id), 30*24*3600);
});

@behrad
Collaborator

behrad commented Oct 5, 2014

@psi-4ward that won't help much, since EXPIRE only removes the job:id key itself, not the job id's entries in the ZSETs it belongs to, and this causes serious inconsistencies.

@behrad
Collaborator

behrad commented Oct 5, 2014

@andyyou that's OK, Andy

@psi-4ward

oh damn, do you know how to EXPIRE the ZSET also?

@behrad
Collaborator

behrad commented Oct 5, 2014

You should do it programmatically; however, there is a simpler solution if you are on Redis >= 2.8. Read these:

  1. http://stackoverflow.com/questions/13174615/how-to-get-callback-when-key-expires-in-redis
  2. https://groups.google.com/forum/#!topic/redis-db/rXXMCLNkNSs

Kue shall provide a job TTL feature in future :)
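
As a hedged illustration of the Redis >= 2.8 route: the sketch below listens for expired-key notifications and extracts the job id from the key name. It assumes the server has notify-keyspace-events enabled with at least "Ex", that `subscriber` is a dedicated node_redis-style connection, and that job hashes are named `<prefix>:job:<id>` (with 'q' as the prefix). Both helper names are hypothetical. Note that once the key has expired, the handler still has to remove the id from the ZSETs itself.

```javascript
// Extract the job id from an expired key such as "q:job:42".
// Returns null for keys that are not a job hash (e.g. "q:job:42:log").
function jobIdFromKey(key, prefix) {
  const head = prefix + ':job:';
  if (!key.startsWith(head)) return null;
  const rest = key.slice(head.length);
  return /^\d+$/.test(rest) ? Number(rest) : null;
}

// Call onExpired(id) whenever a job hash expires in any database.
function watchExpiredJobs(subscriber, prefix, onExpired) {
  subscriber.on('pmessage', function (pattern, channel, key) {
    const id = jobIdFromKey(key, prefix);
    if (id !== null) onExpired(id);
  });
  // Keyevent channel: the message payload is the name of the expired key.
  subscriber.psubscribe('__keyevent@*__:expired');
}
```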

@behrad
Collaborator

behrad commented Oct 5, 2014

For those interested, you could also use
job.create( ... ).removeOnComplete( true ).save()
that will remove each job on completion.
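
If jobs are created in many places, one way to avoid forgetting the flag is a small wrapper. This is only a sketch: `queue` is assumed to be the object returned by kue.createQueue(), and the helper name is hypothetical.

```javascript
// Create and save a job that kue removes as soon as it completes.
// Returns the job so callers can still attach listeners to it.
function createSelfCleaningJob(queue, type, data, cb) {
  return queue.create(type, data).removeOnComplete(true).save(cb);
}
```

A call such as createSelfCleaningJob(queue, 'email', { to: 'someone' }, callback) then replaces the usual create(...).save(...) chain.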

@warrickhill

Try this, worked for me

jobs.on('job complete', function(id) {
    kue.Job.get(id, function(err, job) {
        setTimeout(function() {
            job.remove();
        }, 60000);
    });
});

@rkotcherr

For those interested, you could also use
job.create( ... ).removeOnComplete( true ).save()
that will remove each job on completion.

For smaller systems I'd say this is ideal, but if you have a system where you're creating jobs in many different places I'd say it's probably best to listen to the "on complete" event.

As for using setTimeout() to remove the jobs after a certain time interval, the downside is that if you restart the application you'll lose those callbacks and the jobs will never be removed.

Just some thoughts...

@lfernandes

lfernandes commented Jan 28, 2020

sorry everyone I am a new guy of kue and have a question to ask about why don't just use a function to complete like

var CLEANUP_TIME = 0.5 * 60 * 1000;
var CLEANUP_INTERVAL = 0.1 * 60 * 1000;
function performCleanup() {
  var now = new Date().getTime();
  Job.rangeByType('someType', 'complete', 0, -1, 'asc', function (err, selectedJobs) {
    selectedJobs.forEach(function (job) {
      var created = job.created_at;
      if (now - created > CLEANUP_TIME) {
        job.remove();
      }
    });
  });
}
setInterval(performCleanup, CLEANUP_INTERVAL);

Is this a bad way for this problem??

Other statuses can be:

  • active
  • inactive
  • failed
  • complete
