Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revoked tasks are not removed from DB #713

Closed
polakovicp opened this issue Feb 24, 2023 · 3 comments
Closed

Revoked tasks are not removed from DB #713

polakovicp opened this issue Feb 24, 2023 · 3 comments

Comments

@polakovicp
Copy link

polakovicp commented Feb 24, 2023

Hi,
I noticed constant growth of my Redis DB size. The issue is in revoked tasks, which are not being removed:

redis-cli --scan --pattern 'huey.r.<queue name>.r:*' | wc -l
35989

I am using PriorityRedisExpireStorage and revocation is done in following way:

response = client.process(file_hash)(blocking=True, timeout=30,  revoke_on_timeout=True, preserve=False)

I am checking implementation of Huey class, shouldn't _check_revoked get the key with peek=peek?

    def execute(self, task, timestamp=None):
        ...
        elif self.is_revoked(task, timestamp, False):

    def _check_revoked(self, revoke_id, timestamp=None, peek=True):

        res = self.get(revoke_id, peek=True) # peek=peek
@coleifer
Copy link
Owner

The implementation is correct, as far as I'm concerned. The peek parameter serves a different purpose in _check_revoked() which is an internal-only method that requires peek to always be True when performing the get(). The actual clearing of the revocation flags happens in the callers to _check_revoked. When a task is picked-up by the consumer and actually executed, Huey checks if the task is revoked destructively -- in the process, the revocation key is cleared -- this is the code you've highlighted in execute().

I believe what is happening is:

  1. Your task gets enqueued and you begin blocking on the result, for up to 30s
  2. Before 30s have elapsed, the consumer begins executing your task. In the process it checks if the task is revoked -- it is NOT, so the task function begins running.
  3. The task continues to run and has not finished within 30s of being enqueued.
  4. The result handle times out and places the revocation key in Redis (this is the result of the key growth)
  5. The task finishes.

The revoke_on_timeout unfortunately cannot detect that the task has already been picked-up and is being executed, so it is unnecessarily adding the revoked key. My suggestion would be to either:

  1. Modify your task to take the task instance as a parameter, and clear the task.revoke_id at the end of your task.
  2. Do not use revoke_on_timeout and use lock_task() instead to prevent multiple instances of the task from running.

@coleifer
Copy link
Owner

Another thing you might try is adding a post_execute() hook to clear these:

@huey.post_execute()
def clear_revoked_on_process(task, task_value, exc):
    if 'process' in task.name:
        huey.delete(task.revoke_id)

@coleifer
Copy link
Owner

After thinking about it a bit more, let's try this: 026a5ac -- this change ensures the task instance's revoked flag is cleared after it executes. This should address the scenario I described above, where the task is revoked some time while the task is being executed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants