
Optimized Redis result backend (publish/subscribe instead of polling) #2511

Closed

wants to merge 6 commits into base: master from sangwa:redis_nopoll

Conversation

8 participants
@sangwa

sangwa commented Feb 20, 2015

A set of improvements for the Redis result backend, aimed at eliminating polling as much as possible and making it responsive for very fast tasks (with execution times under 0.5 seconds, the default polling interval used before).

  • If the initial retrieval of the task result finds its state not ready yet, the new backend subscribes to notifications sent upon setting the task result, instead of polling Redis at fixed intervals. As soon as a notification is received, the result is returned.
  • To eliminate the possibility of missing the published notification in the short window between retrieving the initial task state and subscribing to notifications, locks for the task results are introduced. The lock is placed before the set operation and released after publishing the notification; likewise, the lock is placed before retrieving the initial task state and released after subscribing to notifications.
  • To avoid blocking on these locks if some operation crashes in progress, lock expiration timers and tunable settings for them are introduced. Millisecond-precision expiration is used if supported by the Redis server.
  • All batch operations (set/publish, get/subscribe, lock acquisition/release) use either Lua scripts or pipelined commands, depending on whether the Redis server supports Lua scripting.
  • Feature detection for the Redis server is implemented to support the functions listed above.

Should close #799 (GroupResult improvements are already implemented in the redis_join branch as far as I can see, and are supported in this version by re-implementing RedisBackend.get_many()).
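The set/publish and get/subscribe pairing described above can be sketched in plain Python, with an in-memory dict and queues standing in for Redis keys and pub/sub channels. All names here are hypothetical illustrations of the pattern, not the PR's actual code:

```python
import threading
import queue


class MiniPubSubBackend:
    """In-memory sketch of the publish/subscribe result backend described
    in the PR: check the result once, subscribe if not ready, and use a
    per-task lock to close the race window between the two steps."""

    def __init__(self):
        self._results = {}       # task_id -> result (stands in for Redis keys)
        self._subscribers = {}   # task_id -> queues (stands in for channels)
        self._locks = {}         # task_id -> lock guarding set+publish vs get+subscribe
        self._guard = threading.Lock()

    def _lock_for(self, task_id):
        with self._guard:
            return self._locks.setdefault(task_id, threading.Lock())

    def store_result(self, task_id, result):
        # Lock before the "set", release only after the "publish",
        # so no notification can be sent while a reader is mid-check.
        with self._lock_for(task_id):
            self._results[task_id] = result
            for q in self._subscribers.pop(task_id, []):
                q.put(result)  # "publish" the notification

    def wait_for(self, task_id, timeout=5.0):
        q = queue.Queue()
        # Lock before the initial "get", release only after "subscribing",
        # so a concurrent store_result cannot slip between the two.
        with self._lock_for(task_id):
            if task_id in self._results:
                return self._results[task_id]
            self._subscribers.setdefault(task_id, []).append(q)
        return q.get(timeout=timeout)  # block until published; no polling
```

Unlike this toy, the real backend also has to handle lock expiration (so a crashed writer cannot block readers forever), which the PR implements with tunable Redis key TTLs.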

sangwa added some commits Feb 20, 2015

@thedrow


Contributor

thedrow commented Feb 26, 2015

The PyPy failure is not related to you.

@coveralls


coveralls commented Apr 30, 2015

Coverage Status

Changes Unknown when pulling 077d515 on sangwa:redis_nopoll into celery:master.

@thedrow


Contributor

thedrow commented Jul 19, 2015

LGTM. @malinoff @ask Any objections?

@malinoff


Member

malinoff commented Jul 19, 2015

@thedrow I can't help with reviewing non-amqp backends - so I think @ask has the final word here.

@randylayman


randylayman commented Jan 5, 2016

It's been a while since there was any activity on this PR. Is this dead?

@thedrow thedrow added this to the v4.0 milestone Jan 5, 2016

@thedrow


Contributor

thedrow commented Jan 5, 2016

I've marked this for 4.0.
Let's see if @ask responds. If he doesn't, let's rebase and merge this.

@ask


Member

ask commented Mar 9, 2016

Great work! The only reservation I have here is the frightening combination of having "redis" and "lock" appear in the same sentence :)

I realize that you have spent some time on the lock implementation, but I still would prefer if we didn't use it due to the difficulty of proving that it will work in all cases.

Instead of using the lock, would it be possible to hook into Backend.on_task_call to subscribe before the task is sent? The rpc:// backend uses this to make sure the queue is declared before the task message is published.
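The lock-free alternative suggested here can be sketched as follows. The class, the channel naming, and the FakePubSub stub are all hypothetical stand-ins, loosely mirroring how the rpc:// backend declares its reply queue before publishing:

```python
def channel_for(task_id):
    # Hypothetical channel naming; the real backend uses its own key scheme.
    return "celery-task-meta-" + task_id


class FakePubSub:
    """Minimal stand-in for a pub/sub connection."""

    def __init__(self):
        self.channels = set()

    def subscribe(self, channel):
        self.channels.add(channel)


class SubscribeFirstBackend:
    """Subscribe to the result channel *before* the task message is sent.

    Because the subscription exists before the worker can possibly publish
    a result, the notification can never be missed, so no lock is needed."""

    def __init__(self, pubsub):
        self.pubsub = pubsub

    def on_task_call(self, producer, task_id):
        # Hook invoked just before the task message is published.
        self.pubsub.subscribe(channel_for(task_id))
```

The trade-off is that the subscription must be set up on the caller's side for every task sent, even ones whose results are never awaited.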

ask added a commit that referenced this pull request Mar 9, 2016

@ask


Member

ask commented Mar 9, 2016

Since I recently refactored the backends to have an async interface, we can now keep the pubsub instance running instead of creating/destroying it for every task.

I wrote an implementation of this in 1f995b3, it seems to work but I have only tested it naively.

I don't have much experience with redis pubsub, so would be great to have it reviewed.
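A minimal sketch of that idea: one pub/sub object is kept alive for the backend's whole lifetime, and channels are added and removed per task instead of a fresh connection being created each time. Names are hypothetical, and a toy stub replaces the real Redis connection:

```python
class FakePubSub:
    """Toy stand-in for one long-lived Redis pub/sub connection."""

    def __init__(self):
        self.channels = set()

    def subscribe(self, channel):
        self.channels.add(channel)

    def unsubscribe(self, channel):
        self.channels.discard(channel)


class LongLivedResultConsumer:
    """One pubsub instance for the backend's lifetime; channels come and
    go per task rather than the connection being torn down each time."""

    def __init__(self, pubsub):
        self._pubsub = pubsub
        self._pending = set()

    def consume_from(self, task_id):
        # Start listening for this task's result on the shared connection.
        if task_id not in self._pending:
            self._pending.add(task_id)
            self._pubsub.subscribe("celery-task-meta-" + task_id)

    def on_result_fulfilled(self, task_id):
        # Once the result arrives, drop the channel but keep the connection.
        self._pending.discard(task_id)
        self._pubsub.unsubscribe("celery-task-meta-" + task_id)
```

This avoids the per-task connection setup/teardown cost that the original lock-based approach paid on every result retrieval.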

@thedrow


Contributor

thedrow commented Mar 9, 2016

@sangwa @randylayman Can you guys take a look?

@yingzong


yingzong commented Mar 9, 2016

For those just looking for applying this change (3 commits) without waiting for the eventual rebase/merge, I pulled it out of that foreign branch as a patch:
https://gist.github.com/yingzong/4d07ed37c0d9c0a35331

I've been running this patch back-applied to 3.1.18 and it runs fine. The waiting time of a simple Redis-backed chord I'm running has gone from 1 s to about 50 ms (which includes actual work). I had to also set CELERY_DISABLE_RATE_LIMITS=True; otherwise it was still in the hundreds-of-milliseconds range, see:
celery/kombu#94

Another nice effect of this set of changes seems to be that results are deleted immediately from Redis instead of hanging around for CELERY_TASK_RESULT_EXPIRES seconds.

@randylayman


randylayman commented Mar 10, 2016

After a read this looks good. Unfortunately I don't have the time to test and dig in deeper.

Does the comment by @yingzong need to be documented?

@ask


Member

ask commented Mar 10, 2016

@yingzong

CELERY_DISABLE_RATE_LIMITS should have absolutely no effect anymore in 3.1 when using redis/amqp as the broker.

The task results cannot actually be removed if this is a persistent result backend, at the very least not by default.

@ask


Member

ask commented Mar 10, 2016

Clarification: disabling rate limits will have no effect if you don't have any rate limited tasks. As there's no dedicated thread for rate limits any more, there's no longer any overhead.

ask added a commit that referenced this pull request Mar 10, 2016

@yingzong


yingzong commented Mar 10, 2016

@ask I just checked again, and you're right on both counts. I must have misremembered some behavior. Thanks for clarifying.

ask added a commit that referenced this pull request Mar 18, 2016

ask added a commit that referenced this pull request Mar 18, 2016

[result][redis] Use pubsub for consuming results, and use the new async backend interface

Incorporates ideas taken from Yaroslav Zhavoronkov's diff in #2511

Closes Issue #2511
@ask


Member

ask commented Mar 18, 2016

I merged the async backend version (without the locks) into master, as it's now passing the stress test suite.

@sangwa Even though I didn't merge your commit, I consider this a collaboration as I used your diff as a guide. Thank you for this, it's a feature I have been wanting for a long time.
Could you please add your name to https://github.com/celery/celery/edit/master/CONTRIBUTORS.txt ?

@ask


Member

ask commented Mar 18, 2016

The changes merged into master are: 886faf6 + 456fd75

@sangwa


sangwa commented Mar 18, 2016

@ask Thank you for the feedback! I haven't been working with Celery for quite some time, so I didn't follow the new async backend work and didn't have enough spare time to review and/or test your changeset thoroughly, but at first glance it seems fine. I trust the other contributors to this issue ensured it's working properly.

@jheld


jheld commented Jul 28, 2016

Hello @ask, if this code is still on track to be merged into the main code base, what remains before it is feature- and functionally complete? And would you or anyone like additional help with documentation and/or dev work?

@ask


Member

ask commented Jul 28, 2016

@jheld it should already be merged into master, you can help us test it, or work on an issue in the 4.0 milestone so we can release it soon :)
