Optimized Redis result backend (publish/subscribe instead of polling) #2511

Closed
wants to merge 6 commits into
from

Projects

None yet

8 participants

@sangwa
sangwa commented Feb 20, 2015

A set of improvements for Redis result backend aimed to eliminate polling as much as possible and make it responsive for very fast tasks (with execution times under 0.5 seconds - the default polling interval used before).

  • If the initial retrieval of the task result determines its state as not ready yet, the new backend subscribes to notifications sent upon setting the task result, instead of polling Redis on the fixed intervals. As soon as notification is received, the result is returned.
  • To eliminate the possibility of missing the published notification in a short window between retrieving the initial task state and subscribing to notifications, locks for the task results are introduced. The lock is placed before the set operation and released after publishing the notification, also the lock is placed before retrieving the initial task state and released after subscribing to notifications.
  • To avoid blocking on these locks if some operation crashes in the progress, lock expiration timers and tunable settings for them are introduced. Millisecond precision expiration is used if supported by Redis server.
  • All batch operations (set/publish, get/subscribe, lock acquisition/release) use either lua scripts or pipelined commands depending on the support of lua scripts by Redis server.
  • Feature detection for Redis server is implemented to support the functions listed above.

Should close #799 (GroupResult improvements are already implemented in redis_join branch as far as I can see and are supported in this version by re-implementing RedisBackend.get_many()).

@sangwa sangwa Optimized Redis result backend (publish/subscribe instead of polling)
Conflicts:
	celery/backends/redis.py
c626079
sangwa added some commits Feb 20, 2015
@sangwa sangwa tests passing
16cdee8
@sangwa sangwa Optimized Redis backend: py3 compat fixes
d0f5c76
@thedrow
Contributor
thedrow commented Feb 26, 2015

The PyPy failure is not related to you.

sangwa added some commits Apr 30, 2015
@sangwa sangwa typo fix 04aad82
@sangwa sangwa small fix for tests of redis backend 7f0e03c
@sangwa sangwa Optimized Redis backend: chains are more responsive on exceptions, sm…
…all improvement for group results
077d515
@coveralls

Coverage Status

Changes Unknown when pulling 077d515 on sangwa:redis_nopoll into * on celery:master*.

@thedrow
Contributor
thedrow commented Jul 19, 2015

LGTM. @malinoff @ask Any objections?

@malinoff
Member

@thedrow I can't help with reviewing non-amqp backends - so I think @ask has the final word here.

@ask ask was assigned by malinoff Jul 19, 2015
@randylayman

Its been a while since there was any activity on this PR? Is this dead?

@thedrow thedrow added this to the v4.0 milestone Jan 5, 2016
@thedrow
Contributor
thedrow commented Jan 5, 2016

I've marked this for 4.0.
Let's see if @ask responds. If he doesn't, let's rebase and merge this.

@ask
Member
ask commented Mar 9, 2016

Great work! The only reservation I have here is the frightening combination of having "redis" and "lock" appear in the same sentence :)

I realize that you have spent some time on the lock implementation, but I still would prefer if we didn't use it due to the difficulty of proving that it will work in all cases.

Instead of using the lock, would it be possible to hook into Backend.on_task_call to subscribe before the task is sent? The rpc:// backend uses this to make sure the queue is declared before the task message is published.

@ask ask added a commit that referenced this pull request Mar 9, 2016
@ask ask [result][redis] Use pubsub for consuming results, and use the new asy…
…nc backend interface (Issue #2511)
1f995b3
@ask
Member
ask commented Mar 9, 2016

Since I recently refactored the backends to have an async interface, it means that we can now keep the pubsub instance running instead of creating/destroying it for every task.

I wrote an implementation of this in 1f995b3, it seems to work but I have only tested it naively.

I don't have much experience with redis pubsub, so would be great to have it reviewed.

@thedrow
Contributor
thedrow commented Mar 9, 2016

@sangwa @randylayman Can you guys take a look?

@yingzong
yingzong commented Mar 9, 2016

For those just looking for applying this change (3 commits) without waiting for the eventual rebase/merge, I pulled it out of that foreign branch as a patch:
https://gist.github.com/yingzong/4d07ed37c0d9c0a35331

I've been running this patch back-applied to 3.1.18 and it runs fine. The waiting time of a simple Redis-backed Chord I'm running has gone from 1s to about 50ms (which includes actual work). I had to also apply CELERY_DISABLE_RATE_LIMITS=True; otherwise it was still in the hundred ms range, see:
celery/kombu#94

Another nice thing this set of changes seems to make is that results are deleted immediately from Redis instead of hanging around for CELERY_TASK_RESULT_EXPIRES seconds.

@randylayman

After a read this looks good. Unfortunately I don't have the time to test and dig in deeper.

Does the comment by @yingzong need to be documented?

@ask
Member
ask commented Mar 10, 2016

@yingzong

CELERY_DISABLE_RATE_LIMITS should have absolute no effect anymore in 3.1 when using redis/amqp as the broker.

The task results cannot actually be removed if this is a persistent result backend, at the very least not by default.

@ask
Member
ask commented Mar 10, 2016

Clarification: disabling rate limits will have no effect if you don't have any rate limited tasks. As there's no dedicated thread for rate limits any more, there's no longer any overhead.

@ask ask added a commit that referenced this pull request Mar 10, 2016
@ask ask [result][redis] Use pubsub for consuming results, and use the new asy…
…nc backend interface (Issue #2511)
dc9093c
@yingzong

@ask I just checked again, and you're right on both counts. I must have misremembered some behavior. Thanks for clarifying.

@ask ask added a commit that referenced this pull request Mar 18, 2016
@ask ask [result][redis] Use pubsub for consuming results, and use the new asy…
…nc backend interface (Issue #2511)
0bf3ab6
@ask ask added a commit that referenced this pull request Mar 18, 2016
@ask ask [result][redis] Use pubsub for consuming results, and use the new asy…
…nc backend interface

Incorporates ideas taken from Yaroslav Zhavoronkov's diff in #2511

Closes Issue #2511
886faf6
@ask
Member
ask commented Mar 18, 2016

I merged the async backend version, not using locks into master as it's now passing the stress test suite.

@sangwa Even though I didn't merge your commit, I consider this a collaboration as I used your diff as a guide. Thank you for this, it's a feature I have been wanting for a long time.
Could you please add your name to https://github.com/celery/celery/edit/master/CONTRIBUTORS.txt ?

@ask
Member
ask commented Mar 18, 2016

The changes merged into master are: 886faf6 + 456fd75

@ask ask closed this Mar 18, 2016
@sangwa
sangwa commented Mar 18, 2016

@ask Thank you for the feedback! I haven't been working with Celery for quite some time, so I didn't follow the new async backend and didn't have enough spare time to review and/or test your changeset thoroughly, but at the first quick glance it seems fine. I trust other contributors to this issue ensured it's working properly.

@jheld
jheld commented Jul 28, 2016

Hello @ask, if this code is still on the trajectory to get merged into the proper code-base, what is remaining before it is feature and functionally complete? And, would you or anyone like additional help? Documentation and/or dev work.

@ask
Member
ask commented Jul 28, 2016

@jheld it should already be merged into master, you can help us test it, or work on an issue in the 4.0 milestone so we can release it soon :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment