Feature: Dynamic tasks #817
Conversation
Interesting! Does it work with serializers other than pickle? I moved it to the 2.7 milestone; need to get 2.6 out the door first |
It does. I've used it with pickle, json and msgpack. |
Major update to the dynamic design. Waiting for your feedback @ask :) |
Should I merge this? |
I'm not sure, I'm reluctant to add new stuff into contrib, if something really is useful why can't we support it in core? Why is the calling of the tasks delayed to until after the task has returned?
instead of:
|
Yes, the returned subtask is set as the new callback for the current task, and the old callback becomes a callback of the returned subtask. The rationale for returning subtasks is that you can now do this:
And have Admittedly though, having it in |
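The rewiring described above can be modelled without a broker. The sketch below is illustrative only (the `Task` class and `run` helper are stand-ins, not Celery API): when a task returns another task, the returned subtask inherits the old callback and runs next, as if chained in.

```python
# Illustrative stand-ins for Celery subtasks; not actual Celery API.
class Task:
    def __init__(self, fn, callback=None):
        self.fn = fn              # the task body
        self.callback = callback  # runs after this task, with its result

def run(task, value):
    result = task.fn(value)
    if isinstance(result, Task):
        # The returned subtask inherits the current task's callback
        # and executes next, as if it had been chained in.
        result.callback = task.callback
        return run(result, value)
    if task.callback is not None:
        return run(task.callback, result)
    return result

add_one = Task(lambda v: v + 1)                       # original callback
dyn = Task(lambda v: Task(lambda _: v * 2), add_one)  # returns a subtask

run(dyn, 5)  # the subtask yields 10, then add_one runs -> 11
```

The point of the rewiring is that the original callback still fires last, so callers upstream are unaffected by the dynamic expansion.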
For instance, here is one of our uses:
And all of this executes as part of a big pipeline. |
Also, by the way, regular tasks could be extended to support this, but we might not want to change their behaviour. In any case, yes, it's possible to change the task decorator to support this. |
But I'd rather have a |
Also, this enables stuff like: Even if And chords inside chords with chains etc... |
Hmm, but dynamic task is maybe not descriptive enough, at least I don't get what it does by the name, regular tasks are "dynamic" after all. It could be something like |
Dynamic tasks made me think of dynamically defining tasks during execution. Maybe a more suitable name would be 'Support for dynamic chaining' instead of 'Support for dynamic tasks'? |
I've been using dynamic tasks for a while now and it works fine so far. However, I'm wondering if it could be the cause of an issue I have. |
Thanks for the feedback! In regards to your queue leaking issue, I bet you are using the AMQP result backend and do not consume the results? This is not related to dynamic tasks. |
We are also using them in production @veezio for quite some time, works fine. By the way if anyone has a better name than dynamic, I'm all for it :) |
Indeed Steeve, I do use the AMQP result backend. |
Indeed yes, but I think you can set CELERY_IGNORE_RESULT so that it won't store them. |
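For reference, a minimal configuration sketch of that suggestion, assuming Celery 3.x setting names and that the leaked queues come from stored-but-unconsumed AMQP results:

```python
# celeryconfig.py (sketch): don't store task results at all, so the
# AMQP result backend creates no per-result queues that could leak
CELERY_IGNORE_RESULT = True
```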
First, I'm new to Celery and evaluating it for inclusion in our project, and I'm not sure I fully understand this pull request. Let's say I have a queue of three tasks: But sometimes I would have: |
dynamic tasks would allow you to do this:
Then, in the end, the pipe would be as if: |
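The inline examples above were lost in extraction. As a broker-free sketch of the idea (the decorator below is a toy stand-in for the proposed dynamic task, and all task names are invented), a task can decide at runtime whether the pipe continues:

```python
# Toy stand-in for the proposed decorator: if a "task" returns another
# wrapped task call, it runs next, as if appended to the chain.
def dynamic(fn):
    def call(*args):
        result = fn(*args)
        return result() if callable(result) else result
    call.si = lambda *a: (lambda: call(*a))  # immutable-signature stand-in
    return call

@dynamic
def b(data):
    # sometimes the queue behaves as a | b | c, sometimes just a | b
    if data.endswith('!'):
        return c.si(data)
    return data

@dynamic
def c(data):
    return data.upper()

b('hello!')  # pipeline behaved as b | c -> 'HELLO!'
b('hello')   # pipeline stayed as just b -> 'hello'
```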
Steeve, I've tried setting |
I've also thought of using another result backend, like Memcache.
|
I just wanted to throw in and say this sounds like an excellent idea. I'm currently building a processing framework and right now we're pre-defining everything up front. This capability would give us much greater flexibility. |
Perhaps we could provide this feature as a recipe? But I agree with @ask that it maybe needs a better name |
I don't doubt that this is useful, I just don't like introducing different @task decorators as I think that will be confusing for the user. I also don't want to add major features to |
And yeah, a recipe is also possible. But then the recipe should be simple enough that the user doesn't have to copy and paste a lot of code. If there's too much code, parts of it can be in core (but integrated rather than hidden in contrib), or there can be a separate library. |
Any pointers on how one would go about inspecting dynamically created tasks? I've just been toying with this addition to try to break down one large, beastly and very specific task into lots of different generic subtasks. That much works really (!) well, but I feel like I've lost touch with that top-level task and can't easily determine when it's 'done'. |
After a lot of playing around with these new tasks, I think I mostly understood them. One thing left open for me: I want to pass on data up through multiple levels of dynamic tasks. Currently I'm doing this:

testtasks.py:

```python
import os
import time
import random

from celery import Celery, group
from celery.utils.log import get_task_logger

import worker.dynamic_task
from worker.dynamic_task import dynamic_chord

celery = Celery('hello',
                backend=os.environ['BROKER_URL'],
                broker=os.environ['BROKER_URL'])
logger = get_task_logger(__name__)


@celery.dynamic_task
def pass_on(*args):
    return args


@celery.dynamic_task
def one(name):
    logger.warning('[1] ' + name)
    time.sleep(1)
    amount = random.randint(2, 3)
    subtasks = [two.si(name, '{} von {}'.format(i + 1, amount))
                for i in range(amount)]
    chord = dynamic_chord(subtasks, pass_on.s())
    return chord


@celery.dynamic_task
def two(name, label):
    logger.warning(' [2] {} | {}'.format(name, label))
    time.sleep(1)
    amount = random.randint(2, 3)
    subtasks = group([thr.si(name, '{} von {}'.format(i + 1, amount))
                      for i in range(amount)])
    return subtasks


@celery.dynamic_task
def thr(name, label):
    logger.warning(' [3] {} | {}'.format(name, label))
    time.sleep(1)
    return 'URL'


@celery.dynamic_task
def notify(arg):
    logger.warning('[NOTIFY] done!')
    logger.warning('Arg: {}'.format(arg))
```

test.py:

```python
from worker.dynamic_task import dynamic_chord
from testtasks import *

maintask = dynamic_chord(map(one.si, ['A', 'B', 'C']), notify.s())
maintask.apply_async()
```

This prints the following:
Which is good, because I'm passing up data from the lowest level to the top level, without having a blocking task that waits for all subtasks to finish. The point is that I need a "dummy callback" that passes on all data from one level to the next. Wouldn't this snippet also make sense in

```python
def dynamic_group(tasks):
    return group([(task | subtask("celery.checkpoint")) for task in tasks])
```

This way I don't need the dummy callback and everything still works. (But maybe I've overlooked some bad implications or an easier alternative.) As a side note, I can't get this to work together with Django / djcelery. This traceback shows up when using |
Any news on design decisions about this feature? In the past weeks I've seen a lot of use cases where this feature would be perfectly suited and much needed. |
It would come in handy for us too. Especially the dynamic chord. Currently we need to have one of the tasks holding up the process to keep checking for status. See my post in celery-users. I am hoping this is the solution for it: https://groups.google.com/forum/#!topic/celery-users/eNGbLlAwhi0 |
Is this being considered for a Celery release? This is very useful. |
As I wrote earlier in the thread, if this is useful then it should be possible to implement this in the existing 'task' decorator. Adding things to |
If merging this as-is I imagine having to ask users 'do you use tasks or dynamic tasks?' when responding to bug reports and that is something I want to avoid :) The dynamic task basically links the return value to the currently executing task so this should be possible to do with existing tasks, but more explicit e.g.::
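The inline example after "e.g.::" did not survive extraction; presumably it raised something like `self.replace(new_sig)`. As a broker-free model of those semantics (every name below is an illustrative stand-in, not Celery API), the explicit replacement can be expressed as an exception carrying the substitute:

```python
class Replace(Exception):
    """Signals that the current task should be swapped for another."""
    def __init__(self, replacement):
        self.replacement = replacement

def execute(fn, *args):
    # Run a "task"; if it asks to be replaced, run the substitute
    # in its place, so callers only ever see the final result.
    try:
        return fn(*args)
    except Replace as r:
        return execute(r.replacement)

def resolve(n):
    if n > 3:
        # explicit replacement instead of magically chaining on return
        raise Replace(lambda: n * 2)
    return n + 1

execute(resolve, 5)  # replaced: 10
execute(resolve, 1)  # ran normally: 2
```

Raising makes the replacement explicit at the call site, which matches the stated goal of avoiding a second task decorator with silently different return semantics.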
Not sure I like the name |
self.replace or self.subst? self.transmogrify? :) On Wednesday, August 28, 2013, Ask Solem Hoel wrote:
|
I think both |
A new pledge is available on this issue: https://www.catincan.com/bounty/https-github-com-celery-celery-pull-817 . |
What's wrong with just auto-detecting if a signature is returned?

```python
@app.task
def one(i):
    if i > 3:
        return two.s(i)
    return i + 1
```

Also, something like this should just work (chains in chords in chords in chains, etc.):

```python
chord(map(...), chain(chord(...), chord(...., chain(...))))
```

It doesn't at the moment with this patch. On another note, when I replayed this PR on top of 3.0 I had to change the following:

```python
retval |= current_task.request.callbacks[0]
```

to:

```python
retval |= Signature(current_task.request.callbacks[0])
```
|
@mehcode: We cannot convert any sig in a return value as that would not be backwards compatible. Regarding the Signature change, you are maybe using the json serializer?
|
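The `Signature(...)` wrapper matters because, with the json serializer, a signature comes back from the message round-trip as a plain dict and has to be re-promoted. A stdlib-only illustration (the dict shape below is a simplified sketch of a serialized signature, not the exact wire format):

```python
import json

# simplified sketch of a serialized signature
sig = {"task": "proj.two", "args": [4], "kwargs": {}, "options": {}}

# pickle would hand the object back as-is; json gives a plain dict,
# so code like `retval |= callbacks[0]` breaks until it is re-wrapped
decoded = json.loads(json.dumps(sig))
type(decoded)  # dict, not a Signature
```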
The way it replaces callbacks is no good, as I imagine it's only supposed to replace the chain node, not all of the callbacks, but this may be improved when the new 'chain' message field is added (also for 3.2). Dynamic chord I'm not sure about: what does it solve? Btw, that catincan site seems pretty useless, as it's more work to fill in the forms than it is to simply implement a feature :) |
If a task can be replaced by a chord using the
Example: I have a task that parses URLs of different types. For each URL, according to its type, a series of tasks should be executed, but each URL can be processed in parallel. So the main task should return a group of chords. For example:
This can be implemented if the |
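A plain-Python model of the shape being described (no Celery here; `chord` and `group` below are toy reductions and the step names are invented): each URL fans out into the steps for its type (the chord header), those results feed a per-URL callback (the chord body), and the URLs themselves run as a group:

```python
# Toy reductions of the canvas primitives, just to show the data flow.
def chord(header, body):
    return body([step() for step in header])

def group(items):
    return list(items)

STEPS_BY_TYPE = {
    'image': [lambda: 'thumbnail', lambda: 'exif'],
    'page':  [lambda: 'text', lambda: 'links'],
}

def process(url, kind):
    # one chord per URL: run all steps for its type, then combine
    return chord(STEPS_BY_TYPE[kind], lambda results: (url, results))

# the "main task" returns a group of chords, one per URL
results = group(process(u, k) for u, k in
                [('a.jpg', 'image'), ('b.html', 'page')])
```

With real Celery primitives the group would execute the per-URL chords in parallel; the toy version only preserves the nesting and data flow.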
@ask This And yes, I was using the JSON serializer. |
The .replace feature by @ask looks great for my use-case of dynamic tasks. What can I do to help it get released? Does it need some testing? |
+1 what can be done to help get the .replace feature released? |
Documentation and tests would help, as well as further defining the semantics of this feature and how it works with the canvas (e.g. chord) |
I just tested the new feature. This is the base script I used:

```python
# test.py
from celery import Celery, chord
from celery.utils.log import get_task_logger

app = Celery('test', backend='redis://localhost:6379/10',
             broker='redis://localhost:6379/11')
app.conf.CELERY_ALWAYS_EAGER = False
logger = get_task_logger(__name__)


@app.task(bind=True)
def get_one(self):
    return 1


@app.task
def get_two():
    return 2


@app.task
def sum_all(data):
    logger.error(data)
    return sum(data)


if __name__ == '__main__':
    x = chord(get_one.s() for i in range(3))
    body = sum_all.s()
    result = x(body)
    print(result.get())
```

Standard output is 3 (the sum of the three chord results).

Replacement works great:

```python
@app.task(bind=True)
def get_one(self):
    self.replace_in_chord(get_two.s())
    return 1
```

Output:

The result is 6, which is correct (2+2+2). When I use `add_to_chord`:

```python
@app.task(bind=True)
def get_one(self):
    self.add_to_chord(get_two.s())
    return 1
```

Output:

The result is 3, not 9 (1+2+1+2+1+2). Also, another time I started the same script, the following exceptions occurred in the celery worker log:

And the client shows this traceback:

```
Traceback (most recent call last):
  File "test.py", line 29, in <module>
    print(result.get())
  File "/home/danilo/Projects/celery/celery/result.py", line 177, in get
    raise self.backend.exception_to_python(meta['result'])
celery.backends.base.ChordError: GroupResult bfaf94da-23b3-4f2f-b163-016cbac50f43 no longer exists
```
|
Also, does this only support "simple" task signatures? Replacing a task with a chord doesn't currently seem to work:

```python
@app.task(bind=True)
def get_one(self):
    x = chord(get_two.s() for i in range(3))
    body = sum_all.s()
    sig = x(body)
    self.replace_in_chord(sig)
```
|
On Oct 15, 2014, at 10:21 AM, Danilo Bargen notifications@github.com wrote:
You’re not using redis:///10?new_join=1 for the result backend here, Ask Solem |
Aah, sorry for forgetting about that! I'll test it again tomorrow. |
Great, seems to work! ✨ |
Would you have working examples of this type of behavior? I cannot reproduce it... This is my current code:
but I get the error `AttributeError: 'hello' object has no attribute 'replace'`. I tried both version 3.1.17 and today's master (for which I had to clone and install your amqp and kombu projects). |
@traxair as discussed above, did you enable the Redis result backend with |
So I submit to you dynamic tasks, and dynamic chords. This is a really powerful concept, and I think it changes the game.
Subtasks returned by dynamic tasks are executed right after the first task executes. As if they were in a chain.
You can also return chains and chords, and they will be properly inserted.
This allows you to design a pipeline that can be completely dynamic, while benefiting from Celery's powerful idioms (subtasks, chains, chords...).
Our whole backend at @veezio is powered by these. They allow us to have extensively dynamic pipelines.
We have pipes inside chords, chords inside chords, tasks put before pipes, etc.
To be honest, I can't think of something you can't do with this.
How to use:
But you can do cool shit too!
You can also use them in chords! And have chords in chords!:
In that case url_resolved will be called with the results from on_finished(), which is:
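The closing examples of the original description did not survive extraction. As a hedged guess at the shape of a dynamic chord, here is a broker-free toy model (every name is a stand-in): the header tasks run first, any of them may dynamically expand into further steps, and the collected results are then handed to the body:

```python
# Toy model: a "task" may return a list of further callables, which are
# executed in its place (one level of dynamic expansion).
def run(task):
    result = task()
    if isinstance(result, list):
        return [run(sub) for sub in result]
    return result

def dynamic_chord(header, body):
    # run every header task (each may expand dynamically),
    # then call the body with the collected results
    return body([run(t) for t in header])

def resolve(url):
    # dynamically fan out into one sub-step per path segment
    return [lambda seg=seg: seg.upper() for seg in url.split('/')]

dynamic_chord([lambda: resolve('a/b')], lambda results: results)
# -> [['A', 'B']]
```

In the real implementation the body would only fire once all dynamically spawned subtasks have finished, which is exactly what makes the chord "dynamic".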