[DaskExecutor] Deterministic task key #5844
Replies: 1 comment
-
Hey! We're hard at work getting Orion ready, so it's tough to get to these. We submit our orchestration of the task to Dask (in both v1 and v2), so if your task is retried, the retries occur within the same Dask task. Once our orchestration finishes, the task has always reached a final state. If the task were submitted again, our API would just return the final state and our orchestration engine would exit without running the task. Since this interaction is pure for any given task run, we can use a pure key for Dask. We actually had to adjust this once we introduced flow-run-level retries, since the task then genuinely needs to be orchestrated again. You'll see we now pass a key that includes the flow run retry count to ensure the task can be orchestrated a second time. I'm not sure about the implications for v1; the API is far less strict about state management in v1.
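To make the idea concrete, here is a minimal sketch of the two keying strategies being discussed. The function and parameter names are hypothetical, not Prefect's actual implementation: a deterministic key (optionally including a flow run retry count, as described above) lets Dask recognize a resubmission as the same work, while a random-UUID key makes every submission look new.

```python
from uuid import uuid4


def deterministic_key(task_run_id: str, flow_run_retry_count: int = 0) -> str:
    # Same task run + same retry count -> same key, so Dask can
    # deduplicate resubmissions. Bumping the flow run retry count
    # yields a fresh key, allowing the task to be orchestrated again.
    return f"orchestrate-{task_run_id}-{flow_run_retry_count}"


def random_key(task_run_id: str) -> str:
    # A new UUID every call -> Dask treats each submission as new
    # work, even if the same result was already computed.
    return f"orchestrate-{task_run_id}-{uuid4()}"
```

With the deterministic scheme, resubmitting the same task run is a no-op from the scheduler's point of view until the retry count changes.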
-
I was wandering around the Orion code base and found this line: orion@task_runners.py#L378. I know this may cause problems with randomness, since it basically forces Dask to save the first output, but because the task key is deterministic, no two tasks will have the same key. On the other hand, v1 uses an impure version (master@dask.py#L371) and generates the task key from a random UUID value, which in some cases causes task recomputation even though a worker has already computed the result. I was wondering what the design choices behind this were, and whether it is possible to implement a pure version for v1?
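For readers unfamiliar with what "pure" means here: a pure key is derived solely from the call itself, so identical calls hash to identical keys and the scheduler computes the result once. A rough sketch, assuming the inputs are picklable (this is an illustration, not Dask's actual tokenization):

```python
import hashlib
import pickle


def pure_key(fn_name: str, args: tuple) -> str:
    # Hash the call signature so identical submissions map to the
    # same key. A scheduler keyed this way can reuse a previously
    # computed result instead of recomputing it.
    digest = hashlib.sha256(pickle.dumps((fn_name, args))).hexdigest()[:16]
    return f"{fn_name}-{digest}"
```

An impure key, by contrast, mixes in a random value (as the v1 code linked above does with a UUID), so the scheduler can never match a resubmission to prior work.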