The method Covalent currently uses to retrieve the stdout and stderr logs of each task is unreliable. When multiple tasks are run concurrently, the messages output by one task may be inadvertently attributed to another task and stored in the wrong graph node.
The following workflow reproduces the problem on my machine.
    import covalent as ct
    from covalent.executor import DaskExecutor
    from dask.distributed import LocalCluster

    # Local Dask cluster that backs the executor used by both electrons
    lc = LocalCluster()
    dask_exec = DaskExecutor(lc.scheduler_address)


    @ct.electron(executor=dask_exec)
    def task_0():
        print("Hello from Task 0")
        return 0


    @ct.electron(executor=dask_exec)
    def task_1():
        import time

        # Delay so that task_1 prints after task_0
        time.sleep(2)
        print("Hello from Task 1")
        return 1


    @ct.lattice
    def workflow():
        task_0()
        task_1()
After dispatching this workflow, I retrieved the result object and inspected the stdout property of each graph node. Here is what I get:
Since all executor instances monitor the same sys.stdout file descriptor, capturing output by redirecting sys.stdout breaks down when multiple tasks are writing to that file descriptor. The context manager for one task could inadvertently capture the output printed by the run() method for another task.
It would seem better for each task to print to its own "stdout" and "stderr" file descriptors which are not shared with any other task.
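The failure mode can be reproduced without Covalent or Dask. The sketch below is only an illustration, not Covalent's code: it assumes the redirection behaves like the standard library's contextlib.redirect_stdout, and the function run_demo, the events, and the two thread bodies are hypothetical names. The events force one particular interleaving so that the result is deterministic.

```python
import contextlib
import io
import threading


def run_demo():
    """Run two 'tasks' that each redirect sys.stdout; return what each captured."""
    captured = {"task_0": io.StringIO(), "task_1": io.StringIO()}
    t0_entered = threading.Event()
    t1_entered = threading.Event()
    t0_printed = threading.Event()
    t1_exited = threading.Event()

    def task_0():
        with contextlib.redirect_stdout(captured["task_0"]):
            t0_entered.set()
            t1_entered.wait()  # by now task_1 has replaced sys.stdout
            print("Hello from Task 0")  # written to task_1's buffer
            t0_printed.set()
            t1_exited.wait()  # unwind in LIFO order so sys.stdout is restored

    def task_1():
        t0_entered.wait()
        with contextlib.redirect_stdout(captured["task_1"]):
            t1_entered.set()
            t0_printed.wait()
            print("Hello from Task 1")
        t1_exited.set()

    threads = [threading.Thread(target=task_0), threading.Thread(target=task_1)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return {name: buf.getvalue() for name, buf in captured.items()}
```

With this interleaving, run_demo() returns an empty string for task_0 and both messages for task_1, because sys.stdout is a single process-wide object and the most recent redirection wins.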
Design

Acceptance Criteria

For BaseExecutor and AsyncBaseExecutor:
- The redirection logic for sys.stdout and sys.stderr is removed.
- Task-specific streams self._task_stdout and self._task_stderr are instantiated at the beginning of execute(). These attributes should be defined only at the beginning of execute() and not in the executor constructor so that they don't appear during workflow construction. They should be made accessible to run() as the properties self.task_stdout and self.task_stderr, respectively.

Local and Dask executors should be adjusted:
- The stdout and stderr generated by the task are printed to self.task_stdout and self.task_stderr, respectively.
- When there are concurrent tasks, the stdout and stderr for each task are stored in the correct transport graph node; the test cases in #1380 should pass.

Add a functional test involving a workflow with multiple electrons concurrently printing to stdout and stderr. Verify that the messages generated by each task are stored in the correct transport graph node.
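The criteria above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Covalent's implementation: SketchExecutor, dispatch_concurrently, and the two task functions are hypothetical, each task gets its own executor instance, and the "backend" is simulated by tasks that return their result together with the text they would have printed.

```python
import io
import threading
import time


class SketchExecutor:
    """Hypothetical stand-in for an executor; not Covalent's BaseExecutor."""

    @property
    def task_stdout(self):
        return self._task_stdout

    @property
    def task_stderr(self):
        return self._task_stderr

    def run(self, function, args, kwargs):
        # Stand-in for a backend call that returns the result plus captured text.
        result, out, err = function(*args, **kwargs)
        # Print to the per-task streams instead of sys.stdout / sys.stderr.
        print(out, end="", file=self.task_stdout)
        print(err, end="", file=self.task_stderr)
        return result

    def execute(self, function, args=(), kwargs=None):
        # Streams are created here, not in __init__, so they do not exist
        # on the executor while the workflow is being constructed.
        self._task_stdout = io.StringIO()
        self._task_stderr = io.StringIO()
        output = self.run(function, args, kwargs or {})
        return output, self._task_stdout.getvalue(), self._task_stderr.getvalue()


def task_0():
    return 0, "Hello from Task 0\n", ""


def task_1():
    time.sleep(0.1)
    return 1, "Hello from Task 1\n", "warning from Task 1\n"


def dispatch_concurrently(tasks):
    """Run each task in its own thread with its own executor instance."""
    results = {}

    def worker(node_id, function):
        results[node_id] = SketchExecutor().execute(function)

    threads = [
        threading.Thread(target=worker, args=(node_id, function))
        for node_id, function in enumerate(tasks)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Calling dispatch_concurrently([task_0, task_1]) gives each node id its own (output, stdout, stderr) triple. Because no stream is shared between instances, the order in which the tasks finish does not affect which node receives which message. A functional test could assert exactly that.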
If I interchange the sleep statements, the output of each task is correctly retrieved and persisted:

Why this happens

Each executor's implementation of run() is currently expected to retrieve the stdout and stderr from the executor backend after a task completes and print those strings to sys.stdout and sys.stderr, respectively. These streams are redirected by a context manager in the base executor's implementation of execute() and returned to the dispatcher. For example, here is how the Dask executor captures the stdout for a task.