-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Sending an 'IN_PROGRESS' update for task resurfaces it #1378
Comments
@kishorebanala @apanicker-nflx Could you please clarify this? |
@Shole4ever I'm not sure if I understood what you meant by |
@kishorebanala From an experimentation point of view, we see that the Is there a notion of a heartbeat in Conductor? A way for a worker to update Conductor periodically while processing a task without specifying |
@kishorebanala essentially, we want what is mentioned here: https://netflix.github.io/conductor/tasklifecycle/#timeout-seconds But, sending an update for a task with IN_PROGRESS does not work as mentioned in the above-mentioned link. On the contrary, sending an IN_PROGRESS update makes the task available for other workers to pick, which completely defeats the purpose of sending this status update from the worker. |
@bharadwajrembar @Shole4ever A heartbeat without Also, the Task's retry policy can be set to |
@kishorebanala The task is resurfaced after My scenario is: I have a task that takes variable time, say 5 minutes on average, but can go upto 1 hour. Now, i wouldn't to wait 1 hour to retry for every failed task. Hence the use of heartbeat, with the assurance that if there are any retries, they are traceable [which they aren't in case of |
FWIW, it never occurred to me to try and use the task response in that way. For long running jobs we are returning IN_PROGRESS in the task result along with "callbackAfter" time set and a "job ID" placed in the task output. Then when the task is available to be polled again the worker that receives it finds the job ID and calls the service that is doing the processing to check on the status. |
I'm doing the following, but as soon as I sent this response, another worker picks up that same tasks, ie it isn't hidden
the task has a timeout set to 60 seconds, responseTimeoutSeconds: 10. Per the original poster, we have the same scenerio. A worker grabs a task. It may take a minute or hours. We can't set an accurate timeout on the initial task, so we'd like to extend a heartbeat on the task while we are processing and IF that heartbeat stops, the task is requeued as timeout_out |
Correct. In this case, the task lifecycle is not completed yet, for it to be rescheduled with a new task id. The worker working on this task returns the task to Conductor saying it couldn't complete the task. For eg., the worker might be waiting on some external dependency (say a file to be written to S3). But, the 2 workers won't work on the task at the same time. The second worker can poll for this task only after first worker returns it to Conductor. Conductor expects the tasks to be idempotent, and promises at least once task scheduling guarantee. Exactly once will have to handled in the worker logic.
Following up from above, the task lifecycle determines the retry count. The number of times a task is picked by a worker can by found from
Yes, |
@funston LMK if my above comment doesn't answer your question. |
I don’t see this behavior or not understanding it. If one worker takes task 1 and returns an IN_PROGRESS with a callback time, a second worker does see the task and is able to pull it, work on it and respond to complete it (I’ve verified with some simple code) I guess I’m confused on what an IN PROGRESS is for? Looking at the code it seems to just follow the same logic that rescheduled the task The proposed solution to set a super long timeout won’t work for our needs. We need a heartbeat mechanism |
When worker_1 updates the task with IN_PROGRESS state, it is giving up the claim on that task. The task will be put back to task queue, and the workers can poll for it after callback time. This is helpful in cases like above eg. of waiting for some external dependency to be available. But, if the worker is expected to take an hour, and if the responseTimeout is configured to 5min, the task would always be timed out. Conductor doesn't have a worker lease extension mechanism today. We shall add it to roadmap, but if anyone is waiting for it, please feel free to start a conversation leading to a PR. |
@kishorebanala Why should sending the 'IN_PROGRESS' update be treated as 'giving up the claim on that task'. Rather it should be opposite, that the worker is claiming this task saying it is processing it. |
@Shole4ever This comes back to Task lease extension mechanism, which Conductor doesn't support today. We're considering implementing this in next major release, but if you'd like this sooner, we'd very much welcome contributions from the community. |
Hi,
I need to use the 'responseTimeoutSeconds' as heartbeat for the consumer, and 'timeoutSeconds' as the terminal timeout. The use case being that the consumer of the task need to notify conductor server within every responseTimeoutSeconds time that it is still consuming the task.
For sending the heartbeat, we are sending a taskUpdate via the POST /tasks API refer: [https://netflix.github.io/conductor/apispec/#polling-ack-and-update-task], with 'status' as 'IN_PROGRESS'.
However, this results in server resurfacing the task. [refer: WorkflowExecutor.java:834]
Can you suggest if I am doing something wrong here, or is this a bug altogether.
Thanks.
The text was updated successfully, but these errors were encountered: