Add a `pendingTimeout` parameter #10341

hadim · 2023-01-11T00:52:09Z

Summary

See #3572 for context.

Some of our workflows fail to get scheduled on a k8s node, sometimes because of errors in the configuration that is responsible for executing a workflow.

The currently available option `activeDeadlineSeconds` covers both the pending phase and the execution phase. We need an option that covers only the pending phase, so that a workflow stuck in Pending is marked as failed after xxx seconds.

This new option could be `pendingTimeoutSeconds` or `pendingDeadlineSeconds`.



drawlerr · 2024-01-22T17:16:44Z

I investigated this issue and, while looking into possible solutions, found one that already exists: #3686

The template `timeout` field as currently documented (https://argo-workflows.readthedocs.io/en/latest/fields/#template) sounds like a duplicate of the `activeDeadlineSeconds` field. As actually implemented, however, the node's `StartedAt` time is when the workflow node was created, and `timeout` is only considered for nodes in the `NodePending` phase, which makes `timeout` behave more like `pendingTimeout` in practice.

I have verified that specifying `timeout: 600s` in my templates does indeed prevent them from spending more than 600s in the Pending state, while still allowing them to run for as long as they need.

Perhaps some improvement to the documentation is in order?

drawlerr · 2024-02-29T15:52:52Z

Update: while the timeout parameter does seem to catch pods that have been pending too long, it has some issues and consequences:

- The template `timeout` is only evaluated "incidentally" and is not guaranteed to be evaluated close to its expiration time, so it is more of a "minimum" than a "maximum".
- The template `timeout` is transferred to `activeDeadlineSeconds` if that parameter is unset or greater than the template timeout. So `timeout` does not apply only to the pending state; it is effectively a full end-to-end timeout.
I am investigating other options for resolution.

hadim added the type/feature Feature request label Jan 11, 2023

hadim mentioned this issue Jan 11, 2023

If step is in pending state, step timeout(active deadline seconds) is not working #3572

Closed

agilgur5 added the area/spec Changes to the workflow specification. label Jan 20, 2024

drawlerr linked a pull request Mar 7, 2024 that will close this issue

feat: add pendingTimeout for non-deadline timeout Fixes #10341 #12762

Open

agilgur5 mentioned this issue May 27, 2024

Retrying workflows have short deadline ignoring activeDeadlineSeconds #13044

Closed

