Add a pendingTimeout parameter #10341

Open
hadim opened this issue Jan 11, 2023 · 2 comments · May be fixed by #12762
Labels
area/spec Changes to the workflow specification. type/feature Feature request

Comments

@hadim

hadim commented Jan 11, 2023

Summary

See #3572 for context.

Some of our workflows fail to get scheduled on a k8s node because there are sometimes errors in the configuration responsible for executing the workflow.

The currently available option activeDeadlineSeconds covers both the pending phase and the execution phase. We need an option that only considers the pending phase, so that a workflow stuck in pending is marked as failed after a configurable number of seconds.

This new option could be called pendingTimeoutSeconds or pendingDeadlineSeconds.
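
To illustrate the request, here is a rough sketch of where such an option might sit next to the existing one. The pendingTimeoutSeconds field is purely hypothetical (it is only the name suggested above and does not exist in Argo Workflows), so it is left commented out. The workflow name, image, command, and the numeric values are placeholders.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: pending-timeout-example-   # placeholder name
spec:
  entrypoint: main
  # Existing field: as described above, this deadline covers the pending
  # phase and the execution phase together.
  activeDeadlineSeconds: 3600
  # HYPOTHETICAL field proposed in this issue; not part of the current spec.
  # It would fail the workflow if it stays pending for longer than this.
  # pendingTimeoutSeconds: 300
  templates:
    - name: main
      container:
        image: busybox                     # placeholder image
        command: [sh, -c]
        args: ["echo hello"]
```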


Message from the maintainers:

Love this enhancement proposal? Give it a 👍. We prioritise the proposals with the most 👍.

@hadim hadim added the type/feature Feature request label Jan 11, 2023
@agilgur5 agilgur5 added the area/spec Changes to the workflow specification. label Jan 20, 2024
@drawlerr
Contributor

I investigated this issue and when I looked into a possible solution, I ran into one that actually already exists: #3686

The template timeout field as currently documented (https://argo-workflows.readthedocs.io/en/latest/fields/#template) sounds like a duplicate of the activeDeadlineSeconds field. As actually implemented, however, the node StartedAt time is the time the workflow node was created, and timeout is only considered for nodes in the NodePending phase. In practice this makes timeout behave more like a pendingTimeout.

I have verified that specifying timeout: 600s in my templates does indeed prevent them from spending more than 600s in Pending state, while allowing them to run for however long they need to.
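
For reference, a minimal sketch of the kind of template I mean, assuming a single container template. Only the timeout: 600s line reflects what I tested; the workflow name, template name, image, and command are placeholders.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: template-timeout-example-  # placeholder name
spec:
  entrypoint: main
  templates:
    - name: main
      # Template-level timeout, given as a duration string.
      # Observed behaviour: the node fails if it is still Pending after 600s.
      timeout: 600s
      container:
        image: busybox                      # placeholder image
        command: [sh, -c]
        args: ["sleep 30"]
```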

Perhaps some improvement to the documentation is in order?

@drawlerr
Contributor

Update: while the timeout parameter does seem to catch pods that have been pending too long, it has some issues and consequences:

  • The template timeout is only evaluated "incidentally" and is not guaranteed to be evaluated close to its expiration time, so it is more of a "minimum" than a "maximum" parameter.
  • The template timeout is transferred to activeDeadlineSeconds if that parameter is unset or greater than the template timeout. So timeout does not apply only to the pending state; it is a full end-to-end timeout.

I am investigating other options for resolution.
