
Add previous resource request to lastRetry #12849

Open
EladProject opened this issue Mar 27, 2024 · 4 comments
Labels
area/retryStrategy Template-level retryStrategy area/templating Templating with `{{...}}` type/feature Feature request

Comments

@EladProject

Summary

Add the previous run's memory request to `lastRetry`.

Use Cases

I'd like better control over the memory request on retries. Sometimes I don't need to increase it (for example, when the pod is evicted from a spot instance). Other times (OOMKilled) I'd like to increase it exponentially. Knowing the previous resource request, together with #12722, should make this possible.
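A minimal Python sketch of the retry policy this use case describes. It assumes both the previous memory request (in MiB) and the last exit code are available, which is what this issue asks for. The function name is hypothetical (not an Argo Workflows API), and treating exit code 137 (128 + SIGKILL) as an OOM kill is an assumption, since any SIGKILL produces the same code.

```python
# Hypothetical policy sketch, not an Argo Workflows API.
OOM_EXIT_CODE = 137  # 128 + SIGKILL(9); assumed here to mean OOMKilled


def next_memory_request(previous_mib: int, last_exit_code: int, factor: int = 2) -> int:
    """Return the memory request (MiB) for the next retry.

    Grow the request geometrically only after an (assumed) OOM kill;
    for any other exit code (e.g. a spot-instance eviction) reuse the
    previous request unchanged.
    """
    if last_exit_code == OOM_EXIT_CODE:
        return previous_mib * factor
    return previous_mib


if __name__ == "__main__":
    print(next_memory_request(512, 137))   # OOM: doubled to 1024
    print(next_memory_request(1024, 143))  # SIGTERM, e.g. eviction: stays 1024
```

The key input is `previous_mib`: without it, the policy cannot "keep" an already-increased value after a non-OOM failure.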


Message from the maintainers:

Love this feature request? Give it a 👍. We prioritise the proposals with the most 👍.

@EladProject EladProject added the type/feature Feature request label Mar 27, 2024
@tczhao tczhao added area/templating Templating with `{{...}}` area/retry-manual Manual workflow "Retry" Action (API/CLI/UI). See retryStrategy for template-level retries labels Mar 27, 2024
@agilgur5 agilgur5 changed the title Add memreqnum and memrequnit to lastRetry Add previous resource request to lastRetry Mar 28, 2024
@eduardodbr
Member

Do you really need the previous resource request? Can't you just use the exit code to know if the last node failed with OOM and only increase the memory in those cases?

@EladProject
Author

Are you talking about #12722?

It's still open, no?

Anyway, there are situations where this won't be enough.
Say a node is preempted after the task has already been retried for OOM. In that case I'd like the third attempt to use the last memory request, which is no longer the original request because it was increased after the OOM.

I think that adding the last retry's memory request, together with #12722, would do the trick.

@Joibel
Member

Joibel commented Mar 31, 2024

You can calculate it based on the retry number. There is no need to read the previous value to compute any mathematical sequence, whether linear or exponential.
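Argo Workflows exposes the current retry number as the `{{retries}}` template variable. Here is a Python sketch of the closed-form arithmetic this comment describes, where memory depends only on that number. The function names, the 512 MiB base, and the 256 MiB step are illustrative assumptions.

```python
def memory_for_retry_exponential(base_mib: int, retries: int, factor: int = 2) -> int:
    """Memory (MiB) for retry number `retries` (0 = first run): base * factor**retries."""
    return base_mib * factor ** retries


def memory_for_retry_linear(base_mib: int, retries: int, step_mib: int = 256) -> int:
    """Memory (MiB) growing by a fixed step per retry: base + step * retries."""
    return base_mib + step_mib * retries


if __name__ == "__main__":
    # Neither formula needs the previous request, only the retry number.
    for n in range(4):
        print(n, memory_for_retry_exponential(512, n), memory_for_retry_linear(512, n))
```

Both sequences are pure functions of the retry number, which is why no previous value is needed as long as every retry should grow the request.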

@EladProject
Author

The retry number increases regardless of whether the retry is caused by an OOM or something else. So there is no way to calculate the memory of the previous run with certainty (unless there is some way to access the exit codes of all previous retries).
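The objection can be checked with a small Python sketch. It replays a hypothetical list of past attempt outcomes under the "grow only on OOM" policy. The `"oom"`/`"preempted"` labels and the function are illustrative, not Argo fields.

```python
from typing import Sequence

OOM = "oom"              # hypothetical label: attempt was OOMKilled
PREEMPTED = "preempted"  # hypothetical label: node/spot instance was reclaimed


def memory_after_history(base_mib: int, outcomes: Sequence[str], factor: int = 2) -> int:
    """Replay past attempts: multiply by `factor` only after an OOM, keep the request otherwise."""
    memory = base_mib
    for outcome in outcomes:
        if outcome == OOM:
            memory *= factor
    return memory


if __name__ == "__main__":
    a = [OOM, PREEMPTED]        # 2 retries, one of them OOM
    b = [PREEMPTED, PREEMPTED]  # 2 retries, no OOM
    print(len(a) == len(b))     # True: same retry number
    print(memory_after_history(512, a), memory_after_history(512, b))  # 1024 512
```

Both histories have the same retry number but need different memory, so the retry number alone cannot recover the previous request. Either the previous request or the full exit-code history is required.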

@agilgur5 agilgur5 added area/retryStrategy Template-level retryStrategy and removed area/retry-manual Manual workflow "Retry" Action (API/CLI/UI). See retryStrategy for template-level retries labels Apr 25, 2024

5 participants