Fix PodCache handling of multi node jobs #573

Merged 2 commits on May 19, 2021

Commits on May 19, 2021

  1. Fix PodCache handling of multi node jobs

    The PodCache used the job id to identify a pod uniquely. This no longer works, because a job id is no longer unique to a single pod.
    
    As a result, cancelling a multi node job deleted one of its pods and left the others until the cache entry expired.
     - We prevent repeated deletion calls by holding an empty value for that pod in the cache
    However, this meant we waited for the PodExpiry between the deletion of each pod of a multi node job.
    
    Similarly, we have a cache of submitted pods (which may not have been reported back via the API yet). It incorrectly reported only 1 pod as submitted when many pods were submitted as part of a multi node job.
    JamesMurkin committed May 19, 2021
    ae16f42
  2. b31c22d
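
The keying problem described in the first commit message can be illustrated with a minimal Go sketch. This is not the project's actual code: `Pod`, `PodNumber`, `jobKey`, `podKey`, `markForDeletion`, the `"jobId_podNumber"` key format, and the one-minute `defaultExpiry` are all hypothetical names and values chosen for illustration. It only assumes what the message states, namely that the cache holds an entry per key until it expires and that a held entry suppresses repeated calls.

```go
package main

import (
	"fmt"
	"time"
)

// Pod is a hypothetical, simplified stand-in for a Kubernetes pod.
type Pod struct {
	JobID     string
	PodNumber int // position of the pod within a multi node job (assumed field)
}

// jobKey mirrors the behaviour described as broken: all pods of a job share one key.
func jobKey(p Pod) string { return p.JobID }

// podKey is one possible per-pod key; the "jobId_podNumber" format is an assumption.
func podKey(p Pod) string { return fmt.Sprintf("%s_%d", p.JobID, p.PodNumber) }

// defaultExpiry is an illustrative value, not the project's configured PodExpiry.
const defaultExpiry = time.Minute

// PodCache remembers keys until they expire.
type PodCache struct {
	keyFn   func(Pod) string
	expiry  time.Duration
	expires map[string]time.Time
}

func NewPodCache(keyFn func(Pod) string, expiry time.Duration) *PodCache {
	return &PodCache{keyFn: keyFn, expiry: expiry, expires: map[string]time.Time{}}
}

// AddIfNotExists records the pod and reports true when no live entry held its key.
func (c *PodCache) AddIfNotExists(p Pod) bool {
	key := c.keyFn(p)
	if until, ok := c.expires[key]; ok && time.Now().Before(until) {
		return false
	}
	c.expires[key] = time.Now().Add(c.expiry)
	return true
}

// Len counts the entries that have not yet expired.
func (c *PodCache) Len() int {
	n := 0
	now := time.Now()
	for _, until := range c.expires {
		if now.Before(until) {
			n++
		}
	}
	return n
}

// markForDeletion returns the pods a delete call would be issued for in this pass.
func markForDeletion(c *PodCache, pods []Pod) []Pod {
	var out []Pod
	for _, p := range pods {
		if c.AddIfNotExists(p) {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	pods := []Pod{{"job-1", 0}, {"job-1", 1}, {"job-1", 2}}
	byJob := NewPodCache(jobKey, defaultExpiry)
	byPod := NewPodCache(podKey, defaultExpiry)
	fmt.Println("delete calls with job id key:", len(markForDeletion(byJob, pods)))
	fmt.Println("delete calls with per-pod key:", len(markForDeletion(byPod, pods)))
	fmt.Println("cached entries:", byJob.Len(), "vs", byPod.Len())
}
```

With the job id as the key, a three-pod job produces a single delete call per pass and a single cached entry, which matches both symptoms in the message: the remaining pods wait for expiry, and the submitted-pod count is too low. With a per-pod key, all three pods are handled in one pass.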