Clarify PAPI Error Code 10 Message 14 #3855
Comments
I also saw this problem. The VM is not preemptible and I'm using Cromwell v32. When the problem happens, a lot of shards spend 10 minutes in "Waiting for quota". The instance that gives PAPI Error Code 10 was able to get a virtual machine, though. Maybe there is a timeout for "Waiting for quota" that causes all the other shards to fail with Error Code 10, even though nothing was wrong with this particular shard?
@juhawilppu We hope to have this automatically retried soon, on the order of months.
Cromwell treats Error Code 10, Message 14 as a preemption error. When a preemptible machine fails with Error Code 10, Message 14, a user doesn't usually see it because Cromwell retries the preempted job. However, we've observed that it is possible to get this error on a non-preemptible machine, in which case it isn't retried and causes the workflow to fail.
The problem is that the message does not make clear that this is a transient failure and that retrying the workflow is the best response. Adjust the error message to include more information about the nature of this error and the action items one can take to mitigate this failure mode.
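To make the request concrete, here is a minimal sketch of the kind of message handling being asked for. It is not Cromwell's code: `PapiFailure`, `is_transient`, `describe`, and the message wording are all hypothetical names and text invented for illustration, and the only facts taken from this issue are the code/message pair (10 and 14) and the retry behavior described above.

```python
from dataclasses import dataclass

# The code/message pair discussed in this issue.
TRANSIENT_ERROR_CODE = 10
TRANSIENT_MESSAGE_CODE = 14


@dataclass(frozen=True)
class PapiFailure:
    """Hypothetical stand-in for a failed PAPI job (not a Cromwell type)."""
    error_code: int
    message_code: int
    preemptible: bool


def is_transient(failure: PapiFailure) -> bool:
    """True only for the Error Code 10, Message 14 combination."""
    return (failure.error_code == TRANSIENT_ERROR_CODE
            and failure.message_code == TRANSIENT_MESSAGE_CODE)


def describe(failure: PapiFailure) -> str:
    """Build a user-facing message that states whether a retry is worthwhile."""
    base = f"PAPI Error Code {failure.error_code}, Message {failure.message_code}"
    if not is_transient(failure):
        # Other errors are out of scope for this sketch; report them unchanged.
        return base
    if failure.preemptible:
        # Preemptible machines: the failure is treated as a preemption and retried.
        return base + ": treated as a preemption; the job will be retried."
    # Non-preemptible machines: nothing retries it today, so tell the user to.
    return (base + ": this failure is usually transient; "
            "retrying the workflow is the suggested action.")
```

For example, `describe(PapiFailure(10, 14, preemptible=False))` produces a message that names the error and recommends a retry, which is the information the current message lacks.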