-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Return an error payload if run_async! fails #143
Return an error payload if run_async! fails #143
Conversation
Docker with a bad image name: Before:
After:
NOTE: the |
65a37c4
to
cf4f721
Compare
cleanup(runner_context) | ||
raise | ||
{"Error" => "States.TaskFailed", "Cause" => err.to_s} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it help with clarity to also include the error class?
{"Error" => "States.TaskFailed", "Cause" => err.to_s} | |
{"Error" => "States.TaskFailed", "Cause" => "#{err.class.name}: #{err}"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm thinking about this a little more, not sure how helpful the ruby exception class is going to be to the user since this is floe internal stuff.
What if we are more explicit about catching e.g. Kubeclient::Error
and AwesomeSpawn::CommandResultError
and just print the error string but let other exceptions raise up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great suggestion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we are more explicit about catching e.g. Kubeclient::Error and AwesomeSpawn::CommandResultError and just print the error string but let other exceptions raise up.
Not sure - the ultimate problem is that the parent wasn't handling it at the task level and it just got stuck, right? Perhaps we need changes for handling other errors on that side as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the problem that we saw on kubernetes was that the pod status was Pending so even though it "failed" we treated it as if it hadn't started yet (since that status was also Pending). The problem you were likely seeing on docker wasn't that an exception had been raised but it actually wasn't done pulling the image yet.
If an unhandled exception is raised the MiqQueue deliver method will catch it and still invoke the queue_callback
to mark the task as failed (I just tested this by throwing a raise in the workflow run_nonblock).
I'd like for user errors like podman/k8s failed to pull an image be handled as ASL errors, and anything else like NoMethodError on NilClass (straight bugs) be raised up as exceptions so we get a backtrace in the logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok yeah I agree 👍
cf4f721
to
16e6b54
Compare
16e6b54
to
aa67039
Compare
Checked commit agrare@aa67039 with ruby 2.6.10, rubocop 1.28.2, haml-lint 0.35.0, and yamllint |
After. verified it is working {
"Comment": "Invalid Image Name",
"StartAt": "UnknownImageNameRun",
"States": {
"UnknownImageNameRun": {
"Type": "Task",
"Resource": "docker://docker.io/kbrock/unknown:latest",
"Parameters": {
"ERROR": "failure message"
},
"End": true
}
}
}
|
Yea, so only comment here is the error differ based upon platform. Hope this information will help others writing workflows: # kubernetes
"Error" => "ErrorImagePull"
# docker and podman
"Error" => "States.TaskFailed" |
Yeah I went with the more general taskfailed because I couldn't get a helpful exception type from the failed AwesomeSpawn runs. I didn't think |
Currently if the command to start the container fails we simply raise the exception up to the caller rather than "handle" the error payload. This effectively aborts the workflow runtime without going through the normal failure paths that are already builtin to handle if a task fails after it is started.
This commit changes how these failures are handled in order to line up with how the rest of the Task error handling operates.
#141