New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-10166] Improve execution time errors with more concise error messages from DoFns #12170
[BEAM-10166] Improve execution time errors with more concise error messages from DoFns #12170
Conversation
R: @lostluck |
pid string | ||
} | ||
func (e *doFnError) Error() string { | ||
return fmt.Sprintf("%v caused error %v, uid is %v pid is %v", e.doFn, e.err, e.uid, e.pid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The wiki syntax in JIRA really butchered my intended suggestion of DoFn[<uid>;<pid>]<doFn> returned error:<error>
Which places the short/known length things before the potentially long and rife with newlines error. It puts the context first, and avoids it getting lost or truncated behind a very long error string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an excellent suggestion.
ff46c6a
This is a good first step! Next up is adding it for use in user error returns in exec/pardo.go and similar. If you did that work already, it's probable that the commit wasn't pushed to your branch. Thanks for your patience in my getting to this. I took a long weekend for the US (and Canadian) holiday, and avoided all pretense of work. |
In BEAM-10166, it was highlighted that when DoFns with emitters use panics to signal failures, the error message returned to the user is not helpful. In this patch, we introduce a new error type called the doFnError which can be used to generate more concise error messages for DoFns that use emitters.
As a partt of BEAM-10166, a new type of error doFnError was introduced to make errors generated from DoFns during bundle execution more digestable to the user. In this patch, we implement the catching of this error in the callNoPanic function, which wraps bundle execution. WEe explicitly check the type of the error generated to see if its of type doFnError and return that error. Where it is not the case, the default behaviour of returning the stack trace is perserved.
bf36c9e
to
ff46c6a
Compare
In the utils package, a new error type DoFnError was introduced to make the signalling of DoFN errors more concise. In this patch, we implement this error to be created when a ParDo failure hook is created.
…ent/BEAM-10166-exec-err
This looks good to me! Thank you for the contribution! Would you mind adding a test to validate this behaviour? There's no need to test everything in exec/util.go, just callNoPanic. You can test three cases:
|
thanks @lostluck , I will implement this today / tomorrow. |
Definitely no need to apologize. Life >> code. |
Your image is missing the whole error, but I suspect that the build doesn't know where that package is coming from.
An alternative that should work is if you go to the beam root (.../github.com/apache/beam/) and run the gogradle vendor command. Unfortunately I can't recommend the best solution since it's not guaranteed to work with the way the Go SDK is currently implemented, which should get everything for you in one command. There's a small bit of logistics that need to be accomplished and feature work finished before we fix that though. |
Thanks @lostluck let me try my luck! =D |
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions. |
This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
The Go SDK uses errors returned by DoFns to signal failures to process bundles, and terminate bundle processing. However, if the preceding DoFn uses emitters, rather than error returns, the code has no choice to panic to avoid user code handling or ignoring the cross DoFn error (which could cause dataloss or other correctness problems).
All bundle executions are wrapped in
callNoPanic
to prevent worker termination on such panics, and orderly terminate just the affected bundle instead.callNoPanic
uses Go's built in recover mechanism to get the error and provide a stack trace.This is message is not necessarily helpful to the user and can lead to degraded user experience when debugging such errors.
In this patch, we introduce a new type of error; a
doFnError
which has a more concise representation of the DoFn which failed, the error which the DoFn failed with and other meta information.Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username
).[BEAM-XXX] Fixes bug in ApproximateQuantiles
, where you replaceBEAM-XXX
with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.