recoverable SNES and TS #15276
@bangerth This should also go into the release.
da5aed0
to
1854cf9
Compare
This seems like too complicated a scheme, because you have to first call `call_and_possibly_capture_exception()` and then re-interpret what that function did. What if you added an extra argument to that function, say `const std::function<bool()> &deal_with_recoverable_error`, that is called in the `catch (const RecoverableUserCallbackError &exc)` case? If that function returns `true`, then the exception is simply absorbed and we return `-1`; if the function returns `false`, then we save the exception and return `-1`. This way, for each callback where there is a specific way to signal to PETSc that an error is recoverable, we can do that in the function object and return `true`; for each place where we can't, we do nothing but return `false`.
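The proposal above could be sketched roughly as follows. This is a minimal, self-contained illustration, not the code from the PR: the `RecoverableUserCallbackError` struct is a stand-in for the deal.II exception class, the `eptr` output parameter and the handling of non-recoverable exceptions are assumptions, and only the function name, the `deal_with_recoverable_error` argument, and the `-1` return value come from the comment.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>

// Stand-in for deal.II's RecoverableUserCallbackError (assumption for this
// sketch; the real class lives in the library).
struct RecoverableUserCallbackError : std::runtime_error
{
  using std::runtime_error::runtime_error;
};

// Run `callback`. If it throws a recoverable error, ask
// `deal_with_recoverable_error` whether the error could be signaled to the
// solver. If yes, the exception is absorbed; if no, it is stored in `eptr`
// so the caller can re-throw it later. Any other exception is stored as
// well (assumed behavior). Returns 0 on success and -1 on any failure.
template <typename F>
int call_and_possibly_capture_exception(
  const F                     &callback,
  std::exception_ptr          &eptr,
  const std::function<bool()> &deal_with_recoverable_error)
{
  eptr = nullptr;
  try
    {
      callback();
      return 0;
    }
  catch (const RecoverableUserCallbackError &)
    {
      if (!deal_with_recoverable_error())
        eptr = std::current_exception();
      return -1;
    }
  catch (...)
    {
      eptr = std::current_exception();
      return -1;
    }
}
```

A call site that knows how to tell the solver about a recoverable failure would do so inside the function object and return `true`; every other call site would pass `[] { return false; }`.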
Would you mind rebasing this on current master to get rid of the already-merged commits?
1854cf9
to
1fec014
Compare
Done.
This patch has three parts, two of which are fine and one which I disagree with. Can you submit them separately?
- The part where you change `PetscFunctionReturn(err)` to `PetscFunctionReturn(err ? PETSC_ERR_LIB : 0);`. This part is fine, and I'd like to get this committed via a separate PR.
- The addition of the `compress()` calls to tests. This is fine as well, same here: Separate PR.
- The part I disagree with is the error handling. I'll post a separate proposal in a separate comment in a second.
About the error handling: The way you deal with this is that if you get a recoverable user callback error, you deal with this in the
The problem is that you have to then deal with this at all calling locations, but you really only do it for the residual function. In all of the other places, right now you just eat the exception, pass the error on to PETSc, which then probably doesn't know what to do, errors out, but you have eaten the exception so you cannot re-throw it -- though it would be useful to know from a user perspective that the failure was due to a recoverable error that wasn't recovered from. I think a better system would go as follows:
My preference, though, would be to call the function Jed Brown suggested instead of explicitly setting vector elements to NaNs.
1fec014
to
b93c9c5
Compare
@bangerth Thanks for the suggestion about the recoverable error callback. I have implemented it and extended recovery capabilities to |
b93c9c5
to
05a5611
Compare
Yes, I like this. I think it's quite close.
My only question is whether we should expose these `recoverable_action_...` callbacks to users. I don't think you use them in any of the tests (but may have missed it?) but from a software design perspective, this configuration point seems duplicative: These callbacks get called whenever a previous callback threw an exception. Whatever a user wants to do in the `recoverable_action_*` function is something they could have already done at the point where they raised the exception to begin with.
So my preference would be to just get rid of these function objects and only use the internal ones you already define.
05a5611
to
f526c95
Compare
Done, with proper handling of unrecoverable errors too. This is ready for final review.
For example, now if you raise an unrecoverable exception, you also get a nice stack trace in PETSc.
Two comments and one question.
return PetscError(
  PetscObjectComm((PetscObject)ppc),
  lineno + 3,
  "vmult",
  __FILE__,
  PETSC_ERR_LIB,
  PETSC_ERROR_INITIAL,
  "Failure in pcapply from dealii::PETScWrappers::NonlinearSolver");
Is it correct that by default, `PetscError` simply returns the error code (here `PETSC_ERR_LIB`) and does nothing else? But that one can configure things to print errors on the console with all the information you are giving as arguments?
What do you mean? `PetscError` is the entry point for error handling in PETSc, hidden in the macro calls `SETERRQ`, `CHKERRQ` (in the past) and `PetscCall(...)` in the most recent versions. It can start a cascade of errors (when called with `PETSC_ERROR_INITIAL`, used in the `SETERRQ` macro) or continue going up the hierarchy of calls (with `PETSC_ERROR_REPEAT` in `PetscCall` and `CHKERRQ`). Inside `PetscError` a few things are done, including the preparation of the line for the traceback and the invocation of the error handler, which by default is `PetscTraceBackErrorHandler`, which prints the stack trace above. The error handler is customizable by the user.
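To make the "initial" versus "repeat" distinction concrete, here is a small mock that mimics the cascade described above. It does not use PETSc at all: every name (`mock_error`, `MOCK_SETERR`, `MOCK_CALL`, `MockErrorType`, the error code `42`) is hypothetical and chosen only for illustration, and the mock collects traceback lines in a vector where a real handler would print them.

```cpp
#include <string>
#include <vector>

// Hypothetical analogue of the "initial" / "repeat" error types.
enum class MockErrorType
{
  initial,
  repeat
};

// Traceback lines collected so far, innermost frame first.
std::vector<std::string> &mock_traceback()
{
  static std::vector<std::string> lines;
  return lines;
}

// Mock error entry point: records one traceback line for the current
// frame and returns the error code unchanged so it can travel upwards.
int mock_error(const char        *function,
               const int          code,
               const MockErrorType type,
               const std::string &message)
{
  std::string line = std::string(function) + ": ";
  line += (type == MockErrorType::initial) ? message : "error propagated";
  mock_traceback().push_back(line);
  return code;
}

// Start a cascade: report a new error from the current function.
#define MOCK_SETERR(code, msg) \
  return mock_error(__func__, (code), MockErrorType::initial, (msg))

// Continue a cascade: if the callee failed, add this frame and return.
#define MOCK_CALL(expr)                                                    \
  do                                                                       \
    {                                                                      \
      const int ierr_ = (expr);                                            \
      if (ierr_ != 0)                                                      \
        return mock_error(__func__, ierr_, MockErrorType::repeat, "");     \
    }                                                                      \
  while (false)

int inner_callback()
{
  MOCK_SETERR(42, "failure in user callback");
}

int middle_layer()
{
  MOCK_CALL(inner_callback());
  return 0;
}

int outer_solve()
{
  MOCK_CALL(middle_layer());
  return 0;
}
```

Calling `outer_solve()` returns the original code and leaves three traceback entries, one per frame. The innermost entry carries the message and the outer two only mark propagation.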
My question was mainly what happens by default when you hit `PetscError`. Does it automatically print a stacktrace, or does it only do that if you configure PETSc to do so (or pass an appropriate command line flag)?
I'm asking because throwing exceptions (recoverable or not) should be a totally legitimate thing to do. The "error" is simply if you don't deal with them somewhere before you are back in `main()`, and at that time the stacktrace should be printed. If you throw and catch an exception, no error should be printed because no error has happened: In the `catch` statement, you are dealing with the error, after all.
The default error handling is the one that prints the traceback above. We don't get the `what()` message when the error is raised by the `solve_with_jacobian` callback because I'm eating the exception in `PetscPreconditioner`. Before over-engineering the code, we should probably decide what to do in terms of unifying the usage of `AssertPETSc` (i.e. throwing an exception when calling PETSc in the deal.II world), `PetscCall` (i.e. calling PETSc in PETSc callbacks like `snes_function` in `NonlinearSolver`) and, possibly, have a single and unified `call_and_possibly_recover` that will be shared by all the PETSc code in deal.II.
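One common way to keep the `what()` message when an exception has to cross a C-style callback boundary is to store it as a `std::exception_ptr` in the callback context and re-throw it once the C-style call has returned. The sketch below illustrates that pattern only. It is not the deal.II implementation, and `mock_c_solver`, `CallbackContext`, `trampoline`, and `solve` are hypothetical names.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a C-style solver: it only understands integer
// error codes and must never see a C++ exception.
int mock_c_solver(int (*callback)(void *), void *ctx)
{
  return callback(ctx);
}

// Context handed through the void pointer of the C-style interface.
struct CallbackContext
{
  std::function<void()> user_callback;
  std::exception_ptr    pending_exception;
};

// Trampoline: translate any exception into a nonzero error code and keep
// the exception object alive for later.
int trampoline(void *ctx)
{
  auto *context = static_cast<CallbackContext *>(ctx);
  try
    {
      context->user_callback();
      return 0;
    }
  catch (...)
    {
      context->pending_exception = std::current_exception();
      return 1;
    }
}

// C++ entry point: once control is back on the C++ side, re-throw the
// stored exception so its type and what() message are preserved.
void solve(const std::function<void()> &user_callback)
{
  CallbackContext context{user_callback, nullptr};
  const int       ierr = mock_c_solver(&trampoline, &context);

  if (context.pending_exception)
    std::rethrow_exception(context.pending_exception);
  if (ierr != 0)
    throw std::runtime_error("solver failed without a stored exception");
}
```

With this layout, a caller that wraps `solve()` in a `try` block sees the original exception, and nothing needs to be printed if the exception is caught and handled.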
Too late -- the PR is already merged :-)
f526c95
to
ce6c303
Compare
@bangerth Ok, I think this is ready now.
/rebuild
// Failed reason is not reset uniformly within the
// interface code of PCSetUp in PETSc.
// We handle it here.
PetscCall(PCSetFailedReason(ppc, PC_NOERROR));
This breaks our regression testers, see https://cdash.dealii.org/viewBuildError.php?buildid=1876. Looks like this function was only added in PETSc 3.14, see petsc/petsc@1b2b984 but we claim to support 3.7.
It would be nice to know this before merging to main.... I'll fix it ASAP
depends on #15271 #15269