recoverable SNES and TS #15276

stefanozampini · 2023-05-26T19:27:06Z

stefanozampini · 2023-05-26T19:32:15Z

@bangerth This should also go the release.

bangerth

This seems like too complicated a scheme, because you have to first call call_and_possibly_capture_exception() and then re-interpret what that function did. What if you added an extra argument to that function, say const std::function<bool()> &deal_with_recoverable_error, that is called in the catch (const RecoverableUserCallbackError &exc) case. If that function returns true, then the exception is simply absorbed and we return -1; if the function returns false, then we save the exception and return -1. This way, for each callback where there is a specific way to signal to PETSc that an error is recoverable, we can do that in the function object and return true; for each place where we can't, we do nothing but return false.

include/deal.II/lac/petsc_snes.templates.h

tests/petsc/petsc_snes_05.cc

bangerth · 2023-06-04T02:23:55Z

Would you mind rebasing this on current master to get rid of the already-merged commits?

stefanozampini · 2023-06-05T20:32:36Z

Would you mind rebasing this on current master to get rid of the already-merged commits?

done

bangerth

This patch has three parts, two of which are fine and one which I disagree with. Can you submit them separately?

The part where you change PetscFunctionReturn(err) to PetscFunctionReturn(err ? PETSC_ERR_LIB : 0);. This part is fine, and I'd like to get this committed via a separate PR.
The addition of the compress() calls to tests. This is fine as well, same here: Separate PR.
The part I disagree with is the error handling. I'll post a separate proposal in a separate comment in a second.

bangerth · 2023-06-08T22:30:44Z

About the error handling: The way you deal with this is that if you get a recoverable user callback error, you deal with this in the call_and_possibly_capture_snes_exception() function by returning -1 instead of -2:

     catch (const RecoverableUserCallbackError &)
        {
          // ok, ignore, but reset the pointer
          eptr = nullptr;
          return -1;
      }
      catch (...)
        {
          eptr = std::current_exception();
          return -1;
          return -2;
        }

The problem is that you have to then deal with this at all calling locations, but you really only do it for the residual function. In all of the other places, right now you just eat the exception, pass the error on to PETSc, which then probably doesn't know what to do, errors out, but you have eaten the exception so you cannot re-throw it -- though it would be useful to know from a user perspective that the failure was due to a recoverable error that wasn't recovered from.

I think a better system would go as follows:

Change the call_and_possibly_capture_snes_exception() function to the following:

    /**
     * A function that calls the function object given by its first argument
     * with the set of arguments following at the end. If the call returns
     * regularly, the current function returns zero to indicate success. If
     * the call fails with an exception, then the current function returns with
     * an error code of -1. In that case, the exception thrown by `f` is
     * captured and `eptr` is set to the exception. In case of success,
     * `eptr` is set to `nullptr`.
     *
     * If the user callback fails with a recoverable exception, then (i) if
     * `recoverable_action` is set, execute it, eat the exception, and return
     * zero; (ii) if `recoverable_action` is an empty function object, store the
     * exception and return -1.
     */
    template <typename F, typename... Args>
    int
    call_and_possibly_capture_snes_exception(const F &           f,
                                             std::exception_ptr &eptr,
                                             const std::function<void()> &recoverable_action,
                                             Args &&...args)
    {
      // See whether there is already something in the exception pointer
      // variable. There is no reason why this should be so, and
      // we should probably bail out:
      AssertThrow(eptr == nullptr, ExcInternalError());

      // Call the function and if that succeeds, return zero:
      try
        {
          f(std::forward<Args>(args)...);
          eptr = nullptr;
          return 0;
        }
      // In case of a recoverable exception, do as suggested above:
      catch (const RecoverableUserCallbackError &)
        {
          if (recoverable_action)
          {
             recoverable_action();
             eptr = nullptr;
             return 0;
          }
          else
          {
            eptr = std::current_exception();
            return -1;
          }
        }
      // In case of an exception, capture the exception and
      // return -1:
      catch (...)
        {
          eptr = std::current_exception();
          return -1;
        }
    }

For callbacks for which recovery is not an option, you would then simply call call_and_possibly_capture_snes_exception() with {} for the new argument.
In places where recovery is an option, you would call call_and_possibly_capture_snes_exception with something like [f]() { *f = std::numeric_limits<real_type>::quiet_NaN(); } for the new argument.

My preference, though, would be to call the function Jed Brown suggested instead of explicitly setting vector elements to NaNs.

stefanozampini · 2023-06-12T00:03:54Z

@bangerth Thanks for the suggestion about the recoverable error callback. I have implemented it and extended recovery capabilities to PETScWrappers::TimeStepper too. Please review again, this should now satisfy your requirements

bangerth

Yes, I like this. I think it's quite close.

My only question is whether we should expose these recoverable_action_... callbacks to users. I don't think you use them in any of the tests (but may have missed it?) but from a software design perspective, this configuration point seems duplicative: These callbacks get called whenever a previous callback threw an exception. Whatever a user wants to do in the recoverable_action_* function is something they could have already done at the point where they raised the exception to begin with.

So my preference would be to just get rid of these function objects and only use the internal ones you already define.

stefanozampini · 2023-06-13T12:25:05Z

Yes, I like this. I think it's quite close.

My only question is whether we should expose these recoverable_action_... callbacks to users. I don't think you use them in any of the tests (but may have missed it?) but from a software design perspective, this configuration point seems duplicative: These callbacks get called whenever a previous callback threw an exception. Whatever a user wants to do in the recoverable_action_* function is something they could have already done at the point where they raised the exception to begin with.

So my preference would be to just get rid of these function objects and only use the internal ones you already define.

Done, with proper handling of unrecoverable errors too. This is ready for final review

stefanozampini · 2023-06-13T12:28:05Z

For example, now if you raise an unrecoverable exception, you also get a nice stack trace in PETSc

$ ./petsc_ts_05.debug 
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Error in external library
[0]PETSC ERROR: Failure in ts_ijacobian from dealii::PETScWrappers::TimeStepper
...
[0]PETSC ERROR: #1 implicit_jacobian() at /home/szampini/Devel/dealii/include/deal.II/lac/petsc_ts.templates.h:489
[0]PETSC ERROR: #2 TSComputeIJacobian() at /home/szampini/Devel/petsc/src/ts/interface/ts.c:920
[0]PETSC ERROR: #3 SNESTSFormJacobian_BDF() at /home/szampini/Devel/petsc/src/ts/impls/bdf/bdf.c:375
[0]PETSC ERROR: #4 SNESTSFormJacobian() at /home/szampini/Devel/petsc/src/ts/interface/ts.c:4262
[0]PETSC ERROR: #5 SNESComputeJacobian() at /home/szampini/Devel/petsc/src/snes/interface/snes.c:2755
[0]PETSC ERROR: #6 SNESSolve_NEWTONLS() at /home/szampini/Devel/petsc/src/snes/impls/ls/ls.c:216
[0]PETSC ERROR: #7 SNESSolve() at /home/szampini/Devel/petsc/src/snes/interface/snes.c:4631
[0]PETSC ERROR: #8 TSBDF_SNESSolve() at /home/szampini/Devel/petsc/src/ts/impls/bdf/bdf.c:209
[0]PETSC ERROR: #9 TSStep_BDF() at /home/szampini/Devel/petsc/src/ts/impls/bdf/bdf.c:269
[0]PETSC ERROR: #10 TSStep() at /home/szampini/Devel/petsc/src/ts/interface/ts.c:3449
[0]PETSC ERROR: #11 TSSolve() at /home/szampini/Devel/petsc/src/ts/interface/ts.c:3842

bangerth

Two comments and one question.

include/deal.II/lac/petsc_ts.h

bangerth · 2023-06-13T19:41:40Z

source/lac/petsc_precondition.cc

+        return PetscError(
+          PetscObjectComm((PetscObject)ppc),
+          lineno + 3,
+          "vmult",
+          __FILE__,
+          PETSC_ERR_LIB,
+          PETSC_ERROR_INITIAL,
+          "Failure in pcapply from dealii::PETScWrappers::NonlinearSolver");


Is it correct that by default, PetscError simply returns the error code (here PETSC_ERR_LIB) and does nothing else? But that one can configure things to print errors on the console with all the information you are giving as arguments?

What do you mean? PetscError is the entry point for error handling in PETSc, hidden in the macro calls SETERRQ, CHKERRQ (in the past) and PetscCall(...) in the most recent versions. It can start a cascade of errors (when called with PETSC_ERROR_INITIAL, used in the SETERRQ macro) or continue going up the hierarchy of calls (with PETSC_ERROR_REPEAT in the PetscCall and CHKERRQ ). Inside PetscError a few things are done, including the preparation of the line for the traceback and the invocation of the error handler, which by default is PetscTraceBackErrorHandler, which prints the stack trace above. The error handler is customizable by the user.

My question was mainly what happens by default when you hit PetscError. Does it automatically print a stacktrace, or does it only do that if you configure PETSc to do so (or pass an appropriate command line flag)?

I'm asking because throwing exceptions (recoverable or not) should be a totally legitimate thing to do. The "error" is simply if you don't deal with them somewhere before you are back in main(), and at that time the stacktrace should be printed. If you throw and catch an exception, no error should be printed because no error has happened: In the catch statement, you are dealing with the error, after all.

The default error handling is the one that prints the traceback above. We don't get the what() the message when the error is raised by the solve_with_jacobian callback because I'm eating the excpetion in PetscPreconditioner. Before over-engineering the code, we should probably decide what to do in terms of unifying the usage of AssertPETSc (i.e. throw exception, i.e. when calling PETSc in deal.II world), PetscCall (i.e. calling PETSc in PETSc callbacks like snes_function in NonlinearSolver) and, possibly, have a single and unified call_and_possibly_recover that will be shared by all the PETSc code in deal.II

@bangerth On second thought (and by looking at #15349 and #15360) I think I should handle the exception here to have a more uniform error report from the nonlinear solvers. I will push one last change before merging this

Too late -- the PR is already merged :-)

…epper

stefanozampini · 2023-06-13T22:05:32Z

@bangerth Ok, I think this is ready now

bangerth · 2023-06-14T21:49:30Z

/rebuild

masterleinad · 2023-06-15T13:01:30Z

source/lac/petsc_precondition.cc

+    // Failed reason is not reset uniformly within the
+    // interface code of PCSetUp in PETSc.
+    // We handle it here.
+    PetscCall(PCSetFailedReason(ppc, PC_NOERROR));


This breaks our regression testers, see https://cdash.dealii.org/viewBuildError.php?buildid=1876. Looks like this function was only added in PETSc 3.14, see petsc/petsc@1b2b984 but we claim to support 3.7.

It would be nice to know this before merging to main.... I'll fix it ASAP

@masterleinad #15368

This was referenced May 26, 2023

Make the PETSc TS interfaces conform to callback conventions. #15269

Merged

Convert PETSc SNES interfaces to use callback error conventions. #15271

Merged

stefanozampini force-pushed the stefanozampini/recoverable branch from da5aed0 to 1854cf9 Compare May 26, 2023 20:30

bangerth added this to the Release 9.5 milestone May 27, 2023

bangerth reviewed May 27, 2023

View reviewed changes

include/deal.II/lac/petsc_snes.templates.h Outdated Show resolved Hide resolved

include/deal.II/lac/petsc_snes.templates.h Outdated Show resolved Hide resolved

tests/petsc/petsc_snes_05.cc Show resolved Hide resolved

stefanozampini force-pushed the stefanozampini/recoverable branch from 1854cf9 to 1fec014 Compare June 5, 2023 20:32

bangerth reviewed Jun 8, 2023

View reviewed changes

stefanozampini force-pushed the stefanozampini/recoverable branch from 1fec014 to b93c9c5 Compare June 12, 2023 00:01

stefanozampini force-pushed the stefanozampini/recoverable branch from b93c9c5 to 05a5611 Compare June 12, 2023 11:51

bangerth reviewed Jun 12, 2023

View reviewed changes

stefanozampini force-pushed the stefanozampini/recoverable branch from 05a5611 to f526c95 Compare June 13, 2023 12:24

bangerth reviewed Jun 13, 2023

View reviewed changes

PETScWrappers: allow recoverable errors in NonlinearSolver and TimeSt…

ce6c303

…epper

stefanozampini force-pushed the stefanozampini/recoverable branch from f526c95 to ce6c303 Compare June 13, 2023 22:04

bangerth added Reviewed and ready to merge ready to test labels Jun 14, 2023

bangerth approved these changes Jun 14, 2023

View reviewed changes

stefanozampini mentioned this pull request Jun 15, 2023

Update tests #15360

Merged

luca-heltai merged commit e1f4330 into dealii:master Jun 15, 2023
14 checks passed

stefanozampini mentioned this pull request Jun 15, 2023

PETSc tests: Add alternative output files for clang-16 #15349

Merged

masterleinad reviewed Jun 15, 2023

View reviewed changes

stefanozampini mentioned this pull request Jun 15, 2023

PETSc: Fix backward compatibility #15368

Merged

stefanozampini deleted the stefanozampini/recoverable branch July 18, 2023 22:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

recoverable SNES and TS #15276

recoverable SNES and TS #15276

stefanozampini commented May 26, 2023

stefanozampini commented May 26, 2023

bangerth left a comment

bangerth commented Jun 4, 2023

stefanozampini commented Jun 5, 2023

bangerth left a comment

bangerth commented Jun 8, 2023

stefanozampini commented Jun 12, 2023

bangerth left a comment

stefanozampini commented Jun 13, 2023

stefanozampini commented Jun 13, 2023

bangerth left a comment

bangerth Jun 13, 2023

stefanozampini Jun 13, 2023

bangerth Jun 14, 2023

stefanozampini Jun 14, 2023

stefanozampini Jun 15, 2023

bangerth Jun 15, 2023

stefanozampini commented Jun 13, 2023

bangerth commented Jun 14, 2023

masterleinad Jun 15, 2023

stefanozampini Jun 15, 2023

stefanozampini Jun 15, 2023

recoverable SNES and TS #15276

recoverable SNES and TS #15276

Conversation

stefanozampini commented May 26, 2023

stefanozampini commented May 26, 2023

bangerth left a comment

Choose a reason for hiding this comment

bangerth commented Jun 4, 2023

stefanozampini commented Jun 5, 2023

bangerth left a comment

Choose a reason for hiding this comment

bangerth commented Jun 8, 2023

stefanozampini commented Jun 12, 2023

bangerth left a comment

Choose a reason for hiding this comment

stefanozampini commented Jun 13, 2023

stefanozampini commented Jun 13, 2023

bangerth left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

stefanozampini commented Jun 13, 2023

bangerth commented Jun 14, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment