Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PETSc object is in wrong state #13100

Closed
prj- opened this issue Dec 20, 2021 · 15 comments
Closed

PETSc object is in wrong state #13100

prj- opened this issue Dec 20, 2021 · 15 comments

Comments

@prj-
Copy link

prj- commented Dec 20, 2021

I'm compiling deal.II 9.3.2 with the following configuration line.

cmake .. -DPETSC_DIR=$PETSC_DIR -DPETSC_ARCH=$PETSC_ARCH -DDEAL_II_WITH_PETSC=ON -DDEAL_II_WITH_MPI=ON -DDEAL_II_WITH_METIS=ON -DDEAL_II_WITH_SLEPC=ON -DMETIS_DIR=$PETSC_DIR/$PETSC_ARCH -DSLEPC_DIR=$PETSC_DIR/$PETSC_ARCH -DCMAKE_BUILD_TYPE=Release -DDEAL_II_WITH_UMFPACK=ON -DUMFPACK_DIR=$PETSC_DIR/$PETSC_ARCH -DDEAL_II_WITH_P4EST=ON -DP4EST_DIR=$PETSC_DIR/$PETSC_ARCH -DDEAL_II_WITH_TBB=OFF

I'm using PETSc main branch.
When I run the following, everything is OK.

$ mpirun -n 1 ./step-40.release -version -log_view -pc_type hypre

However, if I switch from hypre to some other PC, like bjacobi, I get hit by the following.

[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Object is in wrong state
[0]PETSC ERROR: Logging event had unbalanced begin/end pairs
[...]
[0]PETSC ERROR: #1 PetscLogEventEndDefault() at petsc/src/sys/logging/utils/eventlog.c:758
[0]PETSC ERROR: #2 KSPSolve_Private() at petsc/src/ksp/ksp/interface/itfunc.c:964
[0]PETSC ERROR: #3 KSPSolve() at petsc/src/ksp/ksp/interface/itfunc.c:1103

I don't know enough about deal.II and the plumbing done under the hood with PETSc to fix this (or reproduce this on a simpler MWE). Any help would be appreciated.

@harmonj
Copy link
Contributor

harmonj commented Dec 28, 2021

As the BoomerAMG preconditioner provided through PETSc and used in Step-40 relies on the package hypre (in fact, the initialize() call in the deal.II wrapper for the preconditioner tries to set the PCType to PCHYPRE internally), it is likely that other PCTypes are not guaranteed to be supported by PETSc.

@prj-
Copy link
Author

prj- commented Dec 28, 2021

Switching between PCType before an actual KSPSolve() is supported by PETSc, so are you implying this is a limitation of the deal.II interface to PETSc?

@bangerth
Copy link
Member

Yes. PETSc has a different mental model than we do. Their view is that you should be able to switch solvers and preconditioners based on command line arguments, whereas in deal.II we represent these as fixed types that are chosen at compile time (because that's how we do things with deal.II's own solvers and preconditioners, as well as those we import from Trilinos or other external libraries). I have never tried to set PETSc details on the command line, but it does not surprise me that that leads to trouble.

@prj-
Copy link
Author

prj- commented Dec 29, 2021

Even if I replace:

    LA::MPI::PreconditionAMG preconditioner;

    LA::MPI::PreconditionAMG::AdditionalData data;

#ifdef USE_PETSC_LA
    data.symmetric_operator = true;
#else
    /* Trilinos defaults are good */
#endif
    preconditioner.initialize(system_matrix, data);

by:

    dealii::PETScWrappers::PreconditionBlockJacobi preconditioner(system_matrix);

and not toy around with -pc_type, everything is supposedly determined at compile-time when it comes to PCType, yet using the command-line argument -log_view still yields the same error. I see you are setting a custom KSPMonitor, but I don't see why that would raise such an error. There is probably something else that slip under my radar in source/lac/petsc_solver.cc. Feel free to close the issue if users are not supposed to use PETSc options from the command line (in which case you probably shouldn't parse argc and argv in SlepcInitialize() and PetscInitialize()?).

@bangerth
Copy link
Member

I certainly have no idea what is going wrong here, and the PETSc error message isn't helpful either :-)

What does -log_view do?

@prj-
Copy link
Author

prj- commented Dec 29, 2021

It tracks PETSc operations so that you can get a summary at the end of your script with FLOP/s, load imbalance, and such. It is much more trivial for me to understand the performance summary from PETSc to split preconditioner setup vs. solution phase (I'm interested in benchmarking multiple preconditioners, so I need to breakdown the solve timer from step 40 in these two parts).
There may be a bug in PETSc, or at least a corner case that is currently not covered, but the deal.II interface is too much for me to handle to try to generate a MWE that exhibits the same behavior.

The PETSc error message says that PetscLogEventEnd(KSP_Solve) has been called more times than PetscLogEventBegin(KSP_Solve), but I've checked the execution with both -start_in_debugger and -log_trace trace and that just isn't the case, so there is some side effect somewhere...

@bangerth
Copy link
Member

I agree that that might be useful, but I'm afraid that it's fairly low on the priority list of anyone to fix. The PETSc solver interfaces were written in 2003 or 2004, and I don't think anyone readily remembers how they work and this is the first time anyone has brought up the issue :-)

@harmonj
Copy link
Contributor

harmonj commented Dec 29, 2021

@prj- With PETSc version 3.16.2, I am running into the same issues as you for Step-40, with the original Step-40 file and the command line arguments. Same issue appears with the modified Step-40 file, with additional arguments -version -log_view. Without -log_view works correctly, even with the other command line arguments in your original post, i.e., mpirun -n 1 ./step-40.release -version -pc_type bjacobi executes without throwing any errors.

However, making the changes above with PETSc version 3.13.2, I am not running into any errors, even with appending -version -log_view to the mpirun call.

I also repeated your call to mpirun -n 1 ./step-40.release -version -log_view -pc_type bjacobi above without issue (using the unmodified Step-40).

The execution times between the command line version and explicitly modifying the preconditioner to be PreconditionBlockJacobi in Step-40 are the same, with either version of PETSc (leaving out -log_view with v. 3.16.2).

Can you try building deal.II with an older version of PETSc (v. 3.13.2) and see if your issues persists and/or the results are what you expect? -log_view appears to be the culprit (at least with v.3.16.2). Perhaps something discussed in https://petsc.org/release/docs/manualpages/Profiling/PetscLogView.html has changed recently.

@prj-
Copy link
Author

prj- commented Dec 29, 2021

Without -log_view works correctly [...]

Indeed, this is working on my side as well.

However, making the changes above with PETSc version 3.13.2, I am not running into any errors [...]

That's great news. I'll git bisect down to tag v3.13.2. Thanks!

@prj-
Copy link
Author

prj- commented Dec 30, 2021

Just FYI.

# first bad commit: [c00cb57f5940c39b3204dc31afc73698b59112d7] Do not log KSPSolve() and PCApply() from in PCSetUp()

I'll try to see with @BarrySmith (author of that MR) what he thinks about this.

@prj-
Copy link
Author

prj- commented Dec 30, 2021

We are sorting this out upstream (see https://gitlab.com/petsc/petsc/-/merge_requests/3320#note_798371459 and https://gitlab.com/petsc/petsc/-/merge_requests/4685 for reference). Thanks for your help and for the reproducer.

@prj- prj- closed this as completed Dec 30, 2021
@bangerth
Copy link
Member

Thanks for sorting it out! I love these cross-project things!

Just because the deal.II interfaces are what they are, there is no reasons that they need to stay that way. It sounds like you already identified what is triggering the issue; if there is an easy way to address things (like initializing the KSP before the PC is attached to it), then that is something we can certainly do.

@prj-
Copy link
Author

prj- commented Dec 30, 2021

What you do in the interface is not the canonical way of doing things in PETSc (different mental model again, to quote you). If I understood the code correctly, you first create a PC and then attach it to a KSP via KSPSetPC(). Some deep hidden structures (only used when activating logging) of the KSP (PETSc) "class" what not fully initialized, but yet, it was used during PCSetUp(), before the creation of the KSP by deal.II, hence the error. A more standard approach would be to first create the KSP (which would fully initialize the structure) and then call KSPGetPC(). Then, there would be no problem calling PCSetUp().

@bangerth
Copy link
Member

bangerth commented Jan 1, 2022

For reference, this is where things are done:

  void
  SolverBase::solve(const MatrixBase &      A,
                    VectorBase &            x,
                    const VectorBase &      b,
                    const PreconditionBase &preconditioner)
  {
    [...]
    // first create a solver object if this
    // is necessary
    if (solver_data.get() == nullptr)
      {
        solver_data = std::make_unique<SolverData>();

        PetscErrorCode ierr = KSPCreate(mpi_communicator, &solver_data->ksp);
        AssertThrow(ierr == 0, ExcPETScError(ierr));

        // let derived classes set the solver
        // type, and the preconditioning
        // object set the type of
        // preconditioner
        set_solver_type(solver_data->ksp);

        ierr = KSPSetPC(solver_data->ksp, preconditioner.get_pc());
        AssertThrow(ierr == 0, ExcPETScError(ierr));

        // make sure the preconditioner has an associated matrix set
        const Mat B = preconditioner;
        AssertThrow(B != nullptr,
                    ExcMessage("PETSc preconditioner should have an "
                               "associated matrix set to be used in solver."));

        // setting the preconditioner overwrites the used matrices.
        // hence, we need to set the matrices after the preconditioner.
#  if DEAL_II_PETSC_VERSION_LT(3, 5, 0)
        // the last argument is irrelevant here,
        // since we use the solver only once anyway
        ierr = KSPSetOperators(solver_data->ksp,
                               A,
                               preconditioner,
                               SAME_PRECONDITIONER);
#  else
        ierr = KSPSetOperators(solver_data->ksp, A, preconditioner);
#  endif
        AssertThrow(ierr == 0, ExcPETScError(ierr));

        // then a convergence monitor
        // function. that function simply
        // checks with the solver_control
        // object we have in this object for
        // convergence
        ierr = KSPSetConvergenceTest(solver_data->ksp,
                                     &convergence_test,
                                     reinterpret_cast<void *>(&solver_control),
                                     nullptr);
        AssertThrow(ierr == 0, ExcPETScError(ierr));
      }

    // set the command line option prefix name
    PetscErrorCode ierr =
      KSPSetOptionsPrefix(solver_data->ksp, prefix_name.c_str());
    AssertThrow(ierr == 0, ExcPETScError(ierr));

    // set the command line options provided
    // by the user to override the defaults
    ierr = KSPSetFromOptions(solver_data->ksp);
    AssertThrow(ierr == 0, ExcPETScError(ierr));

    // then do the real work: set up solver
    // internal data and solve the
    // system.
    ierr = KSPSetUp(solver_data->ksp);
    AssertThrow(ierr == 0, ExcPETScError(ierr));

    ierr = KSPSolve(solver_data->ksp, b, x);
    AssertThrow(ierr == 0, ExcPETScError(ierr));

    // do not destroy solver object
    //    solver_data.reset ();

    // in case of failure: throw
    // exception
    if (solver_control.last_check() != SolverControl::success)
      AssertThrow(false,
                  SolverControl::NoConvergence(solver_control.last_step(),
                                               solver_control.last_value()));
    // otherwise exit as normal
  }

If I understand things right, then the problem is that at this point we have already created the PC and only attach it to the KSP after the fact? The PC is created during the construction of the PETScWrappers::PreconditionerBase object that is passed to this function, but if that is a problem, we could defer the creation to a virtual function call that is triggered after we have created the KSP object above. I don't think this would be difficult, but I don't have enough insight into what PETSc wants to know what one should do.

@prj-
Copy link
Author

prj- commented Jan 1, 2022

[...] If I understand things right, then the problem is that at this point we have already created the PC and only attach it to the KSP after the fact? [...]

My initial problem (PETSC ERROR: Object is in wrong state) is slightly more convoluted to explain. But if by "problem" you mean "PETSc mental model != deal.II mental model", then yes, that is the problem.

[...] we could defer the creation to a virtual function call that is triggered after we have created the KSP object above [...]

That would fix both problems indeed. But still, PETSc purists would say that there is no real point for you to keep a reference to both a PC and a KSP. They'll argue that it would simplify the interface to keep only a reference to a KSP(*), e.g., you will not have to call PCDestroy() yourself as it will be done in KSPDestroy(), and could make it less error-prone, e.g., the KSP and the PC will always live on the same communicator, which I think is not strictly enforced in the current deal.II interface if one puts crazy parameter for the MPI_Comm.

(*) The purest of the purists will say that you only need a reference to a SNES of type SNESKSPONLY but it doesn't look like you have PETSc nonlinear solvers hooked in so that's out of the question, I guess.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants