Nonlinear Solver Interface #1570

termi-official · 2022-01-11T19:24:49Z

Some nonlinear problems can be very efficiently solved with non-standard methods, possibly abusing the specific structure of the problem. To allow users to efficiently utilize such methods in tandem with existing time solvers it might be helpful to expand the nonlinear solver interface (in the long term). A nice first approach to a nonlinear solver interface is given in [6]. Let me give 4 examples for possible applications of such an interface to boost the time step performance in specific applications to study possible requirements on such an interface (in addition to the existing ones, e.g. W-transformation, Jacobian reuse, ...). This list is not ment to be exhaustive, as I do not have a full overview on the state of the art on nonlinear solvers:

Multilevel-Newton

Nonlinear problems arising from space-time discretizations of PDEs, PDAEs and similar, with solution variables that are not directly dependent on spatial derivatives can be efficiently solved with a multilevel strategy. These solution variables are often referred to as internal variables, occurring regularly in electronics and mechanics. Now, denoting the internal variables with q and the remaining variables with u, we can write the problem in mass matrix form as

Mu' = f(u,s,t)
Ns' = g(u,s,t)

where 'denotes the time derivative and M is a positive semi-definite matrix and N is a positive semi-definite diagonal matrix. Now we can algorithmically decouple the solve for s and u by an alternating Newton scheme, where we solve for s on the "local" level and u on the "global" level. This way we obtain a large number of small local problems and one large problem on global level, however if dim(s) >> dim(u) we can greatly improve the efficiency of the scheme. A detailed derivation with FEM in conjunction with Runge-Kutta schemes is for example in [1].

Quasi-newton methods

An alternative approach to efficient Jacobian reuse, which still requires the solution of equation systems, can be a quasi-Newton strategy. This method class utilizes (updated) approximations of the Jacobian in each iterative solve (see e.g. [4]) - however it is often observed that the quadratic convergence of the classical Newton-Raphon is lost. Nonetheless these methods can be highly efficient as e.g. studied in [5].

Globalization strategies

Newton-Raphon does not perform well if no good initial guess for the nonlinear solve is available. In these scenarios so-called globalization strategies can help to some extend, usually by limiting the length of the tangent appropriately. Some overview is given in the methods section of [2] and the references therein, as well as the Eisenstat-Walker paper [3].

Nonlinear preconditioning

So-called nonlinear preconditioners take the idea from linear preconditioners and apply it to nonlinear solvers - especially Newton-type methods. Instead of solving F(u) = 0 a function G \approx F^-1 such that G(F(u)) = 0 is used, which has the same roots and is simpler to solve. Alternatively it is possible to solve F(H(\tilde{u})) = 0 where `u = H(\tilde{u}) such that the nonlinear problem is simpler to solve.

References

[1] Hartmann, Stefan. "A remark on the application of the Newton-Raphson method in non-linear finite element analysis." Computational Mechanics 36.2 (2005): 100-116.
[2] Pawlowski, Roger P., et al. "Globalization techniques for Newton–Krylov methods and applications to the fully coupled solution of the Navier–Stokes equations." SIAM review 48.4 (2006): 700-721.
[3] Eisenstat, Stanley C., and Homer F. Walker. "Globally convergent inexact Newton methods." SIAM Journal on Optimization 4.2 (1994): 393-422.
[4] Matthies, Hermann, and Gilbert Strang. "The solution of nonlinear finite element equations." International journal for numerical methods in engineering 14.11 (1979): 1613-1626.
[5] Liu, Tiantian, Sofien Bouaziz, and Ladislav Kavan. "Quasi-newton methods for real-time simulation of hyperelastic materials." Acm Transactions on Graphics (TOG) 36.3 (2017): 1-16.
[6] Brune, Peter R., et al. "Composing scalable nonlinear algebraic solvers." SIAM Review 57.4 (2015): 535-565.

Problems

When trying to implement a Multilevel-Newton (with Ferrite.jl) I encountered two major issues.

There is a tight coupling (in a software engineering sense) between the caches of the nonlinear solver and some of the time stepper.
The nonlinear solver framework is scattered across the ecosystem and the documentation is incomplete (although the latter is more of a minor problem, since I can throw more time on this).

I am happy to discuss possible designs of the nonlinear interface and invest some time here. However, I am lacking experience with nonlinear iterative solvers and some details behind ODE/DAE/... solvers, which are incorporated in the ecosystem to tackle this by-myself.

The text was updated successfully, but these errors were encountered:

ChrisRackauckas · 2022-01-16T11:43:57Z

There is a tight coupling (in a software engineering sense) between the caches of the nonlinear solver and some of the time stepper.

Yeah, this is necessary for the quasi-Newton method Jacobian reuse strategies, which is why the nonlinear solvers have to be specialized to the ODE solvers instead of using NonlinearSolve.jl. It would be interesting to have a version of a nlsolver which just takes a NonlinearSolve.jl algorithm though, just for research and trying things. That would instantly give globalization strategies (via NLsolve.jl) and so it would be a test of how useful they would be to add.

An alternative approach to efficient Jacobian reuse, which still requires the solution of equation systems, can be a quasi-Newton strategy. This method class utilizes (updated) approximations of the Jacobian in each iterative solve (see e.g. [4]) - however it is often observed that the quadratic convergence of the classical Newton-Raphon is lost. Nonetheless these methods can be highly efficient as e.g. studied in [5].

One thing to note about these quasi-Newton methods, Broyden and such, is that they lack the stability of methods which have the linear solve, so they generally aren't applicable to stiff systems which is where most of the machinery is geared. That said, there are implicit non-stiff methods, such as the symplectic FIRKs, for which this would be useful.

So-called nonlinear preconditioners take the idea from linear preconditioners and apply it to nonlinear solvers - especially Newton-type methods. Instead of solving F(u) = 0 a function G \approx F^-1 such that G(F(u)) = 0 is used, which has the same roots and is simpler to solve. Alternatively it is possible to solve F(H(\tilde{u})) = 0 where `u = H(\tilde{u}) such that the nonlinear problem is simpler to solve.

Indeed this could be useful to add to NLNewton. That said, I'm not sure of a use case where a user knows a nonlinear preconditioner where a linear preconditioner wouldn't do as much of the work, but it would be interesting to see a reference on that.

The nonlinear solver framework is scattered across the ecosystem and the documentation is incomplete (although the latter is more of a minor problem, since I can throw more time on this).

It should all just be in OrdinaryDiffEq.jl. There is a little bit of duplicate functionality in SciMLBase, with a little bit that isn't duplicate:

https://github.com/SciML/DiffEqBase.jl/tree/master/src/nlsolve

The reason for that was for the nlsolvers of StochasticDiffEq. However, nowadays StochasticDiffEq.jl just depends on OrdinaryDiffEq.jl and uses its full nonlinear solver machinery. Since really only the *DiffEq.jl libraries can use these nonlinear solvers (not like they will be hacked into LSODA.jl for example), they might as well follow this structure and completely move to OrdinaryDiffEq.jl. If anyone has a second of free time that would be the best starting point here. It should be a rather simple and non-breaking change to do it.

I am happy to discuss possible designs of the nonlinear interface and invest some time here. However, I am lacking experience with nonlinear iterative solvers and some details behind ODE/DAE/... solvers, which are incorporated in the ecosystem to tackle this by-myself.

Pulling @devmotion and @YingboMa in, along with @kanav99 who worked on this some. Always good to train a new contributor. I think normally I like to understand it from the lens of TRBDF2. Maybe it's worth adding a description of this somewhere in http://devdocs.sciml.ai/latest/

termi-official · 2022-01-17T19:01:33Z

Thanks for the valuable comments Chris! I have some follow up questions.

There is a tight coupling (in a software engineering sense) between the caches of the nonlinear solver and some of the time stepper.

Yeah, this is necessary for the quasi-Newton method Jacobian reuse strategies, which is why the nonlinear solvers have to be specialized to the ODE solvers instead of using NonlinearSolve.jl. [...]

Correct me if I miss something more fundamental, but is there a reason (other than the lack of man-power) why we could not wrap reuse-strategies and specific transformations (e.g. W-transformations) into hooks or other abstraction mechanisms?

One thing to note about these quasi-Newton methods, Broyden and such, is that they lack the stability of methods which have the linear solve, so they generally aren't applicable to stiff systems which is where most of the machinery is geared.

I was not aware of the the instability, but it makes at least some sense for me (I assume the secant condition is the source of the instability). I found you answer on stackexchange) on this, but I am not sure what the exact cause is. Do you by any chance have some literature on this at hand?

That said, I'm not sure of a use case where a user knows a nonlinear preconditioner where a linear preconditioner wouldn't do as much of the work, but it would be interesting to see a reference on that.

For example, you can interpret nonlinear FETI-DP and nonlinear BDDC as nonlinear right-preconditioned Newton strategies. I am leaving [1] for a reference where the classical Newton-Krylow (BoomerAMG+GMRES) setting is "outperformed". I am aware that this is rather a specific example and not even in a time-dependent setting, but at least some example.

[1] Klawonn, Axel, et al. "Nonlinear FETI-DP and BDDC methods: a unified framework and parallel results." SIAM Journal on Scientific Computing 39.6 (2017): C417-C451. https://numerik.uni-koeln.de/sites/numerik/research/nonl_framework.pdf

ChrisRackauckas · 2022-01-23T12:34:28Z

Correct me if I miss something more fundamental, but is there a reason (other than the lack of man-power) why we could not wrap reuse-strategies and specific transformations (e.g. W-transformations) into hooks or other abstraction mechanisms?

We could, it's just really difficult because the reuse strategies are very deep in there, using things like Newton convergence rate estimates from within the algorithm. So it's hard to have a really clean interface on that.

I was not aware of the the instability, but it makes at least some sense for me (I assume the secant condition is the source of the instability). I found you answer on stackexchange) on this, but I am not sure what the exact cause is. Do you by any chance have some literature on this at hand?

I'm not sure of the best literature on this. It's easy to show between Newton and fixed point iteration, the other ones I know by word of mouth and empirically. You can probably prove something about how it's related to the condition number of the Jacobian because the Broyden methods are essentially explicit ODE time steppers falling towards a minima of sorts.

For example, you can interpret nonlinear FETI-DP and nonlinear BDDC as nonlinear right-preconditioned Newton strategies. I am leaving [1] for a reference where the classical Newton-Krylow (BoomerAMG+GMRES) setting is "outperformed". I am aware that this is rather a specific example and not even in a time-dependent setting, but at least some example.

That's a nice example. Yeah, this would all be a good GSoC project.

bgroenks96 · 2023-05-08T19:31:05Z

Hey @termi-official and @ChrisRackauckas has there been any progress or further discussion on this?

termi-official · 2023-05-08T19:45:00Z

I could not really find time to circle back to this one, sorry for this, but it is still on my list. As far as I can judge from peeking the internals from time to time the nonlinear interfaces has been quite nicely refactored since I have opened the issue! I tried to follow Chris advice a few weeks ago and bridge NonlinearSolve.jl, but I got stuck and ran out of the time which I could allocate. I also won't find time to give this one another try before August.

oscardssmith · 2023-09-26T18:28:29Z

One reason to break this out is that other features like multiple shooting might also want to introspect the convergence rate of a solver.

termi-official · 2023-09-26T18:40:30Z

Can you elaborate further @oscardssmith ? I am not familiar with this class of methods.

oscardssmith · 2023-09-26T20:15:45Z

The idea is that if you are trying to find roots without a good guess multiple shooting has you initialize a root-finder from a lot of different points and to increase your chance of having one of them near a root. To make this more efficient, you likely want to stop root-finding on the points that fail to converge relatively quickly.

ChrisRackauckas · 2023-09-30T16:14:22Z

That's not multiple shooting. That's a multistart method.

ChrisRackauckas · 2023-09-30T16:19:08Z

We should pick back up this idea though of committing to the NonlinearSolve.jl interface. That interface now has proper init/solve/solve! choices, uses LinearSolve and has integration with SparseDiffTools.jl, etc. It has everything the ODE solver nonlinear solvers have and more, other than the specialized reuse techniques. However, with the current interface, the reuse techniques can be done. Effectively it would just need:

Use cache = init(nonlinearprob, alg) to construct the nonlinear solver
Some options or methods for quasi-Newton which don't factorize
Have a hook on resample_jacobian(cache, u0) to force when to update it

Then everything you're talking about in the OP would just need to go into NonlinearSolve.jl. That would be a much cleaner design and allow the same tooling to be reused in all sorts of other solvers. We're now close enough that we should really make the push to get it done sometime in the next year.

termi-official · 2023-10-09T18:47:59Z

I agree on the part that this issue is also related to the NonlinearSolve.jl interface and I love the progress on its design. However, I have trouble seeing how to preserve specific structural information from the time-dependent/semidiscrete problem into the nonlinear solver, such that everything can go into NonlinearSolve.jl. Let me elaborate on an example from electrophysiology, i.e. a monodomain problem, where I currently see some specific open issues related to OrdinaryDiffEq.jl (in addition to NonlinearSolve.jl, and maybe LinearSolve.jl). Monodomain problems can be written down as the following system:

$$ \begin{aligned} \partial_t {u} &= \nabla \cdot D \nabla u + f({u},\boldsymbol{v}, t) \\ \partial_t \boldsymbol{v} &=\boldsymbol{g}(u,\boldsymbol{v}) , \end{aligned} $$

where we want to solve for $u$ and $v$ with a Newton-type method. The extra structure here is that $\boldsymbol{g}$ is only "locally decoupled". Using a finite element discretization in space, we end up with a problem like

$$ \begin{aligned} M d_t \hat{u} &= K \hat{u} + f(\hat{u},\hat{v}, t) \\ d_t \hat{v} &=\boldsymbol{g}(\hat{u},\hat{\boldsymbol{v}}) , \end{aligned} $$

assuming $\boldsymbol{v}$ is scalar and the finite element discretization has been done in 1D with 2 linear elements and 2 quadrature points per element, then we end up with 4 dofs for $\hat{v}$ and 3 dofs for $\hat{u}$:

o-x---x-o-x---x-o

where ,,o´´ indicates the dofs $\hat{u}_1, \hat{u}_2, \hat{u}_3$ and ,,x´´ indicates the dofs $\hat{v}_1, \hat{v}_2, \hat{v}_3, \hat{v}_4$.

Using information about the Schur complement

Next, we can, for example, simply discretize in time with Backward Euler and solve the nonlinear problems with the standard Newton-Raphson. However, for this larger class of problems we know that this solve is not efficient, especially if we go to big 3D problems with nonlinear ansatz spaces.

A more efficient solve here could for example use the Schur complement of the full jacobian. We can already see the 2x2 block structure in the sparsity pattern:

$$ J_{u,v}(\hat{u},\hat{v}) := \begin{bmatrix} K + \partial_u f & \partial_v f \\ \partial_u g & \partial_v g \\ \end{bmatrix} $$

where $\partial_v g$ is per construction a (block-)diagonal matrix. Now a first logical step to optimize the solve is to use the schur completment in the inner linear solve. I think this can be done already, but I have trouble here on how to encode it in types (via LinearSolve.jl).

Multilevel-Newton / Nested Solvers

Now, for this specific class of problems it has been shown that we still can do better. Instead of using the Schur complement it is possible to cleverly use the implicit function theorem to first solve for $\hat{v}$ with a fixed guess for $\hat{u}$. With this we can now fix $\hat{v}$ and first solve a helper system $C = - [\partial_v g]^{-1} \partial_u g$ (which is reasonably because $\partial_v g$ is block-diagonal). The remainder works as for the Schur complement to get the correct jacobian for quadratic convergence for the nonlinear solve.

With this method I am not even sure where to start, because on the one hand the extra structural information has to be passed down to the nonlinear solver and then we have the local/nested nonlinear solver. Higher order time discretizations follow a similar strategy. I can provide papers on this if wanted. I am also willing to implement prototypes, but I am really know stuck where to start and which kinds of designs are even acceptable for the SciML ecosystem. I mean the obvious one is "just define another problem" like LocalGlobalODEProblem/LocalGlobalDAEProblem/..., dispatch on suitable existing solvers like the SDIRK family and a specific nonlinear solve for this problem like MultilevelNewtonSolver, but I am not confident yet that this is really the way to go.

cc @frankschae from the slack discussion

ChrisRackauckas · 2023-10-16T10:04:50Z

What I meant is that using the NonlinearSolve.jl interface would make it have a very well-defined and documented way to be easily extended. We just need to get it the right hooks to fully replace the internal nonlinear solve, and then we can just use that. Once we are there, all of the things you want to do would just be done via adding new (maybe specialized) solvers to NonlinearProblem.

Using information about the Schur complement
Multilevel-Newton / Nested Solvers

These are not uncommon things, and indeed adding special solvers to NonlinearSolve.jl for these cases should be on the docket. It would be good to open an issue on the repo that describes these solvers in detail along with references. I could see these as good GSoC projects.

termi-official · 2023-10-31T12:57:42Z

What I meant is that using the NonlinearSolve.jl interface would make it have a very well-defined and documented way to be easily extended. We just need to get it the right hooks to fully replace the internal nonlinear solve, and then we can just use that. Once we are there, all of the things you want to do would just be done via adding new (maybe specialized) solvers to NonlinearProblem.

This is exactly my concern. Let us take the problem from the most recent my most recent answer. This would be an ODEProblem. I am still struggling to see how to pass down specific structural problem information down to the inner nonlinear solver. E.g. if we want to use a a (non-)linear solver where the Schur complement is directly used, then, to my understanding, the information about the block structure is already lost when we arrive at the solve stage. Should this information then implicitly be encoded in jac_prototype for ODEFunction or what is the idea here?

Using information about the Schur complement
Multilevel-Newton / Nested Solvers

These are not uncommon things, and indeed adding special solvers to NonlinearSolve.jl for these cases should be on the docket. It would be good to open an issue on the repo that describes these solvers in detail along with references.

Will grab this once I got an idea what changes need to go where.

I could see these as good GSoC projects.

Sure. We can go looking for students once the baseline is done and I gathered some experience with the SciML internals. I have already started but I need more time to explore the codebase and its design. From my side I would like to polish our Navier Stokes solver (https://ferrite-fem.github.io/Ferrite.jl/dev/tutorials/ns_vs_diffeq/) as a first study and add another example with Newmark (i.e. resolve #411) to Ferrite. Once I have done this we can take a look at the nonlinear preconditioning and/or multilevel newton for the next GSoC round. Does that sound reasonable?

ChrisRackauckas transferred this issue from SciML/DifferentialEquations.jl Jan 14, 2022

termi-official mentioned this issue May 30, 2022

Example Integration with DifferentialEquations.jl shown by the Bidomain model with FHN cell model Ferrite-FEM/Ferrite.jl#325

Draft

3 tasks

termi-official mentioned this issue Jun 29, 2022

Interface Design koehlerson/FESolvers.jl#2

Open

ChrisRackauckas mentioned this issue Oct 18, 2023

Completeness for use in OrdinaryDiffEq.jl SciML/NonlinearSolve.jl#247

Closed

sjdaines mentioned this issue Oct 20, 2023

Nonlinear solve TODO PALEOtoolkit/PALEOmodel.jl#76

Open

This was referenced Nov 9, 2023

Improve pressure handling in incompressible Navier Stokes example Ferrite-FEM/Ferrite.jl#508

Open

PDE and PDAE interface SciML/DifferentialEquations.jl#997

Open

ChrisRackauckas mentioned this issue Dec 28, 2023

ImplicitEuler fails to solve a nonlinear spring-damper problem SciML/DifferentialEquations.jl#1010

Closed

termi-official mentioned this issue Jan 10, 2024

Nonlinear Preconditioner Support SciML/NonlinearSolve.jl#351

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Nonlinear Solver Interface #1570

Nonlinear Solver Interface #1570

termi-official commented Jan 11, 2022

ChrisRackauckas commented Jan 16, 2022

termi-official commented Jan 17, 2022

ChrisRackauckas commented Jan 23, 2022

bgroenks96 commented May 8, 2023

termi-official commented May 8, 2023

oscardssmith commented Sep 26, 2023

termi-official commented Sep 26, 2023

oscardssmith commented Sep 26, 2023

ChrisRackauckas commented Sep 30, 2023

ChrisRackauckas commented Sep 30, 2023

termi-official commented Oct 9, 2023

ChrisRackauckas commented Oct 16, 2023

termi-official commented Oct 31, 2023

Nonlinear Solver Interface #1570

Nonlinear Solver Interface #1570

Comments

termi-official commented Jan 11, 2022

Multilevel-Newton

Quasi-newton methods

Globalization strategies

Nonlinear preconditioning

References

Problems

ChrisRackauckas commented Jan 16, 2022

termi-official commented Jan 17, 2022

ChrisRackauckas commented Jan 23, 2022

bgroenks96 commented May 8, 2023

termi-official commented May 8, 2023

oscardssmith commented Sep 26, 2023

termi-official commented Sep 26, 2023

oscardssmith commented Sep 26, 2023

ChrisRackauckas commented Sep 30, 2023

ChrisRackauckas commented Sep 30, 2023

termi-official commented Oct 9, 2023

Using information about the Schur complement

Multilevel-Newton / Nested Solvers

ChrisRackauckas commented Oct 16, 2023

termi-official commented Oct 31, 2023