confusing Jacobian options... #800
Yes, it does matrix coloring; no, it doesn't compute Jv automatically, because that's normally much slower.
Yes
It means the OpenBLAS banded matrix solver is slower than RecursiveFactorization.jl.
Newton-Krylov methods are slow without really good problem-specific preconditioners.
We need to get rid of …
Thanks for the explanation!
`autodiff` is orthogonal to the other options:

- `autodiff = true`, no `jac_prototype`: dense automatic differentiation is used.
- `autodiff = false`, no `jac_prototype`: dense numerical differentiation is used.
- `jac_prototype` set, `autodiff = true`: sparse automatic differentiation with matrix coloring is used.
- `jac_prototype` set, `autodiff = false`: sparse numerical differentiation with matrix coloring is used.

It's hard to describe it all in one place because there's a lot of codegen handling the full set of combinations, but the arguments are all orthogonal, and what you get are the combinations of the ideas.
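To make those combinations concrete, here is a minimal sketch on a toy problem (illustrative names; it assumes the `autodiff` solver keyword and `ODEFunction`'s `jac_prototype` keyword behave as described above):

```julia
using OrdinaryDiffEq, SparseArrays

f!(du, u, p, t) = (du .= -u)             # toy stiff-style right-hand side
u0, tspan = rand(10), (0.0, 1.0)
Jproto = spdiagm(0 => ones(10))          # sparsity pattern of the Jacobian

# No jac_prototype: dense AD vs. dense numerical differentiation.
solve(ODEProblem(f!, u0, tspan), Rosenbrock23(autodiff = true))
solve(ODEProblem(f!, u0, tspan), Rosenbrock23(autodiff = false))

# jac_prototype set: sparse AD vs. sparse numerical differentiation,
# both with matrix coloring derived from the sparsity pattern.
fun = ODEFunction(f!; jac_prototype = Jproto)
solve(ODEProblem(fun, u0, tspan), Rosenbrock23(autodiff = true))
solve(ODEProblem(fun, u0, tspan), Rosenbrock23(autodiff = false))
```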
Yes, and that's what happens by default. So why is your case slower? It's probably because the BandedMatrix solver is slower than our specialized LU factorization. The banded solve uses OpenBLAS by default, whereas the default linear solver for dense matrices uses a special library called RecursiveFactorization.jl. What's the difference?

OpenBLAS:

```julia
using BenchmarkTools, LinearAlgebra
A = rand(40, 40)
@btime lu!(A) # 124.600 μs (2 allocations: 432 bytes)
@btime lu(A)  # 128.400 μs (3 allocations: 13.05 KiB)
```

vs RecursiveFactorization.jl:

```julia
using BenchmarkTools, LinearAlgebra, RecursiveFactorization
A = rand(40, 40)
@btime RecursiveFactorization.lu!(A) # 3.112 μs (3 allocations: 464 bytes)
@btime RecursiveFactorization.lu(A)  # 3.525 μs (3 allocations: 13.05 KiB)
```

Yes, we have written a pure Julia factorization method which outperforms the classic "BLAS is too fast, don't even try" wisdom by roughly 40x at this size, and that's what's being called by default. Now, the BandedMatrix path uses the OpenBLAS banded solver, so it's not as slow as the dense OpenBLAS LU shown here, but the fact that it's still going through OpenBLAS at all eats away at the performance gain. It should still be asymptotically better (an O(m) algorithm compared to the O(n^3) of a dense LU), but for small enough matrices we have optimized the crap out of the pure Julia linear algebra tools, so the dense path probably still comes out on top. See https://www.youtube.com/watch?v=KQ8nvlURX4M for more details. The best thing to do would be to create a pure Julia banded matrix solver, but that will take time.

So what would be the truly optimal thing to do? Set the sparsity pattern or the color vector directly to specialize the automatic differentiation, but then use the dense matrix solver until we improve the banded matrix solvers.
Sounds great to have a native Julia BLAS!
When you want to use color differentiation to speed up the AD part, but a dense matrix for the linear solves. Your case would actually be faster with that split; see the sketch below.
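A hypothetical sketch of that split, assuming `SparseDiffTools.matrix_colors` and the `colorvec`/`sparsity`/`jac_prototype` keywords of `ODEFunction` (exact keywords may differ across versions): the banded structure drives the coloring for AD, while a dense `jac_prototype` keeps the linear solve on the fast dense LU path.

```julia
using OrdinaryDiffEq, BandedMatrices, SparseDiffTools

N = 100
function f!(du, u, p, t)                  # toy tridiagonal right-hand side
    du[1] = -2u[1] + u[2]
    for i in 2:N-1
        du[i] = u[i-1] - 2u[i] + u[i+1]
    end
    du[N] = u[N-1] - 2u[N]
end

band   = BandedMatrix(Ones(N, N), (1, 1)) # banded structure of the Jacobian
colors = matrix_colors(band)              # 3 colors instead of N AD sweeps

fun = ODEFunction(f!; jac_prototype = Matrix(band), # dense storage → dense LU
                      sparsity = band,              # structure used for coloring
                      colorvec = colors)
prob = ODEProblem(fun, rand(N), (0.0, 1.0))
sol  = solve(prob, Rosenbrock23())
```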
They are pretty much the same 😓, except `JacVecOperator` is an `AbstractDiffEqOperator`. But I want to make that automatic in DiffEq.
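For reference, the matrix-free pattern looks roughly like this (a sketch based on the documented `JacVecOperator` usage; `LinSolveGMRES` and the keyword names are as I recall them and may differ by version). The `autodiff` flag inside `JacVecOperator` chooses how the `J*v` products themselves are computed, forward-mode AD vs. numerical differencing, so it is separate from the algorithm's `autodiff` option:

```julia
using OrdinaryDiffEq, DiffEqOperators

f!(du, u, p, t) = (du .= -u)                  # toy right-hand side
u0, tspan = rand(100), (0.0, 1.0)

# Matrix-free J*v products via forward-mode AD (autodiff = false would
# switch the products to numerical differencing instead).
Jv = JacVecOperator{Float64}(f!, u0; autodiff = true)

fun  = ODEFunction(f!; jac_prototype = Jv)
prob = ODEProblem(fun, u0, tspan)

# Newton-Krylov: GMRES linear solves. Without a good problem-specific
# preconditioner this is usually slow, per the comment above.
sol = solve(prob, TRBDF2(linsolve = LinSolveGMRES()))
```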
I read the documentation carefully but still can't understand the relationships among the Jacobian options, or what the default options are. Is it possible to show a chart of how these options are related? For example:

- How does `autodiff=true` inside the algorithm combine with `jac_prototype` and `linsolve`?
- Does `autodiff=true` do matrix coloring and Jacobian-vector products automatically?
- Does `autodiff=true` come with the default `linsolve=DefaultLinSolve`?
- When I set `jac_prototype` using BandedMatrices together with `autodiff=true`, the result is much slower than `autodiff=true` alone. Does this mean the specialized banded matrix factorization was not used?
- `autodiff=true` with `jac_prototype=JacVecOperator()` is much slower than `autodiff=true` alone. Why?
- There is an `autodiff` option inside `JacVecOperator()`. Is it redundant to enable both `autodiff`s?