Fully automatic GPU offloading of linear solves #273
Conversation
In my experience, hard-coded defaults don't work well over time. I personally prefer either making it easy for the user to switch (the simple option, with a predictable performance model) or having some kind of auto-tuning that can adapt to a variety of GPUs.
Playing with:

```julia
using OrdinaryDiffEq
using Plots   # provides gr()
using Random
Random.seed!(123)
gr()

# 2D Linear ODE
function f(du,u,p,t)
    @inbounds for i in eachindex(u)
        du[i] = 1.01*u[i]
    end
end
function f_analytic(u₀,p,t)
    u₀*exp(1.01*t)
end
tspan = (0.0,10.0)
prob = ODEProblem(ODEFunction(f,analytic=f_analytic),rand(3000,1),tspan)

abstols = 1.0 ./ 10.0 .^ (3:13)
reltols = 1.0 ./ 10.0 .^ (0:10);

@time solve(prob,Rodas5())
using LinearAlgebra
@time solve(prob,Rodas5(linsolve = LinSolveFactorize(lu!)))
```

Seems highly architecture dependent in a way that can't be understood by querying memory. You really do need to know the number of CUDA cores to do this right. I think I might spawn this off to a package which type pirates.
```julia
# Piracy, should get upstreamed
function Base.ldiv!(x::CuArrays.CuArray,_qr::CuArrays.CUSOLVER.CuQR,b::CuArrays.CuArray)
    _x = UpperTriangular(_qr.R) \ (_qr.Q' * reshape(b,length(b),1))
```
Is it possible to do it in-place?

```julia
ldiv!(UpperTriangular(_qr.R), mul!(x, _qr.Q', reshape(b,length(b),1)))
```
Does that work? If it does, open up a separate PR on that.
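For reference, the in-place pattern can be checked against the standard-library QR on the CPU (a sketch only; the `CuQR` types above may differ in exactly which in-place methods they support):

```julia
using LinearAlgebra

# CPU analogue of the suggested in-place solve: x = R \ (Q' * b).
A = [4.0 1.0; 1.0 3.0]
b = [1.0, 2.0]
F = qr(A)

x = similar(b)
mul!(x, F.Q', b)                # x .= Q' * b, no temporary
ldiv!(UpperTriangular(F.R), x)  # x .= R \ x, in place

@assert x ≈ A \ b
```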
@maleadt is there a way to query for the number of CUDA cores?
It's not readily available, but you can compute it from other attributes.
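As a sketch of that computation: total cores = SM count × cores per SM, where cores per SM depends on the compute capability. The table below covers only a few well-known architectures and would need to be kept current; how you query the SM count and compute capability from the driver is not shown here.

```julia
# FP32 cores per streaming multiprocessor, keyed on compute capability.
function cores_per_sm(major::Int, minor::Int)
    major == 3 && return 192                      # Kepler
    major == 5 && return 128                      # Maxwell
    major == 6 && return (minor == 0 ? 64 : 128)  # Pascal (P100 vs. consumer)
    major == 7 && return 64                       # Volta / Turing
    error("unknown compute capability $major.$minor")
end

total_cuda_cores(sm_count::Int, major::Int, minor::Int) =
    sm_count * cores_per_sm(major, minor)

# e.g. a GTX 1080 reports 20 SMs at compute capability 6.1:
total_cuda_cores(20, 6, 1)  # 2560
```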
Currently throws a warning:
I think this may warrant some discussion. In DiffEq, we like to have our defaults be as good as possible. This follows the idea of the Xeon Phi, which auto-offloaded large matrix computations when it knew they would give a speedup. We would like to do the same. This PR is not tuned yet, and we might want a stricter GPU memory restriction (and to make sure that the matrix will fit), but those are details.
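The kind of overridable default described above could be sketched as a size-based dispatch. The threshold here is a placeholder, not a tuned value, and the GPU branch is shown only as a comment so the sketch stays CPU-runnable:

```julia
using LinearAlgebra

const GPU_THRESHOLD = 1000   # placeholder crossover dimension, not tuned

should_offload(A::AbstractMatrix) = size(A, 1) >= GPU_THRESHOLD

function autosolve(A::AbstractMatrix, b::AbstractVector)
    if should_offload(A)
        # In the PR, this branch would move A and b to the GPU and use
        # CUSOLVER's factorization, e.g. Array(qr(CuArray(A)) \ CuArray(b)).
    end
    return A \ b   # default: dense factorization on the CPU
end

autosolve([2.0 0.0; 0.0 4.0], [2.0, 8.0])  # [1.0, 2.0]
```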
The question isn't whether we want to do something like this (we definitely want it as an overridable default), but whether this is a good or correct way to do it. Using Requires.jl would require that the user does

```julia
using CuArrays
```

which defeats the whole purpose, since it then depends on the user remembering to add a `using` statement for a package they don't otherwise use. I am curious if @vchuravy and @StefanKarpinski have comments.
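For concreteness, the Requires.jl pattern under discussion looks roughly like the following; the UUID should be checked against CuArrays' Project.toml in the General registry:

```julia
# Sketch: GPU code paths load only if the user also does `using CuArrays`.
using Requires

function __init__()
    @require CuArrays="3a865a2d-5b23-5a0f-bc46-62713ec82fae" begin
        # GPU-specific linsolve defaults would be defined here,
        # e.g. the ldiv! method from the diff above.
    end
end
```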