Fully automatic GPU offloading of linear solves #273

Closed · 4 commits

Conversation

@ChrisRackauckas (Member) commented on Jul 4, 2019:

Currently throws a warning:

┌ Warning: Package DiffEqBase does not have CuArrays in its dependencies:
│ - If you have DiffEqBase checked out for development and have
│   added CuArrays as a dependency but haven't updated your primary
│   environment's manifest file, try `Pkg.resolve()`.
│ - Otherwise you may need to report an issue with DiffEqBase
└ Loading CuArrays into DiffEqBase from project dependency, future warnings for DiffEqBase are suppressed.

I think this may warrant some discussion. In DiffEq, we like our defaults to be as good as possible. This follows the idea of the Xeon Phi, which auto-offloaded large matrix computations when it knew doing so would give a speedup. We would like to do something similar. This PR is not tuned yet, and we might want a stricter GPU memory restriction (and to make sure that the matrix will fit), but those are details.

The question isn't whether we want something like this (we definitely want it as an overridable default), but whether this is a good or correct way to do it. Using Requires.jl would require the user to do `using CuArrays`, which defeats the whole purpose, since it then hinges on the user remembering to add a `using` statement for a package they never call directly.
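To make the discussion concrete, here is a minimal sketch of the kind of overridable, size-based default being proposed. This is not the PR's implementation: autooffload_linsolve! and OFFLOAD_CUTOFF are made-up names, and the GPU path simply mirrors the CuQR solve from the diff further down.

using LinearAlgebra, CuArrays

# Hypothetical cutoff: small systems stay on the CPU. As discussed below, a good
# value depends on the device itself, not just on available memory.
const OFFLOAD_CUTOFF = 1000

# Solve A*x = b, offloading the factorization to the GPU only for large systems.
function autooffload_linsolve!(x::AbstractVector, A::AbstractMatrix, b::AbstractVector)
    if size(A, 1) >= OFFLOAD_CUTOFF
        F = qr(CuArray(A))                          # CUSOLVER QR, as in this PR
        y = UpperTriangular(F.R) \ (F.Q' * reshape(CuArray(b), length(b), 1))
        copyto!(x, vec(Array(y)))                   # bring the solution back to the host
    else
        ldiv!(x, lu!(Matrix(A)), b)                 # plain CPU LU factorization
    end
    return x
end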

I am curious if @vchuravy and @StefanKarpinski have comments.

Commit: fix cuify
@ViralBShah commented:
In my experience, hard-coded defaults don't work well over time. I personally prefer either making it easy for the user to switch (the simple option, with a predictable performance model), or having some kind of auto-tuning that can adapt to a variety of GPUs.

@ChrisRackauckas (Member, Author) commented:

Playing with:

using OrdinaryDiffEq
using Random
Random.seed!(123)
# Linear ODE du/dt = 1.01u on a 3000×1 array
function f(du,u,p,t)
  @inbounds for i in eachindex(u)
    du[i] = 1.01*u[i]
  end
end
function f_analytic(u₀,p,t)
  u₀*exp(1.01*t)
end
tspan = (0.0,10.0)
prob = ODEProblem(ODEFunction(f,analytic=f_analytic),rand(3000,1),tspan)

abstols = 1.0 ./ 10.0 .^ (3:13)
reltols = 1.0 ./ 10.0 .^ (0:10);

@time solve(prob,Rodas5())                                   # default linear solver
using LinearAlgebra
@time solve(prob,Rodas5(linsolve = LinSolveFactorize(lu!)))  # force a CPU LU factorization

The crossover point seems highly architecture-dependent in a way that can't be captured by querying memory alone. You really do need to know the number of CUDA cores to do this right.

I think I might spin this off into a package which type-pirates ldiv! and * to check sizes and then decide whether to offload.
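As a purely hypothetical sketch of what that spin-off package's piracy could look like (no such package exists yet, and MATMUL_CUTOFF is a placeholder), dense matrix multiplication could be intercepted and bounced to the GPU only past a size cutoff:

using LinearAlgebra, CuArrays

const MATMUL_CUTOFF = 500   # placeholder; tuning this well is exactly the problem discussed here

# Pirated dense matmul: go through CUBLAS only when the operands are large enough
# to amortize the host<->device transfers; otherwise fall back to CPU BLAS.
function Base.:*(A::Matrix{Float64}, B::Matrix{Float64})
    if min(size(A)..., size(B)...) >= MATMUL_CUTOFF
        return Array(CuArray(A) * CuArray(B))       # CUBLAS gemm, then copy back
    else
        C = Matrix{Float64}(undef, size(A, 1), size(B, 2))
        return mul!(C, A, B)                        # regular CPU gemm; does not recurse into *
    end
end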


Review thread on this hunk of the diff:

# Piracy, should get upstreamed
function Base.ldiv!(x::CuArrays.CuArray, _qr::CuArrays.CUSOLVER.CuQR, b::CuArrays.CuArray)
    _x = UpperTriangular(_qr.R) \ (_qr.Q' * reshape(b, length(b), 1))
    # ... (rest of the method not shown in this hunk)
A reviewer (Member) commented:

Is it possible to do it in-place?

ldiv!(UpperTriangular(_qr.R), mul!(x, _qr.Q', reshape(b,length(b),1)))

@ChrisRackauckas (Member, Author) replied:

Does that work? If it does, open up a separate PR on that.

@ChrisRackauckas (Member, Author) commented on Feb 2, 2020:

@maleadt is there a way to query for the number of CUDA cores?

@maleadt (Contributor) commented on Feb 2, 2020:

It's not readily available, but you can compute it from other attributes (CUDAdrv.attribute(::CuDevice, attribute)): https://stackoverflow.com/a/32531982
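Following that answer, a rough sketch would multiply the multiprocessor count by a cores-per-SM table keyed on compute capability. The attribute constant name below is a guess and varies across CUDAdrv versions (the driver API calls it CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT), so treat this as illustrative only.

using CUDAdrv

# Cores per SM by compute capability, per the linked Stack Overflow answer
# (illustrative and incomplete; newer architectures need extra entries).
const CORES_PER_SM = Dict(v"3.0" => 192, v"3.5" => 192, v"3.7" => 192,
                          v"5.0" => 128, v"5.2" => 128, v"6.0" => 64,
                          v"6.1" => 128, v"6.2" => 128, v"7.0" => 64, v"7.5" => 64)

function cuda_cores(dev::CuDevice = CuDevice(0))
    sms = CUDAdrv.attribute(dev, CUDAdrv.MULTIPROCESSOR_COUNT)  # number of SMs (constant name may differ)
    cc  = CUDAdrv.capability(dev)                               # compute capability as a VersionNumber
    return sms * get(CORES_PER_SM, VersionNumber(cc.major, cc.minor), 64)  # fall back to a guess of 64
end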
