
Support for GPU codes? #1002

Closed
renatobellotti opened this issue Aug 22, 2022 · 3 comments

Comments

@renatobellotti

Hi,

I wonder whether Optim.jl supports efficient optimisation on the GPU. For me this is essential: each function evaluation is quite expensive, and I have a large design vector (length ~10^5) that should stay on the GPU throughout the optimisation to avoid unnecessary host/device communication.

Here is a minimal example of a simple optimisation that does not seem to work:

using Optim
using CUDA  # provides `cu` for moving arrays onto the GPU

# Objective: squared Euclidean norm.
function test(x)
    return sum(x.^2)
end

# In-place gradient of `test`; broadcast assignment keeps the result on the device.
function ∇test!(gradient, x)
    gradient .= 2 .* x
end

# This works:
result = optimize(test, ∇test!, [1., 2.])
# This does not:
result = optimize(test, ∇test!, cu([1., 2.]))

Error message:

CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}

DivideError: integer division error

Stacktrace:
  [1] macro expansion
    @ ~/.julia/packages/CUDA/DfvRa/lib/cublas/libcublas.jl:231 [inlined]
  [2] macro expansion
    @ ~/.julia/packages/CUDA/DfvRa/src/pool.jl:232 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/CUDA/DfvRa/lib/cublas/error.jl:61 [inlined]
  [4] cublasSdot_v2(handle::Ptr{Nothing}, n::Int64, x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, incx::Int64, y::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, incy::Int64, result::Base.RefValue{Float32})
    @ CUDA.CUBLAS ~/.julia/packages/CUDA/DfvRa/lib/utils/call.jl:26
  [5] dot
    @ ~/.julia/packages/CUDA/DfvRa/lib/cublas/wrappers.jl:142 [inlined]
  [6] dot(x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, y::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
    @ CUDA.CUBLAS ~/.julia/packages/CUDA/DfvRa/lib/cublas/linalg.jl:18
  [7] dot
    @ ~/.julia/packages/Optim/rpjtl/src/multivariate/precon.jl:20 [inlined]
  [8] perform_linesearch!(state::Optim.LBFGSState{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Vector{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Vector{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, method::LBFGS{Nothing, LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Optim.var"#19#21"}, d::Optim.ManifoldObjective{OnceDifferentiable{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}})
    @ Optim ~/.julia/packages/Optim/rpjtl/src/utilities/perform_linesearch.jl:43
  [9] update_state!(d::OnceDifferentiable{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, state::Optim.LBFGSState{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Vector{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Vector{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, method::LBFGS{Nothing, LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Optim.var"#19#21"})
    @ Optim ~/.julia/packages/Optim/rpjtl/src/multivariate/solvers/first_order/l_bfgs.jl:204
 [10] optimize(d::OnceDifferentiable{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, initial_x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, method::LBFGS{Nothing, LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Optim.var"#19#21"}, options::Optim.Options{Float64, Nothing}, state::Optim.LBFGSState{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, Vector{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Vector{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}})
    @ Optim ~/.julia/packages/Optim/rpjtl/src/multivariate/optimize/optimize.jl:54
 [11] optimize(d::OnceDifferentiable{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, initial_x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, method::LBFGS{Nothing, LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Optim.var"#19#21"}, options::Optim.Options{Float64, Nothing})
    @ Optim ~/.julia/packages/Optim/rpjtl/src/multivariate/optimize/optimize.jl:36
 [12] optimize(f::Function, g::Function, initial_x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}; inplace::Bool, autodiff::Symbol, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Optim ~/.julia/packages/Optim/rpjtl/src/multivariate/optimize/interface.jl:100
 [13] optimize(f::Function, g::Function, initial_x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer})
    @ Optim ~/.julia/packages/Optim/rpjtl/src/multivariate/optimize/interface.jl:94
 [14] top-level scope
    @ In[128]:1
 [15] eval
    @ ./boot.jl:373 [inlined]
 [16] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
    @ Base ./loading.jl:1196
@renatobellotti
Author

Are GPU evaluations supported?

@johnmyleswhite
Contributor

I think you will likely get a better answer if you ask a slightly more precise question, given that GPU evaluations are supported and other people have worked with them in the past (e.g. #946). Is your goal to use CuArray with L-BFGS?
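
For reference, here is a minimal sketch of that combination (CuArray with L-BFGS). It is illustrative rather than taken from the thread: the function names, the Float32 initial point, and the assumption of a CUDA.jl/Optim.jl pairing on which this path works (as the resolution below suggests) are all the editor's.

using Optim
using CUDA

# Objective and in-place gradient written with broadcasting and reductions only,
# so the same definitions run on CPU arrays and on CuArrays.
f(x) = sum(abs2, x)
g!(G, x) = (G .= 2 .* x; G)

x0 = CUDA.fill(1.0f0, 100_000)   # Float32 design vector allocated on the GPU
result = optimize(f, g!, x0, LBFGS())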

@renatobellotti
Author

I don't know why this example code did not work before. It does now, and I can use my GPU evaluations, so I'm closing this issue.
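
As a quick check (again illustrative, not from the thread), the minimizer can be inspected to confirm that the solution never left the device:

xmin = Optim.minimizer(result)
xmin isa CuArray   # true when the optimisation ran entirely on the GPU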
