Conversation

@avik-pal (Contributor) commented Jun 30, 2021

TODO:

  • Fails for the ArrayPartition type

MWE:

using Optim, CUDA
rosenbrock(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
result = optimize(rosenbrock, cu(zeros(2)), BFGS())

Previous behavior (before this PR):

ERROR: ArgumentError: cannot take the CPU address of a CuArray{Float32, 1}
Stacktrace:
  [1] unsafe_convert(#unused#::Type{Ptr{Float32}}, x::CuArray{Float32, 1})
    @ CUDA ~/.julia/packages/CUDA/mVgLI/src/array.jl:262
  [2] gemv!(trans::Char, alpha::Float32, A::Matrix{Float32}, X::CuArray{Float32, 1}, beta::Float32, Y::CuArray{Float32, 1})
    @ LinearAlgebra.BLAS /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/blas.jl:704
  [3] gemv!(y::CuArray{Float32, 1}, tA::Char, A::Matrix{Float32}, x::CuArray{Float32, 1}, α::Bool, β::Bool)
    @ LinearAlgebra /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:544
  [4] mul!
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:66 [inlined]
  [5] mul!
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:275 [inlined]
  [6] update_state!(d::OnceDifferentiable{Float32, CuArray{Float32, 1}, CuArray{Float32, 1}}, state::Optim.BFGSState{CuArray{Float32, 1}, Matrix{Float32}, Float32, CuArray{Float32, 1}}, method::BFGS{LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Nothing, Nothing, Flat})
    @ Optim ~/testing/Optim.jl/src/multivariate/solvers/first_order/bfgs.jl:119
  [7] optimize(d::OnceDifferentiable{Float32, CuArray{Float32, 1}, CuArray{Float32, 1}}, initial_x::CuArray{Float32, 1}, method::BFGS{LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Nothing, Nothing, Flat}, options::Optim.Options{Float64, Nothing}, state::Optim.BFGSState{CuArray{Float32, 1}, Matrix{Float32}, Float32, CuArray{Float32, 1}})
    @ Optim ~/testing/Optim.jl/src/multivariate/optimize/optimize.jl:57
  [8] optimize
    @ ~/testing/Optim.jl/src/multivariate/optimize/optimize.jl:35 [inlined]
  [9] #optimize#87
    @ ~/testing/Optim.jl/src/multivariate/optimize/interface.jl:142 [inlined]
 [10] optimize(f::Function, initial_x::CuArray{Float32, 1}, method::BFGS{LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Nothing, Nothing, Flat}, options::Optim.Options{Float64, Nothing}) (repeats 2 times)
    @ Optim ~/testing/Optim.jl/src/multivariate/optimize/interface.jl:141
 [11] top-level scope
    @ REPL[6]:1

Current behavior (with this PR):

    * Status: success

    * Candidate solution
       Final objective value:     1.982578e-05
   
    * Found with
       Algorithm:     BFGS
   
    * Convergence measures
       |x - x'|               = 0.00e+00 ≤ 0.0e+00
       |x - x'|/|x'|          = 0.00e+00 ≤ 0.0e+00
       |f(x) - f(x')|         = 0.00e+00 ≤ 0.0e+00
       |f(x) - f(x')|/|f(x')| = 0.00e+00 ≤ 0.0e+00
       |g(x)|                 = 1.37e-03 ≰ 1.0e-08
   
    * Work counters
       Seconds run:   0  (vs limit Inf)
       Iterations:    47
       f(x) calls:    971
       ∇f(x) calls:   971
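
The underlying fix is to express the BFGS inverse-Hessian update with whole-array operations (matvecs, broadcasts, outer products) rather than forms that assume a CPU `Matrix`, so the same code runs on any `AbstractArray` backend. A CPU-runnable sketch of the rank-two update (illustrative names, not Optim's actual internals):

```julia
using LinearAlgebra

# Sketch of the BFGS inverse-Hessian update
#   H⁺ = (I - ρ s yᵀ) H (I - ρ y sᵀ) + ρ s sᵀ,   ρ = 1 / (yᵀ s),
# in its expanded form, written with whole-array products and broadcasts
# only, so it works for a CPU Array and a CuArray alike.
# Assumes H is symmetric, as the BFGS approximation always is.
function bfgs_update_invH(H, s, y)
    ρ = inv(dot(y, s))
    Hy = H * y   # device-side matvec
    # H - ρ (s (Hy)ᵀ + Hy sᵀ) + (ρ² yᵀHy + ρ) s sᵀ
    return H .- ρ .* (s .* Hy' .+ Hy .* s') .+
           (ρ^2 * dot(y, Hy) + ρ) .* (s .* s')
end
```

The secant condition `H⁺ * y ≈ s` holds exactly for this update, which makes a handy sanity check.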

@ChrisRackauckas (Contributor) left a comment


You could fuse a few more operations, but this probably isn't the place for high-performance, nasty code, so this gives the right mix of generality and conciseness.

@avik-pal avik-pal changed the title BFGS GPU Support [WIP] BFGS GPU Support Jun 30, 2021
@avik-pal avik-pal changed the title [WIP] BFGS GPU Support BFGS GPU Support Jul 6, 2021
Project.toml (outdated)
Parameters = "d96e819e-fc66-5662-9728-84c9c7592b0a"
PositiveFactorizations = "85a6dd25-e78a-55b7-8502-1745935b8125"
Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
RecursiveArrayTools = "731186ca-8d62-57ce-b412-fbd966d074cd"


Remember to add a compat entry.

@pkofod (Member) commented Jul 26, 2021

Thanks. I had to approve CI runs since this is your first contribution. Looks good to me.

@codecov bot commented Jul 26, 2021

Codecov Report

Merging #931 (6300094) into master (e439de4) will increase coverage by 0.06%.
The diff coverage is 81.81%.


@@            Coverage Diff             @@
##           master     #931      +/-   ##
==========================================
+ Coverage   83.60%   83.67%   +0.06%     
==========================================
  Files          43       42       -1     
  Lines        3020     3019       -1     
==========================================
+ Hits         2525     2526       +1     
+ Misses        495      493       -2     
Impacted Files                                         Coverage Δ
src/multivariate/solvers/first_order/bfgs.jl           94.11% <81.81%> (+0.46%) ⬆️
...ariate/solvers/second_order/krylov_trust_region.jl  87.77% <0.00%> (-0.14%) ⬇️
src/utilities/generic.jl                               100.00% <0.00%> (ø)
src/multivariate/solvers/constrained/samin.jl          78.06% <0.00%> (+1.29%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@pkofod pkofod merged commit 79454aa into JuliaNLSolvers:master Jul 27, 2021
@vaishnavtv commented

Hello, I see in the release notes of v1.4 that this pull request has been merged. Does this mean that BFGS now has GPU support? I'm able to use LBFGS, but not BFGS, on GPU.
A MWE:

using GalacticOptim, CUDA, Optim, Zygote
CUDA.allowscalar(false)
lossFn(θ, p) = sum(abs, θ)
θ0 = cu(zeros(2));
f = OptimizationFunction(lossFn, GalacticOptim.AutoZygote())
prob = GalacticOptim.OptimizationProblem(f, θ0);
# sol1 = GalacticOptim.solve(prob, Optim.BFGS()); # doesn't work
# sol2 = GalacticOptim.solve(prob, Optim.LBFGS()); # works
Details:

ERROR: LoadError: Scalar indexing is disallowed.
Invocation of getindex resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore are only permitted from the REPL for prototyping purposes.
If you did intend to index this array, annotate the caller with @allowscalar.
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] assertscalar(op::String)
    @ GPUArrays /scratch/user/vish0908/.julia/packages/GPUArrays/0vqbc/src/host/indexing.jl:53
  [3] getindex(::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, ::Int64, ::Int64)
    @ GPUArrays /scratch/user/vish0908/.julia/packages/GPUArrays/0vqbc/src/host/indexing.jl:86
  [4] macro expansion
    @ /scratch/user/vish0908/.julia/packages/Optim/3K7JI/src/multivariate/solvers/first_order/bfgs.jl:166 [inlined]
  [5] macro expansion
    @ ./simdloop.jl:77 [inlined]
  [6] update_h!(d::TwiceDifferentiable{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, state::Optim.BFGSState{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, method::BFGS{LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Nothing, Nothing, Flat})
    @ Optim /scratch/user/vish0908/.julia/packages/Optim/3K7JI/src/multivariate/solvers/first_order/bfgs.jl:165
  [7] optimize(d::TwiceDifferentiable{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, initial_x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, method::BFGS{LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Nothing, Nothing, Flat}, options::Optim.Options{Float64, GalacticOptim.var"#_cb#103"{GalacticOptim.var"#101#109", BFGS{LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Nothing, Nothing, Flat}, Base.Iterators.Cycle{Tuple{GalacticOptim.NullData}}}}, state::Optim.BFGSState{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}})
    @ Optim /scratch/user/vish0908/.julia/packages/Optim/3K7JI/src/multivariate/optimize/optimize.jl:71
  [8] optimize(d::TwiceDifferentiable{Float32, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}, initial_x::CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, method::BFGS{LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Nothing, Nothing, Flat}, options::Optim.Options{Float64, GalacticOptim.var"#_cb#103"{GalacticOptim.var"#101#109", BFGS{LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Nothing, Nothing, Flat}, Base.Iterators.Cycle{Tuple{GalacticOptim.NullData}}}})
    @ Optim /scratch/user/vish0908/.julia/packages/Optim/3K7JI/src/multivariate/optimize/optimize.jl:35
  [9] __solve(prob::OptimizationProblem{true, OptimizationFunction{true, GalacticOptim.AutoZygote, typeof(lossFn3), Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, SciMLBase.NullParameters, Nothing, Nothing, Nothing, Nothing, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}}, opt::BFGS{LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Nothing, Nothing, Flat}, data::Base.Iterators.Cycle{Tuple{GalacticOptim.NullData}}; maxiters::Nothing, cb::Function, progress::Bool, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ GalacticOptim /scratch/user/vish0908/.julia/packages/GalacticOptim/bEh06/src/solve/optim.jl:55
 [10] __solve (repeats 2 times)
    @ /scratch/user/vish0908/.julia/packages/GalacticOptim/bEh06/src/solve/optim.jl:10 [inlined]
 [11] #solve#476
    @ /scratch/user/vish0908/.julia/packages/SciMLBase/n3U0M/src/solve.jl:3 [inlined]
 [12] solve(::OptimizationProblem{true, OptimizationFunction{true, GalacticOptim.AutoZygote, typeof(lossFn3), Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}, CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}, SciMLBase.NullParameters, Nothing, Nothing, Nothing, Nothing, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}}, ::BFGS{LineSearches.InitialStatic{Float64}, LineSearches.HagerZhang{Float64, Base.RefValue{Bool}}, Nothing, Nothing, Flat})
    @ SciMLBase /scratch/user/vish0908/.julia/packages/SciMLBase/n3U0M/src/solve.jl:3

@longemen3000 (Contributor) commented

The main culprit seems to be this function:

function update_h!(d, state, method::BFGS)
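
For context, the pattern that trips `CUDA.allowscalar(false)` is element-wise indexing inside a loop; an equivalent broadcast stays on the device. A minimal, CPU-runnable sketch with hypothetical names (a simplified rank-one term, not `update_h!`'s full update):

```julia
using LinearAlgebra

# Scalar-indexed rank-one update, the style of the @simd loop in update_h!:
# every H[i, j] access is a getindex/setindex! call, which GPU arrays
# disallow when allowscalar is off.
function rank1_scalar!(H, s, ρ)
    n = length(s)
    @inbounds for j in 1:n, i in 1:n
        H[i, j] += ρ * s[i] * s[j]
    end
    return H
end

# Broadcast form: one fused array operation with no scalar indexing,
# so the identical line also works for a CuArray.
rank1_broadcast!(H, s, ρ) = (H .+= ρ .* (s .* s'))
```

Both produce the same result on the CPU; only the second is admissible on the GPU.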
