GPU tests fail on GTX970 and P100 #57

Closed
leios opened this issue Mar 14, 2022 · 7 comments
Comments

@leios
Contributor

leios commented Mar 14, 2022

I could not get the tests to work on my GTX970 GPU. It seems like there is an issue with:

function DistanceVecNeighborFinder(;
                                nb_matrix,
                                matrix_14=falses(size(nb_matrix)),
                                n_steps=10,
                                dist_cutoff)
    n_atoms = size(nb_matrix, 1)
    if isa(nb_matrix, CuArray)
        is = cu(hcat([collect(1:n_atoms) for i in 1:n_atoms]...))
        js = cu(permutedims(is, (2, 1)))
        m14 = cu(matrix_14)
    else
        is = hcat([collect(1:n_atoms) for i in 1:n_atoms]...)
        js = permutedims(is, (2, 1))
        m14 = matrix_14
    end
    return DistanceVecNeighborFinder{typeof(dist_cutoff), typeof(nb_matrix), typeof(is)}(
            nb_matrix, m14, n_steps, dist_cutoff, is, js)
end

Specifically, this happens when it is called from test/protein.jl.

I think permutedims doesn't work on a CuArray here, so I tried keeping the indices as a regular array, but eventually ran into an issue with turning is into an array for the DistanceVecNeighborFinder. I tried a bunch of different variations.
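One of the variations was roughly along these lines (a rough sketch from memory, not the exact code I ran, assuming nb_matrix and matrix_14 are defined as in the constructor above): build the index matrices on the CPU and only transfer them afterwards, so permutedims never has to allocate on the device.

using CUDA

# Hypothetical workaround sketch: construct the n_atoms x n_atoms index
# matrices on the CPU, then move the finished arrays to the GPU with `cu`,
# instead of calling permutedims on a CuArray.
n_atoms = size(nb_matrix, 1)
is_cpu = hcat([collect(1:n_atoms) for i in 1:n_atoms]...)  # every column is 1:n_atoms, so is_cpu[i, j] == i
js_cpu = permutedims(is_cpu, (2, 1))                       # transpose, computed on the CPU
is = isa(nb_matrix, CuArray) ? cu(is_cpu) : is_cpu
js = isa(nb_matrix, CuArray) ? cu(js_cpu) : js_cpu
m14 = isa(nb_matrix, CuArray) ? cu(matrix_14) : matrix_14

Since none of the variations worked, I'll just leave the error from the unchanged code here: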

OpenMM protein comparison: Error During Test at /home/leios/projects/Molly.jl/test/protein.jl:54
  Got exception outside of a @test
  MethodError: no method matching iterate(::Nothing)
  Closest candidates are:
    iterate(::Union{LinRange, StepRangeLen}) at ~/builds/julia-1.7.1/share/julia/base/range.jl:826
    iterate(::Union{LinRange, StepRangeLen}, ::Integer) at ~/builds/julia-1.7.1/share/julia/base/range.jl:826
    iterate(::T) where T<:Union{Base.KeySet{<:Any, <:Dict}, Base.ValueIterator{<:Dict}} at ~/builds/julia-1.7.1/share/julia/base/dict.jl:695
    ...
  Stacktrace:
    [1] indexed_iterate(I::Nothing, i::Int64)
      @ Base ./tuple.jl:92
    [2] CUDA.MemoryInfo()
      @ CUDA ~/.julia/packages/CUDA/VWaZ6/src/pool.jl:155
    [3] OutOfGPUMemoryError (repeats 2 times)
      @ ~/.julia/packages/CUDA/VWaZ6/src/pool.jl:199 [inlined]
    [4] throw_api_error(res::CUDA.cudaError_enum)
      @ CUDA ~/.julia/packages/CUDA/VWaZ6/lib/cudadrv/error.jl:89
    [5] macro expansion
      @ ~/.julia/packages/CUDA/VWaZ6/lib/cudadrv/error.jl:101 [inlined]
    [6] cuMemAlloc_v2(dptr::Base.RefValue{CuPtr{Nothing}}, bytesize::Int64)
      @ CUDA ~/.julia/packages/CUDA/VWaZ6/lib/utils/call.jl:26
    [7] #alloc#1
      @ ~/.julia/packages/CUDA/VWaZ6/lib/cudadrv/memory.jl:86 [inlined]
    [8] macro expansion
      @ ~/.julia/packages/CUDA/VWaZ6/src/pool.jl:41 [inlined]
    [9] macro expansion
      @ ./timing.jl:299 [inlined]
   [10] actual_alloc(bytes::Int64; async::Bool, stream::CuStream)
      @ CUDA ~/.julia/packages/CUDA/VWaZ6/src/pool.jl:39
   [11] macro expansion
      @ ~/.julia/packages/CUDA/VWaZ6/src/pool.jl:224 [inlined]
   [12] macro expansion
      @ ./timing.jl:299 [inlined]
   [13] #_alloc#204
      @ ~/.julia/packages/CUDA/VWaZ6/src/pool.jl:305 [inlined]
   [14] #alloc#203
      @ ~/.julia/packages/CUDA/VWaZ6/src/pool.jl:291 [inlined]
   [15] alloc
      @ ~/.julia/packages/CUDA/VWaZ6/src/pool.jl:287 [inlined]
   [16] CuArray{Int64, 2, CUDA.Mem.DeviceBuffer}(#unused#::UndefInitializer, dims::Tuple{Int64, Int64})
      @ CUDA ~/.julia/packages/CUDA/VWaZ6/src/array.jl:42
   [17] similar
      @ ~/.julia/packages/CUDA/VWaZ6/src/array.jl:164 [inlined]
   [18] permutedims(B::CuArray{Int64, 2, CUDA.Mem.DeviceBuffer}, perm::Tuple{Int64, Int64})
      @ Base ./multidimensional.jl:1503
   [19] DistanceVecNeighborFinder(; nb_matrix::CuArray{Bool, 2, CUDA.Mem.DeviceBuffer}, matrix_14::CuArray{Bool, 2, CUDA.Mem.DeviceBuffer}, n_steps::Int64, dist_cutoff::Quantity{Float64, 𝐋, Unitful.FreeUnits{(nm,), 𝐋, nothing}})
      @ Molly ~/projects/Molly.jl/src/neighbors.jl:134
   [20] System(coord_file::String, force_field::OpenMMForceField{Float64, Quantity{Float64, 𝐌, Unitful.FreeUnits{(u,), 𝐌, nothing}}, Quantity{Float64, 𝐋, Unitful.FreeUnits{(nm,), 𝐋, nothing}}, Quantity{Float64, 𝐋^2 𝐌 𝐍^-1 𝐓^-2, Unitful.FreeUnits{(kJ, mol^-1), 𝐋^2 𝐌 𝐍^-1 𝐓^-2, nothing}}, Quantity{Float64, 𝐌 𝐍^-1 𝐓^-2, Unitful.FreeUnits{(kJ, nm^-2, mol^-1), 𝐌 𝐍^-1 𝐓^-2, nothing}}}; velocities::CuArray{SVector{3, Quantity{Float64, 𝐋 𝐓^-1, Unitful.FreeUnits{(nm, ps^-1), 𝐋 𝐓^-1, nothing}}}, 1, CUDA.Mem.DeviceBuffer}, box_size::Nothing, loggers::Dict{Any, Any}, units::Bool, gpu::Bool, gpu_diff_safe::Bool, dist_cutoff::Quantity{Float64, 𝐋, Unitful.FreeUnits{(nm,), 𝐋, nothing}}, nl_dist::Quantity{Float64, 𝐋, Unitful.FreeUnits{(nm,), 𝐋, nothing}}, rename_terminal_res::Bool)
      @ Molly ~/projects/Molly.jl/src/setup.jl:678
   [21] macro expansion
      @ ~/projects/Molly.jl/test/protein.jl:156 [inlined]
   [22] macro expansion
      @ ~/builds/julia-1.7.1/share/julia/stdlib/v1.7/Test/src/Test.jl:1283 [inlined]
   [23] top-level scope
      @ ~/projects/Molly.jl/test/protein.jl:55
   [24] include(fname::String)
      @ Base.MainInclude ./client.jl:451
   [25] top-level scope
      @ ~/projects/Molly.jl/test/runtests.jl:71
   [26] include(fname::String)
      @ Base.MainInclude ./client.jl:451
   [27] top-level scope
      @ none:6
   [28] eval
      @ ./boot.jl:373 [inlined]
   [29] exec_options(opts::Base.JLOptions)
      @ Base ./client.jl:268
   [30] _start()
      @ Base ./client.jl:495

This could be related to #16, but I felt it was different enough to warrant a separate issue.

@jgreener64
Collaborator

It's not the clearest error, but the key line is [3] OutOfGPUMemoryError, indicating that there isn't enough GPU memory for the test.

The package makes poor use of GPU memory at the moment due to the requirements of Zygote, so the 4 GB of a GTX 970 isn't enough to run the ~16k atoms in the protein test. Sorry about that; hopefully it will change in the future.
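For a rough sense of scale (my own back-of-the-envelope numbers, not exact figures for the test system), the is and js index matrices built by DistanceVecNeighborFinder are already close to 2 GiB each at that system size:

# Back-of-the-envelope estimate (assumed: ~16,000 atoms, Int64 indices)
n_atoms = 16_000
bytes_per_index_matrix = n_atoms^2 * sizeof(Int64)       # ≈ 2.0e9 bytes
gib_per_index_matrix = bytes_per_index_matrix / 1024^3   # ≈ 1.9 GiB, and two such matrices are allocated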

@leios
Contributor Author

leios commented Mar 14, 2022

Whoops, that's completely my bad. Sorry for the random issue then!

leios closed this as completed Mar 14, 2022
@jgreener64
Collaborator

No, it's fine, good to see people are using the software.

leios changed the title from "GPU tests fail on GTX970" to "GPU tests fail on GTX970 and P100" Sep 21, 2022
@leios
Contributor Author

leios commented Sep 21, 2022

I was just testing this on a P100 with 16 GB of available RAM and ran into a related issue:

OpenMM protein comparison: Error During Test at /home/leios/projects/CESMIX/Molly.jl/test/protein.jl:57
  Got exception outside of a @test
  Out of GPU memory trying to allocate 1.896 GiB
  Effective GPU memory usage: 99.90% (15.766 GiB/15.782 GiB)
  Memory pool usage: 2.372 GiB (3.219 GiB reserved)
  Stacktrace:
    [1] macro expansion
      @ ~/.julia/packages/CUDA/DfvRa/src/pool.jl:320 [inlined]
    [2] macro expansion
      @ ./timing.jl:382 [inlined]
    [3] #_alloc#170
      @ ~/.julia/packages/CUDA/DfvRa/src/pool.jl:313 [inlined]
    [4] #alloc#169
      @ ~/.julia/packages/CUDA/DfvRa/src/pool.jl:299 [inlined]
    [5] alloc
      @ ~/.julia/packages/CUDA/DfvRa/src/pool.jl:293 [inlined]
    [6] CuArray{Int64, 2, CUDA.Mem.DeviceBuffer}(#unused#::UndefInitializer, dims::Tuple{Int64, Int64})
      @ CUDA ~/.julia/packages/CUDA/DfvRa/src/array.jl:42
    [7] similar
      @ ~/.julia/packages/CUDA/DfvRa/src/array.jl:164 [inlined]
    [8] permutedims(B::CuArray{Int64, 2, CUDA.Mem.DeviceBuffer}, perm::Tuple{Int64, Int64})
      @ Base ./multidimensional.jl:1560
    [9] DistanceVecNeighborFinder(; nb_matrix::CuArray{Bool, 2, CUDA.Mem.DeviceBuffer}, matrix_14::CuArray{Bool, 2, CUDA.Mem.DeviceBuffer}, n_steps::Int64, dist_cutoff::Quantity{Float64, 𝐋, Unitful.FreeUnits{(nm,), 𝐋, nothing}})
      @ Molly ~/projects/CESMIX/Molly.jl/src/neighbors.jl:115
   [10] System(coord_file::String, force_field::OpenMMForceField{Float64, Quantity{Float64, 𝐌, Unitful.FreeUnits{(u,), 𝐌, nothing}}, Quantity{Float64, 𝐋, Unitful.FreeUnits{(nm,), 𝐋, nothing}}, Quantity{Float64, 𝐋^2 𝐌 𝐍^-1 𝐓^-2, Unitful.FreeUnits{(kJ, mol^-1), 𝐋^2 𝐌 𝐍^-1 𝐓^-2, nothing}}, Quantity{Float64, 𝐌 𝐍^-1 𝐓^-2, Unitful.FreeUnits{(kJ, nm^-2, mol^-1), 𝐌 𝐍^-1 𝐓^-2, nothing}}}; velocities::CuArray{SVector{3, Quantity{Float64, 𝐋 𝐓^-1, Unitful.FreeUnits{(nm, ps^-1), 𝐋 𝐓^-1, nothing}}}, 1, CUDA.Mem.DeviceBuffer}, boundary::Nothing, loggers::Tuple{}, units::Bool, gpu::Bool, gpu_diff_safe::Bool, dist_cutoff::Quantity{Float64, 𝐋, Unitful.FreeUnits{(nm,), 𝐋, nothing}}, dist_neighbors::Quantity{Float64, 𝐋, Unitful.FreeUnits{(nm,), 𝐋, nothing}}, implicit_solvent::Nothing, center_coords::Bool, rename_terminal_res::Bool, kappa::Quantity{Float64, 𝐋^-1, Unitful.FreeUnits{(nm^-1,), 𝐋^-1, nothing}})
      @ Molly ~/projects/CESMIX/Molly.jl/src/setup.jl:773
   [11] macro expansion
      @ ~/projects/CESMIX/Molly.jl/test/protein.jl:164 [inlined]
   [12] macro expansion
      @ ~/builds/julia-1.8.1/share/julia/stdlib/v1.8/Test/src/Test.jl:1357 [inlined]
   [13] top-level scope
      @ ~/projects/CESMIX/Molly.jl/test/protein.jl:58
   [14] include(fname::String)
      @ Base.MainInclude ./client.jl:476
   [15] top-level scope
      @ ~/projects/CESMIX/Molly.jl/test/runtests.jl:78
   [16] include(fname::String)
      @ Base.MainInclude ./client.jl:476
   [17] top-level scope
      @ none:6
   [18] eval
      @ ./boot.jl:368 [inlined]
   [19] exec_options(opts::Base.JLOptions)
      @ Base ./client.jl:276
   [20] _start()
      @ Base ./client.jl:522
Test Summary:             | Pass  Error  Total     Time
OpenMM protein comparison |   26      1     27  4m57.8s
ERROR: LoadError: Some tests did not pass: 26 passed, 0 failed, 1 errored, 0 broken.
in expression starting at /home/leios/projects/CESMIX/Molly.jl/test/protein.jl:57
in expression starting at /home/leios/projects/CESMIX/Molly.jl/test/runtests.jl:77
ERROR: Package Molly errored during testing

It seems to have flooded the available memory pool and cannot allocate more space. Could this be a garbage collection issue, where the tests run fine independently but fail when run together because memory from earlier tests has not been properly freed?
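If that is what is happening, one thing that might be worth trying between test files (just a guess on my part, not something I've confirmed helps) is forcing a full Julia GC pass and asking CUDA.jl to hand pooled memory back to the driver:

using CUDA

# Force a full garbage collection so unreachable CuArrays get finalized,
# then return freed blocks from CUDA.jl's memory pool to the driver.
GC.gc(true)
CUDA.reclaim()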

I'm re-opening this issue because it's another error in protein.jl.

How much GPU memory are we asking users to have for this test? Maybe we should just check the available memory and skip the test on GPUs that don't have enough.
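Something like this at the top of test/protein.jl could act as a guard (only a sketch; the 8 GiB threshold is a guess and I'm assuming CUDA.jl's available_memory here):

using CUDA

# Hypothetical guard: skip the GPU protein comparison when the device reports
# less free memory than some threshold (the 8 GiB figure is assumed, not measured).
const GPU_MEM_REQUIRED = 8 * 1024^3

run_gpu_protein_tests = CUDA.functional() &&
                        CUDA.available_memory() >= GPU_MEM_REQUIRED

if !run_gpu_protein_tests
    @info "Skipping OpenMM protein comparison: not enough free GPU memory"
end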

leios reopened this Sep 21, 2022
@jgreener64
Collaborator

GPU memory usage on master is very poor, and I honestly don't know the range of hardware it will work on. I'm hoping to switch to the kernel setup within the next couple of months, though, which should be much better. At that point it is probably worth doing a survey across different hardware and addressing any issues.

@leios
Contributor Author

leios commented Sep 21, 2022

Yeah, that's fair. I'll just quietly comment out the test for now in #99.

@jgreener64
Collaborator

GPU memory usage should be much improved in v0.15.0.

Testing and benchmarking on different GPUs is on the todo list.
