This repository has been archived by the owner on Mar 12, 2021. It is now read-only.

Memory allocation tracing and bug fixes. #212

Merged: maleadt merged 1 commit into master from tb/trace_pool on Nov 29, 2018

Conversation

maleadt (Member) commented Nov 22, 2018

Developed when working on https://github.com/JuliaGPU/CuArrays.jl/issues/210:

$ CUARRAYS_MANAGED_POOL=true CUARRAYS_TRACE_POOL=true jj --compiled-modules=no
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.0.2 (2018-11-08)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using CuArrays

julia> As = []
0-element Array{Any,1}

julia> push!(As, CuArray{Float32}(undef, 1000, 1000, 1000));

julia> push!(As, CuArray{Float32}(undef, 1000, 1000, 1000));
┌ Error: Failed to allocate 3.725 GiB (requires 3.725 GiB buffer)
└ @ CuArrays ~/Julia/CuArrays/src/memory.jl:273
┌ Warning: Outstanding allocation of 3.725 GiB (requires 3.725 GiB buffer)
│   exception =
│    CUDA error: out of memory (code #2, ERROR_OUT_OF_MEMORY)
│    Stacktrace:
│     [1] macro expansion at ./util.jl:213 [inlined]
│     [2] alloc(::Int64) at /home/tbesard/Julia/CuArrays/src/memory.jl:225
│     [3] CuArray{Float32,3}(::UndefInitializer, ::Tuple{Int64,Int64,Int64}) at /home/tbesard/Julia/CuArrays/src/array.jl:29
│     [4] CuArray{Float32,N} where N(::UndefInitializer, ::Tuple{Int64,Int64,Int64}) at /home/tbesard/Julia/CuArrays/src/array.jl:36
│     [5] CuArray{Float32,N} where N(::UndefInitializer, ::Int64, ::Vararg{Int64,N} where N) at /home/tbesard/Julia/CuArrays/src/array.jl:37
│     [6] top-level scope at none:0
│     [7] eval(::Module, ::Any) at ./boot.jl:319
│     [8] eval_user_input(::Any, ::REPL.REPLBackend) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/REPL/src/REPL.jl:85
│     [9] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/REPL/src/REPL.jl:117 [inlined]
│     [10] (::getfield(REPL, Symbol("##28#29")){REPL.REPLBackend})() at ./task.jl:259
└ @ CuArrays ~/Julia/CuArrays/src/memory.jl:276
ERROR: CUDA error: out of memory (code #2, ERROR_OUT_OF_MEMORY)
Stacktrace:
 [1] macro expansion at /home/tbesard/Julia/CUDAdrv/src/base.jl:147 [inlined]
 [2] #alloc#3(::CUDAdrv.Mem.CUmem_attach, ::Function, ::Int64, ::Bool) at /home/tbesard/Julia/CUDAdrv/src/memory.jl:161
 [3] alloc at /home/tbesard/Julia/CUDAdrv/src/memory.jl:157 [inlined] (repeats 2 times)
 [4] (::getfield(CuArrays, Symbol("##17#18")))() at /home/tbesard/Julia/CuArrays/src/memory.jl:263
 [5] lock(::getfield(CuArrays, Symbol("##17#18")), ::ReentrantLock) at ./lock.jl:101
 [6] macro expansion at ./util.jl:213 [inlined]
 [7] alloc(::Int64) at /home/tbesard/Julia/CuArrays/src/memory.jl:225
 [8] CuArray{Float32,3}(::UndefInitializer, ::Tuple{Int64,Int64,Int64}) at /home/tbesard/Julia/CuArrays/src/array.jl:29
 [9] CuArray{Float32,N} where N(::UndefInitializer, ::Tuple{Int64,Int64,Int64}) at /home/tbesard/Julia/CuArrays/src/array.jl:36
 [10] CuArray{Float32,N} where N(::UndefInitializer, ::Int64, ::Vararg{Int64,N} where N) at /home/tbesard/Julia/CuArrays/src/array.jl:37
 [11] top-level scope at none:0

maleadt (Member, Author) commented Nov 22, 2018

bors try

bors bot added a commit that referenced this pull request Nov 22, 2018
maleadt (Member, Author) commented Nov 22, 2018

@staticfloat: IIRC you developed something similar while debugging FluxML; are there any features missing here? Also, since you looked at the profiler code recently: any suggestions for improving how the call stacks are saved? push!(..., stacktrace()) doesn't sound very efficient.
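
Concretely, the tracing amounts to something like the sketch below. This is a simplified illustration, not the actual memory.jl code; `alloc`/`free` here just stand in for the pool's real entry points:

```julia
# Simplified sketch: remember where each live buffer was allocated,
# and dump those stack traces when a later allocation fails.
const alloc_sites = Dict{Any,Vector{Base.StackTraces.StackFrame}}()

function traced_alloc(bytes)
    buf = alloc(bytes)               # the pool's real allocator
    alloc_sites[buf] = stacktrace()  # the potentially expensive part
    return buf
end

function traced_free(buf)
    delete!(alloc_sites, buf)
    free(buf)
end

function report_outstanding()
    for (buf, st) in alloc_sites
        @warn "Outstanding allocation" buffer=buf stacktrace=st
    end
end
```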

bors bot (Contributor) commented Nov 22, 2018

> try

Build succeeded

staticfloat (Contributor) commented

I started looking at GPU usage, but it was all manual and didn't get me very far. I'm currently working on something a little more fundamental for Julia: something that can build graphs similar to what Google gives you in the TPU cloud console.

The basic idea is that you'll do something like Profile.@memprofile foo() and it will capture relevant information about all allocations and deallocations within the given expression. The simplest output would be a cumulative "memory versus time" graph that looks like a sawtooth wave (memory usage grows until a GC run trims it back down), and the user could then strategically insert manual GC.gc() calls to debug where large objects are staying alive for a long time. Something more sophisticated could try to analyze the allocation sites and give an idea of which modules/functions are responsible for the most allocations. I'm going to build just enough of this to get a handle on our Metalhead issues, but it would be neat if we could do similar things for GPUs as well.
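
As a rough sketch of the kind of event log I have in mind (every name here is a placeholder, not the actual API):

```julia
# Hypothetical event log for "memory versus time"; all names are made up.
struct MemEvent
    time::Float64   # seconds, e.g. from time()
    delta::Int      # +bytes for an allocation, -bytes for a free
end

const events = MemEvent[]

record_alloc!(bytes::Int) = push!(events, MemEvent(time(), bytes))
record_free!(bytes::Int)  = push!(events, MemEvent(time(), -bytes))

# Cumulative usage over time: the sawtooth curve described above, growing
# with allocations and dropping whenever the GC (or a manual GC.gc()) frees.
function usage_curve(evs::Vector{MemEvent})
    ts  = [e.time for e in evs]
    mem = cumsum([e.delta for e in evs])
    return ts, mem
end
```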

You can follow along with my work on this branch, but note that I'm mostly just hooking into the GC in C-land, then using Julia code to pull the results out, just like the profiler does.

As far as efficiency goes, stacktrace() calls backtrace(), which calls jl_backtrace_from_here(), which does an awful lot of work compared to rec_backtrace(). My guess is that's because it does all the Julia object conversion voodoo live, at that moment. You'd probably do better to call current_offset += rec_backtrace(pointer_to_array + current_offset, max_size - current_offset) over and over again, and only after the fact convert the giant array of backtrace pointers into actual readable Julia objects. But I wouldn't do that until you've checked what the time cost actually is.
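
At the Julia level, the cheap half of that idea is just deferring the symbol lookup. A minimal sketch (this still pays for jl_backtrace_from_here on every call; the rec_backtrace buffer trick would have to happen in C):

```julia
# Capture raw instruction pointers now; convert them to readable frames
# only when a report is actually needed.
const raw_traces = Vector{Any}()

record_site!() = push!(raw_traces, backtrace())  # raw pointers, no lookup yet
readable(i)    = stacktrace(raw_traces[i])       # symbolicate lazily
```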

maleadt (Member, Author) commented Nov 26, 2018

> You can follow along with my work on this branch, but note that I'm mostly just hooking into the GC in C-land, then using Julia code to pull the results out, just like the profiler does.

Cool, that looks very promising. Would be trivial to add a kind type and expose the allocation tracking. Will keep an eye on that, thanks for chiming in!

maleadt (Member, Author) commented Nov 29, 2018

Looks like rec_backtrace is exported but local, so this wouldn't be 1.0 compatible. Let's go with the slower approach for now; it's only meant for debugging anyway.

maleadt merged commit 4987303 into master on Nov 29, 2018
bors bot deleted the tb/trace_pool branch on Nov 29, 2018 at 09:39
staticfloat (Contributor) commented Dec 1, 2018

> Would be trivial to add a kind type and expose the allocation tracking.

I've done something like that, thanks for the suggestion. I'm currently drowning in backtrace swampland, but once I get things all working together, I hope to be able to support your use case as well.
