Pin CPU buffers when doing memory copies #735

maleadt · 2021-02-24T09:26:38Z

We should explore automatically pinning CPU memory buffers; otherwise, asynchronous memory copies are actually executed synchronously:

using CUDA, LinearAlgebra

function expensive_computation(a,b,c)
    NVTX.@range "mul!" mul!(c, a, b)
    NVTX.@range "broadcast!" broadcast!(sin, c, c)
    Array(c)    # this cannot be executed asynchronously, because the destination memory can be paged out
end

# one "iteration", performing the above calculation twice in two tasks
# and comparing the output.
function iteration(a,b,c)
    results = Vector{Any}(undef, 2)
    NVTX.@range "iteration" @sync begin
        @async begin
            results[1] = NVTX.@range "run 1" expensive_computation(a,b,c)
        end
        @async begin
            results[2] = NVTX.@range "run 2" expensive_computation(a,b,c)
        end
    end
    results[1] == results[2]
end

function main(N=1024)
    a = CUDA.rand(N,N)
    b = CUDA.rand(N,N)
    c = CUDA.rand(N,N)
    synchronize()
    NVTX.@range "warmup" iteration(a,b,c)
    GC.gc(true)
    NVTX.@range "main" iteration(a,b,c)
end
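For comparison, here is a minimal sketch of what manual pinning could look like, so that the final download lands in page-locked memory. It assumes the installed CUDA.jl version provides `CUDA.pin`; `download_pinned` is a hypothetical helper name used for illustration, not an existing API.

```julia
using CUDA

# Hypothetical helper (not part of CUDA.jl): download a GPU array into a
# page-locked host buffer, so the destination memory cannot be paged out.
function download_pinned(c::CuArray{T}) where {T}
    host = Array{T}(undef, size(c))
    CUDA.pin(host)      # assumption: CUDA.pin exists in the installed CUDA.jl
    copyto!(host, c)    # device-to-host copy into the pinned buffer
    return host
end
```

In the reproducer above, this would replace the `Array(c)` call at the end of `expensive_computation`. Pinning a fresh buffer on every call has its own cost, so a real implementation would probably reuse pinned buffers rather than register one per copy.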

maleadt added the enhancement and performance labels Feb 24, 2021

This was referenced Mar 8, 2021

Turn on using tasks in dataloader FluxML/Flux.jl#1530

Open

Rework memory pinning and speed up async ops on unpinned memory #760

Merged

maleadt closed this as completed in #760 Mar 12, 2021
