Unreasonably slow copy kernel #1301

Closed
@GiggleLiu

Description

I tried to implement a permutedims kernel, but it is much slower than the PyTorch version. I then removed all the computation, leaving only a copy kernel, and it is still very slow.

julia> using CUDA: @cartesianidx, AbstractGPUArray, gpu_call, @linearidx
julia> using CUDA, BenchmarkTools, Random

julia> function mycopy!(dest::AbstractGPUArray, src::AbstractGPUArray)
           function copy_kernel(ctx, dest, src)
               LI = @linearidx dest
               @inbounds dest[LI] = src[LI]
               return
           end
           gpu_call(copy_kernel, dest, src)
           return dest
       end
mycopy! (generic function with 1 method)

julia> t = CUDA.randn(fill(2, 28)...);

julia> @benchmark CUDA.@sync mycopy!($(copy(t)), $t)
BenchmarkTools.Trial: 33 samples with 1 evaluation.
 Range (min … max):  154.534 ms … 155.534 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     154.973 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   154.982 ms ± 225.369 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

For comparison, PyTorch:

In [19]: import torch

In [20]: t = torch.zeros((2,)*28, device="cuda:0");

In [21]: timeit t.permute(tuple(torch.randperm(28))).clone(); torch.cuda.synchronize()
2.83 ms ± 600 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)

The copy method in Julia base:

julia> @benchmark CUDA.@sync CUDA.copy($t)
BenchmarkTools.Trial: 136 samples with 1 evaluation.
 Range (min … max):   7.204 ms … 243.923 ms  ┊ GC (min … max): 0.00% … 0.65%
 Time  (median):      7.496 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   36.795 ms ±  77.836 ms  ┊ GC (mean ± σ):  0.48% ± 0.20%

CUDA version

(@v1.7) pkg> st CUDA
      Status `~/.julia/environments/v1.7/Project.toml`
  [052768ef] CUDA v3.6.2

The GPU is a V100, and the system CUDA version is 11.4.
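
To put the timings above in perspective, here is a rough back-of-the-envelope sketch that converts them into effective memory bandwidth. It assumes Float32 elements (the default for both `CUDA.randn` and `torch.zeros`) and that each operation reads and writes every element exactly once. The helper name `bandwidth_gbs` is only for illustration.

```julia
# Effective memory bandwidth implied by the timings reported above.
# Assumption: Float32 elements, one read and one write per element.
nbytes  = 2^28 * sizeof(Float32)   # size of one array: 1 GiB
traffic = 2 * nbytes               # bytes moved: read src + write dest

# Reported times in seconds (median for Julia, mean for PyTorch).
timings = [
    "mycopy! (gpu_call kernel)" => 154.973e-3,
    "CUDA.copy"                 => 7.496e-3,
    "PyTorch permute + clone"   => 2.83e-3,
]

# Hypothetical helper: bytes moved per second, in GB/s (1 GB = 1e9 bytes).
bandwidth_gbs(t) = traffic / t / 1e9

for (name, t) in timings
    println(rpad(name, 28), round(bandwidth_gbs(t); digits = 1), " GB/s")
end
```

Under these assumptions the custom kernel reaches roughly 14 GB/s, `CUDA.copy` roughly 290 GB/s, and the PyTorch permute-and-clone roughly 760 GB/s. The nominal peak memory bandwidth of a V100 is around 900 GB/s, so the custom kernel is far from being bandwidth-bound.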

Related issues:
#1298, under-Peter/OMEinsum.jl#133
