
Type instabilities lead to insane number of CPU allocations on grouped convolutions #520

Open
mashu opened this issue Jul 5, 2023 · 2 comments


mashu commented Jul 5, 2023

Julia 1.9.1
  [052768ef] CUDA v4.4.0
  [872c559c] NNlib v0.9.1

Here is the MWE:

using CUDA
using NNlib

function mwe()
    channels = 256
    x = rand(Float32, 1024, channels, 64)  # (width, channels, batch)
    w = rand(Float32, 2, 1, channels)      # one length-2 filter per channel
    @info "NNlib.conv"
    NNlib.conv(x, w, groups=channels)        # warm-up (compilation)
    @time NNlib.conv(x, w, groups=channels)  # timed run
    @info "NNlib.depthwiseconv"
    NNlib.depthwiseconv(x, w)        # warm-up (compilation)
    @time NNlib.depthwiseconv(x, w)  # timed run
    @info "Done"
end

Result of running the above twice:

julia> DepthwiseMWE.mwe()
[ Info: NNlib.conv
  0.031946 seconds (12.84 k allocations: 82.142 MiB, 10.82% gc time)
[ Info: NNlib.depthwiseconv
  0.032803 seconds (70 allocations: 79.931 MiB, 19.57% gc time)
[ Info: Done

julia> DepthwiseMWE.mwe()
[ Info: NNlib.conv
  0.031491 seconds (12.84 k allocations: 82.142 MiB, 30.70% gc time)
[ Info: NNlib.depthwiseconv
  0.029980 seconds (69 allocations: 79.931 MiB, 18.81% gc time)
[ Info: Done

Expected result: ~70 CPU allocations, not the ~12,840 reported above; in a deeper network this puts considerable pressure on the GC and kills performance.

I tried depthwiseconv in my code, but it has another problem: it's not GPU friendly.

So the fix is either making depthwiseconv GPU friendly or fixing the insane allocations of conv.
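
For reference, one way to see where the allocations come from is to inspect the inferred types of the grouped-conv call. This is only a minimal diagnostic sketch, reusing `channels`, `x`, and `w` from the MWE above; `@code_warntype` is just one way to check this.

using NNlib
using InteractiveUtils  # provides @code_warntype

channels = 256
x = rand(Float32, 1024, channels, 64)
w = rand(Float32, 2, 1, channels)

# Red `Any`/`Union` entries in the printed output mark type-unstable paths
# hit by the grouped convolution; unstable call sites tend to show up as
# the extra heap allocations seen in the @time numbers above.
@code_warntype NNlib.conv(x, w; groups=channels)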

mashu closed this as completed Jul 5, 2023

mashu commented Jul 5, 2023

Closing; there is no issue when running on the GPU:

using CUDA, Flux  # x, w as defined in the MWE above

x_d = CUDA.CuArray(x)
w_d = CUDA.CuArray(w)
CUDA.@time Flux.conv(x_d, w_d, groups=256);

 0.002870 seconds (127 CPU allocations: 5.984 KiB) (1 GPU allocation: 127.875 MiB, 0.46% memmgmt time)

ToucheSir changed the title from "Depthwise convolutions lead to insane number of CPU allocations or GPU version broken" to "Type instabilities lead to insane number of CPU allocations on grouped convolutions" on Jul 5, 2023
ToucheSir (Member) commented

Since the MWE has a lot of useful information, I'm taking the liberty of reopening this with a different focus.
