bilinear upsampling #636
Conversation
Thanks for your contribution!
In the interim, you can add the NNlib PR to the Manifest to get CI working.
I just updated the indexing and the logic of the function, aligned the variable naming with the CPU version (still a bit opaque), and changed to … The thread configuration question is still open. Edit: the backward pass is 16 times faster now, by the way :>
I suggest you have a look at Revise.jl if you aren't using it already; it even works with GPU kernels! That way you never have to restart Julia (well, unless there's a fatal CUDA error).
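For reference, a minimal sketch of that workflow (the file name below is a placeholder, not part of this PR):

using Revise, CUDA

includet("upsample_kernels.jl")   # hypothetical file containing the kernel code, tracked by Revise

# Edit upsample_kernels.jl and simply re-run the call;
# Revise reloads the changed methods without restarting Julia.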
Yes, I have used Revise; it works great. But I'm an expert at producing CUDA errors :D By the way, I merged your changes into my local branch and now I get this stacktrace:
julia> ∇upsample_bilinear(CUDA.ones(6,4,2,1),(3,2))
ERROR: MethodError: no method matching GPUCompiler.PTXCompilerTarget(; cap=v"6.1.0", exitable=false, debuginfo=false)
Closest candidates are:
GPUCompiler.PTXCompilerTarget(; cap, minthreads, maxthreads, blocks_per_sm, maxregs) at util.jl:450 got unsupported keyword arguments "exitable", "debuginfo"
GPUCompiler.PTXCompilerTarget(::VersionNumber, ::Union{Nothing, Int64, Tuple{Vararg{Int64, var"#s10"}} where var"#s10"}, ::Union{Nothing, Int64, Tuple{Vararg{Int64, var"#s9"}} where var"#s9"}, ::Union{Nothing, Int64}, ::Union{Nothing, Int64}) at /home/max/.julia/packages/GPUCompiler/9oa8s/src/ptx.jl:8 got unsupported keyword arguments "cap", "exitable", "debuginfo"
GPUCompiler.PTXCompilerTarget(::Any, ::Any, ::Any, ::Any, ::Any) at /home/max/.julia/packages/GPUCompiler/9oa8s/src/ptx.jl:8 got unsupported keyword arguments "cap", "exitable", "debuginfo"
Stacktrace:
[1] kwerr(kw::NamedTuple{(:cap, :exitable, :debuginfo), Tuple{VersionNumber, Bool, Bool}}, args::Type)
@ Base ./error.jl:157
[2] CUDACompilerTarget(; cap::VersionNumber, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ CUDA ~/.julia/dev/CUDA/src/compiler/gpucompiler.jl:17
[3] cufunction_compile(source::GPUCompiler.FunctionSpec; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ CUDA ~/.julia/dev/CUDA/src/compiler/execution.jl:300
[4] cufunction_compile(source::GPUCompiler.FunctionSpec)
@ CUDA ~/.julia/dev/CUDA/src/compiler/execution.jl:298
[5] check_cache(cache::Dict{UInt64, Any}, compiler::Any, linker::Any, spec::GPUCompiler.FunctionSpec{typeof(CUDA.CUDNN.∇upsample_bilinear_kernel!), Tuple{Int64, Float32, Float32, CuDeviceArray{Float32, 4, 1}, CuDeviceArray{Float32, 4, 1}}}, prekey::UInt64; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ GPUCompiler ~/.julia/packages/GPUCompiler/9oa8s/src/cache.jl:40
[6] #cached_compilation#101
@ ~/.julia/dev/CUDA/lib/cudnn/upsampling.jl:49 [inlined]
[7] cached_compilation
@ ~/.julia/packages/GPUCompiler/9oa8s/src/cache.jl:65 [inlined]
[8] cufunction(f::typeof(CUDA.CUDNN.∇upsample_bilinear_kernel!), tt::Type{Tuple{Int64, Float32, Float32, CuDeviceArray{Float32, 4, 1}, CuDeviceArray{Float32, 4, 1}}}; name::String, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ CUDA ~/.julia/dev/CUDA/src/compiler/execution.jl:290
[9] macro expansion
@ ~/.julia/dev/CUDA/src/compiler/execution.jl:101 [inlined]
[10] ∇upsample_bilinear(Δ::CuArray{Float32, 4}, scale::Tuple{Int64, Int64}; size::Nothing)
@ CUDA.CUDNN ~/.julia/dev/CUDA/lib/cudnn/nnlib.jl:324
[11] ∇upsample_bilinear(Δ::CuArray{Float32, 4}, scale::Tuple{Int64, Int64})
@ CUDA.CUDNN ~/.julia/dev/CUDA/lib/cudnn/nnlib.jl:308
[12] top-level scope
@ none:1
If you use CUDA#master, you also need some of its dependencies from #master, currently GPUCompiler#master. See the Manifest.
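For anyone hitting the same MethodError, a sketch of how to track that dependency with Pkg (assuming the registered GPUCompiler repository):

julia> using Pkg

julia> Pkg.add(PackageSpec(name="GPUCompiler", rev="master"))   # same effect as `pkg> add GPUCompiler#master`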
Thanks, that works. Should the kernel calls be synchronized explicitly?
Strange, I think the tests fail because CI seemingly downloads NNlib 0.7.11 instead of the latest master from my branch. However, the Manifest points to the right repo, I think. What am I missing?
No, memory copies are implicitly synchronizing.
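To illustrate the point with a hypothetical kernel (not the one from this PR): the launch itself is asynchronous, but the device-to-host copy waits for the kernel to finish, so an explicit synchronize is usually unnecessary.

using CUDA

function scale_kernel!(y, x, a)              # hypothetical toy kernel
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(y)
        @inbounds y[i] = a * x[i]
    end
    return nothing
end

x = CUDA.rand(Float32, 1024)
y = similar(x)
@cuda threads=256 blocks=4 scale_kernel!(y, x, 2f0)   # launch is asynchronous
host_y = Array(y)                                     # implicit synchronization on copy
# CUDA.synchronize()                                  # explicit sync only needed e.g. when timing kernels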
Maybe try adding an explicit revision (a repo-rev entry) to the Manifest.
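For example (a sketch; the fork URL and branch name below are placeholders for your NNlib branch):

julia> using Pkg

julia> Pkg.add(PackageSpec(url="https://github.com/<your-fork>/NNlib.jl", rev="<your-branch>"))   # records repo-url and repo-rev in the Manifest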
Codecov Report
@@ Coverage Diff @@
## master #636 +/- ##
==========================================
+ Coverage 79.62% 79.73% +0.11%
==========================================
Files 122 122
Lines 7356 7382 +26
==========================================
+ Hits 5857 5886 +29
+ Misses 1499 1496 -3
Continue to review full report at Codecov.
Hey, I think this is ready to merge if you agree :) There may be another PR in the same direction in the near future.
Looks good! Time for a squash and rebase?
Hmm, somehow like this? Or did you mean to move the kernel code only? I noticed that I now get an invalid LLVM IR error in the kernel due to an unsupported operand …
Force-pushed from ce92c3a to 16d4dd6
CUDA.jl also defines …
Cool, thanks! I'm not too good at git and should probably get familiar with rebase.
Force-pushed from 99b89ec to 3da5eff ([ci skip])
CI is green, so this is good to go. Thanks for the PR!
Very cool, thank you! :) I learned a lot. And good catch with the attribution!
Hi, I've ported PyTorch's bilinear upsampling code and put it where I thought it fits best. It is compatible with NNlib and produces the same forward/backward results (see the PR). In theory it is also usable by JuliaImages; I haven't tested, but it should be compliant with OpenCV as well.
Edit: tests should pass as soon as the NNlib PR is merged.
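For completeness, a rough usage sketch based on the calls that appear earlier in this thread; the exact signatures are the ones defined by the NNlib PR, so treat this as illustrative:

julia> using CUDA, NNlib

julia> x = CUDA.ones(Float32, 6, 4, 2, 1);

julia> y = upsample_bilinear(x, (3, 2));        # forward pass, runs the new GPU kernel

julia> Δx = ∇upsample_bilinear(y, (3, 2));      # backward pass, gradient w.r.t. x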