bilinear upsampling #636

Merged: 2 commits merged into JuliaGPU:master on Feb 11, 2021
Conversation

@maxfreu (Contributor) commented Jan 11, 2021

Hi, I've ported PyTorch's bilinear upsampling code and put it where I thought it fits best. It is compatible with NNlib and produces the same forward/backward results (see the NNlib PR). In theory it's also usable by JuliaImages; I haven't tested that, but it should also be consistent with OpenCV.

Edit: tests should pass as soon as NNlib PR is merged.
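
A rough usage sketch of the NNlib entry points this PR provides CUDA kernels for (the array sizes and call signature just mirror the example that appears later in this thread; this is not code from the PR itself):

```julia
using CUDA, NNlib

x  = CUDA.rand(Float32, 6, 4, 2, 1)   # WHCN input
y  = upsample_bilinear(x, (3, 2))     # forward: upsample W by 3x, H by 2x
Δ  = CUDA.ones(Float32, size(y)...)   # incoming gradient w.r.t. the output
dx = ∇upsample_bilinear(Δ, (3, 2))    # backward: gradient w.r.t. x
```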

@maleadt (Member) commented Jan 20, 2021

Thanks for your contribution!

> Edit: tests should pass as soon as NNlib PR is merged.

In the interim, you can add the NNlib PR to the Manifest to get CI working.
EDIT: I've pushed a commit doing so.
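
A minimal sketch of one way to record such a pin, assuming the NNlib changes live on a branch of the upstream FluxML/NNlib.jl repository (the exact branch or revision to pin is not shown here):

```julia
using Pkg
Pkg.activate(".")  # the CUDA.jl checkout whose Manifest should record the pin
Pkg.add(url="https://github.com/FluxML/NNlib.jl", rev="master")  # or a specific branch/commit
```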

(4 review comments on lib/cudnn/upsampling.jl, now outdated and resolved)
@maleadt marked this pull request as a draft on January 20, 2021 07:38
@maleadt added labels: cuda array (Stuff about CuArray.), enhancement (New feature or request) on Jan 20, 2021
@maxfreu (Contributor, Author) commented Jan 20, 2021

I just updated the indexing and the logic of the function, aligned the variable naming with the CPU version (still a bit opaque), and switched to @atomic. My patience was well tested by endless kernel restarts due to various errors, but now it works :)

The thread thing is still open.

Edit: The backward pass is 16 times faster now by the way :>
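
For context, a minimal sketch (not the PR's actual kernel) of the accumulation pattern the backward pass needs: several output pixels can contribute to the same input location, so the gradient writes are made race-free with CUDA.@atomic.

```julia
using CUDA

function scatter_add_kernel!(dst, src, idx)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(src)
        CUDA.@atomic dst[idx[i]] += src[i]  # atomic accumulation avoids data races
    end
    return nothing
end

src = CUDA.rand(Float32, 1024)
idx = CuArray(rand(1:16, 1024))   # many sources map to few destinations
dst = CUDA.zeros(Float32, 16)
@cuda threads=256 blocks=cld(length(src), 256) scatter_add_kernel!(dst, src, idx)
```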

@maleadt (Member) commented Jan 20, 2021

> My patience was well tested by endless kernel restarts due to various errors, but now it works :)

I suggest you have a look at Revise.jl if you aren't using it already; it even works with GPU kernels! That way you never have to restart Julia (well, unless there's a fatal CUDA error).
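
A sketch of the suggested workflow (just the standard Revise.jl pattern with a dev'ed package, nothing PR-specific):

```julia
using Revise   # load before the package under development
using CUDA     # edits to tracked source files are now re-evaluated on the fly,
               # so kernels can be re-launched without restarting Julia
```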

@maxfreu (Contributor, Author) commented Jan 21, 2021

Yes, I have used Revise; it works great. But I'm an expert in producing CUDA errors :D

By the way: I merged your changes into my local branch and now I get this stacktrace:

julia> ∇upsample_bilinear(CUDA.ones(6,4,2,1),(3,2))
ERROR: MethodError: no method matching GPUCompiler.PTXCompilerTarget(; cap=v"6.1.0", exitable=false, debuginfo=false)
Closest candidates are:
  GPUCompiler.PTXCompilerTarget(; cap, minthreads, maxthreads, blocks_per_sm, maxregs) at util.jl:450 got unsupported keyword arguments "exitable", "debuginfo"
  GPUCompiler.PTXCompilerTarget(::VersionNumber, ::Union{Nothing, Int64, Tuple{Vararg{Int64, var"#s10"}} where var"#s10"}, ::Union{Nothing, Int64, Tuple{Vararg{Int64, var"#s9"}} where var"#s9"}, ::Union{Nothing, Int64}, ::Union{Nothing, Int64}) at /home/max/.julia/packages/GPUCompiler/9oa8s/src/ptx.jl:8 got unsupported keyword arguments "cap", "exitable", "debuginfo"
  GPUCompiler.PTXCompilerTarget(::Any, ::Any, ::Any, ::Any, ::Any) at /home/max/.julia/packages/GPUCompiler/9oa8s/src/ptx.jl:8 got unsupported keyword arguments "cap", "exitable", "debuginfo"
Stacktrace:
  [1] kwerr(kw::NamedTuple{(:cap, :exitable, :debuginfo), Tuple{VersionNumber, Bool, Bool}}, args::Type)
    @ Base ./error.jl:157
  [2] CUDACompilerTarget(; cap::VersionNumber, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/dev/CUDA/src/compiler/gpucompiler.jl:17
  [3] cufunction_compile(source::GPUCompiler.FunctionSpec; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/dev/CUDA/src/compiler/execution.jl:300
  [4] cufunction_compile(source::GPUCompiler.FunctionSpec)
    @ CUDA ~/.julia/dev/CUDA/src/compiler/execution.jl:298
  [5] check_cache(cache::Dict{UInt64, Any}, compiler::Any, linker::Any, spec::GPUCompiler.FunctionSpec{typeof(CUDA.CUDNN.∇upsample_bilinear_kernel!), Tuple{Int64, Float32, Float32, CuDeviceArray{Float32, 4, 1}, CuDeviceArray{Float32, 4, 1}}}, prekey::UInt64; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/9oa8s/src/cache.jl:40
  [6] #cached_compilation#101
    @ ~/.julia/dev/CUDA/lib/cudnn/upsampling.jl:49 [inlined]
  [7] cached_compilation
    @ ~/.julia/packages/GPUCompiler/9oa8s/src/cache.jl:65 [inlined]
  [8] cufunction(f::typeof(CUDA.CUDNN.∇upsample_bilinear_kernel!), tt::Type{Tuple{Int64, Float32, Float32, CuDeviceArray{Float32, 4, 1}, CuDeviceArray{Float32, 4, 1}}}; name::String, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/dev/CUDA/src/compiler/execution.jl:290
  [9] macro expansion
    @ ~/.julia/dev/CUDA/src/compiler/execution.jl:101 [inlined]
 [10] ∇upsample_bilinear(Δ::CuArray{Float32, 4}, scale::Tuple{Int64, Int64}; size::Nothing)
    @ CUDA.CUDNN ~/.julia/dev/CUDA/lib/cudnn/nnlib.jl:324
 [11] ∇upsample_bilinear(Δ::CuArray{Float32, 4}, scale::Tuple{Int64, Int64})
    @ CUDA.CUDNN ~/.julia/dev/CUDA/lib/cudnn/nnlib.jl:308
 [12] top-level scope
    @ none:1

@maleadt (Member) commented Jan 21, 2021

If you use CUDA#master, you also need to use some deps from #master, currently GPUCompiler#master. See the Manifest.
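
Concretely, something along these lines (a sketch only; in this PR the checked-in Manifest already records the right revisions):

```julia
using Pkg
Pkg.add(url="https://github.com/JuliaGPU/GPUCompiler.jl", rev="master")  # match CUDA#master's dev dependencies
```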

@maxfreu (Contributor, Author) commented Jan 29, 2021

> If you use CUDA#master, you also need to use some deps from #master, currently GPUCompiler#master. See the Manifest.

Thanks, that works.

Should the kernel calls be wrapped in CUDA.@sync?

@maxfreu (Contributor, Author) commented Jan 31, 2021

Strange: I think the tests fail because CI seemingly downloads NNlib 0.7.11 instead of the latest master from my branch. However, the Manifest points to the right repo, I think. What am I missing?

@maleadt (Member) commented Feb 2, 2021

> Should the kernel calls be wrapped in CUDA.@sync?

No, memory copies are implicitly synchronizing.
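
A sketch of why that is usually enough here (the call follows the example earlier in this thread): fetching the result back to the host already waits for the kernel to finish, so an explicit CUDA.@sync is mainly useful for benchmarking.

```julia
using CUDA, NNlib

dy = ∇upsample_bilinear(CUDA.ones(6, 4, 2, 1), (3, 2))        # kernel launch is asynchronous
Array(dy)                                                      # device-to-host copy synchronizes implicitly

CUDA.@sync ∇upsample_bilinear(CUDA.ones(6, 4, 2, 1), (3, 2))  # explicit sync, e.g. for timing
```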

> Strange: I think the tests fail because CI seemingly downloads NNlib 0.7.11 instead of the latest master from my branch. However, the Manifest points to the right repo, I think. What am I missing?

Maybe try adding an explicit revision (add https://github.com/maxfreu/NNlib.jl#abcdefg) instead of master?

@codecov (bot) commented Feb 8, 2021

Codecov Report

Merging #636 (16d4dd6) into master (514b826) will increase coverage by 0.11%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #636      +/-   ##
==========================================
+ Coverage   79.62%   79.73%   +0.11%     
==========================================
  Files         122      122              
  Lines        7356     7382      +26     
==========================================
+ Hits         5857     5886      +29     
+ Misses       1499     1496       -3     
Impacted Files       Coverage Δ
src/nnlib.jl         70.00% <100.00%> (+55.71%) ⬆️
src/pool.jl          72.42% <0.00%> (+0.82%) ⬆️
src/pool/binned.jl   95.65% <0.00%> (+0.86%) ⬆️

Continue to review the full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 514b826...3da5eff.

@maxfreu (Contributor, Author) commented Feb 8, 2021

Hey, I think this is ready to merge, if you agree :) There might be another PR in the same direction in the near future.

@maleadt (Member) commented Feb 8, 2021

Looks good! Time for a squash and rebase?
I'd also put the kernel in src/nnlib.jl (and similarly for the tests), since this isn't a CUDNN feature but a fully custom kernel, so it doesn't really belong in lib/cudnn.

@maxfreu (Contributor, Author) commented Feb 10, 2021

Hmm, something like this? Or did you mean to move only the kernel code? I also noticed that I now get an invalid LLVM IR error in the kernel due to an unsupported call to floor; I don't know where that comes from.

@maleadt (Member) commented Feb 10, 2021

CUDA.jl also defines floor, a much more limited version implemented using libdevice. I've fixed that and squashed/rebased this PR. (And just FYI: it's easier to do so if you rebase while developing instead of merging.)
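
For reference, a minimal sketch (not the PR's kernel) of the kind of device-side floor call involved: inside a CUDA.jl kernel, floor on a Float32 is lowered to a libdevice intrinsic.

```julia
using CUDA

function floor_kernel!(out, x)
    i = threadIdx().x
    if i <= length(x)
        out[i] = floor(x[i])  # dispatches to CUDA.jl's libdevice-backed floor
    end
    return nothing
end

x = CUDA.rand(Float32, 32)
out = similar(x)
@cuda threads=32 floor_kernel!(out, x)
```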

@maxfreu (Contributor, Author) commented Feb 10, 2021

Cool, thanks! I'm not too good at git and should probably get familiar with rebase.

@maleadt marked this pull request as ready for review on February 10, 2021 15:51
@maleadt (Member) commented Feb 11, 2021

CI green, so good to go. Thanks for the PR!

@maleadt merged commit 7d56067 into JuliaGPU:master on Feb 11, 2021
@maxfreu (Contributor, Author) commented Feb 11, 2021

Very cool, thank you! :) I learned a lot. And good catch with the attribution!
