
2D Transpose Convolutions #54

Merged (20 commits into FluxML:master on Feb 1, 2019)

Conversation

@tejank10 (Contributor) commented Jul 7, 2018:

Added 2D transpose convolutions and tests

@MikeInnes (Member):

This is great, but I think we can simplify a bit -- we don't actually need the conv_transpose alias. How about the ConvTranspose layer just calls the gradient function directly, and we also define the derivative of that function?

Aside from not needing the NNlib patch, this has the big bonus that nested AD will then also work through convolutions.
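For reference, a minimal sketch of the approach being proposed (illustrative only; it assumes the size-inferring ∇conv_data signature discussed later in this thread, not the final merged API):

```julia
using NNlib

# A transposed-convolution forward pass written directly in terms of the
# gradient of conv with respect to its input, as suggested above. Here `x`
# plays the role of `dy` in the gradient call and the input size is inferred.
conv_transpose(x, w; pad = 0, stride = 1, dilation = 1) =
    ∇conv_data(x, w; pad = pad, stride = stride, dilation = dilation)
```

The backward pass then reuses conv itself as the gradient of ∇conv_data with respect to its first argument (as noted further down in this thread), which is what lets nested AD work through convolutions.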

@tejank10 (Contributor, Author) commented Jul 12, 2018:

That sounds cool. I was working on writing the gradient hook-up for it, and I felt that we should refactor the conv interface.
Currently, ∇conv_data and conv do not accept the same number of arguments: ∇conv_data takes the gradient (dy) as one of its inputs, so it ends up taking 3 arguments. This makes ∇conv_data better suited to acting as the gradient of conv than as the forward pass of a transposed convolution.

When this setup of ∇conv_data is used for transposed conv, it throws an error in the backward pass.

ERROR: Gradient is not a tuple of length 3

because the back_ function requires the length of the gradient tuple to match the number of arguments to ∇conv_data.

EDIT: Ahh nvm, fixed it :)

@tejank10 (Contributor, Author):

This should fix the failing gradtest

@tejank10 changed the base branch from julia-0.6 to master on September 19, 2018.
@tejank10 (Contributor, Author):

This has now been resolved for v1.0.

@MikeInnes (Member):

It'd be good if we could take the opportunity to make the interface a bit more consistent. If I understand correctly, ∇conv_data and ∇conv_filter are currently taking an unnecessary extra argument, so it'd be better to just remove those everywhere, rather than adding the argument to conv. We will probably have to add deprecations for the current forms though.

@tejank10 (Contributor, Author) commented Oct 10, 2018:

I realized that the extra arguments are required for ∇conv_data and ∇conv_filter. Using cdims or ctdims cannot recover the exact dimensions every time because of integer division.
For example, during convolution, if size(input) = (10, 10, 1, 1), size(kernel) = (3, 3, 1, 1), and stride = 2, then size(output) = (4, 4, 1, 1).
If output is then passed through ∇conv_data, relying on ctdims to infer the dimensions of ∇input from the dimensions of output and kernel, it produces size(∇input) = (9, 9, 1, 1).
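To make the integer-division ambiguity concrete, here is a small arithmetic sketch (the helpers out_len and in_len are illustrative, not NNlib functions):

```julia
# Forward output length along one dimension (no dilation):
# y = fld(x + 2*pad - k, stride) + 1
out_len(x, k; pad = 0, stride = 1) = fld(x + 2pad - k, stride) + 1

# Naive inversion of that formula, which is all a ctdims-style helper can do:
in_len(y, k; pad = 0, stride = 1) = (y - 1) * stride + k - 2pad

out_len(10, 3, stride = 2)  # 4: the flooring division discards a remainder
in_len(4, 3, stride = 2)    # 9: not the original 10, hence the explicit size argument
```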

src/conv.jl Outdated
@@ -36,8 +53,14 @@ function crosscor(x::A, w::A; pad = 0, stride = 1, dilation = 1) where A<:Abstra
x, w, pad = pad_, stride = stride_, dilation = dilation)
end

∇conv_data(dy::A, x::A, w::A; pad = 0, stride = 1, dilation = 1, flipkernel = 0) where A<:AbstractArray =
∇conv_data!(zero(x), dy, x, w; pad = pad, stride = stride, dilation = dilation, flipkernel=flipkernel)
function ∇conv_data(dy::A, w::A, x_dims=nothing; pad = 0, stride = 1, dilation = 1, flipkernel = 0) where A<:AbstractArray
Member:

Couple of small things:

  • Can you make this a keyword argument size?
  • Can you add a simple three-arg wrapper with a deprecation warning?
  • You should compare x_dims === nothing so that type inference can remove the check.
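A rough sketch of how those three points could fit together (illustrative only; the ctdims and ∇conv_data! call shapes below are assumptions, not the final PR code):

```julia
function ∇conv_data(dy::A, w::A; size = nothing, pad = 0, stride = 1,
                    dilation = 1, flipkernel = 0) where A<:AbstractArray
    if size === nothing  # `===` so type inference can eliminate the branch
        # ctdims signature assumed for illustration
        size = ctdims(Base.size(dy), Base.size(w), pad, stride, dilation)
    end
    ∇conv_data!(similar(dy, size), dy, w; pad = pad, stride = stride,
                dilation = dilation, flipkernel = flipkernel)
end

# Simple three-argument wrapper kept around with a deprecation warning.
function ∇conv_data(dy::A, x::A, w::A; kw...) where A<:AbstractArray
    Base.depwarn("∇conv_data(dy, x, w; ...) is deprecated; use the size keyword instead", :∇conv_data)
    ∇conv_data(dy, w; size = Base.size(x), kw...)
end
```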

@tejank10 (Contributor, Author) commented Oct 24, 2018:

When the size method is used inside the function, naming this argument size gives a MethodError. How about using dims instead?

Member:

dims is a bit weird because it usually refers to the dimensions you act on. How about just calling Base.size inside the function?
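A tiny generic illustration of the clash being described (not the PR code):

```julia
function infer_output(x; size = nothing)
    # The keyword `size` shadows Base.size inside this body, so a bare
    # size(x) would try to call the local binding (here `nothing`) and error;
    # qualifying the call as Base.size(x) sidesteps the shadowing.
    size === nothing && (size = Base.size(x))
    return size
end

infer_output(rand(4, 4))                 # (4, 4)
infer_output(rand(4, 4); size = (3, 3))  # (3, 3)
```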

@MikeInnes (Member) left a review comment:

Ok great, this is all but there now.

Once this is merged we're going to have to upper-bound CuArrays and Flux when we tag NNlib again. We'll have to update both of those for the new API as well (I know you have a PR for Flux already).

if size === nothing
    size = cdims(Base.size(x), dilation_dims(w, dilation), pad_, stride_)
end
conv!(similar(x, size), x, w, pad = pad_, stride = stride_, dilation = dilation)
Member:

Does the size argument actually make sense for conv? I don't know if there's a similar ambiguity in the sizes as compared with the transpose.

Contributor Author:

conv acts as the gradient function with respect to the input for conv transpose during the backward pass. Hence, just like ∇conv_data, there exists an ambiguity here as well.

Member:

Ok that's fine, just checking

src/conv.jl Outdated
∇conv_filter(dy::A, x::A, w::A; pad = 0, stride = 1, dilation = 1, flipkernel=0) where A<:AbstractArray =
∇conv_filter!(zero(w), dy, x, w; pad = pad, stride = stride, dilation = dilation, flipkernel=flipkernel)
∇conv_filter(dy::A, x::A, size::Tuple; pad = 0, stride = 1, dilation = 1, flipkernel=0) where A<:AbstractArray =
∇conv_filter!(zeros(eltype(dy),size), dy, x; pad = pad, stride = stride, dilation = dilation, flipkernel=flipkernel)
Member:

This should use similar

Contributor Author:

To clarify, you mean to use zero(similar(dy, size)), right?

Member:

Just similar should be fine, if you're about to overwrite it anyway.

Contributor Author:

Currently this branch uses similar, which fails the NaN test for ∇conv_filter. However, if zero is used instead of similar, the tests pass.
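For context on the similar-versus-zero distinction (a general Julia point; why it trips the NaN test here is only a guess, not confirmed in this thread): similar returns an uninitialized buffer, so if the in-place kernel ever reads from or accumulates into the output instead of fully overwriting it, whatever garbage (possibly NaN) was already in that memory leaks through, whereas zero guarantees a clean starting point.

```julia
A = rand(Float64, 3, 3)

buf_uninit = similar(A)  # uninitialized: contents are arbitrary, may contain NaN
buf_zeroed = zero(A)     # guaranteed all zeros

# Safe only if the callee overwrites every element of the buffer:
#   ∇conv_filter!(buf_uninit, dy, x; ...)
# Safe even if the callee accumulates into the buffer:
#   ∇conv_filter!(buf_zeroed, dy, x; ...)
```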

src/conv.jl Outdated

∇conv_filter(dy::A, x::A, w::A; pad = 0, stride = 1, dilation = 1, flipkernel=0) where A<:AbstractArray =
∇conv_filter!(zero(w), dy, x, w; pad = pad, stride = stride, dilation = dilation, flipkernel=flipkernel)
∇conv_filter(dy::A, x::A, size::Tuple; pad = 0, stride = 1, dilation = 1, flipkernel=0) where A<:AbstractArray =
Member:

Are you intentionally doing data and filter in different ways? Why not use a size kwarg for both?

@tejank10 (Contributor, Author) commented Oct 25, 2018:

Yes, it is intentional, because ∇conv_data serves a dual purpose: when size = nothing it performs a transposed convolution, otherwise it acts as the convolution gradient.
∇conv_filter has only one purpose, which is to find the gradient of the filter. Hence, we require the value of size to be passed.

Member:

But they are not actually different operations, right? conv transpose is just the gradient with a particular inferred size. (Let me know if my understanding is off here.)

So we could in principle just write a size-inference for conv_filter; but if you don't want to do that for now it'd be fine to just do size as a kwarg without a value (which will error if it's not provided).
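For reference, a keyword argument declared without a default is exactly the "errors if it's not provided" behaviour mentioned: Julia throws an UndefKeywordError when the caller omits it. A toy example (hypothetical helper, not the PR code):

```julia
# `size` has no default value, so every caller must pass it explicitly.
make_buffer(dy; size) = fill!(similar(dy, size), 0)

make_buffer(rand(4, 4); size = (3, 3, 1, 1))  # works
# make_buffer(rand(4, 4))                     # UndefKeywordError: keyword argument `size` not assigned
```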

@tejank10 (Contributor, Author):

The NaN test for grad_conv_filter3d is failing in this update; will fix it soon.

@vchuravy commented Dec 2, 2018:

bump! this would be great to have

@staticfloat (Contributor):

I didn't see any big missing pieces, but I am worried that we might not have sufficient test coverage. I am working on getting codecov or something hooked up so that we can be sure that what we're merging covers as many corner cases as possible.

@staticfloat (Contributor):

@tejank10 can you rebase this on top of the latest master? I have merged Codecov support so we can make sure that our test cases are properly stressing each codepath of the convolutions now.

@codecov-io commented Dec 14, 2018:

Codecov Report

Merging #54 into master will increase coverage by 1.69%.
The diff coverage is 83.33%.


@@            Coverage Diff             @@
##           master      #54      +/-   ##
==========================================
+ Coverage   70.46%   72.15%   +1.69%     
==========================================
  Files           9        9              
  Lines         579      607      +28     
==========================================
+ Hits          408      438      +30     
+ Misses        171      169       -2
Impacted Files      Coverage Δ
src/impl/conv.jl    89.52% <100%> (+2.11%) ⬆️
src/conv.jl         66.66% <71.42%> (+7.34%) ⬆️
src/activation.jl   84.21% <0%> (-9.13%) ⬇️
src/NNlib.jl        100% <0%> (+100%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 085adb7...0d27d79.

@pbarker commented Jan 24, 2019:

bump 🙂 is any help needed to get this out the door?

@staticfloat (Contributor):

@tejank10 Great, thanks. Can you synthesize a few more tests to exercise the codepaths that are missing (as evidenced by the code coverage)?

I'm particularly interested that we hit the first couple of branches in ctdims() and wdims() and that we hit the size == nothing branch in crosscor(), ∇conv_data(), and ∇conv_filter(). If tests for those branches can be added, I think this is ready to merge.

@staticfloat (Contributor):

Awesome. I'm calling this good, and will be testing it out with an autoencoder experiment in the near future!

@staticfloat merged commit 8546f3c into FluxML:master on Feb 1, 2019.
ToucheSir added a commit that referenced this pull request on Feb 13, 2023:

* print/convert batchedadjtrans over cuarray
* Update test/batchedadjtrans.jl

Co-authored-by: Brian Chen <ToucheSir@users.noreply.github.com>