This repository has been archived by the owner on Mar 12, 2021. It is now read-only.

Add correct batch size for RNN hidden layer #706

Open
wants to merge 2 commits into master

Conversation

DhairyaLGandhi
Member

Fixes FluxML/Flux.jl#1114

The context here is that on the first call to the layer (also simulated by calling Flux.reset! on the structure), the gradient for the hidden state comes back with the same size as the state at that time (a vector), whereas after stepping through time the state gains a batch dimension and its size changes. This needs to be accounted for when sending back a gradient of the correct size. We handle it automatically on subsequent calls; it is only on the first run that we see this issue.
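For context, here is a minimal sketch of the kind of size adjustment involved, in plain Julia. expand_hidden is a made-up stand-in for the hBatch-style expansion, not the CuArrays API:

```julia
# Illustrative sketch only, not the CuArrays implementation: expand a
# hidden-state vector so its trailing dimension matches the batch size of x.
function expand_hidden(x::AbstractMatrix, h::AbstractVector)
    batch = size(x, 2)                         # columns of x = batch size
    return repeat(reshape(h, :, 1), 1, batch)  # (hidden,) -> (hidden, batch)
end

x = rand(Float32, 8, 10)    # 8 features, batch of 10
h = zeros(Float32, 8)       # freshly reset hidden state is a vector
size(expand_hidden(x, h))   # (8, 10): the size the gradient should come back as
```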

@MikeInnes @maleadt, would this be an alright fix?

@maleadt
Member

maleadt commented May 5, 2020

Maybe add a test?

@maleadt maleadt added the bugfix label May 5, 2020
@maleadt maleadt requested a review from MikeInnes May 5, 2020 11:26
@MikeInnes
Collaborator

As I understand it, ho should already have been expanded (via hBatch) during the forward pass. Its gradient dho should then have the same size as ho, in which case AFAICT this patch shouldn't be needed.

I don't think this PR would do any harm, but that seems fishy and it may be worth understanding why it happens in case something else is going wrong.

@DhairyaLGandhi
Member Author

DhairyaLGandhi commented May 5, 2020

h_ = CUDNN.hBatch(x, h)
and similar lines are where that expansion happens (for h), but it doesn't happen in the forward pass...

This is just to apply to dho the same treatment as is applied to h.

@MikeInnes
Collaborator

I believe that line mirrors this one which does happen in the forward pass (we just do it again in the backwards pass because we don't have access to the expanded h).

It would be worth printing out the size of ho and dho at the start of the pullback. Pullbacks are supposed to be able to rely on them being the same size (and if they are then there's no issue).
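As a sketch of that check against the public API (the layer and sizes here are made up, and on the GPU the arrays would be CuArrays):

```julia
# Sketch of the suggested check: the gradient seeded into the pullback should
# have the same size as the output of the forward pass.
using Flux, Zygote

m = Flux.RNN(8, 8)
x = rand(Float32, 8, 10)

ho, back = Zygote.pullback(m, x)
dho = ones(Float32, size(ho))
@show size(ho) size(dho)   # expected to match
back(dho)
```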

@DhairyaLGandhi
Member Author

Found this:

(size(ho), size(dho)) = ((8, 10), (8,))

This was right after the model had been reset! (via Flux.reset!).

All subsequent calls show matching sizes. I am assuming it's because h is a vector that we get a vector back? Here:

function pullback(rnn::RNNDesc{T}, x::CuArray{T}, h::CuArray{T}) where T <: Union{Float32,Float64}
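For reference, a rough reproduction of that situation, paraphrasing the MWE in FluxML/Flux.jl#1114 (the layer sizes are illustrative, not taken from the issue):

```julia
# Rough reproduction sketch: after Flux.reset! the hidden state is a vector
# again, and the first backward call on the GPU hit the size mismatch above.
using Flux

m = RNN(8, 8) |> gpu
x = gpu(rand(Float32, 8, 10))   # batch of 10

Flux.reset!(m)                  # state goes back to a length-8 vector
gs = gradient(() -> sum(m(x)), Flux.params(m))  # this first call failed
```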

@DhairyaLGandhi
Member Author

What should the test actually be? Just calling into the backwardData function?

@MikeInnes
Collaborator

If h is a vector then dh should also be a vector; I'm not sure if we handle that correctly. However, it seems likely to be independent of the current issue.

The fact that size(dho) != size(ho) implies that some other function is getting called with a matrix (ho) but producing a vector (dho) as a gradient, which is wrong. Ideally we'd fix that function, whatever it is.
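To illustrate just that contract (with a made-up function, unrelated to the RNN code): the gradient a pullback returns for an argument should have the same shape as that argument.

```julia
# Illustration of the shape contract only: the gradient returned for x should
# match size(x), whether x is a vector or a matrix.
using Zygote

double(x) = 2 .* x
Zygote.@adjoint double(x) = double(x), dy -> (2 .* dy,)

x = rand(Float32, 8, 10)
y, back = Zygote.pullback(double, x)
size(back(ones(Float32, size(y)))[1]) == size(x)   # true
```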

@jeremiedb

@DhairyaLGandhi @MikeInnes Have you had a moment to look at the above?

Using the latest releases below, the issue no longer seems to be reproducible:

  [3a865a2d] CuArrays v2.2.2
  [a93c6f00] DataFrames v0.21.2
  [587475ba] Flux v0.10.4
  [28b8d3ca] GR v0.50.1
  [91a5bcdd] Plots v1.4.3
  [2913bbd2] StatsBase v0.33.0
  [e88e6eb3] Zygote v0.4.22

However, I couldn't identify what change to either Flux, CuArrays or Zygote may have solved the issue. Any pointer would be welcome.

@DhairyaLGandhi
Member Author

Am I right in understanding that the MWE in FluxML/Flux.jl#1114 works as expected now?

@jeremiedb

Yes!

@DhairyaLGandhi
Member Author

I wonder if the Adapt issue is related, then; an incorrect size in the backward pass could result from ndims of a Transpose or similar.

@DhairyaLGandhi
Member Author

Ref JuliaGPU/Adapt.jl#24, I suppose

Successfully merging this pull request may close these issues.

RNN on GPU fails on first backward call