Activations #860

Merged: 18 commits into FluxML:master on Nov 19, 2019
Conversation

@dsweber2 (Contributor):

Taking derivatives w.r.t. the parameters results in complaints about mutability (MWE below). To get around this, I made the storage array a `Zygote.Buffer` and return a copy after inserting everything. I tried an `accumulate!`-based version, which worked on commit ecc9ce9 but broke when I caught up.

Simple example:

```julia
c = Chain(Dense(3, 5, relu), Dense(5, 1, relu))
X = Float32.([1.0; 1.0; 1.0])
gradient(() -> Flux.activations(c, X)[2][1], params(c))
```
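For context, here is a minimal sketch of the `Zygote.Buffer` workaround described above; `activations_buffered` is a hypothetical name used for illustration, not the PR's actual code.

```julia
# Hypothetical sketch of the Buffer-based workaround (not the PR's code):
# each layer's output is written into a Zygote.Buffer, which Zygote allows
# to be mutated inside differentiated code, then copied out at the end.
using Flux, Zygote

function activations_buffered(c::Chain, x)
    buf = Zygote.Buffer([], length(c.layers))  # mutation-safe storage
    for (i, layer) in enumerate(c.layers)
        x = layer(x)
        buf[i] = x                             # record this layer's output
    end
    return copy(buf)  # copy(::Buffer) yields an ordinary, differentiable array
end
```

With this shape, the MWE's `gradient` call no longer hits the mutability complaint, since the only mutation happens through the `Buffer`.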

@MikeInnes (Member):

Thanks a lot for the patch!

I wonder if we could just have `activations` return a tuple, similar to how `applychain` works? Then it should be Zygote-compatible by default.
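For reference, `applychain` at the time used the following recursion (its signature also appears in the diff further down); a tuple-returning `activations` can follow the same shape:

```julia
# Flux's applychain recursion: peel off the first layer, apply it, recurse.
# An empty layer tuple simply returns the input unchanged.
applychain(::Tuple{}, x) = x
applychain(fs::Tuple, x) = applychain(Base.tail(fs), first(fs)(x))
```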

@dsweber2 (Contributor, Author):

Yeah, that's definitely a nicer way to do it (done in the new commit). There are a couple of extra terms in the gradient that I'm somewhat mystified by, but they don't seem to be causing problems:

```julia
IdDict{Any,Any} with 6 entries:
  Float32[0.687749 -0.78338 0.341401; 0.6577… => Float32[0.0 0.0 0.0; 0.0 0.0 0.0; … ; 0.0 0.0 0.0; 0.0 0.0 0.0]
  Float32[0.0]                                => Float32[0.0]
  IOBuffer(data=UInt8[...], readable=true, w… => RefValue{Any}((data = nothing, readable = nothing, writable = nothing, seekable = not…
  IOBuffer(data=UInt8[...], readable=true, w… => RefValue{Any}((data = nothing, readable = nothing, writable = nothing, seekable = not…
  Float32[0.0, 0.0, 0.0, 0.0, 0.0]            => Float32[0.0, 0.0, 0.0, 0.0, 0.0]
  Float32[-0.745811 0.0290723 … -0.874623 -0… => Float32[0.0 0.0 … 0.0 0.0]
```

What do you think of adding a `Chain` method that takes two arguments, the second being a list of indices of the depths at which to return the transform? That would remove the need for an `activations` function entirely, though it wouldn't really decrease the total number of lines of code.

@MikeInnes (Member):

This looks good, thanks. It would be great to have a quick test for it so we don't break it again.

> What do you think of adding a Chain method that takes in two arguments, the second being a list of indices of the depths at which to return the transform?

I don't quite follow this; it would be great to see an example of what it would look like.

@dsweber2 (Contributor, Author) commented Oct 9, 2019:

Something like `(c::Chain)(x, i) = extraChain(c.layers, x)[i]`, in addition to what I wrote. There might be a more efficient implementation. Small usage example:

```julia
C = Chain(Dense(10, 5, σ), Dense(5, 2), softmax)
C(randn(10), :)      # equivalent to
activations(C, randn(10))
# also allows
C(randn(10), [1, 3]) # wouldn't return the second term
```

```julia
function extraChain(fs::Tuple, x)
    res = first(fs)(x)
    return (res, extraChain(Base.tail(fs), res)...)
end

extraChain(::Tuple{}, x) = []
```
@MikeInnes (Member) commented on this diff:
This should probably be the empty tuple, so that the compiler can unroll everything
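In other words, the suggested base case would be:

```julia
# With an empty tuple as the base case, the result stays tuple-typed, its
# length is known from the layer types, and the compiler can unroll the
# recursion completely.
extraChain(::Tuple{}, x) = ()
```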

```julia
@testset "Activations" begin
  c = Chain(Dense(3,5,relu), Dense(5,1,relu))
  X = Float32.([1.0; 1.0; 1.0])
  @test_nowarn gradient(()->Flux.activations(c, X)[2][1], params(c))
```
@MikeInnes (Member) commented on this diff:
Can we add a regular test as well, i.e. one that makes sure the output is right? Rather than chaining dense layers, it might be useful to chain something simple like `Chain(x -> x^2, x -> x+1)` so that the outputs and gradients are trivial (a sketch follows below).

Otherwise really happy with this patch!
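A sketch of the kind of trivial test suggested above, assuming the tuple-returning `activations` from this PR (illustrative only, not the exact test that was merged):

```julia
using Flux, Test

# With f(x) = x.^2 followed by g(y) = y .+ 1, the activations of input
# [2.0] should be [4.0] after the first layer and [5.0] after the second.
c = Chain(x -> x .^ 2, x -> x .+ 1)
@test Flux.activations(c, [2.0]) == ([4.0], [5.0])
```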

@MikeInnes (Member) commented on this diff:

Actually, I see that there are some other tests above this one; what's the need for the additional `@test_nowarn` here? If it's redundant, it'd be best to remove it.

@dsweber2 (Contributor, Author):

Sorry about the extra commits; it looks like rebasing onto master made this a bit of a mess. I've made both of the changes you suggested.

@MikeInnes changed the base branch from zygote to master on November 15, 2019
```diff
@@ -31,6 +31,8 @@ applychain(fs::Tuple, x) = applychain(tail(fs), first(fs)(x))

 (c::Chain)(x) = applychain(c.layers, x)

+(c::Chain)(x) = extraChain(c.layers, x)
```
@MikeInnes (Member) commented on this diff:

This definition doesn't look right to me.

@MikeInnes (Member):

No worries; it just needs to target the master branch rather than the old zygote branch. CI appears to have an issue, by the way.

@dsweber2 (Contributor, Author):

The CI failure seems to be because I wasn't fully on the master branch, though nightly is having some issue initializing Zygote. The extra `(c::Chain)` definition was left over from the rewrite I mentioned; it should be gone now.

@MikeInnes (Member):

`bors r+`

bors bot added a commit referencing this pull request on Nov 19, 2019:
860: Activations r=MikeInnes a=dsweber2

Co-authored-by: dsweber2 <david.weber2@gmail.com>
Co-authored-by: Mosè Giordano <m.giordano@ucl.ac.uk>
@bors (bot) commented Nov 19, 2019:

Build failed

@MikeInnes (Member):

Failure is #923.

@MikeInnes merged commit 5839e16 into FluxML:master on Nov 19, 2019
@MikeInnes (Member):

Thanks @dsweber2!
