Fix Fill ctor #445

willtebbutt · 2020-01-05T19:29:12Z

The Fill constructor adjoint fell over if you used any indexing operations on a given Fill. This is annoying as there are legitimate reasons for wanting to do this.

MikeInnes · 2020-01-06T17:49:45Z

Indexing operations should end up using the zygote-defined adjoints that produce arrays. I don't know anything about getindex_value, but strictly speaking if that's the issue it might be equally valid to just add an adjoint for it.

Making .value work seems reasonable (does FillArrays support that as part of its API?) in any case, so this isn't any strong objection.

willtebbutt · 2020-01-06T18:03:46Z

It seems to me that for .value and getindex_value it makes most sense to treat a Fill as a struct, but for array-like operations it makes sense (as you say) to treat it like an array, as you say. So I reckon that an implementation like this is probably reasonable?

MikeInnes · 2020-01-10T14:27:31Z

I think it's acceptable, but it does have the downside that you might get errors doing things like getindex_value(xs) + xs[1], since we'll try to sum a namedtuple and array adjoint together. It's somewhat more consistent to make sure that all operations on fill arrays return arrays as adjoints (in this case the adjoint of getindex_value(xs) is xs ./ length(xs), I think).

willtebbutt · 2020-01-10T15:05:28Z

Does this not just require us to ensure that accum is defined appropriately for that particular NamedTuple and whatever 1-hot array is used? Whatever solution we have, it's important to maintain the correct complexity here. If we start throwing 1-hot arrays around, and accumulating them into dense arrays, we lose performance. In this case I think that even x[1] should probably return a named tuple...

MikeInnes · 2020-01-10T17:32:54Z

accum wouldn't have enough type information to know how to add the namedtuple and array. We'd have to define a FillArrayAdjoint type that we can dispatch on.

But that seems more complex than just using FillArray to represent its own adjoint, which is certainly possible for getindex_value.

I don't know if it's common to actually use scalar indexing with FillArrays, but if it so I think it may actually be justified to make the gradient of this information a FillArray, avoiding any actual array allocation while keeping the implementation neat.

willtebbutt · 2020-01-10T17:37:44Z

accum wouldn't have enough type information to know how to add the namedtuple and array. We'd have to define a FillArrayAdjoint type that we can dispatch on.

True.

But that seems more complex than just using FillArray to represent its own adjoint, which is certainly possible for getindex_value.

A Fill isn't a reasonable representation of it's own adjoint. If I do x[1], what Fill should I use to represent the adjoint?

MikeInnes · 2020-01-10T18:38:29Z

The gradient for y = x[i] is x̄ = Fill(ȳ / length(x), size(x)), the same gradient as mean(xs). xs[i] isn't implemented that way, but behaves equivalently, which means that the gradient is ambiguous; either the one-hot array or the FillArray will give valid results.

The only downside is a slightly surprising (but arguably correct) result when taking the gradient of a FillArray directly, but in general I don't think this can impact other gradients, which makes it a neat solution. Since FillAdjointArrays can be returned to the user as well, they'd be forced to behave as arrays for consistency in the same way, making them equivalent.

willtebbutt · 2020-01-11T10:43:55Z

Yeah, surprising... I can't disagree that it behaves equivalently under addition, but as you allude to it feels like a hack -- there's obviously loads of other functionality defined on Fills for which it would be equivalent.

Anyway, are you sufficiently happy with the approach I've got here for it to be merged for now?

MikeInnes · 2020-01-11T11:09:29Z

I agree it seems hacky at first, but I think this one grows on you. This case increasingly makes me feel that the intuition that xs[i] should have a certain gradient because of the way it's spelled is flawed. I really can't think of a logical argument that the intuitive behaviour is more correct.

Anyway, yes, we can merge this for now and revisit if anyone complains about accum errors.

bors r+

445: Fix Fill ctor r=MikeInnes a=willtebbutt The Fill constructor adjoint fell over if you used any indexing operations on a given `Fill`. This is annoying as there are legitimate reasons for wanting to do this. Co-authored-by: willtebbutt <wt0881@my.bristol.ac.uk>

bors · 2020-01-11T11:23:49Z

Build failed

ci/gitlab/gitlab.com

oxinabox · 2020-01-14T11:07:05Z

Yeah, surprising... I can't disagree that it behaves equivalently under addition, but as you allude to it feels like a hack -- there's obviously loads of other functionality defined on Fills for which it would be equivalent.

addition is all that matters though.

willtebbutt · 2020-01-14T11:12:39Z

addition is all that matters though.

True. I should at least be self-consistent (I definitely meant to say wouldn't be equivalent, as opposed to would).

@MikeInnes what would you like to do about the bors failure?

MikeInnes · 2020-01-14T11:55:29Z

bors r+

445: Fix Fill ctor r=MikeInnes a=willtebbutt The Fill constructor adjoint fell over if you used any indexing operations on a given `Fill`. This is annoying as there are legitimate reasons for wanting to do this. Co-authored-by: willtebbutt <wt0881@my.bristol.ac.uk>

bors · 2020-01-14T12:17:15Z

Build succeeded

ci/gitlab/gitlab.com

MikeInnes · 2020-01-14T12:24:46Z

In slack discussions I pointed out that in cases like gradient(x -> x[1] + sum(x), x) and gradient(x -> x[1], x) + x force us to choose a 'basis' for the FillArrayAdjoint anyway; AFAICT the (implicit) proposal is that FillArrayAdjoint behaves exactly like a FillArray when it actually gets used. In effect, the only change made in my proposal is to implicitly calculate

gradient(f, x) + zero(x)

when taking gradients; if this is problem, it's a problem with using an efficient representation in any way.

The argument for an efficient but non-hacky (if it is a hack) option rings false here. We go whole-hog and use the efficient but counter-intuitive behaviour everywhere, or we go back to using dense arrays for x[i] and get the right behaviour.

Personally, I think there's plenty of precedent for supporting mathematical operations that are numerically ambiguous (basis of vectors, branch cuts etc.) by making an arbitrary decision. In any case, I'm not too personally concerned about the conclusion, but do think it's important to understand that there's a tradeoff here; we can't have it both ways.

If I'm missing something here I think a detailed discussion of those test cases would be helpful.

MikeInnes · 2020-01-14T13:19:25Z

An interesting additional perspective came up in discussion between me and @oxinabox. In effect this is just another version of the "natural"/"structural" differentiation issue that has come up many times (e.g. Taylor series, sparse arrays). Unlike some other cases where conversion between the two differentials is simply possible or not, in this case it's just slightly lossy (with respect to the specific distribution of the gradient over the array). This is what gives us the option of doing a pseudo-natural approach which is somewhat arbitrary on this front. It's an interesting tradeoff, at any rate.

Fix Fill ctor

7f4c63a

willtebbutt requested a review from Roger-luo January 5, 2020 19:29

willtebbutt mentioned this pull request Jan 10, 2020

AbstractArray or Composite? JuliaDiff/ChainRulesCore.jl#85

Open

bors bot merged commit 7f4c63a into master Jan 14, 2020

oxinabox mentioned this pull request Jan 15, 2020

How to represent structural deriviatives #462

Open

CarloLucibello deleted the wct/fix-Fill branch June 22, 2020 05:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix Fill ctor #445

Fix Fill ctor #445

willtebbutt commented Jan 5, 2020

MikeInnes commented Jan 6, 2020 •

edited

Loading

willtebbutt commented Jan 6, 2020

MikeInnes commented Jan 10, 2020

willtebbutt commented Jan 10, 2020

MikeInnes commented Jan 10, 2020

willtebbutt commented Jan 10, 2020

MikeInnes commented Jan 10, 2020 •

edited

Loading

willtebbutt commented Jan 11, 2020

MikeInnes commented Jan 11, 2020

bors bot commented Jan 11, 2020

oxinabox commented Jan 14, 2020

willtebbutt commented Jan 14, 2020 •

edited

Loading

MikeInnes commented Jan 14, 2020

bors bot commented Jan 14, 2020

MikeInnes commented Jan 14, 2020

MikeInnes commented Jan 14, 2020

Fix Fill ctor #445

Fix Fill ctor #445

Conversation

willtebbutt commented Jan 5, 2020

MikeInnes commented Jan 6, 2020 • edited Loading

willtebbutt commented Jan 6, 2020

MikeInnes commented Jan 10, 2020

willtebbutt commented Jan 10, 2020

MikeInnes commented Jan 10, 2020

willtebbutt commented Jan 10, 2020

MikeInnes commented Jan 10, 2020 • edited Loading

willtebbutt commented Jan 11, 2020

MikeInnes commented Jan 11, 2020

bors bot commented Jan 11, 2020

Build failed

oxinabox commented Jan 14, 2020

willtebbutt commented Jan 14, 2020 • edited Loading

MikeInnes commented Jan 14, 2020

bors bot commented Jan 14, 2020

Build succeeded

MikeInnes commented Jan 14, 2020

MikeInnes commented Jan 14, 2020

MikeInnes commented Jan 6, 2020 •

edited

Loading

MikeInnes commented Jan 10, 2020 •

edited

Loading

willtebbutt commented Jan 14, 2020 •

edited

Loading