Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Fill ctor #445

Merged
merged 1 commit into from
Jan 14, 2020
Merged

Fix Fill ctor #445

merged 1 commit into from
Jan 14, 2020

Conversation

willtebbutt
Copy link
Member

The Fill constructor adjoint fell over if you used any indexing operations on a given Fill. This is annoying as there are legitimate reasons for wanting to do this.

@MikeInnes
Copy link
Member

MikeInnes commented Jan 6, 2020

Indexing operations should end up using the zygote-defined adjoints that produce arrays. I don't know anything about getindex_value, but strictly speaking if that's the issue it might be equally valid to just add an adjoint for it.

Making .value work seems reasonable (does FillArrays support that as part of its API?) in any case, so this isn't any strong objection.

@willtebbutt
Copy link
Member Author

It seems to me that for .value and getindex_value it makes most sense to treat a Fill as a struct, but for array-like operations it makes sense (as you say) to treat it like an array, as you say. So I reckon that an implementation like this is probably reasonable?

@MikeInnes
Copy link
Member

I think it's acceptable, but it does have the downside that you might get errors doing things like getindex_value(xs) + xs[1], since we'll try to sum a namedtuple and array adjoint together. It's somewhat more consistent to make sure that all operations on fill arrays return arrays as adjoints (in this case the adjoint of getindex_value(xs) is xs ./ length(xs), I think).

@willtebbutt
Copy link
Member Author

Does this not just require us to ensure that accum is defined appropriately for that particular NamedTuple and whatever 1-hot array is used? Whatever solution we have, it's important to maintain the correct complexity here. If we start throwing 1-hot arrays around, and accumulating them into dense arrays, we lose performance. In this case I think that even x[1] should probably return a named tuple...

@MikeInnes
Copy link
Member

accum wouldn't have enough type information to know how to add the namedtuple and array. We'd have to define a FillArrayAdjoint type that we can dispatch on.

But that seems more complex than just using FillArray to represent its own adjoint, which is certainly possible for getindex_value.

I don't know if it's common to actually use scalar indexing with FillArrays, but if it so I think it may actually be justified to make the gradient of this information a FillArray, avoiding any actual array allocation while keeping the implementation neat.

@willtebbutt
Copy link
Member Author

accum wouldn't have enough type information to know how to add the namedtuple and array. We'd have to define a FillArrayAdjoint type that we can dispatch on.

True.

But that seems more complex than just using FillArray to represent its own adjoint, which is certainly possible for getindex_value.

A Fill isn't a reasonable representation of it's own adjoint. If I do x[1], what Fill should I use to represent the adjoint?

@MikeInnes
Copy link
Member

MikeInnes commented Jan 10, 2020

The gradient for y = x[i] is x̄ = Fill(ȳ / length(x), size(x)), the same gradient as mean(xs). xs[i] isn't implemented that way, but behaves equivalently, which means that the gradient is ambiguous; either the one-hot array or the FillArray will give valid results.

The only downside is a slightly surprising (but arguably correct) result when taking the gradient of a FillArray directly, but in general I don't think this can impact other gradients, which makes it a neat solution. Since FillAdjointArrays can be returned to the user as well, they'd be forced to behave as arrays for consistency in the same way, making them equivalent.

@willtebbutt
Copy link
Member Author

Yeah, surprising... I can't disagree that it behaves equivalently under addition, but as you allude to it feels like a hack -- there's obviously loads of other functionality defined on Fills for which it would be equivalent.

Anyway, are you sufficiently happy with the approach I've got here for it to be merged for now?

@MikeInnes
Copy link
Member

I agree it seems hacky at first, but I think this one grows on you. This case increasingly makes me feel that the intuition that xs[i] should have a certain gradient because of the way it's spelled is flawed. I really can't think of a logical argument that the intuitive behaviour is more correct.

Anyway, yes, we can merge this for now and revisit if anyone complains about accum errors.

bors r+

bors bot added a commit that referenced this pull request Jan 11, 2020
445: Fix Fill ctor r=MikeInnes a=willtebbutt

The Fill constructor adjoint fell over if you used any indexing operations on a given `Fill`. This is annoying as there are legitimate reasons for wanting to do this.

Co-authored-by: willtebbutt <wt0881@my.bristol.ac.uk>
@bors
Copy link
Contributor

bors bot commented Jan 11, 2020

Build failed

@oxinabox
Copy link
Member

Yeah, surprising... I can't disagree that it behaves equivalently under addition, but as you allude to it feels like a hack -- there's obviously loads of other functionality defined on Fills for which it would be equivalent.

addition is all that matters though.

@willtebbutt
Copy link
Member Author

willtebbutt commented Jan 14, 2020

addition is all that matters though.

True. I should at least be self-consistent (I definitely meant to say wouldn't be equivalent, as opposed to would).

@MikeInnes what would you like to do about the bors failure?

@MikeInnes
Copy link
Member

bors r+

bors bot added a commit that referenced this pull request Jan 14, 2020
445: Fix Fill ctor r=MikeInnes a=willtebbutt

The Fill constructor adjoint fell over if you used any indexing operations on a given `Fill`. This is annoying as there are legitimate reasons for wanting to do this.

Co-authored-by: willtebbutt <wt0881@my.bristol.ac.uk>
@bors
Copy link
Contributor

bors bot commented Jan 14, 2020

Build succeeded

@bors bors bot merged commit 7f4c63a into master Jan 14, 2020
@MikeInnes
Copy link
Member

In slack discussions I pointed out that in cases like gradient(x -> x[1] + sum(x), x) and gradient(x -> x[1], x) + x force us to choose a 'basis' for the FillArrayAdjoint anyway; AFAICT the (implicit) proposal is that FillArrayAdjoint behaves exactly like a FillArray when it actually gets used. In effect, the only change made in my proposal is to implicitly calculate

gradient(f, x) + zero(x)

when taking gradients; if this is problem, it's a problem with using an efficient representation in any way.

The argument for an efficient but non-hacky (if it is a hack) option rings false here. We go whole-hog and use the efficient but counter-intuitive behaviour everywhere, or we go back to using dense arrays for x[i] and get the right behaviour.

Personally, I think there's plenty of precedent for supporting mathematical operations that are numerically ambiguous (basis of vectors, branch cuts etc.) by making an arbitrary decision. In any case, I'm not too personally concerned about the conclusion, but do think it's important to understand that there's a tradeoff here; we can't have it both ways.

If I'm missing something here I think a detailed discussion of those test cases would be helpful.

@MikeInnes
Copy link
Member

An interesting additional perspective came up in discussion between me and @oxinabox. In effect this is just another version of the "natural"/"structural" differentiation issue that has come up many times (e.g. Taylor series, sparse arrays). Unlike some other cases where conversion between the two differentials is simply possible or not, in this case it's just slightly lossy (with respect to the specific distribution of the gradient over the array). This is what gives us the option of doing a pseudo-natural approach which is somewhat arbitrary on this front. It's an interesting tradeoff, at any rate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants