Fix Fill ctor #445
Conversation
Indexing operations should end up using the Zygote-defined adjoints that produce arrays. I don't know anything about […]. Making […]
It seems to me that for […]
I think it's acceptable, but it does have the downside that you might get errors doing things like […]
Does this not just require us to ensure that […]
But that seems more complex than just using […]. I don't know if it's common to actually use scalar indexing with FillArrays, but if so I think it may actually be justified to make the gradient of this information a […]
True.
A […]
The gradient for […]. The only downside is a slightly surprising (but arguably correct) result when taking the gradient of a FillArray directly, but in general I don't think this can impact other gradients, which makes it a neat solution. Since […]
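The equivalence-under-addition argument can be sketched numerically. The following is a Python/NumPy analogy, not Zygote or FillArrays code; `fill_pullback` is a hypothetical helper standing in for the adjoint of a fill-style constructor `x -> full(n, x)`, which sums the incoming cotangent. Under that pullback, a fill-valued gradient carrying the mean entry is indistinguishable from the dense cotangent:

```python
import numpy as np

def fill_pullback(cotangent):
    # Adjoint of x -> np.full(n, x): accumulate the cotangent by summing,
    # since x contributes identically to every output entry.
    return np.sum(cotangent)

n = 4
dense_cotangent = np.array([1.0, 2.0, 3.0, 4.0])

# "Fill-style" gradient: represent the cotangent by its mean, repeated.
# This loses the per-entry distribution but preserves the sum.
fill_cotangent = np.full(n, dense_cotangent.mean())

# Both representations agree once pulled back through the constructor.
print(fill_pullback(dense_cotangent))  # 10.0
print(fill_pullback(fill_cotangent))   # 10.0
```

This is why the choice only becomes visible when the gradient of the array itself is inspected directly, rather than propagated onward.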
Yeah, surprising... I can't disagree that it behaves equivalently under addition, but as you allude to it feels like a hack -- there's obviously loads of other functionality defined on […]. Anyway, are you sufficiently happy with the approach I've got here for it to be merged for now?
I agree it seems hacky at first, but I think this one grows on you. This case increasingly makes me feel that the intuition that […]. Anyway, yes, we can merge this for now and revisit if anyone complains about […].

bors r+
Build failed
[…] addition is all that matters though.
True. I should at least be self-consistent (I definitely meant to say wouldn't be equivalent, as opposed to would). @MikeInnes what would you like to do about the bors failure?
bors r+ |
Build succeeded
In Slack discussions I pointed out that in cases like […] you can use `gradient(f, x) + zero(x)` when taking gradients; if this is a problem, it's a problem with using an efficient representation in any way. The argument for an efficient but non-hacky (if it is a hack) option rings false here. Either we go whole-hog and use the efficient but counter-intuitive behaviour everywhere, or we go back to using dense arrays for […].

Personally, I think there's plenty of precedent for supporting mathematical operations that are numerically ambiguous (basis of vectors, branch cuts etc.) by making an arbitrary decision. In any case, I'm not too personally concerned about the conclusion, but do think it's important to understand that there's a tradeoff here; we can't have it both ways. If I'm missing something here I think a detailed discussion of those test cases would be helpful.
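The densification workaround mentioned above can be sketched as follows. This is a Python analogy, not real Zygote or FillArrays code; `FillGrad` is a toy stand-in for a structured, fill-valued gradient, and the point is that adding `zero(x)` forces it into a dense array:

```python
import numpy as np

class FillGrad:
    """Toy stand-in for a structured (fill-valued) gradient."""
    def __init__(self, value, n):
        self.value, self.n = value, n

    def __add__(self, other):
        # Adding any dense array densifies the structured gradient,
        # distributing the fill value over every entry.
        return np.full(self.n, self.value) + other

x = np.ones(3)
g = FillGrad(2.0, 3)             # structured gradient, e.g. from a Fill
dense_g = g + np.zeros_like(x)   # the "gradient(f, x) + zero(x)" trick
print(dense_g)                   # [2. 2. 2.]
```

The cost of the efficient representation is borne only by callers who explicitly opt out of it this way, which is the tradeoff being weighed in the comment above.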
An interesting additional perspective came up in discussion between me and @oxinabox. In effect this is just another version of the "natural"/"structural" differentiation issue that has come up many times (e.g. Taylor series, sparse arrays). Unlike some other cases where conversion between the two differentials is simply possible or not, in this case it's just slightly lossy (with respect to the specific distribution of the gradient over the array). This is what gives us the option of doing a pseudo-natural approach which is somewhat arbitrary on this front. It's an interesting tradeoff, at any rate. |
The `Fill` constructor adjoint fell over if you used any indexing operations on a given `Fill`. This is annoying as there are legitimate reasons for wanting to do this.