
Tensor Functions #11

Open
rleegates opened this issue Feb 20, 2015 · 7 comments

Comments

@rleegates

Hi Frédéric,

as far as I can tell, the package currently only supports functions that yield a scalar value. Any plans on extending this to tensor-valued functions of tensors?

Best regards,
Robert

@fredo-dedup
Contributor

Hi,

I had no plans for this, but yes, it is possible. Calculation time should be O(n). It would need some prior thinking on how to present the results, especially for higher-order derivatives.
I labelled your issue as an enhancement request (not sure I'll have time for this, though).

Don't know if this would work for you, but you have a workaround with the current version:

  • end your expression with f[i] (f being the variable containing the tensor)
  • generate your derivative expression with rdiff
  • build the loop around the expression: for i in 1:n ; $expr ; end
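The idea behind the workaround, stated language-agnostically: differentiate each scalar component f[i] separately and stack the per-component gradients into a Jacobian. Here is a minimal NumPy sketch of that loop, using central finite differences as a stand-in for the derivative expression rdiff would generate (grad_component and the example f are illustrative, not part of the package):

```python
import numpy as np

def grad_component(f, x, i, eps=1e-6):
    """Gradient of the scalar component f(x)[i] w.r.t. x, via central
    finite differences (a stand-in for a generated derivative expression)."""
    g = np.empty_like(x)
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx.flat[j] = eps
        g.flat[j] = (f(x + dx)[i] - f(x - dx)[i]) / (2 * eps)
    return g

# a tensor-valued f : R^2 -> R^2
f = lambda x: np.array([x[0] * x[1], x[0] + x[1]])
x = np.array([2.0, 3.0])

# "build the loop around the expression": one pass per output component
J = np.stack([grad_component(f, x, i) for i in range(f(x).size)])
assert np.allclose(J, [[3.0, 2.0], [1.0, 1.0]], atol=1e-4)
```

Each loop iteration is an independent scalar differentiation, which is why the total cost scales with the number of output components.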

@rleegates
Author

Hi Frédéric,

thank you for your quick reply. I'll try your workaround when I find the time, as I'm currently involved in another project. As far as I can tell, the workaround provides for the differentiation of tensor-valued functions with respect to scalars; however, I'm sure it could be extended to more complicated structures.

Just FYI, what I'd ideally be looking for is the computation of partial and/or total derivatives of functionals of the type f_{ij}(g_{ij}(x_{ij}), k(x_{ij})), such that the derivative with respect to x yields

d/dx_{kl} f_{ij} = df_{ij}/dg_{mn} dg_{mn}/dx_{kl} + df_{ij}/dk dk/dx_{kl},

in which either contractions (first term) or dyadics (second term) appear.

I was pondering doing this symbolically; however, your package would be a nice alternative, as it would let me skip the code generation from the symbolic expression. In addition, my use cases become even more complicated when the tensor function is applied to the eigenvalues of x_{ij}, a point where I'd be unsure whether symbolic computation will suffice. If I can be of help in implementing such features, we could continue this discussion by email.

Best regards,
Robert
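The chain rule above can be checked numerically for a concrete case. In the NumPy sketch below, the choices g(x) = x², k(x) = Σx, f = g·k are illustrative (not from the thread); the first einsum term is the contraction df/dg : dg/dx and the second is the dyadic df/dk ⊗ dk/dx, verified against finite differences:

```python
import numpy as np

n = 2
x = np.array([[1.0, 2.0], [3.0, 4.0]])
I = np.eye(n)

g = lambda x: x**2          # g_ij = x_ij^2 (elementwise, for illustration)
k = lambda x: x.sum()       # scalar functional of x
f = lambda x: g(x) * k(x)   # f_ij = g_ij * k

# analytic partials for these particular choices:
dg_dx = 2 * np.einsum('mn,mk,nl->mnkl', x, I, I)   # dg_mn/dx_kl
df_dg = k(x) * np.einsum('im,jn->ijmn', I, I)      # df_ij/dg_mn
df_dk = g(x)                                       # df_ij/dk
dk_dx = np.ones((n, n))                            # dk/dx_kl

# chain rule: contraction over (m, n) plus a dyadic product
J = (np.einsum('ijmn,mnkl->ijkl', df_dg, dg_dx)
     + np.einsum('ij,kl->ijkl', df_dk, dk_dx))

# finite-difference check of the full fourth-order derivative
eps = 1e-6
J_num = np.empty((n, n, n, n))
for kk in range(n):
    for ll in range(n):
        dx = np.zeros((n, n)); dx[kk, ll] = eps
        J_num[:, :, kk, ll] = (f(x + dx) - f(x - dx)) / (2 * eps)

assert np.allclose(J, J_num, atol=1e-5)
```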

@alexbw

alexbw commented Jul 24, 2016

FYI, if just writing down all the gradients of lots of tensor-valued functions is the blocker, this has been done (at least twice) in the autograd family of libraries.

In autograd: https://github.com/HIPS/autograd/blob/master/autograd/numpy/numpy_grads.py
In the Torch version of autograd: https://github.com/twitter/torch-autograd/blob/master/src/gradfuns.lua

EDIT: the most confusing gradients in the links above are those dealing with tensor resizing, indexing, and broadcasting. I'm happy to help and walk through the code with anyone interested in porting them to Julia.

@dfdx

dfdx commented Aug 14, 2016

@alexbw I'm definitely interested in porting tensor gradients to Julia (e.g. see dfdx/Espresso.jl#2 for some details). Would you suggest any "entry point" to get started (either in code or in theoretical papers)?

@alexbw

alexbw commented Aug 15, 2016

On the issue you linked, I think you're conflating the partial derivatives you need to write with the method you will use to perform automatic differentiation of output w.r.t. input. Indeed, we do require functions to have scalar output in torch-autograd, but I believe autograd supports calculation of the Jacobian (non-scalar output) by doing multiple passes of the function, once per column of the Jacobian. So, if you get scalar-valued outputs working, you just need some small extra effort to get tensor-valued outputs.

I would recommend just lifting the gradients from autograd or torch-autograd. In autograd the file is called "numpy_grads.py", I believe, and it's "gradfuns.lua" in torch-autograd.
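The multiple-pass idea can be sketched concretely: reverse mode natively computes vector-Jacobian products, so seeding one pass per output component with a one-hot vector recovers the full Jacobian row by row. In this minimal NumPy illustration the linear f and its hand-written VJP stand in for what an AD system would derive automatically:

```python
import numpy as np

# Reverse mode gives vector-Jacobian products: vjp(v) = v^T J.
# For f(x) = A @ x the Jacobian is A, and the VJP is A^T v, so each
# one-hot seed recovers one row of the Jacobian.
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
f = lambda x: A @ x
vjp = lambda x, v: A.T @ v   # hand-written reverse pass for this f

x = np.array([0.5, -1.0])
m = f(x).size

# one reverse pass per output component
J = np.stack([vjp(x, np.eye(m)[i]) for i in range(m)])
assert np.allclose(J, A)
```

This is why scalar outputs are the natural primitive: non-scalar outputs cost one extra pass per additional output component, but need no new machinery.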


@dfdx

dfdx commented Aug 16, 2016

@alexbw Thanks for your answer. For the code you linked, am I right in saying that gradients there are represented as Python/Lua functions that take previous gradients (i.e. gradients of arguments) and produce a new gradient for the current operation itself? That is, something like this:

grad_1 = make_gradient_myfunc(A, B)
grad_1(already_computed_gradients_of_A_and_B)

Also, I don't really understand the meaning of the ubiquitous unbroadcast function there. I see that it sums out some dimensions of a tensor, but which ones, and for what purpose?

@alexbw

alexbw commented Aug 22, 2016

Yes. The function signature, for some function such as sum(x, y), is:

gradSum[1] = function(incomingGradient, answerOfSum, x, y) ... end
gradSum[2] = function(incomingGradient, answerOfSum, x, y) ... end

to calculate the partial gradients for each argument of sum(x, y).
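A Python analogue of this per-argument gradient table might look as follows, using mul(x, y) so the two partials actually differ (the names and the list-of-closures layout are illustrative, not torch-autograd's exact API):

```python
import numpy as np

# one closure per argument, each receiving the incoming gradient,
# the forward answer, and the original arguments
grad_mul = [
    lambda g, ans, x, y: g * y,   # partial w.r.t. x
    lambda g, ans, x, y: g * x,   # partial w.r.t. y
]

x, y = np.array([1.0, 2.0]), np.array([3.0, 4.0])
ans = x * y
g = np.ones_like(ans)             # incoming gradient from downstream

gx = grad_mul[0](g, ans, x, y)
gy = grad_mul[1](g, ans, x, y)
assert gx.tolist() == [3.0, 4.0]
assert gy.tolist() == [1.0, 2.0]
```

Note that the incoming gradient g flows from downstream of the operation, which answers the earlier question: these closures do not consume "gradients of arguments" but the gradient of the final output with respect to this operation's result.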

Unbroadcast (it used to be called "sumToMatchShape") is used a lot to match gradient shapes when there has been replication. If you replicate a tensor in the forward pass, the action you must take in the backward pass is to sum (not select) the replicated parts together.
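In NumPy terms, a minimal sketch of such a helper (hypothetical, mirroring the behaviour described above, not copied from either library) sums away the axes that broadcasting added or replicated:

```python
import numpy as np

def unbroadcast(grad, shape):
    """Sum `grad` down to `shape`, undoing NumPy-style broadcasting.

    Axes that broadcasting prepended are summed away entirely;
    size-1 axes that were replicated are summed with keepdims.
    """
    # remove leading axes that broadcasting prepended
    while grad.ndim > len(shape):
        grad = grad.sum(axis=0)
    # collapse axes that were size 1 before broadcasting
    for axis, size in enumerate(shape):
        if size == 1:
            grad = grad.sum(axis=axis, keepdims=True)
    return grad

# x had shape (1, 3) and was broadcast against a (2, 3) array in the
# forward pass, so the incoming gradient has shape (2, 3); the two
# replicated rows are summed, not selected.
g = np.ones((2, 3))
assert unbroadcast(g, (1, 3)).shape == (1, 3)
assert unbroadcast(g, (1, 3)).tolist() == [[2.0, 2.0, 2.0]]
```

So the answer to "which dimensions?" is: exactly those where the argument's original shape differs from the broadcast result's shape.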

