Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-level derivatives (and seeding) #21

Closed
kewp opened this issue Mar 17, 2019 · 18 comments
Closed

Multi-level derivatives (and seeding) #21

kewp opened this issue Mar 17, 2019 · 18 comments
Labels
question Further information is requested

Comments

@kewp
Copy link

kewp commented Mar 17, 2019

I'm getting strange behavior when trying to calculate a gradient. I wonder if it's because in my objective function I use calculate derivatives.

To explain: I see the way one calculates involves seeding autodiff with the target variable:

VectorXdual u(5);
u << 1.0, 1.1, 1.2, 1.3, 1.4;

dual w = energy(u);

VectorXd dwdu(5);
int i; for (i=0; i<5; i++)
{
    seed(wrt(u[i]));
    dwdu[i] = energy(u).grad;
    unseed(wrt(u[i]));
}

So if I understand, seeding is a way to (globally?) track the variable of interest. But what happens if my target function also calculates derivatives?

dual f(VectorXdual u)
{
   // ....
}

dual energy(VectorXdual u)
{
  double dfdx = derivative(f, wrt(x), u);
  // ...
}

I see in forward.hpp that derivative() also uses seeding. Do the two clash with one another? Can a nested derivative calculation like this work?

@allanleal
Copy link
Member

allanleal commented Mar 18, 2019

I think you should avoid this usage. Define your function energy using dual as if you'd use double.

Why do you need to call derivative inside energy? As soon as you start mixing double and dual above, there is a danger of calculating wrong derivatives, since the double variables will not have the introspection mechanism of the dual variable.

@kewp
Copy link
Author

kewp commented Mar 18, 2019

Because my energy calculations are based on derivatives of other functions.

I’m coding up finite elements. I’m calculating the strain energy of the system, which is based on the strain which is the partial derivative of displacement!

@allanleal
Copy link
Member

allanleal commented Mar 18, 2019

Because my energy calculations are based on derivatives of other functions.

How would you compute energy without automatic differentiation variables? Would you be able to easily compute these internal derivatives by manually implementing them?

I also happen to define functions that compute thermodynamic properties which themselves require derivatives of other properties. I have to implement these "internal" derivatives manually (in general not difficult) to keep a consistent usage of automatic differentiation types.

@allanleal
Copy link
Member

So if I understand, seeding is a way to (globally?) track the variable of interest.

Forgot to reply this. Seeding set the grad data member of dual to 1.0. This permits the calculation of derivative of a function with respect to that variable (while all other dual arguments in the function are unseeded, with grad equals to 0.0.

@kewp
Copy link
Author

kewp commented Mar 18, 2019

I can do them manually. I was hoping to not have to :)

So nested derivatives are out in forward mode? In one of your reverse mode examples you calculate a derivative and then a further derivative of that. Does this restriction only apply to forward autodiff?

@allanleal
Copy link
Member

allanleal commented Mar 18, 2019

So nested derivatives are out in forward mode?

Not necessarily - but I would always recommend a consistent usage of dual throughout. Use double only when the value is a constant. If you are computing a dependent variable, make sure you use dual (or var, if using reverse.hpp).

In one of your reverse mode examples you calculate a derivative and then a further derivative of that. Does this restriction only apply to forward autodiff?

You speak about higher order derivatives I guess. This is also possible with dual, but I would suggest you not to go this route now - focus on first order, and later, if absolutely needed, increase the order of derivatives. Reason: higher order derivatives can dramatically increase the computation effort.

@kewp
Copy link
Author

kewp commented Mar 18, 2019

In your first reply above you said I’m mixing dual and double. Where do I do that in the code I posted?

@allanleal
Copy link
Member

Here:

dual energy(VectorXdual u)
{
  double dfdx = derivative(f, wrt(x), u);
  // ...
}

Since dfdx depends on u, any usage of dfdx afterwards will not be autodiff-enabled.

@allanleal allanleal added the question Further information is requested label Mar 18, 2019
@kewp
Copy link
Author

kewp commented Mar 18, 2019

Do you know newton raphson? It’s a technique for finding the zero of a function.

My purpose is to find the set of degrees of freedom which minimizes my objective function. Here I call the objective energy. My degrees of freedom are my duals(or vars). And to solve I take the derivative in terms of the vars - this gives me a vector. Then I take the derivatives of this vector - giving a matrix. Solving for Ax = b where A is this matrix and b the aforementioned vector, you should get a new vector x which shows how much to modify your original degrees of freedom to get to the minimum energy. Do this iteratively and you have a generic minimization procedure for any function based on a set of variables.

Does this all sound plausible / possible with autodiff? I’m just worried because of our discussing nested derivatives (though what I describe does seem like your example of taking derivatives of a derivative)

@kewp
Copy link
Author

kewp commented Mar 18, 2019

Since dfdx depends on u, any usage of dfdx afterwards will not be autodiff-enabled.

Ok - I see what you mean here now

@allanleal
Copy link
Member

allanleal commented Mar 18, 2019

Do you know newton raphson? It’s a technique for finding the zero of a function.

It's in the core of everything I do! ;)

So, you have a minimization problem; you know the objective function; and you want to go up to the Hessian of this objective function to perform the minimization with a Newton-based numerical optimization algorithm?

Can you easily implement the gradient of energy function using dual (or var)? If you can, then you can use jacobian method (forward) on this gradient function (defined using dual) to compute the Hessian (matrix).

This is what I do to minimize Gibbs energy when computing chemical equilibrium - the gradient of the Gibbs energy are the chemical potentials, which I can define directly, instead of defining the Gibbs energy, and then using autodiff to compute them.

From the chemical potentials, I can compute their gradient, which is the Hessian of Gibbs energy. Important message -- all of this using only first-order automatic differentiation.

@kewp
Copy link
Author

kewp commented Mar 18, 2019

Yes, as you describe, I’m doing a minimization. Good to hear you are using it for similar purposes. I just need to calculate my inner derivatives manually and I should be all set.

@kewp
Copy link
Author

kewp commented Mar 18, 2019

How do you mean “all only first order autodiff” - surely something like the hessian is second order?

@kewp
Copy link
Author

kewp commented Mar 18, 2019

By the way - I actually typedef dual and var as ‘real’ and use that throughout so that I can switch between forward and reverse easily. Works well.

@kewp
Copy link
Author

kewp commented Mar 18, 2019

Sorry I meant define, not typedef. Though I guess typedef would works too?

@allanleal
Copy link
Member

allanleal commented Mar 18, 2019

How do you mean “all only first order autodiff” - surely something like the hessian is second order?

Hessian are second-order derivatives of the energy function. However, they are also the first-order derivatives of the gradient of the energy function. If you can implement these gradients using dual, you'll be able to easily construct the Hessian matrix (using first-order automatic differentiation).

@allanleal
Copy link
Member

By the way - I actually typedef dual and var as ‘real’ and use that throughout so that I can switch between forward and reverse easily. Works well.

This is perfectly fine.

@allanleal
Copy link
Member

I'll close this issue, but feel free to reopen it in the future if further discussions are still needed. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants