
Auto-initializing all instances of var as differentiable #2

Closed · dcoeurjo opened this issue Jan 27, 2017 · 9 comments

Comments

@dcoeurjo
Hi,

I'm playing a bit with your nice AD tool and I'm running into a bug with C++ operators and expressions.

Please have a look at https://gist.github.com/dcoeurjo/ce2b7f5e16edd348b7e4ca061ae6ceb5

In short: I have four points in the plane, I compute a kind of energy (the sum of squared Euclidean distances to an anchor point), and I want to differentiate this energy w.r.t. the anchor point.

If I expand the for loop when computing the energy (line 28), everything looks fine. When I use a for loop (line 17), the differentiation fails (returning 0.0 for de/dx, for instance). In terms of C++ operators, both energy expressions look similar.

Would you have any idea?

@dcoeurjo
Author

Please note that in this code snippet, pointsx and pointsy could simply be std::vector<double> entities, but this does not change anything about my problem. What am I doing wrong?

@ZigaSajovic
Owner

ZigaSajovic commented Jan 27, 2017

UPDATE

Please see the update in my other comment below. The code dcoeurjo posted now performs as expected, i.e. exactly the same as the solution in this comment.


Dear dcoeurjo,

note line 15 in your code, where you declare the variable var sum: you forgot to initialize it as a differentiable variable. The code for the energy function should read

var energy(const std::vector<var> &pointsx,
           const std::vector<var> &pointsy,
           const var &x,
           const var &y)
{
  var sum = 0.0;
  // initialize sum as a placeholder
  dCpp::initPlaceHolder(sum);
  for (auto i = 0; i < pointsx.size(); ++i)
    sum += (pointsx[i] - x) * (pointsx[i] - x) + (pointsy[i] - y) * (pointsy[i] - y);

  return sum;
}

The function energy_expanded works correctly as is, because all variables you operate on are already initialized to be differentiable.

With the edited code, both the call to energy and energy_expanded return the same values.
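
For completeness, here is a minimal driver for the corrected energy function, following the same initSpace/init pattern used for the full programs later in this thread; the header name and the concrete point values are illustrative assumptions.

#include "dCpp.h"        // assumed header name; adjust to your setup
#include <vector>
#include <iostream>

// assumes the corrected energy function above is defined in this translation unit

int main()
{
  dCpp::initSpace(1);                          // first-order differentiable space

  std::vector<var> px = {0.0, 1.0, 1.0, 0.0};  // anchor points, used as constants
  std::vector<var> py = {0.0, 0.0, 1.0, 1.0};

  var x(0.3);
  var y(0.7);
  dCpp::init(x);                               // differentiate with respect to x and y
  dCpp::init(y);

  var e = energy(px, py, x, y);
  std::cout << "e     = " << e.id << std::endl;
  std::cout << "de/dx = " << e.d(&x).id << std::endl;   // expected: -2 * sum_i (px_i - x)
  std::cout << "de/dy = " << e.d(&y).id << std::endl;   // expected: -2 * sum_i (py_i - y)
  return 0;
}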


Please note that currently all variables have to be initialized in order to be differentiable; a variable can thus be used either as a constant or as a differentiable variable.

This may change in the next release (it has changed, see the update below).

I am closing this issue, but if you have any other questions, please ask.

SideNote

Also note that your code, as it is now, also performs differentiation with respect to the elements of px and py. To avoid this, make px and py vectors of var that have not been initialized as differentiable, i.e. delete the following lines of code

  for (auto i = 0; i < 4; ++i)
  {
    dCpp::init(px[i]);
    dCpp::init(py[i]);
  }

or make px and py vectors of doubles.

@dcoeurjo
Author

Excellent, thanks a lot, that makes sense.

(Thanks also for the comment on px and py; in my problem I would also like to differentiate with respect to these variables.)

Thanks again for your quick reply and your nice code ;)

@ZigaSajovic
Owner

ZigaSajovic commented Jan 27, 2017

Closure of the issue

Your original code should now work as expected. Thank you for opening this issue; the specifics of the update can be found below.

The title of the issue was edited to reflect this update.

UPDATE

From this point on, all instances of var are by default initialized as placeHolders of the order the space is set to.

Before

Previously one had to initialize placeholders (see the energy function in my comment above).

var sum=0.0;
dCpp::initPlaceHolder(sum);

Now

All instances of var are automatically initialized as placeHolders by the constructors, meaning that the above is equivalent to

var sum=0.0;

This means all instances of var can be differentiated with respect to initialized differentiable variables.

Note

You cannot differentiate a variable with respect to an uninitialized differentiable variable. In order to enable differentiation with respect to a variable (e.g. x), one should proceed as before, i.e.

var x=0.0;
dCpp::init(x);
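
Putting the update and the note together, a minimal sketch of the post-update behaviour could look like this (the header name is an assumption and the numbers are only for illustration):

#include "dCpp.h"      // assumed header name
#include <iostream>

int main()
{
  dCpp::initSpace(1);  // first-order differentiable programming space

  var x = 0.0;
  dCpp::init(x);       // x can be differentiated with respect to

  var sum = 0.0;       // auto-initialized as a placeHolder; no initPlaceHolder needed
  sum += x * x + 3.0 * x;

  std::cout << sum.d(&x).id << std::endl;   // d(sum)/dx = 2*x + 3, i.e. 3 at x = 0
  return 0;
}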

@ZigaSajovic changed the title from Issue with loops in dCpp to Auto-initializing all instances of var as differentiable on Jan 28, 2017
@dcoeurjo
Author

dcoeurjo commented Jan 30, 2017

Dear @ZigaSajovic,

thanks again for your feedback. There is still something that puzzles me a bit (I've updated my dCpp clone to the latest release). Here are two programs that should do the same thing: https://gist.github.com/dcoeurjo/c5b9b1cc62b094807ff7ab8aa42f12a4

The problem is the following: we have four (fixed) points (the corners of a unit square) and a fifth one. I want to move the fifth one so that it minimizes the sum of squared distances to the four anchor points. In this setting the problem is very simple (the gradient w.r.t. the position of the fifth point is explicit); it is just a toy example. So I use the gradients from dCpp in a gradient descent scheme to end up at the (global) minimum, which is exactly (0.5, 0.5).
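
For reference, the closed-form minimizer is simply the barycenter of the anchor points; a few lines of plain C++ (no dCpp involved) confirm the expected answer (0.5, 0.5):

#include <vector>
#include <numeric>
#include <iostream>

int main()
{
  // minimizing sum_i (px_i - x)^2 + (py_i - y)^2 gives the mean of the anchors
  std::vector<double> px = {0.0, 1.0, 1.0, 0.0};
  std::vector<double> py = {0.0, 0.0, 1.0, 1.0};
  double xs = std::accumulate(px.begin(), px.end(), 0.0) / px.size();
  double ys = std::accumulate(py.begin(), py.end(), 0.0) / py.size();
  std::cout << xs << " " << ys << std::endl;   // prints 0.5 0.5
  return 0;
}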

  • barycenter.cpp is perfectly fine, even if I don't understand why I need to re-init x and y in the main loop (lines 38 and 39). The new values for x and y (lines 46 and 47) should keep the property that the variables are initialized differentiable variables.

  • barycenter2.cpp differs from the previous one in that the anchor points are differentiable variables (but I only use them as constants). However,

    • besides the initializer_list construction in lines 26-27, I still need to init the vector entries (lines 33-37). Not a big deal, maybe related to the previous point.
    • the biggest issue is that the minimization is not stable: if you run the code 10 times you get many correct outputs, but sometimes the minimization fails and the (x, y) point goes to zero (see a trace with 2 runs here: https://gist.github.com/dcoeurjo/1d78ba534a1cd76bfaa158afd58d4f5b)

When the anchor points are std::vector<double> (barycenter.cpp) I always get the correct answer.
I've tried on several machines (Linux, macOS) and checked valgrind output to make sure there are no memory issues... I have no idea where this non-deterministic problem may come from.

Would you have any idea?

@ZigaSajovic
Owner

ZigaSajovic commented Jan 30, 2017

To whom it may concern

dCpp is a flexible tool, allowing implementation and analysis of programs through operational calculus.

  • It allows implementations of differentiable (sub)programs operating on differentiable derivatives of other (sub)programs, where the entire process may again be differentiable. This enables trainable training processes, and other flexible program analysis through operational calculus.
  • Operational calculus on programming spaces is the paper in which the theory is derived and its use for program analysis and deep learning is outlined.
  • Implementing Operational calculus on programming spaces for Differentiable computing is the paper in which the implementation of this theory in dCpp is explained; the reader is guided through the code and the theory simultaneously, so as to better understand this flexible tool.

Please forgive me for the sparse tutorial in the ReadMe on the title page of this repository. The dCpp project is a flexible tool, meant to be used in conjunction with the above two papers.

Finally, I extend my gratitude to @dcoeurjo, as this discussion, along with his code and my corrections, will serve as a great read for anyone starting out with dCpp, since most of the issues one might encounter in the beginning are resolved here.

Dear @dcoeurjo,

  • barycenter.cpp: the issue is that e.d(&x) returns a var of order 0 in this specific case, because e is of order 1 and therefore e.d(&x) is of order 0. When a calculation is performed on two variables (such as + in your case), the order of the output is reduced to the minimum of the orders of both operands. This is because dCpp allows arbitrary order, and if you operate on two variables of different order, only calculations up to the minimum order of the two can be valid: if x is 3 times differentiable and y is 2 times differentiable, the calculation z = x + y only has access to derivatives of y up to second order (and of x up to third). As information about the third derivatives of y is missing, there is no way to compute third derivatives of z, so z is a variable of second order.

In your case this means that you do not want to add the derivative of e to x as a differentiable operation (you simply modify x, and do not wish to make this modification differentiable). Please note that the entire program is being differentiated, not each iteration separately. So if you want to differentiate each iteration separately, you need to re-initialize the variables each iteration. Without this, the gradient will depend on the entire execution of the program combined, as the entire program is being differentiated. In any case, whether you want the gradient to depend on all iterations or only on one, you should modify these lines to read
x += -lambda*e.d(&x).id;
y += -lambda*e.d(&y).id;
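
To see the order-reduction rule in isolation, here is a small sketch; it only uses calls that appear elsewhere in this thread, and the header name and the step size 0.1 are illustrative assumptions:

#include "dCpp.h"      // assumed header name

int main()
{
  dCpp::initSpace(1);   // first-order space

  var x(50.1);
  dCpp::init(x);        // x is a differentiable variable of order 1

  var e = x * x;        // e is of order 1
  var g = e.d(&x);      // g = de/dx is a var of order 0 (one order below e)

  var z = x + g;        // z takes the minimum order of its operands, i.e. 0:
                        // it can no longer be differentiated with respect to x

  x += -0.1 * g.id;     // adding the plain double g.id instead keeps x a
                        // differentiable var of order 1
  return 0;
}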

Note

The section below (on initialization before/in the loop body) concerns the general case, where the variables x and y might have been differentiated with respect to some other variable (i.e. assigned as the result of a differentiable operation) in the body of the loop. In the specific case of @dcoeurjo's code (after my edits; the full code can be found below), it makes no difference, as the above expression simply adds a constant to x and y,

x += -lambda*e.d(&x).id;
y += -lambda*e.d(&y).id;

which does not change their derivatives (each remains equal to 1, as addition of a constant does not alter it), and the function energy also does not perform any differentiation of x and y with respect to another variable (i.e. it does not assign x or y as the result of a differentiable operation). Therefore their only derivatives are still the derivatives with respect to themselves, equal to 1, just as before the body of the loop is executed (or right after initialization).

I will still include instructions on the general case, for completeness' sake.


If you want each iteration to be differentiated separately (as would seem to fit your needs, since your goal is a simple gradient descent simulation, where all differentiations are to be done independently), re-initialize the variables each iteration

// code not relevant to the demonstration is omitted
var x(50.1);
var y(50.1);
// code not relevant to the demonstration is omitted
while (norm > 0.0000000001)
{
  e = energy(px, py, x, y);
  // code not relevant to the demonstration is omitted

  x += -lambda*e.d(&x).id;
  y += -lambda*e.d(&y).id;

  norm = e.d(&x).id*e.d(&x).id + e.d(&y).id*e.d(&y).id;
  dCpp::init(x);
  dCpp::init(y);
}

If you want the gradient to depend on all iterations, initialize at the beginning only

(see When would x += -lambda*e.d(&x) be appropriate and Why is this flexibility useful below for more)

// code not relevant to the demonstration is omitted
var x(50.1);
var y(50.1);
dCpp::init(x);
dCpp::init(y);
// code not relevant to the demonstration is omitted
while (norm > 0.0000000001)
{
  e = energy(px, py, x, y);
  // code not relevant to the demonstration is omitted

  x += -lambda*e.d(&x).id;
  y += -lambda*e.d(&y).id;

  norm = e.d(&x).id*e.d(&x).id + e.d(&y).id*e.d(&y).id;
}

  • barycenter2.cpp contains the same issue as above, plus the following

std::vector<var> px = {0.0, 1.0, 1.0, 0.0};
std::vector<var> py = {0.0, 0.0, 1.0, 1.0};
dCpp::initSpace(1);

You see, the space is set to order 0 at the time of declaration of px and py; therefore their order is set to 0. Simply permute the lines

dCpp::initSpace(1);
std::vector<var> px = {0.0, 1.0, 1.0, 0.0};
std::vector<var> py = {0.0, 0.0, 1.0, 1.0};

Apply this along with the corrections I mentioned above. For clarity, I am appending both programs in their corrected form here, assuming you wish to differentiate each iteration separately (as you are performing gradient descent).

barycenter.cpp

int main()
{
  std::vector<double> px = {0.0, 1.0, 1.0, 0.0};
  std::vector<double> py = {0.0, 0.0, 1.0, 1.0};
  dCpp::initSpace(1);

  var x(50.1);
  var y(50.1);
  dCpp::init(x);
  dCpp::init(y);

  double lambda=0.1;
  var e;
  double norm = 1.0;
  while (norm > 0.0000000001)
  {

    e = energy(px,py,x,y);
    std::cout<< "x= "<<x.id<<" y= "<<y.id<<std::endl;
    std::cout<<"e = "<<e.id<<std::endl;
    std::cout<<"de/dx = "<<e.d(&x).id<<std::endl;
    std::cout<<"de/dy = "<<e.d(&y).id<<std::endl<<std::endl;

    x += -lambda*e.d(&x).id;
    y += -lambda*e.d(&y).id;

    norm = e.d(&x).id*e.d(&x).id+e.d(&y).id*e.d(&y).id;
    // the following two lines have no effect in your particular case
    // as you do not perform operations that change the derivative of x and y
    dCpp::init(x);
    dCpp::init(y);
  }

  return 0;
}


barycenter2.cpp

int main()
{
  dCpp::initSpace(1);
  std::vector<var> px = {0.0, 1.0, 1.0, 0.0};
  std::vector<var> py = {0.0, 0.0, 1.0, 1.0};


  var x(50.1);
  var y(50.1);
  dCpp::init(x);
  dCpp::init(y);

  double lambda=0.1;
  var e;
  double norm = 1.0;
  while (norm > 0.0000000001)
  {

    e = energy(px,py,x,y);
    std::cout<< "x= "<<x.id<<" y= "<<y.id<<std::endl;
    std::cout<<"e = "<<e.id<<std::endl;
    std::cout<<"de/dx = "<<e.d(&x).id<<std::endl;
    std::cout<<"de/dy = "<<e.d(&y).id<<std::endl<<std::endl;

    x += -lambda*e.d(&x).id;
    y += -lambda*e.d(&y).id;

    norm = e.d(&x).id*e.d(&x).id+e.d(&y).id*e.d(&y).id;
    // the following two lines have no effect in your particular case
    // as you do not perform operations that change the derivative of x and y
    dCpp::init(x);
    dCpp::init(y);
  }

  return 0;
}


When would x += -lambda*e.d(&x) be appropriate

This form would be appropriate if you wanted to differentiate the entire process of training/gradient descent with respect to x (a differentiable process operating on differentiable derivatives; see below for more). You would also omit the initialization in each iteration of the loop, because you would want to differentiate the entire process of optimization (as opposed to each iteration separately, which is what you want now).

A simple use of this would be optimization of the optimization process. In your specific case, you would be optimizing the entire gradient descent process with respect to some (hyper) parameter.

Why is this flexibility useful

dCpp allows you to differentiate the entire program, or only parts of it, depending on your needs. Sometimes you want to optimize the parameters of the optimization process itself, in which case the derivatives have to depend on all iterations of the loop. Sometimes you perform a simple gradient descent, as in this case, and you want to differentiate each iteration separately.

Sometimes you want to write programs that operate on differentiable derivatives of other parts of the program (while only coding the original program). By interweaving space re-initialization and re-initialization of variables, dCpp allows all such combinations.

For examples of such procedures and derivations of the theory, you may consult my paper Operational calculus on programming spaces.

The paper Implementing Operational calculus on programming spaces for Differentiable computing accompanies dCpp and explains how the theory of the above paper is implemented in it; it may be consulted at your convenience.

The theory offers more than mere AD (though AD is a perfectly valid use of it), so please feel free to inquire about anything else that might be of use to you.

@dcoeurjo
Author

Thanks a lot!
Your details make sense and I see what I did wrong.

I'm an auto-diff newbie and I'm not sure I fully get your point on global vs. per-iteration differentiation (whether I should init the variables in the loop or not). In my case this choice does not change the results, and I should probably go into the details in your arXiv paper.
For instance, are you saying that if the energy expression changes at each iteration, then I would have to re-initialize the variables (since the differentiation may no longer be the same)?

Thanks again for your time and the great tool you share.

@ZigaSajovic
Owner

ZigaSajovic commented Jan 30, 2017

I edited my post above with a Note, explaining why it makes no difference in your code. Thank you for the follow-up question, as it made me clarify some points further. Please let me know if the edit clarified the remaining confusion.

To answer your question: if, within the body of the loop, you performed any differentiable operations (operations changing their derivative) on the variables with respect to which you are differentiating, and you wish for the differentiations to be independent (each iteration separately), then you should re-initialize. In your case (after my edit), the only change you make to x and y is adding/subtracting a constant from them. This does not change their derivatives, so you do not have to re-initialize them.
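
As a hedged illustration of that distinction (the header name is an assumption and the concrete numbers are only for demonstration):

#include "dCpp.h"      // assumed header name
#include <iostream>

int main()
{
  dCpp::initSpace(1);

  var x(2.0);
  dCpp::init(x);
  std::cout << x.d(&x).id << std::endl;   // 1: x is an independent variable

  x += 0.5;                               // adding a plain double does not change the
  std::cout << x.d(&x).id << std::endl;   // derivative; still 1, no re-init needed

  x = 3.0 * x;                            // x is now the result of a differentiable
  std::cout << x.d(&x).id << std::endl;   // operation; its derivative w.r.t. the old x is 3

  dCpp::init(x);                          // re-initialize if the next iteration should treat
  std::cout << x.d(&x).id << std::endl;   // x as a fresh independent variable again: 1
  return 0;
}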

@dcoeurjo
Author

dcoeurjo commented Feb 1, 2017

Fantastic!
Thanks a lot for the insights, that definitely makes sense.
