
Auto-initializing all instances of var as differentiable #2

Closed · dcoeurjo opened this issue Jan 27, 2017 · 9 comments

Comments

@dcoeurjo
Hi,

I'm playing a bit with your nice AD tool and I'm running into a bug with C++ operators and expressions.

Please have a look at https://gist.github.com/dcoeurjo/ce2b7f5e16edd348b7e4ca061ae6ceb5

In short: I have four points in the plane, I compute a kind of energy (the sum of squared Euclidean distances to an anchor point), and I want to differentiate this energy w.r.t. the anchor point.

If I expand the for loop when computing the energy (line 28), everything looks fine. When I use a for loop (line 17), the differentiation fails (returning 0.0 for de/dx, for instance). In terms of C++ operators, both energy expressions look similar.

Would you have any idea?

@dcoeurjo
Author

Please note that in this code snippet, pointsx and pointsy could simply be std::vector<double> entities, but this does not change anything about my problem. What am I doing wrong?

@ZigaSajovic
Owner

ZigaSajovic commented Jan 27, 2017

UPDATE

Please see the update in my other comment below. The code dcoeurjo posted now performs as expected, i.e. exactly the same as the solution in this comment.


Dear dcoeurjo,

note line 15 in your code, where you declare the variable var sum: you forgot to initialize it as a differentiable variable. The code for the energy function should read

var energy(const std::vector<var> &pointsx,
           const std::vector<var> &pointsy,
           const var &x,
           const var &y)
{
  var sum = 0.0;
  // initialize sum as a placeholder
  dCpp::initPlaceHolder(sum);
  for (auto i = 0; i < pointsx.size(); ++i)
    sum += (pointsx[i] - x) * (pointsx[i] - x) + (pointsy[i] - y) * (pointsy[i] - y);

  return sum;
}

The function energy_expanded works correctly as is, because all variables you operate on are already initialized to be differentiable.

With the edited code, both the call to energy and energy_expanded return the same values.
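
For completeness, here is a minimal driver for the corrected energy function, following the same initSpace/init pattern used for the full programs later in this thread; the header name and the concrete point values are illustrative assumptions.

#include "dCpp.h"        // assumed header name; adjust to your setup
#include <vector>
#include <iostream>

// assumes the corrected energy function above is defined in this translation unit

int main()
{
  dCpp::initSpace(1);                          // first-order differentiable space

  std::vector<var> px = {0.0, 1.0, 1.0, 0.0};  // anchor points, used as constants
  std::vector<var> py = {0.0, 0.0, 1.0, 1.0};

  var x(0.3);
  var y(0.7);
  dCpp::init(x);                               // differentiate with respect to x and y
  dCpp::init(y);

  var e = energy(px, py, x, y);
  std::cout << "e     = " << e.id << std::endl;
  std::cout << "de/dx = " << e.d(&x).id << std::endl;   // expected: -2 * sum_i (px_i - x)
  std::cout << "de/dy = " << e.d(&y).id << std::endl;   // expected: -2 * sum_i (py_i - y)
  return 0;
}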


Please note that currently all variables have to be initialized in order to be differentiable; a variable can thus be used either as a constant or as a differentiable variable.

This may change in the next release (it has changed, see the update below).

I am closing this issue, but if you have any other questions, please ask.

SideNote

Also note that your code, as it is now, also performs differentiation with respect to the elements of px and py. To avoid this, make px and py vectors of var that have not been initialized as differentiable, i.e. delete the following lines of code

  for (auto i = 0; i < 4; ++i)
  {
    dCpp::init(px[i]);
    dCpp::init(py[i]);
  }

or make px and py vectors of doubles.

@dcoeurjo
Author

Excellent, thanks a lot, that makes sense.

(Thanks also for the comment on px and py; in my problem I would also like to differentiate with respect to these variables.)

Thanks again for your quick reply and your nice code ;)

@ZigaSajovic
Owner

ZigaSajovic commented Jan 27, 2017

Closure of the issue

Your original code should now work as expected. Thank you for opening this issue; the specifics of the update can be found below.

The title of the issue was edited to reflect this update.

UPDATE

From this point on, all instances of var are by default initialized as placeHolders of the order the space is set to.

Before

Previously one had to initialize placeholders (see the energy function in my comment above).

var sum=0.0;
dCpp::initPlaceHolder(sum);

Now

All instances of var are automatically initialized as placeHolders by the constructors, meaning that the above is equivalent to

var sum=0.0;

This means all instances of var can be differentiated with respect to initialized differentiable variables.

Note

You cannot differentiate a variable with respect to an uninitialized differentiable variable. In order to enable differentiation with respect to a variable (e.g. x), one should proceed as before, i.e.

var x=0.0;
dCpp::init(x);
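
Putting the update and the note together, a minimal sketch of the post-update behaviour could look like this (the header name is an assumption and the numbers are only for illustration):

#include "dCpp.h"      // assumed header name
#include <iostream>

int main()
{
  dCpp::initSpace(1);  // first-order differentiable programming space

  var x = 0.0;
  dCpp::init(x);       // x can be differentiated with respect to

  var sum = 0.0;       // auto-initialized as a placeHolder; no initPlaceHolder needed
  sum += x * x + 3.0 * x;

  std::cout << sum.d(&x).id << std::endl;   // d(sum)/dx = 2*x + 3, i.e. 3 at x = 0
  return 0;
}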

@ZigaSajovic changed the title from Issue with loops in dCpp to Auto-initializing all instances of var as differentiable on Jan 28, 2017
@dcoeurjo
Author

dcoeurjo commented Jan 30, 2017

Dear @ZigaSajovic,

thanks again for your feedback. There is still something that puzzles me a bit (I've updated my dCpp clone to the latest release). Here are two programs that should do the same thing: https://gist.github.com/dcoeurjo/c5b9b1cc62b094807ff7ab8aa42f12a4

The problem is the following: we have four (fixed) points (the corners of a unit square) and a fifth one. I want to move the fifth one so that it minimizes the sum of squared distances to the four anchor points. In this setting the problem is very simple (the gradient w.r.t. the position of the fifth point is explicit); it is just a toy example. So I use the gradients from dCpp in a gradient descent scheme to end up at the (global) minimum, which is exactly (0.5, 0.5).
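
For reference, the closed-form minimizer is simply the barycenter of the anchor points; a few lines of plain C++ (no dCpp involved) confirm the expected answer (0.5, 0.5):

#include <vector>
#include <numeric>
#include <iostream>

int main()
{
  // minimizing sum_i (px_i - x)^2 + (py_i - y)^2 gives the mean of the anchors
  std::vector<double> px = {0.0, 1.0, 1.0, 0.0};
  std::vector<double> py = {0.0, 0.0, 1.0, 1.0};
  double xs = std::accumulate(px.begin(), px.end(), 0.0) / px.size();
  double ys = std::accumulate(py.begin(), py.end(), 0.0) / py.size();
  std::cout << xs << " " << ys << std::endl;   // prints 0.5 0.5
  return 0;
}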

  • barycenter.cpp is perfectly fine, even if I don't understand why I need to re-init x and y in the main loop (lines 38 and 39). The new values for x and y (lines 46 and 47) should keep the property that the variables are initialized differentiable variables.

  • barycenter2.cpp differs from the previous one in that the anchor points are differentiable variables (but I only use them as constants). However,

    • besides the initializer_list construction in lines 26-27, I still need to init the vector entries (lines 33-37). Not a big deal, maybe related to the previous point.
    • the biggest issue is that the minimization is not stable: if you run the code 10 times you get many correct outputs, but sometimes the minimization fails and the (x, y) point goes to zero (see a trace with 2 runs here: https://gist.github.com/dcoeurjo/1d78ba534a1cd76bfaa158afd58d4f5b)

When the anchor points are std::vector<double> (barycenter.cpp) I always get the correct answer.
I've tried on several machines (Linux, macOS) and checked valgrind output to make sure there are no memory issues... I have no idea where this non-deterministic problem may come from.

Would you have any idea?

@ZigaSajovic
Owner

ZigaSajovic commented Jan 30, 2017

To whom it may concern

dCpp is a flexible tool, allowing implementation and analysis of programs through operational calculus.

  • It allows implementations of differentiable (sub)programs operating on differentiable derivatives of other (sub)programs, where the entire process may again be differentiable. This enables trainable training processes, and other flexible program analysis through operational calculus.
  • Operational calculus on programming spaces is the paper in which the theory is derived and its use for program analysis and deep learning is outlined.
  • Implementing Operational calculus on programming spaces for Differentiable computing is the paper in which the implementation of this theory in dCpp is explained; the reader is guided through the code and the theory simultaneously, so as to better understand this flexible tool.

Please forgive me for the sparse tutorial in the ReadMe on the title page of this repository. The dCpp project is a flexible tool, meant to be used in conjunction with the above two papers.

Finally, I extend my gratitude to @dcoeurjo, as this discussion, along with his code and my corrections, will serve as a great read for anyone starting out with dCpp, since most of the issues one might encounter in the beginning are resolved here.

Dear @dcoeurjo,

  • barycenter.cpp: the issue is that e.d(&x) returns a var of order 0 in this specific case, because e is of order 1 and therefore e.d(&x) is of order 0. When a calculation is performed on two variables (such as + in your case), the order of the output is reduced to the minimum of the orders of both operands. This is because dCpp allows arbitrary order, and if you operate on two variables of different order, only calculations up to the minimum order of the two can be valid: if x is 3 times differentiable and y is 2 times differentiable, the calculation z = x + y only has access to derivatives of y up to second order (and of x up to third). As information about the third derivatives of y is missing, there is no way to compute third derivatives of z, so z is a variable of second order.

In your case this means that you do not want to add the derivative of e to x as a differentiable operation (you simply modify x, and do not wish to make this modification differentiable). Please note that the entire program is being differentiated, not each iteration separately. So if you want to differentiate each iteration separately, you need to re-initialize the variables each iteration. Without this, the gradient will depend on the entire execution of the program combined, as the entire program is being differentiated. In any case, whether you want the gradient to depend on all iterations or only on one, you should modify these lines to read
x += -lambda*e.d(&x).id;
y += -lambda*e.d(&y).id;
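
To see the order-reduction rule in isolation, here is a small sketch; it only uses calls that appear elsewhere in this thread, and the header name and the step size 0.1 are illustrative assumptions:

#include "dCpp.h"      // assumed header name

int main()
{
  dCpp::initSpace(1);   // first-order space

  var x(50.1);
  dCpp::init(x);        // x is a differentiable variable of order 1

  var e = x * x;        // e is of order 1
  var g = e.d(&x);      // g = de/dx is a var of order 0 (one order below e)

  var z = x + g;        // z takes the minimum order of its operands, i.e. 0:
                        // it can no longer be differentiated with respect to x

  x += -0.1 * g.id;     // adding the plain double g.id instead keeps x a
                        // differentiable var of order 1
  return 0;
}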

Note

The section below (on initialization before/in the loop body) concerns the general case, where the variables x and y might have been differentiated with respect to some other variable (i.e. assigned as the result of a differentiable operation) in the body of the loop. In the specific case of @dcoeurjo's code (after my edits; the full code can be found below), it makes no difference, as the above expression simply adds a constant to x and y,

x += -lambda*e.d(&x).id;
y += -lambda*e.d(&y).id;

which does not change their derivatives (each remains equal to 1, as addition of a constant does not alter it), and the function energy also does not perform any differentiation of x and y with respect to another variable (i.e. it does not assign x or y as the result of a differentiable operation). Therefore their only derivatives are still the derivatives with respect to themselves, equal to 1, just as before the body of the loop is executed (or right after initialization).

I will still include instructions on the general case, for completeness' sake.


If you want each iteration to be differentiated separately (as would seem to fit your needs, since your goal is a simple gradient descent simulation, where all differentiations are to be done independently), re-initialize the variables each iteration

// code not relevant to the demonstration is omitted
var x(50.1);
var y(50.1);
// code not relevant to the demonstration is omitted
while (norm > 0.0000000001)
{
  e = energy(px, py, x, y);
  // code not relevant to the demonstration is omitted

  x += -lambda*e.d(&x).id;
  y += -lambda*e.d(&y).id;

  norm = e.d(&x).id*e.d(&x).id + e.d(&y).id*e.d(&y).id;
  dCpp::init(x);
  dCpp::init(y);
}

If you want the gradient to depend on all iterations, initialize at the beginning only

(see When would x += -lambda*e.d(&x) be appropriate and Why is this flexibility useful below for more)

// code not relevant to the demonstration is omitted
var x(50.1);
var y(50.1);
dCpp::init(x);
dCpp::init(y);
// code not relevant to the demonstration is omitted
while (norm > 0.0000000001)
{
  e = energy(px, py, x, y);
  // code not relevant to the demonstration is omitted

  x += -lambda*e.d(&x).id;
  y += -lambda*e.d(&y).id;

  norm = e.d(&x).id*e.d(&x).id + e.d(&y).id*e.d(&y).id;
}

  • barycenter2.cpp contains the same issue as above, plus the following

std::vector<var> px = {0.0, 1.0, 1.0, 0.0};
std::vector<var> py = {0.0, 0.0, 1.0, 1.0};
dCpp::initSpace(1);

You see, the space is set to order 0 at the time of declaration of px and py; therefore their order is set to 0. Simply permute the lines

dCpp::initSpace(1);
std::vector<var> px = {0.0, 1.0, 1.0, 0.0};
std::vector<var> py = {0.0, 0.0, 1.0, 1.0};

Apply this along with the corrections I mentioned above. For clarity, I am appending both programs in their corrected form here, assuming you wish to differentiate each iteration separately (as you are performing gradient descent).

barycenter.cpp

int main()
{
  std::vector<double> px = {0.0, 1.0, 1.0, 0.0};
  std::vector<double> py = {0.0, 0.0, 1.0, 1.0};
  dCpp::initSpace(1);

  var x(50.1);
  var y(50.1);
  dCpp::init(x);
  dCpp::init(y);

  double lambda=0.1;
  var e;
  double norm = 1.0;
  while (norm > 0.0000000001)
  {

    e = energy(px,py,x,y);
    std::cout<< "x= "<<x.id<<" y= "<<y.id<<std::endl;
    std::cout<<"e = "<<e.id<<std::endl;
    std::cout<<"de/dx = "<<e.d(&x).id<<std::endl;
    std::cout<<"de/dy = "<<e.d(&y).id<<std::endl<<std::endl;

    x += -lambda*e.d(&x).id;
    y += -lambda*e.d(&y).id;

    norm = e.d(&x).id*e.d(&x).id+e.d(&y).id*e.d(&y).id;
    // the following two lines have no effect in your particular case
    // as you do not perform operations that change the derivative of x and y
    dCpp::init(x);
    dCpp::init(y);
  }

  return 0;
}


barycenter2.cpp

int main()
{
  dCpp::initSpace(1);
  std::vector<var> px = {0.0, 1.0, 1.0, 0.0};
  std::vector<var> py = {0.0, 0.0, 1.0, 1.0};


  var x(50.1);
  var y(50.1);
  dCpp::init(x);
  dCpp::init(y);

  double lambda=0.1;
  var e;
  double norm = 1.0;
  while (norm > 0.0000000001)
  {

    e = energy(px,py,x,y);
    std::cout<< "x= "<<x.id<<" y= "<<y.id<<std::endl;
    std::cout<<"e = "<<e.id<<std::endl;
    std::cout<<"de/dx = "<<e.d(&x).id<<std::endl;
    std::cout<<"de/dy = "<<e.d(&y).id<<std::endl<<std::endl;

    x += -lambda*e.d(&x).id;
    y += -lambda*e.d(&y).id;

    norm = e.d(&x).id*e.d(&x).id+e.d(&y).id*e.d(&y).id;
    // the following two lines have no effect in your particular case
    // as you do not perform operations that change the derivative of x and y
    dCpp::init(x);
    dCpp::init(y);
  }

  return 0;
}


When would x += -lambda*e.d(&x) be appropriate

This form would be appropriate if you wanted to differentiate the entire process of training/gradient descent with respect to x (a differentiable process operating on differentiable derivatives; see below for more). You would also omit the initialization in each iteration of the loop, because you would want to differentiate the entire process of optimization (as opposed to each iteration separately, which is what you want now).

A simple use of this would be optimization of the optimization process. In your specific case, you would be optimizing the entire gradient descent process with respect to some (hyper) parameter.

Why is this flexibility useful

dCpp allows you to differentiate the entire program, or only parts of it, depending on your needs. Sometimes you want to optimize the parameters of the optimization process itself, in which case the derivatives have to depend on all iterations of the loop. Sometimes you perform a simple gradient descent, as in this case, and you want to differentiate each iteration separately.

Sometimes you want to write programs that operate on differentiable derivatives of other parts of the program (while only coding the original program). By interweaving space re-initialization and re-initialization of variables, dCpp allows all such combinations.

For examples of such procedures and derivations of the theory, you may consult my paper Operational calculus on programming spaces.

The paper Implementing Operational calculus on programming spaces for Differentiable computing accompanies dCpp and explains how the theory of the above paper is implemented in it; it may be consulted at your convenience.

The theory offers more than mere AD (though AD is a perfectly valid use of it), so please feel free to inquire about anything else that might be of use to you.

@dcoeurjo
Author

Thanks a lot!
Your details make sense and I see what I did wrong.

I'm an auto-diff newbie and I'm not sure I fully get your point on global vs. per-iteration differentiation (whether I should init the variables in the loop or not). In my case this choice does not change the results, and I should probably go into the details in your arXiv paper.
For instance, are you saying that if the energy expression changes at each iteration, then I would have to re-initialize the variables (since the differentiation may no longer be the same)?

Thanks again for your time and the great tool you share.

@ZigaSajovic
Owner

ZigaSajovic commented Jan 30, 2017

I edited my post above with a Note, explaining why it makes no difference in your code. Thank you for the follow-up question, as it made me clarify some points further. Please let me know if the edit clarified the remaining confusion.

To answer your question: if, within the body of the loop, you performed any differentiable operations (operations changing their derivative) on the variables with respect to which you are differentiating, and you wish for the differentiations to be independent (each iteration separately), then you should re-initialize. In your case (after my edit), the only change you make to x and y is adding/subtracting a constant from them. This does not change their derivatives, so you do not have to re-initialize them.
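
As a hedged illustration of that distinction (the header name is an assumption and the concrete numbers are only for demonstration):

#include "dCpp.h"      // assumed header name
#include <iostream>

int main()
{
  dCpp::initSpace(1);

  var x(2.0);
  dCpp::init(x);
  std::cout << x.d(&x).id << std::endl;   // 1: x is an independent variable

  x += 0.5;                               // adding a plain double does not change the
  std::cout << x.d(&x).id << std::endl;   // derivative; still 1, no re-init needed

  x = 3.0 * x;                            // x is now the result of a differentiable
  std::cout << x.d(&x).id << std::endl;   // operation; its derivative w.r.t. the old x is 3

  dCpp::init(x);                          // re-initialize if the next iteration should treat
  std::cout << x.d(&x).id << std::endl;   // x as a fresh independent variable again: 1
  return 0;
}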

@dcoeurjo
Author

dcoeurjo commented Feb 1, 2017

Fantastic!
Thanks a lot for the insights, that definitely makes sense.
