A possible memory issue? #160

Closed
sergeigribanov opened this issue Jun 4, 2021 · 9 comments

Comments

@sergeigribanov

sergeigribanov commented Jun 4, 2021

I would like to use autodiff in my project, which deals with nonlinear constrained optimization. In my case the optimization is performed not once but hundreds of thousands or millions of times in a row, and the dimension of the parameter space can reach several hundred. I want to use autodiff to calculate the gradient and Hessian of the objective function inside the optimization algorithm. Before adopting autodiff in the project, I decided to run a series of tests, and in one of them I came across a memory usage issue. The test program is the following:

// C++ includes
#include <iostream>

// Eigen includes
#include <Eigen/Core>
using namespace Eigen;

// autodiff include
#include <autodiff/reverse.hpp>
#include <autodiff/reverse/eigen.hpp>
using namespace autodiff;

var f(const VectorXvar& x)
{
  return x.dot(x);
}

int main() {
  const int n = 100;
  const int numberOfEntries = 1000000;
  for (int i = 0; i < numberOfEntries; ++i) {
    // VectorXvar x = VectorXvar::Random(n);
    VectorXvar x = VectorXvar::Ones(n);
    var u = f(x);
    VectorXd g;
    MatrixXd H = hessian(u, x, g);
  }
  std::cout << "Finished!" << std::endl;
  return 0;
}

The CMake file I use to build this test program is the following:

cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(autodiff-tests)

find_package(Eigen3 REQUIRED NO_MODULE)
find_package(autodiff)

add_executable(reverse-hessian ${CMAKE_CURRENT_SOURCE_DIR}/src/reverse-hessian.cpp)
target_link_libraries(reverse-hessian Eigen3::Eigen autodiff::autodiff)

The issue happens when the variable numberOfEntries is equal to 1M. In this case, memory usage gradually rises to 100%. Eventually, the process ends with the message: Killed.

If the variable numberOfEntries is equal to 100K, then the program runs without issues.

I have tested this program on my laptop (16 GB of RAM, CPU: Intel Core i7, OS: Arch Linux).

@jan-grimo
Contributor

I've encountered a memory leak in reverse auto-differentiation as well. My scalar function also contains a dot product of variables, along with plenty of other complexity, and I likewise run into OOM issues due to repeated evaluation.

Compiling @sergeigribanov's program and running it through valgrind --tool=memcheck --leak-check=full with 10 loop repeats yields the following:

==15024== 876,320 (560 direct, 875,760 indirect) bytes in 10 blocks are definitely lost in loss record 14 of 14
[...]
==15024==    by 0x128117: std::shared_ptr<autodiff::detail::IndependentVariableExpr<double> > std::make_shared<autodiff::detail::IndependentVariableExpr<double>, int const&>(int const&) (shared_ptr.h:718)
==15024==    by 0x1276F8: autodiff::detail::Variable<double>::Variable<int>(int const&) (var.hpp:1052)
==15024==    by 0x127081: Eigen::DenseBase<Eigen::Matrix<autodiff::detail::Variable<double>, -1, 1, 0, -1, 1> >::Ones(long) (CwiseNullaryOp.h:582)
==15024==    by 0x126447: main (main.cpp:23)

I'm having trouble making heads or tails of this. Reducing the program to

var f(const VectorXvar& x) { return x.dot(x); }

int main() {
  VectorXvar x = VectorXvar::Ones(7);
  var u = f(x);
  VectorXd g;
  MatrixXd H = hessian(u, x, g);
  std::cout << "Finished!" << std::endl;
  return 0;
}

yields the following ctor/dtor counts, omitting Variable(const int&) and Variable(const double&), which aren't particularly interesting:

 N  Fn
-----------------------------
 8  Variable()
    - 7 placement new in VectorXvar::Ones(n)
    - 1 in dot product
                                                                                 
10  Variable(const Variable&)
    - 1 in VectorXvar::Ones(n) -> Variable(const int&) Eigen::nullaryop ctor
    - 1 in VectorXvar::Constant(n)
    - 1 in Eigen::nullaryop copy
    - 7 in Eigen::nullaryop eval
                                                                                 
23  Variable(const ExprPtr&)
    - 10 forwarded from Variable(const Variable&) in Eigen::nullaryop
    - 1 in dot product (independent expr)
    - 12 in dot product (dependent expr)
                                                                                 
32  ~Variable()
    - 1 temporary Variable in VectorXvar::Ones(n) from int that's copied
    - 1 moved-from Variable (?) between
      Eigen::[...]::Ones(n) and Eigen::[...]::Constant(n, var(1))
    - 7 in Eigen::nullaryop eval assignment
    - 1 in Eigen::nullaryop evaluator dtor
    - 1 in Eigen::nullaryop dtor
    - 13 in dot product
    - 8 uncaught by gdb at end of main
                                                                                 
 9  IndependentVariable()
    - 1 in VectorXvar::Ones(n) -> Variable<double>(const int&)
    - 7 in VectorXvar::Ones(n) (field initializations)
    - 1 in dot product initialization
                                                                                 
 8  ~IndependentVariable()
    - 7 in Eigen::nullaryop eval assignment
    - 1 in dot product

Most interestingly, the program does not leak if hessian is not called. The otherwise missing ~IndependentVariable() call is then performed in a ~Matrix() stack that has to be the VectorXvar deletion at the end of main.

Any ideas what this could be @allanleal so I can look closer?

@allanleal
Member

Thanks for reporting this. Are you able to create a reproducible example that does not involve Eigen? That could simplify the memory leak checking.

@jan-grimo
Contributor

I don't think I can. The Eigen-related functionality seems to be important for this. If I replace the VectorXvar with individual var instances and redefine the function as a manual dot-product of sorts, the program doesn't leak.
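
For reference, here is a rough sketch of the variant I mean: a hand-rolled dot product over individual var instances instead of VectorXvar (the derivatives call at the end is only illustrative, not the exact code I ran):

// Sketch only: same workload as the original test, but with individual var
// instances and a manual dot product instead of VectorXvar. This variant
// does not appear to leak.
#include <vector>

#include <autodiff/reverse.hpp>
using namespace autodiff;

int main() {
  const int n = 100;
  const int numberOfEntries = 1000000;
  for (int i = 0; i < numberOfEntries; ++i) {
    std::vector<var> x(n, var(1.0));
    var u = 0.0;
    for (int k = 0; k < n; ++k)
      u += x[k] * x[k];                      // manual dot product
    auto [ux0] = derivatives(u, wrt(x[0]));  // illustrative 1st-order derivative
  }
  return 0;
}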

I can try to run this through cvise though. Not quite sure how to preserve the exact leak, but it might narrow down the culprit?

@jan-grimo
Contributor

I whipped up a graphviz representation here and ran it against an even smaller variant to keep the graphs reasonably small:

VectorXvar x = VectorXvar::Ones(3);
var u = x.dot(x);

std::ofstream dp("dp-pre.dot");
dp << graphviz(u.expr);

VectorXd g;
MatrixXd H = hessian(u, x, g);

std::ofstream dpo("dp-post.dot");
dpo << graphviz(u.expr);

The resulting graphs are here in SVG form. The short summary is that forming the expression tree is unproblematic and looks okay, but computing the gradients generates shared_ptr cycles which can't be freed. I highlighted in red one cycle I found at a glance in the graph.
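
For anyone less familiar with the failure mode, here is a minimal standalone illustration of such a shared_ptr ownership cycle (plain C++, nothing autodiff-specific):

#include <memory>

// Two nodes that own each other via shared_ptr are never destroyed, because
// each keeps the other's reference count above zero. This is the general
// mechanism behind the leak.
struct Node {
    std::shared_ptr<Node> other;
};

int main() {
    auto a = std::make_shared<Node>();
    auto b = std::make_shared<Node>();
    a->other = b;
    b->other = a;   // ownership cycle
    return 0;       // neither Node is freed; valgrind reports the blocks as lost
}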

A fix might be as easy as clearing all of the gradx members in the expression tree before returning from hessian, but I've not had any success so far.

@jan-grimo
Contributor

There are two possible solutions in my view:

  1. Separate out the grad and gradx members from the expression tree entirely. That way, there can be no ownership cycles. The gradient and hessian coefficient expression trees could be kept and reevaluated in some combination with the update functionality.
  2. Traverse the full tree after the gradient and hessian functions in eigen.hpp, nulling out all VariableExpr gradx members (see the sketch after this list). Cycles would still exist, but only temporarily. All gradient or hessian coefficient expression trees would need to be freed within the gradient and hessian functions to avoid leaving cycles in memory that could leak.
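
A rough sketch of the cleanup pass in option 2, using toy types rather than the actual autodiff expression classes (the real traversal would need to cover every expression node type):

#include <memory>

// Toy expression node: operands are owned, and 'deriv' stands in for gradx,
// which may point back into the same tree and create ownership cycles.
struct Expr {
    std::shared_ptr<Expr> left, right;   // operands
    std::shared_ptr<Expr> deriv;         // gradx-like edge (possibly cyclic)
};

// Cleanup pass: once the derivative values have been extracted, walk the tree
// and drop every deriv edge so the remaining ownership graph is acyclic again.
void clearDerivatives(const std::shared_ptr<Expr>& node) {
    if (!node) return;
    node->deriv.reset();
    clearDerivatives(node->left);
    clearDerivatives(node->right);
}

int main() {
    auto root = std::make_shared<Expr>();
    root->left = std::make_shared<Expr>();
    root->left->deriv = root;    // cycle: the child's deriv owns the root
    clearDerivatives(root);      // break the cycle before root goes out of scope
    return 0;
}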

Personally, I'd advocate for 1 (it feels cleaner to me), but I think I've still not completely understood the relationships between grad and gradx in combination with the seed functions. I think it should be possible to reformulate the calculation of gradients and hessians in a way that the grad and gradx members aren't part of the expression tree itself. What do you think @allanleal?

@allanleal
Member

allanleal commented Aug 18, 2021

I'm also trying to fix this here, but so far to no avail. I've managed to reduce the problem to its simplest reproducible form:

#include <autodiff/reverse/var/var.hpp>
#include <autodiff/reverse/var/eigen.hpp>
using namespace autodiff;

var f(const var& x)
{
    return x*x;
}

int main()
{
    var x(1.0);
    var u = f(x);

    x.seedx();

    u.expr->propagatex(detail::constant<double>(1.0));

    return 0;
}

We may need to use a mix of shared_ptr and weak_ptr to eliminate these memory leaks, but I'm not fully certain how to do so at the moment. This seems to be the standard way of resolving memory leaks caused by cyclic shared pointers. But I'm also open to your proposed solutions above.
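
To show the shared_ptr/weak_ptr idea in isolation (again plain C++, not the actual expression types):

#include <memory>

// Breaking an ownership cycle with weak_ptr: the back edge observes its target
// without owning it, so reference counts can reach zero and both nodes are
// destroyed normally.
struct Node {
    std::shared_ptr<Node> child;   // owning edge (think: expression operand)
    std::weak_ptr<Node> back;      // non-owning back edge (think: a gradx-like link)
};

int main() {
    auto parent = std::make_shared<Node>();
    auto leaf = std::make_shared<Node>();
    parent->child = leaf;
    leaf->back = parent;               // no ownership cycle is formed
    if (auto p = leaf->back.lock()) {  // promote to shared_ptr before use
        // p keeps parent alive only within this scope
    }
    return 0;                          // both nodes destroyed; no leak
}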

FYI, the grad member holds the value of the gradient (useful and more efficient when only 1st-order derivatives are being computed). The gradx member is used for higher-order derivatives (instead of being of double type, it is of an expression type). This permits the expression to be differentiated again and again to obtain higher-order derivatives.

@allanleal
Member

I'm also wondering if we could get rid of shared_ptr altogether (or at least not use them when creating the expression nodes that correspond to operators). We would then free these expression graph nodes ourselves later, instead of relying on a reference count hitting zero.
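
As a generic illustration of that idea (not autodiff code): expression nodes could reference each other with plain pointers while an arena owns every node, so nothing is freed until the whole graph is discarded and cycles no longer affect lifetime.

#include <memory>
#include <vector>

// Illustration only: nodes use raw pointers for their edges; the arena owns
// them all and frees them together in its destructor.
struct Expr {
    Expr* lhs = nullptr;
    Expr* rhs = nullptr;
    double val = 0.0;
};

struct ExprArena {
    std::vector<std::unique_ptr<Expr>> nodes;

    Expr* make(double v, Expr* l = nullptr, Expr* r = nullptr) {
        nodes.push_back(std::make_unique<Expr>(Expr{l, r, v}));
        return nodes.back().get();
    }
};

int main() {
    ExprArena arena;                  // owns all nodes created below
    Expr* x = arena.make(1.0);
    Expr* u = arena.make(1.0, x, x);  // stands in for u = x * x
    (void)u;
    return 0;                         // arena destructor frees every node at once
}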

@jan-grimo
Contributor

jan-grimo commented Aug 18, 2021

use a mix of shared_ptr and weak_ptr

I thought about this too, but I was convinced that sometimes the gradx member would refer to an existing expression in the tree, and sometimes own its own expression. Having a maybe-owning pointer in each expression would complicate reasoning about the whole data structure quite a bit, and operator overloading too.

This permits this expression to be derived again and again to obtain higher order derivatives

So the main purpose for storing them somewhere is to reuse them for higher-order derivatives? There's no inherent reason they need to be stored within the expression tree itself? So hypothetically hessian could return a VectorXvar gradient and a MatrixXvar hessian containing the coefficient calculation expressions for reuse instead? Maybe supply a function for converting those to values by calling val on each coefficient?
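
Something like this hypothetical helper, for example (it assumes the MatrixXvar alias from the Eigen support header and a val() accessor for var; this is only an illustration of the proposal, not existing autodiff API):

#include <Eigen/Core>

#include <autodiff/reverse.hpp>
#include <autodiff/reverse/eigen.hpp>
using namespace autodiff;

// Hypothetical helper: take a matrix of var coefficients (e.g. a returned
// MatrixXvar Hessian) and evaluate each entry into a plain double matrix.
inline Eigen::MatrixXd evalCoefficients(const MatrixXvar& H)
{
    Eigen::MatrixXd out(H.rows(), H.cols());
    for (Eigen::Index i = 0; i < H.rows(); ++i)
        for (Eigen::Index j = 0; j < H.cols(); ++j)
            out(i, j) = val(H(i, j));   // evaluate one coefficient expression
    return out;
}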

get rid of shared_ptr altogether

I don't see a good way to keep structural sharing between expression trees with this approach.

@allanleal
Member

This issue has been fixed with PR #177 provided by @jargonzombies! Thanks a lot for this.
