Allow arena_matrix to use move semantics #2928

SteveBronder · 2023-08-02T19:49:52Z

Summary

This PR does two things

Allows arena_matrix to use move semantics

If an arena_matrix's constructor is passed an rvalue, then instead of allocating memory on the stack allocator and copying into it, arena_matrix now moves that rvalue into a chainable_ptr that sits in the chainable stack's var_alloc_stack_ and is deleted after stan::math::recover_memory() is called. This saves us the allocation and the copy; we only delay paying the deletion cost, which we would have paid anyway when the temporary was destroyed.
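To make the idea concrete, here is a minimal, self-contained sketch of "adopt the rvalue and delete it later". It does not use Stan's actual classes: CleanupStack, ArenaVector, and adopt are hypothetical names invented for this illustration, and std::vector stands in for an Eigen matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for the autodiff stack's list of owned objects.
// This is NOT Stan's API, only an illustration of deferred deletion.
struct CleanupStack {
  struct Holder {
    virtual ~Holder() = default;
  };
  template <typename T>
  struct Owned : Holder {
    T value;
    explicit Owned(T&& v) : value(std::move(v)) {}
  };
  std::vector<std::unique_ptr<Holder>> held;

  // Take ownership of an rvalue; the returned pointer stays valid
  // until recover_memory() is called.
  template <typename T>
  T* adopt(T&& v) {
    static_assert(!std::is_lvalue_reference<T>::value, "rvalues only");
    auto owned = std::make_unique<Owned<T>>(std::move(v));
    T* raw = &owned->value;
    held.push_back(std::move(owned));
    return raw;
  }

  // Destroys everything adopted so far (the deferred deletion).
  void recover_memory() { held.clear(); }
};

// Hypothetical non-owning view, playing the role of arena_matrix.
struct ArenaVector {
  double* data;
  std::size_t size;

  // lvalue input: the caller keeps its object, so we must copy.
  ArenaVector(CleanupStack& stack, const std::vector<double>& v) {
    auto* p = stack.adopt(std::vector<double>(v));
    data = p->data();
    size = p->size();
  }
  // rvalue input: adopt the existing buffer, no element copy.
  ArenaVector(CleanupStack& stack, std::vector<double>&& v) {
    auto* p = stack.adopt(std::move(v));
    data = p->data();
    size = p->size();
  }
};

// Returns true if the rvalue path reused the buffer and the lvalue path did not.
inline bool move_reuses_buffer() {
  CleanupStack stack;
  std::vector<double> tmp{1.0, 2.0, 3.0};
  const double* original = tmp.data();
  ArenaVector moved(stack, std::move(tmp));  // buffer adopted
  assert(moved.size == 3);
  std::vector<double> keep{4.0, 5.0};
  ArenaVector copied(stack, keep);  // buffer copied
  bool ok = (moved.data == original) && (copied.data != keep.data());
  stack.recover_memory();  // adopted buffers are freed here
  return ok;
}
```

In this sketch the rvalue constructor hands the temporary's buffer to the cleanup list, so the only remaining cost is the deallocation at recover_memory(), which mirrors the trade-off described above.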

After talking with @bob-carpenter I think I'd like to remove the hard copies we do for reverse mode functions using Eigen types. Right now our pattern is to make an arena matrix for the result, pass that to the reverse mode callback, and then when we return from the function we make a hard copy into an actual Eigen::Matrix type. The hard copy into the Eigen matrix is to make sure assignments done later to the matrix (i.e. X(1, 1) = a) do not overwrite the memory necessary for the reverse pass.

Instead, I suggest we do what Eigen does and include docs that tell people that auto is unsafe to use carelessly with the Stan Math library. Users who want to assign to elements of a matrix should declare the assigned object with the appropriate Eigen matrix type.

For example, in the code below, using auto for the result of the function and then assigning to an element of the matrix would corrupt the reverse mode pass of the function that generated the matrix. However, assigning the result of the function to an actual matrix type and then doing the element assignment is safe.

Eigen::Matrix<var, -1, 1> y;
Eigen::Matrix<var, -1, -1> X;
// Bad!! Will change memory used by reverse pass callback within multiply!
auto mu = multiply(X, y);
mu(4) = 1.0;
// Good! Will not change memory used by reverse pass callback within multiply
Eigen::Matrix<var, -1, 1> mu_good = multiply(X, y);
mu_good(4) = 1.0;
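The hazard can be reproduced without Stan or Eigen. The sketch below is an assumption-laden toy: View and make_result are hypothetical stand-ins for an arena-backed return type and a reverse mode function, and std::vector stands in for Eigen::Matrix. It only shows why auto keeps an alias while a concrete type forces a copy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning handle standing in for an arena-backed result.
struct View {
  std::vector<double>* storage;
  double& operator()(std::size_t i) { return (*storage)[i]; }
  // Converting to an owning vector makes a deep copy.
  operator std::vector<double>() const { return *storage; }
};

// Stand-in for a function whose reverse pass later reads `saved`.
inline View make_result(std::vector<double>& saved) { return View{&saved}; }

// Returns true if every check below held.
inline bool aliasing_demo() {
  std::vector<double> saved{1.0, 2.0, 3.0};

  // Bad: auto deduces View, so mu aliases the saved memory.
  auto mu = make_result(saved);
  mu(1) = 99.0;
  bool alias_overwrote = (saved[1] == 99.0);
  assert(alias_overwrote);

  // Good: naming the owning type triggers the copy.
  saved[1] = 2.0;
  std::vector<double> mu_good = make_result(saved);
  mu_good[1] = 99.0;
  bool copy_was_safe = (saved[1] == 2.0);
  assert(copy_was_safe);

  return alias_overwrote && copy_was_safe;
}
```

The same reasoning applies to Eigen expression templates, which is why Eigen's own documentation warns about auto.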

This is a breaking change so I think we should do a major version bump. In the tests below I've found anywhere from a 5% to 15% speed increase from using this.

The graph below plots the relative % improvement of using move semantics relative to the current develop. We are testing the expression below, which just does a bunch of multiplies, adds, and then a sum.

var lp = sum(stan::math::add(stan::math::multiply(stan::math::multiply(x, y), std::move(y)), std::move(x)));

The variables x and y above are matrices of the same size. Only the forward pass of reverse mode autodiff is used in the benchmark, since this change should not have any effect on the reverse pass's performance.

We can see that for small sizes this is great. As we get to larger sizes it's still good, but at that point a lot of the actual computation and fetching data for the operations is taking up enough time that the memory allocation is not a huge concern.

You can run this performance benchmark yourself using the following

git clone https://github.com/SteveBronder/stan-perf
cd stan-perf
git checkout tmp-values
cmake -S . -B "build" -DCMAKE_BUILD_TYPE=Release
cd ./build
make -j4 move_ex move_ex_orig
# WARNING: Will use a lot of memory for the large problems and take a while
taskset -c 1 ./benchmarks/move_stuff/move_ex --benchmark_out_format=csv --benchmark_out=./benchmarks/move_stuff/move_ex.csv --benchmark_report_aggregates_only=false --benchmark_display_aggregates_only=true --benchmark_repetitions=30 --benchmark_enable_random_interleaving=true
taskset -c 1 ./benchmarks/move_stuff/move_ex_orig --benchmark_out_format=csv --benchmark_out=./benchmarks/move_stuff/move_ex_orig.csv --benchmark_report_aggregates_only=false --benchmark_display_aggregates_only=true --benchmark_repetitions=30 --benchmark_enable_random_interleaving=true
Rscript ./plots/move_stuff/plots.R

Tests

Tests that check the moves work are added to the arena_matrix_test file. They can be run with

python ./runTests.py ./test/unit/math/rev/core/arena_matrix_test

Side Effects

Yes

In order to utilize this we need to use perfect forwarding in our reverse mode functions. See the new multiply and add signatures in this PR for an example. I think I could do this for most of our functions that can use it with a day or two of work, though now our functions will be more "modern" but also harder to understand.

Release notes

Allows arena_matrix to use move semantics along with the stack allocator.

Checklist

Math issue #(issue number)
Copyright holder: Simons Foundation

The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
- Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
- Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
the basic tests are passing
- unit tests pass (to run, use: ./runTests.py test/unit)
- header checks pass, (make test-headers)
- dependencies checks pass, (make test-math-dependencies)
- docs build, (make doxygen)
- code passes the built in C++ standards checks (make cpplint)
the code is written in idiomatic C++ and changes are documented in the doxygen
the new changes are tested

SteveBronder · 2023-08-10T20:25:03Z

@t4c1 sorry but would you mind taking a quick glance at this?

@andrjohns would you feel comfortable reviewing this?

t4c1 · 2023-08-15T17:54:57Z

stan/math/prim/prob/normal_log.hpp

-  return normal_lpdf<propto, T_y, T_loc, T_scale>(y, mu, sigma);
+inline return_type_t<T_y, T_loc, T_scale> normal_log(T_y&& y, T_loc&& mu,
+                                                     T_scale&& sigma) {
+  return normal_lpdf<propto>(y, mu, sigma);


I am assuming you want to use perfect forwarding here. That would be:

Suggested change

return normal_lpdf<propto>(y, mu, sigma);

return normal_lpdf<propto>(std::forward<T_y>(y), std::forward<T_loc>(mu), std::forward<T_scale>(sigma));

Yes thanks! The normal distribution changes are mostly just to show what the new signatures would look like. But I'd like to make them so that we can forward them into operands_and_partials.
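For readers less familiar with the point of this suggestion, the following standalone sketch shows why the std::forward calls matter. The functions sink, pass_by_name, and pass_forwarded are hypothetical names for illustration only and are unrelated to normal_lpdf's real overloads.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical sink that reports which overload was selected.
inline std::string sink(const std::string&) { return "lvalue"; }
inline std::string sink(std::string&&) { return "rvalue"; }

// A named parameter is always an lvalue, even when declared as T&&,
// so this always selects the const& overload.
template <typename T>
std::string pass_by_name(T&& x) {
  return sink(x);
}

// std::forward restores the value category the caller used.
template <typename T>
std::string pass_forwarded(T&& x) {
  return sink(std::forward<T>(x));
}

// Returns true if every check below held.
inline bool forwarding_demo() {
  std::string s = "abc";
  bool named_loses_rvalue = pass_by_name(std::string("tmp")) == "lvalue";
  bool forward_keeps_rvalue = pass_forwarded(std::string("tmp")) == "rvalue";
  bool forward_keeps_lvalue = pass_forwarded(s) == "lvalue";
  assert(named_loses_rvalue);
  assert(forward_keeps_rvalue);
  assert(forward_keeps_lvalue);
  return named_loses_rvalue && forward_keeps_rvalue && forward_keeps_lvalue;
}
```

Without the forward, a temporary passed to the outer function would reach the inner one as an lvalue, and any move-enabled constructor further down would fall back to copying.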

SteveBronder · 2023-09-05T15:58:54Z

@andrjohns would you mind taking a look at this?

syclik · 2023-09-06T13:26:10Z

@SteveBronder, this is rad!

I'm concerned with the impact it has on our use of auto throughout the Math codebase. Will we need to go through the current uses of auto everywhere to be safe? Or does this really only affect downstream usage?

Just to be absolutely clear, this is a "concern" and it is not blocking in my mind. It seems like we should do this; I just want to be cognizant of the effort to get this properly done before embarking on it.

syclik · 2024-04-19T13:59:21Z

Yes, I'll give it a good read-through again. Just need a little bit of time to get in the right mindset to get into deep C++ perfect forwarding. I'll try to set aside the hours necessary this weekend.

Fix csr_matrix_times_vector linker error

syclik · 2024-04-22T13:12:27Z

@SteveBronder, I'm going to edit the PR to have the base branch be 5.0-breaking-changes instead of develop.

Add support for Windows ARM64

SteveBronder · 2024-04-22T21:50:53Z

Thanks! I'll update the merge conflicts tmrw

use a seperate class for csr_matrix adjoint

Don't set build and clean rules for SUNDIALS if external libs are used

SteveBronder · 2024-04-26T18:21:48Z

@syclik @andrjohns fixed the merge conflict, all good for me to merge to 5.0?

andrjohns · 2024-04-26T18:47:12Z

@syclik @andrjohns fixed the merge conflict, all good for me to merge to 5.0?

Yep, all good from me

syclik · 2024-04-26T19:35:47Z

I’m still reading it. I was on it and it was taking a bit more time to really get it done. I’ll make sure to read through it carefully.

syclik · 2024-04-26T19:39:25Z