
[WIP] Adds boost root finders with reverse mode specializations #2720

Open
wants to merge 25 commits into develop

Conversation

SteveBronder
Collaborator

Summary

Adds a wrapper for boost's Halley root finding algorithm along with a reverse mode specialization similar to our algebra solver. Note that these functions are only meant to be used at the Stan Math level, specifically for the quantile functions in stan-dev/design-docs#42.

This also works for forward mode, but all it does is iterate through the boost functions so it is probably very slow.

Tests

Tests for forward and reverse mode use the cube root and 5th root examples from boost's docs. Tests can be run with

./runTests.py -j4 ./test/unit/math/ -f root_finder

Side Effects

Note that I had to write specializations for frexp and sign for fvar and var types because boost would throw compiler errors if they did not exist.

The WIP here is because of some questions for @bgoodri:

  1. Would we ever want to use the Newton-Raphson or Schröder root finders? I wrote this keeping those in mind, and it would be very simple to support them.
  2. Right now the API takes in a tuple of functors to calculate the value and derivatives. But as you can see in the 5th root example, this is kind of inefficient since we are just doing x^n and could share that computation between the calculations. For the reverse mode specialization we do need a functor that calculates just the function we want to solve, but we could also allow the user to pass two functors: one that returns a tuple with all the values for the root finder and another that just gives back the value of the function we want to solve.
  3. Do we need functions for working over vectors elementwise, or should we add that later?

Release notes

Adds boost root finders with reverse mode specializations.

Checklist

  • Math issue #

  • Copyright holder: Steve Bronder

    The copyright holder is typically you or your assignee, such as a university or company. By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:
    - Code: BSD 3-clause (https://opensource.org/licenses/BSD-3-Clause)
    - Documentation: CC-BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

  • the basic tests are passing

    • unit tests pass (to run, use: ./runTests.py test/unit)
    • header checks pass, (make test-headers)
    • dependencies checks pass, (make test-math-dependencies)
    • docs build, (make doxygen)
    • code passes the built-in C++ standards checks (make cpplint)
  • the code is written in idiomatic C++ and changes are documented in the doxygen

  • the new changes are tested

@bgoodri
Contributor

bgoodri commented Apr 29, 2022

(1) I imagine it will end up like the ODE solvers where we have slightly different versions with a mostly common set of arguments. But Halley is probably the one that would get used the most inside Stan Math.

@bgoodri
Contributor

bgoodri commented Apr 29, 2022

(2) It seems like it would be good to permit (or require) one functor that returns a tuple of: level, first derivative, second derivative.
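
For illustration, a minimal sketch of that shape using the cube-root example from the Boost docs (the functor and constants here are illustrative, not the PR's API): a single functor returns the value, first derivative, and second derivative as a tuple and can share work (here x * x) across all three.

#include <boost/math/tools/roots.hpp>
#include <tuple>
#include <cstdint>

// Find the cube root of a: f(x) = x^3 - a, f'(x) = 3x^2, f''(x) = 6x.
struct cbrt_functor {
  double a;
  std::tuple<double, double, double> operator()(double x) const {
    double x2 = x * x;  // shared between f and f'
    return std::make_tuple(x2 * x - a, 3 * x2, 6 * x);
  }
};

// usage:
//   std::uintmax_t max_iter = 20;
//   double r = boost::math::tools::halley_iterate(cbrt_functor{27.0}, 3.0, 0.0, 10.0, 21, max_iter);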

@bgoodri
Contributor

bgoodri commented Apr 29, 2022

(3) If a root needs to be found for multiple parameter values, I would say that the caller is responsible for invoking root_finder multiple times.

@bgoodri
Contributor

bgoodri commented Apr 29, 2022

I think we should default the digits argument according to the recommendation in the Boost docs.

The value of digits is crucial to good performance of these functions, if it is set too high then at best you will get one extra (unnecessary) iteration, and at worst the last few steps will proceed by bisection. Remember that the returned value can never be more accurate than f(x) can be evaluated, and that if f(x) suffers from cancellation errors as it tends to zero then the computed steps will be effectively random. The value of digits should be set so that iteration terminates before this point: remember that for second and third order methods the number of correct digits in the result is increasing quite substantially with each iteration, digits should be set by experiment so that the final iteration just takes the next value into the zone where f(x) becomes inaccurate. A good starting point for digits would be 0.6*D for Newton and 0.4*D for Halley or Schröder iteration, where D is std::numeric_limits<T>::digits.
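
A sketch of what that default could look like (the 0.6 / 0.4 factors are just the Boost starting points quoted above, not a settled choice for this PR):

#include <limits>

constexpr int D = std::numeric_limits<double>::digits;    // 53 binary digits for double
constexpr int newton_digits = static_cast<int>(0.6 * D);  // ~31
constexpr int halley_digits = static_cast<int>(0.4 * D);  // ~21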

@bgoodri
Contributor

bgoodri commented Apr 29, 2022

You could also easily test a * x^2 + b * x + c = 0 where a, b, and / or c are var, fvar<var>, etc. to make sure our implicit dx / da, dx / db, and dx / dc match those obtained by differentiating x = (-b + sqrt(b^2 - 4 * a * c)) / (2 * a) explicitly.
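
For reference, a plain-double sketch of the numbers such a test would compare (coefficients and bracket chosen arbitrarily); by the implicit function theorem the sensitivities of the root are -f_a / f_x, -f_b / f_x, and -f_c / f_x, which should match differentiating the quadratic formula directly.

// f(x) = a*x^2 + b*x + c with a = 1, b = -3, c = 2 has roots at x = 1 and x = 2.
double a = 1.0, b = -3.0, c = 2.0;
double x = 2.0;                 // the root picked out by, say, the bracket [1.5, 10]
double fx = 2 * a * x + b;      // df/dx at the root
double dx_da = -(x * x) / fx;   // = -4, matches d/da of (-b + sqrt(b^2 - 4*a*c)) / (2*a)
double dx_db = -x / fx;         // = -2
double dx_dc = -1.0 / fx;       // = -1
// These are the values the var / fvar<var> adjoints from the root finder should reproduce.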

@bgoodri
Contributor

bgoodri commented Apr 29, 2022

From your email, guess, min, and max should definitely be templated but we just need to pull out the underlying numbers since we wouldn't need to differentiate with respect to them.
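
Presumably something like this (a sketch; the exact helper used in the PR may differ):

// Accept templated guess / min / max but strip them to plain doubles before handing
// them to Boost, since we never differentiate with respect to the bracket.
double guess_val = stan::math::value_of_rec(guess);
double min_val = stan::math::value_of_rec(min);
double max_val = stan::math::value_of_rec(max);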

@bgoodri
Contributor

bgoodri commented Apr 29, 2022

Also, @charlesm93 might have some thoughts. As I mentioned yesterday, the motivation for having a one-dimensional specialization of algebraic_solver in Stan Math (we are not worried about exposing it to the Stan language yet) is

  1. Everything is in terms of scalars, so there is no Eigen overhead from making vectors of size 1, 1x1 Jacobians, etc.
  2. We can use bounds to ensure the correct root is found
  3. We can utilize second derivatives to obtain a tripling of the number of correct digits when close to the root

I guess the main question is what is the fastest way of implicitly differentiating the root with respect to unknown parameters? A concrete example would be thinking about implementing the inverse CDF of the Beta distribution in C++ with possibly unknown shape parameters. Given a cumulative probability, p, we would need to find the value of x between 0 and 1, such that beta_cdf(x, alpha, beta) - p = 0. We know the first derivative wrt x is the beta PDF and the second derivative is the first derivative of the beta PDF and can presume within Stan Math that the caller has specified those two derivatives explicitly. Once x is found, its sensitivity to p is the reciprocal of the beta PDF, but the sensitivity of x to alpha and beta requires implicit differentiation.
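
For reference, the relation being described is the implicit function theorem applied to $g(x, \alpha, \beta, p) = F(x \mid \alpha, \beta) - p = 0$, with $F$ the beta CDF and $f$ the beta PDF:

$$\frac{\partial x}{\partial p} = \frac{1}{f(x \mid \alpha, \beta)}, \qquad \frac{\partial x}{\partial \alpha} = -\frac{1}{f(x \mid \alpha, \beta)}\frac{\partial F(x \mid \alpha, \beta)}{\partial \alpha}, \qquad \frac{\partial x}{\partial \beta} = -\frac{1}{f(x \mid \alpha, \beta)}\frac{\partial F(x \mid \alpha, \beta)}{\partial \beta}$$

so the only extra quantities needed beyond what the Halley iteration already uses are the partials of the CDF with respect to alpha and beta.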

@bgoodri
Contributor

bgoodri commented Apr 30, 2022

We should probably also do the beta inverse CDF as a test case. Boost has a more advanced way of obtaining an initial guess in this case, but they still usually do Halley iteration.

var p, alpha, beta;  // give these values
using boost::math::beta_distribution;
beta_distribution<var> my_beta(alpha, beta);
auto x = quantile(my_beta, p);
// check our version against this

@SteveBronder
Collaborator Author

So I got the tests for beta up and running; for the beta lcdf it seems to work fine for values where lgamma(alpha + beta) > 1. Maybe I wrote the derivative of the pdf wrt x wrong? Or there could be a numerics trick I'm missing.

For the test against boost's quantile function, var does not work with the boost distributions because of some functions they define that we do not have defined for vars. But using finite diff I get answers that are within 4e-4. Perhaps I'm not doing something quite right there as well? Or it could just be finite diff's own error. There are also some places where finite diff gives -nan values for alpha and beta but autodiff can return the correct values.

# from running 
# ./runTests.py -j2 test/unit/math/rev/functor/root_finder_test.cpp
--- Finit Diff----
fx: 0.288633
grads: 
p: 0.6987
alpha: 0.245729
beta: -0.121384
--- Auto Diff----
fx: 0.288714
grads: 
p: 0.700012
alpha: 0.248911
beta: -0.12257

grad diffs: 
p: -0.00131203
alpha: -0.00318197
beta: 0.00118579

@SteveBronder
Collaborator Author

Also do we want to test lcdf or cdf functions?

@bgoodri
Contributor

bgoodri commented May 10, 2022

There are analytical derivatives for the beta quantile case at

https://functions.wolfram.com/GammaBetaErf/InverseBetaRegularized/20/01/

charlesm93 self-assigned this May 11, 2022
@SteveBronder
Collaborator Author

@bgoodri so I got the rev test up and running with the explicit gradients for the beta cdf. I also wrote a lil' function that just checks the gradients and our function to make sure they are near-ish to each other. Though the test is failing for some odd values that I wouldn't expect it to fail on. The failing cases all print their values after a failure so you can take a look if you like (though note that after one failure it will just keep printing all the sets of values it is testing; I have not figured out how to stop that from happening in gtest). I wonder if it could be some numerical inaccuracy in our beta_cdf function?

@charlesm93
Contributor

(2): if the function is going to be used internally, we should try and optimize as much as possible. I see two options:

  • having one functor argument which returns the value and one functor argument which returns a tuple with the value and the derivatives.
  • passing a single functor which admits an argument indicating the order of the derivatives we need to compute. Internally, only do the calculations for the specified order, and return a tuple of value and derivatives (with the latter potentially just containing nans).

If this is exposed to the user, we may want to trade some efficiency for a simpler API.
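
A rough sketch of the second option (names hypothetical, with f(x) = x^3 - a standing in for the real function): the functor is templated on the requested derivative order and only computes what was asked for, returning NaN for the rest.

#include <tuple>
#include <limits>

struct order_functor_sketch {
  double a;
  template <int Order>
  std::tuple<double, double, double> operator()(double x) const {
    constexpr double nan = std::numeric_limits<double>::quiet_NaN();
    double x2 = x * x;
    return std::make_tuple(x2 * x - a,
                           Order >= 1 ? 3 * x2 : nan,   // first derivative only if requested
                           Order >= 2 ? 6 * x : nan);   // second derivative only if requested
  }
};
// The Halley iteration would call f.template operator()<2>(x), while the reverse mode
// specialization could call f.template operator()<0>(x) when it only needs the value.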

@charlesm93
Contributor

Why is there a specialized rev method? The implicit function theorem is valid for both forward and reverse mode autodiff. Since the solution is a scalar, the full Jacobian of derivatives reduces to a vector, so we don't gain much from doing contractions with either an initial tangent or a cotangent.

I think it would make sense to explicitly compute the full Jacobian and then do the vector-vector product for either forward or reverse mode.

@charlesm93
Contributor

charlesm93 left a comment

Overall this looks good to me but it still requires some additional work. Some of the C++ looks arcane to me, so I've asked for clarifications and I'm relying on the quality of the unit tests.

Here are the main requests:

  • The unit tests need to check that the correct root is returned. You can work out the solution analytically; plug the root into the function being evaluated and check that it is close to 0 within the specified tolerance; or use algebra_solver as a benchmark.
  • When setting tuning parameters for the rootfinder, leave a comment to justify those choices. Are these arbitrary tuning parameters or are they required to make the root finder work?
  • Need additional tests which prompt the error messages.
  • Address comments on the file. A handful of them are about doxygen doc.

Some suggestions:

  • Consider using the implicit function theorem for both forward and reverse mode autodiff. I don't think using a direct method for forward diff is justified.
  • As mentioned in the discussion, consider passing only one functor rather than a tuple of functors.

namespace math {
namespace internal {
template <typename Tuple, typename... Args>
inline auto func_with_derivs(Tuple&& f_tuple, Args&&... args) {
Contributor

This needs some doxygen doc. It's not clear what this function does.

* initial lower bracket
* @param max The maximum possible value for the result, this is used as an
* initial upper bracket
* @param digits The desired number of binary digits precision
Contributor

Indicate that digits cannot exceed the precision of f_tuple.

return boost::math::tools::halley_iterate(args...);
},
std::forward<FTuple>(f_tuple), guess, min, max, digits, max_iter,
std::forward<Types>(args)...);
Contributor

This is my lacking C++ skills speaking, but could you describe how the call to root_finder_tol works here? How does the [](auto&&... args) argument work?

Collaborator Author

  return root_finder_tol(
      [](auto&&... args) {
        return boost::math::tools::halley_iterate(args...);
      },
      std::forward<FTuple>(f_tuple), guess, min, max, digits, max_iter,
      std::forward<Types>(args)...);

The lambda in this call tells root_finder_tol which root finder to use. So in this case we pass a lambda that takes a parameter pack and calls the Halley root finder. If we used a struct here it would look like

struct halley_iterator {
  template <typename... Types>
  auto operator()(Types&&... args) const {
    return boost::math::tools::halley_iterate(std::forward<Types>(args)...);
  }
};

auto root_finder(SolverFun&& f_solver, FTuple&& f_tuple,
const GuessScalar guess, const MinScalar min,
const MaxScalar max, Types&&... args) {
constexpr int digits = 16;
Contributor

How was this default chosen? Maybe add one sentence about this choice in the doxygen doc.

Collaborator Author

I need to update this to be in line with what boost's docs say

The value of digits is crucial to good performance of these functions, if it is set too high then at best you will get one extra (unnecessary) iteration, and at worst the last few steps will proceed by bisection. Remember that the returned value can never be more accurate than f(x) can be evaluated, and that if f(x) suffers from cancellation errors as it tends to zero then the computed steps will be effectively random. The value of digits should be set so that iteration terminates before this point: remember that for second and third order methods the number of correct digits in the result is increasing quite substantially with each iteration, digits should be set by experiment so that the final iteration just takes the next value into the zone where f(x) becomes inaccurate. A good starting point for digits would be 0.6*D for Newton and 0.4*D for Halley or Schröder iteration, where D is std::numeric_limits<T>::digits.

https://www.boost.org/doc/libs/1_62_0/libs/math/doc/html/math_toolkit/roots/roots_deriv.html

[&x_var, &f](auto&&... args) { return f(x_var, std::move(args)...); },
std::move(args_vals_tuple));
fx_var.grad();
Jf_x = x_var.adj();
Contributor

This is a scalar. You might want to call it df_dx, since J usually suggests a Jacobian matrix (but this is a minor detail). Does nested_rev_autodiff mean you're using reverse mode? For computing a scalar, I recommend using forward mode.
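
(A sketch of the forward-mode version of this step for concreteness; x_val and args_vals stand in for the double values of x_var and args_vals_tuple above:)

// Seed x with a unit tangent: a single forward pass then gives df/dx directly,
// with no nested tape to manage.
stan::math::fvar<double> x_fvar(x_val, 1.0);
auto fx_fvar = f(x_fvar, args_vals...);  // other arguments held fixed at their double values
double df_dx = fx_fvar.d_;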

Collaborator Author

Does nested_rev_autodiff mean you're using reverse mode?

Yes

For computing a scalar, I recommend using forward mode.

What is the intuition on fwd being faster in this case? I thought if it was many inputs and a single output then reverse is best?

return beta_lpdf_deriv(x, alpha, beta, p);
};
double guess = .3;
double min = 0.0000000001;
Contributor

Would min = 0 not work?

Contributor

If not, this should be indicated in a comment.

Collaborator Author

Yes I can update this

double p = 0.45;
double alpha = .5;
double beta = .4;
stan::test::expect_ad(full_f, alpha, beta, p);
Contributor

Reminder for me: this checks both reverse and forward mode AD, and higher-order AD as well, correct?

Contributor

Does expect_ad allow a tolerance parameter -- as a way to keep track of how closely finite-diff and autodiff agree?

Contributor

In addition to testing derivatives, are you also checking that the root finder returns the correct value?

Collaborator Author

Reminder for me: this checks both reverse and forward mode AD, and higher-order AD as well, correct?

Yep!

Does expect_ad allow a tolerance parameter -- as a way to keep track of how closely finite-diff and autodiff agree?

Yes, though I'm not sure yet which level of tolerance we can / should expect

In addition to testing derivatives, are you also checking that the root finder returns the correct value?

It just checks finite diff vs autodiff of the function for correct gradients; if the primitive and reverse mode specializations give back different answers it would throw an error. But I do need some test against, say, our current algebra solver to make sure things are as we expect.

double alpha = .4;
double beta = .5;
stan::test::expect_ad(full_f, alpha, beta, p);
}
Contributor

Same comment about this unit test that I made for the previous test.

<< finit_grad_fx(1)
<< "\n"
"beta: "
<< finit_grad_fx(2) << "\n";
Contributor

I understand this is here for the discussion in the PR, but in time, we'll remove the std::cout.

double known_p_grad = deriv_p(p, a, b);
double known_alpha_grad = deriv_a(p, a, b);
double known_beta_grad = deriv_b(p, a, b);
std::cout << "--- Mathematica Calculate Grad----\n";
Contributor

If using other software as a benchmark, it would be good to have the code or a link to the code (or, in the case of Mathematica, the input that you use).

Collaborator Author

Yes, def agree it would be good to copy/paste those in here.

@bgoodri
Contributor

bgoodri commented May 18, 2022

Thanks @charlesm93

@SteveBronder
Collaborator Author

Thanks @charlesm93! I'm going to rework the API right now. I'd like to make it a bit more efficient for reverse mode, which would mean supplying two separate functors: one that calculates the function and its derivatives all at the same time for efficiency, and another that calculates just the function.

@SteveBronder
Collaborator Author

@bgoodri alrighty, so I got the reverse mode adjoints to be right up to 1e-9. It turns out that we need more precision when doing the forward pass during reverse mode, which I found kind of weird and interesting. For the beta cdf example it seems like we need about 23 (???) digits of precision, which is very odd to me. Like, we only work in doubles, so I'm not sure why anything over 16 would be a thing? But tests now pass for alpha, beta, and p from .1 to .9.

@bob-carpenter
Contributor

bob-carpenter commented Jun 9, 2022

It turns out that we need more precision when doing the forward pass during reverse mode which I found kind of weird and interesting

The forward-mode values are often used as part of the derivative calculations, so I suspect that may be the problem.

For the beta cdf example it seems like we need about 23 (???) digits of precision which is very odd to me

As you point out, we only have about 16 digits of relative precision in double precision, so we can't return results with higher precision than that. CPUs often do higher-precision internal arithmetic and then truncate the result, but there's simply no way to get 23 digits of accuracy in a floating-point result at double precision. We can get to within 1e-23 of an answer in absolute terms, though.

@dmi3kno

dmi3kno commented Jan 4, 2023

May I suggest that we adopt existing (closed-form) QF/CDF pairs for tests? Here are suggested QFs and CDFs (where missing in Stan).

  • Unbounded:
    • logistic distribution
      $$Q(u)=\mu+s[\ln(u)-\ln(1-u)]$$
  • Semi-bounded:
    • Weibull
      $$Q(u)=\lambda[-\ln(1-u)]^{1/k}$$
    • Gumbel distributions
      $$Q(u)=\mu-\beta\ln(-\ln(u))$$
    • Exponential (simplest QF, but perhaps most difficult to invert)
      $$Q(u)=-(1/\lambda)\ln(1-u)$$
  • Bounded:
    • Kumaraswamy distribution
      $$Q(u)=(1-(1-u)^{1/b})^{1/a}$$
      $$F(x)=1-(1-x^a)^b$$

All of these distributions have closed-form CDFs and QFs, so we always have an analytical solution to check against.
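
As a sketch of how such a pair could be used (the function names here are hypothetical, not existing Stan Math functions): root-find F(x) - u = 0 on (0, 1) and compare against the closed-form quantile.

#include <cmath>

// Kumaraswamy CDF and quantile from the formulas above.
double kumaraswamy_cdf(double x, double a, double b) {
  return 1.0 - std::pow(1.0 - std::pow(x, a), b);
}
double kumaraswamy_quantile(double u, double a, double b) {
  return std::pow(1.0 - std::pow(1.0 - u, 1.0 / b), 1.0 / a);
}
// A test would run the proposed root finder on f(x) = kumaraswamy_cdf(x, a, b) - u over
// (0, 1) and check the result against kumaraswamy_quantile(u, a, b) within tolerance;
// the gradients with respect to a and b can likewise be checked against the analytic Q.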
