Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cyl_bessel_j_zero does not respect policies::promote_double<false> #398

Closed
evanmiller opened this issue Jul 8, 2020 · 22 comments · Fixed by #405
Closed

cyl_bessel_j_zero does not respect policies::promote_double<false> #398

evanmiller opened this issue Jul 8, 2020 · 22 comments · Fixed by #405

Comments

@evanmiller
Copy link
Contributor

I'm auditing a code base to eliminate uses of long double on x86 (i.e. eliminate all x87 instructions). After compiling, I dump the disassembly and look for forbidden instructions and function calls.

I found that the following small test program - which I am led to believe by Boost documentation will not promote doubles to long doubles - includes calls to a number of long double functions when compiled.

#include <boost/math/special_functions/bessel.hpp>

int main() {
    typedef boost::math::policies::policy<
        boost::math::policies::promote_double<false>
        > numeric_policy;

    boost::math::cyl_bessel_j_zero((double)0.5, 1, numeric_policy());
}

A sampling of disassembly:

$  otool -t -V a.out | grep 'long double' | head -n 5
00000001000039a0	callq	__ZN5boost4math7lanczos19lanczos_initializerINS1_12lanczos17m64EeE4initC2Ev ## boost::math::lanczos::lanczos_initializer<boost::math::lanczos::lanczos17m64, long double>::init::init()
000000010000db3e	callq	__ZL4fabse ## fabs(long double)
000000010000db67	callq	__ZL3cose ## cos(long double)
000000010000db98	callq	__ZL5floore ## floor(long double)
000000010000dc5d	callq	__ZL3sine ## sin(long double)

Is this expected behavior, or a bug?

I realize that I can disable x87 with a compiler flag - but my compiler is actually crashing when I try that. So I am hoping to resolve the issue within Boost.

@NAThompson
Copy link
Collaborator

NAThompson commented Jul 8, 2020

The Lanczos one I recognize: It's required. There is an ill-conditioned filter computation that really does need that extra precision; see here. Looks like its also responsible for the call to abs.

@evanmiller
Copy link
Contributor Author

I am porting code to a platform without native 80-bit or 128-bit (i.e. 'long double' is equivalent to 'double').

Will the Lanczos code fail to converge on such a platform?

@NAThompson
Copy link
Collaborator

@evanmiller : It won't fail to converge, the filter coefficients will just not be as accurate. For digital filters, we generally want half-ulp accuracy precomputed once, under the assumption that applying the filter to data will dominate the execution time.

@evanmiller
Copy link
Contributor Author

@NAThompson Is it reasonable to ask that this coefficient computation respect the promote_double policy? It is important for my application that the same result is returned on all platforms, even if that result is not as accurate as it could be.

@NAThompson
Copy link
Collaborator

NAThompson commented Jul 8, 2020

There's almost zero hope you'll get the same result on all platforms, even if it does respect promote_double; see (for one of many reasons) here.

The semantics of promote_double are somewhat different than what is being done in the Lanczos filters. The promote_double gives the option for a user to obtain half-ulp accuracy in a computation, rather than being limited to a weaker accuracy governed by the condition number of function evaluation.

In the Lanczos filters, we're using argument promotion just to achieve accuracy, because the filter computation is very ill-conditioned.

@evanmiller
Copy link
Contributor Author

There's almost zero hope you'll get the same result on all platforms, even if it does respect promote_double; see (for one of many reasons) here.

Well, I don't need bit-for-bit compatibility. But if there are indeed significant accuracy issues introduced without 80-bit long double, I would like to have control over it for the sake of general cross-platform consistency.

It seems to me that because promote_double defaults to true, a user who sets false is indicating a desire not to use long doubles internally.

In any event, if the current algorithm only guarantees accuracy when 80-bit registers are present, then it appears I will need to find another algorithm.

@NAThompson
Copy link
Collaborator

NAThompson commented Jul 8, 2020

In any event, if the current algorithm only guarantees accuracy when 80-bit registers are present, then it appears I will need to find another algorithm.

I did a very deep lit search for this algorithm, and I couldn't find another one that had the same performance and capabilities. So if you do go find another where you can get the filter coeffs without an ill-conditioned step, I'd be very happy about it.

@evanmiller
Copy link
Contributor Author

I did a very deep lit search for this algorithm, and I couldn't find another one that had the same performance and capabilities. So if you do go find another where you can get the filter coeffs without an ill-conditioned step, I'd be very happy about it.

Well, I'm just looking for a pure 64-bit Bessel algorithm - I'm not trying to reinvent the Lanczos wheel. It could be something as simple as replacing Boost's lgamma with the system's lgamma.

I will need to investigate more to see exactly where the long double dependencies are for the functions that I'm calling.

@NAThompson
Copy link
Collaborator

@evanmiller : Actually I think we're talking about a different Lanczos algorithm. I'm pretty sure there are no filter coefficients needed there. Probably @jzmaddock will know what the Bessel function is doing.

@jzmaddock
Copy link
Collaborator

This does look like a bug to me, @NAThompson the code in question is my old lanczos gamma function approximation, and it should be using a double precision approximation in this case IMO. @evanmiller can you get a call stack for where the long double calls occur?

@evanmiller
Copy link
Contributor Author

@jzmaddock I don't have a stack trace handy; I'm just disassembling a static binary right now.

I think I hit some Lanczos code through this code path earlier today though:

https://github.com/boostorg/math/blob/develop/include/boost/math/special_functions/detail/bessel_ik.hpp#L63

I can post suspicious disassembly here as I come across it. It looks like there is a similar issue affecting the beta function.

@evanmiller
Copy link
Contributor Author

A long double of cos_pi is being compiled into the original test program. It looks like there are a number of places where cos_pi is not being passed a policy:

$ grep -rn cos_pi .
./special_functions/detail/bessel_jy.hpp:409:            T cv = cos_pi(mod_v);
./special_functions/detail/bessel_jy_derivatives_series.hpp:196:      prefix = boost::math::tgamma(-v, pol) * boost::math::cos_pi(v) * p / boost::math::constants::pi<T>();
./special_functions/detail/bessel_jy_derivatives_asym.hpp:78:   const T ci = cos_pi(vd2shifted);
./special_functions/detail/bessel_jy_derivatives_asym.hpp:111:   const T ci = cos_pi(vd2shifted);
./special_functions/detail/bessel_jy_asym.hpp:83:   T ci = cos_pi(v / 2 + 0.25f);
./special_functions/detail/bessel_jy_asym.hpp:115:   T ci = cos_pi(v / 2 + 0.25f);
./special_functions/detail/bessel_jy_series.hpp:193:      prefix = boost::math::tgamma(-v, pol) * boost::math::cos_pi(v) * p / constants::pi<T>();

@evanmiller
Copy link
Contributor Author

cbrt is another source of inadvertent double promotion.

I'm making good progress eliminating the long doubles by passing policies around in more places - will open a PR later today.

@evanmiller
Copy link
Contributor Author

The factorial functions are another source of double promotion:

template <>
inline BOOST_MATH_CONSTEXPR_TABLE_FUNCTION double unchecked_factorial<double>(unsigned i BOOST_MATH_APPEND_EXPLICIT_TEMPLATE_TYPE_SPEC(double))
{
return static_cast<double>(boost::math::unchecked_factorial<long double>(i));
}
template <>
struct max_factorial<double>
{
BOOST_STATIC_CONSTANT(unsigned,
value = ::boost::math::max_factorial<long double>::value);
};

@evanmiller
Copy link
Contributor Author

A small demonstration program is actually just:

#include <boost/math/special_functions/beta.hpp>

int main() { }

Based on the disassembly, it looks like an 80-bit Lanczos approximator is getting force-instantiated before any user code is called at all:

___cxx_global_var_init:
0000000100002dd0        pushq   %rbp
0000000100002dd1        movq    %rsp, %rbp
0000000100002dd4        movq    0x1225(%rip), %rax ## literal pool symbol address: guard variable for boost::math::lanczos::lanczos_initializer<boost::math::lanczos::lanczos17m64, long double>::initializer
0000000100002ddb        cmpb    $0x0, (%rax)
0000000100002dde        jne     0x100002dfe
0000000100002de4        movq    0x123d(%rip), %rdi ## literal pool symbol address: boost::math::lanczos::lanczos_initializer<boost::math::lanczos::lanczos17m64, long double>::initializer
0000000100002deb        callq   boost::math::lanczos::lanczos_initializer<boost::math::lanczos::lanczos17m64, long double>::init::init() ## boost::math::lanczos::lanczos_initializer<boost::math::lanczos::lanczos17m64, long double>::init::init()
0000000100002df0        movq    0x1209(%rip), %rax ## literal pool symbol address: guard variable for boost::math::lanczos::lanczos_initializer<boost::math::lanczos::lanczos17m64, long double>::initializer
0000000100002df7        movq    $0x1, (%rax)
0000000100002dfe        popq    %rbp
0000000100002dff        retq

So it's possible that this Lanczos code is never actually hit in my test program.

@jzmaddock
Copy link
Collaborator

I see this, here's a test program for you (requires current develop):


#include <type_traits>
#include <cmath>

#define BOOST_MATH_ASSERT_UNDEFINED_POLICY false
//
// Poison the long double std math functions so we can find accidental usage of these
// when the user has requested that we do *not* use them.
//
// THIS IS NOT STD CONFORMING!!!  Is there a better way?
//
namespace std {
      long double fabs(long double, void* = 0);
      long double abs(long double, void* = 0);
      long double sin(long double, void* = 0);
      long double cos(long double, void* = 0);
      long double tan(long double, void* = 0);
      long double asin(long double, void* = 0);
      long double acos(long double, void* = 0);
      long double atan(long double, void* = 0);
      long double exp(long double, void* = 0);
      long double log(long double, void* = 0);
      long double pow(long double, long double, void* = 0);
}
#define TEST_GROUP_5
#define TEST_GROUP_6
#include "compile_test/instantiate.hpp"

int main()
{
   //boost::math::foo(0.0L);
   instantiate(0.0);
}

The instantiation call stack gives the source of the error(s), the first of which is caused by:

template <>
inline float binomial_coefficient<float, policies::policy<> >(unsigned n, unsigned k, const policies::policy<>& pol)
{
   return policies::checked_narrowing_cast<float, policies::policy<> >(binomial_coefficient<double>(n, k, pol), "boost::math::binomial_coefficient<%1%>(unsigned,unsigned)");
}

Which should be using promote_double<false> in it's forwarding-policy.

Many thanks for looking into this!!

@evanmiller
Copy link
Contributor Author

Clever solution! Just so you know: frexp is another function popular with the long-double crowd - at least according to the disassembly I was examining.

A complete audit is outside the scope of what I'm doing right now, but I'll send along pull requests for any more of those double-long weeds that I come across. The first batch of proposed changes is in #399.

@evanmiller
Copy link
Contributor Author

Changing binomial_coefficient gets rid of the 80-bit Lanczos, hooray! Here is my altered version... is this what you had in mind?

template <>
inline float binomial_coefficient<float, policies::policy<> >(unsigned n, unsigned k, const policies::policy<>& pol)
{
    typedef policies::normalise<
        policies::policy<>,
        policies::promote_float<true>,
        policies::promote_double<false>,
        policies::discrete_quantile<>,
        policies::assert_undefined<> >::type forwarding_policy;
   return policies::checked_narrowing_cast<float, forwarding_policy>(binomial_coefficient<double>(n, k, forwarding_policy()), "boost::math::binomial_coefficient<%1%>(unsigned,unsigned)");
}

If so, I will add it to my pull request.

@jzmaddock
Copy link
Collaborator

is this what you had in mind?

Yes exactly so!

Once your PR is in I'll try and do a thorough trawl for other issues.

@evanmiller
Copy link
Contributor Author

Ok! I've committed the binomial_coefficient change. Once Travis passes, the PR will be ready for review.

The last function affecting my own code base is unchecked_factorial, as mentioned above. I tried rewriting it to be more generic (i.e. using a single type-independent array), but my template-fu was found wanting. So if your trawl could start there, I would appreciate it!

@evanmiller
Copy link
Contributor Author

With the PR merged, I am now down to two sources of long double with this cyl_bessel_j_zero function:

  • unchecked_factorial, as mentioned

  • make_big_value, as in (lots of these):

double const boost::math::tools::make_big_value<double>(long double, char const*, boost::integral_constant<bool, true> const&, boost::integral_constant<bool, false> const&)

I can't yet tell if I can expect identical behavior on 64-bit and 80-bit platforms. For example, make_big_value looks like it gets pulled in from bessel_j0, e.g.

static const T P1[] = {
static_cast<T>(BOOST_MATH_BIG_CONSTANT(T, 64, -4.1298668500990866786e+11)),
static_cast<T>(BOOST_MATH_BIG_CONSTANT(T, 64, 2.7282507878605942706e+10)),
static_cast<T>(BOOST_MATH_BIG_CONSTANT(T, 64, -6.2140700423540120665e+08)),
static_cast<T>(BOOST_MATH_BIG_CONSTANT(T, 64, 6.6302997904833794242e+06)),
static_cast<T>(BOOST_MATH_BIG_CONSTANT(T, 64, -3.6629814655107086448e+04)),
static_cast<T>(BOOST_MATH_BIG_CONSTANT(T, 64, 1.0344222815443188943e+02)),
static_cast<T>(BOOST_MATH_BIG_CONSTANT(T, 64, -1.2117036164593528341e-01))
};

I think that because a double is being returned on both platforms the behavior will be identical, but I am not 100% certain.

@evanmiller
Copy link
Contributor Author

This seems to eliminate the long doubles in make_big_value when compiling with -mlong-double-64:

diff --git a/include/boost/math/tools/big_constant.hpp b/include/boost/math/tools/big_constant.hpp
index aad7ea133..727f44790 100644
--- a/include/boost/math/tools/big_constant.hpp
+++ b/include/boost/math/tools/big_constant.hpp
@@ -33,9 +33,12 @@ struct numeric_traits<__float128>
    static const int max_exponent = 16384;
    static const bool is_specialized = true;
 };
-#else
+#elif LDBL_DIG > DBL_DIG
 typedef long double largest_float;
 #define BOOST_MATH_LARGEST_FLOAT_C(x) x##L
+#else
+typedef double largest_float;
+#define BOOST_MATH_LARGEST_FLOAT_C(x) x
 #endif

 template <class T>

NAThompson pushed a commit that referenced this issue Jul 18, 2020
Before this change, any function calling unchecked_factorial<double>
would generate long double code. This created a headache for users
attempting to eliminate long doubles from their compiled code.

A more elegant solution might use templates to create a single lookup
array for all major floating-point types, but the present solution
(adding an explicit double array) works just fine.

See #398.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants