Skip to content

8346914: UB issue in scalbnA #25656

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

dholmes-ora
Copy link
Member

@dholmes-ora dholmes-ora commented Jun 5, 2025

This fixes address a problem with signed integer overflow in the C fdlibm scalbnA function.

Testing this code is extremely difficult. First, the only time this code will get executed is if intrinsics have been disabled by -XX:-InlineIntrinsics. Second, finding the math routines and the arguments thereto which actually reach this function is also difficult. I have found 3 tests only that hit the scalbnA function at the point where the potential overflow occurs, but beyond that I cannot determine what arguments will cause the different code paths to be taken. Consequently the only testing I could do here was to make a copy of the original scalbnA function and then place a check in the callers that the old and new code produced the same result. Again how much coverage this actually gave is not known. That test code still remains in the PR as the initial commit.

Due to the testing problem this test relies on detailed code inspection and analysis, so here are the changes and the reasoning for them:

    // Convert to unsigned to avoid signed integer overflow
[1] unsigned u_k = ((unsigned) k) + n;

[2] if (u_k > 0x7fe && u_k <= 0x7fffffff) return hugeX*copysignA(hugeX,x); /* overflow */
[3] if (u_k > 0 && u_k <= 0x7fe) { /* normal result */
[4]   set_high(&x, (hx&0x800fffff)|((k+n)<<20));
      return x;
    }

[5] if (u_k <= (unsigned)-54) {
      if (n > 50000) /* in case integer overflow in n+k */
        return hugeX*copysignA(hugeX,x); /*overflow*/
      else return tiny*copysignA(tiny,x); /*underflow*/
    }
[6] k = u_k + 54; /* subnormal result */
    set_high(&x, (hx&0x800fffff)|(k<<20));
    return x*twom54;

[1] We use an unsigned variable, u_k, for the potentially overflowing addition

[2] We check the value of u_k adjusting the bounds to emulate a signed-int range

[3] Again we check u_k and adjust the range

[4] We know k+n is in range so we use that directly. I didn't use u_k here because I didn't want to have to reason about whether the use of an unsigned type would change anything in the expression

[5] We check if u_k is logically less than what -54 would be

[6] We bring u_k back into positive range by adding 54 and then store safely into k

Thanks.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25656/head:pull/25656
$ git checkout pull/25656

Update a local copy of the PR:
$ git checkout pull/25656
$ git pull https://git.openjdk.org/jdk.git pull/25656/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 25656

View PR using the GUI difftool:
$ git pr show -t 25656

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25656.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Jun 5, 2025

👋 Welcome back dholmes! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Jun 5, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 5, 2025
@openjdk
Copy link

openjdk bot commented Jun 5, 2025

@dholmes-ora The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot hotspot-dev@openjdk.org label Jun 5, 2025
@mlbridge
Copy link

mlbridge bot commented Jun 5, 2025

Webrevs

Comment on lines +118 to 122
if (u_k > 0x7fe && u_k <= 0x7fffffff) return hugeX*copysignA(hugeX,x); /* overflow */
if (u_k > 0 && u_k <= 0x7fe) { /* normal result */
set_high(&x, (hx&0x800fffff)|((k+n)<<20));
return x;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious if this could be? It looks like this is what the ifs are doing. And it is at least easier for me to see that this is the same as the code before.

Suggested change
if (u_k > 0x7fe && u_k <= 0x7fffffff) return hugeX*copysignA(hugeX,x); /* overflow */
if (u_k > 0 && u_k <= 0x7fe) { /* normal result */
set_high(&x, (hx&0x800fffff)|((k+n)<<20));
return x;
}
if ((int)u_k > 0) {
if (u_k > 0x7fe) {
return hugeX*copysignA(hugeX,x); /* overflow */
}
set_high(&x, (hx&0x800fffff)|((k+n)<<20));
return x;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this change. It's easier to follow, at least for me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is equivalent, but I did not want to disrupt the general control flow of the algorithm so that it still closely resembles the original fdlibm code.

I also initially misread your suggestion and now I am wondering if I can actually simplify it to a simple two line change:

unsigned u_k = k + n; // avoid UB signed integer overflow
k = (int) u_k;  // safely assign to k

does that bypass any UB?

Copy link
Contributor

@jdksjolen jdksjolen Jun 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would it be OK to do (k+n)<<20 but UB to do k = k+n? The addition may still overflow, which would be UB. Edit: Duh, ignore this. I understand now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dholmes-ora , if k = (int) u_k; does not say to the compiler that it can assume that 0 <= u_k < 2**31 - 1, then this seems like a good changeset. Frankly, I would not trust myself in this matter, I find this very finicky.

Copy link
Member

@xmas92 xmas92 Jun 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially I thought it would be nice if we could have expressed this as simply k = wrapping_add(k, n);. But it would require implementing such a method correctly, which seems non-trivial. (Some compilers have intrinsics for this I think).

Might it is the case that casting the number here is also UB (if the unsigned number is to large). What we really need is a bit_cast (and use 2s complement, which I think is only guaranteed in C++20, but I think we already make assumptions about this, or we could use int32_t which at least should be 2s complement, not 100%).

But then again looking at this code we also read from none active members of a union, which I think is a compiler language extension. (Only valid in C).

If we keep the structure as it is, I would find the code more readable if it was u_k <= (unsigned)std::numeric_limits<int>::max()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might it is the case that casting the number here is also UB

The casting is implementation-defined until C++20, and all supported platforms define it as the "obvious"
two's-complement conversion.

INT_MAX is equivalent to, but less wordy than using std::numeric_limits<int>::max(). But this
doesn't matter with my suggested change to limit the scope of use of u_k.

return x;
}
if (k <= -54) {

if (u_k <= (unsigned)-54) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this just be? Or is this less clear / easier to miss?

Suggested change
if (u_k <= (unsigned)-54) {
if (u_k <= -54u) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me it's less clear and far easier to miss.
I'll vote for if (u_k <= (unsigned)-54) {.
Reading if (u_k <= -54u) { just cooks my brain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm suprised that is even valid TBH. It strikes me as a numerical oxymoron.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). —end note ]

So this is OK.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hard part for me is that when I see -54u , I need to make sure it's return type is unsigned, which is not immediately obvious to me. Reading what the standard say about the built-in unary minus operator I find: "The type of the result is the type of the promoted type of expression." Which means that the return type of -54u really is unsigned. But I do agree with David, it looks like a numerical oxymoron.

@theRealAph
Copy link
Contributor

Second, finding the math routines and the arguments thereto which actually reach this function is also difficult. I have found 3 tests only that hit the scalbnA function at the point where the potential overflow occurs, but beyond that I cannot determine what arguments will cause the different code paths to be taken. Consequently the only testing I could do here was to make a copy of the original scalbnA function and then place a check in the callers that the old and new code produced the same result. Again how much coverage this actually gave is not known. That test code still remains in the PR as the initial commit.

I'm somewhat perplexed by this. Should I conclude that you have never succeeded in triggering the UB, and that you do not know if UB ever occurs?

@dholmes-ora
Copy link
Member Author

I'm somewhat perplexed by this. Should I conclude that you have never succeeded in triggering the UB, and that you do not know if UB ever occurs?

Correct. The potential for UB was spotted by code inspection as Kim reported in the JBS issue. I was unable to determine what argument to what mathematical function (tan/sin/cos) would trigger the overflow paths. If there are any math gurus out there that would know, please speak up.

Copy link

@kimbarrett kimbarrett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've come up with a simpler solution. See detailed comments.

// Convert to unsigned to avoid signed integer overflow
unsigned u_k = ((unsigned) k) + n;

if (u_k > 0x7fe && u_k <= 0x7fffffff) return hugeX*copysignA(hugeX,x); /* overflow */

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think (unsigned)INT_MAX would be more explicit about what's going on.
This is also starting to push my limits on sufficiently simple to be a one-line if, and even more-so with my
suggested change.
I note that this isn't distinguishing between (1) n > 0 and k + n overflows and wraps around to negative
int vs (2) n < 0 and k + n is negative. And that makes later code (both pre-existing and changed)
harder to understand. I think better here would be u_k > 0x7fe && n > 0 => overflow, with some later
adjustments. Then, if the test fails and we're not huge, k = (int)u_k; and use k as before, dropping
u_k, so discarding the remainder of the currently proposed changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not seeing the full suggestion here. In the original code this line:

if (k > 0x7fe) return hugeX*copysignA(hugeX,x); /* overflow  */

is defining a logical overflow, not the wrapping overflow that we are trying to deal with. The wrapping overflow results in a negative value, which is the third case that gets handled. So AFAICS we need to use u_k all the way through until the end.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the diff for what I'm suggesting.

diff --git a/src/hotspot/share/runtime/sharedRuntimeMath.hpp b/src/hotspot/share/runtime/sharedRuntimeMath.hpp
index 91dda2a4fe8..321f3be580a 100644
--- a/src/hotspot/share/runtime/sharedRuntimeMath.hpp
+++ b/src/hotspot/share/runtime/sharedRuntimeMath.hpp
@@ -111,16 +111,23 @@ static double scalbnA(double x, int n) {
     if (n< -50000) return tiny*x;       /*underflow*/
   }
   if (k==0x7ff) return x+x;             /* NaN or Inf */
-  k = k+n;
-  if (k > 0x7fe) return hugeX*copysignA(hugeX,x); /* overflow  */
+  // If the high (sign) bit of u_k is set, then either
+  // n is positive and k+n would overflow, or
+  // n is negative and |n| > k.
+  unsigned u_k = (unsigned)k + (unsigned)n;
+  if ((u_k > 0x7fe) && (n > 0)) {
+    // Either (k+n > 0x7fe && k+n <= INT_MAX) or k+n would overflow.
+    return hugeX*copysignA(hugeX,x); /* overflow  */
+  }
+  // Set k to k+n, now that we know k+n wouldn't overflow.
+  // We know that k+n <= (int)0x7fe, and might be negative if n is negative.
+  k = (int)u_k;
   if (k > 0) {                          /* normal result */
     set_high(&x, (hx&0x800fffff)|(k<<20));
     return x;
   }
   if (k <= -54) {
-    if (n > 50000)      /* in case integer overflow in n+k */
-      return hugeX*copysignA(hugeX,x);  /*overflow*/
-    else return tiny*copysignA(tiny,x); /*underflow*/
+    return tiny*copysignA(tiny,x); /*underflow*/
   }
   k += 54;                              /* subnormal result */
   set_high(&x, (hx&0x800fffff)|(k<<20));

Copy link
Member Author

@dholmes-ora dholmes-ora Jun 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+  if ((u_k > 0x7fe) && (n > 0)) {
+    // Either (k+n > 0x7fe && k+n <= INT_MAX) or k+n would overflow.
+    return hugeX*copysignA(hugeX,x); /* overflow  */
+  }

But that is not correct - we should only take this "overflow" path for 0x7fe < k+n <= INT_MAX. Your suggestion makes us take this path if k+n overflows to negative. ??

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+ // We know that k+n <= (int)0x7fe, and might be negative if n is negative.

It can be negative if n is positive too.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that is not correct - we should only take this "overflow" path for
((k+n) > 0x7fe && (k+n) <= INT_MAX). Your suggestion makes us take this
path if (k+n) overflows to negative. ??

It is intentional that the new test is true for the (k+n) => overflow case.
It fully handles the overflow case, eliminating the need for the later fixup
of the case where ((k <= -54) && (n > 5000)) (though (n > 0) would work
just as well; I don't know why that 5000 was inserted). That fixup returned
the same hugeX-based result as here.

It can be negative if n is positive too.

(k+n) cannot be negative with n positive here, even under wrapping
semantics, because we can't get here in that case due to the prior overflow
detection.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay @kimbarrett I suggest that you take over this issue and PR. I do not have the knowledge of the code that you have and I cannot affirm that your statements are correct.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, my suggestion above isn't quite right. It doesn't properly handle subnormal x. I forgot that k (before adding n) can be negative in the subnormal case. So the correct test of u_k is not
((u_k > 0x7fe) && (n > 0))
but rather
((u_k > 0x7fe) && ((k|n) > 0))
I'll be putting more details in JBS.

Comment on lines +118 to 122
if (u_k > 0x7fe && u_k <= 0x7fffffff) return hugeX*copysignA(hugeX,x); /* overflow */
if (u_k > 0 && u_k <= 0x7fe) { /* normal result */
set_high(&x, (hx&0x800fffff)|((k+n)<<20));
return x;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might it is the case that casting the number here is also UB

The casting is implementation-defined until C++20, and all supported platforms define it as the "obvious"
two's-complement conversion.

INT_MAX is equivalent to, but less wordy than using std::numeric_limits<int>::max(). But this
doesn't matter with my suggested change to limit the scope of use of u_k.

return x;
}
if (k <= -54) {

if (u_k <= (unsigned)-54) {
if (n > 50000) /* in case integer overflow in n+k */
return hugeX*copysignA(hugeX,x); /*overflow*/
Copy link

@kimbarrett kimbarrett Jun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This case (n > 50000) isn't possible with my suggest change to limit the usage scope for u_k, and
can be deleted.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, as an alternative, there is a Java-only implementation of scalb (and supporting functionality) in java.lang.Math that could be ported to C as another way to avoid this issue.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

java.lang.Math.scalb() doesn't seem useful here. It just transforms the
scale factor into a double power of 2 and then multiplies. It's not clear that
would result in exactly the same result for all arguments as this. And this is
(mostly) avoiding doing a double multiply for (perceived) performance reasons.
(For all I know, the complexity here could swamp the cost of a double
multiply.) Being certain of keeping the same results (on edge cases) is the
only reason to stay with this (but fixed to remove UB), rather than just
switching to using the C/C++ library scalbn. (And I don't know that
there is any potential difference between scalbnA and scalbn that would
actually matter. That would require more analysis than I've had time to do.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand your comment here Kim but I am certainly not going to mess with the existing algorithm. All I am trying to do is avoid the case where a direct k+n would wrap to a negative value. I have no idea what the possible range of values for n might be or whether we have to account for underflow as well, but that seems a distinct issue to dealing with the potential UB overflow.

@kimbarrett
Copy link

The JBS issue also talks about copysignA and suggests we should just use copysign if we're keeping scalbnA.
Please either address that here or file a new issue for copysignA.

@kimbarrett
Copy link

For the record, I still think it would be better to just delete scalbnA (and
copysignA) and simply use the standard C scalbn. But I'm not going to
insist on that, given the narrow scope of use and the challenges involved in
figuring out whether that could result in any compatibility issue.

@kimbarrett
Copy link

use the standard C scalbn.

Long ago we couldn't use the standard C scalbn because (1) Visual Studio
didn't provide it at all, and (2) we were using C++98/03 / C89 for gcc/clang
and it was version conditionalized out as being a C99 function.

@dholmes-ora
Copy link
Member Author

Lots to process here. I'm away for a few days so will get back to this next week. Thanks @kimbarrett !

@dholmes-ora
Copy link
Member Author

The JBS issue also talks about copysignA and suggests we should just use copysign if we're keeping scalbnA. Please either address that here or file a new issue for copysignA.

It doesn't really suggest that it simply says "is there a reason to prefer copysignA over the <math.h> copysign? ". I don't have an answer to that any more than I can answer the scalbnA versus scalbn question. You need to a libmath expert to answer those types of questions. All I have tried to do here is address the UB that was spotted.

@kimbarrett
Copy link

The JBS issue also talks about copysignA and suggests we should just use copysign if we're keeping scalbnA. Please either address that here or file a new issue for copysignA.

It doesn't really suggest that it simply says "is there a reason to prefer copysignA over the <math.h> copysign? ". I don't have an answer to that any more than I can answer the scalbnA versus scalbn question. You need to a libmath expert to answer those types of questions. All I have tried to do here is address the UB that was spotted.

OK, I'll be more direct than I was in JBS.

We don't need copysignA. Just use copysign from <math.h>.

Rationale: Long ago we had our own copysign, because we couldn't get it from
<math.h>. It's a C99 function. For gcc/clang we were using C++98/03, which
only includes C89 library functions. So gcc/clang <math.h> version restricted
it out. And MSVC++ <math.h> didn't have it at all. Later, MSVC++ added
copysign, without any version restriction since they didn't do Standard
versions back then. This collided with ours, so we renamed ours. Later still
we switched to C++14, which includes C99 library functions, so it's no longer
version restricted by gcc/clang. So we no longer need our own, and should just
use the one from <math.h>.

Whether this is done under this issue or a new one, I don't really care, so long
as the issue isn't lost.

@kimbarrett
Copy link

Due to the testing problem...

Why not write a gtest for scalnbA?

@dholmes-ora
Copy link
Member Author

Due to the testing problem...

Why not write a gtest for scalnbA?

Quite simply because I have no idea what the code does, nor how to effectively test it. Even exhaustive testing of inputs comparing the old code and new is not possible given we are dealing with a double input type and double return.

@dholmes-ora
Copy link
Member Author

We don't need copysignA. Just use copysign from <math.h>.

Rationale: Long ago we had our own copysign, because we couldn't get it from
<math.h>.

Okay but do we know that what we have and what the math library provides are exactly the same?

@kimbarrett
Copy link

We don't need copysignA. Just use copysign from <math.h>.
Rationale: Long ago we had our own copysign, because we couldn't get it from
<math.h>.

Okay but do we know that what we have and what the math library provides are exactly the same?

Yes.

@kimbarrett kimbarrett mentioned this pull request Jun 20, 2025
3 tasks
@dholmes-ora dholmes-ora deleted the 8346914-scalbnA branch June 25, 2025 07:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-dev@openjdk.org rfr Pull request is ready for review
Development

Successfully merging this pull request may close these issues.

7 participants