Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some NaN issues for RISCV-64 #428

Closed
sampollard opened this issue Mar 18, 2022 · 4 comments
Closed

Some NaN issues for RISCV-64 #428

sampollard opened this issue Mar 18, 2022 · 4 comments

Comments

@sampollard
Copy link

Calling wrong absolute value for single-precision float in regression

While I was running CompCert's test suite for RISCV-64, I noticed a few
discrepancies between QEMU emulation and CompCert's interpreter. Here
are the differences from running the following:

./ccomp -interp -stdlib runtime/ -fall -quiet test/regression/NaNs.c > interp
cd test
make CCOMPOPTS='-static'
cd ..
qemu-riscv64 test/regression/NaNs.compcert > qemu
diff -u0 qemu interp

Gives

--- qemu	2022-03-18 21:43:44.472093160 +0000
+++ interp	2022-03-18 21:42:44.020907845 +0000
@@ -213 +213 @@
-single(snan(-9)) = 0x7fc00000
+single(snan(-9)) = 0xffc00000
@@ -248 +248 @@
-single(qnan(-1)) = 0x7fc00000
+single(qnan(-1)) = 0xffc00000
@@ -424,2 +424,2 @@
-double(snan(5)) = 0x7ff8000000000000
-abs(snan(5)) = 0x7fc00000
+double(snan(5)) = 0x7ff80000a0000000
+abs(snan(5)) = 0x7fc00005
@@ -459,2 +459,2 @@
-double(qnan(6)) = 0x7ff8000000000000
-abs(qnan(6)) = 0x7fc00000
+double(qnan(6)) = 0x7ff80000c0000000
+abs(qnan(6)) = 0x7fc00006
@@ -494,2 +494,2 @@
-double(snan(-9)) = 0x7ff8000000000000
-abs(snan(-9)) = 0x7fc00000
+double(snan(-9)) = 0xfff8000120000000
+abs(snan(-9)) = 0x7fc00009
@@ -529,2 +529,2 @@
-double(qnan(-1)) = 0x7ff8000000000000
-abs(qnan(-1)) = 0x7fc00000
+double(qnan(-1)) = 0xfff8000020000000
+abs(qnan(-1)) = 0x7fc00001

As it turns out, right here

printf("abs(%s) = 0x%08x\n", valname[i], bits_of_single(__builtin_fabs(val[i])));

the regression test calls __builtin_fabs on a single-precision float
when I think the desire was for __builtin_fabsf. As it currently is
written, the assembly generated is:

        fcvt.d.s        f0, f10
        fabs.d  f0, f0
        fcvt.s.d        f10, f0

However, what I think we want is just the fabs.s f10 f10, which is
what is generated if you change Line 92 to __builtin_fabsf.
Making this changes fixes some issues but not others, specifically

% diff -u0 qemu-fabsf interp-fabsf
--- qemu-fabsf	2022-03-18 21:46:04.087521157 +0000
+++ interp-fabsf	2022-03-18 21:46:15.339555650 +0000
@@ -213 +213 @@
-single(snan(-9)) = 0x7fc00000
+single(snan(-9)) = 0xffc00000
@@ -248 +248 @@
-single(qnan(-1)) = 0x7fc00000
+single(qnan(-1)) = 0xffc00000
@@ -424 +424 @@
-double(snan(5)) = 0x7ff8000000000000
+double(snan(5)) = 0x7ff80000a0000000
@@ -459 +459 @@
-double(qnan(6)) = 0x7ff8000000000000
+double(qnan(6)) = 0x7ff80000c0000000
@@ -494 +494 @@
-double(snan(-9)) = 0x7ff8000000000000
+double(snan(-9)) = 0xfff8000120000000
@@ -529 +529 @@
-double(qnan(-1)) = 0x7ff8000000000000
+double(qnan(-1)) = 0xfff8000020000000

Conversion between float and double

However, these still do not match up. According to the IEEE 754-2019
standard, they are both valid treatments of NaNs. However, CompCert in
these conversion cases does propagate the sign and payload of NaNs
whereas QEMU always returns the canonical NaN (note that RISCV describes
a "canonical" NaN much more strictly than the IEEE 754 standard. For
IEEE 754, any NaN in binary format is canonical, the definition only
applies to decimal floats. For RISC-V,

Except when otherwise stated, if the result of a floating-point operation is NaN, it is the canonical NaN. The canonical NaN has a positive sign and all significand bits clear except the MSB, a.k.a. the quiet bit. For single-precision floating-point, this corresponds to the pattern 0x7fc00000
Implementors are free to provide a NaN payload propagation scheme as a nonstandard extension enabled by a nonstandard operating mode. However, the canonical NaN scheme described above must always be supported and should be the default mode.
Sec. 11.3 of the RISC-V manual (2019 revision).

I couldn't find any reference that denotes "otherwise noted" to apply to
floating-point conversions, specifically FCVT.S.D and FCVT.D.S. The
only ones which specify NaN propagation are the various loads, stores,
sign injections (FSGN*, FL*, FS*, FMV.*, FLW, and FSW, to be
precise).

What's the Point?

I believe that CompCert's handling of FCVT.S.D. and FCVT.D.S. are
incorrect. If I'm interpreting the RISC-V standard correctly, if passed
in a NaN they should return the canonical NaN.

Versions

  • OS: Ubuntu 20.04, Linux 23dc8915bd86 3.10.0-1160.42.2.el7.x86_64 #1 SMP Tue Aug 31 20:15:00 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

  • The most recent compcert version, commit 9d3521b

    Configured with

    ./configure rv64-linux -toolprefix riscv64-linux-gnu- -clightgen

  • QEMU: qemu-riscv64 version 6.2.90 (v7.0.0-rc0-8-g1d60bb4b14)

  • GCC: riscv64-linux-gnu-gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0

@xavierleroy
Copy link
Contributor

the regression test calls __builtin_fabs on a single-precision float when I think the desire was for __builtin_fabsf

The test was probably written before __builtin_fabsf was introduced :-) You're right that it would be more useful to test __builtin_fabsf.

I noticed a few discrepancies between QEMU emulation and CompCert's interpreter.

The last time I checked, QEMU did not emulate the NaN payloads of the emulated architecture, and produced the NaN payloads of the processor running QEMU instead. (It makes sense, as NaN emulation adds run-time costs and is rarely useful.) That's why I'd rather test on real hardware. However, I don't have access to a RISC-V machine right now.

I believe that CompCert's handling of FCVT.S.D. and FCVT.D.S. are incorrect. If I'm interpreting the RISC-V standard correctly, if passed in a NaN they should return the canonical NaN.

That's a plausible reading of the RISC-V specs. In this respect, it would behaves differently from the other architectures supported by CompCert. This could certainly be accommodated in CompCert's formalization, at the cost of some code duplication. However I'd like to observe what real hardware does first.

@sampollard
Copy link
Author

I was able to run a smaller version of the test on a SiFive Freedom U740 SoC. The snippet of relevant C is this:

  float x = single_of_bits(0xFF800009);
  double y = double_of_bits(0xFFF0000000000009);
  // This is a signaling NaN, by the IEEE 754-2019 standard, abs should
  // propagate the NaN payload, and so the bits should be 0x7F800009
  printf("snan(-9)             = 0x%08X = 0xFF800009\n", bits_of_single(x));
  printf("abs(snan(-9))        = 0x%08X = 0x7F800009\n", bits_of_single(__builtin_fabsf(x)));

  // This signaling nan should not be propagated because FCVT.S.D or FCVT.D.S
  // (type casting) is _not_ one of the operations which preserves NaN payload
  printf("FCVT.S.D snan(-9)    = 0x%08X = 0x7FC00000\n", bits_of_single((float) y));
  printf("FCVT.D.S snan(-9)    = 0x%016llX = 0x7FF8000000000000\n", bits_of_double((double) x));

compiled with ccomp -static nan.c, version 3.10. And running on the RISC-V core, we get the expected output:

./a.out
snan(-9)             = 0xFF800009 = 0xFF800009
abs(snan(-9))        = 0x7F800009 = 0x7F800009
FCVT.S.D snan(-9)    = 0x7FC00000 = 0x7FC00000
FCVT.D.S snan(-9)    = 0x7FF8000000000000 = 0x7FF8000000000000

However, when run with compcert's interpreter, we get

./ccomp -interp -quiet nan.c
snan(-9)             = 0xFF800009 = 0xFF800009
abs(snan(-9))        = 0x7F800009 = 0x7F800009
FCVT.S.D snan(-9)    = 0xFFC00000 = 0x7FC00000
FCVT.D.S snan(-9)    = 0xFFF8000120000000 = 0x7FF8000000000000

@xavierleroy
Copy link
Contributor

Thanks for the testing and the confirmation. (I wish I had access to real RISC-V hardware, but so far every product from SiFive has been either unavailable or unreliable.) A fix is already in AbsInt's version of CompCert, and I'll backport it to the public version shortly.

@xavierleroy
Copy link
Contributor

Should be fixed by commit 967c04d . Thanks again for all the feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants