Some NaN issues for RISCV-64 #428

sampollard · 2022-03-18T22:59:07Z

Calling wrong absolute value for single-precision float in regression

While I was running CompCert's test suite for RISCV-64, I noticed a few
discrepancies between QEMU emulation and CompCert's interpreter. Here
are the differences from running the following:

./ccomp -interp -stdlib runtime/ -fall -quiet test/regression/NaNs.c > interp
cd test
make CCOMPOPTS='-static'
cd ..
qemu-riscv64 test/regression/NaNs.compcert > qemu
diff -u0 qemu interp

Gives

--- qemu	2022-03-18 21:43:44.472093160 +0000
+++ interp	2022-03-18 21:42:44.020907845 +0000
@@ -213 +213 @@
-single(snan(-9)) = 0x7fc00000
+single(snan(-9)) = 0xffc00000
@@ -248 +248 @@
-single(qnan(-1)) = 0x7fc00000
+single(qnan(-1)) = 0xffc00000
@@ -424,2 +424,2 @@
-double(snan(5)) = 0x7ff8000000000000
-abs(snan(5)) = 0x7fc00000
+double(snan(5)) = 0x7ff80000a0000000
+abs(snan(5)) = 0x7fc00005
@@ -459,2 +459,2 @@
-double(qnan(6)) = 0x7ff8000000000000
-abs(qnan(6)) = 0x7fc00000
+double(qnan(6)) = 0x7ff80000c0000000
+abs(qnan(6)) = 0x7fc00006
@@ -494,2 +494,2 @@
-double(snan(-9)) = 0x7ff8000000000000
-abs(snan(-9)) = 0x7fc00000
+double(snan(-9)) = 0xfff8000120000000
+abs(snan(-9)) = 0x7fc00009
@@ -529,2 +529,2 @@
-double(qnan(-1)) = 0x7ff8000000000000
-abs(qnan(-1)) = 0x7fc00000
+double(qnan(-1)) = 0xfff8000020000000
+abs(qnan(-1)) = 0x7fc00001

As it turns out, right here

CompCert/test/regression/NaNs.c

Line 92 in 9d3521b

    
           printf("abs(%s) = 0x%08x\n", valname[i], bits_of_single(__builtin_fabs(val[i])));

the regression test calls __builtin_fabs on a single-precision float
when I think the desire was for __builtin_fabsf. As it currently is
written, the assembly generated is:

        fcvt.d.s        f0, f10
        fabs.d  f0, f0
        fcvt.s.d        f10, f0

However, what I think we want is just the fabs.s f10 f10, which is
what is generated if you change Line 92 to __builtin_fabsf.
Making this changes fixes some issues but not others, specifically

% diff -u0 qemu-fabsf interp-fabsf
--- qemu-fabsf	2022-03-18 21:46:04.087521157 +0000
+++ interp-fabsf	2022-03-18 21:46:15.339555650 +0000
@@ -213 +213 @@
-single(snan(-9)) = 0x7fc00000
+single(snan(-9)) = 0xffc00000
@@ -248 +248 @@
-single(qnan(-1)) = 0x7fc00000
+single(qnan(-1)) = 0xffc00000
@@ -424 +424 @@
-double(snan(5)) = 0x7ff8000000000000
+double(snan(5)) = 0x7ff80000a0000000
@@ -459 +459 @@
-double(qnan(6)) = 0x7ff8000000000000
+double(qnan(6)) = 0x7ff80000c0000000
@@ -494 +494 @@
-double(snan(-9)) = 0x7ff8000000000000
+double(snan(-9)) = 0xfff8000120000000
@@ -529 +529 @@
-double(qnan(-1)) = 0x7ff8000000000000
+double(qnan(-1)) = 0xfff8000020000000

Conversion between float and double

However, these still do not match up. According to the IEEE 754-2019
standard, they are both valid treatments of NaNs. However, CompCert in
these conversion cases does propagate the sign and payload of NaNs
whereas QEMU always returns the canonical NaN (note that RISCV describes
a "canonical" NaN much more strictly than the IEEE 754 standard. For
IEEE 754, any NaN in binary format is canonical, the definition only
applies to decimal floats. For RISC-V,

Except when otherwise stated, if the result of a floating-point operation is NaN, it is the canonical NaN. The canonical NaN has a positive sign and all significand bits clear except the MSB, a.k.a. the quiet bit. For single-precision floating-point, this corresponds to the pattern 0x7fc00000
Implementors are free to provide a NaN payload propagation scheme as a nonstandard extension enabled by a nonstandard operating mode. However, the canonical NaN scheme described above must always be supported and should be the default mode.
Sec. 11.3 of the RISC-V manual (2019 revision).

I couldn't find any reference that denotes "otherwise noted" to apply to
floating-point conversions, specifically FCVT.S.D and FCVT.D.S. The
only ones which specify NaN propagation are the various loads, stores,
sign injections (FSGN*, FL*, FS*, FMV.*, FLW, and FSW, to be
precise).

What's the Point?

I believe that CompCert's handling of FCVT.S.D. and FCVT.D.S. are
incorrect. If I'm interpreting the RISC-V standard correctly, if passed
in a NaN they should return the canonical NaN.

Versions

OS: Ubuntu 20.04, Linux 23dc8915bd86 3.10.0-1160.42.2.el7.x86_64 #1 SMP Tue Aug 31 20:15:00 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
The most recent compcert version, commit 9d3521b

Configured with

./configure rv64-linux -toolprefix riscv64-linux-gnu- -clightgen
QEMU: qemu-riscv64 version 6.2.90 (v7.0.0-rc0-8-g1d60bb4b14)
GCC: riscv64-linux-gnu-gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0

The text was updated successfully, but these errors were encountered:

xavierleroy · 2022-03-21T12:33:02Z

the regression test calls __builtin_fabs on a single-precision float when I think the desire was for __builtin_fabsf

The test was probably written before __builtin_fabsf was introduced :-) You're right that it would be more useful to test __builtin_fabsf.

I noticed a few discrepancies between QEMU emulation and CompCert's interpreter.

The last time I checked, QEMU did not emulate the NaN payloads of the emulated architecture, and produced the NaN payloads of the processor running QEMU instead. (It makes sense, as NaN emulation adds run-time costs and is rarely useful.) That's why I'd rather test on real hardware. However, I don't have access to a RISC-V machine right now.

I believe that CompCert's handling of FCVT.S.D. and FCVT.D.S. are incorrect. If I'm interpreting the RISC-V standard correctly, if passed in a NaN they should return the canonical NaN.

That's a plausible reading of the RISC-V specs. In this respect, it would behaves differently from the other architectures supported by CompCert. This could certainly be accommodated in CompCert's formalization, at the cost of some code duplication. However I'd like to observe what real hardware does first.

sampollard · 2022-04-15T19:46:45Z

I was able to run a smaller version of the test on a SiFive Freedom U740 SoC. The snippet of relevant C is this:

  float x = single_of_bits(0xFF800009);
  double y = double_of_bits(0xFFF0000000000009);
  // This is a signaling NaN, by the IEEE 754-2019 standard, abs should
  // propagate the NaN payload, and so the bits should be 0x7F800009
  printf("snan(-9)             = 0x%08X = 0xFF800009\n", bits_of_single(x));
  printf("abs(snan(-9))        = 0x%08X = 0x7F800009\n", bits_of_single(__builtin_fabsf(x)));

  // This signaling nan should not be propagated because FCVT.S.D or FCVT.D.S
  // (type casting) is _not_ one of the operations which preserves NaN payload
  printf("FCVT.S.D snan(-9)    = 0x%08X = 0x7FC00000\n", bits_of_single((float) y));
  printf("FCVT.D.S snan(-9)    = 0x%016llX = 0x7FF8000000000000\n", bits_of_double((double) x));

compiled with ccomp -static nan.c, version 3.10. And running on the RISC-V core, we get the expected output:

./a.out
snan(-9)             = 0xFF800009 = 0xFF800009
abs(snan(-9))        = 0x7F800009 = 0x7F800009
FCVT.S.D snan(-9)    = 0x7FC00000 = 0x7FC00000
FCVT.D.S snan(-9)    = 0x7FF8000000000000 = 0x7FF8000000000000

However, when run with compcert's interpreter, we get

./ccomp -interp -quiet nan.c
snan(-9)             = 0xFF800009 = 0xFF800009
abs(snan(-9))        = 0x7F800009 = 0x7F800009
FCVT.S.D snan(-9)    = 0xFFC00000 = 0x7FC00000
FCVT.D.S snan(-9)    = 0xFFF8000120000000 = 0x7FF8000000000000

xavierleroy · 2022-04-18T17:26:10Z

Thanks for the testing and the confirmation. (I wish I had access to real RISC-V hardware, but so far every product from SiFive has been either unavailable or unreliable.) A fix is already in AbsInt's version of CompCert, and I'll backport it to the public version shortly.

xavierleroy · 2022-04-25T14:35:22Z

Should be fixed by commit 967c04d . Thanks again for all the feedback.

xavierleroy closed this as completed in 967c04d Apr 25, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Some NaN issues for RISCV-64 #428

Some NaN issues for RISCV-64 #428

sampollard commented Mar 18, 2022

xavierleroy commented Mar 21, 2022

sampollard commented Apr 15, 2022

xavierleroy commented Apr 18, 2022

xavierleroy commented Apr 25, 2022

Some NaN issues for RISCV-64 #428

Some NaN issues for RISCV-64 #428

Comments

sampollard commented Mar 18, 2022

Calling wrong absolute value for single-precision float in regression

Conversion between float and double

What's the Point?

Versions

xavierleroy commented Mar 21, 2022

sampollard commented Apr 15, 2022

xavierleroy commented Apr 18, 2022

xavierleroy commented Apr 25, 2022