riscv_fpu: IEEE 754 rounding and exception conformance for the scalar FPU #239

Open: SolAstrius wants to merge 9 commits into LekKit:staging from pufit:fix/fpu-gaps

@SolAstrius commented Jun 19, 2026

Summary

Bring RVVM's scalar F/D floating-point rounding and exception behaviour up to IEEE 754 / RISC-V conformance.

This began as the #204 RMM (roundTiesToAway) fix and grew to close the rounding and flag gaps the conformance suite exposed.

What's fixed

  • RMM rounding computed via exact error-free transforms, and the instruction's static rm field honoured (not just frm) for all arithmetic ops.
  • FP-op dispatch no longer size-optimised (func_opt_size dropped) — roughly 2x interpreter FP throughput at zero size cost, since the FP path is not JIT'd.
  • The FMA family: rounding mode honoured (including the fnmadd operand-negation fix), RMM exact ties via an error-free FMA, and the underflow flag set by true IEEE after-rounding tininess.
  • fmul ties-away near the underflow boundary, where the Dekker product error itself underflows.
  • sqrt(-0.0) returns -0.0 without raising invalid.
  • A spurious INEXACT flag leaked by integer rounding, removed.
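
As an illustration of the error-free-transform idea in the first bullet, here is a minimal sketch of ties-away addition built on Knuth's TwoSum. It is not the code from this PR: it uses raw host `float` arithmetic (which the PR deliberately avoids), assumes the host is in round-to-nearest-even with no excess precision and no fast-math, ignores intermediate overflow inside TwoSum, and the names `rmm_add_f32` and `next_away_from_zero_f32` are hypothetical.

```c
#include <math.h>     /* isfinite (macro) */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Next float with larger magnitude; valid for finite, nonzero x. */
static float next_away_from_zero_f32(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    u += 1; /* the magnitude bits order like integers */
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Hypothetical helper: a + b rounded to nearest, ties away from zero (RMM). */
static float rmm_add_f32(float a, float b)
{
    float s = a + b; /* host result, assumed round-to-nearest-even */
    if (!isfinite(s) || s == 0.0f) {
        return s;
    }
    /* TwoSum: a + b == s + e exactly (barring intermediate overflow). */
    float bb = s - a;
    float e = (a - (s - bb)) + (b - bb);
    if (e == 0.0f) {
        return s; /* the sum was exact */
    }
    /* RNE and RMM differ only on a tie that RNE resolved toward zero. */
    bool error_points_away = (e > 0.0f) == (s > 0.0f);
    if (!error_points_away) {
        return s;
    }
    float n = next_away_from_zero_f32(s);
    /* Tie iff the exact value s + e is the midpoint of s and n.
     * Both n - s and e + e are exact here. */
    return (n - s == e + e) ? n : s;
}
```

For example, `rmm_add_f32(1.0f, 0x1p-24f)` lands exactly halfway between 1.0 and the next float up. Round-to-nearest-even returns 1.0, while the sketch returns `0x1.000002p+0f`.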

Everything stays within the existing fpu_lib wrapper discipline — no raw host FP, no host-fenv-only tricks — so it remains compatible with a software-fenv / jittable softfloat path.

Validation

  • MPFR-oracle bare-metal harness (rvvm-hal examples/rmm-test): 43652 passed, 0 failed over the five OP-FP ops and the four FMA ops, each under both dynamic frm=RMM and the static ,rmm suffix, including subnormal and underflow-boundary ties.
  • Together with the canonical-NaN changes (riscv_fpu: canonicalize NaN results and mal-boxed narrow operands #240), riscv-arch-test (ACT4, Spike reference) F/D/I/M reaches 260/260.

Notes

Commits are split one-fix-per-commit for review. No functional change outside src/cpu/riscv_fpu.{c,h} and src/util/fpu_lib.{c,h}.

@SolAstrius

@purplesyringa please help us with your wisdom 🙏

@SolAstrius changed the title from "riscv_fpu: IEEE 754 conformance for the scalar FPU" to "riscv_fpu: IEEE 754 rounding and exception conformance for the scalar FPU" on Jun 21, 2026
Comment thread src/cpu/riscv_fpu.c
if (unlikely(rmm || (rm != 0x07 && eff != frm))) {
const uint32_t host = fpu_get_rounding_mode();
fpu_set_rounding_mode(rmm ? FPU_LIB_ROUND_NE : eff);
riscv_emulate_f_opc_op_impl(vm, insn, rmm);

@purplesyringa commented Jun 21, 2026

Does it actually make sense to handle RMM in riscv_fpu? This feels like something that is better integrated into fpu_lib, where the rest of the math-heavy logic lives. This arbitrary separation confused me.

Comment thread src/util/fpu_lib.h
}
if (fpu_is_negative32(f)) {
// Raise invalid flag, return canonical NaN
if (fpu_is_negative32(f) && (fpu_bit_f32_to_u32(f) << 1) != 0) {

Maybe add a #define FPU_LIB_FPxx_NEGATIVE_ZERO to fpu_lib to make the intent of this comparison clearer?
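
To make the suggestion concrete, here is a small sketch comparing the shift-based test from the diff with a named-constant version. The macro name `FPU_LIB_FP32_NEGATIVE_ZERO` is hypothetical (it instantiates the `FPxx` placeholder above for binary32), and the helpers work on host `float` bit patterns rather than on fpu_lib's own types. Both variants treat a NaN with the sign bit set as "negative nonzero", so this assumes NaNs are filtered out before the check.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constant: the bit pattern of -0.0 in binary32. */
#define FPU_LIB_FP32_NEGATIVE_ZERO 0x80000000U

static uint32_t f32_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Shift form, as in the diff: sign bit set, and any other bit set too. */
static bool is_negative_nonzero_shift(float f)
{
    uint32_t bits = f32_to_bits(f);
    return (bits >> 31) != 0 && (bits << 1) != 0;
}

/* Named form: sign bit set, and the value is not exactly -0.0. */
static bool is_negative_nonzero_named(float f)
{
    uint32_t bits = f32_to_bits(f);
    return (bits >> 31) != 0 && bits != FPU_LIB_FP32_NEGATIVE_ZERO;
}
```

The two predicates agree on every input; the named form only states the intent (exclude -0.0 from the invalid-operation path) more directly.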

Comment thread src/util/fpu_lib.c
// unchanged: the round-to-nearest path below adds +/-0.5, which is inexact once
// 0.5 underflows a large value's ULP and would raise a spurious INEXACT -- e.g.
// on an out-of-range fcvt-to-int, leaving NX wrongly set alongside NV.
if (likely(!fpu_is_fractional32(f))) {

I'm not sure what the intention of this if is: if it's just to avoid raising wrong exceptions, you're already preventing that with the fpu_get_exception logic below. I don't necessarily think adding a fast path here is wrong, but it seems out of scope unless I'm missing something.

Signed-off-by: Sol Astrius Phoenix <sol@astrius.ink>
…low boundary

Comment thread src/util/fpu_lib.c
break;
}
return f;
fpu_set_exceptions(exc);

Have you benchmarked this? If this is a measurable slow-down, here's an alternative to the "remove fractional if" advice: maybe look at fpu_exponent32(f) to detect which group it belongs to and choose the path appropriately, so that exceptions are never raised:

  • f is large enough that it's already necessarily whole and thus doesn't need rounding.
  • f is in a range where adding 0.5 is always exact.
  • f is small enough that its rounded value can be clearly determined to be 0 (or maybe 1, depending on the rounding mode; I trust you can check this case-by-case yourself).

Hmm, never mind, I don't think there's actually necessarily a range with exact addition like that. e.g. adding 0.1 to 1111.10001010101 (in binary) will cause an inexact result even though the exponent is quite small.

If exceptions are slow, maybe this is a good reason to manipulate the mantissa manually? We have some similar code in fpu_is_fractional32, and it doesn't seem so complex as to be clearly worse than setting exceptions.

Alternatively, maybe always rounding to zero and making the rounding decision based on the fractional part is a good idea.
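
A rough sketch of that last idea: truncate toward zero by masking the mantissa, then decide whether to step one integer away from zero by looking at the discarded fraction. This is only an illustration on host `float` bit patterns, not fpu_lib code; `round_to_integral_f32`, the `rv_rm` enum, and the `inexact` out-parameter are hypothetical names. No floating-point arithmetic is performed, so no host exception flag can be raised by accident.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* RISC-V rm encodings for the five rounding modes. */
enum rv_rm { RM_RNE = 0, RM_RTZ = 1, RM_RDN = 2, RM_RUP = 3, RM_RMM = 4 };

static uint32_t f32_to_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_to_f32(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Round f to an integral float; *inexact reports whether a fraction was dropped. */
static float round_to_integral_f32(float f, enum rv_rm mode, bool *inexact)
{
    const uint32_t bits = f32_to_bits(f);
    const uint32_t sign = bits & 0x80000000U;
    const uint32_t mag = bits & 0x7FFFFFFFU;
    const int exp = (int)(mag >> 23) - 127; /* unbiased exponent */
    uint32_t trunc, one;
    bool at_half, above_half, odd, up;

    *inexact = false;
    if (exp >= 23 || mag == 0) {
        return f; /* already integral, infinite, NaN, or zero */
    }
    if (exp < 0) {
        /* 0 < |f| < 1: truncates to zero, one step away is 1.0 */
        trunc = 0;
        one = 0x3F800000U;
        at_half = mag == 0x3F000000U;
        above_half = mag > 0x3F000000U;
        odd = false;
    } else {
        const uint32_t frac_mask = 0x007FFFFFU >> exp;
        const uint32_t frac = mag & frac_mask;
        if (frac == 0) {
            return f; /* no fractional bits set */
        }
        trunc = mag & ~frac_mask;
        one = frac_mask + 1; /* weight of the units bit; carries into the exponent */
        at_half = frac == (one >> 1);
        above_half = frac > (one >> 1);
        odd = exp == 0 || ((trunc >> (23 - exp)) & 1U) != 0;
    }
    switch (mode) {
    case RM_RTZ: up = false; break;
    case RM_RDN: up = sign != 0; break;
    case RM_RUP: up = sign == 0; break;
    case RM_RMM: up = at_half || above_half; break;
    default:     up = above_half || (at_half && odd); break; /* RNE */
    }
    *inexact = true;
    return bits_to_f32(sign | (up ? trunc + one : trunc));
}
```

The sign is reattached at the end, so a negative input that rounds to zero yields -0.0, and the caller decides what to do with `inexact` instead of relying on the host's flags.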

Comment thread src/util/fpu_lib.c
case FPU_LIB_ROUND_NE:
case FPU_LIB_ROUND_MM:
return fpu_add32(f, fpu_bit_u32_to_f32(0x3F000000U | s));
r = fpu_add32(f, fpu_bit_u32_to_f32(0x3F000000U | s));

@purplesyringa commented Jun 21, 2026

Now that I think about it, I don't think this logic actually ever worked? If this addition is inexact, then it absolutely can round in the wrong direction. Consider f = prev(0.5), where prev denotes the float just before 0.5. Then:

  • f = 0.0111111111111111111111111_2
  • 0.5 = 0.1_2
  • f + 0.5 = 0.1111111111111111111111111_2, which is inexact, so if the rounding mode in the environment is round-to-nearest, this rounds to 1. But round(f) should've been 0, not 1.

Maybe you can fix this by temporarily changing the active rounding mode to flooring, but at this point maybe inspecting the fractional part by hand is more efficient.
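
The counterexample can be checked directly on a host. The sketch below assumes IEEE 754 binary32 `float`, the default round-to-nearest-even mode, and no excess precision; `volatile` is there only to keep the addition from being rearranged, and the function names are made up for the demonstration.

```c
#include <stdint.h>
#include <string.h>

static float f32_from_bits(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* prev(0.5): the largest binary32 value strictly below 0.5. */
static float prev_half(void)
{
    return f32_from_bits(0x3EFFFFFFU);
}

/* The "add 0.5" step applied to prev(0.5) under round-to-nearest-even. */
static float prev_half_plus_half(void)
{
    volatile float f = prev_half();
    volatile float half = 0.5f;
    volatile float sum = f + half; /* exact value needs 25 significant bits */
    return sum;
}
```

On such a host `prev_half_plus_half()` returns exactly 1.0f, even though `prev_half()` is below 0.5 and should round to 0.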

Comment thread src/cpu/riscv_fpu.c
Comment on lines +570 to +571
* in RNE. funct3 == rm only carries a rounding mode on rounding-capable ops, so
* this never misfires on fsgnj/fcmp/fclass/fmv.

I'm not sure what this means? e.g. fsgnj.s has 0 in bits 12-14, so this code will read rm = 0 and temporarily switch the rounding mode to whatever 0 represents, unless I'm missing something. Shouldn't the mode change be gated to only run on specific instructions?
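
One possible shape for that gating, as a sketch: decide from the OP-FP funct5 field whether bits 12-14 are a rounding mode at all, and fall back to frm otherwise. The function names are hypothetical and this is not RVVM's decoder. It covers only the base F/D encodings under opcode 0x53 and does not reject the reserved rm values 5 and 6.

```c
#include <stdbool.h>
#include <stdint.h>

#define RV_OPCODE_OP_FP 0x53U
#define RV_RM_DYN       0x7U /* rm == 7 means "use frm" */

/* Assemble an OP-FP instruction word (used here only to build examples). */
static uint32_t rv_encode_op_fp(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                                uint32_t funct3, uint32_t rd)
{
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12)
         | (rd << 7) | RV_OPCODE_OP_FP;
}

/* True if funct3 of this OP-FP instruction is a rounding-mode field. */
static bool rv_op_fp_uses_rm(uint32_t insn)
{
    if ((insn & 0x7FU) != RV_OPCODE_OP_FP) {
        return false;
    }
    switch (insn >> 27) { /* funct5 */
    case 0x00: /* fadd */
    case 0x01: /* fsub */
    case 0x02: /* fmul */
    case 0x03: /* fdiv */
    case 0x0B: /* fsqrt */
    case 0x08: /* fcvt.s.d / fcvt.d.s */
    case 0x18: /* fcvt float -> integer */
    case 0x1A: /* fcvt integer -> float */
        return true;
    default:   /* fsgnj, fmin/fmax, fcmp, fmv, fclass: funct3 selects the op */
        return false;
    }
}

/* Rounding mode to use: the static rm if present and not DYN, else frm. */
static uint32_t rv_effective_rm(uint32_t insn, uint32_t frm)
{
    if (!rv_op_fp_uses_rm(insn)) {
        return frm;
    }
    const uint32_t rm = (insn >> 12) & 0x7U;
    return rm == RV_RM_DYN ? frm : rm;
}
```

With this gating, an fsgnj.s (funct7 0x10, funct3 0) never triggers a mode switch, while an fadd.s with a static rm of 0 still selects RNE regardless of frm.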

Comment thread src/cpu/riscv_fpu.c
* range and exact. Flag-isolated, as the residual machinery can raise spurious
* exceptions.
*/
static forceinline fpu_f32_t riscv_rmm_div_apply_f32(fpu_f32_t n, fpu_f32_t a, fpu_f32_t b)

nit: why is this named *_apply_* when the corresponding fixup functions for addition and multiplication don't have _apply in the middle?

Comment thread src/cpu/riscv_fpu.h
fpu_set_rounding_mode(host);
} else if (rm != 0x07 && eff != frm) {
// Static host-native mode that differs from frm.
const uint32_t host = fpu_get_rounding_mode();

This feels like a repetition of the logic in riscv_emulate_f_opc_op. Can the two be merged together?

@purplesyringa

LGTM re: sqrt(-0.0), fnmadd fix, dropping func_opt_size. NACK re: INEXACT in rounding, honoring rm. Will review the FMA UF logic tomorrow. I feel like RMM is a little too complex to review and should be a separate patch; there's plenty of complexity here without it.
