[AArch64] Don't try to vectorize fixed point to fp narrowing conversion #130665

pranavk · 2025-03-10T20:31:15Z

GCC, correctly, doesn't vectorize in this case. Absence of direct instructions to convert larger fixed point to lower floating point precision inadvertently causes rounding leading to subtle differences across ISAs.

https://godbolt.org/z/ssEchMWrE

Co-authored by: @echristo

llvmbot · 2025-03-10T20:31:51Z

@llvm/pr-subscribers-backend-aarch64

Author: Pranav Kant (pranavk)

Changes

GCC, correctly, doesn't vectorize in this case. Absence of direct instructions to convert larger fixed point to lower floating point precision inadvertently causes rounding leading to subtle differences across ISAs.

https://godbolt.org/z/ssEchMWrE

Full diff: https://github.com/llvm/llvm-project/pull/130665.diff

1 Files Affected:

(modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+5-13)

diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 9511206c0660a..8ba763df1360b 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -5095,19 +5095,11 @@ SDValue AArch64TargetLowering::LowerVectorINT_TO_FP(SDValue Op,
   uint64_t VTSize = VT.getFixedSizeInBits();
   uint64_t InVTSize = InVT.getFixedSizeInBits();
   if (VTSize < InVTSize) {
-    MVT CastVT =
-        MVT::getVectorVT(MVT::getFloatingPointVT(InVT.getScalarSizeInBits()),
-                         InVT.getVectorNumElements());
-    if (IsStrict) {
-      In = DAG.getNode(Opc, dl, {CastVT, MVT::Other},
-                       {Op.getOperand(0), In});
-      return DAG.getNode(ISD::STRICT_FP_ROUND, dl, {VT, MVT::Other},
-                         {In.getValue(1), In.getValue(0),
-                          DAG.getIntPtrConstant(0, dl, /*isTarget=*/true)});
-    }
-    In = DAG.getNode(Opc, dl, CastVT, In);
-    return DAG.getNode(ISD::FP_ROUND, dl, VT, In,
-                       DAG.getIntPtrConstant(0, dl, /*isTarget=*/true));
+    // Due to the absence of any vector instructions to directly convert
+    // larger fixed point to lower floating point, we end up using intermediate
+    // representation before finally getting VTSize-d floating point. This extra
+    // rounding can lead to subtly incorrect results.
+    return SDValue();
   }
 
   if (VTSize > InVTSize) {

rnk · 2025-03-10T20:40:49Z

Test case? Maybe you have one but forgot to git add the new file.

davemgreen · 2025-03-10T20:57:44Z

There will be a lot of test cases that need updating. We should attempt to limit the damage this does, and I don't believe the default expansion is very efficient for unsigned operations.

echristo · 2025-03-10T21:00:49Z

There will be a lot of test cases that need updating. We should attempt to limit the damage this does, and I don't believe the default expansion is very efficient for unsigned operations.

I did notice that arguably turning off vectorization of this operation might be the most efficient way. It does leave the loop rather than unrolling it, but the code is quite a bit tighter and the scalar conversion is precise.

davemgreen · 2025-03-12T21:46:30Z

I did notice that arguably turning off vectorization of this operation might be the most efficient way. It does leave the loop rather than unrolling it, but the code is quite a bit tighter and the scalar conversion is precise.

Yeah - we should probably try to address both. There are two issues going on - no matter where the vector i64->f32 convert comes from (vectorization, intrinsics, optimizations, the front end) we should be producing correctly rounded code (*). This patch is the bug fix for that issue. Then there is the question of whether the vectorizer should be producing the vector operations. That will probably require a cost-model change to increase the costs (which might mean they are still used in loops with other vector operations, they will just be scalarized for that portion, it still might be profitable overall).

(* I was wondering if there was something that we could do with fast-math to make this OK. Especially as AFAICT it has been this way for 10+ years without anyone having an issue so far. None of the existing flags match exactly, but maybe contract or afn could be co-opted into it. (If fptrunc+fptrunc could be contracted, maybe fptrunc can be "uncontracted" into fptrunc+fptrunc). Something for a follow-on if we need it maybe, we might be able to use SVE instructions instead for some of them if the architecture feature is present).

If you can update all the failing test cases we can take a look at whether they need to change. I believe, for example, that i32->half conversions are always fine (https://alive2.llvm.org/ce/z/RXvYNv). That probably applies to higher int sizes too as the range or a half is relatively narrow. BFloats I would guess the same as float, unfortunately, but it would be good if we could prove one was right or wrong.

github-actions · 2025-03-18T00:28:37Z

✅ With the latest revision this PR passed the C/C++ code formatter.

pranavk · 2025-03-18T00:48:59Z

Thanks for taking a look.

I think you are right about ->half conversions. For now, I have made this conditional on target type being f32 to try to limit the impact, and also to avoid the backend crash for one of the tests (llvm/test/CodeGen/AArch64/sitofp-to-tbl.ll:sitofp_v8i8_to_v8f16).

The diffs look pretty inefficient as we guessed.

For the test that crashes, backend crashes because sitofp's operand in this test (v8i8) gets promoted to v8i32 instead of custom-lowered. So the place where I added SDValue is never encountered which in other cases leads to expansion of vector ops to scalar nodes. This leads of v8i32in the DAG by the time we call SelectionDAG::Legalize() which now encounters the return SDValue I introduced here which fails expansion and ends up trying to find an appropriate libcall for v4i32 -> v4f16 conversion which fails with stack trace below:

Assertion `LC != RTLIB::UNKNOWN_LIBCALL && "Unable to legalize as libcall"' failed
...
#10 0x000055b3966f8f03 (anonymous namespace)::SelectionDAGLegalize::ConvertNodeToLibcall(llvm::SDNode*) /usr/local/google/home/prka/p/llvm/llvm_main2/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:4800:21
#11 0x000055b3966e7e92 (anonymous namespace)::SelectionDAGLegalize::LegalizeOp(llvm::SDNode*) /usr/local/google/home/prka/p/llvm/llvm_main2/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:1353:7
#12 0x000055b3966e5f04 llvm::SelectionDAG::Legalize() /usr/local/google/home/prka/p/llvm/llvm_main2/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:5861:13
#13 0x000055b396891413 llvm::SelectionDAGISel::CodeGenAndEmitDAG() /usr/local/google/home/prka/p/llvm/llvm_main2/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1078:3
#14 0x000055b396890193 llvm::SelectionDAGISel::SelectBasicBlock(llvm::ilist_iterator_w_bits<llvm::ilist_detail::node_options<llvm::Instruction, true, false, void, true, llvm::BasicBlock>, false, true>, llvm::ilist_iterator_w_bits<llvm::ilist_detail::node_options<llvm::Instruction, true, false, void, true, llvm::BasicBlock>, false, true>, bool&) /usr/local/google/home/prka/p/llvm/llvm_main2/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:877:1
#15 0x000055b39688fd76 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) /usr/local/google/home/prka/p/llvm/llvm_main2/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:1903:11
#16 0x000055b39688cd99 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) /usr/local/google/home/prka/p/llvm/llvm_main2/llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp:614:3
#17 0x000055b39482de05 (anonymous namespace)::AArch64DAGToDAGISel::runOnMachineFunction(llvm::Mach

davemgreen

The unsigned variants are trying to do some sort of math to keep things in vectors, but it would be better to just scalarize as we have a scalar instruction and the number of vector lanes is only 2. Can we try and get it to scalarize instead?

davemgreen · 2025-03-18T12:10:34Z

llvm/test/CodeGen/AArch64/aarch64-int-to-fp-vectorize.ll

@@ -0,0 +1,89 @@
+; RUN: llc -o - %s | FileCheck %s


I think you can remove this test, given the others that change.

davemgreen · 2025-03-18T12:14:58Z

llvm/test/CodeGen/AArch64/complex-int-to-fp.ll

 ; CHECK: autogen_SD19655
 ; CHECK: scvtf
 ; CHECK: ret


Remove these old check lines

davemgreen · 2025-03-18T12:17:34Z

llvm/test/CodeGen/AArch64/fp16-v16-instructions.ll

-; CHECK-NEXT:    scvtf v1.2d, v7.2d
-; CHECK-NEXT:    fcvtn2 v4.4s, v5.2d
-; CHECK-NEXT:    fcvtn2 v2.4s, v3.2d
+; CHECK-NEXT:    fmov x12, d0


Is it possible to have this still work as before? It is converting to a fp16, but probably doing so in multiple steps so there is still a i64->fp32 convert.

I tried looking. If we want to do that, the check likely needs to be moved earlier before LowerVectorINT_TO_FP (and I am not sure there's any appropriate position for that earlier) as there doesn't seem to be any way to distinguish between instructions that are generated by the compiler (during split, promote, etc.) vs original sint_to_fp instruction once we reach LowerVectorINT_TO_FP.

Okay, I pattern-matched that and was able to exclude all such tests that eventually end up rounding to f16 as that doesn't exhibit rounding errors we are worried about. That did get rid of ~5 tests that remain unchanged with this PR now.

I will refactor the existing code in a while. Perhaps there are elegant ways to pattern match a thing like this.

After llvm#130665 these operations will be scalarized to avoid double-rounding. This updates the cost model to match. In the future we might be able to use SVE instructions to help, but for the moment the costs shoule be higher. Costsize and Latency costs are not yet expected to be accurate.

davemgreen

I've put together a patch in #132366 to update the costs. I will look into updating the GISel codegen too, if you are not looking into it. I am hoping it is just an update to the legalization descriptors for the operations.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

This is a GISel equivalent of llvm#130665, preventing a double-rounding issue in sitofp/uitofp by scalarizing i64->f32 converts. Most of the changes are made in the ActionDefinitionsBuilder for G_SITOFP/G_UITOFP. Because it is legal to convert i64->f16 itofp without double-rounding, but not a fpround i64->f16, that variant is lowered to build the two extends.

davemgreen

I updated the complex-int-to-fp.ll test to remove the older CHECK lines, to make sure they shouldn't cause a problem.

LGTM, so long as the tests pass. Thanks

pranavk · 2025-03-24T21:05:56Z

Rebased. If tests pass, I will merge this.

Thank you for reviewing and other patches (cost-model, and GISel changes).

After #130665 these operations are scalarized to avoid double-rounding. This updates the cost model to match. In the future we might be able to use SVE instructions to help, but for the moment the costs should be higher. Costsize and Latency costs are not yet expected to be accurate. The vector insert/extract will use the cost of VectorInsertExtractBaseCost (2 by default).

This is a GISel equivalent of llvm#130665, preventing a double-rounding issue in sitofp/uitofp by scalarizing i64->f32 converts. Most of the changes are made in the ActionDefinitionsBuilder for G_SITOFP/G_UITOFP. Because it is legal to convert i64->f16 itofp without double-rounding, but not a fpround i64->f16, that variant is lowered to build the two extends.

llvmbot added the backend:AArch64 label Mar 10, 2025

pranavk requested review from echristo and lntue March 10, 2025 20:31

davemgreen mentioned this pull request Mar 13, 2025

[AArch64][CostModel] Increase the cost of illegal SVE int-to-fp converts #130756

Merged

davemgreen reviewed Mar 18, 2025

View reviewed changes

pranavk force-pushed the sint_to_fp branch 2 times, most recently from 1dfeb67 to e6a91ff Compare March 20, 2025 21:36

davemgreen mentioned this pull request Mar 21, 2025

[AArch64] Update costs for scalarizing i64->f32 int_to_fp. #132366

Merged

davemgreen reviewed Mar 21, 2025

View reviewed changes

davemgreen mentioned this pull request Mar 24, 2025

[AArch64][GlobalISel] Fix lowering of i64->f32 itofp. #132703

Open

pranavk force-pushed the sint_to_fp branch from e6a91ff to 474e20a Compare March 24, 2025 17:52

davemgreen approved these changes Mar 24, 2025

View reviewed changes

pranavk added 2 commits March 24, 2025 20:49

init

11f1314

modify tests

04aafe4

pranavk force-pushed the sint_to_fp branch from 474e20a to 04aafe4 Compare March 24, 2025 20:50

pranavk merged commit 0919ab3 into llvm:main Mar 25, 2025
11 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AArch64] Don't try to vectorize fixed point to fp narrowing conversion #130665

[AArch64] Don't try to vectorize fixed point to fp narrowing conversion #130665

pranavk commented Mar 10, 2025 •

edited

Loading

llvmbot commented Mar 10, 2025

rnk commented Mar 10, 2025

davemgreen commented Mar 10, 2025

echristo commented Mar 10, 2025

davemgreen commented Mar 12, 2025

github-actions bot commented Mar 18, 2025 •

edited

Loading

pranavk commented Mar 18, 2025 •

edited

Loading

davemgreen left a comment

davemgreen Mar 18, 2025

davemgreen Mar 18, 2025

davemgreen Mar 18, 2025

pranavk Mar 20, 2025

pranavk Mar 20, 2025

pranavk Mar 20, 2025

davemgreen left a comment

davemgreen left a comment

pranavk commented Mar 24, 2025

[AArch64] Don't try to vectorize fixed point to fp narrowing conversion #130665

[AArch64] Don't try to vectorize fixed point to fp narrowing conversion #130665

Conversation

pranavk commented Mar 10, 2025 • edited Loading

llvmbot commented Mar 10, 2025

rnk commented Mar 10, 2025

davemgreen commented Mar 10, 2025

echristo commented Mar 10, 2025

davemgreen commented Mar 12, 2025

github-actions bot commented Mar 18, 2025 • edited Loading

pranavk commented Mar 18, 2025 • edited Loading

davemgreen left a comment

Choose a reason for hiding this comment

davemgreen Mar 18, 2025

Choose a reason for hiding this comment

davemgreen Mar 18, 2025

Choose a reason for hiding this comment

davemgreen Mar 18, 2025

Choose a reason for hiding this comment

pranavk Mar 20, 2025

Choose a reason for hiding this comment

pranavk Mar 20, 2025

Choose a reason for hiding this comment

pranavk Mar 20, 2025

Choose a reason for hiding this comment

davemgreen left a comment

Choose a reason for hiding this comment

davemgreen left a comment

Choose a reason for hiding this comment

pranavk commented Mar 24, 2025

pranavk commented Mar 10, 2025 •

edited

Loading

github-actions bot commented Mar 18, 2025 •

edited

Loading

pranavk commented Mar 18, 2025 •

edited

Loading