[SelectionDAG][AArch64] Legalize power of 2 vector.[de]interleaveN #141513

Merged: 5 commits merged into llvm:main on Jun 3, 2025

Conversation

lukel97 (Contributor) commented May 26, 2025

After #139893, we now have [de]interleave intrinsics for factors 2-8 inclusive, with the plan to eventually get the loop vectorizer to emit a single intrinsic for these factors instead of recursively deinterleaving (to support scalable non-power-of-2 factors and to remove the complexity in the interleaved access pass).

AArch64 currently supports scalable interleaved groups of factors 2 and 4 from the loop vectorizer. For factor 4 this is currently emitted as a series of recursive [de]interleaves, and normally converted to a target intrinsic in the interleaved access pass.

However, if for some reason the interleaved access pass doesn't catch it, the [de]interleave4 intrinsic will need to be lowered by the backend.

This patch legalizes the node and any other power-of-2 factor to smaller factors, so if a target can lower [de]interleave2 it should be able to handle this without crashing.

Factor 3 will probably be more complicated to lower, so I've left it out for now. We can disable it in the AArch64 cost model when implementing the loop vectorizer changes.

llvmbot (Member) commented May 26, 2025

@llvm/pr-subscribers-llvm-selectiondag

@llvm/pr-subscribers-backend-aarch64

Author: Luke Lau (lukel97)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/141513.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+57)
  • (modified) llvm/test/CodeGen/AArch64/sve-vector-deinterleave.ll (+64-1)
  • (modified) llvm/test/CodeGen/AArch64/sve-vector-interleave.ll (+64)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 4dacd2273306e..08b9f098efb1e 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -29441,6 +29441,35 @@ AArch64TargetLowering::LowerVECTOR_DEINTERLEAVE(SDValue Op,
   EVT OpVT = Op.getValueType();
   assert(OpVT.isScalableVector() &&
          "Expected scalable vector in LowerVECTOR_DEINTERLEAVE.");
+  assert((Op->getNumOperands() == 2 || Op->getNumOperands() == 4) &&
+         "Expected factor to be 2 or 4.");
+
+  // Deinterleave 'ab cd ab cd' as a series of factor 2 deinterleaves.
+  if (Op.getNumOperands() == 4) {
+    SDVTList VTList = DAG.getVTList({OpVT, OpVT});
+    // ac ac
+    SDNode *LHS0 = DAG.getNode(ISD::VECTOR_DEINTERLEAVE, DL, VTList,
+                               Op.getOperand(0), Op.getOperand(1))
+                       .getNode();
+    // bd bd
+    SDNode *RHS0 = DAG.getNode(ISD::VECTOR_DEINTERLEAVE, DL, VTList,
+                               Op.getOperand(2), Op.getOperand(3))
+                       .getNode();
+    // aa cc
+    SDNode *LHS1 = DAG.getNode(ISD::VECTOR_DEINTERLEAVE, DL, VTList,
+                               SDValue(LHS0, 0), SDValue(RHS0, 0))
+                       .getNode();
+    // bb dd
+    SDNode *RHS1 = DAG.getNode(ISD::VECTOR_DEINTERLEAVE, DL, VTList,
+                               SDValue(LHS0, 1), SDValue(RHS0, 1))
+                       .getNode();
+
+    // aa bb cc dd
+    return DAG.getMergeValues({SDValue(LHS1, 0), SDValue(RHS1, 0),
+                               SDValue(LHS1, 1), SDValue(RHS1, 1)},
+                              DL);
+  }
+
   SDValue Even = DAG.getNode(AArch64ISD::UZP1, DL, OpVT, Op.getOperand(0),
                              Op.getOperand(1));
   SDValue Odd = DAG.getNode(AArch64ISD::UZP2, DL, OpVT, Op.getOperand(0),
@@ -29454,6 +29483,34 @@ SDValue AArch64TargetLowering::LowerVECTOR_INTERLEAVE(SDValue Op,
   EVT OpVT = Op.getValueType();
   assert(OpVT.isScalableVector() &&
          "Expected scalable vector in LowerVECTOR_INTERLEAVE.");
+  assert((Op->getNumOperands() == 2 || Op->getNumOperands() == 4) &&
+         "Expected factor to be 2 or 4.");
+
+  // Interleave 'aa bb cc dd' as a series of factor 2 interleaves.
+  if (Op.getNumOperands() == 4) {
+    SDVTList VTList = DAG.getVTList({OpVT, OpVT});
+    // ac ac
+    SDNode *LHS0 = DAG.getNode(ISD::VECTOR_INTERLEAVE, DL, VTList,
+                               Op.getOperand(0), Op.getOperand(2))
+                       .getNode();
+    // bd bd
+    SDNode *RHS0 = DAG.getNode(ISD::VECTOR_INTERLEAVE, DL, VTList,
+                               Op.getOperand(1), Op.getOperand(3))
+                       .getNode();
+    // ab cd
+    SDNode *LHS1 = DAG.getNode(ISD::VECTOR_INTERLEAVE, DL, VTList,
+                               SDValue(LHS0, 0), SDValue(RHS0, 0))
+                       .getNode();
+    // ab cd
+    SDNode *RHS1 = DAG.getNode(ISD::VECTOR_INTERLEAVE, DL, VTList,
+                               SDValue(LHS0, 1), SDValue(RHS0, 1))
+                       .getNode();
+
+    // ab cd ab cd
+    return DAG.getMergeValues({SDValue(LHS1, 0), SDValue(LHS1, 1),
+                               SDValue(RHS1, 0), SDValue(RHS1, 1)},
+                              DL);
+  }
 
   SDValue Lo = DAG.getNode(AArch64ISD::ZIP1, DL, OpVT, Op.getOperand(0),
                            Op.getOperand(1));
diff --git a/llvm/test/CodeGen/AArch64/sve-vector-deinterleave.ll b/llvm/test/CodeGen/AArch64/sve-vector-deinterleave.ll
index adf1b48b6998a..9a871e20b4b09 100644
--- a/llvm/test/CodeGen/AArch64/sve-vector-deinterleave.ll
+++ b/llvm/test/CodeGen/AArch64/sve-vector-deinterleave.ll
@@ -151,6 +151,70 @@ define {<vscale x 2 x i64>, <vscale x 2 x i64>} @vector_deinterleave_nxv2i64_nxv
   ret {<vscale x 2 x i64>, <vscale x 2 x i64>} %retval
 }
 
+define {<vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>} @vector_deinterleave_nxv16i8_nxv64i8(<vscale x 64 x i8> %vec) {
+; CHECK-LABEL: vector_deinterleave_nxv16i8_nxv64i8:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    uzp1 z4.b, z2.b, z3.b
+; CHECK-NEXT:    uzp1 z5.b, z0.b, z1.b
+; CHECK-NEXT:    uzp2 z3.b, z2.b, z3.b
+; CHECK-NEXT:    uzp2 z6.b, z0.b, z1.b
+; CHECK-NEXT:    uzp1 z0.b, z5.b, z4.b
+; CHECK-NEXT:    uzp2 z2.b, z5.b, z4.b
+; CHECK-NEXT:    uzp1 z1.b, z6.b, z3.b
+; CHECK-NEXT:    uzp2 z3.b, z6.b, z3.b
+; CHECK-NEXT:    ret
+  %retval = call {<vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>} @llvm.vector.deinterleave4.nxv64i8(<vscale x 64 x i8> %vec)
+  ret {<vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>, <vscale x 16 x i8>} %retval
+}
+
+define {<vscale x 8 x i16>, <vscale x 8 x i16>, <vscale x 8 x i16>, <vscale x 8 x i16>} @vector_deinterleave_nxv8i16_nxv32i16(<vscale x 32 x i16> %vec) {
+; CHECK-LABEL: vector_deinterleave_nxv8i16_nxv32i16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    uzp1 z4.h, z2.h, z3.h
+; CHECK-NEXT:    uzp1 z5.h, z0.h, z1.h
+; CHECK-NEXT:    uzp2 z3.h, z2.h, z3.h
+; CHECK-NEXT:    uzp2 z6.h, z0.h, z1.h
+; CHECK-NEXT:    uzp1 z0.h, z5.h, z4.h
+; CHECK-NEXT:    uzp2 z2.h, z5.h, z4.h
+; CHECK-NEXT:    uzp1 z1.h, z6.h, z3.h
+; CHECK-NEXT:    uzp2 z3.h, z6.h, z3.h
+; CHECK-NEXT:    ret
+  %retval = call {<vscale x 8 x i16>, <vscale x 8 x i16>, <vscale x 8 x i16>, <vscale x 8 x i16>} @llvm.vector.deinterleave4.nxv32i16(<vscale x 32 x i16> %vec)
+  ret {<vscale x 8 x i16>, <vscale x 8 x i16>, <vscale x 8 x i16>, <vscale x 8 x i16>} %retval
+}
+
+define {<vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>} @vector_deinterleave_nxv4i32_nxv16i32(<vscale x 16 x i32> %vec) {
+; CHECK-LABEL: vector_deinterleave_nxv4i32_nxv16i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    uzp1 z4.s, z2.s, z3.s
+; CHECK-NEXT:    uzp1 z5.s, z0.s, z1.s
+; CHECK-NEXT:    uzp2 z3.s, z2.s, z3.s
+; CHECK-NEXT:    uzp2 z6.s, z0.s, z1.s
+; CHECK-NEXT:    uzp1 z0.s, z5.s, z4.s
+; CHECK-NEXT:    uzp2 z2.s, z5.s, z4.s
+; CHECK-NEXT:    uzp1 z1.s, z6.s, z3.s
+; CHECK-NEXT:    uzp2 z3.s, z6.s, z3.s
+; CHECK-NEXT:    ret
+  %retval = call {<vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>} @llvm.vector.deinterleave4.nxv16i32(<vscale x 16 x i32> %vec)
+  ret {<vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>, <vscale x 4 x i32>} %retval
+}
+
+define {<vscale x 2 x i64>, <vscale x 2 x i64>, <vscale x 2 x i64>, <vscale x 2 x i64>} @vector_deinterleave_nxv2i64_nxv8i64(<vscale x 8 x i64> %vec) {
+; CHECK-LABEL: vector_deinterleave_nxv2i64_nxv8i64:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    uzp1 z4.d, z2.d, z3.d
+; CHECK-NEXT:    uzp1 z5.d, z0.d, z1.d
+; CHECK-NEXT:    uzp2 z3.d, z2.d, z3.d
+; CHECK-NEXT:    uzp2 z6.d, z0.d, z1.d
+; CHECK-NEXT:    uzp1 z0.d, z5.d, z4.d
+; CHECK-NEXT:    uzp2 z2.d, z5.d, z4.d
+; CHECK-NEXT:    uzp1 z1.d, z6.d, z3.d
+; CHECK-NEXT:    uzp2 z3.d, z6.d, z3.d
+; CHECK-NEXT:    ret
+  %retval = call {<vscale x 2 x i64>, <vscale x 2 x i64>, <vscale x 2 x i64>, <vscale x 2 x i64>} @llvm.vector.deinterleave4.nxv8i64(<vscale x 8 x i64> %vec)
+  ret {<vscale x 2 x i64>, <vscale x 2 x i64>, <vscale x 2 x i64>, <vscale x 2 x i64>} %retval
+}
+
 ; Predicated
 define {<vscale x 16 x i1>, <vscale x 16 x i1>} @vector_deinterleave_nxv16i1_nxv32i1(<vscale x 32 x i1> %vec) {
 ; CHECK-LABEL: vector_deinterleave_nxv16i1_nxv32i1:
@@ -279,7 +343,6 @@ define {<vscale x 2 x i32>, <vscale x 2 x i32>} @vector_deinterleave_nxv2i32_nxv
   ret {<vscale x 2 x i32>, <vscale x 2 x i32>} %retval
 }
 
-
 ; Floating declarations
 declare {<vscale x 2 x half>,<vscale x 2 x half>} @llvm.vector.deinterleave2.nxv4f16(<vscale x 4 x half>)
 declare {<vscale x 4 x half>, <vscale x 4 x half>} @llvm.vector.deinterleave2.nxv8f16(<vscale x 8 x half>)
diff --git a/llvm/test/CodeGen/AArch64/sve-vector-interleave.ll b/llvm/test/CodeGen/AArch64/sve-vector-interleave.ll
index 288034422d9c0..990faf0d320e3 100644
--- a/llvm/test/CodeGen/AArch64/sve-vector-interleave.ll
+++ b/llvm/test/CodeGen/AArch64/sve-vector-interleave.ll
@@ -146,6 +146,70 @@ define <vscale x 4 x i64> @interleave2_nxv4i64(<vscale x 2 x i64> %vec0, <vscale
   ret <vscale x 4 x i64> %retval
 }
 
+define <vscale x 64 x i8> @interleave4_nxv16i8(<vscale x 16 x i8> %vec0, <vscale x 16 x i8> %vec1, <vscale x 16 x i8> %vec2, <vscale x 16 x i8> %vec3) {
+; CHECK-LABEL: interleave4_nxv16i8:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    zip1 z4.b, z1.b, z3.b
+; CHECK-NEXT:    zip1 z5.b, z0.b, z2.b
+; CHECK-NEXT:    zip2 z3.b, z1.b, z3.b
+; CHECK-NEXT:    zip2 z6.b, z0.b, z2.b
+; CHECK-NEXT:    zip1 z0.b, z5.b, z4.b
+; CHECK-NEXT:    zip2 z1.b, z5.b, z4.b
+; CHECK-NEXT:    zip1 z2.b, z6.b, z3.b
+; CHECK-NEXT:    zip2 z3.b, z6.b, z3.b
+; CHECK-NEXT:    ret
+  %retval = call <vscale x 64 x i8> @llvm.vector.interleave4.nxv16i8(<vscale x 16 x i8> %vec0, <vscale x 16 x i8> %vec1, <vscale x 16 x i8> %vec2, <vscale x 16 x i8> %vec3)
+  ret <vscale x 64 x i8> %retval
+}
+
+define <vscale x 32 x i16> @interleave4_nxv8i16(<vscale x 8 x i16> %vec0, <vscale x 8 x i16> %vec1, <vscale x 8 x i16> %vec2, <vscale x 8 x i16> %vec3) {
+; CHECK-LABEL: interleave4_nxv8i16:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    zip1 z4.h, z1.h, z3.h
+; CHECK-NEXT:    zip1 z5.h, z0.h, z2.h
+; CHECK-NEXT:    zip2 z3.h, z1.h, z3.h
+; CHECK-NEXT:    zip2 z6.h, z0.h, z2.h
+; CHECK-NEXT:    zip1 z0.h, z5.h, z4.h
+; CHECK-NEXT:    zip2 z1.h, z5.h, z4.h
+; CHECK-NEXT:    zip1 z2.h, z6.h, z3.h
+; CHECK-NEXT:    zip2 z3.h, z6.h, z3.h
+; CHECK-NEXT:    ret
+  %retval = call <vscale x 32 x i16> @llvm.vector.interleave4.nxv8i16(<vscale x 8 x i16> %vec0, <vscale x 8 x i16> %vec1, <vscale x 8 x i16> %vec2, <vscale x 8 x i16> %vec3)
+  ret <vscale x 32 x i16> %retval
+}
+
+define <vscale x 16 x i32> @interleave4_nxv4i32(<vscale x 4 x i32> %vec0, <vscale x 4 x i32> %vec1, <vscale x 4 x i32> %vec2, <vscale x 4 x i32> %vec3) {
+; CHECK-LABEL: interleave4_nxv4i32:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    zip1 z4.s, z1.s, z3.s
+; CHECK-NEXT:    zip1 z5.s, z0.s, z2.s
+; CHECK-NEXT:    zip2 z3.s, z1.s, z3.s
+; CHECK-NEXT:    zip2 z6.s, z0.s, z2.s
+; CHECK-NEXT:    zip1 z0.s, z5.s, z4.s
+; CHECK-NEXT:    zip2 z1.s, z5.s, z4.s
+; CHECK-NEXT:    zip1 z2.s, z6.s, z3.s
+; CHECK-NEXT:    zip2 z3.s, z6.s, z3.s
+; CHECK-NEXT:    ret
+  %retval = call <vscale x 16 x i32> @llvm.vector.interleave4.nxv4i32(<vscale x 4 x i32> %vec0, <vscale x 4 x i32> %vec1, <vscale x 4 x i32> %vec2, <vscale x 4 x i32> %vec3)
+  ret <vscale x 16 x i32> %retval
+}
+
+define <vscale x 8 x i64> @interleave4_nxv8i64(<vscale x 2 x i64> %vec0, <vscale x 2 x i64> %vec1, <vscale x 2 x i64> %vec2, <vscale x 2 x i64> %vec3) {
+; CHECK-LABEL: interleave4_nxv8i64:
+; CHECK:       // %bb.0:
+; CHECK-NEXT:    zip1 z4.d, z1.d, z3.d
+; CHECK-NEXT:    zip1 z5.d, z0.d, z2.d
+; CHECK-NEXT:    zip2 z3.d, z1.d, z3.d
+; CHECK-NEXT:    zip2 z6.d, z0.d, z2.d
+; CHECK-NEXT:    zip1 z0.d, z5.d, z4.d
+; CHECK-NEXT:    zip2 z1.d, z5.d, z4.d
+; CHECK-NEXT:    zip1 z2.d, z6.d, z3.d
+; CHECK-NEXT:    zip2 z3.d, z6.d, z3.d
+; CHECK-NEXT:    ret
+  %retval = call <vscale x 8 x i64> @llvm.vector.interleave4.nxv8i64(<vscale x 2 x i64> %vec0, <vscale x 2 x i64> %vec1, <vscale x 2 x i64> %vec2, <vscale x 2 x i64> %vec3)
+  ret <vscale x 8 x i64> %retval
+}
+
 ; Predicated
 
 define <vscale x 32 x i1> @interleave2_nxv32i1(<vscale x 16 x i1> %vec0, <vscale x 16 x i1> %vec1) {

paulwalker-arm (Collaborator)

Rather than each target having to do this, can you move the expansion into the target-neutral part of operation legalisation, with AArch64TargetLowering::LowerVECTOR_DEINTERLEAVE returning SDValue() for the cases where expansion is required?

This will make it easier to support the other interleave factors as well, because the common expansion can just expand one level down (i.e. (de)interleave8 -> (de)interleave4, then (de)interleave4 -> (de)interleave2), which gives a target the option to say the intermediate step is legal or to custom lower it.

llvmbot added the llvm:SelectionDAG label on May 27, 2025
lukel97 (Contributor, Author) commented May 27, 2025

Good idea, I've done this in c2e329d and added a test for (de)interleave8 codegen. It's worth noting that the target will always need to be able to lower at least (de)interleave2, since you need it to decompose factor 8 into factor 4.

lukel97 changed the title from "[AArch64] Lower vector.[de]interleave4" to "[SelectionDAG][AArch64] Legalize power of 2 vector.[de]interleaveN" on May 27, 2025
@lukel97 lukel97 requested a review from RKSimon May 27, 2025 16:39
paulwalker-arm (Collaborator) left a comment:

Other than a comment recommendation this looks good to me.

EVT VecVT = Node->getValueType(0);
SmallVector<EVT> HalfVTs(Factor / 2, VecVT);
// Deinterleave at Factor/2 so each result contains two factors interleaved:
// ab cd ab cd -> [ac bd] [ac bd]
For this and the related comments it would be better to use unique letters throughout to make it clearer all lanes are distinct.

@lukel97 lukel97 merged commit 9a2d4d1 into llvm:main Jun 3, 2025
11 checks passed
rorth pushed a commit to rorth/llvm-project that referenced this pull request Jun 11, 2025
DhruvSrivastavaX pushed a commit to DhruvSrivastavaX/lldb-for-aix that referenced this pull request Jun 12, 2025
Labels: backend:AArch64, llvm:SelectionDAG

3 participants