Arm64: Implement VectorTableLookup/VectorTableLookupExtension intrinsic + Consecutive registers support (#80297)

* Add VectorTableLookup 2/3/4 in hwintrinsiclistarm64.h

* Add VectorTableLookup

* fixes to libraries

* Prototype of simple tbl

* Some progress

* Some more updates

* working model

* Vector64<byte> support

* Add VectorTableLookup_3

* Add VectorTableLookup_4

* cleanup

* Remove regCount from LclVarDsc

* Some more cleanup

* setNextConsecutiveRegisterAssignment

* Some more cleanup

* TARGET_ARM64

* Use getNextConsecutiveRefPositions instead of nextConsecutiveRefPosition field

* jit format

* Move getNextConsecutiveRefPosition

* SA1141: Use tuple syntax

* Remove the unwanted field list code

* revert the flag that was mistakenly changed

* Add test cases

* FIELD_LIST

* Use FIELD_LIST approach

* jit format and fix arm build

* fix assert failure

* Add summary docs

Add summary docs in all the required files.

* Make APIs public again

* cleanup

* Handle case for reg mod 32

* Remove references from ref until API is approved

* Use generic getFreeCandidates()

* Add entries in ExtraAPis

* Set CLSCompliant=false

* Move in inner class

* Remove CLSCompliant flag

* Add a suppression file for System.Runtime.Intrinsics on the new APIs until they go through API review

* Review feedback

* Add workaround for building tests

* review feedback

* TP: remove needsConsecutive parameter from BuildUse()

* TP: Remove pseudo intrinsic entries

* More fixes

* Add the missing csproj

* Fix test cases

* Add fake lib for AdvSimd.Arm64* as well

* Remove the workaround

* Use template to control if consecutive registers is needed or not

* jit format

* fix the workaround

* Revert "fix the workaround"

This reverts commit 1cb22d0.

* Revert "Remove the workaround"

This reverts commit b0b6a5e.

* Add VectorTableLookupExtensions in libraries

* Add support for VectorTableLookupExtension

* WIP: available regs

* WIP: Remove test hacks

* Update getFreeCandidates() for consecutive registers

* Add missing resetRegState()

* Do not assume the current assigned register for consecutiveRegisters refposition is good.

If a refposition is marked as needConsecutive, then do not just assume that the existing register assigned
is good. We still go through the allocation for it to make sure that we allocate it a register such that the
consecutive registers are also free.

* Handle case for copyReg

For copyReg, if we assigned a different register, do not forget to free the existing register it was holding

* Update setNextConsecutiveRegister() with UPPER_VECTOR_RESTORE

* Update code around copyReg

Updated code such that if the refPosition is already assigned a register, then
check if assignedRegister satisfies our needs (for first / non-first refposition).
If not, perform copyReg.

TODO: Extract the code surrounding and including copyReg until where we `continue`.

* Create the VectorTableLookup fake CoreLib as a reference assembly

Make the AdvSimd.Arm64 tests reference the VectorTableLookup fake
CoreLib as reference assembly; and ensure that it is not included as a
ProjectReference by the toplevel HardwareIntrinsics merged test
runners.

The upshot is that the AdvSimd.Arm64 tests can call the extra APIs via
a direct reference to CoreLib (instead of through System.Runtime), but
the fake library is not copied into any test artifact directories, and
the Mono AOT compiler never sees it.

That said, after applying this, the test fails during AOT compilation
of the *real* CoreLib:

```
Mono Ahead of Time compiler - compiling assembly /Users/alklig/work/dotnet-runtime/runtime-bugs2/artifacts/tests/coreclr/osx.arm64.Release/Tests/Core_Root/System.Private.CoreLib.dll
  AOTID EA8D702E-9736-3BD5-435B-A9D5EEADCC78
  %"System.ValueTuple`2<System.Runtime.Intrinsics.Vector128`1<byte>, System.Runtime.Intrinsics.Vector128`1<byte>>"* %arg_table
  <16 x i8>

  * Assertion: should not be reached at /Users/alklig/work/dotnet-runtime/runtime-bugs2/src/mono/mono/mini/mini-llvm.c:1455
```

* Rename VectorTableLookup to VectorTableLookup.RefOnly

* Start consecutive refpositions with RefTypeUse and never with RefTypeUpperVectorSave

* Add test cases for VectorTableLookupExtension

* Pass the missing defaultValues

* Use platform neutral BitScanForward

* jit format

* Remove the fake testlib workaround

* Fix mono failures

* Fix x64 TP regression

* Fix test cases

* fix some more tp regression

* Fix test build

* misc. changes

* Fix the bug where we were not freeing copyReg causing an assert in tier0

* Refactor a little bit to reduce checks for VectorTableLookup

* Add template parameter for allocateReg/copyReg/select

* Comments

* Fix mono failures

* Added some more comments

* Call allocateReg/assignCopyReg/select methods only for refpositions that need consecutive registers

* Add heuristics to pick best possible set of registers which will need less spilling

* setNextConsecutiveRegisterAssignment() no longer checks for areNextConsecutiveRegistersFree()

* Rename getFreeCandidates() -> getConsecutiveCandidates()

* fix parameters to areNextConsecutiveRegistersFree()

* Rename and update canAssignNextConsecutiveRegisters()

* Add the missing setNextConsecutiveRegisterAssignment() calls

* Fix a condition for upperVector

* Update spill heuristic to handle cases for jitstressregs

* Misc. remove popcount() check from getConsecutiveRegisters()

* jit format

* Fix a bug in canAssignNextConsecutiveRegisters()

* Add filterConsecutiveCandidates() and perform free/busy candidates scan

* Consume the new free/busy consecutive candidates method

* Handle case where 'copyReg == assignedReg'

* Misc. cleanup

* Include LsraExtraFPSetForConsecutive for stress regs

* handle case where 'assignedInterval == nullptr' for try_SPILL_COST()

* fix build error

* Call consecutiveCandidates() only for first refposition

* Only perform special handling for non-uppervectorrestore

* jit format

* Add impVectorTableLookup/impVectorTableLookupExtension

* Add the missing break

* Update assert

* Move definitions in GenTree, fix assert

* fix arm issue

* Remove common functions

* Rename info.needsConsecutiveRegisters to info.compNeedsConsecutiveRegisters

* Use needsConsecutiveRegisters template parameter for all configurations

* Handle case of round-robin in getConsecutiveRegisters()

* Disable tests for Mono

* Initialize outArray in test

* Add IsSupported checks for VectorLookup/VectorLookupExtension

* Fix the test cases for RunReflectionScenario_UnsafeRead()

* Review feedback

* wip

* fix a typo in test case

* Add filterConsecutiveCandidatesForSpill() to select the range that needs fewer register spills

* Add mono support.

* Delay free the registers for VectorTableLookupExtension

* fix mono build error
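
Several of the items above (getConsecutiveCandidates(), filterConsecutiveCandidates(), areNextConsecutiveRegistersFree()) revolve around finding a run of free registers in a register mask. The following is a minimal sketch of that general idea, not the JIT's implementation: `consecutiveRunStarts` is a hypothetical helper, the mask is a plain 32-bit integer, and wraparound from the last register to the first is deliberately not modeled.

```cpp
#include <cstdint>

// Hypothetical helper (not from the commit): bit i of freeMask is set when
// register i is free. The result has bit i set when registers
// i, i+1, ..., i+count-1 are all free. Runs that would wrap past bit 31
// are not reported.
inline uint32_t consecutiveRunStarts(uint32_t freeMask, unsigned count)
{
    uint32_t starts = freeMask;
    for (unsigned i = 1; i < count; i++)
    {
        // A start at bit b survives only if bit b+i is free as well.
        starts &= (freeMask >> i);
    }
    return starts;
}
```

For example, with registers 1, 2, 4, 5 and 6 free (mask 0x76), a run of three can only start at register 4.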

---------

Co-authored-by: Tanner Gooding <tagoo@outlook.com>
Co-authored-by: Aleksey Kliger <alklig@microsoft.com>
Co-authored-by: Zoltan Varga <vargaz@gmail.com>
4 people committed Apr 4, 2023
1 parent 4271678 commit f92d72a
Showing 27 changed files with 4,431 additions and 53 deletions.
8 changes: 8 additions & 0 deletions src/coreclr/jit/codegenlinear.cpp
@@ -1624,6 +1624,14 @@ void CodeGen::genConsumeRegs(GenTree* tree)
            genConsumeRegs(tree->gtGetOp1());
            genConsumeRegs(tree->gtGetOp2());
        }
        else if (tree->OperIsFieldList())
        {
            for (GenTreeFieldList::Use& use : tree->AsFieldList()->Uses())
            {
                GenTree* fieldNode = use.GetNode();
                genConsumeRegs(fieldNode);
            }
        }
#endif
        else if (tree->OperIsLocalRead())
        {
4 changes: 4 additions & 0 deletions src/coreclr/jit/compiler.cpp
@@ -6718,6 +6718,10 @@ int Compiler::compCompileHelper(CORINFO_MODULE_HANDLE classPtr,
    compBasicBlockID = 0;
#endif

#ifdef TARGET_ARM64
    info.compNeedsConsecutiveRegisters = false;
#endif

    /* Initialize emitter */

    if (!compIsForInlining())
8 changes: 8 additions & 0 deletions src/coreclr/jit/compiler.h
@@ -2809,6 +2809,10 @@ class Compiler
                            CORINFO_CLASS_HANDLE clsHnd,
                            CORINFO_SIG_INFO* sig,
                            CorInfoType simdBaseJitType);

#ifdef TARGET_ARM64
    GenTreeFieldList* gtConvertTableOpToFieldList(GenTree* op, unsigned fieldCount);
#endif
#endif // FEATURE_HW_INTRINSICS

    GenTree* gtNewMustThrowException(unsigned helper, var_types type, CORINFO_CLASS_HANDLE clsHnd);
@@ -10061,6 +10065,10 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
        // Number of class profile probes in this method
        unsigned compHandleHistogramProbeCount;

#ifdef TARGET_ARM64
        bool compNeedsConsecutiveRegisters;
#endif

    } info;

    ReturnTypeDesc compRetTypeDesc; // ABI return type descriptor for the method
4 changes: 4 additions & 0 deletions src/coreclr/jit/fginline.cpp
@@ -1447,6 +1447,10 @@ void Compiler::fgInsertInlineeBlocks(InlineInfo* pInlineInfo)

    lvaGenericsContextInUse |= InlineeCompiler->lvaGenericsContextInUse;

#ifdef TARGET_ARM64
    info.compNeedsConsecutiveRegisters |= InlineeCompiler->info.compNeedsConsecutiveRegisters;
#endif

    // If the inlinee compiler encounters switch tables, disable hot/cold splitting in the root compiler.
    // TODO-CQ: Implement hot/cold splitting of methods with switch tables.
    if (InlineeCompiler->fgHasSwitch && opts.compProcedureSplitting)
33 changes: 33 additions & 0 deletions src/coreclr/jit/gentree.cpp
@@ -23010,6 +23010,7 @@ GenTree* Compiler::gtNewSimdShuffleNode(var_types type,
    op2->AsVecCon()->gtSimdVal = vecCns;

    return gtNewSimdHWIntrinsicNode(type, op1, op2, lookupIntrinsic, simdBaseJitType, simdSize, isSimdAsHWIntrinsic);

#else
#error Unsupported platform
#endif // !TARGET_XARCH && !TARGET_ARM64
@@ -23879,6 +23880,38 @@ GenTree* Compiler::gtNewSimdWithElementNode(var_types type,
    return gtNewSimdHWIntrinsicNode(type, op1, op2, op3, hwIntrinsicID, simdBaseJitType, simdSize, isSimdAsHWIntrinsic);
}

#ifdef TARGET_ARM64
//------------------------------------------------------------------------
// gtConvertTableOpToFieldList: Convert a operand that represents table of rows into
//                              field list, where each field represents a row in the table.
//
// Arguments:
//    op         -- Operand to convert.
//    fieldCount -- Number of fields or rows present.
//
// Return Value:
//    The GenTreeFieldList node.
//
GenTreeFieldList* Compiler::gtConvertTableOpToFieldList(GenTree* op, unsigned fieldCount)
{
    LclVarDsc* opVarDsc = lvaGetDesc(op->AsLclVar());
    unsigned lclNum = lvaGetLclNum(opVarDsc);
    unsigned fieldSize = opVarDsc->lvSize() / fieldCount;
    var_types fieldType = TYP_SIMD16;

    GenTreeFieldList* fieldList = new (this, GT_FIELD_LIST) GenTreeFieldList();
    int offset = 0;
    for (unsigned fieldId = 0; fieldId < fieldCount; fieldId++)
    {
        GenTreeLclFld* fldNode = gtNewLclFldNode(lclNum, fieldType, offset);
        fieldList->AddField(this, fldNode, offset, fieldType);

        offset += fieldSize;
    }
    return fieldList;
}
#endif // TARGET_ARM64

GenTree* Compiler::gtNewSimdWithLowerNode(var_types type,
                                          GenTree* op1,
                                          GenTree* op2,
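
The offset arithmetic in gtConvertTableOpToFieldList can be illustrated outside the JIT. The sketch below is a stand-alone model and not JIT code: `tableRowOffsets` is a hypothetical name, and it assumes the table struct is made of equally sized rows with no padding, so that the struct size divides evenly by the row count.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the loop in gtConvertTableOpToFieldList: a table
// struct of structSize bytes holding fieldCount equally sized rows yields
// one field per row, each located at a multiple of the row size.
inline std::vector<unsigned> tableRowOffsets(unsigned structSize, unsigned fieldCount)
{
    // Assumption of this sketch: rows are packed without padding.
    assert(fieldCount > 0 && structSize % fieldCount == 0);

    unsigned fieldSize = structSize / fieldCount;
    std::vector<unsigned> offsets;
    unsigned offset = 0;
    for (unsigned fieldId = 0; fieldId < fieldCount; fieldId++)
    {
        offsets.push_back(offset);
        offset += fieldSize;
    }
    return offsets;
}
```

Under these assumptions, a 48-byte struct of three 16-byte vectors gives rows at offsets 0, 16 and 32.
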
11 changes: 11 additions & 0 deletions src/coreclr/jit/hwintrinsic.h
@@ -176,6 +176,9 @@ enum HWIntrinsicFlag : unsigned int

    // The intrinsic supports some sort of containment analysis
    HW_Flag_SupportsContainment = 0x2000,

    // The intrinsic needs consecutive registers
    HW_Flag_NeedsConsecutiveRegisters = 0x4000,
#else
#error Unsupported platform
#endif
@@ -751,6 +754,14 @@ struct HWIntrinsicInfo
        return (flags & HW_Flag_SpecialCodeGen) != 0;
    }

#ifdef TARGET_ARM64
    static bool NeedsConsecutiveRegisters(NamedIntrinsic id)
    {
        HWIntrinsicFlag flags = lookupFlags(id);
        return (flags & HW_Flag_NeedsConsecutiveRegisters) != 0;
    }
#endif

    static bool HasRMWSemantics(NamedIntrinsic id)
    {
        HWIntrinsicFlag flags = lookupFlags(id);
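
The flag test added here is an ordinary bitmask check. A minimal stand-alone sketch follows; the `Sketch_*` names are hypothetical, only the two values 0x2000 and 0x4000 are taken from the hunk above, and `Sketch_OtherFlag` is an invented placeholder for any unrelated flag.

```cpp
// Hypothetical stand-ins for HWIntrinsicFlag values. Only 0x2000 and 0x4000
// come from the diff; Sketch_OtherFlag is made up for illustration.
enum SketchFlag : unsigned
{
    Sketch_NoFlag                    = 0,
    Sketch_OtherFlag                 = 0x1,
    Sketch_SupportsContainment       = 0x2000,
    Sketch_NeedsConsecutiveRegisters = 0x4000,
};

// Mirrors the shape of HWIntrinsicInfo::NeedsConsecutiveRegisters, but takes
// the combined flags directly instead of looking them up by intrinsic id.
inline bool needsConsecutiveRegisters(unsigned flags)
{
    return (flags & Sketch_NeedsConsecutiveRegisters) != 0;
}
```

Because each flag occupies its own bit, combining flags with `|` (as the intrinsic list entries do) never disturbs the result of this check for the other bits.
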
77 changes: 77 additions & 0 deletions src/coreclr/jit/hwintrinsicarm64.cpp
@@ -1900,7 +1900,84 @@ GenTree* Compiler::impSpecialIntrinsic(NamedIntrinsic intrinsic,
            retNode = impAssignMultiRegTypeToVar(op1, sig->retTypeSigClass DEBUGARG(CorInfoCallConvExtension::Managed));
            break;
        }
        case NI_AdvSimd_VectorTableLookup:
        case NI_AdvSimd_Arm64_VectorTableLookup:
        {
            assert(sig->numArgs == 2);

            CORINFO_ARG_LIST_HANDLE arg1 = sig->args;
            CORINFO_ARG_LIST_HANDLE arg2 = info.compCompHnd->getArgNext(arg1);
            var_types argType = TYP_UNKNOWN;
            CORINFO_CLASS_HANDLE argClass = NO_CLASS_HANDLE;

            argType = JITtype2varType(strip(info.compCompHnd->getArgType(sig, arg2, &argClass)));
            op2 = getArgForHWIntrinsic(argType, argClass);
            argType = JITtype2varType(strip(info.compCompHnd->getArgType(sig, arg1, &argClass)));
            op1 = impPopStack().val;

            if (op1->TypeGet() == TYP_STRUCT)
            {
                info.compNeedsConsecutiveRegisters = true;
                unsigned fieldCount = info.compCompHnd->getClassNumInstanceFields(argClass);

                if (!op1->OperIs(GT_LCL_VAR))
                {
                    unsigned tmp = lvaGrabTemp(true DEBUGARG("VectorTableLookup temp tree"));

                    impAssignTempGen(tmp, op1, CHECK_SPILL_NONE);
                    op1 = gtNewLclvNode(tmp, argType);
                }

                op1 = gtConvertTableOpToFieldList(op1, fieldCount);
            }
            else
            {
                assert(varTypeIsSIMD(op1->TypeGet()));
            }

            retNode = gtNewSimdHWIntrinsicNode(retType, op1, op2, intrinsic, simdBaseJitType, simdSize);
            break;
        }
        case NI_AdvSimd_VectorTableLookupExtension:
        case NI_AdvSimd_Arm64_VectorTableLookupExtension:
        {
            assert(sig->numArgs == 3);

            CORINFO_ARG_LIST_HANDLE arg1 = sig->args;
            CORINFO_ARG_LIST_HANDLE arg2 = info.compCompHnd->getArgNext(arg1);
            CORINFO_ARG_LIST_HANDLE arg3 = info.compCompHnd->getArgNext(arg2);
            var_types argType = TYP_UNKNOWN;
            CORINFO_CLASS_HANDLE argClass = NO_CLASS_HANDLE;

            argType = JITtype2varType(strip(info.compCompHnd->getArgType(sig, arg3, &argClass)));
            op3 = getArgForHWIntrinsic(argType, argClass);
            argType = JITtype2varType(strip(info.compCompHnd->getArgType(sig, arg2, &argClass)));
            op2 = impPopStack().val;
            op1 = impPopStack().val;

            if (op2->TypeGet() == TYP_STRUCT)
            {
                info.compNeedsConsecutiveRegisters = true;
                unsigned fieldCount = info.compCompHnd->getClassNumInstanceFields(argClass);

                if (!op2->OperIs(GT_LCL_VAR))
                {
                    unsigned tmp = lvaGrabTemp(true DEBUGARG("VectorTableLookupExtension temp tree"));

                    impAssignTempGen(tmp, op2, CHECK_SPILL_NONE);
                    op2 = gtNewLclvNode(tmp, argType);
                }

                op2 = gtConvertTableOpToFieldList(op2, fieldCount);
            }
            else
            {
                assert(varTypeIsSIMD(op1->TypeGet()));
            }

            retNode = gtNewSimdHWIntrinsicNode(retType, op1, op2, op3, intrinsic, simdBaseJitType, simdSize);
            break;
        }
        default:
        {
            return nullptr;
104 changes: 104 additions & 0 deletions src/coreclr/jit/hwintrinsiccodegenarm64.cpp
@@ -1002,6 +1002,110 @@ void CodeGen::genHWIntrinsic(GenTreeHWIntrinsic* node)
                                            (emitSize == EA_8BYTE) ? INS_OPTS_8B : INS_OPTS_16B);
                break;

            case NI_AdvSimd_VectorTableLookup:
            case NI_AdvSimd_Arm64_VectorTableLookup:
            {
                unsigned regCount = 0;
                if (intrin.op1->OperIsFieldList())
                {
                    GenTreeFieldList* fieldList = intrin.op1->AsFieldList();
                    GenTree* firstField = fieldList->Uses().GetHead()->GetNode();
                    op1Reg = firstField->GetRegNum();
                    INDEBUG(regNumber argReg = op1Reg);
                    for (GenTreeFieldList::Use& use : fieldList->Uses())
                    {
                        regCount++;
#ifdef DEBUG

                        GenTree* argNode = use.GetNode();
                        assert(argReg == argNode->GetRegNum());
                        argReg = REG_NEXT(argReg);
#endif
                    }
                }
                else
                {
                    regCount = 1;
                    op1Reg = intrin.op1->GetRegNum();
                }

                switch (regCount)
                {
                    case 2:
                        ins = INS_tbl_2regs;
                        break;
                    case 3:
                        ins = INS_tbl_3regs;
                        break;
                    case 4:
                        ins = INS_tbl_4regs;
                        break;
                    default:
                        assert(regCount == 1);
                        assert(ins == INS_tbl);
                        break;
                }

                GetEmitter()->emitIns_R_R_R(ins, emitSize, targetReg, op1Reg, op2Reg, opt);
                break;
            }

            case NI_AdvSimd_VectorTableLookupExtension:
            case NI_AdvSimd_Arm64_VectorTableLookupExtension:
            {
                assert(isRMW);
                unsigned regCount = 0;
                op1Reg = intrin.op1->GetRegNum();
                op3Reg = intrin.op3->GetRegNum();
                assert(targetReg != op3Reg);
                if (intrin.op2->OperIsFieldList())
                {
                    GenTreeFieldList* fieldList = intrin.op2->AsFieldList();
                    GenTree* firstField = fieldList->Uses().GetHead()->GetNode();
                    op2Reg = firstField->GetRegNum();
                    INDEBUG(regNumber argReg = op2Reg);
                    for (GenTreeFieldList::Use& use : fieldList->Uses())
                    {
                        regCount++;
#ifdef DEBUG

                        GenTree* argNode = use.GetNode();

                        // registers should be consecutive
                        assert(argReg == argNode->GetRegNum());
                        // and they should not interfere with targetReg
                        assert(targetReg != argReg);
                        argReg = REG_NEXT(argReg);
#endif
                    }
                }
                else
                {
                    regCount = 1;
                    op2Reg = intrin.op2->GetRegNum();
                }

                switch (regCount)
                {
                    case 2:
                        ins = INS_tbx_2regs;
                        break;
                    case 3:
                        ins = INS_tbx_3regs;
                        break;
                    case 4:
                        ins = INS_tbx_4regs;
                        break;
                    default:
                        assert(regCount == 1);
                        assert(ins == INS_tbx);
                        break;
                }

                GetEmitter()->emitIns_Mov(INS_mov, emitTypeSize(node), targetReg, op1Reg, /* canSkip */ true);
                GetEmitter()->emitIns_R_R_R(ins, emitSize, targetReg, op2Reg, op3Reg, opt);
                break;
            }
            default:
                unreached();
        }
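
The two codegen cases above share a pattern: count the table registers, check in debug builds that each one directly follows the previous one, then pick the instruction variant by count. A minimal sketch of that pattern outside the JIT follows. Registers are plain integers here, `areConsecutive` and `tableLookupVariant` are hypothetical names, and the returned strings only echo the `INS_*` names in the hunk. Like the debug asserts, the sketch does not model wraparound from the last vector register to the first.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch only: registers are integers standing in for vector registers.
// True when every register is exactly one past the previous register,
// which is what the REG_NEXT-based asserts in the hunk check.
inline bool areConsecutive(const std::vector<unsigned>& regs)
{
    for (std::size_t i = 1; i < regs.size(); i++)
    {
        if (regs[i] != regs[i - 1] + 1)
        {
            return false;
        }
    }
    return true;
}

// Maps the number of table registers to a name modeled on the instruction
// names in the diff (tbl / tbl_2regs / ... and tbx / tbx_2regs / ...).
inline std::string tableLookupVariant(std::size_t regCount, bool extension)
{
    const std::string base = extension ? "tbx" : "tbl";
    switch (regCount)
    {
        case 1:
            return base;
        case 2:
            return base + "_2regs";
        case 3:
            return base + "_3regs";
        case 4:
            return base + "_4regs";
        default:
            return "invalid"; // more than four rows is not representable
    }
}
```
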
8 changes: 4 additions & 4 deletions src/coreclr/jit/hwintrinsiclistarm64.h
@@ -477,8 +477,8 @@ HARDWARE_INTRINSIC(AdvSimd, SubtractSaturateScalar,
 HARDWARE_INTRINSIC(AdvSimd, SubtractScalar, 8, 2, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_sub, INS_sub, INS_fsub, INS_fsub}, HW_Category_SIMD, HW_Flag_SIMDScalar)
 HARDWARE_INTRINSIC(AdvSimd, SubtractWideningLower, 8, 2, {INS_ssubl, INS_usubl, INS_ssubl, INS_usubl, INS_ssubl, INS_usubl, INS_ssubw, INS_usubw, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_BaseTypeFromSecondArg|HW_Flag_SpecialCodeGen)
 HARDWARE_INTRINSIC(AdvSimd, SubtractWideningUpper, 16, 2, {INS_ssubl2, INS_usubl2, INS_ssubl2, INS_usubl2, INS_ssubl2, INS_usubl2, INS_ssubw2, INS_usubw2, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_BaseTypeFromSecondArg|HW_Flag_SpecialCodeGen)
-HARDWARE_INTRINSIC(AdvSimd, VectorTableLookup, 8, 2, {INS_tbl, INS_tbl, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_NoFlag)
-HARDWARE_INTRINSIC(AdvSimd, VectorTableLookupExtension, 8, 3, {INS_tbx, INS_tbx, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_HasRMWSemantics)
+HARDWARE_INTRINSIC(AdvSimd, VectorTableLookup, 8, 2, {INS_tbl, INS_tbl, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_SpecialImport|HW_Flag_SpecialCodeGen|HW_Flag_NeedsConsecutiveRegisters)
+HARDWARE_INTRINSIC(AdvSimd, VectorTableLookupExtension, 8, 3, {INS_tbx, INS_tbx, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_SpecialImport|HW_Flag_SpecialCodeGen|HW_Flag_HasRMWSemantics|HW_Flag_NeedsConsecutiveRegisters)
 HARDWARE_INTRINSIC(AdvSimd, Xor, -1, 2, {INS_eor, INS_eor, INS_eor, INS_eor, INS_eor, INS_eor, INS_eor, INS_eor, INS_eor, INS_eor}, HW_Category_SIMD, HW_Flag_Commutative)
 HARDWARE_INTRINSIC(AdvSimd, ZeroExtendWideningLower, 8, 1, {INS_uxtl, INS_uxtl, INS_uxtl, INS_uxtl, INS_uxtl, INS_uxtl, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_BaseTypeFromFirstArg)
 HARDWARE_INTRINSIC(AdvSimd, ZeroExtendWideningUpper, 16, 1, {INS_uxtl2, INS_uxtl2, INS_uxtl2, INS_uxtl2, INS_uxtl2, INS_uxtl2, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_BaseTypeFromFirstArg)
@@ -651,8 +651,8 @@ HARDWARE_INTRINSIC(AdvSimd_Arm64, TransposeEven,
 HARDWARE_INTRINSIC(AdvSimd_Arm64, TransposeOdd, -1, 2, {INS_trn2, INS_trn2, INS_trn2, INS_trn2, INS_trn2, INS_trn2, INS_trn2, INS_trn2, INS_trn2, INS_trn2}, HW_Category_SIMD, HW_Flag_NoFlag)
 HARDWARE_INTRINSIC(AdvSimd_Arm64, UnzipEven, -1, 2, {INS_uzp1, INS_uzp1, INS_uzp1, INS_uzp1, INS_uzp1, INS_uzp1, INS_uzp1, INS_uzp1, INS_uzp1, INS_uzp1}, HW_Category_SIMD, HW_Flag_NoFlag)
 HARDWARE_INTRINSIC(AdvSimd_Arm64, UnzipOdd, -1, 2, {INS_uzp2, INS_uzp2, INS_uzp2, INS_uzp2, INS_uzp2, INS_uzp2, INS_uzp2, INS_uzp2, INS_uzp2, INS_uzp2}, HW_Category_SIMD, HW_Flag_NoFlag)
-HARDWARE_INTRINSIC(AdvSimd_Arm64, VectorTableLookup, 16, 2, {INS_tbl, INS_tbl, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_NoFlag)
-HARDWARE_INTRINSIC(AdvSimd_Arm64, VectorTableLookupExtension, 16, 3, {INS_tbx, INS_tbx, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_HasRMWSemantics)
+HARDWARE_INTRINSIC(AdvSimd_Arm64, VectorTableLookup, 16, 2, {INS_tbl, INS_tbl, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_SpecialImport|HW_Flag_SpecialCodeGen|HW_Flag_NeedsConsecutiveRegisters)
+HARDWARE_INTRINSIC(AdvSimd_Arm64, VectorTableLookupExtension, 16, 3, {INS_tbx, INS_tbx, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SIMD, HW_Flag_SpecialImport|HW_Flag_SpecialCodeGen|HW_Flag_HasRMWSemantics|HW_Flag_NeedsConsecutiveRegisters)
 HARDWARE_INTRINSIC(AdvSimd_Arm64, ZipHigh, -1, 2, {INS_zip2, INS_zip2, INS_zip2, INS_zip2, INS_zip2, INS_zip2, INS_zip2, INS_zip2, INS_zip2, INS_zip2}, HW_Category_SIMD, HW_Flag_NoFlag)
 HARDWARE_INTRINSIC(AdvSimd_Arm64, ZipLow, -1, 2, {INS_zip1, INS_zip1, INS_zip1, INS_zip1, INS_zip1, INS_zip1, INS_zip1, INS_zip1, INS_zip1, INS_zip1}, HW_Category_SIMD, HW_Flag_NoFlag)
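
For readers unfamiliar with the two instructions these entries map to, here is a scalar reference model of the 16-byte forms. It is a sketch for illustration and not code from the commit: `Bytes16`, `tableLookup` and `tableLookupExtension` are hypothetical names, and the table rows are treated as one contiguous byte table. The only difference between the two is the out-of-range case, which is why the extension form reads its destination and carries HW_Flag_HasRMWSemantics.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes16 = std::array<uint8_t, 16>;

// TBL-style lookup: each index byte selects a byte from the concatenated
// table rows; an out-of-range index produces 0.
inline Bytes16 tableLookup(const std::vector<Bytes16>& table, const Bytes16& indices)
{
    Bytes16 result{}; // zero-initialized, so out-of-range lanes stay 0
    for (std::size_t i = 0; i < 16; i++)
    {
        std::size_t idx = indices[i];
        if (idx < table.size() * 16)
        {
            result[i] = table[idx / 16][idx % 16];
        }
    }
    return result;
}

// TBX-style lookup: same selection, but an out-of-range index leaves the
// corresponding byte of the incoming default values untouched.
inline Bytes16 tableLookupExtension(Bytes16 defaultValues,
                                    const std::vector<Bytes16>& table,
                                    const Bytes16& indices)
{
    for (std::size_t i = 0; i < 16; i++)
    {
        std::size_t idx = indices[i];
        if (idx < table.size() * 16)
        {
            defaultValues[i] = table[idx / 16][idx % 16];
        }
    }
    return defaultValues;
}
```

With a two-row table, index 17 reads byte 1 of the second row in both functions, while index 40 yields 0 from `tableLookup` and the original default byte from `tableLookupExtension`.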
