[API Proposal]: Expose AVX10 converged vector ISA #98069

Closed
anthonycanino opened this issue Feb 6, 2024 · 18 comments
Labels: api-approved (API was approved in API review, it can be implemented), area-System.Runtime.Intrinsics, avx10 (Related to the AVX10 architecture)

Comments

Contributor

anthonycanino commented Feb 6, 2024

Background and motivation

Intel's latest AVX offering, AVX10, introduces a converged vector ISA that will be supported on all Intel processors. Please find full details in the AVX10 technical paper and AVX10 architectural specification.

For brevity, we highlight the following relevant features from the technical paper and specification:

  1. AVX10 will be a versioned ISA, where version N includes all instructions and features of versions N-1, N-2, etc.

  2. AVX10 supports multiple vector lengths, where CPUID exposes the max vector length available for a processor.

  3. AVX10.1 will support AVX512 and the following extensions for vector lengths 128, 256, and 512 bits:

    • AVX512F, AVX512BW, AVX512DQ, and AVX512CD

    • AVX512_VBMI, AVX512_IFMA

    • AVX512_VNNI

    • AVX512_BF16

    • AVX512_VPOPCNTDQ, AVX512_VBMI2, VAES, GFNI, VPCLMULQDQ, AVX512_BITALG

    • AVX512_FP16

  4. On future Performance-core (P-core) based processors, the maximum vector length will be 512 bits; on future Efficient-core (E-core) based processors, it will be 256 bits.
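
As a rough illustration of points 1 and 2, the sketch below decodes a version number and a maximum vector length from a raw register value. It assumes the enumeration layout from the AVX10 architecture specification (CPUID leaf 0x24, sub-leaf 0, with the version in EBX bits 7:0 and the 128/256/512-bit flags in EBX bits 16 to 18); those bit positions should be checked against the specification. The `Avx10Decoder` class and its `Decode` method are hypothetical names and are not part of the proposed API.

```csharp
using System;

// Hypothetical EBX value: version 1 with the 128-, 256- and 512-bit flags set.
uint ebx = 0x0007_0001;
var info = Avx10Decoder.Decode(ebx);
Console.WriteLine($"AVX10.{info.Version}, max vector length {info.MaxVectorLength} bits");

// Hypothetical helper, for illustration only.
public static class Avx10Decoder
{
    // Assumed layout of CPUID.(EAX=0x24, ECX=0):EBX
    //   bits 7:0 = AVX10 version number
    //   bit 16   = 128-bit vectors supported
    //   bit 17   = 256-bit vectors supported
    //   bit 18   = 512-bit vectors supported
    public static (int Version, int MaxVectorLength) Decode(uint ebx)
    {
        int version = (int)(ebx & 0xFF);

        // Check the widest flag first; 0 means no vector length was reported.
        int maxLength =
            (ebx & (1u << 18)) != 0 ? 512 :
            (ebx & (1u << 17)) != 0 ? 256 :
            (ebx & (1u << 16)) != 0 ? 128 : 0;

        return (version, maxLength);
    }
}
```

On real hardware the raw value could come from `X86Base.CpuId(0x24, 0).Ebx`, once CPUID reports that the AVX10 leaf is present.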

API Proposal

AVX10 Versioning

As part of AVX10 versioning, developers can expect that all features and capabilities of version N will be included in version N+1. As such, we propose that we capture this incremental versioning via class inheritance with the nomenclature Avx10vN:

class Avx10v1 
{
  
}

class Avx10v2 : Avx10v1 
{
  
}

class Avx10v3 : Avx10v2 
{
  
}

Developers will be able to continue to check for ISA support via Avx10vN.IsSupported as is currently done for specific .NET ISA APIs.
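
To show why inheritance matches the versioning rule, here is a minimal sketch with stand-in classes. `MockAvx10v1` and `MockAvx10v2` are hypothetical mocks with scalar bodies and hard-coded `IsSupported` values, and `Twice` is an invented member that does not correspond to any AVX10 instruction. The only point is that static members declared on a base class are reachable through the derived class name, so code guarded by a vN check can call everything from earlier versions.

```csharp
using System;

// Everything declared on the v1 stand-in is reachable through the v2 name.
if (MockAvx10v2.IsSupported)
{
    Console.WriteLine(MockAvx10v2.Abs(-5));    // inherited from MockAvx10v1
    Console.WriteLine(MockAvx10v2.Twice(21));  // declared on MockAvx10v2
}

// Hypothetical stand-in for the proposed Avx10v1 class.
public abstract class MockAvx10v1
{
    public static bool IsSupported => true; // hard-coded for the sketch

    // Scalar placeholder for a vector method.
    public static ulong Abs(long value) =>
        value < 0 ? unchecked((ulong)(-value)) : (ulong)value;
}

// Hypothetical stand-in for a later version; it adds members and keeps the old ones.
public abstract class MockAvx10v2 : MockAvx10v1
{
    public static new bool IsSupported => true; // hides the v1 property

    public static int Twice(int value) => value * 2;
}
```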

AVX10 Vector Length

As AVX10 allows implementations to support different maximum vector lengths, we propose the following nested class structure:

class Avx10v1
{
  public static Vector128<ulong> Abs(Vector128<long> v);

  class V256 
  { 
    public static Vector256<ulong> Abs(Vector256<long> v);
  }

  class V512 
  {
    public static Vector512<ulong> Abs(Vector512<long> v);
  }

}

Avx10v1.IsSupported returns true if 128-bit vectors are enabled via the CPUID vector length bits for AVX10. Because the AVX10 architecture specification states that the highest enumerated vector length implies support for all smaller vector lengths, a developer may check Avx10v1.V256.IsSupported and then safely use the 128-bit Avx10v1 methods as well.
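
The check order this implies can be sketched as follows. `SimAvx10v1`, its `SimulatedMaxVectorLength` property, and `Dispatcher` are hypothetical stand-ins that model the nested-class shape above; the real support checks would be answered by the runtime, not by a settable property.

```csharp
using System;

// Simulate an AVX10/256 part: 128- and 256-bit enabled, 512-bit not.
SimAvx10v1.SimulatedMaxVectorLength = 256;
Console.WriteLine(Dispatcher.PickVectorLength());

// Hypothetical stand-in that mirrors the proposed nested-class layout.
public static class SimAvx10v1
{
    public static int SimulatedMaxVectorLength { get; set; }

    public static bool IsSupported => SimulatedMaxVectorLength >= 128;

    public static class V256
    {
        public static bool IsSupported => SimulatedMaxVectorLength >= 256;
    }

    public static class V512
    {
        public static bool IsSupported => SimulatedMaxVectorLength >= 512;
    }
}

public static class Dispatcher
{
    // Probe widest-first; a wider length being supported implies the narrower ones.
    public static int PickVectorLength()
    {
        if (SimAvx10v1.V512.IsSupported) return 512;
        if (SimAvx10v1.V256.IsSupported) return 256;
        if (SimAvx10v1.IsSupported) return 128;
        return 0; // no AVX10 path; the caller falls back to other ISAs
    }
}
```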

AVX10 and AVX512

AVX10.1 represents a pre-enabling step for AVX10 and is the convergence of the previously listed AVX512 instruction sets, so it overlaps with the existing exposed .NET AVX512 ISA APIs. We propose that if Avx10v1.V512.IsSupported returns true, then the corresponding AVX512 APIs for the aforementioned extensions can safely be used:

Vector512<long> v1 = Vector512.Create((long)someParam);
if (Avx10v1.V512.IsSupported) {
  Vector512<ulong> v2 = Avx512F.Abs(v1);
  Vector512<double> v3 = Avx512DQ.ConvertToVector512Double(v2);
  // etc
}

For an AVX10/256 implementation, the subset of VL instructions for existing AVX512 instruction sets will be available without the presence of 512-bit support. As the existing AVX512VL (and the associated nested VL classes in .NET APIs) implies 512-bit support, we will expose the 128-bit VL instructions under the top-level Avx10v1 class and the 256-bit VL instructions under the V256 nested class. For clarity, it is possible to have Avx10v1.V256.IsSupported == true but Avx512F.IsSupported == false on an AVX10/256 implementation.
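
For comparison, this is roughly how a 256-bit VL instruction is guarded with the AVX512 APIs that exist today (a sketch assuming .NET 8 or later; `AbsExample` and its members are hypothetical names). On an AVX10/256 part the `Avx512F.VL` guard would be false, which is the gap that a check on Avx10v1.V256.IsSupported is meant to close.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

Vector256<long> input = Vector256.Create(-1L, 2L, -3L, 4L);
Console.WriteLine(AbsExample.Abs256(input));

public static class AbsExample
{
    public static Vector256<ulong> Abs256(Vector256<long> value)
    {
        // Today the 256-bit form lives under Avx512F.VL, which implies 512-bit support.
        if (Avx512F.VL.IsSupported)
        {
            return Avx512F.VL.Abs(value);
        }

        // Scalar fallback so the sketch runs on any hardware.
        return Vector256.Create(
            AbsScalar(value.GetElement(0)),
            AbsScalar(value.GetElement(1)),
            AbsScalar(value.GetElement(2)),
            AbsScalar(value.GetElement(3)));
    }

    // Wrapping absolute value: long.MinValue maps to 0x8000000000000000.
    private static ulong AbsScalar(long x) =>
        x < 0 ? unchecked((ulong)(-x)) : (ulong)x;
}
```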

Avx10v1 API

Given the aforementioned API discussion, we propose the following API, where all current AVX512VL family instructions are consolidated under the top level Avx10v1 and V256 nested classes:

namespace System.Runtime.Intrinsics.X86;

public abstract class Avx10v1 : Avx2
{
    public static new bool IsSupported { get; }
    
    /// From AVX512F VL
    public static Vector128<ulong> Abs(Vector128<long> value);
    public static Vector128<int> AlignRight32(Vector128<int> left, Vector128<int> right, [ConstantExpected] byte mask);
    public static Vector128<uint> AlignRight32(Vector128<uint> left, Vector128<uint> right, [ConstantExpected] byte mask);
    public static Vector128<long> AlignRight64(Vector128<long> left, Vector128<long> right, [ConstantExpected] byte mask);
    public static Vector128<ulong> AlignRight64(Vector128<ulong> left, Vector128<ulong> right, [ConstantExpected] byte mask);
    public static Vector128<int> CompareGreaterThanOrEqual(Vector128<int> left, Vector128<int> right);
    public static Vector128<int> CompareLessThan(Vector128<int> left, Vector128<int> right);
    public static Vector128<int> CompareLessThanOrEqual(Vector128<int> left, Vector128<int> right);
    public static Vector128<int> CompareNotEqual(Vector128<int> left, Vector128<int> right);
    public static Vector128<long> CompareGreaterThanOrEqual(Vector128<long> left, Vector128<long> right);
    public static Vector128<long> CompareLessThan(Vector128<long> left, Vector128<long> right);
    public static Vector128<long> CompareLessThanOrEqual(Vector128<long> left, Vector128<long> right);
    public static Vector128<long> CompareNotEqual(Vector128<long> left, Vector128<long> right);
    public static Vector128<uint> CompareGreaterThan(Vector128<uint> left, Vector128<uint> right);
    public static Vector128<uint> CompareGreaterThanOrEqual(Vector128<uint> left, Vector128<uint> right);
    public static Vector128<uint> CompareLessThan(Vector128<uint> left, Vector128<uint> right);
    public static Vector128<uint> CompareLessThanOrEqual(Vector128<uint> left, Vector128<uint> right);
    public static Vector128<uint> CompareNotEqual(Vector128<uint> left, Vector128<uint> right);
    public static Vector128<ulong> CompareGreaterThan(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<ulong> CompareGreaterThanOrEqual(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<ulong> CompareLessThan(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<ulong> CompareLessThanOrEqual(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<ulong> CompareNotEqual(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<byte> ConvertToVector128Byte(Vector128<int> value);
    public static Vector128<byte> ConvertToVector128Byte(Vector128<long> value);
    public static Vector128<byte> ConvertToVector128Byte(Vector128<uint> value);
    public static Vector128<byte> ConvertToVector128Byte(Vector128<ulong> value);
    public static Vector128<byte> ConvertToVector128ByteWithSaturation(Vector128<uint> value);
    public static Vector128<byte> ConvertToVector128ByteWithSaturation(Vector128<ulong> value);
    public static Vector128<double> ConvertToVector128Double(Vector128<uint> value);
    public static Vector128<short> ConvertToVector128Int16(Vector128<int> value);
    public static Vector128<short> ConvertToVector128Int16(Vector128<long> value);
    public static Vector128<short> ConvertToVector128Int16(Vector128<uint> value);
    public static Vector128<short> ConvertToVector128Int16(Vector128<ulong> value);
    public static Vector128<short> ConvertToVector128Int16WithSaturation(Vector128<int> value);
    public static Vector128<short> ConvertToVector128Int16WithSaturation(Vector128<long> value);
    public static Vector128<int> ConvertToVector128Int32(Vector128<long> value);
    public static Vector128<int> ConvertToVector128Int32(Vector128<ulong> value);
    public static Vector128<int> ConvertToVector128Int32WithSaturation(Vector128<long> value);
    public static Vector128<sbyte> ConvertToVector128SByte(Vector128<int> value);
    public static Vector128<sbyte> ConvertToVector128SByte(Vector128<long> value);
    public static Vector128<sbyte> ConvertToVector128SByte(Vector128<uint> value);
    public static Vector128<sbyte> ConvertToVector128SByte(Vector128<ulong> value);
    public static Vector128<sbyte> ConvertToVector128SByteWithSaturation(Vector128<int> value);
    public static Vector128<sbyte> ConvertToVector128SByteWithSaturation(Vector128<long> value);
    public static Vector128<float> ConvertToVector128Single(Vector128<uint> value);
    public static Vector128<ushort> ConvertToVector128UInt16(Vector128<int> value);
    public static Vector128<ushort> ConvertToVector128UInt16(Vector128<long> value);
    public static Vector128<ushort> ConvertToVector128UInt16(Vector128<uint> value);
    public static Vector128<ushort> ConvertToVector128UInt16(Vector128<ulong> value);
    public static Vector128<ushort> ConvertToVector128UInt16WithSaturation(Vector128<uint> value);
    public static Vector128<ushort> ConvertToVector128UInt16WithSaturation(Vector128<ulong> value);
    public static Vector128<uint> ConvertToVector128UInt32(Vector128<long> value);
    public static Vector128<uint> ConvertToVector128UInt32(Vector128<ulong> value);
    public static Vector128<uint> ConvertToVector128UInt32(Vector128<float> value);
    public static Vector128<uint> ConvertToVector128UInt32(Vector128<double> value);
    public static Vector128<uint> ConvertToVector128UInt32WithSaturation(Vector128<ulong> value);
    public static Vector128<uint> ConvertToVector128UInt32WithTruncation(Vector128<float> value);
    public static Vector128<uint> ConvertToVector128UInt32WithTruncation(Vector128<double> value);
    public static Vector128<float> Fixup(Vector128<float> left, Vector128<float> right, Vector128<int> table, [ConstantExpected] byte control);
    public static Vector128<double> Fixup(Vector128<double> left, Vector128<double> right, Vector128<long> table, [ConstantExpected] byte control);
    public static Vector128<float> GetExponent(Vector128<float> value);
    public static Vector128<double> GetExponent(Vector128<double> value);
    public static Vector128<float> GetMantissa(Vector128<float> value, [ConstantExpected(Max = (byte)(0x0F))] byte control);
    public static Vector128<double> GetMantissa(Vector128<double> value, [ConstantExpected(Max = (byte)(0x0F))] byte control);
    public static Vector128<long> Max(Vector128<long> left, Vector128<long> right);
    public static Vector128<ulong> Max(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<long> Min(Vector128<long> left, Vector128<long> right);
    public static Vector128<ulong> Min(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<long> PermuteVar2x64x2(Vector128<long> lower, Vector128<long> indices, Vector128<long> upper);
    public static Vector128<ulong> PermuteVar2x64x2(Vector128<ulong> lower, Vector128<ulong> indices, Vector128<ulong> upper);
    public static Vector128<double> PermuteVar2x64x2(Vector128<double> lower, Vector128<long> indices, Vector128<double> upper);
    public static Vector128<int> PermuteVar4x32x2(Vector128<int> lower, Vector128<int> indices, Vector128<int> upper);
    public static Vector128<uint> PermuteVar4x32x2(Vector128<uint> lower, Vector128<uint> indices, Vector128<uint> upper);
    public static Vector128<float> PermuteVar4x32x2(Vector128<float> lower, Vector128<int> indices, Vector128<float> upper);
    public static Vector128<float> Reciprocal14(Vector128<float> value);
    public static Vector128<double> Reciprocal14(Vector128<double> value);
    public static Vector128<float> ReciprocalSqrt14(Vector128<float> value);
    public static Vector128<double> ReciprocalSqrt14(Vector128<double> value);
    public static Vector128<int> RotateLeft(Vector128<int> value, [ConstantExpected] byte count);
    public static Vector128<uint> RotateLeft(Vector128<uint> value, [ConstantExpected] byte count);
    public static Vector128<long> RotateLeft(Vector128<long> value, [ConstantExpected] byte count);
    public static Vector128<ulong> RotateLeft(Vector128<ulong> value, [ConstantExpected] byte count);
    public static Vector128<int> RotateLeftVariable(Vector128<int> value, Vector128<uint> count);
    public static Vector128<uint> RotateLeftVariable(Vector128<uint> value, Vector128<uint> count);
    public static Vector128<long> RotateLeftVariable(Vector128<long> value, Vector128<ulong> count);
    public static Vector128<ulong> RotateLeftVariable(Vector128<ulong> value, Vector128<ulong> count);
    public static Vector128<int> RotateRight(Vector128<int> value, [ConstantExpected] byte count);
    public static Vector128<uint> RotateRight(Vector128<uint> value, [ConstantExpected] byte count);
    public static Vector128<long> RotateRight(Vector128<long> value, [ConstantExpected] byte count);
    public static Vector128<ulong> RotateRight(Vector128<ulong> value, [ConstantExpected] byte count);
    public static Vector128<int> RotateRightVariable(Vector128<int> value, Vector128<uint> count);
    public static Vector128<uint> RotateRightVariable(Vector128<uint> value, Vector128<uint> count);
    public static Vector128<long> RotateRightVariable(Vector128<long> value, Vector128<ulong> count);
    public static Vector128<ulong> RotateRightVariable(Vector128<ulong> value, Vector128<ulong> count);
    public static Vector128<float> RoundScale(Vector128<float> value, [ConstantExpected] byte control);
    public static Vector128<double> RoundScale(Vector128<double> value, [ConstantExpected] byte control);
    public static Vector128<float> Scale(Vector128<float> left, Vector128<float> right);
    public static Vector128<double> Scale(Vector128<double> left, Vector128<double> right);
    public static Vector128<long> ShiftRightArithmetic(Vector128<long> value, Vector128<long> count);
    public static Vector128<long> ShiftRightArithmetic(Vector128<long> value, [ConstantExpected] byte count);
    public static Vector128<long> ShiftRightArithmeticVariable(Vector128<long> value, Vector128<ulong> count);
    public static Vector128<sbyte> TernaryLogic(Vector128<sbyte> a, Vector128<sbyte> b, Vector128<sbyte> c, [ConstantExpected] byte control);
    public static Vector128<byte> TernaryLogic(Vector128<byte> a, Vector128<byte> b, Vector128<byte> c, [ConstantExpected] byte control);
    public static Vector128<short> TernaryLogic(Vector128<short> a, Vector128<short> b, Vector128<short> c, [ConstantExpected] byte control);
    public static Vector128<ushort> TernaryLogic(Vector128<ushort> a, Vector128<ushort> b, Vector128<ushort> c, [ConstantExpected] byte control);
    public static Vector128<int> TernaryLogic(Vector128<int> a, Vector128<int> b, Vector128<int> c, [ConstantExpected] byte control);
    public static Vector128<uint> TernaryLogic(Vector128<uint> a, Vector128<uint> b, Vector128<uint> c, [ConstantExpected] byte control);
    public static Vector128<long> TernaryLogic(Vector128<long> a, Vector128<long> b, Vector128<long> c, [ConstantExpected] byte control);
    public static Vector128<ulong> TernaryLogic(Vector128<ulong> a, Vector128<ulong> b, Vector128<ulong> c, [ConstantExpected] byte control);
    public static Vector128<float> TernaryLogic(Vector128<float> a, Vector128<float> b, Vector128<float> c, [ConstantExpected] byte control);
    public static Vector128<double> TernaryLogic(Vector128<double> a, Vector128<double> b, Vector128<double> c, [ConstantExpected] byte control);

    /// From AVX512BW VL
    public static Vector128<byte> CompareGreaterThan(Vector128<byte> left, Vector128<byte> right);
    public static Vector128<byte> CompareGreaterThanOrEqual(Vector128<byte> left, Vector128<byte> right);
    public static Vector128<byte> CompareLessThan(Vector128<byte> left, Vector128<byte> right);
    public static Vector128<byte> CompareLessThanOrEqual(Vector128<byte> left, Vector128<byte> right);
    public static Vector128<byte> CompareNotEqual(Vector128<byte> left, Vector128<byte> right);
    public static Vector128<short> CompareGreaterThanOrEqual(Vector128<short> left, Vector128<short> right);
    public static Vector128<short> CompareLessThan(Vector128<short> left, Vector128<short> right);
    public static Vector128<short> CompareLessThanOrEqual(Vector128<short> left, Vector128<short> right);
    public static Vector128<short> CompareNotEqual(Vector128<short> left, Vector128<short> right);
    public static Vector128<sbyte> CompareGreaterThanOrEqual(Vector128<sbyte> left, Vector128<sbyte> right);
    public static Vector128<sbyte> CompareLessThan(Vector128<sbyte> left, Vector128<sbyte> right);
    public static Vector128<sbyte> CompareLessThanOrEqual(Vector128<sbyte> left, Vector128<sbyte> right);
    public static Vector128<sbyte> CompareNotEqual(Vector128<sbyte> left, Vector128<sbyte> right);
    public static Vector128<ushort> CompareGreaterThan(Vector128<ushort> left, Vector128<ushort> right);
    public static Vector128<ushort> CompareGreaterThanOrEqual(Vector128<ushort> left, Vector128<ushort> right);
    public static Vector128<ushort> CompareLessThan(Vector128<ushort> left, Vector128<ushort> right);
    public static Vector128<ushort> CompareLessThanOrEqual(Vector128<ushort> left, Vector128<ushort> right);
    public static Vector128<ushort> CompareNotEqual(Vector128<ushort> left, Vector128<ushort> right);
    public static Vector128<byte> ConvertToVector128Byte(Vector128<short> value);
    public static Vector128<byte> ConvertToVector128Byte(Vector128<ushort> value);
    public static Vector128<byte> ConvertToVector128ByteWithSaturation(Vector128<ushort> value);
    public static Vector128<sbyte> ConvertToVector128SByte(Vector128<short> value);
    public static Vector128<sbyte> ConvertToVector128SByte(Vector128<ushort> value);
    public static Vector128<sbyte> ConvertToVector128SByteWithSaturation(Vector128<short> value);
    public static Vector128<short> PermuteVar8x16(Vector128<short> left, Vector128<short> control);
    public static Vector128<ushort> PermuteVar8x16(Vector128<ushort> left, Vector128<ushort> control);
    public static Vector128<short> PermuteVar8x16x2(Vector128<short> lower, Vector128<short> indices, Vector128<short> upper);
    public static Vector128<ushort> PermuteVar8x16x2(Vector128<ushort> lower, Vector128<ushort> indices, Vector128<ushort> upper);
    public static Vector128<short> ShiftLeftLogicalVariable(Vector128<short> value, Vector128<ushort> count);
    public static Vector128<ushort> ShiftLeftLogicalVariable(Vector128<ushort> value, Vector128<ushort> count);
    public static Vector128<short> ShiftRightArithmeticVariable(Vector128<short> value, Vector128<ushort> count);
    public static Vector128<short> ShiftRightLogicalVariable(Vector128<short> value, Vector128<ushort> count);
    public static Vector128<ushort> ShiftRightLogicalVariable(Vector128<ushort> value, Vector128<ushort> count);
    public static Vector128<ushort> SumAbsoluteDifferencesInBlock32(Vector128<byte> left, Vector128<byte> right, [ConstantExpected] byte control);

    /// From AVX512CD VL
    public static Vector128<int> DetectConflicts(Vector128<int> value);
    public static Vector128<uint> DetectConflicts(Vector128<uint> value);
    public static Vector128<long> DetectConflicts(Vector128<long> value);
    public static Vector128<ulong> DetectConflicts(Vector128<ulong> value);
    public static Vector128<int> LeadingZeroCount(Vector128<int> value);
    public static Vector128<uint> LeadingZeroCount(Vector128<uint> value);
    public static Vector128<long> LeadingZeroCount(Vector128<long> value);
    public static Vector128<ulong> LeadingZeroCount(Vector128<ulong> value);

    /// From AVX512DQ VL
    public static Vector128<int> BroadcastPairScalarToVector128(Vector128<int> value);
    public static Vector128<uint> BroadcastPairScalarToVector128(Vector128<uint> value);
    public static Vector128<double> ConvertToVector128Double(Vector128<long> value);
    public static Vector128<double> ConvertToVector128Double(Vector128<ulong> value);
    public static Vector128<long> ConvertToVector128Int64(Vector128<float> value);
    public static Vector128<long> ConvertToVector128Int64(Vector128<double> value);
    public static Vector128<long> ConvertToVector128Int64WithTruncation(Vector128<float> value);
    public static Vector128<long> ConvertToVector128Int64WithTruncation(Vector128<double> value);
    public static Vector128<float> ConvertToVector128Single(Vector128<long> value);
    public static Vector128<float> ConvertToVector128Single(Vector128<ulong> value);
    public static Vector128<ulong> ConvertToVector128UInt64(Vector128<float> value);
    public static Vector128<ulong> ConvertToVector128UInt64(Vector128<double> value);
    public static Vector128<ulong> ConvertToVector128UInt64WithTruncation(Vector128<float> value);
    public static Vector128<ulong> ConvertToVector128UInt64WithTruncation(Vector128<double> value);
    public static Vector128<long> MultiplyLow(Vector128<long> left, Vector128<long> right);
    public static Vector128<ulong> MultiplyLow(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<float> Range(Vector128<float> left, Vector128<float> right, [ConstantExpected(Max = (byte)(0x0F))] byte control);
    public static Vector128<double> Range(Vector128<double> left, Vector128<double> right, [ConstantExpected(Max = (byte)(0x0F))] byte control);
    public static Vector128<float> Reduce(Vector128<float> value, [ConstantExpected] byte control);
    public static Vector128<double> Reduce(Vector128<double> value, [ConstantExpected] byte control);

    /// From AVX512_VBMI VL
    public static Vector128<sbyte> PermuteVar16x8(Vector128<sbyte> left, Vector128<sbyte> control);
    public static Vector128<byte> PermuteVar16x8(Vector128<byte> left, Vector128<byte> control);
    public static Vector128<byte> PermuteVar16x8x2(Vector128<byte> lower, Vector128<byte> indices, Vector128<byte> upper);
    public static Vector128<sbyte> PermuteVar16x8x2(Vector128<sbyte> lower, Vector128<sbyte> indices, Vector128<sbyte> upper);
       
    public abstract class V256 : Avx2
    {
        public static new bool IsSupported { get; }
        
        /// From AVX512F VL
        public static Vector256<ulong> Abs(Vector256<long> value);
        public static Vector256<int> AlignRight32(Vector256<int> left, Vector256<int> right, [ConstantExpected] byte mask);
        public static Vector256<uint> AlignRight32(Vector256<uint> left, Vector256<uint> right, [ConstantExpected] byte mask);
        public static Vector256<long> AlignRight64(Vector256<long> left, Vector256<long> right, [ConstantExpected] byte mask);
        public static Vector256<ulong> AlignRight64(Vector256<ulong> left, Vector256<ulong> right, [ConstantExpected] byte mask);
        public static Vector256<int> CompareGreaterThanOrEqual(Vector256<int> left, Vector256<int> right);
        public static Vector256<int> CompareLessThan(Vector256<int> left, Vector256<int> right);
        public static Vector256<int> CompareLessThanOrEqual(Vector256<int> left, Vector256<int> right);
        public static Vector256<int> CompareNotEqual(Vector256<int> left, Vector256<int> right);
        public static Vector256<long> CompareGreaterThanOrEqual(Vector256<long> left, Vector256<long> right);
        public static Vector256<long> CompareLessThan(Vector256<long> left, Vector256<long> right);
        public static Vector256<long> CompareLessThanOrEqual(Vector256<long> left, Vector256<long> right);
        public static Vector256<long> CompareNotEqual(Vector256<long> left, Vector256<long> right);
        public static Vector256<uint> CompareGreaterThan(Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareGreaterThanOrEqual(Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareLessThan(Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareLessThanOrEqual(Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareNotEqual(Vector256<uint> left, Vector256<uint> right);
        public static Vector256<ulong> CompareGreaterThan(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareGreaterThanOrEqual(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareLessThan(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareLessThanOrEqual(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareNotEqual(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector128<byte> ConvertToVector128Byte(Vector256<int> value);
        public static Vector128<byte> ConvertToVector128Byte(Vector256<long> value);
        public static Vector128<byte> ConvertToVector128Byte(Vector256<uint> value);
        public static Vector128<byte> ConvertToVector128Byte(Vector256<ulong> value);
        public static Vector128<byte> ConvertToVector128ByteWithSaturation(Vector256<uint> value);
        public static Vector128<byte> ConvertToVector128ByteWithSaturation(Vector256<ulong> value);
        public static Vector128<short> ConvertToVector128Int16(Vector256<int> value);
        public static Vector128<short> ConvertToVector128Int16(Vector256<long> value);
        public static Vector128<short> ConvertToVector128Int16(Vector256<uint> value);
        public static Vector128<short> ConvertToVector128Int16(Vector256<ulong> value);
        public static Vector128<short> ConvertToVector128Int16WithSaturation(Vector256<int> value);
        public static Vector128<short> ConvertToVector128Int16WithSaturation(Vector256<long> value);
        public static Vector128<int> ConvertToVector128Int32(Vector256<long> value);
        public static Vector128<int> ConvertToVector128Int32(Vector256<ulong> value);
        public static Vector128<int> ConvertToVector128Int32WithSaturation(Vector256<long> value);
        public static Vector128<sbyte> ConvertToVector128SByte(Vector256<int> value);
        public static Vector128<sbyte> ConvertToVector128SByte(Vector256<long> value);
        public static Vector128<sbyte> ConvertToVector128SByte(Vector256<uint> value);
        public static Vector128<sbyte> ConvertToVector128SByte(Vector256<ulong> value);
        public static Vector128<sbyte> ConvertToVector128SByteWithSaturation(Vector256<int> value);
        public static Vector128<sbyte> ConvertToVector128SByteWithSaturation(Vector256<long> value);
        public static Vector128<ushort> ConvertToVector128UInt16(Vector256<int> value);
        public static Vector128<ushort> ConvertToVector128UInt16(Vector256<long> value);
        public static Vector128<ushort> ConvertToVector128UInt16(Vector256<uint> value);
        public static Vector128<ushort> ConvertToVector128UInt16(Vector256<ulong> value);
        public static Vector128<ushort> ConvertToVector128UInt16WithSaturation(Vector256<uint> value);
        public static Vector128<ushort> ConvertToVector128UInt16WithSaturation(Vector256<ulong> value);
        public static Vector128<uint> ConvertToVector128UInt32(Vector256<long> value);
        public static Vector128<uint> ConvertToVector128UInt32(Vector256<ulong> value);
        public static Vector128<uint> ConvertToVector128UInt32(Vector256<double> value);
        public static Vector128<uint> ConvertToVector128UInt32WithSaturation(Vector256<ulong> value);
        public static Vector128<uint> ConvertToVector128UInt32WithTruncation(Vector256<double> value);
        public static Vector256<double> ConvertToVector256Double(Vector128<uint> value);
        public static Vector256<float> ConvertToVector256Single(Vector256<uint> value);
        public static Vector256<uint> ConvertToVector256UInt32(Vector256<float> value);
        public static Vector256<uint> ConvertToVector256UInt32WithTruncation(Vector256<float> value);
        public static Vector256<float> Fixup(Vector256<float> left, Vector256<float> right, Vector256<int> table, [ConstantExpected] byte control);
        public static Vector256<double> Fixup(Vector256<double> left, Vector256<double> right, Vector256<long> table, [ConstantExpected] byte control);
        public static Vector256<float> GetExponent(Vector256<float> value);
        public static Vector256<double> GetExponent(Vector256<double> value);
        public static Vector256<float> GetMantissa(Vector256<float> value, [ConstantExpected(Max = (byte)(0x0F))] byte control);
        public static Vector256<double> GetMantissa(Vector256<double> value, [ConstantExpected(Max = (byte)(0x0F))] byte control);
        public static Vector256<long> Max(Vector256<long> left, Vector256<long> right);
        public static Vector256<ulong> Max(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<long> Min(Vector256<long> left, Vector256<long> right);
        public static Vector256<ulong> Min(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<long> PermuteVar4x64(Vector256<long> value, Vector256<long> control);
        public static Vector256<ulong> PermuteVar4x64(Vector256<ulong> value, Vector256<ulong> control);
        public static Vector256<double> PermuteVar4x64(Vector256<double> value, Vector256<long> control);
        public static Vector256<long> PermuteVar4x64x2(Vector256<long> lower, Vector256<long> indices, Vector256<long> upper);
        public static Vector256<ulong> PermuteVar4x64x2(Vector256<ulong> lower, Vector256<ulong> indices, Vector256<ulong> upper);
        public static Vector256<double> PermuteVar4x64x2(Vector256<double> lower, Vector256<long> indices, Vector256<double> upper);
        public static Vector256<int> PermuteVar8x32x2(Vector256<int> lower, Vector256<int> indices, Vector256<int> upper);
        public static Vector256<uint> PermuteVar8x32x2(Vector256<uint> lower, Vector256<uint> indices, Vector256<uint> upper);
        public static Vector256<float> PermuteVar8x32x2(Vector256<float> lower, Vector256<int> indices, Vector256<float> upper);
        public static Vector256<float> Reciprocal14(Vector256<float> value);
        public static Vector256<double> Reciprocal14(Vector256<double> value);
        public static Vector256<float> ReciprocalSqrt14(Vector256<float> value);
        public static Vector256<double> ReciprocalSqrt14(Vector256<double> value);
        public static Vector256<int> RotateLeft(Vector256<int> value, [ConstantExpected] byte count);
        public static Vector256<uint> RotateLeft(Vector256<uint> value, [ConstantExpected] byte count);
        public static Vector256<long> RotateLeft(Vector256<long> value, [ConstantExpected] byte count);
        public static Vector256<ulong> RotateLeft(Vector256<ulong> value, [ConstantExpected] byte count);
        public static Vector256<int> RotateLeftVariable(Vector256<int> value, Vector256<uint> count);
        public static Vector256<uint> RotateLeftVariable(Vector256<uint> value, Vector256<uint> count);
        public static Vector256<long> RotateLeftVariable(Vector256<long> value, Vector256<ulong> count);
        public static Vector256<ulong> RotateLeftVariable(Vector256<ulong> value, Vector256<ulong> count);
        public static Vector256<int> RotateRight(Vector256<int> value, [ConstantExpected] byte count);
        public static Vector256<uint> RotateRight(Vector256<uint> value, [ConstantExpected] byte count);
        public static Vector256<long> RotateRight(Vector256<long> value, [ConstantExpected] byte count);
        public static Vector256<ulong> RotateRight(Vector256<ulong> value, [ConstantExpected] byte count);
        public static Vector256<int> RotateRightVariable(Vector256<int> value, Vector256<uint> count);
        public static Vector256<uint> RotateRightVariable(Vector256<uint> value, Vector256<uint> count);
        public static Vector256<long> RotateRightVariable(Vector256<long> value, Vector256<ulong> count);
        public static Vector256<ulong> RotateRightVariable(Vector256<ulong> value, Vector256<ulong> count);
        public static Vector256<float> RoundScale(Vector256<float> value, [ConstantExpected] byte control);
        public static Vector256<double> RoundScale(Vector256<double> value, [ConstantExpected] byte control);
        public static Vector256<float> Scale(Vector256<float> left, Vector256<float> right);
        public static Vector256<double> Scale(Vector256<double> left, Vector256<double> right);
        public static Vector256<long> ShiftRightArithmetic(Vector256<long> value, Vector128<long> count);
        public static Vector256<long> ShiftRightArithmetic(Vector256<long> value, [ConstantExpected] byte count);
        public static Vector256<long> ShiftRightArithmeticVariable(Vector256<long> value, Vector256<ulong> count);
        public static Vector256<double> Shuffle2x128(Vector256<double> left, Vector256<double> right, [ConstantExpected] byte control);
        public static Vector256<int> Shuffle2x128(Vector256<int> left, Vector256<int> right, [ConstantExpected] byte control);
        public static Vector256<long> Shuffle2x128(Vector256<long> left, Vector256<long> right, [ConstantExpected] byte control);
        public static Vector256<float> Shuffle2x128(Vector256<float> left, Vector256<float> right, [ConstantExpected] byte control);
        public static Vector256<uint> Shuffle2x128(Vector256<uint> left, Vector256<uint> right, [ConstantExpected] byte control);
        public static Vector256<ulong> Shuffle2x128(Vector256<ulong> left, Vector256<ulong> right, [ConstantExpected] byte control);
        public static Vector256<sbyte> TernaryLogic(Vector256<sbyte> a, Vector256<sbyte> b, Vector256<sbyte> c, [ConstantExpected] byte control);
        public static Vector256<byte> TernaryLogic(Vector256<byte> a, Vector256<byte> b, Vector256<byte> c, [ConstantExpected] byte control);
        public static Vector256<short> TernaryLogic(Vector256<short> a, Vector256<short> b, Vector256<short> c, [ConstantExpected] byte control);
        public static Vector256<ushort> TernaryLogic(Vector256<ushort> a, Vector256<ushort> b, Vector256<ushort> c, [ConstantExpected] byte control);
        public static Vector256<int> TernaryLogic(Vector256<int> a, Vector256<int> b, Vector256<int> c, [ConstantExpected] byte control);
        public static Vector256<uint> TernaryLogic(Vector256<uint> a, Vector256<uint> b, Vector256<uint> c, [ConstantExpected] byte control);
        public static Vector256<long> TernaryLogic(Vector256<long> a, Vector256<long> b, Vector256<long> c, [ConstantExpected] byte control);
        public static Vector256<ulong> TernaryLogic(Vector256<ulong> a, Vector256<ulong> b, Vector256<ulong> c, [ConstantExpected] byte control);
        public static Vector256<float> TernaryLogic(Vector256<float> a, Vector256<float> b, Vector256<float> c, [ConstantExpected] byte control);
        public static Vector256<double> TernaryLogic(Vector256<double> a, Vector256<double> b, Vector256<double> c, [ConstantExpected] byte control);

        /// From AVX512BW VL
        public static Vector256<byte> CompareGreaterThan(Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareGreaterThanOrEqual(Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareLessThan(Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareLessThanOrEqual(Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareNotEqual(Vector256<byte> left, Vector256<byte> right);
        public static Vector256<short> CompareGreaterThanOrEqual(Vector256<short> left, Vector256<short> right);
        public static Vector256<short> CompareLessThan(Vector256<short> left, Vector256<short> right);
        public static Vector256<short> CompareLessThanOrEqual(Vector256<short> left, Vector256<short> right);
        public static Vector256<short> CompareNotEqual(Vector256<short> left, Vector256<short> right);
        public static Vector256<sbyte> CompareGreaterThanOrEqual(Vector256<sbyte> left, Vector256<sbyte> right);
        public static Vector256<sbyte> CompareLessThan(Vector256<sbyte> left, Vector256<sbyte> right);
        public static Vector256<sbyte> CompareLessThanOrEqual(Vector256<sbyte> left, Vector256<sbyte> right);
        public static Vector256<sbyte> CompareNotEqual(Vector256<sbyte> left, Vector256<sbyte> right);
        public static Vector256<ushort> CompareGreaterThan(Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareGreaterThanOrEqual(Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareLessThan(Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareLessThanOrEqual(Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareNotEqual(Vector256<ushort> left, Vector256<ushort> right);
        public static Vector128<byte> ConvertToVector128Byte(Vector256<short> value);
        public static Vector128<byte> ConvertToVector128Byte(Vector256<ushort> value);
        public static Vector128<byte> ConvertToVector128ByteWithSaturation(Vector256<ushort> value);
        public static Vector128<sbyte> ConvertToVector128SByte(Vector256<short> value);
        public static Vector128<sbyte> ConvertToVector128SByte(Vector256<ushort> value);
        public static Vector128<sbyte> ConvertToVector128SByteWithSaturation(Vector256<short> value);
        public static Vector256<short> PermuteVar16x16(Vector256<short> left, Vector256<short> control);
        public static Vector256<ushort> PermuteVar16x16(Vector256<ushort> left, Vector256<ushort> control);
        public static Vector256<short> PermuteVar16x16x2(Vector256<short> lower, Vector256<short> indices, Vector256<short> upper);
        public static Vector256<ushort> PermuteVar16x16x2(Vector256<ushort> lower, Vector256<ushort> indices, Vector256<ushort> upper);
        public static Vector256<short> ShiftLeftLogicalVariable(Vector256<short> value, Vector256<ushort> count);
        public static Vector256<ushort> ShiftLeftLogicalVariable(Vector256<ushort> value, Vector256<ushort> count);
        public static Vector256<short> ShiftRightArithmeticVariable(Vector256<short> value, Vector256<ushort> count);
        public static Vector256<short> ShiftRightLogicalVariable(Vector256<short> value, Vector256<ushort> count);
        public static Vector256<ushort> ShiftRightLogicalVariable(Vector256<ushort> value, Vector256<ushort> count);
        public static Vector256<ushort> SumAbsoluteDifferencesInBlock32(Vector256<byte> left, Vector256<byte> right, [ConstantExpected] byte control);

        /// From AVX512CD VL
        public static Vector256<int> DetectConflicts(Vector256<int> value);
        public static Vector256<uint> DetectConflicts(Vector256<uint> value);
        public static Vector256<long> DetectConflicts(Vector256<long> value);
        public static Vector256<ulong> DetectConflicts(Vector256<ulong> value);
        public static Vector256<int> LeadingZeroCount(Vector256<int> value);
        public static Vector256<uint> LeadingZeroCount(Vector256<uint> value);
        public static Vector256<long> LeadingZeroCount(Vector256<long> value);
        public static Vector256<ulong> LeadingZeroCount(Vector256<ulong> value);
        
        /// From AVX512DQ VL
        public static Vector256<int> BroadcastPairScalarToVector256(Vector128<int> value);
        public static Vector256<uint> BroadcastPairScalarToVector256(Vector128<uint> value);
        public static Vector256<float> BroadcastPairScalarToVector256(Vector128<float> value);
        public static Vector128<float> ConvertToVector128Single(Vector256<long> value);
        public static Vector128<float> ConvertToVector128Single(Vector256<ulong> value);
        public static Vector256<double> ConvertToVector256Double(Vector256<long> value);
        public static Vector256<double> ConvertToVector256Double(Vector256<ulong> value);
        public static Vector256<long> ConvertToVector256Int64(Vector128<float> value);
        public static Vector256<long> ConvertToVector256Int64(Vector256<double> value);
        public static Vector256<long> ConvertToVector256Int64WithTruncation(Vector128<float> value);
        public static Vector256<long> ConvertToVector256Int64WithTruncation(Vector256<double> value);
        public static Vector256<ulong> ConvertToVector256UInt64(Vector128<float> value);
        public static Vector256<ulong> ConvertToVector256UInt64(Vector256<double> value);
        public static Vector256<ulong> ConvertToVector256UInt64WithTruncation(Vector128<float> value);
        public static Vector256<ulong> ConvertToVector256UInt64WithTruncation(Vector256<double> value);
        public static Vector256<long> MultiplyLow(Vector256<long> left, Vector256<long> right);
        public static Vector256<ulong> MultiplyLow(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<float> Range(Vector256<float> left, Vector256<float> right, [ConstantExpected(Max = (byte)(0x0F))] byte control);
        public static Vector256<double> Range(Vector256<double> left, Vector256<double> right, [ConstantExpected(Max = (byte)(0x0F))] byte control);
        public static Vector256<float> Reduce(Vector256<float> value, [ConstantExpected] byte control);
        public static Vector256<double> Reduce(Vector256<double> value, [ConstantExpected] byte control);

        /// From AVX512_Vbmi_VL
        public static Vector256<sbyte> PermuteVar32x8(Vector256<sbyte> left, Vector256<sbyte> control);
        public static Vector256<byte> PermuteVar32x8(Vector256<byte> left, Vector256<byte> control);
        public static Vector256<byte> PermuteVar32x8x2(Vector256<byte> lower, Vector256<byte> indices, Vector256<byte> upper);
        public static Vector256<sbyte> PermuteVar32x8x2(Vector256<sbyte> lower, Vector256<sbyte> indices, Vector256<sbyte> upper);
    }

    public abstract class V512 : Avx512F
    {
        public static new bool IsSupported { get; }
        // no changes; placeholder for future versions
    }
}

The surface area above consolidates the existing AVX512VL APIs in .NET. The full set of covered extensions, which we will continue to add to, comprises the VL subsets of the following:

  • AVX512F, AVX512BW, AVX512DQ, and AVX512CD

  • AVX512_VBMI, AVX512_IFMA

  • AVX512_VNNI

  • AVX512_BF16

  • AVX512_VPOPCNTDQ, AVX512_VBMI2, VAES, GFNI, VPCLMULQDQ, AVX512_BITALG

  • AVX512_FP16

Note that for Avx10v1 and Avx10v1.V256, AVX IFMA and AVX VNNI are implied, but we cannot inherit from them directly. Likewise, for V512, the above extensions are implied; those with existing .NET implementations are AVX512BW, AVX512DQ, AVX512CD, and AVX512Vbmi.

API Usage

Please see the aforementioned discussion.
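
As a rough illustration of the guard-then-call pattern, here is a minimal sketch that uses the 256-bit TernaryLogic overload as it exists today under Avx512F.VL. The class name Xor3Example and the helper Xor3 are made up for this example. Under this proposal the guard would presumably become Avx10v1.V256.IsSupported, but that is an assumption about the final shape, not something the sketch relies on.

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper class, for illustration only.
public static class Xor3Example
{
    // Computes a ^ b ^ c in every 32-bit lane.
    public static Vector256<uint> Xor3(Vector256<uint> a, Vector256<uint> b, Vector256<uint> c)
    {
        // Today the 256-bit form lives in Avx512F.VL. Under the proposal the
        // equivalent check is assumed to be Avx10v1.V256.IsSupported.
        if (Avx512F.VL.IsSupported)
        {
            // Truth table 0x96 encodes the three-input XOR.
            return Avx512F.VL.TernaryLogic(a, b, c, 0x96);
        }

        // Portable fallback for hardware without the VL subset.
        return a ^ b ^ c;
    }
}
```

The fallback keeps the helper usable on any machine, so callers never need to know which path was taken.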

Alternative Designs

Alternative Versioning

One alternative design we are considering is to expose all AVX10 methods under a single class and provide a form of versioning on the API, defined via method attributes:

class Avx10 
{
    public static bool VersionIsAtLeast(ulong version);

    [SupportedAvx10Version(1)]
    public static Vector128<ulong> Method1(Vector128<long> v);

    [SupportedAvx10Version(2)]
    public static Vector128<ulong> Method2(Vector128<long> v);

    class V256 
    {
      [SupportedAvx10Version(1)]
      public static Vector256<ulong> Method1(Vector256<long> v);

      [SupportedAvx10Version(2)]
      public static Vector256<ulong> Method2(Vector256<long> v);
    }

    class V512 
    {
      [SupportedAvx10Version(2)]
      public static Vector512<ulong> Method2(Vector512<long> v);
    }
}

The developer checks for the required Avx10 version once and may then use the methods of that version and of all preceding versions, without explicitly referring to additional classes such as Avx10v1, Avx10v2, etc.:

Vector256<ulong> v1 = ...;
if (Avx10.VersionIsAtLeast(2))
{
  v1 = Avx10.V256.Method1(v1);
  v1 = Avx10.V256.Method2(v1);
}

To help developers ensure they are using the API correctly, we propose to create an analyzer that will ensure that a proper version check is in place for all methods used, and flag a warning if a method is used outside of a proper version check.
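
To make the attribute-based idea more concrete, the following is a minimal sketch of how the version metadata could be declared and queried. Everything here is hypothetical: MockAvx10, DetectedVersion, and Avx10VersionInspector are stand-ins invented for this example, the method bodies are placeholders rather than intrinsics, and the runtime reflection lookup only approximates what a real analyzer would check at compile time.

```csharp
using System;
using System.Reflection;

// Hypothetical attribute mirroring [SupportedAvx10Version(n)] from the sketch above.
[AttributeUsage(AttributeTargets.Method)]
public sealed class SupportedAvx10VersionAttribute : Attribute
{
    public SupportedAvx10VersionAttribute(ulong version) => Version = version;
    public ulong Version { get; }
}

// Stand-in for the proposed Avx10 class; bodies are placeholders, not intrinsics.
public static class MockAvx10
{
    // Assumed to come from CPUID in a real implementation; settable here for illustration.
    public static ulong DetectedVersion { get; set; } = 1;

    public static bool VersionIsAtLeast(ulong version) => DetectedVersion >= version;

    [SupportedAvx10Version(1)]
    public static ulong Method1(long v) => (ulong)v;

    [SupportedAvx10Version(2)]
    public static ulong Method2(long v) => (ulong)v + 1;
}

public static class Avx10VersionInspector
{
    // Returns the minimum version a method declares, or null when it is not annotated.
    public static ulong? RequiredVersion(Type type, string methodName)
    {
        var method = type.GetMethod(methodName);
        return method?.GetCustomAttribute<SupportedAvx10VersionAttribute>()?.Version;
    }

    // True when a guard for guardVersion covers the method's declared requirement.
    public static bool IsCoveredBy(Type type, string methodName, ulong guardVersion)
    {
        ulong? required = RequiredVersion(type, methodName);
        return required.HasValue && guardVersion >= required.Value;
    }
}
```

An analyzer following this approach would flag a call to Method2 inside a VersionIsAtLeast(1) guard, since IsCoveredBy(typeof(MockAvx10), "Method2", 1) is false.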

V512 Surface Area

For developer ease of use, one alternative design is to duplicate the AVX512 512-bit API surface in the V512 class, so the developer does not have to explicitly reference the existing AVX512 APIs. Note that this requires duplicating a large amount of API surface.

namespace System.Runtime.Intrinsics.X86;

public abstract class Avx10v1 : Avx2
{
    // defined above
       
    public abstract class V256 
    {
       // defined above
    }

    public abstract class V512
    {
        public static new bool IsSupported { get; }

        /// From AVX512BW
        public static Vector512<byte> Abs(Vector512<sbyte> value);
        public static Vector512<ushort> Abs(Vector512<short> value);
        public static Vector512<sbyte> Add(Vector512<sbyte> left, Vector512<sbyte> right);
        public static Vector512<byte> Add(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<short> Add(Vector512<short> left, Vector512<short> right);
        public static Vector512<ushort> Add(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector512<sbyte> AddSaturate(Vector512<sbyte> left, Vector512<sbyte> right);
        public static Vector512<byte> AddSaturate(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<short> AddSaturate(Vector512<short> left, Vector512<short> right);
        public static Vector512<ushort> AddSaturate(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector512<sbyte> AlignRight(Vector512<sbyte> left, Vector512<sbyte> right, [ConstantExpected] byte mask);
        public static Vector512<byte> AlignRight(Vector512<byte> left, Vector512<byte> right, [ConstantExpected] byte mask);
        public static Vector512<byte> Average(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<ushort> Average(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector512<byte> BlendVariable(Vector512<byte> left, Vector512<byte> right, Vector512<byte> mask);
        public static Vector512<short> BlendVariable(Vector512<short> left, Vector512<short> right, Vector512<short> mask);
        public static Vector512<sbyte> BlendVariable(Vector512<sbyte> left, Vector512<sbyte> right, Vector512<sbyte> mask);
        public static Vector512<ushort> BlendVariable(Vector512<ushort> left, Vector512<ushort> right, Vector512<ushort> mask);
        public static Vector512<byte> BroadcastScalarToVector512(Vector128<byte> value);
        public static Vector512<sbyte> BroadcastScalarToVector512(Vector128<sbyte> value);
        public static Vector512<short> BroadcastScalarToVector512(Vector128<short> value);
        public static Vector512<ushort> BroadcastScalarToVector512(Vector128<ushort> value);
        public static Vector512<byte> CompareEqual(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<byte> CompareGreaterThan(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<byte> CompareGreaterThanOrEqual(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<byte> CompareLessThan(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<byte> CompareLessThanOrEqual(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<byte> CompareNotEqual(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<short> CompareEqual(Vector512<short> left, Vector512<short> right);
        public static Vector512<short> CompareGreaterThan(Vector512<short> left, Vector512<short> right);
        public static Vector512<short> CompareGreaterThanOrEqual(Vector512<short> left, Vector512<short> right);
        public static Vector512<short> CompareLessThan(Vector512<short> left, Vector512<short> right);
        public static Vector512<short> CompareLessThanOrEqual(Vector512<short> left, Vector512<short> right);
        public static Vector512<short> CompareNotEqual(Vector512<short> left, Vector512<short> right);
        public static Vector512<sbyte> CompareEqual(Vector512<sbyte> left, Vector512<sbyte> right);
        public static Vector512<sbyte> CompareGreaterThan(Vector512<sbyte> left, Vector512<sbyte> right);
        public static Vector512<sbyte> CompareGreaterThanOrEqual(Vector512<sbyte> left, Vector512<sbyte> right);
        public static Vector512<sbyte> CompareLessThan(Vector512<sbyte> left, Vector512<sbyte> right);
        public static Vector512<sbyte> CompareLessThanOrEqual(Vector512<sbyte> left, Vector512<sbyte> right);
        public static Vector512<sbyte> CompareNotEqual(Vector512<sbyte> left, Vector512<sbyte> right);
        public static Vector512<ushort> CompareEqual(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector512<ushort> CompareGreaterThan(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector512<ushort> CompareGreaterThanOrEqual(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector512<ushort> CompareLessThan(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector512<ushort> CompareLessThanOrEqual(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector512<ushort> CompareNotEqual(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector256<byte> ConvertToVector256Byte(Vector512<short> value);
        public static Vector256<byte> ConvertToVector256Byte(Vector512<ushort> value);
        public static Vector256<byte> ConvertToVector256ByteWithSaturation(Vector512<ushort> value);
        public static Vector256<sbyte> ConvertToVector256SByte(Vector512<short> value);
        public static Vector256<sbyte> ConvertToVector256SByte(Vector512<ushort> value);
        public static Vector256<sbyte> ConvertToVector256SByteWithSaturation(Vector512<short> value);
        public static Vector512<short> ConvertToVector512Int16(Vector256<sbyte> value);
        public static Vector512<short> ConvertToVector512Int16(Vector256<byte> value);
        public static Vector512<ushort> ConvertToVector512UInt16(Vector256<sbyte> value);
        public static Vector512<ushort> ConvertToVector512UInt16(Vector256<byte> value);
        public static new unsafe Vector512<sbyte> LoadVector512(sbyte* address);
        public static new unsafe Vector512<byte> LoadVector512(byte* address);
        public static new unsafe Vector512<short> LoadVector512(short* address);
        public static new unsafe Vector512<ushort> LoadVector512(ushort* address);
        public static Vector512<sbyte> Max(Vector512<sbyte> left, Vector512<sbyte> right);
        public static Vector512<byte> Max(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<short> Max(Vector512<short> left, Vector512<short> right);
        public static Vector512<ushort> Max(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector512<sbyte> Min(Vector512<sbyte> left, Vector512<sbyte> right);
        public static Vector512<byte> Min(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<short> Min(Vector512<short> left, Vector512<short> right);
        public static Vector512<ushort> Min(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector512<int> MultiplyAddAdjacent(Vector512<short> left, Vector512<short> right);
        public static Vector512<short> MultiplyAddAdjacent(Vector512<byte> left, Vector512<sbyte> right);
        public static Vector512<short> MultiplyHigh(Vector512<short> left, Vector512<short> right);
        public static Vector512<ushort> MultiplyHigh(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector512<short> MultiplyHighRoundScale(Vector512<short> left, Vector512<short> right);
        public static Vector512<short> MultiplyLow(Vector512<short> left, Vector512<short> right);
        public static Vector512<ushort> MultiplyLow(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector512<sbyte> PackSignedSaturate(Vector512<short> left, Vector512<short> right);
        public static Vector512<short> PackSignedSaturate(Vector512<int> left, Vector512<int> right);
        public static Vector512<byte> PackUnsignedSaturate(Vector512<short> left, Vector512<short> right);
        public static Vector512<ushort> PackUnsignedSaturate(Vector512<int> left, Vector512<int> right);
        public static Vector512<short> PermuteVar32x16(Vector512<short> left, Vector512<short> control);
        public static Vector512<ushort> PermuteVar32x16(Vector512<ushort> left, Vector512<ushort> control);
        public static Vector512<short> PermuteVar32x16x2(Vector512<short> lower, Vector512<short> indices, Vector512<short> upper);
        public static Vector512<ushort> PermuteVar32x16x2(Vector512<ushort> lower, Vector512<ushort> indices, Vector512<ushort> upper);
        public static Vector512<short> ShiftLeftLogical(Vector512<short> value, Vector128<short> count);
        public static Vector512<ushort> ShiftLeftLogical(Vector512<ushort> value, Vector128<ushort> count);
        public static Vector512<short> ShiftLeftLogical(Vector512<short> value, [ConstantExpected] byte count);
        public static Vector512<ushort> ShiftLeftLogical(Vector512<ushort> value, [ConstantExpected] byte count);
        public static Vector512<sbyte> ShiftLeftLogical128BitLane(Vector512<sbyte> value, [ConstantExpected] byte numBytes);
        public static Vector512<byte> ShiftLeftLogical128BitLane(Vector512<byte> value, [ConstantExpected] byte numBytes);
        public static Vector512<short> ShiftLeftLogicalVariable(Vector512<short> value, Vector512<ushort> count);
        public static Vector512<ushort> ShiftLeftLogicalVariable(Vector512<ushort> value, Vector512<ushort> count);
        public static Vector512<short> ShiftRightArithmetic(Vector512<short> value, Vector128<short> count);
        public static Vector512<short> ShiftRightArithmetic(Vector512<short> value, [ConstantExpected] byte count);
        public static Vector512<short> ShiftRightArithmeticVariable(Vector512<short> value, Vector512<ushort> count);
        public static Vector512<short> ShiftRightLogical(Vector512<short> value, Vector128<short> count);
        public static Vector512<ushort> ShiftRightLogical(Vector512<ushort> value, Vector128<ushort> count);
        public static Vector512<short> ShiftRightLogical(Vector512<short> value, [ConstantExpected] byte count);
        public static Vector512<ushort> ShiftRightLogical(Vector512<ushort> value, [ConstantExpected] byte count);
        public static Vector512<sbyte> ShiftRightLogical128BitLane(Vector512<sbyte> value, [ConstantExpected] byte numBytes);
        public static Vector512<byte> ShiftRightLogical128BitLane(Vector512<byte> value, [ConstantExpected] byte numBytes);
        public static Vector512<short> ShiftRightLogicalVariable(Vector512<short> value, Vector512<ushort> count);
        public static Vector512<ushort> ShiftRightLogicalVariable(Vector512<ushort> value, Vector512<ushort> count);
        public static Vector512<sbyte> Shuffle(Vector512<sbyte> value, Vector512<sbyte> mask);
        public static Vector512<byte> Shuffle(Vector512<byte> value, Vector512<byte> mask);
        public static Vector512<short> ShuffleHigh(Vector512<short> value, [ConstantExpected] byte control);
        public static Vector512<ushort> ShuffleHigh(Vector512<ushort> value, [ConstantExpected] byte control);
        public static Vector512<short> ShuffleLow(Vector512<short> value, [ConstantExpected] byte control);
        public static Vector512<ushort> ShuffleLow(Vector512<ushort> value, [ConstantExpected] byte control);
        public static new unsafe void Store(sbyte* address, Vector512<sbyte> source);
        public static new unsafe void Store(byte* address, Vector512<byte> source);
        public static new unsafe void Store(short* address, Vector512<short> source);
        public static new unsafe void Store(ushort* address, Vector512<ushort> source);
        public static Vector512<sbyte> Subtract(Vector512<sbyte> left, Vector512<sbyte> right);
        public static Vector512<byte> Subtract(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<short> Subtract(Vector512<short> left, Vector512<short> right);
        public static Vector512<ushort> Subtract(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector512<sbyte> SubtractSaturate(Vector512<sbyte> left, Vector512<sbyte> right);
        public static Vector512<short> SubtractSaturate(Vector512<short> left, Vector512<short> right);
        public static Vector512<byte> SubtractSaturate(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<ushort> SubtractSaturate(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector512<ushort> SumAbsoluteDifferences(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<ushort> SumAbsoluteDifferencesInBlock32(Vector512<byte> left, Vector512<byte> right, [ConstantExpected] byte control);
        public static Vector512<sbyte> UnpackHigh(Vector512<sbyte> left, Vector512<sbyte> right);
        public static Vector512<byte> UnpackHigh(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<short> UnpackHigh(Vector512<short> left, Vector512<short> right);
        public static Vector512<ushort> UnpackHigh(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector512<sbyte> UnpackLow(Vector512<sbyte> left, Vector512<sbyte> right);
        public static Vector512<byte> UnpackLow(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<short> UnpackLow(Vector512<short> left, Vector512<short> right);
        public static Vector512<ushort> UnpackLow(Vector512<ushort> left, Vector512<ushort> right);

        /// From AVX512CD
        public static Vector512<int> DetectConflicts(Vector512<int> value);
        public static Vector512<uint> DetectConflicts(Vector512<uint> value);
        public static Vector512<long> DetectConflicts(Vector512<long> value);
        public static Vector512<ulong> DetectConflicts(Vector512<ulong> value);
        public static Vector512<int> LeadingZeroCount(Vector512<int> value);
        public static Vector512<uint> LeadingZeroCount(Vector512<uint> value);
        public static Vector512<long> LeadingZeroCount(Vector512<long> value);
        public static Vector512<ulong> LeadingZeroCount(Vector512<ulong> value);

        /// From AVX512DQ
        public static Vector512<float> And(Vector512<float> left, Vector512<float> right);
        public static Vector512<double> And(Vector512<double> left, Vector512<double> right);
        public static Vector512<float> AndNot(Vector512<float> left, Vector512<float> right);
        public static Vector512<double> AndNot(Vector512<double> left, Vector512<double> right);
        public static Vector512<int> BroadcastPairScalarToVector512(Vector128<int> value);
        public static Vector512<uint> BroadcastPairScalarToVector512(Vector128<uint> value);
        public static Vector512<float> BroadcastPairScalarToVector512(Vector128<float> value);
        public static unsafe Vector512<long> BroadcastVector128ToVector512(long* address);
        public static unsafe Vector512<ulong> BroadcastVector128ToVector512(ulong* address);
        public static unsafe Vector512<double> BroadcastVector128ToVector512(double* address);
        public static unsafe Vector512<int> BroadcastVector256ToVector512(int* address);
        public static unsafe Vector512<uint> BroadcastVector256ToVector512(uint* address);
        public static unsafe Vector512<float> BroadcastVector256ToVector512(float* address);
        public static Vector256<float> ConvertToVector256Single(Vector512<long> value);
        public static Vector256<float> ConvertToVector256Single(Vector512<ulong> value);
        public static Vector512<double> ConvertToVector512Double(Vector512<long> value);
        public static Vector512<double> ConvertToVector512Double(Vector512<ulong> value);
        public static Vector512<long> ConvertToVector512Int64(Vector256<float> value);
        public static Vector512<long> ConvertToVector512Int64(Vector512<double> value);
        public static Vector512<long> ConvertToVector512Int64WithTruncation(Vector256<float> value);
        public static Vector512<long> ConvertToVector512Int64WithTruncation(Vector512<double> value);
        public static Vector512<ulong> ConvertToVector512UInt64(Vector256<float> value);
        public static Vector512<ulong> ConvertToVector512UInt64(Vector512<double> value);
        public static Vector512<ulong> ConvertToVector512UInt64WithTruncation(Vector256<float> value);
        public static Vector512<ulong> ConvertToVector512UInt64WithTruncation(Vector512<double> value);
        public static new Vector128<long> ExtractVector128(Vector512<long> value, [ConstantExpected] byte index);
        public static new Vector128<ulong> ExtractVector128(Vector512<ulong> value, [ConstantExpected] byte index);
        public static new Vector128<double> ExtractVector128(Vector512<double> value, [ConstantExpected] byte index);
        public static new Vector256<int> ExtractVector256(Vector512<int> value, [ConstantExpected] byte index);
        public static new Vector256<uint> ExtractVector256(Vector512<uint> value, [ConstantExpected] byte index);
        public static new Vector256<float> ExtractVector256(Vector512<float> value, [ConstantExpected] byte index);
        public static new Vector512<long> InsertVector128(Vector512<long> value, Vector128<long> data, [ConstantExpected] byte index);
        public static new Vector512<ulong> InsertVector128(Vector512<ulong> value, Vector128<ulong> data, [ConstantExpected] byte index);
        public static new Vector512<double> InsertVector128(Vector512<double> value, Vector128<double> data, [ConstantExpected] byte index);
        public static new Vector512<int> InsertVector256(Vector512<int> value, Vector256<int> data, [ConstantExpected] byte index);
        public static new Vector512<uint> InsertVector256(Vector512<uint> value, Vector256<uint> data, [ConstantExpected] byte index);
        public static new Vector512<float> InsertVector256(Vector512<float> value, Vector256<float> data, [ConstantExpected] byte index);
        public static Vector512<long> MultiplyLow(Vector512<long> left, Vector512<long> right);
        public static Vector512<ulong> MultiplyLow(Vector512<ulong> left, Vector512<ulong> right);
        public static Vector512<float> Or(Vector512<float> left, Vector512<float> right);
        public static Vector512<double> Or(Vector512<double> left, Vector512<double> right);
        public static Vector512<float> Range(Vector512<float> left, Vector512<float> right, [ConstantExpected(Max = (byte)(0x0F))] byte control);
        public static Vector512<double> Range(Vector512<double> left, Vector512<double> right, [ConstantExpected(Max = (byte)(0x0F))] byte control);
        public static Vector512<float> Reduce(Vector512<float> value, [ConstantExpected] byte control);
        public static Vector512<double> Reduce(Vector512<double> value, [ConstantExpected] byte control);
        public static Vector512<float> Xor(Vector512<float> left, Vector512<float> right);
        public static Vector512<double> Xor(Vector512<double> left, Vector512<double> right);

        /// From AVX512Vbmi
        public static Vector512<sbyte> PermuteVar64x8(Vector512<sbyte> left, Vector512<sbyte> control);
        public static Vector512<byte> PermuteVar64x8(Vector512<byte> left, Vector512<byte> control);
        public static Vector512<byte> PermuteVar64x8x2(Vector512<byte> lower, Vector512<byte> indices, Vector512<byte> upper);
        public static Vector512<sbyte> PermuteVar64x8x2(Vector512<sbyte> lower, Vector512<sbyte> indices, Vector512<sbyte> upper);
    }
}

Risks

No response

@anthonycanino anthonycanino added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Feb 6, 2024
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Feb 6, 2024
@ghost

ghost commented Feb 6, 2024

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

Intel's latest AVX offering, AVX10, introduces a converged vector ISA that will be supported on all Intel processors. Please find full details in the AVX10 technical paper and AVX10 architectural specification.

For brevity, we highlight the following relevant features from the technical paper and specification:

  1. AVX10 will be a versioned ISA, where a version N will include all instructions/features of version N-1, N-2 etc.

  2. AVX10 supports multiple vector lengths, where CPUID exposes the max vector length available for a processor.

  3. AVX10.1 will support AVX512 and the following extensions for vector lengths 128, 256, and 512 bits:

  • AVX512F, AVX512BW, AVX512DQ, and AVX512CD

  • AVX512_VBMI, AVX512_IFMA

  • AVX512_VNNI

  • AVX512_BF16

  • AVX512_VPOPCNTDQ, AVX512_VBMI2, VAES, GFNI, VPCLMULQDQ, AVX512_BITALG

  • AVX512_FP16

  4. On future Performance-cores (P-cores) based processors, the maximum vector length will be 512-bit; on future Efficient-cores (E-cores) based processors it will be 256-bit.

API Proposal

AVX10 Versioning

As part of AVX10 versioning, developers can expect that all features and capabilities of version N will be included in version N+1. As such, we propose that we capture this incremental versioning via class inheritance with the nomenclature Avx10vN:

class Avx10v1 
{
  
}

class Avx10v2 : Avx10v1 
{
  
}

class Avx10v3 : Avx10v2 
{
  
}

Developers will be able to continue to check for ISA support via Avx10vN.IsSupported as is currently done for specific .NET ISA APIs.
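A minimal sketch of how the inheritance-based versioning might read at a call site. Everything here is a hypothetical stand-in: the Avx10v1Sketch and Avx10v2Sketch classes, the CpuModel.Avx10Version field, and the AddV1/MulV2 methods use scalar placeholder bodies and are not real .NET APIs. Only the class shape follows the proposal.

```csharp
using System;

// Hypothetical: stands in for whatever the runtime would read from CPUID.
public static class CpuModel
{
    public static int Avx10Version = 1;
}

// Hypothetical stand-in for the proposed Avx10v1 class.
public abstract class Avx10v1Sketch
{
    public static bool IsSupported => CpuModel.Avx10Version >= 1;

    // Placeholder for an operation introduced in version 1.
    public static long AddV1(long a, long b) => a + b;
}

// Hypothetical stand-in for the proposed Avx10v2 class; it derives from v1.
public abstract class Avx10v2Sketch : Avx10v1Sketch
{
    // Hides the v1 property, mirroring the "static new" pattern used in the API listings.
    public static new bool IsSupported => CpuModel.Avx10Version >= 2;

    // Placeholder for an operation introduced in version 2.
    public static long MulV2(long a, long b) => a * b;
}

public static class VersioningDemo
{
    // Computes (a + b) * b, preferring the newest version that reports support.
    public static long Compute(long a, long b)
    {
        if (Avx10v2Sketch.IsSupported)
        {
            // AddV1 is reachable through the v2 type because v2 inherits from v1.
            return Avx10v2Sketch.MulV2(Avx10v2Sketch.AddV1(a, b), b);
        }
        if (Avx10v1Sketch.IsSupported)
        {
            return Avx10v1Sketch.AddV1(a, b) * b;
        }
        return (a + b) * b; // plain fallback
    }
}
```

A single check on the newest class needed is enough, since every earlier version's members are inherited by it.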

AVX10 Vector Length

As AVX10 allows implementations to support different maximum vector lengths, we propose the following nested class structure, one for each supported vector length:

class Avx10 
{
  class V128 
  { 
    public static Vector128<ulong> Abs(Vector128<long> v);
  }

  class V256 
  { 
    public static Vector256<ulong> Abs(Vector256<long> v);
  }

  class V512 
  {
    public static Vector512<ulong> Abs(Vector512<long> v);
  }

}

Avx10.V128.IsSupported() returns true if 128-bit vectors are enabled via the CPUID vector length bits for AVX10. As the AVX10 architecture specification states that the highest enumerated vector length implies all smaller vector lengths are supported, a developer may check Avx10.V256.IsSupported() and safely use Avx10.V128 methods.
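The implication rule above can be modelled in a few lines. This is a sketch only: Avx10LengthSketch and its maxVectorBits parameter are hypothetical, standing in for the maximum vector length that CPUID would enumerate; it is not how the runtime would implement the check.

```csharp
using System;

public static class Avx10LengthSketch
{
    // Returns true when a vector length is usable, given the highest
    // enumerated length (hypothetical input: 128, 256, or 512).
    public static bool Supports(int maxVectorBits, int requestedBits)
    {
        if (requestedBits != 128 && requestedBits != 256 && requestedBits != 512)
        {
            throw new ArgumentOutOfRangeException(nameof(requestedBits));
        }

        // The highest enumerated length implies every smaller one.
        return requestedBits <= maxVectorBits;
    }
}
```

For example, a part that enumerates 256 bits would answer true for 128 and 256 and false for 512.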

AVX10 and AVX512

AVX10.1 represents a pre-enabling step for AVX10 and is the convergence of the previously listed AVX512 instruction sets, so there is overlap with the existing exposed .NET AVX512 ISA APIs. We propose that if Avx10.V512.IsSupported() returns true, then the corresponding AVX512 APIs for the aforementioned extensions can safely be used:

Vector512<long> v1 = Vector512.Create((long)someParam);
if (Avx10.V512.IsSupported()) {
  Vector512<ulong> v2 = Avx512F.Abs(v1);
  Vector512<double> v3 = Avx512DQ.ConvertToVector512Double(v2);
  // etc
}

For an AVX10/256 implementation, the subset of VL instructions for existing AVX512 instruction sets will be available without the presence of 512-bit support. As existing AVX512VL (and the associated nested VL classes in .NET APIs) implies 512-bit, we will expose the V128 and V256 nested classes, which will contain existing and future VL instructions that operate on the associated vector length. For clarity, it is possible to have Avx10.V256.IsSupported() == true but Avx512F.IsSupported() == false on an AVX10/256 implementation.
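A sketch of where the proposed check would sit in a dispatch chain. It uses only the existing .NET 8 Avx512F and Avx512F.VL classes so that it compiles today; the Avx10 V256 check appears in a comment because that class is only proposed here. AbsDispatchSketch and AbsOfLong are hypothetical names.

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class AbsDispatchSketch
{
    // Absolute value of a 64-bit integer, using the widest path that reports support.
    public static ulong AbsOfLong(long x)
    {
        if (Avx512F.IsSupported)
        {
            // 512-bit path (an AVX512 part, or an AVX10/512 part under the proposal).
            return Avx512F.Abs(Vector512.Create(x)).GetElement(0);
        }

        // Under the proposal, an AVX10/256 part would be detected at this point
        // with a check such as the proposed Avx10.V256.IsSupported (hypothetical).
        // The existing VL class is used as a stand-in; as the text above notes,
        // it implies 512-bit support, so this branch is not expected to be
        // reached on current hardware.
        if (Avx512F.VL.IsSupported)
        {
            return Avx512F.VL.Abs(Vector256.Create(x)).GetElement(0);
        }

        // Scalar fallback; unchecked so that long.MinValue wraps instead of throwing.
        return unchecked((ulong)(x < 0 ? -x : x));
    }
}
```

The 256-bit branch is the case the V256 nested class is meant to make reachable when 512-bit support is absent.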

Avx10v1 API

Given the aforementioned API discussion, we propose the following API, where all current AVX512VL family instructions are consolidated under the V128 and V256 nested classes:

class Avx10v1 
{

    class V128 
    {
      /// From AVX512F VL
      public static Vector128<ulong> Abs(Vector128<long> value) => Abs(value);

      /// From AVX512BW VL
      public static Vector128<byte> CompareGreaterThan(Vector128<byte> left, Vector128<byte> right) => CompareGreaterThan(left, right);

      /// From AVX512DQ VL
      public static Vector128<float> Range(Vector128<float> left, Vector128<float> right, [ConstantExpected(Max = (byte)(0x0F))] byte control) => Range(left, right, control);
    }

    class V256 
    {
      /// From AVX512F VL
      public static Vector256<ulong> Abs(Vector256<long> value) => Abs(value);

      /// From AVX512BW VL
      public static Vector256<byte> CompareGreaterThan(Vector256<byte> left, Vector256<byte> right) => CompareGreaterThan(left, right);

      /// From AVX512DQ VL
      public static Vector256<float> Range(Vector256<float> left, Vector256<float> right, [ConstantExpected(Max = (byte)(0x0F))] byte control) => Range(left, right, control);
    }

    class V512 
    {
      // no changes, place holder for future versions
    }

}

This includes the VL subsets for the following extensions:

  • AVX512F, AVX512BW, AVX512DQ, and AVX512CD

  • AVX512_VBMI, AVX512_IFMA

  • AVX512_VNNI

  • AVX512_BF16

  • AVX512_VPOPCNTDQ, AVX512_VBMI2, VAES, GFNI, VPCLMULQDQ, AVX512_BITALG

  • AVX512_FP16

API Usage

Please see the aforementioned discussion.

Alternative Designs

Alternative Versioning

One alternative design we are considering is to expose all AVX10 methods under a single class and provide a form of versioning, defined via method attributes, on the API:

class Avx10 
{
    public static bool VersionIsAtLeast(ulong version);

    class V128 
    {
      [SupportedAvx10Version(1)]
      public static Vector128<ulong> Method1(Vector128<long> v);

      [SupportedAvx10Version(2)]
      public static Vector128<ulong> Method2(Vector128<long> v);
    }

    class V256 
    {
      [SupportedAvx10Version(1)]
      public static Vector256<ulong> Method1(Vector256<long> v);

      [SupportedAvx10Version(2)]
      public static Vector256<ulong> Method2(Vector256<long> v);
    }

    class V512 
    {
      [SupportedAvx10Version(2)]
      public static Vector512<ulong> Method2(Vector512<long> v);
    }
}

The developer may check for the specific Avx10 version necessary and then may use it and all preceding version methods without having to explicitly refer to additional classes Avx10v1, Avx10v2 etc:

Vector256<ulong> v1 = ...;
if (Avx10.VersionIsAtLeast(2))
{
  v1 = Avx10.V256.Method1(v1);
  v1 = Avx10.V256.Method2(v1);
}

To help developers ensure they are using the API correctly, we propose to create an analyzer that will ensure that a proper version check is in place for all methods used, and flag a warning if a method is used outside of a proper version check.
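The proposed analyzer would be a compile-time component and is not sketched here. As a rough illustration of the metadata it could read, the following runtime lookup uses reflection. SupportedAvx10VersionAttribute is defined locally as an assumption about its shape, and Avx10Sketch, Method1, Method2, and VersionLookup are hypothetical placeholders with scalar bodies.

```csharp
using System;
using System.Reflection;

// Assumed shape of the attribute named in the alternative design.
[AttributeUsage(AttributeTargets.Method)]
public sealed class SupportedAvx10VersionAttribute : Attribute
{
    public SupportedAvx10VersionAttribute(int version) => Version = version;

    public int Version { get; }
}

// Hypothetical stand-in for the single Avx10 class; bodies are placeholders.
public static class Avx10Sketch
{
    [SupportedAvx10Version(1)]
    public static long Method1(long v) => v + 1;

    [SupportedAvx10Version(2)]
    public static long Method2(long v) => v * 2;
}

public static class VersionLookup
{
    // Returns the minimum AVX10 version declared on a method, or 0 if none is declared.
    public static int RequiredVersion(string methodName)
    {
        MethodInfo method = typeof(Avx10Sketch).GetMethod(methodName)
            ?? throw new ArgumentException("Unknown method: " + methodName);

        SupportedAvx10VersionAttribute attribute =
            method.GetCustomAttribute<SupportedAvx10VersionAttribute>();

        return attribute?.Version ?? 0;
    }
}
```

An analyzer could compare this declared version against the argument of the enclosing VersionIsAtLeast check and warn when the check is missing or too low.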

V512 Surface Area

For developer ease of use, one alternative design is to duplicate the AVX512 512-bit API surface in the V512 class, so the developer does not have to explicitly reference existing AVX512 APIs. Note that this requires duplicating a large amount of API surface.

Risks

No response

Author: anthonycanino
Assignees: -
Labels:

api-suggestion, area-System.Runtime.Intrinsics, untriaged

Milestone: -

@anthonycanino
Contributor Author

@dotnet/avx512-contrib

@MichalPetryka
Contributor

If AVX10 guarantees 128/256 vector support, wouldn't it make more sense to only nest v512 variants in a subtype?

@tannergooding
Member

AVX10 guarantees 128, so we don't need the V128 nested type.

However, as per the publicly released (draft) spec, there is not currently a guarantee on 256, so we do currently need the V256 nested type. That would require an official statement from Intel and update to the spec for us to rely on it and place them in the root Avx10 class.

@tannergooding
Member

Noting that it needs to be in the spec, so that other processor vendors would also be required to guarantee the same, namely that V256 is always available.

@anthonycanino
Contributor Author

Per the notes about V128 I will update the above so that any 128-bit instructions will live under Avx10 directly.

We will also follow up with a complete API surface.

@BruceForstall
Member

> Per the notes about V128 I will update the above so that any 128-bit instructions will live under Avx10 directly.

nit: there are still some references above to V128.

> AVX10 guarantees 128, so we don't need the V128 nested type.

Would it still be useful, for symmetry with the other vector lengths, and to be doubly explicit? Or is it just unnecessary typing/bloat?

@anthonycanino
Contributor Author

Made some edits. We will post a full surface area soon.

@tannergooding
Member

> Or is it just unnecessary typing/bloat?

I think it's unnecessary typing/bloat, and it makes it pretty inconsistent with the general inheritance hierarchy we have. Since Avx10 will inherit from Avx2, you get all the pre-AVX512VL APIs, whether 128-bit or 256-bit, available directly from Avx10. It then feels inconsistent to require Avx10.V128 to access an always-available set of APIs where the division is just "pre-AVX512VL or post-AVX512VL".
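The inheritance point can be seen with classes that already ship. In this sketch, Floor is declared on Avx, yet it is callable as Avx2.Floor because Avx2 derives from Avx; a class inheriting from Avx2 would expose it the same way. InheritedAccessSketch and FloorOf are hypothetical names.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class InheritedAccessSketch
{
    // Floors a single value, using the 256-bit path when AVX2 reports support.
    public static float FloorOf(float x)
    {
        if (Avx2.IsSupported)
        {
            // Avx2 declares no Floor of its own; this resolves to Avx.Floor
            // through the static members Avx2 inherits from Avx.
            return Avx2.Floor(Vector256.Create(x)).GetElement(0);
        }

        return MathF.Floor(x); // scalar fallback
    }
}
```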

@DeepakRajendrakumaran
Contributor

DeepakRajendrakumaran commented Feb 15, 2024

API Surface Area

Avx10v1 API

Given the aforementioned API discussion, we propose the following API, where all current AVX512VL family instructions are consolidated under the top-level Avx10v1 class (128-bit) and its nested V256 class (256-bit):

namespace System.Runtime.Intrinsics.X86;

public abstract class Avx10v1 
{
    public static new bool IsSupported { get; }
    
    /// From AVX512F VL
    public static Vector128<ulong> Abs(Vector128<long> value);
    public static Vector128<int> AlignRight32(Vector128<int> left, Vector128<int> right, [ConstantExpected] byte mask);
    public static Vector128<uint> AlignRight32(Vector128<uint> left, Vector128<uint> right, [ConstantExpected] byte mask);
    public static Vector128<long> AlignRight64(Vector128<long> left, Vector128<long> right, [ConstantExpected] byte mask);
    public static Vector128<ulong> AlignRight64(Vector128<ulong> left, Vector128<ulong> right, [ConstantExpected] byte mask);
    public static Vector128<int> CompareGreaterThanOrEqual(Vector128<int> left, Vector128<int> right);
    public static Vector128<int> CompareLessThan(Vector128<int> left, Vector128<int> right);
    public static Vector128<int> CompareLessThanOrEqual(Vector128<int> left, Vector128<int> right);
    public static Vector128<int> CompareNotEqual(Vector128<int> left, Vector128<int> right);
    public static Vector128<long> CompareGreaterThanOrEqual(Vector128<long> left, Vector128<long> right);
    public static Vector128<long> CompareLessThan(Vector128<long> left, Vector128<long> right);
    public static Vector128<long> CompareLessThanOrEqual(Vector128<long> left, Vector128<long> right);
    public static Vector128<long> CompareNotEqual(Vector128<long> left, Vector128<long> right);
    public static Vector128<uint> CompareGreaterThan(Vector128<uint> left, Vector128<uint> right);
    public static Vector128<uint> CompareGreaterThanOrEqual(Vector128<uint> left, Vector128<uint> right);
    public static Vector128<uint> CompareLessThan(Vector128<uint> left, Vector128<uint> right);
    public static Vector128<uint> CompareLessThanOrEqual(Vector128<uint> left, Vector128<uint> right);
    public static Vector128<uint> CompareNotEqual(Vector128<uint> left, Vector128<uint> right);
    public static Vector128<ulong> CompareGreaterThan(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<ulong> CompareGreaterThanOrEqual(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<ulong> CompareLessThan(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<ulong> CompareLessThanOrEqual(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<ulong> CompareNotEqual(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<byte> ConvertToVector128Byte(Vector128<int> value);
    public static Vector128<byte> ConvertToVector128Byte(Vector128<long> value);
    public static Vector128<byte> ConvertToVector128Byte(Vector128<uint> value);
    public static Vector128<byte> ConvertToVector128Byte(Vector128<ulong> value);
    public static Vector128<byte> ConvertToVector128ByteWithSaturation(Vector128<uint> value);
    public static Vector128<byte> ConvertToVector128ByteWithSaturation(Vector128<ulong> value);
    public static Vector128<double> ConvertToVector128Double(Vector128<uint> value);
    public static Vector128<short> ConvertToVector128Int16(Vector128<int> value);
    public static Vector128<short> ConvertToVector128Int16(Vector128<long> value);
    public static Vector128<short> ConvertToVector128Int16(Vector128<uint> value);
    public static Vector128<short> ConvertToVector128Int16(Vector128<ulong> value);
    public static Vector128<short> ConvertToVector128Int16WithSaturation(Vector128<int> value);
    public static Vector128<short> ConvertToVector128Int16WithSaturation(Vector128<long> value);
    public static Vector128<int> ConvertToVector128Int32(Vector128<long> value);
    public static Vector128<int> ConvertToVector128Int32(Vector128<ulong> value);
    public static Vector128<int> ConvertToVector128Int32WithSaturation(Vector128<long> value);
    public static Vector128<sbyte> ConvertToVector128SByte(Vector128<int> value);
    public static Vector128<sbyte> ConvertToVector128SByte(Vector128<long> value);
    public static Vector128<sbyte> ConvertToVector128SByte(Vector128<uint> value);
    public static Vector128<sbyte> ConvertToVector128SByte(Vector128<ulong> value);
    public static Vector128<sbyte> ConvertToVector128SByteWithSaturation(Vector128<int> value);
    public static Vector128<sbyte> ConvertToVector128SByteWithSaturation(Vector128<long> value);
    public static Vector128<float> ConvertToVector128Single(Vector128<uint> value);
    public static Vector128<ushort> ConvertToVector128UInt16(Vector128<int> value);
    public static Vector128<ushort> ConvertToVector128UInt16(Vector128<long> value);
    public static Vector128<ushort> ConvertToVector128UInt16(Vector128<uint> value);
    public static Vector128<ushort> ConvertToVector128UInt16(Vector128<ulong> value);
    public static Vector128<ushort> ConvertToVector128UInt16WithSaturation(Vector128<uint> value);
    public static Vector128<ushort> ConvertToVector128UInt16WithSaturation(Vector128<ulong> value);
    public static Vector128<uint> ConvertToVector128UInt32(Vector128<long> value);
    public static Vector128<uint> ConvertToVector128UInt32(Vector128<ulong> value);
    public static Vector128<uint> ConvertToVector128UInt32(Vector128<float> value);
    public static Vector128<uint> ConvertToVector128UInt32(Vector128<double> value);
    public static Vector128<uint> ConvertToVector128UInt32WithSaturation(Vector128<ulong> value);
    public static Vector128<uint> ConvertToVector128UInt32WithTruncation(Vector128<float> value);
    public static Vector128<uint> ConvertToVector128UInt32WithTruncation(Vector128<double> value);
    public static Vector128<float> Fixup(Vector128<float> left, Vector128<float> right, Vector128<int> table, [ConstantExpected] byte control);
    public static Vector128<double> Fixup(Vector128<double> left, Vector128<double> right, Vector128<long> table, [ConstantExpected] byte control);
    public static Vector128<float> GetExponent(Vector128<float> value);
    public static Vector128<double> GetExponent(Vector128<double> value);
    public static Vector128<float> GetMantissa(Vector128<float> value, [ConstantExpected(Max = (byte)(0x0F))] byte control);
    public static Vector128<double> GetMantissa(Vector128<double> value, [ConstantExpected(Max = (byte)(0x0F))] byte control);
    public static Vector128<long> Max(Vector128<long> left, Vector128<long> right);
    public static Vector128<ulong> Max(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<long> Min(Vector128<long> left, Vector128<long> right);
    public static Vector128<ulong> Min(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<long> PermuteVar2x64x2(Vector128<long> lower, Vector128<long> indices, Vector128<long> upper);
    public static Vector128<ulong> PermuteVar2x64x2(Vector128<ulong> lower, Vector128<ulong> indices, Vector128<ulong> upper);
    public static Vector128<double> PermuteVar2x64x2(Vector128<double> lower, Vector128<long> indices, Vector128<double> upper);
    public static Vector128<int> PermuteVar4x32x2(Vector128<int> lower, Vector128<int> indices, Vector128<int> upper);
    public static Vector128<uint> PermuteVar4x32x2(Vector128<uint> lower, Vector128<uint> indices, Vector128<uint> upper);
    public static Vector128<float> PermuteVar4x32x2(Vector128<float> lower, Vector128<int> indices, Vector128<float> upper);
    public static Vector128<float> Reciprocal14(Vector128<float> value);
    public static Vector128<double> Reciprocal14(Vector128<double> value);
    public static Vector128<float> ReciprocalSqrt14(Vector128<float> value);
    public static Vector128<double> ReciprocalSqrt14(Vector128<double> value);
    public static Vector128<int> RotateLeft(Vector128<int> value, [ConstantExpected] byte count);
    public static Vector128<uint> RotateLeft(Vector128<uint> value, [ConstantExpected] byte count);
    public static Vector128<long> RotateLeft(Vector128<long> value, [ConstantExpected] byte count);
    public static Vector128<ulong> RotateLeft(Vector128<ulong> value, [ConstantExpected] byte count);
    public static Vector128<int> RotateLeftVariable(Vector128<int> value, Vector128<uint> count);
    public static Vector128<uint> RotateLeftVariable(Vector128<uint> value, Vector128<uint> count);
    public static Vector128<long> RotateLeftVariable(Vector128<long> value, Vector128<ulong> count);
    public static Vector128<ulong> RotateLeftVariable(Vector128<ulong> value, Vector128<ulong> count);
    public static Vector128<int> RotateRight(Vector128<int> value, [ConstantExpected] byte count);
    public static Vector128<uint> RotateRight(Vector128<uint> value, [ConstantExpected] byte count);
    public static Vector128<long> RotateRight(Vector128<long> value, [ConstantExpected] byte count);
    public static Vector128<ulong> RotateRight(Vector128<ulong> value, [ConstantExpected] byte count);
    public static Vector128<int> RotateRightVariable(Vector128<int> value, Vector128<uint> count);
    public static Vector128<uint> RotateRightVariable(Vector128<uint> value, Vector128<uint> count);
    public static Vector128<long> RotateRightVariable(Vector128<long> value, Vector128<ulong> count);
    public static Vector128<ulong> RotateRightVariable(Vector128<ulong> value, Vector128<ulong> count);
    public static Vector128<float> RoundScale(Vector128<float> value, [ConstantExpected] byte control);
    public static Vector128<double> RoundScale(Vector128<double> value, [ConstantExpected] byte control);
    public static Vector128<float> Scale(Vector128<float> left, Vector128<float> right);
    public static Vector128<double> Scale(Vector128<double> left, Vector128<double> right);
    public static Vector128<long> ShiftRightArithmetic(Vector128<long> value, Vector128<long> count);
    public static Vector128<long> ShiftRightArithmetic(Vector128<long> value, [ConstantExpected] byte count);
    public static Vector128<long> ShiftRightArithmeticVariable(Vector128<long> value, Vector128<ulong> count);
    public static Vector128<sbyte> TernaryLogic(Vector128<sbyte> a, Vector128<sbyte> b, Vector128<sbyte> c, [ConstantExpected] byte control);
    public static Vector128<byte> TernaryLogic(Vector128<byte> a, Vector128<byte> b, Vector128<byte> c, [ConstantExpected] byte control);
    public static Vector128<short> TernaryLogic(Vector128<short> a, Vector128<short> b, Vector128<short> c, [ConstantExpected] byte control);
    public static Vector128<ushort> TernaryLogic(Vector128<ushort> a, Vector128<ushort> b, Vector128<ushort> c, [ConstantExpected] byte control);
    public static Vector128<int> TernaryLogic(Vector128<int> a, Vector128<int> b, Vector128<int> c, [ConstantExpected] byte control);
    public static Vector128<uint> TernaryLogic(Vector128<uint> a, Vector128<uint> b, Vector128<uint> c, [ConstantExpected] byte control);
    public static Vector128<long> TernaryLogic(Vector128<long> a, Vector128<long> b, Vector128<long> c, [ConstantExpected] byte control);
    public static Vector128<ulong> TernaryLogic(Vector128<ulong> a, Vector128<ulong> b, Vector128<ulong> c, [ConstantExpected] byte control);
    public static Vector128<float> TernaryLogic(Vector128<float> a, Vector128<float> b, Vector128<float> c, [ConstantExpected] byte control);
    public static Vector128<double> TernaryLogic(Vector128<double> a, Vector128<double> b, Vector128<double> c, [ConstantExpected] byte control);

    /// From AVX512BW VL
    public static Vector128<byte> CompareGreaterThan(Vector128<byte> left, Vector128<byte> right);
    public static Vector128<byte> CompareGreaterThanOrEqual(Vector128<byte> left, Vector128<byte> right);
    public static Vector128<byte> CompareLessThan(Vector128<byte> left, Vector128<byte> right);
    public static Vector128<byte> CompareLessThanOrEqual(Vector128<byte> left, Vector128<byte> right);
    public static Vector128<byte> CompareNotEqual(Vector128<byte> left, Vector128<byte> right);
    public static Vector128<short> CompareGreaterThanOrEqual(Vector128<short> left, Vector128<short> right);
    public static Vector128<short> CompareLessThan(Vector128<short> left, Vector128<short> right);
    public static Vector128<short> CompareLessThanOrEqual(Vector128<short> left, Vector128<short> right);
    public static Vector128<short> CompareNotEqual(Vector128<short> left, Vector128<short> right);
    public static Vector128<sbyte> CompareGreaterThanOrEqual(Vector128<sbyte> left, Vector128<sbyte> right);
    public static Vector128<sbyte> CompareLessThan(Vector128<sbyte> left, Vector128<sbyte> right);
    public static Vector128<sbyte> CompareLessThanOrEqual(Vector128<sbyte> left, Vector128<sbyte> right);
    public static Vector128<sbyte> CompareNotEqual(Vector128<sbyte> left, Vector128<sbyte> right);
    public static Vector128<ushort> CompareGreaterThan(Vector128<ushort> left, Vector128<ushort> right);
    public static Vector128<ushort> CompareGreaterThanOrEqual(Vector128<ushort> left, Vector128<ushort> right);
    public static Vector128<ushort> CompareLessThan(Vector128<ushort> left, Vector128<ushort> right);
    public static Vector128<ushort> CompareLessThanOrEqual(Vector128<ushort> left, Vector128<ushort> right);
    public static Vector128<ushort> CompareNotEqual(Vector128<ushort> left, Vector128<ushort> right);
    public static Vector128<byte> ConvertToVector128Byte(Vector128<short> value);
    public static Vector128<byte> ConvertToVector128Byte(Vector128<ushort> value);
    public static Vector128<byte> ConvertToVector128ByteWithSaturation(Vector128<ushort> value);
    public static Vector128<sbyte> ConvertToVector128SByte(Vector128<short> value);
    public static Vector128<sbyte> ConvertToVector128SByte(Vector128<ushort> value);
    public static Vector128<sbyte> ConvertToVector128SByteWithSaturation(Vector128<short> value);
    public static Vector128<short> PermuteVar8x16(Vector128<short> left, Vector128<short> control);
    public static Vector128<ushort> PermuteVar8x16(Vector128<ushort> left, Vector128<ushort> control);
    public static Vector128<short> PermuteVar8x16x2(Vector128<short> lower, Vector128<short> indices, Vector128<short> upper);
    public static Vector128<ushort> PermuteVar8x16x2(Vector128<ushort> lower, Vector128<ushort> indices, Vector128<ushort> upper);
    public static Vector128<short> ShiftLeftLogicalVariable(Vector128<short> value, Vector128<ushort> count);
    public static Vector128<ushort> ShiftLeftLogicalVariable(Vector128<ushort> value, Vector128<ushort> count);
    public static Vector128<short> ShiftRightArithmeticVariable(Vector128<short> value, Vector128<ushort> count);
    public static Vector128<short> ShiftRightLogicalVariable(Vector128<short> value, Vector128<ushort> count);
    public static Vector128<ushort> ShiftRightLogicalVariable(Vector128<ushort> value, Vector128<ushort> count);
    public static Vector128<ushort> SumAbsoluteDifferencesInBlock32(Vector128<byte> left, Vector128<byte> right, [ConstantExpected] byte control);

    /// From AVX512CD VL
    public static Vector128<int> DetectConflicts(Vector128<int> value);
    public static Vector128<uint> DetectConflicts(Vector128<uint> value);
    public static Vector128<long> DetectConflicts(Vector128<long> value);
    public static Vector128<ulong> DetectConflicts(Vector128<ulong> value);
    public static Vector128<int> LeadingZeroCount(Vector128<int> value);
    public static Vector128<uint> LeadingZeroCount(Vector128<uint> value);
    public static Vector128<long> LeadingZeroCount(Vector128<long> value);
    public static Vector128<ulong> LeadingZeroCount(Vector128<ulong> value);

    /// From AVX512DQ VL
    public static Vector128<int> BroadcastPairScalarToVector128(Vector128<int> value);
    public static Vector128<uint> BroadcastPairScalarToVector128(Vector128<uint> value);
    public static Vector128<double> ConvertToVector128Double(Vector128<long> value);
    public static Vector128<double> ConvertToVector128Double(Vector128<ulong> value);
    public static Vector128<long> ConvertToVector128Int64(Vector128<float> value);
    public static Vector128<long> ConvertToVector128Int64(Vector128<double> value);
    public static Vector128<long> ConvertToVector128Int64WithTruncation(Vector128<float> value);
    public static Vector128<long> ConvertToVector128Int64WithTruncation(Vector128<double> value);
    public static Vector128<float> ConvertToVector128Single(Vector128<long> value);
    public static Vector128<float> ConvertToVector128Single(Vector128<ulong> value);
    public static Vector128<ulong> ConvertToVector128UInt64(Vector128<float> value);
    public static Vector128<ulong> ConvertToVector128UInt64(Vector128<double> value);
    public static Vector128<ulong> ConvertToVector128UInt64WithTruncation(Vector128<float> value);
    public static Vector128<ulong> ConvertToVector128UInt64WithTruncation(Vector128<double> value);
    public static Vector128<long> MultiplyLow(Vector128<long> left, Vector128<long> right);
    public static Vector128<ulong> MultiplyLow(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<float> Range(Vector128<float> left, Vector128<float> right, [ConstantExpected(Max = (byte)(0x0F))] byte control);
    public static Vector128<double> Range(Vector128<double> left, Vector128<double> right, [ConstantExpected(Max = (byte)(0x0F))] byte control);
    public static Vector128<float> Reduce(Vector128<float> value, [ConstantExpected] byte control);
    public static Vector128<double> Reduce(Vector128<double> value, [ConstantExpected] byte control);

    /// From AVX512VBMI VL
    public static Vector128<sbyte> PermuteVar16x8(Vector128<sbyte> left, Vector128<sbyte> control);
    public static Vector128<byte> PermuteVar16x8(Vector128<byte> left, Vector128<byte> control);
    public static Vector128<byte> PermuteVar16x8x2(Vector128<byte> lower, Vector128<byte> indices, Vector128<byte> upper);
    public static Vector128<sbyte> PermuteVar16x8x2(Vector128<sbyte> lower, Vector128<sbyte> indices, Vector128<sbyte> upper);
       
    public abstract class V256 
    {
        public static new bool IsSupported { get; }
        
        /// From AVX512F VL
        public static Vector256<ulong> Abs(Vector256<long> value);
        public static Vector256<int> AlignRight32(Vector256<int> left, Vector256<int> right, [ConstantExpected] byte mask);
        public static Vector256<uint> AlignRight32(Vector256<uint> left, Vector256<uint> right, [ConstantExpected] byte mask);
        public static Vector256<long> AlignRight64(Vector256<long> left, Vector256<long> right, [ConstantExpected] byte mask);
        public static Vector256<ulong> AlignRight64(Vector256<ulong> left, Vector256<ulong> right, [ConstantExpected] byte mask);
        public static Vector256<int> CompareGreaterThanOrEqual(Vector256<int> left, Vector256<int> right);
        public static Vector256<int> CompareLessThan(Vector256<int> left, Vector256<int> right);
        public static Vector256<int> CompareLessThanOrEqual(Vector256<int> left, Vector256<int> right);
        public static Vector256<int> CompareNotEqual(Vector256<int> left, Vector256<int> right);
        public static Vector256<long> CompareGreaterThanOrEqual(Vector256<long> left, Vector256<long> right);
        public static Vector256<long> CompareLessThan(Vector256<long> left, Vector256<long> right);
        public static Vector256<long> CompareLessThanOrEqual(Vector256<long> left, Vector256<long> right);
        public static Vector256<long> CompareNotEqual(Vector256<long> left, Vector256<long> right);
        public static Vector256<uint> CompareGreaterThan(Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareGreaterThanOrEqual(Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareLessThan(Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareLessThanOrEqual(Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareNotEqual(Vector256<uint> left, Vector256<uint> right);
        public static Vector256<ulong> CompareGreaterThan(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareGreaterThanOrEqual(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareLessThan(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareLessThanOrEqual(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareNotEqual(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector128<byte> ConvertToVector128Byte(Vector256<int> value);
        public static Vector128<byte> ConvertToVector128Byte(Vector256<long> value);
        public static Vector128<byte> ConvertToVector128Byte(Vector256<uint> value);
        public static Vector128<byte> ConvertToVector128Byte(Vector256<ulong> value);
        public static Vector128<byte> ConvertToVector128ByteWithSaturation(Vector256<uint> value);
        public static Vector128<byte> ConvertToVector128ByteWithSaturation(Vector256<ulong> value);
        public static Vector128<short> ConvertToVector128Int16(Vector256<int> value);
        public static Vector128<short> ConvertToVector128Int16(Vector256<long> value);
        public static Vector128<short> ConvertToVector128Int16(Vector256<uint> value);
        public static Vector128<short> ConvertToVector128Int16(Vector256<ulong> value);
        public static Vector128<short> ConvertToVector128Int16WithSaturation(Vector256<int> value);
        public static Vector128<short> ConvertToVector128Int16WithSaturation(Vector256<long> value);
        public static Vector128<int> ConvertToVector128Int32(Vector256<long> value);
        public static Vector128<int> ConvertToVector128Int32(Vector256<ulong> value);
        public static Vector128<int> ConvertToVector128Int32WithSaturation(Vector256<long> value);
        public static Vector128<sbyte> ConvertToVector128SByte(Vector256<int> value);
        public static Vector128<sbyte> ConvertToVector128SByte(Vector256<long> value);
        public static Vector128<sbyte> ConvertToVector128SByte(Vector256<uint> value);
        public static Vector128<sbyte> ConvertToVector128SByte(Vector256<ulong> value);
        public static Vector128<sbyte> ConvertToVector128SByteWithSaturation(Vector256<int> value);
        public static Vector128<sbyte> ConvertToVector128SByteWithSaturation(Vector256<long> value);
        public static Vector128<ushort> ConvertToVector128UInt16(Vector256<int> value);
        public static Vector128<ushort> ConvertToVector128UInt16(Vector256<long> value);
        public static Vector128<ushort> ConvertToVector128UInt16(Vector256<uint> value);
        public static Vector128<ushort> ConvertToVector128UInt16(Vector256<ulong> value);
        public static Vector128<ushort> ConvertToVector128UInt16WithSaturation(Vector256<uint> value);
        public static Vector128<ushort> ConvertToVector128UInt16WithSaturation(Vector256<ulong> value);
        public static Vector128<uint> ConvertToVector128UInt32(Vector256<long> value);
        public static Vector128<uint> ConvertToVector128UInt32(Vector256<ulong> value);
        public static Vector128<uint> ConvertToVector128UInt32(Vector256<double> value);
        public static Vector128<uint> ConvertToVector128UInt32WithSaturation(Vector256<ulong> value);
        public static Vector128<uint> ConvertToVector128UInt32WithTruncation(Vector256<double> value);
        public static Vector256<double> ConvertToVector256Double(Vector128<uint> value);
        public static Vector256<float> ConvertToVector256Single(Vector256<uint> value);
        public static Vector256<uint> ConvertToVector256UInt32(Vector256<float> value);
        public static Vector256<uint> ConvertToVector256UInt32WithTruncation(Vector256<float> value);
        public static Vector256<float> Fixup(Vector256<float> left, Vector256<float> right, Vector256<int> table, [ConstantExpected] byte control);
        public static Vector256<double> Fixup(Vector256<double> left, Vector256<double> right, Vector256<long> table, [ConstantExpected] byte control);
        public static Vector256<float> GetExponent(Vector256<float> value);
        public static Vector256<double> GetExponent(Vector256<double> value);
        public static Vector256<float> GetMantissa(Vector256<float> value, [ConstantExpected(Max = (byte)(0x0F))] byte control);
        public static Vector256<double> GetMantissa(Vector256<double> value, [ConstantExpected(Max = (byte)(0x0F))] byte control);
        public static Vector256<long> Max(Vector256<long> left, Vector256<long> right);
        public static Vector256<ulong> Max(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<long> Min(Vector256<long> left, Vector256<long> right);
        public static Vector256<ulong> Min(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<long> PermuteVar4x64(Vector256<long> value, Vector256<long> control);
        public static Vector256<ulong> PermuteVar4x64(Vector256<ulong> value, Vector256<ulong> control);
        public static Vector256<double> PermuteVar4x64(Vector256<double> value, Vector256<long> control);
        public static Vector256<long> PermuteVar4x64x2(Vector256<long> lower, Vector256<long> indices, Vector256<long> upper);
        public static Vector256<ulong> PermuteVar4x64x2(Vector256<ulong> lower, Vector256<ulong> indices, Vector256<ulong> upper);
        public static Vector256<double> PermuteVar4x64x2(Vector256<double> lower, Vector256<long> indices, Vector256<double> upper);
        public static Vector256<int> PermuteVar8x32x2(Vector256<int> lower, Vector256<int> indices, Vector256<int> upper);
        public static Vector256<uint> PermuteVar8x32x2(Vector256<uint> lower, Vector256<uint> indices, Vector256<uint> upper);
        public static Vector256<float> PermuteVar8x32x2(Vector256<float> lower, Vector256<int> indices, Vector256<float> upper);
        public static Vector256<float> Reciprocal14(Vector256<float> value);
        public static Vector256<double> Reciprocal14(Vector256<double> value);
        public static Vector256<float> ReciprocalSqrt14(Vector256<float> value);
        public static Vector256<double> ReciprocalSqrt14(Vector256<double> value);
        public static Vector256<int> RotateLeft(Vector256<int> value, [ConstantExpected] byte count);
        public static Vector256<uint> RotateLeft(Vector256<uint> value, [ConstantExpected] byte count);
        public static Vector256<long> RotateLeft(Vector256<long> value, [ConstantExpected] byte count);
        public static Vector256<ulong> RotateLeft(Vector256<ulong> value, [ConstantExpected] byte count);
        public static Vector256<int> RotateLeftVariable(Vector256<int> value, Vector256<uint> count);
        public static Vector256<uint> RotateLeftVariable(Vector256<uint> value, Vector256<uint> count);
        public static Vector256<long> RotateLeftVariable(Vector256<long> value, Vector256<ulong> count);
        public static Vector256<ulong> RotateLeftVariable(Vector256<ulong> value, Vector256<ulong> count);
        public static Vector256<int> RotateRight(Vector256<int> value, [ConstantExpected] byte count);
        public static Vector256<uint> RotateRight(Vector256<uint> value, [ConstantExpected] byte count);
        public static Vector256<long> RotateRight(Vector256<long> value, [ConstantExpected] byte count);
        public static Vector256<ulong> RotateRight(Vector256<ulong> value, [ConstantExpected] byte count);
        public static Vector256<int> RotateRightVariable(Vector256<int> value, Vector256<uint> count);
        public static Vector256<uint> RotateRightVariable(Vector256<uint> value, Vector256<uint> count);
        public static Vector256<long> RotateRightVariable(Vector256<long> value, Vector256<ulong> count);
        public static Vector256<ulong> RotateRightVariable(Vector256<ulong> value, Vector256<ulong> count);
        public static Vector256<float> RoundScale(Vector256<float> value, [ConstantExpected] byte control);
        public static Vector256<double> RoundScale(Vector256<double> value, [ConstantExpected] byte control);
        public static Vector256<float> Scale(Vector256<float> left, Vector256<float> right);
        public static Vector256<double> Scale(Vector256<double> left, Vector256<double> right);
        public static Vector256<long> ShiftRightArithmetic(Vector256<long> value, Vector128<long> count);
        public static Vector256<long> ShiftRightArithmetic(Vector256<long> value, [ConstantExpected] byte count);
        public static Vector256<long> ShiftRightArithmeticVariable(Vector256<long> value, Vector256<ulong> count);
        public static Vector256<double> Shuffle2x128(Vector256<double> left, Vector256<double> right, [ConstantExpected] byte control);
        public static Vector256<int> Shuffle2x128(Vector256<int> left, Vector256<int> right, [ConstantExpected] byte control);
        public static Vector256<long> Shuffle2x128(Vector256<long> left, Vector256<long> right, [ConstantExpected] byte control);
        public static Vector256<float> Shuffle2x128(Vector256<float> left, Vector256<float> right, [ConstantExpected] byte control);
        public static Vector256<uint> Shuffle2x128(Vector256<uint> left, Vector256<uint> right, [ConstantExpected] byte control);
        public static Vector256<ulong> Shuffle2x128(Vector256<ulong> left, Vector256<ulong> right, [ConstantExpected] byte control);
        public static Vector256<sbyte> TernaryLogic(Vector256<sbyte> a, Vector256<sbyte> b, Vector256<sbyte> c, [ConstantExpected] byte control);
        public static Vector256<byte> TernaryLogic(Vector256<byte> a, Vector256<byte> b, Vector256<byte> c, [ConstantExpected] byte control);
        public static Vector256<short> TernaryLogic(Vector256<short> a, Vector256<short> b, Vector256<short> c, [ConstantExpected] byte control);
        public static Vector256<ushort> TernaryLogic(Vector256<ushort> a, Vector256<ushort> b, Vector256<ushort> c, [ConstantExpected] byte control);
        public static Vector256<int> TernaryLogic(Vector256<int> a, Vector256<int> b, Vector256<int> c, [ConstantExpected] byte control);
        public static Vector256<uint> TernaryLogic(Vector256<uint> a, Vector256<uint> b, Vector256<uint> c, [ConstantExpected] byte control);
        public static Vector256<long> TernaryLogic(Vector256<long> a, Vector256<long> b, Vector256<long> c, [ConstantExpected] byte control);
        public static Vector256<ulong> TernaryLogic(Vector256<ulong> a, Vector256<ulong> b, Vector256<ulong> c, [ConstantExpected] byte control);
        public static Vector256<float> TernaryLogic(Vector256<float> a, Vector256<float> b, Vector256<float> c, [ConstantExpected] byte control);
        public static Vector256<double> TernaryLogic(Vector256<double> a, Vector256<double> b, Vector256<double> c, [ConstantExpected] byte control);

        /// From AVX512BW VL
        public static Vector256<byte> CompareGreaterThan(Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareGreaterThanOrEqual(Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareLessThan(Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareLessThanOrEqual(Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareNotEqual(Vector256<byte> left, Vector256<byte> right);
        public static Vector256<short> CompareGreaterThanOrEqual(Vector256<short> left, Vector256<short> right);
        public static Vector256<short> CompareLessThan(Vector256<short> left, Vector256<short> right);
        public static Vector256<short> CompareLessThanOrEqual(Vector256<short> left, Vector256<short> right);
        public static Vector256<short> CompareNotEqual(Vector256<short> left, Vector256<short> right);
        public static Vector256<sbyte> CompareGreaterThanOrEqual(Vector256<sbyte> left, Vector256<sbyte> right);
        public static Vector256<sbyte> CompareLessThan(Vector256<sbyte> left, Vector256<sbyte> right);
        public static Vector256<sbyte> CompareLessThanOrEqual(Vector256<sbyte> left, Vector256<sbyte> right);
        public static Vector256<sbyte> CompareNotEqual(Vector256<sbyte> left, Vector256<sbyte> right);
        public static Vector256<ushort> CompareGreaterThan(Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareGreaterThanOrEqual(Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareLessThan(Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareLessThanOrEqual(Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareNotEqual(Vector256<ushort> left, Vector256<ushort> right);
        public static Vector128<byte> ConvertToVector128Byte(Vector256<short> value);
        public static Vector128<byte> ConvertToVector128Byte(Vector256<ushort> value);
        public static Vector128<byte> ConvertToVector128ByteWithSaturation(Vector256<ushort> value);
        public static Vector128<sbyte> ConvertToVector128SByte(Vector256<short> value);
        public static Vector128<sbyte> ConvertToVector128SByte(Vector256<ushort> value);
        public static Vector128<sbyte> ConvertToVector128SByteWithSaturation(Vector256<short> value);
        public static Vector256<short> PermuteVar16x16(Vector256<short> left, Vector256<short> control);
        public static Vector256<ushort> PermuteVar16x16(Vector256<ushort> left, Vector256<ushort> control);
        public static Vector256<short> PermuteVar16x16x2(Vector256<short> lower, Vector256<short> indices, Vector256<short> upper);
        public static Vector256<ushort> PermuteVar16x16x2(Vector256<ushort> lower, Vector256<ushort> indices, Vector256<ushort> upper);
        public static Vector256<short> ShiftLeftLogicalVariable(Vector256<short> value, Vector256<ushort> count);
        public static Vector256<ushort> ShiftLeftLogicalVariable(Vector256<ushort> value, Vector256<ushort> count);
        public static Vector256<short> ShiftRightArithmeticVariable(Vector256<short> value, Vector256<ushort> count);
        public static Vector256<short> ShiftRightLogicalVariable(Vector256<short> value, Vector256<ushort> count);
        public static Vector256<ushort> ShiftRightLogicalVariable(Vector256<ushort> value, Vector256<ushort> count);
        public static Vector256<ushort> SumAbsoluteDifferencesInBlock32(Vector256<byte> left, Vector256<byte> right, [ConstantExpected] byte control);

        /// From AVX512CD VL
        public static Vector256<int> DetectConflicts(Vector256<int> value);
        public static Vector256<uint> DetectConflicts(Vector256<uint> value);
        public static Vector256<long> DetectConflicts(Vector256<long> value);
        public static Vector256<ulong> DetectConflicts(Vector256<ulong> value);
        public static Vector256<int> LeadingZeroCount(Vector256<int> value);
        public static Vector256<uint> LeadingZeroCount(Vector256<uint> value);
        public static Vector256<long> LeadingZeroCount(Vector256<long> value);
        public static Vector256<ulong> LeadingZeroCount(Vector256<ulong> value);
        
        /// From AVX512DQ VL
        public static Vector256<int> BroadcastPairScalarToVector256(Vector128<int> value);
        public static Vector256<uint> BroadcastPairScalarToVector256(Vector128<uint> value);
        public static Vector256<float> BroadcastPairScalarToVector256(Vector128<float> value);
        public static Vector128<float> ConvertToVector128Single(Vector256<long> value);
        public static Vector128<float> ConvertToVector128Single(Vector256<ulong> value);
        public static Vector256<double> ConvertToVector256Double(Vector256<long> value);
        public static Vector256<double> ConvertToVector256Double(Vector256<ulong> value);
        public static Vector256<long> ConvertToVector256Int64(Vector128<float> value);
        public static Vector256<long> ConvertToVector256Int64(Vector256<double> value);
        public static Vector256<long> ConvertToVector256Int64WithTruncation(Vector128<float> value);
        public static Vector256<long> ConvertToVector256Int64WithTruncation(Vector256<double> value);
        public static Vector256<ulong> ConvertToVector256UInt64(Vector128<float> value);
        public static Vector256<ulong> ConvertToVector256UInt64(Vector256<double> value);
        public static Vector256<ulong> ConvertToVector256UInt64WithTruncation(Vector128<float> value);
        public static Vector256<ulong> ConvertToVector256UInt64WithTruncation(Vector256<double> value);
        public static Vector256<long> MultiplyLow(Vector256<long> left, Vector256<long> right);
        public static Vector256<ulong> MultiplyLow(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<float> Range(Vector256<float> left, Vector256<float> right, [ConstantExpected(Max = (byte)(0x0F))] byte control);
        public static Vector256<double> Range(Vector256<double> left, Vector256<double> right, [ConstantExpected(Max = (byte)(0x0F))] byte control);
        public static Vector256<float> Reduce(Vector256<float> value, [ConstantExpected] byte control);
        public static Vector256<double> Reduce(Vector256<double> value, [ConstantExpected] byte control);

        /// From AVX512_VBMI VL
        public static Vector256<sbyte> PermuteVar32x8(Vector256<sbyte> left, Vector256<sbyte> control);
        public static Vector256<byte> PermuteVar32x8(Vector256<byte> left, Vector256<byte> control);
        public static Vector256<byte> PermuteVar32x8x2(Vector256<byte> lower, Vector256<byte> indices, Vector256<byte> upper);
        public static Vector256<sbyte> PermuteVar32x8x2(Vector256<sbyte> lower, Vector256<sbyte> indices, Vector256<sbyte> upper);
    }

    public abstract class V512
    {
        public static new bool IsSupported { get; }
        // No changes; placeholder for future versions.
    }
}

V512 Surface Area

For ease of use, an alternative design duplicates the AVX512 512-bit API surface in the V512 class, so that developers do not have to reference the existing AVX512 classes explicitly. The cost is a large amount of duplicated API surface.
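
The trade-off can be seen from the caller's side. The following is a minimal sketch, not the proposed implementation: `V512Sketch` and `Demo` are hypothetical names, and `Avx512F.IsSupported` is used as an assumed stand-in for the proposed 512-bit support check. Only `Avx512F.Max` and `Vector512.Max` are existing .NET 8 APIs.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical stand-in for the proposed Avx10v1.V512 class (not a real .NET type).
static class V512Sketch
{
    // Assumption: hardware with 512-bit AVX10.1 support also reports AVX512F.
    public static bool IsSupported => Avx512F.IsSupported;

    // Alternative design only: a duplicated 512-bit method that forwards to AVX512F.
    public static Vector512<long> Max(Vector512<long> left, Vector512<long> right)
        => Avx512F.Max(left, right);
}

static class Demo
{
    // Placeholder design: V512 is only a support check, so the 512-bit call
    // still names the existing Avx512F class.
    public static Vector512<long> MaxPlaceholderDesign(Vector512<long> a, Vector512<long> b)
        => V512Sketch.IsSupported ? Avx512F.Max(a, b) : Vector512.Max(a, b);

    // Alternative design: the check and the call stay within one class.
    public static Vector512<long> MaxDuplicatedDesign(Vector512<long> a, Vector512<long> b)
        => V512Sketch.IsSupported ? V512Sketch.Max(a, b) : Vector512.Max(a, b);
}
```

Both methods compute the same result; the difference is only which class the caller has to name. The `Vector512.Max` branch is a software fallback so that the sketch also runs on hardware without 512-bit support.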

namespace System.Runtime.Intrinsics.X86;

public abstract class Avx10v1 
{
    public static new bool IsSupported { get; }
    
    /// From AVX512F VL
    public static Vector128<ulong> Abs(Vector128<long> value);
    public static Vector128<int> AlignRight32(Vector128<int> left, Vector128<int> right, [ConstantExpected] byte mask);
    public static Vector128<uint> AlignRight32(Vector128<uint> left, Vector128<uint> right, [ConstantExpected] byte mask);
    public static Vector128<long> AlignRight64(Vector128<long> left, Vector128<long> right, [ConstantExpected] byte mask);
    public static Vector128<ulong> AlignRight64(Vector128<ulong> left, Vector128<ulong> right, [ConstantExpected] byte mask);
    public static Vector128<int> CompareGreaterThanOrEqual(Vector128<int> left, Vector128<int> right);
    public static Vector128<int> CompareLessThan(Vector128<int> left, Vector128<int> right);
    public static Vector128<int> CompareLessThanOrEqual(Vector128<int> left, Vector128<int> right);
    public static Vector128<int> CompareNotEqual(Vector128<int> left, Vector128<int> right);
    public static Vector128<long> CompareGreaterThanOrEqual(Vector128<long> left, Vector128<long> right);
    public static Vector128<long> CompareLessThan(Vector128<long> left, Vector128<long> right);
    public static Vector128<long> CompareLessThanOrEqual(Vector128<long> left, Vector128<long> right);
    public static Vector128<long> CompareNotEqual(Vector128<long> left, Vector128<long> right);
    public static Vector128<uint> CompareGreaterThan(Vector128<uint> left, Vector128<uint> right);
    public static Vector128<uint> CompareGreaterThanOrEqual(Vector128<uint> left, Vector128<uint> right);
    public static Vector128<uint> CompareLessThan(Vector128<uint> left, Vector128<uint> right);
    public static Vector128<uint> CompareLessThanOrEqual(Vector128<uint> left, Vector128<uint> right);
    public static Vector128<uint> CompareNotEqual(Vector128<uint> left, Vector128<uint> right);
    public static Vector128<ulong> CompareGreaterThan(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<ulong> CompareGreaterThanOrEqual(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<ulong> CompareLessThan(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<ulong> CompareLessThanOrEqual(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<ulong> CompareNotEqual(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<byte> ConvertToVector128Byte(Vector128<int> value);
    public static Vector128<byte> ConvertToVector128Byte(Vector128<long> value);
    public static Vector128<byte> ConvertToVector128Byte(Vector128<uint> value);
    public static Vector128<byte> ConvertToVector128Byte(Vector128<ulong> value);
    public static Vector128<byte> ConvertToVector128ByteWithSaturation(Vector128<uint> value);
    public static Vector128<byte> ConvertToVector128ByteWithSaturation(Vector128<ulong> value);
    public static Vector128<double> ConvertToVector128Double(Vector128<uint> value);
    public static Vector128<short> ConvertToVector128Int16(Vector128<int> value);
    public static Vector128<short> ConvertToVector128Int16(Vector128<long> value);
    public static Vector128<short> ConvertToVector128Int16(Vector128<uint> value);
    public static Vector128<short> ConvertToVector128Int16(Vector128<ulong> value);
    public static Vector128<short> ConvertToVector128Int16WithSaturation(Vector128<int> value);
    public static Vector128<short> ConvertToVector128Int16WithSaturation(Vector128<long> value);
    public static Vector128<int> ConvertToVector128Int32(Vector128<long> value);
    public static Vector128<int> ConvertToVector128Int32(Vector128<ulong> value);
    public static Vector128<int> ConvertToVector128Int32WithSaturation(Vector128<long> value);
    public static Vector128<sbyte> ConvertToVector128SByte(Vector128<int> value);
    public static Vector128<sbyte> ConvertToVector128SByte(Vector128<long> value);
    public static Vector128<sbyte> ConvertToVector128SByte(Vector128<uint> value);
    public static Vector128<sbyte> ConvertToVector128SByte(Vector128<ulong> value);
    public static Vector128<sbyte> ConvertToVector128SByteWithSaturation(Vector128<int> value);
    public static Vector128<sbyte> ConvertToVector128SByteWithSaturation(Vector128<long> value);
    public static Vector128<float> ConvertToVector128Single(Vector128<uint> value);
    public static Vector128<ushort> ConvertToVector128UInt16(Vector128<int> value);
    public static Vector128<ushort> ConvertToVector128UInt16(Vector128<long> value);
    public static Vector128<ushort> ConvertToVector128UInt16(Vector128<uint> value);
    public static Vector128<ushort> ConvertToVector128UInt16(Vector128<ulong> value);
    public static Vector128<ushort> ConvertToVector128UInt16WithSaturation(Vector128<uint> value);
    public static Vector128<ushort> ConvertToVector128UInt16WithSaturation(Vector128<ulong> value);
    public static Vector128<uint> ConvertToVector128UInt32(Vector128<long> value);
    public static Vector128<uint> ConvertToVector128UInt32(Vector128<ulong> value);
    public static Vector128<uint> ConvertToVector128UInt32(Vector128<float> value);
    public static Vector128<uint> ConvertToVector128UInt32(Vector128<double> value);
    public static Vector128<uint> ConvertToVector128UInt32WithSaturation(Vector128<ulong> value);
    public static Vector128<uint> ConvertToVector128UInt32WithTruncation(Vector128<float> value);
    public static Vector128<uint> ConvertToVector128UInt32WithTruncation(Vector128<double> value);
    public static Vector128<float> Fixup(Vector128<float> left, Vector128<float> right, Vector128<int> table, [ConstantExpected] byte control);
    public static Vector128<double> Fixup(Vector128<double> left, Vector128<double> right, Vector128<long> table, [ConstantExpected] byte control);
    public static Vector128<float> GetExponent(Vector128<float> value);
    public static Vector128<double> GetExponent(Vector128<double> value);
    public static Vector128<float> GetMantissa(Vector128<float> value, [ConstantExpected(Max = (byte)(0x0F))] byte control);
    public static Vector128<double> GetMantissa(Vector128<double> value, [ConstantExpected(Max = (byte)(0x0F))] byte control);
    public static Vector128<long> Max(Vector128<long> left, Vector128<long> right);
    public static Vector128<ulong> Max(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<long> Min(Vector128<long> left, Vector128<long> right);
    public static Vector128<ulong> Min(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<long> PermuteVar2x64x2(Vector128<long> lower, Vector128<long> indices, Vector128<long> upper);
    public static Vector128<ulong> PermuteVar2x64x2(Vector128<ulong> lower, Vector128<ulong> indices, Vector128<ulong> upper);
    public static Vector128<double> PermuteVar2x64x2(Vector128<double> lower, Vector128<long> indices, Vector128<double> upper);
    public static Vector128<int> PermuteVar4x32x2(Vector128<int> lower, Vector128<int> indices, Vector128<int> upper);
    public static Vector128<uint> PermuteVar4x32x2(Vector128<uint> lower, Vector128<uint> indices, Vector128<uint> upper);
    public static Vector128<float> PermuteVar4x32x2(Vector128<float> lower, Vector128<int> indices, Vector128<float> upper);
    public static Vector128<float> Reciprocal14(Vector128<float> value);
    public static Vector128<double> Reciprocal14(Vector128<double> value);
    public static Vector128<float> ReciprocalSqrt14(Vector128<float> value);
    public static Vector128<double> ReciprocalSqrt14(Vector128<double> value);
    public static Vector128<int> RotateLeft(Vector128<int> value, [ConstantExpected] byte count);
    public static Vector128<uint> RotateLeft(Vector128<uint> value, [ConstantExpected] byte count);
    public static Vector128<long> RotateLeft(Vector128<long> value, [ConstantExpected] byte count);
    public static Vector128<ulong> RotateLeft(Vector128<ulong> value, [ConstantExpected] byte count);
    public static Vector128<int> RotateLeftVariable(Vector128<int> value, Vector128<uint> count);
    public static Vector128<uint> RotateLeftVariable(Vector128<uint> value, Vector128<uint> count);
    public static Vector128<long> RotateLeftVariable(Vector128<long> value, Vector128<ulong> count);
    public static Vector128<ulong> RotateLeftVariable(Vector128<ulong> value, Vector128<ulong> count);
    public static Vector128<int> RotateRight(Vector128<int> value, [ConstantExpected] byte count);
    public static Vector128<uint> RotateRight(Vector128<uint> value, [ConstantExpected] byte count);
    public static Vector128<long> RotateRight(Vector128<long> value, [ConstantExpected] byte count);
    public static Vector128<ulong> RotateRight(Vector128<ulong> value, [ConstantExpected] byte count);
    public static Vector128<int> RotateRightVariable(Vector128<int> value, Vector128<uint> count);
    public static Vector128<uint> RotateRightVariable(Vector128<uint> value, Vector128<uint> count);
    public static Vector128<long> RotateRightVariable(Vector128<long> value, Vector128<ulong> count);
    public static Vector128<ulong> RotateRightVariable(Vector128<ulong> value, Vector128<ulong> count);
    public static Vector128<float> RoundScale(Vector128<float> value, [ConstantExpected] byte control);
    public static Vector128<double> RoundScale(Vector128<double> value, [ConstantExpected] byte control);
    public static Vector128<float> Scale(Vector128<float> left, Vector128<float> right);
    public static Vector128<double> Scale(Vector128<double> left, Vector128<double> right);
    public static Vector128<long> ShiftRightArithmetic(Vector128<long> value, Vector128<long> count);
    public static Vector128<long> ShiftRightArithmetic(Vector128<long> value, [ConstantExpected] byte count);
    public static Vector128<long> ShiftRightArithmeticVariable(Vector128<long> value, Vector128<ulong> count);
    public static Vector128<sbyte> TernaryLogic(Vector128<sbyte> a, Vector128<sbyte> b, Vector128<sbyte> c, [ConstantExpected] byte control);
    public static Vector128<byte> TernaryLogic(Vector128<byte> a, Vector128<byte> b, Vector128<byte> c, [ConstantExpected] byte control);
    public static Vector128<short> TernaryLogic(Vector128<short> a, Vector128<short> b, Vector128<short> c, [ConstantExpected] byte control);
    public static Vector128<ushort> TernaryLogic(Vector128<ushort> a, Vector128<ushort> b, Vector128<ushort> c, [ConstantExpected] byte control);
    public static Vector128<int> TernaryLogic(Vector128<int> a, Vector128<int> b, Vector128<int> c, [ConstantExpected] byte control);
    public static Vector128<uint> TernaryLogic(Vector128<uint> a, Vector128<uint> b, Vector128<uint> c, [ConstantExpected] byte control);
    public static Vector128<long> TernaryLogic(Vector128<long> a, Vector128<long> b, Vector128<long> c, [ConstantExpected] byte control);
    public static Vector128<ulong> TernaryLogic(Vector128<ulong> a, Vector128<ulong> b, Vector128<ulong> c, [ConstantExpected] byte control);
    public static Vector128<float> TernaryLogic(Vector128<float> a, Vector128<float> b, Vector128<float> c, [ConstantExpected] byte control);
    public static Vector128<double> TernaryLogic(Vector128<double> a, Vector128<double> b, Vector128<double> c, [ConstantExpected] byte control);

    /// From AVX512BW VL
    public static Vector128<byte> CompareGreaterThan(Vector128<byte> left, Vector128<byte> right);
    public static Vector128<byte> CompareGreaterThanOrEqual(Vector128<byte> left, Vector128<byte> right);
    public static Vector128<byte> CompareLessThan(Vector128<byte> left, Vector128<byte> right);
    public static Vector128<byte> CompareLessThanOrEqual(Vector128<byte> left, Vector128<byte> right);
    public static Vector128<byte> CompareNotEqual(Vector128<byte> left, Vector128<byte> right);
    public static Vector128<short> CompareGreaterThanOrEqual(Vector128<short> left, Vector128<short> right);
    public static Vector128<short> CompareLessThan(Vector128<short> left, Vector128<short> right);
    public static Vector128<short> CompareLessThanOrEqual(Vector128<short> left, Vector128<short> right);
    public static Vector128<short> CompareNotEqual(Vector128<short> left, Vector128<short> right);
    public static Vector128<sbyte> CompareGreaterThanOrEqual(Vector128<sbyte> left, Vector128<sbyte> right);
    public static Vector128<sbyte> CompareLessThan(Vector128<sbyte> left, Vector128<sbyte> right);
    public static Vector128<sbyte> CompareLessThanOrEqual(Vector128<sbyte> left, Vector128<sbyte> right);
    public static Vector128<sbyte> CompareNotEqual(Vector128<sbyte> left, Vector128<sbyte> right);
    public static Vector128<ushort> CompareGreaterThan(Vector128<ushort> left, Vector128<ushort> right);
    public static Vector128<ushort> CompareGreaterThanOrEqual(Vector128<ushort> left, Vector128<ushort> right);
    public static Vector128<ushort> CompareLessThan(Vector128<ushort> left, Vector128<ushort> right);
    public static Vector128<ushort> CompareLessThanOrEqual(Vector128<ushort> left, Vector128<ushort> right);
    public static Vector128<ushort> CompareNotEqual(Vector128<ushort> left, Vector128<ushort> right);
    public static Vector128<byte> ConvertToVector128Byte(Vector128<short> value);
    public static Vector128<byte> ConvertToVector128Byte(Vector128<ushort> value);
    public static Vector128<byte> ConvertToVector128ByteWithSaturation(Vector128<ushort> value);
    public static Vector128<sbyte> ConvertToVector128SByte(Vector128<short> value);
    public static Vector128<sbyte> ConvertToVector128SByte(Vector128<ushort> value);
    public static Vector128<sbyte> ConvertToVector128SByteWithSaturation(Vector128<short> value);
    public static Vector128<short> PermuteVar8x16(Vector128<short> left, Vector128<short> control);
    public static Vector128<ushort> PermuteVar8x16(Vector128<ushort> left, Vector128<ushort> control);
    public static Vector128<short> PermuteVar8x16x2(Vector128<short> lower, Vector128<short> indices, Vector128<short> upper);
    public static Vector128<ushort> PermuteVar8x16x2(Vector128<ushort> lower, Vector128<ushort> indices, Vector128<ushort> upper);
    public static Vector128<short> ShiftLeftLogicalVariable(Vector128<short> value, Vector128<ushort> count);
    public static Vector128<ushort> ShiftLeftLogicalVariable(Vector128<ushort> value, Vector128<ushort> count);
    public static Vector128<short> ShiftRightArithmeticVariable(Vector128<short> value, Vector128<ushort> count);
    public static Vector128<short> ShiftRightLogicalVariable(Vector128<short> value, Vector128<ushort> count);
    public static Vector128<ushort> ShiftRightLogicalVariable(Vector128<ushort> value, Vector128<ushort> count);
    public static Vector128<ushort> SumAbsoluteDifferencesInBlock32(Vector128<byte> left, Vector128<byte> right, [ConstantExpected] byte control);

    /// From AVX512CD VL
    public static Vector128<int> DetectConflicts(Vector128<int> value);
    public static Vector128<uint> DetectConflicts(Vector128<uint> value);
    public static Vector128<long> DetectConflicts(Vector128<long> value);
    public static Vector128<ulong> DetectConflicts(Vector128<ulong> value);
    public static Vector128<int> LeadingZeroCount(Vector128<int> value);
    public static Vector128<uint> LeadingZeroCount(Vector128<uint> value);
    public static Vector128<long> LeadingZeroCount(Vector128<long> value);
    public static Vector128<ulong> LeadingZeroCount(Vector128<ulong> value);

    /// From AVX512DQ VL
    public static Vector128<int> BroadcastPairScalarToVector128(Vector128<int> value);
    public static Vector128<uint> BroadcastPairScalarToVector128(Vector128<uint> value);
    public static Vector128<double> ConvertToVector128Double(Vector128<long> value);
    public static Vector128<double> ConvertToVector128Double(Vector128<ulong> value);
    public static Vector128<long> ConvertToVector128Int64(Vector128<float> value);
    public static Vector128<long> ConvertToVector128Int64(Vector128<double> value);
    public static Vector128<long> ConvertToVector128Int64WithTruncation(Vector128<float> value);
    public static Vector128<long> ConvertToVector128Int64WithTruncation(Vector128<double> value);
    public static Vector128<float> ConvertToVector128Single(Vector128<long> value);
    public static Vector128<float> ConvertToVector128Single(Vector128<ulong> value);
    public static Vector128<ulong> ConvertToVector128UInt64(Vector128<float> value);
    public static Vector128<ulong> ConvertToVector128UInt64(Vector128<double> value);
    public static Vector128<ulong> ConvertToVector128UInt64WithTruncation(Vector128<float> value);
    public static Vector128<ulong> ConvertToVector128UInt64WithTruncation(Vector128<double> value);
    public static Vector128<long> MultiplyLow(Vector128<long> left, Vector128<long> right);
    public static Vector128<ulong> MultiplyLow(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<float> Range(Vector128<float> left, Vector128<float> right, [ConstantExpected(Max = (byte)(0x0F))] byte control);
    public static Vector128<double> Range(Vector128<double> left, Vector128<double> right, [ConstantExpected(Max = (byte)(0x0F))] byte control);
    public static Vector128<float> Reduce(Vector128<float> value, [ConstantExpected] byte control);
    public static Vector128<double> Reduce(Vector128<double> value, [ConstantExpected] byte control);

    /// From AVX512_VBMI VL
    public static Vector128<sbyte> PermuteVar16x8(Vector128<sbyte> left, Vector128<sbyte> control);
    public static Vector128<byte> PermuteVar16x8(Vector128<byte> left, Vector128<byte> control);
    public static Vector128<byte> PermuteVar16x8x2(Vector128<byte> lower, Vector128<byte> indices, Vector128<byte> upper);
    public static Vector128<sbyte> PermuteVar16x8x2(Vector128<sbyte> lower, Vector128<sbyte> indices, Vector128<sbyte> upper);
       
    public abstract class V256 
    {
        public static new bool IsSupported { get; }
        
        /// From AVX512F VL
        public static Vector256<ulong> Abs(Vector256<long> value);
        public static Vector256<int> AlignRight32(Vector256<int> left, Vector256<int> right, [ConstantExpected] byte mask);
        public static Vector256<uint> AlignRight32(Vector256<uint> left, Vector256<uint> right, [ConstantExpected] byte mask);
        public static Vector256<long> AlignRight64(Vector256<long> left, Vector256<long> right, [ConstantExpected] byte mask);
        public static Vector256<ulong> AlignRight64(Vector256<ulong> left, Vector256<ulong> right, [ConstantExpected] byte mask);
        public static Vector256<int> CompareGreaterThanOrEqual(Vector256<int> left, Vector256<int> right);
        public static Vector256<int> CompareLessThan(Vector256<int> left, Vector256<int> right);
        public static Vector256<int> CompareLessThanOrEqual(Vector256<int> left, Vector256<int> right);
        public static Vector256<int> CompareNotEqual(Vector256<int> left, Vector256<int> right);
        public static Vector256<long> CompareGreaterThanOrEqual(Vector256<long> left, Vector256<long> right);
        public static Vector256<long> CompareLessThan(Vector256<long> left, Vector256<long> right);
        public static Vector256<long> CompareLessThanOrEqual(Vector256<long> left, Vector256<long> right);
        public static Vector256<long> CompareNotEqual(Vector256<long> left, Vector256<long> right);
        public static Vector256<uint> CompareGreaterThan(Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareGreaterThanOrEqual(Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareLessThan(Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareLessThanOrEqual(Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareNotEqual(Vector256<uint> left, Vector256<uint> right);
        public static Vector256<ulong> CompareGreaterThan(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareGreaterThanOrEqual(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareLessThan(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareLessThanOrEqual(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareNotEqual(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector128<byte> ConvertToVector128Byte(Vector256<int> value);
        public static Vector128<byte> ConvertToVector128Byte(Vector256<long> value);
        public static Vector128<byte> ConvertToVector128Byte(Vector256<uint> value);
        public static Vector128<byte> ConvertToVector128Byte(Vector256<ulong> value);
        public static Vector128<byte> ConvertToVector128ByteWithSaturation(Vector256<uint> value);
        public static Vector128<byte> ConvertToVector128ByteWithSaturation(Vector256<ulong> value);
        public static Vector128<short> ConvertToVector128Int16(Vector256<int> value);
        public static Vector128<short> ConvertToVector128Int16(Vector256<long> value);
        public static Vector128<short> ConvertToVector128Int16(Vector256<uint> value);
        public static Vector128<short> ConvertToVector128Int16(Vector256<ulong> value);
        public static Vector128<short> ConvertToVector128Int16WithSaturation(Vector256<int> value);
        public static Vector128<short> ConvertToVector128Int16WithSaturation(Vector256<long> value);
        public static Vector128<int> ConvertToVector128Int32(Vector256<long> value);
        public static Vector128<int> ConvertToVector128Int32(Vector256<ulong> value);
        public static Vector128<int> ConvertToVector128Int32WithSaturation(Vector256<long> value);
        public static Vector128<sbyte> ConvertToVector128SByte(Vector256<int> value);
        public static Vector128<sbyte> ConvertToVector128SByte(Vector256<long> value);
        public static Vector128<sbyte> ConvertToVector128SByte(Vector256<uint> value);
        public static Vector128<sbyte> ConvertToVector128SByte(Vector256<ulong> value);
        public static Vector128<sbyte> ConvertToVector128SByteWithSaturation(Vector256<int> value);
        public static Vector128<sbyte> ConvertToVector128SByteWithSaturation(Vector256<long> value);
        public static Vector128<ushort> ConvertToVector128UInt16(Vector256<int> value);
        public static Vector128<ushort> ConvertToVector128UInt16(Vector256<long> value);
        public static Vector128<ushort> ConvertToVector128UInt16(Vector256<uint> value);
        public static Vector128<ushort> ConvertToVector128UInt16(Vector256<ulong> value);
        public static Vector128<ushort> ConvertToVector128UInt16WithSaturation(Vector256<uint> value);
        public static Vector128<ushort> ConvertToVector128UInt16WithSaturation(Vector256<ulong> value);
        public static Vector128<uint> ConvertToVector128UInt32(Vector256<long> value);
        public static Vector128<uint> ConvertToVector128UInt32(Vector256<ulong> value);
        public static Vector128<uint> ConvertToVector128UInt32(Vector256<double> value);
        public static Vector128<uint> ConvertToVector128UInt32WithSaturation(Vector256<ulong> value);
        public static Vector128<uint> ConvertToVector128UInt32WithTruncation(Vector256<double> value);
        public static Vector256<double> ConvertToVector256Double(Vector128<uint> value);
        public static Vector256<float> ConvertToVector256Single(Vector256<uint> value);
        public static Vector256<uint> ConvertToVector256UInt32(Vector256<float> value);
        public static Vector256<uint> ConvertToVector256UInt32WithTruncation(Vector256<float> value);
        public static Vector256<float> Fixup(Vector256<float> left, Vector256<float> right, Vector256<int> table, [ConstantExpected] byte control);
        public static Vector256<double> Fixup(Vector256<double> left, Vector256<double> right, Vector256<long> table, [ConstantExpected] byte control);
        public static Vector256<float> GetExponent(Vector256<float> value);
        public static Vector256<double> GetExponent(Vector256<double> value);
        public static Vector256<float> GetMantissa(Vector256<float> value, [ConstantExpected(Max = (byte)(0x0F))] byte control);
        public static Vector256<double> GetMantissa(Vector256<double> value, [ConstantExpected(Max = (byte)(0x0F))] byte control);
        public static Vector256<long> Max(Vector256<long> left, Vector256<long> right);
        public static Vector256<ulong> Max(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<long> Min(Vector256<long> left, Vector256<long> right);
        public static Vector256<ulong> Min(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<long> PermuteVar4x64(Vector256<long> value, Vector256<long> control);
        public static Vector256<ulong> PermuteVar4x64(Vector256<ulong> value, Vector256<ulong> control);
        public static Vector256<double> PermuteVar4x64(Vector256<double> value, Vector256<long> control);
        public static Vector256<long> PermuteVar4x64x2(Vector256<long> lower, Vector256<long> indices, Vector256<long> upper);
        public static Vector256<ulong> PermuteVar4x64x2(Vector256<ulong> lower, Vector256<ulong> indices, Vector256<ulong> upper);
        public static Vector256<double> PermuteVar4x64x2(Vector256<double> lower, Vector256<long> indices, Vector256<double> upper);
        public static Vector256<int> PermuteVar8x32x2(Vector256<int> lower, Vector256<int> indices, Vector256<int> upper);
        public static Vector256<uint> PermuteVar8x32x2(Vector256<uint> lower, Vector256<uint> indices, Vector256<uint> upper);
        public static Vector256<float> PermuteVar8x32x2(Vector256<float> lower, Vector256<int> indices, Vector256<float> upper);
        public static Vector256<float> Reciprocal14(Vector256<float> value);
        public static Vector256<double> Reciprocal14(Vector256<double> value);
        public static Vector256<float> ReciprocalSqrt14(Vector256<float> value);
        public static Vector256<double> ReciprocalSqrt14(Vector256<double> value);
        public static Vector256<int> RotateLeft(Vector256<int> value, [ConstantExpected] byte count);
        public static Vector256<uint> RotateLeft(Vector256<uint> value, [ConstantExpected] byte count);
        public static Vector256<long> RotateLeft(Vector256<long> value, [ConstantExpected] byte count);
        public static Vector256<ulong> RotateLeft(Vector256<ulong> value, [ConstantExpected] byte count);
        public static Vector256<int> RotateLeftVariable(Vector256<int> value, Vector256<uint> count);
        public static Vector256<uint> RotateLeftVariable(Vector256<uint> value, Vector256<uint> count);
        public static Vector256<long> RotateLeftVariable(Vector256<long> value, Vector256<ulong> count);
        public static Vector256<ulong> RotateLeftVariable(Vector256<ulong> value, Vector256<ulong> count);
        public static Vector256<int> RotateRight(Vector256<int> value, [ConstantExpected] byte count);
        public static Vector256<uint> RotateRight(Vector256<uint> value, [ConstantExpected] byte count);
        public static Vector256<long> RotateRight(Vector256<long> value, [ConstantExpected] byte count);
        public static Vector256<ulong> RotateRight(Vector256<ulong> value, [ConstantExpected] byte count);
        public static Vector256<int> RotateRightVariable(Vector256<int> value, Vector256<uint> count);
        public static Vector256<uint> RotateRightVariable(Vector256<uint> value, Vector256<uint> count);
        public static Vector256<long> RotateRightVariable(Vector256<long> value, Vector256<ulong> count);
        public static Vector256<ulong> RotateRightVariable(Vector256<ulong> value, Vector256<ulong> count);
        public static Vector256<float> RoundScale(Vector256<float> value, [ConstantExpected] byte control);
        public static Vector256<double> RoundScale(Vector256<double> value, [ConstantExpected] byte control);
        public static Vector256<float> Scale(Vector256<float> left, Vector256<float> right);
        public static Vector256<double> Scale(Vector256<double> left, Vector256<double> right);
        public static Vector256<long> ShiftRightArithmetic(Vector256<long> value, Vector128<long> count);
        public static Vector256<long> ShiftRightArithmetic(Vector256<long> value, [ConstantExpected] byte count);
        public static Vector256<long> ShiftRightArithmeticVariable(Vector256<long> value, Vector256<ulong> count);
        public static Vector256<double> Shuffle2x128(Vector256<double> left, Vector256<double> right, [ConstantExpected] byte control);
        public static Vector256<int> Shuffle2x128(Vector256<int> left, Vector256<int> right, [ConstantExpected] byte control);
        public static Vector256<long> Shuffle2x128(Vector256<long> left, Vector256<long> right, [ConstantExpected] byte control);
        public static Vector256<float> Shuffle2x128(Vector256<float> left, Vector256<float> right, [ConstantExpected] byte control);
        public static Vector256<uint> Shuffle2x128(Vector256<uint> left, Vector256<uint> right, [ConstantExpected] byte control);
        public static Vector256<ulong> Shuffle2x128(Vector256<ulong> left, Vector256<ulong> right, [ConstantExpected] byte control);
        public static Vector256<sbyte> TernaryLogic(Vector256<sbyte> a, Vector256<sbyte> b, Vector256<sbyte> c, [ConstantExpected] byte control);
        public static Vector256<byte> TernaryLogic(Vector256<byte> a, Vector256<byte> b, Vector256<byte> c, [ConstantExpected] byte control);
        public static Vector256<short> TernaryLogic(Vector256<short> a, Vector256<short> b, Vector256<short> c, [ConstantExpected] byte control);
        public static Vector256<ushort> TernaryLogic(Vector256<ushort> a, Vector256<ushort> b, Vector256<ushort> c, [ConstantExpected] byte control);
        public static Vector256<int> TernaryLogic(Vector256<int> a, Vector256<int> b, Vector256<int> c, [ConstantExpected] byte control);
        public static Vector256<uint> TernaryLogic(Vector256<uint> a, Vector256<uint> b, Vector256<uint> c, [ConstantExpected] byte control);
        public static Vector256<long> TernaryLogic(Vector256<long> a, Vector256<long> b, Vector256<long> c, [ConstantExpected] byte control);
        public static Vector256<ulong> TernaryLogic(Vector256<ulong> a, Vector256<ulong> b, Vector256<ulong> c, [ConstantExpected] byte control);
        public static Vector256<float> TernaryLogic(Vector256<float> a, Vector256<float> b, Vector256<float> c, [ConstantExpected] byte control);
        public static Vector256<double> TernaryLogic(Vector256<double> a, Vector256<double> b, Vector256<double> c, [ConstantExpected] byte control);

        /// From AVX512BW VL
        public static Vector256<byte> CompareGreaterThan(Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareGreaterThanOrEqual(Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareLessThan(Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareLessThanOrEqual(Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareNotEqual(Vector256<byte> left, Vector256<byte> right);
        public static Vector256<short> CompareGreaterThanOrEqual(Vector256<short> left, Vector256<short> right);
        public static Vector256<short> CompareLessThan(Vector256<short> left, Vector256<short> right);
        public static Vector256<short> CompareLessThanOrEqual(Vector256<short> left, Vector256<short> right);
        public static Vector256<short> CompareNotEqual(Vector256<short> left, Vector256<short> right);
        public static Vector256<sbyte> CompareGreaterThanOrEqual(Vector256<sbyte> left, Vector256<sbyte> right);
        public static Vector256<sbyte> CompareLessThan(Vector256<sbyte> left, Vector256<sbyte> right);
        public static Vector256<sbyte> CompareLessThanOrEqual(Vector256<sbyte> left, Vector256<sbyte> right);
        public static Vector256<sbyte> CompareNotEqual(Vector256<sbyte> left, Vector256<sbyte> right);
        public static Vector256<ushort> CompareGreaterThan(Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareGreaterThanOrEqual(Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareLessThan(Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareLessThanOrEqual(Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareNotEqual(Vector256<ushort> left, Vector256<ushort> right);
        public static Vector128<byte> ConvertToVector128Byte(Vector256<short> value);
        public static Vector128<byte> ConvertToVector128Byte(Vector256<ushort> value);
        public static Vector128<byte> ConvertToVector128ByteWithSaturation(Vector256<ushort> value);
        public static Vector128<sbyte> ConvertToVector128SByte(Vector256<short> value);
        public static Vector128<sbyte> ConvertToVector128SByte(Vector256<ushort> value);
        public static Vector128<sbyte> ConvertToVector128SByteWithSaturation(Vector256<short> value);
        public static Vector256<short> PermuteVar16x16(Vector256<short> left, Vector256<short> control);
        public static Vector256<ushort> PermuteVar16x16(Vector256<ushort> left, Vector256<ushort> control);
        public static Vector256<short> PermuteVar16x16x2(Vector256<short> lower, Vector256<short> indices, Vector256<short> upper);
        public static Vector256<ushort> PermuteVar16x16x2(Vector256<ushort> lower, Vector256<ushort> indices, Vector256<ushort> upper);
        public static Vector256<short> ShiftLeftLogicalVariable(Vector256<short> value, Vector256<ushort> count);
        public static Vector256<ushort> ShiftLeftLogicalVariable(Vector256<ushort> value, Vector256<ushort> count);
        public static Vector256<short> ShiftRightArithmeticVariable(Vector256<short> value, Vector256<ushort> count);
        public static Vector256<short> ShiftRightLogicalVariable(Vector256<short> value, Vector256<ushort> count);
        public static Vector256<ushort> ShiftRightLogicalVariable(Vector256<ushort> value, Vector256<ushort> count);
        public static Vector256<ushort> SumAbsoluteDifferencesInBlock32(Vector256<byte> left, Vector256<byte> right, [ConstantExpected] byte control);

        /// From AVX512CD VL
        public static Vector256<int> DetectConflicts(Vector256<int> value);
        public static Vector256<uint> DetectConflicts(Vector256<uint> value);
        public static Vector256<long> DetectConflicts(Vector256<long> value);
        public static Vector256<ulong> DetectConflicts(Vector256<ulong> value);
        public static Vector256<int> LeadingZeroCount(Vector256<int> value);
        public static Vector256<uint> LeadingZeroCount(Vector256<uint> value);
        public static Vector256<long> LeadingZeroCount(Vector256<long> value);
        public static Vector256<ulong> LeadingZeroCount(Vector256<ulong> value);
        
        /// From AVX512DQ VL
        public static Vector256<int> BroadcastPairScalarToVector256(Vector128<int> value);
        public static Vector256<uint> BroadcastPairScalarToVector256(Vector128<uint> value);
        public static Vector256<float> BroadcastPairScalarToVector256(Vector128<float> value);
        public static Vector128<float> ConvertToVector128Single(Vector256<long> value);
        public static Vector128<float> ConvertToVector128Single(Vector256<ulong> value);
        public static Vector256<double> ConvertToVector256Double(Vector256<long> value);
        public static Vector256<double> ConvertToVector256Double(Vector256<ulong> value);
        public static Vector256<long> ConvertToVector256Int64(Vector128<float> value);
        public static Vector256<long> ConvertToVector256Int64(Vector256<double> value);
        public static Vector256<long> ConvertToVector256Int64WithTruncation(Vector128<float> value);
        public static Vector256<long> ConvertToVector256Int64WithTruncation(Vector256<double> value);
        public static Vector256<ulong> ConvertToVector256UInt64(Vector128<float> value);
        public static Vector256<ulong> ConvertToVector256UInt64(Vector256<double> value);
        public static Vector256<ulong> ConvertToVector256UInt64WithTruncation(Vector128<float> value);
        public static Vector256<ulong> ConvertToVector256UInt64WithTruncation(Vector256<double> value);
        public static Vector256<long> MultiplyLow(Vector256<long> left, Vector256<long> right);
        public static Vector256<ulong> MultiplyLow(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<float> Range(Vector256<float> left, Vector256<float> right, [ConstantExpected(Max = (byte)(0x0F))] byte control);
        public static Vector256<double> Range(Vector256<double> left, Vector256<double> right, [ConstantExpected(Max = (byte)(0x0F))] byte control);
        public static Vector256<float> Reduce(Vector256<float> value, [ConstantExpected] byte control);
        public static Vector256<double> Reduce(Vector256<double> value, [ConstantExpected] byte control);

        /// From AVX512_VBMI VL
        public static Vector256<sbyte> PermuteVar32x8(Vector256<sbyte> left, Vector256<sbyte> control);
        public static Vector256<byte> PermuteVar32x8(Vector256<byte> left, Vector256<byte> control);
        public static Vector256<byte> PermuteVar32x8x2(Vector256<byte> lower, Vector256<byte> indices, Vector256<byte> upper);
        public static Vector256<sbyte> PermuteVar32x8x2(Vector256<sbyte> lower, Vector256<sbyte> indices, Vector256<sbyte> upper);
    }

    public abstract class V512 : Avx512F
    {
        public static new bool IsSupported { get; }

        /// From AVX512BW
        public static Vector512<byte> Abs(Vector512<sbyte> value);
        public static Vector512<ushort> Abs(Vector512<short> value);
        public static Vector512<sbyte> Add(Vector512<sbyte> left, Vector512<sbyte> right);
        public static Vector512<byte> Add(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<short> Add(Vector512<short> left, Vector512<short> right);
        public static Vector512<ushort> Add(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector512<sbyte> AddSaturate(Vector512<sbyte> left, Vector512<sbyte> right);
        public static Vector512<byte> AddSaturate(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<short> AddSaturate(Vector512<short> left, Vector512<short> right);
        public static Vector512<ushort> AddSaturate(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector512<sbyte> AlignRight(Vector512<sbyte> left, Vector512<sbyte> right, [ConstantExpected] byte mask);
        public static Vector512<byte> AlignRight(Vector512<byte> left, Vector512<byte> right, [ConstantExpected] byte mask);
        public static Vector512<byte> Average(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<ushort> Average(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector512<byte> BlendVariable(Vector512<byte> left, Vector512<byte> right, Vector512<byte> mask);
        public static Vector512<short> BlendVariable(Vector512<short> left, Vector512<short> right, Vector512<short> mask);
        public static Vector512<sbyte> BlendVariable(Vector512<sbyte> left, Vector512<sbyte> right, Vector512<sbyte> mask);
        public static Vector512<ushort> BlendVariable(Vector512<ushort> left, Vector512<ushort> right, Vector512<ushort> mask);
        public static Vector512<byte> BroadcastScalarToVector512(Vector128<byte> value);
        public static Vector512<sbyte> BroadcastScalarToVector512(Vector128<sbyte> value);
        public static Vector512<short> BroadcastScalarToVector512(Vector128<short> value);
        public static Vector512<ushort> BroadcastScalarToVector512(Vector128<ushort> value);
        public static Vector512<byte> CompareEqual(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<byte> CompareGreaterThan(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<byte> CompareGreaterThanOrEqual(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<byte> CompareLessThan(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<byte> CompareLessThanOrEqual(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<byte> CompareNotEqual(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<short> CompareEqual(Vector512<short> left, Vector512<short> right);
        public static Vector512<short> CompareGreaterThan(Vector512<short> left, Vector512<short> right);
        public static Vector512<short> CompareGreaterThanOrEqual(Vector512<short> left, Vector512<short> right);
        public static Vector512<short> CompareLessThan(Vector512<short> left, Vector512<short> right);
        public static Vector512<short> CompareLessThanOrEqual(Vector512<short> left, Vector512<short> right);
        public static Vector512<short> CompareNotEqual(Vector512<short> left, Vector512<short> right);
        public static Vector512<sbyte> CompareEqual(Vector512<sbyte> left, Vector512<sbyte> right);
        public static Vector512<sbyte> CompareGreaterThan(Vector512<sbyte> left, Vector512<sbyte> right);
        public static Vector512<sbyte> CompareGreaterThanOrEqual(Vector512<sbyte> left, Vector512<sbyte> right);
        public static Vector512<sbyte> CompareLessThan(Vector512<sbyte> left, Vector512<sbyte> right);
        public static Vector512<sbyte> CompareLessThanOrEqual(Vector512<sbyte> left, Vector512<sbyte> right);
        public static Vector512<sbyte> CompareNotEqual(Vector512<sbyte> left, Vector512<sbyte> right);
        public static Vector512<ushort> CompareEqual(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector512<ushort> CompareGreaterThan(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector512<ushort> CompareGreaterThanOrEqual(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector512<ushort> CompareLessThan(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector512<ushort> CompareLessThanOrEqual(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector512<ushort> CompareNotEqual(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector256<byte> ConvertToVector256Byte(Vector512<short> value);
        public static Vector256<byte> ConvertToVector256Byte(Vector512<ushort> value);
        public static Vector256<byte> ConvertToVector256ByteWithSaturation(Vector512<ushort> value);
        public static Vector256<sbyte> ConvertToVector256SByte(Vector512<short> value);
        public static Vector256<sbyte> ConvertToVector256SByte(Vector512<ushort> value);
        public static Vector256<sbyte> ConvertToVector256SByteWithSaturation(Vector512<short> value);
        public static Vector512<short> ConvertToVector512Int16(Vector256<sbyte> value);
        public static Vector512<short> ConvertToVector512Int16(Vector256<byte> value);
        public static Vector512<ushort> ConvertToVector512UInt16(Vector256<sbyte> value);
        public static Vector512<ushort> ConvertToVector512UInt16(Vector256<byte> value);
        public static new unsafe Vector512<sbyte> LoadVector512(sbyte* address);
        public static new unsafe Vector512<byte> LoadVector512(byte* address);
        public static new unsafe Vector512<short> LoadVector512(short* address);
        public static new unsafe Vector512<ushort> LoadVector512(ushort* address);
        public static Vector512<sbyte> Max(Vector512<sbyte> left, Vector512<sbyte> right);
        public static Vector512<byte> Max(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<short> Max(Vector512<short> left, Vector512<short> right);
        public static Vector512<ushort> Max(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector512<sbyte> Min(Vector512<sbyte> left, Vector512<sbyte> right);
        public static Vector512<byte> Min(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<short> Min(Vector512<short> left, Vector512<short> right);
        public static Vector512<ushort> Min(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector512<int> MultiplyAddAdjacent(Vector512<short> left, Vector512<short> right);
        public static Vector512<short> MultiplyAddAdjacent(Vector512<byte> left, Vector512<sbyte> right);
        public static Vector512<short> MultiplyHigh(Vector512<short> left, Vector512<short> right);
        public static Vector512<ushort> MultiplyHigh(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector512<short> MultiplyHighRoundScale(Vector512<short> left, Vector512<short> right);
        public static Vector512<short> MultiplyLow(Vector512<short> left, Vector512<short> right);
        public static Vector512<ushort> MultiplyLow(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector512<sbyte> PackSignedSaturate(Vector512<short> left, Vector512<short> right);
        public static Vector512<short> PackSignedSaturate(Vector512<int> left, Vector512<int> right);
        public static Vector512<byte> PackUnsignedSaturate(Vector512<short> left, Vector512<short> right);
        public static Vector512<ushort> PackUnsignedSaturate(Vector512<int> left, Vector512<int> right);
        public static Vector512<short> PermuteVar32x16(Vector512<short> left, Vector512<short> control);
        public static Vector512<ushort> PermuteVar32x16(Vector512<ushort> left, Vector512<ushort> control);
        public static Vector512<short> PermuteVar32x16x2(Vector512<short> lower, Vector512<short> indices, Vector512<short> upper);
        public static Vector512<ushort> PermuteVar32x16x2(Vector512<ushort> lower, Vector512<ushort> indices, Vector512<ushort> upper);
        public static Vector512<short> ShiftLeftLogical(Vector512<short> value, Vector128<short> count);
        public static Vector512<ushort> ShiftLeftLogical(Vector512<ushort> value, Vector128<ushort> count);
        public static Vector512<short> ShiftLeftLogical(Vector512<short> value, [ConstantExpected] byte count);
        public static Vector512<ushort> ShiftLeftLogical(Vector512<ushort> value, [ConstantExpected] byte count);
        public static Vector512<sbyte> ShiftLeftLogical128BitLane(Vector512<sbyte> value, [ConstantExpected] byte numBytes);
        public static Vector512<byte> ShiftLeftLogical128BitLane(Vector512<byte> value, [ConstantExpected] byte numBytes);
        public static Vector512<short> ShiftLeftLogicalVariable(Vector512<short> value, Vector512<ushort> count);
        public static Vector512<ushort> ShiftLeftLogicalVariable(Vector512<ushort> value, Vector512<ushort> count);
        public static Vector512<short> ShiftRightArithmetic(Vector512<short> value, Vector128<short> count);
        public static Vector512<short> ShiftRightArithmetic(Vector512<short> value, [ConstantExpected] byte count);
        public static Vector512<short> ShiftRightArithmeticVariable(Vector512<short> value, Vector512<ushort> count);
        public static Vector512<short> ShiftRightLogical(Vector512<short> value, Vector128<short> count);
        public static Vector512<ushort> ShiftRightLogical(Vector512<ushort> value, Vector128<ushort> count);
        public static Vector512<short> ShiftRightLogical(Vector512<short> value, [ConstantExpected] byte count);
        public static Vector512<ushort> ShiftRightLogical(Vector512<ushort> value, [ConstantExpected] byte count);
        public static Vector512<sbyte> ShiftRightLogical128BitLane(Vector512<sbyte> value, [ConstantExpected] byte numBytes);
        public static Vector512<byte> ShiftRightLogical128BitLane(Vector512<byte> value, [ConstantExpected] byte numBytes);
        public static Vector512<short> ShiftRightLogicalVariable(Vector512<short> value, Vector512<ushort> count);
        public static Vector512<ushort> ShiftRightLogicalVariable(Vector512<ushort> value, Vector512<ushort> count);
        public static Vector512<sbyte> Shuffle(Vector512<sbyte> value, Vector512<sbyte> mask);
        public static Vector512<byte> Shuffle(Vector512<byte> value, Vector512<byte> mask);
        public static Vector512<short> ShuffleHigh(Vector512<short> value, [ConstantExpected] byte control);
        public static Vector512<ushort> ShuffleHigh(Vector512<ushort> value, [ConstantExpected] byte control);
        public static Vector512<short> ShuffleLow(Vector512<short> value, [ConstantExpected] byte control);
        public static Vector512<ushort> ShuffleLow(Vector512<ushort> value, [ConstantExpected] byte control);
        public static new unsafe void Store(sbyte* address, Vector512<sbyte> source);
        public static new unsafe void Store(byte* address, Vector512<byte> source);
        public static new unsafe void Store(short* address, Vector512<short> source);
        public static new unsafe void Store(ushort* address, Vector512<ushort> source);
        public static Vector512<sbyte> Subtract(Vector512<sbyte> left, Vector512<sbyte> right);
        public static Vector512<byte> Subtract(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<short> Subtract(Vector512<short> left, Vector512<short> right);
        public static Vector512<ushort> Subtract(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector512<sbyte> SubtractSaturate(Vector512<sbyte> left, Vector512<sbyte> right);
        public static Vector512<short> SubtractSaturate(Vector512<short> left, Vector512<short> right);
        public static Vector512<byte> SubtractSaturate(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<ushort> SubtractSaturate(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector512<ushort> SumAbsoluteDifferences(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<ushort> SumAbsoluteDifferencesInBlock32(Vector512<byte> left, Vector512<byte> right, [ConstantExpected] byte control);
        public static Vector512<sbyte> UnpackHigh(Vector512<sbyte> left, Vector512<sbyte> right);
        public static Vector512<byte> UnpackHigh(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<short> UnpackHigh(Vector512<short> left, Vector512<short> right);
        public static Vector512<ushort> UnpackHigh(Vector512<ushort> left, Vector512<ushort> right);
        public static Vector512<sbyte> UnpackLow(Vector512<sbyte> left, Vector512<sbyte> right);
        public static Vector512<byte> UnpackLow(Vector512<byte> left, Vector512<byte> right);
        public static Vector512<short> UnpackLow(Vector512<short> left, Vector512<short> right);
        public static Vector512<ushort> UnpackLow(Vector512<ushort> left, Vector512<ushort> right);

        /// From AVX512CD
        public static Vector512<int> DetectConflicts(Vector512<int> value);
        public static Vector512<uint> DetectConflicts(Vector512<uint> value);
        public static Vector512<long> DetectConflicts(Vector512<long> value);
        public static Vector512<ulong> DetectConflicts(Vector512<ulong> value);
        public static Vector512<int> LeadingZeroCount(Vector512<int> value);
        public static Vector512<uint> LeadingZeroCount(Vector512<uint> value);
        public static Vector512<long> LeadingZeroCount(Vector512<long> value);
        public static Vector512<ulong> LeadingZeroCount(Vector512<ulong> value);

        /// From AVX512DQ
        public static Vector512<float> And(Vector512<float> left, Vector512<float> right);
        public static Vector512<double> And(Vector512<double> left, Vector512<double> right);
        public static Vector512<float> AndNot(Vector512<float> left, Vector512<float> right);
        public static Vector512<double> AndNot(Vector512<double> left, Vector512<double> right);
        public static Vector512<int> BroadcastPairScalarToVector512(Vector128<int> value);
        public static Vector512<uint> BroadcastPairScalarToVector512(Vector128<uint> value);
        public static Vector512<float> BroadcastPairScalarToVector512(Vector128<float> value);
        public static unsafe Vector512<long> BroadcastVector128ToVector512(long* address);
        public static unsafe Vector512<ulong> BroadcastVector128ToVector512(ulong* address);
        public static unsafe Vector512<double> BroadcastVector128ToVector512(double* address);
        public static unsafe Vector512<int> BroadcastVector256ToVector512(int* address);
        public static unsafe Vector512<uint> BroadcastVector256ToVector512(uint* address);
        public static unsafe Vector512<float> BroadcastVector256ToVector512(float* address);
        public static Vector256<float> ConvertToVector256Single(Vector512<long> value);
        public static Vector256<float> ConvertToVector256Single(Vector512<ulong> value);
        public static Vector512<double> ConvertToVector512Double(Vector512<long> value);
        public static Vector512<double> ConvertToVector512Double(Vector512<ulong> value);
        public static Vector512<long> ConvertToVector512Int64(Vector256<float> value);
        public static Vector512<long> ConvertToVector512Int64(Vector512<double> value);
        public static Vector512<long> ConvertToVector512Int64WithTruncation(Vector256<float> value);
        public static Vector512<long> ConvertToVector512Int64WithTruncation(Vector512<double> value);
        public static Vector512<ulong> ConvertToVector512UInt64(Vector256<float> value);
        public static Vector512<ulong> ConvertToVector512UInt64(Vector512<double> value);
        public static Vector512<ulong> ConvertToVector512UInt64WithTruncation(Vector256<float> value);
        public static Vector512<ulong> ConvertToVector512UInt64WithTruncation(Vector512<double> value);
        public static new Vector128<long> ExtractVector128(Vector512<long> value, [ConstantExpected] byte index);
        public static new Vector128<ulong> ExtractVector128(Vector512<ulong> value, [ConstantExpected] byte index);
        public static new Vector128<double> ExtractVector128(Vector512<double> value, [ConstantExpected] byte index);
        public static new Vector256<int> ExtractVector256(Vector512<int> value, [ConstantExpected] byte index);
        public static new Vector256<uint> ExtractVector256(Vector512<uint> value, [ConstantExpected] byte index);
        public static new Vector256<float> ExtractVector256(Vector512<float> value, [ConstantExpected] byte index);
        public static new Vector512<long> InsertVector128(Vector512<long> value, Vector128<long> data, [ConstantExpected] byte index);
        public static new Vector512<ulong> InsertVector128(Vector512<ulong> value, Vector128<ulong> data, [ConstantExpected] byte index);
        public static new Vector512<double> InsertVector128(Vector512<double> value, Vector128<double> data, [ConstantExpected] byte index);
        public static new Vector512<int> InsertVector256(Vector512<int> value, Vector256<int> data, [ConstantExpected] byte index);
        public static new Vector512<uint> InsertVector256(Vector512<uint> value, Vector256<uint> data, [ConstantExpected] byte index);
        public static new Vector512<float> InsertVector256(Vector512<float> value, Vector256<float> data, [ConstantExpected] byte index);
        public static Vector512<long> MultiplyLow(Vector512<long> left, Vector512<long> right);
        public static Vector512<ulong> MultiplyLow(Vector512<ulong> left, Vector512<ulong> right);
        public static Vector512<float> Or(Vector512<float> left, Vector512<float> right);
        public static Vector512<double> Or(Vector512<double> left, Vector512<double> right);
        public static Vector512<float> Range(Vector512<float> left, Vector512<float> right, [ConstantExpected(Max = (byte)(0x0F))] byte control);
        public static Vector512<double> Range(Vector512<double> left, Vector512<double> right, [ConstantExpected(Max = (byte)(0x0F))] byte control);
        public static Vector512<float> Reduce(Vector512<float> value, [ConstantExpected] byte control);
        public static Vector512<double> Reduce(Vector512<double> value, [ConstantExpected] byte control);
        public static Vector512<float> Xor(Vector512<float> left, Vector512<float> right);
        public static Vector512<double> Xor(Vector512<double> left, Vector512<double> right);

        /// From AVX512Vbmi
        public static Vector512<sbyte> PermuteVar64x8(Vector512<sbyte> left, Vector512<sbyte> control);
        public static Vector512<byte> PermuteVar64x8(Vector512<byte> left, Vector512<byte> control);
        public static Vector512<byte> PermuteVar64x8x2(Vector512<byte> lower, Vector512<byte> indices, Vector512<byte> upper);
        public static Vector512<sbyte> PermuteVar64x8x2(Vector512<sbyte> lower, Vector512<sbyte> indices, Vector512<sbyte> upper);   
    }
}

Notes

The proposed API surface area above covers the extensions for which we currently expose an AVX512* API: AVX512F, AVX512BW, AVX512DQ, AVX512CD, and AVX512_VBMI.

The remaining APIs will be added as we expose them.
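
To make one entry in the listing concrete, here is a minimal scalar sketch of what the `DetectConflicts` overloads (from AVX512CD) compute, assuming the usual conflict-detection semantics: bit j of result element i is set when an earlier element j holds the same value as element i. `ConflictReference` is a hypothetical helper written for illustration only; it is not part of the proposal and is not how the JIT implements the intrinsic.

```csharp
using System;

// Hypothetical reference model, for illustration only.
public static class ConflictReference
{
    // Scalar model of DetectConflicts for 32-bit elements:
    // bit j of result[i] is set when j < i and values[j] == values[i].
    public static uint[] DetectConflicts(ReadOnlySpan<int> values)
    {
        if (values.Length > 32)
        {
            throw new ArgumentException(
                "At most 32 elements fit in a 32-bit conflict mask.", nameof(values));
        }

        var result = new uint[values.Length];
        for (int i = 0; i < values.Length; i++)
        {
            uint mask = 0;
            // Only elements that come earlier in the vector are compared.
            for (int j = 0; j < i; j++)
            {
                if (values[j] == values[i])
                {
                    mask |= 1u << j;
                }
            }
            result[i] = mask;
        }
        return result;
    }
}
```

For the input `{ 7, 3, 7, 7 }` this model yields the masks `0, 0, 0b001, 0b101`: the third element conflicts with the first, and the fourth conflicts with the first and third.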

@MichalPetryka
Contributor

MichalPetryka commented Feb 15, 2024

where all current AVX512VL family instructions are consolidated under the top level Avx10v1 and V256

Doesn't this fragment of the spec:

While Intel AVX10/512 includes all Intel AVX-512 instructions, it is important to note that applications compiled to Intel AVX-512 with vector length limited to 256 bits are not guaranteed to be compatible on an Intel AVX10/256 processor.

say that not all VL instructions can be exposed this way?

@anthonycanino
Contributor Author

anthonycanino commented Feb 15, 2024

The incompatibility arises because kmask registers are 32 bits long on AVX10/256:

(EDIT: This was from the outdated spec; in the up-to-date one, all kmask registers are 64-bit.)

image

The JIT hides this abstraction, though it is a point to consider/double check. @tannergooding I don't think this would pose a problem even if kmask support is exposed up to a user API as we had planned on exposing it with VectorMask256 or even Vector256 for a 256-bit mask, which the JIT would handle.

(EDIT: Attached below is the current table)

image

@MichalPetryka
Contributor

MichalPetryka commented Feb 15, 2024

Where is this table from? The 2.0 spec linked in the issue says 64-bit:
image

The JIT hides this abstraction, though it is a point to consider/double check.

I think AOT scenarios would possibly be more of an issue for stuff like this.

@anthonycanino
Contributor Author

My apologies, I had used an outdated spec.

Regardless, I do not think this is an issue. What the spec states is that an existing AVX512 application whose vector usage is limited to 256 bits is not guaranteed to be compatible with AVX10/256. However, from an API standpoint, the set of available ISAs is the same.

We can limit the exposed ISAs if we do find a problematic case, though I am currently unaware of any.

@tannergooding
Member

I don't think this would pose a problem even if kmask support is exposed up to a user API as we had planned on exposing it with VectorMask256 or even Vector256 for a 256-bit mask, which the JIT would handle.

Right. I think this is really only a concern for R2R/AOT: whether a binary that was compiled for AVX512 but only used V128/V256 can be considered "compatible with" hardware that only supports AVX10/V256.

I think the simplest answer today is "no" because we haven't done any tracking on maximum vector use size, only what ISAs are required. For .NET 9+, I imagine we may need to track the maximum used vector size and it gets a little more interesting.
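
The tracking idea above can be sketched as a small compatibility check. This is a hypothetical model only: `PrecompiledImage`, `ImageCompatibility`, and their members are invented for illustration and do not correspond to any runtime or crossgen type. It assumes a precompiled image records whether it needs AVX512 instructions and, optionally, the widest vector it uses, and that the AVX10 hardware reports its maximum vector length.

```csharp
// Hypothetical model for illustration; these types are not part of the runtime.
// MaxVectorBitsUsed is null when the compiler did not track vector widths.
public readonly record struct PrecompiledImage(bool RequiresAvx512, int? MaxVectorBitsUsed);

public static class ImageCompatibility
{
    // cpuMaxVectorBits: maximum vector length reported by an AVX10-capable CPU
    // (256 or 512 in the configurations discussed in this thread).
    public static bool CanRun(PrecompiledImage image, int cpuMaxVectorBits)
    {
        // Images with no AVX512 requirement are not affected by this question.
        if (!image.RequiresAvx512) return true;

        // Per the spec text quoted above, AVX10/512 includes all AVX-512 instructions.
        if (cpuMaxVectorBits >= 512) return true;

        // Without width tracking, the conservative answer is "no".
        if (image.MaxVectorBitsUsed is not int widest) return false;

        // With tracking, accept only images whose widest vector fits the hardware.
        return widest <= cpuMaxVectorBits;
    }
}
```

Under these assumptions, an image with no recorded width is rejected on 256-bit hardware, which matches the "no" answer for today, while an image known to use at most 256-bit vectors could be accepted once the width is tracked.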

@tannergooding
Member

A few small notes:

  1. We will want to move the proposed API surface into the top post before marking it ready-for-review.
  2. We should remove the duplication between API Surface Area and V512 Surface Area. That is, it will make it easier to review if V512 Surface Area is only the additions on top of API Surface Area. Right now APIs like Vector128<ulong> Abs(Vector128<long> value) are listed in both.
  3. We should ensure that the inheritance hierarchy is correct. Thus Avx10v1 should inherit from Avx2 and we should call out any additional implications that can't be covered by the hierarchy (such as Fma). We should do the same for V256 and for V512.
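
The third point can be illustrated with stand-in classes. Everything below is a mock: `MockAvx2`, `MockFma`, `MockAvx10v1`, and `HierarchyChecks` are hypothetical names, and support flags are hard-coded instead of being reported by hardware. The sketch shows two things, under those assumptions: static members of a base class stay reachable through the derived class name, and an implication such as Fma has to be stated separately because a class has only one base.

```csharp
// Stand-in classes for illustration; the real types live in
// System.Runtime.Intrinsics.X86 and report actual hardware support.
public abstract class MockAvx2
{
    public static bool IsSupported => true;
    public static string Describe() => "member declared on MockAvx2";
}

public abstract class MockFma
{
    public static bool IsSupported => true;
}

public abstract class MockAvx10v1 : MockAvx2
{
    // 'new' hides the inherited property so each level reports its own support.
    public static new bool IsSupported => false;

    public abstract class V256 : MockAvx2
    {
        public static new bool IsSupported => false;
    }
}

public static class HierarchyChecks
{
    // Fma is not a base class of MockAvx10v1, so the implication
    // "Avx10v1 supported => Avx2 and Fma supported" is checked explicitly.
    public static bool ImpliedIsasHold() =>
        !MockAvx10v1.IsSupported || (MockAvx2.IsSupported && MockFma.IsSupported);
}
```

With this shape, `MockAvx10v1.Describe()` resolves to the member declared on `MockAvx2`, while `MockAvx10v1.IsSupported` and `MockAvx2.IsSupported` can differ.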

@BruceForstall added the avx10 label Feb 28, 2024
@tannergooding added the api-ready-for-review label and removed the api-suggestion and untriaged labels Feb 29, 2024
@terrajobst
Member

terrajobst commented Feb 29, 2024

Video

  • Looks good as proposed
  • We decided that Avx10v1.V512 should have the full Avx512* API surface
    • Easiest to make it extend Avx512BW and copy in the others
namespace System.Runtime.Intrinsics.X86;

public abstract class Avx10v1 : Avx2
{
    public static new bool IsSupported { get; }
    
    /// From AVX512F VL
    public static Vector128<ulong> Abs(Vector128<long> value);
    public static Vector128<int> AlignRight32(Vector128<int> left, Vector128<int> right, [ConstantExpected] byte mask);
    public static Vector128<uint> AlignRight32(Vector128<uint> left, Vector128<uint> right, [ConstantExpected] byte mask);
    public static Vector128<long> AlignRight64(Vector128<long> left, Vector128<long> right, [ConstantExpected] byte mask);
    public static Vector128<ulong> AlignRight64(Vector128<ulong> left, Vector128<ulong> right, [ConstantExpected] byte mask);
    public static Vector128<int> CompareGreaterThanOrEqual(Vector128<int> left, Vector128<int> right);
    public static Vector128<int> CompareLessThan(Vector128<int> left, Vector128<int> right);
    public static Vector128<int> CompareLessThanOrEqual(Vector128<int> left, Vector128<int> right);
    public static Vector128<int> CompareNotEqual(Vector128<int> left, Vector128<int> right);
    public static Vector128<long> CompareGreaterThanOrEqual(Vector128<long> left, Vector128<long> right);
    public static Vector128<long> CompareLessThan(Vector128<long> left, Vector128<long> right);
    public static Vector128<long> CompareLessThanOrEqual(Vector128<long> left, Vector128<long> right);
    public static Vector128<long> CompareNotEqual(Vector128<long> left, Vector128<long> right);
    public static Vector128<uint> CompareGreaterThan(Vector128<uint> left, Vector128<uint> right);
    public static Vector128<uint> CompareGreaterThanOrEqual(Vector128<uint> left, Vector128<uint> right);
    public static Vector128<uint> CompareLessThan(Vector128<uint> left, Vector128<uint> right);
    public static Vector128<uint> CompareLessThanOrEqual(Vector128<uint> left, Vector128<uint> right);
    public static Vector128<uint> CompareNotEqual(Vector128<uint> left, Vector128<uint> right);
    public static Vector128<ulong> CompareGreaterThan(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<ulong> CompareGreaterThanOrEqual(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<ulong> CompareLessThan(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<ulong> CompareLessThanOrEqual(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<ulong> CompareNotEqual(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<byte> ConvertToVector128Byte(Vector128<int> value);
    public static Vector128<byte> ConvertToVector128Byte(Vector128<long> value);
    public static Vector128<byte> ConvertToVector128Byte(Vector128<uint> value);
    public static Vector128<byte> ConvertToVector128Byte(Vector128<ulong> value);
    public static Vector128<byte> ConvertToVector128ByteWithSaturation(Vector128<uint> value);
    public static Vector128<byte> ConvertToVector128ByteWithSaturation(Vector128<ulong> value);
    public static Vector128<double> ConvertToVector128Double(Vector128<uint> value);
    public static Vector128<short> ConvertToVector128Int16(Vector128<int> value);
    public static Vector128<short> ConvertToVector128Int16(Vector128<long> value);
    public static Vector128<short> ConvertToVector128Int16(Vector128<uint> value);
    public static Vector128<short> ConvertToVector128Int16(Vector128<ulong> value);
    public static Vector128<short> ConvertToVector128Int16WithSaturation(Vector128<int> value);
    public static Vector128<short> ConvertToVector128Int16WithSaturation(Vector128<long> value);
    public static Vector128<int> ConvertToVector128Int32(Vector128<long> value);
    public static Vector128<int> ConvertToVector128Int32(Vector128<ulong> value);
    public static Vector128<int> ConvertToVector128Int32WithSaturation(Vector128<long> value);
    public static Vector128<sbyte> ConvertToVector128SByte(Vector128<int> value);
    public static Vector128<sbyte> ConvertToVector128SByte(Vector128<long> value);
    public static Vector128<sbyte> ConvertToVector128SByte(Vector128<uint> value);
    public static Vector128<sbyte> ConvertToVector128SByte(Vector128<ulong> value);
    public static Vector128<sbyte> ConvertToVector128SByteWithSaturation(Vector128<int> value);
    public static Vector128<sbyte> ConvertToVector128SByteWithSaturation(Vector128<long> value);
    public static Vector128<float> ConvertToVector128Single(Vector128<uint> value);
    public static Vector128<ushort> ConvertToVector128UInt16(Vector128<int> value);
    public static Vector128<ushort> ConvertToVector128UInt16(Vector128<long> value);
    public static Vector128<ushort> ConvertToVector128UInt16(Vector128<uint> value);
    public static Vector128<ushort> ConvertToVector128UInt16(Vector128<ulong> value);
    public static Vector128<ushort> ConvertToVector128UInt16WithSaturation(Vector128<uint> value);
    public static Vector128<ushort> ConvertToVector128UInt16WithSaturation(Vector128<ulong> value);
    public static Vector128<uint> ConvertToVector128UInt32(Vector128<long> value);
    public static Vector128<uint> ConvertToVector128UInt32(Vector128<ulong> value);
    public static Vector128<uint> ConvertToVector128UInt32(Vector128<float> value);
    public static Vector128<uint> ConvertToVector128UInt32(Vector128<double> value);
    public static Vector128<uint> ConvertToVector128UInt32WithSaturation(Vector128<ulong> value);
    public static Vector128<uint> ConvertToVector128UInt32WithTruncation(Vector128<float> value);
    public static Vector128<uint> ConvertToVector128UInt32WithTruncation(Vector128<double> value);
    public static Vector128<float> Fixup(Vector128<float> left, Vector128<float> right, Vector128<int> table, [ConstantExpected] byte control);
    public static Vector128<double> Fixup(Vector128<double> left, Vector128<double> right, Vector128<long> table, [ConstantExpected] byte control);
    public static Vector128<float> GetExponent(Vector128<float> value);
    public static Vector128<double> GetExponent(Vector128<double> value);
    public static Vector128<float> GetMantissa(Vector128<float> value, [ConstantExpected(Max = (byte)(0x0F))] byte control);
    public static Vector128<double> GetMantissa(Vector128<double> value, [ConstantExpected(Max = (byte)(0x0F))] byte control);
    public static Vector128<long> Max(Vector128<long> left, Vector128<long> right);
    public static Vector128<ulong> Max(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<long> Min(Vector128<long> left, Vector128<long> right);
    public static Vector128<ulong> Min(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<long> PermuteVar2x64x2(Vector128<long> lower, Vector128<long> indices, Vector128<long> upper);
    public static Vector128<ulong> PermuteVar2x64x2(Vector128<ulong> lower, Vector128<ulong> indices, Vector128<ulong> upper);
    public static Vector128<double> PermuteVar2x64x2(Vector128<double> lower, Vector128<long> indices, Vector128<double> upper);
    public static Vector128<int> PermuteVar4x32x2(Vector128<int> lower, Vector128<int> indices, Vector128<int> upper);
    public static Vector128<uint> PermuteVar4x32x2(Vector128<uint> lower, Vector128<uint> indices, Vector128<uint> upper);
    public static Vector128<float> PermuteVar4x32x2(Vector128<float> lower, Vector128<int> indices, Vector128<float> upper);
    public static Vector128<float> Reciprocal14(Vector128<float> value);
    public static Vector128<double> Reciprocal14(Vector128<double> value);
    public static Vector128<float> ReciprocalSqrt14(Vector128<float> value);
    public static Vector128<double> ReciprocalSqrt14(Vector128<double> value);
    public static Vector128<int> RotateLeft(Vector128<int> value, [ConstantExpected] byte count);
    public static Vector128<uint> RotateLeft(Vector128<uint> value, [ConstantExpected] byte count);
    public static Vector128<long> RotateLeft(Vector128<long> value, [ConstantExpected] byte count);
    public static Vector128<ulong> RotateLeft(Vector128<ulong> value, [ConstantExpected] byte count);
    public static Vector128<int> RotateLeftVariable(Vector128<int> value, Vector128<uint> count);
    public static Vector128<uint> RotateLeftVariable(Vector128<uint> value, Vector128<uint> count);
    public static Vector128<long> RotateLeftVariable(Vector128<long> value, Vector128<ulong> count);
    public static Vector128<ulong> RotateLeftVariable(Vector128<ulong> value, Vector128<ulong> count);
    public static Vector128<int> RotateRight(Vector128<int> value, [ConstantExpected] byte count);
    public static Vector128<uint> RotateRight(Vector128<uint> value, [ConstantExpected] byte count);
    public static Vector128<long> RotateRight(Vector128<long> value, [ConstantExpected] byte count);
    public static Vector128<ulong> RotateRight(Vector128<ulong> value, [ConstantExpected] byte count);
    public static Vector128<int> RotateRightVariable(Vector128<int> value, Vector128<uint> count);
    public static Vector128<uint> RotateRightVariable(Vector128<uint> value, Vector128<uint> count);
    public static Vector128<long> RotateRightVariable(Vector128<long> value, Vector128<ulong> count);
    public static Vector128<ulong> RotateRightVariable(Vector128<ulong> value, Vector128<ulong> count);
    public static Vector128<float> RoundScale(Vector128<float> value, [ConstantExpected] byte control);
    public static Vector128<double> RoundScale(Vector128<double> value, [ConstantExpected] byte control);
    public static Vector128<float> Scale(Vector128<float> left, Vector128<float> right);
    public static Vector128<double> Scale(Vector128<double> left, Vector128<double> right);
    public static Vector128<long> ShiftRightArithmetic(Vector128<long> value, Vector128<long> count);
    public static Vector128<long> ShiftRightArithmetic(Vector128<long> value, [ConstantExpected] byte count);
    public static Vector128<long> ShiftRightArithmeticVariable(Vector128<long> value, Vector128<ulong> count);
    public static Vector128<sbyte> TernaryLogic(Vector128<sbyte> a, Vector128<sbyte> b, Vector128<sbyte> c, [ConstantExpected] byte control);
    public static Vector128<byte> TernaryLogic(Vector128<byte> a, Vector128<byte> b, Vector128<byte> c, [ConstantExpected] byte control);
    public static Vector128<short> TernaryLogic(Vector128<short> a, Vector128<short> b, Vector128<short> c, [ConstantExpected] byte control);
    public static Vector128<ushort> TernaryLogic(Vector128<ushort> a, Vector128<ushort> b, Vector128<ushort> c, [ConstantExpected] byte control);
    public static Vector128<int> TernaryLogic(Vector128<int> a, Vector128<int> b, Vector128<int> c, [ConstantExpected] byte control);
    public static Vector128<uint> TernaryLogic(Vector128<uint> a, Vector128<uint> b, Vector128<uint> c, [ConstantExpected] byte control);
    public static Vector128<long> TernaryLogic(Vector128<long> a, Vector128<long> b, Vector128<long> c, [ConstantExpected] byte control);
    public static Vector128<ulong> TernaryLogic(Vector128<ulong> a, Vector128<ulong> b, Vector128<ulong> c, [ConstantExpected] byte control);
    public static Vector128<float> TernaryLogic(Vector128<float> a, Vector128<float> b, Vector128<float> c, [ConstantExpected] byte control);
    public static Vector128<double> TernaryLogic(Vector128<double> a, Vector128<double> b, Vector128<double> c, [ConstantExpected] byte control);

    /// From AVX512BW VL
    public static Vector128<byte> CompareGreaterThan(Vector128<byte> left, Vector128<byte> right);
    public static Vector128<byte> CompareGreaterThanOrEqual(Vector128<byte> left, Vector128<byte> right);
    public static Vector128<byte> CompareLessThan(Vector128<byte> left, Vector128<byte> right);
    public static Vector128<byte> CompareLessThanOrEqual(Vector128<byte> left, Vector128<byte> right);
    public static Vector128<byte> CompareNotEqual(Vector128<byte> left, Vector128<byte> right);
    public static Vector128<short> CompareGreaterThanOrEqual(Vector128<short> left, Vector128<short> right);
    public static Vector128<short> CompareLessThan(Vector128<short> left, Vector128<short> right);
    public static Vector128<short> CompareLessThanOrEqual(Vector128<short> left, Vector128<short> right);
    public static Vector128<short> CompareNotEqual(Vector128<short> left, Vector128<short> right);
    public static Vector128<sbyte> CompareGreaterThanOrEqual(Vector128<sbyte> left, Vector128<sbyte> right);
    public static Vector128<sbyte> CompareLessThan(Vector128<sbyte> left, Vector128<sbyte> right);
    public static Vector128<sbyte> CompareLessThanOrEqual(Vector128<sbyte> left, Vector128<sbyte> right);
    public static Vector128<sbyte> CompareNotEqual(Vector128<sbyte> left, Vector128<sbyte> right);
    public static Vector128<ushort> CompareGreaterThan(Vector128<ushort> left, Vector128<ushort> right);
    public static Vector128<ushort> CompareGreaterThanOrEqual(Vector128<ushort> left, Vector128<ushort> right);
    public static Vector128<ushort> CompareLessThan(Vector128<ushort> left, Vector128<ushort> right);
    public static Vector128<ushort> CompareLessThanOrEqual(Vector128<ushort> left, Vector128<ushort> right);
    public static Vector128<ushort> CompareNotEqual(Vector128<ushort> left, Vector128<ushort> right);
    public static Vector128<byte> ConvertToVector128Byte(Vector128<short> value);
    public static Vector128<byte> ConvertToVector128Byte(Vector128<ushort> value);
    public static Vector128<byte> ConvertToVector128ByteWithSaturation(Vector128<ushort> value);
    public static Vector128<sbyte> ConvertToVector128SByte(Vector128<short> value);
    public static Vector128<sbyte> ConvertToVector128SByte(Vector128<ushort> value);
    public static Vector128<sbyte> ConvertToVector128SByteWithSaturation(Vector128<short> value);
    public static Vector128<short> PermuteVar8x16(Vector128<short> left, Vector128<short> control);
    public static Vector128<ushort> PermuteVar8x16(Vector128<ushort> left, Vector128<ushort> control);
    public static Vector128<short> PermuteVar8x16x2(Vector128<short> lower, Vector128<short> indices, Vector128<short> upper);
    public static Vector128<ushort> PermuteVar8x16x2(Vector128<ushort> lower, Vector128<ushort> indices, Vector128<ushort> upper);
    public static Vector128<short> ShiftLeftLogicalVariable(Vector128<short> value, Vector128<ushort> count);
    public static Vector128<ushort> ShiftLeftLogicalVariable(Vector128<ushort> value, Vector128<ushort> count);
    public static Vector128<short> ShiftRightArithmeticVariable(Vector128<short> value, Vector128<ushort> count);
    public static Vector128<short> ShiftRightLogicalVariable(Vector128<short> value, Vector128<ushort> count);
    public static Vector128<ushort> ShiftRightLogicalVariable(Vector128<ushort> value, Vector128<ushort> count);
    public static Vector128<ushort> SumAbsoluteDifferencesInBlock32(Vector128<byte> left, Vector128<byte> right, [ConstantExpected] byte control);

    /// From AVX512CD VL
    public static Vector128<int> DetectConflicts(Vector128<int> value);
    public static Vector128<uint> DetectConflicts(Vector128<uint> value);
    public static Vector128<long> DetectConflicts(Vector128<long> value);
    public static Vector128<ulong> DetectConflicts(Vector128<ulong> value);
    public static Vector128<int> LeadingZeroCount(Vector128<int> value);
    public static Vector128<uint> LeadingZeroCount(Vector128<uint> value);
    public static Vector128<long> LeadingZeroCount(Vector128<long> value);
    public static Vector128<ulong> LeadingZeroCount(Vector128<ulong> value);

    /// From AVX512DQ VL
    public static Vector128<int> BroadcastPairScalarToVector128(Vector128<int> value);
    public static Vector128<uint> BroadcastPairScalarToVector128(Vector128<uint> value);
    public static Vector128<double> ConvertToVector128Double(Vector128<long> value);
    public static Vector128<double> ConvertToVector128Double(Vector128<ulong> value);
    public static Vector128<long> ConvertToVector128Int64(Vector128<float> value);
    public static Vector128<long> ConvertToVector128Int64(Vector128<double> value);
    public static Vector128<long> ConvertToVector128Int64WithTruncation(Vector128<float> value);
    public static Vector128<long> ConvertToVector128Int64WithTruncation(Vector128<double> value);
    public static Vector128<float> ConvertToVector128Single(Vector128<long> value);
    public static Vector128<float> ConvertToVector128Single(Vector128<ulong> value);
    public static Vector128<ulong> ConvertToVector128UInt64(Vector128<float> value);
    public static Vector128<ulong> ConvertToVector128UInt64(Vector128<double> value);
    public static Vector128<ulong> ConvertToVector128UInt64WithTruncation(Vector128<float> value);
    public static Vector128<ulong> ConvertToVector128UInt64WithTruncation(Vector128<double> value);
    public static Vector128<long> MultiplyLow(Vector128<long> left, Vector128<long> right);
    public static Vector128<ulong> MultiplyLow(Vector128<ulong> left, Vector128<ulong> right);
    public static Vector128<float> Range(Vector128<float> left, Vector128<float> right, [ConstantExpected(Max = (byte)(0x0F))] byte control);
    public static Vector128<double> Range(Vector128<double> left, Vector128<double> right, [ConstantExpected(Max = (byte)(0x0F))] byte control);
    public static Vector128<float> Reduce(Vector128<float> value, [ConstantExpected] byte control);
    public static Vector128<double> Reduce(Vector128<double> value, [ConstantExpected] byte control);

    /// From AVX512_Vbmi_VL
    public static Vector128<sbyte> PermuteVar16x8(Vector128<sbyte> left, Vector128<sbyte> control);
    public static Vector128<byte> PermuteVar16x8(Vector128<byte> left, Vector128<byte> control);
    public static Vector128<byte> PermuteVar16x8x2(Vector128<byte> lower, Vector128<byte> indices, Vector128<byte> upper);
    public static Vector128<sbyte> PermuteVar16x8x2(Vector128<sbyte> lower, Vector128<sbyte> indices, Vector128<sbyte> upper);
       
    public abstract class V256 : Avx2
    {
        public static new bool IsSupported { get; }
        
        /// From AVX512F VL
        public static Vector256<ulong> Abs(Vector256<long> value);
        public static Vector256<int> AlignRight32(Vector256<int> left, Vector256<int> right, [ConstantExpected] byte mask);
        public static Vector256<uint> AlignRight32(Vector256<uint> left, Vector256<uint> right, [ConstantExpected] byte mask);
        public static Vector256<long> AlignRight64(Vector256<long> left, Vector256<long> right, [ConstantExpected] byte mask);
        public static Vector256<ulong> AlignRight64(Vector256<ulong> left, Vector256<ulong> right, [ConstantExpected] byte mask);
        public static Vector256<int> CompareGreaterThanOrEqual(Vector256<int> left, Vector256<int> right);
        public static Vector256<int> CompareLessThan(Vector256<int> left, Vector256<int> right);
        public static Vector256<int> CompareLessThanOrEqual(Vector256<int> left, Vector256<int> right);
        public static Vector256<int> CompareNotEqual(Vector256<int> left, Vector256<int> right);
        public static Vector256<long> CompareGreaterThanOrEqual(Vector256<long> left, Vector256<long> right);
        public static Vector256<long> CompareLessThan(Vector256<long> left, Vector256<long> right);
        public static Vector256<long> CompareLessThanOrEqual(Vector256<long> left, Vector256<long> right);
        public static Vector256<long> CompareNotEqual(Vector256<long> left, Vector256<long> right);
        public static Vector256<uint> CompareGreaterThan(Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareGreaterThanOrEqual(Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareLessThan(Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareLessThanOrEqual(Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareNotEqual(Vector256<uint> left, Vector256<uint> right);
        public static Vector256<ulong> CompareGreaterThan(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareGreaterThanOrEqual(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareLessThan(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareLessThanOrEqual(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareNotEqual(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector128<byte> ConvertToVector128Byte(Vector256<int> value);
        public static Vector128<byte> ConvertToVector128Byte(Vector256<long> value);
        public static Vector128<byte> ConvertToVector128Byte(Vector256<uint> value);
        public static Vector128<byte> ConvertToVector128Byte(Vector256<ulong> value);
        public static Vector128<byte> ConvertToVector128ByteWithSaturation(Vector256<uint> value);
        public static Vector128<byte> ConvertToVector128ByteWithSaturation(Vector256<ulong> value);
        public static Vector128<short> ConvertToVector128Int16(Vector256<int> value);
        public static Vector128<short> ConvertToVector128Int16(Vector256<long> value);
        public static Vector128<short> ConvertToVector128Int16(Vector256<uint> value);
        public static Vector128<short> ConvertToVector128Int16(Vector256<ulong> value);
        public static Vector128<short> ConvertToVector128Int16WithSaturation(Vector256<int> value);
        public static Vector128<short> ConvertToVector128Int16WithSaturation(Vector256<long> value);
        public static Vector128<int> ConvertToVector128Int32(Vector256<long> value);
        public static Vector128<int> ConvertToVector128Int32(Vector256<ulong> value);
        public static Vector128<int> ConvertToVector128Int32WithSaturation(Vector256<long> value);
        public static Vector128<sbyte> ConvertToVector128SByte(Vector256<int> value);
        public static Vector128<sbyte> ConvertToVector128SByte(Vector256<long> value);
        public static Vector128<sbyte> ConvertToVector128SByte(Vector256<uint> value);
        public static Vector128<sbyte> ConvertToVector128SByte(Vector256<ulong> value);
        public static Vector128<sbyte> ConvertToVector128SByteWithSaturation(Vector256<int> value);
        public static Vector128<sbyte> ConvertToVector128SByteWithSaturation(Vector256<long> value);
        public static Vector128<ushort> ConvertToVector128UInt16(Vector256<int> value);
        public static Vector128<ushort> ConvertToVector128UInt16(Vector256<long> value);
        public static Vector128<ushort> ConvertToVector128UInt16(Vector256<uint> value);
        public static Vector128<ushort> ConvertToVector128UInt16(Vector256<ulong> value);
        public static Vector128<ushort> ConvertToVector128UInt16WithSaturation(Vector256<uint> value);
        public static Vector128<ushort> ConvertToVector128UInt16WithSaturation(Vector256<ulong> value);
        public static Vector128<uint> ConvertToVector128UInt32(Vector256<long> value);
        public static Vector128<uint> ConvertToVector128UInt32(Vector256<ulong> value);
        public static Vector128<uint> ConvertToVector128UInt32(Vector256<double> value);
        public static Vector128<uint> ConvertToVector128UInt32WithSaturation(Vector256<ulong> value);
        public static Vector128<uint> ConvertToVector128UInt32WithTruncation(Vector256<double> value);
        public static Vector256<double> ConvertToVector256Double(Vector128<uint> value);
        public static Vector256<float> ConvertToVector256Single(Vector256<uint> value);
        public static Vector256<uint> ConvertToVector256UInt32(Vector256<float> value);
        public static Vector256<uint> ConvertToVector256UInt32WithTruncation(Vector256<float> value);
        public static Vector256<float> Fixup(Vector256<float> left, Vector256<float> right, Vector256<int> table, [ConstantExpected] byte control);
        public static Vector256<double> Fixup(Vector256<double> left, Vector256<double> right, Vector256<long> table, [ConstantExpected] byte control);
        public static Vector256<float> GetExponent(Vector256<float> value);
        public static Vector256<double> GetExponent(Vector256<double> value);
        public static Vector256<float> GetMantissa(Vector256<float> value, [ConstantExpected(Max = (byte)(0x0F))] byte control);
        public static Vector256<double> GetMantissa(Vector256<double> value, [ConstantExpected(Max = (byte)(0x0F))] byte control);
        public static Vector256<long> Max(Vector256<long> left, Vector256<long> right);
        public static Vector256<ulong> Max(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<long> Min(Vector256<long> left, Vector256<long> right);
        public static Vector256<ulong> Min(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<long> PermuteVar4x64(Vector256<long> value, Vector256<long> control);
        public static Vector256<ulong> PermuteVar4x64(Vector256<ulong> value, Vector256<ulong> control);
        public static Vector256<double> PermuteVar4x64(Vector256<double> value, Vector256<long> control);
        public static Vector256<long> PermuteVar4x64x2(Vector256<long> lower, Vector256<long> indices, Vector256<long> upper);
        public static Vector256<ulong> PermuteVar4x64x2(Vector256<ulong> lower, Vector256<ulong> indices, Vector256<ulong> upper);
        public static Vector256<double> PermuteVar4x64x2(Vector256<double> lower, Vector256<long> indices, Vector256<double> upper);
        public static Vector256<int> PermuteVar8x32x2(Vector256<int> lower, Vector256<int> indices, Vector256<int> upper);
        public static Vector256<uint> PermuteVar8x32x2(Vector256<uint> lower, Vector256<uint> indices, Vector256<uint> upper);
        public static Vector256<float> PermuteVar8x32x2(Vector256<float> lower, Vector256<int> indices, Vector256<float> upper);
        public static Vector256<float> Reciprocal14(Vector256<float> value);
        public static Vector256<double> Reciprocal14(Vector256<double> value);
        public static Vector256<float> ReciprocalSqrt14(Vector256<float> value);
        public static Vector256<double> ReciprocalSqrt14(Vector256<double> value);
        public static Vector256<int> RotateLeft(Vector256<int> value, [ConstantExpected] byte count);
        public static Vector256<uint> RotateLeft(Vector256<uint> value, [ConstantExpected] byte count);
        public static Vector256<long> RotateLeft(Vector256<long> value, [ConstantExpected] byte count);
        public static Vector256<ulong> RotateLeft(Vector256<ulong> value, [ConstantExpected] byte count);
        public static Vector256<int> RotateLeftVariable(Vector256<int> value, Vector256<uint> count);
        public static Vector256<uint> RotateLeftVariable(Vector256<uint> value, Vector256<uint> count);
        public static Vector256<long> RotateLeftVariable(Vector256<long> value, Vector256<ulong> count);
        public static Vector256<ulong> RotateLeftVariable(Vector256<ulong> value, Vector256<ulong> count);
        public static Vector256<int> RotateRight(Vector256<int> value, [ConstantExpected] byte count);
        public static Vector256<uint> RotateRight(Vector256<uint> value, [ConstantExpected] byte count);
        public static Vector256<long> RotateRight(Vector256<long> value, [ConstantExpected] byte count);
        public static Vector256<ulong> RotateRight(Vector256<ulong> value, [ConstantExpected] byte count);
        public static Vector256<int> RotateRightVariable(Vector256<int> value, Vector256<uint> count);
        public static Vector256<uint> RotateRightVariable(Vector256<uint> value, Vector256<uint> count);
        public static Vector256<long> RotateRightVariable(Vector256<long> value, Vector256<ulong> count);
        public static Vector256<ulong> RotateRightVariable(Vector256<ulong> value, Vector256<ulong> count);
        public static Vector256<float> RoundScale(Vector256<float> value, [ConstantExpected] byte control);
        public static Vector256<double> RoundScale(Vector256<double> value, [ConstantExpected] byte control);
        public static Vector256<float> Scale(Vector256<float> left, Vector256<float> right);
        public static Vector256<double> Scale(Vector256<double> left, Vector256<double> right);
        public static Vector256<long> ShiftRightArithmetic(Vector256<long> value, Vector128<long> count);
        public static Vector256<long> ShiftRightArithmetic(Vector256<long> value, [ConstantExpected] byte count);
        public static Vector256<long> ShiftRightArithmeticVariable(Vector256<long> value, Vector256<ulong> count);
        public static Vector256<double> Shuffle2x128(Vector256<double> left, Vector256<double> right, [ConstantExpected] byte control);
        public static Vector256<int> Shuffle2x128(Vector256<int> left, Vector256<int> right, [ConstantExpected] byte control);
        public static Vector256<long> Shuffle2x128(Vector256<long> left, Vector256<long> right, [ConstantExpected] byte control);
        public static Vector256<float> Shuffle2x128(Vector256<float> left, Vector256<float> right, [ConstantExpected] byte control);
        public static Vector256<uint> Shuffle2x128(Vector256<uint> left, Vector256<uint> right, [ConstantExpected] byte control);
        public static Vector256<ulong> Shuffle2x128(Vector256<ulong> left, Vector256<ulong> right, [ConstantExpected] byte control);
        public static Vector256<sbyte> TernaryLogic(Vector256<sbyte> a, Vector256<sbyte> b, Vector256<sbyte> c, [ConstantExpected] byte control);
        public static Vector256<byte> TernaryLogic(Vector256<byte> a, Vector256<byte> b, Vector256<byte> c, [ConstantExpected] byte control);
        public static Vector256<short> TernaryLogic(Vector256<short> a, Vector256<short> b, Vector256<short> c, [ConstantExpected] byte control);
        public static Vector256<ushort> TernaryLogic(Vector256<ushort> a, Vector256<ushort> b, Vector256<ushort> c, [ConstantExpected] byte control);
        public static Vector256<int> TernaryLogic(Vector256<int> a, Vector256<int> b, Vector256<int> c, [ConstantExpected] byte control);
        public static Vector256<uint> TernaryLogic(Vector256<uint> a, Vector256<uint> b, Vector256<uint> c, [ConstantExpected] byte control);
        public static Vector256<long> TernaryLogic(Vector256<long> a, Vector256<long> b, Vector256<long> c, [ConstantExpected] byte control);
        public static Vector256<ulong> TernaryLogic(Vector256<ulong> a, Vector256<ulong> b, Vector256<ulong> c, [ConstantExpected] byte control);
        public static Vector256<float> TernaryLogic(Vector256<float> a, Vector256<float> b, Vector256<float> c, [ConstantExpected] byte control);
        public static Vector256<double> TernaryLogic(Vector256<double> a, Vector256<double> b, Vector256<double> c, [ConstantExpected] byte control);

        /// From AVX512BW VL
        public static Vector256<byte> CompareGreaterThan(Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareGreaterThanOrEqual(Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareLessThan(Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareLessThanOrEqual(Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareNotEqual(Vector256<byte> left, Vector256<byte> right);
        public static Vector256<short> CompareGreaterThanOrEqual(Vector256<short> left, Vector256<short> right);
        public static Vector256<short> CompareLessThan(Vector256<short> left, Vector256<short> right);
        public static Vector256<short> CompareLessThanOrEqual(Vector256<short> left, Vector256<short> right);
        public static Vector256<short> CompareNotEqual(Vector256<short> left, Vector256<short> right);
        public static Vector256<sbyte> CompareGreaterThanOrEqual(Vector256<sbyte> left, Vector256<sbyte> right);
        public static Vector256<sbyte> CompareLessThan(Vector256<sbyte> left, Vector256<sbyte> right);
        public static Vector256<sbyte> CompareLessThanOrEqual(Vector256<sbyte> left, Vector256<sbyte> right);
        public static Vector256<sbyte> CompareNotEqual(Vector256<sbyte> left, Vector256<sbyte> right);
        public static Vector256<ushort> CompareGreaterThan(Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareGreaterThanOrEqual(Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareLessThan(Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareLessThanOrEqual(Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareNotEqual(Vector256<ushort> left, Vector256<ushort> right);
        public static Vector128<byte> ConvertToVector128Byte(Vector256<short> value);
        public static Vector128<byte> ConvertToVector128Byte(Vector256<ushort> value);
        public static Vector128<byte> ConvertToVector128ByteWithSaturation(Vector256<ushort> value);
        public static Vector128<sbyte> ConvertToVector128SByte(Vector256<short> value);
        public static Vector128<sbyte> ConvertToVector128SByte(Vector256<ushort> value);
        public static Vector128<sbyte> ConvertToVector128SByteWithSaturation(Vector256<short> value);
        public static Vector256<short> PermuteVar16x16(Vector256<short> left, Vector256<short> control);
        public static Vector256<ushort> PermuteVar16x16(Vector256<ushort> left, Vector256<ushort> control);
        public static Vector256<short> PermuteVar16x16x2(Vector256<short> lower, Vector256<short> indices, Vector256<short> upper);
        public static Vector256<ushort> PermuteVar16x16x2(Vector256<ushort> lower, Vector256<ushort> indices, Vector256<ushort> upper);
        public static Vector256<short> ShiftLeftLogicalVariable(Vector256<short> value, Vector256<ushort> count);
        public static Vector256<ushort> ShiftLeftLogicalVariable(Vector256<ushort> value, Vector256<ushort> count);
        public static Vector256<short> ShiftRightArithmeticVariable(Vector256<short> value, Vector256<ushort> count);
        public static Vector256<short> ShiftRightLogicalVariable(Vector256<short> value, Vector256<ushort> count);
        public static Vector256<ushort> ShiftRightLogicalVariable(Vector256<ushort> value, Vector256<ushort> count);
        public static Vector256<ushort> SumAbsoluteDifferencesInBlock32(Vector256<byte> left, Vector256<byte> right, [ConstantExpected] byte control);

        /// From AVX512CD VL
        public static Vector256<int> DetectConflicts(Vector256<int> value);
        public static Vector256<uint> DetectConflicts(Vector256<uint> value);
        public static Vector256<long> DetectConflicts(Vector256<long> value);
        public static Vector256<ulong> DetectConflicts(Vector256<ulong> value);
        public static Vector256<int> LeadingZeroCount(Vector256<int> value);
        public static Vector256<uint> LeadingZeroCount(Vector256<uint> value);
        public static Vector256<long> LeadingZeroCount(Vector256<long> value);
        public static Vector256<ulong> LeadingZeroCount(Vector256<ulong> value);
        
        /// From AVX512DQ VL
        public static Vector256<int> BroadcastPairScalarToVector256(Vector128<int> value);
        public static Vector256<uint> BroadcastPairScalarToVector256(Vector128<uint> value);
        public static Vector256<float> BroadcastPairScalarToVector256(Vector128<float> value);
        public static Vector128<float> ConvertToVector128Single(Vector256<long> value);
        public static Vector128<float> ConvertToVector128Single(Vector256<ulong> value);
        public static Vector256<double> ConvertToVector256Double(Vector256<long> value);
        public static Vector256<double> ConvertToVector256Double(Vector256<ulong> value);
        public static Vector256<long> ConvertToVector256Int64(Vector128<float> value);
        public static Vector256<long> ConvertToVector256Int64(Vector256<double> value);
        public static Vector256<long> ConvertToVector256Int64WithTruncation(Vector128<float> value);
        public static Vector256<long> ConvertToVector256Int64WithTruncation(Vector256<double> value);
        public static Vector256<ulong> ConvertToVector256UInt64(Vector128<float> value);
        public static Vector256<ulong> ConvertToVector256UInt64(Vector256<double> value);
        public static Vector256<ulong> ConvertToVector256UInt64WithTruncation(Vector128<float> value);
        public static Vector256<ulong> ConvertToVector256UInt64WithTruncation(Vector256<double> value);
        public static Vector256<long> MultiplyLow(Vector256<long> left, Vector256<long> right);
        public static Vector256<ulong> MultiplyLow(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<float> Range(Vector256<float> left, Vector256<float> right, [ConstantExpected(Max = (byte)(0x0F))] byte control);
        public static Vector256<double> Range(Vector256<double> left, Vector256<double> right, [ConstantExpected(Max = (byte)(0x0F))] byte control);
        public static Vector256<float> Reduce(Vector256<float> value, [ConstantExpected] byte control);
        public static Vector256<double> Reduce(Vector256<double> value, [ConstantExpected] byte control);

        /// From AVX512_Vbmi_VL
        public static Vector256<sbyte> PermuteVar32x8(Vector256<sbyte> left, Vector256<sbyte> control);
        public static Vector256<byte> PermuteVar32x8(Vector256<byte> left, Vector256<byte> control);
        public static Vector256<byte> PermuteVar32x8x2(Vector256<byte> lower, Vector256<byte> indices, Vector256<byte> upper);
        public static Vector256<sbyte> PermuteVar32x8x2(Vector256<sbyte> lower, Vector256<sbyte> indices, Vector256<sbyte> upper);
    }

    public abstract class V512 : Avx512BW
    {
        public static new bool IsSupported { get; }
        
        /// From AVX512CD
        public static Vector512<int> DetectConflicts(Vector512<int> value);
        public static Vector512<uint> DetectConflicts(Vector512<uint> value);
        public static Vector512<long> DetectConflicts(Vector512<long> value);
        public static Vector512<ulong> DetectConflicts(Vector512<ulong> value);
        public static Vector512<int> LeadingZeroCount(Vector512<int> value);
        public static Vector512<uint> LeadingZeroCount(Vector512<uint> value);
        public static Vector512<long> LeadingZeroCount(Vector512<long> value);
        public static Vector512<ulong> LeadingZeroCount(Vector512<ulong> value);

        /// From AVX512DQ
        public static Vector512<float> And(Vector512<float> left, Vector512<float> right);
        public static Vector512<double> And(Vector512<double> left, Vector512<double> right);
        public static Vector512<float> AndNot(Vector512<float> left, Vector512<float> right);
        public static Vector512<double> AndNot(Vector512<double> left, Vector512<double> right);
        public static Vector512<int> BroadcastPairScalarToVector512(Vector128<int> value);
        public static Vector512<uint> BroadcastPairScalarToVector512(Vector128<uint> value);
        public static Vector512<float> BroadcastPairScalarToVector512(Vector128<float> value);
        public static unsafe Vector512<long> BroadcastVector128ToVector512(long* address);
        public static unsafe Vector512<ulong> BroadcastVector128ToVector512(ulong* address);
        public static unsafe Vector512<double> BroadcastVector128ToVector512(double* address);
        public static unsafe Vector512<int> BroadcastVector256ToVector512(int* address);
        public static unsafe Vector512<uint> BroadcastVector256ToVector512(uint* address);
        public static unsafe Vector512<float> BroadcastVector256ToVector512(float* address);
        public static Vector256<float> ConvertToVector256Single(Vector512<long> value);
        public static Vector256<float> ConvertToVector256Single(Vector512<ulong> value);
        public static Vector512<double> ConvertToVector512Double(Vector512<long> value);
        public static Vector512<double> ConvertToVector512Double(Vector512<ulong> value);
        public static Vector512<long> ConvertToVector512Int64(Vector256<float> value);
        public static Vector512<long> ConvertToVector512Int64(Vector512<double> value);
        public static Vector512<long> ConvertToVector512Int64WithTruncation(Vector256<float> value);
        public static Vector512<long> ConvertToVector512Int64WithTruncation(Vector512<double> value);
        public static Vector512<ulong> ConvertToVector512UInt64(Vector256<float> value);
        public static Vector512<ulong> ConvertToVector512UInt64(Vector512<double> value);
        public static Vector512<ulong> ConvertToVector512UInt64WithTruncation(Vector256<float> value);
        public static Vector512<ulong> ConvertToVector512UInt64WithTruncation(Vector512<double> value);
        public static new Vector128<long> ExtractVector128(Vector512<long> value, [ConstantExpected] byte index);
        public static new Vector128<ulong> ExtractVector128(Vector512<ulong> value, [ConstantExpected] byte index);
        public static new Vector128<double> ExtractVector128(Vector512<double> value, [ConstantExpected] byte index);
        public static new Vector256<int> ExtractVector256(Vector512<int> value, [ConstantExpected] byte index);
        public static new Vector256<uint> ExtractVector256(Vector512<uint> value, [ConstantExpected] byte index);
        public static new Vector256<float> ExtractVector256(Vector512<float> value, [ConstantExpected] byte index);
        public static new Vector512<long> InsertVector128(Vector512<long> value, Vector128<long> data, [ConstantExpected] byte index);
        public static new Vector512<ulong> InsertVector128(Vector512<ulong> value, Vector128<ulong> data, [ConstantExpected] byte index);
        public static new Vector512<double> InsertVector128(Vector512<double> value, Vector128<double> data, [ConstantExpected] byte index);
        public static new Vector512<int> InsertVector256(Vector512<int> value, Vector256<int> data, [ConstantExpected] byte index);
        public static new Vector512<uint> InsertVector256(Vector512<uint> value, Vector256<uint> data, [ConstantExpected] byte index);
        public static new Vector512<float> InsertVector256(Vector512<float> value, Vector256<float> data, [ConstantExpected] byte index);
        public static Vector512<long> MultiplyLow(Vector512<long> left, Vector512<long> right);
        public static Vector512<ulong> MultiplyLow(Vector512<ulong> left, Vector512<ulong> right);
        public static Vector512<float> Or(Vector512<float> left, Vector512<float> right);
        public static Vector512<double> Or(Vector512<double> left, Vector512<double> right);
        public static Vector512<float> Range(Vector512<float> left, Vector512<float> right, [ConstantExpected(Max = (byte)(0x0F))] byte control);
        public static Vector512<double> Range(Vector512<double> left, Vector512<double> right, [ConstantExpected(Max = (byte)(0x0F))] byte control);
        public static Vector512<float> Reduce(Vector512<float> value, [ConstantExpected] byte control);
        public static Vector512<double> Reduce(Vector512<double> value, [ConstantExpected] byte control);
        public static Vector512<float> Xor(Vector512<float> left, Vector512<float> right);
        public static Vector512<double> Xor(Vector512<double> left, Vector512<double> right);

        /// From AVX512Vbmi
        public static Vector512<sbyte> PermuteVar64x8(Vector512<sbyte> left, Vector512<sbyte> control);
        public static Vector512<byte> PermuteVar64x8(Vector512<byte> left, Vector512<byte> control);
        public static Vector512<byte> PermuteVar64x8x2(Vector512<byte> lower, Vector512<byte> indices, Vector512<byte> upper);
        public static Vector512<sbyte> PermuteVar64x8x2(Vector512<sbyte> lower, Vector512<sbyte> indices, Vector512<sbyte> upper);   
    }
}
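
As a rough usage sketch for the shape above: callers would typically gate on `IsSupported` and fall back to a portable path. This assumes the enclosing class is exposed as `System.Runtime.Intrinsics.X86.Avx10v1` (as in .NET 9 or later) and that the 128-bit `LeadingZeroCount` overloads sit directly on it. `Avx10Sketch` and its helper are hypothetical names used only for illustration.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class Avx10Sketch
{
    // Per-lane leading-zero count for four 32-bit values.
    public static Vector128<uint> LeadingZeroCount(Vector128<uint> value)
    {
        // Assumption: the 128-bit overloads listed above are exposed on Avx10v1.
        if (Avx10v1.IsSupported)
        {
            return Avx10v1.LeadingZeroCount(value);
        }

        // Portable fallback, one lane at a time.
        return Vector128.Create(
            (uint)BitOperations.LeadingZeroCount(value.GetElement(0)),
            (uint)BitOperations.LeadingZeroCount(value.GetElement(1)),
            (uint)BitOperations.LeadingZeroCount(value.GetElement(2)),
            (uint)BitOperations.LeadingZeroCount(value.GetElement(3)));
    }

    public static void Main()
    {
        Vector128<uint> input = Vector128.Create(1u, 0x8000_0000u, 0u, 0x0000_FFFFu);

        // V512 is queried separately, since 512-bit support is optional under AVX10.
        Console.WriteLine($"Avx10v1: {Avx10v1.IsSupported}, V512: {Avx10v1.V512.IsSupported}");
        Console.WriteLine(LeadingZeroCount(input));
    }
}
```

Both paths should give lane values 31, 0, 32, and 16 for this input. Code that wants the 512-bit overloads would check `Avx10v1.V512.IsSupported` rather than the base `IsSupported`.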

@terrajobst added the api-approved label and removed the api-ready-for-review label on Feb 29, 2024
@tannergooding
Member

Closing, as this was completed with parity to what's exposed for Avx512VL today.

There are additional Avx10v1 APIs still to be exposed once the corresponding Avx512VL support is brought online. For example (this list may be incomplete):
