AVX512 masking support #87097

Open
tannergooding opened this issue Jun 3, 2023 · 22 comments
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime.Intrinsics avx512 Related to the AVX-512 architecture

Summary

While implementing the API surface for Expose VectorMask to support generic masking for Vector, various considerations were found that necessitated taking a step back and reconsidering how it works.

Most of these issues stemmed from the additional complexity and throughput hit that would have been required for the JIT to integrate the type. However, the design also affected how users would interact with the types and the public API surface we would expose. Namely, existing user code would not benefit, and the API surface we currently expose for the XArch and cross-platform intrinsics would nearly double.

These considerations were raised with @dotnet/avx512-contrib and an alternative design was proposed where the JIT would do pattern recognition in lowering instead to limit the throughput hit and provide light-up to existing user code. This does not preclude the ability to expose VectorMask in the future and we can revisit the type and its design as appropriate.

Conceptual Differences

Previously, we would have defined the following, and this would have expanded to effectively all existing intrinsics exposed. This would double or nearly triple our API surface, taking us from the ~1900 APIs we have today up to at least ~3800 APIs. Arm64, for comparison, currently has ~2100 APIs.

namespace System.Runtime.Intrinsics.X86;

public static partial class Avx512F
{
    // Existing API
    public static Vector512<float> Add(Vector512<float> left, Vector512<float> right);

    // New mask API
    public static Vector512<float> Add(Vector512<float> mergeValues, Vector512Mask<float> mergeMask, Vector512<float> left, Vector512<float> right);

    // Potentially handled by just the above overload where `mergeValues: Vector512<float>.Zero`
    public static Vector512<float> Add(Vector512Mask<float> zeroMask, Vector512<float> left, Vector512<float> right);

    public static partial class VL
    {
        // New mask API
        public static Vector128<float> Add(Vector128<float> mergeValues, Vector128Mask<float> mergeMask, Vector128<float> left, Vector128<float> right);
        public static Vector256<float> Add(Vector256<float> mergeValues, Vector256Mask<float> mergeMask, Vector256<float> left, Vector256<float> right);

        // Potentially handled by just the above overloads where `mergeValues` is `Vector128<float>.Zero` or `Vector256<float>.Zero`
        public static Vector128<float> Add(Vector128Mask<float> zeroMask, Vector128<float> left, Vector128<float> right);
        public static Vector256<float> Add(Vector256Mask<float> zeroMask, Vector256<float> left, Vector256<float> right);
    }
}

Pattern Recognition

Rather than exposing these overloads of APIs that take VectorMask<T> and allowing users to explicitly utilize masking, we will instead recognize a few key patterns and transform those in the JIT instead.

We would also have had some intrinsics, such as public static Vector512Mask<float> CompareEqual(Vector512<float> left, Vector512<float> right), which produce a mask, along with various other ways to produce one. Developers would then have been able to consume the mask by passing it down to the API. For example, the following finds all additions involving NaN and ensures those elements become 0 in the result.

Vector512Mask<float> nanMask = Avx512F.CompareNotEqual(left, left) | Avx512F.CompareNotEqual(right, right);
return Avx512F.Add(Vector512<float>.Zero, ~nanMask, left, right);

If a user wanted to do that today where masking doesn't exist, they'd actually do a functionally similar thing:

Vector256<float> nanMask = Avx.CompareNotEqual(left, left) | Avx.CompareNotEqual(right, right);
Vector256<float> result = Avx.Add(left, right);
return Vector256.ConditionalSelect(~nanMask, result, Vector256<float>.Zero);

Thus, by instead recognizing these patterns we can light up existing code and avoid exploding the API surface while also ensuring that the code users aim to write is consistent regardless of whether they are on hardware with native hardware masking or not.
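
As a minimal, self-contained sketch of the ConditionalSelect shape described above, the following uses only the cross-platform Vector256 helpers, so it also runs on hardware without AVX or AVX-512. The class and method names (MaskedAddSketch, AddOrZeroOnNaN) are hypothetical and exist only for illustration. Whether the JIT folds this into a single masked add on AVX-512 hardware is what this issue proposes, not something the sketch guarantees.

```csharp
using System.Runtime.Intrinsics;

// Hypothetical helper type, for illustration only.
public static class MaskedAddSketch
{
    // Adds two vectors and zeroes every lane where either input is NaN.
    public static Vector256<float> AddOrZeroOnNaN(Vector256<float> left, Vector256<float> right)
    {
        // x == x is false only for NaN, so this mask is all-bits-set
        // exactly in the lanes where neither input is NaN.
        Vector256<float> notNanMask = Vector256.Equals(left, left) & Vector256.Equals(right, right);

        Vector256<float> sum = left + right;

        // Keep the sum where the mask is set; take zero elsewhere.
        // This is the "{k1}{z}" shape: ConditionalSelect(mask, result, Zero).
        return Vector256.ConditionalSelect(notNanMask, sum, Vector256<float>.Zero);
    }
}
```

The mask here is built from equality rather than from the inverted not-equal mask used above; both select the same lanes, and the choice is only to keep the sketch short.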

A sampling of the set of patterns we want to recognize includes, but is not limited to:

  • {k1} - ConditionalSelect(mask1, resultVector, mergeVector)
  • {k1}{z} - ConditionalSelect(mask1, resultVector, Vector.Zero)
  • kadd k1, k2 - mask1.ExtractMostSignificantBits() + mask2.ExtractMostSignificantBits()
  • kand k1, k2 - mask1 & mask2
  • kandn k1, k2 - ~mask1 & mask2
  • kmov k1, k2 - mask1 = mask2
  • kmov r32, k1 - mask1.ExtractMostSignificantBits()
  • kmov k1, r32 - Vector.Create(...).ExtractMostSignificantBits()
  • knot k1, k2 - ~mask1
  • kor k1, k2 - mask1 | mask2
  • kortest k1, k2; jz - (mask1 | mask2) == Vector.Zero
  • kortest k1, k2; jnz - (mask1 | mask2) != Vector.Zero
  • kortest k1, k2; jc - (mask1 | mask2) == Vector.AllBitsSet
  • kortest k1, k2; jnc - (mask1 | mask2) != Vector.AllBitsSet
  • kshiftl k1, k2, imm8 - mask1.ExtractMostSignificantBits() << amount
  • kshiftr k1, k2, imm8 - mask1.ExtractMostSignificantBits() >> amount
  • ktest k1, k2; jz - (mask1 & mask2) == Vector.Zero
  • ktest k1, k2; jnz - (mask1 & mask2) != Vector.Zero
  • ktest k1, k2; jc - (~mask1 & mask2) == Vector.Zero
  • ktest k1, k2; jnc - (~mask1 & mask2) != Vector.Zero
  • kunpck k1, k2, k3 - UnpackLow(mask1, mask2)
  • kxnor k1, k2 - ~(mask1 ^ mask2)
  • kxor k1, k2 - (mask1 ^ mask2)
  • vpbroadcastm - Vector.Create(mask1)
  • vpmovm2* - mask1.ExtractMostSignificantBits()
  • vpmov*2m - vector1.ExtractMostSignificantBits()
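
To make a couple of the listed shapes concrete, here is a hedged sketch of user code written in the `(mask1 & mask2) == Vector.Zero` and `mask1.ExtractMostSignificantBits()` forms. It uses only the existing cross-platform Vector256 APIs; the type and method names (MaskPatternSketch, NoneInRange, MatchBits) are hypothetical. The mapping to ktest and kmov is the intent of the list above and is not verified by this code.

```csharp
using System.Runtime.Intrinsics;

// Hypothetical helper type, for illustration only.
public static class MaskPatternSketch
{
    // True when no lane satisfies lower < value < upper.
    // Source shape: (mask1 & mask2) == Vector.Zero
    public static bool NoneInRange(Vector256<int> values, Vector256<int> lower, Vector256<int> upper)
    {
        Vector256<int> aboveLower = Vector256.GreaterThan(values, lower);
        Vector256<int> belowUpper = Vector256.LessThan(values, upper);
        return (aboveLower & belowUpper) == Vector256<int>.Zero;
    }

    // Returns one bit per lane, set where the lane equals target.
    // Source shape: mask1.ExtractMostSignificantBits()
    public static uint MatchBits(Vector256<int> values, int target)
    {
        Vector256<int> mask = Vector256.Equals(values, Vector256.Create(target));
        return mask.ExtractMostSignificantBits();
    }
}
```

Because these are written against the portable API, the same source behaves identically on hardware with and without native masking, which is the consistency argument made above.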

API Proposal

namespace System.Runtime.Intrinsics.X86;

public enum IntComparisonMode : byte
{
    Equals = 0,
    LessThan = 1,
    LessThanOrEqual = 2,
    False = 3,

    NotEquals = 4,
    GreaterThanOrEqual = 5,
    GreaterThan = 6,
    True = 7,

    // Additional names for parity
    //
    // FloatComparisonMode has similar names, but they are necessary there since
    // `!(x > y)` is not the same as `(x <= y)` due to the existence of NaN
    //
    // The architecture manual formally uses NotLessThan and NotLessThanOrEqual

    NotGreaterThanOrEqual = 1,
    NotGreaterThan = 2,

    NotLessThan = 5,
    NotLessThanOrEqual = 6,
}

public static partial class Avx512F
{
    public static Vector512<double> BlendVariable(Vector512<double> left, Vector512<double> right, Vector512<double> mask);
    public static Vector512<int>    BlendVariable(Vector512<int>    left, Vector512<int>    right, Vector512<int>    mask);
    public static Vector512<long>   BlendVariable(Vector512<long>   left, Vector512<long>   right, Vector512<long>   mask);
    public static Vector512<float>  BlendVariable(Vector512<float>  left, Vector512<float>  right, Vector512<float>  mask);
    public static Vector512<uint>   BlendVariable(Vector512<uint>   left, Vector512<uint>   right, Vector512<uint>   mask);
    public static Vector512<ulong>  BlendVariable(Vector512<ulong>  left, Vector512<ulong>  right, Vector512<ulong>  mask);

    public static Vector512<double> Compare                     (Vector512<double> left, Vector512<double> right, [ConstantExpected(Max = FloatComparisonMode.UnorderedTrueSignaling)] FloatComparisonMode mode);
    public static Vector512<double> CompareEqual                (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareGreaterThan          (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareGreaterThanOrEqual   (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareLessThan             (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareLessThanOrEqual      (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareNotEqual             (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareNotGreaterThan       (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareNotGreaterThanOrEqual(Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareNotLessThan          (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareNotLessThanOrEqual   (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareOrdered              (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareUnordered            (Vector512<double> left, Vector512<double> right);

    public static Vector512<float> Compare                     (Vector512<float> left, Vector512<float> right, [ConstantExpected(Max = FloatComparisonMode.UnorderedTrueSignaling)] FloatComparisonMode mode);
    public static Vector512<float> CompareEqual                (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareGreaterThan          (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareGreaterThanOrEqual   (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareLessThan             (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareLessThanOrEqual      (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareNotEqual             (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareNotGreaterThan       (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareNotGreaterThanOrEqual(Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareNotLessThan          (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareNotLessThanOrEqual   (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareOrdered              (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareUnordered            (Vector512<float> left, Vector512<float> right);

    public static Vector512<int> Compare                  (Vector512<int> left, Vector512<int> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
    public static Vector512<int> CompareEqual             (Vector512<int> left, Vector512<int> right);
    public static Vector512<int> CompareGreaterThan       (Vector512<int> left, Vector512<int> right);
    public static Vector512<int> CompareGreaterThanOrEqual(Vector512<int> left, Vector512<int> right);
    public static Vector512<int> CompareLessThan          (Vector512<int> left, Vector512<int> right);
    public static Vector512<int> CompareLessThanOrEqual   (Vector512<int> left, Vector512<int> right);
    public static Vector512<int> CompareNotEqual          (Vector512<int> left, Vector512<int> right);

    public static Vector512<long> Compare                  (Vector512<long> left, Vector512<long> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
    public static Vector512<long> CompareEqual             (Vector512<long> left, Vector512<long> right);
    public static Vector512<long> CompareGreaterThan       (Vector512<long> left, Vector512<long> right);
    public static Vector512<long> CompareGreaterThanOrEqual(Vector512<long> left, Vector512<long> right);
    public static Vector512<long> CompareLessThan          (Vector512<long> left, Vector512<long> right);
    public static Vector512<long> CompareLessThanOrEqual   (Vector512<long> left, Vector512<long> right);
    public static Vector512<long> CompareNotEqual          (Vector512<long> left, Vector512<long> right);

    public static Vector512<uint> Compare                  (Vector512<uint> left, Vector512<uint> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
    public static Vector512<uint> CompareEqual             (Vector512<uint> left, Vector512<uint> right);
    public static Vector512<uint> CompareGreaterThan       (Vector512<uint> left, Vector512<uint> right);
    public static Vector512<uint> CompareGreaterThanOrEqual(Vector512<uint> left, Vector512<uint> right);
    public static Vector512<uint> CompareLessThan          (Vector512<uint> left, Vector512<uint> right);
    public static Vector512<uint> CompareLessThanOrEqual   (Vector512<uint> left, Vector512<uint> right);
    public static Vector512<uint> CompareNotEqual          (Vector512<uint> left, Vector512<uint> right);

    public static Vector512<ulong> Compare                  (Vector512<ulong> left, Vector512<ulong> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
    public static Vector512<ulong> CompareEqual             (Vector512<ulong> left, Vector512<ulong> right);
    public static Vector512<ulong> CompareGreaterThan       (Vector512<ulong> left, Vector512<ulong> right);
    public static Vector512<ulong> CompareGreaterThanOrEqual(Vector512<ulong> left, Vector512<ulong> right);
    public static Vector512<ulong> CompareLessThan          (Vector512<ulong> left, Vector512<ulong> right);
    public static Vector512<ulong> CompareLessThanOrEqual   (Vector512<ulong> left, Vector512<ulong> right);
    public static Vector512<ulong> CompareNotEqual          (Vector512<ulong> left, Vector512<ulong> right);

    public static Vector512<double> Compress(Vector512<double> value, Vector512<double> mask);
    public static Vector512<int>    Compress(Vector512<int>    value, Vector512<int>    mask);
    public static Vector512<long>   Compress(Vector512<long>   value, Vector512<long>   mask);
    public static Vector512<float>  Compress(Vector512<float>  value, Vector512<float>  mask);
    public static Vector512<uint>   Compress(Vector512<uint>   value, Vector512<uint>   mask);
    public static Vector512<ulong>  Compress(Vector512<ulong>  value, Vector512<ulong>  mask);

    public static Vector512<double> Expand(Vector512<double> value, Vector512<double> mask);
    public static Vector512<int>    Expand(Vector512<int>    value, Vector512<int>    mask);
    public static Vector512<long>   Expand(Vector512<long>   value, Vector512<long>   mask);
    public static Vector512<float>  Expand(Vector512<float>  value, Vector512<float>  mask);
    public static Vector512<uint>   Expand(Vector512<uint>   value, Vector512<uint>   mask);
    public static Vector512<ulong>  Expand(Vector512<ulong>  value, Vector512<ulong>  mask);

    public static unsafe Vector512<double> GatherMaskVector512(Vector512<double> source, double* baseAddress, Vector512<int> index, Vector512<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<int>    GatherMaskVector512(Vector512<int>    source, int*    baseAddress, Vector512<int> index, Vector512<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<long>   GatherMaskVector512(Vector512<long>   source, long*   baseAddress, Vector512<int> index, Vector512<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<float>  GatherMaskVector512(Vector512<float>  source, float*  baseAddress, Vector512<int> index, Vector512<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<uint>   GatherMaskVector512(Vector512<uint>   source, uint*   baseAddress, Vector512<int> index, Vector512<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<ulong>  GatherMaskVector512(Vector512<ulong>  source, ulong*  baseAddress, Vector512<int> index, Vector512<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe Vector512<double> GatherMaskVector512(Vector512<double> source, double* baseAddress, Vector512<long> index, Vector512<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<int>    GatherMaskVector512(Vector512<int>    source, int*    baseAddress, Vector512<long> index, Vector512<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<long>   GatherMaskVector512(Vector512<long>   source, long*   baseAddress, Vector512<long> index, Vector512<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<uint>   GatherMaskVector512(Vector512<uint>   source, uint*   baseAddress, Vector512<long> index, Vector512<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<float>  GatherMaskVector512(Vector512<float>  source, float*  baseAddress, Vector512<long> index, Vector512<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<ulong>  GatherMaskVector512(Vector512<ulong>  source, ulong*  baseAddress, Vector512<long> index, Vector512<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe Vector512<double> GatherVector512(double* baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<int>    GatherVector512(int*    baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<long>   GatherVector512(long*   baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<float>  GatherVector512(float*  baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<uint>   GatherVector512(uint*   baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<ulong>  GatherVector512(ulong*  baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe Vector512<double> GatherVector512(double* baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<int>    GatherVector512(int*    baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<long>   GatherVector512(long*   baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<float>  GatherVector512(float*  baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<uint>   GatherVector512(uint*   baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<ulong>  GatherVector512(ulong*  baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe Vector512<double> MaskLoad(double* address, Vector512<double> mask);
    public static unsafe Vector512<int>    MaskLoad(int*    address, Vector512<int>    mask);
    public static unsafe Vector512<long>   MaskLoad(long*   address, Vector512<long>   mask);
    public static unsafe Vector512<float>  MaskLoad(float*  address, Vector512<float>  mask);
    public static unsafe Vector512<uint>   MaskLoad(uint*   address, Vector512<uint>   mask);
    public static unsafe Vector512<ulong>  MaskLoad(ulong*  address, Vector512<ulong>  mask);

    public static unsafe void MaskStore(double* address, Vector512<double> mask, Vector512<double> source);
    public static unsafe void MaskStore(int*    address, Vector512<int>    mask, Vector512<int>    source);
    public static unsafe void MaskStore(long*   address, Vector512<long>   mask, Vector512<long>   source);
    public static unsafe void MaskStore(float*  address, Vector512<float>  mask, Vector512<float>  source);
    public static unsafe void MaskStore(uint*   address, Vector512<uint>   mask, Vector512<uint>   source);
    public static unsafe void MaskStore(ulong*  address, Vector512<ulong>  mask, Vector512<ulong>  source);

    public static int MoveMask(Vector256<short>  value);
    public static int MoveMask(Vector256<ushort> value);
    public static int MoveMask(Vector512<int>    value);
    public static int MoveMask(Vector512<float>  value);
    public static int MoveMask(Vector512<uint>   value);

    public static unsafe void ScatterMaskVector512(Vector512<double> value, double* baseAddress, Vector512<int> index, Vector512<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<int>    value, int*    baseAddress, Vector512<int> index, Vector512<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<long>   value, long*   baseAddress, Vector512<int> index, Vector512<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<float>  value, float*  baseAddress, Vector512<int> index, Vector512<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<uint>   value, uint*   baseAddress, Vector512<int> index, Vector512<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<ulong>  value, ulong*  baseAddress, Vector512<int> index, Vector512<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe void ScatterMaskVector512(Vector512<double> value, double* baseAddress, Vector512<long> index, Vector512<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<int>    value, int*    baseAddress, Vector512<long> index, Vector512<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<long>   value, long*   baseAddress, Vector512<long> index, Vector512<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<uint>   value, uint*   baseAddress, Vector512<long> index, Vector512<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<float>  value, float*  baseAddress, Vector512<long> index, Vector512<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<ulong>  value, ulong*  baseAddress, Vector512<long> index, Vector512<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe void ScatterVector512(Vector512<double> value, double* baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<int>    value, int*    baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<long>   value, long*   baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<float>  value, float*  baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<uint>   value, uint*   baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<ulong>  value, ulong*  baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe void ScatterVector512(Vector512<double> value, double* baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<int>    value, int*    baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<long>   value, long*   baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<float>  value, float*  baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<uint>   value, uint*   baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<ulong>  value, ulong*  baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static bool TestC(Vector512<double> left, Vector512<double> right);
    public static bool TestC(Vector512<int>    left, Vector512<int>    right);
    public static bool TestC(Vector512<long>   left, Vector512<long>   right);
    public static bool TestC(Vector512<float>  left, Vector512<float>  right);
    public static bool TestC(Vector512<uint>   left, Vector512<uint>   right);
    public static bool TestC(Vector512<ulong>  left, Vector512<ulong>  right);

    public static bool TestNotZAndNotC(Vector512<double> left, Vector512<double> right);
    public static bool TestNotZAndNotC(Vector512<int>    left, Vector512<int>    right);
    public static bool TestNotZAndNotC(Vector512<long>   left, Vector512<long>   right);
    public static bool TestNotZAndNotC(Vector512<float>  left, Vector512<float>  right);
    public static bool TestNotZAndNotC(Vector512<uint>   left, Vector512<uint>   right);
    public static bool TestNotZAndNotC(Vector512<ulong>  left, Vector512<ulong>  right);

    public static bool TestZ(Vector512<double> left, Vector512<double> right);
    public static bool TestZ(Vector512<int>    left, Vector512<int>    right);
    public static bool TestZ(Vector512<long>   left, Vector512<long>   right);
    public static bool TestZ(Vector512<float>  left, Vector512<float>  right);
    public static bool TestZ(Vector512<uint>   left, Vector512<uint>   right);
    public static bool TestZ(Vector512<ulong>  left, Vector512<ulong>  right);

    public static partial class VL
    {
        public static Vector128<int> Compare                  (Vector128<int> left, Vector128<int> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector128<int> CompareGreaterThanOrEqual(Vector128<int> left, Vector128<int> right);
        public static Vector128<int> CompareLessThan          (Vector128<int> left, Vector128<int> right);
        public static Vector128<int> CompareLessThanOrEqual   (Vector128<int> left, Vector128<int> right);
        public static Vector128<int> CompareNotEqual          (Vector128<int> left, Vector128<int> right);

        public static Vector256<int> Compare                  (Vector256<int> left, Vector256<int> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector256<int> CompareGreaterThanOrEqual(Vector256<int> left, Vector256<int> right);
        public static Vector256<int> CompareLessThan          (Vector256<int> left, Vector256<int> right);
        public static Vector256<int> CompareLessThanOrEqual   (Vector256<int> left, Vector256<int> right);
        public static Vector256<int> CompareNotEqual          (Vector256<int> left, Vector256<int> right);

        public static Vector128<long> Compare                  (Vector128<long> left, Vector128<long> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector128<long> CompareGreaterThanOrEqual(Vector128<long> left, Vector128<long> right);
        public static Vector128<long> CompareLessThan          (Vector128<long> left, Vector128<long> right);
        public static Vector128<long> CompareLessThanOrEqual   (Vector128<long> left, Vector128<long> right);
        public static Vector128<long> CompareNotEqual          (Vector128<long> left, Vector128<long> right);
        public static Vector256<long> Compare                  (Vector256<long> left, Vector256<long> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector256<long> CompareGreaterThanOrEqual(Vector256<long> left, Vector256<long> right);
        public static Vector256<long> CompareLessThan          (Vector256<long> left, Vector256<long> right);
        public static Vector256<long> CompareLessThanOrEqual   (Vector256<long> left, Vector256<long> right);
        public static Vector256<long> CompareNotEqual          (Vector256<long> left, Vector256<long> right);

        public static Vector128<uint> Compare                  (Vector128<uint> left, Vector128<uint> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector128<uint> CompareGreaterThan       (Vector128<uint> left, Vector128<uint> right);
        public static Vector128<uint> CompareGreaterThanOrEqual(Vector128<uint> left, Vector128<uint> right);
        public static Vector128<uint> CompareLessThan          (Vector128<uint> left, Vector128<uint> right);
        public static Vector128<uint> CompareLessThanOrEqual   (Vector128<uint> left, Vector128<uint> right);
        public static Vector128<uint> CompareNotEqual          (Vector128<uint> left, Vector128<uint> right);
        public static Vector256<uint> Compare                  (Vector256<uint> left, Vector256<uint> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector256<uint> CompareGreaterThan       (Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareGreaterThanOrEqual(Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareLessThan          (Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareLessThanOrEqual   (Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareNotEqual          (Vector256<uint> left, Vector256<uint> right);

        public static Vector128<ulong> Compare                  (Vector128<ulong> left, Vector128<ulong> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector128<ulong> CompareGreaterThan       (Vector128<ulong> left, Vector128<ulong> right);
        public static Vector128<ulong> CompareGreaterThanOrEqual(Vector128<ulong> left, Vector128<ulong> right);
        public static Vector128<ulong> CompareLessThan          (Vector128<ulong> left, Vector128<ulong> right);
        public static Vector128<ulong> CompareLessThanOrEqual   (Vector128<ulong> left, Vector128<ulong> right);
        public static Vector128<ulong> CompareNotEqual          (Vector128<ulong> left, Vector128<ulong> right);
        public static Vector256<ulong> Compare                  (Vector256<ulong> left, Vector256<ulong> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector256<ulong> CompareGreaterThan       (Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareGreaterThanOrEqual(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareLessThan          (Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareLessThanOrEqual   (Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareNotEqual          (Vector256<ulong> left, Vector256<ulong> right);

        public static Vector128<double> Compress(Vector128<double> value, Vector128<double> mask);
        public static Vector128<int>    Compress(Vector128<int>    value, Vector128<int>    mask);
        public static Vector128<long>   Compress(Vector128<long>   value, Vector128<long>   mask);
        public static Vector128<float>  Compress(Vector128<float>  value, Vector128<float>  mask);
        public static Vector128<uint>   Compress(Vector128<uint>   value, Vector128<uint>   mask);
        public static Vector128<ulong>  Compress(Vector128<ulong>  value, Vector128<ulong>  mask);
        public static Vector256<double> Compress(Vector256<double> value, Vector256<double> mask);
        public static Vector256<int>    Compress(Vector256<int>    value, Vector256<int>    mask);
        public static Vector256<long>   Compress(Vector256<long>   value, Vector256<long>   mask);
        public static Vector256<float>  Compress(Vector256<float>  value, Vector256<float>  mask);
        public static Vector256<uint>   Compress(Vector256<uint>   value, Vector256<uint>   mask);
        public static Vector256<ulong>  Compress(Vector256<ulong>  value, Vector256<ulong>  mask);

        public static Vector128<double> Expand(Vector128<double> value, Vector128<double> mask);
        public static Vector128<int>    Expand(Vector128<int>    value, Vector128<int>    mask);
        public static Vector128<long>   Expand(Vector128<long>   value, Vector128<long>   mask);
        public static Vector128<float>  Expand(Vector128<float>  value, Vector128<float>  mask);
        public static Vector128<uint>   Expand(Vector128<uint>   value, Vector128<uint>   mask);
        public static Vector128<ulong>  Expand(Vector128<ulong>  value, Vector128<ulong>  mask);
        public static Vector256<double> Expand(Vector256<double> value, Vector256<double> mask);
        public static Vector256<int>    Expand(Vector256<int>    value, Vector256<int>    mask);
        public static Vector256<long>   Expand(Vector256<long>   value, Vector256<long>   mask);
        public static Vector256<float>  Expand(Vector256<float>  value, Vector256<float>  mask);
        public static Vector256<uint>   Expand(Vector256<uint>   value, Vector256<uint>   mask);
        public static Vector256<ulong>  Expand(Vector256<ulong>  value, Vector256<ulong>  mask);

        public static unsafe void ScatterMaskVector128(Vector128<double> value, double* baseAddress, Vector128<int> index, Vector128<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<int>    value, int*    baseAddress, Vector128<int> index, Vector128<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<long>   value, long*   baseAddress, Vector128<int> index, Vector128<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<float>  value, float*  baseAddress, Vector128<int> index, Vector128<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<uint>   value, uint*   baseAddress, Vector128<int> index, Vector128<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<ulong>  value, ulong*  baseAddress, Vector128<int> index, Vector128<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<double> value, double* baseAddress, Vector256<int> index, Vector256<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<int>    value, int*    baseAddress, Vector256<int> index, Vector256<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<long>   value, long*   baseAddress, Vector256<int> index, Vector256<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<float>  value, float*  baseAddress, Vector256<int> index, Vector256<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<uint>   value, uint*   baseAddress, Vector256<int> index, Vector256<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<ulong>  value, ulong*  baseAddress, Vector256<int> index, Vector256<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

        public static unsafe void ScatterMaskVector128(Vector128<double> value, double* baseAddress, Vector128<long> index, Vector128<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<int>    value, int*    baseAddress, Vector128<long> index, Vector128<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<long>   value, long*   baseAddress, Vector128<long> index, Vector128<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<float>  value, float*  baseAddress, Vector128<long> index, Vector128<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<uint>   value, uint*   baseAddress, Vector128<long> index, Vector128<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<ulong>  value, ulong*  baseAddress, Vector128<long> index, Vector128<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<double> value, double* baseAddress, Vector256<long> index, Vector256<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<int>    value, int*    baseAddress, Vector256<long> index, Vector256<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<long>   value, long*   baseAddress, Vector256<long> index, Vector256<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<float>  value, float*  baseAddress, Vector256<long> index, Vector256<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<uint>   value, uint*   baseAddress, Vector256<long> index, Vector256<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<ulong>  value, ulong*  baseAddress, Vector256<long> index, Vector256<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

        public static unsafe void ScatterVector128(Vector128<double> value, double* baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<int>    value, int*    baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<long>   value, long*   baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<float>  value, float*  baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<uint>   value, uint*   baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<ulong>  value, ulong*  baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<double> value, double* baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<int>    value, int*    baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<long>   value, long*   baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<float>  value, float*  baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<uint>   value, uint*   baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<ulong>  value, ulong*  baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

        public static unsafe void ScatterVector128(Vector128<double> value, double* baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<int>    value, int*    baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<long>   value, long*   baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<float>  value, float*  baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<uint>   value, uint*   baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<ulong>  value, ulong*  baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<double> value, double* baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<int>    value, int*    baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<long>   value, long*   baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<float>  value, float*  baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<uint>   value, uint*   baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<ulong>  value, ulong*  baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    }
}

public static partial class Avx512BW
{
    public static Vector512<byte>   BlendVariable(Vector512<byte>   left, Vector512<byte>   right, Vector512<byte>   mask);
    public static Vector512<short>  BlendVariable(Vector512<short>  left, Vector512<short>  right, Vector512<short>  mask);
    public static Vector512<sbyte>  BlendVariable(Vector512<sbyte>  left, Vector512<sbyte>  right, Vector512<sbyte>  mask);
    public static Vector512<ushort> BlendVariable(Vector512<ushort> left, Vector512<ushort> right, Vector512<ushort> mask);

    public static Vector512<byte> Compare                  (Vector512<byte> left, Vector512<byte> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
    public static Vector512<byte> CompareEqual             (Vector512<byte> left, Vector512<byte> right);
    public static Vector512<byte> CompareGreaterThan       (Vector512<byte> left, Vector512<byte> right);
    public static Vector512<byte> CompareGreaterThanOrEqual(Vector512<byte> left, Vector512<byte> right);
    public static Vector512<byte> CompareLessThan          (Vector512<byte> left, Vector512<byte> right);
    public static Vector512<byte> CompareLessThanOrEqual   (Vector512<byte> left, Vector512<byte> right);
    public static Vector512<byte> CompareNotEqual          (Vector512<byte> left, Vector512<byte> right);

    public static Vector512<short> Compare                  (Vector512<short> left, Vector512<short> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
    public static Vector512<short> CompareEqual             (Vector512<short> left, Vector512<short> right);
    public static Vector512<short> CompareGreaterThan       (Vector512<short> left, Vector512<short> right);
    public static Vector512<short> CompareGreaterThanOrEqual(Vector512<short> left, Vector512<short> right);
    public static Vector512<short> CompareLessThan          (Vector512<short> left, Vector512<short> right);
    public static Vector512<short> CompareLessThanOrEqual   (Vector512<short> left, Vector512<short> right);
    public static Vector512<short> CompareNotEqual          (Vector512<short> left, Vector512<short> right);

    public static Vector512<sbyte> Compare                  (Vector512<sbyte> left, Vector512<sbyte> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
    public static Vector512<sbyte> CompareEqual             (Vector512<sbyte> left, Vector512<sbyte> right);
    public static Vector512<sbyte> CompareGreaterThan       (Vector512<sbyte> left, Vector512<sbyte> right);
    public static Vector512<sbyte> CompareGreaterThanOrEqual(Vector512<sbyte> left, Vector512<sbyte> right);
    public static Vector512<sbyte> CompareLessThan          (Vector512<sbyte> left, Vector512<sbyte> right);
    public static Vector512<sbyte> CompareLessThanOrEqual   (Vector512<sbyte> left, Vector512<sbyte> right);
    public static Vector512<sbyte> CompareNotEqual          (Vector512<sbyte> left, Vector512<sbyte> right);

    public static Vector512<ushort> Compare                  (Vector512<ushort> left, Vector512<ushort> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
    public static Vector512<ushort> CompareEqual             (Vector512<ushort> left, Vector512<ushort> right);
    public static Vector512<ushort> CompareGreaterThan       (Vector512<ushort> left, Vector512<ushort> right);
    public static Vector512<ushort> CompareGreaterThanOrEqual(Vector512<ushort> left, Vector512<ushort> right);
    public static Vector512<ushort> CompareLessThan          (Vector512<ushort> left, Vector512<ushort> right);
    public static Vector512<ushort> CompareLessThanOrEqual   (Vector512<ushort> left, Vector512<ushort> right);
    public static Vector512<ushort> CompareNotEqual          (Vector512<ushort> left, Vector512<ushort> right);

    public static int MoveMask(Vector512<short>  value);
    public static int MoveMask(Vector512<ushort> value);

    public static long MoveMask(Vector512<byte>  value);
    public static long MoveMask(Vector512<sbyte> value);

    public static bool TestC(Vector512<byte>   left, Vector512<byte>   right);
    public static bool TestC(Vector512<short>  left, Vector512<short>  right);
    public static bool TestC(Vector512<sbyte>  left, Vector512<sbyte>  right);
    public static bool TestC(Vector512<ushort> left, Vector512<ushort> right);

    public static bool TestNotZAndNotC(Vector512<byte>   left, Vector512<byte>   right);
    public static bool TestNotZAndNotC(Vector512<short>  left, Vector512<short>  right);
    public static bool TestNotZAndNotC(Vector512<sbyte>  left, Vector512<sbyte>  right);
    public static bool TestNotZAndNotC(Vector512<ushort> left, Vector512<ushort> right);

    public static bool TestZ(Vector512<byte>   left, Vector512<byte>   right);
    public static bool TestZ(Vector512<short>  left, Vector512<short>  right);
    public static bool TestZ(Vector512<sbyte>  left, Vector512<sbyte>  right);
    public static bool TestZ(Vector512<ushort> left, Vector512<ushort> right);

    public static partial class VL
    {
        public static Vector128<byte> Compare                  (Vector128<byte> left, Vector128<byte> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector128<byte> CompareGreaterThan       (Vector128<byte> left, Vector128<byte> right);
        public static Vector128<byte> CompareGreaterThanOrEqual(Vector128<byte> left, Vector128<byte> right);
        public static Vector128<byte> CompareLessThan          (Vector128<byte> left, Vector128<byte> right);
        public static Vector128<byte> CompareLessThanOrEqual   (Vector128<byte> left, Vector128<byte> right);
        public static Vector128<byte> CompareNotEqual          (Vector128<byte> left, Vector128<byte> right);
        public static Vector256<byte> Compare                  (Vector256<byte> left, Vector256<byte> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector256<byte> CompareGreaterThan       (Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareGreaterThanOrEqual(Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareLessThan          (Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareLessThanOrEqual   (Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareNotEqual          (Vector256<byte> left, Vector256<byte> right);

        public static Vector128<short> Compare                  (Vector128<short> left, Vector128<short> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector128<short> CompareGreaterThanOrEqual(Vector128<short> left, Vector128<short> right);
        public static Vector128<short> CompareLessThan          (Vector128<short> left, Vector128<short> right);
        public static Vector128<short> CompareLessThanOrEqual   (Vector128<short> left, Vector128<short> right);
        public static Vector128<short> CompareNotEqual          (Vector128<short> left, Vector128<short> right);
        public static Vector256<short> Compare                  (Vector256<short> left, Vector256<short> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector256<short> CompareGreaterThanOrEqual(Vector256<short> left, Vector256<short> right);
        public static Vector256<short> CompareLessThan          (Vector256<short> left, Vector256<short> right);
        public static Vector256<short> CompareLessThanOrEqual   (Vector256<short> left, Vector256<short> right);
        public static Vector256<short> CompareNotEqual          (Vector256<short> left, Vector256<short> right);

        public static Vector128<sbyte> Compare                  (Vector128<sbyte> left, Vector128<sbyte> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector128<sbyte> CompareGreaterThanOrEqual(Vector128<sbyte> left, Vector128<sbyte> right);
        public static Vector128<sbyte> CompareLessThan          (Vector128<sbyte> left, Vector128<sbyte> right);
        public static Vector128<sbyte> CompareLessThanOrEqual   (Vector128<sbyte> left, Vector128<sbyte> right);
        public static Vector128<sbyte> CompareNotEqual          (Vector128<sbyte> left, Vector128<sbyte> right);
        public static Vector256<sbyte> Compare                  (Vector256<sbyte> left, Vector256<sbyte> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector256<sbyte> CompareGreaterThanOrEqual(Vector256<sbyte> left, Vector256<sbyte> right);
        public static Vector256<sbyte> CompareLessThan          (Vector256<sbyte> left, Vector256<sbyte> right);
        public static Vector256<sbyte> CompareLessThanOrEqual   (Vector256<sbyte> left, Vector256<sbyte> right);
        public static Vector256<sbyte> CompareNotEqual          (Vector256<sbyte> left, Vector256<sbyte> right);

        public static Vector128<ushort> Compare                  (Vector128<ushort> left, Vector128<ushort> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector128<ushort> CompareGreaterThan       (Vector128<ushort> left, Vector128<ushort> right);
        public static Vector128<ushort> CompareGreaterThanOrEqual(Vector128<ushort> left, Vector128<ushort> right);
        public static Vector128<ushort> CompareLessThan          (Vector128<ushort> left, Vector128<ushort> right);
        public static Vector128<ushort> CompareLessThanOrEqual   (Vector128<ushort> left, Vector128<ushort> right);
        public static Vector128<ushort> CompareNotEqual          (Vector128<ushort> left, Vector128<ushort> right);
        public static Vector256<ushort> Compare                  (Vector256<ushort> left, Vector256<ushort> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector256<ushort> CompareGreaterThan       (Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareGreaterThanOrEqual(Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareLessThan          (Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareLessThanOrEqual   (Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareNotEqual          (Vector256<ushort> left, Vector256<ushort> right);
    }
}

public static partial class Avx512DQ
{
    public static Vector512<double> Classify(Vector512<double> value, [ConstantExpected] byte control);
    public static Vector512<float>  Classify(Vector512<float>  value, [ConstantExpected] byte control);

    public static Vector128<double> ClassifyScalar(Vector128<double> value, [ConstantExpected] byte control);
    public static Vector128<float>  ClassifyScalar(Vector128<float>  value, [ConstantExpected] byte control);

    public static int MoveMask(Vector128<short>  value);
    public static int MoveMask(Vector128<ushort> value);
    public static int MoveMask(Vector256<int>    value);
    public static int MoveMask(Vector256<uint>   value);
    public static int MoveMask(Vector512<double> value);
    public static int MoveMask(Vector512<long>   value);
    public static int MoveMask(Vector512<ulong>  value);

    public static partial class VL
    {
        public static Vector128<double> Classify(Vector128<double> value, [ConstantExpected] byte control);
        public static Vector128<float>  Classify(Vector128<float>  value, [ConstantExpected] byte control);
        public static Vector256<double> Classify(Vector256<double> value, [ConstantExpected] byte control);
        public static Vector256<float>  Classify(Vector256<float>  value, [ConstantExpected] byte control);
    }
}

public abstract class Avx512Vbmi2 : Avx512BW
{
    public static new bool IsSupported { get; }

    public static Vector512<byte>   Compress(Vector512<byte>   value, Vector512<byte>   mask);
    public static Vector512<short>  Compress(Vector512<short>  value, Vector512<short>  mask);
    public static Vector512<sbyte>  Compress(Vector512<sbyte>  value, Vector512<sbyte>  mask);
    public static Vector512<ushort> Compress(Vector512<ushort> value, Vector512<ushort> mask);

    public static Vector512<byte>   Expand(Vector512<byte>   value, Vector512<byte>   mask);
    public static Vector512<short>  Expand(Vector512<short>  value, Vector512<short>  mask);
    public static Vector512<sbyte>  Expand(Vector512<sbyte>  value, Vector512<sbyte>  mask);
    public static Vector512<ushort> Expand(Vector512<ushort> value, Vector512<ushort> mask);

    public abstract class VL : Avx512BW.VL
    {
        public static new bool IsSupported { get; }

        public static Vector128<byte>   Compress(Vector128<byte>   value, Vector128<byte>   mask);
        public static Vector128<short>  Compress(Vector128<short>  value, Vector128<short>  mask);
        public static Vector128<sbyte>  Compress(Vector128<sbyte>  value, Vector128<sbyte>  mask);
        public static Vector128<ushort> Compress(Vector128<ushort> value, Vector128<ushort> mask);
        public static Vector256<byte>   Compress(Vector256<byte>   value, Vector256<byte>   mask);
        public static Vector256<short>  Compress(Vector256<short>  value, Vector256<short>  mask);
        public static Vector256<sbyte>  Compress(Vector256<sbyte>  value, Vector256<sbyte>  mask);
        public static Vector256<ushort> Compress(Vector256<ushort> value, Vector256<ushort> mask);

        public static Vector128<byte>   Expand(Vector128<byte>   value, Vector128<byte>   mask);
        public static Vector128<short>  Expand(Vector128<short>  value, Vector128<short>  mask);
        public static Vector128<sbyte>  Expand(Vector128<sbyte>  value, Vector128<sbyte>  mask);
        public static Vector128<ushort> Expand(Vector128<ushort> value, Vector128<ushort> mask);
        public static Vector256<byte>   Expand(Vector256<byte>   value, Vector256<byte>   mask);
        public static Vector256<short>  Expand(Vector256<short>  value, Vector256<short>  mask);
        public static Vector256<sbyte>  Expand(Vector256<sbyte>  value, Vector256<sbyte>  mask);
        public static Vector256<ushort> Expand(Vector256<ushort> value, Vector256<ushort> mask);
    }

    public abstract class X64 : Avx512BW.X64
    {
        public static new bool IsSupported { get; }
    }
}
@tannergooding tannergooding added area-System.Runtime.Intrinsics blocking Marks issues that we want to fast track in order to unblock other important work api-ready-for-review API is ready for review, it is NOT ready for implementation avx512 Related to the AVX-512 architecture labels Jun 3, 2023
@tannergooding tannergooding added this to the 8.0.0 milestone Jun 3, 2023

ghost commented Jun 3, 2023

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

Issue Details

Summary

While implementing the API surface for Expose VectorMask to support generic masking for Vector, various considerations were found that necessitated taking a step back and reconsidering how it works.

Most of these issues stemmed from the additional complexity and throughput hit that would have been required for the JIT to integrate the type. However, the design also affected how users would interact with the types and the public API surface we would expose. Namely, existing user code would not benefit, and it would nearly double the API surface we currently expose for the XArch and cross-platform intrinsics.

These considerations were raised with @dotnet/avx512-contrib and an alternative design was proposed where the JIT would do pattern recognition in lowering instead to limit the throughput hit and provide light-up to existing user code. This does not preclude the ability to expose VectorMask in the future and we can revisit the type and its design as appropriate.

Conceptual Differences

Previously, we would have defined the following, and this would have expanded to effectively all existing intrinsics exposed. This would nearly double or triple our API surface, taking us from the ~1900 APIs we have today up to at least ~3800 APIs. Arm64, as a point of comparison, currently has ~2100 APIs.

namespace System.Runtime.Intrinsics.X86;

public static partial class Avx512F
{
    // Existing API
    public static Vector512<float> Add(Vector512<float> left, Vector512<float> right);

    // New mask API
    public static Vector512<float> Add(Vector512<float> mergeValues, Vector512Mask<float> mergeMask, Vector512<float> left, Vector512<float> right);

    // Potentially handled by just the above overload where `mergeValues: Vector512<float>.Zero`
    public static Vector512<float> Add(Vector512Mask<float> zeroMask, Vector512<float> left, Vector512<float> right);

    public static partial class VL
    {
        // New mask API
        public static Vector128<float> Add(Vector128<float> mergeValues, Vector128Mask<float> mergeMask, Vector128<float> left, Vector128<float> right);
        public static Vector256<float> Add(Vector256<float> mergeValues, Vector256Mask<float> mergeMask, Vector256<float> left, Vector256<float> right);

        // Potentially handled by just the above overloads where `mergeValues` is the matching `Zero` vector
        public static Vector128<float> Add(Vector128Mask<float> zeroMask, Vector128<float> left, Vector128<float> right);
        public static Vector256<float> Add(Vector256Mask<float> zeroMask, Vector256<float> left, Vector256<float> right);
    }
}

Pattern Recognition

Rather than exposing these overloads of APIs that take VectorMask<T> and allowing users to explicitly utilize masking, we will instead recognize a few key patterns and transform those in the JIT.

We would have also had some intrinsics, such as public static Vector512Mask<float> CompareEqual(Vector512<float> left, Vector512<float> right), which produce a mask, along with various other ways to produce one. Developers would then have been able to consume this by passing the mask down to the API. For example, in the following we find all additions involving NaN and ensure those elements become 0 in the result.

Vector512Mask<float> nanMask = Avx512F.CompareNotEqual(left, left) | Avx512F.CompareNotEqual(right, right);
return Avx512F.Add(Vector512<float>.Zero, ~nanMask, left, right);

If a user wanted to do that today where masking doesn't exist, they'd actually do a functionally similar thing:

Vector256<float> nanMask = Avx.CompareNotEqual(left, left) | Avx.CompareNotEqual(right, right);
Vector256<float> result = Avx.Add(left, right);
return Vector256.ConditionalSelect(~nanMask, result, Vector256<float>.Zero);

Thus, by instead recognizing these patterns, we can light up existing code and avoid exploding the API surface, while also ensuring that the code users write is consistent regardless of whether they are on hardware with native masking support.
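
As a self-contained illustration of the `ConditionalSelect` shape discussed above, here is a minimal sketch. The names `MaskedAddSketch` and `AddZeroingNaN` are hypothetical, and the sketch uses the cross-platform `Vector256.Equals` helper instead of `Avx.CompareNotEqual` so that it also runs on hardware without AVX. Whether the JIT actually folds this into a single masked add depends on the runtime version and hardware; the code only shows the source-level pattern.

```csharp
using System.Runtime.Intrinsics;

public static class MaskedAddSketch
{
    // Adds two vectors and forces every lane that had a NaN input to zero.
    public static Vector256<float> AddZeroingNaN(Vector256<float> left, Vector256<float> right)
    {
        // `x == x` is false only for NaN, so each lane of this mask is
        // all-ones when neither input is NaN and all-zeros otherwise.
        Vector256<float> notNaNMask = Vector256.Equals(left, left) & Vector256.Equals(right, right);

        // Unmasked add across all lanes.
        Vector256<float> sum = left + right;

        // Keep the sum where the mask is set, otherwise take zero.
        // This is the `{k1}{z}` shape from the pattern list.
        return Vector256.ConditionalSelect(notNaNMask, sum, Vector256<float>.Zero);
    }
}
```

Writing the mask in its positive form (lanes that are not NaN) avoids the extra `~` from the earlier snippet; both forms select the same lanes.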

A sampling of the patterns we want to recognize includes, but is not limited to:

  • {k1} - ConditionalSelect(mask1, resultVector, mergeVector)
  • {k1}{z} - ConditionalSelect(mask1, resultVector, Vector.Zero)
  • kadd k1, k2 - mask1.ExtractMostSignificantBits() + mask2.ExtractMostSignificantBits()
  • kand k1, k2 - mask1 & mask2
  • kandn k1, k2 - ~mask1 & mask2
  • kmov k1, k2 - mask1 = mask2
  • kmov r32, k1 - mask1.ExtractMostSignificantBits()
  • kmov k1, r32 - Vector.Create(...).ExtractMostSignificantBits()
  • knot k1, k2 - ~mask1
  • kor k1, k2 - mask1 | mask2
  • kortest k1, k2; jz - (mask1 | mask2) == Vector.Zero
  • kortest k1, k2; jnz - (mask1 | mask2) != Vector.Zero
  • kortest k1, k2; jc - (mask1 | mask2) == Vector.AllBitsSet
  • kortest k1, k2; jnc - (mask1 | mask2) != Vector.AllBitsSet
  • kshiftl k1, k2, imm8 - mask1.ExtractMostSignificantBits() << amount
  • kshiftr k1, k2, imm8 - mask1.ExtractMostSignificantBits() >> amount
  • ktest k1, k2; jz - (mask1 & mask2) == Vector.Zero
  • ktest k1, k2; jnz - (mask1 & mask2) != Vector.Zero
  • ktest k1, k2; jc - (~mask1 & mask2) == Vector.Zero
  • ktest k1, k2; jnc - (~mask1 & mask2) != Vector.Zero
  • kunpck k1, k2, k3 - UnpackLow(mask1, mask2)
  • kxnor k1, k2 - ~(mask1 ^ mask2)
  • kxor k1, k2 - (mask1 ^ mask2)
  • vpbroadcastm - Vector.Create(mask1)
  • vpmovm2* - mask1.ExtractMostSignificantBits()
  • vpmov*2m - vector1.ExtractMostSignificantBits()
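To make a few of these shapes concrete, here is a minimal sketch using Vector128<int> as a stand-in for a mask, where each lane is either all bits set or zero. The variable names and sample values are illustrative only, and the mapping from each expression to a specific k-register instruction is the intent described in the list above, not something this snippet verifies.

```csharp
using System;
using System.Runtime.Intrinsics;

// Per-lane masks: -1 (all bits set) means "true", 0 means "false".
Vector128<int> mask1 = Vector128.Create(-1, 0, -1, 0);
Vector128<int> mask2 = Vector128.Create(0, -1, 0, 0);

// "kmov r32, k1" shape: one bit per lane, with lane 0 in bit 0.
uint bits1 = mask1.ExtractMostSignificantBits(); // lanes 0 and 2 -> 0b0101
uint bits2 = mask2.ExtractMostSignificantBits(); // lane 1        -> 0b0010

// "kortest" shapes: compare the OR of both masks against Zero or AllBitsSet.
bool noneSet = (mask1 | mask2) == Vector128<int>.Zero;
bool allSet = (mask1 | mask2) == Vector128<int>.AllBitsSet;

// "ktest" shapes: compare the AND (or AND-NOT) of both masks against Zero.
bool disjoint = (mask1 & mask2) == Vector128<int>.Zero;
bool covered = (~mask1 & mask2) == Vector128<int>.Zero;

Console.WriteLine($"bits1={bits1} bits2={bits2}");
Console.WriteLine($"noneSet={noneSet} allSet={allSet} disjoint={disjoint} covered={covered}");
```

With these inputs the masks share no lane, so `disjoint` is true, while lane 1 is set only in `mask2`, so `covered` is false.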

API Proposal

namespace System.Runtime.Intrinsics.X86;

public enum IntComparisonMode : byte
{
    Equals = 0,
    LessThan = 1,
    LessThanOrEqual = 2,
    False = 3,

    NotEquals = 4,
    GreaterThanOrEqual = 5,
    GreaterThan = 6,
    True = 7,

    // Additional names for parity
    //
    // FloatComparisonMode has similar names, but they are necessary there since
    // `!(x > y)` is not the same as `(x <= y)` due to the existence of NaN
    //
    // The architecture manual formally uses NotLessThan and NotLessThanOrEqual

    NotGreaterThanOrEqual = 1,
    NotGreaterThan = 2,

    NotLessThan = 5,
    NotLessThanOrEqual = 6,
}
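The aliasing in the enum above can be sanity-checked with plain scalars. The following is only a spot check over a handful of sample values, not a proof, and it does not touch the proposed API: it shows that the negated and direct comparisons agree for integers, which is why the extra names can share values, and that they disagree for floats once NaN is involved.

```csharp
using System;

// For integers, `!(x > y)` and `(x <= y)` always agree, so NotGreaterThan can alias LessThanOrEqual.
int[] samples = { int.MinValue, -1, 0, 1, int.MaxValue };
bool intAliasesAgree = true;

foreach (int x in samples)
{
    foreach (int y in samples)
    {
        intAliasesAgree &= (!(x > y)) == (x <= y);
        intAliasesAgree &= (!(x >= y)) == (x < y);
    }
}

// For floats, every ordered comparison against NaN is false, so the two forms differ.
float nan = float.NaN;
bool notGreaterThan = !(nan > 1f);   // true: the comparison fails, then is negated
bool lessThanOrEqual = nan <= 1f;    // false: the comparison fails

Console.WriteLine($"intAliasesAgree={intAliasesAgree}");
Console.WriteLine($"notGreaterThan={notGreaterThan} lessThanOrEqual={lessThanOrEqual}");
```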

public static partial class Avx512F
{
    public static Vector512<double> BlendVariable(Vector512<double> left, Vector512<double> right, Vector512<double> mask);
    public static Vector512<int>    BlendVariable(Vector512<int>    left, Vector512<int>    right, Vector512<int>    mask);
    public static Vector512<long>   BlendVariable(Vector512<long>   left, Vector512<long>   right, Vector512<long>   mask);
    public static Vector512<float>  BlendVariable(Vector512<float>  left, Vector512<float>  right, Vector512<float>  mask);
    public static Vector512<uint>   BlendVariable(Vector512<uint>   left, Vector512<uint>   right, Vector512<uint>   mask);
    public static Vector512<ulong>  BlendVariable(Vector512<ulong>  left, Vector512<ulong>  right, Vector512<ulong>  mask);

    public static Vector512<double> Compare                     (Vector512<double> left, Vector512<double> right, [ConstantExpected(Max = FloatComparisonMode.UnorderedTrueSignaling)] FloatComparisonMode mode);
    public static Vector512<double> CompareEqual                (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareGreaterThan          (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareGreaterThanOrEqual   (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareLessThan             (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareLessThanOrEqual      (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareNotEqual             (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareNotGreaterThan       (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareNotGreaterThanOrEqual(Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareNotLessThan          (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareNotLessThanOrEqual   (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareOrdered              (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareUnordered            (Vector512<double> left, Vector512<double> right);

    public static Vector512<float> Compare                     (Vector512<float> left, Vector512<float> right, [ConstantExpected(Max = FloatComparisonMode.UnorderedTrueSignaling)] FloatComparisonMode mode);
    public static Vector512<float> CompareEqual                (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareGreaterThan          (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareGreaterThanOrEqual   (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareLessThan             (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareLessThanOrEqual      (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareNotEqual             (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareNotGreaterThan       (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareNotGreaterThanOrEqual(Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareNotLessThan          (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareNotLessThanOrEqual   (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareOrdered              (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareUnordered            (Vector512<float> left, Vector512<float> right);

    public static Vector512<int> Compare                  (Vector512<int> left, Vector512<int> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
    public static Vector512<int> CompareEqual             (Vector512<int> left, Vector512<int> right);
    public static Vector512<int> CompareGreaterThan       (Vector512<int> left, Vector512<int> right);
    public static Vector512<int> CompareGreaterThanOrEqual(Vector512<int> left, Vector512<int> right);
    public static Vector512<int> CompareLessThan          (Vector512<int> left, Vector512<int> right);
    public static Vector512<int> CompareLessThanOrEqual   (Vector512<int> left, Vector512<int> right);
    public static Vector512<int> CompareNotEqual          (Vector512<int> left, Vector512<int> right);

    public static Vector512<long> Compare                  (Vector512<long> left, Vector512<long> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
    public static Vector512<long> CompareEqual             (Vector512<long> left, Vector512<long> right);
    public static Vector512<long> CompareGreaterThan       (Vector512<long> left, Vector512<long> right);
    public static Vector512<long> CompareGreaterThanOrEqual(Vector512<long> left, Vector512<long> right);
    public static Vector512<long> CompareLessThan          (Vector512<long> left, Vector512<long> right);
    public static Vector512<long> CompareLessThanOrEqual   (Vector512<long> left, Vector512<long> right);
    public static Vector512<long> CompareNotEqual          (Vector512<long> left, Vector512<long> right);

    public static Vector512<uint> Compare                  (Vector512<uint> left, Vector512<uint> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
    public static Vector512<uint> CompareEqual             (Vector512<uint> left, Vector512<uint> right);
    public static Vector512<uint> CompareGreaterThan       (Vector512<uint> left, Vector512<uint> right);
    public static Vector512<uint> CompareGreaterThanOrEqual(Vector512<uint> left, Vector512<uint> right);
    public static Vector512<uint> CompareLessThan          (Vector512<uint> left, Vector512<uint> right);
    public static Vector512<uint> CompareLessThanOrEqual   (Vector512<uint> left, Vector512<uint> right);
    public static Vector512<uint> CompareNotEqual          (Vector512<uint> left, Vector512<uint> right);

    public static Vector512<ulong> Compare                  (Vector512<ulong> left, Vector512<ulong> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
    public static Vector512<ulong> CompareEqual             (Vector512<ulong> left, Vector512<ulong> right);
    public static Vector512<ulong> CompareGreaterThan       (Vector512<ulong> left, Vector512<ulong> right);
    public static Vector512<ulong> CompareGreaterThanOrEqual(Vector512<ulong> left, Vector512<ulong> right);
    public static Vector512<ulong> CompareLessThan          (Vector512<ulong> left, Vector512<ulong> right);
    public static Vector512<ulong> CompareLessThanOrEqual   (Vector512<ulong> left, Vector512<ulong> right);
    public static Vector512<ulong> CompareNotEqual          (Vector512<ulong> left, Vector512<ulong> right);

    public static Vector512<double> Compress(Vector512<double> value, Vector512<double> mask);
    public static Vector512<int>    Compress(Vector512<int>    value, Vector512<int>    mask);
    public static Vector512<long>   Compress(Vector512<long>   value, Vector512<long>   mask);
    public static Vector512<float>  Compress(Vector512<float>  value, Vector512<float>  mask);
    public static Vector512<uint>   Compress(Vector512<uint>   value, Vector512<uint>   mask);
    public static Vector512<ulong>  Compress(Vector512<ulong>  value, Vector512<ulong>  mask);

    public static Vector512<double> Expand(Vector512<double> value, Vector512<double> mask);
    public static Vector512<int>    Expand(Vector512<int>    value, Vector512<int>    mask);
    public static Vector512<long>   Expand(Vector512<long>   value, Vector512<long>   mask);
    public static Vector512<float>  Expand(Vector512<float>  value, Vector512<float>  mask);
    public static Vector512<uint>   Expand(Vector512<uint>   value, Vector512<uint>   mask);
    public static Vector512<ulong>  Expand(Vector512<ulong>  value, Vector512<ulong>  mask);

    public static unsafe Vector512<double> GatherMaskVector512(Vector512<double> source, double* baseAddress, Vector512<int> index, Vector512<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<int>    GatherMaskVector512(Vector512<int>    source, int*    baseAddress, Vector512<int> index, Vector512<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<long>   GatherMaskVector512(Vector512<long>   source, long*   baseAddress, Vector512<int> index, Vector512<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<float>  GatherMaskVector512(Vector512<float>  source, float*  baseAddress, Vector512<int> index, Vector512<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<uint>   GatherMaskVector512(Vector512<uint>   source, uint*   baseAddress, Vector512<int> index, Vector512<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<ulong>  GatherMaskVector512(Vector512<ulong>  source, ulong*  baseAddress, Vector512<int> index, Vector512<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe Vector512<double> GatherMaskVector512(Vector512<double> source, double* baseAddress, Vector512<long> index, Vector512<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<int>    GatherMaskVector512(Vector512<int>    source, int*    baseAddress, Vector512<long> index, Vector512<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<long>   GatherMaskVector512(Vector512<long>   source, long*   baseAddress, Vector512<long> index, Vector512<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<uint>   GatherMaskVector512(Vector512<uint>   source, uint*   baseAddress, Vector512<long> index, Vector512<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<float>  GatherMaskVector512(Vector512<float>  source, float*  baseAddress, Vector512<long> index, Vector512<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<ulong>  GatherMaskVector512(Vector512<ulong>  source, ulong*  baseAddress, Vector512<long> index, Vector512<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe Vector512<double> GatherVector512(double* baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<int>    GatherVector512(int*    baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<long>   GatherVector512(long*   baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<float>  GatherVector512(float*  baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<uint>   GatherVector512(uint*   baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<ulong>  GatherVector512(ulong*  baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe Vector512<double> GatherVector512(double* baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<int>    GatherVector512(int*    baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<long>   GatherVector512(long*   baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<float>  GatherVector512(float*  baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<uint>   GatherVector512(uint*   baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<ulong>  GatherVector512(ulong*  baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe Vector512<double> MaskLoad(double* address, Vector512<double> mask);
    public static unsafe Vector512<int>    MaskLoad(int*    address, Vector512<int>    mask);
    public static unsafe Vector512<long>   MaskLoad(long*   address, Vector512<long>   mask);
    public static unsafe Vector512<float>  MaskLoad(float*  address, Vector512<float>  mask);
    public static unsafe Vector512<uint>   MaskLoad(uint*   address, Vector512<uint>   mask);
    public static unsafe Vector512<ulong>  MaskLoad(ulong*  address, Vector512<ulong>  mask);

    public static unsafe void MaskStore(double* address, Vector512<double> mask, Vector512<double> source);
    public static unsafe void MaskStore(int*    address, Vector512<int>    mask, Vector512<int>    source);
    public static unsafe void MaskStore(long*   address, Vector512<long>   mask, Vector512<long>   source);
    public static unsafe void MaskStore(float*  address, Vector512<float>  mask, Vector512<float>  source);
    public static unsafe void MaskStore(uint*   address, Vector512<uint>   mask, Vector512<uint>   source);
    public static unsafe void MaskStore(ulong*  address, Vector512<ulong>  mask, Vector512<ulong>  source);

    public static int MoveMask(Vector256<short>  value);
    public static int MoveMask(Vector256<ushort> value);
    public static int MoveMask(Vector512<int>    value);
    public static int MoveMask(Vector512<float>  value);
    public static int MoveMask(Vector512<uint>   value);

    public static unsafe void ScatterMaskVector512(Vector512<double> value, double* baseAddress, Vector512<int> index, Vector512<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<int>    value, int*    baseAddress, Vector512<int> index, Vector512<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<long>   value, long*   baseAddress, Vector512<int> index, Vector512<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<float>  value, float*  baseAddress, Vector512<int> index, Vector512<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<uint>   value, uint*   baseAddress, Vector512<int> index, Vector512<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<ulong>  value, ulong*  baseAddress, Vector512<int> index, Vector512<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe void ScatterMaskVector512(Vector512<double> value, double* baseAddress, Vector512<long> index, Vector512<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<int>    value, int*    baseAddress, Vector512<long> index, Vector512<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<long>   value, long*   baseAddress, Vector512<long> index, Vector512<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<float>  value, float*  baseAddress, Vector512<long> index, Vector512<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<uint>   value, uint*   baseAddress, Vector512<long> index, Vector512<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<ulong>  value, ulong*  baseAddress, Vector512<long> index, Vector512<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe void ScatterVector512(Vector512<double> value, double* baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<int>    value, int*    baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<long>   value, long*   baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<float>  value, float*  baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<uint>   value, uint*   baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<ulong>  value, ulong*  baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe void ScatterVector512(Vector512<double> value, double* baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<int>    value, int*    baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<long>   value, long*   baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<float>  value, float*  baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<uint>   value, uint*   baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<ulong>  value, ulong*  baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static bool TestC(Vector512<double> left, Vector512<double> right);
    public static bool TestC(Vector512<int>    left, Vector512<int>    right);
    public static bool TestC(Vector512<long>   left, Vector512<long>   right);
    public static bool TestC(Vector512<float>  left, Vector512<float>  right);
    public static bool TestC(Vector512<uint>   left, Vector512<uint>   right);
    public static bool TestC(Vector512<ulong>  left, Vector512<ulong>  right);

    public static bool TestNotZAndNotC(Vector512<double> left, Vector512<double> right);
    public static bool TestNotZAndNotC(Vector512<int>    left, Vector512<int>    right);
    public static bool TestNotZAndNotC(Vector512<long>   left, Vector512<long>   right);
    public static bool TestNotZAndNotC(Vector512<float>  left, Vector512<float>  right);
    public static bool TestNotZAndNotC(Vector512<uint>   left, Vector512<uint>   right);
    public static bool TestNotZAndNotC(Vector512<ulong>  left, Vector512<ulong>  right);

    public static bool TestZ(Vector512<double> left, Vector512<double> right);
    public static bool TestZ(Vector512<int>    left, Vector512<int>    right);
    public static bool TestZ(Vector512<long>   left, Vector512<long>   right);
    public static bool TestZ(Vector512<float>  left, Vector512<float>  right);
    public static bool TestZ(Vector512<uint>   left, Vector512<uint>   right);
    public static bool TestZ(Vector512<ulong>  left, Vector512<ulong>  right);

    public static partial class VL
    {
        public static Vector128<int> Compare                  (Vector128<int> left, Vector128<int> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector128<int> CompareGreaterThanOrEqual(Vector128<int> left, Vector128<int> right);
        public static Vector128<int> CompareLessThan          (Vector128<int> left, Vector128<int> right);
        public static Vector128<int> CompareLessThanOrEqual   (Vector128<int> left, Vector128<int> right);
        public static Vector128<int> CompareNotEqual          (Vector128<int> left, Vector128<int> right);

        public static Vector256<int> Compare                  (Vector256<int> left, Vector256<int> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector256<int> CompareGreaterThanOrEqual(Vector256<int> left, Vector256<int> right);
        public static Vector256<int> CompareLessThan          (Vector256<int> left, Vector256<int> right);
        public static Vector256<int> CompareLessThanOrEqual   (Vector256<int> left, Vector256<int> right);
        public static Vector256<int> CompareNotEqual          (Vector256<int> left, Vector256<int> right);

        public static Vector128<long> Compare                  (Vector128<long> left, Vector128<long> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector128<long> CompareGreaterThanOrEqual(Vector128<long> left, Vector128<long> right);
        public static Vector128<long> CompareLessThan          (Vector128<long> left, Vector128<long> right);
        public static Vector128<long> CompareLessThanOrEqual   (Vector128<long> left, Vector128<long> right);
        public static Vector128<long> CompareNotEqual          (Vector128<long> left, Vector128<long> right);
        public static Vector256<long> Compare                  (Vector256<long> left, Vector256<long> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector256<long> CompareGreaterThanOrEqual(Vector256<long> left, Vector256<long> right);
        public static Vector256<long> CompareLessThan          (Vector256<long> left, Vector256<long> right);
        public static Vector256<long> CompareLessThanOrEqual   (Vector256<long> left, Vector256<long> right);
        public static Vector256<long> CompareNotEqual          (Vector256<long> left, Vector256<long> right);

        public static Vector128<uint> Compare                  (Vector128<uint> left, Vector128<uint> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector128<uint> CompareGreaterThan       (Vector128<uint> left, Vector128<uint> right);
        public static Vector128<uint> CompareGreaterThanOrEqual(Vector128<uint> left, Vector128<uint> right);
        public static Vector128<uint> CompareLessThan          (Vector128<uint> left, Vector128<uint> right);
        public static Vector128<uint> CompareLessThanOrEqual   (Vector128<uint> left, Vector128<uint> right);
        public static Vector128<uint> CompareNotEqual          (Vector128<uint> left, Vector128<uint> right);
        public static Vector256<uint> Compare                  (Vector256<uint> left, Vector256<uint> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector256<uint> CompareGreaterThan       (Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareGreaterThanOrEqual(Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareLessThan          (Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareLessThanOrEqual   (Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareNotEqual          (Vector256<uint> left, Vector256<uint> right);

        public static Vector128<ulong> Compare                  (Vector128<ulong> left, Vector128<ulong> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector128<ulong> CompareGreaterThan       (Vector128<ulong> left, Vector128<ulong> right);
        public static Vector128<ulong> CompareGreaterThanOrEqual(Vector128<ulong> left, Vector128<ulong> right);
        public static Vector128<ulong> CompareLessThan          (Vector128<ulong> left, Vector128<ulong> right);
        public static Vector128<ulong> CompareLessThanOrEqual   (Vector128<ulong> left, Vector128<ulong> right);
        public static Vector128<ulong> CompareNotEqual          (Vector128<ulong> left, Vector128<ulong> right);
        public static Vector256<ulong> Compare                  (Vector256<ulong> left, Vector256<ulong> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector256<ulong> CompareGreaterThan       (Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareGreaterThanOrEqual(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareLessThan          (Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareLessThanOrEqual   (Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareNotEqual          (Vector256<ulong> left, Vector256<ulong> right);

        public static Vector128<double> Compress(Vector128<double> value, Vector128<double> mask);
        public static Vector128<int>    Compress(Vector128<int>    value, Vector128<int>    mask);
        public static Vector128<long>   Compress(Vector128<long>   value, Vector128<long>   mask);
        public static Vector128<float>  Compress(Vector128<float>  value, Vector128<float>  mask);
        public static Vector128<uint>   Compress(Vector128<uint>   value, Vector128<uint>   mask);
        public static Vector128<ulong>  Compress(Vector128<ulong>  value, Vector128<ulong>  mask);
        public static Vector256<double> Compress(Vector256<double> value, Vector256<double> mask);
        public static Vector256<int>    Compress(Vector256<int>    value, Vector256<int>    mask);
        public static Vector256<long>   Compress(Vector256<long>   value, Vector256<long>   mask);
        public static Vector256<float>  Compress(Vector256<float>  value, Vector256<float>  mask);
        public static Vector256<uint>   Compress(Vector256<uint>   value, Vector256<uint>   mask);
        public static Vector256<ulong>  Compress(Vector256<ulong>  value, Vector256<ulong>  mask);

        public static Vector128<double> Expand(Vector128<double> value, Vector128<double> mask);
        public static Vector128<int>    Expand(Vector128<int>    value, Vector128<int>    mask);
        public static Vector128<long>   Expand(Vector128<long>   value, Vector128<long>   mask);
        public static Vector128<float>  Expand(Vector128<float>  value, Vector128<float>  mask);
        public static Vector128<uint>   Expand(Vector128<uint>   value, Vector128<uint>   mask);
        public static Vector128<ulong>  Expand(Vector128<ulong>  value, Vector128<ulong>  mask);
        public static Vector256<double> Expand(Vector256<double> value, Vector256<double> mask);
        public static Vector256<int>    Expand(Vector256<int>    value, Vector256<int>    mask);
        public static Vector256<long>   Expand(Vector256<long>   value, Vector256<long>   mask);
        public static Vector256<float>  Expand(Vector256<float>  value, Vector256<float>  mask);
        public static Vector256<uint>   Expand(Vector256<uint>   value, Vector256<uint>   mask);
        public static Vector256<ulong>  Expand(Vector256<ulong>  value, Vector256<ulong>  mask);

        public static unsafe void ScatterMaskVector128(Vector128<double> value, double* baseAddress, Vector128<int> index, Vector128<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<int>    value, int*    baseAddress, Vector128<int> index, Vector128<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<long>   value, long*   baseAddress, Vector128<int> index, Vector128<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<float>  value, float*  baseAddress, Vector128<int> index, Vector128<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<uint>   value, uint*   baseAddress, Vector128<int> index, Vector128<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<ulong>  value, ulong*  baseAddress, Vector128<int> index, Vector128<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<double> value, double* baseAddress, Vector256<int> index, Vector256<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<int>    value, int*    baseAddress, Vector256<int> index, Vector256<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<long>   value, long*   baseAddress, Vector256<int> index, Vector256<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<float>  value, float*  baseAddress, Vector256<int> index, Vector256<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<uint>   value, uint*   baseAddress, Vector256<int> index, Vector256<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<ulong>  value, ulong*  baseAddress, Vector256<int> index, Vector256<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

        public static unsafe void ScatterMaskVector128(Vector128<double> value, double* baseAddress, Vector128<long> index, Vector128<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<int>    value, int*    baseAddress, Vector128<long> index, Vector128<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<long>   value, long*   baseAddress, Vector128<long> index, Vector128<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<float>  value, float*  baseAddress, Vector128<long> index, Vector128<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<uint>   value, uint*   baseAddress, Vector128<long> index, Vector128<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<ulong>  value, ulong*  baseAddress, Vector128<long> index, Vector128<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<double> value, double* baseAddress, Vector256<long> index, Vector256<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<int>    value, int*    baseAddress, Vector256<long> index, Vector256<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<long>   value, long*   baseAddress, Vector256<long> index, Vector256<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<float>  value, float*  baseAddress, Vector256<long> index, Vector256<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<uint>   value, uint*   baseAddress, Vector256<long> index, Vector256<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<ulong>  value, ulong*  baseAddress, Vector256<long> index, Vector256<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

        public static unsafe void ScatterVector128(Vector128<double> value, double* baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<int>    value, int*    baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<long>   value, long*   baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<float>  value, float*  baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<uint>   value, uint*   baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<ulong>  value, ulong*  baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<double> value, double* baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<int>    value, int*    baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<long>   value, long*   baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<float>  value, float*  baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<uint>   value, uint*   baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<ulong>  value, ulong*  baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

        public static unsafe void ScatterVector128(Vector128<double> value, double* baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<int>    value, int*    baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<long>   value, long*   baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<float>  value, float*  baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<uint>   value, uint*   baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<ulong>  value, ulong*  baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<double> value, double* baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<int>    value, int*    baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<long>   value, long*   baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<float>  value, float*  baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<uint>   value, uint*   baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<ulong>  value, ulong*  baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    }
}

public static partial class Avx512BW
{
    public static Vector512<byte>   BlendVariable(Vector512<byte>   left, Vector512<byte>   right, Vector512<byte>   mask);
    public static Vector512<short>  BlendVariable(Vector512<short>  left, Vector512<short>  right, Vector512<short>  mask);
    public static Vector512<sbyte>  BlendVariable(Vector512<sbyte>  left, Vector512<sbyte>  right, Vector512<sbyte>  mask);
    public static Vector512<ushort> BlendVariable(Vector512<ushort> left, Vector512<ushort> right, Vector512<ushort> mask);

    public static Vector512<byte> Compare                  (Vector512<byte> left, Vector512<byte> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
    public static Vector512<byte> CompareEqual             (Vector512<byte> left, Vector512<byte> right);
    public static Vector512<byte> CompareGreaterThan       (Vector512<byte> left, Vector512<byte> right);
    public static Vector512<byte> CompareGreaterThanOrEqual(Vector512<byte> left, Vector512<byte> right);
    public static Vector512<byte> CompareLessThan          (Vector512<byte> left, Vector512<byte> right);
    public static Vector512<byte> CompareLessThanOrEqual   (Vector512<byte> left, Vector512<byte> right);
    public static Vector512<byte> CompareNotEqual          (Vector512<byte> left, Vector512<byte> right);

    public static Vector512<short> Compare                  (Vector512<short> left, Vector512<short> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
    public static Vector512<short> CompareEqual             (Vector512<short> left, Vector512<short> right);
    public static Vector512<short> CompareGreaterThan       (Vector512<short> left, Vector512<short> right);
    public static Vector512<short> CompareGreaterThanOrEqual(Vector512<short> left, Vector512<short> right);
    public static Vector512<short> CompareLessThan          (Vector512<short> left, Vector512<short> right);
    public static Vector512<short> CompareLessThanOrEqual   (Vector512<short> left, Vector512<short> right);
    public static Vector512<short> CompareNotEqual          (Vector512<short> left, Vector512<short> right);

    public static Vector512<sbyte> Compare                  (Vector512<sbyte> left, Vector512<sbyte> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
    public static Vector512<sbyte> CompareEqual             (Vector512<sbyte> left, Vector512<sbyte> right);
    public static Vector512<sbyte> CompareGreaterThan       (Vector512<sbyte> left, Vector512<sbyte> right);
    public static Vector512<sbyte> CompareGreaterThanOrEqual(Vector512<sbyte> left, Vector512<sbyte> right);
    public static Vector512<sbyte> CompareLessThan          (Vector512<sbyte> left, Vector512<sbyte> right);
    public static Vector512<sbyte> CompareLessThanOrEqual   (Vector512<sbyte> left, Vector512<sbyte> right);
    public static Vector512<sbyte> CompareNotEqual          (Vector512<sbyte> left, Vector512<sbyte> right);

    public static Vector512<ushort> Compare                  (Vector512<ushort> left, Vector512<ushort> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
    public static Vector512<ushort> CompareEqual             (Vector512<ushort> left, Vector512<ushort> right);
    public static Vector512<ushort> CompareGreaterThan       (Vector512<ushort> left, Vector512<ushort> right);
    public static Vector512<ushort> CompareGreaterThanOrEqual(Vector512<ushort> left, Vector512<ushort> right);
    public static Vector512<ushort> CompareLessThan          (Vector512<ushort> left, Vector512<ushort> right);
    public static Vector512<ushort> CompareLessThanOrEqual   (Vector512<ushort> left, Vector512<ushort> right);
    public static Vector512<ushort> CompareNotEqual          (Vector512<ushort> left, Vector512<ushort> right);

    public static int MoveMask(Vector512<short>  value);
    public static int MoveMask(Vector512<ushort> value);

    public static long MoveMask(Vector512<byte>  value);
    public static long MoveMask(Vector512<sbyte> value);

    public static bool TestC(Vector512<byte>   left, Vector512<byte>   right);
    public static bool TestC(Vector512<short>  left, Vector512<short>  right);
    public static bool TestC(Vector512<sbyte>  left, Vector512<sbyte>  right);
    public static bool TestC(Vector512<ushort> left, Vector512<ushort> right);

    public static bool TestNotZAndNotC(Vector512<byte>   left, Vector512<byte>   right);
    public static bool TestNotZAndNotC(Vector512<short>  left, Vector512<short>  right);
    public static bool TestNotZAndNotC(Vector512<sbyte>  left, Vector512<sbyte>  right);
    public static bool TestNotZAndNotC(Vector512<ushort> left, Vector512<ushort> right);

    public static bool TestZ(Vector512<byte>   left, Vector512<byte>   right);
    public static bool TestZ(Vector512<short>  left, Vector512<short>  right);
    public static bool TestZ(Vector512<sbyte>  left, Vector512<sbyte>  right);
    public static bool TestZ(Vector512<ushort> left, Vector512<ushort> right);

    public static partial class VL
    {
        public static Vector128<byte> Compare                  (Vector128<byte> left, Vector128<byte> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector128<byte> CompareGreaterThan       (Vector128<byte> left, Vector128<byte> right);
        public static Vector128<byte> CompareGreaterThanOrEqual(Vector128<byte> left, Vector128<byte> right);
        public static Vector128<byte> CompareLessThan          (Vector128<byte> left, Vector128<byte> right);
        public static Vector128<byte> CompareLessThanOrEqual   (Vector128<byte> left, Vector128<byte> right);
        public static Vector128<byte> CompareNotEqual          (Vector128<byte> left, Vector128<byte> right);
        public static Vector256<byte> Compare                  (Vector256<byte> left, Vector256<byte> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector256<byte> CompareGreaterThan       (Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareGreaterThanOrEqual(Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareLessThan          (Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareLessThanOrEqual   (Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareNotEqual          (Vector256<byte> left, Vector256<byte> right);

        public static Vector128<short> Compare                  (Vector128<short> left, Vector128<short> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector128<short> CompareGreaterThanOrEqual(Vector128<short> left, Vector128<short> right);
        public static Vector128<short> CompareLessThan          (Vector128<short> left, Vector128<short> right);
        public static Vector128<short> CompareLessThanOrEqual   (Vector128<short> left, Vector128<short> right);
        public static Vector128<short> CompareNotEqual          (Vector128<short> left, Vector128<short> right);
        public static Vector256<short> Compare                  (Vector256<short> left, Vector256<short> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector256<short> CompareGreaterThanOrEqual(Vector256<short> left, Vector256<short> right);
        public static Vector256<short> CompareLessThan          (Vector256<short> left, Vector256<short> right);
        public static Vector256<short> CompareLessThanOrEqual   (Vector256<short> left, Vector256<short> right);
        public static Vector256<short> CompareNotEqual          (Vector256<short> left, Vector256<short> right);

        public static Vector128<sbyte> Compare                  (Vector128<sbyte> left, Vector128<sbyte> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector128<sbyte> CompareGreaterThanOrEqual(Vector128<sbyte> left, Vector128<sbyte> right);
        public static Vector128<sbyte> CompareLessThan          (Vector128<sbyte> left, Vector128<sbyte> right);
        public static Vector128<sbyte> CompareLessThanOrEqual   (Vector128<sbyte> left, Vector128<sbyte> right);
        public static Vector128<sbyte> CompareNotEqual          (Vector128<sbyte> left, Vector128<sbyte> right);
        public static Vector256<sbyte> Compare                  (Vector256<sbyte> left, Vector256<sbyte> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector256<sbyte> CompareGreaterThanOrEqual(Vector256<sbyte> left, Vector256<sbyte> right);
        public static Vector256<sbyte> CompareLessThan          (Vector256<sbyte> left, Vector256<sbyte> right);
        public static Vector256<sbyte> CompareLessThanOrEqual   (Vector256<sbyte> left, Vector256<sbyte> right);
        public static Vector256<sbyte> CompareNotEqual          (Vector256<sbyte> left, Vector256<sbyte> right);

        public static Vector128<ushort> Compare                  (Vector128<ushort> left, Vector128<ushort> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector128<ushort> CompareGreaterThan       (Vector128<ushort> left, Vector128<ushort> right);
        public static Vector128<ushort> CompareGreaterThanOrEqual(Vector128<ushort> left, Vector128<ushort> right);
        public static Vector128<ushort> CompareLessThan          (Vector128<ushort> left, Vector128<ushort> right);
        public static Vector128<ushort> CompareLessThanOrEqual   (Vector128<ushort> left, Vector128<ushort> right);
        public static Vector128<ushort> CompareNotEqual          (Vector128<ushort> left, Vector128<ushort> right);
        public static Vector256<ushort> Compare                  (Vector256<ushort> left, Vector256<ushort> right, [ConstantExpected(Max = IntComparisonMode.True)] IntComparisonMode mode);
        public static Vector256<ushort> CompareGreaterThan       (Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareGreaterThanOrEqual(Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareLessThan          (Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareLessThanOrEqual   (Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareNotEqual          (Vector256<ushort> left, Vector256<ushort> right);
    }
}

public static partial class Avx512DQ
{
    public static Vector512<double> Classify(Vector512<double> value, [ConstantExpected] byte control);
    public static Vector512<float>  Classify(Vector512<float>  value, [ConstantExpected] byte control);

    public static Vector128<double> ClassifyScalar(Vector128<double> value, [ConstantExpected] byte control);
    public static Vector128<float>  ClassifyScalar(Vector128<float>  value, [ConstantExpected] byte control);

    public static int MoveMask(Vector128<short>  value);
    public static int MoveMask(Vector128<ushort> value);
    public static int MoveMask(Vector256<int>    value);
    public static int MoveMask(Vector256<uint>   value);
    public static int MoveMask(Vector512<double> value);
    public static int MoveMask(Vector512<long>   value);
    public static int MoveMask(Vector512<ulong>  value);

    public static partial class VL
    {
        public static Vector128<double> Classify(Vector128<double> value, [ConstantExpected] byte control);
        public static Vector128<float>  Classify(Vector128<float>  value, [ConstantExpected] byte control);
        public static Vector256<double> Classify(Vector256<double> value, [ConstantExpected] byte control);
        public static Vector256<float>  Classify(Vector256<float>  value, [ConstantExpected] byte control);
    }
}

public abstract class Avx512Vbmi2 : Avx512BW
{
    public static new bool IsSupported { get; }

    public static Vector512<byte>   Compress(Vector512<byte>   value, Vector512<byte>   mask);
    public static Vector512<short>  Compress(Vector512<short>  value, Vector512<short>  mask);
    public static Vector512<sbyte>  Compress(Vector512<sbyte>  value, Vector512<sbyte>  mask);
    public static Vector512<ushort> Compress(Vector512<ushort> value, Vector512<ushort> mask);

    public static Vector512<byte>   Expand(Vector512<byte>   value, Vector512<byte>   mask);
    public static Vector512<short>  Expand(Vector512<short>  value, Vector512<short>  mask);
    public static Vector512<sbyte>  Expand(Vector512<sbyte>  value, Vector512<sbyte>  mask);
    public static Vector512<ushort> Expand(Vector512<ushort> value, Vector512<ushort> mask);

    public abstract class VL : Avx512BW.VL
    {
        public static new bool IsSupported { get; }

        public static Vector128<byte>   Compress(Vector128<byte>   value, Vector128<byte>   mask);
        public static Vector128<short>  Compress(Vector128<short>  value, Vector128<short>  mask);
        public static Vector128<sbyte>  Compress(Vector128<sbyte>  value, Vector128<sbyte>  mask);
        public static Vector128<ushort> Compress(Vector128<ushort> value, Vector128<ushort> mask);
        public static Vector256<byte>   Compress(Vector256<byte>   value, Vector256<byte>   mask);
        public static Vector256<short>  Compress(Vector256<short>  value, Vector256<short>  mask);
        public static Vector256<sbyte>  Compress(Vector256<sbyte>  value, Vector256<sbyte>  mask);
        public static Vector256<ushort> Compress(Vector256<ushort> value, Vector256<ushort> mask);

        public static Vector128<byte>   Expand(Vector128<byte>   value, Vector128<byte>   mask);
        public static Vector128<short>  Expand(Vector128<short>  value, Vector128<short>  mask);
        public static Vector128<sbyte>  Expand(Vector128<sbyte>  value, Vector128<sbyte>  mask);
        public static Vector128<ushort> Expand(Vector128<ushort> value, Vector128<ushort> mask);
        public static Vector256<byte>   Expand(Vector256<byte>   value, Vector256<byte>   mask);
        public static Vector256<short>  Expand(Vector256<short>  value, Vector256<short>  mask);
        public static Vector256<sbyte>  Expand(Vector256<sbyte>  value, Vector256<sbyte>  mask);
        public static Vector256<ushort> Expand(Vector256<ushort> value, Vector256<ushort> mask);
    }

    public abstract class X64 : Avx512BW.X64
    {
        public static new bool IsSupported { get; }
    }
}
Author: tannergooding
Assignees: -
Labels: area-System.Runtime.Intrinsics, blocking, api-ready-for-review, arch-avx512

Milestone: 8.0.0

@tannergooding
Member Author

This replaces #74613 which should no longer be marked as api-approved if this is approved instead.

This does not remove the ability to still do #74613 in the future if the outlook changes.

@terrajobst
Member

terrajobst commented Jun 6, 2023

Video

  • Let's remove IntComparisonMode as it doesn't provide any value over the named methods, and removing it is more consistent with what we have done before.
namespace System.Runtime.Intrinsics.X86;

public static partial class Avx512F
{
    public static Vector512<double> BlendVariable(Vector512<double> left, Vector512<double> right, Vector512<double> mask);
    public static Vector512<int>    BlendVariable(Vector512<int>    left, Vector512<int>    right, Vector512<int>    mask);
    public static Vector512<long>   BlendVariable(Vector512<long>   left, Vector512<long>   right, Vector512<long>   mask);
    public static Vector512<float>  BlendVariable(Vector512<float>  left, Vector512<float>  right, Vector512<float>  mask);
    public static Vector512<uint>   BlendVariable(Vector512<uint>   left, Vector512<uint>   right, Vector512<uint>   mask);
    public static Vector512<ulong>  BlendVariable(Vector512<ulong>  left, Vector512<ulong>  right, Vector512<ulong>  mask);

    public static Vector512<double> Compare                     (Vector512<double> left, Vector512<double> right, [ConstantExpected(Max = FloatComparisonMode.UnorderedTrueSignaling)] FloatComparisonMode mode);
    public static Vector512<double> CompareEqual                (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareGreaterThan          (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareGreaterThanOrEqual   (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareLessThan             (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareLessThanOrEqual      (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareNotEqual             (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareNotGreaterThan       (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareNotGreaterThanOrEqual(Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareNotLessThan          (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareNotLessThanOrEqual   (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareOrdered              (Vector512<double> left, Vector512<double> right);
    public static Vector512<double> CompareUnordered            (Vector512<double> left, Vector512<double> right);

    public static Vector512<float> Compare                     (Vector512<float> left, Vector512<float> right, [ConstantExpected(Max = FloatComparisonMode.UnorderedTrueSignaling)] FloatComparisonMode mode);
    public static Vector512<float> CompareEqual                (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareGreaterThan          (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareGreaterThanOrEqual   (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareLessThan             (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareLessThanOrEqual      (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareNotEqual             (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareNotGreaterThan       (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareNotGreaterThanOrEqual(Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareNotLessThan          (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareNotLessThanOrEqual   (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareOrdered              (Vector512<float> left, Vector512<float> right);
    public static Vector512<float> CompareUnordered            (Vector512<float> left, Vector512<float> right);

    public static Vector512<int> CompareEqual             (Vector512<int> left, Vector512<int> right);
    public static Vector512<int> CompareGreaterThan       (Vector512<int> left, Vector512<int> right);
    public static Vector512<int> CompareGreaterThanOrEqual(Vector512<int> left, Vector512<int> right);
    public static Vector512<int> CompareLessThan          (Vector512<int> left, Vector512<int> right);
    public static Vector512<int> CompareLessThanOrEqual   (Vector512<int> left, Vector512<int> right);
    public static Vector512<int> CompareNotEqual          (Vector512<int> left, Vector512<int> right);

    public static Vector512<long> CompareEqual             (Vector512<long> left, Vector512<long> right);
    public static Vector512<long> CompareGreaterThan       (Vector512<long> left, Vector512<long> right);
    public static Vector512<long> CompareGreaterThanOrEqual(Vector512<long> left, Vector512<long> right);
    public static Vector512<long> CompareLessThan          (Vector512<long> left, Vector512<long> right);
    public static Vector512<long> CompareLessThanOrEqual   (Vector512<long> left, Vector512<long> right);
    public static Vector512<long> CompareNotEqual          (Vector512<long> left, Vector512<long> right);

    public static Vector512<uint> CompareEqual             (Vector512<uint> left, Vector512<uint> right);
    public static Vector512<uint> CompareGreaterThan       (Vector512<uint> left, Vector512<uint> right);
    public static Vector512<uint> CompareGreaterThanOrEqual(Vector512<uint> left, Vector512<uint> right);
    public static Vector512<uint> CompareLessThan          (Vector512<uint> left, Vector512<uint> right);
    public static Vector512<uint> CompareLessThanOrEqual   (Vector512<uint> left, Vector512<uint> right);
    public static Vector512<uint> CompareNotEqual          (Vector512<uint> left, Vector512<uint> right);

    public static Vector512<ulong> CompareEqual             (Vector512<ulong> left, Vector512<ulong> right);
    public static Vector512<ulong> CompareGreaterThan       (Vector512<ulong> left, Vector512<ulong> right);
    public static Vector512<ulong> CompareGreaterThanOrEqual(Vector512<ulong> left, Vector512<ulong> right);
    public static Vector512<ulong> CompareLessThan          (Vector512<ulong> left, Vector512<ulong> right);
    public static Vector512<ulong> CompareLessThanOrEqual   (Vector512<ulong> left, Vector512<ulong> right);
    public static Vector512<ulong> CompareNotEqual          (Vector512<ulong> left, Vector512<ulong> right);

    public static Vector512<double> Compress(Vector512<double> value, Vector512<double> mask);
    public static Vector512<int>    Compress(Vector512<int>    value, Vector512<int>    mask);
    public static Vector512<long>   Compress(Vector512<long>   value, Vector512<long>   mask);
    public static Vector512<float>  Compress(Vector512<float>  value, Vector512<float>  mask);
    public static Vector512<uint>   Compress(Vector512<uint>   value, Vector512<uint>   mask);
    public static Vector512<ulong>  Compress(Vector512<ulong>  value, Vector512<ulong>  mask);

    public static Vector512<double> Expand(Vector512<double> value, Vector512<double> mask);
    public static Vector512<int>    Expand(Vector512<int>    value, Vector512<int>    mask);
    public static Vector512<long>   Expand(Vector512<long>   value, Vector512<long>   mask);
    public static Vector512<float>  Expand(Vector512<float>  value, Vector512<float>  mask);
    public static Vector512<uint>   Expand(Vector512<uint>   value, Vector512<uint>   mask);
    public static Vector512<ulong>  Expand(Vector512<ulong>  value, Vector512<ulong>  mask);

    public static unsafe Vector512<double> GatherMaskVector512(Vector512<double> source, double* baseAddress, Vector512<int> index, Vector512<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<int>    GatherMaskVector512(Vector512<int>    source, int*    baseAddress, Vector512<int> index, Vector512<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<long>   GatherMaskVector512(Vector512<long>   source, long*   baseAddress, Vector512<int> index, Vector512<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<float>  GatherMaskVector512(Vector512<float>  source, float*  baseAddress, Vector512<int> index, Vector512<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<uint>   GatherMaskVector512(Vector512<uint>   source, uint*   baseAddress, Vector512<int> index, Vector512<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<ulong>  GatherMaskVector512(Vector512<ulong>  source, ulong*  baseAddress, Vector512<int> index, Vector512<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe Vector512<double> GatherMaskVector512(Vector512<double> source, double* baseAddress, Vector512<long> index, Vector512<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<int>    GatherMaskVector512(Vector512<int>    source, int*    baseAddress, Vector512<long> index, Vector512<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<long>   GatherMaskVector512(Vector512<long>   source, long*   baseAddress, Vector512<long> index, Vector512<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<float>  GatherMaskVector512(Vector512<float>  source, float*  baseAddress, Vector512<long> index, Vector512<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<uint>   GatherMaskVector512(Vector512<uint>   source, uint*   baseAddress, Vector512<long> index, Vector512<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<ulong>  GatherMaskVector512(Vector512<ulong>  source, ulong*  baseAddress, Vector512<long> index, Vector512<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe Vector512<double> GatherVector512(double* baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<int>    GatherVector512(int*    baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<long>   GatherVector512(long*   baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<float>  GatherVector512(float*  baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<uint>   GatherVector512(uint*   baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<ulong>  GatherVector512(ulong*  baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe Vector512<double> GatherVector512(double* baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<int>    GatherVector512(int*    baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<long>   GatherVector512(long*   baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<float>  GatherVector512(float*  baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<uint>   GatherVector512(uint*   baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<ulong>  GatherVector512(ulong*  baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe Vector512<double> MaskLoad(double* address, Vector512<double> mask);
    public static unsafe Vector512<int>    MaskLoad(int*    address, Vector512<int>    mask);
    public static unsafe Vector512<long>   MaskLoad(long*   address, Vector512<long>   mask);
    public static unsafe Vector512<float>  MaskLoad(float*  address, Vector512<float>  mask);
    public static unsafe Vector512<uint>   MaskLoad(uint*   address, Vector512<uint>   mask);
    public static unsafe Vector512<ulong>  MaskLoad(ulong*  address, Vector512<ulong>  mask);

    public static unsafe void MaskStore(double* address, Vector512<double> mask, Vector512<double> source);
    public static unsafe void MaskStore(int*    address, Vector512<int>    mask, Vector512<int>    source);
    public static unsafe void MaskStore(long*   address, Vector512<long>   mask, Vector512<long>   source);
    public static unsafe void MaskStore(float*  address, Vector512<float>  mask, Vector512<float>  source);
    public static unsafe void MaskStore(uint*   address, Vector512<uint>   mask, Vector512<uint>   source);
    public static unsafe void MaskStore(ulong*  address, Vector512<ulong>  mask, Vector512<ulong>  source);

    public static int MoveMask(Vector256<short>  value);
    public static int MoveMask(Vector256<ushort> value);
    public static int MoveMask(Vector512<int>    value);
    public static int MoveMask(Vector512<float>  value);
    public static int MoveMask(Vector512<uint>   value);

    public static unsafe void ScatterMaskVector512(Vector512<double> value, double* baseAddress, Vector512<int> index, Vector512<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<int>    value, int*    baseAddress, Vector512<int> index, Vector512<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<long>   value, long*   baseAddress, Vector512<int> index, Vector512<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<float>  value, float*  baseAddress, Vector512<int> index, Vector512<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<uint>   value, uint*   baseAddress, Vector512<int> index, Vector512<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<ulong>  value, ulong*  baseAddress, Vector512<int> index, Vector512<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe void ScatterMaskVector512(Vector512<double> value, double* baseAddress, Vector512<long> index, Vector512<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<int>    value, int*    baseAddress, Vector512<long> index, Vector512<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<long>   value, long*   baseAddress, Vector512<long> index, Vector512<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<float>  value, float*  baseAddress, Vector512<long> index, Vector512<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<uint>   value, uint*   baseAddress, Vector512<long> index, Vector512<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<ulong>  value, ulong*  baseAddress, Vector512<long> index, Vector512<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe void ScatterVector512(Vector512<double> value, double* baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<int>    value, int*    baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<long>   value, long*   baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<float>  value, float*  baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<uint>   value, uint*   baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<ulong>  value, ulong*  baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe void ScatterVector512(Vector512<double> value, double* baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<int>    value, int*    baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<long>   value, long*   baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<float>  value, float*  baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<uint>   value, uint*   baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<ulong>  value, ulong*  baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static bool TestC(Vector512<double> left, Vector512<double> right);
    public static bool TestC(Vector512<int>    left, Vector512<int>    right);
    public static bool TestC(Vector512<long>   left, Vector512<long>   right);
    public static bool TestC(Vector512<float>  left, Vector512<float>  right);
    public static bool TestC(Vector512<uint>   left, Vector512<uint>   right);
    public static bool TestC(Vector512<ulong>  left, Vector512<ulong>  right);

    public static bool TestNotZAndNotC(Vector512<double> left, Vector512<double> right);
    public static bool TestNotZAndNotC(Vector512<int>    left, Vector512<int>    right);
    public static bool TestNotZAndNotC(Vector512<long>   left, Vector512<long>   right);
    public static bool TestNotZAndNotC(Vector512<float>  left, Vector512<float>  right);
    public static bool TestNotZAndNotC(Vector512<uint>   left, Vector512<uint>   right);
    public static bool TestNotZAndNotC(Vector512<ulong>  left, Vector512<ulong>  right);

    public static bool TestZ(Vector512<double> left, Vector512<double> right);
    public static bool TestZ(Vector512<int>    left, Vector512<int>    right);
    public static bool TestZ(Vector512<long>   left, Vector512<long>   right);
    public static bool TestZ(Vector512<float>  left, Vector512<float>  right);
    public static bool TestZ(Vector512<uint>   left, Vector512<uint>   right);
    public static bool TestZ(Vector512<ulong>  left, Vector512<ulong>  right);

    public static partial class VL
    {
        public static Vector128<int> CompareGreaterThanOrEqual(Vector128<int> left, Vector128<int> right);
        public static Vector128<int> CompareLessThan          (Vector128<int> left, Vector128<int> right);
        public static Vector128<int> CompareLessThanOrEqual   (Vector128<int> left, Vector128<int> right);
        public static Vector128<int> CompareNotEqual          (Vector128<int> left, Vector128<int> right);

        public static Vector256<int> CompareGreaterThanOrEqual(Vector256<int> left, Vector256<int> right);
        public static Vector256<int> CompareLessThan          (Vector256<int> left, Vector256<int> right);
        public static Vector256<int> CompareLessThanOrEqual   (Vector256<int> left, Vector256<int> right);
        public static Vector256<int> CompareNotEqual          (Vector256<int> left, Vector256<int> right);

        public static Vector128<long> CompareGreaterThanOrEqual(Vector128<long> left, Vector128<long> right);
        public static Vector128<long> CompareLessThan          (Vector128<long> left, Vector128<long> right);
        public static Vector128<long> CompareLessThanOrEqual   (Vector128<long> left, Vector128<long> right);
        public static Vector128<long> CompareNotEqual          (Vector128<long> left, Vector128<long> right);
        public static Vector256<long> CompareGreaterThanOrEqual(Vector256<long> left, Vector256<long> right);
        public static Vector256<long> CompareLessThan          (Vector256<long> left, Vector256<long> right);
        public static Vector256<long> CompareLessThanOrEqual   (Vector256<long> left, Vector256<long> right);
        public static Vector256<long> CompareNotEqual          (Vector256<long> left, Vector256<long> right);

        public static Vector128<uint> CompareGreaterThan       (Vector128<uint> left, Vector128<uint> right);
        public static Vector128<uint> CompareGreaterThanOrEqual(Vector128<uint> left, Vector128<uint> right);
        public static Vector128<uint> CompareLessThan          (Vector128<uint> left, Vector128<uint> right);
        public static Vector128<uint> CompareLessThanOrEqual   (Vector128<uint> left, Vector128<uint> right);
        public static Vector128<uint> CompareNotEqual          (Vector128<uint> left, Vector128<uint> right);
        public static Vector256<uint> CompareGreaterThan       (Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareGreaterThanOrEqual(Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareLessThan          (Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareLessThanOrEqual   (Vector256<uint> left, Vector256<uint> right);
        public static Vector256<uint> CompareNotEqual          (Vector256<uint> left, Vector256<uint> right);

        public static Vector128<ulong> CompareGreaterThan       (Vector128<ulong> left, Vector128<ulong> right);
        public static Vector128<ulong> CompareGreaterThanOrEqual(Vector128<ulong> left, Vector128<ulong> right);
        public static Vector128<ulong> CompareLessThan          (Vector128<ulong> left, Vector128<ulong> right);
        public static Vector128<ulong> CompareLessThanOrEqual   (Vector128<ulong> left, Vector128<ulong> right);
        public static Vector128<ulong> CompareNotEqual          (Vector128<ulong> left, Vector128<ulong> right);
        public static Vector256<ulong> CompareGreaterThan       (Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareGreaterThanOrEqual(Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareLessThan          (Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareLessThanOrEqual   (Vector256<ulong> left, Vector256<ulong> right);
        public static Vector256<ulong> CompareNotEqual          (Vector256<ulong> left, Vector256<ulong> right);

        public static Vector128<double> Compress(Vector128<double> value, Vector128<double> mask);
        public static Vector128<int>    Compress(Vector128<int>    value, Vector128<int>    mask);
        public static Vector128<long>   Compress(Vector128<long>   value, Vector128<long>   mask);
        public static Vector128<float>  Compress(Vector128<float>  value, Vector128<float>  mask);
        public static Vector128<uint>   Compress(Vector128<uint>   value, Vector128<uint>   mask);
        public static Vector128<ulong>  Compress(Vector128<ulong>  value, Vector128<ulong>  mask);
        public static Vector256<double> Compress(Vector256<double> value, Vector256<double> mask);
        public static Vector256<int>    Compress(Vector256<int>    value, Vector256<int>    mask);
        public static Vector256<long>   Compress(Vector256<long>   value, Vector256<long>   mask);
        public static Vector256<float>  Compress(Vector256<float>  value, Vector256<float>  mask);
        public static Vector256<uint>   Compress(Vector256<uint>   value, Vector256<uint>   mask);
        public static Vector256<ulong>  Compress(Vector256<ulong>  value, Vector256<ulong>  mask);

        public static Vector128<double> Expand(Vector128<double> value, Vector128<double> mask);
        public static Vector128<int>    Expand(Vector128<int>    value, Vector128<int>    mask);
        public static Vector128<long>   Expand(Vector128<long>   value, Vector128<long>   mask);
        public static Vector128<float>  Expand(Vector128<float>  value, Vector128<float>  mask);
        public static Vector128<uint>   Expand(Vector128<uint>   value, Vector128<uint>   mask);
        public static Vector128<ulong>  Expand(Vector128<ulong>  value, Vector128<ulong>  mask);
        public static Vector256<double> Expand(Vector256<double> value, Vector256<double> mask);
        public static Vector256<int>    Expand(Vector256<int>    value, Vector256<int>    mask);
        public static Vector256<long>   Expand(Vector256<long>   value, Vector256<long>   mask);
        public static Vector256<float>  Expand(Vector256<float>  value, Vector256<float>  mask);
        public static Vector256<uint>   Expand(Vector256<uint>   value, Vector256<uint>   mask);
        public static Vector256<ulong>  Expand(Vector256<ulong>  value, Vector256<ulong>  mask);

        public static unsafe void ScatterMaskVector128(Vector128<double> value, double* baseAddress, Vector128<int> index, Vector128<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<int>    value, int*    baseAddress, Vector128<int> index, Vector128<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<long>   value, long*   baseAddress, Vector128<int> index, Vector128<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<float>  value, float*  baseAddress, Vector128<int> index, Vector128<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<uint>   value, uint*   baseAddress, Vector128<int> index, Vector128<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<ulong>  value, ulong*  baseAddress, Vector128<int> index, Vector128<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<double> value, double* baseAddress, Vector256<int> index, Vector256<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<int>    value, int*    baseAddress, Vector256<int> index, Vector256<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<long>   value, long*   baseAddress, Vector256<int> index, Vector256<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<float>  value, float*  baseAddress, Vector256<int> index, Vector256<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<uint>   value, uint*   baseAddress, Vector256<int> index, Vector256<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<ulong>  value, ulong*  baseAddress, Vector256<int> index, Vector256<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

        public static unsafe void ScatterMaskVector128(Vector128<double> value, double* baseAddress, Vector128<long> index, Vector128<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<int>    value, int*    baseAddress, Vector128<long> index, Vector128<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<long>   value, long*   baseAddress, Vector128<long> index, Vector128<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<float>  value, float*  baseAddress, Vector128<long> index, Vector128<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<uint>   value, uint*   baseAddress, Vector128<long> index, Vector128<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<ulong>  value, ulong*  baseAddress, Vector128<long> index, Vector128<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<double> value, double* baseAddress, Vector256<long> index, Vector256<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<int>    value, int*    baseAddress, Vector256<long> index, Vector256<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<long>   value, long*   baseAddress, Vector256<long> index, Vector256<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<float>  value, float*  baseAddress, Vector256<long> index, Vector256<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<uint>   value, uint*   baseAddress, Vector256<long> index, Vector256<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<ulong>  value, ulong*  baseAddress, Vector256<long> index, Vector256<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

        public static unsafe void ScatterVector128(Vector128<double> value, double* baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<int>    value, int*    baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<long>   value, long*   baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<float>  value, float*  baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<uint>   value, uint*   baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<ulong>  value, ulong*  baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<double> value, double* baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<int>    value, int*    baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<long>   value, long*   baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<float>  value, float*  baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<uint>   value, uint*   baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<ulong>  value, ulong*  baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

        public static unsafe void ScatterVector128(Vector128<double> value, double* baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<int>    value, int*    baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<long>   value, long*   baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<float>  value, float*  baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<uint>   value, uint*   baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<ulong>  value, ulong*  baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<double> value, double* baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<int>    value, int*    baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<long>   value, long*   baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<float>  value, float*  baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<uint>   value, uint*   baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<ulong>  value, ulong*  baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    }
}

public static partial class Avx512BW
{
    public static Vector512<byte>   BlendVariable(Vector512<byte>   left, Vector512<byte>   right, Vector512<byte>   mask);
    public static Vector512<short>  BlendVariable(Vector512<short>  left, Vector512<short>  right, Vector512<short>  mask);
    public static Vector512<sbyte>  BlendVariable(Vector512<sbyte>  left, Vector512<sbyte>  right, Vector512<sbyte>  mask);
    public static Vector512<ushort> BlendVariable(Vector512<ushort> left, Vector512<ushort> right, Vector512<ushort> mask);

    public static Vector512<byte> CompareEqual             (Vector512<byte> left, Vector512<byte> right);
    public static Vector512<byte> CompareGreaterThan       (Vector512<byte> left, Vector512<byte> right);
    public static Vector512<byte> CompareGreaterThanOrEqual(Vector512<byte> left, Vector512<byte> right);
    public static Vector512<byte> CompareLessThan          (Vector512<byte> left, Vector512<byte> right);
    public static Vector512<byte> CompareLessThanOrEqual   (Vector512<byte> left, Vector512<byte> right);
    public static Vector512<byte> CompareNotEqual          (Vector512<byte> left, Vector512<byte> right);

    public static Vector512<short> CompareEqual             (Vector512<short> left, Vector512<short> right);
    public static Vector512<short> CompareGreaterThan       (Vector512<short> left, Vector512<short> right);
    public static Vector512<short> CompareGreaterThanOrEqual(Vector512<short> left, Vector512<short> right);
    public static Vector512<short> CompareLessThan          (Vector512<short> left, Vector512<short> right);
    public static Vector512<short> CompareLessThanOrEqual   (Vector512<short> left, Vector512<short> right);
    public static Vector512<short> CompareNotEqual          (Vector512<short> left, Vector512<short> right);

    public static Vector512<sbyte> CompareEqual             (Vector512<sbyte> left, Vector512<sbyte> right);
    public static Vector512<sbyte> CompareGreaterThan       (Vector512<sbyte> left, Vector512<sbyte> right);
    public static Vector512<sbyte> CompareGreaterThanOrEqual(Vector512<sbyte> left, Vector512<sbyte> right);
    public static Vector512<sbyte> CompareLessThan          (Vector512<sbyte> left, Vector512<sbyte> right);
    public static Vector512<sbyte> CompareLessThanOrEqual   (Vector512<sbyte> left, Vector512<sbyte> right);
    public static Vector512<sbyte> CompareNotEqual          (Vector512<sbyte> left, Vector512<sbyte> right);

    public static Vector512<ushort> CompareEqual             (Vector512<ushort> left, Vector512<ushort> right);
    public static Vector512<ushort> CompareGreaterThan       (Vector512<ushort> left, Vector512<ushort> right);
    public static Vector512<ushort> CompareGreaterThanOrEqual(Vector512<ushort> left, Vector512<ushort> right);
    public static Vector512<ushort> CompareLessThan          (Vector512<ushort> left, Vector512<ushort> right);
    public static Vector512<ushort> CompareLessThanOrEqual   (Vector512<ushort> left, Vector512<ushort> right);
    public static Vector512<ushort> CompareNotEqual          (Vector512<ushort> left, Vector512<ushort> right);

    public static int MoveMask(Vector512<short>  value);
    public static int MoveMask(Vector512<ushort> value);

    public static long MoveMask(Vector512<byte>  value);
    public static long MoveMask(Vector512<sbyte> value);

    public static bool TestC(Vector512<byte>   left, Vector512<byte>   right);
    public static bool TestC(Vector512<short>  left, Vector512<short>  right);
    public static bool TestC(Vector512<sbyte>  left, Vector512<sbyte>  right);
    public static bool TestC(Vector512<ushort> left, Vector512<ushort> right);

    public static bool TestNotZAndNotC(Vector512<byte>   left, Vector512<byte>   right);
    public static bool TestNotZAndNotC(Vector512<short>  left, Vector512<short>  right);
    public static bool TestNotZAndNotC(Vector512<sbyte>  left, Vector512<sbyte>  right);
    public static bool TestNotZAndNotC(Vector512<ushort> left, Vector512<ushort> right);

    public static bool TestZ(Vector512<byte>   left, Vector512<byte>   right);
    public static bool TestZ(Vector512<short>  left, Vector512<short>  right);
    public static bool TestZ(Vector512<sbyte>  left, Vector512<sbyte>  right);
    public static bool TestZ(Vector512<ushort> left, Vector512<ushort> right);

    public static partial class VL
    {
        public static Vector128<byte> CompareGreaterThan       (Vector128<byte> left, Vector128<byte> right);
        public static Vector128<byte> CompareGreaterThanOrEqual(Vector128<byte> left, Vector128<byte> right);
        public static Vector128<byte> CompareLessThan          (Vector128<byte> left, Vector128<byte> right);
        public static Vector128<byte> CompareLessThanOrEqual   (Vector128<byte> left, Vector128<byte> right);
        public static Vector128<byte> CompareNotEqual          (Vector128<byte> left, Vector128<byte> right);
        public static Vector256<byte> CompareGreaterThan       (Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareGreaterThanOrEqual(Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareLessThan          (Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareLessThanOrEqual   (Vector256<byte> left, Vector256<byte> right);
        public static Vector256<byte> CompareNotEqual          (Vector256<byte> left, Vector256<byte> right);

        public static Vector128<short> CompareGreaterThanOrEqual(Vector128<short> left, Vector128<short> right);
        public static Vector128<short> CompareLessThan          (Vector128<short> left, Vector128<short> right);
        public static Vector128<short> CompareLessThanOrEqual   (Vector128<short> left, Vector128<short> right);
        public static Vector128<short> CompareNotEqual          (Vector128<short> left, Vector128<short> right);
        public static Vector256<short> CompareGreaterThanOrEqual(Vector256<short> left, Vector256<short> right);
        public static Vector256<short> CompareLessThan          (Vector256<short> left, Vector256<short> right);
        public static Vector256<short> CompareLessThanOrEqual   (Vector256<short> left, Vector256<short> right);
        public static Vector256<short> CompareNotEqual          (Vector256<short> left, Vector256<short> right);

        public static Vector128<sbyte> CompareGreaterThanOrEqual(Vector128<sbyte> left, Vector128<sbyte> right);
        public static Vector128<sbyte> CompareLessThan          (Vector128<sbyte> left, Vector128<sbyte> right);
        public static Vector128<sbyte> CompareLessThanOrEqual   (Vector128<sbyte> left, Vector128<sbyte> right);
        public static Vector128<sbyte> CompareNotEqual          (Vector128<sbyte> left, Vector128<sbyte> right);
        public static Vector256<sbyte> CompareGreaterThanOrEqual(Vector256<sbyte> left, Vector256<sbyte> right);
        public static Vector256<sbyte> CompareLessThan          (Vector256<sbyte> left, Vector256<sbyte> right);
        public static Vector256<sbyte> CompareLessThanOrEqual   (Vector256<sbyte> left, Vector256<sbyte> right);
        public static Vector256<sbyte> CompareNotEqual          (Vector256<sbyte> left, Vector256<sbyte> right);

        public static Vector128<ushort> CompareGreaterThan       (Vector128<ushort> left, Vector128<ushort> right);
        public static Vector128<ushort> CompareGreaterThanOrEqual(Vector128<ushort> left, Vector128<ushort> right);
        public static Vector128<ushort> CompareLessThan          (Vector128<ushort> left, Vector128<ushort> right);
        public static Vector128<ushort> CompareLessThanOrEqual   (Vector128<ushort> left, Vector128<ushort> right);
        public static Vector128<ushort> CompareNotEqual          (Vector128<ushort> left, Vector128<ushort> right);
        public static Vector256<ushort> CompareGreaterThan       (Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareGreaterThanOrEqual(Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareLessThan          (Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareLessThanOrEqual   (Vector256<ushort> left, Vector256<ushort> right);
        public static Vector256<ushort> CompareNotEqual          (Vector256<ushort> left, Vector256<ushort> right);
    }
}

public static partial class Avx512DQ
{
    public static Vector512<double> Classify(Vector512<double> value, [ConstantExpected] byte control);
    public static Vector512<float>  Classify(Vector512<float>  value, [ConstantExpected] byte control);

    public static Vector128<double> ClassifyScalar(Vector128<double> value, [ConstantExpected] byte control);
    public static Vector128<float>  ClassifyScalar(Vector128<float>  value, [ConstantExpected] byte control);

    public static int MoveMask(Vector128<short>  value);
    public static int MoveMask(Vector128<ushort> value);
    public static int MoveMask(Vector256<int>    value);
    public static int MoveMask(Vector256<uint>   value);
    public static int MoveMask(Vector512<double> value);
    public static int MoveMask(Vector512<long>   value);
    public static int MoveMask(Vector512<ulong>  value);

    public static partial class VL
    {
        public static Vector128<double> Classify(Vector128<double> value, [ConstantExpected] byte control);
        public static Vector128<float>  Classify(Vector128<float>  value, [ConstantExpected] byte control);
        public static Vector256<double> Classify(Vector256<double> value, [ConstantExpected] byte control);
        public static Vector256<float>  Classify(Vector256<float>  value, [ConstantExpected] byte control);
    }
}

public abstract class Avx512Vbmi2 : Avx512BW
{
    public static new bool IsSupported { get; }

    public static Vector512<byte>   Compress(Vector512<byte>   value, Vector512<byte>   mask);
    public static Vector512<short>  Compress(Vector512<short>  value, Vector512<short>  mask);
    public static Vector512<sbyte>  Compress(Vector512<sbyte>  value, Vector512<sbyte>  mask);
    public static Vector512<ushort> Compress(Vector512<ushort> value, Vector512<ushort> mask);

    public static Vector512<byte>   Expand(Vector512<byte>   value, Vector512<byte>   mask);
    public static Vector512<short>  Expand(Vector512<short>  value, Vector512<short>  mask);
    public static Vector512<sbyte>  Expand(Vector512<sbyte>  value, Vector512<sbyte>  mask);
    public static Vector512<ushort> Expand(Vector512<ushort> value, Vector512<ushort> mask);

    public abstract class VL : Avx512BW.VL
    {
        public static new bool IsSupported { get; }

        public static Vector128<byte>   Compress(Vector128<byte>   value, Vector128<byte>   mask);
        public static Vector128<short>  Compress(Vector128<short>  value, Vector128<short>  mask);
        public static Vector128<sbyte>  Compress(Vector128<sbyte>  value, Vector128<sbyte>  mask);
        public static Vector128<ushort> Compress(Vector128<ushort> value, Vector128<ushort> mask);
        public static Vector256<byte>   Compress(Vector256<byte>   value, Vector256<byte>   mask);
        public static Vector256<short>  Compress(Vector256<short>  value, Vector256<short>  mask);
        public static Vector256<sbyte>  Compress(Vector256<sbyte>  value, Vector256<sbyte>  mask);
        public static Vector256<ushort> Compress(Vector256<ushort> value, Vector256<ushort> mask);

        public static Vector128<byte>   Expand(Vector128<byte>   value, Vector128<byte>   mask);
        public static Vector128<short>  Expand(Vector128<short>  value, Vector128<short>  mask);
        public static Vector128<sbyte>  Expand(Vector128<sbyte>  value, Vector128<sbyte>  mask);
        public static Vector128<ushort> Expand(Vector128<ushort> value, Vector128<ushort> mask);
        public static Vector256<byte>   Expand(Vector256<byte>   value, Vector256<byte>   mask);
        public static Vector256<short>  Expand(Vector256<short>  value, Vector256<short>  mask);
        public static Vector256<sbyte>  Expand(Vector256<sbyte>  value, Vector256<sbyte>  mask);
        public static Vector256<ushort> Expand(Vector256<ushort> value, Vector256<ushort> mask);
    }

    public abstract class X64 : Avx512BW.X64
    {
        public static new bool IsSupported { get; }
    }
}

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jun 6, 2023
@MineCake147E

This comment was marked as resolved.

@tannergooding
Member Author

Not everything will land in .NET 8 due to time constraints. Some of the mask related support and handling is going to land in .NET 9 instead.

Just to be clear, this would've been true regardless of whether we kept with VectorMask<T> or we went with the new approach. AVX-512 is a very large set of functionality and squeezing it all into 1 release just wasn't possible.

First, in this design, how could I construct a mask register for variable integers representing mask bits?

AVX-512 does not have any "built-in" functionality for creating a mask from a constant. So for most scenarios, you want to get a mask from an instruction that produces a mask, such as a comparison instruction.

If you really have to create it from a constant, you're at best getting either of the following bits of codegen:

; Load literal into general purpose register, then move into mask register
mov rax, imm64
kmovq k1, rax

-or-

; Load constant into simd register, then convert to a mask register
vmovups zmm0, [addr]
vpmovb2m k1, zmm0

APIs that expect specific parameters to be a mask already emit the relevant conversion and so using Vector512.Create(cns, ..., cns) is the way to go. It matches how you'd write the same algorithm for Vector128/Vector256. In the future we may have additional recognition for specific patterns and try to optimize them further if possible.
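
As a rough illustration of the `Vector512.Create(cns, ..., cns)` approach, the sketch below builds a constant mask as an ordinary vector and feeds it to the cross-platform `Vector512.ConditionalSelect`. The class and method names (`ConstantMaskSketch`, `KeepEvenLanes`) are hypothetical, and whether the JIT folds the constant into a mask register is not something this example demonstrates.

```csharp
using System.Runtime.Intrinsics;

public static class ConstantMaskSketch
{
    // Hypothetical helper: keeps the even-indexed elements of `value`
    // and zeroes the odd-indexed ones.
    public static Vector512<int> KeepEvenLanes(Vector512<int> value)
    {
        // The "mask" is just a vector whose selected elements are all-bits-set (-1).
        Vector512<int> mask = Vector512.Create(
            -1, 0, -1, 0, -1, 0, -1, 0,
            -1, 0, -1, 0, -1, 0, -1, 0);

        // Takes bits from `value` where the mask is set, and from Zero elsewhere.
        return Vector512.ConditionalSelect(mask, value, Vector512<int>.Zero);
    }
}
```

The same code shape works for `Vector128`/`Vector256` by swapping the type and the number of constants.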

Second, would ... be optimized like the code below?

For .NET 8 it will generate something like:

vpcmpeqb k1, zmm0, zmm1
vpmovm2b zmm0, k1
vpandd zmm0, zmm0, zmm1

Note that this assumes the default Windows x64 calling convention, where the first arg is passed in rcx, the second in rdx, and the third in r8. The hidden first argument is the return buffer, since this is a large struct return.

For .NET 9, we'll likely get it to:

vpcmpeqb k1, zmm0, zmm1
vpandd zmm0 {k1}{z}, zmm1, zmm1

This just removes the one vpmovm2b instruction that converts from "mask register" back to "vector register". It would save 6 bytes of codegen and 3 cycles.
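
The snippet being asked about is elided above, so the following is only a stand-in with the same general shape as the codegen shown: a byte-wise equality compare whose result is ANDed with one of its inputs. The names `CompareThenAndSketch` and `EqualAndKeep` are hypothetical, and the exact instructions emitted will depend on the runtime version and hardware.

```csharp
using System.Runtime.Intrinsics;

public static class CompareThenAndSketch
{
    // Hypothetical example: for each byte, yields y[i] where x[i] == y[i], else 0.
    public static Vector512<byte> EqualAndKeep(Vector512<byte> x, Vector512<byte> y)
    {
        // The comparison result is exposed as a vector (all-bits-set per matching
        // element), even if the JIT keeps it in a mask register internally.
        Vector512<byte> mask = Vector512.Equals(x, y);

        // ANDing with the mask keeps matching bytes of y and zeroes the rest.
        return mask & y;
    }
}
```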

Third, could I pass a mask register to a method within either a general-purpose register, or more preferably, a mask register?

No platform has argument passing for kmask registers; they are all considered volatile (caller-saved), so a caller that needs one preserved across a call must save it itself. The only way to pass them is in memory or by converting to an int/vector. This means you're ultimately paying a conversion price of 1-3 cycles on each side, depending on the base type used (byte/short are 3 cycles, int/long are 1 cycle).

With this new design, everything is just exposed to the user as a vector, so passing it as a vector, much as you would have done for Vector128/Vector256, is the way to go. We may look at providing something that is functionally the inverse of ExtractMostSignificantBits, creating a vector from a bitmask. Such an API would end up being 2-3 instructions on older hardware, however, and is a much more rarely needed scenario.
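
Until such an API exists, the round trip can be approximated in user code. The sketch below is one possible way to do it, not a proposed API: `ToBits` uses the existing `Vector512.ExtractMostSignificantBits`, while `FromBits` is a hypothetical helper that broadcasts the bitmask, isolates one bit per element, and compares.

```csharp
using System.Runtime.Intrinsics;

public static class BitmaskRoundTripSketch
{
    // Vector mask -> bitmask: one bit per element, taken from each element's top bit.
    public static ushort ToBits(Vector512<int> mask)
        => (ushort)Vector512.ExtractMostSignificantBits(mask);

    // Bitmask -> vector mask (hypothetical helper, not an existing API).
    public static Vector512<int> FromBits(ushort bits)
    {
        // Element i holds the single bit (1 << i).
        Vector512<int> laneBit = Vector512.Create(
            1 << 0, 1 << 1, 1 << 2,  1 << 3,  1 << 4,  1 << 5,  1 << 6,  1 << 7,
            1 << 8, 1 << 9, 1 << 10, 1 << 11, 1 << 12, 1 << 13, 1 << 14, 1 << 15);

        // Broadcast the bitmask to every element, keep only that element's bit,
        // then compare so set bits become all-bits-set elements.
        Vector512<int> broadcast = Vector512.Create((int)bits);
        return Vector512.Equals(broadcast & laneBit, laneBit);
    }
}
```

For example, `FromBits(0x0005)` would produce a vector whose elements 0 and 2 are all-bits-set and the rest zero, and `ToBits` on that vector would give back `0x0005`.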

@MineCake147E

This comment was marked as resolved.

@tannergooding tannergooding modified the milestones: 8.0.0, Future Jul 24, 2023
@tannergooding tannergooding removed the blocking Marks issues that we want to fast track in order to unblock other important work label Jul 24, 2023
@tannergooding
Member Author

For .NET 8 we did land BlendVariable and the Compare APIs which were the most critical. We also have the general functionality for MoveMask available via the xplat ExtractMostSignificantBits APIs, and the general pattern recognition that will emit vptestm. We didn't, however, land Compress/Expand, Gather/Scatter, MaskLoad/MaskStore, the platform specific MoveMask or Test APIs:

namespace System.Runtime.Intrinsics.X86;

public static partial class Avx512F
{

    public static Vector512<double> Compress(Vector512<double> value, Vector512<double> mask);
    public static Vector512<int>    Compress(Vector512<int>    value, Vector512<int>    mask);
    public static Vector512<long>   Compress(Vector512<long>   value, Vector512<long>   mask);
    public static Vector512<float>  Compress(Vector512<float>  value, Vector512<float>  mask);
    public static Vector512<uint>   Compress(Vector512<uint>   value, Vector512<uint>   mask);
    public static Vector512<ulong>  Compress(Vector512<ulong>  value, Vector512<ulong>  mask);

    public static Vector512<double> Expand(Vector512<double> value, Vector512<double> mask);
    public static Vector512<int>    Expand(Vector512<int>    value, Vector512<int>    mask);
    public static Vector512<long>   Expand(Vector512<long>   value, Vector512<long>   mask);
    public static Vector512<float>  Expand(Vector512<float>  value, Vector512<float>  mask);
    public static Vector512<uint>   Expand(Vector512<uint>   value, Vector512<uint>   mask);
    public static Vector512<ulong>  Expand(Vector512<ulong>  value, Vector512<ulong>  mask);

    public static unsafe Vector512<double> GatherMaskVector512(Vector512<double> source, double* baseAddress, Vector512<int> index, Vector512<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<int>    GatherMaskVector512(Vector512<int>    source, int*    baseAddress, Vector512<int> index, Vector512<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<long>   GatherMaskVector512(Vector512<long>   source, long*   baseAddress, Vector512<int> index, Vector512<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<float>  GatherMaskVector512(Vector512<float>  source, float*  baseAddress, Vector512<int> index, Vector512<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<uint>   GatherMaskVector512(Vector512<uint>   source, uint*   baseAddress, Vector512<int> index, Vector512<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<ulong>  GatherMaskVector512(Vector512<ulong>  source, ulong*  baseAddress, Vector512<int> index, Vector512<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe Vector512<double> GatherMaskVector512(Vector512<double> source, double* baseAddress, Vector512<long> index, Vector512<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<int>    GatherMaskVector512(Vector512<int>    source, int*    baseAddress, Vector512<long> index, Vector512<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<long>   GatherMaskVector512(Vector512<long>   source, long*   baseAddress, Vector512<long> index, Vector512<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<uint>   GatherMaskVector512(Vector512<uint>   source, uint*   baseAddress, Vector512<long> index, Vector512<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<float>  GatherMaskVector512(Vector512<float>  source, float*  baseAddress, Vector512<long> index, Vector512<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<ulong>  GatherMaskVector512(Vector512<ulong>  source, ulong*  baseAddress, Vector512<long> index, Vector512<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe Vector512<double> GatherVector512(double* baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<int>    GatherVector512(int*    baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<long>   GatherVector512(long*   baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<float>  GatherVector512(float*  baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<uint>   GatherVector512(uint*   baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<ulong>  GatherVector512(ulong*  baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe Vector512<double> GatherVector512(double* baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<int>    GatherVector512(int*    baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<long>   GatherVector512(long*   baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<float>  GatherVector512(float*  baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<uint>   GatherVector512(uint*   baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe Vector512<ulong>  GatherVector512(ulong*  baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe Vector512<double> MaskLoad(double* address, Vector512<double> mask);
    public static unsafe Vector512<int>    MaskLoad(int*    address, Vector512<int>    mask);
    public static unsafe Vector512<long>   MaskLoad(long*   address, Vector512<long>   mask);
    public static unsafe Vector512<float>  MaskLoad(float*  address, Vector512<float>  mask);
    public static unsafe Vector512<uint>   MaskLoad(uint*   address, Vector512<uint>   mask);
    public static unsafe Vector512<ulong>  MaskLoad(ulong*  address, Vector512<ulong>  mask);

    public static unsafe void MaskStore(double* address, Vector512<double> mask, Vector512<double> source);
    public static unsafe void MaskStore(int*    address, Vector512<int>    mask, Vector512<int>    source);
    public static unsafe void MaskStore(long*   address, Vector512<long>   mask, Vector512<long>   source);
    public static unsafe void MaskStore(float*  address, Vector512<float>  mask, Vector512<float>  source);
    public static unsafe void MaskStore(uint*   address, Vector512<uint>   mask, Vector512<uint>   source);
    public static unsafe void MaskStore(ulong*  address, Vector512<ulong>  mask, Vector512<ulong>  source);

    public static int MoveMask(Vector256<short>  value);
    public static int MoveMask(Vector256<ushort> value);
    public static int MoveMask(Vector512<int>    value);
    public static int MoveMask(Vector512<float>  value);
    public static int MoveMask(Vector512<uint>   value);

    public static unsafe void ScatterMaskVector512(Vector512<double> value, double* baseAddress, Vector512<int> index, Vector512<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<int>    value, int*    baseAddress, Vector512<int> index, Vector512<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<long>   value, long*   baseAddress, Vector512<int> index, Vector512<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<float>  value, float*  baseAddress, Vector512<int> index, Vector512<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<uint>   value, uint*   baseAddress, Vector512<int> index, Vector512<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<ulong>  value, ulong*  baseAddress, Vector512<int> index, Vector512<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe void ScatterMaskVector512(Vector512<double> value, double* baseAddress, Vector512<long> index, Vector512<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<int>    value, int*    baseAddress, Vector512<long> index, Vector512<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<long>   value, long*   baseAddress, Vector512<long> index, Vector512<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<float>  value, float*  baseAddress, Vector512<long> index, Vector512<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<uint>   value, uint*   baseAddress, Vector512<long> index, Vector512<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterMaskVector512(Vector512<ulong>  value, ulong*  baseAddress, Vector512<long> index, Vector512<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe void ScatterVector512(Vector512<double> value, double* baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<int>    value, int*    baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<long>   value, long*   baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<float>  value, float*  baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<uint>   value, uint*   baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<ulong>  value, ulong*  baseAddress, Vector512<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static unsafe void ScatterVector512(Vector512<double> value, double* baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<int>    value, int*    baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<long>   value, long*   baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<float>  value, float*  baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<uint>   value, uint*   baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    public static unsafe void ScatterVector512(Vector512<ulong>  value, ulong*  baseAddress, Vector512<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

    public static bool TestC(Vector512<double> left, Vector512<double> right);
    public static bool TestC(Vector512<int>    left, Vector512<int>    right);
    public static bool TestC(Vector512<long>   left, Vector512<long>   right);
    public static bool TestC(Vector512<float>  left, Vector512<float>  right);
    public static bool TestC(Vector512<uint>   left, Vector512<uint>   right);
    public static bool TestC(Vector512<ulong>  left, Vector512<ulong>  right);

    public static bool TestNotZAndNotC(Vector512<double> left, Vector512<double> right);
    public static bool TestNotZAndNotC(Vector512<int>    left, Vector512<int>    right);
    public static bool TestNotZAndNotC(Vector512<long>   left, Vector512<long>   right);
    public static bool TestNotZAndNotC(Vector512<float>  left, Vector512<float>  right);
    public static bool TestNotZAndNotC(Vector512<uint>   left, Vector512<uint>   right);
    public static bool TestNotZAndNotC(Vector512<ulong>  left, Vector512<ulong>  right);

    public static bool TestZ(Vector512<double> left, Vector512<double> right);
    public static bool TestZ(Vector512<int>    left, Vector512<int>    right);
    public static bool TestZ(Vector512<long>   left, Vector512<long>   right);
    public static bool TestZ(Vector512<float>  left, Vector512<float>  right);
    public static bool TestZ(Vector512<uint>   left, Vector512<uint>   right);
    public static bool TestZ(Vector512<ulong>  left, Vector512<ulong>  right);

    public static partial class VL
    {
        public static Vector128<double> Compress(Vector128<double> value, Vector128<double> mask);
        public static Vector128<int>    Compress(Vector128<int>    value, Vector128<int>    mask);
        public static Vector128<long>   Compress(Vector128<long>   value, Vector128<long>   mask);
        public static Vector128<float>  Compress(Vector128<float>  value, Vector128<float>  mask);
        public static Vector128<uint>   Compress(Vector128<uint>   value, Vector128<uint>   mask);
        public static Vector128<ulong>  Compress(Vector128<ulong>  value, Vector128<ulong>  mask);
        public static Vector256<double> Compress(Vector256<double> value, Vector256<double> mask);
        public static Vector256<int>    Compress(Vector256<int>    value, Vector256<int>    mask);
        public static Vector256<long>   Compress(Vector256<long>   value, Vector256<long>   mask);
        public static Vector256<float>  Compress(Vector256<float>  value, Vector256<float>  mask);
        public static Vector256<uint>   Compress(Vector256<uint>   value, Vector256<uint>   mask);
        public static Vector256<ulong>  Compress(Vector256<ulong>  value, Vector256<ulong>  mask);

        public static Vector128<double> Expand(Vector128<double> value, Vector128<double> mask);
        public static Vector128<int>    Expand(Vector128<int>    value, Vector128<int>    mask);
        public static Vector128<long>   Expand(Vector128<long>   value, Vector128<long>   mask);
        public static Vector128<float>  Expand(Vector128<float>  value, Vector128<float>  mask);
        public static Vector128<uint>   Expand(Vector128<uint>   value, Vector128<uint>   mask);
        public static Vector128<ulong>  Expand(Vector128<ulong>  value, Vector128<ulong>  mask);
        public static Vector256<double> Expand(Vector256<double> value, Vector256<double> mask);
        public static Vector256<int>    Expand(Vector256<int>    value, Vector256<int>    mask);
        public static Vector256<long>   Expand(Vector256<long>   value, Vector256<long>   mask);
        public static Vector256<float>  Expand(Vector256<float>  value, Vector256<float>  mask);
        public static Vector256<uint>   Expand(Vector256<uint>   value, Vector256<uint>   mask);
        public static Vector256<ulong>  Expand(Vector256<ulong>  value, Vector256<ulong>  mask);

        public static unsafe void ScatterMaskVector128(Vector128<double> value, double* baseAddress, Vector128<int> index, Vector128<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<int>    value, int*    baseAddress, Vector128<int> index, Vector128<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<long>   value, long*   baseAddress, Vector128<int> index, Vector128<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<float>  value, float*  baseAddress, Vector128<int> index, Vector128<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<uint>   value, uint*   baseAddress, Vector128<int> index, Vector128<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<ulong>  value, ulong*  baseAddress, Vector128<int> index, Vector128<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<double> value, double* baseAddress, Vector256<int> index, Vector256<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<int>    value, int*    baseAddress, Vector256<int> index, Vector256<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<long>   value, long*   baseAddress, Vector256<int> index, Vector256<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<float>  value, float*  baseAddress, Vector256<int> index, Vector256<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<uint>   value, uint*   baseAddress, Vector256<int> index, Vector256<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<ulong>  value, ulong*  baseAddress, Vector256<int> index, Vector256<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

        public static unsafe void ScatterMaskVector128(Vector128<double> value, double* baseAddress, Vector128<long> index, Vector128<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<int>    value, int*    baseAddress, Vector128<long> index, Vector128<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<long>   value, long*   baseAddress, Vector128<long> index, Vector128<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<float>  value, float*  baseAddress, Vector128<long> index, Vector128<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<uint>   value, uint*   baseAddress, Vector128<long> index, Vector128<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector128(Vector128<ulong>  value, ulong*  baseAddress, Vector128<long> index, Vector128<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<double> value, double* baseAddress, Vector256<long> index, Vector256<double> mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<int>    value, int*    baseAddress, Vector256<long> index, Vector256<int>    mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<long>   value, long*   baseAddress, Vector256<long> index, Vector256<long>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<float>  value, float*  baseAddress, Vector256<long> index, Vector256<float>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<uint>   value, uint*   baseAddress, Vector256<long> index, Vector256<uint>   mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterMaskVector256(Vector256<ulong>  value, ulong*  baseAddress, Vector256<long> index, Vector256<ulong>  mask, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

        public static unsafe void ScatterVector128(Vector128<double> value, double* baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<int>    value, int*    baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<long>   value, long*   baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<float>  value, float*  baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<uint>   value, uint*   baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<ulong>  value, ulong*  baseAddress, Vector128<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<double> value, double* baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<int>    value, int*    baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<long>   value, long*   baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<float>  value, float*  baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<uint>   value, uint*   baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<ulong>  value, ulong*  baseAddress, Vector256<int> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);

        public static unsafe void ScatterVector128(Vector128<double> value, double* baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<int>    value, int*    baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<long>   value, long*   baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<float>  value, float*  baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<uint>   value, uint*   baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector128(Vector128<ulong>  value, ulong*  baseAddress, Vector128<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<double> value, double* baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<int>    value, int*    baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<long>   value, long*   baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<float>  value, float*  baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<uint>   value, uint*   baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
        public static unsafe void ScatterVector256(Vector256<ulong>  value, ulong*  baseAddress, Vector256<long> index, [ConstantExpected(Min = (byte)(1), Max = (byte)(8))] byte scale);
    }
}

public static partial class Avx512BW
{
    public static int MoveMask(Vector512<short>  value);
    public static int MoveMask(Vector512<ushort> value);

    public static long MoveMask(Vector512<byte>  value);
    public static long MoveMask(Vector512<sbyte> value);

    public static bool TestC(Vector512<byte>   left, Vector512<byte>   right);
    public static bool TestC(Vector512<short>  left, Vector512<short>  right);
    public static bool TestC(Vector512<sbyte>  left, Vector512<sbyte>  right);
    public static bool TestC(Vector512<ushort> left, Vector512<ushort> right);

    public static bool TestNotZAndNotC(Vector512<byte>   left, Vector512<byte>   right);
    public static bool TestNotZAndNotC(Vector512<short>  left, Vector512<short>  right);
    public static bool TestNotZAndNotC(Vector512<sbyte>  left, Vector512<sbyte>  right);
    public static bool TestNotZAndNotC(Vector512<ushort> left, Vector512<ushort> right);

    public static bool TestZ(Vector512<byte>   left, Vector512<byte>   right);
    public static bool TestZ(Vector512<short>  left, Vector512<short>  right);
    public static bool TestZ(Vector512<sbyte>  left, Vector512<sbyte>  right);
    public static bool TestZ(Vector512<ushort> left, Vector512<ushort> right);
}

public static partial class Avx512DQ
{
    public static Vector512<double> Classify(Vector512<double> value, [ConstantExpected] byte control);
    public static Vector512<float>  Classify(Vector512<float>  value, [ConstantExpected] byte control);

    public static Vector128<double> ClassifyScalar(Vector128<double> value, [ConstantExpected] byte control);
    public static Vector128<float>  ClassifyScalar(Vector128<float>  value, [ConstantExpected] byte control);

    public static int MoveMask(Vector128<short>  value);
    public static int MoveMask(Vector128<ushort> value);
    public static int MoveMask(Vector256<int>    value);
    public static int MoveMask(Vector256<uint>   value);
    public static int MoveMask(Vector512<double> value);
    public static int MoveMask(Vector512<long>   value);
    public static int MoveMask(Vector512<ulong>  value);

    public static partial class VL
    {
        public static Vector128<double> Classify(Vector128<double> value, [ConstantExpected] byte control);
        public static Vector128<float>  Classify(Vector128<float>  value, [ConstantExpected] byte control);
        public static Vector256<double> Classify(Vector256<double> value, [ConstantExpected] byte control);
        public static Vector256<float>  Classify(Vector256<float>  value, [ConstantExpected] byte control);
    }
}

public abstract class Avx512Vbmi2 : Avx512BW
{
    public static new bool IsSupported { get; }

    public static Vector512<byte>   Compress(Vector512<byte>   value, Vector512<byte>   mask);
    public static Vector512<short>  Compress(Vector512<short>  value, Vector512<short>  mask);
    public static Vector512<sbyte>  Compress(Vector512<sbyte>  value, Vector512<sbyte>  mask);
    public static Vector512<ushort> Compress(Vector512<ushort> value, Vector512<ushort> mask);

    public static Vector512<byte>   Expand(Vector512<byte>   value, Vector512<byte>   mask);
    public static Vector512<short>  Expand(Vector512<short>  value, Vector512<short>  mask);
    public static Vector512<sbyte>  Expand(Vector512<sbyte>  value, Vector512<sbyte>  mask);
    public static Vector512<ushort> Expand(Vector512<ushort> value, Vector512<ushort> mask);

    public abstract class VL : Avx512BW.VL
    {
        public static new bool IsSupported { get; }

        public static Vector128<byte>   Compress(Vector128<byte>   value, Vector128<byte>   mask);
        public static Vector128<short>  Compress(Vector128<short>  value, Vector128<short>  mask);
        public static Vector128<sbyte>  Compress(Vector128<sbyte>  value, Vector128<sbyte>  mask);
        public static Vector128<ushort> Compress(Vector128<ushort> value, Vector128<ushort> mask);
        public static Vector256<byte>   Compress(Vector256<byte>   value, Vector256<byte>   mask);
        public static Vector256<short>  Compress(Vector256<short>  value, Vector256<short>  mask);
        public static Vector256<sbyte>  Compress(Vector256<sbyte>  value, Vector256<sbyte>  mask);
        public static Vector256<ushort> Compress(Vector256<ushort> value, Vector256<ushort> mask);

        public static Vector128<byte>   Expand(Vector128<byte>   value, Vector128<byte>   mask);
        public static Vector128<short>  Expand(Vector128<short>  value, Vector128<short>  mask);
        public static Vector128<sbyte>  Expand(Vector128<sbyte>  value, Vector128<sbyte>  mask);
        public static Vector128<ushort> Expand(Vector128<ushort> value, Vector128<ushort> mask);
        public static Vector256<byte>   Expand(Vector256<byte>   value, Vector256<byte>   mask);
        public static Vector256<short>  Expand(Vector256<short>  value, Vector256<short>  mask);
        public static Vector256<sbyte>  Expand(Vector256<sbyte>  value, Vector256<sbyte>  mask);
        public static Vector256<ushort> Expand(Vector256<ushort> value, Vector256<ushort> mask);
    }

    public abstract class X64 : Avx512BW.X64
    {
        public static new bool IsSupported { get; }
    }
}

These remaining ones can be implemented any time after main opens for .NET 9 changes next month.

@MineCake147E

This comment was marked as resolved.

@tannergooding
Member Author

As per the above, not everything landed in .NET 8.

Right now, Vector512.Equals(x, y) + Vector512.Equals(z, w) would produce a kadd instruction, but the support around ExtractMostSignificantBits isn't there. We'll get to it in .NET 9 instead. We likewise don't have the support to generate kshift for such cases at the moment.

If you were to do something like:

Avx512F.BlendVariable(x, y, Vector512.Equals(x, y) + Vector512.Equals(z, w))

You would get the generally expected codegen:

vpcmpeqd k1, zmm0, zmm1
vpcmpeqd k2, zmm2, zmm3
kaddw    k1, k1, k2
vpblendmd zmm0 {k1}, zmm0, zmm1

@MineCake147E

This comment was marked as resolved.

@MadProbe

MadProbe commented Nov 2, 2023

Is there a way to convert the integer that I have computed for use as a mask to a vector to pass it into methods which accept a mask represented as VectorXXX<XXX>?

@tannergooding
Member Author

There isn't currently a way to create a mask from an integer value (and no instruction to do this either; at best it would be mov rax, imm; kmov k0, rax).

Such an API would be reasonable to define and expose in Avx512F.

@MadProbe

MadProbe commented Nov 2, 2023

There isn't currently a way to create a mask from an integer value (and no instruction to do this either; at best it would be mov rax, imm; kmov k0, rax).

Such an API would be reasonable to define and expose in Avx512F.

It would be best to expose this. Otherwise, the usability of the new mask APIs would be severely limited, and cases like conditionally loading only the last N values would be clunky.

@tannergooding
Member Author

It notably works as expected when you simply use AllBitsSet and Zero on a per-element basis, which is exactly what you'd need to do downlevel.

I'm fine with the general concept, however. Someone would need to open an API proposal covering the 4 mask variants.
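
A minimal sketch of the per-element approach described above, assuming a Vector128<int> and a tail of `count` valid elements. Comparing a vector of lane indices against the broadcast count yields AllBitsSet in the first `count` lanes and Zero elsewhere. The helper names FirstN and SelectFirstN are hypothetical and not part of any .NET API.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical helper: first `count` lanes are AllBitsSet, the rest are Zero.
static Vector128<int> FirstN(int count)
{
    Vector128<int> laneIndex = Vector128.Create(0, 1, 2, 3);
    return Vector128.LessThan(laneIndex, Vector128.Create(count));
}

// Hypothetical helper: keeps the first `count` lanes of `data` and takes
// the remaining lanes from `fallback`.
static Vector128<int> SelectFirstN(Vector128<int> data, Vector128<int> fallback, int count)
    => Vector128.ConditionalSelect(FirstN(count), data, fallback);

Vector128<int> tail = SelectFirstN(Vector128.Create(10, 20, 30, 40), Vector128<int>.Zero, 3);
Console.WriteLine(tail); // lanes 0..2 keep their values, lane 3 is zero
```

Whether the JIT turns this into a mask-register operation depends on the runtime version and hardware; the sketch only shows the portable shape of the code.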

@MadProbe

MadProbe commented Nov 2, 2023

Would this be something like the older, closed proposal, but without all the zero-mask and write-mask overloads of already existing methods?

@tannergooding
Member Author

It’d be an API proposal, but with a signature that looks something like Vector512<byte> CreateVector512Mask(long mostSignificantBits) and expanded to the full appropriate set. They would be in Avx512F or another relevant ISA depending on which is appropriate for kmovb/w/d/q.
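
For illustration only, the following sketch shows what such a conversion does semantically, using APIs that already exist on Vector128<int>. The helper name CreateMaskFromBits is hypothetical, the 4-lane width is an assumption chosen to keep the example small, and this is a software expansion rather than the proposed kmov-based API.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical helper: expands the low 4 bits of `bits` into a per-element
// mask, where bit i set means lane i is AllBitsSet and otherwise Zero.
static Vector128<int> CreateMaskFromBits(uint bits)
{
    Vector128<int> laneBit = Vector128.Create(1, 2, 4, 8);
    Vector128<int> broadcast = Vector128.Create((int)(bits & 0xF));
    // A lane is selected when its own bit survives the AND.
    return Vector128.Equals(broadcast & laneBit, laneBit);
}

Vector128<int> mask = CreateMaskFromBits(0b0101);
// ExtractMostSignificantBits goes the other way, from vector to integer.
uint roundTrip = mask.ExtractMostSignificantBits();
Console.WriteLine(roundTrip); // 5
```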

@MadProbe

MadProbe commented Dec 1, 2023

I still think that vector masks should be exposed as VectorMask<T>, where T is the backing storage type for the mask, as it provides much more control over what the compiler does. One wouldn't need to rely heavily on compiler optimizations and could be sure that the code does what one actually wants it to do. There would also be no need to implement all the variants of instructions described in the earlier iterations of this proposal, and no need to expose VectorMaskXXX<T>, where XXX is the vector's size.

This is, I think, the closest-to-hardware way of doing masking, and it is very minimal in nature.

The masks would be usable in the VectorXXX.ConditionalSelect functions and in the masked vector load and store functions, just like in this proposal, but the checks on the minimum backing storage size of the mask would be placed in the VectorXXX functions, so the number of elements in the mask would always be greater than or equal to the number of values in VectorXXX<T>. If there are more elements in the mask than needed, the unneeded upper bits would simply be ignored, since checking them for zero would only make this more complicated than it needs to be.

I also propose exposing native functions in Avx512DQ and Avx512BW for masked loads and stores as well as blends, since the corresponding instructions also use masks.

@MadProbe

MadProbe commented Dec 2, 2023

Could I make the proposal that I described above, with all the necessary API definitions and explainers, instead of the mask-to-vector broadcast proposal, Tanner Gooding?

@tannergooding
Member Author

Could I make the proposal that I described above, with all the necessary API definitions and explainers, instead of the mask-to-vector broadcast proposal, Tanner Gooding?

You're free to make a proposal, but it's not going to provide what you think it will provide and I don't see it moving forward at this point in time.

While we could indeed have exposed a VectorMask64/128/256/512<T> and VectorMask<T> type and then only exposed a subset of APIs that "must" take the mask (like ConditionalSelect) or return a mask (like GreaterThan), it ultimately isn't "better". It still relies very heavily on the JIT to do pattern recognition for optimizations, still relies very heavily on users updating their code manually, and adds additional complexity/overhead that will lead users into a pit of failure.

We opted for the path we did because it massively simplifies the implementation, provides the greatest impact to both new and existing code, reduces the throughput impact this already niche feature has on the JIT, makes it more pay to play, meshes nicely with other patterns we already rely on optimizations to light up for (such as embedded loads/stores, which C/C++ also relies on compiler opts around), actively reduces the total set of pattern recognition we need to do compared to the alternative, and most importantly because we have a decently high level of confidence the JIT can be made to generate the optimal code in the vast majority of scenarios; enough so that users shouldn't even care about the nuance in practice.

The only downside is that we didn't finish all the work in .NET 8, so we will need to finish it up in .NET 9 and continue to improve it over time.

where T is the backing storage type

This won't work. Vector128<byte> requires a mask with 16 bits, and therefore VectorMask128<byte> must make at least 16 bits available. If we used byte as the backing storage, we'd only have 8 bits available. This also adds significant complexity to how it would work from a cross-platform perspective, complexity to the overloads exposed, overhead in the conversions between VectorMask types, and more JIT strain from needing to handle small types.
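
The arithmetic behind this can be checked directly. The sketch below only uses the existing Count properties and assumes one mask bit per vector element, as stated above.

```csharp
using System;
using System.Runtime.Intrinsics;

// One mask bit is needed per vector element.
int bitsNeeded128 = Vector128<byte>.Count;   // 16 elements, so 16 mask bits
int bitsNeeded512 = Vector512<byte>.Count;   // 64 elements, so 64 mask bits
int bitsInByte = sizeof(byte) * 8;           // a byte can only hold 8 of them

Console.WriteLine($"Vector128<byte>: {bitsNeeded128} mask bits, byte storage: {bitsInByte} bits");
Console.WriteLine($"Vector512<byte>: {bitsNeeded512} mask bits, byte storage: {bitsInByte} bits");
```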


At the end of the day, we have a few considerations. But the most important two are...

1. What happens when existing code is run on the latest hardware

Just due to how software works, the considerations of targeting multiple platforms (Windows, Linux, macOS, Android, iOS; x64, Arm64, WASM, RISC-V, LoongArch, etc), and that newer ISAs are less common (both in terms of hardware support and in terms of code that has paths for it), existing code is very important. We will always have far more code that targets Vector128 than anything else. We will frequently have users who are more than willing to give up a tiny bit of perf in favor of maintainability, portability, reusability, etc.

This first consideration is one of the reasons why cross platform API surface is so important. There will always be some libraries that want to write a specific code path per platform/architecture/ISA. There may even be some that are willing to do micro-architecture specific optimizations. But those are ultimately a minority compared to the other set where they will simply write a Vector128<T> code path and have it reused across multiple platforms.

Because of that, we get the most benefit out of doing some pattern recognition for existing code and then having that light up on the latest hardware. This is particularly true where the patterns are simple to recognize, like ConditionalSelect(mask, left, right), and so the light-up recognition we currently rely on for masking support is effectively a must-have.
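
As a small illustration of the kind of existing cross-platform code this refers to, the sketch below clamps negative lanes to zero with a compare followed by ConditionalSelect. The helper name ClampNegativeToZero is hypothetical, and whether a given runtime lowers it to mask registers is not something the example itself guarantees.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical helper: per element, value > 0 ? value : 0.
static Vector128<int> ClampNegativeToZero(Vector128<int> value)
{
    // The comparison produces AllBitsSet or Zero in each lane.
    Vector128<int> mask = Vector128.GreaterThan(value, Vector128<int>.Zero);
    return Vector128.ConditionalSelect(mask, value, Vector128<int>.Zero);
}

Vector128<int> clamped = ClampNegativeToZero(Vector128.Create(-1, 2, -3, 4));
Console.WriteLine(clamped); // negative lanes become zero
```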

2. What happens when the newer types are used on downlevel hardware

Providing APIs that power users can utilize to fine tune is also important, however. It makes .NET a first class place for users to write such code and helps drive innovation. In most cases that newer support is straightforward as we simply expose the APIs, they throw if not supported, and users don't have to think much about the support.

Where the APIs are exposed in a cross-platform way, this becomes more of a consideration, as we then have to consider how they get used across multiple platforms. For example, we have to consider that ConditionalSelect needs to operate on a bitwise basis and that Shuffle operates on the entire vector (Vector256.Shuffle doesn't operate on 2x128 lanes). This leads to nuance in how the APIs are exposed, how users might be expected to consume them, etc.
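
Both behaviors can be observed with the existing cross-platform APIs. In this sketch the constants are arbitrary values picked so the result is easy to read.

```csharp
using System;
using System.Runtime.Intrinsics;

// ConditionalSelect is bitwise: each result bit comes from `left` where the
// condition bit is 1 and from `right` where it is 0, even within one element.
Vector128<int> condition = Vector128.Create(0x0000FFFF);
Vector128<int> left = Vector128.Create(0x11112222);
Vector128<int> right = Vector128.Create(0x33334444);
Vector128<int> blended = Vector128.ConditionalSelect(condition, left, right);
Console.WriteLine($"0x{blended.GetElement(0):X8}"); // 0x33332222

// Shuffle indexes the whole 256-bit vector, so lane 0 of the result can be
// taken from the upper 128 bits of the input.
Vector256<int> values = Vector256.Create(10, 11, 12, 13, 14, 15, 16, 17);
Vector256<int> reverse = Vector256.Create(7, 6, 5, 4, 3, 2, 1, 0);
Vector256<int> shuffled = Vector256.Shuffle(values, reverse);
Console.WriteLine(shuffled.GetElement(0)); // 17
```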

In the case of something like VectorMask<T> there are cases where it can have hardware acceleration and cases where it won't. The majority of developers won't want to insert code like the following into their logic, nor would they be able to trivially define helpers for easier reuse. This is even true for devs writing micro-architecture specific opts because of how it can impact the JIT and codegen:

if (VectorMask128.IsHardwareAccelerated)
{ 
    VectorMask128<T> mask = Vector128.GreaterThanMask(x, y);
    return Vector128.MaskedConditionalSelect(mask, z, w);
}
else
{
    Vector128<T> mask = Vector128.GreaterThan(x, y);
    return Vector128.ConditionalSelect(mask, z, w);
}

So we then have to account for the fact that most users want to write only one of the two paths.

If they use Vector128<T> then all their existing code works and on newer hardware it has the capability of opportunistically lighting up to use the new mask registers. In the worst case they're writing the code they would have already written and it generates slightly suboptimal code. This is within the realm of normal for most compilers, even C/C++, where compiler opts are heavily relied on in many scenarios and where you may not get exactly what you asked for already.

However, if they use VectorMask128<T>, they need to explicitly change their code to take advantage of it. Additionally, on downlevel hardware we have to functionally treat VectorMask128<T> as Vector128<T>. This gets more complex than treating Vector128<T> as TYP_MASK because we suddenly don't have the hardware capability to do the conversions, we have to rationalize what the size and shape of this non-accelerated type is on the older hardware, and it doesn't mesh naturally with how code would otherwise be written.

@xoofx
Member

xoofx commented Jan 26, 2024

We didn't, however, land Compress/Expand, Gather/Scatter, MaskLoad/MaskStore, the platform specific MoveMask or Test APIs:

Just to confirm, as I don't see it in the list, but will .NET 9 try to expose AVX512 MaskCompressStore? (vpcompressb, vpcompressw, vpcompressd, vcompressps, vpcompressq)

@MichalPetryka
Contributor

MichalPetryka commented Jan 26, 2024

We didn't, however, land Compress/Expand, Gather/Scatter, MaskLoad/MaskStore, the platform specific MoveMask or Test APIs:

Just to confirm, as I don't see it in the list, but will .NET 9 try to expose AVX512 MaskCompressStore? (vpcompressb, vpcompressw, vpcompressd, vcompressps, vpcompressq)

MaskCompressStore should be avoided as it's terribly slow on Zen4.

@tannergooding
Member Author

Just to confirm, as I don't see it in the list, but will .NET 9 try to expose AVX512 MaskCompressStore? (vpcompressb, vpcompressw, vpcompressd, vcompressps, vpcompressq)

If there's any that were missed (and I know there were a small handful, namely around masking), we'd need an explicit API proposal requesting them. There shouldn't be anything blocking us from adding them once the proposal is up/approved, however.
