Extend System.Runtime.Intrinsics.X86 to support nint and nuint #52021

Open
tannergooding opened this issue Apr 28, 2021 · 4 comments
Labels: api-approved (API was approved in API review, it can be implemented), area-System.Runtime.Intrinsics

@tannergooding
Member

Proposal

Extend the ISA classes in System.Runtime.Intrinsics.X86 to support nint and nuint as valid primitive types. This exposes a number of new APIs that take the new types. Several of these APIs exist today as a split between a base class and its nested X64 class (Base vs Base.X64). Others cover cases where either int or long is a valid input today.

namespace System.Runtime.Intrinsics.X86
{
    public abstract partial class Sse
    {
        public static Vector128<float> ConvertScalarToVector128Single(Vector128<float> upper, nint value);

        public static nint ConvertToNInt(Vector128<float> value);
        public static nint ConvertToNIntWithTruncation(Vector128<float> value);
    }

    public abstract partial class Sse2
    {
        public static Vector128<double> ConvertScalarToVector128Double(Vector128<double> upper, nint value);

        public static Vector128<nint> ConvertScalarToVector128NInt(nint value);
        public static Vector128<nuint> ConvertScalarToVector128NUInt(nuint value);

        public static nint ConvertToNInt(Vector128<double> value);
        public static nint ConvertToNIntWithTruncation(Vector128<double> value);

        public static nint ConvertToNInt(Vector128<nint> value);
        public static nuint ConvertToNUInt(Vector128<nuint> value);

        public static unsafe void StoreNonTemporal(nint* address, nint value);
        public static unsafe void StoreNonTemporal(nuint* address, nuint value);

        public static Vector128<nint> Add(Vector128<nint> left, Vector128<nint> right);
        public static Vector128<nuint> Add(Vector128<nuint> left, Vector128<nuint> right);

        public static Vector128<nint> And(Vector128<nint> left, Vector128<nint> right);
        public static Vector128<nuint> And(Vector128<nuint> left, Vector128<nuint> right);

        public static Vector128<nint> AndNot(Vector128<nint> left, Vector128<nint> right);
        public static Vector128<nuint> AndNot(Vector128<nuint> left, Vector128<nuint> right);

        public static unsafe Vector128<nint> LoadVector128(nint* address);
        public static unsafe Vector128<nuint> LoadVector128(nuint* address);

        public static unsafe Vector128<nint> LoadAlignedVector128(nint* address);
        public static unsafe Vector128<nuint> LoadAlignedVector128(nuint* address);

        public static unsafe Vector128<nint> LoadScalarVector128(nint* address);
        public static unsafe Vector128<nuint> LoadScalarVector128(nuint* address);

        public static Vector128<nint> MoveScalar(Vector128<nint> value);
        public static Vector128<nuint> MoveScalar(Vector128<nuint> value);

        public static Vector128<nint> Or(Vector128<nint> left, Vector128<nint> right);
        public static Vector128<nuint> Or(Vector128<nuint> left, Vector128<nuint> right);

        public static Vector128<nint> ShiftLeftLogical(Vector128<nint> value, Vector128<nint> count);
        public static Vector128<nuint> ShiftLeftLogical(Vector128<nuint> value, Vector128<nuint> count);

        public static Vector128<nint> ShiftLeftLogical(Vector128<nint> value, byte count);
        public static Vector128<nuint> ShiftLeftLogical(Vector128<nuint> value, byte count);

        public static Vector128<nint> ShiftLeftLogical128BitLane(Vector128<nint> value, byte numBytes);
        public static Vector128<nuint> ShiftLeftLogical128BitLane(Vector128<nuint> value, byte numBytes);

        public static Vector128<nint> ShiftRightLogical(Vector128<nint> value, Vector128<nint> count);
        public static Vector128<nuint> ShiftRightLogical(Vector128<nuint> value, Vector128<nuint> count);

        public static Vector128<nint> ShiftRightLogical(Vector128<nint> value, byte count);
        public static Vector128<nuint> ShiftRightLogical(Vector128<nuint> value, byte count);

        public static Vector128<nint> ShiftRightLogical128BitLane(Vector128<nint> value, byte numBytes);
        public static Vector128<nuint> ShiftRightLogical128BitLane(Vector128<nuint> value, byte numBytes);

        public static unsafe void StoreScalar(nint* address, Vector128<nint> source);
        public static unsafe void StoreScalar(nuint* address, Vector128<nuint> source);

        public static unsafe void StoreAligned(nint* address, Vector128<nint> source);
        public static unsafe void StoreAligned(nuint* address, Vector128<nuint> source);

        public static unsafe void StoreAlignedNonTemporal(nint* address, Vector128<nint> source);
        public static unsafe void StoreAlignedNonTemporal(nuint* address, Vector128<nuint> source);

        public static unsafe void Store(nint* address, Vector128<nint> source);
        public static unsafe void Store(nuint* address, Vector128<nuint> source);

        public static Vector128<nint> Subtract(Vector128<nint> left, Vector128<nint> right);
        public static Vector128<nuint> Subtract(Vector128<nuint> left, Vector128<nuint> right);

        public static Vector128<nint> UnpackHigh(Vector128<nint> left, Vector128<nint> right);
        public static Vector128<nuint> UnpackHigh(Vector128<nuint> left, Vector128<nuint> right);

        public static Vector128<nint> UnpackLow(Vector128<nint> left, Vector128<nint> right);
        public static Vector128<nuint> UnpackLow(Vector128<nuint> left, Vector128<nuint> right);

        public static Vector128<nint> Xor(Vector128<nint> left, Vector128<nint> right);
        public static Vector128<nuint> Xor(Vector128<nuint> left, Vector128<nuint> right);
    }

    public abstract partial class Sse3
    {
        public static unsafe Vector128<nint> LoadDquVector128(nint* address);
        public static unsafe Vector128<nuint> LoadDquVector128(nuint* address);
    }

    public abstract partial class Ssse3
    {
        public static Vector128<nint> AlignRight(Vector128<nint> left, Vector128<nint> right, byte mask);
        public static Vector128<nuint> AlignRight(Vector128<nuint> left, Vector128<nuint> right, byte mask);
    }

    public abstract partial class Sse41
    {
        public static nint Extract(Vector128<nint> value, byte index);
        public static nuint Extract(Vector128<nuint> value, byte index);

        public static Vector128<nint> Insert(Vector128<nint> value, nint data, byte index);
        public static Vector128<nuint> Insert(Vector128<nuint> value, nuint data, byte index);

        public static Vector128<nint> BlendVariable(Vector128<nint> left, Vector128<nint> right, Vector128<nint> mask);
        public static Vector128<nuint> BlendVariable(Vector128<nuint> left, Vector128<nuint> right, Vector128<nuint> mask);

        public static Vector128<nint> CompareEqual(Vector128<nint> left, Vector128<nint> right);
        public static Vector128<nuint> CompareEqual(Vector128<nuint> left, Vector128<nuint> right);

        public static Vector128<nint> ConvertToVector128NInt(Vector128<sbyte> value);
        public static Vector128<nint> ConvertToVector128NInt(Vector128<byte> value);
        public static Vector128<nint> ConvertToVector128NInt(Vector128<short> value);
        public static Vector128<nint> ConvertToVector128NInt(Vector128<ushort> value);
        public static Vector128<nint> ConvertToVector128NInt(Vector128<int> value);
        public static Vector128<nint> ConvertToVector128NInt(Vector128<uint> value);

        public static unsafe Vector128<nint> ConvertToVector128NInt(sbyte* address);
        public static unsafe Vector128<nint> ConvertToVector128NInt(byte* address);
        public static unsafe Vector128<nint> ConvertToVector128NInt(short* address);
        public static unsafe Vector128<nint> ConvertToVector128NInt(ushort* address);
        public static unsafe Vector128<nint> ConvertToVector128NInt(int* address);
        public static unsafe Vector128<nint> ConvertToVector128NInt(uint* address);

        public static Vector128<nint> Multiply(Vector128<int> left, Vector128<int> right);

        public static unsafe Vector128<nint> LoadAlignedVector128NonTemporal(nint* address);
        public static unsafe Vector128<nuint> LoadAlignedVector128NonTemporal(nuint* address);

        public static bool TestC(Vector128<nint> left, Vector128<nint> right);
        public static bool TestC(Vector128<nuint> left, Vector128<nuint> right);

        public static bool TestNotZAndNotC(Vector128<nint> left, Vector128<nint> right);
        public static bool TestNotZAndNotC(Vector128<nuint> left, Vector128<nuint> right);

        public static bool TestZ(Vector128<nint> left, Vector128<nint> right);
        public static bool TestZ(Vector128<nuint> left, Vector128<nuint> right);
    }

    public abstract partial class Sse42
    {
        public static Vector128<nint> CompareGreaterThan(Vector128<nint> left, Vector128<nint> right);

        public static nuint Crc32(nuint crc, nuint data);
    }

    public abstract partial class Avx
    {
        public static Vector128<nint> ExtractVector128(Vector256<nint> value, byte index);
        public static Vector128<nuint> ExtractVector128(Vector256<nuint> value, byte index);

        public static Vector256<nint> InsertVector128(Vector256<nint> value, Vector128<nint> data, byte index);
        public static Vector256<nuint> InsertVector128(Vector256<nuint> value, Vector128<nuint> data, byte index);

        public static unsafe Vector256<nint> LoadVector256(nint* address);
        public static unsafe Vector256<nuint> LoadVector256(nuint* address);

        public static unsafe Vector256<nint> LoadAlignedVector256(nint* address);
        public static unsafe Vector256<nuint> LoadAlignedVector256(nuint* address);

        public static unsafe Vector256<nint> LoadDquVector256(nint* address);
        public static unsafe Vector256<nuint> LoadDquVector256(nuint* address);

        public static Vector256<nint> Permute2x128(Vector256<nint> left, Vector256<nint> right, byte control);
        public static Vector256<nuint> Permute2x128(Vector256<nuint> left, Vector256<nuint> right, byte control);

        public static unsafe void StoreAligned(nint* address, Vector256<nint> source);
        public static unsafe void StoreAligned(nuint* address, Vector256<nuint> source);

        public static unsafe void StoreAlignedNonTemporal(nint* address, Vector256<nint> source);
        public static unsafe void StoreAlignedNonTemporal(nuint* address, Vector256<nuint> source);

        public static unsafe void Store(nint* address, Vector256<nint> source);
        public static unsafe void Store(nuint* address, Vector256<nuint> source);

        public static bool TestC(Vector256<nint> left, Vector256<nint> right);
        public static bool TestC(Vector256<nuint> left, Vector256<nuint> right);

        public static bool TestNotZAndNotC(Vector256<nint> left, Vector256<nint> right);
        public static bool TestNotZAndNotC(Vector256<nuint> left, Vector256<nuint> right);

        public static bool TestZ(Vector256<nint> left, Vector256<nint> right);
        public static bool TestZ(Vector256<nuint> left, Vector256<nuint> right);
    }

    public abstract partial class Avx2
    {
        public static Vector256<nint> Add(Vector256<nint> left, Vector256<nint> right);
        public static Vector256<nuint> Add(Vector256<nuint> left, Vector256<nuint> right);

        public static Vector256<nint> AlignRight(Vector256<nint> left, Vector256<nint> right, byte mask);
        public static Vector256<nuint> AlignRight(Vector256<nuint> left, Vector256<nuint> right, byte mask);

        public static Vector256<nint> And(Vector256<nint> left, Vector256<nint> right);
        public static Vector256<nuint> And(Vector256<nuint> left, Vector256<nuint> right);

        public static Vector256<nint> AndNot(Vector256<nint> left, Vector256<nint> right);
        public static Vector256<nuint> AndNot(Vector256<nuint> left, Vector256<nuint> right);

        public static Vector256<nint> BlendVariable(Vector256<nint> left, Vector256<nint> right, Vector256<nint> mask);
        public static Vector256<nuint> BlendVariable(Vector256<nuint> left, Vector256<nuint> right, Vector256<nuint> mask);

        public static Vector128<nint> BroadcastScalarToVector128(Vector128<nint> value);
        public static Vector128<nuint> BroadcastScalarToVector128(Vector128<nuint> value);

        public static unsafe Vector128<nint> BroadcastScalarToVector128(nint* source);
        public static unsafe Vector128<nuint> BroadcastScalarToVector128(nuint* source);

        public static Vector256<nint> BroadcastScalarToVector256(Vector128<nint> value);
        public static Vector256<nuint> BroadcastScalarToVector256(Vector128<nuint> value);

        public static unsafe Vector256<nint> BroadcastScalarToVector256(nint* source);
        public static unsafe Vector256<nuint> BroadcastScalarToVector256(nuint* source);

        public static unsafe Vector256<nint> BroadcastVector128ToVector256(nint* address);
        public static unsafe Vector256<nuint> BroadcastVector128ToVector256(nuint* address);

        public static Vector256<nint> CompareEqual(Vector256<nint> left, Vector256<nint> right);
        public static Vector256<nuint> CompareEqual(Vector256<nuint> left, Vector256<nuint> right);

        public static Vector256<nint> CompareGreaterThan(Vector256<nint> left, Vector256<nint> right);

        public static Vector256<nint> ConvertToVector256NInt(Vector128<sbyte> value);
        public static Vector256<nint> ConvertToVector256NInt(Vector128<byte> value);
        public static Vector256<nint> ConvertToVector256NInt(Vector128<short> value);
        public static Vector256<nint> ConvertToVector256NInt(Vector128<ushort> value);
        public static Vector256<nint> ConvertToVector256NInt(Vector128<int> value);
        public static Vector256<nint> ConvertToVector256NInt(Vector128<uint> value);

        public static unsafe Vector256<nint> ConvertToVector256NInt(sbyte* address);
        public static unsafe Vector256<nint> ConvertToVector256NInt(byte* address);
        public static unsafe Vector256<nint> ConvertToVector256NInt(short* address);
        public static unsafe Vector256<nint> ConvertToVector256NInt(ushort* address);
        public static unsafe Vector256<nint> ConvertToVector256NInt(int* address);
        public static unsafe Vector256<nint> ConvertToVector256NInt(uint* address);

        public static new Vector128<nint> ExtractVector128(Vector256<nint> value, byte index);
        public static new Vector128<nuint> ExtractVector128(Vector256<nuint> value, byte index);

        public static unsafe Vector128<nint> GatherVector128(nint* baseAddress, Vector128<int> index, byte scale);
        public static unsafe Vector128<nuint> GatherVector128(nuint* baseAddress, Vector128<int> index, byte scale);

        public static unsafe Vector128<int> GatherVector128(int* baseAddress, Vector128<nint> index, byte scale);
        public static unsafe Vector128<uint> GatherVector128(uint* baseAddress, Vector128<nint> index, byte scale);
        public static unsafe Vector128<nint> GatherVector128(long* baseAddress, Vector128<nint> index, byte scale);
        public static unsafe Vector128<nuint> GatherVector128(ulong* baseAddress, Vector128<nint> index, byte scale);
        public static unsafe Vector128<nint> GatherVector128(nint* baseAddress, Vector128<nint> index, byte scale);
        public static unsafe Vector128<nuint> GatherVector128(nuint* baseAddress, Vector128<nint> index, byte scale);
        public static unsafe Vector128<float> GatherVector128(float* baseAddress, Vector128<nint> index, byte scale);
        public static unsafe Vector128<double> GatherVector128(double* baseAddress, Vector128<nint> index, byte scale);

        public static unsafe Vector256<nint> GatherVector256(nint* baseAddress, Vector128<int> index, byte scale);
        public static unsafe Vector256<nuint> GatherVector256(nuint* baseAddress, Vector128<int> index, byte scale);

        public static unsafe Vector128<int> GatherVector128(int* baseAddress, Vector256<nint> index, byte scale);
        public static unsafe Vector128<uint> GatherVector128(uint* baseAddress, Vector256<nint> index, byte scale);
        public static unsafe Vector256<nint> GatherVector256(long* baseAddress, Vector256<nint> index, byte scale);
        public static unsafe Vector256<nuint> GatherVector256(ulong* baseAddress, Vector256<nint> index, byte scale);
        public static unsafe Vector256<nint> GatherVector256(nint* baseAddress, Vector256<nint> index, byte scale);
        public static unsafe Vector256<nuint> GatherVector256(nuint* baseAddress, Vector256<nint> index, byte scale);
        public static unsafe Vector128<float> GatherVector128(float* baseAddress, Vector256<nint> index, byte scale);
        public static unsafe Vector256<double> GatherVector256(double* baseAddress, Vector256<nint> index, byte scale);

        public static unsafe Vector128<nint> GatherMaskVector128(Vector128<nint> source, nint* baseAddress, Vector128<int> index, Vector128<nint> mask, byte scale);
        public static unsafe Vector128<nuint> GatherMaskVector128(Vector128<nuint> source, nuint* baseAddress, Vector128<int> index, Vector128<nuint> mask, byte scale);

        public static unsafe Vector128<int> GatherMaskVector128(Vector128<int> source, int* baseAddress, Vector128<nint> index, Vector128<int> mask, byte scale);
        public static unsafe Vector128<uint> GatherMaskVector128(Vector128<uint> source, uint* baseAddress, Vector128<nint> index, Vector128<uint> mask, byte scale);
        public static unsafe Vector128<long> GatherMaskVector128(Vector128<long> source, long* baseAddress, Vector128<nint> index, Vector128<long> mask, byte scale);
        public static unsafe Vector128<ulong> GatherMaskVector128(Vector128<ulong> source, ulong* baseAddress, Vector128<nint> index, Vector128<ulong> mask, byte scale);

        public static unsafe Vector128<nint> GatherMaskVector128(Vector128<nint> source, nint* baseAddress, Vector128<nint> index, Vector128<nint> mask, byte scale);
        public static unsafe Vector128<nuint> GatherMaskVector128(Vector128<nuint> source, nuint* baseAddress, Vector128<nint> index, Vector128<nuint> mask, byte scale);
        public static unsafe Vector128<float> GatherMaskVector128(Vector128<float> source, float* baseAddress, Vector128<nint> index, Vector128<float> mask, byte scale);
        public static unsafe Vector128<double> GatherMaskVector128(Vector128<double> source, double* baseAddress, Vector128<nint> index, Vector128<double> mask, byte scale);

        public static unsafe Vector256<nint> GatherMaskVector256(Vector256<nint> source, nint* baseAddress, Vector128<int> index, Vector256<nint> mask, byte scale);
        public static unsafe Vector256<nuint> GatherMaskVector256(Vector256<nuint> source, nuint* baseAddress, Vector128<int> index, Vector256<nuint> mask, byte scale);

        public static unsafe Vector128<int> GatherMaskVector128(Vector128<int> source, int* baseAddress, Vector256<nint> index, Vector128<int> mask, byte scale);
        public static unsafe Vector128<uint> GatherMaskVector128(Vector128<uint> source, uint* baseAddress, Vector256<nint> index, Vector128<uint> mask, byte scale);
        public static unsafe Vector256<long> GatherMaskVector256(Vector256<long> source, long* baseAddress, Vector256<nint> index, Vector256<long> mask, byte scale);
        public static unsafe Vector256<ulong> GatherMaskVector256(Vector256<ulong> source, ulong* baseAddress, Vector256<nint> index, Vector256<ulong> mask, byte scale);

        public static unsafe Vector256<nint> GatherMaskVector256(Vector256<nint> source, nint* baseAddress, Vector256<nint> index, Vector256<nint> mask, byte scale);
        public static unsafe Vector256<nuint> GatherMaskVector256(Vector256<nuint> source, nuint* baseAddress, Vector256<nint> index, Vector256<nuint> mask, byte scale);
        public static unsafe Vector128<float> GatherMaskVector128(Vector128<float> source, float* baseAddress, Vector256<nint> index, Vector128<float> mask, byte scale);
        public static unsafe Vector256<double> GatherMaskVector256(Vector256<double> source, double* baseAddress, Vector256<nint> index, Vector256<double> mask, byte scale);

        public static new Vector256<nint> InsertVector128(Vector256<nint> value, Vector128<nint> data, byte index);
        public static new Vector256<nuint> InsertVector128(Vector256<nuint> value, Vector128<nuint> data, byte index);

        public static unsafe Vector256<nint> LoadAlignedVector256NonTemporal(nint* address);
        public static unsafe Vector256<nuint> LoadAlignedVector256NonTemporal(nuint* address);

        public static unsafe Vector128<nint> MaskLoad(nint* address, Vector128<nint> mask);
        public static unsafe Vector128<nuint> MaskLoad(nuint* address, Vector128<nuint> mask);

        public static unsafe Vector256<nint> MaskLoad(nint* address, Vector256<nint> mask);
        public static unsafe Vector256<nuint> MaskLoad(nuint* address, Vector256<nuint> mask);

        public static unsafe void MaskStore(nint* address, Vector128<nint> mask, Vector128<nint> source);
        public static unsafe void MaskStore(nuint* address, Vector128<nuint> mask, Vector128<nuint> source);

        public static unsafe void MaskStore(nint* address, Vector256<nint> mask, Vector256<nint> source);
        public static unsafe void MaskStore(nuint* address, Vector256<nuint> mask, Vector256<nuint> source);

        public static Vector256<nint> Or(Vector256<nint> left, Vector256<nint> right);
        public static Vector256<nuint> Or(Vector256<nuint> left, Vector256<nuint> right);

        public static new Vector256<nint> Permute2x128(Vector256<nint> left, Vector256<nint> right, byte control);
        public static new Vector256<nuint> Permute2x128(Vector256<nuint> left, Vector256<nuint> right, byte control);

        public static Vector256<nint> Permute4x64(Vector256<nint> value, byte control);
        public static Vector256<nuint> Permute4x64(Vector256<nuint> value, byte control);

        public static Vector256<nint> ShiftLeftLogical(Vector256<nint> value, Vector128<nint> count);
        public static Vector256<nuint> ShiftLeftLogical(Vector256<nuint> value, Vector128<nuint> count);

        public static Vector256<nint> ShiftLeftLogical(Vector256<nint> value, byte count);
        public static Vector256<nuint> ShiftLeftLogical(Vector256<nuint> value, byte count);

        public static Vector256<nint> ShiftLeftLogical128BitLane(Vector256<nint> value, byte numBytes);
        public static Vector256<nuint> ShiftLeftLogical128BitLane(Vector256<nuint> value, byte numBytes);

        public static Vector256<nint> ShiftLeftLogicalVariable(Vector256<nint> value, Vector256<nuint> count);
        public static Vector256<nuint> ShiftLeftLogicalVariable(Vector256<nuint> value, Vector256<nuint> count);

        public static Vector128<nint> ShiftLeftLogicalVariable(Vector128<nint> value, Vector128<nuint> count);
        public static Vector128<nuint> ShiftLeftLogicalVariable(Vector128<nuint> value, Vector128<nuint> count);

        public static Vector256<nint> ShiftRightLogical(Vector256<nint> value, Vector128<nint> count);
        public static Vector256<nuint> ShiftRightLogical(Vector256<nuint> value, Vector128<nuint> count);

        public static Vector256<nint> ShiftRightLogical(Vector256<nint> value, byte count);
        public static Vector256<nuint> ShiftRightLogical(Vector256<nuint> value, byte count);

        public static Vector256<nint> ShiftRightLogical128BitLane(Vector256<nint> value, byte numBytes);
        public static Vector256<nuint> ShiftRightLogical128BitLane(Vector256<nuint> value, byte numBytes);

        public static Vector256<nint> ShiftRightLogicalVariable(Vector256<nint> value, Vector256<nuint> count);
        public static Vector256<nuint> ShiftRightLogicalVariable(Vector256<nuint> value, Vector256<nuint> count);

        public static Vector128<nint> ShiftRightLogicalVariable(Vector128<nint> value, Vector128<nuint> count);
        public static Vector128<nuint> ShiftRightLogicalVariable(Vector128<nuint> value, Vector128<nuint> count);

        public static Vector256<nint> Subtract(Vector256<nint> left, Vector256<nint> right);
        public static Vector256<nuint> Subtract(Vector256<nuint> left, Vector256<nuint> right);

        public static Vector256<nint> UnpackHigh(Vector256<nint> left, Vector256<nint> right);
        public static Vector256<nuint> UnpackHigh(Vector256<nuint> left, Vector256<nuint> right);

        public static Vector256<nint> UnpackLow(Vector256<nint> left, Vector256<nint> right);
        public static Vector256<nuint> UnpackLow(Vector256<nuint> left, Vector256<nuint> right);

        public static Vector256<nint> Xor(Vector256<nint> left, Vector256<nint> right);
        public static Vector256<nuint> Xor(Vector256<nuint> left, Vector256<nuint> right);
    }

    public abstract partial class Bmi1
    {
        public static nuint AndNot(nuint left, nuint right);

        public static nuint BitFieldExtract(nuint value, byte start, byte length);
        public static nuint BitFieldExtract(nuint value, ushort control);

        public static nuint ExtractLowestSetBit(nuint value);

        public static nuint GetMaskUpToLowestSetBit(nuint value);

        public static nuint ResetLowestSetBit(nuint value);

        public static nuint TrailingZeroCount(nuint value);
    }

    public abstract partial class Bmi2
    {
        public static nuint ZeroHighBits(nuint value, nuint index);

        public static nuint MultiplyNoFlags(nuint left, nuint right);
        public static unsafe nuint MultiplyNoFlags(nuint left, nuint right, nuint* low);

        public static nuint ParallelBitDeposit(nuint value, nuint mask);

        public static nuint ParallelBitExtract(nuint value, nuint mask);
    }

    public abstract partial class Lzcnt
    {
        public static nuint LeadingZeroCount(nuint value);
    }

    public abstract partial class Popcnt
    {
        public static nuint PopCount(nuint value);
    }
}
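
For context on the Base vs Base.X64 split mentioned above, here is a minimal sketch of what a caller has to write today to get a native-sized population count using only the existing Popcnt and Popcnt.X64 overloads. The NativeIntrinsics class and its PopCount helper are hypothetical names used for illustration and are not part of this proposal. A nuint overload such as the proposed Popcnt.PopCount(nuint) would presumably make this kind of wrapper unnecessary.

```csharp
using System.Numerics;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper for illustration only; not part of the proposed API surface.
public static class NativeIntrinsics
{
    public static nuint PopCount(nuint value)
    {
        if (Popcnt.X64.IsSupported)
        {
            // Only true in a 64-bit process, where nuint is 64 bits wide.
            return (nuint)Popcnt.X64.PopCount((ulong)value);
        }

        if (Popcnt.IsSupported)
        {
            // Reached in a 32-bit process, where nuint is 32 bits wide,
            // so the cast to uint does not drop any set bits.
            return Popcnt.PopCount((uint)value);
        }

        // Software fallback for hardware without POPCNT (or non-x86 targets).
        return (nuint)BitOperations.PopCount((ulong)value);
    }
}
```

For example, NativeIntrinsics.PopCount((nuint)0b1011) returns 3 on any of the three paths. The same branching pattern applies to the other scalar APIs listed under Bmi1, Bmi2, and Lzcnt.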
@tannergooding added the api-suggestion label (Early API idea and discussion, it is NOT ready for implementation) on Apr 28, 2021
@dotnet-issue-labeler bot added the area-System.Runtime.Intrinsics and untriaged (New issue has not been triaged by the area owner) labels on Apr 28, 2021
@ghost

ghost commented Apr 28, 2021

Tagging subscribers to this area: @tannergooding
See info in area-owners.md if you want to be subscribed.

Issue Details

Proposal

Extend the ISA classes in System.Runtime.Intrinsics.X86 to support nint and nuint as valid primitive types. This exposes a number of new APIs that take the new types. Several of these APIs exist today as a split between a base class and its nested X64 class (Base vs Base.X64). Others cover cases where either int or long is a valid input today.

namespace System.Runtime.Intrinsics.X86
{
    public abstract partial class Sse
    {
        public static Vector128<float> ConvertScalarToVector128Single(Vector128<float> upper, nint value);

        public static nint ConvertToNInt(Vector128<float> value);
        public static nint ConvertToNIntWithTruncation(Vector128<float> value);
    }

    public abstract partial class Sse2
    {
        public static Vector128<double> ConvertScalarToVector128Double(Vector128<double> upper, nint value);

        public static Vector128<nint> ConvertScalarToVector128NInt(nint value);
        public static Vector128<nuint> ConvertScalarToVector128NUInt(nuint value);

        public static nint ConvertToNInt(Vector128<double> value);
        public static nint ConvertToNIntWithTruncation(Vector128<double> value);

        public static nint ConvertToNInt(Vector128<nint> value);
        public static nuint ConvertToNUInt(Vector128<nuint> value);

        public static unsafe void StoreNonTemporal(nint* address, nint value);
        public static unsafe void StoreNonTemporal(nuint* address, nuint value);

        public static Vector128<nint> Add(Vector128<nint> left, Vector128<nint> right);
        public static Vector128<nuint> Add(Vector128<nuint> left, Vector128<nuint> right);

        public static Vector128<nint> And(Vector128<nint> left, Vector128<nint> right);
        public static Vector128<nuint> And(Vector128<nuint> left, Vector128<nuint> right);

        public static Vector128<nint> AndNot(Vector128<nint> left, Vector128<nint> right);
        public static Vector128<nuint> AndNot(Vector128<nuint> left, Vector128<nuint> right);

        public static unsafe Vector128<nint> LoadVector128(nint* address);
        public static unsafe Vector128<nuint> LoadVector128(nuint* address);

        public static unsafe Vector128<nint> LoadAlignedVector128(nint* address);
        public static unsafe Vector128<nuint> LoadAlignedVector128(nuint* address);

        public static unsafe Vector128<nint> LoadScalarVector128(nint* address);
        public static unsafe Vector128<nuint> LoadScalarVector128(nuint* address);

        public static Vector128<nint> MoveScalar(Vector128<nint> value);
        public static Vector128<nuint> MoveScalar(Vector128<nuint> value);

        public static Vector128<nint> Or(Vector128<nint> left, Vector128<nint> right);
        public static Vector128<nuint> Or(Vector128<nuint> left, Vector128<nuint> right);

        public static Vector128<nint> ShiftLeftLogical(Vector128<nint> value, Vector128<nint> count);
        public static Vector128<nuint> ShiftLeftLogical(Vector128<nuint> value, Vector128<nuint> count);

        public static Vector128<nint> ShiftLeftLogical(Vector128<nint> value, byte count);
        public static Vector128<nuint> ShiftLeftLogical(Vector128<nuint> value, byte count);

        public static Vector128<nint> ShiftLeftLogical128BitLane(Vector128<nint> value, byte numBytes);
        public static Vector128<nuint> ShiftLeftLogical128BitLane(Vector128<nuint> value, byte numBytes);

        public static Vector128<nint> ShiftRightLogical(Vector128<nint> value, Vector128<nint> count);
        public static Vector128<nuint> ShiftRightLogical(Vector128<nuint> value, Vector128<nuint> count);

        public static Vector128<nint> ShiftRightLogical(Vector128<nint> value, byte count);
        public static Vector128<nuint> ShiftRightLogical(Vector128<nuint> value, byte count);

        public static Vector128<nint> ShiftRightLogical128BitLane(Vector128<nint> value, byte numBytes);
        public static Vector128<nuint> ShiftRightLogical128BitLane(Vector128<nuint> value, byte numBytes);

        public static unsafe void StoreScalar(nint* address, Vector128<nint> source);
        public static unsafe void StoreScalar(nuint* address, Vector128<nuint> source);

        public static unsafe void StoreAligned(nint* address, Vector128<nint> source);
        public static unsafe void StoreAligned(nuint* address, Vector128<nuint> source);

        public static unsafe void StoreAlignedNonTemporal(nint* address, Vector128<nint> source);
        public static unsafe void StoreAlignedNonTemporal(nuint* address, Vector128<nuint> source);

        public static unsafe void Store(nint* address, Vector128<nint> source);
        public static unsafe void Store(nuint* address, Vector128<nuint> source);

        public static Vector128<nint> Subtract(Vector128<nint> left, Vector128<nint> right);
        public static Vector128<nuint> Subtract(Vector128<nuint> left, Vector128<nuint> right);

        public static Vector128<nint> UnpackHigh(Vector128<nint> left, Vector128<nint> right);
        public static Vector128<nuint> UnpackHigh(Vector128<nuint> left, Vector128<nuint> right);

        public static Vector128<nint> UnpackLow(Vector128<nint> left, Vector128<nint> right);
        public static Vector128<nuint> UnpackLow(Vector128<nuint> left, Vector128<nuint> right);

        public static Vector128<nint> Xor(Vector128<nint> left, Vector128<nint> right);
        public static Vector128<nuint> Xor(Vector128<nuint> left, Vector128<nuint> right);
    }

    public abstract partial class Sse3
    {
        public static unsafe Vector128<nint> LoadDquVector128(nint* address);
        public static unsafe Vector128<nuint> LoadDquVector128(nuint* address);
    }

    public abstract partial class Ssse3
    {
        public static Vector128<nint> AlignRight(Vector128<nint> left, Vector128<nint> right, byte mask);
        public static Vector128<nuint> AlignRight(Vector128<nuint> left, Vector128<nuint> right, byte mask);
    }

    public abstract partial class Sse41
    {
        public static nint Extract(Vector128<nint> value, byte index);
        public static nuint Extract(Vector128<nuint> value, byte index);

        public static Vector128<nint> Insert(Vector128<nint> value, nint data, byte index);
        public static Vector128<nuint> Insert(Vector128<nuint> value, nuint data, byte index);

        public static Vector128<nint> BlendVariable(Vector128<nint> left, Vector128<nint> right, Vector128<nint> mask);
        public static Vector128<nuint> BlendVariable(Vector128<nuint> left, Vector128<nuint> right, Vector128<nuint> mask);

        public static Vector128<nint> CompareEqual(Vector128<nint> left, Vector128<nint> right);
        public static Vector128<nuint> CompareEqual(Vector128<nuint> left, Vector128<nuint> right);

        public static Vector128<nint> ConvertToVector128NInt(Vector128<sbyte> value);
        public static Vector128<nint> ConvertToVector128NInt(Vector128<byte> value);
        public static Vector128<nint> ConvertToVector128NInt(Vector128<short> value);
        public static Vector128<nint> ConvertToVector128NInt(Vector128<ushort> value);
        public static Vector128<nint> ConvertToVector128NInt(Vector128<int> value);
        public static Vector128<nint> ConvertToVector128NInt(Vector128<uint> value);

        public static unsafe Vector128<nint> ConvertToVector128NInt(sbyte* address);
        public static unsafe Vector128<nint> ConvertToVector128NInt(byte* address);
        public static unsafe Vector128<nint> ConvertToVector128NInt(short* address);
        public static unsafe Vector128<nint> ConvertToVector128NInt(ushort* address);
        public static unsafe Vector128<nint> ConvertToVector128NInt(int* address);
        public static unsafe Vector128<nint> ConvertToVector128NInt(uint* address);

        public static Vector128<nint> Multiply(Vector128<int> left, Vector128<int> right);

        public static unsafe Vector128<nint> LoadAlignedVector128NonTemporal(nint* address);
        public static unsafe Vector128<nuint> LoadAlignedVector128NonTemporal(nuint* address);

        public static bool TestC(Vector128<nint> left, Vector128<nint> right);
        public static bool TestC(Vector128<nuint> left, Vector128<nuint> right);

        public static bool TestNotZAndNotC(Vector128<nint> left, Vector128<nint> right);
        public static bool TestNotZAndNotC(Vector128<nuint> left, Vector128<nuint> right);

        public static bool TestZ(Vector128<nint> left, Vector128<nint> right);
        public static bool TestZ(Vector128<nuint> left, Vector128<nuint> right);
    }

    public abstract partial class Sse42
    {
        public static Vector128<nint> CompareGreaterThan(Vector128<nint> left, Vector128<nint> right);

        public static nuint Crc32(nuint crc, nuint data);
    }

    public abstract partial class Avx
    {
        public static Vector128<nint> ExtractVector128(Vector256<nint> value, byte index);
        public static Vector128<nuint> ExtractVector128(Vector256<nuint> value, byte index);

        public static Vector256<nint> InsertVector128(Vector256<nint> value, Vector128<nint> data, byte index);
        public static Vector256<nuint> InsertVector128(Vector256<nuint> value, Vector128<nuint> data, byte index);

        public static unsafe Vector256<nint> LoadVector256(nint* address);
        public static unsafe Vector256<nuint> LoadVector256(nuint* address);

        public static unsafe Vector256<nint> LoadAlignedVector256(nint* address);
        public static unsafe Vector256<nuint> LoadAlignedVector256(nuint* address);

        public static unsafe Vector256<nint> LoadDquVector256(nint* address);
        public static unsafe Vector256<nuint> LoadDquVector256(nuint* address);

        public static Vector256<nint> Permute2x128(Vector256<nint> left, Vector256<nint> right, byte control);
        public static Vector256<nuint> Permute2x128(Vector256<nuint> left, Vector256<nuint> right, byte control);

        public static unsafe void StoreAligned(nint* address, Vector256<nint> source);
        public static unsafe void StoreAligned(nuint* address, Vector256<nuint> source);

        public static unsafe void StoreAlignedNonTemporal(nint* address, Vector256<nint> source);
        public static unsafe void StoreAlignedNonTemporal(nuint* address, Vector256<nuint> source);

        public static unsafe void Store(nint* address, Vector256<nint> source);
        public static unsafe void Store(nuint* address, Vector256<nuint> source);

        public static bool TestC(Vector256<nint> left, Vector256<nint> right);
        public static bool TestC(Vector256<nuint> left, Vector256<nuint> right);

        public static bool TestNotZAndNotC(Vector256<nint> left, Vector256<nint> right);
        public static bool TestNotZAndNotC(Vector256<nuint> left, Vector256<nuint> right);

        public static bool TestZ(Vector256<nint> left, Vector256<nint> right);
        public static bool TestZ(Vector256<nuint> left, Vector256<nuint> right);
    }

    public abstract partial class Avx2
    {
        public static Vector256<nint> Add(Vector256<nint> left, Vector256<nint> right);
        public static Vector256<nuint> Add(Vector256<nuint> left, Vector256<nuint> right);

        public static Vector256<nint> AlignRight(Vector256<nint> left, Vector256<nint> right, byte mask);
        public static Vector256<nuint> AlignRight(Vector256<nuint> left, Vector256<nuint> right, byte mask);

        public static Vector256<nint> And(Vector256<nint> left, Vector256<nint> right);
        public static Vector256<nuint> And(Vector256<nuint> left, Vector256<nuint> right);

        public static Vector256<nint> AndNot(Vector256<nint> left, Vector256<nint> right);
        public static Vector256<nuint> AndNot(Vector256<nuint> left, Vector256<nuint> right);

        public static Vector256<nint> BlendVariable(Vector256<nint> left, Vector256<nint> right, Vector256<nint> mask);
        public static Vector256<nuint> BlendVariable(Vector256<nuint> left, Vector256<nuint> right, Vector256<nuint> mask);

        public static Vector128<nint> BroadcastScalarToVector128(Vector128<nint> value);
        public static Vector128<nuint> BroadcastScalarToVector128(Vector128<nuint> value);

        public static unsafe Vector128<nint> BroadcastScalarToVector128(nint* source);
        public static unsafe Vector128<nuint> BroadcastScalarToVector128(nuint* source);

        public static Vector256<nint> BroadcastScalarToVector256(Vector128<nint> value);
        public static Vector256<nuint> BroadcastScalarToVector256(Vector128<nuint> value);

        public static unsafe Vector256<nint> BroadcastScalarToVector256(nint* source);
        public static unsafe Vector256<nuint> BroadcastScalarToVector256(nuint* source);

        public static unsafe Vector256<nint> BroadcastVector128ToVector256(nint* address);
        public static unsafe Vector256<nuint> BroadcastVector128ToVector256(nuint* address);

        public static Vector256<nint> CompareEqual(Vector256<nint> left, Vector256<nint> right);
        public static Vector256<nuint> CompareEqual(Vector256<nuint> left, Vector256<nuint> right);

        public static Vector256<nint> CompareGreaterThan(Vector256<nint> left, Vector256<nint> right);

        public static Vector256<nint> ConvertToVector256NInt(Vector128<sbyte> value);
        public static Vector256<nint> ConvertToVector256NInt(Vector128<byte> value);
        public static Vector256<nint> ConvertToVector256NInt(Vector128<short> value);
        public static Vector256<nint> ConvertToVector256NInt(Vector128<ushort> value);
        public static Vector256<nint> ConvertToVector256NInt(Vector128<int> value);
        public static Vector256<nint> ConvertToVector256NInt(Vector128<uint> value);

        public static unsafe Vector256<nint> ConvertToVector256NInt(sbyte* address);
        public static unsafe Vector256<nint> ConvertToVector256NInt(byte* address);
        public static unsafe Vector256<nint> ConvertToVector256NInt(short* address);
        public static unsafe Vector256<nint> ConvertToVector256NInt(ushort* address);
        public static unsafe Vector256<nint> ConvertToVector256NInt(int* address);
        public static unsafe Vector256<nint> ConvertToVector256NInt(uint* address);

        public static new Vector128<nint> ExtractVector128(Vector256<nint> value, byte index);
        public static new Vector128<nuint> ExtractVector128(Vector256<nuint> value, byte index);

        public static unsafe Vector128<nint> GatherVector128(nint* baseAddress, Vector128<int> index, byte scale);
        public static unsafe Vector128<nuint> GatherVector128(nuint* baseAddress, Vector128<int> index, byte scale);

        public static unsafe Vector128<int> GatherVector128(int* baseAddress, Vector128<nint> index, byte scale);
        public static unsafe Vector128<uint> GatherVector128(uint* baseAddress, Vector128<nint> index, byte scale);
        public static unsafe Vector128<long> GatherVector128(long* baseAddress, Vector128<nint> index, byte scale);
        public static unsafe Vector128<ulong> GatherVector128(ulong* baseAddress, Vector128<nint> index, byte scale);
        public static unsafe Vector128<nint> GatherVector128(nint* baseAddress, Vector128<nint> index, byte scale);
        public static unsafe Vector128<nuint> GatherVector128(nuint* baseAddress, Vector128<nint> index, byte scale);
        public static unsafe Vector128<float> GatherVector128(float* baseAddress, Vector128<nint> index, byte scale);
        public static unsafe Vector128<double> GatherVector128(double* baseAddress, Vector128<nint> index, byte scale);

        public static unsafe Vector256<nint> GatherVector256(nint* baseAddress, Vector128<int> index, byte scale);
        public static unsafe Vector256<nuint> GatherVector256(nuint* baseAddress, Vector128<int> index, byte scale);

        public static unsafe Vector128<int> GatherVector128(int* baseAddress, Vector256<nint> index, byte scale);
        public static unsafe Vector128<uint> GatherVector128(uint* baseAddress, Vector256<nint> index, byte scale);
        public static unsafe Vector256<long> GatherVector256(long* baseAddress, Vector256<nint> index, byte scale);
        public static unsafe Vector256<ulong> GatherVector256(ulong* baseAddress, Vector256<nint> index, byte scale);
        public static unsafe Vector256<nint> GatherVector256(nint* baseAddress, Vector256<nint> index, byte scale);
        public static unsafe Vector256<nuint> GatherVector256(nuint* baseAddress, Vector256<nint> index, byte scale);
        public static unsafe Vector128<float> GatherVector128(float* baseAddress, Vector256<nint> index, byte scale);
        public static unsafe Vector256<double> GatherVector256(double* baseAddress, Vector256<nint> index, byte scale);

        public static unsafe Vector128<nint> GatherMaskVector128(Vector128<nint> source, nint* baseAddress, Vector128<int> index, Vector128<nint> mask, byte scale);
        public static unsafe Vector128<nuint> GatherMaskVector128(Vector128<nuint> source, nuint* baseAddress, Vector128<int> index, Vector128<nuint> mask, byte scale);

        public static unsafe Vector128<int> GatherMaskVector128(Vector128<int> source, int* baseAddress, Vector128<nint> index, Vector128<int> mask, byte scale);
        public static unsafe Vector128<uint> GatherMaskVector128(Vector128<uint> source, uint* baseAddress, Vector128<nint> index, Vector128<uint> mask, byte scale);
        public static unsafe Vector128<long> GatherMaskVector128(Vector128<long> source, long* baseAddress, Vector128<nint> index, Vector128<long> mask, byte scale);
        public static unsafe Vector128<ulong> GatherMaskVector128(Vector128<ulong> source, ulong* baseAddress, Vector128<nint> index, Vector128<ulong> mask, byte scale);

        public static unsafe Vector128<nint> GatherMaskVector128(Vector128<nint> source, nint* baseAddress, Vector128<nint> index, Vector128<nint> mask, byte scale);
        public static unsafe Vector128<nuint> GatherMaskVector128(Vector128<nuint> source, nuint* baseAddress, Vector128<nint> index, Vector128<nuint> mask, byte scale);
        public static unsafe Vector128<float> GatherMaskVector128(Vector128<float> source, float* baseAddress, Vector128<nint> index, Vector128<float> mask, byte scale);
        public static unsafe Vector128<double> GatherMaskVector128(Vector128<double> source, double* baseAddress, Vector128<nint> index, Vector128<double> mask, byte scale);

        public static unsafe Vector256<nint> GatherMaskVector256(Vector256<nint> source, nint* baseAddress, Vector128<int> index, Vector256<nint> mask, byte scale);
        public static unsafe Vector256<nuint> GatherMaskVector256(Vector256<nuint> source, nuint* baseAddress, Vector128<int> index, Vector256<nuint> mask, byte scale);

        public static unsafe Vector128<int> GatherMaskVector128(Vector128<int> source, int* baseAddress, Vector256<nint> index, Vector128<int> mask, byte scale);
        public static unsafe Vector128<uint> GatherMaskVector128(Vector128<uint> source, uint* baseAddress, Vector256<nint> index, Vector128<uint> mask, byte scale);
        public static unsafe Vector256<long> GatherMaskVector256(Vector256<long> source, long* baseAddress, Vector256<nint> index, Vector256<long> mask, byte scale);
        public static unsafe Vector256<ulong> GatherMaskVector256(Vector256<ulong> source, ulong* baseAddress, Vector256<nint> index, Vector256<ulong> mask, byte scale);

        public static unsafe Vector256<nint> GatherMaskVector256(Vector256<nint> source, nint* baseAddress, Vector256<nint> index, Vector256<nint> mask, byte scale);
        public static unsafe Vector256<nuint> GatherMaskVector256(Vector256<nuint> source, nuint* baseAddress, Vector256<nint> index, Vector256<nuint> mask, byte scale);
        public static unsafe Vector128<float> GatherMaskVector128(Vector128<float> source, float* baseAddress, Vector256<nint> index, Vector128<float> mask, byte scale);
        public static unsafe Vector256<double> GatherMaskVector256(Vector256<double> source, double* baseAddress, Vector256<nint> index, Vector256<double> mask, byte scale);

        public static new Vector256<nint> InsertVector128(Vector256<nint> value, Vector128<nint> data, byte index);
        public static new Vector256<nuint> InsertVector128(Vector256<nuint> value, Vector128<nuint> data, byte index);

        public static unsafe Vector256<nint> LoadAlignedVector256NonTemporal(nint* address);
        public static unsafe Vector256<nuint> LoadAlignedVector256NonTemporal(nuint* address);

        public static unsafe Vector128<nint> MaskLoad(nint* address, Vector128<nint> mask);
        public static unsafe Vector128<nuint> MaskLoad(nuint* address, Vector128<nuint> mask);

        public static unsafe Vector256<nint> MaskLoad(nint* address, Vector256<nint> mask);
        public static unsafe Vector256<nuint> MaskLoad(nuint* address, Vector256<nuint> mask);

        public static unsafe void MaskStore(nint* address, Vector128<nint> mask, Vector128<nint> source);
        public static unsafe void MaskStore(nuint* address, Vector128<nuint> mask, Vector128<nuint> source);

        public static unsafe void MaskStore(nint* address, Vector256<nint> mask, Vector256<nint> source);
        public static unsafe void MaskStore(nuint* address, Vector256<nuint> mask, Vector256<nuint> source);

        public static Vector256<nint> Or(Vector256<nint> left, Vector256<nint> right);
        public static Vector256<nuint> Or(Vector256<nuint> left, Vector256<nuint> right);

        public static new Vector256<nint> Permute2x128(Vector256<nint> left, Vector256<nint> right, byte control);
        public static new Vector256<nuint> Permute2x128(Vector256<nuint> left, Vector256<nuint> right, byte control);

        public static Vector256<nint> Permute4x64(Vector256<nint> value, byte control);
        public static Vector256<nuint> Permute4x64(Vector256<nuint> value, byte control);

        public static Vector256<nint> ShiftLeftLogical(Vector256<nint> value, Vector128<nint> count);
        public static Vector256<nuint> ShiftLeftLogical(Vector256<nuint> value, Vector128<nuint> count);

        public static Vector256<nint> ShiftLeftLogical(Vector256<nint> value, byte count);
        public static Vector256<nuint> ShiftLeftLogical(Vector256<nuint> value, byte count);

        public static Vector256<nint> ShiftLeftLogical128BitLane(Vector256<nint> value, byte numBytes);
        public static Vector256<nuint> ShiftLeftLogical128BitLane(Vector256<nuint> value, byte numBytes);

        public static Vector256<nint> ShiftLeftLogicalVariable(Vector256<nint> value, Vector256<nuint> count);
        public static Vector256<nuint> ShiftLeftLogicalVariable(Vector256<nuint> value, Vector256<nuint> count);

        public static Vector128<nint> ShiftLeftLogicalVariable(Vector128<nint> value, Vector128<nuint> count);
        public static Vector128<nuint> ShiftLeftLogicalVariable(Vector128<nuint> value, Vector128<nuint> count);

        public static Vector256<nint> ShiftRightLogical(Vector256<nint> value, Vector128<nint> count);
        public static Vector256<nuint> ShiftRightLogical(Vector256<nuint> value, Vector128<nuint> count);

        public static Vector256<nint> ShiftRightLogical(Vector256<nint> value, byte count);
        public static Vector256<nuint> ShiftRightLogical(Vector256<nuint> value, byte count);

        public static Vector256<nint> ShiftRightLogical128BitLane(Vector256<nint> value, byte numBytes);
        public static Vector256<nuint> ShiftRightLogical128BitLane(Vector256<nuint> value, byte numBytes);

        public static Vector256<nint> ShiftRightLogicalVariable(Vector256<nint> value, Vector256<nuint> count);
        public static Vector256<nuint> ShiftRightLogicalVariable(Vector256<nuint> value, Vector256<nuint> count);

        public static Vector128<nint> ShiftRightLogicalVariable(Vector128<nint> value, Vector128<nuint> count);
        public static Vector128<nuint> ShiftRightLogicalVariable(Vector128<nuint> value, Vector128<nuint> count);

        public static Vector256<nint> Subtract(Vector256<nint> left, Vector256<nint> right);
        public static Vector256<nuint> Subtract(Vector256<nuint> left, Vector256<nuint> right);

        public static Vector256<nint> UnpackHigh(Vector256<nint> left, Vector256<nint> right);
        public static Vector256<nuint> UnpackHigh(Vector256<nuint> left, Vector256<nuint> right);

        public static Vector256<nint> UnpackLow(Vector256<nint> left, Vector256<nint> right);
        public static Vector256<nuint> UnpackLow(Vector256<nuint> left, Vector256<nuint> right);

        public static Vector256<nint> Xor(Vector256<nint> left, Vector256<nint> right);
        public static Vector256<nuint> Xor(Vector256<nuint> left, Vector256<nuint> right);
    }

    public abstract partial class Bmi1
    {
        public static nuint AndNot(nuint left, nuint right);

        public static nuint BitFieldExtract(nuint value, byte start, byte length);
        public static nuint BitFieldExtract(nuint value, ushort control);

        public static nuint ExtractLowestSetBit(nuint value);

        public static nuint GetMaskUpToLowestSetBit(nuint value);

        public static nuint ResetLowestSetBit(nuint value);

        public static nuint TrailingZeroCount(nuint value);
    }

    public abstract partial class Bmi2
    {
        public static nuint ZeroHighBits(nuint value, nuint index);

        public static nuint MultiplyNoFlags(nuint left, nuint right);
        public static unsafe nuint MultiplyNoFlags(nuint left, nuint right, nuint* low);

        public static nuint ParallelBitDeposit(nuint value, nuint mask);

        public static nuint ParallelBitExtract(nuint value, nuint mask);
    }

    public abstract partial class Lzcnt
    {
        public static nuint LeadingZeroCount(nuint value);
    }

    public abstract partial class Popcnt
    {
        public static nuint PopCount(nuint value);
    }
}
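For context, here is a minimal sketch of the `Bmi1` vs `Bmi1.X64` dispatch that callers working with native-sized integers write today, and which an overload such as `Bmi1.ResetLowestSetBit(nuint)` from the listing above would presumably collapse into a single call. `NativeBits` is a hypothetical helper name, not part of the proposal; it relies only on the existing `uint` and `ulong` overloads plus a software fallback.

```csharp
using System;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper (an assumption for illustration, not part of the proposal).
public static class NativeBits
{
    public static nuint ResetLowestSetBit(nuint value)
    {
        if (Bmi1.X64.IsSupported)
        {
            // 64-bit process with BMI1: nuint is 64 bits wide, use the X64 overload.
            return (nuint)Bmi1.X64.ResetLowestSetBit((ulong)value);
        }

        if (Bmi1.IsSupported && IntPtr.Size == 4)
        {
            // 32-bit process with BMI1: nuint is 32 bits wide, use the base overload.
            return Bmi1.ResetLowestSetBit((uint)value);
        }

        // Software fallback: value & (value - 1) clears the lowest set bit.
        return unchecked(value & (value - (nuint)1));
    }
}
```

Calling `NativeBits.ResetLowestSetBit((nuint)0b1100)` yields `0b1000` on every path. The size check and the casts are the boilerplate a native-sized overload would remove.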
Author: tannergooding
Assignees: -
Labels: api-suggestion, area-System.Runtime.Intrinsics, untriaged
Milestone: -

@tannergooding added the api-ready-for-review label (API is ready for review, it is NOT ready for implementation) and removed the api-suggestion (Early API idea and discussion, it is NOT ready for implementation) and untriaged (New issue has not been triaged by the area owner) labels on Apr 28, 2021
@bartonjs
Member

bartonjs commented May 27, 2021

Video

Looks good as proposed.

namespace System.Runtime.Intrinsics.X86
{
    public abstract partial class Sse
    {
        public static Vector128<float> ConvertScalarToVector128Single(Vector128<float> upper, nint value);

        public static nint ConvertToNInt(Vector128<float> value);
        public static nint ConvertToNIntWithTruncation(Vector128<float> value);
    }

    public abstract partial class Sse2
    {
        public static Vector128<double> ConvertScalarToVector128Double(Vector128<double> upper, nint value);

        public static Vector128<nint> ConvertScalarToVector128NInt(nint value);
        public static Vector128<nuint> ConvertScalarToVector128NUInt(nuint value);

        public static nint ConvertToNInt(Vector128<double> value);
        public static nint ConvertToNIntWithTruncation(Vector128<double> value);

        public static nint ConvertToNInt(Vector128<nint> value);
        public static nuint ConvertToNUInt(Vector128<nuint> value);

        public static unsafe void StoreNonTemporal(nint* address, nint value);
        public static unsafe void StoreNonTemporal(nuint* address, nuint value);

        public static Vector128<nint> Add(Vector128<nint> left, Vector128<nint> right);
        public static Vector128<nuint> Add(Vector128<nuint> left, Vector128<nuint> right);

        public static Vector128<nint> And(Vector128<nint> left, Vector128<nint> right);
        public static Vector128<nuint> And(Vector128<nuint> left, Vector128<nuint> right);

        public static Vector128<nint> AndNot(Vector128<nint> left, Vector128<nint> right);
        public static Vector128<nuint> AndNot(Vector128<nuint> left, Vector128<nuint> right);

        public static unsafe Vector128<nint> LoadVector128(nint* address);
        public static unsafe Vector128<nuint> LoadVector128(nuint* address);

        public static unsafe Vector128<nint> LoadAlignedVector128(nint* address);
        public static unsafe Vector128<nuint> LoadAlignedVector128(nuint* address);

        public static unsafe Vector128<nint> LoadScalarVector128(nint* address);
        public static unsafe Vector128<nuint> LoadScalarVector128(nuint* address);

        public static Vector128<nint> MoveScalar(Vector128<nint> value);
        public static Vector128<nuint> MoveScalar(Vector128<nuint> value);

        public static Vector128<nint> Or(Vector128<nint> left, Vector128<nint> right);
        public static Vector128<nuint> Or(Vector128<nuint> left, Vector128<nuint> right);

        public static Vector128<nint> ShiftLeftLogical(Vector128<nint> value, Vector128<nint> count);
        public static Vector128<nuint> ShiftLeftLogical(Vector128<nuint> value, Vector128<nuint> count);

        public static Vector128<nint> ShiftLeftLogical(Vector128<nint> value, byte count);
        public static Vector128<nuint> ShiftLeftLogical(Vector128<nuint> value, byte count);

        public static Vector128<nint> ShiftLeftLogical128BitLane(Vector128<nint> value, byte numBytes);
        public static Vector128<nuint> ShiftLeftLogical128BitLane(Vector128<nuint> value, byte numBytes);

        public static Vector128<nint> ShiftRightLogical(Vector128<nint> value, Vector128<nint> count);
        public static Vector128<nuint> ShiftRightLogical(Vector128<nuint> value, Vector128<nuint> count);

        public static Vector128<nint> ShiftRightLogical(Vector128<nint> value, byte count);
        public static Vector128<nuint> ShiftRightLogical(Vector128<nuint> value, byte count);

        public static Vector128<nint> ShiftRightLogical128BitLane(Vector128<nint> value, byte numBytes);
        public static Vector128<nuint> ShiftRightLogical128BitLane(Vector128<nuint> value, byte numBytes);

        public static unsafe void StoreScalar(nint* address, Vector128<nint> source);
        public static unsafe void StoreScalar(nuint* address, Vector128<nuint> source);

        public static unsafe void StoreAligned(nint* address, Vector128<nint> source);
        public static unsafe void StoreAligned(nuint* address, Vector128<nuint> source);

        public static unsafe void StoreAlignedNonTemporal(nint* address, Vector128<nint> source);
        public static unsafe void StoreAlignedNonTemporal(nuint* address, Vector128<nuint> source);

        public static unsafe void Store(nint* address, Vector128<nint> source);
        public static unsafe void Store(nuint* address, Vector128<nuint> source);

        public static Vector128<nint> Subtract(Vector128<nint> left, Vector128<nint> right);
        public static Vector128<nuint> Subtract(Vector128<nuint> left, Vector128<nuint> right);

        public static Vector128<nint> UnpackHigh(Vector128<nint> left, Vector128<nint> right);
        public static Vector128<nuint> UnpackHigh(Vector128<nuint> left, Vector128<nuint> right);

        public static Vector128<nint> UnpackLow(Vector128<nint> left, Vector128<nint> right);
        public static Vector128<nuint> UnpackLow(Vector128<nuint> left, Vector128<nuint> right);

        public static Vector128<nint> Xor(Vector128<nint> left, Vector128<nint> right);
        public static Vector128<nuint> Xor(Vector128<nuint> left, Vector128<nuint> right);
    }

    public abstract partial class Sse3
    {
        public static unsafe Vector128<nint> LoadDquVector128(nint* address);
        public static unsafe Vector128<nuint> LoadDquVector128(nuint* address);
    }

    public abstract partial class Ssse3
    {
        public static Vector128<nint> AlignRight(Vector128<nint> left, Vector128<nint> right, byte mask);
        public static Vector128<nuint> AlignRight(Vector128<nuint> left, Vector128<nuint> right, byte mask);
    }

    public abstract partial class Sse41
    {
        public static nint Extract(Vector128<nint> value, byte index);
        public static nuint Extract(Vector128<nuint> value, byte index);

        public static Vector128<nint> Insert(Vector128<nint> value, nint data, byte index);
        public static Vector128<nuint> Insert(Vector128<nuint> value, nuint data, byte index);

        public static Vector128<nint> BlendVariable(Vector128<nint> left, Vector128<nint> right, Vector128<nint> mask);
        public static Vector128<nuint> BlendVariable(Vector128<nuint> left, Vector128<nuint> right, Vector128<nuint> mask);

        public static Vector128<nint> CompareEqual(Vector128<nint> left, Vector128<nint> right);
        public static Vector128<nuint> CompareEqual(Vector128<nuint> left, Vector128<nuint> right);

        public static Vector128<nint> ConvertToVector128NInt(Vector128<sbyte> value);
        public static Vector128<nint> ConvertToVector128NInt(Vector128<byte> value);
        public static Vector128<nint> ConvertToVector128NInt(Vector128<short> value);
        public static Vector128<nint> ConvertToVector128NInt(Vector128<ushort> value);
        public static Vector128<nint> ConvertToVector128NInt(Vector128<int> value);
        public static Vector128<nint> ConvertToVector128NInt(Vector128<uint> value);

        public static unsafe Vector128<nint> ConvertToVector128NInt(sbyte* address);
        public static unsafe Vector128<nint> ConvertToVector128NInt(byte* address);
        public static unsafe Vector128<nint> ConvertToVector128NInt(short* address);
        public static unsafe Vector128<nint> ConvertToVector128NInt(ushort* address);
        public static unsafe Vector128<nint> ConvertToVector128NInt(int* address);
        public static unsafe Vector128<nint> ConvertToVector128NInt(uint* address);

        public static Vector128<nint> Multiply(Vector128<int> left, Vector128<int> right);

        public static unsafe Vector128<nint> LoadAlignedVector128NonTemporal(nint* address);
        public static unsafe Vector128<nuint> LoadAlignedVector128NonTemporal(nuint* address);

        public static bool TestC(Vector128<nint> left, Vector128<nint> right);
        public static bool TestC(Vector128<nuint> left, Vector128<nuint> right);

        public static bool TestNotZAndNotC(Vector128<nint> left, Vector128<nint> right);
        public static bool TestNotZAndNotC(Vector128<nuint> left, Vector128<nuint> right);

        public static bool TestZ(Vector128<nint> left, Vector128<nint> right);
        public static bool TestZ(Vector128<nuint> left, Vector128<nuint> right);
    }

    public abstract partial class Sse42
    {
        public static Vector128<nint> CompareGreaterThan(Vector128<nint> left, Vector128<nint> right);

        public static nuint Crc32(nuint crc, nuint data);
    }

    public abstract partial class Avx
    {
        public static Vector128<nint> ExtractVector128(Vector256<nint> value, byte index);
        public static Vector128<nuint> ExtractVector128(Vector256<nuint> value, byte index);

        public static Vector256<nint> InsertVector128(Vector256<nint> value, Vector128<nint> data, byte index);
        public static Vector256<nuint> InsertVector128(Vector256<nuint> value, Vector128<nuint> data, byte index);

        public static unsafe Vector256<nint> LoadVector256(nint* address);
        public static unsafe Vector256<nuint> LoadVector256(nuint* address);

        public static unsafe Vector256<nint> LoadAlignedVector256(nint* address);
        public static unsafe Vector256<nuint> LoadAlignedVector256(nuint* address);

        public static unsafe Vector256<nint> LoadDquVector256(nint* address);
        public static unsafe Vector256<nuint> LoadDquVector256(nuint* address);

        public static Vector256<nint> Permute2x128(Vector256<nint> left, Vector256<nint> right, byte control);
        public static Vector256<nuint> Permute2x128(Vector256<nuint> left, Vector256<nuint> right, byte control);

        public static unsafe void StoreAligned(nint* address, Vector256<nint> source);
        public static unsafe void StoreAligned(nuint* address, Vector256<nuint> source);

        public static unsafe void StoreAlignedNonTemporal(nint* address, Vector256<nint> source);
        public static unsafe void StoreAlignedNonTemporal(nuint* address, Vector256<nuint> source);

        public static unsafe void Store(nint* address, Vector256<nint> source);
        public static unsafe void Store(nuint* address, Vector256<nuint> source);

        public static bool TestC(Vector256<nint> left, Vector256<nint> right);
        public static bool TestC(Vector256<nuint> left, Vector256<nuint> right);

        public static bool TestNotZAndNotC(Vector256<nint> left, Vector256<nint> right);
        public static bool TestNotZAndNotC(Vector256<nuint> left, Vector256<nuint> right);

        public static bool TestZ(Vector256<nint> left, Vector256<nint> right);
        public static bool TestZ(Vector256<nuint> left, Vector256<nuint> right);
    }

    public abstract partial class Avx2
    {
        public static Vector256<nint> Add(Vector256<nint> left, Vector256<nint> right);
        public static Vector256<nuint> Add(Vector256<nuint> left, Vector256<nuint> right);

        public static Vector256<nint> AlignRight(Vector256<nint> left, Vector256<nint> right, byte mask);
        public static Vector256<nuint> AlignRight(Vector256<nuint> left, Vector256<nuint> right, byte mask);

        public static Vector256<nint> And(Vector256<nint> left, Vector256<nint> right);
        public static Vector256<nuint> And(Vector256<nuint> left, Vector256<nuint> right);

        public static Vector256<nint> AndNot(Vector256<nint> left, Vector256<nint> right);
        public static Vector256<nuint> AndNot(Vector256<nuint> left, Vector256<nuint> right);

        public static Vector256<nint> BlendVariable(Vector256<nint> left, Vector256<nint> right, Vector256<nint> mask);
        public static Vector256<nuint> BlendVariable(Vector256<nuint> left, Vector256<nuint> right, Vector256<nuint> mask);

        public static Vector128<nint> BroadcastScalarToVector128(Vector128<nint> value);
        public static Vector128<nuint> BroadcastScalarToVector128(Vector128<nuint> value);

        public static unsafe Vector128<nint> BroadcastScalarToVector128(nint* source);
        public static unsafe Vector128<nuint> BroadcastScalarToVector128(nuint* source);

        public static Vector256<nint> BroadcastScalarToVector256(Vector128<nint> value);
        public static Vector256<nuint> BroadcastScalarToVector256(Vector128<nuint> value);

        public static unsafe Vector256<nint> BroadcastScalarToVector256(nint* source);
        public static unsafe Vector256<nuint> BroadcastScalarToVector256(nuint* source);

        public static unsafe Vector256<nint> BroadcastVector128ToVector256(nint* address);
        public static unsafe Vector256<nuint> BroadcastVector128ToVector256(nuint* address);

        public static Vector256<nint> CompareEqual(Vector256<nint> left, Vector256<nint> right);
        public static Vector256<nuint> CompareEqual(Vector256<nuint> left, Vector256<nuint> right);

        public static Vector256<nint> CompareGreaterThan(Vector256<nint> left, Vector256<nint> right);

        public static Vector256<nint> ConvertToVector256NInt(Vector128<sbyte> value);
        public static Vector256<nint> ConvertToVector256NInt(Vector128<byte> value);
        public static Vector256<nint> ConvertToVector256NInt(Vector128<short> value);
        public static Vector256<nint> ConvertToVector256NInt(Vector128<ushort> value);
        public static Vector256<nint> ConvertToVector256NInt(Vector128<int> value);
        public static Vector256<nint> ConvertToVector256NInt(Vector128<uint> value);

        public static unsafe Vector256<nint> ConvertToVector256NInt(sbyte* address);
        public static unsafe Vector256<nint> ConvertToVector256NInt(byte* address);
        public static unsafe Vector256<nint> ConvertToVector256NInt(short* address);
        public static unsafe Vector256<nint> ConvertToVector256NInt(ushort* address);
        public static unsafe Vector256<nint> ConvertToVector256NInt(int* address);
        public static unsafe Vector256<nint> ConvertToVector256NInt(uint* address);

        public static new Vector128<nint> ExtractVector128(Vector256<nint> value, byte index);
        public static new Vector128<nuint> ExtractVector128(Vector256<nuint> value, byte index);

        public static unsafe Vector128<nint> GatherVector128(nint* baseAddress, Vector128<int> index, byte scale);
        public static unsafe Vector128<nuint> GatherVector128(nuint* baseAddress, Vector128<int> index, byte scale);

        public static unsafe Vector128<int> GatherVector128(int* baseAddress, Vector128<nint> index, byte scale);
        public static unsafe Vector128<uint> GatherVector128(uint* baseAddress, Vector128<nint> index, byte scale);
        public static unsafe Vector128<long> GatherVector128(long* baseAddress, Vector128<nint> index, byte scale);
        public static unsafe Vector128<ulong> GatherVector128(ulong* baseAddress, Vector128<nint> index, byte scale);
        public static unsafe Vector128<nint> GatherVector128(nint* baseAddress, Vector128<nint> index, byte scale);
        public static unsafe Vector128<nuint> GatherVector128(nuint* baseAddress, Vector128<nint> index, byte scale);
        public static unsafe Vector128<float> GatherVector128(float* baseAddress, Vector128<nint> index, byte scale);
        public static unsafe Vector128<double> GatherVector128(double* baseAddress, Vector128<nint> index, byte scale);

        public static unsafe Vector256<nint> GatherVector256(nint* baseAddress, Vector128<int> index, byte scale);
        public static unsafe Vector256<nuint> GatherVector256(nuint* baseAddress, Vector128<int> index, byte scale);

        public static unsafe Vector128<int> GatherVector128(int* baseAddress, Vector256<nint> index, byte scale);
        public static unsafe Vector128<uint> GatherVector128(uint* baseAddress, Vector256<nint> index, byte scale);
        public static unsafe Vector256<long> GatherVector256(long* baseAddress, Vector256<nint> index, byte scale);
        public static unsafe Vector256<ulong> GatherVector256(ulong* baseAddress, Vector256<nint> index, byte scale);
        public static unsafe Vector256<nint> GatherVector256(nint* baseAddress, Vector256<nint> index, byte scale);
        public static unsafe Vector256<nuint> GatherVector256(nuint* baseAddress, Vector256<nint> index, byte scale);
        public static unsafe Vector128<float> GatherVector128(float* baseAddress, Vector256<nint> index, byte scale);
        public static unsafe Vector256<double> GatherVector256(double* baseAddress, Vector256<nint> index, byte scale);

        public static unsafe Vector128<nint> GatherMaskVector128(Vector128<nint> source, nint* baseAddress, Vector128<int> index, Vector128<nint> mask, byte scale);
        public static unsafe Vector128<nuint> GatherMaskVector128(Vector128<nuint> source, nuint* baseAddress, Vector128<int> index, Vector128<nuint> mask, byte scale);

        public static unsafe Vector128<int> GatherMaskVector128(Vector128<int> source, int* baseAddress, Vector128<nint> index, Vector128<int> mask, byte scale);
        public static unsafe Vector128<uint> GatherMaskVector128(Vector128<uint> source, uint* baseAddress, Vector128<nint> index, Vector128<uint> mask, byte scale);
        public static unsafe Vector128<long> GatherMaskVector128(Vector128<long> source, long* baseAddress, Vector128<nint> index, Vector128<long> mask, byte scale);
        public static unsafe Vector128<ulong> GatherMaskVector128(Vector128<ulong> source, ulong* baseAddress, Vector128<nint> index, Vector128<ulong> mask, byte scale);

        public static unsafe Vector128<nint> GatherMaskVector128(Vector128<nint> source, nint* baseAddress, Vector128<nint> index, Vector128<nint> mask, byte scale);
        public static unsafe Vector128<nuint> GatherMaskVector128(Vector128<nuint> source, nuint* baseAddress, Vector128<nint> index, Vector128<nuint> mask, byte scale);
        public static unsafe Vector128<float> GatherMaskVector128(Vector128<float> source, float* baseAddress, Vector128<nint> index, Vector128<float> mask, byte scale);
        public static unsafe Vector128<double> GatherMaskVector128(Vector128<double> source, double* baseAddress, Vector128<nint> index, Vector128<double> mask, byte scale);

        public static unsafe Vector256<nint> GatherMaskVector256(Vector256<nint> source, nint* baseAddress, Vector128<int> index, Vector256<nint> mask, byte scale);
        public static unsafe Vector256<nuint> GatherMaskVector256(Vector256<nuint> source, nuint* baseAddress, Vector128<int> index, Vector256<nuint> mask, byte scale);

        public static unsafe Vector128<int> GatherMaskVector128(Vector128<int> source, int* baseAddress, Vector256<nint> index, Vector128<int> mask, byte scale);
        public static unsafe Vector128<uint> GatherMaskVector128(Vector128<uint> source, uint* baseAddress, Vector256<nint> index, Vector128<uint> mask, byte scale);
        public static unsafe Vector256<long> GatherMaskVector256(Vector256<long> source, long* baseAddress, Vector256<nint> index, Vector256<long> mask, byte scale);
        public static unsafe Vector256<ulong> GatherMaskVector256(Vector256<ulong> source, ulong* baseAddress, Vector256<nint> index, Vector256<ulong> mask, byte scale);

        public static unsafe Vector256<nint> GatherMaskVector256(Vector256<nint> source, nint* baseAddress, Vector256<nint> index, Vector256<nint> mask, byte scale);
        public static unsafe Vector256<nuint> GatherMaskVector256(Vector256<nuint> source, nuint* baseAddress, Vector256<nint> index, Vector256<nuint> mask, byte scale);
        public static unsafe Vector128<float> GatherMaskVector128(Vector128<float> source, float* baseAddress, Vector256<nint> index, Vector128<float> mask, byte scale);
        public static unsafe Vector256<double> GatherMaskVector256(Vector256<double> source, double* baseAddress, Vector256<nint> index, Vector256<double> mask, byte scale);

        public static new Vector256<nint> InsertVector128(Vector256<nint> value, Vector128<nint> data, byte index);
        public static new Vector256<nuint> InsertVector128(Vector256<nuint> value, Vector128<nuint> data, byte index);

        public static unsafe Vector256<nint> LoadAlignedVector256NonTemporal(nint* address);
        public static unsafe Vector256<nuint> LoadAlignedVector256NonTemporal(nuint* address);

        public static unsafe Vector128<nint> MaskLoad(nint* address, Vector128<nint> mask);
        public static unsafe Vector128<nuint> MaskLoad(nuint* address, Vector128<nuint> mask);

        public static unsafe Vector256<nint> MaskLoad(nint* address, Vector256<nint> mask);
        public static unsafe Vector256<nuint> MaskLoad(nuint* address, Vector256<nuint> mask);

        public static unsafe void MaskStore(nint* address, Vector128<nint> mask, Vector128<nint> source);
        public static unsafe void MaskStore(nuint* address, Vector128<nuint> mask, Vector128<nuint> source);

        public static unsafe void MaskStore(nint* address, Vector256<nint> mask, Vector256<nint> source);
        public static unsafe void MaskStore(nuint* address, Vector256<nuint> mask, Vector256<nuint> source);

        public static Vector256<nint> Or(Vector256<nint> left, Vector256<nint> right);
        public static Vector256<nuint> Or(Vector256<nuint> left, Vector256<nuint> right);

        public static new Vector256<nint> Permute2x128(Vector256<nint> left, Vector256<nint> right, byte control);
        public static new Vector256<nuint> Permute2x128(Vector256<nuint> left, Vector256<nuint> right, byte control);

        public static Vector256<nint> Permute4x64(Vector256<nint> value, byte control);
        public static Vector256<nuint> Permute4x64(Vector256<nuint> value, byte control);

        public static Vector256<nint> ShiftLeftLogical(Vector256<nint> value, Vector128<nint> count);
        public static Vector256<nuint> ShiftLeftLogical(Vector256<nuint> value, Vector128<nuint> count);

        public static Vector256<nint> ShiftLeftLogical(Vector256<nint> value, byte count);
        public static Vector256<nuint> ShiftLeftLogical(Vector256<nuint> value, byte count);

        public static Vector256<nint> ShiftLeftLogical128BitLane(Vector256<nint> value, byte numBytes);
        public static Vector256<nuint> ShiftLeftLogical128BitLane(Vector256<nuint> value, byte numBytes);

        public static Vector256<nint> ShiftLeftLogicalVariable(Vector256<nint> value, Vector256<nuint> count);
        public static Vector256<nuint> ShiftLeftLogicalVariable(Vector256<nuint> value, Vector256<nuint> count);

        public static Vector128<nint> ShiftLeftLogicalVariable(Vector128<nint> value, Vector128<nuint> count);
        public static Vector128<nuint> ShiftLeftLogicalVariable(Vector128<nuint> value, Vector128<nuint> count);

        public static Vector256<nint> ShiftRightLogical(Vector256<nint> value, Vector128<nint> count);
        public static Vector256<nuint> ShiftRightLogical(Vector256<nuint> value, Vector128<nuint> count);

        public static Vector256<nint> ShiftRightLogical(Vector256<nint> value, byte count);
        public static Vector256<nuint> ShiftRightLogical(Vector256<nuint> value, byte count);

        public static Vector256<nint> ShiftRightLogical128BitLane(Vector256<nint> value, byte numBytes);
        public static Vector256<nuint> ShiftRightLogical128BitLane(Vector256<nuint> value, byte numBytes);

        public static Vector256<nint> ShiftRightLogicalVariable(Vector256<nint> value, Vector256<nuint> count);
        public static Vector256<nuint> ShiftRightLogicalVariable(Vector256<nuint> value, Vector256<nuint> count);

        public static Vector128<nint> ShiftRightLogicalVariable(Vector128<nint> value, Vector128<nuint> count);
        public static Vector128<nuint> ShiftRightLogicalVariable(Vector128<nuint> value, Vector128<nuint> count);

        public static Vector256<nint> Subtract(Vector256<nint> left, Vector256<nint> right);
        public static Vector256<nuint> Subtract(Vector256<nuint> left, Vector256<nuint> right);

        public static Vector256<nint> UnpackHigh(Vector256<nint> left, Vector256<nint> right);
        public static Vector256<nuint> UnpackHigh(Vector256<nuint> left, Vector256<nuint> right);

        public static Vector256<nint> UnpackLow(Vector256<nint> left, Vector256<nint> right);
        public static Vector256<nuint> UnpackLow(Vector256<nuint> left, Vector256<nuint> right);

        public static Vector256<nint> Xor(Vector256<nint> left, Vector256<nint> right);
        public static Vector256<nuint> Xor(Vector256<nuint> left, Vector256<nuint> right);
    }

    public abstract partial class Bmi1
    {
        public static nuint AndNot(nuint left, nuint right);

        public static nuint BitFieldExtract(nuint value, byte start, byte length);
        public static nuint BitFieldExtract(nuint value, ushort control);

        public static nuint ExtractLowestSetBit(nuint value);

        public static nuint GetMaskUpToLowestSetBit(nuint value);

        public static nuint ResetLowestSetBit(nuint value);

        public static nuint TrailingZeroCount(nuint value);
    }

    public abstract partial class Bmi2
    {
        public static nuint ZeroHighBits(nuint value, nuint index);

        public static nuint MultiplyNoFlags(nuint left, nuint right);
        public static unsafe nuint MultiplyNoFlags(nuint left, nuint right, nuint* low);

        public static nuint ParallelBitDeposit(nuint value, nuint mask);

        public static nuint ParallelBitExtract(nuint value, nuint mask);
    }

    public abstract partial class Lzcnt
    {
        public static nuint LeadingZeroCount(nuint value);
    }

    public abstract partial class Popcnt
    {
        public static nuint PopCount(nuint value);
    }
}

@bartonjs added the api-approved label and removed the api-ready-for-review label on May 27, 2021
@tannergooding added this to the Future milestone on Jun 17, 2021
@deeprobin
Contributor

If the intrinsics are already built in on the C++ side and only the C# implementation is missing, I'd be happy to take a look when I have time. @tannergooding Feel free to assign me (unless it's something extremely urgent).

@tannergooding
Member Author

@deeprobin, they are built in on the C++ side "for the most part"; that is, the total set of functionality exists.

However, many of these represent intrinsics where TYP_LONG is only valid on 64-bit, in which case the instruction tables and importation logic may have to be fixed up.

For example, today we have the following, and would be adding the line marked with +:

    public abstract class Lzcnt : X86Base
    {
        internal Lzcnt() { }

        public static new bool IsSupported { [Intrinsic] get { return false; } }

        public new abstract class X64 : X86Base.X64
        {
            internal X64() { }

            public static new bool IsSupported { [Intrinsic] get { return false; } }

            public static ulong LeadingZeroCount(ulong value) => LeadingZeroCount(value);
        }

        public static uint LeadingZeroCount(uint value) => LeadingZeroCount(value);

+       public static nuint LeadingZeroCount(nuint value) => LeadingZeroCount(value);
    }

Because the uint64 variant exists only in Lzcnt.X64, the tables in https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/hwintrinsiclistxarch.h look like this:

// ***************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************
//                 ISA              Function name                               SIMD size       NumArg                                                                                                         Instructions                                                                                                                             Category                            Flags
//                                                                                                      {TYP_BYTE,              TYP_UBYTE,              TYP_SHORT,              TYP_USHORT,             TYP_INT,                TYP_UINT,               TYP_LONG,               TYP_ULONG,              TYP_FLOAT,              TYP_DOUBLE}
// ***************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************
//  LZCNT Intrinsics
HARDWARE_INTRINSIC(LZCNT,           LeadingZeroCount,                            0,              1,     {INS_invalid,           INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_lzcnt,              INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid},           HW_Category_Scalar,                 HW_Flag_NoFloatingPointUsed|HW_Flag_NoRMWSemantics|HW_Flag_MultiIns)

HARDWARE_INTRINSIC(LZCNT_X64,       LeadingZeroCount,                            0,              1,     {INS_invalid,           INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_lzcnt,              INS_invalid,            INS_invalid},           HW_Category_Scalar,                 HW_Flag_NoFloatingPointUsed|HW_Flag_NoRMWSemantics|HW_Flag_MultiIns)

Note that LZCNT_LeadingZeroCount shows TYP_ULONG as INS_invalid. You would need to update this so that it has the relevant entry from LZCNT_X64_LeadingZeroCount, which is INS_lzcnt.

You may have to search for cases of LZCNT_LeadingZeroCount and LZCNT_X64_LeadingZeroCount to ensure that there are no stray asserts or other logic expecting that it is only TYP_UINT or TYP_ULONG, respectively.

The same would need to be done for each intrinsic that fits the same "scenario".
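
The table fix-up described above can be modeled in a few lines. The following is a minimal, self-contained C++ sketch rather than actual JIT code: the `BaseType` and `Instruction` enums, the `InsRow` alias, and the `NativeUnsignedType` and `Lookup` helpers are hypothetical stand-ins, and it assumes that `nuint` is treated as `TYP_ULONG` on 64-bit targets and as `TYP_UINT` on 32-bit targets.

```cpp
#include <array>
#include <cassert>

// Hypothetical, simplified stand-ins for the JIT's type and instruction enums.
// The order mirrors the column order of the table header shown above.
enum BaseType
{
    TYP_BYTE, TYP_UBYTE, TYP_SHORT, TYP_USHORT, TYP_INT,
    TYP_UINT, TYP_LONG, TYP_ULONG, TYP_FLOAT, TYP_DOUBLE,
    TYP_COUNT
};

enum Instruction { INS_invalid, INS_lzcnt };

// One instruction slot per base type, like the braces in a HARDWARE_INTRINSIC row.
using InsRow = std::array<Instruction, TYP_COUNT>;

// LZCNT LeadingZeroCount as listed today: only the TYP_UINT slot is populated.
const InsRow LzcntRowToday = {
    INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid,
    INS_lzcnt,   INS_invalid, INS_invalid, INS_invalid, INS_invalid};

// The same row after the fix-up: TYP_ULONG takes the entry from the LZCNT_X64 row.
const InsRow LzcntRowUpdated = {
    INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid,
    INS_lzcnt,   INS_invalid, INS_lzcnt,   INS_invalid, INS_invalid};

// Assumption: nuint resolves to TYP_ULONG on 64-bit and TYP_UINT on 32-bit.
inline BaseType NativeUnsignedType(bool is64Bit)
{
    return is64Bit ? TYP_ULONG : TYP_UINT;
}

// Returns the instruction registered for the given base type in a row.
inline Instruction Lookup(const InsRow& row, BaseType type)
{
    assert(type < TYP_COUNT);
    return row[type];
}
```

In this model, a 64-bit `nuint` lookup against `LzcntRowToday` finds `INS_invalid`, while the same lookup against `LzcntRowUpdated` finds `INS_lzcnt` for both the 32-bit and 64-bit cases.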

@ghost added the in-pr label on Jan 10, 2022
@ghost removed the in-pr label on Jun 19, 2022