Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: Vector128.ShuffleUnsafe #81609

Open
MihaZupan opened this issue Feb 3, 2023 · 3 comments · May be fixed by #99596
Open

[API Proposal]: Vector128.ShuffleUnsafe #81609

MihaZupan opened this issue Feb 3, 2023 · 3 comments · May be fixed by #99596
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime.Intrinsics in-pr There is an active PR which will close this issue when it is merged
Milestone

Comments

@MihaZupan
Copy link
Member

MihaZupan commented Feb 3, 2023

Background and motivation

When implementing vectorized algorithms using the new Vector128 and friends APIs, Vector128.Shuffle is a case where one sometimes has to drop back down to using architecture-specific instructions.

Vector128.Shuffle guarantees that results for out-of-range indices (values above 15) are normalized to 0.
If the JIT can't prove that indices are already in the [0, 15] range, it is forced to emit extra logic to do the normalization.
For some algorithms, you may also want exactly the behavior of the underlying intrinsic (e.g. X86's PSHUFB will treat indices MOD 16, and set the result to 0 if the high bit is set).

The solution in both cases is to use the architecture-specific instruction directly, often writing a helper such as

private static Vector128<byte> Shuffle(Vector128<byte> vector, Vector128<byte> indices)
{
    return Ssse3.IsSupported
        ? Ssse3.Shuffle(vector, indices)
        : AdvSimd.Arm64.VectorTableLookup(vector, indices);
}

(We already have 5 such helpers in runtime)

I propose adding a Vector128.ShuffleUnsafe helper that does not guarantee identical behavior for out-of-range indices across platforms. Instead, it would be defined as using the underlying intrinsic directly when available.

API Proposal

namespace System.Runtime.Intrinsics;

public static class Vector128
{
    // Existing
    public static Vector128<byte> Shuffle(Vector128<byte> vector, Vector128<byte> indices);

    // Proposed
    public static Vector128<byte> ShuffleUnsafe(Vector128<byte> vector, Vector128<byte> indices);
}

API Usage

-Vector128<byte> bitMask = Shuffle(bitmapLookup, lowNibbles);
-Vector128<byte> bitPositions = Shuffle(Vector128.Create(0x8040201008040201).AsByte(), 
+Vector128<byte> bitMask = Vector128.ShuffleUnsafe(bitmapLookup, lowNibbles);
+Vector128<byte> bitPositions = Vector128.ShuffleUnsafe(Vector128.Create(0x8040201008040201).AsByte(), highNibbles);

-private static Vector128<byte> Shuffle(Vector128<byte> vector, Vector128<byte> indices)
-{
-    return Ssse3.IsSupported
-        ? Ssse3.Shuffle(vector, indices)
-        : AdvSimd.Arm64.VectorTableLookup(vector, indices);
-}

Alternative Designs

No response

Risks

No response

@MihaZupan MihaZupan added api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime.Intrinsics labels Feb 3, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Feb 3, 2023
@ghost
Copy link

ghost commented Feb 3, 2023

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

When implementing vectorized algorithms using the new Vector128 and friends APIs, Vector128.Shuffle is a case where one sometimes has to drop back down to using architecture-specific instructions.

Vector128.Shuffle guarantees that results for out-of-range indices (values above 15) are normalized to 0.
If the JIT can't prove that indices are already in the [0, 15] range, it is forced to emit extra logic to do the normalization.
For some algorithms, you may also want exactly the behavior of the underlying intrinsic (e.g. X86's PSHUFB will treat indices MOD 16, and set the result to 0 if the high bit is set).

The solution in both cases is to use the architecture-specific instruction directly, often writing a helper such as

private static Vector128<byte> Shuffle(Vector128<byte> vector, Vector128<byte> indices)
{
    return Ssse3.IsSupported
        ? Ssse3.Shuffle(vector, indices)
        : AdvSimd.Arm64.VectorTableLookup(vector, indices);
}

(as of #80963, we already have 4 such helpers in runtime)

I propose adding a Vector128.ShuffleUnsafe helper that does not guarantee identical behavior for out-of-range indices across platforms. Instead, it would be defined as using the underlying intrinsic directly when available.

API Proposal

namespace System.Runtime.Intrinsics;

public static class Vector128
{
    // Existing
    public static Vector128<byte> Shuffle(Vector128<byte> vector, Vector128<byte> indices);

    // Proposed
    public static Vector128<byte> ShuffleUnsafe(Vector128<byte> vector, Vector128<byte> indices);
}

API Usage

-Vector128<byte> bitMask = Shuffle(bitmapLookup, lowNibbles);
-Vector128<byte> bitPositions = Shuffle(Vector128.Create(0x8040201008040201).AsByte(), 
+Vector128<byte> bitMask = Vector128.ShuffleUnsafe(bitmapLookup, lowNibbles);
+Vector128<byte> bitPositions = Vector128.ShuffleUnsafe(Vector128.Create(0x8040201008040201).AsByte(), highNibbles);

-private static Vector128<byte> Shuffle(Vector128<byte> vector, Vector128<byte> indices)
-{
-    return Ssse3.IsSupported
-        ? Ssse3.Shuffle(vector, indices)
-        : AdvSimd.Arm64.VectorTableLookup(vector, indices);
-}

Alternative Designs

No response

Risks

No response

Author: MihaZupan
Assignees: -
Labels:

api-suggestion, area-System.Runtime.Intrinsics

Milestone: -

@tannergooding tannergooding added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation untriaged New issue has not been triaged by the area owner labels Feb 3, 2023
@tannergooding tannergooding added this to the 8.0.0 milestone Feb 3, 2023
@terrajobst
Copy link
Member

terrajobst commented Mar 9, 2023

Video

  • Looks good as proposed
  • Overloads for other T's are approved by this, but we don't need them yet.
namespace System.Runtime.Intrinsics;

public partial class Vector64
{
    // Existing
    // public static Vector64<byte> Shuffle(Vector64<byte> vector, Vector64<byte> indices);

    public static Vector64<byte> ShuffleUnsafe(Vector64<byte> vector, Vector64<byte> indices);
}

public partial class Vector128
{
    // Existing
    // public static Vector128<byte> Shuffle(Vector128<byte> vector, Vector128<byte> indices);

    public static Vector128<byte> ShuffleUnsafe(Vector128<byte> vector, Vector128<byte> indices);
}

public partial class Vector256
{
    // Existing
    // public static Vector256<byte> Shuffle(Vector256<byte> vector, Vector256<byte> indices);

    public static Vector256<byte> ShuffleUnsafe(Vector256<byte> vector, Vector256<byte> indices);
}

public partial class Vector512
{
    // Existing
    // public static Vector512<byte> Shuffle(Vector512<byte> vector, Vector512<byte> indices);

    public static Vector512<byte> ShuffleUnsafe(Vector512<byte> vector, Vector512<byte> indices);
}

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Mar 9, 2023
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jul 9, 2023
@tannergooding tannergooding modified the milestones: 8.0.0, Future Jul 24, 2023
@tannergooding
Copy link
Member

This won't make .NET 8 and isn't critical enough to warrant a bar check given there are several trivial workarounds.

We can finish up the PR once CI is passing anytime after main opens for .NET 9 changes next month

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Nov 18, 2023
@hamarb123 hamarb123 linked a pull request Mar 12, 2024 that will close this issue
4 tasks
@dotnet-policy-service dotnet-policy-service bot added the in-pr There is an active PR which will close this issue when it is merged label Mar 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime.Intrinsics in-pr There is an active PR which will close this issue when it is merged
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants