Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a zero-cost mechanism for going between System.Numerics.Vector and System.Runtime.Intrinsics.Vector types #952

Closed
tannergooding opened this issue Mar 26, 2019 · 21 comments
Labels
api-approved API was approved in API review, it can be implemented area-System.Numerics
Milestone

Comments

@tannergooding
Copy link
Member

At the 2019 MVP Summit, I got some level of feedback that we do not currently have a zero-cost mechanism for converting between the various System.Numerics.Vector types and the System.Runtime.Intrinsic vector types.

We should come up with an API proposal to make this conversion process non-expensive.

  • For types like Vector4 <-> Vector128<float> this should be true zero-cost.
  • For types like Vector2 or Vector3 this should be zero-cost if the type is either already in register or if it was spilled to the stack as a TYP_SIMD16.
  • Vector<T> is a bit more troublesome given that it is dynamically sized.

Proposed API

// Vector2, Vector3, and Vector4 will be moved down to S.P.Corelib

namespace System.Runtime.Intrinsics
{
    public static partial class Vector128
    {
        // Converting to intrinsic types
        public static Vector128<float> AsVector128(this Vector2 value);
        public static Vector128<float> AsVector128(this Vector3 value);
        public static Vector128<float> AsVector128(this Vector4 value);
        public static Vector128<T> AsVector128<T>(this Vector<T> value) where T : struct;

        // Converting from intrinsic types
        public static Vector2 AsVector2(this Vector128<float> value);
        public static Vector3 AsVector3(this Vector128<float> value);
        public static Vector4 AsVector4(this Vector128<float> value);
        public static Vector<T> AsVector<T>(this Vector128<T> value) where T : struct;
    }

    public static partial class Vector256
    {
        // Converting to intrinsic types
        public static Vector256<T> AsVector256<T>(this Vector<T> value) where T : struct;

        // Converting from intrinsic types
        public static Vector<T> AsVector<T>(this Vector256<T> value) where T : struct;
    }
}
@tannergooding
Copy link
Member Author

CC. @CarolEidt as an FYI that this came up

@tannergooding
Copy link
Member Author

CC. @fiigii since you may have some interesting ideas in this area.

@tannergooding tannergooding self-assigned this Mar 26, 2019
@tannergooding
Copy link
Member Author

This comes up for libraries that were already using or exposing the general-purpose System.Numerics.Vector types, but which want to unlock additional perf using HWIntrinsics where possible.

@tannergooding
Copy link
Member Author

@CarolEidt, do you think that the following would be a feasible proposal for handling this?

namespace System.Runtime.Intrinsics
{
    // This is a new class and lives in the System.Numerics assembly
    // because Vector2/3/4 aren't in S.P.Corelib; if they were, I would
    // instead propose these methods be exposed on the existing
    // Vector64/128/256 classes where other similar methods are exposed

    public static partial class VectorExtensions
    {
        // Converting to intrinsic types
        public static Vector128<T> AsVector128(this Vector2 value);
        public static Vector128<T> AsVector128(this Vector3 value);
        public static Vector128<T> AsVector128(this Vector4 value);
        public static Vector128<T> AsVector128(this Vector<T> value);
        public static Vector256<T> AsVector256(this Vector<T> value);

        // Converting from intrinsic types
        public static Vector2 AsVector2(this Vector128<float> value);
        public static Vector3 AsVector3(this Vector128<float> value);
        public static Vector4 AsVector4(this Vector128<float> value);
        public static Vector<T> AsVector<T>(this Vector<T> value);
        public static Vector<T> AsVector<T>(this Vector<T> value);
    }
}

As mentioned above, Vector<T> is a bit interesting given you don't have a definite known size up from. However, this would "worst case" give you a subset or superset of the register, and users can switch based on Vector<T>.Count.

These are extension methods to workaround the current pain point of instance methods requiring the C# compiler to emit a stloc, ldloca sequence which the JIT doesn't optimize well.

@CarolEidt
Copy link
Contributor

This looks good to me. I think it is quite reasonable to truncate or zero-extend for the Vector<T> case.

One thing I'd like to see us do is to code up some tests and/or micro-benchmarks to ensure that these really are zero-cost. It was a bit disappointing to me that we have cases like dotnet/corefx#24912 where what should be zero cost actually isn't quite (still hoping to address that one).

@Gnbrkm41
Copy link
Contributor

Gnbrkm41 commented Oct 9, 2019

So would the usage look like:

if (Vector<int>.Count == Vector128<int>.Count)
{
    Vector128<int> vector = numericsVector.AsVector128();
    /* 128-bit path */
}
// Optionally, if AVX paths may be beneficial
else if (Vector<int>.Count == Vector256<int>.Count)
{
    Vector256<int>vector = numericsVector.AsVector256();
    /* 256-bit path */
}

?

also @CarolEidt, I guess you meant https://github.com/dotnet/coreclr/issues/24912 not dotnet/corefx#24912? :D

Edit - Added ISA checks:

if (Vector<int>.Count == Vector128<int>.Count && Sse2.IsSupported)
{
    Vector128<int> vector = numericsVector.AsVector128();
    /* 128-bit path */
}
// Optionally, if AVX paths may be beneficial
else if (Vector<int>.Count == Vector256<int>.Count && Avx2.IsSupported)
{
    Vector256<int>vector = numericsVector.AsVector256();
    /* 256-bit path */
}

@tannergooding
Copy link
Member Author

Fixed some typos in the proposal that @Gnbrkm41 pointed out on the C# Discord server 😄

@CarolEidt
Copy link
Contributor

I guess you meant dotnet/coreclr#24912 not dotnet/corefx#24912? :D

Yes, oops! Thanks for clearing that up.

@tannergooding
Copy link
Member Author

So would the usage look like:

@Gnbrkm41, yes I think that is a good example of how you might write the checks. Relying on the ISA checks to assume the size (e.g. assuming that Vector<T> is 32-bytes if AVX2 is supported) would be incorrect as there are some CLR Configuration Knobs to control it (although they may be non-retail knobs today).

So, I think having your example and then doing the respective ISA checks would be the "recommended" way.

@terrajobst
Copy link
Member

@tannergooding

Should this be in the 5.0 milestone?

@tannergooding
Copy link
Member Author

Should this be in the 5.0 milestone?

Yes. Thanks!

@terrajobst
Copy link
Member

terrajobst commented Oct 22, 2019

Video

Can we move Vector2, Vector3, Vector4 to corlib? It seems weird to add one more dumping ground for extension methods. Another benefit is that we'd end up with all intrinsically understood types to be corlib then.

@tannergooding
Copy link
Member Author

We agreed that this is api-approved if the types can be moved down. Otherwise we want to take another look:

namespace System.Runtime.Intrinsics
{
    public static partial class Vector128
    {
        // Converting to intrinsic types
        public static Vector128<float> AsVector128(this Vector2 value);
        public static Vector128<float> AsVector128(this Vector3 value);
        public static Vector128<float> AsVector128(this Vector4 value);
        public static Vector128<T> AsVector128<T>(this Vector<T> value) where T : struct;

        // Converting from intrinsic types
        public static Vector2 AsVector2(this Vector128<float> value);
        public static Vector3 AsVector3(this Vector128<float> value);
        public static Vector4 AsVector4(this Vector128<float> value);
        public static Vector<T> AsVector<T>(this Vector128<T> value) where T : struct;
    }

    public static partial class Vector256
    {
        // Converting to intrinsic types
        public static Vector256<T> AsVector256<T>(this Vector<T> value) where T : struct;

        // Converting from intrinsic types
        public static Vector<T> AsVector<T>(this Vector256<T> value) where T : struct;
    }
}

@tannergooding
Copy link
Member Author

@CarolEidt, @jkotas. Do you foresee any problems with moving Vector2/Vector3/Vector4 down to S.P.Corelib (given that they are already types specially recognized and handled by the JIT/VM). This would be a similar move as to what we did with Vector<T>.

@saucecontrol
Copy link
Member

Weren't you looking at making the JIT recognize any SIMD8/12/16 types whether System.Numerics or not? These conversions would be more useful if they weren't tied to the System.Numerics types specifically. And I thought the goal was to remove most/all of the Vector2/3/4 specific code from the JIT at some point anyway.

@jkotas
Copy link
Member

jkotas commented Oct 22, 2019

Do you foresee any problems with moving Vector2/Vector3/Vector4 down to S.P.Corelib

I do not see any problems with this.

@CarolEidt
Copy link
Contributor

Agree with @jkotas - I see no reason that moving these would cause issues.

@tannergooding
Copy link
Member Author

Weren't you looking at making the JIT recognize any SIMD8/12/16 types whether System.Numerics or not?

@saucecontrol, I believe that is a much harder problem. It would require the JIT to recognize these types of structs (which I believe it partially does for ABI reasons; but possibly only on Unix systems) and some pattern for converting between them and Vector128<T>/Vector256<T>.

Exposing a general purpose API for doing this conversion with any type might be hard, since you can't readily constrain something to have a particular "shape" today. I think specializing Unsafe.As is probably the closest thing we could get...

And I thought the goal was to remove most/all of the Vector2/3/4 specific code from the JIT at some point anyway.

That is something that I would like to see and something that these APIs should allow (provided prototyping, benchmarking, etc).

One of the issues that came up with https://github.com/dotnet/corefx/issues/31425 was that there is a non-trivial cost to converting between Vector2/3/4/Vector<T> and Vector128<T>/Vector256<T>. The same would apply to any user defined vector types.

However, the Vector2/3/4 and Vector<T> types are already special cased by the JIT type system and are documented to be hardware accelerated (without needing to do auto-vectorization analysis, etc). So, these APIs provide a way to convert between them and the other explicit vector types supported by the JIT.

@terrajobst
Copy link
Member

After talking to @tannergooding that makes sense. Unless @dotnet/fxdc says otherwise, I think we can consider this approved.

// Type forward Vector2,Vector3, and Vector4 from to System.Runtime (reference assembly) and System.Private.Corlib (implementatiom)
namespace System.Runtime.Intrinsics
{
    public static partial class Vector128
    {
        // Converting to intrinsic types
        public static Vector128<float> AsVector128(this Vector2 value);
        public static Vector128<float> AsVector128(this Vector3 value);
        public static Vector128<float> AsVector128(this Vector4 value);
        public static Vector128<T> AsVector128<T>(this Vector<T> value) where T : struct;

        // Converting from intrinsic types
        public static Vector2 AsVector2(this Vector128<float> value);
        public static Vector3 AsVector3(this Vector128<float> value);
        public static Vector4 AsVector4(this Vector128<float> value);
        public static Vector<T> AsVector<T>(this Vector128<T> value) where T : struct;
    }

    public static partial class Vector256
    {
        // Converting to intrinsic types
        public static Vector256<T> AsVector256<T>(this Vector<T> value) where T : struct;

        // Converting from intrinsic types
        public static Vector<T> AsVector<T>(this Vector256<T> value) where T : struct;
    }
}

@saucecontrol
Copy link
Member

@tannergooding thanks for the clarification. Agree that this API shape is a good step forward to unblocking a lot of things.

I think specializing Unsafe.As is probably the closest thing we could get...

This is what I was really hoping for, but I guess that's a different issue either way

@maryamariyan maryamariyan transferred this issue from dotnet/corefx Dec 16, 2019
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Numerics untriaged New issue has not been triaged by the area owner labels Dec 16, 2019
@maryamariyan maryamariyan added the api-approved API was approved in API review, it can be implemented label Dec 16, 2019
@maryamariyan maryamariyan added this to the 5.0 milestone Dec 16, 2019
@maryamariyan maryamariyan added this to To do in Hardware Intrinsics via automation Dec 16, 2019
@tannergooding
Copy link
Member Author

These APIs are exposed and have been accelerated for the most part now.

Hardware Intrinsics automation moved this from To do to Done Feb 10, 2020
@tannergooding tannergooding removed the untriaged New issue has not been triaged by the area owner label Mar 2, 2020
@tannergooding tannergooding removed their assignment May 26, 2020
@dotnet dotnet locked as resolved and limited conversation to collaborators Dec 13, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Numerics
Projects
Development

No branches or pull requests

8 participants