
Adding scalar hardware intrinsics for x86. #15341

Merged
merged 1 commit into dotnet:master on Dec 29, 2017

Conversation

@tannergooding (Member) commented Dec 2, 2017

This is the start of dotnet/corefx#23519.

FYI. @fiigii, @eerhardt, @ViktorHofer

@tannergooding (Member, Author) commented Dec 2, 2017

Still need to update the *.PlatformNotSupported.cs files accordingly.

@tannergooding (Member, Author) commented Dec 2, 2017

I haven't done COMIS or UCOMIS yet as I am not sure of the naming. I was thinking of something like bool CheckEqualScalar(Vector128<float> left, Vector128<float> right).

@tannergooding (Member, Author) commented Dec 2, 2017

I also noticed a number of places in the existing files where we are not being consistent (either in naming or in following the API naming guidelines).

Ex: We have ReciprocalSquareRoot, Sqrt, and ReciprocalSqrt, depending on where you look.

We are also using Int, Float, Long, etc., when we should be using Int32, Single, Int64, etc.

/// <summary>
/// __m128 _mm_cmp_ss (__m128 a, __m128 b, const int imm8)
/// </summary>
public static Vector128<float> CompareScalar(Vector128<float> left, Vector128<float> right, FloatComparisonMode mode) => CompareScalar(left, right, mode);

@tannergooding (Author, Member) commented Dec 2, 2017

I was wondering why we have FloatComparisonMode here, but not in the Sse/Sse2 files?

/// <summary>
/// __m128 _mm_cmpunord_ps (__m128 a, __m128 b)
/// </summary>
public static Vector128<float> CompareUnordered(Vector128<float> left, Vector128<float> right) => CompareUnordered(left, right);

@tannergooding (Author, Member) commented Dec 2, 2017

These files all contain a bunch of lines that are just whitespace... We should probably clean that up separately.

/// <summary>
/// __m128 _mm_sqrt_ss (__m128 a)
/// </summary>
public static Vector128<float> SqrtScalar(Vector128<float> value) => SqrtScalar(value);

@tannergooding (Author, Member) commented Dec 2, 2017

The Sse2 form for double takes two arguments, but we only take one here (this is matching the C/C++ intrinsics). Perhaps we should expose both or just the one that takes two arguments?

/// <summary>
/// __m128d _mm_sqrt_sd (__m128d a, __m128d b)
/// </summary>
public static Vector128<double> SqrtScalar(Vector128<double> a, Vector128<double> b) => SqrtScalar(a, b);

@tannergooding (Author, Member) commented Dec 2, 2017

I was thinking of calling a and b upper and value, respectively, since b is the value we perform the operation on and a is the value we fill the upper bits from. Thoughts?

@fiigii (Collaborator) commented Dec 2, 2017

Do we really need to specifically fill the upper bits of the result in practice?
From a performance perspective, we always recommend using the same register for the source and upper arguments.
In particular, if we decide to support the two-parameter version of the Sqrt intrinsic, then on non-AVX machines the compiler may have to insert unpack or shuffle instructions to implement these semantics, both of which are long-latency instructions.

@fiigii (Collaborator) commented Dec 2, 2017

In summary, I am suggesting that we only expose the one-parameter intrinsic for SQRTSS and SQRTSD.

@tannergooding (Author, Member) commented Dec 2, 2017

Just exposing the single-operand intrinsic version is probably fine. I actually missed that the two-operand form is only available on AVX and above; the Intel Intrinsics Guide lists it as SSE2: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=Sqrt&techs=SSE,SSE2

@tannergooding (Author, Member) commented Dec 2, 2017

The Software Developers Manual lists the information correctly.

/// <summary>
/// void _mm_store_ss (float* mem_addr, __m128 a)
/// </summary>
public static unsafe void StoreScalar(float* address, Vector128<float> source) => StoreScalar(address, source);

/// <summary>
/// __m128d _mm_sub_ps (__m128d a, __m128d b)

@tannergooding (Author, Member) commented Dec 2, 2017

Note: the files have several typos, such as this one, where the wrong type is used in the intrinsic comment.

@fiigii (Collaborator) commented Dec 2, 2017

Good catch. We can fix these comment typos later since they do not impact the CoreCLR/CoreFX interface.

@@ -352,6 +352,11 @@ public static class Avx2
/// </summary>
public static Vector256<ulong> ConvertToVector256ULong(Vector128<uint> value) => ConvertToVector256ULong(value);

@tannergooding (Author, Member) commented Dec 2, 2017

Note: this is an example of an API that doesn't follow the general .NET API naming conventions. It should probably be ConvertToVector256Int64.

@fiigii (Collaborator) commented Dec 2, 2017

I also noticed a number of places in the existing files where we are not being consistent
We have ReciprocalSquareRoot, Sqrt, and ReciprocalSqrt, depending on where you look.
We are also doing Int, Float, Long, etc.... When we should be doing Int32, Single, Int64, etc...

Thank you for pointing this out. If the .NET API convention always prefers Int64 over Long, we definitely should fix it.

@tannergooding (Member, Author) commented Dec 2, 2017

Just as an FYI, the exact guideline I am referring to is Avoiding Language-Specific Names.

/// <summary>
/// __m128 _mm_round_ss (__m128 a, int rounding)
/// _MM_FROUND_TO_NEAREST_INT |_MM_FROUND_NO_EXC
/// </summary>
public static Vector128<float> RoundToNearestIntegerScalar(Vector128<float> value) => RoundToNearestIntegerScalar(value);

@pentp (Collaborator) commented Dec 2, 2017

It would be more consistent to use a RoundingMode immediate parameter here (similar to comparisons, for example). Then it would be 4 functions (Round/RoundScalar * float/double) that map directly to four machine instructions (roundpd/ps/sd/ss) instead of 20. The fully named helper functions could be defined on top of these basic instructions somewhere else.

@tannergooding (Author, Member) commented Dec 2, 2017

@fiigii, thoughts? I was following the existing convention you had setup for the packed forms.

@4creators (Collaborator) commented Dec 2, 2017

AFAIR, in discussions on the intrinsics API, the consensus was to create a direct mapping to processor instructions for a multitude of reasons. Therefore, I would avoid creating any APIs which do not map to, or which omit, processor instructions. This means that if we have a 3-argument scalar AVX (or above) instruction alongside a 2-argument SSE equivalent, we should have both; or, as here, we should use an immediate parameter for defining the rounding mode.

@fiigii (Collaborator) commented Dec 3, 2017

I believe that the current design (encoding the rounding mode into the intrinsic names) has better static semantics.
For example, a RoundingMode immediate parameter would require (1) const-parameter language support to avoid non-literal values, and (2) compile-time error reporting and runtime exceptions for invalid values from Roslyn/CoreCLR dotnet/corefx#22940 (comment).
Each round instruction has just a few pre-defined modes, so I thought this was a good opportunity to provide intrinsics with more stable runtime behavior and a friendlier development experience. Meanwhile, it does not lose any flexibility.

@pentp (Collaborator) commented Dec 3, 2017

We could still implement rounding functions with an immediate parameter as private intrinsics and expose them through wrapper functions that just forward to the actual intrinsic if this simplifies the implementation.

// __m128 _mm_round_ss (__m128 a, int rounding)
private static Vector128<float> RoundScalar(Vector128<float> value, byte rounding) => RoundScalar(value, rounding);

// _MM_FROUND_TO_NEAREST_INT |_MM_FROUND_NO_EXC
public static Vector128<float> RoundToNearestIntegerScalar(Vector128<float> value) => RoundScalar(value, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);

// _MM_FROUND_TO_NEG_INF |_MM_FROUND_NO_EXC
public static Vector128<float> RoundToNegativeInfinityScalar(Vector128<float> value) => RoundScalar(value, _MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC);

@fiigii (Collaborator) commented Dec 3, 2017

if this simplifies the implementation.

No, the current runtime always expands all the APIs as intrinsics.

/// <summary>
/// __m128d _mm_cvtss_sd (__m128d a, __m128 b)
/// </summary>
public static Vector128<double> ConvertToDoubleScalar(Vector128<double> a, Vector128<float> b) => ConvertToDoubleScalar(a, b);

@pentp (Collaborator) commented Dec 2, 2017

MOVD/MOVQ instructions are missing from here (and AVX/AVX2). I propose something like this:

// __m128i _mm_cvtsi32_si128 (int a)
public static Vector128<int> CopyInt32(int value);
// int _mm_cvtsi128_si32 ( __m128i a)
public static int CopyInt32(Vector128<int> value);
// __m128i _mm_cvtsi64_si128(__int64)
public static Vector128<long> CopyInt64(long value);
// __int64 _mm_cvtsi128_si64(__m128i)
public static long CopyInt64(Vector128<long> value);

@tannergooding (Author, Member) commented Dec 9, 2017

Fixed. I went with the existing ConvertTo naming convention.

@4creators (Collaborator) commented Dec 2, 2017

I have not seen any conversion intrinsics which would allow converting to and from Half. The fact that we do not have Half support in the CLR should not stop us from having intrinsics which would act as an interface between the CLR and other runtimes that support Half-sized floating-point types. Additionally, although it is not directly related to this PR, it could be very helpful to have support for 8-bit floating-point/binary-sized numbers as well.

Related issues:

dotnet/corefx#17267

#11948

@tannergooding (Member, Author) commented Dec 2, 2017

@4creators, I believe the FP16C instructions haven't gone for design review yet.

The initial set that has gone for review covers both packed and scalar (as of this PR) instructions for SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2, FMA, AES, BMI1, BMI2, LZCNT, PCLMULQDQ, and POPCNT.

The intrinsic sets that have not gone to review (based on those listed under the Intel Intrinsics Guide) are MMX, AVX-512, KNC, SVML, ADX, CLFLUSHOPT, CLWB, FP16C, FSGSBASE, FXSR, INVPCID, MONITOR, MPX, PREFETCHWT1, RDPID, RDRAND, RDSEED, RDTSCP, RTM, SHA, TSC, XSAVE, XSAVEC, XSAVEOPT, XSS

A number of those intrinsic sets won't/shouldn't go to review because:

  • Not all of those (such as the SVML intrinsics) represent actual hardware instructions.
  • Some of them (such as MMX) are legacy instructions.
  • Others (like XSAVE) are targeted for OS, not necessarily for applications.

The others likely just need to be proposed and go up for review. For FP16C in particular, we at the very least need a Half data type so that it can be properly represented as an API (Vector128<Half>), so the review will be slightly more involved.

@4creators (Collaborator) commented Dec 2, 2017

I believe the FP16C instructions haven't gone for design review yet.

@tannergooding Yep, you are right. Perhaps it's time to make both Half and missing intrinsics proposal :)

@tannergooding tannergooding force-pushed the tannergooding:simd-scalar branch from 81a22f9 to 193059a Dec 3, 2017

@4creators (Collaborator) commented Dec 7, 2017

@dotnet-bot test Windows_NT x86 Checked Innerloop Build and Test
@dotnet-bot test Windows_NT x64 Checked Innerloop Build and Test
@dotnet-bot test Ubuntu x64 Checked Innerloop Build and Test
@dotnet-bot test OSX10.12 x64 Checked Innerloop Build and Test
@dotnet-bot test CentOS7.1 x64 Debug Innerloop Build
@dotnet-bot test CentOS7.1 x64 Checked Innerloop Build and Test
@dotnet-bot test Tizen armel Cross Checked Innerloop Build and Test

@tannergooding tannergooding force-pushed the tannergooding:simd-scalar branch from 193059a to 64eaf83 Dec 9, 2017

@tannergooding (Member, Author) commented Dec 9, 2017

Updated the *.PlatformNotSupported.cs files and added the remaining APIs that I was aware were missing.

/// <summary>
/// __int64 _mm_cvtsd_si64 (__m128d a)
/// </summary>
public static long ConvertToInt64(Vector128<double> value) => ConvertToInt64(value);

@tannergooding (Author, Member) commented Dec 9, 2017

@fiigii, For instructions like this, which have an additional encoding on x64, how do we want to expose them?

@fiigii (Collaborator) commented Dec 10, 2017

an additional encoding on x64

Do you mean the 64-bit register in cvtsd2si r64, xmm?
These intrinsics are only available in 64-bit mode, and calling them in 32-bit mode should throw PlatformNotSupportedException.

if (Sse2.IsSupported && Environment.Is64BitProcess)
{
    long res = Sse2.ConvertToInt64(vec);
}

@tannergooding (Author, Member) commented Dec 10, 2017

Alright, that sounds good to me.

I mostly just wanted to confirm that we are exposing them in X86.Sse2 and not under some X64-specific sub-class.

@tannergooding tannergooding force-pushed the tannergooding:simd-scalar branch 4 times, most recently from e557705 to 919779a Dec 9, 2017

@tannergooding changed the title from "[WIP] Adding scalar hardware intrinsics for x86." to "Adding scalar hardware intrinsics for x86." on Dec 11, 2017

@tannergooding (Member, Author) commented Dec 11, 2017

Should be ready for review.

Still todo (but will be in separate PRs):

  • Update CoreFX with the new APIs
  • Plumb through the JIT support for these APIs
    • @fiigii, are there any pending PRs from you that I should wait on before starting on this?
@eerhardt (Member) left a comment

I'm no expert here, but this looks good to me.

@tannergooding (Member, Author) commented Dec 14, 2017

Ok. I have now gone through all the exposed ISAs manually and ensured all scalar intrinsic instructions are exposed in this PR.

@fiigii, do you think unsigned overloads need to be exposed for the ConvertTo<Scalar> functions?
They all use MOVD/MOVQ, which is not explicit on the data being signed or unsigned.

int _mm_cvtsi128_si32 (__m128i a)
__m128i _mm_cvtsi32_si128 (int a)
__int64 _mm_cvtsi128_si64 (__m128i a)
__m128i _mm_cvtsi64_si128 (__int64 a)
int _mm256_cvtsi256_si32 (__m256i a)

I also found the following instructions to be missing from the packed forms.
None of them have an intrinsic with the same shape that uses the same underlying hardware instruction.

int _mm_movemask_ps (__m128 a)                                      // movmskps

__m128d _mm_loadh_pd (__m128d a, double const* mem_addr)            // movhpd
__m128d _mm_loadl_pd (__m128d a, double const* mem_addr)            // movlpd
__m128i _mm_loadl_epi64 (__m128i const* mem_addr)                   // movq

void _mm_stream_si32 (int* mem_addr, int a)                         // movnti
void _mm_stream_si64 (__int64* mem_addr, __int64 a)                 // movnti

__m256i _mm256_permute2f128_si256 (__m256i a, __m256i b, int imm8)  // vperm2f128

__m256i _mm256_stream_load_si256 (__m256i const* mem_addr)          // vmovntdqa

// The following 8 have intrinsics which take an imm8 and emit the same underlying instruction
__m128i _mm_sll_epi16 (__m128i a, __m128i count)                    // psllw
__m128i _mm_sll_epi32 (__m128i a, __m128i count)                    // pslld
__m128i _mm_sll_epi64 (__m128i a, __m128i count)                    // psllq
__m128i _mm_sra_epi16 (__m128i a, __m128i count)                    // psraw
__m128i _mm_sra_epi32 (__m128i a, __m128i count)                    // psrad
__m128i _mm_srl_epi16 (__m128i a, __m128i count)                    // psrlw
__m128i _mm_srl_epi32 (__m128i a, __m128i count)                    // psrld
__m128i _mm_srl_epi64 (__m128i a, __m128i count)                    // psrlq

// The following 6 have the corresponding _mm256 forms exposed
__m128i _mm_sllv_epi32 (__m128i a, __m128i count)                   // vpsllvd
__m128i _mm_sllv_epi64 (__m128i a, __m128i count)                   // vpsllvq
__m128i _mm_srav_epi32 (__m128i a, __m128i count)                   // vpsravd
__m128i _mm_srlv_epi32 (__m128i a, __m128i count)                   // vpsrlvd
__m128i _mm_srlv_epi64 (__m128i a, __m128i count)                   // vpsrlvq

Finally, the following are exposed, but under a different ISA than the Intrinsic Guide lists them:

// Exposed as SSE, listed as SSE2
__m128 _mm_castpd_ps (__m128d a)
__m128i _mm_castpd_si128 (__m128d a)
__m128d _mm_castps_pd (__m128 a)
__m128i _mm_castps_si128 (__m128 a)
__m128d _mm_castsi128_pd (__m128i a)
__m128 _mm_castsi128_ps (__m128i a)

// Exposed as AVX, listed as AVX2
__int16 _mm256_extract_epi16 (__m256i a, const int index)
__int8 _mm256_extract_epi8 (__m256i a, const int index)

@fiigii (Collaborator) commented Dec 14, 2017

do you think unsigned overloads need to be exposed for the ConvertTo functions?
They all use MOVD/MOVQ, which is not explicit on the data being signed or unsigned.

Yes; they just copy the first element, with no zero/sign-extension behavior.

@fiigii

This comment has been minimized.

Copy link
Collaborator

commented Dec 14, 2017

// Exposed as AVX, listed as AVX2
__int16 _mm256_extract_epi16 (__m256i a, const int index)
__int8 _mm256_extract_epi8 (__m256i a, const int index)

These are helper intrinsics; we have codegen solutions for both AVX and AVX2.

@fiigii

This comment has been minimized.

Copy link
Collaborator

commented Dec 14, 2017

// Exposed as SSE, listed as SSE2
__m128 _mm_castpd_ps (__m128d a)
__m128i _mm_castpd_si128 (__m128d a)
__m128d _mm_castps_pd (__m128 a)
__m128i _mm_castps_si128 (__m128 a)
__m128d _mm_castsi128_pd (__m128i a)
__m128 _mm_castsi128_ps (__m128i a)

These helper intrinsics do not generate any code. We have the Vector128<double/int/long/...> types with SSE, so I think they should be in Sse.

@fiigii

This comment has been minimized.

Copy link
Collaborator

commented Dec 14, 2017

__m256i _mm256_permute2f128_si256 (__m256i a, __m256i b, int imm8) // vperm2f128

We don't encourage using vperm2f128 on integer data due to the data bypass penalty.

@tannergooding tannergooding force-pushed the tannergooding:simd-scalar branch from 5c999fd to b2c4f3c Dec 14, 2017

@fiigii

This comment has been minimized.

Copy link
Collaborator

commented Dec 14, 2017

void _mm_stream_si32 (int* mem_addr, int a) // movnti
void _mm_stream_si64 (__int64* mem_addr, __int64 a) // movnti

I am not sure about the usefulness of streaming stores with scalar types. Please give me more time.

@fiigii

This comment has been minimized.

Copy link
Collaborator

commented Dec 14, 2017

__m256i _mm256_permute2f128_si256 (__m256i a, __m256i b, int imm8) // vperm2f128

We don't encourage using vperm2f128 on integer data due to the data bypass penalty.

But we can fill in the remaining types to make the API a simpler Permute2x128<T>. Thoughts?

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Dec 14, 2017

I've added the unsigned overloads that were missing and believe the PR is now ready for final review and merge.

@fiigii, I have logged dotnet/corefx#25926 to continue the discussion of the "missing" APIs. We can add them in one go, in a separate PR, after we determine which ones need to be added.

@fiigii

This comment has been minimized.

Copy link
Collaborator

commented Dec 14, 2017

@tannergooding Thank you so much for the work.

@tannergooding tannergooding force-pushed the tannergooding:simd-scalar branch from b2c4f3c to e7fdb5b Dec 14, 2017

@tannergooding

This comment has been minimized.

Copy link
Member Author

commented Dec 14, 2017

@eerhardt, do I need another sign-off or am I good to merge?

Also, do you want the CoreFX PR up before or after this change goes in?

/// <summary>
/// __m128d _mm_cmpneq_pd (__m128d a, __m128d b)
/// </summary>
public static Vector128<double> CompareNotEqual(Vector128<double> left, Vector128<double> right) => CompareNotEqual(left, right);

/// <summary>
/// int _mm_comine_sd (__m128d a, __m128d b)

@fiigii (Collaborator) commented Dec 14, 2017

Would you like to fix this comment to _mm_comineq_sd?

@tannergooding (Author, Member) commented Dec 14, 2017

Fixed.

/// <summary>
/// __m128 _mm_cmpneq_ps (__m128 a, __m128 b)
/// </summary>
public static Vector128<float> CompareNotEqual(Vector128<float> left, Vector128<float> right) => CompareNotEqual(left, right);

/// <summary>
/// int _mm_comine_ss (__m128 a, __m128 b)

@fiigii (Collaborator) commented Dec 14, 2017

And here.

@tannergooding (Author, Member) commented Dec 14, 2017

Fixed.

public static bool CompareNotEqualOrderedScalar(Vector128<float> left, Vector128<float> right) => CompareNotEqualOrderedScalar(left, right);

/// <summary>
/// int _mm_ucomine_ss (__m128 a, __m128 b)

@fiigii (Collaborator) commented Dec 14, 2017

And here.

@tannergooding (Author, Member) commented Dec 14, 2017

Fixed.

public static bool CompareNotEqualOrderedScalar(Vector128<double> left, Vector128<double> right) => CompareNotEqualOrderedScalar(left, right);

/// <summary>
/// int _mm_ucomine_sd (__m128d a, __m128d b)

@fiigii (Collaborator) commented Dec 14, 2017

And here.

@tannergooding (Author, Member) commented Dec 14, 2017

Fixed.

@tannergooding tannergooding force-pushed the tannergooding:simd-scalar branch from e7fdb5b to 11b04cc Dec 14, 2017

/// <summary>
/// float _mm256_cvtss_f32 (__m256 a)
/// </summary>
public static float ConvertToSingle(Vector256<float> value) => ConvertToSingle(value);

@fiigii (Collaborator) commented Dec 14, 2017

I see you provide helper functions that convert a vector to float/double. Do we need helpers for float/double -> Vector128<float/double>?

@tannergooding (Author, Member) commented Dec 14, 2017

Do you mean providing __m128 _mm_set_ss (float a) in addition to __m128 _mm_load_ss (float const* mem_addr)?

@fiigii (Collaborator) commented Dec 14, 2017

Yes; SetScalar can sometimes avoid a memory access, unlike LoadScalar.

@tannergooding (Author, Member) commented Dec 14, 2017

👍, will add.

@tannergooding (Author, Member) commented Dec 14, 2017

Added Vector128<float> Sse.SetScalar(float value) and Vector128<double> Sse2.SetScalar(double value)

@tannergooding tannergooding force-pushed the tannergooding:simd-scalar branch from 11b04cc to d04768c Dec 14, 2017

@fiigii approved these changes Dec 15, 2017

@tannergooding (Member, Author) commented Dec 29, 2017

CoreFX side of this PR is dotnet/corefx#26095

@jkotas approved these changes Dec 29, 2017

@jkotas (Member) commented Dec 29, 2017

@tannergooding Feel free to merge this if it is ready to go.

@tannergooding (Member, Author) commented Dec 29, 2017

@jkotas, thanks!

Will merge after the two pending jobs come back green (looks like they were kicked off with my last comment, so they are probably new).

@tannergooding tannergooding merged commit 4797974 into dotnet:master Dec 29, 2017

16 checks passed:

  • Alpine.3.6 x64 Debug Build
  • CROSS Check
  • CentOS7.1 x64 Checked Innerloop Build and Test
  • CentOS7.1 x64 Debug Innerloop Build
  • OSX10.12 x64 Checked Innerloop Build and Test
  • Tizen armel Cross Checked Innerloop Build and Test
  • Ubuntu arm Cross Debug Innerloop Build
  • Ubuntu arm64 Cross Debug Innerloop Build
  • Ubuntu x64 Checked Innerloop Build and Test
  • Ubuntu x64 Innerloop Formatting
  • Ubuntu16.04 arm Cross Debug Innerloop Build
  • WIP ready for review
  • Windows_NT x64 Checked Innerloop Build and Test
  • Windows_NT x64 Innerloop Formatting
  • Windows_NT x86 Checked Innerloop Build and Test
  • license/cla: All CLA requirements met

@tannergooding tannergooding deleted the tannergooding:simd-scalar branch Jan 17, 2018
