Update SSE2NEON header #1

jserv · 2020-11-11T16:45:45Z

At present, kram includes an old copy of the SSE2NEON header, which can be replaced with the latest one: https://github.com/DLTcollab/sse2neon
The latest SSE2NEON already makes use of AArch64-specific instructions.


alecazam · 2020-11-13T00:03:57Z

Ah great! I haven't tested the Linux/Win Neon path yet, and am already using Apple's SIMD on iOS/Mac. I'll make an update, so thanks for the tip!

alecazam · 2020-11-15T04:09:08Z

I had to comment out a few GCC push/pop pragmas, and I changed a (-c) construct to (-(int32_t)c) to avoid a precision-loss warning. The latest is now pushed. I also added fp16 <-> fp32 AVX ops in float4a to/fromFloat16, since I didn't see those in sse2neon. I'm using _Float16 on Mac/iOS, but MSVS doesn't appear to support it.
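
For toolchains that offer neither _Float16 nor hardware half-float conversion, a portable scalar fallback is one option. The sketch below is only an illustration of the fp16 <-> fp32 bit manipulation: the names halfToFloat and floatToHalf are hypothetical and are not taken from kram or sse2neon, and round-to-nearest-even is an assumed rounding mode.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: widen an IEEE 754 binary16 bit pattern to float.
inline float halfToFloat(uint16_t h) {
    uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0) {
        if (man == 0) {
            bits = sign;                       // signed zero
        } else {
            // Subnormal half: shift until the implicit bit appears.
            uint32_t shift = 0;
            while ((man & 0x400u) == 0) { man <<= 1; ++shift; }
            man &= 0x3FFu;
            bits = sign | ((113u - shift) << 23) | (man << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (man << 13);   // inf or NaN
    } else {
        bits = sign | ((exp + 112u) << 23) | (man << 13);  // rebias 15 -> 127
    }
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Hypothetical helper: narrow float to binary16, round-to-nearest-even.
inline uint16_t floatToHalf(float f) {
    uint32_t x;
    std::memcpy(&x, &f, sizeof x);
    uint32_t sign = (x >> 16) & 0x8000u;
    uint32_t absx = x & 0x7FFFFFFFu;

    if (absx >= 0x7F800000u) {                 // inf or NaN
        uint32_t nan = (absx > 0x7F800000u) ? 0x200u : 0u;
        return static_cast<uint16_t>(sign | 0x7C00u | nan);
    }
    if (absx >= 0x47800000u) {                 // >= 65536: overflow to inf
        return static_cast<uint16_t>(sign | 0x7C00u);
    }
    if (absx >= 0x38800000u) {                 // normal half range
        uint32_t v = absx - 0x38000000u;       // rebias 127 -> 15
        v += 0xFFFu + ((v >> 13) & 1u);        // round to nearest even
        return static_cast<uint16_t>(sign | (v >> 13));
    }
    uint32_t e = absx >> 23;
    if (e < 102u) {                            // below 2^-25: rounds to zero
        return static_cast<uint16_t>(sign);
    }
    // Result is a subnormal half: shift the full significand down.
    uint32_t m     = (absx & 0x7FFFFFu) | 0x800000u;
    uint32_t shift = 126u - e;                 // 14..24
    uint32_t r     = m >> shift;
    uint32_t rem   = m & ((1u << shift) - 1u);
    uint32_t half  = 1u << (shift - 1u);
    if (rem > half || (rem == half && (r & 1u))) ++r;
    return static_cast<uint16_t>(sign | r);
}
```

On x86 targets built with F16C enabled, the _mm_cvtph_ps and _mm_cvtps_ph intrinsics from <immintrin.h> do the same conversion four lanes at a time, so a scalar routine like this would only serve as the fallback path.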

alecazam closed this as completed Nov 15, 2020
