Update SSE2NEON header #1

Closed
jserv opened this issue Nov 11, 2020 · 2 comments

Comments

@jserv

jserv commented Nov 11, 2020

At present, kram includes an old copy of the SSE2NEON header, which can be replaced with the latest one: https://github.com/DLTcollab/sse2neon
The latest SSE2NEON already makes use of AArch64-specific instructions.

@alecazam
Owner

Ah, great! I haven't tested the Linux/Windows NEON path yet, and I'm already using Apple's SIMD on iOS/Mac. I'll make an update, so thanks for the tip!

@alecazam
Owner

I had to comment out a few GCC push/pop pragmas, and I changed a (-c) construct to (-(int32_t)c) to avoid a precision-loss warning. The latest is now pushed. I also added fp16 <-> fp32 AVX ops in float4a to/fromFloat16, and I didn't see those in sse2neon. I'm using _Float16 on Mac/iOS, but MSVS doesn't appear to support it.
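
For illustration, the cast change mentioned above might look roughly like the sketch below. This is not the actual sse2neon code: `take_s32` and `negated_count` are hypothetical names, and the sketch assumes `c` is an unsigned 64-bit shift count being passed to something that expects a signed 32-bit scalar (as a NEON intrinsic such as `vdupq_n_s32` does).

```c
#include <stdint.h>

/* Hypothetical stand-in for an intrinsic that takes a signed 32-bit scalar. */
static int32_t take_s32(int32_t v)
{
    return v;
}

/* Assumption: c is an unsigned 64-bit shift count. */
static int32_t negated_count(uint64_t c)
{
    /* Writing take_s32(-c) would negate in uint64_t and then narrow
       implicitly to int32_t, which some compilers report as a possible
       loss of precision. Casting first makes the narrowing explicit and
       performs the negation in int32_t. */
    return take_s32(-(int32_t)c);
}
```

For small counts the result is the same either way; the cast only makes the narrowing conversion explicit so that the warning goes away.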
