Enable fast Huffman & Huffman zig-zag transform for Arm Neon #1323
Enable fast Huffman decoding for macOS (x86 and Apple silicon) Signed-off-by: Developer Ecosystem Engineering <DeveloperEcosystemEngineering@apple.com>
Implements Huffman zig-zag transform and 32-bit to 16-bit floating-point conversion Signed-off-by: Developer Ecosystem Engineering <DeveloperEcosystemEngineering@apple.com>
Ah, very nice, thank you! Do you have any benchmarking results you could report? I'm sure there's an improvement, but I'm curious how much of a difference it makes.
AArch64 support could potentially be expanded to more platforms and compilers: Clang for MinGW, GCC on *nix, and MSVC on WoA (Windows on Arm).
Though it's fine if you just want to keep to the tested ones for now.
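As a rough illustration of the suggestion above, a platform-neutral AArch64 check could be sketched as follows. `__aarch64__` (GCC and Clang) and `_M_ARM64` (MSVC) are the compilers' own predefined macros; `DEMO_A64_NEON` and `demo_simd_path` are hypothetical names invented for this sketch and are not taken from the PR.

```c
/* Sketch only: DEMO_A64_NEON and demo_simd_path are hypothetical names. */
#if defined(__aarch64__) || defined(_M_ARM64)
/* GCC/Clang define __aarch64__; MSVC defines _M_ARM64 when targeting Arm64. */
#  define DEMO_A64_NEON 1
#else
/* Everything else, including 32-bit ARMv7, takes the fallback path. */
#  define DEMO_A64_NEON 0
#endif

/* Reports which code path this translation unit was compiled for. */
const char *demo_simd_path(void)
{
    return DEMO_A64_NEON ? "aarch64-neon" : "fallback";
}
```

Keying on the architecture macro rather than on the OS or compiler keeps the check independent of `__APPLE__`, and it also keeps 32-bit Arm targets out of code that relies on AArch64-only intrinsics.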
We did see a significant improvement in our own testing. If there are public tests the project would like to see, we are happy to run and provide
Thanks for the contribution! I added "Arm Neon" to the PR title, for clarity.
…SoftwareFoundation#1323)
* Enable fast Huffman decoding on macOS
  Enable fast Huffman decoding for macOS (x86 and Apple silicon)
  Signed-off-by: Developer Ecosystem Engineering <DeveloperEcosystemEngineering@apple.com>
* Implement Huffman zig-zag transform
  Implements Huffman zig-zag transform and 32 to 16 bit floating point
  Signed-off-by: Developer Ecosystem Engineering <DeveloperEcosystemEngineering@apple.com>
Signed-off-by: Developer Ecosystem Engineering <DeveloperEcosystemEngineering@apple.com>
This causes build failures on ARMv7, see #1367
PR AcademySoftwareFoundation#1323 introduces a nested #ifdef check that results in a performance regression on Linux systems that use the clang compiler. This is because the check for __clang__ succeeds, but the nested check for __APPLE__ fails. As a result, the elif case is not taken on Linux. Fixes issue AcademySoftwareFoundation#1479
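The pitfall described above can be reproduced in miniature. In the sketch below, `DEMO_CLANG`, `DEMO_APPLE` and `DEMO_OTHER` are hypothetical stand-ins for `__clang__`, `__APPLE__` and whatever condition the original `#elif` tested (the commit message does not say), preset to mimic clang on Linux. The flattened form is only one possible restructuring, not necessarily the fix that was merged.

```c
/* Hypothetical stand-ins, preset to simulate clang on Linux. */
#define DEMO_CLANG 1   /* plays the role of __clang__ */
#define DEMO_APPLE 0   /* plays the role of __APPLE__ */
#define DEMO_OTHER 1   /* plays the role of the #elif condition (assumed) */

/* Nested form: once the outer #if matches, the #elif is never evaluated,
 * even though the inner #if fails and nothing gets enabled. */
#if DEMO_CLANG
#  if DEMO_APPLE
#    define DEMO_NESTED_FAST 1
#  endif
#elif DEMO_OTHER
#  define DEMO_NESTED_FAST 1
#endif
#ifndef DEMO_NESTED_FAST
#  define DEMO_NESTED_FAST 0
#endif

/* Flattened form: both conditions sit in one expression, so a failed
 * Apple check falls through to the #elif branch. */
#if DEMO_CLANG && DEMO_APPLE
#  define DEMO_FLAT_FAST 1
#elif DEMO_OTHER
#  define DEMO_FLAT_FAST 1
#else
#  define DEMO_FLAT_FAST 0
#endif

int demo_nested_fast(void) { return DEMO_NESTED_FAST; }
int demo_flat_fast(void)   { return DEMO_FLAT_FAST; }
```

With these presets the nested form leaves the fast path disabled while the flattened form enables it, which matches the regression the commit message describes for clang on Linux.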
…ndation#1480) PR AcademySoftwareFoundation#1323 introduces a nested #ifdef check that results in a performance regression on Linux systems that use the clang compiler. This is because the check for __clang__ succeeds, but the nested check for __APPLE__ fails. As a result, the elif case is not taken on Linux. Fixes issue AcademySoftwareFoundation#1479 Signed-off-by: Peter Urbanec <peterurbanec@users.noreply.github.com>
Implements fast Huffman on macOS, then builds on top of those changes to enable Huffman zig-zag transform
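For readers unfamiliar with the term, a zig-zag transform reorders the coefficients of a block along its anti-diagonals. The scalar sketch below builds the conventional JPEG-style order for an 8x8 block and applies it. It assumes that conventional order and an 8x8 block size; it is not the Neon code from this PR, and `demo_build_zigzag` and `demo_zigzag_scan` are hypothetical names.

```c
/* Scalar sketch of a conventional 8x8 zig-zag scan (assumed layout). */
enum { DEMO_N = 8 };

/* Fills order[k] with the row-major index of the k-th zig-zag element. */
void demo_build_zigzag(int order[DEMO_N * DEMO_N])
{
    int k = 0;
    for (int s = 0; s <= 2 * (DEMO_N - 1); ++s) {   /* s = row + col */
        int hi = s < DEMO_N ? s : DEMO_N - 1;       /* largest valid coordinate */
        int lo = s - hi;                            /* smallest valid coordinate */
        if (s % 2 == 0) {
            /* Even diagonals run up and to the right: row decreases. */
            for (int row = hi; row >= lo; --row)
                order[k++] = row * DEMO_N + (s - row);
        } else {
            /* Odd diagonals run down and to the left: column decreases. */
            for (int col = hi; col >= lo; --col)
                order[k++] = (s - col) * DEMO_N + col;
        }
    }
}

/* Copies a row-major block into zig-zag order. */
void demo_zigzag_scan(const float *src, float *dst, const int *order)
{
    for (int i = 0; i < DEMO_N * DEMO_N; ++i)
        dst[i] = src[order[i]];
}
```

A SIMD version would typically replace the per-element gather in `demo_zigzag_scan` with table-driven shuffles, since the order is fixed at compile time.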