Enable fast Huffman & Huffman zig-zag transform for Arm Neon #1323

Developer-Ecosystem-Engineering · 2023-01-06T17:13:18Z

Implements fast Huffman decoding on macOS, then builds on those changes to enable the Huffman zig-zag transform.

Enable fast Huffman decoding for macOS (x86 and Apple silicon)
Signed-off-by: Developer Ecosystem Engineering <DeveloperEcosystemEngineering@apple.com>

Implements the Huffman zig-zag transform and 32-bit to 16-bit floating-point conversion
Signed-off-by: Developer Ecosystem Engineering <DeveloperEcosystemEngineering@apple.com>
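The PR text does not spell out what the zig-zag transform does. As a rough illustration only, assuming it refers to the conventional 8x8 zig-zag coefficient ordering used by DCT-based codecs, a scalar (non-Neon) sketch of undoing that ordering might look like the following. The names `makeZigZagOrder` and `fromZigZag` are hypothetical and are not taken from the OpenEXR sources.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: build the conventional 8x8 zig-zag scan.
// order[i] is the row-major index of the i-th coefficient visited
// while walking the anti-diagonals of the block.
inline std::array<int, 64> makeZigZagOrder()
{
    std::array<int, 64> order{};
    int n = 0;
    for (int s = 0; s <= 14; ++s) // s = row + col along one anti-diagonal
    {
        if (s % 2 == 0)
        {
            // Even diagonals run up and to the right.
            for (int row = (s < 8 ? s : 7); row >= 0 && s - row < 8; --row)
                order[n++] = row * 8 + (s - row);
        }
        else
        {
            // Odd diagonals run down and to the left.
            for (int col = (s < 8 ? s : 7); col >= 0 && s - col < 8; --col)
                order[n++] = (s - col) * 8 + col;
        }
    }
    return order;
}

// Hypothetical helper: scatter 64 zig-zag-ordered 16-bit values
// back into row-major order.
inline void fromZigZag(const std::uint16_t* src, std::uint16_t* dst)
{
    static const std::array<int, 64> order = makeZigZagOrder();
    for (std::size_t i = 0; i < 64; ++i)
        dst[order[i]] = src[i];
}
```

A vectorized version would replace the scalar scatter loop with table-driven shuffles; this sketch only shows the index mapping.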

linux-foundation-easycla · 2023-01-06T17:13:21Z

The committers listed above are authorized under a signed CLA.

✅ login: Developer-Ecosystem-Engineering (b5e7e6f, bcac85f)

meshula · 2023-01-11T01:45:03Z

Ah very nice, thank you! I'm wondering if you have any benchmarking results you might be able to report? I'm sure there's an improvement, but I am curious as to how much of a difference it might be?

src/lib/OpenEXR/ImfFastHuf.cpp

kmilos

AArch64 support could potentially be extended to more platforms and compilers: Clang for MinGW, GCC on *nix, and MSVC on Windows on Arm (WoA).

Though it's fine if you just want to stick to the tested ones for now.
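As a hedged sketch of what a broader check could look like: GCC and Clang (including Clang for MinGW) define `__aarch64__` when targeting 64-bit Arm, and MSVC defines `_M_ARM64`. The macro `SKETCH_HAVE_AARCH64` and the function `sketchTargetsAArch64` below are hypothetical names for illustration, not identifiers from the PR.

```cpp
// Hypothetical feature macro, not taken from the OpenEXR sources.
// __aarch64__ : defined by GCC and Clang when targeting AArch64
// _M_ARM64    : defined by MSVC when targeting ARM64 (Windows on Arm)
#if defined(__aarch64__) || defined(_M_ARM64)
#    define SKETCH_HAVE_AARCH64 1
#else
#    define SKETCH_HAVE_AARCH64 0
#endif

// Compile-time query that the rest of the code can branch on.
constexpr bool sketchTargetsAArch64()
{
    return SKETCH_HAVE_AARCH64 != 0;
}
```

Centralizing the test in one macro keeps the per-compiler details out of the code paths that use the Neon intrinsics.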

Developer-Ecosystem-Engineering · 2023-01-12T19:13:58Z

> Ah very nice, thank you! I'm wondering if you have any benchmarking results you might be able to report? I'm sure there's an improvement, but I am curious as to how much of a difference it might be?

We did see a significant improvement in our own testing. If there are public tests the project would like to see, we are happy to run them and provide the results.

cary-ilm · 2023-01-13T01:27:21Z

Thanks for the contribution! I added "Arm Neon" to the PR title, for clarity.

mandree · 2023-03-21T17:06:03Z

This causes build failures on ARMv7; see #1367.
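To illustrate the kind of guard that avoids this class of failure, here is a minimal sketch. It assumes a GCC or Clang toolchain and uses `vaddvq_f32` purely as a familiar example of an A64-only intrinsic; it is not claimed to be the intrinsic involved in #1367. The function name `sum4` is hypothetical.

```cpp
// Only pull in the Neon header when building for AArch64.
#if defined(__aarch64__)
#    include <arm_neon.h>
#endif

// Hypothetical helper: sum four consecutive floats.
inline float sum4(const float* v)
{
#if defined(__aarch64__)
    // vaddvq_f32 is a horizontal add that exists only in the A64
    // instruction set, so it must not be compiled for ARMv7.
    return vaddvq_f32(vld1q_f32(v));
#else
    // Portable fallback for ARMv7, x86, and every other target.
    return v[0] + v[1] + v[2] + v[3];
#endif
}
```

Testing `__ARM_NEON` alone would not be enough here, because ARMv7 toolchains can define it while still lacking the A64-only intrinsics.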

PR #1323 introduces a nested #ifdef check that causes a performance regression on Linux systems that use the clang compiler. The outer check for __clang__ succeeds, but the nested check for __APPLE__ fails, so the #elif branch is never reached on Linux. Fixes issue #1479.
Signed-off-by: Peter Urbanec <peterurbanec@users.noreply.github.com>
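The pitfall described in this commit message can be reproduced in isolation. The sketch below uses hypothetical `SIM_*` macros as stand-ins for the compiler's predefined macros so that it behaves the same on any host; it simulates clang on x86-64 Linux. The function names and the flattened form are assumptions for illustration, and the actual change merged in #1480 may be structured differently.

```cpp
// Hypothetical stand-ins for predefined macros, simulating
// clang on x86-64 Linux. SIM_APPLE is deliberately left undefined.
#define SIM_CLANG 1  // stands in for __clang__
#define SIM_GNUC 1   // stands in for __GNUC__ (clang defines it too)
#define SIM_X86_64 1 // stands in for __x86_64__

// Nested form: once the outer #if matches, the #elif below it is
// skipped entirely, even though the inner check fails.
constexpr bool enabledNested()
{
#if defined(SIM_CLANG)
#    if defined(SIM_APPLE)
    return true;
#    else
    return false; // Linux/clang ends up here
#    endif
#elif defined(SIM_GNUC) && defined(SIM_X86_64)
    return true; // never reached when SIM_CLANG is defined
#else
    return false;
#endif
}

// Flattened form: both conditions sit in one test, so a failed
// Apple check falls through to the next #elif.
constexpr bool enabledFlattened()
{
#if defined(SIM_CLANG) && defined(SIM_APPLE)
    return true;
#elif defined(SIM_GNUC) && defined(SIM_X86_64)
    return true; // Linux/clang on x86-64 now ends up here
#else
    return false;
#endif
}

static_assert(!enabledNested(), "nested check disables the fast path");
static_assert(enabledFlattened(), "flattened check keeps the fast path");
```

Defining `SIM_APPLE` makes both functions return true, which is why the regression would not show up on macOS.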

Developer-Ecosystem-Engineering added 2 commits January 6, 2023 09:08

Enable fast Huffman decoding on macOS

b5e7e6f

Implement Huffman zig-zag transform

bcac85f

meshula approved these changes Jan 11, 2023

kmilos reviewed Jan 12, 2023

cary-ilm changed the title from "Enable fast Huffman & Huffman zig-zag transform" to "Enable fast Huffman & Huffman zig-zag transform for Arm Neon" on Jan 13, 2023

cary-ilm merged commit 436fcd2 into AcademySoftwareFoundation:main Jan 13, 2023

mandree mentioned this pull request Mar 21, 2023

openexr 3.1.6 regression #2: ImfDwaCompressor does not compile on ARM v7 due to unguarded use of unavailable AARCH64 intrinsics #1367

Closed

peterurbanec mentioned this pull request Jul 4, 2023

Huffman performance regression #1479

Closed

peterurbanec mentioned this pull request Jul 4, 2023

Fix Huffman performance regression on Linux/clang #1480

Merged

cary-ilm added the v3.1.6 label Jul 9, 2023
