Enable fast Huffman & Huffman zig-zag transform for Arm Neon #1323


Developer-Ecosystem-Engineering
Contributor

Implements fast Huffman decoding on macOS, then builds on those changes to enable the Huffman zig-zag transform.

Enable fast Huffman decoding for macOS (x86 and Apple silicon)

Signed-off-by: Developer Ecosystem Engineering <DeveloperEcosystemEngineering@apple.com>
Implements Huffman zig-zag transform and 32 to 16 bit floating point

Signed-off-by: Developer Ecosystem Engineering <DeveloperEcosystemEngineering@apple.com>
@linux-foundation-easycla

linux-foundation-easycla bot commented Jan 6, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@meshula
Contributor

meshula commented Jan 11, 2023

Ah very nice, thank you! I'm wondering if you have any benchmarking results you might be able to report? I'm sure there's an improvement, but I am curious as to how much of a difference it might be?

@kmilos kmilos left a comment

AArch64 support could potentially be expanded to even more platforms and compilers: Clang for MinGW, GCC on *nix, and MSVC on WoA (Windows on Arm).

Though it's fine if you just want to keep to the tested ones ATM.
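
As an illustration of the target detection this suggestion would involve, here is a minimal sketch; it is not code from this PR. It assumes the predefined macros `__aarch64__` (GCC and Clang, including MinGW targets) and `_M_ARM64` (MSVC). The `EXAMPLE_TARGET_AARCH64` macro and both helper names are hypothetical.

```cpp
// Hypothetical helper macro for illustration; not part of OpenEXR.
// __aarch64__ is predefined by GCC and Clang for 64-bit Arm targets;
// _M_ARM64 is predefined by MSVC when targeting Arm64.
#if defined(__aarch64__) || defined(_M_ARM64)
#    define EXAMPLE_TARGET_AARCH64 1
#else
#    define EXAMPLE_TARGET_AARCH64 0
#endif

// Compile-time flag that code could branch on when choosing a Neon path.
constexpr bool example_target_is_aarch64 = (EXAMPLE_TARGET_AARCH64 == 1);

// Returns a label for the code path this sketch would select.
constexpr const char* example_selected_path ()
{
    return example_target_is_aarch64 ? "aarch64-neon" : "generic";
}
```

Keying on the architecture macros rather than on a specific compiler or operating system is what would let one guard cover several toolchains; whether each toolchain then builds the Neon code correctly would still need testing.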

@Developer-Ecosystem-Engineering
Contributor Author

> Ah very nice, thank you! I'm wondering if you have any benchmarking results you might be able to report? I'm sure there's an improvement, but I am curious as to how much of a difference it might be?

We did see a significant improvement in our own testing. If there are public tests the project would like to see, we are happy to run them and provide the results.

@cary-ilm cary-ilm changed the title from "Enable fast Huffman & Huffman zig-zag transform" to "Enable fast Huffman & Huffman zig-zag transform for Arm Neon" Jan 13, 2023
@cary-ilm
Member

Thanks for the contribution! I added "Arm Neon" to the PR title, for clarity.

@cary-ilm cary-ilm merged commit 436fcd2 into AcademySoftwareFoundation:main Jan 13, 2023
cary-ilm pushed a commit to cary-ilm/openexr that referenced this pull request Mar 3, 2023
…SoftwareFoundation#1323)

* Enable fast Huffman decoding on macOS

Enable fast Huffman decoding for macOS (x86 and Apple silicon)

Signed-off-by: Developer Ecosystem Engineering <DeveloperEcosystemEngineering@apple.com>

* Implement Huffman zig-zag transform

Implements Huffman zig-zag transform and 32 to 16 bit floating point

Signed-off-by: Developer Ecosystem Engineering <DeveloperEcosystemEngineering@apple.com>

Signed-off-by: Developer Ecosystem Engineering <DeveloperEcosystemEngineering@apple.com>
cary-ilm pushed a commit that referenced this pull request Mar 5, 2023
@mandree
Contributor

mandree commented Mar 21, 2023

This causes build failures on ARMv7; see #1367.

peterurbanec added a commit to peterurbanec/openexr that referenced this pull request Jul 4, 2023
PR AcademySoftwareFoundation#1323 introduces a nested #ifdef check that results in a performance regression on Linux systems that use the clang compiler. This is because the check for __clang__ succeeds, but the nested check for __APPLE__ fails. As a result, the elif case is not taken on Linux.

Fixes issue AcademySoftwareFoundation#1479
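
The control flow described in this commit message can be reproduced with a small sketch. This is not the OpenEXR source: the `SIM_*` macros are hypothetical stand-ins for `defined(__clang__)`, `defined(__APPLE__)`, and whatever the real `#elif` tests, fixed here to model clang on Linux. The flattened form is only one possible restructuring and is not necessarily what the fix does.

```cpp
// Simulated stand-ins for the real predefined macros, so the sketch behaves
// the same on every platform. Modeled configuration: clang on Linux.
#define SIM_CLANG 1          // stands in for defined(__clang__)
#define SIM_APPLE 0          // stands in for defined(__APPLE__)
#define SIM_ELIF_CONDITION 1 // stands in for the real #elif test (assumed true)

// Nested form: once the outer #if succeeds, the #elif is never evaluated,
// even when the inner #if fails and nothing gets defined.
#if SIM_CLANG
#    if SIM_APPLE
#        define EXAMPLE_NESTED_FAST_PATH 1
#    endif
#elif SIM_ELIF_CONDITION
#    define EXAMPLE_NESTED_FAST_PATH 1
#endif
#ifndef EXAMPLE_NESTED_FAST_PATH
#    define EXAMPLE_NESTED_FAST_PATH 0
#endif

// Flattened form: both conditions sit in the outer test, so a non-Apple
// clang build falls through to the #elif.
#if SIM_CLANG && SIM_APPLE
#    define EXAMPLE_FLAT_FAST_PATH 1
#elif SIM_ELIF_CONDITION
#    define EXAMPLE_FLAT_FAST_PATH 1
#else
#    define EXAMPLE_FLAT_FAST_PATH 0
#endif

constexpr bool nested_fast_path = (EXAMPLE_NESTED_FAST_PATH == 1);
constexpr bool flat_fast_path   = (EXAMPLE_FLAT_FAST_PATH == 1);
```

With these simulated values the nested form leaves the fast path disabled while the flattened form enables it, which matches the regression described above.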
@cary-ilm cary-ilm added the v3.1.6 label Jul 9, 2023
cary-ilm pushed a commit that referenced this pull request Jul 9, 2023
PR #1323 introduces a nested #ifdef check that results in a performance
regression on Linux systems that use the clang compiler. This is because
the check for __clang__ succeeds, but the nested check for __APPLE__
fails. As a result, the elif case is not taken on Linux.

Fixes issue #1479

Signed-off-by: Peter Urbanec <peterurbanec@users.noreply.github.com>
cary-ilm pushed a commit to cary-ilm/openexr that referenced this pull request Jul 25, 2023
…ndation#1480)

cary-ilm pushed a commit that referenced this pull request Jul 31, 2023