NEON optimizations for ZIP reading (reconstruct and interleave) #1348

Merged
1 commit merged into AcademySoftwareFoundation:main on Mar 3, 2023

Conversation

aras-p
Contributor

@aras-p aras-p commented Mar 3, 2023

Pretty much straight copies from the SSE2/SSE4 SIMD code, just changed for NEON.

Reading 310Mpix worth of ZIP compressed EXR files into memory, on Apple M1 Max (Clang 14, RelWithDebInfo build config): 7.50s -> 5.52s.

"reconstruct" part goes 1.55s -> 0.35s, "interleave" 0.80s -> 0.04s.

Signed-off-by: Aras Pranckevicius <aras@nesnausk.org>
Member

@cary-ilm cary-ilm left a comment

LGTM, thanks!

@cary-ilm cary-ilm merged commit 677c6a5 into AcademySoftwareFoundation:main Mar 3, 2023
cary-ilm pushed a commit to cary-ilm/openexr that referenced this pull request Mar 3, 2023
…emySoftwareFoundation#1348)

@aras-p aras-p deleted the zip_neon branch March 4, 2023 07:39
@meshula
Contributor

meshula commented Mar 4, 2023

I'm extremely excited for this one, thanks!

cary-ilm pushed a commit that referenced this pull request Mar 5, 2023
aras-p added a commit to aras-p/openexr that referenced this pull request Mar 20, 2023
…ch64) only

Should fix AcademySoftwareFoundation#1365. A recent PR (AcademySoftwareFoundation#1348) added NEON-accelerated code paths
for ZIP filtering, but that code uses several instructions that are
ARMv8 (aarch64) only and thus fails to build on 32-bit ARM (armv7)
platforms. Enable these optimizations only when building
for 64-bit ARM platforms.
aras-p added a commit to aras-p/openexr that referenced this pull request Mar 20, 2023
…ch64) only

cary-ilm pushed a commit that referenced this pull request Mar 20, 2023
…ch64) only (#1366)

cary-ilm pushed a commit to cary-ilm/openexr that referenced this pull request Mar 26, 2023
…ch64) only (AcademySoftwareFoundation#1366)

cary-ilm pushed a commit to cary-ilm/openexr that referenced this pull request Mar 26, 2023
…ch64) only (AcademySoftwareFoundation#1366)

cary-ilm pushed a commit that referenced this pull request Mar 28, 2023
…ch64) only (#1366)

@cary-ilm cary-ilm added the v3.1.6 label Jul 9, 2023