Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect missing vst1q_f32_x2 and provide replacement if necessary #1358

Merged

Conversation

betajippity
Copy link
Contributor

@betajippity betajippity commented Mar 16, 2023

In ImfDwaCompressorSimd.h, convertFloatToHalf64_neon() uses the vst1q_f32_x2 intrinsic on aarch64. However, older versions of GCC (< 9) do not provide the vst1q_f32_x2 intrinsic on aarch64, so we must detect when vst1q_f32_x2 is not available and provide our own implementation instead.

Specific examples of platforms where this fix is needed are CentOS 8 on aarch64, and NVIDIA's Linux For Tegra 32.x on Jetson Nano.

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Mar 16, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: betajippity / name: Yining Karl Li (57a734e)

@cary-ilm
Copy link
Member

Thanks for the fix, but see the suggestions above.

@betajippity
Copy link
Contributor Author

Thanks for the fix, but see the suggestions above.

Which suggestions? I don't see any; apologies if I'm looking in the wrong place.

}" HAS_VLD1)

if(NOT HAS_VLD1)
string(APPEND CMAKE_CXX_FLAGS " -DMISSING_ARM_VLD1")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be better as a CMake-generated define rather than a compiler flag:

Suggested change
string(APPEND CMAKE_CXX_FLAGS " -DMISSING_ARM_VLD1")
set(OPENEXR_MISSING_ARM_VLD1" 1)

Then add:

    #cmakedefine OPENEXR_MISSING_ARM_VLD1 0

to cmake/OpenEXRConfigInternal.h.in

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These notes are now implemented; the flag is now set as a CMake-generated define and is in cmake/OpenEXRConfigInternal.h.in.

@@ -62,4 +62,17 @@ extern "C" {

}

#if defined (MISSING_ARM_VLD1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... and change this to OPENEXR_MISSING_ARM_VLD1 to identify it as an OpenEXR config thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is now implemented; the flag is now named OPENEXR_MISSING_ARM_VLD1.

@cary-ilm
Copy link
Member

Oops, I neglected to submit the review, sorry. Should be there now.

@betajippity
Copy link
Contributor Author

Whoops, sorry for the bevy of build test failures. I didn't realize that pushing a WIP to the branch on my fork would automatically update the merge request and kick off tests here!

Older versions of GCC (< 9) do not provide the vst1q_f32_x2 intrinsic on
aarch64, so we must detect when vst1q_f32_x2 is not available and provide
our own implementation instead.

Signed-off-by: Yining Karl Li <betajippity@gmail.com>
Copy link
Member

@cary-ilm cary-ilm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@cary-ilm cary-ilm merged commit a2e9799 into AcademySoftwareFoundation:main Mar 20, 2023
cary-ilm pushed a commit to cary-ilm/openexr that referenced this pull request Mar 26, 2023
…demySoftwareFoundation#1358)

Older versions of GCC (< 9) do not provide the vst1q_f32_x2 intrinsic on
aarch64, so we must detect when vst1q_f32_x2 is not available and provide
our own implementation instead.

Signed-off-by: Yining Karl Li <betajippity@gmail.com>
cary-ilm pushed a commit to cary-ilm/openexr that referenced this pull request Mar 26, 2023
…demySoftwareFoundation#1358)

Older versions of GCC (< 9) do not provide the vst1q_f32_x2 intrinsic on
aarch64, so we must detect when vst1q_f32_x2 is not available and provide
our own implementation instead.

Signed-off-by: Yining Karl Li <betajippity@gmail.com>
cary-ilm pushed a commit that referenced this pull request Mar 28, 2023
Older versions of GCC (< 9) do not provide the vst1q_f32_x2 intrinsic on
aarch64, so we must detect when vst1q_f32_x2 is not available and provide
our own implementation instead.

Signed-off-by: Yining Karl Li <betajippity@gmail.com>
@cary-ilm cary-ilm added the v3.1.7 label Jul 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants