Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arm support #316

Closed
wants to merge 4 commits into from
Closed

Arm support #316

wants to merge 4 commits into from

Conversation

brechtvl
Copy link
Contributor

@brechtvl brechtvl commented Feb 14, 2021

This is based on lighttransport/embree-aarch64, so most of the credit goes to @syoyo. My goal is to make the minimal number of changes needed, and to make the implementation as clean and non-invasive as possible. The other repository has many other changes, which while useful, probably make it hard to merge upstream soon.

Ignoring the addition of sse2neon.h, this implementation only changes about 200 lines of code and does everything we need for Blender Cycles using Embree on macOS.

This works by supporting the SSE2 ISA. SSE2 is emulated using sse2neon.h from DLTcollab/sse2neon. Arm Neon support is automatically detected by the build system, and no manual configuration is required to build. ISPC is not currently supported, so that is automatically disabled. No specific reason, besides me not having looked into it.

Tested on macOS, though none of the changes are Apple specific and it may work on Linux as well.

I ran a quick benchmark with the tutorials, on an Apple M1 MacBook Air. Base revision for all is 69bd4c272f1ed.

Repository ISA pathtracer hair_geometry
embree/embree SSE2 (Rosetta) 17 fps 1.4 fps
embree/embree SSE4.2 (Rosetta) 17 fps 1.4 fps
lighttransport/embree-aarch64 SSE2 (SSE2NEON) 16 fps 1.5 fps
lighttransport/embree-aarch64 AVX2 (AVX2NEON) 17 fps 1.3 fps
brechtvl/embree SSE2 (sse2neon) 20 fps 1.5 fps
brechtvl/embree SSE2 (sse2neon) + precise math 17 fps 1.4 fps

Making the results match SSE more closely with precise math functions has a significant performance impact, but from my renderer test scenes it seems to be important to avoid artifacts.

@brechtvl brechtvl changed the title Arm Arm support Feb 14, 2021
@brechtvl
Copy link
Contributor Author

Note that there are conflicts since this is based on revision 69bd4c272f1ed for easier performance comparison, it's pretty simple to rebase.

@brechtvl
Copy link
Contributor Author

The performance difference can be explained by lighttransport/embree-aarch64 using more precise math functions. I've added that to the patch now since it does seem to be needed to avoid artifacts in real-world scenes. It's unfortunate but I'm not sure how it can be avoided.

I've implemented it in a way that I think I can be it upstreamed in sse2neon, so the file does not need local modifications for Embree.

@syoyo
Copy link

syoyo commented Feb 14, 2021

@brechtvl embree-aarch64's local modification of sse2neon has much cleaner implementation of min/max/div/rcp... etc and behaves identically with SSE2 instructions(e.g. correct handling of -+Inf and NaN in min/max).

https://github.com/lighttransport/embree-aarch64/blob/6ef362f99af80c9dfe8dd2bfc582d9067897edc6/common/math/SSE2NEON.h#L1197

lighttransport/embree-aarch64#30

So you are encouraged to backport(?) such changes to upstream DLTcollab/sse2neon.

@brechtvl
Copy link
Contributor Author

The min/max implementation used is the one you contributed in DLTcollab/sse2neon#57. For div/rcp/sqrt/rsqrt I sent a pull request: DLTcollab/sse2neon#310.

@syoyo
Copy link

syoyo commented Feb 15, 2021

@brechtvl Awesome!

@svenwoop
Copy link
Contributor

Thanks for this merge request. We do have ARM support on our roadmap. I will look into this over the next weeks.

@brechtvl
Copy link
Contributor Author

  • Rebased on master
  • Using upstream sse2neon which now includes the required changes
  • Test compiling actual Arm Neon code in check_arm_neon.cpp

@brechtvl
Copy link
Contributor Author

This seems to have been merged into master in 2bbdbb3 and following commits, so I'll close this pull request.

@brechtvl brechtvl closed this Sep 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants