IlmBase SIMD optimization on Arm processors #96
Comments
There's nothing ARM-specific in IlmBase or libImf that I'm aware of. The only SIMD optimization in OpenEXR is in libImf. (2014-04-26)
Thank you for your reply. http://eigen.tuxfamily.org/index.php?title=FAQ
Thanks, Alex
The OpenEXR library itself doesn't rely heavily on Vector and Matrix operations in libImath, so reading/writing images on Arm processors wouldn't be accelerated much by SIMD optimizations. I believe that libImath types can be efficiently exchanged with libraries such as Eigen. Where performance is critical it might make sense to use such libraries to operate on libImath types.
Thanks a lot for the tips. I'll run some tests comparing the libraries.
I've been doing some testing with Eigen lately - Imath outperforms Eigen in my tests. Chris
Did you try any SIMD optimizations? I'm more interested to see performance tests on ARM-based mobiles. Can you share your test source code? I can do a quick test on an actual device. Alex
I made a quick comparison with the other libraries. I ran the test on my Mac (OS X 10.8, x64, Intel Core 2 Quad Q6600), testing the Eigen library's Matrix4f class.
I think there's an interesting subtext here. The Eigen library does not have an appropriate license for many users, is extremely extensive compared to a lot of common needs, and is somewhat burdensome to drag around on small projects due to its non-trivial size. The glm library is very useful where precision and correctness are a lesser concern, and its performance is being continually improved. Imath gives reasonable performance with correctness and a fairly rich set of common operations, and has a very friendly license. I think people come back to Imath again and again because it is concise, self-contained, reasonably performant, easy to use, and has known correctness double-checked by a good conformance suite.

Peter's right that there are certain bulk operations worth porting to SIMD, but those operations do not really intersect the Imath problem space, so there is not much benefit to SIMD acceleration of Imath for OpenEXR. The subtext, I think, is that Imath itself has worth independent of EXR, and optimizations there could be welcomed by the larger community as long as the promises Imath makes as to ease of use, a reasonably large set of operations, and correctness are not violated.

@sabotage3d, it's difficult to make a judgement about what you are measuring there, since you haven't shown source. Operations like matrix multiplication are typically burdened by cache misses and less so by the math operations themselves. You can skew such benchmarks one way or another by cache warming, or by contriving to keep all the operations in registers. I've had a go at SIMD-accelerating Imath with coworkers in the past, and you can make impressive speedups by contriving aligned loads and so on, but typically the compromise is that the code becomes somewhat more difficult to use, by virtue of what can be assigned to what, and of trying to get type safety for types that end up aliased, like vec3 and vec4.

So far, the attempts I've seen compromise Imath's promise of conciseness and correctness, either strongly or weakly, and would push me towards a solution like glm when I want the extra speed and reasonable dependencies. I do feel like an accelerated Imath, or an accelerated Imath-like library, would be a welcome thing, if it was still Imath after the mod.
I've done some tests, and it seems that adding an alignment annotation to Vec4f and Matrix4f helps gcc generate better SIMD code. As @meshula said, cache is the biggest problem. If you make your code look like for (..) { result[i] = matrixA[i]*matrixB[i] } then the compiler should produce quite nicely optimized code. At least gcc with -ftree-vectorize -mfpmath=sse -msse4.1 does that for me.
A sidecar header of aligned typedefs would be a nice non-intrusive addition. I imagine appropriate adornments exist for MSVC, icc, gcc, and clang, and they would need to be boiled into the right macro soup.
Hello,
I couldn't find any info on whether there is SIMD optimization based on NEON for ARM processors in IlmBase. If not, are there future plans for that?
Thanks in advance,
Alex