IlmBase SIMD optimization on Arm processors #96
Comments
There's nothing ARM-specific in IlmBase or libImf that I'm aware of. The only SIMD optimization in OpenEXR is in libImf. (2014-04-26)
Thank you for your reply. http://eigen.tuxfamily.org/index.php?title=FAQ
Thanks, Alex
The OpenEXR library itself doesn't rely heavily on Vector and Matrix operations in libImath, so reading/writing images on Arm processors wouldn't be accelerated much by SIMD optimizations. I believe that libImath types can be efficiently exchanged with libraries such as Eigen. Where performance is critical it might make sense to use such libraries to operate on libImath types.
Thanks a lot for the tips. I'll run some tests comparing the libraries.
I've been doing some testing with Eigen lately - Imath outperforms Eigen in my tests. Chris
Did you try any SIMD optimizations? I'm more interested to see performance tests on ARM-based mobiles. Can you share your test source code? I can do a quick test on an actual device. Alex
I made a quick comparison with the other libraries. I ran the test on my Mac (OS X 10.8, x64, Intel Core 2 Quad Q6600), testing the Eigen library's Matrix4f class.
I think there's an interesting subtext here. The Eigen library does not have an appropriate license for many users, is extremely extensive compared to a lot of common needs, and is somewhat burdensome to drag around on small projects due to its non-trivial size. The glm library is very useful where precision and correctness are a lesser concern, and its performance is being continually improved. Imath gives reasonable performance with correctness and a fairly rich set of common operations, and has a very friendly license. I think people come back to Imath again and again because it is concise, self-contained, reasonably performant, easy to use, and has known correctness double-checked by a good conformance suite.

Peter's right that there are certain bulk operations worth porting to SIMD, but those operations do not really intersect the Imath problem space, so there is not much benefit to SIMD acceleration of Imath for OpenEXR. The subtext, I think, is that Imath itself has worth independent of EXR, and optimizations there could be welcomed by the larger community as long as the promises Imath makes as to ease of use, a reasonably large set of operations, and correctness are not violated.

@sabotage3d, it's difficult to make a judgement about what you are measuring there, since you haven't shown source. Operations like matrix multiplication are typically burdened by cache misses and less so by the math operations themselves. You can skew such benchmarks one way or another by cache warming, or by contriving to keep all the operations in registers. I've had a go at SIMD-accelerating Imath with coworkers in the past, and you can make impressive speedups by contriving aligned loads and so on, but typically the compromise is that the code becomes somewhat more difficult to use, by virtue of what can be assigned to what, and of trying to get type safety for types that end up aliased, like vec3 and vec4.

So far, the attempts I've seen compromise Imath's promise of conciseness and correctness, either strongly or weakly, and would push me towards a solution like glm when I want the extra speed and reasonable dependencies. I do feel like an accelerated Imath, or an accelerated Imath-like library, would be a welcome thing, if it was still Imath after the mod.
I've done some tests, and it seems that adding an alignment annotation to Vec4f and Matrix4f helps gcc generate better SIMD code. As @meshula said, cache is the biggest problem. If you make your code look like for (..) { result[i] = matrixA[i]*matrixB[i] } then the compiler should produce quite nicely optimized code. At least gcc with -ftree-vectorize -mfpmath=sse -msse4.1 does that for me.
A sidecar header of aligned typedefs would be a nice non-intrusive addition. I imagine appropriate adornments exist for MSVC, icc, gcc, and clang, and they would need to be boiled into the right macro soup.
Hello,
I couldn't find any info on whether there is SIMD optimization based on NEON for ARM processors in IlmBase. If not, are there future plans for that?
Thanks in advance,
Alex