Improve 4x4 matrix multiplication #158

lgritz · 2021-06-02T01:24:45Z

operator* and *= multiplying two Matrix44s had two efficiency problems:

1. They both involved a temporary matrix using the default
   constructor, which sets it to the identity matrix, before then
   immediately putting other values into every element. (I think the
   reason it did not use the special constructor that leaves it
   uninitialized is because that conflicted with the desire for this
   to be constexpr. Uninitialized data during its C++ visible lifetime
   is a no-no for constexpr.)
2. It relied on the multiply() helper, which took the ADDRESS of
   elements of the matrix. As discussed before in an earlier vector
   ops overhaul, this interferes with good code generation when the
   matrix multiply is inside a loop that we hope will
   autovectorize. Taking the address of an array and then using that
   as a pointer can run up against the fragility of the compiler in
   knowing when it can keep things in SIMD registers, etc.
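
For readers who do not have the old code in front of them, the two problems can be sketched roughly as follows. This is an illustration only, not the Imath source: `Mat4`, `multiply_ptr`, and `mul_old` are hypothetical names, and a flat 16-element array is assumed to keep the pointer arithmetic simple.

```cpp
#include <cassert>

// Hypothetical stand-in for a 4x4 matrix; NOT the Imath Matrix44 class.
struct Mat4
{
    float x[16]; // row-major, flat storage (an assumption of this sketch)

    // Default constructor writes the identity into all 16 elements.
    constexpr Mat4 ()
        : x{1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  0, 0, 0, 1} {}
};

// Pointer-based helper (hypothetical): operates on element addresses,
// so the compiler has to reason about memory and possible aliasing.
inline void multiply_ptr (const float* a, const float* b, float* c)
{
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
        {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a[i * 4 + k] * b[k * 4 + j];
            c[i * 4 + j] = sum;
        }
}

// The pattern described above: identity-initialize, then overwrite everything.
inline Mat4 mul_old (const Mat4& a, const Mat4& b)
{
    Mat4 tmp;                                           // problem 1: 16 wasted stores
    multiply_ptr (&a.x[0], &b.x[0], &tmp.x[0]);         // problem 2: addresses taken
    return tmp;
}

// Small usage check: multiplying by the identity leaves a matrix unchanged.
inline void example_old ()
{
    Mat4 a;
    for (int i = 0; i < 16; ++i)
        a.x[i] = float (i + 1);
    Mat4 r = mul_old (a, Mat4 ());
    assert (r.x[5] == 6.0f);
}
```

The result is correct; the cost is the redundant identity initialization and the round trip through pointers.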

The solution is as follows:

Add a new matrix() helper that takes two matrix reference params and
returns the result, and its implementation is fully inlined, no pointers
(and thus can be constexpr for C++14). This specific implementation
is backported from OSL, where it was written by Alex Wells from Intel,
who crafted it very carefully to compile to better (and more likely
to be autovectorized) code than the previous Imath code.

I implemented the other 4x4 matrix multiplies in terms of that. This
allows them to remain constexpr(14) while avoiding the unfortunate
use of the initializing constructor.
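
A minimal sketch of the shape of this solution, under stated assumptions: `Mat4` and `rowcol` are hypothetical names, and this is not the OSL/Imath implementation itself, only an illustration of a value-returning, pointer-free helper with the operators layered on top. Here `operator*=` is left as a plain inline function to keep the sketch portable to older language modes.

```cpp
#include <cassert>

// Hypothetical stand-in for a 4x4 float matrix; NOT the Imath Matrix44 class.
struct Mat4
{
    float x[4][4];

    // Identity by default.
    constexpr Mat4 ()
        : x{{1, 0, 0, 0}, {0, 1, 0, 0}, {0, 0, 1, 0}, {0, 0, 0, 1}} {}

    // Element-wise constructor: every entry gets its final value up front,
    // so no identity-then-overwrite step is needed.
    constexpr Mat4 (float a, float b, float c, float d,
                    float e, float f, float g, float h,
                    float i, float j, float k, float l,
                    float m, float n, float o, float p)
        : x{{a, b, c, d}, {e, f, g, h}, {i, j, k, l}, {m, n, o, p}} {}
};

// Row i of a times column j of b, by index only; no addresses are taken.
constexpr float rowcol (const Mat4& a, const Mat4& b, int i, int j)
{
    return a.x[i][0] * b.x[0][j] + a.x[i][1] * b.x[1][j] +
           a.x[i][2] * b.x[2][j] + a.x[i][3] * b.x[3][j];
}

// Value-returning helper: takes two references and returns the product.
constexpr Mat4 multiply (const Mat4& a, const Mat4& b)
{
    return Mat4 (
        rowcol (a, b, 0, 0), rowcol (a, b, 0, 1), rowcol (a, b, 0, 2), rowcol (a, b, 0, 3),
        rowcol (a, b, 1, 0), rowcol (a, b, 1, 1), rowcol (a, b, 1, 2), rowcol (a, b, 1, 3),
        rowcol (a, b, 2, 0), rowcol (a, b, 2, 1), rowcol (a, b, 2, 2), rowcol (a, b, 2, 3),
        rowcol (a, b, 3, 0), rowcol (a, b, 3, 1), rowcol (a, b, 3, 2), rowcol (a, b, 3, 3));
}

// The operators are thin wrappers over the helper.
constexpr Mat4 operator* (const Mat4& a, const Mat4& b)
{
    return multiply (a, b);
}

inline Mat4& operator*= (Mat4& a, const Mat4& b)
{
    a = multiply (a, b);
    return a;
}

// The helper is usable in constant expressions.
constexpr Mat4 kSample (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16);
constexpr Mat4 kProduct = multiply (Mat4 (), kSample);
static_assert (kProduct.x[1][2] == 7.0f, "identity * M should equal M");

// Small runtime usage check for operator*=.
inline void example_new ()
{
    Mat4 t = kSample;
    t *= Mat4 ();
    assert (t.x[3][3] == 16.0f);
}
```

Because the result is built directly from the sixteen computed values, there is no temporary that must first be set to the identity, and nothing in the helper forces the operands out to memory.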

Signed-off-by: Larry Gritz <lg@larrygritz.com>

cary-ilm

LGTM

cary-ilm approved these changes Jun 2, 2021

meshula approved these changes Jun 3, 2021

Merge branch 'master' into lg-matrix

3ea57be

cary-ilm merged commit b14f035 into AcademySoftwareFoundation:master Jun 3, 2021

cary-ilm added the v3.1.0 label Jul 10, 2021

lgritz deleted the lg-matrix branch September 22, 2021 17:00
