operator* and *= multiplying two Matrix44s had two efficiency problems:

1. They both involved a temporary matrix using the default
   constructor, which sets it to the identity matrix, before then
   immediately putting other values into every element. (I think the
   reason it did not use the special constructor that leaves it
   uninitialized is that doing so conflicted with the desire for this
   to be constexpr. Uninitialized data during its C++ visible lifetime
   is a no-no for constexpr.)

2. They relied on the multiply() helper, which took the ADDRESS of
   elements of the matrix. As discussed in an earlier vector
   ops overhaul, this interferes with good code generation when the
   matrix multiply is inside a loop that we hope will
   autovectorize. Taking the address of an array and then using that
   as a pointer can run up against the fragility of the compiler in
   knowing when it can keep things in SIMD registers, etc.
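
To make the two problems concrete, here is a minimal sketch of the kind of pattern being described. It is not the actual Imath source: `Mat4Old`, `multiply_ptr`, and `usage_old` are hypothetical names, and the triple loop is a simplification chosen only to show where the identity fill and the pointer access happen.

```cpp
#include <cassert>

// Hypothetical stand-in for a 4x4 matrix type; not the actual Imath class.
struct Mat4Old {
    float x[4][4];

    // Default constructor writes the identity into all 16 elements.
    Mat4Old() {
        for (int i = 0; i < 4; ++i)
            for (int j = 0; j < 4; ++j)
                x[i][j] = (i == j) ? 1.0f : 0.0f;
    }
};

// Pointer-based helper (assumed shape): it takes the address of the
// element storage and indexes through raw pointers. c must not alias a or b.
inline void multiply_ptr(const Mat4Old& a, const Mat4Old& b, Mat4Old& c) {
    const float* ap = &a.x[0][0];
    const float* bp = &b.x[0][0];
    float* cp = &c.x[0][0];
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += ap[i * 4 + k] * bp[k * 4 + j];
            cp[i * 4 + j] = sum;
        }
    }
}

inline Mat4Old operator*(const Mat4Old& a, const Mat4Old& b) {
    Mat4Old tmp;              // problem 1: identity is written, then overwritten
    multiply_ptr(a, b, tmp);  // problem 2: the work goes through raw pointers
    return tmp;
}

// Small usage check: multiplying by the identity leaves the values unchanged.
inline void usage_old() {
    Mat4Old a;
    a.x[0][1] = 2.0f;
    Mat4Old r = a * Mat4Old();
    assert(r.x[0][1] == 2.0f);
}
```

Whether a given compiler actually fails to vectorize code like this depends on the compiler and the surrounding loop; the sketch only shows the two features the description identifies as risky.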

The solution is as follows:

1. Add a new matrix() helper that takes two matrix reference params and
   returns the result. Its implementation is fully inlined and uses no
   pointers (and thus can be constexpr for C++14). This specific
   implementation is backported from OSL, where it was written by Alex
   Wells from Intel, who crafted it very carefully to compile to better
   (and more likely to be autovectorized) code than the previous Imath
   code.

2. I implemented the other 4x4 matrix multiplies in terms of that. This
   allows them to remain constexpr(14) while avoiding the unfortunate
   use of the initializing constructor.
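
The shape of the fix can be sketched as follows. This is an illustration under assumptions, not the backported OSL/Imath implementation: `Mat4`, `row_times_col`, `multiply_by_value`, and `usage_new` are hypothetical names. The sketch shows a helper that takes two references, returns the product by value through an element-wise constructor, never takes an element's address, and stays usable in constant expressions under C++14.

```cpp
#include <cassert>

// Hypothetical stand-in for a 4x4 float matrix; not the actual Imath class.
struct Mat4 {
    float x[4][4];

    // The default constructor still yields the identity and is constexpr.
    constexpr Mat4()
        : x{{1.0f, 0.0f, 0.0f, 0.0f},
            {0.0f, 1.0f, 0.0f, 0.0f},
            {0.0f, 0.0f, 1.0f, 0.0f},
            {0.0f, 0.0f, 0.0f, 1.0f}} {}

    // Element-wise constructor: each element receives its final value once.
    constexpr Mat4(float m00, float m01, float m02, float m03,
                   float m10, float m11, float m12, float m13,
                   float m20, float m21, float m22, float m23,
                   float m30, float m31, float m32, float m33)
        : x{{m00, m01, m02, m03},
            {m10, m11, m12, m13},
            {m20, m21, m22, m23},
            {m30, m31, m32, m33}} {}
};

// One element of the product: row i of a times column j of b.
constexpr float row_times_col(const Mat4& a, const Mat4& b, int i, int j) {
    return a.x[i][0] * b.x[0][j] + a.x[i][1] * b.x[1][j] +
           a.x[i][2] * b.x[2][j] + a.x[i][3] * b.x[3][j];
}

// Takes two references and returns the result; no identity-filled temporary
// and no pointers into the element arrays.
constexpr Mat4 multiply_by_value(const Mat4& a, const Mat4& b) {
    return Mat4(
        row_times_col(a, b, 0, 0), row_times_col(a, b, 0, 1),
        row_times_col(a, b, 0, 2), row_times_col(a, b, 0, 3),
        row_times_col(a, b, 1, 0), row_times_col(a, b, 1, 1),
        row_times_col(a, b, 1, 2), row_times_col(a, b, 1, 3),
        row_times_col(a, b, 2, 0), row_times_col(a, b, 2, 1),
        row_times_col(a, b, 2, 2), row_times_col(a, b, 2, 3),
        row_times_col(a, b, 3, 0), row_times_col(a, b, 3, 1),
        row_times_col(a, b, 3, 2), row_times_col(a, b, 3, 3));
}

// The operators are thin wrappers over the helper.
constexpr Mat4 operator*(const Mat4& a, const Mat4& b) {
    return multiply_by_value(a, b);
}

constexpr Mat4& operator*=(Mat4& a, const Mat4& b) {
    a = multiply_by_value(a, b);  // product is fully computed before assignment
    return a;
}

// The helper can be evaluated at compile time.
static_assert(multiply_by_value(Mat4(), Mat4()).x[2][2] == 1.0f,
              "identity * identity should be the identity");

// Small usage check at run time.
inline void usage_new() {
    Mat4 a(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16);
    Mat4 sq = a * a;
    assert(sq.x[0][0] == 90.0f);  // 1*1 + 2*5 + 3*9 + 4*13
}
```

Routing every product through one value-returning helper means the operators share a single code path, and the compiler sees plain scalar expressions rather than loads and stores through pointers.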

Signed-off-by: Larry Gritz <lg@larrygritz.com>