simd4x4f: Improve the matrix multiplication
The current implementation, which is built on matrix-vector multiplication, gets the non-commutativity of matrix multiplication wrong, and it is not entirely clear either. I decided to rewrite it from scratch, starting from a naive implementation that does the classic row-times-column multiplication. Unsurprisingly, this approach is fairly slow compared to the current implementation, especially in the scalar fallback code for the SIMD types. On the plus side, the naive implementation is actually correct. Starting from that correct implementation, I reduced the code by skipping the transpose operation and computing four dot products per row instead. This approach maintains correctness, runs about twice as fast as the naive version in all cases, and brings us back to the performance level of the current implementation.
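To make the row-times-column idea concrete, here is a minimal scalar sketch in plain C. It is not the code from this commit: `vec4`, `mat4`, `vec4_dot`, `mat4_column`, `mat4_mul`, `mat4_equal` and `mat4_mul_self_check` are hypothetical names chosen for illustration, the matrix is assumed to be stored as four rows, and no SIMD intrinsics are used. Each output row is built from four dot products, and the self-check shows why operand order matters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the library's actual types. */
typedef struct { float v[4]; } vec4;    /* a 4-lane vector */
typedef struct { vec4 row[4]; } mat4;   /* a 4x4 matrix stored as four rows */

/* Dot product of two 4-lane vectors. */
static float vec4_dot(vec4 a, vec4 b)
{
    return a.v[0] * b.v[0] + a.v[1] * b.v[1] + a.v[2] * b.v[2] + a.v[3] * b.v[3];
}

/* Gather column j of m into a vector. */
static vec4 mat4_column(const mat4 *m, int j)
{
    vec4 c = {{ m->row[0].v[j], m->row[1].v[j], m->row[2].v[j], m->row[3].v[j] }};
    return c;
}

/* res = a * b: element (i, j) is row i of a dotted with column j of b,
 * so every output row costs four dot products. */
static mat4 mat4_mul(const mat4 *a, const mat4 *b)
{
    mat4 res;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            res.row[i].v[j] = vec4_dot(a->row[i], mat4_column(b, j));
    return res;
}

/* Exact element-wise comparison; fine for the small integers used below. */
static bool mat4_equal(const mat4 *a, const mat4 *b)
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            if (a->row[i].v[j] != b->row[i].v[j])
                return false;
    return true;
}

/* Self-check: a * b and b * a differ, so swapping operands is a real bug. */
static void mat4_mul_self_check(void)
{
    const mat4 a = {{ {{1, 2, 0, 0}}, {{0, 1, 0, 0}}, {{0, 0, 1, 0}}, {{0, 0, 0, 1}} }};
    const mat4 b = {{ {{1, 0, 0, 0}}, {{3, 1, 0, 0}}, {{0, 0, 1, 0}}, {{0, 0, 0, 1}} }};
    const mat4 ab = mat4_mul(&a, &b);
    const mat4 ba = mat4_mul(&b, &a);

    assert(ab.row[0].v[0] == 7.0f);   /* (1,2,0,0) . (1,3,0,0) */
    assert(ba.row[0].v[0] == 1.0f);   /* (1,0,0,0) . (1,0,0,0) */
    assert(!mat4_equal(&ab, &ba));
}
```

Calling `mat4_mul_self_check()` from a test is enough to catch an implementation that silently computes `b * a` instead of `a * b`. A SIMD version would presumably replace `vec4_dot` and `mat4_column` with vector operations, but that mapping is not shown here.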
Showing 1 changed file with 88 additions and 4 deletions.