
Further shortcuts for (small) cases that do not need buffer allocation #3252

Merged: martin-frbg merged 4 commits into OpenMathLib:develop from martin-frbg:more_shortcuts on Jun 15, 2021

martin-frbg commented May 27, 2021

For problem sizes that are too small to benefit from multithreading, we can also skip the locking-intensive allocation and freeing of a temporary buffer if the data does not need compacting or sorting. This PR speeds up the single and double precision real versions of GER, SPR, SPR2, SYR2 and TRSV as well as the single and double precision complex versions of SYR and TRSV.
On x86_64, the same method should be applicable to SYMV, as none of the present kernels use the buffer array under these circumstances, but this does not hold for all architectures.
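The shortcut described above can be sketched in C. This is a minimal illustration under stated assumptions, not OpenBLAS's actual interface code. `dger_sketch`, `ger_kernel`, `compact` and `SMALL_PROBLEM_LIMIT` are hypothetical names, the threshold value is invented, only positive strides are handled, and `malloc`/`free` stand in for the library's lock-protected buffer allocator. The sketch shows the condition from the description. When both vectors already have unit stride (nothing to compact) and the problem is small, the kernel is called with a `NULL` buffer and no allocation happens. That is only safe because this particular kernel never touches its buffer argument, which is the per-architecture property the SYMV remark depends on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cut-off (number of matrix elements) below which the
   problem is treated as too small to benefit from threading. */
#define SMALL_PROBLEM_LIMIT 4096L

/* Hypothetical unit-stride kernel: A += alpha * x * y^T, column-major.
   The scratch 'buffer' argument is accepted but never touched, which is
   the property that makes passing NULL legal on the fast path. */
static void ger_kernel(long m, long n, double alpha,
                       const double *x, const double *y,
                       double *a, long lda, double *buffer)
{
    (void)buffer;
    for (long j = 0; j < n; j++) {
        const double t = alpha * y[j];
        for (long i = 0; i < m; i++)
            a[i + j * lda] += t * x[i];
    }
}

/* Gather a strided vector (positive stride assumed) into contiguous storage. */
static void compact(long len, const double *src, long inc, double *dst)
{
    for (long i = 0; i < len; i++)
        dst[i] = src[i * inc];
}

/* Returns 0 if no buffer was allocated, 1 if one was, -1 on allocation failure. */
int dger_sketch(long m, long n, double alpha,
                const double *x, long incx,
                const double *y, long incy,
                double *a, long lda)
{
    assert(incx > 0 && incy > 0);          /* sketch: positive strides only */
    assert(lda >= (m > 1 ? m : 1));        /* BLAS rule: lda >= max(1, m) */
    if (m <= 0 || n <= 0 || alpha == 0.0)
        return 0;                          /* quick return, nothing to do */

    /* Shortcut: small problem and data already contiguous, so skip the buffer. */
    if (incx == 1 && incy == 1 && m * n <= SMALL_PROBLEM_LIMIT) {
        ger_kernel(m, n, alpha, x, y, a, lda, NULL);
        return 0;
    }

    /* General path: allocate scratch and compact the inputs into it
       (a real library might also split the work across threads here). */
    double *buffer = malloc((size_t)(m + n) * sizeof *buffer);
    if (buffer == NULL)
        return -1;
    compact(m, x, incx, buffer);
    compact(n, y, incy, buffer + m);
    ger_kernel(m, n, alpha, buffer, buffer + m, a, lda, NULL);
    free(buffer);
    return 1;
}
```

For example, a 2×2 update with unit-stride `x` and `y` takes the fast path and returns 0, while the same update with `incx = 2` goes through the compaction path and returns 1. Both produce the same matrix.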

martin-frbg added this to the 0.3.16 milestone on Jun 15, 2021
martin-frbg merged commit baf03a0 into OpenMathLib:develop on Jun 15, 2021
martin-frbg added a commit that referenced this pull request Oct 20, 2021
Revert wrong ZTRSV optimization from #3252
martin-frbg added a commit that referenced this pull request Oct 25, 2021
Revert invalid trsv shortcut from PR #3252