Optimized sgemv_t for small N based on AVX512 by intelmy · Pull Request #3260 · OpenMathLib/OpenBLAS

intelmy · 2021-06-08T10:37:38Z

This patch optimizes sgemv_t for N=1-8. The original algorithm divides M and N no matter how big M and N are. This patch customizes the kernel for each individual N size to get much better performance with vectorization.
The performance improvement is up to 8.16x, as verified on a Cascade server.
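
For readers unfamiliar with the operation, here is a minimal scalar sketch of what sgemv_t computes (y += alpha * A^T * x, with A stored column-major), followed by a per-N specialization of the kind the description refers to. It is an illustration only, not the code from this PR: the function names `sgemv_t_ref` and `sgemv_t_n2` are hypothetical, and the actual kernel uses AVX512 rather than plain C loops.

```c
#include <stddef.h>

/* Reference sgemv_t: for each column j of the column-major m-by-n matrix A
 * (leading dimension lda), y[j] += alpha * dot(A[:, j], x). */
static void sgemv_t_ref(size_t m, size_t n, float alpha,
                        const float *a, size_t lda,
                        const float *x, float *y)
{
    for (size_t j = 0; j < n; j++) {
        float acc = 0.0f;
        for (size_t i = 0; i < m; i++)
            acc += a[j * lda + i] * x[i];
        y[j] += alpha * acc;
    }
}

/* Hypothetical specialization for N == 2 (not the PR's kernel): a single
 * pass over x feeds two accumulators, so x is loaded once per row and the
 * loop over M is the only loop left for the compiler to vectorize. */
static void sgemv_t_n2(size_t m, float alpha,
                       const float *a, size_t lda,
                       const float *x, float *y)
{
    float acc0 = 0.0f, acc1 = 0.0f;
    for (size_t i = 0; i < m; i++) {
        float xi = x[i];
        acc0 += a[i] * xi;        /* column 0 */
        acc1 += a[lda + i] * xi;  /* column 1 */
    }
    y[0] += alpha * acc0;
    y[1] += alpha * acc1;
}
```

With N fixed at compile time, the column loop disappears and all of the work sits in the long M dimension, which is presumably where a customized kernel gets its benefit from vectorization.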

martin-frbg · 2021-07-13T18:57:27Z

Unfortunately there appears to be a problem with the sgemv_t microkernel for SkylakeX that you introduced with this PR, causing about 1060 failures in the LAPACK testsuite. Apologies for not catching this earlier, and I wonder if you could recheck your contribution? Running make lapack-test after a successful build should be sufficient to demonstrate the issue.

martin-frbg · 2021-08-25T20:45:52Z

Now fixed by guowangy's #3348

Optimized sgemv_t for small N based on AVX512

706a08d

martin-frbg added this to the 0.3.16 milestone Jun 10, 2021

martin-frbg merged commit dbba381 into OpenMathLib:develop Jun 10, 2021

martin-frbg mentioned this pull request Jul 14, 2021

Temporarily disable the SkylakeX sgemv_t microkernel #3312

Merged

martin-frbg merged 1 commit into OpenMathLib:develop from intelmy:sgemv_t_opt
