
Optimized sgemv_t for small N based on AVX512 #3260

Merged
martin-frbg merged 1 commit into OpenMathLib:develop from intelmy:sgemv_t_opt
Jun 10, 2021


@intelmy
Contributor

@intelmy intelmy commented Jun 8, 2021

This patch optimizes sgemv_t for N = 1 to 8. The original algorithm divides M and N regardless of how large M and N are, whereas this patch provides a customized kernel for each individual N to get much better performance through vectorization.
The performance improvement is up to 8.16x, as verified on a Cascade server.
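
For readers unfamiliar with the operation, here is a minimal scalar sketch of the idea. It is not the code from this patch, which uses AVX512 and is not reproduced here. It assumes the usual transposed GEMV definition, `y[j] += alpha * dot(A[:, j], x)`, with `A` stored column-major as an M x N matrix with leading dimension `lda`. The function names `sgemv_t_ref` and `sgemv_t_small_n` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Reference form: one dot product per column of A.
 * A is column-major, m rows by n columns, leading dimension lda. */
void sgemv_t_ref(size_t m, size_t n, float alpha,
                 const float *a, size_t lda,
                 const float *x, float *y)
{
    for (size_t j = 0; j < n; j++) {
        float acc = 0.0f;
        for (size_t i = 0; i < m; i++)
            acc += a[i + j * lda] * x[i];
        y[j] += alpha * acc;
    }
}

/* Small-N form (illustrative only): for 1 <= n <= 8, make a single pass
 * over the rows, load each x[i] once, and keep one accumulator per column.
 * A vectorized kernel could hold such accumulators in SIMD registers;
 * this scalar version only shows the loop structure. */
void sgemv_t_small_n(size_t m, size_t n, float alpha,
                     const float *a, size_t lda,
                     const float *x, float *y)
{
    assert(n >= 1 && n <= 8);

    float acc[8] = {0.0f};
    for (size_t i = 0; i < m; i++) {
        const float xi = x[i];
        for (size_t j = 0; j < n; j++)
            acc[j] += a[i + j * lda] * xi;
    }
    for (size_t j = 0; j < n; j++)
        y[j] += alpha * acc[j];
}
```

Both functions compute the same result; the second only reorders the loops so that, for a small fixed N, all partial sums stay live while `x` is read once.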

@martin-frbg martin-frbg added this to the 0.3.16 milestone Jun 10, 2021
@martin-frbg martin-frbg merged commit dbba381 into OpenMathLib:develop Jun 10, 2021
@martin-frbg
Collaborator

Unfortunately there appears to be a problem with the sgemv_t microkernel for SkylakeX that you introduced with this PR, causing about 1060 failures in the LAPACK test suite. Apologies for not catching this earlier. Could you recheck your contribution? Running make lapack-test after a successful build should be sufficient to demonstrate the issue.

@martin-frbg
Collaborator

Now fixed by guowangy's #3348


2 participants