@yuanjia111
Contributor

1. The conclusion and the detailed discussion can be found at the following links:
https://github.com/OpenMathLib/OpenBLAS/issues/5430
https://github.com/OpenMathLib/OpenBLAS/pull/5427

2. After the change, the kernel still produces correct results and performance is indeed improved. Performance was verified on K1 [C908, vlen = 256].
[three images]
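
One way to check that a rewritten kernel still behaves correctly is to compare its output against a plain scalar reference within a floating-point tolerance. The sketch below is only an illustration of that idea: the helper name `max_rel_error` and the clamping of small denominators are assumptions, not part of this PR or of the OpenBLAS test suite.

```c
#include <stddef.h>

/* Hypothetical helper (not from the PR): largest relative difference
 * between a reference result and the output of the kernel under test. */
static double max_rel_error(const float *ref, const float *out, size_t n)
{
    double worst = 0.0;
    for (size_t i = 0; i < n; i++) {
        double r = (double)ref[i];
        double diff = r - (double)out[i];
        if (diff < 0.0) diff = -diff;          /* |ref - out| */
        double denom = (r < 0.0) ? -r : r;     /* |ref| */
        if (denom < 1.0) denom = 1.0;          /* avoid dividing by tiny values */
        double err = diff / denom;
        if (err > worst) worst = err;
    }
    return worst;
}
```

A caller would run the old and new kernels on the same inputs and accept the change if `max_rel_error` stays below a tolerance suited to the precision in use.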

FLOAT_V_T va, vr, vx;
unsigned int gvl = 0;
FLOAT_V_T_M1 v_res;
size_t vlmax = VSETVL_MAX_M1();
Contributor

@ChipKerchner Sep 12, 2025

Remove unused variable - as well as VSETVL_MAX_M1

Contributor Author

Done! Thank you for your thorough review!
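
To illustrate the review request in portable terms: a local that is declared (even with an initializer) but never read is dead code, and GCC and Clang report it under `-Wunused-variable`, which `-Wall` enables. The sketch below is a scalar stand-in, not the RVV kernel from this PR; the function name and the dot-product body are assumptions chosen only so the example compiles anywhere.

```c
#include <stddef.h>

/* Illustrative stand-in (not the PR's kernel): a reduction loop after
 * the kind of cleanup the reviewer asked for. */
static float dot_after_cleanup(const float *a, const float *x, size_t n)
{
    /* Before the cleanup, a declaration such as
     *     size_t vlmax = query_max_vector_length();   (hypothetical name)
     * would sit here without ever being read. Dropping it removes both
     * the variable and the now-pointless call that initialized it. */
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += a[i] * x[i];
    }
    return acc;
}
```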

@ChipKerchner
Contributor

Looks good. I'm seeing a 4x+ speedup over the previous version.
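
For reference, a speedup figure like this is the ratio of the old run time to the new run time for the same workload. A minimal sketch, with a hypothetical helper name and made-up timings in the usage note (no measurements from this thread are reproduced):

```c
/* Hypothetical helper: speedup = time of previous version / time of new
 * version. Returns 0.0 when the new time is not positive, so the caller
 * never divides by zero. */
static double speedup(double old_seconds, double new_seconds)
{
    if (new_seconds <= 0.0) {
        return 0.0;
    }
    return old_seconds / new_seconds;
}
```

For example, if the previous kernel took 8.0 s and the new one 2.0 s on the same input, `speedup(8.0, 2.0)` evaluates to 4.0.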

@martin-frbg
Collaborator

Thank you for the quick update

@martin-frbg added this to the 0.3.31 milestone Sep 13, 2025
@martin-frbg merged commit 43fdff7 into OpenMathLib:develop Sep 13, 2025
84 of 88 checks passed

3 participants