@ChipKerchner
Contributor

Add vectorized packing for FP16 and BF16, giving up to a 3x improvement.

Reactivate vector packing for transposed FP64. The slowdown in the previous MR turned out to come from the use of vector segment load/store instructions, which are slow on some platforms for FP64.
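For readers unfamiliar with the term, a segment store writes the fields held in several vector registers interleaved into memory. The scalar C sketch below models that interleaving and an equivalent formulation that makes one strided pass per field. It is not the code from this PR, and the PR does not state what replaced the segment accesses; the function names are hypothetical, and RVV intrinsics are left out so the sketch compiles anywhere.

```c
#include <stddef.h>

/* Hypothetical scalar model of a segment store: for each element index i,
   the nf fields src[f][i] are written next to each other in dst. */
static void pack_segment_style(const double *const *src, size_t nf,
                               size_t n, double *dst)
{
    for (size_t i = 0; i < n; i++)
        for (size_t f = 0; f < nf; f++)
            dst[i * nf + f] = src[f][i];
}

/* Hypothetical alternative: one pass per field, writing with a stride of
   nf elements. It fills dst with exactly the same layout as above. */
static void pack_strided_style(const double *const *src, size_t nf,
                               size_t n, double *dst)
{
    for (size_t f = 0; f < nf; f++)
        for (size_t i = 0; i < n; i++)
            dst[f + i * nf] = src[f][i];
}
```

Both functions produce the same packed buffer. Which access pattern is faster on real hardware depends on the platform, which is the point made above about FP64.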

@ChipKerchner
Contributor Author

#5457

@ChipKerchner changed the title from "Add vectorized packing for FP16 and BF16. Reactivate vector packing for FP64 transposed" to "Add vectorized packing for FP16 and BF16 for RISC-V. Reactivate vector packing for FP64 transposed" on Sep 26, 2025
@martin-frbg added this to the 0.3.31 milestone on Sep 30, 2025
@martin-frbg merged commit aaa5c37 into OpenMathLib:develop on Sep 30, 2025
84 of 88 checks passed
2 participants