lib/arm/adler32: add NEON+dotprod implementation#215

Merged
ebiggers merged 2 commits into master from dotprod on Aug 8, 2022

@ebiggers ebiggers commented Aug 8, 2022

No description provided.

This improves Adler-32 performance on large inputs:

    CPU         Old (GB/s)  New (GB/s)
    =========   ==========  ==========
    Apple M1    51.5        61.1
    Cortex-X1   34.2        45.2
    Cortex-A78  16.9        23.3
    Cortex-A76  18.2        23.1
    Cortex-A55  2.4         4.1

This roughly follows the approach of the old AVX2 implementation, which
I recently replaced with a different approach.  On arm64, however,
vdotq_u32 (the udot instruction) makes the original approach work well.
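
For readers unfamiliar with why a dot-product instruction helps here, the following is a minimal portable sketch, not the code added by this PR. It assumes a 16-byte block size and reduces modulo 65521 after every block for simplicity. The names dot4_u8, adler32_simple, and adler32_dot_sketch are hypothetical, and dot4_u8 is only a scalar stand-in for one 32-bit lane of udot. The idea is that over a block of 16 bytes, s2 grows by 16*s1 plus the dot product of the bytes with the weights 16, 15, ..., 1, while s1 grows by the dot product of the bytes with all ones.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u

/* Hypothetical scalar stand-in for one 32-bit lane of udot:
 * acc + a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3]. */
static uint32_t dot4_u8(uint32_t acc, const uint8_t *a, const uint8_t *b)
{
	for (int i = 0; i < 4; i++)
		acc += (uint32_t)a[i] * b[i];
	return acc;
}

/* Reference byte-at-a-time Adler-32, used for comparison and for tails. */
static uint32_t adler32_simple(uint32_t adler, const uint8_t *p, size_t n)
{
	uint32_t s1 = adler & 0xFFFF, s2 = adler >> 16;

	while (n--) {
		s1 = (s1 + *p++) % ADLER_MOD;
		s2 = (s2 + s1) % ADLER_MOD;
	}
	return (s2 << 16) | s1;
}

/* Sketch of the dot-product formulation, 16 bytes per step (assumed size). */
static uint32_t adler32_dot_sketch(uint32_t adler, const uint8_t *p, size_t n)
{
	static const uint8_t weights[16] = {
		16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1
	};
	static const uint8_t ones[16] = {
		1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
	};
	uint32_t s1 = adler & 0xFFFF, s2 = adler >> 16;

	while (n >= 16) {
		uint32_t byte_sum = 0, weighted = 0;

		/* Four 4-byte lanes, as a 128-bit udot would process them. */
		for (int lane = 0; lane < 4; lane++) {
			byte_sum = dot4_u8(byte_sum, p + 4 * lane, ones + 4 * lane);
			weighted = dot4_u8(weighted, p + 4 * lane, weights + 4 * lane);
		}
		/* s2 sees the old s1 once per byte, plus the weighted bytes. */
		s2 = (s2 + 16 * s1 + weighted) % ADLER_MOD;
		s1 = (s1 + byte_sum) % ADLER_MOD;
		p += 16;
		n -= 16;
	}
	/* Handle the remaining 0..15 bytes one at a time. */
	return adler32_simple((s2 << 16) | s1, p, n);
}
```

A production implementation would presumably use the NEON intrinsics directly, keep the sums in vector accumulators, and reduce modulo 65521 far less often. The sketch only shows that the block update is a pair of dot products.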
@ebiggers ebiggers merged commit d8b5390 into master Aug 8, 2022
@ebiggers ebiggers deleted the dotprod branch August 8, 2022 06:25