Inspired by the AVX2 discussion, I suggest the following NEON code for ARM targets:
Function:
FORCE_INLINE U32 XXH32_endian_align(const void* input, size_t len, U32 seed, XXH_endianess endian, XXH_alignment align)
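```c
if (len>=16) {
    const BYTE* const limit = bEnd - 16;
    const uint32_t initial[4] = {
        PRIME32_1 + PRIME32_2,
        PRIME32_2,
        0,
        0U - PRIME32_1
    };
    U32 v1;
    U32 v2;
    U32 v3;
    U32 v4;
    uint32x4_t vseed  = vdupq_n_u32(seed);       // vseed(0,1,2,3) = seed
    uint32x4_t prime1 = vdupq_n_u32(PRIME32_1);  // prime1(0,1,2,3) = PRIME32_1
    uint32x4_t prime2 = vdupq_n_u32(PRIME32_2);  // prime2(0,1,2,3) = PRIME32_2
    uint32x4_t v      = vld1q_u32(initial);      // read initial into vector
    uint32x4_t in;                               // renamed so it does not shadow the `input` parameter
    uint32x4_t tmp;
    v = vaddq_u32(v, vseed);                     // v(i) = seed + initial(i)
    do {
        in = vld1q_u32((const uint32_t*)p);      // load one 16-byte stripe
        p += 16;
        /* round */
        v   = vmlaq_u32(v, in, prime2);          // seed += input * PRIME32_2;
        tmp = vshrq_n_u32(v, 19);                // XXH_rotl32(seed, 13): low 13 bits = v >> 19
        v   = vsliq_n_u32(tmp, v, 13);           //   then insert v << 13 above them
        v   = vmulq_u32(v, prime1);              // seed *= PRIME32_1;
    } while (p<=limit);
    /* assumed completion, not in the posted snippet: copy the lanes back
       so the existing scalar merge can run unchanged */
    v1 = vgetq_lane_u32(v, 0);
    v2 = vgetq_lane_u32(v, 1);
    v3 = vgetq_lane_u32(v, 2);
    v4 = vgetq_lane_u32(v, 3);
    h32 = XXH_rotl32(v1, 1) + XXH_rotl32(v2, 7) + XXH_rotl32(v3, 12) + XXH_rotl32(v4, 18);
} else {
    h32 = seed + PRIME32_5;
}
```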
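To make it easier to check the rotate trick without ARM hardware, here is a rough scalar model of the vector loop. It is only a sketch: `sli32`, `round_lane`, `round_ref`, `stripes_lanes`, `stripes_ref` and `lanes_match_reference` are names made up for this illustration, not part of xxHash or of the NEON intrinsics. It assumes native-endian 32-bit reads and only covers the 16-byte stripe loop, not the tail bytes or the final avalanche.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PRIME32_1 2654435761U
#define PRIME32_2 2246822519U

/* Reference rotate, equivalent to XXH_rotl32 for 0 < r < 32. */
static uint32_t rotl32(uint32_t x, int r) { return (x << r) | (x >> (32 - r)); }

/* Hypothetical scalar model of one lane of vsliq_n_u32(a, b, n):
   shift b left by n and insert it into a, keeping the low n bits of a. */
static uint32_t sli32(uint32_t a, uint32_t b, int n)
{
    uint32_t low_mask = (1U << n) - 1U;       /* valid for n in 1..31 */
    return (b << n) | (a & low_mask);
}

/* Native-endian 32-bit read that is safe for unaligned pointers. */
static uint32_t read32(const unsigned char *p)
{
    uint32_t x;
    memcpy(&x, p, sizeof x);
    return x;
}

/* One lane of the vectorized round, step by step. */
static uint32_t round_lane(uint32_t acc, uint32_t in)
{
    uint32_t tmp;
    acc += in * PRIME32_2;                    /* models vmlaq_u32 */
    tmp  = acc >> 19;                         /* models vshrq_n_u32(v, 19) */
    acc  = sli32(tmp, acc, 13);               /* models vsliq_n_u32(tmp, v, 13) */
    acc *= PRIME32_1;                         /* models vmulq_u32 */
    return acc;
}

/* The plain scalar round, for comparison. */
static uint32_t round_ref(uint32_t acc, uint32_t in)
{
    acc += in * PRIME32_2;
    acc  = rotl32(acc, 13);
    acc *= PRIME32_1;
    return acc;
}

/* Four-lane version: initial[] plus seed, then one round per lane per stripe. */
static void stripes_lanes(const unsigned char *p, size_t len, uint32_t seed, uint32_t v[4])
{
    const uint32_t initial[4] = { PRIME32_1 + PRIME32_2, PRIME32_2, 0U, 0U - PRIME32_1 };
    size_t i;
    int lane;
    for (lane = 0; lane < 4; lane++) v[lane] = initial[lane] + seed;
    for (i = 0; i + 16 <= len; i += 16)
        for (lane = 0; lane < 4; lane++)
            v[lane] = round_lane(v[lane], read32(p + i + 4 * (size_t)lane));
}

/* Scalar version with four separate accumulators v1..v4. */
static void stripes_ref(const unsigned char *p, size_t len, uint32_t seed, uint32_t v[4])
{
    uint32_t v1 = seed + PRIME32_1 + PRIME32_2;
    uint32_t v2 = seed + PRIME32_2;
    uint32_t v3 = seed + 0U;
    uint32_t v4 = seed - PRIME32_1;
    size_t i;
    for (i = 0; i + 16 <= len; i += 16) {
        v1 = round_ref(v1, read32(p + i));
        v2 = round_ref(v2, read32(p + i + 4));
        v3 = round_ref(v3, read32(p + i + 8));
        v4 = round_ref(v4, read32(p + i + 12));
    }
    v[0] = v1; v[1] = v2; v[2] = v3; v[3] = v4;
}

/* Returns 1 if both versions leave identical accumulators, 0 otherwise. */
static int lanes_match_reference(const unsigned char *p, size_t len, uint32_t seed)
{
    uint32_t a[4], b[4];
    stripes_lanes(p, len, seed, a);
    stripes_ref(p, len, seed, b);
    return memcmp(a, b, sizeof a) == 0;
}
```

Calling `lanes_match_reference(buf, len, seed)` on a few buffers and seeds should return 1 if the shift-right plus shift-left-insert pair really is a rotate by 13. This says nothing about speed; it only checks the arithmetic.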
On a Zynq (Cortex-A9) it nearly doubles the speed.
PS: I am new to GitHub, so please bear with me.