Change dispatch breakpoint to XXH3_accumulate() #692
This is in preparation for an SVE implementation
Wow, literally |
    size_t n; \
    for (n = 0; n < nbStripes; n++ ) { \
        const xxh_u8* const in = input + n*XXH_STRIPE_LEN; \
        XXH_PREFETCH(in + dist); \
In this case, nbStripes should be larger than 1.
if (nbStripes > 1)
XXH_PREFETCH(in + dist);
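One way to read the suggestion above, as a minimal self-contained sketch: the prefetch is issued only when more than one stripe is processed. All `DEMO_*` names, the stripe size, the prefetch distance, and the byte-summing loop body are hypothetical stand-ins chosen for illustration; they are not the xxHash definitions.

```c
#include <stddef.h>

#define DEMO_STRIPE_LEN    64   /* assumed stripe size in bytes */
#define DEMO_PREFETCH_DIST 384  /* assumed prefetch distance in bytes */

/* Hypothetical prefetch wrapper: a compiler builtin where available, a no-op elsewhere. */
#if defined(__GNUC__)
#  define DEMO_PREFETCH(ptr) __builtin_prefetch((ptr), 0, 3)
#else
#  define DEMO_PREFETCH(ptr) ((void)(ptr))
#endif

/* Sums every byte of nbStripes consecutive stripes (a stand-in for real work).
 * When nbStripes > 1, the buffer must extend at least DEMO_PREFETCH_DIST bytes
 * past the last stripe so that `in + DEMO_PREFETCH_DIST` stays inside it. */
static unsigned long demo_sum_stripes(const unsigned char* input, size_t nbStripes)
{
    unsigned long sum = 0;
    size_t n;
    for (n = 0; n < nbStripes; n++) {
        const unsigned char* const in = input + n * DEMO_STRIPE_LEN;
        size_t i;
        if (nbStripes > 1) {
            /* Only hint the cache when several stripes are being walked. */
            DEMO_PREFETCH(in + DEMO_PREFETCH_DIST);
        }
        for (i = 0; i < DEMO_STRIPE_LEN; i++) {
            sum += in[i];
        }
    }
    return sum;
}
```

The guard is loop-invariant, so a compiler can hoist it; whether it helps in practice would need measuring.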
Ok that doesn't fix it. Apparently C90 doesn't let you expand empty macro parameters, which makes it pretty difficult to manage this template. Any ideas? 🤔 Also, apt seems to be 404ing. |
Indeed, empty variadic macros are disallowed in strict ANSI C90. There are ways around that, but they are fairly complex. There are blogs dedicated to this topic, |
How about this? |
Unfortunately not everything is GCC, that would be too easy 😅 |
But it really works on my platform. I've tried GCC7, GCC8 & GCC9 on my x86 platform.
It could fix the 404 error. |
https://github.com/Cyan4973/xxHash/runs/5572167241?check_suite_focus=true#step:10:320 Why did s390x break this time? More strict aliasing memes uncovered by inlining? |
I think the key reason is in XXH3_digest_long().
https://github.com/hzhuang1/xxHash/tree/new_accum1 |
015a360 to fd1a60a
(still doesn't fix the empty macro issue) |
My solution is to create another macro without the empty parameter. hzhuang1@826ddf8 |
@easyaspi314 What do you think of this pull request? Do you like this way of dropping the empty parameter? |
Not sure I would want to copy-paste that. |
Is this PR still active? I believe it's a good topic, though it seems to get into complexities. |
I tried to measure the performance on my x86 laptop. The results are nearly the same. Each data point is the average of 10 consecutive samples.
The patch only changes the interface; it doesn't touch any performance-critical code, for example by reducing memory accesses. So I think we need more patches to improve performance on x86, as with the SVE code. |
I think the patch is blocked on the empty macro parameters. What's our plan for it? |
@easyaspi314 @Cyan4973 Could you share your thoughts on the empty macro parameters? Evidently, they're not supported by C90. |
Time flies fast, sorry. It seems I initially misinterpreted the issue. There is no variadic macro here; it's an empty argument in a macro with a fixed number of arguments. Unfortunately, the
And btw, the solution already exists, as part of @hzhuang1's PR #713,
# ifdef XXH_TARGET_SSE2
XXH3_ACCUMULATE_TEMPLATE_2(sse2, XXH_TARGET_SSE2)
# else
XXH3_ACCUMULATE_DEF_TEMPLATE(sse2)
# define XXH_TARGET_SSE2 XXH_TARGET_DEFAULT
# endif
(with the requirement that
So that looks like the available solution to me. |
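The two-template idea can be sketched in isolation. C99 permits empty macro arguments, but C90 leaves them undefined, so the sketch provides one template that takes an attribute and one that takes none, and never passes an empty argument. Every `DEMO_*` and `demo_*` name, the toy kernels, and the use of `noinline` as the attribute are assumptions made for illustration; this is not the code from PR #713.

```c
#include <stddef.h>

#define DEMO_STRIPE_LEN 64  /* assumed stripe size in bytes */
#define DEMO_ACC_NB     8   /* assumed number of accumulator lanes */

/* Two toy per-stripe kernels that compute the same result in different orders. */
static void demo_accumulate_512_scalar(unsigned long* acc, const unsigned char* in)
{
    size_t i;
    for (i = 0; i < DEMO_STRIPE_LEN; i++) acc[i % DEMO_ACC_NB] += in[i];
}

static void demo_accumulate_512_fast(unsigned long* acc, const unsigned char* in)
{
    size_t lane, k;
    for (lane = 0; lane < DEMO_ACC_NB; lane++)
        for (k = lane; k < DEMO_STRIPE_LEN; k += DEMO_ACC_NB)
            acc[lane] += in[k];
}

/* Shared body: defines demo_accumulate_<name>(), which loops over stripes. */
#define DEMO_ACCUMULATE_BODY(name)                                        \
static void demo_accumulate_##name(unsigned long* acc,                    \
                                   const unsigned char* input,            \
                                   size_t nbStripes)                      \
{                                                                         \
    size_t n;                                                             \
    for (n = 0; n < nbStripes; n++) {                                     \
        demo_accumulate_512_##name(acc, input + n * DEMO_STRIPE_LEN);     \
    }                                                                     \
}

/* Variant with an attribute, and variant without: no empty argument is needed. */
#define DEMO_ACCUMULATE_TEMPLATE_2(name, attr) attr DEMO_ACCUMULATE_BODY(name)
#define DEMO_ACCUMULATE_DEF_TEMPLATE(name)     DEMO_ACCUMULATE_BODY(name)

/* Hypothetical attribute, defined only on compilers that support it. */
#if defined(__GNUC__)
#  define DEMO_TARGET_FAST __attribute__((noinline))
#endif

#ifdef DEMO_TARGET_FAST
DEMO_ACCUMULATE_TEMPLATE_2(fast, DEMO_TARGET_FAST)
#else
DEMO_ACCUMULATE_DEF_TEMPLATE(fast)
#endif
DEMO_ACCUMULATE_DEF_TEMPLATE(scalar)
```

Factoring the loop into a shared body macro keeps the two templates from drifting apart, which addresses the copy-paste concern raised earlier in the thread, at the cost of one more macro layer.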
@easyaspi314 Will you refresh this pull request? If you don't have time, I can refresh it instead. |
Sorry about the ghosting, I've been really busy with life and haven't been able to focus enough to code... Can you please update it? |
No problem. I'll handle it. |
This is in preparation for an SVE implementation. It may also improve performance for the normal x86 dispatcher.