Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change dispatch breakpoint to XXH3_accumulate() #692

Closed
wants to merge 2 commits into from

Conversation

easyaspi314
Copy link
Contributor

This is in preparation for an SVE implementation. It may also improve performance for the normal x86 dispatcher.

This is in preparation for an SVE implementation
@easyaspi314
Copy link
Contributor Author

easyaspi314 commented Mar 12, 2022

error: invoking macro XXH3_ACCUMULATE_TEMPLATE_3 argument 2: empty macro arguments are undefined in ISO C90 [-Werror=pedantic]

Wow, literally 1984 1989.

size_t n; \
for (n = 0; n < nbStripes; n++ ) { \
const xxh_u8* const in = input + n*XXH_STRIPE_LEN; \
XXH_PREFETCH(in + dist); \
Copy link
Contributor

@hzhuang1 hzhuang1 Mar 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, nbStripes should be larger than 1.

if (nbStripes > 1)
    XXH_PREFETCH(in + dist);

@easyaspi314
Copy link
Contributor Author

easyaspi314 commented Mar 16, 2022

Ok that doesn't fix it. Apparently C90 doesn't let you expand empty macro parameters, which makes it pretty difficult to manage this template. Any ideas? 🤔

Also, apt seems to be 404ing.

@Cyan4973
Copy link
Owner

Ok that doesn't fix it. Apparently C90 doesn't let you expand empty macro parameters, which makes it pretty difficult to manage this template. Any ideas? 🤔

Indeed, empty variadic macros are disallowed in strict ANSI C90.

There are ways around that, but they are fairly complex.
Essentially, merge the variadic part with a fixed part,
then derive the nb of parameters in the new variadic macro,
and branch out differently depending on this total (typically 0 and non-zero).

There are blogs dedicated to this topic,
such as this one for example,
just be aware that it's a multi-stages process,
and resulting implementation is far from obvious.

@hzhuang1
Copy link
Contributor

Ok that doesn't fix it. Apparently C90 doesn't let you expand empty macro parameters, which makes it pretty difficult to manage this template. Any ideas? thinking

Also, apt seems to be 404ing.

#define XXH_TARGET_DEFAULT      __attribute__(())

How about this?

@easyaspi314
Copy link
Contributor Author

Ok that doesn't fix it. Apparently C90 doesn't let you expand empty macro parameters, which makes it pretty difficult to manage this template. Any ideas? thinking

Also, apt seems to be 404ing.

#define XXH_TARGET_DEFAULT      __attribute__(())

How about this?

Unfortunately not everything is GCC, that would be too easy 😅

@hzhuang1
Copy link
Contributor

Ok that doesn't fix it. Apparently C90 doesn't let you expand empty macro parameters, which makes it pretty difficult to manage this template. Any ideas? thinking
Also, apt seems to be 404ing.

#define XXH_TARGET_DEFAULT      __attribute__(())

How about this?

Unfortunately not everything is GCC, that would be too easy sweat_smile

But it really works on my platform. I've tried GCC7, GCC8 & GCC9 on my x86 platform.

export CFLAGS="-std=c90 -pedantic -Wno-long-long -Werror"
export CC=gcc
make clean xxhsum

@easyaspi314
Copy link
Contributor Author

Unfortunately not everything is GCC, that would be too easy 😅

But it really works on my platform. I've tried GCC7, GCC8 & GCC9 on my x86 platform.

f75.jpg

I mean that the __attribute__(()) would error MSVC as well as the other random ANSI C compilers out there that would otherwise be compatible.

@hzhuang1
Copy link
Contributor

Also, apt seems to be 404ing.

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 8eadaec..b171a4b 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -214,6 +214,7 @@ jobs:
 
     - name: apt-get install
       run: |
+        sudo apt-get update
         sudo apt-get install valgrind cppcheck
 
     - name: Environment info

It could fix the 404 error.

@easyaspi314
Copy link
Contributor Author

https://github.com/Cyan4973/xxHash/runs/5572167241?check_suite_focus=true#step:10:320

Why did s390x break this time? More strict aliasing memes uncovered by inlining?

@hzhuang1
Copy link
Contributor

https://github.com/Cyan4973/xxHash/runs/5572167241?check_suite_focus=true#step:10:320

Why did s390x break this time? More strict aliasing memes uncovered by inlining?

I think key reason is in XXH3_digest_long().

@@ -5425,10 +5463,9 @@ XXH3_digest_long (XXH64_hash_t* acc,
                             secret, state->secretLimit,
                             XXH3_accumulate, XXH3_scrambleAcc);
         /* last stripe */
-        XXH3_accumulate(acc,
-                        state->buffer + state->bufferedSize - XXH_STRIPE_LEN,
-                        secret + state->secretLimit - XXH_SECRET_LASTACC_START,
-                        1);
+        XXH3_accumulate_512(acc,
+                            state->buffer + state->bufferedSize - XXH_STRIPE_LEN,
+                            secret + state->secretLimit - XXH_SECRET_LASTACC_START);
     } else {  /* bufferedSize < XXH_STRIPE_LEN */
         xxh_u8 lastStripe[XXH_STRIPE_LEN];
         size_t const catchupSize = XXH_STRIPE_LEN - state->bufferedSize;

https://github.com/hzhuang1/xxHash/tree/new_accum1
In my branch, it could pass the build. It's just for reference.

@easyaspi314
Copy link
Contributor Author

Ah, good catch, I missed that one.

This still indicates that something is incorrect, though. I question the changes in 15ce80f, as I'm not sure GCC likes the cast. (@ellert)

Should VSX just use memcpy everywhere? It seems like GCC is impossibly strict with aliasing rules.

@easyaspi314
Copy link
Contributor Author

(still doesn't fix the empty macro issue)

@hzhuang1
Copy link
Contributor

(still doesn't fix the empty macro issue)

My solution is to create another macro without empty parameter. hzhuang1@826ddf8

@hzhuang1
Copy link
Contributor

(still doesn't fix the empty macro issue)

My solution is to create another macro without empty parameter. hzhuang1@826ddf8

@easyaspi314 How about this pull request? Do you like the way to drop empty parameter?

@easyaspi314
Copy link
Contributor Author

Not sure I would want to copy paste that .

@Cyan4973
Copy link
Owner

Cyan4973 commented May 6, 2022

This is in preparation for an SVE implementation. It may also improve performance for the normal x86 dispatcher.

Is this PR still active ?

I believe it's a good topic, though it seems to get into complexities.
Also, if it helps improve performance for x86 dispatcher, it would be great to measure it.

@Cyan4973 Cyan4973 mentioned this pull request May 10, 2022
@hzhuang1
Copy link
Contributor

hzhuang1 commented May 17, 2022

This is in preparation for an SVE implementation. It may also improve performance for the normal x86 dispatcher.

Is this PR still active ?

I believe it's a good topic, though it seems to get into complexities. Also, if it helps improve performance for x86 dispatcher, it would be great to measure it.

I tried to measure the performance on my x86 laptop. The result data is nearly same. Each data is an average value of 10 continuous samples.

  512B 1KB 2KB 4KB 8KB 16KB 32KB 64KB 128KB 256KB 512KB 1MB 2MB 4MB 8MB 16MB 32MB 64MB 128MB
base 15782.8 18045.3 18688.8 19017.5 19098.4 19360.5 19482.2 19568.4 19368.6 19590.8 19449.8 19584.3 19496.7 16841 14967.2 14904 14829 14821.2 14810.3
patch 15861.4 18074.9 18696.7 19149.9 19192.6 19420.6 19542.2 19596.6 19617.1 19566.8 19636.3 19667.8 19585.6 16922.1 14966.1 14848.5 14807.2 14869.5 14874.9

The patch only changes the interface. And it doesn't touch any key code, such as accessing less memory. So I think we need more patches to improve the performance on x86 like SVE code.

@hzhuang1
Copy link
Contributor

I think the patch is blocked on the empty macro parameters. What's our plan on it?

@hzhuang1
Copy link
Contributor

I think the patch is blocked on the empty macro parameters. What's our plan on it?

@easyaspi314 @Cyan4973 Could you help to share the idea on empty macro parameters? Obviously, it's not supported by C90.

@Cyan4973
Copy link
Owner

Cyan4973 commented Jun 7, 2022

I think the patch is blocked on the empty macro parameters. What's our plan on it?

Time flies fast, sorry.

It seems I initially misinterpreted the issue. There is no variadic macro here, it's an empty argument in a macro with fixed nb of arguments.

Unfortunately, the C90 requirement is pretty strict. An optional argument in a macro is not allowed. That requires a solution.

And btw, the solution already exists, as part of @hzhuang1 's PR #713 ,
where the XXH3_ACCUMULATE_TEMPLATE_2(sse2, XXH_TARGET_SSE2) with optionally empty XXH_TARGET_SSE2 is replaced by a more complex but also more C90 correct :

#  ifdef XXH_TARGET_SSE2
XXH3_ACCUMULATE_TEMPLATE_2(sse2, XXH_TARGET_SSE2)
#  else
XXH3_ACCUMULATE_DEF_TEMPLATE(sse2)
#    define XXH_TARGET_SSE2 XXH_TARGET_DEFAULT
#  endif

(with the requirement that XXH_TARGET_SSE2 must not be defined, not even to NULL, before that point).

So that looks like the available solution to me.
And it already passes CI tests.

@hzhuang1
Copy link
Contributor

@easyaspi314 Will you refresh this pull request? If you don't have time, I can refresh it instead.

@easyaspi314
Copy link
Contributor Author

Sorry about the ghosting, I've been really busy with life and haven't been able to focus enough to code... Can you please update it?

@hzhuang1
Copy link
Contributor

Sorry about the ghosting, I've been really busy with life and haven't been able to focus enough to code... Can you please update it?

No problem. I'll handle it.

@hzhuang1 hzhuang1 mentioned this pull request Oct 9, 2022
@hzhuang1
Copy link
Contributor

#756 is a new implementation of this pull request. Since #756 is merged, this one could be close now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants