Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streaming hashing is broken #816

Closed
smarchini opened this issue Mar 7, 2023 · 13 comments
Closed

Streaming hashing is broken #816

smarchini opened this issue Mar 7, 2023 · 13 comments
Labels

Comments

@smarchini
Copy link

Since 44bb94e, incremental hashing (i.e., the XXH*_update functions) produce a different output than the stateless functions. I tested them in their 64bits variant, here's how to reproduce:

#include <stdio.h>
#include <stdlib.h>
#include <xxhash.h>

#define MAX 4096

int main() {
    char *string = malloc(MAX);
    for (size_t i = 0; i < MAX; i++) string[i] = rand() & 0xffff;

    for (size_t len = 0; len < MAX; len++) {
        XXH64_hash_t direct_hash = XXH3_64bits(string, len);

        XXH3_state_t *state = XXH3_createState();
        XXH3_64bits_reset(state);
        XXH3_64bits_update(state, string, len);
        XXH64_hash_t streaming_hash = XXH3_64bits_digest(state);

        if (direct_hash != streaming_hash) printf("%zu\n", len);
    }

    free(string);
    return 0;
}

The first mismatch happens at 2049 bytes, then they realign after 64 bytes. The the second mismatch is happens at 3073 bytes, so 1024 bytes after the first one. The pattern repeats. In 3f5c75c everything works fine.

@t-mat
Copy link
Contributor

t-mat commented Mar 7, 2023

Hi @smarchini ,
Thanks for the report!

I've also confirmed your issue.
Actually, we have sanity check. But as your test shown, it's length set is incomplete.

@Cyan4973
Copy link
Owner

Cyan4973 commented Mar 7, 2023

Great point indeed,
we should enhance the sanity check with a loop test over a wider range of lengths.

Just verifying that one-shot and streaming variants output the same value could be done very quickly, without even a need to pre-record the expected values.
Then, in a second step, we could also generate the expected values, making the test even more strict.

@Cyan4973 Cyan4973 added the bug label Mar 7, 2023
@t-mat
Copy link
Contributor

t-mat commented Mar 7, 2023

I copied @smarchini's test as a gist file for testing.
We can use it with the following commands:

cd
git clone https://github.com/Cyan4973/xxHash.git xxHash-issue-816
cd xxHash-issue-816

curl -JOL https://gist.githubusercontent.com/t-mat/63b391d628980d5d276a002cc578136f/raw/xxHash-issue-816.c

git checkout 3f5c75c
git log -1
cc xxHash-issue-816.c && ./a.out && echo OK || echo Error 
# OK

git checkout 44bb94e
git log -1
cc xxHash-issue-816.c && ./a.out && echo OK || echo Error 
# Error

@easyaspi314
Copy link
Contributor

easyaspi314 commented Mar 7, 2023

Hmmm...looks like an off by one then. I'll take a look.

This is totally my fault sooo 😅

So the error seems to happen when nbStripes > 16 && nbStripes % 16 == 0.

@easyaspi314
Copy link
Contributor

easyaspi314 commented Mar 7, 2023

Got it, it should be while (nbStripes >= nbStripesPerBlock); instead of while (nbStripes > nbStripesPerBlock);. PR is up.

@t-mat
Copy link
Contributor

t-mat commented Mar 7, 2023

we should enhance the sanity check with a loop test over a wider range of lengths.

yeah we really need this test for QEMU platforms.

@t-mat
Copy link
Contributor

t-mat commented Mar 7, 2023

Draft version of sanity check for direct vs streaming hash functions.

  • Should we have larger SANITY_BUFFER_SIZE to ease to detect pattern?
  • Should we reuse XXH*_state_t ?
// cli/xsum_sanity_check.c
static void XSUM_XXH32_testDirectVsStreaming(const void* data, size_t dataSizeInBytes, XXH32_hash_t seed) {
    XXH32_hash_t directHash;
    XXH32_hash_t streamingHash;

    XXH32_state_t *state = XXH32_createState();
    XXH32_reset(state, seed);
    XXH32_update(state, data, dataSizeInBytes);
    streamingHash = XXH32_digest(state);
    XXH32_freeState(state);

    directHash = XXH32(data, dataSizeInBytes, seed);

    if (directHash != streamingHash) {
        XXH32_canonical_t directCanonical;
        XXH32_canonical_t streamingCanonical;
        const unsigned char* du8s = directCanonical.digest;
        const unsigned char* su8s = streamingU8s.digest;

        XXH32_canonicalFromHash(&directCanonical, directHash);
        XXH32_canonicalFromHash(&streamingCanonical, streamingHash);

        XSUM_log("\rError: Simple direct vs streaming loop test: %zd bytes, Internal sanity check failed. \n", dataSizeInBytes);
        XSUM_log("\rDirect { 0x%02X, 0x%02X, 0x%02X, 0x%02X }, Streaming { 0x%02X, 0x%02X, 0x%02X, 0x%02X } \n",
                du8s[0], du8s[1], du8s[2], du8s[3],
                su8s[0], su8s[1], su8s[2], su8s[3] );
        exit(1);
    }
}


XSUM_API void XSUM_sanityCheck(void)
{
    size_t i;
#define SANITY_BUFFER_SIZE 2367
    XSUM_U8 sanityBuffer[SANITY_BUFFER_SIZE];
    ...

    /* XXH32 simple loop test for direct and streaming API */
    for (i = 0; i < SANITY_BUFFER_SIZE; i++) {
        XSUM_XXH32_testDirectVsStreaming(sanityBuffer, i, 0);
    }

    ...
}

@Cyan4973
Copy link
Owner

Cyan4973 commented Mar 8, 2023

Should we have larger SANITY_BUFFER_SIZE to ease to detect pattern?

Possibly, though note that the current size would have been large enough to detect issue #816.

There must be a limit anyway. I'm slightly concerned about test runtime on weaker systems if the nb of lengths to test becomes really high. So an extended limit should probably stay within the ~ 4 KB range. Maybe 4 KB + 64 + 1 for example.

Should we reuse XXH*_state_t ?

Possibly,
this seems like a minor speed optimization,
the more important part is that it would ensure that state reuse works well too.

@easyaspi314
Copy link
Contributor

easyaspi314 commented Mar 8, 2023

There must be a limit anyway. I'm slightly concerned about test runtime on weaker systems if the nb of lengths to test becomes really high.

I mean you say that but you also "conservatively" assume 300 MB/s in the benchmark that this test would presumably run with. If I go to an extreme (my 1.5 GHz Cortex-A53 running s390x -O0 in qemu) I get as low as 10 MB/s. 🤷‍♀️

I say that we should assume that you are running the test on a competent platform or make the self test separate from xxhsum -b.

Edit: XD
(Also yes for some reason my tablet's kernel is an aarch64 kernel reporting armv7l in uname and it isn't a 32-on-64 userspace 🤨)

Screenshot_20230308-120521.png

Also we have TIMELOOP_MIN, we could set nbh_perIteration low and let it auto-adjust so we don't take minutes on extreme scenarios like -O0 on qemu on a potato processor while still being able to get an accurate result on beefy processors.

@Cyan4973
Copy link
Owner

Cyan4973 commented Mar 9, 2023

We should probably set it as a dedicated test program, so that we are free of runtime considerations within xxhsum.
Also, if we start to record all hash values to compare too, this will result in a non-negligible increase of binary size.
So better make it a test program, and make it part of make check.

@t-mat
Copy link
Contributor

t-mat commented Mar 9, 2023

As a starting point, I wrote dumb (and huge) test vector generator:
https://gist.github.com/t-mat/0c83f0beb83a12ef3f1e16e64891465d

Please let me know we can go ahead this direction or not.

Also, should we separate test vector generator and actual sanity test program?

Cyan4973 added a commit that referenced this issue Mar 9, 2023
Fix off-by-one in XXH3_consumeStripes() (Fixes #816)
Cyan4973 added a commit that referenced this issue Mar 9, 2023
to detect off-by-one scrambling error like #816
@Cyan4973
Copy link
Owner

Cyan4973 commented Mar 9, 2023

As a starting point, I wrote dumb (and huge) test vector generator: https://gist.github.com/t-mat/0c83f0beb83a12ef3f1e16e64891465d

Please let me know we can go ahead this direction or not.

This is a great start @t-mat !

Also, should we separate test vector generator and actual sanity test program?

Probably ?
The test vector generator is only used during development time (possibly only once).
Its outcome is used to update the sanity test program, which is then run every time make check is invoked.

@t-mat
Copy link
Contributor

t-mat commented Mar 9, 2023

@Cyan4973 , @easyaspi314
As for sanity test, I opened new issue #821.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants