
Collisions in mid-range keys #261

Closed
easyaspi314 opened this issue Sep 30, 2019 · 15 comments

Comments

@easyaspi314
Contributor

easyaspi314 commented Sep 30, 2019

Fucking multiply by zero memes in XXH3_mix16B

#include "xxhash.c"
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main() {
    unsigned long long array[4] = {0};
    srand(time(NULL));
    /* Match lanes 0 and 2 to the default secret so that (input ^ secret)
     * is zero in XXH3_mix16B, multiplying the whole mix by zero and
     * discarding array[1] and array[3] entirely. */
    array[0] = 0xbe4ba423396cfeb8ULL; // kSecret[0-7]
    array[2] = 0xdb979083e96dd4deULL; // kSecret[16-23]
    for (int i = 0; i < 1500; i++) {
        for (int j = 0; j < 4; j++)
            printf("%016llx", array[j]);
        // every line prints the same hash despite the random lanes
        printf(": %016llx\n", (unsigned long long) XXH3_64bits(array, sizeof(array)));
        array[1] = rand() | ((unsigned long long)rand() << 32);
        array[3] = rand() | ((unsigned long long)rand() << 32);
    }
}
@easyaspi314
Contributor Author

easyaspi314 commented Sep 30, 2019

These collisions seem to happen with 32-byte keys, and with every key length from 129 to 240 bytes (although only the first 64 bytes of the secret can be exploited this way).

@Cyan4973
Owner

Cyan4973 commented Sep 30, 2019

Are we back to the discussion that an attacker who knows the secret key can manufacture a collision?
xxhash, being non-cryptographic, is not designed to protect against such a case.

@easyaspi314
Contributor Author

easyaspi314 commented Sep 30, 2019

Well then what was the point of f2163b8, cd3ab01, a5d5bf7, and 64f95e4?

I mean sure, we can't expect it to be perfect, but it shouldn't be that easy to generate infinite collisions once the seed is determined, especially since there is a "default seed" version.

@Cyan4973
Owner

Cyan4973 commented Sep 30, 2019

The major difference in v0.7.1 is that the secret is now a user-controlled value, making it possible for an implementer to deny an external actor knowledge of said secret. As long as the secret is not known, it's a lot more difficult to generate any collision.

The UMAC kernel works on 32-bit fields.
It's not okay for one 32-bit field to cancel another 32-bit field
if the goal is to produce a 64-bit hash :
a random change can produce this impact with a probability of 1 / 2^32, which is too high for a 64-bit hash.

In contrast, in the issue mentioned here, it's required to reach the exact 64-bit value of the secret in order to produce the cancellation effect. It translates into a 1 / 2^64 chance of this happening accidentally (accidentally, because we are talking about a non-cryptographic hash). That seems fine for a 64-bit hash.

@easyaspi314
Contributor Author

Well, if we don't want to leak the secret, we should mix it up more. Right now, this is all you need to do to get quintillions of collisions:

#include <stdio.h>
#include <inttypes.h>
#include <stdlib.h>
#include <time.h>
#include <stdint.h>
#include "xxhash.c"
static const uint64_t INV_PRIME32_1 = 14962265741255716689ULL; /* inverse of XXH_PRIME32_1 mod 2^64 */
static const uint64_t INV_PRIME64_2 = 839798700976720815ULL;   /* inverse of XXH_PRIME64_2 mod 2^64 */
static const uint64_t INV_PRIME64_3 = 16855272203555305545ULL; /* inverse of XXH_PRIME64_3 mod 2^64 */

static uint64_t inv_XXH3_avalanche(uint64_t h64)
{
    h64 ^= h64 >> 32;
    h64 *= INV_PRIME64_3;
    h64 ^= h64 >> 37;
    return h64;
}

static uint64_t
inv_XXH3_len_4to8_64b(const uint8_t* input, size_t len, uint64_t hashval)
{
    uint64_t deavalanched = inv_XXH3_avalanche(hashval) * INV_PRIME64_2;
    uint64_t mix64 = deavalanched ^ (deavalanched >> 47);
    mix64 -= len;
    mix64 *= INV_PRIME32_1;
    uint64_t keyed = mix64 ^ (mix64 >> 51);
    uint32_t const in1 = XXH_readLE32(input);
    uint32_t const in2 = XXH_readLE32(input + len - 4);
    uint64_t const in64 = in1 + ((uint64_t)in2 << 32);
    return keyed ^ in64;
}

int main(void)
{
    srand(time(NULL));
    uint64_t seed = rand() | ((uint64_t)rand() << 32);
    uint64_t secret[XXH3_SECRET_DEFAULT_SIZE / 8 + 1];
    for (size_t i = 0; i < sizeof(secret)/8; i++) {
        secret[i] = rand() | ((uint64_t)rand() << 32);
    }
    uint64_t hash = XXH3_64bits_withSecret("test", 4, secret, sizeof(secret));

    uint64_t firstblock = inv_XXH3_len_4to8_64b((const uint8_t *)"test", 4, hash);

    printf("secret[0-7] = %016" PRIx64 " %016" PRIx64 "\n", secret[0], firstblock);
    uint64_t array[4];
    array[0] = firstblock;
    for (int i = 1; i < 4; i++) {
        array[i] = rand() | ((uint64_t)rand() << 32);
    }
    for (unsigned long long i = 0; i < 2000; i++) {
        for (int j = 0; j < 4; j++) printf("%016" PRIx64, array[j]);
        printf(": %016" PRIx64 "\n", XXH3_64bits_withSecret(array, sizeof(array), secret, sizeof(secret)));
        array[1] = rand() | ((uint64_t)rand() << 32);
    }
}

While it still requires knowing a hash input and its output (which is rare but possible), if we used something from the last stripe in the short hashes, we could make this attack much less useful.

@Cyan4973
Owner

That's a good objective.
Making retrieval of the secret more difficult is valuable.

The solution will have to pay attention to the potential performance impact,
including the memory access pattern.

@easyaspi314
Contributor Author

easyaspi314 commented Sep 30, 2019

Well, we should still try to avoid skipping bytes in general, as it becomes an Achilles' heel.

As for memory access, offsetting an aligned pointer is O(1). :)

@Cyan4973
Owner

Cyan4973 commented Sep 30, 2019

offsetting an aligned pointer is O(1). :)

Sure, but that answer is also too simple...
Asking for any byte not yet in cache causes a cache miss, leading to latency and more cache fills.
Reducing this effect requires little more than paying attention to the cache line size.

@easyaspi314
Contributor Author

If we key the small inputs from the same range, we will probably keep the high end of the secret cached instead of the start. It doesn't matter as much for the longer hashes, because the overhead there is less significant.

@scottcarey

scottcarey commented Jan 6, 2020

Some thoughts regarding finalizing XXH3: are the mid-range key collisions the main thing holding it back from a final spec?
I have not grokked why the algorithm produces an unusually high collision rate in the size ranges discussed here, but I had a few thoughts that may or may not be useful for empirical trial and error (I'm not sure how the collision rate is being tested/detected for these).

All of the mixing steps seem static -- multiplying by constant primes, shifting by constant values.
Perhaps making some part of the mixing/shuffling dependent on the state would help? It may not even have to be on the entire state, but perhaps only 8 bytes.
This got me thinking about inexpensive operations that mix or rearrange bits beyond the multiplies and shifts already used. Rotate is a cheap instruction on x64 (not sure about ARM) that rearranges bits and can move some between the high and low halves if needed, and it retains entropy without bias. Population count, leading zero count, or trailing zero count instructions can be used to inject data-dependent rotation or shift magnitudes.
POPCNT, LZCNT, and TZCNT on x64 are similar to multiply performance on Skylake (1 per cycle, 3 cycle latency) and closer to add on Zen (> 1 per cycle, 1 cycle latency for the first two, 2 cycles for TZCNT). POPCNT of a random 64-bit value will be between 29 and 35 most of the time, and between 21 and 43 ~99% of the time. TZCNT and LZCNT might also be useful, though their distributions will bias strongly towards 0 or 1, with large values when the state is close to zero.
Jittering the bits of the state around in a data-dependent way, beyond multiplication by primes plus shifts, could be useful; even if a varying rotate is not a particularly good 'mixer', sprinkling it in every so often might help.

If avoiding zeroes is of interest, LZCNT may offer a branch-free way to extract a useful value to add to the state: X + LZCNT(X) is never zero. However, it may be dangerous, since it will bias the distribution away from pure randomness -- though only at the far extremes of the distribution, which should occur exceedingly rarely for a 64-bit number. Maybe it is a way to apply the user-provided secret to the inner secret and avoid zeroes.

@easyaspi314
Contributor Author

easyaspi314 commented Jan 6, 2020

These are the issues that we are considering working with:

  1. We need to remove the possibility of the mid range hashes losing 8 bytes at a time due to a multiply by zero bug. This multiply by zero is caused by canceling out the secret value, and forcing it to be set to a non-zero value is not an option.

Basically, it is this:

(input[2i] ^ secret[2i]) × (input[2i+1] ^ secret[2i+1])

If input[2i] and secret[2i] are the same, you get

0 × (input[2i+1] ^ secret[2i+1]) = 0

This was a problem with the original UMAC algorithm, however, it was much more likely to occur due to it requiring a 32-bit zero instead of a 64-bit zero.

  2. The first few bytes of the secret are heavily relied on, and the last bytes are not. Here is a diagram of basically how important each 8-byte block of the secret is to the hash:

[diagram: xxh3 secret importance]

The first 16 bytes are used heavily in the short hashes, and guessing one of them is very possible due to bijectivity and a lack of entropy.

The first 64 bytes are what are used in the main loop, but the last bytes are only used for scrambling and the final bytes in long hashes.

By using the green blocks for the short hashes, we can balance out that a bit more so we don't have as many throwaway bytes and it isn't as big a deal if a few bytes of the secret are known.

POPCNT, LZCNT, TZCNT

Hard pass.

XXH3 is designed to target a generic 32-bit platform with a long multiply, compiled by an ANSI C89 compiler with a long long extension.

This has plenty of benefits:

  1. It is safe to drop into any project, as it can be compiled with basically any compiler
  2. It encourages people to move on from 32-bit hash functions on 32-bit platforms (wish std::hash would switch to std::uint64_t, though…)
  3. It makes it so this hash doesn't need compatibility checks. Even the SSE2 and NEON versions are very safe to use in practice.
  4. It proves that you can write a 2019 hash for 1999 hardware.

Whenever we use intrinsics, it is either to accelerate an already optimized algorithm (all that SIMD code is enhancing a modified portable UMAC loop, and more work has gone into that scalar 128-bit multiply than I want to admit) or to work around a compiler bug (e.g. MSVC x86 stupidly calling subroutines to do a 32-bit to 64-bit multiply).

@Cyan4973
Owner

I'm going to spend a sprint on improving XXH3/128 to the point of making a new release.
Very few changes are expected on the algorithm side, mostly some corner cases for input sizes <= 8.

If there are parts of this issue that can be dealt with too, I believe we should consider it, and integrate the relevant changes for next release.

If you have specific modification suggestions @easyaspi314 , I'll be glad to consider them.

@easyaspi314
Contributor Author

If we don't remove the seed-dependent collision, how about using different parts of the secret in the short hashes, like I recommended?

The end bytes really don't get used much outside of scrambling and finalizing the long hashes.

I don't think using the first N bytes vs. the last N bytes is too big a deal for the cache, especially since the initial cache misses in the long version are insignificant: the data is just going to be cached again immediately.

@Cyan4973
Owner

how about using different parts of the secret on the short hashes

Sure! It looks like a good mitigation idea.

@easyaspi314
Contributor Author

Closed, since we mitigated it in XXH128 and explicitly mentioned it in a disclaimer for XXH3_64.

Also, in the short hashes, the secret is now mixed with other parts of itself to further protect it.
