New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add algorithm specification of XXH3 #750
Conversation
Thanks @adrien1018 ! This is great work ! I think this documentation is in good shape, just give me a bit more time to read it in detail. |
@@ -43,13 +47,18 @@ However, a given variant shall produce exactly the same output, irrespective of | |||
### Operation notations | |||
|
|||
All operations are performed modulo {32,64} bits. Arithmetic overflows are expected. | |||
`XXH32` uses 32-bit modular operations. `XXH64` uses 64-bit modular operations. | |||
`XXH32` uses 32-bit modular operations. `XXH64` and `XXH3` uses 64-bit modular operations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit: use_ 64-bit modular (plural)
- `*`: denotes modular multiplication | ||
- **Exception:** In `XXH3`, if it is in the form `(u128)x * (u128)y`, it denotes 64-bit by 64-bit normal multiplication into a full 128-bit result. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
It's interesting, the width of the modular arithmetic may be not obvious, making *
alone possibly insufficient to describe the operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reviewing the rest of the specification, I believe that the propose notification for 128-bit modular arithmetic does the job correctly.
static const u64 PRIME64_4 = 0x85EBCA77C2B2AE63ULL; // 0b1000010111101011110010100111011111000010101100101010111001100011 | ||
static const u64 PRIME64_5 = 0x27D4EB2F165667C5ULL; // 0b0010011111010100111010110010111100010110010101100110011111000101 | ||
static const u64 PRIME_MX1 = 0x165667919E3779F9ULL; // 0b0001011001010110011001111001000110011110001101110111100111111001 | ||
static const u64 PRIME_MX2 = 0x9FB21C651E98DF25ULL; // 0b1001111110110010000111000110010100011110100110001101111100100101 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kudos for giving this constant a name
This step uses a mixing operation that mixes a 16-byte segment of data, a 16-byte segment of secret and the seed into a 64-bit value as a building block. This operation treat the segment of data and secret as little-endian 64-bit values. | ||
|
||
```c | ||
mixStep(u8 data[16], size secretOffset, u64 seed): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: this interface likely miss the secret
pointer parameter,
the following code would only make sense if secret
was a global variable only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize that not providing secret
as an input parameter is a good way to improve brevity, which is beneficial for readability.
I think I'll keep it this way,
I just want to make it clear that secret
is not necessarily the standard defaultSecret
, it can also be any blob of bytes set by the user.
There's already a paragraph about this point, which is probably clear enough.
Let's just ensure it.
|
||
### Step 2. Process blocks | ||
|
||
The input is consumed and processed one full block at a time. The size of the block depends on the length of the secret. Specifically, a block consists of several 64-byte stripes. The number of stripes per block is `floor((secretLength-64)/8)` . For the default 192-byte secret, there are 16 stripes in a block, and thus the block size is 1024 bytes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great work.
I only noticed a couple of nits, don't worry I'll fix them directly in my branch.
I'm even likely to update the source in a couple of ways to better follows the text of this specification.
Thanks @adrien1018 !
minor nit and clarifications, minor source modification (constant name correspond to spec), update version, attribute credit to @adrien1018.
minor follow-ups for XXH3 specification (#750)
Update
xxhash_spec.md
to include XXH3 specification. Closes #589 and #675.