Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add algorithm specification of XXH3 #750

Merged
merged 5 commits into from Jun 29, 2023
Merged

Add algorithm specification of XXH3 #750

merged 5 commits into from Jun 29, 2023

Conversation

adrien1018
Copy link
Contributor

Update xxhash_spec.md to include XXH3 specification. Closes #589 and #675.

@Cyan4973
Copy link
Owner

Thanks @adrien1018 ! This is great work !

I think this documentation is in good shape, just give me a bit more time to read it in detail.

@@ -43,13 +47,18 @@ However, a given variant shall produce exactly the same output, irrespective of
### Operation notations

All operations are performed modulo {32,64} bits. Arithmetic overflows are expected.
`XXH32` uses 32-bit modular operations. `XXH64` uses 64-bit modular operations.
`XXH32` uses 32-bit modular operations. `XXH64` and `XXH3` uses 64-bit modular operations.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit: use_ 64-bit modular (plural)

- `*`: denotes modular multiplication
- **Exception:** In `XXH3`, if it is in the form `(u128)x * (u128)y`, it denotes 64-bit by 64-bit normal multiplication into a full 128-bit result.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
It's interesting, the width of the modular arithmetic may be not obvious, making * alone possibly insufficient to describe the operation.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reviewing the rest of the specification, I believe that the propose notification for 128-bit modular arithmetic does the job correctly.

static const u64 PRIME64_4 = 0x85EBCA77C2B2AE63ULL; // 0b1000010111101011110010100111011111000010101100101010111001100011
static const u64 PRIME64_5 = 0x27D4EB2F165667C5ULL; // 0b0010011111010100111010110010111100010110010101100110011111000101
static const u64 PRIME_MX1 = 0x165667919E3779F9ULL; // 0b0001011001010110011001111001000110011110001101110111100111111001
static const u64 PRIME_MX2 = 0x9FB21C651E98DF25ULL; // 0b1001111110110010000111000110010100011110100110001101111100100101
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kudos for giving this constant a name

This step uses a mixing operation that mixes a 16-byte segment of data, a 16-byte segment of secret and the seed into a 64-bit value as a building block. This operation treat the segment of data and secret as little-endian 64-bit values.

```c
mixStep(u8 data[16], size secretOffset, u64 seed):
Copy link
Owner

@Cyan4973 Cyan4973 Jun 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: this interface likely miss the secret pointer parameter,
the following code would only make sense if secret was a global variable only.

Copy link
Owner

@Cyan4973 Cyan4973 Jun 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize that not providing secret as an input parameter is a good way to improve brevity, which is beneficial for readability.
I think I'll keep it this way,
I just want to make it clear that secret is not necessarily the standard defaultSecret, it can also be any blob of bytes set by the user.
There's already a paragraph about this point, which is probably clear enough.
Let's just ensure it.


### Step 2. Process blocks

The input is consumed and processed one full block at a time. The size of the block depends on the length of the secret. Specifically, a block consists of several 64-byte stripes. The number of stripes per block is `floor((secretLength-64)/8)` . For the default 192-byte secret, there are 16 stripes in a block, and thus the block size is 1024 bytes.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Owner

@Cyan4973 Cyan4973 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great work.
I only noticed a couple of nits, don't worry I'll fix them directly in my branch.
I'm even likely to update the source in a couple of ways to better follows the text of this specification.

Thanks @adrien1018 !

@Cyan4973 Cyan4973 merged commit 0656ed7 into Cyan4973:dev Jun 29, 2023
Cyan4973 added a commit that referenced this pull request Jun 29, 2023
minor nit and clarifications,
minor source modification (constant name correspond to spec),
update version, attribute credit to @adrien1018.
Cyan4973 added a commit that referenced this pull request Jun 29, 2023
minor follow-ups for XXH3 specification (#750)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Specify the 128-bit variant
2 participants