Easier implementation for software running on "normal" architectures #543
Comments
I wish for that too. The problem is that it would violate one of the rules of this repository, which is "one The objective is not all lost, though. We could consider smaller specialized files, which are then aggregated into a single However, this is a complex goal, and I don't see that happening in the short term.
Could you please describe your issue ?
It's a 64-byte alignment issue with the gcc 10.3 compiler (the default on Ubuntu 21) and the -O3 option. You can find the ongoing thread here. Short version: if you insert an XXH3_state_t (which should be 64-byte aligned) into a struct that does a lot of other things, very nasty things happen once optimizations are turned on.
This happens even if you try to align the member, the entire struct, or even typedefs of maps (!) to 64 bytes. Disabling XXH alignment does not seem to work either; I'm still working on it. Running with -fsanitize=undefined shows the behavior. Thanks anyway for the answer. PS: If you do NOT "inject" an XXH variable into a struct, but declare it directly in code, it works fine with this compiler.
Current workaround: using this => "manually" allocating
That looks like a correct workaround. Indeed, aligning a member of a struct to 64 bytes has consequences for the encompassing structure.
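To make those "consequences for the encompassing structure" concrete, here is a small sketch. `State64` is a hypothetical stand-in for a 64-byte-aligned state, not the real `XXH3_state_t`; its 576-byte size is an assumption chosen so that the outer struct comes out at the 640 bytes seen in the log later in this thread.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a 64-byte-aligned state (NOT the real XXH3_state_t).
struct alignas(64) State64 {
    unsigned char bytes[576];
};

// Same payload without any over-aligned member.
struct Plain {
    char placeHolder[1];
    unsigned char bytes[576];
};

// One over-aligned member changes the layout of the whole struct.
struct Outer {
    char placeHolder[1];
    State64 state;
};

static_assert(alignof(Plain) == 1, "byte arrays only need 1-byte alignment");
static_assert(alignof(Outer) == 64, "the member's alignment propagates to the struct");
static_assert(offsetof(Outer, state) == 64, "63 padding bytes follow placeHolder");
static_assert(sizeof(Outer) == 640, "size is rounded up to a multiple of 64");
```

The layout is only honored if the storage holding `Outer` is itself 64-byte aligned. An allocator that returns 16-byte-aligned memory leaves `state` misaligned, even though the compiler is entitled to assume otherwise.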
Which, if I'm reading correctly, is not particularly encouraging. It looks like implementations are free to ignore I'm not sure if there's a good solution available, but one small possible improvement would be: is it possible the
Yes that's a good suggestion.
As for C++, please use The following code is a minimal repro and solution for this issue.
#define XXH_STATIC_LINKING_ONLY // access advanced declarations
#define XXH_IMPLEMENTATION // access definitions
#include "../xxhash.h"
#include <stdio.h> // printf()
#include <stdlib.h> // aligned_alloc(), free()
#include <stdint.h> // uintptr_t
#include <cstddef> // std::size_t
static const size_t alignment = 64;
struct Issue543 {
char placeHolder[1];
XXH3_state_t xxh3;
void test() {
const uintptr_t pThis = reinterpret_cast<uintptr_t>(this);
const uintptr_t pXxh3 = reinterpret_cast<uintptr_t>(&this->xxh3);
printf("struct Issue543: ptr = %p", static_cast<void*>(this)); // %p expects void*
printf(", this %% %zu = %2zu", alignment, pThis % alignment); // %zu is the size_t specifier
printf(", &this->xxh %% %zu = %2zu", alignment, pXxh3 % alignment);
printf("\n");
}
};
struct Fix543 {
char placeHolder[1];
XXH3_state_t xxh3;
static void* operator new(std::size_t sz) {
void* p = aligned_alloc(alignment, sz); // You can use std::aligned_alloc() with C++17.
printf("struct Fix543: custom new for size = %zu, ptr=%p\n", sz, p);
return p;
}
void operator delete(void* ptr) noexcept {
free(ptr);
}
void test() {
const uintptr_t pThis = reinterpret_cast<uintptr_t>(this);
const uintptr_t pXxh3 = reinterpret_cast<uintptr_t>(&this->xxh3);
printf("struct Fix543: ptr = %p", static_cast<void*>(this)); // %p expects void*
printf(", this %% %zu = %2zu", alignment, pThis % alignment); // %zu is the size_t specifier
printf(", &this->xxh %% %zu = %2zu", alignment, pXxh3 % alignment);
printf("\n");
}
};
int main(int argc, char* argv[]) {
for(int i = 0; i < 8; ++i) {
auto* p = new Issue543();
p->test();
// note : we do not delete p for test.
}
for(int i = 0; i < 8; ++i) {
auto* p = new Fix543();
p->test();
// note : we do not delete p for test.
}
}
/*
$ g++ --version
g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
$ g++ -O3 issue_543.cpp && ./a.out
struct Issue543: ptr = 0x55acd6e45eb0, this % 64 = 48, &this->xxh % 64 = 48
struct Issue543: ptr = 0x55acd6e46550, this % 64 = 16, &this->xxh % 64 = 16
struct Issue543: ptr = 0x55acd6e467e0, this % 64 = 32, &this->xxh % 64 = 32
struct Issue543: ptr = 0x55acd6e46a70, this % 64 = 48, &this->xxh % 64 = 48
struct Issue543: ptr = 0x55acd6e46d00, this % 64 = 0, &this->xxh % 64 = 0
struct Issue543: ptr = 0x55acd6e46f90, this % 64 = 16, &this->xxh % 64 = 16
struct Issue543: ptr = 0x55acd6e47220, this % 64 = 32, &this->xxh % 64 = 32
struct Issue543: ptr = 0x55acd6e474b0, this % 64 = 48, &this->xxh % 64 = 48
struct Fix543: custom new for size = 640, ptr=0x55acd6e47740
struct Fix543: ptr = 0x55acd6e47740, this % 64 = 0, &this->xxh % 64 = 0
struct Fix543: custom new for size = 640, ptr=0x55acd6e47a40
struct Fix543: ptr = 0x55acd6e47a40, this % 64 = 0, &this->xxh % 64 = 0
struct Fix543: custom new for size = 640, ptr=0x55acd6e47d40
struct Fix543: ptr = 0x55acd6e47d40, this % 64 = 0, &this->xxh % 64 = 0
struct Fix543: custom new for size = 640, ptr=0x55acd6e48040
struct Fix543: ptr = 0x55acd6e48040, this % 64 = 0, &this->xxh % 64 = 0
struct Fix543: custom new for size = 640, ptr=0x55acd6e48340
struct Fix543: ptr = 0x55acd6e48340, this % 64 = 0, &this->xxh % 64 = 0
struct Fix543: custom new for size = 640, ptr=0x55acd6e48640
struct Fix543: ptr = 0x55acd6e48640, this % 64 = 0, &this->xxh % 64 = 0
struct Fix543: custom new for size = 640, ptr=0x55acd6e48940
struct Fix543: ptr = 0x55acd6e48940, this % 64 = 0, &this->xxh % 64 = 0
struct Fix543: custom new for size = 640, ptr=0x55acd6e48c40
struct Fix543: ptr = 0x55acd6e48c40, this % 64 = 0, &this->xxh % 64 = 0
*/
Also, I'd like to note that it seems recent version of
Default allocator (
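As a hedged variation on the `Fix543` approach above, the sketch below adds two safeguards to the class-specific `operator new`. It rounds the size up to a multiple of the alignment, which `aligned_alloc` may require, and it throws `std::bad_alloc` on failure instead of returning a null pointer. `State64` and `Holder` are hypothetical stand-ins, and `aligned_alloc` is assumed to be available, which is not the case on every platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>       // std::bad_alloc
#include <stdlib.h>  // aligned_alloc(), free(); assumed available (C11 / C++17)

// Hypothetical stand-in for a 64-byte-aligned state (NOT the real XXH3_state_t).
struct alignas(64) State64 {
    unsigned char bytes[576];
};

struct Holder {
    char placeHolder[1];
    State64 state;

    // Class-specific allocation: every `new Holder` goes through here.
    static void* operator new(std::size_t sz) {
        const std::size_t a = alignof(Holder);
        // aligned_alloc may require the size to be a multiple of the alignment.
        const std::size_t rounded = (sz + a - 1) / a * a;
        void* p = aligned_alloc(a, rounded);
        if (p == nullptr) throw std::bad_alloc();  // operator new must not return null
        return p;
    }
    static void operator delete(void* p) noexcept { free(p); }
};

// Small helper used to check the result.
inline bool isAligned64(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 64 == 0;
}
```

With this in place, `new Holder()` and `delete` work as usual at the call site; only heap allocations are affected, so stack or static instances still rely on the compiler honoring `alignas`.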
align macro:
/* align macros */
#if defined (__STDC_VERSION__) && (__STDC_VERSION__ >= 201112L) /* C11+ */
# include <stdalign.h>
# define NMH_ALIGN(n) alignas(n)
#elif defined(__GNUC__)
# define NMH_ALIGN(n) __attribute__ ((aligned(n)))
#elif defined(_MSC_VER)
# define NMH_ALIGN(n) __declspec(align(n))
#else
# define NMH_ALIGN(n) /* disabled */
#endif
@ new and delete: aligned_alloc (or whatever) does not always exist (in fact, I used this as a workaround/fix: https://github.com/embeddedartistry/embedded-resources/blob/master/examples/c/malloc_aligned.c). So I would like to ask a question about alignment as a mandatory prerequisite for XXH3. Is it possible to eliminate it entirely (optionally), of course "paying" a certain performance penalty? I only did a few quick (failed) tests, as I ran into a concomitant g++ (!) bug that took me a long time. Thanks for all the replies. PS: The state is big, very big (hundreds of bytes), so in fact an "on demand" allocation is an improvement for my software (sometimes it allocates 500,000 or even a few million structures); every cloud has a silver lining.
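For platforms without `aligned_alloc`, the general over-allocate-and-offset technique can be sketched as follows. This is an illustration of the idea only, not the code from the linked file; `demoAlignedMalloc` and `demoAlignedFree` are hypothetical names, and the alignment is assumed to be a power of two no larger than 128 so that the offset fits in one byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: returns `size` bytes aligned to `alignment`, using only malloc().
// Assumption: alignment is a power of two and <= 128.
inline void* demoAlignedMalloc(std::size_t size, std::size_t alignment) {
    if (alignment == 0 || alignment > 128 || (alignment & (alignment - 1)) != 0) {
        return nullptr;  // unsupported alignment
    }
    unsigned char* base = static_cast<unsigned char*>(std::malloc(size + alignment));
    if (base == nullptr) return nullptr;
    // offset is in [1, alignment], so there is always one byte before the result
    // in which to remember how far we moved.
    const std::size_t offset =
        alignment - (reinterpret_cast<std::uintptr_t>(base) & (alignment - 1));
    unsigned char* aligned = base + offset;
    aligned[-1] = static_cast<unsigned char>(offset);
    return aligned;
}

// Hypothetical counterpart: recovers the original malloc() pointer and frees it.
inline void demoAlignedFree(void* p) {
    if (p == nullptr) return;
    unsigned char* aligned = static_cast<unsigned char*>(p);
    std::free(aligned - aligned[-1]);
}
```

The cost is up to `alignment` extra bytes per allocation, which matters when millions of states are allocated, so allocating the state only on demand remains the more important saving.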
If you can use an accessor method and sacrifice 64 bytes, you can introduce manual alignment.
struct Fix543_2 {
static const size_t align = 64;
char placeHolder[1];
char xxh3_buf[sizeof(XXH3_state_t) + align];
static uintptr_t computeXxh3AlignedAddr(uintptr_t base) {
// Adopted from XXH_alignedMalloc() as an example.
size_t offset = align - ((size_t)base & (align - 1));
assert(offset <= align);
return base + offset;
}
XXH3_state_t* xxh3State() {
const uintptr_t p0 = reinterpret_cast<uintptr_t>(&this->xxh3_buf[0]);
const uintptr_t p1 = computeXxh3AlignedAddr(p0);
return reinterpret_cast<XXH3_state_t*>(p1);
}
// const XXH3_state_t* xxh3State() const { ... }
void test() {
const uintptr_t pThis = reinterpret_cast<uintptr_t>(this);
const uintptr_t pXxh3 = reinterpret_cast<uintptr_t>(xxh3State());
printf("struct Fix543_2: ptr = %p", static_cast<void*>(this)); // %p expects void*
printf(", alignof(*this) = %2zu", alignof(Fix543_2)); // standard alignof takes a type
printf(", this %% %zu = %2zu", alignment, pThis % alignment);
printf(", &this->xxh %% %zu = %2zu", alignment, pXxh3 % alignment);
printf("\n");
}
};
/*
struct Fix543_2: ptr = 0x55bf02971f40, alignof(*this) = 1, this % 64 = 0, &this->xxh % 64 = 0
struct Fix543_2: ptr = 0x55bf029721d0, alignof(*this) = 1, this % 64 = 16, &this->xxh % 64 = 0
struct Fix543_2: ptr = 0x55bf02972460, alignof(*this) = 1, this % 64 = 32, &this->xxh % 64 = 0
struct Fix543_2: ptr = 0x55bf029726f0, alignof(*this) = 1, this % 64 = 48, &this->xxh % 64 = 0
struct Fix543_2: ptr = 0x55bf02972980, alignof(*this) = 1, this % 64 = 0, &this->xxh % 64 = 0
struct Fix543_2: ptr = 0x55bf02972c10, alignof(*this) = 1, this % 64 = 16, &this->xxh % 64 = 0
struct Fix543_2: ptr = 0x55bf02972ea0, alignof(*this) = 1, this % 64 = 32, &this->xxh % 64 = 0
struct Fix543_2: ptr = 0x55bf02973130, alignof(*this) = 1, this % 64 = 48, &this->xxh % 64 = 0
*/
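The same manual alignment can also be sketched with the standard `std::align` from `<memory>`, which does the pointer arithmetic and the bounds check. The types below are hypothetical stand-ins (a plain 576-byte payload rather than the real `XXH3_state_t`), and the buffer carries `kAlign - 1` spare bytes, the minimum needed to always find an aligned start.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>  // std::align

// Hypothetical stand-in payload; deliberately has no alignas of its own.
struct StatePayload {
    unsigned char bytes[576];
};

struct HolderWithBuffer {
    static constexpr std::size_t kAlign = 64;
    char placeHolder[1];
    // Raw storage with enough slack to find a 64-byte boundary inside it.
    unsigned char buf[sizeof(StatePayload) + kAlign - 1];

    // Returns a 64-byte-aligned address inside buf with room for one StatePayload,
    // or nullptr if it would not fit (cannot happen with the slack above).
    void* alignedStorage() {
        void* p = buf;
        std::size_t space = sizeof(buf);
        return std::align(kAlign, sizeof(StatePayload), p, space);
    }
};
```

Because the struct itself keeps an alignment of 1, it can live in any container or allocator; the price is the accessor call and the slack bytes per instance.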
With the following conditions, I think it's doable.
For example, add the following
// xxhash.h
+ #if ! defined(XXH_ALIGN)
#if defined (__STDC_VERSION__) && (__STDC_VERSION__ >= 201112L) /* C11+ */
# include <stdalign.h>
# define XXH_ALIGN(n) alignas(n)
...
# define XXH_ALIGN(n) /* disabled */
#endif
+ #endif
Compile and run the following code with and without
#define XXH_STATIC_LINKING_ONLY // access advanced declarations
#define XXH_IMPLEMENTATION // access definitions
#include "../xxhash.h" // modified version of xxhash.h
#include <stdio.h> // printf()
struct MyStruct {
char placeHolder[1];
XXH3_state_t xxh3;
XXH3_state_t* xxh3State() { return &xxh3; }
};
int main(int argc, char* argv[]) {
auto* p = new MyStruct();
XXH3_state_t* state = p->xxh3State();
uint32_t seed = 0;
char buffer[] = "Test";
size_t bufferSize = sizeof(buffer);
XXH3_128bits_reset_withSeed(state, (XXH64_hash_t)seed);
XXH3_128bits_update(state, buffer, bufferSize);
uint32_t digest = (XXH3_128bits_digest(state).low64);
printf("XXH_VECTOR=%d\n", XXH_VECTOR);
printf("digest=0x%08x\n", digest);
printf("alignof(XXH3_state_t) = %zu\n", alignof(XXH3_state_t));
delete p;
}
$ g++ -O3 xxh3_alignment.cpp && ./a.out
XXH_VECTOR=1
digest=0x93a401e0
alignof(XXH3_state_t) = 64
$ g++ -O3 xxh3_alignment.cpp -D XXH_VECTOR=XXH_SCALAR -D'XXH_ALIGN(x)=' && ./a.out
XXH_VECTOR=0
digest=0x93a401e0
alignof(XXH3_state_t) = 8
The previous macro test only detected C11 and failed in modern C++, which actually goes one step further and makes `alignas` a keyword. It's not clear that this actually improves the situation with respect to Cyan4973#543, but it should be slightly more correct in some sense.
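A minimal sketch of what such a detection could look like, combined with the override guard proposed earlier in the thread. `DEMO_ALIGN` and `DemoState` are hypothetical names used only for illustration; this is not the actual change made to `xxhash.h`.

```cpp
#include <cassert>
#include <cstddef>

/* DEMO_ALIGN is a hypothetical name, not a macro from xxhash.h. */
#if !defined(DEMO_ALIGN)  /* lets a build override it, e.g. -D'DEMO_ALIGN(n)=' */
#  if defined(__cplusplus) && (__cplusplus >= 201103L)  /* C++11+: alignas is a keyword */
#    define DEMO_ALIGN(n) alignas(n)
#  elif defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 201112L)  /* C11+ */
#    include <stdalign.h>
#    define DEMO_ALIGN(n) alignas(n)
#  elif defined(__GNUC__)
#    define DEMO_ALIGN(n) __attribute__((aligned(n)))
#  elif defined(_MSC_VER)
#    define DEMO_ALIGN(n) __declspec(align(n))
#  else
#    define DEMO_ALIGN(n) /* disabled */
#  endif
#endif

/* Hypothetical state: one over-aligned member followed by a small tail. */
struct DemoState {
    DEMO_ALIGN(64) unsigned char acc[64];
    unsigned char tail[8];
};
```

Checking C++ first matters because `__STDC_VERSION__` is normally not defined by C++ compilers, so a C11-only test falls through to the compiler-specific branches or to the disabled one.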
I wish there was a simplified implementation of xxhash, in a single cpp file, without #ifdef, a ~6KB file instead of ~145KB
Something like this
https://create.stephan-brumme.com/xxhash/
but for XXH3 (the 128 bit version), just to get (snippet)
In fact, I find it particularly difficult to debug other programs in which XXH3 is "integrated"; it is really difficult to understand the interaction of all the #ifdefs.
Often you have "normal" environments (x86 / x64 machines) in which to use xxhash (actually a great program, when ... it works!)
I'm having trouble with software that compiles and works fine with gcc on Windows and FreeBSD but not on Linux, due to XXH3_state_t
Thank you