Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Murmer3 implementation has UB flagged by UBSan (though probably safe on desktop) #49

Closed
leethomason opened this issue Aug 31, 2022 · 2 comments · Fixed by #51
Closed
Assignees
Labels
bug Something isn't working

Comments

@leethomason
Copy link
Contributor

Describe the bug

UBSan correctly flags this line in the Murmer3 code:

/Users/runner/work/orc/orc/src/hash.cpp:40:69: runtime error: load of misaligned address 0x00010c28f87b for type 'const uint64_t' (aka 'const unsigned long long'), which requires 8 byte alignment

This is correct. ORC string pools aren't aligned to any memory address (and const char* doesn't need to be), but Murmer3 casts a void* to a uint64_t* and that causes the UBSan flag.

Note I can't imagine any machine capable of running ORC that would actually fail on an unaligned read, but the UBSan flag is correct.

Expected behavior

The correct fix is probably in Murmer3 - it takes in a void* without restriction and should deal with alignment. That's not a trivial change.

We could move our strings to 16 byte alignment, but that wastes a bunch of memory.

@leethomason leethomason added the bug Something isn't working label Aug 31, 2022
@leethomason leethomason changed the title Murmer3 implementation as UB flagged by UBSan (though probably safe on desktop) Murmer3 implementation has UB flagged by UBSan (though probably safe on desktop) Aug 31, 2022
@leethomason
Copy link
Contributor Author

I added comments to an existing issue: aappleby/smhasher#74

@fosterbrereton
Copy link
Contributor

fosterbrereton commented Aug 31, 2022

According to this random post, we may want to use memcpy to grab data 8 bytes at a time:

memcpy is the canonical way to bypass alignment and aliasing assumptions in C compilers. The compiler will turn it into whatever the most efficient sequence for unaligned memory access on the given architecture is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants