RFC: represent length of short strings using 1 byte instead of 8 #40727
base: master
Copying a benchmark from Slack (by Harmen Stopples).
Ah, I guess MPFR might depend on the data being aligned on some architectures.
Very cool! I only caught the tail end of your Twitch stream but wish I'd tuned in earlier. This still doesn't allow an array of Strings to be stored inline, right? That would still be a vector of pointers, since Strings don't have a fixed size.
Right. It would be interesting to explore pointer tagging for this, though of course that is a more dramatic change. It would pose challenges for code that uses the address of the data, when to the compiler it's just a value in a register. The MPFR code is a good example: the struct stores both a string and its address. In the inlined case, the address is inside the enclosing object, and so can't be computed from the string alone. Solvable of course, but it produces various non-obvious cases like that. (OK, a BigFloat would always be more than 8 bytes, but you get the idea 😄)
Force-pushed from 6ea4cf2 to 822a053.
Review comment on src/array.c (outdated diff):
size_t sz = len + 1 + (len < 128 ? 1 : sizeof(size_t));
#else
size_t sz = len + 1 + sizeof(size_t);
#endif
This size computation should probably be extracted into a static function.
We should not feel bound to MPFR's design decisions where it is possible to continue in a manner that better fits our way forward. MPFR is wonderfully well cared for by a very few people; that does not leave the project time for design arcs.
Force-pushed from 822a053 to 8dde34e.
Force-pushed from 8dde34e to 7401b2e.
Fortunately we can hack around it for MPFR by over-allocating and aligning manually. We could even have a call to request an aligned string (i.e. request the full-word-length representation). Here's another microbenchmark that shows some time overhead:
before:
after:
@nanosoldier
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @christopher-dG
Yeah, there are some slowdowns...
Since the dawn of time, mankind has sought to make things smaller. Currently a String on 64-bit is at least 32 bytes: an 8-byte type tag, an 8-byte length, the data plus a NUL terminator, all rounded up to the next multiple of 16. This PR uses a 1-byte length, with its low bit as a discriminator for short strings, allowing strings of up to 6 bytes to fit in 16 bytes (8-byte tag + 1-byte length + 6 data bytes + NUL).