Is it possible to make Option<CompactStr> same size as Option<String>? #19

Closed
mateuszkj opened this issue Oct 27, 2021 · 11 comments
@mateuszkj

mateuszkj commented Oct 27, 2021

It would be very beneficial for me to reduce the size of Option<CompactStr>; I use it a lot in my projects.

I ran some tests, and it looks like Option<CompactStr> takes an additional 8 bytes compared to Option<String>.

Rust has the null pointer optimization (see https://doc.rust-lang.org/std/option/index.html#representation), but I'm not sure if it can be used in a userland crate.

Sizes of popular string types:

String: 24 bytes
Option<String>: 24 bytes
CompactStr: 24 bytes
Option<CompactStr>: 32 bytes
KString: 24 bytes
Option<KString>: 24 bytes
SmartString<Compact>: 24 bytes
Option<SmartString<Compact>>: 32 bytes
SmolStr: 24 bytes
Option<SmolStr>: 24 bytes
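The gap in the table above comes down to whether the compiler can find a niche. A minimal sketch of the niche optimization using std::mem::size_of (sizes assume a 64-bit target):

```rust
use std::mem::size_of;
use std::num::NonZeroU8;

fn main() {
    // String holds a NonNull pointer, so the all-zero bit pattern is free
    // for Option to use as the None discriminant (the "niche" optimization).
    assert_eq!(size_of::<String>(), 24);
    assert_eq!(size_of::<Option<String>>(), 24);

    // A plain u8 has no invalid bit pattern, so Option<u8> needs an extra
    // tag byte; NonZeroU8 reserves 0 as a niche, so Option<NonZeroU8>
    // stays a single byte.
    assert_eq!(size_of::<Option<u8>>(), 2);
    assert_eq!(size_of::<Option<NonZeroU8>>(), 1);
}
```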
@NobodyXu
Contributor

NobodyXu commented Oct 27, 2021

I think it is possible, but then CompactStr would have to store the length of "" as 1 and of "1" as 2.

Since the string can be stored inline, the only field that can be guaranteed to never be 0 is the discriminant/length field.
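The "store length plus one" idea can be sketched with NonZeroU8 (hypothetical field names, not compact_str's actual layout):

```rust
use std::mem::size_of;
use std::num::NonZeroU8;

// Hypothetical inline layout: the length is stored offset by one, so the
// field can be NonZeroU8 and the compiler can use 0 as the None niche.
struct InlineStr {
    len_plus_one: NonZeroU8, // "" stored as 1, "1" stored as 2, ...
    buf: [u8; 23],
}

impl InlineStr {
    fn len(&self) -> usize {
        self.len_plus_one.get() as usize - 1
    }
}

fn main() {
    // The niche in NonZeroU8 lets Option reuse the same 24 bytes.
    assert_eq!(size_of::<InlineStr>(), 24);
    assert_eq!(size_of::<Option<InlineStr>>(), 24);

    let empty = InlineStr { len_plus_one: NonZeroU8::new(1).unwrap(), buf: [0; 23] };
    assert_eq!(empty.len(), 0);
    let _ = empty.buf;
}
```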

@ParkMyCar
Owner

Thanks for the issue!

This is a great use case I hadn't thought of. At the bit level we should be able to represent a distinct None variant; how to get this to work with Option<CompactStr> will require some more research, though. This is definitely something I'd like to pursue.

@tttwang23

Why not move the length byte to byte 0? If you disallow byte 0 from ever being 0, then you can use the NonZeroU8 rule to optimize the Option. Byte 0 with value 1 then means the heap case, values 2 to 25 map to lengths 0 to 23, and values 26 onward are the UTF-8 value of the first byte of the longest (fully packed) inline string.

Keeping the length byte at byte 0 is nice because reading it doesn't need an address offset calculation. Perhaps this is a way to claw back the lost performance.

For the heap case, the 8 pad bytes would be the first 8 bytes, filled with the value 1.
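The mapping proposed above can be written out as a small decoder (the decode helper and Tag enum are illustrative, not compact_str code):

```rust
// Byte 0 encoding from the comment above:
//   1        -> heap variant
//   2..=25   -> inline, length = byte - 2 (i.e. 0..=23)
//   26..     -> first UTF-8 byte of a fully packed inline string
//   0        -> reserved as Option's None niche
enum Tag {
    Heap,
    Inline(usize),
    Packed,
}

fn decode(byte0: u8) -> Tag {
    match byte0 {
        0 => unreachable!("0 is reserved for Option's None niche"),
        1 => Tag::Heap,
        2..=25 => Tag::Inline((byte0 - 2) as usize),
        _ => Tag::Packed,
    }
}

fn main() {
    assert!(matches!(decode(1), Tag::Heap));
    assert!(matches!(decode(2), Tag::Inline(0)));
    assert!(matches!(decode(25), Tag::Inline(23)));
    assert!(matches!(decode(b'h'), Tag::Packed)); // b'h' == 104
}
```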

@NobodyXu
Contributor

But what if the string is empty?

@ParkMyCar
Owner

We don't necessarily need the niche value to be zero (i.e. NonZeroU8); you can define your own NonXU8 using enums, see #22 for an example of a NonMaxU8.

The tricky part is getting the compiler to see the niche value without losing any performance. Internally a CompactStr uses a union, which the compiler skips when checking for niche values. In #22 I tried defining a "compiler public" struct for CompactStr that was essentially (NonMaxU8, [u8; 23]), then transmuting to the union whenever we needed to do an operation. This way the compiler could use the value 255 in the first byte to store the None variant. Unfortunately the transmute introduced an additional copy, which literally doubled the time to create an inline string, because we now had to copy the data twice (once for the transmute, once for actually creating the CompactStr).

Also, FWIW, the length byte is now the last byte because that lets the memcpys for an inline string be better aligned, which gave a significant (~35%) performance improvement.
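The enum-based niche trick works because an enum whose discriminants don't cover every bit pattern hands the compiler spare encodings for None. A toy sketch (reserving far more values than a real NonMaxU8, which would exclude only 255):

```rust
use std::mem::size_of;

// Toy stand-in for NonMaxU8: only values 0..=2 are valid, so patterns
// 3..=255 are niches. A real NonMaxU8 would declare variants for 0..=254.
#[derive(Clone, Copy)]
#[repr(u8)]
enum ToyTag {
    A = 0,
    B = 1,
    C = 2,
}

struct Repr {
    tag: ToyTag,
    data: [u8; 23],
}

fn main() {
    assert_eq!(size_of::<Repr>(), 24);
    // The unused tag patterns let the compiler encode None for free.
    assert_eq!(size_of::<Option<Repr>>(), 24);
    let _ = Repr { tag: ToyTag::A, data: [0; 23] };
}
```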

@CAD97
Contributor

CAD97 commented Mar 31, 2022

I tried again (I'll open a draft PR after some more experimentation) and, yeah, the compiler hates it:

empty                   time:   [8.8500 ns 8.8586 ns 8.8729 ns]
                        change: [+515.19% +516.38% +517.57%] (p = 0.00 < 0.05)
                        Performance has regressed.

inline                  time:   [16.020 ns 16.024 ns 16.033 ns]
                        change: [+50.601% +50.720% +50.922%] (p = 0.00 < 0.05)
                        Performance has regressed.

packed                  time:   [11.073 ns 11.082 ns 11.090 ns]
                        change: [+228.99% +229.96% +230.67%] (p = 0.00 < 0.05)
                        Performance has regressed.

heap                    time:   [54.283 ns 54.383 ns 54.495 ns]
                        change: [+1.1258% +1.5541% +1.9540%] (p = 0.00 < 0.05)
                        Performance has regressed.

Note, though, that I'm testing in a noisy environment; the std_string bench is also seeing noise.

@ParkMyCar
Owner

ParkMyCar commented Mar 31, 2022

Thanks! I was thinking that using transmute_copy might help, since it doesn't move the data. But IMO that starts moving into a world of being too unsafe, and at that point why not just use the compiler macros that NonZero* uses (with the exception that transmute_copy is stable).

@CAD97
Contributor

CAD97 commented Mar 31, 2022

No, transmute_copy still moves the data; transmute_copy is roughly transmute(ptr::read(src)), i.e. *(src as *const T as *const U). #75 is my attempt, which does the zero-copy &ReprNiche as *const ReprNiche as *const ReprUnion as &ReprUnion, but still sees the regression. I took a peek at the assembly and have no idea why the regression exists. (Further discussion of this attempt should probably happen in #75.)
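For reference, the non-moving reinterpretation described above can be sketched generically (reinterpret is an illustrative helper, only sound when the two types have compatible layout and validity):

```rust
// transmute_copy reads (moves) the bytes out of *src; a pointer cast merely
// reinterprets the same memory in place, which is the approach in #75.
unsafe fn reinterpret<Src, Dst>(src: &Src) -> &Dst {
    &*(src as *const Src as *const Dst)
}

fn main() {
    let x: u32 = 0x0102_0304;
    // Zero-copy view of the same 4 bytes, in native byte order.
    let bytes: &[u8; 4] = unsafe { reinterpret(&x) };
    assert_eq!(bytes, &x.to_ne_bytes());
}
```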

@ParkMyCar mentioned this issue Mar 31, 2022
@tttwang23

How about making the first 64-bit word shared (non-union) between the heap case and the short-string case? Then a single 64-bit memory load fetches the length byte and the first 7 byte values. Shouldn't a 64-bit read & write beat a memcpy? You can shift in multiples of 8 bits to access the byte values: (word64 >> (idx << 3)) as u8.

You just have to map length 0 to a non-zero 64-bit word: either the length mapping itself can be adjusted, or the out-of-bounds bytes can be given non-zero padding. The heap case only applies to strings of 25 chars or longer, so 0 is not possible there.
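The shift-based byte access from the comment above, as a standalone helper (assumes a little-endian word so byte index 0 is the least significant byte):

```rust
// Extract byte `idx` from a 64-bit word with a shift instead of a memcpy.
fn byte_at(word64: u64, idx: usize) -> u8 {
    (word64 >> (idx << 3)) as u8
}

fn main() {
    let word = u64::from_le_bytes(*b"abcdefgh");
    assert_eq!(byte_at(word, 0), b'a');
    assert_eq!(byte_at(word, 7), b'h');
}
```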

@CAD97
Contributor

CAD97 commented Apr 1, 2022

See #75. The problem isn't extra copies; the compiler/optimizer is really good at turning small copies into the efficient thing, and you aren't going to beat it by switching between aligned writes and larger typed copies.

The difference comes from opaque optimizer shenanigans, and seemingly unrelated changes are going to be what fixes/diagnoses it.

My current bet is that these microbenchmarks are so sensitive that the noise we're seeing may even come down to machine code alignment. (I know I've read other, better articles about this, at least one of which came with a tool that shuffles machine code around to diagnose whether that is the confounding factor, but I can't find it right now.)

nottirb pushed a commit to nottirb/compact_str that referenced this issue May 15, 2022
@ParkMyCar
Owner

Completed as part of #105
