Modify a SmallVec inline size for UseList to be slightly larger. #93
I'll note also that this PR adds the |
jameysharp
left a comment
I think there's something strange going on here.
The Extend implementation for SmallVec uses Iterator::size_hint to reserve an appropriate amount of space. And the FromIterator implementation, used in Iterator::collect, just allocates a new vector and then calls extend on it. So for any iterator where size_hint yields a good approximation of the length of the iterator, collect() should be equivalent to SmallVec::with_capacity followed by extend.
Iterators over slices have an exact implementation of size_hint, and the Skip and Cloned iterators preserve however much precision was in the preceding iterator chain.
So aside from the change to double the inline array size in UseList (which I strongly approve of), I think this patch should have had zero effect on performance. How confident are you in your measurements? I'd want to dig deeper if there's a measurable effect from this despite all the performance tuning that's gone into SmallVec and Iterator.
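The size-hint argument above can be checked with the standard library alone. The following is a minimal sketch that uses `Vec` as a stand-in for `SmallVec` (an assumption made to keep the example dependency-free); the function names `skipped_cloned_hint` and `collect_tail` are hypothetical and do not come from the PR.

```rust
/// Returns the `size_hint` reported by a slice iterator after `skip` and
/// `cloned` have been applied. Slice iterators know their exact length, and
/// both adapters pass that precision through.
fn skipped_cloned_hint(items: &[u32], n: usize) -> (usize, Option<usize>) {
    items.iter().skip(n).cloned().size_hint()
}

/// Hand-written version of what `collect()` is described as doing:
/// reserve space according to the lower bound of `size_hint`, then `extend`.
fn collect_tail(items: &[u32], n: usize) -> Vec<u32> {
    let iter = items.iter().skip(n).cloned();
    let (lower, _upper) = iter.size_hint();
    let mut out = Vec::with_capacity(lower);
    out.extend(iter);
    out
}
```

For example, `skipped_cloned_hint(&[1, 2, 3, 4, 5], 2)` reports exactly three remaining elements, so a manual `with_capacity` followed by `extend` reserves no more precisely than `collect()` can on its own.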
Huh, that's really weird -- given that, I agree that there shouldn't be an effect. I wasn't aware that the size hinting was preserved even through |
I did some more controlled measurements of the two parts to this PR (the explicit sizing vs. relying on size hints to So in other words, a reliable 1% improvement but just from the smallvec inline size change. The iterator size hinting is indeed working as you describe, so the other half didn't have an effect. I'll update the PR to contain just the first part -- thanks! |
This PR updates the `UseList` type alias to a `SmallVec` with 4 `Use`s (which are 4 bytes each) rather than 2, because we get 16 bytes of space "for free" in a `SmallVec` on a 64-bit machine.
This PR improves the compilation performance of Cranelift by 1% on SpiderMonkey.wasm (measured on a Linux desktop with pinned CPU frequency, and pinned to one core).
It's worth noting also that before making these changes, I explored whether it would be possible to put the lists of uses and liveranges in single large backing `Vec`s; the basic reason why we can't do this is that during liverange construction, we append to many lists concurrently. One could use a linked-list arrangement, and in fact RA2 did this early in its development; the separate `SmallVec`s were better for performance overall because of the cache-locality wins when we traverse the lists many times. It may still be worth investigating use of an arena to allocate the vecs rather than the default heap allocator.
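The "16 bytes for free" reasoning can be illustrated with a simplified model of small-vector storage. This is a sketch under stated assumptions, not the `smallvec` crate's actual definition: `Use` here is a hypothetical 4-byte stand-in, `HeapRef` and `Storage` are made-up names, and the separate length field is not modeled. The idea is that the inline buffer shares its bytes with a heap pointer and capacity, so on a 64-bit target any inline buffer of up to 16 bytes costs nothing extra.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for `Use`; the PR only states that it is 4 bytes.
#[allow(dead_code)]
#[derive(Clone, Copy)]
#[repr(transparent)]
struct Use(u32);

/// Heap-side representation in this model: a pointer plus a capacity.
#[allow(dead_code)]
#[derive(Clone, Copy)]
#[repr(C)]
struct HeapRef {
    ptr: *mut Use,
    cap: usize,
}

/// Simplified storage model: inline elements and the heap reference
/// occupy the same bytes, so the larger of the two determines the size.
#[allow(dead_code)]
union Storage<const N: usize> {
    inline: [Use; N],
    heap: HeapRef,
}

/// Storage sizes in bytes for inline capacities of 2, 4, and 8 elements.
fn storage_bytes() -> (usize, usize, usize) {
    (
        size_of::<Storage<2>>(),
        size_of::<Storage<4>>(),
        size_of::<Storage<8>>(),
    )
}
```

In this model, on a 64-bit target, going from 2 to 4 inline elements leaves the storage size unchanged, while 8 inline elements would make it grow.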
b05c254 to 7c9497d
elliottt
left a comment
Thanks for the benchmarks and writeup!