Skip to content

Conversation

@Squareys
Copy link

@Squareys Squareys commented Aug 9, 2022

Hi @kripken & @sbc100 !

As promised in #4884, this is split off into it's own PR.
For it to have any effect, though, it still depends on avoiding the default constructor allocation, implemented in that other PR.

Benchmarks for the SmallSet template argument following in short.

Best,
Jonathan

@Squareys
Copy link
Author

Squareys commented Aug 9, 2022

Here's the benchmarks (edited for readability), script, input wasm and the executables can be found here

$ ./run-wasm-opt.sh
wasm-opt-baseline:
real    0m23.973s

wasm-opt-no-set-allocation:
real    0m23.209s

wasm-opt-smallset-10:
real    0m18.224s

wasm-opt-smallset-4:
real    0m17.072s

wasm-opt-smallset-2:
real    0m17.087s

wasm-opt-smallset-1:
real    0m17.584s

wasm-opt-smallset-0:
real    0m19.192s
---------------

Here are a couple of culprits:

Where with EffectsAnalyzer, avoiding the default constructor allocation in std::set has a great effect, it seems to cause a 10% pessimization in the rest of the code. This could be because EffectsAnalyzer has a lot of empty sets and usually few elements (which is possibly affirmed by the smaller SmallSet sizes being faster), in which case the std::optional overhead would also barely show an effect.

The rest of the code may be way less effected by that one allocation, if they drown out the improvement there by allocating many times later by exceeding the small size, and also the optional overhead might weigh in way more here.

Please also note that this benchmark is heavily biased to the way Wonderland Engine writes code (which is a non-standard flavor over heavily optimized data-oriented C++) and probably not representative of "the average C++ wasm". You may want to run the benchmarks with other .wasm input to test.

All in all, @kripken, I'm considering wheter it wouldn't make sense to revert the changes to SmallSet and just use std::optional here in EffectsAnalyzer?

@kripken
Copy link
Member

kripken commented Aug 9, 2022

Thanks!

Ok, I did some benchmarking on a UE4 game, a Dart benchmark, and Poppler, all on Linux. Raw times are noisy but the instruction counts from perf stat are pretty robust. Numbers are in K of instructions:

baseline SmallSet<10> in effects.h SmallSet<1> SmallSet<0> just SmallSet, no effects.h changes
Dart 27,193 26,793 26,878 27,232 27,088
Poppler 16,262 15,836 15,812 16,246 16,275
UE4 495,164 479,595 480,944 496,245 495,671

(runtimes are generally similar to these numbers).

Looks like SmallSet<0> and just changing SmallSet (no changes to effects.h) make almost no difference compared to baseline. 1 and 10 do make a difference. I'd say 1 is the safer option since it is the smallest memory size increase.

I also lean towards changing SmallSet. While the change by itself doesn't change much, as we use SmallSet more it will help more, and will avoid us needing to add std::optional in more places. And I do think it makes sense to do, the more I think on it: SmallSet assumes we will rarely need the full set, and using std::optional there makes us just have a pointer for it, rather than its full contents (which will be at least a pointer + a size).

So I suggest we land #4884 and then land this but with 1 instead of 10. What do you think?

@Squareys Squareys force-pushed the effects-analyzer-small-set branch from 4868e73 to 3561859 Compare August 9, 2022 17:57
@Squareys
Copy link
Author

Squareys commented Aug 9, 2022

@kripken Sounds great! I updated this PR to use 1 instead of 10!

@Squareys Squareys force-pushed the effects-analyzer-small-set branch from 3561859 to 5ba07b5 Compare September 28, 2022 13:22
@Squareys
Copy link
Author

Rebased onto the changes to #4884 !

@Squareys Squareys force-pushed the effects-analyzer-small-set branch 2 times, most recently from 4a043b0 to 4b0774d Compare November 23, 2022 11:38
@Squareys
Copy link
Author

Squareys commented Nov 23, 2022

Rebased onto the changes to #4884 once again, which should also fix the test case that was failing over there.

Also noticed I had apparently missed two sets that were still on 10 instead of 1 element. Hope that wasn't just my faulty memory and it was intended to keep those different for some reason.

@Squareys Squareys force-pushed the effects-analyzer-small-set branch 3 times, most recently from a96f0b9 to 491f1ff Compare November 23, 2022 12:51
Signed-off-by: Squareys <squareys@googlemail.com>
Signed-off-by: Squareys <squareys@googlemail.com>
…t does

Signed-off-by: Squareys <squareys@googlemail.com>
Signed-off-by: Squareys <squareys@googlemail.com>
@Squareys Squareys force-pushed the effects-analyzer-small-set branch from 491f1ff to 7ce6281 Compare March 29, 2023 21:02
@Squareys
Copy link
Author

Rebased onto the changes to #4884 !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants