Cache HashSet in try_to_allocate_bundle_to_reg #90

Merged: cfallin merged 1 commit into bytecodealliance:main from Amanieu:conflict_set on Sep 26, 2022
@Amanieu (Contributor) commented Sep 25, 2022

Keep `conflict_set` allocated in `Env` instead of allocating a new one on every call. This improves register allocation performance by about 2%.
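
The idea can be illustrated with a minimal sketch. This is not the code from the PR: the `Env` struct below is a hypothetical stand-in with a single field, the two `count_conflicts_*` methods are invented for illustration, and `u32` is an assumed element type. It only shows the general pattern of moving a scratch `HashSet` into a long-lived struct and clearing it at the start of each call, so that its heap buffer is reused rather than reallocated.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the allocator environment.
/// The real `Env` in regalloc2 has many more fields.
struct Env {
    /// Scratch set kept alive between calls so its buffer can be reused.
    conflict_set: HashSet<u32>,
}

impl Env {
    fn new() -> Self {
        Env {
            conflict_set: HashSet::new(),
        }
    }

    /// "Before" pattern: a fresh set is built (and freed) on every call.
    /// Illustrative helper, not a function from the PR.
    fn count_conflicts_fresh(&self, candidates: &[u32]) -> usize {
        let mut conflict_set = HashSet::new();
        for &c in candidates {
            conflict_set.insert(c);
        }
        conflict_set.len()
    }

    /// "After" pattern: reuse the set stored in `Env`.
    /// `clear()` removes the elements but keeps the allocated capacity.
    fn count_conflicts_cached(&mut self, candidates: &[u32]) -> usize {
        self.conflict_set.clear();
        for &c in candidates {
            self.conflict_set.insert(c);
        }
        self.conflict_set.len()
    }
}
```

Both methods return the same result for the same input; the cached variant simply avoids repeated allocation once the set has grown to its working size, at the cost of requiring `&mut self`.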
@Amanieu changed the title from "Cache HashSet in try_to_allocate_bundle_to_reg`" to "Cache HashSet in try_to_allocate_bundle_to_reg" on Sep 25, 2022
@cfallin (Member) left a comment

An easy win, thanks for finding this!

@cfallin cfallin merged commit 227a9fd into bytecodealliance:main Sep 26, 2022
@jameysharp (Contributor) commented

While double-checking my benchmarking setup before doing some perf experiments on regalloc2, I measured the effect of this PR on wasmtime. Although the effect disappears into the noise on larger inputs, I can confirm that this PR is a win across the board. I figured I might as well share the numbers.

For a small program like the bz2 benchmark from Sightglass, this PR is "1.02 ± 0.01 times faster" by CPU time, according to Hyperfine. Sightglass says it's "1.02x to 1.03x faster" by CPU cycles, and 1.02x faster by instructions retired.

For a slightly larger benchmark (pulldown-cmark), this PR is "1.01 ± 0.01 times faster" according to Hyperfine. Sightglass says "1.00x to 1.01x faster" by CPU cycles, and 1.01x faster by instructions retired.

On our largest benchmark (spidermonkey), this PR is "1.00 ± 0.01 times faster" according to Hyperfine. Sightglass reports "No difference in performance" by CPU cycles, and 1.01x faster by instructions retired.

@Amanieu Amanieu deleted the conflict_set branch November 24, 2023 08:47