
(opt): Multiple inline cache slots per call-site #31

Open — wants to merge 1 commit into base: master
Conversation

@Hirevo (Owner) commented Jan 27, 2023

This PR follows the work done in #13 to improve the performance of message sends using inline caches for method lookups.

The current implementation uses a single never-evicted cache slot per call-site for the inline caches, which was a notable performance improvement compared to not using any cache at all.

This PR experiments with adding more cache slots per call-site (currently: 3 slots per call-site), with the goal of further improving method dispatch performance.
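A multi-slot inline cache of this shape can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual som-rs implementation: all names (`InlineCache`, `ClassId`, `MethodId`, `lookup`) are hypothetical, and it assumes class identity can be compared cheaply (e.g. by pointer or a numeric class id). The fill policy mirrors the PR's description: slots are never evicted, and a miss falls back to the full method lookup.

```rust
// Hypothetical sketch of a 3-slot inline cache per call-site.
// Not the som-rs API: all names here are illustrative.

const CACHE_SLOTS: usize = 3;

#[derive(Clone, Copy, PartialEq, Debug)]
struct ClassId(u32);

#[derive(Clone, Copy, PartialEq, Debug)]
struct MethodId(u32);

#[derive(Default)]
struct InlineCache {
    // One entry per slot: (receiver class, resolved method), or empty.
    slots: [Option<(ClassId, MethodId)>; CACHE_SLOTS],
}

impl InlineCache {
    /// Probe the slots in order; on a hit, return the cached method.
    /// On a miss, perform the full lookup and fill the first free slot
    /// (never evicting, mirroring the single-slot policy from #13).
    fn lookup(&mut self, class: ClassId, slow_lookup: impl Fn(ClassId) -> MethodId) -> MethodId {
        for slot in self.slots.iter() {
            if let Some((cached_class, method)) = slot {
                if *cached_class == class {
                    return *method; // cache hit: no full lookup needed
                }
            }
        }
        let method = slow_lookup(class); // cache miss: full method lookup
        if let Some(free) = self.slots.iter_mut().find(|s| s.is_none()) {
            *free = Some((class, method));
        }
        method
    }
}
```

With this layout, a monomorphic call-site only ever touches the first slot, which is consistent with the benchmark observation that the extra slots do not help when most sends see a single receiver class.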

The initial performance assessment is that, in their current state, the additional slots do not measurably improve performance.

For reference, here are the benchmarking numbers that led to this conclusion:

[image: benchmark results]

The goal of this PR is to continue exploring how to potentially improve the implementation to make better use of these additional cache slots.

@Hirevo Hirevo added M-interpreter Module: Interpreter P-medium Priority: Medium C-performance Category: Performance improvements labels Jan 27, 2023
@Hirevo Hirevo self-assigned this Jan 27, 2023
@smarr (Contributor) commented Jan 27, 2023

This is because, in our benchmarks, the large majority of calls are monomorphic.

I don't have full data for all the SOM benchmarks, but the ones that are in the AreWeFastYet benchmarks have some data here: https://github.com/smarr/are-we-fast-yet/blob/master/docs/metrics.md#dynamic-metrics

@Hirevo (Owner, Author) commented Jan 27, 2023

Oh, I see.
I was wondering whether I had done something that slowed down the lookups and therefore cancelled out any performance improvement, but it makes sense that most calls are indeed non-polymorphic.

Thanks for the link, this is very interesting data.

Though I have no idea what is meant by "target polymorphism" in the document.
If I am not mistaken, "observed receiver polymorphism" is when the receiver is observed to sometimes have different types at the same call-site, but I am not sure what "observed target polymorphism" would be.
Is it when the receiver type and call-site stay the same but the message sent is observed to sometimes change (like when using Object>>#perform: or Method>>#invokeOn:with:)?

@smarr
Copy link
Contributor

smarr commented Jan 27, 2023

For the long version :) check https://stefan-marr.de/downloads/dls22-kaleba-et-al-analyzing-the-run-time-call-site-behavior-of-ruby-applications.pdf

The short version: target polymorphism is when not just the receiver type differs, but a different method is actually activated. The classic example is a class hierarchy where the method is implemented at the top of the hierarchy, but the call-site always sees different subclasses. The method is the same, but you still see different receivers. So, it's polymorphic over the receivers, but actually monomorphic over the targets that are activated.
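The receiver-polymorphic-but-target-monomorphic case can be illustrated with a small sketch (hypothetical names; a Rust trait with a single default method stands in for a method implemented once at the top of a class hierarchy):

```rust
// Illustrative only: `Shape`, `Circle`, `Square` are made-up names.
// The method is defined once, at the "top" of the hierarchy.
trait Shape {
    fn describe(&self) -> &'static str {
        "a shape"
    }
}

struct Circle;
struct Square;

// Neither subclass overrides `describe`.
impl Shape for Circle {}
impl Shape for Square {}

fn call_site(s: &dyn Shape) -> &'static str {
    // This call-site is receiver-polymorphic (it sees Circle, Square, ...),
    // but target-monomorphic: every receiver activates the same default
    // `Shape::describe` body.
    s.describe()
}
```

A receiver-keyed inline cache would record a miss for each new subclass here, even though the method it resolves to never changes, which is exactly the case "target polymorphism" separates out from receiver polymorphism.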
