
(opt): Inline caches for method lookups when sending messages #13

Merged: 6 commits merged into master from opt/inline-caches on Jan 27, 2023

Conversation

@Hirevo (Owner) commented Aug 11, 2020

This PR adds caches to speed up method resolution when the receiver class at a given call site is the same as on the previous send, as a potential optimization (yet to be measured to confirm it is an actual win).

This PR only affects the bytecode interpreter.

Depends on #11.
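In essence, each send site remembers the last receiver class and the method it resolved to; a minimal sketch of the idea follows (hypothetical types, not the PR's actual code):

```rust
// Hypothetical stand-ins for som-rs internals (`ClassId`, `MethodId`,
// and the slow-path closure are assumptions, not the PR's actual code).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct ClassId(usize);

#[derive(Clone, Copy, Debug)]
struct MethodId(usize);

/// One cache entry per call site: the receiver class seen on the last
/// send, and the method the full lookup resolved to for it.
#[derive(Default)]
struct CallSiteCache {
    entry: Option<(ClassId, MethodId)>,
}

impl CallSiteCache {
    /// Resolve the method for `receiver_class`, consulting the cache
    /// first and falling back to the full (slow) lookup on a miss.
    fn resolve(
        &mut self,
        receiver_class: ClassId,
        slow_lookup: impl FnOnce(ClassId) -> MethodId,
    ) -> MethodId {
        match self.entry {
            // Hit: same receiver class as last time, reuse the cached method.
            Some((class, method)) if class == receiver_class => method,
            // Miss: do the full lookup and remember the result.
            _ => {
                let method = slow_lookup(receiver_class);
                self.entry = Some((receiver_class, method));
                method
            }
        }
    }
}

fn main() {
    let mut cache = CallSiteCache::default();
    let first = cache.resolve(ClassId(1), |_| MethodId(42)); // miss: slow path runs
    let second = cache.resolve(ClassId(1), |_| unreachable!()); // hit: cache answers
    println!("{first:?} {second:?}");
}
```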

@Hirevo added the M-interpreter (Module: Interpreter), P-medium (Priority: Medium), and C-performance (Category: Performance improvements) labels on Aug 11, 2020
@Hirevo self-assigned this on Aug 11, 2020
@Hirevo (Owner, Author) commented Jan 18, 2023

During a discussion with @OctaveLarose, he pointed out that my initial implementation of inline caches using hashmaps was likely suboptimal, and that a bigger performance win could potentially be obtained by using plain old arrays for lookups.
That is exactly what this new set of changes does.

This implementation and layout of the caches are similar to how PySOM does it.

Performance measurements on this have yet to be done, but I'll try to get to it soon enough.
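Concretely, the two-Vec layout under review here can be sketched as follows (simplified, with placeholder types; the field names come from the excerpt below, but the real code holds GC'd references and uses unchecked indexing):

```rust
use std::rc::Rc;

// Placeholder types; som-rs has its own Class and Invocable.
struct Class;
struct Invocable;

/// Per-method caches: one slot per bytecode offset, so a send at
/// `bytecode_idx` can index its cache entry directly, with no hashing.
struct InlineCaches {
    inline_cache_receiver: Vec<Option<*const Class>>,
    inline_cache_invocable: Vec<Option<Rc<Invocable>>>,
}

impl InlineCaches {
    /// Both vectors are pre-sized to the method's bytecode length.
    fn new(bytecode_len: usize) -> Self {
        Self {
            inline_cache_receiver: vec![None; bytecode_len],
            inline_cache_invocable: vec![None; bytecode_len],
        }
    }

    /// A hit requires the cached receiver class pointer to match the
    /// class of the current receiver; anything else is a miss.
    fn lookup(&self, bytecode_idx: usize, receiver_class: *const Class) -> Option<Rc<Invocable>> {
        match self.inline_cache_receiver[bytecode_idx] {
            Some(cached) if std::ptr::eq(cached, receiver_class) => {
                self.inline_cache_invocable[bytecode_idx].clone()
            }
            _ => None,
        }
    }
}
```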

Review thread on this excerpt from the interpreter:

```rust
    inline_cache_receiver.get_unchecked_mut(bytecode_idx)
};
let maybe_found_invocable = unsafe {
    inline_cache_invocable.get_unchecked_mut(bytecode_idx)
};
```
Contributor:

The max inline cache size is 1, right? If you used PySOM as a reference, that's indeed what it does, but you might want to investigate a max cache size above that. Or not: maybe 1 is already good enough (most call sites will only ever call a single unique method, but ones that make heavy use of polymorphism won't be happy with a cache that small).
If it's fast as is, then it's fast as is, though! PySOM is doing OK with just 1.

Contributor:

PySOM should have 2 possible cache entries, cached_layout1 and cached_layout2:
https://github.com/SOM-st/PySOM/blob/d85ed9d957c2210bee10a836dac4454432e6c965/src/som/interpreter/bc/interpreter.py#L704

Contributor:

Right, it's 2, my bad... I forgot you used the free BC after send like this

@Hirevo (Owner, Author):

I didn't quite notice this during my exploration of PySOM, thanks for pointing it out.
I'll try to implement something like this and report on this PR how it changes the performance.
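For illustration, a two-entry cache along those lines might look like this in Rust (the names and the fill policy are assumptions, not the PR's eventual code):

```rust
// Hypothetical stand-ins for som-rs internals, as in the earlier sketch.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct ClassId(usize);

#[derive(Clone, Copy, Debug)]
struct MethodId(usize);

/// Two cache entries per call site, checked in order, in the spirit of
/// PySOM's cached_layout1 / cached_layout2.
#[derive(Default)]
struct TwoEntryCache {
    entries: [Option<(ClassId, MethodId)>; 2],
}

impl TwoEntryCache {
    /// Return the cached method if either entry matches the receiver class.
    fn lookup(&self, receiver_class: ClassId) -> Option<MethodId> {
        self.entries
            .iter()
            .flatten()
            .find_map(|&(class, method)| (class == receiver_class).then_some(method))
    }

    /// On a miss, fill the first empty slot; if both slots are taken,
    /// leave the cache alone (one simple policy among several possible).
    fn fill(&mut self, receiver_class: ClassId, method: MethodId) {
        if let Some(slot) = self.entries.iter_mut().find(|slot| slot.is_none()) {
            *slot = Some((receiver_class, method));
        }
    }
}
```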

Review thread on this excerpt:

```rust
                }
            }
        }
        FrameKind::Method { method, .. } => {
```
Contributor:

This is something I've noticed in general when working on som-rs: there's a lot of duplication between Methods and Blocks, but they're almost the same thing, right? Could they be unified somehow? (This is outside the scope of these inline caching changes, but seeing it here reminded me again.)

Contributor:

They are unified in all other SOMs: a block refers to a method. And then everything follows from there.
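In type terms, that unification could be as simple as having the block own (a reference to) a method (a sketch with placeholder fields, not som-rs's actual definitions):

```rust
use std::rc::Rc;

struct Method {
    bytecode: Vec<u8>,
    // Literals, inline caches, etc. would also live here, once,
    // for methods and blocks alike.
}

/// A block is then just a method plus its captured environment,
/// so the interpreter's dispatch loop only ever sees `Method`.
struct Block {
    method: Rc<Method>,
    // ... the captured outer frame would go here
}
```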

@Hirevo (Owner, Author):

Yeah, it is true that there are plenty of places where the code could be a lot cleaner than it currently is.
I think this is because I originally wrote this code when the interpreter wasn't working yet: I just wanted to get it working before refactoring, but I ended up never doing that refactoring.
These two branches can definitely be unified; the only difference between them is the location of the inline cache storage.

@Hirevo (Owner, Author) commented Jan 24, 2023

@smarr @OctaveLarose Thanks for the code reviews, and sorry for the delay in my replies.

I finally went ahead and ran the benchmarks on the different implementations of inline caches up to this point, and here is a compilation of the results:

[Image: benchmark results comparing the four inline-cache implementations]

I used ReBench to run all the SOM core lib benchmarks, with the same configuration as the current rebench.conf, each time with the relevant binary compiled in release mode:

| Binary name | Corresponding commit |
| --- | --- |
| without-inline-caches (baseline) | b1a0c82 (current master) |
| hashmap-inline-caches | 77fd7e7 |
| two-vec-inline-caches | 4cee8fb |
| one-vec-inline-caches | c7e8160 |

These results seem to indicate that:

  • The original implementation using HashMap was indeed suboptimal (granting rather small gains) and somewhat inconsistent.
  • The latest single-Vec technique is a guaranteed performance improvement (between 4 and 20% faster), as sketched below.
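For reference, a sketch of what that single-Vec layout amounts to (placeholder types; the real code uses som-rs's GC'd references):

```rust
use std::rc::Rc;

// Placeholder types, as in the earlier two-Vec sketch.
struct Class;
struct Invocable;

/// Single-Vec variant: one entry per bytecode offset, pairing the last
/// receiver class seen at that send with the invocable it resolved to.
type InlineCache = Vec<Option<(*const Class, Rc<Invocable>)>>;

/// The cache is pre-sized to the method's bytecode length, so a send
/// indexes its entry directly, as in the two-Vec version.
fn new_inline_cache(bytecode_len: usize) -> InlineCache {
    vec![None; bytecode_len]
}
```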

The remaining things I want to do before merging this PR are cleaning up some of the code and maybe trying out a per-call-site cache size greater than 1, though the latter could perhaps be done in its own PR; I am not sure yet.

@Hirevo (Owner, Author) commented Jan 24, 2023

The CI benchmarks are failing because rebench.polomack.eu is currently not set up properly (I encountered some issues when upgrading ReBenchDB, but nothing too bad).
I'll re-run them once that is resolved.

@smarr (Contributor) commented Jan 24, 2023

Apparently @OctaveLarose also has issues running the latest ReBench, but he was using the Docker image. If you run things manually, the database likely needs the migration.*.sql files applied.

@Hirevo (Owner, Author) commented Jan 24, 2023

During the upgrade a few days ago, I did encounter some database-related errors and managed to fix them by applying the migrations.
The issue that currently causes the instance to be down is that I made a rather sad mistake when trying to install Rscript 4.1 (now needed due to the use of |>) on a Debian machine using the unstable packages (because the stable package of Rscript is too old).
I ended up with the machine no longer having a working libc.so (due to an attempt to upgrade it as well), which meant no program could run properly.
Thankfully, no data was lost, since the disks are still perfectly readable (and I had some backups), but it still means I have to reinstall everything.
Since I have the old configurations and whatnot, it is going pretty smoothly; it just takes quite a bit of time.

@smarr (Contributor) commented Jan 24, 2023

Yeah, sorry... I am slowly working on getting rid of R completely. But that's very slow work unfortunately...

@OctaveLarose (Contributor) commented

> The latest single-Vec technique is a guaranteed performance improvement (between 4 and 20% faster)

Nice, congrats!

> Apparently @OctaveLarose also has issues running the latest ReBench, but he was using the Docker image. If you run things manually, the database likely needs the migration.*.sql files applied.

If relevant, I use commit 79dbe5a66a73d2ed05112956575b1a07077f8c2a, but yeah, with the Docker image. I've never run into R issues, and I dread the day I will.

@Hirevo (Owner, Author) commented Jan 25, 2023

The ReBenchDB instance is now back online, and the benchmarking CI runs have been re-run.
The reports there seem to confirm the performance wins I observed on my machine.

@Hirevo (Owner, Author) commented Jan 27, 2023

I've cleaned up the code a bit to remove the duplication between the handling of Send and SuperSend, which did pretty much the same thing, just with different variables.
I think that removing the duplication between blocks and methods will require deeper changes that seem out of scope for this PR.

I currently have an implementation of inline caches with more than one slot per call-site that I have pushed to a separate branch (opt/bigger-inline-caches).

I did run the benchmarks with inline caches of size 2 and 3 and here are the results:

[Image: benchmark results for inline caches of size 2 and 3]

They don't seem to make much of a difference in actual performance, so I am thinking we can merge this PR (with single-slot inline caches) and keep the multiple-slot inline caches as a branch (or an open PR) for exploring whether they can be improved.

@Hirevo (Owner, Author) commented Jan 27, 2023

The branch has been rebased on the latest master branch, prior to merging.

@smarr @OctaveLarose Thanks for the code reviews and the help improving the implementation!

@Hirevo Hirevo merged commit 8e2c00e into master Jan 27, 2023
@Hirevo Hirevo deleted the opt/inline-caches branch January 27, 2023 16:03