
Shared stack between calls #816

Closed · wants to merge 15 commits

Conversation

thedevbirb (Contributor)

This is an experiment for a shared stack between calls. Let me know what you think about it and if it is feasible to bring it in!
The model is very similar to the last one developed in #445 (with checkpoints), although there are some differences in the allocation strategy.

Allocations

There are three different strategies here:

  • do one single 32MB allocation -> we've seen with the shared memory that this approach is not good and is overkill most of the time
  • check for capacity at every push operation in the stack -> this does the minimum amount of memory allocations, but it adds a small yet non-negligible overhead to every push, which results in a performance regression in a stress test like the snailtracer bench
  • ensure a capacity of STACK_LIMIT every time you enter a new context -> although this is very similar to what happens normally (i.e., on a new context we allocate space for the stack), it keeps the peak memory allocated for the next contexts and allows for faster unsafe operations, reducing the push overhead. This is the approach I kept after all the experiments; a rough sketch follows the list
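
A rough, self-contained sketch of that third strategy (illustrative only, not the PR's actual code; Word stands in for the EVM's 256-bit type, and the method names simply follow the discussion):

/// Stand-in for the EVM's 256-bit word; the real code uses U256.
type Word = [u64; 4];

/// EVM stack limit per call frame.
const STACK_LIMIT: usize = 1024;

#[derive(Default)]
struct SharedStack {
    /// Single buffer shared by every call frame.
    data: Vec<Word>,
    /// Start offset of each active frame's stack inside `data` (the checkpoints).
    checkpoints: Vec<usize>,
}

impl SharedStack {
    /// Entering a new context: remember where this frame's stack starts and
    /// reserve STACK_LIMIT extra slots, so pushes inside the frame never
    /// reallocate (which is what allows a faster push path).
    fn new_context(&mut self) {
        self.checkpoints.push(self.data.len());
        self.data.reserve(STACK_LIMIT);
    }

    /// Leaving the context: truncate back to the caller's stack. The
    /// allocation itself is kept, so peak memory stays available for the
    /// next contexts.
    fn free_context(&mut self) {
        if let Some(start) = self.checkpoints.pop() {
            self.data.truncate(start);
        }
    }

    /// Push within the current frame; the STACK_LIMIT depth check is still
    /// done by the interpreter, as it is today.
    fn push(&mut self, value: Word) {
        self.data.push(value);
    }
}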

Performance

I managed to keep the regression to a minimum; however, the benches don't really exercise the shared stack at all. Even bench_eval on the snailtracer always remains in the same context, so the shared mechanism doesn't really come into play.
There is of course a minimum of overhead because this abstraction isn't free, but it seems feasible.

analysis/transact/raw   time:   [8.2504 µs 8.3154 µs 8.4299 µs]
                        change: [+2.8469% +3.7488% +4.8213%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
analysis/transact/checked
                        time:   [8.2338 µs 8.3507 µs 8.4382 µs]
                        change: [+1.0517% +1.9726% +2.9546%] (p = 0.00 < 0.05)
                        Change within noise threshold.
analysis/transact/analysed
                        time:   [5.6613 µs 5.6720 µs 5.6838 µs]
                        change: [+0.6835% +2.5449% +4.1668%] (p = 0.01 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

snailtracer/transact/analysed
                        time:   [67.868 ms 68.022 ms 68.191 ms]
                        change: [+12.395% +13.393% +14.409%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild
snailtracer/eval        time:   [61.284 ms 61.386 ms 61.590 ms]
                        change: [+0.6965% +5.5300% +9.5800%] (p = 0.03 < 0.05)
                        Change within noise threshold.

transfer/transact/analysed
                        time:   [1.2470 µs 1.2484 µs 1.2505 µs]
                        change: [+6.3084% +6.6428% +6.9314%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) high mild
  6 (6.00%) high severe

rakita (Member) commented Oct 18, 2023

You can check performance with cachegrind and docker like this: #797

Additionally, we should write a proper perf test for this. There may be bytecode that we can reuse from eth/tests: https://github.com/ethereum/tests/tree/develop/GeneralStateTests
There are fillers that are more readable: https://github.com/ethereum/tests/tree/develop/src/GeneralStateTestsFiller/stCallCodes

thedevbirb (Contributor Author) commented Oct 18, 2023

You can check performance with cachegrind and docker like this: #797

Additionally, we should write a proper perf test for this. There may be bytecode that we can reuse from eth/tests: https://github.com/ethereum/tests/tree/develop/GeneralStateTests
There are fillers that are more readable: https://github.com/ethereum/tests/tree/develop/src/GeneralStateTestsFiller/stCallCodes

I agree regarding a proper performance test for this. We need to choose something that goes up and down in call depth to get a proper picture of the gains of this setup.

In the meantime, I tried with cachegrind as you said (thanks, I'll keep that in mind for the future). Here are the results on my machine:

So yes, there is still some work to do to bring down the regression when the shared setup is not used.

crates/interpreter/src/host.rs (outdated, resolved)
/// Stack.
pub stack: Stack,
/// Shared stack.
pub shared_stack: &'a mut SharedStack,
Member

Can we somehow expose the local stack here? With a reference we have two hops: one to the shared stack and a second to the buffer to access the stack.

Contributor Author

Ok, so what you have in mind is not exposing the whole shared stack struct but only a smaller version that is simply a wrapper around buffer: *mut Buffer.

If I got this right it makes sense; however, the call/create opcodes and the call_inner/create_inner functions would have some problems, because they get the shared stack from the interpreter itself, so we cannot hide methods like new/free_context.
EDIT: in general, I could have problems with the Host trait and passing around SharedContext as you suggested above.

Member

Even *mut Buffer would be a pointer to the Buffer, which has a pointer to the stack (Vec). With the new loop call this is resolved, as we fully move the first structure to the Iterator.

Tbh I'm not sure how impactful this is; maybe it is insignificant.

crates/interpreter/src/interpreter/shared_stack.rs (outdated, resolved)
thedevbirb (Contributor Author) commented Oct 31, 2023

Hey @rakita, I tried to ask on the ethereum/tests repository about a stress test but the suggestion was to write a custom one.

In the meantime I tried to implement this approval transfer bench: even if it does not reach a huge depth, it loops over context changes, so I thought it would be a good way to see how the setup performs there.
Here are the results (cargo bench --all against main 0d78d1e):

approval_transfer/transact/analysed
                        time:   [4.7078 µs 4.7261 µs 4.7508 µs]
                        change: [+10.900% +12.297% +13.797%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe
approval_transfer/eval  time:   [55.583 ns 55.690 ns 55.817 ns]
                        change: [-84.656% -84.591% -84.545%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

Regarding the custom test, is there something in particular you had in mind?

/// heap lookup for basic stack operations.
///
/// Invariant: it is a valid pointer to `self.pages[self.page_idx].buffer`
buffer: *mut Buffer,
rakita (Member), Nov 7, 2023

I would put the full Page here, not just the Buffer, and rename the pages field to previous_pages to represent pages that are not active.

Member

Hm, I see the problem here; maybe we should not look at free pages as a stack. Wdyt about the idea of having two fields, taken_pages and free_pages, both of them Vecs?

If a context is freed you put its Page into the free_pages vec, and on new_context you pop a page from free_pages or create a new one if there is none.
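
Roughly something like this (an illustrative sketch, not actual code; only the taken_pages/free_pages idea and the Page name come from this suggestion):

/// Stand-in for the buffer one call frame's stack lives in.
struct Page {
    buffer: Vec<[u64; 4]>,
}

impl Page {
    fn new() -> Self {
        // One EVM stack's worth of capacity per page.
        Self { buffer: Vec::with_capacity(1024) }
    }
}

#[derive(Default)]
struct SharedStack {
    /// Pages currently owned by active call frames.
    taken_pages: Vec<Page>,
    /// Pages handed back by finished frames, kept around for reuse.
    free_pages: Vec<Page>,
}

impl SharedStack {
    /// On new_context, reuse a free page if there is one, otherwise allocate.
    fn new_context(&mut self) {
        let mut page = self.free_pages.pop().unwrap_or_else(Page::new);
        page.buffer.clear();
        self.taken_pages.push(page);
    }

    /// When the context is freed, move its page onto the free list.
    fn free_context(&mut self) {
        if let Some(page) = self.taken_pages.pop() {
            self.free_pages.push(page);
        }
    }
}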

Member

Another idea (not related to the first part) concerns the Buffer and the context_len: Buffer is a Vec here that has its own len, and additionally we have a context_len that we increment.

We could work with a *mut U256 that points directly at the top of the stack, with context_len telling us the bound of the stack. I like this idea, but we should be careful about the usage of the pointer.
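
As a very rough illustration of how that could look (a sketch only; the caller is assumed to enforce the STACK_LIMIT and non-empty checks that the interpreter already does, and Word stands in for U256):

use core::ptr;

type Word = [u64; 4];
const STACK_LIMIT: usize = 1024;

struct ContextStack {
    /// Backing storage; a full frame's capacity is reserved up front.
    buffer: Vec<Word>,
    /// Points one past the current top of this context's stack.
    /// Invariant: always inside `buffer`'s reserved allocation.
    top: *mut Word,
    /// Number of words currently on this context's stack (the bound).
    context_len: usize,
}

impl ContextStack {
    fn new() -> Self {
        let mut buffer = Vec::with_capacity(STACK_LIMIT);
        let top = buffer.as_mut_ptr();
        Self { buffer, top, context_len: 0 }
    }

    /// Safety: the caller must have checked `context_len < STACK_LIMIT`.
    unsafe fn push_unchecked(&mut self, value: Word) {
        ptr::write(self.top, value);
        self.top = self.top.add(1);
        self.context_len += 1;
    }

    /// Safety: the caller must have checked `context_len > 0`.
    unsafe fn pop_unchecked(&mut self) -> Word {
        self.top = self.top.sub(1);
        self.context_len -= 1;
        ptr::read(self.top)
    }
}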

Contributor Author

I like the first idea! It should make things easier to reason about, and we can avoid the page_idx, which can be annoying to work with.

Regarding the second idea, let me know if I got this right: with a raw pointer to the top of the stack, all operations are done not on the buffer itself (which is preallocated with 1024 capacity) but rather on the pointer, by dereferencing and incrementing/decrementing it. If that's the case, do you think it provides a performance improvement over dealing with the Buffer pointer, or is it a matter of ergonomics?

Lastly, and slightly unrelated: wdyt of putting both the shared stack and the shared memory under the EVMContext struct? Then when creating the interpreter we can pass a raw pointer to the current context buffer of stack and memory.

Member

Usage of the pointer should be slightly faster, but it will be less ergonomic as you directly handle the pointer.

It would be good to put it in EvmContext, but I think borrowing is going to kill us there, and it would not matter a lot.

Contributor Author

Okay, I'll try to modify the shared stack with these suggestions. It will be some work, so I hope a bit of waiting is not a problem!

Regarding EVMContext, maybe I can think about it in a separate PR after this one and see if it is reasonable.

Member

Take your time and pace yourself, I am fine to take over if you feel burdened to finish it, and I am fine with waiting. Both things work for me.

We can maybe separate that pointer idea into another PR so as not to clog this one.

Contributor Author

Yeah if you want I'd be very happy to work on the pointer idea on a separate PR!

rakita (Member) left a comment

Left a few ideas that could improve the PR. I like the page system and new context, but I wanted to check how the loop call will look so we can integrate this inside it, so I apologize for the long overdue review.

thedevbirb (Contributor Author)

Left a few ideas that could improve the PR. I like the page system and new context, but I wanted to check how the loop call will look so we can integrate this inside it, so I apologize for the long overdue review.

Thanks for the feedback, and no problem about the delay!

crates/interpreter/src/interpreter/shared_stack.rs (4 outdated review threads, resolved)
thedevbirb (Contributor Author)

Hey there, I pushed some changes regarding the free and taken pages model. Also, now both stack and memory are under a SharedContext struct.
I still need to address some of the comments regarding UB and upstream sync, will do it shortly!

Comment on lines 3 to 6
pub const EMPTY_SHARED_CONTEXT: SharedContext = SharedContext {
stack: EMPTY_SHARED_STACK,
memory: EMPTY_SHARED_MEMORY,
};
Collaborator

This should be a const fn empty() -> Self { ... } or const EMPTY: Self = ...; on all the related structs. Const item initializers are weird to me, so I'd prefer the former.
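
For illustration, the const fn empty() shape could look roughly like this (the struct bodies below are placeholders, not the actual revm definitions):

pub struct SharedStack {
    data: Vec<[u64; 4]>,
}

pub struct SharedMemory {
    data: Vec<u8>,
}

pub struct SharedContext {
    pub stack: SharedStack,
    pub memory: SharedMemory,
}

impl SharedStack {
    pub const fn empty() -> Self {
        // Vec::new() is const and does not allocate.
        Self { data: Vec::new() }
    }
}

impl SharedMemory {
    pub const fn empty() -> Self {
        Self { data: Vec::new() }
    }
}

impl SharedContext {
    /// Replaces the free-standing EMPTY_SHARED_CONTEXT const item.
    pub const fn empty() -> Self {
        Self {
            stack: SharedStack::empty(),
            memory: SharedMemory::empty(),
        }
    }
}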

Contributor Author

Okay! I kept this for consistency with rakita's work; if he also agrees with this change, it's perfectly fine with me.

crates/interpreter/src/interpreter.rs (outdated, resolved)
@@ -96,14 +97,18 @@ impl Interpreter {
             instruction_result: InstructionResult::Continue,
             is_static,
             return_data_buffer: Bytes::new(),
-            shared_memory: EMPTY_SHARED_MEMORY,
-            stack: Stack::new(),
+            shared_context: EMPTY_SHARED_CONTEXT,
Collaborator

shouldn't this be ::new() to pre-allocate?

Contributor Author

Actually I don't think so, because we want to manually give the created context to the interpreter when it calls the run method of EVMImpl. See https://github.com/bluealloy/revm/pull/816/files/1de2425fed99f33422e0fa37911f7695a2a39c5b#diff-1d478ba44ccc56e3b1142bd3723bf97f3e254c25dd18323481aedadce0803e91R165-R170. Maybe I am missing something from what you said.

rakita mentioned this pull request on Nov 22, 2023
thedevbirb (Contributor Author)

Hey guys, I resolved some comments to clean this up: is there something left to do? More perf tests? Right now perf isn't ideal if you always remain in the same context. Maybe I can try to use the pointer to the top of the stack as @rakita suggested.

rakita (Member) commented Nov 30, 2023

Hey guys, I resolved some comments to clean this up: is there something left to do? More perf tests? Right now perf isn't ideal if you always remain in the same context. Maybe I can try to use the pointer to the top of the stack as @rakita suggested.

Hey, I'm mostly focusing on delivering the EvmBuilder and the refactor around it, so I will look at this after that.

thedevbirb (Contributor Author) commented Jan 14, 2024

Hey @rakita, I've seen that the EVM Context-Builder PR has been merged. Great work!

I took a look at the changes and was wondering whether it makes sense to revisit this from scratch, if you feel a shared stack would be beneficial for this EVM.

The pages model can be revisited (imo, it seems that a more complex strategy that allocates less is worse than a simpler one with more allocations), and the SharedContext doesn't seem to play too nicely with the current setup either. An example is manually passing the SharedContext taken from the interpreter to sub_create and down into other functions such as insert_create_output, which is currently used by the inspector logic too.

rakita (Member) commented Jan 15, 2024

Hey @rakita, I've seen that the EVM Context-Builder PR has been merged. Great work!

I took a look at the changes and was wondering whether it makes sense to revisit this from scratch, if you feel a shared stack would be beneficial for this EVM.

The pages model can be revisited (imo, it seems that a more complex strategy that allocates less is worse than a simpler one with more allocations), and the SharedContext doesn't seem to play too nicely with the current setup either. An example is manually passing the SharedContext taken from the interpreter to sub_create and down into other functions such as insert_create_output, which is currently used by the inspector logic too.

Thanks @lorenzofero!

My view on the shared context is that we want it in some way, and I am open to new ideas, but just to say this is very low priority, as there is not much impact from doing it.

thedevbirb (Contributor Author)

Closing this as I think it should be revisited from scratch. I might do that in the future, but I'd prefer to focus on other contributions to revm, time permitting :)

thedevbirb closed this on Feb 3, 2024