IL Virtual Machine #3888

Scooletz · 2022-03-18T09:15:01Z

Implements #4672

This PR proposes another IVirtualMachine implementation, based on transpiling EVM bytecode to IL. IL, or MSIL, is the intermediate language that .NET languages compile to (.NET being the runtime that Nethermind uses); the runtime later JITs it to machine code. Instead of running a loop that executes instructions one by one on a pc basis, this implementation emits the whole contract as a single method.
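
To illustrate the difference from an interpreter loop, here is a minimal sketch (not the code from this PR) that uses System.Reflection.Emit to emit a countdown loop as a single DynamicMethod. The CountdownEmitter name and the use of a long counter instead of a 256-bit word are assumptions made for brevity; the IL label plays the role of JUMPDEST and the conditional branch plays the role of JUMPI.

```csharp
using System;
using System.Reflection.Emit;

// Hypothetical helper, for illustration only.
public static class CountdownEmitter
{
    // Emits the IL equivalent of:
    //   long spins = 0;
    //   do { counter--; spins++; } while (counter != 0);
    //   return spins;
    public static Func<long, long> Build()
    {
        var method = new DynamicMethod("Countdown", typeof(long), new[] { typeof(long) });
        ILGenerator il = method.GetILGenerator();

        LocalBuilder spins = il.DeclareLocal(typeof(long));
        Label loopStart = il.DefineLabel();

        // spins = 0
        il.Emit(OpCodes.Ldc_I8, 0L);
        il.Emit(OpCodes.Stloc, spins);

        // JUMPDEST equivalent: an IL label that can be branched to
        il.MarkLabel(loopStart);

        // counter = counter - 1 (PUSH1 1, SWAP1, SUB in the EVM version)
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldc_I8, 1L);
        il.Emit(OpCodes.Sub);
        il.Emit(OpCodes.Starg_S, (byte)0);

        // spins = spins + 1
        il.Emit(OpCodes.Ldloc, spins);
        il.Emit(OpCodes.Ldc_I8, 1L);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Stloc, spins);

        // JUMPI equivalent: branch back while counter != 0
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldc_I8, 0L);
        il.Emit(OpCodes.Bne_Un, loopStart);

        il.Emit(OpCodes.Ldloc, spins);
        il.Emit(OpCodes.Ret);

        return (Func<long, long>)method.CreateDelegate(typeof(Func<long, long>));
    }
}
```

Calling CountdownEmitter.Build()(5) runs the emitted loop five times and returns 5. There is no per-instruction dispatch left in the emitted method, which is the effect this PR is after.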

Tentative Plan of Future Actions

The plan of action, which is frequently updated and reorganized:

- initial implementation of a few opcodes
- gas calculation of the existing ones
- ClrMD and ASM print for the method
- stack checks, with a potential change from Word* to offset-based (start + int as the current)
- endianness of the executor

Currently supported

opcodes:
- POP
- PC
- PUSH1, PUSH2, PUSH4
- DUP1
- SWAP1
- SUB
- JUMPDEST
- JUMP - this is a full-blown two-layer jump table: the first layer is a switch with a fanout of 128, the second layer is a series of simple ifs
- JUMPI - uses the same jump table as above, plus a branch-free pop of the condition from the stack
gas management: calculating gas and returning OutOfGas when it runs out
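
As a rough model of the two-layer jump table, the sketch below groups valid JUMPDEST offsets into 128 buckets by their low seven bits and then checks candidates in the bucket one by one. The PR emits this as IL (a switch followed by ifs); the JumpTableSketch class, its method names and the bucket-of-lists representation are assumptions used only to show the shape of the lookup.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the lookup, not the emitted IL from the PR.
public static class JumpTableSketch
{
    private const int Fanout = 128; // first layer: destination & 0x7F

    // Scans the bytecode for JUMPDEST (0x5B), skipping PUSH1..PUSH32 immediates,
    // and groups the valid destinations into Fanout buckets.
    public static List<int>[] Build(byte[] code)
    {
        var buckets = new List<int>[Fanout];
        for (int pc = 0; pc < code.Length; pc++)
        {
            byte op = code[pc];
            if (op == 0x5B)
            {
                int slot = pc & (Fanout - 1);
                if (buckets[slot] == null) buckets[slot] = new List<int>();
                buckets[slot].Add(pc);
            }
            else if (op >= 0x60 && op <= 0x7F)
            {
                pc += op - 0x5F; // skip the pushed bytes, they are data, not code
            }
        }
        return buckets;
    }

    public static bool IsValidJump(List<int>[] buckets, int destination)
    {
        if (destination < 0) return false;
        List<int> bucket = buckets[destination & (Fanout - 1)]; // first layer
        if (bucket == null) return false;
        foreach (int candidate in bucket)                       // second layer
        {
            if (candidate == destination) return true;
        }
        return false;
    }
}
```

For a contract with a single JUMPDEST at offset 5, only bucket 5 is populated and every other destination is rejected after the first-layer lookup.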

Ahead of ILVM

- more gas cost calculation - computed upfront for any sequence of instructions between flow control statements and instructions of variable cost (for example SHA3)
- no stack head checks - similar to the above: every operation is annotated with its push and pop behavior, so that it can be calculated upfront whether the stack will be breached. The majority of the checks can be optimized away or moved to the beginning of the jump
- jumps - an IL label for each JUMPDEST and a global jump table at the end of the function. If the jump destination is known statically (PUSHN followed by JUMP), the jump can go directly to the label without any check
- endianness - compiling a method directly for the specific endianness
- tracing - compiling methods in various flavors, for example with no tracing at all, and selecting the right one for the specific set of tracer flags
- handling all the CALLs and discussing how to interop with other contracts
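
The upfront gas calculation can be sketched as follows: split the bytecode into blocks that start at a JUMPDEST and end at a JUMP or JUMPI, and sum the static costs per block so that a single check covers the whole block. The GasBlocks class is hypothetical, it covers only the opcodes listed as supported above, and the 15-byte loop used in the usage note is my reading of the benchmark contract (PUSH4 followed by the loop body), so treat it as an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of per-block static gas summation.
public static class GasBlocks
{
    // Static gas costs from the EVM gas schedule for the supported opcodes.
    private static int Cost(byte op) => op switch
    {
        0x03 => 3,                // SUB
        0x50 => 2,                // POP
        0x56 => 8,                // JUMP
        0x57 => 10,               // JUMPI
        0x58 => 2,                // PC
        0x5B => 1,                // JUMPDEST
        0x80 => 3,                // DUP1
        0x90 => 3,                // SWAP1
        >= 0x60 and <= 0x7F => 3, // PUSH1..PUSH32
        _ => throw new NotSupportedException($"opcode 0x{op:X2}")
    };

    public static List<int> BlockCosts(byte[] code)
    {
        var costs = new List<int>();
        int current = 0;
        for (int pc = 0; pc < code.Length; pc++)
        {
            byte op = code[pc];

            // A JUMPDEST can be entered from elsewhere, so it starts a new block.
            if (op == 0x5B && current > 0) { costs.Add(current); current = 0; }

            current += Cost(op);

            // Skip the immediate bytes of PUSH1..PUSH32.
            if (op >= 0x60 && op <= 0x7F) pc += op - 0x5F;

            // JUMP and JUMPI end a block.
            if (op == 0x56 || op == 0x57) { costs.Add(current); current = 0; }
        }
        if (current > 0) costs.Add(current);
        return costs;
    }
}
```

For the bytes 63 05F5E100 5B 6001 90 03 80 6005 57 50 this yields the block costs 3, 26 and 2. Those values line up with the constants 3, 1Ah and 2 that the gas register is compared against in the ASM output further down, which suggests the emitted method already batches gas checks this way for the supported opcodes.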

Potential

Potential usages:

- precompile hot contracts such as StarkNet, Uniswap, Sushi when Nethermind is built, so that the client ships with a much faster VM for them
- provide tiered execution, where hot contracts are IL-emitted in the client once usage statistics show that it is worth doing
- be the fastest EVM implementation (non-business case, pure ego-driven)

Benchmarks

The following code was used for a terribly simple benchmark. It represents a simple loop that performs multiple spins:

byte[] code = Prepare.EvmCode
    .PushData(repeat)
    .Op(Instruction.JUMPDEST)
    .PushData(1)
    .Op(Instruction.SWAP1)
    .Op(Instruction.SUB)
    .Op(Instruction.DUP1)
    .PushData(1 + repeat.Length) // jump address
    .Op(Instruction.JUMPI)
    .Op(Instruction.POP)
    .Done;

The comparison, with a run long enough to amortize all the constant costs, is as follows:

VM	probe size	time per one million spins (lower is better)
existing	10_000_000	5.228 s
IL VM	1_000_000_000	0.048 s

🔥 This means that in this terrible benchmark ILVM is 100x faster than the existing one!

Benchmarks ASM output

To the best of my knowledge of the JIT, ASM and addressing, the JITted bytecode results in the following code:

0000: push rbp
0001: push rdi
0002: push rsi
0003: sub rsp,90h
000a: vzeroupper
000d: lea rbp,[rsp+20h]
0012: mov rax,59E479AB6165h
001c: mov [rbp+8],rax
0020: add rsp,20h
0024: mov eax,8040h
0029: neg rax
002c: add rax,rsp
002f: jb short 00007FFCD40918F3h
0031: xor eax,eax
0033: test [rsp],esp
0036: mov rdx,rsp
0039: sub rdx,1000h
0040: mov rsp,rdx
0043: cmp rsp,rax
0046: jae short 00007FFCD40918F3h
0048: mov rsp,rax
004b: test [rsp],esp
004e: sub rsp,20h
0052: lea rdx,[rsp+20h]
0057: add rdx,20h
005b: and rdx,0FFFFFFFFFFFFFFE0h
005f: mov rsi,rcx
0062: cmp rsi,3
0066: jl 00007FFCD4091A93h
006c: add rsi,0FFFFFFFFFFFFFFFDh
0070: vxorps xmm0,xmm0,xmm0
0074: vmovdqu [rdx],xmm0
0078: vmovdqu [rdx+10h],xmm0
007d: mov dword ptr [rdx+1Ch],0E1F505h
0084: add rdx,20h
0088: cmp rsi,1Ah
008c: jl 00007FFCD4091A93h
0092: add rsi,0FFFFFFFFFFFFFFE6h
0096: vxorps xmm0,xmm0,xmm0
009a: vmovdqu [rdx],xmm0
009e: vmovdqu [rdx+10h],xmm0
00a3: mov byte ptr [rdx+1Fh],1
00a7: add rdx,20h
00ab: lea rcx,[rdx-20h]
00af: vmovdqu xmm0,[rcx]
00b3: vmovdqu [rbp+10h],xmm0
00b8: vmovdqu xmm0,[rcx+10h]
00bd: vmovdqu [rbp+20h],xmm0
00c2: lea rdi,[rdx-40h]
00c6: vmovdqu xmm0,[rdi]
00ca: vmovdqu [rcx],xmm0
00ce: vmovdqu xmm0,[rdi+10h]
00d3: vmovdqu [rcx+10h],xmm0
00d8: vmovdqu xmm0,[rbp+10h]
00dd: vmovdqu [rdi],xmm0
00e1: vmovdqu xmm0,[rbp+20h]
00e6: vmovdqu [rdi+10h],xmm0
00eb: lea rdx,[rbp+50h]
00ef: call 00007FFCD46D35B8h
00f4: mov rcx,rdi
00f7: lea rdx,[rbp+30h]
00fb: call 00007FFCD46D35B8h
0100: lea rcx,[rbp+50h]
0104: lea rdx,[rbp+30h]
0108: lea r8,[rbp+10h]
010c: call 00007FFCD42A9DC0h
0111: mov rdx,rdi
0114: mov rax,[rbp+10h]
0118: mov rcx,[rbp+18h]
011c: mov r8,[rbp+20h]
0120: mov r9,[rbp+28h]
0124: bswap r9
0127: mov [rdx],r9
012a: bswap r8
012d: mov [rdx+8],r8
0131: bswap rcx
0134: mov [rdx+10h],rcx
0138: bswap rax
013b: mov [rdx+18h],rax
013f: add rdx,20h
0143: vmovdqu xmm0,[rdx-20h]
0148: vmovdqu [rdx],xmm0
014c: vmovdqu xmm0,[rdx-10h]
0151: vmovdqu [rdx+10h],xmm0
0156: add rdx,20h
015a: vxorps xmm0,xmm0,xmm0
015e: vmovdqu [rdx],xmm0
0162: vmovdqu [rdx+10h],xmm0
0167: mov byte ptr [rdx+1Fh],5
016b: add rdx,20h
016f: lea rax,[rdx-40h]
0173: mov rcx,[rax+18h]
0177: or rcx,[rax+10h]
017b: or rcx,[rax+8]
017f: or rcx,[rax]
0182: je short 00007FFCD4091A89h
0184: add rdx,0FFFFFFFFFFFFFFE0h
0188: mov rax,[rdx]
018b: or rax,[rdx+8]
018f: or rax,[rdx+10h]
0193: mov ecx,[rdx+18h]
0196: or rax,rcx
0199: jne short 00007FFCD4091A9Ah
019b: mov eax,[rdx+1Ch]
019e: mov ecx,eax
01a0: bswap ecx
01a2: sub rdx,20h
01a6: mov r8d,ecx
01a9: and r8d,7Fh
01ad: cmp r8d,6
01b1: ja short 00007FFCD4091A9Ah
01b3: mov eax,5Fh
01b8: bt eax,r8d
01bc: jb short 00007FFCD4091A9Ah
01be: cmp ecx,5
01c1: je 00007FFCD4091956h
01c7: jmp short 00007FFCD4091A9Ah
01c9: cmp rsi,2
01cd: jl short 00007FFCD4091A93h
01cf: xor eax,eax
01d1: jmp short 00007FFCD4091A9Fh
01d3: mov eax,4
01d8: jmp short 00007FFCD4091A9Fh
01da: mov eax,8
01df: mov rcx,59E479AB6165h
01e9: cmp [rbp+8],rcx
01ed: je short 00007FFCD4091AB4h
01ef: call 00007FFD336F0280h
01f4: nop
01f5: lea rsp,[rbp+70h]
01f9: pop rsi
01fa: pop rdi
01fb: pop rbp
01fc: ret
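
One detail worth reading out of the listing is the run of four bswap instructions at 0124-013b, which looks like the result of SUB being written back to the stack as a big-endian 32-byte word. A hedged C# equivalent is sketched below; the WordEndianness class and the limb names u0..u3 (least to most significant) are assumptions for illustration and are not taken from the PR.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper mirroring what the four bswap + mov pairs appear to do.
public static class WordEndianness
{
    // u0 is the least significant 64-bit limb, u3 the most significant.
    // The most significant limb goes first, each limb byte-swapped to big-endian.
    public static void WriteBigEndian(Span<byte> word, ulong u0, ulong u1, ulong u2, ulong u3)
    {
        BinaryPrimitives.WriteUInt64BigEndian(word.Slice(0, 8), u3);
        BinaryPrimitives.WriteUInt64BigEndian(word.Slice(8, 8), u2);
        BinaryPrimitives.WriteUInt64BigEndian(word.Slice(16, 8), u1);
        BinaryPrimitives.WriteUInt64BigEndian(word.Slice(24, 8), u0);
    }
}
```

On a little-endian machine each of these writes needs a byte swap, which is the cost that compiling a method for a specific endianness, as planned above, would try to avoid.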

tkstanczak · 2022-03-24T17:50:24Z

⚡

Ruteri · 2022-03-24T18:15:20Z

Great benchmark results!
I'd consider checking if the JIT is not optimising away the benchmarked logic. It'd be really good to see the IL generated for the benchmarked contract.

Scooletz · 2022-03-25T09:19:19Z

> Great benchmark results! I'd consider checking if the JIT is not optimising away the benchmarked logic,

It should not, as there's a jump, so I assume it's safe.

> it'd be really good to see the IL generated for the benchmarked contract

Definitely! I was thinking about the same: extracting the ASM and printing it in the output. This will probably require using https://github.com/microsoft/clrmd, which requires the author to load it into their head again 😅 I'll provide this print soon.

Scooletz · 2022-03-27T12:11:52Z

@Ruteri Please take a look at the description. I added a section at the bottom that shows the ASM of the bytecode used in the benchmark. To me it looks more or less valid, as I see:

- vmovdqu for Word operations (a vectorized copy of the word)
- calls that are probably Uint256 getters, but I did not check it
- add rdx, 20h to bump the stack pointer up by 32 bytes

It was not easy, as I self-attach and put Iced on top of it, but it should be more or less valid. Let me know what you see in there. I could push further and even map bytecode -> IL -> addresses, but that exercise probably would not bring a lot, as about 95% of the opcodes are still missing in this VM.

Scooletz · 2022-04-02T16:56:31Z

History rewritten, to allow interacting with VM.

Scooletz · 2022-04-04T08:31:00Z

An update before taking a break from this PR. After amending the way the tests are run, the gains are much less bold than claimed before. ILVM is now integrated as a call within the current implementation of VirtualMachine, which ensures that the same checks are included in both cases. The scenario of executing 200000 spins in a loop now looks as follows:

regular VM execution took 00:00:01.3301542, i.e. 6.65 ms per 1000 spins
IL VM execution took 00:00:00.0881434, i.e. 0.44 ms per 1000 spins

The multiplier then fell from the initial 100x to 15x, but now it's embedded in the existing VM, as it would be if this were fully implemented.
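
The per-1000-spins figures and the 15x multiplier follow directly from the two elapsed times. A small sketch of that arithmetic, with a hypothetical SpinTimings helper:

```csharp
using System;

// Hypothetical helper that reproduces the arithmetic behind the numbers above.
public static class SpinTimings
{
    public static double MsPerThousandSpins(TimeSpan elapsed, long spins) =>
        elapsed.TotalMilliseconds / (spins / 1000.0);

    public static double Speedup(TimeSpan baseline, TimeSpan candidate) =>
        baseline.TotalMilliseconds / candidate.TotalMilliseconds;
}
```

With TimeSpan.FromTicks(13301542) for the regular VM and TimeSpan.FromTicks(881434) for the IL VM (one tick is 100 ns) and 200000 spins, this gives about 6.65 ms and 0.44 ms per 1000 spins and a speedup of roughly 15.1.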

benaadams · 2022-07-24T02:33:10Z

👀

zsluedem · 2022-11-15

Hello. My name is WillQ, and I am one of the participants in the Ethereum protocol fellow program.
I chose the IL-VM project to work on, and @LukaszRozmej pointed me to this PR. FYI, I am a noob in C# and .NET. I am trying to understand what this PR does and to see what I can do to help.

I have some questions about the PR, especially about the IL-VM part. I hope I can get some help here.

I also wrote a benchmark for this PR in my own branch: https://github.com/zsluedem/nethermind/blob/il-vm/src/Nethermind/Nethermind.Evm.ILBenchmark/Program.cs
Here are my benchmark results.

Method	Bytecode	Mean	Error	StdDev
ILEvm	5850	196.2 us	3.74 us	3.49 us
Evm	5850	192.2 us	1.57 us	1.39 us
ILEvm	6000600157	194.0 us	1.85 us	1.73 us
Evm	6000600157	194.1 us	2.05 us	1.81 us
ILEvm	600156	178.3 us	1.68 us	1.49 us
Evm	600156	215.5 us	1.59 us	1.33 us
ILEvm	6001600157	176.2 us	2.69 us	2.39 us
Evm	6001600157	218.8 us	2.73 us	2.55 us
ILEvm	60016(...)30303 [22]	193.0 us	1.81 us	1.51 us
Evm	60016(...)30303 [22]	199.6 us	0.74 us	0.66 us
ILEvm	60016005575B	190.9 us	2.33 us	2.06 us
Evm	60016005575B	195.3 us	2.36 us	1.97 us
ILEvm	6001800350	193.6 us	2.20 us	1.84 us
Evm	6001800350	196.6 us	1.93 us	1.71 us
ILEvm	600260019003	190.8 us	1.95 us	1.73 us
Evm	600260019003	196.7 us	1.16 us	1.03 us
ILEvm	6003565B	191.4 us	2.53 us	2.24 us
Evm	6003565B	196.6 us	1.50 us	1.33 us
ILEvm	63000(...)55750 [30]	6,532.3 us	39.32 us	34.85 us
Evm	63000(...)55750 [30]	11,069.8 us	112.30 us	99.56 us

My results are a little different from those of your test case, which was not warmed up. The 63000(...)55750 [30] case is the same as your loop test case. I hope this data helps.

zsluedem · 2022-11-15T04:06:07Z

src/Nethermind/Nethermind.Evm.Test/IlVirtualMachineTests.cs

+        if (isIL)
+        {
+            // differentiate by adding one point
+            code = code.Op(Instruction.POP);


Why is this difference needed?

Scooletz · 2022-11-18

I don't remember.

zsluedem · 2022-11-15T08:32:01Z

src/Nethermind/Nethermind.Evm/IL/ILVirtualMachineBuilder.cs

+        const int wordToAlignTo = 32;
+
+        il.Emit(OpCodes.Ldc_I4, EvmStack.MaxStackSize * Word.Size + wordToAlignTo);
+        il.Emit(OpCodes.Localloc);


The docs for Localloc say:

> Allocates a certain number of bytes from the local dynamic memory pool and pushes the address (a transient pointer, type *) of the first allocated byte onto the evaluation stack.

Does this Localloc opcode load the locals defined above, such as uint256A, into the memory pool?
And what is "the first allocated byte" from the docs? Is it current?

Scooletz · 2022-11-18

Localloc is used here to allocate the whole EVM stack on the actual stack.
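
The extra wordToAlignTo bytes in the allocation size are presumably there so that the start of the EVM stack can be moved forward to a 32-byte boundary, which matches the add rdx,20h / and rdx,0FFFFFFFFFFFFFFE0h pair in the ASM output. That reading is my assumption, and the StackAlignment helper below is hypothetical; it only shows the address arithmetic.

```csharp
using System;

// Hypothetical helper showing the alignment arithmetic on a raw address value.
public static class StackAlignment
{
    public const int WordSize = 32;

    // Bumps the address by a full word and clears the low five bits.
    // The result is a multiple of 32 that lies 1 to 32 bytes past the input,
    // which is why one extra word has to be allocated.
    public static ulong AlignForward(ulong address) =>
        (address + WordSize) & ~(ulong)(WordSize - 1);
}
```

For example, both 0x1000 and 0x1001 map to 0x1020. The aligned address would then serve as the initial value of current, the pointer to the top of the EVM stack.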

zsluedem · 2022-11-15T14:05:11Z

src/Nethermind/Nethermind.Evm/IL/ILVirtualMachineBuilder.cs

+                // 4. set the field
+                // 5. advance pointer
+                case Instruction.PUSH1:
+                    il.Load(current);


I could not really understand how the stack works in MSIL. Could you expand on this?
It feels like everything operates on this current local.

Scooletz · 2022-11-18

Yes, current here is the top of the EVM stack. When PUSH1 is executed, the plan for it is fleshed out above, between lines 122-127: load the value, zero it, set 1, advance.
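
In plain C# terms, the plan for PUSH1 can be sketched like this. The WordStack class and the offset-based current are assumptions for illustration (the PR works with a pointer and emits IL), but the steps are the same: take the word at current, zero it, set the pushed byte, advance by 32 bytes.

```csharp
using System;

// Hypothetical model of PUSH1 on a stack of 32-byte big-endian words.
public static class WordStack
{
    public const int WordSize = 32;

    // Pushes a single byte as a full word and returns the new top-of-stack offset.
    public static int Push1(Span<byte> stack, int current, byte value)
    {
        Span<byte> word = stack.Slice(current, WordSize); // the word at the top
        word.Clear();                                     // zero all 32 bytes
        word[WordSize - 1] = value;                       // big-endian: value in the last byte
        return current + WordSize;                        // advance the top
    }
}
```

This corresponds to the vxorps, the two vmovdqu stores, mov byte ptr [rdx+1Fh],1 and add rdx,20h in the ASM output.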

Scooletz · 2022-11-18T17:45:00Z

> Hello. My name is WillQ. And I am one of the participant in ethereum protocol fellow program. I chose the IL-VM project to work on and @LukaszRozmej guided me to this pr. FYI, I am a noob in C# and dotnet. I am trying to comprehend what this pr is doing and see what I can do for.

Hey, nice to meet you 😃 This PR requires a fairly deep understanding of IL, the .NET runtime and C#. Not sure if this is the best way to start with .NET 😅

One remark that I need to start with: I flushed this out 6 months ago and have not revisited it since. My context for it is high-level atm, and recalling the specifics might take me more time. Also, I currently cannot support implementing it fully or dive deep into the specifics. Still, I will do my best to provide you with some answers.

> I got some questions on the pr especially for IL-VM part.I hope I can get some help here.
>
> And I also wrote a benchmark for this pr in my own branch https://github.com/zsluedem/nethermind/blob/il-vm/src/Nethermind/Nethermind.Evm.ILBenchmark/Program.cs . Here is my benchmark result.
> I got a little bit different result compared to your testcase which haven't been warmed up. The 63000(...)55750 [30] case is the same as your loop testcase. I hope this data could help.

In regards to the benchmarks, I can see that you call the following one

nethermind/src/Nethermind/Nethermind.Evm/VirtualMachine.cs

Line 408 in 1fb723d

public void BuildILForNext() => _buildILForNext = true;

in GlobalSetup, but I don't remember the semantics of code execution in the EVM. From this PR's point of view, BuildILForNext was added just for the initial performance check; it makes the VM IL-emit the next contract that goes into it. Yes, there's memoization underneath, but maybe it's broken?

nethermind/src/Nethermind/Nethermind.Evm/VirtualMachine.cs

Line 449 in 1fb723d

_codeCache.Set(codeHash, cachedCodeInfo);

The initial tests focused on longer executions, on the assumption that the rest of the infrastructure would just work. It was a mad idea, and I kept pushing it further to see whether it breaks. Maybe you found a breaking point, but before comparing numbers, I'd check the rest. For short runs, the numbers should be comparable, I believe.

This is the best that I can share atm, @zsluedem. I hope it helps a bit.

Scooletz added the difficult, evm and wip (Work in Progress) labels Mar 18, 2022

Scooletz force-pushed the il-vm branch from 62010c9 to 36e199e Compare March 23, 2022 18:08

IL based vm

1fb723d

Scooletz force-pushed the il-vm branch from eb7ec04 to 1fb723d Compare April 2, 2022 16:56
