This repository was archived by the owner on Feb 14, 2025. It is now read-only.
Disconnect3d edited this page Aug 14, 2018 · 5 revisions

EVM

EVM stands for the Ethereum Virtual Machine. In this document, the term EVM refers to the EVM bytecode or assembly instead of the virtual machine that executes it.

High level overview

The Ethereum Virtual Machine is a stack-based virtual machine with a native 256-bit word size. A compiled Ethereum contract is composed of EVM bytecode. That bytecode can be decomposed into a sequence of EVM instructions. Each instruction has an opcode and may have operands, which are implicit and taken from the stack. The push instructions are the exception: the value they push is embedded in the bytecode right after the opcode.
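
As a rough illustration of decomposing bytecode into instructions, here is a minimal Python sketch of a disassembler. It is not Manticore's or solc's implementation: the function name disassemble is hypothetical, and the opcode table is deliberately partial, covering only opcodes that appear in this document.

```python
# Partial opcode table: only a handful of opcodes used in this document.
OPCODES = {
    0x00: "STOP",
    0x04: "DIV",
    0x14: "EQ",
    0x16: "AND",
    0x35: "CALLDATALOAD",
    0x52: "MSTORE",
    0x57: "JUMPI",
    0x5B: "JUMPDEST",
}


def disassemble(bytecode: bytes) -> list:
    """Decode bytecode into a list of mnemonic strings (hypothetical helper)."""
    out = []
    pc = 0
    while pc < len(bytecode):
        op = bytecode[pc]
        pc += 1
        if 0x60 <= op <= 0x7F:
            # PUSH1..PUSH32 are followed by 1..32 immediate bytes.
            size = op - 0x5F
            value = bytecode[pc:pc + size]
            pc += size
            out.append(f"PUSH{size} 0x{value.hex()}")
        elif 0x80 <= op <= 0x8F:
            out.append(f"DUP{op - 0x7F}")
        elif 0x90 <= op <= 0x9F:
            out.append(f"SWAP{op - 0x8F}")
        else:
            out.append(OPCODES.get(op, f"UNKNOWN_0x{op:02x}"))
    return out


# Five bytes of bytecode: two pushes followed by MSTORE.
listing = disassemble(bytes.fromhex("6060604052"))
print(listing)
```

The five input bytes decode to three instructions, because each PUSH1 opcode consumes the byte that follows it as its immediate value.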

Ethereum Virtual Machine stack

"The stack" is an instance of a last-in, first-out data structure that EVM instructions can manipulate. In addition to push and pop operations (where items are added to or discarded from the top of the stack, respectively), it also supports re-ordering operations and duplication operations, where an item from the middle of the stack can be duplicated on top of the stack.
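
The stack operations described above can be modeled in a few lines of Python. This is only a toy sketch: the class and method names are made up for illustration, and it ignores details of the real machine such as the maximum stack depth.

```python
class Stack:
    """Toy model of the EVM stack (hypothetical class, for illustration only)."""

    def __init__(self):
        self._items = []  # the last element of the list is the top of the stack

    def push(self, value):
        # Every stack item is a 256-bit word, so wrap larger values.
        self._items.append(value % 2**256)

    def pop(self):
        return self._items.pop()

    def dup(self, n):
        """DUPn: copy the n-th item from the top (1 = top) onto the top."""
        self._items.append(self._items[-n])

    def swap(self, n):
        """SWAPn: exchange the top item with the item n positions below it."""
        self._items[-1], self._items[-1 - n] = self._items[-1 - n], self._items[-1]

    def top_first(self):
        """Return the stack contents with the top item first."""
        return list(reversed(self._items))


stack = Stack()
for value in (0xA, 0xB, 0xC):
    stack.push(value)

stack.dup(3)                    # duplicate the item in the "middle" (here the bottom)
after_dup = stack.top_first()   # [0xA, 0xC, 0xB, 0xA]

stack.swap(1)                   # exchange the top two items
after_swap = stack.top_first()  # [0xC, 0xA, 0xB, 0xA]
```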

EVM assembly

You can ask solc to print a human-readable representation of the EVM bytecode it generates for a given contract. Manticore will also produce a human-readable (and in many ways more precise) assembly file, with a name ending in _asm, in the mcore_* folder.

solc --asm <contract_name.sol>

Here is a sample of EVM assembly produced by solc with line numbers added manually.

0:   mstore(0x40, 0x60)
1:   calldataload(0x0)
2:   0x100000000000000000000000000000000000000000000000000000000
3:   swap1
4:   div
5:   0xffffffff
6:   and
7:   dup1
8:   0x1003e2d2
9:   eq
10:  tag_2
11:  jumpi
12:  dup1
13:  0x3e127e76
14:  eq
15:  tag_3
16:  jumpi

The Solidity compiler is trying to do us a favor by representing the first two lines as function calls. It is not the most faithful representation of the bytecode, but it makes the listing more readable by showing as arguments what would ordinarily be separate push instructions. Lines that contain a number and nothing else are implicit push instructions that place a new value onto the stack. Some instructions consume items from the stack, some rearrange or duplicate items on the stack, and some, like eq in the above snippet, consume items and push a result back onto the stack.

You may have already pieced together what is happening in the above snippet but here is a summary:

  • The calldataload instruction at line 1 loads the first 256 bits (32 bytes or 1 word) of transaction data starting at offset 0x0 and places it on the stack
  • The value 0x100000000000000000000000000000000000000000000000000000000 is placed onto the top of the stack.
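  • The swap1 instruction swaps the top of the stack with the element directly below it, so the call data word is on top again and the constant sits beneath it.
  • The div instruction pops the top two elements of the stack, performs integer division of the first (the call data word) by the second (the constant), and pushes the result onto the stack. The two operands are destroyed. The division shifts the word to the right so that the first four bytes of call data end up in its lowest four bytes.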
  • The value 0xffffffff is pushed on the top of the stack
  • The and opcode pops the top two values on the stack and computes a bitwise and on them. The result is pushed back on the stack.
  • dup1 duplicates the value presently on the top of stack
  • 0x1003e2d2 is pushed onto the stack
  • the eq instruction pops the top two elements of the stack and compares them for equality. It pushes 1 onto the stack if they are equal and 0 otherwise. The EVM has no flags register; the result of the comparison lives on the stack like any other value.
  • tag_2 is a named constant. This constant value, representing an offset into the EVM bytecode, is pushed onto the top of the stack.
  • jumpi is a conditional branch instruction. This is the core of an if expression in high-level programming languages like Solidity. It pops two values off the stack: the destination offset and the result of the comparison. If the result is non-zero, indicating a positive test for equality, execution of the bytecode stream is transferred to the destination offset; otherwise it continues at the next line.
  • If the condition for the previous jump is not satisfied, the program executes a similar sequence of instructions checking for a new value and instead of branching to tag_2 will branch to tag_3.

This setup is similar to an if, elif, else construct in Python, where the test being performed is on the first four bytes of call data.
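
The walkthrough can be traced with a toy interpreter in Python. This is a sketch under stated assumptions, not how Manticore or a real EVM executes bytecode: run_dispatch is a hypothetical name, only the instructions from lines 1 to 16 of the listing are modeled, tags are represented as plain strings rather than bytecode offsets, and the constant on line 2 is taken to be 2**224.

```python
def run_dispatch(calldata: bytes):
    """Run lines 1-16 of the listing; return the tag jumped to, or None."""
    stack = []
    program = [
        ("calldataload", 0x0),
        ("push", 1 << 224),          # assumed value of the constant on line 2
        ("swap1",),
        ("div",),
        ("push", 0xFFFFFFFF),
        ("and",),
        ("dup1",),
        ("push", 0x1003E2D2),
        ("eq",),
        ("push", "tag_2"),
        ("jumpi",),
        ("dup1",),
        ("push", 0x3E127E76),
        ("eq",),
        ("push", "tag_3"),
        ("jumpi",),
    ]
    for op, *args in program:
        if op == "push":
            stack.append(args[0])
        elif op == "calldataload":
            # Read 32 bytes at the given offset, zero-padded past the end.
            offset = args[0]
            word = calldata[offset:offset + 32].ljust(32, b"\x00")
            stack.append(int.from_bytes(word, "big"))
        elif op == "swap1":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "dup1":
            stack.append(stack[-1])
        elif op == "div":
            a, b = stack.pop(), stack.pop()
            stack.append(a // b if b else 0)  # division by zero yields 0
        elif op == "and":
            a, b = stack.pop(), stack.pop()
            stack.append(a & b)
        elif op == "eq":
            a, b = stack.pop(), stack.pop()
            stack.append(1 if a == b else 0)  # result goes on the stack
        elif op == "jumpi":
            destination, condition = stack.pop(), stack.pop()
            if condition != 0:
                return destination
    return None


# Call data whose first four bytes match the second comparison.
argument = (5).to_bytes(32, "big")
print(run_dispatch(bytes.fromhex("3e127e76") + argument))
```

Call data starting with 1003e2d2 takes the first branch, call data starting with 3e127e76 takes the second, and anything else falls through both comparisons.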

Short snippets of EVM assembly are not difficult to read. Most of the challenge of reading EVM comes from having to keep a model of the stack in your head. Large contracts can be frustrating: without specialized tools you can't start reading in the middle of a contract, because there is no shortcut to knowing which stack elements are modified by which instructions.

Instruction Reference

The definitive EVM reference is the Yellow Paper. A slightly easier to navigate resource is the Jello Paper which specifies the semantics of the EVM opcodes in a language called K.

Transactions

Call Data

Call data is the data passed to an Ethereum contract during a transaction. This data is assumed to be encoded according to the Ethereum ABI. According to the ABI, the first four bytes of the call data must be the function identifier. The EVM assembly snippet above is the code that reads this identifier and branches on its value.
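
To make the layout concrete, here is a small Python sketch that splits call data into the four-byte function identifier and the words that follow it. The helper split_call_data is hypothetical, and it assumes every argument is a static 32-byte word; dynamically sized ABI types are encoded differently and are not handled here.

```python
def split_call_data(call_data: bytes):
    """Return (function identifier, list of 32-byte argument words)."""
    if len(call_data) < 4:
        raise ValueError("call data is too short to hold a function identifier")
    identifier = call_data[:4]
    body = call_data[4:]
    # Assumption: arguments are static types, one 32-byte word each.
    words = [body[i:i + 32] for i in range(0, len(body), 32)]
    return identifier, words


# Hypothetical call: the identifier 0x1003e2d2 from the assembly listing,
# followed by a single 256-bit argument with the value 7.
call_data = bytes.fromhex("1003e2d2") + (7).to_bytes(32, "big")
identifier, words = split_call_data(call_data)
print(identifier.hex(), [int.from_bytes(word, "big") for word in words])
```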
