Skip to content
This repository has been archived by the owner on Jul 5, 2024. It is now read-only.

Executed bytecode source for opcode lookup #73

Closed
han0110 opened this issue Dec 4, 2021 · 6 comments
Closed

Executed bytecode source for opcode lookup #73

han0110 opened this issue Dec 4, 2021 · 6 comments
Labels
help wanted Extra attention is needed

Comments

@han0110
Copy link
Collaborator

han0110 commented Dec 4, 2021

In EVM there are 3 different sources for executed bytecode:

  1. When contract interaction, it's from contract's bytecode
  2. When contract creation in root call, it's from transaction's calldata
  3. When contract creation in internal call, it's from caller's memory

To avoid too much complexity on EVM circuit, we try to remove 3. by explicitly copying the caller's memory to bytecode_table, since for CREATE2 we already need to hash the initcode, so we abuse this for also CREATE.

Then the sources now are either 1. or 2., but there is another thing EVM circuit desires for:

For executed bytecode, EVM circuit expect executed bytecode to be analyzed and annotated if it's is_code on every byte, for verifying JUMP* in a single-step way, otherwise every time when it comes to JUMP*, it needs to look back to check if it's is_code or not, and this could be multi-steps.

So there are at least 2 options here to support creation transaction:

  1. EVM circuit lookup sources 1. and 2. by condition, and implement is_code annotation on both bytecode circuit and tx circuit.

    This seems reasonable but tx circuit has not been explored too much yet. Although it brings some implementation redundancy, the is_code annotation should not be too costly to do.

  2. EVM circuit lookup sources 1. always, and explicitly copying transaction's calldata to bytecode_table when it's a creation one.

    This seems to bring more loading to bytecode_table at first glance, but if a DOS attacker wants to blow up the bytecode_table, it would rather use EXTCODECOPY to read different huge contracts, which only costs a constant 2600 and generates 0x6000/2600 ≈ 9.45 bytes per gas in worst case. So it seems also a reasonable approach to avoid implementation redundancy and make the executed bytecode single source for EVM circuit.

Any feedbacks are appreciated.

@roynalnaruto
Copy link
Collaborator

roynalnaruto commented May 9, 2022

I am currently addressing the review comments on #191, and as @ed255 pointed out, it is important to finalise the definition of StepState.code_source in the specs.

@han0110 My opinion on the above issue is to use the bytecode table as the single source even for contract creation, specifically, this:

EVM circuit lookup sources 1. always, and explicitly copying transaction's calldata to bytecode_table when it's a creation one.

This will also reduce implementation complexity and any chances of possibly missing the is_create cases for any *CODE* related opcodes.

From #191 's perspective, I believe I need to populate the bytecode table for the above mentioned cases and document those changes appropriately. Would be nice to know what others think about these changes and whether or not they should be made as a separate PR. Thanks!

@miha-stopar
Copy link
Contributor

Replying as an assigned reviewer for #191. I am not very familiar with the details, but 2. option seems cleaner to me too.

@ed255
Copy link
Member

ed255 commented May 9, 2022

I also prefer option 2 (single source of bytecode: the bytecode table) because it unifies the way bytecode is accessed (and I think this simplifies the design). Specially considering that we will have a copy circuit where we can offload copying memory to bytecode table and tx.call_data to bytecode table.

@han0110
Copy link
Collaborator Author

han0110 commented May 13, 2022

As discussed in this week's call, let's implement option 2 for simplicity of EVM circuit. But for the Copy circuit (#194), it then needs to handle the copy from tx calldata in tx_table to bytecode_table in the future (or we can also implement this in Tx circuit directly), and such copy will be triggered by EVM circuit in BeginTx if it has is_create == True.

The rest todos to resolve this issue:

  • EVM circuit
    • Rename code_source to code_hash
    • (maybe) Remove is_create and is_root from tracking state and lookup them when necessary
    • In creation tx, lookup Copy circuit to make sure the init code is copied into bytecode_table and get the code_hash
  • Copy circuit (or Tx circuit)
    • Implement copy from tx calldata in tx_table to bytecode_table

@ed255
Copy link
Member

ed255 commented May 12, 2023

I believe we can consider this issue resolved as we already have CREATE in the specs #355
@han0110 since you reviewed the CREATE spec PR, can you confirm if it aligns with this discussion #73 (comment) ? If so, we can then close this issue :)

@han0110
Copy link
Collaborator Author

han0110 commented May 15, 2023

For creation transaction it's not yet implemented, but the concept is similar, so yeah we can close this issue.

@han0110 han0110 closed this as completed May 15, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

5 participants