@AztecBot commented Sep 12, 2025

BEGIN_COMMIT_OVERRIDE
feat(avm!): Bytecode Hashing (#16756)
chore(avm)!: gas gt migrations optimized (#16989)
END_COMMIT_OVERRIDE

## Constraining Bytecode Hashing

### Overview

Following the convention used elsewhere in the repo, we constrain the bytecode
commitment to be the poseidon2 hash of the bytecode (encoded as fields, 31 bytes
per field), prepended by the special separator (`GENERATOR_INDEX__PUBLIC_BYTECODE`):
```
hash = poseidon2([GENERATOR_INDEX__PUBLIC_BYTECODE, FF(bytecode[0...31]), FF(bytecode[31...62]), ... ])
```

See `encode_bytecode` and `compute_public_bytecode_commitment` for the C++
logic (unchanged by this PR).
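
For intuition, here is a minimal sketch of that computation (not the repo's implementation; the `FF::from_bytes` packing helper and the free `poseidon2_hash` function are hypothetical names standing in for the real helpers above):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Sketch only: pack the bytecode 31 bytes per field, prepend the separator,
// and hash the resulting field vector with poseidon2.
FF bytecode_commitment_sketch(const std::vector<uint8_t>& bytecode)
{
    std::vector<FF> inputs;
    inputs.push_back(FF(GENERATOR_INDEX__PUBLIC_BYTECODE)); // prepended separator
    for (size_t i = 0; i < bytecode.size(); i += 31) {
        size_t len = std::min<size_t>(31, bytecode.size() - i);
        std::vector<uint8_t> chunk(bytecode.begin() + i, bytecode.begin() + i + len);
        inputs.push_back(FF::from_bytes(chunk)); // hypothetical packing helper
    }
    return poseidon2_hash(inputs); // hypothetical hash over a vector of fields
}
```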

### PIL

The general idea is that each row of `bc_hashing` corresponds to a row
of `poseidon2`. It is the 'link' between `bc_decomposition` and
`poseidon2`, proving that the hash given by `output_hash` is indeed a
valid hash of the exact bytecode from `bc_decomposition` (with no extra
or missing info), prepended by `GENERATOR_INDEX__PUBLIC_BYTECODE`.

The decomp circuit already helpfully sets the packed field every 31 bytes
(looking forward): where `sel_packed = 1`, `packed_field` is the field given by
packing the 31 bytes starting at `pc` (with `pc` starting at 0). Since each
poseidon permutation takes in a chunk of 3 fields, each row of `bc_hashing`
handles 3 packed fields. For now, I'm looking up each separately (for
`i = 0, 1, 2`):
```
    #[GET_PACKED_FIELD_i]
    sel { pc_index_i, bytecode_id, packed_fields_i }
    in
    bc_decomposition.sel_packed { bc_decomposition.pc, bc_decomposition.id, bc_decomposition.packed_field };
```
where (generally) each `pc_index_i` is simply `pc_index_{i - 1} + 31`.
Then each field is looked up into the hasher:

```
    #[POSEIDON2_HASH]
    sel { latch, packed_fields_0, packed_fields_1, packed_fields_2, output_hash }
    in poseidon2_hash.sel { poseidon2_hash.end, poseidon2_hash.input_0, poseidon2_hash.input_1, poseidon2_hash.input_2, poseidon2_hash.output };
```

Easy! Oh wait. Edge cases.

#### Bytecode Start

At the beginning of a new bytecode, we have `START = 1` and `pc_index =
0`. We must constrain that the first field hashed is the prepended
separator:

```
    #[START_IS_SEPARATOR]
    START * (packed_fields_0 - constants.GENERATOR_INDEX__PUBLIC_BYTECODE) = 0;
```
However, this means that `packed_fields_0` does not actually exist as a
packed field in `bc_decomposition`, so the lookup `GET_PACKED_FIELD_0`
will fail. To rectify this, at `START = 1`:
- We have `sel_not_start = 0` and skip the `GET_PACKED_FIELD_0` lookup
- We set `pc_index_1 = 0`, so `GET_PACKED_FIELD_1` reads the first encoded field of the bytecode from `bc_decomposition` (and `pc_index_2 = 31` reads the second)
- Instead of constraining that `pc_index' = pc_index + 93` (the usual row increment), we only increment by 62
- The poseidon lookup remains the same (see the sketch after this list)
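
For concreteness, this is my reading of a start row versus a regular row (illustrative only; in particular I am assuming `pc_index_0 = pc_index` on regular rows):

```
START = 1 (first row of a bytecode):
  packed_fields_0 = GENERATOR_INDEX__PUBLIC_BYTECODE   (no GET_PACKED_FIELD_0 lookup)
  pc_index_1 = 0, pc_index_2 = 31                      (first two encoded fields)
  pc_index' = pc_index + 62

START = 0 (regular row):
  pc_index_0 = pc_index, pc_index_1 = pc_index + 31, pc_index_2 = pc_index + 62
  pc_index' = pc_index + 93
```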

#### Bytecode End

As before, once we read the end of some bytecode, we set `latch = 1` and
constrain that the next row (if it exists) is a `start`. We have two main
things to consider here:
- Constrain that when `latch = 1`, we are at the end of the decomposed bytes given by `bc_decomposition` (and have read all of them)
- Deal with padding in `poseidon2`

Both are dealt with in the `PADDING` section, since both require asserting that
we have 'trailing zeros' and have correctly read/set the packed fields.

Now that each row handles 3 fields, we may have 0, 1, or 2 padding fields which
do not exist in `bc_decomposition`. To avoid these lookups failing, I added two
selectors:
- `sel_not_padding_1` <==> `packed_fields_1` is not a padding value => switch on the `GET_PACKED_FIELD_1` lookup
- `sel_not_padding_2` <==> `packed_fields_2` is not a padding value => switch on the `GET_PACKED_FIELD_2` lookup

(Note that `packed_fields_0` can never be a padding value and must always
be processed, and all padding selectors are gated by `latch`.)
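
To make the three cases concrete (my reading; these selectors only come into play on the `latch` row):

```
no padding:  [field, field, field]   sel_not_padding_1 = 1, sel_not_padding_2 = 1
1 padding :  [field, field,   0  ]   sel_not_padding_1 = 1, sel_not_padding_2 = 0
2 paddings:  [field,   0  ,   0  ]   sel_not_padding_1 = 0, sel_not_padding_2 = 0
```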

This adds an attack vector where we could incorrectly claim that real values
are padding values in order to skip the lookup and thus skip hashing bytecode
fields. To avoid this I added `pc_at_final_field`, which is the value of `pc`
at the beginning of the last field, e.g. if `packed_fields_1` is the last
bytecode field, then `pc_at_final_field = pc_index_1`. This is constrained by
the relation `PADDING_CORRECTNESS`.

From here, the simplest way to prove everything we need for padding/end of
bytecode is to check that `bc_decomposition.bytes_remaining` at
`pc_at_final_field` is `< 31`. But this needs an evil range check, and I don't
want one on top of the 3 packed-field lookups. Hence the following madness.

The decomposition trace processes windows of (currently) 37 bytes, and forces
these bytes to be zero if they overshoot the end of the bytecode. Iff fewer
than 37 bytes remain to be processed,
`bc_decomposition.sel_windows_gt_remaining = 1`, so we can easily check that
`bytes_remaining < 37` without a range check by just looking up this selector.

What about the remaining 6 bytes, you ask? These are all stored in
`bc_decomposition.bytes_pc_plus_i`, so the final lookup becomes this
monstrosity:
```
    #[CHECK_FINAL_BYTES_REMAINING]
    latch {
        pc_at_final_field,
        bytecode_id,
        sel /* =1 */,
        precomputed.zero /* =0 */,
        precomputed.zero /* =0 */,
        precomputed.zero /* =0 */,
        precomputed.zero /* =0 */,
        precomputed.zero /* =0 */,
        precomputed.zero /* =0 */
    } in bc_decomposition.sel_packed {
        bc_decomposition.pc,
        bc_decomposition.id,
        bc_decomposition.sel_windows_gt_remaining,
        bc_decomposition.bytes_pc_plus_31,
        bc_decomposition.bytes_pc_plus_32,
        bc_decomposition.bytes_pc_plus_33,
        bc_decomposition.bytes_pc_plus_34,
        bc_decomposition.bytes_pc_plus_35,
        bc_decomposition.bytes_pc_plus_36
    };
```
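
As a hypothetical worked example: for a 100-byte bytecode, the encoded fields start at `pc = 0, 31, 62, 93`, so `pc_at_final_field = 93` and `bytes_remaining` there is 7 (`< 37`), giving `sel_windows_gt_remaining = 1`; the window bytes `bytes_pc_plus_31` ... `bytes_pc_plus_36` correspond to offsets 124-129, which lie past the end of the bytecode and are forced to zero, so the lookup is satisfied.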

But hey, now we know we're at the final field at `pc_at_final_field` and that
there are no (nonzero) bytes remaining past it, all without a range check! Note
that if the bytecode ends in genuine zero bytes, I don't think we need to
differentiate between these and 'padding' non-existent zeros, because the final
poseidon hash result would be the same.

To cover the poseidon considerations, we just need to make sure that the
padding fields are indeed zero (`PADDED_BY_ZERO_i`) and that if we claim
`packed_fields_1` is a padding value, then so is `packed_fields_2` (i.e.
we can't have `[last_field, padding, some_crap]` at the end of a bytecode;
`PADDING_CONSISTENCY`). If we have any padding at all, then
`PADDING_2 = 1 - sel_not_padding_2` must be on, so I use this as the
general padding selector when required.


#### Bytecode Start & End

Oh yes, these things can both be true. Isn't that fun! You might notice
some ugly complexities in the relations, and this is probably why.

A much nicer way of showing `bytes_remaining < 31` without a range check
would be to set some `pc_index_to_lookup = pc_at_final_field - 6`, then
show that at `pc_index_to_lookup` we have
`bc_decomposition.sel_windows_gt_remaining = 1`. However, if both
`latch` and `start` are true here then we have `pc_index = 0` and this
would underflow.

I also made sure to use the `pc_index_i` columns to work out
`pc_at_final_field` because they take into consideration the shift we do
at `start` for the separator. Otherwise we could use something like
`31*padding_amount` which would be a lot prettier (alas).

### TODOs/Qs

- Should we explore using shifts to remove the 3 lookups into
`bc_decomposition` for the packed fields?
- Since the padding/latch conditions (I think?) constrain that we hash
all bytes, the `bytecode_length` event property is unused, remove?
- Is it worth converting the padding columns to a numerical
`padding_amount` (= 0, 1, or 2) and looking this up into `poseidon2`?

## Gas GT Migrations Optimized (#16989)

This PR re-introduces the reverted PR #16856, which had caused a performance
regression in "Log derivative inverse commitments".

After the introduction of separate destination lookup selectors, the original
PR no longer affects performance.

We introduce a specific gt selector for gas accounting (`gt.sel_gas`), while
for the gt lookups involved in the CALL opcode we use the already existing
`gt.sel_others` selector. The reason is that we expect far fewer gt events to
be produced by the latter.

Benchmarks on the mainframe with 16 cores did not show any significant
performance changes (ca. 2.5 seconds for the log derivative inverse
commitments).