Implement transaction prologue script #14

Closed
Tracked by #29
bobbinth opened this issue Dec 29, 2022 · 20 comments · Fixed by #55
bobbinth commented Dec 29, 2022

The transaction prologue is a program which is executed at the beginning of a transaction (before note scripts and the tx script are executed). The prologue needs to accomplish the following tasks:

  1. "Unhash" inputs and lay them out in root context's memory.
  2. Build a single vault ("tx vault") containing assets of all inputs (input notes and initial account state).
  3. Verify that all input notes are present in the Note DB.

Laying out inputs in memory

Before a transaction is executed, the stack is initialized with all inputs required to execute a transaction. I'm thinking these inputs could be arranged like so (from the top of the stack):

  1. Global inputs
    a. Last block number (1 element) - a unique sequential number of the last known block.
    b. Note DB commitment (4 elements) - commitment to the note database from the last known block.
  2. Account ID (either 2 or 3 elements) - ID of the account for this transaction.
  3. Account commitment (4 elements) - hash of the initial account state.
  4. Nullifier commitment (4 elements) - sequential hash of nullifiers of all notes consumed in this transaction.

The shape of global inputs still requires more thought - so, I'll skip it for now - but how the rest works is fairly clear. Specifically, we need to read the data for account and notes from the advice provider, write it to memory, hash it, and verify that the resulting hash matches the commitments provided via the stack.
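The read-write-hash-verify flow described here can be sketched in Python. This is a hedged model, not Miden assembly: the advice provider is a plain dict keyed by commitment, SHA-256 stands in for the VM's native hash, a 32-byte slice stands in for a word, and all names are illustrative.

```python
import hashlib

def commit(data: bytes) -> bytes:
    # Stand-in for the VM's native hash (Miden uses a Rescue-family hash,
    # not SHA-256); only the commit-then-verify pattern matters here.
    return hashlib.sha256(data).digest()

def unhash_into_memory(commitment: bytes, advice: dict, memory: list, ptr: int) -> int:
    # Fetch the preimage from the (untrusted) advice provider, write it to
    # memory in 32-byte chunks (standing in for words), and verify that it
    # matches the commitment provided via the stack.
    data = advice[commitment]
    assert commit(data) == commitment, "advice data does not match commitment"
    for i in range(0, len(data), 32):
        memory[ptr + i // 32] = data[i:i + 32]
    return ptr + (len(data) + 31) // 32  # next free address

# usage: "unhash" a mock account state into memory
account_state = b"account state blob".ljust(64, b"\x00")
advice = {commit(account_state): account_state}
memory = [b""] * 16
next_ptr = unhash_into_memory(commit(account_state), advice, memory, 0)
```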

Overall, the layout of root context's memory could look as follows:

[image: root context memory layout]

Bookkeeping section

The bookkeeping section is needed to keep track of variables which are used internally by the transaction kernel. This section could look as follows:

| Memory address | Variable | Description |
| --- | --- | --- |
| $0$ | tx_vault_root | Root of the vault containing all assets in the transaction. |
| $1$ | num_executed_notes | Number of notes executed so far during transaction execution. |
| $2$ | num_created_notes | Number of notes created so far during transaction execution. |

There will probably be other variables which we need to keep track of, but I'll leave them for the future.

Global inputs section

As mentioned above, I'm skipping this for now.

Account data section

This section will contain account details. Assuming $a$ is the memory offset of the account data section, these can be laid out as follows:

| Memory address | Variable | Description |
| --- | --- | --- |
| $a$ | acct_hash | Hash of the account's initial state. |
| $a + 1$ | acct_id | ID of the account. Only the first 2 - 3 elements of the word are relevant. |
| $a + 2$ | acct_code_root | Root of the account's code Merkle tree. |
| $a + 3$ | acct_store_root | Root of the account's storage Sparse Merkle tree. |
| $a + 4$ | acct_vault_root | Root of the account's asset vault. |
| $a + 5$ | acct_nonce | Account's nonce. Only the first element of the word is relevant. |

Consumed notes data section

This section will contain details of all notes to be consumed. The layout of this section could look as follows:

[image: consumed notes section layout]

Assuming $c$ is the memory offset of the consumed notes section, the meaning of the above is as follows:

| Memory address | Variable | Description |
| --- | --- | --- |
| $c$ | num_notes | Number of notes to be consumed in this transaction. |
| $c + 1$ | nullifiers | A list of nullifiers for all notes to be consumed in this transaction. |
| $c + 1024$ | notes | A list of all notes to be consumed in this transaction. |

Here, the nullifier for note0 is at memory address $c + 1$, the nullifier for note1 at $c + 2$, etc. The assumption is that a single transaction can consume at most $1023$ notes. The choice of this number is somewhat arbitrary.

At address $c + 1024$, the actual note details section starts. In this section we lay out all details of individual notes one after another in $1024$-address intervals. That is, details of note0 start at address $c + 1024$, details of note1 start at address $c + 2048$, etc.

Assuming $n$ is the memory offset of a given note, the layout of note details could look as follows:

| Memory address | Variable | Description |
| --- | --- | --- |
| $n$ | note_hash | Hash of the note. |
| $n + 1$ | serial_num | Serial number of this note. |
| $n + 2$ | script_hash | MAST root of this note's script. |
| $n + 3$ | input_hash | Sequential hash of this note's inputs. |
| $n + 4$ | vault_hash | Sequential hash of this note's assets. |
| $n + 5$ | num_assets | Number of assets contained in this note's vault. |
| $n + 6$ | assets | A list of all note assets laid out one after another. |

Here, each asset occupies 1 word, and thus the first asset in the note will be at address $n + 6$, the second asset will be at address $n + 7$ etc.
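The addressing scheme for the consumed-notes section can be captured in a few small helpers. This is a Python sketch; the constant names are illustrative, and the formulas simply follow the layout described above.

```python
# Hypothetical constants mirroring the layout described above.
NOTE_INTERVAL = 1024   # addresses reserved per note (and for the nullifier list)
ASSETS_OFFSET = 6      # assets start at n + 6 within a note's interval

def nullifier_addr(c: int, i: int) -> int:
    # The nullifier of note i lives at c + 1 + i (c itself holds num_notes).
    return c + 1 + i

def note_addr(c: int, i: int) -> int:
    # Details of note i start at c + 1024 * (i + 1).
    return c + NOTE_INTERVAL * (i + 1)

def asset_addr(c: int, i: int, j: int) -> int:
    # Asset j of note i occupies one word at note_addr + 6 + j.
    return note_addr(c, i) + ASSETS_OFFSET + j
```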

We do not "unhash" inputs because they are needed only when we start executing a given note - so, we can unhash them then.

Created notes data section

This section will contain data of notes created during execution of a transaction. It is not affected by transaction prologue - so, I'll skip it for now.

Building tx vault

To build a unified transaction vault we can do the following:

  1. Make a copy of the account's vault.
  2. As we unhash note assets, add each asset to the copy of the account's vault.

To implement this, we need a compact Sparse Merkle tree implementation in Miden assembly.

Verify that notes are in Notes DB

One of the inputs into the transaction kernel will be a commitment to the Note DB. To verify that notes are in the Note DB we'd need to do the following:

  • As we lay out note details in memory, we need to compute each note's hash.
  • Then, we need to prove that a note with this hash exists in the Note DB.

To do this, we need to have Merkle Mountain Range implemented in Miden assembly, and also better define the shape of global inputs.
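As a rough illustration of what the inclusion check boils down to, here is a generic Merkle authentication-path verification in Python. This is not Miden's MMR: SHA-256 stands in for the native 2-to-1 hash, and a real MMR proof additionally needs to select the correct peak tree before walking the path.

```python
import hashlib

def hash_pair(left: bytes, right: bytes) -> bytes:
    # SHA-256 stand-in for the VM's native 2-to-1 hash.
    return hashlib.sha256(left + right).digest()

def verify_membership(leaf: bytes, index: int, path: list, root: bytes) -> bool:
    # Walk the authentication path from the leaf to the root, pairing with
    # the sibling on the correct side at each level.
    node = leaf
    for sibling in path:
        if index % 2 == 0:
            node = hash_pair(node, sibling)
        else:
            node = hash_pair(sibling, node)
        index //= 2
    return node == root

# usage: a 2-leaf tree whose leaves are (mock) note hashes
a, b = b"note0-hash", b"note1-hash"
root = hash_pair(a, b)
```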


bobbinth commented Jan 3, 2023

Regarding global inputs, based on the block header format described in #17, I'm now thinking that global inputs could be just a hash of the last known block. Then, as part of the prologue, this can be "unhashed" into block number, chain root, state root etc. And then the state root can be "unhashed" into account DB root, note DB root, nullifier DB root etc.


hackaugusto commented Jan 25, 2023

tasks:

  • prepare root's context memory
    • bookkeeping section
      • create transaction vault
      • set num_executed_notes and num_created_notes to 0 (this is a no-op)
    • load data from stack
      • global inputs (partially described, just copy element-by-element to memory)
    • load data from advice provider and verify hashes
      • account data (6 words)
      • copy account vault as the tx vault (requires SMT to be finished)
      • nullifier data ($1 + c$ words; need to determine how to compute the commitment when $c == 0$ - pad it with 0x00?)
      • all notes (variable length)
        • note data (variable length, minimum 7 words, need to handle odd number of words)
          • asset data (variable length, 1 word per asset in the note's vault)
          • add each asset to the account vault (requires SMT to be finished)
          • verify note is in the Note DB (requires SMT to be finished)
  • verify input notes are in the notes db

notes:

  • initial code won't be optimized because some of the data structures are not fully specified, so the procedures will do some pointer arithmetic and data handling.
  • the tasks mentioned mostly move data around: from stack to memory, from advice provider to memory, from memory to memory.

These can be summarized into these procedures:

```
// move_from_stack_to_memory(writePtr: Address, words: Felt)
//
// Moves the `words * 4` elements from the stack to memory, starting at
// the `writePtr` address.
proc.move_from_stack_to_memory.0 end

// move_from_tape_to_memory(writePtr: Address, words: Felt, commitment: Commitment)
//
// Moves `words` words from the advice tape to memory, starting at the
// `writePtr` address; at the end, asserts that the copied data matches the commitment.
proc.move_from_tape_to_memory.0 end

// memcopy(readPtr: Address, writePtr: Address, words: Felt)
//
// Copies `words` words from `readPtr` to `writePtr`.
proc.memcopy.0 end
```
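To pin down the intended semantics of these three stubs, here is a hypothetical Python reference model. Memory is modeled as a dict of word-addressed 4-element tuples, and SHA-256 stands in for the VM's native hash used for the tape commitment; none of this is Miden assembly.

```python
import hashlib

def commit(words):
    # SHA-256 over the little-endian bytes of all elements; a stand-in for
    # the VM's native sequential hash.
    data = b"".join(elem.to_bytes(8, "little") for word in words for elem in word)
    return hashlib.sha256(data).digest()

def move_from_stack_to_memory(memory, stack, write_ptr, words):
    # Pop `words` words (4 elements each) off the stack into memory.
    for i in range(words):
        memory[write_ptr + i] = tuple(stack.pop() for _ in range(4))

def move_from_tape_to_memory(memory, tape, write_ptr, words, commitment):
    # Copy `words` words from the advice tape, then assert that the copied
    # data matches the expected commitment.
    copied = [tape.pop(0) for _ in range(words)]
    for i, word in enumerate(copied):
        memory[write_ptr + i] = word
    assert commit(copied) == commitment, "tape data does not match commitment"

def memcopy(memory, read_ptr, write_ptr, words):
    # Copy `words` words from `read_ptr` to `write_ptr`.
    for i in range(words):
        memory[write_ptr + i] = memory[read_ptr + i]
```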


bobbinth commented Feb 1, 2023

Great summary! A few comments on the above.

global inputs (partially described, just copy element-by-element to memory)

The stack will contain only the hash of the last known block. So, we still need to use advice provider to "unhash" it into the block header. We'll also need to "unhash" commitment to the note DB to check whether consumed notes were in it - but this can be done later when we have MMRs implemented.

copy account vault as the tx vault (requires SMT to be finished)

We actually don't have to have SMT finished for this. Copying of account vault is just copying of the root of the SMT (i.e., copy a word from one address to another).

add each asset to the account vault (requires SMT to be finished)

Should be "add each asset to the tx vault".

verify note is in the Note DB (requires SMT to be finished)

This requires MMR to be implemented, not SMT. Also, this point seems to be repeated twice.

@frisitano

@bobbinth why are the global inputs required for the transaction kernel? I would have thought that validation of inputs is performed by the os kernel during block construction? Do you intend to use it for a different purpose? Is there potential for global information to become stale?

@frisitano

I was going to propose that we increase the note limit above 1024; however, after some simple analysis I've concluded that it's probably not necessary. The max number of trades executed on Uniswap in a single Ethereum block is 420 - see dune query

@bobbinth

why are the global inputs required for the transaction kernel?

The main reason is so that we can prove that notes consumed in a transaction were present in the Note DB (hash of the last known block will contain a commitment to the Note DB).

Another reason is to allow account/note code to access such variables as block height. Access to such variables is very useful for smart contracts. It is important to note that we can't guarantee that the block height at which a transaction is executed will be the same as the block height at which the transaction is included into a block, but we can guarantee that the transaction's block height will always be less than the block height of the block it ends up in - and I think this is already super useful.

Another reason is that, in theory, this will allow accessing state of any account. As you've said, we can't guarantee that the state is not stale - but in some cases this may be OK (e.g., if an account cannot be updated more frequently than once an hour, any transaction which reads the account state shortly after it is updated, can be sure that the state is not stale).

@frisitano

frisitano commented Mar 2, 2023

There have been some changes to the specification for the transaction prologue. Below I specify only those components that I believe to have changed. If a component is not referenced here, the reader can assume the corresponding part of the OP is still relevant.

Prologue Inputs

  1. Global Inputs (4 elements) - A hash of the last known block at the time the transaction was executed. This will be un-hashed inside the VM and its data will allow for verification of consumed notes against the Note DB.
  2. AccountId (1 element) - This has changed to a single element as per feat(objects): migrate to ~64 bit AccountId #33
  3. Nullifier commitment (4 elements) - sequential hash of the nullifier and script root for all consumed notes as seen here

Account data section

AccountId has been changed to a single element, and as such this must be reflected here.

| Memory address | Variable | Description |
| --- | --- | --- |
| $a + 1$ | acct_id | ID of the account. Only the first element of the word is relevant. |

Consumed notes data section

| Memory address | Variable | Description |
| --- | --- | --- |
| $n + 7$ | note_metadata | Metadata of the note (includes sender and tag). |

@frisitano

frisitano commented Mar 2, 2023

This comment will contain a running thread of questions that arise during the implementation process:

  • do we need to provide "Account commitment (4 elements) - hash of the initial account states." via public inputs? I think we should be able to retrieve this by doing a lookup in the Account DB sparse merkle tree.
  • ordering and specification of account data appears outdated. Comparing with rust I would expect the following:
| Memory address | Variable | Description |
| --- | --- | --- |
| $a$ | acct_hash | Hash of the account's initial state. |
| $(a + 1)[0]$ | acct_id | ID of the account (1 element). |
| $(a + 1)[3]$ | acct_nonce | Account's nonce (1 element). |
| $a + 2$ | acct_vault_root | Root of the account's asset vault. |
| $a + 3$ | acct_store_root | Root of the account's storage Sparse Merkle tree. |
| $a + 4$ | acct_code_root | Root of the account's code Merkle tree. |
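A small Python sketch of this element-level layout: the word/element offsets mirror the proposed ordering above, but the accessor itself is hypothetical.

```python
# Hypothetical accessor reflecting the proposed layout: acct_id and
# acct_nonce share the word at a + 1, in elements 0 and 3 respectively.
def account_field(memory, a, name):
    word_offset, element = {
        "acct_hash":       (0, None),
        "acct_id":         (1, 0),
        "acct_nonce":      (1, 3),
        "acct_vault_root": (2, None),
        "acct_store_root": (3, None),
        "acct_code_root":  (4, None),
    }[name]
    word = memory[a + word_offset]
    # Whole-word fields return the full word; single-element fields return
    # just the relevant element of the word.
    return word if element is None else word[element]
```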


bobbinth commented Mar 2, 2023

do we need to provide "Account commitment (4 elements) - hash of the initial account states." via public inputs? I think we should be able to retrieve this by doing a lookup in the Account DB sparse merkle tree.

Yes, we could do that - but I wanted to make this as "stateless" as possible. So, for example, if we pass the account commitment via the stack, a wallet device would not need to maintain any information about the account DB.

ordering and specification of account data appears outdated. Comparing with rust I would expect the following:

I was trying to make the nonce more easily accessible (we have specialized instructions for reading/writing the first element of a word). But I think the overhead is probably negligible and it is better to keep the layout the same everywhere.

One correction: account nonce is currently a single field element and I'm not sure we'll end up changing that.

@frisitano

Yes, we could do that - but I wanted to make this as "stateless" as possible. So, for example, if we pass the account commitment via the stack, a wallet device would not need to maintain any information about the account DB.

Ok, that makes sense. Having said that, we have a dependency on inclusion proofs for notes so in any case the client will have to do lookups. Having them do the account db lookups as well may be worth considering as it reduces the overhead of the batch / os kernel. What do you think?

I was trying to make the nonce more easily accessible (we have specialized instructions for reading/writing the first element of a word). But I think the overhead is probably negligible and it is better to keep the layout the same everywhere.

One correction: account nonce is currently a single field element and I'm not sure we'll end up changing that.

Thanks I have updated my comment.


bobbinth commented Mar 3, 2023

Having said that, we have a dependency on inclusion proofs for notes so in any case the client will have to do lookups.

I think this is somewhat different. Inclusion proofs for notes could be anchored on some old block because we just need to prove that the note was created at some point. Inclusion proof for the account, would need to be done against the last block to prove that our starting point is the latest account state.

Or I guess we could do it against the old block, but then when we aggregate the proofs into a block, we'd need to prove that the state of the account didn't change between the block used in the transaction and the last block. This is not difficult, but I think it adds more complexity than providing the expected initial state via the stack.

Thanks I have updated my comment.

Thank you! There is still a tiny typo: the address of nonce should be $(a+1)[1]$ (not $(a+1)[1..]$).

@frisitano

Having said that, we have a dependency on inclusion proofs for notes so in any case the client will have to do lookups.

I think this is somewhat different. Inclusion proofs for notes could be anchored on some old block because we just need to prove that the note was created at some point. Inclusion proof for the account, would need to be done against the last block to prove that our starting point is the latest account state.

Or I guess we could do it against the old block, but then when we aggregate the proofs into a block, we'd need to prove that the state of the account didn't change between the block used in the transaction and the last block. This is not difficult, but I think it adds more complexity than providing the expected initial state via the stack.

Ah yes good point! This makes sense.

Thanks I have updated my comment.

Thank you! There is still a tiny typo: the address of nonce should be (a+1)[1] (not (a+1)[1..]).

whoops, I'll fix that now.

@frisitano

When ingesting assets from the advice provider for consumed notes it would be useful if the asset data is padded to be a multiple of the rate width. This would involve padding with an additional word when we are ingesting an odd number of assets.
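A sketch of the padding rule, assuming the hasher's rate is 8 field elements (2 words) and each asset occupies one word; the helper name is illustrative.

```python
# The hasher absorbs 2 words per permutation (rate of 8 field elements),
# so asset data is padded up to an even number of words.
RATE_WORDS = 2

def padded_asset_words(num_assets: int) -> int:
    # Each asset is one word; round up to a multiple of the rate width.
    return ((num_assets + RATE_WORDS - 1) // RATE_WORDS) * RATE_WORDS
```

So a note with an odd number of assets gets one extra padding word, while an even number of assets needs no padding.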


bobbinth commented Mar 3, 2023

When ingesting assets from the advice provider for consumed notes it would be useful if the asset data is padded to be a multiple of the rate width. This would involve padding with an additional word when we are ingesting an odd number of assets.

Agreed. Let's create an issue for this.

@frisitano

Observation: Each note can contain a maximum of 1018 assets.

@bobbinth

Observation: Each note can contain a maximum of 1018 assets.

I'm actually thinking we should probably reduce this to something like 256 (see #10).

@hackaugusto

I'm actually thinking we should probably reduce this to something like 256 (see #10).

Why do we have these limits though? We need a counter for all these values anyway: the number of consumed/produced notes, the number of assets in the note's vault, and so on. These counters in the VM have to be either a Felt or a u32 - why not make that the limit instead?

And we could have an even tinier representation outside of the VM if we use variable-length encoding for the counters and lay the data out sequentially without padding.

@bobbinth

My motivation for smaller rather than larger limits is that it reduces the amount of edge cases (and potential attack vectors) to worry about. Like we won't need to think what would happen if someone creates a note with 1M assets (i.e., do we need to have a separate code path to handle it or do we have a single code path which can handle notes of arbitrary size?).

@hackaugusto

do we need to have a separate code path to handle it or do we have a single code path which can handle notes of arbitrary size?

AFAIU we are setting the maximum not the exact size, so we need code to handle arbitrary size anyways, no?

@bobbinth

AFAIU we are setting the maximum not the exact size, so we need code to handle arbitrary size anyways, no?

Different sizes, yes - but not arbitrary. If we can assume that a transaction will never be bigger than say 100KB we can write code in a certain way. But if a transaction could potentially be 100MB, then the code would need to be written in a very different way.

daedlock pushed a commit to keomprotocol/miden-base that referenced this issue Feb 8, 2024
…nullifiers-handle

store: implement check nullifiers handle