Make contract addresses predictable ("deterministic") #942

Closed
ethanfrey opened this issue Aug 15, 2022 · 25 comments · Fixed by #996

Comments

@ethanfrey
Member

This is a request from @sunnya97 from quite some time ago, which I am just getting around to writing down properly.

Request

The desired state is that a contract creator can pre-compute the next 5 (or 500) contract addresses it will create. This will allow them to publish said addresses and send tokens (or vesting tokens) to such addresses before they exist. The original use case was some sort of exchange creating many new addresses for users, then deploying later. However, I have come up with many more use cases:

  1. Expose a query both to users (via gRPC) and contracts (via QueryRequest) to list the next address they will create. For contracts, it will remove the most common case of reply_on, to simply get the address of the contract we just instantiated, making this a simpler one-step process. It will also make it easier to produce cycles (contract A refers to contract B, contract B refers to contract A - like a cw20 contract and the staking contract)
  2. Allow sending "vesting tokens" to a (multisig/DAO) contract. Currently, when we instantiate a contract, we create a BaseAccount. Once created, we can no longer send vesting tokens to it. Before it is created, we don't know with certainty what address it will have. This will allow one to receive vesting tokens on a pre-computed address, then deploy a new multisig/DAO contract there.

Current State

The current state is that we create a "sub-module address" for each contract, but the inputs we use are the codeID and a global counter of number of instances created. If someone else deploys a contract, they could claim my address or make the address unreachable. This doesn't allow the creator to have control of the address it will receive.

Ethereum has a CREATE opcode that does a "sender + nonce hash" to create the new address, and a CREATE2 opcode that does a more complex hash, including initialisation data.

Proposed Changes

In order to make these maximally predictable, I would base this on the CREATE semantics. However, we cannot use the typical nonce, as (a) an external account can instantiate multiple contracts in one transaction / same nonce and (b) in CosmWasm, contracts don't have nonces, and we need to count something there.

I would add a new counter per-address that maintains how many contracts this sender has created. If a contract creates 3 sub-contracts in the same transaction, each would get another counter value. I would then use (0xcafe || senderAddress (20-32 bytes) || counter (8 bytes)) as input into the standard module account hash function.

The use of that prefix ensures the hash space doesn't overlap with any previously created addresses from the x/wasm module. And use of module account derivation means that it doesn't overlap with any addresses created by other modules. Existing contracts already have an address and this will just affect the generation of new addresses without breaking any backwards compatibility.
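A minimal sketch of the hash input described above, to make the layout concrete. The big-endian counter encoding, the function name `counter_preimage` and the sample sender are assumptions for illustration; the proposal only fixes the order `0xcafe || senderAddress || counter` and the field sizes.

```python
import struct

# Domain-separation prefix from the proposal.
PREFIX = bytes.fromhex("cafe")


def counter_preimage(sender: bytes, counter: int) -> bytes:
    """Build 0xcafe || sender || counter (8 bytes, big-endian assumed)."""
    if not 20 <= len(sender) <= 32:
        raise ValueError("sender address must be 20 to 32 bytes")
    return PREFIX + sender + struct.pack(">Q", counter)


# Hypothetical 20-byte sender; a creator could list its next inputs up front.
sender = bytes(range(20))
next_three = [counter_preimage(sender, n) for n in range(3)]
```

Each preimage would then be fed into the module account hash function; only the per-sender counter changes between consecutive contracts.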

Once this is deployed, we can immediately add support for clients to query the future contract address. With the next CosmWasm upgrade (v1.2?) we can add a new query to pre-compute the address that will be used upon the next instantiate call.

@ethanfrey ethanfrey added this to the v0.29.0 milestone Aug 15, 2022
@sunnya97
Member

sunnya97 commented Aug 16, 2022

I would recommend that instead of using a monotonically increasing nonce (counter), it should be using a 32 byte salt, similar to how it is done in CREATE2. This enables a couple of things.

One, it allows a contract to "manage" multiple uninstantiated contracts at once and instantiate them out of order. Let's say a contract wants to create 2 "deposit addresses" for both Alice and Bob to send to. It assigns Alice the first available counter value, and Bob the second available counter value. But if Bob deposits money, but Alice never does, the contract should be able to instantiate Bob's contract without having to instantiate Alice's contract first.

Secondly, by being 32 bytes, it can be a hash of pre-committed data. For example, the factory contract can use H(codeID | InstantiateMsg) as the salt when instantiating the contract. That way, the contract can't instantiate arbitrary contracts to the address, and the user can know exactly what code is going to be deployed to an address and how it will be initialized.

Edit: in fact, in CREATE2 on Ethereum, they explicitly always pass in the initCode (which is effectively the same as codeID + InstantiateMsg in CosmWasm) for this reason.

Essentially, we should be basing off of CREATE2 rather than CREATE as this gives far more flexibility and security.
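As a rough sketch of the pre-committed salt idea, H(codeID | InstantiateMsg) could look like the following. SHA-256 as H, the 8-byte big-endian code ID, and the canonical JSON encoding (sorted keys, compact separators) are all assumptions chosen for illustration; the comment above does not specify them.

```python
import hashlib
import json


def precommitted_salt(code_id: int, instantiate_msg: dict) -> bytes:
    """Return a 32-byte salt committing to the code ID and the init message."""
    # Canonical encoding (assumed) so that equal messages give equal salts.
    msg_bytes = json.dumps(
        instantiate_msg, sort_keys=True, separators=(",", ":")
    ).encode("utf-8")
    return hashlib.sha256(code_id.to_bytes(8, "big") + msg_bytes).digest()


# Hypothetical deposit contract for Bob; the salt can be published in advance.
bob_salt = precommitted_salt(7, {"owner": "bob", "kind": "deposit"})
```

Anyone who knows the code ID and the init message can recompute the salt and check what will be deployed at the address.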

@webmaster128
Member

Ethan's approach and Sunny's approach differ in API compatibility. MsgInstantiateContract does not have those extra fields and adding fields is a breaking change in our context. However, I think it's reasonable to create a second instantiate message type (MsgInstantiate2Contract) to implement a feature like CREATE2.

@ethanfrey
Member Author

I looked at CREATE2 before and saw that it included both a nonce and initMsg. This may make it hard to precompute a future address.

However, we can still keep the same API with this info. It is just the args we already pass (sender, code id, init data) and a sender-specific nonce/counter from storage.

However, this may not achieve Sunny's desire of configurable addresses.

@ethanfrey
Member Author

ethanfrey commented Aug 17, 2022

Update, I re-read the CREATE2 spec a bit more carefully:

keccak256( 0xff ++ address ++ salt ++ keccak256(init_code))[12:]

so, salt is passed in as an argument, not the auto-incrementing nonce that was used in CREATE. This actually makes a lot of sense, but is one more argument to include in the messages

It also describes how to handle collisions (using the same salt / initCode):

> If a contract creation is attempted, due to either a creation transaction or the CREATE (or future CREATE2) opcode, and the destination address already has either nonzero nonce, or nonempty code, then the creation throws immediately, with exactly the same behavior as would arise if the first byte in the init code were an invalid opcode. This applies retroactively starting from genesis.


If we adopt CREATE2-like semantics, we either need to make a new message type (both for contracts and external callers), or (and maybe this is a bad idea), we could leverage the informational "label" field to get a unique salt. salt = sha256(label). This means if the same sender created two contracts with the same code id, initMsg and label, then they will have the same contractAddress and the second one will return an error.

Since label is an arbitrary string, one can store a counter ("Subcontract #2") or relevant info ("deposit address for osmos1d2983f39djdfiw") to make it unique.

While I can see reservations on repurposing this label field (which we currently only assert to be non-empty), I also have reservations about making completely new paths. I think they both have their pros and cons.

@webmaster128
Member

Using the label field is actually a very good idea. Much better than the layer violation that happens when encoding the user-chosen salt in the inner msg that is processed by the contract.

@webmaster128
Member

> Edit: in fact, in CREATE2 on Ethereum, they explicitly always pass in the initCode (which is effectively the same as codeID + InstantiateMsg in CosmWasm) for this reason.

On Ethereum, uploading code and instantiating it is one step. I think it is safe to assume that in CosmWasm it is reasonable to upload a code once well in advance and use the code ID as one of the inputs when calculating the address.

@sunnya97
Member

And if an initiator uses an empty label, would it default to using some incrementing nonce?

@webmaster128
Member

webmaster128 commented Aug 18, 2022

If we calculate the address like this:

module_account(hash(big_endian(codeId) | big_endian(len(sender_address)) | sender_address | label))

we can do that completely statelessly. In case the same creator wants to instantiate the same code again, we can error with "You instantiated that contract with the given label already, please choose a different label."
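A small sketch of how this could behave, under stated assumptions: plain SHA-256 stands in for `module_account(hash(...))`, both integers are encoded as 8-byte big-endian values, and the set `existing` is a hypothetical stand-in for the chain's record of contract addresses. The address itself needs no counter from storage; only the duplicate check looks at existing contracts.

```python
import hashlib


class DuplicateInstanceError(Exception):
    """Raised when the same creator reuses a label for the same code."""


def draft_address(code_id: int, sender: bytes, label: str) -> bytes:
    """hash(big_endian(codeId) | big_endian(len(sender)) | sender | label)."""
    preimage = (
        code_id.to_bytes(8, "big")
        + len(sender).to_bytes(8, "big")
        + sender
        + label.encode("utf-8")
    )
    return hashlib.sha256(preimage).digest()  # stand-in for module_account(...)


def instantiate(existing: set, code_id: int, sender: bytes, label: str) -> bytes:
    """Derive the address and refuse to create a second contract on it."""
    address = draft_address(code_id, sender, label)
    if address in existing:
        raise DuplicateInstanceError(
            "You instantiated that contract with the given label already, "
            "please choose a different label."
        )
    existing.add(address)
    return address
```

Calling `instantiate` twice with the same code ID, sender and label fails on the second call, while changing only the label yields a fresh address.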

@sunnya97
Member

sunnya97 commented Aug 18, 2022

Yeah, I was thinking it would be nice if the creator didn't have to think about the labels. They should be able to if they want to for the examples I provided above, but if they don't want to, defaulting to an incrementing nonce would be a nice to have.

But then I realized those people can just use the legacy (current) contract instantiation method. So never mind.

@webmaster128
Member

webmaster128 commented Aug 18, 2022

The point of using labels is that we do not have to introduce a second instantiate message type. They would all get the new more powerful and simpler (stateless) address generation. For most users this should not affect anything. Only when one address instantiates a code multiple times do they have to use different labels. I don't think this is too much to ask given all the benefits.

I'm not a fan of putting different calculations behind the same message type in a case distinction.

@ethanfrey
Member Author

> And if an initiator uses an empty label, would it default to using some incrementing nonce?

We already enforce label to be non empty and maximum 128 bytes.

@alpe
Member

alpe commented Aug 26, 2022

Very nice! Just curious, what's the motivation to add len(sender_address) to the hash?

@ethanfrey
Member Author

> Very nice! Just curious, what's the motivation to add len(sender_address) to the hash?

Good question. I thought this would work closer to create2:

module_account(hash(big_endian(codeId) | canonical_sender_address | hash(initMsg) | label))

@sunnya97
Member

> Very nice! Just curious, what's the motivation to add len(sender_address) to the hash?

I assumed it's for length-prefixing the sender address in case a chain can have multiple sender address lengths.

So like if a chain supports 20 byte addresses and 32 byte addresses, I can't spoof a 32byte_addr | label with a 20byte_addr | label

@webmaster128
Member

As Sunny said. Whenever you have variable length inputs you either need to length prefix them or add a separator that is guaranteed to not be in the content. On Ethereum, addresses are fixed length (20 bytes). Here we already have 20 byte addresses from pubkeys and 32 byte addresses from contracts.
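To illustrate the ambiguity, here is a small sketch with made-up byte values: without a length prefix, a 20-byte address followed by a crafted label produces the same bytes as a 32-byte address followed by a shorter label. The helper names and the sample addresses are hypothetical.

```python
def concat_plain(sender: bytes, label: bytes) -> bytes:
    """Ambiguous: the boundary between sender and label is not encoded."""
    return sender + label


def concat_prefixed(sender: bytes, label: bytes) -> bytes:
    """Unambiguous: one length byte marks where the sender ends."""
    return bytes([len(sender)]) + sender + label


addr32 = bytes(range(32))   # hypothetical 32-byte contract address
addr20 = addr32[:20]        # hypothetical 20-byte address sharing its first bytes
crafted_label = addr32[20:] + b"vault"  # label that continues the short address

plain_collides = concat_plain(addr20, crafted_label) == concat_plain(addr32, b"vault")
prefixed_collides = concat_prefixed(addr20, crafted_label) == concat_prefixed(addr32, b"vault")
```

Here `plain_collides` is true and `prefixed_collides` is false, because the prefixed encodings already differ in their first byte (20 versus 32).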

Is there a reason to include the init message? I would not add it. Having to handle a byte-to-byte encoding of the JSON is not that convenient. Your clients might encode the same JSON document differently and all the higher level CosmJS APIs work with objects, not JSON bytes.
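A quick sketch of the encoding problem, using a made-up init message: three serializations of the same JSON document parse to equal objects but hash to three different digests, so an address derived from the raw message bytes would depend on the client's serializer.

```python
import hashlib
import json

msg = {"name": "vault", "owner": "alice"}  # hypothetical init message

# Same document, three plausible client encodings.
compact = json.dumps(msg, separators=(",", ":")).encode("utf-8")
spaced = json.dumps(msg).encode("utf-8")
reordered = json.dumps(
    {"owner": "alice", "name": "vault"}, separators=(",", ":")
).encode("utf-8")

same_document = json.loads(compact) == json.loads(spaced) == json.loads(reordered)
digests = {hashlib.sha256(raw).hexdigest() for raw in (compact, spaced, reordered)}
```

`same_document` is true while `digests` holds three distinct values.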

@webmaster128
Member

In cosmos/cosmjs#1253 there is a TypeScript implementation including test vectors. Hope they match the Go implementation.

@webmaster128
Member

If we also prefix the label length, we have the option to add an optional variable-length parameter later on in a backwards compatible way. Let's do that.

Also Alex prefers to use a 1 byte length encoding, which is enough for the address and the label.

So we now have the ADR-028 "Module Account Addresses"

address.Module("wasm", uint64_be(codeId) | uint8(len(creator_address)) | creator_address | uint8(len(label)) | label)

which is the same as

address.Hash("module", ascii("wasm") | 0x00 | uint64_be(codeId) | uint8(len(creator_address)) | creator_address | uint8(len(label)) | label)
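A sketch of the two equivalent forms in Python, for readers who want to reproduce test vectors by hand. It assumes, based on my reading of ADR-028, that `address.Hash(typ, key)` is `sha256(sha256(typ) || key)` and that `address.Module(name, key)` is `address.Hash("module", name || 0x00 || key)`; the Python function names and sample inputs are hypothetical, and the result should be checked against the Go and CosmJS implementations.

```python
import hashlib


def address_hash(typ: str, key: bytes) -> bytes:
    """ADR-028 style hash (assumed): sha256(sha256(typ) || key)."""
    return hashlib.sha256(hashlib.sha256(typ.encode("ascii")).digest() + key).digest()


def address_module(module_name: str, derivation_key: bytes) -> bytes:
    """ADR-028 style module address (assumed) with one derivation key."""
    return address_hash("module", module_name.encode("ascii") + b"\x00" + derivation_key)


def contract_key(code_id: int, creator: bytes, label: str) -> bytes:
    """uint64_be(codeId) | uint8(len(creator)) | creator | uint8(len(label)) | label."""
    label_bytes = label.encode("utf-8")
    if len(creator) > 255 or len(label_bytes) > 255:
        raise ValueError("creator and label must fit a 1-byte length prefix")
    return (
        code_id.to_bytes(8, "big")
        + bytes([len(creator)]) + creator
        + bytes([len(label_bytes)]) + label_bytes
    )


creator = bytes(range(20))  # hypothetical 20-byte creator address
key = contract_key(1, creator, "instance 1")
via_module = address_module("wasm", key)
via_hash = address_hash("module", b"wasm" + b"\x00" + key)
```

Both `via_module` and `via_hash` are the same 32-byte value, which is the equivalence stated above.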

@alpe alpe self-assigned this Aug 31, 2022
@hussein-aitlahcen

hussein-aitlahcen commented Sep 2, 2022

Hi, I'm very interested in the subject and would like to ask a question (forgive me if it's a dumb one):

If we include the code_id in the hash, how is it supposed to be deterministic? Isn't this a global counter specific to a chain and dependent on its state? If I deploy my code today or tomorrow, the resulting code_id for the same contract will be different? If so, I am then unable to precompute its supposedly deterministic address? Am I missing something important here?

@webmaster128
Member

webmaster128 commented Sep 3, 2022

@hussein-aitlahcen "deterministic" is not really a helpful term here (please forgive us for using it in an imprecise fashion). The address generation has always been deterministic, otherwise nodes would have calculated different results. The question is: what are the inputs? Until now, one of the inputs was a global instance counter. An app or developer did not have access to this value in advance, so it could not pre-calculate the address at all. With this change, you can pre-calculate the address as soon as you got the code ID. So your assumption is correct: you can't calculate instance addresses for a code that has not yet been stored successfully. However, since in contrast to Ethereum the uploading of code and the instantiation are two steps, we assume that this is a limitation that is not a problem in practice. If you know your code ID tomorrow, start precomputing addresses tomorrow.

Do you have a use case for which this is an issue?

@hussein-aitlahcen

hussein-aitlahcen commented Sep 3, 2022

> @hussein-aitlahcen "deterministic" is not really a helpful term here (please forgive us for using it in an imprecise fashion). The address generation has always been deterministic, otherwise nodes would have calculated different results. The question is: what are the inputs? Until now, one of the inputs was a global instance counter. An app or developer did not have access to this value in advance, so it could not pre-calculate the address at all. With this change, you can pre-calculate the address as soon as you got the code ID. So your assumption is correct: you can't calculate instance addresses for a code that has not yet been stored successfully. However, since in contrast to Ethereum the uploading of code and the instantiation are two steps, we assume that this is a limitation that is not a problem in practice. If you know your code ID tomorrow, start precomputing addresses tomorrow.

Yes, predictable is probably more accurate.

> With this change, you can pre-calculate the address as soon as you got the code ID.
> Do you have a use case for which this is an issue?

This is exactly what I was describing: the idea of CREATE2 was to avoid a user having to depend on chain state to compute an address. If you still include the code ID in the address computation, then you basically void the whole purpose and make reproducible/predictable addresses impossible.

Here is a use case: I would like to send some tokens to a contract address not yet deployed. Neither the contract code has been uploaded nor an instance of it has been created. Including the code ID in the address defeats this use case, as you depend on global state (the current code ID counter on the target chain).

Another one: I would like to deploy a set of contracts on the testnet, and have a 1:1 mapping between the contract addresses on the testnet and mainnet. Including the code ID defeats this again, because the testnet counter != mainnet counter, right?

Another one: I would like to HARDCODE a contract address in another contract, regardless of the target chain. This address must be an address I can "create" via the "CREATE2" equivalent.

The CREATE2 EIP describes this as counterfactual interaction.

> Motivation
> Allows interactions to (actually or counterfactually in channels) be made with addresses that do not exist yet on-chain but can be relied on to only possibly eventually contain code that has been created by a particular piece of init code. Important for state-channel use cases that involve counterfactual interactions with contracts.

Instead of having the code ID as part of the contract address, perhaps we could just use the code hash. The resulting equation would depend solely on the USER itself, with no chain state involved:

hash(user + salt + hash(initmsg) + code_hash)

Using this computation, you realise that you can actually precompute an address, regardless of the target network and its state. In fact, I would be able to deploy, to the same address, the same contract on all Cosmos chains?

@webmaster128
Member

webmaster128 commented Sep 5, 2022

Good. As far as I can tell there is nothing wrong with changing code_id to checksum (the hash of the Wasm blob). Then we have

address.Module("wasm", uint8(len(checksum)) | checksum | uint8(len(creator_address)) | creator_address | uint8(len(label)) | label)

which is the same as

address.Hash("module", ascii("wasm") | 0x00 | uint8(len(checksum)) | checksum | uint8(len(creator_address)) | creator_address | uint8(len(label)) | label)

where checksum is a fixed length 32 byte value (output of sha256).
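A sketch of what this buys in practice: the address can be computed from the Wasm blob alone, before the code is stored on any chain. It follows the formula as written in this comment, with the same assumed ADR-028 hashing (`sha256(sha256("module") || "wasm" || 0x00 || key)`); `predict_address` and the sample inputs are hypothetical, and the scheme that finally shipped may differ, so verify against the real implementation before relying on it.

```python
import hashlib


def address_module(module_name: str, derivation_key: bytes) -> bytes:
    """Assumed ADR-028 module address: sha256(sha256("module") || name || 0x00 || key)."""
    typ_hash = hashlib.sha256(b"module").digest()
    payload = module_name.encode("ascii") + b"\x00" + derivation_key
    return hashlib.sha256(typ_hash + payload).digest()


def predict_address(wasm_blob: bytes, creator: bytes, label: str) -> bytes:
    """Derive the address from the blob's sha256 checksum, creator and label."""
    checksum = hashlib.sha256(wasm_blob).digest()  # fixed 32 bytes
    label_bytes = label.encode("utf-8")
    if len(creator) > 255 or len(label_bytes) > 255:
        raise ValueError("creator and label must fit a 1-byte length prefix")
    key = (
        bytes([len(checksum)]) + checksum
        + bytes([len(creator)]) + creator
        + bytes([len(label_bytes)]) + label_bytes
    )
    return address_module("wasm", key)


blob = b"\x00asm\x01\x00\x00\x00"  # placeholder bytes, not a real contract
creator = bytes(range(20))         # hypothetical creator address
testnet_address = predict_address(blob, creator, "My instance no. 1")
mainnet_address = predict_address(blob, creator, "My instance no. 1")
```

Since no code ID or instance counter enters the computation, the same blob, creator bytes and label give the same address on every chain.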

@webmaster128
Member

webmaster128 commented Sep 5, 2022

@hussein-aitlahcen thanks for all your input. Please note that due to API stability we use the label very similar to what the salt is on Ethereum (as discussed above). label should be a printable human friendly instance identifier (should look nice in explorers). But it can contain app specific components like My instance no. {counter} or My deposit id {some hashy thing}.

@hussein-aitlahcen

> @hussein-aitlahcen thanks for all your input. Please note that due to API stability we use the label very similar to what the salt is on Ethereum (as discussed above). label should be a printable human friendly instance identifier (should look nice in explorers). But it can contain app specific components like My instance no. {counter} or My deposit id {some hashy thing}.

It makes perfect sense!

@webmaster128
Member

webmaster128 commented Sep 14, 2022

In #1000 you find a follow-up discussion of this ticket. Would be good if you could have a look and get involved.

@webmaster128
Member

Please also note https://medium.com/cosmwasm/dev-note-3-limitations-of-instantiate2-and-how-to-deal-with-them-a3f946874230 for important updates on Instantiate2
