Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add codespace into the determinsticExecTxResult value #844

Open
colin-axner opened this issue May 16, 2023 · 4 comments
Open

Add codespace into the determinsticExecTxResult value #844

colin-axner opened this issue May 16, 2023 · 4 comments
Labels
enhancement New feature or request

Comments

@colin-axner
Copy link

colin-axner commented May 16, 2023

Feature Request

Summary

Add abci.ExecTxResult.Codespace to the deterministicExecTxResult return value

Problem Definition

In IBC, information is passed into an acknowledgement when a chain is receiving a packet. The receiving chain is creating the acknowledgement. The acknowledgement is then relayed to the sending chain, where an IBC application can act upon the information within the acknowledgement. The receiving application may want to encode error information into the acknowledgement to indicate that the packet was unable to be successfully processed on the receiving side.

This is a very common occurrence within interchain accounts where a transaction is being executed on behalf of an actor on another chain. If a single transaction fails, the error should be relayed back to the sending chain.

One constraint is that the acknowledgement is written into state on the receiving side in order to be verified by the sender, thus non-determinsitic information must be stripped from the error before being encoded into the acknowledgement, otherwise a chain code reach consensus failure from diverging app hashes.

Currently tendermint/cometbft only include the following transaction information within the header result hash:

  • Code (result of executing the tx)
  • Data (msg responses, failed transaction have empty data)
  • GasWanted
  • GasUsed

Thus in IBC error acknowledgement, only the executing code can be included. This results in confusing error messages:

ABCI code: 10: error handling packet: see events for details.

The error message does not indicate what codespace the error came from, making the error string useless.

Furthermore, the original design of the Code field was to use it as a condensed version of the entire error string from which clients may lookup the full value, but I believe this creates for a poor developer environment particularly in a multichain world. For example, in IBC, the code comes from the counterparty chain requiring actors to query the counterparty chain to obtain the meaning of the error. This is likely impossible for on chain actors such as smart contracts. I'm not sure the original justification for this design is applicable given the actual usage of IBC:

The slow way to do this is to just get Merkle proofs for each transaction response (e.g. the <code,data,tags> tuple that we plan to Merkle-ize into the Header). This list of leaf-proofs doesn't prove the absence of tags in transactions that are absent, so either the nano-client has to receive all the proofs, or, we need to include some sequence in the transaction bytes (this is brittle because the nano-client needs to be upgraded when the tx format changes). So the latter isn't too bad, but it's a bit annoying.

The faster way to do this in many cases is to get the whole list of tx response-codes per block, and to scan those numbers to filter. If we only have OK/error response codes, and a non-OK code meant that the transaction could be ignored, then the nano-client has to parse all OK transactions to filter, or, it's possible that there isn't enough information in the transaction bytes, and so the nano-client needs to consult the response tags to filter, and now we're back to the first problem.

This is a problem for a larger issue so I do not propose that the code be replaced by a string in the determinstic tx result of the block, but I'd like to open the discussion up in order to find a way to support a better dev UX landscape.

This issue is applicable to the non-atomic transaction feature (if error information is to be returned in the transaction response)

Proposal

Include the abci.ExecTxResult.Codespace into the determinsticExecTxResult function return. This PR introduced the Code into the determinsticExecTxResult. This ADR seems to indicate that the codespace should be included:

How the codespace is hashed into block headers (ie. so it can be queried efficiently by lite clients) is left for a separate ADR.

It seems like an oversight to me that it never was. Following the original design discussion, there is no justification for why the codespace wasn't included.

@colin-axner colin-axner added enhancement New feature or request needs-triage This issue/PR has not yet been triaged by the team. labels May 16, 2023
@sergio-mena sergio-mena self-assigned this May 25, 2023
@sergio-mena sergio-mena removed the needs-triage This issue/PR has not yet been triaged by the team. label May 30, 2023
@sergio-mena
Copy link
Contributor

sergio-mena commented May 30, 2023

Hi @colin-axner, thanks for submitting your proposal. The core idea you are presenting makes total sense. The reason we haven't done it yet is because of the breakages it would entail to existing chains: it breaks the block format. Let me explain more in detail.
The following modifications represent a change in the block format (the list is not 100% exhaustive):

  • A change in some of the fields in the block
  • A change in the hashing algorithm for any of the hashes in the block header
  • A change in the fields that are used to calculate a hash in the block header

An exception to this is a change in the method/algorithm to calculate the AppHash included in the block header. Such a change is not breaking the block format.

The block header contains field version:

  • Field block must increase whenever the block format is changed (i.e., when any of the three bullets above occur)
  • Field app must increase whenever the method/algorithm to calculate the AppHash changes.

The change proposed in this issue — adding codespace into the LastResultsHash computation — is changing the block format (the block version must increase with such a change). This means:

  • All validators and full nodes in the upgrading chain would need a coordinated upgrade (all stopping at a predefined height and restarting with the new binary)
  • Ideally, the changes to CometBFT are backwards compatible: a node that is lagging behind while running the new version is able to process and validate blocks with the old version (both hashing algorithms are present in the code)

In general, we need a more holistic solution for changing the block format so that other tools for the chain don't stop working: light clients, block explorers, IBC, etc.

Even if the change proposed here could be done relatively cleanly by following the two bullets above, we are likely to tackle it together with more involved block-format-breaking changes. We are at the early stages of coming up with a holistic solution for block format changes, called "Soft Upgrades". You can check the progress here and here.

@sergio-mena sergio-mena removed their assignment May 30, 2023
@tac0turtle
Copy link
Contributor

I tend to disagree here. I believe errors and their code spaces should not be part of consensus and it was a mistake to do it in the first place. It attempts to solve a problem that doesnt exist. We have talked with a few users, osmosis is a major one, which recommend their developers to not use code spaces as it creates possible edge cases of consensus faults.

If a module would like to get extra data into consensus i agree it should be done app side.

@sergio-mena
Copy link
Contributor

Thanks for chiming in @tac0turtle

recommend their developers to not use code spaces as it creates possible edge cases of consensus faults

But code spaces are not part of ResultsHash (that's the original reason for submitting this issue), so they can't cause consensus faults (I agree, tho, that error codes can).

@webmaster128
Copy link

I believe errors and their code spaces should not be part of consensus and it was a mistake to do it in the first place.

Do you agree on the mental model that an error is identified by <codespace>/<code>? I.e. wasm/5 and sdk/5 are two distinct errors and 5 alone is not a very valuable information.

If yes, then shouldn't we either include both or none at all? Do you suggest removing code and replace it with a did_fail boolean?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants