Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Further investigate execution #3

Closed
liamsi opened this issue Apr 5, 2020 · 17 comments
Closed

Further investigate execution #3

liamsi opened this issue Apr 5, 2020 · 17 comments
Labels
T:enhancement Type: Enhancement

Comments

@liamsi
Copy link
Member

liamsi commented Apr 5, 2020

This issue is to track changes needed (and tradeoffs) which will enable to update the AppHash (state root) before consensus on a block.

Roughly:

  • What changes are necessary to make the proposer run the full deliverTx sequence instead of checkTx?
  • Does the above introduce any problems or undesired tradeoffs
  • Do we need changes in the gas model (inside tendermint)
  • Clarify how much the SDK relies on the current tendermint execution model.

Related: tendermint/tendermint#2483

Also:

@adlerjohn
Copy link
Member

Is there any difference for this between the Go and Rust implementations of Tendermint?

@liamsi
Copy link
Member Author

liamsi commented Apr 7, 2020

The tendermint-rs repository doesn't contain a fullnode implementation yet. But the implementation aims to be compatible to the go one. (So the answer will be no unless we can merge our changes upstream of course).

I think there is another implementation in Rust by Codechain. Might be a good idea to look into that too.

@liamsi liamsi added the T:enhancement Type: Enhancement label Apr 8, 2020
@liamsi
Copy link
Member Author

liamsi commented Apr 15, 2020

High level (and unfinished) overview on state execution and what places would need changes in tendermint to execute Txs (and update the AppHash/state root) in the same block.

Problem

Why bother and change the current model?

In the current execution model, blocks are executed against the app only after they are committed. Full block verification (incl. state) always needs access to transactions of the previous block:

state(1) = InitialState
state(h+1) <- Execute(state(h), ABCIApp, block(h))

For more info on Execute see: https://docs.tendermint.com/master/spec/blockchain/blockchain.html#execution

This causes all sorts of confusion when implementing an app module on top of this logic. In the context of LazyLedger particularly but basically in any chain / application where we want to generate fraud proofs the "one off" between block height and state height causes further issues. First, (state) fraud proofs are also generated one-off, meaning their generation is also delayed one block. Second, this means light clients that want to verify a fraud proof need to wait for and download an extra block/header as they will find the updated root only there. In this model a light client that receives a (state) fraud proof for block at height h from a (full) node would need to verify that against the block h-1 as the Tx are included in the previous block.

Why changing the current model might still be undesired?

The more changes we add to the current tendermint codebase, the more lazyledger-core diverges from tendermint, the more time consuming it will be to keep up with (critical) fixes from upstream. While it is true that we will make changes to Tendermint anyways, the changes regarding the execution model might be less contained and touch not a single but several go packages.

Changes

Theoretically changing Tendermint to immediate seems trivial: during the consensus steps of a round (propose, prevote, precommit), we need to update the state root before we proceed further (IMO the earlier we do so, the better). Practically, the changes will certainly be a bit more elaborate.

Another weirdness of the current implementation is that Block execution can happen from two different places: either from the Consensus or the Blockchain Reactor. At the same time the Blockchain reactor can modify the state of the consensus reactor (NOTE: This design deserves a proper review and refactoring independent from what we are trying to achieve here...). Block execution modifies the mempool and both the mempool as well as block execution itself have read/write access to the app via ABCI.

"Seal Phase"

Once all transactions for the current height are executed, we need to update the state root (aka AppHash in tendermint). (For the lack of a better name, let's call this seal phase for now).

We need to figure out when it’s reasonable (and safe to do this). My understanding is that we could execute *before *we even propose a block (first step) and there is no reason to wait until the before the precommit. As both @ebuchman and @zmanian suggested to do this before the precommit instead. So I’m probably missing something important here...

A high level overview of the tendermint steps:

image

(Image credits: https://docs.google.com/presentation/d/1k0CbJQBHoMJkuOXdAKHFw9dcaXES_3dFYnCp1I1bm2M/edit#slide=id.g3acacab17a_0_528)

Note that all steps come with timeouts which are critical for the consensus to make progress. It is critical that these timeouts need to be generous enough that we don’t end up timing out and skipping rounds without making any real progress. Hence, we might have to increase the timeout in between steps where execution happens. IMO the block is proposed directly with all Tx executed and state root up to date. This means increasing timeout_propose. Of course this can mean the nodes might execute blocks which don’t end up being committed to (and waste valuable computing time) but I don’t think this will be a problem in practice.

Consensus Reactor

This is where the meat is. We need to teach the reactor to only propose blocks with state roots updated according to the current transactions included in that block. (See the tendermint spec and the code for more details.)

Currently, the BlockExecutor runs on Commit and updates the mempool. The executor gets called by the blockchain reactor (which is a mess, e.g. look at this). The current battle tested blockchain reactor (used in the hub) is v0 and I’ve hoped we can avoid touching it. We could peek into the refactored versions (v1 and v2) in case the changes we need to make exceed a few lines.

Replay

Need to update replay logic accordingly. Here and in a few other places where blocks are replayed.

Blockchain Reactor

  • TODO: Do we still need block execution in pool routine?

Mempool

What do we do with CheckTx?

  • TODO

Gas Model

The current Gas model relies on running CheckTx (via ABCI). Tx are added to the mempool as a mempoolTx with gasWanted set to the corresponding ResponseCheckTx’s GasWanted.

My understanding is that we’d rather want users to pay only for the gas that was actually consumed instead of refunding unused gas (see this issue).

Validator Updates

Currently validator sets are tracked in this way by tendermint: transactions that affect the validator set proposed in block H will be committed in the block H+1, and therefore block H+2 will be committed by the new val set.

image

tendermint/tendermint#2483

As state gets executed immediately, validator changes also happen immediately. This has two implications: first, we can get rid of Header.ValidatorHash as it isn’t the app’s output from the previous block anymore (and it simply represents the current validators that signed the block). The current validators (of height H) can be part of the state root.

Similarly Header.NextValidatorsHash needs to be changed to point to the validator set that will commit the next block H+1.

These changes substantially change the light-client model (see the spec for more details on the current model).

ABCI

ABCI connects the tendermint state machine with the application (this is the actual business logic state machine; e.g. in LazyLedger the app would be the PoS logic).

TODO

RPC

The light client uses 2 RPC end-points to get the signed headers and validator sets for given heights. Need to update the validator endpoint according to above changes to the validator sets tracked in the header/state:

https://docs.tendermint.com/master/rpc/#/Info/validators

This should be a very simple change as the RPC only reads

Further Note on dependencies between different components

This diagram illustrates read/write access of components to different kinds of state:

(Thanks to Sean aka @brapse for the diagram)

The problem here is that some of the read/write access above happens in go-routines and access is often guarded via mutexes. We have to be careful not to modify this in a way that could deadlock.

Voiced concerns

or you have to add a "block sealing" phase where the proposal block is updated with the
execution results before precommit, which can have subtle effects on round skipping logic

There might also be a DoS concern with the current gas model (as we have to run the Tx first).

@liamsi
Copy link
Member Author

liamsi commented Apr 16, 2020

I'm also dropping @adlerjohn's summary on how we can do fraud proofs in the current model:


Summary for keeping deferred execution:

  1. Invalid transactions (i.e. that modify the LL state) can be included in blocks, as we don't require non-proposers to immediately validate transactions.
  2. Block proposers can figure out if appropriate fees are paid, as fee payment is a predicate.
  3. In block n, transactions are included. In block n+1, the intermediate state roots for block n are included.
  4. If a transaction is invalid, then the pre- and post- intermediate state roots should be identical (i.e. the transactions is a no-op).
  5. If a transaction is invalid and the pre- and post- intermediate state roots aren't identical, then that's an invalid block, and provable with a fraud proof.

Since transactions and intermediate state roots aren't interleaved, but rather in separate reserved namespaces in the NMT, it's trivial to just shift which block the intermediate state roots are for.

@liamsi liamsi added this to To do in LazyLedger MVP via automation Apr 22, 2020
@liamsi liamsi moved this from To do to Low-Prio Backlog in LazyLedger MVP Apr 22, 2020
@adlerjohn
Copy link
Member

Transcluding essence of comment from other discussion thread:

Immediate execution is needed for certain fee models, including Ethereum-style unused-gas refunds and in-protocol non-proportional fee burning.

  1. Gas refunds: block producers don't know how much gas a transaction will actually pay unless they execute it immediately.
  2. Fee burning: every byte in a block and every operation needs to be paid for with burned fees. This means invalid transactions can't be included in blocks, which by definition precludes deferred execution.

Fee burning is especially important, and seeing a rise in usage in modern blockchain designs because it provides a sink for coins, instead of only the sources that the original Bitcoin design has (new coin issuance with block rewards). Sinks are invaluable as they create intrinsic demand for the coin (much like how taxes cannot be supplanted by printing money).

@zmanian
Copy link
Contributor

zmanian commented Jun 15, 2020

I'm 99% sure that Libra does immediate execution and invests a fairly large amount of code speculating execution states if multiple competing proposals exist at any height.

If we are executing blocks then ABCI will need the ability to potential abort an execution and revert state.

@ebuchman
Copy link
Contributor

ebuchman commented Jun 16, 2020

Thanks for this write up! Just a couple thoughts/questions:

which is a mess, e.g. look at this

LOL this is why we rewrote the blockchain reactor. Probably best to focus on v2 ...

we can get rid of Header.ValidatorHash as it isn’t the app’s output from the previous block anymore (and it simply represents the current validators that signed the block). The current validators (of height H) can be part of the state root.

I don't think we should get rid of Header.ValidatorHash. We want Tendermint light clients to work independent of any application, so saying "it can be part of the state root" doesn't help because then the pure Tendermint light client needs to be able to read arbitrary application roots.

Immediate execution is needed for certain fee models

Can't all of this still be done, just off-by-one?

Correct proposers should never include invalid txs in the first place (if CheckTx is written correctly), so apps can just punish proposers that do by slashing. And otherwise can't fee burning and gas refunds just happen as part of execution after the block is committed?

@adlerjohn
Copy link
Member

Can't all of this still be done, just off-by-one?

Not in an incentive-compatible manner.

In the case of gas refunds, the block proposer doesn't know how much gas a transaction will actually use (and thus, how much they'll actually get paid) unless they execute it immediately. So checking for transaction validity (does this user have enough balance to pay for the transaction's gas limit) is sufficient for safety but not incentive-compatibility.

In the case of fee burning (or invalid transactions in general), block proposers can't include two transactions from the same account in the same block without executing them immediately. So proposers that do immediate execution have access to more transactions to potentially include in their blocks (i.e. more profit), and so all proposers will do immediate execution. Since every validator can be a proposer eventually, every validator will therefore do immediate execution all the time too. Not accounting for this in-protocol means smaller validators that can't afford to do immediate execution will eventually get pushed out by bigger validators.

@liamsi
Copy link
Member Author

liamsi commented Jun 16, 2020

LOL this is why we rewrote the blockchain reactor. Probably best to focus on v2 ...

When I wrote this issue V2 didn't seem ready (I even had a branch which deleted the other reactors; but back then tendermint didn't seem to make blocks 😬 I was in touch with @erikgrinaker, can't remember what the issue was). We are planning to switch as soon as possible though: #13

I don't think we should get rid of Header.ValidatorHash. We want Tendermint light clients to work independent of any application, so saying "it can be part of the state root" doesn't help because then the pure Tendermint light client needs to be able to read arbitrary application roots.

I guess in the light of using tendermint (consensus) for LL where (tendermint) application state is not arbitrary in the sense that we optimize for one minimal core application (and devs will write their application logic client side / using some exec environment). But that also means that we can't use the tendermint light client as is (and just add a data availability) but need to re-write some of the core logic there too.

@ebuchman
Copy link
Contributor

In the case of gas refunds, the block proposer doesn't know how much gas a transaction will actually use (and thus, how much they'll actually get paid) unless they execute it immediately.

Does this really matter though? If it only costs them what's used, and they only get paid for that, isn't it fine? As long as the block gas limit is also based on what's actually used, which is maybe the real issue here: cosmos/cosmos-sdk#2150

Eg. for ETH miners, my understanding is they take whats in their mempool, sort by gasprice, and execute until they get to the gas limit. But they wouldn't like execute a transaction, realize the total gas used is too low, and then roll it back and try another one.

In the case of fee burning (or invalid transactions in general), block proposers can't include two transactions from the same account in the same block without executing them immediately.

Not sure I understand. We currently support multiple txs from the same account in a single block by making sure CheckTx is smart enough for it. Why would fee burning be any different? Can't we ensure in CheckTx that users have enough balance to send two txs with full fee burning? Or is there some other sort of dependency between the txs that could invalidate the second tx only after full execution?

@ebuchman
Copy link
Contributor

Also, one possible but maybe hacky approach at least to these fee issues is by doing full execution in CheckTx (?!)

@erikgrinaker
Copy link
Contributor

When I wrote this issue V2 didn't seem ready (I even had a branch which deleted the other reactors; but back then tendermint didn't seem to make blocks 😬 I was in touch with @erikgrinaker, can't remember what the issue was)

v2 had a race condition where it tried to switch to the consensus reactor before it had been hooked up to the switch, and then just silently gave up on everything. It also had an issue with spinning on closed channels, using 100% CPU. The former has been fixed on master by the changes introduced by state sync, but a separate fix hasn't been backported to 0.33.x. The channel spinning has been backported though.

@adlerjohn
Copy link
Member

#3 (comment)

Does this really matter though? If it only costs them what's used, and they only get paid for that, isn't it fine?

It's not, actually! Consider the case where there are three txs:

gas price gas limit
tx_1 1000 100
tx_2 900 50
tx_3 900 50

with a block gas limit of 100. Without executing any transactions, the first one looks like it'd be the most profitable to include. But if that transaction only used 1 gas while the others used the full 50, then clearly it would have been better to make a block with the last two transactions.

Eg. for ETH miners, my understanding is they take whats in their mempool, sort by gasprice, and execute until they get to the gas limit.

That's right, and you can't do this unless you immediately execute transactions.

Can't we ensure in CheckTx that users have enough balance to send two txs with full fee burning? Or is there some other sort of dependency between the txs that could invalidate the second tx only after full execution?

The first transaction could empty the user's account, making the second one invalid.

#3 (comment)

Also, one possible but maybe hacky approach at least to these fee issues is by doing full execution in CheckTx

I mean, if you just make CheckTx do full execution then that does kind of get you close to immediate execution, sure. There's still the issue of the appHash / stateRoot being off by one.

@ebuchman
Copy link
Contributor

The first transaction could empty the user's account, making the second one invalid.

Usually CheckTx is designed to check things that could decrement the users balance. So in principle this kind of thing shouldn't happen, but of course in practice if control over account funds are eg given to a contract then all bets are off.

I mean, if you just make CheckTx do full execution then that does kind of get you close to immediate execution, sure. There's still the issue of the appHash / stateRoot being off by one.

Good to know we at least have a stop gap. Fascinating to explore these boundaries of ABCI. The current design has lasted almost 5 years now, might be time to really rethink things for the next 5.

As for the off-by-one, that's more of a convenience issue right now since clients have to download an extra block? Or is there something more fundamental?

@adlerjohn
Copy link
Member

adlerjohn commented Jun 16, 2020

As for the off-by-one, that's more of a convenience issue right now since clients have to download an extra block? Or is there something more fundamental?

I have to think about this some more, but I don't see any inherent issue other than light clients having to wait for and download an extra block (and, you know, being hacky, which is a big issue 😂). Technically, if you're using state transition fraud proofs, then the last intermediate state root is the state root after applying all the transactions in the block, so light clients can just use that instead of the Tendermint appHash.

One issue might be timing. The current timeouts between each stage of the Tendermint consensus process assume that transaction execution is pipelined. But I guess that's a system parameter, so not tied to Tendermint itself.

@liamsi
Copy link
Member Author

liamsi commented Nov 3, 2020

ref: tendermint/tendermint#7898

@liamsi liamsi removed this from Low-Prio Backlog in LazyLedger MVP Mar 7, 2021
@liamsi liamsi added this to To do in Mainnet and beyond via automation Mar 16, 2021
@liamsi
Copy link
Member Author

liamsi commented Aug 16, 2021

We decided this is out of scope at least until abci++ lands in tendermint for real. We can reconsider this next year.

@liamsi liamsi closed this as completed Aug 16, 2021
Mainnet and beyond automation moved this from To do to Done Aug 16, 2021
staheri14 added a commit that referenced this issue Jan 23, 2024
…heck (#1186)

This PR addresses the vulnerabilities identified by govulncheck in [PR
#1179](https://github.com/celestiaorg/celestia-core/actions/runs/7629759074/job/20786518441?pr=1179).
It upgrades the affected modules to the versions recommended by
govulncheck.
```
Vulnerability #1: GO-2024-2466
    Denial of service in github.com/go-git/go-git/v5 and
    gopkg.in/src-d/go-git.v4
  More info: https://pkg.go.dev/vuln/GO-2024-2466
  Module: github.com/go-git/go-git/v5
    Found in: github.com/go-git/go-git/v5@v5.5.1
    Fixed in: github.com/go-git/go-git/v5@v5.11.0
    Example traces found:
Error:       #1: test/e2e/generator/generate.go:402:36: generator.gitRepoLatestReleaseVersion calls git.PlainOpenWithOptions, which calls filesystem.NewStorage
Error:       #2: test/e2e/generator/generate.go:402:36: generator.gitRepoLatestReleaseVersion calls git.PlainOpenWithOptions
Error:       #3: test/e2e/generator/generate.go:407:30: generator.gitRepoLatestReleaseVersion calls git.Repository.TagObjects

Vulnerability #2: GO-2024-2456
    Path traversal and RCE in github.com/go-git/go-git/v5 and
    gopkg.in/src-d/go-git.v4
  More info: https://pkg.go.dev/vuln/GO-2024-2456
  Module: github.com/go-git/go-git/v5
    Found in: github.com/go-git/go-git/v5@v5.5.1
    Fixed in: github.com/go-git/go-git/v5@v5.11.0
    Example traces found:
Error:       #1: test/e2e/generator/generate.go:402:36: generator.gitRepoLatestReleaseVersion calls git.PlainOpenWithOptions, which calls filesystem.NewStorage
Error:       #2: test/e2e/generator/generate.go:402:36: generator.gitRepoLatestReleaseVersion calls git.PlainOpenWithOptions
Error:       #3: test/e2e/generator/generate.go:407:30: generator.gitRepoLatestReleaseVersion calls git.Repository.TagObjects

=== Informational ===

There are 2 vulnerabilities in modules that you require that are
neither imported nor called. You may not need to take any action.
See https://pkg.go.dev/golang.org/x/vuln/cmd/govulncheck for details.

Vulnerability #1: GO-2024-2453
    Timing side channel in github.com/cloudflare/circl
  More info: https://pkg.go.dev/vuln/GO-2024-2453
  Module: github.com/cloudflare/circl
    Found in: github.com/cloudflare/circl@v1.3.1
    Fixed in: github.com/cloudflare/circl@v1.3.7

Vulnerability #2: GO-2023-[17](https://github.com/celestiaorg/celestia-core/actions/runs/7629759074/job/20786518441?pr=1179#step:5:18)65
    Leaked shared secret and weak blinding in github.com/cloudflare/circl
  More info: https://pkg.go.dev/vuln/GO-[20](https://github.com/celestiaorg/celestia-core/actions/runs/7629759074/job/20786518441?pr=1179#step:5:21)23-1765
  Module: github.com/cloudflare/circl
    Found in: github.com/cloudflare/circl@v1.3.1
    Fixed in: github.com/cloudflare/circl@v1.3.3

Your code is affected by 2 vulnerabilities from 1 module.

Share feedback at https://go.dev/s/govulncheck-feedback.
exit status 3
make: *** [Makefile:254: vulncheck] Error 1
Error: Process completed with exit code 2.
```
staheri14 added a commit that referenced this issue Mar 11, 2024
In order to fix the go vulnerabilities that are fixed in the new patch:
```
Vulnerability #1: GO-2024-2610
    Errors returned from JSON marshaling may break template escaping in
    html/template
  More info: https://pkg.go.dev/vuln/GO-2024-2610
  Standard library
    Found in: html/template@go1.22
    Fixed in: html/template@go1.22.1
    Example traces found:
      #1: test/fuzz/rpc/jsonrpc/server/handler.go:30:15: server.Fuzz calls http.ServeMux.ServeHTTP, which eventually calls template.Template.Execute
      #2: rpc/jsonrpc/server/http_server.go:256:15: server.maxBytesHandler.ServeHTTP calls http.HandlerFunc.ServeHTTP, which eventually calls template.Template.ExecuteTemplate

Vulnerability #2: GO-2024-2600
    Incorrect forwarding of sensitive headers and cookies on HTTP redirect in
    net/http
  More info: https://pkg.go.dev/vuln/GO-2024-2600
  Standard library
    Found in: net/http@go1.22
    Fixed in: net/http@go1.22.1
    Example traces found:
      #1: rpc/jsonrpc/client/http_json_client.go:213:34: client.Client.Call calls http.Client.Do
      #2: libs/cli/setup.go:89:26: cli.Executor.Execute calls cobra.Command.Execute, which eventually calls http.Client.Get
      #3: p2p/upnp/upnp.go:205:20: upnp.getServiceURL calls http.Get

Vulnerability #3: GO-2024-2599
    Memory exhaustion in multipart form parsing in net/textproto and net/http
  More info: https://pkg.go.dev/vuln/GO-2024-2599
  Standard library
    Found in: net/textproto@go1.22
    Fixed in: net/textproto@go1.22.1
    Example traces found:
      #1: rpc/jsonrpc/server/http_server.go:62:16: server.Serve calls http.Server.Serve, which eventually calls textproto.Reader.ReadLine
      #2: rpc/jsonrpc/server/http_server.go:62:16: server.Serve calls http.Server.Serve, which eventually calls textproto.Reader.ReadMIMEHeader

Vulnerability #4: GO-2024-2598
    Verify panics on certificates with an unknown public key algorithm in
    crypto/x509
  More info: https://pkg.go.dev/vuln/GO-2024-2598
  Standard library
    Found in: crypto/x509@go1.22
    Fixed in: crypto/x509@go1.22.1
    Example traces found:
      #1: libs/autofile/group.go:479:30: autofile.GroupReader.Read calls bufio.Reader.Read, which eventually calls x509.Certificate.Verify

Your code is affected by 4 vulnerabilities from the Go standard library.

```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T:enhancement Type: Enhancement
Projects
No open projects
Development

No branches or pull requests

5 participants