gRPC Tx simulation sometimes fails with account sequence mismatch #11597
Comments
IIRC, simulation uses the CheckTx state that is built up during a block prior to being committed. High volume here, at least per (relayer) account, can indicate a lag between the state in CheckTx and what is ultimately committed. I don't have a good answer to provide here in terms of remediation, but the whole "nonce mismatch" problem with high-volume accounts has been well reported over time. The only thing I can suggest is to wait a block or two before submitting txs, or to use multiple accounts round-robin style.
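The round-robin suggestion above can be sketched as follows. This is only a minimal illustration, not code from the SDK or any relayer: the `roundRobin` type and the account names are hypothetical, and a real relayer would also need to track a separate sequence number per account.

```go
package main

import (
	"errors"
	"fmt"
)

// roundRobin cycles through a fixed set of signer accounts.
// Hypothetical helper for illustration; not part of the SDK or hermes.
type roundRobin struct {
	accounts []string
	next     int
}

// newRoundRobin rejects an empty account list so pick never indexes out of range.
func newRoundRobin(accounts []string) (*roundRobin, error) {
	if len(accounts) == 0 {
		return nil, errors.New("need at least one account")
	}
	return &roundRobin{accounts: accounts}, nil
}

// pick returns the next account and advances the cursor, wrapping at the end.
func (r *roundRobin) pick() string {
	a := r.accounts[r.next]
	r.next = (r.next + 1) % len(r.accounts)
	return a
}

func main() {
	rr, err := newRoundRobin([]string{"relayer-a", "relayer-b", "relayer-c"})
	if err != nil {
		panic(err)
	}
	// Each tx is signed by a different account, so consecutive txs
	// never depend on the same account's sequence.
	for i := 0; i < 4; i++ {
		fmt.Println(rr.pick())
	}
}
```

Spreading txs across accounts lowers the per-account rate, which is the condition under which the mismatch is reported here.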
thanks @alexanderbez
But if that's the case wouldn't we see errors during execution (
I tried to find related issues but wasn't able to. Could you point me to any? Maybe it will help me better understand the root cause.
Could you please clarify? I don't understand how it would help.
Both
Sorry for the late update. After more testing, experimenting with different changes in hermes, and talking to @marbar3778, I opened informalsystems/hermes#2249 with my understanding and a proposed workaround that seems to work for hermes.
This is extremely trivial and easy:
diff --git a/x/auth/ante/sigverify.go b/x/auth/ante/sigverify.go
index c77632a7d0..2fb07dda4f 100644
--- a/x/auth/ante/sigverify.go
+++ b/x/auth/ante/sigverify.go
@@ -265,7 +265,7 @@ func (svd SigVerificationDecorator) AnteHandle(ctx sdk.Context, tx sdk.Tx, simul
}
// Check account sequence number.
- if sig.Sequence != acc.GetSequence() {
+ if sig.Sequence != acc.GetSequence() && !simulate {
@marbar3778 is this what you had in mind?
Another possibility discussed with @marbar3778, @sergio-mena and @adizere is to have Simulate gRPC return an error during recheck Tx. Would require a new abci flag to signal the last rechecked Tx.
Mhhh simulation is just an ABCI query. What does that have to do with the ReCheckTx flow?
We want to be able to tell whether a simulation error happens while the mempool is doing recheck tx. hermes caches the account sequence and is not aware of when recheck happens. Currently, when we get account mismatch errors during simulation, we dig through the error log and assume that an account mismatch error with got > expected is caused by this scenario, which may not be the case.
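The "got > expected" heuristic described above could look roughly like this. It is a sketch under assumptions: the message format matched by the regular expression is an assumption about the SDK's error text, `parseMismatch` and `cachedAhead` are hypothetical names, and the sample string in `main` is made up for illustration rather than taken from a real log.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// mismatchRe matches the assumed wording of the sequence mismatch error.
var mismatchRe = regexp.MustCompile(`account sequence mismatch, expected (\d+), got (\d+)`)

// mismatch holds the two sequence numbers extracted from an error string.
type mismatch struct {
	Expected uint64 // sequence the node expected
	Got      uint64 // sequence the client signed with
}

// parseMismatch extracts the sequences, or returns false for any other error text.
func parseMismatch(log string) (mismatch, bool) {
	m := mismatchRe.FindStringSubmatch(log)
	if m == nil {
		return mismatch{}, false
	}
	expected, err1 := strconv.ParseUint(m[1], 10, 64)
	got, err2 := strconv.ParseUint(m[2], 10, 64)
	if err1 != nil || err2 != nil {
		return mismatch{}, false
	}
	return mismatch{Expected: expected, Got: got}, true
}

// cachedAhead reports the "got > expected" case: the client's cached
// sequence is ahead of what the node currently sees.
func (m mismatch) cachedAhead() bool { return m.Got > m.Expected }

func main() {
	// Illustrative input only, not a real log line.
	sample := "account sequence mismatch, expected 1512, got 1513: incorrect account sequence"
	if m, ok := parseMismatch(sample); ok {
		fmt.Println(m.Expected, m.Got, m.cachedAhead())
	}
}
```

As the comment says, this only classifies the error; it cannot tell whether a recheck was actually in progress.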
So you're saying that you think Hermes is getting the sequence mismatch error because the Simulation query happens concurrently with ReCheckTx for that same tx? If so, I have no idea how the Simulation query would know whether ReCheckTx is happening concurrently for that tx... or even in general?
Not for the same Tx, a new one. When hermes simulates a
So I brought up an approach during today's SDK community call, and I think it was confirmed that it would work as a reasonable solution. The idea is to bypass the nonce check during simulation. I've pushed a demo PR for experimentation purposes HERE. Please test it out and lmk what you think. If it works, then we can push it through.
Thanks @alexanderbez. This partially solves the problem. The issue I see is the case where simulation failures (without your changes) are legitimate, for example when the same wallet has been used outside the relayer. In this case we get a simulation error of type
I did run some tests with your PR and I do see fewer simulation errors. I will do more testing, but in summary it is better except in the cases above. We should probably also check whether this breaks hermes, other relayers, or other user flows in other ways.
solved with #18641
Great work SDK team! We'll be tracking integration in Hermes against this feature in informalsystems/hermes#3763.
You guys will love this feature I think. It should make relayers drastically simpler IMO
Summary of Bug
gRPC Tx simulation sometimes fails with account sequence mismatch. This is seen with a higher number of txs spanning multiple blocks.
Version
v0.45.1
(running gaia v7.0.0)
Steps to Reproduce
To reproduce, I run the hermes relayer with a high number of transactions. In the log below, 526 IBC messages are generated. Hermes bundles 15 per transaction and performs a tx simulate followed by broadcast_tx_sync. Normally, if simulation fails, hermes breaks the loop, but I modified it here to continue. The error in this run is:
But then the next simulate seems perfectly fine; the simulate for seq 1513 works fine.
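For a sense of scale, bundling 526 messages at 15 per transaction gives 36 transactions (35 full ones plus a final one with a single message), each consuming one account sequence. A minimal sketch of that batching, with a hypothetical `chunk` helper that is not taken from hermes:

```go
package main

import "fmt"

// chunk splits msgs into consecutive batches of at most size elements.
// Hypothetical helper for illustration; hermes's actual batching may differ.
func chunk(msgs []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var batches [][]string
	for start := 0; start < len(msgs); start += size {
		end := start + size
		if end > len(msgs) {
			end = len(msgs)
		}
		batches = append(batches, msgs[start:end])
	}
	return batches
}

func main() {
	// Placeholder message names; only the count matters here.
	msgs := make([]string, 526)
	for i := range msgs {
		msgs[i] = fmt.Sprintf("msg-%d", i)
	}
	batches := chunk(msgs, 15)
	fmt.Println(len(batches), len(batches[len(batches)-1]))
}
```

With that many sequential txs from one account, the run necessarily spans several blocks, which matches the conditions under which the mismatch shows up.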
The symptom is that the simulation context sometimes "forgets" previous calls but recovers quickly after. Not sure if there are some caching glitches (maybe around block boundary) or we are using the gRPC wrongly. If the former I expect this to be reproducible with any type of transactions.
In this run:
Below are more logs up to acc seq 1513 (all the rest after were fine)