Nonce calculation is broken for private networks #1999

Open
iwooden opened this issue Aug 31, 2017 · 65 comments

@iwooden

commented Aug 31, 2017

Currently, nonce calculation is broken for private networks. The main problem is that transaction history is retained for an account even after switching the RPC endpoint to a new network. This can cause the nonce on a new network to be too high, resulting in Metamask transactions never going through.

To reproduce:

  1. Get access to two different private network RPC endpoints.
  2. Create a new account/address in Metamask, with no previous transaction history.
  3. Connect Metamask to the first private endpoint (select "Custom RPC" from the network selection dropdown, enter the RPC URL).
  4. Submit and sign an Ethereum transaction. Note the transaction entry in the transaction history section.
  5. Connect Metamask to the second private endpoint. Note that the transaction entry is still in the transaction history section.
  6. Submit and sign an Ethereum transaction. Note that the transaction hangs in a "pending" state in the Metamask UI.
  7. If you have access to the geth console for one of the nodes servicing the second RPC endpoint, check the txpool. Note that the transaction submitted was submitted with a nonce of 0x1, when the transaction count for the address in question is 0. The transaction will never go through.

I can see in app/scripts/lib/nonce-tracker.js that you're taking the max value between the locally calculated nonce and the transaction count for the address in question. This works for public/test networks where the expected nonce for an account can never go down, but that assumption doesn't hold for private networks when you can switch from a chain where an account's transaction count is 8 to one where the transaction count is 0.
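The failure mode described above can be sketched in a few lines. This is a hypothetical simplification, not the actual code in nonce-tracker.js; the function name `nextNonce` and its arguments are made up for illustration.

```javascript
// Hypothetical sketch, NOT the real nonce-tracker.js logic.
// Picks the next nonce as the higher of the node's transaction count
// and the nonce implied by locally stored transaction history.
function nextNonce(networkTxCount, localTxNonces) {
  const localNext = localTxNonces.length === 0
    ? 0
    : Math.max(...localTxNonces) + 1;
  return Math.max(networkTxCount, localNext);
}

// One transaction (nonce 0) was sent on the first private chain.
const localHistory = [0];

// Same chain: the node reports a count of 1, so nonce 1 is correct.
const sameChain = nextNonce(1, localHistory);

// A different chain that shares the stored history: the node reports 0,
// but the stale local history still pushes the nonce up to 1.
const freshChain = nextNonce(0, localHistory);

console.log(sameChain, freshChain);
```

On the second chain the expected nonce is 0, so a transaction sent with nonce 1 stays pending, which matches step 7 of the reproduction.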

One possible fix is clearing the "Private Network" transaction history when switching RPC endpoints. This isn't totally ideal, but does result in the correct nonce being calculated, and doesn't require you to redo any of the current nonce calculation functions.

  • Expected Behavior: Transactions go through on a new private network
  • Actual Behavior: Transactions are submitted with nonces that are too high
  • Browser Used: Chrome, Metamask version 3.9.11
  • Operating System Used: Windows 10
@szerintedmi

commented Sep 3, 2017

I'm struggling with the same issue while testing.
Have you found a workaround? Even restarting the browser doesn't help sometimes.

@danfinlay

Contributor

commented Sep 5, 2017

Sorry for neglecting this over the long weekend, I'll be working to fix this soon.

@iwooden

Author

commented Sep 5, 2017

No problem Dan, thanks for taking the time. I investigated a little further, and it looks like this issue has to do with network ID specification for private networks.

Account history is tied to network ID. So, when you switch to a private network with a different network ID, you will get a fresh account with the correct nonce. However, if you switch to a private network with the same network ID, the account history is the same and the nonce mismatch will occur.

So one workaround for right now is to ensure that the various private networks you interact with have different network IDs. However, it would still be really nice to have the option to delete the account's history from the UI, forcing the correct nonce calculation.
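A rough sketch of why two chains with the same network ID end up sharing history, and what a "delete history" option would do. The store and the function names here are hypothetical and only mirror the behavior described in this comment.

```javascript
// Hypothetical in-memory store, for illustration only: history is keyed
// by network ID, not by RPC endpoint.
const historyByNetworkId = new Map();

function recordTx(networkId, tx) {
  const list = historyByNetworkId.get(networkId) || [];
  list.push(tx);
  historyByNetworkId.set(networkId, list);
}

function historyFor(networkId) {
  return historyByNetworkId.get(networkId) || [];
}

// The "delete the account's history" option requested above.
function clearHistory(networkId) {
  historyByNetworkId.delete(networkId);
}

// A transaction is sent on the first private chain (network ID 999).
recordTx("999", { nonce: 0 });

// A second chain that also uses ID 999 sees the old history...
const sameIdCount = historyFor("999").length;
// ...while a chain with a different ID starts fresh.
const otherIdCount = historyFor("888").length;

clearHistory("999");
const afterClear = historyFor("999").length;

console.log(sameIdCount, otherIdCount, afterClear);
```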

danfinlay added a commit that referenced this issue Sep 5, 2017

@ghost ghost assigned danfinlay Sep 5, 2017

@ghost ghost added the in progress label Sep 5, 2017

@danfinlay

Contributor

commented Sep 5, 2017

Oh, I may have misunderstood the nature of this problem. I think private networks should be responsible for ensuring that they have unique IDs, and we shouldn't work too hard to ensure identically-identified networks both work with MetaMask.

@szerintedmi

commented Sep 5, 2017

My guess is that, in my case, the problem is not with the network id but the fact that I use the same accounts for different networks while testing.

  1. I execute a tx with account 0x1.. on privatechain with networkid 999 - works fine
  2. I stop private chain and launch testrpc with networkid 888
  3. Execution of a tx with the same account 0x1.. fails on testrpc: invalid nonce. The usual MetaMask voodoo (changing the network back and forth) doesn't help, and neither does a browser restart.

@danfinlay danfinlay removed their assignment Sep 11, 2017

@danfinlay

Contributor

commented Sep 11, 2017

My guess is that, in my case, the problem is not with the network id but the fact that I use the same accounts for different networks while testing.

Lots of people do that, so it seems unlikely that if this were the bug, it would affect so few people.

I followed your reproduction steps (albeit with testrpc on both instances), and had no problem.

To clarify: You need to specify a distinct networkId and distinct chainId for MetaMask to identify a distinct network. Could you try that and let me know if it works, @szerintedmi?

@szerintedmi

commented Sep 11, 2017

I retried and I still had the issue with one of my accounts. All other accs work.
(on MetaMask v3.9.13)

I played a bit and I have a workaround.

That's what I receive:

popup.js:83532 [ethjs-rpc] rpc error with payload {"id":9271578721185,"jsonrpc":"2.0","params":["0xf893198504a817c800831e848094f99564a5786fedef72ad45a7c85c3e7c9394522a880b1eb7ea25a00000a41509c8cf00000000000000000000000000000000000000000000000000000000000000008207f2a0666e7d3e499ee4a7cec79b1d112bcb6ded924a2dc2b35d246275f50030b3c9cda0445ebcf4e72e1f37280d3521cc4b22c1c93dcdcb9562afdcbde6bc38b8a45b57"],"method":"eth_sendRawTransaction"} Error: Error: the tx doesn't have the correct nonce. account has nonce of: 24 tx has nonce of: 25
    at runCall (/usr/local/lib/node_modules/ethereumjs-testrpc/build/cli.node.js:69351:10)
    at /usr/local/lib/node_modules/ethereumjs-testrpc/build/cli.node.js:11327:24
    at replenish (/usr/local/lib/node_modules/ethereumjs-testrpc/build/cli.node.js:8420:17)
    at iterateeCallback (/usr/local/lib/node_modules/ethereumjs-testrpc/build/cli.node.js:8405:17)
    at /usr/local/lib/node_modules/ethereumjs-testrpc/build/cli.node.js:8380:16
    at /usr/local/lib/node_modules/ethereumjs-testrpc/build/cli.node.js:11332:13
    at /usr/local/lib/node_modules/ethereumjs-testrpc/build/cli.node.js:64434:16
    at replenish (/usr/local/lib/node_modules/ethereumjs-testrpc/build/cli.node.js:64381:25)
    at /usr/local/lib/node_modules/ethereumjs-testrpc/build/cli.node.js:64390:9
    at eachLimit (/usr/local/lib/node_modules/ethereumjs-testrpc/build/cli.node.js:64314:36)
(anonymous) @ popup.js:83532

When I send a tx without MetaMask it works.

It seems the nonce increases with every tx I send from the non MetaMask window:

Error: the tx doesn't have the correct nonce. account has nonce of: 24 tx has nonce of: 29
Error: the tx doesn't have the correct nonce. account has nonce of: 26 tx has nonce of: 29

Once the nonce reaches the MetaMask nonce I can send tx again through MetaMask.

@danfinlay

Contributor

commented Sep 11, 2017

I'm very tempted to think that the one affected account is having problems because it specifically had transactions sent on a private chain with identical IDs, just because that's the simplest explanation I can think of.

If that's not it, I'd love to cook up an edge case for our nonce-tracker that it's failing on. You may have guessed that when we compose a transaction, we try to account for locally pending transactions, and so it's possible that on a new private chain with the same ID as another, we respect the highest nonce between the network-provided one and the locally-pending-tx derived one, and increment from there.

Let me know if you come up with any theories on this, I'm chipping away on another issue that I understand at the moment, but would be happy to fix any incorrect behavior here if we could identify it.

@atomical

commented Oct 8, 2017

This is happening to me too. What's the alternative for using Truffle?

@okwme

commented Oct 13, 2017

happens to me from time to time. no idea why. i uninstall and reinstall metamask to fix it.

TripleSpeeder added a commit to TripleSpeeder/StandingOrderDapp that referenced this issue Oct 18, 2017

Don't set fixed networkID for local testrpc testing. Metamask will not get nonce tracking right with a fresh running testrpc when previous transactions had been performed with the same account and networkid. See discussion at MetaMask/metamask-extension#1999.

@aktary

commented Oct 24, 2017

@danfinlay you said

To clarify: You need to specify a distinct networkId and distinct chainId for MetaMask to identify a distinct network.

How does one set a new chain id?

I'm having this problem too. Using the same snapshot and account locally and on a dev server, the nonces get out of sync and I can't execute transactions on the local testrpc now.

@danfinlay

Contributor

commented Oct 24, 2017

How does one set a new chain id?

It is dependent on the client you are using. I believe the flag is --chainId on geth, although they haven't documented this feature.

@onetom

commented Nov 13, 2017

I just got the same error message:

Error: the tx doesn't have the correct nonce. account has nonce of: 0 tx has nonce of: 1
   at runCall (/usr/local/lib/node_modules/truffle/build/chain.bundled.js:60148:10)
   at /usr/local/lib/node_modules/truffle/build/chain.bundled.js:12311:24
   at replenish (/usr/local/lib/node_modules/truffle/build/chain.bundled.js:9404:17)

I'm using

  • MetaMask 3.12.0
  • Chrome 64.0.3265.0 (Official Build) canary (64-bit)
  • truffle develop chain v4.0.1 with the hardwired "candy maple cake ..." mnemonic

If I understand correctly, making a couple of transactions against an ephemeral chain and then restarting that chain will lead to this situation.

Reinstalling the MetaMask extension solved the issue "of course".

@danfinlay

Contributor

commented Nov 17, 2017

There are two different issues people are reporting here:

  1. They are using private blockchains, and need to use the chainId parameter, so that MetaMask's EIP 155 compatibility works with it correctly.

  2. People are developing against a local blockchain, and they reset the blockchain while MetaMask is pointing at it. In this case, you can work around the problem by switching to another network and back again, no reinstallation needed.

@robmyers

commented Nov 17, 2017

To move this off of Twitter. :-) Switching between networks isn't working for me with Truffle 4's truffle develop (which uses the same mnemonic each time) and latest Metamask in either Chromium 62 or Firefox 57 on Debian Stretch. I may be doing it wrong but I can't work out how.

An example of a failed transaction in Firefox 57 with MetaMask 3.12.0 is:

Error: [ethjs-rpc] rpc error with payload {"id":9794961516034,"jsonrpc":"2.0","params":["0xf8ac088504a817c80083012c7e94345ca3e014aaf5dca488057592ee47305d9b3e1080b844a9059cbb000000000000000000000000f17f52151ebef6c7334fad080c5704d77216b73200000000000000000000000000000000000000000000000000000000000001f48222e2a06dee046890a2f3476238691be9bced035939f1c2f3e9d71dc585719412818d08a05a3c71c9227723b4321ac44e3a013a3d6a6907712e63dfa81d98739bf604a145"],"method":"eth_sendRawTransaction"} Error: Error: the tx doesn't have the correct nonce. account has nonce of: 4 tx has nonce of: 8
    at runCall (/usr/lib/node_modules/truffle/build/chain.bundled.js:60148:10)
    at /usr/lib/node_modules/truffle/build/chain.bundled.js:12311:24
    at replenish (/usr/lib/node_modules/truffle/build/chain.bundled.js:9404:17)
    at iterateeCallback (/usr/lib/node_modules/truffle/build/chain.bundled.js:9389:17)
    at /usr/lib/node_modules/truffle/build/chain.bundled.js:9364:16
    at /usr/lib/node_modules/truffle/build/chain.bundled.js:12316:13
    at /usr/lib/node_modules/truffle/build/chain.bundled.js:55231:16
    at replenish (/usr/lib/node_modules/truffle/build/chain.bundled.js:55178:25)
    at /usr/lib/node_modules/truffle/build/chain.bundled.js:55187:9
    at eachLimit (/usr/lib/node_modules/truffle/build/chain.bundled.js:55111:36)

@danfinlay

Contributor

commented Nov 17, 2017

Reviewing the comments in here, this issue is actually a little bigger than what I was suggesting for switching networks and back again. Sorry about the skim.

The problem here is that MetaMask calculates nonces locally on a per-network basis. That means if you connect it to "the same network" (by ID) on two different endpoints, MetaMask will currently assume these are the same network, use its same history of successful transactions, check the nonce everywhere it can, assume the current node is behind (since MetaMask is aware of a newer tx), and use the latest tx it knows of to calculate the nonce.

Current Causes

What we need here is a way of detecting a new network, even when all the signs indicate it's the same network. Some of the signs that prevent MetaMask from noticing this today include:

  1. When the new network replaces the old one at the same RPC address without notice.
  2. When the two networks share the same network ID.

Possible Solutions

  • Some kind of detection logic that works across clients.
  • Some specialized indication built into testrpc.

The second one should only be used if no good solution can be found for the first one.

Some detection strategies:

Periodically check network and chain IDs

This can only partially alleviate cause #1, because if the networks share IDs, it would go undetected.

Detect when checking nonce

Since people usually experience this when sending a TX, nonce calculation time would seem like a good time, but I'm scratching for a good, definite set of indications that the chain is different.

Tracking known blocks

A method for identifying a new network on demand could be fairly reliably implemented (as long as new chains were not identical to previous ones):

  • We could track both the genesis block and a recent block (say, 10 blocks back, to avoid forking issues)
  • When calculating nonce, we could ask the node if those two blocks have the same hashes.
  • If different hashes are returned, and node is not syncing => NEW NETWORK.

Since this method requires re-checking known blocks on the node, the question arises of when we do this. Nonce calculation is a nice failsafe, since this is when the problem first burns people, but it would be better if we could detect it immediately, so we don't show the wrong balance, an incorrect tx history, etc.
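A minimal sketch of the block-comparison check described above, assuming the hashes have already been fetched from the node (for example with eth_getBlockByNumber) and the sync state is known (for example from eth_syncing). The function name, the argument shapes and the short hashes are made up for illustration.

```javascript
// Hypothetical sketch of the "tracking known blocks" idea.
// knownBlocks:   blocks the wallet remembered earlier (genesis + a recent one)
// remoteHashes:  block number -> hash as just reported by the node
// nodeIsSyncing: whether the node says it is still syncing
function looksLikeNewNetwork(knownBlocks, remoteHashes, nodeIsSyncing) {
  // While the node is syncing, a mismatch is not conclusive.
  if (nodeIsSyncing) return false;
  // Assumption: a block the node does not have at all counts as a mismatch.
  return knownBlocks.some(({ number, hash }) => remoteHashes[number] !== hash);
}

// Made-up short hashes for readability.
const known = [
  { number: 0, hash: "0xaaa" },  // genesis block
  { number: 90, hash: "0xbbb" }, // a block about 10 behind the last seen head
];

const sameChain = looksLikeNewNetwork(known, { 0: "0xaaa", 90: "0xbbb" }, false);
const resetChain = looksLikeNewNetwork(known, { 0: "0xccc" }, false);
const stillSyncing = looksLikeNewNetwork(known, { 0: "0xaaa" }, true);

console.log(sameChain, resetChain, stillSyncing);
```

Keeping the decision separate from the RPC calls means the same check could run at nonce calculation time or on a timer.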

Re-Checking known successful transactions

There's a tricky bit around "confirmed transactions". Part of why we use locally confirmed transactions, is because MetaMask is sometimes pointed at RPC endpoints that might not be fully synchronized, and in these cases, we still want to generate valid nonces.

We could re-check our oldest known successful transaction by hash, and if it's unknown, that could also be used to signal a remote network change.
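The re-check could look roughly like this. It assumes the lookup is done with something like eth_getTransactionByHash, which returns null for an unknown transaction; the history shape and the function names are hypothetical.

```javascript
// Hypothetical sketch: pick the oldest locally confirmed transaction and
// decide whether the node has "forgotten" it.
function oldestConfirmedTx(history) {
  const confirmed = history.filter((tx) => tx.status === "confirmed");
  if (confirmed.length === 0) return null;
  return confirmed.reduce((oldest, tx) =>
    tx.blockNumber < oldest.blockNumber ? tx : oldest
  );
}

// lookupResult is assumed to be whatever the node returned when asked for
// that transaction by hash: an object if known, null if unknown.
function chainForgotTx(oldestTx, lookupResult) {
  return oldestTx !== null && lookupResult === null;
}

// Made-up local history.
const history = [
  { hash: "0x02", blockNumber: 12, status: "confirmed" },
  { hash: "0x01", blockNumber: 7, status: "confirmed" },
  { hash: "0x03", blockNumber: null, status: "pending" },
];

const oldest = oldestConfirmedTx(history);
const changed = chainForgotTx(oldest, null);
const unchanged = chainForgotTx(oldest, { hash: "0x01", blockNumber: "0x7" });

console.log(oldest.hash, changed, unchanged);
```

A node that is not fully synchronized may also return null for a valid transaction, so this signal is probably best combined with a sync check.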

Times we could perform this procedure

  • Periodically
  • On switching connection to a provider.
  • After a period of non connection, on re-connection. (Need to determine how long)

Conclusion

I think I have a fairly actionable solution here; please add any improvements anyone can think of. It seems like we've had a big bump in developer experimentation recently: this has definitely been the behavior for MetaMask's whole lifetime, but we've had a huge spike in actual complaints, so this feature is clearly important to some segment of users.

@danfinlay danfinlay self-assigned this Nov 17, 2017

@danfinlay

Contributor

commented Nov 17, 2017

When detecting a new network with an identical ID, some special things will need to be done:

  • Deal w/ tx History
  • Restart provider & RPC cache (automatically does normal provider-switching things)

Dealing with TX history

Either:

  • Transaction history will be trashed for chain with that ID
  • We migrate to storing transaction history with a different identifier (maybe the blockchain://${genesisBlock}/${laterBlock} format, although deciding which laterBlock to use might be ambiguous).

Short term easy solution is to trash that chain's tx history.
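For comparison, the longer-term option of keying history by block hashes instead of network ID might look like the sketch below. It follows the identifier format mentioned above; the `historyKey` helper, the store and the short hashes are hypothetical.

```javascript
// Hypothetical sketch: key transaction history by chain identity
// (genesis hash plus a later block hash) instead of by network ID.
function historyKey(genesisBlock, laterBlock) {
  return `blockchain://${genesisBlock}/${laterBlock}`;
}

const historyByChain = new Map();

// Two chains that share a network ID but have different blocks.
// The hashes are made up and shortened for readability.
const chainA = historyKey("0xaaa", "0xbbb");
const chainB = historyKey("0xccc", "0xddd");

historyByChain.set(chainA, [{ nonce: 0 }]);

// The second chain does not inherit the first chain's history.
const historyOnB = historyByChain.get(chainB) || [];

console.log(chainA, historyOnB.length);
```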

@stefanhuber

commented Jan 31, 2018

My workaround for now is:

  • Change network id in ganache ui and restart (sometimes ganache hangs up and i have to close and open it again)
  • truffle migrate changed contracts
  • go to browser and change to another network and afterwards back to the private network

It takes around 30-60 seconds each time i do that...

Wouldn't it be easiest to create a "clear history" button for accounts?

@danfinlay

Contributor

commented Feb 2, 2018

We've published a new version that should auto-update soon, v3.14.1, which includes a history-clearing button in Settings that can be used as a workaround for this issue:

http://metamask.helpscoutdocs.com/article/36-resetting-an-account

[Image: reset account button]

@robmyers

commented Feb 2, 2018

That's awesome @danfinlay ! Thank you!

@elie222

commented Feb 2, 2018

@danfinlay

Contributor

commented Feb 2, 2018

Reset account sounds scary though.

That's right, we don't want normal people doing it. It's scary on purpose.

Thanks to @brunobar79 for writing the change PR!

@danfinlay

Contributor

commented Feb 2, 2018

a user could lose some important data if he clicks it which doesn't seem to be the case?

The problem is that clearing a history with pending transactions can cause the user to submit transactions with an identical nonce, which can cause all kinds of confusion. It's better if users don't. We estimate nonces under normal conditions very well.
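To make the risk concrete, here is a small sketch. It assumes the node's transaction count only covers mined transactions, so a still-pending transaction is known only from local history; `nextNonce` is a hypothetical simplification, not MetaMask's actual code.

```javascript
// Hypothetical sketch of nonce selection with and without local history.
function nextNonce(confirmedCount, localTxs) {
  const localNext = localTxs.reduce(
    (max, tx) => Math.max(max, tx.nonce + 1),
    0
  );
  return Math.max(confirmedCount, localNext);
}

// The node has mined nonces 0..4 for this account.
const confirmedCount = 5;

// One more transaction (nonce 5) was submitted but is still pending.
const localTxs = [{ nonce: 5, status: "submitted" }];

// With history intact, the next transaction gets nonce 6.
const withHistory = nextNonce(confirmedCount, localTxs);

// After clearing history, the next transaction gets nonce 5 again,
// the same nonce as the transaction that is still pending.
const afterReset = nextNonce(confirmedCount, []);

console.log(withHistory, afterReset);
```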

@wbt

Contributor

commented Feb 5, 2018

We estimate nonces under normal conditions very well.

I'm not quite sure of that. I am getting an error in which I use Metamask to attempt a transaction (against Ganache 1.0.1) that is (properly) reverted. The transaction count goes up one in Ganache but not in Metamask, and then the next time I attempt to send a transaction from MetaMask I get "Error: the tx doesn't have the correct nonce" where the account nonce is one larger than the transaction nonce.

The workaround is to reset the network, rerun all transactions, and rerun useless transactions to get the count up to where Metamask thinks it should be (but not higher). I can then run transactions again, but I notice that any transaction replaces the one before it at the top of the list of history for that account, instead of adding a new element to the list as it used to. I also notice that now, even without reverted transactions, Metamask no longer increases the nonce when the account has had other non-Metamask transactions, resulting in nonce mismatch errors like this:
[Screenshot]
Notice that the transaction number jumps quite a bit from the first to second row shown; that's because all the (successful and unsuccessful) transactions in between (done via MetaMask) were also in the first position and overwritten in the UI by the one that came after.

I suspect this is a separate issue but when looking for possible duplicates, “Nonce calculation is broken for private networks” [this issue] seems to be a pretty strong candidate. All of this testing is without the newly contributed “Reset Account” button (thanks @danfinlay), so that patch is not the cause.

@benjamincburns

commented Feb 14, 2018

The transaction count goes up one in Ganache but not in Metamask

@wbt - to clarify, you observe this without restarting Ganache?

@wbt

Contributor

commented Feb 14, 2018

@benjamincburns Yes, that's without restarting Ganache.

@benjamincburns

commented Feb 14, 2018

@wbt can you please try this with the current beta of ganache-cli (npm install -g ganache-cli@beta) run with the --noVMErrorsOnRPCResponse flag? (e.g. ganache-cli --noVMErrorsOnRPCResponse).

If that works, then I assume you'll like running the next beta of the Ganache UI much better than the current beta or stable release.

My suspicion is that in Ganache we report the transaction failure as an RPC error when the transaction is submitted, and MetaMask (correctly) interprets that as the transaction being rejected prior to it entering the transaction pool. If that's the case, I'd regard that as a separate issue from this one, and I wouldn't discredit MetaMask's nonce tracking because of it. We're the ones breaking the "standard," there.

@benjamincburns

commented Feb 14, 2018

@danfinlay incidentally when that happens we break the JSON-RPC spec and return both an error and a result field. If you wanted to make MetaMask more tolerant to our way of doing things there, you could check whether there's a result field and increment the nonce if so.

I wouldn't blame you if you don't want to do this, however.
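The tolerant check suggested above could be sketched as follows. The response objects, error codes and messages are made-up examples of the shapes described in this comment, and `txWasAccepted` is a hypothetical helper.

```javascript
// Hypothetical sketch: treat a response as "transaction accepted" whenever
// it carries a result (the tx hash), even if an error field is also present.
function txWasAccepted(rpcResponse) {
  return typeof rpcResponse.result === "string";
}

// Made-up example payloads.
const normalSuccess = { jsonrpc: "2.0", id: 1, result: "0xabc" };
const rejected = {
  jsonrpc: "2.0",
  id: 2,
  error: { code: -32000, message: "nonce too low" },
};
// Non-standard shape described above: both error and result are present.
const errorWithResult = {
  jsonrpc: "2.0",
  id: 3,
  error: { code: -32000, message: "VM Exception while processing transaction: revert" },
  result: "0xdef",
};

const outcomes = [normalSuccess, rejected, errorWithResult].map(txWasAccepted);
console.log(outcomes);
```

Under this rule the local nonce would be incremented for the first and third responses, and left alone for the second.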

@danfinlay

Contributor

commented Feb 14, 2018

I wouldn't blame you if you don't want to do this, however.

It's not so much that I don't want to, but that we are already stretched so thin, we greatly appreciate each non-breaking change. I would encourage Ganache to return success on submitting a tx w/ on-chain error, and allow clients to identify failure the usual way, by querying tx by hash.
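As an illustration of identifying failure by querying the transaction afterwards, here is a sketch that classifies a receipt. It assumes the node returns receipts with a `status` field ("0x1" for success, "0x0" for failure) and null while the transaction is unmined; `classifyReceipt` is a hypothetical helper.

```javascript
// Hypothetical sketch: decide the outcome of a submitted transaction from
// the receipt a node returns for its hash.
function classifyReceipt(receipt) {
  if (receipt === null) return "pending"; // not mined yet
  return receipt.status === "0x1" ? "success" : "failed";
}

// Made-up receipts for illustration.
const pending = classifyReceipt(null);
const success = classifyReceipt({ transactionHash: "0xabc", status: "0x1" });
const failed = classifyReceipt({ transactionHash: "0xdef", status: "0x0" });

console.log(pending, success, failed);
```

In both mined cases the account's nonce has been consumed, which is why submission itself can report success.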

@benjamincburns

commented Feb 14, 2018

@danfinlay it's my intention to make that the default behavior in the next major release.

@adamskrodzki

commented Dec 20, 2018

Hi,
I faced the same issue today when connecting to

xDai chain (web3.eth.getNetwork() == 100) (https://dai.poa.network)

I got the nonce from the Ethereum Main Network.

Uninstalling MetaMask and installing it again helped.

@wbt

Contributor

commented Jan 31, 2019

I'm also getting this error ("[ethjs-rpc] rpc error with payload...") again today on a long-running private network that I can't reset. Resetting the account no longer helps.
