More reasons Wallet stops adding new blocks to chain on Testnet #1259

jarlfr · 2016-05-17T17:10:41Z

Below are some more reasons, found while analyzing the code, why a bitcoinj wallet can get stuck and stop following the correct block chain on Testnet. We experienced many problems like this on testnet, and I wanted to either fix them or rule out that they could happen in production on mainnet.

I present some ideas for action, but more thought is needed. Testnet forks can be very long, long enough to exceed what SPVBlockStore can hold, so we need to think about this. A solution for a very low-spec (memory/disk) wallet could perhaps download from peers in two passes. (I am unsure: does bitcoinj use getheaders during normal catch-up?)

Peers do not recognize hashes in the `getblocks` message

The hashes sent in the getblocks message (the block locator) should help a peer decide which blocks the client needs to catch up with the chain. I noticed that peers responded to getblocks by announcing the first 500 blocks of the block chain. These are of no use, of course: they do not change the chain head in the wallet's store, so the wallet stays stuck.

Analysis: bitcoinj seems to fill in 100 hashes, walking back linearly from its chain head. If all of them are on a fork that was discarded, the peer cannot find any common block except the genesis block, so its inv reply starts there.
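To make this failure mode concrete, here is a minimal Java sketch of how a peer might resolve a getblocks locator against its own main chain: it scans the locator in order, starts after the first hash it finds on its best chain, and falls back to genesis when nothing matches. This is a simplified model under stated assumptions: `LocatorResolution`, `answerGetBlocks` and the string "hashes" are hypothetical, not bitcoinj or Bitcoin Core code, and real nodes also honour `hashStop`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified, hypothetical model of a peer answering getblocks. Hashes are plain strings. */
public class LocatorResolution {
    static final int MAX_INV = 500; // a getblocks reply announces at most 500 blocks

    /** mainChain.get(0) is the genesis block; list indices are block heights. */
    static List<String> answerGetBlocks(List<String> mainChain, List<String> locator) {
        Map<String, Integer> heightOf = new HashMap<>();
        for (int h = 0; h < mainChain.size(); h++) {
            heightOf.put(mainChain.get(h), h);
        }
        // Assume the fork point is genesis unless a locator hash is on our main chain.
        int forkHeight = 0;
        for (String hash : locator) {
            Integer h = heightOf.get(hash);
            if (h != null) {
                forkHeight = h;
                break; // locators are ordered newest first, so the first match wins
            }
        }
        // Announce the blocks that follow the fork point, capped at MAX_INV.
        List<String> inv = new ArrayList<>();
        for (int h = forkHeight + 1; h < mainChain.size() && inv.size() < MAX_INV; h++) {
            inv.add(mainChain.get(h));
        }
        return inv;
    }

    public static void main(String[] args) {
        List<String> main = new ArrayList<>();
        for (int h = 0; h < 2000; h++) main.add("m" + h);
        // The client only sends hashes from a discarded fork the peer never saw.
        List<String> staleLocator = new ArrayList<>();
        for (int i = 0; i < 100; i++) staleLocator.add("f" + i);
        List<String> inv = answerGetBlocks(main, staleLocator);
        System.out.println(inv.size() + " " + inv.get(0)); // prints "500 m1"
    }
}
```

With a locator made only of stale-fork hashes, the model returns blocks 1 to 500, which matches the behaviour described above.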

Action: Use a better set of hashes from the blocks known in the store (5000 for SPVBlockStore). The bitcoin wiki proposes a better selection: "dense to start, but then sparse". This helps, but then I ran into the next problem:
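A minimal sketch of the "dense to start, but then sparse" selection, assuming the store can only name blocks down to some oldest height (as with a fixed-capacity store like SPVBlockStore). It works on heights only; `LocatorHeights` and `locatorHeights` are hypothetical names. In bitcoinj the heights would still have to be turned into hashes, for example by walking back from the chain head with `StoredBlock.getPrev(BlockStore)`. Doubling the step after ten entries follows the wiki's description; Bitcoin Core's exact cut-over point may differ slightly.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: picks block heights for a "dense, then sparse" locator. */
public class LocatorHeights {
    /**
     * @param tipHeight          height of the current chain head
     * @param oldestStoredHeight lowest height still present in a fixed-size store
     * @return heights newest first: ten consecutive blocks, then exponentially
     *         growing gaps, then the oldest stored block and finally genesis (0)
     */
    static List<Integer> locatorHeights(int tipHeight, int oldestStoredHeight) {
        List<Integer> heights = new ArrayList<>();
        int step = 1;
        int h = tipHeight;
        while (h > oldestStoredHeight) {
            heights.add(h);
            if (heights.size() >= 10) {
                step *= 2; // after the first ten entries, double the gap each time
            }
            h -= step;
        }
        heights.add(oldestStoredHeight); // deepest block the store can still name
        if (oldestStoredHeight > 0) {
            heights.add(0); // the genesis hash is always known from the network parameters
        }
        return heights;
    }

    public static void main(String[] args) {
        // prints [1000, 999, 998, 997, 996, 995, 994, 993, 992, 991, 989, 985, 977, 961, 929, 865, 737, 481, 0]
        System.out.println(locatorHeights(1000, 0));
    }
}
```

Because the gaps grow exponentially, the locator reaches far back with only about log2(depth) extra entries, so even a deep stale fork still shares some locator entry with the peer's main chain, provided the store still holds a block that old.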

`getdata` for 500 blocks does not trigger a re-organize despite the head being on a dead fork

Requesting blocks with a better set of hashes can still leave the wallet store's chain head unchanged. The client then sends the same request for blocks again, and the store head is effectively stuck.

Analysis: This happens when the downloaded blocks, although they belong to the correct chain, do not trigger a re-organize even though the head is on a dead chain. Why? It seems the special difficulty rules on testnet (which allow minimum-difficulty blocks) can give a branch less total work despite it being much longer, e.g. several hundred blocks longer than the head. For some reason the network followed this longer chain for many blocks.
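A small worked example of why length alone does not decide: using the standard per-block work formula, 2^256 / (target + 1), a few high-difficulty blocks can outweigh hundreds of minimum-difficulty testnet blocks. This is an illustrative sketch, not bitcoinj code; `ChainWork` and its methods are hypothetical names, `0x1d00ffff` is the minimum-difficulty `nBits` value, and `0x1b0404cb` is just an example of a much harder target, not a claim about the actual testnet branches involved.

```java
import java.math.BigInteger;

/** Illustrative comparison of chain work (hypothetical helper, not bitcoinj code). */
public class ChainWork {
    private static final BigInteger TWO_256 = BigInteger.ONE.shiftLeft(256);

    /** Decodes the compact "nBits" encoding into a 256-bit target (positive values only). */
    static BigInteger decodeCompact(long bits) {
        int exponent = (int) (bits >>> 24) & 0xff;
        BigInteger mantissa = BigInteger.valueOf(bits & 0x007fffffL);
        return exponent <= 3
                ? mantissa.shiftRight(8 * (3 - exponent))
                : mantissa.shiftLeft(8 * (exponent - 3));
    }

    /** Expected number of hashes needed to find one block at this target. */
    static BigInteger blockWork(long bits) {
        return TWO_256.divide(decodeCompact(bits).add(BigInteger.ONE));
    }

    /** Total work of a branch made of blockCount blocks that all use the same nBits. */
    static BigInteger chainWork(long bits, int blockCount) {
        return blockWork(bits).multiply(BigInteger.valueOf(blockCount));
    }

    public static void main(String[] args) {
        long minDifficulty = 0x1d00ffffL; // minimum-difficulty blocks
        long harder = 0x1b0404cbL;        // example of a much higher difficulty
        BigInteger longBranch = chainWork(minDifficulty, 300);
        BigInteger shortBranch = chainWork(harder, 10);
        // The 300-block branch has less total work than the 10-block branch.
        System.out.println(longBranch.compareTo(shortBranch) < 0); // prints "true"
    }
}
```

A client that picks the head by total work, as it must, therefore stays on the shorter branch until enough of the longer branch's blocks have arrived to overtake it.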

Action: Not sure here. One way is to follow the current rules. In these cases we need to download many blocks (thousands) to trigger a re-org. To make that happen, the store needs to track not only the chain head but also what, at this point, looks like a fork (less total work), so that it can send different getblocks and getdata messages to fetch blocks it has not downloaded yet. Note that getblocks can easily be used to ask for multiple branches in one request. But without extending SPVBlockStore this solution does not work, as we run out of space (currently, to manage the re-org, bitcoinj seems to need all blocks on both branches back to the split point).
Another way may be to discard blocks back to the split point and restart with getblocks to peers (but this would trust this peer more than the ones that produced the current head). Does the transaction confidence model allow this two-step operation: first lowering the head and total work, then following another branch that eventually reaches a higher total work much later? Currently I assume total work can only increase. In general, transaction confidence changes during such deep re-org events seem very hard for applications to handle anyway. Any ideas?
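The need for every block on both branches back to the split point follows from how the fork point is found: the client walks parent links from both tips until they meet. The sketch below models that walk with a hypothetical in-memory store (`SplitFinder`, `Header`, `add` and `remove` are illustrative names, not bitcoinj API; bitcoinj does its own split search in `AbstractBlockChain`). If a header on either branch has been evicted, as can happen with a fixed-capacity ring buffer like SPVBlockStore, the walk cannot complete.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory header store, used only to illustrate the split-point search. */
public class SplitFinder {
    static final class Header {
        final String hash;
        final String prevHash; // null for genesis
        final int height;

        Header(String hash, String prevHash, int height) {
            this.hash = hash;
            this.prevHash = prevHash;
            this.height = height;
        }
    }

    private final Map<String, Header> byHash = new HashMap<>();

    /** Stores a header on top of parent (or as genesis when parent is null). */
    Header add(String hash, Header parent) {
        Header h = parent == null
                ? new Header(hash, null, 0)
                : new Header(hash, parent.hash, parent.height + 1);
        byHash.put(hash, h);
        return h;
    }

    /** Simulates eviction of a header, e.g. by a fixed-size store. */
    void remove(String hash) {
        byHash.remove(hash);
    }

    /** Returns the common ancestor of two tips, or null if a needed header is missing. */
    Header findSplit(Header a, Header b) {
        while (a != null && b != null && !a.hash.equals(b.hash)) {
            // Always step back along the higher tip; on a tie, step back b.
            if (a.height > b.height) {
                a = byHash.get(a.prevHash);
            } else {
                b = byHash.get(b.prevHash);
            }
        }
        return (a == null || b == null) ? null : a;
    }

    public static void main(String[] args) {
        SplitFinder store = new SplitFinder();
        Header common = store.add("c0", null);
        for (int i = 1; i < 10; i++) common = store.add("c" + i, common);
        Header tipA = common;
        Header tipB = common;
        for (int i = 0; i < 5; i++) tipA = store.add("a" + i, tipA);
        for (int i = 0; i < 300; i++) tipB = store.add("b" + i, tipB);
        System.out.println(store.findSplit(tipA, tipB).hash); // prints "c9"
        store.remove("b0"); // one header on the long branch is gone
        System.out.println(store.findSplit(tipA, tipB));      // prints "null"
    }
}
```

The cost of the walk is proportional to the length of the longer branch, which is why very long testnet forks interact badly with a store that only keeps the most recent headers.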

To sum up, these particular findings are very unlikely to occur on the main network. Still, it would be nice if bitcoinj worked reliably on testnet.


schildbach · 2016-05-23T16:31:40Z

Thanks a lot for your analysis! I replied to the mailing list as that's a better place for discussions.

dcw312 · 2016-08-13T16:20:42Z

Is this issue still open? I ask because it looks like an interesting problem to research.

schildbach · 2016-08-13T17:05:14Z

I think so. There has been a bit of discussion on the mailing list (see the topic started May 23), but I must admit I haven't really decided what to do. If you want to help, that's great! Posting your ideas/thoughts to the mailing list would be a good start.

dcw312 mentioned this issue Aug 27, 2016

Delegate the process of creating the block locator related to 1259 #1292

Open
