Below are some more causes I found, while analyzing the problem, for a bitcoinj wallet getting stuck and no longer following the correct block chain on testnet. We experienced a lot of problems like this on testnet, and my aim was to fix them or to rule out that they could happen in production on mainnet.
I present some ideas for action, but more thought is needed. Testnet forks can be very long, to the degree that they run beyond what SPVBlockStore can handle, so we need to think about this. A solution for a very low-spec (memory/disk) wallet could perhaps do a two-pass download from peers. (I am unsure about this: does bitcoinj make use of getheaders in normal catch-up?)
Peers do not recognize hashes in the getblocks message
The hashes sent in the getblocks message should help a peer decide which blocks the client needs in order to catch up with the chain. I noticed that the peers' response to getblocks was to send the first 500 blocks of the block chain. These are of course not of much use: they do not change the chain head in the wallet's store, so the wallet stays stuck.
Analysis: bitcoinj seems to fill in 100 hashes starting from its chain head in a linear fashion. If all of them are on a fork that was discarded, the peer cannot find any common block except the genesis block, so it starts there in the inv message it sends in reply.
Action: Use a better set of hashes from the known blocks in the store (5000 for SPVBlockStore). A better selection is proposed on the bitcoin wiki: "dense to start, but then sparse". This helps, but I then ran into the next problem, described in the following section.
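As a rough illustration of the "dense to start, but then sparse" selection, here is a minimal sketch that picks block heights for the locator. The class name, the method name and the choice of 10 dense entries before doubling the step are assumptions for illustration, not bitcoinj API; a real implementation would look up the hash at each height in the block store and would normally also append the genesis hash.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of bitcoinj.
class BlockLocatorSketch {
    /**
     * Returns the heights whose block hashes would go into a getblocks
     * locator: every block for the first 10 entries below the tip, then
     * exponentially growing gaps, ending at the oldest block in the store.
     */
    static List<Integer> locatorHeights(int tipHeight, int oldestStoredHeight) {
        if (oldestStoredHeight < 0 || tipHeight < oldestStoredHeight) {
            throw new IllegalArgumentException("need 0 <= oldestStoredHeight <= tipHeight");
        }
        List<Integer> heights = new ArrayList<>();
        int step = 1;
        int height = tipHeight;
        while (height > oldestStoredHeight) {
            heights.add(height);
            // After 10 dense entries, double the gap on every further entry.
            if (heights.size() >= 10) {
                step *= 2;
            }
            height -= step;
        }
        // Always finish with the oldest block we still know about.
        heights.add(oldestStoredHeight);
        return heights;
    }
}
```

With a store of 5000 blocks this yields a couple of dozen hashes that span the whole store, instead of 100 hashes that only cover the last 100 blocks, so a peer has a much better chance of finding a common block below the fork point.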
getdata for 500 blocks does not trigger a re-organize despite the head being on a dead fork
Requesting blocks with better hashes can still leave the chain head in the wallet store unchanged. This results in the same request for blocks being sent again, so the store head is effectively stuck.
Analysis: This happens when the downloaded blocks, despite belonging to the correct chain, do not trigger a re-organize even though the head is on a dead chain. Why? It seems the special difficulty jumps on testnet can make a branch of blocks have less total work despite being very much longer, e.g. several hundred blocks longer than the head. For some reason the network selected this longer chain for many blocks.
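To make the "longer but less total work" case concrete, here is a small sketch with made-up numbers. It assumes that the work contributed by a block is proportional to its difficulty and that a re-organize happens only when a branch has strictly more total work than the current head. The class and method names are hypothetical and are not bitcoinj API.

```java
import java.math.BigInteger;

// Hypothetical illustration, not part of bitcoinj.
class ChainWorkSketch {
    /** Sums per-block difficulty as a stand-in for cumulative chain work. */
    static BigInteger totalWork(long[] blockDifficulties) {
        BigInteger sum = BigInteger.ZERO;
        for (long difficulty : blockDifficulties) {
            sum = sum.add(BigInteger.valueOf(difficulty));
        }
        return sum;
    }

    /** A branch replaces the head only if it has strictly more total work. */
    static boolean wouldReorganize(BigInteger headWork, BigInteger candidateWork) {
        return candidateWork.compareTo(headWork) > 0;
    }
}
```

For example, a head branch of 10 blocks at difficulty 1000 has total work 10000, while a competing branch of 500 blocks at difficulty 1 has total work 500. The second branch is 490 blocks longer, yet under these assumptions it would not trigger a re-organize.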
Action: I am not sure here. One way is to follow the current rules. In these cases we need to download many blocks (thousands) to trigger a re-org. To make that happen, the store needs to track not only the chain head but also what, at this point, looks like a fork (less total work), so that it can send a different getblocks and getdata to fetch more blocks it has not yet downloaded. Note that getblocks can easily be used to ask for multiple branches in one request. However, without extending SPVBlockStore this solution does not work, because we run out of space (currently, to manage the re-org, bitcoinj seems to need all blocks in both branches back to the split point).
Another way is perhaps to discard blocks back to the split point and restart with getblocks to peers (but this would trust this peer more than those that led to the current head). Does the transaction confidence model allow this two-step operation: first lowering the head and total work, and then following another branch that will eventually reach higher total work much later? Currently I assume total work can only increase. In general, tx confidence changes for these deep re-org events seem very hard to handle in applications anyway. Any ideas?
To sum up, these particular findings are very unlikely to occur on the main network. It would still be nice if testnet could be made to work reliably with bitcoinj.
I think yes. There has been a bit of discussion on the mailing list (see topic started May 23) but I must admit I didn't really decide what to do. If you want to help, that's great! Posting your ideas/thoughts to the mailing list would be a good start.