Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spontaneous blockchain corruption #198

Closed
dooglus opened this issue Feb 3, 2014 · 16 comments
Closed

spontaneous blockchain corruption #198

dooglus opened this issue Feb 3, 2014 · 16 comments
Assignees

Comments

@dooglus
Copy link
Contributor

dooglus commented Feb 3, 2014

I don't know what just happened, but it's not good...

I upgraded from v1.4 to v1.5 recently. The first time I tried, it appeared to work OK but when I restarted it complained of a read error checking the block index. So I ran again with -reindex, and it was then fine for over 24h.

Now, however, when I ran a 'listaccounts' (which I do often) I see one of my accounts is negative. This never should happen, since I wait for accounts to become positive (via receiving deposits) then 'move' their balance to the default account, leaving the balance at zero. I run 'listaccounts' with an argument of 10, meaning it only counts deposits with 10 or more confirmations.

So one account is showing negative. "listtransactions" on that account shows me that two of the deposits have 0 confirmations. But dogechain.info tells me they were confirmed a couple of days ago. They're both in the same block.

Comparing the output of "getblockhash 80086" on two different machines, I see the same hash as dogechain.info reports. So it's not a fork issue, or a non-synced issue.

Then running "getblock 4fae16eaa18cded69c88ec88a8c3b8e86b2c2b980a1d1001ee4c07cba5b7799f" (that's the block hash) on the affected machine, I see:

{
    [...]
    "tx" : [
        "b36370558341999b24e1e8bb99ad6e3af195e112d79c2fec5ffb8cc7d8b61632",
        "b1a1cc8c8d5176a277f92062f6b1fb687ac7f2781876b9bf2a51adf3c3f58837",
        "659d599b203f1ed04003926da46210f29919c834c02001ba6d55ec2bf941bff6",
        [...]
        "f4c1151d174cf47c979c101a7f03547844bc110a697ba639f7dc46635ecc75fc",
        "6e480466b309f7a93acd32e194b06193f048e9e7cba5a0dd68736f3785992da6",
        "d21633ba23f70118185227be58a63527675641ad37967e2aa461559f577aec43",
        "d21633ba23f70118185227be58a63527675641ad37967e2aa461559f577aec43",
        "d21633ba23f70118185227be58a63527675641ad37967e2aa461559f577aec43",
        [...]
        "d21633ba23f70118185227be58a63527675641ad37967e2aa461559f577aec43",
        "d21633ba23f70118185227be58a63527675641ad37967e2aa461559f577aec43",
        "d21633ba23f70118185227be58a63527675641ad37967e2aa461559f577aec43"
    ],
    [...]
}

whereas on a working server I see:

{
    [...]
    "tx" : [
        "b36370558341999b24e1e8bb99ad6e3af195e112d79c2fec5ffb8cc7d8b61632",
        "b1a1cc8c8d5176a277f92062f6b1fb687ac7f2781876b9bf2a51adf3c3f58837",
        "659d599b203f1ed04003926da46210f29919c834c02001ba6d55ec2bf941bff6",
        [...]
        "f4c1151d174cf47c979c101a7f03547844bc110a697ba639f7dc46635ecc75fc",
        "c49db4133ce7e738ae182e4d47d1b11d29673594f379c887e377249438cd3154",
        "992ff8d415081366d60feaef79e703c0305e4dd4524777efd7ccc78e6e2b3d32",
        "65d0cb5f073931cb8117c49ef1462948add9fdc6badd2ea8d99b4bfc98a54641",
        "b37d3104d219469dcf1557a22bbad41b434eba6c1b104581dae792f51c4a725f",
        [...]
        "52980b4f19fce5fb8c388021cba5890ceca80b9b5a5b31b807d0e9ed853380ed",
        "eb8594241487e6f29f023130358d7539f8ad42a691d6588369728d8bdf740562",
        "68327ea6dff395441d3ed64166d885f4e34b9b12248fe7e86e0a4ad1b0e2418a"
    ],
    [...]
}

ie. the last 28 transactions in the list are the same, and the one before the last 28 is wrong.

The tx list on both servers includes the two transactions which are appearing unconfirmed on my local server.

Do I need to reindex the blockchain? Or redownload it?

@dooglus
Copy link
Contributor Author

dooglus commented Feb 3, 2014

Checking back through the debug.log, I see a bunch of errors that were apparently allowed to pass unchecked:

2014-02-02 21:05:23 AddToWallet ca5d4092259cd9893d577e61b9b001d57a286cbe0d207216099c6bd8e57e7fe0  
2014-02-02 21:05:23 AddToWallet 20f4f8989c56392f328b43af03747a07efceefc1ddd677a8a185b87c055e1f41  
2014-02-02 21:05:23 AddToWallet d00fe7a8d7ae0906ccd7c4f4149f5425365d921c9d7aee686e3031188f684dd4  
2014-02-02 21:05:23 ERROR: bool CBlock::ReadFromDisk(const CDiskBlockPos&)() : deserialize or I/O error
2014-02-02 21:05:23 AddToWallet 659d599b203f1ed04003926da46210f29919c834c02001ba6d55ec2bf941bff6  update
2014-02-02 21:05:23 AddToWallet 54b67da306f7967b3187a8972dffe11991167f3a80202615bd5037cd9aa18b3a  update
2014-02-02 21:05:23 ERROR: bool CBlock::ReadFromDisk(const CDiskBlockPos&)() : deserialize or I/O error
2014-02-02 21:05:23 ERROR: bool CBlock::ReadFromDisk(const CDiskBlockPos&)() : deserialize or I/O error
2014-02-02 21:05:23 ERROR: bool CBlock::ReadFromDisk(const CDiskBlockPos&)() : deserialize or I/O error
2014-02-02 21:05:23 ERROR: bool CBlock::ReadFromDisk(const CDiskBlockPos&)() : deserialize or I/O error
2014-02-02 21:05:23 ERROR: bool CBlock::ReadFromDisk(const CDiskBlockPos&)() : deserialize or I/O error
2014-02-02 21:05:23 ERROR: bool CBlock::ReadFromDisk(const CDiskBlockPos&)() : deserialize or I/O error
2014-02-02 21:05:23 ERROR: bool CBlock::ReadFromDisk(const CDiskBlockPos&)() : deserialize or I/O error

I guess this is related to the corruption I'm now seeing.

@dooglus
Copy link
Contributor Author

dooglus commented Feb 3, 2014

Is there a safe way to go back from 1.5 to 1.4? Is the wallet.dat from 1.5 compatible with 1.4?

It's worrying seeing corruption like this happen in 1.5.

@billym2k
Copy link
Contributor

billym2k commented Feb 3, 2014

That's pretty gross. The wallets should be completely compatible.

This is the same type of issue you were running into in your comments in #159 ?

@dooglus
Copy link
Contributor Author

dooglus commented Feb 3, 2014

This bit is identical:

ERROR: bool CBlock::ReadFromDisk(const CDiskBlockPos&)() : deserialize or I/O error

but in one case it causes dogecoind to fail to start up, whereas in the other case the error is pretty much ignored (other than being logged) and causes incorrect results to be reported.

I have a few different wallets which I often switch between. I make sure to close down dogecoind before switching the wallet.dat files in and out. It seems to be when I put a new one in place and run dogecoind that this error first happens, when it's rescanning the new blocks (new to this wallet, that is) at the start. (Each wallet 'remembers' how many blocks were in the chain the last time it was used, so dogecoind knows which blocks it has to scan for new transactions when you start it up).

@dooglus
Copy link
Contributor Author

dooglus commented Feb 3, 2014

I have switched back to version 1.4 now. At first I couldn't get it to start up. I kept seeing this:

Loading block index...
dbenv.open LogDir=/home/chris/.dogecoin/database ErrorFile=/home/chris/.dogecoin/db.log


************************
EXCEPTION: 11DbException       
Db::open: Invalid argument       
dogecoin in AppInit()       

when using the .dogecoin/ directory as v1.5 had left it. But fortunately I made a backup of ~/.dogecoin/ before running v1.5 for the first time, so restored from that backup and then v1.4 would start up happily. I'm having to sync 2 days' worth of blocks from the network again, but that's to be expected.

I guess things will be back to normal once that sync is done.

@xtech9
Copy link

xtech9 commented Feb 3, 2014

Looks a bit like it could be file permission problem.

@voidref
Copy link
Contributor

voidref commented Feb 3, 2014

The blockchain db was switched from Berkeley DB to Level DB in the change from 1.4 to 1.5.

This wasn't our choice, it was a part of the update to the 0.8x litecoin core.

So the 1.4 and 1.5 blockchains are not compatible, but the wallet.dat should be, as Billy said already.

@asadhaider
Copy link
Contributor

Jesus, I'm glad I haven't upgraded yet on my live wallet.

@dooglus what about setting checklevel to a higher value?

@dooglus
Copy link
Contributor Author

dooglus commented Feb 3, 2014

@asadhaider The problem seems to be noticed when reading the db already. ie. the level is high enough. Notice it's unable to 'ReadFromDisk' or 'deserialize', which suggests it's not an internal consistency error, but rather some more fundamental corruption.

I should have posted this log, which is how I initially noticed the problem:

http://pastie.org/8692929

Checking those transaction IDs on dogechain.info shows they are confirmed.

I finally switched back to v1.4. It wouldn't open the blockchain, so I went back to an old backup of the blockchain I have, written by v1.4. It took hours to sync the last 2 days' worth of blocks, and once it had, account balances were still wrong. I was using the wallet that had been used by v1.5. Then I did a 'dogecoind -rescan' and that fixed the account balances.

Now the only sign that anything's still wrong is that I see things like this from 'listtransaction':

{
    "account" : "46881",
    "category" : "move",
    "time" : 1391353718,
    "amount" : -90000.00000000,
    "otheraccount" : "",
    "comment" : "\u0000\u0001\u0001n\u000597550"
},
{
    "account" : "46881",
    "address" : "DRhva2WmSyzs7irPPhcdLndjxFhBn4dGZP",
    "category" : "receive",
    "amount" : 90000.00000000,
    "confirmations" : 894,
    "blockhash" : "5859bfb87a98181f2d8af067699438300aa3c3b124ea139d60b0b609fc67d0e8",
    "blockindex" : 1,
    "txid" : "93ae13d20ffd35d63daca6703d0064b1822046e3293fab476f79d6d44abd96c6",
    "time" : 1391394297,
    "n" : "320360",
    "timesmart" : "1391394297"
},

I'm pretty sure I never saw garbage in the comment field before. And I don't remember seeing "timesmart" before, either. I guess these fields were written to the wallet by v1.5, and they've not gone away when switching back to v1.4. I need the 'move' transactions I made using v1.5 to be in my wallet, so can't just go back to the v1.4 backup of the wallet unless I manually re-add all the move transactions back in.

@ummjackson
Copy link
Contributor

Are you still experiencing this issue on 1.5.1 @dooglus ?

@ummjackson ummjackson self-assigned this Feb 9, 2014
@dooglus
Copy link
Contributor Author

dooglus commented Feb 9, 2014

I'm currently running v1.5 still, and haven't had the issue since I went back to a backup of my v1.4 data directory, deleted just the blkindex.dat file and then ran v1.5. I kept the old blk0001.dat and blk0002.dat files because I didn't want to have to redownload all the blocks again.

It seems possible that my blkindex.dat file was corrupted and caused a bad upgrade, but I'm not sure.

@dooglus
Copy link
Contributor Author

dooglus commented Feb 9, 2014

I upgraded to v1.5.1 on the live server, stopped v1.5, deleted blkindex.dat and the blocks/ and chainstate/ directories, then started v1.5.1.

I see:

$ grep -C5 -i -e 'ReadFromDisk' -e 'Dogecoin version' ~/.dogecoin/debug.log 





2014-02-09 20:41:03 Dogecoin version v1.5.1.0 (2013-12-15 13:31:28 -0500)
2014-02-09 20:41:03 Using OpenSSL version OpenSSL 1.0.1e 11 Feb 2013
2014-02-09 20:41:03 Default data directory /home/.../.dogecoin
2014-02-09 20:41:03 Using data directory /home/.../.dogecoin
2014-02-09 20:41:03 Using at most 125 connections (12000 file descriptors available)
2014-02-09 20:41:03 Using 4 threads for script verification
--
2014-02-09 21:36:04 SetBestChain: new best=c0fa3a9fcc60bf3ccc37c25f3b80491bd6785cdb0c4139c2c86fe9b77363c81b  height=93904  log2_work=57.706217  tx=7042406  date=2014-02-09 21:35:39 progress=1.000000
2014-02-09 21:36:04 ProcessBlock: ACCEPTED
2014-02-09 21:36:04 getblocks -1 to 0000000000000000000000000000000000000000000000000000000000000000 limit 500
2014-02-09 21:36:04 getblocks -1 to 0000000000000000000000000000000000000000000000000000000000000000 limit 500
2014-02-09 21:36:04 getblocks -1 to 0000000000000000000000000000000000000000000000000000000000000000 limit 500
2014-02-09 21:36:05 ERROR: bool CBlock::ReadFromDisk(const CDiskBlockPos&)() : deserialize or I/O error
2014-02-09 21:36:05 ERROR: bool CBlock::ReadFromDisk(const CDiskBlockPos&)() : deserialize or I/O error
2014-02-09 21:36:05 ERROR: bool CBlock::ReadFromDisk(const CDiskBlockPos&)() : deserialize or I/O error
2014-02-09 21:36:05 getblocks 68650 to 0000000000000000000000000000000000000000000000000000000000000000 limit 500
2014-02-09 21:36:05   getblocks stopping at limit 69149 5e6ef06c8508d1357ca4339a8f5df752509a6b980128a9a92b7e4a5161d2c44a
2014-02-09 21:36:05 received getdata (4 invsz)
2014-02-09 21:36:05 ERROR: bool CBlock::ReadFromDisk(const CDiskBlockPos&)() : deserialize or I/O error
2014-02-09 21:36:05 ERROR: bool CBlock::ReadFromDisk(const CDiskBlockPos&)() : deserialize or I/O error
2014-02-09 21:36:05 ERROR: bool CBlock::ReadFromDisk(const CDiskBlockPos&)() : deserialize or I/O error
2014-02-09 21:36:05 ERROR: bool CBlock::ReadFromDisk(const CDiskBlockPos&)() : deserialize or I/O error
2014-02-09 21:36:05 getblocks 69822 to 9db974b42dc0c7f160ed41fba951980a473af9e21d795891b68b532b914a3a7c limit 500
2014-02-09 21:36:05   getblocks stopping at limit 70321 8eee09e3ac0198ad76d64d894e28a16ccee2fd6e15dcad56b2776f32180f629d
2014-02-09 21:36:05 received getdata for: block 99cbc30cdc379fb3f06f61c8cea70b4317eca196bcdec9947cb2da29d11580a6
2014-02-09 21:36:05 ERROR: bool CBlock::ReadFromDisk(const CDiskBlockPos&)() : deserialize or I/O error
2014-02-09 21:36:05 received getdata (10 invsz)
2014-02-09 21:36:05 ERROR: bool CBlock::ReadFromDisk(const CDiskBlockPos&)() : deserialize or I/O error
2014-02-09 21:36:05 ERROR: bool CBlock::ReadFromDisk(const CDiskBlockPos&)() : deserialize or I/O error
2014-02-09 21:36:05 ERROR: bool CBlock::ReadFromDisk(const CDiskBlockPos&)() : deserialize or I/O error
2014-02-09 21:36:05 ERROR: bool CBlock::ReadFromDisk(const CDiskBlockPos&)() : deserialize or I/O error
2014-02-09 21:36:05 ERROR: bool CBlock::ReadFromDisk(const CDiskBlockPos&)() : deserialize or I/O error
2014-02-09 21:36:05 ERROR: bool CBlock::ReadFromDisk(const CDiskBlockPos&)() : deserialize or I/O error
2014-02-09 21:36:05 ERROR: bool CBlock::ReadFromDisk(const CDiskBlockPos&)() : deserialize or I/O error
2014-02-09 21:36:05 ERROR: bool CBlock::ReadFromDisk(const CDiskBlockPos&)() : deserialize or I/O error
2014-02-09 21:36:05 ERROR: bool CBlock::ReadFromDisk(const CDiskBlockPos&)() : deserialize or I/O error
2014-02-09 21:36:05 ERROR: bool CBlock::ReadFromDisk(const CDiskBlockPos&)() : deserialize or I/O error
2014-02-09 21:36:05 getblocks 69822 to dc942a7dacaab584ea8250ea04eb043671ea3a8e811d3139fc3744db6e3d036f limit 500
2014-02-09 21:36:05   getblocks stopping at limit 70321 8eee09e3ac0198ad76d64d894e28a16ccee2fd6e15dcad56b2776f32180f629d
2014-02-09 21:36:05 getblocks 69822 to 2c69ec48b6f641e40da8b3d2df6a7cc4a84644e8b89d3d74171e1e7a439d2d63 limit 500
2014-02-09 21:36:05   getblocks stopping at limit 70321 8eee09e3ac0198ad76d64d894e28a16ccee2fd6e15dcad56b2776f32180f629d
2014-02-09 21:36:05 getblocks 68651 to 0000000000000000000000000000000000000000000000000000000000000000 limit 500
$ 

So the problem is still present in v1.5.1 unfortunately.

@rnicoll
Copy link
Contributor

rnicoll commented Apr 24, 2014

@dooglus Any sign of this in 1.6/1.7?

@dooglus
Copy link
Contributor Author

dooglus commented Apr 24, 2014

I've not tried 1.7 (didn't know there was a 1.7, actually), and haven't had any trouble with 1.6.

I've tried to limit the number of times I've started and stopped the server though to reduce the number of problems I have.

@rnicoll
Copy link
Contributor

rnicoll commented Apr 24, 2014

1.7 is in beta, but ready for widespread testing, so trying to encourage anyone who's seen issues before to check 1.7 :)

Can this be closed, or want to leave it open a bit longer in case it recurs?

@dooglus
Copy link
Contributor Author

dooglus commented Apr 24, 2014

I can still post to it if it's closed, so that's good enough for me.

@rnicoll rnicoll closed this as completed Apr 24, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants