
Bitcoin SV Endpoint stuck at block 743584 #910

Closed
MerlinB opened this issue Jun 15, 2022 · 34 comments

@MerlinB

MerlinB commented Jun 15, 2022

For the past few days your Bitcoin SV endpoint has shown no new data and has been stuck at block 743584. Please fix this!

@Har01d
Member

Har01d commented Jun 16, 2022

Unfortunately, both our main and reserve nodes got corrupted (Error: Error: A fatal internal error occurred) for some reason, even though we didn't experience any technical issues like power outages. First we tried to reindex (which usually helps when the node database becomes corrupted), but no luck. Right now we're syncing from scratch; it seems like it will take a day or two more.

The problem is that we don't understand the root cause of the issue, and there's not much we can do about it: we need a node close to our database servers, because the blocks are very big and take time to download, so we can't use an external node provider. I also hope this is not some SV-specific issue, as the database is now over 4 terabytes and maybe some LevelDB or other limit is being hit.

Please also note that we'll be dropping full support with guaranteed data consistency for Bitcoin SV in the coming weeks. Please use another Bitcoin SV data provider if it's crucial to your business. The announcement will be made through our API versioning system: https://blockchair.com/api/docs#link_M06

I'll update this issue once there's some new information.

@cyrrile-mec

Hi!
Is there an update for this issue?

Thanks!

@Har01d
Member

Har01d commented Jun 17, 2022

No update yet, unfortunately, as the sync process takes ages given the blockchain size, and there are very few nodes close to our server's location (it seems like there are very few nodes in general, see https://blockchair.com/bitcoin-sv/nodes). Our developers estimate the sync will take at least one more day.

@brad1121

Can you confirm your excessiveblocksize is set higher than 4 GB? Say 10 GB (in bytes)?
Ensure you have enough RAM (>64 GB) and mempool allowance (e.g. maxmempool=25000).
Check you are not out of disk space; a full chain with txindex requires at least 4 TB as of writing.

If you reach out to me privately I can tell some of our nodes to add yours to assist with better connection to quality peers.
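For anyone following along, those settings live in the node's bitcoin.conf. A hypothetical sketch (the values are illustrative only; tune them to your own hardware):

```
# hypothetical bitcoin.conf sketch; values are illustrative only
excessiveblocksize=10000000000   # 10 GB, in bytes, above the current ~4 GB blocks
maxmempool=25000                 # mempool allowance, in MB
txindex=1                        # full transaction index (4+ TB of disk as of writing)
```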

@IntNulls

Wow. I don't see how BSV can expect adoption with issues like this happening. Block explorers for BSV are very few, with only 2 functioning right now. I can't see this being a good thing for anyone considering running a node for their business/enterprise environment (trust me, they want their own node and would rather not rely on someone else).

Hope this gets fixed soon, BC is an invaluable resource for information as other block explorers don't have the level of detail BC does.

@Daibp

Daibp commented Jun 17, 2022

Hello, I sent BSV to my d'cent wallet. D'cent tracks Blockchair, and so the BSV is not showing up in my d'cent wallet.

It is on the blockchain when I check whatsonchain.

When your update gets done, will it then show up in wallets like d'cent that check Blockchair? Otherwise I'm kinda stuffed.

@Har01d
Member

Har01d commented Jun 18, 2022

Can you confirm your excessiveblocksize is set higher than 4 GB? Say 10 GB (in bytes)? Ensure you have enough RAM (>64 GB) and mempool allowance (e.g. maxmempool=25000). Check you are not out of disk space; a full chain with txindex requires at least 4 TB as of writing.

There's obviously some database-related error (ERROR: ReadBlockFromDisk: Deserialize or I/O error - CAutoFile::read: end of file: iostream error at CBlockDiskPos for our main node; something similar happened with our reserve one a while ago, but we don't have the log).

The puzzle is that this issue should be connected to either a hard drive failure or an unclean shutdown, but we have registered neither. We run other nodes on these servers and we don't observe any issues with them.

And of course we've checked the obvious stuff, like whether we have enough disk space :) We've been running Bitcoin-like nodes for many years and have seen some shit. This may just be a very unlikely coincidence of two different hard drives failing, though.

Wow. I don't see how BSV can expect adoption with issues like this happening. Block explorers for BSV are very few, with only 2 functioning right now. I can't see this being a good thing for anyone considering running a node for their business/enterprise environment (trust me, they want their own node and would rather not rely on someone else).

I think the main issue with BSV in general is that it's not quite possible to predict the blockchain size at all. The current limit seems to be 4 GB, and people were actively testing to hit it. So potentially it's 4 * 144 = 576 GB of blockchain data every day. Plus indexes. Plus services like block explorers run their own databases (we actually run two, for extra speed and analytics). So for Blockchair this is potentially up to 60 terabytes a month just with the current limit (which is expected to get increased).
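As a rough sanity check of those numbers (a sketch; the 4 GB cap, ~144 blocks/day, and the number of copies are the figures assumed in the comment above):

```python
GB_PER_BLOCK = 4      # current excessive-block-size cap cited above
BLOCKS_PER_DAY = 144  # roughly one block every 10 minutes

per_day = GB_PER_BLOCK * BLOCKS_PER_DAY  # 576 GB/day of raw block data
per_month_tb = per_day * 30 / 1000       # ~17.3 TB/month per copy

# An explorer keeps several copies: the node's own data plus, as described
# above, two databases -- before even counting indexes.
copies = 3
total_tb = per_month_tb * copies         # ~51.8 TB/month, in the ballpark of
                                         # the "up to 60 TB a month" estimate
print(per_day, round(per_month_tb, 1), round(total_tb, 1))
```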

The second important issue is that if it were some useful data, like real transactions, real people would come to block explorers to see their transactions, businesses would buy API subscriptions, and we'd be able to cover the disk costs, the development costs, the cost of trying to figure out how to fit 10 exabytes into Postgres (not very trivial, I think), etc.

But the reality is that 99.99% or so of Bitcoin SV transactions are junk, so despite being the biggest Bitcoin-like blockchain with most transactions, Bitcoin SV constitutes only 0.3% of our visitor numbers and there are very few API clients using Bitcoin SV (0.2% of all API requests most of which are free API calls for the stats). Unfortunately, this doesn't cover all these costs. So that's why we can't run more than 2 nodes, and even these two nodes will get stuck at some point because we'll go bankrupt buying all these disks to store the junk data. But we're trying our best :)

With this amount of junk data I just don't see a business model for a BSV explorer which would work in the long term (maybe an explorer run by a miner?). The same goes for exchanges for example I think. If you have to buy 10 racks of servers to validate the blockchain, but you only have 10 clients paying trading fees, you'll go bankrupt.

When your update gets done, will it then show up in wallets like d'cent that check Blockchair? Otherwise I'm kinda stuffed.

Yes, we hope so. The sync process is somewhere around block #730000, and I hope it won't result in another corrupted database. We'll keep everyone updated.

@IntNulls

IntNulls commented Jun 18, 2022

@Daibp

Hello, I sent BSV to my d'cent wallet. D'cent tracks Blockchair, and so the BSV is not showing up in my d'cent wallet.

It is on the blockchain when I check whatsonchain.

When your update gets done, will it then show up in wallets like d'cent that check Blockchair? Otherwise I'm kinda stuffed.

See if you can get the private keys for the BSV wallet (IME, multi-coin wallets like Atomic expose privkeys for every coin) and import them into another wallet. You should be able to see your correct balance then.

@Daibp

Daibp commented Jun 19, 2022

@Daibp

Hello, I sent BSV to my d'cent wallet. D'cent tracks Blockchair, and so the BSV is not showing up in my d'cent wallet.
It is on the blockchain when I check whatsonchain.
When your update gets done, will it then show up in wallets like d'cent that check Blockchair? Otherwise I'm kinda stuffed.

See if you can get the private keys for the BSV wallet (IME, multi-coin wallets like Atomic expose privkeys for every coin) and import them into another wallet. You should be able to see your correct balance then.

Hello. Unfortunately, d'cent don't seem to be able to provide that (once I get this sorted I'm not using them again for that very reason!).

I'm pretty much stuck with my BSV in the ether till blockchair sort themselves out.

@Ljzn

Ljzn commented Jun 20, 2022

But the reality is that 99.99% or so of Bitcoin SV transactions are junk, so despite being the biggest Bitcoin-like blockchain with most transactions, Bitcoin SV constitutes only 0.3% of our visitor numbers and there are very few API clients using Bitcoin SV (0.2% of all API requests most of which are free API calls for the stats). Unfortunately, this doesn't cover all these costs.

It's OK to prune the pure data outputs, for example outputs with zero coins and a script like "OP_RETURN data". Those outputs make up the main part of large blocks. Unfortunately, the official node doesn't provide such a function out of the box.

@rzadeh

rzadeh commented Jun 20, 2022

@Har01d what version of BSV node were/are you running? The issue you describe above sounds like a known issue I had with an older version of the BSV node, a few versions ago.

Junk transactions or not, the BSV chain is carrying them, right? What other chains can do this volume today?

BC is an excellent Explorer, I use it daily, but if they are dropping full support for BSV I can recommend https://whatsonchain.com as it also has APIs etc.

@pfromberg

@Har01d. Three days ago your IBD processes reached block #730000. They should have reached the tip of the chain by now. Can you give us an update?

@Axiantor

Unfortunately, both our main and reserve nodes got corrupted (Error: Error: A fatal internal error occurred) for some reason, even though we didn't experience any technical issues like power outages. First we tried to reindex (which usually helps when the node database becomes corrupted), but no luck. Right now we're syncing from scratch; it seems like it will take a day or two more.

The problem is that we don't understand the root cause of the issue, and there's not much we can do about it: we need a node close to our database servers, because the blocks are very big and take time to download, so we can't use an external node provider. I also hope this is not some SV-specific issue, as the database is now over 4 terabytes and maybe some LevelDB or other limit is being hit.

Please also note that we'll be dropping full support with guaranteed data consistency for Bitcoin SV in the coming weeks. Please use another Bitcoin SV data provider if it's crucial to your business. The announcement will be made through our API versioning system: https://blockchair.com/api/docs#link_M06

I'll update this issue once there's some new information.

This is what you are looking for: https://www.bitcoinsv.io/liteclient

Also BSV is the only scalable blockchain. You think you'll have many visitors from other blockchains in a couple of years? Good luck with that.

@F1r3Hydr4nt

There is also this

@Har01d
Member

Har01d commented Jun 21, 2022

@Har01d. Three days ago your IBD processes reached block #730000. They should have reached the tip of the chain by now. Can you give us an update?

It took almost 3 days for the node to reach block 739304 and crash again with ERROR: ReadBlockFromDisk: Deserialize or I/O error - CAutoFile::read: end of file: iostream error at CBlockDiskPos. I still think this is somehow connected to the node size getting near 4 terabytes.

@Har01d what version of BSV node were/are you running? The issue you describe above sounds like a known issue I had with an older version of the BSV node, a few versions ago.

I have a suspicion this might be the issue. Our general policy for running nodes is not to upgrade unless there's a major change or a hard fork. This policy has saved us (or at least a lot of our time) numerous times. Sometimes minor upgrades bring major bugs (especially when there's no clear release schedule, LTS versions, etc.)

So we run a rather old version, but according to the changelog (https://github.com/bitcoin-sv/bitcoin-sv/releases) it should be fine. Newer versions don't run on our current system (we're getting N5boost10filesystem16filesystem_errorE errors, I suppose they are incompatible with our libboost-filesystem version or so).

There's a good chance, however, that there has been some breaking change not reflected in the changelog, so we're running a test. We're currently syncing two nodes on another server: an older version and the latest one. Hopefully in several days we'll have some outcome (if my hypothesis is correct, the older version will get corrupted and the newest will sync).

As of now we've been able to sync with 1.0.11 in another location, but this location is too distant from our database location. There are two issues with that:

  1. For larger blocks, getblock generates 20+ GB of JSON data. We don't currently have a way to transfer this in under 10 minutes.
  2. But we decided that it'd be better to process later than never and started to work with this node... only to face another issue: all connections to REST endpoints break after exactly 1 minute and 40 seconds of download time. It seems like there's some hardcoded 100-second limit, but we can't find where. Any tips?

The latter is somewhat similar to this open issue: bitcoin-sv/bitcoin-sv#258

Some things just break after a certain size if no one tests them.

The two nodes we're syncing now are close to our database location, so I hope once both the old and the latest version are synced (at least one of them!) we'll be finally able to populate our databases with new data.

This is what you are looking for: https://www.bitcoinsv.io/liteclient

No, we can't use a pruned / light / whatever-not-full node. This defeats the purpose of an explorer: to be able to query any blockchain entity since the genesis block.

Not to mention that its RPC API is most probably not compatible with the full client and some major things are missing, so it'd require fully rewriting our engine. You can't expect this to be done in 2 days :)

@rzadeh

rzadeh commented Jun 21, 2022

@Har01d I run the full BSV node on Ubuntu 20.04 LTS with the data store on an XFS filesystem, with the inode64 mount option in /etc/fstab.
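For reference, a minimal sketch of such an fstab entry (the device and mount point are hypothetical placeholders):

```
# hypothetical /etc/fstab entry; replace the device and mount point with your own
/dev/sdb1  /data/bsv  xfs  defaults,inode64  0  2
```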

@brad1121

brad1121 commented Jun 22, 2022 via email

@brad1121

brad1121 commented Jun 22, 2022 via email

@kyuupichan

kyuupichan commented Jun 22, 2022

2. But we decided that it'd be better to process later than never and started to work with this node... only to face another issue: all connections to REST endpoints break after exactly 1 minute and 40 seconds of download time. It seems like there's some hardcoded 100-second limit, but we can't find where. Any tips?

rpcservertimeout=300

and like Brad says, stream in binary not JSON

@kamk

kamk commented Jun 22, 2022

I'd also recommend a filesystem supporting snapshots for storage, such as ZFS or Btrfs. As the biggest part of the underlying DB is append-only data, there's no big overhead to using this technique. Also, recovery from a corrupted DB is relatively fast.

@bazylski

bazylski commented Jun 22, 2022

@Har01d Just turn it off, and save money for the winter. All the people that were using it are conveniently already here. You can inform them they just need to ask Craig on Twitter what's their balance from now on.

@rzadeh

rzadeh commented Jun 22, 2022

@Har01d Just turn it off, and save money for the long winter. All the people that were using it are conveniently already here. You can inform them they just need to ask Craig on Twitter what's their balance from now on.

@bazylski that would have been funny if you could speak English...

@Har01d
Member

Har01d commented Jun 22, 2022

We've been able to make a bridge between our distant node and the database server, so the block data is being slowly but surely populated into our databases (this is visible at https://blockchair.com/bitcoin-sv/blocks). We're still in the process of syncing two nodes (different versions) close to the database server to speed things up.

On that error, do you see any evidence in logs that your OS killed the process and bitcoind then restarted? Usually this happens when you run out of memory. It is the fastest way to corrupt the bitcoind blocks dir.

No, it's most certainly not an OOM issue. It syncs fine and then just fails:

2022-06-14 22:23:34 [bitcoin-worker1-N-CAsyncTaskPool] UpdateTip: new best=00000000000000000b603cb3045cb51b7e62e701229050db516f06bad6472f0f height=743732 version=0x20000000 log2_work=88.287883 tx=1163999565 date='2022-06-13 12:21:48' progress=0.999663 cache=4168.5MiB(9407570txo)
2022-06-14 22:23:38 [bitcoin-worker1-N-CAsyncTaskPool] Pre-allocating up to position 0x100000 in rev21134.dat
2022-06-14 22:23:38 [bitcoin-worker1-N-CAsyncTaskPool] UpdateTip: new best=000000000000000009f86662bdaf21f489bb3c26712c15c6f9df6886a6a5a99b height=743733 version=0x20002000 log2_work=88.287884 tx=1164000343 date='2022-06-13 12:22:42' progress=0.999663 cache=4168.6MiB(9407682txo)
2022-06-14 22:23:38 [bitcoin-worker1-N-CAsyncTaskPool] ERROR: ReadBlockFromDisk: Deserialize or I/O error - CAutoFile::read: end of file: iostream error at CBlockDiskPos(nFile=21135, nPos=8)
2022-06-14 22:23:38 [bitcoin-worker1-N-CAsyncTaskPool] *** Failed to read block
2022-06-14 22:23:38 [bitcoin-worker1-N-CAsyncTaskPool] Error: Error: A fatal internal error occurred, see bitcoind.log for details
2022-06-14 22:23:38 [bitcoin-worker1-N-CAsyncTaskPool] ERROR: operator(): ActivateBestChain failed
2022-06-19 03:05:15 [bitcoin-worker2-N-CAsyncTaskPool] UpdateTip: new best=000000000000000000465434ba0e6bfc06cc126dd6cbe589300812dc2733285d height=739303 version=0x20400000 log2_work=88.283936 tx=1152776707 date='2022-05-13 16:25:46' progress=0.991335 cache=3279.1MiB(10050084txo)
2022-06-19 03:05:17 [bitcoin-worker2-N-CAsyncTaskPool] Block 000000000000000010632a18ecfbb840e1ae5d4903c0864a94df613ac6e4e503 will not be considered by the current tip activation as a different activation is already validating it's ancestor and moving towards this block.
2022-06-19 03:05:17 [bitcoin-worker1-N-CAsyncTaskPool] Pre-allocating up to position 0x500000 in rev19132.dat
2022-06-19 03:05:17 [bitcoin-worker1-N-CAsyncTaskPool] UpdateTip: new best=0000000000000000110fb7aab63797fdab69e97f5b52b99f0cedf410a7090d8c height=739304 version=0x20000000 log2_work=88.283936 tx=1152782805 date='2022-05-13 16:41:37' progress=0.991338 cache=3280.0MiB(10051512txo)
2022-06-19 03:05:17 [bitcoin-worker1-N-CAsyncTaskPool] ERROR: ReadBlockFromDisk: Deserialize or I/O error - CAutoFile::read: end of file: iostream error at CBlockDiskPos(nFile=19139, nPos=1430988653)
2022-06-19 03:05:17 [bitcoin-worker1-N-CAsyncTaskPool] *** Failed to read block
2022-06-19 03:05:17 [bitcoin-worker1-N-CAsyncTaskPool] Error: Error: A fatal internal error occurred, see bitcoind.log for details
2022-06-19 03:05:17 [bitcoin-worker1-N-CAsyncTaskPool] ERROR: operator(): ActivateBestChain failed

and so on. After restart it successfully rewinds and verifies the latest blocks, and then crashes with the very same error.

You should move away from JSON RPC for getting blocks. 20 GB of JSON means an additional 20 GB of memory needed to pass it around. bitcoind has a REST interface (rest=1) which also does data streaming (hex or binary). For fetching large datasets this is far more efficient.

Yes, we already use the REST interface, but our engine works with JSON only. Theoretically, it's possible to rewrite the engine to work with binary data directly, but that's not feasible business-wise at this point.

@Har01d
Member

Har01d commented Jun 22, 2022

Theoretically, it's possible to rewrite the engine to work with binary data directly, but that's not feasible business-wise at this point.

And even if it's not the node that needs an additional 20 GB to pass the JSON around, it'd be our engine that requires the very same amount of memory to parse and validate the binary data. So I don't see what the gain is here, to be honest.

@brad1121

You do it in chunks as it comes through; streaming it all just to put it back into JSON makes no sense. The protocol doc spells out how to parse the raw data.
Process it as it comes in, and write out to your DB etc. on the fly. I'm not saying it is trivial, especially when JSON works fine for small stuff, but it's faster than dealing with JSON structures. If you're storing it as JSON in a NoSQL DB, then that's another issue...

@SteffenNS

https://blockchair.com/bitcoin-sv/blocks seems to be stuck at block 744,642.

Maybe it is time to move away from your "rather old version" of the node software if you haven't done so yet. You said you were "able to sync with 1.0.11 in another location" so the 1.0.11 version will probably also work in your main location.

@Har01d
Member

Har01d commented Jun 27, 2022

It's not stuck; it just takes several hours to process a 4 GB block (744643) with the current network configuration.

Re: syncing a new node in the main location. We started the process a week ago, and it's at block 734000 now. Considering that's 3.4 TB out of 4 TB, I think it'll take 2-3 more days.

Right now we're syncing 3 nodes as a part of our experiment and it feels like this is slowing down the process a bit (compared to when we were resyncing just 1) as there are only ~10 fully synced nodes on the network (see https://blockchair.com/bitcoin-sv/nodes) and most probably we're just hitting their speed limits.

This is also a theoretical (or, in this case I'd say, practical) attack vector IMO. If there are only 10 full nodes on the network with a 4 TB blockchain, an attacker can easily spawn 100 nodes, which would overwhelm the network. An "honest miner" won't be able to fully sync from scratch in a reasonable amount of time, if ever. Basically this is a DoS which easily exhausts the bandwidth. It seems like this is only solvable via whitelists, but that goes against the decentralized nature of the network. Another option is "pay to download", which is solvable with either PoW or its derivative (pay with BSV directly), but this is somewhat cringe too.

@Axiantor

Imagine when the network is too light for attackers to be unable to perform a malleability attack, but too heavy for a service to run. A bit of reality: https://twitter.com/connolly_dan/status/1502951300581007360?s=20&t=TIvaQcg6DICLV1preLYMug

@Daibp

Daibp commented Jun 29, 2022

Update: My BSV is now appearing in my d'cent wallet (which checks blockchair)

@Har01d
Member

Har01d commented Jun 30, 2022

We've been able to confirm an issue with the nodes. Basically, there was a never-announced hard fork which not only cut off old-version nodes from the network, but also corrupted their databases. If I have time, I'll provide more details.

It seems like it's a widespread issue, as Blockchair is not the only data provider which got stuck because of it; we see more stuck nodes than usual. Some block explorers, such as https://bsv.tokenview.com/ (which also crashed two weeks ago, at block 744065), may be affected by the same thing.

Edit. Clarifying: it's not the hard fork itself which corrupted the old nodes; it's kinda vice versa. At some point this database issue (which I still suppose has something to do with going over 4 terabytes) was fixed with an update, but it seems like it was never announced as a mandatory hard-forking change.

@Axiantor

We've been able to confirm an issue with the nodes. Basically, there was a never-announced hard fork which not only cut off old-version nodes from the network, but also corrupted their databases. If I have time, I'll provide more details.

It seems like it's a widespread issue, as Blockchair is not the only data provider which got stuck because of it; we see more stuck nodes than usual. Some block explorers, such as https://bsv.tokenview.com/ (which also crashed two weeks ago, at block 744065), may be affected by the same thing.

Edit. Clarifying: it's not the hard fork itself which corrupted the old nodes; it's kinda vice versa. At some point this database issue (which I still suppose has something to do with going over 4 terabytes) was fixed with an update, but it seems like it was never announced as a mandatory hard-forking change.

Since the genesis update no change in consensus rules has happened. You can check it yourself. https://www.bitcoinsv.io/releases

@Har01d
Member

Har01d commented Jun 30, 2022

You can check it yourself. https://www.bitcoinsv.io/releases

This is exactly what I'm saying: there's been a hard-forking change, but nothing was announced. Most probably it went unnoticed, as there are too few node operators and there was no proper testing. You can't sync from scratch using 1.0.0 anymore (at least we couldn't; it was the reason our nodes crashed, and most probably some other data providers stalled because of this as well).

The situation is very similar to CVE-2013-3220 in Bitcoin from 2013: https://en.bitcoin.it/wiki/BIP_0050. That was a very similar split, caused by switching from Berkeley DB to LevelDB.

@F1r3Hydr4nt

There was no protocol hard fork; the node update removed the dust limit, which was not part of the very first Bitcoin client's codebase, nor is it a consensus change; miners still have the option to mine whatever they like. Furthermore, there was an announcement.

As somebody mentioned previously, you need to prepare for scale so a blockchain can support global levels of adoption and surpass VISA etc. JSON approaches will need to be dropped in favour of streaming raw data to reach a peak load of 1 trillion transactions per second.

@Har01d
Member

Har01d commented Jul 13, 2022

So far our explorer has been running fine for the past two weeks using v.1.0.11, so I'm closing this thread. Further announcements regarding Bitcoin SV support will be made through the changelog.

the node update removed the dust limit

The issue had nothing to do with the dust limit; it was a node database failure. Once again, and we've double-checked this: it's not possible to sync to the tip using some older versions anymore (a clean test was performed using v1.0.0).

Bitcoin SV developers failed to mention there was a mandatory upgrade (which in turn, by definition, was a software-level hard-forking change, since some older nodes are not compatible with newer ones). This caused users with older versions, including Blockchair, to fall out of sync.
