[pull request idea] addressindex, spentindex, timestampindex (Bitcore patches) #10370

karelbilek · 2017-05-09T12:19:35Z

We (SatoshiLabs) use bitcore for our backend; and bitcore depends on a fork of bitcoin-core, that adds additional indexes and some other functionality.

Since bitpay stopped updating the fork to newest bitcoin, effectively making bitcore unable to see segwit, we keep rebasing the code to the latest releases. So I thought it wouldn't be a bad idea to try to actually put it into upstream :)

What the code does is adding additional indexes into bitcoin-core that enable to run full block explorer, like insight, and also adds some RPC calls for those indexes. I think it would be beneficial to the ecosystem at large to have these RPC calls/indexes available in upstream.

Most of the work has been done by @braydonf ; I just rebased the code (first to 0.14.* here - https://github.com/satoshilabs/bitcoin - and on master in this PR) and tested it with segwit. (The original, outdated bitcore code is at https://github.com/bitpay/bitcoin )

This is not meant as a definitive PR - it is too big and needs to be cleaned up and broken into smaller parts (it's obvious it has been rebased several times) - but more as a request for discussion; I want to know if I should start spending time with cleaning the commits up etc. or if you don't want this functionality in the bitcoin-core at all.

The added RPCs are:

getblockdeltas
getblockhashes
getaddressmempool
getaddressutxos
getaddressdeltas
getaddressbalance
getaddresstxids
getspentinfo

These calls are sufficient to build block explorer.

Adds new bitcoin.conf configuration options for three new indexes: -addressindex=1 -spentindex=1 -timestampindex=1 The addressindex records all changes to an address for retrieving txids, balances and unspent outputs for addresses. Changes are stored and sorted in block order. Both p2sh and p2pkh address types are supported. The index records two sets of key/value pairs. The first records all activity and is useful for viewing transaction history and all changes. The second is specifically for retrieving unspent outputs by address, and is smaller as values are removed once they are spent. The spentindex has multiple purposes and brings closer together inputs and outputs of transactions. The main purpose is to efficiently determine the address and amount of an input's previous output. The second purpose is to be able to determine which input spent an output. The timestampindex keeps track of timestamps with block hashes and is useful for searching blocks by date instead of by height. This is useful for a block explorer that will give search options by date. The index uses logical time correction to make sure that the results are sorted in block order. The logical time of a block is actual timestamp of the block, unless it is less than (earlier) the previous block's logical time, and in that case it is one second greater than the previous block's logical time. Includes logical time fix by Chethan Krishna Conflicts: src/main.cpp src/main.h src/txdb.cpp src/txdb.h src/txmempool.h

Adds new rpc commands that use the address, spent and timestamp indexes, including the new commands: - getblockdeltas - getblockhashes - getaddressmempool - getaddressutxos - getaddressdeltas - getaddressbalance - getaddresstxids - getspentinfo And modifications to the command: - getrawtransaction Conflicts: src/rpc/blockchain.cpp src/rpc/client.cpp src/rpc/misc.cpp src/rpc/rawtransaction.cpp src/rpc/server.cpp src/rpcserver.h

Tests the functionality of the indexes as well as the rpc commands

There was a previous assumption that blockindex would be quite small. With addressindex and spentindex enabled the blockindex is much larger and the amount of cache allocated for it should also increase. Furthermore, enabling compression should decrease the amount of disk space required and less data to write/read. The default leveldb max_open_files is set to 1000, for the blockindex the default is set to 1000 with compression. The 64 value that is current is kept for the utxo database and does not enable compression. Two additional options are added here to be able to configure the values for leveldb and the block index: - `-dbmaxopenfiles` A number of files for leveldb to keep open - `-dbcompression` Boolean 0 or 1 to enable snappy leveldb compression Conflicts: src/dbwrapper.cpp src/init.cpp

karelbilek · 2017-05-09T12:28:18Z

The tests seem to be failing on some rebased code.

But as I said, I added this PR as a general discussion request. If there is an interest, I will clean this up.

jonasschnelli · 2017-05-09T13:35:07Z

Having an independent address-index available somewhere may be handy and in general a great thing. Though, I'm not sure if we want to add this to the Bitcoin-Core repository. It already contains (too) much. Wallet, utilities, txindex, UI, p2p, consensus.

I would prefer to keep the address-index (and all other sorts of indexes) in other repositories.
Since Core offers ZMQ, creating an address-index via a python script/application (with it's independent leveldb dababase/layer) should not be very complex.
The python application could then handle the additional RPC commands.

If there are missing ZMQ features (I guess there is still an open PR for mempool ZMQ notifications), we should rather add those instead of the indexes.

But I'm open for other opinions.

karelbilek · 2017-05-09T13:58:21Z

I am not sure about the exact reasoning (cc @braydonf again), but older versions of Bitcore (< 3) had something similar - unpatched bitcoin core with additional indexes in node.js (see https://github.com/bitpay/bitcore-node/tree/v2.1.1/lib/services ) - but in Bitcore 3, they instead moved to including the address index in the bitcoin core itself.

Perhaps it is easier to maintain one database over 2 separate databases? (I am guessing here)

dcousens · 2017-05-09T14:04:49Z

@Runn1ng have you looked at https://github.com/bitcoinjs/indexd?
It has all the indexes listed above, and it works seamlessly with an unpatched bitcoin-core RPC.

karelbilek · 2017-05-09T14:10:10Z

@dcousens thanks for noticing me! I will definitely look at that.

braydonf · 2017-05-09T14:13:07Z

There was a significant performance gain by building the indexes at the same time as the txindex, since each block only needed to be read and deserialized once.

TheBlueMatt · 2017-05-09T14:19:09Z

See-also #9806 (and #8660 and #5048). I think the jury might still be out on whether we want more indexes supported in Bitcoin Core.

dcousens · 2017-05-09T14:20:17Z

@braydonf at ~400ms per 1MB block... that really shouldn't be an issue - and the above is Javascript.
If you're concern is related to the initial import, tools like https://github.com/dcousens/fast-dat-parser can dump the leveldb as fast as it can push IO.

IMHO I don't think bitcoind should support any further indexes...

braydonf · 2017-05-10T04:55:32Z

In the past, we had implemented indexes in several ways, as mentioned, and found this to be the best option. I do think additional indexes, if needed, should be implemented within a full/spv node, the closeness reduces the overall complexity and duplication of disk and CPU usage. It otherwise ends up being what is effectively two nodes next to each other. The point that I realized this was when it was necessary to keep what is effectively another UTXO set for determining the address for inputs.

As far as the utility of the indexes, and if they should be included in Bitcoin Core: I'd like to see the ability for an address index to be pruned (not included in this current implementation). An address index of the UTXO set and memory pool only, for example, may enable a pruned Bitcoin QT wallet to not need to rescan (and download) all blocks when adding new keys to a wallet. This could support many different hardware/offline wallets, with addresses and keys that the full node may not be aware. And in a way that doesn't depend on remote servers. This could also be useful for accepting bitcoin payments in an environment that the keys may change frequently e.g. on a web server, while also reducing the setup costs.

As far as specific implementation details here, there may be some work here still: The indexes may ideally be implemented with a B/B+ tree in C (e.g. LMDB, WiredTiger, and etc.), the LSM tree of LevelDB is useful for smaller write heavy sets of data that are not queried often. The write amplification that happens in block compaction, especially when the database reaches 100+ GB can become problematic. This implementation of indexes is impacted by this issue, however it's mostly revolved by running on SSD and could be helped by running manual compaction (also not implemented here). That same issue may eventually affect the UTXO database as it grows in size.

dcousens · 2017-05-10T05:25:43Z

the closeness reduces the overall complexity and duplication of disk and CPU usage

If you are using a pruned or SPV node, you really shouldn't be suffering from disk duplication problems.
The CPU usage for an index is near completely negligible compared to your IO.
You don't need to verify anything if you trust your data source (your node).

An address index ... [would be optimal] when adding new keys to a wallet

Personally, I would only concept ACK this if it was completely contained to src/wallet.

As far as I'm concerned, bitcoind is my source of truth.
Any "indexes" are purely convenient views of that truth - and therefore as a matter of single responsibility - they should be isolated.

I don't see how there could be any significant IO inequal to that as if it was inlined within bitcoind - short of your local pipe and RPC overhead - which, as far as I have measured - is completely negligible compared to the disk IO.
If any effort was to be made in the name of "performance", it should be focused on a high performance binary pipe, not just moving cruft into bitcoind for the sake of convenience.

...reducing the setup costs...

I agree that the lack of convenient setup/configuration for these services is not great, but that doesn't mean we have to merge it all into a single binary.

The write amplification that happens in block compaction, especially when the database reaches 100+ GB can become problematic

I don't see how an inline index would alleviate this.

junderw · 2017-05-10T10:19:21Z

Allowing Bitcore-Wallet-Service to run on top of a vanilla bitcoind + bitcore (no patched binaries needed during install) would be extremely useful.

priestc · 2017-05-16T12:41:38Z

I just want to say that a UTXO database index in bitcoin core is a very good idea in my opinion.

Keep in mind also there is BIP131, which requires a index on the UTXO set to be implemented. If BIP131 gets implemented, the UTXO database no longer becomes optional (as it will be required to validate new transactions), but since the UTXO index is relatively tiny compared to the txindex I don't think its a problem.

In the future, I see the wallet features and UI elements removed from bitcoin core, and backend stuff like indexes taking it's place.My opinion is we should go hog wild and add as many (optional) indexes as conceivable.

jtimon · 2017-07-18T23:19:05Z

As said in #9806 I'm not opposed to an additional optional index similar to -txindex. For example "-scriptpubkeyindex".
But instead I think it's more generic to use the scriptPubKey directly as key instead of addresses.
The translation from address to script is simple for p2pkh (and other script templates) can be done externally or maybe internally too as an additional rpc call.

The minimum necessary for an explorer I think would be -scriptpubkeyindex plus a call searchscriptpukey that takes the hex of a scriptpubkey and returns a list of outpoints (perhaps including the block hash, see #10275 ).
From there, one can call getrawtransaction and/or gettxout can be used. But this is just the minimal design, not the optimal or more convenient. At the same time, I think the list currently in the OP is too much.

But for the block explorer use case I assume no pruning and serving spent transactions too, which will result in a big index and code that I'm sure some people won't be very happy to help maintain. Even if that's the case, I'm interested in this use case and I may be interested in rebasing the same resulting branch from major version to major version.
I want to note that "I run my own explorer locally for privacy reasons" is a perfectly valid use case, not only for this index, but also for non pruned full nodes in general (and we shouldn't disregard potential reasons for users to want to run archival nodes, beyond "helping the network").

Starting with a branch that only indexes the unspent scriptpupkeys as @braydonf suggests may be much more acceptable (and whether or not the explorer part is acceptable, the diff to maintain will be much smaller if that gets accepted). Not only because the index would be much smaller and because it doesn't conflict with pruning in any way, but also because the use case of removing the need for rescan is much more compelling.

If you are using a pruned or SPV node, you really shouldn't be suffering from disk duplication problems.

Sure, the explorer and norescan are different use cases.
Even non pruned full nodes could benefit from the latter, for example, allowing to import private keys leveraging the index, maybe with an additional norescan option that allows you to only look in the scriptpubkeyUtxoIndex and completely avoid a rescan to get info about already spent outputs that key once controlled.

You don't need to verify anything if you trust your data source (your node).

Pruned or not, you don't need to trust your data source if you are already a full node. I think the SPV use case is just adding confusion, the utxo-only index can be very useful without taking SPV nodes into account at all.

I agree that the lack of convenient setup/configuration for these services is not great, but that doesn't mean we have to merge it all into a single binary.

Absolutely, adding the index doesn't mean we should also add every potential call that would make use of it. Even if that would be more convenient, marginal use cases can perhaps go through less direct routes like implementing some things on their side or calling the rpc twice instead of once to get some data they want, which is may not be a great penalty in performance depending on the case.

karelbilek · 2017-08-17T22:26:14Z

I am closing this PR. We are running bitcoin with these custom patches on our servers and while it is working so far, it is not very maintainable long-term and the leveldb performance is subpar for the addressindex.

External index is the only solution.

Braydon Fuller and others added 12 commits May 9, 2017 12:36

tests: adds rpc tests for address, spent and timestamp indexes

7424114

Tests the functionality of the indexes as well as the rpc commands

tests: adds unit test for IsPayToPublicKeyHash method

9ae322d

indexes: refactoring and fixes applying changes to 0.13

1b99ecb

indexes: additional logging and checks for indexes

a74235a

tests: update rpc index tests for 0.13

95d72be

tests: test dbwrapper options compression and maxopenfiles

bc474a1

tests: include and fix txindex test with rpc tests

06726dd

build: include timestampindex.h in makefile

74a8501

Bitcore 0.14.0 fixes

7a4ad0c

jonasschnelli mentioned this pull request May 15, 2017

How to get external address transaction information？ #10401

Closed

karelbilek mentioned this pull request Jul 27, 2017

addressindex, spentindex, timestampindex (Bitcore patches) Bitcoin-ABC/bitcoin-abc#34

Closed

karelbilek mentioned this pull request Aug 17, 2017

Rebase bitcore patches to 0.14 bitpay/bitcoin#43

Closed

karelbilek closed this Aug 17, 2017

jonasschnelli mentioned this pull request May 12, 2018

Motives unclear jonasschnelli/bitcoincore-indexd#5

Open

ch4ot1c mentioned this pull request Aug 20, 2018

Significant updates enabling explorer, includes mempool and address indexing methods z-classic/zclassic#169

Closed

This was referenced Sep 3, 2018

Bitcore integration z-classic/zclassic#182

Closed

Cleanup and Merge of Bitcore-integration z-classic/zclassic#184

Open

karelbilek mentioned this pull request Oct 5, 2018

Add address-based index (attempt 4?) #14053

Closed

cryptozeny mentioned this pull request Dec 24, 2019

merge AddressIndex into master sugarchain-project/sugarchain#40

Open

bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[pull request idea] addressindex, spentindex, timestampindex (Bitcore patches) #10370

[pull request idea] addressindex, spentindex, timestampindex (Bitcore patches) #10370

karelbilek commented May 9, 2017 •

edited

Loading

karelbilek commented May 9, 2017

jonasschnelli commented May 9, 2017

karelbilek commented May 9, 2017 •

edited

Loading

dcousens commented May 9, 2017 •

edited

Loading

karelbilek commented May 9, 2017

braydonf commented May 9, 2017

TheBlueMatt commented May 9, 2017

dcousens commented May 9, 2017 •

edited

Loading

braydonf commented May 10, 2017

dcousens commented May 10, 2017 •

edited

Loading

junderw commented May 10, 2017

priestc commented May 16, 2017

jtimon commented Jul 18, 2017

karelbilek commented Aug 17, 2017

[pull request idea] addressindex, spentindex, timestampindex (Bitcore patches) #10370

[pull request idea] addressindex, spentindex, timestampindex (Bitcore patches) #10370

Conversation

karelbilek commented May 9, 2017 • edited Loading

karelbilek commented May 9, 2017

jonasschnelli commented May 9, 2017

karelbilek commented May 9, 2017 • edited Loading

dcousens commented May 9, 2017 • edited Loading

karelbilek commented May 9, 2017

braydonf commented May 9, 2017

TheBlueMatt commented May 9, 2017

dcousens commented May 9, 2017 • edited Loading

braydonf commented May 10, 2017

dcousens commented May 10, 2017 • edited Loading

junderw commented May 10, 2017

priestc commented May 16, 2017

jtimon commented Jul 18, 2017

karelbilek commented Aug 17, 2017

karelbilek commented May 9, 2017 •

edited

Loading

karelbilek commented May 9, 2017 •

edited

Loading

dcousens commented May 9, 2017 •

edited

Loading

dcousens commented May 9, 2017 •

edited

Loading

dcousens commented May 10, 2017 •

edited

Loading