Add an address-based transaction index #2802

sipa · 2013-06-27T22:50:29Z

Changes:

Maintain a salt to perturbate the address index (protection against collisions).
Add support for address index entries in the block index, and maintain those if -addrindex is specified. It indexes the use of every >8-byte data push in output script or consumed script - or in case of no such push, the entire script.
Add a searchrawtransactions RPC call, which can look up raw transactions by address.

I both hate and love this patch. I hate making it easy to build infrastructure that relies on a fully-indexed blockchain (which shouldn't be necessary), as it may remove the incentive to build more scalable solutions. On the other hand, in cases where the alternative is relying on a trusted third party, this approach is certainly preferable, and would allow an RPC-based blockexplorer, for example.

A full -txindex=1 -addrindex=1 index is about 2.7 GiB now.

mikehearn · 2013-06-28T14:24:36Z

I'm sure you'll hate this idea even more, but why not expose searchrawtransactions via P2P and advertise it in a service bit? "fast import of a private key" is a pretty common feature request for SPV wallets. I don't think it's particularly useful myself, but apparently other people do and they go and use blockchain.info currently.

sipa · 2013-06-28T14:37:51Z

We've had this discussion. #2168

petertodd · 2013-06-28T19:56:07Z

There needs to be some way to find out how many entries are matched by a given address to allow blockexplorer apps's to skip to the most recent transactions and display them first if requested. Failing that, at least have Python-style index ranges to allow you to start from the last entry. Yes, address reuse sucks etc. etc. but it's a feature blockchain.info has and people will want it.

I like the search by PUSHDATA; nice to be able to search for 13MH4zmU4UT4Ct6BhoRFGjigC8gN9a9FNn and see all my multisig addresses. It may be worthwhile to also add scriptSigs to the database as well so that spent multisig P2SH's can be identified.

EDIT: interesting, looks like we have a ExtractDestination() or similar failure with bare OP_CHECKMULTISIG...

sipa · 2013-06-28T19:59:34Z

@petertodd Thanks for catching those two comments.

There is a count and a skip argument, so you can paginate results (the database is queried each time, but the relevant transactions are only read from disk when actually requested).

petertodd · 2013-06-28T20:01:33Z

I know you can paginate; I'm saying that you need to be able to paginate in reverse direction. (my understanding is the results are in order of confirmations right?)

sipa · 2013-06-28T20:07:56Z

@petertodd They are in increasing position on disk, which for now corresponds to number of confirmations (except for transactions in sidechains - yes, those are returned too). I could just reverse the order, I guess.

One problem for the future is that when we'll have headers-first sync and parallel block downloading, block order on disk won't be necessarily consistent anymore with chain order. A solution is of course fetching all blocks, and sorting them by confirmations before pagination, but that's a very significant overhead. Another solution is storing the height of each entry in the index...

petertodd · 2013-06-28T20:10:18Z

re: sidechains, interesting! That should be documented...

Gah, that does add a decent amount of complexity, although users are going to see it as important that they can see their latest satoshidice crap. :( I don't have a great solution here, although I would lean towards storing height if that's what it takes.

sipa · 2013-06-28T20:13:14Z

I don't mind the complexity of adding that (though it probably adds several 100 MiB right now already), but I'm really against using an address index as a way to track new payments (which is what this basically boils down to, right?) - you should just have a wallet that watches the chain for that...

petertodd · 2013-06-28T20:15:54Z

Well, maybe this index is actually slightly premature, and what we need more immediately is a way to search just the UTXO set?

petertodd · 2013-06-28T20:17:03Z

Or, think in terms of searchrawtransactions should have UTXO-only and full-chain modes?

sipa · 2013-06-28T20:17:58Z

@petertodd What I want it for is an easy local blockexplorer. And the nice thing about the block tree (and all its indexes) is that they're append only, which is easy implementation-wise.

I agree an address index to the UTXO is useful too, and probably less controversial. It's not implemented however :)

sipa · 2013-06-28T20:22:07Z

@petertodd Also, searching the UTXO set by address would have a very different interface anyway, as it's a set of transaction outputs, not a set of transactions. It wouldn't make sense in the same RPC command, IMHO.

petertodd · 2013-06-28T20:41:15Z

@sipa Good point.

Well, seems to me that for a blockexplorer simply being able to iterate forward and reverse should be enough for the UI, and at worse you can add the height index later when the order guarantee breaks. In that case:

searchrawtransactions <address> [skip=0] [count=100] [verbose=1]

Return count transactions with <address> present in their scriptSig,
skipping skip at the beginning. The ordering is oldest transaction first;
if skip is negative the order returned is newest transaction first and skip+1
transactions are skipped. If verbose=0 only txids are returned rather than
the full transactions.

luke-jr · 2013-06-28T21:28:30Z

How about using a 64-bit hash of the full script instead?

sipa · 2013-06-28T21:31:04Z

Use the source, Luke!

sipa · 2013-06-29T09:22:12Z

@luke-jr More seriously, every index entry adds around 10 bytes, and given the current amount of address reuse, those constitute the majority of the database. I've considered adding the entire script too, but that would mean a 50% increase (or more) because of that reason. Plus, if you know the full script, just pick the largest data push in it, and search for that.

sipa · 2013-06-29T11:26:23Z

@petertodd There is an alternative implementation possible, where we store (height, txoffset) for each index entry instead of (filenum, blockoffset, txoffset) - That's smaller too, and allows consistent ordering. The downside is that it can't support side-chain matches, as heights are not unique.

mikehearn · 2013-06-29T16:07:54Z

Yes, I know we had the discussion before. Once again, I'll remind us all that we don't have any real power in this. If people can't get the features they need in a decentralised way they'll create a centralised way instead, hence, blockchain.info API. The idea that people will say "oh there's no P2P command to read a blockchain index, guess I won't write that app after all" doesn't seem right to me.

luke-jr · 2013-06-29T16:38:03Z

@sipa I meant the 64-bit hash instead of the current 64-bit key. Any address can be converted to a script and hashed just fine.

johndillon · 2013-06-29T16:39:21Z

@mikehearn If you can't offer a feature in a decentralized way with reasonably low resource consumption you should be happy that centralized services pop up and offer it instead.

If you want to be useful design an API that allows you to pay via micro-transactions those resources you are using by querying a node that has gone to the expense of using @sipa's code to maintain a blockchain index. It would be easy to add this as a service bit and use preferential peering to make it possible for nodes to find peers supporting that API. Such a system would still be decentralized and resource consumption would be paid for in a fair and equitable way.

You do after have a brand new micro-transactions system...

sipa · 2013-06-29T16:43:52Z

@luke-jr That doesn't allow you to query pay-to-pubkeys given the corresponding address.

sipa · 2013-06-29T16:47:48Z

@mikehearn I'll agree to have an address index exposed to the P2P network if it can be done in an authenticated way, like Alan Reiner's proposal for exposing an address-indexed committed merkle tree, and even then only for the UTXO set, and not the entire history.

I'm completely opposed to providing any service on the P2P network that requires the entire history being available, except bootstrapping a new full node.

johndillon · 2013-06-29T16:48:52Z

@sipa My preference would be to support the full (filenum, blockoffset, txoffset) set myself. Whatever the additional space used is it doesn't seem like a big deal once you've decided to go to the effort of creating the index in the first place. Hard-drive space is cheap.

Incidentally, it does speak to how it would be useful to be able to iterate over every block stored, including orphans, and be able to deliberately add orphans to your database even after the fact. Heh, timestamp your orphan blocks and it would even be reasonable to maintain a set of every orphan ever created after the fact while allowing users to submit new blocks to this database even far into the future. (the timestamp is the anti-spam measure)

luke-jr · 2013-06-29T16:50:26Z

@sipa Neither does the specialized form you're suggesting...?

johndillon · 2013-06-29T16:52:19Z

@luke-jr It does because every PUSHDATA > 20 bytes is indexed by first computing Hash160(data).

Even individual multisigs in a bare OP_CHECKMULTISIG can be searched for as @petertodd pointed out.

petertodd · 2013-06-29T17:01:22Z

@sipa I'll second @johndillon's thoughts on the full (filenum, blockoffset, txoffset) index.

Clever idea re: an orphan database... It'd be useful to have an index of all children for a given block too, but that can be a different pull-req; by that point we'll have a -all-blockchain-indexes flag...

luke-jr · 2013-06-29T17:06:52Z

Aha, missed that part.

sipa · 2013-06-29T17:12:58Z

@petertodd To have a consistent ordering, we'd need (filenum, blockoffset, txoffset, height) even.

gastonmorixe · 2013-07-05T03:52:01Z

This is awesome 👍 please, merge it

sipa · 2013-07-13T23:05:14Z

Rebased, and added a stable ordering (height-based) and negative offsets, to simplify backward pagination.

1) Maintain a salt to perturbate the address index (protection against collisions). 2) Add support for address index entries in the block index, and maintain those if -addrindex is specified. It indexes the use of every >8-byte data push in output script or consumed script - or in case of no such push, the entire script. 3) Add a searchrawtransactions RPC call, which can look up raw transactions by address.

sipa · 2013-07-20T13:06:17Z

Rebased and fixed the bugs reported by @petertodd.

BitcoinPullTester · 2013-07-21T00:01:10Z

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/4790f3c823a33fae44b82ef7962372e38b1b0131 for binaries and test log.
This test script verifies pulls every time they are updated. It, however, dies sometimes and fails to test properly. If you are waiting on a test, please check timestamps to verify that the test.log is moving at http://jenkins.bluematt.me/pull-tester/current/
Contact BlueMatt on freenode if something looks broken.

sipa · 2013-07-26T19:43:00Z

@petertodd Feel free to test the pageability now.

petertodd · 2013-07-26T20:59:32Z

@sipa Will do after my litecoin audit's done.

petertodd · 2013-07-31T05:27:27Z

searchrawtransactions 1111111111111111111114oLvT2 doesn't work - looks like the issue is the first few lines in FindTransactionsByDestination() because the keyid is 0, which is correct in that case.

petertodd · 2013-07-31T05:39:23Z

I'm having good luck with the code otherwise - matches results on blockchain.info, and non-standard oddities like searching for OP_RETURN data work fine. Also results returned appear to all be in correct order.

petertodd · 2013-07-31T06:17:11Z

Getting some weird results with mwy5FX7MVgDutKYbXBxQG5q7EL6pmhHT58=Hash160('') on testnet, and the equivalent on mainnet. I'm seeing testnet txids a1f6a4ffcfd7bb4775790932aff1f82ac6a9b3b3e76c8faf8b11328e948afcca and 75f7d5e99912875e88d667afb48021b0b74916539c518618a8db4966661509df returned in the results - the former has a standard scriptPubKey, and a nonstandard scriptSig of "1", or hex "51". (IE push the number 1 to the stack) The latter has a totally empty scriptPubkey. Opcode 51 isn't within the range of opcodes 0 to PUSHDATA4, so BuildAddrIndex should have gone to the !fHaveData branch, but that returns Hash160(script), which is definitely not empty.

A search for a script of just opcode 52, Hash160('\x52') doesn't show this problem, (returns empty) and searching for script's without pushdata's like Hash160(''\x61') (op_nop) also works as expected. Oddly though Hash160('\x51\x51'), two pushes of the constant 1 to the stack, doesn't work and returns nothing even though there is a script exactly equal to that.

I'll be honest, I'm a bit mystified, although I haven't delved into a debugger yet.

gastonmorixe · 2013-07-31T06:41:22Z

I just wanted to thank all of you for this implementation. Specially sipa who hates it and he did wrote it :)

Here you can see my work in progress from my Rails App, I do all the parsing and get the balances. Thank you all.

http://cl.ly/QNUL

Ton

gavinandresen · 2013-08-13T06:37:58Z

ACK from me. I think we should pull; edge-case bugs with weird, non-standard transactions I don't care about.

gmaxwell · 2013-08-13T07:00:27Z

Only reason I see to delay the pull is that any fixes to address misindexing pointed out above may require a complete reindex.

petertodd · 2013-08-13T09:05:38Z

I disagree given the reason why these tx's aren't working isn't understood yet - could be a symptom of a subtle bug.

jgarzik · 2013-08-25T02:59:03Z

ACK

ghost · 2013-10-21T11:28:44Z

Would we be able to see this merged? There's no outstanding issues in it (I've run it for literally months now), and there'd be considerable benefit in having this in the master.

sipa · 2013-10-21T11:31:01Z

Seems there is a not-understood problem at least (as reported by @petertodd), and I don't plan to work on this any time soon.

sipa · 2013-10-21T11:37:47Z

In any case, I want to have watch-only wallet support before this, as it is a much more scalable solution for many problems that you would use an address-based index for. I'm closing this for now.

ghost · 2013-10-21T12:24:08Z

This did however allow for some quite handy local block explorers. Between Abe and BlockExplorer's respective database messes, you're looking 100+ GB of external databases and literally weeks of CPU time. Those not being an option, I'm back forced to query Blockchain.info and Blockexplorer for my data; which is slow and somewhat demonstrates the links between the various addresses to a third party. It also means that I'm reliant on two extremely unstable and latent services for something which -addrindex would have allowed locally.

I realise that I can still just continue on using -addrindex as my own patch, but it would be of benefit to others to have something like it available in the main builds for developers who aren't aware it exists.

sipa · 2013-10-21T12:28:12Z

Yes, I understand - that was mostly the reason for writing it. If the choice is between people using centralized indexing services and being able to run their own, I certainly would encourage the latter. But on the other hand, if the choice is between people building infrastructure that relies on such indexes being available for wallet services (preventing pruning later on), I certainly would encourage watch-only wallets.

In any case, I'm just closing it to clean up the pull request list. I'll probably get back to it at some point (or someone else may).

laanwj · 2013-10-21T12:36:40Z

@63 if you care a lot about this, why not pick it up yourself and try to resolve the remaining issue and submit a pull?

jmcorgan · 2014-02-11T19:41:45Z

I've refreshed this branch to work with current master and have issue a new pull request #3652.

Credits to sipa, based on the following resources: bitcoin#2802 sipa@fd867c7 sipa@4790f3c

Replace it with regular "governance" category limited LogPrint-s where it makes sense.

sipa closed this Oct 21, 2013

jmcorgan mentioned this pull request Feb 11, 2014

Adds transaction index by address (updates #2802 to current master) #3652

Closed

dexX7 mentioned this pull request May 4, 2014

Address indexed, based on v0.9.1ish jmcorgan/bitcoin#1

Closed

dexX7 added a commit to dexX7/bitcoin that referenced this pull request Jun 16, 2014

Adopt sipa's address-based index branch

402f643

Credits to sipa, based on the following resources: bitcoin#2802 sipa@fd867c7 sipa@4790f3c

dexX7 added a commit to dexX7/bitcoin that referenced this pull request Jun 16, 2014

Adopt sipa's address-based index branch

5095a0c

Credits to sipa, based on the following resources: bitcoin#2802 sipa@fd867c7 sipa@4790f3c

sipa mentioned this pull request Oct 8, 2015

Unobfuscate chainstate data in CCoinsViewDB::GetStats #6777

Merged

rnicoll mentioned this pull request Oct 15, 2015

Add address-based index #6835

Closed

marcinja mentioned this pull request Aug 24, 2018

Add address-based index (attempt 4?) #14053

Closed

Bushstar pushed a commit to Bushstar/omnicore that referenced this pull request Apr 5, 2019

Drop DBG macros uses from governance modules (bitcoin#2802)

5057ad5

Replace it with regular "governance" category limited LogPrint-s where it makes sense.

bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021

Add an address-based transaction index #2802

Add an address-based transaction index #2802

Conversation

sipa commented Jun 27, 2013

mikehearn commented Jun 28, 2013

sipa commented Jun 28, 2013

petertodd commented Jun 28, 2013

sipa commented Jun 28, 2013

petertodd commented Jun 28, 2013

sipa commented Jun 28, 2013

petertodd commented Jun 28, 2013

sipa commented Jun 28, 2013

petertodd commented Jun 28, 2013

petertodd commented Jun 28, 2013

sipa commented Jun 28, 2013

sipa commented Jun 28, 2013

petertodd commented Jun 28, 2013

luke-jr commented Jun 28, 2013

sipa commented Jun 28, 2013

sipa commented Jun 29, 2013

sipa commented Jun 29, 2013

mikehearn commented Jun 29, 2013

luke-jr commented Jun 29, 2013

johndillon commented Jun 29, 2013

sipa commented Jun 29, 2013

sipa commented Jun 29, 2013

johndillon commented Jun 29, 2013

luke-jr commented Jun 29, 2013

johndillon commented Jun 29, 2013

petertodd commented Jun 29, 2013

luke-jr commented Jun 29, 2013

sipa commented Jun 29, 2013

gastonmorixe commented Jul 5, 2013

sipa commented Jul 13, 2013

sipa commented Jul 20, 2013

BitcoinPullTester commented Jul 21, 2013

sipa commented Jul 26, 2013

petertodd commented Jul 26, 2013

petertodd commented Jul 31, 2013

petertodd commented Jul 31, 2013

petertodd commented Jul 31, 2013

gastonmorixe commented Jul 31, 2013

gavinandresen commented Aug 13, 2013

gmaxwell commented Aug 13, 2013

petertodd commented Aug 13, 2013

jgarzik commented Aug 25, 2013

ghost commented Oct 21, 2013

sipa commented Oct 21, 2013

sipa commented Oct 21, 2013

ghost commented Oct 21, 2013

sipa commented Oct 21, 2013

laanwj commented Oct 21, 2013

jmcorgan commented Feb 11, 2014