Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an address-based transaction index #2802

Closed
wants to merge 2 commits into from
Closed

Conversation

sipa
Copy link
Member

@sipa sipa commented Jun 27, 2013

Changes:

  • Maintain a salt to perturbate the address index (protection against collisions).
  • Add support for address index entries in the block index, and maintain those if -addrindex is specified. It indexes the use of every >8-byte data push in output script or consumed script - or in case of no such push, the entire script.
  • Add a searchrawtransactions RPC call, which can look up raw transactions by address.

I both hate and love this patch. I hate making it easy to build infrastructure that relies on a fully-indexed blockchain (which shouldn't be necessary), as it may remove the incentive to build more scalable solutions. On the other hand, in cases where the alternative is relying on a trusted third party, this approach is certainly preferable, and would allow an RPC-based blockexplorer, for example.

A full -txindex=1 -addrindex=1 index is about 2.7 GiB now.

@mikehearn
Copy link
Contributor

I'm sure you'll hate this idea even more, but why not expose searchrawtransactions via P2P and advertise it in a service bit? "fast import of a private key" is a pretty common feature request for SPV wallets. I don't think it's particularly useful myself, but apparently other people do and they go and use blockchain.info currently.

@sipa
Copy link
Member Author

sipa commented Jun 28, 2013

We've had this discussion. #2168

@petertodd
Copy link
Contributor

There needs to be some way to find out how many entries are matched by a given address to allow blockexplorer apps's to skip to the most recent transactions and display them first if requested. Failing that, at least have Python-style index ranges to allow you to start from the last entry. Yes, address reuse sucks etc. etc. but it's a feature blockchain.info has and people will want it.

I like the search by PUSHDATA; nice to be able to search for 13MH4zmU4UT4Ct6BhoRFGjigC8gN9a9FNn and see all my multisig addresses. It may be worthwhile to also add scriptSigs to the database as well so that spent multisig P2SH's can be identified.

EDIT: interesting, looks like we have a ExtractDestination() or similar failure with bare OP_CHECKMULTISIG...

@sipa
Copy link
Member Author

sipa commented Jun 28, 2013

@petertodd Thanks for catching those two comments.

There is a count and a skip argument, so you can paginate results (the database is queried each time, but the relevant transactions are only read from disk when actually requested).

@petertodd
Copy link
Contributor

I know you can paginate; I'm saying that you need to be able to paginate in reverse direction. (my understanding is the results are in order of confirmations right?)

@sipa
Copy link
Member Author

sipa commented Jun 28, 2013

@petertodd They are in increasing position on disk, which for now corresponds to number of confirmations (except for transactions in sidechains - yes, those are returned too). I could just reverse the order, I guess.

One problem for the future is that when we'll have headers-first sync and parallel block downloading, block order on disk won't be necessarily consistent anymore with chain order. A solution is of course fetching all blocks, and sorting them by confirmations before pagination, but that's a very significant overhead. Another solution is storing the height of each entry in the index...

@petertodd
Copy link
Contributor

re: sidechains, interesting! That should be documented...

Gah, that does add a decent amount of complexity, although users are going to see it as important that they can see their latest satoshidice crap. :( I don't have a great solution here, although I would lean towards storing height if that's what it takes.

@sipa
Copy link
Member Author

sipa commented Jun 28, 2013

I don't mind the complexity of adding that (though it probably adds several 100 MiB right now already), but I'm really against using an address index as a way to track new payments (which is what this basically boils down to, right?) - you should just have a wallet that watches the chain for that...

@petertodd
Copy link
Contributor

Well, maybe this index is actually slightly premature, and what we need more immediately is a way to search just the UTXO set?

@petertodd
Copy link
Contributor

Or, think in terms of searchrawtransactions should have UTXO-only and full-chain modes?

@sipa
Copy link
Member Author

sipa commented Jun 28, 2013

@petertodd What I want it for is an easy local blockexplorer. And the nice thing about the block tree (and all its indexes) is that they're append only, which is easy implementation-wise.

I agree an address index to the UTXO is useful too, and probably less controversial. It's not implemented however :)

@sipa
Copy link
Member Author

sipa commented Jun 28, 2013

@petertodd Also, searching the UTXO set by address would have a very different interface anyway, as it's a set of transaction outputs, not a set of transactions. It wouldn't make sense in the same RPC command, IMHO.

@petertodd
Copy link
Contributor

@sipa Good point.

Well, seems to me that for a blockexplorer simply being able to iterate forward and reverse should be enough for the UI, and at worse you can add the height index later when the order guarantee breaks. In that case:

searchrawtransactions <address> [skip=0] [count=100] [verbose=1]

Return count transactions with <address> present in their scriptSig,
skipping skip at the beginning. The ordering is oldest transaction first;
if skip is negative the order returned is newest transaction first and skip+1
transactions are skipped. If verbose=0 only txids are returned rather than
the full transactions.

@luke-jr
Copy link
Member

luke-jr commented Jun 28, 2013

How about using a 64-bit hash of the full script instead?

@sipa
Copy link
Member Author

sipa commented Jun 28, 2013

Use the source, Luke!

@sipa
Copy link
Member Author

sipa commented Jun 29, 2013

@luke-jr More seriously, every index entry adds around 10 bytes, and given the current amount of address reuse, those constitute the majority of the database. I've considered adding the entire script too, but that would mean a 50% increase (or more) because of that reason. Plus, if you know the full script, just pick the largest data push in it, and search for that.

@sipa
Copy link
Member Author

sipa commented Jun 29, 2013

@petertodd There is an alternative implementation possible, where we store (height, txoffset) for each index entry instead of (filenum, blockoffset, txoffset) - That's smaller too, and allows consistent ordering. The downside is that it can't support side-chain matches, as heights are not unique.

@mikehearn
Copy link
Contributor

Yes, I know we had the discussion before. Once again, I'll remind us all that we don't have any real power in this. If people can't get the features they need in a decentralised way they'll create a centralised way instead, hence, blockchain.info API. The idea that people will say "oh there's no P2P command to read a blockchain index, guess I won't write that app after all" doesn't seem right to me.

@luke-jr
Copy link
Member

luke-jr commented Jun 29, 2013

@sipa I meant the 64-bit hash instead of the current 64-bit key. Any address can be converted to a script and hashed just fine.

@johndillon
Copy link

@mikehearn If you can't offer a feature in a decentralized way with reasonably low resource consumption you should be happy that centralized services pop up and offer it instead.

If you want to be useful design an API that allows you to pay via micro-transactions those resources you are using by querying a node that has gone to the expense of using @sipa's code to maintain a blockchain index. It would be easy to add this as a service bit and use preferential peering to make it possible for nodes to find peers supporting that API. Such a system would still be decentralized and resource consumption would be paid for in a fair and equitable way.

You do after have a brand new micro-transactions system...

@sipa
Copy link
Member Author

sipa commented Jun 29, 2013

@luke-jr That doesn't allow you to query pay-to-pubkeys given the corresponding address.

@sipa
Copy link
Member Author

sipa commented Jun 29, 2013

@mikehearn I'll agree to have an address index exposed to the P2P network if it can be done in an authenticated way, like Alan Reiner's proposal for exposing an address-indexed committed merkle tree, and even then only for the UTXO set, and not the entire history.

I'm completely opposed to providing any service on the P2P network that requires the entire history being available, except bootstrapping a new full node.

@johndillon
Copy link

@sipa My preference would be to support the full (filenum, blockoffset, txoffset) set myself. Whatever the additional space used is it doesn't seem like a big deal once you've decided to go to the effort of creating the index in the first place. Hard-drive space is cheap.

Incidentally, it does speak to how it would be useful to be able to iterate over every block stored, including orphans, and be able to deliberately add orphans to your database even after the fact. Heh, timestamp your orphan blocks and it would even be reasonable to maintain a set of every orphan ever created after the fact while allowing users to submit new blocks to this database even far into the future. (the timestamp is the anti-spam measure)

@luke-jr
Copy link
Member

luke-jr commented Jun 29, 2013

@sipa Neither does the specialized form you're suggesting...?

@johndillon
Copy link

@luke-jr It does because every PUSHDATA > 20 bytes is indexed by first computing Hash160(data).

Even individual multisigs in a bare OP_CHECKMULTISIG can be searched for as @petertodd pointed out.

@petertodd
Copy link
Contributor

@sipa I'll second @johndillon's thoughts on the full (filenum, blockoffset, txoffset) index.

Clever idea re: an orphan database... It'd be useful to have an index of all children for a given block too, but that can be a different pull-req; by that point we'll have a -all-blockchain-indexes flag...

@luke-jr
Copy link
Member

luke-jr commented Jun 29, 2013

Aha, missed that part.

@sipa
Copy link
Member Author

sipa commented Jun 29, 2013

@petertodd To have a consistent ordering, we'd need (filenum, blockoffset, txoffset, height) even.

@gastonmorixe
Copy link

This is awesome 👍 please, merge it

@sipa
Copy link
Member Author

sipa commented Jul 13, 2013

Rebased, and added a stable ordering (height-based) and negative offsets, to simplify backward pagination.

1) Maintain a salt to perturbate the address index (protection against
   collisions).
2) Add support for address index entries in the block index, and
   maintain those if -addrindex is specified. It indexes the use of
   every >8-byte data push in output script or consumed script - or in
   case of no such push, the entire script.
3) Add a searchrawtransactions RPC call, which can look up raw
   transactions by address.
@sipa
Copy link
Member Author

sipa commented Jul 20, 2013

Rebased and fixed the bugs reported by @petertodd.

@BitcoinPullTester
Copy link

Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/4790f3c823a33fae44b82ef7962372e38b1b0131 for binaries and test log.
This test script verifies pulls every time they are updated. It, however, dies sometimes and fails to test properly. If you are waiting on a test, please check timestamps to verify that the test.log is moving at http://jenkins.bluematt.me/pull-tester/current/
Contact BlueMatt on freenode if something looks broken.

@sipa
Copy link
Member Author

sipa commented Jul 26, 2013

@petertodd Feel free to test the pageability now.

@petertodd
Copy link
Contributor

@sipa Will do after my litecoin audit's done.

@petertodd
Copy link
Contributor

searchrawtransactions 1111111111111111111114oLvT2 doesn't work - looks like the issue is the first few lines in FindTransactionsByDestination() because the keyid is 0, which is correct in that case.

@petertodd
Copy link
Contributor

I'm having good luck with the code otherwise - matches results on blockchain.info, and non-standard oddities like searching for OP_RETURN data work fine. Also results returned appear to all be in correct order.

@petertodd
Copy link
Contributor

Getting some weird results with mwy5FX7MVgDutKYbXBxQG5q7EL6pmhHT58=Hash160('') on testnet, and the equivalent on mainnet. I'm seeing testnet txids a1f6a4ffcfd7bb4775790932aff1f82ac6a9b3b3e76c8faf8b11328e948afcca and 75f7d5e99912875e88d667afb48021b0b74916539c518618a8db4966661509df returned in the results - the former has a standard scriptPubKey, and a nonstandard scriptSig of "1", or hex "51". (IE push the number 1 to the stack) The latter has a totally empty scriptPubkey. Opcode 51 isn't within the range of opcodes 0 to PUSHDATA4, so BuildAddrIndex should have gone to the !fHaveData branch, but that returns Hash160(script), which is definitely not empty.

A search for a script of just opcode 52, Hash160('\x52') doesn't show this problem, (returns empty) and searching for script's without pushdata's like Hash160(''\x61') (op_nop) also works as expected. Oddly though Hash160('\x51\x51'), two pushes of the constant 1 to the stack, doesn't work and returns nothing even though there is a script exactly equal to that.

I'll be honest, I'm a bit mystified, although I haven't delved into a debugger yet.

@gastonmorixe
Copy link

I just wanted to thank all of you for this implementation. Specially sipa who hates it and he did wrote it :)

Here you can see my work in progress from my Rails App, I do all the parsing and get the balances. Thank you all.

http://cl.ly/QNUL

Ton

@gavinandresen
Copy link
Contributor

ACK from me. I think we should pull; edge-case bugs with weird, non-standard transactions I don't care about.

@gmaxwell
Copy link
Contributor

Only reason I see to delay the pull is that any fixes to address misindexing pointed out above may require a complete reindex.

@petertodd
Copy link
Contributor

I disagree given the reason why these tx's aren't working isn't understood yet - could be a symptom of a subtle bug.

@jgarzik
Copy link
Contributor

jgarzik commented Aug 25, 2013

ACK

@ghost
Copy link

ghost commented Oct 21, 2013

Would we be able to see this merged? There's no outstanding issues in it (I've run it for literally months now), and there'd be considerable benefit in having this in the master.

@sipa
Copy link
Member Author

sipa commented Oct 21, 2013

Seems there is a not-understood problem at least (as reported by @petertodd), and I don't plan to work on this any time soon.

@sipa
Copy link
Member Author

sipa commented Oct 21, 2013

In any case, I want to have watch-only wallet support before this, as it is a much more scalable solution for many problems that you would use an address-based index for. I'm closing this for now.

@sipa sipa closed this Oct 21, 2013
@ghost
Copy link

ghost commented Oct 21, 2013

This did however allow for some quite handy local block explorers. Between Abe and BlockExplorer's respective database messes, you're looking 100+ GB of external databases and literally weeks of CPU time. Those not being an option, I'm back forced to query Blockchain.info and Blockexplorer for my data; which is slow and somewhat demonstrates the links between the various addresses to a third party. It also means that I'm reliant on two extremely unstable and latent services for something which -addrindex would have allowed locally.

I realise that I can still just continue on using -addrindex as my own patch, but it would be of benefit to others to have something like it available in the main builds for developers who aren't aware it exists.

@sipa
Copy link
Member Author

sipa commented Oct 21, 2013

Yes, I understand - that was mostly the reason for writing it. If the choice is between people using centralized indexing services and being able to run their own, I certainly would encourage the latter. But on the other hand, if the choice is between people building infrastructure that relies on such indexes being available for wallet services (preventing pruning later on), I certainly would encourage watch-only wallets.

In any case, I'm just closing it to clean up the pull request list. I'll probably get back to it at some point (or someone else may).

@laanwj
Copy link
Member

laanwj commented Oct 21, 2013

@63 if you care a lot about this, why not pick it up yourself and try to resolve the remaining issue and submit a pull?

@jmcorgan
Copy link
Contributor

I've refreshed this branch to work with current master and have issue a new pull request #3652.

dexX7 added a commit to dexX7/bitcoin that referenced this pull request Jun 16, 2014
Credits to sipa, based on the following resources:

bitcoin#2802

sipa@fd867c7

sipa@4790f3c
dexX7 added a commit to dexX7/bitcoin that referenced this pull request Jun 16, 2014
Credits to sipa, based on the following resources:

bitcoin#2802

sipa@fd867c7

sipa@4790f3c
@rnicoll rnicoll mentioned this pull request Oct 15, 2015
Bushstar pushed a commit to Bushstar/omnicore that referenced this pull request Apr 5, 2019
Replace it with regular "governance" category limited LogPrint-s where it makes sense.
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet