Add address-based index by rnicoll · Pull Request #6835 · bitcoin/bitcoin

rnicoll · 2015-10-15T18:17:01Z

Update of #3652 (which is an update of #2802), to latest master. RPC tests are coming, but I wanted to open this now so it's clear there's work going on, make sure we don't duplicate effort.

dexX7 · 2015-10-16T13:53:15Z

src/txdb.cpp

It looks like you rebased from the original.

This leaks, you'll need to delete the iterator, or use a smart pointer.

Thanks, added the missing commit in now

dcousens · 2015-10-21T22:38:46Z

~~concept ACK~~

btcdrak · 2015-10-22T09:23:41Z

I am very much in favour of a full address indexes as it removes the need for third party applications like insight-api separately parsing the blockchain. However, I'm not sure this PR's approach is the best way. I dont believe it is reorg safe, and there are other issues reported by @gmaxwell https://github.com/CounterpartyXCP/counterparty-lib/issues/783#issuecomment-120130217.

I think overall the approach of #5048 creating a separate index is better (but applied to all addresses rather than just the utxo).

dcousens · 2015-10-22T09:59:37Z

src/main.cpp

Why not just always use the HASH160 of the script? That way you could query any UTXO.

+1 as there are more interesting script types than address types. Address lookup could just produce the necessary output script and look that up. Exposing both would be great.

afk11 · 2015-10-23T20:12:50Z

Concept ACK, but needs rebase.

gmaxwell · 2015-10-23T22:21:52Z

Concept (weak)NAK. (Please don't be upset, I feel that stating my view here is a courtesy over simply saying nothing at all).

I don't see this approach being viable.

There are several interrelated problems, but the foremost of this is that this approach forces its users to use an non-scalable and maintainable method of accessing data in the Bitcoin blockchain-- requiring 50GB+ storage and rapidly growing, where usually a watching wallet is sufficient. This means that user who do adopt this approach will rapidly be pushed off onto trusting third party API services as the resource costs exceed their tolerance.

And we're left then with an invasive change that is used by few users and will not be adequately supported (which we've also seen with this change, that no one has even bothered to diagnose the problems with index consistency that this particular implementation has had).

I would much rather see rapid utxo-only rescan (which would be much less resource intensive, and so more widely used) for watching import and similar functionality. And for indexing beyond that the indexes should be isolated and modularized to make them maintainable for small user bases, or even for out of tree indexing code.

rnicoll · 2015-10-24T18:36:46Z

Wanted to say, I am following the comments here, I'm just prioritising work elsewhere at the moment. Thanks for all the comments, feedback and analysis everyone, they are all appreciated.

Using hash of the full script sounds like an excellent idea.
The index consistency issues are likely arising due to lack of recognition of reorgs happening (and therefore the index becomes inconsistent with the blocks on disk).
Rapid UTXO-scan sounds like a good first step and we can come back to this.

I will look at this in more depth later, but unlikely before November. If anyone else wants to contribute, PRs against my branch would be much appreciated.

1) Maintain a salt to perturbate the address index (protection against collisions). 2) Add support for address index entries in the block index, and maintain those if -addrindex is specified. It indexes the use of every >8-byte data push in output script or consumed script - or in case of no such push, the entire script. 3) Add a searchrawtransactions RPC call, which can look up raw transactions by address.

dexX7 · 2015-10-25T13:09:03Z

If this patch is going to be overhauled, which I'd like very much, then there are a few points to consider:

filtering out of orphaned transactions: similar to the txdb, orphans are kept in the DB, which caused quite a bit confusion in the past. Currently the way to go is to manually check, whether transactions are indeed part of the main chain (e.g. by checking, whether "confirmations" > 0), but this may be done on a lower level. Some time ago I extended the RPC to have an option to filter on the RPC level, but this may also not be ideal: dexX7@ffde890
use more appropriate data types: verbose (and includeorphans) should be bool, not int.
refine what's actually being indexed: currently a) every data push >=8 bytes, b) if no such pushes, the entire script is indexed. given that only the lookup by destinations is exposed, it may not be useful to index only the parts that are exposed and needed
maybe add additional filters or aggregation calls: for http://redeem.bitwatch.co/ I'm using "listallunspent", which targets unspent outputs (in a somewhat ugly way!), and I've also played with "getallbalance" to retrieve balance information of arbitrary destinations. It's certainly not very fast, but for me an acceptable middle ground.
optionally maintain a mempool based view: DB entries are only written after confirmation, but it would be incredibly useful, if the unconfirmed chain state is available and exposed in one way or another.

Especially to the first and last point a few notes: I'm not sure who else uses the address-index patch, but most notably is probably counterparty-lib. They are manually filtering the results to get rid of orphans, and they are manually fetching the mempool over and over to get a better understanding of the unconfirmed chain state. This seems incredibly wasteful, performance wise, and doing the work directly in Core may provide a significant performance boost.

dcousens · 2015-10-26T10:24:05Z

After a discussion on IRC with @laanwj. I've changed my opinion to concept NACK with the conclusion this should be moved to an application external to bitcoind.

update: I have since written software to do this in an external index, see https://github.com/bitcoinjs/indexd (see this an example implementation)

btcdrak · 2015-10-26T11:03:56Z

I was also part of the conversations with @laanwj and @dcousens and I'm also in favour of doing this externally to bitcoind.

sipa · 2016-05-20T16:18:37Z

I vote to close this; this should be done in an external index. Perhaps at some point when we can a more modular design it can be supported as an optionally buildable module, but let's not complicate the current database tracking code.

dexX7 reviewed Oct 16, 2015
View reviewed changes

laanwj added the UTXO Db and Indexes label Oct 20, 2015

dcousens reviewed Oct 22, 2015
View reviewed changes

rnicoll force-pushed the search-raw-transactions branch from 0a42fd5 to 5c9a794 Compare October 24, 2015 19:10

sipa closed this May 25, 2016

bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021

Conversation

rnicoll commented Oct 15, 2015

Uh oh!

dexX7 Oct 16, 2015

Choose a reason for hiding this comment

Uh oh!

rnicoll Oct 16, 2015

Choose a reason for hiding this comment

Uh oh!

dcousens commented Oct 21, 2015

Uh oh!

btcdrak commented Oct 22, 2015

Uh oh!

dcousens Oct 22, 2015

Choose a reason for hiding this comment

Uh oh!

afk11 Oct 23, 2015

Choose a reason for hiding this comment

Uh oh!

afk11 commented Oct 23, 2015

Uh oh!

gmaxwell commented Oct 23, 2015

Uh oh!

rnicoll commented Oct 24, 2015

Uh oh!

dexX7 commented Oct 25, 2015

Uh oh!

dcousens commented Oct 26, 2015 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

btcdrak commented Oct 26, 2015

Uh oh!

sipa commented May 20, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

dcousens commented Oct 26, 2015 •

edited

Loading