Skip to content

Add address-based index#6835

Closed
rnicoll wants to merge 1 commit intobitcoin:masterfrom
rnicoll:search-raw-transactions
Closed

Add address-based index#6835
rnicoll wants to merge 1 commit intobitcoin:masterfrom
rnicoll:search-raw-transactions

Conversation

@rnicoll
Copy link
Copy Markdown
Contributor

@rnicoll rnicoll commented Oct 15, 2015

Update of #3652 (which is an update of #2802), to latest master. RPC tests are coming, but I wanted to open this now so it's clear there's work going on, make sure we don't duplicate effort.

src/txdb.cpp Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you rebased from the original.

This leaks, you'll need to delete the iterator, or use a smart pointer.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, added the missing commit in now

@dcousens
Copy link
Copy Markdown
Contributor

concept ACK

@btcdrak
Copy link
Copy Markdown
Contributor

btcdrak commented Oct 22, 2015

I am very much in favour of a full address indexes as it removes the need for third party applications like insight-api separately parsing the blockchain. However, I'm not sure this PR's approach is the best way. I dont believe it is reorg safe, and there are other issues reported by @gmaxwell https://github.com/CounterpartyXCP/counterparty-lib/issues/783#issuecomment-120130217.

I think overall the approach of #5048 creating a separate index is better (but applied to all addresses rather than just the utxo).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just always use the HASH160 of the script? That way you could query any UTXO.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 as there are more interesting script types than address types. Address lookup could just produce the necessary output script and look that up. Exposing both would be great.

@afk11
Copy link
Copy Markdown
Contributor

afk11 commented Oct 23, 2015

Concept ACK, but needs rebase.

@gmaxwell
Copy link
Copy Markdown
Contributor

Concept (weak)NAK. (Please don't be upset, I feel that stating my view here is a courtesy over simply saying nothing at all).

I don't see this approach being viable.

There are several interrelated problems, but the foremost of this is that this approach forces its users to use an non-scalable and maintainable method of accessing data in the Bitcoin blockchain-- requiring 50GB+ storage and rapidly growing, where usually a watching wallet is sufficient. This means that user who do adopt this approach will rapidly be pushed off onto trusting third party API services as the resource costs exceed their tolerance.

And we're left then with an invasive change that is used by few users and will not be adequately supported (which we've also seen with this change, that no one has even bothered to diagnose the problems with index consistency that this particular implementation has had).

I would much rather see rapid utxo-only rescan (which would be much less resource intensive, and so more widely used) for watching import and similar functionality. And for indexing beyond that the indexes should be isolated and modularized to make them maintainable for small user bases, or even for out of tree indexing code.

@rnicoll
Copy link
Copy Markdown
Contributor Author

rnicoll commented Oct 24, 2015

Wanted to say, I am following the comments here, I'm just prioritising work elsewhere at the moment. Thanks for all the comments, feedback and analysis everyone, they are all appreciated.

  1. Using hash of the full script sounds like an excellent idea.
  2. The index consistency issues are likely arising due to lack of recognition of reorgs happening (and therefore the index becomes inconsistent with the blocks on disk).
  3. Rapid UTXO-scan sounds like a good first step and we can come back to this.

I will look at this in more depth later, but unlikely before November. If anyone else wants to contribute, PRs against my branch would be much appreciated.

1) Maintain a salt to perturbate the address index (protection against
   collisions).
2) Add support for address index entries in the block index, and
   maintain those if -addrindex is specified. It indexes the use of
   every >8-byte data push in output script or consumed script - or in
   case of no such push, the entire script.
3) Add a searchrawtransactions RPC call, which can look up raw
   transactions by address.
@rnicoll rnicoll force-pushed the search-raw-transactions branch from 0a42fd5 to 5c9a794 Compare October 24, 2015 19:10
@dexX7
Copy link
Copy Markdown
Contributor

dexX7 commented Oct 25, 2015

If this patch is going to be overhauled, which I'd like very much, then there are a few points to consider:

  1. filtering out of orphaned transactions: similar to the txdb, orphans are kept in the DB, which caused quite a bit confusion in the past. Currently the way to go is to manually check, whether transactions are indeed part of the main chain (e.g. by checking, whether "confirmations" > 0), but this may be done on a lower level. Some time ago I extended the RPC to have an option to filter on the RPC level, but this may also not be ideal: dexX7@ffde890
  2. use more appropriate data types: verbose (and includeorphans) should be bool, not int.
  3. refine what's actually being indexed: currently a) every data push >=8 bytes, b) if no such pushes, the entire script is indexed. given that only the lookup by destinations is exposed, it may not be useful to index only the parts that are exposed and needed
  4. maybe add additional filters or aggregation calls: for http://redeem.bitwatch.co/ I'm using "listallunspent", which targets unspent outputs (in a somewhat ugly way!), and I've also played with "getallbalance" to retrieve balance information of arbitrary destinations. It's certainly not very fast, but for me an acceptable middle ground.
  5. optionally maintain a mempool based view: DB entries are only written after confirmation, but it would be incredibly useful, if the unconfirmed chain state is available and exposed in one way or another.

Especially to the first and last point a few notes: I'm not sure who else uses the address-index patch, but most notably is probably counterparty-lib. They are manually filtering the results to get rid of orphans, and they are manually fetching the mempool over and over to get a better understanding of the unconfirmed chain state. This seems incredibly wasteful, performance wise, and doing the work directly in Core may provide a significant performance boost.

@dcousens
Copy link
Copy Markdown
Contributor

dcousens commented Oct 26, 2015

After a discussion on IRC with @laanwj. I've changed my opinion to concept NACK with the conclusion this should be moved to an application external to bitcoind.

update: I have since written software to do this in an external index, see https://github.com/bitcoinjs/indexd (see this an example implementation)

@btcdrak
Copy link
Copy Markdown
Contributor

btcdrak commented Oct 26, 2015

I was also part of the conversations with @laanwj and @dcousens and I'm also in favour of doing this externally to bitcoind.

@sipa
Copy link
Copy Markdown
Member

sipa commented May 20, 2016

I vote to close this; this should be done in an external index. Perhaps at some point when we can a more modular design it can be supported as an optionally buildable module, but let's not complicate the current database tracking code.

@sipa sipa closed this May 25, 2016
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants