Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add address-based index #6835

Closed
wants to merge 1 commit into from
Closed

Conversation

@rnicoll
Copy link
Contributor

rnicoll commented Oct 15, 2015

Update of #3652 (which is an update of #2802), to latest master. RPC tests are coming, but I wanted to open this now so it's clear there's work going on, make sure we don't duplicate effort.

@dexX7
dexX7 reviewed Oct 16, 2015
View changes
src/txdb.cpp Outdated
@@ -163,6 +167,40 @@ bool CBlockTreeDB::WriteTxIndex(const std::vector<std::pair<uint256, CDiskTxPos>
return WriteBatch(batch);
}

bool CBlockTreeDB::ReadAddrIndex(const uint160 &addrid, std::vector<CExtDiskTxPos> &list) {
CLevelDBIterator *iter = NewIterator();

This comment has been minimized.

Copy link
@dexX7

dexX7 Oct 16, 2015

Contributor

It looks like you rebased from the original.

This leaks, you'll need to delete the iterator, or use a smart pointer.

This comment has been minimized.

Copy link
@rnicoll

rnicoll Oct 16, 2015

Author Contributor

Thanks, added the missing commit in now

@dcousens

This comment has been minimized.

Copy link
Contributor

dcousens commented Oct 21, 2015

concept ACK

@btcdrak

This comment has been minimized.

Copy link
Member

btcdrak commented Oct 22, 2015

I am very much in favour of a full address indexes as it removes the need for third party applications like insight-api separately parsing the blockchain. However, I'm not sure this PR's approach is the best way. I dont believe it is reorg safe, and there are other issues reported by @gmaxwell CounterpartyXCP/counterparty-lib#783 (comment).

I think overall the approach of #5048 creating a separate index is better (but applied to all addresses rather than just the utxo).

if (pkeyid) {
addrid = static_cast<uint160>(*pkeyid);
} else {
const CScriptID *pscriptid = boost::get<CScriptID>(&dest);

This comment has been minimized.

Copy link
@dcousens

dcousens Oct 22, 2015

Contributor

Why not just always use the HASH160 of the script? That way you could query any UTXO.

This comment has been minimized.

Copy link
@afk11

afk11 Oct 23, 2015

Contributor

+1 as there are more interesting script types than address types. Address lookup could just produce the necessary output script and look that up. Exposing both would be great.

@afk11

This comment has been minimized.

Copy link
Contributor

afk11 commented Oct 23, 2015

Concept ACK, but needs rebase.

@gmaxwell

This comment has been minimized.

Copy link
Contributor

gmaxwell commented Oct 23, 2015

Concept (weak)NAK. (Please don't be upset, I feel that stating my view here is a courtesy over simply saying nothing at all).

I don't see this approach being viable.

There are several interrelated problems, but the foremost of this is that this approach forces its users to use an non-scalable and maintainable method of accessing data in the Bitcoin blockchain-- requiring 50GB+ storage and rapidly growing, where usually a watching wallet is sufficient. This means that user who do adopt this approach will rapidly be pushed off onto trusting third party API services as the resource costs exceed their tolerance.

And we're left then with an invasive change that is used by few users and will not be adequately supported (which we've also seen with this change, that no one has even bothered to diagnose the problems with index consistency that this particular implementation has had).

I would much rather see rapid utxo-only rescan (which would be much less resource intensive, and so more widely used) for watching import and similar functionality. And for indexing beyond that the indexes should be isolated and modularized to make them maintainable for small user bases, or even for out of tree indexing code.

@rnicoll

This comment has been minimized.

Copy link
Contributor Author

rnicoll commented Oct 24, 2015

Wanted to say, I am following the comments here, I'm just prioritising work elsewhere at the moment. Thanks for all the comments, feedback and analysis everyone, they are all appreciated.

  1. Using hash of the full script sounds like an excellent idea.
  2. The index consistency issues are likely arising due to lack of recognition of reorgs happening (and therefore the index becomes inconsistent with the blocks on disk).
  3. Rapid UTXO-scan sounds like a good first step and we can come back to this.

I will look at this in more depth later, but unlikely before November. If anyone else wants to contribute, PRs against my branch would be much appreciated.

1) Maintain a salt to perturbate the address index (protection against
   collisions).
2) Add support for address index entries in the block index, and
   maintain those if -addrindex is specified. It indexes the use of
   every >8-byte data push in output script or consumed script - or in
   case of no such push, the entire script.
3) Add a searchrawtransactions RPC call, which can look up raw
   transactions by address.
@rnicoll rnicoll force-pushed the rnicoll:search-raw-transactions branch to 5c9a794 Oct 24, 2015
@dexX7

This comment has been minimized.

Copy link
Contributor

dexX7 commented Oct 25, 2015

If this patch is going to be overhauled, which I'd like very much, then there are a few points to consider:

  1. filtering out of orphaned transactions: similar to the txdb, orphans are kept in the DB, which caused quite a bit confusion in the past. Currently the way to go is to manually check, whether transactions are indeed part of the main chain (e.g. by checking, whether "confirmations" > 0), but this may be done on a lower level. Some time ago I extended the RPC to have an option to filter on the RPC level, but this may also not be ideal: dexX7@ffde890
  2. use more appropriate data types: verbose (and includeorphans) should be bool, not int.
  3. refine what's actually being indexed: currently a) every data push >=8 bytes, b) if no such pushes, the entire script is indexed. given that only the lookup by destinations is exposed, it may not be useful to index only the parts that are exposed and needed
  4. maybe add additional filters or aggregation calls: for http://redeem.bitwatch.co/ I'm using "listallunspent", which targets unspent outputs (in a somewhat ugly way!), and I've also played with "getallbalance" to retrieve balance information of arbitrary destinations. It's certainly not very fast, but for me an acceptable middle ground.
  5. optionally maintain a mempool based view: DB entries are only written after confirmation, but it would be incredibly useful, if the unconfirmed chain state is available and exposed in one way or another.

Especially to the first and last point a few notes: I'm not sure who else uses the address-index patch, but most notably is probably counterparty-lib. They are manually filtering the results to get rid of orphans, and they are manually fetching the mempool over and over to get a better understanding of the unconfirmed chain state. This seems incredibly wasteful, performance wise, and doing the work directly in Core may provide a significant performance boost.

@dcousens

This comment has been minimized.

Copy link
Contributor

dcousens commented Oct 26, 2015

After a discussion on IRC with @laanwj. I've changed my opinion to concept NACK with the conclusion this should be moved to an application external to bitcoind.

update: I have since written software to do this in an external index, see https://github.com/bitcoinjs/indexd (see this an example implementation)

@btcdrak

This comment has been minimized.

Copy link
Member

btcdrak commented Oct 26, 2015

I was also part of the conversations with @laanwj and @dcousens and I'm also in favour of doing this externally to bitcoind.

@sipa

This comment has been minimized.

Copy link
Member

sipa commented May 20, 2016

I vote to close this; this should be done in an external index. Perhaps at some point when we can a more modular design it can be supported as an optionally buildable module, but let's not complicate the current database tracking code.

@sipa sipa closed this May 25, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants
You can’t perform that action at this time.