txoutsbyaddress index #5048
Would it not be possible to keep this information outside of (every) CCoinsView? We only care about it for the tip, and not for intermediate caches that are created during validation (i.e., just update it after a block is validated).
I can probably move the updates from (Dis)ConnectBlock to (Dis)ConnectTip, and make them with pcoinsTip, if this is what you mean? I would have to move the CBlockUndo declaration too, if that's ok, because I need that information.
I actually meant keeping this out of coins.{h,cpp} entirely. We don't need every (partial copy) of a cache everywhere during validation of blocks and transactions to track modification of this index. Just updating a single independent index (in main, perhaps with an implementation in an independent file) after validation has succeeded should be sufficient. The reason I ask this is because CCoinsView is both performance and consensus critical, and I'd really like to not complicate reasoning about either of those further.
So where exactly should we update the index, (Dis)ConnectTip?
Right, it's easier if the write is atomic (otherwise you'll need a separate catch-up function to rebuild the index if it's out of date with the chainstate, and as it's always going to be updated in lockstep with the chainstate on disk, better make it part of it). One way would be to have CCoinsViewDB take a reference/pointer to a map with extra index entries to write, and have BatchWrite serialize those as well. That's not particularly clean, and it's pretty inconvenient to query as well. It seems there is not really a better way than to make this part of the CCoinsView framework. I just wish we could keep optional features out of the consensus code...
Force-pushed from a06a1b8 to 0816dca
Update: fixed a minor shutdown segfault in the InitError case
There are lots of objects passed by ref that should be const ref it seems, and several new member functions that aren't marked as const. Beyond the usual const rabbit-hole, it makes this substantially harder to review (for me, anyway). Could you please fix those up? I made a few changes here as an example, please have a look: theuni@d6c94fa
@@ -57,6 +73,15 @@ bool CCoinsViewDB::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlock) {
            CCoinsMap::iterator itOld = it++;
            mapCoins.erase(itOld);
        }
        if (fTxOutsByAddressIndex && pcoinsByAddress) // only if -txoutsbyaddressindex
CCoinsViewDB shouldn't assume there is just one pcoinsByAddress global; can you pass it in in the constructor at least?
How do I look up a txout by some non-address-represented script?
I would suggest storing scriptPubKeys by their hash160 or hash256, rather than storing them in full in the map (requiring an extra heap allocation + 24 bytes of overhead for each). That's also more compact on disk than using the CScriptCompressor.
Calling the datastructure ByScript is probably more correct than ByAddress.
You can make gettxoutsbyaddress (or equivalent) also take a hex-encoded script for arbitrary script lookups.
Force-pushed from 995ccff to 5f81fee
Updated as requested. If you had built this index before, you need to delete it with -txoutsbyaddressindex=0 and then rebuild it with -txoutsbyaddressindex, because the on-disk format is incompatible (the database key is now the hash of the script).
}

uint160 CCoinsViewByScript::getKey(const CScript &script) {
    return Hash160(script);
}
Might be better to use Hash160(first-push-in-script-data)
I don't understand.
Is there anything wrong with hashing the complete script?
We don't want 2 different scripts to result in the same hash.
Using a hash of the first push allows one to search by the hash of the first push, even without knowing the full script. This is useful for anyone wanting to find transactions to old pay-to-pubkey scripts.
I sure hope you're not assuming there will never be collisions?
Sure I assume that; why would there be collisions?
Why can't you just hash the full old pay-to-pubkey script? Is there any unknown data in those scripts?
I am no Bitcoin script expert, but I think it is safer to hash from the first to the last bit; otherwise there is always room for an obvious attack, where I send coins to a script I can spend, but the hash is the same as that of a script you can spend.
After thinking about it, I guess what you are suggesting is possible, but then we would need two RPC functions:
- gettxoutsbyaddress, which is just a wrapper around an internal searchCoinsByFirstPush(..) and loops through all search results to ensure the script is actually OP_DUP OP_HASH etc.
- searchtxoutsbyscriptsfirstpush, which just returns the search results
Not sure if this is worth the effort though. Are there really use cases where you only know parts of the script?
Old pay-to-pubkey scripts (<pubkey> OP_CHECKSIG) were often referred to by a version 0 address with the pubkey hash. If one is searching for those, they likely only have the hash of the pubkey, not the pubkey itself.
Ok, I see. But I am unsure about redesigning and introducing a second RPC call just to support some old scripts nobody is creating anymore nowadays.
To me this feature is not important, but we can hear some more opinions here. If more people would like to see this implemented, I may consider looking into it.
I agree this feature is not important, but it would be nice to have the index compatible in case anyone ever wanted it.
With a hash(fullscript) index you can still easily search for pay-to-pubkey outputs - but you do need to know the full pubkey. I think that's perfectly acceptable.
Does this include or intersect/depend on #3652?
@btcdrak no, they are two independent indexes. This one is more lightweight, but not as powerful. Of course with getrawtransaction you can also query the transaction which created the unspent output, but you cannot query the transaction history for an address with this index; only the current "balance", basically. Depending on your use case, this index may be all you need. But if you care about transaction history, you still need the other index. So even if we merge this index into master, you may still want the other index in addition.
@cozz I've been using the addrindex patch; it's extremely powerful. If we're merging this PR I really think we should have the other one as well (which is optional via config).
The advantage of this one over the address index is that it doesn't interfere with pruning. It doesn't require the whole block chain, and is a much smaller index to boot. I do agree with @sipa that this really doesn't belong in the core consensus code. We're trying to reduce that code to the absolute essentials (also because it will end up in a consensus library at some point), so we should not expand it with functionality that is not required for validation.
@laanwj You could say the same thing about…
One of the goals of moving the consensus and utxo code to a library is that it will be possible to build additional tools and indexes without having to include everything and the kitchen sink in this project. Indexers can just use our code instead of the other way around. Note that I don't doubt for a moment that there are real use cases. But the goal of Bitcoin Core is to provide the core infrastructure, not to satisfy every possible use case. To keep this project maintainable we need to move away from that monolithic approach.
@@ -148,6 +148,7 @@ class CLevelDBWrapper
    }

    bool WriteBatch(CLevelDBBatch& batch, bool fSync = false) throw(leveldb_error);
    bool WriteBatch(leveldb::WriteBatch& batch, bool fSync = false) throw(leveldb_error);
Why is this necessary? I'd prefer to not expose the underlying LevelDB-specific datatypes, so the database layer can be swapped out.
Thanks for changing the approach, and apologies for the slow response; it again needs a rebase now.
It needs to be rebased again...
Force-pushed from f481aaa to ecfc843
Rebased.
Hi, can you rebase this again? Thanks.
Rebased.
Leaning towards closing this PR. This garnered some interest, but no ACKs after a long time. It is not fair to ask @cozz to continually rebase if it's not going in, in the short/medium term.
Yes, a decision for 0.12 would be good. Otherwise spending more time on this just feels like a waste to me.
I have this PR on my list to test this week.
Sorry for the delay, I finally tested it yesterday and today on a random sample set of 0-confirms and old txs. Tested ACK
@btcdrak, I haven't tested this yet, but fundamentally what makes this so much larger than your patch?
@@ -35,17 +36,33 @@ void static BatchWriteCoins(CLevelDBBatch &batch, const uint256 &hash, const CCoins &coins) {
    batch.Write(make_pair(DB_COINS, hash), coins);
}

void static BatchWriteCoins(CLevelDBBatch &batch, const uint160 &hash, const CCoinsByScript &coins) {
Can you name this function differently? I found it confusing to see it being used by the by-script logic.
Agree with @jgarzik, it's not fair to keep this lingering for too long. I like the use case it solves, but I still dislike the degree to which it's entangled with the CCoinsView. I think this feature should be something that is independent, rather than complicating the code necessary for its core function.

Would it be reasonable to implement it as a separate database file, and have it function like the wallet? Something that registers with CValidationInterface, and listens for new transactions being added (and removed) from the blockchain. It could store its own CBlockLocator (like the wallet does, triggered by CValidationInterface's SetBestChain) so it can "rescan" like the wallet does too.

A by-script index for the mempool could be done separately, using an extra index added after #6470's multiindex for the mempool?
Well, I am sorry, I am closing this now. As you may have noticed, I am not contributing to this project anymore, because I disagree with basically everything here and I am doing something else now. If anybody wants to pick this up, feel free to just copy and paste from my code.

Off-topic: You haven't merged my wallet pulls, which I consider important bug fixes, because the wallet performance is just a bug. Instead you consider moving code around as more important. The long-term goal is even to remove the GUI and wallet.

And last but not least, the block size. It is your responsibility to give people the latest in technology. This includes scale. Your job here is to max out the block size. This is like if your job is to fill a 20000-person stadium and you only let 1000 people in, with the argument that you can never satisfy demand anyway. You are acting as if, once we go higher than 1MB, suddenly everybody has to fully trust the miners. The threshold at which renting a server for a day and verifying all blocks stops being affordable to normal people is very much higher than 1MB. I believe we need different people in the lead position here to be successful. At least gavin should…

So sorry, but long-term I have given up on the project. I am still following a little, so if there are maintenance things for my code, feel free to contact me.
I'm sorry you feel that way, and I understand the sentiment also partially. I agree moving code to another file is not proper encapsulation, but some of the time, moving is a first step, and an easily reviewable one. Especially the separation of consensus code is not something we want to do without very high assurances behaviour does not change. I think we are making progress, but it is admittedly slow.

My only suggestion above here was a more encapsulated approach, by the way, not a NAK...

Regarding total complexity: I fully agree the code is not "nice". It wasn't nice when Satoshi disappeared, and it still isn't. Significant parts have been properly encapsulated since (script execution, addrman, wallet, UTXO management, network base, UI interaction, limited size data structures), but at the same time, the code definitely has grown too. I believe that was necessary: higher performance often means the problem becomes more complex (for example the introduction of caches, batch processing, precomputed values, ...). I often very much wish to rewrite everything in a clean way, but that's just not something that would be reviewable.

About your wallet patches not being merged: I'm aware. But the truth is that there is little interest in the wallet code, from both users and developers. We need people who are interested in reviewing and testing to make changes. That of course leads to the wallet code growing more and more outdated, reinforcing that circle. One solution to this may be @jonasschnelli's new wallet project, which aims to add a new experimental wallet from scratch, which would be more competitive in features.

Regarding block size: please discuss that on the mailing list. As a Bitcoin Core maintainer, I believe we should merge any consensus change that appears uncontroversial to the community.
I didn't expect such an honest response, thanks; that earns you a lot of respect.
How quickly does this return data?
Still open?
Adds new rpc call "gettxoutsbyaddress" as requested in #4007.
Disabled by default, enabled with -txoutsbyaddressindex.
The index is built from the normal utxo on first use, reindex is not required.
For the GUI there is progress shown, on bitcoind you just need to be patient.
The qa-test includes a simple reorganization.
I tested all 4 code parts in main.cpp by commenting them out and checking that the qa-test fails at the expected line.
I had to modify the rpc call for this test, as normally wrong outputs in the address index are not exposed to the user, because the rpc call relies on the normal GetCoins call.
The rest of the code has been tested in random ad-hoc testing.
I only tested Linux on my machine.
It's all in one commit for laziness reasons, but if splitting it into multiple commits would help reviewing, I could do that.