Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

txoutsbyaddress index #5048

Closed
wants to merge 1 commit into from
Closed

txoutsbyaddress index #5048

wants to merge 1 commit into from

Conversation

cozz
Copy link
Contributor

@cozz cozz commented Oct 5, 2014

Adds new rpc call "gettxoutsbyaddress" as requested in #4007.
Disabled by default, enabled with -txoutsbyaddressindex.

The index is built from the normal utxo on first use, reindex is not required.
For the GUI there is progress shown, on bitcoind you just need to be patient.

The qa-test includes a simple reorganization.
I tested all 4 code parts in main.cpp by commenting them out and check if the qa-test fails at the expected line.
I had to modify the rpc call for this test, as normally wrong outputs in the address index are not exposed to the user, because the rpc call relies on the normal GetCoins call.
The rest of the code has been tested in random ad-hoc testing.
I only tested linux on my machine.

Its all in 1 commit for laziness reasons, but if splitting in multiple commits would help reviewing, I could do that.

txoutsbyaddress

txoutsbyaddress3

@sipa
Copy link
Member

sipa commented Oct 6, 2014

Would it not be possible to keep this information outside of (every) CCoinsView? We only care about it for the tip, and not for intermediate caches that are created during validation (i.e., just update it after a block is validated).

@cozz
Copy link
Contributor Author

cozz commented Oct 6, 2014

I can probably move the updates from (Dis)ConnectBlock to (Dis)ConnectTip, and make them with pcoinsTip, if this is what you mean?

I would have to move the CBlockUndo declaration too, if thats ok, because I need that information.

@sipa
Copy link
Member

sipa commented Oct 8, 2014

I actually meant keeping this out of coins.{h,cpp} entirely. We don't need every (partial copy) of a cache everywhere during validation of blocks and transactions to track modification of this index. Just updating a single independent index (in main, perhaps with an implementation in an independent file) after validation has succeeded should be sufficient.

The reason I ask this is because CCoinsView is both performance and consensus critical, and I'd really like to not complicate reasoning about either of those further.

@cozz
Copy link
Contributor Author

cozz commented Oct 9, 2014

So where exactly should we update the index, (Dis)ConnectTip?
I think the address index needs to be written to disk in the same leveldb-batch with the utxo, dont you?
(Because you are talking about an independent index)
Otherwise, I dont think we can ensure a consistent address index, if we write async from the utxo to disk. Thats the reason for putting this in coins.{h,cpp}, as the connect/disconnect calls rely on the hashBestChain, which is also written to disk in the same leveldb-batch.

@sipa
Copy link
Member

sipa commented Oct 9, 2014

Right, it's easier if the write is atomic (otherwise you'll need to have a separate catch up function to rebuild the index if it's out of date with the chainstate, and as it's always going to be updated in lockstep with the chainstate on disk, better make it part of it).

One way would be to have CCoinsViewDB take a reference/pointer to a map with extra index entries to write, and have BatchWrite serialize those as well. That's not particularly clean, and it's pretty inconvenient to query as well.

It seems there is not really a better way than to make this part of the CCoinsView framework. I just wished we could keep optional features out of the consensus code...

@cozz cozz force-pushed the cozz8 branch 2 times, most recently from a06a1b8 to 0816dca Compare October 11, 2014 03:43
@cozz
Copy link
Contributor Author

cozz commented Oct 11, 2014

Update:
Moved the code from coins.{h,cpp} to new files coinsbyaddress.{h,cpp},
while the address index is still written to disk together with the utxo.

@cozz
Copy link
Contributor Author

cozz commented Oct 16, 2014

Update: fixed a minor shutdown segfault in InitError case

@theuni
Copy link
Member

theuni commented Oct 16, 2014

There are lots of objects passed by ref that should be const ref it seems, and several new member functions that aren't marked as const. Beyond the usual const rabbit-hole, it makes this substantially harder to review (for me, anyway). Could you please fix those up?

I made a few changes here as an example, please have a look: theuni@d6c94fa

@@ -57,6 +73,15 @@ bool CCoinsViewDB::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlock) {
CCoinsMap::iterator itOld = it++;
mapCoins.erase(itOld);
}
if (fTxOutsByAddressIndex && pcoinsByAddress) // only if -txoutsbyaddressindex
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CCoinsViewDB shouldn't assume there is just one pcoinsByAddress global; can you pass it in in the constructor at least?

@luke-jr
Copy link
Member

luke-jr commented Oct 16, 2014

How do I lookup a txout by some non-address-represented script?

@sipa
Copy link
Member

sipa commented Oct 16, 2014

I would suggest storing scriptPubKeys by their hash160 or hash256, rather than storing them in full in the map (requiring an extra heap allocation + 24 bytes of overhead for each). That's also more compact on disk than using the CScriptCompressor.

@sipa
Copy link
Member

sipa commented Oct 16, 2014

Calling the datastructure ByScript is probably more correct than ByAddress.

@sipa
Copy link
Member

sipa commented Oct 16, 2014

You can make gettxoutsbyaddress (or equivalent) also take a hex-encoding script for arbitrary script lookups.

@cozz cozz force-pushed the cozz8 branch 2 times, most recently from 995ccff to 5f81fee Compare October 18, 2014 03:56
@cozz
Copy link
Contributor Author

cozz commented Oct 18, 2014

Updated as requested.

If someone had this compiled before, you need to delete the index with -txoutsbyaddressindex=0 and then rebuild again with -txoutsbyaddressindex for incompatibility reasons. (database key is now hash of script)

}

uint160 CCoinsViewByScript::getKey(const CScript &script) {
return Hash160(script);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better to use Hash160(first-push-in-script-data)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont understand.
Is there anything wrong with hashing the complete script?
We dont want 2 different scripts to result in the same hash.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a hash of the first push allows one to search by the hash of the first push, even without knowing the full script. This is useful for anyone wanting to find transactions to old pay-to-pubkey scripts.

I sure hope you're not assuming there will never be collisions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I assume that, why would there be collisions?
Why cant you just hash the full old pay-to-pubkey script, is there any unknown data in those scripts?

I am no bitcoin script expert, but I think it is safer to hash from the first to the last bit, otherwise there is always room for an obvious attack, where I send coins to a script I can spend, but the hash is the same of a script you can spend.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking, I guess what you are suggesting is possible, but then we would need to have 2 rpc-functions:

  • gettxoutsbyaddress, which is just a wrapper around an internal searchCoinsByFirstPush(..)
    and loops through all search results to ensure the script is actually OP_DUP OP_HASH etc.
  • searchtxoutsbyscriptsfirstpush, which just returns the search results

Not sure, if this is worth the effort though. Are there really usecases, where you only know parts of the script?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Old pay-to-pubkey scripts ( OP_CHECKSIG) were often referred to by a version 0 address with the pubkey hash. If one is searching for those, they likely only have the hash of the pubkey, not the pubkey itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I see. But I am unsure about redesigning and introducing a second rpc-call, just to support some old scripts nobody is creating anymore nowadays.
To me this feature is not important, but we can hear some more opinions here. If more people would like to see this implemented, I may consider looking into it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this feature is not important, but it would be nice to have the index compatible in case anyone ever wanted it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With a hash(fullscript) index you can still easily search for pay-to-pubkey outputs - but you do need to know the full pubkey. I think that's perfectly acceptable.

@btcdrak
Copy link
Contributor

btcdrak commented Oct 26, 2014

Does this include or intersect/depend on #3652 ?

@cozz
Copy link
Contributor Author

cozz commented Oct 27, 2014

@btcdrak no, they are two independent indexes. This one is more lightweight, but not as powerful.
With this index you can only query the unspent outputs for an address. With the other index you can query all transactions involving the address.

Of course with getrawtransaction you can also query the transaction which created the unspent output, but you can not query the transaction history for an address with this index. Only the current "balance" basically.

Depending on your use-case, this index may be all you need. But if you care about transaction history, you still need the other index. So even if we merge this index into the master, you still may want to have the other index in addition.

@btcdrak
Copy link
Contributor

btcdrak commented Oct 27, 2014

@cozz I've been using the addrindex patch, it's extremely powerful. If we're merging this PR I really think we should also have the other also (with is optional by config).

@laanwj
Copy link
Member

laanwj commented Oct 27, 2014

The advantage of this one to the address index is that it doesn't interfere with pruning. It doesn't require the whole block chain, and is a much smaller index to boot.

I do agree with @sipa that this really doesn't belong in the core consensus code. We're trying to reduce that code to the absolute essentials (also because it will end up in a consensus library at some point), so we should not expand it with functionality that is not required for validation.

@btcdrak
Copy link
Contributor

btcdrak commented Oct 27, 2014

@laanwj You could say the same thing about txindex but these are optional. Even if these end up being split out eventually as the wallet code is for example, there is a real need and real world usecases for these indexes.

@laanwj
Copy link
Member

laanwj commented Oct 27, 2014

One of the goals of moving the consensus and utxo code to a library, is that it will be possible to build additional tools and indexes without having to include everything and the kitchen sink into this project. Indexers can just use our code instead of the other way around.

Note that I don't doubt for a moment that there are real use-cases. But the goal of Bitcoin Core is to provide the core infrastructure, not to satisfy every possible use-case. To keep this project maintainable we need to move away from that monolithic approach.

@@ -148,6 +148,7 @@ class CLevelDBWrapper
}

bool WriteBatch(CLevelDBBatch& batch, bool fSync = false) throw(leveldb_error);
bool WriteBatch(leveldb::WriteBatch& batch, bool fSync = false) throw(leveldb_error);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary? I'd prefer to not expose the underlying LevelDB-specific datatypes, so the database layer can be swapped out.

@sipa
Copy link
Member

sipa commented Nov 4, 2014

Thanks for changing the approach, and apologies for the slow response - it again needs rebase now.

@ecdsa
Copy link

ecdsa commented Apr 28, 2015

it needs to be rebased again...

@cozz cozz force-pushed the cozz8 branch 2 times, most recently from f481aaa to ecfc843 Compare May 1, 2015 12:11
@cozz
Copy link
Contributor Author

cozz commented May 1, 2015

Rebased.

@ecdsa
Copy link

ecdsa commented May 17, 2015

hi can you rebase this again? thanks

@cozz
Copy link
Contributor Author

cozz commented May 18, 2015

Rebased.

@jgarzik
Copy link
Contributor

jgarzik commented Jul 23, 2015

Leaning towards closing this PR. This garnered some interest, but no ACKs after a long time. It is not fair to ask @cozz to continually rebase if it's not going in, in the short/medium term.

@cozz
Copy link
Contributor Author

cozz commented Jul 27, 2015

Yes, a decision for 0.12 would be good. Otherwise spending more time on this, just feels like a waste to me.

@btcdrak
Copy link
Contributor

btcdrak commented Jul 27, 2015

I have this PR on my list to test this week.

@btcdrak
Copy link
Contributor

btcdrak commented Aug 5, 2015

Sorry for the delay, I finally tested it yesterday and today on a random sample set of 0 confirms and old txs.

Tested ACK

@dcousens
Copy link
Contributor

dcousens commented Aug 8, 2015

@btcdrak, I haven't tested this yet, but fundamentally what makes this so much larger than your patch?

@@ -35,17 +36,33 @@ void static BatchWriteCoins(CLevelDBBatch &batch, const uint256 &hash, const CCo
batch.Write(make_pair(DB_COINS, hash), coins);
}

void static BatchWriteCoins(CLevelDBBatch &batch, const uint160 &hash, const CCoinsByScript &coins) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you name this function differently? I found it confusing to see it being used by the by-script logic.

@sipa
Copy link
Member

sipa commented Aug 8, 2015

Agree with @jgarzik, it's not fair to keep this lingering for too long.

I like the use case it solves, but I still dislike the degree to which it's entangled with the CCoinsView. I think this feature should be something that is independent, rather than complicating the code necessary for its core function.

Would it be reasonable to implement it as a separate database file, and have it function like the wallet? Something that registers with CValidationInterface, and listens for new transactions being added (and removed) from the blockchain. It could store its own CBlockLocator (like the wallet does, triggered by CValidationInterface's SetBestChain) so it can "rescan" like the wallet does too.

A by-script index for the mempool could be done separately, using an extra index added after #6470's multiindex for the mempool?

@cozz
Copy link
Contributor Author

cozz commented Aug 8, 2015

Well, I am sorry, I am closing this now. As you may have noticed, I am not contributing to this project anymore, because I disagree with basically anything here and I am doing something else now. If anybody wants to pick this up, feel free to just copy and paste from my code.

Off-topic:
I disagree with most code changes you guys are making. For example main.cpp was much more readable in 0.8 than it is now. Or libconsensus. You guys are very good low-level coders, but you seriously have zero understanding of what encapsulation of problems is. (hint: its not moving code to another file) You are actually making the source code worse in my opinion.

You havent merged my wallet-pulls, which I consider as important bug-fixes because the wallet-performance is just a bug. Instead you consider moving code around as more important. Long-term goal is to remove GUI and wallet even.

And last but not least, the block size. It is your responsibility to give people the latest in technology. This includes scale. Your job here is to max out the block size. This is like if your job is to make a 20000 people stadium full and you only let 1000 people in, with the argument that you can never satisfy demand anyway.

You are acting like, if we go higher than 1MB, then suddenly everybody has to fully trust the miners. The threshold to rent a server for a day and verify all blocks to not be affordable to normal people is very much higher than 1MB.

I believe we need different people in the lead position here to be successful. At least gavin should
be back on the throne with final say on really everything.

So sorry, but long-term I have given up on the project. I am still following a little, so if there are maintanance things for my code, feel free to contact.

@cozz cozz closed this Aug 8, 2015
@sipa
Copy link
Member

sipa commented Aug 8, 2015 via email

@cozz
Copy link
Contributor Author

cozz commented Aug 8, 2015

I didnt expect such an honest response, thanks, that gives you a lot of respect.
It doesnt make me change my mind about this project though.

@ghost
Copy link

ghost commented Jul 26, 2017

how quick does this returns data ?
has anybody tried it ?
Thanks.

@shamoons
Copy link

shamoons commented May 5, 2018

Still open?

@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet