txoutsbyaddress index #5048
Would it not be possible to keep this information outside of (every) CCoinsView? We only care about it for the tip, and not for intermediate caches that are created during validation (i.e., just update it after a block is validated).
I can probably move the updates from (Dis)ConnectBlock to (Dis)ConnectTip, and make them with pcoinsTip, if this is what you mean? I would have to move the CBlockUndo declaration too, if that's ok, because I need that information.
I actually meant keeping this out of coins.{h,cpp} entirely. We don't need every (partial copy) of a cache everywhere during validation of blocks and transactions to track modification of this index. Just updating a single independent index (in main, perhaps with an implementation in an independent file) after validation has succeeded should be sufficient. The reason I ask this is because CCoinsView is both performance and consensus critical, and I'd really like to not complicate reasoning about either of those further.
So where exactly should we update the index, (Dis)ConnectTip?
Right, it's easier if the write is atomic (otherwise you'll need a separate catch-up function to rebuild the index if it's out of date with the chainstate, and as it's always going to be updated in lockstep with the chainstate on disk, better make it part of it). One way would be to have CCoinsViewDB take a reference/pointer to a map with extra index entries to write, and have BatchWrite serialize those as well. That's not particularly clean, and it's pretty inconvenient to query as well. It seems there is not really a better way than to make this part of the CCoinsView framework. I just wish we could keep optional features out of the consensus code...
Force-pushed from a06a1b8 to 0816dca
Update: fixed a minor shutdown segfault in the InitError case
There are lots of objects passed by ref that should be const ref it seems, and several new member functions that aren't marked as const. Beyond the usual const rabbit-hole, it makes this substantially harder to review (for me, anyway). Could you please fix those up? I made a few changes here as an example, please have a look: theuni@d6c94fa
@@ -57,6 +73,15 @@ bool CCoinsViewDB::BatchWrite(CCoinsMap &mapCoins, const uint256 &hashBlock) {
            CCoinsMap::iterator itOld = it++;
            mapCoins.erase(itOld);
        }
        if (fTxOutsByAddressIndex && pcoinsByAddress) // only if -txoutsbyaddressindex
CCoinsViewDB shouldn't assume there is just one pcoinsByAddress global; can you pass it in in the constructor at least?
How do I look up a txout by some non-address-represented script?
I would suggest storing scriptPubKeys by their hash160 or hash256, rather than storing them in full in the map (requiring an extra heap allocation + 24 bytes of overhead for each). That's also more compact on disk than using the CScriptCompressor.
Calling the datastructure ByScript is probably more correct than ByAddress.
You can make gettxoutsbyaddress (or equivalent) also take a hex-encoded script for arbitrary script lookups.
Force-pushed from 995ccff to 5f81fee
Updated as requested. If you had built this index before, you need to delete it with -txoutsbyaddressindex=0 and then rebuild it with -txoutsbyaddressindex, because the on-disk format is incompatible (the database key is now the hash of the script).
}

uint160 CCoinsViewByScript::getKey(const CScript &script) {
    return Hash160(script);
}
Might be better to use Hash160(first-push-in-script-data)
I don't understand.
Is there anything wrong with hashing the complete script?
We don't want 2 different scripts to result in the same hash.
Using a hash of the first push allows one to search by the hash of the first push, even without knowing the full script. This is useful for anyone wanting to find transactions to old pay-to-pubkey scripts.
I sure hope you're not assuming there will never be collisions?
Sure I assume that; why would there be collisions?
Why can't you just hash the full old pay-to-pubkey script? Is there any unknown data in those scripts?
I am no Bitcoin script expert, but I think it is safer to hash from the first to the last bit; otherwise there is always room for an obvious attack, where I send coins to a script I can spend, but the hash is the same as that of a script you can spend.
After thinking about it, I guess what you are suggesting is possible, but then we would need two RPC functions:
- gettxoutsbyaddress, which is just a wrapper around an internal searchCoinsByFirstPush(..) and loops through all search results to ensure the script is actually OP_DUP OP_HASH etc.
- searchtxoutsbyscriptsfirstpush, which just returns the search results
Not sure if this is worth the effort though. Are there really use cases where you only know parts of the script?
Old pay-to-pubkey scripts (<pubkey> OP_CHECKSIG) were often referred to by a version 0 address with the pubkey hash. If one is searching for those, they likely only have the hash of the pubkey, not the pubkey itself.
Ok, I see. But I am unsure about redesigning and introducing a second RPC call just to support some old scripts nobody is creating anymore nowadays.
To me this feature is not important, but we can hear some more opinions here. If more people would like to see this implemented, I may consider looking into it.
I agree this feature is not important, but it would be nice to have the index compatible in case anyone ever wanted it.
With a hash(fullscript) index you can still easily search for pay-to-pubkey outputs - but you do need to know the full pubkey. I think that's perfectly acceptable.
Does this include or intersect/depend on #3652?
@btcdrak no, they are two independent indexes. This one is more lightweight, but not as powerful. Of course with getrawtransaction you can also query the transaction which created the unspent output, but you cannot query the transaction history for an address with this index; only the current "balance", basically. Depending on your use case, this index may be all you need. But if you care about transaction history, you still need the other index. So even if we merge this index into master, you may still want the other index in addition.
@cozz I've been using the addrindex patch; it's extremely powerful. If we're merging this PR I really think we should have the other one as well (which is optional via config).
The advantage of this one over the address index is that it doesn't interfere with pruning. It doesn't require the whole block chain, and is a much smaller index to boot. I do agree with @sipa that this really doesn't belong in the core consensus code. We're trying to reduce that code to the absolute essentials (also because it will end up in a consensus library at some point), so we should not expand it with functionality that is not required for validation.
@laanwj You could say the same thing about…
One of the goals of moving the consensus and utxo code to a library is that it will be possible to build additional tools and indexes without having to include everything and the kitchen sink in this project. Indexers can just use our code instead of the other way around. Note that I don't doubt for a moment that there are real use cases. But the goal of Bitcoin Core is to provide the core infrastructure, not to satisfy every possible use case. To keep this project maintainable we need to move away from that monolithic approach.
@@ -148,6 +148,7 @@ class CLevelDBWrapper
    }

    bool WriteBatch(CLevelDBBatch& batch, bool fSync = false) throw(leveldb_error);
    bool WriteBatch(leveldb::WriteBatch& batch, bool fSync = false) throw(leveldb_error);
Why is this necessary? I'd prefer to not expose the underlying LevelDB-specific datatypes, so the database layer can be swapped out.
Thanks for changing the approach, and apologies for the slow response; it again needs a rebase now.
It needs to be rebased again...
Force-pushed from f481aaa to ecfc843
Rebased.
Hi, can you rebase this again? Thanks.
Rebased.
Leaning towards closing this PR. This garnered some interest, but no ACKs after a long time. It is not fair to ask @cozz to continually rebase if it's not going in, in the short/medium term.
Yes, a decision for 0.12 would be good. Otherwise spending more time on this just feels like a waste to me.
I have this PR on my list to test this week.
Sorry for the delay, I finally tested it yesterday and today on a random sample set of 0-confirms and old txs. Tested ACK
@btcdrak, I haven't tested this yet, but fundamentally what makes this so much larger than your patch?
@@ -35,17 +36,33 @@ void static BatchWriteCoins(CLevelDBBatch &batch, const uint256 &hash, const CCoins &coins) {
    batch.Write(make_pair(DB_COINS, hash), coins);
}

void static BatchWriteCoins(CLevelDBBatch &batch, const uint160 &hash, const CCoinsByScript &coins) {
Can you name this function differently? I found it confusing to see it being used by the by-script logic.
Agree with @jgarzik, it's not fair to keep this lingering for too long. I like the use case it solves, but I still dislike the degree to which it's entangled with the CCoinsView. I think this feature should be something that is independent, rather than complicating the code necessary for its core function.

Would it be reasonable to implement it as a separate database file, and have it function like the wallet? Something that registers with CValidationInterface, and listens for new transactions being added (and removed) from the blockchain. It could store its own CBlockLocator (like the wallet does, triggered by CValidationInterface's SetBestChain) so it can "rescan" like the wallet does too.

A by-script index for the mempool could be done separately, using an extra index added after #6470's multiindex for the mempool?
Well, I am sorry, I am closing this now. As you may have noticed, I am not contributing to this project anymore, because I disagree with basically everything here and I am doing something else now. If anybody wants to pick this up, feel free to just copy and paste from my code.

Off-topic: You haven't merged my wallet pulls, which I consider important bug fixes, because the wallet performance is just a bug. Instead you consider moving code around as more important. The long-term goal is even to remove the GUI and wallet.

And last but not least, the block size. It is your responsibility to give people the latest in technology. This includes scale. Your job here is to max out the block size. This is like if your job is to fill a 20000-person stadium and you only let 1000 people in, with the argument that you can never satisfy demand anyway. You are acting as if, once we go higher than 1MB, suddenly everybody has to fully trust the miners. The threshold at which renting a server for a day and verifying all blocks stops being affordable to normal people is very much higher than 1MB. I believe we need different people in the lead position here to be successful. At least gavin should…

So sorry, but long-term I have given up on the project. I am still following a little, so if there are maintenance things for my code, feel free to contact me.
I'm sorry you feel that way, and I understand the sentiment also partially. I agree moving code to another file is not proper encapsulation, but some of the time, moving is a first step, and an easily reviewable one. Especially the separation of consensus code is not something we want to do without very high assurances behaviour does not change. I think we are making progress, but it is admittedly slow.

My only suggestion above here was a more encapsulated approach, by the way, not a NAK...

Regarding total complexity: I fully agree the code is not "nice". It wasn't nice when Satoshi disappeared, and it still isn't. Significant parts have been properly encapsulated since (script execution, addrman, wallet, UTXO management, network base, UI interaction, limited size data structures), but at the same time, the code definitely has grown too. I believe that was necessary: higher performance often means the problem becomes more complex (for example the introduction of caches, batch processing, precomputed values, ...). I often very much wish to rewrite everything in a clean way, but that's just not something that would be reviewable.

About your wallet patches not being merged: I'm aware. But the truth is that there is little interest in the wallet code, from both users and developers. We need people who are interested in reviewing and testing to make changes. That of course leads to the wallet code growing more and more outdated, reinforcing that circle. One solution to this may be @jonasschnelli's new wallet project, which aims to add a new experimental wallet from scratch, which would be more competitive in features.

Regarding block size: please discuss that on the mailing list. As a Bitcoin Core maintainer, I believe we should merge any consensus change that appears uncontroversial to the community.
I didn't expect such an honest response, thanks; that earns you a lot of respect.
How quickly does this return data?
Still open?
Adds new rpc call "gettxoutsbyaddress" as requested in #4007.
Disabled by default, enabled with -txoutsbyaddressindex.
The index is built from the normal utxo on first use, reindex is not required.
For the GUI there is progress shown, on bitcoind you just need to be patient.
The qa-test includes a simple reorganization.
I tested all 4 code parts in main.cpp by commenting them out and checking that the qa-test fails at the expected line.
I had to modify the rpc call for this test, as normally wrong outputs in the address index are not exposed to the user, because the rpc call relies on the normal GetCoins call.
The rest of the code has been tested in random ad-hoc testing.
I only tested Linux on my machine.
It's all in one commit for laziness reasons, but if splitting it into multiple commits would help reviewing, I could do that.