Join GitHub today
GitHub is home to over 20 million developers working together to host and review code, manage projects, and build software together.
txoutsbyaddress index (take 3) #9806
Conversation
fanquake
added the
UTXO Db and Indexes
label
Feb 20, 2017
|
Wshadow statistics: |
|
@paveljanik - Thanks! Fixed that up. Looks like that's the only shadow warning. |
|
Thanks for reviving this. IMO this is important. |
laanwj
added this to the
0.15.0
milestone
Feb 22, 2017
sidhujag
commented
Mar 14, 2017
•
|
Concept ACK, are there any statistics on how much extra burden this places on an ordinary node with the index off? |
|
needs rebase |
dcousens
referenced
this pull request
in bitcoinjs/indexd
Mar 16, 2017
Closed
Suitability in bitcoinjs organization #5
|
@instagibbs - Rebased. Thanks for the heads up. (Side note: Is there any way to have GH tell you when a conflict occurs? That would be really handy.) @sidhujag - Good question. I don't know offhand. I'm happy to do some benchmarking. I may need some help with that. If anybody would like to make some suggestions, I'm all ears. All - Will get more movement on this. Life intervened for awhile and has finally slowed down enough to where I can dedicate more time to this. |
sidhujag
commented
Mar 18, 2017
|
Thanks droark great feature btw.. this makes walletless spending very easy |
|
Am placing a to-do list here to remind myself of what I need to do, and solicit feedback on anything people might think is missing.
|
|
needs rebase. |
|
@btcdrak - Thanks. Rebased. |
weex
commented
Mar 23, 2017
|
Running this on a .bitcoin/ folder that was last used with 0.13 and was run pruned to 2000. It seems 1GB of ram is too low for this enabled as I got an out of memory error. Now on restarting I get the following after/during the "Rescanning..." step:
|
| @@ -0,0 +1,280 @@ | ||
| +// Copyright (c) 2014-2016 The Bitcoin developers |
luke-jr
Apr 17, 2017
Member
Standard copyright line says "The Bitcoin Core developers".
Might as well start this one off with the end year 2017.
| + | ||
| +#include <boost/thread.hpp> | ||
| + | ||
| +using namespace std; |
| + | ||
| +static const char DB_COINS_BYSCRIPT = 'd'; | ||
| +static const char DB_FLAG = 'F'; | ||
| +static const char DB_BEST_BLOCK = 'B'; |
luke-jr
Apr 17, 2017
Member
IMO it would be better to not have overlapping chars between databases, such that they could be combined cleanly if desired. Therefore, I suggest using 'D' for DB_BEST_BLOCK (and eliminating DB_FLAG here entirely).
| +CCoinsViewByScript::CCoinsViewByScript(CCoinsViewByScriptDB* viewIn) : base(viewIn) { } | ||
| + | ||
| +bool CCoinsViewByScript::GetCoinsByScript(const CScript &script, CCoinsByScript &coins) { | ||
| + const CScriptID key = CScriptID(script); |
luke-jr
Apr 17, 2017
Member
This isn't the purpose of CScriptID, which is specific to P2SH addresses. Suggest having a static ScriptIndexHash function that returns a (possibly typedef'd) uint160.
| + return false; | ||
| +} | ||
| + | ||
| +CCoinsMapByScript::iterator CCoinsViewByScript::FetchCoinsByScript(const CScript &script, bool fRequireExisting) { |
| + return cacheCoinsByScript.emplace_hint(it, key, tmp); | ||
| +} | ||
| + | ||
| +CCoinsByScript &CCoinsViewByScript::GetCoinsByScript(const CScript &script, bool fRequireExisting) { |
| + | ||
| +CCoinsByScript &CCoinsViewByScript::GetCoinsByScript(const CScript &script, bool fRequireExisting) { | ||
| + CCoinsMapByScript::iterator it = FetchCoinsByScript(script, fRequireExisting); | ||
| + assert(it != cacheCoinsByScript.end()); |
| + else | ||
| + batch.Write(make_pair(DB_COINS_BYSCRIPT, it->first), it->second); | ||
| + CCoinsMapByScript::iterator itOld = it++; | ||
| + pcoinsViewByScriptIn->cacheCoinsByScript.erase(itOld); |
luke-jr
Apr 17, 2017
Member
I don't see why this is necessary: we clear the entire cache when complete.
Eliminating this erase allows simplifying the entire loop to a normal C++11 for-each.
| + if (!pcursor->GetKey(hash)) | ||
| + break; | ||
| + v.push_back(hash); | ||
| + if (v.size() >= 10000) |
| + if (i > 0) | ||
| + LogPrintf("Address index with %d addresses successfully deleted.\n", i); | ||
| + | ||
| + return true; |
| + if (!pcursor->GetKey(txhash) || !pcursor->GetValue(coins)) | ||
| + break; | ||
| + | ||
| + for (unsigned int j = 0; j < coins.vout.size(); j++) |
luke-jr
Apr 17, 2017
Member
Use size_t and ++j. It may also be better to do this backward:
for (size_t j = coins.vout.size(); j--; ) {
(note j-- in this case because we want to look at the pre-decrement value)
| + i++; | ||
| + } | ||
| + | ||
| + if (mapCoinsByScript.size() >= 10000) |
luke-jr
Apr 17, 2017
Member
Since we're doing partial writes, we should ensure DB_BEST_BLOCK is cleared before we begin.
| + else | ||
| + batch.Write(make_pair(DB_COINS_BYSCRIPT, it->first), it->second); | ||
| + CCoinsMapByScript::iterator itOld = it++; | ||
| + mapCoinsByScript.erase(itOld); |
| + else | ||
| + batch.Write(make_pair(DB_COINS_BYSCRIPT, it->first), it->second); | ||
| + CCoinsMapByScript::iterator itOld = it++; | ||
| + mapCoinsByScript.erase(itOld); |
| + db.WriteBatch(batch); | ||
| + } | ||
| + LogPrintf("Address index with %d outputs successfully built.\n", i); | ||
| + return true; |
| @@ -368,6 +373,7 @@ std::string HelpMessage(HelpMessageMode mode) | ||
| strUsage += HelpMessageOpt("-sysperms", _("Create new files with system default permissions, instead of umask 077 (only effective with disabled wallet functionality)")); | ||
| #endif | ||
| strUsage += HelpMessageOpt("-txindex", strprintf(_("Maintain a full transaction index, used by the getrawtransaction rpc call (default: %u)"), DEFAULT_TXINDEX)); | ||
| + strUsage += HelpMessageOpt("-txoutindex", strprintf(_("Maintain an address to unspent outputs index (rpc: getutxoindex). The index is built on first use. (default: %u)"), 0)); |
luke-jr
Apr 17, 2017
Member
Should probably be renamed to -utxoscriptindex.
-txoutindex suggests it indexes all txouts (by what?).
| @@ -1432,11 +1438,13 @@ bool AppInitMain(boost::thread_group& threadGroup, CScheduler& scheduler) | ||
| UnloadBlockIndex(); | ||
| delete pcoinsTip; | ||
| delete pcoinsdbview; | ||
| + delete pcoinsByScriptDB; |
luke-jr
Apr 17, 2017
Member
Not really clear why we have anything to delete here, but don't we need to delete pcoinsByScript also?
| @@ -1484,6 +1492,58 @@ bool AppInitMain(boost::thread_group& threadGroup, CScheduler& scheduler) | ||
| } | ||
| } | ||
| + // Check -txoutindex | ||
| + pcoinsByScriptDB->ReadFlag("txoutindex", fTxOutIndex); |
luke-jr
Apr 17, 2017
Member
I don't think we really need a dedicated flag in the db for this. Either it exists and has a "bestblock" defined, or it doesn't...
| + if (GetBoolArg("-txoutindex", false)) | ||
| + { | ||
| + // build index | ||
| + if (!fTxOutIndex) |
luke-jr
Apr 17, 2017
Member
We also need to build the index if it's outdated. For example, if the user ran an older version to sync. (This is fixed automatically if you replace the flag with a check that the index's best block matches that of the chainstate.)
| @@ -0,0 +1,116 @@ | ||
| +// Copyright (c) 2014-2016 The Bitcoin developers |
| +{ | ||
| +public: | ||
| + // unspent transaction outputs | ||
| + std::set<COutPoint> setCoins; |
| + // empty constructor | ||
| + CCoinsByScript() { } | ||
| + | ||
| + bool IsEmpty() const { |
| +private: | ||
| + CCoinsViewByScriptDB *base; | ||
| + | ||
| + mutable uint256 hashBlock; |
| @@ -0,0 +1,72 @@ | ||
| +// Copyright (c) 2014-2016 The Bitcoin Core developers |
| + | ||
| +#include <boost/thread/thread.hpp> // boost::thread::interrupt | ||
| + | ||
| +using namespace std; |
| +using namespace std; | ||
| + | ||
| +//! Calculate statistics about the unspent transaction output set | ||
| +bool GetUTXOStats(CCoinsView *view, CCoinsViewByScriptDB *viewbyscriptdb, CCoinsStats &stats) |
| @@ -598,6 +601,109 @@ static bool rest_getutxos(HTTPRequest* req, const std::string& strURIPart) | ||
| return true; // continue to process further HTTP reqs on this cxn | ||
| } | ||
| +static bool rest_getutxoindex(HTTPRequest* req, const std::string& strURIPart) |
| @@ -876,6 +828,8 @@ UniValue gettxoutsetinfo(const JSONRPCRequest& request) | ||
| " \"bestblock\": \"hex\", (string) the best block hash hex\n" | ||
| " \"transactions\": n, (numeric) The number of transactions\n" | ||
| " \"txouts\": n, (numeric) The number of output transactions\n" | ||
| + " \"addresses\": n, (numeric) The number of addresses and scripts. Only if -txoutindex=1\n" |
| @@ -876,6 +828,8 @@ UniValue gettxoutsetinfo(const JSONRPCRequest& request) | ||
| " \"bestblock\": \"hex\", (string) the best block hash hex\n" | ||
| " \"transactions\": n, (numeric) The number of transactions\n" | ||
| " \"txouts\": n, (numeric) The number of output transactions\n" | ||
| + " \"addresses\": n, (numeric) The number of addresses and scripts. Only if -txoutindex=1\n" | ||
| + " \"utxoindex\": n, (numeric) The number of output transactions. Only if -txoutindex=1\n" |
| + "getutxoindex ( minconf [\"address\",...] count from )\n" | ||
| + "\nReturns a list of unspent transaction outputs by address (or script).\n" | ||
| + "The list is ordered by confirmations in descending order.\n" | ||
| + "Note that passing minconf=0 will include the mempool.\n" |
luke-jr
Apr 17, 2017
Member
Consider the case of a current UTXO that has been spent in the mempool. Under what conditions is the mempool ignored and that UTXO returned? Would you ever want to see it while also searching the mempool?
| + "1. minconf (numeric) Minimum confirmations\n" | ||
| + "2. \"addresses\" (string) A json array of bitcoin addresses (or scripts)\n" | ||
| + " [\n" | ||
| + " \"address\" (string) bitcoin address (or script)\n" |
luke-jr
Apr 17, 2017
Member
The address abstraction ends before/when coins hit the UTXO set. UTXOs do not have and are not associated to addresses.
Rename this to "scripts", and don't support specifying as addresses.
| + + HelpExampleRpc("getutxoindex", "6, \"[\\\"1PGFqEzfmQch1gKD3ra4k18PNj3tTUUSqg\\\",\\\"1LtvqCaApEdUGFkpKMM4MstjcaL4dKg8SP\\\"]\"") | ||
| + ); | ||
| + | ||
| + if (!fTxOutIndex) |
| + int nMinDepth = request.params[0].get_int(); | ||
| + UniValue inputs = request.params[1].get_array(); | ||
| + | ||
| + int nCount = 999999999; |
| + pcoinsByScript->GetCoinsByScript(script, coinsByScript); | ||
| + | ||
| + if (nMinDepth == 0) | ||
| + mempool.GetCoinsByScript(script, coinsByScript); |
luke-jr
Apr 17, 2017
Member
Need to document that mempool.GetCoinsByScript must only add, not replace.
| + } | ||
| + | ||
| + UniValue results(UniValue::VARR); | ||
| + sort(vSort.begin(), vSort.end()); |
| + | ||
| + UniValue results(UniValue::VARR); | ||
| + sort(vSort.begin(), vSort.end()); | ||
| + for (unsigned int i = (unsigned int)nFrom; i < vSort.size(); i++) |
| @@ -122,6 +122,63 @@ void TxToJSON(const CTransaction& tx, const uint256 hashBlock, UniValue& entry) | ||
| } | ||
| } | ||
| +void CoinsByScriptToJSON(const CCoinsByScript& coinsByScript, int nMinDepth, UniValue& vObjects, std::vector<std::pair<int, unsigned int>>& vSort, bool fIncludeHex) | ||
| +{ | ||
| + BOOST_FOREACH(const COutPoint &outpoint, coinsByScript.setCoins) |
| + mempool.pruneSpent(outpoint.hash, coins); // TODO: this should be done by the CCoinsViewMemPool | ||
| + } | ||
| + else if (!pcoinsTip->GetCoins(outpoint.hash, coins)) | ||
| + continue; |
luke-jr
Apr 17, 2017
Member
Inconsistent behaviour for UTXOs created in mined transactions vs in the mempool. We should probably consider mempool spends the same for both mempool-created UTXOs and confirmed-tx-created UTXOs.
| + o.push_back(Pair("value", ValueFromAmount(coins.vout[outpoint.n].nValue))); | ||
| + o.push_back(Pair("scriptPubKey", oScriptPubKey)); | ||
| + o.push_back(Pair("version", coins.nVersion)); | ||
| + o.push_back(Pair("coinbase", coins.fCoinBase)); |
| @@ -57,7 +57,7 @@ static CBlock BuildBlockTestCase() { | ||
| BOOST_AUTO_TEST_CASE(SimpleRoundTripTest) | ||
| { | ||
| - CTxMemPool pool; | ||
| + CTxMemPool pool(false); |
| @@ -98,7 +99,8 @@ CCoinsViewCursor *CCoinsViewDB::Cursor() const | ||
| that restriction. */ | ||
| i->pcursor->Seek(DB_COINS); | ||
| // Cache key of first record | ||
| - i->pcursor->GetKey(i->keyTmp); | ||
| + if (!i->pcursor->Valid() || !i->pcursor->GetKey(i->keyTmp)) |
| @@ -370,6 +371,16 @@ unsigned int CTxMemPool::GetTransactionsUpdated() const | ||
| return nTransactionsUpdated; | ||
| } | ||
| +void CTxMemPool::GetCoinsByScript(const CScript& script, CCoinsByScript& coinsByScript) const | ||
| +{ |
| @@ -432,6 +443,11 @@ bool CTxMemPool::addUnchecked(const uint256& hash, const CTxMemPoolEntry &entry, | ||
| vTxHashes.emplace_back(tx.GetWitnessHash(), newit); | ||
| newit->vTxHashesIdx = vTxHashes.size() - 1; | ||
| + if (fTxOutIndex) |
| @@ -2051,6 +2055,62 @@ void PruneAndFlush() { | ||
| FlushStateToDisk(state, FLUSH_STATE_NONE); | ||
| } | ||
| +void static UpdateAddressIndex(const CTxOut& txout, const COutPoint& outpoint, bool fInsert) |
| + } | ||
| +} | ||
| + | ||
| +void static UpdateAddressIndex(const CBlock& block, CBlockUndo& blockundo, bool fConnect) |
| @@ -47,14 +47,14 @@ def __init__(self): | ||
| super().__init__() | ||
| self.setup_clean_chain = True | ||
| self.num_nodes = 3 | ||
| + self.extra_args = [["-txoutindex"]] * 3 |
| @@ -0,0 +1,95 @@ | ||
| +#!/usr/bin/env python3 |
|
@luke-jr - Thanks for the detailed feedback. I'll look over it this weekend. |
cozz
and others
added some commits
Oct 5, 2014
This was referenced May 9, 2017
|
Thanks for preparing this @droark. Is this PR still alive? |
|
@dexX7 - Thanks for checking in. Some family-related issues came up recently that I had to deal with for awhile. I'm almost done rebasing the PR and will catch up with the remaining feedback ASAP, along with some test harness changes that probably need to be folded in. The path forward looks pretty clear to me, IMO. I just need to wrap up the work. |
laanwj
removed this from the
0.15.0
milestone
Jul 6, 2017
|
I still need to read more, but... Regarding searching by address , by scriptPubKey, or by COutPoint (tx_id, output_pos), I'm not sure whether I want them all or a subset of them. This all assumes you create a new scriptPubKey -> COutPoint index. Regarding utxo vs stxo vs txo... So although I'm not against gettxoutsbyaddress and I celebrate concept ACK it, but I think it would be nicer to get gettxoutsbyoutpoint and gettxoutsbyscript reviewed and merged in that order first. But I think some other people don't like chained PRs all that much, so just take it as "if we're offering gettxoutsbyaddress, we should be able to offer gettxoutsbyoutpoint and gettxoutsbyscript 'almost for free'(tm) too". It may seem contradictory: I'm asking you to do more with respect to utxo/stxo, but I'm suggesting to do less with respect to addresses and properly removing things for the users that want this feature but don't want to be full archives, which is something we can optimize further later after leveraging the easy important one, which is just looking in the utxo first. Anyway, feel free to note the parts of my long post you like in whatever order you like best and ignore the rest. I plan to have a deeper look either way beyond this concept ack. All the best but needs rebase. |
|
Sorry, no, using GetTransaction doesn't leverage the utxo index we maintain...One would need to search in the utxo (with CCoinsViewCache::AccessCoin) first and then try with GetTransaction only if we want to also serve stxo, which should perhaps be left out of scope for this PR as a later improvement. |
|
Sorry for the long post, not for being long but for being incorrect. There's not need for a gettxobyoutpoint or similar because it already exists and is named gettxout! Only the slower parts were missing, which I plan to serve in #10822 which is just a draft that needs lots of testing. To reiterate, the important parts of my previous fedback:
The PR that should help extend whatever is done for utxo to txo if you do the index: outpoint -> script; is the following: As said it needs testing, but there's no need to wait, let's expose utxo if we can first and then txo "almost for free"(tm). Sorry again for invading the pr in a distractive way, but this needs rebase. |
droark commentedFeb 20, 2017
This is an attempt to revive PR #8660 (and, by extension, PR #5048). For now, this PR simply compiles without fresh warnings or errors. Once it is confirmed that no more conflicts exist, the remaining comments/requests from #8660 will be fully addressed.