Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Index for BIP 157 block filters #14121

Merged
merged 12 commits into from Apr 18, 2019

Conversation

@jimpo
Copy link
Contributor

commented Aug 31, 2018

This introduces a new BlockFilterIndex class, which is required for BIP 157 support.

The index is uses the asynchronous BaseIndex infrastructure driven by the ValidationInterface callbacks. Filters are stored sequentially in flat files and the disk location of each filter is indexed in LevelDB along with the filter hash and header. The index is designed to ensure persistence of filters reorganized out of the main chain to simplify the BIP 157 net implementation.

Stats (block height = 565500):

  • Syncing the index from scratch takes 45m
  • Total index size is 3.8 GiB

@jimpo jimpo force-pushed the jimpo:bip157-index branch 2 times, most recently Aug 31, 2018

@DrahtBot

This comment has been minimized.

Copy link
Contributor

commented Aug 31, 2018

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

No conflicts as of last run.

src/index/blockfilter.cpp Outdated
std::make_pair(pindex->GetBlockHash(), filter.GetHash()));
batch.Write(std::make_pair(DB_FILTER_HEADER, height_key),
std::make_pair(pindex->GetBlockHash(), filter.ComputeHeader(prev_header)));
return m_db->WriteBatch(batch);

This comment has been minimized.

Copy link
@leishman

leishman Aug 31, 2018

Contributor

Out of scope for this PR, but is there a reason we can't write a batch of entries for more than a single block? Could we write 100 - 1000 blocks worth of entries in each batch write to speed up the migrations? This introduces some complexity, but perhaps it's worth it?

This comment has been minimized.

Copy link
@jimpo

jimpo Sep 2, 2018

Author Contributor

Could be worth looking into during ThreadSync. As you note though, it's a fairly independent change.

Show resolved Hide resolved src/index/blockfilter.h Outdated
Show resolved Hide resolved src/index/blockfilter.cpp Outdated

@jimpo jimpo force-pushed the jimpo:bip157-index branch Aug 31, 2018

Show resolved Hide resolved src/index/blockfilter.cpp Outdated
@gmaxwell

This comment has been minimized.

Copy link
Member

commented Sep 1, 2018

Storing large records in leveldb is generally a bad idea. Is there a particular reason this doesn't work like the undo data?

@jimpo

This comment has been minimized.

Copy link
Contributor Author

commented Sep 1, 2018

@gmaxwell If by that you mean writing the filters sequentially in flat files then indexing the disk positions in LevelDB, I hadn't considered that, but it may be worthwhile. The downside of course is additional complexity. What are you mostly concerned about, read or write performance? I'd want to benchmark reads and writes to determine if the DB value sizes are problematic before making the change. With filters on average being 2% of block size and a LevelDB file size limit of 2 MiB, each file could still store ~200 filters (ignoring keys and overhead and such).

@jimpo

This comment has been minimized.

Copy link
Contributor Author

commented Sep 3, 2018

@gmaxwell I put together a (not-production-ready) branch to test your suggestion of writing filters to flat files. In sample size n=1 experiments, I measured that the time to write the entire block index was <1% faster using flat files, and reading 5,000 sequential filters (starting at height 500,000) was 11% slower. The total storage space consumed is nearly the same (3.4 GiB total). Happy to provide the log files/iPython notebooks I used if you'd like.

Given the additional complexity and absence of significantly improved performance, I think writing filters directly into LevelDB is the way to go.

@jimpo jimpo force-pushed the jimpo:bip157-index branch 3 times, most recently Sep 3, 2018

src/blockfilter.h Outdated
@@ -44,19 +54,16 @@ class GCSFilter
public:

/** Constructs an empty filter. */
GCSFilter(uint64_t siphash_k0 = 0, uint64_t siphash_k1 = 0, uint8_t P = 0, uint32_t M = 0);
GCSFilter(const Params& params = Params());

This comment has been minimized.

Copy link
@practicalswift

practicalswift Sep 5, 2018

Member

Please make explicit :-)

src/index/blockfilter.cpp Outdated

bool BlockFilterIndex::Rewind(const CBlockIndex* current_tip, const CBlockIndex* new_tip)
{
assert(current_tip->GetAncestor(new_tip->nHeight) == new_tip);

This comment has been minimized.

Copy link
@practicalswift

practicalswift Sep 5, 2018

Member

Assertions should not have side effects. Please move GetAncestor outside of assertion :-)

This comment has been minimized.

Copy link
@jimpo

jimpo Sep 5, 2018

Author Contributor

What is the side effect? GetAncestor is a const method.

This comment has been minimized.

Copy link
@practicalswift

practicalswift Sep 10, 2018

Member

@jimpo You're right! Forget my comment :-)

src/blockfilter.h Outdated
uint32_t GetN() const { return m_N; }
uint32_t GetM() const { return m_M; }
const Params& GetParams() const { return m_params; }

This comment has been minimized.

Copy link
@practicalswift

practicalswift Sep 5, 2018

Member

Remove GetParams()? Not used?

This comment has been minimized.

Copy link
@jimpo

jimpo Sep 5, 2018

Author Contributor

No, it's not used, but it feels like there should be a getter.

This comment has been minimized.

Copy link
@practicalswift

practicalswift Sep 6, 2018

Member

Unused code is untested code, so I suggest removing it or adding a test for it :-)

test/functional/rpc_getblockfilter.py Outdated
self.num_nodes = 2
self.extra_args = [["-blockfilterindex"], []]

def run_test (self):

This comment has been minimized.

Copy link
@practicalswift

practicalswift Sep 7, 2018

Member
./test/functional/rpc_getblockfilter.py:21:17: E211 whitespace before '('
@Sjors
Copy link
Member

left a comment

Concept ACK.

Can you mention -blockfilterindex and getblockfilter in the PR description, as well as perhaps add a release note?

It's nice to be able to quickly delete an index, so having a separate file for each type makes sense to me (as is the case now: indexes/blockindex/basic/...).

Lightly tested on macOS. I get a few mismatching headers compared to the test vectors:

src/bitcoin-cli getblockfilter 00000000fd3ceb2404ff07a785c7fdcc76619edc8ed61bd25134eaa22084366a "basic"
{
  "filter": "0db414c859a07e8205876354a210a75042d0463404913d61a8e068e58a3ae2aa080026",
  "header": "c582d51c0ca365e3fcf36c51cb646d7f83a67e867cb4743fd2128e3e022b700c"
}

000000000000015d6077a411a8f5cc95caf775ccf11c54e27df75ce58d187313 
-> "header": "546c574a0472144bcaf9b6aeabf26372ad87c7af7d1ee0dbfae5e099abeae49c"

0000000000000c00901f2049055e2a437c819d79a3d54fd63e6af796cd7b8a79
-> "header": "0965a544743bbfa36f254446e75630c09404b3d164a261892372977538928ed5

The filters do match.

We should probably include those test vectors. In addition, it would be nice to have script to compare every single block with the btcd RPC, testnet and mainnet.

Show resolved Hide resolved src/index/blockfilter.cpp Outdated
@ghost

ghost approved these changes Sep 8, 2018

Copy link

left a comment

good idea

@ryanofsky
Copy link
Contributor

left a comment

I should finish the review on Monday

Apologies! I didn't get quite as far with the review as I would have liked. I left many minor suggestions, but as usual please just change whatever interests you and ignore anything else.

Started review (will update list below with progress).

  • e3845e4 index: Allow atomic commits of index state to be extended. (1/12)
  • c368acb index: Ensure block locator is not stale after chain reorg. (2/12)
  • a0bd77e blockfilter: Functions to translate filter types to/from names. (3/12)
  • 4fa1f82 serialize: Serialization support for big-endian 32-bit ints. (4/12)
  • 5964d2f index: Implement block filter index with write operations. (5/12)
  • 88ecade index: Implement lookup methods on block filter index. (6/12)
  • 9283baa test: Unit tests for block index filter. (7/12)
  • d85dd54 test: Unit test for block filter index reorg handling. (8/12)
  • 7a955d1 index: Access functions for global block filter indexes. (9/12)
  • 88fac30 init: Add CLI option to enable block filter index. (10/12)
  • 1426fb7 rpc: Add getblockfilter RPC method. (11/12)
  • e6c6654 blockfilter: Update BIP 158 test vectors. (12/12)
Show resolved Hide resolved src/index/base.cpp
if (!GetDB().WriteBestBlock(locator)) {
error("%s: Failed to write locator to disk", __func__);
}
Commit();

This comment has been minimized.

Copy link
@ryanofsky

ryanofsky Mar 25, 2019

Contributor

In commit "index: Allow atomic commits of index state to be extended." (e3845e4)

It would be good to check for failures on Commit() calls in this git commit. Even though the current implementation in this git commit always returns true, it can start to return false when the subclasses is added, and these calls don't appear to be updated later.

This comment has been minimized.

Copy link
@ryanofsky

ryanofsky Mar 28, 2019

Contributor

In commit "index: Allow atomic commits of index state to be extended." (e3845e4)

Would be good to add NODISCARD to all methods that return errors on failure. Even in cases where you don't want to handle or log the errors, it's clarifying if invoking code shows that errors are being ignored, and maybe has comments saying why the errors are expected or ok to ignore.

This comment has been minimized.

Copy link
@jimpo

jimpo Apr 5, 2019

Author Contributor

I explicitly don't want to handle failure as Commit() already logs on errors and in the places where the return value is ignored, the index is committing a later state so it is safe to just continue. I will add comments in the appropriate places saying this.

Show resolved Hide resolved src/index/blockfilterindex.cpp Outdated
Show resolved Hide resolved src/index/blockfilterindex.cpp
Show resolved Hide resolved src/index/blockfilterindex.cpp Outdated
Show resolved Hide resolved src/index/blockfilterindex.cpp Outdated
Show resolved Hide resolved src/index/blockfilterindex.cpp Outdated
Show resolved Hide resolved src/index/blockfilterindex.cpp Outdated

bool BlockFilterIndex::Init()
{
if (!m_db->Read(DB_FILTER_POS, m_next_filter_pos)) {

This comment has been minimized.

Copy link
@ryanofsky

ryanofsky Mar 25, 2019

Contributor

In commit "index: Implement block filter index with write operations." (5964d2f)

Is it possible to distinguish between entry not existing here and failure to deserialize? Might be a little nicer to be able to return an error if there's something weird here, instead of resetting the next write to position 0.

This comment has been minimized.

Copy link
@jimpo

jimpo Apr 5, 2019

Author Contributor

Very good point!

/* The index database stores three items for each block: the disk location of the encoded filter,
* its dSHA256 hash, and the header. Those belonging to blocks on the active chain are indexed by
* height, and those belonging to blocks that have been reorganized out of the active chain are
* indexed by block hash. This ensures that filter data for any block that becomes part of the

This comment has been minimized.

Copy link
@ryanofsky

ryanofsky Mar 25, 2019

Contributor

In commit "index: Implement block filter index with write operations." (5964d2f)

It'd be good to mention that values and not just the keys are different in these two cases (height keys appear to have values prefixed by the block hash).

@ryanofsky
Copy link
Contributor

left a comment

Slightly tested ACK e6c6654 (I just watched the "Syncing basic block filter index with block chain from height" messages and tested the RPC). Again no major comments, feel free to ignore them.

/// Write the current chain block locator to the DB.
bool WriteBestBlock(const CBlockIndex* block_index);
/// Write the current chain block locator and other index state to the DB.
bool Commit();

This comment has been minimized.

Copy link
@ryanofsky

ryanofsky Mar 28, 2019

Contributor

In commit "index: Allow atomic commits of index state to be extended." (e3845e4):

Giving the private method the same name as the protected method inherited by subclasses makes the code harder to follow. Would suggest calling this InternalCommit() or something like that.

This comment has been minimized.

Copy link
@ryanofsky

ryanofsky Apr 10, 2019

Contributor

re: #14121 (comment)

In commit "index: Allow atomic commits of index state to be extended." (4368384)

Current change isn't what I was suggesting and I think is actually worse than the original (but still ok if this is what makes most sense to you). I was suggesting renaming the Commit() that's internal to the base class InternalCommit(), not calling the other Commit() that's shared and overridden externally by subclasses InternalCommit(). Calling the overridden Commit() internal also breaks consistency with the other overridden methods (init, rewind, writeblock).

This comment has been minimized.

Copy link
@jimpo

jimpo Apr 10, 2019

Author Contributor

The difference from Init, Rewind, and WriteBlock and is that CommitInternal has a different method signature from Commit, which is why I thought you found it confusing, whereas the others have only a single method signature that the child classes override.

I don't have too much of a preference, though I do think switching the Internal would be weird, as I've generally found the convention to be that an outer function delegating to an inner helper function might call FuncInternal, not the other way around.

@@ -54,8 +54,8 @@ class BaseIndex : public CValidationInterface
/// over and the sync thread exits.
void ThreadSync();

/// Write the current chain block locator to the DB.
bool WriteBestBlock(const CBlockIndex* block_index);
/// Write the current chain block locator and other index state to the DB.

This comment has been minimized.

Copy link
@ryanofsky

ryanofsky Mar 28, 2019

Contributor

In commit "index: Allow atomic commits of index state to be extended." (e3845e4)

It isn't clear from this comment when this is called or what "other index state" might be referring to. Maybe expand comment to something like: "Write the current chain block locator and other index state from subclasses to the DB. This is called after blocks are added or rewound."

if (!GetDB().WriteBestBlock(locator)) {
error("%s: Failed to write locator to disk", __func__);
}
Commit();

This comment has been minimized.

Copy link
@ryanofsky

ryanofsky Mar 28, 2019

Contributor

In commit "index: Allow atomic commits of index state to be extended." (e3845e4)

Would be good to add NODISCARD to all methods that return errors on failure. Even in cases where you don't want to handle or log the errors, it's clarifying if invoking code shows that errors are being ignored, and maybe has comments saying why the errors are expected or ok to ignore.

@@ -69,6 +69,10 @@ class BaseIndex : public CValidationInterface
/// Write update index entries for a newly connected block.
virtual bool WriteBlock(const CBlock& block, const CBlockIndex* pindex) { return true; }

/// Virtual method called internally that can be overridden to atomically
/// commit more index state.
virtual bool Commit(CDBBatch& batch);

This comment has been minimized.

Copy link
@ryanofsky

ryanofsky Mar 28, 2019

Contributor

In commit "index: Allow atomic commits of index state to be extended." (e3845e4)

Comment should mention how this relates to WriteBlock (and Rewind), that it's called after one or more WriteBlock/Rewind calls in order to flush state to disk and store extra data. Comment should also mention that subclasses that override this must include a call to the overridden method (or the index position won't be updated).

It's also not clear what's supposed to happen if this returns false. Maybe a comment could clarify. It seems like the only thing that happens is that the batch update is not written, and the next block position is not saved, but the sync still continues? I don't understand when this behavior would be useful, and why it wouldn't be better to abort syncing the index in this case. Maybe the behavior is ok, but comment should at least mention what return value is expected from the overridden method and what effect it will have.

@@ -95,17 +95,18 @@ void BaseIndex::ThreadSync()
int64_t last_locator_write_time = 0;
while (true) {
if (m_interrupt) {
WriteBestBlock(pindex);
m_best_block_index = pindex;

This comment has been minimized.

Copy link
@ryanofsky

ryanofsky Mar 28, 2019

Contributor

In commit "index: Allow atomic commits of index state to be extended." (e3845e4)

It seems good to set m_best_block_index here, but I don't understand why it wasn't being set before. Is this a bugfix?

This comment has been minimized.

Copy link
@jimpo

jimpo Apr 5, 2019

Author Contributor

Previously, since m_synced wasn't set to and WriteBestBlock took an explicit pointer argument, it wasn't necessary. But now WriteBestBlock reads m_best_block_index instead of taking an argument.

Show resolved Hide resolved src/test/blockfilter_index_tests.cpp Outdated
Show resolved Hide resolved src/init.cpp Outdated
if (!enabled_filter_types.empty()) {
size_t n_indexes = enabled_filter_types.size();
int64_t max_cache = (max_filter_index_cache << 20) / n_indexes;
filter_index_cache = std::min(nTotalCache / 8, max_cache);

This comment has been minimized.

Copy link
@ryanofsky

ryanofsky Mar 28, 2019

Contributor

In commit "init: Add CLI option to enable block filter index." (88fac30)

Did you want to divide nTotalCache by n_indexes here? Shouldn't matter now because n_indexes is one. But as n_indexes grows, nTotalCache on the next line could keep decreasing and even go below 0.

This comment has been minimized.

Copy link
@jimpo

jimpo Apr 5, 2019

Author Contributor

Updated the logic.

db_it->Next();
}

results.resize(results_size);

This comment has been minimized.

Copy link
@ryanofsky

ryanofsky Mar 28, 2019

Contributor

In commit "index: Implement lookup methods on block filter index." (88ecade)

It seems like this line isn't doing anything and could be removed (new size should be same as previous size).

This comment has been minimized.

Copy link
@jimpo

jimpo Apr 5, 2019

Author Contributor

This is the first time results is accessed in this method, and the size has to be set appropriately.

This comment has been minimized.

Copy link
@ryanofsky

ryanofsky Apr 10, 2019

Contributor

re: #14121 (comment)

In commit "index: Implement lookup methods on block filter index." (b5e8200)

This is the first time results is accessed in this method, and the size has to be set appropriately.

Sorry, you're right. I confused results with the values variable above.

size_t i = static_cast<size_t>(block_index->nHeight - start_height);
if (block_hash == values[i].first) {
results[i] = std::move(values[i].second);
continue;

This comment has been minimized.

Copy link
@ryanofsky

ryanofsky Mar 28, 2019

Contributor

In commit "index: Implement lookup methods on block filter index." (88ecade)

Seems like this could break instead of continue. Would add a comment here if there's a case where this needs to keep iterating.

This comment has been minimized.

Copy link
@jimpo

jimpo Apr 5, 2019

Author Contributor

Hmm? If it was a break, it would only copy one value to the result value rather than all of them.

This comment has been minimized.

Copy link
@ryanofsky

ryanofsky Apr 10, 2019

Contributor

re: #14121 (comment)

In commit "index: Implement lookup methods on block filter index." (b5e8200)

Hmm? If it was a break, it would only copy one value to the result value rather than all of them.

Again this is me thinking results and values where the same variable, so it would be safe to break at the forking point because the earlier blocks would already be filled in. This could maybe be a legitimate optimization, but not worth implementing at this point.

@jimpo jimpo force-pushed the jimpo:bip157-index branch from e6c6654 to c7efb65 Apr 6, 2019

@ryanofsky
Copy link
Contributor

left a comment

Slightly tested ACK c7efb65 (I just rebuilt the index with the updated PR and tested the RPC). Changes since last review: rebase, fixed compile errors in internal commits, new comments, updated error messages, tweaked cache size logic, renamed commit method, renamed constants and globals, fixed whitespace, extra BlockFilterIndex::Init error check.

I think this change is in a good state and could be merged in it's current form. I left more comments but they are minor and you should ignore them if you don't want to deal with them.

I think this change has had almost enough review to be merged. It would benefit from a re-ack by @jamesob, and some more cursory reviews by other contributors to confirm that this only creates a new blockindex, and is disabled by default, and has no effect on the existing txindex or validation or wallet or anything crazy like that.

/// Write the current chain block locator to the DB.
bool WriteBestBlock(const CBlockIndex* block_index);
/// Write the current chain block locator and other index state to the DB.
bool Commit();

This comment has been minimized.

Copy link
@ryanofsky

ryanofsky Apr 10, 2019

Contributor

re: #14121 (comment)

In commit "index: Allow atomic commits of index state to be extended." (4368384)

Current change isn't what I was suggesting and I think is actually worse than the original (but still ok if this is what makes most sense to you). I was suggesting renaming the Commit() that's internal to the base class InternalCommit(), not calling the other Commit() that's shared and overridden externally by subclasses InternalCommit(). Calling the overridden Commit() internal also breaks consistency with the other overridden methods (init, rewind, writeblock).

/// Write the current index state (eg. chain block locator and subclass-specific items) to disk.
///
/// Recommendations for error handling:
/// If called on a successor of the previous committed best block in the index, the index can

This comment has been minimized.

Copy link
@ryanofsky

ryanofsky Apr 10, 2019

Contributor

In commit "index: Allow atomic commits of index state to be extended." (4368384)

Thank you for adding this comment, but I had to read this several times before I figured out that the "index can continue processing" isn't describing something that happens externally but is describing what you are supposed to do when handling errors from this function. I'd suggest phrasing this less passively as:

/// Recommendation for handling errors returned by this function:
///
/// If calling this function fails, and m_best_block_index is a descendant
/// of a block that was previously committed, it is safe to ignore the error
/// because the index will not get corrupted (just needs to catch up from
/// further behind on reboot). If m_best_block_index is not a descendant of
/// the last block committed (due to a chain reorganization), the error can't
/// be ignored and the index should halt until Commit succeeds or it could 
/// end up getting corrupted.
WriteBestBlock(pindex);
m_best_block_index = pindex;
// No need to handle errors in Commit. If it fails, the error will be already be
// logged. The best way to recover is to continue, as index cannot be corrupted by

This comment has been minimized.

Copy link
@ryanofsky

ryanofsky Apr 10, 2019

Contributor

In commit "index: Allow atomic commits of index state to be extended." (4368384)

I don't understand "index cannot be corrupted by a missed commit." Does it mean that if this Commit call fails you expect a future Commit call to fix whatever the problem is? If so, this seems like a property that holds for TxIndex, but not necessarily for BlockFilterIndex since that index is writing external files. Should clarify whatever is meant by this comment.

It would also be really helpful to give a specific example where this behavior would be desirable. Is there a case you are imagining where some commits fail and some succeed and after that everything is fine? I am having trouble trying to conjure up a scenario.

Naively, as a user relying on the index I would expect that if updating the index ever failed, then the index would get disabled and lookups into the index would return RPC errors instead of silently returning bad data or incomplete data. I would probably also want writing to stop after the first error, so I could potentially debug the issue, and so my logs wouldn't fill up with more and more write errors after the first one.

Maybe this implementation makes more sense than what I'm describing and is better for some reason, but whatever the rationale is, it could definitely be stated more clearly.

return false;
}

// The latest filter position gets written in Commit by the call to the BaseIndex::Rewind.

This comment has been minimized.

Copy link
@ryanofsky

ryanofsky Apr 10, 2019

Contributor

re: #14121 (comment)

I don't really like the change in behavior to Commit/Rewind. Commit just does DB/disk persistence, which is why errors can generally be ignored

Up to you but I'd encourage you to take another look. Your comment sounds more like a reaction to the method naming and not the substance of the change. I agree with you naming in my suggested change is not ideal. I actually wanted to call the base Commit() method UpdatePosition() and the overridden Commit() method Flush(), but I reverted back to your names to try to avoid adding unrelated differences. (IMO, all the mentions of committing and atomicity in this PR are misleading because almost nothing about the implementation is atomic. It is true that updates of m_best_block, DB_BEST_BLOCK, and fltr*.dat files are ordered, but operations being ordered is different from them being atomic, and it feels like too much is implied by the current naming even if the code is technically correct.)

In any case, here are the advantages of my change beyond reducing the amount of code:

  1. It updates m_best_block_index in the same place as DB_BEST_BLOCK.
  2. It flattens control flow, getting rid the BlockFilterIndex::Rewind calling BaseIndex::Rewind calling BlockFilterIndex::Commit calling BaseIndex::Commit sequence and the whole call super anti-pattern.

It's sounds crazy to me to cite updating m_best_block_index and DB_BEST_BLOCK in different places instead of the same place as an advantage of the current code. This seems like a footgun and anti-ergonomic decision, like putting the steering wheel on the outside of a car. When you want to turn your car left, do you just want to turn the wheel? Or is it better to stop the car, get out, point the wheel, make the turn, stop again, straighten, then keep going? When you update the position, do you just want to call one method? Or is it better to manually set a member variable and go through a rube-goldberg sequence of virtual method calls?

db_it->Next();
}

results.resize(results_size);

This comment has been minimized.

Copy link
@ryanofsky

ryanofsky Apr 10, 2019

Contributor

re: #14121 (comment)

In commit "index: Implement lookup methods on block filter index." (b5e8200)

This is the first time results is accessed in this method, and the size has to be set appropriately.

Sorry, you're right. I confused results with the values variable above.

size_t i = static_cast<size_t>(block_index->nHeight - start_height);
if (block_hash == values[i].first) {
results[i] = std::move(values[i].second);
continue;

This comment has been minimized.

Copy link
@ryanofsky

ryanofsky Apr 10, 2019

Contributor

re: #14121 (comment)

In commit "index: Implement lookup methods on block filter index." (b5e8200)

Hmm? If it was a break, it would only copy one value to the result value rather than all of them.

Again this is me thinking results and values where the same variable, so it would be safe to break at the forking point because the earlier blocks would already be filled in. This could maybe be a legitimate optimization, but not worth implementing at this point.

@jimpo

This comment has been minimized.

Copy link
Contributor Author

commented Apr 10, 2019

@ryanofsky Let me clarify the intended behavior of Commit, so we can figure out how to properly word it. The design really stems from the optimization where the block index is only flushed to disk infrequently or on shutdown, and the auxiliary indexes mimic this.

During normal operation, when the chain is extended by new blocks, entries are written into the database height index and their corresponding filters are written to flat files. However, the in-database pointers to the latest block in the index and end position of the flat file sequence are not updated. This happens only on the ChainStateFlushed callback, or at the end of the initial catchup sync. There are a few reasons. 1) Until the ChainStateFlushed callback is received, we cannot guarantee that the new block entries added to the index will have corresponding block index entries that have been flushed to disk. This is in part why we store a block locator instead of a single block index (in case the index happens to get ahead of the block index), but it would be best to avoid writing ahead of the block index regardless. 2) BlockConnected can be implemented without taking cs_main, but if it had to generate a block locator to write to DB_BEST_BLOCK, then it would need to take cs_main. So the idea is that BlockConnected advances the in-memory state of the index for sure and writes some data to the database and to disk, but these database/disk writes will not be seen until the new block locator is written. This is why m_best_block is updated separately from DB_BEST_BLOCK. In the event of an unclean crash, we are not guaranteed that the index will resume from the state it was last in -- it may have to catch up from further behind, but we do ensure that it is not corrupted.

Now, with the block filter index, two things change. First, since it has an index of things by height, we need to ensure tip of the index is overwritten to the fork point (and committed to disk) before overwriting any height index entries, hence the Rewind method. Secondly, if we commit the new best block in the index, we also need to ensure the flat files have been flushed to disk, which is why CommitInternal is overridden in the subclass.

So what happens if the call to Commit the latest flat files and DB_BEST_BLOCK fails? It's not good (and it logs an error), but the in-memory and database state of the index are valid even if the database state is lagging behind. Which is why I think the best way to recover is to continue until the next Commit. If the index ever reaches some point where a database or disk write fails during BlockConnected, then a fatal error is raised and bitcoind shuts down.

If you have a way of simplifying the logic so that 1) m_best_block_index is separate (and generally further ahead) than DB_BEST_BLOCK and 2) index state is guaranteed to be committed to the fork point of a reorg before connecting new blocks, then I'm open to suggestions. Thoughts?

@zquestz

This comment has been minimized.

Copy link

commented Apr 10, 2019

@jimpo do you have a list of the remaining tasks for full Neutrino support (BIP 157 and 158) that need to be done once this PR lands? Would love to get an idea of the remaining scope of work. =)

@MarcoFalke

This comment has been minimized.

Copy link
Member

commented Apr 12, 2019

utACK c7efb65

@ryanofsky

This comment has been minimized.

Copy link
Contributor

commented Apr 12, 2019

re: #14121 (review) from me

I think this change has had almost enough review to be merged.

There was a new ACK from Marco since I wrote this, so maybe this is about ready?

re: #14121 (comment) from jimpo

If you have a way of simplifying the logic so that 1) m_best_block_index is separate (and generally further ahead) than DB_BEST_BLOCK and 2) index state is guaranteed to be committed to the fork point of a reorg before connecting new blocks, then I'm open to suggestions. Thoughts?

We're just talking past each other. I already made the simplification in #14121 (comment). Here is a rebased version: 00c3a91 (pr/blockfilt) with some extra renames, since I think neither of us liked the previous names. Here what I see as advantages of this change:

  1. Simpler control flow, no more rewind function calling other rewind function calling commit calling other commit.
  2. No more call super footgun and anti-pattern
  3. No misleading "commit" language and claims about atomicity
  4. Clearly distinct BaseIndex internal methods and hook methods implemented by subclasses.
  5. Slightly less code overall

Unless there's a bug or actual drawback to using this change, I hope you'll consider it. But either way the PR looks good to me. I tested and ACKed it already and will keep reviewing if there are more updates.

@MarcoFalke MarcoFalke merged commit c7efb65 into bitcoin:master Apr 18, 2019

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

MarcoFalke added a commit that referenced this pull request Apr 18, 2019

Merge #14121: Index for BIP 157 block filters
c7efb65 blockfilter: Update BIP 158 test vectors. (Jim Posen)
19308c9 rpc: Add getblockfilter RPC method. (Jim Posen)
ff35105 init: Add CLI option to enable block filter index. (Jim Posen)
accc8b8 index: Access functions for global block filter indexes. (Jim Posen)
2bc90e4 test: Unit test for block filter index reorg handling. (Jim Posen)
6bcf099 test: Unit tests for block index filter. (Jim Posen)
b5e8200 index: Implement lookup methods on block filter index. (Jim Posen)
75a76e3 index: Implement block filter index with write operations. (Jim Posen)
2ad2338 serialize: Serialization support for big-endian 32-bit ints. (Jim Posen)
ba6ff9a blockfilter: Functions to translate filter types to/from names. (Jim Posen)
62b7a4f index: Ensure block locator is not stale after chain reorg. (Jim Posen)
4368384 index: Allow atomic commits of index state to be extended. (Jim Posen)

Pull request description:

  This introduces a new BlockFilterIndex class, which is required for BIP 157 support.

  The index is uses the asynchronous BaseIndex infrastructure driven by the ValidationInterface callbacks. Filters are stored sequentially in flat files and the disk location of each filter is indexed in LevelDB along with the filter hash and header. The index is designed to ensure persistence of filters reorganized out of the main chain to simplify the BIP 157 net implementation.

  Stats (block height = 565500):
  - Syncing the index from scratch takes 45m
  - Total index size is 3.8 GiB

ACKs for commit c7efb6:
  MarcoFalke:
    utACK c7efb65
  ryanofsky:
    Slightly tested ACK c7efb65 (I just rebuilt the index with the updated PR and tested the RPC). Changes since last review: rebase, fixed compile errors in internal commits, new comments, updated error messages, tweaked cache size logic, renamed commit method, renamed constants and globals, fixed whitespace, extra BlockFilterIndex::Init error check.

Tree-SHA512: f8ed7a9b6f76df45933aa5eba92b27b3af83f6df2ccb3728a5c89eec80f654344dc14f055f6f63eb9b3a7649dd8af6553fe14969889e7e2fd2f8461574d18f28
@MarcoFalke
Copy link
Member

left a comment

Just some documentation feedback that I withheld earlier to avoid stalling this pull.

int height;

DBHeightKey() : height(0) {}
DBHeightKey(int height_in) : height(height_in) {}

This comment has been minimized.

Copy link
@MarcoFalke

MarcoFalke Apr 18, 2019

Member

in commit 75a76e3

Should this be explicit?

This comment has been minimized.

Copy link
@jimpo

jimpo Apr 19, 2019

Author Contributor

Yeah, probably a good idea.

CScript coinbase_script_pub_key = GetScriptForDestination(coinbaseKey.GetPubKey().GetID());
std::vector<std::shared_ptr<CBlock>> chainA, chainB;
BOOST_REQUIRE(BuildChain(tip, coinbase_script_pub_key, 10, chainA));
BOOST_REQUIRE(BuildChain(tip, coinbase_script_pub_key, 10, chainB));

This comment has been minimized.

Copy link
@MarcoFalke

MarcoFalke Apr 18, 2019

Member

in commit 2bc90e4:

Should asset that the chains are different in this test to not accidentally reorg to the same chain?

This comment has been minimized.

Copy link
@jimpo

jimpo Apr 19, 2019

Author Contributor

I don't understand.

// if using block pruning, then disallow txindex
if (gArgs.GetArg("-prune", 0)) {
if (gArgs.GetBoolArg("-txindex", DEFAULT_TXINDEX))
return InitError(_("Prune mode is incompatible with -txindex."));
if (!g_enabled_filter_types.empty()) {
return InitError(_("Prune mode is incompatible with -blockfilterindex."));
}

This comment has been minimized.

Copy link
@MarcoFalke

MarcoFalke Apr 18, 2019

Member

in commit ff35105:

Why is it incompatible? Maybe should add a cpp comment to explain.

See also https://github.com/bitcoin/bips/blob/master/bip-0157.mediawiki#node-operation

"Nodes MAY prune block data after generating and storing all filters for a block."

This comment has been minimized.

Copy link
@jimpo

jimpo Apr 19, 2019

Author Contributor

You can turn on the filter index and it will build in the background using the blocks on disk. This wouldn't work in pruned mode. It might be safe though to allow pruning if the block filter index tip is already within chain tip height - MIN_BLOCKS_TO_KEEP. But then we couldn't prune until the block filter index is in sync (it has its own sort of initial sync logic).

Not prohibitively difficult though, just requires a bit more thought.

@@ -118,6 +118,7 @@ static const int64_t MAX_FEE_ESTIMATION_TIP_AGE = 3 * 60 * 60;
static const bool DEFAULT_PERMIT_BAREMULTISIG = true;
static const bool DEFAULT_CHECKPOINTS_ENABLED = true;
static const bool DEFAULT_TXINDEX = false;
static const char* const DEFAULT_BLOCKFILTERINDEX = "0";

This comment has been minimized.

Copy link
@MarcoFalke

MarcoFalke Apr 18, 2019

Member

I believe we make this extern in the header, so that it is not added to each obj file?

This comment has been minimized.

Copy link
@jimpo

jimpo Apr 19, 2019

Author Contributor

Yeah. This applies to all the defaults, right?

This comment has been minimized.

Copy link
@MarcoFalke

MarcoFalke Apr 22, 2019

Member

I think only strings and cstrings and not POD types

@MarcoFalke

This comment has been minimized.

Copy link
Member

commented Apr 18, 2019

@ryanofsky Your proposed changes should probably go in a follow up pull request, as they also refactor txindex, which should be untouched by this pr

MarcoFalke added a commit to MarcoFalke/bitcoin that referenced this pull request Apr 18, 2019

Merge bitcoin#15843: tests: fix outdated include in blockfilter_index…
…_tests

89e8df1 tests: fix outdate include in blockfilter_index_tests (James O'Beirne)

Pull request description:

  Build is currently failing due to bad merge of bitcoin#15788 and bitcoin#14121.

ACKs for commit 89e8df:
  fanquake:
    tACK 89e8df1

Tree-SHA512: d3fea861f80d660b4a2827ca7241237311b68de4175d3db938a9a1d538e1325822410c98d84ba0734208af8163fbcc42cf2732788311ea22f3834c95eeb330b8

@MarcoFalke MarcoFalke removed this from Blockers in High-priority for review Apr 18, 2019

@jimpo jimpo deleted the jimpo:bip157-index branch Apr 18, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.