
Index for BIP 157 block filters #14121

Merged: 12 commits, Apr 18, 2019

index: Implement block filter index with write operations.

jimpo committed Aug 28, 2018
commit 75a76e36199eba228856d59318cb54ca64ca6b57
src/Makefile.am
@@ -133,6 +133,7 @@ BITCOIN_CORE_H = \
httprpc.h \
httpserver.h \
index/base.h \
index/blockfilterindex.h \
index/txindex.h \
indirectmap.h \
init.h \
@@ -253,6 +254,7 @@ libbitcoin_server_a_SOURCES = \
httprpc.cpp \
httpserver.cpp \
index/base.cpp \
index/blockfilterindex.cpp \
index/txindex.cpp \
interfaces/chain.cpp \
interfaces/handler.cpp \
src/index/blockfilterindex.cpp
@@ -0,0 +1,282 @@
// Copyright (c) 2018 The Bitcoin Core developers
// Distributed under the MIT software license, see the accompanying
// file COPYING or http://www.opensource.org/licenses/mit-license.php.

#include <map>

#include <index/blockfilterindex.h>
#include <util/system.h>
#include <validation.h>

/* The index database stores three items for each block: the disk location of the encoded filter,
 * its dSHA256 hash, and the header. Those belonging to blocks on the active chain are indexed by
 * height, and those belonging to blocks that have been reorganized out of the active chain are
 * indexed by block hash. This ensures that filter data for any block that becomes part of the

ryanofsky (Contributor), Mar 25, 2019:

In commit "index: Implement block filter index with write operations." (5964d2f)

It'd be good to mention that the values, and not just the keys, are different in these two cases (height keys appear to have values prefixed by the block hash).

 * active chain can always be retrieved, alleviating timing concerns.
 *
 * The filters themselves are stored in flat files and referenced by the LevelDB entries. This
 * minimizes the amount of data written to LevelDB and keeps the database values constant size. The
 * disk location of the next block filter to be written (represented as a FlatFilePos) is stored
 * under the DB_FILTER_POS key.
 *
 * Keys for the height index have the type [DB_BLOCK_HEIGHT, uint32 (BE)]. The height is represented
 * as big-endian so that sequential reads of filters by height are fast.
 * Keys for the hash index have the type [DB_BLOCK_HASH, uint256].
 */
constexpr char DB_BLOCK_HASH = 's';
constexpr char DB_BLOCK_HEIGHT = 't';
constexpr char DB_FILTER_POS = 'P';
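To make the layout described above concrete: height-index entries store a value of std::pair<uint256, DBVal> (the block hash followed by the DBVal), while hash-index entries store a bare DBVal, as ryanofsky notes. The following standalone sketch (not part of the diff) also shows why big-endian heights keep LevelDB's byte-wise key order equal to numeric height order:

```cpp
// Standalone sketch, not part of this commit: big-endian heights make
// LevelDB's lexicographic key order match numeric height order.
#include <cstdint>
#include <cstring>
#include <iostream>

// Mirrors the [DB_BLOCK_HEIGHT, uint32 (BE)] key layout described above.
static void EncodeHeightKey(uint32_t height, unsigned char out[5])
{
    out[0] = 't'; // DB_BLOCK_HEIGHT
    out[1] = (height >> 24) & 0xff;
    out[2] = (height >> 16) & 0xff;
    out[3] = (height >> 8) & 0xff;
    out[4] = height & 0xff;
}

int main()
{
    unsigned char key255[5], key256[5];
    EncodeHeightKey(255, key255);
    EncodeHeightKey(256, key256);
    // memcmp is the same byte-wise comparison LevelDB's default comparator
    // performs, so the key for height 255 sorts before the key for height 256.
    std::cout << (std::memcmp(key255, key256, 5) < 0) << "\n"; // prints 1
}
```

With a little-endian encoding the first height byte of 255 would be 0xFF and the comparison would come out reversed, which is exactly what the comment about fast sequential reads is guarding against.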
This conversation was marked as resolved by jimpo

ryanofsky (Contributor), Mar 25, 2019:

In commit "index: Implement block filter index with write operations." (5964d2f)

Can you mention the DB_FILTER_POS key in the comment above? I also think DB_NEXT_FILTER_POS would be a slightly more descriptive and consistent name.

jimpo (Author), Apr 5, 2019:

Added a comment.


constexpr unsigned int MAX_FLTR_FILE_SIZE = 0x1000000; // 16 MiB
/** The pre-allocation chunk size for fltr?????.dat files */

ryanofsky (Contributor), Mar 25, 2019:

In commit "index: Implement block filter index with write operations." (5964d2f)

Obviously bikeshedding here, but "filt" seems like a more obvious abbreviation for filter than "fltr".

constexpr unsigned int FLTR_FILE_CHUNK_SIZE = 0x100000; // 1 MiB

namespace {

struct DBVal {
    uint256 hash;
    uint256 header;
    FlatFilePos pos;

    ADD_SERIALIZE_METHODS;

    template <typename Stream, typename Operation>
    inline void SerializationOp(Stream& s, Operation ser_action) {
        READWRITE(hash);
        READWRITE(header);
        READWRITE(pos);
    }
};

struct DBHeightKey {
    int height;

    DBHeightKey() : height(0) {}
    DBHeightKey(int height_in) : height(height_in) {}

MarcoFalke (Member), Apr 18, 2019:

In commit 75a76e3:

Should this be explicit?

jimpo (Author), Apr 19, 2019:

Yeah, probably a good idea.
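A sketch of what the agreed-upon change might look like (hypothetical; the follow-up commit is not part of this diff):

```cpp
// Sketch only, not part of this commit: marking the single-argument
// constructor explicit stops a bare int from implicitly converting to a
// DBHeightKey wherever one is expected.
struct DBHeightKey {
    int height;

    DBHeightKey() : height(0) {}
    explicit DBHeightKey(int height_in) : height(height_in) {}

    // Serialize()/Unserialize() unchanged from the version below.
};
```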


    template<typename Stream>
    void Serialize(Stream& s) const
    {
        ser_writedata8(s, DB_BLOCK_HEIGHT);
        ser_writedata32be(s, height);
    }

    template<typename Stream>
    void Unserialize(Stream& s)
    {
        char prefix = ser_readdata8(s);
        if (prefix != DB_BLOCK_HEIGHT) {
            throw std::ios_base::failure("Invalid format for block filter index DB height key");
        }
        height = ser_readdata32be(s);
    }
};

struct DBHashKey {
    uint256 hash;

    DBHashKey(const uint256& hash_in) : hash(hash_in) {}

    ADD_SERIALIZE_METHODS;

    template <typename Stream, typename Operation>
    inline void SerializationOp(Stream& s, Operation ser_action) {
        char prefix = DB_BLOCK_HASH;
        READWRITE(prefix);
        if (prefix != DB_BLOCK_HASH) {
            throw std::ios_base::failure("Invalid format for block filter index DB hash key");
        }

        READWRITE(hash);
    }
};

}; // namespace

BlockFilterIndex::BlockFilterIndex(BlockFilterType filter_type,
                                   size_t n_cache_size, bool f_memory, bool f_wipe)
    : m_filter_type(filter_type)
{
    const std::string& filter_name = BlockFilterTypeName(filter_type);
    if (filter_name.empty()) throw std::invalid_argument("unknown filter_type");

    fs::path path = GetDataDir() / "indexes" / "blockfilter" / filter_name;
    fs::create_directories(path);

    m_name = filter_name + " block filter index";
    m_db = MakeUnique<BaseIndex::DB>(path / "db", n_cache_size, f_memory, f_wipe);
    m_filter_fileseq = MakeUnique<FlatFileSeq>(std::move(path), "fltr", FLTR_FILE_CHUNK_SIZE);
}

bool BlockFilterIndex::Init()
{
    if (!m_db->Read(DB_FILTER_POS, m_next_filter_pos)) {

ryanofsky (Contributor), Mar 25, 2019:

In commit "index: Implement block filter index with write operations." (5964d2f)

Is it possible to distinguish between the entry not existing here and a failure to deserialize? It might be a little nicer to be able to return an error if there's something weird here, instead of resetting the next write to position 0.

jimpo (Author), Apr 5, 2019:

Very good point!

        // Check that the cause of the read failure is that the key does not exist. Any other errors
        // indicate database corruption or a disk failure, and starting the index would cause
        // further corruption.
        if (m_db->Exists(DB_FILTER_POS)) {
            return error("%s: Cannot read current %s state; index may be corrupted",
                         __func__, GetName());
        }

        // If the DB_FILTER_POS is not set, then initialize to the first location.
        m_next_filter_pos.nFile = 0;
        m_next_filter_pos.nPos = 0;
    }
    return BaseIndex::Init();
}

bool BlockFilterIndex::CommitInternal(CDBBatch& batch)
{
    const FlatFilePos& pos = m_next_filter_pos;

    // Flush current filter file to disk.
    CAutoFile file(m_filter_fileseq->Open(pos), SER_DISK, CLIENT_VERSION);
    if (file.IsNull()) {
        return error("%s: Failed to open filter file %d", __func__, pos.nFile);
    }
    if (!FileCommit(file.Get())) {
        return error("%s: Failed to commit filter file %d", __func__, pos.nFile);
    }

    batch.Write(DB_FILTER_POS, pos);
    return BaseIndex::CommitInternal(batch);
}

size_t BlockFilterIndex::WriteFilterToDisk(FlatFilePos& pos, const BlockFilter& filter)
{
    assert(filter.GetFilterType() == GetFilterType());

    size_t data_size =
        GetSerializeSize(filter.GetBlockHash(), CLIENT_VERSION) +
        GetSerializeSize(filter.GetEncodedFilter(), CLIENT_VERSION);

    // If writing the filter would overflow the file, flush and move to the next one.

jamesob (Member), Mar 20, 2019:

Conceptual nit: this logic seems like it should live in FlatFileSeq since it seems similar in nature to FlatFileSeq::Allocate().

jimpo (Author), Mar 22, 2019:

I agree, but the logic for block/undo files is weird because they are synchronized (e.g. the undo file numbers increment when the block file numbers do, not at a size limit).

This would be a good thing to look at refactoring after this is merged, maybe.
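To illustrate jamesob's suggestion, here is a rough sketch of how the rollover step might be factored out alongside FlatFileSeq::Allocate(). The helper name and free-function form are hypothetical; TruncateFile and FileCommit are the existing util/system.h helpers that the rollover code below already uses:

```cpp
// Hypothetical sketch only: rolling over to the next file in a FlatFileSeq
// when appending data_size bytes would exceed max_file_size. In a real
// refactor this would more likely become a FlatFileSeq member function.
#include <cstdio>
#include <flatfile.h>
#include <util/system.h>

static bool RollFileIfNeeded(FlatFileSeq& fileseq, FlatFilePos& pos,
                             size_t data_size, size_t max_file_size)
{
    if (pos.nPos + data_size <= max_file_size) return true;

    // Trim unused pre-allocated space and flush the finished file to disk.
    FILE* last_file = fileseq.Open(pos);
    if (!last_file) return false;
    bool ok = TruncateFile(last_file, pos.nPos) && FileCommit(last_file);
    fclose(last_file);
    if (!ok) return false;

    // Continue writing at the start of the next file in the sequence.
    pos.nFile++;
    pos.nPos = 0;
    return true;
}
```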

    if (pos.nPos + data_size > MAX_FLTR_FILE_SIZE) {
        CAutoFile last_file(m_filter_fileseq->Open(pos), SER_DISK, CLIENT_VERSION);
        if (last_file.IsNull()) {
            LogPrintf("%s: Failed to open filter file %d\n", __func__, pos.nFile);
            return 0;
        }
        if (!TruncateFile(last_file.Get(), pos.nPos)) {
            LogPrintf("%s: Failed to truncate filter file %d\n", __func__, pos.nFile);
            return 0;
        }
        if (!FileCommit(last_file.Get())) {
            LogPrintf("%s: Failed to commit filter file %d\n", __func__, pos.nFile);
            return 0;
        }

        pos.nFile++;
        pos.nPos = 0;
    }

    // Pre-allocate sufficient space for filter data.
    bool out_of_space;
    m_filter_fileseq->Allocate(pos, data_size, out_of_space);
    if (out_of_space) {
        LogPrintf("%s: out of disk space\n", __func__);
        return 0;
    }

    CAutoFile fileout(m_filter_fileseq->Open(pos), SER_DISK, CLIENT_VERSION);
    if (fileout.IsNull()) {
        LogPrintf("%s: Failed to open filter file %d\n", __func__, pos.nFile);
        return 0;
    }

    fileout << filter.GetBlockHash() << filter.GetEncodedFilter();
    return data_size;
}

bool BlockFilterIndex::WriteBlock(const CBlock& block, const CBlockIndex* pindex)
{
    CBlockUndo block_undo;
    uint256 prev_header;

    if (pindex->nHeight > 0) {
        if (!UndoReadFromDisk(block_undo, pindex)) {
            return false;
        }

        std::pair<uint256, DBVal> read_out;
        if (!m_db->Read(DBHeightKey(pindex->nHeight - 1), read_out)) {
            return false;
        }

        uint256 expected_block_hash = pindex->pprev->GetBlockHash();
        if (read_out.first != expected_block_hash) {
            return error("%s: previous block header belongs to unexpected block %s; expected %s",
                         __func__, read_out.first.ToString(), expected_block_hash.ToString());
        }

        prev_header = read_out.second.header;
    }

    BlockFilter filter(m_filter_type, block, block_undo);

    size_t bytes_written = WriteFilterToDisk(m_next_filter_pos, filter);
    if (bytes_written == 0) return false;

    std::pair<uint256, DBVal> value;
    value.first = pindex->GetBlockHash();
    value.second.hash = filter.GetHash();
    value.second.header = filter.ComputeHeader(prev_header);
    value.second.pos = m_next_filter_pos;

    if (!m_db->Write(DBHeightKey(pindex->nHeight), value)) {
        return false;
    }

    m_next_filter_pos.nPos += bytes_written;

ryanofsky (Contributor), Mar 28, 2019:

In commit "index: Implement lookup methods on block filter index." (88ecade)

This change should be moved to the previous commit (5964d2f), which doesn't compile without it.

    return true;
}

static bool CopyHeightIndexToHashIndex(CDBIterator& db_it, CDBBatch& batch,
                                       const std::string& index_name,
                                       int start_height, int stop_height)
{
    DBHeightKey key(start_height);
    db_it.Seek(key);

    for (int height = start_height; height <= stop_height; ++height) {
        if (!db_it.GetKey(key) || key.height != height) {
            return error("%s: unexpected key in %s: expected (%c, %d)",
                         __func__, index_name, DB_BLOCK_HEIGHT, height);
        }

        std::pair<uint256, DBVal> value;
        if (!db_it.GetValue(value)) {
            return error("%s: unable to read value in %s at key (%c, %d)",
                         __func__, index_name, DB_BLOCK_HEIGHT, height);
        }

        batch.Write(DBHashKey(value.first), std::move(value.second));

        db_it.Next();
    }
    return true;
}

bool BlockFilterIndex::Rewind(const CBlockIndex* current_tip, const CBlockIndex* new_tip)
{
    assert(current_tip->GetAncestor(new_tip->nHeight) == new_tip);

    CDBBatch batch(*m_db);
    std::unique_ptr<CDBIterator> db_it(m_db->NewIterator());

    // During a reorg, we need to copy all filters for blocks that are getting disconnected from the
    // height index to the hash index so we can still find them when the height index entries are
    // overwritten.
    if (!CopyHeightIndexToHashIndex(*db_it, batch, m_name, new_tip->nHeight, current_tip->nHeight)) {
        return false;
    }

    // The latest filter position gets written in Commit by the call to the BaseIndex::Rewind.

ryanofsky (Contributor), Mar 28, 2019:

In commit "index: Implement block filter index with write operations." (5964d2f)

This comment, and the general control flow here where BlockFilterIndex::Rewind calls BaseIndex::Rewind, which calls BlockFilterIndex::Commit, which calls BaseIndex::Commit, are confusing to follow (made worse by there being two BaseIndex::Commit methods). I flattened the control flow in a (+42/-49) simplification in 8bb65ce (pr/blockfilt), and I think it'd be good to squash these changes into the PR, especially since the early commit that introduces Commit()/Rewind() doesn't compile right now anyway.

jimpo (Author), Apr 6, 2019:

I looked over your proposed changes. I renamed one of the commit methods to CommitInternal as suggested (though the opposite one from yours -- I think Commit calling CommitInternal makes more sense).

However, I don't really like the change in behavior to Commit/Rewind. Commit just does DB/disk persistence, which is why errors can generally be ignored, so giving it responsibility for writing m_best_block_index makes everything more confusing. Rewind, on the other hand, is a special case where ignoring errors in Commit would be unsafe. I added more comments on the methods to explain this.

ryanofsky (Contributor), Apr 10, 2019:

re: #14121 (comment)

    I don't really like the change in behavior to Commit/Rewind. Commit just does DB/disk persistence, which is why errors can generally be ignored

Up to you, but I'd encourage you to take another look. Your comment sounds more like a reaction to the method naming than to the substance of the change. I agree with you that the naming in my suggested change is not ideal. I actually wanted to call the base Commit() method UpdatePosition() and the overridden Commit() method Flush(), but I reverted back to your names to try to avoid adding unrelated differences. (IMO, all the mentions of committing and atomicity in this PR are misleading because almost nothing about the implementation is atomic. It is true that updates of m_best_block, DB_BEST_BLOCK, and fltr*.dat files are ordered, but operations being ordered is different from them being atomic, and it feels like too much is implied by the current naming even if the code is technically correct.)

In any case, here are the advantages of my change beyond reducing the amount of code:

  1. It updates m_best_block_index in the same place as DB_BEST_BLOCK.
  2. It flattens control flow, getting rid of the BlockFilterIndex::Rewind calling BaseIndex::Rewind calling BlockFilterIndex::Commit calling BaseIndex::Commit sequence and the whole call-super anti-pattern.

It sounds crazy to me to cite updating m_best_block_index and DB_BEST_BLOCK in different places instead of the same place as an advantage of the current code. This seems like a footgun and an anti-ergonomic decision, like putting the steering wheel on the outside of a car. When you want to turn your car left, do you just want to turn the wheel? Or is it better to stop the car, get out, point the wheel, make the turn, stop again, straighten, then keep going? When you update the position, do you just want to call one method? Or is it better to manually set a member variable and go through a Rube Goldberg sequence of virtual method calls?

    // But since this creates new references to the filter, the position should get updated here
    // atomically as well in case Commit fails.
    batch.Write(DB_FILTER_POS, m_next_filter_pos);
    if (!m_db->WriteBatch(batch)) return false;

    return BaseIndex::Rewind(current_tip, new_tip);
}
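For readers following the Commit/Rewind discussion above, here is a generic, self-contained illustration of the two shapes being debated; the class and method names are placeholders, not the PR's actual code:

```cpp
// Illustration only; placeholder names, not the PR's classes.

// "Call super" shape: every override must remember to chain to the base, and
// the base calls back into a virtual Commit(), producing the
// Rewind -> base Rewind -> Commit -> base Commit chain criticized above.
struct IndexBase {
    virtual ~IndexBase() = default;
    virtual bool Commit() { /* write best-block record */ return true; }
    virtual bool Rewind() { /* update in-memory best block */ return Commit(); }
};
struct FilterIndex : IndexBase {
    bool Commit() override { /* flush filter file */ return IndexBase::Commit(); }
    bool Rewind() override { /* copy height index to hash index */ return IndexBase::Rewind(); }
};

// Flattened shape: the base owns one non-virtual Rewind() that updates the
// in-memory and on-disk best block together, and calls a subclass hook for
// index-specific work at a fixed point.
struct FlatIndexBase {
    virtual ~FlatIndexBase() = default;
    bool Rewind()
    {
        if (!CustomRewind()) return false;
        // update in-memory best block and its DB record here, in one place
        return true;
    }
protected:
    virtual bool CustomRewind() { return true; }
};
```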
src/index/blockfilterindex.h
@@ -0,0 +1,53 @@
// Copyright (c) 2018 The Bitcoin Core developers
// Distributed under the MIT software license, see the accompanying
// file COPYING or http://www.opensource.org/licenses/mit-license.php.

#ifndef BITCOIN_INDEX_BLOCKFILTERINDEX_H
#define BITCOIN_INDEX_BLOCKFILTERINDEX_H

#include <blockfilter.h>
#include <chain.h>
#include <flatfile.h>
#include <index/base.h>

/**
 * BlockFilterIndex is used to store and retrieve block filters, hashes, and headers for a range of
 * blocks by height. An index is constructed for each supported filter type with its own database
 * (ie. filter data for different types are stored in separate databases).
 *
 * This index is used to serve BIP 157 net requests.
 */
class BlockFilterIndex final : public BaseIndex
{
private:
    BlockFilterType m_filter_type;
    std::string m_name;
    std::unique_ptr<BaseIndex::DB> m_db;

    FlatFilePos m_next_filter_pos;
    std::unique_ptr<FlatFileSeq> m_filter_fileseq;

    size_t WriteFilterToDisk(FlatFilePos& pos, const BlockFilter& filter);

protected:
    bool Init() override;

    bool CommitInternal(CDBBatch& batch) override;

    bool WriteBlock(const CBlock& block, const CBlockIndex* pindex) override;

    bool Rewind(const CBlockIndex* current_tip, const CBlockIndex* new_tip) override;

    BaseIndex::DB& GetDB() const override { return *m_db; }

    const char* GetName() const override { return m_name.c_str(); }

public:
    /** Constructs the index, which becomes available to be queried. */
    explicit BlockFilterIndex(BlockFilterType filter_type,

practicalswift (Member), Dec 28, 2018:

Isn't explicit redundant here?

Empact (Member), Mar 10, 2019:

Not as of C++11, because of list initialization:
https://en.cppreference.com/w/cpp/language/explicit

                              size_t n_cache_size, bool f_memory = false, bool f_wipe = false);

    BlockFilterType GetFilterType() const { return m_filter_type; }
};

#endif // BITCOIN_INDEX_BLOCKFILTERINDEX_H
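Tying together the class comment and the explicit discussion above, here is a hypothetical usage sketch (not part of this commit; the global pointer, function name, and cache size are invented, while MakeUnique, BlockFilterType::BASIC, and the inherited BaseIndex::Start() exist in the codebase):

```cpp
// Hypothetical usage sketch, not part of this commit.
#include <index/blockfilterindex.h>
#include <util/memory.h>

static std::unique_ptr<BlockFilterIndex> g_basic_filter_index;

void StartBasicFilterIndex(size_t cache_size_bytes)
{
    // Direct initialization works with the explicit constructor...
    g_basic_filter_index = MakeUnique<BlockFilterIndex>(BlockFilterType::BASIC, cache_size_bytes);
    g_basic_filter_index->Start(); // BaseIndex::Start() launches the background sync thread

    // ...whereas copy-list-initialization like
    //     BlockFilterIndex index = {BlockFilterType::BASIC, cache_size_bytes};
    // only compiles if the constructor is NOT explicit, which is why the
    // keyword is not redundant since C++11 (Empact's point above).
}
```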