Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor block file logic #15118

Merged
merged 9 commits into from Mar 2, 2019

Conversation

@jimpo
Copy link
Contributor

jimpo commented Jan 7, 2019

This cleans up and refactors block file helpers so that they may be used by the block filter indexer. Per design discussion about storing BIP 157 block filters, it has been suggested that they are stored in the same way as block and undo data. This refactor is sufficient to simplify file operations for this use case, though in the future perhaps more pruning-related logic ought to be moved into the new classes.

The basic abstraction is a FlatFileSeq which manages access to a sequence of numbered files into which raw data is written.

@jimpo jimpo changed the title Flatfile Refactor block file logic Jan 7, 2019

@fanquake fanquake added the Refactoring label Jan 7, 2019

src/init.cpp Outdated
@@ -1631,8 +1631,8 @@ bool AppInitMain(InitInterfaces& interfaces)

// ********************************************************* Step 11: import blocks

if (!CheckDiskSpace() && !CheckDiskSpace(0, true))
return false;
if (!CheckDiskSpace(GetDataDir()) && !CheckDiskSpace(GetBlocksDir()))

This comment has been minimized.

@Empact

Empact Jan 7, 2019

Member

Should this not be ||? /cc @jonasschnelli 386a6b6

This comment has been minimized.

@jimpo

jimpo Jan 7, 2019

Author Contributor

Yeah, seems like it to me.

This comment has been minimized.

@Empact
Show resolved Hide resolved src/util/system.cpp Outdated

@jimpo jimpo force-pushed the jimpo:flatfile branch Jan 7, 2019

Show resolved Hide resolved src/flatfile.h Outdated
src/flatfile.h Outdated
*
* @param[in] pos The first unwritten position in the file to be flushed.
* @param[in] finalize True if no more data will be written to this file.
* @return True on success, false on failure.

This comment has been minimized.

@Empact

Empact Jan 7, 2019

Member

nit: true

This comment has been minimized.

@kallewoof

kallewoof Jan 30, 2019

Member

Start of sentence True is fine IMO.

Show resolved Hide resolved src/validation.cpp
Show resolved Hide resolved src/flatfile.h
@Empact

This comment has been minimized.

Copy link
Member

Empact commented Jan 7, 2019

Concept ACK hooray objects!

@promag
Copy link
Member

promag left a comment

Concept ACK.

Code review up to ad59122.

Show resolved Hide resolved src/validation.cpp Outdated
src/validation.cpp Outdated
@@ -3024,7 +3028,7 @@ static bool FindBlockPos(CDiskBlockPos &pos, unsigned int nAddSize, unsigned int
}
}
else

This comment has been minimized.

@promag

promag Jan 7, 2019

Member

Commit 2af8b71

Care add {}?

This comment has been minimized.

@jimpo

jimpo Jan 7, 2019

Author Contributor

I added the braces after the code is moved to FlatFileSeq::Allocate.

src/flatfile.cpp Outdated
#include <tinyformat.h>

FlatFileSeq::FlatFileSeq(fs::path dir, const char* prefix, size_t chunk_size)
: m_dir(std::move(dir)), m_prefix(prefix), m_chunk_size(chunk_size)

This comment has been minimized.

@promag

promag Jan 7, 2019

Member

Commit 88cf780

nit, initialize one per line?

This comment has been minimized.

@jimpo

jimpo Jan 7, 2019

Author Contributor

I don't see this in the style guide?

This comment has been minimized.

@promag

promag Jan 7, 2019

Member

You also don't see to declare class members in multiple lines, but fair enough.

This comment has been minimized.

@kallewoof

kallewoof Jan 30, 2019

Member

+1 on one per line as it makes diffs prettier when adding/removing stuff, but I'm fine as it is too, since it's only 3 vars.

Show resolved Hide resolved src/flatfile.h Outdated
src/flatfile.h Outdated
#ifndef BITCOIN_FLATFILE_H
#define BITCOIN_FLATFILE_H

#include <chain.h>

This comment has been minimized.

@promag

promag Jan 7, 2019

Member

Commit 88cf780

Prefer forward declaration of CDiskBlockPos?

This comment has been minimized.

@jimpo

jimpo Jan 7, 2019

Author Contributor

Good point. Though CDiskBlockPos is moved into this header later on, so I don't think it's necessary to change in the earlier commit.

Show resolved Hide resolved src/validation.cpp
src/flatfile.cpp Outdated
@@ -13,3 +14,26 @@ fs::path FlatFileSeq::FileName(const CDiskBlockPos& pos) const
{
return m_dir / strprintf("%s%05u.dat", m_prefix, pos.nFile);
}

FILE* FlatFileSeq::Open(const CDiskBlockPos& pos, bool fReadOnly)

This comment has been minimized.

@promag

promag Jan 7, 2019

Member

Commit ad59122

Just asking for a follow up, could this be changed to std::fstream?

This comment has been minimized.

@jimpo

jimpo Jan 7, 2019

Author Contributor

Maybe... but a lot of file utilities like FileCommit, AllocateFileRange, etc would have to be modified as well. I think the better approach would be to return a CAutoFile and define the move constructor/assignment operator properly. And probably move it to a different module, like in util.

@@ -310,6 +311,8 @@ static void FindFilesToPruneManual(std::set<int>& setFilesToPrune, int nManualPr
static void FindFilesToPrune(std::set<int>& setFilesToPrune, uint64_t nPruneAfterHeight);
bool CheckInputs(const CTransaction& tx, CValidationState &state, const CCoinsViewCache &inputs, bool fScriptChecks, unsigned int flags, bool cacheSigStore, bool cacheFullScriptStore, PrecomputedTransactionData& txdata, std::vector<CScriptCheck> *pvChecks = nullptr);
static FILE* OpenUndoFile(const CDiskBlockPos &pos, bool fReadOnly = false);
static FlatFileSeq BlockFileSeq();
static FlatFileSeq UndoFileSeq();

This comment has been minimized.

@jamesob

jamesob Jan 8, 2019

Member

Any reason not to just have these on-hand as global objects instead of having to reconstruct an object for each use through these functions? Obviously there isn't much of a performance concern given these calls happen before disk IO, but I'm just curious.

This comment has been minimized.

@jimpo

jimpo Jan 8, 2019

Author Contributor

They are constructed with GetDataDir/GetBlocksDir which are not static, so the initialization logic would get a bit tricky. Could be OK with lazy evaluation or something, but didn't seem worth it. Open to suggestions.

This comment has been minimized.

@jamesob

jamesob Jan 8, 2019

Member

Ah, okay. Like I said, no big deal AFAICT.

Show resolved Hide resolved src/validation.cpp Outdated
@jamesob
Copy link
Member

jamesob left a comment

All commits so far reviewed (up to 697d4c3fc). Great change aside from the bug I've noted above. This does a nice job of cleaning up and deduplicating flatfile data access code, and adds some test coverage. When taking into account the use for BIP157 filter storage (forthcoming), this is a very valuable patch.

"payments to be sent directly from one party to another without going "
"through a financial institution.");
std::string line2("Digital signatures provide part of the solution, but the main benefits are "
"lost if a trusted third party is still required to prevent double-spending.");

This comment has been minimized.

@jamesob

jamesob Jan 8, 2019

Member

Cute.

@jimpo jimpo force-pushed the jimpo:flatfile branch 2 times, most recently to fbbf962 Jan 8, 2019

Show resolved Hide resolved src/flatfile.cpp Outdated
@DrahtBot

This comment has been minimized.

Copy link
Contributor

DrahtBot commented Jan 9, 2019

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #15340 (gui: Introduce bilingual GUI error messages by hebasto)
  • #13868 (Remove unused fScriptChecks parameter from CheckInputs by Empact)
  • #13746 (-masterdatadir for datadir bootstrapping by kallewoof)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@laanwj laanwj added this to Blockers in High-priority for review Jan 24, 2019

@jimpo jimpo force-pushed the jimpo:flatfile branch from fbbf962 to 9b2d9ee Jan 24, 2019

@jamesob
Copy link
Member

jamesob left a comment

tACK 9b2d9ee, though the last two commits are in need of fixup.

I read over the code once more since the last rebase and feedback commits. To test, I ran a reindex using this branch on a datadir with a preexisting chain of height 522,000 that had been built without these changes. The reindex completed successfully. I started a recent version of master (12b3010) and saw that block verification succeeded and startup ran as normal.

I'll reACK after squash.

FlatFilePos(int nFileIn, unsigned int nPosIn) {
nFile = nFileIn;
nPos = nPosIn;
}

This comment has been minimized.

@jamesob

jamesob Jan 25, 2019

Member

I know this is kind of a move, but worth using an initializer list?

@kallewoof
Copy link
Member

kallewoof left a comment

utACK with some nits

@@ -0,0 +1,91 @@
// Copyright (c) 2009-2010 Satoshi Nakamoto
// Copyright (c) 2009-2019 The Bitcoin Core developers

This comment has been minimized.

@kallewoof

kallewoof Jan 30, 2019

Member

Copyright looked off to me initially, but I guess if you move code, the copyright stays the same. (I was expecting "2019" only).

src/flatfile.cpp Outdated
#include <tinyformat.h>

FlatFileSeq::FlatFileSeq(fs::path dir, const char* prefix, size_t chunk_size)
: m_dir(std::move(dir)), m_prefix(prefix), m_chunk_size(chunk_size)

This comment has been minimized.

@kallewoof

kallewoof Jan 30, 2019

Member

+1 on one per line as it makes diffs prettier when adding/removing stuff, but I'm fine as it is too, since it's only 3 vars.

src/flatfile.h Outdated
*
* @param[in] pos The first unwritten position in the file to be flushed.
* @param[in] finalize True if no more data will be written to this file.
* @return True on success, false on failure.

This comment has been minimized.

@kallewoof

kallewoof Jan 30, 2019

Member

Start of sentence True is fine IMO.

FILE* FlatFileSeq::Open(const FlatFilePos& pos, bool read_only)
{
if (pos.IsNull())
return nullptr;

This comment has been minimized.

@kallewoof

kallewoof Jan 30, 2019

Member

This is a move I believe, so feel free to ignore: add {} or remove newline. (Also applies to code 4 lines down.)

This comment has been minimized.

@jimpo

jimpo Feb 1, 2019

Author Contributor

Yeah, it's a move. Would rather not change.

return nullptr;
}
if (pos.nPos) {
if (fseek(file, pos.nPos, SEEK_SET)) {

This comment has been minimized.

@kallewoof

kallewoof Jan 30, 2019

Member

Can do single if here.

if (pos.nPos && fseek(file, pos.nPos, SEEK_SET)) {
@@ -71,6 +71,7 @@ void AllocateFileRange(FILE *file, unsigned int offset, unsigned int length);
bool RenameOver(fs::path src, fs::path dest);
bool LockDirectory(const fs::path& directory, const std::string lockfile_name, bool probe_only=false);
bool DirIsWritable(const fs::path& directory);
bool CheckDiskSpace(const fs::path& dir, uint64_t nAdditionalBytes = 0);

This comment has been minimized.

@kallewoof

kallewoof Jan 30, 2019

Member

Nit/style: naming convention, additional_bytes?

This comment has been minimized.

@jimpo

jimpo Feb 1, 2019

Author Contributor

This is just moving code out of validation.{h,cpp}. Would rather not also rename vars.


BOOST_CHECK_EQUAL(seq.Allocate(FlatFilePos(0, 99), 2, out_of_space), 101);
BOOST_CHECK_EQUAL(fs::file_size(seq.FileName(FlatFilePos(0, 99))), 200);
BOOST_CHECK(!out_of_space);

This comment has been minimized.

@kallewoof

kallewoof Jan 30, 2019

Member

Would adding

    uint64_t too_big = 1024 * 1024 + fs::space(data_dir).available;
    BOOST_CHECK_EQUAL(seq.Allocate(FlatFilePos(0, 0), too_big, out_of_space), 0);
    BOOST_CHECK_EQUAL(fs::file_size(seq.FileName(FlatFilePos(0, 0))), 200);
    BOOST_CHECK(out_of_space);

be overkill?

This comment has been minimized.

@jimpo

jimpo Feb 1, 2019

Author Contributor

Good idea.

This comment has been minimized.

@jimpo

jimpo Feb 1, 2019

Author Contributor

I couldn't get this test passing on Travis... tried setting too_big to a petabyte as well, and there was still some issue.

This comment has been minimized.

@kallewoof

kallewoof Feb 14, 2019

Member

The reason is that size_t is 32 bit (edit: on certain systems) and there is more than 2^32 bytes available on the system, resulting in an overflow. The Allocate method should probably take a uint64_t to avoid this.

{
constexpr uint64_t nMinDiskSpace = 52428800;
uint64_t nFreeBytesAvailable = fs::space(dir).available;
return nFreeBytesAvailable >= nMinDiskSpace + nAdditionalBytes;

This comment has been minimized.

@kallewoof

kallewoof Jan 30, 2019

Member

Nit/style: var_names

This comment has been minimized.

@jimpo

jimpo Feb 1, 2019

Author Contributor

This is just moving code out of validation.{h,cpp}. Would rather not also rename vars.

return AbortNode("Disk space is low!", _("Error: Disk space is low!"));
}
if (bytes_allocated != 0 && fPruneMode) {
fCheckForPruning = true;

This comment has been minimized.

@kallewoof

kallewoof Jan 30, 2019

Member

Can also do

fCheckForPruning |= bytes_allocated != 0 && fPruneMode;

and remove the if stuff.

return state.Error("out of disk space");
}
if (bytes_allocated != 0 && fPruneMode) {
fCheckForPruning = true;

This comment has been minimized.

@kallewoof

kallewoof Jan 30, 2019

Member

Can get rid of if case here as described above.


FlatFileSeq::FlatFileSeq(fs::path dir, const char* prefix, size_t chunk_size)
: m_dir(std::move(dir)), m_prefix(prefix), m_chunk_size(chunk_size)
{}

This comment has been minimized.

@etscrivner

etscrivner Jan 31, 2019

Contributor

Should this raise an error if chunk_size = 0?

return file;
}

size_t FlatFileSeq::Allocate(const FlatFilePos& pos, size_t add_size, bool& out_of_space)

This comment has been minimized.

@etscrivner

etscrivner Jan 31, 2019

Contributor

Should add_size = 0 be allowed? Seems like this would always cause n_new_chunks = n_old_chunks thereby causing the function to return 0.

This comment has been minimized.

@jimpo

jimpo Feb 1, 2019

Author Contributor

Returning 0 seems like the right thing to do there. Probably better than throwing an exception or something.

@@ -135,6 +135,13 @@ bool DirIsWritable(const fs::path& directory)
return true;
}

bool CheckDiskSpace(const fs::path& dir, uint64_t nAdditionalBytes)
{
constexpr uint64_t nMinDiskSpace = 52428800;

This comment has been minimized.

@etscrivner

etscrivner Jan 31, 2019

Contributor

It would be good to add an additional comment above this indicating its approximate size. Previously this comment was present:

// Check for nMinDiskSpace bytes (currently 50MB)
size_t new_size = n_new_chunks * m_chunk_size;
size_t inc_size = new_size - old_size;

if (CheckDiskSpace(m_dir, inc_size)) {

This comment has been minimized.

@etscrivner

etscrivner Jan 31, 2019

Contributor

Any concerns about the type mismatch here between inc_size (size_t) and the second argument to CheckDiskSpace which is a uint64_t?

This comment has been minimized.

@jimpo

jimpo Feb 1, 2019

Author Contributor

Do you get a compiler warning? If not, the implicit cast ought to be fine.

This comment has been minimized.

@etscrivner

etscrivner Feb 2, 2019

Contributor

No compiler warning, maybe just a note for the future that the sizes of the integer types should probably be reconciled at some point.

@jimpo jimpo force-pushed the jimpo:flatfile branch 3 times, most recently from b62dbd6 to 010d0cc Feb 1, 2019

@jimpo jimpo force-pushed the jimpo:flatfile branch 3 times, most recently from 30558ad to 3003e2d Feb 5, 2019

@jimpo jimpo force-pushed the jimpo:flatfile branch from 3003e2d to 2e476e9 Feb 13, 2019

@DrahtBot DrahtBot removed the Needs rebase label Feb 13, 2019

@ryanofsky
Copy link
Contributor

ryanofsky left a comment

utACK 2e476e9

@@ -3016,21 +3016,13 @@ static bool FindBlockPos(CDiskBlockPos &pos, unsigned int nAddSize, unsigned int
vinfoBlockFile[nFile].nSize += nAddSize;

if (!fKnown) {
unsigned int nOldChunks = (pos.nPos + BLOCKFILE_CHUNK_SIZE - 1) / BLOCKFILE_CHUNK_SIZE;
unsigned int nNewChunks = (vinfoBlockFile[nFile].nSize + BLOCKFILE_CHUNK_SIZE - 1) / BLOCKFILE_CHUNK_SIZE;

This comment has been minimized.

@ryanofsky

ryanofsky Feb 20, 2019

Contributor

In commit "validation: Refactor block file pre-allocation into FlatFileSeq." (72d6932)

Note (just for reference because I found this function confusing): In the new code this line is replaced with:

unsigned int n_new_chunks = (pos.nPos + add_size + m_chunk_size - 1) / m_chunk_size;

which is equivalent because vinfoBlockFile[nFile].nSize == pos.nPos + add_size in the !fKnown case due to:

pos.nPos = vinfoBlockFile[nFile].nSize;
...
vinfoBlockFile[nFile].nSize += nAddSize;		

above

This comment has been minimized.

@gwillen

gwillen Feb 22, 2019

Contributor

Probably the right move would be to add some comments in the new code, to reduce confusion to future readers?

This comment has been minimized.

@jimpo

jimpo Feb 22, 2019

Author Contributor

I'm happy to add comments anywhere you think it would make the new code more clear, but I don't want to add comments because the old code was unnecessarily confusing and this is a slight change for anyone who happened to be familiar with it. I personally find the new Allocate function clear enough as is (including the n_new_chunks calculation), but of course I wrote it...

A PR comment like the one above seems like a fine way to address this sort of thing IMO.

return AbortNode("Disk space is low!", _("Error: Disk space is low!"));
}
if (bytes_allocated != 0 && fPruneMode) {
fCheckForPruning = true;

This comment has been minimized.

@ryanofsky

ryanofsky Feb 20, 2019

Contributor

In commit "validation: Refactor block file pre-allocation into FlatFileSeq." (72d6932)

Probably doesn't make any difference, but previously the fCheckForPruning = true setting would happen before return AbortNode, but now it will be skipped if there is no disk space.

Same comment applies in FindUndoPos below.

fclose(file);
return error("%s: failed to truncate file %d", __func__, pos.nFile);
}
if (!FileCommit(file)) {

This comment has been minimized.

@ryanofsky

ryanofsky Feb 20, 2019

Contributor

In commit "validation: Refactor file flush logic into FlatFileSeq." (5bc3ece)

Probably ok, but previous code would still call FileCommit when TruncateFile failed, and now it is skipped.

@gwillen
Copy link
Contributor

gwillen left a comment

utACK 2e476e9, all comments are nits.

Show resolved Hide resolved src/validation.cpp Outdated
Show resolved Hide resolved src/validation.cpp Outdated
Show resolved Hide resolved src/flatfile.cpp
Show resolved Hide resolved src/validation.cpp Outdated

@jimpo jimpo force-pushed the jimpo:flatfile branch from 2e476e9 to dc10b9b Feb 22, 2019

@jimpo jimpo force-pushed the jimpo:flatfile branch from dc10b9b to 04cca33 Feb 23, 2019

@laanwj

This comment has been minimized.

Copy link
Member

laanwj commented Mar 2, 2019

utACK 04cca33

@laanwj laanwj merged commit 04cca33 into bitcoin:master Mar 2, 2019

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

laanwj added a commit that referenced this pull request Mar 2, 2019

Merge #15118: Refactor block file logic
04cca33 Style cleanup. (Jim Posen)
4c01e4e flatfile: Unit tests for FlatFileSeq methods. (Jim Posen)
65a489e scripted-diff: Rename CBlockDiskPos to FlatFilePos. (Jim Posen)
d6d8a78 Move CDiskBlockPos from chain to flatfile. (Jim Posen)
e038093 validation: Refactor file flush logic into FlatFileSeq. (Jim Posen)
992404b validation: Refactor block file pre-allocation into FlatFileSeq. (Jim Posen)
e2d2abb validation: Refactor OpenDiskFile into method on FlatFileSeq. (Jim Posen)
9183d6e validation: Extract basic block file logic into FlatFileSeq class. (Jim Posen)
62e7add util: Move CheckDiskSpace to util. (Jim Posen)

Pull request description:

  This cleans up and refactors block file helpers so that they may be used by the block filter indexer. Per [design discussion](#14121 (comment)) about storing BIP 157 block filters, it has been suggested that they are stored in the same way as block and undo data. This refactor is sufficient to simplify file operations for this use case, though in the future perhaps more pruning-related logic ought to be moved into the new classes.

  The basic abstraction is a `FlatFileSeq` which manages access to a sequence of numbered files into which raw data is written.

Tree-SHA512: b2108756777f2dad8964a1a2ef2764486e708a4a4a8cfac47b5de8bcb0625388964438eb096b10cfd9ea39212c299b5cb32fa943e768db2333cf49ea7def157e

@fanquake fanquake removed this from Blockers in High-priority for review Mar 2, 2019

@jimpo jimpo deleted the jimpo:flatfile branch Mar 5, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.