Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to optional specify the directory for the blocks storage #12653

Merged
merged 3 commits into from Mar 27, 2018

Conversation

Projects
None yet
@jonasschnelli
Copy link
Member

jonasschnelli commented Mar 9, 2018

Since the actual block files taking up more and more space, it may be desirable to have them stored in a different location then the data directory (use case: SSD for chainstate, etc., HD for blocks).

This PR adds a -blocksdir option that allows one to keep the blockfiles and the blockindex external from the data directory (instead of creating symlinks).

I fist had an option to keep the blockindex within the datadir, but seems to make no sense since accessing the index will (always) lead to access (r/w) the block files.

@jonasschnelli jonasschnelli force-pushed the jonasschnelli:2018/03/blocksdir branch 2 times, most recently Mar 9, 2018

@randolf

randolf approved these changes Mar 9, 2018

Copy link
Contributor

randolf left a comment

As the size of the blockchain continues to increase, this will facilitate the need to store the blocks on a different hard drive, hence this feature serves a probable future need. There are likely other uses for this feature too.

@eklitzke

This comment has been minimized.

Copy link
Member

eklitzke commented Mar 10, 2018

The argument for keeping the block index in the data dir would be that in the vast majority of cases it means the block index will be on the same filesystem as the utxo database, and therefore we can rely on regular filesystem semantics regarding data ordering. If the block index and utxo database are on different filesystems then it's not possible to guarantee that one is ahead of the other without fully serializing the operations with fsync.

I will try to look through the code more and think about this more closely. I think this change is fine the way the code currently works because the flushes are basically serialized anyway (and we fsync for block storage and block index updates), but this might become annoying long term.

@jonasschnelli jonasschnelli force-pushed the jonasschnelli:2018/03/blocksdir branch Mar 11, 2018

@jonasschnelli

This comment has been minimized.

Copy link
Member Author

jonasschnelli commented Mar 11, 2018

Added a commit (basically a one-liner) that will make the block index database stick in datadir.

@eklitzke

This comment has been minimized.

Copy link
Member

eklitzke commented Mar 11, 2018

utACK Once you fix the python lint failure.

@promag

This comment has been minimized.

Copy link
Member

promag commented Mar 11, 2018

Concept ACK.

@@ -182,6 +182,7 @@ void ReleaseDirectoryLocks();

bool TryCreateDirectories(const fs::path& p);
fs::path GetDefaultDataDir();
const fs::path &GetBlocksDir(bool fNetSpecific = true);

This comment has been minimized.

@MarcoFalke

MarcoFalke Mar 11, 2018

Member

I am wondering why this should be net specific, considering that we have the same concept for wallets and the GetWalletDir is not net specific.

This comment has been minimized.

@laanwj

laanwj Mar 15, 2018

Member

Isn't the blocks directory by definition net specific?

test/functional/feature_blocksdir.py Outdated

def run_test(self):
self.stop_node(0)
node0path = os.path.join(self.options.tmpdir, "node0");

This comment has been minimized.

@promag

promag Mar 12, 2018

Member

Remove ;.

test/functional/feature_blocksdir.py Outdated
shutil.rmtree(node0path)
initialize_datadir(self.options.tmpdir, 0)
self.log.info("Starting with non exiting blocksdir ...")
self.assert_start_raises_init_error(0, ["-blocksdir="+self.options.tmpdir+ "/blocksdir"], "Specified blocks director")

This comment has been minimized.

@promag

promag Mar 12, 2018

Member

Add

blocksdir = os.path.join(self.options.tmpdir, "blocksdir")

and reuse here and below.

src/txdb.cpp Outdated
@@ -147,7 +147,7 @@ size_t CCoinsViewDB::EstimateSize() const
return db.EstimateSize(DB_COIN, (char)(DB_COIN+1));
}

CBlockTreeDB::CBlockTreeDB(size_t nCacheSize, bool fMemory, bool fWipe) : CDBWrapper(GetDataDir() / "blocks" / "index", nCacheSize, fMemory, fWipe) {
CBlockTreeDB::CBlockTreeDB(size_t nCacheSize, bool fMemory, bool fWipe) : CDBWrapper(gArgs.IsArgSet("-blocksdir") ? GetDataDir() / "blockindex" : GetBlocksDir() / "index", nCacheSize, fMemory, fWipe) {

This comment has been minimized.

@promag

promag Mar 12, 2018

Member

"blockindex"? Should be "blocks" / "index"?

Maybe add GetBlockIndexDir()?

This comment has been minimized.

@laanwj

laanwj Mar 15, 2018

Member

Yes, this seems wrong.
I tried providing my existing blocks directory:

$ src/bitcoind -testnet -blocksdir=/store2/tmp/bitcoin/testnet3/blocks -printtoconsole
2018-03-15T19:29:04Z Opening LevelDB in /.../.bitcoin/testnet3/blockindex
2018-03-15T19:29:04Z Opened LevelDB successfully
...
2018-03-15T19:29:05Z : Error initializing block database.
Please restart with -reindex or -reindex-chainstate to recover.
: Error initializing block database.
Please restart with -reindex or -reindex-chainstate to recover.

This fails. Shouldn't it use the index directory within the blocksdir?

This comment has been minimized.

@jonasschnelli

jonasschnelli Mar 20, 2018

Author Member

Using the same directory looks most obvious correct, though, if you switch the blocks dir to a different path (with -blocksdir=<path>) users may run into troubles since the old blocks dir may still be in the datadir...
But I agree, lets keep "block" / "Index" for now...
fixed.

@laanwj

This comment has been minimized.

Copy link
Member

laanwj commented Mar 13, 2018

Concept ACK, this would avoid some symlinking gymnastics I'm currently doing.
(ref; #10922)

@laanwj laanwj self-assigned this Mar 13, 2018

@hkjn

This comment has been minimized.

Copy link
Contributor

hkjn commented Mar 13, 2018

ACK 34ec8e3, once lint issues are fixed.

Does what it says on the tin, seems like a useful feature.

@eklitzke

This comment has been minimized.

Copy link
Member

eklitzke commented Mar 15, 2018

I might have to retract my earlier statement, as @sipa says here that the blocks index and block files should be on the same filesystem, contradicting my earlier statement. I'm not sure if I understand that reasoning though, as we fsync after writing the block and before committing to the block index, which should be fine across filesystems. If it's really true that they need to be on the same filesystem that would be problematic in the future if we ever wanted to merge the block index and the chainstate index.

@laanwj

This comment has been minimized.

Copy link
Member

laanwj commented Mar 15, 2018

I think this is almost ready. Will test. However, python-nanny wants you to fix:

./test/functional/feature_blocksdir.py:20:63: E703 statement ends with a semicolon
^---- failure generated from contrib/devtools/lint-python.sh
@donaloconnor

This comment has been minimized.

Copy link
Contributor

donaloconnor commented Mar 15, 2018

utACK

@jonasschnelli jonasschnelli force-pushed the jonasschnelli:2018/03/blocksdir branch Mar 18, 2018

@laanwj

This comment has been minimized.

Copy link
Member

laanwj commented Mar 19, 2018

I might have to retract my earlier statement, as @sipa says here that the blocks index and block files should be on the same filesystem

Wouldn't this imply that we're better off without commit 4855b9e? That would keep the block index with the blocks, which is @sipa's preference.

@jonasschnelli jonasschnelli force-pushed the jonasschnelli:2018/03/blocksdir branch to a192636 Mar 20, 2018

@jonasschnelli

This comment has been minimized.

Copy link
Member Author

jonasschnelli commented Mar 20, 2018

I might have to retract my earlier statement, as @sipa says here that the blocks index and block files should be on the same filesystem

Wouldn't this imply that we're better off without commit 4855b9e? That would keep the block index with the blocks, which is @sipa's preference.

@sipa: can you elaborate your concerns of having the index outside of the blocks directory?

@sipa

This comment has been minimized.

Copy link
Member

sipa commented Mar 20, 2018

@jonasschnelli @eklitzke I don't think there's an actual problem with having the blocks and the block index on separate filesystems. I was worried about them going out of sync due to management (you copy one but not the other, etc).

@jonasschnelli

This comment has been minimized.

Copy link
Member Author

jonasschnelli commented Mar 20, 2018

Thanks @sipa.
I think then we should keep the last commit (a192636).

@laanwj

This comment has been minimized.

Copy link
Member

laanwj commented Mar 27, 2018

In that case, utACK a192636

@laanwj laanwj merged commit a192636 into bitcoin:master Mar 27, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

laanwj added a commit that referenced this pull request Mar 27, 2018

Merge #12653: Allow to optional specify the directory for the blocks …
…storage

a192636 -blocksdir: keep blockindex leveldb database in datadir (Jonas Schnelli)
f38e4fd QA: Add -blocksdir test (Jonas Schnelli)
386a6b6 Allow to optional specify the directory for the blocks storage (Jonas Schnelli)

Pull request description:

  Since the actual block files taking up more and more space, it may be desirable to have them stored in a different location then the data directory (use case: SSD for chainstate, etc., HD for blocks).

  This PR adds a `-blocksdir` option that allows one to keep the blockfiles and the blockindex external from the data directory (instead of creating symlinks).

  I fist had an option to keep the blockindex within the datadir, but seems to make no sense since accessing the index will (always) lead to access (r/w) the block files.

Tree-SHA512: f8b9e1a681679eac25076dc30e45e6e12d4b2d9ac4be907cbea928a75af081dbcb0f1dd3e97169ab975f73d0bd15824c00c2a34638f3b284b39017171fce2409
@Sjors

This comment has been minimized.

Copy link
Member

Sjors commented Mar 29, 2018

Sorry I'm late to the party, but I'm seeing some strange behavior on macOS, see #12828.

@opacey

This comment has been minimized.

Copy link

opacey commented Apr 3, 2018

I ran a little test just to see the effect of putting the entire datadir an external USB 3.0 HDD versus the internal SSD (cross ref #10736):
MacBookPro2013 ExternalUSBHDD (NoNetwork) = ~1min, CPU ~35%
MacBookPro2013 InternalSSD (NoNetwork) = ~2.0sec, CPU ~120%
MacBookPro2017 InternalSSD (NoNetwork) = ~2.0sec, CPU ~120%
MacBookPro2017 InternalSSD (NetworkOn) = ~2.5sec, CPU ~120%

Timings are gaps between block validations. There is a ~30x slow down when using the USB 3.0 HDD.

I also experimented with different combinations of the following settings:
par=1, 2, 3, 4
dbcache=1000, 2000, 4000, 5000
maxsigcachesize=64, 1000
...they didn't seem to have a significant effect on timing except dbcache (bigger = better).

I'm surprised that signature verification speed increase is negligible between a 2013 Core i5 and a 2017 Core i5 [and that par doesn't seem to effect results significantly, though that's off topic.]

@promag

This comment has been minimized.

Copy link
Member

promag commented Oct 2, 2018

This PR adds a -blocksdir option that allows one to keep the blockfiles and the blockindex external from the data directory (instead of creating symlinks).

@jonasschnelli could you update/fix the PR description?

Edit: Done

hebasto added a commit to hebasto/bitcoin that referenced this pull request Oct 5, 2018

Add "Blocksdir" to Debug window
To get the current blocksdir is valuable for debug purposes after
merging bitcoin#12653.

laanwj added a commit that referenced this pull request Oct 18, 2018

Merge #14374: qt: Add "Blocksdir" to Debug window
2ab9140 Add tooltips for both datadir and blocksdir (Hennadii Stepanov)
3045704 Add "Blocksdir" to Debug window (Hennadii Stepanov)

Pull request description:

  To get the current `blocksdir` is valuable for debug purposes after
  merging #12653.

  ![screenshot from 2018-10-02 23-16-52](https://user-images.githubusercontent.com/32963518/46374770-2ef6f580-c69a-11e8-85c2-44a49fa36b28.png)

Tree-SHA512: a93f2c00ee19cf6acb499d3bd9bccf4be8ef01c53c44d917ad401aa4797db02cbccb71a9c24e05262ea09345e15f9299381367fdc6951f21dd3788a4a58d2132

fish-en pushed a commit to fish-en/bitcoin that referenced this pull request Nov 11, 2018

Add "Blocksdir" to Debug window
To get the current blocksdir is valuable for debug purposes after
merging bitcoin#12653.

fish-en added a commit to fish-en/bitcoin that referenced this pull request Nov 13, 2018

Add "Blocksdir" to Debug window
To get the current blocksdir is valuable for debug purposes after
merging bitcoin#12653.

jfhk added a commit to jfhk/bitcoin that referenced this pull request Nov 14, 2018

Add "Blocksdir" to Debug window
To get the current blocksdir is valuable for debug purposes after
merging bitcoin#12653.

HashUnlimited pushed a commit to HashUnlimited/chaincoin that referenced this pull request Nov 26, 2018

Add "Blocksdir" to Debug window
To get the current blocksdir is valuable for debug purposes after
merging bitcoin#12653.
@Empact

This comment has been minimized.

Copy link
Member

Empact commented Jan 7, 2019

laanwj added a commit that referenced this pull request Jan 9, 2019

Merge #15124: Fail AppInitMain if either disk space check fails
ba8c8b2 Fail if either disk space check fails (Ben Woosley)

Pull request description:

  Rather than both.

  Introduced in 386a6b6, #12653

Tree-SHA512: 24765dd3c62b742c491d7d9a751917c2ce6f3819a8764a7725ce84910ef69bffca07f4c0dfbeed8c4f978a12c4b04a2ac3b8c2ff59602330a8a3e8a68878c41b

laanwj added a commit that referenced this pull request Jan 16, 2019

Merge #14409: utils and libraries: Make 'blocksdir' always net specific
e4a0c35 Improve blocksdir functional test. (Hennadii Stepanov)
c3f1821 Make blockdir always net specific (Hennadii Stepanov)

Pull request description:

  The blocks directory is net specific by definition.

  Also this prevents the side effect of calling `GetBlocksDir(false)` in the non-mainnet environment.
  Currently a new node creates an unused `blocks\` directory in the root of the data directory when `-testnet` or `-regtest` is specified.

  Refs:
  - #12653
  - #12653 (comment) by @laanwj
  - #14595 (comment)

Tree-SHA512: c9957a68a4a200ebd2010823a56db7e61563afedcb7c9828e86b13f3af2990e07854b622c1f3374756f94574acb3ea32de7d2a399eef6c0623f0e11265155627
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.