New file-partition.md doc describing how to partition files to ensure fast initial blockchain synchronization.. #10922

Closed
wants to merge 11 commits into from

jimhashhq commented Jul 24, 2017

After a native build from source on macOS, my initial attempts to synchronize the blockchain were very slow. After finding the issue "Sync Taking Too Long", I found the discussion there, and the comments by @sipa in particular, very useful. I reorganized the $datadir folders on my local macOS build/install and summarized the steps taken in the file-partition.md doc. These comments might find their audience more appropriately elsewhere; please feel free to suggest a better place. Thank you very much.
-jimhash
Note: This looks to be logged as issue: #10736

jimhashhq added some commits Jul 24, 2017

Create file-partition.md
Describe partitioning of datadir files between the high-frequency/low-capacity "index" files and the low-frequency/high-capacity "blocks" files. These steps are probably obvious to more adept bitcoind admins, but as a newbie myself, I didn't see them written anywhere else.
Member

sipa commented Jul 25, 2017

I think you missed something. There are three directories that matter:

  • $DATADIR/blocks (raw blocks)
  • $DATADIR/blocks/index (LevelDB database with information about raw blocks)
  • $DATADIR/chainstate (LevelDB database that holds the UTXO set)

The first is high-bandwidth, low IO. The second has hardly any activity at all. The third is where all activity happens, and is critical for performance.

You should not separate the blocks from the blocks/index directory, as things may get ugly if they're out of sync. You can however put the chainstate on a faster/smaller device.
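
A minimal sketch of the layout described above, assuming a POSIX shell and that bitcoind is stopped first. The helper name `relocate_dir` and the example path are hypothetical; only the directory names come from the comment. It moves one directory to another device and leaves a symlink at the original location, and it is meant for chainstate only, not for blocks/index.

```shell
#!/bin/sh
# relocate_dir SRC DEST: move directory SRC to DEST and leave a symlink at SRC.
# Hypothetical helper; run it only while bitcoind is stopped.
# DEST should be an absolute path so the symlink resolves from anywhere.
relocate_dir() {
    src=$1
    dest=$2
    # Refuse to act unless SRC is a real directory (not already a symlink).
    if [ -L "$src" ] || [ ! -d "$src" ]; then
        echo "relocate_dir: $src is not a plain directory" >&2
        return 1
    fi
    # Refuse to overwrite an existing destination.
    if [ -e "$dest" ]; then
        echo "relocate_dir: $dest already exists" >&2
        return 1
    fi
    mv "$src" "$dest" || return 1
    ln -s "$dest" "$src"
}

# Example (assumed path): put the UTXO database on a faster device.
# relocate_dir "$DATADIR/chainstate" /mnt/fast-ssd/chainstate
```

Keeping blocks and blocks/index together and moving only chainstate follows the advice in the comment; whether a given filesystem works well for LevelDB is a separate question this sketch does not check.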

Author

jimhashhq commented Jul 25, 2017

Hi @sipa, thank you very much for the prompt follow-up. Per your feedback, I have updated the doc to explicitly call out that the chainstate folder must also stay on a fast (internal) disk. The "file-partition.md" notes I made should describe:

  • Installing $datadir to local internal disk (so "chainstate" will never need to move).
  • Start/stop bitcoind (though maybe this isn't necessary).
  • Move the "index" files up one folder level.
  • Move the "blocks" folder off the internal disk.
  • Go to the new external "blocks" folder and soft-link back to the new "index" location.
  • Go to the original internal drive "data" location and link the now external "blocks" folder.

Again, the "chainstate" folder remains on the internal disk and never needs to move.

Again, these "file-partition.md" notes are not the direct route I took, but (if correct) they might save the next person considerable delay in initial blockchain synchronization.

To your point: even though the low-capacity/high-I/O-frequency LevelDB files and the high-capacity/low-I/O-frequency blockchain files are physically separated on different disks, they are kept in sync via the soft links created.

Maybe ultimately a more natural configuration of these files would be to have the "index" folder up a level to start with. That's what makes moving these folders confusing in my opinion.

Thank you again very much for the prompt feedback, very flattering, thanks.

@fanquake fanquake added the Docs label Jul 25, 2017

jimhashhq added some commits Jul 25, 2017

Update file-partition.md
Per @sipa, add mention of the fact that, like the "index" folder, the "chainstate" LevelDB folder must also remain on a fast (internal) drive if reasonable synchronization time is to be expected.
Author

jimhashhq commented Jul 25, 2017

"file-partition.md" has been updated to highlight the need to keep the "chainstate" folder on a fast (internal) drive in addition to the "index" files, as per @sipa. Thanks @fanquake for adding the label. This issue also seems related to "installation" and/or "configuration"; I did not see either of these among the bitcoin project's label categories and was thinking they might prove useful as well. Thank you both for your guidance.


2) Stop bitcoind, so that we can rearrange some datadir folders:

kill -QUIT `cat /Volumes/WD-Passport-Mac/bitcoin/data/bitcoind.pid`


achow101 Jul 31, 2017

Member

Instead of killing the process, you should use bitcoin-cli stop.
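
A sketch of a clean shutdown along these lines, assuming `bitcoin-cli` is on the PATH, can reach the node, and runs as the same user as bitcoind. `bitcoin-cli stop` only requests the shutdown, so the hypothetical helper `wait_for_exit` polls until the process is gone before any folders are moved. The datadir path is the one used in the step above; the timeout value is an arbitrary assumption.

```shell
#!/bin/sh
# wait_for_exit PID [TIMEOUT_SECONDS]: return 0 once PID no longer exists,
# or 1 if it is still running after the timeout. Hypothetical helper.
wait_for_exit() {
    pid=$1
    remaining=${2:-60}
    # kill -0 sends no signal; it only tests whether the process can be signalled.
    while kill -0 "$pid" 2>/dev/null; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Example usage (path taken from the step above):
# DATADIR=/Volumes/WD-Passport-Mac/bitcoin/data
# pid=$(cat "$DATADIR/bitcoind.pid")
# bitcoin-cli -datadir="$DATADIR" stop
# wait_for_exit "$pid" 120 || echo "bitcoind is still shutting down" >&2
```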


ln -s /Volumes/WD-Passport-Mac/bitcoin/blocks /Users/coinadm/local/bitcoin/data/blocks

注意 - Nota - Note - ध्यान दें - ﻢﻠﺣﻮﻇﺓ - метка


achow101 Jul 31, 2017

Member

What's with the multiple languages here?


jimhashhq Aug 2, 2017

Author

Again, good question, thank you. To me, the "key finding" here (if any, really) is simply that the "index" folder is nested within the "blocks" folder by default, unlike the "chainstate" folder, which is a sibling of "blocks". This slightly complicates moving the "blocks" folder from the internal disk to an external disk; my apologies for repeating the obvious. Database administrators (me included) might argue that this nested layout is less desirable because it complicates physically separating the high-capacity/low-frequency block files from the lower-capacity/higher-frequency index (and chainstate) LevelDB files.
When I open the hood of my car, there are warning labels on the radiator cap and elsewhere, and these labels are in multiple languages to point out appropriate cautions to naive vehicle operators who may never have looked under the hood or checked a radiator. I took inspiration from this and wanted to (hopefully) say "Note" in the half dozen or so most common languages by usage. Also, I want to be especially friendly in this day and age; I feel like we could all use it.
In closing, I hope it doesn't sound like I am trying to "make a mountain out of a molehill" here; it's not that at all. I just wanted to share my personal experience in the hope of making the system easier to use. Overall I find the system very easy to work with and well thought out.

Update file-partition.md
Correct the command for step #2 in this outline; per @achow101, thanks very much.
Member

sipa commented Aug 2, 2017

Your document is still suggesting to split the blocks/ directory from the blocks/index/ directory. Please don't do that; it's dangerous (they need to be in sync), and unnecessary (the blocks/index/ directory hardly sees any I/O). You should just suggest to move the chainstate/ to a faster drive compared to blocks/.

| ${datadir}/blocks/index | ${datadir} | low | high |
| ${datadir}/blocks | ${EXTERAL} | high | low |
| ${datadir}/chainstate | n/a | low | high |


jimhashhq Aug 2, 2017

Author

Still at issue is the value in the last column of the first row above (the "i/o Frequency" of the index folder: is it high or low?). My experience suggests that the index folder files are indeed high frequency, which is really the impetus for this doc; if they were not, I would not have looked into this re-configuration detail. I only identified this possible re-configuration pitfall (which I had mistakenly fallen into) from the bitcoind file usage reported by the following command:
lsof -p $(cat ${datadir}/bitcoind.pid) | grep ldb$
This motivated me to move the child "index" folder back onto the internal drive and set up the soft links described here. I have not yet followed through and verified the i/o frequency demands of these index LevelDB files.
I admit the experience related here is qualitative and currently lacks supporting i/o reporting, but it is hopefully correct nonetheless.
I first moved the entire blocks folder (including the index subfolder) from $datadir to the external disk for internal capacity reasons (basically to save space because I'm cheap), and it synced very slowly. Then I looked at open files using "lsof" as described above and saw LevelDB index folder files open on the external USB 3.0 drive. Moving them as described here sped things up to the point that performance seemed to match that of the default configuration with everything (including the blocks folder) on the internal drive. Basically I was just running a simplistic du -k . and ls -lrt blk*.dat every so often and watching how fast the blk*.dat files were growing in both cases. So the evidence is qualitative, not quantitative (I apologize). The responsible thing for me to do at this point is to gather quantitative evidence to support this position.
Maybe I'll look for a blockchain indexing or synchronization test that I can run twice: once with the index files on the external drive, and again with the index files off the external drive, while capturing i/o frequency statistics as well as wall-clock time. Note that the du -k . and long listings described above, checked against the wall clock, are all I have actually done so far.
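
One way to turn the `du`/`ls` spot checks into numbers is to sample the total size of the blk*.dat files over a fixed interval. This is a rough sketch under assumptions: the helper names `blocks_bytes` and `sample_growth` are hypothetical, and the growth of the raw block files is only a proxy for synchronization speed, not a measurement of i/o frequency.

```shell
#!/bin/sh
# blocks_bytes DIR: print the combined size in bytes of DIR/blk*.dat.
blocks_bytes() {
    total=0
    for f in "$1"/blk*.dat; do
        # Skip the unexpanded pattern when no block files exist yet.
        [ -f "$f" ] || continue
        total=$((total + $(wc -c < "$f")))
    done
    echo "$total"
}

# sample_growth DIR SECONDS: print bytes added to the block files in that time.
sample_growth() {
    before=$(blocks_bytes "$1")
    sleep "$2"
    after=$(blocks_bytes "$1")
    echo $((after - before))
}

# Example: bytes written during one minute of synchronization.
# sample_growth "$DATADIR/blocks" 60
```

Running the same sample in both configurations (index on the external drive, index on the internal drive) would give two comparable figures instead of an impression.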

A possible misperception here is that this doc was meant to be some sort of performance advice. It is not; it's really just meant as a note to help other developers/analysts who (like me) work on very low-end commodity hardware yet want to perform initial synchronization quickly, without needing a large-capacity internal disk. That said, performance-minded users would likely benefit from moving any high-i/o-frequency files to the fastest storage available, similar to how traditional file-based databases are tuned.

Nor are these notes meant as instructions for backing up the blockchain for portability between bitcoin development instances; as @sipa points out, the "blocks" folder needs to be kept in sync with the "index" folder, which renders the blocks folder useless by itself for backup/portability purposes.

Missing from these notes is the (reasonable?) expectation that bitcoind not be started until the external disk is mounted and, likewise, that the external disk not be unmounted/ejected until bitcoind is shut down. I have not yet tried to see what happens if, after running in the configuration suggested here, the operator/developer accidentally starts bitcoind without the external storage plugged in; I would hope that the index folder files are not corrupted if the blocks folder is not accessible.
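
That expectation could be enforced by a small wrapper that refuses to start bitcoind when the blocks link does not resolve. This is a sketch only: `blocks_dir_ready` is a hypothetical helper, and it checks just that the path resolves to a writable directory, not that the contents are intact or match the index.

```shell
#!/bin/sh
# blocks_dir_ready DATADIR: succeed only if DATADIR/blocks resolves to an
# existing, writable directory. A dangling symlink (external disk not mounted)
# fails the -d test because the test follows the link.
blocks_dir_ready() {
    [ -d "$1/blocks" ] && [ -w "$1/blocks" ]
}

# Example wrapper (datadir path assumed from the doc's examples):
# DATADIR=/Users/coinadm/local/bitcoin/data
# if blocks_dir_ready "$DATADIR"; then
#     bitcoind -datadir="$DATADIR" -daemon
# else
#     echo "blocks folder unavailable; is the external disk mounted?" >&2
# fi
```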

Also, I'm still trying to understand the usage patterns of GitHub. Maybe this would have been better reported as an "issue"; I was tempted to do that, but I don't personally see any issues. This is really more of a "pitfalls to avoid" type of document that I was hoping might further adoption.

I do feel like I am onto something (albeit very minor) here. I do very much appreciate all of the feedback and consideration -- thank you very much.

| Folder Name | Link Name |
| ------------------------ | ------------------- |
| ${EXTERNAL}/blocks/index | ${datadir}/../index |
| ${datadir}/blocks | ${EXTERAL}/blocks |


flack Aug 5, 2017

Contributor

typo: EXTERAL

Member

laanwj commented Oct 4, 2017

(the blocks/index/ directory hardly sees any I/O).

Except with -txindex I guess :(

Member

laanwj commented Nov 30, 2017

@jimhashhq I think this is pretty good, certainly as a start. Can you please address the comments and squash?

Member

laanwj commented Nov 30, 2017

Please don't do that; it's dangerous (they need to be in sync)

Also: sometimes there's the problem with the leveldb not supporting the filesystem that the blocks/ directory points to. I think this happens with some network filesystems. If blocks/index is on the same partition that will never work :/ (see e.g. #10787)

@laanwj laanwj changed the title New file-partiion.md doc describing how to partition files to ensure fast initial blockchain synchronization.. New file-partition.md doc describing how to partition files to ensure fast initial blockchain synchronization.. Nov 30, 2017

Author

jimhashhq commented Dec 1, 2017

My apologies for not getting back sooner, I was ill but am feeling better.
I think my intent here was just to share/communicate my experiences with "symlink(ing) out the block-files (the large part)" as mentioned in issue #10787 referenced above, thanks.
I was hoping this experience might prove useful to others on low end commodity hardware who wish to store the blocks on an external USB. Thanks!

@jimhashhq jimhashhq closed this Dec 1, 2017


ln -s /Users/coinadm/local/bitcoin/index /Volumes/WD-Passport-Mac/bitcoin/blocks/index

6) Replace the original index folder location with a soft link:


arowser Dec 4, 2017

Contributor

should be "block folder"?


jimhashhq Dec 4, 2017

Author

Corrected, thanks!

Update file-partition.md
Correct typo per @arowser, thank you.
Member

laanwj commented Dec 13, 2017

I was hoping this experience might prove useful to others on low end commodity hardware who wish to store the blocks on an external USB. Thanks!

Why close?
