Increase LevelDB max_open_files #12495
src/dbwrapper.cpp
Outdated
@@ -71,14 +71,41 @@ class CBitcoinLevelDBLogger : public leveldb::Logger {
    }
};

constexpr int PermissibleMaxOpenFiles() {
Add `static`?
The `static` specifier isn't necessary because `constexpr` functions are evaluated at compile time. I double checked that `nm ./src/bitcoind | c++filt | grep Permissible` produces no output.
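As an illustration of the point above, here is a minimal sketch (not the PR's actual code). A `constexpr` function used in a constant expression is evaluated by the compiler, which `static_assert` can confirm. Note that `constexpr` functions are also implicitly `inline`, and whether a symbol is emitted for run-time calls is up to the compiler. The name `PermissibleMaxOpenFilesSketch`, its parameter, and its body are assumptions based on the 64 and 1000 limits discussed in this PR.

```cpp
#include <cstddef>

// Hypothetical stand-in for the function under review; the body is an
// assumption based on the limits discussed in this PR (64 vs. 1000).
constexpr int PermissibleMaxOpenFilesSketch(std::size_t pointer_size) {
    return pointer_size >= 8 ? 1000 : 64;
}

// Both checks are evaluated entirely at compile time.
static_assert(PermissibleMaxOpenFilesSketch(8) == 1000, "64-bit keeps the default");
static_assert(PermissibleMaxOpenFilesSketch(4) == 64, "32-bit is capped");

// Initializing a constexpr variable also forces compile-time evaluation.
constexpr int kLimit = PermissibleMaxOpenFilesSketch(sizeof(void*));
```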
src/dbwrapper.cpp
Outdated
@@ -71,14 +71,41 @@ class CBitcoinLevelDBLogger : public leveldb::Logger {
    }
};

constexpr int PermissibleMaxOpenFiles() {
#ifdef _POSIX_C_SOURCE
I don't think this does what you expect it to. As I understand it, and as confirmed by the gcc page about it, `_POSIX_C_SOURCE` is supposed to be defined explicitly at the top of the compilation unit to request certain POSIX C library features. It is not defined by the compiler, so it can't be used to detect whether the target platform is POSIX.
In other places, we've just used `#if[n]def WIN32`, as WIN32 is the only non-POSIX platform supported in this project.
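To make the suggestion concrete, here is a minimal sketch of the `WIN32`-based check. It assumes, as the comment above describes, that the build defines `WIN32` when targeting Windows and that every other supported target is POSIX. The helper name `IsPosixTarget` is hypothetical.

```cpp
// _POSIX_C_SOURCE is a feature-request macro: code defines it before including
// system headers to ask for a POSIX API level. It is therefore not a reliable
// signal of the platform being compiled for.

// Hypothetical helper: true when building for a non-Windows (POSIX) target.
// Assumes the build defines WIN32 on Windows, as described in the review comment.
constexpr bool IsPosixTarget() {
#ifndef WIN32
    return true;   // every supported non-Windows platform is treated as POSIX
#else
    return false;  // Windows build
#endif
}
```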
You're absolutely right, I will fix this.
I measured the impact of this change, you can see a graph here: https://monad.io/max_open_files.png Method for creating this graph:
Nice! |
The reason for this speedup isn't super obvious, and I think merits some explanation. I've spent a lot of time the last three days reading the LevelDB source code and think I understand now. It actually has nothing to do with reducing the number of open/close/mmap/munmap syscalls, as I had originally thought. The LevelDB terminology for a Increasing the |
@eklitzke Cheers for referencing #12123 . Replying here to get subscribed. I am going to do more tests on a Windows machine at the weekend with your changes here and report back my results. I am pretty familiar with Windows internals so I'll do further investigations to see if we can have similar changes on Windows. |
I had a closer look at Windows and the way we use the win32 apis in LevelDB. Observations:
I am confident that this increase to 1000 files will not affect Windows. I have already tested this but will do more tests tomorrow. The Win32RandomAccessFile is as basic as it gets. Further optimizations can be made in this by using Memory Mapped files (Similar to the POSIX versions). The random access nature of accesses should be a win for Memory Mapped files over regular file read operations due to the kernel level caching. As I've mentioned in #12123, we achieve optimization at CPU level also by avoiding the CRC checksum check that's performed on each file open. @eklitzke I hope you don't mind me replying on here. I'll continue to do more tests on the Windows sides of things. Any improvement to the IBD is a big win! Cheers, |
I have a branch that measures how much extra memory this uses. On my node, the average block index size is 19797 bytes for the chainstate database. The bloom filters (which are also loaded in memory) are 6 bytes or 7 bytes depending on the file. Thus a reasonable approximation is 20 KB of data per chainstate |
I ran bitcoind -reindex-chainstate on my Windows 10 PC with the full node synced. First I ran using the default: The results show a substantial difference in sync time. Time to reach 100% with That's an over 50% reduction in the time to reindex. I didn't expect such an improvement. |
That's a pretty big speedup! Does this only work on 64 bit systems? Would it be possible to do it for all systems? |
At least for POSIX systems, the issue is that LevelDB won't use I've been working on other LevelDB improvements that will result in even more significant improvements than what I've posted here, but they're more invasive changes. So I'm interested in getting this change (or a variant of it) merged before I try to go deeper into the UTXO system. @donaloconnor Can you comment on whether |
Windows by default has a limit of 64 descriptors in
(POSIX is supposed to allow changing |
The LevelDB's Win32 code does not use memory mapped files (Unlike the POSIX version) so this change should improve both 32/64bit versions of Windows.
What @luke-jr said above ^. Again since the LevelDB code for Win32 is using the Win32 ReadFile/CreateFile APIs, it should not affect any POSIX functions/fd descriptors. I'm running with a limit of 64 open files again to see if I get similar results (12hour sync time). I'd be really happy if someone else can also test this on Windows. I am finding it hard to believe the 2x speed up but it does seem to be the case. |
Interesting stats from observing Disk I/O reported by Task Manager:
This backs up the theory that since the files remain open, the file cache on Windows remains warm. *Not scientific, just observations. Can do some real disk I/O profiling later. |
I don't think it's related to the disk cache, it's due to the increased availability of the deserialized buffer indexes (and bloom filters) in memory. At least on Unix, files being open is unrelated to the availability of their contents in the page cache. In my measurements it takes four disk seeks just to open each .ldb file (which is related to parsing out the file header, bloom filter, and the block index), and the information is stored on disk in a compressed form that has to be deserialized. But once the file is open and in the table cache all that data is right there, ready to be accessed. I started writing up a blog post about this with details about how the table format works, how the LevelDB caches interact, and what's involved during key lookup. I'll try to finish that up in case people want to learn more about the details. I'll update this PR to also increase the limit on Windows. I am interested in hearing if people would like to see any other changes. For instance, in IRC Greg asked if there's a way to tell from the LevelDB API if there's a way to find out for a given platform if the implementation is going to use up real file descriptors or not. There isn't, but I could patch LevelDB to expose that if that's something reviewers want to see. |
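The mechanism described above can be illustrated with a toy model. The sketch below is not LevelDB's code: it is a small LRU cache keyed by table file number that counts how often a table has to be "opened", standing in for the expensive step in which the block index and bloom filter are read and decoded. The names `ToyTableCache` and `CountOpens` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>

// Toy model of a table cache: an LRU set of "open" table files.
// Opening a table stands in for the costly read and decode of its metadata.
class ToyTableCache {
public:
    explicit ToyTableCache(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);  // a zero-sized cache is not modeled
    }

    // Look up a table by file number; opens it (and maybe evicts another) on a miss.
    void Access(std::uint64_t file_number) {
        auto it = index_.find(file_number);
        if (it != index_.end()) {
            // Hit: mark as most recently used; nothing needs to be decoded.
            lru_.splice(lru_.begin(), lru_, it->second);
            return;
        }
        ++opens_;  // Miss: the table must be opened and its metadata decoded.
        if (lru_.size() == capacity_) {
            index_.erase(lru_.back());  // evict the least recently used table
            lru_.pop_back();
        }
        lru_.push_front(file_number);
        index_[file_number] = lru_.begin();
    }

    std::size_t opens() const { return opens_; }

private:
    std::size_t capacity_;
    std::size_t opens_ = 0;
    std::list<std::uint64_t> lru_;
    std::unordered_map<std::uint64_t, std::list<std::uint64_t>::iterator> index_;
};

// Cycle through `num_files` tables `rounds` times and return the number of opens.
inline std::size_t CountOpens(std::size_t capacity, std::size_t num_files, std::size_t rounds) {
    ToyTableCache cache(capacity);
    for (std::size_t r = 0; r < rounds; ++r) {
        for (std::uint64_t f = 0; f < num_files; ++f) {
            cache.Access(f);
        }
    }
    return cache.opens();
}
```

With a capacity of 64 and 100 tables accessed cyclically, every access is a miss, so `CountOpens(64, 100, 10)` returns 1000. With a capacity of 1000, only the first pass opens anything and `CountOpens(1000, 100, 10)` returns 100. Real UTXO lookups are not cyclic, so this only illustrates the mechanism, not the measured speedup.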
Yes, I'm also now inclined to believe that it's probably unrelated to disk cache. Actually I've seen higher CPU utilization in the 1000 limit case (mine was about 60-80% CPU usage, whereas with the 64 limit it was mostly just 30%). Anyway, I just finished the tests again a few minutes ago:
Another 2x improvement.
Looking forward to it 👍 |
cfee017 to 6140df7
Can you let me know if the comment looks correct regarding Windows behavior? |
6140df7 to 08b132c
src/dbwrapper.cpp
Outdated
// See PR #12495 for further discussion.
#ifndef WIN32
    if (sizeof(void*) < 8)
        options->max_open_files = 64;
Prefer, to be safe:
if (sizeof(void*) < 8 || leveldb::kMajorVersion != 1 || leveldb::kMinorVersion < 3 || leveldb::kMinorVersion > 20) {
options->max_open_files = 64;
}
static leveldb::Options GetOptions(size_t nCacheSize)
{
    leveldb::Options options;
    options.block_cache = leveldb::NewLRUCache(nCacheSize / 2);
    options.write_buffer_size = nCacheSize / 4; // up to two write buffers may be held in memory simultaneously
    options.filter_policy = leveldb::NewBloomFilterPolicy(10);
    options.compression = leveldb::kNoCompression;
    options.max_open_files = 64;
max_open_files is not set explicitly before calling SetMaxOpenFiles. Probably should be set in SetMaxOpenFiles explicitly to 1000 before the #ifndef
The constructor for `leveldb::Options` sets it to 1000.
I didn't scroll far enough down :) - you're right.
Comment LGTM! |
Might be good to log somewhere which limit is being used...
#ifndef WIN32
    if (sizeof(void*) < 8 || leveldb::kMajorVersion != 1 || leveldb::kMinorVersion < 3 || leveldb::kMinorVersion > 20) {
        options->max_open_files = 64;
        LogPrint(BCLog::LEVELDB, "LevelDB max_open_files=%d\n", options->max_open_files);
    } else {
        LogPrint(BCLog::LEVELDB, "LevelDB max_open_files=%d (default)\n", options->max_open_files);
    }
#endif
08b132c to a6f5abf
I've added the log statement. I also added a |
It's a booby trap that everyone using system LevelDB libraries will need to patch out. But maybe a good idea anyway, to reinforce the danger of using other versions. But it should probably at least tolerate older versions back to 1.3 (which added mmap support).
Booby trap is intentional. I couldn't figure out how to make
Given that we only do the update once or twice a year I think it's reasonable to keep in there and leave it as a manual step for the person doing the upgrade. On the other hand, this might be overly paranoid because:
src/dbwrapper.cpp
Outdated
        LogPrint(BCLog::LEVELDB, "LevelDB max_open_files=%d\n", options->max_open_files);
    }
#endif
    LogPrint(BCLog::LEVELDB, "LevelDB max_open_files=%d (default)\n", options->max_open_files);
This will log twice!
@donaloconnor As you acked this, can you change your review status? It still shows up in GitHub as "requested changes". @laanwj Sorry to ping you on this again, but you requested changes on this earlier and I think this is ready to go (unless you have further objections). I'm trying to not stack PRs in the dbwrapper code, and this is preventing me from submitting further changes for review.
This change significantly increases IBD performance by increasing the amount of the UTXO index that can remain in memory. To ensure this doesn't cause problems in the future, a static_assert on the LevelDB version has been added, which must be updated by anyone upgrading LevelDB.
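A minimal sketch of such a guard follows, using stand-in constants so the snippet compiles without LevelDB. The review suggestion above refers to `leveldb::kMajorVersion` and `leveldb::kMinorVersion`; the `sketch` namespace, the helper `IsAuditedVersion`, the pinned values 1 and 20, and the message text are assumptions for illustration, not the merged code.

```cpp
namespace sketch {
// Stand-ins for leveldb::kMajorVersion / leveldb::kMinorVersion.
// The values are assumptions for illustration.
constexpr int kMajorVersion = 1;
constexpr int kMinorVersion = 20;

// True only for the version whose file-handling behavior was reviewed.
constexpr bool IsAuditedVersion(int major, int minor) {
    return major == 1 && minor == 20;
}
}  // namespace sketch

// Fails the build when the bundled LevelDB is upgraded, so whoever performs the
// upgrade has to re-check the max_open_files assumptions and then bump the check.
static_assert(sketch::IsAuditedVersion(sketch::kMajorVersion, sketch::kMinorVersion),
              "LevelDB version changed: re-verify max_open_files assumptions");
```

The trade-off discussed in this thread applies: the check is deliberately strict, so builds against a different LevelDB version would have to patch it.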
21e2144 to ccedbaf
Rebased with master, as other dbwrapper changes have been merged since my last update. |
utACK ccedbaf - with the comment that I think relying on undocumented behavior of leveldb is risky. This is only remotely acceptable because we have our own leveldb tree (https://github.com/bitcoin-core/leveldb), and don't blindly merge changes from upstream. I also think you have properly documented this. |
ccedbaf Increase LevelDB max_open_files unless on 32-bit Unix. (Evan Klitzke)

Pull request description:

Currently we set `max_open_files = 64` on all architectures due to concerns about file descriptor exhaustion. This is extremely expensive due to how LevelDB is designed.

When a LevelDB file handle is opened, a bloom filter and block index are decoded, and some CRCs are checked. Bloom filters and block indexes in open table handles can be checked purely in memory. This means that when doing a key lookup, if a given table file may contain a given key, all of the lookup operations can happen completely in RAM until the block itself is fetched. In the common case fetching the block is one disk seek, because the block index stores its physical offset. This is the ideal case, and what we want to happen as often as possible.

If a table file handle is not open in the table cache, then in addition to the regular system calls to open the file, the block index and bloom filter need to be decoded before they can be checked. This is expensive and is something we want to avoid.

The current setting of 64 file handles means that on a synced node, only about 4% of key lookups can be satisfied by table file handles that are actually open and in memory.

The original concerns about file descriptor exhaustion are unwarranted on most systems because:

* On 64-bit POSIX hosts LevelDB will open up to 1000 file descriptors using `mmap()`, and it does not retain an open file descriptor for such files.
* On Windows non-socket files do not interfere with the main network `select()` loop, so the same fd exhaustion issues do not apply there.

This change keeps the default `max_open_files` value (which is 1000) on all systems except 32-bit POSIX hosts (which do not use `mmap()`). Open file handles use about 20 KB of memory (for the block index), so the extra file handles do not cause much memory overhead. At most 1000 will be open, and a fully synced node right now has about 1500 such files.

Profile of `loadblk` thread before changes: https://monad.io/maxopenfiles-master.svg
Profile of `loadblk` thread after changes: https://monad.io/maxopenfiles-increase.svg

Tree-SHA512: de54f77d57e9f8999eaf8d12592aab5b02f5877be8fa727a1f42cf02da2693ce25846445eb19eb138ce4e5045d1c65e14054df72faf3ff32c7655c9cfadd27a9
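The figures quoted in the description can be checked with back-of-the-envelope arithmetic. The sketch below uses only numbers from the text above (64 and 1000 handles, about 1500 table files, about 20 KB per open handle) and assumes lookups are spread evenly across files, which is a simplification. The function names are hypothetical.

```cpp
// Share of table files that fit in the table cache, in percent.
constexpr double OpenFilePercent(int max_open_files, int total_files) {
    return 100.0 * (max_open_files < total_files ? max_open_files : total_files) / total_files;
}

// Approximate memory held by open table handles, in kilobytes.
constexpr int HandleMemoryKB(int open_files, int kb_per_handle) {
    return open_files * kb_per_handle;
}

// 64 of roughly 1500 files is a little over 4%.
static_assert(OpenFilePercent(64, 1500) > 4.0 && OpenFilePercent(64, 1500) < 4.5, "about 4%");
// 1000 handles at roughly 20 KB each is about 20 MB.
static_assert(HandleMemoryKB(1000, 20) == 20000, "about 20 MB");
```

Under these assumptions, raising the limit from 64 to 1000 takes coverage from roughly 4% to roughly two thirds of the table files, at a cost of about 20 MB of memory.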
Changes are POSIX only, more on the subject here: bitcoin/bitcoin#12495
7ce17a6 Increase LevelDB max_open_files unless on 32-bit Unix. (Evan Klitzke) Pull request description: This change significantly increases IBD performance by increasing the amount of the UTXO index that can remain in memory. To ensure this doesn't cause problems in the future, a static_assert on the LevelDB version has been added, which must be updated by anyone upgrading LevelDB. In upstream, this resulted in a substantial difference in the sync time. A 50% reduction in the reindex time. So.. **2x speedup improvement**. Coming straight from bitcoin#12495. ACKs for top commit: Fuzzbawls: utACK 7ce17a6 random-zebra: ACK 7ce17a6 and merging... Tree-SHA512: 47ae5a621b8a9f59cf470ec3d290c274082229481a226e448781f2c747b0b52e8e4d2506b183a67cc046af96369806944ed9edcf10745146f04c75b1c4f5d7a8
Currently we set `max_open_files = 64` on all architectures due to concerns about file descriptor exhaustion. This is extremely expensive due to how LevelDB is designed.

When a LevelDB file handle is opened, a bloom filter and block index are decoded, and some CRCs are checked. Bloom filters and block indexes in open table handles can be checked purely in memory. This means that when doing a key lookup, if a given table file may contain a given key, all of the lookup operations can happen completely in RAM until the block itself is fetched. In the common case fetching the block is one disk seek, because the block index stores its physical offset. This is the ideal case, and what we want to happen as often as possible.

If a table file handle is not open in the table cache, then in addition to the regular system calls to open the file, the block index and bloom filter need to be decoded before they can be checked. This is expensive and is something we want to avoid.

The current setting of 64 file handles means that on a synced node, only about 4% of key lookups can be satisfied by table file handles that are actually open and in memory.

The original concerns about file descriptor exhaustion are unwarranted on most systems because:

* On 64-bit POSIX hosts LevelDB will open up to 1000 file descriptors using `mmap()`, and it does not retain an open file descriptor for such files.
* On Windows non-socket files do not interfere with the main network `select()` loop, so the same fd exhaustion issues do not apply there.

This change keeps the default `max_open_files` value (which is 1000) on all systems except 32-bit POSIX hosts (which do not use `mmap()`). Open file handles use about 20 KB of memory (for the block index), so the extra file handles do not cause much memory overhead. At most 1000 will be open, and a fully synced node right now has about 1500 such files.

Profile of `loadblk` thread before changes: https://monad.io/maxopenfiles-master.svg
Profile of `loadblk` thread after changes: https://monad.io/maxopenfiles-increase.svg
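Putting the description together, the platform logic might look like the sketch below. It uses a stand-in `OptionsSketch` struct instead of `leveldb::Options` so that it compiles without LevelDB, and it takes the platform facts as parameters so that both branches can be exercised. The names and the parameterization are assumptions, not the merged code.

```cpp
#include <cstddef>

// Stand-in for leveldb::Options; per the review discussion, the real
// constructor also defaults max_open_files to 1000.
struct OptionsSketch {
    int max_open_files = 1000;
};

// Keep the default everywhere except 32-bit POSIX builds, where (per the
// description above) LevelDB does not use mmap() for table files, so the
// original file descriptor concern still applies.
inline void SetMaxOpenFilesSketch(OptionsSketch* options, bool is_windows, std::size_t pointer_size) {
    if (!is_windows && pointer_size < 8) {
        options->max_open_files = 64;
    }
}
```

A caller would pass `sizeof(void*)` for `pointer_size` and derive `is_windows` from the build's `WIN32` define, for example `SetMaxOpenFilesSketch(&options, false, sizeof(void*))` on a POSIX build.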