
Increase LevelDB max_open_files #12495

Merged
merged 1 commit into from Mar 29, 2018

Conversation

eklitzke
Contributor

@eklitzke eklitzke commented Feb 20, 2018

Currently we set max_open_files = 64 on all architectures due to concerns about file descriptor exhaustion. This is extremely expensive due to how LevelDB is designed.

When a LevelDB file handle is opened, a bloom filter and block index are decoded, and some CRCs are checked. Bloom filters and block indexes in open table handles can be checked purely in memory. This means that when doing a key lookup, if a given table file may contain a given key, all of the lookup operations can happen completely in RAM until the block itself is fetched. In the common case fetching the block is one disk seek, because the block index stores its physical offset. This is the ideal case, and what we want to happen as often as possible.

If a table file handle is not open in the table cache, then in addition to the regular system calls to open the file, the block index and bloom filter need to be decoded before they can be checked. This is expensive and is something we want to avoid.
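The two paths described above can be sketched as a toy table cache (hypothetical names, not LevelDB's actual API): a hit means the decoded bloom filter and block index are already in RAM; a miss forces the expensive path of opening the file and re-decoding both.

```cpp
#include <map>
#include <string>

// Toy model of LevelDB's table cache, for illustration only.
// An open handle keeps the decoded block index (and bloom filter) in memory.
struct TableHandle {
    std::map<std::string, long> block_index; // key -> physical block offset
};

class TableCacheSketch {
public:
    explicit TableCacheSketch(int max_open_files) : max_open_(max_open_files) {}

    // Returns true on a cache hit: the bloom filter and block index can be
    // consulted purely in RAM. A miss models the expensive path: open the
    // file and decode its metadata, evicting an older handle if at capacity.
    bool Lookup(int file_number) {
        if (open_.count(file_number) != 0) return true;
        if (static_cast<int>(open_.size()) >= max_open_)
            open_.erase(open_.begin()); // evict (real LevelDB uses an LRU cache)
        open_[file_number] = TableHandle{}; // expensive: syscalls + decoding
        return false;
    }

private:
    int max_open_;
    std::map<int, TableHandle> open_;
};
```

With a small `max_open_files`, most lookups on a synced node take the miss path; raising the limit converts them into pure-RAM hits.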

The current setting of 64 file handles means that on a synced node, only about 4% of key lookups can be satisfied by table file handles that are actually open and in memory.

The original concerns about file descriptor exhaustion are unwarranted on most systems because:

  • On 64-bit POSIX hosts LevelDB will open up to 1000 file descriptors using mmap(), and it does not retain an open file descriptor for such files.
  • On Windows non-socket files do not interfere with the main network select() loop, so the same fd exhaustion issues do not apply there.

This change keeps the default max_open_files value (which is 1000) on all systems except 32-bit POSIX hosts (which do not use mmap()). Open file handles use about 20 KB of memory (for the block index), so the extra file handles do not cause much memory overhead. At most 1000 will be open, and a fully synced node right now has about 1500 such files.
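The platform rule described here can be written as a small sketch (an assumption about shape: the merged code expresses this via preprocessor checks and `sizeof(void*)` rather than a function like this):

```cpp
// Sketch of the selection rule: only 32-bit POSIX hosts, where LevelDB
// cannot use mmap() and each open table consumes a real fd, keep the
// conservative limit of 64. Everywhere else the leveldb::Options
// default of 1000 is left in place.
constexpr int SketchMaxOpenFiles(bool is_windows, int pointer_size_bytes) {
    if (!is_windows && pointer_size_bytes < 8)
        return 64;   // 32-bit POSIX: no mmap(), real fd per open table
    return 1000;     // leveldb::Options default
}

static_assert(SketchMaxOpenFiles(false, 8) == 1000, "64-bit POSIX: default");
static_assert(SketchMaxOpenFiles(false, 4) == 64,   "32-bit POSIX: capped");
static_assert(SketchMaxOpenFiles(true,  4) == 1000, "Windows: default");
```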

Profile of loadblk thread before changes: https://monad.io/maxopenfiles-master.svg
Profile of loadblk thread after changes: https://monad.io/maxopenfiles-increase.svg

@@ -71,14 +71,41 @@ class CBitcoinLevelDBLogger : public leveldb::Logger {
}
};

constexpr int PermissibleMaxOpenFiles() {
Member

Add static?

Contributor Author

The static specifier isn't necessary because constexpr functions are evaluated at compile time. I double checked that nm ./src/bitcoind | c++filt | grep Permissible produces no output.

@@ -71,14 +71,41 @@ class CBitcoinLevelDBLogger : public leveldb::Logger {
}
};

constexpr int PermissibleMaxOpenFiles() {
#ifdef _POSIX_C_SOURCE
Member

I don't think this does what you expect it to. As I understand it, confirmed by the gcc page about it, _POSIX_C_SOURCE is supposed to be defined explicitly at the top of the compilation unit to request certain POSIX C library features. It is not defined by the compiler. So it can't be used to detect whether the platform compiled to is POSIX.

In other places, we've just used #if[n]def WIN32, as WIN32 is the only non-POSIX platform supported in this project.
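A minimal sketch of the suggested detection (assuming, as the reviewer notes, that WIN32 is the only non-POSIX target):

```cpp
// WIN32 is defined by the project's Windows builds; everything else is
// treated as POSIX. This replaces the non-working _POSIX_C_SOURCE check,
// which user code must define itself -- compilers do not set it.
#ifdef WIN32
constexpr bool kIsPosixSketch = false;
#else
constexpr bool kIsPosixSketch = true;
#endif
```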

Contributor Author

You're absolutely right, I will fix this.

@eklitzke
Contributor Author

I measured the impact of this change, you can see a graph here: https://monad.io/max_open_files.png

Method for creating this graph:

  • Ran bitcoind -reindex-chainstate on a full node that had already been synced on mainnet
  • Instance is a GCP n1-standard-1 host which has one vCPU pinned to host CPU, 4 GB of RAM, and very slow virtualized disks

@laanwj
Member

laanwj commented Feb 22, 2018

I measured the impact of this change, you can see a graph here: https://monad.io/max_open_files.png

Nice!

@eklitzke
Contributor Author

The reason for this speedup isn't super obvious, and I think merits some explanation. I've spent a lot of time the last three days reading the LevelDB source code and think I understand now. It actually has nothing to do with reducing the number of open/close/mmap/munmap syscalls, as I had originally thought.

The LevelDB terminology for a .ldb file is a "table", and the tables are composed of 4 KB blocks. Each table contains a block index, which can be used to determine which block a given key will be in. The block index for a given table file is loaded into memory when that table is opened.

Increasing the max_open_files here forces Bitcoin to keep much more of the block index data actually loaded into memory. I think it might be worth actually increasing the internal limits of LevelDB on the number of mmap files beyond 1000 on 64-bit Unix platforms, as that would result in the entire block index being loaded into memory. However I think we should start with this, and then consider increasing the limits once I've done more investigation into what the overhead is.
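The block index described above can be modeled as an in-memory map from each block's last key to its file offset (a hypothetical structure for illustration; LevelDB's real index is a serialized block with restart points):

```cpp
#include <map>
#include <string>

// Toy block index: the last key stored in each ~4 KB block -> that
// block's physical offset in the .ldb file. Once this map is resident,
// locating the one candidate block for a key is a pure in-RAM search;
// only fetching that block touches disk.
using BlockIndexSketch = std::map<std::string, long>;

// Returns the offset of the block that may contain `key`, or -1 if the
// key sorts after every block in the table.
long CandidateBlockOffset(const BlockIndexSketch& index, const std::string& key) {
    auto it = index.lower_bound(key); // first block whose last key >= key
    return it == index.end() ? -1 : it->second;
}
```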

@donaloconnor
Contributor

@eklitzke Cheers for referencing #12123. Replying here to get subscribed.

I am going to do more tests on a Windows machine at the weekend with your changes here and report back my results.

I am pretty familiar with Windows internals so I'll do further investigations to see if we can have similar changes on Windows.

@donaloconnor
Contributor

I had a closer look at Windows and the way we use the win32 apis in LevelDB.

Observations:

  1. Win32RandomAccessFile does not use memory mapped files.
  2. The file opening and reading is done at the Win32 API level (CreateFile, ReadFile). There is no limit to the number of files that can be opened at this level (it uses Windows HANDLEs rather than fds). Limits exist at the C runtime library level, but those are irrelevant here.

I am confident that this increase to 1000 files will not affect Windows. I have already tested this but will do more tests tomorrow.

The Win32RandomAccessFile is as basic as it gets. Further optimizations can be made by using memory-mapped files (similar to the POSIX version). The random-access nature of lookups should favor memory-mapped files over regular file reads due to kernel-level caching.
I will modify this and see if any significant performance improvements are achieved. I reckon it will.

As I've mentioned in #12123, we achieve optimization at CPU level also by avoiding the CRC checksum check that's performed on each file open.

@eklitzke I hope you don't mind me replying on here. I'll continue to do more tests on the Windows sides of things. Any improvement to the IBD is a big win!

Cheers,
Donal

@eklitzke
Contributor Author

eklitzke commented Feb 26, 2018

I have a branch that measures how much extra memory this uses. On my node, the average block index size is 19797 bytes for the chainstate database. The bloom filters (which are also loaded in memory) are 6 bytes or 7 bytes depending on the file. Thus a reasonable approximation is 20 KB of data per chainstate .ldb file. This means on master the indexes use about 1 MB of memory, and with this branch they'll use about 20 MB of memory. If we loaded all of the indexes into memory (not possible right now due to the hard-coded 1000 file limit, but possible if we patched leveldb) that would be about 33 MB of memory.
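The arithmetic above can be checked at compile time, using the round ~20 KB-per-file figure from the measurement:

```cpp
// ~20 KB of decoded block index + bloom filter per open .ldb file
// (measured average: 19797 bytes of index plus a few bytes of filter).
constexpr long kIndexBytesPerFile = 20 * 1024;

constexpr long IndexMemoryBytes(long open_files) {
    return open_files * kIndexBytesPerFile;
}

static_assert(IndexMemoryBytes(64)   ==  1310720, "master: ~1.25 MB resident");
static_assert(IndexMemoryBytes(1000) == 20480000, "this branch: ~20 MB resident");
// All ~1500+ chainstate files resident would be on the order of 33 MB, which
// would require patching LevelDB's hard-coded 1000-file limit.
```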

@donaloconnor
Contributor

I ran bitcoind -reindex-chainstate on my Windows 10 PC with the full node synced.

First I ran using the default: max_open_files=64 (Blue series)
Next I ran using max_open_files=1000 (Red series).

The results show a substantial difference in sync time.

[image: reindex progress over time, max_open_files=64 (blue) vs max_open_files=1000 (red)]

Time to reach 100% with max_open_files=64 was: 12h 15m
Time to reach 100% with max_open_files=1000 was: 5h 40m

That's more than a 50% reduction in the time to reindex. I didn't expect such an improvement.

@achow101
Member

achow101 commented Mar 1, 2018

That's a pretty big speedup! Does this only work on 64 bit systems? Would it be possible to do it for all systems?

@eklitzke
Contributor Author

eklitzke commented Mar 2, 2018

At least for POSIX systems, the issue is that LevelDB won't use mmap() on 32-bit systems because the virtual memory space isn't large enough. The chainstate database is larger than 4 GB. On such hosts, each open file handle really does use up a file descriptor, which can obviously lead to problems in a select(2) loop if the fd count exceeds 1024. I haven't posted this data yet, but I also have data showing a large speedup on POSIX hosts that have intermediate disk characteristics to what I posted earlier. That said: I really cannot imagine many users are running Bitcoin on 32-bit hosts, and it's easy enough to have conditional logic in that case anyway.

I've been working on other LevelDB improvements that will result in even more significant improvements than what I've posted here, but they're more invasive changes. So I'm interested in getting this change (or a variant of it) merged before I try to go deeper into the UTXO system.

@donaloconnor Can you comment on whether select() loops have the same 1024 fd limit on Windows?

@luke-jr
Member

luke-jr commented Mar 2, 2018

Windows by default has a limit of 64 descriptors in select(), but:

  1. This is a limit on the number of descriptors added to the fd_set, not a limit on total descriptors in the process.
  2. We override FD_SETSIZE to 1024 in compat.h.

(POSIX is supposed to allow changing FD_SETSIZE too, but this is broken in some versions of glibc)

@donaloconnor
Contributor

donaloconnor commented Mar 2, 2018

Does this only work on 64 bit systems

LevelDB's Win32 code does not use memory-mapped files (unlike the POSIX version), so this change should improve both 32-bit and 64-bit versions of Windows.

@donaloconnor Can you comment on whether select() loops have the same 1024 fd limit on Windows?

What @luke-jr said above ^. Again since the LevelDB code for Win32 is using the Win32 ReadFile/CreateFile APIs, it should not affect any POSIX functions/fd descriptors.

I'm running with a limit of 64 open files again to see if I get similar results (12-hour sync time). I'd be really happy if someone else could also test this on Windows. I'm finding it hard to believe the 2x speedup, but it does seem to be the case.

@donaloconnor
Contributor

Interesting stats from observing Disk I/O reported by Task Manager:

max_open_files=64 - seems to hover around 2-6 MB/s, with bursts of 40+ MB/s
max_open_files=1000 - seemed to be consistently above 60 MB/s whenever I looked at it

This backs up the theory that since the files remain open, the file cache on Windows remains warm.

*Not scientific, just observations. Can do some real disk I/O profiling later.

@eklitzke
Contributor Author

eklitzke commented Mar 3, 2018

I don't think it's related to the disk cache; it's due to the increased availability of the deserialized block indexes (and bloom filters) in memory. At least on Unix, a file being open is unrelated to the availability of its contents in the page cache. In my measurements it takes four disk seeks just to open each .ldb file (related to parsing out the file header, bloom filter, and block index), and the information is stored on disk in a compressed form that has to be deserialized. But once the file is open and in the table cache, all that data is right there, ready to be accessed. I started writing up a blog post about this with details about how the table format works, how the LevelDB caches interact, and what's involved during key lookup. I'll try to finish that up in case people want to learn more about the details.

I'll update this PR to also increase the limit on Windows. I am interested in hearing if people would like to see any other changes. For instance, in IRC Greg asked whether the LevelDB API offers a way to find out, for a given platform, if the implementation is going to use up real file descriptors or not. There isn't, but I could patch LevelDB to expose that if that's something reviewers want to see.

@donaloconnor
Contributor

donaloconnor commented Mar 3, 2018

Yes, I'm also now inclined to believe that it's probably unrelated to disk cache. Actually, I've seen higher CPU utilization in the 1000-limit case (mine was about 60-80% CPU usage, whereas with the 64 limit it was mostly just 30%).

Anyway, I just finished the tests again a few minutes ago:

max_open_files=64 -reindex-chainstate time: 13h 30m
max_open_files=1000 -reindex-chainstate time: 6h 30m

Another 2x improvement.

I started writing up a blog post about this with details about how the table format works, how the LevelDB caches interact, and what's involved during key lookup. I'll try to finish that up in case people want to learn more about the details.

Looking forward to it 👍

@eklitzke eklitzke force-pushed the ldb_max_open_files branch 2 times, most recently from cfee017 to 6140df7 on March 3, 2018
@eklitzke
Contributor Author

eklitzke commented Mar 3, 2018

Can you let me know if the comment looks correct regarding Windows behavior?

// See PR #12495 for further discussion.
#ifndef WIN32
    if (sizeof(void*) < 8)
        options->max_open_files = 64;
Member

Prefer, to be safe:

if (sizeof(void*) < 8 || leveldb::kMajorVersion != 1 || leveldb::kMinorVersion < 3 || leveldb::kMinorVersion > 20) {
    options->max_open_files = 64;
}

static leveldb::Options GetOptions(size_t nCacheSize)
{
    leveldb::Options options;
    options.block_cache = leveldb::NewLRUCache(nCacheSize / 2);
    options.write_buffer_size = nCacheSize / 4; // up to two write buffers may be held in memory simultaneously
    options.filter_policy = leveldb::NewBloomFilterPolicy(10);
    options.compression = leveldb::kNoCompression;
    options.max_open_files = 64;
Contributor

max_open_files is not set explicitly before calling SetMaxOpenFiles. Probably should be set in SetMaxOpenFiles explicitly to 1000 before the #ifndef

Member

The constructor for leveldb::Options sets it to 1000.

Contributor

I didn't scroll far enough down :) - you're right.

@donaloconnor
Contributor

Can you let me know if the comment looks correct regarding Windows behavior?

Comment LGTM!

@luke-jr
Member

luke-jr commented Mar 3, 2018

Might be good to log somewhere which limit is being used...

#ifndef WIN32
    if (sizeof(void*) < 8 || leveldb::kMajorVersion != 1 || leveldb::kMinorVersion < 3 || leveldb::kMinorVersion > 20) {
        options->max_open_files = 64;
        LogPrint(BCLog::LEVELDB, "LevelDB max_open_files=%d\n", options->max_open_files);
    } else {
        LogPrint(BCLog::LEVELDB, "LevelDB max_open_files=%d (default)\n", options->max_open_files);
    }
#endif

@eklitzke
Contributor Author

eklitzke commented Mar 3, 2018

I've added the log statement. I also added a static_assert() on the exact LevelDB version. That way in the future if someone upgrades LevelDB, they will be forced to check that the POSIX RandomAccessFile implementation hasn't changed in a way that will cause fd exhaustion issues. What do you think of this approach?
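A sketch of the version booby trap (leveldb does expose `kMajorVersion`/`kMinorVersion` in its public headers; stand-in constants are used here so the snippet is self-contained, and the exact wording is an assumption):

```cpp
// Stand-ins for leveldb::kMajorVersion / leveldb::kMinorVersion, which
// normally come from <leveldb/db.h>.
namespace leveldb_sketch {
constexpr int kMajorVersion = 1;
constexpr int kMinorVersion = 20;
}

// Anyone upgrading LevelDB must revisit this assertion, and in doing so
// re-check that the POSIX RandomAccessFile implementation still uses
// mmap() (i.e. does not hold a real fd per open table) before bumping
// the pinned version.
static_assert(leveldb_sketch::kMajorVersion == 1 &&
              leveldb_sketch::kMinorVersion == 20,
              "Re-check LevelDB fd behavior after upgrading");
```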

@luke-jr
Member

luke-jr commented Mar 3, 2018

It's a booby trap that everyone using system LevelDB libraries will need to patch out. But maybe a good idea anyway, to reinforce the danger of using other versions. It should probably at least tolerate older versions back to 1.3 (which added mmap support), though.

@eklitzke
Contributor Author

eklitzke commented Mar 3, 2018

Booby trap is intentional. I couldn't figure out how to make git log -GkMinorVersion src/leveldb work correctly with merge commits/subtrees, so I did this manually. From what I can tell the update history for LevelDB in the Bitcoin source tree is:

  • 2012-06-25 initial import of 1.5
  • 2012-12-12 upgrade to 1.7
  • 2013-08-24 upgrade to 1.13
  • 2013-12-12 upgrade to 1.15
  • 2014-05-09 upgrade to 1.17
  • 2014-10-16 upgrade to 1.18
  • 2016-12-01 upgrade to 1.19
  • 2017-06-08 upgrade to 1.20

Given that we only do the update once or twice a year I think it's reasonable to keep in there and leave it as a manual step for the person doing the upgrade. On the other hand, this might be overly paranoid because:

  • It's unlikely that upstream would stop using mmap, given that it's the obvious way to do things for speed reasons
  • Since the new logic is to use the default value in most cases, it seems unlikely that upstream would make a change that causes the default LevelDB options to use drastically different resources like fds

LogPrint(BCLog::LEVELDB, "LevelDB max_open_files=%d\n", options->max_open_files);
}
#endif
LogPrint(BCLog::LEVELDB, "LevelDB max_open_files=%d (default)\n", options->max_open_files);
Member

This will log twice!

@eklitzke
Contributor Author

@donaloconnor As you acked this, can you change your review status? It still shows up in GitHub as "requested changes".

@laanwj Sorry to ping you on this again, but you requested changes on this earlier and I think this is ready to go (unless you have further objections). I'm trying to not stack PRs in the dbwrapper code, and this is preventing me from submitting further changes for review.

This change significantly increases IBD performance by increasing the
amount of the UTXO index that can remain in memory. To ensure this
doesn't cause problems in the future, a static_assert on the LevelDB
version has been added, which must be updated by anyone upgrading
LevelDB.
@eklitzke
Contributor Author

Rebased with master, as other dbwrapper changes have been merged since my last update.

@laanwj
Member

laanwj commented Mar 29, 2018

utACK ccedbaf - with the comment that I think relying on undocumented behavior of leveldb is risky. This is only remotely acceptable because we have our own leveldb tree (https://github.com/bitcoin-core/leveldb), and don't blindly merge changes from upstream. I also think you have properly documented this.

@laanwj laanwj merged commit ccedbaf into bitcoin:master Mar 29, 2018
laanwj added a commit that referenced this pull request Mar 29, 2018
ccedbaf Increase LevelDB max_open_files unless on 32-bit Unix. (Evan Klitzke)

Tree-SHA512: de54f77d57e9f8999eaf8d12592aab5b02f5877be8fa727a1f42cf02da2693ce25846445eb19eb138ce4e5045d1c65e14054df72faf3ff32c7655c9cfadd27a9
sidhujag added a commit to syscoin/syscoin that referenced this pull request Jun 5, 2018
underdarkskies referenced this pull request in underdarkskies/Ravencoin Jul 19, 2018
underdarkskies referenced this pull request in underdarkskies/Ravencoin Aug 1, 2018
underdarkskies referenced this pull request in underdarkskies/Ravencoin Aug 1, 2018
giaki3003 added a commit to LUX-Core/lux that referenced this pull request May 3, 2019
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Sep 27, 2020
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Sep 27, 2020
ccedbaf Increase LevelDB max_open_files unless on 32-bit Unix. (Evan Klitzke)

PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Oct 14, 2020
ccedbaf Increase LevelDB max_open_files unless on 32-bit Unix. (Evan Klitzke)

random-zebra added a commit to PIVX-Project/PIVX that referenced this pull request Apr 17, 2021
7ce17a6 Increase LevelDB max_open_files unless on 32-bit Unix. (Evan Klitzke)

Pull request description:

  This change significantly increases IBD performance by increasing the amount of the UTXO index that can remain in memory. To ensure this doesn't cause problems in the future, a static_assert on the LevelDB version has been added, which must be updated by anyone upgrading LevelDB.

  In upstream, this resulted in a substantial improvement in sync time: roughly a 50% reduction in reindex time, i.e. a **2x speedup**.

  Coming straight from bitcoin#12495.

ACKs for top commit:
  Fuzzbawls:
    utACK 7ce17a6
  random-zebra:
    ACK 7ce17a6 and merging...

Tree-SHA512: 47ae5a621b8a9f59cf470ec3d290c274082229481a226e448781f2c747b0b52e8e4d2506b183a67cc046af96369806944ed9edcf10745146f04c75b1c4f5d7a8
gades pushed a commit to cosanta/cosanta-core that referenced this pull request Jun 24, 2021
ccedbaf Increase LevelDB max_open_files unless on 32-bit Unix. (Evan Klitzke)

@bitcoin bitcoin locked as resolved and limited conversation to collaborators Sep 8, 2021