
Why libtorrent 2.0's use of memory mapped files was a bad idea #7551

Open
SemiAccurate opened this issue Nov 15, 2023 · 14 comments
@SemiAccurate

A paper published last year goes into detail about why database systems that use mmap in lieu of implementing a buffer pool inevitably run into problems from both a performance and a correctness perspective. Of the four specific issues it details, the first, transactional safety, does not apply to libtorrent, since it is not a database in the conventional sense but works with immutable torrent files. But many of the reports the paper references are eerily similar to issues reported by people here (e.g. "They also faced other issues when running in containerized environments or on machines without direct-attached storage"; see #7480). And let's not even detail the additional issues this approach runs into on Windows, as reported by people here.

The paper Are You Sure You Want to Use MMAP in Your DBMS? says this in its abstract:

"mmap’s perceived ease of use has seduced DBMS developers for decades as a viable alternative to implementing a buffer pool. There are, however, severe correctness and performance issues with mmap that are not immediately apparent. Such problems make it difficult, if not impossible, to use mmap correctly and efficiently in a modern DBMS. In fact, several popular DBMSs initially used mmap to support larger-than-memory databases but soon encountered these hidden perils, forcing them to switch to managing file I/O themselves after significant engineering costs."

Though the paper has its detractors, namely the creators of the LMDB and RDB projects, both of which use mmap (see an attempt at a rebuttal here), none of them disagree that if you want substantive control over I/O behavior, you have to implement it yourself in user space rather than rely on the OS to do it for you.

@HanabishiRecca
Contributor

I think it can be salvaged, though. I recently described how in #6667 (comment).

@ukoz

ukoz commented Nov 16, 2023

Compile 2.0 with TORRENT_HAVE_MMAP=0 and TORRENT_HAVE_MAP_VIEW_OF_FILE=0.
mmap and multithreaded I/O solve the large-torrent problems, which you will probably never encounter in "containerized environments or on machines without direct-attached storage".
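For anyone wanting to try this, here is a sketch of one way to pass those defines; the exact invocation depends on your libtorrent version and build system (this assumes a CMake build and simply injects the macros via compiler flags):

```shell
# Hypothetical CMake invocation: force-disable both mmap backends at compile time,
# so libtorrent should fall back to its portable read/write code path.
cmake -B build \
  -DCMAKE_CXX_FLAGS="-DTORRENT_HAVE_MMAP=0 -DTORRENT_HAVE_MAP_VIEW_OF_FILE=0"
cmake --build build
```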

@SemiAccurate
Author

Compile 2.0 with TORRENT_HAVE_MMAP=0 and TORRENT_HAVE_MAP_VIEW_OF_FILE=0.

This would still have very poor performance compared to libtorrent 1.2's disk I/O subsystem.

mmap and multithreaded I/O solve the large-torrent problems, which you will probably never encounter in "containerized environments or on machines without direct-attached storage".

Those on network shares have evidently encountered these issues... among others.

@SemiAccurate
Author

I think it can be salvaged, though. I recently described how in #6667 (comment).

@HanabishiRecca Arvid has been trying to 'salvage' the mmap implementation for literally three years now, but it hasn't been successful. Mind you, a lot of these knobs are Linux-only and do nothing on the other OSes. Ultimately the lesson from the paper is that to have control over I/O behavior in your program, you really have to do it yourself in user space.

@HanabishiRecca
Contributor

HanabishiRecca commented Nov 17, 2023

Well, yeah. I tried to address the excessive memory usage in particular.
I have heard about the performance issues as well, especially on HDDs.
I am not an expert in this topic, but it's kind of strange that OSes perform so poorly. Maybe mmap-ed files were never meant to be used for intensive I/O tasks.

But Arvid is kinda stubborn in this regard and I doubt we will see "back to the roots" any time soon.

@SemiAccurate
Author

@arvidn the paper outlines the cases of many DBMS projects that initially opted for mmap but then switched away once its limitations became clear and they needed control over I/O performance. Your case with libtorrent is unique in that your trajectory is the reverse of many of these projects: you started out with your own buffer pool implementation, managing file I/O in user space, but then switched to mmap with 2.0. What were the reasons that led you to make this curious decision?

@arvidn
Owner

arvidn commented Nov 19, 2023

Arvid has been trying to 'salvage' the mmap implementation for literally three years now, but it hasn't been successful.

Contributions are welcome!

Mind you, a lot of these knobs are Linux-only and do nothing on the other OSes.

Windows does have some counterparts; FlushViewOfFile(), for example, is roughly the equivalent of msync().

But Arvid is kinda stubborn in this regard and I doubt we will see "back to the roots" any time soon.

I'm working on it. It's not easy to get right and efficient, contributions are welcome:
#7013

What were the reasons that led you to make this curious decision?

They are mostly documented here: https://github.com/arvidn/libtorrent/wiki/memory-mapped-I-O

One aspect was that balancing the size of the write cache, read cache and read-back avoidance (i.e. blocks that will need to be read back from disk in order to compute their piece hash) is not possible to do well in user space. It turns out it's not so easy in kernel space either though.

Another aspect was the emergence of fast SSDs and persistent memory (DAX), which would most likely be much more efficient to access via memory-mapped files.

The major failure case of mmap (afaict) is in network mounted drives, or any FUSE drive. On these drives, writing a partial page in a memory mapped file becomes very expensive, as it needs to pull the page from the network, overwrite part of it, and then flush the whole page back again over the network. Preserving the fidelity of exactly which bytes are being written helps tremendously in this scenario.

@HanabishiRecca
Contributor

I'm working on it.

I'm personally fine with POSIX I/O. The OS filesystem cache does quite a good job. (Even prior to LT 2.0 I had the in-client cache disabled anyway.)
Some people report UI freezes in qBittorrent with it (presumably Windows users?), but I never faced that problem.

It's not easy to get right and efficient, contributions are welcome:

I would have helped, but I'm not a C++ guy.

@SemiAccurate
Author

I'm working on it. It's not easy to get right and efficient, contributions are welcome: #7013

Hmm, as this new implementation is taking a while, why not just copy the disk I/O subsystem from 1.2 wholesale for 2.1, and then work on the new implementation afterwards?

Because you had at first said that this new implementation you're working on wouldn't cache blocks, though more recently you've stated that it would use caching. So it seems to be getting more complex over time... in the interest of pragmatism, is it not prudent to use 1.2's I/O subsystem for now?

What were the reasons that led you to make this curious decision?

They are mostly documented here: What were the reasons that led you to make this curious decision?

Did you mean to include a link here Arvid? I don't see it :(

One aspect was that balancing the size of the write cache, read cache and read-back avoidance (i.e. blocks that will need to be read back from disk in order to compute their piece hash) is not possible to do well in user space. It turns out it's not so easy in kernel space either though.

Another aspect was the emergence of fast SSDs and persistent memory (DAX), which would most likely be much more efficient to access via memory-mapped files.

Hmm, how far out was this DAX persistent memory for consumer PCs in this idealized future scenario? Just asking because I have never heard of DAX, and if it ever comes to consumer PCs, it seems it will take a long time.

Maybe you were getting ahead of things with respect to future hardware developments with the memory mapped implementation?

The major failure case of mmap (afaict) is in network mounted drives, or any FUSE drive. On these drives, writing a partial page in a memory mapped file becomes very expensive, as it needs to pull the page from the network, overwrite part of it, and then flush the whole page back again over the network. Preserving the fidelity of exactly which bytes are being written helps tremendously in this scenario.

It's a pity I don't know C++ :(

@arvidn
Owner

arvidn commented Nov 20, 2023

Hmm as this new implementation is taking a while, why not just copy wholesale the disk I/O subsystem from 1.2 for 2.1 and then you can work on this new implementation afterwards?

Because a lot of other things have changed around it. The 1.2 implementation doesn't fit in 2.0+.

Either option is a lot of work, and I don't have a lot of time.

Did you mean to include a link here Arvid? I don't see it :(

Yes, that was a copy-paste failure. I updated my post.

Hmm how far out was this DAX persistent memory for consumer PCs in this idealized future scenario? Just asking because I have never heard of DAX and if it comes to consumer PCs it seems it will take a long time, if ever.

It seems Intel Optane kind of failed in the market too.

It's a pity I don't know C++ :(

It's never too late to start!

@HanabishiRecca
Contributor

Another aspect was the emergence of fast SSDs

The thing is, most heavy-lifting seeders still use HDDs, simply because of the huge amounts of storage required. I know people seeding tens or even hundreds of terabytes of data, with 10000+ tasks in a single client. And I don't think they will switch soon, as SSD space remains significantly more expensive.

@arvidn
Owner

arvidn commented Nov 20, 2023

Because you had at first said (#7013 (comment)) that this new implementation you're working on wouldn't cache blocks, though more recently you've stated (#7480 (comment)) that it would use caching.

My current plan is to only have a store-buffer and rely on the operating system for read cache.

@ukoz

ukoz commented Nov 23, 2023

emergence of fast SSDs and persistent memory (DAX)

Persistent memory modules are Intel-server only; you put them in DIMM slots, and only Intel® Xeon® CPUs have hardware support for PMEM in the memory controller.
Newer SSDs aren't improving here either, since they are less durable due to storing more bits per cell.

@SemiAccurate
Author

What were the reasons that led you to make this curious decision?

They are mostly documented here: https://github.com/arvidn/libtorrent/wiki/memory-mapped-I-O

So @arvidn, this wiki page is mostly about the how, not the why. The first three lines give the goals, but everything that follows is about how the implementation works. There isn't any detailed reasoning about why the memory-mapped design was adopted in the first place, with a careful exploration of all the pros and cons. Perhaps this was never done, which would explain all the problems since...

Because you had at first said (#7013 (comment)) that this new implementation you're working on wouldn't cache blocks, though more recently you've stated (#7480 (comment)) that it would use caching.

My current plan is to only have a store-buffer and rely on the operating system for read cache.

Huh, has your current plan changed from a few months ago? Because in September you wanted to use pread (in addition to pwrite), so apparently no memory mapping in the read path. Though back in August 2022 you intended to only have a store-buffer for writes.

It's a pity I don't know C++ :(

It's never too late to start!

Oh, I've tried, Arvid, but man is it hard! And modern C++ is a career: all of C++17 and its idioms, plus the other things you use in your codebase like boost.asio, whose documentation is horrendous! Not for newbies at all!

As an example of how difficult it is for me: in a November PR someone mentioned adding support for passing client_data_t to flush_cache. Curious, I looked into your docs to learn what this client_data_t is. I looked at your interface definition for it on the Add Torrent reference page, but all that template magic befuddled me (compared to the simple, though untyped, void* you apparently used in LT 1.2).

Nonetheless, being the fool that I am, as a small exercise I tried to code up an assignment to this data type and a corresponding get() to see if I could work with it. Yet the compiler kept giving me errors, with template messages that were indecipherable to me. I couldn't get it to work. I then tried to find more documentation on how this type worked. In your Upgrading to LT 2.0 doc you say that the type is similar to std::any, so I searched for that and tried to understand the std::any idiom of modern C++, hoping to get my small example using your client_data_t type to compile. Yet try as I might, I still could not get it to compile with your type, as opposed to a simple void*! It was a very frustrating experience and I gave up.

Mind you, this is a very small idiom of the C++ you use in your code, and in the end I could not grok it well enough to produce working code, no matter how much I tried. And there are so many much bigger pieces of modern C++ you use, not to mention the boost.asio library, which is quite formidable to grok in and of itself.
