DRAFT: Questions for asynchronous IO #6208
(force-pushed from 00ce657 to 3134c17)
@@ -213,6 +213,8 @@ rule linking ( properties * )
    result += <library>/try_signal//try_signal/<link>static ;
}

result += <linkflags>"-lxnvme -laio -luuid -lnuma -pthread -lrt" ;
What about android/windows/darwin? Will it build there?
We are currently working on Windows support for xNVMe and expect a release in a couple of months. It'll be using I/O Completion Ports for the async IO.
xNVMe is currently untested on Darwin, but does run on FreeBSD. Android is also untested, but at least the POSIX interfaces should work, and then depending on the kernel it should support libaio and io_uring. I expect that we'll get test coverage for both Darwin and Android :)
This is quite exciting!
I haven't had time to go through everything in detail yet, and I will have more questions later, but I figured I shouldn't delay posting these too long.
I'm curious about your take on async disk I/O. I kind of dismissed it for these reasons:
- open()/close()/stat() are all synchronous in all async I/O models (iirc)
- I expect storage to be moving towards Direct Access, i.e. where it's treated more like persistent RAM, mapped into the address space. With this in mind, multi-threaded mmap seems to be somewhat future proof.
src/xnvme_disk_io.cpp (outdated)
auto const len2 = v2_block ? std::min(default_block_size, piece_size2 - offset) : 0;

iovec_t b = {buffer.data(), std::max(len, len2)};
int const ret = st->readv(m_settings, b, piece, offset, error);
I imagine you could do something similar to the regular async read case here, and move the hashing to the completion handler, right? That would avoid a blocking read call.
I haven't worked on this yet, but it should definitely be possible.
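The control flow being suggested could look roughly like the sketch below. All the names here (`in_memory_disk`, `toy_hash`, `hash_via_completion`) are hypothetical stand-ins, not libtorrent's real API; the point is only that hashing happens inside the completion handler rather than after a blocking read.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

using buffer = std::vector<std::uint8_t>;
using completion_handler = std::function<void(buffer const&)>;

// Pretend "disk": an in-memory byte array standing in for the storage backend.
struct in_memory_disk
{
    buffer data;

    // Models an async readv: in a real backend the handler would run when the
    // I/O completes; here it is invoked immediately to keep the sketch simple.
    void async_read(std::size_t offset, std::size_t len, completion_handler h) const
    {
        buffer b(data.begin() + offset, data.begin() + offset + len);
        h(b);
    }
};

// Toy stand-in for piece hashing (real code would use SHA-1/SHA-256).
std::uint64_t toy_hash(buffer const& b)
{
    return std::accumulate(b.begin(), b.end(),
        std::uint64_t(14695981039346656037ull),
        [](std::uint64_t h, std::uint8_t c) { return (h ^ c) * 1099511628211ull; });
}

// Hash inside the completion handler, so the caller never blocks on the read.
std::uint64_t hash_via_completion(in_memory_disk const& d, std::size_t off, std::size_t len)
{
    std::uint64_t result = 0;
    d.async_read(off, len, [&](buffer const& b) { result = toy_hash(b); });
    return result;
}
```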
xnvme_storage* st = m_torrents[storage].get();
storage_error ec;
status_t ret;
std::tie(ret, p) = st->move_storage(p, flags, ec);
this is still blocking, as I'm sure you understand. I think it's fine to leave "fringe" features like this blocking. However, if this becomes "production quality", it would probably have to move to a separate thread, as it would otherwise block the network loop.
Yep, agreed. I was considering doing the read/write IOs asynchronously and using a thread pool to perform the file system operations
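That split could be sketched with nothing but the standard library: blocking file-system operations (open/close/stat/move_storage) get submitted to a small pool and return futures, while the read/write IOs would stay on the async path. The name `fs_thread_pool` is illustrative, not libtorrent's.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class fs_thread_pool
{
public:
    explicit fs_thread_pool(int n)
    {
        for (int i = 0; i < n; ++i)
            m_threads.emplace_back([this] { run(); });
    }

    ~fs_thread_pool()
    {
        {
            std::lock_guard<std::mutex> l(m_mutex);
            m_stop = true;
        }
        m_cond.notify_all();
        for (auto& t : m_threads) t.join();
    }

    // Submit a blocking operation; the caller gets a future for its result.
    template <typename F>
    auto submit(F f) -> std::future<decltype(f())>
    {
        auto task = std::make_shared<std::packaged_task<decltype(f())()>>(std::move(f));
        auto fut = task->get_future();
        {
            std::lock_guard<std::mutex> l(m_mutex);
            m_queue.push([task] { (*task)(); });
        }
        m_cond.notify_one();
        return fut;
    }

private:
    void run()
    {
        for (;;)
        {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> l(m_mutex);
                m_cond.wait(l, [this] { return m_stop || !m_queue.empty(); });
                if (m_stop && m_queue.empty()) return;
                job = std::move(m_queue.front());
                m_queue.pop();
            }
            job();
        }
    }

    std::vector<std::thread> m_threads;
    std::queue<std::function<void()>> m_queue;
    std::mutex m_mutex;
    std::condition_variable m_cond;
    bool m_stop = false;
};
```

A blocking call like move_storage would then be wrapped as `pool.submit([&] { return st->move_storage(...); })` while the network thread keeps running.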
std::string m_xnvme_backend;
};

TORRENT_EXPORT std::unique_ptr<disk_interface> xnvme_disk_io_constructor(
instead of making this a free function, you could make it a function object. That way, you can pass arbitrary arguments into the disk subsystem implementation, for example the xNVMe backend, and you don't need a new `settings_pack` field nor a new field in `session_params`.

You can take a look at this example, where I use a custom disk I/O for simulation tests. The `test_disk` object is both a kind of parameter pack and a function object: it implements `operator()` (implemented here). To use it, you construct a `test_disk` object and assign it to `session_params::disk_constructor` (example).
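A rough sketch of the function-object idea. The types below are stand-ins so the shape compiles on its own; in libtorrent these would be `io_context`, `settings_interface`, `counters` and `disk_interface`, and the exact `operator()` signature here is illustrative, not copied from the headers.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct io_context {};
struct settings_interface {};
struct counters {};

struct disk_interface
{
    virtual ~disk_interface() = default;
};

struct xnvme_disk_io : disk_interface
{
    explicit xnvme_disk_io(std::string backend) : m_backend(std::move(backend)) {}
    std::string m_backend;
};

// The function object replacing the free-function constructor: it carries the
// xNVMe backend string as state, so no new settings_pack or session_params
// field is needed. An instance would be assigned to the session's disk
// constructor slot.
struct xnvme_disk_constructor
{
    std::string backend = "io_uring"; // hypothetical default

    std::unique_ptr<disk_interface> operator()(
        io_context&, settings_interface const&, counters&) const
    {
        return std::make_unique<xnvme_disk_io>(backend);
    }
};
```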
auto search = m_file_handles->find(fname);
if (search != m_file_handles->end()) {
    return search->second;
are all file handles opened in both read and write mode? If not, you may get a read-only file here despite asking for write mode. If files are always opened in read/write mode, you may fail to just seed files from read-only media, or from volumes your user only has read access to.
If you need a more sophisticated file cache, see `file_view_pool`.
> are all file handles opened in both read and write mode?

Currently they are.

> If files are always opened in read/write mode, you may fail to just seed files from read-only media, or from volumes your user only has read access to.

Great point! I'll look into the `file_view_pool` cache.
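One way to handle the read-only-media case is to prefer read/write access and fall back to read-only on the relevant errors, so seeding still works. A POSIX-only sketch with minimal error handling; this is an assumption about one possible approach, not how libtorrent's `file_view_pool` actually works.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <fcntl.h>
#include <unistd.h>

struct open_result
{
    int fd;        // -1 on failure
    bool writable; // whether write access was actually granted
};

open_result open_for_io(char const* path)
{
    // Prefer read/write, so the handle can serve both readv and writev.
    int fd = ::open(path, O_RDWR);
    if (fd >= 0) return {fd, true};

    // Fall back to read-only on permission or read-only-filesystem errors,
    // which is enough for seeding from read-only media.
    if (errno == EACCES || errno == EROFS || errno == EPERM)
    {
        fd = ::open(path, O_RDONLY);
        if (fd >= 0) return {fd, false};
    }
    return {-1, false};
}
```

A cache keyed on file name would additionally need to track which mode each cached handle was opened in, so a write request never gets handed a read-only descriptor.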
});

int res = m_torrents[storage]->writev(m_settings, b, r.piece, r.start, *error, whandler);
TORRENT_ASSERT(res >= 0);
It looks like both `writev()` and `readv()` have failure paths where the handler is not called, which is problematic. It would also trigger this assert.
First of all, thanks for the great comments on the code! I'll make sure to address everything noted. Regarding your questions about async disk I/O:

Yep, that's pretty much the case, except on Linux, which can do some of these things via io_uring. But nothing is defined in POSIX or otherwise generally available in a portable fashion. I'm wondering why the lack of asynchronous versions of those calls is enough to dismiss async I/O, though. There must be some part of the workflow that I'm missing. For example, is torrenting largely bottlenecked by metadata I/O operations? My gut feeling is that caching file descriptors could amortize the cost of blocking metadata operations enough that the increased performance from async read/write calls would more than make up for it. But I'm basing this on the assumption that many torrents contain files big enough to benefit from a higher I/O depth. This gut feeling is based on nothing, so some of your insight on this would be much appreciated!

What do you mean by Direct Access? Bypassing the file system using raw block devices? New hardware such as persistent memory? Something else? Generally, I think providing an IO path that utilizes async I/O will be an excellent long-term solution, because the $/GB of HDDs and SSDs is likely to stay much lower than that of hardware which natively supports byte-addressing/unaligned access.
If I made blocking calls in the network thread, yes, it would be.
It's not a show stopper, you just have to make those calls in a thread pool. However, once you have a thread pool, it's natural to use that for the actual I/O as well. Synchronizing opening and closing files with issuing async reads and writes becomes complicated. Additionally, hashing pieces ought to be done in separate threads too, so it's hard to escape the thread pool.
Any blocking call is problematic in the main thread, since it would prevent all sockets from responding to events and peer requests from being serviced. I think most torrents probably do have a few large files, but there are so many torrents in the world that even the minority of many-small-files torrents is still a large number. Both extremes need to be handled well.
By the way, please don't interpret any of my comments as discouragement!
Definitely not, your honest feedback is much appreciated!
I'm currently working towards making reads and writes truly asynchronous, i.e. moving the current usage of […]

I see two approaches: […]

I've been wondering: how do you do performance testing of the disk I/O? What would you recommend?
There's this python script that runs transfer benchmarks. It covers more than just the disk I/O, but I would expect that to be an important part of it. The script is not run regularly and may have experienced some bit rot, and I would also expect you'd want to tweak certain aspects of it for your specific setup. https://github.com/arvidn/libtorrent/blob/RC_2_0/tools/run_benchmark.py

There's also a test that benchmarks checking of files, which is much more disk I/O centric. https://github.com/arvidn/libtorrent/blob/RC_2_0/tools/benchmark_checking.py
I'll try to play around with this, thank you!
(force-pushed from 61fc43a to f401481)
Hey again! I've been attempting to make the IOs "truly" async by not forcing IO reaping inside […]

To get the async IOs started, I went with your suggestion of having a separate thread do the polling. As xNVMe IO queues aren't thread safe, this has required me to add some locking. I think the locking is currently a bit too pessimistic, but an iteration or two should make it clear whether that's the case. Overall I think this works well. I've messed around with the implementation enough to get the […]

My current problem (or a symptom of my problem) is that when I run […] The […]

Ah, and I definitely see how messed up my implementation of […]

My current focus is to try to get torrents downloading without seeing any data in the "fail" bucket. My theory is that the problem is with […]
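The submit-and-poll split with a shared lock could be modeled roughly as below. Everything here is a stand-in: `fake_io_queue` stands in for a non-thread-safe xNVMe queue, its `poke()` is loosely modeled on xNVMe's queue-poking call, and completions are immediate only to keep the sketch self-contained. The single mutex shared by the submitting side and the polling thread is the "pessimistic" locking mentioned above.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Stand-in for a non-thread-safe IO queue.
struct fake_io_queue
{
    std::queue<std::function<void()>> completions; // handlers ready to fire

    void submit(std::function<void()> handler)
    {
        // A real backend would post the I/O here; completion is immediate
        // in this sketch.
        completions.push(std::move(handler));
    }

    // Fire up to max ready completions, return how many fired.
    int poke(int max)
    {
        int n = 0;
        while (n < max && !completions.empty())
        {
            auto h = std::move(completions.front());
            completions.pop();
            h();
            ++n;
        }
        return n;
    }
};

class io_poller
{
public:
    io_poller() : m_thread([this] { run(); }) {}

    ~io_poller()
    {
        m_stop = true;
        m_thread.join();
    }

    void submit(std::function<void()> handler)
    {
        std::lock_guard<std::mutex> l(m_mutex); // queue is not thread safe
        m_queue.submit(std::move(handler));
    }

private:
    void run()
    {
        while (!m_stop)
        {
            {
                std::lock_guard<std::mutex> l(m_mutex);
                m_queue.poke(64);
            }
            std::this_thread::yield(); // busy-poll; pessimistic, as noted
        }
    }

    fake_io_queue m_queue;
    std::mutex m_mutex;
    std::atomic<bool> m_stop{false};
    std::thread m_thread;
};
```

A less pessimistic iteration might use per-queue locks, or hand each submitting thread its own queue so only reaping needs coordination.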
The code is not currently "truly async" as it waits on all outstanding IOs before returning to the caller. The performance should improve substantially once this behavior is changed to "truly async".
WIP: refactor and test prepare_ios
WIP: remove unused storage_error parameter from readv2, writev
(force-pushed from f401481 to dc295cc)
One thing that makes […] In the […] It's similar to store buffers in CPUs.
Ahhh, this makes a lot of sense! That could definitely explain why I'm seeing random fails. Thanks for the hint, I'll look into this!
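The store-buffer analogy could be sketched like this (types and keys are illustrative, not libtorrent's): while a write is in flight, a read of the same block is served from the pending write buffer rather than from disk, otherwise the read can observe stale data and the piece hash check randomly fails.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using block_key = std::pair<int, int>; // (piece, offset)
using buffer = std::vector<std::uint8_t>;

class store_buffer
{
public:
    // Called when a write is issued, before it completes on disk.
    void insert(block_key k, buffer b) { m_pending[k] = std::move(b); }

    // Called from the write completion handler, once the data is on disk.
    void erase(block_key k) { m_pending.erase(k); }

    // Reads consult the pending writes first; on a miss the caller falls
    // back to reading from disk.
    bool read(block_key k, buffer& out) const
    {
        auto it = m_pending.find(k);
        if (it == m_pending.end()) return false;
        out = it->second;
        return true;
    }

private:
    std::map<block_key, buffer> m_pending;
};
```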
Hello,
As noted in the title, this PR is a draft and is not meant to be merged. I'm posting my progress in adding asynchronous IOs to libtorrent and want to give some context for a few questions I have regarding whether and how to continue this work.
For context, I saw your blog post from 2012 about async IOs and got interested in trying to improve the way that IOs are done in libtorrent. I know I'm almost 10 years late to the party, but it seems to me that the implementation of IOs in libtorrent hasn't architecturally changed a lot from what you described in that post.
I'm working on the open source project xNVMe, which provides a simple API that abstracts over underlying IO paths such as io_uring, libaio, and POSIX AIO to provide the benefits of truly asynchronous IOs while requiring the user (libtorrent in this case) to implement only a single API. The xNVMe API then makes it possible to change the actual IO path at runtime.
This PR contains a simple xNVMe `disk_interface` implementation that is mostly identical to the existing posix disk interface, except that `async_read` and `async_write` have been implemented using xNVMe. This implementation is not yet "truly asynchronous" because IOs are reaped immediately after they are posted.

I tried briefly to implement this in a "truly asynchronous" fashion, but my lacking understanding of the libtorrent architecture made it difficult to come up with a good design. Instead of hacking on it for too long, I thought I'd start by asking whether you think this is worthwhile to work on and might be interested in merging into libtorrent when it's fully functional, or whether the current thread pool model is fine as it is.
If you think it's worthwhile, I'd love to ask for some pointers in how you think this could fit in with the existing architecture:
The main problem I've had in implementing truly asynchronous IOs so far has been identifying when it's sensible to reap IOs.
Other than that, I've struggled a bit with identifying which existing functionality can be reused in a non-blocking context (e.g. excellent helpers like `readwritev`), as many of them naturally assume blocking behavior.

Do you have any insights or code from your previous experiments with this that could make my life easier?
Thanks!