
Add sendfile() support #143

Closed
giampaolo opened this issue Dec 20, 2016 · 11 comments

Comments

@giampaolo
Contributor

giampaolo commented Dec 20, 2016

Hello David,
would you be interested in a patch adding sendfile() support?
To read more about sendfile you can take a look at my Python wrapper, which I have successfully used in pyftpdlib. sendfile is also available as os.sendfile() starting from Python 3.3.

Some considerations:

  • sendfile can be used with non-blocking sockets, meaning that if the socket is not "write ready" you immediately get EAGAIN
  • with non-blocking sockets sendfile() automatically returns "sooner", sending (and returning) fewer bytes than requested
  • on the other hand, the internal kernel-space operation moving data from the regular file fd to the socket fd can block, similarly to a plain file.read()
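The EAGAIN / partial-send behaviour described above can be demonstrated with plain Python (not curio code; assumes Linux semantics for os.sendfile(), where the output fd may be a local socket):

```python
import os
import socket
import tempfile

# Demo: on a non-blocking socket, sendfile() pushes what fits into the
# socket buffer and then raises EAGAIN (BlockingIOError in Python).
with tempfile.TemporaryFile() as f:
    f.write(b"x" * (1 << 20))  # 1 MiB payload
    f.flush()
    a, b = socket.socketpair()
    a.setblocking(False)
    sent = 0
    try:
        while True:
            # os.sendfile(out_fd, in_fd, offset, count) -> bytes actually sent
            n = os.sendfile(a.fileno(), f.fileno(), sent, 1 << 20)
            if n == 0:
                break  # EOF: the whole file went through
            sent += n
    except BlockingIOError:
        pass  # socket buffer full: the fd is no longer "write ready"
    print(sent)  # some bytes were sent, usually not the whole file
    a.close()
    b.close()
```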

As such I'm not sure what's best to do. Having used sendfile() for years in pyftpdlib, my impression is that disk I/O is usually fast enough not to represent an actual problem for the main I/O loop.
On the other hand I see that curio explicitly promotes the usage of a thread pool when it comes to dealing with files (aopen() etc.). So here's a possible proposal:

  • add a Socket.sendfile(fd, maxsize=65536) method which by default tries to send the file using sendfile()
  • if sendfile() fails on the first call (because fd is not a regular file), we internally fall back to the plain chunk = file.read(maxsize) -> sock.sendall(chunk) approach. This mimics the semantics of socket.sendfile(), which I contributed some time ago and which is currently not exposed in the Socket class.
  • by default Socket.sendfile does not use a thread pool, but users worried about blocking can optionally opt in with await run_in_thread(s.sendfile, fd).
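A rough sketch of what the proposed fallback could look like. The name sendfile_with_fallback and the error handling are hypothetical, and this is a plain blocking function rather than a curio coroutine:

```python
import errno
import os

def sendfile_with_fallback(sock, file, maxsize=65536):
    # Try os.sendfile() first; if the fd is not a regular file, fall back
    # to the plain read()/sendall() loop, like socket.sendfile() does.
    # Assumes a blocking socket; returns total bytes sent.
    offset = 0
    try:
        while True:
            sent = os.sendfile(sock.fileno(), file.fileno(), offset, maxsize)
            if sent == 0:
                return offset  # EOF
            offset += sent
    except OSError as e:
        # EINVAL/ENOSYS on the *first* call means sendfile() is unusable here;
        # any later error (or any other errno) is propagated to the caller.
        if offset or e.errno not in (errno.EINVAL, errno.ENOSYS):
            raise
    # Fallback: chunked userspace copy, starting at the file's current position
    # (os.sendfile with an explicit offset does not move the file position).
    while True:
        chunk = file.read(maxsize)
        if not chunk:
            return offset
        sock.sendall(chunk)
        offset += len(chunk)
```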

Thoughts?

@imrn

imrn commented Dec 20, 2016 via email

@giampaolo
Contributor Author

> I'm unaware that Curio promotes any threaded stuff for files, yet happily
> using aopen and friends in pure linear curio code. Can you provide some
> reference for such promotion?

I'm talking about:

class AsyncFile(object):

...which wraps all file methods (including open()) in run_in_thread().
The code may be linear, but a thread is still involved.
My proposal was to avoid that (run_in_thread) by default and leave the choice to the user, who can either do this if the disk is fast enough:

await sock.sendfile(fd)  # blocking

...or this if it's not:

await run_in_thread(sock.sendfile, fd)  # non-blocking

> Native sendfile will block curio until it returns. fd -> Socket
> dumping will take time. So side stepping to a thread is needed.

My point is that modern disks should be fast enough that the blocking time of a single sendfile() or file.read() call does not represent a problem for the main I/O loop.

@imrn

imrn commented Dec 20, 2016 via email

@giampaolo
Contributor Author

> This will not give back control until ALL fd is send through the socket.

Just to clarify: when used with a non-blocking socket, sendfile() does not send the whole file in one shot. It sends a chunk of it (typically 65K) even if you specify a bigger buffer size, then returns the number of bytes sent (e.g. see the pyftpdlib implementation).
The actual blocking operation is the internal file read(). If the socket fd is not "ready to be written", sendfile() immediately returns EAGAIN. So really it's a hybrid: the socket fd part is non-blocking, the file fd part is blocking.
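This hybrid behaviour can be sketched with a readiness loop; names and structure are illustrative, not pyftpdlib's or curio's actual code:

```python
import os
import select

def sendfile_loop(sock, f, blocksize=65536):
    # Sketch of the hybrid: the socket side is non-blocking (wait for
    # write-readiness, retry on EAGAIN), while each sendfile() call may
    # still block internally on the file read(). Returns total bytes sent.
    sock.setblocking(False)
    offset = 0
    while True:
        select.select([], [sock], [])  # block until the socket is writable
        try:
            sent = os.sendfile(sock.fileno(), f.fileno(), offset, blocksize)
        except BlockingIOError:
            continue  # buffer filled up again between select() and sendfile()
        if sent == 0:
            return offset  # EOF
        offset += sent
```

In a real event loop the select() call would of course be replaced by the loop's own write-readiness notification.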

As to whether the read() part is "fast enough", I currently have no documentation to back this up, so that should indeed be investigated. My point was to provide an API flexible enough that you can choose which approach to use (threads vs. no threads), depending on how much you trust your disk speed.

@njsmith
Contributor

njsmith commented Dec 20, 2016 via email

@imrn

imrn commented Dec 20, 2016 via email

@giampaolo
Contributor Author

> I think the blocking problem is still there. But for a smaller size.

It is for sure; it should just be "tolerable" most of the time. Or at least this is my experience with pyftpdlib running on SSD disks, where the sendfile() call shows acceptable timings (< 0.1 secs per call) even under heavy load. Unfortunately and astonishingly, a real solution for reliably doing kernel-supported non-blocking file I/O on UNIX / Linux still does not exist (...and AIO sucks). It appears it should be possible on BSD via kqueue(). Funnily, Windows is the only one that managed to do this right.

@imrn

imrn commented Dec 21, 2016 via email

@njsmith
Contributor

njsmith commented Dec 21, 2016

> Unfortunately and astonishingly, a real solution for reliably doing kernel-supported non-blocking file I/O on UNIX / Linux still does not exist (...and AIO sucks). It appears it should be possible on BSD via kqueue(). Funnily, Windows is the only one that managed to do this right.

This used to bother me a lot more until I realized that, from the kernel's point of view, one-thread-per-I/O-request is basically isomorphic to what Windows is doing. Basically you have to store the temporary state involved in keeping track of a half-finished I/O operation somewhere. Windows does this explicitly, with manually maintained data structures that it switches between; Linux uses the C stack. If you're a kernel, though, the C stack itself is basically just another data structure that you switch between... so the "just spawn a kernel thread for each request" approach described in that LWN article is basically as good as anything.

How does this compare to a user-space threadpool? User-space threads are a little more expensive to context-switch between, and they take up user memory for their stack. But on a lazy-allocating system like Linux this is only ~1 page per thread in practice, which is nothing for a reasonably sized thread pool. They also take up virtual address space, but on a 64-bit system there's plenty to spare. For programs that are close to the metal and pushing the limits of the hardware, these kinds of small overheads do matter... but for Python programs I doubt the kernel/user-space thread distinction is important.
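The user-space threadpool approach discussed above can be sketched as follows (illustrative names only, not curio's API): each blocking file read is handed to a worker thread, which keeps the main loop free while the thread's stack holds the "half-finished operation" state.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor

def read_chunk(path, offset, size):
    # The blocking call that runs on a worker thread.
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)

with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"hello world")
    tmp.flush()
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Submitting returns immediately; the caller blocks only on result().
        fut = pool.submit(read_chunk, tmp.name, 0, 5)
        data = fut.result()
    print(data)  # b'hello'
```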

@dabeaz dabeaz closed this as completed Nov 18, 2018
@giampaolo
Contributor Author

I see this was closed. FWIW, asyncio recently implemented sendfile support (including on Windows, by using TransmitFile).

@dabeaz
Owner

dabeaz commented Nov 18, 2018

Mainly closed because of inactivity. Will look at the asyncio implementation. Open to pull requests as well.

4 participants