
What's new with io_uring in 6.10


Greatly improve zerocopy send performance by enabling coalescing of buffers.

MSG_ZEROCOPY already does this with send(2) and sendmsg(2), but the io_uring side did not. In local testing, the crossover point at which zerocopy send becomes faster is now around 3000 byte packets, and it outperforms the synchronous syscall variants as well. The improvement is transparent to the application; no changes are needed in how zerocopy sends are used.
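
For reference, here is a minimal sketch of a zerocopy send with liburing; since the coalescing happens inside the kernel, this is the same code an application would have written before 6.10. The helper name and the single-request assumption are illustrative only.

```c
#include <liburing.h>

/* Minimal sketch: issue one zerocopy send and reap both of its CQEs.
 * Assumes 'ring' is an initialized io_uring instance with no other
 * requests in flight, and 'sockfd' a connected socket. */
static int send_zc_example(struct io_uring *ring, int sockfd,
                           const void *buf, size_t len)
{
    struct io_uring_cqe *cqe;
    unsigned more;
    int ret;

    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
    io_uring_prep_send_zc(sqe, sockfd, buf, len, 0, 0);
    io_uring_submit(ring);

    /* The first CQE carries the send result and has IORING_CQE_F_MORE
     * set; a second CQE with IORING_CQE_F_NOTIF signals that the buffer
     * may safely be reused. */
    do {
        ret = io_uring_wait_cqe(ring, &cqe);
        if (ret < 0)
            return ret;
        more = cqe->flags & IORING_CQE_F_MORE;
        io_uring_cqe_seen(ring, cqe);
    } while (more);
    return 0;
}
```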

Add support for send/recv bundles.

A bundle is multiple buffers used in a single operation. On the receive side, this means a single receive may utilize multiple buffers, reducing the round trips through the networking stack from one per buffer to a single one for the whole bundle. On the send side, it likewise improves how an application deals with sends on a socket, eliminating the need to serialize sends on a single socket. Bundles work with provided buffers, hence this feature also adds support for provided buffers for send operations.

See the liburing io_uring_prep_send_bundle(3) man page for more details.
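
As an illustration, here is a rough sketch of a send bundle, assuming a liburing version that provides io_uring_prep_send_bundle(). The buffer group ID of 0 is arbitrary, and the zero length (taken here to mean no cap on how much the bundle sends) is an assumption; consult the man page above for the authoritative details.

```c
#include <liburing.h>

/* Rough sketch: queue two buffers in a provided buffer ring (group 0 is
 * an arbitrary choice) and send whatever is queued with one bundle op. */
static int send_bundle_example(struct io_uring *ring, int sockfd,
                               void *buf0, void *buf1, unsigned buf_len)
{
    struct io_uring_buf_ring *br;
    struct io_uring_sqe *sqe;
    int ret;

    /* Register an 8-entry buffer ring for group 0 and add both buffers */
    br = io_uring_setup_buf_ring(ring, 8, 0, 0, &ret);
    if (!br)
        return ret;
    io_uring_buf_ring_add(br, buf0, buf_len, 0, io_uring_buf_ring_mask(8), 0);
    io_uring_buf_ring_add(br, buf1, buf_len, 1, io_uring_buf_ring_mask(8), 1);
    io_uring_buf_ring_advance(br, 2);

    /* One SQE sends from the provided buffers in group 0; a len of 0 is
     * assumed here to mean "no cap on the bundle size" */
    sqe = io_uring_get_sqe(ring);
    io_uring_prep_send_bundle(sqe, sockfd, 0, 0);
    sqe->flags |= IOSQE_BUFFER_SELECT;
    sqe->buf_group = 0;

    return io_uring_submit(ring);
}
```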

Improvements for accept

Accept now supports IORING_ACCEPT_DONT_WAIT, allowing applications to issue a non-retryable accept attempt. IORING_ACCEPT_POLL_FIRST was also added, which works like IORING_RECVSEND_POLL_FIRST in that no immediate accept attempt is made. Rather, io_uring relies solely on a poll trigger to gauge when it is a good idea to retry the operation. As on the receive side, this can be combined with the newly added IORING_CQE_F_SOCK_NONEMPTY signaling to eliminate unnecessary accept attempts.
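
A short sketch of how these flags might be applied; passing IORING_ACCEPT_POLL_FIRST via sqe->ioprio mirrors how the IORING_RECVSEND_* flags are passed, and should be treated as an assumption if your liburing version offers a dedicated helper.

```c
#include <liburing.h>

/* Sketch: queue an accept that makes no immediate attempt, relying on a
 * poll trigger instead. */
static void queue_accept_poll_first(struct io_uring *ring, int listen_fd)
{
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

    io_uring_prep_accept(sqe, listen_fd, NULL, NULL, 0);
    sqe->ioprio |= IORING_ACCEPT_POLL_FIRST;

    /* When reaping the CQE: if IORING_CQE_F_SOCK_NONEMPTY is set in
     * cqe->flags, more connections are waiting and another accept can
     * be issued immediately rather than going back to poll. */
}
```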

Unification of how async preparation is done across opcodes.

This is more of an internal cleanup with no user-visible changes, but it does reduce the complexity of how retries are done. Rather than maintaining on-stack state that is then copied to allocated state as needed, the same state is now used whether this is the initial issue attempt or a retry. Various improvements were also made in how efficiently this state can be allocated and freed.

Move away from remap_pfn_range() for mapping rings and provided buffers.

Rather than use remap_pfn_range(), vm_insert_page(s)() is now used. This applies to the rings and SQ/CQ arrays as well as the ring provided buffers. Again, this is not a directly user-visible change; everything should work exactly like it did before. But it has the added benefit of not requiring physically contiguous memory, which will make restarts of longer-running services and bigger rings more reliable. Previously, those were prone to running into memory fragmentation that prevented allocation of bigger rings. Beyond that, the code for mapping data was cleaned up and unified, and roughly 400 lines of code could be removed from the code base.

NOP for error injection

NOP commands don't do anything; they simply post a completion with a result of 0 to the CQ ring. Support was added for controlling what the completion result is, which means you can now use NOP to inject errors as well. This is handy for testing purposes. If IORING_NOP_INJECT_RESULT is set in sqe->nop_flags, then sqe->len will be the posted completion result.
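
A minimal sketch of error injection via NOP, assuming headers that expose the 6.10 nop_flags field:

```c
#include <errno.h>
#include <liburing.h>

/* Sketch: post a NOP whose completion result is -EIO instead of 0, to
 * exercise error paths in completion handling. */
static void queue_failing_nop(struct io_uring *ring)
{
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

    io_uring_prep_nop(sqe);
    sqe->nop_flags = IORING_NOP_INJECT_RESULT;
    sqe->len = (__u32) -EIO;    /* posted as cqe->res */
}
```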