New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DMABUF API v2 #928
DMABUF API v2 #928
Conversation
eb53154
to
0d0e5fb
Compare
To be tested with a ZedBoard or ZCU102 running Linux 6.1 from this branch: With the following caveats:
|
0d0e5fb
to
37a71b7
Compare
37a71b7
to
6b342d3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool work... Mostly nitpicks from my side!
local-dmabuf.c
Outdated
goto err_free_priv; | ||
} | ||
|
||
memfd = syscall(__NR_memfd_create, "/libiio-udmabuf", MFD_ALLOW_SEALING); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason not to use memfd_create(2))
? Not present in libc you were using?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is present in the libc I am using, but it is not everywhere; doing it like that is safer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, then I would wrap this is in a macro and call the libc version when available and fallback to this only when necessary...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could do that, but memfd_create
is just a wrapper around syscall( ) anyway, I don't really see what would be the benefit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could do that, but memfd_create is just a wrapper around syscall( ) anyway, I don't really see what would be the benefit.
Well, for now... Like this, you can potentially bypass any new code that lands in libc.
Anyways, up to you :)
entry->dmabuf_fd = -EINVAL; | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe we should already have a mechanism in here to properly handle this in the proper backend? For now, yes, only USB supports this but ideally we can have the same in net.
Maybe a simple helper function or a new callback?
Not really a blocker from my point of but maybe something easy enough that could be done already instead of doing it when a new backend supports this mechanism
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
USB does not support DMABUFs, the gadget/functionfs API does. So this is very much a IIOD-side things that shouldn't be in Libiio.
If we ever get DMABUF support to the network stack, the related code could be shared (as a static lib akin to iiod-responder) between the network backend and IIOD.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is very much a IIOD-side things that shouldn't be in Libiio.
But don't we have a similar situation if network is being used? IIOD still needs to "respond" the data right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. We could have a "frontend" system, but I'd prefer refactoring the code once we do have DMABUF support in the network stack.
@@ -715,6 +756,10 @@ static void handle_free_block(struct parser_pdata *pdata, | |||
if (entry->block != block) | |||
continue; | |||
|
|||
ep_fd = buf_entry->is_tx ? pdata->fd_in : pdata->fd_out; | |||
if (WITH_IIOD_USB_DMABUF && entry->dmabuf_fd > 0) | |||
usb_detach_dmabuf(ep_fd, entry->dmabuf_fd); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: another place where we could already have some "backend agnostic" mechanism in place...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this is IIOD and not Libiio, and interfaces gadget/functionfs and not client-side USB, so it wouldn't fit in one of Libiio's backends.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm aware it's IIOD, but for example if we ever get support for DMABUF in network (I know this is a big 'if') would we have something like net_detach_dmabuf()
? So wouldn't we need to have another another:
if (WITH_IIOD_NET_DMABUF && ...)
...
But maybe there's no ops
kind of thing in IIOD and adding new helpers for this, yeah, can be done when really needed...
Or maybe I'm just completely off here and should just check iiod code 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there are no "frontends" in IIOD yet, and adding this separation would be a bit outside the scope of this PR. I would refactor this the moment where we have DMABUF support on the network stack.
|
||
struct iio_ffs_dmabuf_transfer { | ||
int fd; | ||
uint32_t flags; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe let's make this u64 already? I know the kernel side has u32 (which would have to be also updated) and I was actually wondering why not unsigned long
but now I see this is used in ioctl().
BTW, the only reason I'm stating to move this into u64 is just to be on the safe side and have more space for flags (we never know how this will grow even though now, it seems that 32 flags is more than enough)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well I currently have a total of 0 flags, I would think that 32 available flags is more than enough :)
I could switch to u64 flags, at the cost of doubling the size of the structure (24 bytes vs. 12 bytes), but I'm not really sure that will ever be needed.
I will send a patchset for the Linux bits very soon, let's see if they say anything about that.
6b342d3
to
d63228e
Compare
This function can be used to get the value of a counter with a precision of a microsecond. The value returned is not properly defined, and as such does not allow to represent a specific point in time, but allows to compute periods and delays. Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Make sure that the transfer size requested (in bytes) is not bigger than the size of the block itself. Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Previously, the 'bytes_used' was only used when enqueueing a TX transfer; for RX transfers, the full size of the iio_block was always used. This was arbitrary and confusing. Now, the 'bytes_used' field will be used for RX transfers as well. Signed-off-by: Paul Cercueil <paul@crapouillou.net>
When calling iio_block_enqueue(), if 'bytes_used' is set to zero, then the full block is transferred. Signed-off-by: Paul Cercueil <paul@crapouillou.net>
The bytes_used field can now be set to 0 to default to the block size. Note that for RX transfers, iio_block_enqueue() was already called with bytes_used == 0, which previously meant "transmit 0 bytes and queue for receive". Because the "bytes_used" field is now handled as for RX buffers as well, the behaviour is now "queue for receiving a block size worth of data" which is functionally the same. Signed-off-by: Paul Cercueil <paul@crapouillou.net>
This function is meant to be used when the data is never accessed directly, but passed around as DMABUF objects. In that case, keeping the data in DMA space means that it doesn't have to be synchronized with the CPU cache, which improves performance, and saves two ioctl calls. When CPU access is disabled, all API functions to access a block's data should not be used. Signed-off-by: Paul Cercueil <paul@crapouillou.net>
This API function can be used to retrieve a file descriptor of the underlying DMABUF object. This file descriptor can then be used for instance to attach the block to a different device driver for zero-copy applications. Signed-off-by: Paul Cercueil <paul@crapouillou.net>
The previous DMABUF-based API was refused upstream as it had several problems and was very complex. A lot of that complexity was lifted by making the IIO core a DMABUF importer instead of an exporter. This means that the IIO core is no more responsible for creating the DMABUF objects, and to map them in the DMA space. This task is now delegated to the "udmabuf" kernel driver, available since kernel 5.19, which allows userspace programs to create DMABUF objects from memfd buffers. The DMABUF interface of the IIO core then simply consists in three IOCTLs: one to attach a DMABUF to the interface, one to detach it, and one to request a data transfer. Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Implement the .disable_cpu_access() callback. This will only work when the DMABUF interface is used. Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Implement the .get_dmabuf_fd() callback. This will only work when the DMABUF interface is used. Signed-off-by: Paul Cercueil <paul@crapouillou.net>
A iio_err() check was performed on the wrong variable. Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Older development versions of Libiio v1.x will try to enqueue blocks of size 0, which is not supported anymore by the current Libiio and IIOD. Signed-off-by: Paul Cercueil <paul@crapouillou.net>
Add support for passing sample data as DMABUF objects between IIO devices, and the USB stack. This mechanism has the benefit that the CPU will never access the data to copy it from one hardware buffer to another, and therefore results in a much higher throughput at a much lower CPU usage. For this new mechanism to work, the DMABUF-based IIO kernel API must be available and used by the local backend, and the FunctionFS stack must also support importing DMABUFs. If those conditions are not met, the standard way of transferring the data will be used. Signed-off-by: Paul Cercueil <paul@crapouillou.net>
d63228e
to
dd34813
Compare
If available, create buffers bigger than 2 MiB using huge pages. This can reduce the number of memory pages backing the buffer by up to a factor of 512x; and as a consequence, reduce the number of DMA transfers by the same amount. Signed-off-by: Paul Cercueil <paul@crapouillou.net>
These commits will modify how the local-DMABUF interface works, because the interface itself in the kernel changed (as it is not yet upstream).
The local backend will now make use of the "udmabuf" kernel driver, which was added in 5.19, to create DMABUF objects from memfd buffers. These DMABUFs can then be attached to the IIO device and/or other devices from other subsystems, and memory-mapped to access the data.
IIOD has also been modified to support enqueueing DMABUFs to the USB (functionfs) interface, and will support passing the buffers between IIO and USB in a zero-copy fashion.
From a Libiio API standpoint, some changes were introduced:
iio_block_enqueue
will support partial transfers on RX buffers as well;iio_block_enqueue
will default to transferring the full buffer;iio_block_disable_cpu_access
, to disable CPU synchronization with the buffers when doing zero-copy;iio_block_get_dmabuf_fd
which can be used when doing zero-copy as well.