Permalink
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Browse files
RFC: drm/amdgpu: Implement a proper implicit fencing uapi
WARNING: Absolutely untested beyond "gcc isn't dying in agony". Implicit fencing done properly needs to treat the implicit fencing slots like a funny kind of IPC mailbox. In other words it needs to be explicitly. This is the only way it will mesh well with explicit fencing userspace like vk, and it's also the bare minimum required to be able to manage anything else that wants to use the same buffer on multiple engines in parallel, and still be able to share it through implicit sync. amdgpu completely lacks such an uapi. Fix this. Luckily the concept of ignoring implicit fences exists already, and takes care of all the complexities of making sure that non-optional fences (like bo moves) are not ignored. This support was added in commit 177ae09 Author: Andres Rodriguez <andresx7@gmail.com> Date: Fri Sep 15 20:44:06 2017 -0400 drm/amdgpu: introduce AMDGPU_GEM_CREATE_EXPLICIT_SYNC v2 Unfortuantely it's the wrong semantics, because it's a bo flag and disables implicit sync on an allocated buffer completely. We _do_ want implicit sync, but control it explicitly. For this we need a flag on the drm_file, so that a given userspace (like vulkan) can manage the implicit sync slots explicitly. The other side of the pipeline (compositor, other process or just different stage in a media pipeline in the same process) can then either do the same, or fully participate in the implicit sync as implemented by the kernel by default. By building on the existing flag for buffers we avoid any issues with opening up additional security concerns - anything this new flag here allows is already. All drivers which supports this concept of a userspace-specific opt-out of implicit sync have a flag in their CS ioctl, but in reality that turned out to be a bit too inflexible. See the discussion below, let's try to do a bit better for amdgpu. This alone only allows us to completely avoid any stalls due to implicit sync, it does not yet allow us to use implicit sync as a strange form of IPC for sync_file. For that we need two more pieces: - a way to get the current implicit sync fences out of a buffer. Could be done in a driver ioctl, but everyone needs this, and generally a dma-buf is involved anyway to establish the sharing. So an ioctl on the dma-buf makes a ton more sense: https://lore.kernel.org/dri-devel/20210520190007.534046-4-jason@jlekstrand.net/ Current drivers in upstream solves this by having the opt-out flag on their CS ioctl. This has the downside that very often the CS which must actually stall for the implicit fence is run a while after the implicit fence point was logically sampled per the api spec (vk passes an explicit syncobj around for that afaiui), and so results in oversync. Converting the implicit sync fences into a snap-shot sync_file is actually accurate. - Simillar we need to be able to set the exclusive implicit fence. Current drivers again do this with a CS ioctl flag, with again the same problems that the time the CS happens additional dependencies have been added. An explicit ioctl to only insert a sync_file (while respecting the rules for how exclusive and shared fence slots must be update in struct dma_resv) is much better. This is proposed here: https://lore.kernel.org/dri-devel/20210520190007.534046-5-jason@jlekstrand.net/ These three pieces together allow userspace to fully control implicit fencing and remove all unecessary stall points due to them. Well, as much as the implicit fencing model fundamentally allows: There is only one set of fences, you can only choose to sync against only writers (exclusive slot), or everyone. Hence suballocating multiple buffers or anything else like this is fundamentally not possible, and can only be fixed by a proper explicit fencing model. Aside from that caveat this model gets implicit fencing as closely to explicit fencing semantics as possible: On the actual implementation I opted for a simple setparam ioctl, no locking (just atomic reads/writes) for simplicity. There is a nice flag parameter in the VM ioctl which we could use, except: - it's not checked, so userspace likely passes garbage - there's already a comment that userspace _does_ pass garbage in the priority field So yeah unfortunately this flag parameter for setting vm flags is useless, and we need to hack up a new one. v2: Explain why a new SETPARAM (Jason) v3: Bas noticed I forgot to hook up the dependency-side shortcut. We need both, or this doesn't do much. v4: Rebase over the amdgpu patch to always set the implicit sync fences. Cc: mesa-dev@lists.freedesktop.org Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Cc: Dave Airlie <airlied@gmail.com> Cc: Rob Clark <robdclark@chromium.org> Cc: Kristian H. Kristensen <hoegsberg@google.com> Cc: Michel Dänzer <michel@daenzer.net> Cc: Daniel Stone <daniels@collabora.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: "Christian König" <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Deepak R Varma <mh12gx2825@gmail.com> Cc: Chen Li <chenli@uniontech.com> Cc: Kevin Wang <kevin1.wang@amd.com> Cc: Dennis Li <Dennis.Li@amd.com> Cc: Luben Tuikov <luben.tuikov@amd.com> Cc: linaro-mm-sig@lists.linaro.org Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
- Loading branch information