Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Atomic commit fails if IN_FENCE_FD is set #622

Open
2 tasks
mahkoh opened this issue Apr 10, 2024 · 2 comments
Open
2 tasks

Atomic commit fails if IN_FENCE_FD is set #622

mahkoh opened this issue Apr 10, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@mahkoh
Copy link

mahkoh commented Apr 10, 2024

NVIDIA Open GPU Kernel Modules Version

550.40.59

Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.

  • I confirm that this does not happen with the proprietary driver package.

Operating System and Version

n/a

Kernel Release

n/a

Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.

  • I am running on a stable kernel release.

Hardware: GPU

n/a

Describe the bug

When an atomic commit contains an IN_FENCE_FD, the nvidia driver rejects the commit with EPERM:

    if (plane_state->fence != NULL || nv_drm_plane_state->fd_user_ptr) {
        if (!nv_dev->supportsSyncpts) {
            return -1;
        }

-1 is EPERM.

First of all this is not a correct error code for this situation.

But more importantly, this makes it impossible to synchronize presentation with rendering from the compositor side. The workaround is to just not use IN_FENCE_FD if the nvidia driver is detected. Kinda absurd since nvidia was the vendor pushing for explicit sync in the first place.

If that is the expected workaround, then maybe the driver should just accept such commits instead of returning an error.


For whatever reason, one of my users reports that the commits are nevertheless applied by the nvidia driver even though the atomic commit fails. This might be a separate bug.

To Reproduce

n/a

Bug Incidence

Always

nvidia-bug-report.log.gz

n/a

More Info

I don't have nvidia hardware so I cannot fill out the remaining fields.

@mahkoh mahkoh added the bug Something isn't working label Apr 10, 2024
@erik-kz
Copy link

erik-kz commented Apr 12, 2024

@cubanismo is working on adding support for IN_FENCE_FD and OUT_FENCE_PTR to our nvidia-drm module, so these should be working properly in near-future driver release. By "properly" I mean leveraging the synchronization capabilities of the GPU's display hardware.

Having said that, I checked with him and we agreed that our current behavior of failing when IN_FENCE_FD is specified is wrong (and, yeah, returning -EPERM is extra wrong). Instead, we should be calling drm_atomic_helper_wait_for_fences to do a CPU wait on the fence. This is what the nouveau driver does, for example.

As you pointed out, there isn't really a good way for userspace to determine whether IN_FENCE_FD is supported, implying the intention was probably for it to always be supported. At worst using a CPU wait as described above.

If we had noticed this problem earlier we would most likely have made such a change as a short-term fix. However, at this point, since a better fix is right around the corner, there may not be any benefit to doing that.

@mahkoh
Copy link
Author

mahkoh commented Apr 12, 2024

Thanks for the update.

PS: The issue I mentioned re "commit being applied anyway" turned out to be a bug on my side where I would render directly to the front buffer if too many commits failed in a row.

ids1024 added a commit to ids1024/smithay that referenced this issue May 22, 2024
Atomic commits fail if this is set on the Nvidia driver, it seems:
NVIDIA/open-gpu-kernel-modules#622

Kwin also has a check for this
https://invent.kde.org/plasma/kwin/-/merge_requests/4770.

This fixes Anvil and cosmic-comp on the Nvidia 555 beta driver. (I'm not
sure why it only started to be a problem; `EGL_ANDROID_native_fence_sync`
is present in the 550 driver, etc.)
Drakulix pushed a commit to Smithay/smithay that referenced this issue May 22, 2024
Atomic commits fail if this is set on the Nvidia driver, it seems:
NVIDIA/open-gpu-kernel-modules#622

Kwin also has a check for this
https://invent.kde.org/plasma/kwin/-/merge_requests/4770.

This fixes Anvil and cosmic-comp on the Nvidia 555 beta driver. (I'm not
sure why it only started to be a problem; `EGL_ANDROID_native_fence_sync`
is present in the 550 driver, etc.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants