Redraw lag with NVIDIA 334.21 driver #181

Closed
tchebb opened this issue Mar 9, 2014 · 28 comments

Comments

@tchebb

tchebb commented Mar 9, 2014

Since upgrading to NVIDIA's 334.21 driver, I've been seeing some odd input lag issues when using compton. The lag is most noticeable when scrolling using the scroll-wheel in Firefox. Right after starting Firefox, it's smooth and snappy as expected. But once I've had it open for a while (rapidly changing focus to and from another window seems to trigger it more quickly), my scroll position ends up "one behind." For example, if I scroll down two clicks then up one, I see the page position I'd expect to see after the second down click only after I do the up click. In order to see the results of the up click, I need to either perform another scroll action, change focus to a different window, or trigger a redraw in some other way. The only reliable way to get rid of the issue seems to be to restart Firefox.

Some Googling led me to this Clutter bug report, which has similar symptoms and was caused by a change to NVIDIA's GLX_EXT_buffer_age handling. Although I wasn't using glx-swap-method = "buffer-age"; to begin with, after adding it I saw the same flickering that was reported in the linked NVIDIA forum thread in addition to the original redraw lag.

Could there be a single bug in compton that's causing both issues, and if so, what can I do to help fix it? I've already looked through compton's buffer_age handling code, but I have very limited experience with graphics and couldn't find anything obviously wrong.

A copy of my configuration file follows:

backend = "glx";
glx-no-stencil = true;
glx-no-rebind-pixmap = true;

vsync = "opengl-swc";
paint-on-overlay = true;
@ghost

ghost commented Mar 9, 2014

@bucaneer

bucaneer commented Mar 9, 2014

I had this happen with --glx-swap-method 2 but then switched to 3 and everything appears to work fine as it did before driver upgrade.

@ghost

ghost commented Mar 9, 2014

Well, with swap method 3, the redrawing lags in firefox seem to be gone, but they are still present in programs like termite for example… They become very obvious if you navigate around in ranger; sometimes, the reaction of a keypress takes over a second! Downgrading nvidia solves this problem for the moment, but this isn't a real solution of course.

Edit: Another workaround seems to be to use xr_glx_hybrid, I'll test on this later…

@tchebb
Author

tchebb commented Mar 9, 2014

To RichardGv, who I was talking with in IRC earlier today: unfortunately, the screen capture I took (using gtk-RecordMyDesktop) of the symptoms with MONITOR_REPAINT enabled doesn't appear to have captured either the slow flickering I mentioned or the outdated regions that this bug report is about. That in and of itself might be useful information, but I don't think that attaching the video will be of much use.

@richardgv
Collaborator

@tchebb:

Oh, could you please install apitrace, a tiny GL debugging tool (please build from the git repo; the stable version lacks the functionality to debug a compositor), trace compton with it (MONITOR_REPAINT + apitrace trace compton --config /dev/null --backend glx --glx-swap-method 1), and attach the trace file?

@cju:

If you are using TripleBuffer in xorg.conf, --glx-swap-method 4 might be needed.

@richardgv
Collaborator

Yet another day and I'm having absolutely no luck dealing with the --glx-swap-method buffer-age issue. The only possible workaround I know of is to use a --glx-swap-method value larger than normal (3 for a double-buffered screen, and 4 for a triple-buffered one). The clutter patch actually (unintentionally?) makes mutter union the current damage region with the last buffer_age damaged regions instead of the last buffer_age - 1 regions, which is what we use and what I believe is correct according to the specification. Kwin and a weston patch ( http://lists.freedesktop.org/archives/wayland-devel/2013-March/007790.html ) probably use buffer_age - 1, but strangely enough the issue doesn't appear with them. A comment on the KDE review board seemingly indicates nvidia-drivers is returning a wrong buffer_age. The nVidia guys claimed they corrected the wrong buffer_age value, but as far as I can remember, it's returning 2 for double-buffered setups and it has been causing artifacts here since we introduced --glx-swap-method buffer-age -- I'm always using --glx-swap-method 3 personally. What makes tracing particularly tricky is that I've found no reliable way to reproduce the issue -- it looks like it appears randomly here. nvidia-drivers does double-buffering correctly initially, but after a while something gets messed up... So frustrating.
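
The damage bookkeeping discussed in this comment can be sketched as a toy model. This is a minimal Python sketch under stated assumptions, not compton's code: regions are modelled as sets of tile labels, the names `region_to_repaint`, `damage_history` and `workaround_swap_method` are hypothetical, and the `extra` parameter exists only to illustrate the "one region too many" behaviour attributed to the clutter patch. It also assumes that a fixed --glx-swap-method N simply means "treat the buffer age as N".

```python
from typing import FrozenSet, List, Optional

Region = FrozenSet[str]  # toy stand-in for an X region: a set of tile labels


def region_to_repaint(damage_history: List[Region], buffer_age: int,
                      extra: int = 0) -> Optional[Region]:
    """Return the region to repaint into a back buffer of the given age.

    damage_history[0] is the damage of the frame being painted now,
    damage_history[1] the damage of the previous frame, and so on.
    A buffer of age N already holds the frame from N swaps ago, so it is
    missing the current damage plus the N - 1 frames painted in between.
    `extra` (hypothetical) unions additional older frames on top of that.
    None means "repaint everything" (unknown age or not enough history).
    """
    frames = buffer_age + extra
    if buffer_age <= 0 or frames > len(damage_history):
        return None
    region = set()
    for frame_damage in damage_history[:frames]:
        region |= frame_damage
    return frozenset(region)


def workaround_swap_method(buffer_count: int) -> int:
    """The workaround from the thread: 3 for double, 4 for triple buffering."""
    return buffer_count + 1


history = [frozenset({"A"}), frozenset({"B"}), frozenset({"C"})]
print(sorted(region_to_repaint(history, 2)))           # current + 1 older frame
print(sorted(region_to_repaint(history, 2, extra=1)))  # one older frame more
```

Repainting one frame more than the reported age costs some fill rate but hides a buffer that is older than the driver claims, which would be consistent with the "use a larger value" workaround helping.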

@ghost

ghost commented Mar 10, 2014

Sorry, none of the swap methods and none of the xorg buffer options resolves this problem for me... In case that matters, I'm testing with the latest git version of termite. As I said before, only using the hybrid backend (or just native xrender, of course) seems to solve this; that's why I dug up #163 once again… ;)

@richardgv
Collaborator

@cju:

But you could simply avoid using --glx-swap-method, right? The optimization isn't really necessary if your GPU is strong enough. Or did you mean the issue appears without --glx-swap-method as well? I see both the links you provided are about the buffer_age issue.

@roman-holovin

@richardgv no, it is lagging with any swap method, including undefined.

@richardgv
Collaborator

Heh, I see. The nVidia guy talked about a synchronization issue and EXT_x11_sync_object, but I don't see any big compositors using it right now. (Another nVidia employee added some sort of EXT_x11_sync_object support to his own fork of Compiz, though.) I don't see a clear way to utilize the X Sync extension and EXT_x11_sync_object in a compositor, either. I will try to add some synchronization code to some possibly problematic places tomorrow, to check if it brings any changes.

@richardgv
Collaborator

Hmm, I'm truly sorry for the delay... Here's a patch that adds X Sync fence support to compton: https://gist.github.com/richardgv/9529221

Using an nVidia-proposed feature to deal with a problem on nvidia-drivers sounds like a nice idea. :-D Seemingly the --glx-swap-method buffer-age issue didn't occur after I applied the patch, at least. However, it's not effective against the flickering issue of the xr_glx_hybrid backend, which might indicate potential problems in my implementation. (Yeah, I should run xrender_sync 49 times before painting. :-D) (Or do I simply need EXT_x11_sync_object? Does anybody know the difference between XSyncAwaitFence() and EXT_x11_sync_object?) Fences are still a relatively new feature that may not be available to BSD users (worse yet, the code may fail to compile with older libXext), which would surely be a problem if I wish to merge it, though.
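
As an illustration of the kind of ordering problem a fence is meant to rule out, here is a toy model in Python. It is only a sketch under assumptions: `ToyServer`, `composite` and `run` are hypothetical names, nothing here talks to X or GL, and the thread does not establish that this is what actually happens inside the driver. The model assumes the compositor can read a window's pixmap before the server has executed the client's latest drawing request, and that waiting on a fence drains everything queued before it.

```python
from collections import deque


class ToyServer:
    """Toy X server: drawing requests are queued, then executed later."""

    def __init__(self):
        self.pending = deque()  # drawing requests not yet executed
        self.pixmap = 0         # scroll position currently in the pixmap

    def submit(self, scroll_pos):
        self.pending.append(scroll_pos)

    def process_one(self):
        if self.pending:
            self.pixmap = self.pending.popleft()

    def wait_fence(self):
        # A fence triggers only after all earlier requests have executed.
        while self.pending:
            self.process_one()


def composite(server, use_fence):
    """Return the scroll position the compositor puts on screen."""
    if use_fence:
        server.wait_fence()
    return server.pixmap


def run(scroll_positions, use_fence):
    server = ToyServer()
    shown = []
    for pos in scroll_positions:
        server.submit(pos)                       # client redraws after a scroll
        shown.append(composite(server, use_fence))
        server.process_one()                     # server catches up too late
    return shown


# Scroll down twice, then up once: positions 1, 2, 1.
print(run([1, 2, 1], use_fence=False))
print(run([1, 2, 1], use_fence=True))
```

Without the fence the model stays one step behind the input, which matches the "one behind" scrolling described in the original report; with the fence each composite sees the latest drawing.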

@roman-holovin

Well, I've been using compton from current master with this patch for a few hours and it works. Thank you.

@evanpurkhiser

For any Arch users out there that want to test out the patch, here's a PKGBUILD.

@tchebb
Author

tchebb commented Mar 13, 2014

Richard, your patch appears to fix the redraw issue. Could it be merged in conditionally with a compile-time option, perhaps?

@tchebb
Author

tchebb commented Mar 13, 2014

However, when I use --glx-swap-method buffer-age (which I hadn't been previously), I see screen tearing. Both with and without, I see errors on the console when rapidly switching between workspaces (which don't seem to have any adverse effect). Example:

[   144.65 ] error 9 (BadDrawable) request 134 minor 14 serial 157129 ("BadDrawable (invalid Pixmap or Window parameter)")
[   144.65 ] error 136 (XSyncBadFence) request 134 minor 18 serial 157130 ("BadFence")
[   144.65 ] error 136 (XSyncBadFence) request 134 minor 15 serial 157131 ("BadFence")
[   144.65 ] error 136 (XSyncBadFence) request 134 minor 19 serial 157132 ("BadFence")
[   144.65 ] error 136 (XSyncBadFence) request 134 minor 18 serial 157133 ("BadFence")
[   144.65 ] error 136 (XSyncBadFence) request 134 minor 17 serial 157134 ("BadFence")
[   144.65 ] error 9 (BadDrawable) request 134 minor 14 serial 157136 ("BadDrawable (invalid Pixmap or Window parameter)")
[   144.65 ] error 136 (XSyncBadFence) request 134 minor 18 serial 157137 ("BadFence")
[   144.65 ] error 136 (XSyncBadFence) request 134 minor 15 serial 157138 ("BadFence")
[   144.65 ] error 136 (XSyncBadFence) request 134 minor 19 serial 157139 ("BadFence")
[   144.65 ] error 136 (XSyncBadFence) request 134 minor 18 serial 157140 ("BadFence")
[   144.65 ] error 136 (XSyncBadFence) request 134 minor 17 serial 157141 ("BadFence")
[   144.65 ] error 9 (BadDrawable) request 14 minor 0 serial 157143 ("BadDrawable (invalid Pixmap or Window parameter)")
glx_bind_pixmap(0x0480515a): Failed to query Pixmap info.
win_paint_win(0x01200042): Failed to bind texture. Expect troubles.
win_paint_win(0x01200042): Missing painting data. This is a bad sign.
[   144.65 ] error 9 (BadDrawable) request 134 minor 14 serial 157156 ("BadDrawable (invalid Pixmap or Window parameter)")
[   144.65 ] error 136 (XSyncBadFence) request 134 minor 18 serial 157157 ("BadFence")
[   144.65 ] error 136 (XSyncBadFence) request 134 minor 15 serial 157158 ("BadFence")
[   144.65 ] error 136 (XSyncBadFence) request 134 minor 19 serial 157159 ("BadFence")
[   144.65 ] error 136 (XSyncBadFence) request 134 minor 18 serial 157160 ("BadFence")
[   144.65 ] error 136 (XSyncBadFence) request 134 minor 17 serial 157161 ("BadFence")
[   144.66 ] error 9 (BadDrawable) request 134 minor 14 serial 157163 ("BadDrawable (invalid Pixmap or Window parameter)")
[   144.66 ] error 136 (XSyncBadFence) request 134 minor 18 serial 157164 ("BadFence")
[   144.66 ] error 136 (XSyncBadFence) request 134 minor 15 serial 157165 ("BadFence")
[   144.66 ] error 136 (XSyncBadFence) request 134 minor 19 serial 157166 ("BadFence")
[   144.66 ] error 136 (XSyncBadFence) request 134 minor 18 serial 157167 ("BadFence")
[   144.66 ] error 136 (XSyncBadFence) request 134 minor 17 serial 157168 ("BadFence")
[   144.66 ] error 9 (BadDrawable) request 14 minor 0 serial 157170 ("BadDrawable (invalid Pixmap or Window parameter)")
glx_bind_pixmap(0x0480515e): Failed to query Pixmap info.
win_paint_win(0x0120034e): Failed to bind texture. Expect troubles.
win_paint_win(0x0120034e): Missing painting data. This is a bad sign.
[   144.66 ] error 4 (BadPixmap) request 54 minor 0 serial 157442 ("BadPixmap (invalid Pixmap parameter)")
[   144.66 ] error 4 (BadPixmap) request 54 minor 0 serial 157451 ("BadPixmap (invalid Pixmap parameter)")

EDIT: The tearing appears even without using buffer-age.
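
To make the log above easier to read, the request numbers can be decoded. The following Python sketch groups the errors by failing request. It rests on assumptions that should be checked against the actual server: extension major opcodes are assigned dynamically, so treating major 134 as the X Sync extension is inferred from the XSyncBadFence errors; the minor opcodes 14 to 19 are the fence requests from the X Sync protocol, and 14 and 54 are the core GetGeometry and FreePixmap requests. The function name `summarize_errors` is hypothetical.

```python
import re
from collections import Counter

# X Sync extension fence requests (minor opcodes).
SYNC_MINOR = {
    14: "CreateFence", 15: "TriggerFence", 16: "ResetFence",
    17: "DestroyFence", 18: "QueryFence", 19: "AwaitFence",
}
# The two core requests that occur in the log.
CORE_MAJOR = {14: "GetGeometry", 54: "FreePixmap"}

LINE_RE = re.compile(r"error \d+ \((\w+)\) request (\d+) minor (\d+)")


def summarize_errors(log_text, sync_major=134):
    """Count (error name, request name) pairs in compton's error output.

    sync_major is an assumption: the major opcode the server assigned to
    the X Sync extension on the reporter's machine.
    """
    counts = Counter()
    for match in LINE_RE.finditer(log_text):
        error_name = match.group(1)
        major, minor = int(match.group(2)), int(match.group(3))
        if major == sync_major:
            request = "SYNC." + SYNC_MINOR.get(minor, "minor %d" % minor)
        else:
            request = CORE_MAJOR.get(major, "request %d" % major)
        counts[(error_name, request)] += 1
    return counts


SAMPLE = """\
[   144.65 ] error 9 (BadDrawable) request 134 minor 14 serial 157129 ("BadDrawable (invalid Pixmap or Window parameter)")
[   144.65 ] error 136 (XSyncBadFence) request 134 minor 18 serial 157130 ("BadFence")
[   144.66 ] error 4 (BadPixmap) request 54 minor 0 serial 157442 ("BadPixmap (invalid Pixmap parameter)")
"""
for key, count in sorted(summarize_errors(SAMPLE).items()):
    print(key, count)
```

Read this way, each burst in the log looks like a fence creation failing with BadDrawable (plausibly because the window was already gone during the workspace switch), followed by BadFence errors from the later requests on that never-created fence.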

@richardgv
Collaborator

Thanks for testing, firstly!

Heh, sorry, I simply couldn't get x11_sync_object working. It always blocks whenever I wait on an imported fence, with either glWaitSync() or glClientWaitSync(), and even with a debug context the driver says nothing about an error. Might need to ask nVidia about this...

Here's a revised patch. Please clean up the remains of the first patch before applying it (git reset HEAD --hard). I cached most XSync fence objects so that the delay is reduced, which might be helpful for @tchebb's tearing issue. (But I do not understand why the previous patch caused tearing; VSync is done right before the last paint, and anything before that point shouldn't affect it.): https://gist.github.com/richardgv/9591390

Now the main issue that prevents it from being merged is that I don't know how the command-line switches enabling the feature should be named. :-D xr_sync() uses two things to do the synchronization: an X Sync fence and a simple XSync() call. I wish to at least make it possible to enable only the XSync() call but not the X Sync fence (because it may not be available on many systems), so I need two switch names.

@tchebb:

The BadDrawable and XSyncBadFence errors shouldn't be very harmful. With the new patch I don't think there will be so many errors anymore, but I forgot one thing that may introduce those harmless errors and will fix it later. I don't get why you are getting BadPixmap, though.

richardgv added a commit that referenced this issue Mar 17, 2014
- Add --xrender-sync{,-fence} to deal with redraw lag issue on GLX
  backend. --xrender-sync-fence requires a sufficiently new xorg-server
  and libXext. NO_XSYNC=1 may be used to disable it at compile time.
  Thanks to tchebb for reporting and everybody else for testing. (#181)

- A bit code clean-up. Replace a few XSync() with XFlush() to minimize
  the latency.
@richardgv
Collaborator

Finally I pushed it to the richardgv-dev branch, slightly revised to minimize useless synchronizations. Use --xrender-sync to enable XSync() and --xrender-sync-fence to enable fence sync as well. I hope I didn't accidentally break it in my modifications.

Now still looking into x11_sync_object...

@ghost

ghost commented Mar 17, 2014

Hey, this seems to work, but only if I start compton with these two options as arguments; if I place them in the config, they don't work…

Works:

compton --xrender-sync --xrender-sync-fence

Doesn't work:

xrender-sync = true;
xrender-sync-fence = true;

Am I doing something wrong here or is that just not implemented yet?

@richardgv
Collaborator

@cju:

Nope, it's not available as a configuration file option yet. I'm unsure if the switch name is appropriate -- It isn't only synchronizing X Render calls (but core X drawing calls as well) but I don't know a better name. And, does it work if you only enable --xrender-sync but not --xrender-sync-fence?

By the way, I've made no progress on x11_sync_object. It still freezes compton.

@ghost

ghost commented Mar 18, 2014

Ok, thank you. I'm fine with the name, but if you like to change it, why don't you call it just xsync and xsync-fence? That would be a bit more generic...

No, doesn't seem to work with --xrender-sync only, I have to enable both options to get rid of all lagging.

@richardgv
Collaborator

@cju:

> Ok, thank you. I'm fine with the name, but if you like to change it, why don't you call it just xsync and xsync-fence? That would be a bit more generic...

XSync is both the name of a core Xlib call and a X extension. Sounds more obscure...

> No, doesn't seem to work with --xrender-sync only, I have to enable both options to get rid of all lagging.

Oh, I see. Thanks! I don't understand why other GL compositors don't have the problem...

@ghost

ghost commented Mar 18, 2014

> XSync is both the name of a core Xlib call and a X extension. Sounds more obscure...

I agree that the term xrender perhaps should be left out of these two options, because users would likely interpret this just as a xrender-specific option. Hmmm… You could call it x-sync and x-sync-fence or just sync and sync-fence or something like that.

@richardgv
Collaborator

> I agree that the term xrender perhaps should be left out of these two options, because users would likely interpret this just as a xrender-specific option. Hmmm… You could call it x-sync and x-sync-fence or just sync and sync-fence or something like that.

The primary purpose is indeed to sync the X Render draw calls other programs made, so xrender-sync does make some sense. Your x-sync might suggest the X Synchronization extension. Heh, I'm feeling lazy. Let's just keep it that way. :-D

@ghost

ghost commented Mar 20, 2014

As I said before, these were just completely spontaneous suggestions, so go ahead with whatever you like. The main thing is that it's working after all (also via the config). :-)

Quick question: If resolving the lag problem works only with both options enabled, maybe you could combine them? Or is there another use-case for using only xrender-sync without also using xrender-sync-fence?

@richardgv
Collaborator

> Quick question: If resolving the lag problem works only with both options enabled, maybe you could combine them? Or is there another use-case for using only xrender-sync without also using xrender-sync-fence?

Yes, it would be possible, but --xrender-sync works when the X Sync extension is not present or its version is too low, and this might be helpful... Probably for somebody in the future. Anyway, since --xrender-sync-fence implies --xrender-sync, the two separate options won't break your keyboard much faster, I suppose. :-D
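
The relationship between the two switches can be shown with a small stand-alone sketch. This is Python using argparse purely for illustration; compton itself is written in C and parses its options differently, and `parse_sync_options` is a hypothetical name. The only behaviour taken from the thread is that --xrender-sync-fence implies --xrender-sync, while --xrender-sync can be used on its own.

```python
import argparse


def parse_sync_options(argv):
    """Parse the two switches and apply the 'fence implies sync' rule."""
    parser = argparse.ArgumentParser(prog="compton-sketch", allow_abbrev=False)
    parser.add_argument("--xrender-sync", action="store_true",
                        help="issue a plain XSync() before painting")
    parser.add_argument("--xrender-sync-fence", action="store_true",
                        help="additionally wait on an X Sync fence")
    opts = parser.parse_args(argv)
    if opts.xrender_sync_fence:
        # Per the thread, enabling the fence also enables the XSync() call.
        opts.xrender_sync = True
    return opts


opts = parse_sync_options(["--xrender-sync-fence"])
print(opts.xrender_sync, opts.xrender_sync_fence)
```

Keeping the plain XSync() path separately selectable leaves a fallback for systems where the X Sync extension is missing or too old, which is the reason given above for not merging the two options.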

@ghost

ghost commented Mar 23, 2014

Ok, thanks for all your efforts. So nevertheless: Can you please make xrender-sync and xrender-sync-fence callable via the config file? That'd be the icing on the cake. :-D

@richardgv
Collaborator

@cju:

Done in the richardgv-dev branch. A bit late, though.

I got x11_sync_object working as well, but it didn't help with the xr-glx-hybrid backend, for some reason... It could be used to minimize roundtrips to X (and thus increase compton's FPS), though I'm not very interested in it.

@ghost

ghost commented Mar 27, 2014

Very nice... Thanks a lot.

evanpurkhiser referenced this issue in evanpurkhiser/dots-personal Apr 7, 2014
yshui added a commit to yshui/picom that referenced this issue Oct 28, 2018
This was a dubious "fix" for a Nvidia driver problem. The problem was
never fully understood, and the then developers took a shotgun approach
and implemented xsync fences as a fix. Which somehow fixed the problem.
Again, I don't see any indication that the developers understood why
this "fix" worked.

(for details, see chjj/compton#152 and chjj/compton#181)

The driver problem should have been fixed almost 5 years ago. So this
shouldn't be needed anymore. In addition the way compton uses xsync fences
is apparently wrong according to the xsync spec (fences are attached to
screen, but compton uses them as if they were attached to drawables).

So, I will try removing it and see if anyone will complain. If there are
real concrete reasons why fences are needed, it will be brought back.

Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>