Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[2.9] Weird scrolling in, for example Firefox #2577

Closed
AlanGriffiths opened this issue Aug 24, 2022 · 11 comments · Fixed by #2590 or #2562
Closed

[2.9] Weird scrolling in, for example Firefox #2577

AlanGriffiths opened this issue Aug 24, 2022 · 11 comments · Fixed by #2590 or #2562

Comments

@AlanGriffiths
Copy link
Contributor

AlanGriffiths commented Aug 24, 2022

  1. Start firefox/Wayland on a webpage large enough to scroll
  2. Use the scroll wheel to scroll up or down

Expect: can scroll to the top and bottom
Actual: scrolling stops before reaching the top/bottom

Observation: on the first scroll event in the opposite direction the scrolling does reach the top/bottom. This was obvious when tracking the events in a terminal window with:

WAYLAND_DEBUG=client firefox 2>&1 | grep axis_discrete

This is intermittent, but seems consistent within a run of a Mir server. Restarting the server "rolls the dice" again with around a 65% chance of seeing problems.

@AlanGriffiths
Copy link
Contributor Author

Comparing with Mutter, there's a slight difference in the order of wl_pointer events within the frame, but empirically changing the order has no effect. (Which is obviously how it should be.)

@wmww
Copy link
Contributor

wmww commented Aug 25, 2022

I had to do some work on wayland-debug, but eventually I was able to get a nice clean event stream.

My observations:

  • I've analyzed diffs of protcol messages between the good and bad runs. Nothing stands out
  • I often see it on Mir-on-X, but never with libinput. Saviq reports seeing it in a Miriway session, presumably on libinput
  • Looking at how FF is acting, it appears that events are arriving late. This would explain both the wrong direction scrolls, and the lack of smooth scrolling (if FF notices it's processing events late, maybe it doesn't let them animate?)
  • One possibility is something is wrong with the timestamps? Just speculation

@wmww
Copy link
Contributor

wmww commented Aug 25, 2022

Oh, I should upload the logs I got in case that's helpful. They're from a patched libwayland so if you wanted to use wayland-debug to analyze these you'd have to use this branch.

ff-bad-scroll.log
ff-good-scroll.log

@AlanGriffiths
Copy link
Contributor Author

Sorry for the lack of clarity, this was first observed on gbm-kms/libwayland. That you also see it on the X11 backend is interesting as that doesn't do the discrete/valeu120 stuff (yet).

I don't think the events are arriving significantly late, watching them logged on the client side (as described above) it seem more that the client is responding to the previous event, not that the previous event arrive late.

Anyway, thanks for the observations they all help towards identifying the problem!

@AlanGriffiths
Copy link
Contributor Author

Actually, obviously it isn't as simple as "respond to the previous event" as there can be multiple idempotent "down" events before an "up" causes the fully scrolled content to paint.

@AlanGriffiths
Copy link
Contributor Author

AlanGriffiths commented Aug 25, 2022

Some further observations:

  1. This problem also exists in Mir 2.8 (tested with egmde/stable)
  2. It isn't "consistent within a run of a Mir server", it comes and goes even within a run of Firefox
  3. Scrolling from the touchpad works fine

@wmww
Copy link
Contributor

wmww commented Aug 25, 2022

It appears it does seem to be related to timestamps. Specifically, I can trigger it reliably on all platforms by adding 100ms to the timestamp returned by mir_input_event_get_wayland_timestamp(). Returning the current std::chrono::steady_clock time fixes the problem (though probably isn't what we want to do).

Subtracting 100ms fixes the jank, but also makes smooth scrolling jump a little bit. Subtracting 500ms entirely disables smooth scrolling. From this data I conclude that Firefox is (probably accidentally) assuming the timestamps it's getting match it's internal clock in the smooth scrolling animation code.

@wmww
Copy link
Contributor

wmww commented Aug 25, 2022

Oh, the reason you might not be seeing this on the X11 platform but I have been is that I've been testing with #2579. Without that, all X11 scroll events are seen as non-discrete.

@wmww
Copy link
Contributor

wmww commented Aug 25, 2022

I think I have a pretty clear idea what's going on now. Platforms send events with timestamps. Mir also sends timestamps (for example in the frame done event). Mir wants to pass on the timestamps from the platform because, for example, a client might want to calculate how long a click was, even if one of the events got slowed down before Mir saw it. Mir doesn't want to trust the the platform (and presumably underlying library like libinput and XCB) are using the same timestamp base as Mir itself.

To solve this problem, the first time a DefaultEventBuilder sees a timestamp it records it's offset from Mir's current time. It then applies that offset to all future events.

The problem is that often the first few events are delayed in transit, but after that they arrive in a timely manor. This causes Mir to register the first few events as the correct time, and subsequent events as coming from the future. We forward these future events to Firefox, Firefox tries to use future times as a basis for it's animation code, weird results ensue.

There's a few possibilities as to how to fix it. In practice, everything seems to the same timestamp base so entirely removing the calibration would fix it. We could also continually adjust the calibration every time a higher-than expected timestamp comes in, guaranteeing that no future timestamps are ever sent. We could do a hybrid approach where if the timestamps are close enough to the system's default time they're assumed to have the same base, but otherwise an offset is added. I need to do more research before having an opinion on which option is best.

@AlanGriffiths
Copy link
Contributor Author

Thanks @wmww for tracking that down. That explanation sounds both convoluted and plausible.

If the events have timestamps why is Mir messing with them at all? Isn't the simplest thing that might work to just pass them on?

@wmww
Copy link
Contributor

wmww commented Aug 26, 2022

That is the simplest thing and it does seem to work. The problem is that since we mix in timestamps generated by Mir (and even clients in the case of the virtual pointer and keyboard protocols), we need all the timestamps to match. In practice everyone seems to be getting the same timestamps, but the problem is no one is guaranteeing they use a particular source. In the case of the virtual input device wayland protocols the base is explicitly undefined.

@bors bors bot closed this as completed in 8c562d7 Aug 26, 2022
AlanGriffiths pushed a commit that referenced this issue Aug 26, 2022
2590: Fix timestamp calibration  r=AlanGriffiths a=wmww

Fixes #2577

Prevent timestamps in the future ever being sent, and use unmodified timestamps if they're close to Mir's internal timestamps.

Co-authored-by: Sophie Winter <wm@wmww.sh>
@Saviq Saviq mentioned this issue Aug 26, 2022
bors bot added a commit that referenced this issue Aug 30, 2022
2562: Release 2.9.0 r=AlanGriffiths a=AlanGriffiths

## ABI summary:
* miral ABI bumped to 5
* mircommon ABI unchanged at 9
* mircookie ABI unchanged at 2
* mircore ABI bumped to 2
* miroil ABI bumped to 2
* mirplatform ABI unchanged at 23
* mirserver ABI unchanged at 58
* mirwayland ABI unchanged at 3
* mirplatformgraphics ABI unchanged at 20
* mirinputplatform ABI unchanged at 8
## Enhancements:
* [Wayland] Implement zwp_idle_inhibit_manager_v1
* [Wayland] Implement zwlr_virtual_pointer_v1
* [Wayland] Implement zwp_text_input_manager_v1 (Electron works with OSK)
* [Wayland] Bump wl_seat to v8 and implement hi-res scrolling (Fixes: #2176, Fixes: #2499)
* [Wayland platform] improve failed to connect error
* [Wayland platform] Port to xdg-shell (Fixes #1903, Fixes: #2434)
* [MirAL] Allow `--add-wayland-extenions all`
* [MirAL] Allow servers to get repeated string options
* [MirAL] Improvement to ExternalClientLauncher: Don't force clients to split command lines themselves
* [MirAL] Tidy up event filtering API
* [MirAL] Expose miral::Zone::id()
* [Input] Filter 2 distinct bogus touch event scenarios (UBports)
* [gbm-kms] new driver quirk to disable KMS modeset probe
* [gbm-kms] Add defaults to driver-quirks for nvidia and evdi (Fixes: #2467)
* [gbm-kms] Add defaults to driver-quirks for vc4-drm and v3d
* [mir-smoke-test-runner] Enable working in a Wayland only environment
## Bugs fixed:
* Fix ABI breakage of libmircore.so.1 with v2.8.0 vs v1.8.2 (Fixes #2465)
* [Wayland] Text Input v2: do not use commit count as serial
* [Wayland] Send keyboard modifiers after keyboard enter (Fixes: #2535, Fixes: #2025)
* [Wayland] wlr-screencopy-v1: send .damage event as required (wayvnc fix)
* [Wayland] wlr-screencopy-v1: Waits until the copy area has been damaged in `.copy_with_damage` request (wayvnc fix)
* [MirAL] Unblock signals before execing child processes (Fixes: #1284)
* [eglstream-kms] Kill clients, not Mir, when they submit bad EGLStreams (Fixes: #2061)
* [eglstream-kms] Handle EGL errors in devnum_for_device (Fixes: #2426)
* [test clients] Explicitly ask for a GLESv2 context (Fixes: #2440)
* [renderers/gl] Clear framebuffer to opaque black (Fixes: #2427)
* CMake cleanup for locally built dependencies (Fixes: #2507, Fixes: #2261)
* Fix event timestamps (Fixes: #2577)

Co-authored-by: Alan Griffiths <alan@octopull.co.uk>
Co-authored-by: bors[bot] <26634292+bors[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants