Improve input latency #673

Closed · jwilm opened this issue Jul 18, 2017 · 36 comments

@jwilm (Collaborator) commented Jul 18, 2017

First, for background, please see https://danluu.com/term-latency/

In the case of Alacritty, we have a worst-case latency (assuming reasonable draw times) of 3 VBLANK intervals. Here's the scenario:

  1. Input arrives just after a VBI, and a latency of VBI - draw_time is added
  2. After swapping buffers, input is handled. The key press event is sent to the program running within Alacritty, but a response won't be available until after the next VBI. One more VBI of latency is added.
  3. After swapping buffers again, the shell (for example) would have told the terminal to display a character. After drawing, we must wait one more VBI to see the result.

In total, that's 3 * VBI - draw_time. In a perfect world, draw_time is zero, and our worst case input latency is 3 VBI.
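To put numbers on that formula: at a 60 Hz refresh rate one VBLANK interval is about 16.7 ms, so the worst case works out to roughly 50 ms before the echoed character is visible. A back-of-the-envelope sketch (the 60 Hz refresh rate and 2 ms draw time are assumed example values, not measurements from this issue):

```rust
fn main() {
    let vbi_ms = 1000.0 / 60.0; // one VBLANK interval at 60 Hz ≈ 16.7 ms
    let draw_time_ms = 2.0;     // assumed draw time, purely illustrative

    // Step 1: input lands just after a VBI -> wait (VBI - draw_time)
    // Step 2: the child program's response misses the next frame -> +1 VBI
    // Step 3: the newly drawn frame is only shown at the following VBI -> +1 VBI
    let worst_case = 3.0 * vbi_ms - draw_time_ms;
    println!("worst-case input latency ≈ {worst_case:.1} ms");
}
```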

This can be resolved by moving the rendering to a separate thread. Certain windowing APIs require input processing to occur on the main thread, so input processing must stay in place. In the same scenario as described above, we can reduce the worst case to 2 VBLANK intervals. With input processing on its own thread, it no longer needs to wait for swap_buffers to return. Key press events can be sent to the terminal immediately, which means any drawing the child program does will be available to draw on the very next frame.
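A minimal sketch of that split using only std threads and channels; the names here (RenderCommand, the PTY comments) are hypothetical and are not Alacritty's actual internals:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Hypothetical message from the input/PTY side to the renderer.
enum RenderCommand {
    Redraw,
    Shutdown,
}

fn main() {
    let (tx, rx) = mpsc::channel::<RenderCommand>();

    // Render thread: owns the GL context and blocks on swap_buffers/vsync.
    let renderer = thread::spawn(move || {
        while let Ok(cmd) = rx.recv() {
            match cmd {
                RenderCommand::Redraw => {
                    // draw_frame(); swap_buffers();  // blocks until vblank
                    thread::sleep(Duration::from_millis(16)); // stand-in for vsync
                }
                RenderCommand::Shutdown => break,
            }
        }
    });

    // Main/input thread: never waits on swap_buffers, so key presses can be
    // forwarded to the PTY immediately and a redraw is merely queued.
    for _keypress in 0..3 {
        // write_to_pty(key); // forward input right away (hypothetical helper)
        tx.send(RenderCommand::Redraw).unwrap();
    }
    tx.send(RenderCommand::Shutdown).unwrap();
    renderer.join().unwrap();
}
```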

@restfuladi commented Aug 13, 2017

Would something like Nvidia's Fast Sync help in this case?

@osa1 commented Sep 19, 2018

I wrote this on the /r/rust thread and someone asked me to file a ticket but I thought a comment in the existing ticket may be better, so here it is.

On my setup there's noticeable input lag in Alacritty, compared to other terminals like st and konsole. I don't know how to measure it and I'm happy to help with getting some numbers/debugging/profiling etc.

Here's how I notice the lag: when I keep a key pressed so that it repeats (e.g. arrow keys or hjkl in vim), after I release the key it still repeats one or more times, whereas in other terminals the key repeat stops immediately. This makes alacritty feel like it's lagging behind my actual key presses (as if key press events are queued up but alacritty is not fast enough, so even after I stop pressing it is still handling old key press events).

My key repeat settings: 260ms repeat delay and repeat speed of 55 key strokes per second (xset r rate 260 55).

Secondly, when typing I notice that it takes slightly longer in alacritty for the letters to appear. But this isn't as serious an issue, and I'd probably get used to it if the other issue were fixed.

The problem with performance and latency issues is that everything is fast enough until you see something faster (e.g. 30 FPS in games was acceptable years ago; now anything below 60 FPS seems laggy), so it's hard to talk about them. Let me know if I can provide anything to diagnose the problem.

@jwilm (Collaborator, Author) commented Sep 19, 2018

@osa1 thanks for posting this feedback here; it's really helpful. Can you share which window manager and graphics driver you're using? One final question: may we ping you when there are patches ready for evaluation, to see if they address the problem?

@osa1 commented Sep 19, 2018

Can you share which window manager and graphics driver you're using?

I have two systems and I can observe this in both. Both systems are Xubuntu
18.04 running i3 as WM. This is what I have on my laptop:

~ $ sudo lshw -c video
  *-display
       description: 3D controller
       product: GM107M [GeForce GTX 960M]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:01:00.0
       version: a2
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress bus_master cap_list rom
       configuration: driver=nouveau latency=0
       resources: irq:137 memory:dc000000-dcffffff memory:b0000000-bfffffff memory:c0000000-c1ffffff ioport:e000(size=128) memory:dd000000-dd07ffff
  *-display
       description: VGA compatible controller
       product: HD Graphics 530
       vendor: Intel Corporation
       physical id: 2
       bus info: pci@0000:00:02.0
       version: 06
       width: 64 bits
       clock: 33MHz
       capabilities: pciexpress msi pm vga_controller bus_master cap_list rom
       configuration: driver=i915 latency=0
       resources: irq:126 memory:db000000-dbffffff memory:70000000-7fffffff ioport:f000(size=64) memory:c0000-dffff

I don't have access to my desktop right now, but I can provide the output of the same command tomorrow. It has a GTX 1080 and uses the driver installed by Ubuntu's driver manager. Here's my desktop:

~ $ sudo lshw -c video
  *-display
       description: VGA compatible controller
       product: GP104 [GeForce GTX 1080]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:01:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
       configuration: driver=nvidia latency=0
       resources: irq:132 memory:de000000-deffffff memory:c0000000-cfffffff memory:d0000000-d1ffffff ioport:e000(size=128) memory:c0000-dffff

One thing I can add is that the latency is much more significant on my laptop
than on my desktop (probably because my desktop is much faster both CPU and
GPU-wise).

One final question: may we ping you when there are patches ready for evaluation, to see if they address the problem?

Of course! Let me know if there's anything I can help with.

@sstewartgallus commented Sep 20, 2018

Just so you know, X11 locking is kind of a pain, so you might not want to go down the path of spawning a separate thread. This probably isn't a good idea for typical applications, but one trick I've found is that it is perfectly reasonable to open a separate connection specifically for rendering on a different thread. The X server is still single-threaded, of course, but this way it pretty much handles locking automatically. It also means you can have a conventional blocking input loop and a separate rendering loop. This is mostly only useful for games, though. Another advantage is that the different threads can use different X libraries: as I recall, OpenGL requires the use of Xlib, but the separate thread (or even a separate process) can avoid Xlib entirely.
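To make the two-connection idea concrete, here is a minimal sketch using the x11rb crate (purely illustrative: Alacritty itself goes through glutin/Xlib, and window creation, event-mask selection, and GL setup are all elided):

```rust
use std::thread;
use x11rb::connection::Connection;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Connection used by the main thread for a conventional blocking input loop.
    let (input_conn, _screen) = x11rb::connect(None)?;

    // A completely separate connection for the render thread; the X server
    // serializes requests per connection, so the two threads share no
    // client-side locking at all.
    thread::spawn(|| {
        let (render_conn, _screen) = x11rb::connect(None).expect("render connection");
        let _ = render_conn.setup(); // create windows/GL/render resources here
        // render loop would live here
    });

    // Blocking input loop on the main thread (a real program needs a window
    // with an event mask selected before any events will actually arrive).
    loop {
        let _event = input_conn.wait_for_event()?;
        // handle input, forward to the PTY, notify the render thread, etc.
    }
}
```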

@MarcoPolo commented Oct 11, 2018

Hi! Just wanted to add my experience here too.

macOS, MBP 2015, driving a monitor at 3360 x 1890.
Terminal.app is pretty responsive, while Alacritty has very noticeable lag.
Test case: just typing into a blank prompt (default settings with a blank zsh config) while the terminal is full screen.

Lag is not bad when alacritty is smaller.

Also happy to try out any patches or help in any other way. Thanks!

Attached recordings: one screen capture of Alacritty and one of Terminal.app (animations not reproduced here).

@chrisduerr (Collaborator) commented Oct 11, 2018

@MarcoPolo While this is related to this issue, what you're showing looks like an actual bug somewhere. Typing performance in Alacritty shouldn't be noticeably bad; this issue is just about improving what is already good.

I'm not using macOS, but I'd be interested in knowing which branch you're running and whether you are actually running with a dedicated GPU. This might be related to #1348, so I'd be interested to see what your benchmark times look like. If they are extremely bad, as shown in that issue, it's probably best to follow up there.

@MarcoPolo commented Oct 11, 2018

Built from master (commit: d2ed015)
Integrated gpu (Intel Iris Graphics 6100 1536 MB)

Just ran the same benchmark:

cargo run --release -- -b 50000000 -w $(tput cols) -h $(tput lines) -c alt-screen-random-write > out.vte
> time cat out.vte

On Terminal.app (the snappier one):
cat out.vte 0.00s user 0.46s system 15% cpu 2.977 total

On Alacritty:
cat out.vte 0.00s user 0.58s system 36% cpu 1.591 total

So even though Alacritty feels slower, the benchmark is faster, and my guess is it's not related to that issue. Is there another branch I should try? Thanks!

@chrisduerr (Collaborator) commented Oct 11, 2018

Yeah, that looks like it's actual input latency rather than rendering issues or similar.

You could try PR #1403; that's the only other branch which has a chance of improving this situation.

@MarcoPolo commented Oct 11, 2018

Just tried it out. Feels about the same. I think this might be an issue related to dpi scaling(?)

Alacritty feels snappier at my monitor's native resolution of 3840 x 2160 and much laggier at the scaled resolution of 3360 x 1890.

Display settings screenshot attached for reference (not reproduced here).

Would you like me to create a separate issue?

@chrisduerr (Collaborator) commented Oct 11, 2018

Yeah, I would recommend taking this to a separate issue.

@restfuladi commented Oct 12, 2018

@SoniEx2 commented Jan 27, 2019

I guess one could provide an "input latency hack" where input is sent to the GPU immediately. This gives 1 VBI of latency but may cause input to pop in and out with some programs.

sudo and things like it would still work fine, because they emit control codes to hide input, but this is probably best as an opt-in rather than a default.

@notriddle commented Jan 27, 2019

The problem is that libreadline applications, like bash, turn off the terminal's built-in echoing support just like sudo does. So you either get jank from sudo, or the single most popular terminal application hits the slow path.
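For context, this is roughly what sudo and readline-based programs do on the application side (a sketch using the libc crate, not code from any of those projects); from the emulator's point of view, both cases look identical — echo is simply off:

```rust
// Sketch of how a program such as sudo or a readline-based shell turns off
// the terminal's built-in echo (libc crate; error handling kept minimal).
use libc::{tcgetattr, tcsetattr, termios, ECHO, STDIN_FILENO, TCSANOW};
use std::mem;

fn set_echo(enabled: bool) {
    unsafe {
        let mut tio: termios = mem::zeroed();
        if tcgetattr(STDIN_FILENO, &mut tio) != 0 {
            return; // stdin is not a terminal, nothing to do
        }
        if enabled {
            tio.c_lflag |= ECHO;
        } else {
            tio.c_lflag &= !ECHO;
        }
        tcsetattr(STDIN_FILENO, TCSANOW, &tio);
    }
}

fn main() {
    // The emulator cannot distinguish "hide a password" from "readline will
    // echo on its own"; both just clear the ECHO flag.
    set_echo(false);
    // ... read input without echo ...
    set_echo(true);
}
```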

@SoniEx2 commented Jan 27, 2019

Hm, what the hell does mosh do then?

@notriddle commented Jan 27, 2019

Ugly, ugly heuristics. The sort of thing that should not be necessary on a local machine.

Quoting the Mosh developers:

The other major benefit of working at the terminal-emulation layer is that the Mosh client is free to scribble on the local screen without lasting consequence. We use this to implement intelligent local echo. The client runs a predictive model in the background of the server's behavior, hypothesizing that each keystroke will be echoed at the cursor location and that the backspace and left- and right-arrow keys will have their traditional effect. But only when a prediction is confirmed by the server are these effects actually shown to the user. (In addition, by default predictions are only displayed on high-delay connections or during a network “glitch.”) Predictions are done in epochs: when the user does something that might alter the echo behavior — like hit ESC or carriage return or an up- or down-arrow — Mosh goes back into making background predictions until a prediction from the new batch can be confirmed as correct.

@SoniEx2 commented Jan 27, 2019

What happens if you do it on a local machine?

@notriddle commented Jan 27, 2019

But only when a prediction is confirmed by the server are these effects actually shown to the user. (In addition, by default predictions are only displayed on high-delay connections or during a network “glitch.”)

In other words, mosh doesn't do the predictive echoing all the time, because it wants to avoid stray characters where it gets the prediction wrong. Assuming you went ahead and did predictive echoing even though a local terminal should never hit mosh's high-latency trigger limit:

  • It would not be able to do it until after the first keypress, because it needs to confirm it first
  • It is split by kind, so after a bunch of 1VBI letter presses, you get a 2VBI backspace
  • It gets reset every time you enter a special key (Enter, Escape, Ctrl-*)
  • It will still sometimes be wrong, so you get a glitch

Accepting the possibility of glitch letters appearing just to decrease latency by 1/60th of a second under very specific circumstances doesn't seem worth it.
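To make the epoch/confirmation mechanism concrete, here is a rough, purely illustrative sketch of mosh-style prediction bookkeeping (not mosh's actual code, and not something Alacritty implements):

```rust
// Predictions are made per "epoch" and only trusted once the server has
// confirmed earlier ones; any mis-prediction or special key resets the state.
struct Prediction {
    ch: char,
    confirmed: bool,
}

struct Predictor {
    epoch: u64,
    pending: Vec<Prediction>,
}

impl Predictor {
    fn new() -> Self {
        Predictor { epoch: 0, pending: Vec::new() }
    }

    fn key_press(&mut self, ch: char) {
        if ch.is_control() {
            // Enter, Escape, Ctrl-* may change echo behaviour: start a new
            // epoch and fall back to background-only predictions.
            self.epoch += 1;
            self.pending.clear();
        } else {
            self.pending.push(Prediction { ch, confirmed: false });
        }
    }

    // Echo coming back from the application confirms or refutes predictions.
    fn server_echo(&mut self, ch: char) {
        match self.pending.iter().position(|p| !p.confirmed) {
            Some(i) if self.pending[i].ch == ch => self.pending[i].confirmed = true,
            Some(_) => self.pending.clear(), // mis-prediction: drop speculative text
            None => {}
        }
    }
}

fn main() {
    let mut pred = Predictor::new();
    pred.key_press('l');
    pred.server_echo('l'); // confirmed: later keys this epoch could be echoed early
    pred.key_press('s');
    pred.key_press('\r'); // new epoch; speculative state is reset
    println!("epoch {}, pending {}", pred.epoch, pred.pending.len());
}
```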

@mstoeckl (Contributor) commented Jan 31, 2019

I bring measurements! Of output latency!

(Using https://github.com/mstoeckl/latencytool, a 187 Hz camera, a 60 Hz QHD display, and uncomposited X11. This may not be a realistic benchmark, but it is very easy to perform. Expect ±10 ms uncertainty for the 99th percentile, and ±3 ms on averages.)
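The stimulus side of such a measurement is tiny; a loop along these lines drives the color switches, while the camera-and-threshold detection lives in latencytool itself (the 100 ms interval matches the "relatively long intervals between color switches" mentioned later in this thread):

```rust
use std::io::{self, Write};
use std::thread;
use std::time::Duration;

// Alternate the whole screen between white and black; a camera pointed at the
// display measures how long each transition takes to become visible.
fn main() -> io::Result<()> {
    let mut out = io::stdout();
    loop {
        out.write_all(b"\x1b[47m\x1b[2J")?; // white background, clear screen
        out.flush()?;
        thread::sleep(Duration::from_millis(100));
        out.write_all(b"\x1b[40m\x1b[2J")?; // black background, clear screen
        out.flush()?;
        thread::sleep(Duration::from_millis(100));
    }
}
```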

With a small (400x400 px) window, here are some timings between sending \e[47m\e[2J (or \e[40m\e[2J) and the screen crossing a specific brightness threshold:

Terminal Average 99th %ile
Alacritty 26 ms 44 ms
Kitty 43 ms 60 ms
Konsole 42 ms 46 ms
st 27 ms 47 ms
Termite 55 ms 63 ms
xterm 25 ms 28 ms

With a full screen (2560x1401) window, same environment/lighting as above:

Terminal Average 99th %ile
Alacritty 53 ms 81 ms

Slightly modified full screen testing environment, (2560x1401):

Terminal Average 99th %ile
Alacritty 46 ms 63 ms
Kitty 43 ms 64 ms
Konsole 68 ms 80 ms
st 33 ms 48 ms
Termite 67 ms 75 ms
xterm 25 ms 32 ms

On sway, full screen, there is no significant difference relative to X11:

Terminal Average 99th %ile
Alacritty 44 ms 56 ms

With an [unoptimized + debuginfo] build:

Terminal Average 99th %ile
Alacritty 76 ms 110 ms

Why this performance reduction? Just to guess, the shaders are too complicated and my underclocked GPU can't keep up. (Compare with a fragment shader that does only color = mix(bg, fg, texture(mask, texcoords)), and a vertex shader that only passes through flat fg, bg, and precalculated texcoords.) Edit 4: This was a bad guess; parsing is CPU-limited, and rendering doesn't seem to take more than 5 ms of GPU time, even at full screen.

Edit: Updated with more fullscreen measurements. Font size choices can make a big difference, and some terminal emulators may have optimized clear screen (\e[2J) operations. If the results seem counterintuitive, then keep in mind the uncertainty of these measurements.

Edit 2: The unoptimized build is surprisingly slow. On clear screen, it calls a function per grid cell; for comparison, xterm's clear screen operation is almost a memset, and can perform ~1200 full screen clears per second. (2300 fps is the theoretical maximum, at a 25.6 GB/s data transfer rate.)
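The contrast described in Edit 2 is roughly the one below (a generic, std-only illustration, not Alacritty's or xterm's actual code; in an optimized build the compiler may well inline the per-cell call away):

```rust
use std::time::Instant;

const COLS: usize = 320;
const LINES: usize = 90;

fn main() {
    let mut grid = vec![0u8; COLS * LINES];

    // Per-cell path: one call (here, a closure) per grid cell.
    let clear_cell = |cell: &mut u8| *cell = b' ';
    let t = Instant::now();
    for _ in 0..1000 {
        for cell in grid.iter_mut() {
            clear_cell(cell);
        }
    }
    println!("per-cell clears: {:?}", t.elapsed());

    // Bulk path: effectively a memset over the whole grid.
    let t = Instant::now();
    for _ in 0..1000 {
        grid.fill(b' ');
    }
    println!("bulk clears:     {:?}", t.elapsed());
}
```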

Edit 3: For alacritty, the dark->light transitions of the test take slightly longer than the light->dark transitions. After looking with apitrace, it turns out that, since my background color is black, almost all Cells have is_empty return true and are not drawn.

Edit 5: In the test above, Kitty was run with the default double buffering, 10 ms output delay, and 3 ms input delay (a power-saving policy). Switching to single buffering and 0 ms input/output delay, I observe 27-30 ms full-screen average render times, with a 99th percentile of ~45 ms. It remains unexplained why alacritty takes so long to render full screens in comparison.

@chrisduerr (Collaborator) commented Jan 31, 2019

@mstoeckl Thanks a ton for looking into this. It's always great to have some more benchmarks especially in areas that are hard to automate.

Just out of curiosity, are you running an AMD or Nvidia GPU? Because there are some performance-relevant workarounds in the renderer which might affect this.

@jwilm (Collaborator, Author) commented Jan 31, 2019

@mstoeckl seconding the massive thank you! Really interesting data you've gathered here. Out of curiosity, do you have a similar table comparing terminals at full screen window size?

Separately, did you mean to link the Kitty shaders in your second link?

@mstoeckl (Contributor) commented Jan 31, 2019

Just out of curiosity, are you running an AMD or Nvidia GPU? Because there are some performance-relevant workarounds in the renderer which might affect this.

Intel HD5500, i915, UXA, X11, i3. Due to relatively long intervals between color switches (~100ms), the GPU was, AFAIK, sitting at 450 MHz for all the tests.

Out of curiosity, do you have a similar table comparing terminals at full screen window size?

I can make one, but that will need to wait until the weekend. As most terminals do partial updates for tasks in which low latency is desired, full screen tests are not as useful. (I do have similar data for application toolkit latency -- with the most efficient methods (xcb, framebuffer), a full screen color switch has a 25 ms average latency between command and camera measurement.)

Separately, did you mean to link the Kitty shaders in your second link?

As an example of complicated shaders :-) Running hexdump </dev/urandom, and watching engine busyness with (the recent update of) intel_gpu_top implies kitty's rendering is roughly twice as expensive, per area, as that of alacritty. (The proper tool to measure this would be a GPU profiler, but I don't have one set up yet.)

@mstoeckl (Contributor) commented Feb 3, 2019

I have a patch that reduces the total amount of work done by the GPU, on my computer, for realistic inputs, by about 50%. This has minor latency impact (maybe 0.5ms ?). (My graphics system performs the fragment discard in text.f.glsl as late as possible, so rendering no background is about as expensive as rendering a background. On programs like top, ccmake, kismet, yast2, etc, most cells with background have no text, and most cells with text have no background. The following patch neglects to send discarded background rectangles and empty glyphs.)

0001-Distinct-render-batches-for-background-and-text.patch.txt

I'm posting this here because I don't expect to have the experience/motivation/time to make a proper changeset in the near future. If anyone wants to pick this up, the following things are advised:

  • The rect shader should perform the background rendering, and the foreground/background batches should be further disentangled
  • Use run-length encoding or some other heuristic to reduce the total number of background rectangles transferred (a rough sketch of this idea follows below).
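A rough sketch of that run-length idea, with made-up types purely for illustration (these are not Alacritty's renderer types):

```rust
// Merge horizontal runs of cells sharing a background color into one rectangle
// per run, so far fewer background quads have to be uploaded to the GPU.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Rgb(u8, u8, u8);

#[derive(Debug)]
struct BgRect {
    line: usize,
    col: usize,
    width: usize,
    color: Rgb,
}

fn background_runs(line: usize, cells: &[(Rgb, bool)]) -> Vec<BgRect> {
    // Each cell is (background color, uses_default_background).
    let mut rects = Vec::new();
    let mut run: Option<BgRect> = None;

    for (col, &(color, is_default)) in cells.iter().enumerate() {
        let extend = matches!(&run, Some(r) if r.color == color) && !is_default;
        if extend {
            // Same color as the current run: just widen the rectangle.
            run.as_mut().unwrap().width += 1;
        } else {
            if let Some(r) = run.take() {
                rects.push(r);
            }
            // Default-background cells are skipped entirely (nothing to draw).
            if !is_default {
                run = Some(BgRect { line, col, width: 1, color });
            }
        }
    }
    if let Some(r) = run.take() {
        rects.push(r);
    }
    rects
}

fn main() {
    let red = Rgb(255, 0, 0);
    let def = Rgb(0, 0, 0);
    let line = [(red, false), (red, false), (def, true), (red, false)];
    // Two rectangles instead of three cell-sized ones.
    println!("{:?}", background_runs(0, &line));
}
```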
@mstoeckl (Contributor) commented Feb 10, 2019

Using single buffering reduces the average time-to-camera on my screen by ~3 ms for a full screen window. The time between sending \e[2J and a single-pixel X11 screenshot indicating a color change is far less variable with single buffering; the 99th percentile (computed over ~2000 samples) is 27 ms with double buffering and 19 ms with single buffering. (Kitty, full-screen, when single buffering is enabled and input/output delay is set to 0, has a 99th percentile time of 8 ms. [Its 99.9th percentile, on the other hand, is close to that of Alacritty.])

@anarcat commented Apr 9, 2019

Anecdotally, I see that 0.3.0 hasn't significantly improved since I last looked at this:

[Attached: latency comparison chart of uxterm-custom, alacritty-0.3, and urxvt-custom]

Still within the average of most terminals though - nothing to be ashamed of, but it would be really nice to get below that 10 ms threshold, although in my experiments I found that is really hard, if not impossible, the second a compositor steps in with double buffering - even xterm fails to get below 10 ms then.

@dm17 commented Dec 16, 2019

Has anyone reproduced these benchmarks with the latest Alacritty?
https://lwn.net/Articles/751763/

@chrischen commented May 3, 2020

Just want to add that, comparing the latest kitty to the latest alacritty, kitty is noticeably more responsive for me when typing; it seems they default to 100 FPS. However, it's not always desirable to do this (iTerm can turn off GPU rendering on battery).

One way to make the lag more obvious is to use your trackpad to scroll up and down really fast. Doing this on a macbook you can feel that kitty is definitely more responsive.

Also not sure if it's related, but when starting nvim in alacritty you can see the background fill from top to bottom, whereas in kitty it appears instantly.

(On a 2.4 ghz core i9 macbook, whether on intel integrated or AMD 5500M)

@kchibisov (Member) commented May 3, 2020

Oh, macOS. I really wonder what is going on with input on macOS, since it's just slow compared to Wayland/X11. One thing that could help is scheduling frames to render closer to vblank, but there could also be issues in our windowing stack that make it slow. Maybe we're using something that macOS doesn't like, making it even slower.

@restfuladi commented May 4, 2020

What is Terminal.app doing to achieve such low latency? Are they using custom hooks, like the Windows hardware cursor that bypasses the display manager? Its throughput/framerate can't touch Alacritty, but the typing experience feels much better.

@rtfeldman commented Jun 3, 2020

we have a worst-case latency (assuming reasonable draw times) of 3 VBLANK intervals

For what it's worth, I've been told that the way browsers minimize latency for text input is to skip double buffering (for text input) and draw immediately without waiting for vsync.

(For animations they do wait for vsync, to avoid tearing.)

One reason other terminal emulators may have lower input latency is that they aren't going out of their way to double buffer, and since they aren't using OpenGL, they aren't getting double buffering by default either.

If tearing doesn't seem to be a problem for them in practice, it probably wouldn't be for Alacritty either!

I don't know how OpenGL controls those settings, but I have a side project using wgpu, and if I switch PresentMode between Immediate and any of the modes that wait for vsync, the input latency is instantly, noticeably worse when waiting for vsync, yet neither mode exhibits tearing (for text input at least).
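For reference, in wgpu that trade-off boils down to the PresentMode chosen when configuring the surface; a minimal sketch of the decision (the remaining SurfaceConfiguration fields vary between wgpu versions and are omitted here):

```rust
// Choosing a present mode is the whole knob in wgpu:
//  - Fifo      -> classic vsync: no tearing, but an extra frame of latency
//  - Mailbox   -> triple buffering: low latency without tearing, if supported
//  - Immediate -> present as soon as possible: lowest latency, may tear
fn present_mode(low_latency: bool) -> wgpu::PresentMode {
    if low_latency {
        wgpu::PresentMode::Immediate
    } else {
        wgpu::PresentMode::Fifo
    }
}

fn main() {
    // The returned value would be assigned to SurfaceConfiguration::present_mode
    // before calling surface.configure(&device, &config).
    println!("{:?}", present_mode(true));
}
```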

@aruhier commented Jul 12, 2020

I opened a PR to explicitly disable vsync: #3955
It only has an impact on non-Wayland systems, as Wayland does not wait for vsync.

I didn't do any benchmarks though; please feel free to test it!

@chrisduerr (Collaborator) commented Jul 15, 2020

Closing in favor of #3972 which provides a concrete solution to the problem.

While there might always be some latency issues, I think that after resolving #3972 all the low-ish hanging fruit will be dealt with, and we should look at specific problems rather than the general "improve input latency" (which would never be "done").

@chrisduerr chrisduerr closed this Jul 15, 2020
@kchibisov kchibisov unpinned this issue Jul 15, 2020
@anarcat commented Jul 15, 2020

Will #3972 provide some sort of hard upper limit on latency?

@aruhier commented Jul 15, 2020

will #3972 provide some sort of a hard upper limit to latency?

I don't think so: in the measurements I did when disabling vsync (in PR #3955), you could still see some latency spikes. But it would allow a lower average latency, mostly by saving one frame.

@bb010g commented Jul 17, 2020

@chrisduerr Would an input latency tracking issue be alright, to gather relevant issues & PRs over input latency and be edited with new issue/pull numbers over time? (Or could this issue be converted into a tracking issue, if it's not too cluttered?)

@chrisduerr (Collaborator) commented Jul 17, 2020

As I've just stated, no, that would not be alright.

There isn't a lot left to do when it comes to input latency, so there's no point in keeping an issue open indefinitely just to track something that is too vague to have any actual fix.

If you want to follow an issue, look at #3972; if you have concrete issues after that is fixed, you should open a specific issue outlining the problems.
