New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance regression in scrollback #1316
Comments
It appears these regressions were introduced in be89dbf and are system-dependent. According to jwilm no such performance issues can be measured on X11/i3/no compton/nvidia. |
It seems this strongly depends on Here's another quick comparison:
Important Note: It seems like using However since the changes made in the PR had the goal to increase performance, not decrease it, even with 0 lines or 1 line, it seems like this goal was not reached. |
I've run Master: https://paste.rs/Wgq |
Seems like the main issues are and . Seems like just the fundamentals that cost a lot of time here, any ideas how to fix that @jwilm? Might be possible to use the API that makes use of SIMD optimization, so we could replace the loop that swaps every single line with a single call to swap a range of lines. The only drawback is that this will not be stabilized until Rust 1.27, but that should be in the near future and might be fine. However |
The main problem with an optimization like One thing we might consider is having a fast path swap which detects when src/dst don't wrap around underlying buffer, and a more robust swap which handles the general case. The main difference would be that in the robust case, each swap needs to compute new raw indices, whereas a fast path could just compute the limits and loop over the swapping elements. In my testing, the degraded performance seems to only affect scrolling-in-region with a large number of fixed lines. This doesn't seem to degrade tmux performance at all in the case where there is a pane on top of another, and the top pane is scrolling. In this test, which looks like this The performance is similar to
Standard BenchmarksFor these benchmarks, I am just doing one run and taking the first result. The jitter from past experience is typically +- 0.2 seconds. Scrolling-in-region 1Emulates scrolling in vim or tmux (1 fixed bottom line)
Scrolling-in-region 2
Emulates scrolling in vim in tmux (2 fixed bottom lines)
Scrolling-in-region 50This is not a common case in practice, and the fact that there's a regression here isn't a big concern, IMO.
Alt-screen random write (monochrome)
Alt-screen random write (random colors)This is going to be the closest to a standard editor workload or tmux with multiple panes in the terminal. Text is written to random positions, and with some random styling. This is essentially what we've been optimizing for since the beginning.
SummaryThere are minor regressions from master in most benchmarks. In the cases that matter like having 1 or 2 fixed lines at the bottom of the screen, scrollback scrolling is much faster than master. Based on these numbers and the feel of scrollback in real use, I don't see any reason to block merging the branch. Unless of course you're still seeing dramatically different numbers. |
As discussed on IRC the scrollback branch results in a 60%+ slow down in alt-screen-random-write for me. Since this seems to be alacritty's biggest strength I'd be wary of introducing such a regression, however it's your call to make. |
@chrisduerr I'm not 100% sure how you and @jwilm are testing performance, but it seems your numbers aren't lining up. In the benchmarking results posted directly above, the numbers for alt screen random writes are very close to master. |
@nixpulvis I don't think there's any difference in how jwilm and I are testing performance, but rather that we use different systems. I'd love to see the results others get too. From what I can tell the main difference between jwilm's and my system currently are that I'm running amd + mesa and he's running nvidia graphics. |
Ok, I'll give it a shot on a few of my systems. I might need a little hand holding, I'll just find someone on IRC if I do. |
I just went through some debugging and performance testing with @chrisduerr on IRC. And we discovered some discrepancy between timing The format of this test session is as follows.
I'm lazy and I don't want to format them as beautifully as he did however, sorry.
This somewhat clearly shows how |
Thanks @nixpulvis for the elaborate breakdown of your results. I'd be interested in seeing your results @jwilm when actually writing to a file first instead of directly timing the output of Based on the results from @nixpulvis it seems like I'm not the only person with these issues, however it would probably still be useful to get some system information from @nixpulvis just to make sure in case @jwilm can't reproduce this on his machine. It might also be interesting to see some benchmarks of this on macos. |
Relevant system info. I use X11, running
|
So MacBook Pro 13 inch Iris 550 here, some weird times, discussed with chrisduerr on irc without an idea whats going on.
some 70MB plain text file, just if that helps
|
Thanks again for responding to my call for testing this on macos so quickly @monouser7dig. The results seem to indicate that there was a regression in alacritty (~25s -> ~40s) which seems to be in line with the regressions measured so far (60-100%, but the difference between benchmarks is so big that this metric probably makes little sense here). However there clearly seems to be more at play here with alacritty losing by a significant margin to both iTerm and Kitty. So not only would I take these results with a grain of salt, I'd also recommend following up in a separate issue if anyone has ideas where this could be coming from. The only thing I have in mind would be disabling transparency, otherwise I'm not sure. @monouser7dig if you'd like to open an issue in reference to this one to explore performance issues of the current master branch of your macos machine, please feel free to do so. I'm sure there's gonna be something at play here. |
without transparency (0.97 before)
|
Transparency is always enabled on master even if opacity is |
It seems like on the scrollback branch the parser is just taxed a lot more, here's a perf result I've got: You can see here that with master (right) the distribution of the most expensive methods is very good, however with scrollback (left) there are a few very "hot" and expensive functions. I'm not going to provide the full And this should allow easy analization of the result from the command line: |
The `compute_index` method in the `Storage` struct used to normalize indices was responsible for a significant amount of the CPU time spent while running the `alt-screen-random-write` benchmark (~50%). The issue with this relatively simple method was that due to how often the method is executed, the modulo operation was too expensive. However since most of the time when a majority of the buffer changes (high througput), it's in the alternate screen, the modulo operation can be skipped with a simple branch. There might be some more optimization to be done here, however this resolved all measurable issues in the `alt-sceen-random-write` benchmark. The whole `vtebench` suite hasn't been tested, however unless there were some regressions, this fixes alacritty#1316. I will benchmark this a bit more in the near future.
You mentioned from a commit that this issue was resolved; should we close it? |
I'm not quite sure. There definitely is still a regression from master to scrollback when it comes to alt-screen-random-write. However the obvious stuff has been fixed. I'd still like to keep track of places where a potential performance improvement could be achieved, but since I haven't been able to figure out how to resolve this and the performance hit is not massive, I wouldn't block scrollback on this. So I'd either keep this open to track that or maybe open a new issue to indicate that the main issues mentioned in here have been resolved. |
I'm going to close this. Our alt-screen performance is already exceptional, and additional investment there may only return marginal improvements. |
I've just tested the scrollback PR because I wanted to make sure that scrolling is still faster than on master, and it is. However, while testing it I've noticed that there were some significant regressions comparing current master to the scrollback PR.
Test Results:
These tests are based on https://github.com/jwilm/alacritty.
I'm not sure why I haven't noticed this earlier, but it seems like these regressions have been present ever since the scrollback PR was created. Especially the regression in
scrolling-in-region
seems quite massive.These results were captured on X11/i3/compton/amd/mesa.
The text was updated successfully, but these errors were encountered: