Multi-line editing becomes unusable with large selections #5025
Probably related to issue #2078, if not the same.
Bringing the discussion from #5678 (that's a nice number! 😉)
@nathansobo: first off, let me thank you for your kind words. They're part of the reason why I enjoy contributing to open source: they really make my day.
Definitely 😄 Before your reply, I was collecting some benchmarks to check where the time was being spent while moving multiple cursors or typing a character on many lines. It turned out that it's exactly as you describe:
In fact, the performance bottleneck arises because we recompute everything for each change, wasting quite a lot of time performing the same operation over and over (which, as you point out, could be executed once at the end of a change). Your redesign of the presenter should help here.
Either choice will definitely speed things up, but I believe the first path would be more beneficial than the second. The latter would still be a coarse-grained way of handling updates, whereas I am convinced we need to be very strict about which changes we apply and when we apply them. I made some experiments with heuristics, but they do not feel quite right (moreover, they always need some kind of threshold/configuration, which I'd prefer to avoid wherever possible). Following the first path, I believe the key idea is to emit more detailed events from the model. We could do this using two techniques:
To sum up: in order to be fast, we should update only what actually changed, and only when it changes. Moreover, I think it could be nice to set a goal: starting from which number of lines would it be considered acceptable to be slow? This way we can understand whether it's fine to perform micro-optimizations, or whether we need to change dramatically how things work under the hood. I'll try my best to spot what can be improved, but any suggestion is welcome and appreciated 🙇
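As a sketch of what more detailed model events might look like (all names here are hypothetical, not Atom's actual API), the model could report exactly which rows a change touched and how the row count shifted, and a listener could fold many such fine-grained events into a single minimal dirty range:

```javascript
// Hypothetical sketch: instead of a bare "did-change" signal, the model
// reports which rows changed and the net row delta, so listeners can update
// only what actually moved.
class ChangeEmitter {
  constructor() {
    this.listeners = [];
  }
  onDidChange(listener) {
    this.listeners.push(listener);
  }
  emitChange(start, end, rowDelta) {
    // start/end: the buffer row range that was touched.
    // rowDelta: net rows added (positive) or removed (negative).
    for (const listener of this.listeners) listener({ start, end, rowDelta });
  }
}

// A listener can fold many fine-grained events into one minimal dirty range,
// rather than recomputing the world per event.
function trackDirtyRange(emitter) {
  const dirty = { start: Infinity, end: -Infinity, rowDelta: 0 };
  emitter.onDidChange(({ start, end, rowDelta }) => {
    dirty.start = Math.min(dirty.start, start);
    dirty.end = Math.max(dirty.end, end);
    dirty.rowDelta += rowDelta;
  });
  return dirty;
}
```

The point is only the shape of the event; the real presenter would consume these ranges instead of accumulating them.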
I like your thinking on making all incremental updates as minimal and optimal as possible first. The whole reason I switched to the synchronous design was to make it easier to compute a minimal update, so it would definitely be good to take full advantage of that. As for emitting some kind of explicit batching event, here's where it gets fuzzy for me. We already have the concept of a transaction, but sometimes a transaction wraps only one change. We currently allow an undo grouping interval to be supplied along with transactions, so perhaps we could supply another option in specific cases indicating that view updates should be batched. My initial thought was to just kick into batch mode whenever we saw more than 2 or 3 of the same kind of event. But that's sub-optimal in that we still need to pay for the initial incremental updates only to blow them away. Being precise is probably better. Maybe you're right that we can make the incremental updates so fast that it won't matter. Seems like the right place to attack first.
Ideally, no operation in Atom is slow. I'm not sure how useful a goal that is, though. As I mentioned previously, I have some ideas for improving things here, but that is all a pretty radical redesign of the model layer... I think there's a lot of low-hanging fruit at the presenter and view layers we could address first. I'm of course open to any big design changes you have in mind if you think any are warranted, but I'm curious to see how far we can get with micro-optimizations on the existing presenter/view structure. Thanks again!
@as-cii Another thought... I noticed that the
That's an objective we should pursue whenever possible and, for sure, some events are eligible for such an improvement. On the other hand, I have made several experiments (and reasoned a bit more carefully about the code), and it turns out that such improvements won't remove many of the repetitive updates we currently perform. Let's take multi-line insertions as an example and pretend for a second that this code actually works:

```coffee
# Inside TextEditorPresenter
@model.onDidChange (event) => @updateLinesState(event.start, event.end)
```

This would update only the rows that changed, but unfortunately it has to update subsequent lines as well if, for instance, a newline was inserted. If we imagine this happening over and over, we end up saving very little time at the cost of complicating the code unnecessarily. What I mean by this example is that a change on one row can potentially ripple through other rows, and that's inefficient if we consider that we may have a very large number of such changes. Now, consider this branch: with a relatively small change, we gain a lot of speed ⚡. I am not opening a pull request yet because there are some issues with the core specs (a few of them do not pass, but they should be quite easy to fix) and I want to refactor it a bit.
I introduced a `mutateSelectedText` method:

```coffee
# Extended: Mutate the text of all the selections in a single transaction.
#
# All the changes made inside the given {Function} can be reverted with a
# single call to {::undo}.
#
# * `fn` A {Function} that will be called once for each {Selection}. The first
#   argument will be a {Selection} and the second argument will be the
#   {Number} index of that selection.
mutateSelectedText: (fn) ->
  @transact => fn(selection, index) for selection, index in @getSelections()
```

@nathansobo: what I like about this approach is, primarily, its non-invasiveness: very simple code for a significant boost, which lets us shift our attention to the bigger performance issues that happen at a lower level, as you illustrated in your comment (I would be happy to take part in that refactoring as well 😄). Quick side note: I have seen that a significant amount of time is spent inside the code that merges selections.
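To illustrate the contract, here is a toy stand-in (not Atom's implementation; `MiniEditor` and its fields are invented) showing how wrapping the per-selection mutations in one transaction makes a single undo revert all of them:

```javascript
// Toy model of the mutateSelectedText contract: every per-selection
// mutation runs inside one transaction, so a single undo() reverts all of
// them at once.
class MiniEditor {
  constructor(lines, selectedRows) {
    this.lines = lines;
    this.selectedRows = selectedRows; // stand-in for Selection objects
    this.undoStack = [];
  }
  transact(fn) {
    const snapshot = this.lines.slice(); // checkpoint before the batch
    fn();
    this.undoStack.push(snapshot); // one undo entry for the whole batch
  }
  mutateSelectedText(fn) {
    this.transact(() => {
      this.selectedRows.forEach((row, index) => fn(row, index));
    });
  }
  undo() {
    if (this.undoStack.length > 0) this.lines = this.undoStack.pop();
  }
}
```

With 500 selections this records one undo checkpoint instead of 500, which is where the batching win comes from.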
Of course, low-hanging fruits like the one you're describing here must be discovered and addressed. For instance, an easy fix for that could be to simply wrap the selection code in a single transaction. Another neat trick could be to keep the selections ordered as we add them, so that we don't need to sort them when merging. I am open to any suggestions as usual and, most importantly, I'd like to hear your opinion about my proposal. Thanks 😊
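As a hedged sketch of that ordering trick (function names invented), keeping selections sorted at insertion time makes merging intersecting selections a single linear pass with no sort step:

```javascript
// Keep selections ordered on insertion via binary search, so merging
// intersecting selections never needs to sort.
function insertSorted(selections, sel) {
  let lo = 0, hi = selections.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (selections[mid].start < sel.start) lo = mid + 1;
    else hi = mid;
  }
  selections.splice(lo, 0, sel);
}

// One linear pass over already-sorted selections: extend the previous
// merged range when the next selection overlaps it, otherwise start a new one.
function mergeIntersecting(selections) {
  const merged = [];
  for (const sel of selections) {
    const last = merged[merged.length - 1];
    if (last && sel.start <= last.end) {
      last.end = Math.max(last.end, sel.end);
    } else {
      merged.push({ ...sel });
    }
  }
  return merged;
}
```

This trades an O(n log n) sort per merge for an O(log n) binary search per insertion, which pays off when merges happen far more often than insertions.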
You may be right that it's not worth the complexity, but we shouldn't need to mess with every row. I would imagine the routine looking like this:
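One possible shape for such a routine, sketched with invented names: rebuild only the rows inside the changed range, and let everything below shift in place rather than being re-rendered:

```javascript
// Hypothetical incremental update. lineState is a row-indexed array of
// rendered-line records; change describes the touched buffer range and the
// net row delta. Only the touched rows are rebuilt via renderRow; rows
// below the change are moved by the splice, not recomputed.
function applyChange(lineState, change, renderRow) {
  const { start, end, rowDelta } = change;
  const rebuilt = [];
  // The old range start..end is replaced by start..end+rowDelta rows.
  for (let row = start; row <= end + rowDelta; row++) {
    rebuilt.push(renderRow(row));
  }
  lineState.splice(start, end - start + 1, ...rebuilt);
  return lineState;
}
```

When `rowDelta` is 0, as in commenting out a block, each change touches exactly one row and nothing else moves.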
Not sure how big a savings that represents over just scanning all the lines, but in simple cases like commenting out a block of lines, you'd only have to address 1 line for each change since the row delta would be 0.
That branch is looking good, though I do have a couple of concerns. The first is that this mechanism of batching is unavailable to people making large numbers of changes in the model layer. True, they could get access to the view and decide explicitly to batch, but it would be nice if we could guarantee good performance without people needing to concern themselves with batching explicitly. The second concern is more minor, but we may want to protect against nested calls in some way, either by keeping a counter or just blowing up if you start a batch while another is already in progress. I'm curious how much we lose if we just always batch for every transaction. We could record what kinds of changes occurred in the model during the batch in order to minimize the state update at the end, and even keep track of event-specific information to optimize our updates. That puts us closer to the original design where we always recomputed the world on every update, but we would have more flexibility to do cheap synchronous book-keeping as the changes come in from the model. We could even decide to do synchronous updates for certain changes and deferred updates for others.
I need to do a better write-up about what I have in mind, but your collaboration would be most welcome. I want to make a pretty big change that solves many of our problems at the model layer in a unified way. The idea is to represent the buffer in several layers, each applying a specific spatial transformation such as breaking the buffer into lines, expanding hard tabs, soft-wrapping, folding, etc. Each layer will maintain a counted B-tree mapping regions in the source coordinate space to regions in the target coordinate space. Markers will fit directly into this tree so that we don't have to visit and update every marker when we update the document. I'm still in the research / sketching phase of all of this, but you can check out my sketch repo and comment there. But as you can see, with such a large change in mind I'm less interested in smaller optimizations in the current model layer. But I absolutely agree that exploiting ordering guarantees in our storage format can help us a lot. Curious about your thoughts on the "always batch, but try to minimize the update" strategy.
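A minimal sketch of the layering idea, with a flat array standing in for the counted B-tree (class and field names are hypothetical): each layer translates coordinates from its source space to its target space, here buffer rows to screen rows under soft wrap:

```javascript
// Sketch of one layer in the stack: soft wrapping maps each buffer row to
// one or more screen rows. A real implementation would back this with a
// counted B-tree for O(log n) translation; a flat array stands in here to
// show the interface.
class SoftWrapLayer {
  constructor(bufferLineLengths, wrapColumn) {
    // screenRowsPerBufferRow[i] = how many screen rows buffer row i occupies
    this.screenRowsPerBufferRow = bufferLineLengths.map(
      (len) => Math.max(1, Math.ceil(len / wrapColumn))
    );
  }
  // Translate a buffer row into the first screen row it appears on.
  translateRow(bufferRow) {
    let screenRow = 0;
    for (let i = 0; i < bufferRow; i++) {
      screenRow += this.screenRowsPerBufferRow[i];
    }
    return screenRow;
  }
}
```

Stacking layers (tabs, wrap, folds) then means composing `translateRow` calls; the counted tree replaces the linear scan inside each layer.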
That's certainly true, but we would address such cases anyway with batching (except for commenting out a single line, but that's extremely fast already).
As you state later in your post, I think that "always batch, but try to minimize the updates" is the way to go: it solves a lot of performance issues via a very simple mechanism.
Actually,
In the branch I showed earlier I adopted this strategy: not every batch is a transaction (e.g. selecting lines), but a transaction always enables batching. This speeds things up significantly: I still have to perform some scientific benchmarks™, but it looks very promising.
That's a great point you bring up. Actually we could simply keep a counter as you say, so that people do not have to bother with ordering/nesting batches: everything just works even if you nest a lot of batches, as only one event will be emitted to the presenter.
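A sketch of that counter-based guard (names invented): batches nest freely, and the single flush fires only when the outermost batch ends:

```javascript
// Nesting-safe batching: a depth counter lets batches nest arbitrarily,
// and the flush callback runs exactly once, when the outermost batch ends.
class BatchingPresenter {
  constructor(onFlush) {
    this.depth = 0;
    this.dirty = false;
    this.onFlush = onFlush;
  }
  batch(fn) {
    this.depth++;
    try {
      fn();
    } finally {
      this.depth--;
      if (this.depth === 0 && this.dirty) {
        this.dirty = false;
        this.onFlush();
      }
    }
  }
  markDirty() {
    if (this.depth === 0) this.onFlush(); // unbatched updates stay synchronous
    else this.dirty = true;
  }
}
```

The try/finally keeps the counter balanced even if a callback throws, which is what makes "everything just works" hold.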
This is very close to what we have currently, I believe. What we would need is just a more clever presenter, which collects information and decides which specific updates it should perform. A coarse-grained way of doing this could be something like the following, for each update operation:

- If in batch mode, prevent it from doing any further work, but record that it has been called.
- If the same operation has been called many times, remember only the last call.
- When the batch completes, re-run the recorded operations.

@nathansobo: what's your position on this last routine? I am particularly interested in your concerns about batching updates being unavailable to people performing changes in the model layer. Would batching on transactions be enough for them? As for the model layer, I think it's better if I turn my attention to it once we're finished with this task. If you plan to write something about it, that's definitely welcome and appreciated. In the meantime, I'll have a look at your sketch repo as well 👍 Thanks 🙇
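The record-dedupe-replay routine above can be sketched as follows (names invented; a `Map` keeps only the latest call per operation name and preserves insertion order):

```javascript
// Coarse-grained "record, dedupe, replay": while batching, each update
// operation records itself, keeping only the latest call per operation
// name; when the batch completes, each recorded operation runs once.
class RecordingPresenter {
  constructor() {
    this.batching = false;
    this.pending = new Map(); // operation name -> latest thunk
    this.log = []; // which operations actually executed (for illustration)
  }
  runOperation(name, fn) {
    if (this.batching) {
      this.pending.set(name, fn); // later calls overwrite earlier ones
    } else {
      this.log.push(name);
      fn();
    }
  }
  startBatch() { this.batching = true; }
  endBatch() {
    this.batching = false;
    for (const [name, fn] of this.pending) {
      this.log.push(name);
      fn();
    }
    this.pending.clear();
  }
}
```

Calling `updateLinesState` a thousand times inside a batch thus costs a thousand `Map.set` calls and a single real update at the end.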
Of course, I forgot about that nuance.
We dealt with ordering in version 1 of the text editor, and it was a nightmare. Also, I ultimately would like us to move to a different approach for rendering where we break the lines into tiles that get translated into position via the GPU. Depending on ordering would be totally incompatible with that. So yeah, you're right that there's no cheap way to compute individual updates unless they are single line.
This seems great and maybe all that's needed. An additional step (not sure how valuable it would be) would be to record specific information about the nature of the model change rather than just a flag. Then you could use the specific information to make a more targeted update. We should probably start coarse grained and see if that's even needed.
I commented in #5762 in greater detail, but I think the best way to ensure optimal performance for people making changes at the model layer might be to just always batch until the animation frame fires and we know we're about to update the view. Then there would be nothing to think about at the model layer; we'd just always do the efficient thing. Let me know if I'm missing anything with that idea.
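A sketch of that idea (names invented; `scheduleFrame` stands in for `requestAnimationFrame` so the sketch runs outside a browser): model changes only enqueue, and the view updates exactly once per frame, however many changes arrived:

```javascript
// "Always batch until the animation frame": model changes set a flag and
// schedule one flush; the actual view update happens once per frame with
// all the changes that accumulated in between.
class FrameBatchedView {
  constructor(scheduleFrame, updateView) {
    this.scheduleFrame = scheduleFrame; // e.g. requestAnimationFrame in a browser
    this.updateView = updateView;
    this.frameRequested = false;
    this.pendingChanges = [];
  }
  onModelChange(change) {
    this.pendingChanges.push(change);
    if (!this.frameRequested) {
      this.frameRequested = true; // only one frame callback in flight
      this.scheduleFrame(() => {
        this.frameRequested = false;
        const changes = this.pendingChanges;
        this.pendingChanges = [];
        this.updateView(changes); // one update for the whole burst
      });
    }
  }
}
```

Nothing at the model layer needs to opt in: a single keystroke and a thousand-cursor edit both cost exactly one view update.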
👍 Quick aside: have you already considered using it? I think this is not a priority, though: painting to the screen is quite cheap compared to the other operations that hurt Atom's performance more noticeably.
It appears that #5762 is almost ready to 🚢: your feedback was very helpful, @nathansobo, so thank you for that. Back to text-transform: the idea of using an (unsorted) counted B-tree is quite clever, as it lets us perform all the operations (searching, updating, inserting) in O(log n) ⚡. I have noticed, though, that the repo is not currently using such a data structure, but I agree that can be deferred, given that it's an implementation detail of the various transformations. I have noticed you're actively working on it: I'd like to help, but I don't want to get in your way in these early stages. Thus, I believe it may be useful to share a plan and work together on it, if you agree 😊 Maybe we could have one or more issues on the text-transform repo and split the work based on those. If this would take longer than implementing the whole idea, I am open to other ways in which I could contribute (e.g. pairing, chatting, etc.).
I have thought about using
Agreed, and the graphics pipeline is getting faster all the time. I still think there's room to use it more efficiently, however. For example, what if the tiles I mentioned were variable in size, so that inserting lines in one tile pushed other tiles down without needing a repaint? Probably not an optimization worth worrying about until we fix the model layer, though.
I will take a closer look soon... From my quick scan I agree it's getting close.
Yeah, the situation is complex enough that I want to get everything working before we optimize the data structure. Implementing change propagation on the lines layer yesterday was challenging enough just with a flat array. Hopefully that's the right strategy.
That sounds good. I've posted a list of projects at atom-archive/text-transform#1. I'm happy to engage in a deeper discussion on anything in that list or other ideas you may have. As I said on that issue, you'll need to be prepared for churn, as that project is still in a lot of flux. But it seems like you can handle that!
I don't really have any knowledge of the internal API Atom uses to handle this, but maybe it would be useful to take a look inside Microsoft Visual Studio Code. I looked at the project some time ago, and column selection was handled in a pretty good way; the project is also based on Electron, so it might be easy to map their approach.
This test will randomly fail, likely due to atom/atom#5025
Is it normal for Atom to get stuck (or just become very slow) when trying to use around 600-800 multi-cursors?
We still have work to do to improve performance with large numbers of cursors. Stay tuned.
@nathansobo Is there any branch/issue/PR that we can follow?
Hello! Is atom/text-transform#1 still valid? Not sure if I can help, but I'd like to try in order to resolve this annoying issue.
Hey @bitspike. That repo is old and archived. Most of the ideas started there have made their way into the codebase by now. Large numbers of cursors is one of the last big performance issues we have to tackle. Last time I looked at a profile, which was a while ago, the major bottlenecks were:
One way to start helping is just to familiarize yourself with the problem. Create a benchmark in the
This is still an issue.
Thanks for your contribution! This issue has been automatically marked as stale because it has not had recent activity. Because the Atom team treats their issues as their backlog, stale issues are closed. If you would like this issue to remain open:
Issues that are labeled as triaged will not be automatically marked as stale.
This issue has been automatically locked since there has not been any recent activity after it was closed. If you can still reproduce this issue in Safe Mode then please open a new issue and fill out the entire issue template to ensure that we have enough information to address your issue. Thanks!
I've just tried to multi-line edit a file with 400 lines produced by `find .`. All of the lines started with `./`, so I selected the first `./` and kept pressing CTRL+D to capture them all. Around line 250-300, the editor slowed down a bit, but the selection finished quickly. Then I started typing a few characters: the first took several seconds to appear, and the second didn't appear even after 30 seconds. That makes multi-line editing pretty much useless.