Rewrite reorder #197
Conversation
i agree - it's error prone but i want to make sure we don't lose the optimizations it could provide.
this will help a lot
i agree. my assumption has been that moving nodes in the DOM is more expensive than calculating a minimal reorder, but we don't have any benchmarks. however, even if the cost of moving nodes is not expensive, it should still be minimized because of the side-effects caused by removing a node from the DOM. specifically, i was most of the way through my own rewrite/refactor which was largely continuing with the existing approach of calculating an insert offset, and for a while i've thought that some of the problems could be alleviated. i'll try to take a look at this today and provide some feedback. |
i love the new patch format for moves - it's actually a more testable format which should help us.

it looks like we can't avoid doing the insert offset logic - even though the implementation is different, the logic has just been moved from patch to diff. however, this has the benefit that multiple patch implementations can benefit from the good work done in diff, so moving to diff is a good thing. the reason i hadn't suggested this previously is because i believe this should qualify as a breaking change in the API, so we should bump the major version when we release this.

this implementation has lost some of the optimizations which were previously in place. i think we need to strive for being both correct and efficient. i'm going to build on what you have to add some test cases that test for optimizations, thanks to the new patch format. the first one i've tested that fails is when an element moves to a later position. in this case, the bad thing to do is to bubble all the later elements ahead of this one until it gets into position. as an example, if you had a list of nodes, the ideal output would be

    [
        { from: 1, to: null } // insertBefore(node, null) will add to the end of the list
    ]

but the current logic will produce

    [
        { from: 2, to: 1 },
        { from: 3, to: 2 },
        { from: 4, to: 3 },
        { from: 5, to: 4 }
    ]

as i mentioned in my previous comment, the number of moves is not just about efficiency but about avoiding breaking expectations around nodes being continuously in the document. i'm working on more test cases and also ways to improve the diff logic for reordering. |
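For illustration, here is a hedged sketch of how a `{ from, to }` move in this patch format could be applied, using a plain array to stand in for `childNodes`. The `applyMoves` helper and the 0-based index convention are assumptions for this sketch, not part of the thread; only the `to: null` behavior mirrors the `insertBefore(node, null)` semantics mentioned above.

```javascript
// Sketch: apply an ordered list of { from, to } moves to an array that
// stands in for parent.childNodes. In the real DOM the equivalent step is
// parent.insertBefore(node, referenceNode), where a null referenceNode
// appends the node to the end of the child list.
function applyMoves(children, moves) {
  var list = children.slice()
  moves.forEach(function (move) {
    var node = list.splice(move.from, 1)[0] // detach the node from its old slot
    if (move.to === null) {
      list.push(node) // insertBefore(node, null) semantics: append to the end
    } else {
      list.splice(move.to, 0, node) // re-insert before the target index
    }
  })
  return list
}
```

Under these assumed semantics, a single move with `to: null` sends one node to the end without ever detaching any other node, whereas the bubbling variant detaches and re-inserts every later sibling.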
@neonstalwart I have outstanding changes I'm going to push - basically I am going to produce an array of numeric indices, and then try implementing an in-place quick sort over the numbers. It will be much easier to optimize the sorting when we actually have something numeric to work with |
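As a point of reference for the in-place quicksort idea, here is a hedged sketch (my own, using a Lomuto partition; the names and the move format are assumptions) that sorts numeric indices in place and records every swap it performs, so the swaps could later be replayed as DOM moves. It also illustrates the trade-off under discussion: O(N^2) comparisons in the worst case, typically O(N log N), but the recorded swaps are not necessarily a minimal set of moves.

```javascript
// Sketch: in-place quicksort over an array of numeric indices, recording
// each swap as a { from, to } entry (here meaning "exchange positions
// from and to", not a splice-style move).
function quicksortMoves(indices) {
  var list = indices.slice()
  var moves = []

  function swap(i, j) {
    if (i === j) return // no-op swaps are not recorded
    var tmp = list[i]
    list[i] = list[j]
    list[j] = tmp
    moves.push({ from: i, to: j })
  }

  function sort(lo, hi) {
    if (lo >= hi) return
    var pivot = list[hi] // Lomuto partition: last element as pivot
    var p = lo
    for (var i = lo; i < hi; i++) {
      if (list[i] < pivot) swap(i, p++)
    }
    swap(p, hi)
    sort(lo, p - 1)
    sort(p + 1, hi)
  }

  sort(0, list.length - 1)
  return { sorted: list, moves: moves }
}
```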
push away - i'm looking into this now |
Unfortunately the files are currently sleeping on my personal machine at home, and I'm at work - just thought I would give you a heads up. If you're interested in working on something right now, I would try implementing a generic sort function that takes an array of numbers and sorts them, returning a set of moves. If not, I will probably implement this: https://www.cs.auckland.ac.nz/software/AlgAnim/qsort1a.html |
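One way to sketch the generic helper suggested here (the `sortWithMoves` name and the splice-style `{ from, to }` move format are my assumptions): a selection-style pass that relocates each element directly to its final slot, recording one move per relocation, so elements that are already in order produce no moves at all.

```javascript
// Sketch: sort an array of numbers, returning both the sorted result and
// the list of { from, to } moves (remove at `from`, re-insert at `to`)
// that were performed to reach it.
function sortWithMoves(indices) {
  var list = indices.slice()
  var moves = []
  for (var i = 0; i < list.length; i++) {
    // find the smallest remaining value in the unsorted tail
    var min = i
    for (var j = i + 1; j < list.length; j++) {
      if (list[j] < list[min]) min = j
    }
    if (min !== i) {
      // relocate it directly into position and record the move
      var value = list.splice(min, 1)[0]
      list.splice(i, 0, value)
      moves.push({ from: min, to: i })
    }
  }
  return { sorted: list, moves: moves }
}
```

An already-sorted input yields an empty move list, which matches the concern in this thread about not disturbing nodes that are already in place.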
Rigorous tests are also clearly needed. When rewriting this I found that some fundamental bugs were caught by one assert alone. In one example, an error I made caused free items to be inserted multiple times. The assert statements pairwise compared the expected vs actual items 0, 1, and 2. It never bothered to check the expected length of the result, so items 3 and 4 in the result which never should have existed were not causing test failures. I only caught this in the "unnecessary patches" test, which saw that patches were created despite the arrays being identical. (2 inserts were created). Would love to see some improvements to the reordering tests. |
Seems like if the goal is to minimize the number of moves, quicksort is not the best option.
@Matt-Esch i'm working on calculating moves based on using
i'm not sure we can use a standard sorting algorithm. my first attempt today started with

    var a = Object.keys(aKeys).sort(function (a, b) { return aKeys[a] - aKeys[b] })
    var b = Object.keys(bKeys).sort(function (a, b) { return bKeys[a] - bKeys[b] })

which gives me 2 arrays of keys ordered by their position in their respective children arrays, but i keep coming back to needing to use offsets. you don't seem to be in favor of using offsets (and i agree that they are hard to get right) but i don't see how we can calculate the moves needed without doing some accounting for:
so, this doesn't work yet, but the current shape of it is like this:

    var moves = []
    // adjusts the from position to consider nodes which will move later or have been added
    var offset = 0
    var length = Math.max(aChildren.length, bChildren.length)
    var aNode
    var bNode

    for (var l = 0; l < length; l++) {
        // aChildren is our initial list and we progressively mutate it in-place to become
        // bChildren. work is done in order from `to` values of 0 to bChildren.length - 1.
        // this means that we always keep the destination in the visited part of the output.
        // anything that needs to move beyond the current index will be moved when we visit
        // the destination index
        aNode = aChildren[l - offset]
        bNode = bChildren[l]

        if (bNode && bNode.key) {
            // aKeys tells us the source location of the node
            var from = aKeys[bNode.key]
            var to = l + offset

            if (from != null) {
                // an earlier node is moved later, do it now and reduce the offset
                if (from < to) {
                    moves.push({ to: to, from: from + 1 })
                    offset--
                } else if (from + offset > to) {
                    moves.push({ to: to, from: from + offset })
                }
            }
            // a node was added so our offset needs to be adjusted
            else {
                offset++
            }
        }

        // account for nodes that are still there but will be removed later
        if (aNode && aNode.key && bKeys[aNode.key] == null) {
            offset++
        }
    } |
@Zolmeister the goal is not to minimize moves, it's to minimize the overall time. Of course, we assume that DOM moves are expensive. But that's no excuse to use an O(N^3) algorithm in diff to compute the optimal solution. In-place quick sort has an upper bound of O(N^2) but generally performs better. I would really like to do the following:
@neonstalwart I will need some brain cycles to parse your logic, will get back to you |
my latest as of the end of today...

    var moves = []
    // adjusts the from position to consider nodes which will move later or have been added
    var offset = 0
    var nodesRemoved = 0
    var nodesSkipped = 0
    var nodesAdded = 0
    var forwardOffsets = {}
    var forwardMoves = 0
    var length = Math.max(aChildren.length, bChildren.length)
    var aNode
    var aKey
    var bNode
    var bKey

    for (var l = 0; l < length; l++) {
        // aChildren is our initial list and we progressively mutate it in-place to become
        // bChildren. work is done in order from `to` values of 0 to bChildren.length - 1.
        // this means that we always keep the destination in the visited part of the output.
        // anything that needs to move beyond the current index will be moved when we visit
        // the destination index
        aNode = aChildren[l]
        aKey = aNode && aNode.key
        bNode = bChildren[l]
        bKey = bNode && bNode.key

        if (aKey && bKeys[aKey] == null) {
            nodesRemoved++
        }

        offset = nodesRemoved + nodesAdded + nodesSkipped

        if (bKey) {
            // aKeys tells us the source location of the node
            var from = aKeys[bKey]
            var to = l + offset + 1

            if (from != null) {
                // move a node from earlier in the list towards the end
                if (from + offset < to - 1) {
                    nodesSkipped--
                    forwardMoves--
                    moves.push({ to: to - nodesSkipped, from: from + forwardOffsets[from] })
                }
                // move a node from later in the list towards the front
                else if (from + offset > to) {
                    moves.push({ to: to - 1, from: from + offset })
                    forwardMoves++
                }
            }
            // a node was added so our offset needs to be adjusted
            else {
                nodesAdded++
            }
        }

        // node will move forward later - skip it for now
        if (aKey && bKeys[aKey] > l) {
            // store current offset from the original position
            forwardOffsets[l] = nodesRemoved + nodesAdded + forwardMoves
            nodesSkipped++
        }
    }

it's still not quite right - 2 tests fail. before this is all over, there will be more rigorous testing, but first i want to get the current tests working with this approach since it is O(n) for diff. for performance i really hope we can avoid anything worse than O(n log n), and we don't break expectations for existence of nodes in the document (this is the most important to me). |
I pushed the next optimization which uses the single iteration over b to index the free nodes instead of searching for them. So we should now be able to construct the sort order for a after inserts and use that to make easy decisions about swaps. |
@neonstalwart I added numeric index sorting to this branch to show what I mean. The current caveat is that because all nodes are numerically indexed, deleted nodes are sorted too. An update should be made to pre-remove any deleted nodes from the array if sorting is going to take place, and to only sort if there are moves. I'd like to merge in your tests and compare the different approaches for performance. I'd also like to solicit feedback on the sorting algorithm itself. If you have some test cases you'd like to know the moves for, I'd be happy to generate them. |
@Matt-Esch this smells pretty good but i need some time to digest it - i'm busy for most of today. i think pre-removing nodes is going to help any algorithm - they just get in the way of what might otherwise be some simpler logic. i agree with the order you already mentioned as ideal.

it may be worth the time to do this now if you've got ideas for how you want to do this.

the tests where i've used |
I have another thought I want to pursue when I get a chance this coming week. the basis of it involves changing the patch to be 2 phases:
I believe the diff can then be done in O(N) with 2 passes
solutions involving using counters to help determine effective positions have proven to be harder than they first seem but I'm willing to try another one in the hope we can achieve O(N) |
I changed my mind. the first phase of the diff has to be left to right in order to determine if a node is out of position. |
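A minimal sketch of what that left-to-right pass could look like (the `outOfPosition` name and the array-of-old-indices input are my assumptions, not from the thread): walk the new order and flag any node whose old index breaks the increasing sequence seen so far.

```javascript
// Sketch: given the old indices of the children listed in their new order,
// return the positions of nodes that are "out of position" - i.e. whose
// old index is smaller than an old index already seen to their left.
function outOfPosition(oldIndices) {
  var flagged = []
  var maxSeen = -1
  oldIndices.forEach(function (oldIndex, i) {
    if (oldIndex < maxSeen) {
      flagged.push(i) // this node must move; everything before it can stay
    } else {
      maxSeen = oldIndex
    }
  })
  return flagged
}
```

This pass is O(N) and only needs a single counter, which is in the spirit of keeping the offset bookkeeping to a minimum.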
@neonstalwart I added your tests to my pr, they are really good and helped me catch a case I forgot about. I've taken to eagerly removing nodes from the array before sorting. This means that in some cases we end up with extra moves. I'm using move Inserting into the correct position is going to be much harder as we are relying on indexing and the children diff mechanism to work through all of the new item machinery. |
@neonstalwart let me know if you get a few minutes to review this change. I'm getting eager to land a v2 ;) |
@Matt-Esch did you see my latest at #199? |
I have decided the reordering logic needs to be rewritten. I effectively take a bubble sort style approach to get the nodes in the correct shape. The offset counting is very error prone and difficult to debug so I've avoided that where possible.
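A hedged sketch of the bubble-sort style approach over plain numeric indices (the real implementation works over keyed vnodes; `bubbleMoves` and the adjacent-swap move format are assumptions for this sketch). Every recorded move is an adjacent swap, so no offset counters are needed:

```javascript
// Sketch: bubble-sort style reordering. Repeatedly swap adjacent
// out-of-order entries, recording each swap as a { from, to } move.
function bubbleMoves(indices) {
  var list = indices.slice()
  var moves = []
  var swapped = true
  while (swapped) {
    swapped = false
    for (var i = 1; i < list.length; i++) {
      if (list[i - 1] > list[i]) {
        var tmp = list[i - 1]
        list[i - 1] = list[i]
        list[i] = tmp
        moves.push({ from: i, to: i - 1 }) // adjacent swap, no offsets needed
        swapped = true
      }
    }
  }
  return moves
}
```

The trade-off is more moves than a minimal solution in the worst case, in exchange for logic that is straightforward to reason about and debug.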
As an extension to this, I would like to extend the capability of virtual-dom to insert at an index. Currently we use the

    diff(null, newVnode)

feature to create the inserts. I would like to avoid inserting the items at the end and instead insert all new items at the correct index at creation time. Having better insert semantics will remove all re-order overhead for inserts. We really need some benchmarks for this. I am interested to see how much more or less efficient this becomes.
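As a sketch of the index-aware insert semantics being proposed (array model; `insertAt` is an illustrative name, not an existing virtual-dom API): splice the new node in at its target index at creation time instead of appending it and moving it afterwards.

```javascript
// Sketch: insert a new node directly at its target index. In the DOM this
// would correspond to parent.insertBefore(node, parent.childNodes[index] || null),
// which appends when index is past the end.
function insertAt(children, index, node) {
  var next = children.slice() // non-destructive for illustration
  next.splice(index, 0, node)
  return next
}
```

With this, a newly created node never occupies a temporary slot at the end of the list, so no re-order moves are ever generated for it.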