
Rewrite reorder #197

Merged: 18 commits into master from rewrite-reorder on Mar 5, 2015

Conversation

Matt-Esch
Owner

I have decided the reordering logic needs to be rewritten. I effectively take a bubble-sort-style approach to get the nodes into the correct shape. The offset counting is very error-prone and difficult to debug, so I've avoided it where possible.
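
To make the bubble-sort-style idea concrete, here is a minimal sketch, assuming uniquely keyed children and illustrative names (`bubbleReorder`, `rank`); this is not the branch's actual implementation:

    // Sketch: repeatedly swap adjacent out-of-order nodes, recording each
    // swap as a { from, to } move. Assumes every key appears in both lists.
    function bubbleReorder(currentKeys, targetKeys) {
        var rank = {}
        for (var i = 0; i < targetKeys.length; i++) {
            rank[targetKeys[i]] = i
        }

        var moves = []
        var swapped = true
        while (swapped) {
            swapped = false
            for (var j = 0; j < currentKeys.length - 1; j++) {
                if (rank[currentKeys[j]] > rank[currentKeys[j + 1]]) {
                    // move the later node in front of its neighbor
                    moves.push({ from: j + 1, to: j })
                    var tmp = currentKeys[j]
                    currentKeys[j] = currentKeys[j + 1]
                    currentKeys[j + 1] = tmp
                    swapped = true
                }
            }
        }
        return moves
    }

For example, `bubbleReorder(['a', 'c', 'b'], ['a', 'b', 'c'])` would yield `[{ from: 2, to: 1 }]`.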

As an extension to this, I would like to extend the capability of virtual-dom to insert at an index. Currently we use the diff(null, newVnode) feature to create the inserts. I would like to avoid inserting the items at the end and insert all new items at the correct index at creation time.

Having better insert semantics will remove all reorder overhead for inserts. We really need some benchmarks for this; I am interested to see how much more or less efficient this becomes.

@coveralls

Coverage decreased (-0.2%) to 97.27% when pulling 3481d6e on rewrite-reorder into 3ea4f0e on master.

@neonstalwart
Collaborator

> The offset counting is very error-prone and difficult to debug, so I've avoided it where possible.

i agree - it's error-prone, but i want to make sure we don't lose the optimizations it could provide.

> I would like to avoid inserting the items at the end and insert all new items at the correct index at creation time.

this will help a lot

> We really need some benchmarks for this.

i agree. my assumption has been that moving nodes in the DOM is more expensive than calculating a minimal reorder, but we don't have any benchmarks. however, even if moving nodes turns out to be cheap, the number of moves should still be minimized because of the side effects caused by removing a node from the DOM - specifically, blur handlers firing and loss of focus when the activeElement is removed, and interference with CSS transitions.
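
A quick illustration of the focus side effect (a browser-only snippet, not from this thread): insertBefore on an attached node removes it first, and most browsers blur the node when that happens.

    // Moving the active element in the DOM typically drops focus.
    var input = document.createElement('input')
    document.body.appendChild(input)
    input.focus()
    console.log(document.activeElement === input) // true

    // insertBefore implicitly removes the node before reinserting it,
    // so blur fires and focus is usually lost at this point.
    document.body.insertBefore(input, document.body.firstChild)
    console.log(document.activeElement === input) // typically false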

i was most of the way through my own rewrite/refactor which was largely continuing with the existing approach of calculating an insert offset. for a while i've thought that some of the problems could be alleviated with a better diff (particularly by inserting new nodes where they belong) but i hadn't wanted to disturb that code. so, i'm glad you've had a look at this for yourself.

i'll try to take a look at this today and provide some feedback.

@neonstalwart
Collaborator

i love the new patch format for moves - it's actually a more testable format which should help us.

it looks like we can't avoid doing the insert offset logic - even though the implementation is different, the logic has just been moved from patch to diff. however, this has the benefit that multiple patch implementations can benefit from the good work done in diff, so moving it to diff is a good thing. the reason i hadn't suggested this previously is that i believe this qualifies as a breaking change to the API, so we should bump the major version when we release this.

this implementation has lost some of the optimizations which were previously in place. i think we need to strive for being both correct and efficient. i'm going to build on what you have and add some test cases that check for the optimizations the new patch format makes possible.

the first one i've tested that fails is when an element moves to a later position.

in this case, the bad thing to do is to bubble all the later elements ahead of this one until it gets into position. as an example, if you had a list of nodes [0,5,1,2,3,4] that needs to become [0,1,2,3,4,5], the optimal set of moves is

[
  { from: 1, to: null } // insertBefore(node, null) will add to the end of the list
]

but the current logic will produce

[
  { from: 2, to: 1 },
  { from: 3, to: 2 },
  { from: 4, to: 3 },
  { from: 5, to: 4 }
]

as i mentioned in my previous comment, the number of moves is not just about efficiency; it's about avoiding breaking expectations around nodes being continuously in the document.

i'm working on more test cases and also ways to improve the diff logic for reordering.
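
For reference, applying this { from, to } move format could look roughly like the sketch below. The exact index semantics (pre- or post-move) are defined by the diff, so this is only an assumed shape:

    // Sketch: apply { from, to } moves against a parent's childNodes.
    // insertBefore(node, null) appends, which is why { from: 1, to: null }
    // can move a node to the end of the list in one operation.
    function applyMoves(parentNode, moves) {
        for (var i = 0; i < moves.length; i++) {
            var move = moves[i]
            var node = parentNode.childNodes[move.from]
            var reference = move.to == null ? null : parentNode.childNodes[move.to]
            parentNode.insertBefore(node, reference)
        }
    }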

@Matt-Esch
Owner Author

@neonstalwart I have outstanding changes I'm going to push - basically I am going to produce an array of numeric indices, and then try implementing an in-place quicksort over the numbers. It will be much easier to optimize the sorting when we actually have something numeric to work with.

@neonstalwart
Collaborator

push away - i'm looking into this now

@Matt-Esch
Owner Author

Unfortunately the files are currently sleeping on my personal machine at home and I'm at work - just thought I would give you a heads-up.

If you're interested in working on something right now, I would try implementing a generic sort function that takes an array of numbers and sorts them, returning a set of moves. If not, I will probably implement this https://www.cs.auckland.ac.nz/software/AlgAnim/qsort1a.html
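
A sketch of that shape (the name `sortWithMoves` is assumed, and this uses a Lomuto partition rather than the Hoare variant at the link): an in-place quicksort that records every swap, so replaying the recorded swaps against the original array reproduces the sorted order.

    // Sketch: in-place quicksort over numbers, recording each swap as a
    // { from, to } pair. The swap list doubles as a set of moves.
    function sortWithMoves(array) {
        var moves = []

        function swap(i, j) {
            if (i === j) {
                return
            }
            var tmp = array[i]
            array[i] = array[j]
            array[j] = tmp
            moves.push({ from: i, to: j })
        }

        function quicksort(left, right) {
            if (left >= right) {
                return
            }
            var pivot = array[right]
            var store = left
            for (var i = left; i < right; i++) {
                if (array[i] < pivot) {
                    swap(i, store)
                    store++
                }
            }
            swap(store, right)
            quicksort(left, store - 1)
            quicksort(store + 1, right)
        }

        quicksort(0, array.length - 1)
        return moves
    }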

@Matt-Esch
Owner Author

Rigorous tests are also clearly needed. When rewriting this I found that some fundamental bugs were caught by one assert alone. In one example, an error I made caused free items to be inserted multiple times. The assert statements pairwise compared the expected vs actual items 0, 1, and 2, but never checked the expected length of the result, so items 3 and 4 in the result, which never should have existed, were not causing test failures. I only caught this in the "unnecessary patches" test, which saw that patches were created despite the arrays being identical (2 inserts were created).

Would love to see some improvements to the reordering tests.
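
The general fix is to assert on the whole result, length included. A tape-style sketch (the helper name is illustrative):

    // Sketch: fail loudly on phantom extra items by checking length first,
    // then compare every position rather than a fixed prefix.
    function assertChildrenEqual(assert, expected, actual) {
        assert.equal(actual.length, expected.length, "unexpected result length")
        for (var i = 0; i < expected.length; i++) {
            assert.equal(actual[i], expected[i], "item " + i + " differs")
        }
    }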

@Zolmeister
Contributor

Seems like if the goal is to minimize the number of moves, quicksort is not the best option.
It's also worth asking whether a swap is as expensive as two moves in the DOM.

@neonstalwart
Collaborator

@Matt-Esch i'm working on calculating moves based on using

  • aChildren - our starting point
  • bChildren - our goal
  • aKeys - used to determine if a node in bChildren was added
  • bKeys - used to determine if a node in aChildren has been removed

i'm not sure we can use a standard sorting algorithm

my first attempt today started with

    var a = Object.keys(aKeys).sort(function (a, b) { return aKeys[a] - aKeys[b]  })
    var b = Object.keys(bKeys).sort(function (a, b) { return bKeys[a] - bKeys[b]  })

which gives me 2 arrays of keys ordered by their position in their respective children arrays but i keep coming back to needing to use offsets.

you don't seem to be in favor of using offsets (and i agree that they are hard to get right) but i don't see how we can calculate the moves needed without doing some accounting for:

  • nodes scheduled to be removed - they affect the `to` index for anything following them
  • newly added nodes - they affect the `from` and the `to` index for anything following them
  • nodes moving towards the end of the array - these can only be done after we've scanned all the way to their destination, so that we can adjust their `to` index to account for things like newly added nodes that sit between the `from` and `to`.

so, this doesn't work yet, but the current shape of it is like this:

    var moves = []
    // adjusts the from position to consider nodes which will move later or have been added
    var offset = 0
    var length = Math.max(aChildren.length, bChildren.length)
    var aNode
    var bNode

    for (var l = 0; l < length; l++) {
        // aChildren is our initial list and we progressively mutate it in-place to become bChildren.
        // work is done in order of `to` values from 0 to bChildren.length - 1. this means that we
        // always keep the destination in the visited part of the output. anything that needs to move
        // beyond the current index will be moved when we visit the destination index

        aNode = aChildren[l - offset]
        bNode = bChildren[l]

        if (bNode && bNode.key) {
            // aKeys tells us the source location of the node
            var from = aKeys[bNode.key]
            var to = l + offset

            if (from != null) {
                // an earlier node is moved later, do it now and reduce the offset
                if (from < to) {
                    moves.push({ to: to, from: from + 1 })
                    offset--
                }
                else if (from + offset > to) {
                    moves.push({ to: to, from: from + offset })
                }
            }
            // a node was added so our offset needs to be adjusted
            else {
                offset++
            }
        }

        // account for nodes that are still there but will be removed later
        if (aNode && aNode.key && bKeys[aNode.key] == null) {
            offset++
        }
    }

@Matt-Esch
Owner Author

@Zolmeister the goal is not to minimize moves, it's to minimize the overall time. Of course, we assume that DOM moves are expensive. But that's no excuse to use an O(N^3) algorithm in diff to compute the optimal solution. In-place quick sort has an upper bound of O(N^2) but generally performs better.

I would really like to do the following (sketched just after the list):

  1. remove any old items
  2. sort the remaining items
  3. insert the new items in place
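
Here is a sketch of those three phases over fully keyed children, with all names assumed; it returns the final child order rather than moves, just to show the shape, and is not the branch code:

    // Sketch: remove, then sort, then insert.
    function reorder(aChildren, bChildren, aKeys, bKeys) {
        // 1. remove any old items (keys that do not appear in the target)
        var remaining = aChildren.filter(function (node) {
            return node.key && bKeys[node.key] != null
        })

        // 2. sort the remaining items by their target position
        remaining.sort(function (x, y) {
            return bKeys[x.key] - bKeys[y.key]
        })

        // 3. insert the new items in place, in increasing index order so
        // earlier positions are already final when we splice
        for (var i = 0; i < bChildren.length; i++) {
            var node = bChildren[i]
            if (node.key && aKeys[node.key] == null) {
                remaining.splice(i, 0, node)
            }
        }

        return remaining
    }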

@neonstalwart I will need some brain cycles to parse your logic - will get back to you.

@neonstalwart
Collaborator

my latest as of the end of today...

    var moves = []
    // adjusts the from position to consider nodes which will move later or have been added
    var offset = 0
    var nodesRemoved = 0
    var nodesSkipped = 0
    var nodesAdded = 0
    var forwardOffsets = {}
    var forwardMoves = 0
    var length = Math.max(aChildren.length, bChildren.length)
    var aNode
    var aKey
    var bNode
    var bKey

    for (var l = 0; l < length; l++) {
        // aChildren is our initial list and we progressively mutate it in-place to become bChildren.
        // work is done in order of `to` values from 0 to bChildren.length - 1. this means that we
        // always keep the destination in the visited part of the output. anything that needs to move
        // beyond the current index will be moved when we visit the destination index

        aNode = aChildren[l]
        aKey = aNode && aNode.key
        bNode = bChildren[l]
        bKey = bNode && bNode.key

        if (aKey && bKeys[aKey] == null) {
            nodesRemoved++
        }

        offset = nodesRemoved + nodesAdded + nodesSkipped

        if (bKey) {
            // aKeys tells us the source location of the node
            var from = aKeys[bKey]
            var to = l + offset + 1

            if (from != null) {
                // move a node from earlier in the list towards the end
                if (from + offset < to - 1) {
                    nodesSkipped--
                    forwardMoves--
                    moves.push({ to: to - nodesSkipped, from: from + forwardOffsets[from] })
                }
                // move a node from later in the list towards the front
                else if (from + offset > to) {
                    moves.push({ to: to - 1, from: from + offset })
                    forwardMoves++
                }
            }
            // a node was added so our offset needs to be adjusted
            else {
                nodesAdded++
            }
        }

        // node will move forward later - skip it for now
        if (aKey && bKeys[aKey] > l) {
            // store current offset from the original position
            forwardOffsets[l] = nodesRemoved + nodesAdded + forwardMoves
            nodesSkipped++
        }
    }

it's still not quite right - 2 tests fail.

before this is all over there will be more rigorous testing, but first i want to get the current tests working with this approach since it is O(n) for diff. for performance i really hope we can avoid anything worse than O(n log n) and that we don't break expectations for the existence of nodes in the document (this is the most important to me).

@coveralls

Coverage decreased (-0.2%) to 97.27% when pulling 407ec42 on rewrite-reorder into 3ea4f0e on master.

@Matt-Esch
Owner Author

I pushed the next optimization, which uses a single iteration over b to index the free nodes instead of searching for them. So we should now be able to construct the sort order for a after inserts and use that to make easy decisions about swaps.
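
The single-pass indexing presumably has this rough shape (an assumed sketch, not the committed code):

    // Sketch: one pass over children builds a key -> index map, so keyed
    // nodes are found by lookup rather than a linear search per node.
    function keyIndex(children) {
        var keys = {}
        for (var i = 0; i < children.length; i++) {
            var child = children[i]
            if (child.key) {
                keys[child.key] = i
            }
        }
        return keys
    }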

@coveralls

Coverage decreased (-0.3%) to 97.18% when pulling b9b490d on rewrite-reorder into 3ea4f0e on master.

@Matt-Esch
Owner Author

@neonstalwart I added numeric index sorting to this branch to show what I mean.

The current caveat is that because all nodes are numerically indexed, deleted nodes are sorted too. An update should be made to pre-remove any deleted nodes from the array if sorting is going to take place, and to only do so if there are moves.

I'd like to merge in your tests and compare the different approaches for performance. I'd also like to solicit feedback on the sorting algorithm itself. If you have some test cases you'd like to know the moves for I'd be happy to generate them.

@neonstalwart
Collaborator

@Matt-Esch this smells pretty good but i need some time to digest it - i'm busy for most of today.

i think pre-removing nodes is going to help any algorithm - they just get in the way of what might otherwise be some simpler logic. i agree with the order you already mentioned as ideal

  1. remove any old items
  2. sort the remaining items
  3. insert the new items in place

it may be worth the time to do this now if you've got ideas for how you want to do this.

> If you have some test cases you'd like to know the moves for I'd be happy to generate them.

the tests where i've used assertReorderEquals are my attempt to ensure we're creating minimal/sensible moves. it would be good to find a general way to add an assertion for the minimal number of moves to the permutation/fuzzy testing (i have an idea for how to do this but no time currently to work through it), but at least including something similar that does a manual assertion for some of the manual tests (like i have) would be a good start.
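
At minimum, the fuzzy tests could assert that replaying the generated moves reproduces the target order. A sketch, under the assumption that `from` indices refer to the array at the time each move is applied (the real format may differ):

    // Sketch: replay moves on a copy of the source order and require an
    // exact match with the target order.
    function assertMovesProduce(assert, source, target, moves) {
        var working = source.slice()
        for (var i = 0; i < moves.length; i++) {
            var item = working.splice(moves[i].from, 1)[0]
            if (moves[i].to == null) {
                working.push(item)
            } else {
                working.splice(moves[i].to, 0, item)
            }
        }
        assert.deepEqual(working, target, "moves reproduce the target order")
    }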

@neonstalwart
Collaborator

I have another thought I want to pursue when I get a chance this coming week.

the basis of it involves changing the patch to be 2 phases:

  1. remove all nodes to be reordered. maintain a hash of these nodes that allows us to retrieve each one by its key during the next phase.
  2. insert all nodes in their final position

I believe the diff can then be done in O(N) with 2 passes:

  1. iterate from right to left over the simulate array and identify the nodes that are out of position, record that they need to be removed (an array of indices) and splice them from the array. by working right to left, the effect of removing an item minimizes the impact on the indices of the unvisited part of the array. in order to determine if a node is out of position, this pass needs to maintain some counters to help calculate the effective position of a node.
  2. iterate from left to right over the simulate array to determine the insert index for each of the nodes. record this as a list of { key, index } objects

solutions involving counters to help determine effective positions have proven to be harder than they first seem, but I'm willing to try another one in the hope we can achieve O(N).
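
A sketch of the patch side of this proposal, with all names assumed (remove indices are taken to already account for earlier removals):

    // Phase 1: detach every node scheduled to reorder, keyed for lookup.
    // Phase 2: reinsert each node at its final position.
    function applyReorder(parentNode, removes, inserts) {
        var detached = {}

        removes.forEach(function (remove) {
            var node = parentNode.childNodes[remove.from]
            parentNode.removeChild(node)
            detached[remove.key] = node
        })

        inserts.forEach(function (insert) {
            var node = detached[insert.key]
            // childNodes[to] is undefined past the end, and
            // insertBefore(node, null) appends
            parentNode.insertBefore(node, parentNode.childNodes[insert.to] || null)
        })
    }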

@neonstalwart
Collaborator

I changed my mind. the first phase of the diff has to be left to right in order to determine if a node is out of position.

@coveralls

Coverage increased (+0.01%) to 97.48% when pulling 707bb56 on rewrite-reorder into 3ea4f0e on master.

@Matt-Esch
Owner Author

@neonstalwart I added your tests to my PR - they are really good and helped me catch a case I forgot about.

I've taken to eagerly removing nodes from the array before sorting. This means that in some cases we end up with extra moves.

I'm using the move { from: nodeIndex, to: -1 } to indicate deletes. If the only moves that are made are deletes, I return an empty moves set, relying on the generated destroy patches to remove the nodes instead. This does highlight that there could be some redundancy in the remove operations at the moment (i.e. nodes which can just be removed from the DOM will still have patches to remove them).
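
That filtering rule is simple to express; a sketch under the stated to: -1 convention:

    // Sketch: if every move is a delete, drop the moves and let the
    // destroy patches handle removal instead.
    function pruneDeleteOnlyMoves(moves) {
        var onlyDeletes = moves.every(function (move) {
            return move.to === -1
        })
        return onlyDeletes ? [] : moves
    }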

Inserting into the correct position is going to be much harder as we are relying on indexing and the children diff mechanism to work through all of the new item machinery.

@Matt-Esch
Owner Author

@neonstalwart let me know if you get a few minutes to review this change. I'm getting eager to land a v2 ;)

@neonstalwart
Collaborator

@Matt-Esch did you see my latest at #199?

Matt-Esch added a commit that referenced this pull request Mar 5, 2015
Matt-Esch merged commit e0c3298 into master Mar 5, 2015