Efficient bulk updates for DataView #571

Closed
Danielku15 opened this issue Jan 20, 2021 · 6 comments · Fixed by #572

Comments

@Danielku15
Contributor

We're currently facing a significant performance issue with SlickGrid when doing a lot of updates in the DataView.

From a data stream we receive thousands of log entries which we want to feed into SlickGrid via a DataView. As part of these incoming events we need to delete some entries and add new ones. The nastiest part is that we insert all new items at the beginning of the list because we want the newest items first.

We're calling beginUpdate and endUpdate, but this only suspends the refresh triggers. Any call to insertItem, addItem or deleteItem still causes updateIdxById to be called, which in the worst case loops over all items again.

So assume you have a DataView with 3000 items and receive 300 new events, each of which calls either insertItem(0, item) or deleteItem(item.id) based on some rules. In the worst case we loop over 3000 items 300 times just to update the index table. Since updateIdxById and idxById are not public, it is hardly possible to optimize this on our level, even with known constraints.

I was already able to patch the insert part by modifying the original items array directly and only calling insertItem for the very last item:

dataView.beginUpdate();
for (let i = 0; i < logs.length; i++) {
    const item = logs[i];
    if (i < logs.length - 1) { dataView.getItems().unshift(item); }
    else { dataView.insertItem(0, item); }
}
dataView.endUpdate();

This way only the very last item triggers the single loop over the 3000 items. But it gets tricky once deletes are involved, because the internal index table is no longer correct and delete calls cannot easily be translated.

dataView.beginUpdate();
for (let i = 0; i < logs.length; i++) {
    const item = logs[i];
    if (shouldRemove(item)) { dataView.deleteItem(item.id); }
    else if (i < logs.length - 1) { dataView.getItems().unshift(item); }
    else { dataView.insertItem(0, item); }
}
dataView.endUpdate();

My idea on how SlickGrid could handle this would be something like this:

  1. We add something like a beginBulkUpdate() and call it before the update.
  2. Add/insert operations are applied instantly and the internal items array is changed, but the id-to-index mapping is NOT updated yet.
  3. The delete operations are remembered in a separate Set instance (or as object keys if you want to support old browsers).
  4. At the end we call an endBulkUpdate() which loops once through the whole dataset and either updates the index of the current element, or deletes the item at the current position if its id is contained in the Set (see the sketch below).
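
From the calling side the whole sequence could then look roughly like this (a sketch only; beginBulkUpdate/endBulkUpdate do not exist yet, their names are part of the proposal, and shouldRemove is application code):

dataView.beginBulkUpdate();              // proposed: defer index maintenance and refreshes
for (let i = 0; i < logs.length; i++) {
    const log = logs[i];
    if (shouldRemove(log)) {
        dataView.deleteItem(log.id);     // would only be recorded in a "to delete" set
    } else {
        dataView.insertItem(0, log);     // would only be spliced into the items array
    }
}
dataView.endBulkUpdate();                // one pass: drop deleted items, rebuild the index, refresh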

I would have implemented this at my level of the code, but as the index loop is private I cannot enforce it properly.

It would be great if the DataView could handle this kind of add/update operation. SlickGrid in general performs fine; it is only the streaming update capability that is missing. Reimplementing the various DataView features we need is not really a good option either.

Alternatively, I would also be fine with just exposing the currently private lookup structures so that I can change them myself. That way I can access the original items list and the idx lookup, insert new items and delete old ones, and finally either call setItems() with the final list or update idxById on my own.

Looking forward to your feedback.

@ghiscoding
Collaborator

ghiscoding commented Jan 20, 2021

After taking a quick look at the internal code and reading through your use case, I think the bulk idea could be a nice addition.

Looking at the current beginUpdate, the code is

function beginUpdate() {
      suspend = true;
}

and yes, it seems to be used only for suspending the refresh.

With that in mind, and as per your suggestion, we could add something like this:

function beginBulkUpdate() {
      bulkSuspend = true;
}

function insertItem(insertBefore, item) {
   if (bulkSuspend) {
      return;
   }
   // ...
}

We are totally open to a PR (Pull Request) for performance improvements. You have the best use case and code already in place to test it, so it might be better if you implement and tweak it so that it behaves in the way you find most performant.

I'm pretty sure @6pac would agree to adding such improvements. We always try to improve the code, but at the same time we try to avoid breaking changes, so I would personally opt for new bulk functions so that developers can choose whether or not to use the bulk behavior.

On the public/private subject, the code only exposes methods & event handlers as public; all the variables are private, and we should probably leave it that way. We can however make some of the methods public if it helps, but we would rather have the improvement available to everyone right from the source.

@Danielku15
Contributor Author

Warning: this message again contains quite a lot of detail, but I hope it helps in understanding the use case and the improvement proposal. 😉

I just made some local patches to the DataView and brought the times down quite significantly. Before, an update of 3000 items took 150-200ms; now it takes 7-20ms. 10ms of that is purely the CSS height calculation that happens on refresh. I would love to get rid of those 10ms too, but that's something for a different day.


On the public/private subject, the code only exposes methods & event handlers as public; all the variables are private, and we should probably leave it that way.

getItems currently exposes the internal items array. So I followed that strategy and exposed the lookup via getIdxById when no parameter is supplied.

SlickGrid/slick.dataview.js

Lines 332 to 334 in 214aa40

function getIdxById(id) {
  return idxById[id];
}

became

    function getIdxById(id) {
      return id === void 0 ?  idxById : idxById[id];
    }
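
From the consumer side, the change would then be used roughly like this (a sketch; dataView stands for the DataView instance):

var idxById = dataView.getIdxById();  // no argument: returns the whole id-to-index lookup table
var idx = dataView.getIdxById(42);    // existing behavior: index of the item with id 42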

This allows me to obtain the whole id lookup table and do the efficient bulk update on my side. I am not sure whether my logic benefits all use cases, but let's discuss whether this approach really fits and whether we could add it to the DataView in general.

The following code shows an example of how my proposal could improve performance. It might be specific to my use case, but I think the general logic applies to many cases.

I keep a kind of ring buffer per group. If we exceed the ring buffer size we remove the oldest log entries from the DataView. This way we have at most 10 items per group in the DataView, and this is where insert and delete come together.

let groups = new Map();
let logs = new Slick.Data.DataView({ inlineFilters: true });
function addLoggingEvents(events) {
    logs.beginUpdate();
    for (let i = 0; i < events.length; i++) {
        let log = events[i];
        let groupName = log.group;
        if (!groups.has(groupName)) {
            groups.set(groupName, []);
        }
        
        let group = groups.get(groupName);
        if (group.length >= 10) {
            let oldestLog = group.pop();
            logs.deleteItem(oldestLog.id);
        }

        log.id = Math.floor(Math.random() * Date.now());
        group.unshift(log);
        logs.insertItem(0, log);
    }
    logs.endUpdate();
}

With the index lookup exposed, I improved it to:

let groups = new Map();
let logs = new Slick.Data.DataView({ inlineFilters: true });
function addLoggingEvents(events) {
    let itemsToDelete = new Set();
    let originalItems = logs.getItems();

    for (var i = 0; i < events.length; i++) {
        var log = events[i];
        let groupName = log.group;
        if (!groups.has(groupName)) {
            groups.set(groupName, []);
        }

        let group = groups.get(groupName);
        if (group.length >= 10) {
            let oldestLog = group.pop();
            itemsToDelete.add(oldestLog.id);
        }
        
        log.id = Math.floor(Math.random() * Date.now());
        group.unshift(log);
    }

    Array.prototype.splice.apply(originalItems, [0,0].concat(events));

    var idxById = logs.getIdxById();
    // This is an in-place filtering and index update of the array
    var newIndex = 0;
    for(var currentItem = 0; currentItem < originalItems.length; currentItem++) {
        var item = originalItems[currentItem];
        if (!itemsToDelete.has(item.id)) {
            originalItems[newIndex] = item;
            idxById[item.id] = newIndex;
            ++newIndex;
        } else {
            delete idxById[item.id];
        }
    }
    originalItems.length = newIndex;

    logs.refresh();
}

This brings me down from quadratic complexity to linear.

The "magic" I talked about are 3 things:

  1. Instead of actually performing the delete logic, we collect the IDs to delete in a Set.
  2. I bulk insert the new items at the position where they belong using splice.
  3. I do a single loop over the whole data set and recompute the new items array with an in-place array modification. At the same time I build up the new idxById lookup.

My first feeling is that this could also be moved into the DataView, and I could try to file a PR. It could look something like this (a sketch follows below the list):

  1. The logic could be switched on via a beginUpdate parameter, or via a separate beginBulkUpdate.
  2. A new insertItems(idx, items) (plural!) would allow inserting many items at once.
  3. To be aligned with the existing API we should also add an addItems(items).
  4. All insert* and add* methods would need to skip any refresh/index updates while a bulk update is active.
  5. endUpdate would then run the logic above with the Set to recompute the final items.
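
A rough sketch of how this could look inside the DataView closure (this is not the actual slick.dataview.js code; bulkSuspend, bulkDeleteIds, insertItems and endBulkUpdate are placeholder names, while items, idxById and refresh refer to the existing internals):

// inside the DataView closure (sketch only)
var bulkSuspend = false;
var bulkDeleteIds = {};            // ES5 stand-in for a Set

function beginBulkUpdate() {
  bulkSuspend = true;
  bulkDeleteIds = {};
}

function insertItems(insertBefore, newItems) {
  // splice all new items in at once, defer the index rebuild to endBulkUpdate
  Array.prototype.splice.apply(items, [insertBefore, 0].concat(newItems));
}

function deleteItem(id) {
  if (bulkSuspend) {
    bulkDeleteIds[id] = true;      // just remember the id for later
    return;
  }
  // ... existing non-bulk delete logic ...
}

function endBulkUpdate() {
  // single pass: drop deleted items in place and rebuild idxById
  var newIdx = 0;
  for (var i = 0; i < items.length; i++) {
    var item = items[i];
    if (!bulkDeleteIds[item.id]) {
      items[newIdx] = item;
      idxById[item.id] = newIdx;
      newIdx++;
    } else {
      delete idxById[item.id];
    }
  }
  items.length = newIdx;
  bulkSuspend = false;
  refresh();
}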

Please let me know what you think about this approach. If there is a good chance that either the "exposing the lookup" change or the overall "bulk logic" will find its way into SlickGrid, I'm happy to file a PR.

@ghiscoding
Collaborator

ghiscoding commented Jan 20, 2021

What does id === void 0 do again? I've seen this somewhere but forgot the meaning of it (void 0).

Feel free to open a PR, but I have a couple of comments. I see you use Map and Set, which are only supported in IE11+, and you also use let in a couple of places. However, the SlickGrid code is written in ES5 syntax so that it works in all browsers without polyfills, probably because it was written 10 years ago. I'm OK with supporting at least IE11 (I believe IE10 is EOL anyway), but I'm not sure @6pac would say the same!? It also has to work in Opera Mini, which is used a lot in countries with poor connectivity, and caniuse says "unknown" for Map/Set there.

Regarding Array.prototype.splice.apply: is this overriding the splice method? That might be risky, as it could affect something outside of SlickGrid, couldn't it?

I'm more in favor of creating new methods like beginBulkUpdate to avoid behavior changes that not every developer might know about. Unless you're saying there are really zero behavior changes, in which case it would mean less code in the lib.

The public methods are all exposed starting from this line:

SlickGrid/slick.dataview.js

Lines 1213 to 1216 in 16c3b01

$.extend(this, {
  // methods
  "beginUpdate": beginUpdate,
  "endUpdate": endUpdate,

Once you open a PR, I can also try your code change in my libs, Angular-Slickgrid and a couple of others I maintain, which are used by a few hundred users.

@Danielku15
Contributor Author

void 0 is just shorthand for undefined. It's like pre-ES6 default parameter handling.
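A quick illustration of the pattern (a minimal sketch, not from the original thread):

console.log(void 0 === undefined); // true: void <expr> always evaluates to undefined

// pre-ES6 "default parameter" pattern, same idea as the getIdxById change above
function greet(name) {
  if (name === void 0) {   // no argument was passed
    name = 'world';
  }
  return 'Hello ' + name;
}

console.log(greet());        // "Hello world"
console.log(greet('Slick')); // "Hello Slick"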

When filing the PR I will try to stick to ES5 practices. Set can be replaced with an object, and the other things are just syntactic sugar 😉
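
For example, an ES5 stand-in for the Set used in the earlier example could look like this (a sketch; the helper names markForDelete/isMarkedForDelete are made up for illustration):

// plain object keyed by item id instead of a Set
var itemsToDelete = {};

function markForDelete(id) {
  itemsToDelete[id] = true;           // replaces itemsToDelete.add(id)
}

function isMarkedForDelete(id) {
  return itemsToDelete[id] === true;  // replaces itemsToDelete.has(id)
}

markForDelete(42);
console.log(isMarkedForDelete(42)); // true
console.log(isMarkedForDelete(7));  // false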

Array.prototype.splice.apply simply calls the method; it does not override anything. The special thing is that splice cannot take an array of items to insert: it would insert the array itself as a single new element instead of "concatenating" its elements. Using apply spreads the array out as individual arguments.

var a = [];
a.splice(0, 0, 1, 2, 3, 4, 5);
// a is [1, 2, 3, 4, 5]

var a = [];
var b = [1, 2, 3, 4, 5];
a.splice(0, 0, b);
// a is [[1, 2, 3, 4, 5]]: the whole array b becomes a single element a[0]

// but apply spreads the array out as individual arguments:
var a = [];
var b = [0, 0, 1, 2, 3, 4, 5]; // the first two entries are splice's start and deleteCount
Array.prototype.splice.apply(a, b);
// a is [1, 2, 3, 4, 5]

I will try to provide a PR soon, once we have confirmed that this logic has no side effects. In SlickGrid, the new logic will only be active if you call the corresponding start* method; existing code keeps working as it is.

@6pac
Owner

6pac commented Jan 21, 2021

Just confirming that I don't personally use the DataView at all; I have my own more advanced component. So if @ghiscoding is good with the changes, I'm good with them too.

@Danielku15
Contributor Author

I ported my changes into the lib and opened #572. @ghiscoding, feel free to take a closer look, especially at point 2 with the proposal on how to activate the new feature.

For the tests I will likely adapt a bunch of existing DataView tests to ensure things work as expected; unfortunately the performance aspects will be hard to test. Before I start on tests it would be good to get confirmation that the changes are fine as they are. For me, the patched DataView gives the same performance boost as the logic at my code level (as described above).

Unfortunately there is no way to open PRs against the wiki, but I will create a fork with my changes and you can pull it manually.

@6pac 6pac closed this as completed in #572 Feb 2, 2021