-
Notifications
You must be signed in to change notification settings - Fork 422
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Efficient bulk updates for DataView #571
Comments
By doing a quick look at the internal code and reading through your use case. by looking at the current function beginUpdate() {
suspend = true;
} and yes it seems to be used only fur suspending the with that in mind and as per your suggestion, we could add something like this function beginBulkUpdate() {
bulkSuspend = true;
}
function insertItem(insertBefore, item) {
if (bulkSuspend) {
return;
}
// ...
} We are totally open for PR (Pull Request) for performance improvements, you have the best use case, and code already in place, to test it and so it might be better if you implement it and tweak it so that it behaves the way you think is most performant. I'm pretty sure @6pac would agree on adding such improvements. We always try to improve the code but at the same time we also try to change the code without breaking changes, so I would personally opt for new bulk functions so that developers can choose with/without the bulk in mind. On the public/private subject, I think that the entire code only makes methods & event handlers as public, all the variables are private and we should probably leave it as it is. We can however makes some of the methods public if it helps, but we would rather have the improvement available to everyone right from the source. |
Warning: This message again contains quite a bunch of details but I hope it helps understanding the usecase and improvement proposal. 😉 I just made some local patches on the DataView and brought down the times quite significantly. Before an update of 3000k items took 150-200ms and now it takes 7-20ms. 10ms of this update are purely the CSS height calculations which happen on the refresh. I would also love to get rid of this 10ms but that's something for a different day.
The Lines 332 to 334 in 214aa40
became function getIdxById(id) {
return id === void 0 ? idxById : idxById[id];
} This allows me to obtain the whole id lookup table and do the efficient bulk update on my level. I am not sure if my logic is something which is beneficial for all usecases but let's discuss it if this approach really fits all cases and we could add it to the DataView in general. The following code shows an example how my proposal could improve performance. It might be specific to my usecase but I guess the general logic could apply to many cases. I am having some sort of ring-buffer per group. If we exceed the ringbuffer size we remove the log entries from the data view. This way we have 10 items per group in the data view and this is where a insert+delete comes together. let groups = new Map();
let logs = new Slick.Data.DataView({ inlineFilters: true });
function addLoggingEvents(events) {
logs.beginUpdate();
for (let i = 0; i < events.length; i++) {
let log = events[i];
let groupName = log.group;
if (!groups.has(groupName)) {
groups.set(groupName, []);
}
let group = groups.get(groupName);
if (group.length >= 10) {
let oldestLog = group.pop();
logs.deleteItem(oldestLog.id);
}
log.id = Math.floor(Math.random() * Date.now());
group.unshift(log);
logs.insertItem(0, log);
}
logs.endUpdate();
} With the index lookup exposed exposed I improved it to: let groups = new Map();
let logs = new Slick.Data.DataView({ inlineFilters: true });
function addLoggingEvents(events) {
let itemsToDelete = new Set();
let originalItems = logs.getItems();
for (var i = 0; i < events.length; i++) {
var log = events[i];
let groupName = log.group;
if (!groups.has(groupName)) {
groups.set(groupName, []);
}
let group = groups.get(groupName);
if (group.length >= 10) {
let oldestLog = group.pop();
itemsToDelete.add(oldestLog.id);
}
log.id = Math.floor(Math.random() * Date.now());
group.unshift(log);
}
Array.prototype.splice.apply(originalItems, [0,0].concat(events));
var idxById = logs.getIdxById();
// This is an in-place filtering and index update of the array
var newIndex = 0;
for(var currentItem = 0; currentItem < originalItems.length; currentItem++) {
var item = originalItems[currentItem];
if (!itemsToDelete.has(item.id)) {
originalItems[newIndex] = item;
idxById[item.id] = newIndex;
++newIndex;
} else {
delete idxById[item.id];
}
}
originalItems.length = newIndex;
logs.refresh();
} This brings me down from an exponential complexity down to a linear one. The "magic" I talked about are 3 things:
From a first feeling this could also be moved to the DataView and I could try to file a PR. It could be something like:
Please let me know what you think about this approach. If there is a good chance either the "exposing the lookup" or the overall "bulk logic" will find it's way into SlickGrid, I'm happy to file a PR. |
What does Feel free to do a PR but I have couple of comments, I see you use
I'm more in favor of creating new methods For the methods that are public, they are all exposed starting from this line Lines 1213 to 1216 in 16c3b01
I can try your code change, after you do a PR, in my libs too Angular-Slickgrid and couple of other libs I have, which are used by few hundred users. |
Just a confirmation that I don't personally use DataView at all, I have my own more advanced component. So if @ghiscoding is good with changes, I'm good with them also. |
I adopted my changes into the lib and opened #572. @ghiscoding if you like you can have a closer look. Especially point 2 with the proposal on how to activate the new feature. For the tests I will likely adopt a bunch of existing dataview tests to simply ensure that things work as expected. Unfortunately it will be hard to test the performance aspects. Before I start with tests it would be good to get a confirmation that the changes are good as they are. For me using this patched DataView gives the same performance boost as the logic on my code level (as described above). Unfortunately for the wiki there is no way to open PRs. But I will create a fork with my changes and then you can pull it manually. |
We're currently facing quite a significant performance issue with Slickgrid when doing a lot of updates in the DataView.
From a data stream we're receiving thousands of log entries, which we want to feed into slickgrid via a DataView. As part these incoming events we need to delete some and add some new ones. The most nasty thing we might be doing is inserting all items at the beginning of the list because we want to have newest items first.
We're calling
beginUpdate
andendUpdate
but this only suspends therefresh
triggers. Any call toinsertItem
,addItem
ordeleteItem
causes theupdateIdxById
to be called which in worst case causes a loop over all items again.So assume you have a dataview with 3000 items, and you get 300 new items which will either call
insertItem(0, item)
ordeleteItem(item.id)
based on some rules. In the worst case we will have 300 times a loop over 3000 items to update the index table. As theupdateIdxById
andidxById
are not public, it is really hardly possible to optimize here on our level with some known constraints.The insert part I was able to patch already by modifying the original items array and calling insertItem at the very end:
This way I ensure that we have 1 loop over 3000 items at the last item. But it gets tricky if you start deleting items because the internal index table is not correct anymore and also delete calls cannot easily be translated.
My idea on how SlickGrid could handle this would be something like this:
beginBulkUpdate()
and call it before the update.Set
instance (or as object key if you want to support old browsers).endBulkUpdate()
which loops once through the whole dataset and either update the index of the current element, or delete the item at the current location if it is contained in theSet
I would have added this to my level of the code but the as the index loop is private I cannot enforce this properly.
It would be great if the DataView could handle such kind of add or update operations. SlickGrid in general is operating fine, it's the streaming update capabilities which are not available. Reimplementing various functionalities of the DataView we need is also not really a good option.
Alternatively I would be also fine with just exposing the currently private parts of the lookups to allow me changing it. This way I can load the original list and idx lookup. This way I should be able to insert new items and delete old ones. Finally I can just call setItems() with the final list or update
idxById
on my own.Looking forward to your feedback.
The text was updated successfully, but these errors were encountered: