Count objects performance improvement #167

Merged · 11 commits · Aug 22, 2021

Conversation

@Byron Byron (Owner) commented Aug 22, 2021

The current implementation suffers from being more flexible than it needs to be.

For one, it's implemented as an iterator even though that's not required, as counts are never streamed.
Secondly, it is forced to use thread-safe data structures, which greatly slow down operation on a
single thread.

  • Rewrite multi-threaded counting to not be an iterator, but to use scoped-thread parallelism with cancellation support instead. That way some indirections through `Arc`s can be removed, and overall it will probably get faster as we don't have to send small vectors around just to combine them into a big one later. Also review how input is presented, and allow an Option/Result there to handle errors; otherwise people are forced to panic or to iterate everything in advance.
  • Add a single-threaded version of counting to avoid the dashmap tax (a Mutex is already 12s faster than dashmap with a single thread) and get on par with or better than git in this case - fast insertions are key.
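A minimal sketch of what scoped-thread counting with cooperative cancellation could look like; the function name and the pre-chunked input shape are hypothetical illustrations, not the gitoxide API:

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;

// Hypothetical sketch: count items across pre-chunked input with scoped
// threads instead of an iterator, checking a shared cancellation flag.
fn count_parallel(chunks: &[Vec<u32>], cancelled: &AtomicBool) -> usize {
    let total = AtomicUsize::new(0);
    thread::scope(|s| {
        for chunk in chunks {
            let total = &total;
            s.spawn(move || {
                for _item in chunk {
                    // Cooperative cancellation: stop early when asked to.
                    if cancelled.load(Ordering::Relaxed) {
                        return;
                    }
                    total.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    }); // the scope joins all worker threads here

    total.load(Ordering::Relaxed)
}

fn main() {
    let chunks = vec![vec![1, 2, 3], vec![4, 5]];
    let cancelled = AtomicBool::new(false);
    println!("counted {} objects", count_parallel(&chunks, &cancelled));
}
```

Because `thread::scope` guarantees the workers are joined before it returns, plain references into the caller's data suffice where an iterator-based design would need `Arc`s and channels to send partial results around.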

Unfortunately, it didn't get noticeably faster - with a RefCell, about 2s of a 40s run go to the RefCell itself. PackCaches currently do a lot of allocations (and throw them away when the LRU portion kicks in), which could be improved with a free-list. Doing that, however, requires support in the statically allocated version.
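The free-list idea can be illustrated with a small sketch; `FreeList` and its methods are hypothetical, not part of the codebase:

```rust
// Hypothetical sketch: recycle Vec<u8> buffers instead of allocating a
// fresh one each time the cache evicts an entry and needs a new slot.
struct FreeList {
    buffers: Vec<Vec<u8>>,
}

impl FreeList {
    fn new() -> Self {
        FreeList { buffers: Vec::new() }
    }

    // Hand out a recycled buffer if one is available, otherwise allocate.
    fn acquire(&mut self) -> Vec<u8> {
        self.buffers.pop().unwrap_or_default()
    }

    // Return a buffer to the pool for later reuse.
    fn release(&mut self, mut buf: Vec<u8>) {
        buf.clear(); // drops the contents but keeps the capacity
        self.buffers.push(buf);
    }
}

fn main() {
    let mut pool = FreeList::new();
    let mut buf = pool.acquire();
    buf.extend_from_slice(b"object data");
    pool.release(buf);
    // The next acquire reuses the buffer with its capacity intact,
    // so no new heap allocation is needed.
    let reused = pool.acquire();
    println!("reused capacity: {}", reused.capacity());
}
```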

Furthermore, it might be possible to improve cache efficiency by caching whole objects, even though my strong feeling is that this won't do much, as objects are never visited twice when traversing trees. When handling tree diffs a better cache is definitely possible, but that's out of scope here.

@Byron Byron mentioned this pull request Aug 22, 2021
…which is good enough for tests, but the real-world example shows that
it needs some additional changes.
…and backport all capabilities like progress reporting and
interruptibility to something that's semantically similar.
…which forms the basis for a single-threaded version of this.
This opens a pathway to using something that's not a dashmap.
…to allow for a single-threaded implementation with a RefCell.
Unfortunately we can't just use a mutable HashSet without duplicating
everything, thanks to the &mut requirement, or without unsafe code (i.e.
storing a pointer and turning it into a mutable ref).
…which is still not optimal due to RefCell, but the cost of that is
probably negligible or can be made up for with a faster hash.

However, it's not exactly faster and it doesn't max out one core either.
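The RefCell workaround described above can be sketched as follows; `Counter` and its method names are hypothetical illustrations, not the actual types:

```rust
use std::cell::RefCell;
use std::collections::HashSet;

// Hypothetical sketch: RefCell moves the mutability of the seen-set inside
// the struct, so counting can work through a shared reference without
// duplicating data or resorting to unsafe pointer tricks.
struct Counter {
    seen: RefCell<HashSet<u64>>,
}

impl Counter {
    fn new() -> Self {
        Counter { seen: RefCell::new(HashSet::new()) }
    }

    // Returns true the first time an object id is seen. Takes &self, not
    // &mut self; the borrow rules are enforced at runtime instead.
    fn add(&self, id: u64) -> bool {
        self.seen.borrow_mut().insert(id)
    }
}

fn main() {
    let counter = Counter::new();
    assert!(counter.add(42));  // first sighting counts
    assert!(!counter.add(42)); // duplicates are ignored
    println!("unique objects: {}", counter.seen.borrow().len()); // prints 1
}
```

The runtime borrow check is the small cost the commit message refers to: each `borrow_mut()` increments and checks a flag, which is cheap but not free on a hot insertion path.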
…even though that doesn't really translate into much performance,
despite technically saving millions of allocations. Maybe allocators
are already pretty good at this.
Byron added a commit that referenced this pull request Aug 22, 2021
@Byron Byron merged commit 8d49976 into main Aug 22, 2021
Byron added a commit that referenced this pull request Aug 22, 2021
@Byron Byron deleted the count-objects-performance branch August 24, 2021 06:51