Count objects performance improvement #167

Merged · 11 commits · Aug 22, 2021

Conversation

@Byron Byron (Owner) commented Aug 22, 2021

The current implementation suffers from being more flexible than it needs to be.

For one, it's implemented as an iterator even though that's not required, as counts are never streamed.
Secondly, it is forced to use thread-safe data structures, which greatly slow down operation on a
single thread.

  • Rewrite multi-threaded counting to not be an iterator, but to use scoped-thread parallelism with cancellation support instead. That way some indirections through `Arc`s can be removed, and overall it will probably get faster as we don't have to send small vectors around just to combine them into a big one later. Also review how input is presented, and allow an Option/Result there to handle errors; otherwise people are forced to panic or to iterate everything in advance.
  • Add a single-threaded version of counting to avoid the dashmap tax (a Mutex is already 12s faster than dashmap with a single thread) and get on par with or better than git in this case - fast insertions are key.
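A minimal sketch of what scoped-thread counting with cooperative cancellation could look like; the function name and the pre-chunked input shape are hypothetical illustrations, not the gitoxide API:

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;

// Hypothetical sketch: count items across pre-chunked input with scoped
// threads instead of an iterator, checking a shared cancellation flag.
fn count_parallel(chunks: &[Vec<u32>], cancelled: &AtomicBool) -> usize {
    let total = AtomicUsize::new(0);
    thread::scope(|s| {
        for chunk in chunks {
            let total = &total;
            s.spawn(move || {
                for _item in chunk {
                    // Cooperative cancellation: stop early when asked to.
                    if cancelled.load(Ordering::Relaxed) {
                        return;
                    }
                    total.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    }); // the scope joins all worker threads here

    total.load(Ordering::Relaxed)
}

fn main() {
    let chunks = vec![vec![1, 2, 3], vec![4, 5]];
    let cancelled = AtomicBool::new(false);
    println!("counted {} objects", count_parallel(&chunks, &cancelled));
}
```

Because `thread::scope` guarantees the workers are joined before it returns, plain references into the caller's data suffice where an iterator-based design would need `Arc`s and channels to send partial results around.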

Unfortunately, it didn't get noticeably faster - with a RefCell, about 2s of a 40s run go to the RefCell itself. PackCaches currently do a lot of allocations (and throw them away when the LRU portion kicks in), which could be improved with a free-list. Doing that, however, requires support in the statically allocated version.
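The free-list idea can be illustrated with a small sketch; `FreeList` and its methods are hypothetical, not part of the codebase:

```rust
// Hypothetical sketch: recycle Vec<u8> buffers instead of allocating a
// fresh one each time the cache evicts an entry and needs a new slot.
struct FreeList {
    buffers: Vec<Vec<u8>>,
}

impl FreeList {
    fn new() -> Self {
        FreeList { buffers: Vec::new() }
    }

    // Hand out a recycled buffer if one is available, otherwise allocate.
    fn acquire(&mut self) -> Vec<u8> {
        self.buffers.pop().unwrap_or_default()
    }

    // Return a buffer to the pool for later reuse.
    fn release(&mut self, mut buf: Vec<u8>) {
        buf.clear(); // drops the contents but keeps the capacity
        self.buffers.push(buf);
    }
}

fn main() {
    let mut pool = FreeList::new();
    let mut buf = pool.acquire();
    buf.extend_from_slice(b"object data");
    pool.release(buf);
    // The next acquire reuses the buffer with its capacity intact,
    // so no new heap allocation is needed.
    let reused = pool.acquire();
    println!("reused capacity: {}", reused.capacity());
}
```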

Furthermore, it might be possible to improve cache efficiency by caching whole objects, even though my strong feeling is that this won't do much, as objects are never visited twice when traversing trees. When handling tree diffs a better cache is definitely possible, but that's out of scope here.

@Byron Byron mentioned this pull request Aug 22, 2021
…which is good enough for tests, but the real-world example shows that
it needs some additional changes.
…and backport all capabilities like progress reporting and
interruptibility to something that's semantically similar.
…which forms the basis for a single-threaded version of this.
This opens a pathway to using something that's not a dashmap.
…to allow for a single-threaded implementation with a RefCell.
Unfortunately we can't just use a mutable HashSet without duplicating
everything, thanks to the &mut requirement, or without unsafe code (i.e.
storing a pointer and turning it into a mutable ref).
…which is still not optimal due to RefCell, but the cost of that is
probably negligible or can be made up for with a faster hash.

However, it's not exactly faster and it doesn't max out one core either.
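The RefCell workaround described above can be sketched as follows; `Counter` and its method names are hypothetical illustrations, not the actual types:

```rust
use std::cell::RefCell;
use std::collections::HashSet;

// Hypothetical sketch: RefCell moves the mutability of the seen-set inside
// the struct, so counting can work through a shared reference without
// duplicating data or resorting to unsafe pointer tricks.
struct Counter {
    seen: RefCell<HashSet<u64>>,
}

impl Counter {
    fn new() -> Self {
        Counter { seen: RefCell::new(HashSet::new()) }
    }

    // Returns true the first time an object id is seen. Takes &self, not
    // &mut self; the borrow rules are enforced at runtime instead.
    fn add(&self, id: u64) -> bool {
        self.seen.borrow_mut().insert(id)
    }
}

fn main() {
    let counter = Counter::new();
    assert!(counter.add(42));  // first sighting counts
    assert!(!counter.add(42)); // duplicates are ignored
    println!("unique objects: {}", counter.seen.borrow().len()); // prints 1
}
```

The runtime borrow check is the small cost the commit message refers to: each `borrow_mut()` increments and checks a flag, which is cheap but not free on a hot insertion path.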
…even though that doesn't really translate into much performance,
despite technically saving millions of allocations. Maybe allocators
are already pretty good at this.
Byron added a commit that referenced this pull request Aug 22, 2021
@Byron Byron merged commit 8d49976 into main Aug 22, 2021
Byron added a commit that referenced this pull request Aug 22, 2021
@Byron Byron deleted the count-objects-performance branch August 24, 2021 06:51