Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Id assignment perf #186

Merged
merged 7 commits into from
Apr 16, 2024
Merged

Id assignment perf #186

merged 7 commits into from
Apr 16, 2024

Conversation

rsjbailey
Copy link
Contributor

@rsjbailey rsjbailey commented Jan 4, 2024

ID assignment was pretty slow for large documents

On my machine, adding 200 simple objects to a document took ~70ms, primarily due to the repeated document->lookup() calls. This adds a slightly more complex search for a free id, which reduces the cost of adding 200 objects to about 1.5ms.

A 64 object document (more realistic) went from 2ms -> 200µs, which makes it more reasonable in an SADM context.

This should improve copying of medium/large docs (unless they're primarily common defs, which skipped the lookup anyway)

Probably slower for small documents, but should still be fast as then operating on small vectors.

Could improve further by keeping element vectors sorted on id, but that's a more intrusive change that might affect behaviour as elements would always get written out sorted by ID. (flat_map from boost or C++23 would work, or writing a less generic equivalent isn't too tricky)

@codecov-commenter
Copy link

codecov-commenter commented Jan 4, 2024

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (3f94a9b) 89.91% compared to head (7c1946c) 90.41%.

Files Patch % Lines
src/detail/id_assigner.cpp 97.29% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #186      +/-   ##
==========================================
+ Coverage   89.91%   90.41%   +0.49%     
==========================================
  Files         125      124       -1     
  Lines        5812     5864      +52     
==========================================
+ Hits         5226     5302      +76     
+ Misses        586      562      -24     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tomjnixon
Copy link
Member

Looks good to me.

I had a go at using unordered_set instead, which would make this simpler and O(n), but it was a bit slower in practice. The default hash maps in C++ are notoriously slow.

You can make it slightly faster (1.25ms to 1.15ms) with counters.reserve(elements.size());. this might temporarily waste some space, but allocation should normally be constant time for these sizes.

@rsjbailey rsjbailey force-pushed the id_assingment_perf branch 2 times, most recently from 08b43dd to 7c1946c Compare January 11, 2024 15:10
@rsjbailey
Copy link
Contributor Author

rsjbailey commented Jan 11, 2024

The deepCopy() improvements reduce the 200 objects + common definitions copy from ~2ms (post ID changes) to about 500µs, which is probably good enough for most of our use cases.

The downside is there are now two paths for adding things to a document (via add() or via deepCopy()), both of which need to be maintained if we add any more elements/parameters. I think it's reasonable as this should only happen rarely (when the standard changes)

The variant unpacking is a bit messy. I thought of a couple of alternatives:

Write a visitor: Still messy as you'd either need it to be a friend of all the element attourneys and Document or do something nasty by passing internals around by reference.

Write a different copyElements function that returns a struct with element vectors rather than variants, or takes references to the document element vectors: Adds more boilerplate than just having a big conditional (as you still need the original copyElement interface for doing deepCopyTo), and benchmarked performance was identical.

@rsjbailey rsjbailey marked this pull request as ready for review January 11, 2024 15:23
Skips all the checks for ID clashes by simply copying the elements assigning the new document as parent, and pushing into the relevant containers
This should be fine as the checks have already been done by the original
document, and we're starting from nothing.
Speeds up the deepCopy() and common definitions copy benchmarks by about 30% here
Uses the kitchen sink xml that features all ADM elements to generate a document
Then deep copies that document, and checks that xml written from both the
document and its copy match (they may differ from the original)
@rsjbailey rsjbailey merged commit 258059d into master Apr 16, 2024
8 checks passed
@rsjbailey rsjbailey deleted the id_assingment_perf branch April 16, 2024 09:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants