Replace std::map with std::deque for versionedMap #1988
Conversation
Replaces std::map with std::deque in versionedMap.
While replacing std::map with std::deque, don't store iterators or pointers into the container, because emplace can invalidate iterators.
Removed commented-out code and added some comments for clarification.
…egoyal_dequeue Attempt at fixing a segv
Tried to fix segfault
I microbenchmarked the storage queue standalone, i.e. the set and clearRange mutations were performed solely at the in-memory storage queue; no other FDB components were involved in this test. Hence the numbers presented here are best-case numbers.

Test setup:
- 100M mutations: about 5% clearRange and 95% set mutations
- 100M rangeReads
- Keys/values generated using deterministicRandom()
- A new version generated for each mutation (i.e. it's an extreme version test)

Performance comparison between std::map and std::deque for VersionedMap:

| | std::map | std::deque |
| --- | --- | --- |
| Time to perform the mutations | 220.066 | 218.784 |
| Time to perform buffered readRange | 184.423 | 171.578 |
VersionedMap (the storage queue) is a std::map of versions and their corresponding PTree root nodes. It provides a logical abstraction of a separate PTree at any given version. Since the versions are monotonically increasing, the data is intrinsically sorted, so we don't need an ordered tree (std::map is a red-black tree) for that reason. std::deque gives us the ability to remove from the front and insert at the end efficiently, and we can still do binary search because the data is intrinsically sorted. This can provide time and memory benefits, especially in the presence of very many versions.

I microbenchmarked the VersionedMap standalone, i.e. the set and clear-range mutations were performed solely at the in-memory storage queue; no other FDB components were involved.

Test setup:
- 100M mutations: about 5% clearRange and 95% set mutations
- 100M rangeReads
- Keys/values generated using deterministicRandom()
- A new version generated for each mutation

Performance comparison between std::map and std::deque for VersionedMap:

| | std::map | std::deque |
| --- | --- | --- |
| Time to perform the mutations | 220.066 | 218.784 |
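The idea above can be sketched in a few lines. This is a hypothetical, simplified model of the approach (the names `VersionedRoots`, `Version`, and `rootAt` are illustrative, not FoundationDB's actual types): because versions are only ever appended in increasing order, a `std::deque` of (version, root) pairs stays sorted, giving O(1) append at the back, O(1) eviction of old versions at the front, and binary search by version.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <deque>
#include <utility>

using Version = int64_t;

// Sketch only: maps each version to the root of the PTree visible at
// that version. The deque stays sorted because versions are appended
// in strictly increasing order.
template <class Root>
class VersionedRoots {
    std::deque<std::pair<Version, Root>> roots; // sorted by version
public:
    // Append a new largest version (O(1) amortized, no tree rebalancing).
    void insert(Version v, Root r) {
        assert(roots.empty() || roots.back().first < v); // monotonic versions
        roots.emplace_back(v, std::move(r));
    }
    // Pop versions older than `oldest` from the front (O(1) each).
    void forgetVersionsBefore(Version oldest) {
        while (!roots.empty() && roots.front().first < oldest)
            roots.pop_front();
    }
    // Binary search: latest root with version <= v, or nullptr if none.
    const Root* rootAt(Version v) const {
        auto it = std::upper_bound(
            roots.begin(), roots.end(), v,
            [](Version a, const std::pair<Version, Root>& b) { return a < b.first; });
        if (it == roots.begin()) return nullptr;
        return &std::prev(it)->second;
    }
};
```

Note that because insertion at either end of a `std::deque` invalidates iterators, callers must not cache iterators across inserts, which matches the commit note above.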
@satherton Since you are reviewing #1986, I'm adding you to this review as well. @negoyal: #1986 has the same name as this PR. Can you update the description/title of both PRs accordingly, or close one if it is obsolete?
This is the correct one with the relevant title. I have closed the obsolete one. |
Have you run any whole-system performance tests on this PR? |
I have run Jenkins perfsanity, but given the inconsistencies across runs in general, it's hard to showcase the improvements. Also, we don't have any targeted tests; that's why I included the microbenchmark numbers here. After discussing with Markus and Ashish, we had a consensus that it should be a safe change and that a deque should be no worse than a map (it should be better in most cases). So it should be safe to merge. @mpilman any thoughts?
While the deque does appear to beat std::map, it is by a fairly small amount: 0.6% and 6.9% reductions in the benchmark times. Perhaps the reason the gains aren't higher is that a deque isn't backed by a single vector; it's more like a vector of vectors, so a binary search over it isn't quite as efficient as a binary search on a vector. I'm not sure how insert suffers compared to a plain vector, but based on the benchmark I would say that it does, because always inserting the highest element into a balanced binary tree causes the most rotations, and yet it is apparently almost as fast as the deque. Or maybe the CPU time attributable to the map/deque is actually small compared to comparison/construction/destruction of the value type? @negoyal did you do any profiling to try to separate the container from the template type?

I think the ideal structure here might be a sorted array/vector used in a circular/wrap-around manner, starting with a reasonable size limit (expected unique mutation versions per second * 10), as doubling resizes would be high overhead and so should be made rare. Since the vector is still in sorted order except at the rotating wrap-around point, a binary search can still be done, although it's a bit tricky (and made for a great interview question by @AlvinMooreSr). I can't think of a more efficient way to provide exactly the API we need: clear the smallest entry, add a new largest entry, binary search all the entries.
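The circular-buffer idea above can be sketched as follows. This is a hypothetical illustration, not code from this PR: by tracking the head index explicitly, logical position `i` maps to slot `(head + i) % capacity`, so an ordinary binary search over logical positions works and the trickier search over a raw rotated array is avoided. The names `VersionRing`, `pushBack`, `popFront`, and `find` are made up for the sketch, and the doubling resize mentioned above is omitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

using Version = int64_t;

// Sketch of a fixed-capacity ring providing exactly the API described
// above: clear the smallest entry, add a new largest entry, and binary
// search all the entries.
template <class T>
class VersionRing {
    std::vector<std::pair<Version, T>> buf;
    size_t head = 0, count = 0;
    // Translate a logical position (0 = oldest) to a physical slot.
    const std::pair<Version, T>& at(size_t i) const { return buf[(head + i) % buf.size()]; }
public:
    explicit VersionRing(size_t capacity) : buf(capacity) {}
    bool full() const { return count == buf.size(); }
    // Add a new largest entry; caller ensures increasing versions and !full().
    void pushBack(Version v, T t) {
        buf[(head + count) % buf.size()] = { v, std::move(t) };
        ++count;
    }
    // Clear the smallest entry.
    void popFront() {
        head = (head + 1) % buf.size();
        --count;
    }
    // Binary search over logical positions: latest entry with version <= v.
    std::optional<T> find(Version v) const {
        size_t lo = 0, hi = count; // find first logical index with version > v
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (at(mid).first <= v) lo = mid + 1; else hi = mid;
        }
        if (lo == 0) return std::nullopt;
        return at(lo - 1).second;
    }
};
```

A production version would need the growth policy discussed above (doubling the backing vector when full, made rare by a generous initial capacity), which requires re-linearizing the ring during the copy.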
Hi Steve,

You have a valid point. Markus and I did think of a circular buffer as well. The reason we chose to go with a deque was that it shouldn't be significantly worse than a circular buffer and the data structure is readily available. A circular buffer shouldn't be too hard to do, and I would be curious to see how much better it actually is. But since a deque should still be faster than a map, are you OK with getting this PR merged for now? I can create a separate PR to try the circular buffer. I have not done the profiling.

Thanks,
I haven't reviewed the code yet, but pending review I have no objections to merging it. |
Looks good, just a few nitpicks for which I made suggested changes.
Indentation fix. Co-Authored-By: Steve Atherton <stevea@apple.com>
Indentation Fix Co-Authored-By: Steve Atherton <stevea@apple.com>
Renamed variable to be more specific Co-Authored-By: Steve Atherton <stevea@apple.com>
Co-Authored-By: Steve Atherton <stevea@apple.com>
Co-Authored-By: Steve Atherton <stevea@apple.com>
Co-Authored-By: Steve Atherton <stevea@apple.com>