Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace std::map with std::deque for versionedMap #1988

Merged
merged 17 commits into from
Sep 27, 2019

Conversation

negoyal
Copy link
Contributor

@negoyal negoyal commented Aug 13, 2019

Replaces std::map with std::deque in versionedMap.

Replaces std::map with std::deque in versionedMap.
negoyal and others added 10 commits August 15, 2019 11:48
While replacing std::map with std::deque. Don't store
any pointers because emplace can invalidate iterators.
Removed commented out code and added some comments for clarification.
I microbenchmarked the storage queue standalone. i.e. the set and clearrange mutations
were performed solely at the in-memory storage queue. No other FDB components were involved
in this test. And hence the numbers presented here the best case numbers.

Test setup:
- 100M mutations: about 5% clearRange and 95% set mutations
- 100M rangeReads
- Keys/Values generated using deterministicRandom()
- A new version generated for each mutation (i.e. it's an extreme version test)

Performance comparison between std::map and std::deque for VersionedMap"

                                      std::map    std::deque
Time to perform the mutations         220.066     218.784
Time to perform buffered readRange    184.423     171.578
@negoyal negoyal changed the title This is a draft PR. Not yet open for review. Please ignore for now. Replace std::map with std::deque for versionedMap Sep 4, 2019
@negoyal
Copy link
Contributor Author

negoyal commented Sep 4, 2019

VerisonedMap (storage queue) is a std::map of versioned and corresponding PTree root nodes. It provides a logical abstraction of a separate PTree at any given version. Since the versions are monotonically increasing, the data is intrinsically sorted. Hence don't need an ordered tree (std::map is a red-black tree) for that reason. std::deque gives us the ability to remove from the front and insert at the end efficiently. And we can still do binary search because of intrinsically sorted data. This can provide time and memory benefits esp. in the presence of very many versions.

I microbenchmarked the VersionedMap standalone. i.e. the set and clear-range mutations were performed solely at the in-memory storage queue. No other FDB components were involved
in this test. And hence the numbers presented here are the best case numbers.

Test setup:

  • 100M mutations: about 5% clearRange and 95% set mutations
  • 100M rangeReads
  • Keys/Values generated using deterministicRandom()
  • A new version generated for each mutation (i.e. it's an extreme version test)

Performance comparison between std::map and std::deque for VersionedMap"

Time to perform the mutations 220.066 (std::map) 218.784 (std::deque)
Time to perform the readRange 184.423 (std::map) 171.578 (std::deque)

@negoyal negoyal marked this pull request as ready for review September 4, 2019 20:18
@vishesh
Copy link
Contributor

vishesh commented Sep 10, 2019

@satherton Since you are reviewing #1986, I'm adding you to this review as well.

@negoyal : #1986 has same name as this PR. Can you update the description/title of both PRs accordingly, or close if one of them is obsolete.

@negoyal
Copy link
Contributor Author

negoyal commented Sep 10, 2019

This is the correct one with the relevant title. I have closed the obsolete one.

@alexmiller-apple
Copy link
Contributor

Have you run any whole-system performance tests on this PR?

@negoyal
Copy link
Contributor Author

negoyal commented Sep 11, 2019

I have run Jenkins perfsanity. But given the inconsistencies across runs in general, it's hard to showcase the improvements. Also we don't have any target tests. That's why I included the microbenchmark numbers here.

After discussing with Markus and Ashish, we had a consensus that it should be a safe change and deque should be no worse than a map (it should be better in most cases). So it should be safe to merge.

@mpilman any thoughts?

@satherton
Copy link
Contributor

While the deque does appear to beat std::map, it is by a fairly small amount - 0.6 % and .9 % reductions in the benchmark times.

Perhaps the reason the gains aren't higher is that deque isn't backed by a vector, it's more like a vector of vectors, so the binary search of it isn't quite as efficient as a binary search on a vector. I'm not sure how insert suffers compared to a plain vector but based on the benchmark I would say that it does, because always inserting the highest element into a balanced binary tree causes the most rotations and yet it is apparently almost as fast as the deque.

Or maybe the CPU time attributable to the map/deque is actually small compared to comparison/construction/destruction of the value type? @negoyal did you do any profiling to try to separate the container from the template type?

I think the ideal structure here might be a sorted array/vector that is used in a circular / wrap-around manner, starting with a reasonable size limit (expected unique mutation versions per second * 10) as doubling resizes would be high overhead so should be made rare. Since the vector is still in sorted order except for the rotating wrap-around point, a binary search can still be done although it's a bit tricky (and made for a great interview question by @AlvinMooreSr).

I can't think of a more efficient way to provide exactly the API we need - clear the smallest entry, add a new largest entry, binary search all the entries.

@negoyal
Copy link
Contributor Author

negoyal commented Sep 12, 2019

Hi Steve,

You have a valid point. Markus and I did think of a circular buffer as well. The reason we chose to go with deque was that it shouldn't be significantly worse than a circular buffer and the data structure is available to use.

Circular buffer shouldn't be too hard to do and I would be curious to see how much better that actually is.

But for now, since deque should still be faster than a map, are you ok with getting this PR merged for now. And I can create a separate PR to try the circular buffer part?

I have not done the profiling.

Thanks,
Neelam

@satherton
Copy link
Contributor

satherton commented Sep 12, 2019

I haven't reviewed the code yet, but pending review I have no objections to merging it.

Copy link
Contributor

@satherton satherton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a few nitpicks for which I made suggested changes.

fdbclient/VersionedMap.h Outdated Show resolved Hide resolved
fdbclient/VersionedMap.h Outdated Show resolved Hide resolved
fdbclient/VersionedMap.h Outdated Show resolved Hide resolved
fdbclient/VersionedMap.h Outdated Show resolved Hide resolved
fdbclient/VersionedMap.h Outdated Show resolved Hide resolved
fdbclient/VersionedMap.h Outdated Show resolved Hide resolved
Indentation fix.

Co-Authored-By: Steve Atherton <stevea@apple.com>
negoyal and others added 5 commits September 26, 2019 21:57
Indentation Fix

Co-Authored-By: Steve Atherton <stevea@apple.com>
Renamed variable to be more specific

Co-Authored-By: Steve Atherton <stevea@apple.com>
Co-Authored-By: Steve Atherton <stevea@apple.com>
Co-Authored-By: Steve Atherton <stevea@apple.com>
Co-Authored-By: Steve Atherton <stevea@apple.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants