feat (hset): Support arguments (count, withvalues) in HRANDFIELD #1804
redis claim complexity of |
I might be wrong, but it seems like the listpack implementation is also O(M) 😮. They use the same algorithms as here. Actually, we can make our implementation O(N) by harnessing the string map's internals, but I don't think it's mandatory, as I already answered in the first PR on this issue.
Good work 👨‍🍳, some minor comments
std::vector<RandomPick> picks;
unsigned int total_size = Size();

for (unsigned int i = 0; i < count; ++i) {
You can use a std::map<uint32_t, uint32_t>. The key represents the index, and the value represents the number of times you encountered it (we need a map, since it's an ordered container). Each time you loop, you check whether the index rand() % total_size already exists. If it does, you increment its occurrence count (the value) by 1. Otherwise (if the index is not already inserted), you insert it with an associated value of 1 (since it's the first occurrence).
That way you will:
- Get rid of std::sort.
- Simplify the code below, since now you don't need two while loops. You only need a for loop, and for each element, you output it occurrence number of times.
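For reference, the suggestion above could be sketched roughly as follows. This is not the PR's code: CountRandomIndices and PickWithRepetition are hypothetical names, a std::vector<std::string> stands in for the real hash container, and std::mt19937 replaces rand() purely for the sake of the example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper: draws `count` random indices in [0, total_size) with
// repetition and tallies how often each index was drawn.
std::map<uint32_t, uint32_t> CountRandomIndices(uint32_t total_size, uint32_t count,
                                                std::mt19937& rng) {
  assert(total_size > 0);
  std::uniform_int_distribution<uint32_t> dist(0, total_size - 1);
  std::map<uint32_t, uint32_t> occurrences;
  for (uint32_t i = 0; i < count; ++i) {
    ++occurrences[dist(rng)];  // operator[] starts a new key at 0, so this yields 1
  }
  return occurrences;
}

// Hypothetical stand-in for the real container: emits every drawn field as
// many times as it was drawn. std::map iterates in ascending key order, so
// the output follows index order without an explicit sort.
std::vector<std::string> PickWithRepetition(const std::vector<std::string>& fields,
                                            uint32_t count, std::mt19937& rng) {
  std::vector<std::string> out;
  if (fields.empty()) return out;
  out.reserve(count);
  const auto occurrences =
      CountRandomIndices(static_cast<uint32_t>(fields.size()), count, rng);
  for (const auto& [index, occurrence] : occurrences) {
    out.insert(out.end(), occurrence, fields[index]);
  }
  return out;
}
```

The ordering still happens here, just inside the map's insertions rather than in a separate sort call.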
Good suggestion, but I don't think it matters that much. Sorting only gets slow for a really large number of elements, and in that case the I/O time is much larger anyway.
No no, I do not care about the performance here. I don't expect that std::sort will have any measurable impact, since the count will always be relatively small. But using a map will simplify the implementation of this function and make it more readable.
I prefer to keep it as is. The sorting is more implicit when std::map is used. Personally, I think the current version is more readable because it makes the sort obvious.
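To make the comparison concrete, here is a minimal sketch of the vector + sort shape being defended. It is an assumption about the general approach, not the code from the diff: PickWithSort is a hypothetical name, plain uint32_t indices replace RandomPick, and a std::list stands in for a container that can only be walked forward.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <list>
#include <random>
#include <string>
#include <vector>

// Hypothetical sketch: draw indices with repetition, sort them explicitly,
// then walk the container once from front to back.
std::vector<std::string> PickWithSort(const std::list<std::string>& fields, uint32_t count,
                                      std::mt19937& rng) {
  std::vector<std::string> out;
  if (fields.empty()) return out;

  std::uniform_int_distribution<uint32_t> dist(0, static_cast<uint32_t>(fields.size()) - 1);
  std::vector<uint32_t> picks(count);
  for (uint32_t& pick : picks) pick = dist(rng);
  std::sort(picks.begin(), picks.end());  // the sort is visible at the call site

  out.reserve(count);
  auto it = fields.begin();
  uint32_t pos = 0;
  for (uint32_t index : picks) {
    // Sorted picks mean the cursor only ever moves forward.
    while (pos < index) {
      ++it;
      ++pos;
    }
    assert(it != fields.end());
    out.push_back(*it);  // equal indices repeat the same element
  }
  return out;
}
```

Both variants produce the picks in container order; the difference is whether the ordering step is an explicit call or a property of the data structure.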
So a) I think STL ordered associative containers are used exactly for that, so I would argue that vector + sort is an antipattern. b) How is what you wrote simpler than these 3 lines? No pair accesses, no nested while loops, no nothing:
for (auto it = begin(); it != end() && picks_sz < count; ++it) {
  auto [key, frequency] = *it;
  keys.insert(keys.end(), frequency, key);
  picks_sz += frequency;
}
I might have a mistake in the above code, but that's the gist of it.
@dranikpg @kostasrim addressed all the comments, please take another look, thanks!
- But I still encourage you to use a unique_ptr instead of raw arrays
- About RandomPairs(): we usually take output arguments by pointer (and returning values is even better), but I don't wanna be picky
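A rough illustration of both points, under stated assumptions: FieldValue, MakePairBuffer, FillPairs and MakePairs are hypothetical names invented for this sketch, and the real RandomPairs() signature is not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element type standing in for whatever the raw arrays hold.
using FieldValue = std::pair<std::string, std::string>;

// Instead of `new FieldValue[n]` paired with a manual `delete[]`, the array
// is owned by a unique_ptr and released automatically on every exit path.
std::unique_ptr<FieldValue[]> MakePairBuffer(std::size_t n) {
  return std::make_unique<FieldValue[]>(n);  // value-initializes n elements
}

// Output argument taken by pointer: the call site reads FillPairs(n, &out),
// which makes the mutation visible to the reader.
void FillPairs(std::size_t n, std::vector<FieldValue>* out) {
  assert(out != nullptr);
  out->assign(n, FieldValue{"field", "value"});
}

// Returning by value: no output argument at all.
std::vector<FieldValue> MakePairs(std::size_t n) {
  return std::vector<FieldValue>(n, FieldValue{"field", "value"});
}
```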
thanks @dranikpg, I just changed to use |
👍🏻
Fixes: #858
It might possibly fix: #1707
The algorithms have been implemented and tested for both encodings (string map and listpack). Exercising the string map encoding requires a larger hset (my tests used 500+ entries).
The random selection algorithms implemented for the string map class are reimplementations of the algorithms Redis uses for listpack, and therefore have the same time complexities.
Without this patch:
after this patch: