Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat (hset): Support arguments (count, withvalues) in HRANDFIELD #1804

Merged
merged 1 commit into from Sep 6, 2023

Conversation

theyueli
Copy link
Contributor

@theyueli theyueli commented Sep 5, 2023

Fixes: #858

It might possibly fix: #1707

The algorithms that support both encodings (string map and listpack) have been implemented and tested. To use string map requires a larger hset (my tests used 500+ entries)

The random selection algorithms implemented for string map class are the reimplementation of the same algorithms used by Redis' listpack. (therefore same time complexities)

Without this patch:

127.0.0.1:6379> HSET coin heads obverse tails reverse edge null
(integer) 3
127.0.0.1:6379> HRANDFIELD coin 3 WITHVALUES
(error) ERR wrong number of arguments for 'hrandfield' command
127.0.0.1:6379> HRANDFIELD coin -1
(error) ERR wrong number of arguments for 'hrandfield' command
127.0.0.1:6379> 

after this patch:

127.0.0.1:6379> HSET coin heads obverse tails reverse edge null
(integer) 3
127.0.0.1:6379> HRANDFIELD coin -5 WITHVALUES
 1) "tails"
 2) "reverse"
 3) "tails"
 4) "reverse"
 5) "heads"
 6) "obverse"
 7) "tails"
 8) "reverse"
 9) "edge"
10) "null"
127.0.0.1:6379> HRANDFIELD coin 3
1) "heads"
2) "tails"
3) "edge"
127.0.0.1:6379> HRANDFIELD coin 3 WITHVALUES
1) "heads"
2) "obverse"
3) "tails"
4) "reverse"
5) "edge"
6) "null"

@royjacobson
Copy link
Contributor

royjacobson commented Sep 5, 2023

redis claim complexity of O(N) with N being the amount of items returned. AFAICT RandomPairsUnique and RandomPairs in this PR are O(M) with M=size(map). Am I missing something?

@dranikpg
Copy link
Contributor

dranikpg commented Sep 5, 2023

@royjacobson

I might be wrong, but it seems like listpacks implementation is also O(M) 😮 They use the same algorithms as here

Actually we can make our implementation O(N) by harnessing stringmaps internals. But I don't think its mandatory as I already answered in the first PR on this issue

src/core/string_map.cc Outdated Show resolved Hide resolved
src/core/string_map.cc Outdated Show resolved Hide resolved
src/server/hset_family.cc Outdated Show resolved Hide resolved
src/server/hset_family.cc Outdated Show resolved Hide resolved
src/server/hset_family.cc Outdated Show resolved Hide resolved
Copy link
Contributor

@kostasrim kostasrim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work 👨‍🍳 , some minor comments

src/core/string_map.h Outdated Show resolved Hide resolved
src/core/string_map.cc Outdated Show resolved Hide resolved
src/server/hset_family.cc Outdated Show resolved Hide resolved
src/server/hset_family.cc Show resolved Hide resolved
src/server/hset_family.cc Outdated Show resolved Hide resolved
src/server/hset_family.cc Outdated Show resolved Hide resolved
src/server/hset_family.cc Show resolved Hide resolved
src/core/string_map.cc Outdated Show resolved Hide resolved
std::vector<RandomPick> picks;
unsigned int total_size = Size();

for (unsigned int i = 0; i < count; ++i) {
Copy link
Contributor

@kostasrim kostasrim Sep 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use a std::map<uint32_t, uint32_t>. The key represents the index, and value represents the number of times you encountered it (we need map, since it's an ordered container). Each time you loop, you check if the index rand() % total_size already exists. If it does you increment it's occurrence (the value) by 1. Otherwise (if the index is not already inserted) you insert it with an associated value of 1 (since it's the first occurrence).

That way you will:

  1. Get rid of std::sort.
  2. Simplify the code below, since now you don't need two while loops. You only need a for loop, and for each element, you output occurrence number of times

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion, but I don't think it matters that much. Sort only gets slow for a really big number of elements and for it the i/o time is much larger

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No no, I do not care about the performance here, I don't expect that std::sort will have any measurable impact since it the count will always be relatively small. But by using a map it will simplify the implemetnaion of this function and make it more readable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to keep as is. sort() is more implicit when std::map is used. Personally, the current version has better readabliy by making sort obvious.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So a) I think stl associative ordered containers are used exactly for that, so I would argue that vector + sort is an antipattern. b) how is what you wrote simpler than this 3 lines? No pair accesses, no nested while loops no nothing:

for(auto it = begin(); it < end() && picks_sz < count ; ++it) {
  auto [key, frequency] = *it;
  keys.insert(keys.end(), frequency, key);
  picks_sz += frequency;
}

I might a have a mistake on the above code but that's the gist of it

src/server/hset_family.cc Show resolved Hide resolved
@theyueli
Copy link
Contributor Author

theyueli commented Sep 6, 2023

@dranikpg @kostasrim addressed all the comments, please take another look, thanks!

src/server/hset_family.cc Show resolved Hide resolved
src/server/hset_family.cc Outdated Show resolved Hide resolved
dranikpg
dranikpg previously approved these changes Sep 6, 2023
Copy link
Contributor

@dranikpg dranikpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. But I still encourage you to use a unique_ptr instead of raw arrays
  2. about RandomPairs(), we usually take output arguments by pointer (and returning values is even better) but I don't wanna be picky

kostasrim
kostasrim previously approved these changes Sep 6, 2023
src/server/hset_family.cc Outdated Show resolved Hide resolved
@theyueli
Copy link
Contributor Author

theyueli commented Sep 6, 2023

  1. But I still encourage you to use a unique_ptr instead of raw arrays

thanks @dranikpg, I just changed to use unique_ptr, please take a second look.

Copy link
Contributor

@dranikpg dranikpg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻

@dranikpg dranikpg enabled auto-merge (squash) September 6, 2023 08:55
@dranikpg dranikpg merged commit a8e4beb into dragonflydb:main Sep 6, 2023
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Datagrip support Support arguments (count, withvalues) in HRANDFIELD
4 participants