Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable arbitrary commands to be run on cluster mode #117

Merged

Conversation

filipecosta90
Copy link
Collaborator

@filipecosta90 filipecosta90 commented Mar 23, 2020

This PR enables running arbitrary commands on cluster mode and adds the required testing to it.
Example command:

memtier_benchmark -s 10.3.0.54 -p 12002 -c 50 -t 14 \
            --command="hset __key__ f1 v1 f2 v2 f3 v3 f4 v4 f5 v5 f6 v6 f7 v7" \
            --hide-histogram --test-time 180 --run-count 3 --cluster-mode

Given it detected the issue in #204 it fixes it as well.

Copy link
Contributor

@yossigo yossigo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@filipecosta90 This looks good to me, with a few minor changes (and we'll need to rebase all the commits). In your example above you only specify a single __key__, I guess there should be no problem to use an arbitrary number of __key__ and __data__ arguments right?

client.cpp Outdated Show resolved Hide resolved
cluster_client.cpp Outdated Show resolved Hide resolved
cluster_client.cpp Outdated Show resolved Hide resolved
@filipecosta90
Copy link
Collaborator Author

@filipecosta90 This looks good to me, with a few minor changes (and we'll need to rebase all the commits). In your example above you only specify a single __key__, I guess there should be no problem to use an arbitrary number of __key__ and __data__ arguments right?

Yes, assuming that the keys are all in the same slot ( given that we request for a connection based on the key ). I'll make that change and mention that the first key position will always be the one used by memtier to internal map the command to the proper shard connection. WDYT?

@filipecosta90
Copy link
Collaborator Author

@filipecosta90 This looks good to me, with a few minor changes (and we'll need to rebase all the commits). In your example above you only specify a single __key__, I guess there should be no problem to use an arbitrary number of __key__ and __data__ arguments right?

@yossigo I've added the multi-placeholder test on the last commit. I believe all issues have been solved and we're ready to review :)

yossigo
yossigo previously approved these changes May 3, 2022
Copy link
Contributor

@yossigo yossigo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@filipecosta90 Had a quick look, seems OK to me.
@YaacovHazan Any other inputs?

@YaacovHazan
Copy link
Collaborator

YaacovHazan commented May 4, 2022

@filipecosta90 Looks good. I see that in the example test you are still using one __key__. Do we support more than one __key__? if yes what user should expect when the __key__'s are mapped to different slots?

Did we test it? As far as I remember when we are running with -n, we are filling the queue per connection according to the requested number of requests.
So once you reach that limit, all the conns send the keys that they have in thier queue.
But now are you sure we are not hung? or sending more requests than needed?

@filipecosta90
Copy link
Collaborator Author

@filipecosta90 Looks good. I see that in the example test you are still using one __key__. Do we support more than one __key__? if yes what user should expect when the __key__'s are mapped to different slots?

@YaacovHazan with the following example:

memtier_benchmark --command "MSET __key__ v __key__ v" -p 30001 --cluster-mode

the expected CROSSSLOT error is replied:

server 127.0.0.1:30001 handle error response: -CROSSSLOT Keys in request don't hash to the same slot

I believe this is the expected behavior correct?

@filipecosta90
Copy link
Collaborator Author

@YaacovHazan I've added an extra test for multi-key placeholders in 6ef066a

on the tests we confirm that the number of requests matches the expected.

@filipecosta90
Copy link
Collaborator Author

@YaacovHazan / @yossigo followed the recommendations and key placeholder is only allowed to be used once per command.
One interesting thing that I saw on the cluster_client is that before 5150d09 and 2d86e8c it was issuing more than the N requested commands of the benchmark and that was because of the code that fills in the key_index_pool for slots that are not of that connection. This was why I needed to do the change in 2d86e8c .

@YaacovHazan
Copy link
Collaborator

@filipecosta90 why did you add "Don't fill key_index_pool for other connections on cluster client" commit, this is by design

@filipecosta90
Copy link
Collaborator Author

@filipecosta90 why did you add "Don't fill key_index_pool for other connections on cluster client" commit, this is by design

@YaacovHazan if we kept the code as is it hanged indefinitely given we were generating random keys that did not belong to that slot and then stopped prior generating any other key for that connection. As soon as I removed the check for total generated keys that included the keys that were buffered we were then issuing more commands that were required. The solution, ( proper one IMHO ) is not to buffer any key that does not belong to that conn. In this manner, we've solved both issues.

@YaacovHazan
Copy link
Collaborator

@filipecosta90 assuming Cluster Mode with SET/GET (1 key command), the idea is that you as a client have some key generator based on the user requirements.

Once you generated a key you should deliver it, if currently, you are in a context of a connection that the key does not belong to, it doesn't mean you can't or don't want to send it. You should use it but with the right connection.

If we were not connection driven, you could think of it as you have some basic component that generates keys and sends them with the right connection based on the current Cluster topology (and not that one connection put keys for another connection).

Otherwise, you are impacting the randomization of the keys based on the connection (you will use more keys that route to a good connection than a bad one).
The implementation in Memtier-Benchmark is that you choose your key generator and follow that creates the requests.

Now for Cluster with Arbitrary-Command (Multiple keys), the current implementation will not work well and will cause the client to hang. As I said at the early stage of this PR, we should think and consider what is the right way to handle that.
But IMOH just dropping keys is not the right solution and if yes, maybe only for this case of arbitrary command in Cluster mode.

@filipecosta90
Copy link
Collaborator Author

@filipecosta90 assuming Cluster Mode with SET/GET (1 key command), the idea is that you as a client have some key generator based on the user requirements.

Once you generated a key you should deliver it, if currently, you are in a context of a connection that the key does not belong to, it doesn't mean you can't or don't want to send it. You should use it but with the right connection.

If we were not connection driven, you could think of it as you have some basic component that generates keys and sends them with the right connection based on the current Cluster topology (and not that one connection put keys for another connection).

Otherwise, you are impacting the randomization of the keys based on the connection (you will use more keys that route to a good connection than a bad one). The implementation in Memtier-Benchmark is that you choose your key generator and follow that creates the requests.

Now for Cluster with Arbitrary-Command (Multiple keys), the current implementation will not work well and will cause the client to hang. As I said at the early stage of this PR, we should think and consider what is the right way to handle that. But IMOH just dropping keys is not the right solution and if yes, maybe only for this case of arbitrary command in Cluster mode.

@YaacovHazan given arbitrary commands won't allow multiple key placeholders as in the current implementation I guess the only missing concern that is still to be addressed is the part of dropping the keys from different slots when not in arbitrary command. If we fix it IMHO we can move forward with this PR without compromising any of the current usage of cluster mode. Agree? Trying to see if we can make this as simple as possible to merge and have this feature. We need this so much! =)

@YaacovHazan
Copy link
Collaborator

@filipecosta90 ,Not sure I got you about the last concern. I think that once we limit the arbitrary command to one key placeholder, the PR is ok.

Two more notes:

  1. I do think that we need to come up with a complete solution and add the ability to have arbitrary commands with more than one key placeholder.
  2. I know that @ushachar is working on a different approach to generating keys per connection, and it could be the basic step to achieve that.

@filipecosta90
Copy link
Collaborator Author

@YaacovHazan I've enabled all tests and updated based upon:

IMOH just dropping keys is not the right solution and if yes, maybe only for this case of arbitrary command in Cluster mode.

All is green now. Can you check it?

protocol.cpp Outdated
if (current_arg->data.length() != strlen(KEY_PLACEHOLDER)) {
benchmark_error_log("error: key placeholder can't combined with other data\n");
return false;
}

if (key_placeholder_count > 1) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should apply only to cluster mode. The non-cluster mode does support more than one key.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, "IMOH just dropping keys is not the right solution and if yes, maybe only for this case of arbitrary command in Cluster mode." was relevant for the initial PR where we allowed more than one key.
Since we agree to limited the arbitrary command in cluster mode for one key (for now) there is no need to "play" with the pool

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should apply only to cluster mode. The non-cluster mode does support more than one key.

@YaacovHazan both for standalone and cluster versions we don't allow for more than one key placeholder.

@@ -1293,6 +1290,11 @@ int main(int argc, char *argv[])
exit(1);
}

// Cluster mode supports only a single key commands
if (cfg.cluster_mode && cfg.arbitrary_commands->at(i).keys_count != 1) {
Copy link
Collaborator Author

@filipecosta90 filipecosta90 Jun 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@YaacovHazan we should change from !=1 to >1.
Example of a failing command that should work:

$ ./memtier_benchmark --cluster --command="SET key __data__" -p 30001 
error: Cluster mode supports only a single key commands

and another keyless one

./memtier_benchmark --cluster --command="PUBLISH channel __data__" -p 30001 
error: Cluster mode supports only a single key commands

I'm including that example on CI and addressing it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@YaacovHazan I've added the test and confirmed behaviour is as expected with the code change.

@codecov-commenter
Copy link

codecov-commenter commented Jun 19, 2023

Codecov Report

Merging #117 (b6c2158) into master (5c8be9c) will decrease coverage by 0.62%.
The diff coverage is 70.76%.

❗ Current head b6c2158 differs from pull request most recent head 9cd9c82. Consider uploading reports for the commit 9cd9c82 to get more accurate results

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

@@            Coverage Diff             @@
##           master     #117      +/-   ##
==========================================
- Coverage   56.87%   56.25%   -0.62%     
==========================================
  Files          21       21              
  Lines        4283     4330      +47     
==========================================
  Hits         2436     2436              
- Misses       1847     1894      +47     
Impacted Files Coverage Δ
config_types.h 91.66% <ø> (-2.46%) ⬇️
run_stats.h 100.00% <ø> (ø)
memtier_benchmark.cpp 52.81% <50.00%> (+0.05%) ⬆️
run_stats.cpp 77.50% <50.00%> (-0.79%) ⬇️
client.cpp 63.92% <66.66%> (+0.74%) ⬆️
cluster_client.cpp 70.24% <77.77%> (+0.39%) ⬆️
client.h 75.00% <93.33%> (+12.50%) ⬆️
config_types.cpp 42.91% <100.00%> (+0.39%) ⬆️
protocol.cpp 36.65% <100.00%> (+0.10%) ⬆️
shard_connection.cpp 61.73% <100.00%> (+0.12%) ⬆️

... and 1 file with indirect coverage changes

@filipecosta90 filipecosta90 merged commit 1c6735f into RedisLabs:master Jun 21, 2023
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants