feat(cluster): check command keys ownership #1194
src/server/cluster/cluster_data.h
Outdated
class ClusterData {
 public:
  ClusterData();
  static uint16_t keyHashSlot(std::string_view key);
I'd add a typedef for this `uint16_t` instead of repeating it, maybe `using SlotId = uint16_t`?
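The suggested alias might look roughly like this (a sketch; `SlotId` is the reviewer's proposed name, not yet in the PR):

```cpp
#include <cstdint>
#include <string_view>

namespace dfly {

// Alias proposed in review: gives the slot number a domain name
// instead of repeating a bare uint16_t everywhere.
using SlotId = std::uint16_t;

class ClusterData {
 public:
  ClusterData();
  // The signature now reads in terms of the domain type.
  static SlotId keyHashSlot(std::string_view key);
};

}  // namespace dfly
```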
src/server/cluster/cluster_data.h
Outdated
ClusterData();
static uint16_t keyHashSlot(std::string_view key);

bool IsKeyInMySlot(std::string_view key);
Maybe `IsKeyInMySlots()` (plural)? Or `IsKeyOwned()`?
src/server/cluster/cluster_data.h
Outdated
bool AddSlots();

util::SharedMutex slots_mu_;
std::unordered_set<uint16_t> owned_slots_;
It's generally better to use `absl::flat_hash_set` instead of `std::unordered_set`.
src/server/cluster/cluster_data.h
Outdated
util::SharedMutex slots_mu_;
std::unordered_set<uint16_t> owned_slots_;
static constexpr uint16_t kMaxSlotNum = 0x3FFF;
I'd keep this implementation detail in the .cc file.
src/server/cluster/cluster_data.cc
Outdated
bool cluster_enabled = false;

// If the key contains the {...} pattern, return only the part between { and }
Can you move this documentation to the .h file?
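For context, the documented behavior mirrors Redis' hash-tag rules. A minimal sketch of such a `KeyTag` (not necessarily the PR's exact implementation; the empty-tag fallback is assumed from Redis' spec):

```cpp
#include <string_view>

// If the key contains a {...} pattern, return only the part between the
// first '{' and the next '}'. An unterminated or empty tag ("{}") falls
// back to hashing the whole key, as in Redis.
std::string_view KeyTag(std::string_view key) {
  size_t open = key.find('{');
  if (open == std::string_view::npos)
    return key;  // no tag at all
  size_t close = key.find('}', open + 1);
  if (close == std::string_view::npos || close == open + 1)
    return key;  // no closing brace, or empty tag
  return key.substr(open + 1, close - open - 1);
}
```

This keeps keys like `{user1000}.following` and `{user1000}.followers` in the same slot.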
src/server/cluster/cluster_data.cc
Outdated
slots_mu_.lock_shared();
size_t count = owned_slots_.count(slot_id);
slots_mu_.unlock_shared();
While `count()` shouldn't throw any exceptions, it's generally safer to use scoped guards like you did above. Here, the manual unlocking doesn't buy us anything, as all that's left outside is a comparison and the return, so maybe just create a guard above? If not, I'd define a scoped guard with a narrower scope, like:

```cpp
size_t count;
{
  lock_guard gu(slots_mu_);
  count = owned_slots_.count(slot_id);
}
return count > 0;
```
I am calling lock_shared, and there is no shared_lock_guard in std, right?
I believe you're looking for `std::shared_lock`: https://en.cppreference.com/w/cpp/thread/shared_lock :)
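A sketch of how `std::shared_lock` pairs with a shared mutex here (class and member names are illustrative, not the PR's code):

```cpp
#include <cstdint>
#include <mutex>
#include <shared_mutex>
#include <unordered_set>

class OwnedSlots {
 public:
  bool Contains(std::uint16_t slot_id) const {
    // RAII reader guard: calls lock_shared() now, unlock_shared() on scope exit,
    // even if an exception is thrown in between.
    std::shared_lock lk(mu_);
    return slots_.count(slot_id) > 0;
  }

  void Add(std::uint16_t slot_id) {
    std::unique_lock lk(mu_);  // exclusive (writer) guard
    slots_.insert(slot_id);
  }

 private:
  mutable std::shared_mutex mu_;
  std::unordered_set<std::uint16_t> slots_;
};
```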
src/server/cluster/cluster_data.h
Outdated
class ClusterData {
 public:
  ClusterData();
  static uint16_t keyHashSlot(std::string_view key);
Maybe call it `KeySlot()` instead, as hashing is really an implementation detail?
src/server/cluster_data_test.cc
Outdated
ClusterDataTest() {
}
We can remove this constructor if it's empty
src/server/main_service.cc
Outdated
string_view key = ArgS(args, i);
if (!server_family_.cluster_data()->IsKeyInMySlot(key)) {
  VLOG(1) << "Key " << key << " is not owned by server";
  (*dfly_cntx)->SendError("MOVED");  // TODO add more info to moved error.
Let's think of the case in which a command was issued on multiple keys, and some are owned by this instance, but others are not. In this case, I don't think that the proper response is `MOVED`, because there's no way to issue that command no matter which instance receives it.
Actually, any command which runs on multiple instances can't be issued, and we also can't return `MOVED` for it (we won't be able to redirect the response to any specific instance).
As such, perhaps we should change `IsKeyInMySlot()` to be `GetSlotForKey(key)`, which returns a slot id.
Then we can determine the response:
- If 1 instance, and it is the current instance: handle it as usual
- If 1 instance but it's not the current one: return `MOVED`
- If it's >1: return an error
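That decision flow could be sketched as follows. The helpers `GetSlotForKey` and `IsMySlot` are hypothetical stand-ins (the real slot function would hash the key's tag with crc16 and mask with 0x3FFF):

```cpp
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

enum class KeyCheckResult { kOk, kMoved, kCrossSlotError };

// Stand-in: the real implementation would be crc16(KeyTag(key)) & 0x3FFF.
std::uint16_t GetSlotForKey(std::string_view key) {
  return static_cast<std::uint16_t>(key.size() % 16384);
}

// Stand-in: pretend this instance owns the even slots.
bool IsMySlot(std::uint16_t slot) {
  return slot % 2 == 0;
}

// Single owned slot -> handle as usual; single foreign slot -> MOVED;
// keys spanning several slots -> cross-slot error.
KeyCheckResult CheckKeys(const std::vector<std::string_view>& keys) {
  std::optional<std::uint16_t> slot;
  for (std::string_view key : keys) {
    std::uint16_t s = GetSlotForKey(key);
    if (slot && *slot != s)
      return KeyCheckResult::kCrossSlotError;
    slot = s;
  }
  if (!slot || IsMySlot(*slot))
    return KeyCheckResult::kOk;  // no keys, or the slot is owned here
  return KeyCheckResult::kMoved;
}
```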
Actually, I will block requests on different slots, as Redis does. So I will check first if keys belong to different slots, and then if this slot belongs to me.
I believe that's what I proposed, just ordered differently :)
@@ -118,6 +118,8 @@ class Service : public facade::ServiceInterface {
  // Return false if command is invalid and reply with error.
  bool VerifyCommand(const CommandId* cid, CmdArgList args, ConnectionContext* cntx);

  bool CheckKeysOwnership(const CommandId* cid, CmdArgList args, ConnectionContext* dfly_cntx);
Please add some documentation. Please include that this function returns `true` if all keys are owned by this instance, `false` otherwise. If `false` is returned, an error has already been sent to the client.
src/server/cluster/cluster_data.h
Outdated
namespace dfly {

class ClusterData {
This isn't really data, right? As in, this class is not responsible for any data nor data handling. Maybe we can call it `ClusterSlots`? Or `ClusterKeying`?
I think this class will be used to hold all the cluster configuration; right now it's only slots data.
So how about `ClusterConfiguration`? I personally think that "data" is too generic, but obviously your call.
Ok, will change.
src/server/cluster/cluster_data.cc
Outdated
// TODO update logic according to config
// currently add all slots to owned slots
std::lock_guard lk{slots_mu_};
for (uint16_t slot_id = 0; slot_id <= kMaxSlotNum; ++slot_id) {
It's not ideal to consume so much RAM for so little information. If there are 3 nodes in the cluster, each will have a map with >5000 entries. Even if we moved to `absl::flat_hash_set` the overhead would still be quite big.
However, since we will need to store all slots (to be able to properly redirect), we'll end up storing 16k entries, each holding multiple strings (IDs and roles).
We could use an interval map here (like Boost's), but perhaps it's not that bad to consume <1MB for cluster related metadata.
We could also do a `flat_hash_map<uint16_t, std::shared_ptr<Instance>>` to have a single instance data allocated.
Just an idea, do whatever you want with it.
I am a little bit confused. First you say that using `absl::flat_hash_set` will have overhead, but then you suggest using `flat_hash_map<uint16_t, std::shared_ptr>`?
Currently I don't have an instance data structure. We will need it for the host in the MOVED error, but let's add it once the config is done. Ok?
Sorry :)
I tried to say that any "regular" map will have the overhead of not "compressing" ranges to a single entry (like saying slots `[0, 100)` belong to instance X).
I realize (and Roman says so as well) that using an interval map is not worth it, so maybe instead of copying the value (in this case the hosts, IPs and roles of the instances) 16k times, we'll hold a single `shared_ptr` for each instance and point to it from the map.
Then again, I guess it's not that bad and Roman thinks we can do with a `// TODO`, so your call :)
I think an interval tree is overkill here just because the data is bounded, as you mentioned, and has less than 0.1% overhead. A shared_ptr or a TODO could both fit here at this point.
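The shared-pointer idea could look roughly like this. `Instance` is a hypothetical record (the real config format was not settled in this PR); the point is that 16k slot entries share one allocation per node instead of copying strings 16k times:

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical per-node record (id, host, port); roles etc. omitted.
struct Instance {
  std::string id;
  std::string host;
  std::uint16_t port;
};

constexpr std::size_t kSlotCount = 16384;

// Dense slot table: one pointer per slot, one Instance object per node.
using SlotTable = std::vector<std::shared_ptr<Instance>>;

void AssignRange(SlotTable& table, std::size_t begin, std::size_t end,
                 const std::shared_ptr<Instance>& node) {
  for (std::size_t s = begin; s < end; ++s)
    table[s] = node;  // copies the pointer, not the strings inside Instance
}
```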
src/server/cluster/cluster_config.h
Outdated
// If the key contains the {...} pattern, return only the part between { and }
std::string_view KeyTag(std::string_view key);

extern bool cluster_enabled;
Have you decided against exporting these from within `ClusterConfig`? (Ideally `cluster_enabled` will be wrapped in a static method.)
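The suggested wrapping could be sketched like this (method names are illustrative, not the PR's final API):

```cpp
// Wraps the former `extern bool cluster_enabled;` global in the class,
// so callers go through a static accessor instead of touching a free variable.
class ClusterConfig {
 public:
  static void EnableCluster() { cluster_enabled_ = true; }
  static bool IsClusterEnabled() { return cluster_enabled_; }

 private:
  static bool cluster_enabled_;
};

// Definition lives in the .cc file.
bool ClusterConfig::cluster_enabled_ = false;
```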
src/server/cluster/cluster_data.cc
Outdated
// currently add all slots to owned slots
std::lock_guard lk{slots_mu_};
for (uint16_t slot_id = 0; slot_id <= kMaxSlotNum; ++slot_id) {
  owned_slots_.emplace(slot_id);
Re: safety, see a few examples here: https://stackoverflow.com/questions/10890653/why-would-i-ever-use-push-back-instead-of-emplace-back

"The actual difference between these two statements is that the more powerful emplace_back will call any type of constructor out there, whereas the more cautious push_back will call only constructors that are implicit."

With a good example:

```cpp
std::vector<std::unique_ptr<T>> v;
T a;
v.emplace_back(std::addressof(a));  // compiles
v.push_back(std::addressof(a));     // fails to compile
```
src/server/main_service.cc
Outdated
const auto& key_index = *key_index_res;
SlotId keys_slot;
bool crossslot = false;
nit: `cross_slot`?
src/server/cluster/cluster_config.cc
Outdated
size_t count = owned_slots_.count(id);
return count > 0;
Maybe `return owned_slots.contains(id);`?
Signed-off-by: adi_holden <adi@dragonflydb.io>
Force-pushed from 7f93d36 to 23f27c3
}

const auto& key_index = *key_index_res;
SlotId keys_slot;
I got this warning on compilation and was about to ignore it.

```
../src/server/main_service.cc: In member function ‘bool dfly::Service::CheckKeysOwnership(const dfly::CommandId*, facade::CmdArgList, dfly::ConnectionContext*)’:
../src/server/main_service.cc:625:49: warning: ‘keys_slot’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  625 |   if (!server_family_.cluster_config()->IsMySlot(keys_slot)) {
```

But it seems that `keys_slot` can actually be uninitialized, because even if `cid->first_key_pos() > 0`, the command itself can have 0 keys if it uses variadic keys. For example: `EVAL {script} 0`.
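One way to make the "no slot yet" state explicit and handle zero-key commands like `EVAL {script} 0` is `std::optional` (a sketch under that assumption, not the PR's actual fix; `GetSlotForKey` is a stand-in helper):

```cpp
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Stand-in: the real helper hashes the key's tag with crc16 & 0x3FFF.
std::uint16_t GetSlotForKey(std::string_view key) {
  return static_cast<std::uint16_t>(key.size() % 16384);
}

// Returns the slot the keys map to, or std::nullopt when the command carries
// no keys at all (e.g. `EVAL {script} 0`). Cross-slot checks omitted for brevity.
std::optional<std::uint16_t> SlotOfKeys(const std::vector<std::string_view>& keys) {
  std::optional<std::uint16_t> keys_slot;  // explicitly "unset" instead of uninitialized
  for (std::string_view key : keys)
    keys_slot = GetSlotForKey(key);
  return keys_slot;
}
```

The ownership check then only runs when `keys_slot` has a value, which also silences `-Wmaybe-uninitialized`.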
thanks @dranikpg
* feat(cluster): check command keys ownership on cluster mode Signed-off-by: adi_holden <adi@dragonflydb.io>
If cluster mode is enabled