feat(server): SetConfig() for ClusterConfig. #1202
Conversation
src/server/cluster/cluster_config.cc
Outdated
for (SlotId slot_id = 0; slot_id <= kMaxSlotNum; ++slot_id) {
  owned_slots_.emplace(slot_id);
namespace {
optional<ClusterConfig::Node> GetMasterNodeForSlot(const vector<ClusterConfig::Node>& nodes,
maybe rename to GetMasterNode?
src/server/cluster/cluster_config.cc
Outdated
}
}

CHECK(false) << "No master present in nodes";
We don't want to crash the server if there is a problem with the config data. Were you thinking of adding validation for this at a higher level later?

I'll remove the CHECK() here, and will also add enforcement in SetConfig().

I think it's OK to have the CHECK here as internal system behavior. But SetConfig should have a return value, so that if validation fails we reply to the set-config command with an error.
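The direction agreed on here (return a status from SetConfig() and let the command layer turn a failure into an error reply) could be sketched roughly like this. The names and shapes below are illustrative, not the PR's actual code:

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct SlotRange {
  uint16_t start;
  uint16_t end;
};

// Returns false on invalid input instead of CHECK-crashing the server.
bool SetConfig(const std::vector<SlotRange>& ranges) {
  for (const auto& r : ranges) {
    if (r.start > r.end || r.end > 0x3FFF) return false;  // reject bad input
  }
  // ... install the new configuration under the config mutex ...
  return true;
}

// The command layer maps a failed validation to a client error reply.
std::string ClusterSetConfig(const std::vector<SlotRange>& ranges) {
  return SetConfig(ranges) ? "+OK" : "-ERR Invalid cluster configuration";
}
```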
src/server/cluster/cluster_config.cc
Outdated
std::shared_lock sl(slots_mu_);
return owned_slots_.contains(id);
bool ClusterConfig::IsMySlot(SlotId id) const {
  return GetNodeForSlot(id) == nullopt;
We will need to validate that all slots are allocated in the given config, otherwise this logic will fail.

Currently (pre-this PR), when the process starts it assumes that it owns all slots. This logic maintains that, but I actually think it should be the other way around, i.e. the process should start with 0 slots and take on new slots as needed.
This is worth discussing actually, but anyway, as long as config requests are full, we are good. We can enforce that in SetConfig(), like you proposed above.

Yes, you're right: when the server starts it will start with 0 slots; this is also what I wrote in the design doc. I just created this logic in my PR so that I could check my logic in all unit tests with cluster_mode yes as default. But we can fix this now.
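The "config requests are full" enforcement discussed above could look like the following sketch: a config is valid only if every slot in [0, kMaxSlotNum] is claimed by exactly one shard. Names and data shapes are illustrative:

```cpp
#include <array>
#include <cstdint>
#include <vector>

constexpr uint16_t kMaxSlotNum = 0x3FFF;  // 16383, as in cluster_config.h

struct SlotRange {
  uint16_t start;
  uint16_t end;
};

// Valid only if every slot is covered exactly once across all shards.
bool AllSlotsCovered(const std::vector<std::vector<SlotRange>>& shard_ranges) {
  std::array<int, kMaxSlotNum + 1> hits{};  // zero-initialized counters
  for (const auto& ranges : shard_ranges) {
    for (const auto& r : ranges) {
      for (uint32_t s = r.start; s <= r.end && s <= kMaxSlotNum; ++s) ++hits[s];
    }
  }
  for (int h : hits) {
    if (h != 1) return false;  // a slot is missing or claimed twice
  }
  return true;
}
```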
src/server/cluster/cluster_config.cc
Outdated
}

optional<ClusterConfig::Node> ClusterConfig::GetNodeForSlot(SlotId id) const {
  CHECK(id < config_.size());
CHECK_LT
src/server/cluster/cluster_config.h
Outdated
// This array covers the whole range of possible slots. We keep nullopt for the current instance
// ("me"), as we do not need any additional information for it (such as ip and port).
std::array<std::optional<Node>, kMaxSlotNum + 1> config_ ABSL_GUARDED_BY(slots_mu_) = {};
I would use a different name than config_, as this is not the config data we need to save. We will need to save all of the config data in SetConfig to be able to reply to client commands, e.g. https://redis.io/commands/cluster-shards/

That's a good point!
src/server/cluster/cluster_config.h
Outdated
private:
void AddSlots();
// Returns nodes that own `id`. Empty if owned by me.
std::optional<Node> GetNodeForSlot(SlotId id) const;
You created this function for the MOVED error, right? You will only need to return the ip and port in this case.
If you save all the config data in SetConfig, as you will need it later (see my comment on the config member), you can create an array of pairs of string_view ip and port.

A string_view won't work here I'm afraid, as the config may be changed after we return from this method.
And also - we'll need to return the ID and role for CLUSTER SHARDS, so there's no redundant information in returning Node.
The return type of this function can be pair<string, string>, while the slots array data can be pair<string_view, string_view>.
I believe you will need to save the entire config data not as a vector of slot nodes, but as given in the SetConfig command, i.e. just a vector of shards data. Therefore I suggest having the slots array as pair<string_view, string_view>, which will point into the vector of shards data.

I was thinking about keeping only the config_ array (to be renamed to slots_) instead of saving redundant data. Then, upon CLUSTER SHARDS or CLUSTER SLOTS, we could build whatever we need (and it's not like the performance of these commands matters). I conceive this to be less error prone (fewer points of failure / mismatches, smaller state to maintain).
But you save only the master node in the slots_ array.

Yes, I'm changing that following your feedback above :)
@adiholden I slightly changed the API following your feedback and some ideas that came to mind after that.
@chakaz did you push the changes to the branch?
Oops, I guess I didn't :)
for (const auto& shard : new_config) {
  for (const auto& slot_range : shard.slot_ranges) {
    if (slot_range.start > slot_range.end) {
      return false;
Please add log prints so it will be easier to understand if we have a problem with the config.
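A sketch of the logging suggestion, where each rejection path explains why it failed. `std::cerr` stands in for the server's real logging macros, and the messages are illustrative:

```cpp
#include <cstdint>
#include <iostream>

struct SlotRange {
  uint16_t start;
  uint16_t end;
};

// Log a human-readable reason before rejecting an invalid range.
bool ValidateRange(const SlotRange& r) {
  if (r.start > r.end) {
    std::cerr << "Invalid slot range [" << r.start << ", " << r.end << "]\n";
    return false;
  }
  if (r.end > 0x3FFF) {
    std::cerr << "Slot " << r.end << " is beyond the maximum slot 16383\n";
    return false;
  }
  return true;
}
```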
Oh, that's a great idea!
src/server/cluster/cluster_config.h
Outdated
private:
void AddSlots();
struct SlotOwner {
In this implementation we are using ~16384 times the space to save the data, instead of saving it only once in the config format, and we will also need logic to convert from the slot array back to the config format to be able to reply to the CLUSTER SHARDS command.
If you create a member:
std::vector config_;
which you set on the set-config command, it can be used to reply to CLUSTER SHARDS.
The SlotOwner struct can be:
{
  string_view ip;
  string_view port;
  bool owned_by_me;
}
In this implementation, if we have one master and one replica, the data will be saved in the config_ member and the slot array will point to the config data.
In the current implementation we are saving the master and the replica data 16384 times.
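The layout proposed above could be sketched like this, with illustrative names: full node data lives once in `config_`, and the 16384-entry slot array stores only string_views into it plus an ownership bit, avoiding per-slot duplication:

```cpp
#include <array>
#include <string>
#include <string_view>
#include <vector>

struct NodeInfo {
  std::string ip;
  std::string port;
};

struct SlotOwner {
  std::string_view ip;  // views into a NodeInfo held by config_
  std::string_view port;
  bool owned_by_me = false;
};

struct ClusterState {
  // Caveat raised in this discussion: the views dangle if config_ is
  // reallocated or replaced, so it must stay untouched between SetConfig()s.
  std::vector<NodeInfo> config_;
  std::array<SlotOwner, 16384> slots_{};
};
```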
I guess that it's a matter of what we're optimizing for. The current optimization is aimed at fast "get owners for slot" queries, such as MOVED replies.
If we're looking to save memory, we could consider using shared_ptr as the value of the internal map. It has the downside of using atomics, but I guess it's not terrible since it's only going to take effect during SetConfig() (I think).
But it's true that the current implementation requires more logic to build the reply of CLUSTER SHARDS and CLUSTER SLOTS. It's not much more logic, essentially building a dictionary by iterating all entries (that'll actually be simpler to implement if we moved to shared_ptr, as we'll be able to compare Nodes). But still, we could avoid compressing ranges.
Do you suggest having 2 members (slots_ and config_, where slots_ will have 16k entries and will point into config_)? Because that will keep the 16k-times data duplication, but without it I don't think we can guarantee O(1) computational time for MOVED replies.
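The shared_ptr idea floated above could look roughly like this (illustrative names): every slot maps to a `shared_ptr<Node>`, so a MOVED reply is a single O(1) array lookup while nodes that own many slots share one heap allocation, and refcount traffic only happens when SetConfig() swaps the table:

```cpp
#include <array>
#include <cstdint>
#include <memory>
#include <string>

struct Node {
  std::string id;
  std::string ip;
  uint16_t port;
};

using SlotTable = std::array<std::shared_ptr<Node>, 16384>;

// nullptr means the slot is owned by this instance ("me").
std::shared_ptr<Node> GetNodeForSlot(const SlotTable& slots, uint16_t id) {
  return id < slots.size() ? slots[id] : nullptr;
}
```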
Yes, we should optimize for "get owners for slot" as it is in the flow of client database queries.
But why doesn't my suggestion guarantee O(1) computational time?
As discussed offline, I kept 2 members and updated the code according to your suggestion. PTAL.
src/server/cluster/cluster_config.h
Outdated
// Returns nodes that own `id`. Result will always have the first element set to be the master,
// and the following 0 or more elements are the replicas.
std::vector<Node> GetNodesForSlot(SlotId id) const;
What is the flow we will need this API for?
CLUSTER SHARDS, for example.
Removed per the new API
src/server/cluster_config_test.cc
Outdated
}

TEST_F(ClusterConfigTest, ConfigSetInvalidEmpty) {
  // Test that empty config means all slots are owned locally.
Update the comment
TEST_F(ClusterConfigTest, ConfigEmpty) {
  // Test that empty-initialization causes none of the slots to be owned locally.
  for (SlotId i : {0, 1, 10, 100, 1'000, 10'000, 16'000, 0x3FFF}) {
    EXPECT_FALSE(config_.IsMySlot(i));
Please add a test where you check that IsMySlot is true for some slots and false for others.
That's a good idea!
I changed ClusterConfigTest.ConfigSetMultipleInstances to have that.
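The shape of the requested test, sketched against a toy stand-in (not the real ClusterConfig): mark a sub-range as owned, then expect IsMySlot to be true inside it and false outside it:

```cpp
#include <cstdint>
#include <set>

// Toy ownership model: a set of owned slot ids, queried like IsMySlot.
struct ToyConfig {
  std::set<uint16_t> owned;

  void OwnRange(uint16_t start, uint16_t end) {
    for (uint32_t s = start; s <= end; ++s) owned.insert(static_cast<uint16_t>(s));
  }

  bool IsMySlot(uint16_t id) const { return owned.count(id) > 0; }
};
```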