
feat(server): SetConfig() for ClusterConfig. #1202

Merged
chakaz merged 7 commits into main from cluster-config on May 14, 2023

Conversation

chakaz (Collaborator) commented May 10, 2023

No description provided.

@chakaz chakaz requested a review from adiholden May 10, 2023 19:29
  for (SlotId slot_id = 0; slot_id <= kMaxSlotNum; ++slot_id) {
    owned_slots_.emplace(slot_id);

namespace {
optional<ClusterConfig::Node> GetMasterNodeForSlot(const vector<ClusterConfig::Node>& nodes,
adiholden (Collaborator)

maybe rename to GetMasterNode?

}
}

CHECK(false) << "No master present in nodes";
adiholden (Collaborator)

We don't want to crash the server if there is a problem with the config data. Were you thinking of adding validation for this at a higher level later?

chakaz (Collaborator Author)

I'll remove the CHECK() here, and will also add enforcement on SetConfig()

adiholden (Collaborator)

I think it's OK to have the CHECK here as an internal system invariant. But SetConfig should have a return value, so that if validation fails we can reply to the set-config command with an error.
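
A minimal sketch of that shape, assuming hypothetical names (ClusterShard, IsConfigValid); the merged signature may differ:

// Hypothetical sketch: validate first and report failure to the caller
// instead of crashing, so the command handler can reply with an error.
bool ClusterConfig::SetConfig(const std::vector<ClusterShard>& new_config) {
  if (!IsConfigValid(new_config)) {
    return false;  // Caller replies to the client with an error.
  }
  std::lock_guard gl(slots_mu_);
  // ... apply new_config to the internal state ...
  return true;
}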

// Before:
//   std::shared_lock sl(slots_mu_);
//   return owned_slots_.contains(id);
bool ClusterConfig::IsMySlot(SlotId id) const {
  return GetNodeForSlot(id) == nullopt;
adiholden (Collaborator)

We will need to validate that all slots are allocated in the given config, otherwise this logic will fail

chakaz (Collaborator Author)

Currently (pre-this PR), when the process starts it assumes that it owns all slots. This logic maintains that, but I actually think it should be the other way around, i.e. that the process should start with 0 slots and take on new slots as needed.
This is worth discussing actually, but anyway, as long as config requests are full, we are good. We can enforce that in SetConfig(), like you proposed above.

adiholden (Collaborator)

Yes, you are right: when the server starts it will start with 0 slots; this is also what I wrote in the design doc. I only created this logic in my PR so that I could check my logic in all unit tests with cluster_mode=yes as the default. But we can fix this now.

}

optional<ClusterConfig::Node> ClusterConfig::GetNodeForSlot(SlotId id) const {
  CHECK(id < config_.size());
adiholden (Collaborator)

CHECK_LT
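
For reference, CHECK_LT(a, b) is the glog/absl comparison form of CHECK(a < b); it logs both operand values on failure:

CHECK_LT(id, config_.size());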


// This array covers the whole range of possible slots. We keep nullopt for the current instance
// ("me"), as we do not need any additional information for it (such as ip and port).
std::array<std::optional<Node>, kMaxSlotNum + 1> config_ ABSL_GUARDED_BY(slots_mu_) = {};
adiholden (Collaborator)

I would use a different name than config_, as this is not the config data we need to save. We will need to save all the config data in SetConfig to be able to reply to client commands, e.g. https://redis.io/commands/cluster-shards/

chakaz (Collaborator Author)

That's a good point!

private:
void AddSlots();
// Returns the node that owns `id`. Empty if owned by me.
std::optional<Node> GetNodeForSlot(SlotId id) const;
adiholden (Collaborator)

You created this function for the MOVED error, right? You will only need to return the ip and port in that case.
If you save all the config data in SetConfig, as you will need it later (see my comment on the config member), you can create an array of pairs of string_view ip and port.

chakaz (Collaborator Author)

A string_view won't work here, I'm afraid, as the config may be changed after we return from this method.
Also, we'll need to return the ID and role for CLUSTER SHARDS, so there's no redundant information in returning Node.

adiholden (Collaborator)

The return type of this function can be pair<string, string>, while the slots array data can be pair<string_view, string_view>.
I believe you will need to save the entire config data not as a vector of slot nodes, but as given in the SetConfig command, i.e. a vector of shards data. Therefore I suggest having the slots array as pair<string_view, string_view>, pointing into the vector of shards data.

chakaz (Collaborator Author)

I was thinking about keeping only the config_ array (to be renamed to slots_) instead of saving redundant data. Then, upon CLUSTER SHARDS or CLUSTER SLOTS, we could build whatever we need (and it's not like the performance of these commands matters). I consider this less error prone (fewer points of failure / mismatches, smaller state to maintain).

adiholden (Collaborator)

But in the slots_ array you only save the master node.

chakaz (Collaborator Author)

Yes, I'm changing that following your feedback above :)

chakaz (Collaborator Author) commented May 11, 2023

@adiholden I slightly changed the API following your feedback and some ideas that came into my mind after that.
Specifically, I added an explicit master member to the nodes, and the vector now only represents replicas. This leaves out many error paths (no master, multiple masters) that are now unrepresentable.
It also removes the Role enum, as it's not needed anymore.
PTAL
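
A rough sketch of that reshaped API (field names here are assumptions, not the exact merged code):

struct Node {
  std::string id;
  std::string ip;
  uint16_t port = 0;
};

struct ClusterShard {
  Node master;                 // Exactly one master: "no master" and
                               // "multiple masters" cannot be expressed.
  std::vector<Node> replicas;  // Zero or more replicas; no Role enum needed.
  std::vector<SlotRange> slot_ranges;
};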

adiholden (Collaborator)

@chakaz did you push the changes to the branch?

chakaz (Collaborator Author) commented May 14, 2023

Oops, I guess I didn't :)
PTAL, sorry!

for (const auto& shard : new_config) {
  for (const auto& slot_range : shard.slot_ranges) {
    if (slot_range.start > slot_range.end) {
      return false;
adiholden (Collaborator)

Please add log prints so it will be easier to understand if we have a problem with the config

chakaz (Collaborator Author)

Oh, that's a great idea!
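
For illustration, a hedged sketch of that validation with log prints (the use of glog's LOG(WARNING) is an assumption):

for (const auto& shard : new_config) {
  for (const auto& slot_range : shard.slot_ranges) {
    if (slot_range.start > slot_range.end) {
      LOG(WARNING) << "Invalid config: slot range [" << slot_range.start
                   << ", " << slot_range.end << "] has start > end";
      return false;
    }
  }
}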


private:
void AddSlots();
struct SlotOwner {
adiholden (Collaborator)

In this implementation we are using 16384x the space to save the data, instead of saving it only once in the config format, and we will need logic to convert from the slot array back to the config format to be able to reply to the CLUSTER SHARDS command.

If you create a member:
std::vector<...> config_;
which you set in the SetConfig command, it can be used to reply to CLUSTER SHARDS.
The SlotOwner struct can then be:
struct SlotOwner {
  string_view ip;
  string_view port;
  bool owned_by_me;
};

In this implementation, if we have one master and one replica, the data will be saved once in the config_ member and the slot array will point to the config data. In the current implementation we are saving the master and the replica data 16384 times.

chakaz (Collaborator Author)

I guess that it's a matter of what we're optimizing for. The current optimization is aimed for fast "get owners for slot" questions, such as MOVED replies.
If we're looking to save memory, we could consider using shared_ptr as the value of the internal map. It has the downside of using atomics, but I guess it's not terrible since it's only going to take effect during SetConfig() (I think).
But it's true that the current implementation requires more logic to build the reply of CLUSTER SHARDS and CLUSTER SLOTS. Not much more logic, essentially building a dictionary by iterating all entries (that'll actually be simpler to implement if we moved to shared_ptr as we'll be able to compare Nodes). But still, we could avoid compressing ranges.
Do you suggest having 2 members (slots_ and config_, where slots_ will have 16k entries and will point into config_)? Because that will keep the 16k times data duplication, but without it I don't think we can guarantee O(1) computational time for MOVED replies.

adiholden (Collaborator)

Yes, we should optimize for "get owners for slot" as it is in the flow of client database queries.
But why wouldn't my suggestion guarantee O(1) computational time?

chakaz (Collaborator Author)

As discussed offline, I kept 2 members and updated the code according to your suggestion. PTAL.
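
Roughly, the two-member layout discussed here (names and types are illustrative, not the merged code):

// Stores each shard's data exactly once; used to answer CLUSTER SHARDS.
std::vector<ClusterShard> config_ ABSL_GUARDED_BY(slots_mu_);

// One entry per slot for O(1) "who owns this slot" lookups (e.g. MOVED
// replies); entries point back into config_ instead of duplicating node data.
struct SlotEntry {
  const ClusterShard* shard = nullptr;
  bool owned_by_me = false;
};
std::array<SlotEntry, kMaxSlotNum + 1> slots_ ABSL_GUARDED_BY(slots_mu_) = {};

Both members would be rebuilt together under slots_mu_ in SetConfig(), so the pointers stay valid between config updates.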


// Returns nodes that own `id`. Result will always have the first element set to be the master,
// and the following 0 or more elements are the replicas.
std::vector<Node> GetNodesForSlot(SlotId id) const;
adiholden (Collaborator)

What is the flow we will need this API for?

chakaz (Collaborator Author)

CLUSTER SHARDS for example

chakaz (Collaborator Author)

Removed per the new API

}

TEST_F(ClusterConfigTest, ConfigSetInvalidEmpty) {
  // Test that empty config means all slots are owned locally.
adiholden (Collaborator)

Update the comment

TEST_F(ClusterConfigTest, ConfigEmpty) {
  // Test that empty-initialization causes none of the slots to be owned locally.
  for (SlotId i : {0, 1, 10, 100, 1'000, 10'000, 16'000, 0x3FFF}) {
    EXPECT_FALSE(config_.IsMySlot(i));
adiholden (Collaborator)

Please add a test that checks IsMySlot returns true for some slots and false for others.

chakaz (Collaborator Author)

That's a good idea!
I changed ClusterConfigTest.ConfigSetMultipleInstances to have that.
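
A sketch of what such mixed coverage could look like (the slot boundaries are made up, not the actual test data):

TEST_F(ClusterConfigTest, ConfigSetMultipleInstances) {
  // ... SetConfig() with a config that assigns slots [0, 10'000) to this
  // node and [10'000, kMaxSlotNum] to a remote one ...
  EXPECT_TRUE(config_.IsMySlot(0));
  EXPECT_TRUE(config_.IsMySlot(9'999));
  EXPECT_FALSE(config_.IsMySlot(10'000));
  EXPECT_FALSE(config_.IsMySlot(kMaxSlotNum));
}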

adiholden previously approved these changes May 14, 2023
@chakaz chakaz merged commit d9007ee into main May 14, 2023
2 of 6 checks passed
@chakaz chakaz deleted the cluster-config branch May 14, 2023 12:54
romange pushed a commit that referenced this pull request Jun 1, 2023
feat(server): SetConfig() for ClusterConfig.