Doc: add example raft-kv #156
Conversation
@ppamorim
I am super happy with this implementation! Finally I will be able to implement my project! Thank you VERY much!!!

I also pointed out some items in the review; please feel free to answer them any time. I will probably point out more things later (at the moment I don't have anything else, but I have not tested the code yet because I am currently working on other projects at the same time; I will verify it later today).
> ## Cluster management
>
> The raft itself does not store node addresses.
> But in a real-world application, the implementation of `RaftNetwork` needs to know the addresses.
Question: I saw that rite_raft works this way:

- Start the leader, providing the server address and the raft address (couldn't they be the same here?).
- When adding a node, provide the server address, the leader address, and the raft address.

By doing that, the node will connect to the leader and append itself as an active instance. Could we do that in the demo? I am aware that you did that with `rpc 21001/write '{"AddNode":{"id":1,"addr":"127.0.0.1:21001"}}'`, but is that better? Just a question.
@drmingdrmer I can open a PR in your fork that includes this option: if the invocation includes the flag `leader_addr`, it will automatically add the learner after the server starts.
Node addresses have to be stored and replicated, i.e., adding nodes has to be done through the raft protocol.
Otherwise, a new leader won't know the addresses of the other nodes.

Adding a node as a member in a single step can be done by just sending a `join(leader_address)` request to an uninitialized node and letting it do all three steps:

- Write a raft log entry to the leader to add its address:
  `rpc <leader>/write '{"AddNode":{"id":1,"addr":"127.0.0.1:21001"}}'`
- Inform the leader to replicate logs to it:
  `rpc <leader>/add-learner`
- Inform the leader to change to a new membership config that includes it:
  `rpc <leader>/change-members`
@drmingdrmer Hi again, I opened a PR in your fork (it's in draft at the moment; I will take a few hours to check again that everything is right) with some fixes and how I think it should be implemented. I tried to avoid any kind of abbreviation in the implementation, even if it looks verbose; I think it's easier to understand what is happening. Please feel free to check it.
```toml
async-trait = "0.1.36"
clap = { version = "3.0.13", features = ["derive", "env"] }
env_logger = "0.9.0"
openraft = { version = "0.6", path = "../openraft" }
```
@drmingdrmer I would pin the crate's source to prevent any conflict with future versions; we can keep the demo updated accordingly.
I do not get it. What do you want it to look like? 🤔
I mean replace that with `openraft = "0.6.4"` to force cargo to build against the stable release, but I noticed that you did not do that because there are changes to the core, which is understandable for the moment. I will close this item.
Great! Let me check it out!
example-raft-kv/src/rpc_handlers.rs
Outdated
```rust
        value.unwrap_or_default()
    };
    Ok(Json(res))
}
```
I think this could result in stale reads if the request reaches a node that is not the leader, right?

IMHO an endpoint with potential stale reads could be very useful in many applications, but it should be explicit that the read may be stale. Adding another endpoint based on `client_read` would be good.
Good point!
This is a draft application.
I'm gonna add a read endpoint in another PR:DDD
@MikaelCall @drmingdrmer Is this outdated? Will placing the code below before all state machine calls sort this issue?

```rust
let _ = app.raft.client_read().await;
```

I suspect it will wait until the node is confirmed to be in sync with the leader, no? It seems to be working, but I have no means to confirm that it is really preventing stale reads.
@ppamorim
You have to handle the error. If the node that receives the request is not the leader, `client_read` returns a forward-to-leader error. Not handling that error (e.g. discarding the result with `let _ =` instead of returning it with `?`) may still result in stale reads.
Search for `client_read` in this repository. The fn docs are good and there is a great test explaining the difference between calling `client_read` on leader vs follower nodes.
@MikaelCall Nice! I was confused and thought it was part of other code. In my project I removed this completely and it redirects the request to the leader, as expected. I will read more about it too.
@drmingdrmer I did a test where: After that the nodes started heartbeating as expected, calling Then I left the server working for a moment and killed the leader at port 8500. The node running at port 5801 called After that I created a new learner at port 8502 and tried to add it as a node to the new leader. But when I call
@drmingdrmer I could panic the application when calling The problem happened in this function:

```rust
let addr = {
    let state_machine = self.store.state_machine.read().await;
    state_machine.nodes.get(&target).unwrap().clone() // <------ `unwrap()` caused the application to crash
};
```
The cluster has only 2 nodes, thus a quorum (majority) is the entire cluster, i.e. 8500 and 8501.
Yes, it will panic if the node is not added. A real-world application should check for the presence of a target and, if it is missing, return an error.
@drmingdrmer Understood, so the minimum number of nodes is 3. Is there any way to recover from this state by adding 2 learners?
No way. :( A learner does not vote and does not count as a member of a quorum.
@drmingdrmer Talking about that: I am trying to handle the state machine error, so I created a new enum called I had to do that because
@drmingdrmer Ah, after reading a bit more and being less stupid, I understood that I can use
If a target is not found, it probably is an administrative mistake: at least it's not the state-machine's fault, IMHO.

```rust
impl NetworkError {
    pub fn new<E: Error + 'static>(e: &E) -> Self {
        Self {
            source: AnyError::new(e),
        }
    }
}
```
Hmm.. an
@drmingdrmer Could implement it by doing:

```rust
pub struct NodeNotFound {
    pub id: NodeId,
}

impl NodeNotFound {
    fn new(id: NodeId) -> Self {
        Self { id }
    }
}
```

Then later using:

```rust
let addr = state_machine.nodes
    .get(&target)
    .ok_or(RPCError::Network(NetworkError::new(&NodeNotFound::new(target))))?
    .clone();
```

Just curious why the
Right:DDD Because
Question for further discussion: I noticed that when I try to write data to the state machine using the learners (already members), I get this error:

I am aware that this is expected, and I understand that with this implementation only the leader accepts writes. Would it be possible to make the learners dispatch the write to the leader in that case? I am asking based on the information
Yes, it's possible to embed a
Add an example of distributed kv store implementation: `./example-raft-kv`. Includes:

- An in-memory `RaftStorage` implementation [store](./store).
- A server based on [actix-web](https://docs.rs/actix-web/4.0.0-rc.2). Includes:
  - raft-internal network APIs for replication and voting.
  - Admin APIs to add nodes, change membership, etc.
  - Application APIs to write a value by key or read a value by key.
- Client and `RaftNetwork` ([network](./network)) are built upon [reqwest](https://docs.rs/reqwest).
@drmingdrmer I tried to create a batch interaction with the state machine with the implementation below: https://gist.github.com/ppamorim/858bd2b48f779beecb3941d653cab0d8#file-mod-rs-L30-L45

I did that to avoid calling the endpoint for each entry, so I can insert multiple entries into the state machine's data at once. But it's causing the error:

Is it some sort of timeout for the heartbeats? What do I need to do to prevent that? The data has around 25k entries.

Edit: I tested with a small batch (20 items) and it's working. Will I need to add some sort of heartbeat verification for the batch write?
Normally a timeout does not cause a Fatal error. My guess is that there is a panic caused by a too-large RPC payload.
@drmingdrmer I will add you to the project; after we sort this issue I can replicate it in the example. Edit: You should have access to the project now.
Then what do I do to reproduce this problem 🤔?
@drmingdrmer You can create a massive JSON (more than 20k entries) with this format:

```json
[
  {
    "email": "test@google.com",
    "region": "usa"
  },
  {
    "email": "spanish@google.com",
    "region": "spain"
  },
  ...
]
```

You need to call If you run Mind that the issue does not happen if you have only the leader started; you can start 3 instances by running
@drmingdrmer I think the issue is around actix-web limiting the payload size; this could be a problem even for production-level synchronization. Edit: It's not :(
Got it. Let me check it out. Is this PR good enough to merge?