Hello MicroRaft Community,
Following our previous discussion (Discussion #44) on using MicroRaft for data sharding, I have a more specific question related to this topic. I am exploring ways to efficiently operate multiple Raft groups on a single physical node (i.e., a server, not a RaftNode). Given that the current implementation of …
Both methods seem to have their pros and cons, but I would like to hear your opinions on which approach would be preferable, considering the design philosophy and future direction of MicroRaft. Additionally, if there are other ways to efficiently operate multiple Raft groups on a single physical node, I would be very interested to learn about them. I look forward to hearing your thoughts and experience. Thank you.
There are a couple of points that need consideration. The list is not exhaustive, and there are very likely more points to consider with a deeper discussion / design exploration.
Independent of which option you pick, there should be some solution for managing the Raft group -> physical node mapping. Potential solutions are to run a Raft group whose replicated state machine maintains this mapping, or to rely on an external CP data store for it. Given that MicroRaft already solves the CP store problem, implementing a state machine for this on top of MicroRaft might be a reasonable direction, and it would also eliminate an external dependency. On the other hand, I think systems like ZooKeeper already provide solutions for this problem, so reusing an existing system might be faster to implement and validate. Once the Raft group -> physical node mapping is maintained somewhere, there should be a subscription mechanism that triggers physical nodes to start or stop Raft nodes when they are added to / removed from Raft groups, when new Raft groups are defined, when existing Raft groups are deleted, etc.
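A minimal sketch of the state machine side of this idea, in Java since MicroRaft is a Java library. All names here (`GroupPlacementStateMachine`, `PlacementListener`, the `apply*` methods) are hypothetical and are not MicroRaft APIs. The sketch assumes the `apply*` methods are only called for committed log entries, in log order, so every replica derives the same mapping and fires the same start/stop notifications.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical callback a physical node registers to learn when to start/stop a local Raft node. */
interface PlacementListener {
    void onAssigned(String groupId, String server);
    void onUnassigned(String groupId, String server);
}

/**
 * Hypothetical state machine for the "Raft group -> physical node" mapping.
 * Assumption: the apply* methods run only for committed entries, in log order.
 */
class GroupPlacementStateMachine {
    private final Map<String, Set<String>> membersByGroup = new HashMap<>();
    private final List<PlacementListener> listeners = new ArrayList<>();

    void subscribe(PlacementListener listener) {
        listeners.add(listener);
    }

    /** Committed "assign": notifies listeners only if the server was not already a member. */
    void applyAssign(String groupId, String server) {
        boolean added = membersByGroup.computeIfAbsent(groupId, g -> new HashSet<>()).add(server);
        if (added) {
            listeners.forEach(l -> l.onAssigned(groupId, server));
        }
    }

    /** Committed "unassign": drops the group entry once its last member is removed. */
    void applyUnassign(String groupId, String server) {
        Set<String> members = membersByGroup.get(groupId);
        if (members != null && members.remove(server)) {
            if (members.isEmpty()) {
                membersByGroup.remove(groupId);
            }
            listeners.forEach(l -> l.onUnassigned(groupId, server));
        }
    }

    /** Committed "delete group": tells every former member to stop its local Raft node. */
    void applyDeleteGroup(String groupId) {
        Set<String> members = membersByGroup.remove(groupId);
        if (members != null) {
            for (String server : members) {
                listeners.forEach(l -> l.onUnassigned(groupId, server));
            }
        }
    }

    Set<String> membersOf(String groupId) {
        return Set.copyOf(membersByGroup.getOrDefault(groupId, Set.of()));
    }
}
```

Each physical node would subscribe a listener that starts a local Raft node on `onAssigned` and stops it on `onUnassigned`, ignoring events whose `server` argument is not itself. With ZooKeeper instead, the equivalent notifications would come from watches on the mapping data.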
I think managing multiple Raft groups outside of the existing RaftNode abstraction gives more opportunities for isolation. You can run Raft groups on different threads if they are running in the same process, or you can completely isolate them by running them in different processes with resource limits. Running multiple Raft groups in a single Raft node can make it more prone to the noisy neighbour problem. For instance, if a Raft group commits and executes a computation-heavy task on its state machine that holds the thread for too long, it can trigger heartbeat timeouts and leader elections for the other Raft groups assigned to that thread.
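To illustrate the thread-level isolation, here is a sketch that gives every Raft group hosted in the process its own single-threaded executor. `PerGroupExecutors` is a hypothetical helper, not part of MicroRaft; it only shows that a long-running state machine task in one group delays that group's own tasks and nobody else's.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical runtime: one single-threaded executor per Raft group, so a slow
 * task in one group can only delay tasks (including heartbeats) of that group.
 */
class PerGroupExecutors implements AutoCloseable {
    private final Map<String, ExecutorService> executors = new ConcurrentHashMap<>();

    private ExecutorService executorFor(String groupId) {
        // Lazily create a dedicated, named daemon thread for each group.
        return executors.computeIfAbsent(groupId, id -> Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "raft-group-" + id);
            t.setDaemon(true);
            return t;
        }));
    }

    /** Runs a task (e.g. applying a committed entry or sending a heartbeat) on the group's thread. */
    Future<?> submit(String groupId, Runnable task) {
        return executorFor(groupId).submit(task);
    }

    @Override
    public void close() {
        executors.values().forEach(ExecutorService::shutdownNow);
    }
}
```

With a single shared thread instead, the heartbeat task of one group would wait behind the heavy task of another, which is exactly the noisy neighbour scenario above. The trade-off of this sketch is one thread per group, which is one reason to bound the number of groups per process or to move to separate processes.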
Once a physical node (i.e., server) is part of multiple Raft groups, it will send / receive heartbeats for every Raft group it is part of. For instance, if server A and server B are both members of N Raft groups and one of them is the leader of those groups, there will be N heartbeat messages and N responses between these two servers per heartbeat period. There may be a need to optimize this and send just 1 heartbeat message (i.e., an empty AppendRequest RPC) that covers all N Raft groups.
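The heartbeat coalescing idea could look roughly like the batcher below. `GroupHeartbeat` and `HeartbeatBatcher` are hypothetical types, not MicroRaft classes. The sketch assumes each per-group heartbeat only needs the group id, term and commit index, and that the transport sends each drained batch as a single message to its destination server.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-group heartbeat payload, i.e. what an empty AppendRequest would carry. */
record GroupHeartbeat(String groupId, int term, long commitIndex) {}

/** Hypothetical batcher: one outgoing message per destination server, covering all its groups. */
class HeartbeatBatcher {
    // destination server -> (groupId -> latest heartbeat queued for that group)
    private final Map<String, Map<String, GroupHeartbeat>> pending = new HashMap<>();

    void enqueue(String targetServer, GroupHeartbeat heartbeat) {
        // A newer heartbeat for the same group replaces the older queued one.
        pending.computeIfAbsent(targetServer, s -> new LinkedHashMap<>())
               .put(heartbeat.groupId(), heartbeat);
    }

    /** Returns one batch per destination server and clears the queue. */
    Map<String, List<GroupHeartbeat>> drain() {
        Map<String, List<GroupHeartbeat>> batches = new HashMap<>();
        pending.forEach((server, byGroup) -> batches.put(server, new ArrayList<>(byGroup.values())));
        pending.clear();
        return batches;
    }
}
```

On the receiving side, each entry of a batch would still have to be handed to the corresponding group's Raft node and answered (or the answers batched the same way), so the saving is in message count and per-message overhead, not in per-group processing.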
If we go with option #2, at the very basic level, the RaftNode interface needs new methods like … There are a few OSS Raft implementations that can be checked for this: …
Alternatively, in option #1, the very same complexity can be implemented outside of the RaftNode class. Its advantage is that RaftNode keeps its simplicity and continues to abstract away the Raft algorithm implementation. If you want to explore this further, I would suggest adding more details to the proposed solutions so that we can compare them more concretely. Hope my reply helps. Regards,
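For option #1, the extra complexity can live in a thin routing layer in front of otherwise unchanged single-group Raft nodes. The sketch below is hypothetical: `GroupMessage` and `RaftGroupRegistry` are invented names, and a plain `Consumer<Object>` stands in for whatever a local Raft node exposes for handling incoming messages.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical envelope: every message on a shared connection is tagged with its Raft group id. */
record GroupMessage(String groupId, Object payload) {}

/** Hypothetical per-process registry that routes messages to single-group Raft nodes. */
class RaftGroupRegistry {
    private final Map<String, Consumer<Object>> nodesByGroup = new ConcurrentHashMap<>();

    void register(String groupId, Consumer<Object> node) {
        if (nodesByGroup.putIfAbsent(groupId, node) != null) {
            throw new IllegalStateException("Raft group already hosted here: " + groupId);
        }
    }

    void deregister(String groupId) {
        nodesByGroup.remove(groupId);
    }

    /** Delivers the payload to the local node of its group; returns false if the group is not hosted here. */
    boolean route(GroupMessage message) {
        Consumer<Object> node = nodesByGroup.get(message.groupId());
        if (node == null) {
            return false;
        }
        node.accept(message.payload());
        return true;
    }
}
```

Since the registry only dispatches by group id, the Raft implementation behind each entry stays single-group, which is the simplicity argument for option #1.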
Hi @bootjp
Sorry for my delayed response. I am on a Christmas break, hence not checking notifications very frequently.
On how many servers will you run those 100 Raft groups? What is your group size? What will be the QPS of each Raft group for replicating actual mutation operations? Should we estimate these numbers first and see whether HBs will actually be an overhead or not?
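A back-of-the-envelope version of that estimate, as a sketch under explicit assumptions: each group has one leader that sends an empty AppendRequest to every follower once per heartbeat period, each follower replies once, and leaders and members are spread evenly over the servers. Apart from the 100 groups, the inputs in `main` (3 members per group, a 1 second period, 5 servers) are hypothetical placeholders.

```java
/**
 * Hypothetical back-of-the-envelope heartbeat load:
 * cluster-wide = groups * (groupSize - 1) * 2 / period, per server = 2 * cluster-wide / servers.
 */
class HeartbeatEstimate {
    /** Cluster-wide heartbeat messages per second (requests + responses). */
    static double clusterMessagesPerSecond(int groups, int groupSize, double periodSeconds) {
        int followersPerGroup = groupSize - 1;
        // One empty AppendRequest and one response per follower, per group, per period.
        return groups * followersPerGroup * 2 / periodSeconds;
    }

    /** Messages each server sends or receives per second, assuming an even spread. */
    static double perServerMessagesPerSecond(int groups, int groupSize, double periodSeconds, int servers) {
        // Every message has one sender and one receiver, hence the factor of 2.
        return 2 * clusterMessagesPerSecond(groups, groupSize, periodSeconds) / servers;
    }

    public static void main(String[] args) {
        // Hypothetical: 100 groups of 3 members, 1 s heartbeat period, 5 servers.
        System.out.println(clusterMessagesPerSecond(100, 3, 1.0));      // 400.0
        System.out.println(perServerMessagesPerSecond(100, 3, 1.0, 5)); // 160.0
    }
}
```

Whether a figure like this matters depends on the payload size and on the mutation QPS asked about above, which is why it is worth plugging in the real numbers first.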
If you are using an RPC framework like gRPC, Thrift, etc., you will have open connections across your servers maintained by the RPC framework anyway, and you will be sending cheap HB payloads through them. You can tune the HB timeout and HB period, and MicroRaft applies some jitter for HB'ing. And if your Raft groups are getting mutation operati…
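On the jitter point, the general idea can be sketched as follows. This is a generic illustration, not MicroRaft's actual scheduling code: `JitteredHeartbeatSchedule` is a hypothetical class that delays each group's next heartbeat by the base period plus a uniformly random extra delay, so that N groups on one server do not all send their heartbeats at the same instant.

```java
import java.util.Random;

/**
 * Generic sketch, not MicroRaft's scheduler: the delay before a group's next
 * heartbeat is the base period plus a random jitter in [0, maxJitterMillis].
 */
class JitteredHeartbeatSchedule {
    private final int periodMillis;
    private final int maxJitterMillis;
    private final Random random;

    JitteredHeartbeatSchedule(int periodMillis, int maxJitterMillis, Random random) {
        if (periodMillis <= 0 || maxJitterMillis < 0) {
            throw new IllegalArgumentException("period must be positive and jitter non-negative");
        }
        this.periodMillis = periodMillis;
        this.maxJitterMillis = maxJitterMillis;
        this.random = random;
    }

    /** Returns a delay uniformly distributed in [periodMillis, periodMillis + maxJitterMillis]. */
    int nextDelayMillis() {
        // nextInt(bound) returns a value in [0, bound), so the bound is maxJitter + 1.
        return periodMillis + random.nextInt(maxJitterMillis + 1);
    }

    public static void main(String[] args) {
        // Hypothetical settings: 1000 ms period, up to 200 ms jitter, one shared schedule queried per group.
        JitteredHeartbeatSchedule schedule = new JitteredHeartbeatSchedule(1000, 200, new Random());
        for (int group = 0; group < 3; group++) {
            System.out.println("group-" + group + " next heartbeat in " + schedule.nextDelayMillis() + " ms");
        }
    }
}
```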