Raft - Usage and Production Readiness #7

Closed
rcollina opened this issue May 8, 2020 · 6 comments
Assignees
Labels
question Further information is requested

Comments


rcollina commented May 8, 2020

Hello,
Thank you for sharing your work with the community.
The documentation of .NEXT is very detailed - going through it right now.

I'd be lying if I said that I understand all the complexities involved in what you've built.
That being said, I'm currently evaluating options to fulfill the following use case:

  • Several instances of the same webhost (might vary at runtime due to autoscaling)
  • Each instance has a BackgroundService that should perform some work
  • Only one of said instances should be processing a backlog of tasks at a time.

So I thought that some kind of leader election would get me there.
I've used distributed locks (via a DB) in the past, but I thought I could do better.
Would you say I'm on the right path with .NEXT Raft?

The other question is: are you currently using .NEXT Raft in production, or do you know of any case studies I could share with the higher-ups to promote the usage of this library?

Thank you again for your contribution to the community and for your time.

Best,
Rob

@sakno sakno added the question Further information is requested label May 8, 2020
@sakno sakno self-assigned this May 8, 2020
Collaborator

sakno commented May 8, 2020

Hi Rob!

Thanks for your interest. It is ready for production use. I don't know every way existing users apply this library, but I'm aware of its adoption by Wargaming. They use the library actively and I receive substantial feedback from them. One of their use cases is exactly the one you described: coordinating job execution in a cluster where only one node can execute the job at a time. So yes, you're on the right track. Leader election can be used as a form of distributed lock. However, you should choose the election timeout carefully. For instance, the lower election timeout should not be less than the approximate execution time of your job. This guarantees that a new leader cannot be elected while your job is running. Otherwise, in theory, you may have two jobs executing at the same time. In the context of distributed locks this is called the lease time: the maximum time that a node can hold the lock without renewing the lease.

Keep in mind that it's software and no one is immune to bugs. There are a lot of special use cases that I cannot predict. For my part, I try to fix bugs as fast as possible. PRs are also welcome.
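
The timeout advice above can be illustrated with a small, library-agnostic sketch. It is written in Python to stay self-contained, and none of the names in it (`timeout_covers_job`, `run_if_leader`, `is_leader`, `process`) come from .NEXT; they are hypothetical and only show the shape of the idea: check that the lower election timeout covers the expected job duration, and re-check leadership before each unit of work.

```python
from typing import Callable, Iterable, List


def timeout_covers_job(lower_election_timeout_ms: int, job_duration_ms: int) -> bool:
    """True when the lower election timeout is at least the expected job duration."""
    return lower_election_timeout_ms >= job_duration_ms


def run_if_leader(is_leader: Callable[[], bool],
                  backlog: Iterable[str],
                  process: Callable[[str], None]) -> List[str]:
    """Process backlog items one by one, re-checking leadership before each item.

    `is_leader` is a hypothetical callback; in a real application it would ask
    the cluster whether the local node is currently the leader.
    """
    done: List[str] = []
    for task in backlog:
        if not is_leader():
            break  # leadership lost: stop and leave the rest to the new leader
        process(task)
        done.append(task)
    return done


if __name__ == "__main__":
    # A 150 ms lower election timeout does not cover a job that takes about 2 s.
    print(timeout_covers_job(150, 2000))
```

Splitting the backlog into short units of work, each shorter than the lower election timeout, is one way to satisfy the constraint without raising the timeout to the length of the whole backlog.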

Author

rcollina commented May 8, 2020

Hello!
Thank you for your thorough reply. Very informative as well, much appreciated. I'll keep your suggestions in mind.

I'll make sure to contribute should the need arise, but right now I'm just glad to be able to kickstart our initiative with your library.

Just one more question, if I may: I'm not sure I found a way to enlist members in the cluster at runtime.
I did see the configuration section:

{
	"partitioning" : false,
	"lowerElectionTimeout" : 150,
	"upperElectionTimeout" : 300,
	"members" : ["http://localhost:3262", "http://localhost:3263", "http://localhost:3264"],
	"metadata" :
	{
		"key": "value"
	},
	"allowedNetworks" : ["127.0.0.0", "255.255.0.0/16", "2001:0db9::1/64"],
	"hostAddressHint" : "192.168.0.1",
	"requestJournal" :
	{
		"memoryLimit": 5,
		"expiration": "00:00:10",
		"pollingInterval" : "00:01:00"
	},
	"resourcePath" : "/cluster-consensus/raft",
	"port" : 3262,
	"heartbeatThreshold" : 0.5
}

This might be a question with a classic RTFM response 😄

Thank you again.

Best,
Rob.

Collaborator

sakno commented May 8, 2020

It depends on the hosting model of your application. If your app is built on top of ASP.NET Core then use IConfiguration and yes, that's an RTFM-like response. There are a lot of articles about the ASP.NET Core configuration model. The DotNext.AspNetCore.Cluster library simply follows all of these conventions. Moreover, it supports dynamic reconfiguration of the cluster through adding and removing cluster nodes. This feature is supported via the IOptionsMonitor interface from ASP.NET Core, which tracks changes in configuration and applies them at runtime. In other words, if you're using ASP.NET Core then it's very uncommon to discover cluster nodes at runtime. Usually they are known at startup and, as a result, can be placed in configuration.

If your app is not based on ASP.NET Core then you can enlist members programmatically as described here. In this case you can choose a highly optimized network transport for Raft: TCP or UDP. Read the article carefully because the choice depends heavily on the particular use case. However, once a node has started, it's not possible to add a new member through the configuration at runtime. Instead, you must inform the other nodes about the new member and use protected members of the RaftCluster class to add it at runtime. Therefore, it's recommended to use a distributed configuration service such as Consul or etcd. Otherwise, your admins must manually add the new member to the config file on each node.

So here is what we have:

  • The DotNext.AspNetCore.Cluster library gives native integration with the ASP.NET Core framework, with all the magic things like configuration tracking, DI, HTTP 1.1/2.0, TLS, etc.
  • With the TCP/UDP transport and the configuration model provided by DotNext.Net.Cluster, you get fast transmission based on a binary protocol and full control over the RaftCluster instance.

All these aspects are well described in this article. I hope you'll find it helpful.
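
To make the dynamic reconfiguration idea concrete, here is a minimal, library-agnostic sketch in Python. It is not how DotNext.AspNetCore.Cluster implements the feature; `diff_members` is a hypothetical helper that only shows what reacting to a reloaded `members` list amounts to: comparing the previous and the new snapshot and deriving which nodes were added and which were removed. The fourth address is made up for the example.

```python
from typing import Iterable, Set, Tuple


def diff_members(old: Iterable[str], new: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (added, removed) member addresses between two configuration snapshots."""
    old_set, new_set = set(old), set(new)
    return new_set - old_set, old_set - new_set


if __name__ == "__main__":
    before = ["http://localhost:3262", "http://localhost:3263", "http://localhost:3264"]
    # Hypothetical reload: one node scaled in, one node scaled out.
    after = ["http://localhost:3263", "http://localhost:3264", "http://localhost:3265"]
    added, removed = diff_members(before, after)
    print("added:", sorted(added))
    print("removed:", sorted(removed))
```

With a distributed configuration service, every node would observe the same new snapshot and could apply the same difference, which is what removes the need to edit the config file by hand on each node.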

Collaborator

sakno commented May 8, 2020

One more thing. If you need reliable replication between nodes then you need to use the persistent Write-Ahead Log which is shipped with the library. Read more here. However, consensus alone is enough for your use case, so no additional effort is required.

Author

rcollina commented May 8, 2020

I really appreciate your help.
I realize I have poorly worded the question, for which I apologise.

You’re absolutely right, I didn’t consider IOptionsMonitor - the RTFM part was mostly regarding your library and not ASP.NET Core itself. I do see your point regarding configuration hot-reloading.

I am indeed using ASP.NET Core, and I guess the answer lies in a distributed configuration service just like you said.

You’ve been absolutely stellar. I have plenty of reading ahead to do.

Sorry for the bother and thank you kindly again for your time!
Rob

Collaborator

sakno commented May 8, 2020

You're welcome, Rob! If you have any more questions, feel free to reopen this issue.

@sakno sakno closed this as completed May 8, 2020