server: auto-discover peer nodes instead of --join #32374
Comments
@centerorbit Thank you for your suggestion. Indeed, the use case sounds appealing. However, until/unless CockroachDB serves as its own certification authority, it won't be possible to auto-generate and synchronize secure certificates for nodes that are automatically discovered. I think your proposal will become relevant once there is a CA inside CockroachDB. |
This would be cool. The --join command could take a subnet like 192.168.1.0/24 @knz Couldn't we pre-setup the cert to include our entire subnet to plan for growth? |
@mberhault would it make sense to have a cert valid for an entire subnet? (I know that wildcard certs can be used for web sites, unsure about cockroachdb) |
I vaguely recall testing them and I think they work fine with the Go TLS client. We should double check, they recently tightened certificate validation in 1.10.x (though I don't think it impacts this). Our plan for easier k8s autoscaling was exactly this, instead of per-node CSRs, there would be a single "node" secret storing the wildcard certificate. Adding new nodes would thus require no manual intervention. IP addresses would not be included in the certificate as there is no way to specify a subnet. Instead, all communication would be DNS based which can use wildcards for host matching (usually, the wildcard only applies at the first level, it does not recurse). |
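The wildcard matching described above can be checked with Go's standard library alone. The following is a minimal sketch, not CockroachDB code: the helper `matchesCert` and the host names are made up for illustration, and the certificate is an in-memory struct carrying only DNS names rather than a real signed node certificate. It relies on `(*x509.Certificate).VerifyHostname`, which lets a `*` stand in for the leftmost label only and checks IP addresses against IP SANs, never against DNS names.

```go
package main

import (
	"crypto/x509"
	"fmt"
)

// matchesCert reports whether host would be accepted by a certificate whose
// DNS subject alternative names are sans. Hypothetical helper for illustration;
// no signature or chain verification happens here, only name matching.
func matchesCert(sans []string, host string) bool {
	cert := &x509.Certificate{DNSNames: sans}
	return cert.VerifyHostname(host) == nil
}

func main() {
	// Assumed wildcard name for a Kubernetes-style headless service.
	sans := []string{"*.cockroachdb.default.svc.cluster.local"}

	hosts := []string{
		"cockroachdb-0.cockroachdb.default.svc.cluster.local", // one label under the wildcard
		"a.b.cockroachdb.default.svc.cluster.local",           // two labels: wildcard does not recurse
		"10.0.0.5",                                            // IP address: needs an IP SAN, not a DNS name
	}
	for _, h := range hosts {
		fmt.Printf("%-55s accepted=%v\n", h, matchesCert(sans, h))
	}
}
```

This is why the plan relies on DNS-based addressing: a newly scaled node gets a new name under the wildcard, whereas a new IP address would need a certificate that lists it explicitly.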
@mberhault are you suggesting that the request by OP (see issue title + desc) was already in the works? |
It's been talked about, there's no issue for it. |
@knz @mberhault very good point, I've been testing Cockroach locally in I don't yet understand the flow involved for a node to acquire the proper certs. I will do work to try and understand @mberhault 's DNS concept, and how that interacts with certs and Cockroach join code. Thanks for the quick response and feedback! |
Using all A-records returned by a dns resolve would also be helpful, as swarm allows you to resolve |
Yeah, I was actually researching how systems like Kubernetes and Docker discover services within a cluster. @salzig is right. Looking up all of the IP addresses in DNS for a particular service name, and then using those, would probably be the most straightforward. I'm not sure how the TLS certs are created, but could there be a potential to use those service names for the cert? See: |
I'm struggling a bit to get CockroachDB building on my systems. My Chromebook doesn't have enough RAM or disk space to perform builds reasonably (it works, but just barely) and my Windows machine is not very compatible with the builder.sh script (even with Windows Subsystem for Linux)... so I'm still trying to come up with decent ways to work on these platforms. In the meantime, I think that I could use a similar DNS lookup method to what's documented here: https://jameshfisher.com/2017/08/03/golang-dns-lookup.html And apply it in this general area of code: cockroach/pkg/server/config.go Lines 506 to 516 in 6fb1b00
Naturally, if I get it working, I'll need to circle back and make a branch/PR, add flags, figure out the TLS (so it can run in secure mode), write tests, etc. But this is where I'm currently at with it. |
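The DNS-based idea discussed in the last few comments could look roughly like the sketch below. It is an assumption-laden illustration, not the code from the linked branch: `expandJoinAddrs` and `lookupFunc` are hypothetical names, and the only external facts relied on are Go's `net.LookupHost`, `net.SplitHostPort`, `net.JoinHostPort`, and CockroachDB's default port 26257. Each `--join` entry is resolved, and every returned address becomes its own join target.

```go
package main

import (
	"fmt"
	"net"
)

// lookupFunc resolves a host name to its addresses. net.LookupHost has this
// shape; tests can pass a fake instead of hitting real DNS.
type lookupFunc func(host string) ([]string, error)

// expandJoinAddrs is a hypothetical helper: it replaces each join entry with
// one entry per address the name resolves to, de-duplicating the result.
// Entries that fail to resolve are kept as given, so a static list still works.
func expandJoinAddrs(joinAddrs []string, defaultPort string, lookup lookupFunc) []string {
	seen := make(map[string]bool)
	var out []string
	add := func(addr string) {
		if !seen[addr] {
			seen[addr] = true
			out = append(out, addr)
		}
	}
	for _, addr := range joinAddrs {
		host, port, err := net.SplitHostPort(addr)
		if err != nil {
			// No port given (for example "tasks.db"): use the default port.
			host, port = addr, defaultPort
		}
		ips, err := lookup(host)
		if err != nil || len(ips) == 0 {
			add(net.JoinHostPort(host, port))
			continue
		}
		for _, ip := range ips {
			add(net.JoinHostPort(ip, port))
		}
	}
	return out
}

func main() {
	// "localhost" is only a stand-in; in Docker Swarm or Kubernetes this would
	// be a service name that resolves to every running instance.
	fmt.Println(expandJoinAddrs([]string{"localhost"}, "26257", net.LookupHost))
}
```

Passing the resolver as a parameter keeps the sketch testable without a network, and it leaves room for other discovery sources later.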
This will allow the --join CLI option to "find" many nodes to connect to, instead of needing to specify specific individuals. This will cater well to auto-scaling, Kubernetes and Docker DNS behaviors. See: cockroachdb#32374 Release note: None
Started working on implementation here: master...centerorbit:feature-auto-join It compiles! Now to come up with a few test scenarios and test Docker, K8s, normal, etc and see if it does what I expect. Currently (I hope) it just uses the params from the You could instead just say something like: And the DNS resolver (Kubernetes or Docker) should return the IPs for (Again, this is certs discussion aside. I'm proof-of-concepting this with the |
I was just looking into this, and ran down from the Cluster Name RFC to this ticket. I like that your implementation is simple and based on DNS entries; this should integrate easily with Kubernetes, Consul, or "bare cloud" on EC2 with an auto-scaling group behind a load balancer. Figuring out the CA situation will be important. Since there needs to be a script or wrapper (be it Kubernetes or hand-written) to fetch a certificate, that same script/wrapper can fetch the IP addresses of the other cluster members. This continues to feel like an area for improvement for CockroachDB. To expand the varieties of discovery, would you consider the https://github.com/hashicorp/go-discover library from Hashicorp? |
We just got another user request, also pointing to go-discover. |
Having an auto-discover would greatly simplify our setup as well. I guess for all the 'cloud native' setups it would be a big improvement. |
I'm struggling to find a way to create cluster-aware apps that doesn't require a bunch of setup. This small change (even running in insecure mode, since my overlay network is already secured) would make things a whole lot easier. Being able to just use a compose file like this would be amazing:
version: "3.2"
services:
  db:
    image: cockroachdb/cockroach:latest
    command: start --insecure --join tasks.{{.Service.Name}}
    volumes:
      - db:/cockroach/cockroach-data
    deploy:
      mode: global
volumes:
  db: |
This issue seems fairly inactive, but I'd like to add my +1 with some details about my use case and the potential workarounds I am considering. I am using Consul for service discovery, which gives me a number of tools I could use to get around some of the issues above. Given that Consul is also written in Go, native support for Consul and Consul Connect doesn't seem outside the realms of possibility.
Discovery. With appropriate service definitions, it's possible to publish all instances in Consul and query them to find all existing nodes of the cluster, then apply that to the join parameter on start.
Leader election / cluster init. There are well-documented processes for using Consul KV locking to perform leader election, and therefore a mechanism to select a node to run cluster-init on.
Locality. You can query the host's Consul region and datacenter and pass them dynamically to CockroachDB.
mTLS. A few options I'm considering: using the CA built into Consul or Vault to generate node certificates, or using Consul Connect with CockroachDB (optionally) in insecure mode. The latter allows more granular access controls via intentions. Even then, though, there are new AutoTLS options that may have appeared since this thread was last updated.
It's also possible to write appropriate wrapper scripts to do all this externally. |
Thanks for the reminder. This issue had been mistriaged and fell through the cracks. @mwang1026 we'll want to put this back on the radar. It's still relevant today and would also simplify (and lower the cost of) our CC infrastructure. |
I've been looking into this as well. Right now we have Traefik in front of our services, and it would be fantastic to have CockroachDB behind the Traefik TLS load balancer, even if it's using insecure mode (we can use an encrypted network on swarm). Happy to test if anyone is familiar enough with Consul/KV stores to get this going. |
Any plans to support https://github.com/hashicorp/go-discover or equivalent? |
go-discover is problematic because it would require provisioning API keys to the CockroachDB process, which then in turn would need to be protected somehow. The complexity of provisioning the API keys in a secure way is not trivial, and I'm not sure it would result in a setup that's objectively simpler to operate than the (Something that folk may forget is that it's possible to point the Also another thing to consider is that CockroachDB is designed to operate across regions and even across cloud providers and so we need the |
Is your feature request related to a problem? Please describe.
I've been playing around with cockroach, particularly in Docker. It seems odd to me that one needs to instruct new containers to 'join' to existing containers.
Describe the solution you'd like
I would like to create a new instance within a private subnet and have it auto-discover its peer nodes and join them by itself.
Describe alternatives you've considered
Even the Kubernetes config (found here) defines a default scale of 3, and specifies that all nodes join themselves as well:
--join cockroachdb-0.cockroachdb,cockroachdb-1.cockroachdb,cockroachdb-2.cockroachdb
It's odd that db-0 is told to join with itself along with 1 and 2. I'm assuming there is already handling to gracefully ignore joining oneself, but that still seems like a bit of a 'hack' to get a Kubernetes cluster up and running. It'd be nice if you didn't have to specify the node names at all!
It'd be much easier to say something like:
--join-auto
and call it a day.
Additional context
I figure in most cases, DBs will be clustered within their own private subnet, and could use a designated port and broadcast IP to make requests to join. When you want to scale to multiple AZs, either a VPN can be established, or a bridge of some sort to enable communication across two subnets.
I'm not sure how instances running in Kubernetes would react to broadcast pings, but they may be able to use the Kube API to discover others to join, there would just need to be some sort of environment detection, or another flag to tell it which auto-discover method to use.
Perhaps something like:
--join-auto=broadcast
--join-auto=kube
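To make the proposed flag shape concrete, here is a minimal sketch of how a `--join-auto=<method>` option could be validated with Go's standard `flag` package. Everything CockroachDB-specific here is hypothetical: `--join-auto` is the flag proposed in this issue and does not exist in the product, and `discoveryMode` and `parseJoinAuto` are illustrative names. CockroachDB's real CLI is not built this way; the sketch only shows the selection logic.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// discoveryMode is the hypothetical auto-discovery method chosen on the CLI.
type discoveryMode string

const (
	modeBroadcast discoveryMode = "broadcast" // probe the local subnet
	modeKube      discoveryMode = "kube"      // ask the Kubernetes API
)

// String and Set make *discoveryMode satisfy flag.Value.
func (m *discoveryMode) String() string {
	if m == nil {
		return ""
	}
	return string(*m)
}

func (m *discoveryMode) Set(v string) error {
	switch discoveryMode(v) {
	case modeBroadcast, modeKube:
		*m = discoveryMode(v)
		return nil
	}
	return fmt.Errorf("unknown discovery method %q (want %q or %q)", v, modeBroadcast, modeKube)
}

// parseJoinAuto parses args and returns the selected mode, or "" if the
// hypothetical --join-auto flag was not given.
func parseJoinAuto(args []string) (discoveryMode, error) {
	fs := flag.NewFlagSet("start", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the way in this sketch
	var mode discoveryMode
	fs.Var(&mode, "join-auto", "auto-discovery method: broadcast or kube")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return mode, nil
}

func main() {
	mode, err := parseJoinAuto(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(2)
	}
	if mode == "" {
		fmt.Println("no auto-discovery requested")
		return
	}
	fmt.Println("auto-discovery method:", mode)
}
```

Note that with the standard `flag` package a bare `--join-auto` without a value is rejected, so supporting both the bare form and the `=method` form would need a default method or environment detection, as suggested above.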
Jira issue: CRDB-4753