kubernetes: Config files + setup script for secure multiregion clusters #27092
Merged
tl;dr: fill in a few constants at the top of setup.py and then run it, and you'll have a working secure multiregion cluster. Works on GKE; does not work on AWS.

This relies on linking together each cluster's DNS servers so that they're able to defer requests to each other for a special zone-scoped namespace. This is a little hacky, but it is very maintainable and survives even if all the nodes in a cluster are taken down and brought back up again later with different IP addresses.

The big caveat that I learned after I thought I was done, though, is that GCE's internal load balancers only work within a region. Unfortunately, I had done all my prototyping within one region, with each cluster just in a different zone. To get around this, I've had to switch from exposing each DNS server on an internal IP to exposing them on external IPs, which some users may not like, and which might tip the scales in favor of a different solution. I'd be happy to discuss the alternatives (hooking up a CoreDNS instance in each cluster to every cluster's apiserver, or using Istio multicluster) with anyone interested.

I've only handled the secure cluster case in the script, but it can easily be modified to also handle insecure clusters as an option. Insecure clusters can actually run in more environments than secure clusters, though, and should probably be handled differently for that reason.

Release note (general change): Configuration files and a setup script for secure multiregion deployments on GKE are now provided.
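To make the DNS-linking idea concrete, here is a minimal sketch of how one cluster's DNS server could be told to defer lookups for the other clusters' zone-scoped domains. This is not the actual setup script: the zone names, IP addresses, and the `<zone>.svc.cluster.local` domain convention are all placeholder assumptions, modeled on kube-dns's `stubDomains` ConfigMap mechanism.

```python
import json

# Hypothetical sketch: generate a kube-dns stubDomains entry for one cluster.
# Zone names, IPs, and the zone-scoped namespace convention are assumptions,
# not taken from the actual setup script.
dns_ips = {
    "us-central1-a": ["10.0.0.1"],  # address each cluster's DNS is exposed on
    "us-east1-b": ["10.0.0.2"],
    "us-west1-a": ["10.0.0.3"],
}

def stub_domains(local_zone: str) -> str:
    """JSON for the kube-dns ConfigMap: forward lookups for each *other*
    cluster's zone-scoped domain to that cluster's exposed DNS servers."""
    return json.dumps({
        f"{zone}.svc.cluster.local": ips
        for zone, ips in dns_ips.items()
        if zone != local_zone
    }, sort_keys=True)

print(stub_domains("us-central1-a"))
```

Because each cluster forwards by domain name rather than by pod IP, the linkage keeps working even after every node in a cluster is replaced with new IP addresses, as long as the DNS servers' load-balanced addresses stay stable.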
@@ -0,0 +1,86 @@
# Running CockroachDB across multiple Kubernetes clusters

The script and configuration files in this directory enable deploying CockroachDB across multiple Kubernetes clusters that are spread across different geographic regions. They deploy a CockroachDB [StatefulSet](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/) into each separate cluster and link the clusters together using DNS.

To use the configuration provided here, check out this repository (or otherwise download a copy of this directory), fill in the constants at the top of [setup.py](setup.py) with the relevant information about your Kubernetes clusters, optionally make any desired modifications to [cockroachdb-statefulset-secure.yaml](cockroachdb-statefulset-secure.yaml) as explained in [our Kubernetes performance tuning guide](https://www.cockroachlabs.com/docs/stable/kubernetes-performance.html), and finally run [setup.py](setup.py).

You should see a lot of output as the script does its work, hopefully ending with `job "cluster-init-secure" created`. This indicates that everything was created successfully, and you should soon see the CockroachDB cluster initialized, with 3 pods in the `READY` state in each Kubernetes cluster. At this point you can manage the StatefulSet in each cluster independently if you so desire: scaling up the number of replicas, changing their resource requests, or making other modifications as you please.

If anything goes wrong along the way, please let us know via any of the [normal troubleshooting channels](https://www.cockroachlabs.com/docs/stable/support-resources.html). While we believe this creates a highly available, maintainable multi-region deployment, it still pushes the boundaries of how Kubernetes is typically used, so feedback and issue reports are much appreciated.
## Limitations

### Pod-to-pod connectivity

The deployment outlined in this directory relies on pod IP addresses being routable even across Kubernetes clusters and regions. This achieves optimal performance, particularly when compared to alternative solutions that route all packets between clusters through load balancers, but it means that the deployment won't work in certain environments.

This requirement is satisfied by clusters deployed in cloud environments such as Google Kubernetes Engine, and it can also be satisfied by on-prem environments, depending on the [Kubernetes networking setup](https://kubernetes.io/docs/concepts/cluster-administration/networking/) used. If you want to test whether your clusters will work, you can run this basic network test:

```shell
$ kubectl run network-test --image=alpine --restart=Never -- sleep 999999
pod "network-test" created
$ kubectl describe pod network-test | grep IP
IP:             THAT-PODS-IP-ADDRESS
$ kubectl config use-context YOUR-OTHER-CLUSTERS-CONTEXT-HERE
$ kubectl run -it network-test --image=alpine --restart=Never -- ping THAT-PODS-IP-ADDRESS
If you don't see a command prompt, try pressing enter.
64 bytes from 10.12.14.10: seq=1 ttl=62 time=0.570 ms
64 bytes from 10.12.14.10: seq=2 ttl=62 time=0.449 ms
64 bytes from 10.12.14.10: seq=3 ttl=62 time=0.635 ms
64 bytes from 10.12.14.10: seq=4 ttl=62 time=0.722 ms
64 bytes from 10.12.14.10: seq=5 ttl=62 time=0.504 ms
...
```

If the pods can connect directly, you should see successful ping output like the above; if they can't, you won't see any successful ping responses. Make sure to delete the `network-test` pod in each cluster when you're done!
### Exposing DNS servers to the Internet

As currently configured, the DNS servers from the different Kubernetes clusters are hooked together by exposing each of them via a load-balanced IP address that is visible to the public Internet. This is because [Google Cloud Platform's Internal Load Balancers do not currently support clients in one region using a load balancer in another region](https://cloud.google.com/compute/docs/load-balancing/internal/#deploying_internal_load_balancing_with_clients_across_vpn_or_interconnect).

None of the services in your Kubernetes cluster will be made accessible this way, but their names could leak out to a motivated attacker. If this is unacceptable, please let us know and we can demonstrate other options. [Your voice could also help convince Google to allow clients from one region to use an Internal Load Balancer in another](https://issuetracker.google.com/issues/111021512), eliminating the problem.
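For reference, exposing a cluster's DNS pods behind a load balancer might look something like the following sketch. This is not necessarily the exact manifest the script creates; the Service name and the assumption that the DNS pods carry the standard `k8s-app: kube-dns` label are illustrative.

```yaml
# Hypothetical sketch: expose this cluster's DNS pods on a load-balanced IP so
# the other clusters can forward zone-scoped lookups to them. The name and the
# selector label are assumptions for illustration.
apiVersion: v1
kind: Service
metadata:
  name: kube-dns-lb
  namespace: kube-system
  labels:
    k8s-app: kube-dns
spec:
  type: LoadBalancer
  selector:
    k8s-app: kube-dns
  ports:
  - name: dns
    port: 53
    protocol: UDP
```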
## Cleaning up

To remove all the resources created in your clusters by [setup.py](setup.py), copy the parameters you provided at the top of [setup.py](setup.py) to the top of [teardown.py](teardown.py), then run [teardown.py](teardown.py).

## More information

For more information on running CockroachDB in Kubernetes, please see the [README in the parent directory](../README.md).
@@ -0,0 +1,27 @@
apiVersion: v1
kind: Pod
metadata:
  name: cockroachdb-client-secure
  labels:
    app: cockroachdb-client
spec:
  serviceAccountName: cockroachdb
  containers:
  - name: cockroachdb-client
    image: cockroachdb/cockroach:v2.0.5
    imagePullPolicy: IfNotPresent
    volumeMounts:
    - name: client-certs
      mountPath: /cockroach-certs
    # Keep the pod open indefinitely so that kubectl exec can be used to get a
    # shell to it and run cockroach client commands, such as cockroach sql,
    # cockroach node status, etc.
    command:
    - sleep
    - "2147483648" # 2^31
  # This pod isn't doing anything important, so don't bother waiting to terminate it.
  terminationGracePeriodSeconds: 0
  volumes:
  - name: client-certs
    secret:
      secretName: cockroachdb.client.root
      defaultMode: 256 # 0400 octal: read-only for the owner
@@ -0,0 +1,28 @@
apiVersion: batch/v1
kind: Job
metadata:
  name: cluster-init-secure
  labels:
    app: cockroachdb
spec:
  template:
    spec:
      serviceAccountName: cockroachdb
      containers:
      - name: cluster-init
        image: cockroachdb/cockroach:v2.0.5
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - name: client-certs
          mountPath: /cockroach-certs
        command:
        - "/cockroach/cockroach"
        - "init"
        - "--certs-dir=/cockroach-certs"
        - "--host=cockroachdb-0.cockroachdb"
      restartPolicy: OnFailure
      volumes:
      - name: client-certs
        secret:
          secretName: cockroachdb.client.root
          defaultMode: 256 # 0400 octal: read-only for the owner
@@ -0,0 +1,224 @@
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cockroachdb
  labels:
    app: cockroachdb
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  name: cockroachdb
  labels:
    app: cockroachdb
rules:
- apiGroups:
  - ""
  resources:
  - secrets
  verbs:
  - create
  - get
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: cockroachdb
  labels:
    app: cockroachdb
rules:
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests
  verbs:
  - create
  - get
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: cockroachdb
  labels:
    app: cockroachdb
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: cockroachdb
subjects:
- kind: ServiceAccount
  name: cockroachdb
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: cockroachdb
  labels:
    app: cockroachdb
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cockroachdb
subjects:
- kind: ServiceAccount
  name: cockroachdb
  namespace: default
---
apiVersion: v1
kind: Service
metadata:
  # This service is meant to be used by clients of the database. It exposes a
  # ClusterIP that will automatically load balance connections to the different
  # database pods.
  name: cockroachdb-public
  labels:
    app: cockroachdb
spec:
  ports:
  # The main port, served by gRPC, serves Postgres-flavor SQL, internode
  # traffic, and the CLI.
  - port: 26257
    targetPort: 26257
    name: grpc
  # The secondary port serves the UI as well as health and debug endpoints.
  - port: 8080
    targetPort: 8080
    name: http
  selector:
    app: cockroachdb
---
apiVersion: v1
kind: Service
metadata:
  # This service only exists to create DNS entries for each pod in the
  # StatefulSet such that they can resolve each other's IP addresses. It does
  # not create a load-balanced ClusterIP and should not be used directly by
  # clients in most circumstances.
  name: cockroachdb
  labels:
    app: cockroachdb
  annotations:
    # Use this annotation in addition to the actual publishNotReadyAddresses
    # field below because the annotation will stop being respected soon, but
    # the field is broken in some versions of Kubernetes:
    # https://github.com/kubernetes/kubernetes/issues/58662
    service.alpha.kubernetes.io/tolerate-unready-endpoints: "true"
    # Enable automatic monitoring of all instances when Prometheus is running
    # in the cluster.
    prometheus.io/scrape: "true"
    prometheus.io/path: "_status/vars"
    prometheus.io/port: "8080"
spec:
  ports:
  - port: 26257
    targetPort: 26257
    name: grpc
  - port: 8080
    targetPort: 8080
    name: http
  # We want all pods in the StatefulSet to have their addresses published for
  # the sake of the other CockroachDB pods even before they're ready, since
  # they have to be able to talk to each other in order to become ready.
  publishNotReadyAddresses: true
  clusterIP: None
  selector:
    app: cockroachdb
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: cockroachdb-budget
  labels:
    app: cockroachdb
spec:
  selector:
    matchLabels:
      app: cockroachdb
  maxUnavailable: 1
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: cockroachdb
spec:
  serviceName: "cockroachdb"
  replicas: 3
  template:
    metadata:
      labels:
        app: cockroachdb
    spec:
      serviceAccountName: cockroachdb
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - cockroachdb
              topologyKey: kubernetes.io/hostname
      containers:
      - name: cockroachdb
        image: cockroachdb/cockroach:v2.0.5
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 26257
          name: grpc
        - containerPort: 8080
          name: http
        livenessProbe:
          httpGet:
            path: "/health"
            port: http
            scheme: HTTPS
          initialDelaySeconds: 30
          periodSeconds: 5
        readinessProbe:
          httpGet:
            path: "/health?ready=1"
            port: http
            scheme: HTTPS
          initialDelaySeconds: 10
          periodSeconds: 5
          failureThreshold: 2
        volumeMounts:
        - name: datadir
          mountPath: /cockroach/cockroach-data
        - name: certs
          mountPath: /cockroach/cockroach-certs
        env:
        - name: COCKROACH_CHANNEL
          value: kubernetes-secure
        command:
        - "/bin/bash"
        - "-ecx"
        # The use of the fully qualified `hostname -f` is crucial:
        # other nodes aren't able to look up the unqualified hostname.
        - "exec /cockroach/cockroach start --logtostderr --certs-dir /cockroach/cockroach-certs --advertise-host $(hostname -f) --http-host 0.0.0.0 --join JOINLIST --locality LOCALITYLIST --cache 25% --max-sql-memory 25%"
      # No pre-stop hook is required; a SIGTERM plus some time is all that's
      # needed for graceful shutdown of a node.
      terminationGracePeriodSeconds: 60
      volumes:
      - name: datadir
        persistentVolumeClaim:
          claimName: datadir
      - name: certs
        secret:
          secretName: cockroachdb.node
          defaultMode: 256 # 0400 octal: read-only for the owner
  podManagementPolicy: Parallel
  updateStrategy:
    type: RollingUpdate
  volumeClaimTemplates:
  - metadata:
      name: datadir
    spec:
      accessModes:
      - "ReadWriteOnce"
      resources:
        requests:
          storage: 100Gi
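The `JOINLIST` and `LOCALITYLIST` placeholders in the `cockroach start` command above are substituted by the setup script before the manifest is applied. A minimal sketch of what such a substitution might look like follows; the zone names, the one-seed-per-cluster choice, and the zone-scoped DNS naming are illustrative assumptions, not the script's actual code.

```python
# Hypothetical sketch of filling in the JOINLIST and LOCALITYLIST placeholders.
# Zone names and the zone-scoped DNS convention are assumptions.
zones = ["us-central1-a", "us-east1-b", "us-west1-a"]

def join_list(zones):
    # One seed address per cluster: the first StatefulSet pod, addressed
    # through the zone-scoped domain that the linked DNS servers can resolve.
    return ",".join(f"cockroachdb-0.cockroachdb.{z}" for z in zones)

def locality(zone):
    # Derive the region by dropping the zone's trailing letter,
    # e.g. "us-east1-b" -> "us-east1".
    region = zone.rsplit("-", 1)[0]
    return f"cloud=gce,region={region},zone={zone}"

template = "--join JOINLIST --locality LOCALITYLIST"
flags = template.replace("JOINLIST", join_list(zones)).replace(
    "LOCALITYLIST", locality("us-east1-b"))
print(flags)
```

Keying the locality flag on region and zone is what lets CockroachDB spread replicas across the clusters, while the DNS-based join list keeps working even after pods come back with new IP addresses.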