tls: add scripts and instructions for rotating certificates #155
Some open questions:
Start by reviewing the general [TLS documentation][tls-certs] and the [TLS topology][tls-topology] for Tectonic to identify the various certificates in the cluster.

We will be using the [CFSSL][cfssl-util] utility to view and manage the certificates, which may be downloaded from [https://pkg.cfssl.org/][cfssl-package].
__WARNING:__ Rotating certificates by hand can break component connectivity and leave the cluster in an unrecoverable state. Before performing any of these instructions on a live cluster backup your cluster state and migrate critical workloads to another cluster.
Could we link to backup docs?
We don't have them, and I imagine they're custom to different cluster setups. E.g. if you've got stateless apps, it's just the manifests. If you've got persistent volumes, you'll need to back up the data.
Documentation/tls/rotate-tls.md
Outdated
Copy the archive with the new certificates into place and change to the
directory.
__WARNING:__ you MUST use `kubectl apply` in the following command and NOT other `kubectl` creation sub-commands.
WARNING: You MUST use
Documentation/tls/rotate-tls.md
Outdated
``` | ||
Remove the old certificates and unzip the archive with the new certificates.
To force the various deployments to restart and pick-up the new TLS assets, force the rotation of the various components. Note that the API server may become temporarily unavailable after this action.
... pick up ...
Documentation/tls/rotate-tls.md
Outdated
sudo chown etcd: peer.* server.*
ls -lAh
``` | ||
Unlike other cluster components, kubelets are configured through host files and require SSH access to the modify. Because Tectonic often deploys worker nodes behind firewalls, this document assumes using one of the control plane nodes as a [bastion host][bastion-host] for access to the cluster.
... require SSH access to modify. ...
Documentation/tls/rotate/gencerts.sh
Outdated
base domain is "example.com"

CLUSTER_NAME Name of the cluster. If your API server is running on the
domain "my-cluster-k8s.example.com" the name of the cluster
... "my-cluster-k8s.example.com", the name ...
Documentation/tls/rotate/gencerts.sh
Outdated
rm $CERT_DIR/serial*
rm $CERT_DIR/*.csr

# Use openssl for base64'ing instead of base64 which has different wrap behavior
OpenSSL
I think we actually mean `openssl` here (i.e. the CLI binary).
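For context on the wrap behavior that the script comment mentions, here is a minimal sketch. The script's actual invocation is not shown in this thread, so the `-A` flag, the helper name `b64_oneline`, and the sample input are assumptions for illustration. By default `openssl base64` wraps its output at 64 characters, GNU `base64` wraps at 76, and the BSD/macOS `base64` does not wrap at all, whereas `openssl base64 -A` emits a single line on every platform.

```shell
# Hypothetical helper (not taken from gencerts.sh): base64-encode stdin as a
# single unwrapped line, independent of the platform's base64 binary.
b64_oneline() {
  openssl base64 -A
}

# Made-up sample input: 60 bytes encode to 80 base64 characters, which is
# long enough to trigger openssl's default 64-column wrapping.
sample=$(printf 'x%.0s' $(seq 1 60))

wrapped=$(printf '%s' "$sample" | openssl base64)  # default: wrapped output
oneline=$(printf '%s' "$sample" | b64_oneline)     # -A: one line

# Compare the number of output lines produced by each form.
printf '%s\n' "$wrapped" | wc -l
printf '%s\n' "$oneline" | wc -l
```

A single-line value is what a Kubernetes Secret manifest expects in its `data` fields, which is presumably why the wrapping matters here.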
It was pointed out that kube-proxy reuses the kubelet's certs: https://github.com/coreos/tectonic-installer/blob/1.8.9-tectonic.1/modules/bootkube/resources/manifests/kube-proxy.yaml#L31. We need to roll that DaemonSet as well.
This document is almost ready to merge, but has a bug in it that keeps bricking my clusters... I cannot stress this warning at the beginning more
This is done. @zbwright would you take a look one last time?
@ericchiang Do you have any context on why the clusters are getting bricked?
If you mess up the etcd CA rotation, then it's really hard to change
anything on the self hosted control plane. I can expand on that in the doc.
…On Fri, Apr 6, 2018, 3:21 PM Stephen Augustus ***@***.***> wrote:
@ericchiang <https://github.com/ericchiang> Do you have any context on
why the clusters are getting bricked?
A warning like that seems super troubling.
To be clear, that bug I mentioned earlier was resolved, but I'd still tread very carefully here.
@ericchiang cool, cool. Thanks for the clarification!
Bumping this thread. There was some interest in more testing besides me just doing it. Did that ever get planned/done?
@ericchiang I was hoping to get some field validation on this, but as it's only Dan (and me, in a diminished capacity), I don't think we can commit to any testing in the near term, so don't block this on my account. @kbrwn mentioned he was working with someone, but would need to check in again to try out the etcd rotation.
openssl x509 -in $CERT -noout -text > "${CERT%.crt}.txt"
done

# Use openssl for base64'ing instead of base64 which has different wrap behavior
OpenSSL
"openssl" is the tool, right?
Documentation/tls/rotate-tls.md
Outdated
``` | ||
This will generate several files in the current directory.
The scripts creates a directory of generated TLS assets. If you provided the etcd CA, this will include etcd certificates and manifest patches.
either 'script creates' or 'scripts create'. I believe it's the former.
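Before applying generated TLS assets, it can help to sanity-check them. The sketch below is not part of the PR's scripts: it creates a throwaway self-signed certificate so that it runs on its own, and the helper name `cert_valid_for`, the file names, and the subject are all hypothetical. It relies only on the standard `openssl x509` options `-subject`, `-enddate`, and `-checkend`.

```shell
# Hypothetical helper: succeed if the certificate in $1 will still be valid
# $2 seconds from now (openssl's -checkend exits non-zero otherwise).
cert_valid_for() {
  openssl x509 -in "$1" -noout -checkend "$2" >/dev/null
}

# Throwaway self-signed certificate, valid for one day, standing in for a
# generated asset. Names and subject are made up for this example.
tmp=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/CN=example-test" \
  -keyout "$tmp/test.key" -out "$tmp/test.crt" 2>/dev/null

# Print who the certificate was issued to and when it expires.
openssl x509 -in "$tmp/test.crt" -noout -subject -enddate

if cert_valid_for "$tmp/test.crt" 3600; then
  echo "still valid an hour from now"
fi
if ! cert_valid_for "$tmp/test.crt" 172800; then
  echo "expires within two days"
fi
```

Running the same checks over each `.crt` file in the generated directory would catch an unexpectedly short validity period before the certificates reach the cluster.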
Documentation/tls/rotate-tls.md
Outdated
## etcd
## Rotating certificates for Tectonic and Kubernetes components.
no periods in headers
Documentation/tls/rotate-tls.md
Outdated
### Verify cluster health

First, verify the current health of the etcd cluster. Connect to one of the etcd members of the cluster using SSH.
__WARNING:__ The following commands MUST use `kubectl patch` and NOT other `kubectl` creation sub-commands.
subcommands
Documentation/tls/rotate-tls.md
Outdated
``` | ||
etcd clusters should be configured to require client authentication. Therefore, we will need the existing CA certificate, and the client certificate and key for the cluster. These artifacts should be located in the `/etc/ssl/etcd` directory if the Tectonic cluster was set up using self-signed certificates.
To force the various deployments to restart and pick up the new TLS assets, force the rotation of the various components. Note that the API server may become temporarily unavailable after this action.
To force the deployments to restart and pick up the new TLS assets, force the rotation of the deployments' components.
Documentation/tls/rotate-tls.md
Outdated
``` | ||
Generate the new certificate and private key using the CFSSL utility.
The addresses of a cluster's etcd instances be found by inspecting the API server's `--etcd-servers` flag.
Inspect the API server's `--etcd-servers` flag to find the address of a cluster's etcd instances.
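As a rough illustration of inspecting that flag, the sketch below pulls the addresses out of an API server command line or pod manifest read from stdin. The helper name `etcd_servers` and the sample input are made up, and the `kubectl` label selector in the comment is an assumption about how the API server pods are labeled, so adjust it for the cluster at hand.

```shell
# Hypothetical helper: read a pod manifest or command line on stdin and print
# each address from the first --etcd-servers flag on its own line.
etcd_servers() {
  grep -o -- '--etcd-servers=[^ "]*' | head -n 1 | cut -d= -f2- | tr ',' '\n'
}

# Against a live cluster this might look like the following (the label is an
# assumption, not confirmed by this thread):
#   kubectl -n kube-system get pods -l k8s-app=kube-apiserver -o yaml | etcd_servers

# Made-up command line for illustration.
echo 'kube-apiserver --etcd-servers=https://10.0.0.1:2379,https://10.0.0.2:2379 --secure-port=443' |
  etcd_servers
```

Printing one address per line makes it easy to loop over the etcd instances in the per-instance rotation steps.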
Documentation/tls/rotate-tls.md
Outdated
``` | ||
### Client
Finally, for each etcd instance, rotate the peer and serving certs.
: not .
Docs updated.
Okay, it's been a bit. I'm merging this tomorrow afternoon unless someone says otherwise.
This is a revamp of our TLS rotation docs. I've been testing them on more recent clusters (1.8.x) on AWS.
Etcd rotation instructions will be added in a bit, but I'd like early feedback.
@kbrwn for testing
@robszumski for general review
@zbwright for docs