This repository has been archived by the owner on Feb 5, 2020. It is now read-only.

tls: add scripts and instructions for rotating certificates #155

Merged
merged 1 commit into from
Apr 20, 2018

@ericchiang ericchiang commented Mar 29, 2018

This is a revamp of our TLS rotation docs. I've been testing them on more recent clusters (1.8.x) on AWS.

Etcd rotation instructions will be added in a bit, but I'd like early feedback.

@kbrwn for testing
@robszumski for general review
@zbwright for docs

@ericchiang
Contributor Author

ericchiang commented Mar 29, 2018

Some open questions:

  • Where should these scripts live?
  • Does this work on other clouds?

Start by reviewing the general [TLS documentation][tls-certs] and the [TLS topology][tls-topology] for Tectonic to identify the various certificates in the cluster.

We will be using the [CFSSL][cfssl-util] utility to view and manage the certificates, which may be downloaded from [https://pkg.cfssl.org/][cfssl-package].
__WARNING:__ Rotating certificates by hand can break component connectivity and leave the cluster in an unrecoverable state. Before performing any of these instructions on a live cluster backup your cluster state and migrate critical workloads to another cluster.

Could we link to backup docs?

Contributor Author

We don't have them and I imagine they're custom to different cluster setups. E.g. if you've got stateless apps it's just the manifests. If you've got persistent volumes you'll need to back up the data.


Copy the archive with the new certificates into place and change to the
directory.
__WARNING:__ you MUST use `kubectl apply` in the following command and NOT other `kubectl` creation sub-commands.
Contributor

WARNING: You MUST use

```

Remove the old certificates and unzip the archive with the new certificates.
To force the various deployments to restart and pick-up the new TLS assets, force the rotation of the various components. Note that the API server may become temporarily unavailable after this action.
Contributor

... pick up ...

sudo chown etcd: peer.* server.*
ls -lAh
```
Unlike other cluster components, kubelets are configured through host files and require SSH access to the modify. Because Tectonic often deploys worker nodes behind firewalls, this document assumes using one of the control plane nodes as a [bastion host][bastion-host] for access to the cluster.
Contributor

... require SSH access to modify. ...
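For readers following the kubelet paragraph quoted in this thread, here is a minimal sketch of reaching a worker through a control plane node used as a bastion. It relies on OpenSSH's `-J` (ProxyJump) option, which needs OpenSSH 7.3 or later. The `core` login user, the `ssh_via_bastion` helper name, and the addresses are assumptions for illustration, not something the PR specifies.

```shell
# Sketch: run a command on a worker node, hopping through a control plane
# node used as a bastion. ASSUMPTIONS: OpenSSH 7.3+ (for -J) and the "core"
# login user; adjust both for your cluster.
SSH_USER="${SSH_USER:-core}"

ssh_via_bastion() {
    local bastion="$1" worker="$2"
    shift 2
    # -J makes ssh connect to the bastion first, then on to the worker.
    ssh -J "${SSH_USER}@${bastion}" "${SSH_USER}@${worker}" "$@"
}

# Example (hypothetical addresses):
#   ssh_via_bastion 203.0.113.10 10.0.12.34 systemctl status kubelet
```

Using a jump host this way avoids copying private keys onto the control plane node, though it assumes the worker is reachable from the bastion's network.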

base domain is "example.com"

CLUSTER_NAME Name of the cluster. If your API server is running on the
domain "my-cluster-k8s.example.com" the name of the cluster
Contributor

... "my-cluster-k8s.example.com", the name ...
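As an illustration of how the two values in the help text relate, the sketch below splits an API server domain into a cluster name and a base domain. It assumes the API server is served at `<CLUSTER_NAME>-k8s.<BASE_DOMAIN>`, which is only what the quoted example suggests; the `API_DOMAIN` variable is hypothetical.

```shell
# Sketch: split an API server domain into cluster name and base domain.
# ASSUMPTION: the API server is served at <CLUSTER_NAME>-k8s.<BASE_DOMAIN>,
# as the help-text example suggests; verify against your own DNS records.
API_DOMAIN="my-cluster-k8s.example.com"   # hypothetical variable name
BASE_DOMAIN="example.com"

host="${API_DOMAIN%.${BASE_DOMAIN}}"   # strip the trailing ".example.com"
CLUSTER_NAME="${host%-k8s}"            # strip the "-k8s" suffix

echo "CLUSTER_NAME=${CLUSTER_NAME} BASE_DOMAIN=${BASE_DOMAIN}"
```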

rm $CERT_DIR/serial*
rm $CERT_DIR/*.csr

# Use openssl for base64'ing instead of base64 which has different wrap behavior
Contributor

OpenSSL

I think we actually mean openssl here (i.e. the CLI binary).
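To make the wrap-behavior point in the script comment concrete, here is a small sketch of encoding with the `openssl` CLI. The `-A` flag keeps the encoded output on one line, whereas the default wrapping of the standalone `base64` utility differs between platforms. The `b64_oneline` helper name is made up for this example.

```shell
# Sketch: base64-encode a file onto a single line with the openssl CLI.
# Without -A, openssl wraps its output; with -A the result is one line,
# independent of how the platform's own base64 utility wraps.
b64_oneline() {
    openssl base64 -A -in "$1"
}

# Example: encode 100 bytes, long enough that wrapped output would span lines.
tmp="$(mktemp)"
head -c 100 /dev/zero > "$tmp"
encoded="$(b64_oneline "$tmp")"
echo "${#encoded} characters on one line"
rm -f "$tmp"
```

A single-line value is convenient when the result is pasted into a field of a YAML or JSON manifest.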

@ericchiang
Contributor Author

It was pointed out that kube-proxy reuses the kubelet's certs https://github.com/coreos/tectonic-installer/blob/1.8.9-tectonic.1/modules/bootkube/resources/manifests/kube-proxy.yaml#L31

Need to roll that daemonset as well.
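One way to roll the DaemonSet, sketched under assumptions: delete the kube-proxy pods and let the DaemonSet controller recreate them, which works regardless of the DaemonSet's update strategy. The `kube-system` namespace and the `k8s-app=kube-proxy` label are assumptions here; check them with `kubectl -n kube-system get pods --show-labels` before relying on this.

```shell
# Sketch: restart kube-proxy pods so they re-read the rotated kubelet certs.
# ASSUMPTIONS: the pods run in kube-system and carry the label
# k8s-app=kube-proxy; confirm with
#   kubectl -n kube-system get pods --show-labels
roll_kube_proxy() {
    # Deleted pods are recreated by the DaemonSet controller.
    kubectl --namespace kube-system delete pods --selector k8s-app=kube-proxy
}

# Example: roll_kube_proxy
```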

@ericchiang
Contributor Author

This document is almost ready to merge, but has a bug in it that keeps bricking my clusters...

I cannot stress the warning at the beginning enough:

__WARNING:__ Rotating certificates by hand can break component connectivity and leave the cluster in an unrecoverable state. Before performing any of these instructions on a live cluster backup your cluster state and migrate critical workloads to another cluster.

@ericchiang
Contributor Author

This is done. @zbwright would you take a look one last time?

@justaugustus

@ericchiang Do you have any context on why the clusters are getting bricked?
A warning like that seems super troubling.

@ericchiang
Contributor Author

ericchiang commented Apr 6, 2018 via email

@ericchiang
Contributor Author

To be clear, the bug I mentioned earlier was resolved, but I'd still tread very carefully here.

@justaugustus

@ericchiang cool, cool. Thanks for the clarification!

@ericchiang
Contributor Author

Bumping this thread. There was some interest in more testing besides me just doing it. Did that ever get planned or done?

@justaugustus

justaugustus commented Apr 11, 2018

@ericchiang I was hoping to get some field validation on this, but as it's only Dan (and me, in a diminished capacity), I don't think we can commit to any testing in the near term, so don't block this on my account.

@kbrwn mentioned he was working with someone, but would need to check in again to try out the etcd rotation.

openssl x509 -in $CERT -noout -text > "${CERT%.crt}.txt"
done

# Use openssl for base64'ing instead of base64 which has different wrap behavior
Contributor

OpenSSL

Contributor Author

"openssl" is the tool, right?
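The `openssl x509` loop in the excerpt above dumps each certificate as text. A related sketch, assuming the generated assets sit in a directory such as `$CERT_DIR` with a `.crt` extension, prints only the subject and expiry of each certificate, which can be a quick sanity check after generating new ones. The `summarize_certs` helper name is made up for this example.

```shell
# Sketch: print subject and expiry for every certificate in a directory.
# ASSUMPTION: certificates are PEM files ending in .crt.
summarize_certs() {
    local dir="$1" cert
    for cert in "$dir"/*.crt; do
        [ -e "$cert" ] || continue   # no matches: the glob stays literal
        echo "== $cert"
        openssl x509 -in "$cert" -noout -subject -enddate
    done
}

# Example: summarize_certs "$CERT_DIR"
```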

```

This will generate several files in the current directory.
The scripts creates a directory of generated TLS assets. If you provided the etcd CA, this will include etcd certificates and manifest patches.
Contributor

@zbwright zbwright Apr 11, 2018

either 'script creates' or 'scripts create'. I believe it's the former.


## etcd
## Rotating certificates for Tectonic and Kubernetes components.
Contributor

no periods in headers

### Verify cluster health

First, verify the current health of the etcd cluster. Connect to one of the etcd members of the cluster using SSH.
__WARNING:__ The following commands MUST use `kubectl patch` and NOT other `kubectl` creation sub-commands.
Contributor

subcommands

```

etcd clusters should be configured to require client authentication. Therefore, we will need the existing CA certificate, and the client certificate and key for the cluster. These artifacts should be located in the `/etc/ssl/etcd` directory if the Tectonic cluster was set up using self-signed certificates.
To force the various deployments to restart and pick up the new TLS assets, force the rotation of the various components. Note that the API server may become temporarily unavailable after this action.
Contributor

To force the deployments to restart and pick up the new TLS assets, force the rotation of the deployments' components.
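The PR's actual restart commands are not visible in this thread. As a hedged illustration of one common way to force a Deployment to restart, the sketch below changes an annotation on the pod template, which causes the Deployment controller to roll out new pods. The annotation key `tls-rotated-at` and the `restart_deployment` helper are hypothetical.

```shell
# Sketch: trigger a rolling restart of a Deployment by changing a pod
# template annotation. The annotation key "tls-rotated-at" is hypothetical.
restart_deployment() {
    local namespace="$1" name="$2" stamp
    stamp="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    kubectl --namespace "$namespace" patch deployment "$name" --patch \
        "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"tls-rotated-at\":\"${stamp}\"}}}}}"
}

# Example (deployment name is illustrative):
#   restart_deployment kube-system kube-scheduler
```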

```

Generate the new certificate and private key using the CFSSL utility.
The addresses of a cluster's etcd instances be found by inspecting the API server's `--etcd-servers` flag.
Contributor

Inspect the API server's --etcd-servers flag to find the address of a cluster's etcd instances.

```
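For the `--etcd-servers` lookup discussed in the thread above, a minimal sketch: a filter that reads API server arguments on stdin and prints one etcd endpoint per line. The `etcd_endpoints` helper name and the pod label in the usage comment are assumptions.

```shell
# Sketch: pull the --etcd-servers value out of API server arguments read
# from stdin and print one endpoint per line.
etcd_endpoints() {
    grep -o -- '--etcd-servers=[^" ]*' | head -n 1 | cut -d= -f2- | tr ',' '\n'
}

# Example. ASSUMPTION: API server pods are labeled k8s-app=kube-apiserver.
#   kubectl -n kube-system get pods -l k8s-app=kube-apiserver -o yaml | etcd_endpoints
```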

### Client
Finally, for each etcd instance, rotate the peer and serving certs.
Contributor

: not .

@ericchiang
Contributor Author

Docs updated.

@ericchiang
Contributor Author

Okay it's been a bit. I'm merging this tomorrow afternoon unless someone says otherwise.

@ericchiang ericchiang merged commit 56f7173 into coreos:master Apr 20, 2018
@ericchiang ericchiang deleted the rotate-tls branch April 20, 2018 23:22

5 participants