Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TLS certificates hot reloading for Hubble and Relay #13249

Merged
merged 4 commits into from
Oct 6, 2020

Conversation

kaworu
Copy link
Member

@kaworu kaworu commented Sep 22, 2020

Fix #13092

@kaworu kaworu added wip priority/medium This is considered important, but not urgent. kind/feature This introduces new functionality. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/hubble Impacts hubble server or relay labels Sep 22, 2020
@kaworu kaworu self-assigned this Sep 22, 2020
@kaworu kaworu force-pushed the pr/kaworu/tls-files-hot-reloading branch 4 times, most recently from 0749f4e to 8332879 Compare September 22, 2020 15:17
Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. I know this is WIP, but a very simple unit test which creates files via ioutil.TempDir would be great to have as well.

The current code assumes that the files already exist (which is fine for the hot-reloading use-case). I wonder however if we could have the interface also work somehow where you would only get a WatchedConfig pointer when the files actually exist and the certs/keys have been read successfully. For example, NewWatchedConfig could return a <-chan *WatchedConfig.

Copy link
Member

@rolinh rolinh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is still WIP but I left a few comments nonetheless. I have some ideas to make this a bit more generic but I'd rather take this offline rather than trying to detail it here.

pkg/tls/config.go Outdated Show resolved Hide resolved
pkg/tls/config.go Outdated Show resolved Hide resolved
pkg/tls/config.go Outdated Show resolved Hide resolved
@kaworu kaworu force-pushed the pr/kaworu/tls-files-hot-reloading branch 10 times, most recently from 3c8287c to e24ff0f Compare September 25, 2020 12:18
@kaworu kaworu removed the wip label Sep 25, 2020
@kaworu kaworu marked this pull request as ready for review September 25, 2020 12:22
@kaworu kaworu requested a review from a team September 25, 2020 12:22
@kaworu kaworu requested a review from a team as a code owner September 25, 2020 12:22
@kaworu kaworu force-pushed the pr/kaworu/tls-files-hot-reloading branch from 5c182c5 to f5625df Compare October 5, 2020 16:33
The certloader package aim to provide a facility to ease dynamic
tls.Config handling.

The main goal is to "smoothly" handle TLS certificates rotation, i.e.
without service interruption and without severing ongoing connections
(aka "hot reloading").

This initial implementation brings support for file-backed TLS
configuration watched for changes through fsnotify. The server and
client configurations are separated into different interfaces to avoid
misuse.

Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
Preparation for tls certificates hot reloading.

Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
Signed-off-by: Alexandre Perrin <alex@kaworu.ch>
@kaworu
Copy link
Member Author

kaworu commented Oct 5, 2020

Unit tests added.

I think overall we have a good enough coverage of the certloader package. There are a couple of untested scenarios outside of our current use-cases; like non-mutual TLS client/server configuration generation, partial Reload (because k8s update all the files at once through symlink'ing), etc.

@kaworu kaworu requested a review from rolinh October 5, 2020 16:39
@kaworu
Copy link
Member Author

kaworu commented Oct 5, 2020

test-me-please

Copy link
Member

@rolinh rolinh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TLDR: all of this looks very solid to me! Nice work! 🚀

I tested this feature extensively by triggering certs generation via helm upgrade:

$ for name in $(kubectl -n kube-system get secrets -l app.kubernetes.io/managed-by=Helm -o json | jq ".items[].metadata.name"); do echo $name; kubectl -n kube-system get secrets $(echo $name | tr -d \" ) -o json | jq -r '.data["tls.crt"]' | base64 -d | openssl x509 -noout -text | grep -A 1 Serial; done 
"hubble-relay-client-certs"
        Serial Number:
            8c:5f:2d:3f:3e:6f:39:91:85:b9:25:32:45:21:b5:7c
"hubble-relay-server-certs"
        Serial Number:
            9b:15:56:f0:c6:5d:41:b9:df:84:a4:aa:48:29:06:6a
"hubble-server-certs"
        Serial Number:
            ea:38:8f:13:f8:f2:40:1a:a5:66:32:b2:9b:db:4b:25
$ helm upgrade cilium ./cilium --namespace kube-system --set global.nodeinit.enabled=true --set global.kubeProxyReplacement=partial --set global.hostServices.enabled=false --set global.externalIPs.enabled=true --set global.nodePort.enabled=true --set global.hostPort.enabled=true --set global.pullPolicy=IfNotPresent --set config.ipam=kubernetes --set global.hubble.enabled=true --set global.hubble.listenAddress=":4244" --set global.hubble.metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,http}" --set global.hubble.relay.enabled=true --set global.hubble.ui.enabled=true --set global.hubble.relay.tls.enabled=true
$ for name in $(kubectl -n kube-system get secrets -l app.kubernetes.io/managed-by=Helm -o json | jq ".items[].metadata.name"); do echo $name; kubectl -n kube-system get secrets $(echo $name | tr -d \" ) -o json | jq -r '.data["tls.crt"]' | base64 -d | openssl x509 -noout -text | grep -A 1 Serial; done
"hubble-relay-client-certs"
        Serial Number:
            05:de:64:f1:21:31:bc:e2:36:4c:2a:7f:3b:c1:de:51
"hubble-relay-server-certs"
        Serial Number:
            7a:48:15:f7:96:4b:69:f4:fe:d1:e2:f1:75:51:eb:78
"hubble-server-certs"
        Serial Number:
            c6:48:98:76:c1:04:df:4b:78:89:9c:a0:5e:72:7f:c5

From the above output, we can see that each cert was regenerated. In the meantime, I ran hubble observe -f against Hubble Relay: no connection loss (which is expected).

I then deleted a cilium pod to force a re-connection from Hubble Relay with the new certs: success!

Next test is to kubectl exec into a cilium pod that was not restarted after a helm upgrade ... (which generates new certs) and ensure that it now uses the new cert:

# openssl s_client -connect localhost:4244 -showcerts 2>/dev/null | openssl x509 -noout -text | grep -A 1 Serial 
        Serial Number:
            c6:48:98:76:c1:04:df:4b:78:89:9c:a0:5e:72:7f:c5

@rolinh rolinh removed the priority/medium This is considered important, but not urgent. label Oct 6, 2020
Copy link
Member

@fristonio fristonio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!!! LGTM 🚀 🚀 :shipit:

@rolinh
Copy link
Member

rolinh commented Oct 6, 2020

netnext failure is a flake, marking as ready to merge.

@rolinh rolinh added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Oct 6, 2020
@gandro gandro merged commit 8a1d6d0 into master Oct 6, 2020
@gandro gandro deleted the pr/kaworu/tls-files-hot-reloading branch October 6, 2020 12:47
gandro added a commit that referenced this pull request Oct 7, 2020
Since #13249, if Hubble is configured to serve the API via mTLS, it is
able to delay starting the server until the certificates are present on
the file system.

As such, we can now mark the mounts for the certificates as optional, as
the we can already start Cilium agent without Hubble server. The Hubble
server will be started automatically once the TLS secrets are ready and
the files are mounted by Kubernetes.

The current Hubble server implementation prints an info log message if
the certificate files are not present after 30 seconds. This allows
users to troubleshoot if the Hubble server is not started due to missing
certificates.

The mounts are still required for Hubble Relay. In contrast to
cilium-agent, it cannot operate without the certificate present.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro added a commit that referenced this pull request Oct 13, 2020
Since #13249, if Hubble is configured to serve the API via mTLS, it is
able to delay starting the server until the certificates are present on
the file system.

As such, we can now mark the mounts for the certificates as optional, as
the we can already start Cilium agent without Hubble server. The Hubble
server will be started automatically once the TLS secrets are ready and
the files are mounted by Kubernetes.

The current Hubble server implementation prints an info log message if
the certificate files are not present after 30 seconds. This allows
users to troubleshoot if the Hubble server is not started due to missing
certificates.

The mounts are still required for Hubble Relay. In contrast to
cilium-agent, it cannot operate without the certificate present.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro added a commit that referenced this pull request Oct 15, 2020
Since #13249, if Hubble is configured to serve the API via mTLS, it is
able to delay starting the server until the certificates are present on
the file system.

As such, we can now mark the mounts for the certificates as optional, as
the we can already start Cilium agent without Hubble server. The Hubble
server will be started automatically once the TLS secrets are ready and
the files are mounted by Kubernetes.

The current Hubble server implementation prints an info log message if
the certificate files are not present after 30 seconds. This allows
users to troubleshoot if the Hubble server is not started due to missing
certificates.

The mounts are still required for Hubble Relay. In contrast to
cilium-agent, it cannot operate without the certificate present.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro added a commit that referenced this pull request Oct 15, 2020
This adds a new method to automatically generate mTLS certificates for
Hubble at run-time without relying on Helm. This for example is useful
when a pregenerated YAML manifest (e.g.  `experimentall-install.yaml`)
which with the Helm based would re-use the same set of certificates for
all installations.

This new approach generates the certificates at run-time instead. When
`hubble.tls.auto.method` is set to `cronJob`, the generated YAML will
create a Kubernetes Job to generate the certificates at installation
time. The job is running in host namespace and will store the
certificates as Kubernetes secrets. Once the secerts are creatd,
Kubernetes will automatically mount the generated certificates into the
already running Cilium pod, allowing it to pick them up and start
serving the Hubble API with mTLS (c.f. #13249).

In addition to the one-shot job to generate the initial set of
certificates, an additional recurring Kubernetes CronJob is deployed to
regenerate the certificate in a regular interval (regardless of the
expiration date). This cronjob can be optionally disabled in Helm by
unsetting `hubble.tls.auto.cronJob.schedule`.

The source code for the cron job docker image can be found at
https://github.com/cilium/certgen.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro added a commit that referenced this pull request Oct 16, 2020
Since #13249, if Hubble is configured to serve the API via mTLS, it is
able to delay starting the server until the certificates are present on
the file system.

As such, we can now mark the mounts for the certificates as optional, as
the we can already start Cilium agent without Hubble server. The Hubble
server will be started automatically once the TLS secrets are ready and
the files are mounted by Kubernetes.

The current Hubble server implementation prints an info log message if
the certificate files are not present after 30 seconds. This allows
users to troubleshoot if the Hubble server is not started due to missing
certificates.

The mounts are still required for Hubble Relay. In contrast to
cilium-agent, it cannot operate without the certificate present.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro added a commit that referenced this pull request Oct 16, 2020
This adds a new method to automatically generate mTLS certificates for
Hubble at run-time without relying on Helm. This for example is useful
when a pregenerated YAML manifest (e.g.  `experimentall-install.yaml`)
which with the Helm based would re-use the same set of certificates for
all installations.

This new approach generates the certificates at run-time instead. When
`hubble.tls.auto.method` is set to `cronJob`, the generated YAML will
create a Kubernetes Job to generate the certificates at installation
time. The job is running in host namespace and will store the
certificates as Kubernetes secrets. Once the secerts are creatd,
Kubernetes will automatically mount the generated certificates into the
already running Cilium pod, allowing it to pick them up and start
serving the Hubble API with mTLS (c.f. #13249).

In addition to the one-shot job to generate the initial set of
certificates, an additional recurring Kubernetes CronJob is deployed to
regenerate the certificate in a regular interval (regardless of the
expiration date). This cronjob can be optionally disabled in Helm by
unsetting `hubble.tls.auto.cronJob.schedule`.

The source code for the cron job docker image can be found at
https://github.com/cilium/certgen.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
joestringer pushed a commit that referenced this pull request Oct 16, 2020
Since #13249, if Hubble is configured to serve the API via mTLS, it is
able to delay starting the server until the certificates are present on
the file system.

As such, we can now mark the mounts for the certificates as optional, as
the we can already start Cilium agent without Hubble server. The Hubble
server will be started automatically once the TLS secrets are ready and
the files are mounted by Kubernetes.

The current Hubble server implementation prints an info log message if
the certificate files are not present after 30 seconds. This allows
users to troubleshoot if the Hubble server is not started due to missing
certificates.

The mounts are still required for Hubble Relay. In contrast to
cilium-agent, it cannot operate without the certificate present.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
joestringer pushed a commit that referenced this pull request Oct 16, 2020
This adds a new method to automatically generate mTLS certificates for
Hubble at run-time without relying on Helm. This for example is useful
when a pregenerated YAML manifest (e.g.  `experimentall-install.yaml`)
which with the Helm based would re-use the same set of certificates for
all installations.

This new approach generates the certificates at run-time instead. When
`hubble.tls.auto.method` is set to `cronJob`, the generated YAML will
create a Kubernetes Job to generate the certificates at installation
time. The job is running in host namespace and will store the
certificates as Kubernetes secrets. Once the secerts are creatd,
Kubernetes will automatically mount the generated certificates into the
already running Cilium pod, allowing it to pick them up and start
serving the Hubble API with mTLS (c.f. #13249).

In addition to the one-shot job to generate the initial set of
certificates, an additional recurring Kubernetes CronJob is deployed to
regenerate the certificate in a regular interval (regardless of the
expiration date). This cronjob can be optionally disabled in Helm by
unsetting `hubble.tls.auto.cronJob.schedule`.

The source code for the cron job docker image can be found at
https://github.com/cilium/certgen.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature This introduces new functionality. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/hubble Impacts hubble server or relay
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Hubble: add support for TLS certificates hot reloading
6 participants