Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

install: Provide quick-hubble-install.yaml for Relay and UI #14221

Merged

Conversation

gandro
Copy link
Member

@gandro gandro commented Nov 30, 2020

This PR aims to streamline the installation of Hubble (Relay and UI) a bit by providing a quick-hubble-install.yaml which can be applied on top of the existing quick-install.yaml:

$ kubectl create -f quick-install.yaml
serviceaccount/cilium created
serviceaccount/cilium-operator created
configmap/cilium-config created
clusterrole.rbac.authorization.k8s.io/cilium created
clusterrole.rbac.authorization.k8s.io/cilium-operator created
clusterrolebinding.rbac.authorization.k8s.io/cilium created
clusterrolebinding.rbac.authorization.k8s.io/cilium-operator created
daemonset.apps/cilium created
deployment.apps/cilium-operator created
$ kubectl create -f quick-hubble-install.yaml 
serviceaccount/hubble-generate-certs created
serviceaccount/hubble-relay created
serviceaccount/hubble-ui created
configmap/hubble-relay-config created
configmap/hubble-ui-envoy created
clusterrole.rbac.authorization.k8s.io/hubble-generate-certs created
clusterrole.rbac.authorization.k8s.io/hubble-relay created
clusterrole.rbac.authorization.k8s.io/hubble-ui created
clusterrolebinding.rbac.authorization.k8s.io/hubble-generate-certs created
clusterrolebinding.rbac.authorization.k8s.io/hubble-relay created
clusterrolebinding.rbac.authorization.k8s.io/hubble-ui created
service/hubble-relay created
service/hubble-ui created
deployment.apps/hubble-relay created
deployment.apps/hubble-ui created
job.batch/hubble-generate-certs created
cronjob.batch/hubble-generate-certs created

To make the above work, the listenAddress needs to be set by default, as otherwise installing Relay requires modifications to the cilium DaemonSet. The Hubble listen address is mTLS protected, therefore the overall operational impact of this should be low:

hubble.listenAddress can safely be enabled without providing the mTLS certificates and keys used to protect it. In that case, the port will not be opened until the certificates are provided (e.g. by creating the necessary K8s secrets), which can be done without a restart of cilium-agent.

When installing Cilium via Helm, the certificates and keys are automatically generated at installation time (hubble.tls.auto).
If Cilium is installed via quick-install.yaml, the certificates and keys are not part of the pre-generated YAML to avoid accidental key-reuse. They will be generated via certgen in the quick-hubble-install.yaml instead.

The documentation is updated to refer to the quick-hubble-install.yaml where it makes sense. See commit messages for more details.

Unresolved questions:

  • Do we want to backport this to 1.9?
  • If we do want to backport this to 1.9, the documentation likely needs further tweaks. Otherwise a user might try to apply the quick-hubble-install.yaml over an older 1.9.0 installation, which will not work (as listenAddress is not set in 1.9.0).

The `certloader` package used by Hubble to load the certificates is
watching the file system for certificate updates. In case certificates
do not exist yet, it is able to hold off creating the API endpoint until
the certificates are created (e.g. by the `certgen` K8s cronjob).

With this change, the daemon will print the message about missing
certificates only once, instead of every 30 seconds. This is to simplify
the deployment of Hubble Relay (see subsequent commits).

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
To use Hubble Relay for cluster-wide visibility, the Cilium agent needs
to expose the Hubble API on a (mTLS-protected) port, which is done via
the `listenAddress` Helm value.

Unfortunately, enabling that port requires changes to the Cilium agent
DaemonSet, making it quite involved to enable Hubble Relay on an
existing installation if Cilium was installed without Helm.

This commit therefore enables `hubble.listenAddress` by default. The
address is mTLS protected, therefore the overall operational impact of
this is low.

`hubble.listenAddress` can safely be enabled without providing the mTLS
certificates and keys used to protect it. In that case, the port will
not be opened until the certificates are provided (e.g. by creating the
necessary K8s secrets), which can be done without a restart of
cilium-agent.

When installing Cilium via Helm, the certificates and keys are
automatically generated at installation time (`hubble.tls.auto`).

If Cilium is installed via quick-install.yaml, the certificates and keys
are _not_ part of the pre-generated YAML to avoid accidental key-reuse.
However, they can easily be generated after installation either manually
with `kubectl create secret` or via `certgen`. Helper commands and
documentation for this are added in subsequent commits.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This commit adds a new `quick-hubble-install.yaml` which is intended to
be installed on top of a existing `quick-install.yaml` to deploy Hubble
Relay and UI. It deploys `certgen` to create the necessary certificates
in the cluster.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
@gandro gandro added release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/hubble Impacts hubble server or relay area/helm Impacts helm charts and user deployment experience labels Nov 30, 2020
@gandro gandro requested review from a team as code owners November 30, 2020 15:57
@gandro gandro requested review from a team, qmonnet, kaworu and nathanjsweet November 30, 2020 15:57
@maintainer-s-little-helper maintainer-s-little-helper bot added this to In progress in 1.10.0 Nov 30, 2020
@gandro gandro force-pushed the pr/gandro/enable-hubble-listen-address-by-default branch 2 times, most recently from d7ba39a to 75f993e Compare November 30, 2020 16:06
Copy link
Contributor

@michi-covalent michi-covalent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm just one minor question 🚀

Documentation/spelling_wordlist.txt Outdated Show resolved Hide resolved
Documentation/gettingstarted/hubble-enable.rst Outdated Show resolved Hide resolved
@gandro gandro force-pushed the pr/gandro/enable-hubble-listen-address-by-default branch from 75f993e to 5f00eee Compare November 30, 2020 16:15
@@ -159,6 +159,7 @@ contributors across the globe, there is almost always someone available to help.
| hostServices.enabled | bool | `false` | Enable host reachable services. |
| hostServices.protocols | string | `"tcp,udp"` | Supported list of protocols to apply ClusterIP translation to. |
| hubble.enabled | bool | `true` | Enable Hubble (true by default). |
| hubble.listenAddress | string | `":4244"` | An additional address for Hubble to listen to. Set this field ":4244" if you are enabling Hubble Relay, as it assumes that Hubble is listening on port 4244. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not directly related to this PR, but i guess relay can now use hubble.listenAddress to figure out which port hubble is listening on instead of assuming 4244 cc @rolinh

Copy link
Member

@rolinh rolinh Dec 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hubble Relay is relying solely on the peer service and making it rely on other sources could create inconsistencies.
IIRC, the peer service already uses this value and assume it's the same one for every peer (I'd need to check to be sure, it could also just be using the default value for the port in the code).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rolinh The peer service doesn't provide the port in the ChangeNotification, only the IP address:

// nodeAddress returns the node's address. If the node has both IPv4 and IPv6
// addresses, IPv4 takes priority.
func nodeAddress(n types.Node) string {
addr := n.GetNodeIP(false)
if addr.To4() != nil {
return addr.String()
}
addr = n.GetNodeIP(true)
if addr == nil {
return ""
}
return addr.String()
}

Then relay assume the default port (hardcoded constant):
Port: defaults.ServerPort,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kaworu Ah right, I remember now. I think this information should be added to the peer service. IIRC, Relay uses the default value for the port if the peer service does not provide this information already.

Copy link
Member

@kaworu kaworu Dec 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gandro
Copy link
Member Author

gandro commented Nov 30, 2020

test-me-please

Copy link
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me 👍

Copy link
Member

@rolinh rolinh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome, thanks for taking care of this including updating the documentation!! 🚀
Everything looks good to me.

@@ -159,6 +159,7 @@ contributors across the globe, there is almost always someone available to help.
| hostServices.enabled | bool | `false` | Enable host reachable services. |
| hostServices.protocols | string | `"tcp,udp"` | Supported list of protocols to apply ClusterIP translation to. |
| hubble.enabled | bool | `true` | Enable Hubble (true by default). |
| hubble.listenAddress | string | `":4244"` | An additional address for Hubble to listen to. Set this field ":4244" if you are enabling Hubble Relay, as it assumes that Hubble is listening on port 4244. |
Copy link
Member

@rolinh rolinh Dec 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hubble Relay is relying solely on the peer service and making it rely on other sources could create inconsistencies.
IIRC, the peer service already uses this value and assume it's the same one for every peer (I'd need to check to be sure, it could also just be using the default value for the port in the code).

Copy link
Member

@sayboras sayboras left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Can you also help to add one more entry to avoid drift ?

- name: quick-install
template: install/kubernetes/quick-install.yaml
- name: experimental-install
template: install/kubernetes/experimental-install.yaml

@gandro gandro requested a review from a team as a code owner December 1, 2020 12:21
@gandro gandro requested a review from kkourt December 1, 2020 12:21
@gandro
Copy link
Member Author

gandro commented Dec 1, 2020

LGTM.

Can you also help to add one more entry to avoid drift ?

Done! Thanks!

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This adds the `quick-hubble-install.yaml` installation method to the
"Enable Hubble" section.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
This adds the hubble-enable.rst section to more installation guides.

The Hubble section is still missing from some installation methods, such
as kops, microk8s, and okd/openshift, as they rely neither on `helm
install`, nor `quick-install.yaml` and likely need custom instructions.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
@gandro gandro force-pushed the pr/gandro/enable-hubble-listen-address-by-default branch from d7675f5 to 70a2acb Compare December 1, 2020 12:28
@gandro
Copy link
Member Author

gandro commented Dec 1, 2020

test-me-please

@gandro gandro added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Dec 2, 2020
@joestringer joestringer merged commit 1c55bc2 into master Dec 2, 2020
@joestringer joestringer deleted the pr/gandro/enable-hubble-listen-address-by-default branch December 2, 2020 23:53
@joestringer
Copy link
Member

Do we still need experimental-install.yaml?

@rolinh
Copy link
Member

rolinh commented Dec 3, 2020

Do we still need experimental-install.yaml?

I'd say so, yes. This allows to enable beta/experimental features to make it easier for users to try out.

@joestringer
Copy link
Member

Do we want to backport this to 1.9?

This seems like a pretty solid usability win. I've seen users on Slack who install Cilium via kops or use GKE dataplane v2 wondering how they can enable hubble in their clusters without reworking the way that they deploy Cilium. I presume those are the kind of cases that this PR is aimed to simplify?

There's certainly an open question about default-deploying with the additional socket. Performance-wise I don't think it will make a meaningful difference as hubble is already enabled by default in the agent. Security-wise I understand we require mTLS on the socket but I don't have too much more understanding of the implications on that side.

The other side is that we won't do 1.10 for a few months so Hubble will be naturally a little harder to deploy during that period. Then the sooner we make the change, the more impact that choice has.

If we do want to backport this to 1.9, the documentation likely needs further tweaks. Otherwise a user might try to apply the quick-hubble-install.yaml over an older 1.9.0 installation, which will not work (as listenAddress is not set in 1.9.0).

For this reason, if we decide to backport, I would suggest splitting the functional changes from the docs, do the functional changes first, then release 1.9.2, then backport & fix up the docs. Otherwise we would need to add "Upgrade to 1.9.2" type statements into the docs prior to the 1.9.2 release, which would be immediately rendered on http://docs.cilium.io/en/v1.9 , and would be very confusing for the period between merging the changes and actually releasing them.

@rolinh
Copy link
Member

rolinh commented Dec 8, 2020

I presume those are the kind of cases that this PR is aimed to simplify?

Exactly.

I also think it's a very solid usability win. Without this PR, it's virtually impossible for users who don't use Helm to enable Hubble Relay/Hubble UI.

Security-wise I understand we require mTLS on the socket but I don't have too much more understanding of the implications on that side.

This means a TLS certificate trusted by the CA needs to be presented by a client to connect to a Hubble endpoint or the access is denied. Only Hubble Relay is allowed access. The important point to note though is that until a TLS keypair is added for Hubble server, the agent won't listen on port :4244.

@gandro
Copy link
Member Author

gandro commented Dec 8, 2020

The important point to note though is that until a TLS keypair is added for Hubble server, the agent won't listen on port :4244.

Yes. With the way this PR is set up right now, we are creating the mTLS secrets via Helm even if the user will never install Relay. This is something we could and probably should tweak a bit and ensure that the server certificates are only generated by Helm if hubble.relay=enabled.

In that case, the Hubble API port will never even be opened until the secrets are created (either by running helm upgrade --reuse-values --set hubble.relay=enabled or by applying quick-hubble-install.yaml), but it can be opened without a agent restart. This works because the daemon is watching the file system for certificates:

cilium/daemon/cmd/hubble.go

Lines 188 to 197 in 70a2acb

waitingMsgTimeout := time.After(30 * time.Second)
for tlsServerConfig == nil {
select {
case tlsServerConfig = <-tlsServerConfigChan:
case <-waitingMsgTimeout:
logger.Info("Waiting for Hubble server TLS certificate and key files to be created")
case <-d.ctx.Done():
return
}
}

For this reason, if we decide to backport, I would suggest splitting the functional changes from the docs, do the functional changes first, then release 1.9.2, then backport & fix up the docs.

Very good point!

@rolinh
Copy link
Member

rolinh commented Dec 8, 2020

This is something we could and probably tweak a bit and ensure that the server certificates are only generated by Helm if hubble.relay=enabled.

This is a very good point. I think it indeed makes sense to add an {{ if .Values.hubble.relay.enabled }} around the TLS secrets because we only want Hubble Relay to consume from TCP 4244 on agent instances. This also implies that users who want to open port 4244 on the agent will be forced to enable Hubble Relay. I think this is fine as we want to discourage listening on 4244 if not using Hubble Relay but we may want to be more loose about this although I don't see a valid use-case for it.

gandro added a commit that referenced this pull request Dec 14, 2020
The Hubble API exposed by Cilium Agent and used by Hubble Relay is
protected by mTLS by default. Since #14221, the Cilium DaemonSet is
configured to open that API endpoint by default, this makes the
deployment of Relay on top of an already existing Cilium installation
easier.

However, because the Hubble API is only ever accessed by Relay, there is
no point in deploying the mTLS secrets before Relay is deployed. Thus,
this commit only deploys the Helm-generated mTLS secrets when Relay is
enabled.

This also means that listening on the Hubble API port is will be delayed
until the certs are in created in Kubernetes. Once the certs are
created, the port will be opened without a pod restart, as we are
watching for the certificate creation on the volume mount via the
`certloader` package.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro added a commit that referenced this pull request Dec 15, 2020
The Hubble API exposed by Cilium Agent and used by Hubble Relay is
protected by mTLS by default. Since #14221, the Cilium DaemonSet is
configured to open that API endpoint by default, this makes the
deployment of Relay on top of an already existing Cilium installation
easier.

However, because the Hubble API is only ever accessed by Relay, there is
no point in deploying the mTLS secrets before Relay is deployed. Thus,
this commit only deploys the Helm-generated mTLS secrets when Relay is
enabled.

This also means that listening on the Hubble API port is will be delayed
until the certs are in created in Kubernetes. Once the certs are
created, the port will be opened without a pod restart, as we are
watching for the certificate creation on the volume mount via the
`certloader` package.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro added a commit that referenced this pull request Dec 17, 2020
The Hubble API exposed by Cilium Agent and used by Hubble Relay is
protected by mTLS by default. Since #14221, the Cilium DaemonSet is
configured to open that API endpoint by default, this makes the
deployment of Relay on top of an already existing Cilium installation
easier.

However, because the Hubble API is only ever accessed by Relay, there is
no point in deploying the mTLS secrets before Relay is deployed. Thus,
this commit only deploys the Helm-generated mTLS secrets when Relay is
enabled.

This also means that listening on the Hubble API port is will be delayed
until the certs are in created in Kubernetes. Once the certs are
created, the port will be opened without a pod restart, as we are
watching for the certificate creation on the volume mount via the
`certloader` package.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
gandro added a commit that referenced this pull request Dec 17, 2020
[ upstream commit de8b14e ]

The Hubble API exposed by Cilium Agent and used by Hubble Relay is
protected by mTLS by default. Since #14221, the Cilium DaemonSet is
configured to open that API endpoint by default, this makes the
deployment of Relay on top of an already existing Cilium installation
easier.

However, because the Hubble API is only ever accessed by Relay, there is
no point in deploying the mTLS secrets before Relay is deployed. Thus,
this commit only deploys the Helm-generated mTLS secrets when Relay is
enabled.

This also means that listening on the Hubble API port is will be delayed
until the certs are in created in Kubernetes. Once the certs are
created, the port will be opened without a pod restart, as we are
watching for the certificate creation on the volume mount via the
`certloader` package.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
aanm pushed a commit that referenced this pull request Dec 21, 2020
[ upstream commit de8b14e ]

The Hubble API exposed by Cilium Agent and used by Hubble Relay is
protected by mTLS by default. Since #14221, the Cilium DaemonSet is
configured to open that API endpoint by default, this makes the
deployment of Relay on top of an already existing Cilium installation
easier.

However, because the Hubble API is only ever accessed by Relay, there is
no point in deploying the mTLS secrets before Relay is deployed. Thus,
this commit only deploys the Helm-generated mTLS secrets when Relay is
enabled.

This also means that listening on the Hubble API port is will be delayed
until the certs are in created in Kubernetes. Once the certs are
created, the port will be opened without a pod restart, as we are
watching for the certificate creation on the volume mount via the
`certloader` package.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/helm Impacts helm charts and user deployment experience ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/hubble Impacts hubble server or relay
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

10 participants