Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hubble: fix enable/disable command when cilium-cli-helm-values does… #1020

Merged
merged 1 commit into from
Sep 19, 2022

Conversation

olga-mir
Copy link
Contributor

@olga-mir olga-mir commented Aug 7, 2022

… not exist

This commit fixes hubble enable and disable commands which are broken
when cilium initially was not installed with cilium-cli. In these cases
cilium-cli-helm-values secret does not exist and all cilium hubble
commands fail.

Since hubble components are cherry-picked from the generated helm manifests,
it should be safe to proceed without the full helm values state.

We should not write the new helm state to the cluster so that if in the
future other components need to use it, we don't inadvertently
over shadow real installation parameters.

Fixes: #959

Signed-off-by: Olga Mirensky 5200844+olga-mir@users.noreply.github.com

@olga-mir olga-mir requested a review from a team as a code owner August 7, 2022 09:04
@olga-mir olga-mir requested a review from michi-covalent August 7, 2022 09:04
@olga-mir olga-mir temporarily deployed to ci August 7, 2022 09:04 Inactive
Copy link
Contributor

@michi-covalent michi-covalent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! i'd like to get this reviewed by @gandro as well he's been looking into this area.

@michi-covalent michi-covalent requested a review from gandro August 7, 2022 09:34
@olga-mir olga-mir temporarily deployed to ci August 7, 2022 21:51 Inactive
Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

This PR falls back to using default configuration for the Cilium ConfigMap. That in turn will lead to unexpected results unfortunately: I tried this PR and it reset the whole Cilium configuration (e.g. disabling all custom features I enabled during install), which I don't think is what most users want. For example, I installed Cilium with PortMap chaining enabled (cni.chainingMode=portmap in Helm), and this was removed after I invoked cilium hubble enable.

The only way to support cilium hubble enable|disable against an unknown installation is that we ensure we do not replace any existing objects (such as the Cilium ConfigMap and DaemonSet), and instead only patch them to add whatever we need for Hubble. That requires a bit more work, but should be doable in practice.

@olga-mir
Copy link
Contributor Author

olga-mir commented Aug 8, 2022

@gandro thanks for testing and for your input. I am keen to have a deeper look into this, unless someone else can provide a faster resolution.
after last fixes in #928 and #964 hubble is the only place which still depends on the cilium-cli-helm-values secret, and even here it should be possible to work without it as you described above. If we can implement these flows without the helm secret in hubble, then it is probably worth getting rid of the secret altogether (don't create it in new clusters and don't look it up for the existing ones). If it can't be done safely then I reckon this use case should be declared as unsupported.
I'll keep working on this and should come back with my findings in a few days.

@gandro
Copy link
Member

gandro commented Aug 9, 2022

@olga-mir Thanks, makes sense. Our long-term plan for the CLI is to migrate it to use fully use Helm in the backend, so instead of having cilium-cli-helm-values, it will rely on Helm's state to perform things like changing configuration or enabling Hubble and essentially become a smart wrapper around helm install/helm upgrade.

Of course, that will require one to have Cilium installed via CLI or Helm (and not e.g. via a statically applied YAML), so the having a fallback which still works even if no Helm state is available would likely be nice to have regardless. But I'm not sure how feasible that will be.

@olga-mir olga-mir marked this pull request as draft August 9, 2022 21:21
@olga-mir olga-mir temporarily deployed to ci August 23, 2022 11:49 Inactive
@olga-mir
Copy link
Contributor Author

Tested this with kind cluster, with custom fields in the configmap which are different from defaults. This PR works in this setup, but I have further questions/concerns, so it's still draft:

  1. this implementation uses hard-coded list of values which is a bit brittle. We can assert minimum version, but how do we keep this updated for future versions? If you had another idea how to cherry-pick only hubble values let me know.
  2. it will proceed with unknown state even if the secret exists but failed to parse, probably unlikely, but possible. We could enumerate possible errors in getHelmState and only act on an error where the secret did not exist.
  3. Needs test and more testing.
  4. I am not sure if in general it is right to support this use case as it still creates a drift. I think this is still a good idea - the drift is minimal required to enable/disable hubble and makes it easier for people to work with cilium (it is not a proper way way to manage hubble in production-like environments anyway). If this is not valuable then I am happy to close the PR and just add a user friendly error message that explains this behavior is intentional.

@gandro
Copy link
Member

gandro commented Aug 23, 2022

Thanks a lot. The diff approach is pretty clever!

  1. this implementation uses hard-coded list of values which is a bit brittle. We can assert minimum version, but how do we keep this updated for future versions? If you had another idea how to cherry-pick only hubble values let me know.

Yeah, it's an unfortunate situation. Overall, I'm not too worried about the set of values, they are somewhat stable between releases, and often fall-back to a sensible default if a value is missing. If more Hubble-related values get added that break things, we'll hopefully catch it in CI.

  1. it will proceed with unknown state even if the secret exists but failed to parse, probably unlikely, but possible. We could enumerate possible errors in getHelmState and only act on an error where the secret did not exist.

If it's not too hard to implement, I think that would be ideal.

  1. Needs test and more testing.

There is also an issue with gofmt according to CI. CI also seems temporarily broken at the moment, but I think that's unrelated to this PR (it's unable to fetch the Helm charts).

  1. I am not sure if in general it is right to support this use case as it still creates a drift.

There is one shortcoming which just patching the ConfigMap - the cilium DaemonSet also needs to be patched depending on the original configuration, namely it needs to mount the Hubble TLS secrets:

https://github.com/cilium/cilium/blob/6d920b25ae069f58c9f98ab87b54c94e7218da05/install/kubernetes/cilium/templates/cilium-agent/daemonset.yaml#L380-L384

But I'm fine to address that in a follow-up PR. Not even sure how to filter out that particular section when diffing the deployment.

Maybe instead of automatically creating a patch based on the old and new Helm state, I would also be fine with having a hand-crafted JSON patch in case we don't have an old Helm state. And just try to apply that over the existing deployment. It would mean different paths depending on if we have the Helm state available or not, but might be more robust.

@olga-mir
Copy link
Contributor Author

olga-mir commented Sep 1, 2022

I think this is fine to leave it without version check. Looking at file blame on master, the changes to this section were done 2 and 3 years ago (with one exception 5 months ago which was only a whitespace change).

I didn't dig into having a hand-crafted JSON patch implementation, but it feels that it will be more awkward as it will require passing a new parameter along few functions calls, from generateManifestsEnable/Disable along with helmMapOpts variable and passed on until configmap patch. It doesn't look like we can just define two const patches for enable and disable.

As for the daemonset bug, it should be ok to leave for a follow-up PR because it is not a regression introduced in this PR. By default if no hubble parameters are specified, hubble is enabled both in helm install and cilium install and hubble tls is mounted in cilium agent. Only if hubble was explicitly disabled at install time (e.g. cilium install --helm-set hubble.enabled=false) then with current cilium-cli cilium hubble enable will not patch the ds to mount the certs.

I did find another issue though which is a regression compared to the current behavior when cilium is installed with cilium-cli.

On released cilium-cli:

$ cilium install
$ cilium hubble enable --ui # works

On this PR:

$ helm install ...
$ ./cilium hubble enable --ui
...
Error: Unable to enable Hubble: services "hubble-peer" already exists

This actually looks like cilium install bug, it should have created hubble-peer service, like helm install does.

Other than that it seems ok, the configmap values are preserved and cilium hubble [enable|disable] commands are working, also when invoked multiple times.

@olga-mir olga-mir marked this pull request as ready for review September 1, 2022 14:29
@olga-mir olga-mir requested a review from a team as a code owner September 1, 2022 14:29

// all hubble values from `cilium-config“ configmap
// https://github.com/cilium/cilium/blob/d9a04be9d714e5f5544cbca7ef8db7a151bfce96/install/kubernetes/cilium/templates/cilium-configmap.yaml#L709-L750
var HubbleKeys = []string{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a quick comment as to what this array is for? And, perhaps, a follow-up PR in the helm chart that points users to this code if they add more hubble config keys?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. On second thought - a key is a hubble related key if and only if it has hubble substring in its name, at least up until now this was always true. Would it be less hacky if we just check if the string contains hubble?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bah, I'm not sure. @chancez?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I like that proposal. Feel free to implement. But the manually maintained list also is fine, I don't expect too much churn.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure to be honest. I'd have to review all the flags in more detail. I think if your checking only for hubble then you will also catch the hubble-relay options, so that might be an issue?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hubble Relay options are stored in a different configmap. This map here is only read by the Cilium Agent and the Operator (the latter does not parse any Hubble values).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I didn't read the whole context, that makes sense.

@maintainer-s-little-helper
Copy link

Commit 6a64764 does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot!

@gandro
Copy link
Member

gandro commented Sep 5, 2022

Commit 6a64764 does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

This needs to be addressed before the PR can be merged

@olga-mir olga-mir force-pushed the fix/issue-959-fix-hubble-cmd branch from 7cad3d4 to 854c3ca Compare September 9, 2022 20:31
@olga-mir olga-mir temporarily deployed to ci September 9, 2022 20:31 Inactive
@olga-mir olga-mir temporarily deployed to ci September 9, 2022 20:45 Inactive
@olga-mir
Copy link
Contributor Author

olga-mir commented Sep 9, 2022

the newbie mistake - merge commit caused by “update branch” in UI. I’ve removed the commit and had to force push the branch.

@gandro
Copy link
Member

gandro commented Sep 15, 2022

@olga-mir There seems to be an issue with vendoring:

please run 'go mod tidy && go mod vendor', and submit your changes

Might be due to jsonpatch? Or is this a bug in CI? 🤔

@olga-mir olga-mir requested a review from a team as a code owner September 15, 2022 13:12
@olga-mir olga-mir requested review from rolinh and removed request for a team September 15, 2022 13:12
@olga-mir olga-mir temporarily deployed to ci September 15, 2022 13:12 Inactive
Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks! If there are no further changes to the code, could you squash all the commits into one?

… not exist

This commit fixes hubble enable and disable commands which are broken
when cilium initially was not installed with cilium-cli. In these cases
`cilium-cli-helm-values` secret does not exist and all `cilium hubble`
commands fail.

Since hubble components are cherry-picked from the generated helm manifests,
it should be safe to proceed without the full helm values state.

We should not write the new helm state to the cluster so that if in the
future other components need to use it, we don't inadvertently
over shadow real installation parameters.

Fixes: cilium#959

Signed-off-by: Olga Mirensky <5200844+olga-mir@users.noreply.github.com>
@olga-mir olga-mir force-pushed the fix/issue-959-fix-hubble-cmd branch from fdc5e5a to a0846c1 Compare September 16, 2022 08:56
@olga-mir olga-mir temporarily deployed to ci September 16, 2022 08:56 Inactive
@olga-mir
Copy link
Contributor Author

there is no further changes to this PR, if we can leave this #1020 (comment) improvement to a follow up PR.

@gandro
Copy link
Member

gandro commented Sep 19, 2022

there is no further changes to this PR, if we can leave this #1020 (comment) improvement to a follow up PR.

Agreed, that can be done in a follow-up. Let's run CI and then this should be ready to merge. Thanks again @olga-mir!

@gandro gandro added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Sep 19, 2022
@tklauser tklauser merged commit 0e01dc5 into cilium:master Sep 19, 2022
@sayboras
Copy link
Member

Thanks a lot for your fix 💯 , this issue was annoying as part of day to day cli usage 🎖️

@olga-mir olga-mir deleted the fix/issue-959-fix-hubble-cmd branch October 15, 2022 04:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot enable Hubble if Cilium is originally deployed with Hubble disabled
7 participants