Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

helm: Add example dashboards as chart options #23794

Merged
merged 1 commit into from Mar 13, 2023
Merged

helm: Add example dashboards as chart options #23794

merged 1 commit into from Mar 13, 2023

Conversation

jcpunk
Copy link
Contributor

@jcpunk jcpunk commented Feb 15, 2023

With this patch, helm users can enable reporting on their clusters with example grafana dashboards.

These dashboards were copied from examples/kubernetes/addons/prometheus/files/grafana-dashboards/

Fixes: #21921

Integration of sample dashboards with Helm chart

@jcpunk jcpunk requested review from a team as code owners February 15, 2023 20:53
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 15, 2023
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Feb 15, 2023
Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I'll ping some folks more familiar with the Grafana integration to also take a look at this.

Since you copied the Dashboards from examples - I think it would make sense to remove them there and update the docs to the new Helm chart options instead. Otherwise we would have to maintain two copies of the dashboards in the tree.

@gandro gandro added release-note/minor This PR changes functionality that users may find relevant to operating Cilium. area/metrics Impacts statistics / metrics gathering, eg via Prometheus. area/helm Impacts helm charts and user deployment experience labels Feb 16, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Feb 16, 2023
Copy link
Contributor

@lambdanis lambdanis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you copied the Dashboards from examples - I think it would make sense to remove them there and update the docs to the new Helm chart options instead. Otherwise we would have to maintain two copies of the dashboards in the tree.

+1, I don't how files in examples are used (are there any scripts or docs depending on them?), but maintaining two copies of each dashboard should be avoidable.

I'm not sure what's the quality of these "example" dashboards, @tommyp1ckles might have more input on that. If a dashboard is included in a Helm chart, I would expect it to be well-thought with prod usage in mind, not just a dump of metrics. I know such dashboards for Cilium exist, but I'm not sure if that's the case for these examples.

Also a housekeeping note: I left a longer comment in the referenced issue (#21921), but this PR seems to be more directly relevant for another one: #20354

install/kubernetes/cilium/values.yaml Outdated Show resolved Hide resolved
Copy link
Contributor

@lambdanis lambdanis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! The change looks good to me.

The quality of these dashboards is an open question, but if better dashboards exist they can be commited separately. Same about the dashboards duplication (it seems that hubble-l7-http-metrics-by-workload.json is already in both places, so this PR is not introducing duplication; but it should be resolved, especially if the examples are actually used my some scripts or docs).

@jcpunk
Copy link
Contributor Author

jcpunk commented Feb 16, 2023

Would dropping soft links in the examples directory work for now? That would preserve the paths for evaluation by folks who use them.

PR updated with links

@lambdanis
Copy link
Contributor

lambdanis commented Feb 16, 2023

@jcpunk One more thing, when you update the Helm values you need to update the Helm values docs too:

make -C Documentation update-helm-values
make -C install/kubernetes docs

I think that would be it, can you commit the generated docs changes?

I hope switching to symlinks in examples won't break anything, either way this is the right direction IMO.

@jcpunk
Copy link
Contributor Author

jcpunk commented Feb 16, 2023

.... I've got to be doing something wrong here:

[riehecky@leibniz cilium]$ make -C Documentation update-helm-values
make: Entering directory '/tmp/foo/cilium/Documentation'
docker container run --rm --workdir /src/Documentation --volume /tmp/foo/cilium/Documentation/..:/src --env READTHEDOCS_VERSION= --env SKIP_LINT= --user "1000:1000" "quay.io/cilium/helm-toolbox:"v1.0.1"@sha256:"a1483fc036b15c290326ba7ed36b8f0b714e185913b72a6074c04225d74e034c"" helm-docs -d -c /src/install/kubernetes -t /src/Documentation/helm-values.tmp.tmpl > helm-values.tmp
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
time="2023-02-16T17:54:44Z" level=fatal msg="error finding chart directories: open /src/install/kubernetes: permission denied"
make: *** [Makefile:108: helm-values.rst] Error 1
make: Leaving directory '/tmp/foo/cilium/Documentation'
[riehecky@leibniz cilium]$ make -C install/kubernetes doc
make: Entering directory '/tmp/foo/cilium/install/kubernetes'
make: *** No rule to make target 'doc'.  Stop.
make: Leaving directory '/tmp/foo/cilium/install/kubernetes'

@lambdanis
Copy link
Contributor

.... I've got to be doing something wrong here:

nope, my typo, docs with an s

@jcpunk
Copy link
Contributor Author

jcpunk commented Feb 16, 2023

Any guesses on:

time="2023-02-16T18:05:07Z" level=fatal msg="error finding chart directories: open /src/install/kubernetes: permission denied"

I've given you merge rights to my repo if my local system is just too wonky....

@lambdanis
Copy link
Contributor

lambdanis commented Feb 16, 2023

@jcpunk I don't know, make dev-doctor might have some hints or someone else will know for sure.

@jcpunk
Copy link
Contributor Author

jcpunk commented Feb 16, 2023

I think I got it to run... the files changed at least.

Copy link
Contributor

@lambdanis lambdanis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me now.
There are some more dashboards in a Grafana mixin btw: https://github.com/grafana/jsonnet-libs/tree/master/cilium-enterprise-mixin/dashboards The dashboards here should be updated too at some point, but not necessarily in this PR.

@lambdanis
Copy link
Contributor

/test

Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I've left some more feedback

@lambdanis
Copy link
Contributor

lambdanis commented Feb 20, 2023

/test

Job 'Cilium-PR-K8s-1.25-kernel-4.19' failed:

Click to show.

Test Name

K8sDatapathVerifier Runs the kernel verifier against Cilium's BPF datapath

Failure Output

FAIL: terminating containers are not deleted after timeout

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.25-kernel-4.19 so I can create one.

@lambdanis
Copy link
Contributor

/test

Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! One last typo

install/kubernetes/cilium/values.yaml.tmpl Outdated Show resolved Hide resolved
@jcpunk
Copy link
Contributor Author

jcpunk commented Feb 27, 2023

Updated and rebased

Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Some more documentation nits, and then this should be good to go I think!

install/kubernetes/cilium/values.yaml Outdated Show resolved Hide resolved
install/kubernetes/cilium/values.yaml Show resolved Hide resolved
@jcpunk
Copy link
Contributor Author

jcpunk commented Feb 28, 2023

Done

Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@gandro
Copy link
Member

gandro commented Feb 28, 2023

/test

@gandro
Copy link
Member

gandro commented Mar 1, 2023

Seems like there are still some issues with the generated code:

https://github.com/cilium/cilium/actions/runs/4294347264/jobs/7484211056

+++ /home/runner/work/cilium/cilium/install/kubernetes/cilium/values.yaml	2023-02-28 15:44:31.598361649 +0000
@@ -1696,7 +1696,7 @@
   enabled: false
   label: grafana_dashboard
   namespace: ~
-  labelValue: "1"
+  labelValue: "1" 
   annotations: {}
 
 # -- Configure Istio proxy options.
@@ -2057,7 +2057,6 @@
       relabelings: ~
       # -- Metrics relabeling configs for the ServiceMonitor cilium-operator
       metricRelabelings: ~
-
   # -- Grafana dashboards for cilium-operator
   # grafana can import dashboards based on the label and value
   # ref: https://github.com/grafana/helm-charts/tree/main/charts/grafana#sidecar-for-dashboards
@@ -2065,7 +2064,7 @@
     enabled: false
     label: grafana_dashboard
     namespace: ~
-    labelValue: "1" 
+    labelValue: "1"
     annotations: {}
 
   # -- Skip CRDs creation for cilium-operator

make: *** [Makefile:49: check-values-yaml] Error 1
error: cilium/values.yaml has been modified
Make sure to apply your changes to cilium/values.yaml.tmpl and run 'make -C install/kubernetes cilium/values.yaml' to regenerate.

@jcpunk
Copy link
Contributor Author

jcpunk commented Mar 2, 2023

I think I've got it sorted...

@gandro
Copy link
Member

gandro commented Mar 2, 2023

/test

@gandro
Copy link
Member

gandro commented Mar 2, 2023

The issue seems to persist unfortunately.

Locally, I can fix it via rm install/kubernetes/cilium/values.yaml; make -C install/kubernetes cilium/values.yaml

With this patch, helm users can enable reporting
on their clusters with example grafana dashboards.

Fixes: #21921
Fixes: #20354

Signed-off-by: Pat Riehecky <riehecky@fnal.gov>
@jcpunk
Copy link
Contributor Author

jcpunk commented Mar 2, 2023

That is strange... I've run that by hand and hopefully that helps.

@gandro
Copy link
Member

gandro commented Mar 2, 2023

Looks good now!

@gandro
Copy link
Member

gandro commented Mar 2, 2023

/test

Job 'Cilium-PR-K8s-1.16-kernel-4.19' failed:

Click to show.

Test Name

K8sAgentPolicyTest Multi-node policy test with L7 policy using connectivity-check to check datapath

Failure Output

FAIL: connectivity-check pods are not ready after timeout

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.16-kernel-4.19 so I can create one.

@gandro
Copy link
Member

gandro commented Mar 6, 2023

@tommyp1ckles Ping for review

@tommyp1ckles
Copy link
Contributor

tommyp1ckles commented Mar 9, 2023

@lambdanis

I'm not sure what's the quality of these "example" dashboards, @tommyp1ckles might have more input on that. If a dashboard is included in a Helm chart, I would expect it to be well-thought with prod usage in mind, not just a dump of metrics. I know such dashboards for Cilium exist, but I'm not sure if that's the case for these examples.

They appear similar to the other Isovalent dashboards here: https://grafana.com/orgs/isovalent/dashboards. Not 100% sure what the relationship is.

Overall they're ok, for basic use cases but some obvious flaws jump out (same issues as the linked ones):

  • Grouping by pods scales poorly for large clusters, or setups with multiple clusters.
  • More things should be parameterized, such as cluster, Agent Pod, etc...
  • Lots of arbitrary looking interval values, should probably be dynamic or $__interval?

Anyway, changes LGTM 🙏

@gandro
Copy link
Member

gandro commented Mar 9, 2023

I'm marking this ready to merge. There are a newer test suites which weren't part of the branch when this PR was opened, so those can be ignored. The other test failures are unrelated:

@gandro gandro added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 9, 2023
@jrajahalme jrajahalme merged commit bdbb129 into cilium:master Mar 13, 2023
@jcpunk jcpunk deleted the dashboards branch March 13, 2023 13:29
@sathieu
Copy link
Contributor

sathieu commented Apr 4, 2023

@jcpunk I think there is a bug in this PR.

There is {{- $files := .Files.Glob "files/cilium-dashboards/*.json" }}:

{{- $files := .Files.Glob "files/cilium-dashboards/*.json" }}

While the dashboard are in files/cilium-agent/dashboards.

Could you fix this?

@jcpunk
Copy link
Contributor Author

jcpunk commented Apr 4, 2023

#24733 opened

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/helm Impacts helm charts and user deployment experience area/metrics Impacts statistics / metrics gathering, eg via Prometheus. kind/community-contribution This was a contribution made by a community member. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CFP: cilium-grafana chart
6 participants