
Add support for Kube-Prometheus-Stack monitoring #10

Open
Luzifer opened this issue Nov 18, 2023 · 6 comments

Comments


Luzifer commented Nov 18, 2023

As a systems operator I want to be able to monitor my applications.

With release v1.10.0 OTS gained Prometheus metrics export, which can be scraped by the Kube-Prometheus-Stack automatically without the operator having to configure that themselves.

Acceptance Criteria

  • A ServiceMonitor is added to the chart scraping /metrics on the pods
  • A Grafana Dashboard is built to visualize the metrics exposed by OTS and added as an annotated ConfigMap
  • All of the above must be enabled through a flag in the .Values (default: enabled = false)
  • When enabled, the customization option metricsAllowedSubnets should automatically be set for cluster-local access (the usual setup has the cluster run on 10.0.0.0/8, so allowing that should be sufficient; however, this almost certainly also exposes the metrics to the public, so there needs to be a blocker in the ingress to block access to /metrics)
  • A way to add alerting rules should be added but no default rules should be specified.
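
The ServiceMonitor criterion could be sketched roughly like this. This is a hypothetical template, not the final chart layout: it assumes the chart's existing `ots.fullname` helper, plus `ots.labels`/`ots.selectorLabels` helpers, a `monitoring.enabled` flag, and an `http` service port name, all of which are assumptions:

```yaml
# templates/servicemonitor.yaml -- hypothetical sketch, gated behind a values flag
{{- if .Values.monitoring.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: {{ include "ots.fullname" . }}
  labels:
    {{- include "ots.labels" . | nindent 4 }}  # assumed helper name
spec:
  selector:
    matchLabels:
      {{- include "ots.selectorLabels" . | nindent 6 }}  # assumed helper name
  endpoints:
    - port: http   # assumed service port name
      path: /metrics
{{- end }}
```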

As a side-effect of enabling metricsAllowedSubnets in the customizations, it might be a good idea to provide a map in the Values for setting customization options, which would then be rendered via {{ .Values.customizations | toYaml }} into a ConfigMap (after being merged with the chart-set customizations) and mounted into the pod.
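
That customizations idea might look something like the following sketch. Everything here is an assumption: the template name, treating metricsAllowedSubnets as the chart-set default, the precedence choice (Sprig's `merge` lets keys in the first map win, so chart-set values override user values), and the `customize.yaml` file name:

```yaml
# templates/customizations-configmap.yaml -- hypothetical sketch
{{- /* chart-set customizations; opening the metrics subnet here is an assumption */ -}}
{{- $chartSet := dict "metricsAllowedSubnets" (list "10.0.0.0/8") }}
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "ots.fullname" . }}-customizations
data:
  customize.yaml: |
    {{- merge $chartSet .Values.customizations | toYaml | nindent 4 }}
```

The pod would then mount this ConfigMap at whatever path OTS reads its customization file from.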

@hypery2k
Contributor

thanks for your suggestion. I wasn't aware that the metrics are now exposed.

Will try to incorporate the proposed changes next week

@hypery2k
Contributor

I made a first draft (haven't tried it yet). The Grafana dashboard and alerting are also still missing.

Regarding the "subnet" allow list I'm a little unsure. I don't see the benefit of this setting: if you allow the service subnet, the metrics would also be accessible via the ingress, so you would have to block them ingress-specifically on top. That sounds a little too complicated.

Why not just check the HTTP request header x-forwarded-host? If the request comes through the ingress, this value is set.

What do you think?


Luzifer commented Nov 25, 2023

Well, not everything is Kubernetes, and you should never trust that header without previously whitelisting the proxies that are allowed to send it (which would be just another setting configuring IPs / subnets). Aside from that, in some scenarios you just don't have that header / the real IP. That setting is there to cover the metrics handler by default.

Just ensure that /metrics cannot be accessed through the Ingress and open the setting so a cluster-internal Prometheus can access it.

@hypery2k
Contributor

Yeah, that's correct. The problem is that there is no ingress-independent configuration option for that.

I made a simple, but not really elegant, approach:

```yaml
# Redirect the metrics path to a non-existent service to prevent public access.
# This returns a "503 Service Temporarily Unavailable" for that specific URL.
- path: /metrics
  pathType: Prefix
  backend:
    service:
      name: {{ include "ots.fullname" . }}-non-existing
      port:
        name: app
```

hypery2k commented Nov 28, 2023

@Luzifer Would it be okay for you to split the Monitoring + Grafana/Alerting features?

I currently don't have that much time for the complete solution and would like to publish a release with this PR: #11

@hypery2k
Contributor

@Luzifer with the latest release I still get issues with Prometheus:

```
expected a valid start token, got "\x1f" ("INVALID") while parsing: "\x1f"
```

I think this needs to be handled on your side, see #11 (comment)
