
Decouple custom resource config with kube-state-metrics deployment #2050

Open
CatherineF-dev opened this issue Apr 18, 2023 · 8 comments · May be fixed by #2049
Labels
kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@CatherineF-dev
Contributor

CatherineF-dev commented Apr 18, 2023

What would you like to be added:

Why is this needed:
Adding a custom resource metric requires changing the KSM deployment.
In a company with separate platform-monitoring and application teams, this means the application team needs to change code managed by the platform-monitoring team.
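As a rough sketch of that coupling (all names here are illustrative, though the `--custom-resource-state-config-file` flag is the existing kube-state-metrics mechanism): the custom-resource metric config lives in a ConfigMap mounted into the KSM Deployment, so adding a metric means editing manifests owned by the platform-monitoring team.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
spec:
  selector:
    matchLabels:
      app: kube-state-metrics
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      containers:
        - name: kube-state-metrics
          image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.8.2
          args:
            # Points at the custom-resource-state config mounted below
            - --custom-resource-state-config-file=/etc/ksm/config.yaml
          volumeMounts:
            - name: cr-config
              mountPath: /etc/ksm
      volumes:
        - name: cr-config
          configMap:
            name: ksm-custom-resource-config  # owned by the platform team
```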

Describe the solution you'd like
#2049

Additional context

@CatherineF-dev CatherineF-dev added the kind/feature Categorizes issue or PR as related to a new feature. label Apr 18, 2023
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Apr 18, 2023
@CatherineF-dev CatherineF-dev linked a pull request Apr 18, 2023 that will close this issue
@mrueg
Member

mrueg commented Apr 19, 2023

Thanks for coming up with a possible solution here @CatherineF-dev @chrischdi
Some thoughts on this approach and the design:

I feel like this approach definitely improves the situation, though it comes with the drawback of added complexity in kube-state-metrics, which might make it more difficult to maintain.

I would prefer a solution that mimics the design of prometheus and prometheus-operator. This will reduce complexity and simplify things, as well as follow the Unix philosophy: "Make each program do one thing well."

kube-state-metrics-operator could support two CRDs:

  • CustomResourceMonitor, which matches your design
  • KubeStateMetrics, which sets up new instances of kube-state-metrics

A couple of use cases:

  • Large fleet of nodes where you need to split kube-state-metrics into multiple deployments (e.g. one is a daemonset collecting pod metrics, another is a deployment collecting API resource info, and a third deals with custom resources)
  • Multi-tenancy and you want to have one kube-state-metrics deployment per namespace
  • The operator could further support an AdmissionWebhook to make sure no invalid CustomResourceMonitor gets added that would take kube-state-metrics down
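As a sketch, a CustomResourceMonitor object matching the design in #2049 might look something like this (the API group and field names are hypothetical, not an implemented API):

```yaml
# Hypothetical CustomResourceMonitor: an application team owns this object
# in its own namespace instead of editing the KSM Deployment's config.
apiVersion: ksm.example.io/v1alpha1
kind: CustomResourceMonitor
metadata:
  name: foo-metrics
  namespace: app-team
spec:
  resources:
    - groupVersionKind:
        group: apps.example.com
        version: v1
        kind: Foo
      metrics:
        - name: foo_active_replicas
          help: Active replicas reported in a Foo's status
          each:
            type: Gauge
            gauge:
              path: [status, activeReplicas]
```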

What do you think?

@CatherineF-dev
Contributor Author

I think we can split it into kube-state-metrics and kube-state-metrics-operator in the future if needed, because:

  1. It's a breaking change (users need to deploy kube-state-metrics-operator) and needs to be in a major release (3.x).
  2. Prometheus-operator manages many components (prometheus, alert-manager and related monitoring components) and many CRDs. kube-state-metrics' complexity seems fine so far.
  3. Scaling model still needs some improvements, which might affect the design of kube-state-metrics-operator.
  4. A new operator also brings some maintenance work.

One possible plan is:

Given that kube-state-metrics-operator would be a breaking change, we can migrate the CustomResourceMonitor CRD into kube-state-metrics-operator in v3.x.

@mrueg
Member

mrueg commented Apr 19, 2023

Could you elaborate why you think kube-state-metrics-operator is a breaking change to kube-state-metrics?

I would have assumed the operator is maintained in a separate repository, similar to https://github.com/prometheus-operator/prometheus-operator and https://github.com/prometheus/prometheus - completely decoupled.

This would allow working on it independently and keeping the two on different release cycles.

@CatherineF-dev
Contributor Author

CatherineF-dev commented Apr 19, 2023

Could you elaborate why you think kube-state-metrics-operator is a breaking change to kube-state-metrics?

I thought KubeStateMetrics, which sets up new instances of kube-state-metrics, is related to auto-scaling, which would be moved into the operator.

@mrueg
Member

mrueg commented Apr 19, 2023

Could you elaborate why you think kube-state-metrics-operator is a breaking change to kube-state-metrics?

I thought KubeStateMetrics, which sets up new instances of kube-state-metrics, is related to auto-scaling, which would be moved into the operator.

Ah understood. Let me explain this better:
KubeStateMetrics is a CRD similar to https://github.com/prometheus-operator/prometheus-operator/blob/main/example/prometheus-operator-crd/monitoring.coreos.com_prometheuses.yaml used by the operator to spin up new instances of kube-state-metrics.
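So, by analogy with that Prometheus CRD, a KubeStateMetrics object might look something like this (the API group, fields, and selector are hypothetical, sketched only to illustrate the proposal):

```yaml
# Hypothetical KubeStateMetrics custom resource: the operator would watch
# these objects and spin up a matching kube-state-metrics workload,
# much like prometheus-operator does for Prometheus objects.
apiVersion: ksm.example.io/v1alpha1
kind: KubeStateMetrics
metadata:
  name: custom-resources-shard
  namespace: monitoring
spec:
  replicas: 1
  image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.8.2
  # Select which CustomResourceMonitor objects this instance serves
  customResourceMonitorSelector:
    matchLabels:
      team: app-team
```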

@CatherineF-dev
Contributor Author

Okay, I think both KubeStateMetrics and auto-sharding should be moved into kube-state-metrics-operator, so that kube-state-metrics-operator's responsibility is managing pods.

CustomResourceMonitor is similar to the other metrics and could perhaps stay in kube-state-metrics, so that kube-state-metrics' responsibility is collecting metrics.

@nathanperkins

nathanperkins commented May 18, 2023

Some perspectives I hope are helpful in deciding where to put this feature.

  • Configuring components through a (potentially giant) YAML blob in a config file mounted via a ConfigMap is not really a friendly UX. I don't get any help from my IDE with linting or auto-completion; it's just a big blob of text.
  • CRDs are so much more friendly and the natural UX that Kubernetes engineers are familiar with.
  • CRDs are easier to deal with when you are providing metrics configs for API types coming from multiple components / projects / repos. Each one gets a separate CR which you can decide to deploy or ignore as you like based on labels.
  • CRD generation and validation is easier to automate.
    • FR: CustomResourceMonitor CRs could be generated from Go type definitions using comment tags (this would be awesome)!

For these reasons, CRDs should be the first class API for configuring custom resource monitoring. I wouldn't want to have to deploy an entire separate operator pod (wasting resources) just to get a friendly UX.

@dashpole

/assign @CatherineF-dev
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 27, 2023