
Decouple custom resource config with kube-state-metrics deployment #2050

Open
CatherineF-dev opened this issue Apr 18, 2023 · 8 comments · May be fixed by #2049
Labels
kind/feature Categorizes issue or PR as related to a new feature. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@CatherineF-dev
Contributor

CatherineF-dev commented Apr 18, 2023

What would you like to be added:

Why is this needed:
Adding a custom resource metric requires changing the KSM deployment.
In a company with separate platform-monitoring and application teams, this means the application team needs to change code managed by the platform-monitoring team.
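As a rough sketch of that coupling (all names here are illustrative, though the `--custom-resource-state-config-file` flag is the existing kube-state-metrics mechanism): the custom-resource metric config lives in a ConfigMap mounted into the KSM Deployment, so adding a metric means editing manifests owned by the platform-monitoring team.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
spec:
  selector:
    matchLabels:
      app: kube-state-metrics
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      containers:
        - name: kube-state-metrics
          image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.8.2
          args:
            # Points at the custom-resource-state config mounted below
            - --custom-resource-state-config-file=/etc/ksm/config.yaml
          volumeMounts:
            - name: cr-config
              mountPath: /etc/ksm
      volumes:
        - name: cr-config
          configMap:
            name: ksm-custom-resource-config  # owned by the platform team
```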

Describe the solution you'd like
#2049

Additional context

@CatherineF-dev CatherineF-dev added the kind/feature Categorizes issue or PR as related to a new feature. label Apr 18, 2023
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Apr 18, 2023
@CatherineF-dev CatherineF-dev linked a pull request Apr 18, 2023 that will close this issue
@mrueg
Member

mrueg commented Apr 19, 2023

Thanks for coming up with a possible solution here @CatherineF-dev @chrischdi
Some thoughts on this approach and the design:

I feel like this approach definitely improves the situation, though it comes with the drawback of added complexity in kube-state-metrics, which might make it more difficult to maintain.

I would prefer a solution that mimics the design of prometheus and prometheus-operator. This will reduce complexity and simplify things, as well as follow the Unix philosophy: "Make each program do one thing well."

kube-state-metrics-operator could support two CRDs:

  • CustomResourceMonitor, which matches your design
  • KubeStateMetrics, which sets up new instances of kube-state-metrics

A couple of use cases:

  • Large fleet of nodes where you need to split kube-state-metrics into multiple deployments (e.g. one is a daemonset collecting pod metrics, another is a deployment collecting API resource info, and a third deals with custom resources)
  • Multi-tenancy and you want to have one kube-state-metrics deployment per namespace
  • The operator could further support an AdmissionWebhook to make sure no invalid CustomResourceMonitor gets added that would take kube-state-metrics down
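As a sketch, a CustomResourceMonitor object matching the design in #2049 might look something like this (the API group and field names are hypothetical, not an implemented API):

```yaml
# Hypothetical CustomResourceMonitor: an application team owns this object
# in its own namespace instead of editing the KSM Deployment's config.
apiVersion: ksm.example.io/v1alpha1
kind: CustomResourceMonitor
metadata:
  name: foo-metrics
  namespace: app-team
spec:
  resources:
    - groupVersionKind:
        group: apps.example.com
        version: v1
        kind: Foo
      metrics:
        - name: foo_active_replicas
          help: Active replicas reported in a Foo's status
          each:
            type: Gauge
            gauge:
              path: [status, activeReplicas]
```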

What do you think?

@CatherineF-dev
Contributor Author

I think we can split it into kube-state-metrics and kube-state-metrics-operator in the future if needed, because:

  1. It's a breaking change (users need to deploy kube-state-metrics-operator) and needs to be in a major release (3.x).
  2. Prometheus-operator manages many components (prometheus, alert-manager and related monitoring components) and many CRDs. kube-state-metrics' complexity seems fine so far.
  3. Scaling model still needs some improvements, which might affect the design of kube-state-metrics-operator.
  4. A new operator also brings some maintenance work.

One possible plan is:

Given that kube-state-metrics-operator would be a breaking change, we can migrate the CustomResourceMonitor CRD into kube-state-metrics-operator in v3.x.

@mrueg
Member

mrueg commented Apr 19, 2023

Could you elaborate why you think kube-state-metrics-operator is a breaking change to kube-state-metrics?

I would have assumed the operator is maintained in a separate repository, similar to https://github.com/prometheus-operator/prometheus-operator and https://github.com/prometheus/prometheus - completely decoupled.

This would allow working on it independently and keeping the two on different release cycles.

@CatherineF-dev
Contributor Author

CatherineF-dev commented Apr 19, 2023

Could you elaborate why you think kube-state-metrics-operator is a breaking change to kube-state-metrics?

I thought KubeStateMetrics, which sets up new instances of kube-state-metrics, is related to auto-scaling, which would be moved into the operator.

@mrueg
Member

mrueg commented Apr 19, 2023

Could you elaborate why you think kube-state-metrics-operator is a breaking change to kube-state-metrics?

I thought KubeStateMetrics, which sets up new instances of kube-state-metrics, is related to auto-scaling, which would be moved into the operator.

Ah understood. Let me explain this better:
KubeStateMetrics is a CRD similar to https://github.com/prometheus-operator/prometheus-operator/blob/main/example/prometheus-operator-crd/monitoring.coreos.com_prometheuses.yaml used by the operator to spin up new instances of kube-state-metrics.
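So, by analogy with that Prometheus CRD, a KubeStateMetrics object might look something like this (the API group, fields, and selector are hypothetical, sketched only to illustrate the proposal):

```yaml
# Hypothetical KubeStateMetrics custom resource: the operator would watch
# these objects and spin up a matching kube-state-metrics workload,
# much like prometheus-operator does for Prometheus objects.
apiVersion: ksm.example.io/v1alpha1
kind: KubeStateMetrics
metadata:
  name: custom-resources-shard
  namespace: monitoring
spec:
  replicas: 1
  image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.8.2
  # Select which CustomResourceMonitor objects this instance serves
  customResourceMonitorSelector:
    matchLabels:
      team: app-team
```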

@CatherineF-dev
Contributor Author

Okay, I think both KubeStateMetrics and auto-sharding should be moved into kube-state-metrics-operator, so that kube-state-metrics-operator's responsibility is managing pods.

CustomResourceMonitor is similar to the other metrics and could perhaps stay in kube-state-metrics, so that kube-state-metrics' responsibility is collecting metrics.

@nathanperkins

nathanperkins commented May 18, 2023

Some perspectives I hope are helpful in deciding where to put this feature.

  • Configuring components through a (potentially giant) YAML blob in a config file mounted via a ConfigMap is not really a friendly UX. I don't get any help from my IDE with linting or auto-completion; it's just a big blob of text.
  • CRDs are so much more friendly and the natural UX that Kubernetes engineers are familiar with.
  • CRDs are easier to deal with when you are providing metrics configs for API types coming from multiple components / projects / repos. Each one gets a separate CR which you can decide to deploy or ignore as you like based on labels.
  • CRD generation and validation is easier to automate.
    • FR: CustomResourceMonitor CRs could be generated from Go type definitions using comment tags (this would be awesome)!

For these reasons, CRDs should be the first class API for configuring custom resource monitoring. I wouldn't want to have to deploy an entire separate operator pod (wasting resources) just to get a friendly UX.

@dashpole

/assign @CatherineF-dev
/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 27, 2023