New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
metrics: Metrics initial modularization #25651
metrics: Metrics initial modularization #25651
Conversation
b74cfa3
to
65b6515
Compare
d37cfc1
to
293b856
Compare
outTyp = outTyp.Elem() | ||
} | ||
|
||
if outTyp.Kind() != reflect.Struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meta comment that isn't specific to this PR, more of a discussion. Any time we design an API where the constraints can only be enforced at runtime we lose developer productivity / velocity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good point. I try to use types wherever I can to make it clear to users what can and can't be used at compile time. But at this moment we just do not have that capability to the depth I would like.
In this case, we could also have gone for a cell.Config
like model where the generic type must implement a given interface and have the interface return a slice of metrics. The very first iteration worked like this, but it requires the user to write a lot of extra boilerplate which I disliked.
At the moment I feel like I have to pick between two evils, this being the lesser from my point of view. But I am open to discussing this further, perhaps on slack somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
contributing part lgtm too
882c3d7
to
9a2c9da
Compare
/test Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed: Click to show.Test Name
Failure Output
Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.26-kernel-net-next/334/ If it is a flake and a GitHub issue doesn't already exist to track it, comment Then please upload the Jenkins artifacts to that issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doc changes look good, thanks!
The few nits below are not blocking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for cli
9a2c9da
to
0961cfb
Compare
What is with?
|
a9769da
to
b4f30d8
Compare
/test |
I think in the last rebase you included one commit more than intended, hence the tophat requirement. |
This commit adds the metric cell type which can be used to make metrics available to hive. The cell works by giving it a constructor function just like providing a normal type. The returned type should contain public metric fields, the cell will enforce this. The cell will make both the populated metric struct as well as all of the individual metrics available in the DAG. The idea being that users would use the struct so they can access just the metrics they are interested in. The metrics registry will collect all individual metrics and register them. We currently have the concept of metrics being enabled/disabled by default unless configured otherwise. To support this concept in a modular way we need additional metadata for metrics. Therefor all metrics should be defined with the new `metric` package which contains types that wrap existing prometheus metrics and it adds our own options such as `Disabled` and `ConfigName`. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
This commit modularizes the metrics package. We now have a registry cell which collects all metrics exposed via `cell.Metric` and registers them with an internal prometheus registry. This registry operates independently now, the HTTP server is part of its lifecycle and the daemon and daemon config are no longer involved directly. We still have some metric types which implement dynamic registration and logic external to the metrics package. The global functions now await a promise to the Registry so they still work until we can replace/ refactor these external dynamic metrics. All of the metrics defined in the metrics package have been transformed into `metrics/metric` types and are make available via a Metrics cell called `LegacyMetics`. This cell will also assign the individual metrics to the existing global variables so they can be used until the callsites can be adjusted to take `LegacyMetrics` as dependency or to split off some of the metrics contained inside. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
This commits adds a new section to the "Guide to Hive" page explaining the new `cell.Metric` cell type. Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
b4f30d8
to
33da195
Compare
I excellently did a |
/test |
This PR is the first step towards full modularization of metrics. It is a smaller version of #23936 to make review easier. This PR makes the registry into a cell, yet still registers this cell as global variable so the global functions stay working. So in essence we are in a sort of hybrid mode so we do not have to modify as much metric call sites.
This version adds the
cell.Metric
cell-type which allows developers to define metrics wherever they want. This type works just like acell.Provide
, you give it a constructor which returns a struct containing metrics. The main difference is that both the struct and its fields are provided to the DAG so the registry can collect the individual metrics while normal cells can use the struct as param and have named metrics.To provide support for disabled-by-default metrics, we added wrapper types around existing prometheus types, defined in
pkg/metrics/metric
. They are a super-set of their normal counterparts so they don't break the existing call sites. Having this config inside the metrics options at creation time eliminates the need for the "is enabled" feedback to the central config. So the metrics are not fully standalone without dependency on the Daemon it init or the Daemon config.All existing metrics are converted to these new metrics and are made available to Hive via a new
LegacyMetrics
struct. They are also still registered at the old global variables so the call sites do not have to be updated in this PR. Follow-up PRs should incrementally remove usage of these global variables in favor of taking LegacyMetrics as parameter or moving metrics of interest into a dedicated metrics struct closer to the call site.Follow-up PRs should also enrich the metrics type to allow for self documenting functionality like the original PR and all dynamic registration/unregistration should be refactored to use the registry as parameter or to make the collectors smarter so we can move away from global functions to modify the registry.