Operator modular metrics #28005
Rebased to fix conflicts after the merge of #26836.
@pippolo84 Nice work Fabio! Just a couple comments...
Each cell that adds metrics through cell.Metric refers to the global variable metrics.Namespace. This variable is set at startup to the value "cilium", that is, the agent metrics namespace. For cells that are also part of the operator hive, like hive/job, this is incorrect. This commit overwrites the metrics.Namespace global variable with the correct value for the operator before running the hive. With this change, a separate global variable for the operator namespace is no longer needed, so the unmodularized operator code is updated to refer to the global metrics.Namespace. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
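To illustrate why the ordering matters, here is a minimal stdlib-only sketch, not the actual Cilium code. The `Namespace` variable stands in for metrics.Namespace, and `fullName`, the `job` subsystem and the `runs_total` metric name are hypothetical. It assumes, as the commit message describes, that shared code reads the global when it builds a metric name, so the global must be overwritten before that code runs.

```go
package main

import "fmt"

// Namespace stands in for the metrics.Namespace global.
// It defaults to the agent namespace, as described in the commit message.
var Namespace = "cilium"

// operatorNamespace is the value the operator is expected to report under.
const operatorNamespace = "cilium_operator"

// fullName is a hypothetical helper: it joins namespace, subsystem and name
// with underscores, reading the global at call time.
func fullName(subsystem, name string) string {
	return Namespace + "_" + subsystem + "_" + name
}

func main() {
	// Shared code built before the override picks up the agent namespace.
	fmt.Println(fullName("job", "runs_total"))

	// Overwrite the global before any shared cell constructs its metrics.
	Namespace = operatorNamespace
	fmt.Println(fullName("job", "runs_total"))
}
```

Any metric constructed before the assignment keeps the agent prefix, which is why the override has to happen before the hive runs.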
Add a cell to provide the global metrics registry and the Prometheus metrics server for the operator hive. This makes it possible to use cell.Metric to export metrics in the operator cells. Since parts of the operator code are not yet modularized, the legacy metrics variables and the global registry are kept exported so that legacy code can use them directly. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
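The dual access path described above (injected registry for modular cells, exported global for legacy code) can be sketched roughly as follows. This is a simplified stdlib-only illustration, not the real cell: `Registry`, `NewRegistry`, `LegacyRegistry` and the metric names are all hypothetical, and the real registry wraps a Prometheus registry rather than a slice of names.

```go
package main

import "fmt"

// Registry is a hypothetical stand-in for a metrics registry.
type Registry struct {
	names []string
}

// Register records a metric name in the registry.
func (r *Registry) Register(name string) {
	r.names = append(r.names, name)
}

// LegacyRegistry is the exported global that code not yet modularized
// keeps using directly.
var LegacyRegistry *Registry

// NewRegistry plays the role of the constructor a cell would provide.
// It returns the registry for injection and also publishes it globally.
func NewRegistry() *Registry {
	r := &Registry{}
	LegacyRegistry = r
	return r
}

func main() {
	reg := NewRegistry()

	// A modular cell receives reg as a dependency.
	reg.Register("modular_example_total")

	// Legacy code reaches the same registry through the global.
	LegacyRegistry.Register("legacy_example_total")

	fmt.Println(reg.names)
}
```

Both paths end up in the same registry, so a single metrics server can expose modular and legacy metrics together.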
Operator legacy metrics are now declared as metrics.NoOp* types. Thus, even when metrics are not enabled in the operator, all the usual methods can be safely called on them, and there is no need to check the EnableMetrics option every time a metric is referenced in the operator code. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
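A rough sketch of the no-op pattern this commit relies on, using only the standard library. The `Counter` interface, `noOpCounter`, `realCounter`, `NodesDeleted`, `registerMetrics` and `deleteNode` are hypothetical names chosen for illustration; in particular, the `Value` method exists here only so the effect is observable and is not part of the Prometheus counter interface.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Counter is a hypothetical, reduced counter interface.
type Counter interface {
	Inc()
	Value() uint64
}

// noOpCounter accepts every call and does nothing.
type noOpCounter struct{}

func (noOpCounter) Inc()          {}
func (noOpCounter) Value() uint64 { return 0 }

// realCounter actually counts; it is installed only when metrics are enabled.
type realCounter struct{ n atomic.Uint64 }

func (c *realCounter) Inc()          { c.n.Add(1) }
func (c *realCounter) Value() uint64 { return c.n.Load() }

// NodesDeleted is a legacy-style global metric, declared as a no-op by default.
var NodesDeleted Counter = noOpCounter{}

// registerMetrics swaps in the real implementation when metrics are enabled.
func registerMetrics(enabled bool) {
	if enabled {
		NodesDeleted = &realCounter{}
	}
}

// deleteNode shows a call site: no "if metrics enabled" guard is needed.
func deleteNode() {
	NodesDeleted.Inc()
}

func main() {
	deleteNode() // safe even though metrics are disabled
	registerMetrics(true)
	deleteNode()
	fmt.Println(NodesDeleted.Value())
}
```

The flag is checked once, at registration time, instead of at every call site.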
Thanks for the feedback. Fixed it!
Added a commit to remove the stale
Operator metrics have been switched to metrics.NoOp* types, so there is no need to keep each metrics operation behind a flag. The enableMetrics field is a leftover from the previous metrics management and thus should be removed. Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
/test
Conformance Ingress failure tracked here, rerunning
@pippolo84 LGTM. Thanks for the updates!
IPAM changes look good
I resolved a pending conversation to make this mergeable: https://github.com/cilium/cilium/pull/28005/files#r1322543478
Add a cell to provide global metrics registry and prometheus metrics server for the operator hive.
This makes it possible to use cell.Metric to export metrics in the operator cells.
Since parts of the operator code are not yet modularized, the legacy metrics variables and the global registry are kept exported so that legacy code can use them directly. This approach is inspired by the agent metrics cell.
Besides, this PR fixes the operator metrics namespace by setting it early at startup, so that code shared with other components (e.g. the agent) reports the correct cilium_operator namespace when updating the metrics.
Here is a query that shows the correct namespace for all the metrics, as well as the metrics registered with cell.Metric (like the ones in the operator/pkg/lbipam cell) showing up: