Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Operator modular metrics #28005

Merged
merged 4 commits into from Sep 25, 2023

Conversation

pippolo84
Copy link
Member

@pippolo84 pippolo84 commented Sep 7, 2023

Add a cell to provide global metrics registry and prometheus metrics server for the operator hive.

This makes possible to use cell.Metric to export metrics in the operator cells.

Since there are parts of the operator code not yet modularized, the legacy metrics variables and the global registry are kept exported, in order to be directly used by legacy code. This approach is inspired by the agent metrics cell.

Besides, this PR fixes the operator metrics namespace, setting it early at startup, so that all the code shared between other components (e.g: the agent) will report the correct cilium_operator namespace when updating the metrics.

Here a query that shows both the correct namespace for all the metrics and also all the metrics registered with cell.Metric (like the ones in the operator/pkg/lbipam cell) showing up:

$ curl localhost:9234/v1/metrics
[
    {
        "name": "cilium_operator_ces_queueing_delay_seconds"
    },
    {
        "name": "cilium_operator_ces_sync_errors_total"
    },
    {
        "name": "cilium_operator_lbipam_conflicting_pools_total"
    },
    {
        "name": "cilium_operator_lbipam_services_matching_total"
    },
    {
        "name": "cilium_operator_lbipam_services_unsatisfied_total"
    },
    {
        "name": "cilium_operator_number_of_ceps_per_ces"
    },
    {
        "name": "cilium_operator_process_cpu_seconds_total",
        "value": 1.03
    },
    {
        "name": "cilium_operator_process_max_fds",
        "value": 1048576
    },
    {
        "name": "cilium_operator_process_open_fds",
        "value": 12
    },
    {
        "name": "cilium_operator_process_resident_memory_bytes",
        "value": 95997952
    },
    {
        "name": "cilium_operator_process_start_time_seconds",
        "value": 1694106767.7
    },
    {
        "name": "cilium_operator_process_virtual_memory_bytes",
        "value": 1351573504
    },
    {
        "name": "cilium_operator_process_virtual_memory_max_bytes",
        "value": 18446744073709552000
    }
]

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Sep 7, 2023
@pippolo84 pippolo84 added release-note/minor This PR changes functionality that users may find relevant to operating Cilium. area/metrics Impacts statistics / metrics gathering, eg via Prometheus. area/operator Impacts the cilium-operator component labels Sep 7, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Sep 7, 2023
@pippolo84 pippolo84 force-pushed the pr/pippolo84/operator-modular-metrics branch 3 times, most recently from d21ca48 to 4d57317 Compare September 8, 2023 10:44
@pippolo84 pippolo84 marked this pull request as ready for review September 8, 2023 10:46
@pippolo84 pippolo84 requested review from a team as code owners September 8, 2023 10:46
operator/metrics/cell.go Outdated Show resolved Hide resolved
operator/metrics/legacy.go Show resolved Hide resolved
@pippolo84 pippolo84 force-pushed the pr/pippolo84/operator-modular-metrics branch 2 times, most recently from 9711f5e to 84659a9 Compare September 12, 2023 10:40
@pippolo84
Copy link
Member Author

Rebased to fix conflicts after merging of #26836

Copy link
Contributor

@derailed derailed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pippolo84 Nice work Fabio! Just a couple comments...

operator/metrics/legacy.go Outdated Show resolved Hide resolved
operator/metrics/legacy.go Outdated Show resolved Hide resolved
Each cell that adds metrics through cell.Metric refers the global
variable metrics.Namespace.  This variable is set at startup with the
value "cilium", that is, the agent metrics namespace.  For cells that
are also part of the operator hive, like hive/job, this is incorrect.

This commit overwrites the metrics.Namespace global variable with the
correct one for the operator before running the hive.

Doing this a separate global variable for the operator namespace is not
needed anymore, so the unmodularized operator code is updated to refer
to the global metrics.Namespace.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Add a cell to provide global metrics registry and prometheus metrics
server for the operator hive.

This makes possible to use cell.Metric to export metrics in the operator
cells.

Since there are parts of the operator code not yet modularized, the
legacy metrics variables and the global registry are kept exported, in
order to be directly used by legacy code.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Operator legacy metrics have been changed to be declared as
metrics.NoOp*. Thus, even when the metrics are not enabled in the
operator, all the usual methods can be safely called on them. Therefore,
there is no need to check the EnableMetrics option every time a metric
is referred in the operator code.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
@pippolo84 pippolo84 force-pushed the pr/pippolo84/operator-modular-metrics branch from 84659a9 to 919c751 Compare September 13, 2023 15:22
@pippolo84
Copy link
Member Author

@pippolo84 Nice work Fabio! Just a couple comments...

Thanks for the feedback. Fixed it!

@pippolo84
Copy link
Member Author

Added a commit to remove the stale enableMetrics bool in identitygc (no need to check for metrics enabled flag anymore, since we switched to NoOp* types).

Operator metrics have been switched to metrics.NoOp* types, so there is
no need to keep each metrics operation behind a flag. The enableMetrics
field is a leftover from the previous metrics management and thus should
be removed.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
@pippolo84 pippolo84 force-pushed the pr/pippolo84/operator-modular-metrics branch from 51209e7 to 311263e Compare September 14, 2023 09:46
@pippolo84
Copy link
Member Author

/test

@pippolo84
Copy link
Member Author

Conformance Ingress failure tracked here, rerunning

Copy link
Contributor

@derailed derailed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pippolo84 LGTM. Thanks for the updates!

Copy link
Contributor

@tommyp1ckles tommyp1ckles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IPAM changes look good

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Sep 25, 2023
@lmb
Copy link
Contributor

lmb commented Sep 25, 2023

I resolved a pending conversation to make this mergeable: https://github.com/cilium/cilium/pull/28005/files#r1322543478

@lmb lmb merged commit 69d30fb into cilium:main Sep 25, 2023
59 of 61 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/metrics Impacts statistics / metrics gathering, eg via Prometheus. area/operator Impacts the cilium-operator component ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants