Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metrics: Metrics initial modularization #25651

Conversation

dylandreimerink
Copy link
Member

@dylandreimerink dylandreimerink commented May 24, 2023

This PR is the first step towards full modularization of metrics. It is a smaller version of #23936 to make review easier. This PR makes the registry into a cell, yet still registers this cell as global variable so the global functions stay working. So in essence we are in a sort of hybrid mode so we do not have to modify as much metric call sites.

This version adds the cell.Metric cell-type which allows developers to define metrics wherever they want. This type works just like a cell.Provide, you give it a constructor which returns a struct containing metrics. The main difference is that both the struct and its fields are provided to the DAG so the registry can collect the individual metrics while normal cells can use the struct as param and have named metrics.

To provide support for disabled-by-default metrics, we added wrapper types around existing prometheus types, defined in pkg/metrics/metric. They are a super-set of their normal counterparts so they don't break the existing call sites. Having this config inside the metrics options at creation time eliminates the need for the "is enabled" feedback to the central config. So the metrics are not fully standalone without dependency on the Daemon it init or the Daemon config.

All existing metrics are converted to these new metrics and are made available to Hive via a new LegacyMetrics struct. They are also still registered at the old global variables so the call sites do not have to be updated in this PR. Follow-up PRs should incrementally remove usage of these global variables in favor of taking LegacyMetrics as parameter or moving metrics of interest into a dedicated metrics struct closer to the call site.

Follow-up PRs should also enrich the metrics type to allow for self documenting functionality like the original PR and all dynamic registration/unregistration should be refactored to use the registry as parameter or to make the collectors smarter so we can move away from global functions to modify the registry.

@dylandreimerink dylandreimerink added the dont-merge/preview-only Only for preview or testing, don't merge it. label May 24, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 24, 2023
@dylandreimerink dylandreimerink force-pushed the feature/modular-metrics-simplified branch 9 times, most recently from b74cfa3 to 65b6515 Compare May 26, 2023 10:07
@dylandreimerink dylandreimerink changed the title metrics: Metrics initial modularization WIP metrics: Metrics initial modularization May 26, 2023
@dylandreimerink dylandreimerink added area/metrics Impacts statistics / metrics gathering, eg via Prometheus. release-note/misc This PR makes changes that have no direct user impact. area/modularization and removed dont-merge/preview-only Only for preview or testing, don't merge it. labels May 26, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 26, 2023
@dylandreimerink dylandreimerink marked this pull request as ready for review May 26, 2023 10:59
@dylandreimerink dylandreimerink requested review from a team as code owners May 26, 2023 10:59
daemon/cmd/daemon_test.go Outdated Show resolved Hide resolved
pkg/bpf/map_linux.go Show resolved Hide resolved
pkg/hive/cell/metric.go Outdated Show resolved Hide resolved
outTyp = outTyp.Elem()
}

if outTyp.Kind() != reflect.Struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meta comment that isn't specific to this PR, more of a discussion. Any time we design an API where the constraints can only be enforced at runtime we lose developer productivity / velocity.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good point. I try to use types wherever I can to make it clear to users what can and can't be used at compile time. But at this moment we just do not have that capability to the depth I would like.

In this case, we could also have gone for a cell.Config like model where the generic type must implement a given interface and have the interface return a slice of metrics. The very first iteration worked like this, but it requires the user to write a lot of extra boilerplate which I disliked.

At the moment I feel like I have to pick between two evils, this being the lesser from my point of view. But I am open to discussing this further, perhaps on slack somewhere.

pkg/hive/cell/metric.go Show resolved Hide resolved
pkg/hive/cell/metric.go Show resolved Hide resolved
Copy link
Member

@bimmlerd bimmlerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

contributing part lgtm too

daemon/cmd/cells.go Show resolved Hide resolved
@dylandreimerink dylandreimerink force-pushed the feature/modular-metrics-simplified branch 2 times, most recently from 882c3d7 to 9a2c9da Compare May 31, 2023 14:44
@dylandreimerink
Copy link
Member Author

dylandreimerink commented May 31, 2023

/test

Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed:

Click to show.

Test Name

K8sDatapathServicesTest Checks N/S loadbalancing Tests with XDP, direct routing, DSR with Geneve and Maglev

Failure Output

FAIL: Can not connect to service "http://[fd04::11]:30624" from outside cluster (1/10)

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.26-kernel-net-next/334/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.26-kernel-net-next so I can create one.

Then please upload the Jenkins artifacts to that issue.

Copy link
Member

@qmonnet qmonnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc changes look good, thanks!

The few nits below are not blocking.

pkg/hive/cell/metric.go Outdated Show resolved Hide resolved
pkg/metrics/metric/counter.go Outdated Show resolved Hide resolved
pkg/hive/cell/metric.go Outdated Show resolved Hide resolved
@qmonnet qmonnet removed the request for review from zacharysarah June 1, 2023 10:26
Copy link
Member

@nathanjsweet nathanjsweet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for cli

clustermesh-apiserver/metrics/metrics.go Outdated Show resolved Hide resolved
@nathanjsweet nathanjsweet requested a review from a team June 2, 2023 01:24
@dylandreimerink dylandreimerink force-pushed the feature/modular-metrics-simplified branch from 9a2c9da to 0961cfb Compare June 2, 2023 11:55
@dylandreimerink
Copy link
Member Author

What is with?

nathanjsweet requested a review from https://github.com/orgs/cilium/teams/sig-policy

@dylandreimerink dylandreimerink force-pushed the feature/modular-metrics-simplified branch 2 times, most recently from a9769da to b4f30d8 Compare June 2, 2023 12:13
@dylandreimerink dylandreimerink requested a review from a team as a code owner June 2, 2023 12:13
@dylandreimerink
Copy link
Member Author

/test

@bimmlerd
Copy link
Member

bimmlerd commented Jun 2, 2023

I think in the last rebase you included one commit more than intended, hence the tophat requirement.

This commit adds the metric cell type which can be used to make metrics
available to hive. The cell works by giving it a constructor function
just like providing a normal type. The returned type should contain
public metric fields, the cell will enforce this. The cell will make
both the populated metric struct as well as all of the individual
metrics available in the DAG. The idea being that users would use the
struct so they can access just the metrics they are interested in. The
metrics registry will collect all individual metrics and register them.

We currently have the concept of metrics being enabled/disabled by
default unless configured otherwise. To support this concept in a
modular way we need additional metadata for metrics. Therefor all
metrics should be defined with the new `metric` package which contains
types that wrap existing prometheus metrics and it adds our own options
such as `Disabled` and `ConfigName`.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
This commit modularizes the metrics package. We now have a registry
cell which collects all metrics exposed via `cell.Metric` and registers
them with an internal prometheus registry. This registry operates
independently now, the HTTP server is part of its lifecycle and the
daemon and daemon config are no longer involved directly.

We still have some metric types which implement dynamic registration
and logic external to the metrics package. The global functions now
await a promise to the Registry so they still work until we can replace/
refactor these external dynamic metrics.

All of the metrics defined in the metrics package have been transformed
into `metrics/metric` types and are make available via a Metrics cell
called `LegacyMetics`. This cell will also assign the individual metrics
to the existing global variables so they can be used until the callsites
can be adjusted to take `LegacyMetrics` as dependency or to split off
some of the metrics contained inside.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
This commits adds a new section to the "Guide to Hive" page explaining
the new `cell.Metric` cell type.

Signed-off-by: Dylan Reimerink <dylan.reimerink@isovalent.com>
@dylandreimerink dylandreimerink force-pushed the feature/modular-metrics-simplified branch from b4f30d8 to 33da195 Compare June 2, 2023 15:22
@dylandreimerink dylandreimerink removed the request for review from a team June 2, 2023 15:23
@dylandreimerink
Copy link
Member Author

I think in the last rebase you included one commit more than intended, hence the tophat requirement.

I excellently did a git rebase upstream/master instread of git rebase upstream/main which included that extra commit, thx for the heads up

@dylandreimerink
Copy link
Member Author

/test

@dylandreimerink dylandreimerink merged commit 5551b76 into cilium:main Jun 2, 2023
61 checks passed
@ysksuzuki ysksuzuki mentioned this pull request Aug 1, 2023
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/metrics Impacts statistics / metrics gathering, eg via Prometheus. area/modularization release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants