services: Fix metric from not publishing #26719
dd8255b to 38e71ab
Odd. I'm reading the docs on
Do we do anything like call Reset or Delete*?
@@ -33,12 +33,6 @@ import (

const anyPort = "*"

var (
	updateMetric = metrics.ServicesCount.WithLabelValues("update")
This code is called before NewLegacyMetrics initializes ServicesCount, so it calls NoOpCounterVec.WithLabelValues (which returns NoOpGauge). That's the reason services_events_total is missing. Is my understanding correct?
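For anyone following along, here is a minimal, self-contained sketch of the ordering problem as I understand it. The types and names (counter, counterVec, servicesCount, newLegacyMetrics, recordedAfterStaleInc) are hypothetical stand-ins that only mimic the shape of the real code; this is not the actual metrics package.

```go
package main

import (
	"fmt"
	"strings"
)

// counter and counterVec are hypothetical stand-ins for Prometheus-style
// metric types; they are not the project's actual definitions.
type counter interface {
	Inc()
	Value() int
}

type counterVec interface {
	WithLabelValues(labels ...string) counter
}

// noOpCounter silently discards every increment.
type noOpCounter struct{}

func (noOpCounter) Inc()       {}
func (noOpCounter) Value() int { return 0 }

// noOpCounterVec hands out no-op counters for any label set.
type noOpCounterVec struct{}

func (noOpCounterVec) WithLabelValues(...string) counter { return noOpCounter{} }

type realCounter struct{ n int }

func (c *realCounter) Inc()       { c.n++ }
func (c *realCounter) Value() int { return c.n }

type realCounterVec struct{ children map[string]*realCounter }

func (v *realCounterVec) WithLabelValues(labels ...string) counter {
	key := strings.Join(labels, ",")
	c, ok := v.children[key]
	if !ok {
		c = &realCounter{}
		v.children[key] = c
	}
	return c
}

// servicesCount starts out as a no-op so early callers never hit a nil value.
var servicesCount counterVec = noOpCounterVec{}

// updateMetric is resolved during package initialization, before
// newLegacyMetrics has run, so it keeps the no-op counter permanently.
var updateMetric = servicesCount.WithLabelValues("update")

// newLegacyMetrics swaps the placeholder for a real vector at startup.
func newLegacyMetrics() {
	servicesCount = &realCounterVec{children: map[string]*realCounter{}}
}

// recordedAfterStaleInc increments through the stale handle and reports
// what the real vector saw.
func recordedAfterStaleInc() int {
	newLegacyMetrics()
	updateMetric.Inc() // goes to the no-op captured at init time
	return servicesCount.WithLabelValues("update").Value()
}

func main() {
	fmt.Println("recorded:", recordedAfterStaleInc()) // prints "recorded: 0"
}
```

The increment is lost because updateMetric was bound to the placeholder before the real vector existed, and swapping the global later does not rebind it.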
Yes, I think you're right. I opened a small thread about this in #development and that's what Andre suspected as well. 👍
cc @chancez
Gotcha. How can we prevent this from reoccurring?
I'm not too sure, to be honest. We have https://github.com/cilium/customvet where we can write our own linting checkers. However, this is the only case that I've found in the code base. Moving forward, there is an effort to move metrics into their modules (see #25651), which should help but is technically not a guarantee.
That seems quite reasonable to me. I guess I didn't question the structure, but now that I think about it a bit, they're likely defined this way (NoOp) because of the problem of having global metrics defined in one place rather than per module. Because they're exposed as global vars, they can be accessed at any time without any guarantee that they've been initialized, so the NoOp wrapper was created to prevent panics / crashes in case that happened. That's my guess.
cc @aanm (for adding NoOp) and @dylandreimerink (for whether we still need the idea of "NoOp" metrics going forward)
> The option is to get rid of these NoOp metrics potentially. The approach we're using is a bit odd from my perspective but I haven't really dug into why we do it this way
The NoOp metrics exist as a stopgap. Currently, a lot of the code base still uses global variables to get metrics, which makes it subject to initialization ordering.
> they're likely defined this way (NoOp) because of the problem of having global metrics defined in one place rather than per module. Due to them being exposed as global vars, they can be accessed at any time without any guarantee that they've been initialized, so NoOp wrapper was created to prevent panics / crashes in case that happened.
Yes, that is indeed correct.
Ideally, all metrics are provided to components via parameters during construction, but this conversion takes some time. We can remove NoOps once all global metric variables are gone.
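To illustrate what providing metrics via constructor parameters could look like, here is a rough sketch. All names (serviceMetrics, newServiceMetrics, serviceManager, newServiceManager, upsert) are hypothetical and chosen only for illustration; the actual module wiring may look quite different.

```go
package main

import "fmt"

// counter is a hypothetical minimal metric type for this sketch.
type counter struct{ n int }

func (c *counter) Inc() { c.n++ }

// serviceMetrics groups the metrics one component needs (hypothetical name).
type serviceMetrics struct {
	Updates *counter
}

// newServiceMetrics builds fully initialized metrics; there is no global
// that could be read before this runs.
func newServiceMetrics() *serviceMetrics {
	return &serviceMetrics{Updates: &counter{}}
}

// serviceManager receives its metrics at construction time, so it can
// never observe an uninitialized or placeholder metric.
type serviceManager struct {
	metrics *serviceMetrics
}

func newServiceManager(m *serviceMetrics) *serviceManager {
	return &serviceManager{metrics: m}
}

// upsert stands in for the code path that records an "update" event.
func (s *serviceManager) upsert() {
	s.metrics.Updates.Inc()
}

func main() {
	m := newServiceMetrics()
	mgr := newServiceManager(m)
	mgr.upsert()
	fmt.Println("updates:", m.Updates.n) // prints "updates: 1"
}
```

Since the component cannot be constructed without its metrics, the ordering question goes away by construction, which is why NoOps would no longer be needed in that world.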
There's an intermediate step I mentioned that wasn't addressed: You can have globals, without NoOp metrics, but not call registry.Register(yourMetric). This avoids ordering issues and nil globals.
Btw, I've updated the second commit's description to describe the global variable vs initialization ordering as the root cause of the bug.
> There's an intermediate step I mentioned that wasn't addressed: You can have globals, without NoOp metrics, but not call registry.Register(yourMetric). This avoids ordering issues and nil globals.
Yes, you could init the global var and then upon creating the registry determine if you want to add that metric to the registry or not.
I am not sure why NoOps were chosen over that approach. It is not something invented for the modular metrics; it was already the way things worked. The main thing that changed was init ordering.
I would argue that removing all global variables in favor of dependency injection ASAP is a better use of cycles than refactoring the NoOps away first.
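For reference, the intermediate step discussed in this thread (real globals, registration decided when the registry is created) might be sketched roughly as follows. The counter and registry types, the servicesEvents variable, and newRegistry are hypothetical stand-ins, not the Prometheus client or the project's registry.

```go
package main

import "fmt"

// counter is a hypothetical metric that is usable from the moment it is
// declared.
type counter struct {
	name string
	n    int
}

func (c *counter) Inc() { c.n++ }

// registry is a hypothetical stand-in for a metrics registry: only
// registered counters are exported.
type registry struct{ collectors []*counter }

func (r *registry) Register(c *counter) {
	r.collectors = append(r.collectors, c)
}

// Gather returns the exported metric values keyed by name.
func (r *registry) Gather() map[string]int {
	out := make(map[string]int, len(r.collectors))
	for _, c := range r.collectors {
		out[c.name] = c.n
	}
	return out
}

// servicesEvents is a real counter from package initialization onward, so
// any handle taken from it early stays valid later.
var servicesEvents = &counter{name: "services_events_total"}

// newRegistry decides at creation time whether the metric is exported.
func newRegistry(enableServicesEvents bool) *registry {
	r := &registry{}
	if enableServicesEvents {
		r.Register(servicesEvents)
	}
	return r
}

func main() {
	servicesEvents.Inc() // safe before any registry exists
	reg := newRegistry(true)
	fmt.Println(reg.Gather())
}
```

With this shape an early increment is kept rather than dropped, and a disabled metric is simply never registered, at the cost of allocating metrics that may never be exported.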
The changes look good to me
The metric is named "cilium_services_events_total", yet the variable is named ServicesCount. The typical code pattern is to name the variable after a substring of the metric name for ease of grep. This commit does so and is a non-functional change.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
8b79650 to 4f66806
When the metric variable is defined as a global variable (within the `var` scope at the package level), it will be instantiated as a NoOp metric. Once the metrics package is initialized, all the metrics variables will transition from NoOp metrics to a real metric type.

This problem occurred because the instantiation of the global variables happened before the metrics package initialization. This commit fixes it by using the metrics variable after the metrics package has been initialized. We can assume it's been initialized when the code executed is production ("live") code.

Fixes: cilium#26511
Fixes: 978b27c ("Metrics: Add services metrics")
Signed-off-by: Chris Tarazi <chris@isovalent.com>
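One way to read "using the metrics variable after the metrics package has been initialized" is to look up the labeled counter at call time instead of caching it in a package-level var. The sketch below assumes that reading; the types and the names servicesEventsCount, initMetrics, and upsertService are hypothetical stand-ins, not the code in this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical stand-ins for the placeholder and real metric types.
type counter interface{ Inc() }

type counterVec interface {
	WithLabelValues(labels ...string) counter
}

type noOpCounter struct{}

func (noOpCounter) Inc() {}

type noOpCounterVec struct{}

func (noOpCounterVec) WithLabelValues(...string) counter { return noOpCounter{} }

type realCounter struct{ n int }

func (c *realCounter) Inc() { c.n++ }

type realCounterVec struct{ children map[string]*realCounter }

func (v *realCounterVec) WithLabelValues(labels ...string) counter {
	key := strings.Join(labels, ",")
	if _, ok := v.children[key]; !ok {
		v.children[key] = &realCounter{}
	}
	return v.children[key]
}

// servicesEventsCount is a no-op until initMetrics replaces it.
var servicesEventsCount counterVec = noOpCounterVec{}

// initMetrics installs the real vector and returns it for inspection.
func initMetrics() *realCounterVec {
	vec := &realCounterVec{children: map[string]*realCounter{}}
	servicesEventsCount = vec
	return vec
}

// upsertService reads the global when it is called, so it sees whatever
// vector is installed at that moment rather than a handle cached at
// package initialization.
func upsertService() {
	servicesEventsCount.WithLabelValues("update").Inc()
}

func main() {
	vec := initMetrics()
	upsertService()
	fmt.Println("recorded:", vec.children["update"].n) // prints "recorded: 1"
}
```

This relies on the call happening only after initialization, which matches the commit's assumption about production ("live") code paths; a call made before initMetrics would still land on the no-op.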
4f66806 to 722fcea
@chancez Is there anything that you think is blocking merging this PR?
/test
Fixes: #26511