-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Export metrics from agent #2127
Conversation
pkg/metrics/util.go
Outdated
log "github.com/sirupsen/logrus" | ||
) | ||
|
||
func Enable(addr string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exported function Enable should have comment or be unexported
pkg/metrics/metrics.go
Outdated
// ConstLabels: prometheus.Labels{"node": "a node ID, from k8s?", | ||
}) | ||
|
||
NumEndpointsRegenerating = prometheus.NewGauge(prometheus.GaugeOpts{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exported var NumEndpointsRegenerating should have comment or be unexported
pkg/metrics/metrics.go
Outdated
// This is the namespace presented to prometheus and should be scoped to us | ||
Namespace = "cilium" | ||
|
||
// Endpoints |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment on exported var NumEndpoints should be of the form "NumEndpoints ..."
pkg/metrics/metrics.go
Outdated
) | ||
|
||
var ( | ||
// This is the namespace presented to prometheus and should be scoped to us |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment on exported var Namespace should be of the form "Namespace ..."
pkg/metrics/util.go
Outdated
log "github.com/sirupsen/logrus" | ||
) | ||
|
||
func Enable(addr string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exported function Enable should have comment or be unexported
pkg/metrics/metrics.go
Outdated
// ConstLabels: prometheus.Labels{"node": "a node ID, from k8s?", | ||
}) | ||
|
||
NumEndpointsRegenerating = prometheus.NewGauge(prometheus.GaugeOpts{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exported var NumEndpointsRegenerating should have comment or be unexported
pkg/metrics/metrics.go
Outdated
// This is the namespace presented to prometheus and should be scoped to us | ||
Namespace = "cilium" | ||
|
||
// Endpoints |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment on exported var NumEndpoints should be of the form "NumEndpoints ..."
pkg/metrics/metrics.go
Outdated
) | ||
|
||
var ( | ||
// This is the namespace presented to prometheus and should be scoped to us |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment on exported var Namespace should be of the form "Namespace ..."
pkg/metrics/util.go
Outdated
log "github.com/sirupsen/logrus" | ||
) | ||
|
||
func Enable(addr string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exported function Enable should have comment or be unexported
pkg/metrics/metrics.go
Outdated
// ConstLabels: prometheus.Labels{"node": "a node ID, from k8s?", | ||
}) | ||
|
||
NumEndpointsRegenerating = prometheus.NewGauge(prometheus.GaugeOpts{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exported var NumEndpointsRegenerating should have comment or be unexported
pkg/metrics/metrics.go
Outdated
// This is the namespace presented to prometheus and should be scoped to us | ||
Namespace = "cilium" | ||
|
||
// Endpoints |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment on exported var NumEndpoints should be of the form "NumEndpoints ..."
pkg/metrics/metrics.go
Outdated
) | ||
|
||
var ( | ||
// This is the namespace presented to prometheus and should be scoped to us |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment on exported var Namespace should be of the form "Namespace ..."
pkg/metrics/metrics.go
Outdated
Help: "Number of endpoints currently regenerating", | ||
}) | ||
|
||
// CountEndpointsRegeneration is a count of the number of times any endpoint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment on exported var CountEndpointsRegenerations should be of the form "CountEndpointsRegenerations ..."
pkg/metrics/metrics.go
Outdated
var ( | ||
registry = prometheus.NewRegistry() | ||
|
||
// This is the namespace presented to prometheus and should be scoped to us |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment on exported var Namespace should be of the form "Namespace ..."
155dac4
to
3fcce78
Compare
Righto. I have, finally, tested this with a real prometheus deployment. The basic metrics export works and should be reviewed. I will follow up with a PR that includes docs, example prometheus configuration and code changes that might come up. For the numbered points I mention above
After some offline discussion, the answer is no.
This is less automatic but it boils down to: prometheus can be configured to match k8s nodes/pods/deploys etc. and collect metrics from matching entities. In our case, this is easy to setup when running as a DaemonSet but a little more complicated as a host-agent. In both cases, the prometheus configuration can further tag/annotate these metrics.
I disabled this since it cannot be namespaced. This might be better if toggled by the debug flag but I'll leave it disabled for now.
This isn't necessary since the prometheus collector can/will tag these metrics. This also leaves this decision up to the user. |
daemon/k8s_watcher.go
Outdated
@@ -258,6 +259,7 @@ func (d *Daemon) EnableK8sWatcher(reSyncPeriod time.Duration) error { | |||
reSyncPeriod, | |||
cache.ResourceEventHandlerFuncs{ | |||
AddFunc: func(obj interface{}) { | |||
metrics.SetTSValue(metrics.LastK8sEventTS, time.Now()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not terribly happy adding this to each and every one of these functions. I'm not sure if there is a better place for it. It can go in the actual handler function but it seemed best to do it closest to the event it captured (if we can't copy the objects we still want to record that we saw an event).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's fine. We may have a generic controller soon to handle all events where this could be moved to eventually.
@@ -113,3 +113,9 @@ github.com/facebookgo/pidfile f242e2999868dcd267a2b86e49ce1f9cf9e15b16 | |||
github.com/facebookgo/atomicfile 2de1f203e7d5e386a6833233882782932729f27e | |||
github.com/cpuguy83/go-md2man 1d903dcb749992f3741d744c0f8376b4bd7eb3e1 | |||
github.com/russross/blackfriday 6d1ef893fcb01b4f50cb6e57ed7df3e2e627b6b2 | |||
github.com/prometheus/client_golang v0.9.0-pre1 | |||
github.com/prometheus/common 2e54d0b93cba2fd133edc32211dcc32c06ef72ca |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was really surprising but the prometheus internal dependencies seemed to not have versions that matched the client. I'm not sure how best to track this but I picked master on the day I committed this :/
pkg/metrics/metrics.go
Outdated
// NumPolicies is the number of policies loaded into the agent | ||
NumPolicies = prometheus.NewGauge(prometheus.GaugeOpts{ | ||
Namespace: Namespace, | ||
Name: "policies", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to just name this policy
to be consistent, we never call it policies
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. I also renamed endpoint and event to be consistent all around :)
daemon/k8s_watcher.go
Outdated
@@ -258,6 +259,7 @@ func (d *Daemon) EnableK8sWatcher(reSyncPeriod time.Duration) error { | |||
reSyncPeriod, | |||
cache.ResourceEventHandlerFuncs{ | |||
AddFunc: func(obj interface{}) { | |||
metrics.SetTSValue(metrics.LastK8sEventTS, time.Now()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's fine. We may have a generic controller soon to handle all events where this could be moved to eventually.
pkg/metrics/metrics.go
Outdated
func SetTSValue(c prometheus.Gauge, ts time.Time) { | ||
// Build time in seconds since the epoch. Prometheus only takes floating | ||
// point values, however, and urges times to be in seconds | ||
scaled_ts := float64(ts.UnixNano()) / float64(1000000000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't use underscores in Go names; var scaled_ts should be scaledTs
Ok, redone. I added event tracking under one metrics for k8s, containerd and API (aka cli). I renamed some of the variables and metric names to be consistent with |
f5c2ee2
to
792fb64
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really cool. I hope adding more metrics will be as easy as it looked while reading this PR :)
Small note, perhaps we should start thinking on having methods to increment the policy revision and to everything in the Prometheus increment and the policy revision increment in a single place.
@@ -0,0 +1,45 @@ | |||
package metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing copywrite header
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
pkg/endpoint/policy.go
Outdated
metrics.EndpointCountRegenerating.Inc() | ||
defer func() { | ||
metrics.EndpointCountRegenerating.Dec() | ||
tags := map[bool]string{true: "succeeded", false: "failed"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use a simply if statement?
if retErr != nil {
metrics.CountEndpointsRegenerations.WithLabelValues("failed").Inc()
} else {
metrics.CountEndpointsRegenerations.WithLabelValues("succeeded").Inc()
}
Do you need to set up a constant for succeeded
and failed
or not necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I made constants for the values... I'm not sure if I should also make constants for the field names...
Signed-off-by: Ray Bejjani <ray@covalent.io>
We use the prometheus client to present a pull-based metrics endpoint. The default collector also exports go runtime information. Signed-off-by: Ray Bejjani <ray@covalent.io>
pkg/metrics/metrics.go
Outdated
// LabelValueOutcomeSuccess is used as a successful outcome of an operation | ||
LabelValueOutcomeSuccess = "success" | ||
|
||
// LabelValueOutcomeSuccess is used as an unsuccessful outcome of an operation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment on exported var LabelValueOutcomeFail should be of the form "LabelValueOutcomeFail ..."
Signed-off-by: Ray Bejjani <ray@covalent.io>
Signed-off-by: Ray Bejjani <ray@covalent.io>
Signed-off-by: Ray Bejjani <ray@covalent.io>
An increment function is a good idea. I don't remember why, but I did think about it and decided not to try in this PR. It might be that the increments are done in different locking contexts but that should still be ok. I can make a followup PR to do that. |
yay! |
The agent can export prometheus metrics if configured. This isn't complete but illustrates the basic parts of the metrics collection and export.
Things to consider
Should we serve metrics under the agent API?
This might be bad, since we'd be mixing concerns. I also don't know if prometheus can collect from a socket without a custom collector.
Some of the things we want to collect in Expose runtime metrics for use in monitoring system #1935 aren't amenable as metrics, but make for decent statistics to present. I'll likely add a
cilium metrics
command that ships back all the metrics we collect, along with the things in Expose runtime metrics for use in monitoring system #1935 that aren't easy to export.It seems like there are annotations to tell k8s to slurp up the metrics from a pod. I will try to include that in our examples.
We collect the go runtime metrics. The might end up being too much noise.
I need to figure out if we can tag all metrics from a node by a node-id of some sort. This might be bad for prometheus, however, since it creates a new date series per label (well, per dynamic one, the static ones may not).
Sample metrics (in the prometheus text format, I think)
Partially fixes: #1935
How To Test
Run agent with
--prometheus-serve-addr ":9090"
(this will bind to all interfaces, you can pass in an address directly"