CFP: Add hive startup metric #68

@ILL1A

This CFP proposes adding metrics to the high-level Hive structure so that hive startup time can be tracked and reported along with the other metrics of the provider (e.g. Cilium). The newly added metric should have the same value as the one shown in the logs next to "Started hive":

hive/hive.go

Line 389 in a129bcc

log.Info("Started hive", "duration", time.Since(start))

While this is the initial motivation, the idea is to implement the mechanism in hive so that it can be reused for future use cases.
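One way to keep the metric identical to the logged value is to compute the duration once and hand the same value to both the logger and the metric. The following is a minimal sketch of that idea, not hive code: `recorder`, `lastValue`, and `start` are hypothetical names, and `fmt.Println` stands in for the real logger.

```go
package main

import (
	"fmt"
	"time"
)

// recorder is a hypothetical sink for the startup duration (not part of hive).
type recorder interface {
	StartupDuration(d time.Duration)
}

// lastValue is a toy recorder that remembers the last duration it received.
type lastValue struct{ d time.Duration }

func (l *lastValue) StartupDuration(d time.Duration) { l.d = d }

// start runs the startup function, measures it once, and passes the same
// value to the log line and to the recorder. It returns the logged duration.
func start(startup func(), r recorder) time.Duration {
	begin := time.Now()
	startup()
	elapsed := time.Since(begin) // measured once, so log and metric cannot drift apart
	fmt.Println("Started hive, duration:", elapsed)
	r.StartupDuration(elapsed)
	return elapsed
}

func main() {
	r := &lastValue{}
	logged := start(func() { time.Sleep(10 * time.Millisecond) }, r)
	fmt.Println("metric matches log:", logged == r.d)
}
```

Calling `time.Since(start)` separately for the log and for the metric would produce two slightly different values, which is why the sketch stores the result in a variable first.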

Background

Following the PR created quite some time ago about making hive timeouts configurable (#60), this CFP proposes adding a metric to track the actual startup time. This would make it possible to set SLOs and monitor regressions.

Solutions

Proposed solution

In my opinion, perhaps the most generic and scalable solution would be to implement a singleton HiveMetrics structure (similar to what Cilium has for each metric in pkg/metrics/metrics.go) and initialize it with a concrete implementation somewhere inside the metrics provider (e.g. cilium-agent):

type Metrics interface {
  StartupTimeout(duration time.Duration)
}

type NopMetrics struct {}

...

var _ Metrics = NopMetrics{}

var HiveMetrics Metrics = NopMetrics{}

In, for example, cilium-agent:

type hiveMetricsImpl struct{}

func (hiveMetricsImpl) StartupTimeout(duration time.Duration) {
  metrics.HiveStartupDuration.WithLabelValues(<some-label>).Set(duration.Seconds())
}

...

hive.HiveMetrics = hiveMetricsImpl{}
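To show how the pieces of the singleton approach fit together, here is a self-contained sketch. It assumes hive would call the singleton once startup finishes. `startHive` and `gaugeMetrics` are hypothetical stand-ins for the real start path and for a Cilium-side implementation backed by an actual gauge.

```go
package main

import (
	"fmt"
	"time"
)

// Metrics mirrors the interface proposed above.
type Metrics interface {
	StartupTimeout(duration time.Duration)
}

// NopMetrics discards everything, so hive works without any provider.
type NopMetrics struct{}

func (NopMetrics) StartupTimeout(time.Duration) {}

var _ Metrics = NopMetrics{}

// HiveMetrics is the package-level singleton a provider may replace
// before the hive is started.
var HiveMetrics Metrics = NopMetrics{}

// gaugeMetrics is a hypothetical provider-side implementation. It stores
// the value in seconds, as a gauge would, and counts how often it is called.
type gaugeMetrics struct {
	seconds float64
	calls   int
}

func (g *gaugeMetrics) StartupTimeout(d time.Duration) {
	g.seconds = d.Seconds()
	g.calls++
}

// startHive is a hypothetical stand-in for the hive start path.
func startHive(work func()) {
	begin := time.Now()
	work()
	HiveMetrics.StartupTimeout(time.Since(begin))
}

func main() {
	startHive(func() {}) // default NopMetrics: nothing is recorded

	g := &gaugeMetrics{}
	HiveMetrics = g // the provider installs its implementation
	startHive(func() { time.Sleep(5 * time.Millisecond) })
	fmt.Printf("hive startup took %.3fs (calls: %d)\n", g.seconds, g.calls)
}
```

Note that a package-level variable is shared by every hive in the process, so two hives started in the same binary would report to the same implementation.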

A similar idea would be not to make HiveMetrics a singleton, but rather to embed it inside the Hive structure. The implementation would be similar, except that HiveMetrics is initialized from RunOptions:

func WithMetricsProvider(provider Metrics) RunOptionFunc {
  return func(h *Hive) {
    h.opts.MetricsProvider = provider
  }
}

hive.Run() would then be called with WithMetricsProvider(), passing an implementation specific to Cilium.
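The option-based variant can be sketched in the same way. The `Hive`, `runOptions`, and `Run` definitions below are simplified assumptions made for illustration and do not match the real hive signatures. `printMetrics` is a hypothetical provider.

```go
package main

import (
	"fmt"
	"time"
)

// Metrics mirrors the interface proposed above.
type Metrics interface {
	StartupTimeout(duration time.Duration)
}

// NopMetrics is the fallback when no provider is configured.
type NopMetrics struct{}

func (NopMetrics) StartupTimeout(time.Duration) {}

// runOptions and Hive are simplified stand-ins for the real hive types.
type runOptions struct {
	MetricsProvider Metrics
}

type Hive struct {
	opts runOptions
}

type RunOptionFunc func(h *Hive)

// WithMetricsProvider sets the per-hive metrics implementation.
func WithMetricsProvider(provider Metrics) RunOptionFunc {
	return func(h *Hive) {
		h.opts.MetricsProvider = provider
	}
}

// Run applies the options, performs the (simulated) startup work, and
// reports the elapsed time to the configured provider.
func (h *Hive) Run(work func(), opts ...RunOptionFunc) {
	h.opts.MetricsProvider = NopMetrics{} // safe default
	for _, opt := range opts {
		opt(h)
	}
	begin := time.Now()
	work()
	h.opts.MetricsProvider.StartupTimeout(time.Since(begin))
}

// printMetrics is a hypothetical provider that prints and counts calls.
type printMetrics struct{ calls int }

func (p *printMetrics) StartupTimeout(d time.Duration) {
	p.calls++
	fmt.Println("hive startup duration:", d)
}

func main() {
	p := &printMetrics{}
	(&Hive{}).Run(func() {}, WithMetricsProvider(p))
	(&Hive{}).Run(func() {}) // no option given: falls back to NopMetrics
	fmt.Println("provider calls:", p.calls)
}
```

Because the provider lives on the Hive value, each hive can report to a different implementation, which the singleton cannot offer.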

While this option also seems reasonable, the first one looks more natural and similar to what Cilium already has.

Both approaches would make it easy to extend hive with new high-level metrics without any structural modifications.

Alternatives

Another approach could be to use pre/post start hooks (i.e. callbacks) defined in the metrics provider. Those hooks could be stored inside Options and initialized from RunOptions:

type hiveHook func()

type hookWrapper struct {
  preHook hiveHook
  postHook hiveHook
}

type Options struct {
  ...

  startHooks hookWrapper
}

func WithStartHooks(preHook, postHook hiveHook) RunOptionFunc {
  return func(h *Hive) {
    h.opts.startHooks = hookWrapper{preHook, postHook}
  }
}

And then in, for example, Cilium:

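// Pointer receivers are needed so that the span started in the pre-hook
// is still available in the post-hook.
func (hm *hiveMetrics) preStartHook() {
  hm.hiveDuration = spanstat.Start()
}

func (hm *hiveMetrics) postStartHook() {
  metrics.HiveStartupDuration.WithLabelValues(<some-label>).Set(hm.hiveDuration.End(true).Total().Seconds())
}

// The hooks are passed as method values, not called.
New().WithStartHooks(hm.preStartHook, hm.postStartHook)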
...
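For comparison, here is a self-contained sketch of the hook-based alternative. The `Hive`, `Options`, and `Run` definitions are simplified assumptions and not the real hive API. The provider measures time with `time.Now` and `time.Since` instead of Cilium's spanstat package so the example has no external dependencies.

```go
package main

import (
	"fmt"
	"time"
)

type hiveHook func()

type hookWrapper struct {
	preHook  hiveHook
	postHook hiveHook
}

// Options and Hive are simplified stand-ins for the real hive types.
type Options struct {
	startHooks hookWrapper
}

type Hive struct {
	opts Options
}

type RunOptionFunc func(h *Hive)

// WithStartHooks registers callbacks invoked right before and after start.
func WithStartHooks(preHook, postHook hiveHook) RunOptionFunc {
	return func(h *Hive) {
		h.opts.startHooks = hookWrapper{preHook, postHook}
	}
}

// Run applies the options and wraps the (simulated) startup work in the hooks.
func (h *Hive) Run(work func(), opts ...RunOptionFunc) {
	for _, opt := range opts {
		opt(h)
	}
	if pre := h.opts.startHooks.preHook; pre != nil {
		pre()
	}
	work()
	if post := h.opts.startHooks.postHook; post != nil {
		post()
	}
}

// hiveMetrics is a hypothetical provider-side type. Its methods use
// pointer receivers so the start time set in the pre-hook is still
// visible in the post-hook.
type hiveMetrics struct {
	begin   time.Time
	seconds float64
	done    bool
}

func (hm *hiveMetrics) preStartHook() { hm.begin = time.Now() }

func (hm *hiveMetrics) postStartHook() {
	hm.seconds = time.Since(hm.begin).Seconds()
	hm.done = true
}

func main() {
	hm := &hiveMetrics{}
	(&Hive{}).Run(
		func() { time.Sleep(5 * time.Millisecond) },
		WithStartHooks(hm.preStartHook, hm.postStartHook),
	)
	fmt.Printf("hive startup took %.3fs\n", hm.seconds)
}
```

In this variant hive itself never sees a duration. The provider derives it from the two callbacks, so the reported value may differ slightly from the one hive logs.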

I am not a big fan of this solution, as I believe its use cases are limited, though I might be missing something. On the plus side, it does not require any metric-specific or provider-specific logic to be introduced in cilium/hive.

Summary

I believe there might be better solutions to this problem, so I would be happy to hear them, along with any general feedback on this CFP.
