feat!: Switch metrics to OpenTelemetry and add support for push metrics (#1887)

OpenCensus is now EOL and OpenTelemetry is stable ('ish) enough to
migrate to.

The metrics names have not changed but, due to quirks in exporters,
there could be minor breaking changes to existing dashboards with this
change.

Also adds support for pushing metrics through OTLP. 

Fixes #341

---------

Signed-off-by: Charith Ellawala <charith@cerbos.dev>
charithe committed Nov 23, 2023
1 parent 1722454 commit ce425d9
Showing 29 changed files with 669 additions and 786 deletions.
2 changes: 1 addition & 1 deletion cmd/cerbos/run/run.go
@@ -204,7 +204,7 @@ func (c *Cmd) startPDP(ctx context.Context) (*pdpInstance, error) {
 	instance.stopFn = stopFn

 	c.goroutine(func() {
-		instance.errors <- server.Start(serverCtx, false)
+		instance.errors <- server.Start(serverCtx)
 		close(instance.errors)
 	})

17 changes: 13 additions & 4 deletions cmd/cerbos/server/server.go
@@ -19,6 +19,7 @@ import (

 	"github.com/cerbos/cerbos/internal/config"
 	"github.com/cerbos/cerbos/internal/observability/logging"
+	"github.com/cerbos/cerbos/internal/observability/otel"
 	"github.com/cerbos/cerbos/internal/observability/tracing"
 	"github.com/cerbos/cerbos/internal/server"
 )
@@ -52,7 +53,6 @@ type Cmd struct {
 	Config string `help:"Path to config file" optional:"" placeholder:".cerbos.yaml" env:"CERBOS_CONFIG"`
 	HubBundle string `help:"Use Cerbos Hub to pull the policy bundle with the given label. Overrides the store defined in the configuration." optional:"" env:"CERBOS_HUB_BUNDLE,CERBOS_CLOUD_BUNDLE"`
 	Set []string `help:"Config overrides" placeholder:"server.adminAPI.enabled=true"`
-	ZPagesEnabled bool `help:"Enable zpages" hidden:""`
 }

 func (c *Cmd) Run() error {
@@ -61,16 +61,25 @@ func (c *Cmd) Run() error {

 	logging.InitLogging(ctx, string(c.LogLevel))
 	defer zap.L().Sync() //nolint:errcheck

 	log := zap.S().Named("server")

 	undo, err := maxprocs.Set(maxprocs.Logger(log.Infof))
 	defer undo()

 	if err != nil {
 		log.Warnw("Failed to adjust GOMAXPROCS", "error", err)
 	}

+	// initialize metrics
+	metricsDone, err := otel.InitMetrics(ctx, otel.Env(os.LookupEnv))
+	if err != nil {
+		return err
+	}
+	defer func() {
+		if err := metricsDone(); err != nil {
+			log.Warnw("Metrics exporter did not shutdown cleanly", "error", err)
+		}
+	}()

 	if c.DebugListenAddr != "" {
 		startDebugListener(c.DebugListenAddr)
 		defer agent.Close()
@@ -118,7 +127,7 @@ func (c *Cmd) Run() error {
 		}
 	}()

-	if err := server.Start(ctx, c.ZPagesEnabled); err != nil {
+	if err := server.Start(ctx); err != nil {
 		log.Errorw("Failed to start server", "error", err)
 		return err
 	}
3 changes: 2 additions & 1 deletion docs/modules/configuration/nav.adoc
@@ -2,8 +2,9 @@
 * xref:audit.adoc[Audit]
 * xref:auxdata.adoc[AuxData]
 * xref:engine.adoc[Engine]
+* xref:observability.adoc[Observability (metrics and traces)]
 * xref:schema.adoc[Schema]
 * xref:server.adoc[Server]
 * xref:storage.adoc[Storage]
 * xref:telemetry.adoc[Telemetry]
-* xref:tracing.adoc[Tracing]
+* xref:tracing.adoc[Tracing (deprecated)]
99 changes: 99 additions & 0 deletions docs/modules/configuration/pages/observability.adoc
@@ -0,0 +1,99 @@
include::ROOT:partial$attributes.adoc[]

= Observability

Cerbos is designed from the ground up to be cloud native and has first-class support for observability via OpenTelemetry metrics and distributed traces.

[#metrics]
== Metrics

By default, Cerbos exposes a metrics endpoint at `/_cerbos/metrics` that can be scraped by Prometheus or other metrics scrapers that support the Prometheus metrics format. This endpoint can be disabled by setting `server.metricsEnabled` configuration value to `false` (see xref:server.adoc[]).

Cerbos also has support for OpenTelemetry protocol (OTLP) push metrics. It can be configured using link:https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/[OpenTelemetry environment variables]. The following environment variables are supported.

[%header,cols=".^1m,6a",grid=rows]
|===
| Environment variable | Description

| OTEL_EXPORTER_OTLP_METRICS_ENDPOINT or OTEL_EXPORTER_OTLP_ENDPOINT
| Address of the OTLP metrics receiver (for example: `https://localhost:9090/api/v1/otlp/v1/metrics`). If not defined, OTLP metrics are disabled.

| OTEL_EXPORTER_OTLP_METRICS_INSECURE or OTEL_EXPORTER_OTLP_INSECURE
| Skip validating the TLS certificate of the endpoint

| OTEL_EXPORTER_OTLP_METRICS_CERTIFICATE or OTEL_EXPORTER_OTLP_CERTIFICATE
| Path to the certificate to use for validating the server's TLS credentials.

| OTEL_EXPORTER_OTLP_METRICS_CLIENT_CERTIFICATE or OTEL_EXPORTER_OTLP_CLIENT_CERTIFICATE
| Path to the client certificate to use for mTLS

| OTEL_EXPORTER_OTLP_METRICS_CLIENT_KEY or OTEL_EXPORTER_OTLP_CLIENT_KEY
| Path to the client key to use for mTLS

| OTEL_EXPORTER_OTLP_METRICS_PROTOCOL or OTEL_EXPORTER_OTLP_PROTOCOL
| OTLP protocol. Supported values are `grpc` and `http/protobuf`. Defaults to `grpc`.

| OTEL_METRIC_EXPORT_INTERVAL
| The export interval in milliseconds. Defaults to 60000.

| OTEL_METRIC_EXPORT_TIMEOUT
| Timeout for exporting the data in milliseconds. Defaults to 30000.

| OTEL_METRICS_EXPORTER
| Set to `otlp` to enable the OTLP exporter. Defaults to `prometheus`.
|===

Refer to https://opentelemetry.io/docs/specs/otel/protocol/exporter/ for more information about exporter configuration through environment variables. Note that the OpenTelemetry Go SDK used by Cerbos might not have full support for some of the environment variables listed on the OpenTelemetry specification.

TIP: `OTEL_METRICS_EXPORTER` and `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT` are the only required environment variables to enable OTLP metrics.

[#traces]
== Traces

Cerbos supports distributed tracing to provide insights into application performance and request lifecycle. Traces from Cerbos can be exported to any compatible collector that supports the OpenTelemetry protocol (OTLP).

Trace configuration should be done using link:https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/[OpenTelemetry environment variables]. The following environment variables are supported.

[%header,cols=".^1m,6a",grid=rows]
|===
| Environment variable | Description

| OTEL_SERVICE_NAME
| Service name reported in the traces. Defaults to `cerbos`.

| OTEL_TRACES_SAMPLER
| link:https://opentelemetry.io/docs/specs/otel/trace/sdk/#sampling[Trace sampler]. Defaults to `parentbased_always_off`. Supported values: +
--
`always_on`:: Record every trace.
`always_off`:: Don't record any traces.
`traceidratio`:: Record a fraction of traces based on ID. Set `OTEL_TRACES_SAMPLER_ARG` to a value between 0 and 1 to define the fraction.
`parentbased_always_on`:: Record all traces except those where the parent span is not sampled.
`parentbased_always_off`:: Don't record any traces unless the parent span is sampled.
`parentbased_traceidratio`:: Record a fraction of traces where the parent span is sampled. Set `OTEL_TRACES_SAMPLER_ARG` to a value between 0 and 1 to define the fraction.
--

| OTEL_TRACES_SAMPLER_ARG
| Set the sampling ratio when `OTEL_TRACES_SAMPLER` is a ratio-based sampler. Defaults to `0.1`.

| OTEL_EXPORTER_OTLP_TRACES_ENDPOINT or OTEL_EXPORTER_OTLP_ENDPOINT
| Address of the OTLP collector (for example: `https://localhost:4317`). If not defined, traces are disabled.

| OTEL_EXPORTER_OTLP_TRACES_INSECURE or OTEL_EXPORTER_OTLP_INSECURE
| Skip validating the TLS certificate of the endpoint

| OTEL_EXPORTER_OTLP_TRACES_CERTIFICATE or OTEL_EXPORTER_OTLP_CERTIFICATE
| Path to the certificate to use for validating the server's TLS credentials.

| OTEL_EXPORTER_OTLP_TRACES_CLIENT_CERTIFICATE or OTEL_EXPORTER_OTLP_CLIENT_CERTIFICATE
| Path to the client certificate to use for mTLS

| OTEL_EXPORTER_OTLP_TRACES_CLIENT_KEY or OTEL_EXPORTER_OTLP_CLIENT_KEY
| Path to the client key to use for mTLS

| OTEL_EXPORTER_OTLP_TRACES_PROTOCOL or OTEL_EXPORTER_OTLP_PROTOCOL
| OTLP protocol. Supported values are `grpc` and `http/protobuf`. Defaults to `grpc`.
|===

Refer to https://opentelemetry.io/docs/specs/otel/protocol/exporter/ for more information about exporter configuration through environment variables. Note that the OpenTelemetry Go SDK used by Cerbos might not have full support for some of the environment variables listed on the OpenTelemetry specification.

TIP: `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` is the only required environment variable to enable OTLP trace exports.
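The tables in the new page pair each signal-specific variable with a general one (for example `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT` or `OTEL_EXPORTER_OTLP_ENDPOINT`) and list defaults for the protocol and the sampler. A minimal sketch of how such a lookup could be resolved is shown below. The names `lookupFunc`, `firstSet`, `traceConfig`, and `loadTraceConfig` are hypothetical, and two behaviors are assumptions of this sketch rather than documented Cerbos behavior: the signal-specific variable is consulted before the general one, and an empty value is treated as unset.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// lookupFunc has the same shape as os.LookupEnv, so tests can swap in a map.
type lookupFunc func(string) (string, bool)

// firstSet returns the value of the first variable in names that is set
// and non-empty (treating empty as unset is an assumption of this sketch).
func firstSet(lookup lookupFunc, names ...string) (string, bool) {
	for _, name := range names {
		if v, ok := lookup(name); ok && v != "" {
			return v, true
		}
	}
	return "", false
}

// traceConfig is a hypothetical holder for the documented trace settings.
type traceConfig struct {
	Endpoint   string // empty means trace export is disabled
	Protocol   string
	Sampler    string
	SamplerArg float64
}

// loadTraceConfig applies the documented defaults, then overrides them
// from the environment, checking the signal-specific variable first.
func loadTraceConfig(lookup lookupFunc) (traceConfig, error) {
	cfg := traceConfig{
		Protocol:   "grpc",
		Sampler:    "parentbased_always_off",
		SamplerArg: 0.1,
	}

	if v, ok := firstSet(lookup, "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT", "OTEL_EXPORTER_OTLP_ENDPOINT"); ok {
		cfg.Endpoint = v
	}

	if v, ok := firstSet(lookup, "OTEL_EXPORTER_OTLP_TRACES_PROTOCOL", "OTEL_EXPORTER_OTLP_PROTOCOL"); ok {
		if v != "grpc" && v != "http/protobuf" {
			return cfg, fmt.Errorf("unsupported OTLP protocol %q", v)
		}
		cfg.Protocol = v
	}

	if v, ok := firstSet(lookup, "OTEL_TRACES_SAMPLER"); ok {
		cfg.Sampler = v
	}

	if v, ok := firstSet(lookup, "OTEL_TRACES_SAMPLER_ARG"); ok {
		f, err := strconv.ParseFloat(v, 64)
		// The negated range check also rejects NaN.
		if err != nil || !(f >= 0 && f <= 1) {
			return cfg, fmt.Errorf("sampler argument %q must be a number between 0 and 1", v)
		}
		cfg.SamplerArg = f
	}

	return cfg, nil
}

func main() {
	cfg, err := loadTraceConfig(os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Accepting a lookup function instead of calling `os.LookupEnv` directly is the same idea as the `otel.Env(os.LookupEnv)` argument in the `server.go` change: it keeps the resolution logic testable without mutating the process environment.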
53 changes: 3 additions & 50 deletions docs/modules/configuration/pages/tracing.adoc
@@ -1,60 +1,13 @@
 include::ROOT:partial$attributes.adoc[]

-= Distributed traces
+= Tracing block

-Cerbos supports distributed tracing to provide insights into application performance and request lifecycle. Traces from Cerbos can be exported to any compatible collector that uses the OpenTelemetry OTLP protocol.
-
-Trace configuration should be done using link:https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/[OpenTelemetry environment variables]. The following environment variables are supported.
-
-[%header,cols=".^1m,6a",grid=rows]
-|===
-| Environment variable | Description
-
-| OTEL_SDK_DISABLED
-| Disable traces if set to `true`
-
-| OTEL_SERVICE_NAME
-| Service name reported in the traces. Defaults to `cerbos`.
-
-| OTEL_TRACES_SAMPLER
-| link:https://opentelemetry.io/docs/specs/otel/trace/sdk/#sampling[Trace sampler]. Defaults to `parentbased_always_off`. Supported values: +
---
-`always_on`:: Record every trace.
-`always_off`:: Don't record any traces.
-`traceidratio`:: Record a fraction of traces based on ID. Set `OTEL_TRACES_SAMPLER_ARG` to a value between 0 and 1 to define the fraction.
-`parentbased_always_on`:: Record all traces except those where the parent span is not sampled.
-`parentbased_always_off`:: Don't record any traces unless the parent span is sampled.
-`parentbased_traceidratio`:: Record a fraction of traces where the parent span is sampled. Set `OTEL_TRACES_SAMPLER_ARG` to a value between 0 and 1 to define the fraction.
---
-
-| OTEL_TRACES_SAMPLER_ARG
-| Set the sampling ratio when `OTEL_TRACES_SAMPLER` is a ratio-based sampler. Defaults to `0.1`.
-
-| OTEL_EXPORTER_OTLP_TRACES_ENDPOINT or OTEL_EXPORTER_OTLP_ENDPOINT
-| Address of the OTLP collector (for example: `https://localhost:4317`). If not defined, traces are disabled.
-
-| OTEL_EXPORTER_OTLP_TRACES_INSECURE or OTEL_EXPORTER_OTLP_INSECURE
-| Skip validating the TLS certificate of the endpoint
-
-| OTEL_EXPORTER_OTLP_TRACES_CERTIFICATE or OTEL_EXPORTER_OTLP_CERTIFICATE
-| Path to the certificate to use for validating the server's TLS credentials.
-
-| OTEL_EXPORTER_OTLP_TRACES_CLIENT_CERTIFICATE or OTEL_EXPORTER_OTLP_CLIENT_CERTIFICATE
-| Path to the client certificate to use for mTLS
-
-| OTEL_EXPORTER_OTLP_TRACES_CLIENT_KEY or OTEL_EXPORTER_OTLP_CLIENT_KEY
-| Path to the client key to use for mTLS
-
-| OTEL_EXPORTER_OTLP_TRACES_PROTOCOL or OTEL_EXPORTER_OTLP_PROTOCOL
-| OTLP protocol. Supported values are `grpc` and `http/protobuf`. Defaults to `grpc`.
-|===
-
-Refer to https://opentelemetry.io/docs/specs/otel/protocol/exporter/ for more information about exporter configuration through environment variables. Note that the OpenTelemetry Go SDK used by Cerbos might not have full support for some of the environment variables listed on the OpenTelemetry specification.
+IMPORTANT: The `tracing` block is deprecated and will be removed in Cerbos 0.33.0. Refer to xref:observability.adoc#traces[observability configuration] for information about configuring traces.

 [#migration]
 == Migrating tracing configuration from previous Cerbos versions

-From Cerbos 0.32.0, the preferred method of trace configuration is through the OpenTelemetry environment variables described above. The `tracing` section of the Cerbos configuration file is deprecated and will be removed in Cerbos 0.33.0. Native Jaeger protocol is deprecated as well and will be removed in Cerbos 0.33.0. Follow the instructions below to migrate your existing configuration.
+From Cerbos 0.32.0, the preferred method of trace configuration is through the OpenTelemetry environment variables described in xref:observability.adoc#traces[observability configuration]. The `tracing` section of the Cerbos configuration file is deprecated and will be removed in Cerbos 0.33.0. Native Jaeger protocol is deprecated as well and will be removed in Cerbos 0.33.0. Follow the instructions below to migrate your existing configuration.

 [%header,cols=".^1m,6a",grid=rows]
 |===