Opentelemetry environment variables cause the CLI to error without decent error message #2447

Open
HozahAled opened this issue May 6, 2024 · 7 comments
Labels
kind/bug Something isn't working

Comments

@HozahAled

HozahAled commented May 6, 2024

Description

When building with some opentelemetry environment variables set, I get an unhelpful error.

$ docker build .
ERROR: unsupported opentelemetry tracer logging

These environment variables are supported by the opentelemetry java agent, but not within whatever docker build uses.

Reproduce

In a bash shell

export OTEL_TRACES_EXPORTER=logging
export OTEL_METRICS_EXPORTER=logging
export OTEL_LOGS_EXPORTER=logging
export OTEL_METRIC_EXPORT_INTERVAL=15000
docker build .

Expected behavior

Either fail with a more descriptive error message, or print a warning and proceed without whatever otel integration is included.

docker version

Client:
 Cloud integration: v1.0.35+desktop.13
 Version:           26.1.1
 API version:       1.45
 Go version:        go1.21.9
 Git commit:        4cf5afa
 Built:             Tue Apr 30 11:46:57 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Desktop
 Engine:
  Version:          26.1.1
  API version:      1.45 (minimum version 1.24)
  Go version:       go1.21.9
  Git commit:       ac2de55
  Built:            Tue Apr 30 11:48:28 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.31
  GitCommit:        e377cd56a71523140ca6ae87e30244719194a521
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client:
 Version:    26.1.1
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.14.0-desktop.1
    Path:     /usr/local/lib/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.27.0-desktop.2
    Path:     /usr/local/lib/docker/cli-plugins/docker-compose
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.29
    Path:     /usr/local/lib/docker/cli-plugins/docker-debug
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.2
    Path:     /usr/local/lib/docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.23
    Path:     /usr/local/lib/docker/cli-plugins/docker-extension
  feedback: Provide feedback, right in your terminal! (Docker Inc.)
    Version:  v1.0.4
    Path:     /usr/local/lib/docker/cli-plugins/docker-feedback
  init: Creates Docker-related starter files for your project (Docker Inc.)
    Version:  v1.1.0
    Path:     /usr/local/lib/docker/cli-plugins/docker-init
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
    Version:  0.6.0
    Path:     /usr/local/lib/docker/cli-plugins/docker-sbom
  scout: Docker Scout (Docker Inc.)
    Version:  v1.8.0
    Path:     /usr/local/lib/docker/cli-plugins/docker-scout
WARNING: Plugin "/usr/local/lib/docker/cli-plugins/docker-scan" is not valid: failed to fetch metadata: fork/exec /usr/local/lib/docker/cli-plugins/docker-scan: no such file or directory

Server:
 Containers: 3
  Running: 2
  Paused: 0
  Stopped: 1
 Images: 51
 Server Version: 26.1.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: e377cd56a71523140ca6ae87e30244719194a521
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  seccomp
   Profile: unconfined
 Kernel Version: 5.15.133.1-microsoft-standard-WSL2
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 31.21GiB
 Name: docker-desktop
 ID: 824f8d14-c64e-4253-8bfa-0c491149ecc2
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Labels:
  com.docker.desktop.address=unix:///var/run/docker-cli.sock
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5555
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No blkio throttle.read_bps_device support
WARNING: No blkio throttle.write_bps_device support
WARNING: No blkio throttle.read_iops_device support
WARNING: No blkio throttle.write_iops_device support
WARNING: daemon is not using the default seccomp profile

Additional Info

No response

@HozahAled HozahAled added the kind/bug Something isn't working label May 6, 2024
@HozahAled
Author

Reproduced on more recent build, updated description

@thaJeztah
Member

Thanks for reporting!

So, looking at the Exporter Selection section in the OTEL specs, it looks like logging was deprecated as a value for exporters, and should not be supported by implementations;

"logging": Standard Output. It is a deprecated value left for backwards compatibility. It SHOULD NOT be supported by new implementations.

That said, the implementation guidelines for enum values state that unrecognized values must be logged as a warning, but handled gracefully;

Enum value

For variables which accept a known value out of a set, i.e., an enum value, implementations MAY support additional values not listed here. For variables accepting an enum value, if the user provides a value the implementation does not recognize, the implementation MUST generate a warning and gracefully ignore the setting.

The actual error is produced by BuildKit's util/tracing/detect package; https://github.com/moby/buildkit/blob/821fa45cd8960e332b7a7ba55b4f105de16dcd24/util/tracing/detect/detect.go#L86-L90

func detectExporter[T any](envVar string, fn func(d ExporterDetector) (T, bool, error)) (exp T, err error) {
	if n := os.Getenv(envVar); n != "" {
		d, ok := detectors[n]
		if !ok {
			return exp, errors.Errorf("unsupported opentelemetry exporter %v", n)

That package is currently used both in the docker daemon, and in buildx (which is used as CLI plugin to run docker build).

Doing some testing, it looks like this error only affects docker build and other commands backed by buildx (so I'll move this ticket to the buildx repository);

export OTEL_TRACES_EXPORTER=logging
export OTEL_METRICS_EXPORTER=logging
export OTEL_LOGS_EXPORTER=logging
export OTEL_METRIC_EXPORT_INTERVAL=15000

echo 'FROM scratch' | docker build -
ERROR: unsupported opentelemetry exporter logging

docker builder ls
NAME/NODE     DRIVER/ENDPOINT   STATUS    BUILDKIT   PLATFORMS
default*      docker
 \_ default    \_ default       error

Failed to get status for default (default): unsupported opentelemetry exporter logging

docker builder prune
WARNING! This will remove all dangling build cache. Are you sure you want to continue? [y/N] y
ERROR: unsupported opentelemetry exporter logging

Other commands continue to work successfully;

docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES

It looks like the docker daemon itself handles this gracefully, although the error appears twice in the daemon logs; BuildKit code is logging it incorrectly as an ERROR (not a WARNING);

export OTEL_TRACES_EXPORTER=logging
export OTEL_METRICS_EXPORTER=logging
export OTEL_LOGS_EXPORTER=logging
export OTEL_METRIC_EXPORT_INTERVAL=15000

dockerd
INFO[2024-05-07T07:58:08.815562967Z] Starting up
...
...
WARN[2024-05-07T07:58:09.828211926Z] Failed to initialize tracing, skipping        error="unsupported opentelemetry tracer logging"
...
...
INFO[2024-05-07T07:58:10.122811176Z] Docker daemon                                 commit=ac2de55 containerd-snapshotter=false storage-driver=overlay2 version=26.1.1
INFO[2024-05-07T07:58:10.122850051Z] Daemon has completed initialization
ERRO[2024-05-07T07:58:10.135250134Z] Failed to detect trace exporter for buildkit controller  error="unsupported opentelemetry tracer logging"
INFO[2024-05-07T07:58:10.140401717Z] API listen on /var/run/docker.sock

@thaJeztah thaJeztah transferred this issue from docker/cli May 7, 2024
@thaJeztah
Member

Moved this to the buildx repository, but it looks like there are a couple of issues to look at if we want to follow the OTEL specification;

  • buildx (docker buildx ... / docker build) should produce a warning instead of a fatal error, and handle it gracefully
  • on the daemon side, it looks like it's logged as an ERROR by the BuildKit controller; this should likely be a WARN (matching the log-level used by the docker daemon itself).
  • we should check if other (CLI) components handle these env-vars and, if they do, whether they should print a warning (if they currently silently ignore them)
  • perhaps for the docker daemon; should we include a warning in the docker info output?

cc @jsternberg @milas @krissetto @Benehiko

@krissetto

  • we should check if other (CLI) components handle these env-vars and, if they do, whether they should print a warning (if they currently silently ignore them)

On the CLI we currently don't explicitly check for those variables, and we don't use particular detectors either.
I remember hearing that @jsternberg wanted to generalize the detectors in buildkit so they could then be consolidated into the CLI (or some otel pkg) and reused across the projects, is that still a plan? (I could be remembering something wrong here)

So yes, at the moment the CLI gracefully continues when the env vars are set to invalid values, but silently, since they just aren't handled explicitly. I'm not sure if the otel SDK uses them internally though, but I see no particular errors

@jsternberg
Collaborator

We can likely move this to buildkit and fix it there since that's where the bug originates.

Yes, a future idea is to refactor the detect package so that it can be moved out of buildkit and used more generally across the moby/docker stack. For the CLI, it's just because, as @krissetto mentioned, we haven't implemented user environment variables so these variables aren't even checked. But I agree with turning this into a warning and logging it with otel.Handle and then overwriting the otel error handler on a per-application basis.
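
A rough sketch of that pattern, using only the standard library: the shared code reports the problem through a global hook and carries on, and each application installs its own handler to decide how it is surfaced. The `handle` and `setErrorHandler` functions below are local stand-ins that only mirror the idea behind `otel.Handle` and `otel.SetErrorHandler`; they are not the OpenTelemetry SDK, and `detect` is a hypothetical placeholder for the detect package.

```go
package main

import (
	"fmt"
	"sync"
)

// errorHandler is a local stand-in for an application-provided error hook.
type errorHandler func(error)

var (
	mu sync.RWMutex
	// handler is the default hook; applications are expected to replace it.
	handler errorHandler = func(err error) { fmt.Println("otel error:", err) }
)

// setErrorHandler lets each application decide how shared-library errors surface.
func setErrorHandler(h errorHandler) {
	mu.Lock()
	defer mu.Unlock()
	handler = h
}

// handle forwards a non-fatal error to the currently installed handler.
func handle(err error) {
	if err == nil {
		return
	}
	mu.RLock()
	h := handler
	mu.RUnlock()
	h(err)
}

// detect plays the role of the shared library: an unsupported name is
// reported through handle and the caller simply gets false back.
func detect(name string) bool {
	if name != "otlp" {
		handle(fmt.Errorf("unsupported opentelemetry exporter %s", name))
		return false
	}
	return true
}

func main() {
	// A CLI might print a warning; a daemon might log at WARN level instead.
	setErrorHandler(func(err error) { fmt.Println("WARNING:", err) })
	detect("logging")
}
```

The shared code never decides between warning and error here; that choice sits with whichever application installed the handler.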

@thaJeztah
Member

@jsternberg is this error coming from the daemon, or is it in the vendor code used in buildx? (I guess I assumed it was client side, because there was no "error response from daemon" prefix, but I guess we only add that prefix for the Engine API, and buildx may not do that when using the BuildKit gRPC API.)

@jsternberg
Collaborator

I believe it's coming from the vendor code used in buildx so it is an error coming from buildx but the fix happens in buildkit. My assumption is this same problem would also occur in buildkit. We have an environment variable (OTEL_IGNORE_ERROR) that suppresses the error, but it seems like the OTEL standard is for that to be the default.
