Summary
docker-agent run … can hang forever before the TUI ever appears, with no error and no log output, when the configured Docker credential helper (docker-credential-desktop on macOS) gets stuck. The agent cannot be cancelled with Ctrl+C (it has not even reached its main loop yet) and consumes no CPU. Multiple invocations in parallel terminals all wedge in the same way.
Environment
- macOS 26.5.1 (arm64), Apple Silicon
- Docker Desktop 4.78.0 running and otherwise functional (docker info works)
- ~/.docker/config.json has "credsStore": "desktop"
- docker-agent built from current main
- Config used: a multi-agent YAML with mcps: entries that reference Docker-hosted MCPs (e.g. ref: docker:context7) and sub-agents pulled from registries (docker/gordon:latest, docker/grafana-agent).
Reproduction
- Have ~/.docker/config.json with "credsStore": "desktop".
- Get docker-credential-desktop into a stuck state. In my case it happened spontaneously; the hang can be observed outside docker-agent by running:
  echo '{"ServerURL":"https://index.docker.io/v1/"}' | docker-credential-desktop get
  and observing that the helper never returns. (Possibly a stale lock or dismissed auth prompt inside Docker Desktop: the helper's open() syscall blocks indefinitely while Docker Desktop itself answers /ping and docker info normally.)
- Run docker-agent run my-config.yaml. It hangs forever before any UI is drawn.
Observed behaviour
Two stuck instances captured live:
PID    PPID   ELAPSED  COMMAND
97057  93702  4m40s    docker-agent run …/jarvis.yaml
98100  97057  3m39s    └─ docker-credential-desktop get   ← child of docker-agent
98545  98351  2m32s    docker-agent run …/jarvis.yaml
98557  98545  3m22s    └─ docker-credential-desktop get   ← child of docker-agent
docker-agent itself is parked in Go runtime cond-waits with FDs 76/77 connected to the helper child's stdin/stdout — it is synchronously waiting on the helper's reply via a pipe.
sample of docker-credential-desktop:
Thread_… DispatchQueue_1: com.apple.main-thread (serial)
open (in libsystem_kernel.dylib) + 64
__open (in libsystem_kernel.dylib) + 8
It has only KQUEUE + a netsrc systm fd open — no Unix socket to Docker Desktop, no progress.
docker info returns immediately. docker-credential-osxkeychain returns immediately. So Docker Desktop's backend is healthy; only the credential-helper IPC channel is wedged.
Root cause hypothesis
docker-agent uses github.com/google/go-containerregistry (crane.Pull / crane.Digest) in pkg/remote/pull.go when resolving registry references. That library's default keychain shells out to whatever helper ~/.docker/config.json declares — here docker-credential-desktop. The helper invocation has no timeout and no cancellation path: cmd.Run() blocks the goroutine, which blocks startup, which blocks the TUI.
The two relevant blast-radius paths I see:
- pkg/remote/pull.go → crane.Digest / crane.Pull for OCI agent / sub-agent references.
- pkg/environment/credential_helper.go (runCommand in pkg/environment/cmd_provider.go): same shape, cmd.Run() with only the caller's context as a deadline. If startup passes a context.Background(), it never returns.
runCommand in pkg/environment/cmd_provider.go:
cmd := exec.CommandContext(ctx, name, args...)
…
if err := cmd.Run(); err != nil { … }
…will only kill the helper if the caller's context is cancelled. Nothing in the startup path appears to apply a deadline.
Impact
- One unresponsive credential helper = one totally unusable docker-agent on that machine, with no error message and no way to know what's wrong without ps / sample.
- Affects every user with credsStore: desktop (the default on Docker Desktop installations) any time Desktop's helper IPC misbehaves, which seems to happen occasionally without visible cause.
Suggested fix
Bound every credential-helper invocation with a short, aggressive deadline (5–10s feels right) and surface a clear error/log line on timeout, so the agent can either:
- fall back to no-credentials / anonymous pull for public artifacts, and/or
- start the TUI anyway and let the user see what's happening.
Concretely:
- Wrap runCommand (and the equivalent path inside crane's keychain) with context.WithTimeout independent of the caller's context.
- On timeout, log WARN (credential helper %s timed out after %s, falling back to anonymous) and return ("", false) instead of blocking.
- Make sure the exec.Cmd is killed (cmd.Cancel / process-group kill) so we don't leak docker-credential-* zombies as observed above.
Optionally, gate any registry-pull on Docker Desktop being responsive (desktop.IsDockerDesktopRunning is already used in pkg/remote/transport.go) before consulting the Desktop keychain at all.
Workaround
Switch ~/.docker/config.json to "credsStore": "osxkeychain" (or quit Docker Desktop fully and reopen it).
Diagnostic snippets
lsof -p <docker-agent-pid> showing the pipe to the helper child:
docker-ag … 76 PIPE 0x9ef929cc65a95036 16384 ->0x179fdf520fadd15d
docker-cr … 1 PIPE 0x179fdf520fadd15d 16384 ->0x9ef929cc65a95036
sample <docker-agent-pid> (truncated): all goroutines in __psynch_cvwait, no progress.