Summary
The shellout compose backend (runtime/docker/compose.go → docker compose up -d) does not pass --no-recreate, which causes docker compose to destroy and recreate primary-service containers whenever it detects config drift — even when the caller passed Recreate=false and an existing container is present.
The container's writable layer (and anything in $HOME inside the container, e.g. ~/.claude/projects/<encoded>/<id>.jsonl) is lost as a result.
Upstream devcontainers/cli gates --no-recreate on whether a container already exists:
    const args = ['--project-name', projectName, ...composeGlobalArgs];
    args.push('up', '-d');
    if (container || params.expectExistingContainer) {
        args.push('--no-recreate');
    }
Our shellout path (runtime/docker/compose.go:75) builds only up -d <services> — never --no-recreate. The lib's outer code (up.go:189-206) already knows whether existing != nil, but never threads that signal into the compose argv.
How we hit it
DAP workspaces are k8s pods with the docker data-root persisted on a PVC (/workspace/docker). When a session's pod is destroyed (idle timeout, deploy, eviction) and a new pod boots from the same PVC, dockerd restores the prior containers from the PVC and the runtime calls Engine.Up with Recreate=false (session resume). Expected behavior: the existing app container restarts and ~/.claude/projects/... survives so the Claude SDK can resume the conversation.
Observed: the primary service container is recreated, its writable layer is gone, and the Claude SDK fails with "No conversation found with session ID: <id>".
Sidecar services with a restart policy (e.g. mailcatcher) keep the same container ID across pod restarts because dockerd auto-starts them — so by the time the orchestrator inspects them they're already Running and config-hash drift doesn't trigger a recreate. The primary service has no restart policy, sits Exited, and gets caught by docker compose's default recreate-on-drift behavior.
Reproduction (in DAP context, but mechanism is generic)
- Cold-start a compose-based devcontainer workspace in a k8s pod with /var/lib/docker (or equivalent) on a PVC.
- Touch a file in the container, e.g. docker exec <primary> sh -c 'echo hi > /home/<user>/marker'.
- Kill the pod. Wait for a new pod to come up against the same PVC.
- Call Engine.Up again with Recreate=false.
- Observe: primary service container has a new ID; /home/<user>/marker is gone.
Root cause
runtime/docker/compose.go:75-80:
    func buildUpArgs(spec runtime.ComposeUpSpec) []string {
        args := composeArgs(spec.ProjectName, spec.Files)
        args = append(args, "up", "-d")
        args = append(args, spec.Services...)
        return args
    }
Without --no-recreate, compose recreates on any drift in:
- generated dc-run.yaml override content (any pod-scoped env in ExtraEnvironment)
- the resolved image digest stamped on the existing container vs the newly resolved one
- normalized project hash differences
Even when the user's intent (opts.Recreate=false) is unambiguous, the lib can't communicate it to compose.
Fix
Mirror upstream:
- Add NoRecreate bool to runtime.ComposeUpSpec.
- buildUpArgs appends --no-recreate when spec.NoRecreate is set.
- upComposeShellout (up.go:597) sets NoRecreate: existing != nil, where existing is the value already computed at up.go:167. Requires threading existing (or a bool derived from it) into upComposeShellout.
Same class of bug on the native backend (not exercised yet, but worth fixing together)
compose/orchestrator.go:460-470 decides reuse on three conditions:
    if details.Labels[LabelConfigHash] == hash &&
        details.Labels[LabelImageDigest] == imageDigest &&
        c.State == runtime.StateRunning {
        return c.ID, nil
    }
    // Different config or not running: recreate.
The c.State == runtime.StateRunning check fails after a daemon restart (containers are restored in Exited state) and falls through to stop+remove+create. A config-matched stopped container should be started, not recreated. Same root cause as the shellout flag gap; should be fixed in the same PR so the bug doesn't follow us when we flip the backend.
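One way to express the start-if-stopped branch is to separate the "does the config match" question from the "is it running" question. The sketch below is illustrative only: the label key values, the State and Action types, and decideReuse are hypothetical names, not the orchestrator's actual API.

```go
package main

import "fmt"

// Placeholder label keys; the real LabelConfigHash / LabelImageDigest
// values are defined in the compose package and may differ.
const (
	LabelConfigHash  = "example.config-hash"
	LabelImageDigest = "example.image-digest"
)

// State is a stand-in for the runtime's container state type.
type State string

const (
	StateRunning State = "running"
	StateExited  State = "exited"
)

// Action is a hypothetical result type for the reuse decision.
type Action string

const (
	ActionReuse    Action = "reuse"    // matching and running: return the ID as-is
	ActionStart    Action = "start"    // matching but stopped: start it, keep the writable layer
	ActionRecreate Action = "recreate" // config or image drift: stop+remove+create
)

// decideReuse recreates only on genuine drift; a config-matched
// container that is merely stopped is started instead.
func decideReuse(labels map[string]string, state State, hash, imageDigest string) Action {
	if labels[LabelConfigHash] != hash || labels[LabelImageDigest] != imageDigest {
		return ActionRecreate
	}
	if state == StateRunning {
		return ActionReuse
	}
	return ActionStart
}

func main() {
	labels := map[string]string{LabelConfigHash: "abc", LabelImageDigest: "sha256:123"}
	fmt.Println(decideReuse(labels, StateExited, "abc", "sha256:123"))  // start
	fmt.Println(decideReuse(labels, StateRunning, "abc", "sha256:123")) // reuse
	fmt.Println(decideReuse(labels, StateExited, "xyz", "sha256:123"))  // recreate
}
```

The first call is the post-daemon-restart case: labels match and the container is Exited, so it is started rather than removed.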
Scope
- runtime/runtime.go: add NoRecreate to ComposeUpSpec.
- runtime/docker/compose.go: append --no-recreate when set; update buildUpArgs test.
- up.go: set NoRecreate: existing != nil in upComposeShellout.
- compose/orchestrator.go: replace the State == Running gate with a start-if-stopped branch.
- Integration test: cold-start compose, write a marker into the container, simulate daemon-restart-like recreation conditions, call Engine.Up a second time with Recreate=false, and assert that the marker survives and the container ID is preserved.